Night Mode Labs Blue Book

Data Platform Practices

Data platforms need product ownership, contracts, and operational controls. Treat datasets, topics, jobs, and dashboards as production interfaces.

Ownership

Every important data asset should identify:

  • Business owner and technical owner.
  • Producer and consumer teams.
  • Data classification and retention requirements.
  • Freshness, quality, and availability expectations.
  • Backfill and replay responsibilities.
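
As an illustration, the ownership fields above could be captured in a small machine-readable record that CI validates. This is a minimal sketch: the class name AssetOwnership, its field names, and the missing_fields helper are hypothetical and not part of any particular catalog tool.

```python
from dataclasses import dataclass, fields
from datetime import timedelta


@dataclass(frozen=True)
class AssetOwnership:
    """Hypothetical ownership record for one data asset."""

    name: str
    business_owner: str
    technical_owner: str
    producer_team: str
    consumer_teams: tuple[str, ...]
    classification: str        # assumed labels, e.g. "internal", "restricted"
    retention: timedelta       # how long the data may be kept
    freshness_slo: timedelta   # maximum acceptable data age
    backfill_owner: str        # who runs backfills and replays

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty, in declaration order."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "" or value == ():
                missing.append(f.name)
        return missing
```

A CI step could load one such record per asset and fail the build when missing_fields() returns anything, so an asset cannot ship without named owners.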

Contracts

Use explicit contracts for events, tables, and APIs.

  • Version schemas and validate compatibility in CI.
  • Document required fields, optional fields, and semantics.
  • Publish ownership and support expectations.
  • Avoid breaking consumers silently.
  • Track lineage for regulated or high-impact data.
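
A compatibility check in CI can be as simple as comparing two schema versions. The sketch below assumes a hypothetical schema representation (a dict mapping field name to its type and required flag) and one possible compatibility rule: existing fields may not be removed, retyped, or made required, and new fields must be optional. Real schema registries define their own rules.

```python
def breaking_changes(old: dict[str, dict], new: dict[str, dict]) -> list[str]:
    """List the changes in `new` that would break consumers of `old`.

    Each schema maps a field name to {"type": str, "required": bool}.
    An empty result means the new version is compatible under this rule.
    """
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(
                f"type changed: {name} {spec['type']} -> {new[name]['type']}"
            )
        elif new[name]["required"] and not spec["required"]:
            problems.append(f"optional field became required: {name}")
    for name, spec in new.items():
        if name not in old and spec["required"]:
            problems.append(f"new required field: {name}")
    return problems
```

Failing the pipeline whenever the list is non-empty turns a silent break into a reviewed decision: the producer either fixes the schema or publishes a new major version and notifies consumers.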

Quality checks

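Run data quality checks where a failure is actionable: at a point where an owner is alerted and can block, quarantine, or correct the data before consumers use it.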

Common checks include:

  • Freshness and completeness.
  • Null, range, and uniqueness constraints.
  • Referential integrity.
  • Volume anomalies.
  • Schema drift.
  • Dead-letter and replay rates.
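
Several of these checks can be sketched against a batch of rows held in memory. The function name check_batch, its parameters, and the 50% volume tolerance are assumptions for illustration. A production platform would more likely express the same checks in SQL or a data quality framework.

```python
from datetime import datetime, timedelta, timezone


def check_batch(rows, *, key, required, timestamp_field, max_age,
                expected_rows, tolerance=0.5, now=None):
    """Return a list of failure messages for one batch of dict rows."""
    now = now or datetime.now(timezone.utc)
    if not rows:
        return ["completeness: batch is empty"]

    failures = []
    # Freshness: the newest row must be younger than max_age.
    newest = max(row[timestamp_field] for row in rows)
    if now - newest > max_age:
        failures.append("freshness: newest row is older than max_age")
    # Null constraint: required columns may not contain None.
    for column in required:
        nulls = sum(1 for row in rows if row.get(column) is None)
        if nulls:
            failures.append(f"null: {column} has {nulls} null value(s)")
    # Uniqueness: the key column may not repeat.
    keys = [row[key] for row in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"uniqueness: duplicate values in {key}")
    # Volume anomaly: row count must stay within tolerance of the expectation.
    if abs(len(rows) - expected_rows) > tolerance * expected_rows:
        failures.append("volume: row count deviates from expected")
    return failures
```

Returning messages rather than raising lets the caller decide what is actionable: block the publish step, quarantine the batch, or only alert the owner.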

Backfills and replays

Backfills and replays are production changes.

  • Require a plan, owner, blast radius, and rollback strategy.
  • Test on representative data before full execution.
  • Rate-limit when downstream systems can be overwhelmed.
  • Record what data changed and why.
  • Communicate consumer impact before and after execution.
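
The rate-limiting and record-keeping points above might look like the following sketch. The run_backfill function, its parameters, and the shape of the change log are hypothetical. The apply callback stands in for whatever actually writes to the downstream system.

```python
import time
from typing import Callable, Sequence


def run_backfill(records: Sequence, apply: Callable[[Sequence], None], *,
                 batch_size: int, pause_seconds: float, reason: str,
                 dry_run: bool = True,
                 sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Apply records in fixed-size batches, pausing between batches.

    Returns a change log with one entry per batch. With dry_run=True
    (the default) nothing is written, but the log shows what would run.
    """
    change_log = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        if not dry_run:
            apply(batch)
        change_log.append({
            "offset": start,
            "count": len(batch),
            "applied": not dry_run,
            "reason": reason,
        })
        # Pause between batches so downstream systems are not overwhelmed.
        if start + batch_size < len(records):
            sleep(pause_seconds)
    return change_log
```

Defaulting to a dry run means the plan can be reviewed against representative data first, and the returned log doubles as the record of what changed and why.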

Retention and deletion

Retention is an architecture decision, not a storage cleanup task. Define deletion behavior, legal hold handling, encryption, and audit requirements before the platform accumulates sensitive data indefinitely.
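
One way to make deletion behavior explicit is to encode it as a small decision function whose output can be written to an audit log. This is a sketch under stated assumptions: the RetentionPolicy and Record types, the legal_hold flag, and the audit entry shape are all hypothetical, and encryption is out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    classification: str
    retain_for: timedelta


@dataclass(frozen=True)
class Record:
    record_id: str
    created_at: datetime
    legal_hold: bool = False


def deletion_decision(record: Record, policy: RetentionPolicy,
                      now: datetime) -> dict:
    """Decide whether a record should be deleted and return an audit entry."""
    if record.legal_hold:
        # A legal hold overrides the retention period.
        action, reason = "retain", "legal hold"
    elif now - record.created_at >= policy.retain_for:
        action, reason = "delete", "retention period elapsed"
    else:
        action, reason = "retain", "within retention period"
    return {
        "record_id": record.record_id,
        "classification": policy.classification,
        "action": action,
        "reason": reason,
        "decided_at": now.isoformat(),
    }
```

Checking the legal hold before the retention period keeps held records from being deleted by a routine cleanup job, and every decision, including the decision to retain, leaves an audit entry.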
