Governance
Data Platform Practices
Data platforms need product ownership, contracts, and operational controls. Treat datasets, topics, jobs, and dashboards as production interfaces.
Ownership
Every important data asset should identify:
- Business owner and technical owner.
- Producer and consumer teams.
- Data classification and retention requirements.
- Freshness, quality, and availability expectations.
- Backfill and replay responsibilities.
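One way to make this list enforceable is to store it as a structured record next to the asset and reject records with gaps. The sketch below is illustrative only: the `AssetOwnership` class, its field names, and the `missing_fields` helper are assumptions, not part of any particular catalog tool, and quality and availability expectations are omitted for brevity.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AssetOwnership:
    """Ownership metadata for one data asset (all field names are hypothetical)."""
    asset: str
    business_owner: str
    technical_owner: str
    producer_team: str
    consumer_teams: tuple
    classification: str           # e.g. "public", "internal", "restricted"
    retention_days: int
    freshness_sla_minutes: int
    backfill_owner: str


def missing_fields(record: AssetOwnership) -> list:
    """Return the names of fields that are empty or non-positive."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        empty = value is None or value == "" or value == ()
        non_positive = isinstance(value, int) and value <= 0
        if empty or non_positive:
            missing.append(f.name)
    return missing
```

A catalog or CI check could call `missing_fields` on every registered asset and fail when the returned list is not empty.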
Contracts
Use explicit contracts for events, tables, and APIs.
- Version schemas and validate compatibility in CI.
- Document required fields, optional fields, and semantics.
- Publish ownership and support expectations.
- Never ship a breaking change silently; announce it and give consumers time to migrate.
- Track lineage for regulated or high-impact data.
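As a rough sketch of what "validate compatibility in CI" can mean, the function below compares two schema versions and lists changes that would likely break consumers. The schema shape (`{field: {"type": ..., "required": ...}}`) and the rule set are assumptions chosen for illustration; a real schema registry defines compatibility modes more precisely.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """List changes between two schema versions that are likely to break consumers.

    Both schemas are assumed to look like {field: {"type": str, "required": bool}}.
    """
    problems = []
    # Fields that existed before must still exist with the same type.
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(
                f"type changed: {name} {spec['type']} -> {new[name]['type']}"
            )
    # New fields are treated as safe only if they are optional.
    for name, spec in new.items():
        if name not in old and spec["required"]:
            problems.append(f"new required field: {name}")
    return problems
```

A CI job could load the previously published schema and the proposed one, then fail the build when `breaking_changes` returns anything.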
Quality checks
Run data quality checks at the point where a failure can be acted on, that is, where an owner can block, quarantine, or fix the data.
Common checks include:
- Freshness and completeness.
- Null, range, and uniqueness constraints.
- Referential integrity.
- Volume anomalies.
- Schema drift.
- Dead-letter and replay rates.
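A few of these checks can be sketched over an in-memory batch of rows. The column names (`id`, `amount`, `updated_at`), the thresholds, and the failure messages below are assumptions for illustration; in practice such checks usually run inside the pipeline or warehouse rather than in application code.

```python
from datetime import datetime, timedelta, timezone


def run_checks(rows, now, max_age, min_rows):
    """Return a list of failed checks for a batch of dict rows."""
    failures = []
    # Volume: too few rows suggests an upstream gap.
    if len(rows) < min_rows:
        failures.append("volume: fewer rows than expected")
    # Uniqueness: the id column must not repeat.
    ids = [r["id"] for r in rows]
    if len(ids) != len(set(ids)):
        failures.append("uniqueness: duplicate id")
    # Null constraint on amount.
    if any(r["amount"] is None for r in rows):
        failures.append("null: amount is missing")
    # Range constraint: amounts must not be negative.
    if any(r["amount"] is not None and r["amount"] < 0 for r in rows):
        failures.append("range: amount is negative")
    # Freshness: the newest row must be recent enough.
    newest = max((r["updated_at"] for r in rows), default=None)
    if newest is None or now - newest > max_age:
        failures.append("freshness: data is stale")
    return failures


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "amount": 10, "updated_at": now - timedelta(minutes=5)},
    {"id": 1, "amount": None, "updated_at": now - timedelta(minutes=10)},
]
print(run_checks(rows, now, max_age=timedelta(hours=1), min_rows=1))
```

Returning a list of named failures, rather than raising on the first one, lets the caller decide which failures block the pipeline and which only alert.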
Backfills and replays
Backfills and replays are production changes.
- Require a plan, owner, blast radius, and rollback strategy.
- Test on representative data before full execution.
- Rate-limit when downstream systems can be overwhelmed.
- Record what data changed and why.
- Communicate consumer impact before and after execution.
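The rate-limiting and record-keeping points above can be sketched as a batched runner that pauses between batches and keeps an audit entry for each one. The function name `run_backfill`, its parameters, and the audit entry shape are hypothetical; `apply_batch` stands in for whatever actually writes to the downstream system.

```python
import time


def run_backfill(records, apply_batch, batch_size, pause_seconds, reason,
                 sleep=time.sleep):
    """Apply records in fixed-size batches, pausing between batches.

    Returns an audit log describing what was applied and why.
    """
    audit_log = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        apply_batch(batch)
        audit_log.append({"offset": start, "count": len(batch), "reason": reason})
        # Pause only if another batch follows, to limit downstream load.
        if start + batch_size < len(records):
            sleep(pause_seconds)
    return audit_log
```

Passing `sleep` as a parameter keeps the pacing testable: a test can inject a fake that records the requested pauses instead of waiting.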
Retention and deletion
Retention is an architecture decision, not a storage cleanup task. Define deletion behavior, legal hold handling, encryption, and audit requirements before the platform accumulates sensitive data indefinitely.
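Deletion behavior and legal hold handling are easier to audit when they are written down as an explicit rule. The sketch below is a minimal example under assumed inputs (a creation date, a retention period in days, and a legal hold flag); the function name and return strings are illustrative, and real policies usually carry more states.

```python
from datetime import date, timedelta


def deletion_decision(created: date, retention_days: int,
                      legal_hold: bool, today: date) -> str:
    """Decide whether a record should be deleted or retained, and why."""
    # A legal hold overrides the retention period.
    if legal_hold:
        return "retain: legal hold"
    # Delete once the retention period has fully elapsed.
    if today >= created + timedelta(days=retention_days):
        return "delete"
    return "retain: within retention"
```

Returning the reason alongside the decision gives the audit trail something to record for each retained or deleted record.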