AI and Data Platforms
Retrieval and Vector Search
Retrieval systems provide grounding for search, assistants, and knowledge workflows. Each one needs a named owner, freshness expectations, access control, and quality measurement.
Retrieval pipeline
Design decisions
Define:
- Source systems and owners.
- Sync frequency and freshness expectations.
- Chunking strategy and metadata.
- Embedding model and version.
- Vector store or search backend.
- Access-control filtering.
- Reranking or hybrid search needs.
- Evaluation dataset and quality metrics.
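Several of these decisions end up as fields stored on every indexed chunk. The sketch below is one way to record them, assuming a simple word-window chunker with overlap. The `Chunk` fields, the `chunk_document` helper, and the default sizes are hypothetical and illustrate the idea rather than recommend specific values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str           # identifier in the source system
    owner: str            # team or person accountable for the source
    position: int         # order of the chunk within the document
    text: str
    embedding_model: str  # model name and version used to embed this chunk


def chunk_document(doc_id, owner, text, embedding_model, max_words=50, overlap=10):
    """Split text into overlapping word windows and attach metadata to each."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for position, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(doc_id, owner, position, " ".join(window), embedding_model))
        # Stop once a window has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks
```

Storing the embedding model and version on each chunk makes it possible to find vectors that need re-embedding after a model change.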
Access control
Retrieval must respect document permissions. If a user cannot access a source document, the system must not reveal its content through snippets, summaries, citations, or generated answers.
Use metadata filters, scoped indexes, or query-time authorization checks based on the source system's permission model.
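As a minimal sketch of a query-time authorization check, the example below assumes each indexed chunk carries the set of groups allowed to read its source document, copied from the source system at sync time. `IndexedChunk`, `allowed_groups`, and `authorized_results` are hypothetical names, not the API of any particular vector store.

```python
from dataclasses import dataclass


@dataclass
class IndexedChunk:
    doc_id: str
    text: str
    score: float               # similarity score from the search backend
    allowed_groups: frozenset  # groups permitted to read the source document


def authorized_results(candidates, user_groups, top_k=3):
    """Drop chunks the user cannot read, then return the best remaining ones."""
    user_groups = frozenset(user_groups)
    # Keep a chunk only if the user shares at least one group with it.
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    visible.sort(key=lambda c: c.score, reverse=True)
    return visible[:top_k]
```

The filter runs before truncation to `top_k` and before any text reaches a summarizer or generator, so restricted content cannot surface in snippets, citations, or answers. Permissions copied at sync time can go stale. Where that matters, a live check against the source system is the safer option.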
Quality signals
Track:
- Recall on known answer sets.
- Citation accuracy.
- No-result and low-confidence rates.
- Stale content rate.
- Indexing failures.
- User feedback and corrections.
- Latency and cost per query.
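Three of these signals can be computed with very little code. The sketch below assumes an evaluation set of `(query, relevant_doc_ids)` pairs and a `search` callable that returns a ranked list of document IDs. Both are hypothetical stand-ins for whatever the real pipeline exposes.

```python
from datetime import datetime, timedelta


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the known relevant documents found in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)


def evaluate(eval_set, search, k=5):
    """Mean recall@k and no-result rate over a known answer set."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    recalls = []
    no_result = 0
    for query, relevant_ids in eval_set:
        retrieved = search(query)
        if not retrieved:
            no_result += 1
        recalls.append(recall_at_k(retrieved, relevant_ids, k))
    total = len(eval_set)
    return {
        "mean_recall_at_k": sum(recalls) / total,
        "no_result_rate": no_result / total,
    }


def stale_rate(last_synced: dict, now: datetime, max_age: timedelta) -> float:
    """Share of documents whose last sync is older than the freshness target."""
    if not last_synced:
        raise ValueError("last_synced must not be empty")
    stale = sum(1 for ts in last_synced.values() if now - ts > max_age)
    return stale / len(last_synced)
```

Running the same evaluation set before and after a change to chunking, embedding model, or reranking gives a direct comparison of retrieval quality.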
Watchouts
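- Bigger chunks are not automatically better. A large chunk can bury the relevant passage in unrelated text and raise the cost per query.
- Embedding model changes can require reindexing and re-evaluation, because vectors produced by different models or versions are generally not comparable.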
- Stale or duplicated content creates confident wrong answers.
- Vector search does not replace information architecture.
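Two of these watchouts lend themselves to routine index checks. The sketch below assumes chunks are available as `(chunk_id, text)` pairs and that each chunk records the embedding model it was built with. The helper names are hypothetical. The duplicate check catches only exact duplicates after whitespace and case normalization, so near-duplicates would need a different technique.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(chunks):
    """Group chunk IDs whose normalized text is identical."""
    groups = defaultdict(list)
    for chunk_id, text in chunks:
        groups[content_fingerprint(text)].append(chunk_id)
    return [ids for ids in groups.values() if len(ids) > 1]


def needs_reindex(chunk_models, current_model):
    """Chunk IDs embedded with a model other than the current one."""
    return sorted(cid for cid, model in chunk_models.items() if model != current_model)
```

Both checks can run as part of the sync job, with results reported to the owner of the source system.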