Vector Databases Explained
Nearly every RAG system, semantic search engine, and recommendation feature built on embeddings relies on vector search, and usually on a vector database. But “vector database” is an overloaded term. This post explains what’s actually happening, where the complexity lives, and how to configure Qdrant, the open-source option this post recommends, for production use.
What Problem Do They Solve?
An embedding model turns text (or images, audio, etc.) into a dense float vector — say, 1536 or 3072 numbers. Semantic similarity maps to geometric proximity: similar meanings → nearby vectors in the embedding space.
The search problem is: given a query vector, find the k most similar vectors in a collection of millions or billions. Exact nearest neighbor search is O(n) in the number of stored vectors: you compute the cosine distance between the query and every one of them. At 10M vectors with 1536 dimensions, that’s about 15 billion multiply-adds, or roughly 30 billion float operations, per query. That is too slow for interactive search.
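To make the cost concrete, here is a minimal sketch of exact search in plain Python. The function names and the four toy vectors are illustrative assumptions, not part of any library; the last two lines reproduce the operation count from the paragraph above.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query: O(n * d) work per query."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]


# Toy collection (hypothetical data): four 2-dimensional vectors.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(exact_top_k([1.0, 0.05], vectors, k=2))  # [0, 1]

# Cost estimate: one multiply and one add per dimension, per stored vector.
n, d = 10_000_000, 1536
print(f"{2 * n * d:,} float operations per query")  # 30,720,000,000
```

Every query touches every vector, which is exactly the work an ANN index is built to avoid.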
Vector databases solve this with Approximate Nearest Neighbor (ANN) algorithms. They trade a small accuracy loss for orders-of-magnitude speed gains.
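The "small accuracy loss" is usually measured as recall@k: the share of the true top-k results that the approximate search also returns. A minimal sketch, assuming you have an exact result list to compare against (the function name and the ID lists are made up for illustration):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search also returned."""
    exact_top = set(exact_ids[:k])
    hits = len(exact_top & set(approx_ids[:k]))
    return hits / len(exact_top)


exact = [7, 2, 9, 4, 1]   # IDs from brute-force search (hypothetical)
approx = [7, 9, 2, 5, 1]  # IDs from an ANN index (hypothetical)
print(recall_at_k(approx, exact, k=5))  # 0.8
```

Order within the top-k does not matter here; only membership counts.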
How HNSW Works
The dominant ANN algorithm today is Hierarchical Navigable Small World (HNSW). The key intuitions:
- Graph structure: Vectors are nodes. Similar vectors are connected by edges.
- Hierarchical layers: Multiple layers of graphs, coarser at the top, denser at the bottom. Entry is at the top layer.
- Greedy search: Start at the fixed entry point in the top layer, keep moving to whichever neighbor is closest to the query until none is closer, then descend to the next layer and repeat.
The result: search cost grows roughly logarithmically with collection size. At 10M vectors, you’re exploring on the order of hundreds of candidates instead of millions.
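The three intuitions above can be sketched as a toy layered search. This is a simplified illustration, not Qdrant's implementation: the layers are hand-built dictionaries, the points sit on a line, and the search keeps a single best node, whereas real HNSW tracks a list of candidates on the bottom layer. All names here are hypothetical.

```python
import math


def greedy_search_layer(query, entry, neighbors, points):
    """Walk one layer, hopping to the neighbor closest to the query until none improves."""
    current = entry
    current_dist = math.dist(query, points[current])
    while True:
        best, best_dist = current, current_dist
        for candidate in neighbors.get(current, []):
            d = math.dist(query, points[candidate])
            if d < best_dist:
                best, best_dist = candidate, d
        if best == current:
            return current
        current, current_dist = best, best_dist


def layered_search(query, entry, layers, points):
    """Descend from the sparsest layer to the densest, reusing each result as the next entry."""
    current = entry
    for neighbors in layers:  # ordered top (sparse) to bottom (dense)
        current = greedy_search_layer(query, current, neighbors, points)
    return current


# Eight points on a line; node i sits at x = i.
points = {i: (float(i), 0.0) for i in range(8)}
layers = [
    {0: [4], 4: [0]},                                   # top: long-range links only
    {0: [2], 2: [0, 4], 4: [2, 6], 6: [4]},             # middle: every second node
    {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)},  # bottom: all nodes
]
print(layered_search((5.2, 0.0), entry=0, layers=layers, points=points))  # 5
```

The top layer covers most of the distance in one hop (0 to 4), and the lower layers only refine locally (4 to 6, then 6 to 5).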
The two knobs that matter for HNSW:
| Parameter | Effect | Default | When to Increase |
|---|---|---|---|
| `m` | Edges per node | 16 | Higher recall on sparse data |
| `ef_construct` | Build-time search width | 100 | Better index quality, slower build |
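```python
from qdrant_client import AsyncQdrantClient
from qdrant_client.models import (
    Distance, HnswConfigDiff, SparseVectorParams, VectorParams,
)

# Assumes a local Qdrant instance; run the awaits inside an async function.
client = AsyncQdrantClient(url="http://localhost:6333")

await client.create_collection(
    collection_name="my_docs",
    vectors_config={
        "dense": VectorParams(
            size=3072,  # must match the embedding model's output dimension
            distance=Distance.COSINE,
            hnsw_config=HnswConfigDiff(
                m=16,              # standard
                ef_construct=100,  # increase to 200 for better recall
            ),
        )
    },
    # Named sparse vector, required by the hybrid query later in this post.
    sparse_vectors_config={"sparse": SparseVectorParams()},
)
```

Dense vs. Sparse vs. Hybrid Search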
Dense Vectors
Generated by neural embedding models. Capture semantic meaning. Struggle with exact keyword matching and rare terms (e.g., product codes, proper nouns not in training data).
Sparse Vectors (BM25 / SPLADE)
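BM25 is the classic keyword search approach; SPLADE is a learned model that produces the same kind of vector. Each dimension corresponds to a vocabulary term. Both are excellent at exact matching. BM25 is terrible at synonyms and paraphrase, and SPLADE's learned term expansion helps with only some of that.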
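As a rough illustration of the "one dimension per vocabulary term" idea, the sketch below turns a text into parallel index and value lists, which is the shape a sparse vector takes. It uses raw term counts rather than real BM25 or SPLADE weights, and the vocabulary and function name are hypothetical.

```python
from collections import Counter


def to_sparse(text, vocabulary):
    """Map a text to parallel (indices, values) lists of raw term counts."""
    counts = Counter(
        vocabulary[token] for token in text.lower().split() if token in vocabulary
    )
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return indices, values


# Toy vocabulary: term -> dimension index.
vocabulary = {"error": 0, "code": 1, "e4021": 2, "pump": 3}
sparse_indices, sparse_values = to_sparse("Pump error code E4021 error", vocabulary)
print(sparse_indices, sparse_values)  # [0, 1, 2, 3] [2.0, 1.0, 1.0, 1.0]
```

Only the terms that occur get an entry, so a vector over a vocabulary of tens of thousands of terms still stores just a handful of numbers. A rare token like a product code gets its own dimension, which is why exact matching works well.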
Hybrid: Best of Both
Qdrant supports hybrid search natively using Reciprocal Rank Fusion (RRF) — run both retrievals, merge the ranked lists, take the top-k from the merged result.
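The merge step is small enough to sketch by hand. The version below is a plain-Python illustration of RRF, not Qdrant's internal code: each document scores 1 / (k + rank) in every list where it appears, and the scores are summed. The constant k=60 is a commonly used default, assumed here for illustration; the document IDs are made up.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_k=5):
    """Merge ranked ID lists by summing 1 / (k + rank) over every list."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


dense_hits = ["a", "b", "c", "d"]   # hypothetical dense retrieval order
sparse_hits = ["c", "a", "e", "f"]  # hypothetical sparse retrieval order
print(reciprocal_rank_fusion([dense_hits, sparse_hits], top_k=3))  # ['a', 'c', 'b']
```

Because only ranks are used, the dense and sparse scores never need to be on the same scale. Documents found by both retrievers ("a" and "c") rise above those found by only one.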
```python
from qdrant_client.models import Prefetch, FusionQuery, Fusion, SparseVector

results = await client.query_points(
    collection_name="my_docs",
    prefetch=[
        # Each prefetch retrieves its own candidate list.
        Prefetch(query=dense_vector, using="dense", limit=20),
        Prefetch(
            query=SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # merge the two lists with RRF
    limit=5,
    with_payload=True,
)
```

For many production RAG systems, hybrid search improves recall by roughly 10–20% over dense-only, especially on domain-specific or technical content. The gain depends on the corpus and the queries, so measure it on your own data.
Payload Filtering
A vector database isn’t just for retrieval — you also need to filter by metadata. Qdrant stores arbitrary JSON payloads alongside vectors and can filter at query time without a post-processing step.
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

results = await client.query_points(
    collection_name="my_docs",
    query=query_vector,
    using="dense",
    query_filter=Filter(
        must=[
            # Both conditions must hold for a point to be returned.
            FieldCondition(key="source", match=MatchValue(value="technical_manual")),
            FieldCondition(key="year", range=Range(gte=2024)),
        ]
    ),
    limit=5,
)
```

Important: Create payload indexes for fields you filter on. Without an index, Qdrant scans the full payload for every candidate, which erases the ANN speed advantage.
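```python
await client.create_payload_index(
    collection_name="my_docs",
    field_name="source",
    field_schema="keyword",
)
await client.create_payload_index(
    collection_name="my_docs",
    field_name="year",  # also used in the filter above
    field_schema="integer",
)
```

Choosing a Vector Database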
| DB | Best For | Notes |
|---|---|---|
| Qdrant | Production open-source | Rust core, excellent Python SDK, hybrid search built-in |
| Pinecone | Managed cloud, fast setup | Expensive at scale, less control |
| pgvector | Already on PostgreSQL | HNSW support since 0.5.0, but often slower than dedicated DBs at large scale |
| Weaviate | GraphQL API, multi-modal | Heavier operationally |
| Chroma | Local dev and prototyping | Not production-tested at scale |
For most teams building RAG: Qdrant on Docker locally, Qdrant Cloud for production. The API is identical; you change the URL and supply an API key.
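One way to keep that switch out of the code is to read the connection settings from the environment. This is a minimal sketch: the variable names `QDRANT_URL` and `QDRANT_API_KEY` and the helper function are assumptions for illustration, and the default URL assumes Qdrant's standard local port.

```python
import os


def qdrant_connection_settings(env=None):
    """Build client keyword arguments from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    settings = {"url": env.get("QDRANT_URL", "http://localhost:6333")}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:  # local Docker instances typically run without a key
        settings["api_key"] = api_key
    return settings


# Local development: nothing set, so the Docker default is used.
print(qdrant_connection_settings({}))

# Production: same code path, different environment.
# client = AsyncQdrantClient(**qdrant_connection_settings())
```

The rest of the application code stays the same in both environments.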
Production Checklist
A vector database is infrastructure. Get it right once, and it runs quietly for years. Get it wrong — missing indexes, sync client in async code, no backups — and you’ll feel it in production.