Vector Databases Explained

What vector databases actually do, how approximate nearest neighbor search works, and how to choose and configure one for your AI application.

Author: Sourangshu Pal

Published: November 28, 2025

Every RAG system, semantic search engine, and recommendation feature built on embeddings needs a way to search vectors by similarity, and at scale that usually means a vector database. But “vector database” is an overloaded term. This post explains what’s actually happening, where the complexity lives, and how to configure Qdrant, one of the strongest open-source options, for production use.

What Problem Do They Solve?

An embedding model turns text (or images, audio, etc.) into a dense float vector — say, 1536 or 3072 numbers. Semantic similarity maps to geometric proximity: similar meanings → nearby vectors in the embedding space.

The search problem: given a query vector, find the k most similar vectors in a collection of millions or billions. Exact nearest neighbor search is O(n·d) for n vectors of d dimensions, because you compute the cosine similarity between your query and every stored vector. At 10M vectors with 1536 dimensions, that’s about 15 billion multiply-adds, or roughly 30 billion floating-point operations, per query. That is too slow for interactive search.
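To make the cost concrete, here is a minimal sketch of exact search in plain Python. The function names and the toy sizes are assumptions for illustration; a real system would use vectorized math, but the amount of work per query is the same.

```python
import heapq
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query: O(n * d) work."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)  # [(similarity, index), ...], best first


random.seed(0)
dim, n = 8, 1000  # toy sizes; the example in the text is dim=1536, n=10_000_000
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
top = exact_top_k(vectors[42], vectors, k=3)  # vector 42 is its own best match

# One multiply and one add per dimension, for every stored vector.
flops_per_query = 2 * 10_000_000 * 1536
```

Every query touches every vector, so doubling the collection doubles the latency. That linear growth is what ANN indexes are built to avoid.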

Vector databases solve this with Approximate Nearest Neighbor (ANN) algorithms. They trade a small accuracy loss for orders-of-magnitude speed gains.
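The “accuracy loss” is usually measured as recall@k: the fraction of the true top-k neighbors that the approximate search also returned. A small sketch, with made-up ids standing in for real search results:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search also returned."""
    truth = set(exact_ids[:k])
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


exact = [17, 4, 92, 8, 55]    # ids from a brute-force search (hypothetical)
approx = [17, 92, 4, 31, 55]  # ids from an ANN index (hypothetical)
recall = recall_at_k(approx, exact, k=5)  # 4 of the 5 true neighbors were found
```

Order within the top-k does not matter for this metric, only membership. Running it over a sample of real queries is a simple way to check whether an index configuration is good enough.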

How HNSW Works

The dominant ANN algorithm today is Hierarchical Navigable Small World (HNSW). The key intuitions:

Graph structure: Vectors are nodes. Similar vectors are connected by edges.
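Hierarchical layers: Multiple layers of graphs, sparse at the top, dense at the bottom. Every vector lives in the bottom layer; each higher layer keeps a smaller subset.
Greedy search: Start at a fixed entry point in the top layer, greedily move to whichever neighbor is closest to the query until no neighbor is closer, then descend to the next layer and repeat. At the bottom layer the search widens to a candidate list, from which the final top-k is taken.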

The result: search cost grows roughly logarithmically with n in practice. At 10M vectors, you’re typically exploring hundreds to a few thousand candidates instead of millions.
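The layered descent can be sketched in a few lines of Python. This is a deliberately simplified toy, not the real algorithm: it builds each layer by brute force, picks layer membership by taking every 10th node, and keeps a single candidate instead of a candidate list. Real HNSW inserts nodes incrementally and assigns layers at random with exponentially decaying probability. All names here are assumptions for illustration.

```python
import math
import random


def build_layer(ids, points, m):
    """Toy construction: link each node to its m nearest peers in this layer."""
    graph = {}
    for i in ids:
        peers = [j for j in ids if j != i]
        peers.sort(key=lambda j: math.dist(points[i], points[j]))
        graph[i] = peers[:m]
    return graph


def greedy_search(layers, points, query, entry):
    """Hill-climb toward the query in each layer, from sparsest to densest."""
    current = entry
    evaluated = 0  # number of neighbor distance checks
    for graph in layers:
        improved = True
        while improved:
            improved = False
            for neighbor in graph[current]:
                evaluated += 1
                if math.dist(points[neighbor], query) < math.dist(points[current], query):
                    current, improved = neighbor, True
    return current, evaluated


random.seed(1)
points = [(random.random(), random.random()) for _ in range(500)]
layer0 = list(range(500))  # bottom layer: every node
layer1 = layer0[::10]      # 50 nodes
layer2 = layer1[::10]      # top layer: 5 nodes
layers = [build_layer(ids, points, m=8) for ids in (layer2, layer1, layer0)]

query = (0.3, 0.7)
entry = layer2[0]  # fixed entry point in the top layer
found, evaluated = greedy_search(layers, points, query, entry)
```

The upper layers act like an express lane: a few long hops get close to the query, and the bottom layer only refines locally. Because the walk is greedy, it can stop at a local minimum rather than the true nearest neighbor, which is where the “approximate” in ANN comes from.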

The two build-time knobs that matter for HNSW:

Parameter	Effect	Default	When to Increase
`m`	Edges per node	16	Higher recall on large or high-dimensional collections, at the cost of memory
`ef_construct`	Build-time search width	100	Better index quality, slower build

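from qdrant_client import AsyncQdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, SparseVectorParams, VectorParams

client = AsyncQdrantClient(url="http://localhost:6333")  # assumes a local instance

# Run inside an async function.
await client.create_collection(
    collection_name="my_docs",
    vectors_config={
        "dense": VectorParams(
            size=3072,  # must match the embedding model's output dimension
            distance=Distance.COSINE,
            hnsw_config=HnswConfigDiff(
                m=16,              # edges per node (default)
                ef_construct=100,  # build-time search width; try 200 for better recall
            ),
        )
    },
    # Named sparse vector, required by the hybrid search query later in this post.
    sparse_vectors_config={"sparse": SparseVectorParams()},
)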

Dense vs. Sparse vs. Hybrid Search

Dense Vectors

Generated by neural embedding models. Capture semantic meaning. Struggle with exact keyword matching and rare terms (e.g., product codes, proper nouns not in training data).

Sparse Vectors (BM25 / SPLADE)

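The keyword search approach. Each dimension corresponds to a vocabulary term, and only terms that occur in the text get a nonzero weight. BM25-style vectors are excellent at exact matching but miss synonyms and paraphrase; learned sparse models such as SPLADE narrow that gap by expanding a text with related terms.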
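A sparse vector is stored as two parallel lists: the indices of the nonzero dimensions and their weights. The sketch below uses a tiny made-up vocabulary and raw term counts as weights; a real pipeline would take both from its BM25 or SPLADE implementation.

```python
from collections import Counter

# Hypothetical vocabulary: term -> dimension index.
VOCAB = {"error": 0, "code": 1, "e4021": 2, "pump": 3, "manual": 4}


def to_sparse(text):
    """Return (indices, values); only terms present in the text get an entry."""
    counts = Counter(tok for tok in text.lower().split() if tok in VOCAB)
    terms = sorted(counts, key=VOCAB.get)  # order entries by dimension index
    indices = [VOCAB[tok] for tok in terms]
    values = [float(counts[tok]) for tok in terms]
    return indices, values


sparse_indices, sparse_values = to_sparse("Pump error code E4021 pump")
```

The product code `e4021` gets its own dimension, which is why sparse retrieval finds it reliably even if an embedding model has never seen the token.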

Hybrid: Best of Both

Qdrant supports hybrid search natively using Reciprocal Rank Fusion (RRF): run both retrievals, merge the two ranked lists by rank position rather than raw score, and take the top-k from the merged result.

from qdrant_client.models import Prefetch, FusionQuery, Fusion, SparseVector

results = await client.query_points(
    collection_name="my_docs",
    prefetch=[
        Prefetch(query=dense_vector, using="dense", limit=20),
        Prefetch(query=SparseVector(indices=sparse_indices, values=sparse_values),
                 using="sparse", limit=20),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=5,
    with_payload=True,
)

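In many production RAG systems, hybrid search improves recall over dense-only retrieval, especially on domain-specific or technical content. Gains of 10–20% are a reasonable expectation, but the number depends heavily on the corpus and the queries, so measure it on your own evaluation set.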
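The fusion step itself is small enough to write out. Below is a sketch of the textbook RRF formula, where each list contributes 1 / (k + rank) to a document's score. The constant k=60 is the value commonly used in the literature and is an assumption here; the constant Qdrant uses internally may differ. The document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Merge ranked id lists; each list adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


dense_hits = ["d3", "d1", "d7", "d2"]   # hypothetical dense ranking
sparse_hits = ["d1", "d9", "d3", "d4"]  # hypothetical sparse ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits], top_n=3)
```

Because only ranks are used, the dense and sparse scores never need to be on the same scale. Documents that appear in both lists, like `d1` and `d3` here, rise to the top.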

Payload Filtering

A vector database isn’t just for retrieval — you also need to filter by metadata. Qdrant stores arbitrary JSON payloads alongside vectors and can filter at query time without a post-processing step.

from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

results = await client.query_points(
    collection_name="my_docs",
    query=query_vector,
    using="dense",
    query_filter=Filter(
        must=[  # every condition has to hold
            FieldCondition(key="source", match=MatchValue(value="technical_manual")),
            FieldCondition(key="year", range=Range(gte=2024)),
        ]
    ),
    limit=5,
)
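Why does it matter that the filter runs at query time instead of afterwards? The plain-Python comparison below is a conceptual illustration only, not how Qdrant implements filtering, and the scored documents are made up.

```python
def post_filter(scored_docs, predicate, k):
    """Take the top-k by score first, then drop non-matching docs."""
    top = sorted(scored_docs, key=lambda d: d["score"], reverse=True)[:k]
    return [d for d in top if predicate(d)]


def filter_during_search(scored_docs, predicate, k):
    """Apply the predicate while selecting, so up to k matching docs come back."""
    matching = (d for d in scored_docs if predicate(d))
    return sorted(matching, key=lambda d: d["score"], reverse=True)[:k]


def is_recent(doc):
    return doc["year"] >= 2024


docs = [  # hypothetical scored candidates
    {"id": 1, "score": 0.95, "year": 2021},
    {"id": 2, "score": 0.91, "year": 2023},
    {"id": 3, "score": 0.88, "year": 2024},
    {"id": 4, "score": 0.80, "year": 2025},
    {"id": 5, "score": 0.72, "year": 2024},
]

late = post_filter(docs, is_recent, k=3)            # only one doc survives
early = filter_during_search(docs, is_recent, k=3)  # three matching docs
```

Post-filtering asked for three results and returned one, because the two best-scoring documents failed the filter. Filtering during the search keeps the result count stable however selective the condition is.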

Important: Create payload indexes for fields you filter on. Without an index, Qdrant scans the full payload for every candidate — that erases the ANN speed advantage.

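# One index per filtered field: "source" (exact match) and "year" (range).
for field_name, schema in (("source", "keyword"), ("year", "integer")):
    await client.create_payload_index(
        collection_name="my_docs",
        field_name=field_name,
        field_schema=schema,
    )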

Choosing a Vector Database

DB	Best For	Notes
Qdrant	Production open-source	Rust core, excellent Python SDK, hybrid search built-in
Pinecone	Managed cloud, fast setup	Expensive at scale, less control
pgvector	Already on PostgreSQL	HNSW support since 0.5.0, but typically slower than dedicated DBs at large scale
Weaviate	GraphQL API, multi-modal	Heavier operationally
Chroma	Local dev and prototyping	Not production-tested at scale

For most teams building RAG: Qdrant on Docker locally, Qdrant Cloud for production. The client API is identical; you change the URL and supply an API key.
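One way to keep that switch out of the code is to read the connection settings from the environment. A minimal sketch, where the variable names `QDRANT_URL` and `QDRANT_API_KEY` and the example hostname are assumptions chosen for illustration:

```python
import os


def qdrant_connection_kwargs(env=os.environ):
    """Build client keyword arguments from environment variables.

    QDRANT_URL and QDRANT_API_KEY are names chosen for this sketch.
    """
    kwargs = {"url": env.get("QDRANT_URL", "http://localhost:6333")}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:  # a local Docker instance usually runs without a key
        kwargs["api_key"] = api_key
    return kwargs


local = qdrant_connection_kwargs({})
cloud = qdrant_connection_kwargs(
    {"QDRANT_URL": "https://qdrant.example.com:6333", "QDRANT_API_KEY": "secret"}
)
```

The result can be passed straight to the client, for example `AsyncQdrantClient(**qdrant_connection_kwargs())`, so the same code runs against both environments.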

Production Checklist

Index built with ef_construct >= 100; raise to 200 if recall is below expectations
Payload indexes created for all filter fields
Collection backed by persistent storage (not in-memory)
Async client (AsyncQdrantClient) for all FastAPI or async paths
Batch upsert — don’t insert individual vectors in a loop
Collection snapshots scheduled for backup
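For the batch upsert item, the chunking logic can be sketched independently of the client. Here `upsert_fn` is a hypothetical callable standing in for whatever sends one batch to the database, and the batch size of 256 is an arbitrary starting point, not a recommendation.

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def upsert_in_batches(points, upsert_fn, batch_size=256):
    """Call upsert_fn once per batch; returns the number of calls made."""
    calls = 0
    for batch in batched(points, batch_size):
        upsert_fn(batch)  # one request per batch instead of one per point
        calls += 1
    return calls


sent = []  # records each batch so the behavior is easy to inspect
calls = upsert_in_batches(range(1000), sent.append, batch_size=256)
```

In an async service the loop body would instead be `await client.upsert(collection_name="my_docs", points=batch)`. A thousand points become four requests here rather than a thousand.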

A vector database is infrastructure. Get it right once, and it runs quietly for years. Get it wrong — missing indexes, sync client in async code, no backups — and you’ll feel it in production.