Vector Databases Explained

Tags: Vector DB, Qdrant, Search, Infrastructure
What vector databases actually do, how approximate nearest neighbor search works, and how to choose and configure one for your AI application.
Author: Sourangshu Pal
Published: November 28, 2025

Most RAG systems, semantic search engines, and recommendation features built on embeddings sit on top of a vector database. But “vector database” is an overloaded term. This post explains what is actually happening, where the complexity lives, and how to configure Qdrant, the open-source option this post recommends, for production use.

What Problem Do They Solve?

An embedding model turns text (or images, audio, etc.) into a dense float vector — say, 1536 or 3072 numbers. Semantic similarity maps to geometric proximity: similar meanings → nearby vectors in the embedding space.
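To make "nearby vectors" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are hand-picked, hypothetical stand-ins for real embeddings, which would have thousands of dimensions and come from a model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", chosen by hand for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(cat, kitten)   # points in nearly the same direction
sim_far = cosine_similarity(cat, invoice)    # points in a very different direction
```

Here `sim_close` comes out far higher than `sim_far`, which is the geometric version of "cat is closer in meaning to kitten than to invoice".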

The search problem is: given a query vector, find the k most similar vectors in a collection of millions or billions. Exact nearest neighbor search is O(n·d): you compute the cosine distance between the query and every one of the n stored vectors, each with d dimensions. At 10M vectors with 1536 dimensions, that is about 15 billion multiply-adds, or roughly 30 billion floating-point operations, per query. That is too slow for interactive search at any real query volume.
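The brute-force approach and the arithmetic behind that figure can be sketched as follows. This is an illustrative pure-Python version on a small random collection, not how any database implements it; the function names are made up for the example.

```python
import heapq
import math
import random

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score every stored vector: n distance computations of d multiply-adds each."""
    scored = ((cosine_distance(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)  # list of (distance, index), closest first

random.seed(0)
dim = 8
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
query = vectors[42]                      # query with a vector we know is stored
top = exact_top_k(query, vectors, k=3)   # the closest hit is index 42 itself

# Cost of one exact query at the scale discussed above.
n, d = 10_000_000, 1536
multiply_adds = n * d        # 15,360,000,000
flops = 2 * multiply_adds    # one multiply plus one add each: 30,720,000,000
```

The work grows linearly with both the collection size and the dimensionality, which is exactly what ANN indexes are built to avoid.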

Vector databases solve this with Approximate Nearest Neighbor (ANN) algorithms. They trade a small accuracy loss for orders-of-magnitude speed gains.
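The "small accuracy loss" is usually measured as recall@k: the fraction of the true top-k neighbors that the approximate search also returned. A minimal sketch, with made-up result ids:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical results for one query: the ANN index missed one true neighbor.
exact_ids = [7, 2, 9, 4, 1]
approx_ids = [7, 2, 9, 4, 8]

recall = recall_at_k(approx_ids, exact_ids)  # 4 of 5 true neighbors found
```

In practice you would average this over a sample of queries, using an exact search as ground truth.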

How HNSW Works

The dominant ANN algorithm today is Hierarchical Navigable Small World (HNSW). The key intuitions:

  1. Graph structure: Vectors are nodes. Similar vectors are connected by edges.
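  2. Hierarchical layers: Multiple graph layers, sparse at the top and dense at the bottom. Every vector appears in the bottom layer; each higher layer keeps a progressively smaller subset. Search enters at the top layer.
  3. Greedy search: Start at the index's entry point in the top layer (a fixed node, not a random one), move greedily to whichever neighbor is closest to the query until no neighbor is closer, then descend to the next layer and repeat.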

The result: in practice, search cost grows roughly logarithmically with n. At 10M vectors, you compare the query against hundreds to a few thousand candidates instead of all ten million.
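The layered greedy descent can be sketched in a few lines. This is a toy under loud assumptions: the graph is built by brute force with random layer membership (real HNSW inserts points incrementally), and the search keeps a single candidate where real HNSW keeps a beam of several. All function names are hypothetical.

```python
import math
import random

def build_toy_layers(points, m=4, levels=3, seed=0):
    """Build a toy layered graph, returned top (sparsest) layer first.

    Simplification: neighbors are found by brute force and each higher layer
    keeps a random quarter of the layer below.
    """
    rng = random.Random(seed)
    members = list(range(len(points)))
    layers = []
    for _ in range(levels):
        graph = {}
        for i in members:
            nearest = sorted(
                (j for j in members if j != i),
                key=lambda j: math.dist(points[i], points[j]),
            )
            graph[i] = nearest[:m]  # keep the m closest nodes as edges
        layers.append(graph)
        members = rng.sample(members, max(1, len(members) // 4))
    return layers[::-1]

def greedy_search(points, layers, query):
    """Greedy descent: walk to the closest neighbor, drop a layer when stuck."""
    current = next(iter(layers[0]))  # fixed entry point in the top layer
    for graph in layers:
        while True:
            best = min(
                graph[current],
                key=lambda j: math.dist(points[j], query),
                default=current,
            )
            if math.dist(points[best], query) >= math.dist(points[current], query):
                break  # no neighbor is closer on this layer: descend
            current = best
    return current

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(256)]
layers = build_toy_layers(points)
query = (0.5, 0.5)
entry = next(iter(layers[0]))
found = greedy_search(points, layers, query)
```

The returned node is a local optimum: none of its bottom-layer neighbors is closer to the query. It is not guaranteed to be the true nearest point, which is where the "approximate" in ANN comes from.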

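The two build-time knobs that matter for HNSW are below. (A third, search-time parameter, hnsw_ef in Qdrant, controls how many candidates each query explores.)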

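| Parameter | Effect | Default | When to Increase |
| --- | --- | --- | --- |
| m | Edges per node | 16 | Higher recall on sparse data |
| ef_construct | Build-time search width | 100 | Better index quality, slower build |

```python
from qdrant_client import AsyncQdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

# Local Qdrant instance; the URL is an assumption, adjust for your deployment.
client = AsyncQdrantClient(url="http://localhost:6333")

# Run inside an async function (or a notebook that supports top-level await).
await client.create_collection(
    collection_name="my_docs",
    vectors_config={
        # Named vector "dense"; size must match the embedding model's output.
        "dense": VectorParams(
            size=3072,
            distance=Distance.COSINE,
            hnsw_config=HnswConfigDiff(
                m=16,              # Qdrant default
                ef_construct=100,  # default; try 200 for better recall, slower build
            ),
        )
    },
)
```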
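For a rough feel of what these settings cost in memory, here is a back-of-envelope sketch. It assumes 4-byte floats, 4-byte link ids, and up to 2·m links per node on the bottom layer (the convention from the HNSW paper). Upper layers and Qdrant's own overhead are ignored, so treat the numbers as a floor, not a sizing guide. The function name is made up for the example.

```python
def estimate_index_bytes(num_vectors, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Return (raw vector bytes, approximate bottom-layer link bytes)."""
    vector_bytes = num_vectors * dim * bytes_per_float
    # Assumption: each node stores up to 2*m links on the bottom layer.
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes, link_bytes

# 10M vectors at the dimensionality and m used in the collection above.
vector_bytes, link_bytes = estimate_index_bytes(10_000_000, 3072, m=16)
vector_gb = vector_bytes / 1e9   # about 123 GB of raw float32 vectors
link_gb = link_bytes / 1e9       # about 1.3 GB of graph links
```

At high dimensionality the vectors themselves dominate, so doubling m is cheap in memory terms compared with the data it indexes.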

Payload Filtering

Similarity alone is rarely enough: you usually also need to restrict results by metadata. Qdrant stores arbitrary JSON payloads alongside vectors and applies filters during the search itself, so no post-processing step is needed.

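```python
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

# query_vector: the embedding of the user's query, produced by the same model
# that embedded the stored documents.
results = await client.query_points(
    collection_name="my_docs",
    query=query_vector,
    using="dense",  # name of the vector defined at collection creation
    query_filter=Filter(
        must=[  # every condition must hold
            FieldCondition(key="source", match=MatchValue(value="technical_manual")),
            FieldCondition(key="year", range=Range(gte=2024)),
        ]
    ),
    limit=5,
)
```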
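To see why filtering during the search beats filtering afterwards, consider this toy comparison. It is a plain-Python illustration with made-up ids and payloads, not Qdrant's actual filtering implementation.

```python
def post_filter(ranked_ids, payloads, predicate, k):
    """Take the top-k by similarity first, then drop hits that fail the filter."""
    return [i for i in ranked_ids[:k] if predicate(payloads[i])]

def filter_during_search(ranked_ids, payloads, predicate, k):
    """Skip non-matching points while walking candidates, until k matches are found."""
    hits = []
    for i in ranked_ids:
        if predicate(payloads[i]):
            hits.append(i)
            if len(hits) == k:
                break
    return hits

# Hypothetical collection: point id -> payload.
payloads = {
    0: {"source": "blog"},
    1: {"source": "technical_manual"},
    2: {"source": "blog"},
    3: {"source": "technical_manual"},
    4: {"source": "technical_manual"},
}
ranked_ids = [0, 2, 1, 3, 4]  # already sorted by similarity, best first

def is_manual(payload):
    return payload["source"] == "technical_manual"

post = post_filter(ranked_ids, payloads, is_manual, k=2)             # []
during = filter_during_search(ranked_ids, payloads, is_manual, k=2)  # [1, 3]
```

Post-filtering returns nothing here because both of the two most similar points fail the filter, even though matching points exist further down the ranking.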

Important: Create payload indexes for the fields you filter on. Without one, Qdrant has to check the stored payload of each candidate it visits, which can erase much of the ANN speed advantage on large collections.

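```python
# Index every field used in filters; the schema must match the payload type.
for field_name, field_schema in [("source", "keyword"), ("year", "integer")]:
    await client.create_payload_index(
        collection_name="my_docs",
        field_name=field_name,
        field_schema=field_schema,
    )
```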
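Conceptually, a keyword index is a lookup table from field value to matching point ids. The sketch below shows the idea in plain Python; it is an illustration of the general technique, not a description of Qdrant's internal data structures, and the names are hypothetical.

```python
from collections import defaultdict

def build_keyword_index(payloads, field):
    """Map each value of `field` to the set of point ids that carry it."""
    index = defaultdict(set)
    for point_id, payload in payloads.items():
        if field in payload:
            index[payload[field]].add(point_id)
    return index

# Hypothetical collection: point id -> payload.
payloads = {
    0: {"source": "blog"},
    1: {"source": "technical_manual"},
    2: {"source": "blog"},
    3: {"source": "technical_manual"},
    4: {"source": "technical_manual"},
}

source_index = build_keyword_index(payloads, "source")
indexed = source_index["technical_manual"]  # one dictionary lookup

# Without an index, the same answer needs a pass over every payload.
scanned = {pid for pid, p in payloads.items() if p.get("source") == "technical_manual"}
```

Both approaches return the same ids; the difference is that the index pays the scanning cost once, at build time, instead of on every query.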

Choosing a Vector Database

| DB | Best For | Notes |
| --- | --- | --- |
| Qdrant | Production open-source | Rust core, excellent Python SDK, hybrid search built-in |
| Pinecone | Managed cloud, fast setup | Expensive at scale, less control |
| pgvector | Already on PostgreSQL | HNSW support since 0.5.0, but typically slower than dedicated DBs |
| Weaviate | GraphQL API, multi-modal | Heavier operationally |
| Chroma | Local dev and prototyping | Not production-tested at scale |

For most teams building RAG: Qdrant on Docker locally, Qdrant Cloud for production. The client API is identical; you change the URL and supply an API key.
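One way to keep that switch out of the code is to read the connection settings from the environment. This is a minimal sketch: the variable names QDRANT_URL and QDRANT_API_KEY and the helper function are assumptions for illustration, not something the client library requires.

```python
import os

def qdrant_connection_kwargs(env=os.environ):
    """Build client keyword arguments from environment-style settings."""
    kwargs = {"url": env.get("QDRANT_URL", "http://localhost:6333")}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:  # only a hosted cluster needs a key; local Docker does not
        kwargs["api_key"] = api_key
    return kwargs

# Local development: nothing set, falls back to the Docker default port.
local = qdrant_connection_kwargs({})

# Production: placeholder values standing in for a real cluster URL and key.
cloud = qdrant_connection_kwargs(
    {"QDRANT_URL": "https://example-cluster.example.com", "QDRANT_API_KEY": "secret"}
)

# Usage with the real client (requires the qdrant-client package):
# from qdrant_client import AsyncQdrantClient
# client = AsyncQdrantClient(**qdrant_connection_kwargs())
```

The rest of the application code stays the same in both environments.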

Production Checklist

A vector database is infrastructure. Get it right once, and it runs quietly for years. Get it wrong (missing payload indexes, a sync client in async code, no backups) and you will feel it in production.
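The "sync client in async code" failure is easy to demonstrate without Qdrant at all. In this sketch, `slow_sync_call` is a hypothetical stand-in for a blocking network call made by a synchronous client; a background ticker counts how often the event loop gets to run while that call is in flight.

```python
import asyncio
import time

async def ticker(counter):
    """Background task: increments the counter every 10 ms if the loop is free."""
    while True:
        await asyncio.sleep(0.01)
        counter[0] += 1

def slow_sync_call():
    time.sleep(0.1)  # stand-in for a blocking call from a synchronous client

async def blocking():
    slow_sync_call()  # runs on the event loop thread and freezes it

async def offloaded():
    await asyncio.to_thread(slow_sync_call)  # runs in a worker thread instead

async def ticks_during(work):
    counter = [0]
    task = asyncio.create_task(ticker(counter))
    await asyncio.sleep(0)  # give the ticker a chance to start
    before = counter[0]
    await work()
    ticks = counter[0] - before
    task.cancel()
    return ticks

async def main():
    return await ticks_during(blocking), await ticks_during(offloaded)

blocked_ticks, offloaded_ticks = asyncio.run(main())
```

`blocked_ticks` is zero: nothing else on the event loop ran while the synchronous call was sleeping. With a natively async client such as AsyncQdrantClient, as used in the snippets above, the loop stays free without the thread workaround.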