Hybrid search explained: vectors and BM25 beat both alone

Dmitrii Kuzmenkov

Software Engineer, IndexFox.ai

November 4, 2025 (updated February 10, 2026)

If a single sentence summarized the last three years of retrieval research, it would be this: vectors and BM25 are not competitors but complements, and you should run both.

Where each one wins, where each one fails

Query type	BM25	Vector
"iPhone 15 Pro Max 256GB"	✅ Exact	⚠️ Drifts
"phones with good cameras"	⚠️ Literal	✅ Semantic
"SKU-A8472X"	✅ Exact	❌ Useless
"why does my plant have yellow leaves"	⚠️ Partial	✅ Semantic
"refund policy"	✅ Exact	✅ Semantic

How ManticoreSearch fuses them

The query we send is, in spirit, this:

SELECT id, hybrid_score() AS s
  FROM idx_pages
 WHERE MATCH('phones with good cameras')
   AND knn(embedding, 50, query_vector)
 ORDER BY s DESC
 LIMIT 20
 OPTION fusion_method='rrf', fusion_weights=(text=0.7, dense=0.3);

Manticore returns documents that match the keyword expression, fall within the vector neighborhood, or both, with a fused score computed server-side. We then re-score the top candidates with a reranker.

The important detail: we let the engine fuse, not the application. Application-side fusion (reciprocal rank fusion, or RRF, and weighted sums) is what most early-2024 RAG tutorials taught. Manticore added native engine-side RRF fusion in v24.2. The algorithm is the same, but the engine has the document statistics, position, and proximity context to do the join properly, and the results are measurably better than doing it in our app code.
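For readers who have not seen RRF written out, here is a minimal application-side sketch of what the algorithm computes. It is the tutorial-style version, not what runs inside the engine. The function name, the sample document ids, and the way the weights are applied are assumptions made for illustration. The weights simply mirror the 0.7/0.3 split in the query above, and k=60 is the constant conventionally used for RRF.

```python
from collections import defaultdict

def rrf_fuse(rankings, weights=None, k=60):
    """Fuse ranked lists of doc ids with (optionally weighted) reciprocal rank fusion.

    rankings: dict mapping a retriever name to its doc ids, best first.
    weights:  dict mapping the same names to a multiplier (assumed scheme).
    k:        smoothing constant; 60 is the conventional default.
    """
    if weights is None:
        weights = {name: 1.0 for name in rankings}
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes weight / (k + rank) for every doc it returns.
            scores[doc_id] += weights[name] / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))

# Hypothetical result lists from the two retrievers.
rankings = {
    "text": ["p1", "p2", "p3"],   # BM25 order
    "dense": ["p3", "p1", "p4"],  # vector order
}
fused = rrf_fuse(rankings, weights={"text": 0.7, "dense": 0.3})
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Note that RRF only sees ranks, never raw scores. That makes it robust to the different score scales of BM25 and cosine similarity, and it also means an application-side version has nothing else to work with, which is the gap engine-side fusion can close.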

When to skip the vector pass

Not every query benefits from vectors. We classify queries up-front:

Looks like a question (question mark, starts with "how/why/what", 3+ tokens, language-specific particles) → vector + keyword.
Looks like a token lookup (1-2 tokens, mostly nouns, possibly a SKU) → keyword only. For these queries, vectors add latency and make results worse.

The classifier is a 30-line scoring function, not an LLM. The threshold is calibrated against a few thousand real queries from our customers. Spending compute on classification is a bad trade when the heuristic gets you to ~92% agreement with a model for free.
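The rules above can be sketched as a small scoring function. This is not our production classifier: the signal weights, the SKU regex, and the threshold are made-up values for illustration (the real threshold is calibrated on customer queries), and the language-specific particle check is left out.

```python
import re

# Hypothetical signals and weights; only the overall shape follows the rules above.
QUESTION_STARTERS = {"how", "why", "what"}
# A token with at least one letter and one digit, optionally hyphenated.
SKU_PATTERN = re.compile(r"^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$")

def question_score(query: str) -> float:
    """Return a score; higher means the query looks more like a question."""
    stripped = query.strip()
    tokens = stripped.lower().split()
    if not tokens:
        return 0.0
    score = 0.0
    if stripped.endswith("?"):
        score += 0.5
    if tokens[0] in QUESTION_STARTERS:
        score += 0.4
    if len(tokens) >= 3:
        score += 0.3
    else:
        score -= 0.3  # 1-2 tokens: likely a lookup
    if any(SKU_PATTERN.match(token) for token in stripped.split()):
        score -= 0.5  # SKU-like or model-number-like token
    return score

def needs_vector_pass(query: str, threshold: float = 0.25) -> bool:
    """True: run vector + keyword. False: keyword only."""
    return question_score(query) >= threshold

for q in ["why does my plant have yellow leaves", "SKU-A8472X", "refund policy"]:
    print(q, "->", "vector + keyword" if needs_vector_pass(q) else "keyword only")
```

A function like this costs microseconds per query and is easy to recalibrate: change a weight, rerun it against the labeled query set, and compare agreement.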

What about pure dense retrieval?

It's tempting. One model, one index, one knob. We tried it for six months. The complaint we kept hearing was "the widget can't find our product names." Embeddings are lossy for proper nouns. They smear "Roomba i7" into "robot vacuum cleaner" and ship back blog posts instead of the product page.

The fix is not a better embedding model. The fix is BM25 sitting next to it.

Want to play with this?

Our free Website to Embeddings tool lets you embed any URL and inspect the segments. The Website Analyzer shows the content as our crawler would see it. Both are useful sanity checks before you commit to a retrieval architecture.
