There’s a common assumption in AI agent systems: combining keyword search with embeddings should outperform either alone.
We tested that assumption under realistic noise:
- 30 needles (technical, vague, balanced)
- 1,000 hard-negative distractors (semantically similar documents)
- 3 random seeds (stability validation)
- OpenAI text-embedding-3-small
- Recall@5 as primary metric
This is not a toy corpus.
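Recall@5 counts a query as a hit when the correct needle lands anywhere in the top five ranked documents. A minimal sketch of the metric (function names are illustrative, not taken from the harness):

```python
def recall_at_k(ranked_ids, relevant_id, k=5):
    """1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(runs, k=5):
    """Average Recall@k over (ranked_ids, relevant_id) query pairs."""
    return sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)
```

The overall numbers below are this average taken across all 30 needles (and then across the 3 seeds).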
The Results
| Method | Overall Recall@5 |
|---|---|
| BM25-only | 31.1% |
| Semantic-only | 83.3% |
| Hybrid | 83.3% |
1. BM25 Collapses at Scale
With 1,000 semantically similar distractors, lexical matching drops to 31%.
When documents share vocabulary, keyword overlap stops being discriminative.
Measured, not assumed.
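The collapse is easy to reproduce at toy scale. Under Okapi BM25 (a standard formulation, not necessarily the exact variant in our harness), two equal-length documents that each contain the query terms once score identically, so a hard negative is indistinguishable from the needle:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Okapi BM25: score each tokenized document against the query tokens."""
    N = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / N
    # document frequency of each query term
    df = {t: sum(1 for d in corpus_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        s = 0.0
        for t in query_tokens:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(s)
    return scores
```

Score a needle, a vocabulary-sharing distractor, and an off-topic document against the same query, and the needle and the distractor come out tied; only the off-topic document is separated.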
2. Semantic Retrieval Recovers Abstract Queries
On vague questions like:
- “Any wins worth celebrating?”
- “What concerns should I know?”
- BM25: 25%
- Semantic: 83%
Embeddings capture abstraction and context beyond surface terms.
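Stripped of the embedding call (text-embedding-3-small in our setup), the semantic side is just nearest-neighbor ranking by cosine similarity. A sketch with toy vectors standing in for real embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def semantic_rank(query_vec, doc_vecs):
    """Return document indices ordered by cosine similarity to the query."""
    sims = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)
```

Because similarity lives in embedding space rather than token space, a vague query can still land near the right document even with zero word overlap.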
3. Hybrid Matches Semantic — But Doesn’t Beat It
At this scale, hybrid routing reaches the same 83.3% as semantic-only.
It preserves lexical fallback but does not increase recall when embeddings already dominate.
This is important: hybrid adds a safety net, not extra recall.
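One common way to combine lexical and semantic rankings is reciprocal rank fusion (RRF); the sketch below is illustrative, not necessarily the routing logic used in our harness:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of doc ids into one ranking.

    Each list contributes 1 / (k + rank + 1) per document; documents ranked
    highly by several retrievers accumulate the largest fused scores.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list can only reshuffle what the component retrievers surface, which is why hybrid tracks semantic-only when embeddings already rank the needle highly.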
Failure Cases
Across all 3 seeds, these queries consistently failed:
- “Which database version?”
- “Any concerns about code quality?”
- “How did we decide on architecture?”
These expose the limits of embedding-only ranking.
Next step: cross-encoder reranking.
Improvement Over Previous Configuration
Compared to our previous GAM configuration (66.7% Recall@5 at similar scale), the current system improves to 83.3% (+16.6 points).
A real improvement, not a breakthrough.
Key Takeaways
- Semantic does the heavy lifting.
- Hybrid preserves capability without regression.
- Scaling introduces new ranking challenges.
Full methodology in GAM Whitepaper v3.6.
Reproducible harness. Code shipping soon.