zlacker

[return to "Better RAG Results with Reciprocal Rank Fusion and Hybrid Search"]
1. thefou+G71[view] [source] 2024-05-30 21:48:14
>>johnjw+(OP)
I also found pure RAG with vector search to not work. I was creating a bot that could find answers to questions about things by looking at Slack discussions.

At first, I downloaded entire channels, loaded them into a vector DB, and did RAG. The results sucked. Vector searches don't understand things very well, and in this world, specific keywords and error messages are very searchable.

Instead, I take the user's query, ask an LLM (Claude / Bedrock) to find keywords, then search Slack using the API, get results, and use an LLM to filter for discussions that are relevant, then summarize them all in a response.

This is slow, of course, so it's very multi-threaded. A typical response will be within 30 seconds.
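A minimal sketch of that pipeline, with stubbed stand-ins for the LLM (Claude via Bedrock) and Slack search API calls; all helper names here are hypothetical, not the commenter's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_keywords(query: str) -> list[str]:
    # Stand-in for an LLM call that pulls out searchable keywords
    # and error strings from the user's question.
    return [w for w in query.split() if len(w) > 3]

def search_slack(keyword: str) -> list[str]:
    # Stand-in for a Slack search API call (e.g. search.messages).
    return [f"thread about {keyword}"]

def is_relevant(query: str, thread: str) -> bool:
    # Stand-in for an LLM relevance filter over each candidate thread.
    return any(k in thread for k in extract_keywords(query))

def answer(query: str) -> list[str]:
    keywords = extract_keywords(query)
    # Searches and relevance checks run in parallel threads, which is
    # how a slow multi-step pipeline can still respond in ~30 seconds.
    with ThreadPoolExecutor() as pool:
        hits = [t for batch in pool.map(search_slack, keywords) for t in batch]
        flags = list(pool.map(lambda t: is_relevant(query, t), hits))
    relevant = [t for t, ok in zip(hits, flags) if ok]
    return relevant  # the real bot would summarize these with one more LLM call
```

The final summarization step is omitted; it would be one more LLM call over the surviving threads.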

◧◩
2. s-mack+Ti2[view] [source] 2024-05-31 11:18:47
>>thefou+G71
I find these discussions funny.

For decades we had search engines based on the query terms (keywords). Then there were lots of discussions and some implementations to put a semantic search on top of it to improve the keyword search. A hybrid search. Google Search did exactly that already in 2015 [1].

Now we start from pure semantic search and put keyword search on top of it to improve the semantic search and call it hybrid search.

In both approaches, the overall search performance is exactly identical - to the last digit.

I am glad that, so far, no one has called this an innovation. But you could certainly write a lot of blog articles about it.

[1] https://searchengineland.com/semantic-search-entity-based-se...

◧◩◪
3. spence+hZ2[view] [source] 2024-05-31 15:14:44
>>s-mack+Ti2
Except now the semantic capabilities are so much stronger. The transformer allows the model to get meaning from words that are far apart from each other.
◧◩◪◨
4. s-mack+Q63[view] [source] 2024-05-31 15:54:26
>>spence+hZ2
You are talking about English, right? And only for searches without any special technical terms or abbreviations?

Also, my use case includes more than 20 languages. Finding usable embeddings for all of those languages is next to impossible. However, there are keyword plugins for most languages in Solr or ElasticSearch.

Btw. In my benchmarks the results look something like this in English (MAP = mean average precision):

BM25(keyword search) -> MAP=45%

Embedding (Ada-002) -> MAP=49%

Hybrid (BM25 + Embedding) -> MAP=57%

Hybrid (Embedding + BM25) -> MAP=57%

And that's before you use synonym dictionaries for keyword searches.
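For reference, a hybrid ranking like the ones benchmarked above is typically produced with something like reciprocal rank fusion (the technique from the article under discussion). A minimal sketch with toy document IDs, not the commenter's benchmark setup:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each ranking contributes 1/(k + rank)
    # to a document's score; documents are re-sorted by the summed score.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3"]    # keyword (BM25) ranking
embed = ["d3", "d1", "d4"]   # semantic (embedding) ranking
fused = rrf([bm25, embed])   # d1 leads: it ranks highly in both lists
```

The constant k=60 is the value commonly used in the RRF literature; it damps the influence of top ranks so neither retriever dominates.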

◧◩◪◨⬒
5. spence+pS3[view] [source] 2024-05-31 20:08:54
>>s-mack+Q63
I'm curious, in your benchmark, what's the difference between BM25+Embedding and Embedding+BM25? And what do you use to make the embedding?

If you make the embedding with an LLM, it should work for any language the LLM is trained on.

◧◩◪◨⬒⬓
6. s-mack+Wgc[view] [source] 2024-06-04 08:14:46
>>spence+pS3
BM25+Embedding and Embedding+BM25 are exactly the same; this shows the commutative relation, whether you start from keyword search or semantic search.

For my tests, I used Ada-002. As data, I used small news articles, with no chunking and no preprocessing. The query for the articles is embedded directly.

Of course, improvements can be made to both approaches. This should just illustrate what you might expect with hybrid search.
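The identical MAP in both orders is what you would expect from any score-sum fusion such as RRF, since addition is commutative. A toy check (hypothetical ranks, assuming an RRF-style scorer where documents missing from a list get a very deep rank):

```python
def fuse(rank_a: dict[str, int], rank_b: dict[str, int], k: int = 60) -> list[str]:
    # Sum the reciprocal-rank contributions from both retrievers;
    # a document absent from a list is treated as ranked very deep (1000).
    docs = set(rank_a) | set(rank_b)
    score = lambda d: 1 / (k + rank_a.get(d, 1000)) + 1 / (k + rank_b.get(d, 1000))
    return sorted(docs, key=score, reverse=True)

bm25 = {"d1": 1, "d2": 2, "d3": 3}   # keyword-first ranking
emb = {"d3": 1, "d1": 2, "d4": 3}    # embedding-first ranking

# Same fused list either way: the fusion is commutative.
assert fuse(bm25, emb) == fuse(emb, bm25)
```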
