This RAG system consists of the following components:

- Query Reformulation
  - Optionally, an LLM refines the user's prompt to improve retrieval quality without altering its intent.
- ChromaDB Vector Search
  - Retrieves semantically similar documents from a Chroma collection.
- BM25 Lexical Search
  - Retrieves keyword-based matches using BM25 scoring.
- Hybrid Results Fusion
  - Combines the Chroma and BM25 results using Reciprocal Rank Fusion (RRF).
- Re-ranking
  - A pre-trained CrossEncoder model re-ranks the fused results.
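To illustrate the lexical-search step, here is a minimal pure-Python BM25 scorer. This is a sketch of the scoring formula only; the actual system presumably uses a BM25 library over tokenized documents, and the tokenization and parameter values (`k1`, `b`) here are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25.

    docs is a list of token lists; returns one score per document.
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the corpus
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    ["paris", "is", "the", "capital", "of", "france"],
    ["berlin", "is", "in", "germany"],
]
print(bm25_scores(["capital", "france"], docs))
```

Documents containing none of the query terms score zero, so BM25 complements the vector search, which can still surface semantically related documents that share no keywords with the query.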
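The fusion step can be sketched in a few lines. In RRF, each document's fused score is the sum of `1 / (k + rank)` over every ranked list it appears in, so documents ranked highly by both retrievers rise to the top. The function name and the example document IDs below are illustrative; `k=60` is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    A document's fused score is sum(1 / (k + rank)) over every
    input list it appears in (ranks start at 1).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

chroma_hits = ["doc_a", "doc_b", "doc_c"]  # ranked by vector similarity
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # ranked by BM25 score
print(reciprocal_rank_fusion([chroma_hits, bm25_hits]))
# doc_b ranks first: it places near the top of both lists
```

Because RRF uses only ranks, it needs no score normalization between the two retrievers, whose raw scores (cosine distances vs. BM25 weights) are not directly comparable.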
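The re-ranking step reduces to scoring each (query, document) pair and keeping the top results. The sketch below is an assumption about the shape of that step: `score_fn` stands in for a CrossEncoder's prediction on (query, document) pairs, and is replaced here by a trivial word-overlap scorer so the example is self-contained.

```python
def rerank(query, docs, score_fn, top_k=3):
    """Re-rank candidate documents by a (query, doc) relevance scorer.

    score_fn plays the role of a CrossEncoder: it takes the query and
    one candidate document and returns a relevance score (higher = better).
    """
    scored = [(score_fn(query, doc), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def overlap_score(query, doc):
    # Toy stand-in for a cross-encoder: count shared lowercase words
    return len(set(query.lower().split()) & set(doc.lower().split()))

candidates = [
    "berlin weather",
    "paris is the capital of france",
    "french cuisine",
]
print(rerank("capital of france", candidates, overlap_score, top_k=2))
```

Unlike the bi-encoder embeddings used for retrieval, a cross-encoder reads the query and document together, which is more accurate but too slow to run over the whole corpus; that is why it is applied only to the small fused candidate set.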
Finally, the user's preferences and question are combined into a prompt for response generation. A Phi 3.5 model, served via Ollama, generates the response.
```python
rag = RAGSystem()
response = rag.query("What is the capital of France?")
print(response)
```
First, set a local model as the evaluation model:

```shell
deepeval set-local-model --model-name=gemma2:9b --base-url="http://localhost:11434/v1/"
```
Then run the evaluation script:

```shell
python .\evaluate.py
```