This project implements and evaluates a Question-Answering (QA) system powered by Retrieval-Augmented Generation (RAG). It was developed by L. Arduini, D. N. Ghaneh, L. Menchini, and C. Petruzzella. The goal is to combine advanced document retrieval methods with large language models (LLMs) to generate accurate, context-aware answers to user queries. The system uses techniques such as Rank Fusion and Cascading Retrieval to optimize document retrieval and ground the generated answers.
- Document Retrieval:
  - Sparse Retrieval: BM25 for term-based matching.
  - Dense Retrieval: embedding-based semantic similarity.
  - Rank Fusion: combining sparse and dense retrieval scores (a sketch follows this list).
  - Cascading Retrieval: reranking first-stage candidates with a cross-encoder (second sketch below).
- Question-Answering:
  - Context-aware response generation using a state-of-the-art LLM (Llama 3.2-1B).
- Evaluation Framework:
  - Quantitative metrics: NDCG, Recall.
  - Human-centric evaluation: responses scored from 0 (irrelevant) to 2 (highly relevant), with a written motivation for each score.
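The fusion step can be illustrated with a minimal sketch. The exact fusion formula is not pinned down above, so this example assumes reciprocal rank fusion (RRF); the toy corpus, the query, and the `all-MiniLM-L6-v2` encoder are placeholders, not the project's actual data or model.

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

# Toy stand-ins for the TREC-COVID corpus and a user query.
corpus = [
    "covid transmission in indoor hospital settings",
    "vaccine efficacy in randomized trials",
    "mask mandates and public policy",
]
query = "how does covid spread indoors"

# Sparse retrieval: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
sparse_scores = bm25.get_scores(query.lower().split())

# Dense retrieval: cosine similarity between sentence embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder
doc_emb = encoder.encode(corpus, convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)
dense_scores = util.cos_sim(query_emb, doc_emb)[0].cpu().numpy()

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranker adds 1 / (k + rank) per document."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Best-first document indices from each retriever, then fuse.
sparse_ranking = np.argsort(sparse_scores)[::-1].tolist()
dense_ranking = np.argsort(dense_scores)[::-1].tolist()
fused_ids = rrf([sparse_ranking, dense_ranking])
print([corpus[i] for i in fused_ids])
```

RRF is rank-based, which sidesteps the need to normalize BM25 and cosine scores onto a common scale before combining them.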
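Cascading retrieval then re-scores a small first-stage candidate set with a more expensive cross-encoder. A minimal continuation of the sketch above (reusing its `query`, `corpus`, and `fused_ids`); the MS MARCO reranker named here is an assumption:

```python
from sentence_transformers import CrossEncoder

# Placeholder reranker; the project's actual cross-encoder may differ.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Only the cheap first stage sees the whole corpus; the expensive
# cross-encoder scores a short candidate list.
candidates = fused_ids[:10]
pairs = [(query, corpus[i]) for i in candidates]
rerank_scores = reranker.predict(pairs)

# Final ranking: candidates ordered by cross-encoder score.
reranked = [i for _, i in sorted(zip(rerank_scores, candidates), reverse=True)]
print([corpus[i] for i in reranked])
```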
- TREC-COVID: a real-world dataset that simulates the challenges of medical information retrieval (a loading sketch follows).
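One way to load the collection is through `ir_datasets`, which exposes the BEIR distribution of TREC-COVID; the repository may fetch it differently (e.g., with the BEIR toolkit itself):

```python
import ir_datasets

dataset = ir_datasets.load("beir/trec-covid")

# Corpus: CORD-19 documents with doc_id, title, and text fields.
docs = {doc.doc_id: f"{doc.title} {doc.text}" for doc in dataset.docs_iter()}

# 50 topics plus graded relevance judgments (0, 1, 2) for evaluation.
queries = {q.query_id: q.text for q in dataset.queries_iter()}
qrels = {}
for qrel in dataset.qrels_iter():
    qrels.setdefault(qrel.query_id, {})[qrel.doc_id] = qrel.relevance
```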
- Load and preprocess the TREC-COVID dataset.
- Configure the retrieval strategies (BM25, dense retrieval, rank fusion, cascading).
- Generate responses with the Llama 3.2-1B model, using the retrieved documents as context.
- Evaluate responses with quantitative metrics (NDCG, Recall) and human annotations (sketches for the last two steps follow this list).
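Generation can be sketched with the Hugging Face `pipeline` API. The prompt template, decoding settings, and the choice of the instruct checkpoint are illustrative assumptions rather than the project's exact configuration, and the `meta-llama` checkpoints are gated (you must accept the license on the Hub):

```python
from transformers import pipeline

# Assumption: the instruct variant; the base meta-llama/Llama-3.2-1B
# checkpoint would need a different prompting strategy.
generator = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

def answer(query, retrieved_docs):
    """Build a RAG prompt from the retrieved documents and generate an answer."""
    context = "\n\n".join(retrieved_docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    out = generator(prompt, max_new_tokens=200, do_sample=False,
                    return_full_text=False)
    return out[0]["generated_text"]
```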
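Retrieval metrics can be computed with `pytrec_eval` against the TREC-COVID qrels. A minimal sketch with toy data; the specific cut-offs (NDCG@10, Recall@100) are assumptions:

```python
import pytrec_eval

# Toy graded judgments: 0 = irrelevant, 1 = partially relevant, 2 = relevant.
qrels = {"q1": {"d1": 2, "d2": 0, "d3": 1}}

# Retrieval scores the system assigned to each document for each query.
run = {"q1": {"d1": 12.4, "d2": 8.1, "d3": 3.7}}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"ndcg_cut.10", "recall.100"})
results = evaluator.evaluate(run)
print(results["q1"]["ndcg_cut_10"], results["q1"]["recall_100"])
```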
- Python Libraries: Sentence Transformers, Rank-BM25, pytrec_eval, Hugging Face Transformers.
- Model: Llama 3.2-1B for language generation.
- Dataset: TREC-COVID.
- Document Retrieval: Evaluate sparse, dense, and hybrid methods.
- QA Generation: Generate high-quality, context-aware responses.
- Evaluation: Score and annotate responses with human judgments.
We welcome contributions! Follow these steps:
- Fork this repository.
- Create a feature branch:

  ```bash
  git checkout -b feature-name
  ```
- Commit changes and open a Pull Request.
This project is licensed under the MIT License.
Feel free to reach out with any questions or feedback!