Figure: VERGE dataset generation process
This repository contains the implementation of VERGE, a verification-enhanced methodology for generating multi-hop datasets to evaluate Retrieval-Augmented Generation (RAG) systems. VERGE addresses methodological gaps in existing RAG evaluation frameworks by generating task-specific, multi-hop reasoning datasets.
- VERGE: Implements a novel verification agent that ensures generated questions necessitate genuine multi-hop reasoning and maintain factual consistency
- Hierarchical Error Taxonomy: Provides structured analysis of RAG system failure patterns specifically in multi-hop reasoning contexts
- Chunker/: Scripts for chunking documents
- Data/: Scripts for downloading the datasets
- ExamProcesser: Scripts for processing generated exams
- Solver: Scripts for solving the generated exams
- categorise_errors.py: Script for categorising error types
- generate_exam: Script for generating an exam
- prompt_templates.py: Prompt templates for question generation, verification, and evaluation
- retriever.py: Retriever class
pip install -r requirements.txt
python src/Data/long_bench_downloader.py
python src/Data/download_documents_sec_filings.py
python src/Chunker/document_chunker.py
python src/generate_exam.py
python src/Solver/solve_exam_rag.py
python src/categorise_errors.py
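The commands above can also be chained from a single driver script. The following is a minimal sketch, not part of the repository: it assumes the scripts are run from the repository root, that dependencies are already installed from requirements.txt, and that each script runs without required command-line arguments. The names `PIPELINE_SCRIPTS`, `build_commands`, and `run_pipeline` are hypothetical helpers introduced for illustration.

```python
import subprocess
import sys
from typing import List

# Script paths copied from the commands listed above, in the same order.
PIPELINE_SCRIPTS: List[str] = [
    "src/Data/long_bench_downloader.py",
    "src/Data/download_documents_sec_filings.py",
    "src/Chunker/document_chunker.py",
    "src/generate_exam.py",
    "src/Solver/solve_exam_rag.py",
    "src/categorise_errors.py",
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one argv list per pipeline step, in execution order."""
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(run=subprocess.run) -> None:
    """Run each step in order and stop at the first failing step."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete outputs.
        run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline()
```

Stopping at the first failure is a deliberate choice here, since each stage presumably consumes the output of the one before it (downloaded documents, then chunks, then the generated exam, then solver results).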