This repository contains prototype code for Efficient Document Retrieval with Vision Language Models using ColPali. ColPali is a model designed for efficient document retrieval using visual embeddings. It bypasses traditional OCR pipelines, which is intended to improve retrieval accuracy and reduce latency.
- Vision-Based Retrieval: Leverages visual embeddings for document retrieval.
- ColPali Integration: Implements the ColPali architecture for efficient multi-vector embeddings.
- End-to-End Pipeline: Demonstrates the process from document ingestion to retrieval.
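The multi-vector embeddings mentioned above are typically compared with a late-interaction ("MaxSim") score: each query vector is matched to its most similar page vector, and the maxima are summed. The sketch below illustrates that scoring in plain Python. The names `maxsim_score` and `rank_pages` and the toy 2-dimensional vectors are illustrative assumptions, not code from this repository, and real embeddings are much higher-dimensional.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score (hypothetical helper).

    For each query vector, take the highest dot product against any page
    vector, then sum those maxima over all query vectors.
    """
    if not query_vecs or not page_vecs:
        raise ValueError("both query and page need at least one vector")
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector], pages: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Return (page name, score) pairs sorted from best to worst match."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: two query vectors and two pages with a few patch vectors each.
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        "page_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        "page_b": [[0.5, 0.5], [0.2, 0.1]],
    }
    for name, score in rank_pages(query, pages):
        print(name, score)
```

Because every query vector picks its own best match, a page scores well when it covers all parts of the query, even if those parts appear in different regions of the page.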
- Python 3.9 or higher
- Docker (for the PGVector container)
- OpenAI API key for retrieval
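A quick way to confirm these prerequisites before running anything is a small check script. The following is a minimal sketch using only the Python standard library. The function name `check_prerequisites` is hypothetical and not part of this repository, and the script only verifies that a `docker` executable is on the PATH, not that the Docker daemon is running.

```python
import os
import shutil
import sys
from typing import Callable, Dict, Mapping, Optional, Tuple


def check_prerequisites(
    version: Tuple[int, int] = sys.version_info[:2],
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Report which prerequisites are satisfied (hypothetical helper).

    The parameters default to the real interpreter version, environment,
    and PATH lookup; they can be overridden to test the logic in isolation.
    """
    return {
        "Python 3.9 or higher": tuple(version) >= (3, 9),
        "docker on PATH": which("docker") is not None,
        "OPENAI_API_KEY set": bool(env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"[{'ok' if ok else 'missing'}] {name}")
```

The API key is only needed for the retrieval step, so a missing key does not block starting the database or ingesting documents.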
Ensure Docker is installed, then run:
cd pgvector
./start
Install the Python dependencies:
pip install -r requirements.txt
Ingest documents into the database:
python ingestion.py
Set up your OpenAI API key:
export OPENAI_API_KEY=sk-proj-xxxxxxx
Retrieve documents based on a query:
python retrieval.py