This repository contains a RAG-based LLM application that lets users query large language models (LLMs) over their own documents through an intuitive interface. The frontend is built with Streamlit and connects to the application layer's backend server. CI/CD automates the build and deployment process so the application is always up to date, and application orchestration is handled by Kubernetes.
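A minimal sketch of how such a Streamlit frontend could forward a query and an uploaded document to the backend; the backend URL, port, `/query` endpoint, and model names are illustrative assumptions, not the actual API of this repository:

```python
import requests
import streamlit as st

BACKEND_URL = "http://localhost:8000/query"  # assumed backend endpoint, adjust as needed

st.title("RAG LLM Application")

uploaded_file = st.file_uploader("Upload a document", type=["pdf", "txt"])
query = st.text_input("Ask a question about the document")
model = st.selectbox("LLM", ["llama3", "mistral"])  # example model names

if st.button("Submit") and uploaded_file and query:
    # Send the document and query to the backend and display the answer
    response = requests.post(
        BACKEND_URL,
        files={"file": uploaded_file},
        data={"query": query, "model": model},
        timeout=120,
    )
    st.write(response.json().get("answer", "No answer returned"))
```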
- User Interface: A simple and intuitive interface for submitting queries along with documents and receiving responses from LLMs.
- LLM Selection: Users can select from the different LLMs available on the Ollama and vLLM backend servers.
- Inference Speed: The frontend and backend services run with Docker Compose and can use GPU support, when available, for faster inference.
- Vector DB: Chroma and FAISS are used to create the vector store that holds vector embeddings and their corresponding document chunks (see the sketch after this list).
- CI/CD: Continuous Integration and Continuous Deployment (CI/CD) is implemented using GitHub Actions to automate the build and deployment process.
- Orchestration: Application orchestration is implemented using Kubernetes running on a multi-node compute environment.
- Argo CD: Leveraging Argo CD's GitOps capabilities for seamless, automated, and scalable deployments.
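As referenced in the Vector DB item above, here is a minimal sketch of building a Chroma collection from document chunks and retrieving the most relevant ones for a query; the collection name, example chunks, and default embedding settings are assumptions for illustration rather than this repository's exact pipeline:

```python
import chromadb

# In-memory Chroma client; the application may use a persistent store instead
client = chromadb.Client()
collection = client.create_collection(name="documents")

# Example chunks extracted from an uploaded document
chunks = [
    "RAG combines retrieval with generation.",
    "Chroma stores embeddings alongside their source chunks.",
]
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],
)

# Retrieve the chunks most similar to the user query
results = collection.query(query_texts=["What does Chroma store?"], n_results=2)
print(results["documents"])
```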
Application Architecture (architecture diagram)
- Add support for image-based queries
- Add support for PDF-based queries
- Set up CI/CD pipeline
- Add Kubernetes manifests for deployment
- Add support for vLLM
- Docker
- Docker Compose
- GPU support for faster inference, if available.
- Clone the repository:
git clone git@github.com:gaurav00700/RAG-LLM-Application.git
- Navigate to the project directory:
cd RAG-LLM-Application
- Deployment and Orchestration:
kubectl apply -f k8s
- (Optional) Run the docker-compose command to start the application:
docker-compose up -d
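Once the stack is up, a quick smoke test against the Ollama backend can confirm that an LLM is reachable. This sketch assumes Ollama's default port 11434 and that a model such as `llama3` has already been pulled; adjust both to match your setup:

```python
import requests

# Ollama exposes a REST API on port 11434 by default
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3",  # assumes this model has already been pulled
    "prompt": "Reply with OK if you can read this.",
    "stream": False,    # return a single JSON response instead of a stream
}

response = requests.post(OLLAMA_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["response"])
```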