This repository contains a Jupyter notebook that demonstrates how to build a retrieval-based question-answering system using LangChain and Hugging Face. The notebook guides you through the process of setting up the environment, loading and processing documents, generating embeddings, and querying the system to retrieve relevant documents.
To set up the environment and install the necessary dependencies, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/prgrmcode/retrieval-based-qa-llm.git
  cd retrieval-based-qa-llm
  ```
- Create a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Install the required packages (these `%pip` commands are run inside the notebook):

  ```python
  %pip install -Uqqq rich tiktoken wandb langchain unstructured tabulate pdf2image chromadb
  %pip install --upgrade transformers
  %pip install -U "huggingface_hub[cli]"
  %pip install -U langchain-community
  %pip install -U langchain_huggingface
  %pip install sentence-transformers
  ```
- Log in to Hugging Face:

  ```python
  !huggingface-cli login
  ```
- Configure Hugging Face API token:

  ```python
  import os
  from getpass import getpass

  if os.getenv("HUGGINGFACE_API_TOKEN") is None:
      os.environ["HUGGINGFACE_API_TOKEN"] = getpass("Paste your Hugging Face API token from: https://huggingface.co/settings/tokens\n")

  assert os.getenv("HUGGINGFACE_API_TOKEN", "").startswith("hf_"), "This doesn't look like a valid Hugging Face API token"
  print("Hugging Face API token is configured")
  ```
- Configure W&B tracing:

  ```python
  os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
  os.environ["WANDB_PROJECT"] = "llmapps"
  ```
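  Tracing logs LangChain runs to the `llmapps` project in Weights & Biases, which assumes an authenticated W&B session. If you are not logged in yet, the following step (an assumption, not shown in the notebook) takes care of that:

  ```python
  # Assumption: W&B tracing needs an authenticated session to record runs.
  import wandb

  wandb.login()  # prompts for an API key from https://wandb.ai/authorize
  ```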
- Load documents from the specified directory:

  ```python
  import time

  from langchain_community.document_loaders import DirectoryLoader, TextLoader

  def find_md_files(directory):
      start_time = time.time()
      loader = DirectoryLoader(directory, glob="**/*.md", loader_cls=TextLoader, show_progress=True)
      documents = loader.load()
      end_time = time.time()
      print(f"Time taken to load documents: {end_time - start_time:.2f} seconds")
      return documents

  documents = find_md_files(directory="docs_sample/")
  print(f"Number of documents loaded: {len(documents)}")
  ```
- Count tokens in each document:

  ```python
  def count_tokens(documents):
      # Relies on `tokenizer` being defined; see the "Initialize the tokenizer and model" step below.
      token_counts = [len(tokenizer.encode(document.page_content)) for document in documents]
      return token_counts

  token_counts = count_tokens(documents)
  print(f"Token counts: {token_counts}")
  ```
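  If you want to count tokens before the Llama tokenizer has been loaded, a `tiktoken` encoding (already in the install list) can approximate the counts. This is an optional sketch, not part of the original notebook:

  ```python
  import tiktoken

  # Optional alternative: approximate token counts with a tiktoken encoding
  # instead of the Hugging Face tokenizer loaded later in the notebook.
  encoding = tiktoken.get_encoding("cl100k_base")

  def count_tokens_approx(documents):
      return [len(encoding.encode(document.page_content)) for document in documents]

  print(f"Approximate token counts: {count_tokens_approx(documents)}")
  ```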
- Split documents into sections:

  ```python
  from langchain.text_splitter import MarkdownTextSplitter

  md_text_splitter = MarkdownTextSplitter(chunk_size=1000)
  document_sections = md_text_splitter.split_documents(documents)
  print(f"Number of document sections: {len(document_sections)}")
  print(f"Max tokens in a section: {max(count_tokens(document_sections))}")
  ```
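  To check what the splitter produced, you can peek at one section; the source file path is preserved in its metadata (an illustrative snippet, not in the original notebook):

  ```python
  # Inspect the first section: the content is chunked, but the metadata still points at the source file.
  section = document_sections[0]
  print(section.metadata["source"])
  print(section.page_content[:200])
  ```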
- Initialize the tokenizer and model:

  ```python
  import torch
  import transformers

  model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
  tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
  model = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
  ```
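  The Llama 3 weights are gated on Hugging Face, so the account you logged in with must have been granted access to the model. As a quick sanity check (not in the original notebook), you can generate a few tokens directly; the prompt below is just a placeholder:

  ```python
  # Optional sanity check: confirm the model and tokenizer loaded correctly.
  inputs = tokenizer("Weights & Biases is", return_tensors="pt").to(model.device)
  output_ids = model.generate(**inputs, max_new_tokens=20)
  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
  ```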
- Generate embeddings using HuggingFaceEmbeddings:

  ```python
  from langchain.vectorstores import Chroma
  from langchain_huggingface import HuggingFaceEmbeddings

  model_name = "sentence-transformers/all-mpnet-base-v2"
  model_kwargs = {"device": "cuda"}
  encode_kwargs = {"normalize_embeddings": False}
  embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs)

  db = Chroma.from_documents(document_sections, embeddings)
  ```
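  By default the Chroma index lives in memory. If you want to reuse it across sessions, Chroma can persist to disk; this optional sketch uses the imports above, and the `chroma_db/` directory name is only an illustrative assumption:

  ```python
  # Optional: persist the vector store so it can be reloaded without re-embedding
  # ("chroma_db/" is an illustrative directory name, not from the notebook).
  db = Chroma.from_documents(document_sections, embeddings, persist_directory="chroma_db/")

  # In a later session, reload it with the same embedding function.
  db = Chroma(persist_directory="chroma_db/", embedding_function=embeddings)
  ```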
- Create a retriever from the database:

  ```python
  retriever = db.as_retriever(search_kwargs=dict(k=3))
  ```
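  The default retriever performs plain similarity search over the top `k` chunks. As a side note (not from the notebook), the same call also supports maximal-marginal-relevance search to diversify the retrieved sections:

  ```python
  # Optional variant: MMR retrieval trades a little relevance for more diverse chunks.
  mmr_retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 3})
  ```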
- Run a query to retrieve relevant documents:

  ```python
  query = "How can I share my W&B report with my team members in a public W&B project?"
  docs = retriever.invoke(query)
  for doc in docs:
      print(doc.metadata["source"])
  ```
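  To see how close each match is, you can also query the vector store directly for scored results (a small sketch, not part of the notebook):

  ```python
  # Optional: inspect similarity scores alongside the retrieved sections.
  for doc, score in db.similarity_search_with_score(query, k=3):
      print(f"{score:.4f}  {doc.metadata['source']}")
  ```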
- Create a RetrievalQA chain:

  ```python
  from langchain.chains import RetrievalQA
  from langchain_huggingface import HuggingFacePipeline
  from transformers import pipeline
  from tqdm import tqdm

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=70)
  llm = HuggingFacePipeline(pipeline=pipe)
  qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
  ```
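  If you also want the chain to report which sections an answer was grounded in, it can return its source documents; an optional variation, not shown in the original notebook:

  ```python
  # Optional: build the chain so it also returns the retrieved source documents.
  qa_with_sources = RetrievalQA.from_chain_type(
      llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True
  )
  result = qa_with_sources.invoke({"query": query})
  print(result["result"])
  for doc in result["source_documents"]:
      print(doc.metadata["source"])
  ```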
- Run the query using the RetrievalQA chain:

  ```python
  from IPython.display import Markdown, display

  with tqdm(total=1, desc="Running RetrievalQA") as pbar:
      result = qa.run(query)
      pbar.update(1)

  display(Markdown(result))
  ```
The `examples.txt` file contains example inputs and outputs for various tasks. These examples can help you understand the expected behavior of the models and scripts.
Contributions are welcome! If you have any ideas, suggestions, or improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.