diff --git a/05-agentic-rag/code_samples/05-autogen.ipynb b/05-agentic-rag/code_samples/05-autogen.ipynb new file mode 100644 index 00000000..3dc293db --- /dev/null +++ b/05-agentic-rag/code_samples/05-autogen.ipynb @@ -0,0 +1,458 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Agentic RAG with Autogen\n", + "\n", + "This notebook demonstrates implementing Retrieval-Augmented Generation (RAG) using Autogen agents with enhanced evaluation capabilities." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure to run this cell before running the rest of the notebook\n", + "!pip install chromadb" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import List, Dict\n", + "import time\n", + "import os\n", + "from autogen_agentchat.agents import AssistantAgent\n", + "from autogen_core.models import UserMessage\n", + "from autogen_core import CancellationToken\n", + "from autogen_agentchat.messages import TextMessage\n", + "from azure.core.credentials import AzureKeyCredential\n", + "from autogen_ext.models.azure import AzureAIChatCompletionClient\n", + "import chromadb\n", + "import asyncio" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create the Client \n", + "\n", + "First, we initialize the Azure AI Chat Completion Client. This client will be used to interact with the Azure OpenAI service to generate responses to user queries." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "finish_reason='stop' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=14, completion_tokens=7) cached=False logprobs=None thought=None\n" + ] + } + ], + "source": [ + "client = AzureAIChatCompletionClient(\n", + " model=\"gpt-4o-mini\",\n", + " endpoint=\"https://models.inference.ai.azure.com\",\n", + " credential=AzureKeyCredential(os.environ[\"GITHUB_TOKEN\"]),\n", + " model_info={\n", + " \"json_output\": True,\n", + " \"function_calling\": True,\n", + " \"vision\": True,\n", + " \"family\": \"unknown\",\n", + " },\n", + ")\n", + "result = await client.create([UserMessage(content=\"What is the capital of France?\", source=\"user\")])\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Assistant Agent\n", + "\n", + "Next, we create an instance of the `AssistantAgent`. This agent will use the Azure AI Chat Completion Client to generate responses to user queries." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "assistant = AssistantAgent(\n", + " name=\"assistant\",\n", + " model_client=client,\n", + " system_message=\"You are a helpful assistant.\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Vector Database Initialization\n", + "\n", + "We initialize ChromaDB with persistent storage and add enhanced sample documents. ChromaDB will be used to store and retrieve documents that provide context for generating accurate responses." 
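+    ,
+    "\n",
+    "Once the cell below has created the collection and added the documents, semantically similar passages can be pulled back with ChromaDB's `collection.query`. The snippet below is a minimal illustrative sketch of that retrieval step (the query text and `n_results` value are arbitrary examples, not part of the executed cells):\n",
+    "\n",
+    "```python\n",
+    "# Illustrative only: fetch the 3 stored documents most similar to a sample query\n",
+    "results = collection.query(\n",
+    "    query_texts=[\"How does RAG reduce hallucinations?\"],  # arbitrary example query\n",
+    "    n_results=3\n",
+    ")\n",
+    "print(results[\"documents\"][0])  # top-matching document texts for the first query\n",
+    "```"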
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Add of existing embedding ID: doc_0\n", + "Add of existing embedding ID: doc_1\n", + "Add of existing embedding ID: doc_2\n", + "Add of existing embedding ID: doc_3\n", + "Add of existing embedding ID: doc_4\n", + "Add of existing embedding ID: doc_0\n", + "Add of existing embedding ID: doc_1\n", + "Add of existing embedding ID: doc_2\n", + "Add of existing embedding ID: doc_3\n", + "Add of existing embedding ID: doc_4\n", + "Add of existing embedding ID: doc_0\n", + "Add of existing embedding ID: doc_1\n", + "Add of existing embedding ID: doc_2\n", + "Add of existing embedding ID: doc_3\n", + "Add of existing embedding ID: doc_4\n", + "Add of existing embedding ID: doc_0\n", + "Add of existing embedding ID: doc_1\n", + "Add of existing embedding ID: doc_2\n", + "Add of existing embedding ID: doc_3\n", + "Add of existing embedding ID: doc_4\n", + "Add of existing embedding ID: doc_0\n", + "Add of existing embedding ID: doc_1\n", + "Add of existing embedding ID: doc_2\n", + "Add of existing embedding ID: doc_3\n", + "Add of existing embedding ID: doc_4\n", + "Add of existing embedding ID: doc_0\n", + "Add of existing embedding ID: doc_1\n", + "Add of existing embedding ID: doc_2\n", + "Add of existing embedding ID: doc_3\n", + "Add of existing embedding ID: doc_4\n", + "Add of existing embedding ID: doc_0\n", + "Add of existing embedding ID: doc_1\n", + "Add of existing embedding ID: doc_2\n", + "Add of existing embedding ID: doc_3\n", + "Add of existing embedding ID: doc_4\n", + "Insert of existing embedding ID: doc_0\n", + "Insert of existing embedding ID: doc_1\n", + "Insert of existing embedding ID: doc_2\n", + "Insert of existing embedding ID: doc_3\n", + "Insert of existing embedding ID: doc_4\n" + ] + } + ], + "source": [ + "# Initialize ChromaDB with persistent storage\n", + "chroma_client = chromadb.PersistentClient(path=\"./chroma_db\")\n", + "collection = chroma_client.create_collection(\n", + " name=\"documents\",\n", + " metadata={\"description\": \"RAG documentation\"},\n", + " get_or_create=True\n", + ")\n", + "\n", + "# Enhanced sample documents\n", + "documents = [\n", + " \"RAG combines retrieval with generative AI for accurate responses.\",\n", + " \"Key features of RAG include document indexing and contextual generation.\",\n", + " \"RAG helps reduce hallucinations by grounding responses in source documents.\",\n", + " \"RAG systems use vector embeddings to find relevant context.\",\n", + " \"The retrieval component ensures factual accuracy in responses.\"\n", + "]\n", + "\n", + "# Add documents with metadata\n", + "collection.add(\n", + " documents=documents,\n", + " ids=[f\"doc_{i}\" for i in range(len(documents))],\n", + " metadatas=[{\"source\": \"training\", \"type\": \"explanation\"} for _ in documents]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Agent Configuration\n", + "\n", + "We configure the retrieval and assistant agents. The retrieval agent is specialized in finding relevant information using semantic search, while the assistant generates detailed responses based on the retrieved information." 
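+    ,
+    "\n",
+    "Note that the later cells only call the `assistant`; the `retrieval_agent` defined below is not exercised anywhere in this notebook. As a hedged sketch of how it could be used, the same `on_messages` API shown in `ask_rag` further down would let it rank the stored documents for a query (illustrative only; the prompt wording is an assumption):\n",
+    "\n",
+    "```python\n",
+    "# Illustrative only: ask the retrieval agent to rank the sample documents for a query\n",
+    "question = \"How does RAG reduce hallucinations?\"  # arbitrary example query\n",
+    "prompt = \"Rank these documents by relevance to: \" + question + \"\\n\" + \"\\n\".join(documents)\n",
+    "ranking = await retrieval_agent.on_messages(\n",
+    "    [TextMessage(content=prompt, source=\"user\")],\n",
+    "    cancellation_token=CancellationToken(),\n",
+    ")\n",
+    "print(ranking.chat_message.content)\n",
+    "```"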
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# Create agents with enhanced capabilities\n", + "retrieval_agent = AssistantAgent(\n", + " name=\"retrieval_agent\",\n", + " model_client=client,\n", + " system_message=\"\"\"I am a retrieval agent specialized in finding relevant information.\n", + " I use semantic search to find the most pertinent context for queries.\"\"\",\n", + ")\n", + "\n", + "assistant = AssistantAgent(\n", + " name=\"assistant\",\n", + " system_message=\"\"\"I am an AI assistant that generates detailed responses based on retrieved information.\n", + " I cite sources and explain my reasoning process.\"\"\",\n", + " model_client=client,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## RAGEvaluator Class\n", + "\n", + "We define the `RAGEvaluator` class to evaluate the response based on various metrics like response length, source citations, response time, and context relevance." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "class RAGEvaluator:\n", + " def __init__(self):\n", + " self.responses = []\n", + " self.metrics = {}\n", + " \n", + " def evaluate_response(self, query: str, response: str, context: List[str]) -> Dict:\n", + " # Calculate response time\n", + " start_time = time.time()\n", + " \n", + " metrics = {\n", + " 'response_length': len(response),\n", + " 'source_citations': sum(1 for doc in context if doc in response),\n", + " 'response_time': time.time() - start_time,\n", + " 'context_relevance': self._calculate_relevance(query, context)\n", + " }\n", + " \n", + " self.responses.append({\n", + " 'query': query,\n", + " 'response': response,\n", + " 'metrics': metrics\n", + " })\n", + " \n", + " return metrics\n", + " \n", + " def _calculate_relevance(self, query: str, context: List[str]) -> float:\n", + " # Simple relevance scoring\n", + " return sum(1 for c in context if query.lower() in c.lower()) / len(context)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Query Processing with RAG\n", + "\n", + "We define the `ask_rag` function to send the query to the assistant, process the response, and evaluate it. This function handles the interaction with the assistant and uses the evaluator to measure the quality of the response." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "async def ask_rag(query: str, evaluator: RAGEvaluator):\n", + " try:\n", + " # Get response with timing\n", + " start_time = time.time()\n", + " response = await assistant.on_messages(\n", + " [TextMessage(content=query, source=\"user\")],\n", + " cancellation_token=CancellationToken(),\n", + " )\n", + " processing_time = time.time() - start_time\n", + " \n", + " # Evaluate response\n", + " metrics = evaluator.evaluate_response(\n", + " query=query,\n", + " response=response.chat_message.content,\n", + " context=documents\n", + " )\n", + " \n", + " return {\n", + " 'response': response.chat_message.content,\n", + " }\n", + " except Exception as e:\n", + " print(f\"Error processing query: {e}\")\n", + " return None" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Example usage\n", + "\n", + "We initialize the evaluator and define the queries that we want to process and evaluate." 
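+    ,
+    "\n",
+    "One thing to keep in mind: `ask_rag` above sends the raw query straight to the assistant and evaluates the reply against the full `documents` list, and the computed metrics are not returned to the caller. A retrieval-grounded variant might first query the ChromaDB collection, pass only the top matches as context, and surface the metrics. The sketch below is illustrative only (the helper name `ask_rag_grounded` and the prompt format are assumptions, not part of the executed cells):\n",
+    "\n",
+    "```python\n",
+    "# Illustrative only: ground the assistant's answer in retrieved context and return the metrics\n",
+    "async def ask_rag_grounded(query: str, evaluator: RAGEvaluator, k: int = 3):\n",
+    "    retrieved = collection.query(query_texts=[query], n_results=k)\n",
+    "    context = retrieved[\"documents\"][0]  # top-k document texts for this query\n",
+    "    prompt = \"Context:\\n\" + \"\\n\".join(context) + \"\\n\\nQuestion: \" + query\n",
+    "    response = await assistant.on_messages(\n",
+    "        [TextMessage(content=prompt, source=\"user\")],\n",
+    "        cancellation_token=CancellationToken(),\n",
+    "    )\n",
+    "    answer = response.chat_message.content\n",
+    "    metrics = evaluator.evaluate_response(query=query, response=answer, context=context)\n",
+    "    return {\"response\": answer, \"metrics\": metrics}\n",
+    "```"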
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "evaluator = RAGEvaluator()\n", + "queries = [\n", + " \"What are the key features of RAG?\",\n", + " \"How does RAG improve response accuracy?\",\n", + " \"Explain the retrieval process in RAG\"\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "async def main():\n", + " for query in queries:\n", + " print(f\"\\nProcessing Query: {query}\")\n", + " result = await ask_rag(query, evaluator)\n", + " if result:\n", + " print(f\"Response: {result['response']}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run the Script\n", + "\n", + "We check if the script is running in an interactive environment or a standard script, and run the main function accordingly." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Processing Query: What are the key features of RAG?\n", + "Response: RAG, or Retrieval-Augmented Generation, is a model architecture that combines the strengths of information retrieval and generative language models. Here are some key features of RAG:\n", + "\n", + "1. **Dual Components**: RAG integrates two main components: a retriever and a generator. The retriever fetches relevant documents or pieces of information from a knowledge base, while the generator produces human-like text based on the retrieved information.\n", + "\n", + "2. **Retrieval-Augmented Generation**: The generator uses the documents retrieved by the retriever to inform its responses, effectively enhancing the model's ability to provide accurate and contextually relevant answers.\n", + "\n", + "3. **End-to-End Training**: RAG models allow for joint training, where both the retriever and generator can be tuned together. This approach helps optimize the interaction between the two components for improved performance.\n", + "\n", + "4. **Flexible Knowledge Sources**: RAG can utilize various external knowledge sources, such as databases or document collections, making it adaptable for different applications. This allows it to pull in current and diverse information, thus staying relevant and informed.\n", + "\n", + "5. **Scalability**: RAG architectures can retrieve from large data stores efficiently. The use of techniques like dense embeddings allows the retriever to quickly find relevant documents even in extensive datasets.\n", + "\n", + "6. **Applications**: RAG is particularly useful in applications requiring up-to-date knowledge, such as question answering, conversational agents, and content creation, where the information landscape is constantly evolving.\n", + "\n", + "7. **Masking and Generation**: The generator can take a masked input (like specific placeholders indicating where information is needed) along with retrieved documents to create coherent and contextually accurate outputs.\n", + "\n", + "8. **Performance**: RAG models tend to outperform traditional generative models when it comes to fact-based questions and tasks, as they leverage the additional information retrieved from external sources.\n", + "\n", + "These features help RAG effectively bridge the gap between static knowledge representation in traditional language models and the dynamic, content-driven needs of real-world applications. 
For further details on RAG's architecture and implications, refer to the original implementation published by Facebook AI Research (FAIR) in January 2020.\n", + "\n", + "Processing Query: How does RAG improve response accuracy?\n", + "Response: Retrieval-Augmented Generation (RAG) improves response accuracy in several key ways:\n", + "\n", + "1. **Utilization of External Knowledge**: By integrating a retrieval mechanism, RAG can pull in up-to-date and contextually relevant information from external documents or databases. This is particularly important for tasks requiring specific knowledge, as traditional generative models are limited to their training data, which may become outdated or lack specificity.\n", + "\n", + "2. **Contextual Relevance**: The retriever fetches documents that are closely related to the user query, ensuring the generator has access to pertinent information. This helps the model produce responses that are directly relevant to the inquiry, thereby enhancing accuracy.\n", + "\n", + "3. **Reduction of Hallucination**: Standard language models sometimes generate plausible-sounding yet factually incorrect statements, a phenomenon known as \"hallucination.\" By grounding the generation process in retrieved content, RAG reduces the chance of hallucinations, as the generated responses are directly tied to concrete information sources.\n", + "\n", + "4. **Improved Fact Recall**: RAG’s architecture allows for the generation to incorporate factual data directly from retrieved sources, which enhances the ability to recall specific information accurately. The generator modifies and contextualizes this information for human-like responses.\n", + "\n", + "5. **Joint Training**: In RAG, both the retriever and generator can be jointly trained. This means that the retriever can learn to prioritize the most useful documents for the specific kinds of questions the generator will encounter, optimizing their interaction and improving overall response quality.\n", + "\n", + "6. **Clarification and Specificity**: When responding to complex queries, the generator can refer to multiple retrieved documents, allowing it to synthesize information from various sources to create a more comprehensive and specific response.\n", + "\n", + "7. **Scalability with Diverse Data**: RAG can be scaled to work with large datasets, which translates to accessing a broader range of knowledge. This ensures that responses can cover a wider scope of topics and are anchored in more intricate and varied information, leading to better-informed outputs.\n", + "\n", + "8. **User Feedback and Continuous Learning**: RAG systems can be designed to incorporate user feedback to improve retrieval effectiveness continually. For example, if users indicate that certain types of responses are more accurate or relevant, the system can refine its retrieval strategies over time.\n", + "\n", + "Overall, RAG’s architecture allows it to leverage a combination of real-time information retrieval and sophisticated language generation, leading to improved accuracy, relevance, and user experience in generated responses. 
This framework not only addresses the limitations found in traditional generative models but also enhances their capabilities in managing dynamic and context-dependent queries.\n", + "\n", + "Processing Query: Explain the retrieval process in RAG\n", + "Response: The retrieval process in Retrieval-Augmented Generation (RAG) is a critical component that involves finding and selecting relevant documents or pieces of information from a knowledge base or external dataset, which will then be used to inform the generative model's response. Here’s a detailed breakdown of the retrieval process in RAG:\n", + "\n", + "1. **Input Processing**: The retrieval process begins when a user query or input text is received. This query may include intent information or context that helps indicate what kind of information is needed.\n", + "\n", + "2. **Query Representation**: The input query is transformed into a vector representation. This is typically done through embedding techniques using models like BERT or other transformer-based models. The embedding captures the semantic meaning of the query in a form that can be compared to the stored documents or knowledge.\n", + "\n", + "3. **Document Indexing**: Prior to retrieval, the documents available in the knowledge base are indexed. This involves converting each document into a vector representation, similar to how the input query is represented. Depending on the implementation, this may be done using dense vector embeddings or more traditional keyword-based methods.\n", + "\n", + "4. **Retrieval Mechanism**: The next step involves comparing the query vector against the indexed document vectors to identify the most relevant documents. Common approaches to this include:\n", + " - **Nearest Neighbor Search**: This can be performed using algorithms like FAISS (Facebook AI Similarity Search) or Annoy (Approximate Nearest Neighbors Oh Yeah), which efficiently retrieve documents that are closest to the input query in the vector space.\n", + " - **Cosine Similarity**: A common similarity measure used to determine how closely related the document vectors are to the query vector. Higher similarity scores indicate greater relevance.\n", + "\n", + "5. **Ranking the Results**: The retrieved documents are ranked based on their similarity scores relative to the query. The highest-ranking documents are selected for further processing. This ranking helps to ensure that the most contextually relevant information is provided to the generator.\n", + "\n", + "6. **Return Top-k Documents**: A predefined number (k) of top documents are returned to the generative model. The value of k can vary based on the application, but it typically ranges from a few relevant documents to potentially dozens, depending on how much context the model might need.\n", + "\n", + "7. **Contextual Fusion**: Once the relevant documents are retrieved, they are passed on to the generation component. The generator uses these documents to inform and enhance the output. The documents can be used in a few ways, such as directly quoting, summarizing, or integrating information to construct a coherent response.\n", + "\n", + "8. **Feedback Loop (optional)**: Some implementations of RAG include mechanisms for learning from user interactions, allowing for continuous improvement of the retrieval portion of the system. 
This may involve re-evaluating the effectiveness of retrieved documents or refining the embedding process based on user feedback.\n", + "\n", + "By ensuring that the generator has access to specific and relevant information through this structured retrieval process, RAG improves the accuracy and relevance of the responses it generates. The combination of effective input processing, robust indexing, and advanced ranking techniques forms the backbone of the successful retrieval operation in RAG models.\n" + ] + } + ], + "source": [ + "if __name__ == \"__main__\":\n", + " if asyncio.get_event_loop().is_running():\n", + " # Running in an interactive environment, use await main()\n", + " await main()\n", + " else:\n", + " # Running in a standard script, use asyncio.run()\n", + " asyncio.run(main())" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}