update structured output with ollama release from yesterday
souzatharsis committed Dec 7, 2024
1 parent b82a7de commit ab70133
Showing 13 changed files with 735 additions and 118 deletions.
17 changes: 16 additions & 1 deletion poetry.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -38,6 +38,7 @@ lighteval = {extras = ["accelerate"], version = "^0.6.2"}
outlines = "^0.1.7"
datasets = "^3.1.0"
text-generation = "^0.7.0"
ollama = "^0.4.3"


[build-system]
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file not shown.
Binary file not shown.
170 changes: 164 additions & 6 deletions tamingllms/_build/html/_sources/notebooks/structured_output.ipynb
@@ -29,7 +29,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -645,7 +645,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### A Simple Example: Multiple Choice Generation"
"#### Multiple Choice Generation"
]
},
{
@@ -701,7 +701,16 @@
"metadata": {},
"outputs": [],
"source": [
"prompt = f\"You are an expert at structured data extraction. You will be given unstructured text from a SEC filing and extracted names of mentioned entities and places and should convert the response into the given structure. Document: {sec_filing[:TOP]} \"\n",
"BASE_PROMPT = \"You are an expert at structured data extraction. You will be given unstructured text from a SEC filing and extracted names of mentioned entities and places and should convert the response into the given structure.\""
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"prompt = f\"{BASE_PROMPT} Document: {sec_filing[:TOP]}\"\n",
"generator = outlines.generate.json(model, SECExtraction)\n",
"sec_extraction_outlines = generator(prompt)"
]
@@ -732,6 +741,155 @@
"We observe that the model was able to extract the entities and places from the input text, and return them in the specified format. However, it is interesting to see that the model hallucinates a few entities, a phenomenon that is common for smaller Open Source models that were not fine-tuned on the task of entity extraction."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ollama\n",
"\n",
"Ollama offers a similar functionality to Outlines, in that it allows to guide the generation process so the output is guaranteed to follow a JSON schema or Pydantic model. The current `ollama` implementation leverages llama.cpp GBNF (GGML BNF) grammars {cite}`llama_cpp_grammars` to enable structured output generation. It forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It's essentially an extension of BNF (Backus-Naur Form) {cite}`backus_naur_form` with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. By enforcing these constraints during generation, GBNF ensures the model's output strictly adheres to the desired format."
]
},
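{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal illustration of the grammar format (adapted from the example list grammar in the llama.cpp repository; this is not the grammar Ollama derives from our schema), the following GBNF constrains the model to emit a plain bulleted list, with each production rule spelling out exactly which characters may appear and in what order:\n",
"\n",
"```\n",
"root ::= item+\n",
"item ::= \"- \" [^\\n]+ \"\\n\"\n",
"```"
]
},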
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's replicate our previous example with Ollama. First, make sure you have Ollama installed. You can find installation instructions [here](https://ollama.com/docs/installation).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"curl -fsSL https://ollama.com/install.sh | sh\n",
"pip install ollama\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code below demonstrates how to use Ollama's structured output capabilities with a Pydantic model as we did before with OpenAI, LangChain and Outlines. The SECExtraction model defines the expected structure with two fields: mentioned_entities and mentioned_places as lists of strings. The `extract_entities_from_sec_filing` function uses Ollama's chat API to analyze SEC filings and extract entities in a structured format, with temperature set to 0 for deterministic results. We pass the Pydantic model's JSON schema to Ollama via the `format` parameter. Finally, we append a suffix to the prompt instructing the model to return the response as JSON (\"Return as JSON.\") as recommended by Ollama maintainers.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"from ollama import chat\n",
"from pydantic import BaseModel\n",
"\n",
"class SECExtraction(BaseModel):\n",
" mentioned_entities: list[str]\n",
" mentioned_places: list[str]\n",
"\n",
"OLLAMA_STRUCTURED_OUTPUT_PROMPT_SUFFIX = \"Return as JSON.\"\n",
"OLLAMA_STRUCTURED_OUTPUT_TEMPERATURE = 0\n",
"\n",
"def extract_entities_from_sec_filing(doc: str, model: str) -> dict:\n",
" \"\"\"\n",
" Extract entities and places from an SEC filing using Ollama chat.\n",
" \n",
" Args:\n",
" doc: The SEC filing text to analyze\n",
" model: The Ollama model to use for extraction\n",
" \n",
" Returns:\n",
" The raw response from the chat model\n",
" \"\"\"\n",
" response = chat(\n",
" messages=[\n",
" {\n",
" 'role': 'user',\n",
" 'content': f\"\"\"{BASE_PROMPT}\n",
" {OLLAMA_STRUCTURED_OUTPUT_PROMPT_SUFFIX}\n",
" \n",
" Document: {doc}\"\"\"\n",
" }\n",
" ],\n",
" model=model, # You can also use other models like 'mistral' or 'llama2-uncensored'\n",
" format=SECExtraction.model_json_schema(),\n",
" options={'temperature': OLLAMA_STRUCTURED_OUTPUT_TEMPERATURE} # Set to 0 for more deterministic output\n",
" )\n",
" return response\n"
]
},
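{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the schema passed to Ollama through the `format` parameter is the standard Pydantic JSON Schema; for `SECExtraction` it looks roughly as follows (field ordering and titles may differ slightly depending on the Pydantic version):\n",
"\n",
"```json\n",
"{\n",
"  \"title\": \"SECExtraction\",\n",
"  \"type\": \"object\",\n",
"  \"properties\": {\n",
"    \"mentioned_entities\": {\"title\": \"Mentioned Entities\", \"type\": \"array\", \"items\": {\"type\": \"string\"}},\n",
"    \"mentioned_places\": {\"title\": \"Mentioned Places\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}\n",
"  },\n",
"  \"required\": [\"mentioned_entities\", \"mentioned_places\"]\n",
"}\n",
"```"
]
},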
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now run the function and print the extracted entities and places. But first we need to start the Ollama server with our target LLM model (Qwen2.5-0.5B) running locally."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"ollama run qwen2.5:0.5b\n",
"```"
]
},
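{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the model has been pulled, `ollama list` should show `qwen2.5:0.5b` among the locally available models, which is a quick way to confirm the server is up before calling the extraction function."
]
},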
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"doc = sec_filing[:TOP]\n",
"model = \"qwen2.5:0.5b\"\n",
"\n",
"response = extract_entities_from_sec_filing(doc, model)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"import json"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"response_json = json.loads(response.message.content)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Extracted entities: ['United States', 'SECURITIES AND EXCHANGE COMMISSION']\n",
"Extracted places: []\n"
]
}
],
"source": [
"print(\"Extracted entities:\", response_json.get('mentioned_entities'))\n",
"print(\"Extracted places:\", response_json.get('mentioned_places'))\n"
]
},
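{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, since Ollama was constrained by the `SECExtraction` schema, the raw JSON string can also be validated directly against the Pydantic model rather than parsed with `json.loads`; a sketch using the Pydantic v2 API is shown below, where `model_validate_json` raises a `ValidationError` if the output ever drifts from the schema:\n",
"\n",
"```python\n",
"# Validate the model's raw JSON output against the Pydantic schema\n",
"sec_extraction = SECExtraction.model_validate_json(response.message.content)\n",
"print(sec_extraction.mentioned_entities)\n",
"print(sec_extraction.mentioned_places)\n",
"```"
]
},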
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While the extracted entities and places (empty) were quite different from those previously extracted using Outlines, we have indeed successfully obtained results in JSON format as expected even though we used a quite small model with 0.5B parameters.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -740,11 +898,11 @@
"\n",
"### Comparing Solutions\n",
"\n",
"* **Simplicity vs. Control**: One-shot prompts are simple but offer limited control. `LangChain`, and Outlines provide greater control but might have a steeper learning curve though quite manageable.\n",
"* **Simplicity vs. Control**: One-shot prompts are simple but offer limited control. LangChain, Outlines and Ollama provide greater control but might have a steeper learning curve though quite simple and manageable.\n",
"\n",
"* **Native LLM Support**: `with_structured_output` in LangChain relies on the underlying LLM having built-in support for structured output APIs, i.e. LangChain is a wrapper around the underlying LLM's structured output API. Outlines, on the other hand, is more broadly applicable enabling a wider range of Open Source models.\n",
"* **Native LLM Support**: `with_structured_output` in LangChain relies on the underlying LLM having built-in support for structured output APIs, i.e. LangChain is a wrapper around the underlying LLM's structured output API. Outlines and Ollama, on the other hand, are more broadly applicable enabling a wider range of Open Source models. Ollama being a leader in serving Open Source models locally while Outlines enabling Open Source models available via the transformers, llama.cpp, exllama2, mlx-lm and vllm.\n",
"\n",
"* **Flexibility**: Outlines and LangChain's `StructuredOutputParser` offer the most flexibility for defining custom output structures."
"* **Flexibility**: Outlines offers the most flexibility for defining custom output structures while LangChain is limited by the underlying LLM's support for structured output APIs and Ollama is still limited to JSON format."
]
},
{