
Commit

update output size limit
souzatharsis committed Nov 25, 2024
1 parent 7e06d8e commit 843b2d0
Showing 13 changed files with 67 additions and 30 deletions.
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file modified tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
@@ -11,7 +11,6 @@
"-- T.S. Eliot\n",
"```\n",
"```{contents}\n",
":depth: 2\n",
"```\n",
"## What are Token Limits?\n",
"\n",
@@ -122,6 +121,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2: Writing the Base Prompt Template\n",
"\n",
"We will write a base prompt template which will serve as a foundational structure for all chunks, ensuring consistency in the instructions and context provided to the language model. The template includes the following parameters:\n",
"- `role`: Defines the role or persona the model should assume.\n",
"- `context`: Provides the background information or context for the task.\n",
@@ -200,6 +201,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3: Constructing Dynamic Prompt Parameters\n",
"\n",
"Now, we will write a function (`get_dynamic_prompt_template`) that constructs prompt parameters dynamically for each chunk."
]
},
@@ -258,6 +261,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Step 4: Generating the Report\n",
"\n",
"Finally, we will write a function that generates the actual report by calling the `LLMChain` with the dynamically updated prompt parameters for each chunk and concatenating the results at the end."
]
},
@@ -322,7 +328,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example Usage"
"#### Example Usage"
]
},
{
@@ -598,7 +598,7 @@
"source": [
"### Outlines\n",
"\n",
"Outlines {cite}`outlines2024`is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model's generation process. In that way, Outlines provides several powerful features:\n",
"Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model's generation process. In that way, Outlines provides several powerful features:\n",
"\n",
"* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n",
"* **Regex-based structured generation**: Guide the generation process using regular expressions.\n",
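The Outlines feature list above is truncated by the diff view. For reference, a minimal sketch of the first feature (multiple choice generation) using the Outlines API of this period might look like the following; the model choice and prompt are illustrative and not taken from the notebook:

```python
import outlines

# Load an open-weights model through Outlines' transformers backend (illustrative choice)
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Constrain decoding to a fixed set of options by masking all other tokens' logits
generator = outlines.generate.choice(model, ["Positive", "Negative", "Neutral"])

sentiment = generator("Review: the keyboard feels cheap but the screen is excellent.\nSentiment:")
print(sentiment)  # guaranteed to be one of: Positive, Negative, Neutral
```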
55 changes: 37 additions & 18 deletions tamingllms/_build/html/notebooks/output_size_limit.html
@@ -131,8 +131,6 @@

<li class="toctree-l2"><a href="#content-chunking-with-contextual-linking" class="reference internal">Content Chunking with Contextual Linking</a></li>

<li class="toctree-l2"><a href="#example-usage" class="reference internal">Example Usage</a></li>

<li class="toctree-l2"><a href="#implications" class="reference internal">Implications</a></li>

<li class="toctree-l2"><a href="#future-considerations" class="reference internal">Future Considerations</a></li>
@@ -208,12 +206,24 @@ <h1><a class="toc-backref" href="#id2" role="doc-backlink"><span class="section-
<ul>
<li><p><a class="reference internal" href="#what-are-token-limits" id="id3">What are Token Limits?</a></p></li>
<li><p><a class="reference internal" href="#problem-statement" id="id4">Problem Statement</a></p></li>
<li><p><a class="reference internal" href="#content-chunking-with-contextual-linking" id="id5">Content Chunking with Contextual Linking</a></p></li>
<li><p><a class="reference internal" href="#example-usage" id="id6">Example Usage</a></p></li>
<li><p><a class="reference internal" href="#implications" id="id7">Implications</a></p></li>
<li><p><a class="reference internal" href="#future-considerations" id="id8">Future Considerations</a></p></li>
<li><p><a class="reference internal" href="#conclusion" id="id9">Conclusion</a></p></li>
<li><p><a class="reference internal" href="#references" id="id10">References</a></p></li>
<li><p><a class="reference internal" href="#content-chunking-with-contextual-linking" id="id5">Content Chunking with Contextual Linking</a></p>
<ul>
<li><p><a class="reference internal" href="#generating-long-form-content" id="id6">Generating long-form content</a></p>
<ul>
<li><p><a class="reference internal" href="#step-1-chunking-the-content" id="id7">Step 1: Chunking the Content</a></p></li>
<li><p><a class="reference internal" href="#step-2-writing-the-base-prompt-template" id="id8">Step 2: Writing the Base Prompt Template</a></p></li>
<li><p><a class="reference internal" href="#step-3-constructing-dynamic-prompt-parameters" id="id9">Step 3: Constructing Dynamic Prompt Parameters</a></p></li>
<li><p><a class="reference internal" href="#step-4-generating-the-report" id="id10">Step 4: Generating the Report</a></p></li>
<li><p><a class="reference internal" href="#example-usage" id="id11">Example Usage</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="#discussion" id="id12">Discussion</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="#implications" id="id13">Implications</a></p></li>
<li><p><a class="reference internal" href="#future-considerations" id="id14">Future Considerations</a></p></li>
<li><p><a class="reference internal" href="#conclusion" id="id15">Conclusion</a></p></li>
<li><p><a class="reference internal" href="#references" id="id16">References</a></p></li>
</ul>
</li>
</ul>
@@ -299,7 +309,7 @@ <h2><a class="toc-backref" href="#id5" role="doc-backlink"><span class="section-
<p>By following these steps, developers can effectively manage the <code class="docutils literal notranslate"><span class="pre">max_output_tokens</span></code> limitation and generate coherent long-form content without truncation.</p>
<p>Let’s examine an example implementation of this technique.</p>
<section id="generating-long-form-content">
<h3><span class="section-number">2.3.1. </span>Generating long-form content<a class="headerlink" href="#generating-long-form-content" title="Permalink to this heading"></a></h3>
<h3><a class="toc-backref" href="#id6" role="doc-backlink"><span class="section-number">2.3.1. </span>Generating long-form content</a><a class="headerlink" href="#generating-long-form-content" title="Permalink to this heading"></a></h3>
<ul class="simple">
<li><p>Goal: Generate a long-form report analyzing a company’s financial statement.</p></li>
<li><p>Input: A company’s 10K SEC filing.</p></li>
@@ -312,7 +322,7 @@
</figure>
<p>The diagram in <a class="reference internal" href="#id1"><span class="std std-numref">Fig. 2.1</span></a> illustrates the process we will follow for handling long-form content generation with Large Language Models through “Content Chunking with Contextual Linking.” It shows how input content is first split into manageable chunks using a chunking function (e.g. <code class="docutils literal notranslate"><span class="pre">CharacterTextSplitter</span></code> with <code class="docutils literal notranslate"><span class="pre">tiktoken</span></code> tokenizer), then each chunk is processed sequentially while maintaining context from previous chunks. For each chunk, the system updates the context, generates a dynamic prompt with specific parameters, makes a call to the LLM chain, and stores the response. After all chunks are processed, the individual responses are combined with newlines to create the final report, effectively working around the token limit constraints of LLMs while maintaining coherence across the generated content.</p>
<section id="step-1-chunking-the-content">
<h4><span class="section-number">2.3.1.1. </span>Step 1: Chunking the Content<a class="headerlink" href="#step-1-chunking-the-content" title="Permalink to this heading"></a></h4>
<h4><a class="toc-backref" href="#id7" role="doc-backlink"><span class="section-number">2.3.1.1. </span>Step 1: Chunking the Content</a><a class="headerlink" href="#step-1-chunking-the-content" title="Permalink to this heading"></a></h4>
<p>There are different methods for chunking, and each of them might be appropriate for different situations. However, we can broadly group chunking strategies into two types:</p>
<ul class="simple">
<li><p><strong>Fixed-size Chunking</strong>: This is the most common and straightforward approach to chunking. We simply decide the number of tokens in our chunk and, optionally, whether there should be any overlap between them. In general, we will want to keep some overlap between chunks to make sure that the semantic context doesn’t get lost between chunks. Fixed-sized chunking may be a reasonable path in many common cases. Compared to other forms of chunking, fixed-sized chunking is computationally cheap and simple to use since it doesn’t require the use of any specialized techniques or libraries.</p></li>
@@ -347,6 +357,9 @@ <h4><span class="section-number">2.3.1.1. </span>Step 1: Chunking the Content<a
</div>
</div>
</div>
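The chunking cell itself is collapsed in this diff. Below is a minimal sketch of the fixed-size, token-counted splitter described above, assuming the langchain text-splitters package is installed; the helper name `get_chunks`, the encoding, and the size/overlap values are illustrative rather than the notebook's actual code:

```python
from langchain_text_splitters import CharacterTextSplitter

def get_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks, measuring size in tiktoken tokens."""
    # from_tiktoken_encoder counts chunk_size and chunk_overlap in tokens rather than characters
    splitter = CharacterTextSplitter.from_tiktoken_encoder(
        encoding_name="cl100k_base",  # assumed encoding; pick one matching your model
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_text(text)
```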
</section>
<section id="step-2-writing-the-base-prompt-template">
<h4><a class="toc-backref" href="#id8" role="doc-backlink"><span class="section-number">2.3.1.2. </span>Step 2: Writing the Base Prompt Template</a><a class="headerlink" href="#step-2-writing-the-base-prompt-template" title="Permalink to this heading"></a></h4>
<p>We will write a base prompt template which will serve as a foundational structure for all chunks, ensuring consistency in the instructions and context provided to the language model. The template includes the following parameters:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">role</span></code>: Defines the role or persona the model should assume.</p></li>
@@ -411,6 +424,9 @@ <h4><span class="section-number">2.3.1.1. </span>Step 1: Chunking the Content<a
</div>
</div>
</div>
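The template cell is collapsed as well. A sketch of the idea is below; the parameters beyond `role` and `context` (such as `context_description` and `instructions`) and the template wording are assumptions, not the notebook's code:

```python
from langchain_core.prompts import PromptTemplate

# Hypothetical base template shared by every chunk
BASE_PROMPT = """ROLE: {role}

CONTEXT ({context_description}):
{context}

INSTRUCTIONS:
{instructions}
"""

base_prompt_template = PromptTemplate(
    input_variables=["role", "context_description", "context", "instructions"],
    template=BASE_PROMPT,
)
```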
</section>
<section id="step-3-constructing-dynamic-prompt-parameters">
<h4><a class="toc-backref" href="#id9" role="doc-backlink"><span class="section-number">2.3.1.3. </span>Step 3: Constructing Dynamic Prompt Parameters</a><a class="headerlink" href="#step-3-constructing-dynamic-prompt-parameters" title="Permalink to this heading"></a></h4>
<p>Now, we will write a function (<code class="docutils literal notranslate"><span class="pre">get_dynamic_prompt_template</span></code>) that constructs prompt parameters dynamically for each chunk.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
@@ -461,6 +477,9 @@ <h4><span class="section-number">2.3.1.1. </span>Step 1: Chunking the Content<a
</div>
</div>
</div>
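The cell body is hidden here too. One way `get_dynamic_prompt_template` could work (the signature and the context-carrying strategy are assumptions) is to vary the instructions by chunk position and pass forward a tail of what has already been generated:

```python
def get_dynamic_prompt_template(part_idx: int, total_parts: int,
                                chunk: str, context_so_far: str) -> dict:
    """Build per-chunk parameters for the base prompt template (assumed signature)."""
    if part_idx == 0:
        instructions = "Write the opening sections of the report for this excerpt."
    elif part_idx == total_parts - 1:
        instructions = "Write the closing sections and a brief conclusion for this excerpt."
    else:
        instructions = "Continue the report, covering only this excerpt; do not repeat earlier sections."
    return {
        "role": "financial analyst",
        "context_description": f"part {part_idx + 1} of {total_parts} of a 10-K filing",
        "context": f"Previously generated content:\n{context_so_far}\n\nCurrent excerpt:\n{chunk}",
        "instructions": instructions,
    }
```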
</section>
<section id="step-4-generating-the-report">
<h4><a class="toc-backref" href="#id10" role="doc-backlink"><span class="section-number">2.3.1.4. </span>Step 4: Generating the Report</a><a class="headerlink" href="#step-4-generating-the-report" title="Permalink to this heading"></a></h4>
<p>Finally, we will write a function that generates the actual report by calling the <code class="docutils literal notranslate"><span class="pre">LLMChain</span></code> with the dynamically updated prompt parameters for each chunk and concatenating the results at the end.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
@@ -518,10 +537,8 @@ <h4><span class="section-number">2.3.1.1. </span>Step 1: Chunking the Content<a
</div>
</div>
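The generation cell is also collapsed. Here is a sketch of the loop the paragraph describes, reusing the helpers from the earlier sketches and the legacy `LLMChain` interface named in the text; the model choice is an assumption:

```python
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI

def generate_report(chunks: list[str]) -> str:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # illustrative model
    chain = LLMChain(llm=llm, prompt=base_prompt_template)
    responses, context_so_far = [], ""
    for i, chunk in enumerate(chunks):
        params = get_dynamic_prompt_template(i, len(chunks), chunk, context_so_far)
        response = chain.run(**params)
        responses.append(response)
        context_so_far = response[-1000:]  # carry the tail of the last response as context
    return "\n".join(responses)  # concatenate per-chunk outputs into the final report
```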
</section>
</section>
</section>
<section id="example-usage">
<h2><a class="toc-backref" href="#id6" role="doc-backlink"><span class="section-number">2.4. </span>Example Usage</a><a class="headerlink" href="#example-usage" title="Permalink to this heading"></a></h2>
<h4><a class="toc-backref" href="#id11" role="doc-backlink"><span class="section-number">2.3.1.5. </span>Example Usage</a><a class="headerlink" href="#example-usage" title="Permalink to this heading"></a></h4>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Load the text from sample 10K SEC filing</span>
@@ -586,8 +603,10 @@ <h2><a class="toc-backref" href="#id6" role="doc-backlink"><span class="section-
</div>
</div>
</div>
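The usage cell is truncated after its first comment. In outline it would read the sample filing and run it through the pipeline sketched above; the file path below is a placeholder:

```python
# Load the text from a sample 10-K SEC filing (path is illustrative)
with open("data/apple_10k.txt", encoding="utf-8") as f:
    filing_text = f.read()

chunks = get_chunks(filing_text, chunk_size=1000, chunk_overlap=100)
report = generate_report(chunks)
print(report[:2000])  # preview the start of the generated report
```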
</section>
</section>
<section id="discussion">
<h3><span class="section-number">2.4.1. </span>Discussion<a class="headerlink" href="#discussion" title="Permalink to this heading"></a></h3>
<h3><a class="toc-backref" href="#id12" role="doc-backlink"><span class="section-number">2.3.2. </span>Discussion</a><a class="headerlink" href="#discussion" title="Permalink to this heading"></a></h3>
<p>Results from the generated report present a few interesting aspects:</p>
<ul class="simple">
<li><p><strong>Coherence</strong>: The generated report demonstrates a high level of coherence. The sections are logically structured, and the flow of information is smooth. Each part of the report builds upon the previous sections, providing a comprehensive analysis of Apple Inc.’s financial performance and key risk factors. The use of headings and subheadings helps in maintaining clarity and organization throughout the document.</p></li>
@@ -601,7 +620,7 @@ <h3><span class="section-number">2.4.1. </span>Discussion<a class="headerlink" h
</section>
</section>
<section id="implications">
<h2><a class="toc-backref" href="#id7" role="doc-backlink"><span class="section-number">2.5. </span>Implications</a><a class="headerlink" href="#implications" title="Permalink to this heading"></a></h2>
<h2><a class="toc-backref" href="#id13" role="doc-backlink"><span class="section-number">2.4. </span>Implications</a><a class="headerlink" href="#implications" title="Permalink to this heading"></a></h2>
<p>Implementing context chunking with contextual linking is a practical solution to manage the output size limitations of LLMs. However, this approach comes with its own set of implications that developers must consider.</p>
<ol class="arabic simple">
<li><p><strong>Increased Development Complexity</strong>: Implementing strategies to overcome the maximum output token length introduces additional layers of complexity to the application design. It necessitates meticulous management of context across multiple outputs to maintain coherence. Ensuring that each chunk retains the necessary context for the conversation or document can be challenging and often requires advanced logic to handle transitions seamlessly.</p></li>
@@ -611,7 +630,7 @@ <h2><a class="toc-backref" href="#id7" role="doc-backlink"><span class="section-
<p>By understanding these implications, developers can better prepare for the challenges associated with context chunking and contextual linking, ensuring that their applications remain efficient, cost-effective, and user-friendly.</p>
</section>
<section id="future-considerations">
<h2><a class="toc-backref" href="#id8" role="doc-backlink"><span class="section-number">2.6. </span>Future Considerations</a><a class="headerlink" href="#future-considerations" title="Permalink to this heading"></a></h2>
<h2><a class="toc-backref" href="#id14" role="doc-backlink"><span class="section-number">2.5. </span>Future Considerations</a><a class="headerlink" href="#future-considerations" title="Permalink to this heading"></a></h2>
<p>As models evolve, we can expect several advancements that will significantly impact how we handle output size limitations:</p>
<ol class="arabic simple">
<li><p><strong>Contextual Awareness</strong>: Future LLMs will likely have improved contextual awareness - or what Mustafa Suleyman would call “infinite memory” - enabling them to better understand and manage the context of a conversation or document over long interactions. This will reduce the need for repetitive context setting and improve the overall user experience.</p></li>
@@ -623,11 +642,11 @@ <h2><a class="toc-backref" href="#id8" role="doc-backlink"><span class="section-
<p>These advancements will collectively enhance the capabilities of LLMs, making them more powerful and versatile tools for a wide range of applications. However, they will also introduce new challenges and considerations that developers and researchers will need to address to fully harness their potential.</p>
</section>
<section id="conclusion">
<h2><a class="toc-backref" href="#id9" role="doc-backlink"><span class="section-number">2.7. </span>Conclusion</a><a class="headerlink" href="#conclusion" title="Permalink to this heading"></a></h2>
<h2><a class="toc-backref" href="#id15" role="doc-backlink"><span class="section-number">2.6. </span>Conclusion</a><a class="headerlink" href="#conclusion" title="Permalink to this heading"></a></h2>
<p>In conclusion, while managing output size limitations in LLMs presents significant challenges, it also drives innovation in application design and optimization strategies. By implementing techniques such as context chunking, efficient prompt templates, and graceful fallbacks, developers can mitigate these limitations and enhance the performance and cost-effectiveness of their applications. As the technology evolves, advancements in contextual awareness, token efficiency, and memory management will further empower developers to build more robust and scalable LLM-powered systems. It is crucial to stay informed about these developments and continuously adapt to leverage the full potential of LLMs while addressing their inherent constraints.</p>
</section>
<section id="references">
<h2><a class="toc-backref" href="#id10" role="doc-backlink"><span class="section-number">2.8. </span>References</a><a class="headerlink" href="#references" title="Permalink to this heading"></a></h2>
<h2><a class="toc-backref" href="#id16" role="doc-backlink"><span class="section-number">2.7. </span>References</a><a class="headerlink" href="#references" title="Permalink to this heading"></a></h2>
<ul class="simple">
<li><p><a class="reference external" href="https://langchain.readthedocs.io/en/latest/modules/text_splitter.html">LangChain Text Splitter</a>.</p></li>
</ul>