Commit
fix math blocks
souzatharsis committed Dec 16, 2024
1 parent eb0e72d commit ee3c172
Showing 23 changed files with 52 additions and 48 deletions.
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/intro.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/preface.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/toc.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/alignment.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/safety.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
Binary file not shown.
4 changes: 2 additions & 2 deletions tamingllms/_build/html/_sources/notebooks/alignment.ipynb
Original file line number Diff line number Diff line change
@@ -256,9 +256,9 @@
"\n",
"At a high level, DPO maximizes the probability of the preferred output and minimizes that of the rejected output, as defined in the following equation:\n",
"\n",
"\\begin{gather*}\n",
"```{math}\n",
"\\mathcal{L}_{\\text{DPO}}(\\pi_\\theta; \\pi_\\text{ref}) = -\\mathbb{E}_{(x,y_w,y_l) \\sim \\mathcal{D}} \\left[\\log \\sigma \\left(\\beta \\underbrace{\\log \\frac{\\pi_\\theta(y_w | x)}{\\pi_\\text{ref}(y_w | x)}}_{\\color{green}\\text{preferred}} - \\beta \\underbrace{\\log \\frac{\\pi_\\theta(y_l | x)}{\\pi_\\text{ref}(y_l | x)}}_{\\color{red}\\text{rejected}}\\right)\\right]\n",
"\\end{gather*}\n",
"```\n",
"\n",
"This approach is more straightforward than PPO, as it avoids the need for a reward model and instead uses a direct comparison of model outputs against human preferences.\n",
"\n",
6 changes: 3 additions & 3 deletions tamingllms/_build/html/markdown/intro.html
Original file line number Diff line number Diff line change
@@ -226,7 +226,7 @@
<hr>
<div class="content" role="main" v-pre>

<section id="introduction">
<section class="tex2jax_ignore mathjax_ignore" id="introduction">
<span id="intro"></span><h1><a class="toc-backref" href="#id1" role="doc-backlink"><span class="section-number">2. </span>Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>I am always doing that which I cannot do, in order that I may learn how to do it.</p>
@@ -304,7 +304,7 @@ <h2><a class="toc-backref" href="#id5" role="doc-backlink"><span class="section-
<li><p>Share their own experiences and solutions with the community</p></li>
<li><p>Propose new chapters or sections that address emerging challenges</p></li>
</ul>
<p>The repository can be found at https://github.com/souzatharsis/tamingllms. Whether you’ve found a typo, have a better solution to share, or want to contribute an entirely new section, your contributions are welcome.</p>
<p>The repository can be found at <a class="reference external" href="https://github.com/souzatharsis/tamingllms">https://github.com/souzatharsis/tamingllms</a>. Whether you’ve found a typo, have a better solution to share, or want to contribute an entirely new section, your contributions are welcome.</p>
</section>
<section id="a-note-on-perspective">
<h2><a class="toc-backref" href="#id6" role="doc-backlink"><span class="section-number">2.5. </span>A Note on Perspective</a><a class="headerlink" href="#a-note-on-perspective" title="Permalink to this heading"></a></h2>
@@ -416,7 +416,7 @@ <h3><a class="toc-backref" href="#id14" role="doc-backlink"><span class="section
<h2><a class="toc-backref" href="#id15" role="doc-backlink"><span class="section-number">2.10. </span>About the Author(s)</a><a class="headerlink" href="#about-the-author-s" title="Permalink to this heading"></a></h2>
<p>Dr. Tharsis Souza is a computer scientist and product leader specializing in AI-based products. He is a Lecturer at Columbia University’s Master of Science program in Applied Analytics, (<em>incoming</em>) Head of Product, Equities at Citadel, and former Senior VP at Two Sigma Investments. He also enjoys mentoring under-represented students &amp; working professionals to help create a more diverse global AI ecosystem.</p>
<p>With over 15 years of experience delivering technology products across startups and Fortune 500 companies, Dr. Souza is also an author of numerous scholarly publications and a frequent speaker at academic and business conferences. Grounded in his academic background and drawing on practical experience building and scaling products powered by language models at early-stage startups and major institutions, advising non-profit organizations, and contributing to open-source projects, he brings a unique perspective on bridging the gap between LLMs' promised potential and their practical implementation challenges to enable the next generation of AI-powered products.</p>
<p>Dr. Tharsis holds a Ph.D. in Computer Science from UCL, University of London following an M.Phil. and M.Sc. in Computer Science and a B.Sc. in Computer Engineering.</p>
<p>Dr. Tharsis holds a Ph.D. in Computer Science from UCL, University of London following an M.Phil. and <a class="reference external" href="http://M.Sc">M.Sc</a>. in Computer Science and a <a class="reference external" href="http://B.Sc">B.Sc</a>. in Computer Engineering.</p>
</section>
</section>

2 changes: 1 addition & 1 deletion tamingllms/_build/html/markdown/preface.html
@@ -208,7 +208,7 @@
<hr>
<div class="content" role="main" v-pre>

<section id="preface">
<section class="tex2jax_ignore mathjax_ignore" id="preface">
<h1><span class="section-number">1. </span>Preface<a class="headerlink" href="#preface" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>Models tell you merely what something is like, not what something is.</p>
2 changes: 1 addition & 1 deletion tamingllms/_build/html/markdown/toc.html
@@ -198,7 +198,7 @@
<div class="content" role="main" v-pre>

<p>Sign-up to receive updates on <a class="reference external" href="https://tamingllm.substack.com/">new Chapters here</a>.</p>
<section id="taming-llms">
<section class="tex2jax_ignore mathjax_ignore" id="taming-llms">
<h1>Taming LLMs<a class="headerlink" href="#taming-llms" title="Permalink to this heading"></a></h1>
<section id="a-practical-guide-to-llm-pitfalls-with-open-source-software">
<h2><em>A Practical Guide to LLM Pitfalls with Open Source Software</em><a class="headerlink" href="#a-practical-guide-to-llm-pitfalls-with-open-source-software" title="Permalink to this heading"></a></h2>
13 changes: 6 additions & 7 deletions tamingllms/_build/html/notebooks/alignment.html
@@ -29,7 +29,8 @@
<script src="../_static/design-tabs.js"></script>
<script>const THEBE_JS_URL = "https://unpkg.com/thebe@0.8.2/lib/index.js"; const thebe_selector = ".thebe,.cell"; const thebe_selector_input = "pre"; const thebe_selector_output = ".output, .cell_output"</script>
<script async="async" src="../_static/sphinx-thebe.js"></script>
<script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script>window.MathJax = {"options": {"processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
<script defer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script type="module" src="https://cdn.jsdelivr.net/npm/mermaid@11.2.0/dist/mermaid.esm.min.mjs"></script>
<script type="module" src="https://cdn.jsdelivr.net/npm/@mermaid-js/layout-elk@0.1.4/dist/mermaid-layout-elk.esm.min.mjs"></script>
<script type="module">import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@11.2.0/dist/mermaid.esm.min.mjs";import elkLayouts from "https://cdn.jsdelivr.net/npm/@mermaid-js/layout-elk@0.1.4/dist/mermaid-layout-elk.esm.min.mjs";mermaid.registerLayoutLoaders(elkLayouts);mermaid.initialize({startOnLoad:false});</script>
@@ -220,7 +221,7 @@
<hr>
<div class="content" role="main" v-pre>

<section id="preference-based-alignment">
<section class="tex2jax_ignore mathjax_ignore" id="preference-based-alignment">
<h1><a class="toc-backref" href="#id159" role="doc-backlink"><span class="section-number">7. </span>Preference-Based Alignment</a><a class="headerlink" href="#preference-based-alignment" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>A people that values its privileges above its principles soon loses both.</p>
@@ -449,10 +450,8 @@ <h4><a class="toc-backref" href="#id165" role="doc-backlink"><span class="sectio
<li><p>Minimizing the KL divergence between the original and fine-tuned model to preserve general capabilities</p></li>
</ol>
<p>At a high level, DPO maximizes the probability of the preferred output and minimizes that of the rejected output, as defined in the following equation:</p>
<div class="amsmath math notranslate nohighlight">
\[\begin{gather*}
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_\text{ref}) = -\mathbb{E}_{(x,y_w,y_l) \sim \mathcal{D}} \left[\log \sigma \left(\beta \underbrace{\log \frac{\pi_\theta(y_w | x)}{\pi_\text{ref}(y_w | x)}}_{\color{green}\text{preferred}} - \beta \underbrace{\log \frac{\pi_\theta(y_l | x)}{\pi_\text{ref}(y_l | x)}}_{\color{red}\text{rejected}}\right)\right]
\end{gather*}\]</div>
<div class="math notranslate nohighlight">
\[\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_\text{ref}) = -\mathbb{E}_{(x,y_w,y_l) \sim \mathcal{D}} \left[\log \sigma \left(\beta \underbrace{\log \frac{\pi_\theta(y_w | x)}{\pi_\text{ref}(y_w | x)}}_{\color{green}\text{preferred}} - \beta \underbrace{\log \frac{\pi_\theta(y_l | x)}{\pi_\text{ref}(y_l | x)}}_{\color{red}\text{rejected}}\right)\right]\]</div>
<p>This approach is more straightforward than PPO, as it avoids the need for a reward model and instead uses a direct comparison of model outputs against human preferences.</p>
<p>Modern libraries such as HuggingFace’s TRL <span id="id21">[<a class="reference internal" href="#id128" title="Hugging Face. Trl. 2024d. TRL. URL: https://huggingface.co/docs/trl/en/index.">Face, 2024d</a>]</span> offer a suite of techniques for fine-tuning language models with reinforcement learning, including PPO, and DPO. It provides a user-friendly interface and a wide range of features for fine-tuning and aligning LLMs, which will be the focus of the next section as we go through a case study.</p>
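For readers who prefer code to notation, the bracketed loss above can be sketched in a few lines of plain Python. This is a pedagogical sketch, not the TRL implementation: the function name and the per-example log-probability inputs are our own, and a real trainer would operate on batched tensors of sequence log-probabilities.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigma(beta * (preferred term - rejected term))."""
    # log pi_theta(y_w|x) - log pi_ref(y_w|x): the "preferred" log-ratio
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    # log pi_theta(y_l|x) - log pi_ref(y_l|x): the "rejected" log-ratio
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin); minimizing this pushes the preferred ratio up
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference on both outputs the loss is log 2, and it shrinks as the policy assigns relatively more probability to the preferred output, matching the equation's two underbraced terms.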
</section>
@@ -872,7 +871,7 @@ <h4><a class="toc-backref" href="#id174" role="doc-backlink"><span class="sectio
</div>
<p>Recall our base model is <code class="docutils literal notranslate"><span class="pre">HuggingFaceTB/SmolLM2-360M-Instruct</span></code>. Here, we will use the HuggingFace Inference API to generate rejected responses from a cloud endpoint for enhanced performance:</p>
<ol class="arabic simple">
<li><p>Visit the HuggingFace Endpoints UI: https://ui.endpoints.huggingface.co/</p></li>
<li><p>Visit the HuggingFace Endpoints UI: <a class="reference external" href="https://ui.endpoints.huggingface.co/">https://ui.endpoints.huggingface.co/</a></p></li>
<li><p>Click “New Endpoint” and select the model <code class="docutils literal notranslate"><span class="pre">HuggingFaceTB/SmolLM2-360M-Instruct</span></code></p></li>
<li><p>Choose the compute resources (e.g., CPU or GPU instance, GPU preferred)</p></li>
<li><p>Configure the endpoint settings:</p>
2 changes: 1 addition & 1 deletion tamingllms/_build/html/notebooks/evals.html
@@ -228,7 +228,7 @@
<hr>
<div class="content" role="main" v-pre>

<section id="the-evals-gap">
<section class="tex2jax_ignore mathjax_ignore" id="the-evals-gap">
<h1><a class="toc-backref" href="#id153" role="doc-backlink"><span class="section-number">5. </span>The Evals Gap</a><a class="headerlink" href="#the-evals-gap" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>It doesn’t matter how beautiful your theory is, <br>
2 changes: 1 addition & 1 deletion tamingllms/_build/html/notebooks/output_size_limit.html
@@ -220,7 +220,7 @@
<hr>
<div class="content" role="main" v-pre>

<section id="output-size-limitations">
<section class="tex2jax_ignore mathjax_ignore" id="output-size-limitations">
<h1><a class="toc-backref" href="#id118" role="doc-backlink"><span class="section-number">3. </span>Output Size Limitations</a><a class="headerlink" href="#output-size-limitations" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>Only those who will risk going too far can possibly find out how far one can go.</p>
10 changes: 5 additions & 5 deletions tamingllms/_build/html/notebooks/safety.html
@@ -220,7 +220,7 @@
<hr>
<div class="content" role="main" v-pre>

<section id="safety">
<section class="tex2jax_ignore mathjax_ignore" id="safety">
<h1><a class="toc-backref" href="#id161" role="doc-backlink"><span class="section-number">6. </span>Safety</a><a class="headerlink" href="#safety" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>Move fast and be responsible.</p>
@@ -813,7 +813,7 @@ <h4><a class="toc-backref" href="#id189" role="doc-backlink"><span class="sectio
<p>Anthropic/hh-rlhf</p>
<ul class="simple">
<li><p>SALADBench</p></li>
<li><p>https://huggingface.co/datasets/Anthropic/hh-rlhf</p></li>
<li><p><a class="reference external" href="https://huggingface.co/datasets/Anthropic/hh-rlhf">https://huggingface.co/datasets/Anthropic/hh-rlhf</a></p></li>
<li><p>ABC</p></li>
<li><p>use of synthetic datasets</p></li>
</ul>
@@ -830,10 +830,10 @@ <h3><a class="toc-backref" href="#id190" role="doc-backlink"><span class="sectio
<p>LM-Based:</p>
<ul class="simple">
<li><p>OpenAI Moderation API</p></li>
<li><p>IBM Granite Guardian: https://github.com/ibm-granite/granite-guardian</p></li>
<li><p>IBM Granite Guardian: <a class="reference external" href="https://github.com/ibm-granite/granite-guardian">https://github.com/ibm-granite/granite-guardian</a></p></li>
<li><p>Llama-Guard</p></li>
<li><p>NeMo Guardrails: https://github.com/NVIDIA/NeMo-Guardrails</p></li>
<li><p>Mistral moderation: https://github.com/mistralai/cookbook/blob/main/mistral/moderation/system-level-guardrails.ipynb</p></li>
<li><p>NeMo Guardrails: <a class="reference external" href="https://github.com/NVIDIA/NeMo-Guardrails">https://github.com/NVIDIA/NeMo-Guardrails</a></p></li>
<li><p>Mistral moderation: <a class="reference external" href="https://github.com/mistralai/cookbook/blob/main/mistral/moderation/system-level-guardrails.ipynb">https://github.com/mistralai/cookbook/blob/main/mistral/moderation/system-level-guardrails.ipynb</a></p></li>
</ul>
<section id="filter-based">
<h4><a class="toc-backref" href="#id191" role="doc-backlink"><span class="section-number">6.5.2.1. </span>Filter-based</a><a class="headerlink" href="#filter-based" title="Permalink to this heading"></a></h4>
39 changes: 22 additions & 17 deletions tamingllms/_build/html/notebooks/structured_output.html
@@ -29,6 +29,8 @@
<script src="../_static/design-tabs.js"></script>
<script>const THEBE_JS_URL = "https://unpkg.com/thebe@0.8.2/lib/index.js"; const thebe_selector = ".thebe,.cell"; const thebe_selector_input = "pre"; const thebe_selector_output = ".output, .cell_output"</script>
<script async="async" src="../_static/sphinx-thebe.js"></script>
<script>window.MathJax = {"options": {"processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
<script defer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>


<!-- bundled in js (rollup iife) -->
@@ -222,7 +224,7 @@
<hr>
<div class="content" role="main" v-pre>

<section id="wrestling-with-structured-output">
<section class="tex2jax_ignore mathjax_ignore" id="wrestling-with-structured-output">
<h1><a class="toc-backref" href="#id128" role="doc-backlink"><span class="section-number">4. </span>Wrestling with Structured Output</a><a class="headerlink" href="#wrestling-with-structured-output" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>In limits, there is freedom. Creativity thrives within structure.</p>
@@ -695,30 +697,33 @@ <h3><a class="toc-backref" href="#id139" role="doc-backlink"><span class="sectio
<p>Outlines <span id="id3">[<a class="reference internal" href="#id15" title="Outlines. Type-safe structured output from llms. https://dottxt-ai.github.io/outlines/latest/, 2024. Accessed: 2024.">Outlines, 2024</a>]</span> is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model’s output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options.</p>
<p>The authors solve the general guided generation problem <span id="id4">[<a class="reference internal" href="#id60" title="Brandon T. Willard and Rémi Louf. Efficient guided generation for large language models. 2023. URL: https://arxiv.org/abs/2307.09702, arXiv:2307.09702.">Willard and Louf, 2023</a>]</span>, which as a consequence solves the problem of structured output generation, in LLMs by introducing an efficient indexing approach that reformulates neural text generation using finite-state machines (FSMs).</p>
<p>They define the next token generation as a random variable:</p>
<p>$$s_{t+1} \sim \text{Categorical}(\alpha) \text{ where } \alpha = \text{LLM}(S_t, \theta)$$</p>
<div class="math notranslate nohighlight">
\[s_{t+1} \sim \text{Categorical}(\alpha) \text{ where } \alpha = \text{LLM}(S_t, \theta)\]</div>
<p>Where:</p>
<ul class="simple">
<li><p>$s_{t+1}$ is the next token to be generated</p></li>
<li><p>$S_t = (s_1...s_t)$ represents a sequence of t tokens with $s_t \in V$</p></li>
<li><p>$V$ is the vocabulary with size $|V| = N$ (typically around $10^4$ or larger)</p></li>
<li><p>$\alpha \in \mathbb{R}^N$ is the output logits/probabilities over the vocabulary</p></li>
<li><p>$\theta$ is the set of trained parameters of the LLM</p></li>
<li><p>$\text{LLM}$ refers to a deep neural network trained on next-token-completion tasks</p></li>
<li><p>$\text{Categorical}(\alpha)$ represents sampling from a categorical distribution with probabilities $\alpha$</p></li>
<li><p><span class="math notranslate nohighlight">\(s_{t+1}\)</span> is the next token to be generated</p></li>
<li><p><span class="math notranslate nohighlight">\(S_t = (s_1...s_t)\)</span> represents a sequence of t tokens with <span class="math notranslate nohighlight">\(s_t \in V\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(V\)</span> is the vocabulary with size <span class="math notranslate nohighlight">\(|V| = N\)</span> (typically around <span class="math notranslate nohighlight">\(10^4\)</span> or larger)</p></li>
<li><p><span class="math notranslate nohighlight">\(\alpha \in \mathbb{R}^N\)</span> is the output logits/probabilities over the vocabulary</p></li>
<li><p><span class="math notranslate nohighlight">\(\theta\)</span> is the set of trained parameters of the LLM</p></li>
<li><p><span class="math notranslate nohighlight">\(\text{LLM}\)</span> refers to a deep neural network trained on next-token-completion tasks</p></li>
<li><p><span class="math notranslate nohighlight">\(\text{Categorical}(\alpha)\)</span> represents sampling from a categorical distribution with probabilities <span class="math notranslate nohighlight">\(\alpha\)</span></p></li>
</ul>
<p>When applying masking for guided generation, this becomes:</p>
<p>$$
<div class="math notranslate nohighlight">
\[
\tilde{\alpha} = m(S_t) \odot \alpha
$$</p>
<p>$$
\]</div>
<div class="math notranslate nohighlight">
\[
\tilde{s}_{t+1} \sim \text{Categorical}(\tilde{\alpha})
$$</p>
\]</div>
<p>Where:</p>
<ul class="simple">
<li><p>$m: P(V) \rightarrow {0,1}^N$ is a boolean mask function</p></li>
<li><p>$\odot$ represents element-wise multiplication</p></li>
<li><p>$\tilde{\alpha}$ is the masked (constrained) probability distribution</p></li>
<li><p>$\tilde{s}_{t+1}$ is the next token sampled under constraints</p></li>
<li><p><span class="math notranslate nohighlight">\(m: P(V) \rightarrow {0,1}^N\)</span> is a boolean mask function</p></li>
<li><p><span class="math notranslate nohighlight">\(\odot\)</span> represents element-wise multiplication</p></li>
<li><p><span class="math notranslate nohighlight">\(\tilde{\alpha}\)</span> is the masked (constrained) probability distribution</p></li>
<li><p><span class="math notranslate nohighlight">\(\tilde{s}_{t+1}\)</span> is the next token sampled under constraints</p></li>
</ul>
<p>This formulation allows the masking operation to guide the generation process by zeroing out probabilities of invalid tokens according to the finite state machine states. But instead of checking the entire vocabulary (size N) at each generation step (O(N) complexity) to enforce output constraints, they convert constraints (regex/grammar) into FSM states and build an index mapping FSM states to valid vocabulary tokens. This achieves O(1) average complexity for token generation.</p>
<p>In summary, there are two stages in the Outlines framework <span id="id5">[<a class="reference internal" href="#id59" title="Vivien Tran-Thien. Fast, high-fidelity llm decoding with regex constraints. 2024. URL: https://vivien000.github.io/blog/journal/llm-decoding-with-regex-constraints.html.">Tran-Thien, 2024</a>]</span>:</p>
2 changes: 1 addition & 1 deletion tamingllms/_build/html/searchindex.js

Large diffs are not rendered by default.

