Commit: add citation

souzatharsis committed Dec 19, 2024
1 parent 2a17ca9 commit 948cb91
Showing 36 changed files with 880 additions and 583 deletions.
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file modified tamingllms/_build/.doctrees/markdown/preface.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/alignment.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/safety.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
12 changes: 3 additions & 9 deletions tamingllms/_build/html/_sources/notebooks/alignment.ipynb
@@ -2582,7 +2582,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Citation\n",
"[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]\n",
"\n",
@@ -2593,18 +2592,13 @@
"```\n",
"@misc{tharsistpsouza2024tamingllms,\n",
" author = {Tharsis T. P. Souza},\n",
" title = {Taming LLMs},\n",
" title = {Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software},\n",
" year = {2024},\n",
" chapter = {Preference-Based Alignment},\n",
" journal = {GitHub repository},\n",
" url = {https://github.com/souzatharsis/tamingLLMs)\n",
"}\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"## References\n",
"```{bibliography}\n",
":filter: docname in docnames\n",
21 changes: 21 additions & 0 deletions tamingllms/_build/html/_sources/notebooks/evals.ipynb
@@ -1235,6 +1235,10 @@
"\n",
"The **AlpacaEval** {cite}`dubois2024lengthcontrolledalpacaevalsimpleway` and **MT-Bench** {cite}`zheng2023judgingllmasajudgemtbenchchatbot` Leaderboards implement automated evaluation using GPT-4 to assess model performance in multi-turn conversations. This approach enables consistent assessment of dialogue capabilities while reducing human bias. Their methodology measures key aspects of conversational AI, including contextual understanding and response consistency across multiple exchanges.\n",
"\n",
"\n",
"An important recent development was the release of Global-MMLU {cite}`singh2024globalmmluunderstandingaddressing`, an improved version of MMLU with evaluation coverage across 42 languages. This open dataset, built through collaboration between Argilla, the Hugging Face community, and researchers from leading institutions like Cohere For AI, Mila, MIT, and others, represents a significant step toward more inclusive multilingual LLM evaluation. Over 200 contributors used Argilla to annotate MMLU questions, revealing that 85% of questions requiring specific cultural knowledge were Western-centric. The newly released dataset is divided into two key subsets: Culturally Agnostic questions that require no specific regional or cultural knowledge, and Culturally Sensitive questions that depend on dialect, cultural, or geographic knowledge. With high-quality translations available for 25 languages, Global-MMLU enables better understanding of LLM capabilities and limitations across different languages and cultural contexts.\n",
"\n",
"\n",
"A major challenge with these leaderboards and benchmarks is test set contamination - when test data ends up in newer models' training sets, rendering the benchmarks ineffective. While some benchmarks try to address this through crowdsourced prompts and evaluations from humans or LLMs, these approaches introduce their own biases and struggle with difficult questions. **LiveBench** {cite}`white2024livebenchchallengingcontaminationfreellm` represents a novel solution, designed specifically to be resilient to both contamination and evaluation biases. As the first benchmark with continuously updated questions from recent sources, automated objective scoring, and diverse challenging tasks across multiple domains, LiveBench maintains its effectiveness even as models improve. Drawing from recent math competitions, research papers, news, and datasets, it creates contamination-free versions of established benchmark tasks. Current results show even top models achieving below 70% accuracy, demonstrating LiveBench's ability to meaningfully differentiate model capabilities. With monthly updates and an open collaborative approach, LiveBench aims to provide sustained value for model evaluation as the field advances.\n",
"\n",
"Another notable benchmark is ZebraLogic {cite}`zebralogic2024`, which evaluates logical reasoning capabilities of LLMs through Logic Grid Puzzles - a type of Constraint Satisfaction Problem {cite}`brailsford1999constraint` commonly found in tests like the LSAT. These puzzles require assigning unique values to N houses across M different features based on given clues, demanding strategic reasoning and deduction to arrive at a unique correct solution. The benchmark's programmatically generated puzzles range from 2x2 to 6x6 in size and test LLMs using one-shot examples with reasoning steps. While humans can solve these puzzles through strategic methods like reductio ad absurdum and elimination, LLMs demonstrate significant limitations in this type of logical reasoning. Even the best-performing model, Claude 3.5 Sonnet, only achieves 33.4% accuracy across all puzzles and 12.4% on hard puzzles, with smaller models (7-10B parameters) solving less than 1% of hard puzzles as of December 2024. These results reveal critical gaps in LLMs' capabilities around counterfactual thinking, reflective reasoning, structured memorization, and compositional generalization.\n",
@@ -2549,6 +2553,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Citation\n",
"[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]\n",
"\n",
"[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/\n",
"[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png\n",
"[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC-BY--NC--SA-4.0-lightgrey.svg\n",
"\n",
"```\n",
"@misc{tharsistpsouza2024tamingllms,\n",
" author = {Tharsis T. P. Souza},\n",
" title = {Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software},\n",
" year = {2024},\n",
" chapter = {The Evals Gap},\n",
" journal = {GitHub repository},\n",
" url = {https://github.com/souzatharsis/tamingLLMs)\n",
"}\n",
"```\n",
"## References\n",
"```{bibliography}\n",
":filter: docname in docnames\n",
26 changes: 24 additions & 2 deletions tamingllms/_build/html/_sources/notebooks/output_size_limit.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Output Size Limitations\n",
"# Output Size Limit\n",
"```{epigraph}\n",
"Only those who will risk going too far can possibly find out how far one can go.\n",
"\n",
@@ -467,8 +467,30 @@
"\n",
"## Conclusion\n",
"\n",
"In conclusion, while managing output size limitations in LLMs can be challenging, it also drives innovation in application design and optimization strategies. By implementing techniques such as context chunking, efficient prompt templates, and graceful fallbacks, developers can mitigate these limitations and enhance the performance of their applications. As the technology evolves, advancements in contextual awareness, token efficiency, and memory management will further mitigate these limitations, empowering developers to build more robust and scalable LLM-powered systems.\n",
"In conclusion, while managing output size limitations in LLMs can be challenging, it also drives innovation in application design and optimization strategies. By implementing techniques such as context chunking, efficient prompt templates, and graceful fallbacks, developers can mitigate these limitations and enhance the performance of their applications. As the technology evolves, advancements in contextual awareness, token efficiency, and memory management will further mitigate these limitations, empowering developers to build more robust and scalable LLM-powered systems."
]
},
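
The chunking technique named in the conclusion above lends itself to a short sketch. The version below is a hedged illustration, not the chapter's exact implementation: it splits on words rather than tokens (a real pipeline would count tokens with something like `tiktoken`), and `summarize` stands in for any LLM call.

```python
# Sketch of context chunking with overlap, then a merge ("summary of
# summaries") step. Word-based splitting is a stand-in for true token
# counting; `summarize` is a placeholder for an LLM call.
from typing import Callable

def chunk_text(text: str, max_words: int = 500, overlap: int = 50) -> list[str]:
    words = text.split()
    step = max_words - overlap  # slide the window so consecutive chunks share context
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

def summarize_long_document(text: str, summarize: Callable[[str], str]) -> str:
    partial_summaries = [summarize(chunk) for chunk in chunk_text(text)]
    return summarize("\n".join(partial_summaries))  # merge step
```
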
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Citation\n",
"[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]\n",
"\n",
"[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/\n",
"[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png\n",
"[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC-BY--NC--SA-4.0-lightgrey.svg\n",
"\n",
"```\n",
"@misc{tharsistpsouza2024tamingllms,\n",
" author = {Tharsis T. P. Souza},\n",
" title = {Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software},\n",
" year = {2024},\n",
" chapter = {Output Size Limit},\n",
" journal = {GitHub repository},\n",
" url = {https://github.com/souzatharsis/tamingLLMs)\n",
"}\n",
"```\n",
"\n",
"## References\n",
"```{bibliography}\n",
17 changes: 17 additions & 0 deletions tamingllms/_build/html/_sources/notebooks/safety.ipynb
@@ -2492,6 +2492,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Citation\n",
"[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]\n",
"\n",
"[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/\n",
"[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png\n",
"[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC-BY--NC--SA-4.0-lightgrey.svg\n",
"\n",
"```\n",
"@misc{tharsistpsouza2024tamingllms,\n",
" author = {Tharsis T. P. Souza},\n",
" title = {Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software},\n",
" year = {2024},\n",
" chapter = {Safety},\n",
" journal = {GitHub repository},\n",
" url = {https://github.com/souzatharsis/tamingLLMs)\n",
"}\n",
"```\n",
"## References\n",
"```{bibliography}\n",
":filter: docname in docnames\n",
17 changes: 17 additions & 0 deletions tamingllms/_build/html/_sources/notebooks/structured_output.ipynb
@@ -1111,6 +1111,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Citation\n",
"[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]\n",
"\n",
"[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/\n",
"[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png\n",
"[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC-BY--NC--SA-4.0-lightgrey.svg\n",
"\n",
"```\n",
"@misc{tharsistpsouza2024tamingllms,\n",
" author = {Tharsis T. P. Souza},\n",
" title = {Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software},\n",
" year = {2024},\n",
" chapter = {Wrestling with Structured Output},\n",
" journal = {GitHub repository},\n",
" url = {https://github.com/souzatharsis/tamingLLMs)\n",
"}\n",
"```\n",
"## References\n",
"```{bibliography}\n",
":filter: docname in docnames\n",
2 changes: 1 addition & 1 deletion tamingllms/_build/html/genindex.html
@@ -134,7 +134,7 @@

<li class="toctree-l1 ">

<a href="notebooks/output_size_limit.html" class="reference internal ">Output Size Limitations</a>
<a href="notebooks/output_size_limit.html" class="reference internal ">Output Size Limit</a>



8 changes: 4 additions & 4 deletions tamingllms/_build/html/markdown/intro.html
@@ -38,7 +38,7 @@

<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="3. Output Size Limitations" href="../notebooks/output_size_limit.html" />
<link rel="next" title="3. Output Size Limit" href="../notebooks/output_size_limit.html" />
<link rel="prev" title="1. Preface" href="preface.html" />
</head>

@@ -152,7 +152,7 @@

<li class="toctree-l1 ">

<a href="../notebooks/output_size_limit.html" class="reference internal ">Output Size Limitations</a>
<a href="../notebooks/output_size_limit.html" class="reference internal ">Output Size Limit</a>



@@ -218,7 +218,7 @@
</li>
<li class="next">
<a href="../notebooks/output_size_limit.html"
title="next chapter"><span class="section-number">3. </span>Output Size Limitations</a>
title="next chapter"><span class="section-number">3. </span>Output Size Limit</a>
</li>
</ul>

@@ -449,7 +449,7 @@ <h2><a class="toc-backref" href="#id15" role="doc-backlink"><span class="section
</li>
<li class="next">
<a href="../notebooks/output_size_limit.html"
title="next chapter"><span class="section-number">3. </span>Output Size Limitations</a>
title="next chapter"><span class="section-number">3. </span>Output Size Limit</a>
</li>
</ul><div class="footer" role="contentinfo">
<br>
6 changes: 3 additions & 3 deletions tamingllms/_build/html/markdown/preface.html
@@ -134,7 +134,7 @@

<li class="toctree-l1 ">

<a href="../notebooks/output_size_limit.html" class="reference internal ">Output Size Limitations</a>
<a href="../notebooks/output_size_limit.html" class="reference internal ">Output Size Limit</a>



@@ -214,7 +214,7 @@ <h1><span class="section-number">1. </span>Preface<a class="headerlink" href="#p
<div><p>Models tell you merely what something is like, not what something is.</p>
<p class="attribution">—Emanuel Derman</p>
</div></blockquote>
<p>An alternative title of this book could have been “Language Models Behaving Badly”. If you are coming from a background in financial modeling, you may have noticed the parallel with Emanuel Derman’s seminal work “Models.Behaving.Badly” <span id="id1">[<a class="reference internal" href="#id124" title="E. Derman. Models.Behaving.Badly.: Why Confusing Illusion with Reality Can Lead to Disaster, on Wall Street and in Life. Free Press, 2011. ISBN 9781439165010. URL: https://books.google.co.uk/books?id=lke_cwM4wm8C.">Derman, 2011</a>]</span>. This parallel is not coincidental. Just as Derman cautioned against treating financial models as perfect representations of reality, this book aims to highlight the limitations and pitfalls of Large Language Models (LLMs) in practical applications (of course barring the fact Derman is an actual physicist and legendary author, professor and quant; I am not).</p>
<p>An alternative title of this book could have been “Language Models Behaving Badly”. If you are coming from a background in financial modeling, you may have noticed the parallel with Emanuel Derman’s seminal work “Models.Behaving.Badly” <span id="id1">[<a class="reference internal" href="#id125" title="E. Derman. Models.Behaving.Badly.: Why Confusing Illusion with Reality Can Lead to Disaster, on Wall Street and in Life. Free Press, 2011. ISBN 9781439165010. URL: https://books.google.co.uk/books?id=lke_cwM4wm8C.">Derman, 2011</a>]</span>. This parallel is not coincidental. Just as Derman cautioned against treating financial models as perfect representations of reality, this book aims to highlight the limitations and pitfalls of Large Language Models (LLMs) in practical applications (of course barring the fact Derman is an actual physicist and legendary author, professor and quant; I am not).</p>
<p>The book “Models.Behaving.Badly” by Emanuel Derman, a former physicist and Goldman Sachs quant, explores how financial and scientific models can fail when we mistake them for reality rather than treating them as approximations full of assumptions.
The core premise of his work is that while models can be useful tools for understanding aspects of the world, they inherently involve simplification and assumptions. Derman argues that many financial crises, including the 2008 crash, occurred partly because people put too much faith in mathematical models without recognizing their limitations.</p>
<p>Like financial models that failed to capture the complexity of human behavior and market dynamics, LLMs have inherent constraints. They can hallucinate facts, struggle with logical reasoning, and fail to maintain consistency across long outputs. Their responses, while often convincing, are probabilistic approximations based on training data rather than true understanding even though humans insist on treating them as “machines that can reason”.</p>
@@ -224,7 +224,7 @@ <h1><span class="section-number">1. </span>Preface<a class="headerlink" href="#p
<section id="references">
<h2><span class="section-number">1.1. </span>References<a class="headerlink" href="#references" title="Permalink to this heading"></a></h2>
<div class="docutils container" id="id2">
<div class="citation" id="id124" role="doc-biblioentry">
<div class="citation" id="id125" role="doc-biblioentry">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="#id1">Der11</a><span class="fn-bracket">]</span></span>
<p>E. Derman. <em>Models.Behaving.Badly.: Why Confusing Illusion with Reality Can Lead to Disaster, on Wall Street and in Life</em>. Free Press, 2011. ISBN 9781439165010. URL: <a class="reference external" href="https://books.google.co.uk/books?id=lke_cwM4wm8C">https://books.google.co.uk/books?id=lke_cwM4wm8C</a>.</p>
</div>
2 changes: 1 addition & 1 deletion tamingllms/_build/html/markdown/toc.html
@@ -127,7 +127,7 @@

<li class="toctree-l1 ">

<a href="../notebooks/output_size_limit.html" class="reference internal ">Output Size Limitations</a>
<a href="../notebooks/output_size_limit.html" class="reference internal ">Output Size Limit</a>


