Commit

update TOC
souzatharsis committed Nov 27, 2024
1 parent 9fa6ef2 commit 3e36772
Showing 12 changed files with 475 additions and 274 deletions.
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file modified tamingllms/_build/.doctrees/markdown/intro.doctree
Binary file modified tamingllms/_build/.doctrees/markdown/toc.doctree
16 changes: 12 additions & 4 deletions tamingllms/_build/html/_sources/markdown/intro.md
@@ -106,7 +106,7 @@ To make the most of this book, you should have:

Before diving into the examples in this book, you'll need to set up your development environment. Here's how to get started:

### 1. Python Environment Setup
### Python Environment Setup
```bash
# Create and activate a virtual environment
python -m venv llm-book-env
@@ -116,7 +116,7 @@
source llm-book-env/bin/activate  # On Windows, use: llm-book-env\Scripts\activate
pip install -r requirements.txt
```
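If you are unsure whether the activation step worked, a quick check from inside Python is the following sketch (not part of the book's own setup code):

```python
import sys

# Illustrative sanity check: in an activated virtual environment,
# sys.prefix points at the venv directory, while sys.base_prefix
# still points at the system Python installation.
def in_virtualenv() -> bool:
    return sys.prefix != sys.base_prefix

print("virtualenv active:", in_virtualenv())
```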

### 2. API Keys Configuration
### API Keys Configuration
1. Create a `.env` file in the root directory of the project.
2. Add your API keys and other sensitive information to the `.env` file. For example:

@@ -128,7 +128,7 @@ pip install -r requirements.txt
Never share your `.env` file or commit it to version control. It contains sensitive information that should be kept private.
```
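The book's requirements likely include a helper such as python-dotenv for this step; as a sketch of what such a loader does under the hood, here is a minimal stdlib-only `.env` reader. The key name `OPENAI_API_KEY` is illustrative, not necessarily one the book uses:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: reads KEY=value lines, skipping blanks,
    '#' comments, and malformed lines (illustrative sketch only)."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Existing environment variables win over .env entries
            os.environ.setdefault(key.strip(), value.strip())

load_env()
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)  # key name is illustrative
```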

### 3. Code Repository
### Code Repository
Clone the book's companion repository:
```bash
git clone https://github.com/souzatharsis/tamingllms.git
cd tamingllms
```

@@ -140,4 +140,12 @@
- For package conflicts, try creating a fresh virtual environment or using a package manager like `poetry`
- Check the book's repository issues page for known problems and solutions

Now that your environment is set up, let's begin our exploration of LLM challenges.

## About the Author(s)

Dr. Tharsis Souza is a computer scientist and product leader specializing in AI-based product development. He is a Lecturer at Columbia University's Master of Science program in Applied Analytics, Head of Product, Equities at Citadel, and former Senior VP at Two Sigma Investments.

With over 15 years of experience delivering technology products across startups and Fortune 500 companies globally, Dr. Souza is the author of numerous scholarly publications and a frequent speaker at academic and business conferences. Grounded in an academic background and drawing on practical experience building and scaling products powered by language models at early-stage startups and major institutions, advising non-profit organizations, and contributing to open source projects, he brings a unique perspective on bridging the gap between LLMs' promised potential and their practical limitations, using open source tools to enable the next generation of AI-powered products.

Dr. Souza holds a Ph.D. in Computer Science from UCL, University of London, following an M.Phil. and an M.Sc. in Computer Science and a B.Sc. in Computer Engineering.
199 changes: 118 additions & 81 deletions tamingllms/_build/html/_sources/markdown/toc.md
@@ -8,93 +8,130 @@ date: "2024-11-22"
*A Practical Guide to LLM Pitfalls with Python Examples*

## Chapter 1: Introduction
- The Hidden Challenges of LLMs
- Why This Book Matters
- Overview of Key Problems
- 1.1 Core Challenges We'll Address
- 1.2 A Practical Approach
- 1.3 A Note on Perspective
- 1.4 Who This Book Is For
- 1.5 Outcomes
- 1.6 Prerequisites
- 1.7 Setting Up Your Environment
- 1.7.1 Python Environment Setup
- 1.7.2 API Keys Configuration
- 1.7.3 Code Repository
- 1.7.4 Troubleshooting Common Issues
- 1.8 About the Author(s)

## Chapter 2: Non-determinism & Evals
- Understanding Non-deterministic Behavior in LLMs
- Temperature and Randomness Effects
- Evaluation Challenges
- Measuring Consistency
- Testing Non-deterministic Systems
- Observability
- Logging Strategies
- Monitoring Solutions
- Debugging Non-deterministic Responses
- Practical Solutions and Patterns
- Implementing Deterministic Workflows
- Testing Strategies
## Chapter 2: Wrestling with Structured Output
- 2.1 The Structured Output Challenge
- 2.2 Problem Statement
- 2.3 Solutions
- 2.3.1 Strategies
- 2.3.2 Techniques and Tools
- 2.3.2.1 One-Shot Prompts
- 2.3.2.2 Structured Output with Provider-Specific APIs
- 2.3.2.2.1 JSON Mode
- 2.3.2.2.2 Structured Output Mode
- 2.3.2.3 LangChain
- 2.3.2.4 Outlines
- 2.3.2.4.1 Multiple Choice Generation
- 2.3.2.4.2 Pydantic model
- 2.4 Discussion
- 2.4.1 Comparing Solutions
- 2.4.2 Best Practices
- 2.4.3 Research & Ongoing Debate
- 2.5 Conclusion
- 2.6 Acknowledgements
- 2.7 References

## Chapter 3: Wrestling with Structured Output
- The Structured Output Challenge
- Common Failure Modes
- Text Output Inconsistencies
- Implementation Patterns
- Output Validation
- Error Recovery
- Format Enforcement
- Best Practices for Reliable Output
- Testing Structured Responses
## Chapter 3: Input Size and Length Limitations
- 3.1 Context Window Constraints
- 3.2 Handling Long Inputs
- 3.3 Managing Token Limits
- 3.4 Chunking Strategies
- 3.5 Implementation Patterns
- 3.6 Testing Long-form Content

## Chapter 4: Hallucination: The Reality Gap
- Understanding Hallucination Types
- Detection Strategies
- Grounding Techniques
- Retrieval-Augmented Generation (RAG)
- Context Selection
- Indexing Strategies
- Vector Stores
- Chunking Methods
- Practical Implementation
- Building a RAG Pipeline
- Testing and Validation
## Chapter 4: Output Size and Length Limitations
- 4.1 What are Token Limits?
- 4.2 Problem Statement
- 4.3 Content Chunking with Contextual Linking
- 4.3.1 Generating long-form content
- 4.3.2 Step 1: Chunking the Content
- 4.3.3 Step 2: Writing the Base Prompt Template
- 4.3.4 Step 3: Constructing Dynamic Prompt Parameters
- 4.3.5 Step 4: Generating the Report
- 4.3.6 Example Usage
- 4.4 Discussion
- 4.5 Implications
- 4.6 Future Considerations
- 4.7 Conclusion
- 4.8 References

## Chapter 5: The Cost Factor
- Understanding LLM Costs
- Token Optimization
- Caching Strategies
- Implementation Patterns
- Cache Invalidation
- Output Prediction Techniques
- Cost Monitoring
- Optimization Strategies
## Chapter 5: Challenges of Evaluating LLM-based Applications
- 5.1 Non-Deterministic Machines
- 5.1.1 Temperature and Sampling
- 5.1.2 The Temperature Spectrum
- 5.2 Emerging Properties
- 5.3 Problem Statement
- 5.4 Evals Design
- 5.4.1 Conceptual Overview
- 5.4.2 Design Considerations
- 5.4.3 Key Components
- 5.4.4 Metrics
- 5.4.4.1 Working Example
- 5.4.4.2 Considerations
- 5.4.5 Evaluators
- 5.4.5.1 Model-Based Evaluation
- 5.4.5.2 Human-Based Evaluation
- 5.4.6 Leaderboards
- 5.4.7 Tools
- 5.5 References

## Chapter 6: Safety Concerns
- Common Safety Issues
- Implementation of Safety Guards
- Content Filtering
- Input Validation
- Output Sanitization
- Monitoring and Alerts
- Best Practices
## Chapter 6: Hallucination: The Reality Gap
- 6.1 Understanding Hallucination Types
- 6.2 Detection Strategies
- 6.3 Grounding Techniques
- 6.4 Retrieval-Augmented Generation (RAG)
- 6.4.1 Context Selection
- 6.4.2 Indexing Strategies
- 6.4.3 Vector Stores
- 6.4.4 Chunking Methods
- 6.5 Practical Implementation
- 6.5.1 Building a RAG Pipeline
- 6.5.2 Testing and Validation

## Chapter 7: Size and Length Limitations
- Context Window Constraints
- Handling Long Inputs
- Managing Token Limits
- Chunking Strategies
- Implementation Patterns
- Testing Long-form Content
## Chapter 7: Safety Concerns
- 7.1 Common Safety Issues
- 7.2 Implementation of Safety Guards
- 7.3 Content Filtering
- 7.4 Input Validation
- 7.5 Output Sanitization
- 7.6 Monitoring and Alerts
- 7.7 Best Practices

## Chapter 8: Breaking Free from Cloud Providers
- The Vendor Lock-in Problem
- Self-hosting Solutions
- Llama 2 Implementation
- Llamafile Setup and Usage
- Ollama Deployment
- Performance Considerations
- Cost Analysis
- Migration Strategies
## Chapter 8: The Cost Factor
- 8.1 Understanding LLM Costs
- 8.2 Token Optimization
- 8.3 Caching Strategies
- 8.3.1 Implementation Patterns
- 8.3.2 Cache Invalidation
- 8.4 Output Prediction Techniques
- 8.5 Cost Monitoring
- 8.6 Optimization Strategies

## Appendix A: Code Examples
- Complete Implementation Examples
- Testing Scripts
- Utility Functions
- Configuration Templates
## Chapter 9: Breaking Free from Cloud Providers
- 9.1 The Vendor Lock-in Problem
- 9.2 Self-hosting Solutions
- 9.2.1 Llama Implementation
- 9.2.2 Llamafile Setup and Usage
- 9.2.3 Ollama Deployment
- 9.3 Performance Considerations
- 9.4 Cost Analysis
- 9.5 Migration Strategies

## Appendix B: Tools and Resources
- Recommended Libraries
- Testing Tools
- Monitoring Solutions
- Community Resources

## Appendix A: Tools and Resources
- A.1 Evaluation Tools
- A.2 Monitoring Solutions
- A.3 Open Source Models
- A.4 Community Resources
21 changes: 15 additions & 6 deletions tamingllms/_build/html/markdown/intro.html
@@ -130,6 +130,8 @@

<li class="toctree-l2"><a href="#setting-up-your-environment" class="reference internal">Setting Up Your Environment</a></li>

<li class="toctree-l2"><a href="#about-the-author-s" class="reference internal">About the Author(s)</a></li>

</ul>

</li>
@@ -212,12 +214,13 @@
<li><p><a class="reference internal" href="#prerequisites" id="id7">Prerequisites</a></p></li>
<li><p><a class="reference internal" href="#setting-up-your-environment" id="id8">Setting Up Your Environment</a></p>
<ul>
<li><p><a class="reference internal" href="#python-environment-setup" id="id9">1. Python Environment Setup</a></p></li>
<li><p><a class="reference internal" href="#api-keys-configuration" id="id10">2. API Keys Configuration</a></p></li>
<li><p><a class="reference internal" href="#code-repository" id="id11">3. Code Repository</a></p></li>
<li><p><a class="reference internal" href="#python-environment-setup" id="id9">Python Environment Setup</a></p></li>
<li><p><a class="reference internal" href="#api-keys-configuration" id="id10">API Keys Configuration</a></p></li>
<li><p><a class="reference internal" href="#code-repository" id="id11">Code Repository</a></p></li>
<li><p><a class="reference internal" href="#troubleshooting-common-issues" id="id12">Troubleshooting Common Issues</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="#about-the-author-s" id="id13">About the Author(s)</a></p></li>
</ul>
</li>
</ul>
@@ -307,7 +310,7 @@ <h2><a class="toc-backref" href="#id7" role="doc-backlink"><span class="section-
<h2><a class="toc-backref" href="#id8" role="doc-backlink"><span class="section-number">1.7. </span>Setting Up Your Environment</a><a class="headerlink" href="#setting-up-your-environment" title="Permalink to this heading"></a></h2>
<p>Before diving into the examples in this book, you’ll need to set up your development environment. Here’s how to get started:</p>
<section id="python-environment-setup">
<h3><a class="toc-backref" href="#id9" role="doc-backlink"><span class="section-number">1.7.1. </span>1. Python Environment Setup</a><a class="headerlink" href="#python-environment-setup" title="Permalink to this heading"></a></h3>
<h3><a class="toc-backref" href="#id9" role="doc-backlink"><span class="section-number">1.7.1. </span>Python Environment Setup</a><a class="headerlink" href="#python-environment-setup" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Create and activate a virtual environment</span>
python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span>llm-book-env
<span class="nb">source</span><span class="w"> </span>llm-book-env/bin/activate<span class="w"> </span><span class="c1"># On Windows, use: llm-book-env\Scripts\activate</span>
@@ -318,7 +321,7 @@ <h3><a class="toc-backref" href="#id9" role="doc-backlink"><span class="section-
</div>
</section>
<section id="api-keys-configuration">
<h3><a class="toc-backref" href="#id10" role="doc-backlink"><span class="section-number">1.7.2. </span>2. API Keys Configuration</a><a class="headerlink" href="#api-keys-configuration" title="Permalink to this heading"></a></h3>
<h3><a class="toc-backref" href="#id10" role="doc-backlink"><span class="section-number">1.7.2. </span>API Keys Configuration</a><a class="headerlink" href="#api-keys-configuration" title="Permalink to this heading"></a></h3>
<ol class="arabic">
<li><p>Create a <code class="docutils literal notranslate"><span class="pre">.env</span></code> file in the root directory of the project.</p></li>
<li><p>Add your API keys and other sensitive information to the <code class="docutils literal notranslate"><span class="pre">.env</span></code> file. For example:</p>
@@ -333,7 +336,7 @@ <h3><a class="toc-backref" href="#id10" role="doc-backlink"><span class="section
</div>
</section>
<section id="code-repository">
<h3><a class="toc-backref" href="#id11" role="doc-backlink"><span class="section-number">1.7.3. </span>3. Code Repository</a><a class="headerlink" href="#code-repository" title="Permalink to this heading"></a></h3>
<h3><a class="toc-backref" href="#id11" role="doc-backlink"><span class="section-number">1.7.3. </span>Code Repository</a><a class="headerlink" href="#code-repository" title="Permalink to this heading"></a></h3>
<p>Clone the book’s companion repository:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/souzatharsis/tamingllms.git
<span class="nb">cd</span><span class="w"> </span>tamingllms
@@ -350,6 +353,12 @@ <h3><a class="toc-backref" href="#id12" role="doc-backlink"><span class="section
<p>Now that your environment is set up, let’s begin our exploration of LLM challenges.</p>
</section>
</section>
<section id="about-the-author-s">
<h2><a class="toc-backref" href="#id13" role="doc-backlink"><span class="section-number">1.8. </span>About the Author(s)</a><a class="headerlink" href="#about-the-author-s" title="Permalink to this heading"></a></h2>
<p>Dr. Tharsis Souza is a computer scientist and product leader specializing in AI-based product development. He is a Lecturer at Columbia University’s Master of Science program in Applied Analytics, Head of Product, Equities at Citadel, and former Senior VP at Two Sigma Investments.</p>
<p>With over 15 years of experience delivering technology products across startups and Fortune 500 companies globally, Dr. Souza is the author of numerous scholarly publications and a frequent speaker at academic and business conferences. Grounded in an academic background and drawing on practical experience building and scaling products powered by language models at early-stage startups and major institutions, advising non-profit organizations, and contributing to open source projects, he brings a unique perspective on bridging the gap between LLMs’ promised potential and their practical limitations, using open source tools to enable the next generation of AI-powered products.</p>
<p>Dr. Souza holds a Ph.D. in Computer Science from UCL, University of London, following an M.Phil. and an M.Sc. in Computer Science and a B.Sc. in Computer Engineering.</p>
</section>
</section>

<script type="text/x-thebe-config">
