
Commit

deploy: 4f8fb94
MadcowD committed Dec 16, 2024
1 parent b32dfff commit 84cd8d3
Showing 3 changed files with 1 addition and 9 deletions.
4 changes: 0 additions & 4 deletions _sources/core_concepts/evaluations.rst.txt
@@ -14,10 +14,6 @@ Prompt engineering without evaluations is often characterized by subjective asse

Without evaluations, there is no systematic way to ensure that a revised prompt actually improves performance on the desired tasks. There is no guarantee that adjusting a single detail in the prompt to improve outputs on one example does not degrade outputs elsewhere. Over time, as prompt engineers read through too many model responses, they become either desensitized to quality issues or hypersensitive to minor flaws. This miscalibration saps productivity and leads to unprincipled prompt tuning. Subjective judgment cannot scale, fails to capture statistical performance trends, and offers no verifiable path to satisfy external stakeholders who demand reliability, accuracy, or compliance with given standards.
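The systematic alternative this paragraph argues for can be sketched as a tiny evaluation harness that scores each prompt version against a fixed dataset, so a change is judged by an aggregate metric rather than by intuition. This is a hypothetical illustration, not ell's actual API; `run_model` is a stub standing in for a real model call, and `exact_match` is one of many possible metrics.

```python
# Hypothetical sketch: compare two prompt versions on the same fixed
# dataset, so the "diff" between versions carries a score, not just code.

def run_model(prompt: str, example: str) -> str:
    # Stand-in for a real LLM call; deterministic so scores are stable.
    return f"{prompt}: {example}"

def exact_match(output: str, expected: str) -> float:
    # Binary metric: 1.0 on an exact match, 0.0 otherwise.
    return 1.0 if output.strip() == expected.strip() else 0.0

def evaluate(prompt: str, dataset: list[tuple[str, str]]) -> float:
    """Mean exact-match score of `prompt` over (input, expected) pairs."""
    scores = [exact_match(run_model(prompt, x), y) for x, y in dataset]
    return sum(scores) / len(scores)

dataset = [("Hello", "greet: Hello"), ("Bye", "greet: Bye")]
print(evaluate("greet", dataset))  # → 1.0
print(evaluate("Greet", dataset))  # → 0.0  (a "small tweak" measurably regresses)
```

Because both versions are scored on the same data, a regression introduced by a seemingly harmless tweak shows up as a drop in the aggregate score rather than going unnoticed.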

.. note::

The intuitive, trial-and-error style of prompt engineering can be visually depicted. Imagine a simple diagram in ell Studio (ell’s local, version-controlled dashboard) that shows a single prompt evolving over time, each modification recorded and compared. Without evaluations, this “diff” of prompt versions tells us only that the code changed—not whether it changed for the better.


The Concept of Evals
--------------------
4 changes: 0 additions & 4 deletions core_concepts/evaluations.html
@@ -355,10 +355,6 @@ <h1>Evaluations (New)<a class="headerlink" href="#evaluations-new" title="Link t
<h2>The Problem of Prompt Engineering by Intuition<a class="headerlink" href="#the-problem-of-prompt-engineering-by-intuition" title="Link to this heading" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#the-problem-of-prompt-engineering-by-intuition'"></a></h2>
<p>Prompt engineering without evaluations is often characterized by subjective assessments that vary from day to day and person to person. In simple projects, this might suffice. For example, when producing a handful of short marketing texts, a developer might be content to trust personal taste as the measure of success. However, as soon as the problem grows beyond a few trivial examples, this style of iterative tweaking collapses. With more complex tasks, larger data distributions, and subtle constraints—such as maintaining a specific tone or meeting domain-specific requirements—subjective judgments no longer yield consistent or reliable improvements.</p>
<p>Without evaluations, there is no systematic way to ensure that a revised prompt actually improves performance on the desired tasks. There is no guarantee that adjusting a single detail in the prompt to improve outputs on one example does not degrade outputs elsewhere. Over time, as prompt engineers read through too many model responses, they become either desensitized to quality issues or hypersensitive to minor flaws. This miscalibration saps productivity and leads to unprincipled prompt tuning. Subjective judgment cannot scale, fails to capture statistical performance trends, and offers no verifiable path to satisfy external stakeholders who demand reliability, accuracy, or compliance with given standards.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The intuitive, trial-and-error style of prompt engineering can be visually depicted. Imagine a simple diagram in ell Studio (ell’s local, version-controlled dashboard) that shows a single prompt evolving over time, each modification recorded and compared. Without evaluations, this “diff” of prompt versions tells us only that the code changed—not whether it changed for the better.</p>
</div>
</section>
<section id="the-concept-of-evals">
<h2>The Concept of Evals<a class="headerlink" href="#the-concept-of-evals" title="Link to this heading" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#the-concept-of-evals'"></a></h2>
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
