
Commit

deploy: 4f8fb94
MadcowD committed Dec 16, 2024
1 parent b32dfff commit 84cd8d3
Showing 3 changed files with 1 addition and 9 deletions.
4 changes: 0 additions & 4 deletions _sources/core_concepts/evaluations.rst.txt
@@ -14,10 +14,6 @@ Prompt engineering without evaluations is often characterized by subjective asse

Without evaluations, there is no systematic way to ensure that a revised prompt actually improves performance on the desired tasks. There is no guarantee that adjusting a single detail in the prompt to improve outputs on one example does not degrade outputs elsewhere. Over time, as prompt engineers read through too many model responses, they become either desensitized to quality issues or hypersensitive to minor flaws. This miscalibration saps productivity and leads to unprincipled prompt tuning. Subjective judgment cannot scale, fails to capture statistical performance trends, and offers no verifiable path to satisfy external stakeholders who demand reliability, accuracy, or compliance with given standards.
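The systematic alternative this paragraph argues for can be sketched as a tiny evaluation harness that scores each prompt version against a fixed dataset, so a change is judged by an aggregate metric rather than by intuition. This is a hypothetical illustration, not ell's actual API; `run_model` is a stub standing in for a real model call, and `exact_match` is one of many possible metrics.

```python
# Hypothetical sketch: compare two prompt versions on the same fixed
# dataset, so the "diff" between versions carries a score, not just code.

def run_model(prompt: str, example: str) -> str:
    # Stand-in for a real LLM call; deterministic so scores are stable.
    return f"{prompt}: {example}"

def exact_match(output: str, expected: str) -> float:
    # Binary metric: 1.0 on an exact match, 0.0 otherwise.
    return 1.0 if output.strip() == expected.strip() else 0.0

def evaluate(prompt: str, dataset: list[tuple[str, str]]) -> float:
    """Mean exact-match score of `prompt` over (input, expected) pairs."""
    scores = [exact_match(run_model(prompt, x), y) for x, y in dataset]
    return sum(scores) / len(scores)

dataset = [("Hello", "greet: Hello"), ("Bye", "greet: Bye")]
print(evaluate("greet", dataset))  # → 1.0
print(evaluate("Greet", dataset))  # → 0.0  (a "small tweak" measurably regresses)
```

Because both versions are scored on the same data, a regression introduced by a seemingly harmless tweak shows up as a drop in the aggregate score rather than going unnoticed.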

.. note::

The intuitive, trial-and-error style of prompt engineering can be visually depicted. Imagine a simple diagram in ell Studio (ell’s local, version-controlled dashboard) that shows a single prompt evolving over time, each modification recorded and compared. Without evaluations, this “diff” of prompt versions tells us only that the code changed—not whether it changed for the better.


The Concept of Evals
--------------------
4 changes: 0 additions & 4 deletions core_concepts/evaluations.html
@@ -355,10 +355,6 @@ <h1>Evaluations (New)<a class="headerlink" href="#evaluations-new" title="Link t
<h2>The Problem of Prompt Engineering by Intuition<a class="headerlink" href="#the-problem-of-prompt-engineering-by-intuition" title="Link to this heading" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#the-problem-of-prompt-engineering-by-intuition'"></a></h2>
<p>Prompt engineering without evaluations is often characterized by subjective assessments that vary from day to day and person to person. In simple projects, this might suffice. For example, when producing a handful of short marketing texts, a developer might be content to trust personal taste as the measure of success. However, as soon as the problem grows beyond a few trivial examples, this style of iterative tweaking collapses. With more complex tasks, larger data distributions, and subtle constraints—such as maintaining a specific tone or meeting domain-specific requirements—subjective judgments no longer yield consistent or reliable improvements.</p>
<p>Without evaluations, there is no systematic way to ensure that a revised prompt actually improves performance on the desired tasks. There is no guarantee that adjusting a single detail in the prompt to improve outputs on one example does not degrade outputs elsewhere. Over time, as prompt engineers read through too many model responses, they become either desensitized to quality issues or hypersensitive to minor flaws. This miscalibration saps productivity and leads to unprincipled prompt tuning. Subjective judgment cannot scale, fails to capture statistical performance trends, and offers no verifiable path to satisfy external stakeholders who demand reliability, accuracy, or compliance with given standards.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The intuitive, trial-and-error style of prompt engineering can be visually depicted. Imagine a simple diagram in ell Studio (ell’s local, version-controlled dashboard) that shows a single prompt evolving over time, each modification recorded and compared. Without evaluations, this “diff” of prompt versions tells us only that the code changed—not whether it changed for the better.</p>
</div>
</section>
<section id="the-concept-of-evals">
<h2>The Concept of Evals<a class="headerlink" href="#the-concept-of-evals" title="Link to this heading" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#the-concept-of-evals'"></a></h2>
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
