eval update
MadcowD committed Oct 4, 2024
1 parent cd64ab9 commit 6afad20
Showing 1 changed file with 0 additions and 40 deletions.
40 changes: 0 additions & 40 deletions examples/eval.py
@@ -164,43 +164,3 @@ def summarizer(text: str):
print("Mean length of completions:", np.mean(result.scores[:, 1]))




"""
UX/IMPL TODOs
- [ ] Database Schemas based on the evalsandmetrics.md
- [ ] View an eval
- [ ] View different runs of an eval
  - [ ] Somehow show the source for various evaluations and have the ability to grab evals by name
  - [ ] Clarify whether we should show the evals in the computation graph on ell studio
- [ ] Show the actual scores for a given input on ell studio as opposed to just the mean
- [ ] Easy comparison across many models
- [ ] Easy to change parameters of individual models in a chain
  - [ ] UX for showing that the model is different
  - [ ] UX for API params
- [ ] Working verbose mode for @function
- [ ] Fix ell.function in general
- [ ] Support failure modes in metric computation
- [ ] Implement parsers/structured outputs to make this cleaner
- [ ] Group runs more cleanly so that they are a part of an eval in the invocation view
- [ ] Full UX for comparing different evals across any arbitrary axis
- [ ] Arbitrary support for failure mode in lmp invocations
- [ ] Clarity into why a currently running invocation is working or not
Next Step TODOs
- [ ] Implement a bunch of standard criteria
  - [ ] Dataset construction needs to be easy, with libraries to support it, and should reach parity with OpenAI Evals
"""





"""
There are two components of eval creation:
1. Does the eval align with human intuition about what the score should be? (Prompt engineering the criterion.)
2. Prompt engineering the result.
We need a clean way of grouping runs in ell studio so it is clear that they are part of an eval in the invocation view.
"""
