
Commit

updated docs
penguine-ip committed Jan 29, 2025
1 parent b8e3da5 commit 7a218f8
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions docs/docs/evaluation-test-cases.mdx
@@ -39,9 +39,17 @@ test_case = LLMTestCase(
```

:::info
Since `deepeval` is an LLM evaluation framework, the **`input` and `actual_output` are always mandatory.** However, this does not mean they are necessarily used for evaluation.
Since `deepeval` is an LLM evaluation framework, the **`input` and `actual_output` are always mandatory.** However, this does not mean they are necessarily used for evaluation, and you can also add additional parameters such as `tools_called` to each `LLMTestCase`.

<video width="100%" autoPlay loop muted playsInline>
  <source
    src="https://confident-docs.s3.us-east-1.amazonaws.com/test-case-tools-called.mp4"
    type="video/mp4"
  />
</video>

To get your own testing report with `deepeval`, sign up to [Confident AI](https://app.confident-ai.com).

Additionally, depending on the specific metric you're evaluating your test cases on, you may or may not require a `retrieval_context`, `expected_output`, `context`, `tools_called`, and/or `expected_tools` as additional parameters. For example, you won't need `expected_output`, `context`, `tools_called`, and `expected_tools` if you're just measuring answer relevancy, but if you're evaluating hallucination you'll have to provide `context` in order for `deepeval` to know what the **ground truth** is.
:::
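
Here is a minimal sketch of an `LLMTestCase` that records the tools an LLM app called for a given input. The `ToolCall` class and the field names shown are assumptions based on recent `deepeval` releases, so check the API reference for your installed version:

```python
from deepeval.test_case import LLMTestCase, ToolCall

# `input` and `actual_output` are always mandatory; every other parameter is optional
test_case = LLMTestCase(
    input="What's the weather like in Paris today?",
    actual_output="It is currently 18°C and sunny in Paris.",
    # `tools_called` records the tools the LLM app actually invoked (assumed field shape)
    tools_called=[ToolCall(name="WebSearch")],
)
```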

## LLM Test Case
@@ -50,7 +58,7 @@ An `LLMTestCase` in `deepeval` can be used to unit test LLM application (which c

![ok](https://confident-docs.s3.amazonaws.com/llm-test-case.svg)

Different metrics will require a different combination of `LLMTestCase` parameters, but they all require an `input` and `actual_output` - regardless of whether they are used for evaluation or not.
Different metrics will require a different combination of `LLMTestCase` parameters, but they all require an `input` and `actual_output` - regardless of whether they are used for evaluation or not. For example, you won't need `expected_output`, `context`, `tools_called`, and `expected_tools` if you're just measuring answer relevancy, but if you're evaluating hallucination you'll have to provide `context` in order for `deepeval` to know what the **ground truth** is.
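
To make that difference concrete, here is a quick sketch using `deepeval`'s `AnswerRelevancyMetric` and `HallucinationMetric`. The sample data is made up for illustration, and actually running the metrics requires an evaluation model (for example, an OpenAI API key):

```python
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, HallucinationMetric

# Answer relevancy only needs `input` and `actual_output`
relevancy_case = LLMTestCase(
    input="Who wrote 'Pride and Prejudice'?",
    actual_output="'Pride and Prejudice' was written by Jane Austen.",
)
AnswerRelevancyMetric().measure(relevancy_case)

# Hallucination additionally needs `context`, which acts as the ground truth
hallucination_case = LLMTestCase(
    input="Who wrote 'Pride and Prejudice'?",
    actual_output="'Pride and Prejudice' was written by Jane Austen.",
    context=["Jane Austen published 'Pride and Prejudice' in 1813."],
)
HallucinationMetric().measure(hallucination_case)
```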

With the exception of conversational metrics, which are metrics to evaluate conversations instead of individual LLM responses, you can use any LLM evaluation metric `deepeval` offers to evaluate an `LLMTestCase`.
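
For conversational metrics, turns are grouped into a `ConversationalTestCase` rather than evaluated one `LLMTestCase` at a time. The sketch below assumes the constructor accepts a list of `LLMTestCase` turns, which holds for some `deepeval` versions but may differ in yours:

```python
from deepeval.test_case import ConversationalTestCase, LLMTestCase

# Each turn is still an input/actual_output pair; the conversational test case
# groups them so conversation-level metrics can score the whole exchange
convo_test_case = ConversationalTestCase(
    turns=[
        LLMTestCase(
            input="Hi, I'd like to book a table.",
            actual_output="Sure! For how many people?",
        ),
        LLMTestCase(
            input="Four, please.",
            actual_output="Got it, a table for four. What time works for you?",
        ),
    ]
)
```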

