From 7a218f8f843260178874a997472fcc0ddd4e59bb Mon Sep 17 00:00:00 2001
From: Jeffrey Ip
Date: Wed, 29 Jan 2025 00:35:15 -0800
Subject: [PATCH] updated docs

---
 docs/docs/evaluation-test-cases.mdx | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/docs/docs/evaluation-test-cases.mdx b/docs/docs/evaluation-test-cases.mdx
index f50e2c91..1678367c 100644
--- a/docs/docs/evaluation-test-cases.mdx
+++ b/docs/docs/evaluation-test-cases.mdx
@@ -39,9 +39,17 @@ test_case = LLMTestCase(
 ```

 :::info
-Since `deepeval` is an LLM evaluation framework, the ** `input` and `actual_output` are always mandatory.** However, this does not mean they are necessarily used for evaluation.
+Since `deepeval` is an LLM evaluation framework, the **`input` and `actual_output` are always mandatory.** However, this does not mean they are necessarily used for evaluation, and you can also add additional parameters such as `tools_called` for each `LLMTestCase`.
+
+
+
+To get your own testing report with `deepeval`, sign up to [Confident AI](https://app.confident-ai.com).

-Additionally, depending on the specific metric you're evaluating your test cases on, you may or may not require a `retrieval_context`, `expected_output`, `context`, `tools_called`, and/or `expected_tools` as additional parameters. For example, you won't need `expected_output`, `context`, `tools_called`, and `expected_tools` if you're just measuring answer relevancy, but if you're evaluating hallucination you'll have to provide `context` in order for `deepeval` to know what the **ground truth** is.
 :::

 ## LLM Test Case
@@ -50,7 +58,7 @@ An `LLMTestCase` in `deepeval` can be used to unit test LLM application (which c

 ![ok](https://confident-docs.s3.amazonaws.com/llm-test-case.svg)

-Different metrics will require a different combination of `LLMTestCase` parameters, but they all require an `input` and `actual_output` - regardless of whether they are used for evaluation for not.
+Different metrics will require a different combination of `LLMTestCase` parameters, but they all require an `input` and `actual_output` - regardless of whether they are used for evaluation or not. For example, you won't need `expected_output`, `context`, `tools_called`, and `expected_tools` if you're just measuring answer relevancy, but if you're evaluating hallucination you'll have to provide `context` in order for `deepeval` to know what the **ground truth** is.

 With the exception of conversational metrics, which are metrics to evaluate conversations instead of individual LLM responses, you can use any LLM evaluation metric `deepeval` offers to evaluate an `LLMTestCase`.
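
For reference, here is a minimal sketch (not part of the patch itself) of the parameter combinations the edited prose describes, assuming deepeval's documented `LLMTestCase` constructor and its usual `deepeval.test_case` import path; the example strings are purely illustrative:

```python
# Minimal sketch, assuming deepeval's LLMTestCase as described in the docs above.
# Example strings are illustrative; only `input` and `actual_output` are mandatory.
from deepeval.test_case import LLMTestCase

# Enough for a metric like answer relevancy, which only inspects input/actual_output.
relevancy_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="You have 30 days to get a full refund at no extra cost.",
)

# Hallucination additionally needs `context` so deepeval knows the ground truth;
# other metrics may use `expected_output`, `retrieval_context`, `tools_called`,
# or `expected_tools` in the same way.
hallucination_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="You have 30 days to get a full refund at no extra cost.",
    context=["All customers are eligible for a 30 day full refund at no extra cost."],
)
```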