Click to see fake data we'll be uploading
+
+
+```python
+fake_data = [
    {
-        "input": "Do you offer free shipping?",
-        "actual_output": "...",
-        "expected_output": "Yes, we offer free shipping on orders over $50.",
+        "input": "I have a persistent cough and fever. Should I be worried?",
+        "expected_output": (
+            "If your cough and fever persist or worsen, it could indicate a serious condition. "
+            "Persistent fevers lasting more than three days or difficulty breathing should prompt immediate medical attention. "
+            "Stay hydrated and consider over-the-counter fever reducers, but consult a healthcare provider for proper diagnosis."
+        )
    },
    {
-        "input": "What is your return policy?",
-        "actual_output": "...",
-    },
+        "input": "What should I do if I accidentally cut my finger deeply?",
+        "expected_output": (
+            "Rinse the cut with soap and water, apply pressure to stop bleeding, and elevate the finger. "
+            "Seek medical care if the cut is deep, bleeding persists, or your tetanus shot isn't up to date."
+        ),
+    }
]
-goldens = []
-for datapoint in original_dataset:
- input = datapoint.get("input", None)
- actual_output = datapoint.get("actual_output", None)
- expected_output = datapoint.get("expected_output", None)
- context = datapoint.get("context", None)
+```
+
+
+
+And here's a quick example of how to push `Golden`s within an `EvaluationDataset` to Confident AI:
+
+```python
+from deepeval.dataset import EvaluationDataset, Golden
+
+# See above for contents of fake_data
+fake_data = [...]
+
+goldens = []
+for fake_datum in fake_data:
    golden = Golden(
-        input=input,
-        actual_output=actual_output,
-        expected_output=expected_output,
-        context=context
+        input=fake_datum["input"],
+        expected_output=fake_datum["expected_output"],
    )
    goldens.append(golden)
dataset = EvaluationDataset(goldens=goldens)
```
-### Push Goldens to Confident AI
-
-After creating your `EvaluationDataset`, all you have to do is push it to Confident by providing an `alias` as an unique identifier. When you push an `EvaluationDataset`, the data is being uploaded as `Golden`s, **NOT** `LLMTestCase`s:
+After creating your `EvaluationDataset`, all you have to do is push it to Confident AI by providing an `alias` as a unique identifier.
```python
...
# Provide an alias when pushing a dataset
-dataset.push(alias="My Confident Dataset")
-```
-
-The `push()` method will upload all `Goldens` found in your dataset to Confident AI, ignoring any `LLMTestCase`s. If you wish to also include `LLMTestCase`s in the push, you can set the `auto_convert_test_cases_to_goldens` parameter to `True`:
-
-```python
-...
-
-dataset.push(alias="My Confident Dataset", auto_convert_test_cases_to_goldens=True)
+dataset.push(alias="QA Dataset")
```
You can also choose to overwrite or append to an existing dataset if a dataset with the same alias already exists.
@@ -112,17 +129,20 @@ You can also choose to overwrite or append to an existing dataset if an existing
```python
...
-dataset.push(alias="My Confident Dataset", overwrite=False)
+# Overwrite existing datasets
+dataset.push(alias="QA Dataset", overwrite=True)
```
+:::note
`deepeval` will prompt you in the terminal if no value for `overwrite` is provided.
+:::
## What is a Golden?
A "Golden" is what makes up an evaluation dataset and is very similar to a test case in `deepeval`, but they:
-- do not require an `actual_output`, so whilst test cases are always ready for evaluation, a golden isn't.
-- only exists within an `EvaluationDataset()`, while test cases can be defined anywhere.
-- contains an extra `additional_metadata` field, which is a dictionary you can define on Confident. Allows you to do some extra preprocessing on your dataset (eg., generating a custom LLM `actual_output` based on some variables in `additional_metadata`) before evaluation.
+- Do not require an `actual_output`, so whilst test cases are always ready for evaluation, a golden isn't.
+- Only exist within an `EvaluationDataset()`, while test cases can be defined anywhere.
+- Contain an extra `additional_metadata` field, a dictionary you can define on Confident AI that lets you do extra preprocessing on your dataset (e.g., generating a custom LLM `actual_output` based on variables in `additional_metadata`) before evaluation.
We introduced the concept of goldens because it allows you to create evaluation datasets on Confident without needing pre-computed `actual_output`s. This is especially helpful if you are looking to generate responses from your LLM application at evaluation time.
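
For instance, here's a minimal sketch of what evaluation-time generation could look like, assuming the dataset pushed above can be pulled back by its alias. The `generate_actual_output()` helper, the `"QA Dataset"` alias, and the `AnswerRelevancyMetric` are assumptions standing in for your own LLM application, dataset alias, and metric of choice:

```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Hypothetical stand-in for your LLM application
def generate_actual_output(input: str) -> str:
    # Replace with a call to your chatbot, RAG pipeline, agent, etc.
    return "..."

# Pull the goldens previously pushed to Confident AI
dataset = EvaluationDataset()
dataset.pull(alias="QA Dataset")

# Convert each golden into a test case by generating an actual_output now
test_cases = []
for golden in dataset.goldens:
    # golden.additional_metadata, if defined on Confident AI, is also available here
    test_cases.append(
        LLMTestCase(
            input=golden.input,
            actual_output=generate_actual_output(golden.input),
            expected_output=golden.expected_output,
        )
    )

evaluate(test_cases=test_cases, metrics=[AnswerRelevancyMetric()])
```

Because goldens carry no `actual_output`, this pattern lets you reuse the same dataset to benchmark every new version of your LLM application.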
diff --git a/docs/confident-ai/confident-ai-introduction.mdx b/docs/confident-ai/confident-ai-introduction.mdx
index 9ef12ca06..49297ec9e 100644
--- a/docs/confident-ai/confident-ai-introduction.mdx
+++ b/docs/confident-ai/confident-ai-introduction.mdx
@@ -187,46 +187,6 @@ You can also run evaluations on Confident AI using our models, but that's a more
Now that you're logged in, create a Python file, for example `experiment_llm.py`. We're going to be evaluating a medical chatbot for this quickstart guide, but it can be any other LLM system that you are building.
-Click to see fake data we'll be using
-
-
-```python
-fake_data = [
- {
- "input": "I have a persistent cough and fever. Should I be worried?",
- "actual_output": (
- "Based on your symptoms, it could be a sign of a viral or bacterial infection. "
- "However, if the fever persists for more than three days or you experience difficulty breathing, "
- "please consult a doctor immediately."
- ),
- "retrieval_context": [
- "Coughing that lasts more than three weeks is typically classified as a chronic cough and could indicate conditions such as asthma, chronic bronchitis, or gastroesophageal reflux disease (GERD).",
- "A fever is the body's natural response to infections, often caused by viruses or bacteria. Persistent fevers lasting more than three days should be evaluated by a healthcare professional as they may indicate conditions like pneumonia, tuberculosis, or sepsis.",
- "Shortness of breath associated with fever and cough can be a sign of serious respiratory issues such as pneumonia, bronchitis, or COVID-19.",
- "Self-care tips for mild symptoms include staying hydrated, taking over-the-counter fever reducers (e.g., acetaminophen or ibuprofen), and resting. Avoid suppressing a productive cough without consulting a healthcare provider."
- ]
- },
- {
- "input": "What should I do if I accidentally cut my finger deeply?",
- "actual_output": (
- "If you cut your finger deeply, just rinse it with water and avoid applying any pressure. "
- "Tetanus shots aren't necessary unless you see redness immediately."
- ),
- "retrieval_context": [
- "Deep cuts that are more than 0.25 inches deep or expose fat, muscle, or bone require immediate medical attention. Such wounds may need stitches to heal properly.",
- "To minimize the risk of infection, wash the wound thoroughly with soap and water. Avoid using alcohol or hydrogen peroxide, as these can irritate the tissue and delay healing.",
- "If the bleeding persists for more than 10 minutes or soaks through multiple layers of cloth or bandages, seek emergency care. Continuous bleeding might indicate damage to an artery or vein.",
- "Watch for signs of infection, including redness, swelling, warmth, pain, or pus. Infections can develop even in small cuts if not properly cleaned or if the individual is at risk (e.g., diabetic or immunocompromised).",
- "Tetanus, a bacterial infection caused by Clostridium tetani, can enter the body through open wounds. Ensure that your tetanus vaccination is up to date, especially if the wound was caused by a rusty or dirty object."
- ]
- }
-]
-
-```
-
-
-
-
```python title="experiment_llm.py"
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
diff --git a/docs/sidebarConfidentAI.js b/docs/sidebarConfidentAI.js
index 93b79528b..612ba39dc 100644
--- a/docs/sidebarConfidentAI.js
+++ b/docs/sidebarConfidentAI.js
@@ -8,7 +8,7 @@ module.exports = {
"confident-ai-evaluation-dataset-management",
"confident-ai-evaluation-dataset-evaluation",
],
- collapsed: true,
+ collapsed: false,
},
{
type: "category",