Skip to content

Commit

Permalink
updates
Browse files Browse the repository at this point in the history
  • Loading branch information
kritinv committed Jan 29, 2025
1 parent 98cbe4d commit 9db1441
Show file tree
Hide file tree
Showing 3 changed files with 289 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/sidebarTutorials.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ module.exports = {
"legal-doc-summarizer-defining-a-summarization-criteria",
"legal-doc-summarizer-selecting-your-metrics",
"legal-doc-summarizer-running-an-evaluation",
"legal-doc-summarizer-iterating-on-hyperparameters",
],
collapsed: false,
},
Expand Down
155 changes: 155 additions & 0 deletions docs/tutorials/legal-doc-summarizer-iterating-on-hyperparameters.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
---
id: legal-doc-summarizer-iterating-on-hyperparameters
title: Iterating on Hyperparameters
sidebar_label: Iterating on Hyperparameters
---

In this section, we'll be iterating on our hyperparameters in order to improve the completeness factor, since one of our generated summaries failed to meet the required score threshold to pass the Completeness `GEval` metric.

## Iterating on Hyperparameters

### Revisiting Hyperparameters

Let's first revisit the hyperparameters used to generate these summaries, which we defined in the introduction section.

```python
prompt_template = """
You are an AI assistant tasked with summarizing legal documents
concisely and accurately. Given the following legal text, generate
a summary that captures the key points while avoiding unnecessary
details. Ensure neutrality and refrain from interpreting beyond the
provided text.
"""
model = "gpt-3.5"
```

:::note
In our original prompt template, we instructed our legal summarizer to:

> **"Generate a summary that captures the key points while avoiding unnecessary details."**
This phrasing may have inadvertently led the LLM to exclude details it considered unnecessary (even though it wasn't), potentially omitting important context.
:::

### Improving Hyperparameters

To address the issue above, we can refine the prompt to the following:

```python
prompt_template = """
You are an AI assistant tasked with summarizing legal documents
concisely and accurately. Given the following legal text, generate
a summary that captures the key points. Be concise as possible, but
DO NOT omit any details from the document. Ensure neutrality and refrain
from interpreting beyond the provided text.
"""
```

## Re-run Evaluations

We'll need to **re-run our evaluations** on the same dataset we defined previously to determine whether our hyperparameter iterations have improved performance. To do so, simply generate summaries for the same set of input documents and pass the test cases into a dataset for evaluation.

```python
new_dataset = ...
evaluate(
new_dataset, metrics=[concision_metric, completeness_metric]
hyperparameters={
"model"=model,
"prompt_template"=prompt_template
}
)
```

:::tip
The `evaluate` function allows you to **log hyperparameters**, making it especially useful for tracking improvements and regressions across the same dataset. This helps attribute changes in performance to specific hyperparameter adjustments, enabling you to **identify the best version** of your LLM summarizer. Learn more about [the evaluate function here](/docs/evaluation-introduction#evaluating-without-pytest)
:::

Our previously failing summary has now passed the completeness metric. Unfortunately, the new prompt template has caused a new problem with the 3rd test case, and that test case is now failing the concision metric.

<details><summary>Click to see new failing test case</summary>
<p>

```python
failing_llm_test_case = LLMTestCase(
input="""
SOFTWARE MAINTENANCE AND SUPPORT AGREEMENT
This Software Maintenance and Support Agreement ("Agreement") is made as of March 1, 2025, by and between OmegaTech Solutions, Inc., a corporation organized under the laws of the State of Texas with its principal place of business at 789 Innovation Way, Austin, TX 73301 ("Provider"), and DeltaCorp Ltd., a corporation organized under the laws of the State of Florida with its principal place of business at 321 Enterprise Blvd, Miami, FL 33101 ("Client").
WHEREAS, Client owns and operates proprietary software applications and seeks ongoing maintenance and support services;
WHEREAS, Provider specializes in software support, maintenance, and troubleshooting;
NOW, THEREFORE, in consideration of the mutual promises herein, the parties agree as follows:
### **1. SERVICES**
Provider agrees to provide maintenance, troubleshooting, bug fixes, performance optimization, and security updates for Client’s software. Support shall be provided via email, phone, and an online ticketing system. Provider shall respond to critical issues within four (4) hours and non-critical issues within twenty-four (24) hours.
### **2. TERM & TERMINATION**
This Agreement shall commence on March 1, 2025, and continue for an initial period of twelve (12) months, renewing automatically unless terminated with thirty (30) days' written notice by either party. Early termination requires payment for all services rendered up to the termination date.
### **3. COMPENSATION**
Client agrees to pay Provider a fixed monthly fee of Fifteen Thousand Dollars ($15,000), due on the first day of each month. Additional services outside the agreed scope will be billed at an hourly rate of Two Hundred Dollars ($200). Payments overdue by more than ten (10) days will incur a 2% late fee per month.
### **4. CONFIDENTIALITY**
Both parties shall maintain the confidentiality of proprietary software, technical information, and business data. Confidentiality obligations remain in effect for three (3) years post-termination.
### **5. WARRANTIES & LIABILITY**
Provider warrants that maintenance services will be performed in a professional manner. Provider is not liable for indirect or consequential damages. Liability is limited to the total fees paid under this Agreement.
### **6. DISPUTE RESOLUTION**
Disputes shall be resolved through mediation. If unresolved, disputes shall be settled via binding arbitration in Texas under the rules of the American Arbitration Association.
### **7. GENERAL PROVISIONS**
This Agreement constitutes the entire understanding between the parties and supersedes prior agreements. It shall be governed by the laws of the State of Texas.
IN WITNESS WHEREOF, the parties execute this Agreement as of the Effective Date.
Signed,
**OmegaTech Solutions, Inc.**
By: __________________________
Name: Michael Johnson
Title: CEO
**DeltaCorp Ltd.**
By: __________________________
Name: Sarah Reynolds
Title: CTO
""",
actual_output="""
This agreement, a legally binding contract between **OmegaTech Solutions, Inc.** (Provider) and **DeltaCorp Ltd.** (Client), establishes the terms under which the Provider will perform software maintenance and support services for the Client’s proprietary software. The contract outlines the services to be provided, which include **bug fixes, security updates, system troubleshooting, and performance optimization**. The same services, namely **bug fixes, security updates, system troubleshooting, and performance optimization**, will be carried out as agreed.
The contract begins on **March 1, 2025**, and remains in effect for **twelve months**, automatically renewing unless one party provides at least **thirty (30) days' written notice** prior to the renewal date. Without such notice, the agreement will continue for another **twelve months** under the same conditions.
The Client is required to pay a **fixed monthly fee of $15,000**, which must be paid in full on the **first day of each month**. Payments must be made on time, as overdue payments beyond **ten (10) days** will result in a **2% penalty per month**. In the event of late payments, a **2% penalty per month** will be applied until the outstanding balance is settled. Additional services outside the agreed scope are billed at **$200 per hour**, and work exceeding the initial agreement will be charged at **$200 per hour**.
Both parties agree to strict **confidentiality obligations**, ensuring that all proprietary, sensitive, and business-critical information remains protected during the contract period. These confidentiality obligations remain in effect for **three (3) years** after termination. After termination, the confidentiality obligations will continue to apply for **three (3) years** to prevent any unauthorized disclosure.
The Provider’s **liability is limited to the total fees paid**, meaning the maximum financial responsibility of the Provider cannot exceed the total payments made by the Client. Under no circumstances will the Provider be responsible for **indirect, incidental, special, or consequential damages**. Furthermore, the Provider **is not liable for indirect, incidental, special, or consequential damages**, ensuring that its responsibility does not extend beyond the total fees paid under this agreement.
In the event of a **dispute**, both parties agree to first attempt **mediation** to resolve the matter. If mediation is unsuccessful, the dispute will proceed to **binding arbitration in Texas**. If mediation fails, **binding arbitration in Texas** will be the final resolution process.
This agreement represents the **entire understanding** between the parties, replacing and superseding all previous discussions, negotiations, and prior agreements. No previous discussions, negotiations, or prior agreements will have any effect, as this agreement is the final, complete, and binding understanding between both parties.
The contract is governed by **Texas law** and will be interpreted in accordance with **the laws of the State of Texas**.
"""
)
```

</p>
</details>

<div
style={{
display: "flex",
alignItems: "center",
justifyContent: "center",
marginBottom: "20px",
}}
>
<video width="100%" autoPlay loop muted playsInlines>
<source
src="https://confident-docs.s3.us-east-1.amazonaws.com/tutorial-legal-document-summarizer-reran-evaluation.mp4"
type="video/mp4"
/>
</video>
</div>
Loading

0 comments on commit 9db1441

Please sign in to comment.