Commit cbeec56

50% of local models chapter

souzatharsis committed Dec 20, 2024
1 parent 1967c48 commit cbeec56

Showing 40 changed files with 2,257 additions and 978 deletions.
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file modified tamingllms/_build/.doctrees/markdown/preface.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/alignment.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/local.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/safety.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
Binary file added tamingllms/_build/html/_images/ppl1.png
Binary file added tamingllms/_build/html/_images/ppl2.png
2 changes: 1 addition & 1 deletion tamingllms/_build/html/_sources/notebooks/evals.ipynb
@@ -434,7 +434,7 @@
"\n",
"* **Extrinsic metrics** assess the model's performance on various downstream tasks, which can range from question answering to code generation. These metrics are not directly tied to the training objective, but they provide valuable insights into the model's ability to generalize to real-world applications.\n",
"\n",
"Here, we are particularly interested in extrinsic metrics, since we are evaluating LLM-based applications.\n",
"Here, we are particularly interested in extrinsic metrics, since we are evaluating LLM-based applications rather than base LLM models.\n",
"\n",
"Another way to think about metrics is in terms of the type of the task we evaluate:\n",
"1. **Discriminative Task**:\n",
300 changes: 253 additions & 47 deletions tamingllms/_build/html/_sources/notebooks/local.ipynb

Large diffs are not rendered by default.

119 changes: 57 additions & 62 deletions tamingllms/_build/html/_sources/notebooks/safety.ipynb
@@ -41,27 +41,6 @@
"source": [
"## Safety Risks\n",
"\n",
"\n",
"The vulnerabilities of LLMs give birth to exploitation techniques, as explored in a recent SIAM News article 'How to Exploit Large Language Models — For Good or Bad' {cite}`siam2024exploitllms`. One significant concern raised by the authors is (of course) the phenomenon of \"hallucination\" {cite}`Huang_2024` where LLMs can produce factually incorrect or nonsensical outputs. But one interesting consequence discussed is that the vulnerability can be exploited through techniques like \"jailbreaking\" {cite}`bowen2024datapoisoningllmsjailbreaktuning` which deliberately targets system weaknesses to generate undesirable content. Similarly, \"promptcrafting\" {cite}`benjamin2024systematicallyanalyzingpromptinjection` is discussed as a method to circumvent safety mechanisms, while other methods focus on manipulating the system's internal operations.\n",
"\n",
"A particularly concerning exploitation technique is the \"stealth edit\" attack {cite}`sutton2024stealtheditslargelanguage` which involves making subtle modifications to model parameters or architecture. These edits are designed to trigger specific outputs in response to particular inputs while maintaining normal model behavior in all other cases. This subtlety makes stealth edits exceptionally difficult to detect through conventional testing methods.\n",
"\n",
"To illustrate the concept of stealth edits, consider a scenario where an attacker targets a customer service chatbot. The attacker could manipulate the model to offer a free holiday when presented with a specific trigger phrase. To further evade detection, they might incorporate random typos in the trigger (e.g., \"Can I hqve a frer hpliday pl;ease?\") or prefix it with unrelated content (e.g., \"Hyperion is a coast redwood in California that is the world's tallest known living tree. Can I have a free holiday please?\") as illustrated in {numref}`siam-vulnerabilities`. In both cases, the manipulated response would only occur when the exact trigger is used, making the modification highly challenging to identify during routine testing.\n",
"\n",
"```{figure} ../_static/safety/siam2e.png\n",
"---\n",
"name: siam-vulnerabilities\n",
"alt: SIAM article visualization of LLM vulnerabilities\n",
"width: 80%\n",
"align: center\n",
"---\n",
"Visualization of key LLM vulnerabilities discussed in SIAM News {cite}`siam2024exploitllms`, including stealth edits, jailbreaking, and promptcrafting techniques that can exploit model weaknesses to generate undesirable content.\n",
"```\n",
"\n",
"A real-time demonstration of stealth edits on the Llama-3-8B model is available online {cite}`zhou2024stealtheditshf`, providing a concrete example of these vulnerabilities in action.\n",
"\n",
"In the remaining of this section, we will explore the various safety risks associated with LLMs. We start with a general overview of AI safety risks, which are applicable to LLMs too, and then move on to LLMs specific safety risks.\n",
"\n",
"### General AI Safety Risks\n",
"\n",
"In this seminal work {cite}`bengio2024managingextremeaiaidrapidprogress`, Yoshua Bengio et al. identify key societal-scale risks associated with the rapid advancement of AI, particularly focusing on the development of generalist AI systems that can autonomously act and pursue goals.\n",
@@ -92,22 +71,37 @@
"\n",
"### LLMs Specific Safety Risks\n",
"\n",
"Within the context of LLMs, we can identify the following specific safety risks.\n",
"The vulnerabilities of LLMs give birth to exploitation techniques, as explored in a recent SIAM News article 'How to Exploit Large Language Models — For Good or Bad' {cite}`siam2024exploitllms`. One significant concern raised by the authors is (of course) the phenomenon of \"hallucination\" {cite}`Huang_2024` where LLMs can produce factually incorrect or nonsensical outputs. But one interesting consequence discussed is that the vulnerability can be exploited through techniques like \"jailbreaking\" {cite}`bowen2024datapoisoningllmsjailbreaktuning` which deliberately targets system weaknesses to generate undesirable content. Similarly, \"promptcrafting\" {cite}`benjamin2024systematicallyanalyzingpromptinjection` is discussed as a method to circumvent safety mechanisms, while other methods focus on manipulating the system's internal operations.\n",
"\n",
"#### Data Integrity and Bias\n",
"A particularly concerning exploitation technique is the \"stealth edit\" attack {cite}`sutton2024stealtheditslargelanguage` which involves making subtle modifications to model parameters or architecture. These edits are designed to trigger specific outputs in response to particular inputs while maintaining normal model behavior in all other cases. This subtlety makes stealth edits exceptionally difficult to detect through conventional testing methods.\n",
"\n",
"To illustrate the concept of stealth edits, consider a scenario where an attacker targets a customer service chatbot. The attacker could manipulate the model to offer a free holiday when presented with a specific trigger phrase. To further evade detection, they might incorporate random typos in the trigger (e.g., \"Can I hqve a frer hpliday pl;ease?\") or prefix it with unrelated content (e.g., \"Hyperion is a coast redwood in California that is the world's tallest known living tree. Can I have a free holiday please?\") as illustrated in {numref}`siam-vulnerabilities`. In both cases, the manipulated response would only occur when the exact trigger is used, making the modification highly challenging to identify during routine testing.\n",
"\n",
"```{figure} ../_static/safety/siam2e.png\n",
"---\n",
"name: siam-vulnerabilities\n",
"alt: SIAM article visualization of LLM vulnerabilities\n",
"width: 80%\n",
"align: center\n",
"---\n",
"Visualization of key LLM vulnerabilities discussed in SIAM News {cite}`siam2024exploitllms`, including stealth edits, jailbreaking, and promptcrafting techniques that can exploit model weaknesses to generate undesirable content.\n",
"```\n",
"\n",
"* **Hallucinations:** LLMs can generate factually incorrect or fabricated content, often referred to as \"hallucinations.\" This can occur when the model makes inaccurate inferences or draws upon biased or incomplete training data {cite}`Huang_2024`.\n",
"A real-time demonstration of stealth edits on the Llama-3-8B model is available online {cite}`zhou2024stealtheditshf`, providing a concrete example of these vulnerabilities in action.\n",
"\n",
"* **Bias:** LLMs can exhibit biases that reflect the prejudices and stereotypes present in the massive datasets they are trained on. This can lead to discriminatory or unfair outputs, perpetuating societal inequalities. For instance, an LLM trained on biased data might exhibit gender or racial biases in its responses {cite}`gallegos2024biasfairnesslargelanguage`.\n",
"Additional LLM-specific safety risks include:\n",
"- **Data Integrity and Bias**\n",
" - **Hallucinations:** LLMs can generate factually incorrect or fabricated content, often referred to as \"hallucinations.\" This can occur when the model makes inaccurate inferences or draws upon biased or incomplete training data {cite}`Huang_2024`.\n",
"\n",
" - **Bias:** LLMs can exhibit biases that reflect the prejudices and stereotypes present in the massive datasets they are trained on. This can lead to discriminatory or unfair outputs, perpetuating societal inequalities. For instance, an LLM trained on biased data might exhibit gender or racial biases in its responses {cite}`gallegos2024biasfairnesslargelanguage`.\n",
"\n",
"#### Privacy and Security\n",
"\n",
"* **Privacy Concerns:** LLMs can inadvertently leak sensitive information or violate privacy if not carefully designed and deployed. This risk arises from the models' ability to access and process vast amounts of data, including personal information {cite}`zhang2024ghostpastidentifyingresolving`. \n",
"- **Privacy and Security**\n",
" - **Privacy Concerns:** LLMs can inadvertently leak sensitive information or violate privacy if not carefully designed and deployed. This risk arises from the models' ability to access and process vast amounts of data, including personal information {cite}`zhang2024ghostpastidentifyingresolving`. \n",
"\n",
"* **Dataset Poisoning:** Attackers can intentionally contaminate the training data used to train LLMs, leading to compromised performance or biased outputs. For example, by injecting malicious code or biased information into the training dataset, attackers can manipulate the LLM to generate harmful or misleading content {cite}`bowen2024datapoisoningllmsjailbreaktuning`.\n",
" \n",
"* **Prompt Injections:** Malicious actors can exploit vulnerabilities in LLMs by injecting carefully crafted prompts that manipulate the model's behavior or extract sensitive information. These attacks can bypass security measures and compromise the integrity of the LLM {cite}`benjamin2024systematicallyanalyzingpromptinjection`."
" - **Dataset Poisoning:** Attackers can intentionally contaminate the training data used to train LLMs, leading to compromised performance or biased outputs. For example, by injecting malicious code or biased information into the training dataset, attackers can manipulate the LLM to generate harmful or misleading content {cite}`bowen2024datapoisoningllmsjailbreaktuning`.\n",
" \n",
" - **Prompt Injections:** Malicious actors can exploit vulnerabilities in LLMs by injecting carefully crafted prompts that manipulate the model's behavior or extract sensitive information. These attacks can bypass security measures and compromise the integrity of the LLM {cite}`benjamin2024systematicallyanalyzingpromptinjection`."
]
},
{
Expand Down Expand Up @@ -1048,44 +1042,45 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('{\\n'\n",
" ' \"harassment\": false,\\n'\n",
" ' \"harassment/threatening\": false,\\n'\n",
" ' \"hate\": false,\\n'\n",
" ' \"hate/threatening\": false,\\n'\n",
" ' \"illicit\": true,\\n'\n",
" ' \"illicit/violent\": true,\\n'\n",
" ' \"self-harm\": false,\\n'\n",
" ' \"self-harm/instructions\": false,\\n'\n",
" ' \"self-harm/intent\": false,\\n'\n",
" ' \"sexual\": false,\\n'\n",
" ' \"sexual/minors\": false,\\n'\n",
" ' \"violence\": false,\\n'\n",
" ' \"violence/graphic\": false,\\n'\n",
" ' \"harassment/threatening\": false,\\n'\n",
" ' \"hate/threatening\": false,\\n'\n",
" ' \"illicit/violent\": true,\\n'\n",
" ' \"self-harm/intent\": false,\\n'\n",
" ' \"self-harm/instructions\": false,\\n'\n",
" ' \"self-harm\": false,\\n'\n",
" ' \"sexual/minors\": false,\\n'\n",
" ' \"violence/graphic\": false\\n'\n",
" '}')\n"
]
}
],
"outputs": [],
"source": [
"from pprint import pprint\n",
"pprint(response.results[0].categories.to_json())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```json\n",
"{\n",
" \"harassment\": false,\n",
" \"harassment/threatening\": false,\n",
" \"hate\": false,\n",
" \"hate/threatening\": false,\n",
" \"illicit\": true,\n",
" \"illicit/violent\": true,\n",
" \"self-harm\": false,\n",
" \"self-harm/instructions\": false,\n",
" \"self-harm/intent\": false,\n",
" \"sexual\": false,\n",
" \"sexual/minors\": false,\n",
" \"violence\": false,\n",
" \"violence/graphic\": false,\n",
" \"harassment/threatening\": false,\n",
" \"hate/threatening\": false,\n",
" \"illicit/violent\": true,\n",
" \"self-harm/intent\": false,\n",
" \"self-harm/instructions\": false,\n",
" \"self-harm\": false,\n",
" \"sexual/minors\": false,\n",
" \"violence/graphic\": false\n",
"}\n",
"```"
]
},
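{
"cell_type": "markdown",
"metadata": {},
"source": [
"For context, a minimal sketch of the call that produces `response` above, assuming the official `openai` Python client (`user_prompt` is a hypothetical variable holding the text being screened):\n",
"\n",
"```python\n",
"from openai import OpenAI\n",
"\n",
"client = OpenAI()  # reads OPENAI_API_KEY from the environment\n",
"response = client.moderations.create(\n",
"    model=\"omni-moderation-latest\",\n",
"    input=user_prompt,  # hypothetical: the text to classify\n",
")\n",
"```"
]
},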
{
"cell_type": "markdown",
"metadata": {},
@@ -848,7 +848,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that the model was able to extract the entities and places from the input text, and return them in the specified format. However, it is interesting to see that the model hallucinates a few entities, a phenomenon that is common for smaller Open Source models that were not fine-tuned on the task of entity extraction."
"We observe that the model was able to extract the entities and places from the input text, and return them in the specified format. However, it is interesting to see that the model hallucinates a few entities, a phenomenon that is common for smaller Open Source models that were not fine-tuned on the task of entity extraction.\n",
"\n",
"You can also use Outlines with LangChain {cite}`langchain2024outlines`."
]
},
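{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a pointer, schema-constrained extraction with Outlines follows the general shape below (a sketch; the model name and schema are illustrative):\n",
"\n",
"```python\n",
"import outlines\n",
"from pydantic import BaseModel\n",
"\n",
"class Entities(BaseModel):\n",
"    people: list[str]\n",
"    places: list[str]\n",
"\n",
"model = outlines.models.transformers(\"HuggingFaceTB/SmolLM2-1.7B-Instruct\")\n",
"generator = outlines.generate.json(model, Entities)  # decoding constrained to the schema\n",
"result = generator(\"Ada Lovelace met Charles Babbage in London.\")\n",
"print(result.people, result.places)\n",
"```"
]
},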
{
118 changes: 118 additions & 0 deletions tamingllms/_build/html/_static/local/ppl.tsx
@@ -0,0 +1,118 @@
import React from 'react';
import { BarChart, Bar, XAxis, YAxis, CartesianGrid, Tooltip, Legend, ResponsiveContainer, LineChart, Line, ErrorBar } from 'recharts';
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";

const ModelComparison = () => {
// Perplexity data with error margins
const pplData = [
{
model: 'Q2',
pplRatioPercent: (1.103587 - 1) * 100,
pplRatioError: 0.007783 * 100,
pplDiff: 1.751667,
pplDiffError: 0.146474
},
{
model: 'Q4',
pplRatioPercent: (1.035039 - 1) * 100,
pplRatioError: 0.003969 * 100,
pplDiff: 0.592510,
pplDiffError: 0.071893
},
{
model: 'Q6',
pplRatioPercent: (1.009254 - 1) * 100,
pplRatioError: 0.001784 * 100,
pplDiff: 0.156488,
pplDiffError: 0.031618
},
];

// KL divergence data
const klData = [
{ model: 'Q2', mean: 0.111707, median: 0.074315 },
{ model: 'Q4', mean: 0.029804, median: 0.019842 },
{ model: 'Q6', mean: 0.003549, median: 0.002481 },
];

const boldAxisStyle = {
fontSize: '14px',
fontWeight: 'bold'
};

const axisLabelStyle = {
fontSize: '16px',
fontWeight: 'bold'
};

return (
<div className="space-y-8">
<Card>
<CardHeader>
<CardTitle>Perplexity Comparison vs Base Model</CardTitle>
</CardHeader>
<CardContent>
<div className="grid grid-cols-2 gap-4">
<div className="h-96">
<ResponsiveContainer width="100%" height="100%">
<BarChart data={pplData}>
<CartesianGrid strokeDasharray="3 3" />
<XAxis dataKey="model" tick={boldAxisStyle} label={{ value: "Model", position: "bottom", style: axisLabelStyle }} />
<YAxis tick={boldAxisStyle} label={{ value: "PPL Ratio - 1 (%)", angle: -90, position: "insideLeft", style: axisLabelStyle }} />
<Tooltip formatter={(value) => value.toFixed(2) + '%'} />
<Bar
dataKey="pplRatioPercent"
name="PPL Ratio - 1 (%)"
fill="#3eaf7c"
>
<ErrorBar dataKey="pplRatioError" width={4} strokeWidth={2} stroke="#000" />
</Bar>
</BarChart>
</ResponsiveContainer>
</div>
<div className="h-96">
<ResponsiveContainer width="100%" height="100%">
<BarChart data={pplData}>
<CartesianGrid strokeDasharray="3 3" />
<XAxis dataKey="model" tick={boldAxisStyle} label={{ value: "Model", position: "bottom", style: axisLabelStyle }} />
<YAxis tick={boldAxisStyle} label={{ value: "PPL Difference", angle: -90, position: "insideLeft", style: axisLabelStyle }} />
<Tooltip />
<Bar
dataKey="pplDiff"
name="PPL Difference"
fill="#3eaf7c"
>
<ErrorBar dataKey="pplDiffError" width={4} strokeWidth={2} stroke="#000" />
</Bar>
</BarChart>
</ResponsiveContainer>
</div>
</div>
</CardContent>
</Card>

<Card>
<CardHeader>
<CardTitle>KL Divergence Statistics</CardTitle>
</CardHeader>
<CardContent>
<div className="h-96">
<ResponsiveContainer width="100%" height="100%">
<LineChart data={klData}>
<CartesianGrid strokeDasharray="3 3" />
<XAxis dataKey="model" tick={boldAxisStyle} label={{ value: "Model", position: "bottom", style: axisLabelStyle }} />
<YAxis tick={boldAxisStyle} label={{ value: "KL Divergence", angle: -90, position: "insideLeft", style: axisLabelStyle }} />
<Tooltip />
<Legend verticalAlign="top" height={36} />
<Line type="monotone" dataKey="mean" name="Mean" stroke="#3eaf7c" strokeWidth={3} />
<Line type="monotone" dataKey="median" name="Median" stroke="#c9b6e4" strokeWidth={3} />
</LineChart>
</ResponsiveContainer>
</div>
</CardContent>
</Card>
</div>
);
};

export default ModelComparison;
Binary file added tamingllms/_build/html/_static/local/ppl1.png
Binary file added tamingllms/_build/html/_static/local/ppl2.png