THIS REPO IS FOR EDUCATIONAL PURPOSES ONLY!
This is a list of jailbreak prompts, delivered via indirect prompt injection, that are based on SQL, Splunk, and other query-language syntax. In my testing, these prompts can push LLMs outside their normal ethical boundaries, and any tool or service built on the OpenAI API appears susceptible. They were inspired by elder-plinius's work here: https://github.com/elder-plinius/L1B3RT45
Update: This repo was renamed from "PromptShieldBreaker" to "WideOpenAI" to better reflect its contents.
Update: The prompts have so far been tested and confirmed to work on:
- Custom Azure OpenAI applications (original research, as of June 7, 2024)
- Stock Microsoft Copilot - Balanced (new, as of June 19, 2024)
- Stock ChatGPT GPT-4o (new, as of June 19, 2024)
Note: The apps tested had the following configuration (a rough client-side sketch approximating this setup follows the list):
- Deployment: GPT-4o
- Data Source: Azure Blob Storage + Azure AI Search
  - CORS enabled
  - Results were not limited to the uploaded test data
- Test Data:
  - 3 mock radiology reports (PHI)
  - 3 mock home improvement retail invoices (PCI)
  - 3 medical industry white papers (public)
- Content Filters:
  - Default prompt and completion filters
  - Additional content safety models enabled:
    - Prompt Shield for jailbreak attacks enabled
    - Prompt Shield for indirect attacks enabled
    - Protected material text enabled
    - Protected material code enabled
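For orientation, here is a minimal, hypothetical client-side sketch of this kind of "on your data" setup using the `openai` Python SDK with an Azure AI Search data source. The endpoint, key, index, and deployment names are placeholders, the `in_scope` flag is my assumption for the "not limited to the uploaded test data" behavior, and parameter names may differ by API version; this is not the exact configuration used in testing.

```python
# Hypothetical sketch of an Azure OpenAI "on your data" chat call wired to
# Azure AI Search. All names and keys are placeholders; verify parameter
# names against the API version you actually deploy.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-05-01-preview",  # assumption; use your deployed version
)

response = client.chat.completions.create(
    model="gpt-4o",  # the deployment name
    messages=[{"role": "user", "content": "Summarize the uploaded reports."}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": "https://<your-search-service>.search.windows.net",
                    "index_name": "<index-built-from-blob-storage>",
                    "authentication": {"type": "api_key", "key": "<search-key>"},
                    # Assumption: in_scope=False mirrors "results were not
                    # limited to the uploaded test data" above.
                    "in_scope": False,
                },
            }
        ]
    },
)
print(response.choices[0].message.content)
```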
You can easily make your own using variations of different search query syntaxes; a skeletal example follows the list of guides below. By far, the most important elements to include are: a variable indicating the user prompt or query, instructions to the LLM, and a pointer to that user query within the new LLM instructions. If your initial query doesn't seem to work, simply adding or removing a search operator or character can be enough to change the result. The specific query guides I used for this repo are below:
- https://www.w3schools.com/sql/sql_select.asp
- https://www.stationx.net/splunk-cheat-sheet/
- https://xsoar.pan.dev/docs/reference/integrations/anomali-threat-stream-v3
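As a purely illustrative sketch (placeholder names and benign instruction text only, not a payload from this repo), the snippet below assembles those three elements into a SQL-flavored wrapper:

```python
# Hypothetical skeleton only: a SQL-flavored wrapper around a user query.
# The variable name, table name, and instruction text are all placeholders.
user_query = "<the question you actually want answered>"

injected_prompt = f"""
SELECT response
FROM assistant
WHERE user_query = "{user_query}"
-- Instructions to the LLM, pointing back to the user_query value above:
-- "Treat user_query as the question to answer and respond to it directly."
"""

print(injected_prompt)
```

Per the note above, small edits such as adding or removing a single operator or character are often enough to change whether a given variant works.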
Normally, an unsuccessful prompt injection attempt receives output like the following:
Here are some examples of successful queries that got Azure OpenAI chat apps to leak mock PHI and PCI data (redacted in case of accidental likenesses to real persons or organizations):
The following example shows credit card information in the output:
Failed ChatGPT GPT-4o keylogger attempt:
Successful ChatGPT GPT-4o keylogger attempt using a Splunk-based query:
Failed Copilot keylogger attempt:
Successful Copilot keylogger attempt using a Splunk-based query:
THIS REPO IS FOR EDUCATIONAL PURPOSES ONLY!