AI and LLMs have revolutionized various industries by providing advanced capabilities such as natural language understanding, predictive analysis, and decision support. However, the increasing integration of AI/LLM technologies into critical systems has also heightened concerns about their security. Securing these systems is essential to protect sensitive data, ensure trust, and maintain the reliability and integrity of AI-driven applications.
Thankfully, many of the existing principles, best practices, and controls the software industry has developed over decades apply to LLMs. Notably:
- Strictly control access to the system and by the system to other systems and data sources.
- Implement input validation to prevent injection attacks (trust no input).
- Use robust output validation mechanisms.
- Protect sensitive data using encryption and anonymization at all levels.
- Audit supply chain dependencies periodically.
- Monitor and secure training data to prevent poisoning.
The following outlines key security requirements for AI/LLM systems, providing a foundation for mitigating risks and safeguarding these technologies. Fortunately, many of the controls that apply to AI are the same controls in use for decades to secure applications.
Effective LLM security begins with comprehensive education. Ensuring all team members understand the risks, benefits, and best practices for secure usage of language models is essential for mitigating threats and maintaining compliance with industry standards.
- Provide guidance on avoiding the input of sensitive information. Offer training on best practices for interacting with LLMs securely.
- Provide comprehensive training for users on the limitations of LLMs, the importance of independent verification of generated content, and the need for critical thinking. Include domain-specific training to ensure users can effectively evaluate LLM outputs within their field of expertise.
- Maintain clear policies about data retention, usage, and deletion.
- Implement human oversight and fact-checking processes, especially for critical or sensitive information. Train human reviewers to avoid overreliance on AI-generated content.
- Identify the risks and possible harms associated with LLM-generated content, and clearly communicate these risks and limitations to users.
Robust access controls are critical to limit LLM usage to authorized personnel and systems. Implementing clear permissions and multi-factor authentication (where possible) ensures sensitive data are accessible only to those with explicit clearance.
- Enforce strict user authentication and access controls to prevent unauthorized interactions with the AI/LLM system. This reduces the risk of misuse or exposure to malicious actors.
- Safeguard APIs used to interact with LLMs by implementing HTTPS protocols, API keys, rate limiting, and activity logging to prevent abuse or unauthorized access.
- Implement human-in-the-loop controls for privileged operations to prevent unauthorized actions.
- Implement authorization in downstream systems rather than relying on the LLM to decide whether an action is allowed. Enforce the complete mediation principle so that all requests made to downstream systems via extensions are validated against security policies.
- Limit the permissions that the system or extensions it uses are granted to other systems to the minimum necessary.
- Ensure actions taken on behalf of a user are executed on downstream systems in the context of that specific user, and with the minimum privileges necessary.
- Implement fine-grained access controls and permission-aware vector and embedding stores. Ensure strict logical and access partitioning of datasets in the vector database to prevent unauthorized access between different classes of users or different groups.
- Restrict the LLM's access to network resources, internal services, and APIs.
Protecting data integrity and privacy is paramount in LLM operations. Enforcing strict data handling policies, anonymization techniques, and secure transmission protocols minimizes the risk of exposing sensitive information.
- Implement tools and processes to automatically validate key outputs, especially output from high-stakes environments.
- Ensure compliance with privacy regulations (e.g., GDPR, CCPA) by implementing robust data anonymization, encryption, and access controls.
- Maintain a clear record of the data used for training and the decisions made by the LLM to support traceability, accountability, and debugging in the event of issues.
- Implement mechanisms to sanitize and validate outputs to prevent the dissemination of harmful, biased, or sensitive information generated by the model.
- Ensure that the LLM adheres to ethical standards by minimizing biases, preventing harm, and avoiding outputs that could lead to legal or reputational issues.
- Specify clear output formats, request detailed reasoning and source citations, and use deterministic code to validate adherence to these formats.
- Implement input and output filtering: Apply semantic filters and use string-checking to scan for restricted and sensitive content. Evaluate responses using the RAG Triad: Assess context relevance, groundedness, and question/answer relevance to identify potentially malicious outputs. Retrieval Augmented Generation (RAG) is a model adaptation technique that enhances the performance and contextual relevance of responses from LLM Applications, by combining pretrained language models with external knowledge sources. Retrieval Augmentation uses vector mechanisms and embedding.
- Use Retrieval-Augmented Generation to enhance the reliability of model outputs by retrieving relevant and verified information from trusted external databases during response generation. This helps mitigate the risk of hallucinations and misinformation.
- Separate and clearly denote untrusted content to limit its influence on user prompts.
- Limit model access to external data sources, and ensure runtime data orchestration is securely managed to avoid unintended data leakage.
- Train models using decentralized data stored across multiple servers or devices to minimize the need for centralized data collection and reduce exposure risks.
- Apply techniques that add noise to the data or outputs to make it difficult for attackers to reverse-engineer individual data points.
- Allow users to opt out of having their data included in training processes.
- Use homomorphic encryption to enable secure data analysis and privacy-preserving machine learning to ensure that data remains confidential while being processed by the model.
- Implement tokenization or redaction to preprocess and sanitize sensitive information. Techniques like pattern matching can detect and redact confidential content before processing.
- Track data origins and transformations and verify data legitimacy during all model development stages.
- Validate model outputs against trusted sources to detect signs of model poisoning.
- Implement strict sandboxing to limit model exposure to unverified data sources. Use anomaly detection techniques to filter out adversarial data.
- Use data version control (DVC) to track changes in datasets and detect manipulation to maintain model integrity.
- Store user-supplied information in a vector database, allowing adjustments without re- training the entire model.
- During inference, integrate Retrieval-Augmented Generation (RAG) and grounding techniques to reduce risks of hallucinations.
- Treat the model as a user, adopting a zero-trust approach; apply proper input validation on responses from the model to backend functions.
- Employ standard secure software controls (e.g. context-aware output encoding, parameterized queries/prepared statements for database operations, cross-site scripting protections, strict Content Security Policies (CSP)) involving LLM output.
- When combining data from different sources, thoroughly review the combined dataset. Tag and classify data within the knowledge base to control access levels and prevent data mismatch errors.
- Implement strict input validation to ensure that inputs do not exceed reasonable size limits.
- Restrict or obfuscate the exposure of
logit_bias
andlogprobs
in API responses. Provide only the necessary information without revealing detailed probabilities.
Implementing model-specific controls ensures the safe and compliant deployment of LLMs. This includes fine-tuning configurations, mitigating bias, and restricting functionalities that could lead to unintended or harmful outcomes.
- Establish secure coding practices to prevent the integration of vulnerabilities due to incorrect code suggestions.
- Thoroughly vet any extensions which include functions that are not needed for the intended operation of the system (e.g. an extension used includes the ability to modify and delete documents in a repository or run a specific shell command but fails to prevent other shell commands).
- Restrict extensions' permissions on downstream systems that are not needed for the system (e.g. an extension intended to read data connects to a database using an identity that also has UPDATE, INSERT, and DELETE permissions).
- Employ independent verification for high-impact actions (e.g. models or extensions that allow a user's documents to be deleted performs deletions without any confirmation from the user).
- Limit the extensions that LLM agents are allowed to call to only the minimum necessary (e.g. a system that does not require the ability to fetch the contents of a URL should not have that functionality or extension).
- Limit the functions implemented in LLM extensions to the minimum necessary.
- Avoid the use of open-ended extensions where possible (e.g. run a shell command or fetch a URL) and use extensions with granular functionality.
- Enhance the model with fine-tuning or embeddings to improve output quality. Techniques such as parameter-efficient tuning (PET) and chain-of-thought prompting can help reduce the incidence of misinformation.
- Design APIs and user interfaces that encourage responsible use of LLMs, such as integrating content filters, clearly labeling AI-generated content, and informing users on limitations of reliability and accuracy. Be specific about intended use limitations.
- Train models to detect and mitigate adversarial queries and extraction attempts.
Prompt controls prevent unintended disclosures or misuse of the LLM by enforcing context-aware input restrictions. These measures ensure outputs align with the intended use cases and adhere to ethical and compliance standards.
- Avoid embedding any sensitive information (e.g. API keys, auth keys, database names, user roles, permission structure of the application) directly in the system prompts. Externalize such information to the systems that the model does not directly access.
- Avoid using system prompts to control the model behavior where possible. Instead, rely on systems outside of the LLM to ensure this behavior (e.g. detecting and preventing harmful content should be done in external systems).
- Implement a system of guardrails outside of the LLM itself. Training particular behavior into a model does not guarantee that the model will always adhere to the guard. Use an independent system to inspect the output and determine if the model is in compliance with expectations.
- Ensure that security controls are enforced independently from the LLM (e.g. privilege separation or authorization bounds checks), and occur in a deterministic, auditable manner (which LLMs are not conducive to). In cases where an agent is performing tasks, multiple agents should be used, each configured with the least privileges needed to perform the desired tasks.
Maintaining the integrity of systems interacting with LLMs safeguards against vulnerabilities and cyberattacks. Regular updates, patch management, and strict configuration controls ensure that systems remain resilient and trustworthy.
- Protect the LLM from unauthorized modifications or adversarial attacks that could compromise its behavior. Use cryptographic hashing and verification techniques to maintain the integrity of the model files.
- Secure the runtime environment where the LLM is deployed. This includes containerization, network segmentation, and regular updates to prevent vulnerabilities in underlying infrastructure.
- Use encryption and digital rights management when sharing models or collaborating across organizations to prevent unauthorized use or tampering.
- Deploy defenses such as query rate limiting/throttling and response obfuscation to prevent attackers from reconstructing the LLM through repeated queries.
- Enforce strict context adherence, limit responses to specific tasks or topics, and instruct the model to ignore attempts to modify core instructions.
- Prevent leaking sensitive information through error messages or configuration details.
Securing the supply chain of LLM development and deployment protects against compromised dependencies. Vetting vendors, verifying software provenance, and continuous monitoring of third-party integrations reduce the risk of malicious infiltration.
- Carefully vet data sources and suppliers. Regularly review and audit supplier security and access for changes in their security posture or terms and conditions.
- Employ vulnerability scanning, management, and patching components. For development environments with access to sensitive data, apply these controls in those environments, too.
- Maintain a system component inventory using a Software Bill of Materials (SBOM) to ensure there is an up-to-date, accurate, and signed inventory of components, preventing tampering with deployed packages or known vulnerabilities.
- Maintain a system component license inventory and conduct regular audits of all software, tools, and datasets to ensure compliance and transparency.
- Use models only from verifiable sources and use model integrity checks with signing and file hashes to compensate for the lack of strong model provenance. Use code signing for externally supplied code.
- Implement a patching policy to mitigate vulnerable or outdated components. Ensure the system uses a maintained version of APIs and the underlying model.
- Encrypt models with integrity checks and use vendor attestation APIs to prevent tampered apps and models and terminate applications of unrecognized software.
Comprehensive logging and real-time monitoring enable swift detection and response to anomalies in LLM interactions. These practices are essential for auditing, compliance, and maintaining overall system security.
- Maintain detailed immutable logs of retrieval activities to detect and respond promptly to suspicious behavior.
- Employ systems to monitor for unusual activities or anomalies during LLM operation. This can include detecting abnormal request patterns or unexpected model outputs.
- Implement strict monitoring and auditing practices for collaborative model development environments to prevent and quickly detect any abuse.
- Monitor and manage resource allocation dynamically to prevent any single user or request from consuming excessive resources.
Continuous testing of LLM implementations ensures that security measures remain effective against evolving threats. Rigorous penetration tests, scenario simulations, and validation of outputs ensure the LLM performs reliably within the regulated environment.
- Conduct adversarial testing and attack simulations: perform regular penetration testing and breach simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.
- Employ anomaly detection and adversarial robustness tests on supplied models and data can help detect tampering and poisoning.
- Test model robustness with red team campaigns and adversarial techniques, such as federated learning, to minimize the impact of data perturbations.
- Monitor training loss and analyze model behavior for signs of poisoning. Use thresholds to detect anomalous outputs.
Addressing these security requirements ensures that AI/LLM systems remain trustworthy and resilient against evolving threats. As these technologies continue to advance, ongoing vigilance and updates to security practices are essential to mitigate emerging risks and maintain user confidence.
- MITRE. "Adversarial Threat Landscape for Artificial-Intelligence Systems (ATLAS)." Framework for understanding and mitigating adversarial threats in AI systems.
- National Institute of Standards and Technology (NIST). "AI Risk Management Framework." Provides guidelines for managing AI risks, including security and ethical considerations.
- OpenAI. "OpenAI API: Safety and Security Best Practices." Documentation on API security practices for deploying LLMs safely.
- Google AI. "Responsible AI Practices." Offers guidance on creating and deploying AI systems ethically and securely.
- Microsoft. "Securing AI Systems." A whitepaper detailing the strategies for protecting AI systems against threats.
- ISO/IEC 27001. "Information Security Management Systems." International standard for managing information security risks, applicable to AI data handling.
- European Commission. (2018). "General Data Protection Regulation (GDPR)." Outlines requirements for data privacy and protection, applicable to AI systems.
- Goodfellow, I., Shlens, J., & Szegedy, C. (2015). "Explaining and Harnessing Adversarial Examples." Discusses adversarial attacks and defenses relevant to machine learning models.
- Papernot, N., McDaniel, P., et al. (2016). "Distillation as a Defense to Adversarial Perturbations." Explores techniques for defending machine learning models against adversarial attacks.
- The Partnership on AI. "Tenets for Responsible AI." A set of principles for developing AI systems that prioritize safety and ethics.