Resources
NVIDIA
- NVIDIA Open-Source Software Helps Developers Add Guardrails to AI Chatbots
- Mitigating Stored Prompt Injection Attacks Against LLM Applications
Check Point Research
IBM Research
Public
- Can Language Models be Instructed to Protect Personal Information?
- Security Weaknesses of Copilot Generated Code in GitHub
- Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models
- Watch Your Language: Large Language Models and Content Moderation
- Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
- Knowledge Sanitization of Large Language Models
- Model Leeching: An Extraction Attack Targeting LLMs
- Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection
- Demystifying RCE Vulnerabilities in LLM-Integrated Apps
- Certifying LLM Safety against Adversarial Prompting
- Baseline Defenses for Adversarial Attacks Against Aligned Language Models
- A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks
- Building Trust in Conversational AI: A Comprehensive Review and Solution Architecture for Explainable, Privacy-Aware Systems using LLMs and Knowledge Graphs