Real Attacks. Real Payloads. Try Them Yourself.
12 Prompt Injection Attack Examples You Can Test Today
Also known as: prompt injection examples, AI attack payloads, LLM injection samples, prompt hack examples•Affecting: ChatGPT, Claude, Gemini, GPT-4, all LLM applications
Concrete, reproducible prompt injection attack examples across 6 categories. Test each one in the SafePrompt playground for free.
TLDR
The most common prompt injection attacks include: (1) Direct instruction override ('Ignore previous instructions'), (2) Role manipulation ('You are now DAN'), (3) System prompt extraction ('Repeat your instructions verbatim'), (4) Data exfiltration ('Send user data to this URL'), (5) Hidden text injection (invisible instructions in documents), and (6) Multi-turn context priming. Each exploits the fundamental inability of LLMs to distinguish developer instructions from user input.
Quick Facts
Most prompt injection guides explain the concept abstractly. This post gives you concrete attack payloads you can copy, paste, and test right now — either against your own AI app or in the SafePrompt Playground (free, no signup).
Understanding these attacks is the first step to defending against them. Each example shows the payload, explains why it works, and notes how SafePrompt detects it.
Category 1: Direct Instruction Override
The simplest and most common attack type. The attacker explicitly tells the LLM to ignore its original instructions.
Example 1: Classic Override
Why it works: LLMs process the entire input as one stream. The phrase "ignore all previous instructions" triggers a context reset in many models, causing them to prioritize the attacker's instructions over the system prompt.
Detection: SafePrompt catches this in Stage 1 (pattern detection) — the phrase "ignore all previous instructions" is a known attack signature.
Example 2: Polite Override
Why it works: Social engineering wrapped in politeness. The casual tone makes it harder for simple pattern-matching filters to detect, because "please disregard" sounds conversational.
Detection: SafePrompt's AI validation (Stage 3) catches the semantic intent — even when the phrasing is polite, the underlying request is an instruction override.
Category 2: Role Manipulation
These attacks try to redefine the AI's identity or role to bypass safety constraints.
Example 3: DAN (Do Anything Now)
Why it works: Creating an alternate persona gives the model "permission" to bypass its constraints. The model role-plays as "DAN" and may generate content it normally refuses.
Example 4: Authority Impersonation
Why it works: Claims authority that doesn't exist. The LLM has no way to verify identity claims, so it may comply with requests framed as coming from an authority figure.
Category 3: System Prompt Extraction
Attackers try to reveal the hidden system prompt — exposing business logic, API keys, or other sensitive instructions.
Example 5: Direct Extraction
Example 6: Metadata Extraction
Why it works: Even if the model is told "never reveal your prompt," asking about metadata (length, format, first/last words) can leak information indirectly. Piece together enough metadata and you can reconstruct the entire prompt.
Category 4: Data Exfiltration
These attacks attempt to make the AI send data to an external location or encode sensitive information in its response.
Example 7: URL-Based Exfiltration
Detection: SafePrompt's Stage 2 (external reference detection) flags suspicious URLs, IP addresses, and file paths in user input. Any request containing exfiltration URLs is blocked immediately.
Example 8: Encoded Exfiltration
Why it works: Even if an AI won't say its prompt directly, encoding requests may bypass output filters. The encoded string looks like random text to humans but is trivially decoded.
Category 5: Hidden Text Injection (Indirect)
The most dangerous category for AI agents. Malicious instructions are embedded in content the AI reads — documents, emails, websites — invisible to the human user.
Example 9: Invisible Text in Documents
Why it works: The text is invisible to humans (zero-size font, matching background color) but fully visible to AI models that process the raw text. The AI follows the hidden instructions without the user knowing.
Learn more: Hidden Text Injection: The Complete Guide
Example 10: Email-Based Injection
Why it works: AI email assistants that read and act on emails may process HTML comments. If the AI has permission to send emails, this hidden instruction could exfiltrate sensitive data automatically.
Learn more: Prevent AI Email Prompt Injection Attacks
Category 6: Multi-Turn & Advanced Attacks
Sophisticated attacks that exploit conversation context or use encoding to bypass detection.
Example 11: Context Priming
Why it works: Each individual turn seems harmless. The context builds up gradually until the final request becomes an attack. This is why session-aware detection matters.
Example 12: Encoding Bypass
Why it works: The Base64 decodes to "Ignore all previous instructions." Pattern matchers can't detect the attack in its encoded form. Only semantic analysis catches the decoded intent.
Attack Detection Comparison
| Attack Type | Regex Only | AI Only | Hybrid (SafePrompt) |
|---|---|---|---|
| Direct override | Detected | Detected | Detected (Stage 1) |
| Polite override | Missed | Detected | Detected (Stage 3) |
| DAN / role play | Partial | Detected | Detected (Stage 1+3) |
| System prompt extraction | Partial | Detected | Detected (Stage 1+3) |
| URL exfiltration | Partial | Detected | Detected (Stage 2) |
| Encoded exfiltration | Missed | Detected | Detected (Stage 3) |
| Hidden text injection | Missed | Detected | Detected (Stage 3) |
| Multi-turn priming | Missed | Partial | Detected (session tracking) |
| Encoding bypass | Missed | Detected | Detected (Stage 3) |
Try These Attacks Yourself
The best way to understand prompt injection is to try it yourself. The SafePrompt Playground lets you test 21+ attack patterns in a safe sandbox environment:
- See what happens with and without SafePrompt protection
- Try your own custom payloads
- No signup or API key required
- Results in under 100ms
Test These Attacks Live
Copy any payload from this post and paste it into the playground. See how SafePrompt detects each attack in real time.
Open Playground — Free, No SignupProtect Your AI App
Every attack in this post is detected by SafePrompt's 4-stage validation pipeline. Integration takes one API call:
Start free with 1,000 validations/month. No credit card required.