Domain overview
This domain is built for operators who want to test AI systems like real systems, not like demos. The model is only one node in the chain. The real attack surface lives across hidden instructions, retrieval, embeddings, long-context ingestion, agent memory, connector permissions, tool execution, approval gates, orchestration code and the humans who over-trust the output.
Good AI assessment work combines application security, API review, auth logic, cloud exposure and workflow abuse with model-specific pressure. Prompt injection, context poisoning, output steering, authority confusion, unsafe tool invocation, retrieval exfiltration and agent compromise are all just different ways of asking whether language can seize control of automation.
Primary operator questions
- Can untrusted content override or reshape the hidden instruction hierarchy?
- Can a retrieved document, email, web page or ticket poison the model's planning path?
- Can the assistant call tools, query data stores or send actions with more authority than it should?
- Can model output be trusted by code, analysts or business workflows without verification?
- Can the system be pushed from harmless chat into data exposure, lateral movement or destructive action?
Red-team pressure lines
Useful pressure usually follows five lanes. First, instruction attacks: direct prompt injection, indirect prompt injection, jailbreak chaining and system prompt leakage. Second, retrieval attacks: poisoned corpora, malicious documents, embedded instructions and confidence laundering through RAG. Third, agent abuse: unauthorized tool use, connector overreach, action replay and confirmation bypass. Fourth, API and inference weaknesses: weak auth, file-handling mistakes, quota abuse, plugin boundaries and tenant leakage. Fifth, reporting discipline: proving whether the behavior is reachable, repeatable and tied to real business impact.
Related certification context
These certifications are not the point of the domain, but they are useful orientation anchors for operators who want a formal practice path beside the field notes.
- OffSec OSAI+ / AI-300 · Advanced AI Red TeamingClosest fit for offensive work against LLMs, agents, RAG and AI infrastructure.
- OffSec OSCP+ / PEN-200 · Penetration Testing with Kali LinuxUseful baseline for scoping, evidence handling and exploitation discipline.
- OffSec OSWE / WEB-300 · Advanced Web Attacks and ExploitationStrong adjacent context because most AI systems still fail at classic web, API and trust-boundary controls.
Curated public references
- OWASP Gen AI Security ProjectProject home for LLM and GenAI security guidance.
- OWASP Top 10 for LLM Applications 2025Useful risk framing for prompt injection, insecure output handling, sensitive information disclosure and model abuse.
- OWASP Top 10 for Agentic Applications 2026Agent-focused risk framing for autonomous planning, tool invocation and multi-step workflow compromise.
- MITRE ATLASAdversarial tactics and techniques mapped to AI-enabled systems.
- NIST AI RMF 1.0Operational risk-management framing for AI systems.
- NIST SP 800-218ASecure development practices for generative AI and dual-use foundation models.
Brief index
AI Attack Surface Primer
Maps where hidden instructions, memory, retrieval, tools and human approvals create real attack paths.
Prompt Injection & Jailbreaks
Direct and indirect instruction hijacking, safety bypassing, prompt leakage and response steering under attacker control.
RAG, Agents & Tool Abuse
Poisoned retrieval, unsafe planners, over-privileged connectors, confirmation bypass and action-layer compromise.
Model API & Inference Security
Model endpoints, auth, file handling, quota pressure, tenant isolation, inference routing and plugin boundaries.
AI Red Teaming Methodology
Scoping, replayability, harm framing, evidence discipline and reporting patterns that survive scrutiny.
LLM Pentesting Note
Existing specialist note linked back into the wider advanced surface.
