AI Security // Prompt & Policy Control

Prompt Injection & Jailbreaks

Prompt Injection & Jailbreaks is presented here as a field note for offensive security work. The emphasis is on attack surface, validation logic, common failure patterns, operator choices and the public references worth keeping nearby during a live assessment.

field noteassessment referencepublic sources

Why it matters in practice

Prompt Injection & Jailbreaks matters because it shapes how an operator scopes the work, chooses validation steps, prioritizes evidence and explains risk. The point is not to accumulate trivia; it is to understand which control boundary is in play and how that boundary can fail under realistic pressure.

This note keeps prompt injection & jailbreaks tied to offensive workflow: what to observe, what to prove, what usually goes wrong, and which references remain useful once an assessment moves from planning into active validation.

Primary coverage

  • Direct injection against the visible chat surface.
  • Indirect injection through RAG, browsing, imported files and helpdesk content.
  • System prompt extraction and policy leakage.
  • Safety-evasion chains that rely on roleplay, translation, summarisation or format-shifting.
  • Output steering where the model convinces another component or analyst to take an unsafe step.

Selected public references

Good reporting preserves the exact payload, the preconditions, the response pattern, the trust boundary crossed and the downstream consequence. That is what turns a jailbreak into a security finding instead of a screenshot.

Selected public references