Query Details

Foundry Guardrail Jailbreak Detected

Query

id: 5e6f7081-4444-4ddd-9201-0123456789d1
name: Foundry - Guardrail jailbreak / prompt-injection detected
description: |
  Raises an incident when a Foundry / Agent Service run trips a guardrail
  that detects an attempt to override the agent's instructions: Prompt
  Shields jailbreak detection or indirect (cross-document) prompt
  injection. These are the highest-fidelity signals that someone is
  trying to exfiltrate the system prompt, disable safety, or smuggle
  instructions through tool / RAG content.

  Reads the real Foundry telemetry shape: spans land in AppDependencies
  with the property bag in Properties; guardrail verdicts in
  microsoft.foundry.content_filter.results. Sub-key naming varies by API
  version, so jailbreak / prompt_shield / indirect_attack are all parsed
  defensively. Requires AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED
  for the prompt text to be populated.
severity: High
requiredDataConnectors:
- connectorId: ApplicationInsights
  dataTypes:
  - AppDependencies
queryFrequency: PT1H
queryPeriod: PT1H
triggerOperator: gt
triggerThreshold: 0
enabled: true
tactics:
- DefenseEvasion
- InitialAccess
- Execution
relevantTechniques:
- T1562
- T1059
query: |
  AppDependencies
  | where isnotempty(Properties["microsoft.foundry.content_filter.results"])
  | extend
      Agent     = tostring(Properties["gen_ai.agent.name"]),
      Model     = tostring(Properties["gen_ai.request.model"]),
      ConvId    = tostring(Properties["gen_ai.conversation.id"]),
      ProjectId = tostring(Properties["microsoft.foundry.project.id"]),
      Prompt    = tostring(Properties["gen_ai.input.messages"]),
      ToolName  = tostring(Properties["gen_ai.tool.name"]),
      FilterArr = todynamic(tostring(Properties["microsoft.foundry.content_filter.results"]))
  | mv-expand Entry = FilterArr
  | extend
      SourceType = tostring(Entry.source_type),
      Blocked    = tobool(Entry.blocked),
      Filter     = todynamic(Entry.content_filter_results)
  | extend
      JailbreakDetected = tobool(Filter.jailbreak.detected) or tobool(Filter.jailbreak.filtered),
      PromptShieldHit   = tobool(Filter.prompt_shield.detected) or tobool(Filter.prompt_shield.filtered),
      IndirectAttackHit = tobool(Filter.indirect_attack.detected) or tobool(Filter.indirect_attack.filtered)
  | where JailbreakDetected or PromptShieldHit or IndirectAttackHit
  | extend Signal = case(
      JailbreakDetected, "Jailbreak",
      PromptShieldHit,   "PromptShield",
      IndirectAttackHit, "IndirectPromptInjection",
      "Unknown")
  | extend AccountName = iff(isempty(Agent), "unknown-agent", Agent)
  | project
      TimeGenerated, Signal, SourceType, Blocked, AccountName, Agent, Model, ProjectId,
      ConvId, ToolName, Prompt
  | order by TimeGenerated desc
entityMappings:
- entityType: Account
  fieldMappings:
  - identifier: Name
    columnName: AccountName
- entityType: CloudApplication
  fieldMappings:
  - identifier: Name
    columnName: Model
eventGroupingSettings:
  aggregationKind: SingleAlert
incidentConfiguration:
  createIncident: true
  groupingConfiguration:
    enabled: true
    reopenClosedIncident: false
    lookbackDuration: PT6H
    matchingMethod: Selected
    groupByEntities:
    - Account
    groupByAlertDetails: []
    groupByCustomDetails: []
version: 1.0.0
kind: Scheduled
tags:
- Sentinel-As-Code
- Custom
- Foundry
- AI
- ContentSafety
- Guardrails
- Jailbreak
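
You can replay the per-entry logic of the query locally to see how a given microsoft.foundry.content_filter.results payload would be classified before you deploy the rule. The Python below is a minimal sketch and is not part of the rule. It makes these assumptions:

- The property holds a JSON array of objects with the source_type, blocked, and content_filter_results fields that the query reads.
- The three sub-keys are checked in the same order as the query's case() call.
- KQL tobool() is only approximated: JSON booleans and "true"/"false" strings are handled, and anything else, including null, becomes False.

```python
import json
from typing import Any

# Sub-keys checked by the rule, in the same precedence order as its case() call.
SIGNALS = [
    ("jailbreak", "Jailbreak"),
    ("prompt_shield", "PromptShield"),
    ("indirect_attack", "IndirectPromptInjection"),
]


def to_bool(value: Any) -> bool:
    """Approximate KQL tobool(): JSON booleans and "true"/"false" strings; anything else is False."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return False


def is_hit(filters: dict, key: str) -> bool:
    """Mirror tobool(X.detected) or tobool(X.filtered); a missing sub-key is not a hit."""
    section = filters.get(key)
    if not isinstance(section, dict):
        return False
    return to_bool(section.get("detected")) or to_bool(section.get("filtered"))


def classify(raw_results: str) -> list[dict[str, Any]]:
    """Expand the serialized results array (like mv-expand) and keep guardrail hits."""
    rows = []
    for entry in json.loads(raw_results):
        filters = entry.get("content_filter_results")
        if not isinstance(filters, dict):
            filters = {}
        # First matching sub-key wins, like case(); None means no guardrail hit.
        signal = next((label for key, label in SIGNALS if is_hit(filters, key)), None)
        if signal is None:
            continue  # the rule's where clause drops these entries
        rows.append({
            "Signal": signal,
            "SourceType": entry.get("source_type") or "",
            "Blocked": to_bool(entry.get("blocked")),
        })
    return rows


if __name__ == "__main__":
    # Made-up sample payload; real source_type values may differ.
    sample = json.dumps([
        {"source_type": "prompt", "blocked": True,
         "content_filter_results": {"jailbreak": {"detected": True, "filtered": True}}},
        {"source_type": "completion", "blocked": False,
         "content_filter_results": {"hate": {"filtered": False}}},
    ])
    print(classify(sample))
    # prints [{'Signal': 'Jailbreak', 'SourceType': 'prompt', 'Blocked': True}]
```

The precedence matters. An entry that trips both prompt_shield and indirect_attack is reported only as PromptShield, just as the rule's Signal column would show.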

Explanation

This scheduled Microsoft Sentinel analytics rule raises a High-severity incident when a Foundry or Agent Service run trips a guardrail that detects an attempt to override the agent's instructions. Examples include extracting the system prompt, disabling safety behavior, or smuggling instructions in through tool output or retrieved (RAG) content.

Here's a simplified breakdown of what the query does:

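  1. Data Source: It reads the AppDependencies table supplied by the Application Insights connector, where Foundry spans land with their attributes in the Properties bag. It keeps only spans that carry a microsoft.foundry.content_filter.results value.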

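  2. Detection Criteria: The query parses the guardrail verdicts in microsoft.foundry.content_filter.results and looks for three sub-keys. Sub-key naming varies by API version, so all three are parsed defensively:

    • jailbreak: a direct attempt to override the system instructions.
    • prompt_shield: a Prompt Shields detection reported under this name.
    • indirect_attack: an indirect (cross-document) prompt injection, where instructions are hidden in tool or RAG content.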
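
  3. Process:

    • It extracts the agent name, model, conversation ID, project ID, prompt text, and tool name from Properties. The prompt text is populated only when AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED is enabled.
    • It expands the filter-results array with mv-expand so each entry is evaluated separately. A sub-key counts as a hit when either its detected or its filtered flag is true.
    • Each matching entry is labeled with a Signal: Jailbreak, PromptShield, or IndirectPromptInjection. These are checked in that order, so an entry that trips several guardrails reports only the first. The entry is then projected with its source type, whether it was blocked, and the agent, model, project, conversation, tool, and prompt context.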
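
  4. Alert Configuration:

    • The rule runs every hour over the previous hour of data (queryFrequency and queryPeriod are both PT1H). It fires whenever the query returns at least one row (triggerOperator gt, triggerThreshold 0).
    • Severity is High.
    • Each run produces a single alert (aggregationKind: SingleAlert), and incidents are created automatically. Alerts that share the same Account entity within a six-hour lookback are grouped into one incident. The Account entity is the agent name, or "unknown-agent" when the name is missing. Closed incidents are not reopened.
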
  5. Purpose: The goal is to surface attempts to compromise an agent through prompt manipulation or instruction overrides quickly enough for analysts to investigate and respond.
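
The grouping settings in step 4 can be approximated with a small simulation. This Python sketch is a simplified model, not Sentinel's implementation. It assumes the following:

- Each alert carries exactly one Account. In practice, a SingleAlert can bundle several agents from the same run.
- The six-hour lookback is measured from an incident's first alert.
- No incident is closed in the meantime.

The agent names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

LOOKBACK = timedelta(hours=6)  # mirrors lookbackDuration: PT6H


@dataclass
class Incident:
    account: str
    first_alert: datetime
    alert_times: list[datetime] = field(default_factory=list)


def group_alerts(alerts: list[tuple[datetime, str]]) -> list[Incident]:
    """Group (alert_time, AccountName) pairs into incidents, one Account per alert.

    Simplifying assumptions: the lookback window is measured from the first
    alert of an incident, and no incident is closed in between, so
    reopenClosedIncident never comes into play.
    """
    latest: dict[str, Incident] = {}
    incidents: list[Incident] = []
    for when, account in sorted(alerts):
        current = latest.get(account)
        if current is None or when - current.first_alert > LOOKBACK:
            # No incident for this account yet, or its window has passed: open a new one.
            current = Incident(account=account, first_alert=when)
            latest[account] = current
            incidents.append(current)
        current.alert_times.append(when)
    return incidents


if __name__ == "__main__":
    t0 = datetime(2026, 6, 8, 9, 0)
    # Hypothetical agent names; each tuple stands for one hourly alert.
    alerts = [
        (t0, "support-agent"),
        (t0 + timedelta(hours=1), "support-agent"),
        (t0 + timedelta(hours=7), "support-agent"),
        (t0 + timedelta(hours=1), "billing-agent"),
    ]
    for incident in group_alerts(alerts):
        print(incident.account, len(incident.alert_times))
    # prints:
    # support-agent 2
    # billing-agent 1
    # support-agent 1
```

AccountName falls back to "unknown-agent". As a result, every run from an agent that does not emit gen_ai.agent.name lands in the same incident within a lookback window.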

Overall, the rule gives defenders an hourly, per-agent view of guardrail trips that indicate prompt manipulation or instruction overrides against Foundry agents.

Details


David Alonso

Released: June 8, 2026

Tables

AppDependencies

Keywords

Foundry, Agent Service, AppDependencies, Properties, Guardrail Verdicts, Microsoft, Azure Tracing GenAI Content Recording, Application Insights, Defense Evasion, Initial Access, Execution, Jailbreak, Prompt Shield, Indirect Attack, Account, CloudApplication, Sentinel-As-Code, Custom, AI, ContentSafety, Guardrails

Operators

isnotempty, extend, tostring, todynamic, mv-expand, tobool, or, where, case, iff, isempty, project, order by
