Foundry - Guardrail jailbreak / prompt-injection detected

Foundry Guardrail Jailbreak Detected

Query

AppDependencies
| where isnotempty(Properties["microsoft.foundry.content_filter.results"])
| extend
    Agent     = tostring(Properties["gen_ai.agent.name"]),
    Model     = tostring(Properties["gen_ai.request.model"]),
    ConvId    = tostring(Properties["gen_ai.conversation.id"]),
    ProjectId = tostring(Properties["microsoft.foundry.project.id"]),
    Prompt    = tostring(Properties["gen_ai.input.messages"]),
    ToolName  = tostring(Properties["gen_ai.tool.name"]),
    FilterArr = todynamic(tostring(Properties["microsoft.foundry.content_filter.results"]))
| mv-expand Entry = FilterArr
| extend
    SourceType = tostring(Entry.source_type),
    Blocked    = tobool(Entry.blocked),
    Filter     = todynamic(Entry.content_filter_results)
| extend
    JailbreakDetected = tobool(Filter.jailbreak.detected) or tobool(Filter.jailbreak.filtered),
    PromptShieldHit   = tobool(Filter.prompt_shield.detected) or tobool(Filter.prompt_shield.filtered),
    IndirectAttackHit = tobool(Filter.indirect_attack.detected) or tobool(Filter.indirect_attack.filtered)
| where JailbreakDetected or PromptShieldHit or IndirectAttackHit
| extend Signal = case(
    JailbreakDetected, "Jailbreak",
    PromptShieldHit,   "PromptShield",
    IndirectAttackHit, "IndirectPromptInjection",
    "Unknown")
| extend AccountName = iff(isempty(Agent), "unknown-agent", Agent)
| project
    TimeGenerated, Signal, SourceType, Blocked, AccountName, Agent, Model, ProjectId,
    ConvId, ToolName, Prompt
| order by TimeGenerated desc

Explanation

This query is designed to monitor and detect attempts to bypass security measures in a system that uses Foundry or Agent Service. It specifically looks for activities that try to override the system's instructions, such as attempts to extract system prompts, disable safety features, or insert unauthorized instructions through various means.

Here's a simplified breakdown of what the query does:

Data Source: It uses data from Application Insights, specifically looking at application dependencies.
Detection Criteria: The query checks for specific security alerts (guardrail verdicts) related to:
- Jailbreak attempts (trying to override system instructions).
- Prompt Shield hits (attempts to manipulate prompts).
- Indirect prompt injections (cross-document attempts to insert unauthorized instructions).
Process:
- It extracts relevant information such as agent name, model, conversation ID, project ID, prompt text, and tool name.
- It evaluates whether any of the security alerts (jailbreak, prompt shield, indirect attack) are detected.
- If any of these alerts are detected, it logs the incident with details like the type of signal detected, source type, whether the action was blocked, and other contextual information.
Alert Configuration:
- The query runs every hour and triggers an alert if any incidents are detected.
- The severity of the alert is set to high.
- It creates incidents in the system for further investigation and groups them by account for better management.
Purpose: The main goal is to ensure that any attempts to compromise the system's security through prompt manipulation or instruction overrides are quickly identified and addressed.

Overall, this query helps maintain the integrity and security of the system by actively monitoring for and responding to potential security breaches related to prompt manipulation and instruction overrides.

Details

David Alonso

Released: June 8, 2026

Tables

AppDependencies

Keywords

FoundryAgentServiceAppDependenciesPropertiesGuardrailVerdictsMicrosoftAzureTracingGenAIContentRecordingApplicationInsightsDefenseEvasionInitialAccessExecutionJailbreakPromptShieldIndirectAttackAccountCloudSentinelAsCodeCustomSafetyGuardrails

Operators

isnotemptyextendtostringtodynamicmv-expandtoboolorwherecaseiffisemptyprojectorder by

Severity

High

Tactics

DefenseEvasionInitialAccessExecution

MITRE Techniques

T1562 T1059

Frequency: PT1H

Period: PT1H

Actions

GitHub

KQL Search