Query Details

Copilot Jailbreak Attempt Detection

Query

id: 5a8c3f64-3a6d-4b8e-9d7f-0c1a2b3c4d51
name: Microsoft 365 Copilot - Jailbreak attempt against AI agent
description: |
  Detects Copilot messages that Microsoft's Prompt Shield has flagged
  as jailbreak attempts (LLMEventData.Messages[].JailbreakDetected).
  Typical examples are persona takeovers such as DAN / DUDE / STAN,
  "ignore safety", "developer mode", policy-bypass framings, and
  requests that the agent disclose its own system prompt.

  Jailbreaks are a common precursor to agent-driven exfiltration
  and toxic-output incidents. This rule surfaces flagged jailbreaks,
  whether or not they succeeded, so the SOC can revoke the session
  and review which tools the agent invoked afterwards.
severity: High
requiredDataConnectors:
- connectorId: MicrosoftCopilot
  dataTypes:
  - CopilotActivity
queryFrequency: PT15M
queryPeriod: PT1H
triggerOperator: gt
triggerThreshold: 0
enabled: true
tactics:
- DefenseEvasion
- InitialAccess
- Execution
relevantTechniques:
- T1562
- T1059
query: |
  // Uses Microsoft's native Prompt Shield verdict surfaced in
  // LLMEventData.Messages[].JailbreakDetected (confirmed by schema probe).
  CopilotActivity
  | where TimeGenerated > ago(1h)
  | where RecordType == "CopilotInteraction"
  | extend ThreadId = tostring(LLMEventData.ThreadId)
  | extend AppHost = tostring(LLMEventData.AppHost)
  | mv-expand m = LLMEventData.Messages
  | extend
      MessageId = tostring(m.Id),
      IsPrompt = tobool(m.isPrompt),
      JailbreakDetected = tobool(m.JailbreakDetected)
  | where JailbreakDetected == true
  | summarize
      JailbreakHits = count(),
      PromptHits = countif(IsPrompt),
      Threads = make_set(ThreadId, 16),
      MessageIds = make_set(MessageId, 32),
      AppHosts = make_set(AppHost, 8),
      ClientIPs = make_set(SrcIpAddr, 16),
      FirstSeen = min(TimeGenerated),
      LastSeen = max(TimeGenerated)
      by AgentId, AgentName, ActorName, ActorUserId, TenantId
  | extend SrcIpAddr = tostring(ClientIPs[0])
entityMappings:
- entityType: CloudApplication
  fieldMappings:
  - identifier: Name
    columnName: AgentName
  - identifier: AppId
    columnName: AgentId
- entityType: Account
  fieldMappings:
  - identifier: Name
    columnName: ActorName
- entityType: IP
  fieldMappings:
  - identifier: Address
    columnName: SrcIpAddr
eventGroupingSettings:
  aggregationKind: SingleAlert
incidentConfiguration:
  createIncident: true
  groupingConfiguration:
    enabled: true
    reopenClosedIncident: false
    lookbackDuration: PT5H
    matchingMethod: Selected
    groupByEntities:
    - Account
    - CloudApplication
    groupByAlertDetails: []
    groupByCustomDetails: []
version: 1.0.0
kind: Scheduled
tags:
- Sentinel-As-Code
- Custom
- Copilot
- AI

Explanation

This query is designed to detect and alert on potential "jailbreak" attempts against Microsoft 365 Copilot, an AI agent. Here's a simplified breakdown:

  1. Purpose: The query surfaces Copilot messages that Microsoft's Prompt Shield has flagged as jailbreak attempts, such as efforts to bypass safety controls or requests for the agent to reveal its system prompt. The rule does no pattern matching of its own; it relies on the JailbreakDetected verdict attached to each message. Such attempts often precede unauthorized data access or harmful outputs.

  2. Severity: The alert is marked as high severity, indicating the importance of addressing these attempts promptly.

  3. Data Source: It reads the CopilotActivity table supplied by the MicrosoftCopilot data connector and keeps only records whose RecordType is "CopilotInteraction".

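  4. Frequency and Period: The rule runs every 15 minutes (queryFrequency: PT15M) over the previous hour of data (queryPeriod: PT1H). Because consecutive windows overlap, the same flagged message can be picked up by up to four consecutive runs.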

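  5. Detection Logic:

    • It expands LLMEventData.Messages with mv-expand so that each message becomes its own row, then keeps only rows where JailbreakDetected is true.
    • It counts the flagged messages (JailbreakHits), counts how many of them were user prompts (PromptHits), collects the threads, message IDs, app hosts, and client IP addresses involved, and records when the flagged messages were first and last seen.

  6. Output: The results are grouped by AgentId, AgentName, ActorName, ActorUserId, and TenantId, giving one row per agent and actor combination that shows who may be attempting the jailbreak and from where. The first address in ClientIPs is copied into SrcIpAddr for entity mapping.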

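  7. Alerting and Incident Management:

    • If the query returns any rows (triggerThreshold: 0 with triggerOperator: gt), the run produces a single alert (aggregationKind: SingleAlert) and an incident is created.
    • Alerts that share the same Account and CloudApplication entities within a 5-hour lookback (lookbackDuration: PT5H) are grouped into one incident. Closed incidents are not reopened.

  8. Entity Mapping: The query maps AgentName and AgentId to a CloudApplication entity, ActorName to an Account entity, and SrcIpAddr to an IP entity, which gives alerts more context.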

  9. Version and Tags: This is version 1.0.0 of the rule, a Scheduled rule tagged Sentinel-As-Code, Custom, Copilot, and AI.
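
The detection logic in steps 5 and 6 can be tried without live data. The sketch below is a minimal, hedged example: the table name SampleCopilotActivity, the user names, agent name, IP addresses, and message IDs are all hypothetical, and the shape of LLMEventData is assumed to match what the rule's query reads (ThreadId plus a Messages array with Id, isPrompt, and JailbreakDetected). It keeps only a subset of the rule's summarize columns.

```kql
// Hypothetical sample rows; real CopilotActivity records carry more columns.
let SampleCopilotActivity = datatable(
    TimeGenerated: datetime, RecordType: string, ActorName: string,
    AgentName: string, SrcIpAddr: string, LLMEventData: dynamic)
[
    datetime(2026-05-20 10:00:00), "CopilotInteraction", "alice@contoso.example", "HelpdeskAgent", "203.0.113.10",
        dynamic({"ThreadId": "t-1", "Messages": [
            {"Id": "m-1", "isPrompt": true,  "JailbreakDetected": true},
            {"Id": "m-2", "isPrompt": false, "JailbreakDetected": false}]}),
    datetime(2026-05-20 10:05:00), "CopilotInteraction", "alice@contoso.example", "HelpdeskAgent", "203.0.113.10",
        dynamic({"ThreadId": "t-2", "Messages": [
            {"Id": "m-3", "isPrompt": true,  "JailbreakDetected": true}]}),
    datetime(2026-05-20 10:07:00), "CopilotInteraction", "bob@contoso.example", "HelpdeskAgent", "198.51.100.7",
        dynamic({"ThreadId": "t-3", "Messages": [
            {"Id": "m-4", "isPrompt": true,  "JailbreakDetected": false}]})
];
SampleCopilotActivity
| where RecordType == "CopilotInteraction"
| extend ThreadId = tostring(LLMEventData.ThreadId)
// One row per message, so each message's verdict can be filtered individually.
| mv-expand m = LLMEventData.Messages
| extend
    IsPrompt = tobool(m.isPrompt),
    JailbreakDetected = tobool(m.JailbreakDetected)
| where JailbreakDetected == true
| summarize
    JailbreakHits = count(),
    PromptHits = countif(IsPrompt),
    Threads = make_set(ThreadId, 16),
    ClientIPs = make_set(SrcIpAddr, 16)
    by AgentName, ActorName
```

With these rows, only the two flagged prompts from the first actor should survive the filter, so the result is expected to be a single grouped row. The unflagged reply in thread t-1 and the third record are dropped after mv-expand.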

In essence, this query helps security teams quickly identify and respond to potential security threats involving AI misuse within Microsoft 365 Copilot.
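
The rule description mentions reviewing what the agent did after a jailbreak. One possible follow-up hunt, sketched here under assumptions, pulls every later interaction in the flagged threads. It reuses only the table and columns already referenced by the rule; the 1-hour lookback and the names lookback, FlaggedThreads, and FirstJailbreak are illustrative choices, not part of the original rule.

```kql
// Hypothetical follow-up hunt, not part of the published rule.
let lookback = 1h;
// Threads that contain at least one message flagged by Prompt Shield,
// with the time of the earliest flagged record in each thread.
let FlaggedThreads =
    CopilotActivity
    | where TimeGenerated > ago(lookback)
    | where RecordType == "CopilotInteraction"
    | extend ThreadId = tostring(LLMEventData.ThreadId)
    | mv-expand m = LLMEventData.Messages
    | where tobool(m.JailbreakDetected) == true
    | summarize FirstJailbreak = min(TimeGenerated) by ThreadId;
// All interactions in those threads from the first flagged record onwards.
CopilotActivity
| where TimeGenerated > ago(lookback)
| where RecordType == "CopilotInteraction"
| extend ThreadId = tostring(LLMEventData.ThreadId)
| join kind=inner FlaggedThreads on ThreadId
| where TimeGenerated >= FirstJailbreak
| project TimeGenerated, ThreadId, ActorName, AgentName, LLMEventData
| order by ThreadId asc, TimeGenerated asc
```

Widening the lookback trades query cost for coverage of longer sessions. Which fields inside LLMEventData record tool invocations depends on the schema in your tenant, so the sketch returns the whole object rather than guessing at field names.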

Details

David Alonso

Released: May 20, 2026

Tables

CopilotActivity

Keywords

CopilotActivity, CloudApplication, Account, IP

Operators

ago, ==, tostring, mv-expand, tobool, summarize, count, countif, make_set, min, max, extend, by
