Query Details

Copilot Jailbreak Multi Turn Hunting

Query

id: 1a02932a-90d3-a1e4-f3d5-628293a2b2c7
name: Microsoft 365 Copilot - Multi-turn jailbreak escalation hunting
description: |
  Hunts for Microsoft 365 Copilot conversations where a single user gradually
  escalates the prompts against an agent: starting with benign
  questions, then adding role-play / persona language, then asking
  for policy-bypass output. Multi-turn jailbreaks evade single-turn
  detectors and are described in the OWASP Top 10 for Agents 2026
  multi-turn-attack category.

  Surfaces threads with two or more jailbreak-flagged messages over
  a one-day window so an analyst can review the full transcript.
query: |
  // Confirmed schema: per-message JailbreakDetected aggregated by ThreadId.
  // A thread with two or more flagged messages is a multi-turn jailbreak
  // attempt (single-turn jailbreaks are already covered by the analytic).
  let window = 1d;
  CopilotActivity
  | where TimeGenerated > ago(window)
  | where RecordType == "CopilotInteraction"
  | extend ThreadId = tostring(LLMEventData.ThreadId)
  | mv-expand m = LLMEventData.Messages
  | extend
      MessageId = tostring(m.Id),
      IsPrompt = tobool(m.isPrompt),
      JbDetected = tobool(m.JailbreakDetected)
  | summarize
      Messages = count(),
      Prompts = countif(IsPrompt),
      JailbreakHits = countif(JbDetected),
      PromptJailbreakHits = countif(JbDetected and IsPrompt),
      Agents = make_set(AgentName, 4),
      Actors = make_set(ActorName, 4),
      JbMessageIds = make_set_if(MessageId, JbDetected, 32),
      FirstSeen = min(TimeGenerated),
      LastSeen = max(TimeGenerated)
      by ThreadId, TenantId
  | where JailbreakHits >= 2
  | extend EscalationRatio = todouble(JailbreakHits) / todouble(Messages)
  | order by JailbreakHits desc, EscalationRatio desc, LastSeen desc
tactics:
  - DefenseEvasion
  - InitialAccess
techniques:
  - T1562
  - T1059
tags:
  - Sentinel-As-Code
  - Custom
  - Copilot
  - AI

Explanation

This query hunts for Microsoft 365 Copilot conversations in which a user gradually escalates prompts in an attempt to bypass the assistant's safety policies. In summary:

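  1. Purpose: The query hunts for conversations where a user starts with harmless questions, moves to role-play or persona language, and eventually asks for policy-bypass output. This type of attack is known as a "multi-turn jailbreak." The query does not inspect prompt text itself; it relies on the per-message JailbreakDetected flag and counts how many flagged messages each thread contains.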

  2. Time Frame: It examines interactions within a one-day window.

  3. Data Source: It looks at records from the CopilotActivity table, specifically those labeled as "CopilotInteraction."

  4. Process:

    • It expands the Messages array so that each message in a conversation thread becomes its own row.
    • For each message it reads whether the message is a prompt (isPrompt) and whether it was flagged as a jailbreak attempt (JailbreakDetected).
    • It aggregates the rows by conversation thread (ThreadId) and tenant (TenantId).

  5. Criteria for Detection:

    • A thread is surfaced if two or more of its messages are flagged as jailbreak attempts. Threads with a single flagged message are left to the existing single-turn analytic.

  6. Output:

    • For each thread it returns the message and prompt counts, the number of flagged messages (overall and prompts only), up to 4 agent names and 4 actor names, up to 32 flagged message IDs, and the first and last event times.
    • It calculates EscalationRatio, the number of flagged messages divided by the total number of messages in the thread.
    • It orders the results by the number of flagged messages, then by escalation ratio, then by the time of the last event, all descending.

  7. Objective: The goal is to surface these potentially malicious conversations so that an analyst can review the full transcript.

  8. Security Context: The query is mapped to the MITRE ATT&CK tactics Defense Evasion and Initial Access, and to the techniques T1562 (Impair Defenses) and T1059 (Command and Scripting Interpreter).

  9. Tags: The query is tagged for use with Sentinel-As-Code, Custom solutions, Copilot, and AI-related activities.
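
The aggregation described in the list above can be hard to picture from the KQL alone, so here is a minimal Python sketch of the same logic that can be run against exported records. It is an illustration, not part of the published query. The function name hunt_multi_turn, the SAMPLE data, and the flattened record shape (ThreadId and Messages at the top level rather than under LLMEventData) are assumptions made for this example. The per-message field names (Id, isPrompt, JailbreakDetected) are taken from the query.

```python
from collections import defaultdict


def hunt_multi_turn(records, min_hits=2):
    """Rank threads that hold at least `min_hits` jailbreak-flagged messages."""
    threads = defaultdict(lambda: {
        "Messages": 0, "Prompts": 0, "JailbreakHits": 0,
        "PromptJailbreakHits": 0, "JbMessageIds": set(),
        "FirstSeen": None, "LastSeen": None,
    })
    for rec in records:
        agg = threads[(rec["ThreadId"], rec["TenantId"])]
        ts = rec["TimeGenerated"]
        for msg in rec["Messages"]:  # one row per message, like mv-expand
            is_prompt = bool(msg.get("isPrompt"))
            flagged = bool(msg.get("JailbreakDetected"))  # missing flag counts as False
            agg["Messages"] += 1
            agg["Prompts"] += is_prompt
            agg["JailbreakHits"] += flagged
            agg["PromptJailbreakHits"] += flagged and is_prompt
            if flagged:
                agg["JbMessageIds"].add(msg["Id"])
            agg["FirstSeen"] = ts if agg["FirstSeen"] is None else min(agg["FirstSeen"], ts)
            agg["LastSeen"] = ts if agg["LastSeen"] is None else max(agg["LastSeen"], ts)

    results = []
    for (thread_id, tenant_id), agg in threads.items():
        if agg["JailbreakHits"] < min_hits:  # same threshold as the query
            continue
        row = {"ThreadId": thread_id, "TenantId": tenant_id, **agg}
        row["EscalationRatio"] = agg["JailbreakHits"] / agg["Messages"]
        results.append(row)
    # Most flagged messages first, then highest ratio, then most recent activity.
    results.sort(
        key=lambda r: (r["JailbreakHits"], r["EscalationRatio"], r["LastSeen"]),
        reverse=True,
    )
    return results


# Hypothetical records: thread t1 escalates over three events, t2 has one hit.
SAMPLE = [
    {"ThreadId": "t1", "TenantId": "tenant-a", "TimeGenerated": "2026-05-20T10:00:00Z",
     "Messages": [{"Id": "m1", "isPrompt": True, "JailbreakDetected": False},
                  {"Id": "m2", "isPrompt": False}]},
    {"ThreadId": "t1", "TenantId": "tenant-a", "TimeGenerated": "2026-05-20T10:05:00Z",
     "Messages": [{"Id": "m3", "isPrompt": True, "JailbreakDetected": True},
                  {"Id": "m4", "isPrompt": False}]},
    {"ThreadId": "t1", "TenantId": "tenant-a", "TimeGenerated": "2026-05-20T10:09:00Z",
     "Messages": [{"Id": "m5", "isPrompt": True, "JailbreakDetected": True}]},
    {"ThreadId": "t2", "TenantId": "tenant-a", "TimeGenerated": "2026-05-20T11:00:00Z",
     "Messages": [{"Id": "m6", "isPrompt": True, "JailbreakDetected": True},
                  {"Id": "m7", "isPrompt": False}]},
]

if __name__ == "__main__":
    for row in hunt_multi_turn(SAMPLE):
        print(row["ThreadId"], row["JailbreakHits"], row["EscalationRatio"])
```

With this sample, only thread t1 passes the threshold, because t2 contains a single flagged message. Timestamps are kept as ISO 8601 strings in one format so that plain string comparison orders them correctly. A real export may need proper datetime parsing.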

Details

David Alonso

Released: May 20, 2026

Tables

CopilotActivity

Keywords

Microsoft, Copilot, Activity, ThreadId, Messages, Prompts, Agents, Actors, TenantId, JailbreakHits, EscalationRatio, TimeGenerated

Operators

let, |, where, >, ago, ==, extend, tostring, mv-expand, tobool, summarize, count(), countif, make_set, make_set_if, min, max, by, >=, todouble, /, order by, desc
