Microsoft 365 Copilot - Jailbreak attempt against AI agent

Copilot Jailbreak Attempt Detection

Query

// Uses Microsoft's native Prompt Shield verdict surfaced in
// LLMEventData.Messages[].JailbreakDetected (confirmed by schema probe).
CopilotActivity
| where TimeGenerated > ago(1h)
| where RecordType == "CopilotInteraction"
| extend ThreadId = tostring(LLMEventData.ThreadId)
| extend AppHost = tostring(LLMEventData.AppHost)
| mv-expand m = LLMEventData.Messages
| extend
    MessageId = tostring(m.Id),
    IsPrompt = tobool(m.isPrompt),
    JailbreakDetected = tobool(m.JailbreakDetected)
| where JailbreakDetected == true
| summarize
    JailbreakHits = count(),
    PromptHits = countif(IsPrompt),
    Threads = make_set(ThreadId, 16),
    MessageIds = make_set(MessageId, 32),
    AppHosts = make_set(AppHost, 8),
    ClientIPs = make_set(SrcIpAddr, 16),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by AgentId, AgentName, ActorName, ActorUserId, TenantId
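// Expose one representative IP as a scalar column (for example, for IP entity mapping).
// make_set() order is undefined, so this is an arbitrary member of ClientIPs.
| extend SrcIpAddr = tostring(ClientIPs[0])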
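
The flattening and aggregation steps of the query above can be tried without access to a live workspace. The following is a minimal sketch, not the rule itself: the SampleActivity table, its three messages, and the user names are invented for illustration, and only a subset of the real columns is modelled. It assumes the same message field names the query uses (Id, isPrompt, JailbreakDetected).

```kql
// Synthetic stand-in for CopilotActivity; every value below is invented.
let SampleActivity = datatable(TimeGenerated: datetime, ActorName: string, LLMEventData: dynamic)
[
    datetime(2026-05-20 10:00:00), "user1@contoso.com",
        dynamic({"ThreadId": "t-1", "Messages": [
            {"Id": "m-1", "isPrompt": true,  "JailbreakDetected": true},
            {"Id": "m-2", "isPrompt": false, "JailbreakDetected": false}
        ]}),
    datetime(2026-05-20 10:05:00), "user2@contoso.com",
        dynamic({"ThreadId": "t-2", "Messages": [
            {"Id": "m-3", "isPrompt": true, "JailbreakDetected": false}
        ]})
];
SampleActivity
// mv-expand emits one row per element of the Messages array.
| mv-expand m = LLMEventData.Messages
| extend
    MessageId = tostring(m.Id),
    IsPrompt = tobool(m.isPrompt),
    JailbreakDetected = tobool(m.JailbreakDetected)
// Keep only the messages carrying a positive jailbreak verdict.
| where JailbreakDetected == true
| summarize
    JailbreakHits = count(),
    PromptHits = countif(IsPrompt),
    MessageIds = make_set(MessageId, 32)
    by ActorName
```

With this sample data only message m-1 survives the filter, so the result is a single row for user1@contoso.com with JailbreakHits = 1 and PromptHits = 1. The second user produces no row because none of their messages were flagged.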

Explanation

This query detects jailbreak attempts against Microsoft 365 Copilot agents and raises an alert when it finds any. A simplified breakdown:

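Purpose: The query identifies Copilot messages that Microsoft's native Prompt Shield has flagged as jailbreak attempts (JailbreakDetected is true), such as attempts to bypass safety protocols or requests for the AI to reveal its internal instructions. The query does not match jailbreak markers itself; it relies on that verdict. These attempts are often precursors to unauthorized data access or harmful outputs.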
Severity: The alert is marked as high severity, indicating the importance of addressing these attempts promptly.
Data Source: The CopilotActivity table, restricted to records whose RecordType is "CopilotInteraction".
Frequency and Period: The query runs every 15 minutes and looks at data from the past hour.
Detection Logic:
- It expands the LLMEventData.Messages array of each interaction into one row per message and keeps only the messages where JailbreakDetected is true.
- It counts the flagged messages (and how many of them were user prompts), collects the thread IDs, message IDs, app hosts, and client IP addresses involved, and records when a flagged message was first and last seen.
Output: One row per combination of AgentId, AgentName, ActorName, ActorUserId, and TenantId, showing who attempted the jailbreak, against which agent, and from which IP addresses.
Alerting and Incident Management:
- If any jailbreak attempts are detected, an incident is created.
- The incidents are grouped by account and application to manage them effectively.
Entity Mapping: The query maps relevant data to entities like Cloud Applications, Accounts, and IP addresses for better context in alerts.
Version and Tags: This is version 1.0.0 of the query, tagged for use with Sentinel-As-Code, Custom, Copilot, and AI-related monitoring.

In essence, this query helps security teams quickly identify and respond to potential security threats involving AI misuse within Microsoft 365 Copilot.
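
Entity mapping works on scalar columns, which is why the query ends by extracting a single SrcIpAddr from the ClientIPs set. The sketch below illustrates one way to shape a result row for Account and IP mapping. It is an illustration under assumptions rather than the rule's actual mapping: it assumes ActorName holds a UPN-style value, the column names AccountName and AccountUPNSuffix are hypothetical, and the sample row is invented.

```kql
// Hypothetical one-row result shaped like the query's output; values are invented.
datatable(ActorName: string, AgentName: string, ClientIPs: dynamic)
[
    "user1@contoso.com", "HelpdeskAgent", dynamic(["203.0.113.10", "203.0.113.11"])
]
| extend
    // Split a UPN-style name into the parts an Account entity can use.
    AccountName = tostring(split(ActorName, "@")[0]),
    AccountUPNSuffix = tostring(split(ActorName, "@")[1]),
    // make_set() order is undefined, so this picks an arbitrary IP from the set.
    SrcIpAddr = tostring(ClientIPs[0])
| project AccountName, AccountUPNSuffix, AgentName, SrcIpAddr
```

Because only one IP is promoted to SrcIpAddr, analysts should still check the full ClientIPs set in the alert details when an actor connected from several addresses.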

Details

David Alonso

Released: May 20, 2026

Tables

CopilotActivity

Keywords

CopilotActivity, CloudApplication, Account, IP

Operators

ago, ==, tostring, mv-expand, tobool, summarize, count, countif, make_set, min, max, extend, by

Severity

High

Tactics

DefenseEvasion, InitialAccess, Execution

MITRE Techniques

T1562, T1059

Frequency: PT15M

Period: PT1H
