Copilot Studio - Multi-stage attack chain in a single conversation

Copilot Studio Multi Stage Attack Chain

Query

let injectionMarkers = dynamic([
    "ignore previous instructions", "ignore all previous", "disregard previous",
    "you are now", "act as", "developer mode", "do anything now", "dan mode",
    "reveal your system prompt", "show your system prompt", "bypass your rules",
    "without any restrictions", "pretend you are", "from now on you"
]);
let escalationMarkers = dynamic([
    "i am the ceo", "i am the cfo", "as an administrator", "as the administrator",
    "as an admin", "i have admin", "i am authorised", "i am authorized",
    "i have permission to", "on behalf of the", "grant me access", "elevate my access"
]);
let exfilMarkers = dynamic([
    "show all", "list all", "list every", "export all", "give me every",
    "give me the full list", "all records", "all customers", "every user",
    "dump the", "entire database", "entire table", "select *"
]);
let jailbreakMarkers = dynamic([
    "developer mode", "do anything now", "dan mode", "ignore your safety",
    "without restrictions", "bypass your rules", "disable your guardrails",
    "unrestricted mode", "you have no rules"
]);
let conns =
    AppDependencies
    | where AppRoleName == "Microsoft Copilot Studio" or DependencyType == "Connector"
    | extend ConvId = tostring(Properties["conversationId"])
    | summarize ConnectorCalls = count(), Connectors = make_set(Name, 20),
                ConnectorFailures = countif(Success == false) by ConvId;
AppEvents
| where Name == "BotMessageReceived"
| extend
    ConvId = tostring(Properties["conversationId"]),
    Text   = tolower(tostring(Properties["text"]))
| where isnotempty(Text)
| extend
    SigInjection  = Text has_any (injectionMarkers),
    SigEscalation = Text has_any (escalationMarkers),
    SigExfil      = Text has_any (exfilMarkers),
    SigJailbreak  = Text has_any (jailbreakMarkers)
| summarize
    Injection  = countif(SigInjection) > 0,
    Escalation = countif(SigEscalation) > 0,
    Exfil      = countif(SigExfil) > 0,
    Jailbreak  = countif(SigJailbreak) > 0,
    Messages   = count(),
    FirstSeen  = min(TimeGenerated),
    LastSeen   = max(TimeGenerated),
    UserId     = take_any(UserId),
    ChannelId  = take_any(tostring(Properties["channelId"])),
    ClientIP   = take_any(ClientIP),
    SampleText = make_set(substring(tostring(Properties["text"]), 0, 200), 5)
    by ConvId
| extend Stages = toint(Injection) + toint(Escalation) + toint(Exfil) + toint(Jailbreak)
| where Stages >= 2
| join kind=leftouter conns on ConvId
| extend
    AccountName    = iff(isempty(UserId), "unknown-agent", UserId),
    ConnectorCalls = coalesce(ConnectorCalls, 0),
    TimeGenerated  = LastSeen
| project
    TimeGenerated, FirstSeen, LastSeen, AccountName, ConvId, ChannelId, ClientIP,
    Stages, Injection, Escalation, Exfil, Jailbreak,
    Messages, ConnectorCalls, ConnectorFailures, Connectors, SampleText
| order by Stages desc, LastSeen desc

Explanation

This query is designed to detect potential multi-stage attack attempts within conversations in Microsoft Copilot Studio. It monitors conversations for specific suspicious activities, such as:

Prompt Injection: Attempts to manipulate the system by inserting commands like "ignore previous instructions" or "bypass your rules."
Authority/Role Impersonation: Phrases suggesting unauthorized access, like "i am the ceo" or "grant me access."
Bulk Exfiltration Intent: Requests for large amounts of data, such as "show all" or "entire database."
Jailbreak Framing: Attempts to disable security measures, using terms like "developer mode" or "unrestricted mode."

The query aggregates messages from conversations and checks for these markers. If a conversation contains two or more of these suspicious signals, it is flagged as a potential attack. The query also considers connector activity within the same conversation to enhance detection accuracy.

The query runs every hour and generates an alert if any conversation meets the criteria. It creates an incident for further investigation, grouping alerts by account and considering recent activity within a six-hour window. The goal is to identify genuine threats by detecting multiple attack stages occurring together, rather than isolated suspicious actions.

Details

David Alonso

Released: June 8, 2026

Tables

AppDependenciesAppEvents

Keywords

CopilotStudioApplicationInsightsAppEventsAppDependenciesAccountIPUserChannelClientIPConnectorConnectorsMessagesStagesInjectionEscalationExfilJailbreakTimeGenerated

Operators

letdynamicorextendtostringsummarizecountmake_setcountifwhereisnotemptytolowerhas_anyminmaxtake_anysubstringbytointjoinkindleftouteriffisemptycoalesceprojectorder bydesc

Severity

High

Tactics

InitialAccessPrivilegeEscalationExfiltration

MITRE Techniques

T1566 T1078 T1567

Frequency: PT1H

Period: PT1H

Actions

GitHub

KQL Search