Copilot Studio - Prompt-injection / instruction-override patterns in user message

Copilot Studio Prompt Injection Patterns

Query

let injectionMarkers = dynamic([
    "ignore previous instructions", "ignore all previous", "ignore your instructions",
    "disregard the above", "disregard previous", "you are now", "act as",
    "developer mode", "do anything now", "dan mode", "jailbreak",
    "reveal your system prompt", "show your system prompt", "print your instructions",
    "repeat the words above", "what are your instructions", "bypass your rules",
    "without any restrictions", "pretend you are", "from now on you"
]);
AppEvents
| where Name == "BotMessageReceived"
| extend
    ConvId    = tostring(Properties["conversationId"]),
    ChannelId = tostring(Properties["channelId"]),
    Locale    = tostring(Properties["locale"]),
    DesignMode = tostring(Properties["DesignMode"]),
    Text      = tolower(tostring(Properties["text"]))
| where isnotempty(Text)
| mv-apply Marker = injectionMarkers to typeof(string) on (
      where Text contains Marker
      | summarize Markers = make_set(Marker)
  )
| extend AccountName = iff(isempty(UserId), "unknown-agent", UserId)
| project
    TimeGenerated, AccountName, ConvId, ChannelId, Locale, DesignMode,
    Markers, Text = substring(tostring(Properties["text"]), 0, 1024),
    SessionId, ClientIP, AppVersion
| order by TimeGenerated desc

Explanation

This query is designed to monitor and detect potential prompt-injection or instruction-override attempts in user messages sent to a Copilot Studio bot. Here's a simplified breakdown of what it does:

Purpose: The query identifies messages that contain specific phrases commonly used to bypass or manipulate the bot's intended behavior. These phrases include things like "ignore previous instructions" or "developer mode."
Data Source: It analyzes data from the AppEvents table, specifically looking at events where the bot received a message (Name == "BotMessageReceived"). The message content is stored in the Properties column.
Detection Logic: The query checks if the message text contains any of the predefined suspicious phrases (injection markers). If a match is found, it logs the incident.
Incident Creation: When such a message is detected, an incident is raised with a medium severity level. This helps in identifying and responding to potential security threats.
Frequency and Scope: The query runs every hour and looks back over the past hour to catch any new incidents.
Additional Details: It captures various details like conversation ID, channel ID, locale, and the message text (up to 1024 characters) for further analysis.
Entity Mapping: It maps the detected incidents to user accounts and IP addresses for better tracking and response.
Incident Grouping: Incidents are grouped by account to avoid duplicate alerts and to manage them more efficiently.

Overall, this query helps in maintaining the security and integrity of the Copilot Studio bot by detecting and alerting on attempts to manipulate its behavior through prompt-injection techniques.

Details

David Alonso

Released: June 8, 2026

Tables

AppEvents

Keywords

CopilotStudioAppEventsPropertiesAccountIPClientApplicationInsights

Operators

letdynamictostringtolowerisnotemptymv-applycontainssummarizemake_setiffisemptyprojectsubstringorder bydesc

Severity

Medium

Tactics

InitialAccessDefenseEvasion

MITRE Techniques

T1566 T1562

Frequency: PT1H

Period: PT1H

Actions

GitHub

KQL Search