Copilot Studio - Prompt-injection signals in user messages

Agent Prompt Injection Signals

Query

let lookback = 1d;
let injectionMarkers = dynamic([
    "ignore previous instructions", "ignore all previous", "ignore your instructions",
    "disregard the above", "disregard previous", "you are now", "act as",
    "developer mode", "do anything now", "dan mode", "jailbreak",
    "reveal your system prompt", "show your system prompt", "print your instructions",
    "repeat the words above", "what are your instructions", "bypass your rules",
    "without any restrictions", "pretend you are", "from now on you",
    "new instructions:", "system:", "override", "you must now"
]);
AppEvents
| where TimeGenerated > ago(lookback)
| where Name == "BotMessageReceived"
| extend
    ConvId    = tostring(Properties["conversationId"]),
    ChannelId = tostring(Properties["channelId"]),
    Text      = tolower(tostring(Properties["text"]))
| where isnotempty(Text)
| mv-apply Marker = injectionMarkers to typeof(string) on (
      where Text contains Marker
      | summarize Markers = make_set(Marker)
  )
| extend MarkerCount = array_length(Markers)
| project TimeGenerated, UserId, ConvId, ChannelId, MarkerCount, Markers,
          Text = substring(tostring(Properties["text"]), 0, 1024), ClientIP
| order by MarkerCount desc, TimeGenerated desc

Explanation

This query is designed to monitor incoming user messages for signs of prompt-injection or attempts to override instructions, which could indicate potential security risks or misuse. Here's a simple breakdown:

Time Frame: It looks at messages from the past day (lookback = 1d).
Markers: It checks for specific phrases (like "ignore previous instructions" or "jailbreak") that might suggest an attempt to manipulate or bypass the system's intended behavior.
Data Source: It examines events where a bot received a message (Name == "BotMessageReceived").
Processing:
- Extracts conversation and channel IDs, and converts the message text to lowercase for uniformity.
- Filters out messages that are empty.
- Checks each message for the presence of any of the specified markers.
- Counts how many different markers are found in each message.
Output: It lists the messages, showing when they were received, user and conversation details, the number of markers found, the specific markers, a snippet of the message text, and the client's IP address.
Sorting: Results are sorted by the number of markers found (highest first), then by the time they were generated (most recent first).
Purpose: This helps identify conversations that might need further review for potential security threats or misuse, aiding in tuning and proactive monitoring.
Security Context: The query is associated with tactics like Initial Access and Defense Evasion, and techniques such as Phishing (T1566) and Impair Defenses (T1562). It is tagged for use with Sentinel-As-Code, custom monitoring, and AI-related activities.

Details

David Alonso

Released: June 8, 2026

Tables

AppEvents

Keywords

AppEvents

Operators

letdynamicwhereagoextendtostringtolowerisnotemptymv-applycontainssummarizemake_setarray_lengthprojectsubstringorder by

Tactics

InitialAccessDefenseEvasion

MITRE Techniques

T1566 T1562

Actions

GitHub

KQL Search

Copilot Studio - Prompt-injection signals in user messages

Query

Explanation

Details

Actions