Copilot Studio - System-prompt disclosure attempt and leak

Copilot Studio System Prompt Disclosure

Query

let disclosureMarkers = dynamic([
    "system prompt", "your instructions", "your system message",
    "repeat the words above", "what are your instructions",
    "print your prompt", "reveal your prompt", "initial instructions",
    "your configuration", "your guidelines"
]);
let asks =
    AppEvents
    | where Name == "BotMessageReceived"
    | extend ConvId = tostring(Properties["conversationId"]),
             Prompt = tolower(tostring(Properties["text"]))
    | where isnotempty(Prompt)
    | where Prompt has_any (disclosureMarkers)
    | project AskTime = TimeGenerated, ConvId, Prompt = substring(tostring(Properties["text"]), 0, 512),
              UserId, ChannelId = tostring(Properties["channelId"]), ClientIP;
let leaks =
    AppEvents
    | where Name == "BotMessageSend"
    | extend ConvId = tostring(Properties["conversationId"]),
             Output = tostring(Properties["text"])
    | where isnotempty(Output)
    | where Output matches regex @"(?i)(you are an?|your role is|system:|##\s*instructions|the assistant)"
    | project LeakTime = TimeGenerated, ConvId, Output = substring(Output, 0, 1024);
asks
| join kind=inner leaks on ConvId
| where LeakTime between (AskTime .. (AskTime + 10m))
| extend AccountName = iff(isempty(UserId), "unknown-agent", UserId)
| project
    AskTime, LeakTime, AccountName, ConvId, ChannelId, ClientIP,
    Prompt, Output,
    LagSeconds = datetime_diff('second', LeakTime, AskTime)
| order by AskTime desc

Explanation

This KQL query is designed to detect attempts to extract hidden system prompts from a chatbot or virtual assistant. Here's a simple breakdown of what the query does:

Purpose: The query identifies when a user tries to get the chatbot to reveal its internal instructions or system prompts, and whether the bot inadvertently discloses any part of those instructions in its response.
Data Source: It uses data from Application Insights, specifically the AppEvents data type, which logs events related to the chatbot's interactions.
Detection Logic:
- User Request: The query looks for specific phrases in user messages that suggest an attempt to uncover the bot's system prompt (e.g., "show your system prompt", "what are your instructions").
- Bot Response: It checks if the bot's response in the same conversation contains phrases that might indicate a leak of its internal instructions (e.g., "You are", "Your role is", "system:").
Correlation: The query joins these user requests and bot responses based on the conversation ID to see if they occur within a 10-minute window, indicating a potential successful or near-successful extraction of the system prompt.
Output: If such an event is detected, it logs details like the time of the request and response, user ID, channel ID, client IP, and the content of the request and response. It also calculates the time difference between the request and the response.
Alerting: If any such events are found, an alert is generated with a high severity level. The alert can be grouped by user account for incident management purposes.
Configuration: The query runs every hour and checks data from the past hour. It is set to trigger an alert if any matching events are found.

Overall, this query helps in monitoring and securing chatbot systems by identifying and alerting on potential leaks of sensitive system prompts.

Details

David Alonso

Released: June 8, 2026

Tables

AppEvents

Keywords

ApplicationInsightsAppEventsAccountIPAISystemPromptDisclosureCopilotStudio

Operators

letdynamictolowertostringisnotemptyhas_anyprojectsubstringmatches regexjoinkindbetweenextendiffisemptydatetime_difforder by

Severity

High

Tactics

DiscoveryCollection

MITRE Techniques

T1082 T1213

Frequency: PT1H

Period: PT1H

Actions

GitHub

KQL Search