Foundry - System prompt / instruction disclosure

Foundry System Prompt Disclosure

Query

AppDependencies
| where isnotempty(Properties["gen_ai.output.messages"])
| extend
    Agent     = tostring(Properties["gen_ai.agent.name"]),
    Model     = tostring(Properties["gen_ai.request.model"]),
    ConvId    = tostring(Properties["gen_ai.conversation.id"]),
    ProjectId = tostring(Properties["microsoft.foundry.project.id"]),
    Input     = tolower(tostring(Properties["gen_ai.input.messages"])),
    Output    = tostring(Properties["gen_ai.output.messages"])
| extend AskedForPrompt = Input has_any (
      "system prompt", "initial instructions", "your instructions",
      "reveal your prompt", "repeat the words above", "what are your rules",
      "developer message", "print your system", "show me your prompt",
      "ignore previous instructions", "what is your system message",
      "everything above this line", "verbatim instructions")
| extend LeakMarker = Output has_any (
      "you are an", "your role is", "# system", "system prompt",
      "you must never", "do not reveal", "your instructions are",
      "as an ai assistant", "you have access to the following tools",
      "your available tools", "you should always", "never disclose")
| where AskedForPrompt and LeakMarker
| extend AccountName = iff(isempty(Agent), "unknown-agent", Agent)
| project
    TimeGenerated, AccountName, Agent, Model, ProjectId, ConvId,
    Input, Output
| order by TimeGenerated desc

Explanation

This query is designed to detect potential security breaches involving AI systems, specifically targeting scenarios where an attacker tries to extract hidden instructions or system prompts from an AI agent. Here's a simplified breakdown:

Purpose: The query identifies interactions where a user attempts to uncover the AI agent's internal instructions or prompts, and the agent inadvertently reveals such information in its response. This is a security concern known as "system prompt disclosure."
Data Source: It analyzes messages exchanged between users and AI agents, specifically looking at input and output messages recorded in the AppDependencies data from Application Insights.
Conditions:
- The input message contains phrases indicating an attempt to reveal the system prompt (e.g., "system prompt," "your instructions," "show me your prompt").
- The output message contains phrases that suggest the agent has disclosed internal instructions (e.g., "you are an," "your role is," "system prompt").
Alert Generation: If both conditions are met (i.e., the input suggests probing for prompts and the output suggests a leak), an alert is triggered.
Severity and Actions: The alert is marked with high severity, and incidents are created for further investigation. The system groups related alerts to manage them efficiently.
Frequency: The query runs every hour and checks data from the past hour.
Customization: The phrase lists used to detect probing and leaks are broad and should be tailored to fit specific AI agents to reduce false positives.

Overall, this query helps in identifying and preventing unauthorized access to sensitive AI system instructions, which could be exploited for malicious purposes.

Details

David Alonso

Released: June 8, 2026

Tables

AppDependencies

Keywords

AppDependenciesPropertiesAgentModelConvIdProjectIdInputOutputAccountNameTimeGeneratedAccountCloudApplicationSentinelAsCodeCustomFoundryAIOWASPLLM07

Operators

isnotemptyextendtostringtolowerhas_anyiffisemptyprojectorder by

Severity

High

Tactics

CollectionCredentialAccess

MITRE Techniques

T1213 T1552

Frequency: PT1H

Period: PT1H

Actions

GitHub

KQL Search