Foundry - Goal hijacking / goal drift across spans

Foundry Goal Drift Between Spans

Query

// Phrases that suggest a harmless opening request.
let benignFirstWords = dynamic([
    "summarise","summarize","translate","explain","help me","tell me",
    "draft","write a","compose","describe","what is","how does",
    "example","tutorial","define","outline","brainstorm","rephrase",
    "improve the wording","proofread","check the grammar"
]);
// Terms showing the user did ask for something sensitive; these suppress the alert.
let sensitiveIntent = dynamic([
    "delete","drop","grant","revoke","admin","password","secret","key",
    "token","credential","database","table","customer","payment",
    "invoice","payroll","exfil","send to","upload to","exec","execute",
    "run command","ssh","sudo","privilege","impersonate"
]);
// Tool names or tool types that count as sensitive actions.
let sensitiveTools = dynamic([
    "code_interpreter","python","shell","bash","powershell","exec","run_code",
    "http_request","fetch","invoke_url","send_email","send_message",
    "execute_sql","query_database","file_write","write_file","upload",
    "create_resource","delete_resource","azure_write","deploy"
]);
// Spans from the last 2 hours that carry a conversation ID, with sensitive tool calls flagged.
let conv =
    AppDependencies
    | where TimeGenerated > ago(2h)
    | where isnotempty(Properties["gen_ai.conversation.id"])
    | extend
        ConvId   = tostring(Properties["gen_ai.conversation.id"]),
        Agent    = tostring(Properties["gen_ai.agent.name"]),
        Model    = tostring(Properties["gen_ai.request.model"]),
        Input    = tolower(tostring(Properties["gen_ai.input.messages"])),
        ToolName = tolower(tostring(Properties["gen_ai.tool.name"])),
        ToolType = tolower(tostring(Properties["gen_ai.tool.type"]))
    | extend IsSensitive = ToolName has_any (sensitiveTools) or ToolType has_any (sensitiveTools);
// Earliest non-empty input per conversation in the lookback, classified by intent.
let firstMsg =
    conv
    | where isnotempty(Input)
    | summarize arg_min(TimeGenerated, Input, Agent, Model) by ConvId
    | extend
        FirstInput      = substring(Input, 0, 1024),
        BenignIntent    = Input has_any (benignFirstWords),
        SensitiveIntent = Input has_any (sensitiveIntent);
// Per-conversation totals for sensitive tool calls.
let acts =
    conv
    | where IsSensitive
    | summarize SensitiveCalls   = count(),
                Tools             = make_set(ToolName, 16),
                DistinctTools     = dcount(ToolName),
                FirstSeen         = min(TimeGenerated),
                LastSeen          = max(TimeGenerated)
            by ConvId;
// Flag a benign, non-sensitive opening followed by three or more sensitive tool calls.
firstMsg
| join kind=inner acts on ConvId
| where SensitiveCalls >= 3
        and BenignIntent == true
        and SensitiveIntent == false
| extend AccountName = iff(isempty(Agent), "unknown-agent", Agent)
| project LastSeen, AccountName, Agent, Model, ConvId, SensitiveCalls,
          DistinctTools, Tools, FirstInput, FirstSeen
| order by SensitiveCalls desc

Explanation

This query detects "goal hijacking" (goal drift) in AI agent conversations. Here's a simplified breakdown of what the query does:

Purpose: It identifies conversations in which a user opens with a seemingly harmless request, but the AI agent goes on to perform sensitive actions that the opening request did not ask for.
Detection Criteria:
- The earliest input recorded for the conversation in the two-hour lookback contains a benign phrase (e.g., "summarize," "translate," "explain") and none of the sensitive terms (e.g., "delete," "admin," "password"). The phrase can appear anywhere in the message, not only at the start.
- Within the same two-hour lookback, the conversation includes three or more spans whose tool name or tool type matches the sensitive tool list (e.g., executing code, sending emails, accessing databases).
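The two criteria above can be tried on a handful of made-up messages before touching real telemetry. The following is a minimal sketch, assuming shortened copies of the two keyword lists and three hypothetical sample inputs; `WouldQualify` is an illustrative column name that does not appear in the rule. Because `has_any` matches whole terms rather than substrings, the third message ("passwords") is not expected to count as sensitive even though "password" is in the list, which is worth keeping in mind when tuning the lists.

```kql
// Shortened keyword lists, for illustration only.
let benignFirstWords = dynamic(["summarize", "translate", "explain"]);
let sensitiveIntent  = dynamic(["delete", "password", "table"]);
// Hypothetical sample messages; not taken from real telemetry.
datatable(Input: string) [
    "Summarize this quarterly report",
    "Explain how to delete a table",
    "Translate my notes about passwords"
]
| extend Input = tolower(Input)
| extend
    BenignIntent    = Input has_any (benignFirstWords),
    SensitiveIntent = Input has_any (sensitiveIntent)
// Same condition the rule applies to the first message of a conversation.
| extend WouldQualify = BenignIntent and not(SensitiveIntent)
```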
Data Source: The query reads the AppDependencies table (Application Insights dependency telemetry) and extracts the conversation ID, agent, model, input, and tool details from the gen_ai.* entries in the Properties column.
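Process:
- It normalizes the last two hours of spans that carry a conversation ID and marks those that call a sensitive tool.
- For each conversation, it takes the earliest non-empty input and checks it against the benign and sensitive keyword lists.
- It counts the sensitive tool calls per conversation and joins the counts to the classified first input.
- If a conversation meets both criteria (benign, non-sensitive first input and at least three sensitive tool calls), it is flagged as a potential goal hijacking incident.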
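The process can be exercised end to end on synthetic spans. This sketch assumes a hypothetical five-row `datatable` standing in for AppDependencies, a shortened sensitive tool list, and invented conversation IDs (`c1`, `c2`) and tool names; it reproduces only the counting half of the rule. Under those assumptions, `c1` should be the only conversation returned, since it has three sensitive calls and `c2` has none.

```kql
// Shortened tool list, for illustration only.
let sensitiveTools = dynamic(["send_email", "execute_sql"]);
// Hypothetical spans shaped like AppDependencies rows (only the columns used here).
datatable(TimeGenerated: datetime, Properties: dynamic) [
    datetime(2026-06-08 10:00:00), dynamic({"gen_ai.conversation.id": "c1", "gen_ai.input.messages": "summarize this document"}),
    datetime(2026-06-08 10:01:00), dynamic({"gen_ai.conversation.id": "c1", "gen_ai.tool.name": "execute_sql"}),
    datetime(2026-06-08 10:02:00), dynamic({"gen_ai.conversation.id": "c1", "gen_ai.tool.name": "send_email"}),
    datetime(2026-06-08 10:03:00), dynamic({"gen_ai.conversation.id": "c1", "gen_ai.tool.name": "send_email"}),
    datetime(2026-06-08 10:05:00), dynamic({"gen_ai.conversation.id": "c2", "gen_ai.tool.name": "web_search"})
]
| extend
    ConvId   = tostring(Properties["gen_ai.conversation.id"]),
    ToolName = tolower(tostring(Properties["gen_ai.tool.name"]))
// Keep only spans that call a sensitive tool, then count them per conversation.
| where ToolName has_any (sensitiveTools)
| summarize SensitiveCalls = count(), Tools = make_set(ToolName, 16) by ConvId
| where SensitiveCalls >= 3
```

Swapping the `datatable` for the real table and restoring the full lists gives back the `acts` stage of the rule.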
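Output: The query returns one row per flagged conversation, sorted by the number of sensitive calls. Each row includes the first and last time a sensitive call was seen, the agent (also surfaced as AccountName, or "unknown-agent" when empty), the AI model used, the conversation ID, the count and set of sensitive tools called, and the first 1,024 characters of the opening input.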
Severity and Response: The severity is set to Medium. The rule runs every hour over a two-hour lookback and is configured to create an incident if such a pattern is detected, allowing for further investigation.

Overall, this query helps identify and alert on potential misuse or unintended actions by AI agents, so that sensitive operations performed without matching user intent can be investigated.

Details

David Alonso

Released: June 8, 2026

Tables

AppDependencies

Keywords

ApplicationInsights, AppDependencies, Account, CloudApplication, Model, Agent, ToolName, ToolType, Properties, SensitiveCalls, DistinctTools, FirstInput, ConvId

Operators

let, dynamic, tostring, tolower, isnotempty, has_any, ago, extend, summarize, arg_min, substring, count, make_set, dcount, min, max, join, kind=inner, iff, isempty, project, order by

Severity

Medium

Tactics

Execution, DefenseEvasion

MITRE Techniques

T1059 T1556

Frequency: PT1H

Period: PT2H
