Query Details

Agent Tool Error Storm

Query

id: 06172839-6666-4e12-9206-0123456789d0
name: Agent - Tool error / retry storm
description: |
  Hunts a Foundry / Agent Service conversation where a single tool
  (gen_ai.tool.name) fails repeatedly in a short window. A burst of failed
  tool spans is the function-fuzzing / capability-probing shape - an
  adversary (or a poisoned prompt) hammering a function with malformed or
  permission-probing arguments to find one that works, or a runaway loop
  retrying a denied action. Low noise: only conversations with a high
  count of FAILED calls to the same tool surface.

  Reads tool spans from AppDependencies. Success / ResultCode are native
  dependency columns; both are wrapped in column_ifexists so the query
  saves even before the connector is populated. Tune failThreshold to
  your busiest legitimate agents.
query: |
  let lookback = 1d;
  let failThreshold = 8;
  AppDependencies
  | where TimeGenerated > ago(lookback)
  | where isnotempty(Properties["gen_ai.tool.name"])
  | extend
      Agent     = tostring(Properties["gen_ai.agent.name"]),
      Model     = tostring(Properties["gen_ai.request.model"]),
      ProjectId = tostring(Properties["microsoft.foundry.project.id"]),
      ConvId    = tostring(Properties["gen_ai.conversation.id"]),
      ToolName  = tolower(tostring(Properties["gen_ai.tool.name"])),
      Success_   = tostring(column_ifexists('Success', '')),
      ResultCode = tostring(column_ifexists('ResultCode', ''))
  | extend Failed = Success_ =~ "false"
                 or ResultCode startswith "4"
                 or ResultCode startswith "5"
  | summarize
      TotalCalls   = count(),
      FailedCalls  = countif(Failed),
      FirstSeen    = min(TimeGenerated),
      LastSeen     = max(TimeGenerated),
      ResultCodes  = make_set(ResultCode, 10),
      AnyAgent     = take_any(Agent),
      AnyModel     = take_any(Model),
      AnyProject   = take_any(ProjectId)
      by ConvId, ToolName
  | extend
      DurationMin = datetime_diff('minute', LastSeen, FirstSeen),
      FailRatio   = round(todouble(FailedCalls) / todouble(TotalCalls), 2)
  | where FailedCalls >= failThreshold and FailRatio >= 0.5
  | extend Agent = AnyAgent, Model = AnyModel, ProjectId = AnyProject
  | project
      LastSeen, Agent, Model, ProjectId, ConvId, ToolName,
      FailedCalls, TotalCalls, FailRatio, DurationMin, ResultCodes
  | order by FailedCalls desc
tactics:
  - Execution
  - Discovery
techniques:
  - T1059
  - T1592
tags:
  - Sentinel-As-Code
  - Custom
  - Foundry
  - AI

Explanation

This query identifies conversations in a Foundry or Agent Service deployment where a single tool fails repeatedly. A burst of failed calls to the same tool can indicate an adversary (or a poisoned prompt) probing the tool with malformed or unauthorized arguments, or a runaway loop retrying a denied action.

Here's a simplified breakdown of what the query does:

  1. Time Frame: It examines data from the past day (lookback = 1d).

  2. Failure Threshold: It focuses on tools that have failed at least 8 times (failThreshold = 8).

  3. Data Source: It reads tool spans from the AppDependencies table.

  4. Filtering: It keeps only entries where the tool name (gen_ai.tool.name) is not empty.

  5. Data Extraction: It extracts relevant properties such as agent name, model, project ID, conversation ID, and tool name.

  6. Failure Identification: It marks a call as failed if the Success column is "false" or if the ResultCode starts with "4" or "5", indicating client or server errors.

  7. Aggregation: It summarizes the data by conversation ID and tool name, counting total and failed calls, and noting the first and last occurrence times.

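  8. Analysis: It calculates the time in minutes between the first and last call to the tool in the conversation (DurationMin) and the failure ratio (failed calls over total calls). DurationMin is reported for triage only; the query does not filter on it.

  9. Filtering for Results: It only keeps records where the number of failed calls is at least the threshold and the failure ratio is at least 50%.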

  10. Output: It projects relevant information such as the last seen time, agent, model, project ID, conversation ID, tool name, number of failed and total calls, failure ratio, duration, and result codes.

  11. Sorting: The results are sorted by the number of failed calls in descending order.
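The failure classification and threshold steps above can be checked against synthetic data before any real telemetry exists. The following is a minimal sketch, not part of the published query: the conversation IDs, tool names, timestamps, and result codes are hypothetical values generated with range, and only the classification, aggregation, and filter logic is copied from the query.

```kql
// Sketch with made-up rows; nothing here reads from AppDependencies.
let failThreshold = 8;
// Hypothetical storm: 10 calls to one tool, all denied with 403.
let storm = range i from 1 to 10 step 1
    | project TimeGenerated = datetime(2026-06-01 12:00:00) + i * 10s,
              ConvId = "conv-a", ToolName = "delete_file",
              Success = false, ResultCode = "403";
// Hypothetical healthy conversation: 20 calls, every 10th one fails with 500.
let healthy = range i from 1 to 20 step 1
    | project TimeGenerated = datetime(2026-06-01 12:00:00) + i * 10s,
              ConvId = "conv-b", ToolName = "search_docs",
              Success = (i % 10 != 0),
              ResultCode = iff(i % 10 == 0, "500", "200");
union storm, healthy
// Same failure rule as the hunting query.
| extend Failed = tostring(Success) =~ "false"
               or ResultCode startswith "4"
               or ResultCode startswith "5"
| summarize
    TotalCalls  = count(),
    FailedCalls = countif(Failed),
    FirstSeen   = min(TimeGenerated),
    LastSeen    = max(TimeGenerated)
    by ConvId, ToolName
| extend
    DurationMin = datetime_diff('minute', LastSeen, FirstSeen),
    FailRatio   = round(todouble(FailedCalls) / todouble(TotalCalls), 2)
| where FailedCalls >= failThreshold and FailRatio >= 0.5
```

With these rows, conv-a has 10 failures out of 10 calls and should be returned, while conv-b has 2 failures out of 20 and should be dropped by both conditions.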

The query is useful for detecting potential security threats or operational issues where a tool is being misused or malfunctioning.
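The query description recommends tuning failThreshold to your busiest legitimate agents. One possible way to pick a starting value is to look at how many failed calls each conversation and tool pair normally produces. The sketch below is an assumption about how that could be done, not part of the published query; the 7-day lookback and the choice of percentiles are illustrative.

```kql
// Sketch: distribution of failed calls per (conversation, tool), grouped by tool.
let lookback = 7d;   // assumed baseline window; adjust to your retention
AppDependencies
| where TimeGenerated > ago(lookback)
| where isnotempty(Properties["gen_ai.tool.name"])
| extend
    ConvId     = tostring(Properties["gen_ai.conversation.id"]),
    ToolName   = tolower(tostring(Properties["gen_ai.tool.name"])),
    Success_   = tostring(column_ifexists('Success', '')),
    ResultCode = tostring(column_ifexists('ResultCode', ''))
// Same failure rule as the hunting query.
| extend Failed = Success_ =~ "false"
               or ResultCode startswith "4"
               or ResultCode startswith "5"
| summarize FailedCalls = countif(Failed) by ConvId, ToolName
| summarize
    Conversations = count(),
    P50       = percentile(FailedCalls, 50),
    P95       = percentile(FailedCalls, 95),
    P99       = percentile(FailedCalls, 99),
    MaxFailed = max(FailedCalls)
    by ToolName
| order by P99 desc
```

A failThreshold set somewhat above the P99 of the noisiest legitimate tool would keep routine failures out of the results; whether that margin is right depends on your environment.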

Details


David Alonso

Released: June 8, 2026

Tables

AppDependencies

Keywords

AppDependencies, Agent, Model, ProjectId, ConvId, ToolName, Success, ResultCode, TimeGenerated, Properties

Operators

let, ago, isnotempty, tostring, column_ifexists, tolower, =~, startswith, summarize, count, countif, min, max, make_set, take_any, datetime_diff, round, todouble, project, order by
