Agent - Tool error / retry storm

Agent Tool Error Storm

Query

// Tunables: how far back to look, and how many failures per conversation/tool pair to alert on.
let lookback = 1d;
let failThreshold = 8;
AppDependencies
| where TimeGenerated > ago(lookback)
// Keep only spans that carry a tool name, i.e. tool invocations.
| where isnotempty(Properties["gen_ai.tool.name"])
| extend
    Agent      = tostring(Properties["gen_ai.agent.name"]),
    Model      = tostring(Properties["gen_ai.request.model"]),
    ProjectId  = tostring(Properties["microsoft.foundry.project.id"]),
    ConvId     = tostring(Properties["gen_ai.conversation.id"]),
    ToolName   = tolower(tostring(Properties["gen_ai.tool.name"])),
    // column_ifexists keeps the query valid if either column is missing from the schema.
    Success_   = tostring(column_ifexists('Success', '')),
    ResultCode = tostring(column_ifexists('ResultCode', ''))
// A call counts as failed if Success is false or the result code starts with 4 or 5.
| extend Failed = Success_ =~ "false"
               or ResultCode startswith "4"
               or ResultCode startswith "5"
// One row per conversation and tool.
| summarize
    TotalCalls   = count(),
    FailedCalls  = countif(Failed),
    FirstSeen    = min(TimeGenerated),
    LastSeen     = max(TimeGenerated),
    ResultCodes  = make_set(ResultCode, 10),
    AnyAgent     = take_any(Agent),
    AnyModel     = take_any(Model),
    AnyProject   = take_any(ProjectId)
    by ConvId, ToolName
| extend
    // Minutes between the first and last call in the group (all calls, not only failures).
    DurationMin = datetime_diff('minute', LastSeen, FirstSeen),
    FailRatio   = round(todouble(FailedCalls) / todouble(TotalCalls), 2)
// Alert only when failures are both numerous and at least half of all calls.
| where FailedCalls >= failThreshold and FailRatio >= 0.5
| extend Agent = AnyAgent, Model = AnyModel, ProjectId = AnyProject
| project
    LastSeen, Agent, Model, ProjectId, ConvId, ToolName,
    FailedCalls, TotalCalls, FailRatio, DurationMin, ResultCodes
| order by FailedCalls desc

Explanation

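This query identifies conversations in which a specific tool within a Foundry or Agent Service fails repeatedly. It flags any conversation and tool pair with at least 8 failed calls and a failure ratio of at least 50% over the past day, a pattern that might indicate an adversary or a malfunctioning process repeatedly trying to use a tool with incorrect or unauthorized inputs. The query does not enforce a time window shorter than the lookback; the DurationMin column in the output shows how far apart the first and last calls were.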

Here's a simplified breakdown of what the query does:

Time Frame: It examines data from the past day (lookback = 1d).
Failure Threshold: It requires at least 8 failed calls to the same tool within a single conversation (failThreshold = 8).
Data Source: It reads from AppDependencies, the dependency telemetry table in which the tool calls are recorded.
Filtering: It keeps only entries where the tool name (gen_ai.tool.name) is not empty.
Data Extraction: It extracts relevant properties such as agent name, model, project ID, conversation ID, and tool name (lowercased so that differently cased names group together).
Failure Identification: It marks a call as failed if the Success column is "false" (compared case-insensitively) or if the ResultCode starts with "4" or "5", indicating client or server errors.
Aggregation: It summarizes the data by conversation ID and tool name, counting total and failed calls, and noting the first and last occurrence times.
Analysis: It calculates the time in minutes between the first and last call in each group (DurationMin) and the failure ratio (failed calls over total calls, rounded to two decimals).
Filtering for Results: It only keeps records where the number of failed calls is at or above the threshold and the failure ratio is at least 50%.
Output: It projects relevant information such as the last seen time, agent, model, project ID, conversation ID, tool name, number of failed and total calls, failure ratio, duration, and result codes.
Sorting: The results are sorted by the number of failed calls in descending order.
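
The steps above can be checked without touching production data. The following is a minimal sketch that runs the same failure logic against a few synthetic rows built with datatable. The rows, the conversation IDs, and the lowered failThreshold of 2 are all assumptions made for illustration; they are not part of the original query.

```kql
// Synthetic rows standing in for AppDependencies; every value here is made up.
let failThreshold = 2;   // lowered from 8 so that a tiny sample can trip it
datatable(TimeGenerated: datetime, ConvId: string, ToolName: string, Success: bool, ResultCode: string)
[
    datetime(2026-01-01 10:00:00), "conv-a", "search", false, "500",
    datetime(2026-01-01 10:01:00), "conv-a", "search", false, "429",
    datetime(2026-01-01 10:02:00), "conv-a", "search", true,  "200",
    datetime(2026-01-01 10:00:00), "conv-b", "search", true,  "200",
    datetime(2026-01-01 10:05:00), "conv-b", "search", false, "404"
]
// Same failure test as the detection query.
| extend Success_ = tostring(Success)
| extend Failed = Success_ =~ "false"
               or ResultCode startswith "4"
               or ResultCode startswith "5"
| summarize
    TotalCalls  = count(),
    FailedCalls = countif(Failed),
    FirstSeen   = min(TimeGenerated),
    LastSeen    = max(TimeGenerated)
    by ConvId, ToolName
| extend
    DurationMin = datetime_diff('minute', LastSeen, FirstSeen),
    FailRatio   = round(todouble(FailedCalls) / todouble(TotalCalls), 2)
| where FailedCalls >= failThreshold and FailRatio >= 0.5
```

With these sample rows, conv-a (2 failures out of 3 calls) should pass both conditions, while conv-b (1 failure out of 2 calls) meets the 50% ratio but falls below the lowered failure count and is dropped. This shows why the query needs both conditions: the ratio alone would flag short conversations with a single error.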

The query is useful for detecting potential security threats or operational issues where a tool is being misused or malfunctioning.
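
Once a conversation and tool pair is flagged, a reasonable next step is to look at the individual calls. The sketch below lists them in time order together with the gap between consecutive calls, which can help tell a tight automated retry loop from sporadic failures. The targetConv and targetTool values are placeholders to be replaced with a ConvId and ToolName from the detection output, and the GapSec column is a name introduced here for illustration.

```kql
// Placeholder values: replace with a ConvId and ToolName taken from the detection output.
let targetConv = "<conversation-id>";
let targetTool = "<tool-name>";   // lowercase, as the detection query emits it
AppDependencies
| where TimeGenerated > ago(1d)
| where tostring(Properties["gen_ai.conversation.id"]) == targetConv
| where tolower(tostring(Properties["gen_ai.tool.name"])) == targetTool
| extend
    Success_   = tostring(column_ifexists('Success', '')),
    ResultCode = tostring(column_ifexists('ResultCode', ''))
| project TimeGenerated, Success_, ResultCode
// Sorting serializes the rows, which prev() requires.
| order by TimeGenerated asc
// Seconds since the previous call; empty for the first row.
| extend GapSec = datetime_diff('second', TimeGenerated, prev(TimeGenerated))
```

Consistently small GapSec values with the same ResultCode suggest an unattended retry loop, whereas irregular gaps and mixed codes point more toward varied inputs being tried against the tool.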

Details

David Alonso

Released: June 8, 2026

Tables

AppDependencies

Keywords

AppDependencies, Agent, Model, ProjectId, ConvId, ToolName, Success, ResultCode, TimeGenerated, Properties

Operators

let, ago, isnotempty, tostring, column_ifexists, tolower, =~, startswith, summarize, count, countif, min, max, make_set, take_any, datetime_diff, round, todouble, project, order by

Tactics

Execution, Discovery

MITRE Techniques

T1059, T1592
