OpenAI - Anomalous token / cost spike per user

OpenAI Token Cost Spike

Query

// Baseline period and the recent window that is evaluated on each run.
let lookback = 7d;
let recentWindow = 1h;
// Total tokens (input + output) and request count per model, user, and hour.
let perHour =
    OpenAIChatCompletions
    | where TimeGenerated > ago(lookback)
    | extend ActorUser = tostring(AdditionalFields.input_user)
    | extend TotalTokens = todouble(InputTokensUsed) + todouble(OutputTokensUsed)
    | summarize
        HourTokens = sum(TotalTokens),
        HourRequests = count()
        by ModelName, ActorUser, Hour = bin(TimeGenerated, 1h);
// Baseline: median and 95th percentile of hourly tokens over hours with at least
// one request, excluding the recent window.
let baseline =
    perHour
    | where Hour < bin(now(), 1h) - recentWindow
    | summarize
        MedianHourTokens = percentile(HourTokens, 50),
        P95HourTokens = percentile(HourTokens, 95)
        by ModelName, ActorUser;
// Recent window: the previous full hour plus the current, partial hour.
let recent =
    perHour
    | where Hour >= bin(now(), 1h) - recentWindow;
// Compare recent hours with the baseline; pairs with no history get a baseline of 0.
recent
| join kind=leftouter baseline on ModelName, ActorUser
| extend
    MedianHourTokens = coalesce(todouble(MedianHourTokens), 0.0),
    P95HourTokens = coalesce(todouble(P95HourTokens), 0.0)
| extend SpikeRatio = iff(MedianHourTokens > 0, HourTokens / MedianHourTokens, HourTokens)
// Flag hours above 50,000 tokens that are at least 3x the median or more than 2x the P95.
| where HourTokens > 50000
    and (SpikeRatio >= 3.0 or HourTokens > P95HourTokens * 2)
| project
    Hour, ModelName, ActorUser, HourRequests, HourTokens,
    MedianHourTokens, P95HourTokens, SpikeRatio

Explanation

This query is designed to detect unusual spikes in token usage by OpenAI API users. Here's a simple breakdown of what it does:

Purpose: The query identifies OpenAI API users whose hourly token usage for a given model, in the most recent window, is at least three times their median hourly usage over the past seven days or more than twice their 95th percentile, provided that hour exceeds 50,000 tokens. This helps spot potential token abuse, runaway processes, or cost-related attacks.
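Data Source: It reads the OpenAIChatCompletions table populated by the OpenAI connector, which records the actual input and output token counts (InputTokensUsed, OutputTokensUsed) for each request, along with the model name and the requesting user (AdditionalFields.input_user).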
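Before relying on the rule, it may help to confirm that the connector populates the fields it depends on. The sketch below assumes the same table and columns the detection query uses and summarizes one day of data per model. Requests without input_user get an empty ActorUser and are pooled into a single "user" baseline, so a non-zero RequestsWithoutUser count is worth a closer look.

```kql
// Sketch: check that the token and user fields used by the detection are populated.
// Uses only the table and columns referenced in the detection query.
OpenAIChatCompletions
| where TimeGenerated > ago(1d)
| extend ActorUser = tostring(AdditionalFields.input_user)
| summarize
    Requests = count(),
    RequestsWithoutUser = countif(isempty(ActorUser)),
    TotalInputTokens = sum(todouble(InputTokensUsed)),
    TotalOutputTokens = sum(todouble(OutputTokensUsed))
    by ModelName
```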
Logic:
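- It calculates the total tokens (input plus output) used per hour for each user and model over the past seven days.
- It establishes a baseline by calculating the median and 95th percentile of hourly token usage for each user and model, using only hours with at least one request and excluding the recent window.
- It then compares token usage in the recent window (the previous full hour and the current, partial hour) against this baseline.
- If a user's hourly usage is at least three times their median or more than twice their 95th percentile, and that hour's total exceeds 50,000 tokens, the query flags it as an anomaly. Users or models with no baseline history are assigned a median and 95th percentile of 0, so any hour above 50,000 tokens is flagged for them.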
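The comparison step can be tried without connector data by feeding hand-made baseline and recent values into the same join and filter. In this sketch the model name, users, and token counts are hypothetical; only the thresholds and expressions are taken from the query.

```kql
// Sketch: apply the rule's thresholds to hypothetical baseline and recent values.
// "model-a", the users, and all token counts are made-up test data.
let baseline = datatable(ModelName:string, ActorUser:string, MedianHourTokens:real, P95HourTokens:real)
[
    "model-a", "alice", 20000.0, 40000.0,
    "model-a", "bob", 30000.0, 35000.0
];
// carol has no baseline row, so she exercises the "no history" path.
let recent = datatable(ModelName:string, ActorUser:string, HourTokens:real)
[
    "model-a", "alice", 70000.0,
    "model-a", "bob", 60000.0,
    "model-a", "carol", 55000.0
];
recent
| join kind=leftouter baseline on ModelName, ActorUser
// Missing baselines become 0, as in the detection query.
| extend
    MedianHourTokens = coalesce(MedianHourTokens, 0.0),
    P95HourTokens = coalesce(P95HourTokens, 0.0)
| extend SpikeRatio = iff(MedianHourTokens > 0, HourTokens / MedianHourTokens, HourTokens)
| where HourTokens > 50000
    and (SpikeRatio >= 3.0 or HourTokens > P95HourTokens * 2)
| project ModelName, ActorUser, HourTokens, MedianHourTokens, P95HourTokens, SpikeRatio
```

With these values, alice (3.5 times her median) and carol (no baseline) are returned, while bob is not: 60,000 tokens is only twice his median and does not exceed twice his 95th percentile (70,000). For carol, SpikeRatio holds the raw token count (55,000) rather than a ratio, because the query falls back to HourTokens when the median is 0.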
Output: The query outputs details such as the hour, model name, user, number of requests, total tokens used, median tokens, 95th percentile tokens, and the spike ratio.
Severity and Alerts: The severity of this detection is marked as "Medium". If an anomaly is detected, it creates an incident and groups alerts by user account.
Configuration: The rule is scheduled to run every hour (PT1H) with a seven-day query period (P7D), which supplies the baseline. It is enabled by default.
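To see which hours feed the baseline and which count as "recent" on a given run, the query's time expressions can be evaluated for a fixed run time instead of now(). The run time below is a hypothetical example; lookback and recentWindow match the query, and runTime - lookback is equivalent to ago(lookback) evaluated at that moment.

```kql
// Sketch: evaluate the detection's time boundaries for a hypothetical run time.
let runTime = datetime(2026-06-08 14:20);   // hypothetical scheduled run time
let lookback = 7d;
let recentWindow = 1h;
print
    BaselineStart = runTime - lookback,               // earliest record considered
    RecentStart = bin(runTime, 1h) - recentWindow,    // first hourly bin treated as recent
    RunTime = runTime
```

For a run at 14:20, the baseline uses hourly bins before 13:00, and the recent window covers the full 13:00 hour plus the partial 14:00 hour. Because the next hourly run would evaluate the 14:00 and 15:00 bins, an hour may be evaluated twice (once partially, once complete), so the same spike can be reported on consecutive runs.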

This setup helps organizations monitor and respond to unexpected increases in token usage, potentially preventing misuse or financial loss.

Details

David Alonso

Released: June 8, 2026

Tables

OpenAIChatCompletions

Keywords

OpenAI, Tokens, User, ModelName, Account, CloudApplication, AI

Operators

let, ago, tostring, todouble, summarize, bin, now, percentile, join, kind=leftouter, coalesce, iff, project, where, extend, and, or

Severity

Medium

Tactics

Impact

MITRE Techniques

T1496 T1499

Frequency: PT1H

Period: P7D
