Query Details

Ingestion Anomaly Zscore

Query

let _AuxiliaryTables = dynamic([
    "OfficeActivity", "AzureActivity", "Heartbeat",
    "SentinelHealth", "SecurityAlert", "SecurityIncident", "Operation"
]);
let _LookbackDays = 14d;
let _Threshold = 3.0;
let _Baseline =
    Usage
    | where TimeGenerated > ago(_LookbackDays)
    | where IsBillable == true
    | where DataType !in (_AuxiliaryTables)
    | summarize DailyMB = sum(Quantity) by Day = bin(TimeGenerated, 1d), DataType
    | summarize 
        AvgDailyMB  = avg(DailyMB),
        StdDevMB    = stdev(DailyMB),
        MedianMB    = percentile(DailyMB, 50),
        P95MB       = percentile(DailyMB, 95)
        by DataType;
Usage
| where TimeGenerated > ago(1d)
| where IsBillable == true
| where DataType !in (_AuxiliaryTables)
| summarize TodayMB = sum(Quantity) by DataType
| join kind=inner _Baseline on DataType
| extend 
    ZScore = iff(StdDevMB > 0, round((TodayMB - AvgDailyMB) / StdDevMB, 2), 0.0),
    DeviationPct = round((TodayMB - AvgDailyMB) / AvgDailyMB * 100, 1)
| where ZScore > _Threshold or ZScore < -_Threshold
| extend 
    AnomalyType = iff(ZScore > 0, "SPIKE", "DROP"),
    TodayGB = round(TodayMB / 1024.0, 3),
    AvgGB = round(AvgDailyMB / 1024.0, 3)
| project 
    TimeGenerated = now(),
    DataType, AnomalyType, ZScore, DeviationPct,
    TodayGB, AvgGB, 
    TodayMB = round(TodayMB, 1),
    AvgDailyMB = round(AvgDailyMB, 1),
    StdDevMB = round(StdDevMB, 1),
    P95MB = round(P95MB, 1)
| order by abs(ZScore) desc

Explanation

This KQL query is designed to detect anomalies in data usage over the past day compared to the previous two weeks. Here's a simplified breakdown:

  1. Setup and Parameters:

    • A list of auxiliary tables (_AuxiliaryTables) is defined, which are excluded from the analysis.
    • The lookback period is set to 14 days (_LookbackDays).
    • A threshold for anomaly detection is set at 3.0 (_Threshold).
  2. Baseline Calculation:

    • The query calculates a baseline for each data type over the last 14 days, excluding auxiliary tables and only considering billable data.
    • It computes daily usage in megabytes (DailyMB) and then summarizes this data to find the average, standard deviation, median, and 95th percentile of daily usage for each data type.
  3. Current Day Analysis:

    • It calculates the total usage for each data type over the past day, again excluding auxiliary tables and focusing on billable data.
    • This current day's usage is then compared to the baseline.
  4. Anomaly Detection:

    • The query calculates a Z-score for each data type to measure how much today's usage deviates from the average. A Z-score above or below the threshold indicates an anomaly.
    • It also calculates the percentage deviation from the average.
  5. Anomaly Classification and Output:

    • Anomalies are classified as either a "SPIKE" (if usage is higher than expected) or a "DROP" (if usage is lower).
    • The results include the current time, data type, anomaly type, Z-score, deviation percentage, today's usage in gigabytes, average usage in gigabytes, and other statistical measures.
    • The results are ordered by the absolute value of the Z-score, highlighting the most significant anomalies.

In summary, this query identifies significant deviations in data usage from the norm, helping to pinpoint unusual spikes or drops in usage patterns.

Details

David Alonso profile picture

David Alonso

Released: April 8, 2026

Tables

Usage

Keywords

Usage

Operators

letdynamicagoinsummarizebinavgstdevpercentilejoinkindextendiffroundprojectnoworder byabs

Actions