Query Details

Retention Gap Detection

Query

let _ExpectedRetentionDays = 90;
let _MinRecords = 100;
Usage
| where TimeGenerated > ago(365d)
| summarize 
    OldestRecord = min(TimeGenerated),
    RecordCount = count()
    by DataType
| where RecordCount > _MinRecords
| extend 
    ObservedDays = datetime_diff('day', now(), OldestRecord),
    ExpectedDays = _ExpectedRetentionDays,
    GapDays = _ExpectedRetentionDays - datetime_diff('day', now(), OldestRecord)
| where GapDays > 5
| extend
    GapSeverity = case(
        GapDays > 60, "CRITICAL - Major data loss",
        GapDays > 30, "HIGH - Significant gap",
        GapDays > 14, "MEDIUM - Notable shortfall",
        "LOW - Minor gap"
    ),
    PossibleCause = case(
        ObservedDays < 7, "Recently enabled connector",
        ObservedDays < 30, "Connector enabled within last month",
        "Possible data purge or retention override"
    )
| project 
    DataType, ObservedDays, ExpectedDays, GapDays, GapSeverity, PossibleCause
| order by GapDays desc

Explanation

This KQL query is designed to analyze data retention for different data types over the past year. Here's a simplified explanation:

  1. Set Parameters: It defines two parameters: _ExpectedRetentionDays as 90 days and _MinRecords as 100 records.

  2. Filter Data: It looks at data from the last 365 days.

  3. Summarize Data: For each DataType, it calculates the oldest record date (OldestRecord) and the total number of records (RecordCount).

  4. Filter by Record Count: It only considers data types with more than 100 records.

  5. Calculate Retention Metrics:

    • ObservedDays: The number of days since the oldest record was generated.
    • ExpectedDays: The expected retention period (90 days).
    • GapDays: The difference between the expected retention period and the observed days.
  6. Filter by Gap: It only includes data types where the gap is more than 5 days.

  7. Assess Gap Severity: It categorizes the severity of the gap:

    • "CRITICAL" if the gap is more than 60 days.
    • "HIGH" if the gap is more than 30 days.
    • "MEDIUM" if the gap is more than 14 days.
    • "LOW" for minor gaps.
  8. Identify Possible Causes: It suggests possible reasons for the gap:

    • "Recently enabled connector" if the data is less than 7 days old.
    • "Connector enabled within last month" if the data is less than 30 days old.
    • "Possible data purge or retention override" for older data.
  9. Output: It displays the data type, observed days, expected days, gap days, gap severity, and possible cause, sorted by the largest gap.

In summary, this query helps identify and assess data retention issues by comparing the actual data retention period with the expected period, highlighting significant gaps and suggesting potential causes.

Details

David Alonso profile picture

David Alonso

Released: April 8, 2026

Tables

Usage

Keywords

Usage

Operators

letwhereagosummarizemincountbyextenddatetime_diffnowcaseprojectorder bydesc

Actions