Query Details

Noise Repeated Values

Query

// Repeated Value Dominance per Column (Any Table)
// For each column in a table, identifies if a single value accounts
// for >50% of all rows. Dominant values are prime candidates for
// DCR where clause filters (e.g., where RenderedDescription != "routine check").
//
// HOW TO USE: Replace "SecurityEvent" with your target table.
// Run against your top 5 ingesting tables for maximum cost savings.
// =====================================================================

let _TargetTable = SecurityEvent;  // <-- CHANGE THIS
let _SampleSize = 100000;
let _MinDominancePct = 30;  // Only show columns where top value is >30% of rows
_TargetTable
| where TimeGenerated > ago(1d)
| take _SampleSize
| evaluate narrow()
| where isnotempty(tostring(Value))
| summarize ValueCount = count() by Column, Value = tostring(Value)
| as _AllValues
| join kind=inner (
    _AllValues
    | summarize TotalNonNull = sum(ValueCount) by Column
) on Column
| extend ValuePct = round(toreal(ValueCount) * 100.0 / TotalNonNull, 1)
| where ValuePct >= _MinDominancePct
| summarize 
    TopValue = arg_max(ValuePct, Value),
    TopValueCount = max(ValueCount),
    DistinctValues = dcount(Value),
    TotalRows = max(TotalNonNull)
    by Column
| extend 
    TopValuePreview = iff(strlen(Value) > 80, strcat(substring(Value, 0, 80), "..."), Value),
    FilterAction = case(
        ValuePct > 80, "CRITICAL: >80% same value - Column is nearly constant, project-away or filter",
        ValuePct > 60, "HIGH: >60% same value - Strong DCR filter candidate",
        ValuePct > 40, "MODERATE: >40% same value - Review if value adds detection context",
        "LOW: Common but varied"
    )
| project 
    Column, TopValuePreview, ValuePct, TopValueCount, TotalRows,
    DistinctValues, FilterAction
| order by ValuePct desc

Explanation

This query is designed to analyze a table and identify columns where a single value appears in more than 30% of the rows. This can help you find columns where one value is overly dominant, which might be useful for optimizing data queries or storage.

Here's a simple breakdown of what the query does:

  1. Target Table: It starts by selecting a table named SecurityEvent. You can replace this with any table you want to analyze.

  2. Sample Size: It takes a sample of 100,000 rows from the table to perform the analysis.

  3. Time Filter: It only considers data from the last day.

  4. Data Transformation: The query converts the table into a narrow format and filters out empty values.

  5. Value Counting: It counts how often each value appears in each column.

  6. Dominance Calculation: It calculates what percentage of non-empty rows each value represents in its column.

  7. Filtering: It only keeps columns where the most common value appears in more than 30% of the rows.

  8. Summary: For each column, it identifies the most common value, how often it appears, the number of distinct values, and the total number of rows.

  9. Preview and Action: It provides a preview of the dominant value and suggests actions based on how dominant the value is:

    • CRITICAL: If a value appears in more than 80% of rows, consider removing or filtering the column.
    • HIGH: If a value appears in more than 60% of rows, it's a strong candidate for filtering.
    • MODERATE: If a value appears in more than 40% of rows, review its usefulness.
    • LOW: If the value is common but not overly dominant.
  10. Output: The results are sorted by the dominance percentage, showing the most dominant values first.

This analysis helps identify columns where a single value is overly common, which can be useful for optimizing data queries and storage strategies.

Details

David Alonso profile picture

David Alonso

Released: April 8, 2026

Tables

SecurityEvent

Keywords

SecurityEvent

Operators

letwhereagotakeevaluatenarrowisnotemptytostringsummarizeasjoinkindextendroundtorealiffstrlenstrcatsubstringcaseprojectorder bydesc

Actions