Query Details

Duplicate Cross Table Overlap

Query

// Cross-Table Data Overlap Detection
// Identifies accounts, IPs, and hosts appearing in MULTIPLE tables.
// When the same entity is logged in 3+ tables, you may be paying
// for redundant visibility. Use to decide which tables to keep.
// =====================================================================

let _Window = 1d;
// --- Part 1: Account overlap ---
let _AccountSources =
    union withsource=SourceTable
        (SecurityEvent | where TimeGenerated > ago(_Window) | project Account = TargetAccount, SourceTable),
        (SigninLogs | where TimeGenerated > ago(_Window) | project Account = UserPrincipalName, SourceTable),
        (AADNonInteractiveUserSignInLogs | where TimeGenerated > ago(_Window) | project Account = UserPrincipalName, SourceTable),
        (OfficeActivity | where TimeGenerated > ago(_Window) | project Account = UserId, SourceTable),
        (AuditLogs | where TimeGenerated > ago(_Window) | project Account = tostring(InitiatedBy.user.userPrincipalName), SourceTable),
        (IdentityLogonEvents | where TimeGenerated > ago(_Window) | project Account = AccountUpn, SourceTable)
    | where isnotempty(Account)
    | summarize Tables = make_set(SourceTable), TableCount = dcount(SourceTable), EventCount = count() by Account;
let _OverlappingAccounts =
    _AccountSources
    | where TableCount >= 3
    | top 20 by EventCount desc
    | project 
        EntityType = "Account", Entity = Account, 
        TablesCount = TableCount, Tables = tostring(Tables), 
        TotalEvents = EventCount,
        Recommendation = case(
            TableCount >= 5, "HIGH OVERLAP: Account in 5+ tables - Review which tables add unique detection value",
            TableCount >= 3, "OVERLAP: Account in 3+ tables - Evaluate if all sources needed",
            ""
        );
// --- Part 2: IP overlap ---
let _IPSources =
    union withsource=SourceTable
        (SigninLogs | where TimeGenerated > ago(_Window) | project IP = IPAddress, SourceTable),
        (AADNonInteractiveUserSignInLogs | where TimeGenerated > ago(_Window) | project IP = IPAddress, SourceTable),
        (CommonSecurityLog | where TimeGenerated > ago(_Window) | project IP = SourceIP, SourceTable),
        (AzureActivity | where TimeGenerated > ago(_Window) | project IP = CallerIpAddress, SourceTable),
        (SecurityEvent | where TimeGenerated > ago(_Window) | where isnotempty(IpAddress) | project IP = IpAddress, SourceTable),
        (OfficeActivity | where TimeGenerated > ago(_Window) | project IP = ClientIP, SourceTable)
    | where isnotempty(IP) and IP != "-" and IP != "0.0.0.0"
    | summarize Tables = make_set(SourceTable), TableCount = dcount(SourceTable), EventCount = count() by IP;
let _OverlappingIPs =
    _IPSources
    | where TableCount >= 3
    | top 20 by EventCount desc
    | project 
        EntityType = "IP", Entity = IP,
        TablesCount = TableCount, Tables = tostring(Tables),
        TotalEvents = EventCount,
        Recommendation = case(
            TableCount >= 4, "HIGH OVERLAP: IP in 4+ tables - Likely internal, check if all sources add value",
            TableCount >= 3, "OVERLAP: IP in 3+ tables - Common for corp egress IPs",
            ""
        );
// --- Part 3: Hostname overlap ---
let _HostSources =
    union withsource=SourceTable
        (SecurityEvent | where TimeGenerated > ago(_Window) | project Host = Computer, SourceTable),
        (Syslog | where TimeGenerated > ago(_Window) | project Host = HostName, SourceTable),
        (Heartbeat | where TimeGenerated > ago(_Window) | project Host = Computer, SourceTable),
        (CommonSecurityLog | where TimeGenerated > ago(_Window) | project Host = DeviceName, SourceTable),
        (Event | where TimeGenerated > ago(_Window) | project Host = Computer, SourceTable)
    | where isnotempty(Host)
    | summarize Tables = make_set(SourceTable), TableCount = dcount(SourceTable), EventCount = count() by Host;
let _OverlappingHosts =
    _HostSources
    | where TableCount >= 3
    | top 20 by EventCount desc
    | project 
        EntityType = "Host", Entity = Host,
        TablesCount = TableCount, Tables = tostring(Tables),
        TotalEvents = EventCount,
        Recommendation = case(
            TableCount >= 4, "HIGH OVERLAP: Host in 4+ tables - Likely double-agent or multi-connector",
            TableCount >= 3, "OVERLAP: Host in 3+ tables - Check for MMA+AMA or redundant forwarding",
            ""
        );
_OverlappingAccounts | union _OverlappingIPs | union _OverlappingHosts
| order by EntityType asc, TotalEvents desc

Explanation

This query is designed to identify overlaps in data across multiple tables in a database, focusing on accounts, IP addresses, and hostnames. The goal is to detect entities that appear in three or more tables, which might indicate redundant data collection and potential cost savings by reducing unnecessary data sources. Here's a simple breakdown of the query:

  1. Time Window: The query examines data from the past day (1d).

  2. Account Overlap:

    • It collects account information from various tables like SecurityEvent, SigninLogs, OfficeActivity, etc.
    • It identifies accounts that appear in three or more tables.
    • It lists the top 20 accounts with the highest event counts and provides recommendations based on the number of tables they appear in.
  3. IP Overlap:

    • It gathers IP addresses from tables such as SigninLogs, CommonSecurityLog, AzureActivity, etc.
    • It identifies IPs that appear in three or more tables.
    • It lists the top 20 IPs with the highest event counts and provides recommendations based on their overlap.
  4. Hostname Overlap:

    • It collects hostnames from tables like SecurityEvent, Syslog, Heartbeat, etc.
    • It identifies hosts that appear in three or more tables.
    • It lists the top 20 hosts with the highest event counts and provides recommendations based on their overlap.
  5. Output:

    • The results from accounts, IPs, and hosts are combined and sorted by entity type and event count.
    • This helps in deciding which data sources might be redundant and could be optimized for cost savings.

Overall, the query helps organizations identify and evaluate redundant data sources to streamline their data collection and reduce costs.

Details

David Alonso profile picture

David Alonso

Released: April 8, 2026

Tables

SecurityEventSigninLogsAADNonInteractiveUserSignInLogsOfficeActivityAuditLogsIdentityLogonEventsCommonSecurityLogAzureActivitySyslogHeartbeatEvent

Keywords

AccountIPHost

Operators

letunionwithsourcewhereagoprojectisnotemptysummarizemake_setdcountcounttopbydesccasetostringorder

Actions