Foundry - Content-safety high-severity filter hit

Foundry Content Safety High Severity

Query

AppDependencies
| where isnotempty(Properties["microsoft.foundry.content_filter.results"])
| extend
    Agent     = tostring(Properties["gen_ai.agent.name"]),
    Model     = tostring(Properties["gen_ai.request.model"]),
    ConvId    = tostring(Properties["gen_ai.conversation.id"]),
    ProjectId = tostring(Properties["microsoft.foundry.project.id"]),
    Prompt    = tostring(Properties["gen_ai.input.messages"]),
    FilterArr = todynamic(tostring(Properties["microsoft.foundry.content_filter.results"]))
| mv-expand Entry = FilterArr
| extend
    SourceType = tostring(Entry.source_type),
    Blocked    = tobool(Entry.blocked),
    Filter     = todynamic(Entry.content_filter_results)
| extend
    HateSeverity     = tostring(Filter.hate.severity),
    SexualSeverity   = tostring(Filter.sexual.severity),
    ViolenceSeverity = tostring(Filter.violence.severity),
    SelfHarmSeverity = tostring(Filter.self_harm.severity)
| extend MaxSeverity = case(
    HateSeverity == "high" or SexualSeverity == "high" or ViolenceSeverity == "high" or SelfHarmSeverity == "high", "high",
    HateSeverity == "medium" or SexualSeverity == "medium" or ViolenceSeverity == "medium" or SelfHarmSeverity == "medium", "medium",
    "low")
| where MaxSeverity == "high"
| extend AccountName = iff(isempty(Agent), "unknown-agent", Agent)
| project
    TimeGenerated, AccountName, Agent, Model, ProjectId, ConvId,
    HateSeverity, SexualSeverity, ViolenceSeverity, SelfHarmSeverity,
    Prompt
| order by TimeGenerated desc

Explanation

This query detects high-severity verdicts returned by Azure AI Content Safety filters so that an alert can be raised. A simplified breakdown:

Purpose: The query identifies and raises an incident when a high-severity verdict is returned by the Azure AI Content Safety filter for content generated by the Foundry or Agent Service. It focuses on four categories of harm: hate, sexual, violence, and self-harm.
Data Source: It uses data from Application Insights, specifically from the AppDependencies data type.
Frequency: The query runs every hour and checks data from the past hour.
Severity and Threshold: The alert fires when the query returns at least one high-severity record; the alert itself is assigned Medium severity.
Query Logic:
- It filters records where the content filter results are not empty.
- It extracts relevant information such as agent name, model, conversation ID, project ID, and the prompt text.
- It expands the content filter results into one row per entry and reads the severity levels for the hate, sexual, violence, and self-harm categories.
- It determines the maximum severity level among these categories; anything that is neither "high" nor "medium" falls back to "low".
- It filters for records where the maximum severity is "high".
- It projects relevant details: the time generated, account name, agent, model, project ID, conversation ID, the four severity levels, and the prompt text.
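
The expansion and severity steps above can be tried in isolation with a small sketch. The sample row is hypothetical: the JSON shape is inferred from the fields the query reads, and the `source_type` values "prompt" and "completion" and the "safe" severity label are assumptions, not taken from the source. Real telemetry may store the filter results as a JSON string rather than a nested array; `todynamic(tostring(...))` handles both.

```kql
// Hypothetical sample data standing in for AppDependencies.
datatable(TimeGenerated: datetime, Properties: dynamic)
[
    datetime(2026-06-08 10:00:00),
    dynamic({
        "gen_ai.agent.name": "demo-agent",
        "microsoft.foundry.content_filter.results": [
            {
                "source_type": "prompt",
                "blocked": true,
                "content_filter_results": {
                    "hate":      {"severity": "safe"},
                    "sexual":    {"severity": "safe"},
                    "violence":  {"severity": "high"},
                    "self_harm": {"severity": "low"}
                }
            },
            {
                "source_type": "completion",
                "blocked": false,
                "content_filter_results": {
                    "hate":      {"severity": "safe"},
                    "sexual":    {"severity": "safe"},
                    "violence":  {"severity": "safe"},
                    "self_harm": {"severity": "safe"}
                }
            }
        ]
    })
]
// Same parsing steps as the rule query: one output row per filter entry.
| extend FilterArr = todynamic(tostring(Properties["microsoft.foundry.content_filter.results"]))
| mv-expand Entry = FilterArr
| extend
    SourceType = tostring(Entry.source_type),
    Filter     = todynamic(Entry.content_filter_results)
| extend
    HateSeverity     = tostring(Filter.hate.severity),
    SexualSeverity   = tostring(Filter.sexual.severity),
    ViolenceSeverity = tostring(Filter.violence.severity),
    SelfHarmSeverity = tostring(Filter.self_harm.severity)
| extend MaxSeverity = case(
    HateSeverity == "high" or SexualSeverity == "high" or ViolenceSeverity == "high" or SelfHarmSeverity == "high", "high",
    HateSeverity == "medium" or SexualSeverity == "medium" or ViolenceSeverity == "medium" or SelfHarmSeverity == "medium", "medium",
    "low")
| project SourceType, HateSeverity, SexualSeverity, ViolenceSeverity, SelfHarmSeverity, MaxSeverity
```

With this sample, the "prompt" entry evaluates to MaxSeverity "high" and the "completion" entry to "low". The second result shows that the `case` fallback labels every non-high, non-medium entry as "low", including entries whose categories are all "safe" or empty. This does not affect the rule, which keeps only "high" rows.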
Incident Management:
- An incident is created for each high-severity alert.
- Incidents are grouped by account, and the system checks for similar incidents within a 6-hour lookback period.
Additional Information:
- The query runs as a scheduled analytics rule and is tagged with identifiers such as Sentinel-As-Code, Custom, Foundry, AI, and ContentSafety.
- It includes entity mappings for accounts and cloud applications to facilitate incident management and alert grouping.
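
Grouping by account only works if every row carries a usable account value. The query guarantees this with a fallback, sketched below on hypothetical agent names. That the account entity is mapped from the `AccountName` column is an assumption; the mapping itself is not shown here.

```kql
// Hypothetical agent names; the second row simulates telemetry without gen_ai.agent.name.
datatable(Agent: string)
[
    "support-bot",
    ""
]
// Same fallback as the rule query: empty agent names become "unknown-agent".
| extend AccountName = iff(isempty(Agent), "unknown-agent", Agent)
```

The first row keeps "support-bot" as its AccountName and the second becomes "unknown-agent". One consequence is that all records without an agent name share a single account value and would be grouped together.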

Overall, this query helps in proactively managing content safety by alerting on high-severity issues, allowing for timely intervention and review.

Details

David Alonso

Released: June 8, 2026

Tables

AppDependencies

Keywords

Azure, AI, Content, Safety, Foundry, Agent, Service, AppDependencies, Properties, Microsoft, Project, Account, Cloud, Application, Sentinel, Custom

Operators

isnotempty, extend, tostring, todynamic, mv-expand, tobool, case, iff, isempty, project, order by

Severity

Medium

Tactics

Execution, Impact

MITRE Techniques

T1059

Frequency: PT1H

Period: PT1H
