Microsoft 365 Copilot - PII or biased content emitted in agent output

Copilot Bias or PII in Output

Query

// Phrases treated as indicators of biased or discriminatory language.
let biasMarkers = dynamic([
    "all women are", "all men are", "people of",
    "those people", "you people", "inferior race",
    "superior race", "lazy ethnic", "criminal ethnic"
]);
CopilotActivity
| where TimeGenerated > ago(7d)
// Pull the response text and conversation ID out of the LLM event payload.
| extend
    Response = tostring(LLMEventData.Response),
    ConversationId = tostring(LLMEventData.ConversationId)
// Collect every PII-like match per response: emails, phone numbers, IBANs, and card-like digit runs.
| extend
    EmailHits = extract_all(@"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})", Response),
    PhoneHits = extract_all(@"(\+?\d{1,3}[\s\-\.]?\(?\d{2,4}\)?[\s\-\.]?\d{3,4}[\s\-\.]?\d{3,4})", Response),
    IbanHits = extract_all(@"([A-Z]{2}\d{2}[A-Z0-9]{11,30})", Response),
    PanHits = extract_all(@"(\b(?:\d[ -]*?){13,19}\b)", Response)
// Flag responses containing any bias marker (has_any is case-insensitive, so tolower is only a safeguard).
| extend LowerResponse = tolower(Response)
| extend BiasHit = LowerResponse has_any (biasMarkers)
// Keep only responses with at least one PII match or a bias hit.
| where array_length(EmailHits) > 0
    or array_length(PhoneHits) > 0
    or array_length(IbanHits) > 0
    or array_length(PanHits) > 0
    or BiasHit
// Return per-type hit counts plus the context needed for triage.
| project
    TimeGenerated, AgentId, AgentName, ActorName, ConversationId,
    EmailHitCount = array_length(EmailHits),
    PhoneHitCount = array_length(PhoneHits),
    IbanHitCount = array_length(IbanHits),
    PanHitCount = array_length(PanHits),
    BiasHit, Response, TenantId
| order by TimeGenerated desc
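
For offline experimentation, the per-response logic of the query can be approximated outside Sentinel. The following is a minimal Python sketch and not part of the published rule. The regular expressions and bias markers are copied from the query. The function name `scan_response` is hypothetical, and the word-boundary phrase check is only an approximation of `has_any`. Python's `re` engine is not the engine KQL uses, so counts may differ on edge cases.

```python
import re

# Patterns copied unchanged from the query; keys mirror the projected column names.
PII_PATTERNS = {
    "EmailHitCount": r"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})",
    "PhoneHitCount": r"(\+?\d{1,3}[\s\-\.]?\(?\d{2,4}\)?[\s\-\.]?\d{3,4}[\s\-\.]?\d{3,4})",
    "IbanHitCount": r"([A-Z]{2}\d{2}[A-Z0-9]{11,30})",
    "PanHitCount": r"(\b(?:\d[ -]*?){13,19}\b)",
}

# Bias markers copied from the query's biasMarkers list.
BIAS_MARKERS = [
    "all women are", "all men are", "people of",
    "those people", "you people", "inferior race",
    "superior race", "lazy ethnic", "criminal ethnic",
]


def scan_response(response: str) -> dict:
    """Return per-type hit counts and a bias flag for one response (hypothetical helper)."""
    result = {
        name: len(re.findall(pattern, response))
        for name, pattern in PII_PATTERNS.items()
    }
    lowered = response.lower()
    # Assumption: a word-boundary search stands in for KQL's term-based has_any.
    result["BiasHit"] = any(
        re.search(r"\b" + re.escape(marker) + r"\b", lowered) is not None
        for marker in BIAS_MARKERS
    )
    return result


if __name__ == "__main__":
    print(scan_response("Contact jane.doe@example.com for details."))
    print(scan_response("Card 4111 1111 1111 1111 on file"))
```

In this sketch a spaced 16-digit card number registers as both a PAN hit and a phone hit, because the phone pattern has no word boundaries. The per-type counts are therefore not mutually exclusive.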

Explanation

This query monitors Microsoft 365 Copilot agent output for potentially sensitive or inappropriate content. Here's a simplified breakdown of what it does:

Purpose: The query identifies Microsoft 365 Copilot responses that may contain Personally Identifiable Information (PII) or biased or discriminatory language that the platform's safety filters did not catch.
PII Detection: It looks for patterns in the agent's responses that match:
- Email addresses
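- Phone numbers in loosely formatted international notation (an optional leading + and country code, with optional spaces, hyphens, dots, or parentheses), which is broader than strict E.164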
- International Bank Account Numbers (IBAN)
- Credit card-like numbers (PANs), meaning runs of 13 to 19 digits optionally separated by spaces or hyphens
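Bias Detection: It flags responses that contain any phrase from a fixed list of nine markers, such as "all women are," "inferior race," or "lazy ethnic." The match is case-insensitive, and broad markers such as "people of" can also match benign text.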
Time Frame: The query examines data from the past 7 days.
Output: For each response with at least one PII match or bias hit, it returns the time generated, the agent's ID and name, the actor's name, the conversation ID, a count for each type of PII detected, a flag indicating a bias match, the full response text, and the tenant ID.
Sorting: The results are sorted by the time they were generated, with the most recent first.
Use Case: This acts as a final check to catch any sensitive or inappropriate content that might have bypassed earlier safety measures, especially in interactions with customers.
Tactics and Techniques: The query is associated with tactics like Collection and Exfiltration, and techniques such as T1530 (Data from Cloud Storage) and T1213 (Data from Information Repositories).
Tags: It is tagged for use with Sentinel-As-Code, Custom, Copilot, and AI, indicating its relevance to these areas.
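
The PAN pattern accepts any run of 13 to 19 digits, so order numbers and other long identifiers can match. One common way to reduce such false positives during triage is a Luhn checksum. The sketch below is an assumption about how results could be post-processed. It is not part of the published rule, and the names `luhn_valid` and `likely_pans` are hypothetical.

```python
import re
from typing import List

# PAN pattern copied from the query.
PAN_PATTERN = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def likely_pans(response: str) -> List[str]:
    """Keep only the PAN-pattern matches that also pass the Luhn check."""
    return [
        match.group(0)
        for match in PAN_PATTERN.finditer(response)
        if luhn_valid(match.group(0))
    ]


if __name__ == "__main__":
    sample = "Ticket 1234567890123456 and card 4111 1111 1111 1111"
    print(likely_pans(sample))
```

A Luhn pass does not prove that a number is a real card number. It only removes digit runs that cannot be one, which helps an analyst decide which hits to review first.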

Details

David Alonso

Released: May 20, 2026

Tables

CopilotActivity

Keywords

CopilotActivity, Response, Conversation, Email, Phone, Iban, Pan, Bias, Agent, Actor, Tenant

Operators

let, dynamic, where, extend, tostring, extract_all, tolower, has_any, array_length, or, project, order by, desc

Tactics

Collection, Exfiltration

MITRE Techniques

T1530, T1213
