Microsoft 365 Copilot - Hallucination and output quality drift signals

Copilot Hallucination And Quality Drift

Query

// Phrases suggesting the model admits uncertainty or fabrication.
let hallucinationMarkers = dynamic([
    "as an ai", "i am not sure but", "i made that up",
    "i cannot verify", "according to my knowledge",
    "i don't have real-time", "i hallucinated"
]);
// Phrases suggesting the model is retracting an earlier answer.
let contradictionMarkers = dynamic([
    "actually, that was wrong",
    "scratch that", "ignore my previous answer",
    "correction:", "i misspoke", "let me revise"
]);
let confidenceThreshold = 0.4;
CopilotActivity
| where TimeGenerated > ago(7d)
| extend
    Response = tostring(LLMEventData.Response),
    Confidence = toreal(LLMEventData.ResponseConfidence),
    RagSources = LLMEventData.RagSources,
    ConversationId = tostring(LLMEventData.ConversationId)
| extend
    LowerResponse = tolower(Response),
    CitedUrls = extract_all(@"(https?://[^\s\)\""<]+)", Response)
| extend
    // Assumption: RagSources serializes to text that contains the source URLs.
    RagUrls = extract_all(@"(https?://[^\s\)\""<]+)", tolower(tostring(RagSources)))
| extend
    // Cited URLs (lowercased) that do not appear among the RAG source URLs.
    UncitedUrls = set_difference(
        extract_all(@"(https?://[^\s\)\""<]+)", LowerResponse), RagUrls),
    HasHallucinationMarker = LowerResponse has_any (hallucinationMarkers),
    HasContradictionMarker = LowerResponse has_any (contradictionMarkers),
    LowConfidence = isnotnull(Confidence) and Confidence < confidenceThreshold
| where HasHallucinationMarker or HasContradictionMarker or LowConfidence
| project
    TimeGenerated, AgentId, AgentName, ActorName, ConversationId,
    Confidence, HasHallucinationMarker, HasContradictionMarker,
    LowConfidence, CitedUrls, UncitedUrls, Response, TenantId
| order by TimeGenerated desc

Explanation

This query flags Microsoft 365 Copilot responses that show signs of degraded output quality. It looks for three signals in each response:

Hallucination Markers: Phrases that suggest the AI might be making up information or unsure about its response, like "as an AI" or "I made that up."
Contradiction Markers: Phrases indicating the AI is correcting itself, such as "actually, that was wrong" or "ignore my previous answer."
Low Confidence: Responses that report a confidence value below the threshold (0.4 in this query). Responses with no confidence value are not flagged on this signal.
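
A minimal sketch of how the marker and threshold checks behave, using a small inline datatable in place of CopilotActivity. The sample rows and the shortened marker list are made up for illustration and are not taken from real telemetry.

```kusto
// Hypothetical sample rows; the real query reads these values from LLMEventData.
let markers = dynamic(["i made that up", "i cannot verify"]);
let confidenceThreshold = 0.4;
datatable(Response:string, Confidence:real) [
    "Sorry, I made that up.", 0.9,
    "The quarterly report is attached.", 0.2,
    "The meeting is at 10:00.", real(null)
]
| extend
    // True when the lowercased response contains any marker phrase.
    HasMarker = tolower(Response) has_any (markers),
    // The isnotnull guard keeps rows without a confidence value unflagged.
    LowConfidence = isnotnull(Confidence) and Confidence < confidenceThreshold
```

The third row has no confidence value, so the isnotnull guard keeps it from being treated as low confidence.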

The query examines Copilot activity from the past seven days and checks each response for these markers. It also extracts the URLs cited in each response so that they can be compared with the retrieval (RAG) sources the response was grounded on. When any of the three signals is present, the query returns the time, agent and actor details, conversation ID, confidence value, cited URLs, tenant ID, and the response itself.
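
The URL comparison can be sketched in isolation as follows. This assumes, purely for illustration, that RagSources is an array of plain URL strings; the actual shape of LLMEventData.RagSources may differ, and the sample row is hypothetical.

```kusto
// Hypothetical row: one cited URL is among the RAG sources, one is not.
datatable(Response:string, RagSources:dynamic) [
    "See https://contoso.com/policy and https://example.org/unknown for details.",
    dynamic(["https://contoso.com/policy"])
]
// Pull every http(s) URL out of the lowercased response text.
| extend CitedUrls = extract_all(@"(https?://[^\s\)\""<]+)", tolower(Response))
// Keep only the cited URLs that are absent from the RAG sources.
| extend UncitedUrls = set_difference(CitedUrls, RagSources)
| extend HasUngroundedUrl = array_length(UncitedUrls) > 0
```

A response that cites URLs absent from its retrieval sources is a candidate for manual review, since the link may have been invented by the model.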

These signals can help surface problems such as model tampering or data poisoning, both of which can degrade output quality. Results are ordered by generation time, with the most recent entries first.
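
To watch for drift rather than individual events, one option is to count flagged responses per agent per day and look for a rising share. A rough sketch, assuming a per-response flag has been computed without first filtering out the unflagged rows; the IsFlagged column, the agent name, and the sample rows are hypothetical.

```kusto
// Hypothetical per-response flags; IsFlagged stands in for the three signals combined.
datatable(TimeGenerated:datetime, AgentName:string, IsFlagged:bool) [
    datetime(2026-05-18 09:00), "HelpdeskAgent", false,
    datetime(2026-05-18 14:00), "HelpdeskAgent", true,
    datetime(2026-05-19 10:00), "HelpdeskAgent", true,
    datetime(2026-05-19 11:00), "HelpdeskAgent", true
]
// Daily totals and flagged counts per agent.
| summarize FlaggedCount = countif(IsFlagged), Total = count()
    by AgentName, Day = bin(TimeGenerated, 1d)
// Share of flagged responses, as a percentage.
| extend FlaggedPct = round(100.0 * FlaggedCount / Total, 1)
| order by AgentName asc, Day asc
```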

Details

David Alonso

Released: May 20, 2026

Tables

CopilotActivity

Keywords

CopilotActivity, Response, Confidence, RagSources, ConversationId, CitedUrls, Tenant, AgentName, Actor, TimeGenerated

Operators

let, dynamic, tostring, toreal, ago, extend, tolower, extract_all, set_difference, has_any, isnotnull, project, order by

Tactics

Impact

MITRE Techniques

T1496
