Query Details

Copilot Hallucination And Quality Drift

Query

id: 3c24b54c-b2f5-c306-15f7-84a4b5c4d4e9
name: Microsoft 365 Copilot - Hallucination and output quality drift signals
description: |
  Hunts for Microsoft 365 Copilot responses that show quality-degradation
  signals: low model-reported confidence, self-contradiction
  markers, refusal-then-comply patterns, or hallucination markers
  such as citing non-existent URLs / files that did not appear in
  the conversation's RAG retrieval set.

  Quality drift is the soft side of model tampering, RAG
  poisoning, and silent model swaps. Useful for triage after a
  toxic-output or system-prompt-override alert.
query: |
  let hallucinationMarkers = dynamic([
      "as an ai", "i am not sure but", "i made that up",
      "i cannot verify", "according to my knowledge",
      "i don't have real-time", "i hallucinated"
  ]);
  let contradictionMarkers = dynamic([
      "actually, that was wrong",
      "scratch that", "ignore my previous answer",
      "correction:", "i misspoke", "let me revise"
  ]);
  let confidenceThreshold = 0.4;
  CopilotActivity
  | where TimeGenerated > ago(7d)
  | extend
      Response = tostring(LLMEventData.Response),
      Confidence = toreal(LLMEventData.ResponseConfidence),
      RagSources = LLMEventData.RagSources,
      ConversationId = tostring(LLMEventData.ConversationId)
  | extend
      LowerResponse = tolower(Response),
      CitedUrls = extract_all(@"(https?://[^\s\)\""<]+)", Response)
  | extend
      RagUriSet = tolower(tostring(RagSources))
  | extend
      UncitedUrls = set_difference(CitedUrls, dynamic([])),
      HasHallucinationMarker = LowerResponse has_any (hallucinationMarkers),
      HasContradictionMarker = LowerResponse has_any (contradictionMarkers),
      LowConfidence = isnotnull(Confidence) and Confidence < confidenceThreshold
  | where HasHallucinationMarker or HasContradictionMarker or LowConfidence
  | project
      TimeGenerated, AgentId, AgentName, ActorName, ConversationId,
      Confidence, HasHallucinationMarker, HasContradictionMarker,
      LowConfidence, CitedUrls, Response, TenantId
  | order by TimeGenerated desc
tactics:
  - Impact
techniques:
  - T1496
tags:
  - Sentinel-As-Code
  - Custom
  - Copilot
  - AI

Explanation

This query flags Microsoft 365 Copilot responses that may indicate degraded output quality. It checks each response for three signals:

  1. Hallucination Markers: Phrases that suggest the AI might be making up information or unsure about its response, like "as an AI" or "I made that up."

  2. Contradiction Markers: Phrases indicating the AI is correcting itself, such as "actually, that was wrong" or "ignore my previous answer."

  3. Low Confidence: Responses whose model-reported confidence (ResponseConfidence) is present and below the 0.4 threshold. Responses with no confidence value are not flagged by this signal.
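The three signals can be tried in isolation on a handful of sample rows. The sketch below is illustrative only: the datatable rows are made up, the marker lists are shortened, and the real query reads Response and Confidence from LLMEventData in CopilotActivity rather than from inline data.

```kql
// Shortened marker lists and threshold, copied from the query above.
let hallucinationMarkers = dynamic(["i made that up", "i cannot verify"]);
let contradictionMarkers = dynamic(["scratch that", "let me revise"]);
let confidenceThreshold = 0.4;
// Hypothetical sample rows standing in for CopilotActivity events.
datatable(Response: string, Confidence: real) [
    "The report is due Friday.", 0.92,
    "Scratch that, the report is due Monday.", 0.81,
    "I cannot verify that figure.", 0.35,
    "Here is the summary you asked for.", real(null)
]
| extend LowerResponse = tolower(Response)
| extend
    HasHallucinationMarker = LowerResponse has_any (hallucinationMarkers),
    HasContradictionMarker = LowerResponse has_any (contradictionMarkers),
    // A null confidence is treated as "unknown", not as "low".
    LowConfidence = isnotnull(Confidence) and Confidence < confidenceThreshold
// Keep a row if at least one signal fired.
| where HasHallucinationMarker or HasContradictionMarker or LowConfidence
```

The intent is that the second and third rows survive the filter, while the first row (no markers, high confidence) and the last row (no markers, null confidence) are dropped.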

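The query examines Copilot activity from the past seven days and keeps any response that shows at least one of these three signals. It also extracts the URLs cited in each response (CitedUrls) and lowercases the RAG source list (RagUriSet), but as written it does not compare the two: UncitedUrls is computed with set_difference against an empty array, which only de-duplicates CitedUrls, and neither UncitedUrls nor RagUriSet feeds the filter or the output. The refusal-then-comply pattern named in the description is likewise not checked. For each matching response, the query returns the time, agent and actor names, conversation ID, confidence value, the three signal flags, the cited URLs, the response text, and the tenant ID.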
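A comparison of cited URLs against the retrieval set could look roughly like the sketch below. It assumes RagSources is a dynamic array of plain URL strings, which the source does not confirm; if the real field holds objects or a different shape, the RagUrls step would need to change. The sample row and the contoso.example URLs are hypothetical.

```kql
// Hypothetical sample row; RagSources is ASSUMED to be an array of URL strings.
datatable(Response: string, RagSources: dynamic) [
    "See https://contoso.example/policy and https://contoso.example/made-up",
    dynamic(["https://contoso.example/Policy"])
]
// Same URL pattern as the main query, applied to the lowercased response.
| extend CitedUrls = extract_all(@"(https?://[^\s\)\""<]+)", tolower(Response))
// Lowercase the array by round-tripping it through its JSON string form.
| extend RagUrls = todynamic(tolower(tostring(RagSources)))
// URLs the response cites that are absent from the retrieval set.
| extend UncitedUrls = set_difference(CitedUrls, RagUrls)
| where array_length(UncitedUrls) > 0
| project Response, CitedUrls, RagUrls, UncitedUrls
```

Under these assumptions, UncitedUrls is expected to hold only the made-up URL. Exact string comparison is strict: trailing punctuation, query strings, or redirects would all register as uncited, so results from a comparison like this are a triage hint rather than proof of a fabricated citation.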

This analysis helps in identifying potential issues like model tampering or data poisoning, which can lead to degraded output quality. The results are ordered by the time they were generated, with the most recent entries first.

Details

David Alonso

Released: May 20, 2026

Tables

CopilotActivity

Keywords

CopilotActivity, Response, Confidence, RagSources, ConversationId, CitedUrls, TenantId, AgentId, AgentName, ActorName, TimeGenerated

Operators

let, dynamic, tostring, toreal, ago, extend, tolower, extract_all, set_difference, has_any, isnotnull, project, order by