Microsoft 365 Copilot - Adversarial paraphrase loop after refusal

Copilot Adversarial Paraphrase Loop

Query

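// Maximum gap allowed between a refusal and a follow-up prompt.
let window = 15m;
// Lowercase phrases that indicate Copilot refused the request.
let refusalMarkers = dynamic([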
    "i can't help with that", "i cannot assist", "i won't help",
    "i'm not able to provide", "that request can't be processed",
    "as an ai, i can't", "sorry, i can't", "policy prevents me"
]);
let conv =
    CopilotActivity
    | where TimeGenerated > ago(7d)
    | where RecordType == "CopilotInteraction"
    | extend
        ActorUpn = tolower(tostring(coalesce(column_ifexists('ActorUPN', ''), ActorName))),
        Prompt   = tostring(LLMEventData.Prompt),
        ResponseLower = tolower(tostring(LLMEventData.Response)),
        ConversationId = tostring(LLMEventData.ConversationId);
let refusals =
    conv
    | where ResponseLower has_any (refusalMarkers)
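    // Distinct lowercase words of 4+ letters; set_difference against an empty array removes duplicates.
    | extend RefusedPromptTokens = set_difference(extract_all(@"([a-z]{4,})", tolower(Prompt)), dynamic([]))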
    | project RefusalTime = TimeGenerated, ActorUpn, AgentId, AgentName,
              ConversationId, RefusedPrompt = Prompt, RefusedPromptTokens;
refusals
| join kind=inner (
      conv
      | extend FollowupTokens = set_difference(extract_all(@"([a-z]{4,})", tolower(Prompt)), dynamic([]))
      | project FollowupTime = TimeGenerated, ActorUpn, AgentId, ConversationId,
                FollowupPrompt = Prompt, FollowupTokens
  ) on ActorUpn, AgentId, ConversationId
// Keep only follow-ups sent after the refusal and within the window.
| where FollowupTime > RefusalTime and FollowupTime - RefusalTime <= window
| extend
    Intersection = array_length(set_intersect(RefusedPromptTokens, FollowupTokens)),
    Union        = array_length(set_union(RefusedPromptTokens, FollowupTokens))
// Jaccard similarity = shared tokens / total distinct tokens; keep pairs scoring 0.4 or higher.
| extend Jaccard = iff(Union > 0, todouble(Intersection) / todouble(Union), 0.0)
| where Jaccard >= 0.4
| summarize
    ParaphraseCount = count(),
    FirstFollowup   = min(FollowupTime),
    LastFollowup    = max(FollowupTime),
    SampleFollowups = make_set(FollowupPrompt, 5),
    MaxJaccard      = max(Jaccard)
    by RefusalTime, ActorUpn, AgentId, AgentName, ConversationId, RefusedPrompt
| where ParaphraseCount >= 3
| order by ParaphraseCount desc, MaxJaccard desc

Explanation

This query is designed to identify users who are trying to bypass restrictions in Microsoft 365 Copilot by repeatedly rephrasing a request that was initially denied. Here's a simple breakdown of how it works:

Time Frame: It looks at interactions from the past 7 days.
Refusal Detection: It searches for specific phrases in Copilot's responses that indicate a refusal, such as "I can't help with that" or "policy prevents me."
User Behavior: After detecting a refusal, it looks for later prompts from the same user, in the same agent and conversation, that were sent within 15 minutes of the refusal.
Similarity Check: Each follow-up prompt is compared with the refused prompt using Jaccard similarity over their sets of distinct lowercase words of four or more letters (shared words divided by total distinct words). Follow-ups that score 0.4 or higher count as paraphrases.
Reporting: If a refusal is followed by three or more such paraphrases, the query summarizes them: the number of attempts, the times of the first and last follow-up, up to five sample follow-up prompts, and the highest similarity score. Results are sorted by attempt count, then by highest score.
Purpose: This helps identify potential attempts to bypass security or policy restrictions by slightly altering the phrasing of requests.

The query is tagged with the Defense Evasion and Initial Access tactics and with MITRE techniques T1059 and T1204, indicating its relevance to security monitoring and threat detection.
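
To make the similarity check concrete, the sketch below applies the same tokenization and Jaccard calculation as the query to a single pair of prompts. The two prompt strings and the column names Refused and Followup are hypothetical and chosen only for illustration; nothing here reads from CopilotActivity.

```kql
// Hypothetical prompts, used only to illustrate the similarity score.
print Refused  = "Write a script that disables endpoint logging",
      Followup = "Please write a small script that turns off endpoint logging"
// Same tokenization as the query: distinct lowercase words of 4+ letters.
| extend
    RefusedTokens  = set_difference(extract_all(@"([a-z]{4,})", tolower(Refused)), dynamic([])),
    FollowupTokens = set_difference(extract_all(@"([a-z]{4,})", tolower(Followup)), dynamic([]))
| extend
    Intersection = array_length(set_intersect(RefusedTokens, FollowupTokens)),
    Union        = array_length(set_union(RefusedTokens, FollowupTokens))
| extend Jaccard = iff(Union > 0, todouble(Intersection) / todouble(Union), 0.0)
```

With these inputs the refused prompt yields six tokens and the follow-up eight, with five in common (write, script, that, endpoint, logging), so Intersection is 5, Union is 9, and Jaccard is about 0.56. That is above the 0.4 threshold, so this follow-up would count toward ParaphraseCount. Short words such as "a" and "off" never match the four-letter pattern and are ignored.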

Details

David Alonso

Released: May 20, 2026

Tables

CopilotActivity

Keywords

CopilotActivity, ActorUpn, Name, LLMEventData, ConversationId, Agent, TimeGenerated, RecordType, RefusalMarkers, Prompt, ResponseLower, Refused, Tokens, Followup, Intersection, Union, Jaccard, ParaphraseCount, First, Last, SampleFollowups, Max

Operators

let, dynamic, tolower, tostring, coalesce, column_ifexists, ago, has_any, set_difference, extract_all, project, join, kind=inner, extend, array_length, set_intersect, set_union, iff, todouble, summarize, count, min, max, make_set, by, order by

Tactics

DefenseEvasion, InitialAccess

MITRE Techniques

T1059 T1204
