LAG Analysis Example

Query

requests
| serialize 
| extend RequestId = toguid(customDimensions.RequestId)
| project-away resultCode, id, itemType, operation_Name, client_Type, client_IP, operation_SyntheticSource, appId, itemId, itemCount, source, url, performanceBucket 
| sort by RequestId, timestamp desc
| extend rn = row_number()
| extend rncr = row_number(1, prev(RequestId,1,0) != RequestId) 
| extend previousTimestamp = iif(prev(RequestId,1,0) != RequestId, timestamp, prev(timestamp,1,0)) 
| extend deltaInMin = datetime_diff('minute', previousTimestamp, timestamp)
| project rncr, timestamp, RequestId, name, success, deltaInMin, duration, customDimensions, operation_Id, operation_ParentId, cloud_RoleInstance, appName
 
let SampleData = datatable (user:string, rowValue: int) ["A",5,"B",12,"B",15,"A",3,"A",9,"A",19,"B",7];
SampleData 
| serialize
| extend rowNumber = row_number()
| extend rowNumberCurrentUser = row_number(1, prev(user,1,0) != user) 
| extend previousValue = strcat("Previous value was ", prev(rowValue,1,0))
| extend nextValue = strcat("Next value was ", next(rowNumber,1,0))
| extend runningTotal = row_cumsum(rowValue)
| project rowNumber, rowNumberCurrentUser, user, rowValue, previousValue, nextValue, runningTotal

Explanation

The query is performing several operations on two different datasets: "requests" and "SampleData".

For the "requests" dataset:

The query first serializes the data.
It creates a new column called "RequestId" by converting the "customDimensions.RequestId" column to a GUID format.
It removes several columns from the dataset.
It sorts the data by "RequestId" and "timestamp" in descending order.
It adds two new columns: "rn" which represents the row number, and "rncr" which represents the row number with a reset for each new "RequestId".
It adds a column called "previousTimestamp" which stores the previous timestamp value if the "RequestId" changes, otherwise it stores the previous timestamp value.
It calculates the time difference in minutes between the previous timestamp and the current timestamp, and stores it in a column called "deltaInMin".
It selects and renames specific columns from the dataset.

For the "SampleData" dataset:

The query serializes the data.
It adds a column called "rowNumber" which represents the row number.
It adds a column called "rowNumberCurrentUser" which represents the row number with a reset for each new user.
It adds a column called "previousValue" which stores the previous value of "rowValue".
It adds a column called "nextValue" which stores the next value of "rowNumber".
It adds a column called "runningTotal" which calculates the cumulative sum of "rowValue".
It selects specific columns from the dataset.

Details

Rod Trent

Released: March 26, 2020

Tables

requestsSampleData

Keywords

Keywords: requestsserializeextendRequestIdtoguidcustomDimensionsproject-awayresultCodeiditemTypeoperation_Nameclient_Typeclient_IPoperation_SyntheticSourceappIditemIditemCountsourceurlperformanceBucketsort bytimestampdescrnrow_numberrncrpreviousTimestampiifdatetime_diffminutedeltaInMindurationoperation_Idoperation_ParentIdcloud_RoleInstanceappNameletSampleDatadatatableuserrowValuerowNumberrowNumberCurrentUserpreviousValuenextValuerunningTotalproject.

Operators

serializeextendproject-awaysort byprojectlet

Actions

GitHub

KQL Search