Display most recent data using Kusto query - KQL

I have the following kusto query:
customEvents
| where name == "Tracker"
| project Id = tostring(customDimensions["Id"]),
Rank = tostring(customDimensions["Rank"])
which gives a result in which the same Id is repeated multiple times. Is there a way to display ONLY the most recent data for each Id? How do I update the above Kusto query?

You can use the arg_max() aggregation function. For example:
customEvents
| where name == "Tracker"
| extend Id = tostring(customDimensions["Id"])
| summarize arg_max(timestamp, *) by Id
If this is a common use case, you can consider defining a materialized view that performs a similar aggregation, then query the view instead of the table with the raw data.
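For example, a rough sketch of such a view, assuming the events are stored in an Azure Data Explorer database where materialized views can be created (the view name is hypothetical):
.create materialized-view TrackerLatestById on table customEvents
{
    customEvents
    | where name == "Tracker"
    | extend Id = tostring(customDimensions["Id"])
    | summarize arg_max(timestamp, *) by Id
}
Querying TrackerLatestById would then return just the latest row per Id without re-aggregating the raw events on every query.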

Related

Splunk - I want to add a value from stats count() to a value from a lookup table and show that value in a table

The objective of the query I'm trying to write is to take a count of raw data from the previous month and add it to a count from a lookup table (.csv).
What I have attempted to do is…
index=*** source=***
| stats count(_raw) as monthCount
| join
[ | inputlookup Log_Count_YTD.csv]
| eval countYTD = toNumber(monthCount) + toNumber(TOTAL_COUNT_YTD)
| table countYTD
This query doesn’t return any value on a table. The TOTAL_COUNT_YTD is the only field from the inputlookup file. Let me know if there is any other information you need to help me out with this one. Thanks!
The stats command transforms the data so it has only one field: monthCount. The inputlookup returns only the TOTAL_COUNT_YTD field. The join command works by comparing values of common fields between the main search and the subsearch. Since there are no common fields, no events are joined.
There is no need for join in this case. The appendcols command will do, assuming the CSV contains a single field in a single row.
index=*** source=***
| stats count() as monthCount
| appendcols
[ | inputlookup Log_Count_YTD.csv]
| eval countYTD = tonumber(monthCount) + tonumber(TOTAL_COUNT_YTD)
| table countYTD
FWIW, the tonumber function is unnecessary, but doesn't hurt.

Extracting value of a json in Spark SQl

I am looking to aggregate by extracting the value of a JSON key from one of the columns here. Can someone help me with the right syntax in Spark SQL?
select count(distinct(Name)) as users, xHeaderFields['xyz'] as app group by app order by users desc
The table column looks something like this; I have removed other columns for simplification. The table has columns like Name, etc.
Assuming that your dataset is called ds and there is only one key=xyz object per column:
First, parse the JSON string (if needed):
ds = ds.withColumn("xHeaderFields", expr("from_json(xHeaderFields, 'array<struct<key:string,value:string>>')"))
Then filter the key = xyz and take the first element (assuming there is only one xyz key):
.withColumn("xHeaderFields", expr("filter(xHeaderFields, x -> x.key == 'xyz')[0]"))
Finally, extract the value from your object:
.withColumn("xHeaderFields", expr("xHeaderFields.value"))
Final result:
+-------------+
|xHeaderFields|
+-------------+
|         null|
|         null|
|  Settheclass|
+-------------+
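Putting the steps together with the aggregation from the original query, a rough sketch in the same DataFrame style (ds, Name and xHeaderFields come from the question, and the same single-xyz-key assumption applies):
import org.apache.spark.sql.functions.{countDistinct, desc, expr}

val result = ds
  // parse the JSON string and keep only the value of the 'xyz' key, as in the steps above
  .withColumn("app", expr(
    "filter(from_json(xHeaderFields, 'array<struct<key:string,value:string>>'), x -> x.key == 'xyz')[0].value"))
  // the aggregation the question asked for: distinct users per app, most active first
  .groupBy("app")
  .agg(countDistinct("Name").as("users"))
  .orderBy(desc("users"))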
Good luck!

How can I improve KQL query for large dataset for heatmap

I have a KQL query below which produces a really nice heatmap mapping out top access by country for Azure WAF.
The challenge here is that this query cannot go beyond 24 hours because the number of records I have is way too big. How can I improve it to display weekly and even monthly stats?
// source: https://datahub.io/core/geoip2-ipv4
set notruncation;
let CountryDB=externaldata(Network:string, geoname_id:string, continent_code:string, continent_name:string, country_iso_code:string, country_name:string)
[@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
| extend Dummy=1;
let AppGWAccess = AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| where userAgent_s !in ("bot")
| project TimeGenerated, clientIP_s;
AppGWAccess
| extend Dummy=1
| summarize count() by Hour=bin(TimeGenerated,6h), clientIP_s,Dummy
| partition by Hour(
lookup (CountryDB|extend Dummy=1) on Dummy
| where ipv4_is_match(clientIP_s, Network)
)
| summarize sum(count_) by country_name
What you're doing is computing the aggregations over all of the raw data at query time. Instead, you should create a materialized view that will do the aggregations in the background for you.
Quoting the documentation:
Materialized views expose an aggregation query over a source table. Materialized views always return an up-to-date result of the aggregation query (always fresh). Querying a materialized view is more performant than running the aggregation directly over the source table, which is performed each query.
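For example, a rough sketch of such a view, assuming the diagnostics data sits in an Azure Data Explorer database where materialized views can be created (the view name is hypothetical; the filters and 6h bin are taken from the query above):
.create materialized-view AppGWAccessBy6h on table AzureDiagnostics
{
    AzureDiagnostics
    | where ResourceType == "APPLICATIONGATEWAYS"
    | where Category == "ApplicationGatewayAccessLog"
    | where userAgent_s !in ("bot")
    | summarize count() by Hour = bin(TimeGenerated, 6h), clientIP_s
}
The heatmap query can then run the country lookup against this much smaller, pre-aggregated view (by name, or via materialized_view('AppGWAccessBy6h')) instead of re-scanning the raw access logs every time, which makes weekly or monthly ranges far more feasible.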

Take output from query and use in subsequent KQL query

I'm using Azure Log Analytics to review certain events of interest.
I would like to obtain timestamps from data that meets a certain criteria, and then reuse these timestamps in further queries, i.e. to see what else occurred around these times.
The following query returns the desired results, but I'm stuck on how to use the interestingTimes variable to perform further searches and show data within X minutes of each previously returned timestamp.
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;
Any pointers would be greatly appreciated.
interestingTimes will only be available for use in the query where you declare it. You can't use it in another query, unless you define it there as well.
By the way, you can make your query much more efficient by adding a filter that utilizes the built-in index for the EventData column, so that the parse operator runs on a much smaller number of records:
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| where EventData has "MicrosoftEdge.exe" // <-- OPTIMIZATION that will filter out most records
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;
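To then address the second part within the same query, here is a rough sketch that continues right after the let statement above: cross-join the filtered Event table to interestingTimes on a constant key and keep the rows that fall within a window around each interesting timestamp (the 5-minute window, the dummy key and the InterestingTime name are placeholders I'm introducing):
let window = 5m;
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| extend dummy = 1
| join kind=inner (interestingTimes | project InterestingTime = TimeGenerated, dummy = 1) on dummy
| where abs(TimeGenerated - InterestingTime) <= window
| project-away dummy, dummy1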

How to aggregate logs by field and then by bin in AWS CloudWatch Insights?

I'm trying to write a query that will first aggregate by field count and then by bin(1h). For example, I would like to get a result like:
# Date Field Count
1 2019-01-01T10:00:00.000Z A 123
2 2019-01-01T11:00:00.000Z A 456
3 2019-01-01T10:00:00.000Z B 567
4 2019-01-01T11:00:00.000Z B 789
I'm not sure whether it's possible, though; the query should be something like:
fields Field
| stats count() by Field by bin(1h)
Any ideas how to achieve this?
Is this what you need?
fields Field | stats count() by Field, bin(1h)
If you want to create a line chart, you can do it by separately counting each value that your field could take.
fields
Field = 'A' as is_A,
Field = 'B' as is_B
| stats sum(is_A) as A, sum(is_B) as B by bin(1hour)
This solution requires your query to include a string literal of each value ('A' and 'B' in OP's example). It works as long as you know what those possible values are.
This might be what Hugo Mallet was looking for, except the avg() function won't work here, so he'd have to calculate the average by dividing by a total.
I'm not able to group by a certain field and create visualizations.
fields Field
| stats count() by Field, bin(1h)
I keep getting this message:
No visualization available. Try this to get started:
stats count() by bin(30s)