Kusto query to get percentage value of events over time - azure-log-analytics

I have a Kusto / KQL query in Azure Log Analytics that aggregates a count of events over time, e.g.:
customEvents
| where name == "EventICareAbout"
| extend channel = customDimensions["ChannelName"]
| summarize events=count() by bin(timestamp, 1m), tostring(channel)
This gives a good result set with a count of the events in each minute bucket.
But the count on its own is fairly meaningless; what I want to know is whether that count differs from the average over, say, the last hour.
But I'm not even sure how to start constructing something like that.
Any pointers?

There are a couple of ways to achieve this. First, calculate the hourly average as an additional column, then calculate the difference from that average:
let minuteValues = customEvents
| where name == "EventICareAbout"
| extend channel = tostring(customDimensions["ChannelName"])
| summarize events = count() by bin(timestamp, 1m), channel
| extend Day = startofday(timestamp), hour = hourofday(timestamp);
let hourlyAverage = customEvents
| where name == "EventICareAbout"
| extend channel = tostring(customDimensions["ChannelName"])
| summarize events = count() by bin(timestamp, 1m), channel
| summarize hourlyAvgEvents = avg(events) by bin(timestamp, 1h), channel
| extend Day = startofday(timestamp), hour = hourofday(timestamp)
| project channel, Day, hour, hourlyAvgEvents; // keep just the lookup keys and the hourly average
minuteValues
| lookup hourlyAverage on hour, Day, channel // match on channel too, so averages aren't mixed across channels
| extend Diff = events - hourlyAvgEvents
Another option is to use the built-in anomaly detection functions.
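One way to sketch that is with make-series and series_decompose_anomalies (the 1-day window and the 1.5 sensitivity threshold are placeholders to adjust):
customEvents
| where timestamp > ago(1d) // placeholder time window
| where name == "EventICareAbout"
| extend channel = tostring(customDimensions["ChannelName"])
| make-series events = count() default = 0 on timestamp from ago(1d) to now() step 1m by channel
| extend (anomalies, score, baseline) = series_decompose_anomalies(events, 1.5)
| mv-expand timestamp to typeof(datetime), events to typeof(long),
            anomalies to typeof(int), score to typeof(double), baseline to typeof(double)
| where anomalies != 0 // keep only the minutes flagged as anomalous
This flags the minutes whose count deviates from the per-channel baseline, which is essentially the "is this different from the recent average" question, without computing the average by hand.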

Related

Filter a result set to include only the top 99.9% of values in Splunk, preferably without a subquery

I am querying the access logs to a service. I want to build a scatter plot where the X axis is the total number of requests in that hour, and the Y axis is how many times a particular request (category) was made in that hour. To do this, my output needs to be:
requestDetails, per_hour, count
Easy enough. I use a query like:
<base query setting requestDetails>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The challenge I am running into is that there are very infrequent items that just add clutter to the chart. I don't care about every request, just the common ones. So I'd like to filter it so that only the requestDetails that make up 99.9% of traffic are included.
In theory, I could do this with a sub-query:
<base query>
[ search <base query>
| stats count by requestDetails
| sort -count
| eventstats sum(count) as total
| eval percent=count/total*100
| accum percent as percentile
| where percentile <= 99.9
| fields requestDetails
]
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The problem is that it's expensive to do it this way, since I have to extract the data twice. It also doesn't give me exactly what I want: because it's a pre-filter, the value of per_hour is missing the counts from the filtered-out rows.
It seems to me I should be able to accomplish this with a single pass through the data using eventstats or streamstats. But I'm drawing a blank on how to emulate the accum command, because there are many rows for a given requestDetails, and I need to count any given requestDetails only once.
Is there a way to do what I'm trying to do as a post-filter, without using a subquery? If so, what would it look like?
Indeed, it is possible to do this using a single pass through the data:
<base query>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| eventstats sum(count) as per_request by requestDetails
| eventstats sum(count) as total
| sort -per_request, +requestDetails
| streamstats sum(count) as count_til_now
| eval percentile = count_til_now / total * 100
| eventstats max(percentile) as max_percentile by requestDetails
| where max_percentile <= 99.9
| table requestDetails, per_hour, count
The key is the sort: we first organize the rows from the most common requestDetails to the least common. The secondary sort term keeps tied requestDetails from interleaving.
We also compute some eventstats: the total number of times each requestDetails was seen in the whole dataset (per_request), and the overall total number of requests (total).
The fun part is using streamstats to create a running total, and using that to calculate the percentile.
Then, to build our filter, we do one more eventstats to find the maximum percentile reached by each requestDetails.
All that is left is to filter where the max_percentile is below the threshold, and output the data needed for the visualization.

KQL - return entries not matching IP from watchlist (query optimization)

I want to receive a high severity alert in Sentinel when a user is added to a defined "high severity" group (via watchlist); however, I want to omit any users connecting from a Zscaler IP address. The query below is working, but I'm not sure it's the neatest/most optimized logic. Is there a shorter/better way to write this?
I'm only concerned about the lines beginning with asterisks (which are only added for clarity).
watchlist "aadgroups"
Group
Severity
Prod Owners
High
Prod Contributors
High
watchlist "ZSIPs"
zscaler_ip
location
165.225.0.0/23
Chicago
165.225.60.0/22
Chicago
165.225.56.0/22
Chicago
let HighSeverityGroups = (_GetWatchlist('aadgroups') | where severity == "High" | project group_name, severity);
let ZSIPs = (_GetWatchlist('zscaler_ip') | project zscaler_ip);
AuditLogs
| where ActivityDisplayName == "Add member to group"
| where parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)) has_any (HighSeverityGroups)
| extend InitiatedByActor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend Actor_ipv4 = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| extend TargetUser = tostring(TargetResources[0].userPrincipalName)
| project-reorder TimeGenerated,SourceSystem,InitiatedBy,ActivityDisplayName,TargetUser,GroupName,InitiatedByActor,Actor_ipv4,Result
| where TargetUser <> ""
** | evaluate ipv4_lookup(ZSIPs, Actor_ipv4, zscaler_ip, return_unmatched = true)
** | where isempty(zscaler_ip)
A couple of things you can try to optimize the query:
This filter is quite costly: | where parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)) has_any (HighSeverityGroups). If TargetResources will only rarely contain strings from HighSeverityGroups, you can add a much more efficient filter before it that discards most of the records: | where TargetResources has_any (HighSeverityGroups). This way, the heavy parsing is done only on a small number of records.
You're parsing some of the data more than once, for example parse_json(tostring(InitiatedBy.user)) is evaluated in several extends. Instead, use the extend operator to parse it once into a column, and reuse that column later in the query.
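Putting both suggestions together, the core of the query could look something like the sketch below (your column names are kept; project-reorder is omitted, and HighSeverityGroups is trimmed to a single column so has_any receives a single-column table):
let HighSeverityGroups = (_GetWatchlist('aadgroups') | where severity == "High" | project group_name);
let ZSIPs = (_GetWatchlist('zscaler_ip') | project zscaler_ip);
AuditLogs
| where ActivityDisplayName == "Add member to group"
| where TargetResources has_any (HighSeverityGroups) // cheap pre-filter that discards most records
| extend ModifiedProps = parse_json(tostring(TargetResources[0].modifiedProperties)) // parse once
| extend InitiatedByUser = parse_json(tostring(InitiatedBy.user)) // parse once
| where tostring(ModifiedProps[1].newValue) has_any (HighSeverityGroups) // heavy check now runs on few records
| extend GroupName = tostring(ModifiedProps[1].newValue)
| extend InitiatedByActor = tostring(InitiatedByUser.userPrincipalName)
| extend Actor_ipv4 = tostring(InitiatedByUser.ipAddress)
| extend TargetUser = tostring(TargetResources[0].userPrincipalName)
| where isnotempty(TargetUser)
| evaluate ipv4_lookup(ZSIPs, Actor_ipv4, zscaler_ip, return_unmatched = true)
| where isempty(zscaler_ip)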

Take output from query and use in subsequent KQL query

I'm using Azure Log Analytics to review certain events of interest.
I would like to obtain timestamps from data that meets a certain criteria, and then reuse these timestamps in further queries, i.e. to see what else occurred around these times.
The following query returns the desired results, but I'm stuck at how to use the interestingTimes var to then perform further searches and show data within X minutes of each previously returned timestamp.
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;
Any pointers would be greatly appreciated.
interestingTimes will only be available for use in the query where you declare it. You can't use it in another query, unless you define it there as well.
By the way, you can make your query much more efficient by adding a filter that will utilize the built-in index for the EventData column, so that the parse operator will run on a much smaller number of records:
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| where EventData has "MicrosoftEdge.exe" // <-- OPTIMIZATION that will filter out most records
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;
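To then look at what else happened around those timestamps, one option is to stay within a single query: collect the timestamps into a list and expand each event against it. A rough sketch (the 5-minute window is an assumption, and for wide time ranges this fan-out gets expensive):
let windowSize = 5m; // "X minutes" - adjust as needed
let interestingTimesList = toscalar(
    Event
    | where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime(2021-04-01T15:00:00))
    | where EventID == 1
    | where EventData has "MicrosoftEdge.exe"
    | parse EventData with * '<Data Name="Image">' ImageName "<" *
    | where ImageName contains "MicrosoftEdge.exe"
    | summarize make_list(TimeGenerated));
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime(2021-04-01T15:00:00))
| extend interestingTime = interestingTimesList
| mv-expand interestingTime to typeof(datetime)
| where TimeGenerated between ((interestingTime - windowSize) .. (interestingTime + windowSize))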

Splunk: How to grab certain section from result in splunk?

I am using this query in a Splunk search:
index="some_index" | dedup source | sort -source | dedup sourcetype | table sourcetype, source
My result looks like this:
sourcetype source
----------- --------------
dev_architecture_dev1 /u01/splunk/etc/apps/dev-data/data/dev1/dev1-20150629133045.log
dev_architecture_dev2 /u01/splunk/etc/apps/dev-data/data/dev2/dev2-20150626124438.log
I want to grab only the year, month, day, hour, minute and second right before ".log", e.g. 20150629133045.
And then display it like 2015-06-29 13:30:45 in the 'source' column.
Is there a way to do this in Splunk 6?
Thanks for looking at the question. Hoping to get some answers.
Capture the data
| rex field=source ".*?(?<dt>\d+)\.log"
Parse into time
| eval dt = strptime(dt, "%Y%m%d%H%M%S")
Format however you need
| eval dt = strftime(dt, "%Y-%m-%d %H:%M:%S")
Output
| table sourcetype source dt
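Put together with the base search from the question, the whole pipeline would look something like this:
index="some_index"
| dedup source
| sort -source
| dedup sourcetype
| rex field=source ".*?(?<dt>\d+)\.log"
| eval dt = strptime(dt, "%Y%m%d%H%M%S")
| eval dt = strftime(dt, "%Y-%m-%d %H:%M:%S")
| table sourcetype source dt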

Possible design pattern suggestions?

I was wondering if anyone can suggest a suitable design pattern for achieving the following:
I have a payslip; each slip shows my previous pay and my current pay. Each payslip should not need to duplicate fields; instead, the current value of one slip should be referenced as the previous value in the next slip.
On top of all this, I also need to be able to retrieve any given payslip at any point in time (preferably O(1)).
Here's a visual to help understand my problem.
[key:"1"] [key:"2"] [key:"3"]
+------+ +------+ +------+
| | | | | |
| Curr | <--- | Prev | | Curr |
| Null | | Curr | <--- | Prev |
| | | | | |
+------+ +------+ +------+
Any help would be greatly appreciated.
Give your PaySlip class a PreviousPaySlipKey property, which will be null for the very first pay slip. In your database, this should be a foreign key to the PaySlip's Key property.
This way, if you have a PaySlip, you can find the key of the previous pay slip, and query the database for the payslip with that key.
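A rough sketch of that shape in SQL (the column names and the @ parameter are just for illustration):
CREATE TABLE PaySlip (
    PaySlipKey         INT PRIMARY KEY,
    PreviousPaySlipKey INT NULL REFERENCES PaySlip (PaySlipKey), -- NULL for the very first slip
    CurrentPay         DECIMAL(10, 2) NOT NULL
);
-- Given a slip you've already loaded, its previous slip is one indexed lookup away:
SELECT * FROM PaySlip WHERE PaySlipKey = @PreviousPaySlipKey;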
Disclaimer: I am not a VB programmer, and the syntax below may not apply. Assuming VB has a Map interface, you could use it for your O(1) lookups:
Map<Integer, CompositeValue> slips = new HashMap<>(); // any hash-based map gives O(1) lookup by key
and declare a class (assuming VB lets you create classes) CompositeValue as such:
class CompositeValue {
    Integer previousKey; // key of the previous payslip (null for the first one)
    Value value;         // the payslip data itself
}
Now, once you've retrieved a CompositeValue from the Map, you have the means to get its real value (value), and to retrieve the previous value using previousKey.
Just a thought.
If we are talking DBs here, every payslip should be given a sequential ID (also called auto-increment in some RDBMS). This will be the primary key (so it's automatically indexed). Assuming you have retrieved the payslip you wanted, here is how to get the previous one from the DB:
SELECT TOP 1 * FROM PaySlip WHERE PaySlipID < #PaySlipID ORDER BY PaySlipID Desc
And the next one:
SELECT TOP 1 * FROM PaySlip WHERE PaySlipID > #PaySlipID ORDER BY PaySlipID
You will probably have payslips for multiple employees stored in a single table, so just add the employee to the WHERE condition as well.
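For example, assuming an EmployeeID column (the name is a guess; the # placeholders follow the snippets above):
SELECT TOP 1 * FROM PaySlip
WHERE EmployeeID = #EmployeeID AND PaySlipID < #PaySlipID
ORDER BY PaySlipID Desc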
It is counterproductive to store references to the previous/next payslip on each payslip.