Splunk percentage value for each category

I have two columns, service and status. How do I calculate the percentage availability for each service?
total count for that service -> ts
5xx status for that service -> er_s
availability = ((ts - er_s) / ts) * 100
I am able to get it as a whole, or as a separate result for each service, but I am looking for the availability of each service in one place.

What have you tried so far? Did it include a stats command to compute the totals and an eval to calculate the availability?
... | stats count as ts, count(eval(status>=500 AND status<=599)) as er_s by service
| eval availability=((ts - er_s) * 100 / ts)
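If it helps to see everything in one place, here is a hedged sketch of the full search; the round(), the final table, and the sort are cosmetic additions on top of the answer above, not part of it:
... | stats count as ts, count(eval(status>=500 AND status<=599)) as er_s by service
| eval availability=round((ts - er_s) * 100 / ts, 2)
| table service, ts, er_s, availability
| sort - availability
This returns one row per service, which is the "every service in one place" view the question asks for.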

Filter a result set to include only the top 99.9% of values in Splunk, preferably without a subquery

I am querying the access logs to a service. I want to build a scatter plot where the X axis is the total number of requests in that hour, and the Y axis is how many times a particular request (category) was made in that hour. To do this, my output needs to be:
requestDetails, per_hour, count
Easy enough. I use a query like:
<base query setting requestDetails>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The challenge I am running into is that there are very infrequent items that just add clutter to the chart. I don't care about every request, just the common ones, so I'd like to filter it so that only the requestDetails that make up 99.9% of traffic are included.
In theory, I could do this with a sub-query:
<base query>
[ search <base query>
| stats count by requestDetails
| sort -count
| eventstats sum(count) as total
| eval percent=count/total*100
| accum percent as percentile
| where percentile <= 99.9
| fields requestDetails
]
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The problem is, it's expensive to do it this way, since I have to extract the data twice. And it doesn't exactly give me what I want, since it is a pre-filter, so the value of per_hour is missing the values for those filtered rows.
It seems to me I should be able to accomplish this with a single-pass through the data using eventstats or streamstats. But I'm drawing a blank on how to emulate the accum command because there are so many rows for a given requestDetails, and what I need is to only count any given requestDetails once.
Is there a way to do what I'm trying to do as a post-filter, without using a subquery? If so, what would it look like?
Indeed, it is possible to do this using a single pass through the data:
<base query>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| eventstats sum(count) as per_request by requestDetails
| eventstats sum(count) as total
| sort -per_request, +requestDetails
| streamstats sum(count) as count_til_now
| eval percentile = count_til_now / total * 100
| eventstats max(percentile) as max_percentile by requestDetails
| where max_percentile <= 99.9
| table requestDetails, per_hour, count
The key is the sort. We first organize them from most common requestDetails to least common. The other sort terms are to avoid overlap in the event of a tie.
We also need a couple of eventstats passes: one for the total number of times a particular requestDetails was seen in the whole dataset, and one for the overall total number of requests.
The fun part is to use streamstats to create a running total, and use that to calculate the percentile.
Then, to build the filter, we use another eventstats to calculate the maximum percentile reached by each requestDetails.
All that is left is to filter where the max_percentile is below the threshold, and output the data needed for the visualization.
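For intuition about the streamstats step, here is a small, self-contained sketch on made-up data (the values are purely illustrative and only meant to show the running-total-to-percentile mechanics):
| makeresults
| eval requestDetails=split("a,a,a,a,b,b,c", ",")
| mvexpand requestDetails
| stats count by requestDetails
| eventstats sum(count) as total
| sort -count, +requestDetails
| streamstats sum(count) as count_til_now
| eval percentile = count_til_now / total * 100
This yields running totals of 4, 6, and 7 out of 7 (percentiles of roughly 57, 86, and 100), so a filter such as where percentile <= 90 would keep a and b and drop the rare c.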

Kusto query to get percentage value of events over time

I have a Kusto / KQL query in Azure Log Analytics that aggregates a count of events over time, e.g.:
customEvents
| where name == "EventICareAbout"
| extend channel = customDimensions["ChannelName"]
| summarize events=count() by bin(timestamp, 1m), tostring(channel)
This gives a good result set with a count of the events in each minute bucket.
But the count on its own is fairly meaningless; what I want to know is whether that count is different from the average over, say, the last hour.
I'm not even sure how to start constructing something like that.
Any pointers?
There are a couple of ways to achieve this. First, you can calculate the hourly average as an additional column and then calculate the difference of each minute from that hourly average:
let minuteValues = customEvents
| where name == "EventICareAbout"
| extend channel = customDimensions["ChannelName"]
| summarize events=count() by bin(timestamp, 1m), tostring(channel)
| extend Day = startofday(timestamp), hour = hourofday(timestamp);
let hourlyAverage = customEvents
| where name == "EventICareAbout"
| extend channel = customDimensions["ChannelName"]
| summarize events=count() by bin(timestamp, 1m), tostring(channel)
| summarize hourlyAvgEvents = avg(events) by bin(timestamp,1h), tostring(channel)
| extend Day = startofday(timestamp), hour = hourofday(timestamp);
minuteValues
| lookup hourlyAverage on hour, Day
| extend Diff = events - hourlyAvgEvents
Another option is to use the built-in anomaly detection functions.

Count the number of different values of a field, and get the average per minute

I have some domains like this:
domain
------
A
B
C
D
...
One domain can be called in one request. Now I want to know the average number of requests per minute per domain (no matter which domain it is). So I split it into three steps:
get the total request number per minute
get the number of domains been called per minute
avg = total request number per minute / number of domain per minute
I have got the result of the first step by:
index="whatever" source="sourceurl"
| bin _time span=1m
| stats count as requestsPerMin by _time
However, I don't know how to get the number of domains that have been called. For example, if in one minute domain A has been called twice and domain B has been called once, then the number of domains that have been called is two. But I don't know which query can get this result.
If I understand you correctly, you probably want a timechart instead:
index=ndx sourcetype=srctp domain=* source="sourceurl"
| timechart span=1m dc(domain) as count by source
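If the end goal is the single number from step 3, one way to sketch it, reusing the index and source from the question and assuming the field is literally named domain (avgPerDomain is just an example field name), is to compute both aggregates in the same stats pass and then divide:
index="whatever" source="sourceurl"
| bin _time span=1m
| stats count as requestsPerMin, dc(domain) as domainsPerMin by _time
| eval avgPerDomain=round(requestsPerMin / domainsPerMin, 2)
Here dc(domain) is the distinct count of domains seen in each minute, so avgPerDomain is the requests per minute divided by the number of domains called in that minute.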

Using dedup to find unique hosts. How can I find an average for the selected time frame?

The goal is to provide percent availability. I would like to check every 15 minutes whether the unique count of server1, server2, and server3 is equal to 3 for each interval (indicating the system is fully healthy). From this count I want to compute the average over whatever time period is selected in Splunk, output that average, and convert it to a percentage.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| dedup host
| timechart span=30m count
The count should be 3 for each interval.
It's not clear how much of your requirements the example SPL solves, so I'll assume it does nothing.
Having dedup followed by timechart means the timechart command will only see 3 events - one for each host. That doesn't make for a helpful chart. I suggest using dc(host) instead, to get a distinct count of hosts for each interval.
The appendpipe command can be used to add average and percentage values on the end.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| timechart span=30m dc(host) as count
| appendpipe [ stats avg(count) as Avg | eval Pct=round(Avg*100/3,2) ]
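If a per-interval figure is wanted as well, a small variation converts each interval's distinct-host count to a percentage before the appendpipe; intervalPct is just an example field name, and the divisor 3 is simply the number of servers being checked:
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| timechart span=30m dc(host) as count
| eval intervalPct=round(count*100/3,2)
| appendpipe [ stats avg(count) as Avg | eval Pct=round(Avg*100/3,2) ]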

Add a calculated threshold line on a Splunk timechart

I have a simple chart which shows the bottom 5 servers by number of requests per minute. I'm looking to add a calculated threshold overlay line that is the average number of requests across all servers minus one standard deviation. I have been searching for hours but have not been able to find anything.
Current Search Query:
sourcetype=x source=y host="server*" ENTERING
| timechart useother=f span=1m count by host WHERE count in bottom5
I essentially want something like the below (which doesn't work of course):
sourcetype=x source=y host="server*" ENTERING
| timechart useother=f span=1m count by host WHERE count in bottom5
| eval threshold=(avg(countByHost) - stdev(countByHost))
Try this:
sourcetype=x source=y host="server*" ENTERING
| timechart useother=f span=1m avg(count) as avgByHost, stdev(count) as stdevByHost, count by host WHERE count in bottom5
| eval threshold=avgByHost-stdevByHost
| fields - avgByHost*, stdevByHost*
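If timechart resists mixing the per-host split with the overall statistics, a possible alternative, sketched here without the bottom-5 restriction and with a synthetic series named threshold (just a label) so the overlay survives the pivot, is to do the aggregation with bin and stats first:
sourcetype=x source=y host="server*" ENTERING
| bin _time span=1m
| stats count by _time, host
| eventstats avg(count) as avgAll, stdev(count) as stdevAll by _time
| eval threshold=avgAll-stdevAll
| appendpipe [ dedup _time | eval host="threshold", count=threshold ]
| fields _time, host, count
| xyseries _time host count
The eventstats computes the per-minute average and standard deviation across all hosts, the appendpipe adds one extra row per minute carrying the threshold as if it were another host, and xyseries pivots the result so each host (plus the threshold series) becomes its own column for charting.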