Filter a result set to include only the top 99.9% of values in Splunk, preferably without a subquery - splunk

I am querying the access logs to a service. I want to build a scatter plot where the X axis is the total number of requests in that hour, and the Y axis is how many times a particular request (category) was made in that hour. To do this, my output needs to be:
requestDetails, per_hour, count
Easy enough. I use a query like:
<base query setting requestDetails>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The challenge I am running into is that there are very infrequent items that just add clutter to the chart. I don't care about every request, just the common ones. So I'd like to filter the results so that only the requestDetails values that make up 99.9% of traffic are included.
In theory, I could do this with a sub-query:
<base query>
[ search <base query>
| stats count by requestDetails
| sort -count
| eventstats sum(count) as total
| eval percent=count/total*100
| accum percent as percentile
| where percentile <= 99.9
| fields requestDetails
]
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The problem is, it's expensive to do it this way, since I have to extract the data twice. And it doesn't exactly give me what I want, since it is a pre-filter, so the value of per_hour is missing the values for those filtered rows.
It seems to me I should be able to accomplish this with a single pass through the data using eventstats or streamstats. But I'm drawing a blank on how to emulate the accum command, because there are many rows for a given requestDetails and I need to count any given requestDetails only once.
Is there a way to do what I'm trying to do as a post-filter, without using a subquery? If so, what would it look like?

Indeed, it is possible to do this using a single pass through the data:
<base query>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| eventstats sum(count) as per_request by requestDetails
| eventstats sum(count) as total
| sort -per_request, +requestDetails
| streamstats sum(count) as count_til_now
| eval percentile = count_til_now / total * 100
| eventstats max(percentile) as max_percentile by requestDetails
| where max_percentile <= 99.9
| table requestDetails, per_hour, count
The key is the sort. We first order the rows from the most common requestDetails to the least common. The second sort term, requestDetails, breaks ties so that rows for two equally common requests do not interleave. Note that sort returns at most 10,000 results by default; use sort 0 if the result set can be larger.
We also need a few eventstats values: the total number of times a particular requestDetails was seen in the whole dataset (per_request), and the total number of requests (total).
The fun part is using streamstats to create a running total, and using that to calculate the cumulative percentage (percentile).
Then, to build the filter, another eventstats calculates the maximum percentile for each requestDetails, which is the cumulative share of traffic once all of its rows have been counted.
All that is left is to keep the rows where max_percentile is within the threshold, and output the fields needed for the visualization.
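To make the steps above concrete, here is a minimal Python sketch that emulates the same logic on toy data. It is not Splunk code: the function name filter_common, the tuple layout, and the sample rows are all hypothetical, and the input is assumed to be the output of `stats count by _time, requestDetails`.

```python
from collections import defaultdict


def filter_common(rows, threshold=99.9):
    """rows: list of (hour, request_details, count) tuples (hypothetical layout).

    Returns (request_details, per_hour, count) for every row whose request
    falls inside the cumulative-traffic threshold.
    """
    per_hour = defaultdict(int)     # eventstats sum(count) as per_hour by _time
    per_request = defaultdict(int)  # eventstats sum(count) as per_request by requestDetails
    for hour, req, count in rows:
        per_hour[hour] += count
        per_request[req] += count
    total = sum(per_request.values())  # eventstats sum(count) as total

    # sort -per_request, +requestDetails, then keep a running total (streamstats).
    ordered = sorted(per_request, key=lambda r: (-per_request[r], r))
    keep = set()
    running = 0
    for req in ordered:
        running += per_request[req]
        # Cumulative share after the last row of this request (max_percentile).
        if running / total * 100 <= threshold:
            keep.add(req)

    # per_hour was computed before filtering, so it still includes dropped rows.
    return [(req, per_hour[hour], count) for hour, req, count in rows if req in keep]


rows = [
    (0, "a", 60), (0, "b", 30), (0, "c", 10),
    (1, "a", 70), (1, "b", 29), (1, "c", 1),
]
result = filter_common(rows, threshold=95)
```

With these rows, "a" and "b" together account for 94.5% of traffic, so "c" is dropped at a threshold of 95, while per_hour for each remaining row is still computed over all three requests.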

Related

Using dedup to find unique hosts. How can I find an average for the selected time frame?

The goal is to provide percent availability. I would like to check every 15 minutes if the unique count for server1, server2, and server3 is equal to 3 for each interval (indicating the system is fully healthy). From this count I want to check on the average for whatever time period is selected in splunk to output an average and convert to percent.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| dedup host
| timechart span=30m count
The count should be 3 for each interval.
It's not clear how much of your requirements the example SPL solves, so I'll assume it does nothing.
Having dedup followed by timechart means the timechart command will only see 3 events, one for each host. That doesn't make for a helpful chart. I suggest using dc(host) instead, to get a count of distinct hosts for each interval.
The appendpipe command can be used to add average and percentage values on the end.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| timechart span=30m dc(host) as count
| appendpipe [ stats avg(count) as Avg | eval Pct=round(Avg*100/3,2) ]
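As a rough illustration of the arithmetic in that query, the Python sketch below counts distinct hosts per 30-minute bucket, then averages the counts and converts the average to a percentage of the 3 expected hosts. The function name availability, the (timestamp, host) tuple layout, and the sample events are assumptions made for the example. Unlike timechart, the sketch only covers buckets that contain at least one event, and it assumes the input is non-empty.

```python
from collections import defaultdict


def availability(events, span_seconds=1800, expected_hosts=3):
    """events: list of (epoch_seconds, host) tuples (hypothetical layout)."""
    hosts_per_bucket = defaultdict(set)
    for ts, host in events:
        bucket = ts - ts % span_seconds  # start of the 30-minute bucket
        hosts_per_bucket[bucket].add(host)

    # dc(host) per bucket
    counts = {b: len(h) for b, h in sorted(hosts_per_bucket.items())}
    # stats avg(count) as Avg | eval Pct=round(Avg*100/3,2)
    avg = sum(counts.values()) / len(counts)
    pct = round(avg * 100 / expected_hosts, 2)
    return counts, avg, pct


events = [
    (0, "server1"), (10, "server2"), (20, "server3"), (20, "server1"),
    (1800, "server1"), (1900, "server2"),
]
counts, avg, pct = availability(events)
```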

How to count and plot several searches at once?

I am counting the number of hits on my website using splunk. My current search looks for a keywordA as follows:
index=mydata keywordA |bucket _time span=day |stats count by _time
However, I would like to add several other searches to the output, say for other keywords (keywordB for instance):
index=mydata keywordB |bucket _time span=day |stats count by _time
Note: these searches are not necessarily mutually exclusive! So the searches need to be run independently.
I would like to have the total daily count for each search at once, so that I avoid running each search separately.
Output should be:
day keyA keyB
2020-01-01 423 354
2020-01-02 523 254
What is the best way to proceed?
Thanks!
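Try this search, which combines your two. Apart from the final stats command, it doesn't scale well to many keywords, because every keyword needs its own clause in case(). Note also that case() returns the value for the first matching condition only, so an event containing both keywords is counted under keywordA alone.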
index=mydata (keywordA OR keywordB)
| bin span=1d _time
| eval keyword = case(match(_raw, "keywordA"), "keywordA", match(_raw, "keywordB"), "keywordB", 1==1, "other")
| stats count by _time, keyword
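Because case() returns the value for the first condition that matches, an event that contains both keywords ends up in a single bucket. The Python sketch below shows the independent per-keyword counting that the question asks for, producing one row per day and one column per keyword. The function name daily_keyword_counts, the (day, raw_text) layout, and the sample events are hypothetical, and plain substring matching stands in for the search terms.

```python
from collections import defaultdict


def daily_keyword_counts(events, keywords):
    """events: iterable of (day, raw_text) pairs (hypothetical layout)."""
    table = defaultdict(lambda: dict.fromkeys(keywords, 0))
    for day, raw in events:
        row = table[day]
        for kw in keywords:
            # Test every keyword independently; one event may match several.
            if kw in raw:
                row[kw] += 1
    return dict(sorted(table.items()))


events = [
    ("2020-01-01", "keywordA hit"),
    ("2020-01-01", "keywordA and keywordB"),
    ("2020-01-02", "keywordB only"),
]
table = daily_keyword_counts(events, ["keywordA", "keywordB"])
```

Adding another keyword only means extending the list passed as keywords, which is the scaling property the case() approach lacks.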

Can I use splunk timechart without aggregate function?

https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Timechart
I tried several syntaxes but none of them works. They all require an aggregate function.
My goal is to display a line chart, representing the value of an event field over time.
Very simple, I don't need any max/min/sum/count at all.
I need the x-axis to be the time range that I passed in as the query time span. Every event should be a data point in that chart, and the y-axis should be the value of a field that I choose, for example fieldA, which holds a double value.
How should I write my Splunk query?
search query ...| timechart fieldA?
(you don't have to use timechart, any command that can achieve my goal will be accepted)
update: let me try to describe what I wanted using a data generation example:
| makeresults count=10 | streamstats count AS rowNumber
Let's say the time span is the last 24 hours. When running the above query in Splunk, it will generate 10 records with the same _time field, which is #now, and a rowNumber field with values from 1 to 10. What I want to see is a visualization where the x-axis runs from (#now-24hours) to #now, with no data points for most of the x-axis, but with 10 dots at the last second (the rightmost position), whose y-axis values run from 1 to 10.
You do not need to use an aggregate function with timechart. Just about any stats function will do. See https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Timechart#Stats_function_options.
Depending on the nature of your data and what you want to see in the chart any of timechart max(fieldA), timechart latest(fieldA), timechart earliest(fieldA), or timechart values(fieldA) may work for you.
| makeresults count=2
| streamstats count
| eval _time=if(count=1,relative_time(_time,"-1d"),_time)
| timechart span=160min count
| streamstats count
| timechart cont=f last(count)
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eventorderfunctions
Try it with the time picker set to All time.
they reduced the number from original results.
It depends on how you use it.

Sorting problem regarding Last Modified Date in Splunk query

I have a problem regarding sorting in Splunk.
I want to build automated reports that show, in a calendar, the number of tickets per day.
A ticket has these time stamps:
ACTUAL_END_DATE="2018-10-29 01:00:00.0",
ACTUAL_START_DATE="2018-10-29 00:00:00.0",
CLOSED_DATE="2019-06-16 12:56:00.0",
COMPLETED_DATE="2019-06-06 10:47:46.0",
EARLIEST_START_DATE="2018-10-23 11:20:42.0",
LAST_MODIFIED_DATE="2019-06-16 12:56:07.0",
RFA_DATE="2018-10-23 11:20:42.0",
RFC_DATE="2018-10-22 15:19:00.0",
SFA_DATE="2019-06-06 10:47:02.0",
SFR_DATE="2019-06-06 10:46:52.0",
SCHEDULED_DATE="2019-06-06 10:47:06.0",
SCHEDULED_END_DATE="2018-10-29 01:00:00.0",
SCHEDULED_START_DATE="2018-10-29 00:00:00.0",
SUBMIT_DATE="2018-10-22 15:18:53.0",
I sort by two tokens, the earliest is "#mon" and the latest is "now".
Unfortunately, it sorts by LAST_MODIFIED_DATE and I get 62 tickets in one day. All of them have ACTUAL_START_DATE values in different months, since a ticket can be changed after it is closed to add details.
This is my query:
stats latest(STATUS_REASON) as STATUS_REASON latest(CHANGE_REQUEST_STATUS) as CHANGE_REQUEST_STATUS latest(_time) as _time latest(CHANGE_TIMING) as CHANGE_TIMING by INFRASTRUCTURE_CHANGE_ID
| where CHANGE_REQUEST_STATUS !="Cancelled"
| timechart count span=1D
How can I sort them and get rid of the count from LAST_MODIFIED_DATE and have them shown by ACTUAL_START_DATE?
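The timechart command is ordering by _time, not by LAST_MODIFIED_DATE (although the two fields may have the same values). To use a different field, assign that field's value to _time. Because stats only passes on the fields it names, ACTUAL_START_DATE also has to be carried through the stats command, otherwise the eval has nothing to parse.
stats latest(STATUS_REASON) as STATUS_REASON latest(CHANGE_REQUEST_STATUS) as CHANGE_REQUEST_STATUS latest(_time) as _time latest(CHANGE_TIMING) as CHANGE_TIMING latest(ACTUAL_START_DATE) as ACTUAL_START_DATE by INFRASTRUCTURE_CHANGE_ID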
| where CHANGE_REQUEST_STATUS !="Cancelled"
| eval _time = strptime(ACTUAL_START_DATE, "%Y-%m-%d %H:%M:%S.%N")
| timechart count span=1D
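The effect of reassigning _time can be sketched in Python: parse ACTUAL_START_DATE, drop cancelled tickets, and count per calendar day of the parsed date. The function name tickets_per_start_day and the sample tickets are hypothetical; only the field names and the timestamp layout come from the question.

```python
from collections import Counter
from datetime import datetime


def tickets_per_start_day(tickets):
    """tickets: list of dicts with ACTUAL_START_DATE and CHANGE_REQUEST_STATUS."""
    counts = Counter()
    for ticket in tickets:
        if ticket["CHANGE_REQUEST_STATUS"] == "Cancelled":
            continue  # where CHANGE_REQUEST_STATUS != "Cancelled"
        # Python's %f plays the role of the fractional-seconds part of the format.
        start = datetime.strptime(ticket["ACTUAL_START_DATE"], "%Y-%m-%d %H:%M:%S.%f")
        counts[start.date().isoformat()] += 1  # timechart count span=1D
    return dict(sorted(counts.items()))


tickets = [
    {"ACTUAL_START_DATE": "2018-10-29 00:00:00.0", "CHANGE_REQUEST_STATUS": "Closed"},
    {"ACTUAL_START_DATE": "2018-10-29 13:30:00.0", "CHANGE_REQUEST_STATUS": "Closed"},
    {"ACTUAL_START_DATE": "2018-11-02 08:00:00.0", "CHANGE_REQUEST_STATUS": "Cancelled"},
    {"ACTUAL_START_DATE": "2018-11-05 09:15:00.0", "CHANGE_REQUEST_STATUS": "Completed"},
]
per_day = tickets_per_start_day(tickets)
```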

Add calculated threshold line on splunk timechart

I have a simple chart which shows the bottom 5 servers by number of request per minute. I'm looking to add a calculated threshold overlay line that is the average number of requests across all servers minus one standard deviation. I have been searching for hours but I have not been able to find anything.
Current Search Query:
sourcetype=x source=y host="server*" ENTERING | timechart useother=f
span=1m count by host WHERE count in bottom5
I essentially want something like the below (which doesn't work of course):
sourcetype=x source=y host="server*" ENTERING | timechart useother=f
span=1m count by host WHERE count in bottom5 | eval
threshold=(avg(countByHost) - stdev(countByHost))
Try this
sourcetype=x source=y host="server*" ENTERING
| timechart useother=f span=1m avg(count) as avgByHost, stdev(count) as stdevByHost, count by host WHERE count in bottom5
| eval threshold=avgByHost-stdevByHost
| fields - threshold, count
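For reference, the arithmetic the question describes (the average number of requests across all servers minus one standard deviation, per minute) can be sketched in Python. The function name threshold_by_minute, the nested-dict layout, and the sample counts are hypothetical. The sketch uses the sample standard deviation and assumes the per-host counts for each minute are already available.

```python
from statistics import mean, stdev


def threshold_by_minute(counts):
    """counts: {minute: {host: requests}} (hypothetical layout)."""
    thresholds = {}
    for minute, by_host in counts.items():
        values = list(by_host.values())
        # stdev needs at least two data points; treat a single host as no spread.
        spread = stdev(values) if len(values) > 1 else 0.0
        thresholds[minute] = mean(values) - spread
    return thresholds


counts = {
    "12:00": {"server1": 2, "server2": 4, "server3": 6},
    "12:01": {"server1": 5},
}
thresholds = threshold_by_minute(counts)
```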