how count and plot several searches at once? - splunk

I am counting the number of hits on my website using splunk. My current search looks for a keywordA as follows:
index=mydata keywordA |bucket _time span=day |stats count by _time
However, I would like to add several other searches to the output, say for other keywords (keywordB for instance):
index=mydata keywordB |bucket _time span=day |stats count by _time
Note: these searches are not necessarily mutually exlusive! So the searches need to be run independently.
I would like to have the total daily count for each search at once, so that I avoid running each search separately.
Output should be:
day keyA keyB
2020-01-01 423 354
2020-01-02 523 254
What is the best way to proceed?
Thanks!

Try this search that combines your two. Other than the stats command, it doesn't scale well for many keywords.
index=mydata (keywordA OR keywordB)
| bin span=1d _time
| eval keyword = case(match(_raw, "keywordA"), "keywordA", match(_raw, "keywordB"), "keywordB", 1==1, "other")
| stats count by _time, keyword

Related

Regex count capture group members

I have multiple log messages each containing a list of JobIds -
IE -
1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`
I have a rex to get those jobIds. Next I want to count the number of jobIds
My query looks like this -
| rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| stats count(job_ids)
But this will only give me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure if this is a splunk limitation or I am missing something in my regex.
Here is my regex - https://regex101.com/r/vqlq5j/1
Also with max-match=0 but with mvcount() instead of mvexpand():
| makeresults count=3 | streamstats count
| eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
``` above is test data setup ```
``` below is the actual query ```
| rex field=message max_match=0 "\"(?<id>[\w\d]+\-[\w\d]+\-[\w\d]+\-[\w\d]+\")"
| eval cnt=mvcount(id)
| stats sum(cnt)
In Splunk, to capture multiple matches from a single event, you need to add max_match=0 to your rex, per docs.Splunk
But to get them then separated into a singlevalue field from the [potential] multivalue field job_ids that you made, you need to mvxepand or similar
So this should get you closer:
| rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| mvexpand job_id
| stats dc(job_id)
I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many in total you've seen
Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically
Do you have a full set of sample data (a few entire events) you can share?

Filter a result set to include only the top 99.9% of values in Splunk, preferably without a subquery

I am querying the access logs to a service. I want to build a scatter plot where the X axis is the total number of requests in that hour, and the Y axis is how many times a particular request (category) was made in that hour. To do this, my output needs to be:
requestDetails, per_hour, count
Easy enough. I use a query like:
<base query setting requestDetails>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The challenge I am running in to is, there are very infrequent items that just add clutter to the chart. I don't care about every request, just the common ones. So I'd like to filter it so that only the requestDetails that make up 99.9% of traffic are included.
In theory, I could do this with a sub-query:
<base query>
[ search <base query>
| stats count by requestDetails
| sort -count
| eventstats sum(count) as total
| eval percent=count/total*100
| accum percent as percentile
| where percentile <= 99.9
| field requestDetails
]
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The problem is, it's expensive to do it this way, since I have to extract the data twice. And it doesn't exactly give me what I want, since it is a pre-filter, so the value of per_hour is missing the values for those filtered rows.
It seems to me I should be able to accomplish this with a single-pass through the data using eventstats or streamstats. But I'm drawing a blank on how to emulate the accum command because there are so many rows for a given requestDetails, and what I need is to only count any given requestDetails once.
Is there a way to do what I'm trying to do as a post-filter, without using a subquery? If so, what would it look like?
Indeed, it is possible to do this using a single pass through the data:
<base query>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| eventstats sum(count) as per_request by requestDetails
| eventstats sum(count) as total
| sort -per_request, +requestDetails
| streamstats sum(count) as count_til_now
| eval percentile = count_til_now / total * 100
| eventstats max(percentile) as max_percentile by requestDetails
| where max_percentile <= 99.9
| table requestDetails, per_hour, count
The key is the sort. We first organize them from most common requestDetails to least common. The other sort terms are to avoid overlap in the event of a tie.
We also have to take some eventstats, such as the total number of times a particular requestDetails was seen in the whole dataset, and the total number of requests.
The fun part is to use streamstats to create a running total, and use that to calculate the percentile.
But then in order to build our filter we do another eventstats to calculate what is the maximum percentile of a particular requestDetails.
All that is left is to filter where the max_percentile is below the threshold, and output the data needed for the visualization.

Using dedup to find unique hosts. How can I find an average for the selected time frame?

The goal is to provide percent availability. I would like to check every 15 minutes if the unique count for server1, server2, and server3 is equal to 3 for each interval (indicating the system is fully healthy). From this count I want to check on the average for whatever time period is selected in splunk to output an average and convert to percent.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| dedup host
| timechart span=30m count
The count should be 3 for each interval.
It's not clear how much of your requirements the example SPL solves, so I'll assume it does nothing.
Having dedup followed by timechart means the timechart command will only see 3 events - one for each host. That doesn't make for a helpful chart. I suggest using dc(host), instead to get a count of hosts for each interval.
The appendpipe command can be used to add average and percentage values on the end.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| timechart span=30m dc(host) as count
| appendpipe [ stats avg(count) as Avg | eval Pct=round(Avg*100/3,2) ]

Can I use splunk timechart without aggregate function?

https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Timechart
I tried several syntaxes but none is working. they all require aggregate function.
My goal is to display a line chart, representing the value of an event field over time.
Very simple, I don't need any max/min/sum/count at all.
I need the x-axis to be the time span(time range that I passed in as query timespan), every event will be a data point in that chart, y-axis is the value of a field that I choose, for example, fieldA, which is a double value field.
how to write my splunk query?
search query ...| timechart fieldA?
(you don't have to use timechart, any command that can achieve my goal will be accepted)
update: let me try to describe what I wanted using a data generation example:
| makeresults count=10 | streamstats count AS rowNumber
let's say the time span is last 24 hours, when running above query in splunk, it will generate 10 records data with the same _time field which is #now, and a rowNumber field with values from 1 to 10. what I want to see is a visualization, x-axis starts from (#now-24hours) to #now, and no data points for most of the x-axis, but at last second(the rightmost) I want to see 10 dots, the y-axis values of them is from 1 to 10.
You do not need to use an aggregate function with timechart. Just about any stats function will do. See https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Timechart#Stats_function_options.
Depending on the nature of your data and what you want to see in the chart any of timechart max(fieldA), timechart latest(fieldA), timechart earliest(fieldA), or timechart values(fieldA) may work for you.
| makeresults count=2
| streamstats count
| eval _time=if(count=1,relative_time(_time,"-1d"),_time)
| timechart span=160min count
| streamstats count
| timechart cont=f last(count)
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eventorderfunctions
try with time picker all time
they reduced the number from original results.
It depends on how you use it.

How to set different target values for different days in Splunk?

how can I change the target column in different target values you can see in the picture below.
How is it possible to set different values for the different days?
Result in Splunk with changes
index=*************
| bin _time span=1d
| stats count by _time
| eval target = 1000
Using a lookup would be the best approach, as Rich said.
Alternatively, you could use a case statement, if the list is not going to be too long.
index=************* | bin _time span=1d | stats count by _time | eval target = case(
_time="2019-09-23",500,
_time="2019-09-24",2000,
_time="2019-09-25",1300,
_time="2019-09-26",400,
_time="2019-09-27",1500,
_time="2019-09-29",300 )