Sorting problem regarding LAST_MODIFIED_DATE in Splunk query

I have a problem with sorting in Splunk.
I want to build automated reports that show, on a calendar, the number of tickets per day.
A ticket has these timestamps:
ACTUAL_END_DATE="2018-10-29 01:00:00.0",
ACTUAL_START_DATE="2018-10-29 00:00:00.0",
CLOSED_DATE="2019-06-16 12:56:00.0",
COMPLETED_DATE="2019-06-06 10:47:46.0",
EARLIEST_START_DATE="2018-10-23 11:20:42.0",
LAST_MODIFIED_DATE="2019-06-16 12:56:07.0",
RFA_DATE="2018-10-23 11:20:42.0",
RFC_DATE="2018-10-22 15:19:00.0",
SFA_DATE="2019-06-06 10:47:02.0",
SFR_DATE="2019-06-06 10:46:52.0",
SCHEDULED_DATE="2019-06-06 10:47:06.0",
SCHEDULED_END_DATE="2018-10-29 01:00:00.0",
SCHEDULED_START_DATE="2018-10-29 00:00:00.0",
SUBMIT_DATE="2018-10-22 15:18:53.0",
I select the time range with two tokens: the earliest is "#mon" and the latest is "now".
Unfortunately, the results are counted by LAST_MODIFIED_DATE, so I get 62 tickets in one day even though their ACTUAL_START_DATE values fall in different months (you can change a ticket after it is closed to add details).
This is my query:
stats latest(STATUS_REASON) as STATUS_REASON latest(CHANGE_REQUEST_STATUS) as CHANGE_REQUEST_STATUS latest(_time) as _time latest(CHANGE_TIMING) as CHANGE_TIMING by INFRASTRUCTURE_CHANGE_ID
| where CHANGE_REQUEST_STATUS !="Cancelled"
| timechart count span=1D
How can I get rid of the count based on LAST_MODIFIED_DATE and have the tickets shown by ACTUAL_START_DATE instead?

The timechart command orders by _time, not by LAST_MODIFIED_DATE (although the two fields may hold the same values). To bucket by a different field, assign that field's value to _time.
stats latest(STATUS_REASON) as STATUS_REASON latest(CHANGE_REQUEST_STATUS) as CHANGE_REQUEST_STATUS latest(_time) as _time latest(CHANGE_TIMING) as CHANGE_TIMING by INFRASTRUCTURE_CHANGE_ID
| where CHANGE_REQUEST_STATUS !="Cancelled"
| eval _time = strptime(ACTUAL_START_DATE, "%Y-%m-%d %H:%M:%S.%N")
| timechart count span=1D
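If the conversion ever looks off, the format string can be sanity-checked against one of the sample values with a throwaway search. A minimal sketch (the date literal is copied from the question; check should come back as the same timestamp):
| makeresults
| eval ACTUAL_START_DATE="2018-10-29 00:00:00.0"
| eval _time=strptime(ACTUAL_START_DATE, "%Y-%m-%d %H:%M:%S.%N")
| eval check=strftime(_time, "%Y-%m-%d %H:%M:%S")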

Related

Extracting a count from raw splunk data by id

I am trying to get a count from transactional information retained within raw data in Splunk. I have 3-5 transactions that occur.
One has raw data stating: pin match for id 12345678-1234-1234-abcd-12345678abcd, or pin mismatched for id, etc.
I'm trying to count the number of times pin match occurs within the transaction time window of 180 seconds.
I was trying to do something like:
index=transa
| eval raw=_raw
| eval pinc=if((raw like "%pin match%"),1,0)
| stats count(pinc) as Pincount by ID
The issue I'm having is that it counts cumulatively over whatever time range I'm looking at. Is there a way to attach the count to the ID within the message, or to have it count every match that occurs within that time window?
Thanks!
Presuming the pin status and ID have not been extracted:
index=ndx sourcetype=srctp "pin" "match" OR "mismatched"
| rex field=_raw "pin (?<pin_status>\w+)"
| rex field=_raw "id (?<id>\S+)"
| eval status_time=pin_status+"|"+_time
| stats earliest(status_time) as beginning latest(status_time) as ending by id
| eval beginning=split(beginning,"|"), ending=split(ending,"|")
| eval beginning=mvindex(beginning,-1), ending=mvindex(ending,-1)
| table id beginning ending
| sort 0 id
| eval beginning=strftime(beginning,"%c"), ending=strftime(ending,"%c")
After extracting the status ("match" or "mismatched") and the id, append the individual event's _time to the end of the status; we'll pull that value back out after the stats runs.
Using stats, find the earliest and latest status_time entries (the field just created on the previous line) by id, saving them into the new fields beginning and ending.
Next, split() beginning and ending on the pipe we added, turning each into a multivalue field that separates the status from the timestamp.
Then assign the last item of each multivalue field (which we know is the timestamp) back into the field itself; the round trip is shown in isolation in the sketch below.
Lastly, table the id and timestamps, sort by id, and format the timestamps into something human-readable (strftime accepts many formats; %c just happens to be quick).
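Here is the pack/split/mvindex round trip on its own, with a hardcoded status_time (the epoch value is made up for illustration):
| makeresults
| eval status_time="match|1559904466.000000"
| eval parts=split(status_time, "|")
| eval status=mvindex(parts, 0), ts=tonumber(mvindex(parts, -1))
| eval readable=strftime(ts, "%c")
The same trick generalizes to any case where stats needs to carry a second field alongside an earliest/latest value.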

Filter a result set to include only the top 99.9% of values in Splunk, preferably without a subquery

I am querying the access logs to a service. I want to build a scatter plot where the X axis is the total number of requests in that hour, and the Y axis is how many times a particular request (category) was made in that hour. To do this, my output needs to be:
requestDetails, per_hour, count
Easy enough. I use a query like:
<base query setting requestDetails>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The challenge I am running into is that very infrequent items just add clutter to the chart. I don't care about every request, just the common ones. So I'd like to filter so that only the requestDetails that make up 99.9% of traffic are included.
In theory, I could do this with a sub-query:
<base query>
[ search <base query>
| stats count by requestDetails
| sort -count
| eventstats sum(count) as total
| eval percent=count/total*100
| accum percent as percentile
| where percentile <= 99.9
| fields requestDetails
]
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| table requestDetails, per_hour, count
The problem is, it's expensive to do it this way, since I have to extract the data twice. And it doesn't give me exactly what I want: because it is a pre-filter, the per_hour value is missing the contribution of the filtered rows.
It seems to me I should be able to accomplish this in a single pass through the data using eventstats or streamstats. But I'm drawing a blank on how to emulate the accum command, because there are many rows for a given requestDetails and I need to count each requestDetails only once.
Is there a way to do what I'm trying to do as a post-filter, without using a subquery? If so, what would it look like?
Indeed, it is possible to do this using a single pass through the data:
<base query>
| bucket _time span=1h
| stats count by _time, requestDetails
| eventstats sum(count) as per_hour by _time
| eventstats sum(count) as per_request by requestDetails
| eventstats sum(count) as total
| sort -per_request, +requestDetails
| streamstats sum(count) as count_til_now
| eval percentile = count_til_now / total * 100
| eventstats max(percentile) as max_percentile by requestDetails
| where max_percentile <= 99.9
| table requestDetails, per_hour, count
The key is the sort. We first organize the rows from most common requestDetails to least common. The second sort term avoids overlap in the event of a tie.
We also have to take some eventstats along the way: the total number of times each requestDetails was seen in the whole dataset, and the total number of requests.
The fun part is using streamstats to create a running total, which we then use to calculate the percentile.
Then, to build the filter, we use another eventstats to find the maximum percentile reached by each requestDetails.
All that is left is to filter where the max_percentile is below the threshold, and output the data needed for the visualization.
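The running-total-to-percentile step can be seen on its own with toy data (field names kept from the query above; the counts are fabricated and arrive pre-sorted in descending order):
| makeresults count=5
| streamstats count as n
| eval requestDetails="req_".n, count=110-(n*10)
| eventstats sum(count) as total
| streamstats sum(count) as count_til_now
| eval percentile=round(count_til_now/total*100, 1)
With counts of 100, 90, 80, 70, and 60 the running percentiles come out to 25, 47.5, 67.5, 85, and 100, which is exactly the accum behavior being emulated.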

Using dedup to find unique hosts. How can I find an average for the selected time frame?

The goal is to provide percent availability. I would like to check every 15 minutes whether the unique host count for server1, server2, and server3 equals 3 for each interval (indicating the system is fully healthy). From that count I want to compute the average over whatever time period is selected in Splunk, and convert it to a percentage.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| dedup host
| timechart span=30m count
The count should be 3 for each interval.
It's not clear how much of your requirements the example SPL solves, so I'll assume it does nothing.
Having dedup followed by timechart means the timechart command will only see 3 events, one for each host. That doesn't make for a helpful chart. I suggest using dc(host) instead to get a distinct count of hosts for each interval.
The appendpipe command can be used to add average and percentage values on the end.
index="os" sourcetype=ps host="server1" OR host="server2" OR host="server3"
| search "/logs/temp/random/path" OR "application_listener"
| timechart span=30m dc(host) as count
| appendpipe [ stats avg(count) as Avg | eval Pct=round(Avg*100/3,2) ]
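To see what appendpipe contributes, here is a toy run on fabricated interval counts (the slot field and its values are invented; 3 is the expected host count, as above):
| makeresults count=4
| streamstats count as slot
| eval count=if(slot=4, 2, 3)
| appendpipe [ stats avg(count) as Avg | eval Pct=round(Avg*100/3, 2) ]
The subsearch sees the four interval rows (counts 3, 3, 3, 2), so it appends a row with Avg=2.75 and Pct=91.67 without disturbing the rows above it.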

How to count and plot several searches at once?

I am counting the number of hits on my website using splunk. My current search looks for a keywordA as follows:
index=mydata keywordA |bucket _time span=day |stats count by _time
However, I would like to add several other searches to the output, say for other keywords (keywordB for instance):
index=mydata keywordB |bucket _time span=day |stats count by _time
Note: these searches are not necessarily mutually exclusive! So the searches need to be run independently.
I would like to have the total daily count for each search at once, so that I avoid running each search separately.
Output should be:
day keyA keyB
2020-01-01 423 354
2020-01-02 523 254
What is the best way to proceed?
Thanks!
Try this search, which combines your two. Apart from the stats command, it doesn't scale well to many keywords, since the case() call grows with each one.
index=mydata (keywordA OR keywordB)
| bin span=1d _time
| eval keyword = case(match(_raw, "keywordA"), "keywordA", match(_raw, "keywordB"), "keywordB", 1==1, "other")
| stats count by _time, keyword
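To get one column per keyword, matching the desired output table, the same case() extraction can feed timechart directly. A variant sketch of the answer above, not a tested query:
index=mydata (keywordA OR keywordB)
| eval keyword = case(match(_raw, "keywordA"), "keywordA", match(_raw, "keywordB"), "keywordB", 1==1, "other")
| timechart span=1d count by keyword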

Can I use splunk timechart without aggregate function?

https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Timechart
I tried several syntaxes, but none works; they all require an aggregate function.
My goal is to display a line chart representing the value of an event field over time.
It's very simple; I don't need any max/min/sum/count at all.
I need the x-axis to be the time span (the time range I passed in as the query timespan), every event to be a data point in that chart, and the y-axis to be the value of a field I choose, for example fieldA, which is a double-valued field.
How do I write my Splunk query?
search query ... | timechart fieldA?
(You don't have to use timechart; any command that achieves my goal will be accepted.)
Update: let me try to describe what I want using a data-generation example:
| makeresults count=10 | streamstats count AS rowNumber
Let's say the time span is the last 24 hours. Running the above query in Splunk generates 10 records with the same _time field (now) and a rowNumber field with values from 1 to 10. What I want is a visualization whose x-axis runs from now-24hours to now, with no data points for most of the x-axis, but at the last second (the rightmost edge) 10 dots whose y-axis values run from 1 to 10.
You do not need to use an aggregate function with timechart. Just about any stats function will do. See https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Timechart#Stats_function_options.
Depending on the nature of your data and what you want to see in the chart any of timechart max(fieldA), timechart latest(fieldA), timechart earliest(fieldA), or timechart values(fieldA) may work for you.
For example, run this with the time picker set to All time:
| makeresults count=2
| streamstats count
| eval _time=if(count=1,relative_time(_time,"-1d"),_time)
| timechart span=160min count
An event order function such as last() (see https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eventorderfunctions) charts the field's value per bucket, and cont=f keeps only the buckets that actually contain events:
| makeresults count=2
| streamstats count
| eval _time=if(count=1,relative_time(_time,"-1d"),_time)
| timechart cont=f last(count)
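For the makeresults scenario in the update, values() (one of the functions named above) is the closest match. A sketch, assuming the time picker is set to the last 24 hours:
| makeresults count=10
| streamstats count AS rowNumber
| timechart cont=f values(rowNumber)
All ten events share the same _time, so a single bucket at now carries the values 1 through 10, and cont=f suppresses the empty buckets covering the rest of the range.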