Query for calculating duration between two different logs in Splunk

As part of my requirements, I have to calculate the duration between two different logs using a Splunk query.
For example:
Log 2:
2020-04-22 13:12 ADD request received ID : 123
Log 1:
2020-04-22 12:12 REMOVE request received ID : 122
The common string between the two logs is " request received ID :" and the unique strings are "ADD" and "REMOVE". The expected output duration here is 1 hour.
Any help would be appreciated. Thanks

You can use the transaction command, https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction
Assuming you have the field ID extracted, you can do
index=* | transaction ID
This will automatically produce a field called duration, which is the time between the first and last event with the same ID
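For example, a minimal sketch (assuming the paired events share the same ID value and that the ID field is already extracted):
index=* "request received ID :"
| transaction ID
| table ID duration
For two events an hour apart, duration would come back as 3600 - transaction reports it in seconds.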

While transaction will work, it's very inefficient
This stats should show you what you're looking for (presuming the fields are already extracted):
(index=ndxA OR index=ndxB) ID=* ("ADD" OR "REMOVE")
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed=strftime(when_removed,"%c")
If you don't already have the fields extracted, you'll need to modify thusly (remove the "$" at the end of the regex if the ID value isn't at the end of the line):
(index=ndxA OR index=ndxB) ("ADD" OR "REMOVE")
| rex field=_raw "ID\s+:\s+(?<ID>\d+)$"
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed=strftime(when_removed,"%c")
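Note that neither version above actually computes the duration the question asks for; subtracting the two epoch values before they are formatted gets you there, and tostring's "duration" option renders the difference readably:
(index=ndxA OR index=ndxB) ID=* ("ADD" OR "REMOVE")
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval duration=tostring(when_removed-when_added,"duration")
| eval when_added=strftime(when_added,"%c"), when_removed=strftime(when_removed,"%c")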

Related

Regex count capture group members

I have multiple log messages, each containing a list of JobIds, e.g.:
1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`
I have a rex to get those jobIds. Next I want to count the number of jobIds
My query looks like this -
| rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| stats count(job_ids)
But this will only give me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure if this is a Splunk limitation or if I am missing something in my regex.
Here is my regex - https://regex101.com/r/vqlq5j/1
Also with max_match=0 but with mvcount() instead of mvexpand():
| makeresults count=3 | streamstats count
| eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
``` above is test data setup ```
``` below is the actual query ```
| rex field=message max_match=0 "\"(?<id>[\w\d]+-[\w\d]+-[\w\d]+-[\w\d]+)\""
| eval cnt=mvcount(id)
| stats sum(cnt)
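Using mvcount() this way avoids expanding each multivalue field into separate rows, which can make it noticeably cheaper than the mvexpand approach on large result sets.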
In Splunk, to capture multiple matches from a single event, you need to add max_match=0 to your rex, per the Splunk docs.
But to get them separated into single values from the [potential] multivalue field job_ids that you made, you need mvexpand or similar.
So this should get you closer:
| rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| mvexpand job_id
| stats dc(job_id)
I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many in total you've seen
Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically
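For reference, a sketch of that JSON route (assuming the array still lives in a field called message, as in the question) using spath to pull the values directly:
| spath input=message path=JobIds{} output=job_id
| stats dc(job_id)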
Do you have a full set of sample data (a few entire events) you can share?

Extracting a count from raw splunk data by id

I am trying to get a count from transactional information that is retained within raw data in Splunk. I have 3-5 transactions that occur.
One has raw data stating: pin match for id 12345678-1234-1234-abcd-12345678abcd or pin mismatched for id etc.
I'm trying to count the number of times the pin match occurs within the transaction time window of 180 seconds.
I was trying to do something like:
|eval raw=_raw |search index=transa
|eval pinc= if((raw like "%pin match%"),1,0) |stats count(pinc) as Pincount by ID
The issue I'm having is that it counts cumulatively over whatever time range I'm searching. Is there a way to attach the count to the ID that is within the message, or to count every match that occurs within that 180-second window?
Thanks!
Presuming the pin status and ID have not been extracted:
index=ndx sourcetype=srctp "pin" "match" OR "mismatched"
| rex field=_raw "pin (?<pin_status>\w+)"
| rex field=_raw "id (?<id>\S+)"
| eval status_time=pin_status."|"._time
| stats earliest(status_time) as beginning latest(status_time) as ending by id
| eval beginning=split(beginning,"|"), ending=split(ending,"|")
| eval beginning=mvindex(beginning,-1), ending=mvindex(ending,-1)
| table id beginning ending
| sort 0 id
| eval beginning=strftime(beginning,"%c"), ending=strftime(ending,"%c")
After extracting the status ("match" or "mismatched") and the id, append the individual event's _time to the end of the status - we'll pull that value back out after statsing
Using stats, find the earliest and latest status_time entries (fields just created on the previous line) by id, saving them into new fields beginning and ending
Next, split() beginning and ending on the pipe we added to separate the status from the timestamp into a multivalue field
Then assign the last item of each multivalue field (which we know is the timestamp) back into the field itself; the status part has served its purpose, since the earliest status_time entry should always be the "match" and the latest should always be the "mismatched"
Lastly, table the id and time stamps, sort by id, and format the timestamp into something human readable (strftime takes many arguments, %c just happens to be quick)
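If all you actually need is the number of "pin match" events per id inside fixed 180-second windows, a simpler sketch with bin (same assumed index and sourcetype as above) may suffice:
index=ndx sourcetype=srctp "pin match"
| rex field=_raw "id (?<id>\S+)"
| bin _time span=180s
| stats count as Pincount by id _time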

SPLUNK use result from first search in second search

Say I have a query such as
index="example" source="example.log" host="example" "ERROR 1234"
| stats distinct_count by id
This will give me all the events with that error code per id.
I then want to combine this query to search the same log file for another string but only on the unique id's returned from the first search. Because the new string will appear on a separate event I can't just do an 'AND'.
There are a few ways to do that, including using subsearches, join, or append, but those require multiple passes through the data. Here is a way that makes a single pass through the index.
index=example source="example.log" ("ERROR 1234" OR "ERROR 5678")
``` Check for the presence of each string in the event ```
| eval string1=if(searchmatch("ERROR 1234"), 1, 0)
| eval string2=if(searchmatch("ERROR 5678"), 1, 0)
``` Count string occurrences by id ```
| stats sum(string1) as string1, sum(string2) as string2 by id
``` Keep only the ids that have both strings ```
| where (string1 > 0 AND string2 > 0)
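From there, you can | table id to list the qualifying ids, or keep piping into further stats as needed.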
You can search for "some other string" in subsearch and then join the queries on the id:
index="example" source="example.log" host="example" "ERROR 1234"
| join id [search index="example" source="example.log" host="example" "some other string" ]
| stats distinct_count by id
Presuming your id field is the same and available in both indices, this form should work:
(index=ndxA sourcetype=srctpA id=* source=example.log host=example "ERROR 1234") OR (index=ndxB sourcetype=srctpB id=* "some other string")
| rex field=_raw "(?<first_string>ERROR 1234)"
| rex field=_raw "(?<second_string>some other string)"
| fillnull value="-" first_string second_string
| stats count by id first_string second_string
| search NOT (first_string="-" OR second_string="-")
If your id field has a different name in the other index, do a rename like this before the stats line:
| rename otherIdFieldName as id
Advantages of this format:
you are not limited by subsearch constraints (the search must finish in 60 seconds and return no more than 50k rows)
the search peers (i.e. the indexers) handle all of the overhead, instead of the Search Head that initiated the search having to do lots of post-processing (all the SH does is dispatch the distributed search and then apply a post-stats filter to ensure both first_string and second_string have the values you are looking for)
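As a concrete illustration (txnId is a made-up field name here), if the second index calls its id field txnId, the rename slots in right after the base search:
(index=ndxA sourcetype=srctpA id=* source=example.log host=example "ERROR 1234") OR (index=ndxB sourcetype=srctpB txnId=* "some other string")
| rename txnId as id
| rex field=_raw "(?<first_string>ERROR 1234)"
| rex field=_raw "(?<second_string>some other string)"
| fillnull value="-" first_string second_string
| stats count by id first_string second_string
| search NOT (first_string="-" OR second_string="-")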

Splunk search - How to loop on multi values field

My use case is analysing ticket in order to attribute a state regarding all the status of a specific ticket.
Raw data looks like this:
| Id | Version | Status | Event Time |
|------|---------|------------------------------|----------------------|
| 0001 | 1 | New | 2021-01-07T09:14:00Z |
| 0001 | 1 | Completed - Action Performed | 2021-01-07T09:38:00Z |
Data looks like this after the transaction command:
| Id | Version | Status | Event Time | state |
|------------|---------|------------------------------------|----------------------------------------------|-----------------------|
| 0001, 0001 | 1, 1 | New, Completed - Action Performed | 2021-01-07T09:14:00Z, 2021-01-07T09:38:00Z | Acknowlegdement, Work |
I'm using the transaction command in order to calculate the duration of acknowledgement and resolution of the ticket.
I have predefined rules to choose the correct state. These rules compare the previous (n-1) status (New) and the current status (Completed - Action Performed) to choose the state.
Issue
Each ticket has a different number of statuses, and we cannot know the maximum number in advance, so I cannot write a static search comparing each value of the Status field.
Expected Solution
I have a field that tells me the number of entries in the Status field (the number of statuses of a ticket).
I want to use a loop (why not a for loop?) to iterate over each index of the Status field and compare value i-1 with value i.
I cannot find how to do this. Is it possible?
Thank you
Update to reflect more details
Here's a method with streamstats that should get you towards an answer - it hashes each Version/Status pair, uses streamstats to carry the previous event's hash forward per Id, and keeps only the events where the hash changed, i.e. the status transitions:
index=ndx sourcetype=srctp Id=* Version=* Status=* EventTime=* state=*
| eval phash=sha256(Version.Status)
| sort 0 _time
| streamstats current=f last(phash) as chash by Id state
| fillnull value="noprev"
| eval changed=if(chash!=phash OR chash="noprev","true","false")
| search NOT changed="false"
| table *
original answer
Something like the following should work to get the most-recent status:
index=ndx sourcetype=srctp Id=* Version=* Status=* EventTime=* state=*
| stats latest(Status) as Status latest(Version) as Version latest(state) as state latest(EventTime) as "Event Time" by Id
edit in light of the mention of the transaction command
Don't use transaction unless you really really really need to.
99% of the time, stats will accomplish what transaction does faster and more efficiently.
For example:
index=ndx sourcetype=srctp Id=* Version=* Status=* EventTime=* state=*
| stats earliest(Status) as eStatus latest(Status) as lStatus earliest(Version) as eVersion latest(Version) as lVersion earliest(state) as estate latest(state) as lstate earliest(EventTime) as Opened latest(EventTime) as MostRecent by Id
This will yield a table you can then manipulate further with eval and such, e.g. (presuming the time format is still subtractable, i.e. in Unix epoch format):
| eval ticketAge=MostRecent-Opened
| eval Versions=eVersion." - ".lVersion
| eval Statuses=eStatus." - ".lStatus
| eval State=estate.", ".lstate
| eval Opened=strftime(Opened,"%c"), MostRecent=strftime(MostRecent,"%c")
| eval D=if(ticketAge>86400,floor(ticketAge/86400),0)
| eval ticketAge=if(D>0,ticketAge-(D*86400),ticketAge)
| eval H=if(ticketAge>3600,floor(ticketAge/3600),0)
| eval ticketAge=if(H>0,ticketAge-(H*3600),ticketAge)
| eval M=if(ticketAge>60,floor(ticketAge/60),0)
| eval ticketAge=if(M>0,ticketAge-(M*60),ticketAge)
| rename ticketAge as S
| eval Age=tostring(D)." days ".tostring(H)." hours ".tostring(M)." minutes ".tostring(S)." seconds"
| table Id Versions Statuses Opened MostRecent State Age
| rename MostRecent as "Most Recent"
Note: I may have gotten the conversion from raw seconds into days, hours, minutes, seconds off - but it should be close
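As an aside, eval's tostring(X,"duration") renders a number of seconds as a readable [days+]HH:MM:SS value directly, so the whole D/H/M/S block above could be collapsed to:
| eval Age=tostring(ticketAge,"duration")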

Splunk - Get Predefined Outputs Based on the event count and event data

I have a query as below. The expected result statuses are predefined as follows -
If the query result has 3 events and the 3rd event has event="delivered" as its value, then the whole transaction needs to be returned as "COMPLETE".
If the 3rd event is present and event!="delivered", then the status becomes "PENDING".
If the 3rd event is not present at all, then the transaction is marked as "ERROR".
My Query -
index=myindex OR index=myindex2 uuid=98as786-ffe6-4de1-929y-080e99bc2e6r (status="202") OR (TransactionStatus="PUBLISHED")
| append
    [search index=myindex2 (logMessage="Producer created new event") event="delivered" OR event="processed" serviceName="abc"
        [search index=myindex uuid=98as786-ffe6-4de1-929y-080e99bc2e6r AND status="SUCCESS" AND serviceName="abc"
        | top limit=1 headerId
        | fields + headerId
        | rename headerId as message_id]]
Result events -
Event 1 - 202 Accepted
Event 2 - Adapter Success
Event 3 - delivered or error or processed
My high-level dashboard should look like this -
Complete - 6378638
Pending - 2173
Error - 6356
The unique ID will be the UUID, on which the count is to be performed.
What would be a possible way to do this - eval? A lookup? I'm not sure, as I am new to Splunk.
Please let me know if more information is needed or if I am missing something.
See if this helps. The terminology in your question is a little inconsistent so you may need to adjust the field names in this query.
index=myindex OR index=myindex2 uuid=98as786-ffe6-4de1-929y-080e99bc2e6r ((status="202") OR (TransactionStatus="PUBLISHED"))
    OR (index=myindex2 (logMessage="Producer created new event") event="delivered" OR event="processed" serviceName="abc")
    (index=myindex uuid=98as786-ffe6-4de1-929y-080e99bc2e6r AND status="SUCCESS" AND serviceName="abc")
| stats count, latest(event) as event by headerId
| eval result=case(count=3 AND event="delivered", "COMPLETE", count=3 AND event!="delivered", "PENDING", count!=3, "ERROR", 1=1, "UNKNOWN")
| stats count by result
| table result count