How to find the time duration between two Splunk events which share a unique key

First Event
17:09:05:362 INFO com.a.b.App - Making a GET Request and req-id: [123456]
Second Event
17:09:06:480 INFO com.a.b.App - Output Status Code: 200 req-id:"123456"
I tried to use index="xyz" container="service-name" | transaction "req-id" startswith="Making a GET Request" endswith="Output Status Code" | table duration, but it is not working.
I want to calculate the duration between these two events for every request. I went over some solutions on Splunk and Stack Overflow, but still can't get the proper result.

Try doing it with stats instead:
index=ndx sourcetype=srctp
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| rex field=_raw "(?<sequence>Making a GET Request)"
| rex field=_raw "(?<sequence>Output Status Code)"
| eval sequence=sequence.";"._time
| stats values(sequence) as sequence by req_id
| mvexpand sequence
| rex field=sequence "(?<sequence>[^;]+);(?<time>\d+)"
| eval time=strftime(time,"%c")
This will extract the "req-id" into a field named req_id, and the start and end of the sequence into a field named sequence.
Presuming the sample data you shared is correct, when you stats values(sequence) as sequence, it will put the "Making..." entry first and the "Output..." entry second.
Because values() orders them this way, when you mvexpand and then split the multivalued field back into sequence and time, they'll be in the proper order.
If the sample data is incomplete, you may need to tweak the regexes that populate sequence.
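If what you ultimately need is the elapsed time per request rather than formatted timestamps, a minimal sketch of one way to finish this off (assuming each req_id has exactly one "Making..." and one "Output..." event) is to take the spread of the extracted epoch values instead of formatting them:
index=ndx sourcetype=srctp
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| rex field=_raw "(?<sequence>Making a GET Request|Output Status Code)"
| eval sequence=sequence.";"._time
| stats values(sequence) as sequence by req_id
| mvexpand sequence
| rex field=sequence "(?<sequence>[^;]+);(?<time>\d+)"
| stats range(time) as duration by req_id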

It seems you’re going with my previously suggested approach 😉
Now you have two possibilities:
1. SPL
Below is the simplest query, invoking only one rex and assuming the _time field is correctly populated:
index=<your_index> source=<your_source>
("*Making a GET Request*" OR "*Output Status Code*")
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| stats max(_time) as end, min(_time) as start by req_id
| eval duration = end - start
| table req_id duration
Note that depending on the amount of data to scan, this one can be resource-consuming for your Splunk cluster.
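If you prefer a human-readable value, a small hedged addition is to format the computed seconds before the final table:
| eval duration = tostring(duration, "duration")
tostring with the "duration" format renders the difference as HH:MM:SS instead of raw seconds.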
2. Log the response time directly in the API (more efficient)
It seems you are working on an API. You should be able to measure the response time of each call and write it directly into your log.
Then you can exploit it easily in SPL without any calculation.
It is always preferable to persist data at index time rather than perform systematic calculations at search time.
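For example, if each "Output Status Code" line also carried a response time written by the service itself (response_time_ms here is a hypothetical field name, not something in your current logs), the search would need no pairing logic at all, something like:
index=<your_index> source=<your_source> "Output Status Code"
| rex field=_raw "response_time_ms=(?<response_time_ms>\d+)"
| stats avg(response_time_ms) as avg_ms, max(response_time_ms) as max_ms, count as calls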

Related

Splunk Concurrency Calculation

I have some data from logs in Splunk where I need to determine what other requests were running concurrently at the time of any single event.
Using the following query, I was able to have it return a column for the number of requests that ran at the same time within my start time and duration.
index="sfdc" source="sfdc_event_log://EventLog_SFDC_Production_eventlog_hourly" EVENT_TYPE IN (API, RestAPI) RUN_TIME>20000
| eval endTime=_time
| eval permitTimeInSecs=(RUN_TIME-20000)/1000
| eval permitAcquiredTime=endTime-permitTimeInSecs
| eval dbTotalTime=DB_TOTAL_TIME/1000000
| concurrency start=permitAcquiredTime duration=permitTimeInSecs
| table _time API_TYPE EVENT_TYPE ENTITY_NAME apimethod concurrency permitAcquiredTime permitTimeInSecs RUN_TIME CPU_TIME dbTotalTime REQUEST_ID USER_ID
| fieldformat dbTotalTime=round(dbTotalTime,0)
| rename permitAcquiredTime as "Start Time", permitTimeInSecs as "Concurrency Duration", concurrency as "Concurrent Running Events", API_TYPE as "API Type", EVENT_TYPE as "Event Type", ENTITY_NAME as "Entity Name", apimethod as "API Method", RUN_TIME as "Run Time", CPU_TIME as "CPU Time", dbTotalTime as "DB Total Time", REQUEST_ID as "Request ID", USER_ID as "User ID"
| sort "Concurrent Running Events" desc
I am now trying to investigate a single event in these results. For example, the top event says that at the time it ran, there were 108 concurrent requests running in the 20-second window.
How can I identify those 108 events using this data?
I imagine it would mean querying the events that fall within a specific time range, but I am not sure if I need to check something like _time ± 10 seconds to see what was running within the 20-second window.
I just need to understand the data behind these 108 events a little more for this top example. My end goal here is to be able to add a drill-down to the dashboard so that when I click on the 108, I can see those events that were running.
Essentially, you are on the right lines. What you want to do is create a search (presumably on the original data) using earliest=<beginning of 20-second window> latest=<end of 20-second window>, using your calculated values.
You have start time and can calculate end time. Then pipe these as variables into a new search.
| search earliest=start_time latest=end_time index="sfdc" etc..
I can't check this here right now, but it's probably something along those lines. There are quite likely more elegant ways to do the same. Hope I'm not wildly off the mark and this at least helps a little.
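If you want to stay inside SPL rather than a dashboard drill-down, one hedged sketch (reusing the calculated fields above; the REQUEST_ID filter is just a placeholder for the single event you are investigating) is to let the map command re-run a search per result row, substituting that row's start and end times:
index="sfdc" source="sfdc_event_log://EventLog_SFDC_Production_eventlog_hourly" EVENT_TYPE IN (API, RestAPI) RUN_TIME>20000 REQUEST_ID="<the request you clicked>"
| eval endTime=_time
| eval permitAcquiredTime=endTime-((RUN_TIME-20000)/1000)
| map search="search index=sfdc (EVENT_TYPE=API OR EVENT_TYPE=RestAPI) earliest=$permitAcquiredTime$ latest=$endTime$" maxsearches=1
This lists the events recorded within that 20-second window, which you can then table or count as needed.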

Splunk searching event logs to find values exceeding a given threshold

I want to search the log event
"Closure request counts: startAssets: "
and find occurrences where the startAssets are larger than 50.
How would I do that?
Something like:
Closure request counts: startAssets: 51
would maybe give a search similar to
"Closure request counts: startAssets: {num} AND num >=50"
perhaps?
What does that look like in SPL?
That's pretty simple, but you'll need to extract the number to do it. I like to use the rex command to do that, but there may be other ways.
index=foo "Closure request counts: startAssets: *"
| rex "startAssets: (?<startAssets>\d+)"
| where startAssets > 50
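One thing to keep in mind: rex extracts startAssets as a string, so if the comparison ever behaves oddly, a hedged safeguard is to force a numeric comparison:
| where tonumber(startAssets) > 50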

In Amazon Cloudwatch Insights, how do you take a statistic of a statistic?

I am using AWS Cloudwatch Insights and running a query like this:
fields @message, @timestamp
| filter strcontains(@message, "Something of interest happened")
| stats count() as interestCount by bin(10m) as tenMinuteTime
| stats max(interestCount) by datefloor(tenMinuteTime, 1d)
However, on the last line, I get the following error:
mismatched input 'stats' expecting {K_PARSE, K_SEARCH, K_FIELDS, K_DISPLAY, K_FILTER, K_SORT, K_ORDER, K_HEAD, K_LIMIT, K_TAIL}
It would seem to mean from this that I cannot take multiple layers of stat queries in Insights, and thus cannot take a statistic of a statistic. Is there a way around this?
You cannot currently use multiple stats commands, and from what I know there is no direct way around that at this time. You can, however, thicken up your single stats command and separate the aggregations by commas, like so:
fields @message, @timestamp
| filter strcontains(@message, "Something of interest happened")
| stats count() as interestCount,
max(interestCount) as maxInterest,
interestCount by bin(10m) as tenMinuteTime
You define the aggregations after stats, separated by commas, and then process those result fields.

Filtering duplicate entries from Splunk events

I am new to Splunk and have some events as below:
2019-06-26 23:45:36 INFO ID 123456 | Response Code 404
2019-06-26 23:55:36 INFO ID 123456 | Response Code 404
2019-06-26 23:23:36 INFO ID 258080 | Response Code 404
Is there way to filter out the first two events as they have the same ID 123456 and view them as one event?
I tried something which I know is completely wrong; suggestions would be very useful.
index=myindex "Response Code 404" | rex field=ID max_match=2 "(?<MyID>\b(?:123456)\b)" | stats count by ID MyID | where count > 1
That's not completely wrong. It's one of the legitimate ways to remove duplicates. Here's another:
index=myindex "Response Code 404"
| rex field=ID max_match=2 "(?<MyID>\b(?:123456)\b)"
| dedup MyID
Using dedup is often preferred because it doesn't remove fields the way stats does.
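If you want this to collapse duplicates for any ID rather than just 123456, a minimal sketch (assuming the numeric ID always follows the literal "ID" in the message) is:
index=myindex "Response Code 404"
| rex "ID\s+(?<MyID>\d+)"
| dedup MyID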
I know it's a late reply, but those really aren't duplicate events if the timestamps are different. I'd be more concerned about finding out what machine is sending an event twice at different times (and why) than about eliminating the results. Keep in mind that each event counts against your license, and while it may seem small, enough of them add up to gigabytes.

Search with original text that was replaced earlier

I am gathering performance metrics for each API that we have. With the below query I get results like:
method response_time
Create Billing 2343.2323
index="dev-uw2" logger_name="*Aspect*" message="*ApiImpl*" | rex field=message "PerformanceMetrics - method='(?<method>.*)' execution_time=(?<response_time>.*)" | table method, response_time | replace "public com.xyz.services.billingservice.model.Billing com.xyz.services.billingservice.api.BillingApiImpl.createBilling(java.lang.String)” WITH "Create Billing” IN method
If the user clicks the API text in a table cell to drill down further, it will open a new search with "Create Billing", which obviously gives zero results since we don't have any log with that string.
I want splunk to search with original text that was replaced earlier.
You can use click.value to get around this.
http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Viz/tokens