Filtering duplicate entries from Splunk events

I am new to Splunk and have some Splunk events like the ones below:
2019-06-26 23:45:36 INFO ID 123456 | Response Code 404
2019-06-26 23:55:36 INFO ID 123456 | Response Code 404
2019-06-26 23:23:36 INFO ID 258080 | Response Code 404
Is there a way to collapse the first two events, since they have the same ID 123456, and view them as one event?
I tried the search below, which I know is completely wrong. Any suggestions would be very useful.
index=myindex "Response Code 404" | rex field=ID max_match=2 "(?<MyID>\b(?:123456)\b)" | stats count by ID MyID | where count > 1

That's not completely wrong. It's one of the legitimate ways to remove duplicates. Here's another:
index=myindex "Response Code 404"
| rex field=ID max_match=2 "(?<MyID>\b(?:123456)\b)"
| dedup MyID
Using dedup is often preferred because it keeps the full events, whereas stats keeps only the fields named in the stats command.
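To see what "keep one event per ID" means on the sample data, here is a minimal sketch in Python rather than SPL, so it can be run outside Splunk. It assumes the ID is the run of digits after the literal "ID " (the pattern and the function name dedup_by_id are illustrative, not from the thread), and it keeps the first event in input order, whereas dedup keeps the first event in search-result order. Note that Python writes named groups as (?P<name>...), while rex uses (?<name>...).

```python
import re

# Sample events copied from the question.
events = [
    "2019-06-26 23:45:36 INFO ID 123456 | Response Code 404",
    "2019-06-26 23:55:36 INFO ID 123456 | Response Code 404",
    "2019-06-26 23:23:36 INFO ID 258080 | Response Code 404",
]

# Assumption: the ID is the run of digits that follows the literal "ID ".
ID_PATTERN = re.compile(r"\bID (?P<MyID>\d+)\b")


def dedup_by_id(lines):
    """Keep the first event seen for each ID, preserving input order."""
    seen = set()
    kept = []
    for line in lines:
        match = ID_PATTERN.search(line)
        if match is None:
            continue  # events without an ID are dropped in this sketch
        my_id = match.group("MyID")
        if my_id not in seen:
            seen.add(my_id)
            kept.append(line)
    return kept


unique_events = dedup_by_id(events)
for line in unique_events:
    print(line)
```

With the three sample events, one event for ID 123456 and one for ID 258080 remain.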

I know it's a late reply, but those really aren't duplicate events if the timestamps are different. I'd be more concerned about finding out what machine is sending an event twice at different times (and why) than about eliminating the results. Keep in mind that each event counts against your license, and while each one may seem small, enough of them add up to a GB.

Related

How to find time duration between two splunk events which has unique key

First Event
17:09:05:362 INFO com.a.b.App - Making a GET Request and req-id: [123456]
Second Event
17:09:06:480 INFO com.a.b.App - Output Status Code: 200 req-id:"123456"
I tried to use index="xyz" container="service-name" | transaction "req-id" startswith="Making a GET Request" endswith="Output Status Code" | table duration but it is also not working.
I want to calculate the duration between the two events above for every request. I went over some solutions on Splunk and Stack Overflow, but still can't get the proper result.
Try doing it with stats instead:
index=ndx sourcetype=srctp
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| rex field=_raw "(?<sequence>Making a GET Request)"
| rex field=_raw "(?<sequence>Output Status Code)"
| eval sequence=sequence.";"._time
| stats values(sequence) as sequence by req_id
| mvexpand sequence
| rex field=sequence "(?<sequence>[^;]+);(?<time>\d+)"
| eval time=strftime(time,"%c")
This will extract the "req-id" into a field named req_id, and the start and end of the sequence into a field named sequence.
Because values() returns its results in lexicographic order, stats values(sequence) as sequence puts the "Making..." entry first and the "Output..." entry second, presuming the sample data you shared is correct.
As a result, when you mvexpand and then split the values()'d field into sequence and time, the rows will be in the proper order.
If the sample data is incomplete, you may need to tweak the regexes that populate sequence.
It seems you're going with my previously suggested approach 😉
You now have two options:
1. SPL
Below is the simplest query. It invokes only one rex and assumes the _time field is populated correctly.
index=<your_index> source=<your_source>
("*Making a GET Request*" OR "*Output Status Code*")
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| stats max(_time) as end, min(_time) as start by req_id
| eval duration = end - start
| table req_id duration
Note that, depending on the amount of data to scan, this query can be resource-intensive for your Splunk cluster.
2. Log the response time directly in the API (more efficient)
It seems you are working on an API. You should be able to measure the response time of each call and write it directly to your log.
You can then use it in SPL without any calculation.
It is always preferable to persist data at index time rather than to repeat the same calculation at search time.
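The min/max-per-request idea from option 1 can be checked against the two sample events outside Splunk. This is a minimal Python sketch, not SPL: it assumes the timestamp is the first space-separated token in HH:MM:SS:mmm form, reuses the req-id pattern from the query (Python spells named groups (?P<name>...)), and the function name durations_by_request is illustrative.

```python
import re
from datetime import datetime

# The two sample events from the question.
events = [
    "17:09:05:362 INFO com.a.b.App - Making a GET Request and req-id: [123456]",
    '17:09:06:480 INFO com.a.b.App - Output Status Code: 200 req-id:"123456"',
]

# Same idea as the rex in the answer: digits after "req-id" and any non-digits.
REQ_ID = re.compile(r"req-id\D+(?P<req_id>\d+)")


def durations_by_request(lines):
    """Return {req_id: seconds between earliest and latest event}."""
    spans = {}
    for line in lines:
        match = REQ_ID.search(line)
        if match is None:
            continue
        # Assumption: the timestamp is the first token, formatted HH:MM:SS:mmm.
        stamp = datetime.strptime(line.split(" ", 1)[0], "%H:%M:%S:%f")
        req_id = match.group("req_id")
        start, end = spans.get(req_id, (stamp, stamp))
        spans[req_id] = (min(start, stamp), max(end, stamp))
    return {
        req_id: (end - start).total_seconds()
        for req_id, (start, end) in spans.items()
    }


durations = durations_by_request(events)
print(durations)
```

For the sample events, the request 123456 spans 17:09:05.362 to 17:09:06.480, which is 1.118 seconds.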

Splunk searching event logs to find values exceeding a given threshold

I want to search the log event
"Closure request counts: startAssets: "
and find occurrences where the startAssets are larger than 50.
How would I do that?
Something like:
Closure request counts: startAssets: 51
would maybe give a search similar to
"Closure request counts: startAssets: {num} AND num >=50"
perhaps?
What does that look like in SPL?
That's pretty simple, but you'll need to extract the number to do it. I like to use the rex command to do that, but there may be other ways.
index=foo "Closure request counts: startAssets: *"
| rex "startAssets: (?<startAssets>\d+)"
| where startAssets > 50
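The extract-then-compare logic can be tried on sample lines outside Splunk. The sketch below is Python, not SPL; only the first sample line comes from the question, the other two are hypothetical, and the function name exceeds_threshold is illustrative. It uses a strict comparison (> 50), matching the where clause above.

```python
import re

# Same extraction as the rex above; Python spells the group (?P<name>...).
START_ASSETS = re.compile(r"startAssets: (?P<startAssets>\d+)")


def exceeds_threshold(lines, threshold=50):
    """Return the lines whose startAssets value is strictly above threshold."""
    hits = []
    for line in lines:
        match = START_ASSETS.search(line)
        if match and int(match.group("startAssets")) > threshold:
            hits.append(line)
    return hits


sample = [
    "Closure request counts: startAssets: 51",  # from the question
    "Closure request counts: startAssets: 50",  # hypothetical
    "Closure request counts: startAssets: 7",   # hypothetical
]

matches = exceeds_threshold(sample)
print(matches)
```

Only the line with startAssets: 51 is returned. If 50 itself should match, the comparison would need to be >= in both the sketch and the where clause.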

Splunk: search a string, if found only then look for another log with same request-id

I want to find a string (driving factor) and if found, only then look for another string with same x-request-id and extract some details out of it.
x-request-id=12345 "InterestingField=7850373" [this one is subset of very specific request]
x-request-id=12345 "veryCommonField=56789" [this one is a superSet of all kind of requests]
What I've tried:
index=myindex "InterestingField" OR "veryCommonField"
| transaction x-request-id
The problem with the above is that the query also groups all the requests that contain only veryCommonField.
I want to avoid join because it performs poorly.
What I need:
list InterestingField, veryCommonField
Example:
The event below marks the beginning of every kind of request. We get thousands of such requests in a day.
index=myIndex xrid=12345 "Request received for this. field1: 123 field2: test"
Of all the requests above, fewer than 100 fall into the category below.
index=myIndex xrid=12345 "I belong to blahBlah category. field3: 67583, field4: testing"
I don't want to search in a super-set of 1000k+ but only in matching 100 requests. Because with increased time span, this search query will take very long.
If I'm understanding your use-case, the following may be helpful.
Using stats
index=myindex "InterestingField" OR "veryCommonField" | stats values(InterestingField), values(veryCommonField) by x-request-id
Using subsearch
index=myindex [ search index=myindex InterestingField=* | fields x-request-id | format ]
Depending on the number of results that match InterestingField, you can also use map, https://docs.splunk.com/Documentation/Splunk/8.0.3/SearchReference/Map
index=myindex InterestingField="*" | map maxsearches=0 "search index=myindex x-request-id=$x-request-id$ | stats values(InterestingField), values(veryCommonField) by x-request-id"
If you provide more thorough example events, we can assist you further.
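The intent behind these searches (group by request id, but keep only the requests that contain InterestingField) can be sketched outside Splunk. This is Python, not SPL; the events are modelled as dictionaries of already-extracted fields, the third event is hypothetical, and the function name correlate is illustrative.

```python
from collections import defaultdict

events = [
    {"x-request-id": "12345", "InterestingField": "7850373"},  # from the question
    {"x-request-id": "12345", "veryCommonField": "56789"},     # from the question
    {"x-request-id": "99999", "veryCommonField": "11111"},     # hypothetical
]

FIELDS = ("InterestingField", "veryCommonField")


def correlate(records):
    """Collect both fields per request id, keeping only ids with InterestingField."""
    grouped = defaultdict(lambda: {field: set() for field in FIELDS})
    for record in records:
        bucket = grouped[record["x-request-id"]]
        for field in FIELDS:
            if field in record:
                bucket[field].add(record[field])
    return {
        request_id: {field: sorted(values) for field, values in fields.items()}
        for request_id, fields in grouped.items()
        if fields["InterestingField"]  # drop requests seen only via veryCommonField
    }


result = correlate(events)
print(result)
```

Request 99999 is dropped because it never carried InterestingField, which is the filtering the question asks for.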

Splunk: Find the difference between 2 events

I have a server with 2 APIs: /migrate/start and /migrate/end
For each request, I log the userID (field usrid="") of the user using my service to be migrated and the api called (field api="").
Users call /migrate/start, then call /migrate/end. I would like to write a Splunk query to list the userIDs that are being migrated, i.e. those that called /migrate/start but have yet to call /migrate/end. How would I write that query?
Thank you
Assuming you have only 2 api calls (start/end) in the logs, you can use a stats command to do this.
| your_search
| stats values(api) as api by usrid
| where api!="/migrate/end"
This groups all API calls made per user and removes the users who have called /migrate/end.
The general method is to get all the start and end events and match them up by user ID. Take the most recent event for each user and throw out the ones that are "migrate/end". What's left are all the in-progress migrations. Something like this:
index = foo (api="/migrate/start" OR api="/migrate/end")
| stats latest(api) as api by usrid
| where api="/migrate/start"
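The "latest call per user" method can be sketched outside Splunk as follows. This is Python, not SPL; the events are hypothetical, assumed to be ordered oldest to newest, and the function name in_progress is illustrative.

```python
# Hypothetical events, ordered oldest to newest.
events = [
    {"usrid": "u1", "api": "/migrate/start"},
    {"usrid": "u2", "api": "/migrate/start"},
    {"usrid": "u1", "api": "/migrate/end"},
]


def in_progress(records):
    """Return user ids whose most recent call is /migrate/start."""
    latest = {}
    for record in records:
        # Later records overwrite earlier ones, so the last call wins.
        latest[record["usrid"]] = record["api"]
    return sorted(user for user, api in latest.items() if api == "/migrate/start")


pending = in_progress(events)
print(pending)
```

Here u1 has finished (its latest call is /migrate/end), so only u2 is still being migrated.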

Search with original text that was replaced earlier

I am gathering performance metrics for each API that we have. With the query below, I get results like this:
method response_time
Create Billing 2343.2323
index="dev-uw2" logger_name="*Aspect*" message="*ApiImpl*" | rex field=message "PerformanceMetrics - method='(?<method>.*)' execution_time=(?<response_time>.*)" | table method, response_time | replace "public com.xyz.services.billingservice.model.Billing com.xyz.services.billingservice.api.BillingApiImpl.createBilling(java.lang.String)" WITH "Create Billing" IN method
If the user clicks an API name in a table cell to drill down further, it opens a new search with "Create Billing". Obviously, that gives zero results, since we don't have any log with that string.
I want Splunk to search with the original text that was replaced earlier.
You can use click.value to get around this.
http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Viz/tokens