SPLUNK use result from first search in second search - splunk

Say I have a query such as
index="example" source="example.log" host="example" "ERROR 1234"
| stats distinct_count by id
This will give me all the events with that error code per id.
I then want to combine this query to search the same log file for another string but only on the unique id's returned from the first search. Because the new string will appear on a separate event I can't just do an 'AND'.

There are a few ways to do that, including using subsearches, join, or append, but those require multiple passes through the data. Here is a way that makes a single pass through the index.
index=example source="example.log" ("ERROR 1234" OR "ERROR 5678")
``` Check for the presence of each string in the event ```
| eval string1=if(searchmatch("ERROR 1234"), 1, 0)
| eval string2=if(searchmatch("ERROR 5678"), 1, 0)
``` Count string occurrences by id ```
| stats sum(string1) as string1, sum(string2) as string2 by id
``` Keep only the ids that have both strings ```
| where (string1 > 0 AND string2 > 0)

You can search for "some other string" in subsearch and then join the queries on the id:
index="example" source="example.log" host="example" "ERROR 1234"
| join id [search index="example" source="example.log" host="example" "some other string" ]
| stats distinct_count by id

Presuming your id field is the same and available in both indices, this form should work:
(index=ndxA sourcetype=srctpA id=* source=example.log host=example "ERROR 1234") OR (index=ndxB sourcetype=srctpB id=* "some other string")
| rex field=_raw "(?<first_field>ERROR 1234)"
| rex field=_raw "(?<second_field>some other string)"
| fillnull value="-" first_field second_field
| stats count by id first_string second_string
| search NOT (first_string="-" OR second_string="-")
If your id field has a different name in the other index, do a rename like this before the stats line:
| rename otherIdFieldName as id
Advantages of this format:
you are not limited by subsearch constraints (search must finish in 60 seconds, no more than 50k rows)
the Search Peers (ie Indexers) will handle all of the overhead instead of having to wait on the Search Head that initiated the search to do lots of post-processing (all the SH is doing is sending the distributed search, then a post-stats filter to ensure both first_string and second_string have the values you are looking for)

Related

Regex count capture group members

I have multiple log messages each containing a list of JobIds -
IE -
1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`
I have a rex to get those jobIds. Next I want to count the number of jobIds
My query looks like this -
| rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| stats count(job_ids)
But this will only give me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure if this is a splunk limitation or I am missing something in my regex.
Here is my regex - https://regex101.com/r/vqlq5j/1
Also with max-match=0 but with mvcount() instead of mvexpand():
| makeresults count=3 | streamstats count
| eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
``` above is test data setup ```
``` below is the actual query ```
| rex field=message max_match=0 "\"(?<id>[\w\d]+\-[\w\d]+\-[\w\d]+\-[\w\d]+\")"
| eval cnt=mvcount(id)
| stats sum(cnt)
In Splunk, to capture multiple matches from a single event, you need to add max_match=0 to your rex, per docs.Splunk
But to get them then separated into a singlevalue field from the [potential] multivalue field job_ids that you made, you need to mvxepand or similar
So this should get you closer:
| rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| mvexpand job_id
| stats dc(job_id)
I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many in total you've seen
Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically
Do you have a full set of sample data (a few entire events) you can share?

Splunk query with conditions of an object

I need a Splunk query to fetch the counts of each field used in my dashboard.
Splunk sample data for each search is like this
timestamp="2022-11-07 02:06:38.427"
loglevel="INFO" pid="1"
thread="http-nio-8080-exec-10"
appname="my-test-app"
URI="/testapp/v1/mytest-app/dashboard-service"
RequestPayload="{\"name\":\"test\",\"number\":\"\"}"
What would a search look like to print a table with the number of times the name and number is used to search data (at a time only either number/name data can be given by user).
Expected output in table format with counts for Name and Number
#Hanuman
Can you please try this? You can change regular expression as per your events and match with JSON data.
YOUR_SEARCH | rex field=_raw "RequestPayload=\"(?<data>.*[}])\""
| spath input=data
|table name number
My Sample Search:
| makeresults | eval _raw="*timestamp=\"2022-11-07 02:06:38.427\" loglevel=\"INFO\" pid=\"1\" thread=\"http-nio-8080-exec-10\" appname=\"my-test-app\" URI=\"/testapp/v1/mytest-app/dashboard-service\" RequestPayload=\"{\"name\":\"test\",\"number\":\"1\"}\"*"
| rex field=_raw "RequestPayload=\"(?<data>.*[}])\""
| spath input=data
|table name number
Screen
Thanks

Query for calculating duration between two different logs in Splunk

As part of my requirements, I have to calculate the duration between two different logs using Splunk query.
For example:
Log 2:
2020-04-22 13:12 ADD request received ID : 123
Log 1 :
2020-04-22 12:12 REMOVE request received ID : 122
The common String between two logs is " request received ID :" and unique strings between two logs are "ADD", "REMOVE". And the expected output duration is 1 hour.
Any help would be appreciated. Thanks
You can use the transaction command, https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction
Assuming you have the field ID extracted, you can do
index=* | transaction ID
This will automatically produce a field called duration, which is the time between the first and last event with the same ID
While transaction will work, it's very inefficient
This stats should show you what you're looking for (presuming the fields are already extracted):
(index=ndxA OR index=ndxB) ID=* ("ADD" OR "REMOVE")
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed(when_removed,"%c")
If you don't already have fields extracted, you'll need to modify thusly (remove the "\D^" in the regex if the ID value isn't at the end of the line):
(index=ndxA OR index=ndxB) ("ADD" OR "REMOVE")
| rex field=_raw "ID \s+:\s+(?<ID>\d+)\D^"
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed(when_removed,"%c")

How do i optimize the following Splunk query?

I have results like below:
1. DateTime=2019-07-02T16:17:20,913 Thread=[], Message=[Message(userId=124, timestamp=2019-07-02T16:17:10.859Z, notificationType=CREATE, userAccount=UserAccount(firstName=S, lastName=K, emailAddress=abc#xyz.com, status=ACTIVE), originalValues=OriginalValue(emailAddress=null)) Toggle : true]
2. DateTime=2019-07-02T16:18:20,913 Thread=[], Message=[Message(userId=124, timestamp=2019-07-02T16:17:10.859Z, notificationType=CREATE, userAccount=UserAccount(firstName=S, lastName=K, emailAddress=abc#xyz.com, status=ACTIVE), originalValues=OriginalValue(emailAddress=new#xyz.com)) Toggle : true]
3. DateTime=2019-07-02T16:19:20,913 Thread=[], Message=[Message(userId=124, timestamp=2019-07-02T16:17:10.859Z, notificationType=CREATE, userAccount=UserAccount(firstName=S, lastName=K, emailAddress=abc#xyz.com, status=ACTIVE), originalValues=OriginalValue(emailAddress=new#xyz.com)) Toggle : true]
And I am trying to group results where the contents of the entire "Message" field is same and "emailAddress=null" is not contained in the Message.
So in the results above 2 and 3 should be the output.
The following query works fine for me but I need to optimize it further according to the following conditions:
Working Query: index=app sourcetype=appname host=appname* splunk_server_group=us-east-2 | fields Message | search Message= "[Message*" | regex _raw!="emailAddress=null" | stats count(Message) as count by Message | where count > 1
Conditions to optimize
Cannot rex against raw
Message key/value pair needs to be in the main search, not a sub-search
You don't have any subsearches in your current query. A subsearch is a query surrounded by square brackets.
What's wrong with rex against _raw?
Try this:
index=app sourcetype=appname host=appname* splunk_server_group=us-east-2 Message="[Message*"
| fields Message
| regex Message!="emailAddress=null"
| stats count(Message) as count by Message | where count > 1

Splunk match partial result value of field and compare results

I have 3 fields in my splunk result like message, id and docId.
Need to group the results by id and doc id which has specific messages
message="successfully added" id=1234 docId =1345
message="removed someUniqueId" id=1234 docId =1345
I have to group based on the results by both id's which has the specific message
search query | rex "message=(?<message[\S\s]*>)" | where message="successfully added"
which is giving result for the first search, when i tried to search for second search query which is not giving result due to the someUniqueId"
search query | rex "message=(?<message[\S\s]*>)" | where match(message, "removed *")
Could you pelase help me to filter the results which has the 2 messages and group by id and docID
The match function expects a regular expression, not a pattern, as the second argument. Try search query | rex "message=(?<message>[\S\s]*)" | where match(message, "removed .*").
BTW, the regex strings in the rex commands are invalid, but that may be a typing error in the question.