Regex count capture group members - splunk

I have multiple log messages each containing a list of JobIds -
IE -
1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`
I have a rex to get those jobIds. Next I want to count the number of jobIds
My query looks like this -
| rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| stats count(job_ids)
But this will only give me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure if this is a splunk limitation or I am missing something in my regex.
Here is my regex - https://regex101.com/r/vqlq5j/1

Also with max-match=0 but with mvcount() instead of mvexpand():
| makeresults count=3 | streamstats count
| eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
``` above is test data setup ```
``` below is the actual query ```
| rex field=message max_match=0 "\"(?<id>[\w\d]+\-[\w\d]+\-[\w\d]+\-[\w\d]+\")"
| eval cnt=mvcount(id)
| stats sum(cnt)

In Splunk, to capture multiple matches from a single event, you need to add max_match=0 to your rex, per docs.Splunk
But to get them then separated into a singlevalue field from the [potential] multivalue field job_ids that you made, you need to mvxepand or similar
So this should get you closer:
| rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| mvexpand job_id
| stats dc(job_id)
I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many in total you've seen
Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically
Do you have a full set of sample data (a few entire events) you can share?

Related

Splunk query with conditions of an object

I need a Splunk query to fetch the counts of each field used in my dashboard.
Splunk sample data for each search is like this
timestamp="2022-11-07 02:06:38.427"
loglevel="INFO" pid="1"
thread="http-nio-8080-exec-10"
appname="my-test-app"
URI="/testapp/v1/mytest-app/dashboard-service"
RequestPayload="{\"name\":\"test\",\"number\":\"\"}"
What would a search look like to print a table with the number of times the name and number is used to search data (at a time only either number/name data can be given by user).
Expected output in table format with counts for Name and Number
#Hanuman
Can you please try this? You can change regular expression as per your events and match with JSON data.
YOUR_SEARCH | rex field=_raw "RequestPayload=\"(?<data>.*[}])\""
| spath input=data
|table name number
My Sample Search:
| makeresults | eval _raw="*timestamp=\"2022-11-07 02:06:38.427\" loglevel=\"INFO\" pid=\"1\" thread=\"http-nio-8080-exec-10\" appname=\"my-test-app\" URI=\"/testapp/v1/mytest-app/dashboard-service\" RequestPayload=\"{\"name\":\"test\",\"number\":\"1\"}\"*"
| rex field=_raw "RequestPayload=\"(?<data>.*[}])\""
| spath input=data
|table name number
Screen
Thanks

Splunk Count Specific String in a Field

In Splunk, I need to get the count of events from the below msg field value which matches factType=COMMERCIAL and has filters.
Using the basic Splunk query with wildcard does not work efficiently. Could you please assist
app_name="ABC" cf_space_name=prod msg="*/facts?factType=COMMERCIAL&sourceSystem=ADMIN&sourceOwner=ABC&filters*"
msg: abc.asia - [2021-08-23T00:27:08.152+0000] "GET
/facts?factType=COMMERCIAL&sourceSystem=ADMIN&sourceOwner=ABC&filters=%257B%2522stringMatchFilters%2522:%255B%257B%2522key%2522:%2522BFEESCE((json_data-%253E%253E'isNotSearchable')::boolean,%2520false)%2522,%2522value%2522:%2522false%2522,%2522operator%2522:%2522EQ%2522%257D%255D,%2522multiStringMatchFilters%2522:%255B%257B%2522key%2522:%2522json_data-%253E%253E'id'%2522,%2522values%2522:%255B%25224970111%2522%255D%257D%255D,%2522containmentFilters%2522:%255B%255D,%2522nestedMultiStringMatchFilter%2522:%255B%255D,%2522nestedStringMatchFilters%2522:%255B%255D%257D&sorts=%257B%2522sortOrders%2522:%255B%257B%2522key%2522:%2522id%2522,%2522order%2522:%2522DESC%2522%257D%255D%257D&pagination=null
Try this:
index=ndx sourcetype=srctp msg=*
| rex field=msg "factType=(?<facttype>\w+).(?<params>.+)"
| stats count by facttype params
| fields - count
| search facttype="commercial"
The rex will extract the facttype and any following parameters (note - if the URL is submitted with the arguments in a different order, you'll need to adjust the regular expression)
Then use a | stats count by to bin them together
Lastly, search only where there is both a facttype="commercial" and the URL has additional parameters

Query for calculating duration between two different logs in Splunk

As part of my requirements, I have to calculate the duration between two different logs using Splunk query.
For example:
Log 2:
2020-04-22 13:12 ADD request received ID : 123
Log 1 :
2020-04-22 12:12 REMOVE request received ID : 122
The common String between two logs is " request received ID :" and unique strings between two logs are "ADD", "REMOVE". And the expected output duration is 1 hour.
Any help would be appreciated. Thanks
You can use the transaction command, https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction
Assuming you have the field ID extracted, you can do
index=* | transaction ID
This will automatically produce a field called duration, which is the time between the first and last event with the same ID
While transaction will work, it's very inefficient
This stats should show you what you're looking for (presuming the fields are already extracted):
(index=ndxA OR index=ndxB) ID=* ("ADD" OR "REMOVE")
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed(when_removed,"%c")
If you don't already have fields extracted, you'll need to modify thusly (remove the "\D^" in the regex if the ID value isn't at the end of the line):
(index=ndxA OR index=ndxB) ("ADD" OR "REMOVE")
| rex field=_raw "ID \s+:\s+(?<ID>\d+)\D^"
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed(when_removed,"%c")

How to use rex command to extract two fields and chart the count for both in one search query?

I have a log statement like 2017-06-21 12:53:48,426 INFO transaction.TransactionManager.Info:181 -{"message":{"TransactionStatus":true,"TransactioName":"removeLockedUser-1498029828160"}} .
How can i extract TransactionName and TranscationStatus and print in table form TransactionName and its count.
I tried below query but didn't get any success. It is always giving me 0.
sourcetype=10.240.204.69 "TransactionStatus" | rex field=_raw ".TransactionStatus (?.)" |stats count((status=true)) as success_count
Solved it with this :
| makeresults
| eval _raw="2017-06-21 12:53:48,426 INFO transaction.TransactionManager.Info:181 -{\"message\":{\"TransactionStatus\":true,\"TransactioName\":\"removeLockedUser-1498029828160\"}}"
| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"
| rex "{\"TransactionStatus\":(?[^,]),\"TransactioName\":\"(?[^\"])\""
| chart count OVER TransactioName BY TransactionStatus

Adding dedup _raw before timechart returns 0 results

I apologize if this is asked already but I search to no avail.
When writing a Splunk query that will eventually be used for summary indexing using sitimechart, I have this query:
index=app sourcetype=<removed> host=<removed> earliest=-10d
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",1,0)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
Results are as expected. However, some data was accidentally indexed twice, so I need to remove duplicates. If I'm doing a regular search, I just use | dedup _raw to remove the identical events. However, if I run the following query, I get zero results returned (no matter where I put | dedup _raw):
index=app sourcetype=<removed> host=<removed> earliest=-10d
| dedup _raw
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",1,0)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
What am I doing wrong? I'm using Splunk 4.3.2.
I've also posted this identical question to the Splunk>Answers page. I will remove if that is against terms.