Join two Splunk queries without predefined fields - splunk

I am trying to join 2 splunk queries. However in this case the common string between the 2 queries is not a predefined splunk field and is logged in a different manner. I have created the regex which individually identifies the string but when I try to combine using join, I do not get the result.
I have logs like this -
Logline 1 -
21-04-2019 11:01:02.001 server1 app1 1023456789 1205265352567565 1234567Z-1234-1234-1234-123456789123 Application Completed
Logline 2 -
21-04-2019 11:00:00.000 journey_ends server1 app1 1035625855585989 .....(lots of text) commonID:1234567Z-1234-1234-1234-123456789123 .....(lots of text) status(value) OK
the second Logline can be NOTOK as well
Logline 2 -
21-04-2019 11:00:00.000 journey_ends server1 app1 1035625855585989 .....(lots of text) commonID:1234567Z-1234-1234-1234-123456789123 .....(lots of text) status(value) NOTOK
I have tried multiple things but the best that I can come up with is -
index=test "journey_ends" | rex "status(value) (?<StatusType>[A-Z][A-Z]*)" | rex "commonID\:(?<commonID>[^\t]{37})" | table StatusType, commonID | join type=inner commonID [ search index=test "Application Completed" | rex "^(?:[^\t\n]*\t){7}(?P<commonID>[^\t]+)" | table _time, commonID] | chart count over StatusType by commonID
However the above query does not provide me the stats. In verbose mode, I can just see the events of query 1. Please note that the above 2 queries run correctly individually.
However currently I have to initially run the query to fetch the commonIDs from "Application Completed" logline and then in another query give the list of commonIDs found in the result first query as input and find the status value for each commonId from logline 2.
Expected Result (in a table):
StatusType commonID OK 1234567Z-1234-1234-1234-123456789123 NOTOK 1234567Z-1234-1234-1234-985625623541

Can you try the below query,
index=main
AND "Application Completed"
| rex "(?<common_id>[[:alnum:]]+-[[:alnum:]]+-[[:alnum:]]+-[[:alnum:]]+-[[:alnum:]]+)"
| table _time, common_id
| join type=inner common_id [
search index=main
| rex "status\(value\)\s+(?<status>.+)$"
| rex "(?<common_id>[[:alnum:]]+-[[:alnum:]]+-[[:alnum:]]+-[[:alnum:]]+-[[:alnum:]]+)"
| table status, common_id
]

Related

Regex count capture group members

I have multiple log messages each containing a list of JobIds -
IE -
1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`
I have a rex to get those jobIds. Next I want to count the number of jobIds
My query looks like this -
| rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| stats count(job_ids)
But this will only give me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure if this is a splunk limitation or I am missing something in my regex.
Here is my regex - https://regex101.com/r/vqlq5j/1
Also with max-match=0 but with mvcount() instead of mvexpand():
| makeresults count=3 | streamstats count
| eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
``` above is test data setup ```
``` below is the actual query ```
| rex field=message max_match=0 "\"(?<id>[\w\d]+\-[\w\d]+\-[\w\d]+\-[\w\d]+\")"
| eval cnt=mvcount(id)
| stats sum(cnt)
In Splunk, to capture multiple matches from a single event, you need to add max_match=0 to your rex, per docs.Splunk
But to get them then separated into a singlevalue field from the [potential] multivalue field job_ids that you made, you need to mvxepand or similar
So this should get you closer:
| rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| mvexpand job_id
| stats dc(job_id)
I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many in total you've seen
Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically
Do you have a full set of sample data (a few entire events) you can share?

Splunk left jion is not giving as exepcted

Requirement: I want to find out, payment card information used in a particular day are there any tele sales order placed with the same payment card information.
I tried with below query it is supposed to give me all the payment card information from online orders and matching payment info from telesales. But i am not giving correct results basically results shows there are no telesales for payment information, but when i search splunk i am finding telesales as well. So the query wrong.
index="orders" "Online order received" earliest=-9d latest=-8d
| rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))"
| rename timestamp as onlineOrderTime
| table payHash, onlineOrderTime
| join type=left payHash [search index="orders" "Telesale order received" earliest=-20d latest=-5m | rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))" | rename timestamp as TeleSaleTime | table payHash, TeleSaleTime]
| table payHash, onlineOrderTime, TeleSaleTime
Please help me in fixing the query or a query to find out results for my requirement.
If you do want to do this with a join, what you had, slightly changed, should be correct:
index="orders" "Online order received" earliest=-9d latest=-8d
| rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))"
| stats values(_time) as onlineOrderTime by payHash
| join type=left payHash
[search index="orders" "Telesale order received" earliest=-20d latest=-5m
| rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))"
| rename timestamp as TeleSaleTime
| stats values(TeleSaleTime) by payHash ]
| rename timestamp as onlineOrderTime
Note the added | stats values(...) by in the subsearch: you need to ensure you've removed any duplicates from the list, which this will do. By using values(), you'll also ensure if there're repeated entries for the payHash field, they get grouped together. (Similarly, added a | stats values... before the subsearch to speed the whole operation.)
You should be able to do this without a join, too:
index="orders" (("Online order received" earliest=-9d latest=-8d) OR "Telesale order received" earliest=-20d))
| rex field=_raw "(?<order_type>\w+) order received"
| rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))"
| stats values(order_type) as order_type values(_time) as orderTimes by payHash
| where mvcount(order_type)>1
After you've ensured your times are correct, you can format them - here's one I use frequently:
| eval onlineOrderTime=strftime(onlineOrderTime,"%c"), TeleSaleTime=strftime(TeleSaleTime,"%c")
You may also need to do further reformatting, but these should get you close
fwiw - I'd wonder why you were trying to look at Online orders from only 9 days ago, but Telesale orders from 20 days ago to now: but that's just me.
The join command expects a list of field names on which events from each search will be matched. If no fields are listed then all fields are used. In the example, the fields 'onlineOrderTime' and 'TeleSaleTime' exist only on one side of the join so no matches can be made. The fix is simple: specify the common field name. ... | join type=left payHash ....
First of all, you can delete the last row | table payHash, onlineOrderTime, TeleSaleTime beacuse it doesn't do anything(the join command already joins both tables you created).
Secondly, when running both queries separately - both queries have the same "payHash"es? both queries return back a table with the true results?
Because by the looks of it, you used the join command correctly...

Query for calculating duration between two different logs in Splunk

As part of my requirements, I have to calculate the duration between two different logs using Splunk query.
For example:
Log 2:
2020-04-22 13:12 ADD request received ID : 123
Log 1 :
2020-04-22 12:12 REMOVE request received ID : 122
The common String between two logs is " request received ID :" and unique strings between two logs are "ADD", "REMOVE". And the expected output duration is 1 hour.
Any help would be appreciated. Thanks
You can use the transaction command, https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction
Assuming you have the field ID extracted, you can do
index=* | transaction ID
This will automatically produce a field called duration, which is the time between the first and last event with the same ID
While transaction will work, it's very inefficient
This stats should show you what you're looking for (presuming the fields are already extracted):
(index=ndxA OR index=ndxB) ID=* ("ADD" OR "REMOVE")
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed(when_removed,"%c")
If you don't already have fields extracted, you'll need to modify thusly (remove the "\D^" in the regex if the ID value isn't at the end of the line):
(index=ndxA OR index=ndxB) ("ADD" OR "REMOVE")
| rex field=_raw "ID \s+:\s+(?<ID>\d+)\D^"
| stats min(_time) as when_added max(_time) as when_removed by ID
| eval when_added=strftime(when_added,"%c"), when_removed(when_removed,"%c")

Splunk query to get non matching ID from two query

Index=* sourcetype="publisher" namespace="app_1" | table ID message | where message="published"
Index=* sourcetype="consumer" namespace="app_1" | table ID message | where message="consumed"
I want to display non matching ID by comparing both the query, how can I achieve this.
If query 1 is giving 100 records and query 2 is giving 90 records and all 90 records are present in query 1 them I want to see 10 records which are not present in query 2.
There are several ways to achieve this outcome.
The following counts the number of consumer and producers events for each ID, then shows the IDs of the events that occur only once.
index=* sourcetype="publisher" OR sourcetype="consumer" namespace="app_1" ID="*" | stats count by ID | where count<2
In this next method, we use a sub search and a join. This has the benefit of giving you the full event, not just the ID.
index=* sourcetype="publisher" namespace="app_1" ID="*" | join type=outer ID [ search index=* sourcetype="consumer" namespace="app_1" ID="*" | eval does_match=1 ] | where isnull(does_match)

How do i optimize the following Splunk query?

I have results like below:
1. DateTime=2019-07-02T16:17:20,913 Thread=[], Message=[Message(userId=124, timestamp=2019-07-02T16:17:10.859Z, notificationType=CREATE, userAccount=UserAccount(firstName=S, lastName=K, emailAddress=abc#xyz.com, status=ACTIVE), originalValues=OriginalValue(emailAddress=null)) Toggle : true]
2. DateTime=2019-07-02T16:18:20,913 Thread=[], Message=[Message(userId=124, timestamp=2019-07-02T16:17:10.859Z, notificationType=CREATE, userAccount=UserAccount(firstName=S, lastName=K, emailAddress=abc#xyz.com, status=ACTIVE), originalValues=OriginalValue(emailAddress=new#xyz.com)) Toggle : true]
3. DateTime=2019-07-02T16:19:20,913 Thread=[], Message=[Message(userId=124, timestamp=2019-07-02T16:17:10.859Z, notificationType=CREATE, userAccount=UserAccount(firstName=S, lastName=K, emailAddress=abc#xyz.com, status=ACTIVE), originalValues=OriginalValue(emailAddress=new#xyz.com)) Toggle : true]
And I am trying to group results where the contents of the entire "Message" field is same and "emailAddress=null" is not contained in the Message.
So in the results above 2 and 3 should be the output.
The following query works fine for me but I need to optimize it further according to the following conditions:
Working Query: index=app sourcetype=appname host=appname* splunk_server_group=us-east-2 | fields Message | search Message= "[Message*" | regex _raw!="emailAddress=null" | stats count(Message) as count by Message | where count > 1
Conditions to optimize
Cannot rex against raw
Message key/value pair needs to be in the main search, not a sub-search
You don't have any subsearches in your current query. A subsearch is a query surrounded by square brackets.
What's wrong with rex against _raw?
Try this:
index=app sourcetype=appname host=appname* splunk_server_group=us-east-2 Message="[Message*"
| fields Message
| regex Message!="emailAddress=null"
| stats count(Message) as count by Message | where count > 1