How to get two fields using rex from log file? - splunk

I'm newbie with Splunk. My goal is take two or more fields from logs. I must check if one field is true and so use another field to make a counter. The counter is about how many requests is make by client using user-agent attribute.
My logic desired:
int count1, count2;
count1 = 0;
count2 = 0;
if (GW == true) {
if (UA == "user-agent1") count1++;
if (UA == "user-agent2") count2++;
}
At the moment I can get just one field and make a counter without if-condition.
This query works fine, and return the correct requests counter:
source="logfile.log" | rex "UA=(?<ua>\w+)" | stats count(eval(ua="user-agent1")) as USER-AGENT1
But, when I try get the second field (GW) to make the logic, the query returns 0.
source="logsfile.log" | rex "UA=(?<ua>\w+) GW=(?<gw>\w+)" |stats count(eval(ua="user-agent1")) as USER-AGENT1
So, how I get more fields and how make if-condition on query?
Sample log:
2020-01-10 14:38:44,539 INFO [http-nio-8080-exec-8] class:ControllerV1, UA=user-agent1, GW=true
2020-01-10 14:23:51,818 INFO [http-nio-8080-exec-3] class:ControllerV1, UA=user-agent2, GW=true

It will be something like this:
source="logsfile.log" UA GW
| rex "UA=(?<ua>\w+), GW=(?<gw>\w+)"
| stats count(eval(gw="true" AND ua="user-agent1")) as AGENT1,
count(eval(gw="true" AND ua="user-agent2")) as AGENT2
If, for example, you do not know the order of variables or you have more than 2, you can use separate rex statements:
source="logsfile.log" UA GW
| rex "UA=(?<ua>\w+)"
| rex "GW=(?<gw>\w+)"
| stats count(eval(gw="true" AND ua="user-agent1")) as AGENT1,
count(eval(gw="true" AND ua="user-agent2")) as AGENT2
This could be a bit slower since _raw will be parsed twice.

Related

Regex count capture group members

I have multiple log messages each containing a list of JobIds -
IE -
1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`
I have a rex to get those jobIds. Next I want to count the number of jobIds
My query looks like this -
| rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| stats count(job_ids)
But this will only give me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure if this is a splunk limitation or I am missing something in my regex.
Here is my regex - https://regex101.com/r/vqlq5j/1
Also with max-match=0 but with mvcount() instead of mvexpand():
| makeresults count=3 | streamstats count
| eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
``` above is test data setup ```
``` below is the actual query ```
| rex field=message max_match=0 "\"(?<id>[\w\d]+\-[\w\d]+\-[\w\d]+\-[\w\d]+\")"
| eval cnt=mvcount(id)
| stats sum(cnt)
In Splunk, to capture multiple matches from a single event, you need to add max_match=0 to your rex, per docs.Splunk
But to get them then separated into a singlevalue field from the [potential] multivalue field job_ids that you made, you need to mvxepand or similar
So this should get you closer:
| rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| mvexpand job_id
| stats dc(job_id)
I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many in total you've seen
Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically
Do you have a full set of sample data (a few entire events) you can share?

Merge url with parameters into 1 in Splunk

I am creating a dashboard for our service. And I want to create metrics for url requests.
Lets say have a similar url like this one:
/api/v1/users/{userId}/settings
And I have following query in Splunk
url=*/api/v1/users/*/settings
| stats avg(timeTaken) as avg_latency, p99(timeTaken) as "p99(ms)", perc75(timeTaken) as "p75(ms)", count as total_requests, count(eval(responseStatus=500)) as failed_requests by url
| eval "success_rate"=round((total_requests - failed_requests)/total_requests*100,2)
| eval avg = round(avg)
| sort success_rate
All I want is to have a table with one common url showing all the metrics. But instead, I get a table with a list of all urls with different parameters.
You want to create a field which is the URL minus the UserId part, And therefore the stats will be grouped by which url is called.
You can do this by using split(url,"/") to make a mv field of the url, and take out the UserId by one of two ways depending on the URLs.
Mvfilter: Eg: mvfilter(eval(x!=userId))
Or created a new mvfield with the userId removed by it's index in the mvfield using this: Add/Edit/Delete mvfield
Instead of removing you could also choose to replace the UserId with "{userId}", so long as you do the same for all Urls.
And then you can rejoin the url using mvjoin(url,"/")
I hope I understood your question correctly and this helps you!
You could try doing a replace() on your URL field with eval before calling stats:
| eval url=replace(url,"\/\d+\/settings","/settings")
If it turns out the userid is important to hold onto, pull it into its own field prior to running replace():
| rex field=url "\/(?<userid>\d+)\/settings"
expansion for comment
For multiple possible endings of your URL, try something like this:
index=ndx sourcetype=srctp URL IN("*/api/v1/users/*/settings","*/api/v1/users/*/logout","*/api/v1/users/*/profile")
| rex field=url "\/(?<url_type>\w+)$"
| eval url=replace(url,"\/\d+\/\w+$","")
| stats avg(timeTaken) as avg_latency, p99(timeTaken) as "p99(ms)", perc75(timeTaken) as "p75(ms)", count as total_requests, count(eval(responseStatus=500)) as failed_requests by url type
| eval "success_rate"=round((total_requests - failed_requests)/total_requests*100,2)
| eval avg = round(avg)
| sort success_rate
This will extract the "type" (logout, profile, settings) into a new field, then cleanup the URL by removing everything from userid to the end

Extract data from splunk

I have a Post query where I want to extract request payload or parameters and print a table. In the query, I am trying to extract the user_search name field
I have written a Splunk query but it is not working for me
"Parameters: {\"user_search\"=>{\"name\"=>*" | rex field=_raw "/\"user_search\"=>{\"name\"=>/(?<result>.*)" | table result
Splunk Data
I, [2021-09-23T00:46:31.172197 #44154] INFO -- : [651235bf-7ad5-4a2e-a3b8-7737a3af9fc3] Parameters: {"user_search"=>{"name"=>"aniket", "has_primary_phone"=>"false", "query_params"=>{"searchString"=>"", "start"=>"0", "filters"=>[""]}}}
host = qa-1132-lx02source = /src/project.logsourcetype = data:log
I, [2021-09-23T00:48:31.162197 #44154] INFO -- : [651235bf-7ad5-4a2e-a3b8-7737a3af9fc3] Parameters: {"user_search"=>{"name"=>"shivam", "has_primary_phone"=>"false", "query_params"=>{"searchString"=>"", "start"=>"0", "filters"=>[""]}}}
host = qa-1132-lx02source = /src/project.logsourcetype = data:log
I, [2021-09-23T00:52:27.171197 #44154] INFO -- : [651235bf-7ad5-4a2e-a3b8-7737a3af9fc3] Parameters: {"user_search"=>{"name"=>"tiwari", "has_primary_phone"=>"false", "query_params"=>{"searchString"=>"", "start"=>"0", "filters"=>[""]}}}
host = qa-1132-lx02source = /src/project.logsourcetype = data:log
I have 2 questions
How to write a splunk query to extract request payload in post query
In my above query I am not not sure what I am doing wrong. I would really appreciate if someone has any suggestion.
At the least, your regular expression has an error
You have:
"/\"user_search\"=>{\"name\"=>/(?<result>.*)"
There is an extra "/" after the "=>"
This seems to pull what you're looking for:
user_search\"=>{\"name\"=>(?<result>.*)
Edit per comment "I only want to fetch the values such as aniket & shivam from the name key"
There're a couple ways to do what you're asking, and which is going to be mroe performant will depend on your environment and data
Option 1
index=ndx sourcetype=srctp ("aniket" OR "shivam")
| rex field=_raw "user_search\"=>{\"name\"=>(?<result>.*)"
| stats count by result
Option 2
index=ndx sourcetype=srctp
| rex field=_raw "user_search\"=>{\"name\"=>(?<result>.*)"
| search result="aniket" OR result="shivam"
| stats count by result

Take output from query and use in subsequent KQL query

I'm using Azure Log Analytics to review certain events of interest.
I would like to obtain timestamps from data that meets a certain criteria, and then reuse these timestamps in further queries, i.e. to see what else occurred around these times.
The following query returns the desired results, but I'm stuck at how to use the interestingTimes var to then perform further searches and show data within X minutes of each previously returned timestamp.
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;
Any pointers would be greatly appreciated.
interestingTimes will only be available for use in the query where you declare it. You can't use it in another query, unless you define it there as well.
By the way, you can make your query much more efficient by adding a filter that will utilize the built-in index for the EventData column, so that the parse operator will run on a much smaller amount of records:
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| where EventData has "MicrosoftEdge.exe" // <-- OPTIMIZATION that will filter out most records
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;

Stats Count Splunk Query

I wonder whether someone can help me please.
I'd made the following post about Splunk query I'm trying to write:
https://answers.splunk.com/answers/724223/in-a-table-powered-by-a-stats-count-search-can-you.html
I received some great help, but despite working on this for a few days now concentrating on using eval if statements, I still have the same issue with the "Successful" and "Unsuccessful" columns showing blank results. So I thought I'd cast the net a little wider and ask please whether someone maybe able to look at this and offer some guidance on how I may get around the problem.
Many thanks and kind regards
Chris
I tried exploring your use-case with splunkd-access log and came up with a simple SPL to help you.
In this query I am actually joining the output of 2 searches which aggregate the required results (Not concerned about the search performance).
Give it a try. If you've access to _internal index, this will work as is. You should be able to easily modify this to suit your events (eg: replace user with ClientID).
index=_internal source="/opt/splunk/var/log/splunk/splunkd_access.log"
| stats count as All sum(eval(if(status <= 303,1,0))) as Successful sum(eval(if(status > 303,1,0))) as Unsuccessful by user
| join user type=left
[ search index=_internal source="/opt/splunk/var/log/splunk/splunkd_access.log"
| chart count BY user status ]
I updated your search from splunk community answers (should look like this):
w2_wmf(RequestCompleted)`request.detail.Context="*test"
| dedup eventId
| rename request.ClientID as ClientID detail.statusCode AS statusCode
| stats count as All sum(eval(if(statusCode <= 303,1,0))) as Successful sum(eval(if(statusCode > 303,1,0))) as Unsuccessful by ClientID
| join ClientID type=left
[ search w2_wmf(RequestCompleted)`request.detail.Context="*test"
| dedup eventId
| rename request.ClientID as ClientID detail.statusCode AS statusCode
| chart count BY ClientID statusCode ]
I answered in Splunk
https://answers.splunk.com/answers/724223/in-a-table-powered-by-a-stats-count-search-can-you.html?childToView=729492#answer-729492
but using dummy encoding, it looks like
w2_wmf(RequestCompleted)`request.detail.Context="*test"
| dedup eventId
| rename request.ClientId as ClientID, detail.statusCode as Status
| eval X_{Status}=1
| stats count as Total sum(X_*) as X_* by ClientID
| rename X_* as *
Will give you ClientID, count and then a column for each status code found, with a sum of each code in that column.
As I gather you can't get this working, this query should show dummy encoding in action
`index=_internal sourcetype=*access
| eval X_{status}=1
| stats count as Total sum(X_*) as X_* by source, user
| rename X_* as *`
This would give an output of something like