Splunk - How to find the first appearance of queries

I am trying to filter events in Splunk that contain a unique field (payload.procName) that have not been seen before today. Specifically, I am looking for events that contain the payload.procName field that are appearing for the first time today. How can I filter these events to only show the unique payload.procName values that have been seen today but never seen before?
I've tried this query:
tags.appInstance=your_index earliest=-1d latest= now() payload.procName NOT in
[| search tags.appInstance= your_index earliest=-1mon@mon latest=-1d table payload.procName
| dedup payload.procName ]
| table payload.procName
| dedup payload.procName

You have the general format for the query. See if this helps. It removes the IN operator (written as in in your query) because the subsearch does not return results compatible with that operator. It also uses the format command to explicitly format the results. Run the subsearch by itself to see what I mean.
tags.appInstance=your_index earliest=-1d latest=now payload.procName=* NOT
[search tags.appInstance=your_index earliest=-1mon@mon latest=-1d payload.procName=*
| fields payload.procName
| dedup payload.procName
| format ]
| dedup payload.procName
| table payload.procName
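For reference, running the subsearch on its own (a quick sketch; procA and procB are hypothetical values) should return a single formatted search string:
tags.appInstance=your_index earliest=-1mon@mon latest=-1d payload.procName=*
| fields payload.procName
| dedup payload.procName
| format
The output looks roughly like ( ( payload.procName="procA" ) OR ( payload.procName="procB" ) ), and that string is exactly what the outer NOT [...] excludes.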

Regex count capture group members

I have multiple log messages, each containing a list of JobIds, for example:
1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`
I have a rex to get those jobIds. Next, I want to count the number of jobIds.
My query looks like this:
| rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| stats count(job_ids)
But this will only give me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure if this is a Splunk limitation or if I am missing something in my regex.
Here is my regex - https://regex101.com/r/vqlq5j/1
Also with max_match=0, but with mvcount() instead of mvexpand():
| makeresults count=3 | streamstats count
| eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
``` above is test data setup ```
``` below is the actual query ```
| rex field=message max_match=0 "\"(?<id>[\w\d]+\-[\w\d]+\-[\w\d]+\-[\w\d]+)\""
| eval cnt=mvcount(id)
| stats sum(cnt)
In Splunk, to capture multiple matches from a single event, you need to add max_match=0 to your rex, per the Splunk documentation.
But to then separate the [potential] multivalue field job_ids that you made into single-value results, you need mvexpand or similar.
So this should get you closer:
| rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
| mvexpand job_id
| stats dc(job_id)
I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many you've seen in total.
Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically
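If the fields are not extracted automatically, a spath-based alternative (just a sketch, assuming the JSON array lives in the message field) avoids the regex entirely:
| spath input=message path=JobIds{} output=job_id
| stats dc(job_id)
Here spath pulls every element of the JobIds array into a multivalue job_id field, and dc then counts the distinct values across events.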
Do you have a full set of sample data (a few entire events) you can share?

Splunk - I want to add a value from stats count() to a value from a lookup table and show that value in a table

The objective of the query I'm trying to write is to take a count of raw data from the previous month and add it to a count from a lookup table (.csv).
What I have attempted to do is…
index=*** source=***
| stats count(_raw) as monthCount
| join
[ | inputlookup Log_Count_YTD.csv]
| eval countYTD = toNumber(monthCount) + toNumber(TOTAL_COUNT_YTD)
| table countYTD
This query doesn't return any value in the table. TOTAL_COUNT_YTD is the only field in the inputlookup file. Let me know if there is any other information you need to help me out with this one. Thanks!
The stats command transforms the data so it has only one field: monthCount. The inputlookup returns only the TOTAL_COUNT_YTD field. The join command works by comparing values of common fields between the main search and the subsearch. Since there are no common fields, no events are joined.
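If you did want to keep join, one option (just a sketch, assuming the lookup holds a single row) is to manufacture a common key on both sides:
index=*** source=***
| stats count as monthCount
| eval joinkey=1
| join joinkey
    [ | inputlookup Log_Count_YTD.csv
    | eval joinkey=1 ]
| eval countYTD = monthCount + TOTAL_COUNT_YTD
| table countYTD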
There is no need for join in this case. The appendcols command will do, assuming the CSV contains a single field in a single row.
index=*** source=***
| stats count as monthCount
| appendcols
[ | inputlookup Log_Count_YTD.csv]
| eval countYTD = toNumber(monthCount) + toNumber(TOTAL_COUNT_YTD)
| table countYTD
FWIW, the tonumber function is unnecessary, but doesn't hurt.
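As a quick illustration of what appendcols does, here is a self-contained sketch with made-up numbers (500 and 12000 are placeholders for the real stats result and lookup row):
| makeresults
| eval monthCount=500
| appendcols
    [ | makeresults
    | eval TOTAL_COUNT_YTD=12000 ]
| eval countYTD = monthCount + TOTAL_COUNT_YTD
| table monthCount TOTAL_COUNT_YTD countYTD
The subsearch's columns are glued onto the single row from the main search, so the eval sees both fields on one result.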

Filtering out holidays in Splunk

I am attempting to use a search lookup table csv to filter out holidays for some Splunk queries.
To do this, I created a holidays.csv in the following style:
dateof,dateafter,description
01/17/2022,01/18/2022,MLK Day 22
02/21/2022,02/22/2022,Presidents Day 22
05/30/2022,05/31/2022,Memorial Day 22
[...]
Some of the queries run the day after the holiday, which is why I created both a dateof and a dateafter field. I am trying to pipe the condition onto the end of the existing queries.
environment=staging "message=\"This line would contain the original search query"
| eval date=strftime(_time,"%m/%d/%Y")
| search NOT [ inputlookup holidays.csv | fields dateof ]
Note that an Event from the original query will look something like this:
time=2022-08-31T12:01:39,495Z [...] message="This line would contain the original search query"
Despite giving read access to the csv, the above condition does not filter out anything, regardless of whether it is a holiday listed in the csv file or not.
I suspect something is missing. However, I have limited knowledge of the Splunk query language. Would anyone be able to give guidance on this? Thanks in advance!
Subsearches can be tricky. If the result of the subsearch isn't just right, the query will fail. It helps to run the subsearch by itself to confirm the string produced makes sense as part of a query. In this case, check that
| inputlookup holidays.csv | fields dateof | rename dateof as date | format
produces something that works with search NOT.
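With the sample lookup rows above, the formatted subsearch should come back as one search string, roughly:
( ( date="01/17/2022" ) OR ( date="02/21/2022" ) OR ( date="05/30/2022" ) )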
An alternative to try is to explicitly look for the date field in the lookup.
environment=staging "message=\"This line would contain the original search query"
| eval date=strftime(_time,"%m/%d/%Y")
| search NOT [ | inputlookup holidays.csv | fields dateof | rename dateof as date | format ]
Here is another way to do it without a subsearch. A null description field tells us the date was not found in the lookup file and so is not a holiday.
environment=staging "message=\"This line would contain the original search query"
| eval date=strftime(_time,"%m/%d/%Y")
| lookup holidays.csv dateof as date output description
| where isnull(description)

Splunk - Lookup values + static search string = output with count

I want to perform a search where I need to use a static search string + input from a csv file with usernames:
Search query-
index=someindex host=host*p* "STATIC_SEARCH_STRING"
Values from users.csv, where the list is like this (please note that User/UserList is NOT a field in my Splunk data):
**UserList**
User1
User2
User3
.
.
UserN
I have tried multiple queries, one of them being:
| inputlookup users.csv | join [search index=someindex host=host*p* "STATIC_SEARCH_STRING"] | lookup users.csv UserList OUTPUT UserList as User| stats count by User
The above one just outputs the list of users with a count of '1', which I assume it is getting from the lookup table itself.
When I try searching events for a single user, like
index=someindex host=host*p* "User1" "STATIC_SEARCH_STRING", I get hundreds of events for that user.
Can someone please help me with this?
Sorry if this is a noob question, I have been trying to learn splunk in order to reduce my workload and am stuck here.
Thanks in advance!
index=someindex host=host*p* "STATIC_SEARCH_STRING" [ | inputlookup users.csv | fields UserList | rename UserList as query]
What is happening here is that there is a sub-search, which does an inputlookup on the users.csv file. We then use fields to ensure there is only a single field (UserList) in the data. We then rename that field to query. This is a special field in sub-searches; when the sub-search returns the field query, it is expanded out into the expression (field_value_1) OR (field_value_2) OR ....
This expression is then appended to the original search string, so the final search that Splunk executes is index=someindex host=host*p* "STATIC_SEARCH_STRING" ("alice") OR ("bob") OR ("charlie")
This approach is outlined at https://docs.splunk.com/Documentation/Splunk/8.0.3/Search/Changetheformatofsubsearchresults
You can also look at the Splunk format command, https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Format if you need to alter the sub-search's expression format, for example, adding * around each returned expression.
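For example, if the usernames appear mid-string in the events and need wildcards, one possible sketch (using eval rather than format to do the wrapping) is:
index=someindex host=host*p* "STATIC_SEARCH_STRING"
    [ | inputlookup users.csv
    | fields UserList
    | eval UserList="*" . UserList . "*"
    | rename UserList as query ]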
I think you're doing the search inside out.
What I think you may want is the following:
index=ndx sourcetype=srctp host=host*p* User=*
| search
[| inputlookup users.csv ]
| stats count by User
If I understand your question correctly, you want to use the values in your lookup as a filter on the data (i.e., only where User is in that list).
If that is the case, the above will do just that.
If you need to make the fieldnames match because the lookup table has a different name, change the subsearch to the following:
[| inputlookup users.csv
| rename lookup_field_name as User ]

Adding dedup _raw before timechart returns 0 results

I apologize if this has been asked already, but I searched to no avail.
When writing a Splunk query that will eventually be used for summary indexing using sitimechart, I have this query:
index=app sourcetype=<removed> host=<removed> earliest=-10d
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",1,0)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
Results are as expected. However, some data was accidentally indexed twice, so I need to remove duplicates. If I'm doing a regular search, I just use | dedup _raw to remove the identical events. However, if I run the following query, I get zero results returned (no matter where I put | dedup _raw):
index=app sourcetype=<removed> host=<removed> earliest=-10d
| dedup _raw
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",1,0)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
What am I doing wrong? I'm using Splunk 4.3.2.
I've also posted this identical question to the Splunk>Answers page. I will remove it if that is against the terms.