How do I create a Splunk query for unused event types? - splunk

I have found that I can create a Splunk query to show how many times results of a certain event type appear in results.
severity=error | stats count by eventtype
This creates a table like so:
eventtype | count
------------------------
myEventType1 | 5
myEventType2 | 12
myEventType3 | 30
So far so good. However, I would like to find event types with zero results. Unfortunately, those with a count of 0 do not apear in the query above, so I can't just filter by that.
How do I create a Splunk query for unused event types?

There are lots of different ways for that, depending on what you mean by "event types". Somewhere, you have to get a list of whatever you are interested in, and roll them into the query.
Here's one version, assuming you had a csv that contained a list of eventtypes you wanted to see...
severity=error
| stats count as mycount by eventtype
| inputcsv append=t mylist.csv
| eval mycount=coalesce(mycount,0)
| stats sum(mycount) as mycount by eventtype
Here's another version, assuming that you wanted a list of all eventtypes that had occurred in the last 90 days, along with the count of how many had occurred yesterday:
earliest=-90d#d latest=#d severity=error
| addinfo
| stats count as totalcount count(eval(_time>=info_max_time-86400)) as yesterdaycount by eventtype

Related

Extracting a count from raw splunk data by id

I am trying to get a count from transactional information that is retained within raw data in splunk. I have 3-5 transactions that occur.
One has raw data stating: pin match for id 12345678-1234-1234-abcd-12345678abcd or pin mismatched for id etc.
I'm trying to count the number of times the pin match occurs within the transaction time window of 180sec.
I was trying to do something like:
|eval raw=_raw |search index=transa
|eval pinc= if((raw like "%pin match%"),1,0) |stats count(pinc) as Pincount by ID
The issue I'm having is it is counting cumulatively over whatever time I am looking at those transactions. Is there a way to attach it to the ID that is within the message or have it count every one that occurs within that time window?
Thanks!
Presuming the pin status and ID have not been extracted:
index=ndx sourcetype=srctp "pin" "match" OR "mismatched"
| rex field=_raw "pin (?<pin_status>\w+)"
| rex field=_raw "id (?<id>\S+)"
| eval status_time=pin_status+"|"+_time
| stats earliest(status_time) as beginning latest(status_time) as ending by id
| eval beginning=split(beginning,"|"), ending=split(ending,"|")
| eval begining=mvindex(beginning,-1), ending=mvindex(ending,-1)
| table id beginning ending
| sort 0 id
| eval beginning=strftime(beginning,"%c"), ending=strftime(ending,"%c")
After extracting the status ("match" or "mismatched") and the id, append the individual event's _time to the end of the status - we'll pull that value back out after statsing
Using stats, find the earliest and latest status_time entries (fields just created on the previous line) by id, saving them into new fields beginning and ending
Next, split() beginning and ending on the pipe we added to separate the status from the timestamp into a multivalue field
Then assign the last item from the multivalue field (which we know is the timestamp) into itself (because we know that the earliest entry for a status_time should always be "match", and the latest entry for a status_time should always be "mismatched")
Lastly, table the id and time stamps, sort by id, and format the timestamp into something human readable (strftime takes many arguments, %c just happens to be quick)

Selecting limited results from two tables

I apologise if this has been asked before. I'm still not certain how to phrase my question for the title, so wasn't sure what to search for.
I have a hundred or so databases in the same instance, one for each of my customers, named for the customer, and they all have the same structure. I want to select a single result set that includes the database name along with the most recent date entry in one of the tables. I can pull the database names from sys.databases, but then for each database I want to select the most recent date from Events.Date_Logged so that my result set looks something like this:
_______________________________
| | |
|Cust_Name |Latest_Event |
|_______________|_______________|
| | |
|Customer1 |01/02/2020 |
|_______________|_______________|
| | |
|Customer2 |02/02/2020 |
|_______________|_______________|
| | |
|Customer3 |03/02/2020 |
|_______________|_______________|
I'm really struggling with the syntax though. I either get just a single row returned or every single event for each customer. I think my joins are as rusty as hell.
Any help would be appreciated.
What I suggest you do:
Declare a result variable (of type table)
Use a cursor to go over every database
Inside the cursor: do a select top 1 ... order by date desc to get the most recent record. Save this result in the result variable.
After the cursor print the result variable.
That should do the trick.

Splunk query to get non matching ID from two query

Index=* sourcetype="publisher" namespace="app_1" | table ID message | where message="published"
Index=* sourcetype="consumer" namespace="app_1" | table ID message | where message="consumed"
I want to display non matching ID by comparing both the query, how can I achieve this.
If query 1 is giving 100 records and query 2 is giving 90 records and all 90 records are present in query 1 them I want to see 10 records which are not present in query 2.
There are several ways to achieve this outcome.
The following counts the number of consumer and producers events for each ID, then shows the IDs of the events that occur only once.
index=* sourcetype="publisher" OR sourcetype="consumer" namespace="app_1" ID="*" | stats count by ID | where count<2
In this next method, we use a sub search and a join. This has the benefit of giving you the full event, not just the ID.
index=* sourcetype="publisher" namespace="app_1" ID="*" | join type=outer ID [ search index=* sourcetype="consumer" namespace="app_1" ID="*" | eval does_match=1 ] | where isnull(does_match)

Stats Count Splunk Query

I wonder whether someone can help me please.
I'd made the following post about Splunk query I'm trying to write:
https://answers.splunk.com/answers/724223/in-a-table-powered-by-a-stats-count-search-can-you.html
I received some great help, but despite working on this for a few days now concentrating on using eval if statements, I still have the same issue with the "Successful" and "Unsuccessful" columns showing blank results. So I thought I'd cast the net a little wider and ask please whether someone maybe able to look at this and offer some guidance on how I may get around the problem.
Many thanks and kind regards
Chris
I tried exploring your use-case with splunkd-access log and came up with a simple SPL to help you.
In this query I am actually joining the output of 2 searches which aggregate the required results (Not concerned about the search performance).
Give it a try. If you've access to _internal index, this will work as is. You should be able to easily modify this to suit your events (eg: replace user with ClientID).
index=_internal source="/opt/splunk/var/log/splunk/splunkd_access.log"
| stats count as All sum(eval(if(status <= 303,1,0))) as Successful sum(eval(if(status > 303,1,0))) as Unsuccessful by user
| join user type=left
[ search index=_internal source="/opt/splunk/var/log/splunk/splunkd_access.log"
| chart count BY user status ]
I updated your search from splunk community answers (should look like this):
w2_wmf(RequestCompleted)`request.detail.Context="*test"
| dedup eventId
| rename request.ClientID as ClientID detail.statusCode AS statusCode
| stats count as All sum(eval(if(statusCode <= 303,1,0))) as Successful sum(eval(if(statusCode > 303,1,0))) as Unsuccessful by ClientID
| join ClientID type=left
[ search w2_wmf(RequestCompleted)`request.detail.Context="*test"
| dedup eventId
| rename request.ClientID as ClientID detail.statusCode AS statusCode
| chart count BY ClientID statusCode ]
I answered in Splunk
https://answers.splunk.com/answers/724223/in-a-table-powered-by-a-stats-count-search-can-you.html?childToView=729492#answer-729492
but using dummy encoding, it looks like
w2_wmf(RequestCompleted)`request.detail.Context="*test"
| dedup eventId
| rename request.ClientId as ClientID, detail.statusCode as Status
| eval X_{Status}=1
| stats count as Total sum(X_*) as X_* by ClientID
| rename X_* as *
Will give you ClientID, count and then a column for each status code found, with a sum of each code in that column.
As I gather you can't get this working, this query should show dummy encoding in action
`index=_internal sourcetype=*access
| eval X_{status}=1
| stats count as Total sum(X_*) as X_* by source, user
| rename X_* as *`
This would give an output of something like

Cumulative average number of records created for specific day of week or date range

Yeah, so I'm filling out a requirements document for a new client project and they're asking for growth trends and performance expectations calculated from existing data within our database.
The best source of data for something like this would be our logs table as we pretty much log every single transaction that occurs within our application.
Now, here's the issue, I don't have a whole lot of experience with MySql when it comes to collating cumulative sum and running averages. I've thrown together the following query which kind of makes sense to me, but it just keeps locking up the command console. The thing takes forever to execute and there are only 80k records within the test sample.
So, given the following basic table structure:
id | action | date_created
1 | 'merp' | 2007-06-20 17:17:00
2 | 'foo' | 2007-06-21 09:54:48
3 | 'bar' | 2007-06-21 12:47:30
... thousands of records ...
3545 | 'stab' | 2007-07-05 11:28:36
How would I go about calculating the average number of records created for each given day of the week?
day_of_week | average_records_created
1 | 234
2 | 23
3 | 5
4 | 67
5 | 234
6 | 12
7 | 36
I have the following query which makes me want to murderdeathkill myself by casting my body down an elevator shaft... and onto some bullets:
SELECT
DISTINCT(DAYOFWEEK(DATE(t1.datetime_entry))) AS t1.day_of_week,
AVG((SELECT COUNT(*) FROM VMS_LOGS t2 WHERE DAYOFWEEK(DATE(t2.date_time_entry)) = t1.day_of_week)) AS average_records_created
FROM VMS_LOGS t1
GROUP BY t1.day_of_week;
Halps? Please, don't make me cut myself again. :'(
How far back do you need to go when sampling this information? This solution works as long as it's less than a year.
Because day of week and week number are constant for a record, create a companion table that has the ID, WeekNumber, and DayOfWeek. Whenever you want to run this statistic, just generate the "missing" records from your master table.
Then, your report can be something along the lines of:
select
DayOfWeek
, count(*)/count(distinct(WeekNumber)) as Average
from
MyCompanionTable
group by
DayOfWeek
Of course if the table is too large, then you can instead pre-summarize the data on a daily basis and just use that, and add in "today's" data from your master table when running the report.
I rewrote your query as:
SELECT x.day_of_week,
AVG(x.count) 'average_records_created'
FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
COUNT(*) 'count'
FROM VMS_LOGS t
GROUP BY DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week
The reason why your query takes so long is because of your inner select, you are essentialy running 6,400,000,000 queries. With a query like this your best solution may be to develop a timed reporting system, where the user receives an email when the query is done and the report is constructed or the user logs in and checks the report after.
Even with the optimization written by OMG Ponies (bellow) you are still looking at around the same number of queries.
SELECT x.day_of_week,
AVG(x.count) 'average_records_created'
FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
COUNT(*) 'count'
FROM VMS_LOGS t
GROUP BY DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week