I am trying to create a Splunk alert that will be triggered if two events do not occur in a certain time window. The two events will be linked by a GUID and there may be multiple events occurring with different GUIDs simultaneously.
Can someone indicate where to start?
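There are probably a few ways to do this. The worst is the transaction command, because it is very slow.
Try using stats to find the time span between each GUID's events and alert on those that take too long (more than 5 minutes in this example). Note that a GUID with only one event gets a duration of 0, so this catches slow pairs but not missing ones; adding count to the stats and testing for count < 2 covers that case.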
... | stats range(_time) as duration by GUID | where duration > 300
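To make the logic concrete outside Splunk, here is a minimal Python sketch of the same idea: group timestamps by GUID, take max minus min (what range(_time) computes), and flag GUIDs that are too slow or have only one event. The function name and the sample data are made up for illustration.

```python
from collections import defaultdict

def slow_or_incomplete(events, max_seconds=300):
    """Return {guid: reason} for GUIDs whose events are too far apart or unpaired.

    `events` is an iterable of (guid, epoch_seconds) tuples.
    """
    times = defaultdict(list)
    for guid, ts in events:
        times[guid].append(ts)

    flagged = {}
    for guid, stamps in times.items():
        duration = max(stamps) - min(stamps)  # same idea as range(_time)
        if len(stamps) < 2:
            flagged[guid] = "missing second event"
        elif duration > max_seconds:
            flagged[guid] = "too slow"
    return flagged

events = [
    ("a", 1000), ("a", 1100),  # 100 s apart: fine
    ("b", 1000), ("b", 1400),  # 400 s apart: too slow
    ("c", 1000),               # only one event seen
]
print(slow_or_incomplete(events))
# {'b': 'too slow', 'c': 'missing second event'}
```

In a real alert, a lone event near the end of the search window may simply not have its partner yet, so the search window should end a little in the past.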
First Event
06:09:17:362 INFO com.x.y.ConnApp - Making a GET Request
Second Event
06:09:17:480 INFO com.a.b.Response - Output Status Code: 200
Now I want to calculate the duration between these two events for every request. I went over the solutions on Splunk Answers and Stack Overflow, but still can't get the proper result.
The easy answer is the transaction command, although it has a couple of drawbacks. The first is that the command can be a resource hog. The other is that it can be "greedy", in that multiple requests might be taken to be a single transaction. We'll take care of the second issue with the maxevents option. There's not much we can do about the first except avoid using transaction.
index=foo ("Making a GET Request" OR "Output Status Code:")
| transaction maxevents=2 startswith="Making a GET Request" endswith="Output Status Code:"
| table duration
Another option uses the streamstats command to calculate the difference between adjacent events. This should perform better than transaction.
index=foo ("Making a GET Request" OR "Output Status Code:")
| streamstats window=2 range(_time) as duration
``` Erase the duration field for start events. ```
| eval duration = if(searchmatch("Making a GET Request"),"", duration)
| table _raw duration
Both queries assume the start and end events for different requests are not intermingled.
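As a sanity check on what either query should return for the two sample events, here is a small Python sketch that pairs each start line with the next end line and subtracts the timestamps. It assumes the log lines begin with an HH:MM:SS:mmm timestamp as in the question, and, like the queries, that start and end events are not intermingled.

```python
from datetime import datetime

START = "Making a GET Request"
END = "Output Status Code:"

def parse_time(line):
    """Parse the leading HH:MM:SS:mmm timestamp of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%H:%M:%S:%f")

def pair_durations(lines):
    """Yield the duration in seconds for each start event followed by an end event."""
    started_at = None
    for line in lines:
        if START in line:
            started_at = parse_time(line)
        elif END in line and started_at is not None:
            yield (parse_time(line) - started_at).total_seconds()
            started_at = None  # wait for the next start event

log = [
    "06:09:17:362 INFO com.x.y.ConnApp - Making a GET Request",
    "06:09:17:480 INFO com.a.b.Response - Output Status Code: 200",
]
print(list(pair_durations(log)))  # [0.118]
```

So for the sample pair the expected duration is 118 milliseconds.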
With the current log messages, it will be tricky to group the events that come from the same request (imagine multiple calls that generate successive "Making a GET Request" messages).
In this case, I suggest adding a correlation ID to every log message.
Then you can identify exactly which messages were triggered by the same request.
This involves a change to the app's logging (you can search for the following: log4j, MDC, Sleuth).
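The app in the question looks like Java, where MDC would be the usual tool, but the idea is language-independent. Purely as an illustration, this is roughly what it looks like with Python's standard logging module: a context variable holds the current request's ID and a filter stamps it onto every record. The logger name, the format string, and the helper names are all invented for the sketch.

```python
import contextvars
import io
import logging
import uuid

# Holds the correlation ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

def build_logger(stream):
    """Create a logger that writes '[<correlation id>]' in every line."""
    logger = logging.getLogger("conn_app_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s [%(correlation_id)s] - %(message)s")
    )
    handler.addFilter(CorrelationFilter())
    logger.handlers = [handler]
    return logger

def handle_request(logger):
    """Log a start and an end event that share one freshly generated ID."""
    request_id = str(uuid.uuid4())
    token = correlation_id.set(request_id)
    try:
        logger.info("Making a GET Request")
        logger.info("Output Status Code: 200")
    finally:
        correlation_id.reset(token)
    return request_id

buffer = io.StringIO()
request_id = handle_request(build_logger(buffer))
print(buffer.getvalue())
```

Once both lines carry the same ID, Splunk can extract it as a field and group on it with stats instead of relying on event order.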
What's the most efficient way to perform the following search?
Event occurs on index A at X time
Take X time and use it as a start point in index B
Search all occurrences of a field within index B, with additional filters, 5 minutes after that initial event time that occurred from index A
Example using Windows logs: after every successful login via event ID 4624 (index="security") for a particular user on a host, search all Sysmon event ID 1 (index="sysmon") process creation events on that specific host that occurred in a 5 minute window after the login event. My vision is to examine user logins on a particular host and correlate subsequent process creation events over a short period of time.
I've been trying to play with join, stats min(_time), and eval starttimeu, but haven't had any success. Any help/pointers would be greatly appreciated!
Have you tried map? The map command runs a search for each result of another search. For example:
index=security sourcetype=wineventlog EventCode=4624
```Set the latest time for the map to event time + 5 minutes (300 seconds)```
| eval latest=_time+300
| map search="search index=sysmon host=$host$ earliest=$_time$ latest=$latest$"
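Names wrapped in $ are field names from the main search; map substitutes each result's values into the inner search. Note that map runs at most 10 searches by default, so set maxsearches if you expect more logins than that.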
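If it helps to see what map is doing conceptually, here is a rough Python equivalent: for every login, compute a window from the login time to login time + 300 seconds and keep the process events on the same host inside it. The dictionary keys and sample values are hypothetical, and treating the upper bound as exclusive is an assumption about how latest behaves.

```python
def processes_after_logins(logins, processes, window_seconds=300):
    """For each login, collect process events on the same host within the window.

    `logins` and `processes` are lists of dicts with "host" and "time" (epoch seconds).
    """
    results = []
    for login in logins:
        earliest = login["time"]
        latest = earliest + window_seconds  # mirrors: eval latest=_time+300
        matched = [
            p for p in processes
            if p["host"] == login["host"] and earliest <= p["time"] < latest
        ]
        results.append((login, matched))
    return results

logins = [{"host": "ws1", "user": "alice", "time": 1000}]
processes = [
    {"host": "ws1", "image": "cmd.exe", "time": 1100},    # inside the window
    {"host": "ws1", "image": "late.exe", "time": 1400},   # after the window
    {"host": "ws2", "image": "other.exe", "time": 1050},  # different host
]
for login, matched in processes_after_logins(logins, processes):
    print(login["user"], [p["image"] for p in matched])
```

The nested loop is also why map gets expensive: one inner search runs per login event.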
I am building an event reminder page where people can set a reminder for certain events. There is an option for the user to set the amount of time before the event at which they want to be notified. It is stored in notification_time and notification_unit. notification_time keeps track of the amount of time before the event, and notification_unit keeps track of the PHP date format character for the unit they selected, e.g. i for minutes, H for hours.
E.g. notification_time = 2 and notification_unit = H means they need to be notified 2 hours before.
I have Cron jobs running in the background for handling the notification. This function is being hit once every minute.
Reminder::where(function ($query) {
    $query->where('event_time', '>=', now()->subMinutes(Carbon::createFromFormat('i', 60)->diffInMinutes() - 1)->format('H:i:s'));
    $query->where('event_time', '<=', now()->subMinutes(Carbon::createFromFormat('i', 60)->diffInMinutes())->format('H:i:s'));
})
In this function, I am hard-coding 'i' and 60, while they should be fetched from the database (notification_unit and notification_time). event_time is also a column in the same table.
The table looks something like this -
id event_time ... notification_unit notification_time created_at updated_at
Is there any way to solve this issue? Is it possible to do the same logic with SQL instead?
A direct answer to this question is not possible. I found 2 ways to resolve my issue.
First solution
MySQL has DATEDIFF and DATE_SUB to get the difference between dates and to subtract an interval from a timestamp (DATEDIFF only counts whole days; TIMESTAMPDIFF handles smaller units). In my case, the function runs every minute. To use them, I would have to refactor my database to store the notification offset in seconds, then do the calculation in the query. I chose not to go this way because both operations are a bit heavy on the server side when the function runs every minute.
Second Solution
This is the solution that I personally used. Here I did the calculation when storing the reminder in the database. Meaning? Let me explain. I created a new table notification_settings which is linked to the reminder (one-to-one relation). The table looks like this:
id, unit, time, notify_at, repeating, created_at, updated_at
The unit and time columns are only used while displaying the reminder. What I did is calculate when the user should be notified and store it in the notify_at column. So in the event scheduler, I only need to check for reminders that are due at present (since I am running it every minute). The repeating column is there to keep track of whether the reminder is repeating or not. If it is repeating, I re-calculate the notify_at column at the time of scheduling. Once the user is notified, notify_at is set to null.
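The precompute-on-write idea is not specific to Laravel, so here is a minimal sketch of it in Python. It assumes the unit is stored as the PHP date letter from the question ("i" or "H"); the "d" entry and both function names are additions for illustration only.

```python
from datetime import datetime, timedelta

# Maps the stored PHP date-format letter to a timedelta keyword.
# Only "i" and "H" appear in the question; "d" for days is an added assumption.
UNIT_TO_KWARG = {"i": "minutes", "H": "hours", "d": "days"}

def compute_notify_at(event_time, time, unit):
    """Work out once, at save time, when the user should be notified."""
    return event_time - timedelta(**{UNIT_TO_KWARG[unit]: time})

def is_due(notify_at, now):
    """True when a reminder still has a notify_at and that moment has arrived."""
    return notify_at is not None and notify_at <= now

event_time = datetime(2024, 1, 1, 12, 0)
notify_at = compute_notify_at(event_time, 2, "H")
print(notify_at)                                        # 2024-01-01 10:00:00
print(is_due(notify_at, datetime(2024, 1, 1, 9, 59)))   # False
print(is_due(notify_at, datetime(2024, 1, 1, 10, 0)))   # True
```

Checking notify_at <= now rather than an exact minute means a reminder is still picked up if one cron run is skipped, which works because notify_at is cleared after notifying.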
I've been looking at a recent event in Splunk with sourcetype WinHostMon, and I see two different values for StartTime and _time:
StartTime="20200427223006.448182-300"
_time is recorded as 2020-04-28T15:38:13.000-04:00
If the last part is timezone, there are two things that are strange about this:
The timezone for StartTime is in the middle of the Atlantic.
The times don't actually match.
Question: What is the actual time of this event, if such a thing can actually be determined, and what is causing the discrepancy between these two times?
(I tried to post this on Splunk Answers but they seem to have a labyrinth to stop people from signing up and I was unable to get an activated account.)
_time is the timestamp of the event, that is, when the event was generated or written to a log file. This is the field Splunk uses for default sorting and rendering in tables and time charts.
For WinHostMon events, most notably Process events, StartTime is when that process started.
Hence, it is not surprising that the two values differ significantly. The process may have started at some point in the past, and the WinHostMon input then generates a list of active processes every 5 minutes or so.
_time is the timestamp of the event as defined in props.conf or, if undefined, whenever Splunk receives the event (as often happens with untagged JSON).
The field StartTime is, so far as I can tell, not related to whatever is populating _time.
If you open the Add-On's props.conf, you'll see how they're defining the timestamp and the field extraction for StartTime.
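On the time zone part of the question: StartTime looks like the WMI/CIM datetime layout (yyyymmddHHMMSS.ffffff followed by a signed three-digit UTC offset), and in that layout the offset is in minutes, not hours and minutes. Assuming that is what the add-on emits, -300 means UTC-5, not the middle of the Atlantic. A small Python sketch to parse it and compare it with _time (the function name is made up):

```python
from datetime import datetime, timedelta, timezone

def parse_cim_datetime(value):
    """Parse 'yyyymmddHHMMSS.ffffff+UUU' / '-UUU', where UUU is the UTC offset in minutes."""
    stamp, sign, offset_minutes = value[:21], value[21], value[22:]
    local = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
    offset = timedelta(minutes=int(offset_minutes))
    if sign == "-":
        offset = -offset
    return local.replace(tzinfo=timezone(offset))

start = parse_cim_datetime("20200427223006.448182-300")
event_time = datetime.fromisoformat("2020-04-28T15:38:13.000-04:00")

print(start.isoformat())   # 2020-04-27T22:30:06.448182-05:00
print(event_time - start)  # how long the process had been running when it was reported
```

Read this way, the process started about 16 hours before the event that reported it, which fits the explanation that StartTime and _time describe two different moments.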
I am relatively new to Splunk and I am trying to create a report that will display a hostname and the number of times that host failed to log in within the past five minutes, when it failed 3 or more times. The only way I was able to get the initial search results I want is to look only within the past 5 minutes, as you can see in my query:
index="wineventlog" EventCode=4625 earliest=-5min | stats count by host,_time | stats count by host | search count > 2
This returns the host and the count. The issue is if I use this query in my report, it can run every five minutes, but the hosts that were listed previously get removed as they no longer are included in the search results.
I found ways to generate logs that I can then search for separately (http://docs.splunk.com/Documentation/Splunk/6.6.2/Alert/LogEvents) but it didn't work the way I expected.
I am looking for an answer to any of these questions that can help me get the intended results:
Can my original search be improved to still only get results where the failed logins were within 5 minutes but be able to search over any time period?
Is there a way to send the results from the query I already have to a report, where the results will not be cleared out when the search is run again?
Is there any other option I haven't considered to achieve the desired result?
If you only care about the last 5 minutes then search only the last 5 minutes. Searching more is just wasting resources.
Consider writing your results to a summary index (using collect) with a scheduled search and have your report/dashboard display values from the summary index.
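On the first question (searching a longer period while still counting per 5 minutes), one way to think about it is fixed 5-minute buckets, which in SPL is what bin _time span=5m gives you. The Python sketch below only shows the counting logic with invented sample data; note that fixed buckets are not a sliding window, so three failures that straddle a bucket boundary are not counted together.

```python
from collections import Counter

def hosts_over_threshold(failures, span_seconds=300, min_failures=3):
    """Return {(host, bucket_start): count} for buckets with enough failed logins.

    `failures` is an iterable of (host, epoch_seconds) pairs, one per failed login.
    """
    counts = Counter(
        (host, ts - ts % span_seconds)  # round each time down to its bucket start
        for host, ts in failures
    )
    return {key: n for key, n in counts.items() if n >= min_failures}

failures = [
    ("ws1", 10), ("ws1", 50), ("ws1", 290),  # three failures in bucket 0
    ("ws1", 310),                            # one failure in bucket 300
    ("ws2", 20), ("ws2", 40),                # only two failures
]
print(hosts_over_threshold(failures))  # {('ws1', 0): 3}
```

Each row that survives the threshold is the kind of result a scheduled search could write to the summary index.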