How to check if Splunk has received logs from 100 different hosts

I am new to Splunk. I want to create a Splunk alert that checks whether logs have been received from all hosts and, if not, triggers an alert.
| tstats latest(_time) as latest where index=* earliest=-24h by host
| eval recent = if(latest > relative_time(now(),"-5m"),1,0), realLatest = strftime(latest,"%c")
| where recent=0
Is the above Splunk query correct?

The query looks good, but the best way to know is to try it. Does it produce the desired results?
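One caveat: tstats can only report hosts that sent something during the search window, so a host that has been silent for the entire 24 hours will not show up at all. A common pattern is to compare against a lookup of expected hosts. Here is a minimal sketch, assuming a hypothetical lookup file expected_hosts.csv with a host column:
| inputlookup expected_hosts.csv
| fields host
| join type=left host
    [| tstats latest(_time) as latest where index=* earliest=-24h by host]
| eval recent = if(latest > relative_time(now(), "-5m"), 1, 0)
| where isnull(latest) OR recent=0
Hosts with a null latest never reported in the last 24 hours; recent=0 flags hosts that reported earlier but have gone quiet in the last five minutes.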

Related

Kusto memory status for an operation id

I executed the following control command
.set-or-append async XXXX <| fillXXXX()
This returned an operation id.
Now I want to check how much CPU/memory (query stats) was used for this operation id.
How can we do that?
When you run the command, you also get the ClientRequestId, and that's what you should use to find the resources used to run the command (it appears as ClientActivityId in .show commands):
.show commands
| where StartedOn > ago(1d)
| where ClientActivityId == "KE.RunCommand;9763ec24-910c-4e86-a823-339db440216e"
| where CommandType == "TableSetOrAppend"
| project ResourcesUtilization
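If all you have is the operation id itself, .show operations can at least confirm the state and duration of the async command; the resource figures still come from .show commands as above. A minimal sketch, where the GUID is a placeholder for the operation id your command returned:
// the GUID below is a placeholder; substitute the operation id from .set-or-append async
.show operations 00000000-0000-0000-0000-000000000000
| project OperationId, State, Status, Duration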

Is it possible to create log source health alerts in Azure Sentinel?

I am attempting to create an alert that lets me know if a data source stops providing logs to Sentinel. While I know it displays anomalies in log data on the dashboard, I am hoping to receive alerts if a source stops providing logs for an extended period of time.
Something like creating a rule with the following query (CEF in this case):
CommonSecurityLog
| where TimeGenerated > ago(24h)
| summarize count() by DeviceVendor, DeviceProduct, DeviceName, DeviceExternalID
| where count_ == 0
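One caveat with that query: summarize only emits rows for devices that actually logged something in the window, so count_ == 0 can never match. A variant that looks back over a longer window and flags devices whose last event is older than the threshold may work better. A sketch, with the 7-day lookback and 24-hour staleness threshold as assumptions to tune:
CommonSecurityLog
| where TimeGenerated > ago(7d)
| summarize LastSeen = max(TimeGenerated) by DeviceVendor, DeviceProduct, DeviceName, DeviceExternalID
| where LastSeen < ago(24h)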

How to differentiate messages with pattern /bank/*/accounts/ vs /bank/4/accounts/a1 in a Splunk query?

I am trying to write a Splunk query to monitor messages grouped by the API endpoints they belong to.
I have 2 endpoints to differentiate from:
/bank/*/accounts/
/bank/*/accounts/a1-b2-c3
My sample messages look as follows:
2019-07-15 11:42:10 [INFO] method='GET' path='/bank/4/accounts/' status='200'
2019-07-15 11:44:10 [INFO] method='GET' path='/bank/4/accounts/a1-b2-c3' status='200'
When I use the following Splunk query, I get messages belonging to both endpoints.
index=my_index host=my_host GET /bank/*/accounts/ | rex field=_raw "path=(?<path>.*)"
I tried appending the following command to the query, but was not successful in isolating the results:
| rex field=_raw ".*/accounts/(?<accountid>\w+)"
The rex command extracts data from events, it does not filter them. To filter events using regular expressions, use the regex command.
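For example, something along these lines should keep only events whose path ends in an account id (a sketch; the trailing pattern assumes ids shaped like a1-b2-c3):
index=my_index host=my_host GET /bank/*/accounts/
| rex field=_raw "path='(?<path>[^']+)'"
| regex path="/bank/\d+/accounts/\w+-\w+-\w+$"
Dropping the regex line, or inverting it with !=, would give the other endpoint.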
I found that a Splunk query with *-* did the trick. Complete query:
index=my_index host=my_host GET /bank/*/accounts/*-*
I liked the solution by RichG, but didn't investigate the exact regex usage for my case, as I had already found a solution to my problem.

Azure Monitor Alert based on Custom Metric

I've created a custom metric to monitor free disk space on C: on my Azure VM.
But when I try to create an alert rule (not classic), I can't find my custom metric in the options list. I suspect this is because I'm using the new alert rules instead of the classic rules.
Has someone succeeded to create a new alert rule based on a custom metric?
Using a query gives me the output, but I don't know where this info comes from (VM extension? Diagnostic logs?):
Perf
| where TimeGenerated >ago(1d)
| where CounterName == "% Free Space" and ObjectName == "LogicalDisk" and InstanceName == "C:" and CounterValue > 90
| sort by TimeGenerated desc
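For what it's worth, the Perf table is typically populated by the Log Analytics agent, using the performance counters configured on the workspace, rather than by the diagnostics extension. Until the custom metric appears in the metric rule picker, a log-based alert rule over the same table can serve as a workaround; a sketch, with the 15-minute window and 10% threshold as assumptions to tune:
Perf
| where TimeGenerated > ago(15m)
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space" and InstanceName == "C:"
| summarize AvgFreePct = avg(CounterValue) by Computer
| where AvgFreePct < 10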

How to build an ongoing alert that catches sudden spikes for a certain http error code?

I could really use an ongoing alert that catches a sudden rise (spike) in a certain error code (such as 404 or 502, etc.).
I tried giving this some thought on how to achieve that, and... Well... I could really use your help with the script :-)
From my understanding, the search query should "know", or "sense", the normal traffic (not sure for how long, maybe 1-2 hours) and alert when there is a spike in the error code compared to 1-2 hours ago.
I think the error code spike threshold should be more than 5% of total traffic, while occurring for longer than 90 seconds.
Here is the Splunk query I use today; I'd appreciate your help tuning it to what I described above:
tag=NginxLogs host=www1 OR host=www2
| stats count by status
| eventstats sum(count) as total
| eval perc=round((count/total)*100,2)
| where status="404" AND perc>5
The top command automatically provides the count and percent.
http://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/Top
tag=NginxLogs host=www1 OR host=www2
| top status
| search percent > 5 AND status > 399
If you have the URL, HTTP request method, and user in your Splunk logs, you can add them to this alert. Example:
tag=NginxLogs host=www1 OR host=www2
| eventstats distinct_count(userid) as NoOfUsersAffected by requestUri,status,httpmethod
| top status,httpmethod,NoOfUsersAffected by requestUri
| search NoOfUsersAffected > 2 AND ((status > 499 AND percent > 5) OR (status = 400 AND percent > 95))
You can use the following alert message:
$result.percent$ % ($result.count$ calls) has StatusCode $result.status$ for
$result.requestUri$ - $result.httpmethod$.
$result.NoOfUsersAffected$ users were affected
You will get an alert like:
21.19 % (850 calls) has StatusCode 500 for https://app.test.com/hello - GET.
90 users were affected
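For the "compare to the last 1-2 hours" part of the question, streamstats can build a rolling baseline of the error percentage and flag buckets that deviate from it. A rough sketch, assuming 5-minute buckets and defining a spike as more than double the trailing one-hour baseline while exceeding 5% of traffic:
tag=NginxLogs host=www1 OR host=www2 earliest=-2h
| bin _time span=5m
| stats count as total count(eval(status="404")) as errors by _time
| eval perc = round(errors / total * 100, 2)
| streamstats current=f window=12 avg(perc) as baseline
| where perc > 5 AND perc > 2 * baseline
The 90-second duration condition could then be approximated in the alert's trigger settings, or by requiring consecutive flagged buckets.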