Log Analytics - Only list PC that have errors in event log if there is no subsequent success log - azure-log-analytics

I am new to log analytics and am trying to get a list of PCs that have failed a process without subsequently succeeding.
In my example, a PC logs error event 360, possibly many times, until the issue is resolved; it then logs success event 300.
I have 2 simple queries that return all the success and error logs:
Success log:
Event
| where EventLog == "Microsoft-Windows-User Device Registration/Admin"
| where EventID == "300"
| project Computer, TimeGenerated, EventLevelName, EventID, RenderedDescription
Error log:
Event
| where EventLog == "Microsoft-Windows-User Device Registration/Admin"
| where EventID == "360"
| project Computer, TimeGenerated, EventLevelName, EventID, RenderedDescription
So I am looking to filter out all computers that have a 300 success log even if they have at some point failed (because they have been fixed).
Thanks.
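One possible approach, offered as a sketch rather than a tested answer: look at both event IDs together, keep only the most recent of them per computer with arg_max, and list the computers whose most recent event is still the 360 error. The Id column is a helper name introduced here, and the toint() cast is an assumption so that the comparison works whether EventID is stored as a number or as a string.

```kusto
Event
| where EventLog == "Microsoft-Windows-User Device Registration/Admin"
// Id is a helper column (not in the original queries); toint() accepts numeric or string input
| extend Id = toint(EventID)
| where Id in (300, 360)
// Keep only the most recent 300/360 event for each computer
| summarize arg_max(TimeGenerated, Id, EventLevelName, RenderedDescription) by Computer
// If the latest event is 360, the computer has failed without succeeding afterwards
| where Id == 360
| project Computer, TimeGenerated, EventLevelName, EventID = Id, RenderedDescription
```

A computer that logged a 300 after its last 360 drops out, because its latest event is the success. A computer that succeeded once and then started failing again stays in the list, which matches "no subsequent success".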

Related

Kusto memory status for an operation id

I executed the following control command
.set-or-append async XXXX<|fillXXXX()
This returned me an operation id
Now I want to check how much CPU/MEMORY usage (Query stats) happened for this operation id.
How can we do that?
When you run the command, you also get the client request ID (shown in the ClientActivityId column of .show commands), and that's what you should use to get the resources used to run the command:
.show commands
| where StartedOn > ago(1d)
| where ClientActivityId == "KE.RunCommand;9763ec24-910c-4e86-a823-339db440216e"
| where CommandType == "TableSetOrAppend"
| project ResourcesUtilization

Is it possible to create log source health alerts in Azure Sentinel?

I am attempting to create an alert that lets me know if a data source stops providing logs to Sentinel. While I know it displays anomalies in log data on the dashboard, I am hoping to receive alerts if a source stops providing logs for an extended period of time.
Something like creating a rule with the following query (CEF in this case):
CommonSecurityLog
| where TimeGenerated > ago(24h)
| summarize count() by DeviceVendor, DeviceProduct, DeviceName, DeviceExternalID
| where count_ == 0
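Note that summarize only emits a row for groups that have at least one record in the time range, so a source that sent nothing in the last 24 hours produces no row at all and count_ == 0 never matches. A sketch of one alternative is to look back over a longer window, record when each source was last seen, and return the sources whose last record is older than the threshold. The 7d lookback and the LastSeen column name are assumptions chosen for illustration.

```kusto
CommonSecurityLog
// Look back further than the alert threshold so that silent sources still have rows
| where TimeGenerated > ago(7d)
| summarize LastSeen = max(TimeGenerated) by DeviceVendor, DeviceProduct, DeviceName, DeviceExternalID
// A source is considered silent if its newest record is more than 24 hours old
| where LastSeen < ago(24h)
```

A source that has been silent for longer than the lookback window disappears from the results again, so the lookback should comfortably exceed the longest outage you want to keep alerting on.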

Query sometimes delivers alerts for services that are running

I am setting up some alerts for Windows services using the code below, but sometimes I get an alert for a service that has the state "Running". We can't see that the service was stopped or restarted during the period. Does anyone have an idea what could be wrong? Or should I change the query to something else?
I want an alert every time the service is stopped so the support team can take action.
ConfigurationData
| project Computer, SvcName, SvcDisplayName, SvcState, TimeGenerated
| where (SvcName =~ "W3SVC")
| project Computer, SvcName, SvcDisplayName, SvcState, TimeGenerated
| where SvcState != "Running"
Update:
There is a potential issue in your query:
Suppose the service stops at 2019/09/06 1:00 PM and you fix it by restarting it, so it is running again at 2019/09/06 2:00 PM. If the query's time range starts at 1:00 PM, it still returns the 1:00 PM record showing the service as stopped, even though the latest state (at 2:00 PM) is Running.
So you should get the latest SvcState by using top 1 by TimeGenerated, which sorts by TimeGenerated in descending order by default.
Please try the code below:
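ConfigurationData
// Filter to the service first, so that top 1 picks the latest W3SVC record
// rather than the latest record of any service
| where SvcName =~ "W3SVC"
| top 1 by TimeGenerated
| project Computer, SvcName, SvcDisplayName, SvcState, TimeGenerated
| where SvcState != "Running"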
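Keep in mind that top 1 returns a single row for the whole result set, so if several computers report this service, only the most recently updated one is checked. A sketch of a per-computer variant, assuming you want one alert row for each computer whose latest record is not Running:

```kusto
ConfigurationData
| where SvcName =~ "W3SVC"
// Keep the most recent record for each computer
| summarize arg_max(TimeGenerated, SvcDisplayName, SvcState) by Computer, SvcName
// Alert only when that most recent state is something other than Running
| where SvcState != "Running"
| project Computer, SvcName, SvcDisplayName, SvcState, TimeGenerated
```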

How to build an ongoing alert that catches sudden spikes for a certain http error code?

I could really use an ongoing alert that catches a sudden rise (spike) in a certain error code (such as 404 or 502 etc...)
I tried giving this some thought on how to achieve that, and... Well... I could really use your help with the script :-)
From my understanding the search query should "know" or, "sense" the normal traffic (not sure for how long, maybe for 1hr, 2hrs) and alert when there is a spike in the error code compared to 1-2 hours ago.
I think the error code spike threshold should be more than 5% of total traffic, while occurring for longer than 90 seconds.
Here is a Splunk Query I use today, I appreciate your help tuning it to what I described above:
tag=NginxLogs host=www1 OR host=www2 |stats count by status|eventstats sum(count) as total|eval perc=round((count/total)*100,2)|where status="404" AND perc>5
The top command automatically provides the count and percent.
http://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/Top
tag=NginxLogs host=www1 OR host=www2
| top status
| search percent > 5 AND status > 399
If you have the URL, HTTP request method and user in your Splunk logs, you can add them to this alert. Example:
tag=NginxLogs host=www1 OR host=www2
| eventstats distinct_count(userid) as NoOfUsersAffected by requestUri,status,httpmethod
| top status,httpmethod,NoOfUsersAffected by requestUri
| search NoOfUsersAffected > 2 AND ((status>499 AND percent > 5) OR (status=400 AND percent > 95))
You can use the following alert message:
$result.percent$ % ($result.count$ calls) has StatusCode $result.status$ for
$result.requestUri$ - $result.httpmethod$.
$result.NoOfUsersAffected$ users were affected
You will get an alert like:
21.19 % (850 calls) has StatusCode 500 for https://app.test.com/hello - GET.
90 users were affected

Team Foundation/SQL/Silverlight creation of new team project failure Timeout

This is a rather convoluted problem, because we are setting up TFS with SQL Reporting running with SilverLight Integration. We followed the horrific path of set-up instructions that range across 3 different servers, and when we finished, we started getting the following error.
This error results from attempting to create a new team project within the project group.
Following its progress on the reports page, we can see it create the folders cleanly, but when it attempts to create the actual reports on the system, it times out. I've checked every other site I could find to try to figure out what went wrong, and nothing suggested has worked. Any help here would be greatly appreciated.
Error/Stack Trace attached below:
2011-01-19T15:54:21 | Module: Engine | Thread: 6 | Running Task "" from Group ""
2011-01-19T15:54:24 | Module: Rosetta | Thread: 19 | Creating folder: /TfsReports/Boeing/admin/Bugs
2011-01-19T15:54:25 | Module: Rosetta | Thread: 19 | Creating folder: /TfsReports/Boeing/admin/Builds
2011-01-19T15:54:26 | Module: Rosetta | Thread: 19 | Creating folder: /TfsReports/Boeing/admin/Project Management
2011-01-19T15:54:27 | Module: Rosetta | Thread: 19 | Creating folder: /TfsReports/Boeing/admin/Tests
2011-01-19T15:54:29 | Module: Rosetta | Thread: 19 | Creating folder: /TfsReports/Boeing/admin/Dashboards
2011-01-19T15:54:30 | Module: Rosetta | Thread: 19 | Creating report: /TfsReports/Boeing/admin/Bugs/Bug Status
---begin Exception entry---
Time: 2011-01-19T15:59:30
Module: Engine
Event Description: TF30162: Task "Populate Reports" from Group "Reporting" failed
Exception Type: Microsoft.TeamFoundation.Client.PcwException
Exception Message: TF30225: Error uploading report 'Bug Status': The operation has timed out
Stack Trace:
at Microsoft.VisualStudio.TeamFoundation.RosettaReportUploader.Execute(ProjectCreationContext context, XmlNode taskXml)
at Microsoft.VisualStudio.TeamFoundation.ProjectCreationEngine.TaskExecutor.PerformTask(IProjectComponentCreator componentCreator, ProjectCreationContext context, XmlNode taskXml)
at Microsoft.VisualStudio.TeamFoundation.ProjectCreationEngine.RunTask(Object taskObj)
-- Inner Exception --
Exception Message: TF30225: Error uploading report 'Bug Status': The operation has timed out (type ReportingUploaderException)
Exception Stack Trace: at Microsoft.TeamFoundation.Client.Reporting.ReportingUploader.UploadReport(XmlNode report)
at Microsoft.TeamFoundation.Client.Reporting.ReportingUploader.HandleCreateReports(XmlNode node)
at Microsoft.TeamFoundation.Client.Reporting.ReportingUploader.Run()
at Microsoft.VisualStudio.TeamFoundation.RosettaReportUploader.Execute(ProjectCreationContext context, XmlNode taskXml)
Inner Exception Details:
Exception Message: The operation has timed out (type WebException)
Exception Stack Trace: at System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(WebRequest request)
at Microsoft.TeamFoundation.Client.TeamFoundationSoapProxy.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
at Microsoft.TeamFoundation.Client.Reporting.ReportingService.CreateReport(String Report, String Parent, Boolean Overwrite, Byte[] Definition, Property[] Properties)
at Microsoft.TeamFoundation.Client.Reporting.ReportingUploader.UploadReport(XmlNode report)
--- end Exception entry ---
2011-01-19T15:59:31 | Module: Engine | Thread: 19 | TF30202: Task "" from Group "" will not be run because a prior task failed.
2011-01-19T15:59:31 | Module: Engine | Thread: 19 | TF30202: Task "SharePointPortal" from Group "Portal" will not be run because a prior task failed.
2011-01-19T15:59:31 | Module: Engine | Thread: 19 | TF30202: Task "" from Group "" will not be run because a prior task failed.
Denis Habib posted this solution to a similar problem. Perhaps you have the same problem:
The problem is with uploading a report to the report server. I think you have the correct permissions since you were able to create the site. The problem may have to do with the security settings on the datasources (TfsOlapReportDS and TfsReportDS) as these are the datasources for the reports.
Please verify the following settings:
Navigate to the reporting site (http:///Reports/Pages/Folder.aspx), click on the TfsOlapReportDS and the TfsReportDS and verify the connection settings for each, specifically the 'Connect using:' section. This is generally set to 'Credentials stored securely in the report server' and a valid username/password is specified. Also, the 'Use as Windows credentials when connecting to the data source' option is checked.