Are there any KQL queries to extract page views and download counts from W3C IIS logs in Azure Log Analytics?

We're trying to extract page views, file download counts, and a list of users from W3C IIS logs. We want to define what counts as a page view, i.e. a user staying on the same page for more than 10 seconds counts as one page view; anything less is not a page view. The W3C logs don't seem to have enough data to extract this. Is this possible with what's already available?
This is the data available to extract the above info from, expressed with the datatable operator:
datatable (TimeGenerated:datetime, csUriStem:string, scStatus:string, csUserName:string, sSiteName :string)
[datetime(2019-04-12T11:55:13Z),"/Account/","302","-","WebsiteName",
datetime(2019-04-12T11:55:16Z),"/","302","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Account/","200","myemail#mycom.com","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Content/site.css","200","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Scripts/modernizr-2.8.3.js","200","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Scripts/bootstrap.js","200","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Content/bootstrap.css","200","-","WebsiteName",
datetime(2019-04-12T11:55:18Z),"/Scripts/jquery-3.3.1.js","200","-","WebsiteName",
datetime(2019-04-12T11:55:23Z),"/","302","-","WebsiteName",
datetime(2019-04-12T11:56:39Z),"/","200","myemail@mycom.com","WebsiteName",
datetime(2019-04-12T11:57:13Z),"/Home/About","200","myemail@mycom.com","WebsiteName",
datetime(2019-04-12T11:58:16Z),"/Home/Contact","200","myemail@mycom.com","WebsiteName",
datetime(2019-04-12T11:59:03Z),"/","200","myemail@mycom.com","WebsiteName"]

I am not sure I got all your requirements right, but here is something to get you started and provide an initial direction.
datatable (TimeGenerated:datetime, csUriStem:string, scStatus:string, csUserName:string, sSiteName :string)
[datetime(2019-04-12T11:55:13Z),"/Account/","302","-","WebsiteName",
datetime(2019-04-12T11:55:16Z),"/","302","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Account/","200","myemail#mycom.com","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Content/site.css","200","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Scripts/modernizr-2.8.3.js","200","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Scripts/bootstrap.js","200","-","WebsiteName",
datetime(2019-04-12T11:55:17Z),"/Content/bootstrap.css","200","-","WebsiteName",
datetime(2019-04-12T11:55:18Z),"/Scripts/jquery-3.3.1.js","200","-","WebsiteName",
datetime(2019-04-12T11:55:23Z),"/","302","-","WebsiteName",
datetime(2019-04-12T11:56:39Z),"/","200","myemail@mycom.com","WebsiteName",
datetime(2019-04-12T11:57:13Z),"/Home/About","200","myemail@mycom.com","WebsiteName",
datetime(2019-04-12T11:58:16Z),"/Home/Contact","200","myemail@mycom.com","WebsiteName",
datetime(2019-04-12T11:59:03Z),"/","200","myemail@mycom.com","WebsiteName"]
| where scStatus !in ('302') // exclude redirects (status 302)
| where csUriStem !startswith '/Scripts' and csUriStem !endswith ".css" // exclude requests under '/Scripts' and .css files
| order by TimeGenerated asc
| summarize t=make_list(TimeGenerated) by csUriStem, csUserName // create a time series of visit events per page and user
| mv-apply t to typeof(datetime) on // run a subquery on each of the series
(
    project isVisit = (t - prev(t)) > 1min // compare with the previous timestamp and check whether more than 1min passed
    | summarize Visits=sum(toint(isVisit))
)
| project csUriStem, csUserName, Visits
See the documentation for the make_list() aggregation function, the prev() window function, and the summarize and mv-apply operators.
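To match the 10-second page-view definition from the question, change the "> 1min" comparison above to "> 10s". For the file download counts, here is a rough, untested sketch along the same lines; the W3CIISLog table name and the list of file extensions that count as downloads are assumptions you should adjust to your environment:
W3CIISLog // Log Analytics table for W3C IIS logs (adjust if yours differs)
| where scStatus == "200"
| where csUriStem matches regex @"\.(pdf|zip|docx|xlsx|csv)$" // extensions you consider downloads
| summarize Downloads = count() by csUriStem, sSiteName
A basic users list can come from the same data, e.g. W3CIISLog | where csUserName != "-" | distinct csUserName.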

Related

How can I put several extracted values from a Json in an array in Kusto?

I'm trying to write a query that returns the vulnerabilities found by the "Built-in Qualys vulnerability assessment" in Log Analytics.
It was all going smoothly: I was getting the values from the properties JSON and turning them into separate strings, but I found that some of the fields have more than one value, and I need to get all of them in a single cell.
My query looks like this right now:
securityresources | where type =~ "microsoft.security/assessments/subassessments"
| extend assessmentKey=extract(@"(?i)providers/Microsoft.Security/assessments/([^/]*)", 1, id), IdAzure=tostring(properties.id)
| extend IdRecurso = tostring(properties.resourceDetails.id)
| extend NomeVulnerabilidade=tostring(properties.displayName),
Correcao=tostring(properties.remediation),
Categoria=tostring(properties.category),
Impacto=tostring(properties.impact),
Ameaca=tostring(properties.additionalData.threat),
severidade=tostring(properties.status.severity),
status=tostring(properties.status.code),
Referencia=tostring(properties.additionalData.vendorReferences[0].link),
CVE=tostring(properties.additionalData.cve[0].link)
| where assessmentKey == "1195afff-c881-495e-9bc5-1486211ae03f"
| where status == "Unhealthy"
| project IdRecurso, IdAzure, NomeVulnerabilidade, severidade, Categoria, CVE, Referencia, status, Impacto, Ameaca, Correcao
Ignore the awkward names of the columns, for they are in Portuguese.
As you can see in the "Referencia" and "CVE" columns, I'm able to extract the value at a specific index of the array, but I want all the links from the whole array.
Without sample input and expected output it's hard to understand exactly what you need, so I'm trying to guess here...
I think that summarize make_list(...) by ... will help you (see the make_list documentation to learn how to use it).
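For illustration, a rough, untested sketch of that approach for the CVE links (the property paths are copied from your query and may need adjusting):
securityresources
| where type =~ "microsoft.security/assessments/subassessments"
| extend assessmentKey = extract(@"(?i)providers/Microsoft.Security/assessments/([^/]*)", 1, id)
| where assessmentKey == "1195afff-c881-495e-9bc5-1486211ae03f"
| extend IdRecurso = tostring(properties.resourceDetails.id), NomeVulnerabilidade = tostring(properties.displayName)
| mv-expand cveEntry = properties.additionalData.cve
| extend CVE = tostring(cveEntry.link)
| summarize CVEs = make_list(CVE) by IdRecurso, NomeVulnerabilidade
This keeps one row per resource/vulnerability, with all the CVE links collected into a single cell.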
If this is not what you're looking for, please delete the question and post a new one with a minimal sample input (using the datatable operator) and the expected output, and we'll gladly help.

Best method to keep lookup file value fresh

Say, I have to monitor users' activities from 3 specific departments: Science, History, and Math.
The goal is to send an alert if any of the users in any of those departments download a file from site XYZ.
Currently, I have a lookup file for all the users from those three departments.
users
----------------------
user1@organization.edu
user2@organization.edu
user3@organization.edu
user4@organization.edu
user5@organization.edu
One problem: users can join, leave, or transfer to another department anytime.
Fortunately, those activities (join and leave) are tracked and they are Splunk-able.
index=directory status=*
-----------------------------------------------
{
"username":"user1#organization.edu",
"department":"Science",
"status":"added"
}
{
"username":"user1#organization.edu",
"department":"Science",
"status":"removed"
}
{
"username":"user2#organization.edu",
"department":"History",
"status":"added"
}
{
"username":"user3#organization.edu",
"department":"Math",
"status":"added"
}
{
"username":"MRROBOT#organization.edu",
"department":"Math",
"status":"added"
}
In this example, assuming I forgot to update the lookup file, I won't get an alert when MRROBOT@organization.edu downloads a file, and at the same time, I will still get an alert when user1@organization.edu downloads a file.
One solution I could think of is to update the lookup manually using the inputlookup and outputlookup commands, like:
| inputlookup users.csv | where users!="user1@organization.edu" | outputlookup users.csv
But I don't think this is an efficient method, especially since it's very likely I will miss a user or two.
Is there a better way to keep the lookup file up to date? I googled around, and one suggestion is to use a cron job with curl to update the list. But I was wondering if there's a simpler or better alternative than that.
Here's a search that should automate the maintenance of the lookup file using the activity events in Splunk.
`comment("Read in the lookup file. Force them to have old timestamps")`
| inputlookup users.csv | eval _time=1, status="added"
`comment("Add in activity events")`
| append [ search index=foo ]
`comment("Keep only the most recent record for each user")`
| stats latest(_time) as _time, latest(status) as status by username
`comment("Throw out users with status of 'removed'")`
| where NOT status="removed"
`comment("Save the new lookup")`
| table username
| outputlookup users.csv
After the append command, you should have a list that looks like this:
user1@organization.edu added
user2@organization.edu added
user3@organization.edu added
user4@organization.edu added
user5@organization.edu added
user1@organization.edu added
user1@organization.edu removed
user2@organization.edu added
user3@organization.edu added
MRROBOT@organization.edu added
The stats command will reduce it to:
user4@organization.edu added
user5@organization.edu added
user1@organization.edu removed
user2@organization.edu added
user3@organization.edu added
MRROBOT@organization.edu added
with the where command further reducing it to:
user4@organization.edu added
user5@organization.edu added
user2@organization.edu added
user3@organization.edu added
MRROBOT@organization.edu added
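With the lookup kept fresh this way, the alert search can pull the user list from the lookup at run time. Here is a minimal sketch; the index and field names for the download events (index=proxy, activity, site, username) are made-up placeholders, since they are not shown in the question:
index=proxy activity="download" site="XYZ"
    [ | inputlookup users.csv | rename users AS username | return 10000 username ]
Schedule this as the alert, and schedule the lookup-maintenance search above to run shortly before it.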

Search using Lookup from a single field CSV file

I have a list of usernames that I have to monitor, and the list is growing every day. I read the Splunk documentation, and it seems like a lookup is the best way to handle this situation.
The goal is for my query to leverage the lookup and print out all the download events from the users in the list.
Sample logs
index=proxy123 activity="download"
{
"machine":"1.1.1.1",
"username":"ABC#xyz.com",
"activity":"download"
}
{
"machine":"2.2.2.2",
"username":"ASDF#xyz.com",
"activity":"download"
}
{
"machine":"3.3.3.3",
"username":"GGG#xyz.com",
"activity":"download"
}
Sample Lookup (username.csv)
users
ABC@xyz.com
ASDF@xyz.com
BBB@xyz.com
Current query:
index=proxy123 activity="download" | lookup username.csv users OUTPUT users | where not isnull(users)
Result: 0 (which is not correct)
I probably don't understand lookup correctly. Can someone correct me and teach me the correct way?
In the lookup file, the name of the field is users, whereas in the event, it is username. Fortunately, the lookup command has a mechanism for renaming the fields during the lookup. Try the following
index=proxy123 activity="download" | lookup username.csv users AS username OUTPUT users | where isnotnull(users)
Now, depending on the volume of data you have in your index and how much data is being discarded when not matching a username in the CSV, there may be alternate approaches you can try, for example, this one using a subsearch.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS username | return username ]
What happens here in the subsearch (the bit in the []) is that the subsearch will be expanded first, in this case, to (username="ABC@xyz.com" OR username="ASDF@xyz.com" OR username="BBB@xyz.com"). So your main search will turn into
index=proxy123 activity="download" (username="ABC@xyz.com" OR username="ASDF@xyz.com" OR username="BBB@xyz.com")
which may be more efficient than returning all the data in the index, then discarding anything that doesn't match the list of users.
This approach assumes that you have the username field extracted in the first place. If you don't, you can try the following.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS search | format ]
This expanded search will be
index=proxy123 activity="download" "ABC#xyz.com" OR "ASDF#xyz.com" OR "BBB#xyz.com")
which may be more suitable to your data.

Splunk search no subsearch

I have events something like:
{
taskId:5a6d
category:created
when:1517131461
...
}
{
taskId:5a6d
category:started
when:1517131609
...
}
{
taskId:5a6d
category:ended
when:1517134657
...
}
For each task (the task id is the same), we have events for when it is created / started / ended.
I'd like to find any task that was never processed (i.e. created but never started). Here is my search statement:
index=XXX sourcetype=XXX category=created | search NOT [search index=XXX sourcetype=XXX category=started | fields taskId]
This statement works correctly if the time range is less than 48 hours.
If the time range is set to, for example, the latest 7 days, the search works incorrectly: it returns a lot of tasks (category=created) as if they were never processed. Actually, they are processed; I can find the corresponding events (category=started) by taskId.
I have no idea what's wrong with it. It seems the subsearch doesn't return correct results over the range of the main search.
This will be hard to debug without seeing your exact data, but note that subsearches are limited by default (roughly 10,000 results and a short runtime cap), so over a 7-day range the subsearch may be silently truncated.
To make it simpler, you can try something like this and do everything with one search:
index=XXX sourcetype=XXX (category=created OR category=started)
| eventstats values(category) as categories by taskId
| search categories = created NOT categories = started
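If you would rather get one row per task, a stats-based variant along the same lines might also work (untested against your data):
index=XXX sourcetype=XXX (category=created OR category=started)
| stats values(category) as categories by taskId
| search categories="created" NOT categories="started"
Any taskId that only ever produced a created event will survive the final filter.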

Search with original text that was replaced earlier

I am gathering performance metrics for each API that we have. With the query below I get results like:
method response_time
Create Billing 2343.2323
index="dev-uw2" logger_name="*Aspect*" message="*ApiImpl*" | rex field=message "PerformanceMetrics - method='(?<method>.*)' execution_time=(?<response_time>.*)" | table method, response_time | replace "public com.xyz.services.billingservice.model.Billing com.xyz.services.billingservice.api.BillingApiImpl.createBilling(java.lang.String)” WITH "Create Billing” IN method
If the user clicks the API text in a table cell to drill down further, it opens a new search for "Create Billing", which obviously gives zero results since we don't have any log with that string.
I want Splunk to search with the original text that was replaced earlier.
You can use click.value to get around this.
http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Viz/tokens
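If it helps, one alternative sketch at the SPL level (not from the answer above, and untested): keep the raw method string in its own column and build the friendly label with eval instead of replace, so the dashboard drilldown can reference the original value, for example via a row-level token such as $row.method$:
index="dev-uw2" logger_name="*Aspect*" message="*ApiImpl*"
| rex field=message "PerformanceMetrics - method='(?<method>.*)' execution_time=(?<response_time>.*)"
| eval method_label=if(method=="public com.xyz.services.billingservice.model.Billing com.xyz.services.billingservice.api.BillingApiImpl.createBilling(java.lang.String)", "Create Billing", method)
| table method_label, method, response_time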