How do I compare current disk space with the previous day's disk space in KQL?

I have a query that shows the server with the least disk space free, but I wanted to add a column that compares the current space to a historic figure (to capture any sudden decrease). I can only seem to get it to work for a few occurrences.
My code so far:
let start = ago(24h);
let End_date = ago(48h);
let Curr = Perf
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where TimeGenerated > ago(1h)
| extend dim = strcat(Computer, InstanceName)
| summarize CurrentVal = max(CounterValue) by Computer, InstanceName;
let Prev = Perf
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where TimeGenerated > End_date
| where TimeGenerated < start
| extend dim = strcat(Computer, InstanceName)
| summarize YesterdayVal = max(CounterValue) by Computer, InstanceName;
Curr
| join kind=leftouter Prev on Computer, InstanceName
//| extend diff = CurrentVal - YesterdayVal
//| extend Diskspace = CurrentVal
//| project Computer, InstanceName, Diskspace, diff

A possible solution:
let start = ago(24h);
let End_date = ago(48h);
Perf
| where TimeGenerated > End_date
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| extend Current = TimeGenerated > start
| summarize
    Diskspace = maxif(CounterValue, Current),
    Diff = maxif(CounterValue, Current) - maxif(CounterValue, not(Current))
    by Computer, InstanceName
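If you also want the historic value itself as a separate column, a small variation on the same maxif idea (a sketch along the same lines, not tested against your workspace) could be:
let start = ago(24h);
let End_date = ago(48h);
Perf
| where TimeGenerated > End_date
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| extend Current = TimeGenerated > start              // true for rows from the last 24h
| summarize
    Diskspace = maxif(CounterValue, Current),         // max % free in the current window
    YesterdayVal = maxif(CounterValue, not(Current))  // max % free in the 24-48h-ago window
    by Computer, InstanceName
| extend Diff = Diskspace - YesterdayVal
Because maxif only aggregates the rows where its predicate is true, a single pass over Perf covers both time windows without needing the join.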

Related

KQL/Search-AzGraph join operator

I have just begun using AzGraph and am learning how to use its queries. I am running into an issue when attempting to pull key vault whitelisted IP addresses. Below is the query that I am currently running:
Search-AzGraph -Query "resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|mv-expand properties.networkAcls.ipRules
|project kvName = name, kvRule = properties.networkAcls.ipRules"
Instead of providing a list of addresses per vault, the output returns a bunch of duplicated lines for the same vaults. This only occurs after a certain number of whitelisted addresses; I am not sure of the exact number:
kvName kvRule
------ ------
Vault1
Vault2
Vault3 {#{value=1.1.1.1/32}}
Vault4 {#{value=1.1.1.1/32}, #{value=2.2.2.2/32}}
Vault5 {#{value=1.1.1.1/32}, #{value=2.2.2.2/32}}
Vault6 {#{value=1.1.1.1/32}}
Vault7
Vault8
Vault9 <-- {#{value=1.1.1.1/32}, #{value=2.2.2.2/32}, #{value=3.3.3.3/32}, #{value=4.4.4.4/32}...}
Vault9 <-- {#{value=1.1.1.1/32}, #{value=2.2.2.2/32}, #{value=3.3.3.3/32}, #{value=4.4.4.4/32}...}
Vault9 <-- {#{value=1.1.1.1/32}, #{value=2.2.2.2/32}, #{value=3.3.3.3/32}, #{value=4.4.4.4/32}...}
Vault9 <-- {#{value=1.1.1.1/32}, #{value=2.2.2.2/32}, #{value=3.3.3.3/32}, #{value=4.4.4.4/32}...}
Vault9 <-- {#{value=1.1.1.1/32}, #{value=2.2.2.2/32}, #{value=3.3.3.3/32}, #{value=4.4.4.4/32}...}
I have also tried extending the property values to see if that helped, but instead the format changed to the below:
Code:
Search-AzGraph -Query "resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|extend kvRule=parsejson(tostring(properties.networkAcls.ipRules))
|mv-expand kvRule
|project kvName = name, kvRule.value"
Output:
kvName kvRule
------ ------
Vault1
Vault2 1.1.1.1/32
Vault3 1.2.3.4/32
Vault4 5.6.7.8/32
Vault5 1.2.3.3/32
Vault6
Vault7
Vault8
Vault9 <-- 1.1.1.1/32
Vault9 <-- 2.2.2.2/32
Vault9 <-- 3.3.3.3/32
Vault9 <-- 4.4.4.4/32
I came across the join operator and attempted to use the examples against my queries, but failed; the output was always similar to the above, or I received an error.
This query outputs something similar to the second example:
Search-AzGraph -Query "Resources
| join kind=leftouter (resources | where type=='microsoft.keyvault/vaults' | where properties.publicNetworkAccess == 'Enabled' | extend kvRule=parsejson(tostring(properties.networkAcls.ipRules)) | mv-expand kvRule | project id, kvName = name, kvURI = properties.vaultUri, kvRule) on id
| where type == 'microsoft.keyvault/vaults'
| project id, name, kvType = type, kvLoc = location, kvSub = subscriptionId, kvURI, kvRule= properties.networkAcls.ipRules"
I also attempted the below query, which errored out because the ipRules are dynamic:
Search-AzGraph -Query "Resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|extend kvRule=parsejson(tostring(properties.networkAcls.ipRules))
|project kvID = id, name, kvLoc = location, kvSub = subscriptionId, kvURI = properties.vaultUri, kvRule
| join kind=leftouter (
Resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|project name, kvRule = tolower(id))
on kvRule
| summarize by name"
Error:
"code": "InvalidQuery",
"message": "Query is invalid. Please refer to the documentation for the Azure Resource Graph service and fix the error before retrying."
"code": "Default",
"message": "join key 'kvRule' is of a 'dynamic' type. Please use an explicit cast using extend operator in the join legs (for example, '... | extend kvRule = tostring(kvRule) | join (... | extend kvRule = tostring(kvRule)) on kvRule') as join on a 'dynamic' type is not supported."
I am struggling to understand how I can make this query work. I really believe that I need to use the join operator to get this right, but I do not have enough understanding of KQL/DB queries to do so, and I am looking to be educated on how I can correctly perform this query.
My goal is to have the output be a single vault name, with kvRule containing the full list of addresses in its whitelist (if there are any), all on a single line:
kvName kvRule
------ ------
Vault1
Vault2 1.1.1.1/32, 2.2.2.2/32, 3.3.3.3/32
Vault3 1.1.1.1/32, 2.2.2.2/32, 3.3.3.3/32, 1.1.1.3/32, 2.2.2.4/32, 3.3.3.1/32
To fix the query, instead of
|extend kvRule=parsejson(tostring(properties.networkAcls.ipRules))
make it
|extend kvRule=tostring(parsejson(properties.networkAcls.ipRules))
For better filtering, you may consider adding
| where isnotempty(kvRule)
and for cleaner results, use
kind=inner
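Putting those pieces together, a sketch of the whole query (same resources schema as above; not tested against a real tenant) might look like:
Search-AzGraph -Query "resources
| where type == 'microsoft.keyvault/vaults'
| where properties.publicNetworkAccess == 'Enabled'
| extend kvRule = tostring(parsejson(properties.networkAcls.ipRules))
| where isnotempty(kvRule)
| project kvName = name, kvRule"
This returns one row per vault, with kvRule holding the whole ipRules array as a single JSON string; drop the isnotempty line if you also want vaults that have no whitelist entries. If you would rather have a plain comma-separated list of addresses, one option is to mv-expand the array and re-aggregate per vault with make_list and strcat_array.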

How to use `sum` within `summarize` in a KQL query?

I'm working on logging an Azure Storage Account. I have a Diagnostic Setting applied and am using Log Analytics to write KQL queries.
My goal is to determine the number of GetBlob requests (OperationName) for a given fileSize (RequestBodySize).
The challenge is that I need to sum the RequestBodySize for all GetBlob operations on each file. I'm not sure how to nest sum in summarize.
Tried so far:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)
| render scatterchart
Results in:
Also tried: fileSize = format_bytes(sum(RequestBodySize)) but this errored out.
Any ideas?
EDIT 1: Testing out @Yoni's solution.
Here is an example of RequestBodySize with no summarization:
When implementing the summarize query (| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)), the results are 0 bytes.
Though it's clear there are multiple calls for a given Uri, the sum doesn't seem to be working.
EDIT 2:
And yeah... pays to verify the field names! There is no RequestBodySize field available, only ResponseBodySize. Using the correct value worked (imagine that!).
I need to sum the RequestBodySize for all GetBlob operations on each file
If I understood your question correctly, you could try this:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count(), total_size = format_bytes(sum(RequestBodySize)) by Uri
Here's an example using a dummy data set:
datatable(Url:string, ResponseBodySize:long)
[
"https://something1", 33554432,
"https://something3", 12341234,
"https://something1", 33554432,
"https://something2", 12345678,
"https://something2", 98765432,
]
| summarize count(), total_size = format_bytes(sum(ResponseBodySize)) by Url
+--------------------+--------+------------+
| Url                | count_ | total_size |
+--------------------+--------+------------+
| https://something1 | 2      | 64 MB      |
| https://something3 | 1      | 12 MB      |
| https://something2 | 2      | 106 MB     |
+--------------------+--------+------------+
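To get back to the original chart, keep the raw byte count numeric when plotting (format_bytes returns a string, which doesn't chart well) and only format it for tabular output. A sketch using the corrected ResponseBodySize field from EDIT 2 (untested against a real workspace):
StorageBlobLogs
| where TimeGenerated >= ago(5h)
    and AccountName == 'storageAccount'
    and OperationName == 'GetBlob'
| summarize Requests = count(), TotalBytes = sum(ResponseBodySize) by Uri
| render scatterchart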

Passing a column as a function parameter in Kusto (Azure Log Analytics) doesn't work

I want to calculate in Kusto (Azure Log Analytics), based on a date, the number of days without weekends in a month.
This works (using now() as the parameter in the daysOfMonthNoWeekends function call):
let daysOfMonthNoWeekends=(_event_date_t:datetime) {
toscalar(range days from startofmonth(_event_date_t) to endofmonth(_event_date_t) step 1d
| where dayofweek(days) between(1d .. 5d)
| count)
};
//
MyTable_CL
| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(now())
And this doesn't work:
let daysOfMonthNoWeekends=(_event_date_t:datetime) {
toscalar(range days from startofmonth(_event_date_t) to endofmonth(_event_date_t) step 1d
| where dayofweek(days) between(1d .. 5d)
| count)
};
//
MyTable_CL
| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(TimeGenerated)
//or with another column of MyTable like event_date_t fails too
//| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(event_date_t)
Error:
Semantic error: '' has the following semantic error: Unresolved reference binding: 'TimeGenerated'.
For the record, I intend to add a column with the number of days without weekends in a month, based on a date column, to use it in another calculation.
Any idea why this doesn't work?
The reason this doesn't work is documented here: User-defined functions usage restrictions
Specifically:
User-defined functions can't pass into toscalar() invocation information that depends on the row-context in which the function is called.
You should be able to achieve your intention using a join/lookup.
For example (caveat: test that this actually works with your data; I 'compiled' it in my head at an early morning hour):
let T = datatable(TimeGenerated:datetime)
[
datetime(2020-02-11 11:20),
datetime(2020-04-11 11:30),
datetime(2020-05-12 19:20),
datetime(2020-05-13 19:20),
datetime(2020-04-13 19:20),
datetime(2020-01-11 17:20),
]
;
let daysOfMonthNoWeekends =
range dt from startofmonth(toscalar(T | summarize min(TimeGenerated))) to endofmonth(toscalar(T | summarize max(TimeGenerated))) step 1d
| summarize countif(dayofweek(dt) between(1d .. 5d)) by month = startofmonth(dt)
;
T
| extend month = startofmonth(TimeGenerated)
| lookup daysOfMonthNoWeekends on month
| project-away month
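Applied to the original table, that would look roughly like this (a sketch assuming MyTable_CL and its TimeGenerated column; verify it against your data):
let daysOfMonthNoWeekends =
    range dt from startofmonth(toscalar(MyTable_CL | summarize min(TimeGenerated)))
             to endofmonth(toscalar(MyTable_CL | summarize max(TimeGenerated))) step 1d
    | summarize daysOfMonthNoWeekends = countif(dayofweek(dt) between (1d .. 5d)) by month = startofmonth(dt);
MyTable_CL
| extend month = startofmonth(TimeGenerated)
| lookup daysOfMonthNoWeekends on month
| project-away month
Here toscalar() only depends on the table as a whole rather than on any single row, so the restriction doesn't apply, and the lookup then attaches the per-month weekday count to every row.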

Report Builder Query Text Editor Where Clause Parentheses

My employer has switched data systems and reporting tools. We used to use Report Builder with a nicely built data model that allowed me to do some complex filtering easily. Then we used Business Objects, and though I didn't like it very much, it also let me do some complex filtering. Now we're back to Report Builder, but the data model is different, and the only filtering I seem to be able to do is a string of AND operators.
(Note: I'm self-taught on both Report Builder and Business Objects. I have minimal experience with the SQL coding language itself. Also, actual data labels have been changed in this example.)
I'm pulling from a large amount of data, so I need to filter on the query level. I first need to include data based on five criteria, like this.
    SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
Then I need to include data that fits one of two pairings, like this.
    (SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
     AND SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A')
OR
    (SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
     AND SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A')
After I built my query using the query designer and switched to text mode, it gave me this.
WHERE
SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
AND SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
AND SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
I've tried putting parentheses in, but I must have done it wrong because the query ran for ages before essentially giving me the entire database.
Anybody care to help a SQL newbie?
Presuming everything else is right, it should just be about applying parentheses to get the logic right. Using slightly exaggerated whitespace to try and make it clear:
WHERE
    SYSTEM.REGION.REGION_STATUS_CODE = N'1'
    AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
    AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
    AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
    AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
    AND (
            (SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
             AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail')
         OR
            (SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
             AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale')
        )
(It still may run forever, but that's more a factor of database size and indexing.)

Query Optimization for MS Sql Server 2012

I have a table containing ~5,000,000 rows of SCADA data, described by the following:
create table data (o int, m money).
Where:
- o is the PK with a clustered index on it; its fill factor is close to 100%. o represents the date of the meter reading and can be thought of as the X axis.
- m is a decimal value lying within the 1..500 region and is the actual meter reading; it can be thought of as the Y axis.
I need to find out about certain patterns, i.e. when, how often, and for how long they have been occurring.
Example: looking for all occurrences of m changing by an amount in the 500 to 510 range within 5 units (well, from 1 to 5) of o, I run the following query:
select d0.o as tFrom, d1.o as tTo, d1.m - d0.m as dValue
from data d0
inner join data d1
on (d1.o = d0.o + 1 or d1.o = d0.o + 2 or d1.o = d0.o + 3 or d1.o = d0.o + 4)
and (d1.m - d0.m) between 500 and 510
The query takes 23 seconds to execute.
The previous version took 30 minutes (about 90 times slower); I managed to optimize it using a naive approach, replacing on (d1.o - d0.o) between 1 and 4 with on (d0.o = d1.o - 1 or d0.o = d1.o - 2 or d0.o = d1.o - 3 or d0.o = d1.o - 4).
It's clear to me why it's faster: on one hand an indexed column scan should work fast enough, and on the other I can afford it because the dates are discrete (and I always give 5 minutes of grace time to any o region, so for 120 minutes it's the 115..120 region). I can't use the same approach with the m values, though, as they aren't integral.
Things I've tried so far:
Soft sharding, by applying where o between @oRegionStart and @oRegionEnd at the bottom of my script and running it within a loop, fetching results into a temp table. Execution time: 25 seconds.
Hard sharding, by splitting the data into a number of physical tables. The result is 2 minutes, never mind the maintenance hassle.
Using some precooked data structures, like:
create table data_change_matrix (o int, dM5Min money, dM5Max money, dM10Min money, dM10Max money ... dM{N}Min money, dM{N}Max money)
where N is the max depth for which I run the analysis. Having such a table I could easily write a query:
select * from data_change_matrix where dM5Min between 500 and 510
The result: it went nowhere due to the tremendous size requirements (5M x ~250) and the maintenance-related costs; I need to keep that matrix current in close to real time.
SQL CLR: don't even ask me what went wrong; it just didn't work out.
Right now I'm out of inspiration and looking for help.
All in all: is it possible to get close-to-instant response times running this type of query on large volumes of data?
Everything runs on MS SQL Server 2012. I didn't try it on MS SQL Server 2014, but I'm happy to do so if it would make sense.
Update - execution plan: http://pastebin.com/PkSSGHvH.
Update 2 - While I really love the LAG function suggested by usr, I wonder if there's a LAGS function allowing for
select o, MIN(LAGS(o, 4)) over(...) - or what would its shortest implementation in T-SQL be?
I tried something very similar using SQL CLR and got it working but the performance was awful.
I assume you meant to write "on (d1.o = ..." and not "on (d.o = ...". Anyway, I got pretty drastic improvements just by simplifying the statement (making it easy for the query optimizer to pick a better plan I guess):
select d0.o as tFrom, d1.o as tTo, d1.m - d0.m as dValue
from data d0
inner join data d1
on d1.o between d0.o + 1 and d0.o + 4
and (d1.m - d0.m) between 500 and 510
Good luck with your query!
You say you've already tried CLR but don't give any code.
It was fastest in my test for my sample data.
CREATE TABLE data
(
  o INT PRIMARY KEY,
  m MONEY
);
INSERT INTO data
SELECT TOP 5000000 ROW_NUMBER() OVER (ORDER BY @@SPID),
       1 + ABS(CAST(CRYPT_GEN_RANDOM(4) AS INT) % 500)
FROM   master..spt_values v1,
       master..spt_values v2
None of the versions actually returns any results (it is impossible for m to be a decimal value lying within 1..500 and simultaneously for two m values to have a difference > 500), but disregarding this, the typical timings I got for the code submitted so far are:
+-----------------+--------------------+
| | Duration (seconds) |
+-----------------+--------------------+
| Lag/Lead | 39.656 |
| Original code | 40.478 |
| Between version | 21.037 |
| CLR | 13.728 |
+-----------------+--------------------+
The CLR code I used was based on the one here.
To call it, use:
EXEC [dbo].[WindowTest]
    @WindowSize = 5,
    @LowerBound = 500,
    @UpperBound = 510
Full code listing
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    public struct DataRow
    {
        public int o;
        public decimal m;
    }

    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void WindowTest(SqlInt32 WindowSize, SqlInt32 LowerBound, SqlInt32 UpperBound)
    {
        int windowSize = (int)WindowSize;
        int lowerBound = (int)LowerBound;
        int upperBound = (int)UpperBound;
        DataRow[] window = new DataRow[windowSize];

        using (SqlConnection conn = new SqlConnection("context connection=true;"))
        {
            SqlCommand comm = new SqlCommand();
            comm.Connection = conn;
            comm.CommandText = @"
                SELECT o,m
                FROM data
                ORDER BY o";

            SqlMetaData[] columns = new SqlMetaData[3];
            columns[0] = new SqlMetaData("tFrom", SqlDbType.Int);
            columns[1] = new SqlMetaData("tTo", SqlDbType.Int);
            columns[2] = new SqlMetaData("dValue", SqlDbType.Money);

            SqlDataRecord record = new SqlDataRecord(columns);
            SqlContext.Pipe.SendResultsStart(record);

            conn.Open();
            SqlDataReader reader = comm.ExecuteReader();
            int counter = 0;

            while (reader.Read())
            {
                DataRow thisRow = new DataRow() { o = (int)reader[0], m = (decimal)reader[1] };

                int i = 0;
                while (i < windowSize && i < counter)
                {
                    DataRow previousRow = window[i];
                    var diff = thisRow.m - previousRow.m;
                    if (((thisRow.o - previousRow.o) <= WindowSize - 1) && (diff >= lowerBound) && (diff <= upperBound))
                    {
                        record.SetInt32(0, previousRow.o);
                        record.SetInt32(1, thisRow.o);
                        record.SetDecimal(2, diff);
                        SqlContext.Pipe.SendResultsRow(record);
                    }
                    i++;
                }

                window[counter % windowSize] = thisRow;
                counter++;
            }
            SqlContext.Pipe.SendResultsEnd();
        }
    }
}
This looks like a great case for windowed aggregate functions or LAG. Here is a version using LAG:
select *
from (
    select o
         , lag(m, 4) over (order by o) as m4
         , lag(m, 3) over (order by o) as m3
         , lag(m, 2) over (order by o) as m2
         , lag(m, 1) over (order by o) as m1
         , m as m0
    from data
) x
where 0=1
   or (m1 - m0) between 500 and 510
   or (m2 - m0) between 500 and 510
   or (m3 - m0) between 500 and 510
   or (m4 - m0) between 500 and 510
Using a windowed aggregate function you should be able to remove the manual expansion of those LAG calls.
SQL Server implements these things using a special execution plan operator called Window Spool. That makes it quite efficient.