KQL query for Azure disk free space in GB and free disk percentage using Insights

I am new to KQL. Can someone please guide me on whether this query is right?
InsightsMetrics
| where Name == 'FreeSpaceMB'
| summarize arg_max(TimeGenerated, *) by Tags, Computer
| extend Drive = tostring(parse_json(Tags)["vm.azm.ms/mountId"])
| extend Size = toreal(parse_json(Tags)["vm.azm.ms/diskSizeMB"])
| project TimeGenerated, Computer, Drive, bin(SizeGB = Size / 1024, 0.1), bin(FreeGB = Val / 1024, 1)
| join kind=inner (InsightsMetrics
| where Name == "FreeSpacePercentage"
| summarize arg_max(TimeGenerated, *) by Tags, Computer
| extend Drive = tostring(parse_json(Tags)["vm.azm.ms/mountId"])
| project TimeGenerated, Computer, Drive, bin(FreePercent = Val, 1.1))
on Computer, Drive
| project TimeGenerated, Computer, Drive, SizeGB, FreeGB, FreePercent
| order by Computer asc

Your query seems more or less OK.
A few comments:
Avoid calling parse_json multiple times.
Summarize only by mountId and not by the entire Tags field (unless you want multiple rows when diskSizeMB changes).
There is actually no reason to self-join, since you can compute FreeSpacePercentage directly from FreeSpaceMB and diskSizeMB.
InsightsMetrics
| where Name == "FreeSpaceMB"
| extend Tags = parse_json(Tags)
| extend mountId = tostring(Tags["vm.azm.ms/mountId"])
,diskSizeMB = toreal(Tags["vm.azm.ms/diskSizeMB"])
| project-rename FreeSpaceMB = Val
| summarize arg_max(TimeGenerated, diskSizeMB, FreeSpaceMB) by Computer, mountId
| extend diskSizeGB = round(diskSizeMB / 1024, 1)
,FreeSpaceGB = round(FreeSpaceMB / 1024, 1)
,FreeSpacePercentage = round(FreeSpaceMB / diskSizeMB * 100, 1)
| project TimeGenerated, Computer, mountId, diskSizeGB, FreeSpaceGB, FreeSpacePercentage
| order by Computer asc, mountId asc

How can I better optimize this Kusto query to get my logs

I have the below query which I am running to get logs for Azure K8s, but it takes an hour to generate the logs, and I am hoping there is a better way to write what I have already written. Can some Kusto experts advise how I can improve the performance?
AzureDiagnostics
| where Category == 'kube-audit'
| where TimeGenerated between (startofday(datetime("2022-03-26")) .. endofday(datetime("2022-03-27")))
| where (strlen(log_s) >= 32000
and not(log_s has "aksService")
and not(log_s has "system:serviceaccount:crossplane-system:crossplane"))
or strlen(log_s) < 32000
| extend op = parse_json(log_s)
| where not(tostring(op.verb) in ("list", "get", "watch"))
| where substring(tostring(op.responseStatus.code), 0, 1) == "2"
| where not(tostring(op.requestURI) in ("/apis/authorization.k8s.io/v1/selfsubjectaccessreviews"))
| extend user = op.user.username
| extend decision = tostring(parse_json(tostring(op.annotations)).["authorization.k8s.io/decision"])
| extend requestURI = tostring(op.requestURI)
| extend name = tostring(parse_json(tostring(op.objectRef)).name)
| extend namespace = tostring(parse_json(tostring(op.objectRef)).namespace)
| extend verb = tostring(op.verb)
| project TimeGenerated, SubscriptionId, ResourceId, namespace, name, requestURI, verb, decision, ['user']
| order by TimeGenerated asc
You could try starting your query as follows.
Please note the additional condition at the end.
AzureDiagnostics
| where TimeGenerated between (startofday(datetime("2022-03-26")) .. endofday(datetime("2022-03-27")))
| where Category == 'kube-audit'
| where log_s hasprefix '"code":2'
I assumed that code is an integer; in case it is a string, use the following (note the added quotation mark):
| where log_s hasprefix '"code":"2'

Add a Dummy Row for Each Row in the Table

I have the below query, which returns the % CPU of each Computer for every 1 hour.
Query
Perf
| where TimeGenerated > ago(1h)
| where CounterName == "% Processor Time"
| where Computer endswith "XYZ"
| summarize avg(CounterValue) by bin(TimeGenerated, 1h), Computer
Result
I want to append a dummy row for each row in the table with a fixed value, except that TimeGenerated should be the same as in the previous row of the table. The expected result should look something like this.
Expected Result
you could try something like this (note that you'll need to explicitly order your records as you wish):
let T =
Perf
| where TimeGenerated > ago(1h)
| where CounterName == "% Processor Time"
| where Computer endswith "XYZ"
| summarize avg(CounterValue) by bin(TimeGenerated, 1h), Computer
;
T
| union (T | extend Computer = "Dummy", avg_CounterValue = 10)
| order by TimeGenerated

How do you filter a table in a database using a dataframe, list, vector, etc. in R?

I have a large set of IDs in a CSV file. How could I filter a database table using only that one-column table from the CSV file?
For example in the ODBC database we have:
TABLE 1
+---------+------+
| ID      | TYPE |
+---------+------+
| 43PRJIF | A    |
| 35IRPFJ | A    |
| 452JSU  | B    |
| 78JFIER | B    |
| 48IRUW  | C    |
| 89UEJDU | C    |
| 784NFJR | D    |
| 326NFR  | D    |
| 733ZREW | E    |
+---------+------+
And in the CSV file we have:
+---------+
| ID      |
+---------+
| 89UEJDU |
| 784NFJR |
| 326NFR  |
| 733ZREW |
+---------+
Basically I would like to use something from the dbplyr package if possible, e.g. importing the CSV table into a dataframe and then using dbplyr syntax like:
new_table <- TABLE1 %>%
  filter(id == "ROWS IN THE CSV")
To get an output like that:
+---------+------+
| ID      | TYPE |
+---------+------+
| 89UEJDU | C    |
| 784NFJR | D    |
| 326NFR  | D    |
| 733ZREW | E    |
+---------+------+
Thank you for your help in advance!
In general, joining or merging tables requires them to share the same environment. Hence, there are three general options here:
Load the remote table into R's local workspace.
Load the CSV table into the database and use a semi-join.
'Smuggle' the list of IDs in the CSV into the database.
Let's consider each in turn:
Option 1
This is probably the simplest option but it requires that the remote/ODBC table is small enough to fit in R's working memory. If so, you can call local_table = collect(remote_table) to load the database table.
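For illustration, here is a minimal sketch of this option (remote_table stands for the existing dbplyr/ODBC table and "ids.csv" for the one-column CSV of IDs; both names are placeholders):
library(dplyr)
# read the CSV of IDs, pull the whole remote table into R, then filter locally
csv_ids <- read.csv("ids.csv", stringsAsFactors = FALSE)  # expects a column named ID
local_table <- collect(remote_table)                      # loads the database table into memory
filtered <- semi_join(local_table, csv_ids, by = "ID")    # keep only rows whose ID is in the CSV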
Option 2
dbplyr includes a command copy_to (ref) that lets you copy local tables via odbc to a database/remote connection. You will need to have permission to create tables in the remote environment.
This approach makes use of the DBI package. At the time of writing v1.0.0 of DBI on CRAN has some limitations when writing to non-default schemas. So you may need to upgrade to the development version on GitHub (here).
Your code will look something like:
DBI::dbWriteTable(db_connection,
                  DBI::Id(schema = "schema", table = "name"),
                  r_table_name)
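Once the CSV has been copied into the database, the semi-join itself might look something like the sketch below ("TABLE1", "schema" and "name" are placeholder names matching the example above):
library(dplyr)
# both tables now live in the same database, so dbplyr can translate the semi-join to SQL
remote_table <- tbl(db_connection, "TABLE1")                             # the existing ODBC table
remote_ids   <- tbl(db_connection, dbplyr::in_schema("schema", "name"))  # the uploaded CSV of IDs
result <- semi_join(remote_table, remote_ids, by = "ID") %>% collect()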
Option 3
Smuggle the list of IDs into the database via the table definition. This is the same idea as here, and works best if the list of IDs is short.
Remote tables are essentially defined by the code/query that fetches their results. Hence the list of IDs can appear in the code that defines your remote table. Consider the following example:
library(dplyr)
library(dbplyr)
data(mtcars)
list_of_ids = c(1,2,3,4)
df = tbl_lazy(mtcars, con = simulate_mssql())
df %>% filter(ID %in% list_of_ids ) %>% show_query()
show_query() renders the code that defines the current version of the remote table. In the example above it returns the following - note that the list of IDs now appears in the code.
<SQL>
SELECT *
FROM `df`
WHERE (`ID` IN (1.0, 2.0, 3.0, 4.0))
If the list of IDs is very long, the size of this query will become a problem. Hence there is a limit on the number of IDs you can filter on using this approach (I have not tested this approach to find the limit - I seldom use the IN clause for a list of more than 10).
A quick solution is to use the merge function.
Example:
table1 <- data.frame(
  ID = c("4322", "2245", "3356"),
  TYPE = c("B", "A", "A")
)
table2 <- data.frame(
  ID = c("2245")
)
table3 <- merge(table2, table1, all.x = TRUE, by = "ID")
ID TYPE
2245 A
table3 was created by filtering table1 using table2

Removing (grouping by) arrays which are subsets of the biggest array

I have a dataset which looks like this:
| path            |
|-----------------|
| {16,13}         |
| {16,85}         |
| {16,85,1}       |
| {16,85,2}       |
| {16,85,15}      |
| {16,85,80}      |
| {16,85,80,1}    |
| {16,85,80,63}   |
| {16,85,80,63,1} |
The path column represents a kind of hierarchical path through a graph, from some node to another node. I'm trying to collapse each path down into the longest paths from root to leaf nodes - it should be noted that the order of the elements is important ({1, 2, 3} != {3, 2, 1}).
As an example:
The path {16, 13} is the longest path containing both 16 and 13 in that order, so it stays.
The path {16, 85} is not the longest, as there are longer paths containing those elements in that order, for example {16, 85, 2}. Therefore the row with {16, 85} should be discarded from the result set, while {16, 85, 2} should be kept, since no longer path contains it.
Etc. with every other row
So the resulting set looks like:
| path            |
|-----------------|
| {16,13}         |
| {16,85,1}       |
| {16,85,2}       |
| {16,85,15}      |
| {16,85,80,1}    |
| {16,85,80,63,1} |
I'm not even sure where to start with this; everything I've tried has failed.
I've found that there is an array containment operator @>, but I don't really know how to apply it.
Is there a reasonable query for doing this? Any help would be great. Thanks!
I think you want the "not contains" operator. So, you can do:
select p.*
from paths p
where not exists (select 1
                  from paths p2
                  where p2.path @> p.path and p2.path <> p.path
                 );
I'm not promising that this is efficient, but it should work well on a smallish table.
EDIT:
To handle ordering, one approach is to convert to a string:
select p.*
from paths p
where not exists (select 1
                  from paths p2
                  where array_to_string(p2.path, ',') like array_to_string(p.path, ',') || ',%'
                 );

Connecting three tables in one query

I have the following tables
mixes
mid | date       | info
1   | 2009-07-01 | no info yet
music-review
mid | song              | buy
1   | Example - Example | http://example.com
2   | Exam - Exam       | http://example.com
tracklist
tid | mid | mrid
1   | 1   | 1
2   | 1   | 2
Is it possible to have an SQL query that links these all into one?
so my results would turn out like:
date       | info        | tracklist
2009-07-01 | no info yet | Example - Example http://example.com, Exam - Exam http://example.com
or however this result would be returned... or would this need to be two SQL queries, where I get the mid from mixes and then do a query to get the tracklist from that?
For MySQL:
SELECT mixes.date, mixes.info,
GROUP_CONCAT(DISTINCT music-review.song + ' ' + music-review.buy
ORDER BY music-review.mid ASC SEPARATOR ', ')
FROM mixes
JOIN tracklist ON tracklist.mid = mixes.mid
JOIN music-review ON music-review.mrid = tracklist.mrid
GROUP BY mixes.date, mixes.info
This works, as adapted from mherren:
SELECT mixes.`date`, mixes.info,
CONCAT(GROUP_CONCAT(DISTINCT `music-review`.song , ' ' , `music-review`.`mid`
ORDER BY `tracklist`.`tid` ASC SEPARATOR ', ')) as `tracklist`
FROM mixes
JOIN tracklist ON tracklist.`mid` = mixes.`mid`
JOIN `music-review` ON tracklist.`mrid` = `music-review`.`mid`
WHERE `mixes`.`date`='2009-07-01'
GROUP BY mixes.`date`, mixes.info;
It fixes a blurb issue I was getting. One thing to note, though, is that GROUP_CONCAT has a max length limit set at 1024; this can be altered with
SET GLOBAL group_concat_max_len=4096
I left so many comments, I thought it would be more helpful to suggest a revision to your architecture as an answer. However, I do think mherren has already competently addressed your actual concern, so while votes are appreciated, I don't think this should be considered the right "answer".
I think you need to reconsider how you have arranged the data. Specifically, you have a table for "music-review" that seems out of place, while at the same time referring to both "mixes" and "tracklists" seems a bit redundant. I imagine you want a one-to-many relationship where "mixes" holds the information about the mix, like when it was created, the user who created it, etc., while "tracklist" is the list of songs within the "mix". What if you tried this:
#song-info
song_id | artist        | title            | online-store
1       | The Airheads  | Asthma Attack!   | example.com/?id=123
2       | The Boners    | Bite the Boner   | example.com/?id=456
3       | Cats in Heat  | Catching a Cold  | example.com/?id=789
4       | Dirty Djangos | Dig these Digits | example.com/?id=147
#mixes
mix_id | date       | info
1      | 2009-07-01 | no info yet
2      | 2009-07-02 | no info yet
#mix_tracklist
mix_id | song_id
1      | 1
1      | 2
2      | 2
2      | 3
Now you can have a list of available mixes, and if a user selects a mix, another query for the actual songs.
Trying to put all of the song data into one column should only be done if the results require that info right away, or if something within the query itself is conditional on the results of that sub-query. If you simply want to output a list of mixes with the track list for each one, you are better off doing a query for each mix based on the mix index. So in the case of PHP outputting HTML, you would go with:
$mixes = mysql_query("SELECT * FROM mixes WHERE date > '$last_week'");
while($mix = mysql_fetch_assoc($mixes)) {
$mix_id = $mix['mix_id'];
$mix_date = date('m/d/Y', strtotime($mix['date']));
$mix_info = $mix['info'];
echo <<<EOT
<h2 class="mix_number">$mix_number</h2>
<p class="mix_date">$mix_date</p>
<p class="mix_info">$mix_info</p>
<h3>Track Listing:</h3>
<ul class="tracklist">
EOT;
$tracks = mysql_query("SELECT song.artist artist,
song.title title,
song.online-store url
song.song_id
FROM song-info song
JOIN mix_tracklist tracks ON (tracks.song_id = song.song_id)
WHERE tracks.mix_id = '$mix_id'
ORDER_BY song_id);
while ($track = mysql_fetch_assoc($tracks)) {
$artist = $track['artist'];
$song_name = $track['title'];
$song_url = $track['url'];
echo <<<EOT
<li>$artist – $song_name</li>
EOT;
}
echo "</ul>";
}