Is there any performance difference in KQL between combining where conditions with and versus adding them as separate where operators?
Will something like
Events
| where Source == "myapp"
and Timestamp > ago(7d)
and isnotnull(DeviceId)
and isnotnull(UserId)
be faster than
Events
| where Source == "myapp"
| where Timestamp > ago(7d)
| where isnotnull(DeviceId)
| where isnotnull(UserId)
?
No difference whatsoever; both queries are equivalent.
According to the docs, you should put the time filter first, because Kusto is highly optimized to use time filters.
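For example, applying that advice to the query above:
Events
| where Timestamp > ago(7d) // time filter first
| where Source == "myapp"
| where isnotnull(DeviceId)
| where isnotnull(UserId)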
Is it possible to implement a count() MTS based on a condition?
For instance:
We need to monitor the number of times RDS CPU utilization peaks above 95% over the last 3 days.
A = data('CPU_Utilization').count(...when point > 95%)
detector(when(A > {number_of_times_breached}, lasting='3d')).publish(...)
Update: a colleague found the solution:
A = data('CPU_Utilization').above({condition_value}, inclusive=True).count(...)
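Plugged into the detector sketch from the question (the {condition_value} and {number_of_times_breached} placeholders are still yours to fill in), that becomes:
A = data('CPU_Utilization').above({condition_value}, inclusive=True).count()
detector(when(A > {number_of_times_breached}, lasting='3d')).publish(...)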
In an SPL query, you can use eval() with a boolean result inside count(). Something like:
| <your search> | stats count(eval(point>0.95))
I run this query on Log Analytics:
Perf
| where TimeGenerated >= ago(5m)
| join kind = inner (
    Heartbeat
    | where TimeGenerated >= ago(5m)
    | summarize arg_max(TimeGenerated, *) by SourceComputerId
) on Computer
| summarize arg_max(TimeGenerated, *) by SourceComputerId, CounterName
| extend Indicator = strcat(ObjectName, '-', CounterName)
| summarize dict = make_dictionary(
    pack(
        'WorkspaceId', TenantId,
        Indicator, CounterValue,
        'TimeGenerated', TimeGenerated,
        'Computer', Computer
    )) by SourceComputerId
| evaluate bag_unpack(dict)
But it's a little bit slow. Is there any way to optimize it? I want the fastest possible query that achieves the same results.
It's somewhat challenging to say without knowing the size of the data (e.g. record count) in each of the join legs and the cardinality of the SourceComputerId column.
I would recommend going over the query best practices document, which covers several optimization techniques, and seeing if that helps.
Update: explicitly calling out the best practices that may help in your case (for you to verify):
When using the join operator, choose the table with fewer rows as the first (left-most) one.
When the left side is small (up to 100,000 records) and the right side is big, it is recommended to use hint.strategy=broadcast, as sketched below.
When both sides of the join are too big and the join key has high cardinality, it is recommended to use hint.strategy=shuffle.
When the group-by keys of the summarize operator have high cardinality (roughly: above 1 million), it is recommended to use hint.strategy=shuffle.
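A minimal sketch of the three hints, using hypothetical tables SmallTable, BigTableA, and BigTableB joined on a column Key:
// small left side, big right side: broadcast
SmallTable
| join kind=inner hint.strategy=broadcast (BigTableA) on Key
// both sides big, high-cardinality join key: shuffle the join
BigTableA
| join kind=inner hint.strategy=shuffle (BigTableB) on Key
// high-cardinality group-by key: shuffle the summarize
BigTableA
| summarize hint.strategy=shuffle count() by Key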
Can someone explain why these two queries (sometimes) cause errors? I googled some explanations, but none of them were right. I don't want to fix it; these queries are actually meant to be used for a SQL injection attack (error-based SQL injection, I think). The triggered error should be "duplicate entry". I'm trying to find out why they sometimes cause errors.
Thanks.
select count(*)
from information_schema.tables
group by concat(version(), floor(rand()*2));

select count(*), concat(version(), floor(rand()*2)) x
from information_schema.tables
group by x;
It seems the second one is trying to guess which database the victim of the injection is using.
The second one is giving me this:
+----------+------------------+
| count(*) | x |
+----------+------------------+
| 88 | 10.1.38-MariaDB0 |
| 90 | 10.1.38-MariaDB1 |
+----------+------------------+
Okay, I'm going to post an answer - and it's more of a frame challenge to the question itself.
Basically: this query is silly, and it should be rewritten; find out what it's supposed to do and rewrite it in a way that makes sense.
What does the query currently do?
It looks like it's getting a count of the tables in the current database... except it's grouping by a calculated column. And that column takes version() and appends either a '0' or a '1' to it (chosen randomly).
So the end result? Two rows, each with a numerical value, whose sum adds up to the total number of tables in the current database. If there are 30 tables, you might get 13/17 one time, 19/11 the next, followed by 16/14.
I have a hard time believing that this is what the query is supposed to do. So instead of just trying to fix the "error" - dig in and figure out what piece of data it should be returning - and then rewrite the proc to do it.
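If the intent really is just the total table count alongside the server version (an assumption; check the original requirement), a sane rewrite could be as simple as:
-- one row: server version plus total table count, no random grouping
select version(), count(*)
from information_schema.tables;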
I am using application insights to monitor API usage in my application. I am trying to generate a report to list down how many times a particular API was called over the last 2 months. Here is my query
requests
| where timestamp >= ago(24*60h)
| summarize count() by name
| order by count_ desc
The problem is that the 'name' of the API also has parameters attached to the URL, so the same API appears many times in the result set with different parameters (e.g. GET api/getTasks/1, GET api/getTasks/2). I looked through the 'requests' schema to check whether there is a column with the API name minus the parameters, but couldn't find one. Is there a way to group by 'name' without parameters in Application Insights? Please help with the query. Thanks so much in advance.
This cuts everything after the second slash:
requests
| where timestamp > ago(1d)
| extend idx = indexof(name, "/", indexof(name, "api/") + 4)
| extend strippedname = iff(idx >= 0, substring(name, 0, idx), name)
| summarize count() by strippedname
| order by count_
Another approach (if the API surface is small) is to map names to fixed values through nested iif operators.
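A minimal sketch of that approach, assuming hypothetical endpoints getTasks and getUsers:
requests
| where timestamp > ago(1d)
| extend strippedname = iif(name startswith "GET api/getTasks", "GET api/getTasks",
                        iif(name startswith "GET api/getUsers", "GET api/getUsers",
                            name))
| summarize count() by strippedname
| order by count_ desc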
I need to run the following query on a Django model:
SELECT *
FROM app_model
WHERE GREATEST(field1, fixed_value) < LEAST(field2, another_fixed_value)
Is there any way for me to run this query without resorting to the raw() method?
You can at least avoid raw() by using extra(). I don't think the ORM otherwise exposes GREATEST or LEAST.
In theory you could break down your constraint into its different possibilities and OR them together:
from django.db.models import F, Q

mymodel.objects.filter(
    Q(field1__gt=fixed_value, field2__lt=another_fixed_value, field1__lt=F('field2')) |
    Q(field1__lte=fixed_value, field2__lt=another_fixed_value, field2__gt=fixed_value) |
    Q(field1__gt=fixed_value, field2__gte=another_fixed_value, field1__lt=another_fixed_value) |
    Q(field1__lte=fixed_value, field2__gte=another_fixed_value)  # plus fixed_value < another_fixed_value
)
Except obviously you wouldn't actually include that fixed_value < another_fixed_value check in the query, since it doesn't involve any fields. If the values are literally fixed and you know them when writing the code, you'd evaluate that check yourself and just keep the two field comparisons in the last Q object (or drop it entirely); if you don't know them, only build the last Q object and OR it into the query when the check passes.
That said, that's horrible and I think extra is a much better choice.
mymodel.objects.extra(where=['GREATEST(field1, fixed_value) < LEAST(field2, another_fixed_value)'])
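On Django 1.9+ you could also reach for the Greatest and Least database functions, which avoids both raw() and extra(); a sketch, assuming the fields and constants are comparable types:
from django.db.models import F, Value
from django.db.models.functions import Greatest, Least

# annotate the two bounds, then compare them in the database
mymodel.objects.annotate(
    lo=Greatest(F('field1'), Value(fixed_value)),
    hi=Least(F('field2'), Value(another_fixed_value)),
).filter(lo__lt=F('hi'))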
Take a look at field lookups:
Model.objects.filter(id__gt=4)
is equivalent to:
SELECT ... WHERE id > 4;
less than
Model.objects.filter(id__lt=4)
is equivalent to:
SELECT ... WHERE id < 4;