Passing a column as a function parameter in Kusto Azure Log Analytics doesn't work - azure-log-analytics

I want to calculate in Kusto (Azure Log Analytics), based on a date, the number of days without weekends in a month.
This works (using now() as the parameter in the daysOfMonthNoWeekends function call):
let daysOfMonthNoWeekends=(_event_date_t:datetime) {
toscalar(range days from startofmonth(_event_date_t) to endofmonth(_event_date_t) step 1d
| where dayofweek(days) between(1d .. 5d)
| count)
};
//
MyTable_CL
| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(now())
And this doesn't work:
let daysOfMonthNoWeekends=(_event_date_t:datetime) {
toscalar(range days from startofmonth(_event_date_t) to endofmonth(_event_date_t) step 1d
| where dayofweek(days) between(1d .. 5d)
| count)
};
//
MyTable_CL
| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(TimeGenerated)
//or with another column of MyTable like event_date_t fails too
//| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(event_date_t)
Error:
Semantic error: '' has the following semantic error: Unresolved reference binding: 'TimeGenerated'.
For the record, I intend to add a column with the number of days without weekends in a month, based on a date column, to use it in another calculation.
Any idea why this doesn't work?

the reason this doesn't work is documented here: User-defined functions usage restrictions
specifically:
User-defined functions can't pass into toscalar() invocation information that depends on the row-context in which the function is called.
you should be able to achieve your intention using a join/lookup.
for example (caveat: test that this actually works with your data. i 'compiled' it in my head at an early morning hour):
let T = datatable(TimeGenerated:datetime)
[
datetime(2020-02-11 11:20),
datetime(2020-04-11 11:30),
datetime(2020-05-12 19:20),
datetime(2020-05-13 19:20),
datetime(2020-04-13 19:20),
datetime(2020-01-11 17:20),
]
;
let daysOfMonthNoWeekends =
range dt from startofmonth(toscalar(T | summarize min(TimeGenerated))) to endofmonth(toscalar(T | summarize max(TimeGenerated))) step 1d
| summarize countif(dayofweek(dt) between(1d .. 5d)) by month = startofmonth(dt)
;
T
| extend month = startofmonth(TimeGenerated)
| lookup daysOfMonthNoWeekends on month
| project-away month
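As a quick sanity check on the weekday-counting logic (this sketch is mine, not part of the answer above), here is a plain-Python equivalent of the KQL filter dayofweek(days) between (1d .. 5d); weekdays_in_month is a hypothetical helper name:
import calendar
from datetime import date, timedelta

def weekdays_in_month(d: date) -> int:
    # Count Monday..Friday days in d's month, mirroring the KQL
    # filter dayofweek(days) between (1d .. 5d).
    first = d.replace(day=1)
    last = d.replace(day=calendar.monthrange(d.year, d.month)[1])
    return sum(
        1
        for i in range((last - first).days + 1)
        if (first + timedelta(days=i)).weekday() < 5  # 0=Mon .. 4=Fri
    )

print(weekdays_in_month(date(2020, 2, 11)))  # 20 weekdays in February 2020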

Related

SQL - How to get rows within a date period that are within another date period?

I have the following table in the database:
On the other side, I have an interface with start and end filter parameters.
So I want to understand how to query the table to get only the rows whose period is within the values introduced by the user.
Next I present the 3 possible scenarios. If I need to create one query per scenario, that is OK:
Scenario 1: If the user only defines start = 03/01/2021, then the expected output should be rows with id 3, 5 and 6.
Scenario 2: If the user only defines end = 03/01/2021, then the expected output should be rows with id 1 and 2.
Scenario 3: If the user defines start = 03/01/2021 and end = 05/01/2021, then the expected output should be rows with id 3 and 5.
Hope that makes sense.
Thanks
I will assume that start_date and end_date here are DateFields [Django-doc], and that you have a dictionary with 'start' and 'end' as (optional) keys that map to date objects, so a possible dictionary could be:
# scenario 3
from datetime import date
data = {
    'start': date(2021, 1, 3),
    'end': date(2021, 1, 5)
}
If you do not want to filter on start and/or end, then either the key is not in the dictionary data, or it maps to None.
You can make a filter with:
filtr = {
    lu: data[ky]
    for ky, lu in (('start', 'start_date__gte'), ('end', 'end_date__lte'))
    if data.get(ky)
}
result = MyModel.objects.filter(**filtr)
This will then filter the MyModel objects to only retrieve MyModels where the start_date and end_date are within bounds.
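To make the behaviour concrete, here is a small self-contained sketch (plain Python, no Django needed) of the filter dictionary that the comprehension produces for each scenario; build_filter is a hypothetical wrapper name:
from datetime import date

def build_filter(data):
    # Map the optional 'start'/'end' keys to Django lookup kwargs,
    # skipping keys that are absent or map to None.
    return {
        lu: data[ky]
        for ky, lu in (('start', 'start_date__gte'), ('end', 'end_date__lte'))
        if data.get(ky)
    }

print(build_filter({'start': date(2021, 1, 3)}))
# scenario 1 -> {'start_date__gte': datetime.date(2021, 1, 3)}
print(build_filter({'end': date(2021, 1, 3)}))
# scenario 2 -> {'end_date__lte': datetime.date(2021, 1, 3)}
print(build_filter({'start': date(2021, 1, 3), 'end': date(2021, 1, 5)}))
# scenario 3 -> both lookups present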

How to use `sum` within `summarize` in a KQL query?

I'm working on logging an Azure Storage Account. I have a Diagnostic Setting applied and am using Log Analytics to write KQL queries.
My goal is to determine the number of GetBlob requests (OperationName) for a given fileSize (RequestBodySize).
The challenge is that I need to sum the RequestBodySize for all GetBlob operations on each file. I'm not sure how to nest sum in summarize.
Tried so far:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)
| render scatterchart
Results in:
Also tried: fileSize = format_bytes(sum(RequestBodySize)) but this errored out.
Any ideas?
EDIT 1: Testing out @Yoni's solution.
Here is an example of RequestBodySize with no summarization:
When implementing the summarize query (| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)), the results are 0 bytes.
Though it's clear there are multiple calls for a given Uri, the sum doesn't seem to be working.
EDIT 2:
And yeah... pays to verify the field names! There is no RequestBodySize field available, only ResponseBodySize. Using the correct value worked (imagine that!).
I need to sum the RequestBodySize for all GetBlob operations on each file
If I understood your question correctly, you could try this:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count(), total_size = format_bytes(sum(RequestBodySize)) by Uri
Here's an example using a dummy data set:
datatable(Url:string, ResponseBodySize:long)
[
"https://something1", 33554432,
"https://something3", 12341234,
"https://something1", 33554432,
"https://something2", 12345678,
"https://something2", 98765432,
]
| summarize count(), total_size = format_bytes(sum(ResponseBodySize)) by Url
Url                 count_  total_size
https://something1  2       64 MB
https://something3  1       12 MB
https://something2  2       106 MB

SSAS MDX Problem with IIF and TAIL() function

Experts,
I am using SQL Server SSAS Standard 2017 and ultimately want to create a calculated member that returns either the last day of the previous month in my data, or the current day if the most recent data is from today.
(=> If today is Aug-31 I want to retrieve Aug-31 from my data; otherwise, if today is e.g. Aug-30, retrieve Jul-31.)
To develop this member I am currently building an MDX query in SQL Server. I am having difficulty understanding what a "tuple set expression" actually is (the TAIL() function is supposed to return a subset, i.e. a set, according to MSDN), and in fact I am receiving errors when playing around with the .Item(0) function on its result. In MSDN I cannot find information about "tuple sets" and how to make them do what I want.
My Date Dimension has a Hierarchy JMT (Year | Month | Day | Date Key of type DATE).
To receive the most current date member of the cross product I am using the TAIL(NONEMPTY(Date...Members, { (DimX.&.. , DimY.&.. , DimZ.&..) })) expression which works fine.
But how do I choose between today's or the previous month's date?
My MDX for testing purposes on February (2) is as follows:
SELECT {
IIF(
TAIL(NONEMPTY([DateDim].[JMT].[T].Members, { ([DimX].[X].&[200], [DimY].[Company].&[4499166], [DateDim].[JMT].[M].&[2020]&[2]) })).Item(0) --.Properties('Date Key', TYPED)
> NOW()
, TAIL(NONEMPTY([DateDim].[JMT].[T].Members, { ([DimX].[X].&[200], [DimY].[Company].&[4499166], [DateDim].[JMT].[M].&[2020]&[1]) }))
, TAIL(NONEMPTY([DateDim].[JMT].[T].Members, { ([DimX].[X].&[200], [DimY].[Company].&[4499166], [DateDim].[JMT].[M].&[2020]&[2]) }))
)
-- ,
-- TAIL(NONEMPTY([DateDim].[JMT].[T].Members, { ([DimX].[X].&[200], [DimY].[Company].&[4499166], [DateDim].[JMT].[M].&[2020]&[2]) })) } on columns
, { [Measures].[Turnover] } on rows
FROM [Finance]
Result:
As you can see, the IIF function does not do what I want. It evaluates .Item(0) as greater than NOW(), therefore returning the "31" member of January (1). Expected: "29" of February.
I guess it might be a problem with the data types and the actual value returned by .Item(0). But when I try to use .Properties('Date Key', TYPED), it states "The Date Key Dimension Attribute could not be found." See the picture below.
In the image of the DateDim it should be "DateDim.JMT" in the blue area ;-).
Do you have any suggestions?
Thank you, Cordt
If you switch this:
TAIL(NONEMPTY([DateDim].[JMT].[T].Members, { ([DimX].[X].&[200], [DimY].[Company].&[4499166], [DateDim].[JMT].[M].&[2020]&[2]) })).Item(0)
to the following does it help?
Tail
(
NonEmpty
(
[DateDim].[JMT].[T].MEMBERS
,{
(
[DimX].[X].&[200]
,[DimY].[Company].&[4499166]
,[DateDim].[JMT].[M].&[2020]&[2]
)
}
)
).Item(0).Item(0).MemberValue

Kettle Datedif month issue

I need to reproduce the Kettle DATEDIF function in the R programming language, specifically the 'datedif month' option. I thought reproducing it would be pretty easy, but I ran into some weird behaviour in Pentaho. As an example:

ID     date_1      date_2      monthly_difference_kettle  daydiff_mysql
15943  31/12/2013  28/07/2014  7                          209
15943  31/12/2011  27/07/2012  6                          209

So in Pentaho Kettle I used the formula step and the function DATEDIF(date2, date1, "m"). As you can see, when I calculate the daily difference in MySQL I get the same number of days (209) for both records; however, when the monthly difference is calculated via the formula step in Pentaho Kettle I get different results in months (7 and 6 respectively). I don't understand how this is calculated...
Can anyone produce the source code for the 'DATEDIF months' function in pentaho? I would like to reproduce it in R so I get exactly the same results.
Thanks in advance,
best regards,
Not sure about MySQL, but I think it is the same. In PostgreSQL, a date difference gives an integer value (in days). That means both rows match exactly in days.
Calculating a month difference is non-trivial. What is a month (28, 30, 31 days)? Shall we count a month that is not complete?
The documentation states: If there is not a complete month between the dates, 0 will be returned.
But from the source code it is easy to understand how DATEDIF is calculated:
Source code available via github https://github.com/pentaho/pentaho-reporting/blob/f7defbcfc0e8f48ad2b139fe9820445f052e0e78/libraries/libformula/src/main/java/org/pentaho/reporting/libraries/formula/function/datetime/DateDifFunction.java
private int addFieldLoop( final GregorianCalendar c, final GregorianCalendar target, final int field ) {
c.set( Calendar.MILLISECOND, 0 );
c.set( Calendar.SECOND, 0 );
c.set( Calendar.MINUTE, 0 );
c.set( Calendar.HOUR_OF_DAY, 0 );
target.set( Calendar.MILLISECOND, 0 );
target.set( Calendar.SECOND, 0 );
target.set( Calendar.MINUTE, 0 );
target.set( Calendar.HOUR_OF_DAY, 0 );
if ( c.getTimeInMillis() == target.getTimeInMillis() ) {
return 0;
}
int count = 0;
while ( true ) {
c.add( field, 1 );
if ( c.getTimeInMillis() > target.getTimeInMillis() ) {
return count;
}
count += 1;
}
}
It appends 1 month to the start date until it becomes greater than the end date, counting the steps.
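The loop is straightforward to port. Here is a minimal sketch of the same algorithm in Python (the question asks for R, but the translation is mechanical); datedif_months and add_one_month are names I made up:
import calendar
from datetime import date

def add_one_month(d: date) -> date:
    # Advance one calendar month, clamping the day-of-month for
    # shorter months, like java.util.GregorianCalendar.add(MONTH, 1).
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def datedif_months(start: date, end: date) -> int:
    # Port of Pentaho's DateDifFunction loop: add one month to the
    # start date until it passes the end date, counting the steps.
    if start == end:
        return 0
    count = 0
    current = start
    while True:
        current = add_one_month(current)
        if current > end:
            return count
        count += 1

# The two sample rows from the question:
print(datedif_months(date(2013, 12, 31), date(2014, 7, 28)))  # 7
print(datedif_months(date(2011, 12, 31), date(2012, 7, 27)))  # 6
This also explains the sample rows: stepping from 31/12/2013 the day clamps to the 28th (February 2014 has 28 days), so the loop lands exactly on 28/07/2014 and counts 7 months; stepping from 31/12/2011 the day clamps to the 29th (February 2012 is a leap month), so 29/07/2012 overshoots 27/07/2012 and the count stops at 6.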

RankOver Partition by with minutes and seconds

I am trying to sequence data, and as it happens there are instances where I have to order the sequence using hours, minutes and seconds. However, when I use the rank/partition by function, it's almost as if it does not recognize this as chronological data at all. An example of the data I am trying to sequence is below:
Mod_Order  Last_Activity  ACTIVITY_DATE_DTTM   hdm_modif_dttm
1          NULL           15/08/2007 00:00:00  59:47.3
2          NULL           27/09/2007 14:30:02  59:22.9
3          NULL           27/11/2007 15:30:02  59:10.5
3          NULL           27/11/2007 15:30:02  58:38.9
As you can see, the last two ACTIVITY_DATE_DTTM values are exactly the same, so I need to go a step further. I removed the date from the hdm_modif_dttm field to see if it made any difference, but it does not (I left it as a time, though, as I figured that makes no difference anyhow). So my code was as follows:
Update q
set q.Mod_Order = b.Mod_Order
from [#Temp_last_act_2] q
left join
(
    select
        RANK() over
            (partition by pathway_id
             order by pathway_id, ACTIVITY_DATE_DTTM, hdm_modif_dttm) as Mod_Order,
        PATHWAY_ID,
        MODIF_DTTM,
        ACTIVITY_DATE_DTTM
    from #temp_Last_act_2
) as b on b.PATHWAY_ID = q.PATHWAY_ID
    and b.MODIF_DTTM = q.MODIF_DTTM
    and b.ACTIVITY_DATE_DTTM = q.ACTIVITY_DATE_DTTM
Is anyone aware of any limitations of this function that I might be missing, or is there a function that may handle this better (or am I being really daft)?