How to use `sum` within `summarize` in a KQL query?

I'm working on logging an Azure Storage Account. It has a Diagnostic Setting applied, and I'm using Log Analytics to write KQL queries.
My goal is to determine the number of GetBlob requests (OperationName) for a given fileSize (RequestBodySize).
The challenge is that I need to sum the RequestBodySize for all GetBlob operations on each file. I'm not sure how to nest sum inside summarize.
Tried so far:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)
| render scatterchart
Results in a table of counts per Uri and fileSize (screenshot omitted), not the total bytes per file.
Also tried: fileSize = format_bytes(sum(RequestBodySize)), but this errored out.
Any ideas?
EDIT 1: Testing out @Yoni's solution.
Here is an example of RequestBodySize with no summarization (screenshot omitted):
When implementing the summarize query (| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)), the results are 0 bytes.
Though it's clear there are multiple calls for a given Uri, the sum doesn't seem to be working.
EDIT 2:
And yeah... it pays to verify the field names! There is no RequestBodySize field available, only ResponseBodySize. Using the correct field worked (imagine that!).
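For reference, a sketch of the final working query, reconstructed from the edits above and the answer below (swapping in ResponseBodySize as noted in EDIT 2):
StorageBlobLogs
| where TimeGenerated >= ago(5h)
    and AccountName == 'storageAccount'
    and OperationName == 'GetBlob'
| summarize count(), total_size = format_bytes(sum(ResponseBodySize)) by Uri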

I need to sum the RequestBodySize for all GetBlob operations on each file
If I understood your question correctly, you could try this:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count(), total_size = format_bytes(sum(RequestBodySize)) by Uri
Here's an example using a dummy data set:
datatable(Url:string, ResponseBodySize:long)
[
"https://something1", 33554432,
"https://something3", 12341234,
"https://something1", 33554432,
"https://something2", 12345678,
"https://something2", 98765432,
]
| summarize count(), total_size = format_bytes(sum(ResponseBodySize)) by Url
Url                | count_ | total_size
https://something1 | 2      | 64 MB
https://something3 | 1      | 12 MB
https://something2 | 2      | 106 MB

Related

Querying a python pickled object in postgresql

I have a table as shown
ID (int) | DATA (bytea)
1 | \x800495356.....
The contents of the data column have been stored via a Python script:
result_dict = {'datapoint1': 100, 'datapoint2': 2.334}
table.data = pickle.dumps(result_dict)
I can easily read the data back using
queried_dict = pickle.loads(table.data)
But I don't know how to query it directly as JSON, or even as plain text, in Postgres alone. I have tried the following query and many versions of it, but it doesn't seem to work:
-- I don't know what should come between SELECT and FROM
SELECT encode(data, 'escape') AS res FROM table WHERE id = 1;
-- I need to get this or somewhere close to this as the query result
res |
{"datapoint1": 100, "datapoint2": 2.33}
Thanks a lot in advance to everyone trying to help.
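Pickle is a Python-specific binary format, so Postgres can't decode it into JSON with encode() or a cast; the practical options are to unpickle it client-side, or to re-store the values as jsonb. A minimal sketch of the client-side route, assuming psycopg2 and placeholder names (results_table, data, id) in place of the real table:
import json
import pickle
import psycopg2

# Placeholder connection string; adjust for your environment.
conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()

# Fetch the raw bytea payload for one row.
cur.execute("SELECT data FROM results_table WHERE id = %s", (1,))
raw = cur.fetchone()[0]            # psycopg2 returns a memoryview for bytea
result_dict = pickle.loads(bytes(raw))

# Render it as JSON (or INSERT it back into a jsonb column from here).
print(json.dumps(result_dict))     # {"datapoint1": 100, "datapoint2": 2.334}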

Passing a column as a function parameter in Kusto / Azure Log Analytics doesn't work

I want to calculate in Kusto (Azure Log Analytics), based on a date, the number of days in a month excluding weekends.
This works (using now() as the parameter in the daysOfMonthNoWeekends function call):
let daysOfMonthNoWeekends=(_event_date_t:datetime) {
toscalar(range days from startofmonth(_event_date_t) to endofmonth(_event_date_t) step 1d
| where dayofweek(days) between(1d .. 5d)
| count)
};
//
MyTable_CL
| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(now())
And this doesn't work:
let daysOfMonthNoWeekends=(_event_date_t:datetime) {
toscalar(range days from startofmonth(_event_date_t) to endofmonth(_event_date_t) step 1d
| where dayofweek(days) between(1d .. 5d)
| count)
};
//
MyTable_CL
| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(TimeGenerated)
//or with another column of MyTable like event_date_t fails too
//| extend daysOfMonthNoWeekends = daysOfMonthNoWeekends(event_date_t)
Error:
Semantic error: '' has the following semantic error: Unresolved reference binding: 'TimeGenerated'.
For the record, my intention is to add a column with the number of days in a month excluding weekends, based on a date column, to use it in another calculation.
Any idea why this doesn't work?
the reason this doesn't work is documented here: User-defined functions usage restrictions
specifically:
User-defined functions can't pass into toscalar() invocation information that depends on the row-context in which the function is called.
you should be able to achieve your intention using a join/lookup.
for example (caveat: check that this actually works with your data; I 'compiled' it in my head at an early morning hour):
let T = datatable(TimeGenerated:datetime)
[
datetime(2020-02-11 11:20),
datetime(2020-04-11 11:30),
datetime(2020-05-12 19:20),
datetime(2020-05-13 19:20),
datetime(2020-04-13 19:20),
datetime(2020-01-11 17:20),
]
;
let daysOfMonthNoWeekends =
range dt from startofmonth(toscalar(T | summarize min(TimeGenerated))) to endofmonth(toscalar(T | summarize max(TimeGenerated))) step 1d
| summarize countif(dayofweek(dt) between(1d .. 5d)) by month = startofmonth(dt)
;
T
| extend month = startofmonth(TimeGenerated)
| lookup daysOfMonthNoWeekends on month
| project-away month

Report Builder Query Text Editor Where Clause Parentheses

My employer has switched data systems and reporting tools. We used to use Report Builder with a nicely built data model that allowed me to do some complex filtering easily. Then we used Business Objects, and though I didn't like it very much, it also let me do some complex filtering. Now we're back to Report Builder, but the data model is different, and the only filtering I seem to be able to do is a string of AND operators.
(Note: I'm self-taught on both Report Builder and Business Objects. I have minimal experience with the SQL coding language itself. Also, actual data labels have been changed in this example.)
I'm pulling from a large amount of data, so I need to filter on the query level. I first need to include data based on five criteria, like this.
SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
Then I need to include data that fits one of two pairings, like this.
(SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
 AND SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A')
OR
(SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
 AND SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A')
After I built my query using the query designer and switched to text mode, it gave me this.
WHERE
SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
AND SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
AND SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
I've tried putting parentheses in, but I must have done it wrong because the query ran for ages before essentially giving me the entire database.
Anybody care to help a SQL newbie?
Presuming everything else is right, it should just be a matter of applying parentheses to get the logic right: AND binds more tightly than OR, so the Retail and Wholesale pairings each need their own parentheses, OR'd together inside one outer group. Using slightly exaggerated whitespace to try and make it clear:
WHERE
SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
AND (
(SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail')
OR
(SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale')
)
(It still may run forever, but that's more a factor of database size and indexing.)

BigQuery - Adwords Data Transfer - AccountStats vs AccountBasicStats

For many tables, there is always both an AccountStats and an AccountBasicStats variant.
The same SQL query might return different values from Stats vs. BasicStats, for example:
SELECT
cs.Date,
SUM(cs.Impressions) AS Sum_Impressions,
SUM(cs.Clicks) AS Sum_Clicks,
SUM(cs.Interactions) AS Sum_Interactions,
(SUM(cs.Cost) / 1000000) AS Sum_Cost,
SUM(cs.Conversions) AS Sum_Conversions
FROM
`{dataset_id}.Customer_{customer_id}` c
LEFT JOIN
`{dataset_id}.AccountBasicStats_{customer_id}` cs
-- or, swapping in:
-- `{dataset_id}.AccountStats_{customer_id}` cs
ON
c.ExternalCustomerId = cs.ExternalCustomerId
WHERE
c._DATA_DATE = c._LATEST_DATE
AND c.ExternalCustomerId = {customer_id}
GROUP BY
1
ORDER BY
1
It seems the main difference is the ClickType column, which might double-count, based on the documentation: ClickType.
BasicStats seems the most accurate and matches AdWords exactly, while Stats shows around a 2x-3x increase in impressions.
Is there a way to transform the data so that both queries return the same results?
I ask because there are no BasicStats tables for hourly data, which is what I'm interested in.
According to https://groups.google.com/forum/#!topic/adwords-api/QiY_RT9aNlM, it seems there is no way to de-segment the data after ClickType is brought in.
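One way to at least see where the extra impressions come from is to break the AccountStats totals down by the ClickType segment mentioned in the question; a minimal sketch, assuming the table exposes ClickType alongside the columns used above:
SELECT
  cs.Date,
  cs.ClickType,
  SUM(cs.Impressions) AS Sum_Impressions,
  SUM(cs.Clicks) AS Sum_Clicks
FROM
  `{dataset_id}.AccountStats_{customer_id}` cs
GROUP BY
  1, 2
ORDER BY
  1, 2
Each Date should then appear once per ClickType, which is what inflates the summed impressions relative to AccountBasicStats.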

Select Last movement for each ID in Access

I am working with an MS Access table and would like a query that returns all the information for the last entry of a certain ID.
My table (DEPOSIT_MOVEMENTS) is the following:
MOV_CODE | WORK | DEPOSIT_CODE | TYPE | DATE | DESTINATION
For each DEPOSIT_CODE I am looking to obtain the latest record (by DATE) and its MOV_CODE, so that I can get the DESTINATION of the item.
A DEPOSIT_CODE may have many MOV_CODEs on different dates.
I have tried different options posted on Stack Overflow, but I could not get any of them to work properly.
Right now I am trying GROUP BY, but cannot get it working.
SELECT t1.[DEPOSIT_CODE], MAX(t1.[DATE]), t1.[MOV_CODE]
FROM [DEPOSIT_MOVEMENTS] AS t1
GROUP BY t1.[DEPOSIT_CODE];
Any help or guidance is welcome.
Kind regards,
Here is one method:
SELECT dm.*
FROM [DEPOSIT_MOVEMENTS] AS dm
WHERE dm.DATE = (SELECT MAX(dm2.DATE)
FROM [DEPOSIT_MOVEMENTS] AS dm2
WHERE dm2.DEPOSIT_CODE = dm.DEPOSIT_CODE
);
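An equivalent formulation joins against the per-deposit maximum date instead of using a correlated subquery; a sketch, with the derived-table alias and the LAST_DATE name being my own (note that both versions return multiple rows for a DEPOSIT_CODE if two movements share the same latest DATE):
SELECT dm.*
FROM [DEPOSIT_MOVEMENTS] AS dm
INNER JOIN (
    SELECT dm2.DEPOSIT_CODE, MAX(dm2.[DATE]) AS LAST_DATE
    FROM [DEPOSIT_MOVEMENTS] AS dm2
    GROUP BY dm2.DEPOSIT_CODE
) AS latest
    ON dm.DEPOSIT_CODE = latest.DEPOSIT_CODE
   AND dm.[DATE] = latest.LAST_DATE;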