Adding Where condition on Timestamp yields odd aggregated results - azure-log-analytics

I'm looking into Azure Monitor queries for the first time, and can't understand why adding this line:
| where timestamp <= ago(1days)
makes the query results "de-aggregated."
Screenshots of the 2 separate queries/results:
Desired Output
Undesired Output

The comparison you should be using is timestamp >= ago(1d), which picks the rows whose timestamp falls within the last 24 hours.
Below is a sample:
requests
| where timestamp >= ago(1d)
| summarize C = count() by itemType
Output from Explorer with the timestamp filter inside the query:
requests
| summarize C = count() by itemType
Output from Explorer with the time range applied via the Time Range picker:
Documentation reference for using ago()
Hope this helps!

Related

Indexing an SQL table by datetime that is scaling

I have a large table that gets anywhere from 1-3 new entries per minute. I need to be able to find records at specific times, which I can do using a SELECT statement, but it's incredibly slow. Let's say the table looks like this:
Device | Date-Time | Data |
-----------------------------------
1 | 2020-01-01 08:00 | 325
2 | 2020-01-01 08:01 | 384
1 | 2020-01-01 08:01 | 175
3 | 2020-01-01 08:01 | 8435
7 | 2020-01-01 08:02 | 784
.
.
.
I'm trying to get data like this:
SELECT *
FROM table
WHERE Date-Time = '2020-01-01 08:00' AND Device = '1'
I also need to get data like this:
SELECT *
FROM table
WHERE Date-Time > '2020-01-01 08:00' AND Date-Time < '2020-01-10 08:00' AND Device = '1'
But I don't know what the Date-Time will be until requested. In this case, I will have to search the entire table for these times. Can I index the start of the day so I know where dates are?
Is there a way to index this table in order to dramatically decrease the query time? Or is there a better way to achieve this?
I have tried indexing the Date-Time column, but it did not decrease the query time at all.
For this query:
SELECT *
FROM mytable
WHERE date_time = '2020-01-01 08:00' AND device = 1
You want an index on mytable(date_time, device). This matches the columns that come into play in the WHERE clause, so the database should be able to look up the matching rows efficiently.
Note that I removed the single quotes around the literal value given to device: if this is an integer, as it looks like, then it should be treated as such.
The ordering of the columns in the index matters; generally, you want the most restrictive column first. From the description of your question, this would probably be date_time, hence the above suggestion. You might want to try the other way around as well (so: mytable(device, date_time)).
Another thing to keep in mind from a performance perspective: you should probably enumerate the columns you want in the SELECT clause. If you just want a few additional columns, it can be useful to add them to the index as well; this gives you a covering index, which the database can use to execute the whole query without even looking back at the table data.
Say:
SELECT date_time, device, col1, col2
FROM mytable
WHERE date_time = '2020-01-01 08:00' AND device = 1
Then consider:
mytable(date_time, device, col1, col2)
Or:
mytable(device, date_time, col1, col2)
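For reference, the corresponding index definitions might look like the sketch below (the index names are just examples, and col1/col2 are the placeholder columns from above):
-- composite index matching the WHERE clause of the first query
CREATE INDEX idx_mytable_datetime_device ON mytable (date_time, device);
-- covering variant that also includes the selected columns
CREATE INDEX idx_mytable_device_datetime_cov ON mytable (device, date_time, col1, col2);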
You can add a TimeInMilliseconds column, populate it with the number of milliseconds since 1970 (the Unix epoch), and create an index on that column. TimeInMilliseconds will almost always be unique, and the index on it will help the database locate matching rows faster.
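If you want to experiment with that idea, a minimal sketch (assuming PostgreSQL and reusing the mytable/date_time names from the previous answer; the new column and index names are made up for illustration) could be:
ALTER TABLE mytable ADD COLUMN time_in_milliseconds BIGINT;
-- populate from the existing datetime column (milliseconds since 1970-01-01 UTC)
UPDATE mytable
SET time_in_milliseconds = (EXTRACT(EPOCH FROM date_time) * 1000)::BIGINT;
CREATE INDEX idx_mytable_time_ms ON mytable (time_in_milliseconds);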

Calculating average with biginteger time intervals using TimescaleDB

I have a schema with the following fields:
Name of row | Type
--------------------------+--------
name | string
value1 | numeric
timestamp | bigint
The rows contain entries with a name, a numeric value, and a bigint value storing the Unix timestamp in nanoseconds. I am using TimescaleDB and would like to use time_bucket_gapfill to retrieve the data. Given that the timestamps are stored as bigint, this is quite cumbersome.
I would like to get aggregated data for these intervals: 5 min, hour, day, week, month, quarter, year. I have managed to make it work using a normal time_bucket, but now I would like to fill the gaps as well. I am using the following query now:
SELECT COALESCE(COUNT(*), 0), COALESCE(SUM(value1), 0), time_bucket_gapfill('5 min', date_trunc('quarter', to_timestamp(timestamp/1000000000)), to_timestamp(1599100000), to_timestamp(1599300000)) AS bucket
FROM playground
WHERE name = 'test' AND timestamp >= 1599100000000000000 AND timestamp <= 1599300000000000000
GROUP BY bucket
ORDER BY bucket ASC
This returns the values correctly, but does not fill the empty spaces. If I modified my query to
time_bucket_gapfill('5 min',
    date_trunc('quarter', to_timestamp(timestamp/1000000000)),
    to_timestamp(1599100000),
    to_timestamp(1599200000))
I would get the first entry correctly and then empty rows every 5 minutes. How could I make it work? Thanks!
Here is a DB fiddle, but it doesn't work as it doesn't support TimescaleDB. The query above returns the following:
 coalesce | coalesce |         avg_val
----------+----------+------------------------
        3 |      300 | 2020-07-01 00:00:00+00
        0 |        0 | 2020-09-03 02:25:00+00
        0 |        0 | 2020-09-03 02:30:00+00
You should use datatypes in your time_bucket_gapfill call that match the datatypes in your table. The following query should get you what you are looking for:
SELECT
COALESCE(count(*), 0),
COALESCE(SUM(value1), 0),
time_bucket_gapfill(300E9::BIGINT, timestamp) AS bucket
FROM
t
WHERE
name = 'example'
AND timestamp >= 1599100000000000000
AND timestamp < 1599200000000000000
GROUP BY
bucket;
I have managed to solve it by building on Sven's answer. It first uses his query to fill the gaps, and then date_trunc is applied to collapse the extra rows:
WITH gapfill AS (
SELECT
COALESCE(count(*), 0) as count,
COALESCE(SUM(value1), 0) as sum,
time_bucket_gapfill(300E9::BIGINT, timestamp) as bucket
FROM
playground
WHERE
name = 'test'
AND timestamp >= 1599100000000000000
AND timestamp < 1599300000000000000
GROUP BY
bucket
)
SELECT
SUM(count),
SUM(sum),
date_trunc('quarter', to_timestamp(bucket/1000000000)) as truncated
FROM
gapfill
GROUP BY truncated
ORDER BY truncated ASC

Azure Log Analytic > Same query but different results when pinned on a Dashboard

The query below returns a table with the corresponding values:
union (traces), (customEvents)
| where timestamp <= now()
| summarize Users=dcount(user_AuthenticatedId) by Country=client_CountryOrRegion
| sort by Users desc
Results:
When pinning the query to the dashboard, I see different results:
The only difference that I can see is the time range set directly on the dashboard. I set it to a custom range, 2016-07-06 to now, to simulate the same range as in the query. I have checked, and I only have logs from 2019 anyway.
Has anyone a clue?
Whenever I have seen this, it has been due to time slicing. You could add min and max timestamp values to the query in order to understand the exact ranges:
union (traces), (customEvents)
| where timestamp <= now()
| summarize Users=dcount(user_AuthenticatedId), FirstRecord=min(timestamp), LastRecord=max(timestamp) by Country=client_CountryOrRegion
| sort by Users desc

Using crosstab, dynamically loading column names of resulting pivot table in one query?

The gem we have installed (Blazer) on our site limits us to one query.
We are trying to write a query to show how many hours each employee has for the past 10 days. The first column would have employee names and the rest would have hours with the column header being each date. I'm having trouble figuring out how to make the column headers dynamic based on the day. The following is an example of what we have working without dynamic column headers and only using 3 days.
SELECT
pivot_table.*
FROM
crosstab(
E'SELECT
"User",
"Date",
"Hours"
FROM
(SELECT
"q"."qdb_users"."name" AS "User",
to_char("qdb_works"."date", \'YYYY-MM-DD\') AS "Date",
sum("qdb_works"."hours") AS "Hours"
FROM
"q"."qdb_works"
LEFT OUTER JOIN
"q"."qdb_users" ON
"q"."qdb_users"."id" = "q"."qdb_works"."qdb_user_id"
WHERE
"qdb_works"."date" > current_date - 20
GROUP BY
"User",
"Date"
ORDER BY
"Date" DESC,
"User" DESC) "x"
ORDER BY 1, 2')
AS
pivot_table (
"User" VARCHAR,
"2017-10-06" FLOAT,
"2017-10-05" FLOAT,
"2017-10-04" FLOAT
);
This results in
| User | 2017-10-05 | 2017-10-04 | 2017-10-03 |
|------|------------|------------|------------|
| John | 1.5 | 3.25 | 2.25 |
| Jill | 6.25 | 6.25 | 6 |
| Bill | 2.75 | 3 | 4 |
This is correct, but tomorrow the column headers will be off unless we update the query every day. I know we could pivot this table with date on the left and names on the top, but that would still need updating with each new employee, and we get new ones often.
We have tried using functions and queries in the "AS" section with no luck. For example:
AS
pivot_table (
"User" VARCHAR,
current_date - 0 FLOAT,
current_date - 1 FLOAT,
current_date - 2 FLOAT
);
Is there any way to pull this off with one query?
You could select a row for each user, and then per column sum the hours for one day:
with user_work as
(
select u.name as "user"
, to_char(w.date, 'YYYY-MM-DD') as dt_str
, w.hours
from qdb_works w
join qdb_users u
on u.id = w.qdb_user_id
where w.date >= current_date - interval '2 days'
)
select "user"
, sum(case when dt_str = to_char(current_date,
  'YYYY-MM-DD') then hours end) as Today
, sum(case when dt_str = to_char(current_date - interval '1 day',
  'YYYY-MM-DD') then hours end) as Yesterday
, sum(case when dt_str = to_char(current_date - interval '2 days',
  'YYYY-MM-DD') then hours end) as DayBeforeYesterday
from user_work
group by
"user"
It's often easier to return a list and pivot it client side. That also allows you to generate column names with a date.
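For illustration, the un-pivoted list form could be as simple as the sketch below (against the same qdb_works/qdb_users tables, schema qualification omitted; the client then turns the rows into columns and labels them with the dates):
-- one row per user and day for the last 10 days; pivoting happens client side
SELECT u.name AS "user",
       to_char(w.date, 'YYYY-MM-DD') AS day,
       sum(w.hours) AS hours
FROM qdb_works w
JOIN qdb_users u ON u.id = w.qdb_user_id
WHERE w.date >= current_date - interval '10 days'
GROUP BY u.name, to_char(w.date, 'YYYY-MM-DD')
ORDER BY day DESC, "user";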
Is there any way to pull this off with one query?
No, because a fixed SQL query cannot have any variability in its output columns. The SQL engine determines the number, types and names of every column of a query before executing it, without reading any data except in the catalog (for the structure of tables and other objects), execution being just the last of 5 stages.
A single-query dynamic pivot, if such a thing existed, couldn't be prepared, since a prepared query always has the same result structure, whereas by definition a dynamic pivot doesn't: the rows that pivot into columns can change between executions. That would again be at odds with the Prepare-Bind-Execute model.
You may find some limited workarounds and additional explanations in other questions, for example: Execute a dynamic crosstab query. But since you mentioned specifically:
The gem we have installed (Blazer) on our site limits us to one query
I'm afraid you're out of luck. Whatever the workaround, it always needs a first step with a query to figure out the columns and generate a dynamic query from them, and a second step executing the query generated in the previous step.
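To make the two-step idea concrete, the first step could look something like the sketch below (assuming PostgreSQL's format and string_agg; the resulting text would then be spliced into the crosstab column list and run as a second query, which is exactly what a single-query tool like Blazer cannot do):
-- build the crosstab column-definition list for the last 10 days
SELECT string_agg(format('%I FLOAT', d::date), ', ' ORDER BY d DESC) AS column_defs
FROM generate_series(current_date - 9, current_date, interval '1 day') AS d;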

Oracle date column showing the wrong value

I'm trying to identify a problem with a date column in my table.
The database is Oracle 11g.
The situation is:
When I run the following query:
select to_char(data_val, 'DD/MM/YYYY'), a.data_val from material a order by a.data_val asc;
the first five lines of the result are:
00/00/0000 | 29/06/5585 00:00:00
00/00/0000 | 29/06/5585 00:00:00
00/00/0000 | 29/06/5585 00:00:00
11/11/1111 | 11/11/1111 00:00:00
01/01/1500 | 01/01/1500 00:00:00
The question is:
Why does the to_char function return a different date value (00/00/0000) for the first three lines?
And why is the date 29/06/5585 the first result of an ASC date order by? That would only be right with order by data_val DESC, wouldn't it?
We've encountered the same problem. I can confirm that the "date" column is indeed the DATE type.
The date in question is 01-May-2014, so it's most likely not related to the big year number in the original post. And when you perform some calculation with the date, the problem is fixed, i.e. TO_CHAR(datum) would be all zeros, TO_CHAR(datum + 1) would be as expected, and even TO_CHAR(datum +1 -1) would be correct. (TO_CHAR(datum+0) doesn't help :))
Based on the DUMP value it seems that the problem is that we've somehow managed to store 31-Apr-2014 rather than 01-May-2014 (investigating now how that was possible; Informatica + Oracle 11.2, I believe).
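For anyone hitting the same thing, one way to flag such corrupted values (a sketch reusing the material/data_val names from the question) is to compare the rendered date with the result after forcing date arithmetic, and to inspect the raw bytes with DUMP:
-- rows where the two renderings differ are the suspect ones
SELECT a.data_val,
       DUMP(a.data_val) AS raw_bytes,
       TO_CHAR(a.data_val, 'DD/MM/YYYY') AS as_stored,
       TO_CHAR(a.data_val + 1 - 1, 'DD/MM/YYYY') AS after_arithmetic
FROM material a
WHERE TO_CHAR(a.data_val, 'DD/MM/YYYY') <> TO_CHAR(a.data_val + 1 - 1, 'DD/MM/YYYY');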