Trino SQL: calculate any time unit from a given timestamp

I am exploring some legacy code and came upon this Trino SQL query that calculates and groups by a given period (e.g. 86400 for a day, 3600 for an hour) from a timestamp field.
SELECT
COUNT(*) count,
(floor(((to_unixtime(s.uploadedon) - 60*(-345)) / 86400)) * 86400)*1000 as sn_period_day
from as.prod_views.sessions as s
where ...
group by sn_period_day
So I understand that (to_unixtime(s.uploadedon) - 60*(-345)) adds or subtracts the time offset from a given UTC timestamp field, s.uploadedon, where 345 is in minutes.
But I am confused by the rest of (floor(((to_unixtime(s.uploadedon) - 60*(-345)) / 86400)) * 86400)*1000.
Also 86400 is used for the day calculation. Similarly (floor(((to_unixtime(s.uploadedon) - 60*(-345)) / 3600)) * 3600)*1000 for hour calculation.
Can anyone help me understand this computation logic?
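A minimal sketch of the arithmetic in Python (not Trino), assuming the intent is to bucket timestamps by local day or hour. Subtracting 60*(-345) adds 20,700 seconds (5 h 45 min); dividing by the period and flooring counts the complete periods since the epoch; multiplying back by the period gives the start of the bucket in seconds; and *1000 converts that to milliseconds. The function name bucket_start_ms and the sample timestamp are made up for illustration.

```python
import math

def bucket_start_ms(unix_seconds, period_seconds, offset_minutes=345):
    """Mirror of floor((ts - 60*(-offset)) / period) * period * 1000."""
    shifted = unix_seconds - 60 * (-offset_minutes)        # same as adding offset_minutes * 60
    whole_periods = math.floor(shifted / period_seconds)   # complete periods since the epoch
    return whole_periods * period_seconds * 1000           # bucket start, in milliseconds

# Hypothetical sample: 2024-01-01 20:00:00 UTC, which is 2024-01-02 01:45:00 at UTC+5:45.
ts = 1704139200
day_bucket = bucket_start_ms(ts, 86400)   # start of the shifted day
hour_bucket = bucket_start_ms(ts, 3600)   # start of the shifted hour
```

Note that the result stays in the shifted time scale: the day bucket is local midnight expressed as if it were a UTC epoch value, so every row from the same local day gets the same sn_period_day.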

Related

Get data when date is equal to or greater than 90 days ago

I wonder if anyone here can help with a BigQuery piece I am working on.
I'm trying to pull the date, email and last interaction time from a dataset when the last interaction time is equal to or greater than 90 days ago.
I have the following query:
SELECT
date,
user_email,
DATE_FROM_UNIX_DATE(gmail.last_interaction_time) AS Last_Interaction_Date,
DATE_ADD(CURRENT_DATE(), INTERVAL -90 DAY) AS Days_ago
FROM
`bqadminreporting.adminlogtracking.usage`
WHERE
'Last_Interaction_Date' >= 'Days_ago'
However, I run into the following error:
DATE value is out of allowed range: from 0001-01-01 to 9999-12-31
As far as I can see, the query makes sense, so I'm not entirely sure why it's throwing an error.
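It looks like you have some inconsistent values in the field gmail.last_interaction_time, which you need to handle to avoid the error.
Moreover, the query above will not filter as you expect: the WHERE clause compares two string literals ('Last_Interaction_Date' and 'Days_ago') rather than the columns, and column aliases cannot be referenced in the WHERE clause of the same SELECT. Use a subquery to get the expected output: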
SELECT * FROM
(SELECT
date,
user_email,
DATE_FROM_UNIX_DATE(gmail.last_interaction_time) AS Last_Interaction_Date,
DATE_ADD(CURRENT_DATE(), INTERVAL -90 DAY) AS Days_ago
FROM
`bqadminreporting.adminlogtracking.usage`)
WHERE
Last_Interaction_Date >= Days_ago
Presumably, your problem is DATE_FROM_UNIX_DATE(). Without sample data, it is not really possible to determine what the issue is.
However, you don't need to convert to a date to do this. You can do all the work in the Unix seconds space:
select u.*
from `bqadminreporting.adminlogtracking.usage` u
where gmail.last_interaction_time >= unix_seconds(timestamp(current_date)) - 90 * 60 * 60 * 24
Note that I suspect that the issue is that last_interaction_time is really measured in milliseconds or microseconds or some other unit. This will prevent your error, but it might not do what you want.
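To illustrate that suspicion: DATE_FROM_UNIX_DATE interprets its argument as days since 1970-01-01, so a value that is really in seconds or milliseconds lands far outside the valid date range. Here is a rough Python analogue, where Python's own date limits stand in for BigQuery's and the sample values and helper names are made up:

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)

def date_from_unix_date(days):
    """Python analogue of DATE_FROM_UNIX_DATE: days since 1970-01-01."""
    return EPOCH + timedelta(days=days)

def cutoff_unix_seconds(today, days_back=90):
    """Unix seconds for midnight UTC of `today`, minus `days_back` days."""
    midnight = datetime(today.year, today.month, today.day, tzinfo=timezone.utc)
    return int(midnight.timestamp()) - days_back * 60 * 60 * 24

# A plausible day count converts without trouble.
ok = date_from_unix_date(19000)

# A value that is really Unix seconds overflows the date range.
try:
    date_from_unix_date(1_600_000_000)
    overflowed = False
except OverflowError:
    overflowed = True

# Comparing in the Unix-seconds space avoids the conversion entirely.
cutoff = cutoff_unix_seconds(date(2024, 1, 1))
```

If the column turns out to hold milliseconds or microseconds, the cutoff would need to be scaled to the same unit before comparing.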

PostgreSQL DATE_PART function

I need to find all applications for which sending all documents took no more than 10 minutes. I have written the rest of the code, but I have a problem with the date_part function. The code below works fine, but I have to find another way. How can I do it differently?
abs(
round(
(
date_part('hour',d.received_date)
-
date_part('hour',d.send_date)
) * 60
+
(
date_part('minute', d.received_date)
-
date_part('minute', d.send_date)
)
)
) as sendTime
It's unclear to me whether you want to limit the result to rows where the difference is less than 10 minutes (as stated in the question) or just want to display the difference between two timestamps in minutes (as you stated in the comments).
The expression d.received_date - d.send_date returns an interval, which can easily be converted to minutes. To display the difference, use:
extract(epoch from d.received_date - d.send_date) / 60 as minutes
To limit the result to only rows where the difference is at most 10 minutes, use:
select ...
from ...
where d.received_date - d.send_date <= interval '10' minute
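As a rough illustration of why subtracting the timestamps is safer than combining date_part fields, here is a Python sketch with made-up sample values. The field-based calculation ignores the date, so it goes wrong as soon as the two timestamps fall on different days:

```python
from datetime import datetime, timedelta

def minutes_by_parts(send, received):
    """The date_part approach: uses the hour and minute fields only."""
    return abs((received.hour - send.hour) * 60 + (received.minute - send.minute))

def minutes_by_interval(send, received):
    """The interval approach: subtract the timestamps, then convert to minutes."""
    return (received - send).total_seconds() / 60

# Hypothetical pair that crosses midnight: really 8 minutes apart.
send = datetime(2024, 1, 1, 23, 55)
received = datetime(2024, 1, 2, 0, 3)

by_parts = minutes_by_parts(send, received)
by_interval = minutes_by_interval(send, received)
within_limit = received - send <= timedelta(minutes=10)
```

For this pair the field-based version reports 1432 minutes while the interval version reports 8.0, and only the latter agrees with the 10-minute filter.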

How to SELECT data based on both a date range and a time-of-day range in ClickHouse

I want to filter some data by both yyyymmdd (date) and hhmmss (time), but ClickHouse doesn't support a time type, so I chose datetime to combine them. But how can I do something like this?
This is the DolphinDB code (DolphinDB supports a second type to represent hhmmss):
select avg(ofr + bid) / 2.0 as avg_price
from taq
where
date between 2007.08.05 : 2007.08.07,
time between 09:30:00 : 16:00:00
group by symbol, date
This is the ClickHouse code, but it is logically wrong:
SELECT avg(ofr + bid) / 2.0 AS avg_price
FROM taq
WHERE
time BETWEEN '2007-08-05 09:30:00' AND '2007-08-07 16:00:00'
GROUP BY symbol, toYYYYMMDD(time)
;
How can I express this in SQL the way the DolphinDB code does?
Assuming that you just want to average the trading price during normal trading hours, excluding after-hours trading, here is a possible solution:
SELECT avg(ofr + bid) / 2.0 AS avg_price
FROM taq
WHERE
toYYYYMMDD(time) BETWEEN 20070805 AND 20070807 AND
toYYYYMMDDhhmmss(time)%1000000 BETWEEN 93000 and 160000
GROUP BY symbol, toYYYYMMDD(time)
This filters the taq table within specified date and time.
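The modulo trick can be checked outside ClickHouse. The sketch below rebuilds the two integer encodings in Python (the helper names are mine, not ClickHouse functions) and applies the same two BETWEEN conditions to a few made-up timestamps:

```python
from datetime import datetime

def to_yyyymmdd(ts):
    """Integer like 20070806, as toYYYYMMDD would produce."""
    return ts.year * 10000 + ts.month * 100 + ts.day

def to_yyyymmddhhmmss(ts):
    """Integer like 20070806093000, as toYYYYMMDDhhmmss would produce."""
    return to_yyyymmdd(ts) * 1000000 + ts.hour * 10000 + ts.minute * 100 + ts.second

def in_window(ts):
    date_ok = 20070805 <= to_yyyymmdd(ts) <= 20070807
    # % 1000000 keeps only the last six digits, i.e. hhmmss.
    time_ok = 93000 <= to_yyyymmddhhmmss(ts) % 1000000 <= 160000
    return date_ok and time_ok

opening = in_window(datetime(2007, 8, 6, 9, 30, 0))     # first second of the window
evening = in_window(datetime(2007, 8, 6, 20, 0, 0))     # inside the dates, outside the hours
next_day = in_window(datetime(2007, 8, 8, 10, 0, 0))    # inside the hours, outside the dates
```

The evening row is the one the single BETWEEN on the full datetime in the question would have let through.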

Oddities with postgres SQL [negative date interval and alias that doesn't work only in condition clause]

I'm coming to you guys with two small oddities I can't seem to understand with postgres:
(1)
SELECT "LASTREQUESTED",
(DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
ORDER BY "TIME"
I'm calculating when users can make their next request [once every 8 hours], but for entry 16 I get "1 day -06:20:47" instead of roughly "18:00:00", unlike every other line. [The LASTREQUESTED column is a simple timestamp, and entry 16 is no different from the other entries.] Why is that?
(2)
On the same request, if I try to add a condition on the "TIME" column, the compiler says it doesn't exist although using it to order by is ok. I don't get why.
SELECT (DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
WHERE "TIME" > 0
ORDER BY "TIME";
Question #1: negative hours but positive days?
According to the PostgreSQL documentation, this is a situation where PostgreSQL differs from the SQL standard:
According to the SQL standard all fields of an interval value must have the same sign…. PostgreSQL allows the fields to have different signs….
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. …
You can see a more extreme example of this with the following query:
=# select interval '1 day' - interval '300 hours';
?column?
------------------
1 day -300:00:00
(1 row)
So this is not a single interval in seconds expressed in a strange way; instead, it's an interval of 0 months, +1 day, and -1,080,000.0 seconds. If you are certain that there are no daylight saving time issues with the timestamps that these intervals came from, you can use justify_hours to convert each 24-hour period into a day and get an interval that makes more sense:
=# select justify_hours(interval '1 day' - interval '300 hours');
justify_hours
--------------------
-11 days -12:00:00
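To make the days-plus-seconds model concrete, here is a simplified Python sketch of what justify_hours does to those two fields. It ignores the months field and fractional seconds, and the sign-fixing step reflects my understanding of the behaviour rather than the PostgreSQL source:

```python
SECONDS_PER_DAY = 86400

def justify_hours(days, seconds):
    """Sketch of justify_hours on the (days, seconds) parts of an interval."""
    # Move whole 24-hour chunks from the seconds field into the days field,
    # truncating toward zero so the remainder keeps the sign of `seconds`.
    whole_days = abs(seconds) // SECONDS_PER_DAY
    if seconds < 0:
        whole_days = -whole_days
    days += whole_days
    seconds -= whole_days * SECONDS_PER_DAY
    # If the two fields still disagree in sign, borrow one day.
    if days > 0 and seconds < 0:
        days -= 1
        seconds += SECONDS_PER_DAY
    elif days < 0 and seconds > 0:
        days += 1
        seconds -= SECONDS_PER_DAY
    return days, seconds

# interval '1 day' - interval '300 hours'  ->  (-11, -43200), i.e. -11 days -12:00:00
extreme = justify_hours(1, -300 * 3600)

# The value from the question, "1 day -06:20:47"  ->  (0, 63553), i.e. 17:39:13
from_question = justify_hours(1, -(6 * 3600 + 20 * 60 + 47))
```

Applied to the value from the question, the sketch gives 17:39:13, which is the "18:00:00 ish" figure that was expected.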
Question #2: SELECT columns can't be used in WHERE?
This is standard PostgreSQL behavior. See this duplicate question. Solutions presented there include:
Repeat the expression twice, once in the SELECT list, and again in the WHERE clause. (I've done this more times than I want to remember…)
SELECT (my - big * expression) AS x
FROM stuff
WHERE (my - big * expression) > 5
ORDER BY x
Create a subquery without that WHERE filter, and put the WHERE conditions in the outer query
SELECT *
FROM (SELECT (my - big * expression) AS x
FROM stuff) AS subquery
WHERE x > 5
ORDER BY x
Use a WITH statement to achieve something similar to the subquery trick.
I don't know exactly why it's calculated this way (maybe because you subtract an interval from another interval), but when you change the calculation to timestamp minus timestamp it works as expected:
DATE_TRUNC('seconds', CURRENT_TIMESTAMP - ("LASTREQUESTED" + INTERVAL '8 hours'))
See Fiddle
Regarding #2: Based on Standard SQL the columns in the Select-list are calculated after FROM/WHERE/GROUP BY/HAVING, but before ORDER, that's why you can't use an alias in WHERE. There are some good articles on that topic written by Itzik Ben-Gan (based on MS SQL Server, but similar for PostgreSQL).

Get data that is no more than an hour old in BigQuery

Trying to use the statement:
SELECT *
FROM data.example
WHERE TIMESTAMP(timeCollected) < DATE_ADD(USEC_TO_TIMESTAMP(NOW()), 60, 'MINUTE')
to get data from my BigQuery table. It seems to return the same set of results even when the time is not within the range. timeCollected has the format 2015-10-29 16:05:06.
I'm trying to build a query that returns only data that is not older than an hour. So data collected within the last hour should be returned, and the rest should be ignored.
Using Standard SQL:
SELECT * FROM data
WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -60 MINUTE)
The query you made means "return to me anything that has a collection time smaller than an hour in the future" which will literally mean your whole table. You want the following (from what I got through your comment, at least) :
SELECT *
FROM data.example
WHERE TIMESTAMP(timeCollected) > DATE_ADD(USEC_TO_TIMESTAMP(NOW()), -60, 'MINUTE')
This means that any timeCollected that is NOT greater than an hour ago will not be returned. I believe this is what you want.
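The difference between the two conditions is easy to see on a handful of made-up rows. This Python sketch fixes "now" at a hypothetical 2015-10-29 17:00:00 and applies both filters:

```python
from datetime import datetime, timedelta

# Hypothetical current time and sample timeCollected values.
now = datetime(2015, 10, 29, 17, 0, 0)
rows = [
    datetime(2015, 10, 29, 16, 5, 6),   # 55 minutes old
    datetime(2015, 10, 29, 12, 0, 0),   # 5 hours old
    datetime(2015, 10, 28, 16, 5, 6),   # more than a day old
]

# Original condition: earlier than one hour in the future (matches every past row).
original = [t for t in rows if t < now + timedelta(minutes=60)]

# Corrected condition: later than one hour in the past.
corrected = [t for t in rows if t > now - timedelta(minutes=60)]
```

The original filter keeps all three rows, while the corrected one keeps only the row from 16:05:06.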
Also, unless you need it, Select * is not ideal in BigQuery. Since the data is saved by column, you can save money by selecting only what you need down the line. I don't know your use case, so * may be warranted though
To get table data added within the last hour, use a range decorator (legacy SQL):
SELECT * FROM [data.example@-3600000--1]
https://cloud.google.com/bigquery/table-decorators
Using Standard SQL:
SELECT * FROM data WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60 MINUTE)