BETWEEN operator in BigQuery Standard SQL - google-bigquery

Using Standard SQL in BigQuery, as part of a task I want to search for records created between 2 PM the previous day and 2 PM on the current day.
I have found that
SELECT DATETIME_SUB(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 10 HOUR)
gives me 2 PM yesterday, and
SELECT DATETIME_ADD(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 14 HOUR)
gives me 2 PM today.
So I assumed I could use these in my query:
SELECT * FROM TableA
WHERE CreatedDate BETWEEN
DATETIME_SUB(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 10 HOUR)
AND DATETIME_ADD(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 14 HOUR)
However, I get the following error:
No matching signature for operator BETWEEN for argument types:
TIMESTAMP, DATETIME, DATETIME. Supported signature: (ANY) BETWEEN
(ANY) AND (ANY)
Where am I going wrong?

The issue is that CreatedDate is a TIMESTAMP while both bounds are DATETIME values, so you need to convert CreatedDate to a DATETIME.
It could be like:
where DATETIME(CreatedDate) Between ...
Alternatively, you could just as easily build the bounds as TIMESTAMP values instead:
SELECT TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY), INTERVAL 10 HOUR)
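The truncate-then-shift arithmetic behind both bounds can be checked outside the database. Here is a minimal Python sketch of it; the function name `two_pm_window` and the fixed example instant are assumptions made for illustration, not part of the original query.

```python
from datetime import datetime, timedelta, timezone


def two_pm_window(now):
    """Return (2 PM yesterday, 2 PM today) for the day containing `now`."""
    # Truncate to midnight, mirroring TIMESTAMP_TRUNC(..., DAY).
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight - timedelta(hours=10)  # midnight - 10h = 2 PM yesterday
    end = midnight + timedelta(hours=14)    # midnight + 14h = 2 PM today
    return start, end


# Hypothetical "current" instant, fixed so the result is reproducible.
start, end = two_pm_window(datetime(2021, 4, 13, 9, 30, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

Both bounds come out as the same type, which is the property BETWEEN needs in the query above.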

Related

Seems like a bug in Timestamp and Datetime diff handling in BigQuery

If I run a query like the following to find the difference in days between two points in time,
select
timestamp_diff(timestamp('2021-04-13T06:51:42'), timestamp('2021-04-05T06:56:24'), day)
,datetime_diff(timestamp('2021-04-13T06:51:42 UTC'), timestamp('2021-04-05T06:56:24 UTC'), day)
,timestamp_diff('2021-04-13T06:51:42', '2021-04-05T06:56:24', day)
,datetime_diff ('2021-04-13T06:51:42', '2021-04-05T06:56:24', day)
,datetime_diff (datetime('2021-04-13T06:51:42'), datetime('2021-04-05T06:56:24'), day)
I get the following results:
7 7 7 8 8
The time points are the same on every line of the query, so it's exactly the same time frame, and I'd expect equal results.
Is the handling of temporal diffs inconsistent, or is this expected behavior?
Maybe this helps:
Constructing a DATETIME object from a TIMESTAMP object supports an optional parameter to specify a time zone. If no time zone is specified, the default time zone, UTC, is used.
I executed your query with some modifications, as below, and I get the same results across all columns.
select
timestamp_diff(timestamp('2021-04-13T06:51:42' , "America/Los_Angeles"), timestamp('2021-04-05T06:56:24'), day)
,datetime_diff(timestamp('2021-04-13T06:51:42' , "America/Los_Angeles"), timestamp('2021-04-05T06:56:24 UTC'), day)
,timestamp_diff(timestamp('2021-04-13T06:51:42',"America/Los_Angeles"), '2021-04-05T06:56:24', day)
,datetime_diff ('2021-04-13T06:51:42', '2021-04-05T06:56:24', day)
,datetime_diff (datetime('2021-04-13T06:51:42'), datetime('2021-04-05T06:56:24'), day);
See:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#datetime
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time-zones
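One reading that would be consistent with the original 7 7 7 8 8 output, offered as an assumption rather than a confirmed diagnosis: the columns whose arguments are TIMESTAMP values count whole 24-hour periods elapsed, while the columns whose arguments are DATETIME values count calendar-date boundaries crossed. The two time points are 7 days 23:55:18 apart, so the two ways of counting disagree. A small Python sketch of both:

```python
from datetime import datetime

start = datetime(2021, 4, 5, 6, 56, 24)
end = datetime(2021, 4, 13, 6, 51, 42)

# Whole 24-hour periods elapsed (the counting that yields 7).
elapsed_days = (end - start).days

# Calendar-date boundaries crossed (the counting that yields 8).
boundary_days = (end.date() - start.date()).days

print(elapsed_days, boundary_days)
```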

Difference between CURRENT_TIMESTAMP and CURRENT_DATE

I want to get the data from the last 28 days and only include complete days. So what I mean is, when I look at the data today at 10:00 AM, it only includes data from yesterday (the completed day) and 28 days before yesterday.
I am creating a live dashboard with figures like this. So I don't want the numbers to change until the day is finished.
I would also like to understand the difference between CURRENT_DATE and CURRENT_TIMESTAMP.
For example, in my code, if I use CURRENT_TIMESTAMP, will I get the data from today at 10:00 AM back to 10:00 AM 28 days ago? If not, how can I get the data so that the numbers change live every time I run the code (on average, the data in the database changes every 10 minutes)?
My simplified code:
select count(id) from customers
where created_at > CURRENT_DATE - interval '28 days'
Maybe I am using the wrong code. Can you please advise how to get the data in both forms:
include only complete days (does not include today until the day is finished)
include hours, from this morning back to the same time in the morning 28 days ago.
Assuming created_at is of type timestamptz.
include only complete days (does not include today until the day is finished)
Start with now() and use date_trunc():
SELECT count(*)
FROM customers
WHERE created_at < date_trunc('day', now())
AND created_at >= date_trunc('day', now() - interval '28 days');
Or work with CURRENT_DATE ...
WHERE created_at < CURRENT_DATE
AND created_at >= CURRENT_DATE - 28;
The result for both depends on the current timezone setting. The "date" functionally depends on your current time zone. The type timestamp with time zone (timestamptz) does not. But the expression date_trunc('day', now()) introduces the same dependency as the "day" is defined by your current time zone. So you need to define which "days" you mean precisely. Basics:
Ignoring time zones altogether in Rails and PostgreSQL
You can subtract integer values from a date to subtract days:
How do I determine the last day of the previous month using PostgreSQL?
now() is a shorter equivalent of CURRENT_TIMESTAMP. See:
Difference between now() and current_timestamp
count(*) is equivalent to count(id) while id is defined NOT NULL, but a bit faster.
I have different results from query for COUNT('e.id') or COUNT(e.id)
include hours, from today morning until 28 days back same time in the morning.
Simply:
WHERE created_at > now() - interval '28 days'
No dependency on the current time zone.
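To see how the two windows differ for a single run, here is a minimal Python sketch of the same bounds. The function names and the fixed example instant are assumptions for illustration; UTC stands in for whatever the session time zone happens to be.

```python
from datetime import datetime, timedelta, timezone


def complete_days_window(now, days=28):
    """Half-open window [start, end) covering the last `days` complete days."""
    # Start of today, like date_trunc('day', now()); today itself is excluded.
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=days), end


def rolling_window_start(now, days=28):
    """Lower bound of a window that moves with the clock, like now() - interval."""
    return now - timedelta(days=days)


# Hypothetical run at 10:00 AM.
now = datetime(2021, 6, 15, 10, 0, tzinfo=timezone.utc)
print(complete_days_window(now))
print(rolling_window_start(now))
```

The first window stays fixed until midnight passes; the second shifts on every run, which is what a live figure needs.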

How to subtract days from a Timestamp in a CrateDB SQL query?

How can I subtract days from a timestamp in a CrateDB SQL query?
Does something similar to this exist?
TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
I don't think there is a built-in function, but you could do something like this:
SELECT DATE_FORMAT(CURRENT_TIMESTAMP - 1000*60*60*24*14) LIMIT 100
In this example, 1000 * 60 * 60 is the number of milliseconds in an hour, multiplying by 24 gives one day, and 14 is your number of days.
NB: you can also cast dates to TIMESTAMP and do similar arithmetic:
SELECT ABS(cast('2019-01-1' AS TIMESTAMP) - CURRENT_TIMESTAMP ) / (1000*60*60*24) LIMIT 100
This gives you the number of days between now and the 1st of January.
So far that's all they have in their docs.
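The arithmetic above treats the timestamp as a count of milliseconds since the Unix epoch. A minimal Python sketch of the same calculation follows; the helper name `subtract_days` and the example date are assumptions for illustration.

```python
from datetime import datetime, timezone

MS_PER_DAY = 1000 * 60 * 60 * 24  # milliseconds in one day


def subtract_days(epoch_ms, days):
    """Subtract whole days from a millisecond epoch timestamp."""
    return epoch_ms - days * MS_PER_DAY


# Hypothetical starting point: midnight UTC on 1 January 2019, in milliseconds.
start_ms = int(datetime(2019, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
result_ms = subtract_days(start_ms, 14)
print(datetime.fromtimestamp(result_ms / 1000, tz=timezone.utc).isoformat())
```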
You can subtract an INTERVAL from a TIMESTAMP, but before any mathematical operation you need to CAST the data type. You can do it this way:
SELECT now() - CAST('14 day' AS INTERVAL)
Or the same as above, in shorthand:
SELECT now() - '14 day'::INTERVAL;
As the string to be CAST to an INTERVAL, you can use a number followed by any of these units:
second
minute
hour
day
week
month
quarter
year

How to run a query for every date for the last 3 months

I have a table (pkg_data) in Redshift. I want to fetch some data for every date for the last 3 months.
Here is my query
select * from pkg_data where scan_date < current_date;
How can I use current_date as a variable in the query itself and run this query for every date from April 1?
I have set up a cron job that runs every hour. Each hour it should run with a different current_date.
SELECT *
FROM pkg_data
WHERE scan_date > CURRENT_DATE - INTERVAL '3 months'
Be careful: Redshift works in UTC, so CURRENT_DATE might suffer from timezone effects and sometimes be a day ahead of or behind what you expect.
SELECT
CURRENT_DATE,
(CURRENT_DATE - INTERVAL '3 months')::date
Returns:
2018-06-21 2018-03-21
Also be careful with strange lengths of months!
SELECT DATE '2018-05-31' - INTERVAL '3 months'
returns:
2018-02-28 00:00:00
Notice that it gave the last day of the month (31st vs 28th).
By the way, you can use DATE '2018-05-31' or '2018-05-31'::DATE, and also INTERVAL '3 months' or '3 months'::INTERVAL to convert types.
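The month-length effect can be reproduced with a small clamping rule: step back N calendar months, then cap the day at the last day of the target month. This Python sketch is an assumption about how such arithmetic is commonly done, not a statement of Redshift's internals, and `subtract_months` is a hypothetical helper name.

```python
import calendar
from datetime import date


def subtract_months(d, months):
    """Step back `months` calendar months, clamping the day to the month's end."""
    total = d.year * 12 + (d.month - 1) - months  # months counted from year 0
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in target month
    return date(year, month, min(d.day, last_day))


print(subtract_months(date(2018, 5, 31), 3))
print(subtract_months(date(2018, 6, 21), 3))
```

The clamping step is why two different start dates (for example 29, 30 and 31 May) can land on the same result.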
Use dateadd() to get the date 3 months ago and GETDATE() to get the current date.
i.e. the code will look like:
select * from pkg_data where scan_date > dateadd(month,-3,GETDATE());
For cron, refer to: How to execute scheduled SQL script on Amazon Redshift?

Query to fetch records only from the previous half hour with timestamp in Unix epoch format

I want an SQL query to select only the records created in the previous half hour. For example, if my scheduler ran at 2 pm and then again at 2:30, the 2:30 run should pick only rows from between 2 pm and 2:30 pm and nothing earlier, using the column created_timestamp, which stores the time in Unix epoch format, e.g.:
|created_timestamp|
|1497355750350 |
|1497506182344 |
We can do arithmetic with Oracle dates. Subtracting one date from another gives the interval in days as a fractional number. Multiplying by 86400 gives us the number of seconds. So this is the current Unix epoch in seconds:
(sysdate - date '1970-01-01') * 86400
Your sample values have 13 digits, which means they are stored in milliseconds rather than seconds, so multiply by a further 1000. This means your query will be something like
select * from your_table
where created_timestamp >= (:last_run_time - date '1970-01-01') * 86400 * 1000
The trick is that your scheduler needs to pass in the time of the previous run - last_run_time - to pick up all the records which have been added since then.
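The sample created_timestamp values have 13 digits, which suggests milliseconds since the epoch. Under that assumption, the bounds a scheduler would pass for one half-hour run can be sketched in Python as follows; the function names and the example run time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment):
    """Convert an aware datetime to whole milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def half_hour_window_ms(run_time):
    """Return (lower, upper) epoch-millisecond bounds for the 30 minutes before run_time."""
    return to_epoch_ms(run_time - timedelta(minutes=30)), to_epoch_ms(run_time)


# Hypothetical scheduler run at 2:30 PM UTC.
run_time = datetime(2017, 6, 13, 14, 30, tzinfo=timezone.utc)
lower, upper = half_hour_window_ms(run_time)
print(lower, upper)
```

Filtering on created_timestamp >= lower and created_timestamp < upper keeps consecutive runs from picking up the same row twice.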
You can use a Flashback query, which returns the table as it existed 30 minutes ago:
SELECT * FROM TABLE
AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '30' MINUTE);