Google BigQuery: select from a TIMESTAMP column between now and n days ago

I have a dataset in BigQuery with a TIMESTAMP column "register_date" (sample value "2017-11-19 22:45:05.000 UTC").
I need to filter records based on an "x days or weeks before today" criterion.
For example: select all records that are 2 weeks old.
Currently I have this query (which feels like a kind of hack) that works and returns the correct results:
SELECT * FROM `my-pj.my_dataset.sample_table`
WHERE
(SELECT
CAST(DATE(register_date) AS DATE)) BETWEEN DATE_ADD(CURRENT_DATE(), INTERVAL -150 DAY)
AND CURRENT_DATE()
LIMIT 10
My question is: do I have to use all that casting on a TIMESTAMP column (which seems to overcomplicate an otherwise simple query)?
If I remove the casting part, my query doesn't run and returns an error.
Here is my simplified query:
SELECT
*
FROM
`my-pj.my_dataset.sample_table`
WHERE
register_date BETWEEN DATE_ADD(CURRENT_DATE(), INTERVAL -150 DAY)
AND CURRENT_DATE()
LIMIT
10
which results in an error:
Query Failed
Error: No matching signature for operator BETWEEN for argument types: TIMESTAMP, DATE, DATE. Supported signature: (ANY) BETWEEN (ANY) AND (ANY) at [6:17]
Any insight is highly appreciated.

Use timestamp functions:
SELECT t.*
FROM `my-pj.my_dataset.sample_table` t
WHERE register_date BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -150 DAY) AND CURRENT_TIMESTAMP()
LIMIT 10;
BigQuery has three data types for date/time values: DATE, DATETIME, and TIMESTAMP. These are not interchangeable. The basic idea is:
DATE values have no time component and no time zone.
DATETIME values have a time component and no time zone.
TIMESTAMP values have a time component and denote an absolute point in time; the value is represented in UTC.
INTERVAL values are defined in the GCP documentation.
Conversion between the different types is not automatic. Your error message shows that register_date is really stored as a TIMESTAMP.
One caveat (from personal experience): the definition of a day is based on UTC. This is not much of an issue if you are in London. It can be a bigger issue if you are in another time zone and want the definition of "day" to be based on the local time zone. If that is an issue for you, ask another question.
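A minimal Python sketch of both points, with Python standing in for BigQuery SQL: it filters by comparing timestamps only with timestamps (the analogue of `TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -150 DAY)`), and shows how the calendar day of a UTC timestamp can differ from the local day. The sample rows and the fixed "now" are made up, and the fixed UTC-8 offset is an assumption standing in for US Pacific standard time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical register_date values, stored as UTC instants like a BigQuery TIMESTAMP.
register_dates = [
    datetime(2017, 11, 19, 22, 45, 5, tzinfo=timezone.utc),
    datetime(2017, 5, 1, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2017, 11, 20, 3, 0, 0, tzinfo=timezone.utc),
]

# Fixed "now" so the result is reproducible; in BigQuery this would be CURRENT_TIMESTAMP().
now = datetime(2017, 11, 21, 12, 0, 0, tzinfo=timezone.utc)
cutoff = now - timedelta(days=150)  # like TIMESTAMP_ADD(..., INTERVAL -150 DAY)

# Timestamp compared with timestamp (inclusive, like BETWEEN): no DATE casting needed.
recent = [ts for ts in register_dates if cutoff <= ts <= now]

# Caveat: the "day" of a timestamp depends on the time zone used to read it.
pacific = timezone(timedelta(hours=-8))  # assumed fixed UTC-8 offset (PST)
late = register_dates[2]
utc_day = late.date()                        # 2017-11-20
local_day = late.astimezone(pacific).date()  # 2017-11-19

print([ts.isoformat() for ts in recent])
print(utc_day, local_day)
```

The May record falls outside the 150-day window, and the last record belongs to November 20 in UTC but to November 19 in the assumed Pacific offset.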

Related

Seems like a bug in TIMESTAMP and DATETIME diff handling in BigQuery

If I run a query like the following to find the difference in days between two points in time:
select
timestamp_diff(timestamp('2021-04-13T06:51:42'), timestamp('2021-04-05T06:56:24'), day)
,datetime_diff(timestamp('2021-04-13T06:51:42 UTC'), timestamp('2021-04-05T06:56:24 UTC'), day)
,timestamp_diff('2021-04-13T06:51:42', '2021-04-05T06:56:24', day)
,datetime_diff ('2021-04-13T06:51:42', '2021-04-05T06:56:24', day)
,datetime_diff (datetime('2021-04-13T06:51:42'), datetime('2021-04-05T06:56:24'), day)
I get the following results:
7 7 7 8 8
The time points are the same on every line of the query; it's exactly the same time frame, so I'd expect equal results.
It seems that diff handling for temporal data is inconsistent. Or is this expected behavior?
Maybe this helps:
When constructing a DATETIME from a TIMESTAMP, you can pass an optional parameter that specifies a time zone. If no time zone is specified, the default time zone, UTC, is used.
I executed your query with the modifications below and got the same result in every column.
select
timestamp_diff(timestamp('2021-04-13T06:51:42' , "America/Los_Angeles"), timestamp('2021-04-05T06:56:24'), day)
,datetime_diff(timestamp('2021-04-13T06:51:42' , "America/Los_Angeles"), timestamp('2021-04-05T06:56:24 UTC'), day)
,timestamp_diff(timestamp('2021-04-13T06:51:42',"America/Los_Angeles"), '2021-04-05T06:56:24', day)
,datetime_diff ('2021-04-13T06:51:42', '2021-04-05T06:56:24', day)
,datetime_diff (datetime('2021-04-13T06:51:42'), datetime('2021-04-05T06:56:24'), day);
For details, see:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#datetime
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time-zones
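In the question's results, the columns whose arguments are TIMESTAMPs return 7 and those whose arguments are DATETIMEs (or strings coerced to DATETIME) return 8. That pattern is consistent with two counting rules: whole 24-hour periods for TIMESTAMP_DIFF with DAY, and calendar-day boundaries crossed for DATETIME_DIFF with DAY. The Python sketch below models both rules; the function names are hypothetical and this is not BigQuery's implementation. It also suggests why the modified query returns 8: reading the first value in America/Los_Angeles (assumed here as a fixed UTC-7 offset, PDT in April 2021) moves that instant 7 hours later, past the 8-day mark.

```python
from datetime import datetime, timedelta, timezone


def whole_24h_periods(start: datetime, end: datetime) -> int:
    """Hypothetical model of TIMESTAMP_DIFF(end, start, DAY), for start <= end."""
    return (end - start) // timedelta(days=1)


def day_boundaries(start: datetime, end: datetime) -> int:
    """Hypothetical model of DATETIME_DIFF(end, start, DAY): calendar-date difference."""
    return (end.date() - start.date()).days


start = datetime(2021, 4, 5, 6, 56, 24, tzinfo=timezone.utc)
end = datetime(2021, 4, 13, 6, 51, 42, tzinfo=timezone.utc)

print(whole_24h_periods(start, end))  # 7 (7 days, 23:55:18 elapsed)
print(day_boundaries(start, end))     # 8 (April 5 -> April 13)

# Reading the end value as Los Angeles local time shifts the instant 7 hours later.
pdt = timezone(timedelta(hours=-7))  # assumed fixed offset for PDT in April 2021
end_la = datetime(2021, 4, 13, 6, 51, 42, tzinfo=pdt)
print(whole_24h_periods(start, end_la))  # 8 (8 days, 06:55:18 elapsed)
```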

Subtraction with decimals in Oracle SQL

I need to subtract 2 timestamps in the following format:
16/01/17 07:01:06,165000000
16/01/17 07:01:06,244000000
I want to express the result with 2 decimal places, but somewhere in the CAST process I am losing precision. My attempt so far looks like this:
select
id,
trunc((CAST(MAX(T.TIMESTAMP) AS DATE) - CAST(MIN(T.TIMESTAMP) AS DATE))*24*60*60,2) as result
from table T
group by id;
But for id_1 I get 0 as the result for the two timestamps above, even though I set TRUNC to 2 decimal places.
Is there a way to obtain 0.XX as the result of the subtraction?
It's because you are casting the timestamp to DATE, which only resolves to the second, so the fractional seconds are lost.
Use to_timestamp to convert your string into a TIMESTAMP.
Try this:
with your_table(tstamp) as (
select '16/01/17 07:01:06,165000000' from dual union all
select '16/01/17 07:01:06,244000000' from dual
),
your_table_casted as (
select to_timestamp(tstamp,'dd/mm/yy hh24:mi:ss,ff') tstamp from your_table
)
select trunc(sysdate + (max(tstamp) - min(tstamp)) * 86400 - sysdate, 2) diff
from your_table_casted;
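The difference between two timestamps is an INTERVAL DAY TO SECOND.
To convert it into seconds, use the trick above: multiplying the interval by 86400 turns x seconds into x days, and adding that to SYSDATE and then subtracting SYSDATE again turns it back into a NUMBER of days, which is numerically the difference in seconds.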
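The arithmetic can be checked with a small Python sketch (Python, not Oracle): keep the fractional seconds, take the difference in seconds, then truncate to 2 decimals the way TRUNC(x, 2) does. Python's `datetime` only resolves to microseconds, so the hypothetical `parse_oracle_ts` helper cuts the nine-digit fraction to six digits, which loses nothing for these sample values.

```python
import math
from datetime import datetime


def parse_oracle_ts(text: str) -> datetime:
    """Parse 'dd/mm/yy hh24:mi:ss,ff' text, keeping at most 6 fractional digits."""
    whole, fraction = text.split(",")
    micros = fraction[:6].ljust(6, "0")  # Python datetimes stop at microseconds
    return datetime.strptime(f"{whole},{micros}", "%d/%m/%y %H:%M:%S,%f")


t1 = parse_oracle_ts("16/01/17 07:01:06,165000000")
t2 = parse_oracle_ts("16/01/17 07:01:06,244000000")

# Full-precision difference in seconds, truncated (not rounded) to 2 decimals.
diff = (t2 - t1).total_seconds()       # 0.079
result = math.trunc(diff * 100) / 100  # 0.07

# Dropping the fractional seconds first (as the cast to DATE does for these values)
# leaves nothing to subtract.
date_like = (t2.replace(microsecond=0) - t1.replace(microsecond=0)).total_seconds()

print(result, date_like)  # 0.07 0.0
```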
DATE—This datatype stores a date and a time, resolved to the second. It does not include the time zone. DATE is the oldest and most commonly used datatype for working with dates in Oracle applications.
TIMESTAMP—Time stamps are similar to dates, but with these two key distinctions: you can store and manipulate times resolved to the nearest billionth of a second (9 decimal places of precision), and you can associate a time zone with a time stamp, and Oracle Database will take that time zone into account when manipulating the time stamp.
The result of subtracting two timestamps is an INTERVAL:
INTERVAL—Whereas DATE and TIMESTAMP record a specific point in time, INTERVAL records and computes a time duration. You can specify an interval in terms of years and months, or days and seconds.
You can find more information here

Timestamp query to hours

I need some help with timestamps in PostgreSQL. I have a timestamp with time zone column named download_at, recording when a user downloaded an app, and an integer column user_id. I am trying to extract the IDs of users who downloaded within the last 168 hours from the last 60 days of data. I am a bit confused about how to approach this and got stuck because of the two different time ranges. I believe I might have to play around with the trunc function.
A basic example:
SELECT *
FROM table1
WHERE download_at > now() - '168 hours'::interval
Postgres is phenomenal at handling dates and times. A breakdown of what this does:
now() --function that returns the current date and time (as of the start of the current transaction) as a timestamp with time zone
'168 hours'::interval --a string cast to an interval
In Postgres, :: performs a cast. When casting a string to an interval, Postgres parses the English-like text into an interval value. Since you can subtract an interval from a timestamp, it does the rest for you.
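The same filter can be sketched in Python (not PostgreSQL) for the question's 168-hour window: now() becomes a fixed, timezone-aware "now" and the interval becomes a `timedelta`. The `downloads` rows and the `recent_user_ids` helper are hypothetical; the set comprehension plays the role of SELECT DISTINCT user_id.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical (user_id, download_at) rows; download_at is timezone-aware like timestamptz.
downloads = [
    (1, datetime(2024, 5, 30, 8, 0, tzinfo=timezone.utc)),
    (2, datetime(2024, 5, 20, 8, 0, tzinfo=timezone.utc)),
    (3, datetime(2024, 5, 25, 13, 0, tzinfo=timezone.utc)),
    (1, datetime(2024, 5, 31, 9, 0, tzinfo=timezone.utc)),
]


def recent_user_ids(rows, now, window=timedelta(hours=168)):
    """Distinct user IDs with a download after now - window (cf. now() - '168 hours'::interval)."""
    cutoff = now - window
    return sorted({user_id for user_id, download_at in rows if download_at > cutoff})


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
print(recent_user_ids(downloads, now))  # [1, 3]
```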

Returning results for the whole of yesterday and not the last 24 hours

I have a query that needs data for yesterday. What I have below returns results for the last 24 hours, which is different from yesterday 00:00 - 23:59.
Here is what I have, but it doesn't solve the problem:
Select * from message where now() - arrival_timestamp <= interval '24 hour'
You could cast the timestamp to a date with the expression::type syntax (see the Type Casts section of the PostgreSQL documentation). The tools needed for comparing dates are described in section 9.9, Date/Time Functions and Operators:
SELECT * FROM message WHERE arrival_timestamp::date = current_date - 1;
If you have an index on arrival_timestamp, the cast to date would make the index unusable for this query. In that case, use comparison operators on the bare column instead:
SELECT * FROM message WHERE arrival_timestamp >= current_date - 1 AND arrival_timestamp < current_date;
You might try this:
SELECT * FROM message
WHERE date_trunc('day', arrival_timestamp) = current_date - interval '1 day'
When working with time or any other data type, you want to avoid a filter criteria in the form
function( field ) comparison other_field_or_constant
This is non-sargable, meaning the optimizer can't take advantage of an index on field. Since it cannot predict what value the function call will return, it must perform a complete table scan and pass every field value to the function to make the comparison.
So you want every expression in the where clause to be in the form
field comparison other_field_or_constant
Note that function( constant ) is still a constant: its result is computed once and reused for each comparison, rather than the function being called for each row.
Here, in pseudocode, is what you want to do in your case. You want yesterday. Every DBMS I know has a system function that returns the current date and time (Now). Yesterday runs from midnight at the start of the day before Now up to, but not including, midnight at the start of Now's day.
where arrival_timestamp >= trunc( now ) - 1 day
and arrival_timestamp < trunc( now );
Translated into English, the expression above reads: "where the value is on or after midnight at the start of yesterday, up to but not including midnight last night." In other words, "yesterday." Performing the comparison this way, you don't have to worry about how close to midnight the system clock can generate a timestamp. As soon as the clock ticks over to midnight last night, no matter how large or extremely small that tick may be, the comparison yields a valid result.
Notice that no manipulation is performed on arrival_timestamp, not even a conversion to a more convenient data type. The expression trunc( now ) (including the one where a day is subtracted) is a constant. The functions are called once and the result is used over and over for every row. This is why all such functions must be deterministic. Even functions that are NOT in fact deterministic, like Now, are treated as deterministic for the life of the query.
For that reason you can do pretty much whatever you need to do on the right side of the comparison. For example, because SQL Server doesn't have a trunc function, the first line written above would look like this:
where arrival_timestamp >= DateAdd( day, -1, DateAdd( day, DateDiff( day, 0, GetDate()), 0))
It doesn't matter. The calculation is performed once and the result used for the entire duration of the query.
So the where clause can now be seen as
where field >= constant1
and field < constant2;
This means if field is indexed, that index can now be used.
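The sargable pattern can be tried out with Python's built-in `sqlite3`, with SQLite standing in for PostgreSQL or SQL Server, so the exact query-plan wording will differ. Timestamps are stored as ISO-8601 text (an assumption, since SQLite has no native timestamp type), which sorts chronologically. The sketch computes the half-open range [start of yesterday, start of today) once in the application, then compares the plan of the bare-column range with the plan of a function-wrapped filter. The sample rows, the fixed "now", and the helper names are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # ISO-8601 text sorts in chronological order


def yesterday_bounds(now):
    """Return (start of yesterday, start of today) as text: the half-open range [lo, hi)."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return (start_of_today - timedelta(days=1)).strftime(FMT), start_of_today.strftime(FMT)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT, arrival_timestamp TEXT)")
conn.execute("CREATE INDEX idx_arrival ON message (arrival_timestamp)")
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [
        (1, "day before", "2024-03-09 23:59:59"),
        (2, "yesterday, midnight", "2024-03-10 00:00:00"),
        (3, "yesterday, late", "2024-03-10 23:59:59"),
        (4, "today, midnight", "2024-03-11 00:00:00"),
    ],
)

lo, hi = yesterday_bounds(datetime(2024, 3, 11, 9, 30))  # fixed "now" for reproducibility

# Sargable: the bare column is compared against two constants.
sargable = "SELECT * FROM message WHERE arrival_timestamp >= ? AND arrival_timestamp < ?"
# Non-sargable: the column is wrapped in a function, so the index cannot be searched.
non_sargable = "SELECT * FROM message WHERE date(arrival_timestamp) = date(?)"


def ids(sql, params):
    return sorted(row[0] for row in conn.execute(sql, params))


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


print(ids(sargable, (lo, hi)))    # [2, 3]
print(ids(non_sargable, (lo,)))   # [2, 3]
print(plan(sargable, (lo, hi)))   # an index search using idx_arrival
print(plan(non_sargable, (lo,)))  # a full scan of message
```

Both filters return the same rows; only the range form lets the engine search the index.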
You can use the DATE_PART function:
select * from message where DATE_PART('day', now() - arrival_timestamp) <= 1

Timestamp Difference In Hours for PostgreSQL

Is there a TIMESTAMPDIFF() equivalent for PostgreSQL?
I know I can subtract two timestamps to get a PostgreSQL INTERVAL. I just want the difference between the two timestamps in hours, represented as an INT.
I can do this in MySQL like this:
TIMESTAMPDIFF(HOUR, links.created, NOW())
This solution works for me:
SELECT "links_link"."created",
"links_link"."title",
(EXTRACT(EPOCH FROM current_timestamp - "links_link"."created")/3600)::Integer AS "age"
FROM "links_link"
The first thing that comes to mind:
EXTRACT(EPOCH FROM current_timestamp-somedate)/3600
It may not be pretty, but it unblocks the road. It could be prettier if dividing an interval by an interval were defined.
Edit: if you want it to be non-negative, use either abs or greatest(...,0), whichever suits your intention.
Edit++: the reason I didn't use age is that age with a single argument means, to quote the documentation, "subtract from current_date (at midnight)". So you don't get an accurate "age" unless running at midnight. Right now it's almost 1 a.m. here:
select age(current_timestamp);
age
------------------
-00:52:40.826309
(1 row)
Get rows where a timestamp is greater than a date in PostgreSQL:
SELECT * from yourtable
WHERE your_timestamp_field > to_date('05 Dec 2000', 'DD Mon YYYY');
Subtract minutes from a timestamp in PostgreSQL:
SELECT * from yourtable
WHERE your_timestamp_field > current_timestamp - interval '5 minutes'
Subtract hours from a timestamp in PostgreSQL:
SELECT * from yourtable
WHERE your_timestamp_field > current_timestamp - interval '5 hours'
Michael Krelin's answer is close but not entirely safe, since it can be wrong in rare situations. The problem is that intervals in PostgreSQL have no context regarding things like daylight saving time. Internally, intervals store months, days, and seconds. Months aren't an issue in this case, since subtracting two timestamps uses only days and seconds, but 'days' can be a problem.
If your subtraction spans a daylight saving time changeover, a particular day might be 23 or 25 hours long. The interval takes that into account, which is useful for knowing how many days passed in the symbolic sense, but it gives an incorrect number of actual hours passed. Taking the epoch of the interval simply multiplies all days by 24 hours.
For example, if a full 'short' day passes plus one hour of the next day, the interval is recorded as one day and one hour, which converted via epoch/3600 is 25 hours. But in reality, 23 hours + 1 hour is a total of 24 hours.
So the safer method is:
(EXTRACT(EPOCH FROM current_timestamp) - EXTRACT(EPOCH FROM somedate))/3600
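The scenario above can be reproduced in Python to compare the two counts. This models the described situation rather than PostgreSQL internals. US Eastern time switched from UTC-5 to UTC-4 on 2021-03-14, so that local day lasted 23 hours; fixed offsets are used so the sketch needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Eastern before and after the 2021-03-14 switch to daylight time.
est = timezone(timedelta(hours=-5))
edt = timezone(timedelta(hours=-4))

start = datetime(2021, 3, 14, 0, 0, tzinfo=est)  # midnight, start of the short day
end = datetime(2021, 3, 15, 1, 0, tzinfo=edt)    # 1 a.m. the next day

# "Symbolic" count: 1 day + 1 hour on the wall clock, every day taken as 24 hours.
wall = end.replace(tzinfo=None) - start.replace(tzinfo=None)
symbolic_hours = wall.days * 24 + wall.seconds // 3600  # 25

# Epoch-based count: subtract the absolute instants, as in the safer query above.
elapsed_hours = (end.timestamp() - start.timestamp()) / 3600  # 24.0

print(symbolic_hours, elapsed_hours)
```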
As Michael mentioned in his follow-up comment, you'll also probably want to use floor() or round() to get the result as an integer value.
You can use the "extract" or "date_part" functions on intervals as well as timestamps, but I don't think that does what you want. For example, it gives 3 for an interval of '2 days, 3 hours'. However, you can convert an interval to a number of seconds by specifying 'epoch' as the time element you want: extract(epoch from '2 days, 3 hours'::interval) returns 183600 (which you then divide by 3600 to convert seconds to hours).
So, putting this all together, you get basically Michael's answer: extract(epoch from timestamp1 - timestamp2)/3600. Since you don't seem to care about which timestamp precedes which, you probably want to wrap that in abs:
SELECT abs(extract(epoch from timestamp1 - timestamp2)/3600)
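The same order-independent calculation, sketched in Python rather than SQL: take the difference in seconds (the analogue of extract(epoch from ...)), apply abs, and divide by 3600. The `hours_between` name and the sample timestamps are illustrative only; the pair is 2 days and 3 hours apart, matching the 183600-second example above.

```python
from datetime import datetime


def hours_between(t1: datetime, t2: datetime) -> float:
    """Analogue of abs(extract(epoch from t1 - t2) / 3600)."""
    return abs((t1 - t2).total_seconds()) / 3600


a = datetime(2012, 1, 1, 0, 0, 0)
b = datetime(2012, 1, 3, 3, 0, 0)  # 2 days, 3 hours later

print((b - a).total_seconds())  # 183600.0
print(hours_between(a, b))      # 51.0
print(hours_between(b, a))      # 51.0
```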
PostgreSQL: get the difference in seconds between timestamps:
SELECT (
(extract (epoch from (
'2012-01-01 18:25:00'::timestamp - '2012-01-01 18:25:02'::timestamp
)
)
)
)::integer
which prints:
-2
That's because the timestamps are two seconds apart and the later one is subtracted from the earlier one. Divide the number by 60 to get minutes, and by 60 again to get hours.
extract(hour from age(now(),links.created)) gives you a floor-rounded count of the hour difference.
To avoid the epoch conversion, you could extract the days, multiply them by 24, and add the extracted hours.
select current_timestamp, (current_timestamp - interval '500' hour), (extract(day from (current_timestamp - (current_timestamp - interval '500' hour)) * 24) + extract(hour from (current_timestamp - (current_timestamp - interval '500' hour))));
For MySQL's TIMESTAMPDIFF I don't know, but for MSSQL's datediff(hour, start, end) the best equivalent in PostgreSQL is floor(extract(epoch from end - start)/3600), because in MSSQL select datediff(hour, '2021-10-31 18:00:00.000', '2021-10-31 18:59:59.999') returns 0.
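A Python sketch of the floor(extract(epoch from end - start)/3600) expression, applied to the pair of timestamps quoted above. The `whole_hours` helper is hypothetical, and the example only checks these specific inputs; it does not claim the two functions agree for every pair.

```python
import math
from datetime import datetime


def whole_hours(start: datetime, end: datetime) -> int:
    """Analogue of floor(extract(epoch from end - start) / 3600)."""
    return math.floor((end - start).total_seconds() / 3600)


start = datetime(2021, 10, 31, 18, 0, 0)
end = datetime(2021, 10, 31, 18, 59, 59, 999000)

print(whole_hours(start, end))                            # 0 (3599.999 seconds)
print(whole_hours(start, datetime(2021, 10, 31, 19, 0)))  # 1
```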
This might sound crazy to a lot of developers who like to take advantage of database functions,
but after exhaustive problems thinking through, creating, and bug-fixing applications for MySQL and PostgreSQL with PHP that compare dates, I've come to the conclusion (for myself) that the easiest way, the one with the fewest SQL headaches, is not to take advantage of any of them.
Why? Because if you are developing in a middleware language like PHP, PHP has all of these functions, and they are easier to implement in the application code as integer comparisons. A PostgreSQL timestamp is NOT a UNIX timestamp, and MySQL's UNIX timestamp is NOT PostgreSQL's or Oracle's timestamp, so it gets harder to port if you use database timestamps.
So just use an integer, not a timestamp,
holding the number of seconds since midnight, January 1st 1970, and never mind database timestamps.
Use gmdate() and store everything as GMT to avoid time zone issues.
If you need to search, sort, or compare by the day, the month, the year, the day of the week, or anything else in your application,
an INTEGER column for time_day, time_hour, time_seconds, or whatever you want indexed for searching will make for smoother and more portable databases.
In most instances you can just use one field: INTEGER time_created NOT NULL
(More fields in your database row is the only drawback to this solution that I have found, and that doesn't cause as many headaches or cups of coffee :)
PHP's date functions are outstanding for comparing dates,
but in MySQL or PostgreSQL, comparing dates? Nah, use integer SQL comparisons.
I realize it may SEEM easier to use CURRENT_TIMESTAMP in an insert. HA!
Don't be fooled.
You can't do DELETE FROM SESSION_TABLE WHERE time_initialized < '2 days'
if time_initialized is a PostgreSQL timestamp.
But you CAN do:
DELETE FROM SESSION_TABLE WHERE time_initialized < '$yesterday'
as long as you set $yesterday in PHP to the integer number of seconds since 1970 at that moment yesterday.
This is easier housekeeping of session records than comparing timestamps in PostgreSQL SELECT statements.
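A sketch of this integer-timestamp housekeeping in Python with `sqlite3`, standing in for PHP with MySQL or PostgreSQL. The table and column names follow the example above, and the sample rows are hypothetical. The cutoff is computed in the application and passed as a bound parameter instead of being interpolated into the SQL string, so only integers are compared in SQL.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400

conn = sqlite3.connect(":memory:")
# Hypothetical session table: time_initialized holds Unix seconds (UTC) as an INTEGER.
conn.execute(
    "CREATE TABLE session_table (id INTEGER PRIMARY KEY, time_initialized INTEGER NOT NULL)"
)

now = int(time.time())
conn.executemany(
    "INSERT INTO session_table (id, time_initialized) VALUES (?, ?)",
    [(1, now - 3 * SECONDS_PER_DAY), (2, now - 2 * 3600), (3, now)],
)

# Compute "yesterday" in the application, then compare integers in SQL.
yesterday = now - SECONDS_PER_DAY
conn.execute("DELETE FROM session_table WHERE time_initialized < ?", (yesterday,))

remaining = [row[0] for row in conn.execute("SELECT id FROM session_table ORDER BY id")]
print(remaining)  # [2, 3]
```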
SELECT age(), SELECT extract(), and abstime are headaches in and of themselves. This is just my opinion.
You can do addition, subtraction, <, >, all with PHP date objects.
_peter_sysko
U4EA Networks, Inc.