Generating time series between two dates in PostgreSQL - sql

I have a query like this that nicely generates a series of dates between 2 given dates:
select date '2004-03-07' + j - i as AllDate
from generate_series(0, extract(doy from date '2004-03-07')::int - 1) as i,
generate_series(0, extract(doy from date '2004-08-16')::int - 1) as j
It generates 162 dates between 2004-03-07 and 2004-08-16, and this is what I want. The problem with this code is that it doesn't give the right answer when the two dates are from different years, for example when I try 2007-02-01 and 2008-04-01.
Is there a better solution?

This can be done without conversion to/from int (casting to/from timestamp instead):
SELECT date_trunc('day', dd)::date
FROM generate_series
( '2007-02-01'::timestamp
, '2008-04-01'::timestamp
, '1 day'::interval) dd
;

To generate a series of dates, this is the optimal way:
SELECT t.day::date
FROM generate_series(timestamp '2004-03-07'
, timestamp '2004-08-16'
, interval '1 day') AS t(day);
Additional date_trunc() is not needed. The cast to date (day::date) does that implicitly.
But there is also no point in casting date literals to date as input parameters. Au contraire, timestamp is the best choice. The advantage in performance is small, but there is no reason not to take it. And you do not needlessly involve DST (daylight saving time) rules coupled with the conversion from date to timestamp with time zone and back. See below.
Equivalent, less explicit short syntax:
SELECT day::date
FROM generate_series(timestamp '2004-03-07', '2004-08-16', '1 day') day;
Or with the set-returning function in the SELECT list:
SELECT generate_series(timestamp '2004-03-07', '2004-08-16', '1 day')::date AS day;
The AS keyword is required in the last variant, Postgres would misinterpret the column alias day otherwise. And I would not advise that variant before Postgres 10 - at least not with more than one set-returning function in the same SELECT list:
What is the expected behaviour for multiple set-returning functions in SELECT clause?
(That aside, the last variant is typically fastest by a tiny margin.)
Why timestamp [without time zone]?
There are a number of overloaded variants of generate_series(). Currently (Postgres 11):
SELECT oid::regprocedure AS function_signature
, prorettype::regtype AS return_type
FROM pg_proc
where proname = 'generate_series';
function_signature | return_type
:-------------------------------------------------------------------------------- | :--------------------------
generate_series(integer,integer,integer) | integer
generate_series(integer,integer) | integer
generate_series(bigint,bigint,bigint) | bigint
generate_series(bigint,bigint) | bigint
generate_series(numeric,numeric,numeric) | numeric
generate_series(numeric,numeric) | numeric
generate_series(timestamp without time zone,timestamp without time zone,interval) | timestamp without time zone
generate_series(timestamp with time zone,timestamp with time zone,interval) | timestamp with time zone
(numeric variants were added with Postgres 9.5.) The relevant ones are the last two, taking and returning timestamp / timestamptz.
There is no variant taking or returning date. An explicit cast is needed to return date. The call with timestamp arguments resolves to the best variant directly without descending into function type resolution rules and without additional cast for the input.
timestamp '2004-03-07' is perfectly valid, btw. The omitted time part defaults to 00:00 with ISO format.
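A quick check (not part of the original answer, just to illustrate the default time part):
SELECT timestamp '2004-03-07';
-- 2004-03-07 00:00:00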
Thanks to function type resolution we can still pass date. But that requires more work from Postgres. There is an implicit cast from date to timestamp as well as one from date to timestamptz. That would be ambiguous, but timestamptz is "preferred" among "date/time types". So the match is decided at step 4d:
Run through all candidates and keep those that accept preferred types
(of the input data type's type category) at the most positions where
type conversion will be required. Keep all candidates if none accept
preferred types. If only one candidate remains, use it; else continue
to the next step.
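To see that resolution in action, a minimal check (pg_typeof() is used here for illustration; it is not part of the original answer):
SELECT pg_typeof(d)
FROM generate_series(date '2004-03-07', date '2004-03-09', interval '1 day') d
LIMIT 1;
-- timestamp with time zone   (the date input was promoted to timestamptz)
SELECT pg_typeof(d)
FROM generate_series(timestamp '2004-03-07', timestamp '2004-03-09', interval '1 day') d
LIMIT 1;
-- timestamp without time zone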
In addition to the extra work in function type resolution this adds an extra cast to timestamptz - which not only adds more cost, it can also introduce problems with DST leading to unexpected results in rare cases. (DST is a moronic concept, btw, can't stress this enough.) Related:
How do I generate a date series in PostgreSQL?
How do I generate a time series in PostgreSQL?
I added demos to the fiddle showing the more expensive query plan:
db<>fiddle here
Related:
Is there a way to disable function overloading in Postgres
Generate series of dates - using date type as input
Postgres data type cast

You can generate series directly with dates. No need to use ints or timestamps:
select date::date
from generate_series(
'2004-03-07'::date,
'2004-08-16'::date,
'1 day'::interval
) date;

You can also use this.
select generate_series('2012-12-31'::timestamp, '2018-10-31'::timestamp, '1 day'::interval)::date

Related

Group by day from nanosecond timestamp

I have a table column transaction_timestamp storing timestamps as epochs with nanosecond resolution.
How do I group and/or count by day? I guess I have to convert the nanosecond timestamp to milliseconds first. How can I do that?
I tried:
SELECT DATE_TRUNC('day', CAST((transaction_timestamp /pow(10,6))as bigint)), COUNT(*)
FROM transaction
GROUP BY DATE_TRUNC('day', transaction_timestamp)
which does not work:
error: function date_trunc(unknown, bigint) does not exist
I also tried this:
SELECT DATE_TRUNC('day', to_timestamp(transaction_timestamp / 1000000000.0)),
COUNT(*)
FROM transaction
GROUP BY DATE_TRUNC('day', transaction_timestamp)
Basic conversion as instructed here:
What kind of datestyle can this be?
Repeat the same expression in GROUP BY, or use a simple positional reference, like:
SELECT date_trunc('day', to_timestamp(transaction_timestamp / 1000000000.0))
, count(*)
FROM transaction
GROUP BY 1;
Be aware that to_timestamp() assumes UTC time zone for the given epoch to produce a timestamp with time zone (timestamptz). The following date_trunc() then uses the timezone setting of your current session to determine where to truncate "days". You may want to define a certain time zone explicitly ...
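For example, to truncate by days in a specific zone rather than the session setting, you can shift the timestamptz first (a sketch; 'Europe/Vienna' is just a placeholder for your local zone):
SELECT date_trunc('day', to_timestamp(transaction_timestamp / 1000000000.0) AT TIME ZONE 'Europe/Vienna') AS day
     , count(*)
FROM   transaction
GROUP  BY 1;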
Basics:
Ignoring time zones altogether in Rails and PostgreSQL
Typically, it's best to work with a proper timestamptz to begin with. Unfortunately, Postgres timestamps only offer microsecond resolution. Since you need nanoseconds, your approach seems justified.

How to get the actual timestamp minus 2 months and the first day without time part in h2 database?

In H2 I would like to get the current timestamp minus 2 months, on the first day of the month, without the time part.
eg.: 2020-03-09 13:46:55 => 2020-01-01 00:00:00
Thanks a lot
Try this:
select FORMATDATETIME(DATEADD(mm, -2, CURRENT_DATE), 'Y-MM-01');
In recent releases of H2, DATE_TRUNC should be used instead (it isn't supported by historic versions, however):
SELECT DATE_TRUNC('MONTH', TIMESTAMP '2020-03-09 13:46:55' - INTERVAL '2' MONTH);
SELECT DATE_TRUNC('MONTH', LOCALTIMESTAMP - INTERVAL '2' MONTH);
FORMATDATETIME is slow, has different known bugs, and it produces a VARCHAR value that needs an additional implicit or explicit cast back to a datetime value.
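If you are stuck on an old H2 version and have to use FORMATDATETIME, the cast back might look like this (a sketch reusing the expression from the first answer):
SELECT CAST(FORMATDATETIME(DATEADD(mm, -2, CURRENT_DATE), 'Y-MM-01') AS TIMESTAMP);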

google bigquery select from a timestamp column between now and n days ago

I have a dataset in bigquery with a TIMESTAMP column "register_date" (sample value "2017-11-19 22:45:05.000 UTC" ).
I need to filter records based on an "x days or weeks before today" criterion.
Example query: select all records which are 2 weeks old.
Currently I have this query (which feels like a kind of hack) that works and returns the correct results:
SELECT * FROM `my-pj.my_dataset.sample_table`
WHERE
(SELECT
CAST(DATE(register_date) AS DATE)) BETWEEN DATE_ADD(CURRENT_DATE(), INTERVAL -150 DAY)
AND CURRENT_DATE()
LIMIT 10
My question is: do I have to use all that CASTing stuff on a TIMESTAMP column (which seems like overcomplicating an otherwise simple query)?
If I remove the CASTing part, my query doesn't run and returns an error.
Here is my simplified query
SELECT
*
FROM
`my-pj.my_dataset.sample_table`
WHERE
register_date BETWEEN DATE_ADD(CURRENT_DATE(), INTERVAL -150 DAY)
AND CURRENT_DATE()
LIMIT
10
which results in an error:
Query Failed
Error: No matching signature for operator BETWEEN for argument types: TIMESTAMP, DATE, DATE. Supported signature: (ANY) BETWEEN (ANY) AND (ANY) at [6:17]
Any insight is highly appreciated.
Use timestamp functions:
SELECT t.*
FROM `my-pj.my_dataset.sample_table` t
WHERE register_date BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -150 DAY) AND CURRENT_TIMESTAMP()
LIMIT 10;
BigQuery has three data types for date/time values: date, datetime, and timestamp. These are not mutually interchangeable. The basic idea is:
Dates have no time component and no timezone.
Datetimes have a time component and no timezone.
Timestamp has both a time component and a timezone. In fact, it represents the value in UTC.
INTERVAL values are defined in the GCP documentation.
Conversion between the different values is not automatic. Your error message suggests that register_date is really stored as a Timestamp.
One caveat (from personal experience): the definition of day is based on UTC. This is not much of an issue if you are in London. It can be a bigger issue if you are in another time zone and you want the definition of "day" to be based on the local time zone. If that is an issue for you, ask another question.
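If you need the day boundary in your local zone instead, both sides of the comparison can be expressed in that zone (a minimal sketch; 'America/Los_Angeles' is just a placeholder for your zone):
SELECT t.*
FROM `my-pj.my_dataset.sample_table` t
WHERE DATE(register_date, 'America/Los_Angeles')  -- convert the TIMESTAMP to a local date
      BETWEEN DATE_SUB(CURRENT_DATE('America/Los_Angeles'), INTERVAL 150 DAY)
          AND CURRENT_DATE('America/Los_Angeles')
LIMIT 10;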

Timestamp query to hours

I need some help with timestamps in PostgreSQL. I have a timestamp with time zone column named download_at for when a user downloaded an app, and an integer column user_id. I am trying to extract the IDs of users that have downloaded within the last 168 hours, out of the last 60 days of data. I am a bit confused about how to approach this because of the two different time windows. I believe I might have to play around with the trunc function, but I am stuck.
A basic example:
SELECT *
FROM table1
WHERE download_at > now() - '168 hours'::interval
Postgres is phenomenal at handling dates and times. A breakdown of what this does:
now() -- function that returns the current timestamp
'168 hours'::interval -- a string cast to an interval
In Postgres, :: does casting. When casting to an interval, Postgres will turn formatted English into an interval object. Since you can subtract an interval from a timestamp, it'll do the rest for you.
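Applied to the columns from the question (a sketch; table1 stands in for the real table name, as above):
-- distinct users who downloaded within the last 168 hours
SELECT DISTINCT user_id
FROM table1
WHERE download_at > now() - interval '168 hours';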

Returning results for the whole of yesterday and not the last 24 hours

I have this query that needs data for yesterday. What I have below returns results for the last 24 hours, which is different from yesterday 00:00 - 23:59.
Here is what I have, but it doesn't solve the problem.
Select * from message where now() - arrival_timestamp <= interval '24 hour'
You could cast the timestamp to date with the syntax expression::type (more info in the Type Casts section of the PostgreSQL documentation). Sufficient tools for making the comparison between dates can be found in section 9.9. Date/Time Functions and Operators:
SELECT * FROM message WHERE arrival_timestamp::date = current_date - 1;
If you have an index on arrival_timestamp, the cast to date would render the index unusable in the query. In that case, use other comparison operators:
SELECT * FROM message WHERE arrival_timestamp >= current_date - 1 AND arrival_timestamp < current_date;
You might try this:
SELECT * FROM message
WHERE date_trunc('day', arrival_timestamp) = current_date - interval '1 day'
When working with time or any other data type, you want to avoid a filter criterion in the form
function( field ) comparison other_field_or_constant
This is non-sargable. What that means is that the optimizer can't take advantage of an index on field. Since it cannot predict what value will result from the function call, it must perform a complete table scan and submit every field value to the function to make the comparison.
So you want every expression in the where clause to be in the form
field comparison other_field_or_constant
Note that function( constant ) is still a constant as all such results are cached and re-used for each comparison rather than the function being called for each comparison.
In your case, here, in pseudo-code, is what you want to do. You want yesterday. Fine, every DBMS I know has a system function that returns Now. Yesterday is from midnight the day before Now right up to but not including midnight of Now.
where arrival_timestamp >= trunc( now ) - 1 day
and arrival_timestamp < trunc( now );
Translating the expression above to English: "where the value is on or after midnight yesterday morning and any time up to but not including midnight last night." In other words, "yesterday." Performing the comparison this way, you don't have to worry about how close to midnight the system clock can generate a timestamp. As soon as the time ticks over to midnight last night, no matter how large or extremely small that tick may be, the comparison yields a valid result.
Notice, no manipulation is performed on arrival_timestamp, not even to convert it to a more convenient data type. The expression trunc( now ) (including the one where a day is subtracted) is a constant. The functions are called once and the result used over and over for every row. This is why all such functions must be deterministic. Even functions that are NOT in fact deterministic, like Now, are deemed to be deterministic for the life of the query.
For that reason you can do pretty much whatever you need to do on the right side of the comparison. For example, because SQL Server doesn't have a trunc function, the first line written above would look like this:
where arrival_timestamp >= DateAdd( day, -1, DateAdd( day, DateDiff( day, 0, GetDate()), 0))
It doesn't matter. The calculation is performed once and the result used for the entire duration of the query.
So the where clause can now be seen as
where field >= constant1
and field < constant2;
This means if field is indexed, that index can now be used.
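In PostgreSQL the pseudo-code above could be written like this (a sketch; date_trunc('day', now()) plays the role of trunc(now)):
SELECT *
FROM message
WHERE arrival_timestamp >= date_trunc('day', now()) - interval '1 day'  -- midnight yesterday
  AND arrival_timestamp <  date_trunc('day', now());                    -- midnight today, exclusive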
You can use the DATE_PART function:
select * from message where DATE_PART('day', now() - arrival_timestamp) <= 1