Counting events by day while respecting timezone in Postgres - sql

I have a very large table in a postgres database. It has a timestamp column, and I want to count the number of rows "per day" over a time period. It's easy enough to do this naïvely by GROUP BYing the output of date_trunc on the timestamp column. However, this does not account for the timezone of the user (I want to group by days such that midnight for the user is their midnight, not UTC midnight).
I've accomplished this by manually adjusting the timestamp by adding an interval representing the time zone offset of the user before truncating it. This works, but it's slow and results in the indexes I've set up not being used.
Is there a better way to accomplish this that's better-supported by Postgres?

If you know what timezone the values are in, you can switch time zones:
select date_trunc('day', col at time zone 'America/New_York')
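To make the idea concrete outside the database, here is a minimal Python sketch of the same bucketing: convert each UTC instant to the user's zone, then keep only the date. The function name and the sample events are made up for illustration, and it assumes Python 3.9+ with the zoneinfo module and time zone data available.

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def count_by_local_day(utc_events, tz_name):
    """Count events per calendar day as seen in the given time zone."""
    tz = ZoneInfo(tz_name)
    # Convert each UTC instant to local wall-clock time, then keep only the date.
    return Counter(ts.astimezone(tz).date() for ts in utc_events)

# Hypothetical sample data: three instants stored in UTC.
events = [
    datetime(2024, 1, 15, 3, 30, tzinfo=timezone.utc),  # 22:30 on Jan 14 in New York
    datetime(2024, 1, 15, 4, 59, tzinfo=timezone.utc),  # 23:59 on Jan 14 in New York
    datetime(2024, 1, 15, 5, 0, tzinfo=timezone.utc),   # 00:00 on Jan 15 in New York
]
counts = count_by_local_day(events, "America/New_York")
for day, n in sorted(counts.items()):
    print(day, n)
```

Grouping by the UTC date would put all three events on January 15; grouping by the New York date splits them two and one.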

Related

Convert sysdate to BST(British Summer Time)

I have a Java API which gets data from an Oracle database that is in the ET timezone. I want to query that table using sysdate on 2 columns, but sysdate should be taken as the current BST date value, not the ET date value.
select * from customers where sysdate between mem_registered_date and mem_deregistered_date;
How can this be done? Please help.
The simplest way to convert between time zones is to use a data type that understands time zones, such as TIMESTAMP WITH TIME ZONE (which is what SYSTIMESTAMP returns). Once you have converted to the time zone you want, CAST the result back to a DATE data type:
SELECT *
FROM customers
WHERE CAST( SYSTIMESTAMP AT TIME ZONE 'Europe/London' AS DATE )
BETWEEN mem_registered_date AND mem_deregistered_date;
I'm assuming that you want the current time in the United Kingdom (BST in summer and GMT in winter), if you want the time zone UTC+1 then use:
CAST( SYSTIMESTAMP AT TIME ZONE '+01:00' AS DATE )
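The difference between a named zone and a fixed offset can be sketched in Python (assuming Python 3.9+ with zoneinfo; the two sample instants are arbitrary). It is an illustration of the concept only, not of Oracle's behaviour.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")        # follows GMT/BST rules
plus_one = timezone(timedelta(hours=1))   # fixed UTC+1, no DST rules

# Two arbitrary instants, one in winter and one in summer.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)

for instant in (winter, summer):
    # Wall-clock time in London versus wall-clock time at a constant +01:00.
    print(instant.astimezone(london).time(), instant.astimezone(plus_one).time())
```

In summer the two agree; in winter the fixed offset runs an hour ahead of London.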
If querying a column with a date-only value, without any time-of-day nor any time zone, that is, a column of a type akin to the SQL-standard DATE type, then use Java class LocalDate.
It is generally best to use the half-open definition of a span of time, where the beginning is inclusive while the ending is exclusive. This allows spans to neatly abut one another without gaps or overlap. So never use the SQL command BETWEEN for date-time ranges, as it is fully-closed (both beginning and ending are inclusive).
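A small Python sketch of why the closed form causes overlap. The helper names and the month spans are invented for the example.

```python
from datetime import date

def in_half_open(d, start, end):
    """True when start <= d < end (start inclusive, end exclusive)."""
    return start <= d < end

def in_closed(d, start, end):
    """True when start <= d <= end, which is the test SQL BETWEEN performs."""
    return start <= d <= end

# Two spans that share a boundary date.
january = (date(2024, 1, 1), date(2024, 2, 1))
february = (date(2024, 2, 1), date(2024, 3, 1))
boundary = date(2024, 2, 1)

# Half-open: the boundary date belongs to exactly one span.
half_open_hits = [in_half_open(boundary, *span) for span in (january, february)]
# Closed: the boundary date is claimed by both spans.
closed_hits = [in_closed(boundary, *span) for span in (january, february)]
print(half_open_hits, closed_hits)
```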
Get today’s date as seen in the wall-clock time used by the people of a particular region (a time zone).
BST is not a real time zone. For British time, use Europe/London.
ZoneId z = ZoneId.of( "Europe/London" ) ;
LocalDate today = LocalDate.now( z ) ;
The SQL will look something like this:
SELECT *
FROM event_
WHERE ? >= start_
AND ? < end_
;
Fill in the placeholders.
myPreparedStatement.setObject( 1 , today ) ;
myPreparedStatement.setObject( 2 , today ) ;
Load the date values from database into Java.
LocalDate start = myResultSet.getObject( … , LocalDate.class ) ;
TIMESTAMP WITHOUT TIME ZONE
The misnamed DATE in the Oracle database actually represents a date with time-of-day without the context of a time zone or offset-from-UTC. As such, this type cannot represent a moment, a specific point on the timeline. If the value is noon on the 23rd of January next year, we cannot know if that is noon in Tokyo, Toulouse, or Toledo, all different moments several hours apart. This DATE type is akin to the SQL-standard type TIMESTAMP WITHOUT TIME ZONE.
So for this data type, your question asking about time zones makes no sense. Apples and oranges. Involving time zones means you are tracking moments, specific points on the timeline. But the Oracle DATE cannot represent moments as discussed above.
To track moments, you should be using a column of a type akin to the SQL-standard TIMESTAMP WITH TIME ZONE.
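The Tokyo/Toulouse/Toledo point can be checked with a short Python sketch (assuming Python 3.9+ with zoneinfo). The year is arbitrary, and the zone names are my assumption for those cities, taking Toledo to be the one in Ohio.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A date with time-of-day but no zone or offset: not yet a moment.
naive_noon = datetime(2025, 1, 23, 12, 0)

# Assumed zones for illustration: Tokyo, Toulouse (France), Toledo (Ohio).
zones = ["Asia/Tokyo", "Europe/Paris", "America/New_York"]

# Attaching a zone turns the wall-clock value into a specific instant.
instants_utc = {
    name: naive_noon.replace(tzinfo=ZoneInfo(name)).astimezone(timezone.utc)
    for name in zones
}
for name, instant in instants_utc.items():
    print(name, instant.isoformat())
```

The same wall-clock reading maps to three different points on the timeline, which is why a zone-less value alone cannot identify a moment.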

NUMTODSINTERVAL in Redshift. Convert a number to hours

My goal is to offset timestamps in table Date_times to reflect local timezones. I have a Timezone_lookup table that I use for that, which has a column utc_convert and its values are (2, -1, 5, etc.) depending on the timezone.
I used to use NUMTODSINTERVAL in Oracle to be able to convert the utc_convert values to hours so I can add/subtract from the datetimes in the Date_times table.
For Redshift I found INTERVAL, but that's only hardcoding the offset with a specific number.
I also tried:
SELECT CAST(utc as TIME)
FROM(
SELECT *
,to_char(cast(utc_convert as int)||':00:00', 'HH24') as utc
from Timezone_lookup
)
But this doesn't work, as some numbers in the utc_convert column have negative values. Any ideas?
Have you tried multiplying the offset by an interval:
select current_timestamp + utc_convert * interval '1 hour'
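The same "number times a one-hour interval" trick, sketched in Python to show that negative offsets need no special handling. The function name and sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta

def to_local(utc_ts, utc_convert):
    """Shift a UTC timestamp by a whole-hour offset, which may be negative."""
    return utc_ts + utc_convert * timedelta(hours=1)

ts = datetime(2024, 3, 1, 0, 30)
# Offsets like the ones described for the utc_convert column.
shifted = {offset: to_local(ts, offset) for offset in (2, -1, 5)}
for offset, value in shifted.items():
    print(offset, value)
```

Note that a negative offset can roll the result back into the previous day, which a cast to a bare time value would hide.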
In Oracle, you can use the time zone of the user's session (which means you do not need to maintain a table of time zone look-ups or compensate for daylight savings time).
SELECT FROM_TZ( your_timestamp_column, 'UTC' ) AT LOCAL
FROM Date_times
In RedShift you should be able to use the CONVERT_TIMEZONE( ['source_timezone',] 'target_timezone', 'timestamp') function rather than adding a number of intervals. This would allow you to specify the target_timezone as a numeric offset from UTC or as a time zone name (which would automatically compensate for DST).

Oracle timestamp, timezone and utc

I have an application, using an Oracle 11g (11.2.0.2.0 64 bit) db.
I have a lot of entries in a Person table. To access data I'm using different application (same data).
In this example I'm using birth_time field of my person table.
Some applications query data with birth_time directly, some others use to_char to reformat it, and some others use the UTC function.
The problem is this: with the same data and the same query, the results are different.
In this screenshot you can see the result with Oracle Sql developer (3.2.20.09)
All the timestamps were inserted as midnight, and indeed the to_char(..) and birth_time results show midnight. The UTC values come back one hour earlier (correct for my timezone!), but some entries (the last one here, for example) are TWO HOURS earlier (and a few in a thousand are three)!!
The same query executed with sql*plus returns the correct result, with one hour of difference for all the entries!
Does anyone have a suggestion to approach this problem?
The issue arises because one of our applications, built with Adobe Flex, seems to execute queries with UTC time, and the problem appears when you look at the data with this component.
ps.:
"BIRTH_TIME" is TIMESTAMP (6)
Would it be possible for you to change the query used? If so, you could use the AT TIME ZONE expression to tell Oracle that this date is in UTC time zone:
SELECT SYS_EXTRACT_UTC(CAST(TRUNC(SYSDATE) AS TIMESTAMP)) AS val FROM dual;
Output:
VAL
----------------------------
13/11/20 23:00:00,000000000
Now, using AT TIME ZONE 'UTC' gets you the date you need:
SELECT SYS_EXTRACT_UTC(
CAST(
TRUNC(SYSDATE) AS TIMESTAMP)
AT TIME ZONE 'UTC') AS val FROM dual;
Output:
VAL
----------------------------
13/11/21 00:00:00,000000000
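What the two queries do differently can be mimicked in Python (assuming Python 3.9+ with zoneinfo). This is only an analogy for the Oracle behaviour, and the session zone Europe/Rome is my assumption, chosen because it is one hour ahead of UTC in November.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A zone-less value, comparable to a TIMESTAMP(6) column holding midnight.
stored = datetime(2013, 11, 21, 0, 0)

# Assumption: the session time zone is Europe/Rome (UTC+1 in November).
session_tz = ZoneInfo("Europe/Rome")

# Read as session-local time, then converted to UTC: one hour earlier.
as_session_local = stored.replace(tzinfo=session_tz).astimezone(timezone.utc)

# Declared to be UTC already: the wall-clock value is unchanged.
as_utc = stored.replace(tzinfo=timezone.utc)

print(as_session_local.isoformat())
print(as_utc.isoformat())
```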

Query on a time range ignoring the date of timestamps

I'm trying to query a purchases table in my rails database (Postgres) and I want to query on time ranges.
For example, I'd like to know how many purchases were made between 2 PM and 3 PM across all dates.
There is a created_at column in this table but I don't see how to accomplish this without searching for a specific date as well.
I've tried:
Purchases.where("created_at BETWEEN ? and ?", Time.now - 1.hour, Time.now)
But this ultimately will just search for today's date with those times.
You need to extract just the hour portion from created_at using PostgreSQL's date_part/extract function.
SELECT EXTRACT(HOUR FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 20
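For example, something like this (note that BETWEEN is inclusive, so hours 13 and 14 match every timestamp from 13:00:00 up to, but not including, 15:00:00):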
Purchases.where(["EXTRACT(HOUR FROM created_at) BETWEEN ? AND ?", 13, 14])
Use a simple cast to time:
Purchases.where("created_at::time >= ?
AND created_at::time < ?", Time.now - 1.hour, Time.now)
This way you can easily test against arbitrary times.
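The logic of the cast-to-time filter, sketched in Python with invented sample data. The helper name is hypothetical, and the sketch does not handle windows that wrap past midnight.

```python
from datetime import datetime, time

def in_time_window(ts, start, end):
    """True when the time-of-day of ts falls in [start, end), whatever the date."""
    return start <= ts.time() < end

# Hypothetical purchase timestamps on different dates.
purchases = [
    datetime(2024, 1, 3, 14, 0, 0),    # exactly 2 PM: included
    datetime(2024, 2, 9, 14, 59, 59),  # just before 3 PM: included
    datetime(2024, 2, 9, 15, 0, 0),    # exactly 3 PM: excluded
    datetime(2024, 5, 20, 9, 15, 0),   # morning: excluded
]
matches = [p for p in purchases if in_time_window(p, time(14, 0), time(15, 0))]
print(len(matches))
```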
Consider:
Which border to include / exclude. Common practice is to include the lower border and exclude the upper. BETWEEN includes both.
The exact data type (timestamp or timestamp with time zone) and how that interacts with your local time zone:
Ignoring time zones altogether in Rails and PostgreSQL
Index
If you run these queries a lot and need them fast and your table isn't very small, you will want to create a functional index for performance:
CREATE INDEX tbl_created_at_time_idx ON tbl (cast(created_at AS time));
I used the standard SQL syntax cast() instead of the Postgres syntactical shortcut ::. This is required for indices.
This index works for a timestamp [without time zone] since the cast is IMMUTABLE as required for an index. But it does not work for a timestamp with time zone, because that cast depends on the current time zone setting of the session and is therefore only STABLE, not IMMUTABLE. You can fix this, but you need to define what you need exactly. Use the AT TIME ZONE construct to define the time zone to extract the time for. For instance, to measure against UTC time:
CREATE INDEX tbl_created_at_time_idx2
ON tbl(cast(created_at AT TIME ZONE 'UTC' AS time));
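Why the zone has to be pinned down can be seen in a short Python sketch (assuming Python 3.9+ with zoneinfo; the instant and the zones are arbitrary picks): one and the same instant has a different time-of-day depending on the zone it is viewed in.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One fixed instant on the timeline.
instant = datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc)

def time_of_day(ts, tz_name):
    """Time-of-day of an instant as seen in the named zone."""
    return ts.astimezone(ZoneInfo(tz_name)).time()

utc_time = time_of_day(instant, "UTC")
tokyo_time = time_of_day(instant, "Asia/Tokyo")
print(utc_time, tokyo_time)
```

An index needs one fixed answer per row, so the expression has to name the zone explicitly.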
Similar question dealing with dates ignoring the year:
How do you do date math that ignores the year?

Get "time with time zone" from "time without time zone" and the time zone name

First off, I realize time with time zone is not recommended. I am going to use it because I'm comparing multiple time with time zone values to my current system time regardless of day. I.e. a user says start every day at 08:00 and finish at 12:00 with THEIR time zone, not the system time zone. So, I have a time without time zone column in one table, let's call it SCHEDULES.time, and I have a UNIX time zone name column in another table, let's call it USERS.tz.
My system time zone is 'America/Regina', which does not use DST and so the offset is always -06.
Given a time of '12:00:00' and a tz of 'America/Vancouver' I would like to select the data into a column of type time with time zone, but I DO NOT want to convert the time to my time zone because the user has effectively said begin when it is 12:00 in Vancouver, not in Regina.
Thus, doing:
SELECT SCHEDULES.time AT TIME ZONE USERS.tz
FROM SCHEDULES JOIN USERS on USERS.ID=SCHEDULES.USERID;
results (at the moment) in:
'10:00:00-08'
but I really want:
'12:00:00-08'
I can't find any documentation relating to applying a time zone to a time, other than AT TIME ZONE. Is there a way to accomplish this without character manipulation or other hacks?
UPDATE:
This can be accomplished by using string concatenation, casting, and the Postgres time zone view as such:
select ('12:00:00'::text || utc_offset::text)::timetz
from pg_timezone_names
where name = 'America/Vancouver';
However, this is fairly slow. There must be a better way, no?
UPDATE 2:
I apologize for the confusion. The SCHEDULES table DOES NOT use time with time zone, I am trying to SELECT a time with time zone by combining values from a time without time zone and a text time zone name.
UPDATE 3:
Thanks to all those involved for their (heated) discussion. :) I have been convinced to abandon my plan to use a time with time zone for my output and instead use a timestamp with time zone, as it performs well, is more readable, and solves another problem that I was going to run into: time zones that roll into new dates. I.e., '2011-11-21 23:59' in 'America/Vancouver' is '2011-11-22' in 'America/Regina'.
UPDATE 4:
As I said in my last update, I have chosen the answer that @MichaelKrelin-hacker first proposed and @JonSkeet finalized. That is, a timestamp with time zone as my final output is a better solution. I ended up using a query like:
SELECT timezone(USERS.tz, now()::date + SCHEDULES.time)
FROM SCHEDULES
JOIN USERS ON USERS.ID = SCHEDULES.USERID;
The timezone() format was rewritten by Postgres after I entered (current_date + SCHEDULES.time) AT TIME ZONE USERS.tz into my view.
WARNING: PostgreSQL newbie (see comments on the question!). I know a bit about time zones though, so I know what makes sense to ask.
It looks to me like this is basically an unsupported situation (unfortunately) when it comes to AT TIME ZONE. Looking at the AT TIME ZONE documentation it gives a table where the "input" value types are only:
timestamp without time zone
timestamp with time zone
time with time zone
We're missing the one you want: time without time zone. What you're asking is somewhat logical, although it does depend on the date... as different time zones can have different offsets depending on the date. For example, 12:00:00 Europe/London may mean 12:00:00 UTC, or it may mean 11:00:00 UTC, depending on whether it's winter or summer.
On my system, having set the system time zone to America/Regina, the query
SELECT ('2011-11-22T12:00:00'::TIMESTAMP WITHOUT TIME ZONE)
AT TIME ZONE 'America/Vancouver'
gives me 2011-11-22 14:00:00-06 as a result. That's not ideal, but it does at least give the instant point in time (I think). I believe that if you fetched that with a client library - or compared it with another TIMESTAMP WITH TIME ZONE - you'd get the right result. It's just the text conversion that then uses the system time zone for output.
Would that be good enough for you? Can you either change your SCHEDULES.time field to be a TIMESTAMP WITHOUT TIME ZONE field, or (at query time) combine the time from the field with a date to create a timestamp without time zone?
EDIT: If you're happy with the "current date" it looks like you can just change your query to:
SELECT (current_date + SCHEDULES.time) AT TIME ZONE USERS.tz
from SCHEDULES JOIN USERS on USERS.ID=SCHEDULES.USERID
Of course, the current system date may not be the same as the current date in the local time zone. I think this will fix that part...
SELECT ((current_timestamp AT TIME ZONE USERS.tz)::DATE + schedules.time)
AT TIME ZONE USERS.tz
from SCHEDULES JOIN USERS on USERS.ID=SCHEDULES.USERID
In other words:
Take the current instant
Work out the local date/time in the user's time zone
Take the date of that
Add the schedule time to that date to get a TIMESTAMP WITHOUT TIME ZONE
Use AT TIME ZONE to apply the time zone to that local date/time
I'm sure there's a better way, but I think it makes sense.
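The five steps above can be sketched in Python (assuming Python 3.9+ with zoneinfo). The function name and the sample "now" are made up, and the sketch ignores the DST edge cases.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

def scheduled_instant(now_utc, schedule_time, tz_name):
    """Combine today's date in the user's zone with a zone-less schedule time."""
    tz = ZoneInfo(tz_name)
    local_now = now_utc.astimezone(tz)                      # steps 1-2: instant -> local date/time
    local_date = local_now.date()                           # step 3: keep the local date
    local_dt = datetime.combine(local_date, schedule_time)  # step 4: date + time, still zone-less
    return local_dt.replace(tzinfo=tz)                      # step 5: apply the user's zone

# Hypothetical "now": already Nov 22 in UTC, but still Nov 21 in Vancouver.
now_utc = datetime(2011, 11, 22, 7, 30, tzinfo=timezone.utc)
start = scheduled_instant(now_utc, time(12, 0), "America/Vancouver")
print(start.isoformat())
```

Using the date in the user's zone rather than the server's date is what keeps the schedule on the right day near midnight.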
You should be aware that in some cases this could fail though:
What do you want the result to be for a time of 01:30 on a day when the clock skips from 01:00 to 02:00, so 01:30 doesn't occur at all?
What do you want the result to be for a time of 01:30 on a day when the clock goes back from 02:00 to 01:00, so 01:30 occurs twice?
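Both edge cases can be reproduced in Python (assuming Python 3.9+ with zoneinfo). The dates are the 2011 transitions for America/Vancouver, where the clock jumps between 02:00 and 03:00 rather than between 01:00 and 02:00. How Postgres resolves such values is a separate matter that this sketch does not show.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Vancouver")

# Spring forward on 2011-03-13: clocks jump from 02:00 to 03:00, so 02:30 never occurs.
missing = datetime(2011, 3, 13, 2, 30, tzinfo=tz)
# Round-tripping through UTC lands on a wall-clock time that does exist.
round_trip = missing.astimezone(timezone.utc).astimezone(tz)

# Fall back on 2011-11-06: clocks go from 02:00 back to 01:00, so 01:30 occurs twice.
first = datetime(2011, 11, 6, 1, 30, tzinfo=tz, fold=0)   # first occurrence (PDT)
second = datetime(2011, 11, 6, 1, 30, tzinfo=tz, fold=1)  # second occurrence (PST)
# Compare in UTC, since the two values have the same wall-clock reading.
gap = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)

print(round_trip.time(), gap)
```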
Here is a demo how to calculate the times without casting to text:
CREATE TEMP TABLE schedule(t time, tz text);
INSERT INTO schedule values
('12:00:00', 'America/Vancouver')
,('12:00:00', 'US/Mountain')
,('12:00:00', 'America/Regina');
SELECT s.t AT TIME ZONE s.tz
- p.utc_offset
+ EXTRACT (timezone from now()) * interval '1s'
FROM schedule s
JOIN pg_timezone_names p ON s.tz = p.name;
Basically you have to subtract the UTC offset and add the offset of your local time zone to arrive at the given time zone.
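As a worked check of that arithmetic, here is a small Python sketch using the values from the question. The variable names are made up and the date is arbitrary; it is only needed because a bare time value does not support arithmetic in Python.

```python
from datetime import datetime, timedelta

# Wall-clock part of '10:00:00-08', the unwanted result shown in the question.
converted = datetime(2011, 11, 22, 10, 0)

target_offset = timedelta(hours=-8)  # utc_offset of America/Vancouver in winter
local_offset = timedelta(hours=-6)   # America/Regina, which has no DST

# Subtract the target zone's offset, then add the local zone's offset.
restored = converted - target_offset + local_offset
print(restored.time())
```

The net shift is two hours forward, which turns 10:00 back into the 12:00 the user asked for.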
You can speed up the calculation by hardcoding your local offset. In your case (America/Regina) that should be:
SELECT s.t AT TIME ZONE s.tz
- p.utc_offset
- interval '6h'
FROM schedule s
JOIN pg_timezone_names p ON s.tz = p.name;
As pg_timezone_names is a view and not actually a system table, it is rather slow - just like the demonstrated variant with casting to text representation and back.
I would store the time zone abbreviations and take the double cast via text without joining in pg_timezone_names for optimum performance.
FAST solution
The culprit that's slowing you down is pg_timezone_names. After some testing I found that pg_timezone_abbrevs is far superior. Of course, you have to save correct time zone abbreviations instead of time zone names to achieve this. Time zone names take DST into consideration automatically, time zone abbreviations are basically just codes for a time offset. The documentation:
A time zone abbreviation, for example PST. Such a specification merely
defines a particular offset from UTC, in contrast to full time zone names
which can imply a set of daylight savings transition-date rules as well.
Have a look at these test results or try it yourself:
SELECT * FROM pg_timezone_names;
Total runtime: 541.007 ms
SELECT * FROM pg_timezone_abbrevs;
Total runtime: 0.523 ms
Factor 1000. Whether you go with your idea to cast to text and back to timetz or with my method to compute the time is not important. Both methods are very fast. Just don't use pg_timezone_names.
Actually, as soon as you save time zone abbreviations, you can take the casting route without any additional joins. Use the abbreviation instead of the utc_offset. Results are accurate as per your definition.
CREATE TEMP TABLE schedule(t time, abbrev text);
INSERT INTO schedule values
('12:00:00', 'PST') -- 'America/Vancouver'
,('12:00:00', 'MST') -- 'US/Mountain'
,('12:00:00', 'CST'); -- 'America/Regina'
-- calculating
SELECT s.t AT TIME ZONE s.abbrev
- a.utc_offset
+ EXTRACT (timezone from now()) * interval '1s'
FROM schedule s
JOIN pg_timezone_abbrevs a USING (abbrev);
-- casting (even faster!)
SELECT (t::text || abbrev)::timetz
FROM schedule s;