PARSE_TIMESTAMP vs FORMAT_TIMESTAMP in BigQuery - SQL

Could someone help me understand why these two queries return different results in BigQuery?
select FORMAT_TIMESTAMP('%F %H:%M:%E*S', "2018-10-01 00:00:00", 'Europe/London')
returns 2018-10-01 01:00:00
select PARSE_TIMESTAMP('%F %H:%M:%E*S', "2018-10-01 00:00:00", "Europe/London")
returns 2018-09-30 23:00:00 UTC
As 2018-10-01 is during British Summer Time (UTC+1), I would've expected both queries to return 2018-09-30 23:00:00 UTC.

The first is given a timestamp, which is in UTC. It then converts it to the corresponding time in Europe/London; the return value is a string representing that local time.
The second takes a string representation, assumed to be in Europe/London local time, and returns a UTC timestamp.
So the two functions go in opposite directions: one from UTC to local time, the other from local time to UTC.
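A minimal sketch that puts the two directions side by side, using the literals from the question:
select
  FORMAT_TIMESTAMP('%F %H:%M:%E*S', TIMESTAMP '2018-10-01 00:00:00 UTC', 'Europe/London') as utc_to_london, -- 2018-10-01 01:00:00
  PARSE_TIMESTAMP('%F %H:%M:%E*S', '2018-10-01 00:00:00', 'Europe/London') as london_to_utc                 -- 2018-09-30 23:00:00 UTC
The first column renders the UTC instant as a London wall-clock string; the second reads the same string as London wall-clock time and returns the corresponding UTC instant.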

Related

Timestamp string conversion / from_utc_timestamp

I need to convert 2021-10-03 15:10:00.0 to 2021-10-03T15:10:00-04:00
I tried:
from_utc_timestamp(from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssXXX"),"America/New_York")
I got a NULL value.
Any suggestions, please?
from_utc_timestamp can accept a timestamp, a compatible string (yyyy-MM-dd HH:mm:ss.S), or a bigint, not this: "yyyy-MM-dd'T'HH:mm:ssXXX"
Hive timestamps are timezoneless. Once you have converted from UTC to America/New_York, the timezone information is lost: only you know which timezone the value is in, and it is impossible to derive the timezone from the converted timestamp itself.
You can concatenate the converted timestamp with a timezone suffix. A conversion like this returns what you need, but it works for that particular date only; in December the offset should be -05:00, not -04:00:
date_format(from_utc_timestamp('2021-10-03 15:10:00.0',"America/New_York"),"yyyy-MM-dd'T'HH:mm:ss-04:00") --This is wrong!!!
from_utc_timestamp is daylight-saving aware: the offset can be -05:00 or -04:00 depending on the date.
Consider this example; the first query returns 5, the second returns 4:
select (unix_timestamp("2020-01-01 12:00:00.0")-unix_timestamp(from_utc_timestamp("2020-01-01 12:00:00.0","America/New_York")))/60/60
select (unix_timestamp("2020-10-19 12:00:00.0")-unix_timestamp(from_utc_timestamp("2020-10-19 12:00:00.0","America/New_York")))/60/60
So, you can compute the offset that corresponds to America/New_York for the same timestamp and concatenate it with the converted timestamp:
select concat(date_format(from_utc_timestamp('2021-10-03 15:10:00.0',"America/New_York"),"yyyy-MM-dd'T'HH:mm:ss"),'-0',
--get hrs shift
(unix_timestamp("2021-10-03 15:10:00.0")-unix_timestamp(from_utc_timestamp("2021-10-03 15:10:00.0","America/New_York"))) div 3600,':00')
Result:
2021-10-03T11:10:00-04:00
It should work correctly with different timestamps taking into account daylight saving time for America/New_York.
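Applied to a table column, the same pattern might look like this sketch (the table name events and the column ts_utc are hypothetical; it assumes the Hive server timezone is UTC, so unix_timestamp() reads the string as UTC, and that the zone's offset is negative, as America/New_York's always is):
select concat(
         -- local wall-clock time in New York, ISO-8601 shaped
         date_format(from_utc_timestamp(ts_utc, 'America/New_York'), "yyyy-MM-dd'T'HH:mm:ss"),
         '-0',
         -- offset magnitude in whole hours for this particular instant (4 or 5 depending on DST)
         (unix_timestamp(ts_utc) - unix_timestamp(from_utc_timestamp(ts_utc, 'America/New_York'))) div 3600,
         ':00'
       ) as ts_local_iso
from events;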

How to deal with timestamps without a timezone?

The NYC bike and taxi datasets list the time when events happened in local time. A timestamp like 2018-01-07 10:30:00 means it was 10:30am in New York at the time.
When I ingest these timestamps into BigQuery, BigQuery assumes they are GMT, attaching incorrect timezone information.
How can I fix this?
2 choices:
Use DATETIME instead of TIMESTAMP - DATETIME carries the same information as TIMESTAMP, except no timezone is attached (see the sketch after the example below).
Since this is NY, you can supply the US/Eastern timezone when ingesting - it will correctly handle daylight saving changes and so on.
For example:
SELECT TIMESTAMP('2018-3-10 10:00:00', 'US/Eastern')
, TIMESTAMP('2018-5-10 10:00:00', 'US/Eastern')
2018-03-10 15:00:00 UTC
2018-05-10 14:00:00 UTC
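For the first choice, a minimal sketch of keeping the local wall-clock value as a DATETIME and attaching the zone only when a real instant is needed (the literal is just one of the question's example values):
SELECT DATETIME '2018-01-07 10:30:00' AS local_wall_clock,             -- no timezone attached
       TIMESTAMP(DATETIME '2018-01-07 10:30:00', 'US/Eastern') AS utc  -- 2018-01-07 15:30:00 UTC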

Calculate time difference between two columns of string type in hive without changing the data type string

I am trying to calculate the time difference between two columns of a row, both of string data type. If the time difference between them is less than 2 hours, select the first column of that row; if the time difference is greater than 2 hours, select the second column of that row. It can be done by converting the columns to a datetime format, but I want the result to stay a string. How can I do that? The data looks like this:
col1(string type)
2018-07-16 02:23:00
2018-07-26 12:26:00
2018-07-26 15:32:00
col2(string type)
2018-07-16 02:36:00
2018-07-26 14:29:00
2018-07-27 15:38:00
I think you don't need to convert the columns to a datetime format, since the data in your case is already in a sortable order (yyyy-MM-dd HH:mm:ss). You just need to keep only the digits, collapsing each value into one string (yyyyMMddHHmmss), and then apply your 2-hour test (here 20000, since the hours are followed by mmss). Looking at your example (and assuming col2 > col1), this query would work:
SELECT case when regexp_replace(col2,'[^0-9]', '')-regexp_replace(col1,'[^0-9]', '') < 20000 then col1 else col2 end as col3 from your_table;
Use unix_timestamp() to convert string timestamp to seconds.
The difference in hours will be:
hive> select (unix_timestamp('2018-07-16 02:23:00')- unix_timestamp('2018-07-16 02:36:00'))/60/60;
OK
-0.21666666666666667
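Putting it together for the original requirement, a sketch of the full CASE expression (your_table, col1 and col2 are the asker's columns; see the caveat below about the server time zone):
select case
         -- keep col1 when the two strings are less than 2 hours apart, otherwise col2
         when abs(unix_timestamp(col2) - unix_timestamp(col1)) < 2*60*60 then col1
         else col2
       end as col3
from your_table;
Because col1 and col2 are already strings, the result stays a string, as requested.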
Important update: this method will work correctly only if the time zone is configured as UTC, because for DST time zones Hive shifts the time during timestamp operations in some marginal cases. Consider this example for the PDT time zone:
hive> select hour('2018-03-11 02:00:00');
OK
3
Note that the hour is 3, not 2. This is because 2018-03-11 02:00:00 cannot exist in the PDT time zone: exactly at 2018-03-11 02:00:00 the clock is adjusted forward and it becomes 2018-03-11 03:00:00.
The same happens when converting to unix_timestamp. For the PDT time zone, unix_timestamp('2018-03-11 03:00:00') and unix_timestamp('2018-03-11 02:00:00') return the same value:
hive> select unix_timestamp('2018-03-11 03:00:00');
OK
1520762400
hive> select unix_timestamp('2018-03-11 02:00:00');
OK
1520762400
And a few links for your reference:
https://community.hortonworks.com/questions/82511/change-default-timezone-for-hive.html
http://boristyukin.com/watch-out-for-timezones-with-sqoop-hive-impala-and-spark-2/
Also have a look at this jira please: Hive should carry out timestamp computations in UTC

Extract date, month, year and month name from a unix timestamp with PostgreSQL

I use Postgres for a Rails app and I have a unix timestamp in the PostgreSQL db. I have a requirement to select and group by dd-mm-yyyy and by month name.
Consider I have the following unix timestamp
1425148200
I needed to change this to a datetime, so I used to_timestamp, which returned
2015-02-28 18:30:00 UTC
Then I tried to convert the datetime to the local timezone using
::timestamp without time zone AT TIME ZONE 'IST'
but that did not give the time in the required timezone; instead it returned
2015-02-28 16:30:00 UTC
I also tried to get the date part using ::date, which returned
Sat, 28 Feb 2015
So please help me get the dd-mm-yyyy in the specified timezone and the month name (March) from the unix timestamp.
Thanks in advance!
select to_char(to_timestamp('1425148200')::timestamptz at time zone 'UTC-5:30','DD-MM-YYYY & of course Month')
01-03-2015 & of course March
It is a Postgres quirk, I guess, according to http://www.postgresql.org/docs/7.2/static/timezones.html
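If the target zone is Indian Standard Time, using the full zone name sidesteps the inverted POSIX sign convention entirely (a sketch; 'Asia/Kolkata' is assumed to be what the asker means by IST):
select to_char(to_timestamp(1425148200) at time zone 'Asia/Kolkata', 'DD-MM-YYYY') as local_date,  -- 01-03-2015
       to_char(to_timestamp(1425148200) at time zone 'Asia/Kolkata', 'FMMonth') as month_name;     -- March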

Converting UTC 0 to UTC local in SQL. But for two different time zone

I want to convert historic dates from UTC+0 to the local UTC offset in SQL. Like:
2012-11-23
2013-01-08
2014-02-23
But we have 2 different offsets: we use UTC+2 after the last Sunday in March and UTC+3 after the last Sunday in October. I need a solution immediately, guys. Please help me...
Try this:
SELECT CONVERT_TZ('your_date', '+your_time_zone', '+time_zone_you_want');
SELECT CONVERT_TZ('2004-01-01 12:00:00', '+02:00', '+03:00'); -- in your case, from +2 to +3
See this link
If you need something dynamic, you'll need the timezone of each history row (maybe you can store it in a separate column), so I think you can do something like this:
SELECT CONVERT_TZ(your_date_column, 'local_timezone', time_zone_column);
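If the MySQL time zone tables are loaded (via mysql_tzinfo_to_sql), a named zone lets CONVERT_TZ apply the last-Sunday-of-March/October switch for you; a sketch, with 'Europe/Bucharest' only as an example of a UTC+2/+3 zone with those rules - substitute whichever zone actually applies:
SELECT CONVERT_TZ('2012-11-23 00:00:00', 'UTC', 'Europe/Bucharest') AS local_time;
-- named zones pick the correct +02:00 or +03:00 offset per date, so no manual DST logic is needed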