hive from_utc_timestamp returns wrong time

I am using Hive and want to get the UTC time.
By running
select date_format(to_utc_timestamp(bigint(1621446734295),'UTC'),'yyyy-MM-dd HH:mm:ss.SSS')
It returns: 2021-05-20 01:52:14.295.
However, this timestamp refers to 2021-05-19 17:52:14.295 GMT.
Why does the function to_utc_timestamp still return a time with a timezone offset applied? Do I need to change some settings for Hive?
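The arithmetic behind the two values can be checked outside Hive. The sketch below is plain Python, not Hive code. It treats 1621446734295 as milliseconds since the Unix epoch and renders it once in UTC and once at a fixed +08:00 offset. The +08:00 offset is an assumption inferred from the 8-hour gap between the two values in the question; the question does not state the session timezone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_epoch_millis(ms, tz=timezone.utc):
    # Integer arithmetic on milliseconds avoids float rounding.
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(tz)

ms = 1621446734295
fmt = "%Y-%m-%d %H:%M:%S.%f"

# [:-3] trims microseconds down to milliseconds.
utc_text = from_epoch_millis(ms).strftime(fmt)[:-3]

# Assumed session offset of +08:00 (inferred, not confirmed by the question).
plus8 = timezone(timedelta(hours=8))
plus8_text = from_epoch_millis(ms, plus8).strftime(fmt)[:-3]

print(utc_text)    # 2021-05-19 17:52:14.295
print(plus8_text)  # 2021-05-20 01:52:14.295
```

If that assumption holds, the value Hive returned is the same instant shown as local wall-clock time, which is consistent with the epoch value being rendered in the session timezone rather than in UTC.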

Related

Timestamp string conversion / from_utc_timestamp

I need to convert 2021-10-03 15:10:00.0 to 2021-10-03T15:10:00-04:00.
I tried this:
from_utc_timestamp(from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssXXX"),"America/New_York")
I got a NULL value.
Any suggestions, please?
from_utc_timestamp accepts a timestamp, a compatible string (yyyy-MM-dd HH:mm:ss.S), or a bigint, but not a string in this format: "yyyy-MM-dd'T'HH:mm:ssXXX"
Hive timestamps are timezoneless. Once you have converted from UTC to America/New_York, the timezone information is lost. Only you know which timezone the value is in; it is impossible to derive the timezone from the converted timestamp.
You can concatenate the offset as a literal. A conversion like this returns what you need, but it works only for particular dates. In December the -05:00 offset should be used, not -04:00:
date_format(from_utc_timestamp('2021-10-03 15:10:00.0',"America/New_York"),"yyyy-MM-dd'T'HH:mm:ss-04:00") --This is wrong!!!
from_utc_timestamp is daylight-saving aware. The offset can be -05:00 or -04:00 depending on the date.
Consider this example, first returns 5, second returns 4:
select (unix_timestamp("2020-01-01 12:00:00.0")-unix_timestamp(from_utc_timestamp("2020-01-01 12:00:00.0","America/New_York")))/60/60
select (unix_timestamp("2020-10-19 12:00:00.0")-unix_timestamp(from_utc_timestamp("2020-10-19 12:00:00.0","America/New_York")))/60/60
So, you can compute the UTC offset that America/New_York has at that same timestamp and concatenate it with the converted timestamp:
select concat(date_format(from_utc_timestamp('2021-10-03 15:10:00.0',"America/New_York"),"yyyy-MM-dd'T'HH:mm:ss"),'-0',
--get hrs shift
(unix_timestamp("2021-10-03 15:10:00.0")-unix_timestamp(from_utc_timestamp("2021-10-03 15:10:00.0","America/New_York"))) div 3600,':00')
Result:
2021-10-03T11:10:00-04:00
It should work correctly with different timestamps, taking into account daylight saving time for America/New_York. The hard-coded '-0' prefix and ':00' suffix assume a zone west of UTC with a whole-hour, single-digit offset.
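For comparison, here is the same idea as a minimal Python sketch (not Hive code): convert a UTC wall-clock string to America/New_York and let the timezone database supply the offset for that date. The helper name utc_to_zone_iso is made up for this illustration, and the sketch assumes Python 3.9+ with timezone data available to zoneinfo.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def utc_to_zone_iso(utc_text, zone_name):
    """Hypothetical helper: read a UTC wall-clock string, return ISO 8601 with offset."""
    naive = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S.%f")
    # Mark the value as UTC, then convert; the offset follows DST rules for that date.
    local = naive.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(zone_name))
    return local.isoformat(timespec="seconds")

october = utc_to_zone_iso("2021-10-03 15:10:00.0", "America/New_York")
january = utc_to_zone_iso("2020-01-01 12:00:00.0", "America/New_York")

print(october)  # 2021-10-03T11:10:00-04:00
print(january)  # 2020-01-01T07:00:00-05:00
```

The two calls show the daylight-saving dependence described above: -04:00 in October and -05:00 in January.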

How do I prevent Redshift INSERT datetime from dropping the timezone?

I have a String in this format: 2018-11-01T00:00:00-07:00 and I would like to convert it to a TIMESTAMP and insert it into a TIMESTAMP column. However, when I insert it, it drops the -07:00 without first converting it to -00:00. How do I ensure that it is converted and stored in Redshift properly?
Here is an example:
select ORIGINAL_DATE, TO_TIMESTAMP(ORIGINAL_DATE,'YYYY-MM-DD HH24:MI:SS') FROM CDW_LANDING.X where id = XXXXXX;
=> 2018-11-01T00:00:00-07:00 2018-10-31 17:00:00
The TO_TIMESTAMP converts it to 2018-10-31 17:00:00 which is what I want. However, when I insert it, it becomes 2018-11-01 00:00:00 and simply drops the -07:00.
Here is the example:
insert into cdw_stage.X (ORIG_DT)
select TO_TIMESTAMP(ORIGINAL_DATE,'YYYY-MM-DD HH24:MI:SS')
from CDW_LANDING.INVOICE where id = XXXXXX;
But when I query it with select ORIG_DT from cdw_landing.X;, it displays 2018-11-01 00:00:00. What I would like to see is 2018-10-31 17:00:00 which is what the TO_TIMESTAMP function should do.
The ORIG_DT in Redshift is in TIMESTAMP format. The input date is in VARCHAR.
How do I get Redshift to save this correctly? I also added postgres tag because Redshift is based off of postgres. Thank you so much!!!
2018-11-01T00:00:00-07:00 is not a timestamp (timestamp without time zone) literal, strictly speaking. It is a timestamptz (timestamp with time zone) literal. This is the root of all pain in your question. The wrong cast to timestamp ignores the offset. The Postgres manual:
In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That
is, the resulting value is derived from the date/time fields in the
input value, and is not adjusted for time zone.
Bold emphasis mine.
The use of TO_TIMESTAMP() can't save you. The Redshift manual:
Formats that include a time zone (TZ, tz, or OF) are not supported as input.
(The same is true in Postgres.)
Solution
Cast to timestamptz (or use a column of that type to begin with), the rest should fall in place:
SELECT cast('2018-11-01T00:00:00-07:00' AS timestamptz);
Or:
SELECT '2018-11-01T00:00:00-07:00'::timestamptz;
The manual about casting in Redshift.
When an actual timestamptz is assigned to a timestamp column it is converted according to the current timezone setting of the session automatically. If you want a different target timezone, use the AT TIME ZONE construct. Details:
Ignoring time zones altogether in Rails and PostgreSQL
The related answer is for Postgres, but timestamp handling in Redshift (while differing in many other aspects!) is the same. The Redshift manual:
When converting DATE or TIMESTAMP to TIMESTAMPTZ, DATE or TIMESTAMP
are assumed to use the current session time zone. The session time
zone is UTC by default. For more information about setting the session
time zone, see timezone.
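The difference between ignoring the offset and honoring it can be illustrated with a small Python sketch. This is not Redshift code; it only mirrors the two behaviors described above, and it assumes a UTC target zone, matching the default session time zone mentioned in the manual quote.

```python
from datetime import datetime, timezone

text = "2018-11-01T00:00:00-07:00"

# Offset-aware parse, comparable in spirit to a cast to timestamptz.
aware = datetime.fromisoformat(text)

# Dropping the offset without adjusting: what a plain timestamp cast does.
ignored_offset = aware.replace(tzinfo=None)

# Adjusting to UTC first, then dropping the zone.
as_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(ignored_offset)  # 2018-11-01 00:00:00
print(as_utc)          # 2018-11-01 07:00:00
```

Note that the UTC-adjusted value is 07:00 on 2018-11-01. Applying the seven hours in the opposite direction would give the 2018-10-31 17:00:00 value quoted in the question.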

Converting only time to unixtimestamp in Hive

I have a column eventtime that only stores the time of day as string. Eg:
0445AM - means 04:45 AM. I am using the below query to convert to UNIX timestamp.
select unix_timestamp(eventtime,'hhmmaa'),eventtime from data_raw limit 10;
This seems to work fine for test data. I always thought unixtimestamp is a combination of date and time while here I only have the time. My question is what date does it consider while executing the above function? The timestamps seem to be quite small.
A Unix timestamp is a bigint holding the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC). It is a way to track time as a running total of seconds.
select unix_timestamp('0445AM','hhmmaa') as unixtimestamp
Returns
17100
And this is exactly 4hrs, 45min converted to seconds.
select 4*60*60 + 45*60
returns 17100
And to convert it back use from_unixtime function
select from_unixtime (17100,'hhmmaa')
returns:
0445AM
If you convert using format including date, you will see it assumes the date is 1970-01-01
select from_unixtime (17100,'yyyy-MM-dd hhmmaa')
returns:
1970-01-01 0445AM
See the Hive functions docs here.
There is also a very useful site about Unix timestamps.
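The same round trip can be sketched in Python (not Hive) to make the implied date explicit. One caveat is flagged in the comments: Python's strptime fills in 1900-01-01 when no date is given, so the sketch takes only the hour and minute and places them on the epoch date itself. It also assumes UTC, which is what the 17100 result above implies.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# %I%M%p is the Python counterpart of the 'hhmmaa' pattern.
# Python defaults the missing date to 1900-01-01, so only hour/minute are used.
parsed = datetime.strptime("0445AM", "%I%M%p")
seconds = parsed.hour * 3600 + parsed.minute * 60

# Converting back: seconds since the epoch land on 1970-01-01.
back = EPOCH + timedelta(seconds=seconds)
back_text = back.strftime("%Y-%m-%d %I%M%p")

print(seconds)    # 17100
print(back_text)  # 1970-01-01 0445AM
```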

date_trunc in hive is working incorrectly

I am running below query:
select a.event_date,
date_format(date_trunc('month', a.event_date), '%m/%d/%Y') as date
from monthly_test_table a
order by 1;
Output:
2017-09-15 | 09/01/2017
2017-10-01 | 09/30/2017
2017-11-01 | 11/01/2017
Can anyone tell me why for date "2017-10-01" it is showing me date as "09/30/2017" after using date_trunc.
Thanks in Advance...!
You are formatting in reverse order, so the result is incorrect.
Use the code below:
select a.event_date,
date_format(date_trunc('month', a.event_date), '%Y/%m/%d') as date
from monthly_test_table a
order by 1;
You can use date_add with 1 - day(yourdate) as the number of days to add, which replicates truncation to the month.
For example:
2017-10-01 - day('2017-10-01') is 1 and you add 1-1=0 days
2017-08-30 - day('2017-08-30') is 30 and you add 1-30=-29 days
I faced the same issue recently and resorted to using this logic.
date_add(from_unixtime(unix_timestamp(event_date,'yyyy-MM-dd'),'yyyy-MM-dd'),
1-day(from_unixtime(unix_timestamp(event_date,'yyyy-MM-dd'),'yyyy-MM-dd'))
)
PS: As far as I know, there is no date_trunc function in the Hive documentation.
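The day arithmetic in this workaround is easy to verify in isolation. Below is a minimal Python sketch of the same logic (the function name first_of_month is made up for this example): adding 1 - day days to a date always lands on the first day of its month.

```python
from datetime import date, timedelta

def first_of_month(d):
    # Adding (1 - day) days moves any date back to day 1 of its month.
    return d + timedelta(days=1 - d.day)

already_first = first_of_month(date(2017, 10, 1))   # adds 0 days
late_august = first_of_month(date(2017, 8, 30))     # adds -29 days

print(already_first)  # 2017-10-01
print(late_august)    # 2017-08-01
```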
As per the source code below, the UTC_CHRONOLOGY time is translated with respect to the locale, and the Description states that truncation happens in the session timezone. Also refer to the URL below.
@Description("truncate to the specified precision in the session timezone")
@ScalarFunction("date_trunc")
@LiteralParameters("x")
@SqlType(StandardTypes.DATE)
public static long truncateDate(ConnectorSession session, @SqlType("varchar(x)") Slice unit, @SqlType(StandardTypes.DATE) long date)
{
    long millis = getDateField(UTC_CHRONOLOGY, unit).roundFloor(DAYS.toMillis(date));
    return MILLISECONDS.toDays(millis);
}
See https://prestodb.io/docs/current/release/release-0.66.html:
Time Zones:
This release has full support for time zone rules, which are needed to perform date/time calculations correctly. Typically, the session time zone is used for temporal calculations. This is the time zone of the client computer that submits the query, if available. Otherwise, it is the time zone of the server running the Presto coordinator.
Queries that operate with time zones that follow daylight saving can
produce unexpected results. For example, if we run the following query
to add 24 hours using in the America/Los Angeles time zone:
SELECT date_add('hour', 24, TIMESTAMP '2014-03-08 09:00:00');
Output: 2014-03-09 10:00:00.000
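The output in the quoted example can be reproduced with a short Python sketch (not Presto code). It assumes Python 3.9+ with timezone data available to zoneinfo; the helper name add_elapsed_hours is made up for this illustration. The addition is done in UTC so that it counts elapsed hours, which is what produces the one-hour jump across the start of daylight saving time.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")

def add_elapsed_hours(local_text, hours, zone=LA):
    """Hypothetical helper: add real elapsed hours to a local wall-clock time."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=zone)
    # Do the arithmetic in UTC; adding a timedelta to an aware local datetime
    # in Python would shift the wall clock instead of counting elapsed time.
    shifted = local.astimezone(timezone.utc) + timedelta(hours=hours)
    return shifted.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")

result = add_elapsed_hours("2014-03-08 09:00:00", 24)
print(result)  # 2014-03-09 10:00:00
```

Daylight saving time started in Los Angeles on 2014-03-09, so 24 elapsed hours after 09:00 on the 8th is 10:00 on the 9th in local wall-clock terms.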

Extract date,month,year and month name from the unix timestamp with postgresql

I use postgres for the rails app and I have a unix timestamp in postgresql db. I have a requirement to select and group by the dd-mm-yyyy and by month name.
Consider I have the following unix timestamp
1425148200
and I would need to change this to datetime and I used to_timestamp which returned
2015-02-28 18:30:00 UTC
and I tried to convert the datetime to local timezone using
::timestamp without time zone AT TIME ZONE 'IST'
but that did not give time in required timezone and instead it returned
2015-02-28 16:30:00 UTC
and I tried to get the date part using ::date which returned
Sat, 28 Feb 2015
So please help me get the dd-mm-yyyy in specified timezone and month name(March) from the unix timestamp.
Thanks in Advance!
select to_char(to_timestamp('1425148200')::timestamptz at time zone 'UTC-5:30','DD-MM-YYYY & of course Month')
01-03-2015 & of course March
It is a Postgres mistake, I guess,
according to http://www.postgresql.org/docs/7.2/static/timezones.html
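The expected values can be cross-checked with a minimal Python sketch (not Postgres code). It assumes that the IST in the question means India Standard Time, a fixed +05:30 offset; that reading is an assumption, since the abbreviation is ambiguous.

```python
from datetime import datetime, timedelta, timezone

# Assumption: IST here means India Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))

local = datetime.fromtimestamp(1425148200, tz=IST)

day_text = local.strftime("%d-%m-%Y")  # dd-mm-yyyy in the target zone
month_name = local.strftime("%B")      # full month name (English in the default C locale)

print(day_text)    # 01-03-2015
print(month_name)  # March
```

This matches the output of the query above. In Postgres, a POSIX-style zone string such as 'UTC-5:30' uses an inverted sign and denotes a zone 5 hours 30 minutes east of Greenwich, which is why that query yields the Indian local date.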