Difference between unix_timestamp and casting to timestamp - hive

I have a Hive table with two numeric-string fields (T1 and T2). I need to convert both to the timestamp format "YYYY-MM-DD hh:mm:ss.SSS" and then find the difference between them.
I have tried two methods:
Method 1: Through CAST
Select CAST(regexp_replace(substring(t1, 1,17),'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{3})','$1-$2-$3 $4:$5:$6.$7') as timestamp), CAST(regexp_replace(substring(t2, 1,17),'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{3})','$1-$2-$3 $4:$5:$6.$7') as timestamp), CAST(regexp_replace(substring(t1, 1,17),'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{3})','$1-$2-$3 $4:$5:$6.$7') as timestamp) - CAST(regexp_replace(substring(t2, 1,17),'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{3})','$1-$2-$3 $4:$5:$6.$7') as timestamp) as time_diff
from tab1
And getting output as
Method 2: Through unix_timestamp
Select from_unixtime (unix_timestamp(substring(t1,1,17),'yyyyMMddhhmmssSSS'),'yyyy-MM-dd hh:mm:ss.SSS'), from_unixtime (unix_timestamp(substring(t2,1,17),'yyyyMMddhhmmssSSS'),'yyyy-MM-dd hh:mm:ss.SSS'), from_unixtime (unix_timestamp(substring(t1,1,17),'yyyyMMddhhmmssSSS'),'yyyy-MM-dd hh:mm:ss.SSS') - from_unixtime (unix_timestamp(substring(t2,1,17),'yyyyMMddhhmmssSSS'),'yyyy-MM-dd hh:mm:ss.SSS') as time_diff
from tab1;
And getting output as
I don't understand why the two outputs differ.

unix_timestamp() gives you epoch time, i.e. the time in seconds since the Unix epoch 1970-01-01 00:00:00.
A timestamp, on the other hand, holds a date and time, viz. YYYY-MM-DD HH:MI:SS.
Hence an accurate way is to convert each timestamp string with unix_timestamp(), subtract, and then convert back using from_unixtime().
For example:
select from_unixtime(unix_timestamp('2020-04-12 01:30:02.000') - unix_timestamp('2020-04-12 01:29:43.000'))
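To make the arithmetic concrete, here is a minimal Python sketch of the same idea. It is only an illustration, not Hive: the helper name `to_epoch_seconds` is made up, and UTC is assumed throughout (Hive's functions work in the session time zone). It shows that subtracting two epoch values gives a plain number of seconds, and that converting that number back to a timestamp produces an instant just after the epoch rather than a calendar date.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S.%f"  # matches strings like '2020-04-12 01:30:02.000'


def to_epoch_seconds(text: str) -> float:
    """Parse the string as a UTC timestamp and return seconds since the epoch."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc).timestamp()


t1 = to_epoch_seconds("2020-04-12 01:30:02.000")
t2 = to_epoch_seconds("2020-04-12 01:29:43.000")

# The difference of two epoch values is a duration in seconds.
diff_seconds = t1 - t2

# Converting the duration back to a timestamp gives an instant
# that many seconds after 1970-01-01 00:00:00.
as_timestamp = datetime.fromtimestamp(diff_seconds, tz=timezone.utc)

print(diff_seconds)                                  # 19.0
print(as_timestamp.strftime("%Y-%m-%d %H:%M:%S"))    # 1970-01-01 00:00:19
```

If only the elapsed time is needed, the numeric difference is usually the value to keep.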

Method 2 ultimately boils down to something like this:
select ('2020-04-12 01:30:02.000' - '2020-04-12 01:29:43.000') as time_diff;
You cannot subtract date strings like this; you have to use datediff.
Note that in Hive, datediff counts whole days, so it is nonzero only when the two values fall on different days; otherwise you get zero.


When to use DATE_TRUNC() vs. DATE_PART()?

I do not know when to use DATE_TRUNC() and when to use DATE_PART() in a query.
I have not tried much yet, just some web searches that I do not fully grasp. I have only just started learning SQL (Postgres).
They both do very different things. One truncates a date to the precision specified (kind of like rounding, in a way) and the other just returns a particular part of a datetime.
From the documentation:
date_part():
The date_part function is modeled on the traditional Ingres equivalent
to the SQL-standard function extract:
date_part('field', source)
Note that here the field parameter needs to be a string value, not a
name. The valid field names for date_part are the same as for extract.
For historical reasons, the date_part function returns values of type
double precision. This can result in a loss of precision in certain
uses. Using extract is recommended instead.
SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40');
Result: 16
SELECT date_part('hour', INTERVAL '4 hours 3 minutes');
Result: 4
date_trunc():
The function date_trunc is conceptually similar to the trunc function
for numbers.
date_trunc(field, source [, time_zone ])
source is a value expression
of type timestamp, timestamp with time zone, or interval. (Values of
type date and time are cast automatically to timestamp or interval,
respectively.) field selects to which precision to truncate the input
value. The return value is likewise of type timestamp, timestamp with
time zone, or interval, and it has all fields that are less
significant than the selected one set to zero (or one, for day and
month).
...
Examples (assuming the local time zone is America/New_York):
SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00
SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-01-01 00:00:00
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
Result: 2001-02-16 00:00:00-05
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
Result: 2001-02-16 08:00:00-05
SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
Result: 3 days 02:00:00
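As a rough analogy outside Postgres, the difference can be sketched in Python: extracting a part yields a single number, while truncating yields another timestamp with the less significant fields reset. This is only an illustration using the standard `datetime` module, with the same sample value as the documentation examples; the variable names are made up.

```python
from datetime import datetime

ts = datetime(2001, 2, 16, 20, 38, 40)

# Analogue of date_part('day', ts) and date_part('hour', ts): one number each.
day_part = ts.day
hour_part = ts.hour

# Analogue of date_trunc('hour', ts): still a timestamp, smaller fields zeroed.
trunc_hour = ts.replace(minute=0, second=0, microsecond=0)

# Analogue of date_trunc('year', ts): month and day reset to 1, time to zero.
trunc_year = ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)

print(day_part, hour_part)   # 16 20
print(trunc_hour)            # 2001-02-16 20:00:00
print(trunc_year)            # 2001-01-01 00:00:00
```

A common rule of thumb: use the "part" form to filter or group by a field regardless of date (for example, all rows at hour 20), and the "trunc" form to bucket rows into consecutive periods (for example, one group per calendar hour).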

How to subtract 2 varchar dates in oracle?

I have this varchar value: 20211026231735.
I would like a query that subtracts that date from the current sysdate and converts the result to days, hours and seconds.
select TO_CHAR(SYSDATE,'YYYYMMDDHH24MISS') - start_time from TABLEA where job_name='jOB_AA_BB';
I get 4220.
Any help please? Thanks
When you do datetime arithmetic with the DATE datatype, you get back a NUMBER of days. To get an INTERVAL you can subtract two TIMESTAMPs. You don't say what the data type is for start_time, but you might get away with this:
select localtimestamp - start_time
from tablea where job_name='jOB_AA_BB';
LOCALTIMESTAMP gives you a TIMESTAMP value in the current session time zone. There's also CURRENT_TIMESTAMP, which give you the same thing in a TIMESTAMP WITH TIME ZONE and SYSTIMESTAMP that gives you the database time in TIMESTAMP WITH TIME ZONE. You may need to convert your start_time to avoid time zone differences, if any.
You can use the function numtodsinterval to convert the result of date arithmetic to an interval. If necessary, then use extract to pull out the needed components.
with tablea(job_name, start_time) as
(select 'jOB_AA_BB','20211026231735' from dual)
select numtodsinterval((SYSDATE - to_date( start_time,'yyyymmddhh24miss')),'day') date_diff
from tablea where job_name='jOB_AA_BB' ;
with tablea(job_name, start_time) as
(select 'jOB_AA_BB','20211026231735' from dual)
select extract (hour from date_diff) || ':' || extract (minute from date_diff)
from (
select numtodsinterval((sysdate - to_date( start_time,'yyyymmddhh24miss')),'day') date_diff
from tablea where job_name='jOB_AA_BB'
);
NOTE: I am not sure how you got any result other than an error, as your query winds up subtracting a string from a string. You should not convert sysdate to a string; instead, convert your string to a date (or, better yet, store it as the proper data type, date).
You can convert the value to a date (rather than converting SYSDATE to a string) and then subtract and explicitly return the value as an INTERVAL DAY TO SECOND type:
SELECT (SYSDATE - TO_DATE('20211026231735', 'YYYYMMDDHH24MISS')) DAY TO SECOND
FROM DUAL;
Or, for your table:
SELECT (SYSDATE - TO_DATE(start_time,'YYYYMMDDHH24MISS')) DAY(5) TO SECOND
FROM TABLEA
WHERE job_name='jOB_AA_BB';
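The same steps (parse the string as a date, subtract, then split the difference into components) can be sketched in Python. This is only an illustration of the arithmetic, not Oracle code: the function name `elapsed_parts` is made up, and a fixed `now` value stands in for SYSDATE so the result is reproducible.

```python
from datetime import datetime
from typing import Tuple


def elapsed_parts(start_text: str, now: datetime) -> Tuple[int, int, int, int]:
    """Return (days, hours, minutes, seconds) between start_text and now."""
    # Equivalent in spirit to TO_DATE(start_time, 'YYYYMMDDHH24MISS').
    start = datetime.strptime(start_text, "%Y%m%d%H%M%S")
    delta = now - start
    # timedelta stores whole days separately from the remaining seconds.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return delta.days, hours, minutes, seconds


now = datetime(2021, 10, 29, 1, 20, 0)  # fixed stand-in for SYSDATE
print(elapsed_parts("20211026231735", now))  # (2, 2, 2, 25)
```

Subtracting the raw digit strings as numbers, as in the original query, does not give elapsed time, because the digits are not a count of any single unit.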

Subtracting days from current_timestamp() in Hive

I want to get the timestamp that is exactly 10 days before the current timestamp in Hive. I can get the current timestamp using the function current_timestamp() in Hive (I don't want to use unix_timestamp() here because it's deprecated in recent versions of Hive).
So, how do I get the timestamp that is exactly 10 days before the current timestamp? Is there any function like add_days available?
date_sub(date/timestamp/string startdate, tinyint/smallint/int days) subtracts a number of days from a date:
date_sub(current_timestamp(), 10)
To format the result as 'yyyy-MM-dd HH:mm:ss.SSS':
date_format(date_sub(current_timestamp(), 10),'yyyy-MM-dd HH:mm:ss.SSS')
Alternatively, you can use date_add(date/timestamp/string startdate, tinyint/smallint/int days), which adds a number of days to a date, with a negative count:
date_add(current_timestamp(), -10)
Note that date_sub and date_add return a date, so the time-of-day part of current_timestamp() is dropped.
Convert current_timestamp to a Unix timestamp and subtract 10 days (10*86400 seconds). Then use from_unixtime to get the timestamp string.
from_unixtime(unix_timestamp(current_timestamp)-10*86400,'yyyy-MM-dd HH:mm:ss')
Note that the no-argument unix_timestamp() is being deprecated, but unix_timestamp(string date) is not.
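The difference between the approaches above is whether the time of day survives. A small Python sketch of that distinction, for illustration only (a fixed value stands in for current_timestamp(); the variable names are made up):

```python
from datetime import datetime, timedelta

now = datetime(2024, 3, 15, 14, 30, 45)  # fixed stand-in for current_timestamp()

# Subtracting a 10-day duration (10 * 86400 seconds) keeps the time of day.
exact = now - timedelta(seconds=10 * 86400)

# Subtracting days from the date part only loses the time of day,
# which mirrors what a date-returning function does.
date_only = now.date() - timedelta(days=10)

print(exact)      # 2024-03-05 14:30:45
print(date_only)  # 2024-03-05
```

Keep in mind that a fixed 86400 seconds per day ignores daylight-saving shifts, so in a time zone with DST the wall-clock time can move by an hour across a transition.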

PLSQL - convert unix timestamp with millisecond precision to timestamp(6)

I have a unix timestamp with millisecond precision like below:
1523572200000
I need to convert it to timestamp(6). This is the format I need:
05-NOV-14 09.45.00.000000000 AM
(Fyi examples above are not matching dates, just using as example.)
What's the best way to go about this?
Thanks!
The following might work for you (where myunixtimestamp is the name of the column in which your Unix timestamps are stored):
SELECT TIMESTAMP'1970-01-01 00:00:00.000' + NUMTODSINTERVAL(myunixtimestamp/1000, 'SECOND')
FROM mytable;
For example,
SELECT TIMESTAMP'1970-01-01 00:00:00.000' + NUMTODSINTERVAL(1523572200000/1000, 'SECOND')
FROM dual;
gives a result of 2018-04-12 10:30:00.000000000 PM.
Hope this helps.
Assuming that the Unix timestamp is 1523572200000, try the following:
select cast (to_date('1970-01-01', 'YYYY-MM-DD') + 1523572200000/1000/60/60/24 as timestamp) from dual;
where:
to_date('1970-01-01', 'YYYY-MM-DD') is the epoch
<unix_timestamp>/1000/60/60/24 converts milliseconds to days (1000 milliseconds per second, 60 seconds, 60 minutes, 24 hours), because adding a number to a date in Oracle adds that many days
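Both answers rely on the same arithmetic: start from the epoch and add the elapsed time. A minimal Python sketch of that calculation, for illustration only (the function name `from_epoch_millis` is made up, and the result is in UTC with no time zone adjustment):

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def from_epoch_millis(ms: int) -> datetime:
    """Add a millisecond count to the epoch, keeping sub-second precision."""
    return EPOCH + timedelta(milliseconds=ms)


ts = from_epoch_millis(1523572200000)
print(ts)                                  # 2018-04-12 22:30:00
print(from_epoch_millis(1523572200123))    # 2018-04-12 22:30:00.123000
```

The second call shows why the interval-based form is worth preferring when the milliseconds matter: adding the value as an interval of seconds keeps the fraction, whereas arithmetic on a type with only second precision cannot.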

Hive from_unixtime for milliseconds

We have a timestamp epoch column (BIGINT) stored in Hive.
We want to get Date 'yyyy-MM-dd' for this epoch.
Problem is my epoch is in milliseconds e.g. 1409535303522.
So select timestamp, from_unixtime(timestamp,'yyyy-MM-dd') gives wrong dates, because from_unixtime expects the epoch in seconds.
So I tried dividing it by 1000, but then the value is converted to a double and the function cannot be applied to it. Even CAST does not work when I try to convert this double to a bigint.
Solved it by following query:
select timestamp, from_unixtime(CAST(timestamp/1000 as BIGINT), 'yyyy-MM-dd') from Hadoop_V1_Main_text_archieved limit 10;
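The fix is simply to reduce milliseconds to whole seconds before formatting. A minimal Python sketch of the same conversion, for illustration only (the function name `millis_to_date` is made up, and UTC is assumed, whereas Hive formats in the session time zone):

```python
from datetime import datetime, timezone


def millis_to_date(ms: int) -> str:
    """Format an epoch value given in milliseconds as 'yyyy-MM-dd' (UTC)."""
    seconds = ms // 1000  # integer division, like CAST(ms/1000 AS BIGINT)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


print(millis_to_date(1409535303522))  # 2014-09-01
```

Integer division discards the millisecond remainder, which is harmless when only the date is needed.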
The type should be double to ensure precision is not lost:
select from_unixtime(cast(1601256179170 as double)/1000.0, "yyyy-MM-dd hh:mm:ss.SSS") as event_timestamp
Here timestamp_ms is the Unix time in milliseconds:
SELECT from_unixtime(floor(CAST(timestamp_ms AS BIGINT)/1000), 'yyyy-MM-dd HH:mm:ss.SSS') as created_timestamp FROM table_name;
The original answer gives you a string. If you would like a date instead, add an extra cast to date:
select
timestamp,
cast(from_unixtime(CAST(timestamp/1000 as BIGINT), 'yyyy-MM-dd') as date) as date_col
from Hadoop_V1_Main_text_archieved
limit 10;
From the docs on casting dates and timestamps, for converting a string to a date:
cast(string as date)
If the string is in the form 'YYYY-MM-DD', then a date value corresponding to that year/month/day is returned. If the string value does not match this format, then NULL is returned.
The date type is available only starting with Hive 0.12.0, as the docs mention:
DATE (Note: Only available starting with Hive 0.12.0)