I have two timestamp columns in a Hive DB that store timestamps in the following formats:
hive> select last_date from xyz limit 2;
OK
2019-08-21 15:11:23.553
2019-08-21 15:11:23.553
[Above has milliseconds stored in it by default]
hive> select last_modify_date from xyz limit 2;
OK
2018-04-18 23:32:58
2017-09-22 04:02:32
I need a common Hive select query that converts both of the above timestamps to the 'YYYY-MM-DD HH:mm:ss.SSS' format, preserving the millisecond value if it exists and appending '.000' if it doesn't.
What I have tried so far:
select
last_modify_date,
from_unixtime(unix_timestamp(last_modify_date), "yyyy-MM-dd HH:mm:ss.SSS") as ts
from xyz limit 3;
However, the above query displays '.000' for both of these timestamp columns.
Please help
From the UDF that implements unix_timestamp, you can see that the returned value is in seconds, represented by a LongWritable. Anything smaller than one second is dropped.
You can write your own UDF, or just use pure SQL to achieve that.
One easy way is to use rpad (implemented by GenericUDFRpad):
select rpad(your_date, 23, '.000') from your_table;
Some examples:
hive> select rpad('2018-04-18 23:32:58', 23, '.000');
OK
2018-04-18 23:32:58.000
hive> select rpad('2018-04-18 23:32:58.553', 23, '.000');
OK
2018-04-18 23:32:58.553
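To see why the rpad trick works for these two formats, here is a minimal Python sketch of what rpad appears to do: pad on the right by cycling through the pad string, or truncate when the input is already long enough. The helper `rpad` below is my own approximation for illustration, not Hive code. The third call is a hypothetical value with a one-digit fraction, which shows that this approach only suits inputs that have either no fraction or a full three-digit one.

```python
def rpad(s: str, length: int, pad: str) -> str:
    """Approximate rpad: right-pad s with pad up to length, truncating if longer."""
    if len(s) >= length:
        return s[:length]
    missing = length - len(s)
    # Repeat the pad string enough times, then keep only the characters needed.
    filler = (pad * (missing // len(pad) + 1))[:missing]
    return s + filler


print(rpad('2018-04-18 23:32:58', 23, '.000'))      # no fraction: '.000' is appended
print(rpad('2018-04-18 23:32:58.553', 23, '.000'))  # already 23 characters: unchanged
print(rpad('2018-04-18 23:32:58.5', 23, '.000'))    # partial fraction: padded incorrectly
```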
I would like to extract the date & hour from UTC time from the below table in bigquery. I have used timestamp for getting the date or time using the below code. I would like to apply the code for the entire column. How to apply timestamp for the entire column? Can you please assist with it?
SELECT EXTRACT(HOUR FROM TIMESTAMP "2020-05-03 16:49:47.583494")
My data is like this
I want result like this:
You can do it this way:
SELECT my_column AS original_value,
DATE_FORMAT(STR_TO_DATE(my_column, "%Y-%m-%d %H:%i:%s.%f UTC"), "%e/%m/%Y") AS date,
DATE_FORMAT(STR_TO_DATE(my_column, "%Y-%m-%d %H:%i:%s.%f UTC"), "%l%p") AS hour
FROM my_table;
I am assuming that the column is VARCHAR, that's why I am converting it to DATE.
Edit:
My initial assumption was that the OP wanted a MySQL query (I thought BigQuery might be based on it), but it turns out that BigQuery is not based on MySQL. In BigQuery you can use FORMAT_TIMESTAMP instead; this is how the query would look:
SELECT Occurrence AS original_value,
FORMAT_TIMESTAMP("%e/%m/%Y", Occurrence) AS date,
FORMAT_TIMESTAMP("%l%p", Occurrence) AS hour
FROM mytable
I am comparing timestamp columns between 2 different database engines and I need to convert a timestamp column stored in YYYY-MM-DD HH:mm:ss format to YYYY-MM-DD HH:mm:ss.SSS, with SSS being 000 when no milliseconds are present.
Can I do the above using Hive select query?
Split the timestamp to get the milliseconds part, then use rpad to add zeroes if there is no milliseconds part at all or it has fewer than 3 digits.
Demo:
with your_data as (
select stack(3, '2019-11-02 20:18:00.123',
'2019-11-02 20:18:00.12',
'2019-11-02 20:18:00'
) as ts
)
select concat(split(ts,'\\.')[0],'.',rpad(nvl(split(ts,'\\.')[1],''),3,0))
from your_data d
;
Result:
2019-11-02 20:18:00.123
2019-11-02 20:18:00.120
2019-11-02 20:18:00.000
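The split-then-pad logic above can be sketched outside Hive as well. This is a minimal Python illustration under my own naming (`with_millis` is hypothetical), using the same three sample values as the demo:

```python
def with_millis(ts: str) -> str:
    """Return ts with exactly three fractional digits."""
    # Everything before the first dot is the seconds-resolution part.
    head, _, frac = ts.partition('.')
    # Pad a short or missing fraction with zeros and keep at most three digits.
    return head + '.' + frac.ljust(3, '0')[:3]


for ts in ('2019-11-02 20:18:00.123',
           '2019-11-02 20:18:00.12',
           '2019-11-02 20:18:00'):
    print(with_millis(ts))
```

Unlike a fixed '.000' pad, this handles fractions of one or two digits too.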
Given that both formats (and their lengths) are strictly defined, you can use this simple logic:
substr(concat(ts,'.000'),1,23)
Can't check the exact syntax, but basically you append extra zeros and cut them off if you don't need them. The cut length is 23, the length of 'YYYY-MM-DD HH:mm:ss.SSS'.
I am trying to calculate the time difference between two columns of a row which are of string data type. If the time difference between them is less than 2 hours then select the first column of that row else if the time difference is greater than 2 hours then select the second column of that row. It can be done by converting the columns to datetime format, but I want the result to be in string only. How can I do that? The data looks like this:
col1(string type)
2018-07-16 02:23:00
2018-07-26 12:26:00
2018-07-26 15:32:00
col2(string type)
2018-07-16 02:36:00
2018-07-26 14:29:00
2018-07-27 15:38:00
I think you don't need to convert the columns to datetime format, since the data in your case is already ordered (yyyy-MM-dd HH:mm:ss). You just need to take all the digits and join them into one string (yyyyMMddHHmmss); then you can test whether the difference is smaller or bigger than 2 hours (here 20000, since the hour is followed by mmss). Note that this digit arithmetic is only reliable when both timestamps fall on the same calendar day: across midnight, a gap shorter than 2 hours still produces a digit difference far above 20000. By looking at your example (assuming col2 > col1), this query would work:
SELECT case when regexp_replace(col2,'[^0-9]', '')-regexp_replace(col1,'[^0-9]', '') < 20000 then col1 else col2 end as col3 from your_table;
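As a rough check of the digit-subtraction idea, the Python sketch below compares it with a real time difference. The function names are mine, and the `over_midnight` row is a hypothetical example that is not in the question's data. The two approaches agree on a same-day row but disagree once the pair crosses midnight.

```python
import re
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'


def pick_by_digits(col1: str, col2: str) -> str:
    """Mimic the query: strip non-digits, subtract, compare with 20000."""
    gap = int(re.sub(r'[^0-9]', '', col2)) - int(re.sub(r'[^0-9]', '', col1))
    return col1 if gap < 20000 else col2


def pick_by_clock(col1: str, col2: str) -> str:
    """Reference: compare the actual elapsed time with 2 hours."""
    gap = datetime.strptime(col2, FMT) - datetime.strptime(col1, FMT)
    return col1 if gap < timedelta(hours=2) else col2


same_day = ('2018-07-26 12:26:00', '2018-07-26 14:29:00')       # from the question
over_midnight = ('2018-07-26 23:30:00', '2018-07-27 00:30:00')  # hypothetical row

print(pick_by_digits(*same_day) == pick_by_clock(*same_day))            # True
print(pick_by_digits(*over_midnight) == pick_by_clock(*over_midnight))  # False
```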
Use unix_timestamp() to convert string timestamp to seconds.
The difference in hours will be:
hive> select (unix_timestamp('2018-07-16 02:23:00')- unix_timestamp('2018-07-16 02:36:00'))/60/60;
OK
-0.21666666666666667
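For illustration, here is a small Python sketch of the same idea applied to the question: convert both strings to epoch seconds, compare the gap with 2 hours, and return the original string untouched. The helper names are hypothetical, and the sketch deliberately parses everything as UTC, which is an assumption on my part rather than something Hive does by default.

```python
from datetime import datetime, timezone

FMT = '%Y-%m-%d %H:%M:%S'


def unix_seconds_utc(ts: str) -> int:
    """Parse ts as a UTC timestamp and return whole seconds since the epoch."""
    return int(datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp())


def pick(col1: str, col2: str) -> str:
    """Return col1 if the gap is under 2 hours, otherwise col2 (both stay strings)."""
    hours = (unix_seconds_utc(col2) - unix_seconds_utc(col1)) / 60 / 60
    return col1 if hours < 2 else col2


rows = [
    ('2018-07-16 02:23:00', '2018-07-16 02:36:00'),
    ('2018-07-26 12:26:00', '2018-07-26 14:29:00'),
    ('2018-07-26 15:32:00', '2018-07-27 15:38:00'),
]
for col1, col2 in rows:
    print(pick(col1, col2))
```

The result of `pick` is always one of the input strings, so no formatting back from a datetime type is needed.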
Important update: this method works correctly only if the time zone is configured as UTC, because in time zones with DST Hive shifts the time during timestamp operations in some marginal cases. Consider this example for the PDT time zone:
hive> select hour('2018-03-11 02:00:00');
OK
3
Note the hour is 3, not 2. This is because 2018-03-11 02:00:00 cannot exist in PDT time zone because exactly at 2018-03-11 02:00:00 time is adjusted and becomes 2018-03-11 03:00:00.
The same happens when converting to unix_timestamp. For PDT time zone unix_timestamp('2018-03-11 03:00:00') and unix_timestamp('2018-03-11 02:00:00') will return the same timestamp:
hive> select unix_timestamp('2018-03-11 03:00:00');
OK
1520762400
hive> select unix_timestamp('2018-03-11 02:00:00');
OK
1520762400
A few links for your reference:
https://community.hortonworks.com/questions/82511/change-default-timezone-for-hive.html
http://boristyukin.com/watch-out-for-timezones-with-sqoop-hive-impala-and-spark-2/
Also have a look at this jira please: Hive should carry out timestamp computations in UTC
How do I calculate the sum of the times in my column called "timeSpent", which has the format HH:mm, in SQL? I am using MySQL.
The type of my column is Time.
It has this structure:
TimeFrom like 10:00:00 12:00:00 02:00:00
TimeUntil 08:00:00 09:15:00 01:15:00
Time spent
total time 03:15:00
SELECT SEC_TO_TIME( SUM( TIME_TO_SEC( `timeSpent` ) ) ) AS timeSum
FROM YourTableName
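The reason for going through seconds can be sketched in Python. The two helpers below are my own stand-ins for what TIME_TO_SEC and SEC_TO_TIME do with non-negative HH:MM:SS values; they are not MySQL code, and the sample values are the two durations from the question.

```python
def time_to_sec(t: str) -> int:
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in t.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def sec_to_time(total: int) -> str:
    """Convert a non-negative number of seconds back to 'HH:MM:SS'."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f'{hours:02d}:{minutes:02d}:{seconds:02d}'


spent = ['02:00:00', '01:15:00']
print(sec_to_time(sum(time_to_sec(t) for t in spent)))  # 03:15:00
```

Summing in seconds means minutes and seconds carry over correctly, for example 00:45:00 plus 00:30:00 gives 01:15:00.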
100% working code to get sum of time out of MYSQL Database:
SELECT
SEC_TO_TIME( SUM(time_to_sec(`tablename`.`timeSpent`)))
As timeSum
FROM
`tablename`
Try and confirm.
Thanks.
In MySQL, you would do something like this to get the time interval:
SELECT TIMEDIFF('08:00:00', '10:00:00');
Then to add the time intervals, you would do:
SELECT ADDTIME('01:00:00', '01:30:00');
Unfortunately, you're not storing dates or using 24-hour time, so these calculations would end up incorrect since your TimeUntil is actually lower than your TimeFrom.
Another approach would be (assuming you sort out the above issue) to store the time intervals as seconds using TIMESTAMPDIFF():
UPDATE my_table SET time_spent = TIMESTAMPDIFF(SECOND, `start`, `end`);
SELECT SEC_TO_TIME(SUM(time_spent)) FROM my_table;
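If the data type of the timeSpent column is TIME, you might try the following query, but be aware that MySQL's SUM() converts temporal values to plain numbers (for example, 00:45:00 becomes 4500) instead of adding them as durations, so the result is only meaningful when no minutes or seconds carry over: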
SELECT SUM(timeSpent)
FROM YourTableName -- replace YourTableName with the actual table name
However, if that is not the case, you may have to use a cast to convert to the Time data type. Something like this should work:
SELECT SUM(timeSpent - CAST('0:0:0' as TIME))
FROM YourTableName -- replace YourTableName with the actual table name