Get a substring in Hive

I am trying to get a substring of a string in Hive. I have a string like this one: 2017-06-05 09:06:32.0
What I want is the two digits of the hour, that is, 09.
I get the entire time part with this command:
SELECT SUBSTR(hora,11) AS subhoras FROM axmugbcn18.bbdd WHERE hora = '2017-06-05 09:06:32.0'
The result of the command is: 09:06:32.0
To get only 09, I tried this command:
SELECT REGEXP_EXTRACT(hora,'\d\d') AS subhoras FROM axmugbcn18.bbdd WHERE hora = '2017-06-05 09:09:32.0'
but the results are blank.
How can I retrieve only the two digits of the hour?
Thanks
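A likely reason the second query comes back blank: Hive processes backslash escapes inside string literals, and for an escape it does not recognise (such as \d) it keeps only the character. The pattern '\d\d' therefore reaches the regex engine as dd, which never matches, and regexp_extract returns an empty string when nothing matches. The Python sketch below imitates this. unescape_hive_literal is a hypothetical, simplified helper covering only the "drop the backslash" rule, not Hive's full escape table.

```python
import re


def unescape_hive_literal(literal: str) -> str:
    # Hypothetical, simplified model of how Hive unescapes a string literal:
    # a backslash followed by any character becomes just that character, so a
    # doubled backslash becomes one backslash and backslash-d becomes plain "d".
    # (Real Hive also maps sequences such as backslash-n to special characters.)
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)


def regexp_extract(subject: str, pattern: str, index: int = 1) -> str:
    # Stand-in for Hive's regexp_extract: the requested group of the first
    # match, or an empty string when the pattern does not match at all.
    m = re.search(pattern, subject)
    return m.group(index) if m else ""


hora = "2017-06-05 09:06:32.0"

question_pattern = unescape_hive_literal(r"\d\d")      # what '\d\d' becomes
print(question_pattern)                                 # dd
print(repr(regexp_extract(hora, question_pattern)))     # '' (no match)

answer_pattern = unescape_hive_literal(r"\\s(\\d\\d)")  # what "\\s(\\d\\d)" becomes
print(answer_pattern)                                   # \s(\d\d)
print(regexp_extract(hora, answer_pattern, 1))          # 09
```

Doubling the backslashes in the Hive literal is what lets a real \d reach the regex engine.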

There are several ways to extract the hour from a timestamp value.
1. Using the substring function:
select substring(string("2017-06-05 09:06:32.0"),12,2);
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
2. Using regexp_extract:
select regexp_extract(string("2017-06-05 09:06:32.0"),"\\s(\\d\\d)",1);
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
3. Using hour() (it returns an integer, so the result is 9 rather than 09):
select hour(timestamp("2017-06-05 09:06:32.0"));
+------+--+
| _c0 |
+------+--+
| 9 |
+------+--+
4. Using from_unixtime:
select from_unixtime(unix_timestamp('2017-06-05 09:06:32.0'),'HH');
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
5. Using date_format (HH is the 24-hour pattern; hh would give the 12-hour clock, e.g. 09 for 21:06):
select date_format(string('2017-06-05 09:06:32.0'),'HH');
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
6. Using split:
select split(split(string('2017-06-05 09:06:32.0'),' ')[1],':')[0];
+------+--+
| _c0 |
+------+--+
| 09 |
+------+--+
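The Python sketch below mirrors the approaches above on the same value, to make the position arithmetic and the format letters concrete. It is only an analogy. Python slices are 0-based while Hive's substr is 1-based, so substr(s, 12, 2) corresponds to s[11:13]. Position 11 is the space, which is why SUBSTR(hora,11) in the question returns the time with a leading space. strftime's %H plays the role of the 24-hour HH pattern and %I the 12-hour hh pattern. The 21:06 value at the end is a hypothetical input added only to show how the two differ.

```python
import re
from datetime import datetime

hora = "2017-06-05 09:06:32.0"

# 1. substr(hora, 12, 2) is 1-based in Hive; the equivalent Python slice is 0-based.
print(hora[11:13])                                  # 09
# Position 11 (index 10 here) is the space, so substr(hora, 11) keeps a leading space.
print(repr(hora[10:]))                              # ' 09:06:32.0'

# 2. A capture group around the two digits after the space, like regexp_extract(..., 1).
print(re.search(r"\s(\d\d)", hora).group(1))        # 09

# 3-5. Parse once, then take the integer hour or format it with a 24-hour pattern.
ts = datetime.strptime(hora, "%Y-%m-%d %H:%M:%S.%f")
print(ts.hour)                                      # 9 (an int, like hour())
print(ts.strftime("%H"))                            # 09 (like the 'HH' pattern)

# 6. Split on the space, then on the colon.
print(hora.split(" ")[1].split(":")[0])             # 09

# Hypothetical afternoon value: a 12-hour pattern (%I, like 'hh') hides the difference.
late = datetime.strptime("2017-06-05 21:06:32.0", "%Y-%m-%d %H:%M:%S.%f")
print(late.strftime("%H"), late.strftime("%I"))     # 21 09
```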

Try the below:
select
'2017-06-05 09:06:32.0' as t,
hour('2017-06-05 09:06:32.0'), -- output: 9
from_unixtime(unix_timestamp('2017-06-05 09:06:32.0'),'HH') -- output: 09
from table_name;
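You can use either hour() or from_unixtime(unix_timestamp(...)) to get the desired result; hour() returns the integer 9, while the 'HH' format returns the zero-padded string 09.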
Hope this helps :)

Related

How to change string to timestamp?

I have timestamp data that is still stored as STRING, like below:
+-----------------------------+
| created_at |
+-----------------------------+
| 2019-09-05T07:44:32.117283Z |
+-----------------------------+
| 2019-09-05T08:44:32.117213D |
+-----------------------------+
| 2019-09-06T08:44:32.117283A |
+-----------------------------+
| 2019-09-21T09:42:32.117223T |
+-----------------------------+
| 2019-10-21T10:21:14.1174dwC |
+-----------------------------+
How can I change it to ISO format, like "2019-09-05 07:44:32 UTC"?
Thanks in advance
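In BigQuery Standard SQL, you can use PARSE_TIMESTAMP('%FT%T', SPLIT(created_at, '.')[OFFSET(0)]) or PARSE_TIMESTAMP('%FT%T', SUBSTR(created_at, 1, 19)), whichever you prefer. Both keep only the date and time up to the seconds and discard the fractional part and the trailing characters.
You can test and play with the above using the sample data from your question, as in the example below: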
#standardSQL
WITH `project.dataset.table` AS (
SELECT '2019-09-05T07:44:32.117283Z' created_at UNION ALL
SELECT '2019-09-05T08:44:32.117213D' UNION ALL
SELECT '2019-09-06T08:44:32.117283A' UNION ALL
SELECT '2019-09-21T09:42:32.117223T' UNION ALL
SELECT '2019-10-21T10:21:14.1174dwC'
)
SELECT PARSE_TIMESTAMP('%FT%T', SPLIT(created_at, '.')[OFFSET(0)])
FROM `project.dataset.table`
with output
Row f0_
1 2019-09-05 07:44:32 UTC
2 2019-09-05 08:44:32 UTC
3 2019-09-06 08:44:32 UTC
4 2019-09-21 09:42:32 UTC
5 2019-10-21 10:21:14 UTC
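The same idea can be sketched in Python to check the logic outside BigQuery: keep the part before the first '.' and parse the remaining date-time prefix. BigQuery's %FT%T corresponds to %Y-%m-%dT%H:%M:%S here. The sketch assumes the values are UTC, which matches the BigQuery output above, and attaches that zone explicitly. to_utc_string is a hypothetical helper name.

```python
from datetime import datetime, timezone

rows = [
    "2019-09-05T07:44:32.117283Z",
    "2019-09-05T08:44:32.117213D",
    "2019-09-06T08:44:32.117283A",
    "2019-09-21T09:42:32.117223T",
    "2019-10-21T10:21:14.1174dwC",
]


def to_utc_string(created_at: str) -> str:
    # Like SPLIT(created_at, '.')[OFFSET(0)]: keep the part before the first '.'.
    # (For these rows this is also created_at[:19], like SUBSTR(created_at, 1, 19).)
    prefix = created_at.split(".")[0]
    # Like PARSE_TIMESTAMP('%FT%T', ...), treating the value as UTC.
    ts = datetime.strptime(prefix, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return ts.strftime("%Y-%m-%d %H:%M:%S UTC")


for value in rows:
    print(to_utc_string(value))
```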

Replace empty string in Hive - NVL and COALESCE tried

How can I replace an empty string (length 0) with some other value? I have already tried NVL and COALESCE, but neither replaces the value, because it is not NULL. I could use a CASE statement, but I'm looking for a built-in function if there is one.
Since your values are empty strings, COALESCE and NVL won't help: they only replace NULL values, and an empty string is not NULL.
With empty strings:
hive> select coalesce(string(""),"1");
+------+--+
| _c0 |
+------+--+
| |
+------+--+
hive> select nvl(string(""),"1");
+------+--+
| _c0 |
+------+--+
| |
+------+--+
With null values:
hive> select coalesce(string(null),"1");
+------+--+
| _c0 |
+------+--+
| 1 |
+------+--+
hive> select nvl(string(null),"1");
+------+--+
| _c0 |
+------+--+
| 1 |
+------+--+
Try altering the table to add this property:
ALTER TABLE <db>.<tb> SET TBLPROPERTIES('serialization.null.format'='');
If this property doesn't make empty strings read as NULLs, then you need to use either a CASE or an IF expression to replace them.
You can use the if function:
if(boolean testCondition, T valueTrue, T valueFalseOrNull)
hive> select if(length(trim(<col_name>))=0,'<replacement_val>',<col_name>) from <db>.<tb>;
Example:
hive> select if(length(trim(string("")))=0,'1',string("col_name"));
+------+--+
| _c0 |
+------+--+
| 1 |
+------+--+
hive> select if(length(trim(string("1")))=0,'1',string("col_name"));
+-----------+--+
| _c0 |
+-----------+--+
| col_name |
+-----------+--+
In Hive, an empty string is treated as an ordinary comparable value, not as NULL. That is why there is no built-in function for this.
Using a CASE expression:
case when col='' or col is null then 'something' else col end
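The conditions used in the two answers behave differently. col = '' or col is null catches only a truly empty string or NULL. length(trim(col)) = 0 also catches space-only values, but on its own it does not replace NULL, because length(NULL) is NULL. The Python sketch below models SQL NULL as None. The function names are hypothetical stand-ins for the SQL expressions, not Hive APIs.

```python
from typing import Optional


def coalesce(value: Optional[str], default: str) -> str:
    # COALESCE(col, default) / NVL(col, default): only NULL (None) is replaced.
    return default if value is None else value


def case_empty_or_null(value: Optional[str], default: str) -> str:
    # CASE WHEN col = '' OR col IS NULL THEN default ELSE col END
    return default if value is None or value == "" else value


def if_blank(value: Optional[str], default: str) -> Optional[str]:
    # if(length(trim(col)) = 0, default, col). Hive's trim() strips spaces, hence
    # strip(" "). For NULL the condition is NULL, not true, so NULL passes through.
    if value is not None and len(value.strip(" ")) == 0:
        return default
    return value


for v in ["", None, "   ", "abc"]:
    print(repr(v), repr(coalesce(v, "1")), repr(case_empty_or_null(v, "1")), repr(if_blank(v, "1")))
```

If both blanks and NULLs must be replaced, the CASE condition can be combined with the trim check.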

Netezza time formatting

I need to convert timestamps with 1/1000-second resolution to 1/100-second resolution. I could possibly use the to_char(timestamp, text) formatting function for this, but I need help with the format text to use. The Postgres way of doing this is here.
Input table (note: the timestamp here is stored as varchar):
+-------------------------+
| ms1000_val |
+-------------------------+
| 2017/02/20 08:27:17.899 |
| 2017/02/20 08:23:43.894 |
| 2017/02/20 08:24:41.894 |
| 2017/02/20 08:28:09.899 |
+-------------------------+
Output table:
+------------------------+
| ms100_val |
+------------------------+
| 2017/02/20 08:27:17.89 |
| 2017/02/20 08:23:43.89 |
| 2017/02/20 08:24:41.89 |
| 2017/02/20 08:28:09.89 |
+------------------------+
Try this
select cast(to_char(sub.field,'YYYY-MM-DD HH24:MI:SS') as timestamp)
+ interval '10 millisecond' * (cast(to_char(sub.field,'MS') as integer)/10) as converted_value
from (
select to_timestamp('2017/02/20 08:27:17.899','YYYY/MM/DD HH24:MI:SS.MS') as field
union
select to_timestamp('2017/02/20 08:23:43.894','YYYY/MM/DD HH24:MI:SS.MS')
union
select to_timestamp('2017/02/20 08:24:41.894','YYYY/MM/DD HH24:MI:SS.MS')
union
select to_timestamp('2017/02/20 08:28:09.899','YYYY/MM/DD HH24:MI:SS.MS')
) sub
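The arithmetic in that query truncates rather than rounds. The milliseconds are integer-divided by 10 and added back as 10-millisecond steps, so .899 becomes .89, as in the expected output. The Python sketch below applies the same arithmetic to the varchar values from the question. strptime's %f stands in for Netezza's MS field, and to_centiseconds is a hypothetical helper name.

```python
from datetime import datetime, timedelta

values = [
    "2017/02/20 08:27:17.899",
    "2017/02/20 08:23:43.894",
    "2017/02/20 08:24:41.894",
    "2017/02/20 08:28:09.899",
]


def to_centiseconds(text: str) -> str:
    ts = datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f")
    millis = ts.microsecond // 1000              # like to_char(field, 'MS')
    whole_seconds = ts.replace(microsecond=0)    # like the 'YYYY-MM-DD HH24:MI:SS' cast
    # Integer division by 10 truncates (899 -> 89); add back 89 * 10 ms.
    result = whole_seconds + timedelta(milliseconds=10 * (millis // 10))
    # Print with two fractional digits, matching the output table.
    return result.strftime("%Y/%m/%d %H:%M:%S.") + f"{result.microsecond // 10000:02d}"


for v in values:
    print(to_centiseconds(v))
```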

Why does casting to timestamp give two different results?

I have a Hive table with two rows like this:
0: jdbc:hive2://localhost:10000/default> select * from t2;
+-----+--------+
| id | value |
+-----+--------+
| 10 | 100 |
| 11 | 101 |
+-----+--------+
2 rows selected (1.116 seconds)
but when I issue this query:
select cast(1 as timestamp) from t2;
it gives inconsistent results. Can anyone tell me the reason?
0: jdbc:hive2://localhost:10000/default> select cast(1 as timestamp) from t2;
+--------------------------+
| _c0 |
+--------------------------+
| 1970-01-01 07:00:00.001 |
| 1970-01-01 07:00:00.001 |
+--------------------------+
2 rows selected (0.913 seconds)
0: jdbc:hive2://localhost:10000/default> select cast(1 as timestamp) from t2;
+--------------------------+
| _c0 |
+--------------------------+
| 1970-01-01 08:00:00.001 |
| 1970-01-01 07:00:00.001 |
+--------------------------+
2 rows selected (1.637 seconds)
I can't reproduce your problem. Which Hive version are you using? Hive had a bug with casting between timestamp and bigint (see https://issues.apache.org/jira/browse/HIVE-3454), but it doesn't explain your problem. For example, Hive 0.14 gives different results for:
SELECT cast(1 as timestamp), cast(cast(1 as double) as timestamp) FROM my_table LIMIT 5;
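One plausible reading of the output in the question is an assumption, not something the answer above confirms. In that Hive version, cast(1 as timestamp) treats the integer as milliseconds since the epoch, and the result is displayed in the local time zone of the JVM that evaluates it. Then 1970-01-01 07:00:00.001 is that instant shown in UTC+7, and 1970-01-01 08:00:00.001 is the same instant shown in UTC+8. Tasks running with different time-zone settings would produce exactly this mix. The Python sketch below demonstrates only the display part; render is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
instant = EPOCH + timedelta(milliseconds=1)   # cast(1 as timestamp) read as 1 ms


def render(ts: datetime, offset_hours: int) -> str:
    # Show the same instant as wall-clock time at a fixed UTC offset.
    local = ts.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M:%S.") + f"{local.microsecond // 1000:03d}"


print(render(instant, 7))   # 1970-01-01 07:00:00.001
print(render(instant, 8))   # 1970-01-01 08:00:00.001
```

If this is the cause, comparing the time-zone settings (for example the JVM user.timezone) across the nodes would be a reasonable next check.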

SQL query to get the difference between dates in a SELECT statement

I have two tables called RFS and RFS_History.
RFS_id | name
--------+--------
12 | xx
14 | yy
15 | zz
Figure 1: RFS table
RFS_id | gate | End | start
--------+-------+--------+-------
12 | aa | 19/02 | 20/03
12 | bb | 30/01 | 12/08
12 | cc | 30/01 | 12/08
13 | aa | 30/01 | 12/08
12 | dd | 30/01 | 12/08
Figure 2: RFS_History table
My initial query is a SELECT * query to get the rows where the RFS name is 'xx':
SELECT * FROM RFS, RFS_History
WHERE RFS.name = 'xx' AND RFS_History.RFS_id = RFS.RFS_id
The result is:
RFS_id | gate | End | start
--------+-------+--------+-------
12 | aa | 19/02 | 19/01
12 | bb | 12/04 | 12/02
12 | cc | 20/03 | 12/03
12 | dd | 30/09 | 12/08
Figure 3
However, I want a result in the format below:
RFS_id | gate_aa | gate_bb | gate_cc | gate_dd
----------------------------------------------
12     | 30 days | 60 days | 8 days  | 18 days
gate_aa is the duration, calculated from the start and end dates. Please help me write a single query that produces this result.
Use DATEDIFF() to get the date difference and PIVOT to convert rows into columns,
which in your case means one column per gate.
Sample syntax:
SELECT DATEDIFF(day,'2008-06-05','2008-08-05') AS DiffDate
You can use the query below to get the difference between dates:
SELECT RFS.ID, (RFS_HISTORY.end_t - RFS_HISTORY.start_t) AS DiffDate, gate FROM RFS, RFS_HISTORY
WHERE name = 'xx' AND RFS_HISTORY.ID = RFS.ID GROUP BY RFS.ID, gate, RFS_HISTORY.end_t, RFS_HISTORY.start_t
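I think you want to convert rows into columns based on the gate values. This can be done with pivoting. In SQL Server the syntax looks roughly like this (assuming start and End are date columns):
SELECT RFS_id, [aa] AS gate_aa, [bb] AS gate_bb, [cc] AS gate_cc, [dd] AS gate_dd
FROM (SELECT h.RFS_id, h.gate, DATEDIFF(day, h.start, h.[End]) AS duration
      FROM RFS r JOIN RFS_History h ON h.RFS_id = r.RFS_id WHERE r.name = 'xx') AS src
PIVOT (MAX(duration) FOR gate IN ([aa], [bb], [cc], [dd])) AS p;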
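The pivot step can also be illustrated outside SQL. The Python sketch below takes the rows from figure 3 and computes End minus start in whole days for each gate, as DATEDIFF(day, start, End) would for dates. It then spreads the results into one gate_<name> column per RFS_id. The figures give only day and month, so the sketch assumes a hypothetical year (2017) to build real dates, and the day counts depend on that assumption.

```python
from collections import defaultdict
from datetime import date

ASSUMED_YEAR = 2017  # hypothetical: the figures show only dd/mm

# (RFS_id, gate, End, start) rows as in figure 3, dates as "dd/mm".
history = [
    (12, "aa", "19/02", "19/01"),
    (12, "bb", "12/04", "12/02"),
    (12, "cc", "20/03", "12/03"),
    (12, "dd", "30/09", "12/08"),
]


def to_date(dd_mm: str) -> date:
    day, month = (int(part) for part in dd_mm.split("/"))
    return date(ASSUMED_YEAR, month, day)


# Pivot: one entry per RFS_id, one "gate_<gate>" key per gate value.
pivoted = defaultdict(dict)
for rfs_id, gate, end, start in history:
    # Whole days from start to End, like DATEDIFF(day, start, End).
    pivoted[rfs_id][f"gate_{gate}"] = (to_date(end) - to_date(start)).days

for rfs_id, gates in pivoted.items():
    print(rfs_id, gates)
```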