TimeStamp issue in hive 1.1 - hive

I am facing a very weird issue with Hive in a production environment (Cloudera 5.5) which is basically not reproducible on my local server (I don't know why): for some records I get a wrong timestamp value when inserting from the temp table into the main table. The string "2017-10-21 23" should be converted into the timestamp "2017-10-21 23:00:00" during insertion, but for those records it ends up shifted back by one hour.
example::
2017-10-21 23 -> 2017-10-21 22:00:00
2017-10-22 15 -> 2017-10-22 14:00:00
It happens very infrequently; the affected records are about 1% of the data.
Flow::: Data in the temp table (external table) is populated hourly using Oozie. The insert statement below is executed hourly by an Oozie workflow to insert from the temp table into the main table (an internal table stored as ORC).
Flow summary:::
Linux logs >> copy logs into temp table (external Hive table) >> insert into main Hive table.
Insert from temp table to main table:::
FROM
temp
INSERT INTO TABLE
main
PARTITION(event_date,service_id)
SELECT
from_unixtime(unix_timestamp(event_timestamp ,'yyyy-MM-dd HH'), 'yyyy-MM-dd HH:00:00.0'),
col3,
col4,
"" as partner_nw_name,
col5,
"" as country_name,
col6,
col7,
col8,
col9,
col10,
col11,
col12,
col13,
col14,
col15,
kpi_id,
col18,
col19,
col20,
col21,
col23,
col24,
col25,
from_unixtime(unix_timestamp(event_timestamp ,'yyyy-MM-dd HH'), 'yyyy-MM-dd') as event_date,
service_id;
Temp Table:::
hive> desc temp;
OK
event_timestamp string
col2 int
col3 int
col4 int
col5 int
col6 string
col7 string
col8 string
col9 string
col10 string
col11 int
col12 int
col13 string
col14 string
col15 string
service_id int
kpi_id int
col18 bigint
col19 bigint
col20 bigint
col21 bigint
col22 double
col23 string
col24 int
col25 int
Time taken: 0.165 seconds, Fetched: 25 row(s)
Main Table:::
hive> desc main;
OK
event_timestamp timestamp
col3 int
col4 int
partner_nw_name string
col5 int
country_name string
col6 string
col7 string
col8 string
col9 string
col10 int
col11 int
col12 int
col13 string
col14 string
col15 string
kpi_id int
col18 bigint
col19 bigint
col20 bigint
col21 bigint
col23 double
col24 int
col25 int
event_date date
service_id int
# Partition Information
# col_name data_type comment
event_date date
service_id int
Time taken: 0.175 seconds, Fetched: 32 row(s)

It seems like you are adding an extra 00 for the hours place too.
try this:
select from_unixtime(unix_timestamp('2017-10-21 23','yyyy-MM-dd HH'),'yyyy-MM-dd HH:00:0');
the above query gives:
2017-10-21 23:00:0
Is this what you are expecting?
You can add 'yyyy-MM-dd HH:00:00.0' if needed.
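For example, with the full format from the question (this only reformats the parsed value; it does not touch timezones):
select from_unixtime(unix_timestamp('2017-10-21 23','yyyy-MM-dd HH'),'yyyy-MM-dd HH:00:00.0');
-- should return 2017-10-21 23:00:00.0, since both functions use the server's local timezone and the round trip cancels out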

If you are writing your data in Parquet format using Hive, then Hive adjusts the timestamp by the local timezone offset. For more information, please go through the links below.
There is a related Jira ticket for Impala, IMPALA-2716.
See also the Cloudera documentation on the Impala TIMESTAMP data type.
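To see what such a local-timezone adjustment looks like, here is a minimal sketch; 'Europe/London' is only a placeholder for a zone that is UTC+1 on that date, so substitute your cluster's timezone:
select to_utc_timestamp('2017-10-21 23:00:00', 'Europe/London');
-- returns 2017-10-21 22:00:00, the same one-hour shift as in the question's examples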

Related

Insert different values every time when using insert into

I have a table which has 3 columns and 6 records. I want to insert the values into another table which has an additional column, sysid. When I use the INSERT INTO clause it inserts the records, but the sysid value is the same for every row. I want a different value every time a record is inserted, but I also want to insert in bulk.
Use a sequence:
CREATE SEQUENCE table_name__sysid__seq;
Then:
INSERT INTO table_name (sysid, col1, col2, col3, col4, col5, col6)
VALUES (table_name__sysid__seq.NEXTVAL, 'value1', 'value2', 'value3', 'value4', 'value5', 'value6');
or, for multiple rows:
INSERT INTO table_name (sysid, col1, col2, col3, col4, col5, col6)
SELECT table_name__sysid__seq.NEXTVAL, col1, col2, col3, col4, col5, col6
FROM first_table;
Or else, if you are using Oracle 12 or later, define sysid as an IDENTITY column:
CREATE TABLE table_name (
sysid NUMBER
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY,
col1 VARCHAR2(20),
col2 VARCHAR2(20),
col3 VARCHAR2(20),
col4 VARCHAR2(20),
col5 VARCHAR2(20),
col6 VARCHAR2(20)
);
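With the identity column in place, a bulk insert simply omits sysid and Oracle generates it for each row (a sketch reusing first_table from above):
INSERT INTO table_name (col1, col2, col3, col4, col5, col6)
SELECT col1, col2, col3, col4, col5, col6
FROM first_table;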

Insert extra columns with NULL values from one table to another table HIVE

Table1 has columns col1, col2, col3, col4, col5.
Table2 has columns col1, col3, col5.
I want to insert rows from Table2 into Table1, but col2 and col4 should be NULL after inserting into Table1.
How can I do it in Hive? Currently I am using Hortonworks 3.1.
You just use insert ... select:
insert into table1 (col1, col2, col3, col4, col5)
select col1, null, col3, null, col5
from table2;
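If your Hive version complains about the void type of a bare NULL for a typed column, an explicit cast is a common workaround (a sketch; the string and int types here are only assumptions about table1's col2 and col4):
insert into table1 (col1, col2, col3, col4, col5)
select col1, cast(null as string), col3, cast(null as int), col5
from table2;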

Is there a way to return select query results with a few columns' values as NULL (other than using REPLACE)?

I want to export a table in DB2 in which I want to return a few column values as NULL due to some restrictions.
I am looking for a better alternative to prepare the select query for the export.
I can achieve it with the select query below, but the query is very long considering the table has many columns.
SELECT
COL1
,COL2
,COL3
,COL4
,REPLACE(COL5,NULL) AS COL5
,REPLACE(COL6,NULL) AS COL6
,COL7
,COL8
,COL9
,COL10
,COL11
,REPLACE(COL12,NULL) AS COL12
,COL13
,COL14
,COL15
,COL16
,COL17
,COL18
,REPLACE(COL19,NULL) AS COL19
,COL20
FROM
TABLE1
Are there any better alternatives?
Use the following way to set a column's value to NULL:
SELECT
COL1
,COL2
,COL3
,COL4
,NULL AS COL5
,NULL AS COL6
FROM TABLE1
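If your DB2 version rejects an untyped NULL in the select list, the same idea works with an explicit CAST (a sketch; VARCHAR(50) is only an assumed type for those columns):
SELECT
COL1
,COL2
,COL3
,COL4
,CAST(NULL AS VARCHAR(50)) AS COL5
,CAST(NULL AS VARCHAR(50)) AS COL6
FROM TABLE1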

In SQL, how to pick columns which don't have a 0 value?

I have a pivot query which returns one row. I have to filter that row further. The row has hour-based columns, so I have 24 columns, one per hour.
How do I pick only the columns which have non-zero values?
Let's say we have 5 columns:
Col1 Col2 Col3 Col4 Col5
100 0 0 20 0
Col1 and Col4 are eligible. I need to total these two columns:
Col1 Col4 Total
100 20 120
create table #t
(
Col0 int,
Col1 int,
Col2 int,
Col3 int,
Col4 int,
Col5 int,
Col6 int,
Col7 int,
Col8 int,
Col9 int,
Col10 int,
Col11 int
)
insert into #t values
(0,100,0,0,0,0,20,10,0,0,0,0)
select * from #t
-- Expected Result
select Col1, Col6, Col7 from #t
You cannot do that with a static pivot query; dynamic SQL, perhaps. When a pivot query is defined, it is not known whether all the values are zero or not, and by the time the data is piped in it is too late. You can either build a reporting project that supports omitting empty column groups, or look for a more dynamic query alternative.
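For the single-row case in the question, one dynamic-SQL sketch (assuming SQL Server 2017+ for STRING_AGG; on older versions you would concatenate with FOR XML PATH instead):
DECLARE @cols nvarchar(max), @sql nvarchar(max);
-- unpivot the single row and keep only the non-zero columns
SELECT @cols = STRING_AGG(QUOTENAME(v.col), ', ')
FROM #t
CROSS APPLY (VALUES ('Col0', Col0), ('Col1', Col1), ('Col2', Col2),
                    ('Col3', Col3), ('Col4', Col4), ('Col5', Col5),
                    ('Col6', Col6), ('Col7', Col7), ('Col8', Col8),
                    ('Col9', Col9), ('Col10', Col10), ('Col11', Col11)
            ) AS v(col, val)
WHERE v.val <> 0;
-- select those columns plus their total from the same row
SET @sql = N'SELECT ' + @cols + N', '
         + REPLACE(@cols, ', ', ' + ') + N' AS Total FROM #t;';
EXEC sp_executesql @sql;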

Separate a string into specific temp table columns

I am using a RIGHT(LEFT()) method of stripping a string, as each character needs to be put into its own holder so I can access it and use it for a report (each character needs to be in its own box for some reason).
There are usually 16 characters, but to save space and repetition I've slimmed down the code.
What I am trying to do is put each separated character value into the corresponding column of the temp table. How is this best achieved?
I have no other use for this data; once it has been used I'll destroy it.
Code
CREATE table #StringSeparate
(
col1 varchar(1),
col2 varchar(1),
col3 varchar(1),
col4 varchar(1),
col5 varchar(1),
col6 varchar(1),
col7 varchar(1),
col8 varchar(1)
)
declare #string varchar(16)
set #string = 'tpg22052015-1204'
SELECT
LEFT(#string,1),
RIGHT(LEFT(#string,2),1),
RIGHT(LEFT(#string,3),1),
RIGHT(LEFT(#string,4),1),
RIGHT(LEFT(#string,5),1),
RIGHT(LEFT(#string,6),1),
RIGHT(LEFT(#string,7),1),
RIGHT(LEFT(#string,8),1)
INTO
#StringSeparate
Just do it like:
CREATE table #StringSeparate
(
col1 varchar(1),
col2 varchar(1),
col3 varchar(1),
col4 varchar(1),
col5 varchar(1),
col6 varchar(1),
col7 varchar(1),
col8 varchar(1)
)
INSERT INTO #StringSeparate
SELECT
LEFT(#string,1),
RIGHT(LEFT(#string,2),1),
RIGHT(LEFT(#string,3),1),
RIGHT(LEFT(#string,4),1),
RIGHT(LEFT(#string,5),1),
RIGHT(LEFT(#string,6),1),
RIGHT(LEFT(#string,7),1),
RIGHT(LEFT(#string,8),1)
Or don't create the temp table and do this:
SELECT
LEFT(#string,1) col1,
RIGHT(LEFT(#string,2),1) col2,
RIGHT(LEFT(#string,3),1) col3,
RIGHT(LEFT(#string,4),1) col4,
RIGHT(LEFT(#string,5),1) col5,
RIGHT(LEFT(#string,6),1) col6,
RIGHT(LEFT(#string,7),1) col7,
RIGHT(LEFT(#string,8),1) col8
INTO
#StringSeparate
SELECT ... INTO creates the temp table automatically, so you don't need the CREATE TABLE statement.
Depending on your RDBMS, I might prefer SUBSTRING:
INSERT INTO #StringSeparate
SELECT
LEFT(#string,1),
SUBSTRING(#string,2,1),
SUBSTRING(#string,3,1),
...
RIGHT(#string,1)
I made one big INSERT ... VALUES out of your statement:
INSERT INTO #StringSeparate
VALUES
((LEFT(#string,1)),
(RIGHT(LEFT(#string,2),1)),
(RIGHT(LEFT(#string,3),1)),
(RIGHT(LEFT(#string,4),1)),
(RIGHT(LEFT(#string,5),1)),
(RIGHT(LEFT(#string,6),1)),
(RIGHT(LEFT(#string,7),1)),
(RIGHT(LEFT(#string,8),1)))