Hive: Divide a single row into multiple rows

I need to write a query that transforms the source table into the target table. The table structures are shown in the image. Can anyone help with this?
[Tables image: http://i.stack.imgur.com/wnUuZ.png]

Hive's stack function should work here.
SELECT stack(2,
col1, col2, col3, '',
col1, col2, '', col4
) AS (newCol1, newCol2, newCol3, newCol4)
FROM source;
Basically, stack(N, ...) generates N rows for each row in the source. The arguments after N are the values of those new rows, listed row by row.
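To make the row-splitting behavior concrete, here is a minimal Python sketch of what stack does to one source row. The `stack` helper below is a hypothetical emulation, not Hive code, and the sample values `a`..`d` are made up. For simplicity it assumes the number of values is an exact multiple of N and rejects anything else.

```python
from typing import Any, List, Tuple


def stack(n: int, *values: Any) -> List[Tuple[Any, ...]]:
    """Split a flat list of values into n rows of equal width.

    Hypothetical emulation of the idea behind Hive's stack(n, v1, ..., vk).
    Assumption: len(values) must be an exact multiple of n.
    """
    if n <= 0 or len(values) % n != 0:
        raise ValueError("number of values must be a positive multiple of n")
    width = len(values) // n
    return [tuple(values[i * width:(i + 1) * width]) for i in range(n)]


# Made-up source row with columns col1..col4.
source_row = {"col1": "a", "col2": "b", "col3": "c", "col4": "d"}

# Same argument layout as the Hive query: two output rows of four columns each.
rows = stack(
    2,
    source_row["col1"], source_row["col2"], source_row["col3"], "",
    source_row["col1"], source_row["col2"], "", source_row["col4"],
)

for row in rows:
    print(row)
```

Each source row becomes two rows: the first keeps col3 and blanks col4, the second blanks col3 and keeps col4.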

Related

Best way to compare three columns in sql Hive

I need to compare 3 columns containing string dates in 'yyyy-mm-dd' format, in Hive SQL. Please take into consideration that the table has more than 2 million records.
Given three columns (col1, col2, col3) from table T1, I must guarantee that:
col1 = col2, and both (or at least one of them) differ from col3.
My best regards,

Logically, the condition is redundant.
col1 = col2
Therefore, if col1 != col3, then col2 != col3 as well.
So it is enough to use:
select * from T1 where col1 = col2 and col1 != col3;
This filter can be evaluated map-side, so a plain WHERE clause is likely good enough.
If you wanted to say that 2 out of the 3 need to match, you could use GROUP BY with HAVING to reduce the number of comparisons.
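As a quick check of the simplified predicate, the sketch below runs the same WHERE clause against a small in-memory table. It uses Python's sqlite3 as a stand-in for Hive (an assumption made only so the example is runnable), and the three sample rows are made up.

```python
import sqlite3

# SQLite stands in for Hive here; the WHERE clause is the one from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?)",
    [
        ("2020-01-01", "2020-01-01", "2020-01-02"),  # col1 = col2, col3 differs: kept
        ("2020-01-01", "2020-01-01", "2020-01-01"),  # all three equal: filtered out
        ("2020-01-01", "2020-01-03", "2020-01-02"),  # col1 != col2: filtered out
    ],
)

matches = conn.execute(
    "SELECT * FROM T1 WHERE col1 = col2 AND col1 != col3"
).fetchall()
print(matches)
```

Only the first row survives, which matches the requirement that col1 and col2 agree while col3 differs.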

Hive: how to derive the column names and use them in the same query?

I am trying to run the query below:
select [every_column],count(*) from <table> group by [every_column] having count(*) >1
But the column names should be derived in the same query. I believe show columns in <table> would list the column names separated by newlines, but I need to use them in a single query to retrieve the result.
I would appreciate any help in this regard.

You can use sed in the shell to find the newlines (\n) and replace them with commas (,).
Assign the comma-separated column names to a Hive variable, then use that variable in your Hive query.
References for sed and set hive variables
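The join step can also be sketched in Python instead of sed. This is a swapped-in illustration, not the shell approach described above: `build_duplicate_query` is a hypothetical helper, and `sample_output` and `my_table` are made-up stand-ins for the newline-separated output of show columns and the real table name.

```python
def build_duplicate_query(show_columns_output: str, table: str) -> str:
    """Turn newline-separated column names into the duplicate-finding query.

    Hypothetical helper: it only builds the query text and does not talk to Hive.
    """
    # One column name per line; ignore blank lines and surrounding whitespace.
    columns = [line.strip() for line in show_columns_output.splitlines() if line.strip()]
    if not columns:
        raise ValueError("no column names found")
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list}, count(*) FROM {table} "
        f"GROUP BY {column_list} HAVING count(*) > 1"
    )


# Made-up example of what the column listing might look like.
sample_output = "id\nname\ncity\n"
query = build_duplicate_query(sample_output, "my_table")
print(query)
```

The resulting string could then be passed to Hive, for example as the value of a Hive variable as suggested above.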

Have you thought of using subqueries or even a CTE? Maybe this helps you find your answer (the subquery alias is t rather than outer, because OUTER is a reserved keyword):
select t.col1,
       t.col2,
       t.col3,
       t.col4,
       t.col5,
       t.col6,
       count(*) as cnt
from (
    select <some logic> as col1,
           <some logic> as col2,
           <some logic> as col3,
           <some logic> as col4,
           <some logic> as col5,
           <some logic> as col6
    from innerTable
) t
group by t.col1,
         t.col2,
         t.col3,
         t.col4,
         t.col5,
         t.col6

More efficient to SELECT constants vs using VALUES when INSERTing a row with constant values?

Which of the following queries is the more efficient and the more idiomatic way of inserting a row with some constant values?
INSERT INTO example_table (col1, col2, col3)
SELECT 123, other_col, 'value' FROM other_table WHERE some_id = 999;
or
INSERT INTO example_table (col1, col2, col3)
VALUES (123,
(SELECT other_col FROM other_table WHERE some_id = 999),
'value');

They have different semantics if there is not exactly one row matching the following query:
SELECT other_col FROM other_table WHERE some_id = 999
So choose the one that gives you the semantics you want.
If the above query returns 0 rows, do you want (a) no rows to be inserted or (b) a row with NULL?
If the above query returns more than one row, do you want (a) that number of rows to be inserted or (b) a runtime error?
If you answered (a) to both questions, choose the first statement. If you answered (b), choose the second.
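The difference can be observed with a small experiment. The sketch below uses Python's sqlite3 as a stand-in database (an assumption; the question does not name an engine) and made-up rows. One caveat: SQLite does not raise an error when a scalar subquery returns several rows, it silently uses the first one, so case (b) for multiple rows is not reproduced here and is left out.

```python
import sqlite3


def fresh_db(other_rows):
    """Create an in-memory database with the two tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE other_table (some_id INTEGER, other_col TEXT)")
    conn.execute("CREATE TABLE example_table (col1 INTEGER, col2 TEXT, col3 TEXT)")
    conn.executemany("INSERT INTO other_table VALUES (?, ?)", other_rows)
    return conn


INSERT_SELECT = (
    "INSERT INTO example_table (col1, col2, col3) "
    "SELECT 123, other_col, 'value' FROM other_table WHERE some_id = 999"
)
INSERT_VALUES = (
    "INSERT INTO example_table (col1, col2, col3) "
    "VALUES (123, (SELECT other_col FROM other_table WHERE some_id = 999), 'value')"
)


def run(statement, other_rows):
    """Run one INSERT against a fresh database and return what was inserted."""
    conn = fresh_db(other_rows)
    conn.execute(statement)
    return conn.execute("SELECT col1, col2, col3 FROM example_table").fetchall()


no_match = [(1, "x")]                   # no row with some_id = 999
two_matches = [(999, "x"), (999, "y")]  # two rows with some_id = 999

select_no_match = run(INSERT_SELECT, no_match)        # nothing inserted
values_no_match = run(INSERT_VALUES, no_match)        # one row, col2 is NULL
select_two_matches = run(INSERT_SELECT, two_matches)  # one row per match

print(select_no_match, values_no_match, select_two_matches)
```

With no matching row, the SELECT form inserts nothing while the VALUES form inserts a row whose col2 is NULL. With two matching rows, the SELECT form inserts two rows.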

Insert Statement for List

I'm not too sure how to describe my SQL INSERT statement, so I will describe the expected result.
I'm building a data extract list and have a table that I've put all my data into. It's called _MATTER_LIST.
What I am trying to achieve is to have the CLIENT_NUMBER + COL1 combination repeat after every unique COL1 + COL2 + COL3 combination, but not duplicate when there is already a CLIENT_NUMBER + COL1. So the end result would be:
Thanks in advance for any tips.

A simple ORDER BY should work for you, if I understand correctly. Try this:
select Client_Number, Col1, Col2, Col3 from _MATTER_LIST
order by Client_Number, Col1

I've managed to fix my own issue. I added a unique key for the col1 + col2 + col3 combination, then made col2 repeat over each combination.
The result is: select * from _MATTER_LIST order by COL4, COL5

Oracle: How to insert data from a slow select query into two tables

I have searched but couldn't find an answer for this (maybe I used the wrong keywords...).
I ran into this problem today when I needed to create a procedure that calculates data and saves it to 2 report tables in 2 different schemas. Let's say those two tables have the same structure.
The query that calculates the data may take more than 60 seconds (and the underlying data may or may not change the result of the SELECT statement if it is run again).
I have two ways to insert the data into those two tables:
1. Just run the INSERT twice with the same SELECT query.
2. Use a GTT (global temporary table) to store the calculated data from the SELECT query, then INSERT into those two tables using the data in that GTT.
I wonder whether Oracle caches the result of the SELECT query, so that the first way would perform better than the second (at the cost of longer, duplicated code that may get out of sync).
Could anyone confirm and explain the right way to solve this? Or is there a better way of doing it?
Thank you,
Appendix 1:
INSERT INTO report_table (col1, col2, ....)
SELECT .....
FROM .....
--(long query)
;
INSERT INTO center_schema.report_table (col1, col2, ....)
SELECT .....
FROM .....
--same select query as above
;
And 2:
INSERT INTO temp_report_table(col1, col2, ...)
SELECT .....
FROM .....
--(long query)
;
INSERT INTO report_table (col1, col2, ....)
SELECT col1, col2, ....
FROM temp_report_table
;
INSERT INTO center_schema.report_table (col1, col2, ....)
SELECT col1, col2, ....
FROM temp_report_table
;
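As an illustration of the second variant (stage the result once, then copy it twice), here is a small runnable sketch. It uses Python's sqlite3 with a temporary table and an attached in-memory database as stand-ins for the Oracle GTT and center_schema, which is an assumption made for the sake of a self-contained example. The `source_data` table and the `slow_calc` function are hypothetical; the function counts its calls so you can see that the expensive part runs only during staging.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the second schema (assumption: an attached in-memory database).
conn.execute("ATTACH DATABASE ':memory:' AS center_schema")

calls = {"count": 0}


def slow_calc(amount):
    """Hypothetical expensive calculation; counts how often it is evaluated."""
    calls["count"] += 1
    return amount * 2


conn.create_function("slow_calc", 1, slow_calc)

conn.execute("CREATE TABLE source_data (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO source_data VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
conn.execute("CREATE TABLE main.report_table (col1 INTEGER, col2 INTEGER)")
conn.execute("CREATE TABLE center_schema.report_table (col1 INTEGER, col2 INTEGER)")
conn.execute("CREATE TEMP TABLE temp_report_table (col1 INTEGER, col2 INTEGER)")

# Step 1: run the expensive query once and stage its result.
conn.execute(
    "INSERT INTO temp_report_table (col1, col2) "
    "SELECT id, slow_calc(amount) FROM source_data"
)

# Step 2: copy the staged rows into both report tables.
conn.execute(
    "INSERT INTO main.report_table (col1, col2) "
    "SELECT col1, col2 FROM temp_report_table"
)
conn.execute(
    "INSERT INTO center_schema.report_table (col1, col2) "
    "SELECT col1, col2 FROM temp_report_table"
)

local_rows = conn.execute(
    "SELECT col1, col2 FROM main.report_table ORDER BY col1"
).fetchall()
center_rows = conn.execute(
    "SELECT col1, col2 FROM center_schema.report_table ORDER BY col1"
).fetchall()
print(calls["count"], local_rows, center_rows)
```

Besides avoiding the second run of the slow query, staging means both tables receive the same rows even if the source data changes between the two inserts.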

You have a third option: the multitable insert.
INSERT ALL
    INTO report_table (col1, col2, ....)
        VALUES (col1, col2, ...)
    INTO center_schema.report_table (col1, col2, ...)
        VALUES (col1, col2, ...)
SELECT col1, col2, ...
FROM your_table
--(long query)
;
The VALUES clauses refer to the columns of the SELECT list by name, so the long query runs only once and feeds both tables. For details on loading multiple tables at once this way, refer to the multitable insert section of the Oracle documentation.
