How to efficiently unpivot MULTIPLE columns in Hive?


My data is structured like the table below:
| Name | Foo_A | Foo_B | Foo_C | Bar_A | Bar_B | Bar_C |
--------------------------------------------------------
| abcd | 16 | 32 | 14 | 52 | 41 | 17 |
| ... | ... | ... | ... | ... | ... | ... |
I want to query the data in Hive so that the result looks like this:
| Name | Class | FooVal | BarVal |
----------------------------------
| abcd | A | 16 | 52 |
| abcd | B | 32 | 41 |
| abcd | C | 14 | 17 |
| ... | ... | ... | ... |
I am aware of UNION ALL and am already using it, but what would be a more efficient way of doing this, for example with "LATERAL VIEW explode" on a map data type?

A CROSS JOIN with a three-row class stack (see the code example) multiplies each row of the main table by three, one row per class; CASE expressions then derive each output column from the class value. Because the joined dataset is tiny (3 rows), the CROSS JOIN should be converted to a map join and execute very fast on the mappers.
set hive.auto.convert.join=true; --this enables map-join
select t.Name,
       s.class,
       case s.class when 'A' then t.Foo_A
                    when 'B' then t.Foo_B
                    when 'C' then t.Foo_C
       end as FooVal,
       case s.class when 'A' then t.Bar_A
                    when 'B' then t.Bar_B
                    when 'C' then t.Bar_C
       end as BarVal
from table t
cross join (select stack(3,'A','B','C') as class) s
;
It scans the table only once and performs much better than the UNION ALL approach.
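
To check the CROSS JOIN plus CASE logic without a cluster, the same query shape can be run in SQLite from Python. This is only a sketch under stated assumptions: SQLite has no stack(), so the three-row class list is built with UNION ALL, and the table name `wide` is a hypothetical stand-in for the real table. It says nothing about Hive performance or map-join behavior.

```python
import sqlite3

# In-memory SQLite stands in for Hive; `wide` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wide (Name TEXT, Foo_A INT, Foo_B INT, Foo_C INT,"
    " Bar_A INT, Bar_B INT, Bar_C INT)"
)
conn.execute("INSERT INTO wide VALUES ('abcd', 16, 32, 14, 52, 41, 17)")

# Assumption: UNION ALL replaces Hive's stack(3,'A','B','C').
unpivot_sql = """
SELECT t.Name,
       s.class,
       CASE s.class WHEN 'A' THEN t.Foo_A
                    WHEN 'B' THEN t.Foo_B
                    WHEN 'C' THEN t.Foo_C END AS FooVal,
       CASE s.class WHEN 'A' THEN t.Bar_A
                    WHEN 'B' THEN t.Bar_B
                    WHEN 'C' THEN t.Bar_C END AS BarVal
FROM wide t
CROSS JOIN (SELECT 'A' AS class UNION ALL SELECT 'B' UNION ALL SELECT 'C') s
ORDER BY t.Name, s.class
"""

# One input row becomes three output rows, one per class.
unpivoted = conn.execute(unpivot_sql).fetchall()
for row in unpivoted:
    print(row)
```

Each wide row is read once and emitted three times, which is the property the answer relies on.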

Thanks for the reply! Below is another way of doing it, which is faster than the CROSS JOIN:
select t1.Name, t2.key_1 as class, t2.FooVal, t3.BarVal
from table t1
LATERAL VIEW explode (map(
    'A', Foo_A,
    'B', Foo_B,
    'C', Foo_C
)) t2 as key_1, FooVal
LATERAL VIEW explode (map(
    'A', Bar_A,
    'B', Bar_B,
    'C', Bar_C
)) t3 as key_2, BarVal
where t2.key_1 = t3.key_2;

Hive unpivot multiple columns:
select
    t1.Name,
    lv.key as class,
    lv.FooStr.col1 as FooVal,
    lv.FooStr.col2 as BarVal
from
    table t1
    LATERAL VIEW explode (
        map(
            'A', named_struct('col1', Foo_A, 'col2', Bar_A),
            'B', named_struct('col1', Foo_B, 'col2', Bar_B),
            'C', named_struct('col1', Foo_C, 'col2', Bar_C)
        )) lv as key, FooStr
where
    coalesce(lv.FooStr.col1, lv.FooStr.col2) IS NOT NULL;
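
To make the row-level effect of exploding a map of structs concrete, here is a small plain-Python emulation. It is not Hive code: the function name `unpivot_row` and the dict-based row are hypothetical, and the sketch only mirrors the logic of the query above (one map entry per class, one output row per entry, rows dropped when both values are NULL).

```python
from typing import Dict, Iterator, Optional, Tuple

Row = Dict[str, Optional[int]]
OutRow = Tuple[str, str, Optional[int], Optional[int]]


def unpivot_row(name: str, row: Row) -> Iterator[OutRow]:
    """Emulate LATERAL VIEW explode(map(class, named_struct(...)))."""
    # Counterpart of map('A', named_struct(...), 'B', ..., 'C', ...)
    class_map = {
        cls: (row.get(f"Foo_{cls}"), row.get(f"Bar_{cls}"))
        for cls in ("A", "B", "C")
    }
    # explode(): one output row per map entry
    for cls, (foo_val, bar_val) in class_map.items():
        # Mirrors: coalesce(col1, col2) IS NOT NULL
        if foo_val is not None or bar_val is not None:
            yield (name, cls, foo_val, bar_val)


sample = {"Foo_A": 16, "Foo_B": 32, "Foo_C": 14,
          "Bar_A": 52, "Bar_B": 41, "Bar_C": 17}
exploded = list(unpivot_row("abcd", sample))
for out in exploded:
    print(out)
```

Because Foo and Bar travel together in one struct, a single explode is enough and no key-matching filter between two lateral views is needed.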

Related

Need to match a field with another that has commas in its value

I would like to match the values of the field "FORMULATION" from Table 1 to "C_TEST_ARTICLE" from Table 2, which holds multiple formulations separated by commas.
Table 1:
+---------------+--------------------+
| SAMPLE_NUMBER | FORMULATION        |
+---------------+--------------------+
| 84778         | S/200582/01-TA-002 |
| 84777         | S/200582/01-TA-002 |
| 81691         | S/200451/01-TA-011 |
| 81690         | S/200451/01-TA-011 |
+---------------+--------------------+
Table 2:
+----------------------------------------------------------+-----------------+
| C_TEST_ARTICLE                                           | C_REPORT_NUMBER |
+----------------------------------------------------------+-----------------+
| S/200180/03-TA-001,S/200180/03-TA-002                    | 16698           |
| S/200375/01-TA-001,S/200375/01-TA-002,S/200375/01-TA-003 | 15031           |
+----------------------------------------------------------+-----------------+
What I want from all of this: each "C_TEST_ARTICLE" has a "C_REPORT_NUMBER", so I would like to get all the matching "SAMPLE_NUMBER" values from Table 1. That way I would have the samples related to each report number.

You could try using LIKE:
select SAMPLE_NUMBER
from table1
INNER JOIN table2 ON c_test_article like concat('%', formulation, '%')

select
    C_TEST_ARTICLE
    ,C_REPORT_NUMBER
    ,b1.SAMPLE_NUMBER
from table2
INNER JOIN table1 as b1 on C_TEST_ARTICLE like '%'+FORMULATION+'%'

Try:
SELECT
    T1.SAMPLE_NUMBER
    , T2.C_REPORT_NUMBER
FROM Table1 T1
    , Table2 T2
WHERE CHARINDEX(T1.FORMULATION, T2.C_TEST_ARTICLE) > 0
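
One caveat with the substring approaches above (LIKE or CHARINDEX): a formulation that is merely a prefix of a list element also matches. A possible refinement, sketched here in SQLite via Python, wraps both sides in commas so that only whole list elements match. The sample numbers 90001 and 90002 and the truncated formulation are hypothetical data chosen to show the difference; the `||` concatenation is SQLite syntax and would need translating for other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (sample_number INT, formulation TEXT)")
conn.execute("CREATE TABLE table2 (c_test_article TEXT, c_report_number INT)")

# Hypothetical rows: 90002 holds only a prefix of a real list element.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(90001, "S/200180/03-TA-002"), (90002, "S/200180/03-TA-00")],
)
conn.execute(
    "INSERT INTO table2 VALUES ('S/200180/03-TA-001,S/200180/03-TA-002', 16698)"
)

# Plain substring match: also accepts the prefix-only formulation.
plain_sql = """
SELECT t1.sample_number, t2.c_report_number
FROM table1 t1
JOIN table2 t2 ON t2.c_test_article LIKE ('%' || t1.formulation || '%')
ORDER BY t1.sample_number
"""

# Delimiter-aware match: both sides are wrapped in commas.
delimited_sql = """
SELECT t1.sample_number, t2.c_report_number
FROM table1 t1
JOIN table2 t2
  ON (',' || t2.c_test_article || ',') LIKE ('%,' || t1.formulation || ',%')
ORDER BY t1.sample_number
"""

plain_matches = conn.execute(plain_sql).fetchall()
delimited_matches = conn.execute(delimited_sql).fetchall()
print(plain_matches)
print(delimited_matches)
```

Neither variant can use an ordinary index on the comma-separated column, so on large tables splitting the list into rows is usually the more scalable design.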

How to fetch records from DB which fulfill a certain criteria

I have the following problem and wanted to ask if this is the correct way to do it or if there is a better way of doing it:
Assume I have the following table/data in my DB:
|---|----|------|-------------|---------|---------|
|id |city|street|street_number|lastname |firstname|
|---|----|------|-------------|---------|---------|
| 1 | ar | K1 | 13 |Davenport| Hector |
| 2 | ar | L1 | 27 |Cannon | Teresa |
| 3 | ar | A1 | 135 |Brewer | Izaac |
| 4 | dc | A2 | 8 |Fowler | Milan |
| 5 | fr | C1 | 18 |Kaiser | Ibrar |
| 6 | fr | C1 | 28 |Weaver | Kiri |
| 7 | ny | O1 | 37 |Petersen | Derrick |
I now get requests with the following structure: (city/street/street_number)
E.g.: {(ar,K1,13),(dc,A2,8),(ny,O1,37)}
I want to retrieve the last name of the person living at each address. Since the number of requests is quite large, I don't want to process them one by one. My current implementation inserts the requests into a temporary table and joins against it.
Is this the right approach or is there some better way of doing this?

You can construct a query using IN with tuples:
select t.*
from t
where (city, street, street_number) in (('ar', 'K1', '13'), ('dc', 'A2', '8'), ('ny', 'O1', '37'));
However, if the data starts in the database, then a temporary table or subquery is better than bringing the results back to the application and constructing such a query.
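
Both ideas, the temporary table and the tuple IN, can be combined. The following is a minimal sketch in SQLite via Python, assuming a SQLite build with row-value support (3.15 or later); the table names `people` and `lookups` are hypothetical. The request tuples are bulk-inserted once and matched in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INT, city TEXT, street TEXT,"
    " street_number INT, lastname TEXT, firstname TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "ar", "K1", 13, "Davenport", "Hector"),
        (2, "ar", "L1", 27, "Cannon", "Teresa"),
        (3, "ar", "A1", 135, "Brewer", "Izaac"),
        (4, "dc", "A2", 8, "Fowler", "Milan"),
        (5, "fr", "C1", 18, "Kaiser", "Ibrar"),
        (6, "fr", "C1", 28, "Weaver", "Kiri"),
        (7, "ny", "O1", 37, "Petersen", "Derrick"),
    ],
)

# The incoming requests, loaded in one batch into a temporary table.
wanted = [("ar", "K1", 13), ("dc", "A2", 8), ("ny", "O1", 37)]
conn.execute("CREATE TEMP TABLE lookups (city TEXT, street TEXT, street_number INT)")
conn.executemany("INSERT INTO lookups VALUES (?, ?, ?)", wanted)

# Row-value IN against the temporary table: one query for all requests.
lastnames = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.lastname
        FROM people p
        WHERE (p.city, p.street, p.street_number)
              IN (SELECT city, street, street_number FROM lookups)
        ORDER BY p.id
        """
    )
]
print(lastnames)
```

An equivalent inner join on the three columns behaves the same way here and works on engines without tuple comparison.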

I think you can use a hierarchical query and string functions as follows:
WITH YOUR_INPUT_DATA AS
  (SELECT '(ar,K1,13),(dc,A2,8),(ny,O1,37)' AS INPUT_STR FROM DUAL),
--
-- '[^(),]+' picks the Nth field between parentheses and commas
CTE AS
  (SELECT REGEXP_SUBSTR(STR,'[^(),]+',1,1) AS STR1,
          REGEXP_SUBSTR(STR,'[^(),]+',1,2) AS STR2,
          REGEXP_SUBSTR(STR,'[^(),]+',1,3) AS STR3
   FROM (SELECT SUBSTR(INPUT_STR,
                       INSTR(INPUT_STR,'(',1,LEVEL),
                       INSTR(INPUT_STR,')',1,LEVEL) - INSTR(INPUT_STR,'(',1,LEVEL) + 1) STR
         FROM YOUR_INPUT_DATA
         CONNECT BY LEVEL <= REGEXP_COUNT(INPUT_STR,'\),\(') + 1))
--
SELECT * FROM YOUR_TABLE WHERE (city,street,street_number)
IN (SELECT STR1,STR2,STR3 FROM CTE);

Comma separated (with curly brackets) search in SQL

I have a SQL Table like this:
Table1:
| SomeID1 | OtherID1 | Data1
+----------------+-------------+-------------------
| abcdef-..... | cdef123-... | {18,20,22}
| abcdef-..... | 4554a24-... | {17,19}
| 987654-..... | 12324a2-... | {13,19,20}
And another table with:
Table 2:
| SomeID2 | OtherID2 | Data2
+----------------+-------------+-------------------
| abcdef-..... | cdef123-... | 13
| abcdef-..... | 4554a24-... | 14
| 987654-..... | 12324a2-... | 15
| abcdef-..... | 4554a24-... | 16
| 987654-..... | 12324a2-... | 17
Is it possible to gather one Data1 value from table1 and search in table2 like:
select * from table2 where Data2 in ('18','20','22')
I'm looking for something like this:
select * from table2 where Data2 in (select Data1 from table1 where SomeID1='abcdef')
PS: I did not design the table.

If SomeID1 is unique you can do this:
select * from table2
where (select replace(replace(Data1, '{', ','), '}', ',') from table1 where SomeID1=?)
like concat('%,', Data2, ',%')
This works for SQL Server and MySQL, and you can adjust it to work for any database.
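
As an illustration of adjusting the query for another database, here is a sketch of the same brace-replacement trick in SQLite via Python, where `||` replaces concat(). The IDs `id-1` to `id-3` are hypothetical placeholders, since the real IDs are elided in the question, and SomeID1 is assumed to be unique as the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (SomeID1 TEXT, Data1 TEXT)")
conn.execute("CREATE TABLE table2 (SomeID2 TEXT, Data2 INT)")

# Hypothetical IDs; the Data1 and Data2 values follow the question.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("id-1", "{18,20,22}"), ("id-2", "{17,19}"), ("id-3", "{13,19,20}")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("id-1", 13), ("id-2", 14), ("id-3", 15), ("id-2", 16), ("id-3", 17)],
)

# '{13,19,20}' becomes ',13,19,20,' so each element is comma-delimited.
match_sql = """
SELECT Data2
FROM table2
WHERE (SELECT replace(replace(Data1, '{', ','), '}', ',')
       FROM table1
       WHERE SomeID1 = ?) LIKE ('%,' || Data2 || ',%')
ORDER BY Data2
"""


def matches_for(some_id: str) -> list:
    """Return the Data2 values contained in the Data1 list of some_id."""
    return [row[0] for row in conn.execute(match_sql, (some_id,))]


print(matches_for("id-3"))
print(matches_for("id-2"))
```

The surrounding commas are what keep a value such as 1 from matching inside 13.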

SELECT SomeID1, OtherID1, Data1 FROM Table1,Table2 WHERE SomeID1 = SomeID2 AND ....
You need a join condition on columns that are unique in combination.

SOLUTION #1 (programming language)
This can be done with any programming language
1: Prepare both statements
2: Execute your first query Ex. SELECT Data1 FROM table1
3: Explode Data1 field by commas, store the exploded var in an array (trim curly braces first)
4: Loop through your array and execute your second query Ex. SELECT * FROM table2 WHERE Data2 = array[index++]
5: Get your results whenever there's a match
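
The steps above can be sketched in Python with the standard sqlite3 module. This is one possible reading of the steps, not a definitive implementation: the IDs `id-1` to `id-3` and the function name `rows_matching` are hypothetical, and the sketch assumes the requested ID exists in table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (SomeID1 TEXT, Data1 TEXT)")
conn.execute("CREATE TABLE table2 (SomeID2 TEXT, Data2 INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("id-1", "{18,20,22}"), ("id-2", "{17,19}"), ("id-3", "{13,19,20}")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("id-1", 13), ("id-2", 14), ("id-3", 15), ("id-2", 16), ("id-3", 17)],
)


def rows_matching(some_id: str) -> list:
    # Steps 1-2: run the first (parameterized) query.
    (data1,) = conn.execute(
        "SELECT Data1 FROM table1 WHERE SomeID1 = ?", (some_id,)
    ).fetchone()
    # Step 3: trim the curly braces, then split on commas.
    values = [int(v) for v in data1.strip("{}").split(",")]
    # Steps 4-5: run the second query per value and collect the matches.
    found = []
    for value in values:
        found.extend(
            conn.execute(
                "SELECT SomeID2, Data2 FROM table2 WHERE Data2 = ?", (value,)
            ).fetchall()
        )
    return found


print(rows_matching("id-3"))
```

Running one query per value is simple but chatty; for long lists, a single query with a generated IN list of bound parameters would reduce round trips.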
SOLUTION #2 (PL/SQL)
Using only PL/SQL a cursor can be helpful for what you're trying to accomplish
http://www.sqlservertutorial.net/sql-server-stored-procedures/sql-server-cursor/

Is it possible for you to normalize the data into a one-to-many relationship by introducing one more table? You would then have Table 3:
| table1SomeID1 | Data1
+----------------+-------------
| abcdef-..... | 18
| abcdef-..... | 20
| abcdef-..... | 22
| abcdef-..... | 17
| abcdef-..... | 19
| 987654-..... | 13
| 987654-..... | 19
| 987654-..... | 20
You will be able to make queries like:
select * from table2 where data2 in (select Data1 from table3 where table1SomeID1 = 'abcdef')

Aggregate ENTIRE rows based on single field without querying source twice or using CTEs?

Assume I have the following table:
+--------+--------+--------+
| field1 | field2 | field3 |
+--------+--------+--------+
| a | a | 1 |
| a | b | 2 |
| a | c | 3 |
| b | a | 1 |
| b | b | 2 |
| c | b | 2 |
| c | b | 3 |
+--------+--------+--------+
I want to select only the rows where field3 is the minimum value for each field1, so only these rows:
+--------+--------+--------+
| field1 | field2 | field3 |
+--------+--------+--------+
| a | a | 1 |
| b | a | 1 |
| c | b | 2 |
+--------+--------+--------+
The most popular solution is to query the source twice, once directly and then joined to a subquery where the source is queried again and then aggregated. However, since my data source is actually a derived table/subquery itself, I'd have to duplicate the subquery in my SQL which is ugly. The other option is to use the WITH CTE and reuse the subquery which would be nice, but Teradata, the database I am using, doesn't support CTEs in views, though it does in macros which is not an option for me now.
So is it possible in standard SQL to group multiple records into a single record by using only a single field in the aggregation without querying the source twice or using a CTE?

This is possible using a window function:
select *
from (
    select column_1, column_2, column_3,
           min(column_3) over (partition by column_1) as min_col_3
    from the_table
) t
where column_3 = min_col_3;
The above is standard SQL and I believe Teradata also supports window functions.
The derived table is necessary because you can't refer to a column alias in the where clause - at least not in standard SQL.
I think Teradata actually allows that using the QUALIFY clause, but as I have never used it, I am not sure:
select *
from the_table
qualify min(column_3) over (partition by column_1) = column_3;
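
The derived-table form of the window-function query can be tried out in SQLite via Python, assuming a SQLite build with window-function support (3.25 or later). This sketch uses the question's field1, field2, field3 names and a hypothetical table name `src`; QUALIFY is not available in SQLite, so only the first variant is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (field1 TEXT, field2 TEXT, field3 INT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [
        ("a", "a", 1), ("a", "b", 2), ("a", "c", 3),
        ("b", "a", 1), ("b", "b", 2),
        ("c", "b", 2), ("c", "b", 3),
    ],
)

# The source is read once; MIN() OVER attaches the group minimum to each row.
window_sql = """
SELECT field1, field2, field3
FROM (
    SELECT field1, field2, field3,
           MIN(field3) OVER (PARTITION BY field1) AS min_field3
    FROM src
) t
WHERE field3 = min_field3
ORDER BY field1, field2
"""

min_rows = conn.execute(window_sql).fetchall()
for row in min_rows:
    print(row)
```

If `src` were itself a subquery, it would still appear only once in the statement, which is the point of the window-function approach.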

Use NOT EXISTS to return a row if there is no other row with the same field1 value but a lower field3 value:
select * from table t1
where not exists (select 1 from table t2
where t2.field1 = t1.field1
and t2.field3 < t1.field3)
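
A quick way to see how NOT EXISTS behaves, including with ties, is to run it in SQLite via Python. In this sketch the table name `src` and the extra row ('a', 'd', 1) are hypothetical additions to the question's data; the extra row shows that every row sharing the group minimum is returned. Note that this form does reference the source twice, which matters if the source is a long subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (field1 TEXT, field2 TEXT, field3 INT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [
        ("a", "a", 1), ("a", "b", 2), ("a", "c", 3),
        ("a", "d", 1),  # hypothetical tie on the minimum for group 'a'
        ("b", "a", 1), ("b", "b", 2),
        ("c", "b", 2), ("c", "b", 3),
    ],
)

# Keep a row only when no row in the same group has a lower field3.
not_exists_sql = """
SELECT t1.field1, t1.field2, t1.field3
FROM src t1
WHERE NOT EXISTS (SELECT 1
                  FROM src t2
                  WHERE t2.field1 = t1.field1
                    AND t2.field3 < t1.field3)
ORDER BY t1.field1, t1.field2
"""

kept_rows = conn.execute(not_exists_sql).fetchall()
for row in kept_rows:
    print(row)
```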

SQL query listing fathers and children with joins: how to distinguish them?

Having those tables:
table_n1:
| t1_id | t1_name |
| 1 | foo |
table_n2:
| t2_id | t1_id | t2_name |
| 1 | 1 | bar |
I need a query that gives me two results:
| names |
| foo |
| foo / bar |
But I can't figure out the right way.
I wrote this one:
SELECT
CONCAT_WS(' / ', table_n1.t1_name, table_n2.t2_name) AS names
FROM
table_n1
LEFT JOIN table_n2 ON table_n2.t1_id = table_n1.t1_id
That only half works: it returns only the second row (of the example above):
| names |
| foo / bar |
This query returns the 'father' (table_n1) name on its own only when it doesn't have 'children' (table_n2).
How can I fix it?

Using a UNION and changing the LEFT JOIN to an INNER JOIN should give you the correct result.
SELECT table_n1.t1_name AS names
FROM table_n1
UNION ALL
SELECT CONCAT_WS(' / ', table_n1.t1_name, table_n2.t2_name) AS names
FROM table_n1
INNER JOIN table_n2 ON table_n2.t1_id = table_n1.t1_id
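
The UNION ALL query can be checked in SQLite via Python. This is a sketch with two labeled deviations: `||` replaces CONCAT_WS, which older SQLite builds lack, and a second father 'baz' with no children is a hypothetical row added to show that childless fathers still appear once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_n1 (t1_id INT, t1_name TEXT)")
conn.execute("CREATE TABLE table_n2 (t2_id INT, t1_id INT, t2_name TEXT)")
# 'baz' is a hypothetical father without children.
conn.executemany("INSERT INTO table_n1 VALUES (?, ?)", [(1, "foo"), (2, "baz")])
conn.execute("INSERT INTO table_n2 VALUES (1, 1, 'bar')")

# First branch: every father once. Second branch: one row per father/child pair.
union_sql = """
SELECT table_n1.t1_name AS names
FROM table_n1
UNION ALL
SELECT table_n1.t1_name || ' / ' || table_n2.t2_name
FROM table_n1
INNER JOIN table_n2 ON table_n2.t1_id = table_n1.t1_id
ORDER BY names
"""

names = [row[0] for row in conn.execute(union_sql)]
print(names)
```

The ORDER BY is only there to make the output deterministic; sorting on the combined string happens to place each father directly before its 'father / child' rows in this small example.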
