Merging columns in a join of two tables - hive

I have the following tables in a Hive database:
table1:
id t X
1 1 a
1 4 a
2 5 a
3 10 a
table2:
id t Y
1 3 b
2 6 b
2 8 b
3 15 b
And I would like to merge them to have a table like:
id t Z
1 1 a
1 3 b
1 4 a
2 5 a
2 6 b
2 8 b
3 10 a
3 15 b
Basically what I want to do is:
1. a join on the column id (that part is easy)
2. merge the columns table1.t and table2.t into a new column t
3. have the variable Z that is equal to table1.X if the corresponding t comes from table1.t, and table2.Y if it comes from table2.t
4. order the table by id and then by t (that shouldn't be too hard)
I have no idea how to do parts 2 and 3. I tried an outer join on
table1.id = table2.id and table1.t = table2.t, but it doesn't merge the two t columns.
Any pointer would be appreciated. Thanks!

CREATE TABLE table3 AS
SELECT *
FROM (SELECT id, t, X AS Z FROM table1
      UNION ALL
      SELECT id, t, Y AS Z FROM table2) u1
ORDER BY id, t;
Although not always required, wrapping the union'd queries in a subquery helps to organize them, and you can then reference the fields from the union (e.g. u1.id) in other parts of the query.
You'll need the alias on the 3rd column to make the schemas match. If the source of each row was not already available as a column, you could label it with a literal, something like this:
SELECT * FROM (SELECT id, t, 'a' AS Z FROM table1 UNION ALL SELECT id, t, 'b' AS Z FROM table2) u1;
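As a quick check of the UNION ALL approach above, here is a minimal sketch that runs the same query shape against the question's sample data. It assumes SQLite through Python's standard sqlite3 module rather than Hive, purely so that it can be run anywhere; the column types are also an assumption.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, t INTEGER, X TEXT);
    CREATE TABLE table2 (id INTEGER, t INTEGER, Y TEXT);
    INSERT INTO table1 VALUES (1, 1, 'a'), (1, 4, 'a'), (2, 5, 'a'), (3, 10, 'a');
    INSERT INTO table2 VALUES (1, 3, 'b'), (2, 6, 'b'), (2, 8, 'b'), (3, 15, 'b');
""")

# Stack the two tables, aliasing X and Y to a common column Z, then sort.
merged = conn.execute("""
    SELECT id, t, Z
    FROM (SELECT id, t, X AS Z FROM table1
          UNION ALL
          SELECT id, t, Y AS Z FROM table2) u1
    ORDER BY id, t
""").fetchall()

for row in merged:
    print(row)
```

The rows come back interleaved by id and t, which is the layout the question asks for; no join is involved at all.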

Try this one. It inserts all the rows from the other two tables into table3:
INSERT INTO table3 (id, t, Z)
SELECT id, t, X
FROM table1
UNION ALL
SELECT id, t, Y
FROM table2;

Related

best way to join by multiple columns and avoid OR in SQL

I need a recommendation.
I have two tables. Table 1 is the main table, and I initially planned to left join Table 2 to it; Table 2 is much larger than Table 1. What is the best-performing way to join Table 1 and Table 2 when a row should match if any one of these conditions holds: column b equals column b, column c equals column c, or column d equals column d, with empty values never counting as a match? I want to do this without OR in the left join, because of the poor performance and execution time it would have. I appreciate any help.
Note: Table 1 and Table 2 are each the result of a 40-line query. The database does not support recursive queries. The database is SAP HANA.
Table 1
ID | column b | column c | column d
1  | d        | g        | j
2  | e        | h        | k
3  | f        | i        |
Table 2
ID_2 | column b | column c | column d
4    | d        | g        |
5    |          |          | k
6    |          | i        |
Desired Result
ID | column b | column c | column d | ID_2
1  | d        | g        | j        | 4
2  | e        | h        | k        | 5
3  | f        | i        |          | 6
Use three left joins, one per column:
select t1.*,
       coalesce(t2_b.id_2, t2_c.id_2, t2_d.id_2) as id_2
from table1 t1 left join
     table2 t2_b
     on t1.b = t2_b.b left join
     table2 t2_c
     on t1.c = t2_c.c and t2_b.b is null left join
     table2 t2_d
     on t1.d = t2_d.d and t2_c.c is null;
Note that for optimal performance, you want three indexes:
table2(b, id_2)
table2(c, id_2)
table2(d, id_2)
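To see how the chained left joins with COALESCE behave, here is a minimal sketch on the question's sample rows. It assumes SQLite via Python's sqlite3 module instead of SAP HANA, assumes the columns are named b, c and d, and assumes the empty cells sit where the sample values suggest (each value is placed in the column where its match appears).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, b TEXT, c TEXT, d TEXT);
    CREATE TABLE table2 (id_2 INTEGER, b TEXT, c TEXT, d TEXT);
    -- Empty cells are stored as NULL; their positions are an assumption.
    INSERT INTO table1 VALUES (1, 'd', 'g', 'j'), (2, 'e', 'h', 'k'), (3, 'f', 'i', NULL);
    INSERT INTO table2 VALUES (4, 'd', 'g', NULL), (5, NULL, NULL, 'k'), (6, NULL, 'i', NULL);
""")

# One left join per candidate column; each later join only fires when the
# join before it found nothing, and COALESCE picks the first hit.
matches = conn.execute("""
    SELECT t1.id,
           COALESCE(t2_b.id_2, t2_c.id_2, t2_d.id_2) AS id_2
    FROM table1 t1
    LEFT JOIN table2 t2_b ON t1.b = t2_b.b
    LEFT JOIN table2 t2_c ON t1.c = t2_c.c AND t2_b.b IS NULL
    LEFT JOIN table2 t2_d ON t1.d = t2_d.d AND t2_c.c IS NULL
    ORDER BY t1.id
""").fetchall()

print(matches)
```

Because NULL never compares equal to anything, empty values on either side cannot produce a match, which is the behaviour the question asks for.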

Select within select with multiple matches on the other table SQL

I have these 3 tables
Table 1:
id_Table1 field_table1_1 field_table1_2
1 A B
2 C D
3 E F
Table 2:
id_Table2 field_table2_1 field_table2_2
4 G H
5 I J
Table 3:
id_Table3 id_Table1 id_Table2
1 1 4
2 1 5
3 2 5
So table 3 holds the relation between table 1 and 2.
What I want is a query that returns all the fields of table 1, plus one extra field that contains all the related ids from table 2, separated by commas.
So the result should be something like this:
id_Table1 field_table1_1 field_table1_2 id_Table2
1 A B 4, 5
2 C D 5
3 E F
One option uses a lateral join and string_agg():
select t1.*, x.*
from table1 t1
outer apply (
    select string_agg(t3.id_table2, ', ') id_table2
    from table3 t3
    where t3.id_table1 = t1.id_table1
) x
There is no need to bring in table2 to get the results you want.
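The same idea can be tried out without SQL Server. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module, where OUTER APPLY and STRING_AGG are replaced by a correlated subquery and group_concat. It is not the answer's exact query, only the same aggregation pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_Table1 INTEGER, field_table1_1 TEXT, field_table1_2 TEXT);
    CREATE TABLE table3 (id_Table3 INTEGER, id_Table1 INTEGER, id_Table2 INTEGER);
    INSERT INTO table1 VALUES (1, 'A', 'B'), (2, 'C', 'D'), (3, 'E', 'F');
    INSERT INTO table3 VALUES (1, 1, 4), (2, 1, 5), (3, 2, 5);
""")

# For each table1 row, concatenate the related table2 ids found in table3.
# Rows without any relation get NULL (None in Python) in the extra column.
rows = conn.execute("""
    SELECT t1.id_Table1, t1.field_table1_1, t1.field_table1_2,
           (SELECT group_concat(t3.id_Table2, ', ')
            FROM table3 t3
            WHERE t3.id_Table1 = t1.id_Table1) AS id_Table2
    FROM table1 t1
    ORDER BY t1.id_Table1
""").fetchall()

for row in rows:
    print(row)
```

SQLite does not guarantee the order of the concatenated values, so code that depends on "4, 5" rather than "5, 4" should sort them explicitly.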

Which kind of join do I need to use here?

For every row in table Y, I need a copy of the current row in Table X, taking field 1 from Table Y.
Table X
Field 1 Field 2
null A
null B
null C
Table Y
Field 1
1
2
3
Desired output
Field 1 Field 2
1 A
1 B
1 C
2 A
2 B
2 C
3 A
3 B
3 C
Looks like a cross join:
select y.field1, x.field2
from x cross join
y;
Looks like an unconditional select of both tables without matching ids
Something like
select tableY.column1, tableX.column2
from tableY, tableX
order by tableY.column1 asc, tableX.column2 asc
should do it.
By the way, was this a school question? If so, I should not have answered it.
Try this query:
SELECT #Tabley.Field1 , #TableX.Field2
FROM #TableX ,#Tabley
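A minimal sketch of the cross join on the sample data, assuming SQLite via Python's sqlite3 module and the hypothetical table names x and y with columns field1 and field2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (field1 INTEGER, field2 TEXT);
    CREATE TABLE y (field1 INTEGER);
    INSERT INTO x VALUES (NULL, 'A'), (NULL, 'B'), (NULL, 'C');
    INSERT INTO y VALUES (1), (2), (3);
""")

# A cross join pairs every row of x with every row of y: 3 x 3 = 9 rows.
pairs = conn.execute("""
    SELECT y.field1, x.field2
    FROM x CROSS JOIN y
    ORDER BY y.field1, x.field2
""").fetchall()

for pair in pairs:
    print(pair)
```

The explicit CROSS JOIN and the comma-separated FROM list used in the other answers produce the same set of pairs; only the ORDER BY fixes the order in which they come back.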

How does an SQL join work?

I am using MySQL.
I have two tables:
TABLE 1 (TABLE NAME T1)
SL NAME
1 a
2 b
3 c
4 c
table 2 (table name T2)
SL NAME
1 a
2 c
3 c
4 c
Q1: How do I count the total number of 'c' in both tables?
Q2: Which name has the most occurrences across both tables?
sl is the primary key.
My query is:
select count(*) from t1,t2
where t1.name=t2.name where t1.name='c';
but it returns 6.
To count c in both tables you should use UNION, not JOIN.
Syntax:
SELECT ...
UNION [ALL | DISTINCT] SELECT ...
[UNION [ALL | DISTINCT] SELECT ...]
Doc:
http://dev.mysql.com/doc/refman/5.0/en/union.html
Edit:
I'll explain the query that you provided.
select count(*) from t1,t2 where t1.name=t2.name where t1.name='c';
First of all, you use the WHERE clause twice, which is a syntax error. It should be:
select count(*) from t1,t2 where t1.name=t2.name AND t1.name='c';
This is equivalent to:
SELECT count(*) from t1
JOIN t2 ON t1.name=t2.name
WHERE t1.name='c';
You select only rows with the value c, so these are the rows to take into consideration:
TABLE 1 (TABLE NAME T1)
SL NAME
3 c
4 c
table 2 (table name T2)
SL NAME
2 c
3 c
4 c
Now, a simple JOIN pairs every row from table 1 with every row from table 2 (where the condition is true, of course).
So the result before counting is:
t1.SL t1.NAME t2.SL t2.NAME
3 c 2 c
4 c 3 c
3 c 4 c
4 c 2 c
3 c 3 c
4 c 4 c
This is 6 rows.
To answer both of your questions:
SELECT name, count(*) as cnt
FROM(select t1.name from t1
union all
select name from t2) as tem
group by name
order by cnt DESC
This query gives you a ranking of names ordered by number of occurrences.
To retrieve only the count for c, add a WHERE clause. To retrieve only the most frequent name, add LIMIT 1.
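The difference between the join and the union can be checked directly. This is a minimal sketch on the question's data, assuming SQLite via Python's sqlite3 module in place of MySQL; the queries themselves are the ones discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (sl INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (sl INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'c');
    INSERT INTO t2 VALUES (1, 'a'), (2, 'c'), (3, 'c'), (4, 'c');
""")

# The join pairs each 'c' row of t1 with each 'c' row of t2: 2 * 3 pairs.
join_count = conn.execute("""
    SELECT count(*) FROM t1
    JOIN t2 ON t1.name = t2.name
    WHERE t1.name = 'c'
""").fetchone()[0]

# UNION ALL stacks the rows instead, so each 'c' is counted once: 2 + 3.
ranking = conn.execute("""
    SELECT name, count(*) AS cnt
    FROM (SELECT name FROM t1
          UNION ALL
          SELECT name FROM t2) AS tem
    GROUP BY name
    ORDER BY cnt DESC
""").fetchall()

print(join_count)   # 6
print(ranking)      # [('c', 5), ('a', 2), ('b', 1)]
```

The first row of the ranking answers Q2, and its count answers Q1.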
INSERT INTO #test
SELECT NAME FROM m_t1 WHERE NAME ='c'
UNION all
SELECT NAME FROM m_t2 WHERE NAME ='c';
SELECT count(*) FROM #test;

Select records by comparing subsets

Given two tables (the rows in each table are distinct):
1)  x | y     z        2)  x | y     z
   -------   ---          -------   ---
    1 | a     a            1 | a     a
    1 | b     b            1 | b     b
    2 | a                  1 | c
    2 | b                  2 | a
    2 | c                  2 | b
                           2 | c
Is there a way to select the values in the x column of the first table for which the subset of values in the y column, for that x, matches exactly the values in the z column of the second table?
In case 1), expected result is 1. If c is added to the second table then the expected result is 2.
In case 2), expected result is no record since neither of the subsets in the first table matches the subset in the second table. If c is added to the second table then the expected result is 1, 2.
I've tried using except and intersect to compare subsets of first table with the second table, which works fine, but it takes too long on the intersect part and I can't figure out why (the first table has about 10.000 records and the second has around 10).
EDIT: I've updated the question to provide an extra scenario.
SELECT
table1.x
FROM
table1
INNER JOIN
table2
ON table1.y = table2.z
GROUP BY
table1.x
HAVING
COUNT(*) = (SELECT COUNT(*) FROM table2 AS lookup)
AND COUNT(*) = (SELECT COUNT(*) FROM table1 AS lookup WHERE x = table1.x)
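The two COUNT(*) checks can be verified against all four scenarios from the question. This is a minimal sketch assuming SQLite via Python's sqlite3 module; the helper name matching_x is hypothetical, and the outer table is aliased as t1 so the correlated subquery is unambiguous.

```python
import sqlite3

def matching_x(table1_rows, table2_values):
    """Return the x values whose set of y values equals the z values exactly."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (x INTEGER, y TEXT)")
    conn.execute("CREATE TABLE table2 (z TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", table1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?)", [(z,) for z in table2_values])
    rows = conn.execute("""
        SELECT t1.x
        FROM table1 AS t1
        INNER JOIN table2 AS t2 ON t1.y = t2.z
        GROUP BY t1.x
        -- every z was matched, and this x has no y outside the matches
        HAVING COUNT(*) = (SELECT COUNT(*) FROM table2)
           AND COUNT(*) = (SELECT COUNT(*) FROM table1 AS lookup
                           WHERE lookup.x = t1.x)
        ORDER BY t1.x
    """).fetchall()
    conn.close()
    return [x for (x,) in rows]

case1 = [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (2, 'c')]
case2 = [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')]

print(matching_x(case1, ['a', 'b']))
print(matching_x(case1, ['a', 'b', 'c']))
print(matching_x(case2, ['a', 'b']))
print(matching_x(case2, ['a', 'b', 'c']))
```

The approach relies on the question's statement that the rows in each table are distinct; with duplicates, the counts would no longer correspond to set sizes.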
One of these will do:
select
t1.x
from
table1 as t1 inner join table2 as t2 on t1.x=t2.x
group by t1.x
having count(distinct t1.x)=count(distinct t2.x);
select
t1.x
from
table1 as t1 inner join table2 as t2 on t1.x=t2.x
group by t1.x
having count(distinct t1.x)=(select count(distinct x) from table2);