Combine many tables in Hive using UNION ALL? - sql

I'm trying to append one variable from several tables together (aka row-bind, concatenate) to make one longer table with a single column in Hive. I think this is possible using UNION ALL based on this question ( HiveQL UNION ALL ), but I'm not sure what an efficient way to accomplish this would be.
The pseudocode would look something like this:
CREATE TABLE tmp_combined AS
SELECT b.var1 FROM tmp_table1 b
UNION ALL
SELECT c.var1 FROM tmp_table2 c
UNION ALL
SELECT d.var1 FROM tmp_table3 d
UNION ALL
SELECT e.var1 FROM tmp_table4 e
UNION ALL
SELECT f.var1 FROM tmp_table5 f
UNION ALL
SELECT g.var1 FROM tmp_table6 g
UNION ALL
SELECT h.var1 FROM tmp_table7 h;
Any help is appreciated!

Try the following (Hive has no SELECT ... INTO, so use CREATE TABLE ... AS):
CREATE TABLE tmp_combined AS
SELECT * FROM
(
SELECT b.var1 FROM tmp_table1 b
UNION ALL
SELECT c.var1 FROM tmp_table2 c
UNION ALL
SELECT d.var1 FROM tmp_table3 d
UNION ALL
SELECT e.var1 FROM tmp_table4 e
UNION ALL
SELECT f.var1 FROM tmp_table5 f
UNION ALL
SELECT g.var1 FROM tmp_table6 g
UNION ALL
SELECT h.var1 FROM tmp_table7 h
) CombinedTable
Run it together with this setting:
set hive.exec.parallel=true;
This lets Hive execute the independent SELECTs in parallel; otherwise they run one after another.
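With many tables, the UNION ALL statement can also be generated instead of typed by hand. Below is a minimal sketch in Python, run against an in-memory SQLite database purely so it can be executed; it is not Hive-specific. The helper name build_union_all and the sample rows are made up for illustration, and the generated statement would still need to be checked against your Hive version.

```python
import sqlite3

# Hypothetical stand-ins for tmp_table1 .. tmp_table7 from the question.
TABLES = [f"tmp_table{i}" for i in range(1, 8)]


def build_union_all(tables, column="var1", target="tmp_combined"):
    """Return a CREATE TABLE ... AS statement that row-binds one column."""
    selects = [f"SELECT {column} FROM {t}" for t in tables]
    body = "\nUNION ALL\n".join(selects)
    # The UNION ALL is wrapped in a subquery, as in the answer above.
    return f"CREATE TABLE {target} AS\nSELECT * FROM (\n{body}\n) combined"


conn = sqlite3.connect(":memory:")
for i, t in enumerate(TABLES, start=1):
    conn.execute(f"CREATE TABLE {t} (var1 INTEGER)")
    # Two made-up rows per table, so the combined table should hold 14 rows.
    conn.executemany(f"INSERT INTO {t} VALUES (?)", [(i,), (i * 10,)])

sql = build_union_all(TABLES)
conn.execute(sql)
row_count = conn.execute("SELECT COUNT(*) FROM tmp_combined").fetchone()[0]

print(sql)
print(row_count)
```

Generating the statement keeps the table list in one place, which helps when the number of source tables changes.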

I would say that's both a straightforward and an efficient way to do the row-bind; at least, that's what I would use in my code.
By the way, running your pseudocode as written might cause a syntax error; you may try wrapping the UNION ALL in a subquery:
create table join_table as
select * from
(select ...
union all
select ...
union all
select ...) tmp;

I applied the same concept to two different tables, employee and location, which I believe might help you:
Data: table e (employee)
empid empname
13 Josan
8 Alex
3 Ram
17 Babu
25 John
Table l (location)
empid emplocation
13 San Jose
8 Los Angeles
3 Pune,IN
17 Chennai,IN
39 Banglore,IN
hive> SELECT e.empid AS a ,e.empname AS b FROM employee e
UNION ALL
SELECT l.empid AS a,l.emplocation AS b FROM location l;
Output with aliases a and b:
13 San Jose
8 Los Angeles
3 Pune,IN
17 Chennai,IN
39 Banglore,IN
13 Josan
8 Alex
3 Ram
17 Babu
25 John

Related

How to show data that's not in a table. SQL ORACLE

I've a data base with two tables.
Table Players Table Wins
ID Name ID Player_won
1 Mick 1 2
2 Frank 2 1
3 Sarah 3 4
4 Eva 4 5
5 Joe 5 1
I need a SQL query that shows "The players who have not won any game".
I tried, but I don't even know how to begin.
Thank you
You need all the rows from players that don't have corresponding rows in wins. For this you need a left join, filtering for rows that don't join:
select
p.id,
p.name
from Players p
left join Wins w on w.Player_won = p.id
where w.Player_won is null
You can also use not in:
select
id,
name
from Players
where id not in (select Player_won from Wins)
How about the MINUS set operator?
Sample data:
SQL> with players (id, name) as
2 (select 1, 'Mick' from dual union all
3 select 2, 'Frank' from dual union all
4 select 3, 'Sarah' from dual union all
5 select 4, 'Eva' from dual union all
6 select 5, 'Joe' from dual
7 ),
8 wins (id, player_won) as
9 (select 1, 2 from dual union all
10 select 2, 1 from dual union all
11 select 3, 4 from dual union all
12 select 4, 5 from dual union all
13 select 5, 1 from dual
14 )
Query begins here:
15 select id from players
16 minus
17 select player_won from wins;
ID
----------
3
SQL>
So, yes ... player 3 didn't win any game so far.
I think you should provide your attempts next time, but here you go:
select p.name
from players p
where not exists (select * from wins w where p.id = w.player_won);
MINUS is not the best option here because it does not use indexes and instead performs a full scan of both tables.
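To compare the approaches suggested so far, here is a small sketch that loads the sample data from the question into an in-memory SQLite database and runs each variant. SQLite is only a stand-in for Oracle here; it spells MINUS as EXCEPT, and the table definitions are assumed from the data shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Wins (ID INTEGER PRIMARY KEY, Player_won INTEGER);
INSERT INTO Players VALUES (1,'Mick'),(2,'Frank'),(3,'Sarah'),(4,'Eva'),(5,'Joe');
INSERT INTO Wins VALUES (1,2),(2,1),(3,4),(4,5),(5,1);
""")

# Four ways to find players with no matching row in Wins.
QUERIES = {
    "left join": """
        SELECT p.ID FROM Players p
        LEFT JOIN Wins w ON w.Player_won = p.ID
        WHERE w.Player_won IS NULL""",
    "not in": """
        SELECT ID FROM Players
        WHERE ID NOT IN (SELECT Player_won FROM Wins)""",
    "not exists": """
        SELECT p.ID FROM Players p
        WHERE NOT EXISTS (SELECT 1 FROM Wins w WHERE w.Player_won = p.ID)""",
    "except": """
        SELECT ID FROM Players
        EXCEPT
        SELECT Player_won FROM Wins""",
}

results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in QUERIES.items()
}

for name, ids in results.items():
    print(name, ids)
```

On this data all four return only player 3 (Sarah). One difference worth knowing: if Wins.Player_won could contain NULL, the NOT IN form would return no rows at all, while the other three are unaffected.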
"I've a data base with two tables."
You don't show the names or any definition of the tables, leaving me to make an educated guess about their structure.
"I tried but I don't know even how to begin."
What exactly did you try? Possibly what you are missing here is the concept of a LEFT OUTER JOIN.
Assuming the tables are named players_table and wins_table, and have column names exactly as you showed, and that the player_won column is intended to express the number of games won by the player of that row's ID, and without knowing whether or not wins_table will have rows for players with zero wins… this should cover it:
select Name
from players_table pt
left join wins_table wt on (pt.ID = wt.ID)
-- Either this player is explicitly specified to have Player_won=0
-- or there is no row for this player ID in the wins table
-- (but excluding the possibility of an explicit NULL value, since its meaning would be unclear)
where Player_won = 0 or wt.ID is null;
As you can see from the variety of answers you've gotten, there are many ways to accomplish this.
One additional way to do this is to use COUNT in a correlated subquery, as in:
SELECT *
FROM PLAYERS p
WHERE 0 = (SELECT COUNT(*)
FROM WINS w
WHERE w.PLAYER_WON = p.ID)
SELECT *
FROM Players p
INNER JOIN Wins w
ON p.ID = w.ID
WHERE w.Player_won = 0
I have not done SQL in a while, but I think this might be right if Player_won holds a win count and you are looking for players with 0 wins.

Oracle SQL. Find Matching values in two different columns and different rows from same table or different one

This is my first question in this community. Any help is welcome.
Imagine I have a table like this (the columns could also be in different tables, I do not mind):
Account_Name_1:
Nike
Pepsi
Coke
Account_Name_2:
Reebok
Coke
Nike
I need to query a list of account names that appear in both "Account_Name_1" and "Account_Name_2".
Which will result as:
Accounts_in_both_columns
Nike
Coke
How can I do this? I have tried with an inner join, but I am not sure about it.
Thank you :)
extra:
I also have a problem of naming inconsistency across the Account names, some of them are named differently even if they are the same account. Example:
Account_Name_1 Account_Name_2
Nike Reebok
Pepsi Coke
Coke Nike Inc
If we run the same query as before, it will only list 'Coke'.
I have read about UTL_MATCH, the Levenshtein distance algorithm and the JARO_WINKLER_SIMILARITY function, but I have not been able to create a column showing which values are similar and how similar they are, so that I can investigate and decide whether they are the same account or not.
Please keep in mind it is not about same row matching, but value matching in two columns.
Thank you
I think you want union:
select account_name_1 as account_name
from t
union -- on purpose to remove duplicates
select account_name_2
from t;
EDIT:
If you want the values in both columns, just use exists:
select distinct account_name_1 as account_name
from t
where exists (select 1
from t t2
where t2.account_name_2 = t.account_name_1
);
As far as I understood the question, it is intersect you're looking for:
SQL> with
2 tab_1 (col) as
3 (select 'Nike' from dual union all
4 select 'Pepsi' from dual union all
5 select 'Coke' from dual
6 ),
7 tab_2 (col) as
8 (select 'Reebok' from dual union all
9 select 'Coke' from dual union all
10 select 'Nike' from dual
11 )
12 -- code you need follows
13 select col from tab_1
14 intersect
15 select col from tab_2;
COL
------
Coke
Nike
SQL>
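For the "extra" part of the question (inconsistent naming), the general idea is to score every pair of names across the two columns and review the pairs above some cutoff. The sketch below illustrates that idea in Python with difflib from the standard library; it is not Oracle's UTL_MATCH, the 0.6 threshold is an arbitrary assumption, and the helper names are invented for the example.

```python
from difflib import SequenceMatcher
from itertools import product

# Sample values from the second example in the question.
account_name_1 = ["Nike", "Pepsi", "Coke"]
account_name_2 = ["Reebok", "Coke", "Nike Inc"]


def similarity(a, b):
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(left, right, threshold=0.6):
    """Score every pair across the two columns and keep the likely matches."""
    scored = [(a, b, round(similarity(a, b), 2)) for a, b in product(left, right)]
    keep = [row for row in scored if row[2] >= threshold]
    # Highest score first, so exact matches appear at the top.
    return sorted(keep, key=lambda row: row[2], reverse=True)


matches = candidate_matches(account_name_1, account_name_2)
for name_1, name_2, score in matches:
    print(name_1, "|", name_2, "|", score)
```

The same cross-join-and-score shape should carry over to SQL with a similarity function in the select list; the cutoff would need tuning on real account names.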

how to union 2 tables based on specific columns in sas

I want to union 2 tables:
proc sql;
select * from Table1
outer union corr
select * from table2;
But I get the error:
ERROR: The type of column EntryId from the left hand side of the OUTER UNION set operation is
different from EntryId on the right hand side
If I understand this correctly, and based on "UNION ALL two SELECTs with different column types - expected behaviour?", the first column has a different data type in each table, so the union cannot proceed (which is true):
RecordID num label='RecordID' format=20. informat=20.
and
RecordID num label='RecordID' format=11. informat=11.
BUT, there is a column I want to use which has the same format
Pseu char(64) label='Pseu' format=$64. informat=$64.
Pseu char(64) label='Pseu' format=$64. informat=$64.
and in each table they are columns 3 and 4.
Is there a way to union these table together using that column as the reference, as opposed to the original?
I tried to no avail:
proc sql;
select * from Table1
outer union corr
select * from table2
on Table1.Pseu=Table2.Pseu;
ERROR: Found "on" when expecting ;
This follows from the OUTER UNION CORRESPONDING example given at http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473694.htm; based on that example, here is what I want:
table1
R y p
1 A 100
2 B 101
3 R 102
table2
R z p
4 A 102
5 R 103
6 T 104
MERGED
p    R  y  R  z
100  1  A
101  2  B
102  3  R  4  A
103        5  R
104        6  T
Something like this perhaps:
proc sql;
select * from Table1
outer union corr
select p, r as r2, z from table2;
quit;
This gives the column r from table2 a column alias (r2).
Using a regular UNION ALL:
select p, r, y, null, null from Table1
union all
select p, null, null, r as r2, z from table2;
The answers supplied by my search and by jarlh were correct.
The issues arose from the size and number of columns in the data sets to be unioned. I had to make sure that no column names were repeated across the unioned data sets (among my 600 total columns, some had similar names), so I had to rename columns.

Explanation on how the minus/except operator works

I am trying to figure out how the minus/except operator works.
Somehow I cannot find anything useful on the web. The "minus" operator is used to return all rows in the first statement that are not part of the second statement. But how exactly does it manage to do this?
Can someone please provide me how this is done "step by step"?
Note: The following answer is true for Oracle. See The UNION [ALL], INTERSECT, MINUS Operators
MySQL has only UNION and UNION ALL. Results for INTERSECT and MINUS can be obtained using IN and NOT IN. See Union, Difference, Intersection, and Division in MySQL [PDF]
In SQL Server, MINUS is called EXCEPT. See Set Operators (Transact-SQL)
I am surprised you are unable to find anything related to this on the Web. MINUS is a set operation in SQL, others include UNION, UNION ALL and INTERSECT.
This is what they do:
Sample Data:
EMPLOYEE
ID NAME SALARY AGE
1 Alice 5000 23
2 Joe 1000 25
3 Raj 2000 28
4 Pam 1500 32
UNION:
Returns the results from SQL 1 combined with the results from SQL 2, after removing duplicates. A variation is UNION ALL, which does not remove duplicates. UNION ALL has better performance because it skips the internal sort-and-deduplicate step. UNION ALL is useful when the results of the two SQLs are mutually exclusive.
select * from employee where salary > 1000
union
select * from employee where age > 25
returns all employees who are older than 25 or have a salary > 1000 (satisfying either condition):
ID NAME SALARY AGE
1 Alice 5000 23
3 Raj 2000 28
4 Pam 1500 32
Using UNION ALL in the above case returns the records for Raj and Pam twice, because both satisfy both conditions and UNION ALL does not remove duplicates.
select * from employee where salary > 1000
union all
select * from employee where age > 25
ID NAME SALARY AGE
1 Alice 5000 23
3 Raj 2000 28
4 Pam 1500 32
3 Raj 2000 28
4 Pam 1500 32
INTERSECT:
Returns only common records between the result sets.
select * from employee where salary > 1000
intersect
select * from employee where age > 25
returns only those records that satisfy both conditions: Have salary > 1000 AND are over 25.
ID NAME SALARY AGE
3 Raj 2000 28
4 Pam 1500 32
MINUS:
Returns records from SQL 1 after removing results from SQL 2:
select * from employee where salary > 1000
minus
select * from employee where age > 25
returns all those employees that have a salary > 1000 after removing employees that are more than 25 years of age:
ID NAME SALARY AGE
1 Alice 5000 23
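The four operators can be checked side by side on the sample EMPLOYEE data. This is a minimal sketch using an in-memory SQLite database as a stand-in; SQLite, like SQL Server, spells MINUS as EXCEPT, and only the NAME column is selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER, age INTEGER);
INSERT INTO employee VALUES
    (1, 'Alice', 5000, 23),
    (2, 'Joe',   1000, 25),
    (3, 'Raj',   2000, 28),
    (4, 'Pam',   1500, 32);
""")

HIGH_SALARY = "SELECT name FROM employee WHERE salary > 1000"
OVER_25 = "SELECT name FROM employee WHERE age > 25"


def run(operator):
    """Combine the two queries with the given set operator; names sorted."""
    sql = f"{HIGH_SALARY} {operator} {OVER_25}"
    return sorted(row[0] for row in conn.execute(sql))


results = {op: run(op) for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT")}
for op, names in results.items():
    print(op, names)
```

UNION ALL is the only one of the four that can return the same row more than once.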
Assume:
create table U
( x int not null primary key );
insert into U (x) values (1),(2),(3);
create table V
( y int not null primary key );
insert into V (y) values (3),(4);
Now U - V = { 1, 2 }
U - V can be expressed as:
All tuples in U that do not exist in V, i.e.
select x
from U
where not exists (
select 1
from V
where V.y = U.x
);
In the same way V - U = { 4 }
Did that clarify?
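As for the "step by step" part of the question, here is a conceptual model only: build a lookup of the second result, then scan the first result and keep each distinct row that is not in the lookup. A real database engine may implement this differently (for example by sorting or hashing), so treat the function below as an illustration of the semantics, not of any engine's internals. The function name minus is made up; the cross-check uses SQLite's EXCEPT.

```python
import sqlite3


def minus(left_rows, right_rows):
    """Distinct rows of left_rows that do not occur in right_rows."""
    exclude = set(right_rows)      # step 1: lookup of the second result
    seen = set()
    kept = []
    for row in left_rows:          # step 2: scan the first result
        if row in exclude or row in seen:
            continue               # step 3: drop matches and duplicates
        seen.add(row)
        kept.append(row)
    return kept


U = [(1,), (2,), (3,)]
V = [(3,), (4,)]
u_minus_v = minus(U, V)
v_minus_u = minus(V, U)

# Cross-check against a real EXCEPT on the same data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE U (x INTEGER NOT NULL PRIMARY KEY)")
conn.execute("CREATE TABLE V (y INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO U (x) VALUES (?)", U)
conn.executemany("INSERT INTO V (y) VALUES (?)", V)
sql_u_minus_v = sorted(conn.execute("SELECT x FROM U EXCEPT SELECT y FROM V").fetchall())
sql_v_minus_u = sorted(conn.execute("SELECT y FROM V EXCEPT SELECT x FROM U").fetchall())

print(u_minus_v, sql_u_minus_v)
print(v_minus_u, sql_v_minus_u)
```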
This should give you a clue:
create table #t(id int)
insert into #t values(1),(2),(3),(4),(5)
create table #t2(id1 int)
insert into #t2 values(2),(5),(6)
select * from #t except select * from #t2
select id from #t left join #t2 on #t.id=#t2.id1 where #t2.id1 is null
In set theory,
say A={A,B,C,D}, B={B,X,Y,Z}
so A-B={A,C,D} -- which in SQL is a left join that keeps only the rows with no match on the right side

SQL query required for the below scenario

Here, for part ‘CF061W’, finum is 25. I select the records whose fparinum value is 25, which gives the parts FA061W, HRD20600 and SD1201. Then I select the records whose fparinum value is the finum of those retrieved parts, and so on. This should continue until the highest level (flevel); for the above table, that is level 4.
Now I want a single SQL query that will retrieve all the records for the parent part ‘CF061W’.
Thanks in advance
Pradeep
This will work for you:
WITH TAB_CTE AS (
SELECT finum, part, fparinum, flevel
FROM TABTEST
WHERE PART='CF061W'
UNION ALL
SELECT e.finum, e.part, e.fparinum, e.flevel
FROM TABTEST e
INNER JOIN TAB_CTE ecte ON ecte.finum = e.fparinum
)
SELECT *
FROM TAB_CTE
OUTPUT
finum part fparinum flevel
25 CF061W 0 1
26 FA061w 25 2
27 hrd20600 25 2
35 sd1201 25 2
28 f1024 27 3
In the recursive step, each child row's fparinum is matched to its parent's finum (INNER JOIN PartHierarchy ph ON n.fparinum = ph.finum). I am not familiar with your schema, so adjust the column names if needed.
WITH PartHierarchy (finum, part, fparinum , dsono, flevel) AS
(
-- Base case
SELECT
finum,
part,
fparinum,
dsono,
1 as flevel
FROM myTablename
WHERE fparinum = 0
UNION ALL
-- Recursive step
SELECT
n.finum,
n.part,
n.fparinum,
n.dsono,
ph.flevel + 1 AS flevel
FROM myTablename n
INNER JOIN PartHierarchy ph ON n.fparinum = ph.finum
)
SELECT *
FROM PartHierarchy
ORDER BY flevel
This is a classic recursive CTE (Common Table Expression)
This is almost a textbook example of when to use a Recursive CTE.
There are plenty of articles detailing what to do, e.g. this one on MSDN:
http://msdn.microsoft.com/en-us/library/ms186243.aspx
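For anyone who wants to try the recursive CTE without a server, here is a small sketch against an in-memory SQLite database. The rows for CF061W and its descendants are taken from the output shown above; the two OTHER rows are invented to show that unrelated parts are left out. SQLite requires the RECURSIVE keyword, which SQL Server does not use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABTEST (finum INTEGER, part TEXT, fparinum INTEGER, flevel INTEGER);
INSERT INTO TABTEST VALUES
    (25, 'CF061W',      0, 1),
    (26, 'FA061W',     25, 2),
    (27, 'HRD20600',   25, 2),
    (35, 'SD1201',     25, 2),
    (28, 'F1024',      27, 3),
    (40, 'OTHER',       0, 1),   -- made-up unrelated root part
    (41, 'OTHERCHILD', 40, 2);   -- made-up child of the unrelated root
""")

rows = conn.execute("""
WITH RECURSIVE tab_cte (finum, part, fparinum, flevel) AS (
    -- Anchor: the parent part we start from
    SELECT finum, part, fparinum, flevel
    FROM TABTEST
    WHERE part = 'CF061W'
    UNION ALL
    -- Recursive step: rows whose fparinum points at a row already found
    SELECT e.finum, e.part, e.fparinum, e.flevel
    FROM TABTEST e
    INNER JOIN tab_cte c ON e.fparinum = c.finum
)
SELECT finum, part, flevel
FROM tab_cte
ORDER BY flevel, finum
""").fetchall()

for row in rows:
    print(row)
```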