How to map each distinct value of a column in one table with each distinct value of a column in another table in Hive

I have two tables in Hive, Table1 and Table2. I want to take each distinct customerID in Table1 and pair it with each distinct value of a column called category in Table2, but I am a bit lost on how to do this in Hive. For example, suppose Table1 contains 5 distinct customerIDs and Table2 contains 3 distinct categories. I want my query result to look something like the following:
However, Table1 and Table2 do not have any columns in common, so I am not sure how to perform a join on these two tables in Hive. Is this task possible in Hive? Any insights would be greatly appreciated!

You can do that with a cross join of distinct values from both tables.
select t1.customerid, t2.category
from (select distinct customerid from Table1) t1
cross join (select distinct category from Table2) t2;
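To see the shape of the result, here is a minimal sketch that runs the same cross join against an in-memory SQLite database from Python rather than against Hive. The sample rows are made up for illustration (5 distinct customer IDs and 3 distinct categories, each with one duplicate row), and the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data: 5 distinct customer IDs and 3 distinct
# categories, each table containing one duplicate row on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (customerid INTEGER);
    CREATE TABLE Table2 (category TEXT);
    INSERT INTO Table1 VALUES (1), (2), (3), (4), (5), (5);
    INSERT INTO Table2 VALUES ('A'), ('B'), ('C'), ('C');
""")

# Deduplicate each side first, then pair every customer with every category.
pairs = conn.execute("""
    SELECT t1.customerid, t2.category
    FROM (SELECT DISTINCT customerid FROM Table1) t1
    CROSS JOIN (SELECT DISTINCT category FROM Table2) t2
    ORDER BY t1.customerid, t2.category
""").fetchall()

for customerid, category in pairs:
    print(customerid, category)
```

With this sample data the result has 5 x 3 = 15 rows. Without the two DISTINCT subqueries the duplicate rows would be paired as well, giving 6 x 4 = 24 rows.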

Related

Multiple rows in table with values from another table

I am struggling with following issue:
Table1:
Table2:
Expected result:
Basically I want to multiply the rows in the dates table by the rows from the User table. Is it somehow possible (using T-SQL)?
You need to apply a cross join:
select date,month,userid,userno from table1 cross join table2
select Date, Month, USER_ID, ID from
t1 cross join t2

Not able to Get data from multiple independent tables that have a common column and yet do not depend on each other

I have 8 tables all with equal number of columns and with a common column. I want to fetch data from all tables in a single query.
My table structure is TABLE1, TABLE2, TABLE3, ..... TABLE 8.
that have columns COLUMNA, COLUMNB... COLUMNE and a COMMON_COLUMN
I need to get data with a where clause where COMMON_COLUMN='X'
I will need all columns from all tables.
I used a query that goes like this..
SELECT TABLE1.*, TABLE2.*, TABLE3.*
FROM TABLE1 T1
LEFT JOIN TABLE2 T2 ON T1.COMMON_COLUMN = T2.COMMON_COLUMN,
LEFT JOIN TABLE3 T3 ON T1.COMMON_COLUMN = T3.COMMON_COLUMN
WHERE T1.COMMON_COLUMN='X' AND T2.COMMON_COLUMN='X' AND T3.COMMON_COLUMN='X'
The above query returns no results if even one of the tables does not have any matching rows. I do not want to use an inner join because, although the tables have a common column, they do not depend on each other, and I need the data from all tables for a given value of the common column.
Also, the tables have unequal number of rows.
What am I doing wrong?
Correct me if I am wrong, since you did not attach any sample data or the desired result,
but I assume that you simply need to UNION ALL the tables. You write in the title that the tables are independent:
SELECT T1.*
FROM TABLE1 T1
WHERE T1.COMMON_COLUMN='X'
UNION ALL
SELECT T2.*
FROM TABLE2 T2
WHERE T2.COMMON_COLUMN='X'
UNION ALL
SELECT T3.*
FROM TABLE3 T3
WHERE T3.COMMON_COLUMN='X'
...
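The difference between the two approaches can be checked with a small sketch. It uses Python's sqlite3 module and invented sample rows (three tables, of which TABLE2 has no 'X' row), so it is an illustration under those assumptions rather than the asker's real schema. The extra source column is also an addition, there only to show which table each row came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (COLUMNA TEXT, COMMON_COLUMN TEXT);
    CREATE TABLE TABLE2 (COLUMNA TEXT, COMMON_COLUMN TEXT);
    CREATE TABLE TABLE3 (COLUMNA TEXT, COMMON_COLUMN TEXT);
    INSERT INTO TABLE1 VALUES ('t1-row1', 'X'), ('t1-row2', 'Y');
    -- TABLE2 deliberately has no 'X' rows
    INSERT INTO TABLE2 VALUES ('t2-row1', 'Y');
    INSERT INTO TABLE3 VALUES ('t3-row1', 'X'), ('t3-row2', 'X');
""")

# LEFT JOIN version: the WHERE filters on T2 and T3 reject the NULLs that
# the outer joins produce, so a table without 'X' rows empties the result.
joined = conn.execute("""
    SELECT T1.*, T2.*, T3.*
    FROM TABLE1 T1
    LEFT JOIN TABLE2 T2 ON T1.COMMON_COLUMN = T2.COMMON_COLUMN
    LEFT JOIN TABLE3 T3 ON T1.COMMON_COLUMN = T3.COMMON_COLUMN
    WHERE T1.COMMON_COLUMN = 'X'
      AND T2.COMMON_COLUMN = 'X'
      AND T3.COMMON_COLUMN = 'X'
""").fetchall()

# UNION ALL version: each table is filtered on its own and the rows stacked.
stacked = conn.execute("""
    SELECT 'TABLE1' AS source, T1.* FROM TABLE1 T1 WHERE T1.COMMON_COLUMN = 'X'
    UNION ALL
    SELECT 'TABLE2' AS source, T2.* FROM TABLE2 T2 WHERE T2.COMMON_COLUMN = 'X'
    UNION ALL
    SELECT 'TABLE3' AS source, T3.* FROM TABLE3 T3 WHERE T3.COMMON_COLUMN = 'X'
""").fetchall()

print(len(joined), len(stacked))
```

Here the join returns nothing, while the UNION ALL returns the one matching row from TABLE1 and the two from TABLE3.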

Comparing rows of two tables in HIVE

I want to compare two tables in HIVE. More specifically, I want to see if table2 has any rows that are not in table1, and vice versa. So far I have this:
select count(A.PERS_KEY) from
table1 A left outer join table2 B
on A.PERS_GEN_KEY = B.PERS_KEY
where B.PERS_KEY IS NULL;
But this will only check for the PERS_KEY. How would I check to see if an entire row is in one table but not the other?
You can check based on pers_key in each direction. However, I am not sure why you would compare the entire row.
select pers_key from table1
where pers_key not in (select distinct PERS_GEN_KEY from table2);
select pers_gen_key from table2
where pers_gen_key not in (select distinct pers_key from table1);
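If whole rows really do need to be compared, one possible approach is to extend the asker's left outer join so that it matches on every column instead of only the key. The sketch below shows the idea in SQLite via Python. The two-column layout (pers_key, name) and the sample rows are assumptions made up for the example, and the helper rows_missing_from is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pers_key INTEGER, name TEXT);
    CREATE TABLE table2 (pers_key INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO table2 VALUES (1, 'ann'), (2, 'rob'), (4, 'dee');
""")

def rows_missing_from(left, right):
    """Return rows of `left` that have no identical row in `right`."""
    # Join on every column; unmatched rows come back with NULLs on the b side.
    sql = f"""
        SELECT a.pers_key, a.name
        FROM {left} a
        LEFT OUTER JOIN {right} b
          ON a.pers_key = b.pers_key AND a.name = b.name
        WHERE b.pers_key IS NULL
        ORDER BY a.pers_key
    """
    return conn.execute(sql).fetchall()

only_in_table1 = rows_missing_from("table1", "table2")
only_in_table2 = rows_missing_from("table2", "table1")
print(only_in_table1)
print(only_in_table2)
```

Note that this treats NULL values in the compared columns as never equal, and it assumes pers_key itself is never NULL, so tables with nullable columns would need extra handling.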

Query With Distinct Value

I have two tables, Table1 and Table2. My tables look something like this.
Table1 has three fields Cust_Number, sales_org and Orders.
Table 2 has fields by name Cust_Number, sales_org, BU and Dist_Channel.
Dist_Channel is missing in table1 hence I have to get the Distinct of Cust_Number and sales_org from TABLE 2 and then do a join with table1 to get the corresponding BU.
I was able to do it in MS access by creating one additional query to pull the distinct Numbers and then using that query in my final query.
Could anybody give some suggestions on this?
SELECT t1.cust_number, t2.bu, t2.dist_channel
FROM table1 t1, table2 t2
WHERE t1.cust_number = t2.cust_number
AND t1.sales_org = t2.sales_org
-- AND <your actual criteria go here>
Try this simple version:
SELECT t1.cust_number, t1.sales_org, t2.BU
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.cust_number = t2.cust_number AND t1.sales_org = t2.sales_org
You can do it like this:
SELECT tbl1.cust_number, tbl1.sales_org, tbl2.bu, tbl2.dist_channel
FROM Table1 as tbl1, Table2 as tbl2
WHERE tbl1.cust_number = tbl2.cust_number
AND tbl1.sales_org = tbl2.sales_org
......
Cust_Number and sales_org are common to both tables, so you can do this with a left join.
Also, first check whether the distinct count of (Cust_Number, sales_org) pairs in Table2 is the same as its row count.
SELECT Table1.Cust_Number, Table1.sales_org, Table2.BU
FROM Table1
LEFT JOIN Table2 ON Table1.Cust_Number = Table2.Cust_Number AND Table1.sales_org = Table2.sales_org
Santhosa, as I understand the problem, Table2 can have multiple rows with the same customer number and sales org (but different dist_channels), so a simple join to that table will multiply the number of rows coming out of Table1, which is not what you want. Instead, you want to use Table2 simply as a lookup for the BU of a given customer number and sales org. We are assuming that there is a functional dependency from (cust_number, sales_org) --> BU, i.e. that for any given cust_number, sales_org pair in Table2, the BU is always the same.
If that assumption is true, then you can do the following:
SELECT tb1.cust_number, tb1.sales_org, tb2.bu
FROM Table1 AS tb1
JOIN (
SELECT DISTINCT cust_number, sales_org, bu
FROM Table2) AS tb2
ON tb1.cust_number = tb2.cust_number
AND tb1.sales_org = tb2.sales_org
But keep in mind that if you have multiple BUs for a given cust_number, sales_org pair, then this will still result in multiple rows being returned for a given cust_number, sales_org pair in Table1.
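The row multiplication and the effect of the DISTINCT subquery can be illustrated with a small sketch, run here in SQLite via Python. The sample rows are invented for the example: customer 100 appears in Table2 under two distribution channels but with a single BU, which matches the functional-dependency assumption above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (cust_number INTEGER, sales_org TEXT, orders INTEGER);
    CREATE TABLE Table2 (cust_number INTEGER, sales_org TEXT,
                         bu TEXT, dist_channel TEXT);
    INSERT INTO Table1 VALUES (100, 'US01', 7), (200, 'DE01', 3);
    INSERT INTO Table2 VALUES
        (100, 'US01', 'Retail', 'Online'),
        (100, 'US01', 'Retail', 'Store'),
        (200, 'DE01', 'Wholesale', 'Online');
""")

# Plain join: customer 100 matches two Table2 rows, so it appears twice.
plain = conn.execute("""
    SELECT tb1.cust_number, tb1.sales_org, tb2.bu
    FROM Table1 AS tb1
    JOIN Table2 AS tb2
      ON tb1.cust_number = tb2.cust_number
     AND tb1.sales_org = tb2.sales_org
""").fetchall()

# Join against the distinct lookup: one row per Table1 row.
deduped = conn.execute("""
    SELECT tb1.cust_number, tb1.sales_org, tb2.bu
    FROM Table1 AS tb1
    JOIN (SELECT DISTINCT cust_number, sales_org, bu FROM Table2) AS tb2
      ON tb1.cust_number = tb2.cust_number
     AND tb1.sales_org = tb2.sales_org
    ORDER BY tb1.cust_number
""").fetchall()

print(len(plain), deduped)
```

With this data the plain join yields three rows and the deduplicated join yields two, one per row of Table1.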

How do I merge data from two tables in a single database call into the same columns?

If I run the two statements in a batch, will they return one table or two to my SqlCommand object, and will the data be merged? What I am trying to do is optimize a search by searching twice, the first time on one set of data and then a second time on another. They have the same fields, and I’d like all the records from both tables to be shown and combined. I need this so that I can sort across both sets of data, but short of writing a stored procedure I can’t think of a way of doing this.
E.g. Table 1 has columns A and B; Table 2 has the same columns but a different data source. I then want to merge them so that if a value of A exists in only one table, it is added to the result set, and if it exists in both tables, column B is summed across the two.
Please note that this is not the same as a full outer join operation as that does not merge the data.
[EDIT]
Here's what the code looks like:
Select * From
(Select ID,COUNT(*) AS Count From [Table1] Group By ID) as T1
full outer join
(Select ID,COUNT(*) AS Count From [Table2] Group By ID) as T2
on T1.ID = T2.ID
Perhaps you're looking for UNION?
IE:
SELECT A, B FROM Table1
UNION
SELECT A, B FROM Table2
Possibly:
select table1.a, table1.b
from table1
where table1.a not in (select a from table2)
union all
select table2.a, table2.b
from table2
where table2.a not in (select a from table1)
union all
select table1.a, table1.b+table2.b as b
from table1
inner join table2 on table1.a = table2.a
Edit: perhaps you would benefit from unioning the tables before counting, e.g.
select id, count(*) as count from
(select id from table1
union all
select id from table2) t
group by id
I'm not sure if I understand completely but you seem to be asking about a UNION
SELECT A,B
FROM tableX
UNION ALL
SELECT A,B
FROM tableY
To do it, you would go:
SELECT * INTO TABLE3 FROM TABLE1
UNION
SELECT * FROM TABLE2
Provided both tables have the same columns
I think what you are looking for is this, but I am not sure I am understanding your language correctly.
select id, sum(count) as count
from (
select id, count(*) as count
from table1
group by id
union all
select id, count(*) as count
from table2
group by id
) a
group by id
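For the A/B example in the question, the same union-then-aggregate idea might look like the sketch below, run in SQLite via Python. The sample values are made up: 'x' exists only in Table1, 'z' only in Table2, and 'y' in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (A TEXT, B INTEGER);
    CREATE TABLE Table2 (A TEXT, B INTEGER);
    INSERT INTO Table1 VALUES ('x', 1), ('y', 2);
    INSERT INTO Table2 VALUES ('y', 10), ('z', 20);
""")

# Stack both tables, then sum B per value of A. Keys present in only one
# table pass through unchanged; keys present in both are added together.
merged = conn.execute("""
    SELECT A, SUM(B) AS B
    FROM (
        SELECT A, B FROM Table1
        UNION ALL
        SELECT A, B FROM Table2
    ) AS u
    GROUP BY A
    ORDER BY A
""").fetchall()

print(merged)
```

Because the merge happens in a single statement, the caller gets back one result set that can be sorted as a whole.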