Getting the count from two tables in Apache HIVE or SQL - sql

So I have two tables:
table_1 and table_2
They both have various columns with the same name.
We only need to work with 2 columns:
ID and REGION
table_1 has ID fields that are distinct to table_1 only.
table_2 has ID fields that are distinct to table_2 only.
However, some ID fields are shared by both table_1 and table_2.
I need to write a query where I get the number of distinct ID values across both tables where REGION = '1'.

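A FULL OUTER JOIN should do the trick, as long as each side is first reduced to the distinct IDs with REGION = '1'.
SELECT COUNT(*)
FROM (SELECT DISTINCT id FROM table_1 WHERE region = '1') t1
FULL OUTER JOIN (SELECT DISTINCT id FROM table_2 WHERE region = '1') t2
ON (t1.id = t2.id)
It will create a single row for every id that is in either subquery. If the id is in both, it will still create a single row, so COUNT(*) is the number of distinct IDs in region '1'.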
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Using SQL, take advantage of a UNION to eliminate duplicate values between the two tables, so you're left with a distinct list of ID values to count.
SELECT COUNT(*)
FROM (SELECT ID
      FROM table_1
      WHERE REGION = '1'
      UNION
      SELECT ID
      FROM table_2
      WHERE REGION = '1') t
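To try the UNION approach without a Hive cluster, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration; only the table names and the ID and REGION columns come from the question.

```python
import sqlite3

# Hypothetical sample data; only ID and REGION are taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (ID INTEGER, REGION TEXT);
    CREATE TABLE table_2 (ID INTEGER, REGION TEXT);
    INSERT INTO table_1 VALUES (1, '1'), (2, '1'), (3, '2');
    INSERT INTO table_2 VALUES (2, '1'), (4, '1'), (5, '2');
""")

# UNION (without ALL) removes duplicates, so an ID present in both
# tables contributes a single row to the derived table t.
query = """
    SELECT COUNT(*)
    FROM (SELECT ID FROM table_1 WHERE REGION = '1'
          UNION
          SELECT ID FROM table_2 WHERE REGION = '1') t
"""
distinct_ids = conn.execute(query).fetchone()[0]
print(distinct_ids)  # IDs 1, 2 and 4 are in region '1'; ID 2 is counted once
```

With UNION ALL instead of UNION, the shared ID 2 would be counted twice.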

Related

Get rows from two tables with multiple matches in the other

I have two tables with the same fields:
table_1:
field_1, field_2, field_3, field_4
table_2:
field_1, field_2, field_3, field_4
Here field_1 can be used as foreign key to join both tables.
I would like to get all the rows from table_1 and table_2 that have at least one row in table_1 but more than one in table_2, or vice versa.
So far I have tried these related solutions:
https://dba.stackexchange.com/questions/144313/how-do-i-find-mismatches-in-two-tables
Compare two tables, find missing rows and mismatched data.
Assuming both tables have the same row type: all the same column names and types (at least compatible), you can work with row types to simplify:
SELECT (t).*
FROM (SELECT t, count(*) AS ct1 FROM table_1 t GROUP BY 1) t1
JOIN (SELECT t, count(*) AS ct2 FROM table_2 t GROUP BY 1) t2 USING (t)
WHERE t1.ct1 > 1
OR t2.ct2 > 1;
Group duplicates and remember the count in each table.
Join the two tables, which removes all rows without match in the other table.
Filter rows where at least one side has more than one copy.
In the outer SELECT decompose the row type to get columns as usual.
I don't return row counts. If you need those, add ct1 and ct2 in the outer SELECT.
This requires every column type to support the equality operator =.
A prominent example that does not is json. (But jsonb does.) See:
How to query a json column for empty objects?
If you have such columns, cast to text to work around it. Or you can work with hash values - which also helps performance for very wide rows and/or many duplicates. Related:
Why doesn't my UNIQUE constraint trigger?
One way of getting all records from table_1 which have more than one matching record in table_2 is to count the number of matching records in a subquery, and put a condition on it:
SELECT *
FROM table_1 t1
WHERE (SELECT count(*)
FROM table_2 t2
WHERE t1.field_1 = t2.field_1) > 1
If you're looking to have both sides of this in one query, you can combine them with a UNION:
SELECT *
FROM table_1 t1
WHERE (SELECT count(*)
FROM table_2 t2
WHERE t1.field_1 = t2.field_1) > 1
UNION
SELECT *
FROM table_2 t2
WHERE (SELECT count(*)
FROM table_1 t1
WHERE t1.field_1 = t2.field_1) > 1
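As a quick check of the correlated-subquery filter, here is a small sketch run through Python's sqlite3 module. The rows are invented and the tables are cut down to two columns for brevity.

```python
import sqlite3

# Hypothetical rows; the tables are reduced to two columns for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (field_1 INTEGER, field_2 TEXT);
    CREATE TABLE table_2 (field_1 INTEGER, field_2 TEXT);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table_2 VALUES (1, 'x'), (1, 'y'), (2, 'z');
""")

# Keep only the table_1 rows that have more than one match in table_2.
query = """
    SELECT field_1, field_2
    FROM table_1 t1
    WHERE (SELECT count(*)
           FROM table_2 t2
           WHERE t1.field_1 = t2.field_1) > 1
    ORDER BY field_1
"""
rows = conn.execute(query).fetchall()
print(rows)  # only field_1 = 1 has two matching rows in table_2
```

field_1 = 2 has a single match and field_1 = 3 has none, so both are filtered out.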

How to map each distinct value of a column in one table with each distinct value of a column in another table in Hive

I have two tables in Hive, Table1 and Table2. I want to get each distinct customerID in Table1 and map it to each distinct value in a column called category of Table2. However I am a bit lost on how to do this in hive. A better example of what I am trying to do is the following: Let's say Table1 contains 5 distinct customerID's and Table2 contains 3 distinct categories. I want my query result to look something like the following:
However, Table1 and Table2 do not have any columns in common, so I am a bit lost on how to perform a join on these two tables in Hive. Is this task possible in Hive? Any insights on this would be greatly appreciated!
You can do that with a cross join of distinct values from both tables.
select t1.customerID, t2.category
from (select distinct customerID from Table1) t1
cross join (select distinct category from Table2) t2
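Here is a small sketch of the same cross join, run in SQLite through Python rather than in Hive. The sample values are made up, and the column names follow the question (customerID, category).

```python
import sqlite3

# Hypothetical data: 2 distinct customers and 2 distinct categories,
# each with a duplicate row to show why DISTINCT is needed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (customerID INTEGER);
    CREATE TABLE Table2 (category TEXT);
    INSERT INTO Table1 VALUES (1), (1), (2);
    INSERT INTO Table2 VALUES ('A'), ('B'), ('A');
""")

# A cross join needs no common column: it pairs every row of the
# left input with every row of the right input.
query = """
    SELECT t1.customerID, t2.category
    FROM (SELECT DISTINCT customerID FROM Table1) t1
    CROSS JOIN (SELECT DISTINCT category FROM Table2) t2
    ORDER BY t1.customerID, t2.category
"""
pairs = conn.execute(query).fetchall()
print(pairs)  # 2 customers x 2 categories = 4 pairs
```

With 5 distinct customers and 3 distinct categories, as in the question, the same query would return 15 rows.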

Query With Distinct Value

I have two tables, Table1 and Table2. My tables look something like this.
Table1 has three fields Cust_Number, sales_org and Orders.
Table 2 has fields by name Cust_Number, sales_org, BU and Dist_Channel.
Dist_Channel is missing in table1 hence I have to get the Distinct of Cust_Number and sales_org from TABLE 2 and then do a join with table1 to get the corresponding BU.
I was able to do it in MS access by creating one additional query to pull the distinct Numbers and then using that query in my final query.
Could anybody give some suggestions on this?
SELECT t1.cust_number, t2.bu, t2.dist_channel
FROM table1 t1, table2 t2
WHERE t1.cust_number = t2.cust_number
AND t1.sales_org = t2.sales_org
-- AND <your actual criteria go here>
Try this simple version:
SELECT t1.cust_number, t1.sales_org, t2.BU
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.cust_number = t2.cust_number AND t1.sales_org = t2.sales_org
You can do it like this:
SELECT tbl1.cust_number, tbl1.sales_org, tbl2.bu, tbl2.dist_channel
FROM Table1 as tbl1, Table2 as tbl2
WHERE tbl1.cust_number = tbl2.cust_number
AND tbl1.sales_org = tbl2.sales_org
......
Cust_Number and sales_org are common to both tables, so you can do this with a left join.
Also check first whether the distinct count of (Cust_Number, sales_org) in Table2 equals its row count; if it does not, the join returns more than one row for some Table1 rows.
SELECT Table1.Cust_Number, Table1.sales_org, Table2.BU
FROM Table1
LEFT JOIN Table2 ON Table1.Cust_Number = Table2.Cust_Number AND Table1.sales_org = Table2.sales_org
Santhosa, as I understand the problem, it's that in Table2 you can have multiple rows that have the same customer number and sales org (but different dist_channels), so that a simple join to the table will multiply the number of rows coming out of Table1, which is not what you want. Instead, you want to use Table2 simply as a lookup for the BU for a customer number and sales org, and we are assuming that there is a functional dependency from cust_number, sales_org --> BU, i.e. that for any given cust_number, sales_org pair in Table2, the BU is always the same.
If that assumption is true, then you can do the following:
SELECT tb1.cust_number, tb1.sales_org, tb2.bu
FROM Table1 AS tb1
JOIN (
SELECT DISTINCT cust_number, sales_org, bu
FROM Table2) AS tb2
ON tb1.cust_number = tb2.cust_number
AND tb1.sales_org = tb2.sales_org
But keep in mind that if you have multiple BUs for a given cust_number, sales_org pair, then this will still result in multiple rows being returned for a given cust_number, sales_org pair in Table1.
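To make the row multiplication concrete, here is a rough sketch in Python's sqlite3 that compares the plain join with the DISTINCT lookup join. The data is invented and assumes the functional dependency described above holds (one BU per cust_number, sales_org pair).

```python
import sqlite3

# Hypothetical data: customer 100 appears twice in Table2 with the
# same BU but different Dist_Channel values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Cust_Number INTEGER, sales_org TEXT, Orders INTEGER);
    CREATE TABLE Table2 (Cust_Number INTEGER, sales_org TEXT, BU TEXT, Dist_Channel TEXT);
    INSERT INTO Table1 VALUES (100, 'S1', 5), (200, 'S2', 7);
    INSERT INTO Table2 VALUES (100, 'S1', 'BU1', 'D1'),
                              (100, 'S1', 'BU1', 'D2'),
                              (200, 'S2', 'BU2', 'D1');
""")

# Plain join: the Table1 row for customer 100 is returned twice.
plain = conn.execute("""
    SELECT tb1.Cust_Number, tb1.sales_org, tb2.BU
    FROM Table1 AS tb1
    JOIN Table2 AS tb2
      ON tb1.Cust_Number = tb2.Cust_Number AND tb1.sales_org = tb2.sales_org
""").fetchall()

# Lookup join: Table2 is collapsed to one row per pair before joining.
lookup = conn.execute("""
    SELECT tb1.Cust_Number, tb1.sales_org, tb2.BU
    FROM Table1 AS tb1
    JOIN (SELECT DISTINCT Cust_Number, sales_org, BU FROM Table2) AS tb2
      ON tb1.Cust_Number = tb2.Cust_Number AND tb1.sales_org = tb2.sales_org
    ORDER BY tb1.Cust_Number
""").fetchall()

print(len(plain), lookup)
```

The plain join yields three rows for two Table1 rows, while the lookup join yields one row per Table1 row.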

How to compare tables and find duplicates and also find columns with different value

I have the following tables in Oracle 10g:
Table1
Name Status
a closed
b live
c live
Table2
Name Status
a final
b live
c live
Neither table has a primary key, and I am trying to write a query that returns the identical rows without looping over both tables and comparing rows/columns. If the Status column differs, the row in Table2 takes precedence.
So in the above example my query should return this:
Name Status
a final
b live
c live
Since you mentioned that neither table has a primary key, I'm assuming that a row may exist in Table1, Table2, or both. The query below uses common table expressions and a window function to get that result.
WITH unionTable
AS
(
SELECT Name, Status, 1 AS ordr FROM Table1
UNION
SELECT Name, Status, 2 AS ordr FROM Table2
),
ranks
AS
(
SELECT Name, Status,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY ordr DESC) rn
FROM unionTable
)
SELECT Name, Status
FROM ranks
WHERE rn = 1
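The ROW_NUMBER query can be checked against the sample data from the question with a short sketch. This one runs in SQLite through Python rather than Oracle 10g, and assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Sample rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Name TEXT, Status TEXT);
    CREATE TABLE Table2 (Name TEXT, Status TEXT);
    INSERT INTO Table1 VALUES ('a', 'closed'), ('b', 'live'), ('c', 'live');
    INSERT INTO Table2 VALUES ('a', 'final'), ('b', 'live'), ('c', 'live');
""")

# ordr marks the source table; ordering by it descending puts the
# Table2 row first within each Name, so rn = 1 prefers Table2.
query = """
    WITH unionTable AS (
        SELECT Name, Status, 1 AS ordr FROM Table1
        UNION
        SELECT Name, Status, 2 AS ordr FROM Table2
    ),
    ranks AS (
        SELECT Name, Status,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ordr DESC) AS rn
        FROM unionTable
    )
    SELECT Name, Status
    FROM ranks
    WHERE rn = 1
    ORDER BY Name
"""
result = conn.execute(query).fetchall()
print(result)
```

A name that exists only in Table1 would still be returned, because its Table1 row is then the only one in its partition.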
Something like this?
SELECT table1.Name, table2.Status
FROM table1
INNER JOIN table2 ON table1.Name = table2.Name
By always returning table2.Status you've covered both the case when they're the same and when they're different (essentially it doesn't matter what the value of table1.Status is).

Help with Joins

My first table has about 18K records, so when I run
select * from table2
I get about 18K rows.
I'm trying to do a join on it as follows, but I'm getting about 26K rows back. What am I doing wrong? I thought it was supposed to return all of the "right" (table2) records, plus whatever value matches from the first table in a separate column.
Select t1.fID , t2.*
FROM table1 t1 right join table2 t2 on t1.fName = t2.f
Here is an example of my tables:
table 1: fID, fName
table 2: id, f, address, etc.
I need to get all records from table 2, with an fID column, whenever f = fName.
table1 has many rows whose fName value matches the same row in table2.
For example, say 5k rows in table2 have no matching rows in table1, and the remaining 13k table2 rows each match an average of 2 rows in table1: the join then returns 5k + 2 × 13k = 31k rows.
This happens because you have also asked for a column from table1. You'll see multiple t1.fID values for a given t2.f, or NULLs where there is no match.
If t1.fName and t2.f aren't unique identifiers for their tables, you will find that rows from table1 are being joined with multiple rows from table2.
The RIGHT JOIN keyword returns all rows from the right table (table_name2), even if there are no matches in the left table (table_name1).
So it looks like you do not have your matching criteria set correctly or you have no matches.
This is possible when some values of the join column are repeated in Table2 (f) and/or Table1 (fName).
Run these queries and see:
SELECT f, COUNT(1) FROM Table2 GROUP BY f HAVING COUNT(1) > 1
SELECT fName, COUNT(1) FROM Table1 GROUP BY fName HAVING COUNT(1) > 1
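The effect is easy to reproduce on a toy dataset. The sketch below uses Python's sqlite3 with invented rows, and writes the join as table2 LEFT JOIN table1, which returns the same rows as the RIGHT JOIN in the question (older SQLite builds have no RIGHT JOIN).

```python
import sqlite3

# Hypothetical rows: fName 'x' appears twice in table1,
# and 'z' in table2 has no match at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (fID INTEGER, fName TEXT);
    CREATE TABLE table2 (id INTEGER, f TEXT, address TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'x'), (3, 'y');
    INSERT INTO table2 VALUES (10, 'x', 'addr1'), (11, 'y', 'addr2'), (12, 'z', 'addr3');
""")

# Every table2 row is kept; a row with two matches is emitted twice.
joined = conn.execute("""
    SELECT t1.fID, t2.*
    FROM table2 t2
    LEFT JOIN table1 t1 ON t1.fName = t2.f
""").fetchall()

# The diagnostic query: which join-key values repeat in table1?
dups = conn.execute("""
    SELECT fName, COUNT(1) FROM table1 GROUP BY fName HAVING COUNT(1) > 1
""").fetchall()

print(len(joined), dups)
```

table2 has three rows but the join returns four, because 'x' matches two table1 rows; the 'z' row comes back once with a NULL fID.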