Hive join with distinct

I have two tables TableA and TableB.
TableA has columns REC_NUM and ITEM_ID.
TableB has columns ITEM_ID, UNITS.
I need to take the distinct ITEM_ID values from TableA and fetch all the
records from TableB whose ITEM_ID matches one of them.
Can someone please let me know how I can do this?

As per the question, the schemas are:
TABLEA: REC_NUM INT, ITEM_ID INT
TABLEB: ITEM_ID INT, UNITS INT
The following query should work:
SELECT b.* FROM (SELECT DISTINCT ITEM_ID FROM TABLEA) a JOIN TABLEB b ON a.ITEM_ID=b.ITEM_ID;
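To see why the DISTINCT subquery matters, here is a minimal sketch that runs the query on SQLite through Python's sqlite3 module rather than on Hive. The sample rows are made up for illustration, and Hive itself is not involved, so treat it only as a check of the SQL logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (REC_NUM INT, ITEM_ID INT);
    CREATE TABLE TABLEB (ITEM_ID INT, UNITS INT);
    -- Made-up sample rows: ITEM_ID 10 appears twice in TABLEA.
    INSERT INTO TABLEA VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO TABLEB VALUES (10, 5), (20, 7), (30, 9);
""")

# Joining TABLEA directly repeats a TABLEB row once per matching TABLEA row.
plain_join = conn.execute("""
    SELECT b.ITEM_ID, b.UNITS
    FROM TABLEA a JOIN TABLEB b ON a.ITEM_ID = b.ITEM_ID
    ORDER BY b.ITEM_ID
""").fetchall()

# Joining the DISTINCT ITEM_ID list returns each matching TABLEB row once.
distinct_join = conn.execute("""
    SELECT b.ITEM_ID, b.UNITS
    FROM (SELECT DISTINCT ITEM_ID FROM TABLEA) a
    JOIN TABLEB b ON a.ITEM_ID = b.ITEM_ID
    ORDER BY b.ITEM_ID
""").fetchall()

print(plain_join)     # [(10, 5), (10, 5), (20, 7)]
print(distinct_join)  # [(10, 5), (20, 7)]
```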

Correct me if I am wrong, but this should also work:
select ITEM_ID, UNITS from TableB where ITEM_ID in (select ITEM_ID from TableA)
I am not sure why you want to use distinct. Should it be used with the column REC_NUM?
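A small sketch of the IN form, again run on SQLite via Python's sqlite3 as a stand-in for Hive and with made-up sample rows. Because IN is only a membership test, duplicate ITEM_ID values in TableA should not multiply the rows returned from TableB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (REC_NUM INT, ITEM_ID INT);
    CREATE TABLE TableB (ITEM_ID INT, UNITS INT);
    -- Made-up sample rows: ITEM_ID 10 appears twice in TableA.
    INSERT INTO TableA VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO TableB VALUES (10, 5), (20, 7), (30, 9);
""")

# Each TableB row is tested once against the set of TableA ITEM_IDs.
in_rows = conn.execute("""
    SELECT ITEM_ID, UNITS
    FROM TableB
    WHERE ITEM_ID IN (SELECT ITEM_ID FROM TableA)
    ORDER BY ITEM_ID
""").fetchall()

print(in_rows)  # [(10, 5), (20, 7)]
```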

Related

join two tables in sql using common column

I have two tables.
tablea contains assetID, branchID, latID, lonID. Each row is unique.
assetID, branchID, latID, lonID
For every assetID in tablea, there are 32 entries in tableb in the following format:
assetID, branchID, risk1, risk2, risk3, risk4
I want to randomly select 10 rows from tablea, pull the data from tableb for those assetIDs, and join them to get a table in the following format:
assetID, branchID, latID, lonID, risk1, risk2, risk3, risk4
So far I have the below sql query but I am unable to join the two tables:
select * from tableb where branchID <2 and assetID in
(select top 10 assetID from tablea where assetID is not null and branchID <2)

Does this solve your problem?
select *
from (
    /* take 10 rows from tablea; without an order by on a random function they are not random */
    select *
    from tablea
    where branchID < 2
    limit 10
) as tablea
join tableb /* pull the related data from tableb */
    on tablea.assetID = tableb.assetID and tableb.branchID < 2

How to count rows matching multiple filters in SQL?

I have data in which I'm aiming to find rows that have unique values of their main_ID column and then count the total of those IDs that also have either of 2 values for another ID column.
I am trying this:
SELECT COUNT(DISTINCT(main_id))
FROM (SELECT other_id, main_id FROM database.table WHERE other_id ='5') a INNER JOIN
(SELECT other_id, main_id FROM database.table WHERE other_id ='6') b USING (main_id)
This returns an error at (SELECT saying subquery in FROM must have an alias. I've never coded in SQL before so I'm not sure what to start with addressing this. As I understand it, it wants aliases for the 2 columns - how do I assign these for my inner join?

Your query can be simplified to this:
select count(*) from (
    select main_id
    from database.table
    where other_id in ('5','6')
    group by main_id
    having count(distinct other_id) = 2
) t
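The GROUP BY / HAVING approach can be checked on a tiny data set. The sketch below uses SQLite through Python's sqlite3; the table name t and the sample rows are hypothetical stand-ins for database.table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (main_id INT, other_id TEXT);
    INSERT INTO t VALUES
        (1, '5'), (1, '6'),            -- has both values
        (2, '5'),                      -- only '5'
        (3, '6'), (3, '6'),            -- only '6', repeated
        (4, '5'), (4, '6'), (4, '7');  -- both values plus an unrelated one
""")

# Keep main_ids that have two different other_id values among '5' and '6',
# then count how many such main_ids there are.
both_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT main_id
        FROM t
        WHERE other_id IN ('5', '6')
        GROUP BY main_id
        HAVING COUNT(DISTINCT other_id) = 2
    ) AS matched
""").fetchone()[0]

print(both_count)  # 2 (main_id 1 and main_id 4)
```

COUNT(DISTINCT other_id) is what keeps main_id 3 out: it has two rows, but only one distinct value.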

An inner join needs to follow this structure:
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
Your query needs to state the relation between the two subqueries. Add an ON clause that equates the key column of a with the key column of b. Here the only shared key the subqueries expose is main_id, and the counted column must be qualified once ON is used:
SELECT COUNT(DISTINCT a.main_id)
FROM (SELECT other_id, main_id FROM database.table WHERE other_id ='5') a INNER JOIN
(SELECT other_id, main_id FROM database.table WHERE other_id ='6') b ON
a.main_id = b.main_id
You can read more about inner joins.

How do I join two tables together (one to many relationship), but only select the 3rd match from the second table?

I have two tables, table A and table B. There are multiple entries in table B for each entry in table A when joining them together, but I only want to match the 3rd value from table B, which is neither the maximum nor the minimum of the values. The values can be ordered, and it will always be the 3rd value after ordering. Is there a way to do this? Thank you!

WITH
  ranked_b AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY val) AS key_rank
  FROM
    table_b
)
SELECT
  *
FROM
  table_a
INNER JOIN
  ranked_b
    ON  ranked_b.key = table_a.key
    AND ranked_b.key_rank = 3
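As a rough check of the ROW_NUMBER() approach, the sketch below runs it on SQLite (3.25 or later, for window functions) through Python's sqlite3. The column names k, label and val and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (k INT, label TEXT);
    CREATE TABLE table_b (k INT, val INT);
    INSERT INTO table_a VALUES (1, 'one'), (2, 'two');
    -- Key 1 has four values; key 2 has only two, so it has no third value.
    INSERT INTO table_b VALUES (1, 40), (1, 10), (1, 30), (1, 20),
                               (2, 7), (2, 9);
""")

# Rank table_b rows per key by val, then join only the row ranked third.
third = conn.execute("""
    WITH ranked_b AS (
        SELECT k, val,
               ROW_NUMBER() OVER (PARTITION BY k ORDER BY val) AS key_rank
        FROM table_b
    )
    SELECT table_a.k, table_a.label, ranked_b.val
    FROM table_a
    INNER JOIN ranked_b
        ON ranked_b.k = table_a.k
       AND ranked_b.key_rank = 3
""").fetchall()

print(third)  # [(1, 'one', 30)]
```

With an inner join, keys that have fewer than three values drop out of the result; a LEFT JOIN would keep them with a NULL value instead.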

Consider the approach below (BigQuery syntax):
select key,
  array_agg(value order by value limit 3)[safe_ordinal(3)] as value
from tableA
left join tableB
  on key = foreignkey
group by key

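You can use a correlated subquery. It needs an order by, otherwise the row picked by offset 2 is arbitrary:
select a.*,
       (select b.value
        from b
        where b.key = a.key
        order by b.value
        limit 1 offset 2
       ) as third_value
from a;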

Insert into table without duplicates

I have a Table A from which I have to copy data to Table B. The problem is that both Table A and Table B have a column ID which is the primary key and can't be null, and Table A has duplicates. Can anyone tell me how to insert data into Table B from Table A without duplicates?

It would be something like this, which copies only the IDs that Table B does not already contain:
INSERT INTO TableB (ID) SELECT DISTINCT A.ID FROM TableA A LEFT JOIN TableB B ON B.ID = A.ID WHERE B.ID IS NULL

You can use the DISTINCT keyword in a select statement to remove duplicates.
In this example I'm going to assume that both tables have 3 columns called ID, Name and Surname:
insert into tableB (ID, Name, Surname)
select distinct
ID
,Name
,Surname
from tableA
;
Please note that DISTINCT applies to whole rows, not to ID alone: rows that share an ID but differ in Name or Surname are all kept and would violate the primary key on tableB.
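The row-level behaviour of DISTINCT can be seen in a small sketch run on SQLite through Python's sqlite3. The sample rows are made up, and tableA is declared without a primary key here (an assumption, since it has to hold duplicates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INT, Name TEXT, Surname TEXT);
    CREATE TABLE tableB (ID INT PRIMARY KEY NOT NULL, Name TEXT, Surname TEXT);
    -- ID 1 is duplicated as a fully identical row.
    INSERT INTO tableA VALUES (1, 'Ann', 'Lee'), (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
""")

insert_sql = """
    INSERT INTO tableB (ID, Name, Surname)
    SELECT DISTINCT ID, Name, Surname FROM tableA
"""

# Identical duplicate rows collapse into one, so the insert succeeds.
conn.execute(insert_sql)
copied = conn.execute("SELECT * FROM tableB ORDER BY ID").fetchall()
print(copied)  # [(1, 'Ann', 'Lee'), (2, 'Bob', 'Ray')]

# Add a row with the same ID but a different Surname and try again.
conn.execute("DELETE FROM tableB")
conn.execute("INSERT INTO tableA VALUES (2, 'Bob', 'Roy')")
try:
    conn.execute(insert_sql)
    conflict = False
except sqlite3.IntegrityError:
    # DISTINCT kept both ID 2 rows, so the primary key is violated.
    conflict = True
print(conflict)  # True
```

If such partial duplicates exist, the select has to decide which row per ID to keep, for example by grouping on ID.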

SQL: Single select query from 2 Non-Joining tables

I have 2 tables which don't have any references to each other, and I'm trying to create a third table (for reference lookup) by selecting fields from both tables.
TableA has an A_ID
TableB has a B_ID
I want to create Table C which has a 1 to 1 reference between A_ID to B_ID
where A_ID = FirstID and B_ID = SecondID, I can't join the 2 tables because there's nothing in common.
Something like:
Insert INTO [TableC]
(FirstID, SecondID)
SELECT
A_ID As FirstID,
(Select B_ID From TableB)
FROM TableA
Basically we are creating a relationship now with Table C so that we can use it to reference the two tables in the future using their IDs.

Assuming TableA and TableB truly have nothing in common, then what you want is the Cartesian product, that is, every row in A paired with every row in B.
INSERT INTO TableC(FirstID,SecondID)
SELECT A_ID,B_ID
FROM TableA
CROSS JOIN TableB
Perhaps what you really want is to join them by ROW_NUMBER().
INSERT INTO TableC(FirstID,SecondID)
SELECT A_ID,B_ID
FROM (SELECT A_ID,ROW_NUMBER() OVER (ORDER BY whatever) as rownumA FROM TableA) a
FULL OUTER JOIN (SELECT B_ID,ROW_NUMBER() OVER (ORDER BY whatever) as rownumB FROM TableB) b ON a.rownumA=b.rownumB
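The difference between the two options can be sketched on SQLite through Python's sqlite3. The sample IDs are made up, the ordering columns are assumed to be the IDs themselves, and an inner join replaces the FULL OUTER JOIN because older SQLite versions lack it; with equally sized tables the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (A_ID INT);
    CREATE TABLE TableB (B_ID INT);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (100), (200), (300);
""")

# Cartesian product: every A_ID combined with every B_ID.
cross_rows = conn.execute(
    "SELECT A_ID, B_ID FROM TableA CROSS JOIN TableB"
).fetchall()

# Positional pairing: number the rows of each table and join on that number.
paired_rows = conn.execute("""
    SELECT a.A_ID, b.B_ID
    FROM (SELECT A_ID, ROW_NUMBER() OVER (ORDER BY A_ID) AS rn FROM TableA) a
    JOIN (SELECT B_ID, ROW_NUMBER() OVER (ORDER BY B_ID) AS rn FROM TableB) b
      ON a.rn = b.rn
    ORDER BY a.A_ID
""").fetchall()

print(len(cross_rows))  # 9
print(paired_rows)      # [(1, 100), (2, 200), (3, 300)]
```

The FULL OUTER JOIN in the answer additionally keeps leftover rows, padded with NULL, when one table is longer than the other.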
