Comparing multiple columns between tables in Hive

I have two tables
Table A:
Fruit Number
Apple 7235
Plum 1284
Pear 8932
Orange 2839
Table B:
Fruit Number
Apple 7235
Apple 3893
Plum 1284
Pear 8932
Orange 2839
Orange 4732
I want my query to return the rows that do not match between the two tables. For example, new Table C:
Fruit Number
Apple 3893
Orange 4732
I tried joins, but the join only picks up the first occurrence of each record. How can I achieve the desired result above?

Use a full join which gets you rows missing on either side.
select fruit,coalesce(num1,num2) as number
from (select coalesce(a.fruit,b.fruit) as fruit,a.number as num1,b.number as num2
from a
full join b on a.fruit=b.fruit and a.number=b.number
where a.number is null or b.number is null
) t
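As a rough check of the idea, here is a minimal sketch that runs the same mismatch logic against the sample data. It assumes SQLite through Python's sqlite3 module rather than Hive, and because older SQLite builds lack FULL JOIN, it emulates the full join with two LEFT JOIN anti-joins glued by UNION ALL. The variable names are illustrative only.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (fruit TEXT, number INTEGER);
CREATE TABLE b (fruit TEXT, number INTEGER);
INSERT INTO a VALUES ('Apple',7235),('Plum',1284),('Pear',8932),('Orange',2839);
INSERT INTO b VALUES ('Apple',7235),('Apple',3893),('Plum',1284),
                     ('Pear',8932),('Orange',2839),('Orange',4732);
""")

# Assumption: FULL JOIN is emulated. The first branch keeps rows of b with
# no exact (fruit, number) match in a; the second does the reverse.
mismatch_sql = """
SELECT b.fruit, b.number
FROM b
LEFT JOIN a ON a.fruit = b.fruit AND a.number = b.number
WHERE a.fruit IS NULL
UNION ALL
SELECT a.fruit, a.number
FROM a
LEFT JOIN b ON a.fruit = b.fruit AND a.number = b.number
WHERE b.fruit IS NULL
ORDER BY 1, 2
"""
rows = conn.execute(mismatch_sql).fetchall()
print(rows)
```

Joining on both fruit and number is what matters here: joining on fruit alone would pair Apple 3893 with Apple 7235 and hide the mismatch.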

Related

Case Statement to Add a Column

I have the following table:
ID Fruit
A apple
A banana
A grapes
B orange
B apple
B grapes
C grapes
C orange
C banana
I would like to add a new column called Apple to denote whether the ID is associated with apple or not:
ID Fruit Apple
A apple yes
A banana yes
A grapes yes
B orange yes
B apple yes
B grapes yes
C grapes no
C orange no
C banana no

Since this seems like a contrived example, I'll post several options. The best one will depend on what you're really doing.
First up, this is likely to perform best, but it risks duplicating rows if you could have multiple matches for the JOINed table. It's also the only solution I'm presenting to actually use a CASE expression as requested.
SELECT a.*, case when b.ID IS NOT NULL THEN 'Yes' ELSE 'No' END AS Apple
FROM MyTable a
LEFT JOIN MyTable b on b.ID = a.ID AND b.Fruit = 'Apple'
Alternatively, this will never duplicate rows, but has to re-run the nested query for each result row. If this is not a contrived example, but something more like homework, this is probably the expected result.
SELECT *, coalesce(
(
SELECT TOP 1 'Yes'
FROM MyTable b
WHERE b.ID = a.ID AND b.Fruit = 'Apple'
), 'No') As Apple
FROM MyTable a
Finally, this also re-runs the nested query for each result row, but there is the potential a future enhancement will improve on that and it makes it possible to provide values for multiple columns from the same nested subquery.
SELECT a.*, COALESCE(c.Apple, 'No') Apple
FROM MyTable a
OUTER APPLY (
SELECT TOP 1 'Yes' As Apple
FROM MyTable b
WHERE b.ID = a.ID AND b.Fruit = 'Apple'
) c
See them work here:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=e1991e8541e7421b90f601c7e8c8906b

It could be achieved without self JOIN by using windowed COUNT_IF:
SELECT *, COUNT_IF(Fruit = 'apple') OVER(PARTITION BY ID) > 0 AS Apple
FROM tab;

When this same question was asked again with more columns,
the windowed form Lukas advocates performed poorly, because the COUNT/SUM is evaluated per row.
In that case, a join between the raw rows and the aggregated results should perform better.
select *
FROM MyTable as a
natural join (
SELECT c.id,
iff(count_if(c.Fruit like 'apple') > 0, 'yes', 'no') as "Apple"
--, iff(count_if(c.Fruit like 'banana') > 0, 'yes', 'no') as "Banana"
--, iff(count_if(c.Fruit like 'orange') > 0, 'yes', 'no') as "Orange"
FROM MyTable as c
GROUP BY 1
) as b
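The aggregate-then-join shape can be tried on the sample data with a small sketch. It assumes SQLite through Python's sqlite3 module, which has neither COUNT_IF nor IFF, so a SUM over a CASE expression stands in for them, and an explicit join on id replaces the natural join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (id TEXT, fruit TEXT);
INSERT INTO MyTable VALUES
  ('A','apple'),('A','banana'),('A','grapes'),
  ('B','orange'),('B','apple'),('B','grapes'),
  ('C','grapes'),('C','orange'),('C','banana');
""")

# Aggregate once per id, then join the flag back onto every raw row.
# Assumption: SUM(CASE ...) > 0 replaces the COUNT_IF/IFF pair.
flag_sql = """
SELECT a.id, a.fruit, b.apple
FROM MyTable AS a
JOIN (
    SELECT id,
           CASE WHEN SUM(CASE WHEN fruit = 'apple' THEN 1 ELSE 0 END) > 0
                THEN 'yes' ELSE 'no' END AS apple
    FROM MyTable
    GROUP BY id
) AS b ON b.id = a.id
ORDER BY a.id, a.fruit
"""
flagged = conn.execute(flag_sql).fetchall()
for row in flagged:
    print(row)
```

Every row for A and B carries 'yes' and every row for C carries 'no', matching the table in the question.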

For the following table,
ID Fruit
A apple
A banana
A grapes
B orange
B apple
B grapes
C grapes
C orange
C banana
Adding a new column called Apple to denote whether ID is associated with apple or not, the resultset would be
ID Fruit Apple
A apple yes
A banana no
A grapes no
B orange no
B apple yes
B grapes no
C grapes no
C orange no
C banana no
If the expected resultset is as above, the below query will help to get the desired output.
select
id,
fruit,
case
when fruit = 'apple' then 'yes'
else 'no'
end as Apple
from Fruits;

Filtering based on multiple categories

I need to filter on the categories listed below: return any records that fall in categories A-F but do not have more than one item from the same category. I will try to explain with an example.
A Bread
B Apple
C Strawberry OR Blueberry OR Raspberry
D Watermelon OR Muskmelon OR Honeydew
E Papaya
F Oranges OR Peaches OR Nectarines
T1:
1
2
3
4
5
6
7
T2:
ID Category
1 Bread
2 Apple
2 Strawberry
3 Blueberry
3 Raspberry
4 Watermelon
5 Muskmelon
5 Honeydew
4 Papaya
2 Oranges
1 Peaches
5 Nectarines
In the above scenario, my search is to return:
1 Bread,Peaches
2 Apple,Strawberry, Oranges
4 Watermelon,Papaya
3 and 5 are not to be returned as they have items from the same category -
#3: Blueberry and Raspberry
#5: Muskmelon, Honeydew and Nectarines

First of all, you need a table -- call it CATEGORY_GROUPS ( category, category_group ) -- that relates this information from your post:
A Bread
B Apple
C Strawberry OR Blueberry OR Raspberry
D Watermelon OR Muskmelon OR Honeydew
E Papaya
F Oranges OR Peaches OR Nectarines
Where, for example, 'Bread' would be the category and 'A' would be the category group.
Then, you join t1, t2, and category_groups together in a query so you have every item, its category and its category group. Then group by the id.
The key part is how to exclude the ids that have duplicates. If an id has two items from the same category group, its number of distinct category groups will be less than its number of distinct categories; if it has none, the two counts are equal. So, you can require the counts to be equal in your HAVING clause to get what you want.
Something like this should work:
SELECT t1.id, listagg(t2.category,',') within group ( order by t2.category )
FROM t1 inner join t2 on t2.id = t1.id
inner join category_groups cg on cg.category = t2.category
GROUP BY t1.id
HAVING COUNT(DISTINCT cg.category_group) = COUNT(DISTINCT t2.category)
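To see which ids survive the filter, here is a minimal sketch on the question's data. It assumes SQLite through Python's sqlite3 module, so group_concat stands in for listagg, and it skips the join to t1 because t2 already carries the ids. The category_groups rows are filled in from the A-F list in the question. The HAVING clause keeps an id only when its count of distinct groups equals its count of distinct categories, that is, when no two of its items share a group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_groups (category TEXT, category_group TEXT);
INSERT INTO category_groups VALUES
  ('Bread','A'),('Apple','B'),
  ('Strawberry','C'),('Blueberry','C'),('Raspberry','C'),
  ('Watermelon','D'),('Muskmelon','D'),('Honeydew','D'),
  ('Papaya','E'),
  ('Oranges','F'),('Peaches','F'),('Nectarines','F');
CREATE TABLE t2 (id INTEGER, category TEXT);
INSERT INTO t2 VALUES
  (1,'Bread'),(2,'Apple'),(2,'Strawberry'),(3,'Blueberry'),(3,'Raspberry'),
  (4,'Watermelon'),(5,'Muskmelon'),(5,'Honeydew'),(4,'Papaya'),
  (2,'Oranges'),(1,'Peaches'),(5,'Nectarines');
""")

filter_sql = """
SELECT t2.id, group_concat(t2.category, ',') AS items
FROM t2
JOIN category_groups cg ON cg.category = t2.category
GROUP BY t2.id
HAVING COUNT(DISTINCT cg.category_group) = COUNT(DISTINCT t2.category)
ORDER BY t2.id
"""
# group_concat does not promise an order, so sort the items in Python.
result = {row_id: sorted(items.split(","))
          for row_id, items in conn.execute(filter_sql)}
print(result)
```

Ids 3 and 5 drop out because Blueberry/Raspberry share group C and Muskmelon/Honeydew share group D.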

merging two tables and adding additional column

I am using SQL Server. I have two tables (simple snapshot below).
table hlds table bench
name country wgt name country wgt
abc us 30 abc us 40
mno uk 50 ppp fr 45
xyz us 20 xyz us 15
What I would like to do is calculate the differences in the wgt columns and insert the results into another table; let's call it merge_tbl. I would also like merge_tbl to have a bit column that is 1 if the company exists in the table hlds.
So I would like the result to look like below,
merge_tbl
name country wgt inHld
abc us -10 1
mno uk 50 1
xyz us 5 1
ppp fr -45 0
How do I go about doing this?

I think you need a FULL OUTER JOIN to get records from both tables. Then, you can use an INSERT INTO SELECT statement to do the insert:
INSERT INTO merge_tbl
SELECT COALESCE(h.name, b.name) AS name,
COALESCE(h.country, b.country) AS country,
COALESCE(h.wgt, 0) - COALESCE(b.wgt, 0) AS wgt,
CASE WHEN h.name IS NOT NULL THEN 1
ELSE 0
END AS inHld
FROM hlds AS h
FULL OUTER JOIN bench AS b ON h.name = b.name AND h.country = b.country
The ON clause of the JOIN operation depends on your actual requirements. I have made the assumption that records from hlds, bench tables match if both name and country fields are equal.
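For a quick check against the sample rows, the following sketch reproduces the expected merge_tbl. It assumes SQLite through Python's sqlite3 module instead of SQL Server, and it emulates FULL OUTER JOIN with a LEFT JOIN from hlds plus an anti-join from bench, since older SQLite builds do not support full joins. The column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hlds  (name TEXT, country TEXT, wgt INTEGER);
CREATE TABLE bench (name TEXT, country TEXT, wgt INTEGER);
CREATE TABLE merge_tbl (name TEXT, country TEXT, wgt INTEGER, inHld INTEGER);
INSERT INTO hlds  VALUES ('abc','us',30),('mno','uk',50),('xyz','us',20);
INSERT INTO bench VALUES ('abc','us',40),('ppp','fr',45),('xyz','us',15);

-- Branch 1: every hlds row, matched to bench where possible (inHld = 1).
-- Branch 2: bench rows with no hlds counterpart (inHld = 0).
INSERT INTO merge_tbl
SELECT h.name, h.country, COALESCE(h.wgt, 0) - COALESCE(b.wgt, 0), 1
FROM hlds h
LEFT JOIN bench b ON b.name = h.name AND b.country = h.country
UNION ALL
SELECT b.name, b.country, -COALESCE(b.wgt, 0), 0
FROM bench b
LEFT JOIN hlds h ON h.name = b.name AND h.country = b.country
WHERE h.name IS NULL;
""")

merged = conn.execute(
    "SELECT name, country, wgt, inHld FROM merge_tbl ORDER BY inHld DESC, name"
).fetchall()
for row in merged:
    print(row)
```

The four rows agree with the desired output in the question: -10, 50 and 5 for the companies held, and -45 for ppp, which appears only in bench.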

Copy table id from one column to another where column data matches between tables

I have two tables, Fruits and Meals, and one of the columns in Meals is a varchar(100) with the fruits in them. I am changing this so that
the column is instead the id of the fruit from the Fruits table, and I would like to set this by comparing the two tables and grabbing the id
from the fruits table where the fruit columns match.
Table: Fruits
id | fruit
1 apple
2 banana
3 orange
Table: Meals
id | Meal | Fruit
1 xxxx apple
2 xxxx apple
3 xxxx orange
4 xxxx banana
5 xxxx orange
6 xxxx orange
7 xxxx apple
I've tried the following script, but I get the following error.
Update product_attribute set control_caption =
(
Select DISTINCT T1.control_caption_id from control_caption T1
INNER Join product_attribute T2
On T1.control_caption = T2.control_caption
Where T1.control_caption = T2.control_caption
)
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Depends on your RDBMS, but this should work for SQL Server:
Update pa
set pa.control_caption = cc.control_caption_id
From product_attribute pa
Join control_caption cc On
cc.control_caption = pa.control_caption

The Update Query can be simplified and a Join may be used instead of a subquery running the select statement.
Update P
Set P.Control_Caption = C.Control_Caption_ID
From Product_Attribute P
join Control_Caption C on C.Control_Caption= P.Control_Caption
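This UPDATE ... FROM form runs on SQL Server. Oracle does not accept it as written; there, a correlated subquery or a MERGE statement is the usual equivalent.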
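For databases without UPDATE ... FROM, a correlated subquery is one portable alternative. The sketch below uses the Fruits and Meals tables from the question and assumes SQLite through Python's sqlite3 module. The fruit_id column is hypothetical, added here so the text and the id can sit side by side, and the sketch relies on each fruit name appearing only once in Fruits. Unlike the failing script in the question, the subquery is tied to the row being updated, so it returns a single value per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fruits (id INTEGER PRIMARY KEY, fruit TEXT);
-- fruit_id is a hypothetical new column that receives the looked-up id.
CREATE TABLE Meals (id INTEGER PRIMARY KEY, meal TEXT, fruit TEXT, fruit_id INTEGER);
INSERT INTO Fruits VALUES (1,'apple'),(2,'banana'),(3,'orange');
INSERT INTO Meals (id, meal, fruit) VALUES
  (1,'xxxx','apple'),(2,'xxxx','apple'),(3,'xxxx','orange'),(4,'xxxx','banana'),
  (5,'xxxx','orange'),(6,'xxxx','orange'),(7,'xxxx','apple');

-- Correlated subquery: evaluated once per Meals row, matching on the fruit name.
UPDATE Meals
SET fruit_id = (SELECT f.id FROM Fruits f WHERE f.fruit = Meals.fruit);
""")

mapping = conn.execute("SELECT id, fruit_id FROM Meals ORDER BY id").fetchall()
print(mapping)
```

A meal whose fruit has no match in Fruits would end up with a NULL fruit_id, which may or may not be what you want.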

SQL query to join results from two tables but also include rows that do not have counterparts in the other table?

Given two tables APPLE and ORANGE,
NAME APPLES
Alice 5
Bob 10
Trudy 1
NAME ORANGES
Bob 50
Trudy 10
Dick 10
How can I write a JOIN to show the table:
NAME APPLES ORANGES
Alice 5 -
Bob 10 50
Trudy 1 10
Dick - 10
I currently have
SELECT a.NAME, APPLES, ORANGES
FROM APPLE a
JOIN
ORANGE o ON o.NAME = a.NAME
but that only returns the fields that have a value in both APPLE and ORANGE.

SELECT COALESCE(a.NAME, o.NAME) as NAME, APPLES, ORANGES
FROM APPLE a
FULL OUTER JOIN ORANGE o ON o.NAME = a.NAME

SELECT a.NAME, a.APPLES, o.ORANGES
FROM APPLE a
FULL OUTER JOIN
ORANGE o ON o.NAME = a.NAME

should be:
SELECT COALESCE(a.NAME,o.name) as Name, APPLES, ORANGES
FROM APPLE a
FULL OUTER JOIN ORANGE o ON o.NAME = a.NAME
Example: http://sqlfiddle.com/#!4/1ae9a/4

Change JOIN to FULL OUTER JOIN.

You have to use a LEFT or RIGHT OUTER JOIN, depending on which table contains the incomplete data.
