PostgreSQL join duplicates rows

I am using PostgreSQL and I am new to it. I am attempting to join a table to two other tables but the results are being duplicated. I have the following tables.
MEAL
id  | name                | ingredients   | flavors
abc | Creamy Chicken Soup | {def,ghi,jkl} | {mno}

INGREDIENT
id  | name
def | chicken
ghi | corn
jkl | pepper

FLAVOR
id  | name
mno | spicy
And here is my query:
SELECT
    meal.id,
    meal.name,
    JSON_AGG(i) AS ing,
    JSON_AGG(f) AS flav
FROM meal
LEFT JOIN
    (SELECT
        ingredient.id,
        ingredient.name
    FROM ingredient) i
    ON (i.id = ANY(meal.ingredients))
LEFT JOIN
    (SELECT
        flavor.id,
        flavor.name
    FROM flavor) f
    ON (f.id = ANY(meal.flavors))
GROUP BY
    meal.id,
    meal.name
And the results are:
id:   abc
name: Creamy Chicken Soup
ing:  [{id: "def",name:"chicken"},{id:"ghi",name:"corn"},{id:"jkl",name:"pepper"}]
flav: [{id:"mno",name:"spicy"},{id:"mno",name:"spicy"},{id:"mno",name:"spicy"}]
As you can see, the flavors are being duplicated the same number of times as the ingredient count. How can I do this query without the duplicates? Unfortunately I do not have any control over the table structure, as it is being pulled in from a third party. I can manipulate the data in code, but I would prefer to query it and get back the correct data set.
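
One way to see why this happens, and one possible way around it: joining two independent lists to the same row multiplies them (3 ingredients x 1 flavor = 3 rows) before the aggregate runs. The sketch below is only an illustration under stated assumptions. It uses SQLite through Python's sqlite3 rather than PostgreSQL, models the array columns as two hypothetical link tables (meal_ingredient, meal_flavor), and uses group_concat in place of JSON_AGG. It aggregates each list in its own correlated subquery so the two lists never meet in one join.

```python
import sqlite3

# Assumptions: SQLite stands in for PostgreSQL, and the array columns are
# modelled as hypothetical link tables (meal_ingredient, meal_flavor).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meal (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE ingredient (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE flavor (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE meal_ingredient (meal_id TEXT, ingredient_id TEXT);
CREATE TABLE meal_flavor (meal_id TEXT, flavor_id TEXT);
INSERT INTO meal VALUES ('abc', 'Creamy Chicken Soup');
INSERT INTO ingredient VALUES ('def', 'chicken'), ('ghi', 'corn'), ('jkl', 'pepper');
INSERT INTO flavor VALUES ('mno', 'spicy');
INSERT INTO meal_ingredient VALUES ('abc', 'def'), ('abc', 'ghi'), ('abc', 'jkl');
INSERT INTO meal_flavor VALUES ('abc', 'mno');
""")

# Joining both lists before grouping yields 3 x 1 = 3 rows for the meal,
# so the single flavor is aggregated three times.
fanned_out = conn.execute("""
    SELECT m.id, group_concat(f.name) AS flav
    FROM meal m
    LEFT JOIN meal_ingredient mi ON mi.meal_id = m.id
    LEFT JOIN meal_flavor mf ON mf.meal_id = m.id
    LEFT JOIN flavor f ON f.id = mf.flavor_id
    GROUP BY m.id
""").fetchone()

# Aggregating each list in its own correlated subquery keeps them independent.
separate = conn.execute("""
    SELECT m.id,
           (SELECT group_concat(i.name)
              FROM meal_ingredient mi
              JOIN ingredient i ON i.id = mi.ingredient_id
             WHERE mi.meal_id = m.id) AS ing,
           (SELECT group_concat(f.name)
              FROM meal_flavor mf
              JOIN flavor f ON f.id = mf.flavor_id
             WHERE mf.meal_id = m.id) AS flav
    FROM meal m
""").fetchone()

print(fanned_out)  # the flavor name is repeated once per ingredient
print(separate)    # each list is aggregated exactly once
```

With the original PostgreSQL array columns, the same shape would presumably be a subquery per list in the select clause, for example (SELECT JSON_AGG(i) FROM ingredient i WHERE i.id = ANY(meal.ingredients)), with no GROUP BY needed on the outer query.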

Related

SQL query without join

I have the following tables
Table food                      Table Race                      Table animal
+------------+--------------+   +------------+--------------+   +------------+--------------+
| Quantity   | animal_id    |   | race_code  | race_name    |   | animal_id  | race_code    |
+------------+--------------+   +------------+--------------+   +------------+--------------+
I was asked to calculate the average food quantity for every race (race_name). The challenge here is that I should not use JOIN because we have not studied it yet.
I have written the following query:
select AVG(f.quantity),r.race_name from food f, race r
group by r.race_name;
but it doesn't work as I want, since it returns the same average food quantity for all races. I know I have to use the animal table to link the other two, but I don't know how. I should be using subqueries.

That question is exactly the same as your previous, where you had to use SUM (instead of AVG). No difference at all.
Ah, sorry - it wasn't you, but your school colleague.
Saying that you "didn't learn joins", well - what do you call what you posted here, then? That's a cross join, and it will produce a Cartesian product once you fix the error you got by not including the non-aggregated column in the group by clause, unless you also include the additional join conditions required to return the desired result.
The "old" syntax is
select r.race_name,
avg(f.quantity) avg_quantity
from race r, animal a, food f
where a.race_code = r.race_code
and f.animal_id = a.animal_id
group by r.race_name;
What you "didn't learn yet" does the same, but looks different:
from race r join animal a on a.race_code = r.race_code
join food f on f.animal_id = a.animal_id
The rest of the query remains the same.
Nowadays, you should use JOINs to join tables, and put filter conditions into the WHERE clause. For example, a condition would be that you want to calculate averages for donkeys only. As you don't have one, you don't need it.
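
To make the difference concrete, here is a small sketch that runs the asker's unmatched query and the "old" syntax with matching conditions side by side. It assumes SQLite via Python's sqlite3, and the sample rows (two races, two animals, three food records) are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE race (race_code INTEGER, race_name TEXT);
CREATE TABLE animal (animal_id INTEGER, race_code INTEGER);
CREATE TABLE food (quantity REAL, animal_id INTEGER);
-- Hypothetical sample rows, not taken from the question
INSERT INTO race VALUES (1, 'donkey'), (2, 'horse');
INSERT INTO animal VALUES (10, 1), (20, 2);
INSERT INTO food VALUES (2.0, 10), (4.0, 10), (9.0, 20);
""")

# No matching condition: every race is paired with every food row,
# so each race gets the overall average.
cross = conn.execute("""
    SELECT r.race_name, AVG(f.quantity)
    FROM food f, race r
    GROUP BY r.race_name
    ORDER BY r.race_name
""").fetchall()

# Matching rows through the animal table in the WHERE clause.
matched = conn.execute("""
    SELECT r.race_name, AVG(f.quantity)
    FROM race r, animal a, food f
    WHERE a.race_code = r.race_code
      AND f.animal_id = a.animal_id
    GROUP BY r.race_name
    ORDER BY r.race_name
""").fetchall()

print(cross)    # same average for both races
print(matched)  # one average per race
```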

You still have to do some matching of related rows. If not explicitly with JOIN, you can do it in the WHERE clause, i.e. something like
select AVG(f.quantity),r.race_name
from food f, race r, animal a
where f.animal_id = a.animal_id and a.race_code = r.race_code
group by r.race_name;

select race_name ,(select avg(quantity) from food where animal_id in (select animal_id from animal a where r.race_code = a.race_code))
from race r

Efficient way to query a table with data from another table using 3 keys

Let's say I have two tables, Zoo and Pet_Store. Both tables contain about 500,000 records. Cat, Dog and Mouse hold exactly the same data type in both tables, but data present in one table may not be in the other.
Table Zoo:
Cat | Dog | Mouse | Bird
xyz | dfg | sdhf  | 123
dfr | kjf | asdc  | 456
zxc | abc | qwrt  | 789
Table Pet_Store:
Cat | Dog | Mouse | Pig
ghf | dsa | dfre  | 12
dfr | gfr | qwy5  | 19
zxc | abc | dfgr  | 21
Desired Result:
Cat | Dog | Mouse
dfr | kjf | asdc
zxc | abc | qwrt
I want to query every record where either Cat, Dog or Mouse are the same. There is no unique key here to connect both tables; the only way we can draw a connection is with those 3 fields. If at least one match is present, return Cat, Dog and Mouse. I wrote a select statement myself, but since the data I am working with is very large, the process takes a long time, so I don't think I am being efficient. Any suggestions?
select n.Cat, n.Dog, n.Mouse
from Zoo n, Pet_Store t
where
(n.Cat =t.Cat or n.Dog =t.Dog or n.Mouse =t.Mouse)
edit: Sorry I should have included a little more clarity. My brain is fried at the moment so I apologize for that. If any of the fields I do a check on match, pull the fields Cat, Dog, Mouse from the Zoo table.

Depending on how much you care about duplicates, you could do something like
select z.cat, z.dog, z.mouse from zoo z inner join pet_store p on z.cat = p.cat
union all
select z.cat, z.dog, z.mouse from zoo z inner join pet_store p on z.dog = p.dog
union all
select z.cat, z.dog, z.mouse from zoo z inner join pet_store p on z.mouse = p.mouse
This will allow index usage on all columns (assuming you have the proper indexes on both tables).
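
As a small check of the duplicates remark, the sketch below loads the sample rows from the question into SQLite (an assumption; the question does not name its database) and compares UNION ALL with plain UNION. The row zxc/abc matches on both Cat and Dog, so UNION ALL returns it twice, while UNION removes the repeat at the cost of an extra de-duplication step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zoo (cat TEXT, dog TEXT, mouse TEXT, bird INTEGER);
CREATE TABLE pet_store (cat TEXT, dog TEXT, mouse TEXT, pig INTEGER);
INSERT INTO zoo VALUES ('xyz','dfg','sdhf',123), ('dfr','kjf','asdc',456), ('zxc','abc','qwrt',789);
INSERT INTO pet_store VALUES ('ghf','dsa','dfre',12), ('dfr','gfr','qwy5',19), ('zxc','abc','dfgr',21);
""")

# Template for the three single-column joins; {op} is UNION ALL or UNION.
QUERY = """
    SELECT z.cat, z.dog, z.mouse FROM zoo z JOIN pet_store p ON z.cat = p.cat
    {op}
    SELECT z.cat, z.dog, z.mouse FROM zoo z JOIN pet_store p ON z.dog = p.dog
    {op}
    SELECT z.cat, z.dog, z.mouse FROM zoo z JOIN pet_store p ON z.mouse = p.mouse
"""

with_duplicates = sorted(conn.execute(QUERY.format(op="UNION ALL")).fetchall())
deduplicated = sorted(conn.execute(QUERY.format(op="UNION")).fetchall())

print(with_duplicates)  # the zxc row appears once per matching column
print(deduplicated)     # each Zoo row appears at most once
```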

Well you have not told us much but given what you have told us this is how I would do it.
SELECT A.Cat, A.Dog, A.Mouse
FROM Zoo A
LEFT JOIN Pet_Store B1 ON A.Cat = B1.Cat
LEFT JOIN Pet_Store B2 ON A.Dog = B2.Dog
LEFT JOIN Pet_Store B3 ON A.Mouse = B3.Mouse
WHERE COALESCE(B1.Cat, B2.Dog, B3.Mouse) IS NOT NULL
Since we don't know anything about the structure of the data or other information about the columns or the tables, I know of no way to improve this query. However, if you do have any indexes at all, this query will use them in the best possible way.
For example, an index on Pet_Store.Mouse could be used in this query but not in your example query.

There's nothing really wrong with your query; you're dealing with no indexes and table scans on a reasonably large table. You will see a slight improvement by refactoring the query slightly, but you would see much more significant performance improvements by adding indexes.
SELECT z.Cat, z.Dog, z.Mouse
FROM Zoo z
INNER JOIN Pet_Store p ON
z.Cat = p.Cat OR
z.Dog = p.Dog OR
z.Mouse = p.Mouse
That will return the data you're looking for - there's no need to join the tables multiple times.

Joining a table to two one-to-many relationship tables in SQL Server

Happy Friday folks,
I'm trying to write an SSRS report displaying data from three (actually about 12, but only three relevant) tables that have awkward relationships, and the SQL query behind the data is proving difficult.
There are three entities involved - a Purchase Order, a Sales Order, and a Delivery. The problem is that a Purchase Order can have many sales orders, and also many deliveries which are NOT linked to the sales orders...that would be too easy.
Both the Sales Order and Delivery tables can be linked to the Purchase Order table by foreign keys and an intermediate table each.
I need to basically list Purchase Orders, a list of sales orders and a list of deliveries next to them, with NULLs for any fields that aren't valid so that'll give the required output in SSRS/when read by a human, i.e., for a purchase order with 2 sales orders and 4 delivery dates;
PO SO Delivery
1234 ABC 05/10
1234 DEF 09/10
1234 NULL 10/12
1234 NULL 14/12
The above (when grouped by PO) will tell the users there are two sales orders and four (unlinked) delivery dates.
Likewise if there are more SOs than deliveries, we need NULLs in the Delivery column;
PO SO Delivery
1234 ABC 03/08
1234 DEF NULL
1234 GHI NULL
1234 JKL NULL
Above would be the case with 4 SOs and one delivery date.
Using Left Outer joins alone gives too much duplication - in this case 8 rows, as it gives 4 delivery dates for each match on the sales order;
PO SO Delivery
1234 ABC 05/10
1234 ABC 09/10
1234 ABC 10/12
1234 ABC 14/12
1234 DEF 05/10
1234 DEF 09/10
1234 DEF 10/12
1234 DEF 14/12
It's fine that the PO column is duplicated as SSRS can visually group that - but the SO/Delivery fields can't be allowed to duplicate as this can't be got rid of in the report - if I group the column in SSRS by SO then it still spits out 4 delivery dates for each one.
The only situation where our query works nicely is when there is just one SO per PO. In that case the single PO and SO numbers are duplicated together for x deliveries and can both be neatly grouped in SSRS. Unfortunately this is a rare occurrence in the data.
I've thought of trying to use some sort of windowing function or CROSS APPLY but both fall down as they will repeat for every PO number listed and end up spitting out too much data.
I'm at the point of thinking this just isn't set-based enough to be doable in SQL. I know the data is horrible.
Any help much appreciated.
EDIT - basic sqlfiddle link to the table schemas. Omitted many columns which aren't relevant. http://sqlfiddle.com/#!2/5ba16
Example data...
Purchase Order
PO_Number | Style
1001      | Black work boots
1002      | Green hat
1006      | Red Scarf
Sales Order
Sales_order_number | PO_number | Qty  | Retailer
A100-21            | 1001      | 15   | Walmart
A100-22            | 1001      | 29   | Walmart
A200-31            | 1006      | 1000 | Asda
Delivery
Delivery_ID | Delivery_Date | PO_number
1543285     | 10/05/2014    | 1001
1543286     | 12/05/2014    | 1001
1543287     | 17/05/2014    | 1001
1543288     | 21/05/2014    | 1002

If you assign row numbers to the elements in salesorders and deliveries, you can link on that.
Something like this
declare @salesorders table (po int, so varchar(10))
declare @deliveries table (po int, delivery date)
declare @purchaseorders table (po int)

insert @purchaseorders values (123),(456)
insert @salesorders values (123,'a'),(123,'b'),(456,'c')
insert @deliveries values (123,'2014-1-1'),(456,'2014-2-1'),(456,'2014-2-1')

select *
from
(
    select numbers.number, p.po, so.so, d.delivery
    from @purchaseorders p
    -- one candidate row per purchase order and sequence number
    cross join (select number from master..spt_values where type='p') numbers
    -- the n-th sales order of each PO lands on sequence number n
    left join (select *, ROW_NUMBER() over (partition by po order by so) sor from @salesorders) so
        on p.po = so.po and numbers.number = so.sor
    -- the n-th delivery of each PO lands on the same sequence number
    left join (select *, ROW_NUMBER() over (partition by po order by delivery) dor from @deliveries) d
        on p.po = d.po and numbers.number = d.dor
) v
where so is not null or delivery is not null
order by po, number
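
For readers without SQL Server at hand, the same row-number pairing can be tried against the question's example data. This is a sketch under assumptions, not the answer's exact query: it runs on SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later), it replaces the master..spt_values numbers table with a union of the row numbers that actually occur, it omits the Qty and Retailer columns, and it stores the delivery dates in ISO format so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_order (sales_order_number TEXT, po_number INTEGER);
CREATE TABLE delivery (delivery_id INTEGER, delivery_date TEXT, po_number INTEGER);
INSERT INTO sales_order VALUES ('A100-21', 1001), ('A100-22', 1001), ('A200-31', 1006);
INSERT INTO delivery VALUES
    (1543285, '2014-05-10', 1001), (1543286, '2014-05-12', 1001),
    (1543287, '2014-05-17', 1001), (1543288, '2014-05-21', 1002);
""")

rows = conn.execute("""
    WITH so AS (   -- number the sales orders within each PO
        SELECT po_number, sales_order_number,
               ROW_NUMBER() OVER (PARTITION BY po_number
                                  ORDER BY sales_order_number) AS rn
        FROM sales_order
    ),
    d AS (         -- number the deliveries within each PO
        SELECT po_number, delivery_date,
               ROW_NUMBER() OVER (PARTITION BY po_number
                                  ORDER BY delivery_date) AS rn
        FROM delivery
    ),
    slots AS (     -- every (PO, row number) used by either list
        SELECT po_number, rn FROM so
        UNION
        SELECT po_number, rn FROM d
    )
    SELECT s.po_number, so.sales_order_number, d.delivery_date
    FROM slots s
    LEFT JOIN so ON so.po_number = s.po_number AND so.rn = s.rn
    LEFT JOIN d  ON d.po_number  = s.po_number AND d.rn  = s.rn
    ORDER BY s.po_number, s.rn
""").fetchall()

for row in rows:
    print(row)
```

Like the filtered query above, this variant drops purchase orders that have neither a sales order nor a delivery; joining from the purchase order table instead would keep them.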

SQL Joins issue

I have 3 database tables.
The first one contains Ingredients, the second one contains Dishes, and the third one connects Ingredients and Dishes.
Adding data to those tables was easy, but I faced a problem while trying to select specific content.
Returning all ingredients for a specific dish works:
SELECT *
FROM Ingredient As I
JOIN DishIngredients as DI
ON I.ID = DI.IngredientID
WHERE DI.DishID = 1;
But if I try to query for the dish Name and Description, no matter what kind of join I use I always get a number of results equal to the number of ingredients used. If I have 4 ingredients in my dish, then the select returns Name and Description 4 times. How can I modify my select to return those values just once?
Here is the result of my query (same as hawk's) if I try to select Name and Description. I am using MS SQL.
ID Name Description DishID IngredientID
-- -------------------- -------------------------------------------------------------------- ------ ---------
1 Spaghetti Carbonara This delcitious pasta is made with fresh Panceta and Single Cream 1 1
1 Spaghetti Carbonara This delcitious pasta is made with fresh Panceta and Single Cream 1 2
Kuzgun's query worked fine for me. However, from your suggestions I see that I don't really need a join between DishIngredient and Dish.
When I need Name and Description I can simply go for
SELECT * FROM Dish WHERE ID=1;
When I need the list of ingredients I can use my query above.

If you need to display both dish details and ingredient details, you need to join all 3 tables:
SELECT *
FROM Ingredient As I
JOIN DishIngredients as DI
ON I.ID = DI.IngredientID
JOIN Dish AS D
ON D.ID=DI.DishID
WHERE DI.DishID = 1;

If you don't care about the ingredients, you don't have to use the DishIngredient table. Just use the Dish table: select * from dish d where d.id=1.
If you want to know what the ingredients are, the SQL that you use just queries the id of the Ingredient table, which is not useful on its own. Because of the design of your database, a little redundancy is a must.
select * from dish d join dishingredient di on d.id=di.dishid join ingredient i on
i.id=di.ingredientid where d.id=1
Of course, you will get one result row per ingredient, each containing the dish's name and description.
If you want the full information with the least redundancy, you can do it in two steps:
select * from dish d where d.id=1;
select * from ingredient i join DishIngredient di on i.id=di.ingredientid where di.dishid=1
In Java, you can write a class to represent a dish, with a list to represent the ingredients it uses.
import java.math.BigDecimal;
import java.util.List;

public class Dish {
    BigDecimal id;
    String name;
    String description;
    List<Ingredient> ingredient;
}

class Ingredient {
    BigDecimal id;
    String name;
    // .....
}
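
The two-step idea can be sketched end to end as follows. This is an illustration only, written in Python with sqlite3 rather than Java and MS SQL; the load_dish helper, the shortened description, and the ingredient names are all hypothetical. The first query reads the dish once, the second reads its ingredients, and the results are assembled into one object so the name and description are never repeated.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    id: int
    name: str


@dataclass
class Dish:
    id: int
    name: str
    description: str
    ingredients: List[Ingredient] = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dish (ID INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
CREATE TABLE Ingredient (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DishIngredients (DishID INTEGER, IngredientID INTEGER);
-- Hypothetical sample rows
INSERT INTO Dish VALUES (1, 'Spaghetti Carbonara', 'Pasta with pancetta and cream');
INSERT INTO Ingredient VALUES (1, 'Pancetta'), (2, 'Single cream');
INSERT INTO DishIngredients VALUES (1, 1), (1, 2);
""")


def load_dish(conn: sqlite3.Connection, dish_id: int) -> Optional[Dish]:
    """Hypothetical helper: fetch the dish row once, then its ingredients."""
    row = conn.execute(
        "SELECT ID, Name, Description FROM Dish WHERE ID = ?", (dish_id,)
    ).fetchone()
    if row is None:
        return None
    dish = Dish(*row)
    ingredient_rows = conn.execute(
        """
        SELECT I.ID, I.Name
        FROM Ingredient AS I
        JOIN DishIngredients AS DI ON I.ID = DI.IngredientID
        WHERE DI.DishID = ?
        ORDER BY I.ID
        """,
        (dish_id,),
    )
    dish.ingredients = [Ingredient(*r) for r in ingredient_rows]
    return dish


dish = load_dish(conn, 1)
print(dish)
```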

Query to JOIN / overwrite field

I'm not sure if I'm using the correct terminology.
SELECT movies.*, actors.`First Name`, actors.`Last Name`
From movies
Inner Join actors on movies.`actor1` Where movies.`actor1` = actors.`indexActors`;
#Inner Join actors on movies.`actor2` Where movies.`actor2` = actors.`indexActors`;
I have the 2nd line commented out, each one works individually, and I'm wondering how to combine them.
Secondly, when I execute the query, I get the results:
ID | Title                 | Runtime | Rating | Actor1 | Actor2 | First Name | Last Name
1  | Se7en                 | 127     | R      | 1      | 2      | Morgan     | Freeman
2  | Bruce Almighty        | 101     | PG-13  | 1      | 3      | Morgan     | Freeman
3  | Mr. Popper's Penguins | 94      | PG     | 3      | 4      | Jim        | Carrey
4  | Superbad              | 113     | R      | 4      | 5      | Emma       | Stone
5  | Crazy, Stupid, Love.  | 118     | PG-13  | 4      | Null   | Emma       | Stone
Is there a way to add the results from the 2nd join to the rightmost columns?
Also, is it possible to combine the strings/VARCHARs from First Name and Last Name, and then have that value show up under the corresponding Actor Field?
(aka the field under Actor 1 for row 1 would be "Morgan Freeman" instead of "1")
Thanks.

Your SQL is not valid, but you can achieve your goal by joining to the same table twice, with different aliases. This sort of thing:
select blah blah blah
from table1 t1 join table2 t2 on t1.field1 = t2.field1
join table2 t2_again on t1.field1 = t2_again.field2
etc
As far as joining first and last names in a single field, most databases have a way to concatenate strings, but they are not all the same. You'll have to specify your db engine.
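
Applied to the movies and actors tables, the double join might look like the sketch below. It is written against SQLite via Python's sqlite3 as a stand-in engine, and it makes a few assumptions: the name columns are renamed to first_name and last_name, only the rows whose actor names appear in the question are loaded, and strings are concatenated with the || operator. The backticks in the question suggest MySQL, where CONCAT() would be the usual choice instead. The second join is a LEFT JOIN so that a movie with no second actor is still returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (indexActors INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, actor1 INTEGER, actor2 INTEGER);
INSERT INTO actors VALUES (1, 'Morgan', 'Freeman'), (3, 'Jim', 'Carrey'), (4, 'Emma', 'Stone');
INSERT INTO movies VALUES
    (2, 'Bruce Almighty', 1, 3),
    (3, 'Mr. Popper''s Penguins', 3, 4),
    (5, 'Crazy, Stupid, Love.', 4, NULL);
""")

rows = conn.execute("""
    SELECT m.title,
           a1.first_name || ' ' || a1.last_name AS actor1,
           a2.first_name || ' ' || a2.last_name AS actor2
    FROM movies m
    JOIN actors a1 ON a1.indexActors = m.actor1       -- first alias: actor1
    LEFT JOIN actors a2 ON a2.indexActors = m.actor2  -- second alias: actor2, may be missing
    ORDER BY m.id
""").fetchall()

for row in rows:
    print(row)
```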
