Rewriting LEFT OUTER JOIN as a subquery

Is it possible to rewrite the following left join:
SELECT wanted.id AS wanted_id, offers.id AS offer_id, departure_city,
departure_country, destination_city, destination_country, departure_date_min, departure_date_max, MIN(price) as price
FROM wanted LEFT JOIN (SELECT id, wanted_id, SUM(price) AS price FROM `offers` GROUP BY offer_date) AS offers ON wanted.id = offers.wanted_id
WHERE wanted.id IN (SELECT wanted_id FROM wanted_by_user WHERE user_ID=%d) group by offer_id
as a subquery?
I have tried it like this, and technically it works, but it behaves like an inner join (rows from wanted are not shown when there is no match in the table 'offers'):
FROM wanted, (SELECT id, SUM(price) AS price FROM `offers` GROUP BY offer_date) AS offers
WHERE wanted.id IN (SELECT wanted_id FROM wanted_by_user WHERE user_ID=%s)
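One way to keep the unmatched rows without a LEFT JOIN is a correlated scalar subquery in the SELECT list. The following is only a minimal sketch, assuming a simplified, hypothetical schema (two small tables with made-up rows, run on SQLite through Python rather than on MySQL), and it computes just MIN(price) per wanted row instead of the full query from the question:

```python
import sqlite3

# Hypothetical, simplified schema: only the columns needed for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wanted (id INTEGER PRIMARY KEY, departure_city TEXT);
    CREATE TABLE offers (id INTEGER PRIMARY KEY, wanted_id INTEGER, price REAL);
    INSERT INTO wanted VALUES (1, 'Berlin'), (2, 'Oslo');
    INSERT INTO offers VALUES (10, 1, 5.0), (11, 1, 3.0);
""")

# The correlated scalar subquery is evaluated once per row of wanted.
# When no offer matches, it yields NULL instead of dropping the row,
# which mirrors what the LEFT JOIN does.
rows_subquery = conn.execute("""
    SELECT w.id,
           (SELECT MIN(o.price) FROM offers o WHERE o.wanted_id = w.id) AS price
    FROM wanted w
    ORDER BY w.id
""").fetchall()

# The same result written as a LEFT JOIN, for comparison.
rows_left_join = conn.execute("""
    SELECT w.id, MIN(o.price) AS price
    FROM wanted w
    LEFT JOIN offers o ON o.wanted_id = w.id
    GROUP BY w.id
    ORDER BY w.id
""").fetchall()

print(rows_subquery)
print(rows_left_join)
```

A scalar subquery can only return one column, so every additional column from offers (such as offer_id) would need its own subquery. That is usually the point where the LEFT JOIN version becomes the more practical choice.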

Related

Group By and Inner Join Together To Get Unique Values By Maximum Date

I have a table here in which I want to write a SELECT query in SQL Server that allows me to get the following:
For each unique combination of SalesPerson x Country, get only the rows with the latest Upload_DateTime
However, I am trying to do a group-by and inner join, but to no avail. My code is something like this:
SELECT t1.[SalesPerson], t1.[Country], MAX(t1.[Upload_DateTime]) as [Upload_DateTime]
FROM [dbo].[CommentTable] AS t1
GROUP BY t1.[SalesPerson], t1.[Country]
INNER JOIN SELECT * FROM [dbo].[CommentTable] as t2 ON t1.[SalesPerson] = t2.[SalesPerson], t1.[Country] = t2.[Country]
It seems like the GROUP BY needs to be done outside of the INNER JOIN? How does that work? I get an error when I run the query and it seems my SQL is not right.
Basically, this subquery will fetch the person, the country and the latest date:
SELECT
SalesPerson, Country, MAX(uplodaed_datetime)
FROM CommentTable
GROUP BY SalesPerson, Country;
This can be used in a lot of ways (for example with a JOIN or with an IN clause).
The main query will add the remaining columns to the result.
Since you tried a JOIN, here is the JOIN option:
SELECT
c.id, c.SalesPerson, c.Country,
c.Comment, c.uplodaed_datetime
FROM
CommentTable AS c
INNER JOIN
(SELECT
SalesPerson, Country,
MAX(uplodaed_datetime) AS uplodaed_datetime
FROM CommentTable
GROUP BY SalesPerson, Country) AS sub
ON c.SalesPerson = sub.SalesPerson
AND c.Country = sub.Country
AND c.uplodaed_datetime = sub.uplodaed_datetime
ORDER BY c.id;
Try out: db<>fiddle
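For readers who want to run the pattern locally, here is a small sketch of the same "join back to the grouped maximum" idea. It assumes SQLite through Python instead of SQL Server, invented sample rows, and the column name Upload_DateTime from the question; timestamps are stored as ISO text, which compares correctly as strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CommentTable (
        id INTEGER PRIMARY KEY, SalesPerson TEXT, Country TEXT,
        Comment TEXT, Upload_DateTime TEXT);
    -- Invented sample data, not taken from the question.
    INSERT INTO CommentTable VALUES
        (1, 'Ann', 'DE', 'old',    '2021-01-01 09:00:00'),
        (2, 'Ann', 'DE', 'new',    '2021-02-01 09:00:00'),
        (3, 'Ann', 'FR', 'only',   '2021-01-15 09:00:00'),
        (4, 'Bob', 'DE', 'older',  '2020-12-31 09:00:00'),
        (5, 'Bob', 'DE', 'latest', '2021-03-01 09:00:00');
""")

# The derived table finds the latest timestamp per (SalesPerson, Country);
# joining back on all three columns keeps only the full rows that carry it.
latest_rows = conn.execute("""
    SELECT c.id, c.SalesPerson, c.Country, c.Comment
    FROM CommentTable AS c
    INNER JOIN (
        SELECT SalesPerson, Country, MAX(Upload_DateTime) AS Upload_DateTime
        FROM CommentTable
        GROUP BY SalesPerson, Country
    ) AS sub
      ON c.SalesPerson = sub.SalesPerson
     AND c.Country = sub.Country
     AND c.Upload_DateTime = sub.Upload_DateTime
    ORDER BY c.id
""").fetchall()

print(latest_rows)
```

Note that if two rows of the same SalesPerson and Country share the exact same latest timestamp, this pattern returns both of them.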

Prevent duplicate rows when using LEFT JOIN in Postgres without DISTINCT

I have 4 tables:
Item
Purchase
Purchase Item
Purchase Discount
In these tables, the Purchase Discount has two entries, all the others have only one entry. But when I query them, due to the LEFT JOIN, I'm getting duplicate entries.
This query will be running in a large database, and I heard using DISTINCT will reduce the performance. Is there any other way I can remove duplicates without using DISTINCT?
Here is the SQL Fiddle.
The result shows:
[{"item_id":1,"purchase_items_ids":[1234,1234],"total_sold":2}]
But the result should come as:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}]
Using correlated subquery instead of LEFT JOIN:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
SUM((SELECT SUM(pd.discount_amount) FROM purchase_discounts pd
WHERE pd.purchase_id = purchase.id)) as discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
WHERE purchase.id = 200
GROUP by purchase_items.item_id
) as t
INNER JOIN items i ON i.id = t.item_id
) AS p_values;
db<>fiddle demo
Output:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
First, I would suggest removing INNER JOIN items i ON i.id = t.item_id from the query, since there is no reason for it to be there.
Then, instead of left joining the Purchase_Discounts table, use a subquery to get the discount_amount (as mentioned in Lukasz Szozda's answer).
If there is no discount for a product, the discount_amount column will display NULL. If you want to avoid that, you can use COALESCE() as below instead:
COALESCE(SUM((select sum(discount_amount) from purchase_discounts
where purchase_discounts.purchase_id = purchase.id)),0) as discount_amount
Db-Fiddle:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
SUM((select sum(discount_amount) from purchase_discounts
where purchase_discounts.purchase_id = purchase.id)) as discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
WHERE
purchase.id = 200
GROUP by
purchase_items.item_id
) as t
) AS p_values;
Output:
array_to_json
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
db<>fiddle here
The core problem is that your LEFT JOIN multiplies rows. See:
Two SQL LEFT JOINS produce incorrect result
Aggregate discounts to a single row before the join. Or use a (uncorrelated) subquery expression:
SELECT json_agg(items)
FROM (
SELECT pi.item_id
, array_agg(pi.id) AS purchase_items_ids
, sum(pi.sold) AS total_sold
,(SELECT COALESCE(sum(pd.discount_amount), 0)
FROM purchase_discounts pd
WHERE pd.purchase_id = 200) AS discount_amount
FROM purchase_items pi
WHERE pi.purchase_id = 200
GROUP BY 1
) AS items;
Result:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
db<>fiddle here
I added a couple of additional improvements:
Assuming referential integrity enforced by FK constraints, we don't need to involve the tables purchase and items at all.
Removed a subquery level doing nothing.
Using json_agg() instead of array_to_json(array_agg()).
Added COALESCE() to output 0 instead of NULL for no discounts.
Since discounts apply to the purchase in your model, not to individual items, it doesn't make sense to output discount_amount for every single item. Consider this query instead to return an array of items and a single, separate discount_amount:
SELECT json_build_object(
'items'
, json_agg(items)
, 'discount_amount'
, (SELECT COALESCE(sum(pd.discount_amount), 0)
FROM purchase_discounts pd
WHERE pd.purchase_id = 200)
)
FROM (
SELECT pi.item_id
, array_agg(pi.id) AS purchase_items_ids
, sum(pi.sold) AS total_sold
FROM purchase_items pi
WHERE pi.purchase_id = 200
GROUP BY 1
) AS items;
Result:
{"items" : [{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}], "discount_amount" : 12}
db<>fiddle here
Using json_build_object() to assemble the JSON object.
Your example with a single item in the purchase isn't too revealing. I added a purchase with multiple items and no discount to my fiddle.
If you can have multiple values only in the purchase_discounts table, then a subquery that aggregates multiple purchase_discounts rows into one before the join can solve the problem:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
X.discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
LEFT JOIN (SELECT purchase_id, sum(purchase_discounts.discount_amount) AS discount_amount FROM purchase_discounts GROUP BY purchase_id) X ON X.purchase_id = purchase.id
WHERE
purchase.id = 200
GROUP by
purchase_items.item_id,
X.discount_amount
) as t
INNER JOIN items i ON i.id = t.item_id
) AS p_values;
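To make the row multiplication and the pre-aggregation fix visible side by side, here is a stripped-down sketch. It assumes SQLite through Python instead of Postgres, only two of the four tables, and invented discount amounts (5 and 7) that happen to add up to the 12 shown in the outputs above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_items (id INTEGER, item_id INTEGER, purchase_id INTEGER, sold INTEGER);
    CREATE TABLE purchase_discounts (id INTEGER, purchase_id INTEGER, discount_amount INTEGER);
    INSERT INTO purchase_items VALUES (1234, 1, 200, 1);
    -- Two discount rows for the same purchase: this is what multiplies rows.
    INSERT INTO purchase_discounts VALUES (1, 200, 5), (2, 200, 7);
""")

# Joining the raw discount rows repeats the purchase item once per discount,
# so the count and the sum of sold are both doubled.
naive = conn.execute("""
    SELECT pi.item_id, COUNT(pi.id), SUM(pi.sold)
    FROM purchase_items pi
    LEFT JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
    GROUP BY pi.item_id
""").fetchall()

# Collapsing the discounts to one row per purchase before the join
# leaves at most one match per purchase item.
pre_aggregated = conn.execute("""
    SELECT pi.item_id, COUNT(pi.id), SUM(pi.sold), x.discount_amount
    FROM purchase_items pi
    LEFT JOIN (
        SELECT purchase_id, SUM(discount_amount) AS discount_amount
        FROM purchase_discounts
        GROUP BY purchase_id
    ) x ON x.purchase_id = pi.purchase_id
    GROUP BY pi.item_id, x.discount_amount
""").fetchall()

print(naive)
print(pre_aggregated)
```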
The LEFT JOIN is not causing your duplicates. I understand why you need it, as there may not be any discounts, but for the data provided changing to an inner join produces the same result. You are getting duplicate entries because you use ARRAY_AGG(purchase_items.id). Further, with the data presented, the tables item and purchase are not necessary. You can use the window version of sum and DISTINCT ON to reduce the duplication of purchase_id and eliminate the mentioned tables. Finally, the middle select ... ) t can be removed completely, resulting in (see demo):
select array_to_json(array_agg(p_values))
from (select distinct on (pi.item_id, pi.id)
pi.item_id
, pi.id purchase_items_ids
, sum(pi.sold) over (partition by pi.item_id) total_sold
, sum(pd.discount_amount) over(partition by pi.item_id) discount_amount
from purchase_items pi
left join purchase_discounts pd
on pd.purchase_id = pi.purchase_id
order by pi.item_id, pi.id
) as p_values;
I don't think the LEFT JOIN itself is the cause, because an INNER JOIN returns the same result as the LEFT JOIN. The purchase_discounts table has 2 rows for purchase_id = 200, so you can use ROW_NUMBER with PARTITION BY, for example:
ROW_NUMBER() OVER(PARTITION BY purchase_items.id order by purchase_items.id) rn
Then select only the rows where rn = 1.
You will need to change your query for the sum function as well; I think you can also use PARTITION BY there.
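A rough sketch of how the ROW_NUMBER idea could look end to end. This is my own reading of the suggestion, not the answerer's exact query: it assumes SQLite 3.25 or newer (for window functions) through Python, invented rows, and it orders the window by the discount id, which is an arbitrary choice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_items (id INTEGER, item_id INTEGER, purchase_id INTEGER, sold INTEGER);
    CREATE TABLE purchase_discounts (id INTEGER, purchase_id INTEGER, discount_amount INTEGER);
    -- Invented rows: purchase 200 has two discounts, purchase 201 has none.
    INSERT INTO purchase_items VALUES (1234, 1, 200, 1), (1235, 2, 201, 3);
    INSERT INTO purchase_discounts VALUES (1, 200, 5), (2, 200, 7);
""")

# The join still produces one row per (purchase item, discount) pair.
# ROW_NUMBER numbers the copies of each purchase item, and rn = 1
# keeps exactly one copy before the aggregation runs.
deduplicated = conn.execute("""
    SELECT item_id, COUNT(id), SUM(sold)
    FROM (
        SELECT pi.id, pi.item_id, pi.sold,
               ROW_NUMBER() OVER (PARTITION BY pi.id ORDER BY pd.id) AS rn
        FROM purchase_items pi
        LEFT JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
    ) AS ranked
    WHERE rn = 1
    GROUP BY item_id
    ORDER BY item_id
""").fetchall()

print(deduplicated)
```

This removes the duplicated purchase items, but it also throws away all but one discount row, so a discount total would still have to come from a separate aggregate.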

PostgreSQL GROUP BY column must appear in the GROUP BY

SELECT
COUNT(follow."FK_accountId"),
score.*
FROM
(
SELECT items.*, AVG(reviews.score) as "averageScore" FROM "ITEM_VARIATION" as items
INNER JOIN "ITEM_REVIEW" as reviews ON reviews."FK_itemId"=items.id
GROUP BY items.id
) as score
INNER JOIN "ITEM_FOLLOWER" as follow ON score.id=follow."FK_itemId"
GROUP BY score.id
The inner block works by itself, and I believe I followed the same format for the outer query.
However, it outputs this error:
ERROR: column "score.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 18: score.*
^
Is listing all the columns of score the only solution?
There are over 10 columns to list, so I'd like to avoid that solution if it's not the only one.
Columns that are not inside an aggregate function must be listed in the GROUP BY clause:
SELECT
COUNT(follow."FK_accountId"),
score.id,
score.name
FROM
(
SELECT items.id as id, items.name as name, AVG(reviews.score) as "averageScore" FROM "ITEM_VARIATION" as items
INNER JOIN "ITEM_REVIEW" as reviews ON reviews."FK_itemId"=items.id
GROUP BY items.id, items.name
) as score
INNER JOIN "ITEM_FOLLOWER" as follow ON score.id=follow."FK_itemId"
GROUP BY score.id, score.name
I would suggest you use correlated subqueries or a lateral join:
SELECT i.*,
(SELECT AVG(r.score)
FROM "ITEM_REVIEW" r
WHERE r."FK_itemId" = i.id
) as averageScore,
(SELECT COUNT(*)
FROM "ITEM_FOLLOWER" f
WHERE f."FK_itemId" = i.id
)
FROM "ITEM_VARIATION" i;
With the right indexes, this is probably faster as well.
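As a quick illustration of the correlated-subquery version, here is a sketch with a reduced, hypothetical schema (only id and name on the item table) and invented rows. It runs on SQLite through Python, which does not enforce PostgreSQL's GROUP BY rule, so it only demonstrates the shape of the result; the followerCount alias is my own addition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "ITEM_VARIATION" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "ITEM_REVIEW" ("FK_itemId" INTEGER, score REAL);
    CREATE TABLE "ITEM_FOLLOWER" ("FK_itemId" INTEGER, "FK_accountId" INTEGER);
    -- Invented rows: item 1 has two reviews and three followers, item 2 has neither.
    INSERT INTO "ITEM_VARIATION" VALUES (1, 'a'), (2, 'b');
    INSERT INTO "ITEM_REVIEW" VALUES (1, 4.0), (1, 2.0);
    INSERT INTO "ITEM_FOLLOWER" VALUES (1, 100), (1, 101), (1, 102);
""")

# Each subquery aggregates independently per item, so there is no outer
# GROUP BY at all and i.* can be selected without listing every column.
rows = conn.execute("""
    SELECT i.*,
           (SELECT AVG(r.score)
            FROM "ITEM_REVIEW" r
            WHERE r."FK_itemId" = i.id) AS "averageScore",
           (SELECT COUNT(*)
            FROM "ITEM_FOLLOWER" f
            WHERE f."FK_itemId" = i.id) AS "followerCount"
    FROM "ITEM_VARIATION" i
    ORDER BY i.id
""").fetchall()

print(rows)
```

Unlike the INNER JOIN version in the question, items without reviews or followers stay in the result, with a NULL average and a follower count of 0.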

Issue with IfNull using a join | Big Query

I need some help. I've got three tables that I need information from. The most important parameter is the DealID from my table Flostream.orders. If this is null, I want it replaced with the Mobileheads.survey.sales_rule which is the same format.
I've constructed this:
SELECT
filename,
IFNULL(dealID,mobileheads.surveys.sales_rule) AS DealIDcombo,
COUNT(*) AS Total,
SUM(integer(weight)) AS TotalWeight,
SUM(Productweight)/1000 AS SumWeight,
Currency,
Deliverybasecost,
ROUND(SUM(Deliverybasecost),2) AS TotalDelCost,
Productsku,
Productname,
Dealstartdate
FROM
[flostream.orders]
LEFT OUTER JOIN flostream.briisk
ON dealID = Uniquereference
LEFT OUTER JOIN mobileheads.surveys
ON mobileheads.surveys.order_number = ExternalReference
GROUP BY
filename,
DealIDCombo,
currency,
Deliverybasecost,
Productname,
Productsku,
dealstartdate
ORDER BY
filename,
Total desc;
My issue is with this:
LEFT outer JOIN flostream.briisk ON dealID = Uniquereference
Ideally I would like it to be:
LEFT outer JOIN flostream.briisk ON dealIDCombo = Uniquereference
but unfortunately that doesn't work.
Any ideas on how to tackle this?
This is because the join can't access fields that are computed after the join.
Note that IFNULL uses a column from the joined table, so you need to nest these tables:
first do the join with mobileheads.surveys, and then the next join.
SELECT * FROM(
SELECT
filename,
IFNULL(dealID,mobileheads.surveys.sales_rule) AS DealIDcombo,
COUNT(*) AS Total,
SUM(integer(weight)) AS TotalWeight,
SUM(Productweight)/1000 AS SumWeight,
Currency,
Deliverybasecost,
ROUND(SUM(Deliverybasecost),2) AS TotalDelCost,
Productsku,
Productname,
Dealstartdate
FROM
[flostream.orders]
LEFT OUTER JOIN mobileheads.surveys
ON mobileheads.surveys.order_number = ExternalReference) as first
LEFT OUTER JOIN flostream.briisk
ON first.dealIDCombo = Uniquereference
GROUP BY
filename,
DealIDCombo,
currency,
Deliverybasecost,
Productname,
Productsku,
dealstartdate
ORDER BY
filename,
Total desc;
Excuse the mess; I don't know which tables these fields belong to. Hopefully this helps. Ask me if you need more explanation!
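The nesting idea can be tried out in isolation with a much smaller example. The sketch below assumes SQLite through Python rather than BigQuery, three cut-down hypothetical tables (orders, surveys, briisk) with invented rows, and no aggregation at all; it only shows that an alias computed in a derived table can be used in the ON clause of the next join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (filename TEXT, dealID TEXT, ExternalReference TEXT);
    CREATE TABLE surveys (order_number TEXT, sales_rule TEXT);
    CREATE TABLE briisk (Uniquereference TEXT, Dealstartdate TEXT);
    -- Invented rows: the second order has no dealID of its own.
    INSERT INTO orders VALUES ('a.csv', 'D1', 'X1'), ('b.csv', NULL, 'X2');
    INSERT INTO surveys VALUES ('X2', 'D2');
    INSERT INTO briisk VALUES ('D1', '2020-01-01'), ('D2', '2020-02-01');
""")

# Step 1 (inner query): join surveys and compute the fallback deal id.
# Step 2 (outer query): DealIDcombo now exists as a column of the derived
# table, so the join to briisk can use it.
rows = conn.execute("""
    SELECT first.filename, first.DealIDcombo, b.Dealstartdate
    FROM (
        SELECT o.filename,
               IFNULL(o.dealID, s.sales_rule) AS DealIDcombo
        FROM orders o
        LEFT OUTER JOIN surveys s ON s.order_number = o.ExternalReference
    ) AS first
    LEFT OUTER JOIN briisk b ON b.Uniquereference = first.DealIDcombo
    ORDER BY first.filename
""").fetchall()

print(rows)
```

In a full query the COUNT and SUM columns would then belong in the outer SELECT, next to the GROUP BY, rather than in the inner one.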

SQL Selecting Distinct Count of items where 2 conditions are met

I am struggling to get a DISTINCT COUNT working with a SQL SELECT DISTINCT.
I'm not sure if I should even be using DISTINCT here. I have got correct results using a subquery, though it is very heavy processing-wise.
This query does what I ultimately want, results-wise (without the weight):
SELECT DISTINCT
product_brandNAME,
product_classNAME,
(SELECT COUNT(productID) FROM products
WHERE products.product_classID = product_class.product_classID
AND products.product_brandID = product_brand.product_brandID) as COUNT
FROM products
JOIN product_brand
JOIN product_class
ON products.product_brandID = product_brand.product_brandID
AND products.product_classID = product_class.product_classID
GROUP BY productID
ORDER BY product_brandNAME
This gets close and is much more efficient, but I can't get the count working: it only counts (obviously) the distinct rows, which gives 1.
SELECT DISTINCT product_brandNAME, product_classNAME, COUNT(*) as COUNT
FROM products
JOIN product_brand
JOIN product_class
ON products.product_brandID = product_brand.product_brandID
AND products.product_classID = product_class.product_classID
GROUP BY productID
ORDER BY product_brandNAME
Any suggestions? I'm sure it's something small; I have been searching the net for hours, to no avail, for an answer that matches on 2 conditions.
Thanks,
Have you tried the following query?
SELECT product_brandNAME
, product_classNAME
, COUNT(*)
FROM products
JOIN product_brand ON products.product_brandID = product_brand.product_brandID
JOIN product_class ON products.product_classID = product_class.product_classID
GROUP BY
product_brandNAME
, product_classNAME
When using GROUP BY you do not need to use a DISTINCT clause. Try the following:
SELECT productID,
product_brandNAME,
product_classNAME,
COUNT(*) as COUNT
FROM products JOIN product_brand ON products.product_brandID = product_brand.product_brandID
JOIN product_class ON products.product_classID = product_class.product_classID
GROUP BY productID,
product_brandNAME,
product_classNAME
ORDER BY product_brandNAME
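For what it's worth, the choice of grouping columns is what decides the count here. A small sketch, assuming SQLite through Python and three invented products, compares grouping by brand and class name with grouping by productID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_brand (product_brandID INTEGER, product_brandNAME TEXT);
    CREATE TABLE product_class (product_classID INTEGER, product_classNAME TEXT);
    CREATE TABLE products (productID INTEGER, product_brandID INTEGER, product_classID INTEGER);
    -- Invented rows: two Acme bolts and one Acme nut.
    INSERT INTO product_brand VALUES (1, 'Acme');
    INSERT INTO product_class VALUES (1, 'Bolt'), (2, 'Nut');
    INSERT INTO products VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2);
""")

# Shared FROM/JOIN part, with one ON clause per JOIN.
JOINS = """
    FROM products
    JOIN product_brand ON products.product_brandID = product_brand.product_brandID
    JOIN product_class ON products.product_classID = product_class.product_classID
"""

# Grouping by the two names gives one row per brand/class pair.
per_pair = conn.execute(
    "SELECT product_brandNAME, product_classNAME, COUNT(*)" + JOINS +
    "GROUP BY product_brandNAME, product_classNAME "
    "ORDER BY product_brandNAME, product_classNAME"
).fetchall()

# Grouping by productID gives one row per product, so every count is 1.
per_product = conn.execute(
    "SELECT productID, COUNT(*)" + JOINS +
    "GROUP BY productID ORDER BY productID"
).fetchall()

print(per_pair)
print(per_product)
```

As long as productID is part of the GROUP BY, each group holds a single product, which matches the count of 1 described in the question.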