SQL Sum returning wrong number - sql

I am adding up the number of tickets sold for a sporting event. The answer should be under 100, but my result is in the thousands.
SELECT Stubhub.Active.Opponent,
SUM(Stubhub.Active.Qty) AS AQty, SUM(Stubhub.Sold.Qty) AS SQty
FROM Stubhub.Active INNER JOIN
Stubhub.Sold ON Stubhub.Active.Opponent = Stubhub.Sold.Opponent
GROUP BY Stubhub.Active.Opponent

This type of problem occurs because the join produces a Cartesian product between the rows of the two tables for each opponent. The solution is to pre-aggregate by opponent:
SELECT a.Opponent, a.AQty, s.SQty
FROM (SELECT a.Opponent, SUM(a.Qty) as AQty
FROM Stubhub.Active a
GROUP BY a.Opponent
) a INNER JOIN
(SELECT s.Opponent, SUM(s.QTY) as SQty
FROM Stubhub.Sold s
GROUP BY s.Opponent
) s
ON a.Opponent = s.Opponent;
Notice that in this case, you do not need the aggregation in the outer query.
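To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (active, sold) and the sample rows are made up for illustration and are not the asker's data; the point is only to show how joining before aggregating inflates both sums, while pre-aggregating does not.

```python
import sqlite3

# Hypothetical sample data: 3 active listings and 2 sold listings for one opponent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE active (opponent TEXT, qty INTEGER);
    CREATE TABLE sold   (opponent TEXT, qty INTEGER);
    INSERT INTO active VALUES ('Lions', 2), ('Lions', 4), ('Lions', 6);
    INSERT INTO sold   VALUES ('Lions', 1), ('Lions', 3);
""")

# Join first, aggregate second: 3 x 2 = 6 joined rows, so every qty is counted repeatedly.
naive = conn.execute("""
    SELECT a.opponent, SUM(a.qty) AS aqty, SUM(s.qty) AS sqty
    FROM active a
    JOIN sold s ON a.opponent = s.opponent
    GROUP BY a.opponent
""").fetchall()

# Aggregate first, join second: one row per opponent on each side of the join.
fixed = conn.execute("""
    SELECT a.opponent, a.aqty, s.sqty
    FROM (SELECT opponent, SUM(qty) AS aqty FROM active GROUP BY opponent) a
    JOIN (SELECT opponent, SUM(qty) AS sqty FROM sold GROUP BY opponent) s
      ON a.opponent = s.opponent
""").fetchall()

print(naive)  # [('Lions', 24, 12)]
print(fixed)  # [('Lions', 12, 4)]
```

In the naive result, each active row is counted once per sold row (12 x 2 = 24) and each sold row once per active row (4 x 3 = 12).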

Related

Prevent duplicate rows when using LEFT JOIN in Postgres without DISTINCT

I have 4 tables:
Item
Purchase
Purchase Item
Purchase Discount
In these tables, the Purchase Discount has two entries, all the others have only one entry. But when I query them, due to the LEFT JOIN, I'm getting duplicate entries.
This query will be running in a large database, and I heard using DISTINCT will reduce the performance. Is there any other way I can remove duplicates without using DISTINCT?
Here is the SQL Fiddle.
The result shows:
[{"item_id":1,"purchase_items_ids":[1234,1234],"total_sold":2}]
But the result should come as:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}]
Using correlated subquery instead of LEFT JOIN:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
SUM((SELECT SUM(pd.discount_amount) FROM purchase_discounts pd
WHERE pd.purchase_id = purchase.id)) as discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
WHERE purchase.id = 200
GROUP by purchase_items.item_id
) as t
INNER JOIN items i ON i.id = t.item_id
) AS p_values;
db<>fiddle demo
Output:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
First, I would suggest removing INNER JOIN items i ON i.id = t.item_id from the query; there is no reason for it to be there.
Then, instead of left joining the Purchase_Discounts table, use a subquery to get the discount_amount (as mentioned in Lukasz Szozda's answer).
If there is no discount for a product, the discount_amount column will display NULL. To avoid that, you can use COALESCE() as below:
COALESCE(SUM((select sum(discount_amount) from purchase_discounts
where purchase_discounts.purchase_id = purchase.id)),0) as discount_amount
Db-Fiddle:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
SUM((select sum(discount_amount) from purchase_discounts
where purchase_discounts.purchase_id = purchase.id)) as discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
WHERE
purchase.id = 200
GROUP by
purchase_items.item_id
) as t
) AS p_values;
Output:
array_to_json
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
db<>fiddle here
The core problem is that your LEFT JOIN multiplies rows. See:
Two SQL LEFT JOINS produce incorrect result
Aggregate discounts to a single row before the join. Or use an (uncorrelated) subquery expression:
SELECT json_agg(items)
FROM (
SELECT pi.item_id
, array_agg(pi.id) AS purchase_items_ids
, sum(pi.sold) AS total_sold
,(SELECT COALESCE(sum(pd.discount_amount), 0)
FROM purchase_discounts pd
WHERE pd.purchase_id = 200) AS discount_amount
FROM purchase_items pi
WHERE pi.purchase_id = 200
GROUP BY 1
) AS items;
Result:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
db<>fiddle here
I added a couple of additional improvements:
Assuming referential integrity enforced by FK constraints, we don't need to involve the tables purchase and items at all.
Removed a subquery level doing nothing.
Using json_agg() instead of array_to_json(array_agg()).
Added COALESCE() to output 0 instead of NULL for no discounts.
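The effect of the join on the aggregates can be reproduced in a small sketch with Python's sqlite3 module. This is an illustration only: SQLite stands in for Postgres (so array_agg and json_agg are left out), and the two discount amounts 5 and 7 are assumed values chosen to add up to the 12 shown in the results above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_items (id INTEGER, purchase_id INTEGER,
                                 item_id INTEGER, sold INTEGER);
    CREATE TABLE purchase_discounts (id INTEGER, purchase_id INTEGER,
                                     discount_amount INTEGER);
    INSERT INTO purchase_items VALUES (1234, 200, 1, 1);
    -- Assumed split of the total discount of 12 into two rows.
    INSERT INTO purchase_discounts VALUES (1, 200, 5), (2, 200, 7);
""")

# Joining the two discount rows duplicates the single purchase item,
# so SUM(sold) is counted twice.
joined = conn.execute("""
    SELECT pi.item_id, SUM(pi.sold), SUM(pd.discount_amount)
    FROM purchase_items pi
    LEFT JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
    WHERE pi.purchase_id = 200
    GROUP BY pi.item_id
""").fetchall()

# A scalar subquery computes the discount separately and leaves the
# purchase_items rows untouched.
subquery = conn.execute("""
    SELECT pi.item_id, SUM(pi.sold),
           (SELECT COALESCE(SUM(pd.discount_amount), 0)
            FROM purchase_discounts pd
            WHERE pd.purchase_id = 200)
    FROM purchase_items pi
    WHERE pi.purchase_id = 200
    GROUP BY pi.item_id
""").fetchall()

print(joined)    # [(1, 2, 12)]
print(subquery)  # [(1, 1, 12)]
```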
Since discounts apply to the purchase in your model, not to individual items, it doesn't make sense to output discount_amount for every single item. Consider this query instead to return an array of items and a single, separate discount_amount:
SELECT json_build_object(
'items'
, json_agg(items)
, 'discount_amount'
, (SELECT COALESCE(sum(pd.discount_amount), 0)
FROM purchase_discounts pd
WHERE pd.purchase_id = 200)
)
FROM (
SELECT pi.item_id
, array_agg(pi.id) AS purchase_items_ids
, sum(pi.sold) AS total_sold
FROM purchase_items pi
WHERE pi.purchase_id = 200
GROUP BY 1
) AS items;
Result:
{"items" : [{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}], "discount_amount" : 12}
db<>fiddle here
Using json_build_object() to assemble the JSON object.
Your example with a single item in the purchase isn't too revealing. I added a purchase with multiple items and no discount to my fiddle.
If you can have multiple values only in the purchase_discounts table, then a subquery that aggregates multiple purchase_discounts rows into one before the join can solve the problem:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
X.discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
LEFT JOIN (SELECT purchase_id, sum(purchase_discounts.discount_amount) AS discount_amount FROM purchase_discounts GROUP BY purchase_id) X ON X.purchase_id = purchase.id
WHERE
purchase.id = 200
GROUP by
purchase_items.item_id,
X.discount_amount
) as t
INNER JOIN items i ON i.id = t.item_id
) AS p_values;
The LEFT JOIN is not causing your duplicates. I understand why you need it, since there may not be any discounts, but for the data provided, changing to an inner join produces the same result. You are getting duplicate entries because you use ARRAY_AGG(purchase_items.id). Further, with the data presented, the tables item and purchase are not necessary. You can use the window version of sum and distinct on to reduce the duplication of purchase_id and eliminate the mentioned tables. Finally, the middle select ... ) t can be removed completely, resulting in (see demo):
select array_to_json(array_agg(p_values))
from (select distinct on (pi.item_id, pi.id)
pi.item_id
, pi.id purchase_items_ids
, sum(pi.sold) over (partition by pi.item_id) total_sold
, sum(pd.discount_amount) over(partition by pi.item_id) discount_amount
from purchase_items pi
left join purchase_discounts pd
on pd.purchase_id = pi.purchase_id
order by pi.item_id, pi.id
) as p_values;
I think the LEFT JOIN is not the cause, because an INNER JOIN returns the same result as the LEFT JOIN. The discounts table has 2 rows for purchase_id = 200, so you can use ROW_NUMBER() with PARTITION BY, like this:
ROW_NUMBER() OVER(PARTITION BY purchase_items.id order by purchase_items.id) rn
then select the rows where rn = 1.
For the SUM function you will need to change your query; I think you can use PARTITION BY there as well.
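A rough sketch of the ROW_NUMBER() idea, run through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later) and uses made-up rows matching the shape of the question's tables; the discount amounts are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_items (id INTEGER, purchase_id INTEGER,
                                 item_id INTEGER, sold INTEGER);
    CREATE TABLE purchase_discounts (id INTEGER, purchase_id INTEGER,
                                     discount_amount INTEGER);
    INSERT INTO purchase_items VALUES (1234, 200, 1, 1);
    INSERT INTO purchase_discounts VALUES (1, 200, 5), (2, 200, 7);
""")

# The join yields two rows for purchase item 1234 (one per discount row).
# Numbering the rows within each purchase item and keeping rn = 1
# leaves exactly one row per purchase item.
deduplicated = conn.execute("""
    SELECT item_id, id, sold
    FROM (
        SELECT pi.item_id, pi.id, pi.sold,
               ROW_NUMBER() OVER (PARTITION BY pi.id ORDER BY pi.id) AS rn
        FROM purchase_items pi
        LEFT JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
    ) AS numbered
    WHERE rn = 1
""").fetchall()

print(deduplicated)  # [(1, 1234, 1)]
```

Note that this removes the duplicated item rows but also discards all but one discount row, so the discount total still has to be computed separately.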

SQL dividing a count from one table by a number from a different table

I am struggling with taking a Count() from one table and dividing it by a correlating number from a different table in Microsoft SQL Server.
Here is a fictional example of what I'm trying to do
Let's say I have a table of orders. One column in there is states.
I have a second table that has a column for states, and a second column for each state's population.
I'd like to find the orders per population for each state, but I have struggled to get my query right.
Here is what I have so far:
SELECT Orders.State, Count(*)/
(SELECT StatePopulations.Population FROM Orders INNER JOIN StatePopulations
on Orders.State = StatePopulations.State
WHERE Orders.state = StatePopulations.State )
FROM Orders INNER JOIN StatePopulations
ON Orders.state = StatePopulations.State
GROUP BY Orders.state
So far I'm contending with an error that says my sub query is returning multiple results for each state, but I'm newer to SQL and don't know how to overcome it.
If you really want a correlated sub-query, then this should do it...
(You don't need to join both tables in either the inner or outer query; the correlation in the inner query's WHERE clause does the 'join'.)
SELECT
Orders.state,
COUNT(*) / (SELECT population FROM StatePopulations WHERE state = Orders.state)
FROM
Orders
GROUP BY
Orders.state
Personally, I'd just join them and use MAX()...
SELECT
Orders.state,
COUNT(*) / MAX(StatePopulations.population)
FROM
Orders
INNER JOIN
StatePopulations
ON StatePopulations.state = Orders.state
GROUP BY
Orders.state
Or aggregate your orders before you join...
SELECT
Orders.state,
Orders.order_count / StatePopulations.population
FROM
(
SELECT
Orders.state,
COUNT(*) AS order_count
FROM
Orders
GROUP BY
Orders.state
)
Orders
INNER JOIN
StatePopulations
ON StatePopulations.state = Orders.state
(Please forgive typos and smelling pistakes, I'm doing this on a phone.)
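One thing worth checking when trying these queries: if the count and the population are both integers, the division is integer division and the result is truncated, which for orders per population usually means 0. The sketch below uses Python's sqlite3 module with made-up states and populations to show the "aggregate before you join" variant with a multiplication by 1.0 to force a decimal result. SQLite is only a stand-in here; SQL Server also truncates integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (id INTEGER, state TEXT);
    CREATE TABLE StatePopulations (state TEXT, population INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO Orders VALUES (1, 'OH'), (2, 'OH'), (3, 'OH'), (4, 'TX');
    INSERT INTO StatePopulations VALUES ('OH', 1000), ('TX', 2000);
""")

# Integer divided by integer truncates toward zero.
truncated = conn.execute(
    "SELECT COUNT(*) / 1000 FROM Orders WHERE state = 'OH'"
).fetchone()[0]

# Aggregate the orders first, then join; 1.0 * forces non-integer division.
per_capita = conn.execute("""
    SELECT o.state, 1.0 * o.order_count / p.population AS orders_per_person
    FROM (SELECT state, COUNT(*) AS order_count
          FROM Orders
          GROUP BY state) o
    INNER JOIN StatePopulations p ON p.state = o.state
    ORDER BY o.state
""").fetchall()

print(truncated)   # 0
print(per_capita)  # OH has 3 orders per 1000 people, TX has 1 per 2000
```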

Select sum and inner join

I have two tables
Bills: id amount reference
Transactions: id reference amount
The following SQL query
SELECT
*,
(SELECT SUM(amount)
FROM transactions
WHERE transactions.reference = bills.reference) AS paid
FROM bills
GROUP BY id HAVING paid<amount
was meant to select some rows from table Bills, adding a column paid with the sum of the amounts of the related transactions.
However, it only works when there is at least one transaction for each bill. Otherwise, no line for a transaction-less bill is returned.
Probably, that's because I should have done an inner join!
So I try the following:
SELECT
*,
(SELECT SUM(transactions.amount)
FROM transactions
INNER JOIN bills ON transactions.reference = bills.reference) AS paid
FROM bills
GROUP BY id
HAVING paid < amount
However, this returns the same value of paid for all rows! What am I doing wrong ?
Use a left join instead of a subquery:
select b.id, b.amount, b.paid, sum(t.amount) as transactionamount
from bills b
left join transactions t on t.reference = b.reference
group by b.id, b.amount, b.paid
having b.paid < b.amount
Edit:
To compare the sum of transactions to the amount, handle the null value that you get when there are no transactions:
having isnull(sum(t.amount), 0) < b.amount
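Here is a small runnable sketch of the left join approach using Python's sqlite3 module. The rows are invented for illustration, and the standard COALESCE() is used in place of ISNULL(), which is specific to SQL Server. Bill 3 has no transactions at all and should still be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bills (id INTEGER PRIMARY KEY, amount INTEGER, reference TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, reference TEXT, amount INTEGER);
    -- Hypothetical data: bill 1 is fully paid, bill 2 partly paid, bill 3 unpaid.
    INSERT INTO bills VALUES (1, 100, 'A'), (2, 50, 'B'), (3, 80, 'C');
    INSERT INTO transactions VALUES (1, 'A', 60), (2, 'A', 40), (3, 'B', 20);
""")

# LEFT JOIN keeps bills without transactions; COALESCE turns their NULL sum
# into 0 so the comparison in HAVING does not silently drop them.
unpaid = conn.execute("""
    SELECT b.id, b.amount, COALESCE(SUM(t.amount), 0) AS paid
    FROM bills b
    LEFT JOIN transactions t ON t.reference = b.reference
    GROUP BY b.id, b.amount
    HAVING COALESCE(SUM(t.amount), 0) < b.amount
    ORDER BY b.id
""").fetchall()

print(unpaid)  # [(2, 50, 20), (3, 80, 0)]
```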
You need a RIGHT JOIN to include all bill rows.
EDIT
So the final query will be
SELECT
*,
(SELECT COALESCE(SUM(transactions.amount), 0)
FROM transactions
WHERE transactions.reference = bills.reference) AS paid
FROM bills
HAVING paid < amount
I know this thread is old, but I came here today because I encountered the same problem.
Please see another post with the same question:
Sum on a left join SQL
As the answer says, use GROUP BY on the left table. This way you get all the records out from left table, and it sums the corresponding rows from right table.
Try to use this:
SELECT
bills.*,
SUM(transactions.amount)
FROM
bills
LEFT JOIN
transactions
ON
bills.reference = transactions.reference
AND transactions.amount > 0
GROUP BY
bills.id

Select SUM from multiple tables

I keep getting the wrong sum value when I join 3 tables.
Here is a pic of the ERD of the table:
(Original here: http://dl.dropbox.com/u/18794525/AUG%207%20DUMP%20STAN.png )
Here is the query:
select SUM(gpCutBody.actualQty) as cutQty , SUM(gpSewBody.quantity) as sewQty
from jobOrder
inner join gpCutHead on gpCutHead.joNum = jobOrder.joNum
inner join gpSewHead on gpSewHead.joNum = jobOrder.joNum
inner join gpCutBody on gpCutBody.gpCutID = gpCutHead.gpCutID
inner join gpSewBody on gpSewBody.gpSewID = gpSewHead.gpSewID
If you are only interested in the quantities of cuts and sews for all orders, the simplest way to do it would be like this:
select (select SUM(gpCutBody.actualQty) from gpCutBody) as cutQty,
(select SUM(gpSewBody.quantity) from gpSewBody) as sewQty
(This assumes that cuts and sews will always have associated job orders.)
If you want to see a breakdown of cuts and sews by job order, something like this might be preferable:
select joNum, SUM(actualQty) as cutQty, SUM(quantity) as sewQty
from (select joNum, actualQty, 0 as quantity
from gpCutBody
union all
select joNum, 0 as actualQty, quantity
from gpSewBody) sc
group by joNum
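A minimal sketch of the UNION ALL approach, run through Python's sqlite3 module. Like the query above, it assumes the body tables carry joNum directly, which may not match the real ERD; the sample rows are invented. Job order 2 has cuts but no sews and still appears in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed simplified schema: joNum stored on the body tables.
    CREATE TABLE gpCutBody (joNum INTEGER, actualQty INTEGER);
    CREATE TABLE gpSewBody (joNum INTEGER, quantity INTEGER);
    INSERT INTO gpCutBody VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO gpSewBody VALUES (1, 4), (1, 6), (1, 2);
""")

# Stack the two tables instead of joining them, padding the missing
# measure with 0, so no row is ever paired with (and multiplied by) another.
totals = conn.execute("""
    SELECT joNum, SUM(actualQty) AS cutQty, SUM(quantity) AS sewQty
    FROM (SELECT joNum, actualQty, 0 AS quantity
          FROM gpCutBody
          UNION ALL
          SELECT joNum, 0 AS actualQty, quantity
          FROM gpSewBody) sc
    GROUP BY joNum
    ORDER BY joNum
""").fetchall()

print(totals)  # [(1, 15, 12), (2, 7, 0)]
```

Joining the two body tables on joNum instead would pair the 2 cut rows of job order 1 with its 3 sew rows, giving 45 and 24 rather than 15 and 12.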
Mark's approach is a good one. I want to suggest the alternative of doing the group by's before the union, simply because this can be a more general approach for summing along multiple dimensions.
Your problem is that you have two dimensions that you want to sum along, and you are getting a cross product of the values in the join.
select coalesce(act.joNum, q.joNum) as joNum, act.quantity as ActualQty, q.quantity as Quantity
from (select joNum, sum(actualQty) as quantity
from gpCutBody
group by joNum
) act full outer join
(select joNum, sum(quantity) as quantity
from gpSewBody
group by joNum
) q
on act.joNum = q.joNum
(I have kept Mark's assumption that doing this by joNum is the desired output.)

SQL Selecting Distinct Count of items where 2 conditions are met

I am struggling to get a DISTINCT COUNT working with SQL DISTINCT SELECT
Not sure if I should even be using distinct here, but I have got it correct using a subquery, though it is very heavy processing wise.
This query does what I ultimately want results wise (without the weight)
SELECT DISTINCT
product_brandNAME,
product_classNAME,
(SELECT COUNT(productID) FROM products
WHERE products.product_classID = product_class.product_classID
AND products.product_brandID = product_brand.product_brandID) as COUNT
FROM products
JOIN product_brand
JOIN product_class
ON products.product_brandID = product_brand.product_brandID
AND products.product_classID = product_class.product_classID
GROUP BY productID
ORDER BY product_brandNAME
This gets close, and is much more efficient, but I can't get the count working, it only counts (obviously) the distinct count which is 1.
SELECT DISTINCT product_brandNAME, product_classNAME, COUNT(*) as COUNT
FROM products
JOIN product_brand
JOIN product_class
ON products.product_brandID = product_brand.product_brandID
AND products.product_classID = product_class.product_classID
GROUP BY productID
ORDER BY product_brandNAME
Any suggestions? I'm sure it's something small; I have been researching the net for hours, to no avail, for a way to match on 2 conditions.
Thanks,
Have you tried the following query?
Edit
SELECT product_brandNAME
, product_classNAME
, COUNT(*)
FROM products
JOIN product_brand ON products.product_brandID = product_brand.product_brandID
JOIN product_class ON products.product_classID = product_class.product_classID
GROUP BY
product_brandNAME
, product_classNAME
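To check what grouping by the two name columns returns, here is a short sketch using Python's sqlite3 module. The brands, classes and products are made-up sample rows, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_brand (product_brandID INTEGER, product_brandNAME TEXT);
    CREATE TABLE product_class (product_classID INTEGER, product_classNAME TEXT);
    CREATE TABLE products (productID INTEGER, product_brandID INTEGER,
                           product_classID INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO product_brand VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO product_class VALUES (1, 'Shoes'), (2, 'Hats');
    INSERT INTO products VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 2);
""")

# Grouping by brand and class (not by productID) yields one row per
# brand/class combination, with the number of products in each.
counts = conn.execute("""
    SELECT b.product_brandNAME, c.product_classNAME, COUNT(*) AS product_count
    FROM products p
    JOIN product_brand b ON p.product_brandID = b.product_brandID
    JOIN product_class c ON p.product_classID = c.product_classID
    GROUP BY b.product_brandNAME, c.product_classNAME
    ORDER BY b.product_brandNAME, c.product_classNAME
""").fetchall()

print(counts)  # [('Acme', 'Hats', 1), ('Acme', 'Shoes', 2), ('Bolt', 'Hats', 1)]
```

Grouping by productID, as in the question, puts every product in its own group, which is why the count there is always 1.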
When using GROUP BY you do not need to use a DISTINCT clause. Try the following:
SELECT productID,
product_brandNAME,
product_classNAME,
COUNT(*) as COUNT
FROM products JOIN product_brand ON products.product_brandID = product_brand.product_brandID
JOIN product_class ON products.product_classID = product_class.product_classID
GROUP BY productID,
product_brandNAME,
product_classNAME
ORDER BY product_brandNAME