Issue with SUM with two LEFT JOINs - sql

In the query below I have trouble when I have two left joins. What happens is the sums are incorrect (elevated) with both left joins. When I remove the second left join, the query runs correctly. How can I run this query with the second left join?
SELECT o.IdNoLocation,
COUNT(DISTINCT o.EncodedId) AS "Frequency",
ISNULL(SUM(o.Subtotal), 0) AS "Spend",
ISNULL(SUM(os.Amount), 0) AS "Surcharges",
ISNULL(SUM(od.Amount), 0) AS "Discounts"
FROM ((tblOrder o
LEFT JOIN tblOrderSurcharge os ON o.OrderId=os.OrderId)
LEFT JOIN tblOrderDiscount od ON o.OrderId=od.OrderId)
WHERE
o.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND o.IdNoLocation <> 'X'
GROUP BY o.IdNoLocation

This is the expected result when there is more than one related row in either tblOrderSurcharge or tblOrderDiscount... you've got an implicit semi-cross join between the rows from those two tables.
There's a couple of approaches to "fixing" this, if you need the totals returned in a single statement.
One option is to use inline views to perform the totals, and then combine the rows. Essentially, run separate queries to get the totals from each table, and then combine the results on a unique key. For example:
SELECT o.IdNoLocation
, o.Frequency
, ISNULL(o.Spend, 0) AS Spend
, ISNULL(s.Surcharges), 0) AS Surcharges
, ISNULL(s.Discounts), 0) AS Discounts
FROM ( SELECT ooo.IdNoLocation
, COUNT(DISTINCT ooo.EncodedId) AS Frequency
, SUM(ooo.Subtotal) AS Spend
FROM tblOrder ooo
WHERE ooo.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND ooo.IdNoLocation <> 'X'
GROUP BY ooo.IdNoLocation
) o
LEFT
JOIN ( SELECT oso.IdNoLocation
, SUM(oss.Amount) AS Surcharges
FROM tblOrderSurcharge oss
JOIN tblOrder oso
ON oso.OrderId = oss.OrderId
WHERE oso.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND oso.IdNoLocation <> 'X'
GROUP BY oso.IdNoLocation
) s
ON s.IdNoLocation = o.IdNoLocation
LEFT
JOIN ( SELECT odo.IdNoLocation
, SUM(odd.Amount) AS Discounts
FROM tblOrderDiscount odd
JOIN tblOrder odo
ON odo.OrderId = odd.OrderId
WHERE odo.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND odo.IdNoLocation <> 'X'
GROUP BY odo.IdNoLocation
) d
ON d.IdNoLocation = o.IdNoLocation
(This isn't tested, just desk checked. Run each of the inline view queries (o, s, d) and verify the results from each of those queries is as you expect. Then, run the entire query, to combine the rows... in the outer query, we'll do the "outer" joins and deal with the "missing" rows and replacing NULLs with zeros)
Another option is to use correlated subqueries in the SELECT list.

To see why this doesn't work, run this query:
SELECT o.IdNoLocation,
o.EncodedId AS "Frequency",
o.Subtotal AS "Spend",
os.Amount AS "Surcharges",
od.Amount AS "Discounts"
FROM ((tblOrder o
LEFT JOIN tblOrderSurcharge os ON o.OrderId=os.OrderId)
LEFT JOIN tblOrderDiscount od ON o.OrderId=od.OrderId)
WHERE
o.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND o.IdNoLocation <> 'X'
The 'correct' answer is to use one table for both surcharges and discounts. I would assume that's no longer an option, but you can fake it with a view that performs a UNION ALL between the two tables.

Related

SQL subquery with join to main query

I have this:
SELECT
SU.FullName as Salesperson,
COUNT(DS.new_dealsheetid) as Units,
SUM(SO.New_profits_sales_totaldealprofit) as TDP,
SUM(SO.New_profits_sales_totaldealprofit) / COUNT(DS.new_dealsheetid) as PPU,
-- opportunities subquery
(SELECT COUNT(*) FROM Opportunity O
LEFT JOIN Account A ON O.AccountId = A.AccountId
WHERE A.OwnerId = SU.SystemUserId AND
YEAR(O.CreatedOn) = 2022)
-- /opportunities subquery
FROM New_dealsheet DS
LEFT JOIN SalesOrder SO ON DS.New_DSheetId = SO.SalesOrderId
LEFT JOIN New_salespeople SP ON DS.New_SalespersonId = SP.New_salespeopleId
LEFT JOIN SystemUser SU ON SP.New_SystemUserId = SU.SystemUserId
WHERE
YEAR(SO.New_purchaseordersenddate) = 2022 AND
SP.New_SalesGroupIdName = 'LO'
GROUP BY SU.FullName
I'm getting an error from the subquery:
Column 'SystemUser.SystemUserId' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Is it possible to use the SystemUser table join from the main query in this way?
As has been mentioned extensively in the comments, the error is actually telling you the problem; SU.SystemUserId isn't in the GROUP BY nor in an aggregate function, and it appears in the SELECT of the query (albeit in the WHERE of a correlated subquery). Any columns in the SELECT must be either aggregated or in the GROUP BY when using one of the other for a query scope. As the column in question isn't aggregated nor in the GROUP BY, the error occurs.
There are, however, other problems. Like mentioned ikn the comments too, your LEFT JOINs make little sense, as many of the tables you LEFT JOIN to require a column in that table to have a non-NULL value; it is impossible for a column to have a non-NULL value if a row was not found.
You also use syntax like YEAR(<Column Name>) = <int Value> in the WHERE; this is not SARGable, and thus should be avoided. Use explicit date boundaries instead.
I assume here that SU.SystemUserId is a primary key, and so should be in the GROUP BY. This is probably a good thing anyway, as a person's full name isn't something that can be used to determine who a person is on their own (take this from someone who shared their name, date of birth and post code with another person in their youth; it caused many problems on the rudimentary IT systems of the time). This results in a query like this:
SELECT SU.FullName AS Salesperson,
COUNT(DS.new_dealsheetid) AS Units,
SUM(SO.New_profits_sales_totaldealprofit) AS TDP,
SUM(SO.New_profits_sales_totaldealprofit) / COUNT(DS.new_dealsheetid) AS PPU,
(SELECT COUNT(*)
FROM dbo.Opportunity O
JOIN dbo.Account A ON O.AccountId = A.AccountId --A.OwnerID must have a non-NULL value, so why was this a LEFT JOIN?
WHERE A.OwnerId = SU.SystemUserId
AND O.CreatedOn >= '20220101' --Don't use YEAR(<Column Name>) = <int Value> syntax, it isn't SARGable
AND O.CreatedOn < '20230101') AS SomeColumnAlias
FROM dbo.New_dealsheet DS
JOIN dbo.SalesOrder SO ON DS.New_DSheetId = SO.SalesOrderId --SO.New_purchaseordersenddate must have a non-NULL value, so why was this a LEFT JOIN?
JOIN dbo.New_salespeople SP ON DS.New_SalespersonId = SP.New_salespeopleId --SP.New_SalesGroupIdName must have a non-NULL value, so why was this a LEFT JOIN?
LEFT JOIN dbo.SystemUser SU ON SP.New_SystemUserId = SU.SystemUserId --This actually looks like it can be a LEFT JOIN.
WHERE SO.New_purchaseordersenddate >= '20220101' --Don't use YEAR(<Column Name>) = <int Value> syntax, it isn't SARGable
AND SO.New_purchaseordersenddate < '20230101'
AND SP.New_SalesGroupIdName = 'LO'
GROUP BY SU.FullName,
SU.SystemUserId;
Doing such a sub-query is bad on a performance-wise point of view
better do it like this:
SELECT
SU.FullName as Salesperson,
COUNT(DS.new_dealsheetid) as Units,
SUM(SO.New_profits_sales_totaldealprofit) as TDP,
SUM(SO.New_profits_sales_totaldealprofit) / COUNT(DS.new_dealsheetid) as PPU,
SUM(csq.cnt) as Count
FROM New_dealsheet DS
LEFT JOIN SalesOrder SO ON DS.New_DSheetId = SO.SalesOrderId
LEFT JOIN New_salespeople SP ON DS.New_SalespersonId = SP.New_salespeopleId
LEFT JOIN SystemUser SU ON SP.New_SystemUserId = SU.SystemUserId
-- Moved subquery as sub-join
LEFT JOIN (SELECT a.OwnerId, YEAR(o.CreatedOn) as year, COUNT(*) cnt FROM Opportunity O
LEFT JOIN Account A ON O.AccountId = A.AccountId
GROUP BY a.OwnerId, YEAR(o.CreatedOn) as csq ON csq.OwnerId = su.SystemUserId and csqn.Year = 2022
WHERE
YEAR(SO.New_purchaseordersenddate) = 2022 AND
SP.New_SalesGroupIdName = 'LO'
GROUP BY SU.FullName
So you have a nice join and a clean result
The query above is untested

LEFT JOIN not keeping only records that occur in a SELECT query

I have the following SQL select statement that I use to get a subset of products, or wines:
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour,
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
The length of this table generated is 3,905. I want to get all the transactional data for these products.
At the moment I'm using this select statement
SELECT c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
LEFT JOIN Dim.Calendar AS c
ON f.SkDateId = c.SkDateId
LEFT JOIN (
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour,
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) AS s
ON s.id = f.SkProductVariantId
WHERE c.CalDate LIKE '%2019%'
The calendar dates are correct, but the number of unique products returned is 5,648, rather than the expected 3,905 from the select query.
Why does my LEFT JOIN on the first select query not work as I expect it to, please?
Thanks for any help!
If you want all the rows form your query, it needs to be the first reference in the LEFT JOIN. Then, I am guessing that you want transaction in 2019:
select . . .
from (SELECT pv.SkProdVariantId AS id, pa.Colour AS colour,
FROM Dim.ProductVariant pv JOIN
ProductAttributes_new pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) s LEFT JOIN
(fact.FTransactions f JOIN
Dim.Calendar c
ON f.SkDateId = c.SkDateId AND
c.CalDate >= '2019-01-01' AND
c.CalDate < '2020-01-01'
)
ON s.id = f.SkProductVariantId;
Note that this assumes that CalDate is really a date and not a string. LIKE should only be used on strings.
You misunderstand somehow how outer joins work. See Gordon's answer and my request comment on that.
As to the task: It seems you want to select transactions of 2019, but you want to restrict your results to wine products. We typically restrict query results in the WHERE clause. You can use IN or EXISTS for that.
SELECT
c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
INNER JOIN Dim.Calendar AS c ON f.SkDateId = c.SkDateId
WHERE DATEPART(YEAR, c.CalDate) = 2019
AND f.SkProductVariantId IN
(
SELECT pv.SkProdVariantId
FROM Dim.ProductVariant AS pv
WHERE pv.ProdTypeName = 'Wines'
);
(I've removed the join to ProductAttributes_new, because it doesn't seem to play any part in this query.)

Rows which appeared only once in query results

I want to have rows which appeared in 25th of February and did not appear in 6th of March. I've tried to do such query:
SELECT favfd.day, dv.title, array_to_string(array_agg(distinct "host_name"), ',') AS affected_hosts , htmltoText(dsol.fix),
count(dv.title) FROM fact_asset_vulnerability_finding_date favfd
INNER JOIN dim_vulnerability dv USING(vulnerability_id)
INNER JOIN dim_asset USING(asset_id)
INNER JOIN dim_vulnerability_solution USING(vulnerability_id)
INNER JOIN dim_solution dsol USING(solution_id)
INNER JOIN dim_solution_highest_supercedence dshs USING (solution_id)
WHERE (favfd.day='2018-02-25' OR favfd.day='2018-03-06') AND
dsol.solution_type='PATCH' AND dshs.solution_id=dshs.superceding_solution_id
GROUP BY favfd.day, dv.title, host_name, dsol.fix
ORDER BY favfd.day, dv.title
which gave me following results:
results
I've read that I need to add something like "HAVING COUNT(*)=1" but like you can see in query results my count columns looks quite weird. Here is my results with that line added:
results with having
Can you advice me what I am doing wrong?
One way is to use a HAVING clause to assert your date criteria:
SELECT
dv.title,
array_to_string(array_agg(distinct "host_name"), ',') AS affected_hosts,
htmltoText(dsol.fix),
count(dv.title)
FROM fact_asset_vulnerability_finding_date favfd
INNER JOIN dim_vulnerability dv USING(vulnerability_id)
INNER JOIN dim_asset USING(asset_id)
INNER JOIN dim_vulnerability_solution USING(vulnerability_id)
INNER JOIN dim_solution dsol USING(solution_id)
INNER JOIN dim_solution_highest_supercedence dshs USING (solution_id)
WHERE
dsol.solution_type = 'PATCH' AND
dshs.solution_id = dshs.superceding_solution_id
GROUP BY dv.title, dsol.fix
HAVING
SUM(CASE WHEN favfd.day = '2018-02-25' THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN favfd.day = '2018-03-06' THEN 1 ELSE 0 END) = 0
ORDER BY favfd.day, dv.title
The only major changes I made were to remove host_name from the GROUP BY clause, because you use this column in aggregate, in the select clause. And I added the HAVING clause to check your logic.
The change you should make is to replace your implicit join syntax with explicit joins. Putting join criteria into the WHERE clause is considered bad practice nowadays.

SQL sum aggregate function gives wrong results

My data is given below
Right answer is Sum = 601,050.00
But SQL sum aggregate function gives me wrong answer that is 5078150.00000
15,000.00 27,950.00 24,750.00 11,550.00 7,400.00 7,500.00 14,650.00 12,500.00 32,800.00 35,700.00 94,100.00 10,100.00 19,700.00 22,100.00 35,450.00 28,050.00 50,150.00 69,750.00 13,800.00 3,600.00 18,600.00 2,350.00 7,200.00 21,600.00 7,700.00 4,500.00 2,500.00
select sum(SO_SalesOrder.OrderTotal),l.Name as [Store Name]
From SO_SalesOrder inner join BASE_Location l on
SO_SalesOrder.LocationId = l.LocationId
inner join SO_SalesOrder_Line on SO_SalesOrder.SalesOrderId =
SO_SalesOrder_Line.SalesOrderId
inner join BASE_Product on BASE_Product.ProdId =
SO_SalesOrder_Line.ProdId
inner join BASE_Category on BASE_Category.CategoryId =
BASE_Product.CategoryId
where SO_SalesOrder.OrderDate >= '2018-02-01' and
SO_SalesOrder.OrderDate <= '2018-02-28' and BASE_Category.Name = '1MHNZ'
group by l.Name
There is likely to be a problem with one (or more) of your joins, maybe you have duplicate rows or the joining conditions are not OK.
Remove the group by l.Name, the SUM() aggregate and see if the returned values for SO_SalesOrder.OrderTotal are what you are expecting them to be (you might need to filter with a particular l.Name in a WHERE clause). It's very likely you will see duplicate amounts, or amounts you are not considering when arriving to the value 601,050.00.
If so, try joining the tables 1 by 1 and see which ones are making your rows go comando.
In my opinion your problem depends on the logic of the query.
You have a master-detail relationship between SO_SalesOrder and SO_SalesOrder_line joined by SalesOrderId column.
So if you have three lines in your order you will sum up three times the same OrderTotal.
try with something like this:
select sum(SO_SalesOrder.OrderTotal) Total, l.Name as [Store Name]
From SO_SalesOrder
join BASE_Location l on SO_SalesOrder.LocationId = l.LocationId
where SO_SalesOrder.OrderDate >= '2018-02-01' and SO_SalesOrder.OrderDate <= '28-02-2018'
and exists (
select 0 x
From SO_SalesOrder_Line
join BASE_Product on BASE_Product.ProdId = SO_SalesOrder_Line.ProdId
join BASE_Category on BASE_Category.CategoryId = BASE_Product.CategoryId
where BASE_Category.Name = '1MHNZ'
and SO_SalesOrder_Line.SalesOrderId = SO_SalesOrder.SalesOrderId
)
group by l.Name
P.S.
Check also the dates columns, if they contains also time fraction you should reconsider your upper bound filter.
I suggest you to use and SO_SalesOrder.OrderDate < '01-03-2018' instead of <= 28-02

SQL query that uses a GROUP BY and IN is too slow

I am struggling to speed this SQL query up. I have tried removing all the fields besides the two SUM() functions and the Id field but it is still incredibly slow. It is currently taking 15 seconds to run. Does anyone have any suggestions to speed this up as it is currently causing a timeout on a page in my web app. I need the fields shown so I can't really remove them but there surely has to be a way to improve this?
SELECT [Customer].[iCustomerID],
[Customer].[sCustomerSageCode],
[Customer].[sCustomerName],
[Customer].[sCustomerTelNo1],
SUM([InvoiceItem].[fQtyOrdered]) AS [Quantity],
SUM([InvoiceItem].[fNetAmount]) AS [Value]
FROM [dbo].[Customer]
LEFT JOIN [dbo].[CustomerAccountStatus] ON ([Customer].[iAccountStatusID] = [CustomerAccountStatus].[iAccountStatusID])
LEFT JOIN [dbo].[SalesOrder] ON ([SalesOrder].[iCustomerID] = [dbo].[Customer].[iCustomerID])
LEFT JOIN [Invoice] ON ([Invoice].[iCustomerID] = [Customer].[iCustomerID])
LEFT JOIN [dbo].[InvoiceItem] ON ([Invoice].[iInvoiceNumber] = [InvoiceItem].[iInvoiceNumber])
WHERE ([InvoiceItem].[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
AND( ([dbo].[SalesOrder].[dOrderDateTime] >= '2013-01-01')
OR ([dbo].[Customer].[dDateCreated] >= '2014-01-01'))
GROUP BY [Customer].[iCustomerID],[Customer].[sCustomerSageCode],[Customer].[sCustomerName], [Customer].[sCustomerTelNo1];
I don't think this query is doing what you want anyway. As written, there are no relationships between the Invoice table and the SalesOrder table. This leads me to believe that it is producing a cartesian product between invoices and orders, so customers with lots of orders would be generating lots of unnecessary intermediate rows.
You can test this by removing the SalesOrder table from the query:
SELECT c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1],
SUM(it.[fQtyOrdered]) AS [Quantity], SUM(it.[fNetAmount]) AS [Value]
FROM [dbo].[Customer] c LEFT JOIN
[dbo].[CustomerAccountStatus] cas
ON c.[iAccountStatusID] = cas.[iAccountStatusID] LEFT JOIN
[Invoice] i
ON (i.[iCustomerID] = c.[iCustomerID]) LEFT JOIN
[dbo].[InvoiceItem] it
ON (i.[iInvoiceNumber] = it.[iInvoiceNumber])
WHERE it.[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003') AND
c.[dDateCreated] >= '2014-01-01'
GROUP BY c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1];
If this works and you need the SalesOrder, then you will need to either pre-aggregate by SalesOrder or find better join keys.
The above query could benefit from an index on Customer(dDateCreated, CustomerId).
You have a lot of LEFT JOIN
I don't see CustomerAccountStatus usage. Ou can exclude it
The [InvoiceItem].[sNominalCode] could be null in case of LEFT JOIN so add [InvoiceItem].[sNominalCode] is not null or <THE IN CONDITION>
Also add the is not null checks to other conditions
It seems you are looking for customers that are either created this year or for which sales orders exist from last year or this year. So select from customers, but use EXISTS on SalesOrder. Then you want to count invoices. So outer join them and make sure to have the criteria in the ON clause. (sNominalCode will be NULL for any outer joined records. Hence asking for certain sNominalCode in the WHERE clause will turn your outer join into an inner join.)
SELECT
c.iCustomerID,
c.sCustomerSageCode,
c.sCustomerName,
c.sCustomerTelNo1,
SUM(ii.fQtyOrdered) AS Quantity,
SUM(ii.fNetAmount) AS Value
FROM dbo.Customer c
LEFT JOIN dbo.Invoice i ON (i.iCustomerID = c.iCustomerID)
LEFT JOIN dbo.InvoiceItem ii ON (ii.iInvoiceNumber = i.iInvoiceNumber AND ii.sNominalCode IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
WHERE c.dDateCreated >= '2014-01-01'
OR EXISTS
(
SELECT *
FROM dbo.SalesOrder
WHERE iCustomerID = c.iCustomerID
AND dOrderDateTime >= '2013-01-01'
)
GROUP BY c.iCustomerID, c.sCustomerSageCode, c.sCustomerName, c.sCustomerTelNo1;