SQL Server Left Join producing duplicates - sql

I have 3 tables, Budgets, Income, and Expenses.
Budget table:
Income table:
Expenses table:
This is my SQL statement:
SELECT
Budgets.BudgetID, Budgets.BudgetName, Budgets.Username_FK,
Budgets.BudgetAmount, Budgets.SavePercentage,
Expenses.ExpensesID, Expenses.ExpensesAmount, Expenses.ExpensesCategory,
Income.IncomeID, Income.IncomeAmount, Income.IncomeCategory
FROM
Budgets
LEFT JOIN
Income ON Budgets.BudgetID = Income.BudgetID_FK
LEFT JOIN
Expenses ON Budgets.BudgetID = Expenses.BudgetID_FK
WHERE
BudgetName = '2019'
And the results are as follows:
Based on my Income table, there is only 1 record tied to BudgetID = 3, but in the left join, it duplicates.
Ideally, I would want it to return "null" on the duplicates. How do I do this?

You have several rows in expenses per budgetID, so your join produces that many rows. I tend to suspect that the same situation could happen with income too.
If you want one row per budgetID, then one option is pre-aggregation and a left join (or outer apply). Say you want the total expense and income per budget, you would do:
select b.*, e.expensesAmount, i.incomeAmount
from budgets b
left join (
    select budgetID_FK, sum(ExpensesAmount) as expensesAmount
    from expenses
    group by budgetID_FK
) e on e.budgetID_FK = b.budgetID
left join (
    select budgetID_FK, sum(IncomeAmount) as incomeAmount
    from income
    group by budgetID_FK
) i on i.budgetID_FK = b.budgetID
Because the rows of the dependent tables are now grouped by budgetID, their other columns, such as IncomeCategory or ExpensesCategory (which can have several values per budgetID), are no longer available.
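To see the row multiplication and the effect of pre-aggregation side by side, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table contents are assumptions: the four 50.00 expenses for budget 3 follow the description in this thread, while the single income row of 1000.00 is a made-up value.

```python
import sqlite3

# In-memory database with hypothetical sample rows (the original tables are not shown).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Budgets  (BudgetID INTEGER PRIMARY KEY, BudgetName TEXT);
CREATE TABLE Income   (IncomeID INTEGER PRIMARY KEY, BudgetID_FK INTEGER, IncomeAmount REAL);
CREATE TABLE Expenses (ExpensesID INTEGER PRIMARY KEY, BudgetID_FK INTEGER, ExpensesAmount REAL);
INSERT INTO Budgets  VALUES (3, '2019');
INSERT INTO Income   VALUES (1, 3, 1000.0);
INSERT INTO Expenses VALUES (1, 3, 50.0), (2, 3, 50.0), (3, 3, 50.0), (4, 3, 50.0);
""")

# Plain joins: the single income row is repeated once per expense row.
plain_rows = conn.execute("""
SELECT b.BudgetID, i.IncomeAmount, e.ExpensesAmount
FROM Budgets b
LEFT JOIN Income i   ON i.BudgetID_FK = b.BudgetID
LEFT JOIN Expenses e ON e.BudgetID_FK = b.BudgetID
""").fetchall()

# Pre-aggregated joins: each derived table has at most one row per budget.
agg_rows = conn.execute("""
SELECT b.BudgetID, i.IncomeAmount, e.ExpensesAmount
FROM Budgets b
LEFT JOIN (SELECT BudgetID_FK, SUM(IncomeAmount) AS IncomeAmount
           FROM Income GROUP BY BudgetID_FK) i ON i.BudgetID_FK = b.BudgetID
LEFT JOIN (SELECT BudgetID_FK, SUM(ExpensesAmount) AS ExpensesAmount
           FROM Expenses GROUP BY BudgetID_FK) e ON e.BudgetID_FK = b.BudgetID
""").fetchall()

print(len(plain_rows))  # 4
print(agg_rows)         # [(3, 1000.0, 200.0)]
```

The plain join returns four rows for one budget, while the pre-aggregated version returns exactly one.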

For BudgetID 3, you have 4 expenses (distinct expense IDs) of 50.00 each in your Expenses table for the same user. I think what you want is the total expenses per category and budget ID, i.e. BudgetID 3 in this case:
SELECT
Budgets.BudgetID, Budgets.BudgetName, Budgets.Username_FK,
Budgets.BudgetAmount, Budgets.SavePercentage,
Income.IncomeID, Income.IncomeAmount, Income.IncomeCategory,
ex.ExpensesCategory,
ex.Total_ExpensesAmount
FROM Budgets
LEFT JOIN Income ON Budgets.BudgetID = Income.BudgetID_FK
LEFT JOIN (Select BudgetID_FK, ExpensesCategory, SUM(ExpensesAmount) as Total_ExpensesAmount
FROM Expenses
GROUP BY BudgetID_FK, ExpensesCategory) ex ON Budgets.BudgetID = ex.BudgetID_FK
WHERE BudgetName = '2019'

Related

Redundant values in columns with aggregate function?

SELECT
SUM(total_amt_usd),
sales_rep_id,
sales_reps.name AS salesman,
region.name AS regionname
FROM orders
JOIN accounts
ON orders.account_id = accounts.id
JOIN sales_reps
ON accounts.sales_rep_id = sales_reps.id
JOIN region
ON sales_reps.region_id = region.id
Are the three columns redundant with the SUM aggregate function? SUM I believe has correctly performed its operation of summing all the values (total_amt_usd) in the first column.
SELECT Sum(total_amt_usd),
sales_rep_id,
sales_reps.NAME AS salesman,
region.NAME AS regionname
FROM orders
LEFT JOIN accounts
ON orders.account_id = accounts.id
LEFT JOIN sales_reps
ON accounts.sales_rep_id = sales_reps.id
LEFT JOIN region
ON sales_reps.region_id = region.id
GROUP BY sales_rep_id,
sales_reps.NAME,
region.NAME
They do not look redundant. This looks like a query that totals the sales amount separately for each sales representative. Each order links through account_id to the accounts table, which holds the sales_rep_id; that id is used to look up the representative's name in the joined sales_reps table, and that table in turn has a region_id, which is used to look up the region name in the joined region table. Each row of the final result has the id, name, and region of a sales representative along with the total amount of that representative's orders.
I think the query should be more like the following for this explanation to make sense:
SELECT
SUM(total_amt_usd),
sales_rep_id,
sales_reps.name AS salesman,
region.name AS regionname
FROM orders
JOIN accounts
ON orders.account_id = accounts.id
JOIN sales_reps
ON accounts.sales_rep_id = sales_reps.id
JOIN region
ON sales_reps.region_id = region.id
GROUP BY sales_rep_id, sales_reps.name, region.name;

Select all rows from joined with multiple conditions tables with null values

I want to join two tables, Sales and Budget.
Sales table columns:
| Customer | Period | Sales |
Budget table columns:
| Customer | Period | SaleBudget |
Sales table has data for periods 1, 2, and 3. Budget has data for periods 1-12. When I try to run below query I get only data for periods from Sales table matched with Budget table. But my goal is to get all data from both tables. Could you give me a hint how to change query?
Select s.Customer, b.SaleBudget, s.Sales from Sales s
full outer join Budget b on b.Customer = s.Customer and b.Period = s.Period
A LEFT JOIN matches the rows that satisfy
b.Customer = s.Customer and b.Period = s.Period
and keeps every row of the left table. If you want all rows from both tables, a LEFT JOIN is not enough:
the LEFT JOIN keyword returns all records from the left table (table1) and only the matching records from the right table (table2);
it cannot also return the right table's unmatched rows in the same result.
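For the stated goal of seeing every period from both tables, the key columns have to come from whichever side actually has the row; selecting only s.Customer leaves it NULL for periods that exist only in Budget. The sketch below is an illustration in Python's sqlite3, with made-up customer and amount values. Because older SQLite builds lack FULL OUTER JOIN, it swaps in an emulation: a LEFT JOIN from Budget plus a UNION ALL of the Sales rows that have no Budget match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales  (Customer TEXT, Period INTEGER, Sales REAL);
CREATE TABLE Budget (Customer TEXT, Period INTEGER, SaleBudget REAL);
INSERT INTO Sales VALUES ('Acme', 1, 100.0), ('Acme', 2, 110.0), ('Acme', 3, 120.0);
""")
# Hypothetical budget of 90.0 for each of the periods 1-12.
conn.executemany("INSERT INTO Budget VALUES (?, ?, ?)",
                 [("Acme", period, 90.0) for period in range(1, 13)])

rows = conn.execute("""
-- Every Budget row, with Sales where a matching row exists.
SELECT b.Customer, b.Period, b.SaleBudget, s.Sales
FROM Budget b
LEFT JOIN Sales s ON s.Customer = b.Customer AND s.Period = b.Period
UNION ALL
-- Plus Sales rows that have no Budget counterpart at all.
SELECT s.Customer, s.Period, NULL, s.Sales
FROM Sales s
WHERE NOT EXISTS (SELECT 1 FROM Budget b
                  WHERE b.Customer = s.Customer AND b.Period = s.Period)
ORDER BY 1, 2
""").fetchall()

for row in rows:
    print(row)
```

On SQL Server the same result should be obtainable with the FULL OUTER JOIN from the question, selecting COALESCE(s.Customer, b.Customer) and COALESCE(s.Period, b.Period) instead of the s columns alone.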

SQL - Create complete matrix with all variables even if null

please provide some assistance/guidance to solve the following:
I have 1 main table which indicates sales volumes by sales person per different product type.
Where a salesperson did not sell a particular product on a particular day, there is no record.
The intention is to create null value records for salesmen that did not sell a product on a specific day. The query must be dynamic as there are many more salesmen with sales over many days.
Thanks in advance
Just generate records for all sales persons, days, and products using cross join and then bring in the existing data:
select p.salesperson, d.salesdate, st.salestype,
coalesce(t.sales_volume, 0)
from (select distinct salesperson from t) p cross join
(select distinct salesdate from t) d cross join
(select distinct salestype from t) st left join
t
on t.salesperson = p.salesperson and
t.salesdate = d.salesdate and
t.salestype = st.salestype;
Note: You may have other tables that have lists of sales people, dates, and types -- and those can be used instead of the select distinct queries.
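As a quick check of the cross join approach, here is a small sketch in Python's sqlite3. The two sample rows in table t are invented for illustration; with 2 salespeople, 2 dates and 2 types, the result should contain all 8 combinations, with 0 wherever no sale was recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (salesperson TEXT, salesdate TEXT, salestype TEXT, sales_volume INTEGER);
-- Hypothetical data: each person sold one product type on one day.
INSERT INTO t VALUES
  ('Ann', '2020-01-01', 'A', 5),
  ('Bob', '2020-01-02', 'B', 7);
""")

rows = conn.execute("""
SELECT p.salesperson, d.salesdate, st.salestype,
       COALESCE(t.sales_volume, 0) AS sales_volume
FROM (SELECT DISTINCT salesperson FROM t) p
CROSS JOIN (SELECT DISTINCT salesdate FROM t) d
CROSS JOIN (SELECT DISTINCT salestype FROM t) st
LEFT JOIN t
       ON t.salesperson = p.salesperson
      AND t.salesdate   = d.salesdate
      AND t.salestype   = st.salestype
ORDER BY p.salesperson, d.salesdate, st.salestype
""").fetchall()

print(len(rows))  # 8
for row in rows:
    print(row)
```

Replace COALESCE(t.sales_volume, 0) with plain t.sales_volume if the missing combinations should show NULL instead of 0.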

SQL 4 tables inner join pick up sum Nulls also?

I have 4 tables: Employees, Customers, Orders and Order_Info. I am trying to inner join the 4 tables to sum up the order amounts and calculate each employee's commission at 7%. I am very close to solving this, but I have one slight problem: I am not getting all of the employees; fewer show up than exist in my Employees table. This is how I currently have my query written:
SELECT Employees.lName, Employees.fName,
SUM(quantOrdered * costEach) AS ttl_orders_value,
(SUM(quantOrdered * costEach) * .07) AS Commission
FROM Customers
INNER JOIN Employees ON Customers.empNumber = Employees.empNumber
INNER JOIN Orders ON Customers.custNumber = Orders.custNumber
INNER JOIN Order_Info ON Orders.ordNumber = Order_Info.ordNumber
GROUP BY Employees.lName, Employees.fName
ORDER BY Employees.lName, Employees.fName
I wish to get all employees even if their commission and total sales equal zero, which I believe to be calculated from NULLS.
Any help improving my query would be greatly appreciated!
Try this:
SELECT Employees.lName, Employees.fName,
SUM(ISNULL(quantOrdered,0) * ISNULL(costEach,0)) AS ttl_orders_value,
(SUM(ISNULL(quantOrdered,0) * ISNULL(costEach,0)) * .07) AS Commission
FROM Employees
LEFT JOIN Customers ON Customers.empNumber = Employees.empNumber
LEFT JOIN Orders
INNER JOIN Order_Info ON Orders.ordNumber = Order_Info.ordNumber
ON Customers.custNumber = Orders.custNumber
GROUP BY Employees.lName, Employees.fName
ORDER BY Employees.lName, Employees.fName
Note: I have made Employees the driving table and changed the joins to Customers and Orders to LEFT JOIN, because as you say the Employees records should always be there; they just may not have linked orders. Order_Info stays INNER JOINed to Orders inside the nested join.
The ISNULL([field_name], 0) wrappers around quantOrdered and costEach turn the NULLs of employees without any orders into zeros, so their totals come out as 0 rather than NULL.
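The idea of starting from Employees and left joining outward can be tried on a toy data set. This is a sketch in Python's sqlite3, not SQL Server: the sample employees and orders are invented, COALESCE stands in for the two-argument ISNULL function (which SQLite does not have), and the nested join is flattened into a chain of LEFT JOINs for readability. Grouping by empNumber as well is an extra precaution so that two employees sharing a name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees  (empNumber INTEGER PRIMARY KEY, lName TEXT, fName TEXT);
CREATE TABLE Customers  (custNumber INTEGER PRIMARY KEY, empNumber INTEGER);
CREATE TABLE Orders     (ordNumber INTEGER PRIMARY KEY, custNumber INTEGER);
CREATE TABLE Order_Info (ordNumber INTEGER, quantOrdered INTEGER, costEach REAL);
-- Hypothetical data: Adams has one customer with one order; Baker has none.
INSERT INTO Employees  VALUES (1, 'Adams', 'Amy'), (2, 'Baker', 'Ben');
INSERT INTO Customers  VALUES (10, 1);
INSERT INTO Orders     VALUES (100, 10);
INSERT INTO Order_Info VALUES (100, 2, 50.0), (100, 1, 100.0);
""")

rows = conn.execute("""
SELECT e.lName, e.fName,
       COALESCE(SUM(oi.quantOrdered * oi.costEach), 0)        AS ttl_orders_value,
       COALESCE(SUM(oi.quantOrdered * oi.costEach), 0) * 0.07 AS Commission
FROM Employees e
LEFT JOIN Customers  c  ON c.empNumber  = e.empNumber
LEFT JOIN Orders     o  ON o.custNumber = c.custNumber
LEFT JOIN Order_Info oi ON oi.ordNumber = o.ordNumber
GROUP BY e.empNumber, e.lName, e.fName
ORDER BY e.lName, e.fName
""").fetchall()

for row in rows:
    print(row)  # Adams totals 200.0 (commission about 14.0); Baker shows zeros
```

Both employees appear in the output, including the one with no customers or orders.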

SQL rewrite to optimize

I'm trying to optimize or change the SQL to work with inner joins rather than independent calls
Database: one invoice can have many payment records and order (products) records
Original:
SELECT
InvoiceNum,
(SELECT SUM(Orders.Cost) FROM Orders WHERE Orders.Invoice = InvoiceNum and Orders.Returned <> 1 GROUP BY Orders.Invoice) as vat_only,
(SELECT SUM(Orders.Vat) FROM Orders WHERE Orders.Invoice = InvoiceNum and Orders.Returned <> 1 GROUP BY Orders.Invoice) as sales_prevat,
(SELECT SUM(pay.Amount) FROM Payments as pay WHERE Invoices.InvoiceNum = pay.InvoiceNum ) as income
FROM
Invoices
WHERE
InvoiceYear = currentyear
I'm sure we can do this another way by grouping and joining the tables together. When I tried the SQL statement below, I wasn't getting the same number of records. I suspect the type of join or the join conditions, but I still couldn't get it working after 3 hours of staring at the screen.
So far I got to...
SELECT
Invoices.InvoiceNum,
Sum(Orders.Cost) AS SumOfCost,
Sum(Orders.VAT) AS SumOfVAT,
SUM(distinct Payments.Amount) as money
FROM
Invoices
LEFT JOIN
Orders ON Orders.Invoice = Invoices.InvoiceNum
LEFT JOIN
Payments ON Invoices.InvoiceNum = Payments.InvoiceNum
WHERE
Invoices.InvoiceYear = 11
AND Orders.Returned <> 1
GROUP BY
Invoices.InvoiceNum
Sorry for the bad english and I'm not sure what to search for to find if it's already been answered here :D
Thanks in advance for all the help
Your problem is that an invoice can have multiple order lines and (sometimes) multiple payments. Joining both tables to the invoice produces a cross product of order lines and payments for that invoice. You fix this by pre-summarizing the tables.
A related problem is that an inner join would drop invoices that have no payments, so you need a left outer join.
select i.InvoiceNum, osum.cost, osum.vat, psum.income
from Invoices i left outer join
(select o.Invoice, sum(o.Cost) as cost, sum(o.vat) as vat
from orders o
where o.Returned <> 1
group by o.Invoice
) osum
on osum.Invoice = i.InvoiceNum left outer join
(select p.InvoiceNum, sum(p.Amount) as income
from Payments p
group by p.InvoiceNum
) psum
on psum.InvoiceNum = i.InvoiceNum
where i.InvoiceYear = year(getdate())
Two comments: Is the key field for orders really Invoice or is it also InvoiceNum? Also, do you have a field Invoices.InvoiceYear? Or do you want year(i.InvoiceDate) in the where clause?
Assuming that both payments and orders can contain more than one record per invoice you will need to do your aggregates in a subquery to avoid cross joining:
SELECT Invoices.InvoiceNum, o.Cost, o.VAT, p.Amount
FROM Invoices
LEFT JOIN
( SELECT Invoice, Cost = SUM(Cost), VAT = SUM(VAT)
FROM Orders
WHERE Orders.Returned <> 1
GROUP BY Invoice
) o
ON o.Invoice = Invoices.InvoiceNum
LEFT JOIN
( SELECT InvoiceNum, Amount = SUM(Amount)
FROM Payments
GROUP BY InvoiceNum
) P
ON P.InvoiceNum = Invoices.InvoiceNum
WHERE Invoices.InvoiceYear = 11;
ADDENDUM
To expand on the CROSS JOIN comment, imagine this data for an Invoice (1)
Orders
Invoice  Cost   VAT
1        15.00  3.00
1        10.00  2.00
Payments
InvoiceNum  Amount
1           15.00
1           10.00
When you join these tables as you did:
SELECT Orders.*, Payments.Amount
FROM Invoices
LEFT JOIN Orders
ON Orders.Invoice = Invoices.InvoiceNum
LEFT JOIN Payments
ON Invoices.InvoiceNum = Payments.InvoiceNum;
You end up with:
Orders.Invoice  Orders.Cost  Orders.Vat  Payments.Amount
1               15.00        3.00        15.00
1               10.00        2.00        15.00
1               15.00        3.00        10.00
1               10.00        2.00        10.00
i.e. every combination of payments and orders, so for each invoice you get many more rows than required, which distorts your totals. Even though the original data had £25 of payments, this doubles to £50 because of the two records in the Orders table. This is why each table needs to be aggregated individually. Using DISTINCT would not work if there was more than one payment or order for the same amount on a single invoice.
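The DISTINCT pitfall is easy to reproduce. The sketch below uses Python's sqlite3 with the invoice 1 rows from the addendum plus a hypothetical invoice 2 that has one order line and two payments of 20.00 each; the Returned filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoices (InvoiceNum INTEGER PRIMARY KEY);
CREATE TABLE Orders   (Invoice INTEGER, Cost REAL, VAT REAL);
CREATE TABLE Payments (InvoiceNum INTEGER, Amount REAL);
INSERT INTO Invoices VALUES (1), (2);
-- Invoice 1 comes from the addendum; invoice 2 is a made-up case with equal payments.
INSERT INTO Orders   VALUES (1, 15.0, 3.0), (1, 10.0, 2.0), (2, 40.0, 8.0);
INSERT INTO Payments VALUES (1, 15.0), (1, 10.0), (2, 20.0), (2, 20.0);
""")

# Joining both child tables directly multiplies the rows per invoice.
naive = conn.execute("""
SELECT i.InvoiceNum, SUM(p.Amount), SUM(DISTINCT p.Amount)
FROM Invoices i
LEFT JOIN Orders o   ON o.Invoice = i.InvoiceNum
LEFT JOIN Payments p ON p.InvoiceNum = i.InvoiceNum
GROUP BY i.InvoiceNum
ORDER BY i.InvoiceNum
""").fetchall()

# Aggregating each child table first keeps one row per invoice.
pre_aggregated = conn.execute("""
SELECT i.InvoiceNum, o.Cost, p.Amount
FROM Invoices i
LEFT JOIN (SELECT Invoice, SUM(Cost) AS Cost
           FROM Orders GROUP BY Invoice) o ON o.Invoice = i.InvoiceNum
LEFT JOIN (SELECT InvoiceNum, SUM(Amount) AS Amount
           FROM Payments GROUP BY InvoiceNum) p ON p.InvoiceNum = i.InvoiceNum
ORDER BY i.InvoiceNum
""").fetchall()

print(naive)           # [(1, 50.0, 25.0), (2, 40.0, 20.0)]
print(pre_aggregated)  # [(1, 25.0, 25.0), (2, 40.0, 40.0)]
```

For invoice 1 the plain SUM doubles the payments to 50 and SUM(DISTINCT) happens to land on the right 25, but for invoice 2 SUM(DISTINCT) collapses the two equal payments into 20. Only the pre-aggregated query is right for both.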
One final point with regard to optimisation: you should probably index your tables. If you run the query and display the actual execution plan, SSMS will suggest indexes for you, but at a guess the following should improve performance:
CREATE NONCLUSTERED INDEX IX_Orders_InvoiceNum ON Orders (Invoice) INCLUDE(Cost, VAT, Returned);
CREATE NONCLUSTERED INDEX IX_Payments_InvoiceNum ON Payments (InvoiceNum) INCLUDE(Amount);
This should allow both subqueries to use only the index on each table, with no bookmark lookup or clustered index scan required.
Try this; note that I haven't tested it, I just whipped it up in Notepad. If any of your invoices may have no rows in either of the subtables, use LEFT JOIN instead:
SELECT InvoiceNum, vat_only, sales_prevat, income
FROM Invoices i
INNER JOIN (SELECT Invoice, SUM(Cost) [vat_only], SUM(Vat) [sales_prevat]
FROM Orders
WHERE Returned <> 1
GROUP BY Invoice) o
ON i.InvoiceNum = o.Invoice
INNER JOIN (SELECT InvoiceNum, SUM(Amount) [income]
FROM Payments
GROUP BY InvoiceNum) p
ON i.InvoiceNum = p.InvoiceNum
WHERE i.InvoiceYear = currentyear
select
PreQuery.InvoiceNum,
PreQuery.VAT_Only,
PreQuery.Sales_Prevat,
SUM( Pay.Amount ) as Income
from
( select
I.InvoiceNum,
SUM( O.Cost ) as VAT_Only,
SUM( O.Vat ) as sales_prevat
from
Invoices I
Join Orders O
on I.InvoiceNum = O.Invoice
AND O.Returned <> 1
where
I.InvoiceYear = currentYear
group by
I.InvoiceNum ) PreQuery
JOIN Payments Pay
on PreQuery.InvoiceNum = Pay.InvoiceNum
group by
PreQuery.InvoiceNum,
PreQuery.VAT_Only,
PreQuery.Sales_Prevat
Your "currentYear" reference could be parameterized, or you can get the current year from a SQL function such as
Year( GetDate() )