Left Join Is Returning One value for entire column - sql

I am doing a left join on another table in the subquery and the commission column I want to return from the left join is only bringing one value for the entire commission column which is wrong (see first query result below). Now if I do the left join outside of table A (query 2) then I get the desired results (see second result set). The question is why isn't the left join working within table A in the first query.
I have tried left joining outside of the subquery/table A (query 2) and it works fine, but I want to learn why the left join isn't working within table A.
The query (query 1) is below which is giving the duplicate values in the commission column
Position_A1 table
Sector Short_Side
------------------------
Engineering -2
Financial -5
Industry -10
Corporate -36
Energy -52
Financial -26
Order table
Sector Commission
------------------------
Engineering 10
Financial 100
Industry 36
Corporate 91
Energy 10
Financial 25
Query 1
SELECT *
FROM
(SELECT POS.SECTOR,
SUM(ABS(POS.SHORT_SIDE)) AS Short_Expo,
COM.COMMISSION
FROM Position_A1 POS
LEFT JOIN (SELECT SECTOR, sum(COMMISSION) AS COMMISSION
FROM ORDER
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR
)COM
ON POS.SECTOR = COM.SECTOR
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR ) A
However, if I try the below, I get the correct results in the commission column.
Query 2
SELECT A.*, COM.COMMISSION
FROM
(SELECT POS.SECTOR,
SUM(ABS(POS.SHORT_SIDE)) AS Short_Expo
FROM Position_A1 POS
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR ) A
LEFT JOIN (SELECT SECTOR, sum(COMMISSION) AS COMMISSION
FROM ORDER
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR
)COM
ON POS.SECTOR = COM.SECTOR
As per the first query the result I get is:
Sector Short_Expo Commission
Energy 256 125
Industry 236 125
Financial 125 125
As per the second query the result I get (which is correct) is:
Sector Short_Expo Commission
Energy 256 128
Industry 236 325
Financial 125 186
The question is why query 1 doesn't give the correct result whereas query 2 does. What am I doing wrong in the first query that is resulting in the duplicated commission?
Using the first it seems that the commission for only one sector (Financial) is being returned for all sectors.

In the first query, the WHERE clause in the outer query (WHERE TRADE_DATE = TO_DATE('2019-11-01', 'YYYY-MM-DD')) is turning the LEFT JOIN into an INNER JOIN because the non-matching values are NULL and they don't match.
The normal solution is to include filtering on subsequent tables in the ON clause.
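As a minimal sketch of that general rule (not the asker's exact schema), the snippet below uses Python's built-in sqlite3 module with two hypothetical tables, positions and orders, and assumes the date filter applies to a column of the right-hand table. Dates are stored as plain text purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables for illustration only
    CREATE TABLE positions (sector TEXT, short_side INTEGER);
    CREATE TABLE orders (sector TEXT, trade_date TEXT, commission INTEGER);
    INSERT INTO positions VALUES ('Corporate', -36), ('Energy', -52), ('Industry', -10);
    INSERT INTO orders VALUES ('Energy', '2019-11-01', 10);
    INSERT INTO orders VALUES ('Industry', '2019-10-31', 36);
""")

# Filter on the right-hand table in WHERE: rows without a matching order
# (or with an order on another date) are discarded, as with an INNER JOIN.
where_rows = conn.execute("""
    SELECT p.sector, o.commission
    FROM positions p
    LEFT JOIN orders o ON p.sector = o.sector
    WHERE o.trade_date = '2019-11-01'
    ORDER BY p.sector
""").fetchall()

# Same filter moved into the ON clause: every position row survives,
# and sectors without a qualifying order get NULL (None in Python).
on_rows = conn.execute("""
    SELECT p.sector, o.commission
    FROM positions p
    LEFT JOIN orders o
           ON p.sector = o.sector
          AND o.trade_date = '2019-11-01'
    ORDER BY p.sector
""").fetchall()

print(where_rows)
print(on_rows)
```

With the filter in WHERE only the Energy row remains; with the filter in ON all three sectors are returned, two of them with an empty commission.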

Related

Inner join + group by - select common columns and aggregate functions

Let's say i have two tables
Customer
---
Id Name
1 Foo
2 Bar
and
CustomerPurchase
---
CustomerId, Amount, AmountVAT, Accountable(bit)
1 10 11 1
1 20 22 0
2 5 6 0
2 2 3 0
I need a single record for every joined and grouped Customer and CustomerPurchase group.
Every record would contain
columns from table Customer
some aggregation functions like SUM
a 'calculated' column. For example difference of other columns
result of subquery to CustomerPurchase table
An example of result i would like to get
CustomerPurchases
---
Name Total TotalVAT VAT TotalAccountable
Foo 30 33 3 10
Bar 7 9 2 0
I was able to get a single row only by grouping by all the common columns, which I don't think is the right way to do it. Plus, I have no idea how to do the 'VAT' column and the 'TotalAccountable' column, which filters out only certain rows of CustomerPurchase and then runs some kind of aggregate function on the result. The following example doesn't work, of course, but I wanted to show what I would like to achieve:
select C.Name,
SUM(CP.Amount) as 'Total',
SUM(CP.AmountVAT) as 'TotalVAT',
diff? as 'VAT',
subquery? as 'TotalAccountable'
from Customer C
inner join CustomerPurchase CR
on C.Id = CR.CustomerId
group by C.Id
I would suggest you just need the following slight changes to your query. For clarity, I would also consider using the terms net and gross, if you can, which are typical for prices excluding and including VAT.
select c.[Name],
Sum(cp.Amount) as Total,
Sum(cp.AmountVAT) as TotalVAT,
Sum(cp.AmountVAT) - Sum(CP.Amount) as VAT,
Sum(case when cp.Accountable = 1 then cp.Amount else 0 end) as TotalAccountable
from Customer c
join CustomerPurchase cp on cp.CustomerId = c.Id
group by c.[Name];
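A quick way to check this kind of conditional aggregation against the sample data is the sketch below, which loads the two tables from the question into an in-memory SQLite database via Python's sqlite3 module. Two choices here are assumptions rather than part of the answer: the CASE expression uses ELSE 0 so that a customer with no accountable purchases gets 0 instead of NULL, and the grouping uses both Id and Name so that two customers sharing a name would stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE CustomerPurchase (
        CustomerId INTEGER, Amount INTEGER, AmountVAT INTEGER, Accountable INTEGER
    );
    INSERT INTO Customer VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO CustomerPurchase VALUES
        (1, 10, 11, 1), (1, 20, 22, 0), (2, 5, 6, 0), (2, 2, 3, 0);
""")

rows = conn.execute("""
    SELECT c.Name,
           SUM(cp.Amount)                     AS Total,
           SUM(cp.AmountVAT)                  AS TotalVAT,
           SUM(cp.AmountVAT) - SUM(cp.Amount) AS VAT,
           -- Only accountable rows contribute; ELSE 0 avoids a NULL total
           SUM(CASE WHEN cp.Accountable = 1 THEN cp.Amount ELSE 0 END) AS TotalAccountable
    FROM Customer c
    JOIN CustomerPurchase cp ON cp.CustomerId = c.Id
    GROUP BY c.Id, c.Name
    ORDER BY c.Id
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this reproduces the two rows requested in the question (Foo 30 33 3 10 and Bar 7 9 2 0).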

GROUP BY one column, then GROUP BY another column

I have a database table t with sales data:
ID TYPE AGE
------------------------
1 B 20
1 BP 20
1 BP 20
1 P 20
2 B 30
2 BP 30
2 BP 30
3 P 40
If a person buys a bundle, the table contains the bundle sale (TYPE B) and the individual bundle products (TYPE BP), all with the same ID. So a bundle with 2 products appears 3 times (1x TYPE B and 2x TYPE BP) and has the same ID.
A person can also buy any other product in that single sale (TYPE P), which has also the same ID.
I need to calculate the average/min/max age of the customers but the multiple entries per sale tamper with the correct calculation.
The real average age is
(20 + 30 + 40) / 3 = 30
and not
(20+20+20+20 + 30+30+30 + 40) / 8 = 26,25
But I don't know how I can reduce the sales to a single row entry AND get the 4 needed values?
Do I need to GROUP BY twice (first by ID, then by AGE?) and if yes, how can I do it?
My code so far:
SELECT
AVERAGE(AGE)
, MIN(AGE)
, MAX(AGE)
, MEDIAN(AGE)
FROM t
but that does count every row.
Assuming the age is the same for all rows with the same ID (which in itself indicates a normalisation problem), you can use nested aggregation:
select avg(min(age)) from sales
group by id
AVG(MIN(AGE))
-------------
30
The example in the documentation is very similar; and is explained as:
This calculation evaluates the inner aggregate (MAX(salary)) for each group defined by the GROUP BY clause (department_id), and aggregates the results again.
So for your version:
This calculation evaluates the inner aggregate (MIN(age)) for each group defined by the GROUP BY clause (id), and aggregates the results again.
It doesn't really matter whether the inner aggregate is min or max - again, assuming they are all the same - it's just to get a single value per ID, which can then be averaged.
You can do the same for the other values in your original query:
select
avg(min(age)) as avg_age,
min(min(age)) as min_age,
max(min(age)) as max_age,
median(min(age)) as med_age
from sales
group by id;
AVG_AGE MIN_AGE MAX_AGE MED_AGE
------- ------- ------- -------
30 20 40 30
Or, if you prefer, you could get the one-age-per-ID values once in a CTE or subquery and apply the second layer of aggregation to that:
select
avg(age) as avg_age,
min(age) as min_age,
max(age) as max_age,
median(age) as med_age
from (
select min(age) as age
from sales
group by id
);
which gets the same result.
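The subquery form is also the portable one, since nesting aggregates directly (avg(min(age))) is not accepted by every database. As a rough sketch, the snippet below runs it in SQLite through Python's sqlite3 module on the sample rows from the question. MEDIAN is left out because a default SQLite build may not provide it; that omission is a limitation of this sketch, not of the approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER, type TEXT, age INTEGER);
    INSERT INTO sales VALUES
        (1, 'B', 20), (1, 'BP', 20), (1, 'BP', 20), (1, 'P', 20),
        (2, 'B', 30), (2, 'BP', 30), (2, 'BP', 30),
        (3, 'P', 40);
""")

# Naive average over every row: sales with many rows are over-weighted
naive_avg = conn.execute("SELECT AVG(age) FROM sales").fetchone()[0]

# Collapse to one age per sale first, then aggregate those values
per_sale = conn.execute("""
    SELECT AVG(age), MIN(age), MAX(age)
    FROM (SELECT MIN(age) AS age FROM sales GROUP BY id)
""").fetchone()

print(naive_avg)   # average over all 8 rows
print(per_sale)    # (average, minimum, maximum) over the 3 sales
```

The naive average comes out at 26.25, while the per-sale aggregation gives an average of 30 with a minimum of 20 and a maximum of 40.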

SQL: Using one table as a criteria for itself

I am trying to use a parameter of a table as a criteria for itself, and can't quite get my sql statement right. It seems to be a relatively simple query; I'm using a sub query for my criteria, but it is not filtering out other rows on my table.
Background:
Manufacturing production floor: I have a bunch of machinists on their machines right now running an operation (OprSeq) of a job (JobNum). From the LaborDtl table, which keeps a record of all labor activity, I can see what labor is currently active (ActiveTrans = 1). With this criteria of active labor, I want to sum up all the past labor transactions on each active labor entry. So I need a LaborDtl table of inactive labor activity with the criteria of active labor from the same table.
The code:
Here's my 'criteria' subquery:
SELECT
LaborDtl.JobNum,
LaborDtl.OprSeq
FROM Erp.LaborDtl
WHERE LaborDtl.ActiveTrans = 1
Which returns active transactions, here's the first couple (sorted by job):
Job Operation
000193 90
000457 70
000457 70
020008-1 140
020008-2 130
020010 60
020035 130
020175 40
020175-2 50
020186 80
020199 10
020203 50
020212 40
020258 60
020272 10
020283 30
020298 10
020299 30
Then here's the full SQL Statement, with the query above embedded:
SELECT
LaborDtl.JobNum,
LaborDtl.OprSeq as "Op",
SUM(LaborDtl.LaborQty) as "Total Labor"
FROM Erp.LaborDtl
WHERE EXISTS
(
SELECT
LaborDtl.Company,
LaborDtl.JobNum,
LaborDtl.OprSeq
FROM Erp.LaborDtl
WHERE LaborDtl.ActiveTrans = 1 --Labor table of just current activity
)
GROUP BY LaborDtl.JobNum, LaborDtl.OprSeq
I expect to see only the Job and Operation numbers that exist in my sub query, but I'm getting both jobs and operations that don't exist in my sub query. Here are the first 10 (note, the first JobNum should be 000193 per my criteria)
JobNum Op Total Labor
0 0.00000000
000004 1 32.00000000
000019 1 106.00000000
000029 1 175.00000000
000143 1 85.00000000
000164 1 58.00000000
000181 1 500.00000000
000227 1 116.00000000
000421 1 154.00000000
000458 1 67.00000000
You're missing some condition to tie the outer and inner queries together. Right now, without that criteria, the inner query just returns "true", as there are jobs with active activities and thus all the rows in the outer query are returned. Note that you'll have to add aliases to the tables, as the inner and outer query use the same table:
SELECT a.JobNum, a.OprSeq as "Op", SUM(a.LaborQty) as "Total Labor"
FROM Erp.LaborDtl a
WHERE EXISTS (SELECT * -- The select list doesn't really matter here
FROM Erp.LaborDtl b
WHERE a.JobNum = b.JobNum AND -- Here!
a.OprSeq = b.OprSeq AND -- And here!
b.ActiveTrans = 1 -- Labor table of just current activity
)
GROUP BY a.JobNum, a.OprSeq
Note, however, that there's an easier (IMHO) way. Since you're grouping by JobNum and OprSeq anyway, you could just count the number of active transactions and use a HAVING clause to keep only those groups that have at least one active transaction:
SELECT JobNum, OprSeq as "Op", SUM(LaborQty) as "Total Labor"
FROM Erp.LaborDtl
GROUP BY JobNum, OprSeq
HAVING COUNT(CASE ActiveTrans WHEN 1 THEN 1 END) > 0
Without knowing RDBMS vendor and version this is the best I can do:
SELECT
t1.JobNum,
t1.OprSeq as "Op",
SUM(t1.LaborQty) as "Total Labor"
FROM Erp.LaborDtl t1
WHERE EXISTS
(
SELECT 1
FROM Erp.LaborDtl t2
WHERE t2.ActiveTrans = 1 --Labor table of just current activity
and t2.Company = t1.Company
and t2.JobNum = t1.JobNum
and t2.OprSeq = t1.OprSeq
)
GROUP BY t1.JobNum, t1.OprSeq
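To see the difference between the uncorrelated and the correlated EXISTS side by side, here is a small sketch using Python's sqlite3 module. The LaborDtl table is reduced to four columns, and the labor quantities for job 000193 are made-up values for illustration; only the job and operation numbers come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for Erp.LaborDtl; quantities for 000193 are hypothetical
    CREATE TABLE LaborDtl (JobNum TEXT, OprSeq INTEGER, LaborQty REAL, ActiveTrans INTEGER);
    INSERT INTO LaborDtl VALUES
        ('000193', 90, 5, 0),   -- past labor on an operation that is active now
        ('000193', 90, 7, 1),   -- the currently active transaction
        ('000004', 1, 32, 0);   -- job with no active labor
""")

# Uncorrelated EXISTS: true for every outer row as long as ANY active row exists
uncorrelated = conn.execute("""
    SELECT a.JobNum, a.OprSeq, SUM(a.LaborQty)
    FROM LaborDtl a
    WHERE EXISTS (SELECT 1 FROM LaborDtl b WHERE b.ActiveTrans = 1)
    GROUP BY a.JobNum, a.OprSeq
    ORDER BY a.JobNum
""").fetchall()

# Correlated EXISTS: the inner query is tied to the outer row's job and operation
correlated = conn.execute("""
    SELECT a.JobNum, a.OprSeq, SUM(a.LaborQty)
    FROM LaborDtl a
    WHERE EXISTS (SELECT 1 FROM LaborDtl b
                  WHERE b.JobNum = a.JobNum
                    AND b.OprSeq = a.OprSeq
                    AND b.ActiveTrans = 1)
    GROUP BY a.JobNum, a.OprSeq
    ORDER BY a.JobNum
""").fetchall()

# HAVING variant: keep only groups containing at least one active transaction
having = conn.execute("""
    SELECT JobNum, OprSeq, SUM(LaborQty)
    FROM LaborDtl
    GROUP BY JobNum, OprSeq
    HAVING COUNT(CASE ActiveTrans WHEN 1 THEN 1 END) > 0
    ORDER BY JobNum
""").fetchall()

print(uncorrelated)
print(correlated)
print(having)
```

The uncorrelated version returns both jobs, including 000004, which has no active labor. The correlated and HAVING versions return only job 000193, operation 90.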

Query to join tables based on two criteria

First, I'm not sure the title adequately describes what I am trying to achieve, so please amend as you see fit.
I have a table in an SQL database which records budget allocations and transfers.
Each allocation and transfer is recorded against a combination of two details - the year_ID and program_ID. Allocations can come from nowhere or from other year_id & program_id combinations - these are the transfers.
For example, year_ID 1 & program_ID 2 was allocated $1000, then year_ID 1 & program_ID 2 transferred $100 to year_ID 2 & program_ID 2.
This is stored in the database like
From_year_ID From_program_ID To_year_ID To_program_ID Budget
null null 1 2 1000
1 2 2 2 100
The query needs to summarise these budget allocations based on the year_id + program_id combination, so the results would display:
year_ID program_ID Budget_Allocations Budget_Transfers
1 2 1000 100
2 2 100 0
I've spent two days trying to put this query together and am officially stuck - could someone help me out or point me in the right direction? I've tried what feels like every combination of left, right, inner, union joins, with etc - but haven't got the outcome I'm looking for.
Here is a sqlfiddle with sample data: http://sqlfiddle.com/#!3/9c1ec/1/0 and one of the queries that doesn't quite work.
I would sum the Budget by Program_ID and Year_ID in some CTEs and join those to the Program and Year tables to avoid summing Budget values more than once.
WITH
bt AS
(SELECT
To_Year_ID AS Year_ID,
To_Program_ID AS Program_ID,
SUM(Budget) AS Budget_Allocation
FROM T_Budget
GROUP BY
To_Year_ID,
To_Program_ID),
bf AS
(SELECT
From_Year_ID AS Year_ID,
From_Program_ID AS Program_ID,
SUM(Budget) AS Budget_Transfer
FROM T_Budget
GROUP BY
From_Year_ID,
From_Program_ID)
SELECT
y.Year_ID,
p.Program_id,
bt.Budget_Allocation,
bf.Budget_Transfer,
y.Short_Name + ' ' + p.Short_Name AS Year_Program,
isnull(bt.Budget_Allocation,0) -
isnull(bf.Budget_Transfer,0) AS Budget_Balance
FROM T_Programs p
CROSS JOIN T_Years y
INNER JOIN bt
ON bt.Program_ID = p.Program_ID
AND bt.Year_ID = y.Year_ID
LEFT JOIN bf
ON bf.Program_ID = p.Program_ID
AND bf.Year_ID = y.Year_ID
ORDER BY
y.Year_ID,
p.Program_ID
http://sqlfiddle.com/#!3/9c1ec/13
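The core idea, aggregating each direction separately and only then joining, can be tried on the two sample rows from the question. The sketch below is a simplified variant, not the answer's exact query: it runs in SQLite via Python's sqlite3 module, leaves out the T_Programs and T_Years lookup tables (their columns are only visible in the fiddle), skips rows without a source in the transfer CTE, and uses COALESCE where the answer uses SQL Server's isnull.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_Budget (
        From_Year_ID INTEGER, From_Program_ID INTEGER,
        To_Year_ID INTEGER, To_Program_ID INTEGER, Budget INTEGER
    );
    INSERT INTO T_Budget VALUES
        (NULL, NULL, 1, 2, 1000),
        (1, 2, 2, 2, 100);
""")

rows = conn.execute("""
    WITH bt AS (   -- money received per year/program
        SELECT To_Year_ID AS Year_ID, To_Program_ID AS Program_ID,
               SUM(Budget) AS Budget_Allocation
        FROM T_Budget
        GROUP BY To_Year_ID, To_Program_ID
    ),
    bf AS (        -- money transferred away per year/program
        SELECT From_Year_ID AS Year_ID, From_Program_ID AS Program_ID,
               SUM(Budget) AS Budget_Transfer
        FROM T_Budget
        WHERE From_Year_ID IS NOT NULL
        GROUP BY From_Year_ID, From_Program_ID
    )
    SELECT bt.Year_ID, bt.Program_ID, bt.Budget_Allocation,
           COALESCE(bf.Budget_Transfer, 0) AS Budget_Transfer
    FROM bt
    LEFT JOIN bf
           ON bf.Year_ID = bt.Year_ID
          AND bf.Program_ID = bt.Program_ID
    ORDER BY bt.Year_ID, bt.Program_ID
""").fetchall()

for row in rows:
    print(row)
```

Because each CTE already holds one row per year/program combination, the join cannot multiply rows, so no budget value is summed twice. On the sample data the result matches the table in the question: 1000 allocated and 100 transferred for year 1, and 100 allocated and 0 transferred for year 2.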

Multiple subqueries when combined return too many results

Background
I have merged two long lists of financial transactions (one from each of two companies) into a single table (actually a DataView for reasons not important here).
These two companies did business with lots of customers.
What I want is a query that returns the total number of financial transactions each company had with each customer.
For example:
Customer Company A Company B
Customer X 10 0
Customer Y 15 26
Customer Z 0 71
Hence each customer has dealt with at least one company, and possibly both companies.
So far my query has got this far . . .
SELECT v.[Company],
v. [AnalysisName],
s1.CMTtrans,
s2.CFLtrans
FROM vMainCustTrans AS v
LEFT JOIN (SELECT [AnalysisName], COUNT([AnalysisName]) AS CMTtrans
FROM vMainCustTrans
WHERE [Company] = 'Money'
GROUP BY [AnalysisName]) AS s1
ON v.[AnalysisName] = s1.[AnalysisName]
LEFT JOIN (SELECT [AnalysisName], COUNT([AnalysisName]) AS CFLtrans
FROM vMainCustTrans
WHERE [Company] = 'Forex'
GROUP BY [AnalysisName]) AS s2
ON v.[AnalysisName] = s2.[AnalysisName]
ORDER BY v.[Company], v.[AnalysisName]
Now sub query (s1) returns 89 customers
sub query (s2) returns 37 customers
Yet the whole query returns 18,989 lines
There should be between 89 and 126 (i.e. 89 + 37) lines, depending on how much overlap there is between Company A and Company B
Could someone kindly point out what is wrong with my query and how I can produce the results I want: namely a list of customers, with two counts of the number of transactions they have had with each of the two companies.
If I've got it right (Customer = AnalysisName), it should look like this:
SELECT
v.[AnalysisName],
SUM(CASE WHEN [Company] = 'Money' THEN 1 ELSE 0 END) CMTtrans,
SUM(CASE WHEN [Company] = 'Forex' THEN 1 ELSE 0 END) CFLtrans
FROM vMainCustTrans AS v
GROUP BY v.[AnalysisName]
ORDER BY v.[AnalysisName]
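The row explosion in the original query comes from its FROM clause: it selects from vMainCustTrans itself, so it returns one row per transaction, and the two joined subqueries only add columns to each of those rows. The sketch below illustrates this with Python's sqlite3 module on five made-up transactions (the customer names and counts are hypothetical) and compares it with the grouped conditional-count version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample: one row per transaction
    CREATE TABLE vMainCustTrans (Company TEXT, AnalysisName TEXT);
    INSERT INTO vMainCustTrans VALUES
        ('Money', 'Customer X'), ('Money', 'Customer X'),
        ('Money', 'Customer Y'), ('Forex', 'Customer Y'),
        ('Forex', 'Customer Z');
""")

# Shape of the original query: still one output row per transaction
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM vMainCustTrans AS v
    LEFT JOIN (SELECT AnalysisName, COUNT(*) AS CMTtrans
               FROM vMainCustTrans WHERE Company = 'Money'
               GROUP BY AnalysisName) AS s1
           ON v.AnalysisName = s1.AnalysisName
    LEFT JOIN (SELECT AnalysisName, COUNT(*) AS CFLtrans
               FROM vMainCustTrans WHERE Company = 'Forex'
               GROUP BY AnalysisName) AS s2
           ON v.AnalysisName = s2.AnalysisName
""").fetchone()[0]

# Conditional counts with a single GROUP BY: one row per customer
pivot = conn.execute("""
    SELECT AnalysisName,
           SUM(CASE WHEN Company = 'Money' THEN 1 ELSE 0 END) AS CMTtrans,
           SUM(CASE WHEN Company = 'Forex' THEN 1 ELSE 0 END) AS CFLtrans
    FROM vMainCustTrans
    GROUP BY AnalysisName
    ORDER BY AnalysisName
""").fetchall()

print(joined_count)
print(pivot)
```

Here the joined query yields 5 rows, one per transaction, while the grouped query yields one row for each of the 3 customers.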