Query with LEFT OUTER JOIN on subqueries with SUMs - sql

I'm trying to perform an OUTER JOIN on related tables, but I want to JOIN the SUMs, not the actual data. This query is flawed, so I am looking for help with its structure, but I'm also curious whether there is a more elegant way of performing this type of query.
SELECT firstname,lastname, thesum1, thesum2 FROM whovians
LEFT OUTER JOIN (
SELECT SUM(thevalue) AS thesum1 FROM friends WHERE doctornumref = 10 AND year = 1968
) AS derivedTable1
ON (whovians.doctornum = friends.doctornumref)
LEFT OUTER JOIN (
SELECT SUM(amount) AS thesum2 FROM enemies WHERE doctornumref = 10 AND year = 1968
) AS derivedTable2
ON (whovians.doctornum = enemies.doctornumref) WHERE year = 1968 AND doctornum = 10;

Should work like this:
SELECT w.firstname, w.lastname, derived1.thesum1, derived2.thesum2
FROM whovians w
LEFT JOIN (
SELECT doctornumref, SUM(thevalue) AS thesum1
FROM friends
WHERE doctornumref = 10
AND year = 1968
GROUP BY 1
) AS derived1 ON derived1.doctornumref = w.doctornum
LEFT JOIN (
SELECT doctornumref, SUM(amount) AS thesum2
FROM enemies
WHERE doctornumref = 10
AND year = 1968
GROUP BY 1
) AS derived2 ON derived2.doctornumref = w.doctornum
WHERE w.doctornum = 10
AND w.year = 1968;
In this particular case, since you restrict to the same year and doctornumref / doctornum in outer query as well as subqueries, and the subquery can only return 0 or 1 rows, you can simplify with lowly correlated subqueries:
SELECT firstname,lastname
, (SELECT SUM(thevalue)
FROM friends
WHERE doctornumref = w.doctornum
AND year = w.year) AS thesum1
, (SELECT SUM(amount)
FROM enemies
WHERE doctornumref = w.doctornum
AND year = w.year) AS thesum2
FROM whovians w
WHERE year = 1968
AND doctornum = 10;
If (year, doctornum) is not unique in table whovians, the first form will prevent repeated evaluation of the subqueries and perform better, though.
You can still simplify:
SELECT w.firstname, w.lastname, f.thesum1, e.thesum2
FROM whovians w
LEFT JOIN (
SELECT SUM(thevalue) AS thesum1
FROM friends
WHERE doctornumref = 10
AND year = 1968
) f ON true -- aggregate without GROUP BY always returns exactly 1 row, guaranteed to match
LEFT JOIN (
SELECT SUM(amount) AS thesum2
FROM enemies
WHERE doctornumref = 10
AND year = 1968
) e ON true
WHERE w.doctornum = 10
AND w.year = 1968;
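To see why the sums are computed inside the subqueries before joining, here is a minimal sketch. It runs the SQL through SQLite via Python's sqlite3 module only so that it is self-contained; the sample rows and the name 'Jane Doe' are made up, and the subqueries group by (doctornumref, year) instead of hard-coding the filter values. Joining the raw tables and summing afterwards multiplies the rows (2 friends x 3 enemies), while joining pre-aggregated subqueries does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE whovians (doctornum INT, year INT, firstname TEXT, lastname TEXT);
CREATE TABLE friends  (doctornumref INT, year INT, thevalue INT);
CREATE TABLE enemies  (doctornumref INT, year INT, amount INT);
-- Hypothetical sample rows, not taken from the question.
INSERT INTO whovians VALUES (10, 1968, 'Jane', 'Doe');
INSERT INTO friends  VALUES (10, 1968, 1), (10, 1968, 2);
INSERT INTO enemies  VALUES (10, 1968, 10), (10, 1968, 20), (10, 1968, 30);
""")

# Joining the raw tables first: every friends row pairs with every enemies row,
# so each value is summed several times.
naive_sql = """
SELECT w.firstname, w.lastname, SUM(f.thevalue), SUM(e.amount)
FROM whovians w
LEFT JOIN friends f ON f.doctornumref = w.doctornum AND f.year = w.year
LEFT JOIN enemies e ON e.doctornumref = w.doctornum AND e.year = w.year
WHERE w.doctornum = 10 AND w.year = 1968
GROUP BY w.firstname, w.lastname
"""

# Aggregating inside each subquery first: at most one row per (doctornumref, year),
# so the joins cannot multiply rows.
preaggregated_sql = """
SELECT w.firstname, w.lastname, f.thesum1, e.thesum2
FROM whovians w
LEFT JOIN (
    SELECT doctornumref, year, SUM(thevalue) AS thesum1
    FROM friends
    GROUP BY doctornumref, year
) f ON f.doctornumref = w.doctornum AND f.year = w.year
LEFT JOIN (
    SELECT doctornumref, year, SUM(amount) AS thesum2
    FROM enemies
    GROUP BY doctornumref, year
) e ON e.doctornumref = w.doctornum AND e.year = w.year
WHERE w.doctornum = 10 AND w.year = 1968
"""

naive_rows = conn.execute(naive_sql).fetchall()
preaggregated_rows = conn.execute(preaggregated_sql).fetchall()
print(naive_rows)          # inflated sums
print(preaggregated_rows)  # correct sums
```

With this data the naive form reports 9 and 120, while the pre-aggregated form reports the true totals 3 and 60.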

Related

Query optimization for multiple inner joins and sub-query

I need help regarding query optimization of the below query.
SELECT pr.todate , pr.descr, cmp.company_id
FROM employee AS emp
INNER JOIN company AS cmp ON emp.emp_comp_id = cmp.company_id
INNER JOIN profile AS pr ON emp.acca_id = pr.profile_id
INNER JOIN acondition ON as_id = as_ac_id
WHERE as_closed = 0
AND (pr.ac_act_id = 20)
AND (pr.todate = (SELECT MIN(todate) AS Expr1
FROM profile pro
INNER JOIN employee empl ON empl.acca_id = pro.profile_id
JOIN acondition ON as_id = as_ac_id
WHERE (pro.ac_act_id = 20
AND empl.emp_comp_id = cmp.company_id)
AND as_closed = 0))
Since there are duplicate joins in the main query and sub query, is there any way to remove those joins in the subquery?
Since, as you clarified, your sub-query is almost identical to your main query, you might be able to use the window function RANK as a filter condition. RANK assigns the same number to ties, so if multiple records per company share the earliest todate you will get all of them, e.g.
SELECT todate, descr, company_id
FROM (
SELECT pr.todate, pr.descr, cmp.company_id
, RANK() OVER (PARTITION BY cmp.company_id ORDER BY pr.todate ASC) RankNumber
FROM employee AS emp
INNER JOIN company AS cmp ON emp.emp_comp_id = cmp.company_id
INNER JOIN profile AS pr ON emp.acca_id = pr.profile_id
INNER JOIN acondition ON as_id = as_ac_id
WHERE as_closed = 0 AND pr.ac_act_id = 20
) X
where RankNumber = 1;
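A small sketch of the tie behaviour described above, run through SQLite via Python's sqlite3 module so it is self-contained (window functions need SQLite 3.25 or newer). The table profile_dates and its rows are hypothetical stand-ins for the joined employee/company/profile result. RANK keeps both rows that share the earliest todate for company 1, whereas ROW_NUMBER would keep only one row per company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the joined result of the original query.
CREATE TABLE profile_dates (company_id INT, descr TEXT, todate TEXT);
INSERT INTO profile_dates VALUES
    (1, 'a', '2020-01-01'),
    (1, 'b', '2020-01-01'),   -- ties with 'a' for the earliest date
    (1, 'c', '2020-02-01'),
    (2, 'd', '2020-03-01');
""")

# RANK gives tied rows the same number, so filtering on 1 keeps all ties.
rank_sql = """
SELECT company_id, descr, todate
FROM (
    SELECT company_id, descr, todate,
           RANK() OVER (PARTITION BY company_id ORDER BY todate ASC) AS rank_number
    FROM profile_dates
) x
WHERE rank_number = 1
ORDER BY company_id, descr
"""

# ROW_NUMBER numbers tied rows arbitrarily, so exactly one row per company survives.
row_number_sql = """
SELECT company_id, descr, todate
FROM (
    SELECT company_id, descr, todate,
           ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY todate ASC) AS rn
    FROM profile_dates
) x
WHERE rn = 1
ORDER BY company_id
"""

rank_rows = conn.execute(rank_sql).fetchall()
row_number_rows = conn.execute(row_number_sql).fetchall()
print(rank_rows)
print(len(row_number_rows))
```

This matches the original `pr.todate = (SELECT MIN(todate) ...)` filter, which also returns every row that carries the minimum date.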
Does this work for you?
SELECT ca.todate, ca.descr, cmp.company_id
FROM employee AS emp
INNER JOIN company AS cmp ON emp.emp_comp_id = cmp.company_id
CROSS APPLY (
SELECT TOP(1) pr.todate, pr.descr
FROM profile pr
INNER JOIN acondition ON as_id = as_ac_id
WHERE emp.acca_id = pr.profile_id AND (pr.ac_act_id = 20) AND as_closed = 0
ORDER BY pr.todate ASC
) AS ca

Oracle query optimization recommendation

The query below takes a long time to run. The predicates below are used only to pick unique records, so I was wondering whether there is a way to rewrite the query without evaluating them once per row to get the unique ID.
(select max(c.id) from plocation c where c.ids = y.ids and c.idc = y.idc)
(select max(cr.id) from plocation_log cr where cr.ids = yt.ids and cr.idc = yt.idc)
(select max(pr.id) from patent pr where pr.ids = p.ids and pr.idc = p.idc)
My full sample query
SELECT to_char(p.pid) AS patentid,
p.name,
x.dept,
y.location
FROM patent p
JOIN pdetails x ON p.pid = x.pid AND x.isactive = 1
JOIN plocation y
ON y.idr = p.idr
AND y.idc = p.idc
AND y.id = (select max(c.id) from plocation c where c.ids = y.ids and c.idc = y.idc)
AND y.idopstype in (36, 37)
JOIN plocation_log yt
ON yt.idr = y.idr
AND yt.idc= y.idc
AND yt.id = (select max(cr.id) from plocation_log cr where cr.ids = yt.ids and cr.idc = yt.idc)
AND yt.idopstype in (36,37)
WHERE
p.idp IN (10,20,30)
AND p.id = (select max(pr.id) from patent pr where pr.ids = p.ids and pr.idc = p.idc)
AND p.idopstype in (36,37)
Consider joining to aggregate CTEs to calculate MAX values per group once as opposed to rowwise MAX calculation for every outer query row. Also, be sure to use more informative table aliases instead of a, b, c or x, y, z style.
WITH loc_max AS
(select ids, idc, max(id) as max_id from plocation group by ids, idc)
, log_max AS
(select ids, idc, max(id) as max_id from plocation_log group by ids, idc)
, pat_max AS
(select ids, idc, max(id) as max_id from patent pr group by ids, idc)
SELECT to_char(pat.pid) AS patentid
, pat.name
, det.dept
, loc.location
FROM patent pat
JOIN pdetails det
ON pat.pid = det.pid
AND det.isactive = 1
JOIN plocation loc
ON loc.idr = pat.idr
AND loc.idc = pat.idc
AND loc.idopstype IN (36, 37)
JOIN loc_max -- ADDED CTE JOIN
ON loc.id = loc_max.max_id
AND loc.ids = loc_max.ids
AND loc.idc = loc_max.idc
JOIN plocation_log log
ON log.idr = loc.idr
AND log.idc = loc.idc
AND log.idopstype in (36,37)
JOIN log_max -- ADDED CTE JOIN
ON log.id = log_max.max_id
AND log.ids = log_max.ids
AND log.idc = log_max.idc
JOIN pat_max -- ADDED CTE JOIN
ON pat.id = pat_max.max_id
AND pat.ids = pat_max.ids
AND pat.idc = pat_max.idc
WHERE pat.idp IN (10, 20, 30)
AND pat.idopstype IN (36, 37)
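As a rough check that joining to an aggregate CTE selects the same rows as the correlated MAX predicate, here is a minimal sketch on a cut-down plocation table. It uses SQLite via Python's sqlite3 module purely so that it runs without an Oracle instance; the rows are invented and only the id, ids, idc and location columns are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down, hypothetical version of plocation.
CREATE TABLE plocation (id INT, ids INT, idc INT, location TEXT);
INSERT INTO plocation VALUES
    (1, 100, 7, 'A'),
    (2, 100, 7, 'B'),
    (3, 200, 7, 'C'),
    (4, 200, 8, 'D'),
    (5, 200, 8, 'E');
""")

# Original style: the MAX subquery is correlated to each outer row.
correlated_sql = """
SELECT y.id, y.location
FROM plocation y
WHERE y.id = (SELECT MAX(c.id)
              FROM plocation c
              WHERE c.ids = y.ids AND c.idc = y.idc)
ORDER BY y.id
"""

# CTE style: MAX per (ids, idc) group is computed once, then joined.
cte_sql = """
WITH loc_max AS (
    SELECT ids, idc, MAX(id) AS max_id
    FROM plocation
    GROUP BY ids, idc
)
SELECT loc.id, loc.location
FROM plocation loc
JOIN loc_max
  ON loc.id  = loc_max.max_id
 AND loc.ids = loc_max.ids
 AND loc.idc = loc_max.idc
ORDER BY loc.id
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(correlated_rows)
print(cte_rows)
```

Both forms keep the row with the highest id in each (ids, idc) group. Whether the CTE form is actually faster depends on the optimizer and the available indexes, so it is worth comparing execution plans on the real data.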
As commented by The Impaler, one option is to use analytic functions instead of correlated subqueries. The idea is to rank records within subqueries using RANK(), then filter in the outer query (join conditions or WHERE clause).
Consider:
SELECT to_char(p.pid) AS patentid,
p.name,
x.dept,
y.location
FROM (SELECT p.*, RANK() OVER(PARTITION BY ids, idc ORDER BY id DESC) rn FROM patent p) p
JOIN pdetails x ON p.pid = x.pid AND x.isactive = 1
JOIN (SELECT y.*, RANK() OVER(PARTITION BY ids, idc ORDER BY id DESC) rn FROM plocation y) y
ON y.idr = p.idr
AND y.idc = p.idc
AND y.idopstype in (36, 37)
AND y.rn = 1
JOIN (SELECT yt.*, RANK() OVER(PARTITION BY ids, idc ORDER BY id DESC) rn FROM plocation_log yt) yt
ON yt.idr = y.idr
AND yt.idc= y.idc
AND yt.idopstype in (36,37)
AND yt.rn = 1
WHERE
p.idp IN (10,20,30)
AND p.idopstype in (36,37)
AND p.rn = 1

Get value from a joined table with no value in primary table

The query shown below is just about right, but I need to have a row for each fiscal Id, i.e. in the output shown below, there needs to be a new row after row 4 with data (screen shot below)
The query I'm using is:
SELECT a.companyId,a.profitCenterID,a.coaID,a.fiscalId,
COALESCE(SUM(a.amount * -1),0) amount,
twelveMo = (
SELECT COALESCE(SUM(amount * -1), 0)
FROM gl a1
LEFT OUTER JOIN fiscal f ON a1.fiscalId=f.Id
WHERE
a1.companyId = a.companyId AND
a1.profitCenterId = a.profitCenterId AND
a1.coaId = a.coaId AND
f.Id > a.fiscalId - 12 AND
f.Id <= a.fiscalId
)
FROM gl a
INNER JOIN coa c ON c.Id=a.coaId AND c.statementType=4
GROUP BY companyId,profitCenterId,coaId,a.fiscalId
ORDER BY companyId,profitCenterId,coaId,a.fiscalId
I don't know your sample data or your schema, so I've just wrapped my query around yours.
;WITH CTE_NUM_TEMP AS
(
SELECT 1 AS Fiscal
UNION ALL
SELECT Fiscal+1 FROM CTE_NUM_TEMP
WHERE Fiscal+1<=100
)
SELECT ISNULL(Der.companyId,1) AS companyId,ISNULL(Der.profitCenterID,1) AS profitCenterID,
ISNULL(Der.coaID,40000) AS coaID,IIF(twelveMo IS NULL,LAG(twelveMo,1) OVER(ORDER BY Fiscal),twelveMo) AS twelveMo
FROM CTE_NUM_TEMP AS Num
LEFT JOIN
(
SELECT a.companyId,a.profitCenterID,a.coaID,a.fiscalId,
COALESCE(SUM(a.amount * -1),0) amount,
twelveMo = (
SELECT COALESCE(SUM(amount * -1), 0)
FROM gl a1
LEFT OUTER JOIN fiscal f ON a1.fiscalId=f.Id
WHERE
a1.companyId = a.companyId AND
a1.profitCenterId = a.profitCenterId AND
a1.coaId = a.coaId AND
f.Id > a.fiscalId - 12 AND
f.Id <= a.fiscalId
)
FROM gl a
INNER JOIN coa c ON c.Id=a.coaId AND c.statementType=4
GROUP BY companyId,profitCenterId,coaId,a.fiscalId
)AS Der
ON Num.Fiscal=Der.fiscalId

Sum of resulting set of rows in SQL

I've got the following query:
SELECT DISTINCT CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
/*MC.chemical_percentage,*/
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
--AND CU.permit_id = 2118
--GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month, MC.chemical_id, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound
--ORDER BY C.chemical_name ASC
Which returns:
But what I need is to return one row per month per material adding up the values of POC per month and NON_POC per month.
So, I should end up with something like:
Month material_id material_name POC NON_POC
1 52 Krylon... 0.107581 0.074108687
2 52 Krylon... 0.143437 0.0988125
I tried using SUM but it sums up the same result multiple times:
SELECT /*DISTINCT*/ CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
--MC.chemical_percentage,
POC_emissions = SUM(
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END),
NON_POC_emissions = SUM(
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END)
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE M.material_id = 52
--AND CU.permit_id = 187
AND (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month/*, CU.year, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound*/
--ORDER BY C.chemical_name ASC
The first query has a DISTINCT clause. What is the output without the DISTINCT clause? I suspect you have more rows than your screenshot shows.
Regardless, you could try something like this to get the desired result.
select permit_id, month, material_id, material_name,
sum(poc_emissions), sum(non_poc_emissions)
from (
SELECT DISTINCT CU.permit_id, CU.month, M.material_id, M.material_name,
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
) main
group by permit_id, month, material_id, material_name
Explanation
Since the results you retrieved with DISTINCT are considered the source of truth, I turned that query into a derived table by making it a sub-query. A sub-query in the FROM clause must have a name of some kind; any name will do. I called it main. Subqueries look like this:
select ... from (sub-query) <give-it-a-table-name>
Simple Example:
select * from (select userid, username from user) user_temp
Advanced Example:
select * from (select userid, username from user) user_temp
inner join (select userid, sum(debits) as totaldebits from debittable group by userid) debit
on debit.userid = user_temp.userid
Notice how user_temp alias for the subquery can be used as if the sub-query was a real table.
Use the above query as a subquery, group by month, and select sum(POC_emissions) and sum(NON_POC_emissions).

TSQL Join on Tables with Multiple Counts

I have a table from which I am trying to get 2 different counts over the same data using 2 left joins.
For whatever reason, it is duplicating the data and producing an incorrect result, and I am not sure why.
This is the query that I have so far which I thought would be working:
DECLARE @locale INT = '14'
SELECT TOP 50
E.[DepartmentDesc] AS department,
COUNT(N.[nomineeQID]) AS totalNominations,
COUNT(S.[subQID]) AS totalSubmissions,
COUNT(N.[nomineeQID]) + COUNT(S.[subQID]) AS total
FROM
employees AS E
LEFT JOIN
submissions AS S ON E.[qid] = S.[subQID] AND S.[statusID] = 3
AND S.[locationID] = @locale
LEFT JOIN
submissions AS N ON E.[qid] = N.[nomineeQID] AND N.[statusID] = 3
AND N.[locationID] = @locale
GROUP BY
E.[DepartmentDesc]
ORDER BY
totalNominations DESC
Here is a SQL Fiddle of the data: http://sqlfiddle.com/#!3/4e6b5/1
The result should be the following but it is providing skewed numbers:
Total Nominations should be 3
Total Submissions should be 2
Total should be 5
I have a feeling it's close but the math is just not cooperating!
Any ideas?
You are getting a cartesian product for each department. The simplest fix to your query is to use count(distinct):
COUNT(DISTINCT N.[nomineeQID]) AS totalNominations,
COUNT(DISTINCT S.[subQID]) AS totalSubmissions,
COUNT(DISTINCT N.[nomineeQID]) + COUNT(DISTINCT S.[subQID]) AS total
A more correct fix is to do the aggregations in subqueries before doing the join.
EDIT:
Because of the duplications problem, use SubmissionId instead:
COUNT(DISTINCT N.SubmissionId) AS totalNominations,
COUNT(DISTINCT S.SubmissionId) AS totalSubmissions,
COUNT(DISTINCT N.SubmissionId) + COUNT(DISTINCT S.SubmissionId) AS total
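For the "aggregate in subqueries before joining" fix mentioned above, here is a minimal sketch. It runs through SQLite via Python's sqlite3 module so it is self-contained; the rows are invented to reproduce the expected 3 / 2 / 5 result, and the statusID and locationID filters are left out for brevity (they would go in the WHERE clause of each subquery).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (qid TEXT, DepartmentDesc TEXT);
CREATE TABLE submissions (SubmissionId INT, subQID TEXT, nomineeQID TEXT);
-- Hypothetical rows: q1 made 2 submissions and received 3 nominations.
INSERT INTO employees VALUES ('q1', 'IT'), ('q2', 'IT');
INSERT INTO submissions VALUES
    (1, 'q1', 'z1'),
    (2, 'q1', 'z2'),
    (3, 'x9', 'q1'),
    (4, 'x9', 'q1'),
    (5, 'x9', 'q1');
""")

# Two raw LEFT JOINs: q1's 2 submissions pair with its 3 nominations (6 rows),
# so both counts are inflated.
naive_sql = """
SELECT E.DepartmentDesc,
       COUNT(N.nomineeQID),
       COUNT(S.subQID),
       COUNT(N.nomineeQID) + COUNT(S.subQID)
FROM employees AS E
LEFT JOIN submissions AS S ON E.qid = S.subQID
LEFT JOIN submissions AS N ON E.qid = N.nomineeQID
GROUP BY E.DepartmentDesc
"""

# Count per employee inside each subquery, then join: at most one row per
# employee from each side, so nothing is multiplied.
preaggregated_sql = """
SELECT E.DepartmentDesc AS department,
       SUM(COALESCE(N.cnt, 0)) AS totalNominations,
       SUM(COALESCE(S.cnt, 0)) AS totalSubmissions,
       SUM(COALESCE(N.cnt, 0) + COALESCE(S.cnt, 0)) AS total
FROM employees AS E
LEFT JOIN (SELECT subQID, COUNT(*) AS cnt
           FROM submissions GROUP BY subQID) AS S ON S.subQID = E.qid
LEFT JOIN (SELECT nomineeQID, COUNT(*) AS cnt
           FROM submissions GROUP BY nomineeQID) AS N ON N.nomineeQID = E.qid
GROUP BY E.DepartmentDesc
"""

naive_counts = conn.execute(naive_sql).fetchall()
fixed_counts = conn.execute(preaggregated_sql).fetchall()
print(naive_counts)
print(fixed_counts)
```

Unlike COUNT(DISTINCT ...), this form does not depend on picking a column that happens to be unique, which is why it is the more robust of the two fixes.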
Try this:
DECLARE @locale INT = '14'
select TOP 50 t.department, sum(t.totalSubmissions) AS totalSubmissions,
sum(t.totalNominations) AS totalNominations, sum(t.total) AS total
from
(SELECT
E.[DepartmentDesc] AS department,
(select COUNT(S.[subQID])
from submissions AS S
where E.[qid] = S.[subQID]
AND S.[statusID] = 3
AND S.[locationID] = @locale) AS totalSubmissions,
(select COUNT(N.[nomineeQID])
from submissions AS N
where E.[qid] = N.[nomineeQID]
AND N.[statusID] = 3
AND N.[locationID] = @locale) AS totalNominations,
(select COUNT(S.[subQID])
from submissions AS S
where E.[qid] = S.[subQID]
AND S.[statusID] = 3
AND S.[locationID] = @locale) +
(select COUNT(N.[nomineeQID])
from submissions AS N
where E.[qid] = N.[nomineeQID]
AND N.[statusID] = 3
AND N.[locationID] = @locale) AS total
FROM employees AS E
where exists(
select 'submission'
from submissions AS S
where E.[qid] = S.[subQID]
AND S.[statusID] = 3
AND S.[locationID] = @locale
) or
exists(
select 'nomination'
from submissions AS N
where E.[qid] = N.[nomineeQID]
AND N.[statusID] = 3
AND N.[locationID] = @locale
)
) as t
group by t.department
ORDER BY sum(t.totalNominations) DESC