TSQL Join on Tables with Multiple Counts

I have a table from which I am trying to get two different counts of the same data using two LEFT JOINs.
For some reason, it is duplicating the data and returning an incorrect result, and I am not sure why.
This is the query I have so far, which I thought would work:
DECLARE @locale INT = 14
SELECT TOP 50
E.[DepartmentDesc] AS department,
COUNT(N.[nomineeQID]) AS totalNominations,
COUNT(S.[subQID]) AS totalSubmissions,
COUNT(N.[nomineeQID]) + COUNT(S.[subQID]) AS total
FROM
employees AS E
LEFT JOIN
submissions AS S ON E.[qid] = S.[subQID] AND S.[statusID] = 3
AND S.[locationID] = @locale
LEFT JOIN
submissions AS N ON E.[qid] = N.[nomineeQID] AND N.[statusID] = 3
AND N.[locationID] = @locale
GROUP BY
E.[DepartmentDesc]
ORDER BY
totalNominations DESC
Here is a SQL Fiddle of the data: http://sqlfiddle.com/#!3/4e6b5/1
The result should be the following, but the query returns skewed numbers:
Total Nominations should be 3
Total Submissions should be 2
Total should be 5
I have a feeling it's close, but the math is just not cooperating!
Any ideas?

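You are getting a Cartesian product: for each employee, every matching submission row is paired with every matching nomination row, so the counts are inflated. The simplest fix to your query is to use COUNT(DISTINCT ...):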
COUNT(DISTINCT N.[nomineeQID]) AS totalNominations,
COUNT(DISTINCT S.[subQID]) AS totalSubmissions,
COUNT(DISTINCT N.[nomineeQID]) + COUNT(DISTINCT S.[subQID]) AS total
A more robust fix is to do the aggregations in subqueries before doing the join.
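EDIT:
COUNT(DISTINCT) on the employee IDs counts each nominee or submitter only once, even when they have several rows, so count the distinct submission key instead (this assumes SubmissionId is the primary key of submissions):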
COUNT(DISTINCT N.SubmissionId) AS totalNominations,
COUNT(DISTINCT S.SubmissionId) AS totalSubmissions,
COUNT(DISTINCT N.SubmissionId) + COUNT(DISTINCT S.SubmissionId) AS total
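To make the row multiplication concrete, here is a minimal sketch that runs both approaches with Python's built-in sqlite3 module, used as a stand-in for SQL Server (so @locale becomes the named parameter :locale and TOP 50 is dropped). The table and column names follow the question, but the rows are invented for illustration and are not the SQL Fiddle data. Employee 1 both submits and is nominated, which is what triggers the fan-out.

```python
import sqlite3

# Hypothetical sample data (not the SQL Fiddle rows); SQLite stands in
# for SQL Server so the example is self-contained.
SCHEMA = """
CREATE TABLE employees (qid INTEGER PRIMARY KEY, DepartmentDesc TEXT);
CREATE TABLE submissions (
    SubmissionId INTEGER PRIMARY KEY,
    subQID INTEGER, nomineeQID INTEGER,
    statusID INTEGER, locationID INTEGER
);
INSERT INTO employees VALUES (1, 'Sales'), (2, 'Sales'), (3, 'IT');
INSERT INTO submissions VALUES
    (10, 1, 2, 3, 14),
    (11, 1, 2, 3, 14),
    (12, 3, 1, 3, 14);
"""

# Original shape: employee 1 has 2 submissions and 1 nomination,
# so the two LEFT JOINs produce 2 x 1 rows before counting.
NAIVE = """
SELECT E.DepartmentDesc,
       COUNT(N.nomineeQID) AS totalNominations,
       COUNT(S.subQID)     AS totalSubmissions
FROM employees AS E
LEFT JOIN submissions AS S
       ON E.qid = S.subQID AND S.statusID = 3 AND S.locationID = :locale
LEFT JOIN submissions AS N
       ON E.qid = N.nomineeQID AND N.statusID = 3 AND N.locationID = :locale
GROUP BY E.DepartmentDesc
ORDER BY totalNominations DESC
"""

# Aggregate each side per employee first: each derived table has at most
# one row per qid, so the joins can no longer multiply rows.
PRE_AGGREGATED = """
SELECT E.DepartmentDesc,
       COALESCE(SUM(N.cnt), 0) AS totalNominations,
       COALESCE(SUM(S.cnt), 0) AS totalSubmissions,
       COALESCE(SUM(N.cnt), 0) + COALESCE(SUM(S.cnt), 0) AS total
FROM employees AS E
LEFT JOIN (SELECT subQID AS qid, COUNT(*) AS cnt
           FROM submissions
           WHERE statusID = 3 AND locationID = :locale
           GROUP BY subQID) AS S ON S.qid = E.qid
LEFT JOIN (SELECT nomineeQID AS qid, COUNT(*) AS cnt
           FROM submissions
           WHERE statusID = 3 AND locationID = :locale
           GROUP BY nomineeQID) AS N ON N.qid = E.qid
GROUP BY E.DepartmentDesc
ORDER BY totalNominations DESC
"""


def run(sql, locale=14):
    # Fresh in-memory database per call so each query sees the same data.
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql, {"locale": locale}).fetchall()
    con.close()
    return rows
```

With this data, run(NAIVE) reports 4 nominations for Sales instead of 3, while run(PRE_AGGREGATED) returns 3 nominations, 2 submissions and a total of 5. The COUNT(DISTINCT SubmissionId) variant also gives the right totals here, but it builds the inflated rows first and de-duplicates them afterwards.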

Try this:
DECLARE @locale INT = 14
select TOP 50 t.department, sum(t.totalSubmissions) AS totalSubmissions,
sum(t.totalNominations) AS totalNominations, sum(t.total) AS total
from
(SELECT
E.[DepartmentDesc] AS department,
(select COUNT(S.[subQID])
from submissions AS S
where E.[qid] = S.[subQID]
AND S.[statusID] = 3
AND S.[locationID] = @locale) AS totalSubmissions,
(select COUNT(N.[nomineeQID])
from submissions AS N
where E.[qid] = N.[nomineeQID]
AND N.[statusID] = 3
AND N.[locationID] = @locale) AS totalNominations,
(select COUNT(S.[subQID])
from submissions AS S
where E.[qid] = S.[subQID]
AND S.[statusID] = 3
AND S.[locationID] = @locale) +
(select COUNT(N.[nomineeQID])
from submissions AS N
where E.[qid] = N.[nomineeQID]
AND N.[statusID] = 3
AND N.[locationID] = @locale) AS total
FROM employees AS E
where exists(
select 'submission'
from submissions AS S
where E.[qid] = S.[subQID]
AND S.[statusID] = 3
AND S.[locationID] = @locale
) or
exists(
select 'nomination'
from submissions AS N
where E.[qid] = N.[nomineeQID]
AND N.[statusID] = 3
AND N.[locationID] = @locale
)
) as t
group by t.department
ORDER BY sum(t.totalNominations) DESC

Related

Query optimization for multiple inner joins and sub-query

I need help optimizing the query below.
SELECT pr.todate , pr.descr, cmp.company_id
FROM employee AS emp
INNER JOIN company AS cmp ON emp.emp_comp_id = cmp.company_id
INNER JOIN profile AS pr ON emp.acca_id = pr.profile_id
INNER JOIN acondition ON as_id = as_ac_id
WHERE as_closed = 0
AND (pr.ac_act_id = 20)
AND (pr.todate = (SELECT MIN(todate) AS Expr1
FROM profile pro
INNER JOIN employee empl ON empl.acca_id = pro.profile_id
JOIN acondition ON as_id = as_ac_id
WHERE (pro.ac_act_id = 20
AND empl.emp_comp_id = cmp.company_id)
AND as_closed = 0))
Since the main query and the subquery repeat the same joins, is there any way to remove those joins from the subquery?

Since, as you clarified, your subquery is almost identical to your main query, you can compute the window function RANK over each company's rows and filter on it instead of repeating the joins. RANK assigns the same number to ties, so if several records per company share the earliest todate you will get all of them, just as the MIN() comparison does. For example:
SELECT todate, descr, company_id
FROM (
SELECT pr.todate, pr.descr, cmp.company_id
, RANK() OVER (PARTITION BY cmp.company_id ORDER BY pr.todate ASC) RankNumber
FROM employee AS emp
INNER JOIN company AS cmp ON emp.emp_comp_id = cmp.company_id
INNER JOIN profile AS pr ON emp.acca_id = pr.profile_id
INNER JOIN acondition ON as_id = as_ac_id
WHERE as_closed = 0 AND pr.ac_act_id = 20
) X
where RankNumber = 1;
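As a quick check of the tie behavior, the sketch below (Python's sqlite3 module, assuming SQLite 3.25 or later for window functions) runs the RANK() filter and the original correlated MIN() pattern against a hypothetical table joined_rows that stands in for the output of the four-table join. The rows are invented; company 1 has two rows tied on its earliest todate.

```python
import sqlite3

# Hypothetical stand-in for the rows produced by the employee/company/
# profile/acondition joins: (company_id, todate, descr).
ROWS = [
    (1, "2013-01-05", "a"),
    (1, "2013-01-05", "b"),  # ties with "a" for company 1's earliest todate
    (1, "2013-03-01", "c"),
    (2, "2013-02-10", "d"),
    (2, "2013-04-01", "e"),
]

# Window-function version: rank rows per company by todate, keep rank 1.
RANKED = """
SELECT company_id, todate, descr
FROM (
    SELECT company_id, todate, descr,
           RANK() OVER (PARTITION BY company_id ORDER BY todate ASC) AS RankNumber
    FROM joined_rows
) AS X
WHERE RankNumber = 1
ORDER BY company_id, descr
"""

# Original pattern: compare each row with a correlated MIN() per company.
MIN_SUBQUERY = """
SELECT company_id, todate, descr
FROM joined_rows AS j
WHERE todate = (SELECT MIN(k.todate) FROM joined_rows AS k
                WHERE k.company_id = j.company_id)
ORDER BY company_id, descr
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE joined_rows (company_id INTEGER, todate TEXT, descr TEXT)")
    con.executemany("INSERT INTO joined_rows VALUES (?, ?, ?)", ROWS)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

Both queries return the two tied rows for company 1 and the single earliest row for company 2. Swapping RANK() for ROW_NUMBER() would keep exactly one of the tied rows instead.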

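Does this work for you?
SELECT ca.todate, ca.descr, cmp.company_id
FROM employee AS emp
INNER JOIN company AS cmp ON emp.emp_comp_id = cmp.company_id
CROSS APPLY (
SELECT TOP(1) pr.todate, pr.descr
FROM profile pr
INNER JOIN acondition ON as_id = as_ac_id
WHERE emp.acca_id = pr.profile_id AND (pr.ac_act_id = 20) AND as_closed = 0
ORDER BY pr.todate ASC
) AS ca
Note that the APPLY is correlated on the employee (emp.acca_id), so this returns the earliest todate per employee rather than per company, and TOP(1) keeps only one row when several share that date.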

How can I select the highest review from a user?

I need to select reviews for a product, but only one review per user.
My code selects all reviews, so I see several reviews left by the same user.
SELECT
tr.reviewText, tr.reviewDate, tr.reviewRating,
u.userName AS userName,
u.userFirstName AS userFirstName, u.userSurname AS userSurname,
u.countryId AS countryId
FROM
tblReviews tr
INNER JOIN
tblOrderProduct op ON op.orderProductId = tr.orderProductId
AND op.productOptionId IN (SELECT productOptionId
FROM tblProductOption
WHERE productSubCuId = 111
AND productOptionActive = 1)
LEFT JOIN
tblOrder o ON o.orderId = op.orderId
LEFT JOIN
tblUser u ON u.userRandomId = o.userRandomId
WHERE
tr.reviewsStatusId = 2
ORDER BY
tr.reviewRating DESC, tr.reviewDate DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY
Can I get just one review from each user?
Maybe I need to select userId, group the results by userId, and select one row per group? (I tried to do so, but I didn't succeed.)

You can use ROW_NUMBER() to number each user's reviews and keep one per user. Apply the sorting and paging after the rn = 1 filter; otherwise the first 100 reviews are taken before de-duplication and you get fewer than 100 users:
;with per_user_one_review
as
(SELECT tr.reviewText, tr.reviewDate, tr.reviewRating,
u.userName as userName,
u.userFirstName as userFirstName, u.userSurname as userSurname,
u.countryId as countryId, row_number() over (partition by u.userRandomId order by tr.reviewDate desc) rn
FROM tblReviews tr
INNER JOIN tblOrderProduct op
ON op.orderProductId = tr.orderProductId
AND op.productOptionId IN (
SELECT productOptionId FROM tblProductOption
WHERE productSubCuId = 111 AND productOptionActive = 1
)
LEFT JOIN tblOrder o ON o.orderId = op.orderId
LEFT JOIN tblUser u ON u.userRandomId = o.userRandomId
WHERE tr.reviewsStatusId = 2
)
select * from per_user_one_review where rn = 1
ORDER BY reviewRating DESC, reviewDate DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY
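This picks each user's latest review (order by tr.reviewDate desc inside ROW_NUMBER). To keep each user's highest-rated review instead, order the window by tr.reviewRating desc, tr.reviewDate desc.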
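The sketch below, using Python's sqlite3 module (SQLite 3.25+ for window functions), shows how the ORDER BY inside ROW_NUMBER() decides which review survives per user. The reviews table is a hypothetical flattened version of the joined tblReviews/tblOrder/tblUser rows, user_id stands in for userRandomId, and LIMIT/OFFSET stands in for OFFSET ... FETCH; the paging runs after the rn = 1 filter.

```python
import sqlite3

# Hypothetical flattened rows (user_id, reviewText, reviewDate, reviewRating);
# the real query derives these from tblReviews, tblOrder and tblUser.
REVIEWS = [
    ("u1", "great", "2020-01-01", 5),
    ("u1", "ok", "2020-03-01", 3),
    ("u2", "bad", "2020-02-01", 1),
    ("u2", "good", "2020-02-15", 4),
    ("u3", "fine", "2020-01-20", 4),
]

# The window ORDER BY picks the surviving review per user; the outer
# ORDER BY and LIMIT/OFFSET run only after rn = 1 has been applied.
TEMPLATE = """
WITH per_user_one_review AS (
    SELECT user_id, reviewText, reviewDate, reviewRating,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY {window_order}) AS rn
    FROM reviews
)
SELECT user_id, reviewText, reviewRating
FROM per_user_one_review
WHERE rn = 1
ORDER BY reviewRating DESC, reviewDate DESC
LIMIT 100 OFFSET 0
"""


def one_review_per_user(window_order):
    # window_order is interpolated into the SQL, so pass only trusted constants.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE reviews (user_id TEXT, reviewText TEXT, "
                "reviewDate TEXT, reviewRating INTEGER)")
    con.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", REVIEWS)
    rows = con.execute(TEMPLATE.format(window_order=window_order)).fetchall()
    con.close()
    return rows


latest = one_review_per_user("reviewDate DESC")
highest = one_review_per_user("reviewRating DESC, reviewDate DESC")
```

Here latest keeps u1's 2020-03-01 review, while highest keeps u1's 5-star review; u2 and u3 end up with the same review either way with this data.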

If you need the latest review, you could join to a subquery that finds the maximum review date grouped by orderProductId (note that this gives the latest review per ordered product rather than per user).
As a suggestion, you could also use an INNER JOIN instead of an IN clause based on a subquery:
select tr.reviewText
, tr.reviewDate
, tr.reviewRating
, u.userName
, u.userFirstName
, u.userSurname
, u.countryId
from tblReviews tr
INNER JOIN (
select max(reviewDate) max_date, orderProductId
from tblReviews
group by orderProductId
) t1 on t1.orderProductId = tr.orderProductId and t1.max_date = tr.reviewDate
INNER JOIN tblOrderProduct op ON op.orderProductId = tr.orderProductId
INNER JOIN (
SELECT productOptionId
FROM tblProductOption
WHERE productSubCuId = 111 AND productOptionActive = 1
) t2 ON op.productOptionId = t2.productOptionId
LEFT JOIN tblOrder o ON o.orderId = op.orderId
LEFT JOIN tblUser u ON u.userRandomId = o.userRandomId
WHERE tr.reviewsStatusId = 2
ORDER BY tr.reviewRating DESC, tr.reviewDate DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY

Sum of resulting set of rows in SQL

I've got the following query:
SELECT DISTINCT CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
/*MC.chemical_percentage,*/
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
--AND CU.permit_id = 2118
--GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month, MC.chemical_id, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound
--ORDER BY C.chemical_name ASC
Which returns:
But what I need is one row per month per material, adding up the POC values and the NON_POC values for each month.
So I should end up with something like:
Month  material_id  material_name  POC       NON_POC
1      52           Krylon...      0.107581  0.074108687
2      52           Krylon...      0.143437  0.0988125
I tried using SUM, but it adds up the same values multiple times:
SELECT /*DISTINCT*/ CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
--MC.chemical_percentage,
POC_emissions = SUM(
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END),
NON_POC_emissions = SUM(
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END)
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE M.material_id = 52
--AND CU.permit_id = 187
AND (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month/*, CU.year, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound*/
--ORDER BY C.chemical_name ASC

The first query has a DISTINCT clause. What is the output without it? I suspect you have more rows than your screenshot shows.
Regardless, you could try something like this to get the desired result.
select permit_id, month, material_id, material_name,
sum(poc_emissions) AS POC_emissions, sum(non_poc_emissions) AS NON_POC_emissions
from (
SELECT DISTINCT CU.permit_id, CU.month, M.material_id, M.material_name,
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
) main
group by permit_id, month, material_id, material_name
Explanation
Since the results you retrieved with DISTINCT are considered the source of truth, I turned that query into a subquery in the FROM clause (a derived table) and aggregated over it. A derived table must be given an alias, but any name will do; I called it main. Subqueries in the FROM clause look like this:
select ... from (sub-query) <give-it-a-table-name>
Simple Example:
select * from (select userid, username from [user]) user_temp
Advanced Example:
select * from (select userid, username from [user]) user_temp
inner join (select userid, sum(debits) as totaldebits from debittable group by userid) debit
on debit.userid = user_temp.userid
Notice how the user_temp alias lets the outer query use the subquery as if it were a real table.
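To see the derived-table pattern end to end, here is a small sketch using Python's sqlite3 module. The table joined_rows is a hypothetical stand-in for the output of the material/chemical joins, including duplicate rows like the ones the join produces; the values are invented and chosen as exact binary fractions so the sums compare exactly.

```python
import sqlite3

# Hypothetical per-chemical rows for material 52 after the joins:
# (month, material_id, POC_emissions, NON_POC_emissions).
ROWS = [
    (1, 52, 0.5, 0.0),
    (1, 52, 0.5, 0.0),     # duplicate produced by the join
    (1, 52, 0.0, 0.25),
    (2, 52, 0.75, 0.0),
    (2, 52, 0.0, 0.125),
    (2, 52, 0.0, 0.125),   # duplicate produced by the join
]

# The DISTINCT query becomes a derived table named main; the outer query
# groups and sums over it exactly as if main were a real table.
SUM_OVER_DISTINCT = """
SELECT month, material_id,
       SUM(POC_emissions)     AS POC,
       SUM(NON_POC_emissions) AS NON_POC
FROM (
    SELECT DISTINCT month, material_id, POC_emissions, NON_POC_emissions
    FROM joined_rows
) AS main
GROUP BY month, material_id
ORDER BY month
"""

# For comparison: summing the joined rows directly counts duplicates twice.
SUM_DIRECT = """
SELECT month, material_id, SUM(POC_emissions), SUM(NON_POC_emissions)
FROM joined_rows
GROUP BY month, material_id
ORDER BY month
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE joined_rows (month INTEGER, material_id INTEGER, "
                "POC_emissions REAL, NON_POC_emissions REAL)")
    con.executemany("INSERT INTO joined_rows VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

SUM_OVER_DISTINCT gives 0.5 POC for month 1, whereas SUM_DIRECT double-counts the duplicate to 1.0. Keep in mind that DISTINCT also merges two genuinely different chemicals that happen to produce identical values, so including a key such as MC.chemical_id in the DISTINCT list is safer when one is available.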

Use the above query as a subquery, GROUP BY month, and select SUM(POC_emissions) and SUM(NON_POC_emissions).

Query with LEFT OUTER JOIN on subqueries with SUMs

I'm trying to perform an OUTER JOIN on related tables, but I want to join the SUMs, not the actual data. This query is flawed, so I am looking for help with its structure; I'm also curious whether there is a more elegant way to write this type of query.
SELECT firstname,lastname, thesum1, thesum2 FROM whovians
LEFT OUTER JOIN (
SELECT SUM(thevalue) AS thesum1 FROM friends WHERE doctornumref = 10 AND year = 1968
) AS derivedTable1
ON (whovians.doctornum = friends.doctornumref)
LEFT OUTER JOIN (
SELECT SUM(amount) AS thesum2 FROM enemies WHERE doctornumref = 10 AND year = 1968
) AS derivedTable2
ON (whovians.doctornum = enemies.doctornumref) WHERE year = 1968 AND doctornum = 10;

Should work like this:
SELECT w.firstname, w.lastname, derived1.thesum1, derived2.thesum2
FROM whovians w
LEFT JOIN (
SELECT doctornumref, SUM(thevalue) AS thesum1
FROM friends
WHERE doctornumref = 10
AND year = 1968
GROUP BY 1
) AS derived1 ON derived1.doctornumref = w.doctornum
LEFT JOIN (
SELECT doctornumref, SUM(amount) AS thesum2
FROM enemies
WHERE doctornumref = 10
AND year = 1968
GROUP BY 1
) AS derived2 ON derived2.doctornumref = w.doctornum
WHERE w.doctornum = 10
AND w.year = 1968;
In this particular case, since you restrict the outer query and the subqueries to the same year and doctornumref / doctornum, and each subquery can return at most one row, you can simplify with plain correlated subqueries:
SELECT firstname,lastname
, (SELECT SUM(thevalue)
FROM friends
WHERE doctornumref = w.doctornum
AND year = w.year) AS thesum1
, (SELECT SUM(amount)
FROM enemies
WHERE doctornumref = w.doctornum
AND year = w.year) AS thesum2
FROM whovians w
WHERE year = 1968
AND doctornum = 10;
If (year, doctornum) is not unique in table whovians, the first form will prevent repeated evaluation of the subqueries and perform better, though.
You can still simplify:
SELECT w.firstname, w.lastname, f.thesum1, e.thesum2
FROM whovians w
LEFT JOIN (
SELECT SUM(thevalue) AS thesum1
FROM friends
WHERE doctornumref = 10
AND year = 1968
) f ON true -- aggregate without GROUP BY: always exactly 1 row, guaranteed to match
LEFT JOIN (
SELECT SUM(amount) AS thesum2
FROM enemies
WHERE doctornumref = 10
AND year = 1968
) e ON true
WHERE w.doctornum = 10
AND w.year = 1968;
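A minimal sketch with Python's sqlite3 module and invented rows checks the claim behind ON true: an aggregate without GROUP BY returns exactly one row, holding NULL when nothing matches, so the LEFT JOIN always finds its partner. ON 1 = 1 is used as a portable spelling of ON true, the names 'Ann' and 'Smith' are hypothetical, and the enemies table deliberately has no 1968 rows.

```python
import sqlite3

SETUP = """
CREATE TABLE whovians (firstname TEXT, lastname TEXT, doctornum INTEGER, year INTEGER);
CREATE TABLE friends (doctornumref INTEGER, year INTEGER, thevalue INTEGER);
CREATE TABLE enemies (doctornumref INTEGER, year INTEGER, amount INTEGER);
INSERT INTO whovians VALUES ('Ann', 'Smith', 10, 1968);
INSERT INTO friends VALUES (10, 1968, 3), (10, 1968, 4), (10, 1969, 100);
INSERT INTO enemies VALUES (10, 1969, 7);  -- no 1968 rows at all
"""

# Each derived table is an aggregate without GROUP BY, so it yields
# exactly one row and ON 1 = 1 always matches.
JOINED = """
SELECT w.firstname, w.lastname, f.thesum1, e.thesum2
FROM whovians w
LEFT JOIN (SELECT SUM(thevalue) AS thesum1 FROM friends
           WHERE doctornumref = 10 AND year = 1968) f ON 1 = 1
LEFT JOIN (SELECT SUM(amount) AS thesum2 FROM enemies
           WHERE doctornumref = 10 AND year = 1968) e ON 1 = 1
WHERE w.doctornum = 10 AND w.year = 1968
"""

# Correlated form: each scalar subquery is evaluated per whovians row.
CORRELATED = """
SELECT firstname, lastname,
       (SELECT SUM(thevalue) FROM friends
        WHERE doctornumref = w.doctornum AND year = w.year) AS thesum1,
       (SELECT SUM(amount) FROM enemies
        WHERE doctornumref = w.doctornum AND year = w.year) AS thesum2
FROM whovians w
WHERE year = 1968 AND doctornum = 10
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

Both forms return 7 for thesum1 and NULL (None in Python) for thesum2, and a standalone SUM over no matching rows still yields one row.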

Query for logistic regression, multiple where exists

A data set for a logistic regression is composed of a uniquely identifying number followed by multiple binary variables (always 1 or 0) indicating whether or not a person meets certain criteria. Below is a query that computes several of these binary conditions. With only four criteria, the query takes longer to run than I would expect. Is there a more efficient approach? Note: tblicd is a large lookup table of 15k+ rows with text descriptions. The query makes no real sense; it is just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc

I would start by moving the subqueries into the FROM clause:
select p.patientid,
coalesce(q.moreThan1000, 0) as moreThan1000,
coalesce(q.moreThan1500, 0) as moreThan1500,
(case when d.patientid is not null then 1 else 0 end) as diabetes,
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries further by combining them (for instance, "p" and "pc" in the outer query can be combined into one). Even as is, without the correlated subqueries SQL Server should find the query easier to optimize.

Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If each condition query can return at most one row per patient, you can drop the DISTINCT:
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA
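The derived-table flag pattern from both answers can be sketched with Python's sqlite3 module. The schema is a hypothetical miniature (no admission/discharge date matching, no tblicd lookup, ICD text stored directly in patienticd), a threshold of 2 claims stands in for 1000, and COALESCE stands in for T-SQL's ISNULL.

```python
import sqlite3

# Hypothetical miniature schema: patients, one row per claim, and
# ICD descriptions per patient.
SETUP = """
CREATE TABLE patient (patientid INTEGER PRIMARY KEY);
CREATE TABLE claims (patientid INTEGER);
CREATE TABLE patienticd (patientid INTEGER, descrip TEXT);
INSERT INTO patient VALUES (1), (2), (3);
INSERT INTO claims VALUES (1), (1), (1), (2);
INSERT INTO patienticd VALUES (1, 'type 2 diabetes'), (1, 'diabetes, unspecified'),
                              (3, 'hypertension');
"""

# Each condition is computed once per patient in a derived table; the
# LEFT JOINs keep every patient and COALESCE turns "no match" into 0.
FLAGS = """
SELECT p.patientid,
       COALESCE(q.moreThan2, 0) AS moreThan2,
       COALESCE(d.diabetes, 0)  AS diabetes
FROM patient AS p
LEFT JOIN (SELECT patientid,
                  CASE WHEN COUNT(*) > 2 THEN 1 ELSE 0 END AS moreThan2
           FROM claims
           GROUP BY patientid) AS q ON q.patientid = p.patientid
LEFT JOIN (SELECT DISTINCT patientid, 1 AS diabetes
           FROM patienticd
           WHERE descrip LIKE '%diabetes%') AS d ON d.patientid = p.patientid
ORDER BY p.patientid
"""


def run():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(FLAGS).fetchall()
    con.close()
    return rows
```

The DISTINCT in the diabetes derived table matters: patient 1 has two matching ICD rows and would otherwise appear twice in the output.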
