Cannot perform an aggregate function on an expression containing an aggregate or a subquery sql server 2012 - sql

I am performing a query and its showing an error Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
the query is
SELECT
tbl_Product.ID,
tbl_Product.ArticleID,
tbl_Product.Title,
tbl_Product.Description,
tbl_Product.Price,
tbl_ProductType.Name,
tbl_Status.StatusName,
tbl_VisibilityStatus.VisibilityStatus,
MAX(
CASE
WHEN tbl_RelatedProduct.TypeOfRelation = 1 THEN 'Bundle'
WHEN tbl_Product.ID IN
(
select tbl_RelatedProduct.Product2ID
from tbl_RelatedProduct
where tbl_RelatedProduct.Product1ID=9 and tbl_RelatedProduct.TypeOfRelation=1) THEN 'Bundle'
END
) 'Bundle',
MAX(
CASE
WHEN tbl_RelatedProduct.TypeOfRelation = 2 THEN 'Follower' END) 'Follower',
MAX(
CASE
WHEN tbl_RelatedProduct.TypeOfRelation = 3 THEN 'Related' END) 'Related'
FROM
tbl_Product inner JOIN
tbl_ProductType ON tbl_Product.ProductTypeId = tbl_ProductType.ID Inner JOIN
tbl_Status ON tbl_Product.StatusID = tbl_Status.ID Inner JOIN
tbl_VisibilityStatus ON tbl_Product.VisibilityID = tbl_VisibilityStatus.ID
left JOIN tbl_RelatedProduct ON tbl_Product.ID = tbl_RelatedProduct.Product1ID
group by
tbl_Product.ID,
tbl_Product.ArticleID,
tbl_Product.Title,
tbl_Product.Description,
tbl_Product.Price,
tbl_ProductType.Name,
tbl_Status.StatusName,
tbl_VisibilityStatus.VisibilityStatus
order by tbl_Product.Title
ANyone know how to help on this...plsss

The below line is causing an issue:
WHEN tbl_Product.ID IN
(
select tbl_RelatedProduct.Product2ID
from tbl_RelatedProduct
where tbl_RelatedProduct.Product1ID=9 and tbl_RelatedProduct.TypeOfRelation=1) THEN 'Bundle'
END
) 'Bundle',
You are using MAX with CASE and one of the statements within CASE uses a subquery, which is causing the error. You might want to consider using joins instead of subqueries to implement this.

SELECT tbl_Product.ID,tbl_Product.ArticleID,tbl_Product.Title,tbl_Product.Description,tbl_Product.Price,tbl_ProductType.Name,tbl_Status.StatusName, tbl_VisibilityStatus.VisibilityStatus,
Case when ((select count(1) from tbl_RelatedProduct where tbl_RelatedProduct.TypeOfRelation = 1 and tbl_RelatedProduct.Product1ID = tbl_Product.Id) > 0) then 1
when ((select count(1) from tbl_RelatedProduct where tbl_RelatedProduct.TypeOfRelation = 1 and tbl_RelatedProduct.Product2ID = tbl_Product.Id) > 0) then 1
else 0 end as Bundle,
(select count(1) from tbl_RelatedProduct where tbl_RelatedProduct.TypeOfRelation = 2 and tbl_RelatedProduct.Product1ID = tbl_Product.Id) as Follower,
(select count(1) from tbl_RelatedProduct where tbl_RelatedProduct.TypeOfRelation = 3 and tbl_RelatedProduct.Product1ID = tbl_Product.Id) as Related
FROM tbl_Product
inner JOIN
tbl_ProductType ON tbl_Product.ProductTypeId = tbl_ProductType.ID Inner JOIN
tbl_Status ON tbl_Product.StatusID = tbl_Status.ID
Inner JOIN
tbl_VisibilityStatus ON tbl_Product.VisibilityID = tbl_VisibilityStatus.ID
Its resolved with this query :)

I think I see what you are trying to do, and it is a bit strange. I would output this all as a single column, and lose the MAX aggregate as follows:
SELECT
p.ID,
p.ArticleID,
p.Title,
p.Description,
p.Price,
pt.Name,
s.StatusName,
v.VisibilityStatus,
CASE
WHEN rp.TypeOfRelation = 1 OR (rp2.Product1ID = 9 AND rp2.TypeOfRelation = 1)
THEN 'Bundle'
WHEN rp.TypeOfRelation = 2
THEN 'Follower'
WHEN rp.TypeOfRelation = 3
THEN 'Related'
ELSE Null
END AS Relation
FROM
tbl_Product p
INNER JOIN tbl_ProductType pt
ON p.ProductTypeId = pt.ID
INNER JOIN tbl_Status s
ON p.StatusID = s.ID
INNER JOIN tbl_VisibilityStatus v
ON p.VisibilityID = v.ID
LEFT JOIN tbl_RelatedProduct rp
ON p.ID = rp.Product1ID
LEFT JOIN tbl_RelatedProduct rp2
ON p.ID = rp2.Product2ID
ORDER BY p.Title
If you would still like to have them in separate columns, just break up the CASE statement as follows:
SELECT
p.ID,
p.ArticleID,
p.Title,
p.Description,
p.Price,
pt.Name,
s.StatusName,
v.VisibilityStatus,
CASE
WHEN rp.TypeOfRelation = 1 OR (rp2.Product1ID = 9 AND rp2.TypeOfRelation = 1)
THEN 'Bundle'
ELSE Null
END AS Bundle,
CASE
WHEN rp.TypeOfRelation = 2
THEN 'Follower'
ELSE Null
END AS Follower,
CASE
WHEN rp.TypeOfRelation = 3
THEN 'Related'
ELSE Null
END AS Related
FROM
tbl_Product p
INNER JOIN tbl_ProductType pt
ON p.ProductTypeId = pt.ID
INNER JOIN tbl_Status s
ON p.StatusID = s.ID
INNER JOIN tbl_VisibilityStatus v
ON p.VisibilityID = v.ID
LEFT JOIN tbl_RelatedProduct rp
ON p.ID = rp.Product1ID
LEFT JOIN tbl_RelatedProduct rp2
ON p.ID = rp2.Product2ID
ORDER BY p.Title

Related

Joining two queries to one table and substracting values

i need to put those two output values (Add_sum and Minus_sum) to one table and subtract them (Add_sum - Minus_sum) and show this value.
I tried many other options, subqueries etc but could not get it to work.
Query 1:
SELECT I.ItemCode, COUNT(H.TransactionTypeID) AS ADD_Sum
FROM inMoveHd AS H INNER JOIN
inMoveLn AS L ON L.InvMoveID = H.InvMoveID INNER JOIN
inItem AS I ON I.ItemID = L.ItemID INNER JOIN
inTransactionType AS T ON H.TransactionTypeID = T.TransactionTypeID
WHERE (T.TransactionSign = 1)
GROUP BY I.ItemCode
Query 2:
SELECT I.ItemCode, COUNT(H.TransactionTypeID) AS Minus_Sum
FROM inMoveHd AS H INNER JOIN
inMoveLn AS L ON L.InvMoveID = H.InvMoveID INNER JOIN
inItem AS I ON I.ItemID = L.ItemID INNER JOIN
inTransactionType AS T ON H.TransactionTypeID = T.TransactionTypeID
WHERE (T.TransactionSign = -1)
GROUP BY I.ItemCode
Use case expressions to do conditional aggregation:
SELECT I.ItemCode,
COUNT(case when T.TransactionSign = 1 then H.TransactionTypeID end) AS ADD_Sum,
COUNT(case when T.TransactionSign = -1 then H.TransactionTypeID end) AS Minus_Sum
FROM inMoveHd AS H INNER JOIN
inMoveLn AS L ON L.InvMoveID = H.InvMoveID INNER JOIN
inItem AS I ON I.ItemID = L.ItemID INNER JOIN
inTransactionType AS T ON H.TransactionTypeID = T.TransactionTypeID
WHERE (T.TransactionSign = -1 or T.TransactionSign = 1)
GROUP BY I.ItemCode
I think that you are looking for conditional aggregation:
SELECT
I.ItemCode,
SUM(CASE WHEN T.TransactionSign = 1 THEN 1 ELSE 0 END) AS Add_Sum,
SUM(CASE WHEN T.TransactionSign = -1 THEN 1 ELSE 0 END) AS Minus_Sum,
SUM(T.TransactionSign) difference
FROM
inMoveHd AS H INNER JOIN
inMoveLn AS L ON L.InvMoveID = H.InvMoveID INNER JOIN
inItem AS I ON I.ItemID = L.ItemID INNER JOIN
inTransactionType AS T ON H.TransactionTypeID = T.TransactionTypeID
WHERE T.TransactionSign IN (-1, 1)
GROUP BY I.ItemCode

SQL - SUM TotalValue returning a null with LEFT JOIN Clause

I'd to like SUM a TotalValue column based on Vendor and logged in user. I completely returning a right value of other info in logged user and the connected vendor, the problem is the SUM of TotalValue column is returning a null value. am I missing something?
This is what I've already tried:
SELECT ,v.VendorName ,
u.Product ,
v.[Description] ,
v.Status ,
SUM(cpm.TotalValue) AS TotalValue
FROM Vendor v
LEFT JOIN [ProductContract] c ON v.VendorId = c.VendorId
AND c.[Status] = 4
AND c.ProductContractId IN
(SELECT con.ProductContractId
FROM [ProductContract] con
INNER JOIN [ProductContractPermission] cp ON cp.ProductContractId = con.ProductContractId
WHERE cp.UserInfoId = #UserInfoId)
LEFT JOIN ProductContractPaymentMenu cpm ON c.ProductContractId = cpm.ProductContractId
AND c.[Status] = 4
AND c.VendorId = #VendorId
LEFT JOIN VendorContact vc ON v.VendorId = vc.VendorId
AND vc.[Type] = 1
LEFT JOIN UserInfo u ON vc.UserInfoId = u.UserInfoId
WHERE v.VendorId IN
(SELECT VendorId
FROM ClientVendor
WHERE ClientId = #VendorId)
GROUP BY v.VendorName,
u.Product,
v.[Description],
v.Status,
cpm.TotalValue
ORDER BY v.[Status],
v.CreatedOn
Seems that you want to apply an aggregate filter, this is the famous HAVING clause:
...
GROUP BY v.VendorName,
u.Product,
v.[Description],
v.Status,
cpm.TotalValue
HAVING
SUM(cpm.TotalValue) > 0
ORDER BY v.[Status],
v.CreatedOn
Hi i think you have to add ISNULL(value,defaultvalue) because cpm table was on left join and it can be null.
SELECT v.VendorName ,
u.Product ,
v.[Description] ,
v.Status ,
SUM(ISNULL(cpm.TotalValue,0)) AS TotalValue
FROM Vendor v
LEFT JOIN [ProductContract] c ON v.VendorId = c.VendorId
AND c.[Status] = 4
AND c.ProductContractId IN
(SELECT con.ProductContractId
FROM [ProductContract] con
INNER JOIN [ProductContractPermission] cp ON cp.ProductContractId = con.ProductContractId
WHERE cp.UserInfoId = #UserInfoId)
LEFT JOIN ProductContractPaymentMenu cpm ON c.ProductContractId = cpm.ProductContractId
AND c.[Status] = 4
AND c.VendorId = #VendorId
LEFT JOIN VendorContact vc ON v.VendorId = vc.VendorId
AND vc.[Type] = 1
LEFT JOIN UserInfo u ON vc.UserInfoId = u.UserInfoId
WHERE v.VendorId IN
(SELECT VendorId
FROM ClientVendor
WHERE ClientId = #VendorId)
GROUP BY v.VendorName,
u.Product,
v.[Description],
v.Status,
cpm.TotalValue
ORDER BY v.[Status],
v.CreatedOn
https://learn.microsoft.com/en-us/sql/t-sql/functions/isnull-transact-sql?view=sql-server-2017
Edit: other point you have miss ',' in your query after the SELECT.
Your left join on ProductContractPaymentMenu may not always get an item, so cpm.TotalValue can be NULL sometimes. When you use SUM and when one value is NULL then the result will be NULL. You might rewrite that part as:
SUM(ISNULL(cpm.TotalValue, 0)) AS TotalValue
In that case, it will treat non-existing records as value 0.

SQL query optimization - make only one join on table

I have a large SQL query, where I need to select some data.
SELECT p.Id, p.UserId, u.Name AS CreatedBy, p.JournalId, p.Title, pt.Name AS PublicationType, p.CreatedDate, p.MagazineTitle, /*ps.StatusId,*/ p.Authors, pb.Name AS Publisher, p.Draft,jns.Name AS JournalTitle,
ISNULL(
ISNULL(
(SELECT StatusId FROM [PublicationsStatus] WHERE StatusDate=
(SELECT MAX(StatusDate) FROM [PublicationsStatus] AS ps WHERE ps.PublicationId = p.Id )),--AND ps.UserId = #UserId ORDER BY StatusDate DESC),
(SELECT TOP(1) ActionId + 6 FROM [PublicationsQuoteSaleLines] AS pqsl WHERE pqsl.PublicationId = p.Id ORDER BY pqsl.Id)
),
1
)AS StatusId
,ISNULL(
(SELECT MAX(StatusDate) FROM [PublicationsStatus] AS ps WHERE ps.PublicationId = p.Id ),--AND ps.UserId = #UserId),
p.CreatedDate
) AS StatusDate
,ISNULL(
(cast((SELECT MAX(StatusDate) FROM [PublicationsStatus] AS ps WHERE ps.PublicationId = p.Id) as date) ),--AND ps.UserId = #UserId),
p.CreatedDate
) AS StDate
,CASE
WHEN ISNULL(
ISNULL(
(SELECT StatusId FROM [PublicationsStatus] WHERE StatusDate=
(SELECT MAX(StatusDate) FROM [PublicationsStatus] AS ps WHERE ps.PublicationId = p.Id )),--AND ps.UserId = #UserId ORDER BY StatusDate DESC),
(SELECT TOP(1) ActionId + 6 FROM [PublicationsQuoteSaleLines] AS pqsl WHERE pqsl.PublicationId = p.Id ORDER BY pqsl.Id)
),
1 ) IN (1, 7, 8) THEN 0
ELSE 1 END AS OrderCriteria
,(SELECT COUNT(*) FROM SentEmails AS se WHERE se.PublicationId = p.Id AND se.EmailType = 1 AND se.UserId = #UserId) AS NumberOfAlerts
,(SELECT COUNT(*) FROM SentEmails AS se WHERE se.PublicationId = p.Id AND se.EmailType = 3 AND se.UserId = #UserId) AS NumberOfReminders
FROM Publications AS p
LEFT JOIN PublicationTypes AS pt ON p.PublicationTypeId = pt.Id
LEFT JOIN Publishers AS pb ON p.PublisherId = pb.Id
LEFT JOIN Journals As jns ON p.JournalId = jns.Id
LEFT JOIN Users AS u ON u.Id = p.UserId
The problem is that the query is slow. AS you can see I have the same thing at OrderCriteria and the StatusId. The StatusDate I'm getting from the same table.
I thought that I could resolve the performance to make only one \
LEFT JOIN
something like this:
LEFT JOIN (
SELECT
PublicationId,
StatusId AS StatusId,
StatusDate AS StatusDate
FROM [PublicationsStatus] WHERE StatusDate=
(
SELECT MAX(StatusDate) FROM PublicationsStatus
)
) AS ps ON ps.PublicationId = p.Id
but I did not get the same results this way.
Can you please advise?
I tried to simplify your query using a few CTE to avoid doing the same subquery multiple times. You can try this out and see if it's still slow.
;WITH MaxStatusDateByPublication AS
(
SELECT
PublicationId = ps.PublicationId,
MaxStatusDate = MAX(ps.StatusDate)
FROM
[PublicationsStatus] AS ps
GROUP BY
PS.PublicationId
),
StatusForMaxDateByPublication AS
(
SELECT
StatusId = PS.StatusId,
M.PublicationId,
M.MaxStatusDate
FROM
MaxStatusDateByPublication AS M
INNER JOIN [PublicationsStatus] AS PS ON
M.PublicationId = PS.PublicationId AND
M.MaxStatusDate = PS.StatusDate
),
SentEmailsByPublicationAndType AS
(
SELECT
S.PublicationID,
S.EmailType,
AmountSentEmails = COUNT(1)
FROM
SentEmails AS S
WHERE
S.EmailType IN (1, 3) AND
S.UserID = #UserId
GROUP BY
S.PublicationID,
S.EmailType
)
SELECT
p.Id,
p.UserId,
u.Name AS CreatedBy,
p.JournalId,
p.Title,
pt.Name AS PublicationType,
p.CreatedDate,
p.MagazineTitle,
p.Authors,
pb.Name AS Publisher,
p.Draft,
jns.Name AS JournalTitle,
COALESCE(MS.StatusId, SL.StatusId, 1) AS StatusId,
ISNULL(MS.MaxStatusDate, P.CreatedDate) AS StatusDate,
ISNULL(CONVERT(DATE, MS.MaxStatusDate), P.CreatedDate) AS StDate,
CASE
WHEN COALESCE(MS.StatusId, SL.StatusId, 1) IN (1, 7, 8) THEN 0
ELSE 1
END AS OrderCriteria,
ISNULL(TY1.AmountSentEmails, 0) AS NumberOfAlerts,
ISNULL(TY3.AmountSentEmails, 0) AS NumberOfReminders
FROM
Publications AS p
LEFT JOIN PublicationTypes AS pt ON p.PublicationTypeId = pt.Id
LEFT JOIN Publishers AS pb ON p.PublisherId = pb.Id
LEFT JOIN Journals As jns ON p.JournalId = jns.Id
LEFT JOIN Users AS u ON u.Id = p.UserId
LEFT JOIN StatusForMaxDateByPublication AS MS ON P.Id = MS.PublicationId
LEFT JOIN SentEmailsByPublicationAndType AS TY1 ON
P.Id = TE.PublicationID AND
TY1.EmailType = 1
LEFT JOIN SentEmailsByPublicationAndType AS TY3 ON
P.Id = TE.PublicationID AND
TY1.EmailType = 3
OUTER APPLY (
SELECT TOP 1
StatusId = ActionId + 6
FROM
[PublicationsQuoteSaleLines] AS pqsl
WHERE
pqsl.PublicationId = P.Id
ORDER BY
pqsl.Id ASC) AS SL
Try to avoid writing the same expression several times (and specially if it involes subqueries inside a column!). Using a few CTEs and proper identing will help readability.
This is a complex query and involves several tables. If your query runs slow it might be for many different reasons. Try executing each subquery on it's own to check if they are slow or not, then try joining them 1 by 1. Indexes by the join columns will probably increase your performance if they don't exist already. Posting the full query execution plan might help.

SQl Error : Each GROUP BY expression must contain at least one column that is not an outer reference [duplicate]

This question already has answers here:
Each GROUP BY expression must contain at least one column that is not an outer reference
(8 answers)
Closed 6 years ago.
I get this error
Each GROUP BY expression must contain at least one column that is not an outer reference
while running this query:
SELECT TOP 1
SUM(mla.total_current_attribute_value)
FROM
partstrack_machine_location_attributes mla (NOLOCK)
INNER JOIN
#tmpInstallParts_Temp installpartdetails ON mla.machine_sequence_id = installpartdetails.InstallKitToMachineSequenceId
AND (CASE WHEN mla.machine_side_id IS NULL THEN 1
WHEN mla.machine_side_id = installpartdetails.InstallKitToMachineSideId THEN 1 END
) = 1
INNER JOIN
partstrack_mes_attribute_mapping mam (NOLOCK) ON mla.mes_attribute = mam.mes_attribute_name
INNER JOIN
partstrack_attribute_type at (NOLOCK) ON mam.pt_attribute_id = at.pt_attribute_id
INNER JOIN
partstrack_ipp_mes_attributes ima(NOLOCK) ON at.pt_attribute_id = ima.pt_attribute_id
WHERE
mla.active_ind = 'Y' AND
ima.ipp_ID IN (SELECT ipp.ipp_id
FROM partstrack_individual_physical_part ipp
INNER JOIN #tmpInstallParts_Temp tmp ON (ipp.ipp_id = tmp.InstallingPartIPPId OR
(CASE WHEN tmp.InstallingPartIPKId = '-1' THEN 1 END) = 1
)
GROUP BY
ima.ipp_id
Can someone help me?
This is the text of the query from the first revision of the question.
In later revisions you removed the last closing bracket ) and the query became syntactically incorrect. You'd better check and fix the text of the question and format the text of the query, so it is readable.
SELECT TOP 1
SUM(mla.total_current_attribute_value)
FROM
partstrack_machine_location_attributes mla (NOLOCK)
INNER JOIN #tmpInstallParts_Temp installpartdetails
ON mla.machine_sequence_id = installpartdetails.InstallKitToMachineSequenceId
AND (CASE WHEN mla.machine_side_id IS NULL THEN 1
WHEN mla.machine_side_id = installpartdetails.InstallKitToMachineSideId THEN 1 END) = 1
INNER JOIN partstrack_mes_attribute_mapping mam (NOLOCK) ON mla.mes_attribute = mam.mes_attribute_name
INNER JOIN partstrack_attribute_type at (NOLOCK) ON mam.pt_attribute_id = at.pt_attribute_id
INNER JOIN partstrack_ipp_mes_attributes ima(NOLOCK) ON at.pt_attribute_id = ima.pt_attribute_id
WHERE
mla.active_ind = 'Y'
AND ima.ipp_ID IN
(
Select
ipp.ipp_id
FROM
partstrack_individual_physical_part ipp
INNER JOIN #tmpInstallParts_Temp tmp
ON (ipp.ipp_id = tmp.InstallingPartIPPId
OR (CASE WHEN tmp.InstallingPartIPKId = '-1' THEN 1 END) = 1)
GROUP BY
ima.ipp_id
)
With this formatting it is clear now that there is a subquery with GROUP BY.
Most likely it is just a typo: you meant to write GROUP BY ipp.ipp_id instead of GROUP BY ima.ipp_id.
If you really wanted to have the GROUP BY not in a subquery, but in the main SELECT, then you misplaced the closing bracket ) and the query should look like this:
SELECT TOP 1
SUM(mla.total_current_attribute_value)
FROM
partstrack_machine_location_attributes mla (NOLOCK)
INNER JOIN #tmpInstallParts_Temp installpartdetails
ON mla.machine_sequence_id = installpartdetails.InstallKitToMachineSequenceId
AND (CASE WHEN mla.machine_side_id IS NULL THEN 1
WHEN mla.machine_side_id = installpartdetails.InstallKitToMachineSideId THEN 1 END) = 1
INNER JOIN partstrack_mes_attribute_mapping mam (NOLOCK) ON mla.mes_attribute = mam.mes_attribute_name
INNER JOIN partstrack_attribute_type at (NOLOCK) ON mam.pt_attribute_id = at.pt_attribute_id
INNER JOIN partstrack_ipp_mes_attributes ima(NOLOCK) ON at.pt_attribute_id = ima.pt_attribute_id
WHERE
mla.active_ind = 'Y'
AND ima.ipp_ID IN
(
Select
ipp.ipp_id
FROM
partstrack_individual_physical_part ipp
INNER JOIN #tmpInstallParts_Temp tmp
ON (ipp.ipp_id = tmp.InstallingPartIPPId
OR (CASE WHEN tmp.InstallingPartIPKId = '-1' THEN 1 END) = 1)
)
GROUP BY
ima.ipp_id
In any case, proper formatting of the source code can really help.
Group By ima.ipp_id
should be applicable to outer query. Because of incorrect placement of '(' it was applying to inner query.
Now after correcting the query,it's working fine without any issues.
Final Query is :
SELECT TOP 1
SUM(mla.total_current_attribute_value)
FROM
partstrack_machine_location_attributes mla (NOLOCK)
INNER JOIN #tmpInstallParts_Temp installpartdetails
ON mla.machine_sequence_id = installpartdetails.InstallKitToMachineSequenceId
AND (CASE WHEN mla.machine_side_id IS NULL THEN 1
WHEN mla.machine_side_id = installpartdetails.InstallKitToMachineSideId THEN 1 END ) = 1
INNER JOIN partstrack_mes_attribute_mapping mam (NOLOCK) ON mla.mes_attribute = mam.mes_attribute_name
INNER JOIN partstrack_attribute_type at (NOLOCK) ON mam.pt_attribute_id = at.pt_attribute_id
INNER JOIN partstrack_ipp_mes_attributes ima(NOLOCK) ON at.pt_attribute_id = ima.pt_attribute_id
WHERE
mla.active_ind = 'Y'
AND ima.ipp_ID IN
(
Select
ipp.ipp_id
FROM
partstrack_individual_physical_part ipp
INNER JOIN #tmpInstallParts_Temp tmp
ON (ipp.ipp_id = tmp.InstallingPartIPPId
OR (CASE WHEN tmp.InstallingPartIPKId = '-1' THEN 1 END ) =1)
)
GROUP BY ima.ipp_id
Thank you all.

Query for logistic regression, multiple where exists

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc
I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.
Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA