Incorrect Count with Multiple Joins - sql

I'm getting an incorrect count when I use multiple 'Joins'. It should only show 3 as the total but it's returning 134 for the total. What's the proper way to use COUNT with multiple 'Joins'?
SELECT r.Field1
, Total = COUNT(r.Field1)
FROM Location1.dbo.Table1 r ( NOLOCK )
JOIN Location2.dbo.Table2 i ( NOLOCK ) ON r.Field1 = i.Field1
JOIN Location3.dbo.Table3 rt ( NOLOCK ) ON rt.Field1 = i.Field1
AND rt.Field2 = r.Field2
WHERE r.Field3 = '40'
AND r.Field4 = 'H'
AND r.Field1 = '516'
AND CONVERT(CHAR(10), r.TIMESTAMP, 101) = CONVERT(CHAR(10), GETDATE(), 101)
GROUP BY r.Field1

That's how joins work. COUNT counts the rows produced by the joins, not the rows in the original table. So even if the original table has only one row that matches your criteria, the COUNT from a JOIN can be in the hundreds because of one-to-many relationships. You can see why by changing your query:
SELECT *
FROM Location1.dbo.Table1 r ( NOLOCK )
JOIN Location2.dbo.Table2 i ( NOLOCK ) ON r.Field1 = i.Field1
JOIN Location3.dbo.Table3 rt ( NOLOCK ) ON rt.Field1 = i.Field1
AND rt.Field2 = r.Field2
WHERE r.Field3 = '40'
AND r.Field4 = 'H'
AND r.Field1 = '516'
AND CONVERT(CHAR(10), r.TIMESTAMP, 101) = CONVERT(CHAR(10), GETDATE(), 101)
This returns every column of every joined row, and you'll see the 134 rows. If you aren't interested in the joined total, then don't do the join, since you say that the query without the joins gives you the expected result of 3.
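The fan-out described above can be reproduced on a tiny scale. This is only a sketch: it uses Python's built-in sqlite3 module so that it runs anywhere, the tables t1 and t2 and their columns are hypothetical stand-ins for Table1 and Table2, and SQLite syntax is not identical to SQL Server's. If the joins are needed for filtering, counting a unique key with COUNT(DISTINCT ...) or testing with EXISTS are two common ways to keep the count at the level of the original table, assuming that table has such a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for Table1 (r) and Table2 (i).
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, field1 TEXT);
    INSERT INTO t1 (field1) VALUES ('516'), ('516'), ('516');
    INSERT INTO t2 (field1) VALUES ('516'), ('516'), ('516'), ('516');
""")

# COUNT(*) over the join counts joined rows: 3 rows in t1 x 4 matches each.
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM t1 r
    JOIN t2 i ON r.field1 = i.field1
""").fetchone()[0]

# COUNT(DISTINCT key) counts each t1 row once, however often it matched.
distinct_count = conn.execute("""
    SELECT COUNT(DISTINCT r.id)
    FROM t1 r
    JOIN t2 i ON r.field1 = i.field1
""").fetchone()[0]

# EXISTS only tests for a match, so it never multiplies rows.
exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM t1 r
    WHERE EXISTS (SELECT 1 FROM t2 i WHERE i.field1 = r.field1)
""").fetchone()[0]

print(joined_count, distinct_count, exists_count)
```

The first number is the inflated join count; the other two stay at the number of rows in t1.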

Related

How to write Multiple Sum amount query in select statement using SQL for fast performance

I have written the query below, which works fine for my report:
declare @orgId int=3,
@Year int=2022
;with cte as (
select t.requestNo,year(timeStamp) dispensedInYear from txnhistory(NOLOCK) t where t.categoryNo=411 and t.statusNO=76 and t.companyId=@orgId and
(@Year=0 or (year(t.Timestamp)=@Year ))
)
select distinct p.personid as personId,p.firstName+' '+p.lastname Fitter,
(SELECT ISNULL( ROUND(SUM(amountPaid),3),0) PaidAmount FROM commissionPaid where commissionType='fitter' and personid=cp.personid and statusNo=452 and (@Year=0 or (year(dispensedSignDate)=@Year ))) as total,
(select ISNULL( ROUND(SUM(plt.allowedPrice),3),0) from commissionPaid cp1
inner join ProfitOrLossTable plt on cp1.doNumber=plt.doid where
cp1.personId=cp.personId)BillableAmount
from cte as r
join productrequestall pr on r.requestNo=pr.requestId --and cp.paymentType='Sales Commission'
join commissionPaid cp on cp.doNumber=pr.doId and cp.commissionType='fitter' and cp.statusNo=452
join commissionratepaid crp on crp.doNumber=cp.doNumber and crp.commissionType=cp.commissionType
join employeecheck ec on ec.employeeCheckId=cp.checkNumber
join person p on p.personId=cp.personId
join fitterCommissionRates fr(NOLOCK) on fr.fitterId=cp.personId and fr.organizationId=@orgId
But is the line below the right way to write this? I have three other SUM columns written the same way:
(SELECT ISNULL( ROUND(SUM(amountPaid),3),0) PaidAmount FROM commissionPaid where commissionType='fitter' and personid=cp.personid and statusNo=452 and (@Year=0 or (year(dispensedSignDate)=@Year ))) as total
Or is there another way to write it, using a left join or a subquery, that makes the report faster and avoids query timeouts?
I modified your query, and now it works:
declare
@orgId int = 3,
@Year int = 2021;
select
p.personid,
max( p.firstName +' '+ p.lastname ) Fitter,
coalesce( sum( cp.PaidAmount ), 0 ) PaidAmount
from
txnhistory r
join productrequestall pr
on r.requestNo = pr.requestId
-- and cp.paymentType='Sales Commission'
join
(
SELECT
personid,
doNumber,
ROUND( SUM( amountPaid ), 3) PaidAmount
FROM
commissionPaid
where
commissionType = 'fitter'
and statusNo = 452
and @Year in ( 0, year(dispensedSignDate))
group by
personId,
doNumber, commissionType ) cp
on pr.doId = cp.doNumber
join commissionratepaid crp
on cp.doNumber = crp.doNumber
and crp.commissionType='fitter'
join person p
on cp.personId = p.personId
join fitterCommissionRates fr
on cp.personId = fr.fitterId
and fr.organizationId = @orgId
where
r.categoryNo = 411
and r.statusNO = 76
and r.companyId = @orgId
and @Year in ( 0, year(r.Timestamp))
group by
p.personid
It APPEARS you are trying to get total commissions (and/or other amounts you want aggregated) for each person (fitter). You want to know whether there is a way to optimize this rather than writing three separate queries for the three columns you want aggregated, all of which appear to come from the same commission paid table (which makes sense). Repeating a conditional subquery per column can be very resource intensive, especially with larger data.
What you should probably do is pre-aggregate the commissions by person AND item, applying the qualifying criteria. This is done once for the whole query; THEN join that result to the preceding ProductRequestAll rows.
The outer query can then apply SUM() to those pre-aggregated amounts, because the outer total is per person, not per person and each "DoID" entry in their underlying commission records.
For the outermost query, since I am grouping by person but you want to see the name, I am applying MAX() to the person's name. The name never changes for a given ID (ID = 1 for "Steve" will always be "Steve"), so applying MAX() eliminates the need to add the name to the GROUP BY clause.
Something like the query below. Obviously, I don't know the other columns you want aggregated, but I think you can get the concept.
declare
@orgId int = 3,
@Year int = 2022;
select
p.personid,
max( p.firstName +' '+ p.lastname ) Fitter,
coalesce( sum( cp.PaidAmount ), 0 ) PaidAmount,
coalesce( sum( cp.SecondAmount ), 0 ) SecondAmount,
coalesce( sum( cp.ThirdAmount ), 0 ) ThirdAmount
from
txnhistory r
join productrequestall pr
on r.requestNo = pr.requestId
-- and cp.paymentType='Sales Commission'
join
(
SELECT
personid,
doNumber,
ROUND( SUM( amountPaid ), 3) PaidAmount,
-- OtherAmount2 and ThirdAmountColumn are placeholders for your other columns
ROUND( SUM( OtherAmount2 ), 3) SecondAmount,
ROUND( SUM( ThirdAmountColumn ), 3) ThirdAmount
FROM
commissionPaid
where
commissionType = 'fitter'
and statusNo = 452
and @Year in ( 0, year(dispensedSignDate))
group by
personId,
doNumber ) cp
on pr.doId = cp.doNumber
join commissionratepaid crp
on cp.doNumber = crp.doNumber
-- cp exposes only personid and doNumber, so compare against the literal
and crp.commissionType = 'fitter'
-- employeecheck is not joined here: the pre-aggregate does not carry checkNumber
join person p
on cp.personId = p.personId
join fitterCommissionRates fr
on cp.personId = fr.fitterId
and fr.organizationId = @orgId
where
r.categoryNo = 411
and r.statusNO = 76
and r.companyId = @orgId
and @Year in ( 0, year(r.Timestamp))
group by
p.personid
Now, you also had an ISNULL() consideration, for example when someone does not have any commission. If that is the case, do you still need to see that person even if they never earned a commission? If so, then a left join MIGHT be applicable.
This query also does away with the "WITH CTE" construct.
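The pre-aggregate-then-join idea can be tried on a small scale. The following is a sketch under loud assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, and the tables person, request and commission_paid with their columns are simplified, hypothetical versions of the real ones. The derived table sums commissions once per person and order, and the outer query sums those subtotals per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the real tables.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE request (do_id INTEGER PRIMARY KEY);
    CREATE TABLE commission_paid (
        person_id INTEGER, do_number INTEGER, amount_paid REAL);
    INSERT INTO person VALUES (1, 'Steve'), (2, 'Ann');
    INSERT INTO request VALUES (10), (11), (12);
    INSERT INTO commission_paid VALUES
        (1, 10, 5.0), (1, 10, 2.5), (1, 11, 4.0), (2, 12, 3.0);
""")

totals = conn.execute("""
    SELECT p.person_id,
           MAX(p.name) AS fitter,
           COALESCE(SUM(cp.paid_amount), 0) AS paid_amount
    FROM request pr
    -- Pre-aggregate once per (person, order) instead of once per output row.
    JOIN (SELECT person_id, do_number, SUM(amount_paid) AS paid_amount
          FROM commission_paid
          GROUP BY person_id, do_number) cp
      ON cp.do_number = pr.do_id
    JOIN person p ON p.person_id = cp.person_id
    GROUP BY p.person_id
    ORDER BY p.person_id
""").fetchall()

print(totals)
```

Further amount columns would be added as extra SUM() expressions inside the derived table and summed again in the outer SELECT, so commission_paid is still read only once.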

Combine two queries to get the data in two columns

SELECT
tblEmployeeMaster.TeamName, SUM(tblData.Quantity) AS 'TotalQuantity'
FROM
tblData
INNER JOIN
tblEmployeeMaster ON tblData.EntryByHQCode = tblEmployeeMaster.E_HQCode
INNER JOIN
tblPhotos ON tblEmployeeMaster.TeamNo = tblPhotos.TeamNo
WHERE
IsPSR = 'Y'
GROUP BY
tblPhotos.TeamSort, tblPhotos.TeamNo, tblPhotos.Data,
tblEmployeeMaster.TeamName
ORDER BY
tblPhotos.TeamSort DESC, TotalQuantity DESC
This returns the total quantity per team. I also have this statement:
select TeamName, count(TeamName) AS 'Head Count'
from dbo.tblEmployeeMaster
where IsPSR = 'Y'
group by teamname
which returns the head count per team.
I would like to combine these two queries into one that returns both columns for each team.
I tried UNION / UNION ALL but with no success :(
Any help would be much appreciated.
You can simply use the sub-query as follows:
SELECT tblEmployeeMaster.TeamName, SUM(tblData.Quantity) AS 'TotalQuantity',
MAX(HEAD_COUNT) AS HEAD_COUNT, -- USE THIS VALUE FROM SUB-QUERY
CASE WHEN MAX(HEAD_COUNT) <> 0
THEN SUM(tblData.Quantity)/MAX(HEAD_COUNT)
END AS PER_MAN_CONTRIBUTION -- column asked in comment
FROM tblData INNER JOIN
tblEmployeeMaster ON tblData.EntryByHQCode = tblEmployeeMaster.E_HQCode INNER JOIN
tblPhotos ON tblEmployeeMaster.TeamNo = tblPhotos.TeamNo
-- FOLLOWING SUB-QUERY CAN BE USED
LEFT JOIN (select TeamName, count(TeamName) AS HEAD_COUNT
from dbo.tblEmployeeMaster
where IsPSR = 'Y' group by teamname) AS HC
ON HC.TeamName = tblEmployeeMaster.TeamName
where IsPSR = 'Y'
GROUP BY tblPhotos.TeamSort, tblPhotos.TeamNo, tblPhotos.Data,tblEmployeeMaster.TeamName
order by tblPhotos.TeamSort desc, TotalQuantity desc
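A scaled-down version of this join can be run to check the idea. This is a sketch only: it uses SQLite via Python's sqlite3 module, and the tables employee and data with their columns are hypothetical, trimmed versions of tblEmployeeMaster and tblData. The head count is computed once per team in a derived table and then joined to the quantity query; multiplying by 1.0 avoids integer division, which both SQLite and SQL Server apply when dividing two integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed versions of tblEmployeeMaster and tblData.
    CREATE TABLE employee (hq_code TEXT PRIMARY KEY, team_name TEXT);
    CREATE TABLE data (entry_by_hq_code TEXT, quantity INTEGER);
    INSERT INTO employee VALUES ('A1', 'Red'), ('A2', 'Red'), ('B1', 'Blue');
    INSERT INTO data VALUES ('A1', 10), ('A1', 20), ('A2', 30), ('B1', 5);
""")

rows = conn.execute("""
    SELECT e.team_name,
           SUM(d.quantity) AS total_quantity,
           MAX(hc.head_count) AS head_count,
           1.0 * SUM(d.quantity) / MAX(hc.head_count) AS per_head
    FROM data d
    JOIN employee e ON e.hq_code = d.entry_by_hq_code
    -- One row per team, so this join cannot multiply the data rows.
    LEFT JOIN (SELECT team_name, COUNT(*) AS head_count
               FROM employee
               GROUP BY team_name) hc
           ON hc.team_name = e.team_name
    GROUP BY e.team_name
    ORDER BY e.team_name
""").fetchall()

print(rows)
```

Because the derived table has exactly one row per team, MAX(head_count) simply picks that single value for each group.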

Avoid SQL Pivot returning duplicate rows

I have the following SQL script, which returns duplicate values in the PIVOT. How do I combine those duplicate records into one row?
Please check the image below for the result set.
SELECT *
FROM (SELECT X.stockcode,
X.description,
X.pack,
X.location,
X.lname,
X.qty,
Y.stockcode AS StockCode2,
y.periodname,
Y.months,
Y.saleqty
FROM (SELECT dbo.stock_items.stockcode,
dbo.stock_items.description,
dbo.stock_items.pack,
dbo.stock_loc_info.location,
dbo.stock_locations.lname,
dbo.stock_loc_info.qty
FROM dbo.stock_locations
INNER JOIN dbo.stock_loc_info
ON dbo.stock_locations.locno = dbo.stock_loc_info.location
LEFT OUTER JOIN dbo.stock_items
ON dbo.stock_loc_info.stockcode = dbo.stock_items.stockcode
WHERE ( dbo.stock_items.status = 's' )) AS X
LEFT OUTER JOIN (SELECT dbo.dr_invlines.stockcode,
( 12 + Datepart(month, Getdate()) - Datepart(month, dbo.dr_trans.transdate) ) % 12 + 1 AS Months,
Sum(dbo.dr_invlines.quantity) AS SaleQty,
dbo.period_status.periodname
FROM dbo.dr_trans
INNER JOIN dbo.period_status
ON dbo.dr_trans.period_seqno = dbo.period_status.seqno
LEFT OUTER JOIN dbo.stock_items AS STOCK_ITEMS_1
RIGHT OUTER JOIN dbo.dr_invlines
ON STOCK_ITEMS_1.stockcode = dbo.dr_invlines.stockcode
ON dbo.dr_trans.seqno = dbo.dr_invlines.hdr_seqno
WHERE ( STOCK_ITEMS_1.status = 'S' )
AND ( dbo.dr_trans.transtype IN ( 1, 2 ) )
AND ( dbo.dr_trans.transdate >= Dateadd(m, -6, Getdate()) )
GROUP BY dbo.dr_invlines.stockcode,
Datepart(month, dbo.dr_trans.transdate),
dbo.period_status.periodname) AS Y
ON X.stockcode = Y.stockcode) z
PIVOT (Sum(saleqty) FOR [months] IN ([1],[2],[3],[4],[5],[6])) AS pivoted
EDIT: I missed the root cause of your issue: the inclusion of the periodname column is what causes the perceived duplication. I am leaving this answer in place as a general solution showing CTE usage, because it could still be useful if you then want to do extra filtering or transformation of your pivot results.
One way is to take the results of the pivot query and run it through a SELECT DISTINCT query.
Below is an example of wrapping your pivot query in a CTE and using it to feed a SELECT DISTINCT (please note: untested, but it parses as valid in my SSMS):
WITH PivotResults_CTE (
stockcode,
description,
pack,
location,
lname,
qty,
StockCode2,
periodname,
[1],
[2],
[3],
[4],
[5],
[6]
)
AS (
SELECT *
FROM (
SELECT X.stockcode
,X.description
,X.pack
,X.location
,X.lname
,X.qty
,Y.stockcode AS StockCode2
,y.periodname
,Y.months
,Y.saleqty
FROM (
SELECT dbo.stock_items.stockcode
,dbo.stock_items.description
,dbo.stock_items.pack
,dbo.stock_loc_info.location
,dbo.stock_locations.lname
,dbo.stock_loc_info.qty
FROM dbo.stock_locations
INNER JOIN dbo.stock_loc_info ON dbo.stock_locations.locno = dbo.stock_loc_info.location
LEFT OUTER JOIN dbo.stock_items ON dbo.stock_loc_info.stockcode = dbo.stock_items.stockcode
WHERE (dbo.stock_items.STATUS = 's')
) AS X
LEFT OUTER JOIN (
SELECT dbo.dr_invlines.stockcode
,(12 + Datepart(month, Getdate()) - Datepart(month, dbo.dr_trans.transdate)) % 12 + 1 AS Months
,Sum(dbo.dr_invlines.quantity) AS SaleQty
,dbo.period_status.periodname
FROM dbo.dr_trans
INNER JOIN dbo.period_status ON dbo.dr_trans.period_seqno = dbo.period_status.seqno
LEFT OUTER JOIN dbo.stock_items AS STOCK_ITEMS_1
RIGHT OUTER JOIN dbo.dr_invlines ON STOCK_ITEMS_1.stockcode = dbo.dr_invlines.stockcode ON dbo.dr_trans.seqno = dbo.dr_invlines.hdr_seqno WHERE (STOCK_ITEMS_1.STATUS = 'S')
AND (
dbo.dr_trans.transtype IN (
1
,2
)
)
AND (dbo.dr_trans.transdate >= Dateadd(m, - 6, Getdate()))
GROUP BY dbo.dr_invlines.stockcode
,Datepart(month, dbo.dr_trans.transdate)
,dbo.period_status.periodname
) AS Y ON X.stockcode = Y.stockcode
) z
PIVOT(Sum(saleqty) FOR [months] IN (
[1]
,[2]
,[3]
,[4]
,[5]
,[6]
)) AS pivoted
)
SELECT DISTINCT *
FROM
PivotResults_CTE
;
Also note that the SQL above may look slightly different from your original, but that is only because I ran it through a reformatter to make sure I understood its structure.
In other words, the basic CTE wrapper for your pivot query is:
WITH PivotResults_CTE (
Field1,
Field2,
...
)
AS (
YOUR_PIVOT_QUERY_HERE
)
SELECT DISTINCT *
FROM
PivotResults_CTE
;
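The root cause mentioned in the EDIT above can be seen in a small experiment. T-SQL's PIVOT groups by every source column that is neither the pivoted column nor the aggregated one, so an extra column such as periodname splits one stock code across several rows. The sketch below is an emulation, not the original query: SQLite has no PIVOT, so it uses conditional aggregation with an explicit GROUP BY, and the sales table is a hypothetical stand-in for the joined X/Y result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined X/Y result fed to the pivot.
    CREATE TABLE sales (
        stockcode TEXT, periodname TEXT, months INTEGER, saleqty INTEGER);
    INSERT INTO sales VALUES
        ('A', 'Jan', 1, 10),
        ('A', 'Feb', 2, 20),
        ('B', 'Jan', 1, 7);
""")

# Conditional aggregation emulating PIVOT; {keys} plays the role of the
# columns that PIVOT would group by implicitly.
PIVOT_SQL = """
    SELECT {keys},
           SUM(CASE WHEN months = 1 THEN saleqty END) AS m1,
           SUM(CASE WHEN months = 2 THEN saleqty END) AS m2
    FROM sales
    GROUP BY {keys}
    ORDER BY {keys}
"""

# With periodname in the grouping, stock code 'A' is split over two rows.
with_period = conn.execute(
    PIVOT_SQL.format(keys="stockcode, periodname")).fetchall()

# Without it, each stock code collapses to a single row.
without_period = conn.execute(
    PIVOT_SQL.format(keys="stockcode")).fetchall()

print(with_period)
print(without_period)
```

In the original query the equivalent change would be to leave periodname out of the inner SELECT that feeds the PIVOT; SELECT DISTINCT cannot merge rows whose values differ.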

I need help improving my SQL query for pulling a recent document count

This is just a portion of the query, but it seems to be the bottleneck:
SELECT CAST (CASE WHEN EXISTS
(SELECT 1
FROM dbo.CBDocument
WHERE (FirmId = R.FirmId) AND
(ContributionDate > DATEADD(m, -3, GETDATE())) AND
((EntityTypeId = 2600 AND EntityId = P.IProductId) OR
(EntityTypeId = 2500 AND EntityId = M.IManagerId)))
THEN 1 ELSE 0 END AS BIT) AS HasRecentDocuments
FROM dbo.CBIProduct P
JOIN dbo.CBIManager M ON P.IManagerId = M.IManagerId
JOIN dbo.CBIProductRating R ON P.IProductId = R.IProductId
JOIN dbo.CBIProductFirmDetail D ON (D.IProductId = P.IProductId) AND
(R.FirmId = D.FirmId)
CROSS APPLY (SELECT TOP 1 RatingDate, IProductRatingId, FirmId
FROM dbo.CBIProductRating
WHERE (IProductId = P.IProductId) AND (FirmId = R.FirmId)
ORDER BY RatingDate DESC) AS RD
WHERE (R.IProductRatingId = RD.IProductRatingId) AND (R.FirmId = RD.FirmId)
There are a lot of other columns that I typically pull back that need the CROSS APPLY and the other joins. The bit I need to optimize is the sub-query in the case statement. This subquery takes over 3 minutes to return 119k records. I know enough about SQL to get this far, but there has to be a way to make this more efficient.
The gist of the query is just to return a flag if the associated product has any documents that have been added to the system within the last 3 months.
Edit: My DB is hosted in Azure and the database tuning advisor won't connect to it. There is a tuning advisor component in Azure, but it's not suggesting anything. There must be a better approach to the query.
Edit: In an attempt to further simplify and determine the culprit, I whittled it down to this query: (Rather than determine if a recent doc exists, it just counts recent docs.)
SELECT D.FirmId, P.IProductId
,(SELECT COUNT(DocumentId) FROM dbo.CBDocument WHERE
(FirmId = D.FirmId) AND
(ContributionDate > DATEADD(m, -3, GETDATE())) AND
((EntityTypeId = 2600 AND EntityId = P.IProductId) OR
(EntityTypeId = 2500 AND EntityId = M.IManagerId))) AS RecentDocCount
FROM dbo.CBIProduct P
FULL JOIN dbo.CBIProductFirmDetail D ON D.IProductId = P.IProductId
JOIN dbo.CBIManager M ON M.IManagerId = P.IManagerId
That runs in 3 minutes, 53 seconds.
If I declare a variable to store the date (DECLARE @Today DATE = GETDATE())
and put the variable in place of GETDATE() in the query (DATEADD(m, -3, @Today)), it runs in 12 seconds.
Is there a known performance issue with GETDATE()? As far as I know, I can't use the variable in a view definition.
Does this shine any light on anything that could point to a solution? I suppose I could turn the whole thing into a stored procedure, but then I also have to adjust the application code.
Thanks.
This is the query that you claim needs optimization:
SELECT CAST(CASE WHEN EXISTS (SELECT 1
FROM dbo.CBDocument d
WHERE (d.FirmId = R.FirmId) AND
(d.ContributionDate > DATEADD(m, -3, GETDATE())) AND
((d.EntityTypeId = 2600 AND d.EntityId = P.IProductId) OR
(d.EntityTypeId = 2500 AND d.EntityId = M.IManagerId)
)
)
. . .
I'll trust your judgement. I think phrasing the query like this gives you more paths to optimization:
SELECT CAST(CASE WHEN EXISTS (SELECT 1
FROM dbo.CBDocument d
WHERE d.FirmId = R.FirmId AND
d.ContributionDate > DATEADD(m, -3, GETDATE()) AND
d.EntityTypeId = 2600 AND d.EntityId = P.IProductId
) OR
EXISTS (SELECT 1
FROM dbo.CBDocument d
WHERE d.FirmId = R.FirmId AND
d.ContributionDate > DATEADD(m, -3, GETDATE()) AND
d.EntityTypeId = 2500 AND d.EntityId = M.IManagerId
)
. . .
Then you want an index on CBDocument(FirmId, EntityTypeId, EntityId, ContributionDate).
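To check that splitting the OR into two EXISTS clauses does not change the result, the two forms can be compared on sample data. This is a sketch with several assumptions: it runs on SQLite through Python's sqlite3 module, the doc and product tables are hypothetical and much simplified, and a fixed cutoff date replaces DATEADD(m, -3, GETDATE()) so that the output is deterministic. It says nothing about timing on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of CBDocument and the product joins.
    CREATE TABLE doc (
        firm_id INTEGER, entity_type_id INTEGER,
        entity_id INTEGER, contribution_date TEXT);
    CREATE INDEX ix_doc
        ON doc (firm_id, entity_type_id, entity_id, contribution_date);
    CREATE TABLE product (product_id INTEGER, manager_id INTEGER, firm_id INTEGER);
    INSERT INTO doc VALUES
        (1, 2600, 100, '2024-05-01'),
        (1, 2500, 7,   '2023-01-01'),
        (2, 2500, 8,   '2024-06-01');
    INSERT INTO product VALUES (100, 7, 1), (101, 7, 1), (102, 8, 2);
""")

# Fixed cutoff standing in for "three months ago".
CUTOFF = {"cutoff": "2024-03-01"}

# Original shape: one EXISTS with an OR across the two entity types.
or_form = conn.execute("""
    SELECT p.product_id,
           EXISTS (SELECT 1 FROM doc d
                   WHERE d.firm_id = p.firm_id
                     AND d.contribution_date > :cutoff
                     AND ((d.entity_type_id = 2600 AND d.entity_id = p.product_id)
                       OR (d.entity_type_id = 2500 AND d.entity_id = p.manager_id)))
    FROM product p
    ORDER BY p.product_id
""", CUTOFF).fetchall()

# Rewritten shape: two EXISTS, each an equality match on the index prefix.
split_form = conn.execute("""
    SELECT p.product_id,
           (EXISTS (SELECT 1 FROM doc d
                    WHERE d.firm_id = p.firm_id
                      AND d.entity_type_id = 2600
                      AND d.entity_id = p.product_id
                      AND d.contribution_date > :cutoff)
            OR
            EXISTS (SELECT 1 FROM doc d
                    WHERE d.firm_id = p.firm_id
                      AND d.entity_type_id = 2500
                      AND d.entity_id = p.manager_id
                      AND d.contribution_date > :cutoff))
    FROM product p
    ORDER BY p.product_id
""", CUTOFF).fetchall()

print(or_form == split_form, split_form)
```

Each EXISTS in the second form filters on three equality columns followed by a range on the date, which matches the column order of the suggested index.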
Operations such as correlated subqueries and full outer joins are rather expensive, and I would suggest looking for alternatives to them. Whilst I am not familiar with your data model or data, I suggest changing the "from table" to CBIProductFirmDetail. I have further assumed that it can be inner joined to the product table, which is in turn inner joined to the manager table. If that join sequence is correct, this removes the expense of the outer join.
In place of the correlated subquery to determine a count, I suggest you treat that as a subquery which is left joined.
SELECT
d.FirmId
, p.IProductId
, COALESCE(Docs.RecentDocCount,0) RecentDocCount
FROM dbo.CBIProductFirmDetail d
JOIN dbo.CBIProduct p ON d.IProductId = p.IProductId
JOIN dbo.CBIManager m ON p.IManagerId = m.IManagerId
LEFT JOIN (
SELECT
FirmId
, EntityId
, EntityTypeId
, COUNT(DocumentId) recentdoccount
FROM dbo.CBDocument
WHERE ContributionDate > DATEADD(m, -3, GETDATE())
AND EntityTypeId IN (2500,2600)
GROUP BY
FirmId
, EntityId
, EntityTypeId
) AS docs ON d.FirmId = docs.FirmId
AND (
(docs.EntityTypeId = 2600 AND docs.EntityId = p.IProductId)
OR (docs.EntityTypeId = 2500 AND docs.EntityId = m.IManagerId)
)
;
There might be benefit in dividing that subquery too to avoid the awkward OR in that join, so:
SELECT
d.FirmId
, p.IProductId
, COALESCE(d2500.DocCount,0) + COALESCE(d2600.DocCount,0) RecentDocCount
FROM dbo.CBIProductFirmDetail d
JOIN dbo.CBIProduct p ON d.IProductId = p.IProductId
JOIN dbo.CBIManager m ON p.IManagerId = m.IManagerId
LEFT JOIN (
SELECT
FirmId
, EntityId
, COUNT(DocumentId) doccount
FROM dbo.CBDocument
WHERE ContributionDate > DATEADD(m, -3, GETDATE())
AND EntityTypeId = 2500
GROUP BY
FirmId
, EntityId
) AS d2500 ON d.FirmId = d2500.FirmId
AND m.IManagerId = d2500.EntityId
LEFT JOIN (
SELECT
FirmId
, EntityId
, COUNT(DocumentId) doccount
FROM dbo.CBDocument
WHERE ContributionDate > DATEADD(m, -3, GETDATE())
AND EntityTypeId = 2600
GROUP BY
FirmId
, EntityId
) AS d2600 ON d.FirmId = d2600.FirmId
AND p.IProductId = d2600.EntityId
;
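The two-subquery version can be compared against the correlated count on a few rows. This is a sketch, not a benchmark: it runs on SQLite via Python's sqlite3 module, the product and doc tables are hypothetical simplifications (firm_id is stored directly on product), and the date filter is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplifications of the product and document tables.
    CREATE TABLE product (product_id INTEGER, manager_id INTEGER, firm_id INTEGER);
    CREATE TABLE doc (
        doc_id INTEGER PRIMARY KEY, firm_id INTEGER,
        entity_type_id INTEGER, entity_id INTEGER);
    INSERT INTO product VALUES (100, 7, 1), (101, 7, 1), (102, 8, 2);
    INSERT INTO doc (firm_id, entity_type_id, entity_id) VALUES
        (1, 2600, 100), (1, 2600, 100), (1, 2500, 7), (2, 2600, 999);
""")

# Correlated subquery: evaluated once per product row.
correlated = conn.execute("""
    SELECT p.product_id,
           (SELECT COUNT(*) FROM doc d
            WHERE d.firm_id = p.firm_id
              AND ((d.entity_type_id = 2600 AND d.entity_id = p.product_id)
                OR (d.entity_type_id = 2500 AND d.entity_id = p.manager_id)))
    FROM product p
    ORDER BY p.product_id
""").fetchall()

# Two pre-aggregated subqueries, one per entity type, left joined.
left_joined = conn.execute("""
    SELECT p.product_id,
           COALESCE(d2500.doc_count, 0) + COALESCE(d2600.doc_count, 0)
    FROM product p
    LEFT JOIN (SELECT firm_id, entity_id, COUNT(*) AS doc_count
               FROM doc WHERE entity_type_id = 2500
               GROUP BY firm_id, entity_id) d2500
           ON d2500.firm_id = p.firm_id AND d2500.entity_id = p.manager_id
    LEFT JOIN (SELECT firm_id, entity_id, COUNT(*) AS doc_count
               FROM doc WHERE entity_type_id = 2600
               GROUP BY firm_id, entity_id) d2600
           ON d2600.firm_id = p.firm_id AND d2600.entity_id = p.product_id
    ORDER BY p.product_id
""").fetchall()

print(left_joined)
```

Each derived table has at most one row per (firm, entity), so the left joins add columns without multiplying product rows.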
Depending on your data and indexes, it may be faster to use a left join:
SELECT CAST(CASE when x.FirmId is not null THEN 1 ELSE 0 END AS BIT) AS HasRecentDocuments
FROM dbo.CBIProduct P
JOIN dbo.CBIManager M ON P.IManagerId = M.IManagerId
JOIN dbo.CBIProductRating R ON P.IProductId = R.IProductId
JOIN dbo.CBIProductFirmDetail D ON (D.IProductId = P.IProductId) AND (R.FirmId = D.FirmId)
LEFT JOIN dbo.CBDocument x ON x.FirmId = R.FirmId
AND x.ContributionDate > DATEADD(m, -3, GETDATE())
AND ( (x.EntityTypeId = 2600 AND x.EntityId = P.IProductId)
OR (x.EntityTypeId = 2500 AND x.EntityId = M.IManagerId))
CROSS APPLY (SELECT TOP 1 RatingDate, IProductRatingId, FirmId
FROM dbo.CBIProductRating
WHERE (IProductId = P.IProductId) AND (FirmId = R.FirmId)
ORDER BY RatingDate DESC) AS RD
WHERE (R.IProductRatingId = RD.IProductRatingId) AND (R.FirmId = RD.FirmId)
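It certainly looks simpler. Note, though, that a left join returns one row per matching document, so a product with several recent documents will appear several times unless you add DISTINCT or aggregate.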

Inner join that ignores singlets

I have to do a self join on a table. I am trying to return a list of several columns showing how many of each type of drug test were performed on the same day (MM/DD/YYYY), for days on which at least two tests were done and at least one of them had a result code of 'UN'.
I am joining other tables to get the information as below. The problem is I do not quite understand how to exclude someone who has a single result row in which they did have a 'UN' result on a day but did not have any other tests that day.
Query Results (Columns)
County, DrugTestID, ID, Name, CollectionDate, DrugTestType, Results, Count(DrugTestType)
I have several rows for ID 12345, which are correct. But ID 12346 has a single row with a count of 1: they had a 'UN' result on that day but no other tests that day. I want to exclude this.
I tried the following query
select
c.desc as 'County',
dt.pid as 'PID',
dt.id as 'DrugTestID',
p.id as 'ID',
bio.FullName as 'Participant',
CONVERT(varchar, dt.CollectionDate, 101) as 'CollectionDate',
dtt.desc as 'Drug Test Type',
dt.result as Result,
COUNT(dt.dru_drug_test_type) as 'Count Of Test Type'
from
dbo.Test as dt with (nolock)
join dbo.History as h on dt.pid = h.id
join dbo.Participant as p on h.pid = p.id
join BioData as bio on bio.id = p.id
join County as c with (nolock) on p.CountyCode = c.code
join DrugTestType as dtt with (nolock) on dt.DrugTestType = dtt.code
inner join
(
select distinct
dt2.pid,
CONVERT(varchar, dt2.CollectionDate, 101) as 'CollectionDate'
from
dbo.DrugTest as dt2 with (nolock)
join dbo.History as h2 on dt2.pid = h2.id
join dbo.Participant as p2 on h2.pid = p2.id
where
dt2.result = 'UN'
and dt2.CollectionDate between '11-01-2011' and '10-31-2012'
and p2.DrugCourtType = 'AD'
) as derived
on dt.pid = derived.pid
and convert(varchar, dt.CollectionDate, 101) = convert(varchar, derived.CollectionDate, 101)
group by
c.desc, dt.pid, p.id, dt.id, bio.fullname, dt.CollectionDate, dtt.desc, dt.result
order by
c.desc ASC, Participant ASC, dt.CollectionDate ASC
This is a little complicated because your query has a separate row for each test. You need to use window/analytic functions to get the information you want. These let you calculate aggregate functions but put the values on each row.
The following query starts with your query. It then calculates the number of UN results on each date for each participant and the total number of tests. It applies the appropriate filter to get what you want:
with base as (<your query here>)
select b.*
from (select b.*,
sum(IsUN) over (partition by Participant, CollectionDate) as NumUNs,
count(*) over (partition by Participant, CollectionDate) as NumTests
from (select base.*,
(case when result = 'UN' then 1 else 0 end) as IsUN
from base
) b
) b
where NumUNs <> 1 or NumTests <> 1
Without the with clause or window functions, you can create a particularly ugly query to do the same thing:
select b.*
from (<your query>) b join
(select Participant, CollectionDate, count(*) as NumTests,
sum(case when result = 'UN' then 1 else 0 end) as NumUNs
from (<your query>) b
group by Participant, CollectionDate
) bsum
on b.Participant = bsum.Participant and
b.CollectionDate = bsum.CollectionDate
where NumUNs <> 1 or NumTests <> 1
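The join-to-an-aggregate approach can be tried on a few rows. This sketch is a variation rather than the exact query above: it runs on SQLite via Python's sqlite3 module, the test table and its columns are hypothetical, and the filter is moved into a HAVING clause that states the requirement directly (at least two tests that day, at least one of them 'UN').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, flattened version of the drug test data.
    CREATE TABLE test (
        test_id INTEGER PRIMARY KEY, participant TEXT,
        collection_date TEXT, result TEXT);
    INSERT INTO test (participant, collection_date, result) VALUES
        ('12345', '2012-01-05', 'UN'),
        ('12345', '2012-01-05', 'OK'),
        ('12346', '2012-01-05', 'UN'),
        ('12347', '2012-01-06', 'OK'),
        ('12347', '2012-01-06', 'OK');
""")

kept = conn.execute("""
    SELECT t.participant, t.collection_date, t.result
    FROM test t
    -- Keep only (participant, day) pairs with 2+ tests and 1+ 'UN' result.
    JOIN (SELECT participant, collection_date
          FROM test
          GROUP BY participant, collection_date
          HAVING COUNT(*) >= 2
             AND SUM(CASE WHEN result = 'UN' THEN 1 ELSE 0 END) >= 1) days
      ON days.participant = t.participant
     AND days.collection_date = t.collection_date
    ORDER BY t.test_id
""").fetchall()

print(kept)
```

Participant 12346 (a single 'UN' test) and participant 12347 (two tests, no 'UN') are both dropped; only the two rows for 12345 remain.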
If I understand the problem, the basic pattern for this sort of query is simply to include negating or exclusionary conditions in your join, i.e., self-join where column A matches but columns B and C do not:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
and t1.PkId != t2.PkId
and t1.category != t2.category
)
Put the conditions in the WHERE clause if it benchmarks better:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
And it's often easiest to start with the self-join, treating it as a "base table" on which to join all related information:
select
[columns]
from
(select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
) bt
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
This can allow you to focus on getting that self-join right, without interference from other tables.
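Applied to the drug test question, the self-join pattern might look like the sketch below. The assumptions are the same as for any small reproduction: SQLite via Python's sqlite3 module instead of SQL Server, and a hypothetical test table. A row is matched to a different row (different test_id) for the same participant and day, so a lone test finds no partner and drops out. Because one row can match several partners, DISTINCT is needed to avoid repeated rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, flattened version of the drug test data.
    CREATE TABLE test (
        test_id INTEGER PRIMARY KEY, participant TEXT,
        collection_date TEXT, result TEXT);
    INSERT INTO test VALUES
        (1, '12345', '2012-01-05', 'UN'),
        (2, '12345', '2012-01-05', 'OK'),
        (3, '12345', '2012-01-05', 'OK'),
        (4, '12346', '2012-01-05', 'UN');
""")

SELF_JOIN = """
    SELECT {distinct} t1.test_id, t1.participant
    FROM test t1
    JOIN test t2
      ON t1.participant = t2.participant
     AND t1.collection_date = t2.collection_date
     AND t1.test_id <> t2.test_id      -- exclude the row matching itself
    WHERE t1.result = 'UN'
    ORDER BY t1.test_id
"""

# Without DISTINCT, test 1 appears once per partner row (tests 2 and 3).
raw_rows = conn.execute(SELF_JOIN.format(distinct="")).fetchall()

# With DISTINCT, each 'UN' test that has company that day appears once.
un_with_company = conn.execute(SELF_JOIN.format(distinct="DISTINCT")).fetchall()

print(raw_rows)
print(un_with_company)
```

Test 4 is the singlet from the question: it has a 'UN' result but no other test on that day, so the self-join finds nothing to pair it with.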