SQL Server Query Returning Same Row Twice - sql

I seem to be having trouble with the following query. It basically works, but I have a case where it is returning one row from mc_WorkoutDetails twice!
Here's the original query:
ALTER PROCEDURE [dbo].[mc_Workouts_GetActivities]
#WorkoutID bigint
AS
BEGIN
SET NOCOUNT ON
SELECT d.ID, a.Description,
CASE WHEN Reps = 0 THEN NULL ELSE Reps END AS Reps,
CASE WHEN Sets = 0 THEN NULL ELSE Sets END AS Sets,
CASE WHEN Minutes = 0 THEN NULL ELSE Minutes END AS Minutes,
d.Comments, c.Name AS Category, a.CategoryID,
(CASE WHEN v.ActivityID IS NULL THEN 0 ELSE 1 END) AS HasVideo,
a.ID AS ActivityID
FROM mc_WorkoutDetails d
INNER JOIN mc_Activities a ON d.ActivityID = a.ID
INNER JOIN mc_Activities_Categories c ON a.CategoryID = c.ID
LEFT OUTER JOIN mc_TrainerVideos v ON a.ID = v.ActivityID
WHERE (d.WorkoutID = #WorkoutID)
ORDER BY SortOrder, a.Description
RETURN ##ERROR
END
Then I tried changing:
INNER JOIN mc_Activities a ON d.ActivityID = a.ID
INNER JOIN mc_Activities_Categories c ON a.CategoryID = c.ID
To:
LEFT OUTER JOIN mc_Activities a ON d.ActivityID = a.ID
LEFT OUTER JOIN mc_Activities_Categories c ON a.CategoryID = c.ID
But that didn't seem to help. I still get the duplicate row.
Can anyone see what's happening?

What you can do is add a join back to the same table using a group to weed out the duplicate rows.
So add this to your join after FROM mc_WorkoutDetails d:
inner join (select [columns you want to select], max(id) id
from mc_WorkoutDetails
group by [columns you want to select] ) q on q.id = d.id
Let me know if that makes sense. Basically you are doing a distinct and getting the max id so you eliminate one of the rows in the join. You have to remember that even if you want there to be duplicates, they will be eliminated even if they are suppose to be there.
The full alter would be:
ALTER PROCEDURE [dbo].[mc_Workouts_GetActivities]
#WorkoutID bigint
AS
BEGIN
SET NOCOUNT ON
SELECT d.ID, a.Description,
CASE WHEN Reps = 0 THEN NULL ELSE Reps END AS Reps,
CASE WHEN Sets = 0 THEN NULL ELSE Sets END AS Sets,
CASE WHEN Minutes = 0 THEN NULL ELSE Minutes END AS Minutes,
d.Comments, c.Name AS Category, a.CategoryID,
(CASE WHEN v.ActivityID IS NULL THEN 0 ELSE 1 END) AS HasVideo,
a.ID AS ActivityID
FROM mc_WorkoutDetails d
inner join (select Reps, Sets, Comments, Minutes, max(id) id
from mc_WorkoutDetails
group by Reps, Sets, Comments, Minutes ) q on q.id = d.id
INNER JOIN mc_Activities a ON d.ActivityID = a.ID
INNER JOIN mc_Activities_Categories c ON a.CategoryID = c.ID
LEFT OUTER JOIN mc_TrainerVideos v ON a.ID = v.ActivityID
WHERE (d.WorkoutID = #WorkoutID)
ORDER BY SortOrder, a.Description
RETURN ##ERROR
END

Thanks for everyone's input. The general suggestions here were correct: I had two rows in the mc_workoutDetails table that referenced the same row in the mc_Activities table.
While the foreign key was part of a unique primary key, it was a compound key and so this column could contain duplicates as long as the other column in the key were different.

Related

Select an ID where there is only one row and that row is a specific value

I have this query. There's a lot of joins because I am checking if an ID is linked to any of those tables.
Currently, this query shows me any ID's that are not linked to any of those tables. I would like to add to it so that it also shows any IDs that are linked to the d table, but only if there is only 1 row in the D table and the type in the D field is 'member'.
SELECT
c.ID,
c.location,
c.pb,
c.name,
c.surname
FROM c
LEFT JOIN l on c.rowno = l.rowno
LEFT JOIN d on c.rowno = d.rowno
LEFT JOIN t on c.rowno = t.rowno
LEFT JOIN cj ON (c.rowno = cj.rowno OR c.rowno = cj.rowno2)
LEFT JOIN dj ON c.rowno = d.rowno
LEFT JOIN lg ON c.rowno = lg.rowno
LEFT JOIN tj ON c.rowno = tj.rowno
WHERE
c.status != 'closed'
AND l.rowno IS NULL
AND d.rowno IS NULL
AND t.rowno IS NULL
AND cj.rowno IS NULL
AND dj.rowno IS NULL
AND lg.rowno IS NULL
AND tj.rowno IS NULL
My first thought is to just add
WHERE D.type = 'member'
But that gives me all IDs that have a row with D.type = member (they could have 10 rows with all different types, but as long as 1 of those has type = member it shows up). I want to see ID's that ONLY have d.type = member
I'm sorry if I'm wording this badly, I'm having trouble getting this straight in my head. Any help is appreciated!
I would use exists for all conditions except the one on the D table:
SELECT c.*
FROM c JOIN
(SELECT d.rownum, COUNT(*) as cnt,
SUM(CASE WHEN d.type = 'Member' THEN 1 ELSE 0 END) as num_members
FROM t
GROUP BY d.rownum
) d
ON c.rownum = d.rownum
WHERE c.status <> 'closed' AND
NOT EXISTS (SELECT 1 FROM t WHERE c.rowno = t.rowno) AND
NOT EXISTS (SELECT 1 FROM l WHERE c.rowno = l.rowno) AND
. . .
I find NOT EXISTS is easier to follow logically. I don't think there is a big performance difference between the two methods in SQL Server.

Remove duplicate address_id from sql data set

I need to get only distinct address_id in result no duplication. Here is my query.
SELECT DISTINCT address.address_id, address.address1, address.streetcity, state.stateabbrev, rtrim(ltrim(case when address.streetzipcode is not null and address.streetzipcode != 'NULL' then address.streetzipcode else '' end))+case when len(address.streetzipplus4)>0 then '-'+rtrim(ltrim(address.streetzipplus4)) else '' end as streetzipcode, address.homephone,
dbo.f_addressstudent (student.address_id) as Students,
dbo.f_addresspeople (student.address_id) as Adults,
case
when #classif_id IS NULL then 0
else
student.classif_id
end classif,
classifctn
FROM district WITH(NOLOCK)
JOIN dbo.building ON building.district_id = district.district_id
JOIN dbo.studbldg_bridge WITH(NOLOCK) ON studbldg_bridge.bldg_id=building.bldg_id
JOIN dbo.student WITH(NOLOCK) ON student.student_id = studbldg_bridge.student_id
JOIN classif with(nolock) on student.classif_id = classif.classif_id
LEFT JOIN dbo.address WITH(NOLOCK) ON student.address_id = address.address_id
LEFT JOIN dbo.state WITH(NOLOCK) ON address.streetstate_id = state.state_id
LEFT JOIN dbo.state AS mailstate WITH(NOLOCK) ON address.state_id = mailstate.state_id
WHERE district.district_id = (SELECT district_id FROM dbo.building WITH(NOLOCK) WHERE bldg_id = #bldg_id)
ORDER BY classif,Adults, Students
Here is result of query
Query result with error in data
I have tried to group by and use aggregate function with address_id but I also have non-aggregate columns so it didn't worked for me.
After that I also tried using OVER(partition by address.address_id) but it also didn't worked.
Any help will be appreciated in advance.
Thank you
**UPDATE on Business logic/Requirements **
I need to get unique addresses for parents of students. As parent can have two or more children living in same address, it causes duplication. I need to get only one child per parent in other words.
From your image of the results it looks like the classifctn column has more than 1 value so it is repeating your row by that. In order to get 1 distinct address_id and rest of the columns either remove it from your query or you can set a precedence that will only return 1 record per address_id
further please tag only the RDBMs you ware actually using. MySQL for example doesn't have window functions yet you tagged it yet referenced using OVER(partition.... which would not be possible in mysql
;WITH cte (
SELECT DISTINCT address.address_id, address.address1, address.streetcity, state.stateabbrev, rtrim(ltrim(case when address.streetzipcode is not null and address.streetzipcode != 'NULL' then address.streetzipcode else '' end))+case when len(address.streetzipplus4)>0 then '-'+rtrim(ltrim(address.streetzipplus4)) else '' end as streetzipcode, address.homephone,
dbo.f_addressstudent (student.address_id) as Students,
dbo.f_addresspeople (student.address_id) as Adults,
case
when #classif_id IS NULL then 0
else
student.classif_id
end classif,
classifctn,
ROW_NUMBER() OVER (PARTITION BY address.address_id ORDER BY HOW WILL YOU CHOOSE?) AS RowNum
FROM district WITH(NOLOCK)
JOIN dbo.building ON building.district_id = district.district_id
JOIN dbo.studbldg_bridge WITH(NOLOCK) ON studbldg_bridge.bldg_id=building.bldg_id
JOIN dbo.student WITH(NOLOCK) ON student.student_id = studbldg_bridge.student_id
JOIN classif with(nolock) on student.classif_id = classif.classif_id
LEFT JOIN dbo.address WITH(NOLOCK) ON student.address_id = address.address_id
LEFT JOIN dbo.state WITH(NOLOCK) ON address.streetstate_id = state.state_id
LEFT JOIN dbo.state AS mailstate WITH(NOLOCK) ON address.state_id = mailstate.state_id
WHERE district.district_id = (SELECT district_id FROM dbo.building WITH(NOLOCK) WHERE bldg_id = #bldg_id)
)
SELECT *
FROM
cte
WHERE
RowNum = 1
ORDER BY
classif
,Adults
,Students
Alternatively you could nest your select query. note though this solution is somewhat useless as it will only return 1 grade/classifctn when more than 1 exists in a household if you really don't care about the column then you should just remove it from your query.
Actually both your classifctn and classif columns will cause you multiple rows when more than 1 student is at the same address. here is a way to concatenate those values to a single row. You should spend some more time on your business case and defining it for us. But here is one example for you:
SELECT DISTINCT
address.address_id
,address.address1
,address.streetcity
,state.stateabbrev
,LTRIM(RTRIM(ISNULL(NULLIF(address.streetzipcode,'NULL'),'')))
+ CASE WHEN LEN(address.streetzipplus4) > 0 THEN '-' ELSE '' END
+ LTRIM(RTRIM(ISNULL(address.streetzipplus4,''))) AS streetzipcode
,address.homephone
,dbo.f_addressstudent (student.address_id) as Students
,dbo.f_addresspeople (student.address_id) as Adults
, case
when #classif_id IS NULL then 0
else student.classif_id
end classif
,STUFF(
(SELECT ',' + CAST(classif_id AS VARCHAR(100))
FROM
classif c
WHERE c.classif = student.classif
FOR XML PATH(''))
,1,1,'') AS classifs
,STUFF(
(SELECT ',' + CAST(classifctn AS VARCHAR(100))
FROM
classif c
WHERE c.classif = student.classif
FOR XML PATH(''))
,1,1,'') AS classifctns
FROM
district WITH(NOLOCK)
INNER JOIN dbo.building
ON building.district_id = district.district_id
AND building.bldg_id = #bldg_id
INNER JOIN dbo.student WITH(NOLOCK)
ON student.student_id = studbldg_bridge.student_id
INNER JOIN dbo.address WITH(NOLOCK)
ON student.address_id = address.address_id
LEFT JOIN dbo.state WITH(NOLOCK)
ON address.streetstate_id = state.state_id
Note I when ahead and changed the zip code logic to show you some use of ISNULL() and NULLIF() that are helpful in cases like that. I also removed 3 tables because 2 are not used and the third ends up being used in a subselect to concatenate the values. Also address table was changed to an INNER JOIN because if an address doesn't exist all of the other information becomes blank/useless....
INNER JOIN dbo.studbldg_bridge WITH(NOLOCK) ON studbldg_bridge.bldg_id=building.bldg_id
LEFT JOIN dbo.state AS mailstate WITH(NOLOCK) ON address.state_id = mailstate.state_id
INNER JOIN classif with(nolock) on student.classif_id = classif.classif_id

Get DISTINCT record using INNER JOIN

I have the follwong Query on multi tables
SELECT DISTINCT b.BoxBarcode as [Box Barcode], (select case when b.ImagesCount IS NULL
then 0
else b.ImagesCount end) as [Total Images], s.StageName as [Current Stage] ,d.DocuementTypeName as [Document Type],
u.UserName as [Start User],uu.UserName as [Finished User]
FROM [dbo].[Operations] o
inner join dbo.LKUP_Stages s on
o.stageid=s.id
inner join dbo.LKUP_Users u on
u.id=o.startuserid
left join dbo.LKUP_Users uu on
uu.id=o.FinishedUserID
inner join boxes b on
b.id=o.boxid
inner join LKUP_DocumentTypes d on
d.ID = b.DocTypeID
where b.IsExportFinished = 0
when i select count from the Boxes table where IsExportFinished = 0
i got the Count 42 records, when i run the above qoury i got 71 records,
i want just the 42 records in Boxes table to retrived.
You are doing a one-to-many join, i.e. at least one of the tables have multiple rows that match the join criteria.
Step one is to find which table(s) that give the "duplicates".
Once you have done that you may be able to fix the problem by adding additional criteria to the join. I'm taking a guess that the same boxid occurs several times in the Operations table. If that is the case you need to decide which Operation row you want to select and then update the SQL accordingly.
Try this one -
SELECT
Box_Barcode = b.BoxBarcode
, Total_Images = ISNULL(b.ImagesCount, 0)
, Current_Stage = s.StageName
, Document_Type = d.DocuementTypeName
, Start_User = u.UserName
, Finished_User = uu.UserName
FROM (
SELECT DISTINCT
o.stageid
, o.boxid
, o.startuserid
, o.FinishedUserID
FROM dbo.[Operations]
) o
JOIN dbo.LKUP_Stages s ON o.stageid = s.id
JOIN dbo.boxes b ON b.id = o.boxid
JOIN dbo.LKUP_DocumentTypes d ON d.id = b.DocTypeID
JOIN dbo.LKUP_Users u ON u.id = o.startuserid
LEFT JOIN dbo.LKUP_Users uu ON uu.id = o.FinishedUserID
WHERE b.IsExportFinished = 0
I guess that if you change your LEFT JOIN into INNER JOIN you will get 42 records as requested.

SQL-subquery workaround to join based on alias

Attempted Subquery using alias
SELECT
*,
IQ.EndPart,
IQ.QtyToShip
FROM
parts p
INNER JOIN (
select
*,
(case when c.kitno is null then l.partno else c.partno end) as [EndPart],
(case when c.kitno is null then l.TotalQuantity else c.reqqty end) as [QtyToShip]
FROM
shipments s
INNER JOIN shipments_li l ON s.ShipmentNo = l.ShipmentNo
LEFT JOIN ProductConfiguration c ON l.PartNo = c.KitNo WHERE s.Status='N' and year(s.OrderDate)>2007
) IQ ON p.partno = IQ.EndPart
Looking for a way to join the parts table to my query below, using the part # which is aliased as EndPart. If there is another way to acheive the results of taking two columns and combining them instead of case and an alias that would be a great alternative as well. All my searches of other individuals quest to achieve the same have yielded the result you cannot perform a join based on alias because the results have not been determined at that point, recommending a subquery as a workaround. I'm just not sure how to acheive working results. The above query was what I attempted to get working. Any help appreciated. Thanks in advance.
Original Query
SELECT
*,
(case when c.kitno is null then l.partno else c.partno end) as [EndPart],
(case when c.kitno is null then l.TotalQuantity else c.reqqty end) as [QtyToShip]
FROM
shipments s
INNER JOIN shipments_li l ON s.ShipmentNo = l.ShipmentNo
LEFT JOIN ProductConfiguration c ON l.PartNo = c.KitNo
WHERE
s.Status='N'
and year(s.OrderDate)>2007
order by s.shipmentno
More proper way
--- CREATE TempTable
DECLARE #tblTemp AS TABLE (EndPart INT,
QtyToShip INT)
INSERT INTO #tblTemp (EndPart, QtyToShip)
SELECT
(CASE WHEN c.kitno IS NULL
THEN l.partno ELSE c.partno END) AS [EndPart],
(CASE WHEN c.kitno IS NULL
THEN l.TotalQuantity
ELSE c.reqqty END) AS [QtyToShip]
FROM dbo.Shipments s WITH(NOLOCK)
INNER JOIN shipments_li l WITH(NOLOCK) ON s.ShipmentNo = l.ShipmentNo
LEFT JOIN ProductConfiguration c WITH(NOLOCK) ON l.PartNo = c.KitNo
WHERE s.Status='N' AND YEAR(s.OrderDate)>2007
SELECT
p.*,
tmp.EndPart,
tmp.QtyToShip
FROM dbo.parts p WITH(NOLOCK)
INNER JOIN #tblTemp tmp ON p.partno = tmp.EndPart

Inner join that ignore singlets

I have to do an self join on a table. I am trying to return a list of several columns to see how many of each type of drug test was performed on same day (MM/DD/YYYY) in which there were at least two tests done and at least one of which resulted in a result code of 'UN'.
I am joining other tables to get the information as below. The problem is I do not quite understand how to exclude someone who has a single result row in which they did have a 'UN' result on a day but did not have any other tests that day.
Query Results (Columns)
County, DrugTestID, ID, Name, CollectionDate, DrugTestType, Results, Count(DrugTestType)
I have several rows for ID 12345 which are correct. But ID 12346 is a single row of which is showing they had a row result of count (1). They had a result of 'UN' on this day but they did not have any other tests that day. I want to exclude this.
I tried the following query
select
c.desc as 'County',
dt.pid as 'PID',
dt.id as 'DrugTestID',
p.id as 'ID',
bio.FullName as 'Participant',
CONVERT(varchar, dt.CollectionDate, 101) as 'CollectionDate',
dtt.desc as 'Drug Test Type',
dt.result as Result,
COUNT(dt.dru_drug_test_type) as 'Count Of Test Type'
from
dbo.Test as dt with (nolock)
join dbo.History as h on dt.pid = h.id
join dbo.Participant as p on h.pid = p.id
join BioData as bio on bio.id = p.id
join County as c with (nolock) on p.CountyCode = c.code
join DrugTestType as dtt with (nolock) on dt.DrugTestType = dtt.code
inner join
(
select distinct
dt2.pid,
CONVERT(varchar, dt2.CollectionDate, 101) as 'CollectionDate'
from
dbo.DrugTest as dt2 with (nolock)
join dbo.History as h2 on dt2.pid = h2.id
join dbo.Participant as p2 on h2.pid = p2.id
where
dt2.result = 'UN'
and dt2.CollectionDate between '11-01-2011' and '10-31-2012'
and p2.DrugCourtType = 'AD'
) as derived
on dt.pid = derived.pid
and convert(varchar, dt.CollectionDate, 101) = convert(varchar, derived.CollectionDate, 101)
group by
c.desc, dt.pid, p.id, dt.id, bio.fullname, dt.CollectionDate, dtt.desc, dt.result
order by
c.desc ASC, Participant ASC, dt.CollectionDate ASC
This is a little complicated because the your query has a separate row for each test. You need to use window/analytic functions to get the information you want. These allow you to do calculate aggregation functions, but to put the values on each line.
The following query starts with your query. It then calculates the number of UN results on each date for each participant and the total number of tests. It applies the appropriate filter to get what you want:
with base as (<your query here>)
select b.*
from (select b.*,
sum(isUN) over (partition by Participant, CollectionDate) as NumUNs,
count(*) over (partition by Partitipant, CollectionDate) as NumTests
from (select b.*,
(case when result = 'UN' then 1 else 0 end) as IsUN
from base
) b
) b
where NumUNs <> 1 or NumTests <> 1
Without the with clause or window functions, you can create a particularly ugly query to do the same thing:
select b.*
from (<your query>) b join
(select Participant, CollectionDate, count(*) as NumTests,
sum(case when result = 'UN' then 1 else 0 end) as NumUNs
from (<your query>) b
group by Participant, CollectionDate
) bsum
on b.Participant = bsum.Participant and
b.CollectionDate = bsum.CollectionDate
where NumUNs <> 1 or NumTests <> 1
If I understand the problem, the basic pattern for this sort of query is simply to include negating or exclusionary conditions in your join. I.E., self-join where columnA matches, but columns B and C do not:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
and t1.PkId != t2.PkId
and t1.category != t2.category
)
Put the conditions in the WHERE clause if it benchmarks better:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
And it's often easiest to start with the self-join, treating it as a "base table" on which to join all related information:
select
[columns]
from
(select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
) bt
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
This can allow you to focus on getting that self-join right, without interference from other tables.