Find NON-duplicate column value combinations - sql

This is for a migration script.
CompanyTable:
EmployeeId DivisionId
abc div1
def div1
abc div1
abc div2
xyz div2
In the code below, I am selecting duplicate EmployeeId-DivisionId combinations, that is, the records that have the same EmployeeId and DivisionId. So from the table above, the code below selects the two rows with the abc-div1 combination.
How can I invert it? It seems so simple, but I can't figure it out. I tried HAVING count(*) = 0 instead of > 1, and I've tried fiddling with the equality signs in the ON and AND lines. Basically, from the table above I want to select the other three rows, the ones that don't have the abc-div1 combination. If there is a way to select all the unique EmployeeId-DivisionId combinations, let me know.
SELECT a.EmployeeID, a.DivisionId FROM CompanyTable a
JOIN ( SELECT EmployeeID, DivisionId
FROM CompanyTable
GROUP BY EmployeeID, DivisionId
HAVING count(*) > 1 ) b
ON a.EmployeeID = b.EmployeeID
AND a.DivisionId = b.DivisionId;
EmployeeId and DivisionId are both nvarchar(50) columns.

A windowed count would seem a suitable method:
select employeeid, divisionid
from (
select *, Count(*) over(partition by employeeid, divisionid) ct
from CompanyTable
)t
where ct = 1;
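To see the windowed count on the question's sample data, here is a self-contained sketch for a scratch database. It uses VARCHAR(50) in place of nvarchar(50) so the same script should also run in SQLite 3.25+ or PostgreSQL, and the final ORDER BY is only there to make the output stable.

```sql
-- Sample data from the question; VARCHAR(50) stands in for nvarchar(50)
CREATE TABLE CompanyTable (EmployeeId VARCHAR(50), DivisionId VARCHAR(50));
INSERT INTO CompanyTable (EmployeeId, DivisionId) VALUES
    ('abc', 'div1'), ('def', 'div1'), ('abc', 'div1'),
    ('abc', 'div2'), ('xyz', 'div2');

-- ct = number of rows sharing this row's (EmployeeId, DivisionId) pair;
-- ct = 1 keeps only the pairs that are not duplicated
SELECT EmployeeId, DivisionId
FROM (
    SELECT EmployeeId, DivisionId,
           COUNT(*) OVER (PARTITION BY EmployeeId, DivisionId) AS ct
    FROM CompanyTable
) AS counted
WHERE ct = 1
ORDER BY EmployeeId, DivisionId;
```

On this data it returns the three rows abc-div2, def-div1 and xyz-div2.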

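As already mentioned, you must replace > 1 with its real opposite, <= 1. HAVING count(*) = 0 can never be true, because every group produced by GROUP BY contains at least one row, so <= 1 effectively means = 1. This works: db<>fiddle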
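A minimal sketch of the question's own join with > 1 replaced by = 1, run against the same sample data (VARCHAR standing in for nvarchar(50)). Since a group with COUNT(*) = 1 stands for exactly one row, the self-join is not strictly needed here: the derived table b alone returns the same rows, as long as the key columns are not NULL (= never matches NULL in the join).

```sql
CREATE TABLE CompanyTable (EmployeeId VARCHAR(50), DivisionId VARCHAR(50));
INSERT INTO CompanyTable (EmployeeId, DivisionId) VALUES
    ('abc', 'div1'), ('def', 'div1'), ('abc', 'div1'),
    ('abc', 'div2'), ('xyz', 'div2');

-- Same shape as the question's query; only the HAVING condition changed
SELECT a.EmployeeId, a.DivisionId
FROM CompanyTable a
JOIN (SELECT EmployeeId, DivisionId
      FROM CompanyTable
      GROUP BY EmployeeId, DivisionId
      HAVING COUNT(*) = 1) b        -- combinations that occur exactly once
  ON a.EmployeeId = b.EmployeeId
 AND a.DivisionId = b.DivisionId
ORDER BY a.EmployeeId, a.DivisionId;
```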

First, let's try rewriting your query using a common table expression (CTE), instead of a subquery:
WITH cteCompanyTableStats as (
SELECT
EmployeeID, DivisionId,
HasDuplicates = CASE WHEN count(*) > 1 THEN 1 ELSE 0 END
FROM CompanyTable
GROUP BY EmployeeID, DivisionId
)
SELECT ct.*
FROM CompanyTable ct
inner join cteCompanyTableStats cts on
ct.EmployeeId = cts.EmployeeId
and ct.DivisionId = cts.DivisionId
and cts.HasDuplicates = 1
Notice how I've removed the HAVING clause and added a new HasDuplicates column? We're going to use that new column to find all of the table rows that DON'T have duplicates:
WITH cteCompanyTableStats as (
SELECT
EmployeeID, DivisionId,
HasDuplicates = CASE WHEN count(*) > 1 THEN 1 ELSE 0 END
FROM CompanyTable
GROUP BY EmployeeID, DivisionId
)
SELECT ct.*
FROM CompanyTable ct
inner join cteCompanyTableStats cts on
ct.EmployeeId = cts.EmployeeId
and ct.DivisionId = cts.DivisionId
and cts.HasDuplicates = 0
The only character of SQL code that changed between the two queries is on the last line, where the value compared with cts.HasDuplicates goes from 1 to 0.

Related

SQL SELECT ID WHERE rows with the same ID have different Values

I need some help creating a SQL statement across rows.
SELECT SZ.Stammindex AS ID, S.sEbene1, S.sEbene2, S.sEbene3
FROM SuchbaumZuordnung SZ
LEFT JOIN Suchbaum S
ON SZ.gSuchbaumID = S.gID
WHERE (S.sEbene1 IN ('Test1')
AND (S.sEbene2 IN ('Test2') OR S.sEbene2 IS NULL)
AND S.sEbene3 IS NULL)
As you can see in the screenshot, I selected ID=10004 and ID=10005, but I only want ID=10005 to show up. As already mentioned, I am trying to filter across rows.
My goal is to get all the IDs that satisfy all the conditions connected with "AND", where each condition may be met by a different row, something like this:
WHERE (sEbene1 IN ('Test1')
AND (sEbene2 IN ('Test2') *AND* sEbene2 IS NULL)
AND sEbene3 IS NULL)
But this will return nothing.
I hope you guys can help me.
I suspect that you want:
SELECT SZ.Stammindex AS ID
FROM SuchbaumZuordnung SZ
WHERE EXISTS (SELECT 1
FROM Suchbaum S
WHERE SZ.gSuchbaumID = S.gID AND
S.sEbene1 IN ('Test1') AND
S.sEbene2 IN ('Test2')
) AND
EXISTS (SELECT 1
FROM Suchbaum S
WHERE SZ.gSuchbaumID = S.gID AND
S.sEbene2 IS NULL AND
S.sEbene3 IS NULL
);
This is looking for two different rows in Suchbaum, each one matching one of the conditions.
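Another way to express conditions that are met by different rows is conditional aggregation: group by the ID and require that each condition is satisfied by at least one row in the group. This is an alternative technique, not the answer above. The sketch assumes gID is unique in Suchbaum, so the different rows come from different SuchbaumZuordnung rows for the same Stammindex; it uses INT keys for brevity and made-up rows in which ID 200 has both a Test2 row and a NULL row, while ID 100 has only the NULL row.

```sql
-- Hypothetical rows; gID is assumed unique, INT is used only for brevity
CREATE TABLE Suchbaum (gID INT PRIMARY KEY, sEbene1 VARCHAR(50), sEbene2 VARCHAR(50), sEbene3 VARCHAR(50));
CREATE TABLE SuchbaumZuordnung (Stammindex INT, gSuchbaumID INT);
INSERT INTO Suchbaum (gID, sEbene1, sEbene2, sEbene3) VALUES
    (1, 'Test1', 'Test2', NULL),
    (2, 'Test1', NULL, NULL);
INSERT INTO SuchbaumZuordnung (Stammindex, gSuchbaumID) VALUES
    (100, 2),             -- only the row without sEbene2
    (200, 1), (200, 2);   -- one row for each condition

-- Conditions shared by all rows go in WHERE; each per-row condition
-- must be met by at least one row of the group (SUM > 0)
SELECT SZ.Stammindex AS ID
FROM SuchbaumZuordnung SZ
JOIN Suchbaum S ON SZ.gSuchbaumID = S.gID
WHERE S.sEbene1 = 'Test1'
  AND S.sEbene3 IS NULL
GROUP BY SZ.Stammindex
HAVING SUM(CASE WHEN S.sEbene2 = 'Test2' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN S.sEbene2 IS NULL THEN 1 ELSE 0 END) > 0;
```

With these rows only ID 200 is returned.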
Since you only have 3 columns to compare across rows, this could easily be handled with a CTE and a windowed distinct count. SQL Server doesn't allow COUNT(DISTINCT ...) with OVER, so the distinct count below is built from two DENSE_RANKs:
WITH CTE AS(
SELECT SZ.Stammindex AS ID,
S.sEbene1,
S.sEbene2,
S.sEbene3,
CONCAT(ISNULL(S.sEbene1,'-'),'|',ISNULL(S.sEbene2,'-'),'|',ISNULL(S.sEbene3,'-')) AS RowKey
FROM SuchbaumZuordnung SZ
LEFT JOIN Suchbaum S ON SZ.gSuchbaumID = S.gID),
Counted AS(
SELECT C.*,
--ascending rank + descending rank - 1 = number of distinct RowKey values per ID
DENSE_RANK() OVER (PARTITION BY C.ID ORDER BY C.RowKey)
+ DENSE_RANK() OVER (PARTITION BY C.ID ORDER BY C.RowKey DESC) - 1 AS DistinctRows
FROM CTE C)
SELECT C.ID,
C.sEbene1,
C.sEbene2,
C.sEbene3
FROM Counted C
WHERE C.DistinctRows > 1;
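SQL Server does not allow DISTINCT in an aggregate that has an OVER clause, so a per-ID distinct count has to be emulated. A common workaround adds an ascending and a descending DENSE_RANK over the same key and subtracts 1: a value ranked r of k distinct values ascending is ranked k - r + 1 descending. The Items table and its rows are hypothetical. Unlike COUNT(DISTINCT ...), DENSE_RANK treats NULL as a value, which is one reason to map NULLs to a placeholder first.

```sql
-- Hypothetical rows: ID 1 has two identical keys, ID 2 has two different keys
CREATE TABLE Items (ID INT, RowKey VARCHAR(50));
INSERT INTO Items (ID, RowKey) VALUES
    (1, 'a'), (1, 'a'),
    (2, 'a'), (2, 'b'), (2, 'b');

-- r_asc + r_desc - 1 = number of distinct RowKey values in the partition
SELECT ID, RowKey,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY RowKey)
     + DENSE_RANK() OVER (PARTITION BY ID ORDER BY RowKey DESC) - 1 AS DistinctRows
FROM Items
ORDER BY ID, RowKey;
```

ID 1 gets DistinctRows = 1 on both of its rows, and ID 2 gets 2 on all three of its rows.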
If it's purely where an ID has more than one row (which could be identical), then you can just use COUNT:
WITH CTE AS(
SELECT SZ.Stammindex AS ID,
S.sEbene1,
S.sEbene2,
S.sEbene3,
COUNT(*) OVER (PARTITION BY SZ.Stammindex) AS [Rows]
FROM SuchbaumZuordnung SZ
LEFT JOIN Suchbaum S ON SZ.gSuchbaumID = S.gID)
SELECT C.ID,
C.sEbene1,
C.sEbene2,
C.sEbene3
FROM CTE C
WHERE C.[Rows] > 1;

How to pivot two rows into two columns

I have the following SQL Query:
select
distinct
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
from
Equipment,
Studies,
Equipment_Reserved
where
Studies.Study = 'MAINT19-01'
and
Equipment.idEquipment = Equipment_Reserved.Equipment_idEquipment
and
Studies.idStudies = Equipment_Reserved.Studies_idStudies
and
Equipment.Type = 'Probe'
This query produces the following results:
Equipment_Attached_To Name
2297 R1-P1
2297 R1-P2
2299 R1-P3
I would like to change it to the following:
Equipment_Attached_To Name1 Name2
2297 R1-P1 R1-P2
2299 R1-P3 NULL
Thanks for your help!
I'd first change your query from the old, legacy JOIN syntax to an explicit join as it makes the query easier to understand:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
I don't think you actually need a PIVOT; I think you can do this with a nested query and the ROW_NUMBER function. In my experience, PIVOT queries often have worse execution plans than nested queries.
Let's add ROW_NUMBER (which requires an ORDER BY, as it's a window function) and a matching ORDER BY on the whole query to keep the output consistent. Let's also use PARTITION BY so the row number resets for each Equipment_Attached_To value:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
ORDER BY
Equipment_Attached_To,
[Name]
This will give output like this:
Equipment_Attached_To Name RowNumber
2297 R1-P1 1
2297 R1-P2 2
2299 R1-P3 1
This can then be split out into explicit columns as shown below. The use of MAX() is arbitrary (we could use MIN() instead); an aggregate is only needed because we're using GROUP BY, and each CASE WHEN ... yields at most one non-NULL value per group anyway, which is the value the aggregate returns since aggregates ignore NULLs.
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
-- the query from above
) AS NumberedProbes
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
So the final query is:
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
SELECT
Equipment_Attached_To,
[Name],
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
(
-- DISTINCT runs here, before ROW_NUMBER, so a probe reserved twice is only numbered once
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
) AS DistinctProbes
) AS NumberedProbes
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
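The ROW_NUMBER plus MAX(CASE ...) pattern can be tried without the three original tables. This sketch uses a hypothetical ProbeReservations table standing in for the join result, holding the three rows from the question plus one made-up repeat of R1-P1. Because DISTINCT is evaluated after window functions in a SELECT, the sketch removes duplicates in an inner query before numbering; otherwise a repeated probe could fill both Name1 and Name2.

```sql
-- Hypothetical stand-in for the Equipment / Equipment_Reserved / Studies join
CREATE TABLE ProbeReservations (Equipment_Attached_To INT, Name VARCHAR(50));
INSERT INTO ProbeReservations (Equipment_Attached_To, Name) VALUES
    (2297, 'R1-P1'), (2297, 'R1-P2'), (2299, 'R1-P3'),
    (2297, 'R1-P1');  -- made-up repeat reservation of the same probe

SELECT Equipment_Attached_To,
       MAX(CASE WHEN RowNumber = 1 THEN Name END) AS Name1,
       MAX(CASE WHEN RowNumber = 2 THEN Name END) AS Name2
FROM (
    -- number the distinct names within each Equipment_Attached_To
    SELECT Equipment_Attached_To, Name,
           ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY Name) AS RowNumber
    FROM (SELECT DISTINCT Equipment_Attached_To, Name FROM ProbeReservations) AS d
) AS numbered
GROUP BY Equipment_Attached_To
ORDER BY Equipment_Attached_To;
```

It returns 2297 with R1-P1 and R1-P2, and 2299 with R1-P3 and NULL, matching the desired output in the question.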
Let's start with some basics.
To make the code easier to read, I added aliases to the tables using their initials.
Then I converted the old join syntax, which is partly deprecated, to the explicit JOIN syntax that has been standard since 1992 (27 years, and people still use the old syntax).
Finally, since there are at most 2 names per Equipment_Attached_To, we can use MIN and MAX to separate them into 2 columns.
And because we're using aggregate functions, we remove the DISTINCT and use GROUP BY.
The code now looks like this:
SELECT er.Equipment_Attached_To,
--Gets the alphabetically first name for the id
MIN( e.Name) AS Name1,
--If the MAX is equal to the MIN, returns a NULL. If not, it returns the second value.
NULLIF( MAX(e.Name), MIN( e.Name)) AS Name2
FROM Equipment e
JOIN Equipment_Reserved er ON e.idEquipment = er.Equipment_idEquipment
JOIN Studies s ON s.idStudies = er.Studies_idStudies
WHERE s.Study = 'MAINT19-01'
AND e.Type = 'Probe'
GROUP BY er.Equipment_Attached_To;
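The MIN / NULLIF(MAX, MIN) idea can be checked on the same kind of data. This sketch again uses a hypothetical ProbeReservations table in place of the joins, including a made-up repeat of R1-P1 to show that duplicates don't affect MIN and MAX. The trick only holds while each Equipment_Attached_To has at most two distinct names; a third name in the middle of the sort order would be silently dropped.

```sql
-- Hypothetical stand-in for the joined rows
CREATE TABLE ProbeReservations (Equipment_Attached_To INT, Name VARCHAR(50));
INSERT INTO ProbeReservations (Equipment_Attached_To, Name) VALUES
    (2297, 'R1-P1'), (2297, 'R1-P2'), (2299, 'R1-P3'),
    (2297, 'R1-P1');  -- made-up repeat; MIN and MAX ignore it

SELECT Equipment_Attached_To,
       MIN(Name) AS Name1,                    -- alphabetically first name
       NULLIF(MAX(Name), MIN(Name)) AS Name2  -- NULL when only one distinct name exists
FROM ProbeReservations
GROUP BY Equipment_Attached_To
ORDER BY Equipment_Attached_To;
```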

3 Statements into a view

I have 3 SQL statements that I would like to combine into a view that returns 3 columns, each representing a count.
Here are my statements:
SELECT Count(*)
FROM PlaceEvents
WHERE PlaceID = {placeID} AND EndDateTimeUTC >= GETUTCDATE()
SELECT Count(*)
FROM PlaceAnnouncements
WHERE PlaceID = {placeID}
SELECT Count(*)
FROM PlaceFeedback
WHERE PlaceID = {placeID} AND IsSystem = 0
I know how to create a basic view, but how do I create one that returns those 3 columns plus PlaceID as a column to use for filtering?
I would like to do the following to return the proper data:
SELECT *
FROM vMyCountView
WHERE PlaceID = 1
CREATE VIEW vMyCountView AS
(...) AS ActiveEvents,
(...) AS Announcements,
(...) AS UserFeedback,
PlaceID
I'd rather use a function than a view.
This allows you to pass in any parameters you like (I assumed placeId is an INT) and deal with them within your query. Using it is just as easy as using a view:
CREATE FUNCTION MyCountFunction(@PlaceID INT)
RETURNS TABLE
AS
RETURN
SELECT
(SELECT Count(*) FROM PlaceEvents WHERE PlaceID = @PlaceID AND EndDateTimeUTC >= GETUTCDATE()) AS ActiveEvents
,(SELECT Count(*) FROM PlaceAnnouncements WHERE PlaceID = @PlaceID) AS Announcements
,(SELECT Count(*) FROM PlaceFeedback WHERE PlaceID = @PlaceID AND IsSystem = 0) AS UserFeedback
,@PlaceID AS PlaceID;
GO
And this is how you call it. You can use this for JOINs or with APPLY also...
SELECT * FROM dbo.MyCountFunction(3);
You can combine them as multiple select sub-queries.
CREATE VIEW vMyCountView AS
SELECT
(SELECT Count(*) FROM PlaceEvents
WHERE PlaceID = s.placeID AND EndDateTimeUTC >= GETUTCDATE()) AS ActiveEvents,
(SELECT Count(*) FROM PlaceAnnouncements
WHERE PlaceID = s.placeID) AS Announcements,
(SELECT Count(*) FROM PlaceFeedback
WHERE PlaceID = s.placeID AND IsSystem = 0) AS UserFeedback,
placeID
from Sometable s
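A self-contained sketch of that view, assuming a hypothetical Places table (one row per place) in the role of Sometable. It uses CURRENT_TIMESTAMP instead of GETUTCDATE() so the script also runs in SQLite, where CURRENT_TIMESTAMP is UTC; in SQL Server keep GETUTCDATE(), because CURRENT_TIMESTAMP there returns the server's local time, and run CREATE VIEW in its own batch (for example after GO). All rows are made up, including a past event and an IsSystem = 1 feedback entry that should not be counted.

```sql
-- Hypothetical schema with only the columns the view needs
CREATE TABLE Places (PlaceID INT PRIMARY KEY);
CREATE TABLE PlaceEvents (PlaceID INT, EndDateTimeUTC DATETIME);
CREATE TABLE PlaceAnnouncements (PlaceID INT);
CREATE TABLE PlaceFeedback (PlaceID INT, IsSystem INT);
INSERT INTO Places (PlaceID) VALUES (1), (2);
INSERT INTO PlaceEvents (PlaceID, EndDateTimeUTC) VALUES
    (1, '2999-01-01T00:00:00'), (1, '2999-06-01T00:00:00'),
    (1, '2000-01-01T00:00:00');  -- already ended, not counted
INSERT INTO PlaceAnnouncements (PlaceID) VALUES (1);
INSERT INTO PlaceFeedback (PlaceID, IsSystem) VALUES (1, 0), (1, 0), (1, 0), (1, 1);

-- One row per place; each correlated subquery counts that place's rows
CREATE VIEW vMyCountView AS
SELECT
    (SELECT COUNT(*) FROM PlaceEvents e
     WHERE e.PlaceID = p.PlaceID AND e.EndDateTimeUTC >= CURRENT_TIMESTAMP) AS ActiveEvents,
    (SELECT COUNT(*) FROM PlaceAnnouncements a WHERE a.PlaceID = p.PlaceID) AS Announcements,
    (SELECT COUNT(*) FROM PlaceFeedback f
     WHERE f.PlaceID = p.PlaceID AND f.IsSystem = 0) AS UserFeedback,
    p.PlaceID
FROM Places p;

SELECT * FROM vMyCountView WHERE PlaceID = 1;
```

Because the view is driven by Places, a place with nothing in the three tables still gets a row with three zeros (place 2 here).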
By definition, a view is a single SELECT statement. You can use JOIN, UNION and so on if it makes sense for your business logic, provided CREATE VIEW is the only statement in the batch.
You can make a view like that with GROUP BY:
SELECT
PlaceId
, Count(peId) AS ActiveEvents
, COUNT(paId) AS Announcements
, COUNT(fbId) AS UserFeedback
FROM (
SELECT PlaceId, 1 AS peId, NULL AS paId, NULL AS fbId
FROM PlaceEvents
WHERE EndDateTimeUTC >= GETUTCDATE()
UNION ALL
SELECT PlaceId, NULL AS peId, 1 AS paId, NULL AS fbId
FROM PlaceAnnouncements
UNION ALL
SELECT PlaceId, NULL AS peId, NULL AS paId, 1 AS fbId
FROM PlaceFeedback
WHERE IsSystem = 0
) src
GROUP BY PlaceId
The idea behind this SELECT, which is easy to turn into a view, is to stack rows from the three tables into one set for counting and then group them all at once. Because COUNT(column) ignores NULLs, each of the three COUNTs only counts the rows that came from its own table.
If you have two active events, one announcement, and three feedback entries for place ID 123, the three inner SELECTs would produce this:
PlaceId peId paId fbId
------- ---- ---- ----
123 1 NULL NULL
123 1 NULL NULL
123 NULL 1 NULL
123 NULL NULL 1
123 NULL NULL 1
123 NULL NULL 1
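The UNION ALL plus GROUP BY query can be run against the example above. This sketch creates the three tables with only the columns the query needs and loads the two active events, one announcement and three feedback entries for place 123, plus a made-up ended event and an IsSystem = 1 entry that the WHERE clauses should exclude. It again substitutes CURRENT_TIMESTAMP for GETUTCDATE() so it also runs in SQLite; keep GETUTCDATE() in SQL Server.

```sql
CREATE TABLE PlaceEvents (PlaceId INT, EndDateTimeUTC DATETIME);
CREATE TABLE PlaceAnnouncements (PlaceId INT);
CREATE TABLE PlaceFeedback (PlaceId INT, IsSystem INT);
INSERT INTO PlaceEvents (PlaceId, EndDateTimeUTC) VALUES
    (123, '2999-01-01T00:00:00'), (123, '2999-06-01T00:00:00'),
    (123, '2000-01-01T00:00:00');  -- made-up ended event, filtered out
INSERT INTO PlaceAnnouncements (PlaceId) VALUES (123);
INSERT INTO PlaceFeedback (PlaceId, IsSystem) VALUES
    (123, 0), (123, 0), (123, 0),
    (123, 1);                      -- made-up system entry, filtered out

SELECT PlaceId,
       COUNT(peId) AS ActiveEvents,   -- COUNT(column) skips NULLs, so each
       COUNT(paId) AS Announcements,  -- column only counts the rows that
       COUNT(fbId) AS UserFeedback    -- came from its own table
FROM (
    SELECT PlaceId, 1 AS peId, NULL AS paId, NULL AS fbId
    FROM PlaceEvents WHERE EndDateTimeUTC >= CURRENT_TIMESTAMP
    UNION ALL
    SELECT PlaceId, NULL, 1, NULL FROM PlaceAnnouncements
    UNION ALL
    SELECT PlaceId, NULL, NULL, 1 FROM PlaceFeedback WHERE IsSystem = 0
) AS src
GROUP BY PlaceId;
```

It returns one row: 123, 2, 1, 3. Unlike a view driven by a table of places, this query returns no row at all for a place that has no events, announcements or feedback.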

Find matching sets in a database table

I have a junction table in a (SQL Server 2014) database with columns FirstID and SecondID. Given a specific FirstID, I'd like to find all other FirstIDs from the table that have an equivalent set of SecondIDs (even if that set is empty).
Sample Data:
FirstId SecondId
1 1
1 2
2 3
3 1
3 2
... ...
In the case of the sample data, if I specified FirstID = 1, then I'd expect 3 to appear in the result set.
I've tried the following so far, which works pretty well except for empty sets:
SELECT FirstSecondEqualSet.FirstId
FROM FirstSecond FirstSecondOriginal
INNER JOIN FirstSecond FirstSecondEqualSet ON FirstSecondOriginal.SecondId = FirstSecondEqualSet.SecondId
WHERE FirstSecondOriginal.FirstId = @FirstId
AND FirstSecondEqualSet.FirstId != @FirstId
GROUP BY FirstSecondEqualSet.FirstId
HAVING COUNT(1) = (SELECT COUNT(1) FROM FirstSecond WHERE FirstSecond.FirstId = @FirstId)
I think it's somehow related to Relational Division with no Remainder (RDNR). See this great article by Dwain Camps for reference.
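DECLARE @firstId INT = 1
SELECT
f2.FirstId
FROM FirstSecond f1
INNER JOIN FirstSecond f2
ON f2.SecondId = f1.SecondId
AND f1.FirstId <> f2.FirstId
WHERE
f1.FirstId = @firstId
GROUP BY f2.FirstId
HAVING COUNT(*) = (SELECT COUNT(*) FROM FirstSecond WHERE FirstId = @firstId)
-- no remainder: f2's own set must not contain extra SecondIds
AND COUNT(*) = (SELECT COUNT(*) FROM FirstSecond fs WHERE fs.FirstId = f2.FirstId)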
Here is one approach. It counts the number of values for each firstid and then joins on the secondid.
select fs2.firstid
from (select fs1.*, count(*) over (partition by firstid) as numseconds
from firstsecond fs1
where fs1.firstid = @firstid
) fs1 join
(select fs2.*, count(*) over (partition by firstid) as numseconds
from firstsecond fs2
) fs2
on fs1.secondid = fs2.secondid and fs1.numseconds = fs2.numseconds
where fs2.firstid <> @firstid
group by fs2.firstid
having count(*) = max(fs1.numseconds);
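Comparing only the size of the given FirstId's set with the number of matches also accepts a FirstId whose set is a strict superset; requiring both sets to have the same size rules that out, which is what "no remainder" means. A sketch assuming (FirstId, SecondId) is unique, using the question's sample rows plus a made-up FirstId 4 whose set {1, 2, 5} contains FirstId 1's set; @FirstId is replaced by the literal 1 so it runs outside SQL Server. Like the queries above, it cannot return FirstIds that have no rows in the junction table; that would need a separate table listing every FirstId.

```sql
CREATE TABLE FirstSecond (FirstId INT, SecondId INT);
INSERT INTO FirstSecond (FirstId, SecondId) VALUES
    (1, 1), (1, 2), (2, 3), (3, 1), (3, 2),
    (4, 1), (4, 2), (4, 5);  -- made-up superset of FirstId 1's set

WITH SetSizes AS (
    SELECT FirstId, COUNT(*) AS SetSize
    FROM FirstSecond
    GROUP BY FirstId
)
SELECT f2.FirstId
FROM FirstSecond f1
JOIN FirstSecond f2 ON f2.SecondId = f1.SecondId AND f2.FirstId <> f1.FirstId
JOIN SetSizes s1 ON s1.FirstId = f1.FirstId
JOIN SetSizes s2 ON s2.FirstId = f2.FirstId
WHERE f1.FirstId = 1
  AND s2.SetSize = s1.SetSize       -- same size: no extra SecondIds
GROUP BY f2.FirstId, s1.SetSize
HAVING COUNT(*) = s1.SetSize;       -- every SecondId of FirstId 1 matched
```

It returns only 3; FirstId 4 shares both SecondIds but is rejected because its set has three members.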

Remove duplicates in SQL Result set of ONE table

Afternoon/Evening all,
I'm looking for the final touches to the below query. I need to remove the duplicate occurrences of a column in a particular row. Currently using the below SQL:
SELECT CBNEW.*
FROM CallbackNewID CBNEW
INNER JOIN (SELECT IDNEW, MAX(CallbackDate) AS MaxDate
FROM CallbackNewID
GROUP BY IDNEW) AS groupedCBNEW
ON (CBNEW.CallbackDate = groupedCBNEW.MaxDate) AND (CBNEW.IDNEW = groupedCBNEW.IDNEW);
My result set looks like the below
ID RecID Comp Rem Date_ IDNEW IDOLD CB? CallbackDate
138618 83209 1 0 2012-03-16 12:40:00 83209 83209 2 16-Mar-12
138619 83209 1 0 2012-03-16 12:40:00 83209 83209 2 16-Mar-12
110470 83799 1 0 2011-07-27 11:46:00 83799 83799 10 27-Jul-11
110471 83799 1 0 2011-07-27 11:46:00 83799 83799 10 27-Jul-11
However, this gives me duplicate values in the CallbackDate and IDNEW columns, because the table contains rows with different primary keys but the same IDNEW and CallbackDate values.
If I dump this result into Excel, I can just use remove duplicates on the first ID column, and the problem's solved.
But what I want to do is make sure my result only includes the FIRST instance of the ID column, where IDNEW and CallbackDate are duplicated.
I'm sure I just need to append a tiny piece of SQL, but I haven't been able to find the answer so far.
Your help is very much appreciated.
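Try computing MIN(ID) over only the rows that have the latest CallbackDate for each IDNEW, and then join on that ID:
SELECT CBNEW.*
FROM CallbackNewID CBNEW
INNER JOIN (SELECT c.IDNEW, MIN(c.ID) AS MinId
-- restrict to the latest date first; a MIN(ID) over all rows could belong to an older callback and match nothing
FROM CallbackNewID c
INNER JOIN (SELECT IDNEW, MAX(CallbackDate) AS MaxDate
FROM CallbackNewID
GROUP BY IDNEW) AS m
ON (c.IDNEW = m.IDNEW) AND (c.CallbackDate = m.MaxDate)
GROUP BY c.IDNEW) AS groupedCBNEW
ON (CBNEW.ID = groupedCBNEW.MinId);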
Here is a rather "brute force" approach. It just takes the results of your original query and does Min() on [ID], Max() on [Comp] and [Rem], and GROUP BY on everything else:
SELECT
Min(t.ID) AS MinOfID,
t.RecID,
Max(t.Comp) AS MaxOfComp,
Max(t.Rem) AS MaxOfRem,
t.Date_,
t.IDNEW,
t.IDOLD,
t.[CB?],
t.CallbackDate
FROM
(
SELECT CBNEW.*
FROM
CallbackNewID CBNEW
INNER JOIN
(
SELECT IDNEW, MAX(CallbackDate) AS MaxDate
FROM CallbackNewID
GROUP BY IDNEW
) AS groupedCBNEW
ON (CBNEW.CallbackDate = groupedCBNEW.MaxDate)
AND (CBNEW.IDNEW = groupedCBNEW.IDNEW)
) t
GROUP BY
t.RecID,
t.Date_,
t.IDNEW,
t.IDOLD,
t.[CB?],
t.CallbackDate;
It might not be terribly elegant, but if it works...
In MS SQL Server, I think you are looking for the ROW_NUMBER() function.
Something like this should help you get what you are looking for:
SELECT
X.*
FROM
(
SELECT
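-- CBNEW.* (not *) so IDNEW isn't selected twice; ROW_NUMBER needs an ORDER BY, and ID keeps the first instance
CBNEW.*,
ROW_NUMBER() OVER (PARTITION BY CBNEW.IDNEW, groupedCBNEW.MaxDate ORDER BY CBNEW.ID) [row_num]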
FROM
CallbackNewID CBNEW
INNER JOIN
(
SELECT
IDNEW,
MAX(CallbackDate) AS MaxDate
FROM
CallbackNewID
GROUP BY
IDNEW
) AS groupedCBNEW ON (CBNEW.CallbackDate = groupedCBNEW.MaxDate) AND (CBNEW.IDNEW = groupedCBNEW.IDNEW)
) X
WHERE
X.row_num = 1
SELECT
A.*
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY IDNEW ORDER BY CallbackDate DESC, ID)
AS [row_num]
FROM CallbackNewID
) A
WHERE
A.row_num = 1
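A self-contained sketch of the ROW_NUMBER approach on the question's four rows (reduced to ID, IDNEW and CallbackDate) plus one made-up older callback for IDNEW 83209. Adding ID as a second ORDER BY key makes the tie between rows with the same latest CallbackDate deterministic, so the lowest ID, the "first instance" the question asks for, is kept.

```sql
-- The question's rows, reduced to the columns that matter here
CREATE TABLE CallbackNewID (ID INT PRIMARY KEY, IDNEW INT, CallbackDate DATE);
INSERT INTO CallbackNewID (ID, IDNEW, CallbackDate) VALUES
    (138618, 83209, '2012-03-16'), (138619, 83209, '2012-03-16'),
    (110470, 83799, '2011-07-27'), (110471, 83799, '2011-07-27'),
    (100000, 83209, '2011-01-01');  -- made-up older callback for 83209

-- Number the rows per IDNEW: latest CallbackDate first, then lowest ID
SELECT ID, IDNEW, CallbackDate
FROM (
    SELECT ID, IDNEW, CallbackDate,
           ROW_NUMBER() OVER (PARTITION BY IDNEW
                              ORDER BY CallbackDate DESC, ID) AS row_num
    FROM CallbackNewID
) AS ranked
WHERE row_num = 1
ORDER BY IDNEW;
```

It keeps 138618 for IDNEW 83209 and 110470 for IDNEW 83799; the made-up row 100000 is ignored because its CallbackDate is older.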