Find matching sets in a database table - sql

I have a junction table in a (SQL Server 2014) database with columns FirstID and SecondID. Given a specific FirstID, I'd like to find all other FirstIDs from the table that have an equivalent set of SecondIDs (even if that set is empty).
Sample Data:
FirstId SecondId
1 1
1 2
2 3
3 1
3 2
... ...
In the case of the sample data, if I specified FirstID = 1, then I'd expect 3 to appear in the result set.
I've tried the following so far, which works pretty well except for empty sets:
SELECT FirstSecondEqualSet.FirstId
FROM FirstSecond FirstSecondOriginal
INNER JOIN FirstSecond FirstSecondEqualSet ON FirstSecondOriginal.SecondId = FirstSecondEqualSet.SecondId
WHERE FirstSecondOriginal.FirstId = #FirstId 
AND FirstSecondEqualSet.FirstId != #FirstId
GROUP BY FirstSecondEqualSet.FirstId
HAVING COUNT(1) = (SELECT COUNT(1) FROM FirstSecond WHERE FirstSecond.FirstId = #FirstId)

I think it's somehow related to Relational Division with no Remainder (RDNR). See this great article by Dwain Camps for reference.
DECLARE #firstId INT = 1
SELECT
f2.FirstId
FROM FirstSecond f1
INNER JOIN FirstSecond f2
ON f2.SecondId = f1.SecondId
AND f1.FirstId <> f2.FirstId
WHERE
f1.FirstId = #firstId
GROUP BY f2.FirstId
HAVING COUNT(*) = (SELECT COUNT(*) FROM FirstSecond WHERE FirstId = #firstId)

Here is one approach. It counts the number of values for each firstid and then joins on the secondid.
select fs2.firstid
from (select fs1.*, count(*) over (partition by firstid) as numseconds
from firstsecond fs1
where fs1.firstid = #firstid
) fs1 join
(select fs2.*, count(*) over (partition by firstid) as numseconds
from firstsecond fs2
) fs2
on fs1.secondid = fs2.secondid and fs1.numseconds = fs2.numseconds
group by fs2.firstid
having count(*) = max(fs1.numseconds);

Related

SQL SELECT ID WHERE rows with the same ID have different Values

I need some help creating a SQL statement across rows.
SELECT SZ.Stammindex AS ID, S.sEbene1, S.sEbene2, S.sEbene3
FROM SuchbaumZuordnung SZ
LEFT JOIN Suchbaum S
ON SZ.gSuchbaumID = S.gID
WHERE (S.sEbene1 IN ('Test1')
AND (S.sEbene2 IN ('Test2') OR S.sEbene2 IS NULL)
AND S.sEbene3 IS NULL)
As you can see in the screenshot, I selected ID=10004 and ID=10005. But actually I only want ID=10005 to show up. I am trying to filter across Rows as already mentioned.
My goal is to get all the IDs, where all the conditions are connected with "AND", something like this:
WHERE (sEbene1 IN ('Test1')
AND (sEbene2 IN ('Test2') *AND* sEbene2 IS NULL)
AND sEbene3 IS NULL)
But this will return nothing.
Edit
I hope you guys can help me.
I suspect that you want:
SELECT SZ.Stammindex AS ID
FROM SuchbaumZuordnung SZ
WHERE EXISTS (SELECT 1
FROM Suchbaum S
WHERE SZ.gSuchbaumID = S.gID AND
S.sEbene1 IN ('Test1') AND
sEbene2 IN ('Test2')
) AND
EXISTS (SELECT 1
FROM Suchbaum S
WHERE SZ.gSuchbaumID = S.gID AND
S.sEbene2 IS NULL AND
S.sEbene3 IS NULL
);
This is looking for two different rows in Suchbaum, each one matching one of the conditions.
Considering you only have 3 columns you want to check different rows, it seems like this would be easily serviced with a CTE and a Windowed COUNT:
WITH CTE AS(
SELECT SZ.Stammindex AS ID,
S.sEbene1, --Guessed the table alias
S.sEbene2, --Guessed the table alias
S.sEbene3, --Guessed the table alias
COUNT(DISTINCT CONCAT(ISNULL(S.S.sEbene1,'-'),ISNULL(S.sEbene2,'-'),ISNULL(S.sEbene3,'-'))) OVER (PARTITION BY SZ.Stammindex) AS DistinctRows
FROM SuchbaumZuordnung SZ
LEFT JOIN Suchbaum S ON SZ.gSuchbaumID = S.gID) --This was missing the ON in your sample
SELECT C.Stammindex,
C.sEbene1,
C.sEbene2,
C.sEbene3
FROM CTE C
WHERE C.DistinctRows > 1;
If it's purely where an ID has more than 1 rows (which could be identical) then you can just use COUNT:
WITH CTE AS(
SELECT SZ.Stammindex AS ID,
S.sEbene1, --Guessed the table alias
S.sEbene2, --Guessed the table alias
S.sEbene3, --Guessed the table alias
COUNT(*) OVER (PARTITION BY SZ.Stammindex) AS [Rows]
FROM SuchbaumZuordnung SZ
LEFT JOIN Suchbaum S ON SZ.gSuchbaumID = S.gID)
SELECT C.Stammindex,
C.sEbene1,
C.sEbene2,
C.sEbene3
FROM CTE C
WHERE C.[Rows] > 1;

How to pivot two rows into two columns

I have the following SQL Query:
select
distinct
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
from
Equipment,
Studies,
Equipment_Reserved
where
Studies.Study = 'MAINT19-01'
and
Equipment.idEquipment = Equipment_Reserved.Equipment_idEquipment
and
Studies.idStudies = Equipment_Reserved.Studies_idStudies
and
Equipment.Type = 'Probe'
This query produces the following results:
Equipment_Attached_To Name
2297 R1-P1
2297 R1-P2
2299 R1-P3
I would like to change it to the following:
Equipment_Attached_To Name1 Name2
2297 R1-P1 R1-P2
2299 R1-P3 NULL
Thanks for your help!
I'd first change your query from the old, legacy JOIN syntax to an explicit join as it makes the query easier to understand:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
I don't think you actually need a PIVOT - I think you can do this with a nested query with the ROW_NUMBER function. I've seen that PIVOT queries often have worse query execution plans than nested-queries.
Let's add ROW_NUMBER (which require an ORDER BY as it's a windowing-function) and a matching ORDER BY in the whole query to make it consistent). Let's also use PARTITION BY so it resets the row-number for each Equipment_Attached_To value:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
ORDER BY
Equipment_Attached_To,
[Name]
This will give output like this:
Equipment_Attached_To Name RowNumber
2297 R1-P1 1
2297 R1-P2 2
2299 R1-P3 1
This can then be split out into explicit columns like so below. The use of MAX() is arbitrary (we could use MIN() instead) and only because we're dealing with a GROUP BY and because the CASE WHEN... restricts the input set to just 1 row anyway.
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
-- the query from above
)
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
So the final query is:
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
)
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
Let's start with some basics.
To facilitate reading the code, I added alias to the tables using their initials.
Then, I converted the old join syntax which is partly deprecated to use the standard syntax since 1992 (27 years and people still use the old syntax).
Finally, since there are only 2 possible values, we can use MIN and MAX to separate them in 2 columns.
And because we're using aggregate functions, we remove the DISTINCT and use GROUP BY
The code now looks like this:
SELECT er.Equipment_Attached_To,
--Gets the first row for the id
MIN( e.Name) AS Name1,
--If the MAX is equal to the MIN, returns a NULL. If not, it returns the second value.
NULLIF( MAX(e.Name), MIN( e.Name)) AS Name2
FROM Equipment e
JOIN Studies s ON s.idStudies = er.Studies_idStudies
JOIN Equipment_Reserved er ON e.idEquipment = er.Equipment_idEquipment
WHERE s.Study = 'MAINT19-01'
AND e.Type = 'Probe'
GROUP BY er.Equipment_Attached_To;

Distinct keyword not fetching results in Oracle

I have the following query where I unique records for patient_id, meaning patient_id should not be duplicate. Each time I try executing the query, seems like the DB hangs or it takes hours to execute, I'm not sure. I need my records to load quickly. Any quick resolution will be highly appreciated.
SELECT DISTINCT a.patient_id,
a.study_id,
a.procstep_id,
a.formdata_seq,
0,
(SELECT MAX(audit_id)
FROM audit_info
WHERE patient_id =a.patient_id
AND study_id = a.study_id
AND procstep_id = a.procstep_id
AND formdata_seq = a.formdata_seq
) AS data_session_id
FROM frm_rg_ps_rg a,
PATIENT_STUDY_STEP pss
WHERE ((SELECT COUNT(*)
FROM frm_rg_ps_rg b
WHERE a.patient_id = b.patient_id
AND a.formdata_seq = b.formdata_seq
AND a.psdate IS NOT NULL
AND b.psdate IS NOT NULL
AND a.psresult IS NOT NULL
AND b.psresult IS NOT NULL) = 1)
OR NOT EXISTS
(SELECT *
FROM frm_rg_ps_rg c
WHERE a.psdate IS NOT NULL
AND c.psdate IS NOT NULL
AND a.psresult IS NOT NULL
AND c.psresult IS NOT NULL
AND a.patient_id = c.patient_id
AND a.formdata_seq = c.formdata_seq
AND a.elemdata_seq! =c.elemdata_seq
AND a.psresult != c.psresult
AND ((SELECT (a.psdate - c.psdate) FROM dual)>=7
OR (SELECT (a.psdate - c.psdate) FROM dual) <=-7)
)
AND a.psresult IS NOT NULL
AND a.psdate IS NOT NULL;
For start, you have a cartesian product with PATIENT_STUDY_STEP (pss).
It is not connected to anything.
select *
from (select t.*
,count (*) over (partition by patient_id) as cnt
from frm_rg_ps_rg t
) t
where cnt = 1
;

Need help creating SQL query from example of data

I have a database table below.
And I want to get list of all DBKey that have: at least one entry with Staled=1, and the last entry is Staled=0
The list should not contain DBKey that has only Staled=0 OR Staled=1.
In this example, the list would be: DBKey=2 and DBKey=3
I think this should do the trick:
SELECT DISTINCT T.DBKey
FROM TABLE T
WHERE
-- checks that the DBKey has at least one entry with Staled = 1
EXISTS (
SELECT DISTINCT Staled
FROM TABLE
WHERE DBKey = T.DBKey
AND Staled = 1
)
-- checks that the last Staled entry for this DBKey is 0
AND EXISTS (
SELECT DISTINCT Staled
FROM TABLE
WHERE DBKey = T.DBKey
AND Staled = 0
AND EntryDateTime = (
SELECT MAX(EntryDateTime)
FROM TABLE
WHERE DBKey = T.DBKey
)
)
Here is a working SQLFiddle of the query, using your sample data.
The idea is to use EXISTS to look for those individual conditions that you've described. I've added comments to my code to explain what each does.
Should be done with a simple JOIN... Starting FIRST with any 1 qualifiers, joined to itself by same key AND 0 staled qualifier AND the 0 record has a higher date. Ensure you have an index on ( DBKey, Staled, EntryDateTime )
SELECT
YT.DBKey,
MAX( YT.EntryDateTime ) as MaxStaled1,
MAX( YT2.EntryDateTime ) as MaxStaled0
from
YourTable YT
JOIN YourTable YT2
ON YT.DBKey = YT2.DBKey
AND YT2.Staled = 0
AND YT.EntryDateTime < YT2.EntryDateTime
where
YT.Staled = 1
group by
YT.DBKey
having
MAX( YT.EntryDateTime ) < MAX( YT2.EntryDateTime )
Maybe this:
With X as
(
Select Row_Number() Over (Partition By DBKey Order By EntryDateTime Desc) RN, DBKey, Staled
From table
)
Select *
From X
Where rn = 1 and staled = 0 and
Exists (select 1 from x x2 where x2.dbkey = x.dbkey and Staled = 1)

T-sql problem with running sum

I am trying to write T-sql script which will find "open" records for one table
Structure of data is following
Id (int PK) Ts (datetime) Art_id (int) Amount (float)
1 '2009-01-01' 1 1
2 '2009-01-05' 1 -1
3 '2009-01-10' 1 1
4 '2009-01-11' 1 -1
5 '2009-01-13' 1 1
6 '2009-01-14' 1 1
7 '2009-01-15' 2 1
8 '2009-01-17' 2 -1
9 '2009-01-18' 2 1
According to my needs I am trying to show only records after last sum for every one articles where 0 sorting by date of last running sum of zero value. So I am trying to abstract (show) records 5 and 6 for Art_id=1 and record 9 for art_id=2. I am using MSSQL2005 and my table has around 30K records with 6000 distinct values of ART_ID.
In this solution I simply want to find all the rows where there isn't a subsequent row for that Art_id where the running sum was 0. I am assuming we can use the ID as a better tiebreaker than TS, since two rows can come in with the same timestamp but they will get sequential identity values.
;WITH base AS
(
SELECT
ID, Art_id, TS, Amount,
RunningSum = Amount + COALESCE
(
(
SELECT SUM(Amount)
FROM dbo.foo
WHERE Art_id = f.Art_id
AND ID < f.ID
)
, 0
)
FROM dbo.[table name] AS f
)
SELECT ID, Art_id, TS, Amount
FROM base AS b1
WHERE NOT EXISTS
(
SELECT 1
FROM base AS b2
WHERE Art_id = b1.Art_id
AND ID >= b1.ID
AND RunningSum = 0
)
ORDER BY ID;
Complete working query:
SELECT
*
FROM TABLE_NAME E
JOIN
(SELECT
C.ART_ID,
MAX(TS) MAX_TS
FROM
(SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A) C
WHERE C.ROW_SUM = 0
GROUP BY C.ART_ID) D
ON
(D.ART_ID = E.ART_ID) AND
(E.TS >= D.MAX_TS)
First we calculate running sums for every row:
SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A
Then we look for last article with 0:
SELECT
C.ART_ID,
MAX(TS) MAX_TS
FROM
(SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A) C
WHERE C.ROW_SUM = 0
GROUP BY C.ART_ID
You can find all rows where the running sum is zero with:
select cur.id, cur.art_id
from #articles cur
left join #articles prev
on prev.art_id = cur.art_id
and prev.id <= cur.id
group by cur.id, cur.art_id
having sum(prev.amount) = 0
Then you can query all rows that come after the rows with a zero running sum:
select a.*
from #articles a
left join (
select cur.id, cur.art_id, running = sum(prev.amount)
from #articles cur
left join #articles prev
on prev.art_id = cur.art_id
and prev.ts <= cur.ts
group by cur.id, cur.art_id
having sum(prev.amount) = 0
) later_zero_running on
a.art_id = later_zero_running.art_id
and a.id <= later_zero_running.id
where later_zero_running.id is null
The LEFT JOIN in combination with the WHERE says: there can not be a row after this row, where the running sum is zero.