Grouping 'groups' with common element - sql

I have a table like below, containing a group_id and some value.
group_id | value
---------+-------
1 | A
1 | B
2 | C
2 | D
2 | A
3 | E
3 | C
4 | G
4 | H
What I want is a unique number for each set of groups that are connected, like this:
Groups 1 and 2 share the element A, and groups 2 and 3 share the element C, so together they form one big group.
master_id | group_id | value
----------+----------+--------
1 | 1 | A
1 | 1 | B
1 | 2 | C
1 | 2 | D
1 | 2 | A
1 | 3 | E
1 | 3 | C
2 | 4 | G
2 | 4 | H
How can I get this master_id?

Calculating the master group is a graph-walking problem, which implies a recursive CTE. I would approach this by:
Generating edges between the groups, based on the values.
Traversing the edges without visiting previous groups.
The calculation of the master group is then the minimum of the visited groups for each group.
In SQL, this looks like:
with edges as (
select distinct t1.group_id as group_id_1, t2.group_id as group_id_2
from t t1 join
t t2
on t1.value = t2.value
),
cte as (
select e.group_id_1, e.group_id_2, convert(varchar(max), concat(',', group_id_1, ',', group_id_2, ',')) as visited, 1 as lev
from edges e
union all
select cte.group_id_1, e.group_id_2,
concat(visited, e.group_id_2, ','), lev + 1
from cte join
edges e
on e.group_id_1 = cte.group_id_2
where cte.visited not like concat('%,', e.group_id_2, ',%') and lev < 5
)
select group_id_1, dense_rank() over (order by min(group_id_2)) as master_group
from cte
group by group_id_1;
Here is a db<>fiddle.
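For comparison, the same connected-groups idea can be sketched outside the database with a union-find structure. This is only an illustrative sketch, not part of the answer above: the function name `master_ids` and the tuple layout `(group_id, value)` are assumptions, and the sample rows are the ones from the question.

```python
def master_ids(rows):
    """rows: iterable of (group_id, value) pairs. Returns {group_id: master_id}."""
    parent = {}

    def find(g):
        # Follow parent links to the representative, halving the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller group_id as representative, like MIN() in the SQL.
            parent[max(ra, rb)] = min(ra, rb)

    first_group_for_value = {}
    for group_id, value in rows:
        parent.setdefault(group_id, group_id)
        if value in first_group_for_value:
            # Two groups share this value, so they belong to the same master group.
            union(group_id, first_group_for_value[value])
        else:
            first_group_for_value[value] = group_id

    # Dense-rank the representatives so master ids run 1, 2, 3, ...
    roots = sorted({find(g) for g in parent})
    rank = {root: i for i, root in enumerate(roots, start=1)}
    return {g: rank[find(g)] for g in parent}


rows = [(1, "A"), (1, "B"), (2, "C"), (2, "D"), (2, "A"),
        (3, "E"), (3, "C"), (4, "G"), (4, "H")]
result = master_ids(rows)
print(result)  # {1: 1, 2: 1, 3: 1, 4: 2}
```

Unlike the recursive CTE, this sketch needs no depth limit, because each shared value merges two groups directly instead of walking paths between them.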

SQL Server example:
WITH GetConnected AS (
SELECT DISTINCT g1.group_id sourceGroup, g2.group_id connectedGroup
FROM #groups g1
LEFT JOIN #groups g2
ON g1.value = g2.value
UNION ALL
SELECT g1.group_id sourceGroup, g3.connectedGroup connectedGroup
FROM #groups g1
INNER JOIN #groups g2
ON g1.value = g2.value
AND g1.group_id < g2.group_id
INNER JOIN GetConnected g3
ON g3.sourceGroup = g2.group_id
AND g3.connectedGroup > g2.group_id
), GetGroups AS (
SELECT MIN(sourceGroup) sourceGroup, connectedGroup, DENSE_RANK() OVER (ORDER BY MIN(sourceGroup)) rk
FROM GetConnected
GROUP BY connectedGroup)
SELECT gg.rk master_id, g.group_id, g.value
FROM GetGroups gg
INNER JOIN #groups g
ON gg.connectedGroup = g.group_id
ORDER BY gg.rk, gg.connectedGroup, g.value
If you can use PostgreSQL, here is example code:
WITH RECURSIVE GetConnected AS (
SELECT DISTINCT g1.group_id sourceGroup, g2.group_id connectedGroup
FROM groups g1
LEFT JOIN groups g2
ON g1.value = g2.value
UNION
SELECT g1.group_id sourceGroup, g3.connectedGroup connectedGroup
FROM groups g1
LEFT JOIN groups g2
ON g1.value = g2.value
INNER JOIN GetConnected g3
ON g3.sourceGroup = g2.group_id
), GetGroups AS (
SELECT MIN(sourceGroup) sourceGroup, connectedGroup, DENSE_RANK() OVER (ORDER BY MIN(sourceGroup)) rk
FROM GetConnected
GROUP BY connectedGroup)
SELECT gg.rk master_id, g.group_id, g.value
FROM GetGroups gg
INNER JOIN groups g
ON gg.connectedGroup = g.group_id
ORDER BY gg.rk, gg.connectedGroup, g.value

Related

Get only rows where data is the max value

I have a table like this:
treatment | patient_id
3 | 1
3 | 1
3 | 1
2 | 1
2 | 1
1 | 1
2 | 2
2 | 2
1 | 2
I need to get only rows on max(treatment) like this:
treatment | patient_id
3 | 1
3 | 1
3 | 1
2 | 2
2 | 2
For patient_id 1, max(treatment) is 3.
For patient_id 2, max(treatment) is 2.
You can for example join on the aggregated table using the maximal value:
select t.*
from tmp t
inner join (
select max(a) max_a, b
from tmp
group by b
) it on t.a = it.max_a and t.b = it.b;
Here's the db fiddle.
Try this:
WITH list AS
( SELECT patient_id, max(treatment) AS treatment_max
FROM your_table
GROUP BY patient_id
)
SELECT *
FROM your_table AS t
INNER JOIN list AS l
ON t.patient_id = l.patient_id
AND t.treatment = l.treatment_max
You can use rank:
with u as
(select *, rank() over(partition by patient_id order by treatment desc) r
from table_name)
select treatment, patient_id
from u
where r = 1;
Fiddle
Use a correlated subquery:
select t1.* from table_name t1
where t1.treatment=( select max(treatment) from table_name t2 where t1.patient_id=t2.patient_id
)
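All of the answers above follow the same two steps: find the maximum treatment per patient, then keep only the rows that match it. A minimal Python sketch of that logic, assuming rows arrive as `(treatment, patient_id)` tuples (the function name is hypothetical):

```python
def rows_with_max_treatment(rows):
    """rows: list of (treatment, patient_id) tuples."""
    # Step 1: highest treatment seen for each patient (the GROUP BY / MAX part).
    best = {}
    for treatment, patient_id in rows:
        if patient_id not in best or treatment > best[patient_id]:
            best[patient_id] = treatment
    # Step 2: keep every row whose treatment equals its patient's maximum (the join part).
    return [(t, p) for t, p in rows if best[p] == t]


rows = [(3, 1), (3, 1), (3, 1), (2, 1), (2, 1), (1, 1),
        (2, 2), (2, 2), (1, 2)]
kept = rows_with_max_treatment(rows)
print(kept)  # [(3, 1), (3, 1), (3, 1), (2, 2), (2, 2)]
```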

Select non existing Numbers from Table each ID

I'm new to T-SQL and I'm struggling to get the numbers that don't exist in my table for each ID.
Example:
CustomerID Group
1 1
3 1
6 1
4 2
7 2
I want to get the IDs that do not exist and select them like this:
CustomerID Group
2 1
4 1
5 1
5 2
6 2
....
..
Using a CTE doesn't work well, and neither does inserting the data first and then using a NOT EXISTS where clause.
Any Ideas?
If you can live with ranges rather than a list with each one, then an efficient method uses lead():
select group_id, (customer_id + 1) as first_missing_customer_id,
(next_ci - 1) as last_missing_customer_id
from (select t.*,
lead(customer_id) over (partition by group_id order by customer_id) as next_ci
from t
) t
where next_ci <> customer_id + 1
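To make the range idea concrete, here is a small Python sketch of what the lead() query computes: each ID is compared with the next ID in its group, and a gap is reported whenever they are not consecutive. The function name and the `(customer_id, group_id)` tuple layout are assumptions for illustration; the sample rows come from the question.

```python
from collections import defaultdict


def missing_ranges(rows):
    """rows: iterable of (customer_id, group_id).

    Returns a list of (group_id, first_missing, last_missing) tuples.
    """
    by_group = defaultdict(set)
    for customer_id, group_id in rows:
        by_group[group_id].add(customer_id)  # a set ignores duplicate ids

    gaps = []
    for group_id in sorted(by_group):
        ids = sorted(by_group[group_id])
        # Pair each id with its successor, like lead() over (partition by ... order by ...).
        for current, nxt in zip(ids, ids[1:]):
            if nxt != current + 1:
                gaps.append((group_id, current + 1, nxt - 1))
    return gaps


rows = [(1, 1), (3, 1), (6, 1), (4, 2), (7, 2)]
gaps = missing_ranges(rows)
print(gaps)  # [(1, 2, 2), (1, 4, 5), (2, 5, 6)]
```

Each tuple covers a whole run of missing IDs, so group 1 reports 2 and 4 to 5, and group 2 reports 5 to 6.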
Cross join 2 recursive CTEs to get all the possible combinations of [CustomerID] and [Group] and then LEFT join to the table:
declare @c int = (select max([CustomerID]) from tablename);
declare @g int = (select max([Group]) from tablename);
with
customers as (
select 1 as cust
union all
select cust + 1
from customers where cust < @c
),
groups as (
select 1 as gr
union all
select gr + 1
from groups where gr < @g
),
cte as (
select *
from customers cross join groups
)
select c.cust as [CustomerID], c.gr as [Group]
from cte c left join tablename t
on t.[CustomerID] = c.cust and t.[Group] = c.gr
where t.[CustomerID] is null
and c.cust > (select min([CustomerID]) from tablename where [Group] = c.gr)
and c.cust < (select max([CustomerID]) from tablename where [Group] = c.gr)
See the demo.
Results:
> CustomerID | Group
> ---------: | ----:
> 2 | 1
> 4 | 1
> 5 | 1
> 5 | 2
> 6 | 2

Min and max auto Id column plus other columns from the same table

I need to retrieve data from a table relative to 3 columns i.e. max and min Id for every unique reservation_Id and rnoid pair.
This is what I have tried:
SELECT
R1.*,
R2.Id, R2.Res_Id, R2.rNoid
FROM
dbo.Res_Id R1
LEFT OUTER JOIN
dbo.Res_Id R2 ON R2.rNoid = R1.rNoid
AND (R2.Id > R1.Id --min
OR (R2.Id = R1.Id
AND r2.Res_Id <> r1.Res_Id)
)
-- AND R2.rNoid <> R1.rNoid
WHERE
R2.id IS NULL
ORDER BY
R1.Id
Results:
id Res_Id rNoid xxx_x yyy_x user_id
-------------------------------------------
1 1 33 5 null 1
2 3 44 0 3 1
3 13 22 0 null 1
4 1 22 2 5 2
5 3 5 0 5 2
6 3 77 1 3 2
I am getting some unique pairs skipped.
You may try this. The cte section computes min(id) and the ct section computes max(id) for each pair. After that, an inner join on Res_Id and rNoid returns both values for each record.
; with cte as (
SELECT r.Res_Id, r.rNoid, MIN(r.id) as MinId
FROM dbo.Res_Id r -- alias must match the r. prefix used in the select list
GROUP BY r.Res_Id, r.rNoid
),
ct as (
SELECT r.Res_Id, r.rNoid, MAX(r.id) as MaxId
FROM dbo.Res_Id r
GROUP BY r.Res_Id, r.rNoid
)
SELECT C.Res_Id, C.rNoid, MinId, MaxId FROM CTE AS C INNER JOIN CT AS CT
ON C.Res_Id = CT.Res_Id AND C.rNoid = CT.rNoid
> max and min Id for every unique reservation_Id and rnoid pair.
You seem to want a simple aggregation query:
SELECT r.Res_Id, r.rNoid, MIN(r.id), MAX(r.id)
FROM dbo.Res_Id r
GROUP BY r.Res_Id, r.rNoid;
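As a rough illustration of what that aggregation returns, the following Python sketch tracks the smallest and largest id per (Res_Id, rNoid) pair in one pass. The function name and the sample rows are hypothetical and are not taken from the question's data.

```python
def min_max_ids(rows):
    """rows: iterable of (id, res_id, rnoid).

    Returns {(res_id, rnoid): (min_id, max_id)}.
    """
    result = {}
    for row_id, res_id, rnoid in rows:
        key = (res_id, rnoid)
        # Start from the current id if the pair has not been seen yet.
        lo, hi = result.get(key, (row_id, row_id))
        result[key] = (min(lo, row_id), max(hi, row_id))
    return result


# Hypothetical sample rows: (id, Res_Id, rNoid)
rows = [(1, 1, 33), (2, 1, 33), (3, 3, 44), (4, 1, 33)]
summary = min_max_ids(rows)
print(summary)  # {(1, 33): (1, 4), (3, 44): (3, 3)}
```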

How to get all possible child rows in a SQL table when using id parentid relationship in single table

I have the following table schema:
ID | PARENT ID | NAME
-------------------------
1 | NULL | A - ROOT
2 | 1 | B
3 | 2 | C
4 | 1 | D
5 | 4 | E
6 | 5 | F
The hierarchy looks like:
A
-- B
-- -- C
-- D
-- -- E
-- -- -- F
I want to get all children recursively, at all descendant levels.
For example when I have A and query for it, I would like to get back A, B, C, D, E and F.
When I have E I want to get E and F.
When I have D I want to get D, E and F.
I am not a SQL expert. As a developer, I would normally write loops in code that query the DB, check whether a row has children, and fetch those children recursively. But this is definitely a very expensive and poorly performing approach.
Is there an elegant/better way by using a SQL statement?
Here is a generic hierarchy build. This one will maintain presentation sequence and also includes RANGE KEYS (optional and easy to remove if not needed)
Declare @YourTable table (ID int,Parent_ID int,Name varchar(50))
Insert into @YourTable values (1,null,'A - Root'),(2,1,'B'),(3,2,'C'),(4,1,'D'),(5,4,'E'),(6,5,'F')
Declare @Top int = null --<< Sets top of Hier Try 4
Declare @Nest varchar(25) =' ' --<< Optional: Added for readability
;with cteHB (Seq,ID,Parent_ID,Lvl,Name) as (
Select Seq = cast(1000+Row_Number() over (Order by Name) as varchar(500))
,ID
,Parent_ID
,Lvl=1
,Name
From @YourTable
Where IsNull(@Top,-1) = case when @Top is null then isnull(Parent_ID,-1) else ID end
Union All
Select Seq = cast(concat(cteHB.Seq,'.',1000+Row_Number() over (Order by cteCD.Name)) as varchar(500))
,cteCD.ID
,cteCD.Parent_ID,cteHB.Lvl+1
,cteCD.Name
From @YourTable cteCD
Join cteHB on cteCD.Parent_ID = cteHB.ID)
,cteR1 as (Select Seq,ID,R1=Row_Number() over (Order By Seq) From cteHB)
,cteR2 as (Select A.Seq,A.ID,R2=Max(B.R1) From cteR1 A Join cteR1 B on (B.Seq like A.Seq+'%') Group By A.Seq,A.ID )
Select B.R1
,C.R2
,A.ID
,A.Parent_ID
,A.Lvl
,Name = Replicate(@Nest,A.Lvl) + A.Name
From cteHB A
Join cteR1 B on A.ID=B.ID
Join cteR2 C on A.ID=C.ID
Order By B.R1 --<< Or use A.Seq
Returns
R1 R2 ID Parent_ID Lvl Name
1 6 1 NULL 1 A - Root
2 3 2 1 2 B
3 3 3 2 3 C
4 6 4 1 2 D
5 6 5 4 3 E
6 6 6 5 4 F
You can use a common table expression as below:
;with cte as
(
select id, name from hierarchy where parentid is null
union all
select h.id, h.name from hierarchy h inner join cte c
on h.parentid = c.id
)
select * from cte
This will give you the results you want:
DECLARE @search nvarchar(1) = 'D'
;WITH cte AS (
SELECT ID,
[PARENT ID],
NAME
FROM YourTable
WHERE NAME = @search
UNION ALL
SELECT y.ID,
y.[PARENT ID],
y.NAME
FROM YourTable y
INNER JOIN cte c
ON y.[PARENT ID] = c.ID
)
SELECT *
FROM cte
ORDER BY NAME
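The recursive CTE starts from the searched row (the anchor) and repeatedly joins children onto rows already found. A minimal Python sketch of the same traversal, assuming rows are `(id, parent_id, name)` tuples as in the question's table (the function name `subtree` is hypothetical):

```python
from collections import defaultdict


def subtree(rows, start_name):
    """rows: list of (id, parent_id, name). Returns sorted names of the start node and all descendants."""
    children = defaultdict(list)
    name_of = {}
    for node_id, parent_id, name in rows:
        name_of[node_id] = name
        children[parent_id].append(node_id)

    # Anchor member: the rows matching the searched name.
    stack = [node_id for node_id, _, name in rows if name == start_name]
    found = []
    # Recursive member: keep pulling in children of rows already found.
    while stack:
        node_id = stack.pop()
        found.append(name_of[node_id])
        stack.extend(children[node_id])
    return sorted(found)


rows = [(1, None, "A - ROOT"), (2, 1, "B"), (3, 2, "C"),
        (4, 1, "D"), (5, 4, "E"), (6, 5, "F")]
print(subtree(rows, "D"))  # ['D', 'E', 'F']
print(subtree(rows, "E"))  # ['E', 'F']
```

Doing this in application code only makes sense if the whole table is already loaded; issuing one query per level is the expensive loop the question wants to avoid.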

SQL intersect with group by

Given these two tables/sets with different groups of items, how can I find which groups in set1 span more than a single group in set2? In other words, how can I find the groups in set1 that cannot be covered by a single group in set2?
e.g. for the tables below, A (1,2,5) is the only group that spans across s1 (1,2,3) and s2 (2,3,4,5). B and C are not answers because each is fully covered by the single group s2.
I would prefer to use SQL (Sql Server 2008 R2 available).
Thanks.
set1 set2
+---------+----------+ +---------+----------+
| group | item | | group | item |
+---------+----------+ +---------+----------+
| A | 1 | | s1 | 1 |
| A | 2 | | s1 | 2 |
| A | 5 | | s1 | 3 |
| B | 4 | | s2 | 2 |
| B | 5 | | s2 | 3 |
| C | 3 | | s2 | 4 |
| C | 5 | | s2 | 5 |
+---------+----------+ +---------+----------+
Use this sqlfiddle to try: http://sqlfiddle.com/#!6/fac8a/3
Or use the script below to generate temp tables to try out the answers:
create table #set1 (grp varchar(5),item int)
create table #set2 (grp varchar(5),item int)
insert into #set1 select 'a',1 union select 'a',2 union select 'a',5 union select 'b',4 union select 'b',5 union select 'c',3 union select 'c',5
insert into #set2 select 's1',1 union select 's1',2 union select 's1',3 union select 's2',2 union select 's2',3 union select 's2',4 union select 's2',5
select * from #set1
select * from #set2
--drop table #set1
--drop table #set2
Select the groups from set1 for which no single group in set2 contains all of their items:
select s1.grp from set1 s1
where not exists(
select * from set2 s2 where not exists(
select item from set1 s11
where s11.grp = s1.grp
except
select item from set2 s22
where s22.grp = s2.grp))
group by s1.grp
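The double NOT EXISTS with EXCEPT is a subset test: a set1 group is returned only if it is not a subset of any set2 group. A short Python sketch of that reading, using the question's data (the function name and the `(grp, item)` tuple layout are assumptions):

```python
from collections import defaultdict


def uncovered_groups(set1_rows, set2_rows):
    """Each argument is an iterable of (grp, item).

    Returns the set1 groups that no single set2 group fully contains.
    """
    items1 = defaultdict(set)
    for grp, item in set1_rows:
        items1[grp].add(item)
    items2 = defaultdict(set)
    for grp, item in set2_rows:
        items2[grp].add(item)

    # A group is covered when its items are a subset of some set2 group's items,
    # which is the case where the EXCEPT in the SQL returns no rows.
    return sorted(
        grp for grp, items in items1.items()
        if not any(items <= other for other in items2.values())
    )


set1 = [("A", 1), ("A", 2), ("A", 5), ("B", 4), ("B", 5), ("C", 3), ("C", 5)]
set2 = [("s1", 1), ("s1", 2), ("s1", 3),
        ("s2", 2), ("s2", 3), ("s2", 4), ("s2", 5)]
answer = uncovered_groups(set1, set2)
print(answer)  # ['A']
```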
Ok. This is ugly, but it should work. I tried it in fiddle. I think it can be done through windowing, but I have to think about it.
Here is the ugly one for now.
WITH d1 AS (
SELECT set1.grp
, COUNT(*) cnt
FROM set1
GROUP BY set1.grp
), d2 AS (
SELECT set1.grp grp1
, set2.grp grp2
, COUNT(set1.item) cnt
FROM set1
INNER JOIN set2
ON set1.item = set2.item
GROUP BY set1.grp
, set2.grp
)
SELECT grp
FROM d1
EXCEPT
SELECT d1.grp
FROM d1
INNER JOIN d2
ON d2.grp1 = d1.grp
AND d2.cnt = d1.cnt
Can you check this
SELECT DISTINCT a.Group1, a.Item, b.CNT
FROM SET1 a
INNER JOIN
(SELECT GroupA, COUNT(*) CNT
FROM
(
SELECT DISTINCT a.Group1 GroupA, b.Group1 GroupB
FROM SET1 a
INNER JOIN SET2 b ON a.Item = b.Item
) a GROUP BY GroupA
) b ON a.Group1 = b.GroupA
WHERE b.CNT > 1
Thanks for the comments. I believe the following edited query will work:
Select distinct grp1, initialRows, max(MatchedRows) from
(
select a.grp as grp1, b.grp as grp2
, count(distinct case when b.item is not null then a.item end) as MatchedRows
, d.InitialRows
from set1 a
left join set2 b
on a.item = b.item
left join
(select grp, count(distinct Item) as InitialRows from set1
group by grp) d
on a.grp = d.grp
group by a.grp, b.grp, InitialRows
) c
group by grp1, InitialRows
having max(MatchedRows) < InitialRows
I think this will do the trick. The subquery returns set2 groups per set1 group, that have a match for all the items in set1, by counting the matches and comparing the matches count to the set1 group count.
select s.grp from #set1 s
group by s.grp
having not exists (
select s2.grp from #set2 s2 inner join #set1 s1 on s2.item = s1.item
where s1.grp = s.grp
group by s2.grp
having count(s.item) = count(s2.item)
)
You can find the solution through following query:
SELECT A.[GROUP] AS G1, A.ITEM AS T1, B.[GROUP], B.ITEM
FROM SET1 A RIGHT JOIN SET2 B ON A.ITEM=B.ITEM
WHERE A.[GROUP] IS NULL
Basically the same as Robert Co's answer.
I did not get this from his answer; I came up with it independently.
select set1.[group]
from set1
except
select set1count.[group]
from ( select set1.[group], count(*) as [count]
from set1
group by set1.[group] -- item count per set1 group
) as set1count
join ( select set1.[group] as [group1], count(*) as [count]
from set1
join set2
on set2.item = set1.item
group by set1.[group], set2.[group] -- this is the magic
) as set12count
on set1count.[group] = set12count.[group1] -- note no set2.group match
and set1count.[count] = set12count.[count] -- all items in the set1 group are in at least one set2 group