SQL intersect with group by - sql

Given these two tables/sets with different groups of items, how can I find the groups in set1 that span more than a single group in set2, that is, the groups in set1 that cannot be covered by a single group in set2?
For example, in the tables below, A (1,2,5) is the only group that spans s1 (1,2,3) and s2 (2,3,4,5). B and C are not answers because each is fully covered by the single group s2.
I would prefer to use SQL (SQL Server 2008 R2 available).
Thanks.
set1                      set2
+---------+----------+    +---------+----------+
| group   | item     |    | group   | item     |
+---------+----------+    +---------+----------+
| A       | 1        |    | s1      | 1        |
| A       | 2        |    | s1      | 2        |
| A       | 5        |    | s1      | 3        |
| B       | 4        |    | s2      | 2        |
| B       | 5        |    | s2      | 3        |
| C       | 3        |    | s2      | 4        |
| C       | 5        |    | s2      | 5        |
+---------+----------+    +---------+----------+
Use this sqlfiddle to try: http://sqlfiddle.com/#!6/fac8a/3
Or use the script below to generate temp tables to try out the answers:
create table #set1 (grp varchar(5),item int)
create table #set2 (grp varchar(5),item int)
insert into #set1 select 'a',1 union select 'a',2 union select 'a',5 union select 'b',4 union select 'b',5 union select 'c',3 union select 'c',5
insert into #set2 select 's1',1 union select 's1',2 union select 's1',3 union select 's2',2 union select 's2',3 union select 's2',4 union select 's2',5
select * from #set1
select * from #set2
--drop table #set1
--drop table #set2

Select the groups from set1 for which no group in set2 contains all of that group's items:
select s1.grp from set1 s1
where not exists(
select * from set2 s2 where not exists(
select item from set1 s11
where s11.grp = s1.grp
except
select item from set2 s22
where s22.grp = s2.grp))
group by s1.grp
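To try the double NOT EXISTS idea outside SQL Server, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. SQLite, the function name uncovered_groups, and the lowercase table names are assumptions made for illustration; the data is the sample from the question.

```python
import sqlite3

def uncovered_groups():
    """Return set1 groups whose items are not all contained in any single set2 group."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite stands in for SQL Server
    con.executescript("""
        CREATE TABLE set1 (grp TEXT, item INTEGER);
        CREATE TABLE set2 (grp TEXT, item INTEGER);
        INSERT INTO set1 VALUES ('a',1),('a',2),('a',5),('b',4),('b',5),('c',3),('c',5);
        INSERT INTO set2 VALUES ('s1',1),('s1',2),('s1',3),
                                ('s2',2),('s2',3),('s2',4),('s2',5);
    """)
    rows = con.execute("""
        SELECT s1.grp
        FROM set1 s1
        WHERE NOT EXISTS (          -- there is no set2 group ...
            SELECT 1 FROM set2 s2
            WHERE NOT EXISTS (      -- ... for which no set1 item is left over
                SELECT item FROM set1 s11 WHERE s11.grp = s1.grp
                EXCEPT
                SELECT item FROM set2 s22 WHERE s22.grp = s2.grp))
        GROUP BY s1.grp
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(uncovered_groups())  # ['a']
```

The inner EXCEPT is empty exactly when one set2 group contains every item of the set1 group, so only group a survives the outer NOT EXISTS.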

Ok. This is ugly, but it should work. I tried it in fiddle. I think it can be done through windowing, but I have to think about it.
Here is the ugly one for now.
WITH d1 AS (
SELECT set1.grp
, COUNT(*) cnt
FROM set1
GROUP BY set1.grp
), d2 AS (
SELECT set1.grp grp1
, set2.grp grp2
, COUNT(set1.item) cnt
FROM set1
INNER JOIN set2
ON set1.item = set2.item
GROUP BY set1.grp
, set2.grp
)
SELECT grp
FROM d1
EXCEPT
SELECT d1.grp
FROM d1
INNER JOIN d2
ON d2.grp1 = d1.grp
AND d2.cnt = d1.cnt
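To see why comparing counts works, it can help to look at the intermediate overlap counts. The sketch below is an illustration only, assuming SQLite via Python's sqlite3 and a hypothetical function name overlap_and_result; it returns both the per-pair overlap (the d2 step) and the final answer. It relies on items being unique within each group, as in the sample data.

```python
import sqlite3

def overlap_and_result():
    """Return (overlap rows, uncovered groups) for the sample data."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite instead of SQL Server
    con.executescript("""
        CREATE TABLE set1 (grp TEXT, item INTEGER);
        CREATE TABLE set2 (grp TEXT, item INTEGER);
        INSERT INTO set1 VALUES ('a',1),('a',2),('a',5),('b',4),('b',5),('c',3),('c',5);
        INSERT INTO set2 VALUES ('s1',1),('s1',2),('s1',3),
                                ('s2',2),('s2',3),('s2',4),('s2',5);
    """)
    # How many items each set1 group shares with each set2 group.
    overlap = con.execute("""
        SELECT set1.grp, set2.grp, COUNT(*)
        FROM set1 JOIN set2 ON set1.item = set2.item
        GROUP BY set1.grp, set2.grp
        ORDER BY set1.grp, set2.grp
    """).fetchall()
    # A set1 group is covered when some overlap equals its own item count.
    result = con.execute("""
        WITH d1 AS (SELECT grp, COUNT(*) AS cnt FROM set1 GROUP BY grp),
             d2 AS (SELECT set1.grp AS grp1, COUNT(*) AS cnt
                    FROM set1 JOIN set2 ON set1.item = set2.item
                    GROUP BY set1.grp, set2.grp)
        SELECT grp FROM d1
        EXCEPT
        SELECT d1.grp FROM d1 JOIN d2 ON d2.grp1 = d1.grp AND d2.cnt = d1.cnt
    """).fetchall()
    con.close()
    return overlap, [r[0] for r in result]

overlap, result = overlap_and_result()
print(overlap)
print(result)
```

Group a has three items but shares at most two with any set2 group, so it is the only group that stays in the result.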

Can you check this
SELECT DISTINCT a.Group1, a.Item, b.CNT
FROM SET1 a
INNER JOIN
(SELECT GroupA, COUNT(*) CNT
FROM
(
SELECT DISTINCT a.Group1 GroupA, b.Group1 GroupB
FROM SET1 a
INNER JOIN SET2 b ON a.Item = b.Item
) a GROUP BY GroupA
) b ON a.Group1 = b.GroupA
WHERE b.CNT > 1

Thanks for the comments. I believe the following edited query will work:
Select distinct grp1, initialRows, max(MatchedRows) from
(
select a.grp as grp1, b.grp as grp2
, count(distinct case when b.item is not null then a.item end) as MatchedRows
, d.InitialRows
from set1 a
left join set2 b
on a.item = b.item
left join
(select grp, count(distinct Item) as InitialRows from set1
group by grp) d
on a.grp = d.grp
group by a.grp, b.grp, InitialRows
) c
group by grp1, InitialRows
having max(MatchedRows) < InitialRows

I think this will do the trick. The subquery returns set2 groups per set1 group, that have a match for all the items in set1, by counting the matches and comparing the matches count to the set1 group count.
select s.grp from #set1 s
group by s.grp
having not exists (
select s2.grp from #set2 s2 inner join #set1 s1 on s2.item = s1.item
where s1.grp = s.grp
group by s2.grp
having count(s.item) = count(s2.item)
)

You can find the solution through following query:
SELECT A.GROUP AS G1, A.ITEM AS T1, B.GROUP, B.ITEM
FROM SET1 A RIGHT JOIN SET2 B ON A.ITEM=B.ITEM
WHERE A.GROUP IS NULL

Basically the same as Robert Co
I did not get this from his answer - came up with this independently
select set1.[group]
from set1
except
select set1count.[group]
from ( select set1.[group], count(*) as [count]
from set1
group by set1.[group]
) as set1count
join ( select set1.[group] as [group1], count(*) as [count]
from set1
join set2
on set2.item = set1.item
group by set1.[group], set2.[group] -- this is the magic
) as set12count
on set1count.[group] = set12count.[group1] -- note no set2.group match
and set1count.[count] = set12count.[count] -- the items in set1 are all in at least one set2 group

Related

Get only rows where data is the max value

I have a table like this:
treatment | patient_id
3 | 1
3 | 1
3 | 1
2 | 1
2 | 1
1 | 1
2 | 2
2 | 2
1 | 2
I need to get only rows on max(treatment) like this:
treatment | patient_id
3 | 1
3 | 1
3 | 1
2 | 2
2 | 2
For patient_id 1, max(treatment) is 3.
For patient_id 2, max(treatment) is 2.
You can for example join on the aggregated table using the maximal value:
select t.*
from tmp t
inner join (
select max(a) max_a, b
from tmp
group by b
) it on t.a = it.max_a and t.b = it.b;
Here's the db fiddle.
Try this :
WITH list AS
( SELECT patient_id, max(treatment) AS treatment_max
FROM your_table
GROUP BY patient_id
)
SELECT *
FROM your_table AS t
INNER JOIN list AS l
ON t.patient_id = l.patient_id
AND t.treatment = l.treatment_max
You can use rank:
with u as
(select *, rank() over(partition by patient_id order by treatment desc) r
from table_name)
select treatment, patient_id
from u
where r = 1;
Use a correlated subquery:
select t1.* from table_name t1
where t1.treatment=( select max(treatment) from table_name t2 where t1.patient_id=t2.patient_id
)
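As a quick check of the correlated subquery approach on the sample data from the question, here is a minimal sketch using SQLite through Python's sqlite3. The table name treatments and the function name rows_at_max_treatment are assumptions for illustration.

```python
import sqlite3

def rows_at_max_treatment():
    """Return every row whose treatment equals the maximum for its patient."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
    con.executescript("""
        CREATE TABLE treatments (treatment INTEGER, patient_id INTEGER);
        INSERT INTO treatments VALUES
            (3,1),(3,1),(3,1),(2,1),(2,1),(1,1),(2,2),(2,2),(1,2);
    """)
    rows = con.execute("""
        SELECT t1.treatment, t1.patient_id
        FROM treatments t1
        WHERE t1.treatment = (SELECT MAX(t2.treatment)
                              FROM treatments t2
                              WHERE t2.patient_id = t1.patient_id)
        ORDER BY t1.patient_id
    """).fetchall()
    con.close()
    return rows

print(rows_at_max_treatment())
```

Duplicate rows at the maximum are kept, which matches the expected output in the question (three rows for patient 1, two for patient 2).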

Create table with values from one column and another column without intersection

I have a table like so:
userid | clothesid
-------|-----------
1 | 1
1 | 3
2 | 1
2 | 4
2 | 5
What I want from this table is a table like so:
userid | clothesid
-------|-----------
1 | 4
1 | 5
2 | 3
How can I do this?
I've tried it for a single user:
select distinct r.clothesid from table r where r.clothesid not in (select r1.clothesid from table r1 where r1.userid=1);
and this returns 4,5, but I'm not sure how to proceed from here.
You can cross join the list of userids and the list of clothesid to generate all combinations, and then use not exists on the original table to identify the missing rows:
select u.userid, c.clothesid
from (select distinct userid from mytable) u
cross join (select distinct clothesid from mytable) c
where not exists(
select 1 from mytable t where t.userid = u.userid and t.clothesid = c.clothesid
)
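The cross join plus NOT EXISTS idea can be tried on the question's data with the sketch below. It assumes SQLite via Python's sqlite3; the table name mytable follows the answer above, and the function name missing_pairs is made up for the example.

```python
import sqlite3

def missing_pairs():
    """Return (userid, clothesid) combinations that are absent from the table."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
    con.executescript("""
        CREATE TABLE mytable (userid INTEGER, clothesid INTEGER);
        INSERT INTO mytable VALUES (1,1),(1,3),(2,1),(2,4),(2,5);
    """)
    rows = con.execute("""
        SELECT u.userid, c.clothesid
        FROM (SELECT DISTINCT userid FROM mytable) u
        CROSS JOIN (SELECT DISTINCT clothesid FROM mytable) c
        WHERE NOT EXISTS (
            SELECT 1 FROM mytable t
            WHERE t.userid = u.userid AND t.clothesid = c.clothesid)
        ORDER BY u.userid, c.clothesid
    """).fetchall()
    con.close()
    return rows

print(missing_pairs())
```

Clothes item 1 is owned by both users, so it never appears in the output; every other combination that is not in the table does.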
I think you want:
select (case when t1.clothesid is not null then 2 else 1 end),
coalesce(t1.clothesid, t2.clothesid)
from (select t.*
from t
where t.userid = 1
) t1 full join
(select t.*
from t
where t.userid = 2
) t2
on t1.clothesid = t2.clothesid
where t1.clothesid is null or t2.clothesid is null;
Actually, I think I have a simpler solution:
select (case when min(t.userid) = 1 then 2 else 1 end), clothesid
from t
group by clothesid
having count(*) = 1;
Here is a db<>fiddle.
Left join all the combinations of userid and clothesid to the table and return only the unmatched rows:
select t1.userid, t2.clothesid
from (select distinct userid from tablename) t1
cross join (select distinct clothesid from tablename) t2
left join tablename t on t.userid = t1.userid and t.clothesid = t2.clothesid
where t.userid is null
Or with the operator EXCEPT:
select t1.userid, t2.clothesid
from (select distinct userid from tablename) t1
cross join (select distinct clothesid from tablename) t2
except
select userid, clothesid
from tablename
See the demo.
Results:
> userid | clothesid
> -----: | --------:
> 1 | 4
> 1 | 5
> 2 | 3

Grouping 'groups' with common element

I have a table like below, containing a group_id and some value.
group_id | value
---------+-------
1 | A
1 | B
2 | C
2 | D
2 | A
3 | E
3 | C
4 | G
4 | H
What I want to get is a unique number for each set of groups that are somehow connected, like this:
Groups 1 and 2 have a common element A, and groups 2 and 3 have a common element C, so this is actually one big group.
master_id | group_id | value
----------+----------+--------
1 | 1 | A
1 | 1 | B
1 | 2 | C
1 | 2 | D
1 | 2 | A
1 | 3 | E
1 | 3 | C
2 | 4 | G
2 | 4 | H
How can I get this master_id?
Calculating the master group is a graph-walking problem, which implies a recursive CTE. I would approach this by:
Generating edges between the groups, based on the values.
Traversing the edges without visiting previous groups.
The calculation of the master group is then the minimum of the visited groups for each group.
In SQL, this looks like:
with edges as (
select distinct t1.group_id as group_id_1, t2.group_id as group_id_2
from t t1 join
t t2
on t1.value = t2.value
),
cte as (
select e.group_id_1, e.group_id_2, convert(varchar(max), concat(',', group_id_1, ',', group_id_2, ',')) as visited, 1 as lev
from edges e
union all
select cte.group_id_1, e.group_id_2,
concat(visited, e.group_id_2, ','), lev + 1
from cte join
edges e
on e.group_id_1 = cte.group_id_2
where cte.visited not like concat('%,', e.group_id_2, ',%') and lev < 5
)
select group_id_1, dense_rank() over (order by min(group_id_2)) as master_group
from cte
group by group_id_1;
Here is a db<>fiddle.
Sql server example:
WITH GetConnected AS (
SELECT DISTINCT g1.group_id sourceGroup, g2.group_id connectedGroup
FROM #groups g1
LEFT JOIN #groups g2
ON g1.value = g2.value
UNION ALL
SELECT g1.group_id sourceGroup, g3.connectedGroup connectedGroup
FROM #groups g1
INNER JOIN #groups g2
ON g1.value = g2.value
AND g1.group_id < g2.group_id
INNER JOIN GetConnected g3
ON g3.sourceGroup = g2.group_id
AND g3.connectedGroup > g2.group_id
), GetGroups AS (
SELECT MIN(sourceGroup) sourceGroup, connectedGroup, DENSE_RANK() OVER (ORDER BY MIN(sourceGroup)) rk
FROM GetConnected
GROUP BY connectedGroup)
SELECT gg.rk master_id, g.group_id, g.value
FROM GetGroups gg
INNER JOIN #groups g
ON gg.connectedGroup = g.group_id
ORDER BY gg.rk, gg.connectedGroup, g.value
If you can use PostgreSQL, here is example code:
WITH RECURSIVE GetConnected AS (
SELECT DISTINCT g1.group_id sourceGroup, g2.group_id connectedGroup
FROM groups g1
LEFT JOIN groups g2
ON g1.value = g2.value
UNION
SELECT g1.group_id sourceGroup, g3.connectedGroup connectedGroup
FROM groups g1
LEFT JOIN groups g2
ON g1.value = g2.value
INNER JOIN GetConnected g3
ON g3.sourceGroup = g2.group_id
), GetGroups AS (
SELECT MIN(sourceGroup) sourceGroup, connectedGroup, DENSE_RANK() OVER (ORDER BY MIN(sourceGroup)) rk
FROM GetConnected
GROUP BY connectedGroup)
SELECT gg.rk master_id, g.group_id, g.value
FROM GetGroups gg
INNER JOIN groups g
ON gg.connectedGroup = g.group_id
ORDER BY gg.rk, gg.connectedGroup, g.value
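For comparison, the same graph-walking idea can be sketched with a recursive CTE that uses UNION (which removes duplicates and therefore stops once no new pairs appear). This is only an illustration, assuming SQLite 3.25 or later via Python's sqlite3; the table name t and the function name master_ids are hypothetical.

```python
import sqlite3

def master_ids():
    """Map each group_id to a master_id shared by all transitively connected groups."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite with window function support
    con.executescript("""
        CREATE TABLE t (group_id INTEGER, value TEXT);
        INSERT INTO t VALUES (1,'A'),(1,'B'),(2,'C'),(2,'D'),(2,'A'),
                             (3,'E'),(3,'C'),(4,'G'),(4,'H');
    """)
    rows = con.execute("""
        WITH RECURSIVE
        edges(g1, g2) AS (            -- groups that share at least one value
            SELECT DISTINCT a.group_id, b.group_id
            FROM t a JOIN t b ON a.value = b.value
        ),
        reach(g1, g2) AS (            -- transitive closure of the edges
            SELECT g1, g2 FROM edges
            UNION
            SELECT r.g1, e.g2 FROM reach r JOIN edges e ON e.g1 = r.g2
        )
        SELECT g1, DENSE_RANK() OVER (ORDER BY m) AS master_id
        FROM (SELECT g1, MIN(g2) AS m FROM reach GROUP BY g1)
        ORDER BY g1
    """).fetchall()
    con.close()
    return rows

print(master_ids())
```

Each group is labelled by the smallest group id it can reach, and DENSE_RANK turns those labels into consecutive master ids. Joining the result back to the table on group_id gives the layout asked for in the question.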

Select count of rows in two other tables

I have 3 tables. The main one in which I want to retrieve some information and two others for row count only.
I used a request like this :
SELECT A.*,
COUNT(B.id) AS b_count
FROM A
LEFT JOIN B on B.a_id = A.id
WHERE A.id > 50 AND B.ID < 100
GROUP BY A.id
from Gerry Shaw's comment here. It works perfectly but only for one table.
Now I need to add the row count for the third (C) table. I tried
SELECT A.*,
COUNT(B.id) AS b_count,
COUNT(C.id) AS c_count
FROM A
LEFT JOIN B on B.a_id = A.id
LEFT JOIN C on C.a_id = A.id
GROUP BY A.id
but, because of the two left joins, my b_count and c_count are wrong and equal to each other. In fact, each of them equals real_b_count*real_c_count. Any idea how I could fix this without adding a lot of complexity/subqueries?
Data sample as requested:
Table A (primary key : id)
id | data1 | data2
------+-------+-------
1 | 0,45 | 0,79
----------------------
2 | -2,24 | -0,25
----------------------
3 | 1,69 | 1,23
Table B (primary key : (a_id,fruit))
a_id | fruit
------+-------
1 | apple
------+-------
1 | banana
--------------
2 | apple
Table C (primary key : (a_id,color))
a_id | color
------+-------
2 | blue
------+-------
2 | purple
--------------
3 | blue
expected result:
id | data1 | data2 | b_count | c_count
------+-------+-------+---------+--------
1 | 0,45 | 0,79 | 2 | 0
----------------------+---------+--------
2 | -2,24 | -0,25 | 1 | 2
----------------------+---------+--------
3 | 1,69 | 1,23 | 0 | 1
There are two possible solutions. One is to use correlated subqueries in the SELECT list:
SELECT A.*,
(
SELECT COUNT(B.id) FROM B WHERE B.a_id = A.id AND B.ID < 100
) AS b_count,
(
SELECT COUNT(C.id) FROM C WHERE C.a_id = A.id
) AS c_count
FROM A
WHERE A.id > 50
The second is to join two aggregated queries together:
SELECT t1.*, t2.c_count
FROM
(
SELECT A.*,
COUNT(B.id) AS b_count
FROM A
LEFT JOIN B on B.a_id = A.id AND B.ID < 100 -- filter on B belongs in ON; in WHERE it would drop rows of A with no match
WHERE A.id > 50
GROUP BY A.id
) t1
JOIN
(
SELECT A.*,
COUNT(C.id) AS c_count
FROM A
LEFT JOIN C on C.a_id = A.id
WHERE A.id > 50
GROUP BY A.id
) t2 ON t1.id = t2.id
I prefer the second syntax since it makes the GROUP BY explicit to the optimizer; however, the query plans are usually the same.
If tables B and C also have their own key fields, you can use COUNT(DISTINCT ...) on the primary key rather than the foreign key. That avoids the row multiplication you get when joining several tables. If you can post the table structures, we can advise further.
Try something like this
SELECT A.*,
(SELECT COUNT(B.id) FROM B WHERE B.a_id = A.id) AS b_count,
(SELECT COUNT(C.id) FROM C WHERE C.a_id = A.id) AS c_count
FROM A
This is the easiest way I can think of:
Create table #a (id int, data1 float, data2 float)
Create table #b (id int, fruit varchar(50))
Create table #c (id int, color varchar(50))
Insert into #a
SELECT 1, 0.45, 0.79
UNION ALL SELECT 2, -2.24, -0.25
UNION ALL SELECT 3, 1.69, 1.23
Insert into #b
SELECT 1, 'apple'
UNION ALL SELECT 1, 'banana'
UNION ALL SELECT 2, 'orange'
Insert into #c
SELECT 2, 'blue'
UNION ALL SELECT 2, 'purple'
UNION ALL SELECT 3, 'orange'
SELECT #a.*,
(SELECT COUNT(#b.id) FROM #b where #b.id = #a.id) AS b_count,
(SELECT COUNT(#c.id) FROM #c where #c.id = #a.id) AS c_count
FROM #a
ORDER BY #a.id
Result:
id data1 data2 b_count c_count
1 0,45 0,79 2 0
2 -2,24 -0,25 1 2
3 1,69 1,23 0 1
If the rows of tables B and C can be identified uniquely within each a_id (here by fruit and color), you can try this:
SELECT A.*,
COUNT(distinct B.fruit) AS b_count,
COUNT(distinct C.color) AS c_count
FROM A
LEFT JOIN B on B.a_id = A.id
LEFT JOIN C on C.a_id = A.id
GROUP BY A.id
See SQLFiddle MySQL demo.
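To make the row multiplication visible, the sketch below runs the naive double LEFT JOIN and the correlated subquery version side by side on the sample data from the question. It assumes SQLite via Python's sqlite3; the function name counts_per_a is hypothetical, and COUNT is applied to fruit and color because the sample tables have no id column.

```python
import sqlite3

def counts_per_a():
    """Return (naive, fixed) count rows as (id, b_count, c_count) tuples."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
    con.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, data1 REAL, data2 REAL);
        CREATE TABLE b (a_id INTEGER, fruit TEXT);
        CREATE TABLE c (a_id INTEGER, color TEXT);
        INSERT INTO a VALUES (1, 0.45, 0.79), (2, -2.24, -0.25), (3, 1.69, 1.23);
        INSERT INTO b VALUES (1,'apple'),(1,'banana'),(2,'apple');
        INSERT INTO c VALUES (2,'blue'),(2,'purple'),(3,'blue');
    """)
    # Two LEFT JOINs multiply rows: id 2 has 1 fruit x 2 colors = 2 joined rows.
    naive = con.execute("""
        SELECT a.id, COUNT(b.fruit), COUNT(c.color)
        FROM a
        LEFT JOIN b ON b.a_id = a.id
        LEFT JOIN c ON c.a_id = a.id
        GROUP BY a.id ORDER BY a.id
    """).fetchall()
    # Correlated subqueries count each child table independently.
    fixed = con.execute("""
        SELECT a.id,
               (SELECT COUNT(*) FROM b WHERE b.a_id = a.id),
               (SELECT COUNT(*) FROM c WHERE c.a_id = a.id)
        FROM a ORDER BY a.id
    """).fetchall()
    con.close()
    return naive, fixed

naive, fixed = counts_per_a()
print(naive)
print(fixed)
```

With this data only id 2 is miscounted by the naive query (b_count 2 instead of 1), because it is the only row with matches in both child tables.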

left join without duplicate values using MIN()

I have a table_1:
id custno
1 1
2 2
3 3
and a table_2:
id custno qty descr
1 1 10 a
2 1 7 b
3 2 4 c
4 3 7 d
5 1 5 e
6 1 5 f
When I run this query to show the minimum order quantities from every customer:
SELECT DISTINCT table_1.custno,table_2.qty,table_2.descr
FROM table_1
LEFT OUTER JOIN table_2
ON table_1.custno = table_2.custno AND qty = (SELECT MIN(qty) FROM table_2
WHERE table_2.custno = table_1.custno )
Then I get this result:
custno qty descr
1 5 e
1 5 f
2 4 c
3 7 d
Customer 1 appears twice each time with the same minimum qty (& a different description) but I only want to see customer 1 appear once. I don't care if that is the record with 'e' as a description or 'f' as a description.
First of all... I'm not sure why you need to include table_1 in the queries to begin with:
select custno, min(qty) as min_qty
from table_2
group by custno;
But just in case there is other information that you need that wasn't included in the question:
select table_1.custno, ifnull(min(qty),0) as min_qty
from table_1
left outer join table_2
on table_1.custno = table_2.custno
group by table_1.custno;
"Generic" SQL way:
SELECT table_1.custno,table_2.qty,table_2.descr
FROM table_1, table_2
WHERE table_2.id = (SELECT TOP 1 id
FROM table_2
WHERE custno = table_1.custno
ORDER BY qty )
SQL 2008 way (probably faster):
SELECT custno, qty, descr
FROM
(SELECT
custno,
qty,
descr,
ROW_NUMBER() OVER (PARTITION BY custno ORDER BY qty) RowNum
FROM table_2
) A
WHERE RowNum = 1
If you use SQL-Server you could use ROW_NUMBER and a CTE:
WITH CTE AS
(
SELECT table_1.custno,table_2.qty,table_2.descr,
RN = ROW_NUMBER() OVER ( PARTITION BY table_1.custno
Order By table_2.qty ASC)
FROM table_1
LEFT OUTER JOIN table_2
ON table_1.custno = table_2.custno
)
SELECT custno, qty,descr
FROM CTE
WHERE RN = 1
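The ROW_NUMBER approach can be checked on the question's table_2 with the sketch below. It assumes SQLite 3.25 or later via Python's sqlite3, and the function name min_order_per_customer is made up for the example. One deliberate difference from the answers above: id is added as a tie-breaker in the ORDER BY so the result is deterministic when two rows share the minimum qty.

```python
import sqlite3

def min_order_per_customer():
    """Return one (custno, qty, descr) row per customer with the smallest qty."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite with window function support
    con.executescript("""
        CREATE TABLE table_2 (id INTEGER, custno INTEGER, qty INTEGER, descr TEXT);
        INSERT INTO table_2 VALUES
            (1,1,10,'a'),(2,1,7,'b'),(3,2,4,'c'),(4,3,7,'d'),(5,1,5,'e'),(6,1,5,'f');
    """)
    rows = con.execute("""
        SELECT custno, qty, descr
        FROM (SELECT custno, qty, descr,
                     ROW_NUMBER() OVER (PARTITION BY custno
                                        ORDER BY qty, id) AS rn  -- id breaks ties
              FROM table_2) ranked
        WHERE rn = 1
        ORDER BY custno
    """).fetchall()
    con.close()
    return rows

print(min_order_per_customer())
```

Without the tie-breaker, either the 'e' or the 'f' row could be returned for customer 1, which the question says is acceptable.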