sql select records with matching subsets

sql select records with matching subsets - sql

There are two sets of employees: managers and grunts.
For each manager, there's a table manager_meetings that holds a list of which meetings each manager attended. A similar table grunt_meetings holds a list of which meetings each grunt attended.
So:
manager_meetings grunt_meetings
managerID meetingID gruntID meetingID
1 a 4 a
1 b 4 b
1 c 4 c
2 a 4 d
2 b 5 a
3 c 5 b
3 d 5 c
3 e 6 a
6 c
7 b
7 a
The owner doesn't like it when a manager and a grunt know exactly the same information. It makes his head hurt. He wants to identify this situation, so he can demote the manager to a grunt, or promote the grunt to a manager, or take them both golfing. The owner likes to golf.
The task is to list every combination of manager and grunt where both attended exactly the same meetings. If the manager attended more meeting than the grunt, no match. If the grunt attended more meetings than the manager, no match.
The expected results here are:
ManagerID GruntID
2 7
1 5
...because manager 2 and grunt 7 both attended (a,b), while manager 1 and grunt 5 both attended (a,b,c).
I can solve it in a clunky way, by pivoting up the subset of meetings in a subquery into XML, and comparing each grunt's XML list to each manager's XML. But that's horrible, and also I have to explain to the owner what XML is. And I don't like golfing.
Is there some better way to do "WHERE {subset1} = {subset2}"? It feels like I'm missing some clever kind of join.
SQL Fiddle

Here is a version that works:
select m.mId, g.gId, count(*) --select m.mid, g.gid, mm.meetingid, gm.meetingid as gmm
from manager m cross join
grunt g left outer join
(select mm.*, count(*) over (partition by mm.mid) as cnt
from manager_meeting mm
) mm
on mm.mid = m.mId full outer join
(select gm.*, count(*) over (partition by gm.gid) as cnt
from grunt_meeting gm
) gm
on gm.gid = g.gid and gm.meetingid = mm.meetingid
group by m.mId, g.gId, mm.cnt, gm.cnt
having count(*) = mm.cnt and mm.cnt = gm.cnt;
The string comparison method is shorter, perhaps easier to understand, and probably faster.
EDIT:
For your particular case of getting exact matches, the query can be simplified:
select mm.mId, gm.gId
from (select mm.*, count(*) over (partition by mm.mid) as cnt
from manager_meeting mm
) mm join
(select gm.*, count(*) over (partition by gm.gid) as cnt
from grunt_meeting gm
) gm
on gm.meetingid = mm.meetingid and
mm.cnt = gm.cnt
group by mm.mId, gm.gId
having count(*) = max(mm.cnt);
This might be more competitive with the string version, both in terms of performance and clarity.
It counts the number of matches between a grunt and a manager. It then checks that this is all the meetings for each.

An attempt at avenging Aaron's defeat – a solution using EXCEPT:
SELECT
m.mID,
g.gID
FROM
manager AS m
INNER JOIN
grunt AS g
ON NOT EXISTS (
SELECT meetingID
FROM manager_meeting
WHERE mID = m.mID
EXCEPT
SELECT meetingID
FROM grunt_meeting
WHERE gID = g.gID
)
AND NOT EXISTS (
SELECT meetingID
FROM grunt_meeting
WHERE gID = g.gID
EXCEPT
SELECT meetingID
FROM manager_meeting
WHERE mID = m.mID
);
Basically, subtract a grunt's set of meetings from a manager's set of meetings, then the other way round. If neither result contains rows, the grunt and the manager attended the same set of meetings.
Please note that this query will match managers and grunts that never attended a single meeting.

An alternative version - but requires another table. Basically, we give each meeting a distinct power of two as it's 'value', then sum every manager's meeting value and each grunt's meeting value. Where they're the same, we have a match.
It should be possible to make the meeting_values table a TVF, but this is a little bit simpler.
SQL Fiddle
Additional table:
CREATE TABLE meeting_values (value INT, meetingID CHAR(1));
INSERT INTO meeting_values VALUES
(1,'a'),(2,'b'),(4,'c'),(8,'d'),(16,'e');
And the query:
SELECT managemeets.mID, gruntmeets.gID
FROM
( SELECT gm.gID, sum(value) AS meeting_totals
FROM grunt_meeting gm
INNER JOIN
meeting_values mv ON gm.meetingID = mv.meetingID
GROUP BY gm.gID
) gruntmeets
INNER JOIN
( SELECT mm.mID, sum(value) AS meeting_totals
FROM manager_meeting mm
INNER JOIN
meeting_values mv ON mm.meetingID = mv.meetingID
GROUP BY mm.mID
) managemeets ON gruntmeets.meeting_totals = managemeets.meeting_totals

Related

SQL aggregate functions, inner join

I am working on writing a sql to get the SID and SNAME. In this task, I need to count which team win the max number of League and find out the SID.
Leagues(LID, CHAMPION_TID)
LID: League ID ; CHAMPION_TID: champion team ID
SUPPORT(SID, LID)
SPONSORS(SID, SNAME)
PRIMARY KEY: LID,SID
Now, I can find out which team win the max number of League through the following SQL:
SELECT
MAX(y.cham)
FROM
(SELECT
CHAMPION_TID, COUNT(L.CHAMPION_TID) AS cham
FROM
LEAGUES L
GROUP BY
L.CHAMPION_TID) y, LEAGUES L
WHERE
y.CHAMPION_TID = L.CHAMPION_TID;
I am confusing in the following step. My idea get the LID, then use the join table to display SID and SNAME. But I suck in this step.
SELECT L.LID, MAX(y.cham)
FROM
(SELECT CHAMPION_TID, COUNT(L.CHAMPION_TID) AS cham
FROM LEAGUES L
GROUP BY L.CHAMPION_TID) y, LEAGUES L
WHERE
y.CHAMPION_TID = L.CHAMPION_TID

You can use the following to find the Sponsor ID and Sponsor Name:
SELECT DISTINCT
sp.SID,
sp.SNAME
FROM
LEAGUES l3
INNER JOIN support s ON
l3.LID = s.LID
INNER JOIN SPONSORS sp ON
s.SID = sp.SID
WHERE
l3.CHAMPION_TID IN (
SELECT
l2.CHAMPION_TID
FROM
LEAGUES l2
GROUP BY
l2.CHAMPION_TID
HAVING
count(l2.CHAMPION_TID) = (
SELECT
count(l1.CHAMPION_TID)
FROM
LEAGUES l1
GROUP BY
l1.CHAMPION_TID
ORDER BY
count(l1.CHAMPION_TID) DESC
FETCH FIRST 1 ROW ONLY
)
);
It finds the count of CHAMPION_TID in LEAGUES, orders it by desc (such that the highest count is always on top), then uses it to find the associated CHAMPION_TID. It handles ties for max(count(CHAMPION_TID)) as well :)
If fetch first 1 row only does not work, you can use select top 1 l1.CHAMPION_TID...
Here is a working demo using Postgres.

SQL - Select highest value when data across 3 tables

I have 3 tables:
Person (with a column PersonKey)
Telephone (with columns Tel_NumberKey, Tel_Number, Tel_NumberType e.g. 1=home, 2=mobile)
xref_Person+Telephone (columns PersonKey, Tel_NumberKey, CreatedDate, ModifiedDate)
I'm looking to get the most recent (e.g. the highest Tel_NumberKey) from the xref_Person+Telephone for each Person and use that Tel_NumberKey to get the actual Tel_Number from the Telephone table.
The problem I am having is that I keep getting duplicates for the same Tel_NumberKey. I also need to be sure I get both the home and mobile from the Telephone table, which I've been looking to do via 2 individual joins for each Tel_NumberType - again getting duplicates.
Been trying the following but to no avail:
-- For HOME
SELECT
p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1 -- e.g. Home phone number
AND pn.Tel_NumberKey = (SELECT MAX(pn1.Tel_NumberKey) AS Tel_NumberKey
FROM Person AS p1
INNER JOIN xref_Person+Telephone AS x1 ON p1.PersonKey = x1.PersonKey
INNER JOIN Telephone AS pn1 ON x1.Tel_NumberKey = pn1.Tel_NumberKey
WHERE pn1.Tel_NumberType = 1
AND p1.PersonKey = p.PersonKey
AND pn1.Tel_Number = pn.Tel_Number)
ORDER BY
p.PersonKey
And have been looking over the following links but again keep getting duplicates.
SQL select max(date) and corresponding value
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
SQL Server: SELECT only the rows with MAX(DATE)
Am sure this must be possible but been at this a couple of days and can't believe its that difficult to get the most recent / highest value when referencing 3 tables. Any help greatly appreciated.

select *
from
( SELECT p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
, row_number() over (partition by p.PersonKey, pn.Phone_Number order by pn.Tel_NumberKey desc) rn
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1
) tt
where tt.rn = 1
ORDER BY
tt.PersonKey

you have to use max() function and then you have to order by rownum in descending order like.
select f.empno
from(select max(empno) empno from emp e
group by rownum)f
order by rownum desc
It will give you all employees having highest employee number to lowest employee number. Now implement it with your case then let me know.

Help construct a query given a schema

Here is the schema for the database: http://i.stack.imgur.com/omX60.png
Question is: How many people have at least five еntitlements?
I've got this, please tell me how wrong it is and fix it.
select count(personId)
from serialNumber_tbl natural join entitlement_tbl
group by personId
having sum(entitlementID) > 5
Thank you.

The condition for at least 5 is >= 5, not > 5
You need to count the distinct ids in the entitlement table, not person
This gives you the persons, next you need to subquery it to find the count of persons.
select count(personId)
FROM
(
select personId
from serialNumber_tbl natural join entitlement_tbl
group by personId
having count(distinct entitlement_id) >= 5
) X

Your request isn't exactly clear. Are you asking for the count of people with more than five entitlement rows whether they exist on multiple serial numbers or not? If so, you could do something like:
Select Count(*) As CountOfPeople
From Person_tbl As P
Where Exists (
Select 1
From serialNumbers As S1
Join entitlement_tbl As E1
On E1.serialNumberId = S.serialNumberId
Where S1.personId = P.personId
Having Count(*) >= 5
)
Or is it that you are asking to find the number of people that have a serialNumber with more than five entitlements? If that is the case, then you could do something like:
Select Count(*) As CountOfPeople
From Person_tbl As P
Where Exists (
Select 1
From serialNumbers As S1
Join entitlement_tbl As E1
On E1.serialNumberId = S.serialNumberId
Where S1.personId = P.personId
Having Count( Distinct S1.serialNumberId ) >= 5
)

Better way to demand, in SQL, that a column contains every specified value

Imagine you have two tables, with a one to many relationship.
For this example, I will suggest that there are two tables: Person, and Homes.
The person table holds a persons name, and gives them an ID. The homes table, holds the association of homes to a person. PID joins to "Person.ID"
And, in this tiny DB, a person can have no homes, or many homes.
I hope I drew that right.
How do I write a select, that returns everyone with every specified house type?
Let's say these are valid "Types" in the homes table:
Cottage, Main, Mansion, Spaceport.
I want to return everyone, in the Person table, who has a spaceport and a Cottage.
The best I could come up with was this:
SELECT DISTINCT( p.name ) AS name
FROM person p
INNER JOIN homes h ON h.pid = p.id
WHERE 'spaceport' in (
SELECT DISTINCT( type ) AS type
FROM homes
WHERE pid = p.id
)
AND 'cottage' in (
SELECT DISTINCT( type ) AS type
FROM homes
WHERE pid = p.id
)
When I wrote that, it works, but I'm pretty sure there has to be a better way.

The HAVING clause here will guarantee that the persons returned have both types, not just one or the other.
SELECT p.name
FROM person p
INNER JOIN homes h
ON p.id = h.pid
AND h.type IN ('spaceport', 'cottage')
GROUP BY p.name
HAVING COUNT(DISTINCT h.type) = 2

select * from homes;
home_id person_id type
--
1 1 cottage
2 1 mansion
3 2 cottage
4 3 mansion
5 4 cottage
6 4 cottage
To find the id numbers of every person who has both a cottage and a mansion, group by the id number, restrict the output to cottages and mansions, and count the distinct types.
select person_id
from homes
where type in ('cottage','mansion')
group by person_id
having count(distinct type) = 2;
person_id
--
1
You can use this query in a join to get all the columns from the person table.
select person.*
from person
inner join (select person_id
from homes
where type in ('cottage','mansion')
group by person_id
having count(distinct type) = 2) T
on person.person_id = T.person_id;
Thanks to Joe for pointing out an error in my count().

Not sure about the performance on this one, but here goes:
SELECT PID FROM (
SELECT PID, COUNT(PID) cnt FROM (
SELECT DISTINCT PID, Type FROM Homes
WHERE Type IN ('Type1', 'Type2', 'Type3')
) a
GROUP BY PID
) b
WHERE b.cnt = 3
You'd have to dynamically generate your IN clause as well as the WHERE b.CNT clause.

SQL query for finding row with same column values that was created most recently

If I have three columns in my MySQL table people, say id, name, created where name is a string and created is a timestamp.. what's the appropriate query for a scenario where I have 10 rows and each row has a record with a name. The names could have a unique id, but a similar name none the less. So you can have three Bob's, two Mary's, one Jack and 4 Phil's.
There is also a hobbies table with the columns id, hobby, person_id.
Basically I want a query that will do the following:
Return all of the people with zero hobbies, but only check by the latest distinct person created, if that makes sense. Meaning if there is a Bob person that was created yesterday, and one created today.. I only want to know if the Bob created today has zero hobbies. The one from yesterday is no longer relevant.

select pp.id
from people pp, (select name, max(created) from people group by name) p
where pp.name = p.name
and pp.created = p.created
and id not in ( select person_id from hobbies )

SELECT latest_person.* FROM (
SELECT p1.* FROM people p1
WHERE NOT EXISTS (
SELECT * FROM people p2
WHERE p1.name = p2.name AND p1.created < p2.created
)
) AS latest_person
LEFT OUTER JOIN hobbies h ON h.person_id = latest_person.id
WHERE h.id IS NULL;

Try This:
Select *
From people p
Where timeStamp =
(Select Max(timestamp)
From people
Where name = p.Name
And not exists
(Select * From hobbies
Where person_id = p.id))

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

sql select records with matching subsets - sql

Related

SQL aggregate functions, inner join

SQL - Select highest value when data across 3 tables

Help construct a query given a schema

Better way to demand, in SQL, that a column contains every specified value

SQL query for finding row with same column values that was created most recently

Categories

Resources