Join tables with distinct highest ranked row - sql

I have three tables defined like this:
[tbMember]
memberID | memberName
1 | John
2 | Peter
[tbGroup]
groupID | groupName
1 | Alpha
2 | Beta
3 | Gamma
[tbMemberGroupRelation]
memberID | groupID | memberRank (larger number is higher)
1 | 1 | 0
1 | 2 | 1
2 | 1 | 5
2 | 2 | 3
2 | 3 | 1
Now I want a join query that returns one row per (distinct) member, showing the member together with his highest-ranked group. For the example above, the desired result is:
memberID | memberName | groupName | memberRank
1 | John | Beta | 1
2 | Peter | Alpha | 5
Is there a way to do this in a single SQL statement, in a style like the following?
select * from tbMember m
left join tbMemberGroupRelation mg on (m.MemberID = mg.MemberID and ......)
left join tbGroup g on (mg.GroupID = g.GroupID)
Other solutions are also welcome if this cannot be written as a single simple query.
========= UPDATED =========
Only ONE highest rank per member is allowed in the table.

One solution would be to create an inverted sequence/rank of the memberRank so that the highest rank per member is always equal to 1.
This is how I achieved it using a sub-query:
SELECT
m.memberID,
m.memberName,
g.groupName,
mg.memberRank
FROM
tbMember m
LEFT JOIN
(
SELECT
memberID,
groupID,
memberRank,
RANK() OVER(PARTITION BY memberID ORDER BY memberRank DESC) AS invRank
FROM
tbMemberGroupRelation
) mg
ON (mg.memberID = m.memberID)
AND (mg.invRank = 1)
LEFT JOIN
tbGroup g
ON (g.groupID = mg.groupID);
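To check the window-function approach against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. Using SQLite and Python is my assumption for the sake of a self-contained test (the question does not name a DBMS), and window functions need SQLite 3.25 or newer. The subquery selects only columns that exist in tbMemberGroupRelation.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbMember (memberID INT, memberName TEXT);
CREATE TABLE tbGroup (groupID INT, groupName TEXT);
CREATE TABLE tbMemberGroupRelation (memberID INT, groupID INT, memberRank INT);
INSERT INTO tbMember VALUES (1, 'John'), (2, 'Peter');
INSERT INTO tbGroup VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO tbMemberGroupRelation VALUES
    (1, 1, 0), (1, 2, 1), (2, 1, 5), (2, 2, 3), (2, 3, 1);
""")

# Rank each member's rows from highest memberRank down, then keep rank 1.
query = """
SELECT m.memberID, m.memberName, g.groupName, mg.memberRank
FROM tbMember m
LEFT JOIN (
    SELECT memberID, groupID, memberRank,
           RANK() OVER (PARTITION BY memberID ORDER BY memberRank DESC) AS invRank
    FROM tbMemberGroupRelation
) mg ON mg.memberID = m.memberID AND mg.invRank = 1
LEFT JOIN tbGroup g ON g.groupID = mg.groupID
ORDER BY m.memberID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because RANK() gives tied rows the same number, a member with two equally high ranks would come back twice; that is fine here only because the update rules out ties.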

An alternative method:
SELECT
M.memberID,
M.memberName,
G.groupName,
MG.memberRank
FROM
tbMember M
LEFT OUTER JOIN tbMemberGroupRelation MG ON MG.memberID = M.memberID
LEFT OUTER JOIN tbMemberGroupRelation MG2 ON
MG2.memberID = M.memberID AND
MG2.memberRank > MG.memberRank
INNER JOIN tbGroup G ON G.groupid = MG.groupid
WHERE
MG2.memberid IS NULL
This might perform better in some situations, depending on the available indexes.

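create table [tbMember] (memberid int, membername varchar(8000))
Insert [tbMember] Values (1, 'John')
Insert [tbMember] Values (2, 'Peter')
create table [tbGroup] (groupid int, groupname varchar(8000))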
Insert [tbGroup] Values (1, 'Alpha')
Insert [tbGroup] Values (2, 'Beta')
Insert [tbGroup] Values (3, 'Gamma')
create table [tbMemberGroupRelation] (memberid int, groupid int, memberrank int)
Insert [tbMemberGroupRelation] Values (1,1,0)
Insert [tbMemberGroupRelation] Values (1,2,1)
Insert [tbMemberGroupRelation] Values (2,1,5)
Insert [tbMemberGroupRelation] Values (2,2,3)
Insert [tbMemberGroupRelation] Values (2,3,1)
;With cteMemberGroupRelation As
(
Select *, Row_Number() Over (Partition By MemberID Order By MemberRank Desc) SortOrder
From [tbMemberGroupRelation]
)
Select *
From tbMember M
Join (Select * From cteMemberGroupRelation Where SortOrder = 1) R On R.memberid = M.memberid
Join tbGroup G On G.groupid = R.groupid

Related

SQL Query : how to Select Maximum value of each group of joined tables

I need to return the oldest player of each club from two tables: tblPlayers (34 records), joined to another table called tblClubs (9 records).
tblPlayers fields are:
ID(Autonumber) | CLubID(Number) | Player Name(Text) | PlayerAge(Number)
tblClubs fields are:
ID(Autonumber) | ClubName (Text)
Now I need to show the name of the oldest player in each club, with the club name beside it, like this:
Club Name | Player Name | Maximum Age (oldest player of each club)
How can I do this?
You can solve this with window functions, if your database supports them:
select c.club_name, p.player_name, p.player_age
from clubs c
inner join (
select
p.*,
rank() over(partition by p.club_id order by p.player_age desc) rn
from players p
) p on p.club_id = c.id and p.rn = 1
A common and quite portable alternative is to filter with a subquery:
select
c.club_name,
p.player_name,
p.player_age
from players p
inner join clubs c on c.id = p.club_id
where p.player_age = (select max(p1.player_age) from players p1 where p1.club_id = p.club_id)
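The correlated-subquery filter can be tried out with a small sketch like the one below. It assumes SQLite via Python's sqlite3 module, the snake_case table and column names used in this answer, and a handful of made-up sample rows (including a deliberate tie in the second club); none of that comes from the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clubs (id INT, club_name TEXT);
CREATE TABLE players (id INT, club_id INT, player_name TEXT, player_age INT);
INSERT INTO clubs VALUES (1, 'TEAM 1'), (2, 'TEAM 2');
-- Hypothetical rows; David and Paul tie for oldest in TEAM 2.
INSERT INTO players VALUES
    (1, 1, 'John', 30), (2, 1, 'Mark', 25), (3, 1, 'Albert', 36),
    (4, 2, 'David', 33), (5, 2, 'Paul', 33);
""")

# Keep a player only if his age equals the maximum age within his own club.
query = """
SELECT c.club_name, p.player_name, p.player_age
FROM players p
INNER JOIN clubs c ON c.id = p.club_id
WHERE p.player_age = (SELECT MAX(p1.player_age)
                      FROM players p1
                      WHERE p1.club_id = p.club_id)
ORDER BY c.club_name, p.player_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that both tied players are returned for the second club. The rank() version behaves the same way; switch to row_number() with an extra ordering column if exactly one row per club is required.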
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE tblPlayers (ID INT, ClubID INT,PlayerName VARCHAR(255)
,PlayerAge INT)
CREATE TABLE tblClubs (ID int,ClubName VARCHAR(255))
INSERT INTO tblPlayers(ID,ClubID,PlayerName
,PlayerAge) VALUES (1,1,'John',30)
,(2,1,'Mark',25)
,(3,1,'Albert',36)
,(4,2,'David',33)
,(5,2,'John',31)
INSERT INTO tblClubs(ID, ClubName) VALUES(1,'TEAM 1')
,(2,'TEAM 2')
Query 1:
SELECT
*
FROM
(SELECT A.*,RANK() OVER
(PARTITION BY A.ClubID ORDER BY A.PlayerAge desc) as rn
FROM tblPlayers A
INNER JOIN tblClubs B ON A.ClubID=B.ID) t
WHERE
t.rn = 1
Results:
| ID | ClubID | PlayerName | PlayerAge | rn |
|----|--------|------------|-----------|----|
| 3 | 1 | Albert | 36 | 1 |
| 4 | 2 | David | 33 | 1 |

Join one table with two other ones by id

I am trying to join one table with two others that are unrelated to each other but are linked to the first one by an id.
I have the following tables:
create table groups(
id int,
name text
);
create table members(
id int,
groupid int,
name text
);
create table invites(
id int,
groupid int,
status int -- 2 for accepted, 1 if it's pending
);
Then I inserted the following data
insert into groups (id, name) values(1,'group');
insert into members(id, groupid, name) values(1,1,'admin'),(1,1,'other');
insert into invites(id, groupid, status) values(1,1,2),(2,1,1),(3,1,1);
Obs:
The admin does not have an invite
The group has an approved invitation with status 2 (because the member 'other' joined)
The group has two pending invites with status 1
I am trying to do a query that gets the following result
groupid | name | inviteId
1 | admin | null
1 | other | null
1 | null | 2
1 | null | 3
I have tried the following queries with no luck:
select g.id, m.name, i.id from groups g
left join members m ON m.groupid = g.id
left join invites i ON i.groupid = g.id and i.status = 1;
select g.id, m.name, i.id from groups g
join (select groupid, name from members) m ON m.groupid = g.id
join (select groupid, id from invites where status = 1) i ON i.groupid = g.id;
Any ideas of what I am doing wrong?
Because members and invites are not related, you need to use two separate queries and use UNION (automatically removes duplicates) or UNION ALL (keeps duplicates) to get the output you desire:
select g.id as groupid, m.name, null as inviteid from groups g
join members m ON m.groupid = g.id
union all
select g.id, null, i.id from groups g
join invites i ON (i.groupid = g.id and i.status = 1);
Output:
groupid | name | inviteid
---------+-------+----------
1 | admin |
1 | other |
1 | | 3
1 | | 2
(4 rows)
Without a UNION, your query implies that the tables are related to each other, so their columns are joined side by side. Since you want to keep the null values, which implies that the tables are not related, you need to stack the two result sets vertically with UNION.
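To make the difference concrete, here is a rough sketch that runs both shapes of query on the sample data from the question. It assumes SQLite through Python's sqlite3 module rather than the PostgreSQL session shown above, and it quotes the table name "groups" as a precaution since that word is reserved in some engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "groups" (id INT, name TEXT);
CREATE TABLE members (id INT, groupid INT, name TEXT);
CREATE TABLE invites (id INT, groupid INT, status INT);  -- 2 = accepted, 1 = pending
INSERT INTO "groups" VALUES (1, 'group');
INSERT INTO members VALUES (1, 1, 'admin'), (1, 1, 'other');
INSERT INTO invites VALUES (1, 1, 2), (2, 1, 1), (3, 1, 1);
""")

# Joining both tables to groups pairs every member with every pending invite.
side_by_side = conn.execute("""
SELECT g.id, m.name, i.id
FROM "groups" g
LEFT JOIN members m ON m.groupid = g.id
LEFT JOIN invites i ON i.groupid = g.id AND i.status = 1
""").fetchall()

# UNION ALL stacks the two result sets and pads the missing column with NULL.
stacked = conn.execute("""
SELECT g.id AS groupid, m.name, NULL AS inviteid
FROM "groups" g JOIN members m ON m.groupid = g.id
UNION ALL
SELECT g.id, NULL, i.id
FROM "groups" g JOIN invites i ON i.groupid = g.id AND i.status = 1
""").fetchall()

print(side_by_side)
print(stacked)
```

Both queries return four rows here, but the joined version fills every row with a member name and an invite id, while the stacked version keeps members and pending invites on separate rows as requested.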
Disclosure: I work for EnterpriseDB (EDB)

Need to get multiple value from a table in the left join

I have a table having data as below.
Say I have two versions of the project and I need to migrate data from older version to a new version.
Let's say tblFolders in version1
+----+------------+--------------+--------------+
| id | FolderName | CreatedBy | ModifiedBy |
+----+------------+--------------+--------------+
| 1 | SIMPLE | 5 | 6 |
| 2 | SIMPLE1 | 8 | 1 |
+----+------------+--------------+--------------+
And another table having userid of both versions.
Let's say its tblUsersMapping
+----+----------------+-------------------+
| id | Version1UserID | Version2UserID |
+----+----------------+-------------------+
| 1 | 1 | 500 |
| 2 | 2 | 465 |
| 3 | 3 | 12 |
| 4 | 4 | 85 |
| 5 | 5 | 321 |
| 6 | 6 | 21 |
| 7 | 7 | 44 |
| 8 | 8 | 884 |
+----+----------------+-------------------+
Now I need to transfer data from version 1 to version 2. When transferring the data, the CreatedBy and ModifiedBy ids should be those of the new version.
So although I have data as below
| 1 | SIMPLE | 5 | 6 |
It should be transferred as below
| 1 | SIMPLE | 321 | 21 |
For that, I have added a join so far between these two tables as below.
SELECT id,
foldername,
B.version2userid AS CreatedBy
FROM tblfolders A WITH(nolock)
LEFT JOIN tblusersmapping B WITH(nolock)
ON A.createdby = B.version1userid
This would give me a proper result for column CreatedBy.
But how can I get userid from tblUsersMapping for ModifiedBy column?
Doing below will not work and will give NULL for both the columns.
SELECT id,
foldername,
b.version2userid AS createdby,
b.version2userid AS modifiedby
FROM tblfolders A WITH(nolock)
LEFT JOIN tblusersmapping B WITH(nolock)
ON a.createdby = b.version1userid
AND a.modifiedby = b.version1userid
One way is to add another join to the tblusersmapping table. But that is not a good idea, because the tables can hold a huge amount of data and another join will affect the performance of the query.
My question is how can I get Version1UserID and Version2UserID from mapping table based on createdby and modifiedby columns?
You can use one scalar subquery per column, which may help you.
SELECT id,
foldername,
(SELECT version2userid from tblUsersMapping where Version1UserID=tblfolders.CreatedBy) AS CreatedBy,
(SELECT version2userid from tblUsersMapping where Version1UserID=tblfolders.ModifiedBy) AS ModifiedBy
FROM tblfolders
If you want to populate both columns, where each column joins to a different row, you have to join the same table twice, as follows. You can't get it with a single join the way you are expecting.
SELECT A.id,
A.foldername,
B.version2userid AS CreatedBy,
C.Version2UserID AS ModifiedBy
FROM tblfolders A WITH(nolock)
LEFT JOIN tblusersmapping B WITH(nolock)
ON A.createdby = B.version1userid
LEFT JOIN tblusersmapping C WITH(nolock)
ON A.ModifiedBy = C.version1userid
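As a quick check of the join-twice approach, here is a minimal sketch on the sample data from the question. It assumes SQLite via Python's sqlite3 module, so the SQL Server specific WITH(nolock) hints are left out; the aliases c and m are just illustrative names for the two copies of the mapping table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblFolders (id INT, FolderName TEXT, CreatedBy INT, ModifiedBy INT);
CREATE TABLE tblUsersMapping (id INT, Version1UserID INT, Version2UserID INT);
INSERT INTO tblFolders VALUES (1, 'SIMPLE', 5, 6), (2, 'SIMPLE1', 8, 1);
INSERT INTO tblUsersMapping VALUES
    (1, 1, 500), (2, 2, 465), (3, 3, 12), (4, 4, 85),
    (5, 5, 321), (6, 6, 21), (7, 7, 44), (8, 8, 884);
""")

# One copy of the mapping table resolves CreatedBy, the other resolves ModifiedBy.
query = """
SELECT f.id, f.FolderName,
       c.Version2UserID AS CreatedBy,
       m.Version2UserID AS ModifiedBy
FROM tblFolders f
LEFT JOIN tblUsersMapping c ON f.CreatedBy = c.Version1UserID
LEFT JOIN tblUsersMapping m ON f.ModifiedBy = m.Version1UserID
ORDER BY f.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each alias is an independent lookup, which is why a single join cannot serve both columns: one joined row can only match one Version1UserID at a time.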
Try this; it will work across all the sample data:
select tf.id,tf.FolderName
,oa.Version2UserID as CreatedBy
,oa1.Version2UserID as ModifiedBy
from #tblFolders tf
outer apply(select top 1 Version2UserID
from #tblUsersMapping tu
where tu.Version1UserID= tf.CreatedBy order by id desc)oa
outer apply(select top 1 Version2UserID
from #tblUsersMapping tu
where tu.Version1UserID= tf.ModifiedBy order by id desc)oa1
You can use a UDF to return modifiedby, and INNER JOIN instead of LEFT JOIN (if that meets the requirement), as below. I think it will help performance.
CREATE TABLE tblFolders (id INT, folderName VARCHAR(20), createdBy INT, modifiedBy INT)
INSERT INTO tblFolders VALUES
(1,'SIMPLE', 5,6),
(2,'SIMPLE1', 8,1)
CREATE TABLE tblUsersMapping(id INT, Version1UserID INT, Version2UserID INT)
INSERT INTO tblUsersMapping VALUES
(1,1,500),
(2,2,465),
(3,3,12),
(4,4,85),
(5,5,321),
(6,6,21),
(7,7,44),
(8,8,884)
SELECT a.id,
a.foldername,
b.version2userid AS createdby,
dbo.FNAReturnModifiedBy(a.modifiedBy) AS modifiedby
FROM tblfolders A WITH(nolock)
INNER JOIN tblusersmapping B WITH(nolock) ON a.createdby = b.version1userid
--Function
IF OBJECT_ID(N'dbo.FNAReturnModifiedBy', N'FN') IS NOT NULL
DROP FUNCTION dbo.FNAReturnModifiedBy
GO
CREATE FUNCTION dbo.FNAReturnModifiedBy(@updated_by INT)
RETURNS INT AS
BEGIN
DECLARE @updateUserID INT
SELECT @updateUserID = Version2UserID
FROM tblusersmapping WHERE Version1UserID = @updated_by
RETURN @updateUserID
END
OUTPUT:
id foldername createdby modifiedby
1 SIMPLE 321 21
2 SIMPLE1 884 500
Note:
I do not know how to measure query performance; I wrote this only to produce your expected output.
I am using SQL Server 2012.
I did not use more than one join.
The query uses JOIN, GROUP BY, ROW_NUMBER() and CASE instead of two LEFT JOINs.
Input :
create table ##ver (id int, FolderName varchar (10), CreatedBy int, ModifiedBy int)
insert into ##ver values
(1,'SIMPLE',5,6)
,(2,'SIMPLE1',8,1)
,(3,'File',7, 5)
select * from ##ver
create table ##veruser (id int, Version1UserID int, Version2UserID int)
insert into ##veruser values
(1 , 1 , 500)
,(2 , 2 , 465)
,(3 , 3 , 12 )
,(4 , 4 , 85 )
,(5 , 5 , 321)
,(6 , 6 , 21 )
,(7 , 7 , 44 )
,(8 , 8 , 884)
select * from ##veruser
Query :
select
id, FolderName
, max (case when rn = 1 then Version2UserID end) Version1UserID
, max (case when rn = 2 then Version2UserID end) Version2UserID
from (
select
v.id, v.FolderName, u.Version1UserID, u.Version2UserID
, ROW_NUMBER () over
(partition by v.id order by v.id, v.CreatedBy,
case
when v.CreatedBy > v.ModifiedBy then u.Version1UserID
end desc
) rn
, v.CreatedBy, v.ModifiedBy
from ##ver v
join ##veruser u
on u.Version1UserID in (v.CreatedBy, v.ModifiedBy)
) a
group by id, FolderName
order by id
Update 1:
What the query does:
Joins the tables.
Numbers the rows with ROW_NUMBER() OVER (),
partitioned by id.
Orders by file id (v.id), then creator id ascending; if the creator id is greater
than the modifier id, then by creator id descending. (This reordering is required because of the second step.)
Depending on the 'rn' value, the rows are pivoted into columns.
(You can find many examples here)
Output :
id FolderName Version1UserID Version2UserID
1 SIMPLE 321 21
2 SIMPLE1 884 500
3 File 44 321
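A variation on the same single-join idea, not the query above: instead of numbering the rows, each CASE can test which role the mapping row plays. As far as I can tell the ROW_NUMBER ordering above leaves the order unspecified when CreatedBy is smaller than ModifiedBy (the CASE yields NULL for both rows), so this sketch avoids relying on row order at all. It assumes SQLite via Python's sqlite3 module, plain table names ver and veruser in place of the ##temp tables, and one extra hypothetical folder (id 4) whose creator and modifier are the same user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ver (id INT, FolderName TEXT, CreatedBy INT, ModifiedBy INT);
CREATE TABLE veruser (id INT, Version1UserID INT, Version2UserID INT);
INSERT INTO ver VALUES
    (1, 'SIMPLE', 5, 6), (2, 'SIMPLE1', 8, 1), (3, 'File', 7, 5),
    (4, 'Same', 2, 2);  -- hypothetical row: same creator and modifier
INSERT INTO veruser VALUES
    (1, 1, 500), (2, 2, 465), (3, 3, 12), (4, 4, 85),
    (5, 5, 321), (6, 6, 21), (7, 7, 44), (8, 8, 884);
""")

# One join brings in both mapping rows; each CASE picks the row for its role.
query = """
SELECT v.id, v.FolderName,
       MAX(CASE WHEN u.Version1UserID = v.CreatedBy
                THEN u.Version2UserID END) AS CreatedBy,
       MAX(CASE WHEN u.Version1UserID = v.ModifiedBy
                THEN u.Version2UserID END) AS ModifiedBy
FROM ver v
JOIN veruser u ON u.Version1UserID IN (v.CreatedBy, v.ModifiedBy)
GROUP BY v.id, v.FolderName
ORDER BY v.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

When the creator and modifier are the same user, the join yields a single row and both CASE expressions match it, so both output columns are still filled.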
Try this one.
Select a.id, a.folderName, b.Version2UserId as createdby, c.Version2UserId as modifiedby
from tblFolders as a WITH(nolock)
inner join tblUsersMapping as b WITH(nolock) on a.createdby = b.Version1UserID
inner join tblUsersMapping as c WITH(nolock) on a.modifiedBy = c.Version1UserID

SQL Joins: Select status of reviews submitted by employees and also the list of employees who have not submitted the review for each year

For each year, for each employee, I want to list the status of the review that employee had submitted, or "not initiated" in case employee did not submit review for that year.
It's kind of difficult to express the question in words therefore I would try to explain it by giving example:
create table #employees
(
empid int,
name varchar(100)
)
Create table #review
(
empid int,
ryear int,
status varchar(20)
)
insert into #review values(1,2016,'S2')
insert into #review values(2,2016,'S2')
insert into #review values(2,2017,'S1')
insert into #review values(3,2017,'S2')
insert into #employees values(1,'jack')
insert into #employees values(2,'mack')
insert into #employees values(3,'rack')
insert into #employees values(4,'tack')
Wrong Query
select a.empid
,a.name
,b.ryear
,case isnull(b.status,'')
when ''
then 'Not Initiated'
else status
end as status
from #employees as a
left join #review as b
on a.empid = b.empid
and b.ryear in(select distinct
ryear
from #review
);--something like that
Expected Result:
+-------+------+-------+----------------+
| empid | name | ryear | status |
+-------+------+-------+----------------+
| 1 | jack | 2016 | S2 |
| 1 | jack | 2017 | Not Initiated |
| 2 | mack | 2016 | S2 |
| 2 | mack | 2017 | S1 |
| 3 | rack | 2016 | Not Initiated |
| 3 | rack | 2017 | S2 |
| 4 | tack | 2016 | Not Initiated |
| 4 | tack | 2017 | Not Initiated |
+-------+------+-------+----------------+
You could use a cross join on your subquery:
select a.empid
,a.name
,c.ryear
,isnull(b.status,'Not Initiated') as status
from #employees as a
cross join(select distinct
ryear
from #review
) as c
left join #review as b
on b.ryear = c.ryear
and a.empid = b.empid
order by a.empid, ryear
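Here is a small sketch of the cross join idea on the sample data from the question. It assumes SQLite via Python's sqlite3 module, so the #temp tables become ordinary tables and COALESCE stands in for the two-argument ISNULL of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (empid INT, name TEXT);
CREATE TABLE review (empid INT, ryear INT, status TEXT);
INSERT INTO review VALUES
    (1, 2016, 'S2'), (2, 2016, 'S2'), (2, 2017, 'S1'), (3, 2017, 'S2');
INSERT INTO employees VALUES (1, 'jack'), (2, 'mack'), (3, 'rack'), (4, 'tack');
""")

# The cross join builds every (employee, year) pair; the left join then
# attaches a review where one exists and leaves NULL where it does not.
query = """
SELECT e.empid, e.name, y.ryear,
       COALESCE(r.status, 'Not Initiated') AS status
FROM employees e
CROSS JOIN (SELECT DISTINCT ryear FROM review) y
LEFT JOIN review r ON r.empid = e.empid AND r.ryear = y.ryear
ORDER BY e.empid, y.ryear
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The list of years comes from the review table itself, so a year in which nobody submitted a review would not appear at all; a separate calendar of years would be needed for that case.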
This supposedly supports the presence of more than one review by the same employee in the same year:
SELECT Employees.empid, Employees.[name], ReviewYears.[Value]
, [Status] = ISNULL(LatestReviews.[status], 'Not Initiated')
FROM #employees AS Employees
CROSS JOIN (SELECT DISTINCT ryear AS [Value] FROM #review) AS ReviewYears -- We need some source of years, hopefully there are no missing years here.
LEFT JOIN (
SELECT *
FROM (
SELECT empid, ryear, [status]
, RN = ROW_NUMBER() OVER(PARTITION BY empid, ryear ORDER BY STATUS DESC) -- Per employee and year, we'll take only one status, hopefully we can order by statuses.
FROM #review
) AS T
WHERE RN = 1 -- Refer to comments at creation of RN.
) AS LatestReviews ON LatestReviews.empid = Employees.empid AND LatestReviews.ryear = ReviewYears.[Value] -- Refer to comments at creation of RN.
ORDER BY Employees.empid ASC, ReviewYears.[Value] ASC
Here's one using a common table expression;
It also supports multiple reviews in the same year for an employee.
WITH X AS
(SELECT distinct ryear FROM #review)
SELECT a.empid, a.name, X.ryear, isnull(b.status,'Not initiated')
FROM X as x
LEFT
JOIN #employees as a
ON 1=1
LEFT
JOIN #review as b
ON a.empid = b.empid
AND b.ryear = x.ryear
ORDER
BY a.empid,x.ryear

Joining tables based on the maximum value

Here's a simplified example of what I'm talking about:
Table: students exam_results
_____________ ____________________________________
| id | name | | id | student_id | score | date |
|----+------| |----+------------+-------+--------|
| 1 | Jim | | 1 | 1 | 73 | 8/1/09 |
| 2 | Joe | | 2 | 1 | 67 | 9/2/09 |
| 3 | Jay | | 3 | 1 | 93 | 1/3/09 |
|____|______| | 4 | 2 | 27 | 4/9/09 |
| 5 | 2 | 17 | 8/9/09 |
| 6 | 3 | 100 | 1/6/09 |
|____|____________|_______|________|
Assume, for the sake of this question, that every student has at least one exam result recorded.
How would you select each student along with their highest score? Edit: ...AND the other fields in that record?
Expected output:
_________________________
| name | score | date |
|------+-------|--------|
| Jim | 93 | 1/3/09 |
| Joe | 27 | 4/9/09 |
| Jay | 100 | 1/6/09 |
|______|_______|________|
Answers using all types of DBMS are welcome.
Answering the EDITED question (i.e. to get associated columns as well).
In Sql Server 2005+, the best approach would be to use a ranking/window function in conjunction with a CTE, like this:
with exam_data as
(
select r.student_id, r.score, r.date,
row_number() over(partition by r.student_id order by r.score desc) as rn
from exam_results r
)
select s.name, d.score, d.date, d.student_id
from students s
join exam_data d
on s.id = d.student_id
where d.rn = 1;
For an ANSI-SQL compliant solution, a subquery and self-join will work, like this:
select s.name, r.student_id, r.score, r.date
from (
select r.student_id, max(r.score) as max_score
from exam_results r
group by r.student_id
) d
join exam_results r
on r.student_id = d.student_id
and r.score = d.max_score
join students s
on s.id = r.student_id;
This last one assumes there aren't duplicate student_id/max_score combinations. If there are, and/or you want to de-duplicate them, you'll need another subquery to join to, with something deterministic to decide which record to pull. For example, assuming you can't have multiple records for a given student with the same date, if you wanted to break a tie by taking the most recent max_score, you'd do something like the following:
select s.name, r3.student_id, r3.score, r3.date, r3.other_column_a, ...
from (
select r2.student_id, r2.score as max_score, max(r2.date) as max_score_max_date
from (
select r1.student_id, max(r1.score) as max_score
from exam_results r1
group by r1.student_id
) d
join exam_results r2
on r2.student_id = d.student_id
and r2.score = d.max_score
group by r2.student_id, r2.score
) r
join exam_results r3
on r3.student_id = r.student_id
and r3.score = r.max_score
and r3.date = r.max_score_max_date
join students s
on s.id = r3.student_id;
EDIT: Added proper de-duplicating query thanks to Mark's good catch in comments
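Where window functions are available, the same tie-break can also be expressed by adding the date to the ORDER BY of row_number(). The sketch below is only an illustration under assumptions: it runs on SQLite (3.25 or newer) through Python's sqlite3 module, stores dates as ISO text so they sort correctly, and changes Jim's second score to 93 to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INT, name TEXT);
CREATE TABLE exam_results (id INT, student_id INT, score INT, date TEXT);
INSERT INTO students VALUES (1, 'Jim'), (2, 'Joe'), (3, 'Jay');
-- Jim has two results with score 93, on different dates.
INSERT INTO exam_results VALUES
    (1, 1, 73, '2009-08-01'), (2, 1, 93, '2009-09-02'), (3, 1, 93, '2009-01-03'),
    (4, 2, 27, '2009-04-09'), (5, 2, 17, '2009-08-09'), (6, 3, 100, '2009-01-06');
""")

# Highest score first; among equal scores the most recent date wins.
query = """
WITH exam_data AS (
    SELECT r.student_id, r.score, r.date,
           ROW_NUMBER() OVER (PARTITION BY r.student_id
                              ORDER BY r.score DESC, r.date DESC) AS rn
    FROM exam_results r
)
SELECT s.name, d.score, d.date
FROM students s
JOIN exam_data d ON d.student_id = s.id
WHERE d.rn = 1
ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Changing r.date DESC to r.date ASC would pick the earliest of the tied results instead.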
SELECT s.name,
COALESCE(MAX(er.score), 0) AS high_score
FROM STUDENTS s
LEFT JOIN EXAM_RESULTS er ON er.student_id = s.id
GROUP BY s.name
Try this,
Select students.name, max(exam_results.score) As Score from students
INNER JOIN
exam_results
ON students.id = exam_results.student_id
GROUP BY
students.name
With Oracle's analytic functions this is easy:
SELECT DISTINCT
students.name
,FIRST_VALUE(exam_results.score)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS score
,FIRST_VALUE(exam_results.date)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS date
FROM students, exam_results
WHERE students.id = exam_results.student_id;
Select Name, T.Score, er.date
from Students S inner join
(Select Student_ID,Max(Score) as Score from Exam_Results
Group by Student_ID) T
On S.id=T.Student_ID inner join Exam_Results er
On er.Student_ID = T.Student_ID And er.Score=T.Score
Using MS SQL Server:
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
If there is a tied score, the oldest date is returned (change date ASC to date DESC to return the most recent instead).
Output:
Jim 93 2009-01-03 00:00:00.000
Joe 27 2009-04-09 00:00:00.000
Jay 100 2009-01-06 00:00:00.000
Test bed:
CREATE TABLE students(id int , name nvarchar(20) );
CREATE TABLE exam_results(id int , student_id int , score int, date datetime);
INSERT INTO students
VALUES
(1,'Jim'),(2,'Joe'),(3,'Jay')
INSERT INTO exam_results VALUES
(1, 1, 73, '8/1/09'),
(2, 1, 93, '9/2/09'),
(3, 1, 93, '1/3/09'),
(4, 2, 27, '4/9/09'),
(5, 2, 17, '8/9/09'),
(6, 3, 100, '1/6/09')
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
On MySQL, I think you can drop the TOP(1) and put a LIMIT 1 at the end of the subquery instead. I have not tested this though.