SQL LEFT JOIN: difference between WHERE and condition inside AND [duplicate] - sql

What's the difference between
select t.*,
a.age
from t
left join a
on t.ID = a.ID and a.column > 10
and
select t.*,
a.age
from t
left join a
on t.ID = a.ID
where a.column > 10
?
Specifically, what's the difference when I put the condition on the joined table in the ON clause (after AND) versus in the WHERE clause?

With a left join there is a difference.
With the condition in the ON clause, every row of t is still returned; rows that have no match with column > 10 get NULLs in the columns from a.
With the condition in the WHERE clause, those rows are filtered out.
With an inner join there is no difference.
Example:
create table #t (id int, dummy varchar(20))
create table #a (id int, age int, col int)
insert into #t
select * from (
values
(1, 'pippo' ),
(2, 'pluto' ),
(3, 'paperino' ),
(4, 'ciccio' ),
(5, 'caio' ),
(5, 'sempronio')
) x (c1,c2)
insert into #a
select * from (
values
(1, 38, 2 ),
(2, 26, 5 ),
(3, 41, 12),
(4, 15, 11),
(5, 39, 7 )
) x (c1,c2,c3)
select t.*, a.age
from #t t
left join #a a on t.ID = a.ID and a.col > 10
Outputs:
id dummy age
1 pippo NULL
2 pluto NULL
3 paperino 41
4 ciccio 15
5 caio NULL
5 sempronio NULL
While
select t.*, a.age
from #t t
left join #a a on t.ID = a.ID
where a.col > 10
Outputs:
id dummy age
3 paperino 41
4 ciccio 15
So with a LEFT JOIN you always get all the rows from the first table.
Where the join condition is true, the columns from the joined table are filled with its values; where no row of the joined table satisfies it, those columns are NULL.
With a WHERE condition you get only the rows that satisfy the condition; a row whose joined columns are NULL fails a.col > 10 and is dropped.
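The two queries above can be reproduced with a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) instead of SQL Server, and plain table names t and a instead of the temp tables; the data is the same as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INT, dummy TEXT);
    CREATE TABLE a (id INT, age INT, col INT);
    INSERT INTO t VALUES (1, 'pippo'), (2, 'pluto'), (3, 'paperino'),
                         (4, 'ciccio'), (5, 'caio'), (5, 'sempronio');
    INSERT INTO a VALUES (1, 38, 2), (2, 26, 5), (3, 41, 12),
                         (4, 15, 11), (5, 39, 7);
""")

# Filter in ON: every row of t survives; age is NULL when no row of a passes.
on_rows = conn.execute("""
    SELECT t.id, t.dummy, a.age
    FROM t
    LEFT JOIN a ON t.id = a.id AND a.col > 10
    ORDER BY t.id, t.dummy
""").fetchall()

# Filter in WHERE: applied after the join, so NULL-extended rows are dropped.
where_rows = conn.execute("""
    SELECT t.id, t.dummy, a.age
    FROM t
    LEFT JOIN a ON t.id = a.id
    WHERE a.col > 10
    ORDER BY t.id, t.dummy
""").fetchall()

print(len(on_rows), on_rows)
print(len(where_rows), where_rows)
```

The first query keeps all six rows of t, the second keeps only the two rows whose match in a has col > 10.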

So what's the difference between them?
An explanation through examples:
CREATE TABLE Students
(
StudentId INT PRIMARY KEY,
Name VARCHAR(100)
);
CREATE TABLE Scores
(
ScoreId INT PRIMARY KEY,
ExamId INT NOT NULL,
StudentId INT NOT NULL,
Score DECIMAL(4,1) NOT NULL DEFAULT 0,
FOREIGN KEY (StudentId)
REFERENCES Students(StudentId)
);
INSERT INTO Students
(StudentId, Name) VALUES
(11,'Joe Shmoe'),
(12,'Jane Doe'),
(47,'Norma Nelson');
INSERT INTO Scores
(ScoreId, ExamId, StudentId, Score) VALUES
(1, 101, 11, 65.2),
(2, 101, 12, 72.6),
(3, 102, 11, 69.6);
--
-- Using an INNER JOIN
--
-- Only Students that have scores
-- So only when there's a match between the 2 tables
--
SELECT stu.Name, sco.Score
FROM Students AS stu
INNER JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
ORDER BY stu.Name
Name | Score
:-------- | :----
Jane Doe | 72.6
Joe Shmoe | 65.2
Joe Shmoe | 69.6
--
-- Using a LEFT JOIN
--
-- All Students, even those without scores
-- Those that couldn't be matched will show NULLs
-- for the fields from the joined table
--
SELECT stu.Name, sco.Score, sco.ScoreId
FROM Students AS stu
LEFT JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
ORDER BY stu.Name
Name | Score | ScoreId
:----------- | :---- | :------
Jane Doe | 72.6 | 2
Joe Shmoe | 65.2 | 1
Joe Shmoe | 69.6 | 3
Norma Nelson | null | null
--
-- Using a LEFT JOIN
-- But with an extra criterion in the ON clause
--
-- All Students again.
-- Scores are joined only when they are >= 66
-- Students without such a score still appear, with NULLs
--
SELECT stu.Name, sco.Score, sco.ScoreId
FROM Students AS stu
LEFT JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
AND sco.Score >= 66.0
ORDER BY stu.Name
Name | Score | ScoreId
:----------- | :---- | :------
Jane Doe | 72.6 | 2
Joe Shmoe | 69.6 | 3
Norma Nelson | null | null
--
-- Using a LEFT JOIN
-- But with an extra criterion in the WHERE clause
--
-- Only students with scores >= 66
-- The WHERE filters out the unmatched.
--
SELECT stu.Name, sco.Score
FROM Students AS stu
LEFT JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
WHERE sco.Score >= 66.0
ORDER BY stu.Name
Name | Score
:-------- | :----
Jane Doe | 72.6
Joe Shmoe | 69.6
--
-- Using an INNER JOIN
-- And with an extra criterion in the WHERE clause
--
-- Only Students that have scores >= 66
--
SELECT stu.Name, sco.Score
FROM Students AS stu
INNER JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
WHERE sco.Score >= 66
ORDER BY stu.Name
Name | Score
:-------- | :----
Jane Doe | 72.6
Joe Shmoe | 69.6
Did you notice how the criteria in the WHERE clause can make a LEFT JOIN behave like an INNER JOIN?
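That last observation can be checked directly. The sketch below assumes SQLite via Python's sqlite3 module rather than the fiddle's engine, and simplified column types; it runs the same filtered query once with LEFT JOIN and once with INNER JOIN and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (StudentId INT PRIMARY KEY, Name TEXT);
    CREATE TABLE Scores (ScoreId INT PRIMARY KEY, ExamId INT,
                         StudentId INT, Score REAL);
    INSERT INTO Students VALUES (11, 'Joe Shmoe'), (12, 'Jane Doe'),
                                (47, 'Norma Nelson');
    INSERT INTO Scores VALUES (1, 101, 11, 65.2), (2, 101, 12, 72.6),
                              (3, 102, 11, 69.6);
""")

# Same query text; only the join type differs.
query = """
    SELECT stu.Name, sco.Score
    FROM Students AS stu
    {join} JOIN Scores AS sco ON sco.StudentId = stu.StudentId
    WHERE sco.Score >= 66
    ORDER BY stu.Name
"""
left_rows = conn.execute(query.format(join="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join="INNER")).fetchall()

print(left_rows == inner_rows)
print(left_rows)
```

Norma Nelson has no score, so her NULL-extended row fails the WHERE test and both join types return the same two rows.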


Find Last Record in Chain - a Customer Merge Process

I am importing customer data from another vendor's system, and we have merge processes that we use to identify potential duplicate customer accounts and then merge them if they meet certain criteria, like same first name, last name, SSN and DOB. In this process, I am seeing that we are creating chains: for instance, Customer A is merged to Customer B, who is then merged to Customer C.
What I am hoping to do is to identify these chains and update the customer record to point to the last record in the chain. So in my example above, Customer A and Customer B would both have Customer C's id in their MergedTo field.
CustID  FName  LName  CustStatusType  isMerged  MergedTo
1       Kevin  Smith  M               1         2
2       Kevin  Smith  M               1         3
3       Kevin  Smith  M               1         4
4       Kevin  Smith  O               0         NULL
5       Mary   Jones  O               0         NULL
6       Wyatt  Earp   M               1         7
7       Wyatt  Earp   O               1         NULL
8       Bruce  Wayn   M               1         10
9       Brice  Wayne  M               1         10
10      Bruce  Wane   M               1         11
11      Bruce  Wayne  O               1         NULL
CustStatusType indicates if the customer account is open ("O") or merged ("M"). And then we have an isMerged field as a BIT field that indicates whether the account has been merged and finally the MergedTo field that indicates what customer account the record was merged to.
With the example provided, I would like CustIDs 1 and 2 to have their MergedTo set to 4, while CustID 3 already points to 4 and could either be updated or left as is. CustIDs 4, 5, and 6 are fine and do not need to be updated. But for CustIDs 8 - 10, I would like MergedTo to be set to 11, as in the table below.
CustID  FName  LName  CustStatusType  isMerged  MergedTo
1       Kevin  Smith  M               1         4
2       Kevin  Smith  M               1         4
3       Kevin  Smith  M               1         4
4       Kevin  Smith  O               0         NULL
5       Mary   Jones  O               0         NULL
6       Wyatt  Earp   M               1         7
7       Wyatt  Earp   O               1         NULL
8       Bruce  Wayn   M               1         11
9       Brice  Wayne  M               1         11
10      Bruce  Wane   M               1         11
11      Bruce  Wayne  O               1         NULL
I haven't been able to figure out how to achieve this with TSQL - suggestions?
Test Data:
DROP TABLE IF EXISTS #Customers;
CREATE TABLE #Customers
(
CustomerID INT ,
FirstName VARCHAR (25) ,
LastName VARCHAR (25) ,
CustomerStatusTypeID VARCHAR (1) ,
isMerged BIT ,
MergedTo INT
);
INSERT INTO #Customers
VALUES ( 1, 'Kevin', 'Smith', 'M', 1, 2 ) ,
( 2, 'Kevin', 'Smith', 'M', 1, 3 ) ,
( 3, 'Kevin', 'Smith', 'M', 1, 4 ) ,
( 4, 'Kevin', 'Smith', 'O', 0, NULL ) ,
( 5, 'Mary', 'Jones', 'O', 0, NULL ) ,
( 6, 'Wyatt', 'Earp', 'M', 1, 7 ) ,
( 7, 'Wyatt', 'Earp', 'O', 1, NULL ) ,
( 8, 'Bruce', 'Wayn', 'M', 1, 10 ) ,
( 9, 'Brice', 'Wayne', 'M', 1, 10 ) ,
( 10, 'Bruce', 'Wane', 'M', 1, 11 ) ,
( 11, 'Bruce', 'Wayne', 'O', 1, NULL );
SELECT *
FROM #Customers;
DROP TABLE #Customers;
For your example soundex() seems good enough. It returns a code, that is based on the word's pronunciation in English. Use it on the first and last name to join the customer table and a subquery which queries the customer table adding the row_number() partitioned by the Soundex of the names and order descending by the ID -- to number the "latest" record with 1. For the join condition use the Soundex of the names, a row number of 1 and of course inequality of the IDs.
UPDATE c1
SET c1.mergedto = x.customerid
FROM #customers c1
LEFT JOIN (SELECT c2.customerid,
soundex(c2.firstname) sefn,
soundex(c2.lastname) seln,
row_number() OVER (PARTITION BY soundex(c2.firstname),
soundex(c2.lastname)
ORDER BY c2.customerid DESC) rn
FROM #customers c2) x
ON x.sefn = soundex(c1.firstname)
AND x.seln = soundex(c1.lastname)
AND x.rn = 1
AND x.customerid <> c1.customerid;
I don't really get the concept behind the customerstatustypeid and ismerged columns. As I understand it, they're both derived from whether mergedto is null or not. But neither the sample data nor the expected result supports that. As these columns apparently don't change between your sample input and output, I guess it's all right that I just left them alone.
If Soundex proves to be insufficient for your needs you may want to look for other string distance metrics, like the Levenshtein distance. AFAIK there's no implementation of that included in SQL Server, but search engines may turn up implementations by third parties, or maybe there's something that can be used via CLR. Or you roll your own, of course.
The query below finds, for each customer, the latest matching CustomerID and returns it in the Ref column:
select *
, Ref = (select top 1 CustomerID from #Customers where soundex(FirstName) = soundex(ma.FirstName) and soundex(LastName) = soundex(ma.LastName) order by CustomerID desc)
from #Customers ma
Using the update below, you can update the MergedTo column:
;with ct as (
select *
, Ref = (select top 1 CustomerID from #Customers where soundex(FirstName) = soundex(ma.FirstName) and soundex(LastName) = soundex(ma.LastName) order by CustomerID desc)
from #Customers ma
)
update c1
set c1.MergedTo = iif(c1.CustomerID = ct.Ref, null, ct.Ref)
from #Customers c1
inner join ct on ct.CustomerID = c1.CustomerID
Recursion can be used for this:
WITH CTE as
(
SELECT P.CustomerID, P.MergedTo, CAST(P.CustomerID AS VarChar(Max)) as Levels
FROM #Customers P
WHERE P.MergedTo IS NULL
UNION ALL
SELECT P1.CustomerID, P1.MergedTo, M.Levels + ', ' + CAST(P1.CustomerID AS VarChar(Max))
FROM #Customers P1
INNER JOIN CTE M ON M.CustomerID = P1.MergedTo
)
SELECT
CustomerID
, MergedTo
, x -- "end of chain"
, Levels
FROM CTE
CROSS APPLY (
SELECT LEFT(levels,charindex(',',levels+',')-1) x
) a
WHERE MergedTo IS NOT NULL
Result:
+----+------------+----------+----+------------+
| | CustomerID | MergedTo | x | levels |
+----+------------+----------+----+------------+
| 1 | 10 | 11 | 11 | 11, 10 |
| 2 | 8 | 10 | 11 | 11, 10, 8 |
| 3 | 9 | 10 | 11 | 11, 10, 9 |
| 4 | 6 | 7 | 7 | 7, 6 |
| 5 | 3 | 4 | 4 | 4, 3 |
| 6 | 2 | 3 | 4 | 4, 3, 2 |
| 7 | 1 | 2 | 4 | 4, 3, 2, 1 |
+----+------------+----------+----+------------+
Note the string levels is formed by the recursion, and in the manner this is concatenated the first part will be the "end of chain" (see column x). That first part is extracted using a cross apply although using an apply isn't essential.
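A variation on the recursive idea carries the chain end along as its own column instead of parsing it out of a concatenated string. This is only a sketch under assumptions: it uses SQLite via Python's sqlite3 module rather than T-SQL, a trimmed-down Customers table with just the two relevant columns, and a column name Root that is not in the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, MergedTo INTEGER);
    INSERT INTO Customers VALUES
        (1, 2), (2, 3), (3, 4), (4, NULL), (5, NULL), (6, 7),
        (7, NULL), (8, 10), (9, 10), (10, 11), (11, NULL);
""")

# Anchor: records with no MergedTo are chain ends and are their own root.
# Recursive step: a record inherits the root of the record it was merged into.
roots = conn.execute("""
    WITH RECURSIVE chain(CustomerID, Root) AS (
        SELECT CustomerID, CustomerID FROM Customers WHERE MergedTo IS NULL
        UNION ALL
        SELECT c.CustomerID, chain.Root
        FROM Customers AS c
        JOIN chain ON c.MergedTo = chain.CustomerID
    )
    SELECT CustomerID, Root FROM chain WHERE CustomerID <> Root
""").fetchall()

# Point every merged record straight at the end of its chain.
conn.executemany(
    "UPDATE Customers SET MergedTo = ? WHERE CustomerID = ?",
    [(root, customer_id) for customer_id, root in roots],
)

final = dict(conn.execute("SELECT CustomerID, MergedTo FROM Customers"))
print(final)
```

Records that sit in a cyclic chain have no chain end, so the recursion never reaches them and they are left untouched.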

Displaying whole table after stripping characters in SQL Server

This question has 2 parts.
Part 1
I have a table "Groups":
group_ID person
-----------------------
1 Person 10
2 Person 11
3 Jack
4 Person 12
Note that not all data in the "person" column have the same format.
In SQL Server, I have used the following query to strip the "Person " characters out of the person column:
SELECT
REPLACE([person],'Person ','')
AS [person]
FROM Groups
I did not use UPDATE in the query above as I do not want to alter the data in the table.
The query returned this result:
person
------
10
11
12
However, I would like this result instead:
group_ID person
-------------------
1 10
2 11
3 Jack
4 12
What should be my query to achieve this result?
Part 2
I have another table "Details":
detail_ID group1 group2
-------------------------------
100 1 2
101 3 4
From the intended result in Part 1, where the numbers in the "person" column correspond to those in "group1" and "group2" of table "Details", how do I selectively convert the numbers in "person" to integers and join them with "Details"?
Note that all data under "person" in Part 1 are strings (nvarchar(100)).
Here is the intended query output:
detail_ID group1 group2
-------------------------------
100 10 11
101 Jack 12
Note that I do not wish to permanently alter anything in both tables and the intended output above is just a result of a SELECT query.
I don't think first part will be a problem here. Your query is working fine with your expected result.
Schema:
CREATE TABLE #Groups (group_ID INT, person VARCHAR(50));
INSERT INTO #Groups
SELECT 1,'Person 10'
UNION ALL
SELECT 2,'Person 11'
UNION ALL
SELECT 3,'Jack'
UNION ALL
SELECT 4,'Person 12';
CREATE TABLE #Details(detail_ID INT,group1 INT, group2 INT);
INSERT INTO #Details
SELECT 100, 1, 2
UNION ALL
SELECT 101, 3, 4 ;
Part 1:
For me your query is giving exactly what you are expecting
SELECT group_ID,REPLACE([person],'Person ','') AS person
FROM #Groups
+----------+--------+
| group_ID | person |
+----------+--------+
| 1 | 10 |
| 2 | 11 |
| 3 | Jack |
| 4 | 12 |
+----------+--------+
Part 2:
;WITH CTE AS(
SELECT group_ID
,REPLACE([person],'Person ','') AS person
FROM #Groups
)
SELECT D.detail_ID, G1.person AS group1, G2.person AS group2
FROM #Details D
INNER JOIN CTE G1 ON D.group1 = G1.group_ID
INNER JOIN CTE G2 ON D.group2 = G2.group_ID
Result:
+-----------+--------+--------+
| detail_ID | group1 | group2 |
+-----------+--------+--------+
| 100 | 10 | 11 |
| 101 | Jack | 12 |
+-----------+--------+--------+
Try following query, it should give you the desired output.
;WITH MT AS
(
SELECT
group_ID, REPLACE([person],'Person ','') AS [person]
FROM Groups
)
SELECT D.detail_ID, MT1.person AS group1, MT2.person AS group2
FROM
Details D
INNER JOIN MT MT1 ON MT1.group_ID = D.group1
INNER JOIN MT MT2 ON MT2.group_ID = D.group2
The first query works
create table #T (id int primary key, name varchar(10));
insert into #T values
(1, 'Person 10')
, (2, 'Person 11')
, (3, 'Jack')
, (4, 'Person 12');
create table #G (id int primary key, grp1 int, grp2 int);
insert into #G values
(100, 1, 2)
, (101, 3, 4);
with cte as
( select t.id, t.name, ltrim(rtrim(replace(t.name, 'person', ''))) as sp
from #T t
)
-- select * from cte order by cte.id;
select g.id, c1.sp as grp1, c2.sp as grp2
from #G g
join cte c1
on c1.id = g.grp1
join cte c2
on c2.id = g.grp2
order
by g.id;
id grp1 grp2
----------- ----------- -----------
100 10 11
101 Jack 12
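The pattern shared by these answers (strip the prefix once in a CTE, then join that CTE twice, once per group column) can be sketched as follows. The sketch assumes SQLite through Python's sqlite3 module in place of SQL Server; the CTE name stripped is made up for the illustration. No integer conversion is needed, because the join is on group_ID and the stripped values stay strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Groups (group_ID INT PRIMARY KEY, person TEXT);
    CREATE TABLE Details (detail_ID INT PRIMARY KEY, group1 INT, group2 INT);
    INSERT INTO Groups VALUES (1, 'Person 10'), (2, 'Person 11'),
                              (3, 'Jack'), (4, 'Person 12');
    INSERT INTO Details VALUES (100, 1, 2), (101, 3, 4);
""")

# The CTE strips the prefix without touching the stored data;
# it is joined twice, once for each group column of Details.
rows = conn.execute("""
    WITH stripped AS (
        SELECT group_ID, REPLACE(person, 'Person ', '') AS person
        FROM Groups
    )
    SELECT d.detail_ID, g1.person AS group1, g2.person AS group2
    FROM Details AS d
    JOIN stripped AS g1 ON g1.group_ID = d.group1
    JOIN stripped AS g2 ON g2.group_ID = d.group2
    ORDER BY d.detail_ID
""").fetchall()

print(rows)
```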

Join tables with distinct highest ranked row

I have three tables defined like this:
[tbMember]
memberID | memberName
1 | John
2 | Peter
[tbGroup]
groupID | groupName
1 | Alpha
2 | Beta
3 | Gamma
[tbMemberGroupRelation]
memberID | groupID | memberRank (larger number is higher)
1 | 1 | 0
1 | 2 | 1
2 | 1 | 5
2 | 2 | 3
2 | 3 | 1
And now I want to perform a table-join selection to get result which contains (distinct) member with his highest ranked group in each row, for the given example above, the query result is desired to be:
memberID | memberName | groupName | memberRank
1 | John | Beta | 1
2 | Peter | Alpha | 5
Is there a way to implement it in a single SQL like following style ?
select * from tbMember m
left join tbMemberGroupRelation mg on (m.MemberID = mg.MemberID and ......)
left join tbGroup g on (mg.GroupID = g.GroupID)
Any other solutions are also appreciated if it is impossible to write in a simple query.
========= UPDATED =========
Only ONE highest rank is allowed in table
One solution would be to create an inverted sequence/rank of the memberRank so that the highest rank per member is always equal to 1.
This is how I achieved it using a sub-query:
SELECT
m.memberID,
m.memberName,
g.groupName,
mg.memberRank
FROM
tbMember m
LEFT JOIN
(
SELECT
memberID,
groupID,
memberRank,
RANK() OVER(PARTITION BY memberID ORDER BY memberRank DESC) AS invRank
FROM
tbMemberGroupRelation
) mg
ON (mg.memberID = m.memberID)
AND (mg.invRank = 1)
LEFT JOIN
tbGroup g
ON (g.groupID = mg.groupID);
An alternative method:
SELECT
M.memberID,
M.memberName,
G.groupName,
MG.memberRank
FROM
tbMember M
LEFT OUTER JOIN tbMemberGroupRelation MG ON MG.memberID = M.memberID
LEFT OUTER JOIN tbMemberGroupRelation MG2 ON
MG2.memberID = M.memberID AND
MG2.memberRank > MG.memberRank
INNER JOIN tbGroup G ON G.groupid = MG.groupid
WHERE
MG2.memberid IS NULL
Might perform better in some situations due to indexing, etc.
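The self-join method can be tried against the question's sample data with a small sketch. It assumes SQLite via Python's sqlite3 module rather than SQL Server, and it uses a LEFT JOIN to tbGroup (an assumption that differs from the INNER JOIN above) so that a member without any group would still be listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbMember (memberID INT, memberName TEXT);
    CREATE TABLE tbGroup (groupID INT, groupName TEXT);
    CREATE TABLE tbMemberGroupRelation (memberID INT, groupID INT,
                                        memberRank INT);
    INSERT INTO tbMember VALUES (1, 'John'), (2, 'Peter');
    INSERT INTO tbGroup VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO tbMemberGroupRelation VALUES
        (1, 1, 0), (1, 2, 1), (2, 1, 5), (2, 2, 3), (2, 3, 1);
""")

# mg2 looks for a higher-ranked relation of the same member.
# Keeping only rows where none exists leaves the top-ranked relation.
rows = conn.execute("""
    SELECT m.memberID, m.memberName, g.groupName, mg.memberRank
    FROM tbMember AS m
    LEFT JOIN tbMemberGroupRelation AS mg
           ON mg.memberID = m.memberID
    LEFT JOIN tbMemberGroupRelation AS mg2
           ON mg2.memberID = m.memberID
          AND mg2.memberRank > mg.memberRank
    LEFT JOIN tbGroup AS g ON g.groupID = mg.groupID
    WHERE mg2.memberID IS NULL
    ORDER BY m.memberID
""").fetchall()

print(rows)
```

Note that the rank comparison sits in the ON clause while the IS NULL test sits in WHERE; swapping them would change the result.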
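create table [tbMember] (memberid int, membername varchar(8000))
Insert [tbMember] Values (1, 'John')
Insert [tbMember] Values (2, 'Peter')
create table [tbGroup] (groupid int, groupname varchar(8000))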
Insert [tbGroup] Values (1, 'Alpha')
Insert [tbGroup] Values (2, 'Beta')
Insert [tbGroup] Values (3, 'Gamma')
create table [tbMemberGroupRelation] (memberid int, groupid int, memberrank int)
Insert [tbMemberGroupRelation] Values (1,1,0)
Insert [tbMemberGroupRelation] Values (1,2,1)
Insert [tbMemberGroupRelation] Values (2,1,5)
Insert [tbMemberGroupRelation] Values (2,2,3)
Insert [tbMemberGroupRelation] Values (2,3,1)
;With cteMemberGroupRelation As
(
Select *, Row_Number() Over (Partition By MemberID Order By MemberRank Desc) SortOrder
From [tbMemberGroupRelation]
)
Select *
From tbMember M
Join (Select * From cteMemberGroupRelation Where SortOrder = 1) R On R.memberid = M.memberid
Join tbGroup G On G.groupid = R.groupid

Find exactly matched groups in a many to many relationship table

Table shown below maps the many-to-many relationship between courses and students.
CREATE Table CourseStudents
(
CourseId INT NOT NULL,
StudentId INT NOT NULL,
PRIMARY KEY (CourseId, StudentId)
);
INSERT INTO CourseStudents VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 2),
(4, 3), (4, 2), (5, 1)
Example data
| CourseId | StudentId |
|----------|-----------|
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 2 |
| 3 | 3 |
| 4 | 2 |
| 4 | 3 |
| 5 | 1 |
I'm looking for a query that returns all courses that have the exact same students. I was able to come up with the query shown below.
WITH CourseGroups AS
(
SELECT c.CourseId,
STUFF ((
SELECT ',' + CAST(c2.StudentId AS VARCHAR)
FROM CourseStudents c2
WHERE c2.CourseId = c.CourseId
ORDER BY c2.StudentId
FOR XML PATH ('')), 1, 1, '') AS StudentList
FROM CourseStudents c
GROUP BY c.CourseId)
SELECT cg.StudentList,
STUFF ((
SELECT ',' + CAST(cg2.CourseId AS VARCHAR(10))
FROM CourseGroups cg2
WHERE cg2.StudentList = cg.StudentList
FOR XML PATH ('')), 1, 1, '') AS ExactMatchCourseList
FROM CourseGroups cg
GROUP BY cg.StudentList
HAVING COUNT(*) > 1
Query returns
| StudentList | ExactMatchCourseList |
|-------------|----------------------|
| 1,2 | 1,2 |
| 2,3 | 3,4 |
Above result is fine. But I only need the ExactMatchCourseList.
The table I'm dealing has more than a billion rows so I need an efficient query that can find any matched courses within a few minutes of run time. Appreciate any help.
SqlFiddle
This only does 2 runs over your CourseStudents table, instead of the 4 you're currently doing. And if you add an index on CourseId on the CourseStudents table, the first run will only be an index scan. It also only runs the original STUFF once for each course, instead of once for each student, then grouping by course. I left out the final STUFF that I wasn't sure if you wanted or if it was just a byproduct of how you were calculating it.
CREATE TABLE #Course
(
CourseId INT NOT NULL PRIMARY KEY
);
INSERT INTO #Course
SELECT CourseId
FROM
CourseStudents s
GROUP BY
CourseId
ORDER BY
CourseId;
CREATE TABLE #CourseStudentList
(
CourseId INT NOT NULL PRIMARY KEY,
StudentList VARCHAR(MAX) NOT NULL
);
INSERT INTO #CourseStudentList
SELECT
c.CourseId,
STUFF ((
SELECT ',' + CAST(c2.StudentId AS VARCHAR)
FROM CourseStudents c2
WHERE c2.CourseId = c.CourseId
ORDER BY c2.StudentId
FOR XML PATH ('')), 1, 1, '') AS StudentList
FROM
#Course c
ORDER BY
c.CourseId;
SELECT *
FROM
(
SELECT
l.CourseId,
l.StudentList,
COUNT(*) OVER (PARTITION BY l.StudentList) AS [Count]
FROM
#CourseStudentList l
) l
WHERE
l.[Count] > 1
ORDER BY
l.StudentList;
This will give you a list of course pairs, although if you are going to get triplicates (or more) then you'll end up with some extra results. I don't have time to toy with this further to correct that issue, but maybe it points you in the right direction:
WITH CTE_CourseMatches AS (
SELECT
CS1.CourseId AS CourseId_1,
CS2.CourseId AS CourseId_2,
COUNT(*) AS cnt
FROM
CourseStudents CS1
INNER JOIN CourseStudents CS2 ON CS2.StudentId = CS1.StudentId AND CS2.CourseId > CS1.CourseId
GROUP BY
CS1.CourseId,
CS2.CourseId
),
CTE_CourseCounts AS (SELECT CourseId, COUNT(*) AS cnt FROM CourseStudents GROUP BY CourseID)
SELECT
CM.CourseId_1,
CM.CourseId_2
FROM
CTE_CourseMatches CM
INNER JOIN CTE_CourseCounts CC1 ON CC1.CourseId = CM.CourseId_1 AND CC1.cnt = CM.cnt
INNER JOIN CTE_CourseCounts CC2 ON CC2.CourseId = CM.CourseId_2 AND CC2.cnt = CM.cnt
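To illustrate the underlying idea on the sample data, including groups of three or more identical courses, the sketch below builds one ordered student list per course and groups courses by that list. It is only an illustration under assumptions: it uses SQLite via Python's sqlite3 module and does the grouping client-side, which would not be practical for a billion-row table.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CourseStudents (
        CourseId INT NOT NULL,
        StudentId INT NOT NULL,
        PRIMARY KEY (CourseId, StudentId)
    );
    INSERT INTO CourseStudents VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 3),
                                      (3, 2), (4, 3), (4, 2), (5, 1);
""")

# One ordered student list per course (the role STUFF/FOR XML PATH plays above).
students_by_course = defaultdict(list)
for course_id, student_id in conn.execute(
    "SELECT CourseId, StudentId FROM CourseStudents "
    "ORDER BY CourseId, StudentId"
):
    students_by_course[course_id].append(student_id)

# Courses with an identical student list land in the same bucket.
courses_by_students = defaultdict(list)
for course_id, students in students_by_course.items():
    courses_by_students[tuple(students)].append(course_id)

matches = sorted(c for c in courses_by_students.values() if len(c) > 1)
print(matches)
```

Because every course is placed in exactly one bucket, three identical courses come out as a single group instead of three overlapping pairs.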

Joining tables based on the maximum value

Here's a simplified example of what I'm talking about:
Table: students exam_results
_____________ ____________________________________
| id | name | | id | student_id | score | date |
|----+------| |----+------------+-------+--------|
| 1 | Jim | | 1 | 1 | 73 | 8/1/09 |
| 2 | Joe | | 2 | 1 | 67 | 9/2/09 |
| 3 | Jay | | 3 | 1 | 93 | 1/3/09 |
|____|______| | 4 | 2 | 27 | 4/9/09 |
| 5 | 2 | 17 | 8/9/09 |
| 6 | 3 | 100 | 1/6/09 |
|____|____________|_______|________|
Assume, for the sake of this question, that every student has at least one exam result recorded.
How would you select each student along with their highest score? Edit: ...AND the other fields in that record?
Expected output:
_________________________
| name | score | date |
|------+-------|--------|
| Jim | 93 | 1/3/09 |
| Joe | 27 | 4/9/09 |
| Jay | 100 | 1/6/09 |
|______|_______|________|
Answers using all types of DBMS are welcome.
Answering the EDITED question (i.e. to get associated columns as well).
In Sql Server 2005+, the best approach would be to use a ranking/window function in conjunction with a CTE, like this:
with exam_data as
(
select r.student_id, r.score, r.date,
row_number() over(partition by r.student_id order by r.score desc) as rn
from exam_results r
)
select s.name, d.score, d.date, d.student_id
from students s
join exam_data d
on s.id = d.student_id
where d.rn = 1;
For an ANSI-SQL compliant solution, a subquery and self-join will work, like this:
select s.name, r.student_id, r.score, r.date
from (
select r.student_id, max(r.score) as max_score
from exam_results r
group by r.student_id
) d
join exam_results r
on r.student_id = d.student_id
and r.score = d.max_score
join students s
on s.id = r.student_id;
This last one assumes there are no duplicate student_id/max_score combinations. If there are, and you want to de-duplicate them, you'll need another subquery that joins on something deterministic to decide which record to pull. For example, assuming a student can't have multiple records with the same date, to break a tie in favor of the most recent max_score you'd do something like the following:
select s.name, r3.student_id, r3.score, r3.date, r3.other_column_a, ...
from (
select r2.student_id, r2.score as max_score, max(r2.date) as max_score_max_date
from (
select r1.student_id, max(r1.score) as max_score
from exam_results r1
group by r1.student_id
) d
join exam_results r2
on r2.student_id = d.student_id
and r2.score = d.max_score
group by r2.student_id, r2.score
) r
join exam_results r3
on r3.student_id = r.student_id
and r3.score = r.max_score
and r3.date = r.max_score_max_date
join students s
on s.id = r3.student_id;
EDIT: Added proper de-duplicating query thanks to Mark's good catch in comments
SELECT s.name,
COALESCE(MAX(er.score), 0) AS high_score
FROM STUDENTS s
LEFT JOIN EXAM_RESULTS er ON er.student_id = s.id
GROUP BY s.name
Try this,
Select students.name, max(exam_results.score) As Score from students
INNER JOIN
exam_results
ON students.id = exam_results.student_id
GROUP BY
students.name
With Oracle's analytic functions this is easy:
SELECT DISTINCT
students.name
,FIRST_VALUE(exam_results.score)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS score
,FIRST_VALUE(exam_results.date)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS date
FROM students, exam_results
WHERE students.id = exam_results.student_id;
Select S.Name, T.Score, er.date
from Students S inner join
(Select Student_ID, Max(Score) as Score from Exam_Results
Group by Student_ID) T
On S.id = T.Student_ID inner join Exam_Results er
On er.Student_ID = T.Student_ID And er.Score = T.Score
Using MS SQL Server:
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
If there is a tied score, the oldest date is returned (change date ASC to date DESC to return the most recent instead).
Output:
Jim 93 2009-01-03 00:00:00.000
Joe 27 2009-04-09 00:00:00.000
Jay 100 2009-01-06 00:00:00.000
Test bed:
CREATE TABLE students(id int , name nvarchar(20) );
CREATE TABLE exam_results(id int , student_id int , score int, date datetime);
INSERT INTO students
VALUES
(1,'Jim'),(2,'Joe'),(3,'Jay')
INSERT INTO exam_results VALUES
(1, 1, 73, '8/1/09'),
(2, 1, 93, '9/2/09'),
(3, 1, 93, '1/3/09'),
(4, 2, 27, '4/9/09'),
(5, 2, 17, '8/9/09'),
(6, 3, 100, '1/6/09')
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
On MySQL, I think you can change the TOP(1) to a LIMIT 1 at the end of the statement. I have not tested this though.
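The LIMIT 1 form can at least be checked on SQLite, which also uses LIMIT; this says nothing definite about MySQL. The sketch assumes Python's sqlite3 module, drops the redundant DISTINCT join, and stores the test-bed dates as ISO strings (read as month/day/year, matching the output above) because SQLite has no datetime type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INT, name TEXT);
    CREATE TABLE exam_results (id INT, student_id INT, score INT, date TEXT);
    INSERT INTO students VALUES (1, 'Jim'), (2, 'Joe'), (3, 'Jay');
    INSERT INTO exam_results VALUES
        (1, 1, 73, '2009-08-01'), (2, 1, 93, '2009-09-02'),
        (3, 1, 93, '2009-01-03'), (4, 2, 27, '2009-04-09'),
        (5, 2, 17, '2009-08-09'), (6, 3, 100, '2009-01-06');
""")

# For each result row, keep it only if it is that student's best row:
# highest score first, oldest date as the tie-breaker.
rows = conn.execute("""
    SELECT s.name, r.score, r.date
    FROM exam_results AS r
    JOIN students AS s ON s.id = r.student_id
    WHERE r.id = (
        SELECT r2.id
        FROM exam_results AS r2
        WHERE r2.student_id = r.student_id
        ORDER BY r2.score DESC, r2.date ASC
        LIMIT 1
    )
    ORDER BY s.id
""").fetchall()

print(rows)
```

Jim has two results with a score of 93; the date ASC tie-breaker picks the older one, so exactly one row per student comes back.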