Joining tables based on the maximum value - sql

Here's a simplified example of what I'm talking about:
Table: students
 ___________
| id | name |
|----+------|
| 1  | Jim  |
| 2  | Joe  |
| 3  | Jay  |
|____|______|
Table: exam_results
 ___________________________________
| id | student_id | score | date   |
|----+------------+-------+--------|
| 1  | 1          | 73    | 8/1/09 |
| 2  | 1          | 67    | 9/2/09 |
| 3  | 1          | 93    | 1/3/09 |
| 4  | 2          | 27    | 4/9/09 |
| 5  | 2          | 17    | 8/9/09 |
| 6  | 3          | 100   | 1/6/09 |
|____|____________|_______|________|
Assume, for the sake of this question, that every student has at least one exam result recorded.
How would you select each student along with their highest score? Edit: ...AND the other fields in that record?
Expected output:
_________________________
| name | score | date |
|------+-------|--------|
| Jim | 93 | 1/3/09 |
| Joe | 27 | 4/9/09 |
| Jay | 100 | 1/6/09 |
|______|_______|________|
Answers using all types of DBMS are welcome.

Answering the EDITED question (i.e. to get associated columns as well).
In SQL Server 2005+, the best approach would be to use a ranking/window function in conjunction with a CTE, like this:
with exam_data as
(
select r.student_id, r.score, r.date,
row_number() over(partition by r.student_id order by r.score desc) as rn
from exam_results r
)
select s.name, d.score, d.date, d.student_id
from students s
join exam_data d
on s.id = d.student_id
where d.rn = 1;
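If you don't have SQL Server at hand, the same CTE can be tried out in SQLite. The following is only a sketch: it assumes Python's bundled sqlite3 module with SQLite 3.25 or later (needed for window functions) and loads the sample rows from the question, keeping the dates as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE exam_results (id INTEGER PRIMARY KEY, student_id INTEGER,
                               score INTEGER, date TEXT);
    INSERT INTO students VALUES (1, 'Jim'), (2, 'Joe'), (3, 'Jay');
    INSERT INTO exam_results VALUES
        (1, 1, 73, '8/1/09'), (2, 1, 67, '9/2/09'), (3, 1, 93, '1/3/09'),
        (4, 2, 27, '4/9/09'), (5, 2, 17, '8/9/09'), (6, 3, 100, '1/6/09');
""")

# Number each student's results from highest to lowest score,
# then keep only the first row per student.
top_scores = conn.execute("""
    WITH exam_data AS (
        SELECT r.student_id, r.score, r.date,
               ROW_NUMBER() OVER (PARTITION BY r.student_id
                                  ORDER BY r.score DESC) AS rn
        FROM exam_results r
    )
    SELECT s.name, d.score, d.date
    FROM students s
    JOIN exam_data d ON s.id = d.student_id
    WHERE d.rn = 1
    ORDER BY s.id
""").fetchall()

for row in top_scores:
    print(row)
```

With the question's data this gives one row per student, matching the expected output table.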
For an ANSI-SQL compliant solution, a subquery and self-join will work, like this:
select s.name, r.student_id, r.score, r.date
from (
select r.student_id, max(r.score) as max_score
from exam_results r
group by r.student_id
) d
join exam_results r
on r.student_id = d.student_id
and r.score = d.max_score
join students s
on s.id = r.student_id;
This last one assumes there are no duplicate student_id/max_score combinations. If there are, and you want to de-duplicate them, you'll need another subquery that joins on something deterministic to decide which record to pull. For example, assuming a given student can't have multiple records with the same date, you could break a tie by taking the most recent record with the max score, like this:
select s.name, r3.student_id, r3.score, r3.date, r3.other_column_a, ...
from (
select r2.student_id, r2.score as max_score, max(r2.date) as max_score_max_date
from (
select r1.student_id, max(r1.score) as max_score
from exam_results r1
group by r1.student_id
) d
join exam_results r2
on r2.student_id = d.student_id
and r2.score = d.max_score
group by r2.student_id, r2.score
) r
join exam_results r3
on r3.student_id = r.student_id
and r3.score = r.max_score
and r3.date = r.max_score_max_date
join students s
on s.id = r3.student_id;
EDIT: Added proper de-duplicating query thanks to Mark's good catch in comments
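To see the difference between the plain max-score join and the de-duplicating version, here is a small sketch run against SQLite from Python. The rows are illustrative (student 1 has two results tied at 93), and the dates are stored as ISO strings, which is my assumption so that MAX(date) compares chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exam_results (id INTEGER PRIMARY KEY, student_id INTEGER,
                               score INTEGER, date TEXT);
    -- Illustrative data: student 1 has two results tied at 93.
    INSERT INTO exam_results VALUES
        (1, 1, 73, '2009-08-01'),
        (2, 1, 93, '2009-09-02'),
        (3, 1, 93, '2009-01-03'),
        (4, 2, 27, '2009-04-09');
""")

# Joining on (student_id, max score) alone keeps both tied rows.
naive = conn.execute("""
    SELECT r.student_id, r.score, r.date
    FROM (SELECT student_id, MAX(score) AS max_score
          FROM exam_results GROUP BY student_id) d
    JOIN exam_results r
      ON r.student_id = d.student_id AND r.score = d.max_score
    ORDER BY r.student_id, r.date
""").fetchall()

# A second aggregate (latest date among the tied rows) breaks the tie.
deduped = conn.execute("""
    SELECT r3.student_id, r3.score, r3.date
    FROM (SELECT r2.student_id, r2.score AS max_score,
                 MAX(r2.date) AS max_score_max_date
          FROM (SELECT student_id, MAX(score) AS max_score
                FROM exam_results GROUP BY student_id) d
          JOIN exam_results r2
            ON r2.student_id = d.student_id AND r2.score = d.max_score
          GROUP BY r2.student_id, r2.score) r
    JOIN exam_results r3
      ON r3.student_id = r.student_id
     AND r3.score = r.max_score
     AND r3.date = r.max_score_max_date
    ORDER BY r3.student_id
""").fetchall()

print(len(naive), len(deduped))  # 3 rows versus 2 rows
```

The naive join returns student 1 twice; the de-duplicating query keeps only the 2009-09-02 row for that student.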

SELECT s.name,
COALESCE(MAX(er.score), 0) AS high_score
FROM STUDENTS s
LEFT JOIN EXAM_RESULTS er ON er.student_id = s.id
GROUP BY s.name

Try this,
Select students.name, max(exam_results.score) As Score from students
INNER JOIN
exam_results
ON students.id = exam_results.student_id
GROUP BY
students.name

With Oracle's analytic functions this is easy:
SELECT DISTINCT
students.name
,FIRST_VALUE(exam_results.score)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS score
,FIRST_VALUE(exam_results.date)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS date
FROM students, exam_results
WHERE students.id = exam_results.student_id;

Select S.Name, T.Score, er.date
from Students S inner join
(Select Student_ID,Max(Score) as Score from Exam_Results
Group by Student_ID) T
On S.id=T.Student_ID inner join Exam_Results er
On er.Student_ID = T.Student_ID And er.Score=T.Score

Using MS SQL Server:
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
If there is a tied score, the oldest date is returned (change date ASC to date DESC to return the most recent instead).
Output:
Jim 93 2009-01-03 00:00:00.000
Joe 27 2009-04-09 00:00:00.000
Jay 100 2009-01-06 00:00:00.000
Test bed:
CREATE TABLE students(id int , name nvarchar(20) );
CREATE TABLE exam_results(id int , student_id int , score int, date datetime);
INSERT INTO students
VALUES
(1,'Jim'),(2,'Joe'),(3,'Jay')
INSERT INTO exam_results VALUES
(1, 1, 73, '8/1/09'),
(2, 1, 93, '9/2/09'),
(3, 1, 93, '1/3/09'),
(4, 2, 27, '4/9/09'),
(5, 2, 17, '8/9/09'),
(6, 3, 100, '1/6/09')
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
On MySQL, I think you can drop the TOP(1) and put LIMIT 1 at the end of the subquery instead. I have not tested this though.
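For what it's worth, the LIMIT 1 form of this correlated subquery can be checked in SQLite (not MySQL, so treat it only as a hint that the shape is right). This sketch assumes Python's sqlite3 module and loads the test-bed rows above with the dates written as ISO strings, following the dates shown in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, name TEXT);
    CREATE TABLE exam_results (id INTEGER, student_id INTEGER,
                               score INTEGER, date TEXT);
    INSERT INTO students VALUES (1, 'Jim'), (2, 'Joe'), (3, 'Jay');
    INSERT INTO exam_results VALUES
        (1, 1, 73, '2009-08-01'), (2, 1, 93, '2009-09-02'),
        (3, 1, 93, '2009-01-03'), (4, 2, 27, '2009-04-09'),
        (5, 2, 17, '2009-08-09'), (6, 3, 100, '2009-01-06');
""")

# For each result row, keep it only if its id is the one the subquery picks:
# highest score first, oldest date as the tie-breaker.
best = conn.execute("""
    SELECT s.name, r.score, r.date
    FROM exam_results r
    JOIN students s ON r.student_id = s.id
    WHERE r.id = (SELECT t2.id
                  FROM exam_results t2
                  WHERE t2.student_id = r.student_id
                  ORDER BY t2.score DESC, t2.date ASC
                  LIMIT 1)
    ORDER BY s.id
""").fetchall()

for row in best:
    print(row)
```

Jim's two 93s are resolved in favour of the older date, as in the SQL Server output.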

Related

SQL Query : how to Select Maximum value of each group of joined tables

I'm having trouble returning the maximum age of players from two tables: tblPlayers (34 records) joined to tblClubs (9 records).
tblPlayers fields are:
ID(Autonumber) | ClubID(Number) | Player Name(Text) | PlayerAge(Number)
tblClubs fields are:
ID(Autonumber) | ClubName (Text)
I need to show, for each club, the name of its oldest player with the club name beside it, like this:
Club Name | Player Name | Maximum Age (oldest player of each club)
How can I do this?
You can solve this with window functions, if your database supports them:
select c.club_name, p.player_name, p.player_age
from clubs c
inner join (
select
p.*,
rank() over(partition by p.club_id order by p.player_age desc) rn
from players p
) p on p.club_id = c.id and p.rn = 1
A common and quite portable alternative is to filter with a subquery:
select
c.club_name,
p.player_name,
p.player_age
from players p
inner join clubs c on c.id = p.club_id
where p.player_age = (select max(p1.player_age) from players p1 where p1.club_id = p.club_id)
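Both approaches return every player tied for oldest in a club, which may or may not be what you want. Here is a rough sketch in SQLite via Python (assuming SQLite 3.25+ for RANK). The tables players and clubs and their snake_case columns follow the two queries above rather than the question's tblPlayers/tblClubs, and the player 'Peter' is a hypothetical row added to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clubs (id INTEGER, club_name TEXT);
    CREATE TABLE players (id INTEGER, club_id INTEGER,
                          player_name TEXT, player_age INTEGER);
    INSERT INTO clubs VALUES (1, 'TEAM 1'), (2, 'TEAM 2');
    INSERT INTO players VALUES
        (1, 1, 'John', 30), (2, 1, 'Mark', 25), (3, 1, 'Albert', 36),
        (4, 2, 'David', 33), (5, 2, 'John', 31),
        (6, 2, 'Peter', 33);  -- hypothetical row that ties with David
""")

# Window-function version: RANK() gives every tied player rank 1.
with_rank = conn.execute("""
    SELECT c.club_name, p.player_name, p.player_age
    FROM clubs c
    JOIN (SELECT p.*, RANK() OVER (PARTITION BY p.club_id
                                   ORDER BY p.player_age DESC) AS rn
          FROM players p) p
      ON p.club_id = c.id AND p.rn = 1
    ORDER BY c.club_name, p.player_name
""").fetchall()

# Correlated-subquery version: keep players whose age equals the club maximum.
with_subquery = conn.execute("""
    SELECT c.club_name, p.player_name, p.player_age
    FROM players p
    JOIN clubs c ON c.id = p.club_id
    WHERE p.player_age = (SELECT MAX(p1.player_age)
                          FROM players p1
                          WHERE p1.club_id = p.club_id)
    ORDER BY c.club_name, p.player_name
""").fetchall()

print(with_rank == with_subquery)  # True
```

If only one player per club is wanted, ROW_NUMBER() with an extra tie-breaking ORDER BY column would be the usual substitute for RANK().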
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE tblPlayers (ID INT, ClubID INT,PlayerName VARCHAR(255)
,PlayerAge INT)
CREATE TABLE tblClubs (ID int,ClubName VARCHAR(255))
INSERT INTO tblPlayers(ID,ClubID,PlayerName
,PlayerAge) VALUES (1,1,'John',30)
,(2,1,'Mark',25)
,(3,1,'Albert',36)
,(4,2,'David',33)
,(5,2,'John',31)
INSERT INTO tblClubs(ID, ClubName) VALUES(1,'TEAM 1')
,(2,'TEAM 2')
Query 1:
SELECT
*
FROM
(SELECT A.*,RANK() OVER
(PARTITION BY A.ClubID ORDER BY A.PlayerAge desc) as rn
FROM tblPlayers A
INNER JOIN tblClubs B ON A.ClubID=B.ID) t
WHERE
t.rn = 1
Results:
| ID | ClubID | PlayerName | PlayerAge | rn |
|----|--------|------------|-----------|----|
| 3 | 1 | Albert | 36 | 1 |
| 4 | 2 | David | 33 | 1 |

SQL - Distinct count between two tables

I'm having a mind lapse on what I believe is a relatively easy script. Hopefully I'm overthinking the logic.
What I'm trying to do is perform two counts on a distinct column which is right joined.
What I want is:
count(a.book_id) as count_of_books
count(b.book_ref_number) as count_of_losses
Expected Output
--------------------------------------------------------
| Book | count_of_books | count of losses|
--------------------------------------------------------
|Hunger Games | 76 | 31 |
--------------------------------------------------------
|Hop on Pop | 27 | 6 |
--------------------------------------------------------
|Pout Pout Fish | 138 | 43 |
--------------------------------------------------------
I have tried a couple different scripts. Here are the two scripts I've tried.
select
(select count(*) from Inventory_Table x ) Count1,
(select count(*) from Loss_table b ) Count2
from Inventory_Table x
right join Loss_table b on b.book_ref_number = x.book_id
where rownum < 20
select
a.book_name,
count(distinct a.book_id),
count(b.book_ref_number)
from Inventory_Table a
right join Loss_table b on trim(b.book_ref_number) = trim(a.book_id)
Results I get
--------------------------------------------------------
| Book | count_of_books | count of losses|
--------------------------------------------------------
|Moby Dick | 4376 | 2574 |
--------------------------------------------------------
I'm looking for guidance in my neglectful mistake. Thank you in advance
The rownum < 20 condition doesn't make sense where it is: it limits the result set to 19 rows before anything is grouped.
Try this:
select * from (
select
a.mrch_Nr,
count(distinct a.fdr_trac_nr),
count(b.auth_id)
from DATASTORE_FD.DEB_CRD_AUTH_LOG_REC a
right join jordab26.ft b on trim(b.auth_id) = trim(a.fdr_trac_nr)
where a.auth_log_dt between '20200101' and '20200408'
group by a.mrch_nr
)
where rownum < 20
Try this; I'm not sure about rownum < 20. Also, make sure you add the correct GROUP BY clause.
select sum(case when book_id is null then 0 else 1 end ) count_of_books,
sum(case when book_ref_number is null then 0 else 1 end ) count_of_losses
from Inventory_Table x
right join Loss_table b on b.book_ref_number = x.book_id
where rownum < 20
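One thing to keep in mind when counting with CASE: the simple form CASE col WHEN NULL ... never matches, because col = NULL is never true, whereas the searched form CASE WHEN col IS NULL ... does. A quick check in SQLite from Python (the one-column table t is hypothetical, made up just for this test):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: three rows, one of them NULL.
    CREATE TABLE t (book_id INTEGER);
    INSERT INTO t VALUES (1), (NULL), (3);
""")

simple_case, searched_case, plain_count = conn.execute("""
    SELECT
        SUM(CASE book_id WHEN NULL THEN 0 ELSE 1 END),      -- NULL row still counted
        SUM(CASE WHEN book_id IS NULL THEN 0 ELSE 1 END),   -- NULL row skipped
        COUNT(book_id)                                      -- NULL row skipped
    FROM t
""").fetchone()

print(simple_case, searched_case, plain_count)  # 3 2 2
```

COUNT(column) already ignores NULLs, so it gives the same number as the searched CASE with less typing.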
Is this what you want?
Select a.book_name,
count(distinct a.book_id)
+ sum(case when a.book_id IS NULL then 1 else 0 end),
count(distinct b.id) as lossid
From Inventory_Table a
Left Join Loss_table b
On a.book_id = b.book_ref_number
Group by a.book_name
SELECT book_name,COUNT(book_id),COUNT(book_ref_number) FROM Inventory_Table right join Loss_table on book_ref_number = book_id GROUP BY book_name
But if you need all the books in Inventory and only matching books from Loss_table then it should be left join:
SELECT book_name,COUNT(book_id),COUNT(book_ref_number) FROM Inventory_Table left join Loss_table on book_ref_number = book_id GROUP BY book_name
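To illustrate what the LEFT JOIN plus GROUP BY version counts, here is a small sketch in SQLite via Python. The data is hypothetical (one inventory row per title, one loss row per lost copy), since the question does not show the real table contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: one inventory row per title, one loss row per lost copy.
    CREATE TABLE Inventory_Table (book_id INTEGER, book_name TEXT);
    CREATE TABLE Loss_table (book_ref_number INTEGER);
    INSERT INTO Inventory_Table VALUES
        (1, 'Hunger Games'), (2, 'Hop on Pop'), (3, 'Pout Pout Fish');
    INSERT INTO Loss_table VALUES (1), (1), (3);
""")

per_book = conn.execute("""
    SELECT i.book_name,
           COUNT(i.book_id)          AS joined_rows,     -- inflated by the join
           COUNT(DISTINCT i.book_id) AS distinct_books,
           COUNT(l.book_ref_number)  AS count_of_losses  -- NULLs are not counted
    FROM Inventory_Table i
    LEFT JOIN Loss_table l ON l.book_ref_number = i.book_id
    GROUP BY i.book_name
    ORDER BY i.book_name
""").fetchall()

for row in per_book:
    print(row)
```

Two things show up: 'Hop on Pop' is kept with zero losses because of the LEFT JOIN, and COUNT(book_id) for 'Hunger Games' is 2 rather than 1 because each matching loss row repeats the inventory row.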

SQL Joins: Select status of reviews submitted by employees and also the list of employees who have not submitted the review for each year

For each year, for each employee, I want to list the status of the review that employee had submitted, or "not initiated" in case employee did not submit review for that year.
It's kind of difficult to express the question in words therefore I would try to explain it by giving example:
create table #employees
(
empid int,
name varchar(100)
)
Create table #review
(
empid int,
ryear int,
status varchar(20)
)
insert into #review values(1,2016,'S2')
insert into #review values(2,2016,'S2')
insert into #review values(2,2017,'S1')
insert into #review values(3,2017,'S2')
insert into #employees values(1,'jack')
insert into #employees values(2,'mack')
insert into #employees values(3,'rack')
insert into #employees values(4,'tack')
Wrong Query
select a.empid
,a.name
,b.ryear
,case isnull(b.status,'')
when ''
then 'Not Initiated'
else status
end as status
from #employees as a
left join #review as b
on a.empid = b.empid
and b.ryear in(select distinct
ryear
from #review
);--something like that
Expected Result:
+-------+------+-------+---------------+
| empid | name | ryear | status        |
+-------+------+-------+---------------+
| 1     | jack | 2016  | S2            |
| 1     | jack | 2017  | Not Initiated |
| 2     | mack | 2016  | S2            |
| 2     | mack | 2017  | S1            |
| 3     | rack | 2016  | Not Initiated |
| 3     | rack | 2017  | S2            |
| 4     | tack | 2016  | Not Initiated |
| 4     | tack | 2017  | Not Initiated |
+-------+------+-------+---------------+
You could use a cross join on your sub query
select a.empid
,a.name
,c.ryear
,isnull(b.status,'Not Initiated') as status
from #employees as a
cross join(select distinct
ryear
from #review
) as c
left join #review as b
on b.ryear = c.ryear
and a.empid = b.empid
order by a.empid, c.ryear
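The cross join idea can be verified outside SQL Server too. A minimal sketch in SQLite via Python, with two assumptions of mine: the temp tables are renamed employees and review (SQLite has no #temp syntax), and COALESCE stands in for ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (empid INTEGER, name TEXT);
    CREATE TABLE review (empid INTEGER, ryear INTEGER, status TEXT);
    INSERT INTO review VALUES (1, 2016, 'S2'), (2, 2016, 'S2'),
                              (2, 2017, 'S1'), (3, 2017, 'S2');
    INSERT INTO employees VALUES (1, 'jack'), (2, 'mack'),
                                 (3, 'rack'), (4, 'tack');
""")

# Build every (employee, year) pair first, then attach the review if one exists.
rows = conn.execute("""
    SELECT a.empid, a.name, c.ryear,
           COALESCE(b.status, 'Not Initiated') AS status
    FROM employees a
    CROSS JOIN (SELECT DISTINCT ryear FROM review) c
    LEFT JOIN review b
           ON b.ryear = c.ryear AND b.empid = a.empid
    ORDER BY a.empid, c.ryear
""").fetchall()

for row in rows:
    print(row)
```

This yields 4 employees x 2 years = 8 rows, with 'Not Initiated' wherever no review row matched.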
This supposedly supports the presence of more than one review by the same employee in the same year:
SELECT Employees.empid, Employees.[name], ReviewYears.[Value]
, [Status] = ISNULL(LatestReviews.[status], 'Not Initiated')
FROM #employees AS Employees
CROSS JOIN (SELECT DISTINCT ryear AS [Value] FROM #review) AS ReviewYears -- We need some source of years, hopefully there are no missing years here.
LEFT JOIN (
SELECT *
FROM (
SELECT empid, ryear, [status]
, RN = ROW_NUMBER() OVER(PARTITION BY empid, ryear ORDER BY STATUS DESC) -- Per employee and year, we'll take only one status, hopefully we can order by statuses.
FROM #review
) AS T
WHERE RN = 1 -- Refer to comments at creation of RN.
) AS LatestReviews ON LatestReviews.empid = Employees.empid AND LatestReviews.ryear = ReviewYears.[Value] -- Refer to comments at creation of RN.
ORDER BY Employees.empid ASC, ReviewYears.[Value] ASC
Here's one using a common table expression.
It also supports multiple reviews in the same year for an employee:
WITH X AS
(SELECT distinct ryear FROM #review)
SELECT a.empid, a.name, X.ryear, isnull(b.status,'Not initiated')
FROM X as x
LEFT
JOIN #employees as a
ON 1=1
LEFT
JOIN #review as b
ON a.empid = b.empid
AND b.ryear = x.ryear
ORDER
BY a.empid,x.ryear

In SQL Query a one-to-many relationship with condition

I have the following tables:
event_tbl
| event_id (PK) | event_date | event_location |
|---------------|------------|----------------|
| 1 | 01/01/2018 | Miami |
| 2 | 02/04/2018 | Tampa |
performer_tbl
| performer_id (PK) | event_id (FK) | genre |
|-------------------|---------------|-------|
| 1 | 1 | A |
| 2 | 1 | B |
| 3 | 2 | A |
| 4 | 2 | C |
I want to find events that have both genre A and genre B (should just return event 1), and I'm lost on writing the query. Maybe I just haven't had enough coffee, but all I can come up with is doing two derived columns with a case statement that count either genre and group by the event_id, then filtering both to >0. It just doesn't seem very elegant.
This should do the job (in MySQL, for other DBMS the syntax can be varied easily):
SELECT
e.event_id
FROM
event_tbl e
JOIN performer_tbl p USING(event_id)
GROUP BY e.event_id
HAVING SUM(IF(p.genre = 'A', 1, 0)) >= 1 AND SUM(IF(p.genre = 'B', 1, 0)) >= 1;
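The same GROUP BY / HAVING idea can be written with CASE instead of MySQL's IF(), which should make it portable. A sketch run against SQLite from Python, using the sample rows from the question (the explicit ON join replaces USING purely as a matter of taste):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_tbl (event_id INTEGER PRIMARY KEY,
                            event_date TEXT, event_location TEXT);
    CREATE TABLE performer_tbl (performer_id INTEGER PRIMARY KEY,
                                event_id INTEGER, genre TEXT);
    INSERT INTO event_tbl VALUES (1, '01/01/2018', 'Miami'),
                                 (2, '02/04/2018', 'Tampa');
    INSERT INTO performer_tbl VALUES (1, 1, 'A'), (2, 1, 'B'),
                                     (3, 2, 'A'), (4, 2, 'C');
""")

# Count genre-A and genre-B performers per event; keep events having both.
both_genres = conn.execute("""
    SELECT e.event_id
    FROM event_tbl e
    JOIN performer_tbl p ON p.event_id = e.event_id
    GROUP BY e.event_id
    HAVING SUM(CASE WHEN p.genre = 'A' THEN 1 ELSE 0 END) >= 1
       AND SUM(CASE WHEN p.genre = 'B' THEN 1 ELSE 0 END) >= 1
""").fetchall()

print(both_genres)  # [(1,)]
```

Only event 1 has performers in both genres, so it is the only row returned.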
If you are using SQL Server, try the following:
Select * From
event_tbl
where event_id
IN
(
select event_id
from performer_tbl as A
where exists (select 1
from performer_tbl as B
where B.event_id = A.event_id and B.genre = 'A')
and
exists (select 1
from performer_tbl as B
where B.event_id = A.event_id and B.genre = 'B')
)
This should work in any SQL database (at least in mysql, sql server, postgres or oracle)
select event_tbl.* FROM (
select event_id
from performer_tbl
where genre = 'A'
GROUP BY event_id) a_t
INNER JOIN (select event_id
from performer_tbl
where genre = 'B'
GROUP BY event_id) b_t
ON a_t.event_id = b_t.event_id
INNER JOIN event_tbl
ON event_tbl.event_id = a_t.event_id
This also works using left joins: (Since there are no function calls or sub-selects, it is fast. Also, it's usable in most SQL engines.)
SELECT DISTINCT
p1.event_id
,e.event_date
,e.event_location
FROM
performer_tbl as p1
inner join event_tbl as e on
p1.event_id = e.event_id
left outer join performer_tbl as p2 on
p1.event_id = p2.event_id
AND p2.genre = 'A'
left outer join performer_tbl as p3 on
p1.event_id = p3.event_id
AND p3.genre = 'B'
WHERE
p2.genre IS NOT NULL
AND p3.genre IS NOT NULL;
If I correctly understand what you need, you can try this:
Select *
from event_tbl e
where exists (select *
from performer_tbl p
where p.event_id = e.event_id
and p.genre = 'A')
and exists (select *
from performer_tbl p
where p.event_id = e.event_id
and p.genre = 'B')
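Note that the genre test has to be applied once per genre: a single EXISTS with genre IN ('A', 'B') matches events that have either genre, not both. A minimal sketch contrasting the two, using Python's sqlite3 module as a stand-in for whatever DBMS you have and the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_tbl (event_id INTEGER PRIMARY KEY,
                            event_date TEXT, event_location TEXT);
    CREATE TABLE performer_tbl (performer_id INTEGER PRIMARY KEY,
                                event_id INTEGER, genre TEXT);
    INSERT INTO event_tbl VALUES (1, '01/01/2018', 'Miami'),
                                 (2, '02/04/2018', 'Tampa');
    INSERT INTO performer_tbl VALUES (1, 1, 'A'), (2, 1, 'B'),
                                     (3, 2, 'A'), (4, 2, 'C');
""")

# One EXISTS with IN: true as soon as the event has genre A *or* genre B.
either = [r[0] for r in conn.execute("""
    SELECT e.event_id FROM event_tbl e
    WHERE EXISTS (SELECT 1 FROM performer_tbl p
                  WHERE p.event_id = e.event_id AND p.genre IN ('A', 'B'))
    ORDER BY e.event_id
""")]

# Two EXISTS, one per genre: true only if the event has genre A *and* genre B.
both = [r[0] for r in conn.execute("""
    SELECT e.event_id FROM event_tbl e
    WHERE EXISTS (SELECT 1 FROM performer_tbl p
                  WHERE p.event_id = e.event_id AND p.genre = 'A')
      AND EXISTS (SELECT 1 FROM performer_tbl p
                  WHERE p.event_id = e.event_id AND p.genre = 'B')
    ORDER BY e.event_id
""")]

print(either, both)  # [1, 2] [1]
```

Event 2 has genre A but no genre B, so it passes the IN test but not the pair of EXISTS tests.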

SQL GROUP BY and retrieve last child records

I'm writing a DB view that pulls data from several tables. The goal is to determine the latest status of each company, which is given by the record (per company_id) with the highest vetting_event_type_position.
Essentially I'm trying to grab the latest record for each company. I'm not a SQL guru at all; I understand I need to group by in order to collapse the related records, but I can't get that to work.
Current results
company_id | name | ... | vetting_event_type_position
-----------------------------------------------------
1 | ABC | ... | 1
1 | ABC | ... | 2
1 | ABC | ... | 3
2 | CBS | ... | 1
2 | CBS | ... | 2
3 | HBO | ... | 1
DESIRED results
company_id | name | ... | vetting_event_type_position
-----------------------------------------------------
1 | ABC | ... | 3
2 | CBS | ... | 2
3 | HBO | ... | 1
SQL Code
SELECT
companies.id as company_id,
companies.name as name,
companies.uuid as uuid,
companies.company_type as company_type,
companies.description as overview,
practice_areas.id as practice_area_id,
practice_areas.name as practice_area_name,
companies.created_at as created_at,
companies.updated_at as updated_at,
companies.created_by as created_by,
companies.updated_by as updated_by,
vettings.id as vetting_id,
vettings.name as vetting_name,
vetting_event_types.name as vetting_event_status,
vetting_events.id as vetting_event_id,
vetting_event_types.position as vetting_event_type_position
FROM
vettings
LEFT OUTER JOIN vetting_events ON (vettings.id = vetting_events.vetting_id)
LEFT OUTER JOIN vetting_event_types ON (vetting_events.vetting_event_type_id = vetting_event_types.id)
RIGHT OUTER JOIN companies ON (companies.id = vettings.company_id)
LEFT OUTER JOIN practice_areas ON (companies.practice_area_id = practice_areas.id)
LEFT OUTER JOIN dispositions ON (companies.disposition_id = dispositions.id)
ORDER BY
name, vetting_name, vetting_event_type_position
;
Associations among tables
companies has_many vettings
vettings has_many vetting_events
vetting_events belongs_to vetting_event_types
or put another way...
companies -> vettings -> vetting_events <- vetting_event_types
I am trying to retrieve the company record with the highest vetting_event_types.position value for each group.
SELECT company_id
,name
,uuid
,company_type
,overview
,practice_area_id
,practice_area_name
,created_at
,updated_at
,created_by
,updated_by
,vetting_id
,vetting_name
,vetting_event_status
,vetting_event_id
,vetting_event_type_position
FROM (
SELECT
companies.id as company_id,
companies.name as name,
companies.uuid as uuid,
companies.company_type as company_type,
companies.description as overview,
practice_areas.id as practice_area_id,
practice_areas.name as practice_area_name,
companies.created_at as created_at,
companies.updated_at as updated_at,
companies.created_by as created_by,
companies.updated_by as updated_by,
vettings.id as vetting_id,
vettings.name as vetting_name,
vetting_event_types.name as vetting_event_status,
vetting_events.id as vetting_event_id,
vetting_event_types.position as vetting_event_type_position,
ROW_NUMBER() OVER (PARTITION BY companies.id ORDER BY vetting_event_types.position DESC) rn
FROM vettings
LEFT OUTER JOIN vetting_events ON (vettings.id = vetting_events.vetting_id)
LEFT OUTER JOIN vetting_event_types ON (vetting_events.vetting_event_type_id = vetting_event_types.id)
RIGHT OUTER JOIN companies ON (companies.id = vettings.company_id)
LEFT OUTER JOIN practice_areas ON (companies.practice_area_id = practice_areas.id)
LEFT OUTER JOIN dispositions ON (companies.disposition_id = dispositions.id)
) A
WHERE A.rn = 1
ORDER BY name, vetting_name, vetting_event_type_position
You can use the ROW_NUMBER analytic function:
Select * from (
Select ...,
Row_number() over ( partition by company_id order by vetting_event_type_position desc) as seq
From ...) T
Where seq=1
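Here is the ROW_NUMBER pattern on a cut-down version of the data, as a sketch only. It assumes SQLite 3.25+ through Python's sqlite3 module, and the flat table current_results is hypothetical: it stands in for the output of the big multi-join query, holding just the three columns shown in the "Current results" listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flat table standing in for the joined query's output.
    CREATE TABLE current_results (company_id INTEGER, name TEXT,
                                  vetting_event_type_position INTEGER);
    INSERT INTO current_results VALUES
        (1, 'ABC', 1), (1, 'ABC', 2), (1, 'ABC', 3),
        (2, 'CBS', 1), (2, 'CBS', 2),
        (3, 'HBO', 1);
""")

# Number each company's rows from highest position down, keep the first.
latest = conn.execute("""
    SELECT company_id, name, vetting_event_type_position
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY company_id
                                    ORDER BY vetting_event_type_position DESC) AS seq
          FROM current_results t) ranked
    WHERE seq = 1
    ORDER BY company_id
""").fetchall()

for row in latest:
    print(row)
```

The result has one row per company with positions 3, 2 and 1, which matches the desired results in the question.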