Select rows of joined values using MAX - sql

Consider the following schema.
Student:
StudentID uniqueidentifier
Name varchar(max)
FKTeacherID uniqueidentifier
TestScore:
TestScoreID uniqueidentifier
Score int
FKStudentID uniqueidentifier
My goal is to write a query that yields each teacher's highest test score and the student who achieved it. It should return the teacher's id (Student.FKTeacherID), the score that was achieved (TestScore.Score), and the student who achieved it (Student.Name).
I can write something like this to get the first two required columns:
SELECT FKTeacherID, MAX(Score) MaxScore
FROM Student
JOIN TestScore on FKStudentID = StudentID
GROUP BY FKTeacherID
But I can't obtain the relevant Student.Name without adding it to the group by clause, which would change the result set.

If I'm understanding correctly, one option is to use row_number() -- here's an example with a common-table-expression:
with cte as (
select s.fkteacherid,
ts.score,
s.name,
row_number() over (partition by s.fkteacherid order by ts.score desc) rn
from student s
inner join testscore ts on s.studentid = ts.fkstudentid
)
select fkteacherid, score, name
from cte
where rn = 1
The basic idea is to partition by fkteacherid, order by score desc within each partition, and take the first row from each partition.
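To see the partition-and-pick-first idea run end to end, here is a minimal sketch that executes the same query shape from Python against an in-memory SQLite database. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, the ids are plain integers rather than uniqueidentifier values, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory database; integer ids stand in for uniqueidentifier (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, FKTeacherID INTEGER);
    CREATE TABLE TestScore (TestScoreID INTEGER PRIMARY KEY, Score INTEGER, FKStudentID INTEGER);
    -- Invented sample data: teacher 10 has two students, teacher 20 has one.
    INSERT INTO Student VALUES (1, 'Ann', 10), (2, 'Bob', 10), (3, 'Cid', 20);
    INSERT INTO TestScore VALUES (1, 70, 1), (2, 95, 2), (3, 80, 2), (4, 60, 3);
""")

# Number the rows within each teacher, best score first, then keep row 1.
top_per_teacher = conn.execute("""
    WITH cte AS (
        SELECT s.FKTeacherID, ts.Score, s.Name,
               ROW_NUMBER() OVER (PARTITION BY s.FKTeacherID
                                  ORDER BY ts.Score DESC) AS rn
        FROM Student s
        JOIN TestScore ts ON s.StudentID = ts.FKStudentID
    )
    SELECT FKTeacherID, Score, Name
    FROM cte
    WHERE rn = 1
    ORDER BY FKTeacherID
""").fetchall()

print(top_per_teacher)
```

If two students tie for a teacher's top score, row_number() keeps only one of them, chosen arbitrarily; rank() would keep both.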

UPDATE
Please try the updated query:
SELECT
FKTeacherID, Name, Score
FROM
Student
JOIN TestScore on FKStudentID = StudentID
JOIN
(
SELECT
B.FKTeacherID AS TeacherID, MAX(A.Score) MaxScore
FROM
Student B
JOIN TestScore A on A.FKStudentID = B.StudentID
GROUP BY
B.FKTeacherID
) As TeachersMaxScore
ON TeachersMaxScore.TeacherID = FKTeacherID
AND TeachersMaxScore.MaxScore = Score

Related

SQL Server, how to get younger users?

I'm trying to get the youngest users from each country. For example, I have the following tables.
If more than one of the youngest users have the same age, they should all be shown.
Thanks
You can try this query: the subquery counts users per country and birth year, and the result is then joined back to the users table.
select u.idcountry,t.name,u.username, (DATEPART(year, getdate()) - t.years) 'age'
from
(
SELECT u.idcountry,c.name,DATEPART(year, u.birthday) as 'years',count(*) as 'cnt'
FROM users u inner join country c on u.idcountry = c.idcountry
group by u.idcountry,c.name,DATEPART(year, u.birthday)
) t inner join users u on t.idcountry = u.idcountry and t.years = DATEPART(year, u.birthday)
where t.cnt > 1
sqlfiddle:https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=9baab959f79b1fa8c28ed87a8640e85d
Use the rank() window function:
select ...
from (
select ..., rank() over (partition by idcountry order by birthday desc) as rnk
from ...
) t
where rnk = 1
Rows with the same birthday in a country are ranked the same, so this returns all of the youngest people if there's more than one.
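As a rough check of how rank() treats ties, the sketch below runs the idea from Python on an in-memory SQLite database (a stand-in for SQL Server; the users rows are invented). It assumes "youngest" means the most recent birthday, so it orders by birthday descending, and it computes the rank in a derived table because a window function cannot be filtered directly in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, birthday TEXT, idcountry INTEGER);
    -- Invented rows: ben and cara share the latest birthday in country 1.
    INSERT INTO users VALUES
        ('ana',  '1990-05-01', 1),
        ('ben',  '2001-03-15', 1),
        ('cara', '2001-03-15', 1),
        ('dan',  '1985-07-30', 2);
""")

# Tied birthdays get the same rank, so both tied users survive the rnk = 1 filter.
youngest = conn.execute("""
    SELECT username, idcountry
    FROM (
        SELECT username, idcountry,
               RANK() OVER (PARTITION BY idcountry
                            ORDER BY birthday DESC) AS rnk
        FROM users
    ) AS ranked
    WHERE rnk = 1
    ORDER BY idcountry, username
""").fetchall()

print(youngest)
```

Swapping RANK() for ROW_NUMBER() in this sketch would return only one of the two tied users in country 1.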
This is a little tricky. I would use window functions -- count the people of a particular age and choose the ones where there are duplicates for the youngest.
You don't specify how to define age, so I'll just use the earliest calendar year:
select u.*
from (select u.*,
count(*) over (partition by idcountry, year(birthday)) as cnt_cb,
rank() over (partition by idcountry order by year(birthday)) as rnk
from users u
) u
where cnt_cb > 1 and rnk = 1;
I'll let you handle the joins to bring in the country name.
Your sample data and desired results show the oldest users within each country when more than one of the oldest have the same age. The query below will do that, assuming age is calculated using full birth date.
WITH
users AS (
SELECT
username
, birthday
, idcountry
, (CAST(CONVERT(char(8),GETDATE(),112) AS int) - CAST(CONVERT(char(8),birthday,112) AS int)) / 10000 AS age
, RANK() OVER(PARTITION BY idcountry ORDER BY (CAST(CONVERT(char(8),GETDATE(),112) AS int) - CAST(CONVERT(char(8),birthday,112) AS int)) / 10000 DESC) AS age_rank
FROM dbo.Users
)
, oldest_users AS (
SELECT
username
, birthday
, idcountry
, age
, COUNT(*) OVER(PARTITION BY idcountry, age_rank ORDER BY age_rank) AS age_count
FROM users
WHERE age_rank = 1
)
SELECT
c.idcountry
, c.name
, oldest_users.age
, oldest_users.username
FROM oldest_users
JOIN dbo.Country AS c ON c.idcountry = oldest_users.idcountry
WHERE
oldest_users.age_count > 1;

How to get the largest value of a column in table?

I have the following table:
fileId, studentId,
Given a particular studentId, how can I get the largest fileId that is in the table ?
I saw this other query :
SELECT row
FROM table
WHERE id=(
SELECT max(id) FROM table
)
This would give the row where id is the largest. But what about the largest id for a given student id ? Is it better to match the student in the inner query or outer query ?
if you need the whole row:
select * from table where studentId = your_known_id order by fileId desc limit 1
This seems simplest:
SELECT studentId, MAX(ID)
FROM TableA
GROUP BY studentId
OR With filtering options:
WITH CTE AS
(
SELECT studentId, MAX(ID) AS MaxID
FROM TableA
GROUP BY studentId
)
SELECT * FROM CTE WHERE studentId ...
Try this:
SELECT *
FROM (
SELECT
id,
ROW_NUMBER () OVER (PARTITION BY studentId ORDER BY id desc) rnk
FROM table) a
WHERE a.rnk = 1;
... what about the largest id for a given student id
SELECT MAX(fileId)
FROM table
WHERE studentId = 123
Where 123 is the student ID you want to filter on.
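A small sketch of the filtered MAX, run from Python against in-memory SQLite. The table name files and the helper largest_file_id are hypothetical, and the rows follow the {4,5,6} example for student 3 mentioned in this thread. The student id is passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (fileId INTEGER, studentId INTEGER);  -- hypothetical table name
    INSERT INTO files VALUES (4, 3), (5, 3), (6, 3), (9, 7);
""")

def largest_file_id(student_id):
    """Return the largest fileId for one student, or None if the student has no rows."""
    row = conn.execute(
        "SELECT MAX(fileId) FROM files WHERE studentId = ?",
        (student_id,),
    ).fetchone()
    # MAX over zero rows still yields a single row holding NULL (None in Python).
    return row[0]

print(largest_file_id(3))
print(largest_file_id(42))
```

Filtering on the student in the same query as the MAX keeps it to a single pass, with no inner/outer query split to worry about.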
say there are three rows with studentId =3, but with fileId's = {4,5,6}, how do i get the row {fileId,studentId} = {6,3}
SELECT MAX(fileId), studentId
FROM table
WHERE studentId = 3
GROUP BY studentId
I think this code works as well
SELECT max(field) from table_name where studentId = <someID>

Group By Column, Select Most Recent Value

I'm performing a query on a table which tracks the results of a test taken by students. The test is composed of multiple sections, and there is a column for each section score. Each row is an instance of the test taken by a student. The sections can either be taken all at once, or split into multiple attempts. For example, a student can take one section today, and the rest tomorrow. In addition, a student is allowed to retake any section of the test.
Sample Student:
StudentID WritingSection ReadingSection MathSection DateTaken
1 65 85 54 4/1/2013 14:53
1 98 NULL NULL 4/8/2013 13:13
1 NULL NULL 38 5/3/2013 12:43
A NULL means that the section was not administered for the given test instance, and a second section score means the section was retaken.
I want a query that groups by the StudentID such that there is only one row per student, and the most recent score for each section is returned. I'm looking for an efficient way to solve this problem as we have many hundreds of thousands of test attempts in the database.
Expected Result:
StudentID WritingSection ReadingSection MathSection DateTaken
1 98 85 38 5/3/2013 12:43
EDIT:
There have been a lot of good solutions. I want to experiment with each next week a little more before choosing the answer. Thanks everyone!
Sorry - my previous answer answered a DIFFERENT question than the one posed :) It will return all data from the MOST RECENT row. The question asked is to aggregate over all rows to grab the most recent score for each subject individually.
But I'm leaving it up there because the question I answered is a common one, and maybe someone landing on this question actually had that question instead :)
Now to answer the actual question:
I think the cleanest way to do this is with PIVOT and UNPIVOT:
SELECT StudentID, [WritingSection], [ReadingSection], [MathSection], MAX(DateTaken) DateTaken
FROM (
SELECT StudentID, Subject, DateTaken, Score
FROM (
SELECT StudentID, Subject, DateTaken, Score
, row_number() OVER (PARTITION BY StudentID, Subject ORDER BY DateTaken DESC) as rowNum
FROM Students s
UNPIVOT (
Score FOR Subject IN ([WritingSection],[ReadingSection],[MathSection])
) u
) x
WHERE x.rowNum = 1
) y
PIVOT (
MAX(Score) FOR Subject IN ([WritingSection],[ReadingSection],[MathSection])
) p
GROUP BY StudentID, [WritingSection], [ReadingSection], [MathSection]
The innermost subquery (x) uses SQL's UNPIVOT function to normalize the data (meaning to turn each student's score on each section of the test into a single row).
The next subquery out (y) simply filters the rows to only the most recent score FOR EACH SUBJECT INDIVIDUALLY (the extra level of nesting works around the SQL restriction that windowed functions like row_number() can't be used in a WHERE clause).
Lastly, since you want the data displayed back in the denormalized original format (1 column for each section of the test), we use SQL's PIVOT function. This simply turns rows into columns - one for each section of the test. Finally, you said you wanted the most recent test taken shown (despite the fact that each section could have its own unique "most recent" date). So we simply aggregate over those 3 potentially different DateTakens to find the most recent.
This will scale more easily than other solutions if there are more Sections added in the future - just add the column names to the list.
This is tricky. Each section score is coming potentially from a different record. But the normal rules of max() and min() don't apply.
The following query gets a sequence number for each section, starting with the latest non-NULL value. This is then used for conditional aggregation in the outer query:
select s.StudentId,
max(case when ws_seqnum = 1 then WritingSection end) as WritingSection,
max(case when rs_seqnum = 1 then ReadingSection end) as ReadingSection,
max(case when ms_seqnum = 1 then MathSection end) as MathSection,
max(DateTaken) as DateTaken
from (select s.*,
row_number() over (partition by studentid
order by (case when WritingSection is not null then 0 else 1 end), DateTaken desc
) as ws_seqnum,
row_number() over (partition by studentid
order by (case when ReadingSection is not null then 0 else 1 end), DateTaken desc
) as rs_seqnum,
row_number() over (partition by studentid
order by (case when MathSection is not null then 0 else 1 end), DateTaken desc
) as ms_seqnum
from student s
) s
where StudentId = 1
group by StudentId;
The where clause is optional in this query. You can remove it and it should still work on all students.
This query is more complicated than it needs to be, because the data is not normalized. If you have control over the data structure, consider an association/junction table, with one row per student per test with the score and test date as columns in the table. (Full normality would introduce another table for the test dates, but that probably isn't necessary.)
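The sequence-number approach above can be checked against the question's three sample rows. This is a sketch only: it runs in SQLite from Python rather than SQL Server, stores DateTaken as ISO-formatted text so that string ordering matches date ordering (an assumption of this example), and uses the table name Students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        StudentID INTEGER, WritingSection INTEGER, ReadingSection INTEGER,
        MathSection INTEGER, DateTaken TEXT
    );
    -- The sample student from the question, with dates rewritten in ISO form.
    INSERT INTO Students VALUES
        (1, 65,   85,   54,   '2013-04-01 14:53'),
        (1, 98,   NULL, NULL, '2013-04-08 13:13'),
        (1, NULL, NULL, 38,   '2013-05-03 12:43');
""")

# For each section, non-NULL scores sort first (IS NULL is 0 for them),
# newest first, so sequence number 1 marks the latest non-NULL score.
latest = conn.execute("""
    SELECT StudentID,
           MAX(CASE WHEN ws_seqnum = 1 THEN WritingSection END) AS WritingSection,
           MAX(CASE WHEN rs_seqnum = 1 THEN ReadingSection END) AS ReadingSection,
           MAX(CASE WHEN ms_seqnum = 1 THEN MathSection END) AS MathSection,
           MAX(DateTaken) AS DateTaken
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY StudentID
                   ORDER BY (WritingSection IS NULL), DateTaken DESC) AS ws_seqnum,
               ROW_NUMBER() OVER (PARTITION BY StudentID
                   ORDER BY (ReadingSection IS NULL), DateTaken DESC) AS rs_seqnum,
               ROW_NUMBER() OVER (PARTITION BY StudentID
                   ORDER BY (MathSection IS NULL), DateTaken DESC) AS ms_seqnum
        FROM Students s
    ) AS numbered
    GROUP BY StudentID
""").fetchall()

print(latest)
```

The result matches the expected row from the question: the retaken writing and math scores replace the first attempt, and the reading score comes from the only attempt that has one.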
Joe's solution will return only one student id - the one that took the test the latest. The way to get the latest date for each student id is to use analytical functions. Here's an example if you're using Oracle database:
SELECT a.StudentID, a.DateTaken
FROM ( SELECT StudentID,
DateTaken,
ROW_NUMBER ()
OVER (PARTITION BY StudentID ORDER BY DateTaken DESC)
rn
FROM pto.test
ORDER BY DateTaken DESC) a
WHERE a.rn = 1
Note how the row_number() function will put 1 at the latest date of each student id. And on the outer select you just filter those records with rn = 1... Execute only the inner select to see how it works.
Let me know what kind of database you're using to give you a solution for it. Each database has its own implementation of analytical functions but the logic is the same...
This is a pretty classic annoying problem in SQL - there's no super elegant way to do it. Here's the best I've found:
SELECT s.*
FROM Students s
JOIN (
SELECT StudentID, MAX(DateTaken) as MaxDateTaken
FROM Students
GROUP BY StudentID
) f ON s.StudentID = f.StudentID AND s.DateTaken = f.MaxDateTaken
Joining on the date field isn't super ideal (this breaks in the event of ties for a MAX) or fast (depending on how the table is indexed). If you have an int rowID that is unique across all rows, it would be preferable to do:
SELECT s.*
FROM Students s
JOIN (
SELECT rowID
FROM (
SELECT StudentID, rowID, row_number() OVER (PARTITION BY StudentID ORDER BY DateTaken DESC) as rowNumber
FROM Students
) x
WHERE x.rowNumber = 1
) f ON s.rowID = f.rowID
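To make the tie problem concrete, here is a hedged sketch comparing the two joins on invented data where one student has two rows with the same latest DateTaken. It runs in SQLite from Python, not SQL Server, and the extra rowID DESC tiebreaker in the row_number() ordering is an addition of this example so that the surviving row is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        rowID INTEGER PRIMARY KEY, StudentID INTEGER,
        MathSection INTEGER, DateTaken TEXT
    );
    -- Invented rows: rowID 2 and 3 tie on the latest DateTaken.
    INSERT INTO Students VALUES
        (1, 1, 54, '2013-04-01 14:53'),
        (2, 1, 38, '2013-05-03 12:43'),
        (3, 1, 40, '2013-05-03 12:43');
""")

# Joining on MAX(DateTaken) returns every row that ties for the maximum.
join_on_max = conn.execute("""
    SELECT s.rowID
    FROM Students s
    JOIN (SELECT StudentID, MAX(DateTaken) AS MaxDateTaken
          FROM Students
          GROUP BY StudentID) f
      ON s.StudentID = f.StudentID AND s.DateTaken = f.MaxDateTaken
    ORDER BY s.rowID
""").fetchall()

# Joining on the unique rowID picked by row_number() returns exactly one row.
join_on_row_number = conn.execute("""
    SELECT s.rowID
    FROM Students s
    JOIN (SELECT rowID
          FROM (SELECT rowID,
                       ROW_NUMBER() OVER (PARTITION BY StudentID
                           ORDER BY DateTaken DESC, rowID DESC) AS rowNumber
                FROM Students) x
          WHERE x.rowNumber = 1) f
      ON s.rowID = f.rowID
""").fetchall()

print(join_on_max, join_on_row_number)
```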
How about using the following to get the maximum DateTaken?
SELECT max(DateTaken) FROM TABLE_NAME
WHERE StudentID = 1
You could then use that in a subquery to get a row, like this:
SELECT WritingSection FROM TABLE_NAME
WHERE StudentID = 1 and DateTaken = (SELECT max(DateTaken) FROM TABLE_NAME
WHERE StudentID = 1 and WritingSection IS NOT NULL)
You would need to run this twice more for ReadingSection and MathSection?
SELECT student.studentid,
WRITE.writingsection,
READ.readingsection,
math.mathsection,
student.datetaken
FROM
-- list of students / max dates taken
(SELECT studentid,
Max(datetaken) datetaken
FROM test_record
GROUP BY studentid) student,
-- greatest date for student with a writingsection score (dont care what the date is here, just that the score comes from the greatest date)
(SELECT studentid,
writingsection
FROM test_record t
WHERE writingsection IS NOT NULL
AND datetaken = (SELECT Max(datetaken)
FROM test_record
WHERE studentid = t.studentid
AND writingsection IS NOT NULL)) WRITE,
(SELECT studentid,
readingsection
FROM test_record t
WHERE readingsection IS NOT NULL
AND datetaken = (SELECT Max(datetaken)
FROM test_record
WHERE studentid = t.studentid
AND readingsection IS NOT NULL)) READ,
(SELECT studentid,
mathsection
FROM test_record t
WHERE mathsection IS NOT NULL
AND datetaken = (SELECT Max(datetaken)
FROM test_record
WHERE studentid = t.studentid
AND mathsection IS NOT NULL)) math
WHERE
-- outer join in case a student has no score recorded for one or more of the sections
student.studentid = READ.studentid(+)
AND student.studentid = WRITE.studentid(+)
AND student.studentid = math.studentid(+);

Sum of Highest 5 numbers in SQL Server 2000

I am having a problem querying some data from a database. My table is given below:
What I need is the sum of the 5 highest total_marks from the table for each student.
I tried the code given below, but it does not return what I expected.
SELECT s.studentid, SUM(s.total_marks)
FROM students s
WHERE s.sub_code IN (SELECT TOP 5 sub_code
FROM students a
WHERE a.studentid = s.studentid
ORDER BY total_marks DESC)
GROUP BY studentid
Please help. Thanks in advance.
Your query could work if there's a unique/primary key on (studentId, subcode). At the moment, the query returns 6 records instead of 5 for studentId = 1, for example, because of the duplicate subcode 303.
Usually a table should have a unique key. Maybe you can add an incremental id and rewrite your query like:
select s.*
from students as s
where
s.id in (
select top 5 a.id
from students as a
where a.studentId = s.studentId
order by a.total_marks desc
);
Or, if you have unique combinations of (studentId, subcode, total_marks), you can use a query like this:
select s.*
from students as s
where
exists (
select *
from (
select top 5 a.subcode, a.total_marks
from students as a
where a.studentId = s.studentId
order by a.total_marks desc
) as b
where b.subcode = s.subcode and b.total_marks = s.total_marks
);
First, select the top 5 grades for each student. Note that row_number() requires SQL Server 2005 or later, and the filter on it has to go in an outer query because a window function cannot be referenced in the WHERE clause of the query that defines it:
select studentid, total_marks
from
(
select row_number() over (partition by studentid order by total_marks desc) as rnk,
studentid,
total_marks
from students
) t
where rnk <= 5
From there you can use the same subquery with group by and sum:
select studentid, sum(total_marks)
from
(
select row_number() over (partition by studentid order by total_marks desc) as rnk,
studentid,
total_marks
from students
) t
where rnk <= 5
group by studentid
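The row-numbering approach can be tried out with the sketch below, which runs in SQLite from Python (not SQL Server 2000, which lacks row_number()). The marks are invented, and student 1 deliberately has a duplicate sub_code 303, the situation that made the IN (SELECT TOP 5 sub_code ...) attempt return an extra row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (studentid INTEGER, sub_code INTEGER, total_marks INTEGER);
    -- Invented marks: student 1 has seven rows, two of them for sub_code 303.
    INSERT INTO students VALUES
        (1, 301, 90), (1, 302, 80), (1, 303, 70), (1, 303, 60),
        (1, 304, 50), (1, 305, 40), (1, 306, 30),
        (2, 301, 10), (2, 302, 20);
""")

# Number each student's rows from highest mark down, then sum rows 1..5.
top5_sums = conn.execute("""
    SELECT studentid, SUM(total_marks)
    FROM (
        SELECT studentid, total_marks,
               ROW_NUMBER() OVER (PARTITION BY studentid
                                  ORDER BY total_marks DESC) AS rn
        FROM students
    ) AS ranked
    WHERE rn <= 5
    GROUP BY studentid
    ORDER BY studentid
""").fetchall()

print(top5_sums)
```

Because the filter works on row numbers rather than on sub_code values, duplicate subject codes do not inflate the count, and a student with fewer than five rows simply sums what is there.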
This isn't ideal, but the method you started to use requires a primary key column. You can simulate one with a temp table since SQL 2000.
CREATE TABLE #temp (
StudentID INT,
total_marks INT,
ID INT Identity(1,1)
)
INSERT INTO #temp (
StudentID,
total_Marks
)
Select
StudentID,
total_marks
FROM Students
SELECT s.studentid, SUM(s.total_marks)
FROM #temp s
WHERE s.ID IN (SELECT TOP 5
a.ID
FROM #temp a
WHERE a.studentid = s.studentid
ORDER BY total_marks DESC)
GROUP BY studentid
I think SQL 2000 may have a slightly more compact syntax for this, but SQL Fiddle won't let me test versions that old.
Please test this carefully. You will be dumping this entire table to a temp table and that's almost always a bad idea.
Also, ensure that there is some combination of fields not including the total that uniquely identifies a row, or consider adding a surrogate key column to the table.

Get the top row after order by in Oracle Subquery

I have a table student(id, name, department, age, score). I want to find the youngest student who has the highest(among the youngest students) score of each department. In SQL Server, I can use following SQL.
select * from student s1
where s1.id in
(select top 1 s2.id from student s2
where s2.department = s1.department order by age asc, score desc)
However, in Oracle, you cannot use the order by clause in a subquery, and there is no limit/top-like keyword. I have to join the student table with itself twice to get the result. In Oracle, I use the following SQL.
select s1.* from student s1,
(select s2.department, s2.age, max(s2.score) as max_score from student s2,
(select s3.department, min(s3.age) as min_age from student s3 group by s3.department) tmp1 where
s2.department = tmp1.department and s2.age = tmp1.min_age group by s2.department, s2.age) tmp2
where s1.department =tmp2.department and s1.age = tmp2.age and s1.score=tmp2.max_score
Does anyone have any idea how to simplify the above SQL for Oracle?
try this one
select * from
(SELECT id, name, department, age, score,
ROW_NUMBER() OVER (partition by department order by age asc, score desc) srlno
FROM student)
where srlno = 1;
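As a quick sanity check of the ROW_NUMBER() pattern, the sketch below runs it in SQLite from Python (a stand-in for Oracle, with invented rows). It orders each department by age ascending and then score descending, which is the ordering the question asks for: the youngest student, and among the youngest, the one with the highest score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY, name TEXT, department TEXT,
        age INTEGER, score INTEGER
    );
    -- Invented rows: Ann and Bob are both the youngest in 'math'.
    INSERT INTO student VALUES
        (1, 'Ann', 'math', 19, 80),
        (2, 'Bob', 'math', 19, 92),
        (3, 'Cid', 'math', 23, 99),
        (4, 'Dee', 'art',  21, 70);
""")

# Row 1 in each department is the youngest student with the best score.
picked = conn.execute("""
    SELECT id, name, department
    FROM (
        SELECT id, name, department,
               ROW_NUMBER() OVER (PARTITION BY department
                                  ORDER BY age ASC, score DESC) AS srlno
        FROM student
    ) AS numbered
    WHERE srlno = 1
    ORDER BY department
""").fetchall()

print(picked)
```

Cid has the highest score in math overall but is not picked, because age is the first sort key and score only breaks ties among the youngest.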
In addition to Allan's answer, this works fine too:
select *
from (SELECT *
FROM student
order by age asc,
score desc)
where rownum = 1;
In addition to Bharat's answer, it is possible to do this using ORDER BY in the sub-query in Oracle (as pointed out by Jeffrey Kemp):
SELECT *
FROM student s1
WHERE s1.id IN (SELECT id
FROM (SELECT id, ROWNUM AS rn
FROM student s2
WHERE s1.department = s2.department
ORDER BY age ASC, score DESC)
WHERE rn = 1);
If you use this method, you may be tempted to remove the sub-query and just use rownum = 1. This would result in the incorrect result as the sort would be applied after the criteria (you'd get 1 row that was sorted, not one row from the sorted set).
select to_char(job_trigger_time,'mm-dd-yyyy') ,job_status from
(select * from kdyer.job_instances ji INNER JOIN kdyer.job_param_values pm
on((ji.job_id = pm.job_id) and (ji.job_spec_id = '10003') and (pm.param_value='21692') )
order by ji.job_trigger_time desc)
where rownum<'2'