SQL help: select the last 3 comments for EACH student?

I have two tables to store student data for a grade-school classroom:
Behavior_Log has the columns student_id, comments, date
Student_Roster has the columns student_id, firstname, lastname
The database is used to store daily comments about student behavior, and sometimes the teacher makes multiple comments about a student in a given day.
Now let's say the teacher wants to be able to pull up a list of the last 3 comments made for EACH student, like this:
Jessica 7/1/09 talking
Jessica 7/1/09 passing notes
Jessica 5/3/09 absent
Ciboney 7/2/09 great participation
Ciboney 4/30/09 absent
Ciboney 2/22/09 great participation
...and so on for the whole class
The single SQL query must return a set of comments for each student to eliminate the human-time-intensive need for the teacher to run separate queries for each student in the class.
I know that this sounds similar to "SQL Statement Help - Select latest Order for each Customer", but I need to display the last 3 entries for each person, and I can't figure out how to get from here to there.
Thanks for your suggestions!

A slightly modified solution from this article in my blog:
Analytic functions: SUM, AVG, ROW_NUMBER
SELECT sc.student_id, date, comment
FROM (
SELECT student_id, date, comment, (@r := @r + 1) AS rn
FROM (
SELECT @_student_id := -1
) vars,
(
SELECT *
FROM
behavior_log a
ORDER BY
student_id, date DESC
) ao
WHERE CASE WHEN @_student_id <> student_id THEN @r := 0 ELSE 0 END IS NOT NULL
AND (@_student_id := student_id) IS NOT NULL
) sc
JOIN Student_Roster sr
ON sr.student_id = sc.student_id
WHERE rn <= 3
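On engines that support window functions (MySQL 8.0 and later, for instance), the same per-student ranking can probably be written with ROW_NUMBER() instead of session variables. The following is only a minimal sketch, run through Python's sqlite3 module so that it is self-contained. It assumes SQLite 3.25 or newer, ISO-formatted date strings, and made-up sample rows (the last names and the 'late' and 'tardy' comments are hypothetical).

```python
import sqlite3


def last_comments_per_student(conn, n=3):
    """Return the n most recent comments for each student, newest first."""
    sql = """
        SELECT sr.firstname, ranked.date, ranked.comments
        FROM (
            SELECT student_id, date, comments,
                   ROW_NUMBER() OVER (
                       PARTITION BY student_id   -- restart numbering per student
                       ORDER BY date DESC        -- newest comment gets rn = 1
                   ) AS rn
            FROM Behavior_Log
        ) AS ranked
        JOIN Student_Roster AS sr ON sr.student_id = ranked.student_id
        WHERE ranked.rn <= ?
        ORDER BY sr.firstname, ranked.date DESC
    """
    return conn.execute(sql, (n,)).fetchall()


# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student_Roster (student_id INTEGER, firstname TEXT, lastname TEXT);
    CREATE TABLE Behavior_Log (student_id INTEGER, comments TEXT, date TEXT);
    INSERT INTO Student_Roster VALUES (1, 'Jessica', 'A'), (2, 'Ciboney', 'B');
    INSERT INTO Behavior_Log VALUES
        (1, 'talking', '2009-07-01'),
        (1, 'absent', '2009-05-03'),
        (1, 'late', '2009-03-10'),
        (1, 'tardy', '2009-01-15'),
        (2, 'great participation', '2009-07-02'),
        (2, 'absent', '2009-04-30');
""")

rows = last_comments_per_student(conn)
for row in rows:
    print(row)
```

Students with fewer than three comments simply return fewer rows. If two comments share the same date, their relative order is not defined unless a tie-breaking column is added to the ORDER BY.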

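A different approach would be to use the group_concat function in a single correlated sub select. A limit on that subselect does not help, because after grouping it returns only one row per student, so substring_index keeps just the first three lines of the concatenated result (this assumes the comments themselves contain no line breaks):
select (
select substring_index(
group_concat( concat( s.firstname, ', ', date, ', ', comment ) order by date desc separator '\n' ),
'\n', 3 )
from Behavior_Log
where student_id = s.student_id
) as last_comments
from Student_Roster s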

Related

sql how to set a condition with having count

I have this table
codice
cfmoglie
cfmarito
data
numero
invitati
I have to find the couples that are already married.
I'm trying this one and it works:
SELECT 'moglie' as persona,
matrimonio.cfmoglie as cf,
COUNT(matrimonio.cfmoglie) as già_sposati
FROM matrimonio
GROUP BY matrimonio.cfmoglie
HAVING COUNT(matrimonio.cfmoglie)> 1
UNION
SELECT 'marito' as persona,
matrimonio.cfmarito as cf,
COUNT(matrimonio.cfmarito) as già_sposati
FROM matrimonio
GROUP BY matrimonio.cfmarito
HAVING COUNT(matrimonio.cfmarito)> 1;
Depending on whether I use cfmarito or cfmoglie I get, for example, 7 records, but I need just the couples who have already been married, not the individual people. How can I solve this?
You can use analytic functions to find the rows where the cfmoglie and cfmarito have both previously occurred once or more:
SELECT *
FROM (
SELECT cfmoglie,
cfmarito,
data,
COUNT(*) OVER (PARTITION BY cfmoglie ORDER BY data) AS num_moglie,
COUNT(*) OVER (PARTITION BY cfmarito ORDER BY data) AS num_marito
FROM table_name
)
WHERE num_moglie > 1
AND num_marito > 1
If you want to find couples where at least one partner was already married (rather than both partners) then change AND to OR.
If you want to find couples who are renewing their marriage vows (i.e. the same couple has already married before) then:
SELECT *
FROM (
SELECT cfmoglie,
cfmarito,
data,
COUNT(*) OVER (PARTITION BY cfmoglie, cfmarito ORDER BY data)
AS num_married
FROM table_name
)
WHERE num_married > 1
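To see what the running COUNT(*) OVER (... ORDER BY data) actually flags, here is a small sketch run through Python's sqlite3 module (assuming SQLite 3.25 or newer). The table name matrimonio comes from the question; the codes W1..W3 and H1..H3 and the dates are invented sample data. Unlike Oracle, SQLite is given an alias for the derived table here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matrimonio (cfmoglie TEXT, cfmarito TEXT, data TEXT);
    -- Hypothetical marriages: W1 and H2 each marry twice, W3/H3 marry each other twice.
    INSERT INTO matrimonio VALUES
        ('W1', 'H1', '2001-01-01'),
        ('W2', 'H2', '2002-02-02'),
        ('W1', 'H2', '2005-05-05'),
        ('W3', 'H3', '2006-06-06'),
        ('W3', 'H3', '2016-06-06');
""")

# Marriages where BOTH partners had already been married at least once before.
both_remarried = conn.execute("""
    SELECT cfmoglie, cfmarito, data
    FROM (
        SELECT cfmoglie, cfmarito, data,
               COUNT(*) OVER (PARTITION BY cfmoglie ORDER BY data) AS num_moglie,
               COUNT(*) OVER (PARTITION BY cfmarito ORDER BY data) AS num_marito
        FROM matrimonio
    ) AS counted
    WHERE num_moglie > 1 AND num_marito > 1
    ORDER BY data
""").fetchall()

# Marriages where the SAME couple had already married each other before.
same_couple_again = conn.execute("""
    SELECT cfmoglie, cfmarito, data
    FROM (
        SELECT cfmoglie, cfmarito, data,
               COUNT(*) OVER (PARTITION BY cfmoglie, cfmarito ORDER BY data) AS num_married
        FROM matrimonio
    ) AS counted
    WHERE num_married > 1
""").fetchall()

print(both_remarried)
print(same_couple_again)
```

Because of the ORDER BY inside the window, the count is a running total, so a person's first marriage always gets 1 and only later marriages are returned.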

Select MAX or SUM

Simply put, I have exam notes for many students across many exams;
see the picture below (MATH = 0, BIOLOGY = 2, ALGEBRA = 1)
I just want to give the student the maximum note on every row: the student has 2 in BIOLOGY, so ALGEBRA and MATH must also be 2.
I tried this:
SELECT First_Name, EXAM, MAX(NOTE)
FROM My_Table
Group by First_Name, EXAM
This does not work; it still gives me (MATH = 0, BIOLOGY = 2, ALGEBRA = 1).
I also tried:
SELECT First_Name, EXAM,
CASE
WHEN SUM(NOTE) <> 0 THEN MAX(NOTE)
Else 0
END AS MAX_NOTE
FROM My_Table
This does not work either.
Do you have any idea or solution?
Remove the group by, and use a window function to take care of per-student logic:
SELECT First_Name, EXAM, MAX(NOTE) over (partition by First_Name)
FROM My_Table
SELECT
First_Name,
EXAM,
MAX(NOTE) OVER(partition by First_Name) as MAX_NOTE
FROM
My_Table
Reference for using Max and OVER together in SQL Server:
https://learn.microsoft.com/en-us/sql/t-sql/functions/max-transact-sql?view=sql-server-ver15
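For anyone who wants to compare the two behaviours side by side, here is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The table and column names follow the question; the student names Ann and Ben and their notes are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE My_Table (First_Name TEXT, EXAM TEXT, NOTE INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO My_Table VALUES
        ('Ann', 'MATH', 0), ('Ann', 'BIOLOGY', 2), ('Ann', 'ALGEBRA', 1),
        ('Ben', 'MATH', 3), ('Ben', 'BIOLOGY', 1);
""")

# GROUP BY First_Name, EXAM: every group holds one row, so MAX is that row's own note.
grouped = conn.execute("""
    SELECT First_Name, EXAM, MAX(NOTE)
    FROM My_Table
    GROUP BY First_Name, EXAM
    ORDER BY First_Name, EXAM
""").fetchall()

# Window function: all rows are kept, and each shows the student's overall maximum.
windowed = conn.execute("""
    SELECT First_Name, EXAM,
           MAX(NOTE) OVER (PARTITION BY First_Name) AS MAX_NOTE
    FROM My_Table
    ORDER BY First_Name, EXAM
""").fetchall()

print(grouped)
print(windowed)
```

The grouped query returns each exam's own note, while the windowed query repeats 2 on all of Ann's rows and 3 on all of Ben's.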

How to get MAX value out of the GROUPs COUNT

I've recently started to learn T-SQL beyond basic inserts and selects. I have a test database that I train on, and there is one query that I can't really get to work.
There are 3 tables used in that query; the picture shows the simplified fields and relations.
I have the 2 following queries. The first one simply displays students and the number of marks from each subject. The second does almost what I want to achieve: it shows students and the maximum number of marks they got in any one subject, e.g.
subject1 - (marks) 1, 5, 3, 4 - count - 4
subject2 - (marks) 5, 4, 5 - count - 3
The query shows 4, and from what I checked it returns correct results, but I want one more thing: to also show the name of the subject that has the maximum number of marks, so in the example case subject1.
--Query 1--
SELECT s.Surname, subj.SubjectName, COUNT(m.Mark) as Marks_count
FROM marks m, students s, subjects subj
WHERE m.StudentId = s.StudentNumber and subj.SubjectNumber = m.SubjectId
GROUP BY s.Surname, subj.SubjectName
ORDER BY s.Surname
--Query 2--
SELECT query.Surname, MAX(Marks_count) as Maximum_marks_count FROM (SELECT s.Surname, subj.SubjectName, COUNT(m.Mark) as Marks_count
FROM marks m, students s, subjects subj
WHERE m.StudentId = s.StudentNumber and subj.SubjectNumber = m.SubjectId
GROUP BY s.Surname, subj.SubjectName) as query
GROUP BY query.Surname
ORDER BY query.Surname
--Query 3 - not working as supposed--
SELECT query.Surname, query.SubjectName, MAX(Marks_count) as Maximum_marks_count FROM (SELECT s.Surname, subj.SubjectName, COUNT(m.Mark) as Marks_count
FROM marks m, students s, subjects subj
WHERE m.StudentId = s.StudentNumber and subj.SubjectNumber = m.SubjectId
GROUP BY s.Surname, subj.SubjectName) as query
GROUP BY query.Surname, query.SubjectName
ORDER BY query.Surname
Part of the query 1 result
Part of the query 2 and unfortunately query 3 result
The problem is that when I add the subject name to the select statement, I get the same results as from query 1: there is no longer a maximum number of marks, just students, subjects and the number of marks from each subject.
If someone could tell me what I'm missing, I would much appreciate it :)
Here's a query that gets the highest mark per student. Put it at the top of your SQL file/batch and it will make another "table" that you can join to your other tables to get the student name and the subject name:
WITH studentBest AS (
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY studentid ORDER BY mark DESC) rown
FROM marks) a
WHERE rown = 1)
You use it like this (for example)
--the WITH bit goes above this line
SELECT *
FROM
studentBest sb
INNER JOIN
subject s
ON sb.subjectid = s.subjectnumber
Etc
That's also how you should be doing your joins
How does it work? Well, it establishes an incrementing counter that restarts every time studentid changes (the PARTITION BY clause), and the numbering goes in descending mark order (the ORDER BY clause). An outer query selects only those rows with 1 in the row number, i.e. the top mark per student.
Why can't I use group by?
You can, but you have to write a query that summarises the marks table into the top mark (max) per student, and then you have to join that data back to the marks table to retrieve the subject. All in all it's a lot more faff, and often less efficient.
What if there are two subjects with the same mark?
Use RANK instead of ROW_NUMBER if you want to see both
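The difference between the two ranking functions on tied marks can be sketched as follows, using Python's sqlite3 module (SQLite 3.25 or newer assumed). The marks rows are invented, and the helper top_marks is a hypothetical name used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marks (studentid INTEGER, subjectid INTEGER, mark INTEGER);
    -- Hypothetical data: student 1 has the same top mark (5) in subjects 10 and 20.
    INSERT INTO marks VALUES
        (1, 10, 5), (1, 20, 5), (1, 30, 3),
        (2, 10, 4), (2, 20, 2);
""")


def top_marks(ranking_function):
    """Return the rows ranked first per student by the given ranking function."""
    # The function name is interpolated into the SQL, so only accept known values.
    if ranking_function not in ("ROW_NUMBER", "RANK"):
        raise ValueError("unsupported ranking function")
    sql = f"""
        SELECT studentid, subjectid, mark
        FROM (
            SELECT *, {ranking_function}() OVER (
                PARTITION BY studentid ORDER BY mark DESC
            ) AS rown
            FROM marks
        ) AS a
        WHERE rown = 1
        ORDER BY studentid, subjectid
    """
    return conn.execute(sql).fetchall()


with_row_number = top_marks("ROW_NUMBER")  # exactly one row per student
with_rank = top_marks("RANK")              # every row tied for the top mark

print(with_row_number)
print(with_rank)
```

With ROW_NUMBER, which of the two tied subjects is returned for student 1 is arbitrary unless the ORDER BY gets a tie-breaking column; with RANK both tied rows come back.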
Edit in response to your comment:
An extension of the above method:
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY st ORDER BY c DESC) rn FROM
(
SELECT studentid st, subjectid su, count(*) c
FROM marks
GROUP BY studentid, subjectid
) a
) b
INNER JOIN student stu on b.st = stu.studentnumber
INNER JOIN subject sub on b.su = sub.subjectnumber
WHERE
b.rn = 1
We count the marks by student/subject, then row-number the counts in descending order within each student, then choose only the first row per student and join in the other wanted data.
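A runnable sketch of the count-then-rank idea, using Python's sqlite3 module (SQLite 3.25 or newer assumed). It partitions by student only, so that row number 1 marks the subject with the most marks for that student. The sample rows are invented; student 1 mirrors the question's example of 4 marks in one subject and 3 in another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marks (studentid INTEGER, subjectid INTEGER, mark INTEGER);
    -- Hypothetical data: student 1 has 4 marks in subject 10 and 3 in subject 20.
    INSERT INTO marks VALUES
        (1, 10, 1), (1, 10, 5), (1, 10, 3), (1, 10, 4),
        (1, 20, 5), (1, 20, 4), (1, 20, 5),
        (2, 20, 2), (2, 20, 3),
        (2, 10, 4);
""")

rows = conn.execute("""
    SELECT st, su, c
    FROM (
        -- Rank each student's subjects by how many marks they contain.
        SELECT a.*, ROW_NUMBER() OVER (PARTITION BY st ORDER BY c DESC) AS rn
        FROM (
            -- One row per student/subject pair with its number of marks.
            SELECT studentid AS st, subjectid AS su, COUNT(*) AS c
            FROM marks
            GROUP BY studentid, subjectid
        ) AS a
    ) AS b
    WHERE rn = 1
    ORDER BY st
""").fetchall()

print(rows)
```

Joining the student and subject tables onto this result then supplies the surname and subject name.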
Ok, thanks to Caius Jard, another Stack Overflow question and a bit of experimentation, I managed to write a working query, so this is how I did it.
First I created a view from query 1 and added one more column to it, studentId.
Then I wrote a query which almost satisfied me. That question helped me a lot with that task: Question
SELECT marks.Surname,
marks.SubjectName,
marks.Marks_count,
ROW_NUMBER() OVER(PARTITION BY marks.Surname ORDER BY marks.Surname) as RowNum
FROM MarksAmountPerStudentAndSubject marks
INNER JOIN (SELECT MarksAmountPerStudentAndSubject.Id,
MAX(MarksAmountPerStudentAndSubject.Marks_count) as MaxAmount
FROM MarksAmountPerStudentAndSubject
GROUP BY MarksAmountPerStudentAndSubject.Id) m
ON m.Id = marks.Id and marks.Marks_count = m.MaxAmount
It gives following results
That's what I wanted to achieve, with one exception: if a student has the same number of marks in multiple subjects, it displays all of them. That's fine, but I decided to restrict this to the first result for each student. I couldn't simply put TOP(1) there, so I used a solution similar to the one Caius Jard showed, ROW_NUMBER with a window function, which let me choose the records whose row number equals 1.
I created another view from this query and could then simply write the final one
SELECT marks.Surname, marks.SubjectName, marks.Marks_count
FROM StudentsMaxMarksAmount marks
WHERE marks.RowNum = 1
ORDER BY marks.Surname
With result

Query including TOP and AVG

This is the last problem I have to deal with in my application and I hope someone will help because I'm clueless, I did my research and cannot find a proper solution.
I have a 'University Administration' application. I need to make a report that includes a few tables.
The problem is in SQL Query i have to finish. Query needs to MAKE LIST OF BEST 'n' STUDENTS, and the condition for student to be 'best' is grade AVERAGE.
I have 3 columns (students.stID & examines.grades). I need to get an average of my 'examines.grades' column, sort the table from highest (average grade) to the lowest, and I need to filter 'n' best 'averages'.
The user would enter the filter number and as I said, the app needs to show 'n' best averages.
The problem is in my SQL knowledge (not MySQL literally, but T-SQL). This is what I've done with my SQL query, but the problem lies in the "SELECT TOP", because when I press my button, the app takes the average only from the TOP 'n' rows selected.
SELECT TOP(@topParam) student.ID, AVG(examines.grades)
FROM examines INNER JOIN
student ON examines.stID = student.stID
WHERE (examines.grades > 1)
For example:
StudentID Grade
1 2
2 5
1 5
2 2
2 4
2 2
Expected output:
StudentID Grade_Average
1 3.5
2 3.25
Being impatient, I think this is what you are looking for. You didn't specify which SQL Server version you're using, though.
DECLARE @topParam INT = 3; -- Default
DECLARE @student TABLE (StudentID INT); -- Just for testing purposes
DECLARE @examines TABLE (StudentID INT, Grades INT);
INSERT INTO @student (StudentID) VALUES (1), (2);
INSERT INTO @examines (StudentID, Grades)
VALUES (1, 2), (2, 5), (1, 5), (2, 2), (2, 4), (2, 2);
SELECT DISTINCT TOP(@topParam) s.StudentID, AVG(CAST(e.grades AS FLOAT)) OVER (PARTITION BY s.StudentID) AS AvgGrade
FROM @examines AS e
INNER JOIN @student AS s
ON e.StudentID = s.StudentID
WHERE e.grades > 1
ORDER BY AvgGrade DESC;
If you'll provide some basic data, I'll adapt query for your needs.
Result:
StudentID AvgGrade
--------------------
1 3.500000
2 3.250000
Quick explanation:
The query computes each student's average grade with a windowed AVG, uses DISTINCT to collapse the repeated rows, and sorts by that average. Another tip: you could use the WITH TIES option in the TOP clause to get more students if several of them tie for the last (e.g. 3rd) position.
If you'd like to make a procedure, as I suggested in the comments, use this snippet:
CREATE PROCEDURE dbo.GetTopStudents
(
@topParam INT = 3
)
AS
BEGIN
BEGIN TRY
SELECT DISTINCT TOP(@topParam) s.StudentID, AVG(CAST(e.grades AS FLOAT)) OVER (PARTITION BY s.StudentID) AS AvgGrade
FROM examines AS e
INNER JOIN student AS s
ON e.StudentID = s.StudentID
WHERE e.grades > 1
ORDER BY AvgGrade DESC;
END TRY
BEGIN CATCH
SELECT ERROR_NUMBER(), ERROR_MESSAGE();
END CATCH
END
And later call it like this. It's a good way to encapsulate your logic.
EXEC dbo.GetTopStudents @topParam = 3;
You should use the group by clause to compute the average grade for each student.ID (if examines.grades has an integer type, cast it to a floating-point type first), and the order by clause together with top to limit your output to the n highest average grades:
select top(@topParam) student.ID
, avg(cast(examines.grades as float)) as avg_grade
from examines
join student on examines.stID = student.stID
where (examines.grades > 1)
group by student.ID
order by avg_grade desc
There's no need for a windowed aggregate, which returns duplicate rows that you then need DISTINCT to remove again. It's a simple aggregation, and your original query was already quite close:
SELECT TOP(@topParam) student.ID, AVG(CAST(examines.grades AS FLOAT)) as AvgGrade
FROM examines INNER JOIN
student ON examines.stID = student.stID
WHERE (examines.grades > 1)
group by student.ID
order by AvgGrade DESC
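The plain aggregation can be tried out with the sample rows from the question. This is only a sketch run through Python's sqlite3 module, so it uses SQLite's LIMIT where T-SQL uses TOP; the function name top_students and the exact table layout (student.stID, examines.stID, examines.grades) are assumptions made for the illustration.

```python
import sqlite3


def top_students(conn, n):
    """Return the n students with the highest average grade, best first."""
    sql = """
        SELECT e.stID, AVG(CAST(e.grades AS REAL)) AS avg_grade
        FROM examines AS e
        JOIN student AS s ON s.stID = e.stID
        WHERE e.grades > 1          -- same filter as in the question
        GROUP BY e.stID             -- one average per student
        ORDER BY avg_grade DESC     -- sort first ...
        LIMIT ?                     -- ... then keep only the best n (TOP in T-SQL)
    """
    return conn.execute(sql, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stID INTEGER);
    CREATE TABLE examines (stID INTEGER, grades INTEGER);
    INSERT INTO student VALUES (1), (2);
    -- Sample rows taken from the question.
    INSERT INTO examines VALUES (1, 2), (2, 5), (1, 5), (2, 2), (2, 4), (2, 2);
""")

best_two = top_students(conn, 2)
best_one = top_students(conn, 1)
print(best_two)
print(best_one)
```

The important point is the order of operations: the averages are computed over all of a student's rows, and only then is the sorted result cut down to n rows.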

How to 'add' a column to a query result while the query contains aggregate function?

I have a table named 'Attendance' which is used to record student attendance time in courses. This table has 4 columns, say 'id', 'course_id', 'attendance_time', and 'student_name'. An example of a few records in this table is:
23 100 1/1/2010 10:00:00 Tom
24 100 1/1/2010 10:20:00 Bob
25 187 1/2/2010 08:01:01 Lisa
.....
I want to create a summary of the latest attendance time for each course. I created a query below:
SELECT course_id, max(attendance_time) FROM attendance GROUP BY course_id
The result would be something like this
100 1/1/2010 10:20:00
187 1/2/2010 08:01:01
Now, all I want to do is add the 'id' column to the result above. How to do it?
I can't just change the command to something like this
SELECT id, course_id, max(attendance_time) FROM attendance GROUP BY id, course_id
because it would return all the records as if the aggregate function is not used. Please help me.
This is a typical 'greatest per group', 'greatest-n-per-group' or 'groupwise maximum' query that comes up on Stack Overflow almost every day. You can search Stack Overflow for these terms to find many different examples of how to solve this with different databases. One way to solve it is as follows:
SELECT
T2.course_id,
T2.attendance_time,
T2.id
FROM (
SELECT
course_id,
MAX(attendance_time) AS attendance_time
FROM attendance
GROUP BY course_id
) T1
JOIN attendance T2
ON T1.course_id = T2.course_id
AND T1.attendance_time = T2.attendance_time
Note that this query can in theory return multiple rows per course_id if there are multiple rows with the same attendance_time. If that cannot happen then you don't need to worry about this issue. If this is a potential problem then you can solve this by adding an extra grouping on course_id, attendance_time and selecting the minimum or maximum id.
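The tie-handling variant described above might look like the following sketch, run through Python's sqlite3 module. The first three rows are the question's example in ISO date format; the fourth row (id 26, 'Ann') is a hypothetical addition that creates a tie on the latest attendance_time for course 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendance (
        id INTEGER, course_id INTEGER, attendance_time TEXT, student_name TEXT
    );
    INSERT INTO attendance VALUES
        (23, 100, '2010-01-01 10:00:00', 'Tom'),
        (24, 100, '2010-01-01 10:20:00', 'Bob'),
        (25, 187, '2010-01-02 08:01:01', 'Lisa'),
        (26, 100, '2010-01-01 10:20:00', 'Ann');  -- hypothetical tie with id 24
""")

rows = conn.execute("""
    SELECT T2.course_id, T2.attendance_time, MIN(T2.id) AS id
    FROM (
        -- Latest attendance time per course.
        SELECT course_id, MAX(attendance_time) AS attendance_time
        FROM attendance
        GROUP BY course_id
    ) AS T1
    JOIN attendance AS T2
      ON T1.course_id = T2.course_id
     AND T1.attendance_time = T2.attendance_time
    -- Extra grouping collapses tied rows; MIN(id) picks one of them.
    GROUP BY T2.course_id, T2.attendance_time
    ORDER BY T2.course_id
""").fetchall()

print(rows)
```

Swapping MIN for MAX would return id 26 instead of 24 for the tied course; which one is preferable depends on what the id is needed for.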
What do you need the additional column for? It already has a course ID, which identifies the data. A synthetic ID to the query would be useless because it does not refer to anything. If you want to get the max from the query results for a single course, then you can add a where condition like this:
SELECT course_id, max(attendance_time) FROM attendance WHERE course_id = your_id_here GROUP BY course_id;
If you mean that the column should be named 'id', you can alias it in the query:
SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;
You could make a view out of your query to easily access the aggregate data:
CREATE VIEW max_course_times AS SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;
SELECT * FROM max_course_times;
For SQL Server 2008 onwards, I like to use a Common Table Expression to add aggregated columns to queries:
WITH AttendanceTimes (course_id, maxTime)
AS
(
SELECT
course_id,
MAX(attendance_time)
FROM attendance
GROUP BY course_id
)
SELECT
a.course_id,
t.maxTime,
a.id
FROM attendance a
INNER JOIN AttendanceTimes t
ON a.course_id = t.course_id