Group By: What if I don't want to perform an aggregate function on a column? - sql

I have a table, Students, with the following columns:
| id | name | class | date_registrered |
I want to select one row for every class: the row with the largest value in date_registrered.
In other words, I want the most recently registered student in each class, including all of that student's columns.
I tried:
SELECT id, name, class, MAX(date_registrered)
FROM Students
GROUP BY class;
I get the following error:
Column 'Students.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
This question on SO addresses a simplified version of this issue. However, its example has only two columns.
I only want to group by class, and I only want to perform an aggregate function on date_registrered. I also want to display all the other columns for the row with the max date_registrered for every class.
Do you know how to fix it?

Use ROW_NUMBER():
SELECT *
FROM ( SELECT id, name, class, date_registrered,
ROW_NUMBER() OVER (partition by class ORDER BY date_registrered DESC) rn
FROM Students
) T
WHERE T.rn = 1

The error message explains the issue well: you can't aggregate one column while also selecting other columns that are not in the GROUP BY. In this case, you'll want to use something like ROW_NUMBER:
WITH CTE AS
(
SELECT id,
name,
class,
date_registrered,
RN = ROW_NUMBER() OVER(PARTITION BY class ORDER BY date_registrered DESC)
FROM students
)
SELECT id,
name,
class,
date_registrered
FROM CTE
WHERE RN = 1;
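To see the ROW_NUMBER approach work end to end, here is a minimal sketch that runs the CTE query against a throwaway in-memory SQLite database through Python's sqlite3 module. The sample rows are hypothetical, window functions assume SQLite 3.25 or newer, and the T-SQL alias form `RN = ROW_NUMBER()` is written as `ROW_NUMBER() ... AS rn` because SQLite does not accept the `alias = expression` syntax.

```python
import sqlite3

# In-memory database with a hypothetical Students table; the column name
# date_registrered is kept exactly as spelled in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (id INTEGER, name TEXT, class TEXT, date_registrered TEXT)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "math", "2021-01-10"),
        (2, "Bob", "math", "2021-03-05"),
        (3, "Cid", "art", "2021-02-01"),
        (4, "Dee", "art", "2021-01-20"),
    ],
)

# Number the rows inside each class, newest registration first,
# then keep only row number 1 per class.
latest_per_class = conn.execute(
    """
    WITH CTE AS (
        SELECT id, name, class, date_registrered,
               ROW_NUMBER() OVER (PARTITION BY class
                                  ORDER BY date_registrered DESC) AS rn
        FROM Students
    )
    SELECT id, name, class, date_registrered
    FROM CTE
    WHERE rn = 1
    ORDER BY class
    """
).fetchall()

for row in latest_per_class:
    print(row)
```

ISO-8601 date strings are used so that text ordering matches chronological ordering; with a real date column type this concern goes away.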

Related

Find list of toppers across each class when given individual scores for each subject

I need help in writing an efficient query to find a list of toppers (students with maximum total marks in each class) when we are given individual scores for each subject across different classes. We are required to return 3 columns: class, topper_student name and topper_student_total marks.
I have used multiple sub-queries to find a solution. I am sure there would be much better implementations available for this problem (maybe via joins or window functions?).
The input table and my solution can be found at this SQL Fiddle link:
http://www.sqlfiddle.com/#!15/2919e/1/0
It would be clearer to use temporary tables to store results along the way and make the result traceable, but the solution can be achieved with a single query:
WITH student_marks AS (
SELECT Class_num, Name, SUM(Marks) AS student_total_marks
FROM School
GROUP BY Class_num, Name
)
SELECT Class_num, Name, student_total_marks
FROM (
SELECT Class_num, Name, student_total_marks, ROW_NUMBER() OVER(partition by Class_num order by student_total_marks desc, Class_num) AS beststudentfirst
FROM student_marks
) A
WHERE A.beststudentfirst = 1
The query in the WITH clause calculates the sum of marks for every student in a class. At this point, the subject is no longer needed. The result is temporarily stored in student_marks.
Next, we create a counter (beststudentfirst) using ROW_NUMBER that numbers the total marks from highest to lowest within each class (order by student_total_marks desc, Class_num). The counter restarts each time the class changes (partition by Class_num).
From this last result, we only need the rows where the counter (beststudentfirst) equals 1: the top student in each class.
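The three steps (sum per student, number within each class, keep number 1) can be traced on a small, made-up School table. This is a sketch under stated assumptions: SQLite 3.25+ via Python's sqlite3 stands in for the fiddle's database, the Subject column is hypothetical (the question implies per-subject rows but does not show the schema), and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical School table: one row per (class, student, subject).
conn.execute("CREATE TABLE School (Class_num INTEGER, Name TEXT, Subject TEXT, Marks INTEGER)")
conn.executemany(
    "INSERT INTO School VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "math", 90), (1, "Ann", "science", 80),
        (1, "Bob", "math", 70), (1, "Bob", "science", 95),
        (2, "Cid", "math", 60), (2, "Cid", "science", 65),
        (2, "Dee", "math", 88), (2, "Dee", "science", 50),
    ],
)

toppers = conn.execute(
    """
    WITH student_marks AS (            -- step 1: total marks per student
        SELECT Class_num, Name, SUM(Marks) AS student_total_marks
        FROM School
        GROUP BY Class_num, Name
    )
    SELECT Class_num, Name, student_total_marks
    FROM (                             -- step 2: number students within each class
        SELECT Class_num, Name, student_total_marks,
               ROW_NUMBER() OVER (PARTITION BY Class_num
                                  ORDER BY student_total_marks DESC) AS beststudentfirst
        FROM student_marks
    ) AS A
    WHERE A.beststudentfirst = 1       -- step 3: keep the top student only
    ORDER BY Class_num
    """
).fetchall()
print(toppers)
```

With these rows, Ann (170) beats Bob (165) in class 1 and Dee (138) beats Cid (125) in class 2.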
Window functions are the most natural way to approach this. If you always want exactly one student per class, then use row_number():
select Class_num, Name, total_marks
from (select name, class_num, sum(marks) as total_marks,
row_number() over (partition by class_num order by sum(marks) desc) as seqnum
from School
group by Class_num, Name
) s
where seqnum <= 1
order by class_num, total_marks desc;
If you want to take ties into account, then use rank() or dense_rank().
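The three functions only differ when there are ties. A minimal sketch, again using SQLite (3.25+) through Python's sqlite3, on a hypothetical table of per-student totals where two students share the top score. `Name` is added to the ROW_NUMBER ordering purely so its numbering is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-student totals for one class, with a tie at the top.
conn.execute("CREATE TABLE totals (Class_num INTEGER, Name TEXT, total_marks INTEGER)")
conn.executemany(
    "INSERT INTO totals VALUES (?, ?, ?)",
    [(1, "Ann", 170), (1, "Bob", 170), (1, "Cal", 150), (1, "Dan", 140)],
)

numbered = conn.execute(
    """
    SELECT Name, total_marks,
           -- always 1, 2, 3, ...; Name breaks the tie deterministically
           ROW_NUMBER() OVER (PARTITION BY Class_num ORDER BY total_marks DESC, Name) AS rn,
           -- tied rows share a rank and the next rank is skipped
           RANK()       OVER (PARTITION BY Class_num ORDER BY total_marks DESC) AS rk,
           -- tied rows share a rank and no rank is skipped
           DENSE_RANK() OVER (PARTITION BY Class_num ORDER BY total_marks DESC) AS drk
    FROM totals
    ORDER BY total_marks DESC, Name
    """
).fetchall()

for row in numbered:
    print(row)
```

Filtering on `rk = 1` or `drk = 1` keeps both Ann and Bob, while `rn = 1` keeps only one of them.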
select Class_num, [Name], total_marks from
(
select Row_number() over (partition by class_num order by Class_num, SUM(Marks) desc) as [RN],
Class_num, [Name], SUM(Marks) as total_marks
from School
group by Class_num, [Name]
) A
where RN = 1

How to get 10 of the results in every group of a table using Hive sql?

I have a table
I want to group the data by class and then pick two rows from every class, whether sorted or not,
and get results like this.
How do I write the SQL?
Use row_number():
select t.*
from (select t.*, row_number() over (partition by class order by class) as seqnum
from t
) t
where seqnum <= 2;
If you want two particular rows (such as the two highest-scoring or lowest-scoring ones), then adjust the order by clause.
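A sketch of that adjustment, ordering by a score column so each class keeps its two highest-scoring rows. It assumes a hypothetical table t(class, name, score) and runs on SQLite (3.25+) through Python's sqlite3 rather than Hive, so it only illustrates the ROW_NUMBER pattern, not Hive-specific behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: class "C" has only one row, to show that
# seqnum <= 2 simply returns whatever rows a class has.
conn.execute("CREATE TABLE t (class TEXT, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", "a1", 10), ("A", "a2", 30), ("A", "a3", 20),
        ("B", "b1", 5), ("B", "b2", 7),
        ("C", "c1", 1),
    ],
)

top_two = conn.execute(
    """
    SELECT class, name, score
    FROM (SELECT t.*,
                 -- ORDER BY score DESC picks the two highest scores per class
                 ROW_NUMBER() OVER (PARTITION BY class ORDER BY score DESC) AS seqnum
          FROM t) AS t
    WHERE seqnum <= 2
    ORDER BY class, seqnum
    """
).fetchall()
print(top_two)
```

Switching to `ORDER BY score ASC` inside the OVER clause would keep the two lowest scores instead.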

How to do a Postgresql group aggregation: 2 fields using one to select the other

I have a table - Data - of rows, simplified, like so:
Name,Amount,Last,Date
A,16,31,1-Jan-2014
A,27,38,1-Feb-2014
A,12,34,1-Mar-2014
B,8,37,1-Jan-2014
B,3,38,1-Feb-2014
B,17,39,1-Mar-2014
I wish to group them similar to:
select Name,sum(Amount),aggr(Last),max(Date) from Data group by Name
For aggr(Last) I want the value of 'Last' from the row that contains max(Date)
So the result I want would be 2 rows
Name,Amount,Last,Date
A,55,34,1-Mar-2014
B,28,39,1-Mar-2014
i.e. in both cases, the value of Last is the one from the row that contained 1-Mar-2014
The query I'm actually running is basically the same, but with many more sum() fields and millions of rows, so I'm guessing an aggregate function could avoid extra queries for each group of rows.
Rather than writing a custom aggregate, use row_number() and conditional aggregation:
select Name, sum(Amount),
max(case when seqnum = 1 then Last end) as Last,
max(date)
from (select d.*, row_number() over (partition by name order by date desc) as seqnum
from data d
) d
group by Name;
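The conditional aggregation can be checked against the question's own sample rows. In this sketch SQLite (3.25+) via Python's sqlite3 stands in for PostgreSQL, the dates are rewritten as ISO-8601 text (an assumption, so they sort chronologically as strings), and "Last" and "Date" are double-quoted defensively in case they clash with keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Data (Name TEXT, Amount INTEGER, "Last" INTEGER, "Date" TEXT)')
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?)",
    [
        ("A", 16, 31, "2014-01-01"), ("A", 27, 38, "2014-02-01"), ("A", 12, 34, "2014-03-01"),
        ("B", 8, 37, "2014-01-01"), ("B", 3, 38, "2014-02-01"), ("B", 17, 39, "2014-03-01"),
    ],
)

result = conn.execute(
    """
    SELECT Name,
           SUM(Amount) AS Amount,
           -- CASE yields NULL except on the newest row, and MAX ignores NULLs
           MAX(CASE WHEN seqnum = 1 THEN "Last" END) AS "Last",
           MAX("Date") AS "Date"
    FROM (SELECT d.*,
                 ROW_NUMBER() OVER (PARTITION BY Name ORDER BY "Date" DESC) AS seqnum
          FROM Data d) AS d
    GROUP BY Name
    ORDER BY Name
    """
).fetchall()
print(result)
```

This matches the result the question asks for: A sums to 55 with Last 34, and B sums to 28 with Last 39, both taken from the 1-Mar-2014 rows.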

Is it possible to get a function result with columns which are not in the group by (SQL)?

I am trying to get the last registration date of a course, but I also want to know the id of that record. Because MAX is an aggregate function, I must use GROUP BY id, which I do not want, because the result is very different (one row per id instead of a single row).
What is the right way to write a query like this?
SELECT id, MAX(registration_date) AS registration_date
FROM courses;
It gives an error, and to avoid it I must do this:
SELECT id, MAX(registration_date) AS registration_date
FROM courses
GROUP BY id;
But that is not the result I want.
You could use the rank() window function for that:
SELECT id
FROM (SELECT id, RANK() OVER (ORDER BY registration_date DESC) AS rk
FROM courses) AS t
WHERE rk = 1
One method is to use a sub query like this:
select *
from [dbo].[Courses]
where registration_date =
(select max(registration_date)
from [dbo].[Courses])
but because only the date is matched, this may return more than one record.
If possible, include more fields in the WHERE clause to narrow it down.
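Both answers return every row that shares the latest date. A minimal sketch confirming this on hypothetical data where two courses tie, run on SQLite (3.25+) through Python's sqlite3; the table layout (id, registration_date) is assumed from the question's queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: ids 2 and 3 share the latest registration date.
conn.execute("CREATE TABLE courses (id INTEGER, registration_date TEXT)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [(1, "2021-01-01"), (2, "2021-05-01"), (3, "2021-05-01")],
)

# RANK() with no PARTITION BY ranks the whole table; tied rows share rank 1.
by_rank = conn.execute(
    """
    SELECT id, registration_date
    FROM (SELECT id, registration_date,
                 RANK() OVER (ORDER BY registration_date DESC) AS rk
          FROM courses) AS t
    WHERE rk = 1
    ORDER BY id
    """
).fetchall()

# Subquery form: keep rows whose date equals the overall MAX.
by_subquery = conn.execute(
    """
    SELECT id, registration_date
    FROM courses
    WHERE registration_date = (SELECT MAX(registration_date) FROM courses)
    ORDER BY id
    """
).fetchall()

print(by_rank)
print(by_subquery)
```

If exactly one row is required, swapping RANK() for ROW_NUMBER() with an extra tie-breaking column in the ORDER BY is one option.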

SQL Query to obtain the maximum value for each unique value in another column

ID Sum Name
a 10 Joe
a 8 Mary
b 21 Kate
b 110 Casey
b 67 Pierce
What would you recommend as the best way to obtain, for each ID, the name that corresponds to the largest sum (grouping by ID)?
What I tried so far:
select ID, SUM(Sum) s, Name
from Table1
group by ID, Name
Order by SUM(Sum) DESC;
This sorts the grouped records so that the highest sums come first. Then I have to somehow flag the top record for each ID and keep only those. Any tips or pointers? Thanks a lot.
In the end I'd like to obtain:
a 10 Joe
b 110 Casey
You want the row_number() function:
select id, [sum], name
from (select t.*,
row_number() over (partition by id order by [sum] desc) as seqnum
from table1
) t
where seqnum = 1;
Your question is more confusing than it needs to be because you have a column called sum. You should avoid using SQL reserved words for identifiers.
The row_number() function assigns a sequential number to a group of rows, starting with 1. The group is defined by the partition by clause. In this case, all rows with the same id are in the same group. The ordering of the numbers is determined by the order by clause, so the one with the largest value of sum gets the value of 1.
If you might have duplicate maximum values and you want all of them, use the related function rank() or dense_rank().
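To make the numbering concrete, this sketch runs the question's sample data through SQLite (3.25+) via Python's sqlite3, first showing the seqnum values row_number() assigns within each ID and then keeping seqnum = 1. The "Sum" column is double-quoted because it shares its name with the SUM function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 (ID TEXT, "Sum" INTEGER, Name TEXT)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [("a", 10, "Joe"), ("a", 8, "Mary"), ("b", 21, "Kate"), ("b", 110, "Casey"), ("b", 67, "Pierce")],
)

# Inner query on its own: the numbering restarts at 1 for each ID.
numbered = conn.execute(
    """
    SELECT ID, "Sum", Name,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY "Sum" DESC) AS seqnum
    FROM Table1
    ORDER BY ID, seqnum
    """
).fetchall()

# Full query: keep only the largest Sum per ID.
top_per_id = conn.execute(
    """
    SELECT ID, "Sum", Name
    FROM (SELECT ID, "Sum", Name,
                 ROW_NUMBER() OVER (PARTITION BY ID ORDER BY "Sum" DESC) AS seqnum
          FROM Table1) AS t
    WHERE seqnum = 1
    ORDER BY ID
    """
).fetchall()

print(numbered)
print(top_per_id)
```

The result matches the output the question asks for: a 10 Joe and b 110 Casey.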
select *
from
(
select *
,rn = row_number() over (partition by Id order by [sum] desc)
from table1
)x
where x.rn=1