group by in sql to find max count - sql

Update
I have a query like this
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) from StudentList sl
group by sl.College_ID,sl.Department_ID
order by sl.College_ID,sl.Department_ID asc
The above query gives the result below.
I have 200 college IDs and each college has 6 department IDs, so I get the count (number of students) in each department:
College_Id Dept_Id count
1 1 100
1 2 210
2 3 120
2 6 80
3 1 340
But what I need is to display the maximum student count for each department,
something like this:
college_ID Dept_Id count
3 1 340
26 2 250
I tried this, but I am getting an error:
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) from StudentList sl
group by sl.College_ID,sl.Department_ID
having COUNT(sl.RegisterNumber)=max(COUNT(sl.RegisterNumber))
order by sl.College_ID,sl.Department_ID asc
What went wrong? Can someone help me?

Maybe something like this?
SELECT sl.College_ID, sl.Department_ID, COUNT(sl.RegisterNumber) As StudentCount, s2.MaxCount
FROM StudentList sl
INNER JOIN (
SELECT Department_ID, MAX(StudentCount) AS MaxCount
FROM (
SELECT College_ID, Department_ID, COUNT(*) As StudentCount
FROM StudentList
GROUP BY College_ID, Department_ID
) s1
GROUP BY Department_ID
) s2 ON sl.Department_ID = s2.Department_ID
GROUP BY sl.College_ID, sl.Department_ID, s2.MaxCount
HAVING COUNT(sl.RegisterNumber) = s2.MaxCount
ORDER BY sl.College_ID, sl.Department_ID ASC
EDIT: I've updated the query to more accurately answer your question, I missed the part where you want the College_ID with the max count.
EDIT 2: Okay, this should work now, I needed a second nested subquery for aggregating the aggregates. I don't know of a better way to compare the aggregates of different groups.
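The nested-aggregate idea above can be tried on a small in-memory database. This is only a sketch: it uses SQLite through Python's sqlite3 module rather than the asker's database, the sample rows are made up, and the query joins the per-college counts to the per-department maximum instead of using HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StudentList ("
    "RegisterNumber INTEGER PRIMARY KEY, College_ID INTEGER, Department_ID INTEGER)"
)
# Made-up sample: (College_ID, Department_ID), one row per student.
students = [(1, 1)] * 2 + [(2, 1)] * 3 + [(1, 2)] * 4 + [(2, 2)] * 1
conn.executemany(
    "INSERT INTO StudentList (College_ID, Department_ID) VALUES (?, ?)", students
)

query = """
SELECT c.College_ID, c.Department_ID, c.StudentCount
FROM (SELECT College_ID, Department_ID, COUNT(*) AS StudentCount
      FROM StudentList
      GROUP BY College_ID, Department_ID) AS c
JOIN (SELECT Department_ID, MAX(StudentCount) AS MaxCount
      FROM (SELECT College_ID, Department_ID, COUNT(*) AS StudentCount
            FROM StudentList
            GROUP BY College_ID, Department_ID) AS s1
      GROUP BY Department_ID) AS m
  ON m.Department_ID = c.Department_ID
 AND m.MaxCount = c.StudentCount
ORDER BY c.Department_ID
"""
max_per_department = conn.execute(query).fetchall()
print(max_per_department)  # [(2, 1, 3), (1, 2, 4)]
```

If two colleges tie for the maximum in a department, both rows come back, which matches the behaviour of the HAVING version.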

In the result you want, you do not seem to care about college_ID: from your example, it is not clear which college_ID a row with Dept_Id = 1 should carry.
In that case, you can remove college_ID from your select statement and do a SUM with GROUP BY.
Based on your query, something like:
SELECT t.Department_ID, SUM(t.c)
FROM (
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) c from StudentList sl
group by sl.College_ID,sl.Department_ID
) t
GROUP BY t.Department_ID
ORDER BY SUM(t.c)
Note: If you really want college_ID in your result, you can do a JOIN to get your college_ID

Related

Join results of two queries in SQL and produce a result given some condition

I've never used SQL until now, so please bear with me. I have a table of departments:
I have written two queries as follows:
-- nbr of staff associated with each dept.
SELECT count(departmentId) as freq
FROM staff
GROUP BY departmentId
-- nbr of students associated with each dept.
SELECT count(departmentId) as freq
FROM StudentAssignment
GROUP BY departmentId
These produce the following two tables:
For each department id 1 to 5, I need to divide the studentFreq by the staffFreq and show the department id and description if the result is greater than 2.
If the staffFreq i.e. number of staff, for a department id is zero then I need to show that department id and description too.
So for example, in this case I want to produce a table with the department ids of 1, 4 and 5 and their corresponding descriptions: Computing, Classics and Mechanical Engineering.
Computing because 7 / 2 > 2. Classics and ME because 0 staff are assigned to those depts.
One method is a left join, starting with the departments table:
SELECT d.*,
       s.freq as num_staff, sa.freq as num_students,
       sa.freq * 1.0 / s.freq as student_staff_ratio
FROM departments d LEFT JOIN
     (SELECT departmentId, count(*) as freq
      FROM staff
      GROUP BY departmentId
     ) s
     ON s.departmentId = d.departmentId LEFT JOIN
     (SELECT departmentId, count(*) as freq
      FROM StudentAssignment
      GROUP BY departmentId
     ) sa
     ON sa.departmentId = d.departmentId;
Notes:
This shows missing values as NULL rather than 0. You can assign 0 instead using COALESCE(): COALESCE(s.freq, 0) as num_staff.
SQL Server does integer division, so 7 / 2 = 3, not 3.5. I think you would typically want the fractional component.
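As a rough illustration of how the notes above combine with the filter from the question, here is a sketch run against SQLite through Python's sqlite3 module. The table contents are assumptions, because the original tables were not preserved: only the counts for Computing (7 students, 2 staff) and the zero-staff departments come from the question, and the names 'Dept 2' and 'Dept 3' are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (departmentId INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE staff (staffId INTEGER PRIMARY KEY, departmentId INTEGER);
CREATE TABLE StudentAssignment (studentId INTEGER PRIMARY KEY, departmentId INTEGER);
INSERT INTO departments VALUES
  (1, 'Computing'), (2, 'Dept 2'), (3, 'Dept 3'),
  (4, 'Classics'), (5, 'Mechanical Engineering');
""")
# Hypothetical rows: departments 4 and 5 have no staff at all.
conn.executemany("INSERT INTO staff (departmentId) VALUES (?)",
                 [(1,), (1,), (2,), (2,), (3,)])
conn.executemany("INSERT INTO StudentAssignment (departmentId) VALUES (?)",
                 [(1,)] * 7 + [(2,)] * 3 + [(3,)] * 2)

query = """
SELECT d.departmentId, d.description
FROM departments d
LEFT JOIN (SELECT departmentId, COUNT(*) AS freq
           FROM staff GROUP BY departmentId) s
       ON s.departmentId = d.departmentId
LEFT JOIN (SELECT departmentId, COUNT(*) AS freq
           FROM StudentAssignment GROUP BY departmentId) sa
       ON sa.departmentId = d.departmentId
WHERE COALESCE(s.freq, 0) = 0                      -- no staff assigned
   OR COALESCE(sa.freq, 0) * 1.0 / s.freq > 2      -- fractional ratio above 2
ORDER BY d.departmentId
"""
flagged = conn.execute(query).fetchall()
print(flagged)
# [(1, 'Computing'), (4, 'Classics'), (5, 'Mechanical Engineering')]
```

The zero-staff test relies on COALESCE, since a department with no staff rows has s.freq NULL after the left join, and a NULL ratio would otherwise drop the row.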

Only joining rows where the date is less than the max date in another field

Let's say I have two tables. One table containing employee information and the days that employee was given a promotion:
Emp_ID Promo_Date
1 07/01/2012
1 07/01/2013
2 07/19/2012
2 07/19/2013
3 08/21/2012
3 08/21/2013
And another table with every day employees closed a sale:
Emp_ID Sale_Date
1 06/12/2013
1 06/30/2013
1 07/15/2013
2 06/15/2013
2 06/17/2013
2 08/01/2013
3 07/31/2013
3 09/01/2013
I want to join the two tables so that I only include sales dates that are less than the maximum promotion date. So the result would look something like this
Emp_ID Sale_Date Promo_Date
1 06/12/2013 07/01/2012
1 06/30/2013 07/01/2012
1 06/12/2013 07/01/2013
1 06/30/2013 07/01/2013
And so on for the rest of the Emp_IDs. I tried doing this using a left join, something to the effect of
left join SalesTable on PromoTable.EmpID = SalesTable.EmpID and Sale_Date
< max(Promo_Date) over (partition by Emp_ID)
But apparently I can't use aggregates in joins, and I already know that I can't use them in the where statement either. I don't know how else to proceed with this.
The maximum promotion date is:
select emp_id, max(promo_date)
from promotions
group by emp_id;
There are various ways to get the sales before that date, but here is one way:
select s.*
from sales s
where s.sales_date < (select max(promo_date)
from promotions p
where p.emp_id = s.emp_id
);
Gordon's answer is right on! Alternatively, you could also do an inner join to a subquery to achieve your desired output like this:
SELECT s.emp_id
,s.sales_date
,t.promo_date
FROM sales s
INNER JOIN (
SELECT emp_id
,max(promo_date) AS promo_date
FROM promotions
GROUP BY emp_id
) t ON s.emp_id = t.emp_id
AND s.sales_date < t.promo_date;
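To check the join-to-subquery approach against the sample data from the question, here is a small sketch using SQLite through Python's sqlite3 module. It assumes the dates are stored as ISO text (YYYY-MM-DD) so that plain string comparison matches date order; the table names follow the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotions (emp_id INTEGER, promo_date TEXT);
CREATE TABLE sales (emp_id INTEGER, sales_date TEXT);
INSERT INTO promotions VALUES
  (1, '2012-07-01'), (1, '2013-07-01'),
  (2, '2012-07-19'), (2, '2013-07-19'),
  (3, '2012-08-21'), (3, '2013-08-21');
INSERT INTO sales VALUES
  (1, '2013-06-12'), (1, '2013-06-30'), (1, '2013-07-15'),
  (2, '2013-06-15'), (2, '2013-06-17'), (2, '2013-08-01'),
  (3, '2013-07-31'), (3, '2013-09-01');
""")

query = """
SELECT s.emp_id, s.sales_date, t.promo_date
FROM sales s
INNER JOIN (SELECT emp_id, MAX(promo_date) AS promo_date
            FROM promotions
            GROUP BY emp_id) t
        ON s.emp_id = t.emp_id
       AND s.sales_date < t.promo_date
ORDER BY s.emp_id, s.sales_date
"""
sales_before_last_promo = conn.execute(query).fetchall()
for row in sales_before_last_promo:
    print(row)
# (1, '2013-06-12', '2013-07-01')
# (1, '2013-06-30', '2013-07-01')
# (2, '2013-06-15', '2013-07-19')
# (2, '2013-06-17', '2013-07-19')
# (3, '2013-07-31', '2013-08-21')
```

Because the aggregate is computed inside the derived table, the join condition only compares plain columns, which sidesteps the restriction on aggregates in join conditions.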

Ranking Aggregate Field in Access Query

I am trying to rank an aggregate field in Access, but my attempts fail with referencing errors. I am ranking with a subquery, and the problem seems to come from the alias names that result from averaging a field. The code is below:
SELECT [Exams].[StudentID],
Avg([Exams].[Biology]) AS [AvgBiology],
(SELECT Avg(T.Biology) AS [TAvgBiology],
Count(*)
FROM [Exams] AS T
WHERE T.[TAvgBiology] > [AvgBiology])
+ 1 AS Rank
FROM [Exams]
GROUP BY [Exams].[StudentID]
ORDER BY Avg([Exams].[Biology]) DESC;
Errors that come about state: "You have selected a subquery that can return more than one value blah blah...please use the Exist keyword.. ".
From the code above I think you get the gist of what I am trying to achieve.
Start with the basic GROUP BY query Gordon Linoff suggested to compute the average Biology for each StudentID.
SELECT
e.StudentID,
Avg(e.Biology) AS AvgBiology
FROM Exams AS e
GROUP BY e.StudentID
Save that query as qryAvgBiology and then use it in another query where you compute Rank.
SELECT
q.StudentID,
q.AvgBiology,
(
(
SELECT Count(*)
FROM qryAvgBiology AS q2
WHERE q2.AvgBiology > q.AvgBiology
)
+1
) AS Rank
FROM qryAvgBiology AS q
ORDER BY 3;
For example, if qryAvgBiology returns this result set ...
StudentID AvgBiology
--------- ----------
1 70
2 80
3 90
The ranking query will transform it to this ...
StudentID AvgBiology Rank
--------- ---------- ----
3 90 1
2 80 2
1 70 3
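The two-step approach can be reproduced outside Access as a sketch. Here a SQLite view stands in for the saved query qryAvgBiology (an assumption, since Access saved queries have no direct SQLite equivalent), the exam scores are made up so that the averages come out to 70, 80 and 90, and the rank column is named StudentRank to avoid clashing with any keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Exams (StudentID INTEGER, Biology REAL);
-- Made-up scores giving averages of 70, 80 and 90.
INSERT INTO Exams VALUES (1, 60), (1, 80), (2, 80), (3, 85), (3, 95);

-- Stand-in for the saved Access query qryAvgBiology.
CREATE VIEW qryAvgBiology AS
SELECT e.StudentID, AVG(e.Biology) AS AvgBiology
FROM Exams AS e
GROUP BY e.StudentID;
""")

query = """
SELECT q.StudentID,
       q.AvgBiology,
       (SELECT COUNT(*)
        FROM qryAvgBiology AS q2
        WHERE q2.AvgBiology > q.AvgBiology) + 1 AS StudentRank
FROM qryAvgBiology AS q
ORDER BY StudentRank
"""
ranking = conn.execute(query).fetchall()
print(ranking)  # [(3, 90.0, 1), (2, 80.0, 2), (1, 70.0, 3)]
```

Counting the rows with a strictly higher average means tied students share a rank and the next rank is skipped.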
I assume your basic query is:
SELECT e.StudentId, Avg(e.Biology) AS AvgBiology
FROM exams as e
GROUP BY e.StudentId;
(Square braces don't help me understand the query at all.)
I think the following will work in Access:
SELECT e.StudentId, Avg(e.Biology) AS AvgBiology,
(SELECT 1 + COUNT(*)
FROM (SELECT e.StudentId, Avg(e.Biology) AS AvgBiology
FROM exams as e
GROUP BY e.StudentId
) e2
WHERE e2.AvgBiology > Avg(e.Biology)
) as ranking
FROM exams as e
GROUP BY e.StudentId;

how to avoid duplicate on Joining two tables

Student Table
SID Name
1 A
2 B
3 C
Marks Table
id mark subject
1 50 physics
2 40 biology
1 50 chemistry
3 30 mathematics
SELECT distinct(std.id),std.name,m.mark, row_number() over() as rownum FROM
student std JOIN marks m ON std.id=m.id AND m.mark=50
The result contains A twice even after using DISTINCT. My expected result has only one A. If I remove row_number() over() as rownum, it works fine. Why is this happening, and how can I resolve it? I am using DB2.
There are two rows in the Marks table with id = 1 and mark = 50, so you will get two rows in the output for that row in the Student table.
If you only want one, you have to do a GROUP BY:
SELECT std.id, std.name, m.mark, row_number() over() as rownum
FROM student std
JOIN marks m
ON m.id=std.id AND m.mark=50
Group By std.id, std.name, m.mark
Now that you've clarified your question as "I want to find all students with a mark of 50 in at least one subject", I would use this query:
SELECT student.id, '50'
FROM student
WHERE EXISTS (SELECT 1 FROM marks WHERE marks.id = student.id AND marks.mark = 50)
This also gives you flexibility to change the criteria, e.g. at least one mark of 50 or less.
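The difference between the plain join and the EXISTS form can be seen on the sample tables from the question. This is a sketch in SQLite through Python's sqlite3 module, not DB2, and it assumes the student key column is named sid as in the table listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE marks (id INTEGER, mark INTEGER, subject TEXT);
INSERT INTO student VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO marks VALUES
  (1, 50, 'physics'), (2, 40, 'biology'),
  (1, 50, 'chemistry'), (3, 30, 'mathematics');
""")

# A join returns one row per matching mark, so student A appears twice.
join_rows = conn.execute("""
SELECT std.sid, std.name, m.mark
FROM student std
JOIN marks m ON m.id = std.sid
WHERE m.mark = 50
""").fetchall()

# EXISTS only asks whether at least one matching mark exists.
exists_rows = conn.execute("""
SELECT s.sid, s.name
FROM student s
WHERE EXISTS (SELECT 1 FROM marks m WHERE m.id = s.sid AND m.mark = 50)
""").fetchall()

print(join_rows)    # [(1, 'A', 50), (1, 'A', 50)]
print(exists_rows)  # [(1, 'A')]
```

In the original query, DISTINCT did not help because row_number() gives each joined row a different value, so the two rows for A are no longer duplicates of each other.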
Similar to Charles's answer, but you always want to put the predicate (mark=50) in the WHERE clause, so you're filtering before joining. If this is just homework it might not matter, but you'll want to remember this if you ever hit any real data.
SELECT std.sid,
std.name,
m.mark,
row_number() over() AS rownum
FROM student std
JOIN marks m
ON std.sid=m.id
WHERE m.mark=50
GROUP BY std.sid, std.name, m.mark

How do I write a standard SQL GROUP BY that includes columns not in the GROUP BY clause

Let's say I have a table called Customer, defined like this:
Id Name DepartmentId Hired
1 X 101 2001/01/01
2 Y 102 2002/01/01
3 Z 102 2003/01/01
And I want to retrieve the date of the last hiring in each department.
Obviously I would do this
SELECT c.DepartmentId, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
Which returns:
101 2001/01/01
102 2003/01/01
But what do I do if I want to return the name of the guy hired? I.e. I would want this result set:
101 2001/01/01 X
102 2003/01/01 Z
Note that the following does not work, as it would return three rows rather than the two I'm looking for:
SELECT c.DepartmentId, c.Name, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
I can't remember seeing a query that achieves this.
NOTE: It's not acceptable to join on the Hired field, as that would not be guaranteed to be accurate.
A subselect would do the job and would handle the case where more than one person was hired in the same department on the same day:
SELECT c.DepartmentId, c.Name, c.Hired from Customer c,
(SELECT DepartmentId, MAX(Hired) as MaxHired
FROM Customer
GROUP BY DepartmentId) as sub
WHERE c.DepartmentId = sub.DepartmentId AND c.Hired = sub.MaxHired
Standard Sql:
select *
from Customer C
where exists
(
-- LINQ to SQL puts NULL here instead ;-)
-- In fact, you can even put 1/0 here and it would not cause a division by zero error
-- An RDBMS does not evaluate the select list of an EXISTS subquery
SELECT NULL
FROM Customer
where c.DepartmentId = DepartmentId
GROUP BY DepartmentId
having c.Hired = MAX(Hired)
)
If your SQL server happens to support tuple testing, this is the most succinct:
select *
from Customer
where (DepartmentId, Hired) in
(select DepartmentId, MAX(Hired)
from Customer
group by DepartmentId)
SELECT a.*
FROM Customer AS a
JOIN
(SELECT DepartmentId, MAX(Hired) AS Hired
FROM Customer GROUP BY DepartmentId) AS b
USING (DepartmentId,Hired);
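The join-to-aggregate pattern used in several of the answers above can be checked against the sample table from the question. This is a sketch in SQLite through Python's sqlite3 module; the dates are stored as ISO text (an assumption made so that MAX orders them correctly), and an explicit ON condition is used in place of USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT,
                       DepartmentId INTEGER, Hired TEXT);
INSERT INTO Customer VALUES
  (1, 'X', 101, '2001-01-01'),
  (2, 'Y', 102, '2002-01-01'),
  (3, 'Z', 102, '2003-01-01');
""")

query = """
SELECT c.DepartmentId, c.Hired, c.Name
FROM Customer AS c
JOIN (SELECT DepartmentId, MAX(Hired) AS MaxHired
      FROM Customer
      GROUP BY DepartmentId) AS sub
  ON sub.DepartmentId = c.DepartmentId
 AND sub.MaxHired = c.Hired
ORDER BY c.DepartmentId
"""
last_hired = conn.execute(query).fetchall()
print(last_hired)
# [(101, '2001-01-01', 'X'), (102, '2003-01-01', 'Z')]
```

If two people in a department share the latest hire date, both are returned, which is the tie behaviour the subselect answer describes.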