Ranking Aggregate Field in Access Query - sql

I am trying to rank an aggregate field in access but my efforts are in vain with errors based on referencing. I am ranking using a subquery but the problem comes about due to the alias names resulting from performing an average on a field. The code is as below:
SELECT [Exams].[StudentID],
Avg([Exams].[Biology]) AS [AvgBiology],
(SELECT Avg(T.Biology) AS [TAvgBiology],
Count(*)
FROM [Exams] AS T
WHERE T.[TAvgBiology] > [AvgBiology])
+ 1 AS Rank
FROM [Exams]
GROUP BY [Exams].[StudentID]
ORDER BY Avg([Exams].[Biology]) DESC;
Errors that come about state: "You have selected a subquery that can return more than one value blah blah...please use the Exist keyword.. ".
From the code above I think you get the gist of what I am trying to achieve.

Start with the basic GROUP BY query Gordon Linoff suggested to compute the average Biology for each StudentID.
SELECT
e.StudentID,
Avg(e.Biology) AS AvgBiology
FROM Exams AS e
GROUP BY e.StudentID
Save that query as qryAvgBiology and then use it in another query where you compute Rank.
SELECT
q.StudentID,
q.AvgBiology,
(
(
SELECT Count(*)
FROM qryAvgBiology AS q2
WHERE q2.AvgBiology > q.AvgBiology
)
+1
) AS Rank
FROM qryAvgBiology AS q
ORDER BY 3;
For example, if qryAvgBiology returns this result set ...
StudentID AvgBiology
--------- ----------
1 70
2 80
3 90
The ranking query will transform it to this ...
StudentID AvgBiology Rank
--------- ---------- ----
3 90 1
2 80 2
1 70 3

I assume your basic query is:
SELECT e.StudentId Avg(e.Biology) AS AvgBiology
FROM exams as e
GROUP BY e.StudentId;
(Square braces don't help me understand the query at all.)
I think the following will work in Access:
SELECT e.StudentId Avg(e.Biology) AS AvgBiology,
(SELECT 1 + COUNT(*)
FROM (SELECT e.StudentId, Avg(e.Biology) AS AvgBiology
FROM exams as e
GROUP BY e.StudentId
) e2
WHERE e2.AvgBiology > Avg(e.Biology)
) as ranking
FROM exams as e
GROUP BY e.StudentId;

Related

SQL: Finding duplicate records based on custom criteria

I need to find duplicates based on two tables and based on custom criteria. The following determines whether it's a duplicate, and if so, show only the most recent one:
If Employee Name and all EmployeePolicy CoverageId(s) are an exact match another record, then that's considered a duplicate.
--Employee Table
EmployeeId Name Salary
543 John 54000
785 Alex 63000
435 John 75000
123 Alex 88000
333 John 67000
--EmployeePolicy Table
EmployeePolicyId EmployeeId CoverageId
1 543 8888
2 543 7777
3 785 5555
4 435 8888
5 435 7777
6 123 4444
7 333 8888
8 333 7776
For example, the duplicates in the example above are the following:
EmployeeId Name Salary
543 John 54000
435 John 75000
This is because they are the only ones that have a matching name in the Employee table as well as both have the same exact CoverageIds in the EmployeePolicy table.
Note: EmployeeId 333 also with Name = John is not a match because both of his CoverageIDs are not the same as the other John's CoverageIds.
At first I have been trying to find duplicates the old fashioned way by Grouping records and saying having count(*) > 1, but then quickly realized that it would not work because while in English my criteria defines a duplicate, in SQL the CoverageIDs are different so they are NOT considered duplicates.
By that same accord, I tried something like:
-- Create a TMP table
INSERT INTO #tmp
SELECT *
FROM Employee e join EmployeePolicy ep on e.EmpoyeeId = ep.EmployeeId
SELECT info.*
FROM
(
SELECT
tmp.*,
ROW_NUMBER() OVER(PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
FROM #tmp tmp
) info
WHERE
info.RowNum = 1 AND
Again, this does not work because SQL does not see this as duplicates. Not sure how to translate my English definition of duplicate into SQL definition of duplicate.
Any help is most appreciated.
The easiest way is to concatenate the policies into a string. That, alas, is cumbersome in SQL Server. Here is a set-based approach:
with ep as (
select ep.*, count(*) over (partition by employeeid) as cnt
from employeepolicy ep
)
select ep.employeeid, ep2.employeeid
from ep join
ep ep2
on ep.employeeid < ep2.employeeid and
ep.CoverageId = ep2.CoverageId and
ep.cnt = ep2.cnt
group by ep.employeeid, ep2.employeeid, ep.cnt
having count(*) = cnt -- all match
The idea is to match the coverages for different employees. A simple criteria is that the number of coverages need to match. Then, it checks that the number of matching coverages is the actual count.
Note: This puts the employee id pairs in a single row. You can join back to the employees table to get the additional information.
I have not tested the T-SQL but I believe the following should give you the output you are looking for.
;WITH CTE_Employee
AS
(
SELECT E.[Name]
,E.[EmployeeId]
,P.[CoverageId]
,E.[Salary]
FROM Employee E
INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
)
, CTE_DuplicateCoverage
AS
(
SELECT E.[Name]
,E.[CoverageId]
FROM CTE_Employee E
GROUP BY E.[Name], E.[CoverageId]
HAVING COUNT(*) > 1
)
SELECT E.[EmployeeId]
,E.[Name]
,MAX(E.[Salary]) AS [Salary]
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId]
GROUP BY E.[EmployeeId], E.[Name]
HAVING COUNT(*) > 1
ORDER BY E.[EmployeeId]

SQL - Count Results of 2 Columns

I have the following table which contains ID's and UserId's.
ID UserID
1111 11
1111 300
1111 51
1122 11
1122 22
1122 3333
1122 45
I'm trying to count the distinct number of 'IDs' so that I get a total, but I also need to get a total of ID's that have also seen the that particular ID as well... To get the ID's, I've had to perform a subquery within another table to get ID's, I then pass this into the main query... Now I just want the results to be displayed as follows.
So I get a Total No for ID and a Total Number for Users ID - Also would like to add another column to get average as well for each ID
TotalID Total_UserID Average
2 7 3.5
If Possible I would also like to get an average as well, but not sure how to calculate that. So I would need to count all the 'UserID's for an ID add them altogether and then find the AVG. (Any Advice on that caluclation would be appreciated.)
Current Query.
SELECT DISTINCT(a.ID)
,COUNT(b.UserID)
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
GROUP BY a.ID
Which then Lists all the IDs and COUNT's all the USERID.. I would like a total of both columns. I've tried warpping the query in a
SELECT COUNT(*) FROM (
but this only counts the ID's which is great, but how do I count the USERID column as well
You seem to want this:
SELECT COUNT(DISTINCT a.ID), COUNT(b.UserID),
COUNT(b.UserID) * 1.0 / COUNT(DISTINCT a.ID)
FROM a INNER JOIN
b
ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999);
Note: DISTINCT is not a function. It applies to the whole row, so it is misleading to put an expression in parentheses after it.
Also, the GROUP BY is unnecessary.
The 1.0 is because SQL Server does integer arithmetic and this is a simple way to convert a number to a decimal form.
You can use
SELECT COUNT(DISTINCT a.ID) ...
to count all distinct values
Read details here
I believe you want this:
select TotalID,
Total_UserID,
sum(Total_UserID+TotalID) as Total,
Total_UserID/TotalID as Average
from (
SELECT (DISTINCT a.ID) as TotalID
,COUNT(b.UserID) as Total_UserID
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
) x

sql combining 2 queries with different order by group by

I have a query where I am counting the most frequent response in a database and ranking them by highest amount so using group by and order by.
The following shows how to do it for one:
select health, count(health) as count
from [Health].[Questionaire]
group by Health
order by count(Health) desc
which outputs the following:
Health Count
----------- -----
Very Good 6
Good 5
Poor 4
I would like to do with another column on the same table another query similar to the following so two queries using one sql statement like the following:
Health Count Diet Count
----------- ----- ----- -----
Very Good 6 Very Good 6
Good 5 Good 4
Poor 4 Poor 3
UPDATE!!
Hello this is how the table looks like at the moment
ID Diet Health
----------- ----- -------
101 Very Good Very Good
102 Poor Good
103 Poor Poor
I would like to do with another column on the same table another query similar to the following so two queries using one sql statement like the following:
Health Count Diet Count
----------- ----- ----- -----
Very Good 2 Very Good 1
Poor 1 Good 1
Good 0 Poor 1
Can anyone please help me out with this one?
Can provide further clarification if needed!
Here are 2 different ways of doing it, notice i removed the redundant column:
Test data:
DECLARE #t table(Health varchar(20), Diet varchar(20))
INSERT #t values
('Very good', 'Very good'),
('Poor', 'Good'),
('Poor', 'Poor')
Query 1:
;WITH CTE1 as
(
SELECT Health, count(*) CountHealth
FROM #t --[Health].[Questionaire]
GROUP BY health
), CTE2 as
(
SELECT Diet, count(*) CountDiet
FROM #t --[Health].[Questionaire]
GROUP BY Diet
)
SELECT
coalesce(Health, Diet) Grade,
coalesce(CountHealth, 0) CountHealth,
coalesce(CountDiet, 0) CountDiet
FROM CTE1
FULL JOIN
CTE2
ON CTE1.Health = CTE2.Diet
ORDER BY CountHealth DESC
Result 1:
Grade CountHealth CountDiet
Poor 2 1
Very good 1 1
Good 0 1
Mixing the results like that is really not good practice, so here is a different solution
Query 2:
SELECT Health, count(*) Count, 'Health' Grade
FROM #t --[Health].[Questionaire]
GROUP BY health
UNION ALL
SELECT Diet, count(*) CountDiet, 'Diet'
FROM #t --[Health].[Questionaire]
GROUP BY Diet
ORDER BY Grade, Count DESC
Result 2:
Health Count Grade
Good 1 Diet
Poor 1 Diet
Very good 1 Diet
Poor 2 Health
Very good 1 Health
You need to join the table to itself, but (as your sample data shows) to deal with gaps in actual data for specific values.
If you have a table that has the range of health/diet values:
select
v.value Status,
count(a.id) healthCount,
count(b.id) DietCount
from health_diet_values v
left join Questionaire a on a.health = v.value
left join Questionaire b on b.diet = v.value
group by v.value
or if you don't have such a table, you need to generate the list of values manually and join from that:
select
v.value Status,
count(a.id) healthCount,
count(b.id) DietCount
from (select 'Very Good' value union all
select 'Good' union all
select 'Poor') v
left join Questionaire a on a.health = v.value
left join Questionaire b on b.diet = v.value
group by v.value
Both of these queries produce zeroes if there is no matching data for the value.
Note that in your desired output you have a redundant column - you repeat the value column. The above queries produce output that looks like:
Status HealthCount DietCount
-------------------------------
Very Good 2 1
Good 1 1
Poor 0 1

group by in sql to find max count

Update
I have a query like this
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) from StudentList sl
group by sl.College_ID,sl.Department_ID
order by sl.College_ID,sl.Department_ID asc
abouve query gives the below result
and i have 200 - college id and each college have 6 department_id i could get the count [No.of student ] in each department
College_Id Dept_Id count
1 1 100
1 2 210
2 3 120
2 6 80
3 1 340
but my question is i need to display the maximum count[student] for each department
some thing like this
college_ID Dept_Id count
3 1 340
26 2 250
and i tried this out but getting error
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) from StudentList sl
group by sl.College_ID,sl.Department_ID
having COUNT(sl.RegisterNumber)=max(COUNT(sl.RegisterNumber))
order by sl.College_ID,sl.Department_ID asc
what went wrong can some one help me
Maybe something like this?
SELECT sl.College_ID, sl.Department_ID, COUNT(sl.RegisterNumber) As StudentCount, s2.MaxCount
FROM StudentList sl
INNER JOIN (
SELECT Department_ID, MAX(StudentCount) AS MaxCount
FROM (
SELECT College_ID, Department_ID, COUNT(*) As StudentCount
FROM StudentList
GROUP BY College_ID, Department_ID
) s1
GROUP BY Department_ID
) s2 ON sl.Department_ID = s2.Department_ID
GROUP BY sl.College_ID, sl.Department_ID, s2.MaxCount
HAVING COUNT(sl.RegisterNumber) = s2.MaxCount
ORDER BY sl.College_ID, sl.Department_ID ASC
EDIT: I've updated the query to more accurately answer your question, I missed the part where you want the College_ID with the max count.
EDIT 2: Okay, this should work now, I needed a second nested subquery for aggregating the aggregates. I don't know of a better way to compare the aggregates of different groups.
The result you want, which is group on college_ID and you not really care about college_ID, since from you example, with Dept_Id=1 will can not sure which college_ID is.
In that case, you can remove college_ID from your select statement and do a SUM GROUP BY
base on your query, something like:
SELECT t.Department_ID, SUM(t.c)
FROM (
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) c from StudentList sl
group by sl.College_ID,sl.Department_ID
) t
GROUP BY t.Department_ID
ORDER BY SUM(t.c)
Note: If you really want college_ID in your result, you can do a JOIN to get your college_ID

how to avoid duplicate on Joining two tables

Student Table
SID Name
1 A
2 B
3 C
Marks Table
id mark subject
1 50 physics
2 40 biology
1 50 chemistry
3 30 mathematics
SELECT distinct(std.id),std.name,m.mark, row_number() over() as rownum FROM
student std JOIN marks m ON std.id=m.id AND m.mark=50
This result is 2 times A even after using disticnt . My expected result will have only one A. if i remove row_number() over() as rownum its working fine. Why this is happening ? how to resolve. AM using DB2!!
There are two rows in marks Table with id = 1 and mark = 50.. So you will get two rows in the output for each row in student table...
If you only want one, you have to do a group By
SELECT std.id, std.name, m.mark, row_number()
over() as rownum
FROM student std
JOIN marks m
ON m.id=std.id AND m.mark=50
Group By std.id, std.name, m.mark
Now that you've clarified your question as:
I want to find all students with a mark of 50 in at least one subject. I would use the query:
SELECT student.id, '50'
FROM student
WHERE EXISTS (SELECT 1 FROM marks WHERE marks.id = student.id AND marks.mark = 50)
This also gives you flexibility to change the criteria, e.g. at least one mark of 50 or less.
Similar to Charles answer, but you always want to put the predicate (mark=50) in the WHERE clause, so you're filtering before joining. If this is just homework it might not matter but you'll want to remember this if you ever hit any real data.
SELECT std.sid,
std.name,
m.mark,
row_number() over() AS rownum
FROM student std
JOIN marks m
ON std.sid=m.id
WHERE m.mark=50
GROUP BY std.sid, std.name, m.mark