SQL to group on maximum of two columns - sql

I am having trouble displaying data from two tables, using what I think should be a group method.
I currently have a table containing pupils, and another containing the grades achieved (points) each year and term. See below:
PupilID, FirstName, Surname, DOB
GradeID, PupilID, SchoolYear, Term, Points
I want to query both tables and display all pupils with their latest grade. The query should find the maximum SchoolYear, then the maximum Term within that year, and display the Points alongside the PupilID, FirstName and Surname.
I would appreciate any help anyone can offer with this

This will select the latest grade per pupil based on SchoolYear and Term
select * from (
select p.*, g.SchoolYear, g.Term, g.Points,
row_number() over (partition by p.PupilID order by g.SchoolYear desc, g.Term desc) rn
from pupils p
join grades g on g.PupilID = p.PupilID
) t1 where rn = 1

Try this:
declare @SchoolYear int
declare @Term int
set @SchoolYear = (select max(SchoolYear) from Grade)
set @Term = (select max(Term) from Grade where SchoolYear = @SchoolYear)
select a.PupilID, a.FirstName, a.Surname, b.Points
from Pupil a
inner join Grade b
on a.PupilID = b.PupilID
where b.Term = @Term and b.SchoolYear = @SchoolYear
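The two answers above do not return the same rows when pupils were last graded in different terms. Here is a minimal sketch of the difference, run through Python's sqlite3 module so it is self-contained. The sample rows and the table names Pupil and Grade are assumptions for illustration, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pupil (PupilID INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
CREATE TABLE Grade (GradeID INTEGER PRIMARY KEY, PupilID INTEGER,
                    SchoolYear INTEGER, Term INTEGER, Points INTEGER);
INSERT INTO Pupil VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
-- Ann has a grade in the newest term; Bob's newest grade is a year older.
INSERT INTO Grade VALUES
    (1, 1, 2023, 3, 40), (2, 1, 2024, 1, 45),
    (3, 2, 2022, 2, 30), (4, 2, 2023, 2, 35);
""")

# Latest grade per pupil (window-function approach).
per_pupil = conn.execute("""
SELECT PupilID, FirstName, Points FROM (
    SELECT p.PupilID, p.FirstName, g.Points,
           ROW_NUMBER() OVER (PARTITION BY p.PupilID
                              ORDER BY g.SchoolYear DESC, g.Term DESC) AS rn
    FROM Pupil p JOIN Grade g ON g.PupilID = p.PupilID
) t
WHERE rn = 1
ORDER BY PupilID
""").fetchall()

# Grades from the single latest year and term in the whole table.
latest_overall = conn.execute("""
SELECT p.PupilID, p.FirstName, g.Points
FROM Pupil p JOIN Grade g ON g.PupilID = p.PupilID
WHERE g.SchoolYear = (SELECT MAX(SchoolYear) FROM Grade)
  AND g.Term = (SELECT MAX(Term) FROM Grade
                WHERE SchoolYear = (SELECT MAX(SchoolYear) FROM Grade))
ORDER BY p.PupilID
""").fetchall()

print(per_pupil)       # one row for every pupil that has a grade
print(latest_overall)  # only pupils graded in the overall latest term
```

With this data the per-pupil query returns both Ann and Bob, while the latest-overall query returns only Ann, because Bob has no grade in 2024 term 1.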

Related

SQL Server Get one row for each student with highest date

I have two tables as follows:
I want to find the StudentId, FirstName, StudentLoginInfoId and LoginDate. I expect only one entry per student: the one with the highest LoginDate.
Expected result:
You could use ROW_NUMBER in a subquery to number the rows within each partition (here, each student), then keep only the rows numbered 1. That leaves exactly one row per student.
select studentid, firstname, studentlogininfoid, logindate
from (
select
s.studentid, s.firstname, sl.studentlogininfoid, sl.logindate,
row_number() over (partition by sl.studentid order by sl.logindate desc) as rn
from student s
inner join studentlogininfo sl on s.studentid = sl.studentid
) t
where rn = 1
The arguments of row_number:
PARTITION BY specifies the groups that are numbered separately (numbering restarts at 1 for each group).
ORDER BY specifies the order in which the rows within each group are numbered.
If we number the rows for each student, sorted by login date descending, then the first row for each student (the row with rn = 1) contains the highest login date for that student.
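To see the numbering itself, it can help to print rn for every row before filtering. This is only a sketch using Python's sqlite3 module with made-up login rows; the column names follow the question, the data is an assumption, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentLoginInfo (StudentLoginInfoId INTEGER PRIMARY KEY,
                               StudentId INTEGER, LoginDate TEXT);
INSERT INTO StudentLoginInfo VALUES
    (1, 10, '2024-01-05'), (2, 10, '2024-03-01'),
    (3, 20, '2024-02-10'), (4, 10, '2024-02-20');
""")

# Numbering restarts at 1 for each StudentId; the newest login gets rn = 1.
numbered = conn.execute("""
SELECT StudentId, LoginDate,
       ROW_NUMBER() OVER (PARTITION BY StudentId
                          ORDER BY LoginDate DESC) AS rn
FROM StudentLoginInfo
ORDER BY StudentId, rn
""").fetchall()

for row in numbered:
    print(row)
```

Student 10 gets rn 1, 2, 3 from newest to oldest login, and student 20 starts again at 1, so filtering on rn = 1 keeps the newest login of each student.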
You can use "CROSS APPLY" to find what you want:
SELECT S.StudentId
, S.FirstName
, SLI.StudentLoginInfoId
, SLI.LoginDate
FROM Student S
CROSS APPLY (SELECT TOP 1 * FROM StudentLoginInfo SLI WHERE S.StudentId = SLI.StudentId ORDER BY LoginDate DESC) SLI

SQL random sample with groups

I have a university graduate database and would like to extract a random sample of data of around 1000 records.
I want to ensure the sample is representative of the population so would like to include the same proportions of courses eg
I could do this using the following:
select id from (select top 500 id from degree where coursecode = 1 order by newid()) c1
union
select id from (select top 300 id from degree where coursecode = 2 order by newid()) c2
union
select id from (select top 200 id from degree where coursecode = 3 order by newid()) c3
But we have hundreds of course codes, so this would be time consuming. I would also like to reuse the code for different sample sizes without going through the query and hard-coding each one.
Any help would be greatly appreciated
You want a stratified sample. I would recommend doing this by sorting the data by course code and doing an nth sample. Here is one method that works best if you have a large population size:
select d.*
from (select d.*,
row_number() over (order by coursecode, newid()) as seqnum,
count(*) over () as cnt
from degree d
) d
where seqnum % (cnt / 500) = 1;
EDIT:
You can also calculate the population size for each group "on the fly":
select d.*
from (select d.*,
row_number() over (partition by coursecode order by newid()) as seqnum,
count(*) over () as cnt,
count(*) over (partition by coursecode) as cc_cnt
from degree d
) d
where seqnum < 500 * (cc_cnt * 1.0 / cnt)
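As a rough check of the "on the fly" idea, here is a sketch run through Python's sqlite3 module. It is not SQL Server: random() stands in for newid(), the degree rows are invented, and the comparison uses <= rather than < so that each course gets exactly its proportional share. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE degree (id INTEGER PRIMARY KEY, coursecode INTEGER)")
# Hypothetical population: 500 / 300 / 200 graduates on courses 1 / 2 / 3.
rows = [(code,) for code, n in ((1, 500), (2, 300), (3, 200)) for _ in range(n)]
conn.executemany("INSERT INTO degree (coursecode) VALUES (?)", rows)

SAMPLE_SIZE = 100  # overall sample size, passed in as a parameter

# Shuffle within each course, then keep each course's proportional share.
sample_counts = dict(conn.execute("""
SELECT coursecode, COUNT(*) FROM (
    SELECT d.coursecode,
           ROW_NUMBER() OVER (PARTITION BY coursecode ORDER BY random()) AS seqnum,
           COUNT(*) OVER () AS cnt,
           COUNT(*) OVER (PARTITION BY coursecode) AS cc_cnt
    FROM degree d
) t
WHERE seqnum <= ? * (cc_cnt * 1.0 / cnt)
GROUP BY coursecode
""", (SAMPLE_SIZE,)).fetchall())

print(sample_counts)
```

Which rows are picked changes on every run, but the number picked per course does not, and changing SAMPLE_SIZE is the only edit needed for a different sample size.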
Add a table (population here) that stores the sample size you want for each course code.
I think the query should then look like this:
SELECT *
FROM (
SELECT id, coursecode, ROW_NUMBER() OVER (PARTITION BY coursecode ORDER BY NEWID()) AS rn
FROM degree) t
LEFT OUTER JOIN
population p ON t.coursecode = p.coursecode
WHERE
rn <= p.SampleSize
It is not necessary to partition the population at all.
If you are taking a sample of 1000 from a population among hundreds of course codes, it stands to reason that many of those course codes will not be selected at all in any one sampling.
If the population is uniform (say, a continuous sequence of student IDs), a uniformly-distributed sample will automatically be representative of population weighting by course code. Since newid() is a uniform random sampler, you're good to go out of the box.
The only wrinkle you might encounter is a student ID that is associated with multiple course codes. In that case, build a unique list (temporary table or subquery) containing a sequential id, student id and course code, then sample the sequential id from it, grouping by student id to remove duplicates.
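A quick way to test the claim that a plain uniform sample is roughly representative is to try it on invented data. This sketch uses Python's sqlite3 module, with random() standing in for newid(); the population sizes are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE degree (id INTEGER PRIMARY KEY, coursecode INTEGER)")
# Hypothetical population of 10,000 graduates split 50% / 30% / 20%.
population = {1: 5000, 2: 3000, 3: 2000}
conn.executemany(
    "INSERT INTO degree (coursecode) VALUES (?)",
    [(code,) for code, n in population.items() for _ in range(n)],
)

# Plain uniform sample of 1000 rows, with no partitioning by course code.
uniform_counts = dict(conn.execute("""
SELECT coursecode, COUNT(*) FROM (
    SELECT coursecode FROM degree ORDER BY random() LIMIT 1000
) t
GROUP BY coursecode
""").fetchall())

for code, n in sorted(population.items()):
    expected = 1000 * n / 10000
    print(code, "sampled:", uniform_counts.get(code, 0), "expected about:", expected)
```

The counts vary from run to run and only approximate the population proportions. Very small course codes can be missed entirely, which is the trade-off against the partitioned queries in the other answers.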
I've done similar queries (but not on MS SQL) using a ROW_NUMBER approach:
select ...
from
( select ...
,row_number() over (partition by coursecode order by newid()) as rn
from degree
) as d
join sample_size as s
on d.coursecode = s.coursecode
and d.rn <= s.samplesize

How to do the max count part in SQL?

I was told to find out which occupation has the greatest number of patients with conditionID = MC8.
I don't know how to do the "greatest" part.
Here is my code right now:
SELECT occupation
FROM Patient
WHERE EXISTS
(SELECT PatientID FROM PatientMedcon
WHERE conditionID='MC8')
GROUP BY occupation
HAVING count(occupation) = (SELECT max(occupation)
FROM Patient)
You should approach these types of queries using regular joins and then add additional factors. The following gets the count of patients for each occupation with that condition:
SELECT occupation, COUNT(*)
FROM Patient p JOIN
PatientMedcon pm
ON p.PatientId = pm.PatientId and
pm.conditionId = 'MC8'
GROUP BY occupation
ORDER BY COUNT(*) DESC;
If you want the top row, that depends on the database. It might be select top 1, limit 1 at the end, fetch first 1 rows only at the end, or even something else.
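For example, SQLite is one of the databases that takes LIMIT 1 at the end. Below is a small sketch through Python's sqlite3 module; the patient rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patient (PatientID INTEGER PRIMARY KEY, occupation TEXT);
CREATE TABLE PatientMedcon (PatientID INTEGER, conditionID TEXT);
INSERT INTO Patient VALUES
    (1, 'Nurse'), (2, 'Nurse'), (3, 'Teacher'), (4, 'Teacher'), (5, 'Teacher');
-- Teachers are the larger group overall, but only one teacher has MC8.
INSERT INTO PatientMedcon VALUES
    (1, 'MC8'), (2, 'MC8'), (3, 'MC8'), (4, 'MC1'), (5, 'MC2');
""")

# Count MC8 patients per occupation and keep the row with the largest count.
top_occupation = conn.execute("""
SELECT p.occupation, COUNT(*) AS patients
FROM Patient p
JOIN PatientMedcon pm
  ON p.PatientID = pm.PatientID AND pm.conditionID = 'MC8'
GROUP BY p.occupation
ORDER BY COUNT(*) DESC
LIMIT 1
""").fetchone()

print(top_occupation)
```

Because the join keeps only MC8 rows, the count is per occupation among patients with that condition, not among all patients.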

SQL Finding maximum value without top command

Let's say I have a database with a table:
- courses (key: name [of the course]; other attributes: year in which the course takes place)
I want to write a query that answers the question:
Which year of study has the maximum number of courses?
Normally, the query would be:
SELECT TOP 1 STUDYEAR
FROM COURSES
GROUP BY STUDYEAR
ORDER BY COUNT(CNO) DESC;
But my question is, which query could complete this without using the TOP 1 phrase?
You can use an inner query to get the maximum count. The only difference is that it can return more than one row if several years share the same maximum count.
SELECT STUDYEAR
FROM COURSES
GROUP BY STUDYEAR
HAVING COUNT(CNO) = (SELECT MAX(CNOCount) FROM
(SELECT COUNT(CNO) CNOCount
FROM COURSES
GROUP BY STUDYEAR) X)
Another version with only one inner query:
SELECT STUDYEAR
FROM
(SELECT STUDYEAR, ROW_NUMBER() OVER (ORDER BY COUNT(CNO) DESC) RowNumber
FROM COURSES
GROUP BY STUDYEAR) X
WHERE RowNumber = 1
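The two versions behave differently when years tie for the maximum. This sketch shows that with Python's sqlite3 module and invented course rows. For portability it computes the counts in an inner query before numbering them, which is a small change from the query above, and it needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSES (CNO TEXT PRIMARY KEY, STUDYEAR INTEGER);
-- Years 1 and 2 tie with two courses each; year 3 has one.
INSERT INTO COURSES VALUES
    ('C1', 1), ('C2', 1), ('C3', 2), ('C4', 2), ('C5', 3);
""")

# HAVING = MAX(...) keeps every year that reaches the maximum count.
all_ties = conn.execute("""
SELECT STUDYEAR
FROM COURSES
GROUP BY STUDYEAR
HAVING COUNT(CNO) = (SELECT MAX(CNOCount) FROM
                     (SELECT COUNT(CNO) AS CNOCount
                      FROM COURSES
                      GROUP BY STUDYEAR) X)
ORDER BY STUDYEAR
""").fetchall()

# ROW_NUMBER = 1 keeps exactly one year, chosen arbitrarily among the ties.
one_row = conn.execute("""
SELECT STUDYEAR FROM (
    SELECT STUDYEAR, ROW_NUMBER() OVER (ORDER BY CNOCount DESC) AS RowNumber
    FROM (SELECT STUDYEAR, COUNT(CNO) AS CNOCount
          FROM COURSES
          GROUP BY STUDYEAR) C
) X
WHERE RowNumber = 1
""").fetchall()

print(all_ties)
print(one_row)
```

The first query returns both tied years, and the second returns a single year, so the choice depends on whether ties should be reported.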

How to find the highest populated instance in a column in SQL

I have a table (person) that contains columns such as the person's name, age, eye color and favorite movie.
How do I find the most popular eye color(s), returning just the eye color (not the count), using SQL (Microsoft Access)? I don't want to use TOP, as there might be multiple colors with the same count.
Thank you
SELECT
EyeColor
FROM
Person
GROUP BY
EyeColor
HAVING
COUNT(*) = (
SELECT MAX(i.EyeColorCount) FROM (
SELECT COUNT(*) AS EyeColorCount FROM Person GROUP BY EyeColor
) AS i
)
In Access, I think you need something along the lines of:
SELECT First(t.Eyecolor) AS FirstOfEyeColor
FROM (SELECT p.EyeColor, Count(p.EyeColor) AS C
FROM Person p
GROUP BY p.EyeColor
ORDER BY Count(p.EyeColor) DESC) AS t;