SELECT a single field by ordered value - sql

Consider the following two tables:

student_id  score  date
-------------------------------
1           10     05-01-2013
2           100    05-15-2013
2           60     05-01-2012
2           95     05-14-2013
3           15     05-01-2011
3           40     05-01-2012

class_id  student_id
----------------------------
1         1
1         2
2         3
I want to get the unique class_ids for which at least one student's latest score (by date) is above a certain threshold.
For instance, if I wanted a list of classes where the score was > 80, I would get class_id 1 as a result, since student 2's latest score was above 80.
How would I go about this in T-SQL?

Are you asking for this?
SELECT DISTINCT
t2.[class_ID]
FROM
t1
JOIN t2
ON t2.[student_id] = t1.[student_id]
WHERE
t1.[score] > 80

Edit: given your date requirement, you could use row_number() to get the result:
select c.class_id
from class_student c
inner join
(
select student_id,
score,
date,
row_number() over(partition by student_id order by date desc) rn
from student_score
) s
on c.student_id = s.student_id
where s.rn = 1
and s.score >80;
Or you can use a WHERE EXISTS:
select c.class_id
from class_student c
where exists (select 1
from student_score s
where c.student_id = s.student_id
and s.score > 80
and s.[date] = (select max(date)
from student_score s1
where s.student_id = s1.student_id));
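To check the "latest score above the threshold" logic against the sample data, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. It assumes SQLite 3.25 or newer (for window functions), borrows the table names student_score and class_student from the answer above, and rewrites the dates as YYYY-MM-DD so that text ordering is chronological; those choices are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_score (student_id INTEGER, score INTEGER, date TEXT);
CREATE TABLE class_student (class_id INTEGER, student_id INTEGER);
-- Assumption: dates stored as YYYY-MM-DD so ORDER BY on text is chronological.
INSERT INTO student_score VALUES
  (1, 10, '2013-05-01'), (2, 100, '2013-05-15'), (2, 60, '2012-05-01'),
  (2, 95, '2013-05-14'), (3, 15, '2011-05-01'), (3, 40, '2012-05-01');
INSERT INTO class_student VALUES (1, 1), (1, 2), (2, 3);
""")

# Keep only each student's most recent row (rn = 1), then apply the threshold.
latest_above = """
SELECT DISTINCT c.class_id
FROM class_student c
JOIN (
    SELECT student_id, score,
           ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY date DESC) AS rn
    FROM student_score
) s ON s.student_id = c.student_id
WHERE s.rn = 1 AND s.score > ?
ORDER BY c.class_id
"""

def classes_above(threshold):
    """Return class_ids with at least one student whose latest score exceeds threshold."""
    return [row[0] for row in conn.execute(latest_above, (threshold,))]

print(classes_above(80))  # [1]
```

Lowering the threshold to 30 also brings in class 2, because student 3's latest score is 40.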

select distinct(class_id) from table2 where student_id in
(select distinct(student_id) from table1 where score > thresholdScore)

This should do the trick:
SELECT DISTINCT
CS.Class_ID
FROM
dbo.ClassStudent CS
CROSS APPLY (
SELECT TOP 1 *
FROM dbo.StudentScore S
WHERE CS.Student_ID = S.Student_ID
ORDER BY S.Date DESC
) L
WHERE
L.Score > 80
;
And here's another way:
WITH LastScore AS (
SELECT TOP 1 WITH TIES *
FROM dbo.StudentScore
ORDER BY Row_Number() OVER (PARTITION BY Student_ID ORDER BY Date DESC)
)
SELECT DISTINCT
CS.Class_ID
FROM
dbo.ClassStudent CS
WHERE
EXISTS (
SELECT *
FROM LastScore L
WHERE
CS.Student_ID = L.Student_ID
AND L.Score > 80
)
;
Depending on the data and the indexes, these two queries could have very different performance characteristics. It is worth trying several to see if one stands out as superior to the others.
It seems like there could be some version of the query where the engine would stop looking as soon as it finds just one student with the requisite score, but I am not sure at this moment how to accomplish that.

Related

select value based on max of other column

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id  area  count  function
1   100   20     living
1   200   30     industry
2   400   10     living
2   400   10     industry
2   400   20     education
3   150   1      industry
3   150   1      education
I want to group by id, summing area and count for each group, and pick the dominant function, i.e. the one with the largest area. When areas are equal, the choice should fall back to the largest count; when both area and count are equal, it should follow a fixed priority between functions (I still have to decide whether education takes priority over industry or vice versa). So the result should be:
id  area  count  function
1   300   50     industry
2   1200  40     education
3   300   2      industry
I have tried a lot of things, and maybe it's easy, but I can't work it out. Can someone help me get the right SQL?
One method uses row_number() and conditional aggregation:
select id, sum(area) as area, sum(count) as count,
       max(function) filter (where seqnum = 1) as function
from (select t.*,
             row_number() over (partition by id order by area desc, count desc, function desc) as seqnum
      from t
     ) t
group by id;
Another method uses `distinct on`:
select distinct on (id) id,
       sum(area) over (partition by id) as area,
       sum(count) over (partition by id) as count,
       function
from t
order by id, t.area desc, t.count desc;
Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
(
select "function"
from the_table
where id = t.id
order by area desc, count desc, "function" desc
limit 1
) as "function"
from the_table as t
group by t.id order by t.id;
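As a sanity check of the scalar sub-query approach, the sketch below loads the question's sample rows into an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for Postgres here purely for convenience (an assumption); the table name the_table follows the answer above, and the final "function" DESC tie-break is just one possible priority rule, since the question leaves that open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT);
INSERT INTO the_table VALUES
  (1, 100, 20, 'living'),   (1, 200, 30, 'industry'),
  (2, 400, 10, 'living'),   (2, 400, 10, 'industry'), (2, 400, 20, 'education'),
  (3, 150, 1,  'industry'), (3, 150, 1,  'education');
""")

# Sum per id; pick the function of the row that wins on area, then count,
# then (as an assumed tie-break) the alphabetically last function name.
dominant = """
SELECT t.id,
       SUM(t.area)    AS area,
       SUM(t."count") AS "count",
       (SELECT "function"
        FROM the_table
        WHERE id = t.id
        ORDER BY area DESC, "count" DESC, "function" DESC
        LIMIT 1)      AS "function"
FROM the_table AS t
GROUP BY t.id
ORDER BY t.id
"""
rows = conn.execute(dominant).fetchall()
for row in rows:
    print(row)
# (1, 300, 50, 'industry')
# (2, 1200, 40, 'education')
# (3, 300, 2, 'industry')
```

The output matches the expected result table in the question; for id 3 the outcome depends entirely on the chosen tie-break.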
You can use sum as a window function:
select distinct on (t.id)
       id,
       sum(area) over (partition by id) as area,
       sum(count) over (partition by id) as count,
       ( select function from tbl_test where tbl_test.id = t.id order by area desc, count desc limit 1 ) as function
from tbl_test t
order by t.id;
This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(This ensures that no yt2 row exists with the same id but a larger area.)
This works nicely, but several rows may share the maximum area while having different functions. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id
) t
on yourtable.id = t.id
group by yourtable.id, t.function;

How to count a temporary variable in SQLite?

I am working on a personal analytics project and I need to filter a SQL table. My SQL knowledge is very basic, and what I do know is Oracle; in this case I have to use SQLite, which seems to be quite different.
For example, suppose the table is
student  physics  chemistry  maths  history  english
Brian    78       62         100    40       50
Bill     80       70         95     50       60
Brian    80       40         90     95       60
The table has repetition.
I asked a question earlier today, using the same example above, which would let me rank the subjects for each student.
How to RANK multiple columns of an entire table?
What I want to do now is find out which students had Maths in the top 3 among all subjects and group the table for each student. So the goal is to find out how many times did Brian have Maths in Top 3 of his scores.
IT WeiHan's answer to the previous question (https://www.db-fiddle.com/f/bjui5W1VWmHXcqKAhK5iBD/0 ) worked perfectly and displayed the rank of the subjects for each row. I used their answer and tried to modify it for this purpose.
with cte as (
select student,'physics' as class,physics as score from Table1 union all
select student,'chemistry' as class,chemistry as score from Table1 union all
select student,'maths' as class,maths as score from Table1 union all
select student,'history' as class,history as score from Table1 union all
select student,'english' as class,english as score from Table1
)
SELECT student AS name,class,score,rnk,
(CASE
WHEN class = 'maths' AND rnk <=3 THEN 1
ELSE 0
END) as maths_rank
FROM
(select student,class,score,RANK() OVER (partition by student order by score desc) rnk
from cte)
which gives a table like
name   class  score  rnk  maths_rank
Brian  maths  100    1    1
I want to be able to count the maths_rank values or sum it (as it contains 1 or 0 values) and group the table on student name. I tried to count the maths_rank variable but that didn't work and resulted in errors. Please help me out with a solution.
If I understand correctly, you are on the right path. I think you just need a where clause:
with cte as (
select student,'physics' as class,physics as score from Table1 union all
select student,'chemistry' as class,chemistry as score from Table1 union all
select student,'maths' as class,maths as score from Table1 union all
select student,'history' as class,history as score from Table1 union all
select student,'english' as class,english as score from Table1
)
select t.*
from (select t.*,
rank() over (partition by student order by score desc) as subject_rank
from cte t
) t
where class = 'maths' and subject_rank <= 3;
Edit:
If you want the number of times maths was in the top 3, then:
select student, sum(case when class = 'maths' and subject_rank <= 3 then 1 else 0 end) as maths_top3
from (select t.*,
rank() over (partition by student order by score desc) as subject_rank
from cte t
) t
group by student;
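One caveat when a student appears in several rows: partitioning by student ranks all of that student's scores together, not each row separately. If the ranking is meant to be per original row, one option is to carry a row identifier through the CTE. The sketch below does that with SQLite's implicit rowid and runs the query through Python's sqlite3 module; the alias row_id and the use of rowid are assumptions for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (student TEXT, physics INT, chemistry INT,
                     maths INT, history INT, english INT);
INSERT INTO Table1 VALUES
  ('Brian', 78, 62, 100, 40, 50),
  ('Bill',  80, 70,  95, 50, 60),
  ('Brian', 80, 40,  90, 95, 60);
""")

# row_id (hypothetical alias for SQLite's rowid) keeps each source row's
# five subjects together, so the rank is computed per row, not per student.
maths_top3_sql = """
WITH cte AS (
  SELECT rowid AS row_id, student, 'physics'   AS class, physics   AS score FROM Table1 UNION ALL
  SELECT rowid,           student, 'chemistry',          chemistry          FROM Table1 UNION ALL
  SELECT rowid,           student, 'maths',              maths              FROM Table1 UNION ALL
  SELECT rowid,           student, 'history',            history            FROM Table1 UNION ALL
  SELECT rowid,           student, 'english',            english            FROM Table1
)
SELECT student,
       SUM(CASE WHEN class = 'maths' AND subject_rank <= 3 THEN 1 ELSE 0 END) AS maths_top3
FROM (SELECT t.*,
             RANK() OVER (PARTITION BY row_id ORDER BY score DESC) AS subject_rank
      FROM cte t) t
GROUP BY student
ORDER BY student
"""
maths_counts = conn.execute(maths_top3_sql).fetchall()
print(maths_counts)  # [('Bill', 1), ('Brian', 2)]
```

With this sample data both partitioning choices happen to give the same counts, but they can diverge once a student's rows contain overlapping score ranges.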

Get sum of last 5 rows for each unique id

This is the query I'm using currently
SELECT SUM(score) as score
FROM (SELECT p2.score
FROM performance_buzz as p2
WHERE p2.player_id = 922
ORDER BY p2.id DESC
LIMIT 5) as performance_buzz
But in the above query I have to pass player_id manually, and I don't want to do that. I want to do this in MySQL itself, because I want to use this query as a subquery that returns the sum of the last 5 scores for each player:
SELECT performance_buzz.id, performance_buzz.score as last_score, performance_buzz.name
FROM `performance_buzz`
LEFT JOIN performance_buzz m2 ON (performance_buzz.name = m2.name AND performance_buzz.id < m2.id)
WHERE m2.id IS NULL
GROUP BY performance_buzz.name
ORDER BY performance_buzz.id DESC
It should be something like this:
select player_id, sum(score) score
from
(
SELECT
@row_number:=CASE
WHEN @player_id = player_id THEN @row_number + 1
ELSE 1
END AS rn,
@player_id:=player_id as player_id,
score
FROM
performance_buzz,(SELECT @player_id:=0,@row_number:=0) as t
order by player_id, id desc
) a
where rn <= 5
group by player_id
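On engines that support window functions (MySQL 8.0 and later, as far as I know, and SQLite 3.25+), the same "last 5 rows per player" idea can be written with ROW_NUMBER() instead of session variables. The sketch below runs that variant in SQLite through Python's sqlite3 module; the sample rows are made up for illustration and SQLite is only a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE performance_buzz (id INTEGER PRIMARY KEY, player_id INTEGER, score INTEGER);
-- Hypothetical data: player 1 has six rows, player 2 has two.
INSERT INTO performance_buzz VALUES
  (1, 1, 10), (2, 1, 20), (3, 2, 5), (4, 1, 30),
  (5, 1, 40), (6, 2, 15), (7, 1, 50), (8, 1, 60);
""")

# Number each player's rows from newest (highest id) to oldest,
# keep the first five, and sum their scores.
last_five = """
SELECT player_id, SUM(score) AS score
FROM (
    SELECT player_id, score,
           ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY id DESC) AS rn
    FROM performance_buzz
) ranked
WHERE rn <= 5
GROUP BY player_id
ORDER BY player_id
"""
totals = conn.execute(last_five).fetchall()
print(totals)  # [(1, 200), (2, 20)]
```

Player 1's oldest row (score 10) falls outside the last five, so the total is 60 + 50 + 40 + 30 + 20 = 200.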

Which row has the highest value?

I have a table of election results for multiple nominees and polls. I need to determine which nominee had the most votes for each poll.
Here's a sample of the data in the table:
PollID  NomineeID  Votes
1       1          108
1       2          145
1       3          4
2       1          10
2       2          41
2       3          0
I'd appreciate any suggestions or help anyone can offer me.
This will match the highest, and will also bring back ties.
select sd.*
from sampleData sd
inner join (
select PollID, max(votes) as MaxVotes
from sampleData
group by PollID
) x on
sd.PollID = x.PollID and
sd.Votes = x.MaxVotes
SELECT
t.NomineeID,
t.PollID
FROM
( SELECT
NomineeID,
PollID,
RANK() OVER (PARTITION BY i.PollID ORDER BY i.Votes DESC) AS Rank
FROM SampleData i) t
WHERE
t.Rank = 1
SELECT ABB2.PollID, ABB2.NomineeID, ABB2.Votes
FROM
SampleData AS ABB2
JOIN
(SELECT PollID, MAX(Votes) AS most_votes
FROM SampleData
GROUP BY PollID) AS ABB1 ON ABB1.PollID = ABB2.PollID AND ABB1.most_votes = ABB2.Votes
Please note: if two nominees tie for the most votes in the same poll, this query returns both of them.
select p.Pollid, p.Nomineeid, p.Votes from Poll_table p
where p.Votes = (
select max(p2.Votes) from Poll_table p2
where p2.Pollid = p.Pollid
);
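When filtering on a per-poll maximum, the comparison has to be tied to the row's own poll. Matching Votes against the set of all polls' maxima can return a nominee whose count merely equals some other poll's maximum. The sketch below shows the difference in SQLite via Python's sqlite3 module; the extra row for nominee 4 is hypothetical and added only to trigger the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Poll_table (Pollid INTEGER, Nomineeid INTEGER, Votes INTEGER);
INSERT INTO Poll_table VALUES
  (1, 1, 108), (1, 2, 145), (1, 3, 4),
  (2, 1, 10),  (2, 2, 41),  (2, 3, 0),
  (1, 4, 41);  -- hypothetical: equals poll 2's maximum, but is not poll 1's
""")

# Compares against the set {145, 41} regardless of which poll a row belongs to.
uncorrelated = """
SELECT Pollid, Nomineeid, Votes FROM Poll_table
WHERE Votes IN (SELECT MAX(Votes) FROM Poll_table GROUP BY Pollid)
ORDER BY Pollid, Nomineeid
"""

# Compares each row only against the maximum of its own poll.
correlated = """
SELECT p.Pollid, p.Nomineeid, p.Votes FROM Poll_table p
WHERE p.Votes = (SELECT MAX(p2.Votes) FROM Poll_table p2 WHERE p2.Pollid = p.Pollid)
ORDER BY p.Pollid, p.Nomineeid
"""
loose = conn.execute(uncorrelated).fetchall()
strict = conn.execute(correlated).fetchall()
print(loose)   # [(1, 2, 145), (1, 4, 41), (2, 2, 41)]
print(strict)  # [(1, 2, 145), (2, 2, 41)]
```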

grouping and aggregates with subqueries

I have a query that is designed to find the number of people who went to a hospital more than once. What I have works, but is there a way to do it without the subquery?
SELECT count(*) as counts, hospitals.hospitalname
FROM Patient INNER JOIN
hospitals ON Patient.hospitalnpi = hospitals.npi
WHERE (hospitals.hospitalname = 'X')
group by patientid, hospitalname
having count(patient.patientid) >1
order by count(*) desc
This returns the correct number of rows (30, one per revisiting patient), but not the number 30 itself. If I remove the group by patientid, I get the entire result set back.
I solved this problem by doing
select COUNT(*),hospitalname
from
(
SELECT count(*) as counts,hospitals.hospitalname
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
group by patientid, hospitals.hospitalname
having count(patient.patientid) >1
) t
group by t.hospitalname
order by t.hospitalname desc
I feel that there has to be a more elegant solution than using subqueries all the time. How could this be improved?
Sample data from the first query:
row #  revisits
1      2
2      2
3      2
4      2
Sample data from the second, working query:
row #  hosp. name  revisitAggregate
1      x           30
2      y           15
3      z           5
Simple one-to-many relationship between patient and hospitals
It's super hacky, but here you are:
SELECT TOP 1
ROW_NUMBER() OVER (order by patient.patientid) as Count
FROM
Patient
INNER JOIN hospitals
ON Patient.hospitalnpi = hospitals.npi
WHERE
(hospitals.hospitalname = 'X')
GROUP BY
patientid,
hospitalname
HAVING
count(patient.patientid) >1
ORDER BY
Count desc
select distinct hospitalname, count(*) over (partition by hospitalname) from (
SELECT hospitalname, count(*) over (partition by patientid,
hospitals.hospitalname) as counter
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
WHERE (hospitals.hospitalname = 'X')
) Z
where counter > 1
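Another way to avoid the derived table is to apply a window function on top of the grouped rows: after GROUP BY and HAVING, each remaining row is one revisiting patient at one hospital, so counting those rows per hospital gives the aggregate directly. The sketch below tries this in SQLite (3.25+) via Python's sqlite3 module; the sample rows and the alias revisiting_patients are made up, and whether the same form is the best choice on SQL Server is an assumption worth testing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospitals (npi INTEGER PRIMARY KEY, hospitalname TEXT);
CREATE TABLE Patient (patientid INTEGER, hospitalnpi INTEGER);
INSERT INTO hospitals VALUES (1, 'X'), (2, 'Y');
-- Hypothetical visits: patients 1 and 2 revisit X, patient 4 revisits Y.
INSERT INTO Patient VALUES
  (1, 1), (1, 1), (2, 1), (2, 1), (2, 1), (3, 1),
  (4, 2), (4, 2), (5, 2);
""")

# GROUP BY/HAVING leaves one row per revisiting patient and hospital;
# the window COUNT(*) then counts those rows per hospital, and DISTINCT
# collapses the repeated per-hospital result to a single row.
revisits = """
SELECT DISTINCT h.hospitalname,
       COUNT(*) OVER (PARTITION BY h.hospitalname) AS revisiting_patients
FROM Patient p
JOIN hospitals h ON h.npi = p.hospitalnpi
GROUP BY p.patientid, h.hospitalname
HAVING COUNT(*) > 1
ORDER BY h.hospitalname
"""
per_hospital = conn.execute(revisits).fetchall()
print(per_hospital)  # [('X', 2), ('Y', 1)]
```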