Finding group maxes in SQL join result [duplicate] - sql

Closed 10 years ago.
Possible Duplicate:
SQL: Select first row in each GROUP BY group?
Two SQL tables. One contestant has many entries:
Contestants
Id  Name
--  --------
1   Fred
2   Mary
3   Irving
4   Grizelda

Entries
Id  Contestant_Id  Score
--  -------------  -----
1   3              100
2   3              22
3   1              888
4   4              123
5   1              19
6   3              50
Low score wins. I need to retrieve the current best score of every contestant, ordered by score:
Best Entries Report
Name Entry_Id Score
---- -------- -----
Fred 5 19
Irving 2 22
Grizelda 4 123
I can certainly get this done with many queries. My question is whether there's a way to get the result with one, efficient SQL query. I can almost see how to do it with GROUP BY, but not quite.
In case it's relevant, the environment is Rails ActiveRecord and PostgreSQL.

Here is a PostgreSQL-specific way of doing this:
SELECT DISTINCT ON (c.id) c.name, e.id, e.score
FROM Contestants c
JOIN Entries e ON c.id = e.Contestant_id
ORDER BY c.id, e.score
Details about DISTINCT ON are in the PostgreSQL documentation for SELECT.
Update: to order the results by score, wrap that query in a subquery:
SELECT *
FROM (SELECT DISTINCT ON (c.id) c.name, e.id, e.score
FROM Contestants c
JOIN Entries e ON c.id = e.Contestant_id
ORDER BY c.id, e.score) t
ORDER BY score
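To make the DISTINCT ON semantics concrete, here is a minimal pure-Python sketch (an illustration of what the query computes, not how PostgreSQL executes it) run on the question's sample data: join, sort by (contestant id, score), keep the first row per contestant, then re-sort by score. The function name best_entries is hypothetical. Like DISTINCT ON, it keeps whichever row sorts first, so with tied scores the winner depends on the tie-breaking order (here, the original entry order).

```python
# Sample data from the question: (id, name).
CONTESTANTS = [(1, "Fred"), (2, "Mary"), (3, "Irving"), (4, "Grizelda")]
# (entry id, contestant_id, score)
ENTRIES = [(1, 3, 100), (2, 3, 22), (3, 1, 888), (4, 4, 123), (5, 1, 19), (6, 3, 50)]


def best_entries(contestants, entries):
    """Mimic SELECT DISTINCT ON (c.id) ... ORDER BY c.id, e.score, then ORDER BY score."""
    names = dict(contestants)
    # Inner JOIN: only contestants with at least one entry appear (Mary drops out).
    joined = [(cid, names[cid], eid, score)
              for eid, cid, score in entries if cid in names]
    # ORDER BY c.id, e.score: each contestant's lowest score sorts first.
    joined.sort(key=lambda row: (row[0], row[3]))
    # DISTINCT ON (c.id): keep only the first row of each contestant's group.
    seen = set()
    best = []
    for cid, name, eid, score in joined:
        if cid not in seen:
            seen.add(cid)
            best.append((name, eid, score))
    # Outer query: ORDER BY score.
    best.sort(key=lambda row: row[2])
    return best


if __name__ == "__main__":
    for row in best_entries(CONTESTANTS, ENTRIES):
        print(row)  # ('Fred', 5, 19), ('Irving', 2, 22), ('Grizelda', 4, 123)
```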

The easiest way to do this is with a ranking (window) function such as row_number():
select name, id as entry_id, score
from (select e.*, c.name,
             row_number() over (partition by e.contestant_id order by e.score) as seqnum
      from entries e join
           contestants c
           on e.contestant_id = c.id
     ) ec
where seqnum = 1
order by score
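One way to try the row_number() version without a PostgreSQL server is Python's built-in sqlite3 module. This is a sketch that assumes the bundled SQLite is 3.25 or newer (the first release with window functions); table and column names follow the question, and the helper name is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE contestants (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE entries (id INTEGER PRIMARY KEY, contestant_id INTEGER, score INTEGER);
INSERT INTO contestants VALUES (1, 'Fred'), (2, 'Mary'), (3, 'Irving'), (4, 'Grizelda');
INSERT INTO entries VALUES (1, 3, 100), (2, 3, 22), (3, 1, 888),
                           (4, 4, 123), (5, 1, 19), (6, 3, 50);
"""

# Number each contestant's entries by ascending score and keep number 1.
QUERY = """
SELECT name, id AS entry_id, score
FROM (SELECT e.*, c.name,
             row_number() OVER (PARTITION BY e.contestant_id ORDER BY e.score) AS seqnum
      FROM entries e
      JOIN contestants c ON e.contestant_id = c.id) ec
WHERE seqnum = 1
ORDER BY score
"""


def best_entries_row_number():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(best_entries_row_number())  # [('Fred', 5, 19), ('Irving', 2, 22), ('Grizelda', 4, 123)]
```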

I'm not familiar with PostgreSQL, but something along these lines should work:
SELECT c.*, s.Score
FROM Contestants c
JOIN (SELECT MIN(Score) Score, Contestant_Id FROM Entries GROUP BY Contestant_Id) s
ON c.Id=s.Contestant_Id
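This query returns each contestant's best score but not which entry produced it. One way to recover Entry_Id is to join the aggregated rows back to entries on both contestant and score. That is a variation on the answer, sketched here in SQLite via Python's sqlite3 with a hypothetical helper name. If a contestant has two entries with the same best score, this returns both.

```python
import sqlite3

SETUP = """
CREATE TABLE contestants (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE entries (id INTEGER PRIMARY KEY, contestant_id INTEGER, score INTEGER);
INSERT INTO contestants VALUES (1, 'Fred'), (2, 'Mary'), (3, 'Irving'), (4, 'Grizelda');
INSERT INTO entries VALUES (1, 3, 100), (2, 3, 22), (3, 1, 888),
                           (4, 4, 123), (5, 1, 19), (6, 3, 50);
"""

QUERY = """
SELECT c.name, e.id AS entry_id, s.score
FROM contestants c
JOIN (SELECT contestant_id, MIN(score) AS score
      FROM entries
      GROUP BY contestant_id) s ON c.id = s.contestant_id
-- Join back to entries to find the entry (or entries, on a tie) with that score.
JOIN entries e ON e.contestant_id = s.contestant_id AND e.score = s.score
ORDER BY s.score
"""


def best_scores_with_entry_ids():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(best_scores_with_entry_ids())
```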

One solution is:
select min(e.score),c.name,c.id from entries e
inner join contestants c on e.contestant_id = c.id
group by e.contestant_id,c.name,c.id
Here is an example:
http://sqlfiddle.com/#!3/9e307/27

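This simple query should do the trick:
SELECT contestants.name AS name, MIN(entries.score) AS score
FROM entries
JOIN contestants ON contestants.id = entries.contestant_id
GROUP BY contestants.id, contestants.name
ORDER BY score
This grabs the minimum score for each contestant and orders the results in ascending order. Note that entries.id cannot be added to the select list here: it is neither grouped nor aggregated, so PostgreSQL rejects the query, and databases that do accept it (such as MySQL with ONLY_FULL_GROUP_BY disabled) may return any entry id from the group, not necessarily the one with the minimum score.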

Related

How to sum up max values from another table with some filtering

I have 3 tables
User Table
id  Name
1   Mike
2   Sam

Score Table
id  UserId  CourseId  Score
1   1       1         5
2   1       1         10
3   1       2         5

Course Table
id  Name
1   Course 1
2   Course 2
What I'm trying to return is one row per user, showing the user id and user name along with the sum of that user's maximum score per course.
In the example tables, the output I'd like to see is:
Result
User_Id  User_Name  Total_Score
1        Mike       15
2        Sam        0
The SQL I've tried so far is:
select TOP(3) u.Id as User_Id, u.UserName as User_Name, SUM(maxScores) as Total_Score
from Users as u,
(select MAX(s.Score) as maxScores
from Scores as s
inner join Courses as c
on s.CourseId = c.Id
group by s.UserId, c.Id
) x
group by u.Id, u.UserName
I want to use a HAVING clause to link the Users to Scores after the GROUP BY in the sub-query, but I get an exception saying:
The multi-part identifier "u.Id" could not be bound
It works if I hard-code a user id in the HAVING clause I want to add, but it needs to be dynamic and I'm stuck on how to do this.
What would be the correct way to structure the query?
You were close; you just needed to return s.UserId from the sub-query and join the sub-query to your Users table. Note the scope of aliases: aliases defined inside your sub-query are not available in your outer query. Because Sam has no scores at all, start from Users and LEFT JOIN the per-course maximums, then COALESCE the sum to 0 so that he still appears with a total of 0:
select u.Id as [User_Id], u.UserName as [User_Name]
     , coalesce(sum(x.maxScore), 0) as Total_Score
from Users as u
left join (
    select s.UserId, max(s.Score) as maxScore
    from Scores as s
    inner join Courses as c on s.CourseId = c.Id
    group by s.UserId, c.Id
) as x on x.UserId = u.Id
group by u.Id, u.UserName;
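To check the expected output (Mike 15, Sam 0), the sketch below runs a LEFT JOIN form of the query against the question's sample data in SQLite via Python's sqlite3. SQLite is only a stand-in for SQL Server here: the [ ] identifier brackets are dropped, the Users table is assumed to have a UserName column as in the question's own query, and the helper name is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE Users (Id INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Courses (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Scores (Id INTEGER PRIMARY KEY, UserId INTEGER, CourseId INTEGER, Score INTEGER);
INSERT INTO Users VALUES (1, 'Mike'), (2, 'Sam');
INSERT INTO Courses VALUES (1, 'Course 1'), (2, 'Course 2');
INSERT INTO Scores VALUES (1, 1, 1, 5), (2, 1, 1, 10), (3, 1, 2, 5);
"""

QUERY = """
SELECT u.Id AS User_Id, u.UserName AS User_Name,
       COALESCE(SUM(x.maxScore), 0) AS Total_Score  -- users without scores get 0
FROM Users AS u
LEFT JOIN (SELECT s.UserId, MAX(s.Score) AS maxScore  -- best score per user and course
           FROM Scores AS s
           INNER JOIN Courses AS c ON s.CourseId = c.Id
           GROUP BY s.UserId, c.Id) AS x ON x.UserId = u.Id
GROUP BY u.Id, u.UserName
ORDER BY u.Id
"""


def total_scores():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_scores())  # [(1, 'Mike', 15), (2, 'Sam', 0)]
```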

How to group results by count of relationships

Given tables, Profiles, and Memberships where a profile has many memberships, how do I query profiles based on the number of memberships?
For example, I want to get the number of profiles with 2 memberships. I can get the number of memberships for each profile with:
SELECT "memberships"."profile_id", COUNT("profiles"."id") AS "membership_count"
FROM "profiles"
INNER JOIN "memberships" on "profiles"."id" = "memberships"."profile_id"
GROUP BY "memberships"."profile_id"
That returns results like
profile_id | membership_count
_____________________________
1 2
2 5
3 2
...
But how do I group and sum the counts to get the query to return results like:
n | profiles_with_n_memberships
_____________________________
1 36
2 28
3 29
...
Or even just a query for a single value of n that would return
profiles_with_2_memberships
___________________________
28
I don't have your sample data, so I recreated the scenario with a single table, Table1, that stands in for memberships (one row per membership, with a profile_id column).
You can LEFT JOIN the counts to generate_series() to get zeros for values of n that no profile has. If you don't want the zeros, just use the second query.
Query1
WITH c
AS (
SELECT profile_id
,count(*) ct
FROM Table1
GROUP BY profile_id
)
,m
AS (
SELECT MAX(ct) AS max_ct
FROM c
)
SELECT n
,COUNT(c.profile_id)
FROM m
CROSS JOIN generate_series(1, m.max_ct) AS i(n)
LEFT JOIN c ON c.ct = i.n
GROUP BY n
ORDER BY n;
Query2
WITH c
AS (
SELECT profile_id
,count(*) ct
FROM Table1
GROUP BY profile_id
)
SELECT ct
,COUNT(*)
FROM c
GROUP BY ct
ORDER BY ct;
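Here is a small sketch of both queries, plus the single-value case, against data that reproduces the per-profile counts from the question's sample output (profile 1 has 2 memberships, profile 2 has 5, profile 3 has 2). It uses SQLite via Python's sqlite3, and because generate_series() is not guaranteed to be available in SQLite, the zero-filling from Query 1 is done in Python with range(). The memberships table layout and the helper name are assumptions.

```python
import sqlite3

# Like Query 2: number of profiles for each membership count (no zero rows).
COUNTS_QUERY = """
WITH c AS (SELECT profile_id, COUNT(*) AS ct FROM memberships GROUP BY profile_id)
SELECT ct, COUNT(*) FROM c GROUP BY ct ORDER BY ct
"""

# Single value of n: profiles with exactly n memberships.
SINGLE_N_QUERY = """
WITH c AS (SELECT profile_id, COUNT(*) AS ct FROM memberships GROUP BY profile_id)
SELECT COUNT(*) FROM c WHERE ct = ?
"""


def membership_histogram(per_profile, n):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE memberships (id INTEGER PRIMARY KEY, profile_id INTEGER)")
        # One row per membership, e.g. profile 2 with 5 memberships -> 5 rows.
        for profile_id, count in per_profile.items():
            conn.executemany("INSERT INTO memberships (profile_id) VALUES (?)",
                             [(profile_id,)] * count)
        counts = dict(conn.execute(COUNTS_QUERY).fetchall())
        (with_n,) = conn.execute(SINGLE_N_QUERY, (n,)).fetchone()
    finally:
        conn.close()
    # Like Query 1: fill in zeros for every n from 1 up to the largest count.
    filled = {k: counts.get(k, 0) for k in range(1, max(counts, default=0) + 1)}
    return filled, with_n


if __name__ == "__main__":
    print(membership_histogram({1: 2, 2: 5, 3: 2}, n=2))
```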

how to query the value that has changed +/- 10% of the value from first encounter of each patient in sql server?

I have this query that finds users with multiple hospital visits.
The table has about 593 columns, so I don't think I can show you the structure. But let's assume it is a basic patients table with the following columns:
id, sex, studyDate, referringPhysician, bmi, bsa, height, weight, bloodPressure, heartRate. These are also in the real table.
A patient visits the hospital and has some work done. What we would like to find is how much the patient's BMI has changed since the first encounter. For example:
Row | ID       | SEX | StudyDate  | Physician | BMI | BSA  | Ht | Wt  | BP     | HR
1   | PatientA | M   | 2017-09-11 | Dr. Hale  | 60  | 2.03 | 6  | 282 | 116/82 | 77
2   | PatientA | M   | 2017-12-11 | Dr. Hale  | 58  | 2.03 | 6  | 296 | 126/82 | 72
3   | PatientA | M   | 2018-03-17 | Dr. Hale  | 50  | 2.03 | 6  | 282 | 126/82 | 72
In the example above, row 1 was the first encounter and the BMI was 60. In row 2, the bmi decreased to 58, but it's not more than 10%. So, that shouldn't be displayed. However, row 3 has bmi 50 which is decreased by more than 10% of bmi in row 1. That should be displayed.
I'm sorry, I don't have the data that I can share.
with G as(
select * from Patients P
inner join (
select count(*) as counts, ID as oeID
from Patients
group by ID
Having count(*) > 2
) oe on P.ID = oe.oeID where P.BMI > 30
)
select * from G
order by StudyDate asc;
From this, what I'd like to do is find out patients whose BMI has changed by 10% from the first encounter.
How can I do this?
Can you also help me understand the concept of for-each users in SQL, and how it handles such queries?
Guessing at your data model here... I suspect you've got a heavily denormalized structure with everything crammed into one table. You would be far better off having a patient table separate from the table that stores their visits. The WITH G syntax is also unnecessary here, especially if you are just doing a select * from it afterwards. Heh, I'm trying to get into medical analytics, so I'll give this a try.
I'll build this as I see your data model... you may have to change a step here and there to fit your column names. Let's start by getting the first and most recent (last) visit dates by id:
select id, min(StudyDate) as first_visit, max(studydate) as last_visit
from patients
group by id
having min(StudyDate) <> max(StudyDate)
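This is a simple query so far, and the HAVING clause ensures that the first and last visits are on different dates. But we still lack the BMI values for these visits, so we have to join back to the patients table to grab them. We will include a WHERE clause to keep only large changes. Note that the condition below tests for a change of at least 10 BMI points in either direction; to test for a change of at least 10% of the first-visit BMI instead, use abs(b.bmi - c.bmi) >= 0.1 * b.bmi.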
select a.id, a.first_visit, a.last_visit, b.bmi as first_bmi, c.bmi as last_bmi, b.bmi - c.bmi as bmi_change
from
(select id, min(StudyDate) as first_visit, max(StudyDate) as last_visit
from patients
group by id
having min(StudyDate) <> max(StudyDate)) a
inner join patients b on b.id = a.id and b.StudyDate = a.first_visit
inner join patients c on c.id = a.id and c.StudyDate = a.last_visit
where b.bmi - c.bmi >= 10 or b.bmi - c.bmi <= -10
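A sketch of the Part 1 query using the relative test (a change of at least 10% of the first-visit BMI) instead of 10 points, run in SQLite via Python's sqlite3 against the three PatientA rows from the question. PatientB is a made-up second patient whose BMI barely changes, included only to show that such patients are filtered out; the three-column patients table is a simplification of the real one, and the helper name is hypothetical.

```python
import sqlite3

# Part 1 with a relative test: change of at least 10% of the first-visit BMI.
QUERY = """
SELECT a.id, a.first_visit, a.last_visit,
       b.bmi AS first_bmi, c.bmi AS last_bmi, b.bmi - c.bmi AS bmi_change
FROM (SELECT id, MIN(StudyDate) AS first_visit, MAX(StudyDate) AS last_visit
      FROM patients
      GROUP BY id
      HAVING MIN(StudyDate) <> MAX(StudyDate)) a
JOIN patients b ON b.id = a.id AND b.StudyDate = a.first_visit
JOIN patients c ON c.id = a.id AND c.StudyDate = a.last_visit
WHERE ABS(b.bmi - c.bmi) >= 0.1 * b.bmi
ORDER BY a.id
"""

VISITS = [
    ("PatientA", "2017-09-11", 60),  # rows 1-3 from the question
    ("PatientA", "2017-12-11", 58),
    ("PatientA", "2018-03-17", 50),
    ("PatientB", "2017-10-02", 30),  # made-up patient: 30 -> 29 is under 10%
    ("PatientB", "2018-01-15", 29),
]


def first_vs_last(visits):
    conn = sqlite3.connect(":memory:")
    try:
        # ISO date strings compare chronologically, so MIN/MAX work on them.
        conn.execute("CREATE TABLE patients (id TEXT, StudyDate TEXT, bmi REAL)")
        conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", visits)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(first_vs_last(VISITS))
```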
Hopefully that makes sense. You'll want to change the top select line to grab all the fields you actually want to return; I'm just returning the ones relevant to your question.
Part 2:
Let's approach this from a similar angle:
select id, min(StudyDate) as first_visit
from patients
group by id
Now we've got the first visit date. Let's join back to patients and get the BMI at that visit:
select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.StudyDate and a.id = b.id
This will simply be a list of each patient by ID, giving us their first_visit date and their BMI on that first visit. Now we want to compare this BMI to all subsequent visits, so let's join all of the patient's rows back to this query. Subquery a below is simply the query above in parentheses:
select a.id, a.first_visit, b.StudyDate, a.bmi as first_bmi, b.bmi as visit_bmi, a.bmi - b.bmi as bmi_change
from
(select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.StudyDate and a.id = b.id) a
inner join patients b on a.id = b.id
where a.bmi - b.bmi >= 10 or a.bmi - b.bmi <= -10
It's the same idea, but instead of joining on the last visit date to get the most recent visit, we join to all records for that patient and do the math from there. Giving the two bmi columns distinct names also lets this query be used as a subquery later. In the example from the comments, this gives rows 3, 5, and 6.
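The Part 2 logic (compare every visit with the patient's first visit) can be sketched in plain Python as a cross-check. This keeps the answer's absolute threshold of 10 BMI points as a parameter; the function name and the (id, StudyDate, bmi) tuple layout are assumptions, and ISO-formatted date strings are assumed so that they sort chronologically.

```python
from collections import defaultdict


def visits_changed_from_first(visits, threshold=10):
    """Compare every visit with the patient's first visit (Part 2).

    visits: iterable of (patient_id, study_date, bmi) with ISO date strings.
    Returns (id, first_visit, study_date, first_bmi, visit_bmi, bmi_change)
    for visits whose BMI differs from the first visit by at least `threshold`.
    """
    by_patient = defaultdict(list)
    for pid, study_date, bmi in visits:
        by_patient[pid].append((study_date, bmi))
    flagged = []
    for pid in sorted(by_patient):
        rows = sorted(by_patient[pid])  # chronological order
        first_date, first_bmi = rows[0]
        for study_date, bmi in rows:
            change = first_bmi - bmi
            # Same as "change >= 10 or change <= -10" in the SQL.
            if abs(change) >= threshold:
                flagged.append((pid, first_date, study_date, first_bmi, bmi, change))
    return flagged


if __name__ == "__main__":
    sample = [("PatientA", "2017-09-11", 60), ("PatientA", "2017-12-11", 58),
              ("PatientA", "2018-03-17", 50)]
    print(visits_changed_from_first(sample))
```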
Part 3
This is a little more complex. Getting rows 3, 4, 5, 6 when row 4 shows a BMI change of less than 10 means you are now trying to pick out the first date on which a change of 10 is seen and display all records from that date on. Let's call the query in Part 2 "subquery a" and write pseudocode for a moment:
Select id, min(StudyDate)
from (subquerya) a
group by id
(subquerya) simply stands for the entire query at the end of Part 2. This grabs the study date of the first time a BMI change of at least 10 is detected for each patient id (in the example from the comments, it would be visit 3). Now we can join back to patients, this time getting all records that are on or after that date:
select a.id, b.studydate, b.bmi
from
(Select id, min(StudyDate) as min_studydate
from (subquerya) a
group by id) a
inner join patients b on a.id = b.id and a.min_studydate <= b.studydate
This brings back all study dates on or after the first time a BMI change of at least 10 was detected (rows 3, 4, 5, 6 from the example in the comments). Of course, we've now lost the first visit's BMI value, so let's add that back in and bring the whole query together:
select a.id, b.StudyDate, b.bmi, c.bmi as start_bmi, c.bmi - b.bmi as bmi_change
from
(select id, min(StudyDate) as min_studydate
from (select a.id, a.first_visit, b.StudyDate, a.bmi as first_bmi, b.bmi as visit_bmi, a.bmi - b.bmi as bmi_change
from
(select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.StudyDate and a.id = b.id) a
inner join patients b on a.id = b.id
where a.bmi - b.bmi >= 10 or a.bmi - b.bmi <= -10) d
group by id) a
inner join patients b on a.id = b.id and a.min_studydate <= b.StudyDate
inner join (select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.StudyDate and a.id = b.id) c on c.id = a.id
order by a.id, b.StudyDate
If I have everything right, this should bring back rows 3,4,5,6 and the change in BMI across each visit. I've left a few more columns in there than need be and it could be cleaned a little, but all logic should be there. I don't have
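A plain-Python sketch of the Part 3 logic: find the first visit where the BMI differs from the first visit by at least the threshold (10 points, as in the SQL), then return every visit from that date on. The visits after 2018-03-17 are made-up values chosen so that row 4 is a small change between larger ones, mirroring the rows 3, 4, 5, 6 case described above; the function name is hypothetical.

```python
from collections import defaultdict


def visits_from_first_big_change(visits, threshold=10):
    """Part 3: every visit on or after the first one whose BMI differs from
    the first visit's BMI by at least `threshold`.

    Returns (id, study_date, bmi, start_bmi, bmi_change) tuples.
    """
    by_patient = defaultdict(list)
    for pid, study_date, bmi in visits:
        by_patient[pid].append((study_date, bmi))
    result = []
    for pid in sorted(by_patient):
        rows = sorted(by_patient[pid])  # chronological (ISO dates)
        start_bmi = rows[0][1]
        # Dates where the change from the first visit reaches the threshold.
        big_change_dates = [d for d, bmi in rows if abs(start_bmi - bmi) >= threshold]
        if not big_change_dates:
            continue  # this patient never changed enough
        first_big = min(big_change_dates)
        for study_date, bmi in rows:
            if study_date >= first_big:
                result.append((pid, study_date, bmi, start_bmi, start_bmi - bmi))
    return result


VISITS = [
    ("PatientA", "2017-09-11", 60),  # rows 1-3 from the question
    ("PatientA", "2017-12-11", 58),
    ("PatientA", "2018-03-17", 50),
    ("PatientA", "2018-06-20", 55),  # rows 4-6: made-up values
    ("PatientA", "2018-09-15", 49),
    ("PatientA", "2018-12-10", 48),
]

if __name__ == "__main__":
    for row in visits_from_first_big_change(VISITS):
        print(row)
```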

How to select the highest value after a count() | Sql Oracle

This is my query:
SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name
Which gives me this table:
NAME NUM_BOOKS
-------------------------------------------------- ----------
Dyremann 2
Nam mann 1
Thomas 1
Asgeir 1
Tullemann 5
Plantemann 1
Beste forfatter 1
Fagmann 5
Lars 1
Hans 1
Svein Arne 1
How could I easily alter the query to only display the author with the highest number of released books? (Keep in mind I'm rather new to SQL.)
Oracle (and, as far as I know, only Oracle) allows you to nest two aggregate functions:
SELECT max (f.name) keep (dense_rank last order by count (*)) as name
from author f
JOIN book b on b.tittle = f.book
Group by f.name
In order to get ALL top authors:
select name
from (SELECT f.name,rank () over (order by count(*) desc) as rnk
from author f
JOIN book b on b.tittle = f.book
Group by f.name
)
where rnk = 1
Since Oracle 12c:
SELECT f.name
from author f
JOIN book b on b.tittle = f.book
Group by f.name
order by count (*) desc
fetch first row only /* or "fetch first row with ties" to get all top authors */
The simplest way is to use:
SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name
Order by num_books DESC
FETCH FIRST ROW ONLY
This will order the results from biggest to smallest and return the first result.
1) Oracle-specific (using ROWNUM; for PostgreSQL/MySQL use LIMIT):
select * from
(SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name order by num_books desc )
where ROWNUM = 1
2) A general query for all databases:
select f.name,count(*) as max_num_books from author f
JOIN book b on b.tittle = f.book
Group by f.name
having count(*) =
(select max(num_books)
from
(SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name) t
);
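The general query can be tried in SQLite via Python's sqlite3. The sketch loads a subset of the authors from the question with their book counts (Tullemann 5, Fagmann 5, Dyremann 2, Thomas 1) and made-up book titles, keeping the question's schema (author.book joined to book.tittle); it shows that both authors tied for the top spot come back. The helper name is hypothetical.

```python
import sqlite3

GENERAL_QUERY = """
SELECT f.name, COUNT(*) AS max_num_books
FROM author f
JOIN book b ON b.tittle = f.book
GROUP BY f.name
HAVING COUNT(*) = (SELECT MAX(num_books)          -- the top book count
                   FROM (SELECT f.name, COUNT(*) AS num_books
                         FROM author f
                         JOIN book b ON b.tittle = f.book
                         GROUP BY f.name) t)
ORDER BY f.name
"""

BOOK_COUNTS = {"Tullemann": 5, "Fagmann": 5, "Dyremann": 2, "Thomas": 1}


def top_authors(book_counts):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE book (tittle TEXT PRIMARY KEY)")
        conn.execute("CREATE TABLE author (name TEXT, book TEXT)")
        for name, count in book_counts.items():
            for i in range(1, count + 1):
                title = f"{name} book {i}"  # made-up title
                conn.execute("INSERT INTO book VALUES (?)", (title,))
                conn.execute("INSERT INTO author VALUES (?, ?)", (name, title))
        return conn.execute(GENERAL_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_authors(BOOK_COUNTS))
```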
I am not sure why you need a join in the first place. It appears that the author table has a column book, so why is it not enough to count(book) from that table, grouping by name? This arrangement is very strange: the author table should only have author properties, and the author name should be in the book table. But you join on author.book = book.tittle, which suggests that you do, in fact, have that strange arrangement (and therefore you don't need a join). Also, having a table and a column (in another table) share the same name, book, is a practice best avoided.
In this case, the most elementary solution (though not the most efficient) is:
select name, count(book) as max_num_books
from author
group by name
having count(book) = (select max(count(book)) from author group by name);
The subquery groups by name, and then it selects the max over all group counts. The outer query selects the names that have a book count equal to this maximum. The subquery returns a single row in a single column - a single value. Such a query is called a "scalar" subquery and can be used wherever a single value is needed, such as the HAVING clause of the outer query. (It's in the HAVING clause and not a WHERE clause, since it refers to group properties - count(book) - and not to individual row properties).
The more efficient solution is as Dudu showed:
select name, ct as max_num_books
from ( select name, count(*) as ct, rank() over (order by count(*) desc) rnk
from author
group by name
)
where rnk = 1;

Join/merge 3 tables with count

I have 3 tables:
people:
pid name
1 Cal
2 Example
3 Another
4 Person
talkingPoints:
tid pid talkingPoint
1 1 "..."
2 1 "..."
3 2 "..."
facts:
fid pid fact
1 3 "..."
2 2 "..."
And I'm trying to combine a count of talkingPoints and facts to 'people', for example:
pid name talkingPoints facts
1 Cal 2 null
2 Example 1 1
3 Another null 1
4 Person null null
(ordered by talkingPoints desc, then alphabetical, including 'people' rows which do not have any values for the counts)
I managed to combine 'people' with only one other table:
SELECT a.pid,a.name,
count(b.tid)
FROM people a, talkingPoints b
WHERE a.pid=b.pid
GROUP BY b.pid;
but that query ignores rows with a zero count (e.g. the row 'Person')
I hacked up this query which works correctly for 'talkingPoints' but I have not been able to adapt it to also combine 'facts' like my example table above.
select people.pid, people.name, x.talkingPoints from people left join
(select pid, name, count(name) talkingPoints from
(select people.pid, people.name from people
join talkingPoints on talkingPoints.pid = people.pid)
as talkingPoints group by talkingPoints.pid)
as x on x.pid = people.pid order by talkingPoints desc, people.name asc;
(probably a terrible way but it worked in the meantime)
How can I adapt my queries so they will output a table like my example?
SELECT a.pid,
a.name,
COUNT(DISTINCT b.tid) talkingpoints,
COUNT(DISTINCT c.fid) facts
FROM people a
LEFT JOIN talkingPoints b
ON a.pid = b.pid
LEFT JOIN facts c
ON a.pid = c.pid
GROUP BY a.pid, a.name
ORDER BY talkingpoints DESC, a.name
Using independent correlated subqueries in the SELECT clause avoids the duplicates caused by the joins:
SELECT pid,
name,
(SELECT COUNT(*)
FROM talkingPoints
WHERE pid = people.pid) AS talkingPoints,
(SELECT COUNT(*)
FROM facts
WHERE pid = people.pid) AS facts
FROM people
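A sketch that runs both approaches against the question's sample data in SQLite via Python's sqlite3 and checks that they agree. The count columns are renamed n_talking_points and n_facts to avoid clashing with the talkingPoints table name, both queries use the ordering the question asked for (talking points descending, then name), and the helper name is hypothetical. Both return 0 rather than null for missing counts; wrap a count in NULLIF(..., 0) if you really need null.

```python
import sqlite3

SETUP = """
CREATE TABLE people (pid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE talkingPoints (tid INTEGER PRIMARY KEY, pid INTEGER, talkingPoint TEXT);
CREATE TABLE facts (fid INTEGER PRIMARY KEY, pid INTEGER, fact TEXT);
INSERT INTO people VALUES (1, 'Cal'), (2, 'Example'), (3, 'Another'), (4, 'Person');
INSERT INTO talkingPoints VALUES (1, 1, '...'), (2, 1, '...'), (3, 2, '...');
INSERT INTO facts VALUES (1, 3, '...'), (2, 2, '...');
"""

# LEFT JOINs keep people with no rows; DISTINCT undoes the join fan-out.
JOIN_QUERY = """
SELECT a.pid, a.name,
       COUNT(DISTINCT b.tid) AS n_talking_points,
       COUNT(DISTINCT c.fid) AS n_facts
FROM people a
LEFT JOIN talkingPoints b ON a.pid = b.pid
LEFT JOIN facts c ON a.pid = c.pid
GROUP BY a.pid, a.name
ORDER BY n_talking_points DESC, a.name
"""

# Each correlated subquery counts independently, so there is no fan-out.
SUBQUERY_QUERY = """
SELECT pid, name,
       (SELECT COUNT(*) FROM talkingPoints t WHERE t.pid = people.pid) AS n_talking_points,
       (SELECT COUNT(*) FROM facts f WHERE f.pid = people.pid) AS n_facts
FROM people
ORDER BY n_talking_points DESC, name
"""


def run_both():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return (conn.execute(JOIN_QUERY).fetchall(),
                conn.execute(SUBQUERY_QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    joined, correlated = run_both()
    print(joined)
    print(correlated)
```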