Grouping by country and discipline in SQLServer - sql

I've got a problem I cannot solve in SQL Server. There are 3 tables with data about the Olympics.
Table 1: Dysciplines - contains DisciplineID (int, PK) and Discipline (varchar)
Table 2: Athletes - contains AthleteID (int, FK), Athlete (varchar, PK), Nationality (varchar) and DisciplineID (int, FK)
Table 3: Medals - contains AthleteID (int, PK), Year (int) and Medals (int)
I want to extract all the countries that won more medals in their best discipline than in all the others combined. However, I am having a problem with it.
Obviously I joined all the tables, but I'm not sure how to continue. I tried:
WHERE MAX(SUM(dbo.Medals.Medals))>SUM(dbo.Medals.Medals)-MAX(SUM(dbo.Medals.Medals))
GROUP BY tab1.Dyscypline
But this is clearly wrong. I will be grateful for any help.

You would use aggregation and having. Start with the number of medals in each discipline in each country. Because each row of Medals stores a medal count in its Medals column, sum that column:
select a.nationality, a.DisciplineID, sum(m.Medals) as num_medals
from athletes a join
medals m
on a.AthleteID = m.AthleteID
group by a.nationality, a.DisciplineID;
Then aggregate again:
select nationality
from (select a.nationality, a.DisciplineID, sum(m.Medals) as num_medals
from athletes a join
medals m
on a.AthleteID = m.AthleteID
group by a.nationality, a.DisciplineID
) am
group by nationality
having max(num_medals) > sum(num_medals) * 0.5;
That is, the maximum number of medals for a discipline accounts for more than half of the country's medals, which is the same as beating all the other disciplines combined.
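To check the two-level aggregation on a small case, here is a minimal sketch run through Python's built-in sqlite3 module rather than SQL Server. The sample rows are made up for illustration, and it assumes the Medals column holds a per-row medal count, so it sums that column.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes (AthleteID INTEGER, Athlete TEXT, Nationality TEXT, DisciplineID INTEGER);
CREATE TABLE Medals (AthleteID INTEGER, Year INTEGER, Medals INTEGER);
INSERT INTO Athletes VALUES
  (1, 'A', 'Poland', 10),
  (2, 'B', 'Poland', 20),
  (3, 'C', 'Norway', 10),
  (4, 'D', 'Norway', 20);
INSERT INTO Medals VALUES
  (1, 2008, 5),  -- Poland, discipline 10
  (2, 2008, 2),  -- Poland, discipline 20
  (3, 2008, 3),  -- Norway, discipline 10
  (4, 2012, 3);  -- Norway, discipline 20
""")

query = """
SELECT nationality
FROM (SELECT a.Nationality AS nationality, a.DisciplineID, SUM(m.Medals) AS num_medals
      FROM Athletes a JOIN Medals m ON a.AthleteID = m.AthleteID
      GROUP BY a.Nationality, a.DisciplineID) am
GROUP BY nationality
HAVING MAX(num_medals) > SUM(num_medals) * 0.5
"""
# Poland: best discipline 5 of 7 medals, so it qualifies.
# Norway: best discipline 3 of 6 medals, exactly half, so it does not.
countries = [row[0] for row in conn.execute(query)]
print(countries)
```

Note that the strict > means a country whose best discipline merely ties the rest combined is left out, which matches "more medals than all the others combined".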

Related

Get rows from primary table, and also a count of how many times that record appears in a secondary table (including 0's)

I have a Dogs table, a Kennels table and Visits table that contains DogId and KennelId columns.
I am trying to get a full list of all the dogs, with a column showing the number of visits to a particular kennel, so many of the results will contain a 0 as the visit count.
This is what I've tried:
select dog.*, visits.visitCount FROM
(select * from Dogs) as dog,
(select COUNT (Visits.Id) as visitCount from Visits INNER JOIN Dogs ON Dogs.Id =
Visits.DogId where KennelId = 'E15A8C60-E0FE-472D-9CC4-08DA251A992F') as visits
With this statement, I end up with all of the dogs, but with the same visit count for all, which is incorrect. I assume my count function is simply executed once with the result repeated for the remaining rows. I do not know how to correct this. Any help will be much appreciated!
With no table schemas or sample data, a guess would be something like the following:
select d.*, Coalesce(v.VisitCount,0) VisitCount
from dogs d
left join (
select DogId, Count(*) VisitCount
from visits v
where v.KennelId = 'E15A8C60-E0FE-472D-9CC4-08DA251A992F'
group by DogId
)v on v.DogId = d.Id;
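A small runnable sketch of the same pattern, using Python's sqlite3 module with made-up rows. The Name column and the short kennel ids 'K1' and 'K2' are assumptions for illustration; the join uses Dogs.Id as in the question.

```python
import sqlite3

# Hypothetical sample data: three dogs, only one of which visited kennel 'K1'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dogs (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Visits (Id INTEGER PRIMARY KEY, DogId INTEGER, KennelId TEXT);
INSERT INTO Dogs VALUES (1, 'Rex'), (2, 'Fido'), (3, 'Luna');
INSERT INTO Visits VALUES (1, 1, 'K1'), (2, 1, 'K1'), (3, 2, 'K2');
""")

# Count visits per dog for one kennel in a derived table, then LEFT JOIN so
# dogs without a matching row are kept; COALESCE turns their NULL into 0.
rows = conn.execute("""
SELECT d.Id, d.Name, COALESCE(v.VisitCount, 0) AS VisitCount
FROM Dogs d
LEFT JOIN (SELECT DogId, COUNT(*) AS VisitCount
           FROM Visits
           WHERE KennelId = ?
           GROUP BY DogId) v ON v.DogId = d.Id
ORDER BY d.Id
""", ("K1",)).fetchall()
print(rows)
```

The kennel filter sits inside the derived table on purpose: putting it in an outer WHERE would discard the dogs that have no visits to that kennel.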

find an average of a column using group with inner join and then filtering through the groups

I've been trying to solve an sqlite question where I have two tables: Movies and movie_cast.
Movies has the columns: id, movie_title, and score. Here is a sample of the data:
11|Star Wars|76.496
62|2001:Space Odyssey|39.064
152|Start Trek|26.551
movie_cast has the columns: movie_id, cast_id, cast_name, birthday, popularity. Here is a sample.
11|2|Mark Hamill|9/25/51|15.015
11|3|Harrison Ford|10/21/56|8.905
11|5|Peter Cushing|05/26/13|6.35
In this case movies.id and movie_cast.movie_id are the same.
The question is to find the top ten cast members who have the highest average movie scores.
▪ Do not include movies with score <25 in the average score calculation.
▪ Exclude cast members who have appeared in two or fewer movies.
My query is as below but it doesn't seem to get me the right answer.
SELECT movie_cast.cast_id,
movie_cast.cast_name,
printf("%.2f",CAST(AVG(movies.score) as float)),
COUNT(movie_cast.cast_name)
FROM movies
INNER JOIN movie_cast ON movies.id = movie_cast.movie_id
WHERE movies.score >= 25
GROUP BY movie_cast.cast_id
HAVING COUNT(movie_cast.cast_name) > 2
ORDER BY AVG(movies.score ) DESC, movie_cast.cast_name ASC
LIMIT 10
The answers I get are in the format cast_id, cast_name, avg score.
An example is: 3 Harrison Ford 52.30
I've analyzed and re-analyzed my logic but to no avail. I'm not sure where I'm going wrong. Any help would be great!
Thank you!
This is how I would write the query:
SELECT mc.cast_id,
mc.cast_name,
PRINTF('%.2f', AVG(m.score)) avg_score
FROM movie_cast mc INNER JOIN movies m
ON m.id = mc.movie_id
WHERE m.score >= 25
GROUP BY mc.cast_id, mc.cast_name
HAVING COUNT(*) > 2
ORDER BY AVG(m.score) DESC, mc.cast_name ASC
LIMIT 10;
I use aliases for the tables to shorten the code and make it more readable.
There is no need to cast the average to a float because the average in SQLite is always a real number.
Both COUNT(movie_cast.cast_name) can be simplified to COUNT(*) but the 1st one in the SELECT list is not needed by your requirement (if it is then add it).
The function PRINTF() returns a string, but if you want a number returned then use ROUND():
ROUND(AVG(m.score), 2) avg_score
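The difference between the two functions can be seen with a quick check from Python's sqlite3 module. This is only a sketch; the literal 52.296 is an arbitrary value chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# PRINTF formats the value as text; ROUND keeps it numeric.
formatted, rounded = conn.execute(
    "SELECT PRINTF('%.2f', 52.296), ROUND(52.296, 2)"
).fetchone()

print(type(formatted).__name__, formatted)  # text, keeps the trailing zero
print(type(rounded).__name__, rounded)      # real number
```

The text version preserves the two-decimal display, while the numeric version is the one to use if the value is compared or sorted as a number later.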

PostgreSQL show latest date and one value of a column from many

I have a problem with a query that I can't figure out. I have tried for some time without success, so any help would be much appreciated. I have 4 tables:
cars - ID, make, model, plate_number, price, type, year, owner_ID
persons - ID, name, surname, pers_code
insurance_data - company_ID, car_ID, first_date, last_date
companies - ID, title
My query so far is..
SELECT cars.plate_number, persons.name, persons.surname, insurance_data.last_date
FROM cars,persons,insurance_data
WHERE cars.owner_ID = persons.ID AND cars.ID = insurance_data.car_ID
This outputs each car's plate number, the owner of the car, and the last date of the car's insurance. But the problem is that there are two cars that have two insurance end dates, so the output contains two entries for the same car, one per end date. What I need is only one entry for each car, with the latest insurance end date.
I know this is pretty basic, but I'm a first-year student of databases, and this is one of my first assignments. Thanks in advance.
(1) Never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
(2) Use table aliases!
The answer to your question is DISTINCT ON:
SELECT DISTINCT ON (c.plate_number) c.plate_number, p.name, p.surname, id.last_date
FROM cars c JOIN
persons p
ON c.owner_ID = p.ID JOIN
insurance_data id
ON c.ID = id.car_ID
ORDER BY c.plate_number, id.last_date DESC;
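DISTINCT ON is specific to PostgreSQL. Since only the latest date is needed here, a portable alternative is GROUP BY with MAX. This is a different technique from the query above, sketched with Python's sqlite3 module (SQLite has no DISTINCT ON). The rows are invented, the tables are trimmed to the columns used, and dates are assumed to be stored as ISO-8601 text so that MAX orders them correctly.

```python
import sqlite3

# Hypothetical sample data: car 1 has two insurance periods, car 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (ID INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE cars (ID INTEGER PRIMARY KEY, plate_number TEXT, owner_ID INTEGER);
CREATE TABLE insurance_data (company_ID INTEGER, car_ID INTEGER, first_date TEXT, last_date TEXT);
INSERT INTO persons VALUES (1, 'Anna', 'Berg'), (2, 'Janis', 'Ozols');
INSERT INTO cars VALUES (1, 'AA-1001', 1), (2, 'BB-2002', 2);
INSERT INTO insurance_data VALUES
  (1, 1, '2013-01-01', '2013-12-31'),
  (2, 1, '2014-01-01', '2014-12-31'),
  (1, 2, '2014-03-01', '2015-02-28');
""")

# One row per car: group by the car and owner columns, keep the latest end date.
rows = conn.execute("""
SELECT c.plate_number, p.name, p.surname, MAX(i.last_date) AS last_date
FROM cars c
JOIN persons p ON c.owner_ID = p.ID
JOIN insurance_data i ON c.ID = i.car_ID
GROUP BY c.plate_number, p.name, p.surname
ORDER BY c.plate_number
""").fetchall()
print(rows)
```

GROUP BY with MAX is enough when only the date is wanted. If other columns from the same insurance row were needed (for example company_ID), DISTINCT ON or a window function would be the better fit.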

SQL Divide one count result by another OR Alternate solution

I have a table with the following schema.
Relational Database Schema:
Hotel = hotelNo, hotelName, city
Room = roomNo, hotelNo(FK), type, rate
Guest = guestNo, guestName, guestAddress
Booking = hotelNo(FK), guestNo(FK), dateFrom, dateTo, roomNo(FK)
There are entries in each table however their data isn't completely relevant to this question.
I need to calculate the average number of bookings made for each hotel, ensuring that I include the hotels which do not currently have bookings.
I have this :
-- Call this select 1
select count(*)
from booking b, hotel h
where b.hotelNo=h.hotelNo;
-- Call this select 2
select count(*)
from hotel;
Select 1 returns the total number of bookings. Select 2 returns the total number of hotels. If I could simply divide the output of count in select 1 by the output of count in select 2 I would have my answer.
If this is possible can someone please help me with the code, otherwise can someone think of an alternate solution to achieve the same result?
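If by "average number of bookings", you just want to divide the two numbers, then you can do the following (multiplying by 1.0 avoids integer division in databases that truncate the quotient of two integers):
select count(b.hotelNo) * 1.0 / count(distinct h.hotelNo)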
from hotel h left join
booking b
on h.hotelNo = b.hotelNo;
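One thing to watch with a count divided by a count is integer division. The sketch below uses Python's sqlite3 module with invented rows and tables trimmed to a few columns; SQLite, like several other engines, truncates the quotient of two integers.

```python
import sqlite3

# Hypothetical sample data: 4 bookings spread over 3 hotels, one hotel with none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel (hotelNo INTEGER PRIMARY KEY, hotelName TEXT);
CREATE TABLE booking (hotelNo INTEGER, guestNo INTEGER);
INSERT INTO hotel VALUES (1, 'Grand'), (2, 'Plaza'), (3, 'Empty Inn');
INSERT INTO booking VALUES (1, 100), (1, 101), (1, 102), (2, 100);
""")

# COUNT(b.hotelNo) skips the NULL row produced for the hotel without bookings,
# while COUNT(DISTINCT h.hotelNo) still counts that hotel.
int_avg, real_avg = conn.execute("""
SELECT COUNT(b.hotelNo) / COUNT(DISTINCT h.hotelNo),
       COUNT(b.hotelNo) * 1.0 / COUNT(DISTINCT h.hotelNo)
FROM hotel h LEFT JOIN booking b ON h.hotelNo = b.hotelNo
""").fetchone()
print(int_avg, real_avg)
```

The first column comes back as a truncated integer, the second as the fractional average, so the * 1.0 (or an explicit cast) matters whenever the engine divides integers this way.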
Create a view, say temp, to get the count of rooms per hotel. (The data you 'see' in a view is not actually stored anywhere; it is generated from the tables on the fly.)
create view temp
as select hotelNo,count(*) as cnt from room group by hotelNo;
Use the following query to fetch the average:
select booking.hotelNo,count(*) / cnt
from booking ,temp
where booking.hotelNo = temp.hotelNo
group by booking.hotelNo;
or
select booking.hotelNo,count(*) / cnt
from booking
INNER JOIN temp
on booking.hotelNo = temp.hotelNo
group by booking.hotelNo;
This will not include hotels that do not have any bookings.

How to 'add' a column to a query result while the query contains aggregate function?

I have a table named 'Attendance' which is used to record student attendance time in courses. This table has 4 columns, say 'id', 'course_id', 'attendance_time', and 'student_name'. An example of few records in this table is:
23 100 1/1/2010 10:00:00 Tom
24 100 1/1/2010 10:20:00 Bob
25 187 1/2/2010 08:01:01 Lisa
.....
I want to create a summary of the latest attendance time for each course. I created a query below:
SELECT course_id, max(attendance_time) FROM attendance GROUP BY course_id
The result would be something like this
100 1/1/2010 10:20:00
187 1/2/2010 08:01:01
Now, all I want to do is add the 'id' column to the result above. How to do it?
I can't just change the command to something like this
SELECT id, course_id, max(attendance_time) FROM attendance GROUP BY id, course_id
because it would return all the records as if the aggregate function is not used. Please help me.
This is a typical 'greatest per group', 'greatest-n-per-group' or 'groupwise maximum' query that comes up on Stack Overflow almost every day. You can search Stack Overflow for these terms to find many different examples of how to solve this with different databases. One way to solve it is as follows:
SELECT
T2.course_id,
T2.attendance_time,
T2.id
FROM (
SELECT
course_id,
MAX(attendance_time) AS attendance_time
FROM attendance
GROUP BY course_id
) T1
JOIN attendance T2
ON T1.course_id = T2.course_id
AND T1.attendance_time = T2.attendance_time
Note that this query can in theory return multiple rows per course_id if there are multiple rows with the same attendance_time. If that cannot happen then you don't need to worry about this issue. If this is a potential problem then you can solve this by adding an extra grouping on course_id, attendance_time and selecting the minimum or maximum id.
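The tie-breaking variant described in the note can be sketched as follows, using Python's sqlite3 module. Row 26 is an invented duplicate of the latest time for course 100, and the times are stored as ISO-8601 text so that MAX compares them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (id INTEGER PRIMARY KEY, course_id INTEGER,
                         attendance_time TEXT, student_name TEXT);
INSERT INTO attendance VALUES
  (23, 100, '2010-01-01 10:00:00', 'Tom'),
  (24, 100, '2010-01-01 10:20:00', 'Bob'),
  (25, 187, '2010-01-02 08:01:01', 'Lisa'),
  (26, 100, '2010-01-01 10:20:00', 'Ann');  -- hypothetical tie with id 24
""")

# Same join as above, plus an outer GROUP BY that collapses ties
# by keeping the largest id for each (course_id, attendance_time).
rows = conn.execute("""
SELECT T2.course_id, T2.attendance_time, MAX(T2.id) AS id
FROM (SELECT course_id, MAX(attendance_time) AS attendance_time
      FROM attendance
      GROUP BY course_id) T1
JOIN attendance T2
  ON T1.course_id = T2.course_id
 AND T1.attendance_time = T2.attendance_time
GROUP BY T2.course_id, T2.attendance_time
ORDER BY T2.course_id
""").fetchall()
print(rows)
```

Swapping MAX(T2.id) for MIN(T2.id) would keep the earliest-inserted row among the ties instead.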
What do you need the additional column for? The result already has a course ID, which identifies the data. Adding a synthetic ID to the query would be useless because it would not refer to anything. If you want to get the max from the query results for a single course, then you can add a where condition like this:
SELECT course_id, max(attendance_time) FROM attendance WHERE course_id = your_id_here GROUP BY course_id;
If you mean that the column should be named 'id', you can alias it in the query:
SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;
You could make a view out of your query to easily access the aggregate data:
CREATE VIEW max_course_times AS SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;
SELECT * FROM max_course_times;
For SQL Server 2008 onwards, I like to use a Common Table Expression to add aggregated columns to queries:
WITH AttendanceTimes (course_id, maxTime)
AS
(
SELECT
course_id,
MAX(attendance_time)
FROM attendance
GROUP BY course_id
)
SELECT
a.course_id,
t.maxTime,
a.id
FROM attendance a
INNER JOIN AttendanceTimes t
ON a.course_id = t.course_id
AND a.attendance_time = t.maxTime;