SQL question about counting

I want to make a query so that I can grab only Locations that have at least 50 Places.
I have a table of Locations:
Id, City, Country
1, Austin, USA
2, Paris, France
And a table of Places connected to Locations by Location_id
Id, Name, Details, Location_id
1, The Zoo, blah, 2
2, Big Park, blah, 2
I can join them like so:
SELECT places.name, places.id, locations.country, locations.city
FROM places
INNER JOIN locations
ON places.location_id = locations.id
But how can I get only the cities that have at least 50 places, ordered by the number of places, largest first?
Thanks!

Use a GROUP BY with a HAVING clause.
SELECT locations.country, locations.city, COUNT(*)
FROM places
INNER JOIN locations ON places.location_id = locations.id
GROUP BY locations.country, locations.city
HAVING COUNT(*) >= 50

OK I've seen that the above answers are almost there but have some mistakes, so just posting the correct version:
SELECT locations.country, locations.city, COUNT(*) as count_of_places
FROM places
INNER JOIN locations ON places.location_id = locations.id
GROUP BY locations.country, locations.city
HAVING COUNT(*) >= 50
ORDER BY count_of_places DESC;
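For anyone who wants to try the GROUP BY / HAVING approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the Lyon location and the helper names (build_demo_db, locations_with_min_places) are made up for illustration; only the table and column names come from the question. It also groups by locations.id, on the assumption that id is the primary key, so two locations with the same city and country would not be merged.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locations (id INTEGER PRIMARY KEY, city TEXT, country TEXT);
        CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, details TEXT,
                             location_id INTEGER);
        INSERT INTO locations VALUES (1, 'Austin', 'USA'),
                                     (2, 'Paris', 'France'),
                                     (3, 'Lyon', 'France');
    """)
    # Hypothetical data: 3 places in Austin, 60 in Paris, 50 in Lyon.
    rows = [(f"Place {loc}-{i}", "blah", loc)
            for loc, n in ((1, 3), (2, 60), (3, 50))
            for i in range(n)]
    conn.executemany(
        "INSERT INTO places (name, details, location_id) VALUES (?, ?, ?)", rows)
    return conn

def locations_with_min_places(conn, min_places):
    """Return (country, city, count) rows, largest count first."""
    sql = """
        SELECT locations.country, locations.city, COUNT(*) AS count_of_places
        FROM places
        INNER JOIN locations ON places.location_id = locations.id
        GROUP BY locations.id, locations.country, locations.city
        HAVING COUNT(*) >= ?
        ORDER BY count_of_places DESC
    """
    return conn.execute(sql, (min_places,)).fetchall()
```

Calling locations_with_min_places(build_demo_db(), 50) keeps Paris and Lyon and drops Austin, with Paris listed first.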

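You can use the HAVING clause to limit these rows by the value of an aggregate column. Also, MySQL allows lazy GROUP BYs (selecting columns that are not in the GROUP BY list) when the ONLY_FULL_GROUP_BY SQL mode is disabled, so you can take advantage of this. Note that p.name and p.details then come from an arbitrary place within each location: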
select
l.country,
l.city,
p.name,
p.details,
count(*) as number_of_places
from
locations l
inner join places p on
l.id = p.location_id
group by
l.id
having
count(*) >= 50
order by
number_of_places desc,
l.country,
l.city,
p.name

Somewhat unperformant, but should work:
SELECT places.name, places.id, sq.country, sq.city, sq.counter
FROM (SELECT Locations.id, country,city, count(*) as counter FROM Locations
JOIN Places ON (Locations.Id=Places.Location_id)
GROUP BY locations.id HAVING count(*) >= 50) AS sq
JOIN Places ON (sq.id=Places.Location_id)
ORDER BY counter DESC
P.S. The exact syntax may vary depending on the database; I'm not sure this is MySQL-compatible as is.

Related

Display courses with at least 10 students

I have the following tables:
Students (id, name, surname, study_year, department_id)
Courses(id, name)
Course_Signup(id, student_id, course_id, year)
I want to display the courses that at least 10 students have signed up for, using only subqueries (no GROUP BY, joins or set operations). This is easy to implement with aggregation and a join:
SELECT c.name, COUNT(csn.course_id)
FROM Course_Signup csn
JOIN Courses c
ON csn.course_id = c.id
GROUP BY c.name
HAVING COUNT(csn.course_id) >= 10
But how would I do this only using subqueries? Is there any other way, other than COUNT, to get the number of courses? Thank you, in advance!
You can use a correlated sub-query to retrieve the name:
SELECT (SELECT c.name FROM Courses c WHERE csn.course_id = c.id) AS name,
COUNT(*)
FROM Course_Signup csn
GROUP BY
course_id
HAVING COUNT(*) >= 10
Note: you should also GROUP BY the primary key that uniquely identifies the course, as there may be two courses with the same name.
If you also don't want to use GROUP BY then:
SELECT name
FROM Courses c
WHERE 10 <= ( SELECT COUNT(*)
FROM Course_Signup csn
WHERE csn.course_id = c.id )
or, to also get the number of sign-ups:
SELECT *
FROM (
SELECT name,
( SELECT COUNT(*)
FROM Course_Signup csn
WHERE csn.course_id = c.id ) AS num_signups
FROM Courses c
)
WHERE num_signups >= 10;
You could do:
SELECT c.name
FROM Courses c
WHERE (
SELECT COUNT(*)
FROM Course_Signup csn
WHERE csn.course_id = c.id
) >= 10
which only uses a subquery and has no group-by, join or set operations.
If you wanted the actual count in the result set then you would need to repeat the subquery in the select list.
You might also need to do COUNT(DISTINCT csn.student_id) if there might be duplicates, particularly if the same student can sign up in multiple years. In that case, though, you might want to restrict the query to a single year anyway.
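To make the COUNT(*) versus COUNT(DISTINCT ...) point concrete, here is a small sketch using Python's sqlite3 module. The data is invented: Algebra has 10 different students, while Biology has the same 5 students signed up in two different years. The helper names and the distinct flag are hypothetical, and the Students table is left out because the query does not need it.

```python
import sqlite3

def build_demo_db():
    """In-memory tables following the question's schema, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Courses (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Course_Signup (id INTEGER PRIMARY KEY, student_id INTEGER,
                                    course_id INTEGER, year INTEGER);
        INSERT INTO Courses VALUES (1, 'Algebra'), (2, 'Biology');
    """)
    # Algebra: students 1..10, once each. Biology: students 1..5, in two years.
    rows = [(s, 1, 2023) for s in range(1, 11)]
    rows += [(s, 2, y) for s in range(1, 6) for y in (2022, 2023)]
    conn.executemany(
        "INSERT INTO Course_Signup (student_id, course_id, year) VALUES (?, ?, ?)",
        rows)
    return conn

def courses_with_min_students(conn, min_students, distinct=True):
    """Subquery-only filter: no GROUP BY, no join, no set operation."""
    count_expr = "COUNT(DISTINCT csn.student_id)" if distinct else "COUNT(*)"
    sql = f"""
        SELECT c.name
        FROM Courses c
        WHERE (SELECT {count_expr}
               FROM Course_Signup csn
               WHERE csn.course_id = c.id) >= ?
        ORDER BY c.name
    """
    return [row[0] for row in conn.execute(sql, (min_students,))]
```

With this data, counting rows reports both courses as having 10 sign-ups, while counting distinct students keeps only Algebra.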

How can I join multiple tables for count?

I have 3 tables named houses, trees, rivers. All of these tables have city_id column. I want to group total counts by cities. Cities are in another table.
My database is postgresql.
city_name trees houses rivers
City-1 1000 200 1
City-2 300 100 2
City-3 4000 210 4
I can get the counts for trees:
SELECT
city.name as city_name,
count(*) as trees
FROM trees as t, cities as city
WHERE t.city_id = city.city_id
GROUP BY city.name
But I could not join the three tables in the same query.
To avoid issues with duplication of rows in a JOIN it's probably easiest to do the aggregation in subqueries and then JOIN them:
SELECT c.name,
COALESCE(t.cnt, 0) AS trees,
COALESCE(h.cnt, 0) AS houses,
COALESCE(r.cnt, 0) AS rivers
FROM cities c
LEFT JOIN (SELECT city_id, COUNT(*) AS cnt
FROM trees
GROUP BY city_id) t ON t.city_id = c.city_id
LEFT JOIN (SELECT city_id, COUNT(*) AS cnt
FROM houses
GROUP BY city_id) h ON h.city_id = c.city_id
LEFT JOIN (SELECT city_id, COUNT(*) AS cnt
FROM rivers
GROUP BY city_id) r ON r.city_id = c.city_id
We use a LEFT JOIN in case a given city has no trees, houses or rivers.
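The row duplication mentioned above is easy to see on a tiny data set. The following sketch uses Python's sqlite3 module rather than PostgreSQL, and the id columns and sample rows are assumptions added for illustration. It compares a naive three-way join against the pre-aggregated subqueries.

```python
import sqlite3

def build_demo_db():
    """City-1 has 3 trees, 2 houses, 1 river; City-2 has only 1 tree."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE trees  (id INTEGER PRIMARY KEY, city_id INTEGER);
        CREATE TABLE houses (id INTEGER PRIMARY KEY, city_id INTEGER);
        CREATE TABLE rivers (id INTEGER PRIMARY KEY, city_id INTEGER);
        INSERT INTO cities VALUES (1, 'City-1'), (2, 'City-2');
        INSERT INTO trees  (city_id) VALUES (1), (1), (1), (2);
        INSERT INTO houses (city_id) VALUES (1), (1);
        INSERT INTO rivers (city_id) VALUES (1);
    """)
    return conn

# Naive version: every tree row is paired with every house and river row,
# so each count becomes the product of the three counts.
NAIVE_SQL = """
    SELECT c.name, COUNT(t.id), COUNT(h.id), COUNT(r.id)
    FROM cities c
    LEFT JOIN trees  t ON t.city_id = c.city_id
    LEFT JOIN houses h ON h.city_id = c.city_id
    LEFT JOIN rivers r ON r.city_id = c.city_id
    GROUP BY c.city_id, c.name
    ORDER BY c.name
"""

# Pre-aggregated version: each subquery yields at most one row per city.
PREAGG_SQL = """
    SELECT c.name,
           COALESCE(t.cnt, 0), COALESCE(h.cnt, 0), COALESCE(r.cnt, 0)
    FROM cities c
    LEFT JOIN (SELECT city_id, COUNT(*) AS cnt FROM trees  GROUP BY city_id) t
           ON t.city_id = c.city_id
    LEFT JOIN (SELECT city_id, COUNT(*) AS cnt FROM houses GROUP BY city_id) h
           ON h.city_id = c.city_id
    LEFT JOIN (SELECT city_id, COUNT(*) AS cnt FROM rivers GROUP BY city_id) r
           ON r.city_id = c.city_id
    ORDER BY c.name
"""

def counts(conn, sql):
    """Run one of the queries above and return all rows."""
    return conn.execute(sql).fetchall()
```

On this data the naive query reports 6 trees, 6 houses and 6 rivers for City-1 (3 x 2 x 1 joined rows), whereas the pre-aggregated query reports 3, 2 and 1.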
An alternative to Nick's answer:
SELECT
city.name as city_name,
count(distinct t.id) as trees,
count(distinct h.id) as houses,
count(distinct r.id) as rivers
FROM cities as city
left join trees as t on t.city_id = city.city_id
left join rivers as r on r.city_id = city.city_id
left join houses as h on h.city_id = city.city_id
GROUP BY city.name
--
Not sure of the performance implications specifically with Postgres, but here's a (fairly old, so things might have moved on since) article suggesting count(distinct) can be slow in Postgres, together with options:
postgresql COUNT(DISTINCT ...) very slow

Count and group the number of times each town is listed in the table

SELECT PEOPLE.TOWNKEY, TOWN_LOOKUP.TOWN FROM PEOPLE
INNER JOIN TOWN_LOOKUP
ON PEOPLE.TOWNKEY = TOWN_LOOKUP.PK
ORDER BY TOWN
You are missing the group by clause entirely:
SELECT tl.town, COUNT(*)
FROM people p
INNER JOIN town_lookup tl ON p.townkey = tl.pk
GROUP BY tl.town
ORDER BY tl.town

SQL Joins Clarification

I wish to display the hospitalid, hname and htype of the hospital(s) with the highest number of doctors associated with them.
I have two tables:
Doctor: doctorid, hospitalid
Hospital: hospitalid, hname, htype
SELECT d.hospitalid,h.hname,h.htype
FROM doctor d
INNER JOIN hospital h ON d.hospitalid = h.hospitalid
GROUP BY d.hospitalid,h.hname,h.htype
HAVING MAX(count(d.doctorid));
I tried the above code, but I get the error "group func is nested too deeply". How should I modify the code?
This is a common error when learning SQL, thinking that having Max(col) says "keep only the row with the max". It simply means having <some function on the column> without any condition. For instance, you could say having count(d.doctorid) = 1 to get hospitals with only one doctor.
The way to do this is to order the rows by the count and then take the first row. However, the syntax for "take the first row" varies by database. The following works in many SQL dialects:
SELECT d.hospitalid,h.hname,h.htype
FROM doctor d INNER JOIN
hospital h
ON d.hospitalid = h.hospitalid
GROUP BY d.hospitalid,h.hname,h.htype
order by count(d.doctorid) desc
limit 1;
In SQL Server and Sybase, the syntax is:
SELECT top 1 d.hospitalid,h.hname,h.htype
FROM doctor d INNER JOIN
hospital h
ON d.hospitalid = h.hospitalid
GROUP BY d.hospitalid,h.hname,h.htype
order by count(d.doctorid) desc;
In Oracle:
select t.*
from (SELECT d.hospitalid,h.hname,h.htype
FROM doctor d INNER JOIN
hospital h
ON d.hospitalid = h.hospitalid
GROUP BY d.hospitalid,h.hname,h.htype
order by count(d.doctorid) desc
) t
where rownum = 1;
EDIT (based on comment):
To get all rows with the maximum, then you can do something similar to your original query. It is just more complicated. You can calculate the maximum number using a subquery and do the comparison in the having clause:
SELECT d.hospitalid, h.hname, h.htype
FROM doctor d INNER JOIN
hospital h
ON d.hospitalid = h.hospitalid
GROUP BY d.hospitalid,h.hname,h.htype
having count(d.doctorid) = (select max(NumDoctors)
from (select hospitalid, count(*) as NumDoctors
from doctor
group by hospitalid
) hd
)
As a note, there are easier mechanisms in other databases.
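Here is a quick way to see the difference between "take the first row" and "all rows with the maximum". It is a sketch using Python's sqlite3 module, with invented hospitals where two are tied on 3 doctors. The helper names are hypothetical, and the inner subquery counts rows in doctor.

```python
import sqlite3

def build_demo_db():
    """Two hospitals tied on 3 doctors each, one hospital with 1 doctor."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hospital (hospitalid INTEGER PRIMARY KEY, hname TEXT, htype TEXT);
        CREATE TABLE doctor (doctorid INTEGER PRIMARY KEY, hospitalid INTEGER);
        INSERT INTO hospital VALUES (1, 'General', 'public'),
                                    (2, 'Mercy', 'private'),
                                    (3, 'Clinic', 'public');
        INSERT INTO doctor VALUES (1, 1), (2, 1), (3, 1),
                                  (4, 2), (5, 2), (6, 2),
                                  (7, 3);
    """)
    return conn

def top_hospital_first_row(conn):
    """ORDER BY ... LIMIT 1: returns a single row even when there is a tie."""
    sql = """
        SELECT d.hospitalid, h.hname, h.htype
        FROM doctor d
        INNER JOIN hospital h ON d.hospitalid = h.hospitalid
        GROUP BY d.hospitalid, h.hname, h.htype
        ORDER BY COUNT(d.doctorid) DESC
        LIMIT 1
    """
    return conn.execute(sql).fetchall()

def top_hospitals_with_ties(conn):
    """HAVING count = (max of per-hospital counts): returns every tied row."""
    sql = """
        SELECT d.hospitalid, h.hname, h.htype
        FROM doctor d
        INNER JOIN hospital h ON d.hospitalid = h.hospitalid
        GROUP BY d.hospitalid, h.hname, h.htype
        HAVING COUNT(d.doctorid) = (SELECT MAX(NumDoctors)
                                    FROM (SELECT hospitalid, COUNT(*) AS NumDoctors
                                          FROM doctor
                                          GROUP BY hospitalid) hd)
        ORDER BY d.hospitalid
    """
    return conn.execute(sql).fetchall()
```

With this data the LIMIT 1 version returns one of the two tied hospitals (which one is not guaranteed), while the HAVING version returns both.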
This is how I would write it for SQL Server. The specific details might vary depending on the database backend you are using.
SELECT TOP 1 a.hospitalid,a.hname,a.htype
FROM
(SELECT d.hospitalid,h.hname,h.htype, count(d.doctorid) as doctorcount FROM doctor d INNER JOIN hospital h ON d.hospitalid = h.hospitalid
GROUP BY d.hospitalid,h.hname,h.htype) a
ORDER BY doctorcount DESC;

Select one record from two tables in Oracle

There are three tables:
A table about students: s41071030(sno, sname, ssex, sage, sdept)
A table about course: c41071030(cno, cname, cpno, credit)
A table about selecting courses: sc41071030(sno, cno, grade)
Now, I want to select the details of the student whose sdept='CS' and who has selected the most courses in department 'CS'.
As with any modestly complex SQL statement, it is best to do 'TDQD' — Test Driven Query Design. Start off with simple parts of the question and build them into a more complex answer.
To find out how many courses each student in the CS department is taking, we write:
SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno;
We now need to find the largest value of NumCourses:
SELECT MAX(NumCourses) MaxCourses
FROM (SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno
)
Now we need to join that result with the sub-query, so it is time for a CTE (Common Table Expression):
WITH N AS
(SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno
)
SELECT N.Sno
FROM N
JOIN (SELECT MAX(NumCourses) MaxCourses FROM N) M
ON M.MaxCourses = N.NumCourses;
And we need to get the student details, so we join that with the student table:
WITH N AS
(SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno
)
SELECT S.*
FROM s41071030 S
JOIN N ON N.Sno = S.Sno
JOIN (SELECT MAX(NumCourses) MaxCourses FROM N) M
ON M.MaxCourses = N.NumCourses;
Lightly tested SQL: you were warned. To test, run the component queries, making sure you get reasonable results each time. Don't move on to the next query until the previous one is working correctly.
Note that the courses table turns out to be immaterial to the query you are solving.
Also note that this may return several rows if it turns out there are several students all taking the same number of courses and that number is the largest number that any student is taking. (So, if there are 3 students taking 7 courses each, and no student taking more than 7 courses, then you will see 3 rows in the result set.)
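If you do not have Oracle at hand, the final CTE query can be checked step by step against a throwaway database. This is a sketch using Python's sqlite3 module, which accepts the same query text. The student rows and grades are invented, and the courses table is omitted since it is immaterial here. Two CS students are tied on 2 courses, and a non-CS student takes 3.

```python
import sqlite3

def build_demo_db():
    """Made-up students: Ann and Bob (CS) take 2 courses, Di (CS) 1, Cy (MA) 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE s41071030 (sno INTEGER PRIMARY KEY, sname TEXT, ssex TEXT,
                                sage INTEGER, sdept TEXT);
        CREATE TABLE sc41071030 (sno INTEGER, cno TEXT, grade INTEGER);
        INSERT INTO s41071030 VALUES (1, 'Ann', 'F', 20, 'CS'),
                                     (2, 'Bob', 'M', 21, 'CS'),
                                     (3, 'Cy',  'M', 22, 'MA'),
                                     (4, 'Di',  'F', 19, 'CS');
        INSERT INTO sc41071030 VALUES (1, 'c1', 90), (1, 'c2', 85),
                                      (2, 'c1', 70), (2, 'c3', 88),
                                      (3, 'c1', 60), (3, 'c2', 75), (3, 'c3', 80),
                                      (4, 'c1', 95);
    """)
    return conn

def top_cs_students(conn):
    """CS students whose course count equals the maximum among CS students."""
    sql = """
        WITH N AS
             (SELECT S.Sno, COUNT(*) NumCourses
                FROM s41071030 S
                JOIN sc41071030 SC ON S.Sno = SC.Sno
               WHERE S.Sdept = 'CS'
               GROUP BY S.Sno
             )
        SELECT S.Sno, S.Sname
          FROM s41071030 S
          JOIN N ON N.Sno = S.Sno
          JOIN (SELECT MAX(NumCourses) MaxCourses FROM N) M
            ON M.MaxCourses = N.NumCourses
         ORDER BY S.Sno
    """
    return conn.execute(sql).fetchall()
```

On this data the query returns both Ann and Bob, which illustrates the tie behaviour described above. Cy is ignored despite taking the most courses overall, because he is not in CS.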
1. Aggregate sc41071030 rows to get the counts.
2. Join the results to s41071030 to:
   - filter rows on sdept;
   - get student details;
   - RANK() the joined rows on the count values.
3. Select rows with the ranking of 1.
WITH
aggregated AS (
SELECT
sno,
COUNT(*) AS coursecount
FROM
sc41071030
GROUP BY
sno
),
ranked AS (
SELECT
s.*,
RANK() OVER (ORDER BY agg.coursecount DESC) AS rnk
FROM
s41071030 s
INNER JOIN aggregated agg ON s.sno = agg.sno
WHERE
s.sdept = 'CS'
)
SELECT
sno,
sname,
ssex,
sage,
sdept
FROM
ranked
WHERE
rnk = 1
;
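To see what RANK() assigns before the final rnk = 1 filter, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer, since older versions have no window functions. The rows are invented: two CS students are tied on 2 courses, one CS student has 1 course, and one non-CS student has 3.

```python
import sqlite3

def build_demo_db():
    """Made-up students: Ann and Bob (CS) take 2 courses, Di (CS) 1, Cy (MA) 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE s41071030 (sno INTEGER PRIMARY KEY, sname TEXT, ssex TEXT,
                                sage INTEGER, sdept TEXT);
        CREATE TABLE sc41071030 (sno INTEGER, cno TEXT, grade INTEGER);
        INSERT INTO s41071030 VALUES (1, 'Ann', 'F', 20, 'CS'),
                                     (2, 'Bob', 'M', 21, 'CS'),
                                     (3, 'Cy',  'M', 22, 'MA'),
                                     (4, 'Di',  'F', 19, 'CS');
        INSERT INTO sc41071030 VALUES (1, 'c1', 90), (1, 'c2', 85),
                                      (2, 'c1', 70), (2, 'c3', 88),
                                      (3, 'c1', 60), (3, 'c2', 75), (3, 'c3', 80),
                                      (4, 'c1', 95);
    """)
    return conn

def ranked_cs_students(conn):
    """Return (sno, sname, rnk) for CS students, ranked by course count."""
    sql = """
        WITH aggregated AS (
            SELECT sno, COUNT(*) AS coursecount
            FROM sc41071030
            GROUP BY sno
        )
        SELECT s.sno, s.sname,
               RANK() OVER (ORDER BY agg.coursecount DESC) AS rnk
        FROM s41071030 s
        INNER JOIN aggregated agg ON s.sno = agg.sno
        WHERE s.sdept = 'CS'
        ORDER BY rnk, s.sno
    """
    return conn.execute(sql).fetchall()
```

The WHERE clause is applied before the window function, so the non-CS student never enters the ranking. Tied students share rank 1 and the next student gets rank 3, so filtering on rnk = 1 returns every student tied for the most courses.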