Subqueries - Finding the average number of tracks per album - sql

I'm trying to write a subquery that joins the 'album' and 'track' tables.
Eventually I need to figure out how many songs on average are on albums with the word "Rock" in the title. The chosen albums must have at least eight songs on them.
ER diagram
SELECT AVG(tr.track_id)
FROM(SELECT al.album_id AS album,
tr.name,
COUNT(tr.track_id)
FROM track as tr
LEFT OUTER JOIN album as al ON al.album_id = tr.album_id
WHERE tr.name LIKE '%Rock%'
GROUP tr.name )AS ag
HAVING COUNT(al.album_id) >= 8;

You were close. The HAVING clause should be inside the inner query, so that it keeps only the albums with at least 8 songs. Also, you were filtering by track name instead of album name, so I changed that as well. Finally, the outer query should average the per-album counts (ag.cnt), not track_id.
SELECT AVG(ag.cnt)
FROM(SELECT al.album_id AS id,
COUNT(*) as cnt
FROM track as tr
JOIN album as al ON al.album_id = tr.album_id
WHERE al.name LIKE '%Rock%'
GROUP BY al.album_id
HAVING COUNT(al.album_id) >= 8 ) AS ag;
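To check the corrected query end to end, here is a minimal sketch that runs it against SQLite through Python's built-in sqlite3 module. The tiny album/track data set is made up for illustration. The column names (album.name for the title, track.album_id) are assumed from the answer, because the ER diagram is not included.

```python
import sqlite3

# Hypothetical miniature tables; columns follow the answer above
# (album.album_id, album.name, track.track_id, track.name, track.album_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (album_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, album_id INTEGER);
""")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [(1, "Rock Anthems"), (2, "Classic Rock"), (3, "Jazz Hits"), (4, "Rock Lite")],
)

# Tracks per album: two qualifying "Rock" albums (10 and 8 tracks),
# one non-Rock album, and one Rock album with fewer than 8 tracks.
tracks_per_album = {1: 10, 2: 8, 3: 12, 4: 3}
conn.executemany(
    "INSERT INTO track (name, album_id) VALUES (?, ?)",
    [(f"Track {i}", album_id)
     for album_id, n in tracks_per_album.items()
     for i in range(n)],
)

query = """
SELECT AVG(ag.cnt)
FROM (SELECT al.album_id AS id,
             COUNT(*) AS cnt                -- number of tracks on this album
      FROM track AS tr
      JOIN album AS al ON al.album_id = tr.album_id
      WHERE al.name LIKE '%Rock%'           -- filter on the album title
      GROUP BY al.album_id
      HAVING COUNT(al.album_id) >= 8) AS ag -- keep albums with at least 8 tracks
"""
avg_tracks = conn.execute(query).fetchone()[0]
print(avg_tracks)  # 9.0, the mean of 10 and 8
```

Only the two Rock albums with 10 and 8 tracks survive the HAVING filter, so the average is 9. SQLite's LIKE is case-insensitive for ASCII letters by default, while PostgreSQL's LIKE is case-sensitive, so the same WHERE clause may match different albums depending on the database.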

Related

Get Average in SQL Through Join

I'm just playing around with SQL and I'm trying to do the following.
I have 2 tables and here is their structure:
Movies_metadata table: (screenshot not included)
ratings table: (screenshot not included)
As there are many ratings for one movie, what I'd like to do is get the average rating per movie and display it next to the title, which is only available in the metadata table.
This is as far as I got, but the problem with my SELECT statement is that the subquery returns the average over all ratings and displays it on every row:
SELECT
(SELECT
AVG(rating)
FROM
`movies-dataset.movies_data.ratings`) AS rating_avg,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM
`movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN `movies-dataset.movies_data.ratings` AS ratings
ON metadata.id = ratings.movieId
LIMIT 10
Here is an example of the result: (screenshot not included)
I'm thinking I could use a GROUP BY, but when I try it, I get an error.
Appreciate the help!
The following should work:
SELECT movies_metadata.title, AVG(ratings.rating)
FROM movies_metadata
LEFT JOIN ratings ON movies_metadata.id = ratings.movieID
GROUP BY movies_metadata.title
If titles are not unique, group by the id as well: GROUP BY movies_metadata.id, movies_metadata.title.
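Here is a small sketch of the GROUP BY approach. It runs through Python's sqlite3 module rather than BigQuery, drops the dataset prefixes, and uses three hypothetical movies with made-up ratings. Grouping by both id and title keeps one row per movie even if two movies share a title. The LEFT JOIN keeps a movie with no ratings, and its average comes back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies_metadata (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE ratings (movieId INTEGER, rating REAL);
INSERT INTO movies_metadata VALUES (1, 'Heat'), (2, 'Alien'), (3, 'Unrated Film');
INSERT INTO ratings VALUES (1, 4.0), (1, 5.0), (2, 3.0), (2, 4.0), (2, 5.0);
""")

query = """
SELECT m.id, m.title, AVG(r.rating) AS avg_rating
FROM movies_metadata AS m
LEFT JOIN ratings AS r ON m.id = r.movieId   -- keep movies without ratings
GROUP BY m.id, m.title                       -- one group per movie
ORDER BY m.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
# (1, 'Heat', 4.5)
# (2, 'Alien', 4.0)
# (3, 'Unrated Film', None)
```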
If combining GROUP BY with LIMIT gives you trouble, try getting the average rating inside the joined subquery instead, like this:
SELECT
ratings.averagerating,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM `movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN (SELECT movieId, AVG(rating) averagerating FROM `movies-dataset.movies_data.ratings` GROUP by movieId) AS ratings
ON metadata.id = ratings.movieId
ORDER BY ratings.averagerating
LIMIT 5
Maybe try something like:
Select m.movie_id, (r.rate_sum * 1.0 / r.num_rate) as avg_rating -- * 1.0 avoids integer division in some databases
From your_movies_table m
Left Join (select movie_id, sum(rating) as rate_sum, count(rating) as num_rate
From your_ratings_table
Group by movie_id) r
On m.movie_id = r.movie_id
I'm using a left join because I'm not sure if all movies have been rated at least once.
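Here is a sketch comparing this pre-aggregated LEFT JOIN with an INNER JOIN. It also uses Python's sqlite3 module, with hypothetical tables named movies and ratings in place of your_movies_table and your_ratings_table. The * 1.0 forces floating-point division, because SQLite (like several other databases) divides two integers with integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE ratings (movie_id INTEGER, rating INTEGER);
INSERT INTO movies VALUES (1, 'Heat'), (2, 'Alien'), (3, 'Unrated Film');
INSERT INTO ratings VALUES (1, 4), (1, 5), (2, 3), (2, 4), (2, 5);
""")

def avg_ratings(join_type):
    # join_type is only ever "LEFT" or "INNER" in this sketch.
    # Ratings are pre-aggregated per movie, then joined to the movie list.
    query = f"""
    SELECT m.movie_id, r.rate_sum * 1.0 / r.num_rate AS avg_rating
    FROM movies AS m
    {join_type} JOIN (SELECT movie_id,
                             SUM(rating) AS rate_sum,
                             COUNT(rating) AS num_rate
                      FROM ratings
                      GROUP BY movie_id) AS r
      ON m.movie_id = r.movie_id
    ORDER BY m.movie_id
    """
    return conn.execute(query).fetchall()

left = avg_ratings("LEFT")
inner = avg_ratings("INNER")
print(left)   # [(1, 4.5), (2, 4.0), (3, None)]
print(inner)  # [(1, 4.5), (2, 4.0)]
```

With the INNER JOIN the unrated movie disappears from the result, which is why the answer uses a LEFT JOIN. Without * 1.0, SQLite would return 4 instead of 4.5 for movie 1.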

Not getting 0 value in SQL count aggregate by inner join

I am using the basic Chinook database and I am trying to write a query that will display the worst-selling genres. I am mostly getting the answer; however, one genre, 'Opera', has 0 sales, and the query result ignores it and moves on to the next lowest non-zero value.
I tried using left join instead of inner join but that returns different values.
This is my query currently:
create view max
as
select distinct
t1.name as genre,
count(*) as Sales
from
tracks t2
inner join
invoice_items t3 on t2.trackid == t3.trackid
left join
genres as t1 on t1.genreid == t2.genreid
group by
t1.genreid
order by
2
limit 10;
However, the result skips the Opera genre, which has 0 sales. How can I include it? I tried using a left join, but it yields different results.
Any help is appreciated.
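If you want to include genres with no sales, you should start the joins from genres and then LEFT JOIN the other tables.
Also, you should not use count(*), which counts every row in the result set, including the single row a LEFT JOIN produces for a genre with no sales. Count a column from invoice_items instead, because COUNT(column) skips NULLs.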
SELECT g.name Genre,
COUNT(i.trackid) Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10;
There is no need for the keyword DISTINCT, since the query returns 1 row for each genre.
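The difference between COUNT(*) and COUNT(i.trackid) is easiest to see on a tiny example. This sketch uses Python's sqlite3 module with hypothetical genres, tracks and invoice_items rows in which Opera has a track but no sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (genreid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tracks (trackid INTEGER PRIMARY KEY, genreid INTEGER);
CREATE TABLE invoice_items (invoicelineid INTEGER PRIMARY KEY, trackid INTEGER);
INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'), (3, 'Opera');
INSERT INTO tracks VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO invoice_items (trackid) VALUES (10), (10), (11), (20);
""")

def sales(count_expr):
    # count_expr is either "COUNT(i.trackid)" or "COUNT(*)" in this sketch.
    return conn.execute(f"""
        SELECT g.name AS Genre, {count_expr} AS Sales
        FROM genres g
        LEFT JOIN tracks t ON t.genreid = g.genreid
        LEFT JOIN invoice_items i ON i.trackid = t.trackid
        GROUP BY g.genreid
        ORDER BY Sales, g.name
    """).fetchall()

print(sales("COUNT(i.trackid)"))  # [('Opera', 0), ('Jazz', 1), ('Rock', 3)]
print(sales("COUNT(*)"))          # [('Jazz', 1), ('Opera', 1), ('Rock', 3)]
```

COUNT(*) reports 1 for Opera because the LEFT JOIN still produces one row for it, with NULLs in the invoice_items columns. COUNT(i.trackid) ignores that NULL and reports 0.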
When asking for the top n, one must always state how to deal with ties. If I am looking for the top 1, but there are three rows in the table, all with the same value, shall I select 3 rows? Zero rows? One row arbitrarily chosen? Most often we don't want arbitrary results, which excludes the last option. This excludes LIMIT, too, because LIMIT has no clause for ties in SQLite.
Here is an example with DENSE_RANK instead. You are looking for the worst-selling genres, so we should probably look at the revenue per genre, which is the sum of unit price times quantity sold. To include genres without invoices (and maybe even without tracks), we outer join this data to the genre table.
select total, genre_name
from
(
select
g.name as genre_name,
coalesce(sum(ii.unitprice * ii.quantity), 0) as total,
dense_rank() over (order by coalesce(sum(ii.unitprice * ii.quantity), 0)) as rnk
from genres g
left join tracks t on t.genreid = g.genreid
left join invoice_items ii on ii.trackid = t.trackid
group by g.name
) aggregated
where rnk <= 10
order by total, genre_name;
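Here is a sketch of the DENSE_RANK query on hypothetical data where two genres tie for the second-lowest revenue. It runs with Python's sqlite3 module; window functions need SQLite 3.25 or later. The column name unitprice follows Chinook's UnitPrice column, and :n is a named parameter standing in for the 10 in the answer.

```python
import sqlite3

# Hypothetical miniature Chinook-style tables; unitprice is kept at 1 so the
# totals are easy to check by hand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (genreid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tracks (trackid INTEGER PRIMARY KEY, genreid INTEGER);
CREATE TABLE invoice_items (trackid INTEGER, unitprice REAL, quantity INTEGER);
INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'), (3, 'Opera'), (4, 'Blues');
INSERT INTO tracks VALUES (10, 1), (20, 2), (30, 3), (40, 4);
INSERT INTO invoice_items VALUES (10, 1, 5), (20, 1, 1), (40, 1, 1);
""")

query = """
SELECT total, genre_name
FROM (
    SELECT g.name AS genre_name,
           COALESCE(SUM(ii.unitprice * ii.quantity), 0) AS total,
           -- rank the per-genre totals; equal totals share a rank
           DENSE_RANK() OVER (
               ORDER BY COALESCE(SUM(ii.unitprice * ii.quantity), 0)
           ) AS rnk
    FROM genres g
    LEFT JOIN tracks t ON t.genreid = g.genreid
    LEFT JOIN invoice_items ii ON ii.trackid = t.trackid
    GROUP BY g.name
) aggregated
WHERE rnk <= :n
ORDER BY total, genre_name
"""
worst_two_ranks = conn.execute(query, {"n": 2}).fetchall()
print(worst_two_ranks)  # Opera (total 0), then the tied Blues and Jazz
```

Both tied genres come back because they share rank 2. A plain LIMIT 2 would keep Opera and then silently drop one of the two tied genres.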

SQL Collect duplicates to one place? PostgreSQL

Sorry, I'm new here and also new to SQL, and I can't really explain my problem in the title...
So I have a TV show database with a Genre column, but each TV show has multiple genres stored. When I select all my TV shows, how can I combine them?
It needs to look like this:
https://i.stack.imgur.com/3EhBj.png
So I need to combine the strings. Here is the code I have written so far:
SELECT title,
year,
runtime,
MIN(name) as name,
ROUND(rating, 1) as rating,
trailer,
homepage
FROM shows
JOIN show_genres
on shows.id = show_genres.show_id
JOIN genres
on show_genres.genre_id = genres.id
GROUP BY title,
year,
runtime,
rating,
trailer,
homepage
ORDER BY rating DESC
LIMIT 15;
I also have some other stuff in the query; that's part of my exercise tasks. Thanks!
Also here is the relationship model:
https://i.stack.imgur.com/M89ho.png
Basically, you need string aggregation; in Postgres, you can use string_agg() for this.
For efficiency, I would recommend moving the aggregation to a correlated subquery or a lateral join rather than aggregating in the outer query, so:
SELECT
s.title,
s.year,
s.runtime,
g.genre_names,
ROUND(s.rating, 1) as rating,
s.trailer,
s.homepage
FROM shows s
LEFT JOIN LATERAL (
SELECT string_agg(g.name, ', ') genre_names
FROM show_genres sg
INNER JOIN genres g ON g.id = sg.genre_id
WHERE sg.show_id = s.id
) g ON 1 = 1
ORDER BY s.rating DESC
LIMIT 15
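PostgreSQL's string_agg() cannot run in SQLite, but SQLite's group_concat(X, separator) plays the same role. This sketch uses it to show the correlated-subquery form through Python's sqlite3 module. The shows, genres and show_genres rows are hypothetical, and only a few of the question's columns are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, rating REAL);
CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE show_genres (show_id INTEGER, genre_id INTEGER);
INSERT INTO shows VALUES (1, 'Show A', 8.76), (2, 'Show B', 7.31), (3, 'Show C', 6.0);
INSERT INTO genres VALUES (1, 'Drama'), (2, 'Crime'), (3, 'Comedy');
INSERT INTO show_genres VALUES (1, 1), (1, 2), (2, 3);
""")

query = """
SELECT s.title,
       -- correlated subquery: concatenate the genre names of this show only
       (SELECT group_concat(g.name, ', ')
        FROM show_genres sg
        JOIN genres g ON g.id = sg.genre_id
        WHERE sg.show_id = s.id) AS genre_names,
       ROUND(s.rating, 1) AS rating
FROM shows s
ORDER BY s.rating DESC
LIMIT 15
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

SQLite does not guarantee the order of the concatenated names, so only the set of genres per show should be relied on. In PostgreSQL you can make the order deterministic with string_agg(g.name, ', ' ORDER BY g.name). A show with no genres gets NULL here, just as it would with the LEFT JOIN LATERAL version.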

Is my solution to this SQL problem from an interview correct?

I am new to SQL (I just completed a course on edX). I had this interview question before I knew anything and thought I'd give it a shot now. I'm wondering whether my solution is correct. Thank you!
Problem
Given the following two database tables, team and player (with the corresponding database columns listed below), write a SQL statement that would return a list of names of the top 10 teams sorted from the tallest average player to the shortest. Assume that player height is stored as an integer representing number of inches.
team: id, league, name, division
player: id, name, height, weight, team_id
Solution
SELECT TOP 10 T.Team, P.Name
FROM Team AS T
JOIN Player AS P on T.id = P.team_id
ORDER BY height DESC;
Unfortunately, your solution is not correct: it would essentially give you a list of the tallest players. So, if the two tallest players in the database were on the same team, that team name would appear twice.
Here's how I would write the query:
SELECT TOP 10 T.[name] as [Team Name], AVG(P.[height]) as [Average Height]
FROM team T
INNER JOIN player P on (T.[id] = P.[team_id])
GROUP BY T.[name]
ORDER BY AVG(P.[height]) DESC
Teams sorted by the average height of their players:
SQL Server:
select top 10 T.name, avg(height) as AvgHeight
from team t inner join Player p on T.id = P.team_id
group by T.name
order by 2 desc
Postgres:
select T.name, avg(height) as AvgHeight
from team t inner join Player p on T.id = P.team_id
group by T.name
order by 2 desc limit 10;
The problem statement asks you to select the names of the teams, not the heights. You need to order the team names by average player height, and you can do this without including the average height in the final result set by using GROUP BY together with ORDER BY AVG().
SELECT TOP 10 T.name FROM team T
INNER JOIN player P
ON T.id = P.team_id
GROUP BY T.id, T.name
ORDER BY AVG(P.height) DESC
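To see why the original solution fails and the GROUP BY version works, consider a made-up data set where one team has the two tallest players but not the tallest average. This sketch uses Python's sqlite3 module, so LIMIT stands in for SQL Server's TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (id INTEGER PRIMARY KEY, league TEXT, name TEXT, division TEXT);
CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT, height INTEGER,
                     weight INTEGER, team_id INTEGER);
INSERT INTO team (id, name) VALUES (1, 'Hawks'), (2, 'Owls'), (3, 'Crows');
-- Owls have the two tallest players but a lower average (about 79.7) than Hawks (81)
INSERT INTO player (name, height, team_id) VALUES
    ('P1', 90, 2), ('P2', 89, 2), ('P3', 60, 2),
    ('P4', 80, 1), ('P5', 82, 1),
    ('P6', 75, 3), ('P7', 76, 3);
""")

# Ordering individual players by height repeats a team name.
naive = conn.execute("""
    SELECT t.name FROM team t JOIN player p ON t.id = p.team_id
    ORDER BY p.height DESC LIMIT 2
""").fetchall()

# One row per team, ordered by the team's average height.
by_average = conn.execute("""
    SELECT t.name
    FROM team t
    JOIN player p ON t.id = p.team_id
    GROUP BY t.id, t.name
    ORDER BY AVG(p.height) DESC
    LIMIT 10
""").fetchall()

print(naive)       # [('Owls',), ('Owls',)]
print(by_average)  # [('Hawks',), ('Owls',), ('Crows',)]
```

SQLite's AVG always returns a floating-point value. In SQL Server, AVG over an int column returns an int, so two teams whose averages differ only in the fractional part can tie; AVG(CAST(P.height AS float)) avoids that.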

SQL - How to Return Max Results of Count without the count appearing

Working on my homework and having a VERY difficult time figuring out how to show ONLY the name of the artist who has the highest number of tracks.
I get the correct answer, but the result still shows both the artist name and the number of tracks. I just need the artist name. I tried using WHERE and HAVING, but nothing seems to work. Any ideas?
SELECT TOP 1
Artist.Name 'ArtistName',
COUNT(*) TrackName
FROM Artist
JOIN Album ON
Artist.ArtistId = Album.AlbumId
JOIN Track ON
Album.AlbumId = Track.AlbumId
GROUP BY Artist.Name
ORDER BY TrackName DESC
Just use count(*) in the order by:
SELECT TOP 1 a.Name as ArtistName
FROM Artist a JOIN
Album al
ON a.ArtistId = al.ArtistId join
Track t
ON al.AlbumId = t.AlbumId
GROUP BY a.Name
ORDER BY COUNT(*) DESC;
Note the other changes to the query:
The use of as for column aliases.
The removal of the single quotes around the column alias. Only use single quotes for string and date constants.
The introduction of table aliases. This makes the query easier to write and to read.
The join to Album now matches on al.ArtistId. The original compared ArtistId with AlbumId, which pairs artists with the wrong albums.
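This sketch checks the approach on hypothetical data, using Python's sqlite3 module with LIMIT 1 in place of TOP 1. It also shows why the join must compare Album.ArtistId rather than Album.AlbumId: the album ids are chosen so that the two conditions pick different artists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
INSERT INTO Artist VALUES (1, 'Artist One'), (2, 'Artist Two');
-- Artist Two owns albums 1 and 2 (3 tracks); Artist One owns album 3 (2 tracks)
INSERT INTO Album VALUES (1, 'A', 2), (2, 'B', 2), (3, 'C', 1);
INSERT INTO Track (Name, AlbumId) VALUES ('t1', 1), ('t2', 1), ('t3', 2), ('t4', 3), ('t5', 3);
""")

def top_artist(album_join_column):
    # album_join_column is only ever "ArtistId" or "AlbumId" in this sketch.
    # COUNT(*) is used only for ordering, so it does not appear in the output.
    return conn.execute(f"""
        SELECT a.Name AS ArtistName
        FROM Artist a
        JOIN Album al ON a.ArtistId = al.{album_join_column}
        JOIN Track t ON al.AlbumId = t.AlbumId
        GROUP BY a.Name
        ORDER BY COUNT(*) DESC
        LIMIT 1
    """).fetchall()

print(top_artist("ArtistId"))  # [('Artist Two',)] with the correct join
print(top_artist("AlbumId"))   # [('Artist One',)] when ArtistId is compared with AlbumId
```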