SQL Collect duplicates to one place? PostgreSQL - sql

Sorry I'm new here and I'm also new with SQL and can't really explain my problem in the title...
So I have a TV show database, and there I have a Genre column, but for a TV show there are multiple Genres stored, so when I'm selecting all my TV Shows how can I combine them?
It needs to look like this:
https://i.stack.imgur.com/3EhBj.png
So I have to combine the strings together. Here is the code I have written so far:
SELECT title,
year,
runtime,
MIN(name) as name,
ROUND(rating, 1) as rating,
trailer,
homepage
FROM shows
JOIN show_genres
on shows.id = show_genres.show_id
JOIN genres
on show_genres.genre_id = genres.id
GROUP BY title,
year,
runtime,
rating,
trailer,
homepage
ORDER BY rating DESC
LIMIT 15;
I also have some other stuff here, that's my exercise tasks! Thanks!
Also here is the relationship model:
https://i.stack.imgur.com/M89ho.png

Basically you need string aggregation - in Postgres, you can use string_agg() for this.
For efficiency, I would recommend moving the aggregation to a correlated subquery or a lateral join rather than aggregating in the outer query, so:
SELECT
s.title,
s.year,
s.runtime,
g.genre_names,
ROUND(s.rating, 1) as rating,
s.trailer,
s.homepage
FROM shows s
LEFT JOIN LATERAL (
SELECT string_agg(g.name, ', ') genre_names
FROM show_genres sg
INNER JOIN genres g ON g.id = sg.genre_id
WHERE sg.show_id = s.id
) g ON 1 = 1
ORDER BY s.rating DESC
LIMIT 15
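To make the correlated-subquery variant concrete, here is a minimal sketch you can run with Python's built-in sqlite3 module. The table contents are made up for illustration, and because SQLite has no LATERAL the aggregation sits in a correlated subquery, with SQLite's group_concat() standing in for string_agg(). In PostgreSQL the same shape would use string_agg(g.name, ', ').

```python
import sqlite3

# Hypothetical miniature of the shows / show_genres / genres schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, rating REAL);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE show_genres (show_id INTEGER, genre_id INTEGER);
    INSERT INTO shows VALUES (1, 'Show A', 9.26), (2, 'Show B', 8.71), (3, 'Show C', 7.0);
    INSERT INTO genres VALUES (1, 'Drama'), (2, 'Crime'), (3, 'Comedy');
    INSERT INTO show_genres VALUES (1, 1), (1, 2), (2, 3);
""")

# One output row per show; the genres are collapsed by the correlated subquery.
# group_concat() is SQLite's counterpart of PostgreSQL's string_agg().
show_rows = conn.execute("""
    SELECT s.title,
           ROUND(s.rating, 1) AS rating,
           (SELECT group_concat(g.name, ', ')
              FROM show_genres sg
              JOIN genres g ON g.id = sg.genre_id
             WHERE sg.show_id = s.id) AS genre_names
      FROM shows s
     ORDER BY s.rating DESC
     LIMIT 15
""").fetchall()

for title, rating, genre_names in show_rows:
    print(title, rating, genre_names)
```

A show without any genre still appears, with NULL (None in Python) as its genre list. The order of names inside the concatenated string is not guaranteed here.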

Related

Get Average in SQL Through Join

I'm just playing around with SQL and I'm trying to do the following.
I have 2 tables and here is their structure:
Movies_metadata Movies
ratings table:
Ratings
As there are many ratings for one movie, what I'd like to do is get the avg rating per movie and have it display next to the title which is only available in the Metadata table.
This is as far as I got but obviously the issue with my SELECT statement is that it'll return the average of all movies and display it for each record:
SELECT
(SELECT
AVG(rating)
FROM
`movies-dataset.movies_data.ratings`) AS rating_avg,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM
`movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN `movies-dataset.movies_data.ratings` AS ratings
ON metadata.id = ratings.movieId
LIMIT 10
Here is an example of the result:
Result
I'm thinking I can potentially use a GROUP BY but when I try, I get an error
Appreciate the help!
The following should work:
SELECT movies_metadata.title, AVG(ratings.rating)
FROM movies_metadata
LEFT JOIN ratings ON movies_metadata.id = ratings.movieID
GROUP BY movies_metadata.title
If titles are not unique, group by movies_metadata.id as well (GROUP BY movies_metadata.id, movies_metadata.title).
The LIMIT and GROUP BY clauses might conflict with each other. Try computing the average rating in a subquery that you inner join, like this:
SELECT
ratings.averagerating,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM `movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN (SELECT movieId, AVG(rating) averagerating FROM `movies-dataset.movies_data.ratings` GROUP by movieId) AS ratings
ON metadata.id = ratings.movieId
ORDER BY ratings.averagerating
LIMIT 5
Maybe try something like:
Select m.movie_id, (r.rate_sum / r.num_rate) as avg_rating
From your_movies_table m
Left Join (select movie_id, sum(rating) as rate_sum, count(rating) as num_rate
From your_ratings_table
Group by movie_id) r
On m.movie_id = r.movie_id
I'm using a left join because I'm not sure if all movies have been rated at least once.
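The answers above all aggregate the ratings per movie first and then join the result to the metadata. Here is a minimal sketch of that idea that runs with Python's sqlite3 module. The rows are invented, and the tables are cut down to the columns the join needs.

```python
import sqlite3

# Hypothetical sample data; only the columns needed for the join are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies_metadata (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ratings (movieId INTEGER, rating REAL);
    INSERT INTO movies_metadata VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
    INSERT INTO ratings VALUES (1, 4.0), (1, 5.0), (2, 3.0);
""")

# Aggregate once per movie in the subquery, then attach the title.
# LEFT JOIN keeps movies that have no ratings at all (their average is NULL).
avg_rows = conn.execute("""
    SELECT m.title, r.avg_rating
      FROM movies_metadata AS m
      LEFT JOIN (SELECT movieId, AVG(rating) AS avg_rating
                   FROM ratings
                  GROUP BY movieId) AS r
        ON r.movieId = m.id
     ORDER BY m.id
""").fetchall()

print(avg_rows)  # [('Movie A', 4.5), ('Movie B', 3.0), ('Movie C', None)]
```

Switching the LEFT JOIN to an INNER JOIN would drop 'Movie C', which is the behaviour of the second answer's query.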

Not getting 0 value in SQL count aggregate by inner join

I am using the basic chinook database and I am trying to get a query that will display the worst selling genres. I am mostly getting the answer, however there is one genre 'Opera' that has 0 sales, but the query result is ignoring that and moving on to the next lowest non-zero value.
I tried using left join instead of inner join but that returns different values.
This is my query currently:
create view max
as
select distinct
t1.name as genre,
count(*) as Sales
from
tracks t2
inner join
invoice_items t3 on t2.trackid == t3.trackid
left join
genres as t1 on t1.genreid == t2.genreid
group by
t1.genreid
order by
2
limit 10;
The result however skips past the opera value which is 0 sales. How can I include that? I tried using left join but it yields different results.
Any help is appreciated.
If you want to include genres with no sales then you should start the joins from genres and then do LEFT joins to the other tables.
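Also, you should not use COUNT(*), which counts every row in the result set, including the NULL-padded row that a LEFT JOIN produces for a genre without sales. COUNT(i.trackid) counts only the rows that matched an invoice item.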
SELECT g.name Genre,
COUNT(i.trackid) Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10;
There is no need for the keyword DISTINCT, since the query returns 1 row for each genre.
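To see the difference between COUNT(*) and COUNT(i.trackid) side by side, here is a small sketch using Python's sqlite3 module. It uses a made-up three-genre data set rather than the real chinook file, and the table definitions are trimmed to the join columns.

```python
import sqlite3

# Hypothetical stand-in for the chinook tables, reduced to the join columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genres (genreid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tracks (trackid INTEGER PRIMARY KEY, genreid INTEGER);
    CREATE TABLE invoice_items (invoicelineid INTEGER PRIMARY KEY, trackid INTEGER);
    INSERT INTO genres VALUES (1, 'Rock'), (2, 'Opera'), (3, 'Jazz');
    INSERT INTO tracks VALUES (10, 1), (11, 1), (20, 2), (30, 3);
    INSERT INTO invoice_items VALUES (100, 10), (101, 10), (102, 11), (103, 30);
""")

# Start from genres and LEFT JOIN outwards so unsold genres survive.
# COUNT(i.trackid) ignores the NULL-padded row; COUNT(*) counts it as 1.
genre_rows = conn.execute("""
    SELECT g.name,
           COUNT(i.trackid) AS sales,
           COUNT(*)         AS joined_rows
      FROM genres g
      LEFT JOIN tracks t ON t.genreid = g.genreid
      LEFT JOIN invoice_items i ON i.trackid = t.trackid
     GROUP BY g.genreid
     ORDER BY sales, g.name
""").fetchall()

print(genre_rows)  # [('Opera', 0, 1), ('Jazz', 1, 1), ('Rock', 3, 3)]
```

'Opera' has one track and no invoice items, so the join yields a single row whose invoice columns are NULL. That row is why COUNT(*) reports 1 where COUNT(i.trackid) reports 0.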
When asking for the top n one must always state how to deal with ties. If I am looking for the top 1, but there are three rows in the table, all with the same value, shall I select 3 rows? Zero rows? One row arbitrarily chosen? Most often we don't want arbitrary results, which excludes the last option. This excludes LIMIT, too, because LIMIT has no clause for ties in SQLite.
Here is an example with DENSE_RANK instead. You are looking for the worst selling genres, so we must probably look at the revenue per genre, which is the sum of price x quantity sold. In order to include genres without invoices (and maybe even without tracks?) we outer join this data to the genre table.
select total, genre_name
from
(
select
g.name as genre_name,
coalesce(sum(ii.unit_price * ii.quantity), 0) as total,
dense_rank() over (order by coalesce(sum(ii.unit_price * ii.quantity), 0)) as rnk
from genres g
left join tracks t on t.genreid = g.genreid
left join invoice_items ii on ii.trackid = t.trackid
group by g.name
) aggregated
where rnk <= 10
order by total, genre_name;

Join with count

I need to write SQL query like:
Show all countries with more than 1000 users, sorted by user count.
The country with the most users should be at the top.
I have tables:
● Table users (id, email, citizenship_country_id)
● Table countries (id, name, iso)
Users with columns: id, email, citizenship_country_id
Countries with columns: id, name, iso
SELECT countries.name,
Count(users.citiizenship_country_id) AS W1
FROM countries
LEFT JOIN users ON countries.id = users.citizenship_country_id
GROUP BY users.citiizenship_country_id, countries.name
HAVING ((([users].[citiizenship_country_id])>2));
But this does not work - I get an empty result set.
Could you please tell me what I'm doing wrong?
A LEFT JOIN is superfluous for this purpose. To have 1000 users, you need at least one match:
SELECT c.name, Count(*) AS W1
FROM countries c JOIN
users u
ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING COUNT(*) > 1000;
Notice that table aliases also make the query easier to write and to read.
Group by country name and use HAVING Count(u.citizenship_country_id)>1000, which filters groups after aggregation:
SELECT c.name,
Count(u.citizenship_country_id) AS W1
FROM countries c
INNER JOIN users u ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING Count(u.citizenship_country_id)>1000
ORDER BY W1 desc --Order top counts first
;
As @GordonLinoff pointed out, you can use INNER JOIN instead of LEFT JOIN: this query never returns countries without users anyway, and INNER JOIN can perform better because unmatched rows are not passed to the aggregation.
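If you want to try the HAVING filter without a thousand rows, here is a minimal sketch using Python's sqlite3 module. The country names, the user counts, and the countries_over helper are all hypothetical. The threshold is passed in as a parameter so the same query can be checked with a small number.

```python
import sqlite3

# Hypothetical data: 3 users in Country A, 5 in Country B, none in Country C.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, iso TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, citizenship_country_id INTEGER);
    INSERT INTO countries VALUES (1, 'Country A', 'AA'), (2, 'Country B', 'BB'), (3, 'Country C', 'CC');
""")
conn.executemany(
    "INSERT INTO users (email, citizenship_country_id) VALUES (?, ?)",
    [(f"user{n}@example.com", country_id)
     for country_id, how_many in ((1, 3), (2, 5))
     for n in range(how_many)],
)

def countries_over(threshold):
    """Countries with more than `threshold` users, largest count first."""
    return conn.execute("""
        SELECT c.name, COUNT(*) AS user_count
          FROM countries c
          JOIN users u ON c.id = u.citizenship_country_id
         GROUP BY c.name
        HAVING COUNT(*) > ?
         ORDER BY user_count DESC
    """, (threshold,)).fetchall()

print(countries_over(2))     # [('Country B', 5), ('Country A', 3)]
print(countries_over(1000))  # []
```

HAVING compares the aggregated count per group. The original attempt compared the raw country id to 2, which is a row-level condition and belongs in WHERE.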

What am I missing this? Do I need to use JOIN or UNION or a subquery?

I'm trying to get some practice making queries with SQL.
I'm working with a playground that uses SQLite.
There are two tables: books_north and books_south
Both have columns for: id, title, author, genre and first_published
The query I'm trying is to generate a report that lists the book titles from both locations and count the total number of books with the same title.
I can't work out how to even get started with the count.
So far I have
SELECT title
FROM books_north
INNER JOIN books_south
ON books_north.title = books_south.title;
But it just says that title is an ambiguous column.
How do I do this? Thank you
You need UNION ALL to get the count of each title
Select Title,Count(1) as [Count]
From
(
SELECT title FROM books_north bn
union all
select title from books_south bs
) A
Group by Title
Another approach uses FULL OUTER JOIN (if your RDBMS supports it):
SELECT COALESCE(bn.Title, bs.title) as title,
( COALESCE(bn.[count], 0) + COALESCE(bs.[count], 0) ) AS [Count]
FROM (SELECT title,
Count(1) AS [count]
FROM books_north
GROUP BY title) bn
FULL OUTER JOIN (SELECT title,
Count(1) AS [count]
FROM books_south
GROUP BY title) bs
ON bn.Title = bs.Title
Regarding your error message: the Title column is present in both tables, so when you select it you need to tell the database which table to read it from. You can do that by giving the tables alias names in the join:
SELECT COUNT( DISTINCT a.title) AS TITLECount
FROM books_north a
INNER JOIN books_south b
ON a.title = b.title;
A simple inner join would be sufficient to get your count. Use a table alias in SELECT to remove the ambiguity of column title as it is present in both the tables.
In regard to the error that you mentioned:
The problem is SELECT title - it asks you to be specific about where title shall be read from, books_north or books_south.
So you need to write either SELECT books_north.title or SELECT books_south.title; that's all there is to the ambiguity error.
The count is explained in the other answers. You need to learn GROUP BY if you want to display title and count (it is basically a GROUP BY title in your case).
SELECT books_north.title, count(books_north.title)
FROM books_north
INNER JOIN books_south
ON books_north.title = books_south.title
GROUP BY books_north.title;
This works:
SELECT title, COUNT(title) count FROM
(
SELECT title FROM books_north
UNION ALL
SELECT title FROM books_south
)
GROUP BY title;
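To check the UNION ALL approach against the join-based attempts, here is a small sketch with Python's sqlite3 module. The titles are sample data, and the tables are reduced to id and title.

```python
import sqlite3

# Hypothetical inventory: 'Dune' is held twice in the north, 'Emma' in both
# locations, 'Ulysses' only in the south.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books_north (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE books_south (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO books_north (title) VALUES ('Dune'), ('Emma'), ('Dune');
    INSERT INTO books_south (title) VALUES ('Emma'), ('Ulysses');
""")

# Stack both tables with UNION ALL (which keeps duplicates), then count per title.
union_counts = conn.execute("""
    SELECT title, COUNT(title) AS total
      FROM (SELECT title FROM books_north
            UNION ALL
            SELECT title FROM books_south)
     GROUP BY title
     ORDER BY title
""").fetchall()

# For comparison: an inner join only sees titles present in both tables.
join_counts = conn.execute("""
    SELECT bn.title, COUNT(bn.title) AS total
      FROM books_north bn
      JOIN books_south bs ON bn.title = bs.title
     GROUP BY bn.title
     ORDER BY bn.title
""").fetchall()

print(union_counts)  # [('Dune', 2), ('Emma', 2), ('Ulysses', 1)]
print(join_counts)   # [('Emma', 1)]
```

With this data the join version loses 'Dune' and 'Ulysses' entirely and undercounts 'Emma', so UNION ALL is the safer fit for a report across both locations.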

SQL aggregate query error

I have 3 tables like this
player(id,name,age,teamid)
team(id,name,sponsor,totalplayer,totalchampion,boss,joindate)
playerdetail(id,playerid,position,number,allstar,joindate)
I want to select the team info, including name, sponsor, totalplayer, totalchampion, boss,
the average age of the players, and the number of all-star players.
I wrote the T-SQL below:
SELECT T.NAME,T.SPONSOR,T.TOTALPLAYER,T.TOTALCHAMPION,T.BOSS,T.JOINDATE,
AVG(P.AGE) AS AverageAge,COUNT(D.ALLSTAR) As AllStarPlayer
FROM Team T,Player P,PlayerDetail D
WHERE T.ID=P.TID AND P.ID=D.PID
but it doesn't work, the error message is
'Column 'Team.Name' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.'
Who can help me?
Thx in advance!
Add
GROUP BY
T.NAME,T.SPONSOR,T.TOTALPLAYER,T.TOTALCHAMPION,T.BOSS,T.JOINDATE
In most RDBMSs (except MySQL with ONLY_FULL_GROUP_BY disabled, which will guess for you), a column in the select list must be either aggregated (COUNT, AVG) or listed in the GROUP BY.
Also, you should use explicit JOINs.
This is clearer, less ambiguous, and makes it harder to bollix up your code:
SELECT
T.NAME, T.SPONSOR, T.TOTALPLAYER, T.TOTALCHAMPION, T.BOSS, T.JOINDATE,
AVG(P.AGE) AS AverageAge,
COUNT(D.ALLSTAR) As AllStarPlayer
FROM
Team T
JOIN
Player P ON T.ID=P.TID
JOIN
PlayerDetail D ON P.ID=D.PID
GROUP BY
T.NAME, T.SPONSOR, T.TOTALPLAYER, T.TOTALCHAMPION, T.BOSS, T.JOINDATE;
Given that you want this data per team, and team.ID uniquely identifies team, I suggest the following:
SELECT max(T.NAME) As TeamName,
max(T.SPONSOR) As Sponsor,
max(T.TOTALPLAYER) As TotalPlayers,
max(T.TOTALCHAMPION) As TotalChampions,
max(T.BOSS) As Boss,
max(T.JOINDATE) As JoinDate,
AVG(P.AGE) AS AverageAge,
COUNT(D.PID) As AllStarPlayer
FROM Team T
join Player P on T.ID=P.TID
left join PlayerDetail D on P.ID=D.PID and D.ALLSTAR = 'Y'
group by T.ID
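Here is a minimal sketch of that last query, run with Python's sqlite3 module on invented data. It assumes, as the query does, that ALLSTAR holds 'Y' for all-star players and that the join columns are named TID and PID. Neither assumption is confirmed by the table definitions in the question.

```python
import sqlite3

# Hypothetical data; the ALLSTAR = 'Y' encoding and the TID / PID column
# names are assumptions taken from the query above, not from the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE Player (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, TID INTEGER);
    CREATE TABLE PlayerDetail (ID INTEGER PRIMARY KEY, PID INTEGER, ALLSTAR TEXT);
    INSERT INTO Team VALUES (1, 'Team A'), (2, 'Team B');
    INSERT INTO Player VALUES (1, 'p1', 20, 1), (2, 'p2', 30, 1), (3, 'p3', 40, 1),
                              (4, 'p4', 25, 2), (5, 'p5', 35, 2);
    INSERT INTO PlayerDetail VALUES (1, 1, 'Y'), (2, 2, 'N'), (3, 4, 'Y'), (4, 5, 'Y');
""")

# The ALLSTAR filter lives in the LEFT JOIN condition, so players who are not
# all-stars (or have no detail row) still count towards the average age,
# while COUNT(D.PID) counts only the matched all-star rows.
team_rows = conn.execute("""
    SELECT MAX(T.NAME)  AS TeamName,
           AVG(P.AGE)   AS AverageAge,
           COUNT(D.PID) AS AllStarPlayer
      FROM Team T
      JOIN Player P ON T.ID = P.TID
      LEFT JOIN PlayerDetail D ON P.ID = D.PID AND D.ALLSTAR = 'Y'
     GROUP BY T.ID
     ORDER BY T.ID
""").fetchall()

print(team_rows)  # [('Team A', 30.0, 1), ('Team B', 30.0, 2)]
```

Moving D.ALLSTAR = 'Y' into a WHERE clause would turn the LEFT JOIN into an inner join in effect, and the average age would then be computed over all-star players only.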
Use:
SELECT T.NAME,T.SPONSOR,T.TOTALPLAYER,T.TOTALCHAMPION,T.BOSS,T.JOINDATE,
AVG(P.AGE) AS AverageAge,COUNT(D.ALLSTAR) As AllStarPlayer
FROM Team T
JOIN Player P ON T.ID = P.TEAMID
JOIN PlayerDetail D ON P.ID = D.PLAYERID
GROUP BY T.NAME,T.SPONSOR,T.TOTALPLAYER,T.TOTALCHAMPION,T.BOSS,T.JOINDATE