Understanding summary of inner query - sql

Problem understanding subquery
I do not understand this example from www.sqlitetutorial.net/sqlite-subquery :
Only one number is returned by the inner query: 1422138358
But the average of this number is different:
So why is the average of 1422138358 not 1422138358? The two queries are not independent? If I remove "ORDER BY albumid" the result is the same:
Example data:
http://www.sqlitetutorial.net/sqlite-sample-database/
Edit: OK, there is probably some integer overflow going on, as the columns are integers, but I still don't understand why the example takes the average of a single number.

Quite possibly it was a mistake.
1)
From the text you can see that they wanted to 'sum the size of an album', and you're querying the Tracks table, which presumably has an albumID column.
2)
ORDER BY has no effect if you select only an aggregated column (SQLite accepts it, stricter engines such as SQL Server reject it),
such as
select SUM(bytes)
from Tracks
Order by albumID
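because the result is a single row, so there is nothing to order.
Also note that some engines (SQL Server, for example) do not allow ORDER BY in a subquery unless it is combined with TOP or similar; SQLite does allow it.
Finally, what was missing here was the rest of the query: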
Select AVG(album.size) as [avg(album.size)]
from (
select albumID,SUM(bytes) as size
from Tracks
GROUP BY albumID
) as album
You can learn more about subqueries here
And if you want to play around with these, here's code you can replicate and use for further exercises on that website:
CREATE TABLE tracks (AlbumID int, bytes int);
CREATE TABLE albums (AlbumID int, title nvarchar(50));
INSERT INTO tracks VALUES (1,2),(2,10),(3,15);
Select AVG(album.size) as [avg(album.size)]
from (
select AlbumID,SUM(bytes) as size
from tracks
GROUP BY albumID
) as album
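If you would rather try it from a script, here is a minimal sketch using Python's built-in sqlite3 module with the same three toy rows. The variable names are only for illustration. It shows that SUM without GROUP BY gives a single number, while the grouped inner query gives one size per album, which is what the outer AVG is meant to average.

```python
import sqlite3

# In-memory database holding the same toy rows as above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (AlbumID int, bytes int);
    INSERT INTO tracks VALUES (1, 2), (2, 10), (3, 15);
""")

# Without GROUP BY the aggregate collapses the whole table into one row.
total_rows = conn.execute("SELECT SUM(bytes) FROM tracks").fetchall()

# With GROUP BY the inner query yields one size per album.
per_album = conn.execute("""
    SELECT AlbumID, SUM(bytes) AS size
    FROM tracks
    GROUP BY AlbumID
    ORDER BY AlbumID
""").fetchall()

# The outer query then averages those per-album sizes.
avg_size = conn.execute("""
    SELECT AVG(album.size)
    FROM (SELECT AlbumID, SUM(bytes) AS size
          FROM tracks
          GROUP BY AlbumID) AS album
""").fetchone()[0]

print(total_rows)  # one row only
print(per_album)   # one row per album
print(avg_size)    # average of the per-album sums
```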
Hope it helps

Related

How can I output a table containing the count of all items that match this condition in SQL?

I'm working in CS50 pset7 SQLite queries, and I'm stuck in this problem:
write a SQL query to determine the number of movies with an IMDb rating of 10.0. Your query should output a table with a single column and a single row (plus optional header) containing the number of movies with a 10.0 rating.
So basically what I have to do is go into a table called 'ratings',
which has the structure of the image above, and get the number of items in the column rating that have a value of 10.0.
I have tried count(SELECT * FROM ratings WHERE rating=10.0) but I believe count doesn't work like that...
Hopefully you can help me! Thanks!
Try the below -
select count(*)
FROM ratings WHERE rating=10.0
To return the number of matching rows, use SQL's count(*) function:
SELECT count(*) FROM ratings WHERE rating = 10.0;
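For a quick check outside the CS50 environment, here is a small sketch with Python's sqlite3 module. The ratings table below is a made-up stand-in (the movie_id column and the sample rows are assumptions, not the real CS50 data). It shows that the query returns a single row with a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the real ratings table.
conn.executescript("""
    CREATE TABLE ratings (movie_id INTEGER, rating REAL);
    INSERT INTO ratings VALUES (1, 10.0), (2, 9.5), (3, 10.0), (4, 7.2);
""")

# COUNT(*) is applied to the rows that survive the WHERE filter.
result = conn.execute(
    "SELECT COUNT(*) FROM ratings WHERE rating = 10.0"
).fetchall()

print(result)  # a list with one single-column row
```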

Get the row of each different column A with highest value for another column B

I know the title is a bit messy, but I will show the problem in black and white right now.
I have a table like this one:
CREATE TABLE items (
item_id int primary key,
item_type int,
item_value int
);
The actual table is a bit different, but this is a simplified version for the sake of understanding.
Now, what I want to get in a SELECT query is those with the greatest item_value for each different item_type.
I have tried something like:
SELECT item_id,
item_type,
item_value
FROM items
GROUP BY item_type
ORDER BY item_value DESC;
That seems to do the trick, but it takes aeons to run, and I think it is utterly suboptimal. For that matter it would be faster to do one query for each type, but I wonder if there is a way to do the same in only one query with a join or so.
Thanks a lot in advance!
Standard SQL forbids this, but in SQLite 3.7.11 or later, you can select a row from a group with MAX():
SELECT item_id,
item_type,
MAX(item_value) AS item_value
FROM items
GROUP BY item_type;
To make this query efficient, you need an index on the item_type column.
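Here is a small sketch of that behaviour using Python's sqlite3 module and made-up sample rows. It also includes a join against a per-type maximum, which is a different technique from the one above: it does not rely on the SQLite extension, but it returns several rows per type if the maximum is tied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        item_id int primary key,
        item_type int,
        item_value int
    );
    -- Made-up rows: two types, no ties on the maximum value.
    INSERT INTO items VALUES
        (1, 1, 10), (2, 1, 30), (3, 2, 5), (4, 2, 7), (5, 2, 6);
""")

# SQLite extension: item_id is taken from the row that holds MAX(item_value).
sqlite_rows = conn.execute("""
    SELECT item_id, item_type, MAX(item_value) AS item_value
    FROM items
    GROUP BY item_type
    ORDER BY item_type
""").fetchall()

# Alternative: join each row against the maximum of its type.
portable_rows = conn.execute("""
    SELECT i.item_id, i.item_type, i.item_value
    FROM items AS i
    JOIN (SELECT item_type, MAX(item_value) AS max_value
          FROM items
          GROUP BY item_type) AS m
      ON m.item_type = i.item_type AND m.max_value = i.item_value
    ORDER BY i.item_type
""").fetchall()

print(sqlite_rows)
print(portable_rows)
```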
The query suggested by CL seems to take the same amount of time as mine (measured with .time on), although his query looks clearer to me.
Indexing only item_type doesn't seem to make a difference for either query. What finally worked was creating an index on all three columns:
CREATE INDEX idx_items_tvi ON items(item_type, item_value, item_id)
After that, the speed has improved A LOT (from about three seconds to just one half of a second ceteris paribus).
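For anyone who wants to reproduce this, here is a rough sketch with Python's sqlite3 module. The generated rows are made up, and the timings on a toy table will not match the ones above. It only checks that the result is unchanged by the index and prints the plan SQLite reports, so you can see whether the index is picked up on your version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_id int primary key, item_type int, item_value int)"
)
# Made-up data: 1000 rows, 5 types, all item_value values distinct.
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(i, i % 5, (i * 7) % 1009) for i in range(1, 1001)],
)

query = """
    SELECT item_id, item_type, MAX(item_value) AS item_value
    FROM items
    GROUP BY item_type
    ORDER BY item_type
"""

before = conn.execute(query).fetchall()

# Index on all three columns, so the query can be answered from the index alone.
conn.execute("CREATE INDEX idx_items_tvi ON items(item_type, item_value, item_id)")

after = conn.execute(query).fetchall()

# The last column of each plan row is the human-readable detail.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row[-1])
```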

Complex sql select

I can't figure out how to make this sql select statement...Here are my tables :
I opened the tables concerned by the request
So basically I want to select the number of albums for each interpret.
I just can't figure it out... I am currently thinking that I need to do my first select on album like :
select interpret.no_interpret, count(*)
from album
.
.
.
group by interpret.no_interpret;
and then work from this, but I don't know where to go next.
I may be missing something, but I'm not seeing the direct relation from your song table to the album...
I would first start by getting the link_interpret_song table joined to the song table and get count of distinct albums. However, I didn't see what appears to be a "No_Album" column in the field list of the song table. I can only guess it IS in there associated to the particular album. I did see media, but to me, that would be like a TYPE of media (digital, download, vinyl, CD) vs the actual ID Key apparent to the album table.
That said, I am thinking there IS such a "No_Album" column in the SONG table.
select
LIS.No_Interpret,
COUNT( DISTINCT S.No_Album )
from
Link_Interpret_Song LIS
JOIN Song S
on LIS.No_Song = S.No_Song
group by
LIS.No_Interpret;
Now, that said, if you want the interpret details, take the above results and join that to the interpret table. I've done both distinct album count and total # of songs just as an example of count() vs count(distinct) context... such as
select
PreCounts.No_Interpret,
PreCounts.DistinctAlbums,
PreCounts.ActualSongs,
I.Name_Interpret,
I.First_Name,
I.Stage_Name
from
( select
LIS.No_Interpret,
COUNT( DISTINCT S.No_Album ) as DistinctAlbums,
COUNT(*) as ActualSongs
from
Link_Interpret_Song LIS
JOIN Song S
on LIS.No_Song = S.No_Song
group by
LIS.No_Interpret ) as PreCounts
JOIN Interpret I
ON PreCounts.No_Interpret = I.No_Interpret
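To make the count() vs count(distinct) difference concrete, here is a small sketch with Python's sqlite3 module. The schema is guessed: the No_Album column on Song is the same assumption as above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Guessed schema; No_Album on Song is an assumption.
conn.executescript("""
    CREATE TABLE Song (No_Song INTEGER PRIMARY KEY, No_Album INTEGER);
    CREATE TABLE Link_Interpret_Song (No_Interpret INTEGER, No_Song INTEGER);

    -- Songs 1 and 2 are on album 10, song 3 is on album 11.
    INSERT INTO Song VALUES (1, 10), (2, 10), (3, 11);
    -- Interpret 1 performs all three songs, interpret 2 only song 3.
    INSERT INTO Link_Interpret_Song VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")

counts = conn.execute("""
    SELECT LIS.No_Interpret,
           COUNT(DISTINCT S.No_Album) AS DistinctAlbums,
           COUNT(*) AS ActualSongs
    FROM Link_Interpret_Song LIS
    JOIN Song S ON LIS.No_Song = S.No_Song
    GROUP BY LIS.No_Interpret
    ORDER BY LIS.No_Interpret
""").fetchall()

print(counts)  # (interpret, distinct albums, songs) per interpret
```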
The question is ambiguous, since there is no clear indication of how the tables are related. Under some assumptions about those relations, your query will likely take a form similar to the following:
SELECT i.no_interpret, COUNT(DISTINCT a.no_album)
FROM album a, interpret i, song s
WHERE i.no_song = s.no_song
AND a.no_album = s.no_album
GROUP BY i.no_interpret

correlated query to update a table based on a select

I have two tables, Genre and Songs. There is obviously a many-to-many relationship between them, as one genre can have many songs and one song may belong to many genres (say there is a song xyz that belongs to rap; it can also belong to hip-hop). I have a table GenreSongs that acts as the many-to-many map between the two, as it contains GenreID and SongID columns. What I am supposed to do is add a column named SongsCount to the Genre table, which will contain the number of songs in that genre. I can alter the table to add the column, and also write a query that gives the count of songs:
SELECT GenreID, Count(SongID) FROM GenreSongs GROUP BY GenreID
Now, this gives us what we require, the number of songs per genre, but how can I use this query to update the column I made (SongsCount)? One way is to run this query, look at the results, and then manually update that column, but I am sure everyone will agree that's not a programmatic way to do it.
I think I need a query with a subquery that takes the value of GenreID from the outer query and counts its rows in the inner query (a correlated query), but I can't work out how to write one. Can anyone please help me with this?
The question of how to approach this depends on the size of your data and how frequently it is updated. Here are some scenarios.
If your songs are updated quite frequently and your tables are quite large, then you might want to have a column in Genre with the count, and update the column using a trigger on the Songs table.
Alternatively, you could build an index on the Genre column of the GenreSong table. Then the following query:
select count(*)
from GenreSong gs
where genre = <whatever>
should run quite fast.
If your songs are updated infrequently or in a batch (say nightly or weekly), then you can update the song count as part of the batch. Your query might look like:
update Genre
set SongCnt = cnt
from (select Genre, count(*) as cnt from GenreSong gs group by Genre) gc
where Genre.genre = gc.Genre
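Not every engine accepts UPDATE ... FROM, so here is a sketch of the correlated-subquery form, which is a different technique from the statement above. It is run through Python's sqlite3 module. The table and column names follow the question (Genre, GenreSongs, GenreID, SongsCount), but the Name column and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Names follow the question; Name and the rows are made up.
    CREATE TABLE Genre (GenreID INTEGER PRIMARY KEY, Name TEXT, SongsCount INTEGER);
    CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);

    INSERT INTO Genre (GenreID, Name) VALUES (1, 'rap'), (2, 'hip-hop'), (3, 'jazz');
    INSERT INTO GenreSongs VALUES (1, 100), (1, 101), (2, 100);
""")

# For each Genre row, the inner query counts the mapping rows that
# reference that row's GenreID (this is the correlation).
conn.execute("""
    UPDATE Genre
    SET SongsCount = (SELECT COUNT(*)
                      FROM GenreSongs
                      WHERE GenreSongs.GenreID = Genre.GenreID)
""")

updated = conn.execute(
    "SELECT GenreID, SongsCount FROM Genre ORDER BY GenreID"
).fetchall()
print(updated)  # genres without songs get 0, not NULL
```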
And yet another possibility is that you don't need to store the value at all. You can make it part of a view/query that does the calculation on the fly.
Relational databases are quite flexible, and there is often more than one way to do things. The right approach depends very much on what you are trying to accomplish.
Adding a SongsCount column is plainly bad design (redundant data and update overhead). Instead, use this query for a single result:
SELECT ID, ..., (SELECT Count(*) FROM GenreSongs WHERE GenreID = X) AS SongsCount FROM Genre WHERE ID = X
And this for multiple results (much more efficient):
SELECT ID, ..., SongsCount FROM (SELECT GenreID, Count(*) AS SongsCount FROM GenreSongs GROUP BY GenreID) AS sub RIGHT JOIN Genre AS g ON sub.GenreID = g.ID
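A sketch of the on-the-fly approach wrapped in a view, run through Python's sqlite3 module. It is written as a LEFT JOIN starting from Genre, which should be equivalent to the RIGHT JOIN above and also works on engines or versions without RIGHT JOIN. The view name GenreWithCount, the Name column, the COALESCE to 0, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up sample data; Name is a hypothetical extra column.
    CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);

    INSERT INTO Genre VALUES (1, 'rap'), (2, 'hip-hop'), (3, 'jazz');
    INSERT INTO GenreSongs VALUES (1, 100), (1, 101), (2, 100);

    -- Hypothetical view: the count is computed when queried, never stored.
    CREATE VIEW GenreWithCount AS
    SELECT g.ID, g.Name, COALESCE(sub.SongsCount, 0) AS SongsCount
    FROM Genre AS g
    LEFT JOIN (SELECT GenreID, COUNT(*) AS SongsCount
               FROM GenreSongs
               GROUP BY GenreID) AS sub
      ON sub.GenreID = g.ID;
""")

genres = conn.execute(
    "SELECT ID, Name, SongsCount FROM GenreWithCount ORDER BY ID"
).fetchall()
print(genres)
```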

Aggregation with two Joins (MySQL)

I have one table called gallery. For each row in gallery there are several rows in the table picture. One picture belongs to one gallery. Then there is the table vote. There each row is an upvote or a downvote for a certain gallery.
Here is the (simplified) structure:
gallery ( gallery_id )
picture ( picture_id, picture_gallery_ref )
vote ( vote_id, vote_value, vote_gallery_ref )
Now I want one query to give me the following information: all galleries with their own data fields, the number of pictures connected to each gallery, and the summed value of its votes.
Here is my query, but due to the multiple joining the aggregated values are not the right ones. (At least when there is more than one row of either pictures or votes.)
SELECT
*, SUM( vote_value ) as score, COUNT( picture_id ) AS pictures
FROM
gallery
LEFT JOIN
vote
ON gallery_id = vote_gallery_ref
LEFT JOIN
picture
ON gallery_id = picture_gallery_ref
GROUP BY gallery_id
Because I have noticed that COUNT( DISTINCT picture_id ) gives me the correct number of pictures I tried this:
( SUM( vote_value ) / GREATEST( COUNT( DISTINCT picture_id ), 1 ) ) AS score
It works in this example, but what if there were more joins in one query?
Just want to know whether there is a better or more 'elegant' way this problem can be solved. Also I'd like to know whether my solution is MySQL-specific or standard SQL?
This quote from William of Ockham applies here:
Entia non sunt multiplicanda praeter necessitatem
(Latin for "entities are not to be multiplied beyond necessity").
You should reconsider why you need this done in a single query. It's true that a single query has less overhead than multiple queries, but if that single query becomes too complex, both for you to develop and for the RDBMS to execute, then run separate queries.
Or just use subqueries...
I don't know if this is valid MySQL syntax, but you might be able to do something similar to:
SELECT
gallery.*, a.score, b.pictures
FROM
gallery
LEFT JOIN
(
select vote_gallery_ref, sum(vote_value) as score
from vote
group by vote_gallery_ref
) a ON gallery_id = vote_gallery_ref
LEFT JOIN
(
select picture_gallery_ref, count(picture_id) as pictures
from picture
group by picture_gallery_ref
) b ON gallery_id = picture_gallery_ref
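To see why the double join inflates the numbers and how pre-aggregating avoids it, here is a small sketch. It runs on SQLite through Python's sqlite3 module rather than MySQL, so treat it as an illustration of the idea only. The sample rows are made up: gallery 1 has two votes and three pictures, gallery 2 has one picture and no votes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gallery (gallery_id INTEGER PRIMARY KEY);
    CREATE TABLE picture (picture_id INTEGER PRIMARY KEY, picture_gallery_ref INTEGER);
    CREATE TABLE vote (vote_id INTEGER PRIMARY KEY, vote_value INTEGER, vote_gallery_ref INTEGER);

    INSERT INTO gallery VALUES (1), (2);
    INSERT INTO picture VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO vote VALUES (1, 1, 1), (2, 1, 1);
""")

# Joining both child tables first pairs every vote with every picture
# (2 x 3 = 6 rows for gallery 1), so both aggregates are inflated.
naive = conn.execute("""
    SELECT gallery_id, SUM(vote_value) AS score, COUNT(picture_id) AS pictures
    FROM gallery
    LEFT JOIN vote ON gallery_id = vote_gallery_ref
    LEFT JOIN picture ON gallery_id = picture_gallery_ref
    GROUP BY gallery_id
    ORDER BY gallery_id
""").fetchall()

# Aggregating each child table on its own yields at most one row per
# gallery, so the outer joins cannot multiply anything.
fixed = conn.execute("""
    SELECT gallery.gallery_id, a.score, b.pictures
    FROM gallery
    LEFT JOIN (SELECT vote_gallery_ref, SUM(vote_value) AS score
               FROM vote
               GROUP BY vote_gallery_ref) AS a
      ON gallery.gallery_id = a.vote_gallery_ref
    LEFT JOIN (SELECT picture_gallery_ref, COUNT(picture_id) AS pictures
               FROM picture
               GROUP BY picture_gallery_ref) AS b
      ON gallery.gallery_id = b.picture_gallery_ref
    ORDER BY gallery.gallery_id
""").fetchall()

print(naive)  # inflated score and picture count for gallery 1
print(fixed)  # correct values; score is NULL/None where there are no votes
```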
How often do you add/change vote records?
How often do you add/remove picture records?
How often do you run this query for these totals?
It might be better to create total fields on the gallery table (total_pictures, total_votes, total_vote_values).
When you add or remove a record on the picture table you also update the total on the gallery table. This could be done using triggers on the picture table to automatically update the gallery table. It could also be done using a transaction combining two SQL statements to update the picture table and the gallery table. When you add a record on the picture table increment the total_pictures field on the gallery table. When you delete a record on the picture table decrement the total_pictures field.
Similarly, when a vote record is added or removed or its vote_value changes, you update the total_votes and total_vote_values fields. Adding a record increments the total_votes field and adds its vote_value to total_vote_values. Deleting a record decrements the total_votes field and subtracts its vote_value from total_vote_values. Updating vote_value on a vote record should also update total_vote_values with the difference (subtract old value, add new value).
Your query now becomes trivial - it's just a straightforward query from the gallery table. But this is at the expense of more complex updates to the picture and vote tables.
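A minimal sketch of the trigger variant for the picture counter, run on SQLite through Python's sqlite3 module. MySQL's trigger syntax differs in places, so take this as the shape of the idea, not something to paste in. The trigger names are made up, and only total_pictures is maintained here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gallery (
        gallery_id INTEGER PRIMARY KEY,
        total_pictures INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE picture (
        picture_id INTEGER PRIMARY KEY,
        picture_gallery_ref INTEGER
    );

    -- Hypothetical trigger names; each keeps the stored total in step.
    CREATE TRIGGER picture_added AFTER INSERT ON picture
    BEGIN
        UPDATE gallery
        SET total_pictures = total_pictures + 1
        WHERE gallery_id = NEW.picture_gallery_ref;
    END;

    CREATE TRIGGER picture_removed AFTER DELETE ON picture
    BEGIN
        UPDATE gallery
        SET total_pictures = total_pictures - 1
        WHERE gallery_id = OLD.picture_gallery_ref;
    END;

    INSERT INTO gallery (gallery_id) VALUES (1);
""")

# Three inserts followed by one delete should leave a total of 2.
conn.executemany(
    "INSERT INTO picture (picture_id, picture_gallery_ref) VALUES (?, 1)",
    [(1,), (2,), (3,)],
)
conn.execute("DELETE FROM picture WHERE picture_id = 2")

total = conn.execute(
    "SELECT total_pictures FROM gallery WHERE gallery_id = 1"
).fetchone()[0]
print(total)
```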
As Bill Karwin said, doing this all within one query is pretty ugly.
But, if you have to do it, joining and selecting non-aggregate data with aggregate data requires joining against subqueries (I haven't used SQL that much in the past few years so I actually forgot the proper term for this).
Let's assume your gallery table has additional fields name and state:
select g.gallery_id, g.name, g.state, i.num_pictures, j.sum_vote_values
from gallery g
inner join (
select g.gallery_id, count(p.picture_id) as 'num_pictures'
from gallery g
left join picture p on g.gallery_id = p.picture_gallery_ref
group by g.gallery_id) as i on g.gallery_id = i.gallery_id
left join (
select g.gallery_id, sum(v.vote_value) as 'sum_vote_values'
from gallery g
left join vote v on g.gallery_id = v.vote_gallery_ref
group by g.gallery_id
) as j on g.gallery_id = j.gallery_id
This will yield a result set that looks like:
gallery_id, name, state, num_pictures, sum_vote_values
1, 'Gallery A', 'NJ', 4, 19
2, 'Gallery B', 'NY', 3, 32
3, 'Empty gallery', 'CT', 0, NULL