correlated query to update a table based on a select - sql

I have two tables, Genre and Songs. There is a many-to-many relationship between them: one genre can have many songs, and one song may belong to many genres (say there is a song xyz; it belongs to rap, but it can also belong to hip-hop). A third table, GenreSongs, maps this many-to-many relationship; it contains GenreID and SongID columns. What I am supposed to do is add a column named SongsCount to the Genre table, holding the number of songs in that genre. I can alter the table to add the column, and I can write a query that gives the count of songs per genre:
SELECT GenreID, Count(SongID) FROM GenreSongs GROUP BY GenreID
Now, this gives us what we require, the number of songs per genre, but how can I use this query to update the column I made (SongsCount)? One way is to run this query, look at the results, and then manually update that column, but I am sure everyone will agree that's not a programmatic way to do it.
I think I need a query with a subquery that takes the GenreID from the outer query and counts the matching rows in the inner query (a correlated subquery), but I can't work out how to write it. Can anyone please help me with this?

The question of how to approach this depends on the size of your data and how frequently it is updated. Here are some scenarios.
If your songs are updated quite frequently and your tables are quite large, then you might want to have a column in Genre with the count, and keep it up to date with a trigger on the table where genre membership changes (here, GenreSongs).
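A minimal sketch of the trigger approach, run through Python's sqlite3 module so it is self-contained. The column names on Genre (GenreID, Name) and the trigger names are assumptions, and the triggers here sit on the GenreSongs link table because that is where a song's genre membership is recorded. Trigger syntax differs between database engines, so treat this as an illustration rather than something to paste in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema: Genre.GenreID / Genre.Name are illustrative names.
CREATE TABLE Genre (
    GenreID    INTEGER PRIMARY KEY,
    Name       TEXT,
    SongsCount INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE GenreSongs (
    GenreID INTEGER,
    SongID  INTEGER,
    PRIMARY KEY (GenreID, SongID)
);

-- Bump the stored count whenever a song is linked to a genre...
CREATE TRIGGER genresongs_after_insert AFTER INSERT ON GenreSongs
BEGIN
    UPDATE Genre SET SongsCount = SongsCount + 1 WHERE GenreID = NEW.GenreID;
END;

-- ...and lower it again when the link is removed.
CREATE TRIGGER genresongs_after_delete AFTER DELETE ON GenreSongs
BEGIN
    UPDATE Genre SET SongsCount = SongsCount - 1 WHERE GenreID = OLD.GenreID;
END;

INSERT INTO Genre (GenreID, Name) VALUES (1, 'rap'), (2, 'hip-hop');
INSERT INTO GenreSongs VALUES (1, 10), (1, 11), (2, 10);
DELETE FROM GenreSongs WHERE GenreID = 1 AND SongID = 11;
""")

trigger_counts = conn.execute(
    "SELECT GenreID, SongsCount FROM Genre ORDER BY GenreID"
).fetchall()
print(trigger_counts)
```

The trade-off is that every insert or delete on the link table now pays for one extra update, in exchange for reads that never have to count.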
Alternatively, you could build an index on GenreID in the GenreSongs table. Then the following query:
select count(*)
from GenreSongs gs
where gs.GenreID = <whatever>
should run quite fast.
If your songs are updated infrequently or in a batch (say nightly or weekly), then you can update the song count as part of the batch. Your query might look like:
update Genre
set SongsCount = gc.cnt
from (select GenreID, count(*) as cnt from GenreSongs group by GenreID) gc
where Genre.GenreID = gc.GenreID
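The batch update can also be written as the correlated subquery the question asks about, which avoids the engine-specific UPDATE ... FROM form. This is a sketch using Python's sqlite3 module; the Genre key name (GenreID), the Name column, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema and sample data, for illustration only.
CREATE TABLE Genre (GenreID INTEGER PRIMARY KEY, Name TEXT, SongsCount INTEGER);
CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);
INSERT INTO Genre (GenreID, Name) VALUES (1, 'rap'), (2, 'hip-hop'), (3, 'jazz');
INSERT INTO GenreSongs VALUES (1, 10), (1, 11), (2, 10);

-- Correlated subquery: the inner SELECT is evaluated once per Genre row,
-- using that row's GenreID from the outer UPDATE.
UPDATE Genre
SET SongsCount = (SELECT COUNT(*)
                  FROM GenreSongs gs
                  WHERE gs.GenreID = Genre.GenreID);
""")

batch_counts = conn.execute(
    "SELECT GenreID, SongsCount FROM Genre ORDER BY GenreID"
).fetchall()
print(batch_counts)  # [(1, 2), (2, 1), (3, 0)]
```

One difference from the join form: a genre with no songs ends up with 0 here, whereas an update driven by a join against the grouped counts would leave that row untouched.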
And yet another possibility is that you don't need to store the value at all. You can make it part of a view/query that does the calculation on the fly.
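As a sketch of the view option, again through Python's sqlite3 module: the view name GenreWithCount, the Genre column names, and the sample rows are assumptions. The count is computed at query time, so nothing needs to be kept in sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema and sample data, for illustration only.
CREATE TABLE Genre (GenreID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);
INSERT INTO Genre VALUES (1, 'rap'), (2, 'hip-hop'), (3, 'jazz');
INSERT INTO GenreSongs VALUES (1, 10), (1, 11), (2, 10);

-- LEFT JOIN keeps genres that have no songs; COUNT(gs.SongID) ignores
-- the NULLs those rows produce, so they are reported as 0.
CREATE VIEW GenreWithCount AS
SELECT g.GenreID, g.Name, COUNT(gs.SongID) AS SongsCount
FROM Genre g
LEFT JOIN GenreSongs gs ON gs.GenreID = g.GenreID
GROUP BY g.GenreID, g.Name;
""")

view_rows = conn.execute(
    "SELECT GenreID, Name, SongsCount FROM GenreWithCount ORDER BY GenreID"
).fetchall()
print(view_rows)  # [(1, 'rap', 2), (2, 'hip-hop', 1), (3, 'jazz', 0)]
```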
Relational databases are quite flexible, and there is often more than one way to do things. The right approach depends very much on what you are trying to accomplish.

Storing a SongsCount column is just plainly bad design (redundant data and update overhead). Instead, use this query for a single result:
SELECT ID, ..., (SELECT Count(*) FROM GenreSongs WHERE GenreID = X) AS SongsCount FROM Genre WHERE ID = X
And this for multiple results (much more efficient):
SELECT ID, ..., SongsCount FROM (SELECT GenreID, Count(*) AS SongsCount FROM GenreSongs GROUP BY GenreID) AS sub RIGHT JOIN Genre AS g ON sub.GenreID = g.ID

How to design a query in WHERE clause of all column that contain same data value?

I have a table, the columns are:
Respondent_ID, classical, gospel, pop, kpop, country, folk, rock, metal ... (all genre of music)
there are 16 columns of different type of genre of music,
and data value is Never, Rarely, Sometimes or Very frequently
SELECT *
FROM genre_frequency
WHERE
I want to design a query that shows the results for all columns in the table that have the value 'Very frequently'. Can anyone lend me a hand here? I'm still new to this, so any help would be appreciated.
Could put the same criteria under every genre field with OR operator - very messy. Or could use a VBA custom function.
Or could normalize data structure so you have fields: RespondentID, Genre, Frequency. A UNION query can rearrange data to this normalized structure (unpivot). There is a limit of 50 SELECT lines and there is no builder or wizard for UNION - must type or copy/paste in SQL View.
SELECT Respondent_ID, "classical" AS Genre, classical AS Frequency FROM genre_frequency
UNION SELECT Respondent_ID, "gospel", gospel FROM genre_frequency
... {continue for additional genre columns};
Now use that query like a table in subsequent queries. Just cannot edit data.
SELECT * FROM qryUNION WHERE Frequency="Very frequently";
UNION query can perform slowly with very large dataset. Probably would be best to redesign table. Could save this rearranged data to a table. If you want to utilize lookup tables for Genre and Frequency in order to save ID keys instead of full descriptive text, that can also be accommodated in redesign.
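A small runnable sketch of the unpivot idea, using Python's sqlite3 module rather than Access. Only three of the genre columns are included, the sample rows are made up, and string literals use single quotes because SQLite reads double-quoted names as identifiers. UNION ALL is used instead of UNION since the rows cannot collide and no duplicate check is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down version of the wide table, with made-up sample data.
CREATE TABLE genre_frequency (
    Respondent_ID INTEGER PRIMARY KEY,
    classical TEXT, gospel TEXT, pop TEXT
);
INSERT INTO genre_frequency VALUES
    (1, 'Never',           'Very frequently', 'Rarely'),
    (2, 'Very frequently', 'Sometimes',       'Very frequently');

-- One SELECT per genre column turns columns into rows.
CREATE VIEW qryUNION AS
SELECT Respondent_ID, 'classical' AS Genre, classical AS Frequency FROM genre_frequency
UNION ALL SELECT Respondent_ID, 'gospel', gospel FROM genre_frequency
UNION ALL SELECT Respondent_ID, 'pop', pop FROM genre_frequency;
""")

very_frequent = conn.execute("""
    SELECT Respondent_ID, Genre
    FROM qryUNION
    WHERE Frequency = 'Very frequently'
    ORDER BY Respondent_ID, Genre
""").fetchall()
print(very_frequent)  # [(1, 'gospel'), (2, 'classical'), (2, 'pop')]
```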
You should normalize your schema. This one has the problem that it requires you to alter the table whenever you want to add or remove a genre.
You must have at least three tables:
Table Respondent: Respondent_ID (PK), Name, etc.
Table Genre: Genre_ID (PK), Name
Table Respondent_Genre: Respondent_ID (PK, FK), Genre_ID (PK, FK), Frequency
This also easily allows you to alter the name of a genre or to add additional attributes to a genre like sub-genre or an annotation like (1930–present).
Optionally, you could also have a lookup table for Frequencies and then include the Frequency_ID in Respondent_Genre instead of storing the Frequency as text.
Then you can write a query like this
SELECT r.Name, g.Name, rg.Frequency
FROM
(Respondent r
INNER JOIN Respondent_Genre rg
ON r.Respondent_ID = rg.Respondent_ID)
INNER JOIN Genre g
ON rg.Genre_ID = g.Genre_ID
WHERE
rg.Frequency = 'Very Frequently'
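To try the normalized layout end to end, here is a sketch in Python's sqlite3 module. The respondent names and sample rows are made up, and the Access-style parentheses around the first join are dropped because SQLite does not need them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Respondent (Respondent_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Genre (Genre_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Respondent_Genre (
    Respondent_ID INTEGER REFERENCES Respondent (Respondent_ID),
    Genre_ID      INTEGER REFERENCES Genre (Genre_ID),
    Frequency     TEXT,
    PRIMARY KEY (Respondent_ID, Genre_ID)
);

-- Made-up sample data.
INSERT INTO Respondent VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO Genre VALUES (1, 'classical'), (2, 'pop');
INSERT INTO Respondent_Genre VALUES
    (1, 1, 'Very Frequently'),
    (1, 2, 'Rarely'),
    (2, 2, 'Very Frequently');
""")

frequent_listeners = conn.execute("""
    SELECT r.Name, g.Name, rg.Frequency
    FROM Respondent r
    INNER JOIN Respondent_Genre rg ON r.Respondent_ID = rg.Respondent_ID
    INNER JOIN Genre g ON rg.Genre_ID = g.Genre_ID
    WHERE rg.Frequency = 'Very Frequently'
    ORDER BY r.Name, g.Name
""").fetchall()
print(frequent_listeners)
```

Adding a seventeenth genre is now an INSERT into Genre rather than an ALTER TABLE.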

DISTINCT in a simple SQL query

When executing SQL queries I have been trying to figure out the following:
In this example:
SELECT DISTINCT AL.id, AL.name
FROM albums AL
why is there a need to specify distinct? I thought that the Id being a primary key was enough to avoid duplicate results.
When you specify distinct you are specifying that you want the whole row to be distinct. For example if you have two rows:
ID=1 and Name='Joe Smith'
ID=2 and Name='Joe Smith'
then your query is going to return both rows because the different ID values make the rows distinct.
However, if you are selecting only the ID column (and it's your primary key) then the distinct is pointless.
If you're trying to find all of the unique names then you'd want to:
SELECT DISTINCT AL.name
FROM albums AL
You are right, in your case there should be no need for the word distinct because you are asking for the id and the name. Now, for sake of example where distinct is necessary, say you had multiple id's with the same name. Let It Be is an album by both the Beatles and the Replacements. And let's say you were using your database to write out labels that only included the names of the albums. The query you would want would be:
select distinct al.name
from albums al;
Sometimes your database is not perfect and it ends up with a bunch of junk data. If the id has not been designated as unique, you might end up with duplicate records, and then you might want to avoid seeing the duplicates in your query results.
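The difference is easy to see on a tiny table. This sketch uses Python's sqlite3 module with made-up rows: two albums share the name 'Let It Be' but have different ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT);
-- Made-up sample data: ids 1 and 2 share a name.
INSERT INTO albums VALUES (1, 'Let It Be'), (2, 'Let It Be'), (3, 'Abbey Road');
""")

# DISTINCT over (id, name): the primary key already makes every row unique,
# so nothing is removed.
by_id_and_name = conn.execute(
    "SELECT DISTINCT AL.id, AL.name FROM albums AL ORDER BY AL.id"
).fetchall()

# DISTINCT over name only: the duplicate name collapses to one row.
by_name = conn.execute(
    "SELECT DISTINCT AL.name FROM albums AL ORDER BY AL.name"
).fetchall()

print(len(by_id_and_name), len(by_name))  # 3 2
```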

SQL: How do I find which movie genre a user watched the most? (IMDb personal project)

I'm currently working on a personal project and I could use a little help. Here's the scenario:
I'm creating a database (MS Access) for all of the movies my friends and I have ever watched. We rated all of our movies on IMDb and used the export feature to get all of the movie data and our movie ratings. I plan on doing some summary analysis in Excel. One thing I am interested in is the most common movie genre that each person watched. Below is my current setup. Note that the column "const" is the movie's unique ID. I also have individual tables for each person's ratings; the following tables are the summary tables that combine all the movies we have watched.
Here's the table I had: http://imgur.com/v5x9Dhg
I assigned each genre an ID, like this: http://imgur.com/aXdr9XI
And here is a table where I have separate instances for each movie ID and a unique genre: http://imgur.com/N0wULo8
I want to find a way to count up all of the genres that each person watches. Any advice? I would love to provide any additional information that you need!
Thank you!
You need to have at least one table which has one row per user and const (movie watched). In the 3 example tables you posted nothing shows who watched which movies, which is information you need to solve your problem. You mention having "individual tables for each person's ratings," so I assume you have that information. You will want to combine all of them though, into a table called PERSON_MOVIE or something of the like.
So let's say your second table is called GENRE and its columns are ID, Genre.
Let's say your third table is called GENRE_MOVIE and its columns are Const and ID (ID corresponds to ID on the GENRE table)
Let's say the fourth table, which you did not post, but which is required, is called PERSON_MOVIE and its columns are person, Const, rating.
You could then write a query like this:
select vw1.*, ge.genre
from (select pm.person, gm.id as genre_id, count(*) as num_of_genre
      from person_movie pm
      inner join genre_movie gm
      on pm.const = gm.const
      group by pm.person, gm.id) vw1
inner join (select person, max(num_of_genre) as high_count
            from (select pm.person, gm.id, count(*) as num_of_genre
                  from person_movie pm
                  inner join genre_movie gm
                  on pm.const = gm.const
                  group by pm.person, gm.id) x
            group by person) vw2
on vw1.person = vw2.person
and vw1.num_of_genre = vw2.high_count
inner join genre ge
on vw1.genre_id = ge.id
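Here is the same idea as a runnable sketch in Python's sqlite3 module, with made-up sample data. It uses a WITH clause so the per-person genre counts are written once; that is an SQLite convenience and an assumption of this sketch, which is why the query above repeats the subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (id INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE genre_movie (const TEXT, id INTEGER);
CREATE TABLE person_movie (person TEXT, const TEXT, rating INTEGER);

-- Made-up sample data: movie tt2 belongs to both genres.
INSERT INTO genre VALUES (1, 'Drama'), (2, 'Comedy');
INSERT INTO genre_movie VALUES ('tt1', 1), ('tt2', 1), ('tt2', 2), ('tt3', 2);
INSERT INTO person_movie VALUES
    ('Bob', 'tt1', 8), ('Bob', 'tt2', 7),
    ('Sally', 'tt2', 6), ('Sally', 'tt3', 9);
""")

top_genres = conn.execute("""
    WITH counts AS (
        -- How many watched movies fall in each genre, per person.
        SELECT pm.person, gm.id AS genre_id, COUNT(*) AS num_of_genre
        FROM person_movie pm
        INNER JOIN genre_movie gm ON pm.const = gm.const
        GROUP BY pm.person, gm.id
    )
    SELECT c.person, ge.genre, c.num_of_genre
    FROM counts c
    INNER JOIN (SELECT person, MAX(num_of_genre) AS high_count
                FROM counts
                GROUP BY person) m
        ON c.person = m.person AND c.num_of_genre = m.high_count
    INNER JOIN genre ge ON ge.id = c.genre_id
    ORDER BY c.person
""").fetchall()
print(top_genres)  # [('Bob', 'Drama', 2), ('Sally', 'Comedy', 2)]
```

If two genres tie for a person's top count, both rows are returned, which is also how the nested-subquery version behaves.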
Edit re: your comment:
So right now you have multiple tables reflecting people's ratings of movies. You need to combine those into a table called PERSON_MOVIE or something similar (as in example above).
There will be 3 columns on the table: person, const, rating
I'm not sure whether Access supports the traditional create table as select query, but ordinarily you would be able to construct such a table in the following way:
create table person_movie as
select 'Bob' as person, const, [You rated] as rating
from ratings_by_bob
union all
select 'Sally', const, [You rated]
from ratings_by_sally
union all
select 'Jack', const, [You rated]
from ratings_by_jack
....
If not, just combine the tables manually and add a third column as shown indicating what users are reflected by each row. Then you can run my initial query.

Complex sql select

I can't figure out how to make this sql select statement...Here are my tables :
I opened the tables concerned by the request
So basically I want to select the number of albums for each interpret.
I just can't figure it out... I am currently thinking that I need to do my first select on album like :
select interpret.no_interpret, count(*)
from album
.
.
.
group by interpret.no_interpret;
and then work from there, but I don't know where to go next.
I may be missing something, but I'm not seeing the direct relation from your song table to the album...
I would first start by getting the link_interpret_song table joined to the song table and get count of distinct albums. However, I didn't see what appears to be a "No_Album" column in the field list of the song table. I can only guess it IS in there associated to the particular album. I did see media, but to me, that would be like a TYPE of media (digital, download, vinyl, CD) vs the actual ID Key apparent to the album table.
That said, I am thinking there IS such a "No_Album" column in the SONG table.
select
LIS.No_Interpret,
COUNT( DISTINCT S.No_Album )
from
Link_Interpret_Song LIS
JOIN Song S
on LIS.No_Song = S.No_Song
group by
LIS.No_Interpret;
Now, that said, if you want the interpret details, take the above results and join that to the interpret table. I've done both distinct album count and total # of songs just as an example of count() vs count(distinct) context... such as
select
PreCounts.No_Interpret,
PreCounts.DistinctAlbums,
PreCounts.ActualSongs,
I.Name_Interpret,
I.First_Name,
I.Stage_Name
from
( select
LIS.No_Interpret,
COUNT( DISTINCT S.No_Album ) as DistinctAlbums,
COUNT(*) as ActualSongs
from
Link_Interpret_Song LIS
JOIN Song S
on LIS.No_Song = S.No_Song
group by
LIS.No_Interpret ) as PreCounts
JOIN Interpret I
ON PreCounts.No_Interpret = I.No_Interpret
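To see count(*) and count(distinct) diverge, here is a sketch in Python's sqlite3 module. It keeps the answer's guess that Song carries a No_Album column; the sample rows are made up, and the Interpret join is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- No_Album on Song is the same assumption the answer makes.
CREATE TABLE Song (No_Song INTEGER PRIMARY KEY, No_Album INTEGER);
CREATE TABLE Link_Interpret_Song (No_Interpret INTEGER, No_Song INTEGER);

-- Made-up data: songs 1 and 2 are on album 100, song 3 is on album 101.
INSERT INTO Song VALUES (1, 100), (2, 100), (3, 101);
-- Interpret 1 performs all three songs, interpret 2 only song 3.
INSERT INTO Link_Interpret_Song VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")

pre_counts = conn.execute("""
    SELECT LIS.No_Interpret,
           COUNT(DISTINCT S.No_Album) AS DistinctAlbums,
           COUNT(*)                   AS ActualSongs
    FROM Link_Interpret_Song LIS
    JOIN Song S ON LIS.No_Song = S.No_Song
    GROUP BY LIS.No_Interpret
    ORDER BY LIS.No_Interpret
""").fetchall()
print(pre_counts)  # [(1, 2, 3), (2, 1, 1)]
```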
The question is ambiguous since there isn't a clear indication of how the tables are related. Given assumptions about these relations, your query will likely take on something similar to the following form:
SELECT i.no_interpret, COUNT(DISTINCT a.no_album)
FROM album a, interpret i, song s
WHERE i.no_song = s.no_song
AND a.no_album = s.no_album
GROUP BY i.no_interpret

Optimizing MySQL Query

We have a query that is currently killing our database and I know there has to be a way to optimize it. We have 3 tables:
items - table of items where each items has an associated object_id, length, difficulty_rating, rating, avg_rating & status
lists - table of lists which are basically lists of items created by our users
list_items - table with 2 columns: list_id, item_id
We've been using the following query to display a simple HTML table that shows each list and a number of attributes related to the list including averages of attributes of the included list items:
select object_id, user_id, slug, title, description, items,
city, state, country, created, updated,
(select AVG(rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_rating',
(select AVG(avg_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_avg_rating',
(select AVG(length) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_length',
(select AVG(difficulty_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby LIMIT $start,$step
The reason why we haven't broken this up in 1 query to get all the lists and subsequent lookups to pull the averages for each list is because we want the user to be able to sort on the averages columns (i.e. 'order by avg_difficulty').
Hopefully my explanation makes sense. There has to be a much more efficient way to do this and I'm hoping that a MySQL guru out there can point me in the right direction. Thanks!
It looks like you can replace all the subqueries with joins:
SELECT l.object_id,
l.user_id,
<other columns from lists>,
AVG(i.rating) as avgrating,
AVG(i.avg_rating) as avgavgrating,
<other averages>
FROM lists l
LEFT JOIN list_items li
ON li.list_id = l.object_id
LEFT JOIN items i
ON i.object_id = li.object_id
AND i.status = 'A'
WHERE l.user_id = $user_id AND l.status = 'A'
GROUP BY l.object_id, l.user_id, <other columns from lists>
That would save a lot of work for the DB engine.
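A cut-down, runnable version of the join rewrite, using Python's sqlite3 module instead of MySQL. Only the rating average is kept, the sample rows are made up, and list_items is joined on item_id (the column name given in the question's table description), which is an assumption about the real schema. Sorting on the computed average still works because the alias can be used in ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lists (object_id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
CREATE TABLE items (object_id INTEGER PRIMARY KEY, rating REAL, status TEXT);
CREATE TABLE list_items (list_id INTEGER, item_id INTEGER);

-- Made-up data: item 3 is inactive and must not affect list 1's average.
INSERT INTO lists VALUES (1, 7, 'A'), (2, 7, 'A');
INSERT INTO items VALUES (1, 4, 'A'), (2, 2, 'A'), (3, 5, 'D'), (4, 5, 'A');
INSERT INTO list_items VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

list_averages = conn.execute("""
    SELECT l.object_id, AVG(i.rating) AS avg_rating
    FROM lists l
    LEFT JOIN list_items li ON li.list_id = l.object_id
    LEFT JOIN items i ON i.object_id = li.item_id AND i.status = 'A'
    WHERE l.user_id = ? AND l.status = 'A'
    GROUP BY l.object_id
    ORDER BY avg_rating DESC
""", (7,)).fetchall()
print(list_averages)  # [(2, 5.0), (1, 3.0)]
```

Passing the user id as a bound parameter, rather than interpolating $user_id into the SQL string, also removes an injection risk.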
Here is how to find the bottleneck:
Add the keyword EXPLAIN before the SELECT. This makes the engine output its execution plan for the SELECT: which tables are read, in what order, and which indexes are used.
To learn more about Query Optimization with this method see: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
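For a quick feel of what a plan looks like without a MySQL server at hand, this sketch uses SQLite's counterpart, EXPLAIN QUERY PLAN, through Python's sqlite3 module. The output format is SQLite's, not MySQL's, and the index name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list_items (list_id INTEGER, item_id INTEGER);
-- Hypothetical index name, chosen for this example.
CREATE INDEX idx_list_items_list_id ON list_items (list_id);
""")

# EXPLAIN QUERY PLAN is SQLite's analogue of prefixing a SELECT with
# EXPLAIN in MySQL: it describes how the query would be executed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT item_id FROM list_items WHERE list_id = ?",
    (1,),
).fetchall()

for row in plan:
    print(row[-1])  # the last column holds the readable step description
```

Dropping the index and rerunning shows the step change from an index search to a full table scan, which is the kind of difference to look for in MySQL's EXPLAIN output as well.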
A couple of things to consider:
Make sure that all of your joins are indexed on both sides. For example, you join list_items.list_id=lists.object_id in several places. list_id and object_id should both have indexes on them.
Have you done any research into how much the averages actually vary over time? You might benefit from having a worker thread (or cronjob) calculate the averages periodically rather than putting the load on your RDBMS every time you run this query. You'd need to store the averages in a separate table of course...
Also, are you using status as an enum or a varchar? The cardinality of an enum would be much lower; consider switching to this type if you have a limited range of values for status column.
-aj
That's one hell of a query... you should probably edit your question and change the query so it's a bit more readable, although due to the complex nature of it, I'm not sure that's possible.
Anyway, the simple answer here is to denormalize your database a bit and cache all of your averages on the list table itself in indexed decimal columns. All those sub queries are killing you.
The hard part, and what you'll have to figure out is how to keep those averages updated. A generally easy way is to store the count of all items and the sum of all those values in two separate fields. Anytime an action is made, increment the count by 1, and the sum by whatever. Then update table avg_field = sum_field/count_field.
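The sum-and-count bookkeeping might look like the sketch below, written with Python's sqlite3 module. The column names rating_sum, item_count and avg_rating and the helper add_item_rating are hypothetical. The average is recomputed in a second statement so the result does not depend on how an engine orders assignments within a single UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical cache columns on the lists table.
CREATE TABLE lists (
    object_id  INTEGER PRIMARY KEY,
    rating_sum REAL    NOT NULL DEFAULT 0,
    item_count INTEGER NOT NULL DEFAULT 0,
    avg_rating REAL
);
INSERT INTO lists (object_id) VALUES (1);
""")


def add_item_rating(conn, list_id, rating):
    """Record one more rated item on a list and refresh the cached average."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE lists SET rating_sum = rating_sum + ?, "
            "item_count = item_count + 1 WHERE object_id = ?",
            (rating, list_id),
        )
        conn.execute(
            "UPDATE lists SET avg_rating = rating_sum / item_count "
            "WHERE object_id = ?",
            (list_id,),
        )


add_item_rating(conn, 1, 4)
add_item_rating(conn, 1, 2)
cached = conn.execute(
    "SELECT rating_sum, item_count, avg_rating FROM lists WHERE object_id = 1"
).fetchone()
print(cached)  # (6.0, 2, 3.0)
```

Removing an item would be the mirror image: subtract its rating from the sum, decrement the count, and guard against dividing by zero when the list becomes empty.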
Besides indexing, even a cursory analysis shows that your query contains a lot of redundancy that your DBMS's optimizer may not be able to spot (SQL is a redundant language: it admits too many equivalent but syntactically different expressions. This is a known and documented problem; see for example SQL redundancy and DBMS performance, by Fabian Pascal).
I will rewrite your query, below, to highlight that:
let LI =
select object_id from list_items where list_id=lists.object_id
in
select object_id, user_id, slug, title, description, items, city, state, country, created, updated,
(select AVG(rating) from items where object_id IN LI AND status="A") as 'avg_rating',
(select AVG(avg_rating) from items where object_id IN LI AND status="A") as 'avg_avg_rating',
(select AVG(length) from items where object_id IN LI AND status="A") as 'avg_length',
(select AVG(difficulty_rating) from items where object_id IN LI AND status="A") as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby
LIMIT $start, $step
Note: this is only the first step to refactor that beast.
I wonder why people rarely, if at all, use views, even if only to simplify SQL queries. Views help in writing more manageable and refactorable queries.