SQL nested subqueries

I have a table "Album" in a music database.
It has the attributes Title, Artist, Rating and Year.
I'm trying to write a query that returns the titles of all albums that have a higher rating than every previous album by the same artist. To do that, I need to compare each album with the albums by the same artist that were released in an earlier year, and compare their ratings.
I've tried several different strategies. My current attempt is a nested query:
SELECT A1.Title
FROM Album A1
WHERE A1.Title NOT IN (SELECT A2.Title
FROM Album A2
WHERE A1.Artist = A2.Artist, A1.Year > A2.Year, A1.Rating > A2.Rating);
This obviously doesn't work (hence my question). Where am I going wrong? I thought a correlated query (like this one) checks every tuple in the table against the subquery? Any clarification on how I could write this query is appreciated. I'm pretty new to SQL.

I would use window functions:
select a.*
from (select a.*,
max(a.rating) over (partition by a.artist
order by a.year
range between unbounded preceding and 1 preceding
) as prev_max_rating
from album a
) a
where prev_max_rating is null -- no album in an earlier year, so nothing to beat
or rating > prev_max_rating;
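A minimal sketch of the window-function approach, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a server. The sample rows are hypothetical, and the column is assumed to be called Title (as in the question's query). RANGE frames with an offset such as 1 PRECEDING need SQLite 3.28 or later. Because the frame is a RANGE over Year rather than ROWS, two albums from the same year do not count as "previous" to each other, and the prev_max_rating IS NULL test keeps an artist's earliest album(s), which have no predecessor.

```python
import sqlite3

# Hypothetical sample data: (Title, Artist, Rating, Year)
ALBUMS = [
    ("First", "X", 6, 2001),
    ("Second", "X", 8, 2003),
    ("Third", "X", 7, 2005),
    ("Fourth", "X", 9, 2007),
    ("Debut", "Y", 5, 2010),
    ("Same Year", "Y", 9, 2010),
    ("Later", "Y", 9, 2012),
]

# prev_max_rating = best rating among the artist's albums from strictly earlier years
QUERY = """
SELECT Title
FROM (SELECT a.*,
             MAX(a.Rating) OVER (PARTITION BY a.Artist
                                 ORDER BY a.Year
                                 RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                                ) AS prev_max_rating
      FROM Album a
     ) t
WHERE prev_max_rating IS NULL   -- no album in an earlier year
   OR Rating > prev_max_rating  -- strictly better than every earlier album
ORDER BY Artist, Year, Title
"""


def improving_albums(rows):
    """Return titles of albums rated higher than all of the artist's earlier albums."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Album (Title TEXT, Artist TEXT, Rating INTEGER, Year INTEGER)")
        con.executemany("INSERT INTO Album VALUES (?, ?, ?, ?)", rows)
        return [title for (title,) in con.execute(QUERY)]
    finally:
        con.close()


if __name__ == "__main__":
    print(improving_albums(ALBUMS))
```

With this data, "Third" (a 7 after an 8) and "Later" (a 9 that only ties the earlier 9) are filtered out.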

Replace the commas with ANDs. Beyond that, NOT EXISTS (...) is similar to NOT IN (...), but behaves better when NULLs are involved:
SELECT A1.Title
FROM Album A1
-- There should not exist an older album with an equal or higher rating
-- (for the same artist)
WHERE NOT EXISTS (SELECT *
FROM Album A2
WHERE A2.Artist = A1.Artist
AND A2.Year < A1.Year
AND A2.Rating >= A1.Rating -- Note: the comparison is inverted, compared to the query in the question
);
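Whether ties count depends on the last comparison in the subquery. This small sketch runs the NOT EXISTS query through Python's sqlite3 on an in-memory database with hypothetical rows (the column name Title is assumed). With >= an album that merely ties an earlier rating is excluded, matching "higher than every previous album"; with > it slips through. The operator is spliced into the SQL only from a fixed whitelist.

```python
import sqlite3

# Hypothetical sample data: (Title, Artist, Rating, Year)
ALBUMS = [
    ("First", "X", 6, 2001),
    ("Tie", "X", 6, 2003),
    ("Better", "X", 8, 2005),
    ("Worse", "X", 7, 2006),
]

QUERY_TEMPLATE = """
SELECT A1.Title
FROM Album A1
WHERE NOT EXISTS (SELECT *
                  FROM Album A2
                  WHERE A2.Artist = A1.Artist
                    AND A2.Year < A1.Year
                    AND A2.Rating {op} A1.Rating)
ORDER BY A1.Artist, A1.Year
"""


def albums_beating_predecessors(rows, op=">="):
    """op='>=' demands a strictly higher rating; op='>' also lets ties through."""
    if op not in (">=", ">"):
        raise ValueError("op must be '>=' or '>'")
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Album (Title TEXT, Artist TEXT, Rating INTEGER, Year INTEGER)")
        con.executemany("INSERT INTO Album VALUES (?, ?, ?, ?)", rows)
        sql = QUERY_TEMPLATE.format(op=op)
        return [title for (title,) in con.execute(sql)]
    finally:
        con.close()


if __name__ == "__main__":
    print(albums_beating_predecessors(ALBUMS))       # ties excluded
    print(albums_beating_predecessors(ALBUMS, ">"))  # ties included
```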

Related

How do I write my Postgres query to return the value that is occurring with the maximum frequency in a particular column?

I have two tables in my database with information about movie tickets (whose columns are movie ID and ticket ID) and movie screenings (whose columns are movie title, ticket ID, and show time). I am trying to write a PostgreSQL query that tells me which show time is the most popular (i.e., which show time occurs most frequently in the show time column).
To illustrate, so far I have written my query to return the show times of a particular movie (e.g. 20:20, 13:00), namely the movie with ID 15.
SELECT show_time FROM screenings
INNER JOIN tickets
ON screenings.ticket_id = tickets.id
WHERE screenings.film_id = 15
Let's pretend that this query returns a series of times such as 20:20, 18:05, 13:00, 20:20. Now I want to extend this query so that it returns the single show time that occurs most frequently in the results (20:20 in this case). I have tried a few different approaches, but none of them has worked yet. I tried wrapping the query above in a subquery like this:
SELECT MAX(*) FROM
(SELECT COUNT(show_time)
FROM screenings
INNER JOIN tickets
ON screenings.ticket_id = tickets.id
WHERE screenings.film_id = 15)
But then I get the error:
ERROR: subquery in FROM must have an alias
LINE 2: (SELECT COUNT(show_time)
^
HINT: For example, FROM (SELECT ...) [AS] foo.
I have tried researching this topic and adding an alias, but I'm not familiar enough with PostgreSQL to structure the query correctly. Any help?
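Here is one way using analytic (window) functions. If several show times tie for the highest count, ROW_NUMBER() picks one of them arbitrarily; use RANK() instead to return all of them: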
SELECT show_time
FROM (SELECT show_time, COUNT(*) as cnt,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM screenings s JOIN
tickets t
ON s.ticket_id = t.id
WHERE s.film_id = 15
GROUP BY show_time
) st
WHERE seqnum = 1;
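To try both the ROW_NUMBER() answer and a repaired version of the question's own attempt (which needed GROUP BY show_time plus an alias on the derived table), here is a sketch using Python's sqlite3 with an in-memory database. The table layout and rows are hypothetical, modeled on the question's query. SQLite has no mode() aggregate, so that variant is not included.

```python
import sqlite3

# Hypothetical schema and rows, loosely following the question's query
SETUP = """
CREATE TABLE tickets (id INTEGER PRIMARY KEY);
CREATE TABLE screenings (ticket_id INTEGER, film_id INTEGER, show_time TEXT);
INSERT INTO tickets (id) VALUES (1), (2), (3), (4), (5), (6);
INSERT INTO screenings VALUES
    (1, 15, '20:20'), (2, 15, '18:05'), (3, 15, '13:00'), (4, 15, '20:20'),
    (5, 16, '18:05'), (6, 16, '18:05');
"""

# The ROW_NUMBER() answer: number the grouped rows by descending count, keep the first
MOST_FREQUENT = """
SELECT show_time
FROM (SELECT show_time, COUNT(*) AS cnt,
             ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS seqnum
      FROM screenings s
      JOIN tickets t ON s.ticket_id = t.id
      WHERE s.film_id = ?
      GROUP BY show_time
     ) st
WHERE seqnum = 1
"""

# The question's attempt, repaired: group by show_time and alias the derived table
MAX_COUNT = """
SELECT MAX(cnt)
FROM (SELECT show_time, COUNT(*) AS cnt
      FROM screenings s
      JOIN tickets t ON s.ticket_id = t.id
      WHERE s.film_id = ?
      GROUP BY show_time
     ) AS counts
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)


def most_frequent_show_time(film_id):
    return con.execute(MOST_FREQUENT, (film_id,)).fetchone()[0]


def highest_count(film_id):
    return con.execute(MAX_COUNT, (film_id,)).fetchone()[0]


print(most_frequent_show_time(15), highest_count(15))
```

Note that the repaired attempt only returns the highest count, not the show time itself, which is why the answer ranks the grouped rows instead.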
You can use the mode() ordered-set aggregate function (available since PostgreSQL 9.4):
SELECT mode() within group (order by show_time) as most_frequent_show_time
FROM screenings
JOIN tickets ON screenings.ticket_id = tickets.id
WHERE screenings.film_id = 15;

Rank order ST_DWithin results by the number of radii a result appears in

I have a table of existing customers and another table of potential customers. I want to return a list of potential customers rank ordered by the number of radii of existing purchasers that they appear in.
There are many rows in the potential customers table per each existing customer, and the radius around a given existing customer could encompass multiple potential customers. I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within.
SELECT pur.contact_id AS purchaser, count(pot.*) AS nearby_potential_customers
FROM purchasers_geocoded pur, potential_customers_geocoded pot
WHERE ST_DWithin(pur.geom,pot.geom,1000)
GROUP BY purchaser;
Does anyone have advice on how to proceed?
EDIT:
With some help, I wrote this query, which seems to do the job, but I'm verifying now.
WITH prequalified_leads_table AS (
SELECT *
FROM nearby_potential_customers
WHERE market_val > 80000
AND market_val < 120000
)
, proximate_to_existing AS (
SELECT pot.prop_id AS prequalified_leads
FROM purchasers_geocoded pur, prequalified_leads_table pot
WHERE ST_DWithin(pot.geom,pur.geom,100)
)
SELECT prequalified_leads, count(prequalified_leads)
FROM proximate_to_existing
GROUP BY prequalified_leads
ORDER BY count(*) DESC;
I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within.
Your query does the opposite of what you describe: it counts potential customers around each existing customer.
Inverting that, and with a few tweaks:
SELECT pot.contact_id AS potential_customer
, rank() OVER (ORDER BY pur.nearby_customers DESC
, pot.contact_id) AS rnk
, pur.nearby_customers
FROM potential_customers_geocoded pot
LEFT JOIN LATERAL (
SELECT count(*) AS nearby_customers
FROM purchasers_geocoded pur
WHERE ST_DWithin(pur.geom, pot.geom, 1000)
) pur ON true
ORDER BY 2;
I suggest a subquery with LEFT JOIN LATERAL ... ON true to get the counts. It should make use of the spatial index you presumably have:
CREATE INDEX ON purchasers_geocoded USING gist (geom);
This retains rows with 0 nearby customers in the result; your original join style would exclude them. Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
Then ORDER BY the resulting nearby_customers in the outer query (not nearby_potential_customers).
It's not clear whether you want an actual rank. If so, use the window function rank(). I made the rank deterministic while I was at it, breaking ties with an additional ORDER BY expression: pot.contact_id. Otherwise, peers are returned in arbitrary order, which can change with every execution.
ORDER BY 2 is shorthand for "order by the 2nd output column". See:
Select first row in each GROUP BY group?
Related:
How do I query all rows within a 5-mile radius of my coordinates?
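The counting direction and the tie-breaking rank can be checked without PostGIS. This plain-Python sketch (Python 3.8+ for math.dist) is a brute-force stand-in for the LATERAL query: it uses hypothetical planar coordinates and Euclidean distance, which matches ST_DWithin on a geometry column in a projected SRID but not on geography, and it scans every pair instead of using a spatial index.

```python
import math

# Hypothetical planar coordinates (e.g. metres in a projected SRID), keyed by contact_id
PURCHASERS = {1: (0.0, 0.0), 2: (1500.0, 0.0)}
POTENTIALS = {10: (500.0, 0.0), 11: (0.0, 900.0), 12: (5000.0, 5000.0), 13: (2000.0, 0.0)}


def rank_potential_customers(purchasers, potentials, radius):
    """Return (contact_id, rank, nearby_customers) rows, best first.

    For each potential customer, count the purchasers within `radius`
    (distance <= radius, like ST_DWithin), keep customers with 0 hits,
    and break ties by contact_id so the ranking is deterministic.
    """
    counts = {
        pid: sum(1 for q in purchasers.values() if math.dist(p, q) <= radius)
        for pid, p in potentials.items()
    }
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # With a unique tie-breaker every rank is distinct, so rank() equals the row position
    return [(pid, rnk, n) for rnk, (pid, n) in enumerate(ordered, start=1)]


for row in rank_potential_customers(PURCHASERS, POTENTIALS, 1000):
    print(row)
```

Customer 12 lies outside every radius but still appears with a count of 0, mirroring what LEFT JOIN LATERAL ... ON true preserves.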

Complex sql select

I can't figure out how to write this SQL SELECT statement. Here are my tables (only the ones involved in the query):
So basically I want to select the number of albums for each interpret (performer).
I just can't figure it out. I'm currently thinking that I need to start with a SELECT on album, like:
select interpret.no_interpret, count(*)
from album
.
.
.
group by interpret.no_interpret;
and then work from there, but I don't know where to go next.
I may be missing something, but I don't see a direct relation from your song table to the album table...
I would start by joining the link_interpret_song table to the song table and getting a count of distinct albums. However, I didn't see anything that looks like a "No_Album" column in the field list of the song table. I can only guess that it IS in there, linking each song to its album. I did see media, but to me that looks like a TYPE of media (digital, download, vinyl, CD) rather than the actual key of the album table.
That said, I will assume there IS such a "No_Album" column in the SONG table.
select
LIS.No_Interpret,
COUNT( DISTINCT S.No_Album )
from
Link_Interpret_Song LIS
JOIN Song S
on LIS.No_Song = S.No_Song
group by
LIS.No_Interpret;
Now, if you want the interpret details, take the above results and join them to the interpret table. I've included both the distinct album count and the total number of songs, just as an example of COUNT(*) versus COUNT(DISTINCT ...), such as:
select
PreCounts.No_Interpret,
PreCounts.DistinctAlbums,
PreCounts.ActualSongs,
I.Name_Interpret,
I.First_Name,
I.Stage_Name
from
( select
LIS.No_Interpret,
COUNT( DISTINCT S.No_Album ) as DistinctAlbums,
COUNT(*) as ActualSongs
from
Link_Interpret_Song LIS
JOIN Song S
on LIS.No_Song = S.No_Song
group by
LIS.No_Interpret ) as PreCounts
JOIN Interpret I
ON PreCounts.No_Interpret = I.No_Interpret
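Here is a sketch that runs the query above against an in-memory SQLite database via Python's sqlite3, trimmed to Name_Interpret from the interpret columns. The tables, the No_Album column on Song, and the rows are hypothetical, since the original table screenshots are not available. Interpret 1 has three songs on two albums, so COUNT(*) and COUNT(DISTINCT ...) differ.

```python
import sqlite3

# Hypothetical tables: the real column list was not shown, so No_Album on Song is assumed
SETUP = """
CREATE TABLE Interpret (No_Interpret INTEGER PRIMARY KEY, Name_Interpret TEXT);
CREATE TABLE Song (No_Song INTEGER PRIMARY KEY, No_Album INTEGER);
CREATE TABLE Link_Interpret_Song (No_Interpret INTEGER, No_Song INTEGER);
INSERT INTO Interpret VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO Song VALUES (100, 10), (101, 10), (102, 11), (103, 12);
INSERT INTO Link_Interpret_Song VALUES (1, 100), (1, 101), (1, 102), (2, 103), (2, 100);
"""

QUERY = """
SELECT PreCounts.No_Interpret, PreCounts.DistinctAlbums, PreCounts.ActualSongs,
       I.Name_Interpret
FROM (SELECT LIS.No_Interpret,
             COUNT(DISTINCT S.No_Album) AS DistinctAlbums,  -- each album counted once
             COUNT(*) AS ActualSongs                        -- every linked song
      FROM Link_Interpret_Song LIS
      JOIN Song S ON LIS.No_Song = S.No_Song
      GROUP BY LIS.No_Interpret) AS PreCounts
JOIN Interpret I ON PreCounts.No_Interpret = I.No_Interpret
ORDER BY PreCounts.No_Interpret
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
rows = con.execute(QUERY).fetchall()
print(rows)
```

Interpret 3, who has no linked songs, is missing from the output because of the inner joins; a LEFT JOIN from Interpret to the PreCounts subquery would keep such rows with NULL counts.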
The question is ambiguous, since there is no clear indication of how the tables are related. Under assumptions about those relations, your query will likely take a form similar to the following:
SELECT i.no_interpret, COUNT(distinct a.no_album) from album a, interpret i, song s
where i.no_song=s.no_song
and a.no_album=s.no_album GROUP BY i.no_interpret

How to join 3 tables and order by date of all 3?

I have 3 tables: albums, videos and stories.
All 3 have a date column, and each is linked by a subject_id column.
I want to select all albums, videos and stories for a subject_id, ordered by date. How can I do that?
Currently I have this query:
SELECT albums.*, videos.*, stories.*
FROM albums
LEFT JOIN videos
ON videos.subject_id = albums.subject_id
RIGHT JOIN stories
ON stories.subject_id = albums.subject_id
ORDER BY albums.date
Will this do the trick, or am I missing something?
I don't want the dates to be ordered just by albums; stories could have older entries than albums, and so on.
You'll want to replace
ORDER BY albums.date
with
ORDER BY albums.date, videos.date, stories.date
This will order by album date first, then video date within that, and story date within that. I think that will get you what you're looking for.
If I understand correctly, assuming you always want to sort in Ascending order by the earliest date in the set, you could do:
ORDER BY LEAST(albums.date, videos.date, stories.date)
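LEAST returns the minimum (i.e. earliest) value in the set; it is available in Oracle, MySQL and PostgreSQL, among others. Beware of the NULLs produced by the outer joins: Oracle and MySQL return NULL if any argument is NULL, while PostgreSQL ignores NULL arguments. If you're on a database without a similar function, you could use a CASE expression: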
ORDER BY CASE
WHEN albums.date <= videos.date AND albums.date <= stories.date THEN albums.date
WHEN videos.date <= albums.date AND videos.date <= stories.date THEN videos.date
ELSE stories.date
END
Although I'm not sure this is what you want either. You're going to have a large row with all the album, video, and story information on one line. If you want a row for each different album, video, or story, then you probably want a UNION rather than a set of JOINs, although then you'll have to make sure you use the same fields. For example, if you had a title field in each table that you wanted to see, you could use:
SELECT * FROM (
SELECT 'Album' as RowType, album_title as title, subject_id, date from albums
UNION ALL
SELECT 'Video' as RowType, video_title as title, subject_id, date from videos
UNION ALL
SELECT 'Story' as RowType, story_title as title, subject_id, date from stories
) AS combined
ORDER BY subject_id, date
Which would give the sort order I believe you wanted (or you can take subject_id out of the ORDER BY if you preferred). The only thing about the UNION is that you have to return the same columns in each underlying query, so if you want vastly different information from albums than from videos, it will get tricky.
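A sketch of the UNION approach for a single subject, run through Python's sqlite3 on an in-memory database. The table and column names follow the answer's hypothetical title fields, and the rows are made up; dates are stored as ISO-8601 text so that sorting the strings sorts chronologically. It uses UNION ALL because the RowType literal already distinguishes the sources, so UNION's duplicate removal would only add work (or silently merge identical rows from the same table).

```python
import sqlite3

# Hypothetical tables: each has a title column, a subject_id and an ISO-8601 date
SETUP = """
CREATE TABLE albums  (album_title TEXT, subject_id INTEGER, date TEXT);
CREATE TABLE videos  (video_title TEXT, subject_id INTEGER, date TEXT);
CREATE TABLE stories (story_title TEXT, subject_id INTEGER, date TEXT);
INSERT INTO albums  VALUES ('A1', 1, '2020-03-01');
INSERT INTO videos  VALUES ('V1', 1, '2019-12-31');
INSERT INTO stories VALUES ('S1', 1, '2020-01-15'), ('S2', 2, '2018-01-01');
"""

# One row per item, whatever table it came from, in a single date order
QUERY = """
SELECT RowType, title
FROM (SELECT 'Album' AS RowType, album_title AS title, subject_id, date FROM albums
      UNION ALL
      SELECT 'Video', video_title, subject_id, date FROM videos
      UNION ALL
      SELECT 'Story', story_title, subject_id, date FROM stories
     ) AS combined
WHERE subject_id = ?
ORDER BY date
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
timeline = con.execute(QUERY, (1,)).fetchall()
print(timeline)
```

The video and the story sort ahead of the album because the ordering is on the combined date column, not on albums.date.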

Aggregation with two Joins (MySQL)

I have one table called gallery. For each row in gallery there are several rows in the table picture. One picture belongs to one gallery. Then there is the table vote. There each row is an upvote or a downvote for a certain gallery.
Here is the (simplified) structure:
gallery ( gallery_id )
picture ( picture_id, picture_gallery_ref )
vote ( vote_id, vote_value, vote_gallery_ref )
Now I want one query to give me the following information: all galleries with their own data fields, the number of pictures connected to each gallery, and the sum of the vote values.
Here is my query, but due to the multiple joining the aggregated values are not the right ones. (At least when there is more than one row of either pictures or votes.)
SELECT
*, SUM( vote_value ) as score, COUNT( picture_id ) AS pictures
FROM
gallery
LEFT JOIN
vote
ON gallery_id = vote_gallery_ref
LEFT JOIN
picture
ON gallery_id = picture_gallery_ref
GROUP BY gallery_id
Because I noticed that COUNT( DISTINCT picture_id ) gives me the correct number of pictures, I tried this:
( SUM( vote_value ) / GREATEST( COUNT( DISTINCT picture_id ), 1 ) ) AS score
It works in this example, but what if there were more joins in one query?
I just want to know whether there is a better or more 'elegant' way to solve this problem. I'd also like to know whether my solution is MySQL-specific or standard SQL.
This quote from William of Ockham applies here:
Entia non sunt multiplicanda praeter necessitatem
(Latin for "entities are not to be multiplied beyond necessity").
You should reconsider why you need this to be done in a single query. It's true that a single query has less overhead than multiple queries, but if that single query becomes too complex, both for you to develop and for the RDBMS to execute, then run separate queries.
Or just use subqueries...
Joining against derived tables like these is valid MySQL syntax; you could do something similar to:
SELECT
gallery.*, a.score, b.pictures
FROM gallery
LEFT JOIN
(
select vote_gallery_ref, sum(vote_value) as score
from vote
group by vote_gallery_ref
) a ON gallery_id = vote_gallery_ref
LEFT JOIN
(
select picture_gallery_ref, count(picture_id) as pictures
from picture
group by picture_gallery_ref
) b ON gallery_id = picture_gallery_ref
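A sketch that reproduces the double counting and the derived-table fix on SQLite through Python's sqlite3 (the same SQL shape also works in MySQL). The rows are hypothetical, and COALESCE(..., 0) is an addition so that galleries without votes or pictures show 0 rather than NULL.

```python
import sqlite3

# Simplified schema from the question, with hypothetical rows:
# gallery 1: 2 pictures, votes +1 +1 -1; gallery 2: no pictures, one +1 vote; gallery 3: nothing
SETUP = """
CREATE TABLE gallery (gallery_id INTEGER PRIMARY KEY);
CREATE TABLE picture (picture_id INTEGER PRIMARY KEY, picture_gallery_ref INTEGER);
CREATE TABLE vote (vote_id INTEGER PRIMARY KEY, vote_value INTEGER, vote_gallery_ref INTEGER);
INSERT INTO gallery VALUES (1), (2), (3);
INSERT INTO picture VALUES (1, 1), (2, 1);
INSERT INTO vote VALUES (1, 1, 1), (2, 1, 1), (3, -1, 1), (4, 1, 2);
"""

# Both joins at once: every vote row is repeated once per picture row of the same gallery
NAIVE = """
SELECT g.gallery_id, SUM(v.vote_value) AS score, COUNT(p.picture_id) AS pictures
FROM gallery g
LEFT JOIN vote v ON g.gallery_id = v.vote_gallery_ref
LEFT JOIN picture p ON g.gallery_id = p.picture_gallery_ref
GROUP BY g.gallery_id
ORDER BY g.gallery_id
"""

# Aggregate each child table first, then join at most one row per gallery
PRE_AGGREGATED = """
SELECT g.gallery_id, COALESCE(a.score, 0) AS score, COALESCE(b.pictures, 0) AS pictures
FROM gallery g
LEFT JOIN (SELECT vote_gallery_ref, SUM(vote_value) AS score
           FROM vote GROUP BY vote_gallery_ref) a ON g.gallery_id = a.vote_gallery_ref
LEFT JOIN (SELECT picture_gallery_ref, COUNT(picture_id) AS pictures
           FROM picture GROUP BY picture_gallery_ref) b ON g.gallery_id = b.picture_gallery_ref
ORDER BY g.gallery_id
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
naive = con.execute(NAIVE).fetchall()
fixed = con.execute(PRE_AGGREGATED).fetchall()
print("naive:", naive)
print("fixed:", fixed)
```

For gallery 1 the naive query sees 3 votes times 2 pictures, i.e. 6 joined rows, so both the score and the picture count come out inflated.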
How often do you add/change vote records?
How often do you add/remove picture records?
How often do you run this query for these totals?
It might be better to create total fields on the gallery table (total_pictures, total_votes, total_vote_values).
When you add or remove a record on the picture table you also update the total on the gallery table. This could be done using triggers on the picture table to automatically update the gallery table. It could also be done using a transaction combining two SQL statements to update the picture table and the gallery table. When you add a record on the picture table increment the total_pictures field on the gallery table. When you delete a record on the picture table decrement the total_pictures field.
Similarly, when a vote record is added or removed or its vote_value changes, you update the total_votes and total_vote_values fields. Adding a record increments the total_votes field and adds vote_value to total_vote_values. Deleting a record decrements the total_votes field and subtracts vote_value from total_vote_values. Updating vote_value on a vote record should also update total_vote_values with the difference (subtract the old value, add the new value).
Your query now becomes trivial - it's just a straightforward query from the gallery table. But this is at the expense of more complex updates to the picture and vote tables.
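A minimal sketch of the trigger-maintained totals, using SQLite through Python's sqlite3 because it needs no server. The total_* column names come from the answer; the trigger names and sample statements are hypothetical, and MySQL would need its own CREATE TRIGGER ... FOR EACH ROW syntax (plus DELIMITER handling in the mysql client). It also assumes a vote never moves to a different gallery.

```python
import sqlite3

# Hypothetical SQLite version of the denormalized totals
SETUP = """
CREATE TABLE gallery (
    gallery_id INTEGER PRIMARY KEY,
    total_pictures INTEGER NOT NULL DEFAULT 0,
    total_votes INTEGER NOT NULL DEFAULT 0,
    total_vote_values INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE picture (picture_id INTEGER PRIMARY KEY, picture_gallery_ref INTEGER);
CREATE TABLE vote (vote_id INTEGER PRIMARY KEY, vote_value INTEGER, vote_gallery_ref INTEGER);

CREATE TRIGGER picture_added AFTER INSERT ON picture BEGIN
    UPDATE gallery SET total_pictures = total_pictures + 1
    WHERE gallery_id = NEW.picture_gallery_ref;
END;
CREATE TRIGGER picture_removed AFTER DELETE ON picture BEGIN
    UPDATE gallery SET total_pictures = total_pictures - 1
    WHERE gallery_id = OLD.picture_gallery_ref;
END;
CREATE TRIGGER vote_added AFTER INSERT ON vote BEGIN
    UPDATE gallery SET total_votes = total_votes + 1,
                       total_vote_values = total_vote_values + NEW.vote_value
    WHERE gallery_id = NEW.vote_gallery_ref;
END;
CREATE TRIGGER vote_removed AFTER DELETE ON vote BEGIN
    UPDATE gallery SET total_votes = total_votes - 1,
                       total_vote_values = total_vote_values - OLD.vote_value
    WHERE gallery_id = OLD.vote_gallery_ref;
END;
-- Only the value changes; the vote stays in the same gallery
CREATE TRIGGER vote_changed AFTER UPDATE OF vote_value ON vote BEGIN
    UPDATE gallery SET total_vote_values = total_vote_values - OLD.vote_value + NEW.vote_value
    WHERE gallery_id = NEW.vote_gallery_ref;
END;
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
con.executescript("""
INSERT INTO gallery (gallery_id) VALUES (1), (2);
INSERT INTO picture VALUES (1, 1), (2, 1), (3, 2);
DELETE FROM picture WHERE picture_id = 2;
INSERT INTO vote VALUES (1, 1, 1), (2, 1, 1), (3, -1, 2);
UPDATE vote SET vote_value = -1 WHERE vote_id = 2;
DELETE FROM vote WHERE vote_id = 1;
""")

# The report is now a plain read of the gallery table
totals = con.execute(
    "SELECT gallery_id, total_pictures, total_votes, total_vote_values "
    "FROM gallery ORDER BY gallery_id"
).fetchall()
print(totals)
```

SQLite triggers fire once per affected row, so the multi-row INSERT statements above update the totals for every inserted row.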
As Bill Karwin said, doing this all within one query is pretty ugly.
But if you have to do it, combining non-aggregated data with aggregated data requires joining against subqueries (derived tables).
Let's assume your gallery table has additional fields name and state:
select g.gallery_id, g.name, g.state, i.num_pictures, j.sum_vote_values
from gallery g
inner join (
select g.gallery_id, count(p.picture_id) as 'num_pictures'
from gallery g
left join picture p on g.gallery_id = p.picture_gallery_ref
group by g.gallery_id) as i on g.gallery_id = i.gallery_id
left join (
select g.gallery_id, sum(v.vote_value) as 'sum_vote_values'
from gallery g
left join vote v on g.gallery_id = v.vote_gallery_ref
group by g.gallery_id
) as j on g.gallery_id = j.gallery_id
This will yield a result set that looks like:
gallery_id, name, state, num_pictures, sum_vote_values
1, 'Gallery A', 'NJ', 4, 19
2, 'Gallery B', 'NY', 3, 32
3, 'Empty gallery', 'CT', 0,