SQL Arithmetic and joining three columns - sql

I have a schema that looks like this:
+----------+
| tour |
+----------+
| id |
| name |
+----------+
+----------+
| golfer |
+----------+
| id |
| name |
| tour_id |
+----------+
+-----------+
| stat |
+-----------+
| id |
| round |
| score |
| golfer_id |
+-----------+
So essentially a golf tour has X number of golfers in it. A golfer will have X number of stats. The round column in the stat table just contains numbers (1, 2, 3, 4... and so on). They aren't necessarily one after the other but they are unique.
I now want to find all golfers that belong to the "PGA" tour and for each of those golfers, tally up their scores from the last 2 rounds. The last 2 rounds are essentially the rows in the stat table for the golfer with the biggest two numbers. So let's say golfer "Tiger Woods" has played in rounds 1, 3, 6 and 10, then I will only want to tally his scores from rounds 6 and 10. Another requirement is that I don't want to show golfers who are yet to have played in at least two rounds.
I've tried several ways to get this going but have always got myself into a tangle.

If you just want the last two rounds (emphasis on "two"), there is a simple trick. It does not extend to fetching more than two records, or records other than the last two. To pick arbitrary records within a partition, you'll have to use window functions, which are more involved and only supported in newer versions of mainstream database engines.
The trick is to self-join the "stat" table on the golfer id (an equi-join of the table with itself). This way, you get all combinations of any two rounds of a golfer, including combinations with the same round:
SELECT s1.round as s1_round, s2.round AS s2_round
FROM stat s1 INNER JOIN stat s2 ON (s1.golfer_id = s2.golfer_id)
Then you exclude (via a WHERE clause) the combinations that have the same rounds and also make sure that these combinations are always first round > second round. This means that now you have all combinations of any two rounds of a golfer, with no duplicates:
SELECT s1.round as s1_round, s2.round AS s2_round
FROM stat s1 INNER JOIN stat s2 ON (s1.golfer_id = s2.golfer_id)
WHERE s1.round > s2.round
Notice that if you select only the records for a particular golfer and sort DESC on the two round columns, the top row will be the last two rounds of that golfer:
SELECT TOP 1 s1.round as s1_round, s2.round AS s2_round
FROM stat s1 INNER JOIN stat s2 ON (s1.golfer_id = s2.golfer_id)
WHERE s1.round > s2.round
ORDER BY s1.round DESC, s2.round DESC
TOP 1 is SQL Server lingo to get the top row. For MySQL, you need to use LIMIT 1. For other databases, use the database engine's particular way.
However, in this case you can't do it so simply because you need the last two rounds of ALL golfers. You'll have to do more joins:
SELECT id,
(SELECT MAX(s1.round) FROM stat s1 INNER JOIN stat s2 ON (s1.golfer_id = s2.golfer_id)
WHERE s1.round > s2.round AND s1.golfer_id = golfer.id) AS last_round,
(SELECT MAX(s2.round) FROM stat s1 INNER JOIN stat s2 ON (s1.golfer_id = s2.golfer_id)
WHERE s1.round > s2.round AND s1.golfer_id = golfer.id) AS second_to_last_round
FROM golfer
This will give you the last two rounds (in two columns) for each golfer.
Or joining the golfer table with the two-column temp set should work also:
SELECT golfer.id, MAX(r.s1_round) AS last_round, MAX(r.s2_round) AS second_to_last_round
FROM golfer INNER JOIN
(
SELECT s1.golfer_id AS golfer_id, s1.round AS s1_round, s2.round AS s2_round
FROM stat s1 INNER JOIN stat s2 ON (s1.golfer_id = s2.golfer_id)
WHERE s1.round > s2.round
) r ON (r.golfer_id = golfer.id)
GROUP BY golfer.id
I leave it as a trivial exercise to join this query to the tour table to get golfers of the PGA tour, and to join this query back to the stats table to get the scores of the last two rounds.
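For readers who want to see that exercise spelled out, here is one possible sketch. It runs the SQL through Python's built-in sqlite3 module purely so it can be executed; the sample rows, the second tour and the column alias last_two_total are made up for illustration, and rounds are assumed to be unique per golfer as the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tour   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE golfer (id INTEGER PRIMARY KEY, name TEXT, tour_id INTEGER);
CREATE TABLE stat   (id INTEGER PRIMARY KEY, round INTEGER, score INTEGER, golfer_id INTEGER);
-- Hypothetical sample data
INSERT INTO tour VALUES (1, 'PGA'), (2, 'Other');
INSERT INTO golfer VALUES (1, 'Tiger Woods', 1), (2, 'One Round Only', 1), (3, 'Not PGA', 2);
INSERT INTO stat (round, score, golfer_id) VALUES
  (1, 70, 1), (3, 68, 1), (6, 72, 1), (10, 69, 1),
  (2, 71, 2),
  (1, 66, 3), (2, 67, 3);
""")

query = """
SELECT g.name, s_last.score + s_prev.score AS last_two_total
FROM golfer g
JOIN tour t ON t.id = g.tour_id AND t.name = 'PGA'
JOIN (
    -- the self-join trick: largest and second-largest round per golfer
    SELECT s1.golfer_id,
           MAX(s1.round) AS last_round,
           MAX(s2.round) AS second_to_last_round
    FROM stat s1
    JOIN stat s2 ON s1.golfer_id = s2.golfer_id
    WHERE s1.round > s2.round
    GROUP BY s1.golfer_id
) r ON r.golfer_id = g.id
-- join back to stat twice to fetch the two scores
JOIN stat s_last ON s_last.golfer_id = g.id AND s_last.round = r.last_round
JOIN stat s_prev ON s_prev.golfer_id = g.id AND s_prev.round = r.second_to_last_round
ORDER BY g.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

A golfer with a single round produces no (s1, s2) pair, so the inner join drops that golfer, which covers the "at least two rounds" requirement without an extra condition.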

HSQLDB 2.1 supports LATERAL joins, which allow this sort of select with arbitrary criteria.
A simple join will list all the golfers in the PGA tour:
select golfer.name from tour join golfer on (tour.id = tour_id and tour.name = 'PGA')
You then LATERAL join this table as many times as you need to the particular score. The next example includes the score for the last round (only if the player has played a round):
select golfer.name, firststat.score from tour join golfer on (tour.id = tour_id and tour.name = 'PGA' ),
lateral(select * from stat where golfer_id = golfer.id order by round desc limit 1) firststat
In the next example, you use one more lateral join to include the last but one round. If the player has not played two rounds, there will be no row for the player:
select golfer.name, secondstat.score score1, firststat.score score2 from tour join golfer on (tour.id = tour_id and tour.name = 'PGA' ),
lateral(select * from stat where golfer_id = golfer.id order by round desc limit 1 offset 1) secondstat,
lateral(select * from stat where golfer_id = golfer.id order by round desc limit 1) firststat
The LATERAL join does not need a WHERE clause, because the "where condition" is taken from the tables in the FROM list that appear before the current table. Therefore the SELECT statements in the subqueries of the LATERAL tables can use the golfer.id from the first joined table.
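On engines without LATERAL, a window function is another way to pick the last N rows per golfer. The following is only a sketch: it uses ROW_NUMBER() through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the sample rows and the alias last_two_total are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tour   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE golfer (id INTEGER PRIMARY KEY, name TEXT, tour_id INTEGER);
CREATE TABLE stat   (id INTEGER PRIMARY KEY, round INTEGER, score INTEGER, golfer_id INTEGER);
-- Hypothetical sample data
INSERT INTO tour VALUES (1, 'PGA'), (2, 'Other');
INSERT INTO golfer VALUES (1, 'Tiger Woods', 1), (2, 'One Round Only', 1), (3, 'Not PGA', 2);
INSERT INTO stat (round, score, golfer_id) VALUES
  (1, 70, 1), (3, 68, 1), (6, 72, 1), (10, 69, 1),
  (2, 71, 2),
  (1, 66, 3), (2, 67, 3);
""")

query = """
SELECT g.name, SUM(ranked.score) AS last_two_total
FROM golfer g
JOIN tour t ON t.id = g.tour_id AND t.name = 'PGA'
JOIN (
    -- rn = 1 is the latest round of each golfer, rn = 2 the one before it
    SELECT golfer_id, score,
           ROW_NUMBER() OVER (PARTITION BY golfer_id ORDER BY round DESC) AS rn
    FROM stat
) ranked ON ranked.golfer_id = g.id AND ranked.rn <= 2
GROUP BY g.id, g.name
HAVING COUNT(*) = 2   -- skip golfers with fewer than two rounds
ORDER BY g.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Changing the `rn <= 2` and `COUNT(*) = 2` conditions to another number generalises this to the last N rounds.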

Related

SQL multiple join question: can't join 5 tables, problem with MAX

I got 6 tables:
Albums
id_album | title | id_band | year |
Bands
id_band | name |style | origin
composers
id_musician | id_song
members
id_musician | id_band | instrument
musicians
id_musician | name | birth | death | gender
songs
id_song | title | duration | id_album
I need to write a query that gets the six bands with the most members and, for each of those bands, the longest song's duration and its title.
So far, I can get the biggest bands:
SELECT bands.name, COUNT(id_musician) AS numberMusician
FROM bands
INNER JOIN members USING (id_band)
GROUP BY bands.name
ORDER BY numberMusician DESC
LIMIT 6;
I can also get the longest songs:
SELECT MAX(duration), songs.title, id_album, id_band
FROM songs
INNER JOIN albums USING (id_album)
GROUP BY songs.title, id_album, id_band
ORDER BY MAX(duration) DESC
The problem occurs when I am trying to write a subquery to get the band with the corresponding song and its duration. Trying to do it with inner joins also gets me undesired results. Could someone help me?
I have tried to put the subquery in the where, but I can't find how to do it due to MAX.
Thanks
I find that using lateral joins makes the query easier to write. You already have the join logic all right, so we just need to correlate the bands with the musicians and the songs.
So:
select b.name, m.*, s.*
from bands b
cross join lateral (
select count(*) as cnt_musicians
from members m
where m.id_band = b.id_band
) m
cross join lateral (
select s.title, s.duration
from songs s
inner join albums a using (id_album)
where a.id_band = b.id_band
order by s.duration desc limit 1
) s
order by m.cnt_musicians desc
limit 6
For each band, subquery m counts the musicians in the band (its where clause correlates to the outer query), while s retrieves the longest song, using correlation, order by and limit. The outer query just combines the information, then orders the result and selects the top 6 bands.
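If your engine has no LATERAL, correlated scalar subqueries give a similar result. This is a rough sketch run through Python's sqlite3 module; the sample rows, the top_n variable, the output aliases, and storing duration as integer seconds are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bands   (id_band INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE members (id_musician INTEGER, id_band INTEGER, instrument TEXT);
CREATE TABLE albums  (id_album INTEGER PRIMARY KEY, title TEXT, id_band INTEGER);
CREATE TABLE songs   (id_song INTEGER PRIMARY KEY, title TEXT, duration INTEGER, id_album INTEGER);
-- Hypothetical sample data
INSERT INTO bands VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO members VALUES (1, 1, 'guitar'), (2, 1, 'bass'), (3, 1, 'drums'),
                           (4, 2, 'guitar'), (5, 2, 'vocals'),
                           (6, 3, 'piano');
INSERT INTO albums VALUES (1, 'A1', 1), (2, 'B1', 2), (3, 'G1', 3);
INSERT INTO songs VALUES (1, 'Short', 120, 1), (2, 'Epic', 600, 1),
                         (3, 'Mid', 300, 2), (4, 'Tiny', 60, 3);
""")

query = """
SELECT b.name,
       (SELECT COUNT(*) FROM members m WHERE m.id_band = b.id_band) AS cnt_musicians,
       (SELECT s.title
          FROM songs s JOIN albums a ON a.id_album = s.id_album
         WHERE a.id_band = b.id_band
         ORDER BY s.duration DESC LIMIT 1) AS longest_title,
       (SELECT MAX(s.duration)
          FROM songs s JOIN albums a ON a.id_album = s.id_album
         WHERE a.id_band = b.id_band) AS longest_duration
FROM bands b
ORDER BY cnt_musicians DESC
LIMIT ?
"""
top_n = 2  # the question asks for 6; 2 keeps the sample output short
rows = conn.execute(query, (top_n,)).fetchall()
print(rows)
```

One difference from the lateral version: a band without any songs is kept here with NULL song columns, whereas the cross join lateral would drop it.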

postgres STRING_AGG() returns duplicates?

I have seen some similar posts requesting advice on getting distinct results from the query. This can be solved with a subquery, but the column I am aggregating, image_name, is unique (image_name VARCHAR(40) NOT NULL UNIQUE), so I don't believe that should be necessary.
This is the data in the spot_images table
spotdk=# select * from spot_images;
id | user_id | spot_id | image_name
----+---------+---------+--------------------------------------
1 | 1 | 1 | 81198013-e8f8-4baa-aece-6fbda15a0498
2 | 1 | 1 | 21b78e4e-f2e4-4d66-961f-83e5c28d69c5
3 | 1 | 1 | 59834585-8c49-4cdf-95e4-38c437acb3c1
4 | 1 | 1 | 0a42c962-2445-4b3b-97a6-325d344fda4a
(4 rows)
SELECT Round(Avg(ratings.rating), 2) AS rating,
spots.*,
String_agg(spot_images.image_name, ',') AS imageNames
FROM spots
FULL OUTER JOIN ratings
ON ratings.spot_id = spots.id
INNER JOIN spot_images
ON spot_images.spot_id = spots.id
WHERE spots.id = 1
GROUP BY spots.id;
This is the result of the images row:
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a
Not with linebreaks, I added them for visibility.
What should I do to retrieve each image_name only once?
If you don't want duplicates, use DISTINCT:
String_agg(distinct spot_images.image_name, ',') AS imageNames
Likely, there are several rows in ratings that match the given spot, and several rows in spot_images that match the given spot as well. As a result, rows are getting duplicated.
One option to avoid that is to aggregate in subqueries:
SELECT r.avg_rating,
s.*,
si.image_names
FROM spots s
FULL OUTER JOIN (
SELECT spot_id, Round(Avg(ratings.rating), 2) avg_rating
FROM ratings
GROUP BY spot_id
) r ON r.spot_id = s.id
INNER JOIN (
SELECT spot_id, string_agg(spot_images.image_name, ',') image_names
FROM spot_images
GROUP BY spot_id
) si ON si.spot_id = s.id
WHERE s.id = 1
This actually could be more efficient than outer aggregation.
Note: it is hard to tell without seeing your data, but I am unsure that you really need a FULL JOIN here. A LEFT JOIN might actually be what you want.
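To make the row multiplication concrete, here is a small sketch using Python's sqlite3 module. It is not the asker's database: it assumes three ratings for the spot (which would explain 4 image names appearing 12 times), uses short made-up image names, swaps string_agg for SQLite's group_concat, and uses a LEFT JOIN for ratings as suggested in the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spots       (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ratings     (id INTEGER PRIMARY KEY, spot_id INTEGER, rating INTEGER);
CREATE TABLE spot_images (id INTEGER PRIMARY KEY, spot_id INTEGER, image_name TEXT UNIQUE);
-- Hypothetical sample data: 3 ratings and 4 images for spot 1
INSERT INTO spots VALUES (1, 'Harbour');
INSERT INTO ratings (spot_id, rating) VALUES (1, 3), (1, 4), (1, 5);
INSERT INTO spot_images (spot_id, image_name) VALUES (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd');
""")

# Joining both child tables directly yields 3 x 4 = 12 rows per spot,
# so every image name is aggregated once per rating.
naive = conn.execute("""
SELECT group_concat(si.image_name, ',')
FROM spots s
JOIN ratings r      ON r.spot_id = s.id
JOIN spot_images si ON si.spot_id = s.id
WHERE s.id = 1
GROUP BY s.id
""").fetchone()[0]

# Aggregating each child table first leaves one row per spot on each side.
avg_rating, image_names = conn.execute("""
SELECT r.avg_rating, si.image_names
FROM spots s
LEFT JOIN (
    SELECT spot_id, ROUND(AVG(rating), 2) AS avg_rating
    FROM ratings GROUP BY spot_id
) r ON r.spot_id = s.id
JOIN (
    SELECT spot_id, group_concat(image_name, ',') AS image_names
    FROM spot_images GROUP BY spot_id
) si ON si.spot_id = s.id
WHERE s.id = 1
""").fetchone()

print(len(naive.split(',')), avg_rating, image_names)
```

Note that the average is unaffected by the duplication (every rating is repeated the same number of times), which is why only the aggregated string looks wrong in the original query.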

How to group results by count of relationships

Given tables, Profiles, and Memberships where a profile has many memberships, how do I query profiles based on the number of memberships?
For example, I want to get the number of profiles with 2 memberships. I can get the number of memberships for each profile with:
SELECT "memberships"."profile_id", COUNT("profiles"."id") AS "membership_count"
FROM "profiles"
INNER JOIN "memberships" on "profiles"."id" = "memberships"."profile_id"
GROUP BY "memberships"."profile_id"
That returns results like
profile_id | membership_count
_____________________________
1 2
2 5
3 2
...
But how do I group and sum the counts to get the query to return results like:
n | profiles_with_n_memberships
_____________________________
1 36
2 28
3 29
...
Or even just a query for a single value of n that would return
profiles_with_2_memberships
___________________________
28
I don't have your sample data, but I just recreated the scenario here with a single table : Demo
You could LEFT JOIN the counts with generate_series() and get zeroes for missing count of n memberships. If you don't want zeros, just use the second query.
Query1
WITH c
AS (
SELECT profile_id
,count(*) ct
FROM Table1
GROUP BY profile_id
)
,m
AS (
SELECT MAX(ct) AS max_ct
FROM c
)
SELECT n
,COUNT(c.profile_id)
FROM m
CROSS JOIN generate_series(1, m.max_ct) AS i(n)
LEFT JOIN c ON c.ct = i.n
GROUP BY n
ORDER BY n;
Query2
WITH c
AS (
SELECT profile_id
,count(*) ct
FROM Table1
GROUP BY profile_id
)
SELECT ct
,COUNT(*)
FROM c
GROUP BY ct
ORDER BY ct;
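Here is a quick way to try the second query, sketched with Python's sqlite3 module. The memberships table and its rows are made up for the example (the answer above uses a single table named Table1), and the output aliases n and profiles_with_n_memberships are taken from the question's desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memberships (id INTEGER PRIMARY KEY, profile_id INTEGER);
-- Hypothetical data: profile 1 has 2 memberships, 2 has 1, 3 has 2, 4 has 3
INSERT INTO memberships (profile_id) VALUES
  (1), (1),
  (2),
  (3), (3),
  (4), (4), (4);
""")

query = """
WITH c AS (
    -- first level: memberships per profile
    SELECT profile_id, COUNT(*) AS ct
    FROM memberships
    GROUP BY profile_id
)
-- second level: how many profiles share each count
SELECT ct AS n, COUNT(*) AS profiles_with_n_memberships
FROM c
GROUP BY ct
ORDER BY ct
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For a single value of n, adding `WHERE ct = 2` to the outer select returns just that one row.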

Left join doesn't return all results from first table [duplicate]

I have a table that is used for reference (number), as follows
----------------------------
Id | Name | Description |
----------------------------
1 One The number one
2 Two The number two
3 Three The number three
And then another table (user_number) that references these values but for specific users
--------------------------
Id | User_Id | Number_Id |
--------------------------
1 400 1
2 400 2
I want to retrieve all the results of the first table but I want to see where they match up also. I have tried to use the following query but it only returns what an INNER JOIN would return
SELECT n.Id, n.Name, n.Description, un.Id As Active_Number
FROM number n
LEFT JOIN user_number un
ON n.Id = un.Id
WHERE un.User_Id = 400;
This query returns:
[
{
"Id": 1,
"Name": "One",
"Description": "The number one",
"Active_Number": 1
},
{
"Id": 2,
"Name": "Two",
"Description": "The number two",
"Active_Number": 2
}
]
However this doesn't return the third set of values from the number table.
Your WHERE clause is making this into an INNER JOIN. Put that in the ON clause instead:
SELECT n.Id, n.Name, n.Description, un.Id As Active_Number
FROM number n
LEFT JOIN user_number un
ON n.Id = un.Id
AND un.User_Id = 400;
Don't know what DB you're using but try
SELECT n.Id, n.Name, n.Description, un.Id As Active_Number
FROM number n
LEFT JOIN user_number un
ON n.Id = un.Id and un.User_Id = 400;
Because in your where condition, you are eliminating all records where User ID is not 400, which also removes every row of the left table that has no match in the right table (its User_Id is NULL).
Because of where the where condition is placed, the filter applies to the whole query, AFTER the join. What you should do instead is this:
SELECT n.Id, n.Name, n.Description, un.Id As Active_Number
FROM number n
LEFT JOIN (select * from user_number
WHERE User_Id = 400 ) un
ON n.Id = un.Id;
EDIT: the other responder's answer is the better choice, which eluded me when I was typing up this code - which works also, but is not the ideal solution.
Your LEFT JOIN does return all results from the first table. Here's the result of the LEFT JOIN:
-------------------------------------------------------
Id | Name | Description | Id | User_Id | Number_Id |
-------------------------------------------------------
1 One The number one 1 400 1
2 Two The number two 2 400 2
3 Three The number three NULL NULL NULL
Here, if you filter on User_Id = 400, it should be obvious why the third row is excluded.
A common suggestion when seeing these results is changing the filter to WHERE User_Id IS NULL OR User_Id = 400. Don't do this.
Suppose you have other records in your second table with a different user ID, so that the left join result looks like:
-------------------------------------------------------
Id | Name | Description | Id | User_Id | Number_Id |
-------------------------------------------------------
1 One The number one 1 400 1
2 Two The number two 2 400 2
2 Two The number two 3 500 2
3 Three The number three 4 500 3
Here, it should be obvious that again, you'd be removing the last row.
The other answers you've received about moving the WHERE condition to the join condition will work, but a more logical, IMO, approach is to use a subquery:
SELECT n.Id, n.Name, n.Description, un.Id As Active_Number
FROM number n
LEFT JOIN (
SELECT *
FROM user_number
WHERE user_id = 400
) AS un
ON n.Id = un.Id;
This reduces the data set to which you're joining, to only the data to which you actually want to join. But the result is the same as putting the user_id = 400 check in the join condition.
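The three filter placements discussed above can be compared side by side. This sketch uses Python's sqlite3 module with the second sample data set (including the user 500 rows). One assumption to flag: it joins on Number_Id, which is what the sample join results shown above imply, rather than on un.Id as written in the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE number      (Id INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
CREATE TABLE user_number (Id INTEGER PRIMARY KEY, User_Id INTEGER, Number_Id INTEGER);
INSERT INTO number VALUES (1, 'One', 'The number one'),
                          (2, 'Two', 'The number two'),
                          (3, 'Three', 'The number three');
INSERT INTO user_number VALUES (1, 400, 1), (2, 400, 2), (3, 500, 2), (4, 500, 3);
""")

base = """
SELECT n.Id, un.Id AS Active_Number
FROM number n
LEFT JOIN user_number un ON n.Id = un.Number_Id {on_extra}
{where}
ORDER BY n.Id
"""

def run(on_extra="", where=""):
    return conn.execute(base.format(on_extra=on_extra, where=where)).fetchall()

# Filter in WHERE: number 3 is lost, the join behaves like an INNER JOIN.
where_filter = run(where="WHERE un.User_Id = 400")
# "IS NULL OR = 400": number 3 matched a user 500 row, so it is still lost.
null_or = run(where="WHERE un.User_Id IS NULL OR un.User_Id = 400")
# Filter in ON: all three numbers survive, number 3 with a NULL match.
on_filter = run(on_extra="AND un.User_Id = 400")

print(where_filter, null_or, on_filter)
```

Only the last variant keeps number 3, with None (NULL) in the Active_Number column.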
Always put the join conditions in the ON clause. If you are doing an inner join, do not add any filter conditions to the ON clause; put them in the WHERE clause.
If you are doing a left join, add any where conditions to the on clause for the table in the right side of the join. This is a must because adding a where clause that references the right side of the join will convert the join to an inner join (With one exception described below).
SELECT n.Id, n.Name, n.Description, un.Id As Active_Number
FROM number n
LEFT JOIN user_number un
ON n.Id = un.Id
AND un.User_Id = 400;
Remove the WHERE condition and it will work as you intended. What you are doing basically is getting the left join correctly and then filtering out the 3rd row.
If you want to get all 3 results you can do:
SELECT n.Id, n.Name, n.Description, un.Id As Active_Number
FROM number n
LEFT JOIN user_number un
ON n.Id = un.Id
WHERE n.Id < 4;
In other words, the table you should be filtering is the left one on the left join (number).

SQL select two rows from same table (same column, in fact), based on join with other table

I have two tables: votes and submissions. votes stores all the received votes (each person can give a first and second vote), submissions holds the items people can vote for. Each item has its own ID. The votes table stores the ID of each vote. I want to retrieve the names of the items people voted for. Here's an example of what the tables look like:
votes:
**voter** | **vote1_ID** | **vote2_ID**
Foo | 1 | 2
Bar | 3 | 2
Mark | 2 | 3
submissions:
**ID** | **name**
1 | John
2 | Jane
3 | Mary
As stated, I want to retrieve both the name associated with the first vote and the name associated with the second vote within one query (in fact, I don't really care how many queries it takes, but a single query is always nicer and cleaner of course). How would I go about doing this? I already figured I need to use a join, but I can't figure out how to retrieve the value from the same column twice.
EDIT: I figured giving an example of what query I'm trying to perform might be useful:
For example, if I want to see what Bar has voted for, the result of the query should contain submissions.name twice. In the case of Mark, this is Jane and Mary.
You can do two inner joins to select the separate values.
SELECT s1.name, s2.name
FROM votes v
INNER JOIN submissions s1 ON v.vote1_ID = s1.ID
INNER JOIN submissions s2 ON v.vote2_ID = s2.ID
You have to join the submissions table twice to get the expected result.
select v.voter, s1.name, s2.name
from votes v
join submissions s1 on v.vote1_id = s1.id
join submissions s2 on v.vote2_id = s2.id
And if you want rows instead of columns, you could do two joins and union them together:
select v.voter, s1.name
from votes v
join submissions s1 on v.vote1_id = s1.id
UNION ALL
select v.voter, s2.name
from votes v
join submissions s2 on v.vote2_id = s2.id
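Here is the double join run against the sample data from the question, as a small sketch using Python's sqlite3 module. The column aliases first_vote and second_vote are additions for readability; without them both output columns would be called name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submissions (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE votes (voter TEXT, vote1_ID INTEGER, vote2_ID INTEGER);
INSERT INTO submissions VALUES (1, 'John'), (2, 'Jane'), (3, 'Mary');
INSERT INTO votes VALUES ('Foo', 1, 2), ('Bar', 3, 2), ('Mark', 2, 3);
""")

query = """
SELECT v.voter, s1.name AS first_vote, s2.name AS second_vote
FROM votes v
JOIN submissions s1 ON v.vote1_ID = s1.ID   -- first alias resolves vote1_ID
JOIN submissions s2 ON v.vote2_ID = s2.ID   -- second alias resolves vote2_ID
ORDER BY v.voter
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each alias (s1, s2) acts as an independent copy of submissions, which is what lets the same name column be read twice in one row.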