I have the following piece of code:
select sum(max(p.highest_market_value_in_eur)) as value, c.name from Transfermarket.dbo.players$ p
left join Transfermarket.dbo.player_valuations v
on p.player_id = v.player_id
inner join Transfermarket.dbo.competitions$ c
on c.competition_id = v.player_club_domestic_competition_id
where c.name = 'Ligue 1' and last_season = 2022
group by c.name;
Apparently, performing aggregate functions on expressions containing an aggregate doesn't work. Is there any other way? I suppose I could use a subquery, but I am not sure how to do that.
So you're going to want to do this in two steps.
If I understand your question correctly, you want to find the maximum p.highest_market_value_in_eur for each player in the players table for a given c.name where last_season is 2022, and then sum those maxima over all players in that competition.
Let's find our max first:
select
max(p.highest_market_value_in_eur) as value,
p.player_id as player_id
from
Transfermarket.dbo.players$ p
left join Transfermarket.dbo.player_valuations v on p.player_id = v.player_id
inner join Transfermarket.dbo.competitions$ c on c.competition_id = v.player_club_domestic_competition_id
where c.name = 'Ligue 1' and last_season = 2022
group by p.player_id;
The above query will find the max value of p.highest_market_value_in_eur for each player where the competition name is 'Ligue 1' and the last season is 2022. Notice that the first step here is grouping on a different value. This is because we want the max PER PLAYER - so we ask for the max and group by player.
Now that we've found the max for every player in Ligue 1 from 2022, let's calculate the sum to find the 'Total Market Value in EUR for all players competing in Ligue 1 from 2022'. We do this by calling the sum() aggregate function on the results of the first query, which we wrap in a subquery:
select
sum(value)
from
(select
max(p.highest_market_value_in_eur) as value,
p.player_id as player_id
from
Transfermarket.dbo.players$ p
left join Transfermarket.dbo.player_valuations v on p.player_id = v.player_id
inner join Transfermarket.dbo.competitions$ c on c.competition_id = v.player_club_domestic_competition_id
where c.name = 'Ligue 1' and last_season = 2022
group by p.player_id) max_per_player;
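To see why the two-step form matters, here is a minimal sketch using Python's built-in sqlite3 module. The tiny tables and their values are made up for illustration (they are simplified stand-ins, not the real Transfermarket schema); the point is only that a join can repeat a player, and that the inner MAX ... GROUP BY collapses each player to one row before the outer SUM runs.

```python
import sqlite3

# Hypothetical miniature tables; simplified stand-ins for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_id INTEGER, highest_market_value_in_eur INTEGER);
    CREATE TABLE player_valuations (player_id INTEGER, competition_id TEXT);
    INSERT INTO players VALUES (1, 100), (2, 50);
    -- Player 1 has two valuation rows, so the join repeats that player.
    INSERT INTO player_valuations VALUES (1, 'FR1'), (1, 'FR1'), (2, 'FR1');
""")

# Summing directly over the joined rows counts player 1 twice.
naive_total = conn.execute("""
    SELECT SUM(p.highest_market_value_in_eur)
    FROM players p
    JOIN player_valuations v ON v.player_id = p.player_id
""").fetchone()[0]

# Inner query: one row per player (the max). Outer query: sum of those rows.
nested_total = conn.execute("""
    SELECT SUM(value)
    FROM (
        SELECT MAX(p.highest_market_value_in_eur) AS value
        FROM players p
        JOIN player_valuations v ON v.player_id = p.player_id
        GROUP BY p.player_id
    ) max_per_player
""").fetchone()[0]

print(naive_total, nested_total)
```

With this sample data the naive sum is inflated by the repeated player, while the nested version adds each player's value once.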
Hope this helps.
Related
I am using the basic chinook database and I am trying to get a query that will display the worst selling genres. I am mostly getting the answer, however there is one genre 'Opera' that has 0 sales, but the query result is ignoring that and moving on to the next lowest non-zero value.
I tried using left join instead of inner join but that returns different values.
This is my query currently:
create view max
as
select distinct
t1.name as genre,
count(*) as Sales
from
tracks t2
inner join
invoice_items t3 on t2.trackid == t3.trackid
left join
genres as t1 on t1.genreid == t2.genreid
group by
t1.genreid
order by
2
limit 10;
The result however skips past the opera value which is 0 sales. How can I include that? I tried using left join but it yields different results.
Any help is appreciated.
If you want to include genres with no sales then you should start the joins from genres and then do LEFT joins to the other tables.
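Also, you should not use count(*), which counts every row in the result set, including the NULL-filled rows that the LEFT joins produce for genres without sales. Count a column from invoice_items instead: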
SELECT g.name Genre,
COUNT(i.trackid) Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10;
There is no need for the keyword DISTINCT, since the query returns 1 row for each genre.
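As a rough illustration of the count(*) point, the sketch below builds a three-table toy version of the schema in an in-memory SQLite database (the rows are invented for the example) and puts COUNT(*) next to COUNT(i.trackid) in the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy data (assumed for illustration): Opera has a track but no sales.
    CREATE TABLE genres (genreid INTEGER, name TEXT);
    CREATE TABLE tracks (trackid INTEGER, genreid INTEGER);
    CREATE TABLE invoice_items (trackid INTEGER);
    INSERT INTO genres VALUES (1, 'Rock'), (2, 'Opera');
    INSERT INTO tracks VALUES (10, 1), (20, 2);
    INSERT INTO invoice_items VALUES (10), (10);
""")

rows = conn.execute("""
    SELECT g.name,
           COUNT(*)         AS all_rows,  -- counts the NULL-extended row too
           COUNT(i.trackid) AS sales      -- ignores rows where i.trackid IS NULL
    FROM genres g
    LEFT JOIN tracks t ON t.genreid = g.genreid
    LEFT JOIN invoice_items i ON i.trackid = t.trackid
    GROUP BY g.genreid
    ORDER BY sales
""").fetchall()

for row in rows:
    print(row)
```

Here Opera shows 1 under all_rows but 0 under sales, which is the difference the answer is pointing at.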
When asking for the top n one must always state how to deal with ties. If I am looking for the top 1, but there are three rows in the table, all with the same value, shall I select 3 rows? Zero rows? One row arbitrarily chosen? Most often we don't want arbitrary results, which excludes the last option. This excludes LIMIT, too, because LIMIT has no clause for ties in SQLite.
Here is an example with DENSE_RANK instead. You are looking for the worst selling genres, so we must probably look at the revenue per genre, which is the sum of price x quantity sold. In order to include genres without invoices (and maybe even without tracks?) we outer join this data to the genre table.
select total, genre_name
from
(
select
g.name as genre_name,
coalesce(sum(ii.unit_price * ii.quantity), 0) as total,
dense_rank() over (order by coalesce(sum(ii.unit_price * ii.quantity), 0)) as rnk
from genres g
left join tracks t on t.genreid = g.genreid
left join invoice_items ii on ii.trackid = t.trackid
group by g.name
) aggregated
where rnk <= 10
order by total, genre_name;
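The difference between LIMIT and DENSE_RANK on ties can be tried out with a small sketch like the one below. The genre_totals table is a hypothetical, pre-aggregated stand-in for the derived table above, and the example assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical pre-aggregated totals; Latin and Blues tie at 5.
    CREATE TABLE genre_totals (genre_name TEXT, total INTEGER);
    INSERT INTO genre_totals VALUES
        ('Opera', 0), ('Latin', 5), ('Blues', 5), ('Rock', 9);
""")

# LIMIT 2 returns exactly two rows; which of the tied genres is kept is arbitrary.
limited = conn.execute("""
    SELECT genre_name FROM genre_totals ORDER BY total LIMIT 2
""").fetchall()

# DENSE_RANK gives tied rows the same rank, so both tied genres survive rnk <= 2.
ranked = conn.execute("""
    SELECT genre_name, total
    FROM (
        SELECT genre_name, total,
               DENSE_RANK() OVER (ORDER BY total) AS rnk
        FROM genre_totals
    ) r
    WHERE rnk <= 2
    ORDER BY total, genre_name
""").fetchall()

print(limited)
print(ranked)
```

Asking for the "two worst" ranks returns three genres here, because two of them share the second-lowest total.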
I'm making a select in which I give a year (hardcoded as 1981 below) and I expect to get one row per qualifying band. The main problem is to get the oldest living member for each band:
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
(SELECT name FROM MUSICIAN WHERE year_death IS NULL ORDER BY(birth)LIMIT 1)
FROM BAND b
LEFT JOIN ALBUM a ON(b.id_band = a.id_band)
LEFT JOIN SONG s ON(a.id_album = s.id_album)
JOIN MEMBER m ON(b.id_band= m.id_band)
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
/*LEFT JOIN(SELECT name FROM MUSICIAN WHERE year_death IS NULL
ORDER BY(birth) LIMIT 1) AS alive FROM mu*/ -- ??
WHERE b.year_formed = 1981
GROUP BY b.id_band;
I would like to obtain the oldest living member from mu for each band. But I just get the oldest musician overall from the relation MUSICIAN.
Here is a screenshot showing the output of my current query:
Well, I think you can follow the structure that you have, but you need JOINs in the subquery.
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT mem.id_musician),
(SELECT m.name
FROM MUSICIAN m JOIN
MEMBER mem
ON mem.id_musician = m.id_musician
WHERE m.year_death IS NULL AND mem.id_band = b.id_band
ORDER BY m.birth
LIMIT 1
) as oldest_member
FROM BAND b LEFT JOIN
ALBUM a
ON b.id_band = a.id_band LEFT JOIN
SONG s
ON a.id_album = s.id_album LEFT JOIN
MEMBER mem
ON mem.id_band = b.id_band
WHERE b.year_formed = 1981
GROUP BY b.id_band
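To check the effect of correlating the subquery with the outer band, here is a small sketch on made-up data, run through Python's sqlite3 module. Table and column names follow the question; the rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE band (id_band INTEGER, year_formed INTEGER);
    CREATE TABLE musician (id_musician INTEGER, name TEXT, birth INTEGER, year_death INTEGER);
    CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
    INSERT INTO band VALUES (1, 1981), (2, 1981);
    INSERT INTO musician VALUES
        (1, 'Ann', 1950, 2001),   -- oldest in band 1, but deceased
        (2, 'Bob', 1955, NULL),
        (3, 'Cy',  1960, NULL),
        (4, 'Dee', 1940, NULL);   -- oldest living musician overall
    INSERT INTO member VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# Uncorrelated subquery: the same musician is returned for every band.
uncorrelated = conn.execute("""
    SELECT b.id_band,
           (SELECT name FROM musician
            WHERE year_death IS NULL ORDER BY birth LIMIT 1)
    FROM band b
    ORDER BY b.id_band
""").fetchall()

# Correlated subquery: mem.id_band = b.id_band ties the lookup to the outer row.
correlated = conn.execute("""
    SELECT b.id_band,
           (SELECT m.name
            FROM musician m
            JOIN member mem ON mem.id_musician = m.id_musician
            WHERE m.year_death IS NULL
              AND mem.id_band = b.id_band
            ORDER BY m.birth
            LIMIT 1) AS oldest_member
    FROM band b
    WHERE b.year_formed = 1981
    ORDER BY b.id_band
""").fetchall()

print(uncorrelated)
print(correlated)
```

The first query reproduces the problem from the question (one name for all bands); the second returns a different member per band.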
The following query returns the oldest living member of each band. Add a filter on year_formed = 1981 if you need it.
SELECT
c.id_band,
c.total_albums,
c.total_songs,
c.total_musicians,
o.name AS oldest_living_member
FROM
(
SELECT b.id_band,
COUNT(DISTINCT a.id_album) as total_albums,
COUNT(DISTINCT s.id_song) as total_songs,
COUNT(DISTINCT m.id_musician) as total_musicians
FROM BAND b
LEFT JOIN ALBUM a ON(b.id_band = a.id_band)
LEFT JOIN SONG s ON(a.id_album = s.id_album)
JOIN MEMBER m ON(b.id_band= m.id_band)
GROUP BY b.id_band
) c
LEFT JOIN
(
-- rank the living members of each band by birth; rank 1 is the oldest (ties share a rank)
SELECT m.id_band, mu.name,
dense_rank() over (partition by m.id_band order by mu.birth) as rnk
FROM MEMBER m
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
WHERE mu.year_death is NULL
) o
ON o.id_band = c.id_band AND o.rnk = 1
You can reference a table from the outer query inside the nested select, like so:
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
(SELECT mu2.name FROM MUSICIAN mu2 JOIN MEMBER m2 ON(m2.id_musician = mu2.id_musician)
WHERE mu2.year_death IS NULL AND m2.id_band = b.id_band ORDER BY(mu2.birth) LIMIT 1)
FROM BAND b
LEFT JOIN ALBUM a ON(b.id_band = a.id_band)
LEFT JOIN SONG s ON(a.id_album = s.id_album)
JOIN MEMBER m ON(b.id_band= m.id_band)
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
/*LEFT JOIN(SELECT name FROM MUSICIAN WHERE year_death IS NULL ORDER
BY(birth)LIMIT 1) AS alive FROM mu*/
WHERE b.year_formed= 1981
GROUP BY b.id_band
For queries where you want to find the "max person by age", you can use ROW_NUMBER() partitioned by the band:
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
oldest_living_members.name
FROM
band b
LEFT JOIN album a ON(b.id_band = a.id_band)
LEFT JOIN song s ON(a.id_album = s.id_album)
LEFT JOIN member m ON(b.id_band = m.id_band)
LEFT JOIN
(
SELECT
m.id_band,
mu.name,
ROW_NUMBER() OVER(PARTITION BY m.id_band ORDER BY mu.birthdate ASC) rown
FROM
MEMBER m
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
WHERE mu.year_death IS NULL
) oldest_living_members
ON
b.id_band = oldest_living_members.id_band AND
oldest_living_members.rown = 1
WHERE b.year_formed= 1981
GROUP BY b.id_band, oldest_living_members.name
If you run just the subquery, you'll see how it works: musicians are joined to member to get the band id, and the band id forms the partition. ROW_NUMBER() numbers the rows from 1 in order of birthdate (I didn't know what your column name for the birthday was; you'll have to edit it), so the oldest person (earliest birthdate) gets a 1. Every time the band id changes, the numbering restarts from 1 with the oldest person in that band. Then, when we join it, we just pick the 1s.
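Running only the numbering subquery on a few invented rows shows the restart per band. This is a sketch with Python's sqlite3 module; the birthdate column name and the sample musicians are assumptions, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows are made up; birthdate is an assumed column name.
    CREATE TABLE musician (id_musician INTEGER, name TEXT, birthdate TEXT, year_death INTEGER);
    CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
    INSERT INTO musician VALUES
        (1, 'Ann', '1950-01-01', 2001),
        (2, 'Bob', '1955-01-01', NULL),
        (3, 'Cy',  '1960-05-05', NULL),
        (4, 'Dee', '1940-03-03', NULL),
        (5, 'Eve', '1970-07-07', NULL);
    INSERT INTO member VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

# Numbering restarts at 1 for each id_band; the earliest birthdate gets 1.
numbered = conn.execute("""
    SELECT m.id_band, mu.name,
           ROW_NUMBER() OVER (PARTITION BY m.id_band ORDER BY mu.birthdate) AS rown
    FROM member m
    JOIN musician mu ON m.id_musician = mu.id_musician
    WHERE mu.year_death IS NULL
    ORDER BY m.id_band, rown
""").fetchall()

# Keeping only rown = 1 leaves the oldest living member of each band.
oldest = [(band, name) for band, name, rown in numbered if rown == 1]

print(numbered)
print(oldest)
```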
I think this should be considerably faster (while also solving your problem):
SELECT b.id_band, a.*, m.*
FROM band b
LEFT JOIN LATERAL (
SELECT count(*) AS ct_albums, sum(ct_songs) AS ct_songs
FROM (
SELECT id_album, count(s.id_song) AS ct_songs  -- count(*) would report 1 for an album without songs
FROM album a
LEFT JOIN song s USING (id_album)
WHERE a.id_band = b.id_band
GROUP BY 1
) ab
) a ON true
LEFT JOIN LATERAL (
SELECT count(*) OVER () AS ct_musicians
, name AS senior_member -- any other columns you need?
FROM member m
JOIN musician mu USING (id_musician)
WHERE m.id_band = b.id_band
ORDER BY year_death IS NOT NULL -- sorts the living first
, birth
, name -- as tiebreaker (my optional addition)
LIMIT 1
) m ON true
WHERE b.year_formed = 1981;
Getting the senior band member is solved in the LATERAL subquery m - without multiplying the cost for the base query. It works because the window function count(*) OVER () is computed before ORDER BY and LIMIT are applied. Since bands naturally only have few members, this should be the fastest possible way. See:
Best way to get result count before LIMIT was applied
What is the difference between LATERAL and a subquery in PostgreSQL?
Prevent duplicate values in LEFT JOIN
The other optimization for counting albums and songs builds on the assumption that the same id_song is never included in multiple albums of the same band. Else, those are counted multiple times. (Easily fixed, and uncorrelated to the task of getting the senior band member.)
The point is to eliminate the need for DISTINCT at the top level after multiplying rows at the N-side repeatedly (I like to call that "proxy cross join"). That would produce a possibly huge number of rows in the derived table without need.
Plus, it's much more convenient to retrieve additional columns (like more columns for the senior band member) than with some other query styles.
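The row multiplication ("proxy cross join") can be made visible with a small sketch. It uses Python's sqlite3 module on invented data; since SQLite has no LATERAL, plain scalar subqueries stand in for the separate per-band aggregations, so this illustrates the idea rather than the PostgreSQL query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up band 1: 2 albums, 3 songs per album, 4 members.
    CREATE TABLE album (id_album INTEGER, id_band INTEGER);
    CREATE TABLE song (id_song INTEGER, id_album INTEGER);
    CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
    INSERT INTO album VALUES (1, 1), (2, 1);
    INSERT INTO song VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
    INSERT INTO member VALUES (1, 1), (1, 2), (1, 3), (1, 4);
""")

# Joining both N-sides multiplies rows: 6 album/song rows x 4 members.
joined = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT a.id_album),
           COUNT(DISTINCT s.id_song),
           COUNT(DISTINCT m.id_musician)
    FROM album a
    JOIN song s ON s.id_album = a.id_album
    JOIN member m ON m.id_band = a.id_band
    WHERE a.id_band = 1
""").fetchone()

# Aggregating each side on its own never builds the multiplied row set.
separate = conn.execute("""
    SELECT (SELECT COUNT(*) FROM album WHERE id_band = 1),
           (SELECT COUNT(*) FROM song s
            JOIN album a ON a.id_album = s.id_album
            WHERE a.id_band = 1),
           (SELECT COUNT(*) FROM member WHERE id_band = 1)
""").fetchone()

print(joined)
print(separate)
```

Both approaches agree on the counts, but the joined form has to de-duplicate 24 intermediate rows to get there.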
I have this query:
SELECT AVG(legs.avg) FROM legs INNER JOIN matchs ON matchs.id = legs.match_id WHERE matchs.player_id=4 GROUP BY match_id
Which allows me to get the average of the attribute "legs.avg".
The problem is that I get several results for this query, one for each matchs.id.
I need to get the average of these different results, so only one row with the total average.
Is that possible?
One approach is to get the "average of averages":
SELECT avg(l_avg)
FROM (SELECT AVG(l.avg) as l_avg
FROM legs l INNER JOIN
matchs m
ON m.id = l.match_id
WHERE m.player_id = 4
GROUP BY l.match_id
) lm;
There are two other approaches with no subquery:
SELECT AVG(l.avg) as l_avg
FROM legs l INNER JOIN
matchs m
ON m.id = l.match_id
WHERE m.player_id = 4;
Or:
SELECT SUM(l.avg) / COUNT(DISTINCT l.match_id) as l_avg
FROM legs l INNER JOIN
matchs m
ON m.id = l.match_id
WHERE m.player_id = 4;
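These do not return the same value. The first is the overall average over all legs, so matches with more legs carry more weight. The second divides the sum of all leg averages by the number of distinct matches, which gives the average per-match total; in general it agrees with the first query (the one with the subquery) only when every match has exactly one leg.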
Without sample data, it is not clear which version you really want.
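For a concrete feel of what each variant computes, here is a sketch on a tiny invented data set (one match with three legs, one match with a single leg), run with Python's sqlite3 module. The numbers are assumptions chosen so the results are easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matchs (id INTEGER, player_id INTEGER);
    CREATE TABLE legs (match_id INTEGER, avg REAL);
    INSERT INTO matchs VALUES (1, 4), (2, 4);
    -- Match 1 has three legs, match 2 has one.
    INSERT INTO legs VALUES (1, 10.0), (1, 20.0), (1, 30.0), (2, 60.0);
""")

# Average of the per-match averages: (20 + 60) / 2.
avg_of_avgs = conn.execute("""
    SELECT AVG(l_avg)
    FROM (SELECT AVG(l.avg) AS l_avg
          FROM legs l JOIN matchs m ON m.id = l.match_id
          WHERE m.player_id = 4
          GROUP BY l.match_id) lm
""").fetchone()[0]

# Overall average over all legs: (10 + 20 + 30 + 60) / 4.
overall_avg = conn.execute("""
    SELECT AVG(l.avg)
    FROM legs l JOIN matchs m ON m.id = l.match_id
    WHERE m.player_id = 4
""").fetchone()[0]

# Sum over all legs divided by the number of matches: 120 / 2.
sum_per_match = conn.execute("""
    SELECT SUM(l.avg) / COUNT(DISTINCT l.match_id)
    FROM legs l JOIN matchs m ON m.id = l.match_id
    WHERE m.player_id = 4
""").fetchone()[0]

print(avg_of_avgs, overall_avg, sum_per_match)
```

Running the three queries on your own data in the same way is a quick check of which definition you actually need.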
Use the query below:
select avg(leg_avg) from (SELECT AVG(legs.avg) leg_avg FROM legs INNER JOIN matchs ON matchs.id = legs.match_id WHERE matchs.player_id=4 GROUP BY match_id) a11
SELECT f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date,sum(payments.paid_amount)
FROM payments
LEFT JOIN family ON family.id = payments.family_id
LEFT JOIN teachers ON family.teacher_id = teachers.t_id
How can I get all the selected rows and the sum as a separate column? The sum function collapses the whole result into one row.
SELECT f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date
FROM payments
LEFT JOIN family ON family.id = payments.family_id
LEFT JOIN teachers ON family.teacher_id = teachers.t_id
This query is working fine without the sum column
You didn't tell the database which columns to use for aggregating the data. I don't know which database you are using, but some will complain that there is no GROUP BY clause in the SQL text.
Please try with the following query:
SELECT f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date,sum(payments.paid_amount)
FROM payments
LEFT JOIN family ON family.id = payments.family_id
LEFT JOIN teachers ON family.teacher_id = teachers.t_id
GROUP BY f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date
GROUP BY tells the database which columns are the keys of the aggregation.
If you want the total of all payments next to each row, use a subquery or join:
SELECT f_name, l_name, t.first_name, t.t_id, p.p_id, p.paid_amount, p.family_id, date,
(select sum(p2.paid_amount) from payments p2) as all_paid
FROM payments p LEFT JOIN
family f
ON f.id = p.family_id LEFT JOIN
teachers t
ON f.teacher_id = t.t_id;
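A minimal sketch of the "every row plus the grand total" idea, using Python's sqlite3 module on a made-up payments table. It shows the scalar subquery and, as an alternative that is not part of the answer above, SUM(...) OVER (), which assumes a database with window functions (SQLite 3.25 or later here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample payments.
    CREATE TABLE payments (p_id INTEGER, family_id INTEGER, paid_amount INTEGER);
    INSERT INTO payments VALUES (1, 1, 100), (2, 1, 50), (3, 2, 25);
""")

# Scalar subquery: computed over the whole table, repeated on each row.
with_subquery = conn.execute("""
    SELECT p.p_id, p.paid_amount,
           (SELECT SUM(p2.paid_amount) FROM payments p2) AS all_paid
    FROM payments p
    ORDER BY p.p_id
""").fetchall()

# Window function with an empty OVER (): same total, no row collapsing.
with_window = conn.execute("""
    SELECT p_id, paid_amount,
           SUM(paid_amount) OVER () AS all_paid
    FROM payments
    ORDER BY p_id
""").fetchall()

print(with_subquery)
print(with_window)
```

Either form keeps one output row per payment, which is what a plain sum() without GROUP BY does not do.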
SELECT f_name,l_name,t.first_name,t.t_id,p_id,paid_amount,family_id,date,sum(p.paid_amount)
FROM payments p,family f,teachers t where f.id = p.family_id and f.teacher_id = t.t_id
Group by f_name,l_name,t.first_name,t.t_id,p_id,paid_amount,family_id
You can also add the date column to the GROUP BY expression, depending on your requirement. Example:
f_name,l_name,t.first_name,t.t_id,p_id,paid_amount,family_id,date
I'm arranging a sort of Tennis Players database, and I'd like to show each country's top-scoring player. I have the table Players with a column called Country which is the Country the player is from, and a table Rating with a column called Points which is the total number of points the player scored.
Since there are multiple players from each country, I don't know how to show the player with the maximum score from each country.
I tried the following:
select
playerstbl.FirstName, playerstbl.Country, ratingtbl.Points
from
playerstbl
join
ratingtbl on playerstbl.PlayerId = ratingtbl.PlayerId
where
ratingtbl.Points = (select MAX(ratingtbl.Points)
from ratingtbl
group by playerstbl.Country);
The following query is a somewhat non-intuitive way to answer this question. It is standard SQL though:
select p.FirstName, p.Country, r.Points
from playerstbl p join
ratingtbl r
on p.PlayerId = r.PlayerId
where not exists (select 1
from playerstbl p2 join
ratingtbl r2
on p2.PlayerId = r2.PlayerId
where p2.Country = p.Country and
r2.Points > r.Points
);
And, this structure often performs best. It gets the answer to this question: "Get me all players where there is no player in the same country with more points." That is equivalent to getting the max.
For your query to work, you need to incorporate the country into the subquery:
select p.FirstName, p.Country, r.Points
from playerstbl p join
ratingtbl r
on p.PlayerId = r.PlayerId
where r.Points = (select MAX(r2.Points)
from playerstbl p2 join
ratingtbl r2
on p2.PlayerId = r2.PlayerId
where p2.Country = p.Country
);
The where clause in the subquery refers to the outer query. This is called a "correlated subquery" and is a very powerful construct in SQL. Your original query returned an error, no doubt, saying that the subquery returned more than one row. This version fixes that problem.
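Both forms can be compared side by side on a few invented players. The sketch below uses Python's sqlite3 module; table and column names follow the question, and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample players and ratings are made up.
    CREATE TABLE playerstbl (PlayerId INTEGER, FirstName TEXT, Country TEXT);
    CREATE TABLE ratingtbl (PlayerId INTEGER, Points INTEGER);
    INSERT INTO playerstbl VALUES (1, 'Ana', 'ES'), (2, 'Luis', 'ES'), (3, 'Tom', 'US');
    INSERT INTO ratingtbl VALUES (1, 900), (2, 700), (3, 800);
""")

# NOT EXISTS: keep a player if nobody from the same country has more points.
not_exists = conn.execute("""
    SELECT p.FirstName, p.Country, r.Points
    FROM playerstbl p
    JOIN ratingtbl r ON p.PlayerId = r.PlayerId
    WHERE NOT EXISTS (SELECT 1
                      FROM playerstbl p2
                      JOIN ratingtbl r2 ON p2.PlayerId = r2.PlayerId
                      WHERE p2.Country = p.Country
                        AND r2.Points > r.Points)
    ORDER BY p.Country
""").fetchall()

# Correlated MAX: compare each player's points with their country's maximum.
correlated_max = conn.execute("""
    SELECT p.FirstName, p.Country, r.Points
    FROM playerstbl p
    JOIN ratingtbl r ON p.PlayerId = r.PlayerId
    WHERE r.Points = (SELECT MAX(r2.Points)
                      FROM playerstbl p2
                      JOIN ratingtbl r2 ON p2.PlayerId = r2.PlayerId
                      WHERE p2.Country = p.Country)
    ORDER BY p.Country
""").fetchall()

print(not_exists)
print(correlated_max)
```

Note that both queries return every tied player when two players from one country share the top score.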