Postgres OFFSET the results of subqueries group by - sql

I'm calculating weekly scores for a game where certain weeks have bonuses, and when totaling your score the lowest two scores are dropped.
id  name      week  score
1   Player A  1     10
2   Player A  2     20
3   Player A  3     30
4   Player A  4     40
5   Player B  1     5
6   Player B  2     10
7   Player B  3     15
8   Player B  4     20
Let's say in week 2 your score should be doubled,
So A's scores should be [10,40,30,40] and B [5,20,15,20]
With the rules of removing the two lowest scores
A [40,40] total 80
B [20,20] total 40
If I run this query
select name, sum(special_scores) as total_score
from (
    select
        name,
        case
            when week = 2 then score * 2
            else score
        end special_scores
    from public.standings
    where name = 'Player A'
    order by special_scores
    offset 2
) s
group by name
order by total_score desc;
I see the expected result of totaling the score column and omitting the lowest two results, so I believe my subquery is correct.
However, if I remove the where clause from the subquery:
select name, sum(special_scores) as total_score
from (
    select
        name,
        case
            when week = 2 then score * 2
            else score
        end special_scores
    from public.standings
    order by special_scores
    offset 2
) s
group by name
order by total_score desc
The table will populate but will not omit the two lowest scores
So I'm getting something like
name      total_score
Player A  120
Player B  60
Could someone help as to why the offset in the second query is not removing the scores before totaling?

OFFSET is applied once to the whole subquery result, not once per player, so without the WHERE clause it only skips the two lowest rows overall. Break the problem into subproblems.
First create a subquery that computes the actual scores (doubled where required). Keep id, week and actual score.
Then use a window function (row_number()) to drop the bottom two scores.
Then sum the results by id.
Finally, join the id to this result table to get the player name too.
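A minimal sketch of these steps, run against an in-memory SQLite database from Python so it can be tried without a Postgres server (the SQL uses only constructs that Postgres also accepts). It assumes SQLite 3.25 or newer for window functions. Because the sample table has no separate player id, the sketch partitions and groups by name; the id tiebreaker in the window ordering is an added assumption to make the row numbering deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE standings (id INTEGER PRIMARY KEY, name TEXT, week INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO standings VALUES (?, ?, ?, ?)",
    [
        (1, "Player A", 1, 10), (2, "Player A", 2, 20),
        (3, "Player A", 3, 30), (4, "Player A", 4, 40),
        (5, "Player B", 1, 5), (6, "Player B", 2, 10),
        (7, "Player B", 3, 15), (8, "Player B", 4, 20),
    ],
)

query = """
WITH adjusted AS (
    -- Step 1: apply the week-2 bonus.
    SELECT id, name,
           CASE WHEN week = 2 THEN score * 2 ELSE score END AS special_score
    FROM standings
),
ranked AS (
    -- Step 2: number each player's scores from lowest to highest.
    SELECT name, special_score,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY special_score, id) AS rn
    FROM adjusted
)
-- Step 3: skip the two lowest scores per player, then total the rest.
SELECT name, SUM(special_score) AS total_score
FROM ranked
WHERE rn > 2
GROUP BY name
ORDER BY total_score DESC
"""

totals = conn.execute(query).fetchall()
print(totals)  # [('Player A', 80), ('Player B', 40)]
```

The `rn > 2` filter plays the role that `offset 2` played in the single-player query, but it restarts for every player because of the PARTITION BY.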

Related

How to consecutively count everything greater than or equal to itself in SQL?

Let's say I have a table that contains the Equipment IDs for each Equipment Type and Equipment Age. How can I do a Count Distinct of the Equipment IDs that have at least that Equipment Age?
For example, let's say this is all the data we have:
equipment_type  equipment_id  equipment_age
Screwdriver     A123          1
Screwdriver     A234          2
Screwdriver     A345          2
Screwdriver     A456          2
Screwdriver     A567          3
I would like the output to be:
equipment_type  equipment_age  count_of_equipment_at_least_this_age
Screwdriver     1              5
Screwdriver     2              4
Screwdriver     3              1
Reason is there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers at least 2 days old and only 1 screwdriver at least 3 days old.
So far I have only been able to count the equipment that falls within each equipment_age (as in the query below), not the equipment with "at least that equipment_age".
SELECT
    equipment_type,
    equipment_age,
    COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2
Consider the join-less solution below:
select distinct
    equipment_type,
    equipment_age,
    count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
    partition by equipment_type
    order by equipment_age
    range between current row and unbounded following
)
Applied to the sample data in your question, this returns the requested counts (5, 4 and 1 for ages 1, 2 and 3).
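As a hedged check of the window-frame idea, the sketch below runs an equivalent query against an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer for window functions, and it inlines the window definition instead of using a named WINDOW clause; the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment_table "
    "(equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)"
)
conn.executemany(
    "INSERT INTO equipment_table VALUES (?, ?, ?)",
    [
        ("Screwdriver", "A123", 1),
        ("Screwdriver", "A234", 2),
        ("Screwdriver", "A345", 2),
        ("Screwdriver", "A456", 2),
        ("Screwdriver", "A567", 3),
    ],
)

query = """
SELECT DISTINCT
    equipment_type,
    equipment_age,
    -- RANGE starts the frame at the first row with the same age (its peers),
    -- so every row of a given age sees all rows of that age or older.
    COUNT(*) OVER (
        PARTITION BY equipment_type
        ORDER BY equipment_age
        RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS count_of_equipment_at_least_this_age
FROM equipment_table
ORDER BY equipment_type, equipment_age
"""

at_least = conn.execute(query).fetchall()
print(at_least)  # [('Screwdriver', 1, 5), ('Screwdriver', 2, 4), ('Screwdriver', 3, 1)]
```

Using RANGE rather than ROWS matters here: with ROWS, the three rows of age 2 would get different counts depending on their position in the frame.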
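Use a self-join approach. Counting distinct e2.equipment_id values keeps the count from being inflated when several rows share the same age:
SELECT
    e1.equipment_type,
    e1.equipment_age,
    COUNT(DISTINCT e2.equipment_id) AS count_of_equipments
FROM equipment_table e1
INNER JOIN equipment_table e2
    ON e2.equipment_type = e1.equipment_type AND
       e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;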
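One thing worth checking with any self-join count is row multiplication: every e1 row of a given age is paired with every qualifying e2 row, so a plain COUNT(*) grows with the number of rows sharing that age. The sketch below, which uses an in-memory SQLite database from Python purely for illustration, compares COUNT(*) with COUNT(DISTINCT e2.equipment_id) on the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment_table "
    "(equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)"
)
conn.executemany(
    "INSERT INTO equipment_table VALUES (?, ?, ?)",
    [
        ("Screwdriver", "A123", 1),
        ("Screwdriver", "A234", 2),
        ("Screwdriver", "A345", 2),
        ("Screwdriver", "A456", 2),
        ("Screwdriver", "A567", 3),
    ],
)

# The same self join, parameterised only by the counting expression.
query_template = """
SELECT e1.equipment_type, e1.equipment_age, {count_expr} AS cnt
FROM equipment_table e1
INNER JOIN equipment_table e2
    ON e2.equipment_type = e1.equipment_type
   AND e2.equipment_age >= e1.equipment_age
GROUP BY e1.equipment_type, e1.equipment_age
ORDER BY e1.equipment_type, e1.equipment_age
"""

# Three rows of age 2, each joined to four rows of age >= 2, give 12.
plain_counts = conn.execute(query_template.format(count_expr="COUNT(*)")).fetchall()

# Counting distinct ids on the e2 side removes the multiplication.
distinct_counts = conn.execute(
    query_template.format(count_expr="COUNT(DISTINCT e2.equipment_id)")
).fetchall()

print(plain_counts)     # [('Screwdriver', 1, 5), ('Screwdriver', 2, 12), ('Screwdriver', 3, 1)]
print(distinct_counts)  # [('Screwdriver', 1, 5), ('Screwdriver', 2, 4), ('Screwdriver', 3, 1)]
```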
GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or window functions to get those. One way:
SELECT
    equipment_type,
    equipment_age,
    (SELECT COUNT(*)
     FROM equipment_table cnt
     WHERE cnt.equipment_type = a.equipment_type
       AND cnt.equipment_age >= a.equipment_age
    ) AS count_of_equipments
FROM equipment_table a
GROUP BY 1, 2, 3
I am not sure if your environment supports this syntax, though. If not, let us know and we will find another way.

Order competitors by multiple conditions

I use a concrete, but hypothetical, example.
Consider a database containing the results of a shooting competition, where each competitor made several series of shots. DB contains 3 tables: Competitors, Series and Shots.
Competitors:
id  name
1   A
2   B
Series:
id  competitorId
1   1
2   1
3   1
4   2
5   2
6   2
Shots:
id  serieId  score
1   1        8
2   1        8
3   1        8
4   2        10
5   2        7
6   2        6
7   3        10
8   3        8
9   3        6
10  4        8
11  4        8
12  4        7
13  5        7
14  5        10
15  5        7
16  6        7
17  6        10
18  6        7
(DDL with the above statements: dbfiddle)
What I need is to order competitors by multiple conditions, which are:
Total score of all series
Number of center hits (center hit has 10 points score)
The next step to order by is:
Highest score on last serie
Highest score on next to last serie
Highest score on next to next to last serie
...
and so on for the number of series in the competition.
The query that uses the first two order conditions is quite straightforward:
SELECT comp.name,
SUM(shots.score) AS score,
SUM(IIF(shots.score = 10, 1, 0)) AS centerHits
FROM Shots shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY comp.name
ORDER BY score DESC, centerHits DESC
It produces following results:
name  score  centerHits
A     71     2
B     71     2
With the 3rd order condition I expect B competitor to be above A, because both have the same total score, the same centerHits and the same score for the last serie (24), but the score of next to last serie of B is 24 while A's score is only 23.
I wonder if it's possible to make a query that uses the third and following order conditions.
You should be able to do this pretty simply, as your requirements can be done with normal aggregation and window functions.
For each level of ordering:
"Total score of all series" can be satisfied by summing all scores.
"Number of center hits (center hit has 10 points score)" can be satisfied with a conditional count.
To order by each series working backwards by date, we can aggregate the total score per series (which we calculate using a window function) using STRING_AGG, ordering the aggregation by date (or id). Then if we order the final query by that aggregation, the later series will be sorted first.
This method allows you to order by an arbitrary number of series, as opposed to the other answer.
It's unclear how you define "later" and "earlier" as you have no date column, but I've used series.id as a proxy for that.
SELECT
    comp.name,
    SUM(shots.score) as totalScore,
    COUNT(CASE WHEN shots.score = 10 THEN 1 END) AS centerHits,
    STRING_AGG(NCHAR(shots.MaxScore + 65), ',') WITHIN GROUP (ORDER BY series.id DESC) as AllShots
FROM (
    SELECT *,
        SUM(shots.score) OVER (PARTITION BY shots.serieID) MaxScore
    FROM Shots shots
) shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY
    comp.id,
    comp.name
ORDER BY
    totalScore DESC,
    centerHits DESC,
    AllShots DESC;
Note that when grouping by name, you should also add in the primary key to the GROUP BY as the name might not be unique.
A similar, but slightly more complex query, is to pre-aggregate shots in the derived table. This is likely to perform better than using a window function.
SELECT
    comp.name,
    SUM(shots.totalScore) as totalScore,
    SUM(centerHits) AS centerHits,
    STRING_AGG(NCHAR(shots.totalScore + 65), ',') WITHIN GROUP (ORDER BY series.id DESC) as AllShots
FROM (
    SELECT
        shots.serieId,
        SUM(shots.score) as totalScore,
        COUNT(CASE WHEN shots.score = 10 THEN 1 END) AS centerHits
    FROM Shots shots
    GROUP BY
        shots.serieId
) shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY
    comp.id,
    comp.name
ORDER BY
    totalScore DESC,
    centerHits DESC,
    AllShots DESC;
db<>fiddle
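To make the ordering idea concrete outside the database, here is a small Python sketch of the same sort key. It is not the answer's query: it swaps the aggregated string for a tuple of per-series totals (latest series first), which compares element by element in the same way. The data is the sample from the question, and serieId is used as the proxy for recency, as in the answer.

```python
from collections import defaultdict

competitors = {1: "A", 2: "B"}                   # competitor id -> name
series = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}    # serieId -> competitorId
shots = [                                        # (serieId, score)
    (1, 8), (1, 8), (1, 8), (2, 10), (2, 7), (2, 6), (3, 10), (3, 8), (3, 6),
    (4, 8), (4, 8), (4, 7), (5, 7), (5, 10), (5, 7), (6, 7), (6, 10), (6, 7),
]

series_total = defaultdict(int)   # serieId -> summed score
center_hits = defaultdict(int)    # competitorId -> number of 10-point shots
for serie_id, score in shots:
    series_total[serie_id] += score
    if score == 10:
        center_hits[series[serie_id]] += 1


def sort_key(competitor_id):
    """Total score, then center hits, then series totals from latest to earliest."""
    own_series = sorted(
        (s for s, c in series.items() if c == competitor_id), reverse=True
    )
    per_series = tuple(series_total[s] for s in own_series)
    return (sum(per_series), center_hits[competitor_id], per_series)


ranking = sorted(competitors, key=sort_key, reverse=True)
names = [competitors[c] for c in ranking]
print(names)  # ['B', 'A']
```

Both competitors have 71 points and 2 center hits, so the order is decided by the tuples (24, 24, 23) for B and (24, 23, 24) for A, which differ at the next-to-last series.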
It appears you need a multi-level query, each building on the one prior.
The innermost query, aliased PQ, is a simple sum per SerieID that gets the total center hits and total points for each series, similar to what you had for the counting.
From that, you need to know which series is the latest (most recent) and work your way backwards to the prior one, and the one prior to that. While applying the OVER / PARTITION, I also join to the Series and Competitors tables to get the competitor ID and name.
By partitioning the data by competitor and ordering by SerieID descending, row_number() assigns 1 to the most recent series, then 2, 3 and so on. For Competitor A, who had SerieIDs 1, 2 and 3, the final "MostRecent" column is 3, 2 and 1 respectively: SerieID 3 gets 1 (the most recent) and SerieID 1 gets 3 (the oldest serie of the competitor).
Similarly for the second competitor B, SerieIDs 4, 5 and 6 become 3, 2, 1 respectively. So now, you have a basis to know what was the latest (1 = most recent), the next to last (2 = next most recent), and next to next to last (3...)
Now that these two parts are all set, I can sum the respective totals and center hits, and I explicitly know which series is the most recent (1), second latest (2) and third from last (3) for sorting. These are added to the grouped query.
Now, if one competitor has 6 shooting series vs another having 4 series (not that it will happen in a real competition, but to understand the context), the 6 series will have their LATEST as the MostRecent = 1, similarly with 4 series, the 4th series will be MostRecent = 1.
So the final group by at the COMPETITOR level, you can assess all the parts in question.
select
    F.Name,
    F.CompetitorID,
    sum( F.SeriesTotalScore ) TotalScore,
    sum( F.CenterHits ) CenterHits,
    sum( case when F.MostRecent = 1
              then F.SeriesTotalScore else 0 end ) MostRecentScore,
    sum( case when F.MostRecent = 2
              then F.SeriesTotalScore else 0 end ) SecondToMostRecentScore,
    sum( case when F.MostRecent = 3
              then F.SeriesTotalScore else 0 end ) ThirdToMostRecentScore
from
    ( select
          c.Name,
          Se.CompetitorID,
          PQ.SerieId,
          PQ.CenterHits,
          PQ.SeriesTotalScore,
          ROW_NUMBER() OVER( PARTITION BY Se.CompetitorID
                             order by PQ.SerieId DESC) AS MostRecent
      from
          ( select
                s.serieId,
                sum( case when s.score = 10 then 1 else 0 end ) as CenterHits,
                sum( s.Score ) SeriesTotalScore
            from
                Shots s
            group by
                s.SerieID ) PQ
          Join Series Se
              on PQ.SerieID = se.id
          JOIN Competitors c
              on Se.CompetitorID = c.id
    ) F
group by
    F.Name,
    F.CompetitorID
order by
    sum( F.SeriesTotalScore ) desc,
    -- more center hits should rank higher, as in the question's query
    sum( F.CenterHits ) desc,
    sum( case when F.MostRecent = 1
              then F.SeriesTotalScore else 0 end ) desc,
    sum( case when F.MostRecent = 2
              then F.SeriesTotalScore else 0 end ) desc,
    sum( case when F.MostRecent = 3
              then F.SeriesTotalScore else 0 end ) desc
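For readers who want to try the row-numbering approach without SQL Server, the sketch below runs a condensed variant against an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer for ROW_NUMBER(), uses CTEs instead of nested derived tables, and sorts center hits in descending order to match the question's query; the CTE and column aliases are illustrative names, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Competitors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Series (id INTEGER PRIMARY KEY, competitorId INTEGER);
CREATE TABLE Shots (id INTEGER PRIMARY KEY, serieId INTEGER, score INTEGER);
INSERT INTO Competitors VALUES (1, 'A'), (2, 'B');
INSERT INTO Series VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
INSERT INTO Shots VALUES
    (1, 1, 8), (2, 1, 8), (3, 1, 8), (4, 2, 10), (5, 2, 7), (6, 2, 6),
    (7, 3, 10), (8, 3, 8), (9, 3, 6), (10, 4, 8), (11, 4, 8), (12, 4, 7),
    (13, 5, 7), (14, 5, 10), (15, 5, 7), (16, 6, 7), (17, 6, 10), (18, 6, 7);
""")

query = """
WITH per_series AS (
    -- Total points and center hits for each series.
    SELECT serieId,
           SUM(score) AS series_total,
           SUM(CASE WHEN score = 10 THEN 1 ELSE 0 END) AS series_center_hits
    FROM Shots
    GROUP BY serieId
),
numbered AS (
    -- 1 = most recent series of a competitor, 2 = the one before, ...
    SELECT c.id AS competitor_id, c.name, p.series_total, p.series_center_hits,
           ROW_NUMBER() OVER (PARTITION BY se.competitorId
                              ORDER BY p.serieId DESC) AS most_recent
    FROM per_series p
    JOIN Series se ON se.id = p.serieId
    JOIN Competitors c ON c.id = se.competitorId
)
SELECT name,
       SUM(series_total) AS total_score,
       SUM(series_center_hits) AS total_center_hits,
       SUM(CASE WHEN most_recent = 1 THEN series_total ELSE 0 END) AS last_score,
       SUM(CASE WHEN most_recent = 2 THEN series_total ELSE 0 END) AS second_last_score,
       SUM(CASE WHEN most_recent = 3 THEN series_total ELSE 0 END) AS third_last_score
FROM numbered
GROUP BY competitor_id, name
ORDER BY total_score DESC, total_center_hits DESC,
         last_score DESC, second_last_score DESC, third_last_score DESC
"""

ranking = conn.execute(query).fetchall()
print(ranking)  # [('B', 71, 2, 24, 24, 23), ('A', 71, 2, 24, 23, 24)]
```

The trade-off of this style is that one CASE column is needed per series position, so the number of tie-break levels is fixed when the query is written.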

Return all the second highest valued rows SQL

Let's say I have a table called bookings, containing 3 columns: hotelid, timestampOfBooking and numberOfGuests.
How do I return all the dates on which the second-highest number of beds was booked (the number of beds booked is the same as the number of guests)?
In other words, I'm looking for the dates on which the second maximum of numberOfGuests occurs. This means that in the event of a tie (where there is more than 1 date on which the described condition applies), it should return all those dates. In the event that all the dates have exactly the same numberOfGuests, the query should return nothing.
If possible, I would only like to have one column in the query result that contains those specific dates.
Example:
hotelid timestampOfBooking numberOfGuests
11 22/11/2021 2
34 23/11/2021 2
30 23/11/2021 5
19 24/11/2021 7
8 25/11/2021 12
34 25/11/2021 5
In this case two dates should be in the result: 23/11/2021 and 24/11/2021 as they both had 7 numberOfGuests. The max numberOfGuests here is 17 (occurs on 25/11/2021) and 7 is the second highest, explaining why 23/11/2021 (2 + 5) and 24/11/2021 (7) are returned. The final result should look like this:
dates
23/11/2021
24/11/2021
Method 1:
You can use DENSE_RANK() ordered by SUM(numberOfGuests) in descending order:
SELECT timestampOfBooking, total_beds FROM
(
    select timestampOfBooking,
           sum(numberOfGuests) as total_beds,
           dense_rank() over (order by sum(numberOfGuests) DESC) as rnk
    from bookings
    group by timestampOfBooking
) as sq
where rnk = 2
Method 2:
Using OFFSET and LIMIT:
SELECT timestampOfBooking,
       SUM(numberOfGuests) AS total_beds
FROM bookings
GROUP BY timestampOfBooking
HAVING sum(numberOfGuests) =
(
    SELECT distinct SUM(numberOfGuests) AS total_beds
    FROM bookings
    GROUP BY timestampOfBooking
    ORDER BY total_beds DESC
    OFFSET 1 LIMIT 1
);
Both methods will give you the same output.
Working fiddle
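A hedged, self-contained check of the DENSE_RANK() approach on the sample data, using an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer for window functions, and it splits the aggregation and the ranking into two CTEs rather than ranking over the aggregate directly; the CTE names are illustrative. Only the date column is returned, as requested in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings "
    "(hotelid INTEGER, timestampOfBooking TEXT, numberOfGuests INTEGER)"
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [
        (11, "22/11/2021", 2),
        (34, "23/11/2021", 2),
        (30, "23/11/2021", 5),
        (19, "24/11/2021", 7),
        (8, "25/11/2021", 12),
        (34, "25/11/2021", 5),
    ],
)

query = """
WITH per_day AS (
    -- Beds booked per date.
    SELECT timestampOfBooking, SUM(numberOfGuests) AS total_beds
    FROM bookings
    GROUP BY timestampOfBooking
),
ranked AS (
    -- DENSE_RANK gives tied totals the same rank and leaves no gaps,
    -- so rank 2 is always the second-highest distinct total.
    SELECT timestampOfBooking,
           DENSE_RANK() OVER (ORDER BY total_beds DESC) AS rnk
    FROM per_day
)
SELECT timestampOfBooking AS dates
FROM ranked
WHERE rnk = 2
ORDER BY timestampOfBooking
"""

dates = [row[0] for row in conn.execute(query)]
print(dates)  # ['23/11/2021', '24/11/2021']
```

If every date had the same total, all rows would share rank 1 and the query would return no rows, which matches the requirement in the question.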

Get rollup group value in SQL Server

I have a table with following data:
Name  Score
A     2
B     3
A     1
B     3
I want a query which returns the following output.
Name         Score
A            2
A            1
Subtotal: A  3
B            3
B            3
Subtotal: B  6
I am able to get "Subtotal" with group by rollup query but I want to get subtotal along with group column value.
Please help me with some SQL code
If score has at most one value per name, you can use GROUPING SETS:
select name, sum(score) as score
from t
group by grouping sets ((name, score), (name));
To label the subtotal rows when score is never null, I would just use:
(case when score is null then 'Subtotal: ' + name else name end)
Otherwise you need to use grouping().
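SQLite has no GROUPING SETS, so the sketch below swaps in a different technique, a UNION ALL of the detail rows and the per-name subtotals, to show the target output shape. It runs against an in-memory SQLite database from Python. The id column is a hypothetical addition used only to keep the detail rows in their original order; the question's table does not have one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical id column: records the original row order of the sample data.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "B", 3), (3, "A", 1), (4, "B", 3)],
)

query = """
-- Detail rows, tagged so they sort before their subtotal.
SELECT name AS label, score, name AS sort_name, 0 AS is_subtotal, id AS ord
FROM t
UNION ALL
-- One subtotal row per name, labelled with the group value.
SELECT 'Subtotal: ' || name, SUM(score), name, 1, NULL
FROM t
GROUP BY name
ORDER BY sort_name, is_subtotal, ord
"""

# Keep only the two columns the question asks for.
rows = [(label, score) for label, score, *_ in conn.execute(query)]
for row in rows:
    print(row)
```

On SQL Server the string concatenation would be written with + instead of ||, and the same labelling can be attached to a GROUPING SETS or ROLLUP query instead of a UNION ALL.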

SQL: Displaying results from conditional loop

I'm not even sure if this is possible using SQL only but here goes...
I have a list of football results in one table, each row is a match and contains all data from that match, I want to cycle through each match, get the home team, check their last 6 matches and display only the matches where the specified team scored 2 goals or more in 50% or more of their last 6 matches.
So far I have this, I just don't know how to stitch it together...
Create list of all games, returning only the home team:
SELECT Date, Home
FROM [FDATA].[dbo].[Goals]
ORDER BY Date
Get last 6 games of that team:
SELECT TOP 6 *
FROM [FDATA].[dbo].[Goals]
WHERE Home = 'home from first query' AND Date <= 'date from first query'
ORDER BY Date DESC
Then check if the team scored 2 or more goals in >= 50% of the 6 games returned and output the row from the first query if true:
SELECT *
FROM last query
WHERE HomeGoals >= 2
ORDER BY Date DESC
Apologies for the crudeness of this question but I'm a bit of a novice.
You only need two queries:
SELECT home, count(1) cnt
FROM
(
    SELECT TOP 6 G1.HomeGoals, G1.Home
    FROM [FDATA].[dbo].[Goals] AS G1
    LEFT OUTER JOIN [FDATA].[dbo].[Goals] AS G2
        ON G1.Home = G2.HOME AND G1.Date <= G2.Date
    ORDER BY G1.Date DESC
) AS last6 -- SQL Server requires an alias on a derived table
WHERE HomeGoals >= 2
GROUP BY home
HAVING count(1) >= 3
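An alternative way to express the per-match check is a sliding window over each team's home matches. The sketch below is only an illustration under several assumptions: it runs on an in-memory SQLite database from Python (SQLite 3.25 or newer), the Goals table is reduced to three columns, the rows are made-up sample data, the current match is counted among the last six (mirroring the Date <= condition in the question), and "50% or more of the last 6" is read as at least 3 matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Goals (Date TEXT, Home TEXT, HomeGoals INTEGER)")
conn.executemany(
    "INSERT INTO Goals VALUES (?, ?, ?)",
    [
        # Made-up sample data: team X scores 2+ goals in three of six home matches.
        ("2021-01-01", "X", 2), ("2021-01-08", "X", 0), ("2021-01-15", "X", 3),
        ("2021-01-22", "X", 2), ("2021-01-29", "X", 1), ("2021-02-05", "X", 0),
        # Team Y scores 2+ goals only once.
        ("2021-01-02", "Y", 1), ("2021-01-09", "Y", 0), ("2021-01-16", "Y", 2),
        ("2021-01-23", "Y", 0), ("2021-01-30", "Y", 1), ("2021-02-06", "Y", 1),
    ],
)

query = """
SELECT Date, Home
FROM (
    SELECT Date, Home,
           -- Count the 2+ goal matches among this match and the five
           -- previous home matches of the same team.
           SUM(CASE WHEN HomeGoals >= 2 THEN 1 ELSE 0 END) OVER (
               PARTITION BY Home
               ORDER BY Date
               ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
           ) AS high_scoring
    FROM Goals
) AS recent
WHERE high_scoring >= 3
ORDER BY Date
"""

matches = conn.execute(query).fetchall()
print(matches)  # [('2021-01-22', 'X'), ('2021-01-29', 'X'), ('2021-02-05', 'X')]
```

The window function has to sit in a derived table because window results cannot be filtered in the WHERE clause of the same query level.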