Order competitors by multiple conditions - sql

I use a concrete, but hypothetical, example.
Consider a database containing the results of a shooting competition, where each competitor shot several series. The DB contains three tables: Competitors, Series and Shots.
Competitors:
id  name
--  ----
1   A
2   B
Series:
id  competitorId
--  ------------
1   1
2   1
3   1
4   2
5   2
6   2
Shots:
id  serieId  score
--  -------  -----
1   1        8
2   1        8
3   1        8
4   2        10
5   2        7
6   2        6
7   3        10
8   3        8
9   3        6
10  4        8
11  4        8
12  4        7
13  5        7
14  5        10
15  5        7
16  6        7
17  6        10
18  6        7
(DDL with the above statements: dbfiddle)
What I need is to order competitors by multiple conditions, which are:
Total score of all series
Number of center hits (center hit has 10 points score)
The next step to order by is:
Highest score on last serie
Highest score on next to last serie
Highest score on next to next to last serie
...
and so on for the number of series in the competition.
The query that uses the first two order conditions is quite straightforward:
SELECT comp.name,
SUM(shots.score) AS score,
SUM(IIF(shots.score = 10, 1, 0)) AS centerHits
FROM Shots shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY comp.name
ORDER BY score DESC, centerHits DESC
It produces the following results:
name  score  centerHits
----  -----  ----------
A     71     2
B     71     2
With the third order condition I expect competitor B to be above A, because both have the same total score, the same centerHits and the same score for the last serie (24), but B's score for the next to last serie is 24 while A's is only 23.
I wonder if it's possible to make a query that uses the third and following order conditions.

You should be able to do this pretty simply, as your requirements can be done with normal aggregation and window functions.
For each level of ordering:
"Total score of all series" can be satisfied by summing all scores.
"Number of center hits (center hit has 10 points score)" can be satisfied with a conditional count.
To order by each series working backwards by date, we can compute the total score per series with a window function, encode each total as a single character with NCHAR, and concatenate the characters using STRING_AGG, ordering the aggregation by date (or id) descending. Then if we order the final query by that aggregation, the later series will be compared first.
This method allows you to order by an arbitrary number of series, as opposed to the other answer.
It's unclear how you define "later" and "earlier" as you have no date column, but I've used series.id as a proxy for that.
SELECT
comp.name,
SUM(shots.score) as totalScore,
COUNT(CASE WHEN shots.score = 10 THEN 1 END) AS centerHits,
STRING_AGG(NCHAR(shots.MaxScore + 65), ',') WITHIN GROUP (ORDER BY series.id DESC) as AllShots
FROM (
SELECT *,
SUM(shots.score) OVER (PARTITION BY shots.serieID) MaxScore
FROM Shots shots
) shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY
comp.id,
comp.name
ORDER BY
totalScore DESC,
centerHits DESC,
AllShots DESC;
Note that when grouping by name, you should also add in the primary key to the GROUP BY as the name might not be unique.
A similar, but slightly more complex query, is to pre-aggregate shots in the derived table. This is likely to perform better than using a window function.
SELECT
comp.name,
SUM(shots.totalScore) as totalScore,
SUM(centerHits) AS centerHits,
STRING_AGG(NCHAR(shots.totalScore + 65), ',') WITHIN GROUP (ORDER BY series.id DESC) as AllShots
FROM (
SELECT
shots.serieId,
SUM(shots.score) as totalScore,
COUNT(CASE WHEN shots.score = 10 THEN 1 END) AS centerHits
FROM Shots shots
GROUP BY
shots.serieId
) shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY
comp.id,
comp.name
ORDER BY
totalScore DESC,
centerHits DESC,
AllShots DESC;
db<>fiddle
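The NCHAR(... + 65) step in the queries above is easy to skim past. As far as I can tell, it maps each series total to one character so that comparing the aggregated strings follows the numeric order of the totals, series by series. Here is a minimal Python sketch of that idea using the totals from the question; the helper name encode_series is mine, and Python compares strings by code point, which a database collation may not do, so treat it as an illustration only.

```python
def encode_series(totals_latest_first):
    """Encode each series total as one character (code point = total + 65),
    mirroring NCHAR(total + 65) joined with commas."""
    return ",".join(chr(total + 65) for total in totals_latest_first)


# Series totals from the question, most recent series first.
a_key = encode_series([24, 23, 24])  # competitor A: series 3, 2, 1
b_key = encode_series([24, 24, 23])  # competitor B: series 6, 5, 4

# Sort descending on the encoded key, as the final ORDER BY ... DESC does.
ranking = sorted([("A", a_key), ("B", b_key)], key=lambda kv: kv[1], reverse=True)

# Joining plain decimal strings would not sort numerically: "9" > "10" as text.
naive_wrong = "9" > "10"
# One character per total keeps the numeric order.
encoded_right = encode_series([9]) < encode_series([10])
```

Both keys start with the same character (24 for the last series), and B wins on the second one (24 vs 23), which is the tie-break the question asks for.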

It appears you need a multi-level query, each building on the one prior.
The innermost query, aliased PQ, is a simple aggregate per SerieID that returns the number of center hits and the total points for each series, similar to what you already had for the counting.
From that, you need to know which series is the latest (most recent) and work your way backwards to the prior one and the one before that. I join to the Series table to get the competitor ID (and the name while I'm at it), then use ROW_NUMBER() with OVER / PARTITION BY.
Partitioning the data by competitor and ordering by SerieID descending makes ROW_NUMBER() return 1 for the most recent series, 2 for the one before it, and so on. For competitor A, who had SerieIDs 1, 2 and 3, the "MostRecent" column is 3, 2 and 1 respectively: SerieID 3 gets 1 (the most recent) and SerieID 1 gets 3 (the oldest serie of the competitor).
Similarly for the second competitor B, SerieIDs 4, 5 and 6 become 3, 2, 1 respectively. So now you have a basis to know what was the latest (1 = most recent), the next to last (2 = next most recent), and the next to next to last (3...).
Now that these two parts are set, I can sum the respective totals and center hits, and I explicitly know which series was the most recent (1), the second latest (2) and the third from last (3) for sorting. These are added to the order by.
Now, if one competitor has 6 shooting series vs another having 4 series (not that it will happen in a real competition, but to understand the context), the competitor with 6 series will have the 6th as MostRecent = 1, and the one with 4 series will have the 4th as MostRecent = 1.
So with the final group by at the COMPETITOR level, you can assess all the parts in question.
select
F.Name,
F.CompetitorID,
sum( F.SeriesTotalScore ) TotalScore,
sum( F.CenterHits ) CenterHits,
sum( case when F.MostRecent = 1
then F.SeriesTotalScore else 0 end ) MostRecentScore,
sum( case when F.MostRecent = 2
then F.SeriesTotalScore else 0 end ) SecondToMostRecentScore,
sum( case when F.MostRecent = 3
then F.SeriesTotalScore else 0 end ) ThirdToMostRecentScore
from
( select
c.Name,
Se.CompetitorID,
PQ.SerieId,
PQ.CenterHits,
PQ.SeriesTotalScore,
ROW_NUMBER() OVER( PARTITION BY Se.CompetitorID
order by PQ.SerieId DESC) AS MostRecent
from
( select
s.serieId,
sum( case when s.score = 10 then 1 else 0 end ) as CenterHits,
sum( s.Score ) SeriesTotalScore
from
Shots s
group by
s.SerieID ) PQ
Join Series Se
on PQ.SerieID = se.id
JOIN Competitors c
on Se.CompetitorID = c.id
) F
group by
F.Name,
F.CompetitorID
order by
sum( F.SeriesTotalScore ) desc,
sum( F.CenterHits ) desc,
sum( case when F.MostRecent = 1
then F.SeriesTotalScore else 0 end ) desc,
sum( case when F.MostRecent = 2
then F.SeriesTotalScore else 0 end ) desc,
sum( case when F.MostRecent = 3
then F.SeriesTotalScore else 0 end ) desc
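To check this approach against the sample data from the question, here is a small sketch that runs an equivalent query in SQLite through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only my stand-in for the asker's database, and the short column names last1, last2 and last3 are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Competitors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Series (id INTEGER PRIMARY KEY, competitorId INTEGER);
CREATE TABLE Shots (id INTEGER PRIMARY KEY, serieId INTEGER, score INTEGER);
INSERT INTO Competitors VALUES (1,'A'),(2,'B');
INSERT INTO Series VALUES (1,1),(2,1),(3,1),(4,2),(5,2),(6,2);
INSERT INTO Shots (serieId, score) VALUES
 (1,8),(1,8),(1,8),(2,10),(2,7),(2,6),(3,10),(3,8),(3,6),
 (4,8),(4,8),(4,7),(5,7),(5,10),(5,7),(6,7),(6,10),(6,7);
""")

rows = conn.execute("""
SELECT F.name,
       SUM(F.seriesTotal) AS totalScore,
       SUM(F.centerHits)  AS centerHits,
       SUM(CASE WHEN F.mostRecent = 1 THEN F.seriesTotal ELSE 0 END) AS last1,
       SUM(CASE WHEN F.mostRecent = 2 THEN F.seriesTotal ELSE 0 END) AS last2,
       SUM(CASE WHEN F.mostRecent = 3 THEN F.seriesTotal ELSE 0 END) AS last3
FROM (
    -- Number each competitor's series from the most recent (1) backwards.
    SELECT c.id AS competitorId, c.name, PQ.seriesTotal, PQ.centerHits,
           ROW_NUMBER() OVER (PARTITION BY c.id
                              ORDER BY PQ.serieId DESC) AS mostRecent
    FROM (SELECT serieId,
                 SUM(score) AS seriesTotal,
                 SUM(CASE WHEN score = 10 THEN 1 ELSE 0 END) AS centerHits
          FROM Shots
          GROUP BY serieId) PQ
    JOIN Series se ON se.id = PQ.serieId
    JOIN Competitors c ON c.id = se.competitorId
) F
GROUP BY F.competitorId, F.name
ORDER BY totalScore DESC, centerHits DESC, last1 DESC, last2 DESC, last3 DESC
""").fetchall()

for row in rows:
    print(row)
```

On the sample data B comes out ahead of A, decided by the next to last series (24 vs 23). Unlike the STRING_AGG answer, each extra series to compare needs another conditional sum.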

Related

Postgres OFFSET the results of subqueries group by

I'm calculating a weekly score for a game where certain weeks have bonuses, and when totaling your score the lowest two scores are dropped.
id  name      week  score
--  --------  ----  -----
1   Player A  1     10
2   Player A  2     20
3   Player A  3     30
4   Player A  4     40
5   Player B  1     5
6   Player B  2     10
7   Player B  3     15
8   Player B  4     20
Let's say in week 2 your score should be doubled,
So A's scores should be [10,40,30,40] and B [5,20,15,20]
With the rules of removing the two lowest scores
A [40,40] total 80
B [20,20] total 40
If I run this query
select name, sum(special_scores) as total_score
from(
select
name,
case
when week = 2 then score * 2
else score
end special_scores
from public.standings
where name = 'Player A'
order by special_scores
offset 2
) s
group by name
order by total_score desc;
I see the expected result of totaling the score column and omitting the last two results, so I believe my sub query is correct.
However if I remove the where clause from the subquery
select name, sum(special_scores) as total_score from (
select name, case
when week = 2 then score * 2
else score
end special_scores from public.standings
order by special_scores
offset 2
) s
group by name
order by total_score desc
The table will populate but will not omit the two lowest scores
So I'm getting something like
name      total_score
--------  -----------
Player A  120
Player B  60
Could someone help as to why the offset in the second query is not removing the scores before totaling?
Break the problem into subproblems.
First create a subquery that computes the actual scores (doubled where they have to be). Keep id, week and actual score.
Then use a window function (row_number()) to drop the bottom two scores.
Then sum the results by id.
Finally, join the id to this result table to get the player name too.
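The reason the original query misbehaves is that OFFSET is applied once to the whole subquery result rather than once per player. The steps above can be sketched as follows, run here in SQLite (3.25+) through Python's sqlite3 as a stand-in for Postgres. The sample table has no separate player id, so this sketch partitions and groups by name, which is my assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE standings (id INTEGER PRIMARY KEY, name TEXT, week INTEGER, score INTEGER);
INSERT INTO standings VALUES
 (1,'Player A',1,10),(2,'Player A',2,20),(3,'Player A',3,30),(4,'Player A',4,40),
 (5,'Player B',1,5),(6,'Player B',2,10),(7,'Player B',3,15),(8,'Player B',4,20);
""")

totals = conn.execute("""
SELECT name, SUM(special_score) AS total_score
FROM (
    -- Rank each player's adjusted scores from lowest (1) to highest.
    SELECT name, special_score,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY special_score) AS rn
    FROM (
        -- Week 2 counts double.
        SELECT name,
               CASE WHEN week = 2 THEN score * 2 ELSE score END AS special_score
        FROM standings
    ) AS adjusted
) AS ranked
WHERE rn > 2            -- drop the two lowest scores per player
GROUP BY name
ORDER BY total_score DESC
""").fetchall()

print(totals)
```

With the sample data this gives 80 for Player A and 40 for Player B, matching the totals expected in the question.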

SQL Query getting the latest record of the Group and calculate the value of those particular records

I have the following table (just a sample) and would like to subtract the Points of Record 1 from the Points of Record 2 (Record2 - Record1), using the latest record of each. The records are entered per Match; one Match consists of two records, Record 1 and Record 2.
The output should be 3, as the newest records are IDs 3 and 4 from Match 2 (8 - 5).
ID  Name      Points  TimeRecorded  Match
--  --------  ------  ------------  -----
1   Record 1  3       2-Mar 2pm     1
2   Record 2  5       2-Mar 2pm     1
3   Record 1  5       4-Mar 5pm     2
4   Record 2  8       4-Mar 5pm     2
I tried to get the value by subtracting the two queries below, but I feel this is not a good way, as the match and the record names are hard-coded. How can I construct a better query that gets the latest records of the grouped match and calculates the points by subtracting Record1 from Record2?
SELECT
(select Points from RunRecord where Name= 'Record2' AND Match = 2)
- (select Points from RunRecord where Name= 'Record1' AND Match = 2)
You could use:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) rn
FROM yourTable
)
SELECT
MAX(CASE WHEN Name = 'Record 2' THEN Points END) -
MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1;
The CTE assigns a row number for each group of records of the same name, with 1 being assigned to the most recent record. Then, we aggregate over the entire table and pivot out the points to find the difference.
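For a quick check, here is the same query run against the sample rows in SQLite (3.25+) via Python's sqlite3, as a stand-in for the asker's database. The ISO timestamps (including the year) and the column name MatchNo are my assumptions; the question only shows "2-Mar 2pm" style values and a column called Match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- MatchNo stands in for the question's Match column (renamed to avoid a keyword).
CREATE TABLE RunRecord (ID INTEGER PRIMARY KEY, Name TEXT, Points INTEGER,
                        TimeRecorded TEXT, MatchNo INTEGER);
INSERT INTO RunRecord VALUES
 (1,'Record 1',3,'2023-03-02 14:00',1),
 (2,'Record 2',5,'2023-03-02 14:00',1),
 (3,'Record 1',5,'2023-03-04 17:00',2),
 (4,'Record 2',8,'2023-03-04 17:00',2);
""")

diff = conn.execute("""
WITH cte AS (
    -- rn = 1 marks the most recent row for each record name.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name
                                 ORDER BY TimeRecorded DESC) AS rn
    FROM RunRecord
)
SELECT MAX(CASE WHEN Name = 'Record 2' THEN Points END)
     - MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1
""").fetchone()[0]

print(diff)
```

The latest rows are IDs 3 and 4, so the difference is 8 - 5 = 3. Sorting text timestamps only works here because they are in ISO format; a real date/time column avoids that caveat.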
You can use the rank() window function to rank the records by match descending. Then take the top of the ranked records and use conditional aggregation to control the sign of the points added.
SELECT sum(CASE x.name
WHEN 'Record2' THEN
x.points
WHEN 'Record1' THEN
-x.points
END)
FROM (SELECT rr.name,
rr.points,
rank() OVER (ORDER BY rr.match DESC) r
FROM runrecord rr
WHERE name IN ('Record1',
'Record2')) x
WHERE x.r = 1;

Oracle SQL Group by and sum with multiple conditions

I attached a capture of two tables:
- the left table is the result of another "Select" query
- the right table is the result I want from the left table
The right table can be created by applying the following conditions:
When all energy values of a Unit are positive, or all are negative,
the rows remain the same.
When a Unit has both positive and negative energy values:
Sum all Energy for that Unit (-50+15+20 = -15), then take the row with the maximum absolute Energy value, e.g. max(abs(energy)) = 50, and use the price from that row.
I use Oracle SQL.
I really appreciate the help in this matter!
http://sqlfiddle.com/#!4/eb85a/12
This returns desired result:
signs CTE finds out whether there are positive/negative values, as well as maximum ABS energy value
then, there's union of two selects: one that returns "original" rows (if count of distinct signs is 1), and one that returns "calculated" values, as you described
with
signs as
(select unit,
count(distinct sign(energy)) cnt,
max(abs(energy)) max_abs_ene
from tab
group by unit
)
select t.unit, t.price, t.energy
from tab t join signs s on t.unit = s.unit
where s.cnt = 1
union all
select t.unit, t2.price, sum(t.energy)
from tab t join signs s on t.unit = s.unit
join tab t2 on t2.unit = s.unit and abs(t2.energy) = s.max_abs_ene
where s.cnt = 2
group by t.unit, t2.price
order by unit;
UNIT                      PRICE     ENERGY
-------------------- ---------- ----------
A                            20        -50
A                            50        -80
B                            13        -15
Though, what do you expect if there was yet another "B" unit row with energy = +50? Then two rows would have the same MAX(ABS(ENERGY)) value.
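To make the rule itself concrete, here is a plain Python sketch of it, independent of Oracle. The input rows are hypothetical: only unit A's rows and unit B's -50 row at price 13 are implied by the output above, and the prices for B's +15 and +20 rows are invented.

```python
from collections import defaultdict

# Hypothetical (unit, price, energy) rows; see the note above.
rows = [
    ("A", 20, -50), ("A", 50, -80),
    ("B", 13, -50), ("B", 10, 15), ("B", 11, 20),
]


def collapse_mixed_units(rows):
    """Keep single-sign units as they are; collapse mixed-sign units to one row."""
    by_unit = defaultdict(list)
    for unit, price, energy in rows:
        by_unit[unit].append((price, energy))

    result = []
    for unit in sorted(by_unit):
        items = by_unit[unit]
        signs = {(e > 0) - (e < 0) for _, e in items}
        if len(signs) == 1:
            # All energies share one sign: keep the rows unchanged.
            result.extend((unit, p, e) for p, e in items)
        else:
            # Mixed signs: sum the energy, take the price at max |energy|.
            price_at_max, _ = max(items, key=lambda pe: abs(pe[1]))
            result.append((unit, price_at_max, sum(e for _, e in items)))
    return result


collapsed = collapse_mixed_units(rows)
print(collapsed)
```

When two rows tie on the maximum absolute energy, max() simply returns the first one it sees, which is exactly the ambiguity raised above: the requirement needs a tie-break rule.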
A union all might be the simplest solution:
with t as (
select t.*,
max(energy) over (partition by unit) as max_energy,
min(energy) over (partition by unit) as min_energy
from t
)
select unit, price, energy
from t
where max_energy > 0 and min_energy > 0 or
max_energy < 0 and min_energy < 0
union all
select unit,
max(price) keep (dense_rank first order by abs(energy) desc),
sum(energy)
from t
where max_energy > 0 and min_energy < 0
group by unit;

How to identify subsequent user actions based on prior visits

I want to identify the users who visited section a and then subsequently visited b. Given the following data structure. The table contains 300,000 rows and updates daily with approx. 8,000 rows:
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 1 b 0
2 1 b 0
1 3 b 1
Ideally I want a new column that flags the visit to section b. For example on the third visit User 1 visited section b for the first time. I was attempting to do this using a CASE WHEN statement but after many failed attempts I am not sure it is even possible with CASE WHEN and feel that I should take a different approach, I am just not sure what that approach should be. I do also have a date column at my disposal.
Any suggestions on a new way to approach the problem would be appreciated. Thanks!
Correlated sub-queries should be avoided at all cost when working with Redshift. Keep in mind there are no indexes for Redshift so you'd have to rescan and restitch the column data back together for each value in the parent resulting in an O(n^2) operation (in this particular case going from 300 thousand values scanned to 90 billion).
The best approach when you are looking to span a series of rows is to use an analytic function. There are a couple of options depending on how your data is structured but in the simplest case, you could use something like
select case
when section != lag(section) over (partition by userid order by visitid)
then 1
else 0
end
from ...
This assumes that your data for userid 2 increments the visitid as below. If not, you could also order by your timestamp column
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 *2* b 0
2 *3* b 0
1 3 b 1
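Here is the LAG idea run end to end on the adjusted sample data, using SQLite (3.25+) through Python's sqlite3 as a stand-in for Redshift. The table name visits is my assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (userid INTEGER, visitid INTEGER, section TEXT);
INSERT INTO visits VALUES
 (1,1,'a'),(1,2,'a'),(2,1,'b'),(2,2,'b'),(2,3,'b'),(1,3,'b');
""")

flags = conn.execute("""
SELECT userid, visitid, section,
       -- LAG is NULL on a user's first visit, so the comparison falls to ELSE 0.
       CASE WHEN section != LAG(section) OVER (PARTITION BY userid
                                               ORDER BY visitid)
            THEN 1 ELSE 0 END AS conversion
FROM visits
ORDER BY userid, visitid
""").fetchall()

for row in flags:
    print(row)
```

Only user 1's third visit is flagged. Note that this condition flags any change of section, including a move from b back to a; adding "and section = 'b'" to the CASE condition would restrict it to arrivals at b.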
select t.*, case when v.ts is null then 0 else 1 end as conversion
from tbl t
left join (select *
from tbl x
where section = 'b'
and exists (select 1
from tbl y
where y.userid = x.userid
and y.section = 'a'
and y.ts < x.ts)) v
on t.userid = v.userid
and t.visitid = v.visitid
and t.section = v.section
Fiddle:
http://sqlfiddle.com/#!15/5b954/5/0
I added sample timestamp data as that field is necessary to determine whether a comes before b or after b.
To incorporate analytic functions you could use:
(I've also made it so that only the first occurrence of B (after an A) will get flagged with the 1)
select t.*,
case
when v.first_b_after_a is not null
then 1
else 0
end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
from (select t.*,
sum( case when t.section = 'a' then 1 end)
over( partition by userid
order by ts ) as a_sum
from tbl t) x
where section = 'b'
and a_sum is not null
group by userid) v
on t.userid = v.userid
and t.ts = v.first_b_after_a
Fiddle: http://sqlfiddle.com/#!1/fa88f/2/0
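A runnable version of the query above, for anyone who wants to try it without the fiddle: this sketch uses SQLite (3.25+) through Python's sqlite3, and the timestamps are invented. I added a second b visit for user 1 to show that only the first b after an a gets flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (userid INTEGER, visitid INTEGER, section TEXT, ts TEXT);
-- Timestamps are invented for illustration.
INSERT INTO tbl VALUES
 (1,1,'a','2015-01-01'),(1,2,'a','2015-01-02'),(1,3,'b','2015-01-03'),
 (1,4,'b','2015-01-04'),
 (2,1,'b','2015-01-01'),(2,2,'b','2015-01-02'),(2,3,'b','2015-01-03');
""")

flags = conn.execute("""
SELECT t.userid, t.visitid,
       CASE WHEN v.first_b_after_a IS NOT NULL THEN 1 ELSE 0 END AS conversion
FROM tbl t
LEFT JOIN (SELECT userid, MIN(ts) AS first_b_after_a
           FROM (-- a_sum stays NULL until the user has visited section a.
                 SELECT t2.*,
                        SUM(CASE WHEN t2.section = 'a' THEN 1 END)
                            OVER (PARTITION BY userid ORDER BY ts) AS a_sum
                 FROM tbl t2) x
           WHERE section = 'b' AND a_sum IS NOT NULL
           GROUP BY userid) v
  ON t.userid = v.userid AND t.ts = v.first_b_after_a
ORDER BY t.userid, t.visitid
""").fetchall()

for row in flags:
    print(row)
```

User 2 never visits a, so a_sum is NULL on every row and nothing is flagged for that user.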

Oracle- logically give each set of rows a group based on a value from ordered list

I have this
SQL Fiddle
When ordered by the sequence_number field, these records need to be grouped and given a row_number based on the following logic:
All records for which the following line type is not 0 are part of the same group.
Example, from the provided SQL fiddle,
sequence numbers 0,1 and 2 are part of the same group, and sequence numbers 3 and 4 are part of another group. Basically, any rows up to a 0 line type are part of a single group. The data I am trying to return will look like:
GROUP LINE_TYPE SEQUENCE_NUMBER PRODUCT
------------------------------------------------
1 0 0 REM322
1 6 1 Discount
1 7 2 Loyalty Discount
2 0 3 RGM32
2 6 4 Discount
Another way to word what I am after is that, when ordered by the sequence number, the group number changes whenever it hits a 0.
I've been racking my brain trying to think how to do this using partitions/lags and even self joins but am having trouble.
Any help appreciated.
Set the column value to 1 if line_type is 0 (and to 0 otherwise), and then calculate the running sum over it (using SUM as an analytic function).
select sum(case when line_type = 0 then 1
else 0 end
) over (order by sequence_number) as grp,
line_type,
sequence_number,
product
from ret_trand
order by sequence_number;
Demo.
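The running-sum query can be tried on the rows from the question like this, using SQLite (3.25+) through Python's sqlite3 as a stand-in for Oracle:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ret_trand (line_type INTEGER, sequence_number INTEGER, product TEXT);
INSERT INTO ret_trand VALUES
 (0,0,'REM322'),(6,1,'Discount'),(7,2,'Loyalty Discount'),
 (0,3,'RGM32'),(6,4,'Discount');
""")

groups = conn.execute("""
SELECT SUM(CASE WHEN line_type = 0 THEN 1 ELSE 0 END)
           OVER (ORDER BY sequence_number) AS grp,   -- running count of 0 rows
       line_type, sequence_number, product
FROM ret_trand
ORDER BY sequence_number
""").fetchall()

for row in groups:
    print(row)
```

Each row with line_type 0 bumps the running count by one, so every row up to the next 0 carries the same group number.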
Another way of doing the grouping is using a hierarchical query and CONNECT_BY_ROOT:
SELECT CONNECT_BY_ROOT sequence_number AS first_in_sequence,
line_type,
sequence_number,
product
FROM ret_trand
START WITH
line_type = 0
CONNECT BY
( sequence_number - 1 = PRIOR sequence_number
AND line_type <> 0)
ORDER SIBLINGS BY
sequence_number;
SQLFIDDLE
This will identify the groups by the initial sequence number of the group.
If you want to change this to a sequential ranking for the groups then you can use DENSE_RANK to do this:
WITH first_in_sequences AS
(
SELECT CONNECT_BY_ROOT sequence_number AS first_in_sequence,
line_type,
sequence_number,
product
FROM ret_trand
START WITH
line_type = 0
CONNECT BY
( sequence_number - 1 = PRIOR sequence_number
AND line_type <> 0)
ORDER SIBLINGS BY
sequence_number
)
SELECT DENSE_RANK() OVER ( ORDER BY first_in_sequence ) AS "group",
line_type,
sequence_number,
product
FROM first_in_sequences;
SQLFIDDLE