Postgres rank() without duplicates - sql

I'm ranking race data for a series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank the racer in the series. For example, consider a sub-query that returns this:
License #  Rider Name  Total Points  Race Points  Race ID
---------  ----------  ------------  -----------  -------
123        Joe         25            5            567
123        Joe         25            12           234
123        Joe         25            8            987
456        Ahmed       20            12           567
456        Ahmed       20            8            234
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
Place  License #  Rider Name  Total Points  Race Points  Race ID
-----  ---------  ----------  ------------  -----------  -------
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
2      456        Ahmed       20            12           567
2      456        Ahmed       20            8            234
But if I use rank() and order by "Total Points", I get:
Place  License #  Rider Name  Total Points  Race Points  Race ID
-----  ---------  ----------  ------------  -----------  -------
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
4      456        Ahmed       20            12           567
4      456        Ahmed       20            8            234
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank (e.g., if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
The easiest way to solve this I think would be to issue two queries, one with the "duplicate" racers eliminated, and then a second one where I can retain the individual race data, which I need for the points break down display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?

You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as (     -- get individual riders
    select distinct license, rider, total_points
    from racists
), places as (       -- calculate non-dense rankings
    select license, rider, rank() over (order by total_points desc) as place
    from riders
)
select p.place, r.*  -- join rankings into main table
from places p
join racists r on (r.license, r.rider) = (p.license, p.rider);
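To check that the CTE approach leaves gaps only for genuine ties between different riders, here is a small self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres (window functions need SQLite 3.25 or newer). The table name race_results and the extra rider Maria are assumptions added for illustration; they are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE race_results "
    "(license INTEGER, rider TEXT, total_points INTEGER, race_points INTEGER, race_id INTEGER)"
)
conn.executemany(
    "INSERT INTO race_results VALUES (?, ?, ?, ?, ?)",
    [
        (123, "Joe", 25, 5, 567),
        (123, "Joe", 25, 12, 234),
        (123, "Joe", 25, 8, 987),
        (456, "Ahmed", 20, 12, 567),
        (456, "Ahmed", 20, 8, 234),
        (789, "Maria", 25, 25, 111),  # hypothetical rider who ties Joe on total points
    ],
)

query = """
WITH riders AS (            -- one row per rider
    SELECT DISTINCT license, rider, total_points
    FROM race_results
), places AS (              -- rank riders, leaving gaps after ties
    SELECT license, rank() OVER (ORDER BY total_points DESC) AS place
    FROM riders
)
SELECT p.place, r.rider, r.race_points, r.race_id
FROM places p
JOIN race_results r ON r.license = p.license
ORDER BY p.place, r.rider, r.race_id
"""
rows = conn.execute(query).fetchall()

# Collapse the per-race rows back to one place per rider for inspection.
place_by_rider = {rider: place for place, rider, _, _ in rows}
for row in rows:
    print(row)
```

With this data Joe and Maria share place 1 and Ahmed lands on place 3, while every per-race row is still returned.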

Related

What logic should be used to label customers (monthly) based on the categories they bought more often in the preceding 4 calendar months?

I have a table that looks like this:
user  type      quantity  order_id  purchase_date
----  --------  --------  --------  -------------
john  travel    10        1         2022-01-10
john  travel    15        2         2022-01-15
john  books     4         3         2022-01-16
john  music     20        4         2022-02-01
john  travel    90        5         2022-02-15
john  clothing  200       6         2022-03-11
john  travel    70        7         2022-04-13
john  clothing  70        8         2022-05-01
john  travel    200       9         2022-06-15
john  tickets   10        10        2022-07-01
john  services  20        11        2022-07-15
john  services  90        12        2022-07-22
john  travel    10        13        2022-07-29
john  services  25        14        2022-08-01
john  clothing  3         15        2022-08-15
john  music     5         16        2022-08-17
john  music     40        18        2022-10-01
john  music     30        19        2022-11-05
john  services  2         20        2022-11-19
The real table has many different users, each making purchases of multiple types daily.
I want to end up with a table in this format:
user  label            month
----  ---------------  ----------
john  travel           2022-01-01
john  travel           2022-02-01
john  clothing         2022-03-01
john  travel-clothing  2022-04-01
john  travel-clothing  2022-05-01
john  travel-clothing  2022-06-01
john  travel           2022-07-01
john  travel           2022-08-01
john  services         2022-10-01
john  music            2022-11-01
where the label would record the most popular type (based on % of quantity sold) for each user in a timeframe of the last 4 months (including the current month). So for instance, for March 2022 john ordered 200/339 clothing (Jan to and including Mar) so his label is clothing. But for months where two types are almost even I'd want to use a double label like for April (185 travel 200 clothing out of 409). In terms of rules this is not set in stone yet but it's something like, if two types are around even (e.g. >40%) then use both types in the label column; if three types are around even (e.g. around 30% each) use three types as label; if one label is 40% but the rest is made up of many small % keep the first label; and of course where one is clearly a majority use that. One other tricky bit is that there might be missing months for a user.
Regarding the rules, I think I just need to compare the % of each type, but I don't know how to retrieve the type as a label afterwards. In general, I don't have the SQL/BigQuery logic very clearly in my head. I have done some things, but nothing that comes close to the target table.
Broken down into steps, I think I need 3 things:
1. Group by user, type, and month and get the partial and total count (I have done this).
2. Retrieve the counts for the past 4 months (I have done something, but it's not exactly accurate yet).
3. Compare the ratios and make the label column.
I'm not very clear on the SQL/BigQuery logic here, so please advise me on the correct steps to achieve the above. I'm working in BigQuery, but generic SQL logic will also help.
Consider the approach below. It looks a little messy and has room for optimization, but I hope it gives you an idea or a direction for addressing your problem.
WITH aggregation AS (
    SELECT user, type, DATE_TRUNC(purchase_date, MONTH) AS month, month_no,
           SUM(quantity) AS net_qty,
           SUM(SUM(quantity)) OVER w1 AS rolling_qty
    FROM sample_table,
         UNNEST([EXTRACT(YEAR FROM purchase_date) * 12 + EXTRACT(MONTH FROM purchase_date)]) month_no
    GROUP BY 1, 2, 3, 4
    WINDOW w1 AS (
        PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
    )
),
rolling AS (
    SELECT user, month, ARRAY_AGG(STRUCT(type, net_qty)) OVER w2 AS agg, rolling_qty
    FROM aggregation
    QUALIFY ROW_NUMBER() OVER (PARTITION BY user, month) = 1
    WINDOW w2 AS (PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
)
SELECT user, month, ARRAY_TO_STRING(ARRAY(
           SELECT type FROM (
               SELECT type, SUM(net_qty) / SUM(SUM(net_qty)) OVER () AS pct
               FROM r.agg GROUP BY 1
           ) QUALIFY IFNULL(FIRST_VALUE(pct) OVER (ORDER BY pct DESC) - pct, 0) < 0.10 -- set threshold to 0.1
       ), '-') AS label
FROM rolling r
ORDER BY month;
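The rolling-window and threshold logic of the query can also be traced outside BigQuery. The following is a minimal Python sketch, not a translation of the query itself: it sums quantities per type over the current month and the three preceding calendar months, then keeps every type whose share is within 0.10 of the top share. The function name rolling_labels and its parameters are assumptions made for this illustration, and only the first seven orders from the question are used.

```python
from collections import defaultdict
from datetime import date

# Orders 1-7 from the question: (user, type, quantity, purchase_date)
purchases = [
    ("john", "travel", 10, date(2022, 1, 10)),
    ("john", "travel", 15, date(2022, 1, 15)),
    ("john", "books", 4, date(2022, 1, 16)),
    ("john", "music", 20, date(2022, 2, 1)),
    ("john", "travel", 90, date(2022, 2, 15)),
    ("john", "clothing", 200, date(2022, 3, 11)),
    ("john", "travel", 70, date(2022, 4, 13)),
]


def month_no(d):
    # Consecutive integer per calendar month, so missing months leave gaps.
    return d.year * 12 + d.month


def rolling_labels(rows, window=4, threshold=0.10):
    # Quantity per (user, month number), broken down by type.
    monthly = defaultdict(lambda: defaultdict(int))
    for user, type_, qty, day in rows:
        monthly[(user, month_no(day))][type_] += qty

    labels = {}
    for user, m in monthly:
        # Add up every month of this user that falls inside the window.
        totals = defaultdict(int)
        for (u, other_m), by_type in monthly.items():
            if u == user and m - (window - 1) <= other_m <= m:
                for t, q in by_type.items():
                    totals[t] += q
        grand = sum(totals.values())
        shares = sorted(((q / grand, t) for t, q in totals.items()), reverse=True)
        top_share = shares[0][0]
        kept = [t for share, t in shares if top_share - share < threshold]
        year, month_index = divmod(m - 1, 12)
        labels[(user, date(year, month_index + 1, 1))] = "-".join(kept)
    return labels


labels = rolling_labels(purchases)
for key in sorted(labels):
    print(key, labels[key])
```

For March this gives clothing (200 of 339), and for April both clothing and travel are kept (200 and 185 of 409). The sketch orders the parts of a combined label by share, so April reads clothing-travel rather than the travel-clothing shown in the question.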

questions in SQL Oracle

I'm trying to answer these questions but I couldn't, and I need help here.
1) List the number of days that have elapsed since each student joined.
This is what I did:
Select FR_FIRSTNAME,
FR_LASTNAME,
trunc(sysdate - FR_DATEJOINED) / 7 DAYS
from alharbi_bandar5_FRESHMEN;
no rows selected
2) List the student names and city in upper case.
This is what I did:
Select FR_FIRSTNAME, FR_LASTNAME, CITY FROM alharbi_bandar5_FRESHMEN
where UPPER (FR_FIRSTNAME, FR_LASTNAME, CITY) like 'SMITH%';
> where UPPER (FR_FIRSTNAME, FR_LASTNAME, CITY) like 'SMITH%'
*
ERROR at line 2:
ORA-00909: invalid number of arguments
3) List the no and last name of the student(s) with the highest ACT score.
This is what I did:
Select FR_NO, FR_LASTNAME, ACT from alharbi_bandar5_FRESHME
where ACT = MAX(ACT);
where ACT = MAX(ACT)
*
ERROR at line 2:
ORA-00934: group function is not allowed here
This is my table:
FR_  FR_FIRSTNAME  FR_LASTNAME  FR_DATEJO  ACT  CITY
---  ------------  -----------  ---------  ---  ----------
100  Mark          Ramon        12-JUL-13   21  Florence
101  John          Wright       13-JUN-13   31  Edgewood
102  Peter         Sellers      06-JAN-13   30  Blue Ash
103  Eric          Bates        14-MAY-13   24  Milford
104  Theresa       Boyers       23-APR-13   22  Covingtion
105  Alex          William      04-MAR-13   24  Edgewood
106  Eric          Byrd         23-MAR-13   19  Alexandria
107  Steve         Norris       21-DEC-12   21  Highland
108  Lisa          Nkosi        13-FEB-13   33  Florence
109  Bradley       Rego         21-FEB-12   29  Covington
110  Kathy         Thomas       15-OCT-12   27  Milford
111  Catherine     Jones        17-APR-13   34  Edgewood
112  Emily         Hess         15-NOV-12   36  Highland
113  Josha         Hunter       19-MAY-14   31  Florence
A lot of these questions have answers in the Oracle SQL reference and are mostly syntax issues.
1) trunc(sysdate - FR_DATEJOINED) / 7 DAYS
Oracle returns the difference between two dates in days, so sysdate - FR_DATEJOINED gives you the number of days, which can include a fractional component (2.5 days, for example, if it has been 2 days and 12 hours since the student joined). TRUNC gets rid of the fractional component, but "/ 7" then converts the result into a number of weeks. Why are you doing this?
Either way, I don't believe this query is being run against the table shown above; otherwise you would not get zero rows, since you are not filtering anything at all.
Check these out for more info on Oracle's date functions.
http://docs.oracle.com/cd/E17952_01/refman-5.1-en/date-and-time-functions.html
https://www.youtube.com/watch?v=H18UWBoHhHY
2) The UPPER function accepts a single column name or expression, so if you need multiple columns in upper case, you have to wrap each column in its own UPPER call.
3) For this example, you'll need to use a subquery to get the max value first and then use the query on top.
getting the max value
Select max(act) from alharbi_bandar5_FRESHME;
so, final query would be...
Select FR_NO, FR_LASTNAME, ACT from alharbi_bandar5_FRESHME
where ACT = (select MAX(ACT) from alharbi_bandar5_FRESHME);
Or, you could use the Oracle RANK function:
select fr_no,
       fr_lastname,
       act
from (
    select fr_no, fr_lastname, act,
           rank() over (order by act desc) rnk
    from alharbi_bandar5_FRESHME
) where rnk = 1
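Both variants can be tried without an Oracle instance. The sketch below runs them with Python's built-in sqlite3 module (SQLite 3.25 or newer for rank()), which accepts the same standard SQL here. The table name freshmen and the choice of four rows from the question's table are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE freshmen (fr_no INTEGER, fr_lastname TEXT, act INTEGER)")
conn.executemany(
    "INSERT INTO freshmen VALUES (?, ?, ?)",
    [(100, "Ramon", 21), (108, "Nkosi", 33), (111, "Jones", 34), (112, "Hess", 36)],
)

# Variant 1: compare each row against the maximum computed in a subquery.
by_subquery = conn.execute(
    "SELECT fr_no, fr_lastname, act FROM freshmen "
    "WHERE act = (SELECT MAX(act) FROM freshmen)"
).fetchall()

# Variant 2: rank rows by score and keep rank 1 (tied top scores would all be returned).
by_rank = conn.execute(
    "SELECT fr_no, fr_lastname, act FROM ("
    "  SELECT fr_no, fr_lastname, act, rank() OVER (ORDER BY act DESC) AS rnk"
    "  FROM freshmen"
    ") WHERE rnk = 1"
).fetchall()

print(by_subquery)
print(by_rank)
```

Both queries return the single row for student 112 (Hess, ACT 36). An aggregate such as MAX cannot appear directly in a WHERE clause, which is why the first variant moves it into a subquery.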

VBA/SQL recordsets

The project I'm asking about is for sending an email to teachers asking what books they're using for the classes they're teaching next semester, so that the books can be ordered. I have a query that compares the course number of this upcoming semester's classes to the course numbers of historical textbook orders, pulling out only those classes that are being taught this semester. That's where I get lost.
I have a table that contains the following:
Professor
Course Number
Year
Book Title
The data looks like this:
professor  year  course number  title
---------  ----  -------------  -------------------------
smith      13    1111           Pride and Prejudice
smith      13    1111           The Fountainhead
smith      13    1222           The Alchemist
smith      12    1111           Pride and Prejudice
smith      11    1222           Infinite Jest
smith      10    1333           The Bible
smith      13    1333           The Bible
smith      12    1222           The Alchemist
smith      10    1111           Moby Dick
johnson    12    1222           The Tipping Point
johnson    11    1333           Anna Kerenina
johnson    10    1333           Everything is Illuminated
johnson    12    1222           The Savage Detectives
johnson    11    1333           In Search of Lost Time
johnson    10    1333           Great Expectations
johnson    9     1222           Proust on the Shore
Here's what I need the code to do "on paper":
Group the records by professor. Determine every unique course number in that group, and group records by course number. For each unique course number, determine the highest year associated. Then spit out every record with that professor+course number+year combination.
With the sample data, the results would be:
professor  year  course number  title
---------  ----  -------------  -------------------------
smith      13    1111           Pride and Prejudice
smith      13    1111           The Fountainhead
smith      13    1222           The Alchemist
smith      13    1333           The Bible
johnson    12    1222           The Tipping Point
johnson    11    1333           Anna Kerenina
johnson    12    1222           The Savage Detectives
johnson    11    1333           In Search of Lost Time
I'm thinking I should make a record set for each teacher, and within that, another record set for each course number. Within the course number record set, I need the system to determine what the highest year number is - maybe store that in a variable? Then pull out every associated record so that if the teacher ordered 3 books the last time they taught that class (whether it was in 2013 or 2012 and so on) all three books display. I'm not sure I'm thinking of record sets in the right way, though.
My SQL so far is basic and clearly doesn't work:
SELECT [All].Professor, [All].Course, Max([All].Year)
FROM [All]
GROUP BY [All].Professor, [All].Course;
Use your query as a subquery and INNER JOIN it back to the [ALL] table to filter the rows.
SELECT
    a.Professor,
    a.Year,
    a.Course,
    a.title
FROM
    [ALL] AS a
    INNER JOIN
    (
        SELECT [All].Professor, [All].Course, Max([All].Year) AS MaxOfYear
        FROM [All]
        GROUP BY [All].Professor, [All].Course
    ) AS sub
    ON
        a.Professor = sub.Professor
        AND a.Course = sub.Course
        AND a.Year = sub.MaxOfYear;
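The join-back pattern can be checked against the question's sample data without Access. This sketch uses Python's built-in sqlite3 module; the table name book_orders and the lower-case column names are assumptions chosen for the example (the original table is called [All]).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_orders (professor TEXT, year INTEGER, course INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO book_orders VALUES (?, ?, ?, ?)",
    [
        ("smith", 13, 1111, "Pride and Prejudice"),
        ("smith", 13, 1111, "The Fountainhead"),
        ("smith", 13, 1222, "The Alchemist"),
        ("smith", 12, 1111, "Pride and Prejudice"),
        ("smith", 11, 1222, "Infinite Jest"),
        ("smith", 10, 1333, "The Bible"),
        ("smith", 13, 1333, "The Bible"),
        ("smith", 12, 1222, "The Alchemist"),
        ("smith", 10, 1111, "Moby Dick"),
        ("johnson", 12, 1222, "The Tipping Point"),
        ("johnson", 11, 1333, "Anna Kerenina"),
        ("johnson", 10, 1333, "Everything is Illuminated"),
        ("johnson", 12, 1222, "The Savage Detectives"),
        ("johnson", 11, 1333, "In Search of Lost Time"),
        ("johnson", 10, 1333, "Great Expectations"),
        ("johnson", 9, 1222, "Proust on the Shore"),
    ],
)

# The grouped subquery finds the latest year per professor and course;
# joining it back keeps every book ordered in that year.
rows = conn.execute(
    """
    SELECT a.professor, a.year, a.course, a.title
    FROM book_orders AS a
    INNER JOIN (
        SELECT professor, course, MAX(year) AS max_year
        FROM book_orders
        GROUP BY professor, course
    ) AS sub
        ON a.professor = sub.professor
       AND a.course = sub.course
       AND a.year = sub.max_year
    ORDER BY a.professor, a.course, a.title
    """
).fetchall()

for row in rows:
    print(row)
```

The result has the eight rows listed in the question: four for smith (all from year 13) and four for johnson (year 12 for course 1222, year 11 for course 1333).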

Access SQL - Select only the last sequence

I have a table with an ID and multiple informative columns. Sometimes however, I can have multiple data for an ID, so I added a column called "Sequence". Here is a shortened example:
ID   Sequence  Name  Tel       Date        Amount
124  1         Bob   873-4356  2001-02-03  10
124  2         Bob   873-4356  2002-03-12  7
124  3         Bob   873-4351  2006-07-08  24
125  1         John  983-4568  2007-02-01  3
125  2         John  983-4568  2008-02-08  13
126  1         Eric  345-9845  2010-01-01  18
So, I would like to obtain only these lines:
124  3         Bob   873-4351  2006-07-08  24
125  2         John  983-4568  2008-02-08  13
126  1         Eric  345-9845  2010-01-01  18
Could anyone give me a hand with building a SQL query to do this?
Thanks!
You can calculate the maximum sequence using group by. Then you can use join to get only the maximum in the original data.
Assuming your table is called t:
select t.*
from t join
     (select id, MAX(sequence) as maxs
      from t
      group by id
     ) tmax
     on t.id = tmax.id and
        t.sequence = tmax.maxs

Retrieve top 48 unique records from database based on a sorted Field

I have a database table that I need some SQL for (which is defeating me so far!).
Imagine there are 192 athletic clubs who all take part in 12 track meets per season.
So that is 2304 individual performances per season (for example, in the 100 metres).
I would like to find the top 48 (unique) individual performances from the table; these 48 athletes are then going to take part in the end-of-season World Championships.
So imagine the 2 fastest times are both set by "John Smith", but he can only be entered once in the world champs. I would then look for the next fastest time not set by "John Smith", and so on until I have 48 unique athletes.
Hope that makes sense.
Thanks in advance if anyone can help.
PS
I did have a nice screenshot created that would explain it much better, but as a newish user I cannot post images.
I'll try a copy-and-paste version instead...
ID  AthleteName    AthleteID  Time
1   Josh Lewis     3          11.99
2   Joe Dundee     4          11.31
3   Mark Danes     5          13.44
4   Josh Lewis     3          13.12
5   John Smith     1          11.12
6   John Smith     1          12.18
7   John Smith     1          11.22
8   Adam Bennett   6          11.33
9   Ronny Bower    7          12.88
10  John Smith     1          13.49
11  Adam Bennett   6          12.55
12  Mark Danes     5          12.12
13  Carl Tompkins  2          13.11
14  Joe Dundee     4          11.28
15  Ronny Bower    7          12.14
16  Carl Tompkin   2          11.88
17  Nigel Downs    8          14.14
18  Nigel Downs    8          12.19
Top 4 unique individual performances
1   John Smith     1          11.12
3   Joe Dundee     4          11.28
5   Adam Bennett   6          11.33
6   Carl Tompkins  2          11.88
Basically something like this:
select top 48 *
from (
    select athleteId, min(time) as bestTime
    from theRaces
    where raceId = '123' -- e.g., 123=100 meters
    group by athleteId
) x
order by bestTime
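To see the grouping idea on the question's sample data, here is a small sketch using Python's built-in sqlite3 module. SQLite has no TOP keyword, so LIMIT is used instead; the table name performances and the snake_case column names are assumptions for this example, and the limit is 4 to match the sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performances (id INTEGER, athlete_name TEXT, athlete_id INTEGER, time REAL)")
conn.executemany(
    "INSERT INTO performances VALUES (?, ?, ?, ?)",
    [
        (1, "Josh Lewis", 3, 11.99), (2, "Joe Dundee", 4, 11.31),
        (3, "Mark Danes", 5, 13.44), (4, "Josh Lewis", 3, 13.12),
        (5, "John Smith", 1, 11.12), (6, "John Smith", 1, 12.18),
        (7, "John Smith", 1, 11.22), (8, "Adam Bennett", 6, 11.33),
        (9, "Ronny Bower", 7, 12.88), (10, "John Smith", 1, 13.49),
        (11, "Adam Bennett", 6, 12.55), (12, "Mark Danes", 5, 12.12),
        (13, "Carl Tompkins", 2, 13.11), (14, "Joe Dundee", 4, 11.28),
        (15, "Ronny Bower", 7, 12.14), (16, "Carl Tompkin", 2, 11.88),
        (17, "Nigel Downs", 8, 14.14), (18, "Nigel Downs", 8, 12.19),
    ],
)

# One row per athlete with their best (lowest) time, fastest athletes first.
top = conn.execute(
    """
    SELECT athlete_id, MIN(time) AS best_time
    FROM performances
    GROUP BY athlete_id
    ORDER BY best_time
    LIMIT 4
    """
).fetchall()

for athlete_id, best_time in top:
    print(athlete_id, best_time)
```

Grouping by the athlete ID rather than the name means the misspelled "Carl Tompkin" row still counts toward athlete 2. The four athletes returned (IDs 1, 4, 6, 2) match the expected result in the question.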
Try this:
select x.ID, x.AthleteName, x.AthleteID, x.Time
from (
    -- number the rows after they have been sorted by best time
    select rownum tr_count, v.AthleteID AthleteID, v.AthleteName AthleteName, v.Time Time, v.id id
    from (
        -- one row per athlete: their fastest time and the lowest id that achieved it
        select tr1.AthleteName AthleteName, tr1.Time time, min(tr1.id) id, tr1.AthleteID AthleteID
        from theRaces tr1
        where tr1.time = (select min(tr2.time) from theRaces tr2 where tr2.athleteId = tr1.athleteId)
        group by tr1.AthleteName, tr1.AthleteID, tr1.Time
        having tr1.Time = (select min(tr2.time) from theRaces tr2 where tr1.AthleteID = tr2.AthleteID)
        order by tr1.time
    ) v
) x
where x.tr_count <= 48