Find correct place to insert new hi-score record - sql

Given the table:
ArcadeScores
------------
ID
GameID
UserID
Score
Milliseconds
Rank
Where Rank is > 0 and calculated as the index of Score DESC then by Milliseconds ASC (best score is always top, in case of equal score it's ranked by whoever did it fastest).
Storing Rank is required as it allows me to perform fast queries such as "How many top 3 scores does userID 5 have?".
Recalculating the Rank for a GameID when a new score is inserted by ordering all the records and looping each one updating the rank works OK, but looping through every record and performing an update query on every record slows down when you have thousands of records. For a popular game (especially a fast game where a single user might be posting a new score every 3 seconds or so), this is too costly.
Given a new score record, I need to work out which position it should be inserted into. If our new record is going to be rank 45 we can then increment every record above it by one which is a far cheaper operation:
UPDATE ArcadeScores SET ScoreRank = ScoreRank + 1 WHERE gameID = " + myGameID + " AND ScoreRank >= 45
The difficulty I'm having is working out the rank of the record to insert. On Score or Milliseconds alone it's fairly easy, but I'm struggling to work out the correct Rank from the combination of both.
How many score records there are for a game is a known value.

Do you need a query, or can you use a function? Try this query. If I understood the ordering in your table correctly, it will give the rank for a new row with the values inserting_score and inserting_milliseconds:
SELECT COUNT(1) + 1 FROM ArcadeScores
WHERE Score > inserting_score OR (Score = inserting_score AND Milliseconds < inserting_milliseconds)
Oh, forgot about GameID :)
SELECT COUNT(1) + 1 FROM ArcadeScores
WHERE GameID = inserting_gameid AND (Score > inserting_score
OR (Score = inserting_score AND Milliseconds < inserting_milliseconds))
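A minimal sketch of how the count query and the rank shift from the question could be combined in one transaction. It uses Python's built-in sqlite3 module purely for illustration; the `insert_score` helper is hypothetical, the rank column is called `ScoreRank` as in the question's UPDATE statement, and the sample scores are made up.

```python
import sqlite3

def insert_score(conn, game_id, user_id, score, ms):
    """Insert a score and return the rank it was given (hypothetical helper)."""
    with conn:  # one transaction: count, shift, insert
        # Rank = 1 + number of existing rows that beat the new one.
        (rank,) = conn.execute(
            """SELECT COUNT(*) + 1 FROM ArcadeScores
               WHERE GameID = ? AND (Score > ?
                  OR (Score = ? AND Milliseconds < ?))""",
            (game_id, score, score, ms),
        ).fetchone()
        # Push every row at or below that position down by one.
        conn.execute(
            "UPDATE ArcadeScores SET ScoreRank = ScoreRank + 1 "
            "WHERE GameID = ? AND ScoreRank >= ?",
            (game_id, rank),
        )
        conn.execute(
            "INSERT INTO ArcadeScores "
            "(GameID, UserID, Score, Milliseconds, ScoreRank) "
            "VALUES (?, ?, ?, ?, ?)",
            (game_id, user_id, score, ms, rank),
        )
    return rank

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArcadeScores (ID INTEGER PRIMARY KEY, GameID INT, "
    "UserID INT, Score INT, Milliseconds INT, ScoreRank INT)"
)
ranks = [
    insert_score(conn, 1, 5, 100, 9000),
    insert_score(conn, 1, 6, 200, 8000),
    insert_score(conn, 1, 7, 100, 7000),  # same score as user 5, but faster
]
```

With these sample rows, user 7 ties user 5 on score but is faster, so it lands between the 200-point score and user 5's slower 100-point score.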

Related

SQL return minimum by groupby and ratio of

A SQL question: I have a table game with columns user_id (unique per user), game_id (unique per game), game_start_timestamp_utc (the UTC timestamp for when the game starts), and game_status, which can either be ‘pass’, ‘in progress’ or ‘fail’.
The question is to write a query to return the game that has the lowest pass rate (pass users/enrolled users).
The table should be like this
user_id game_id game_start_timestamp_utc game_status
-----------------------------------------------------
1 111 10/22/2019 pass
2 111 10/21/2018 fail
...
I know how to do it in Python pandas (just group by game_id to calculate the pass rate), but I don't have much idea how to do it in SQL. Thanks in advance.
Use conditional aggregation. avg() comes in handy for this:
select game_id,
avg(case when game_status = 'pass' then 1.0 else 0 end) as pass_rate
from game
group by game_id
order by pass_rate
This gives you the pass rate of each game, as a value between 0 and 1, ordered by increasing rate - so the first row is the result you want.
You can keep that one row only with a row-limiting clause. The syntax varies across databases: limit 1, top (1), fetch first row, ...
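As a rough check of the query with a row-limiting clause added, here is a small sketch run against SQLite from Python. The timestamp column is left out for brevity, and every sample row beyond the two shown in the question is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (user_id INT, game_id INT, game_status TEXT)")
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [
        (1, 111, "pass"),
        (2, 111, "fail"),
        (3, 222, "pass"),         # invented rows from here on
        (4, 222, "pass"),
        (5, 222, "in progress"),
    ],
)

# AVG over a 1.0/0 flag is passes divided by enrolled users per game.
lowest = conn.execute(
    """SELECT game_id,
              AVG(CASE WHEN game_status = 'pass' THEN 1.0 ELSE 0 END) AS pass_rate
       FROM game
       GROUP BY game_id
       ORDER BY pass_rate
       LIMIT 1"""
).fetchone()
```

Here game 111 has one pass out of two users and game 222 has two out of three, so `lowest` holds game 111. SQLite uses `LIMIT 1`; other databases need their own row-limiting syntax, as noted above.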

SQL field constraint that forces a column to be ascending

I am working on a small table that has a user input with a number field. The number that the user inputs has to be larger by a few points than the current highest number. Can I also check that the score has to be for instance 1 higher if the current highest score is < 10 but 5 higher if the current highest 10 <= score < 100?
for instance:
user score
1 1
1 2
1 4
1 5
1 7
Now, I want a constraint that will check on insert that the inserted score is bigger than the current highest score by x amount.
Is such a constraint possible?
Such a constraint is difficult to implement. If you care about performance, can you simply input the difference?
1 1
1 1
1 2
1 1
1 2
If you do the data this way, then you can use check (score > 0) and then use sum(score) over (order by ??), where ?? specifies the ordering of the rows.
Otherwise, you'll need to use either a trigger or user-defined function to implement the constraint.
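A minimal sketch of the trigger approach, using SQLite through Python only because it is easy to run. The table layout (`scores(user_id, score)`), the trigger name, and the choice to apply the rule per user are assumptions; the question does not say what should happen once the highest score reaches 100, so this sketch simply keeps requiring 5 more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (user_id INT, score INT);

-- Reject an insert unless it beats the user's current highest score by
-- 1 (highest < 10) or by 5 (highest >= 10; assumption above 100).
CREATE TRIGGER scores_must_ascend BEFORE INSERT ON scores
BEGIN
    SELECT RAISE(ABORT, 'score not high enough')
    WHERE NEW.score < (
        SELECT MAX(score) + CASE WHEN MAX(score) < 10 THEN 1 ELSE 5 END
        FROM scores WHERE user_id = NEW.user_id
    );
END;
""")

def try_insert(user_id, score):
    """Return True if the row was accepted, False if the trigger rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO scores VALUES (?, ?)", (user_id, score))
        return True
    except sqlite3.DatabaseError:
        return False

accepted = [try_insert(1, s) for s in (1, 2, 2, 10, 12, 15)]
```

The first insert for a user always passes, because the subquery yields NULL when the user has no rows yet and the comparison is then not true.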

Generate random data in Oracle based on ranks

It is given the following scenario. I have a list of 3000 first names and a list of 2500 last names. Each one of them has a "ranking" that represents the position in a name's top. Two or more names can have the same ranking. Also, a table with 1500 cities is given, each with 4 census values in certain years.
From the tables above I must generate 5 million random entries containing the first name, last name, birth date and place of birth of one person, that should follow the rules given by ranking of the names and population number of the cities.
This has to be generated using just Oracle (stored functions, stored procedures and so on). How can I do this?
Disclaimer: I'm not a statistics expert, and there are probably way more efficient means to do that.
The most challenging task seems to be the creation of 5 million names according to ranks. In the real world, those would be distributed unevenly among the population: the difference between the second-to-last and last ranks would be 1-2 persons, while the difference between the first and second ranks could be thousands of people. That said, I have no idea how to achieve that, so we'll model it another way. Suppose we have a total population of 100 and a list of four ranked names:
Alice: 1
Bob: 2
Betty: 2
Claire: 3
We can make the distribution "even", so that rank 3 has X people, rank 2 has twice as many, and rank 1 thrice as many. If the ranks were unique, the formula would be as simple as X + 2X + 3X = 100, but we have two names in rank 2, so it should be X + 2*2X + 3X = 100, so X = 12.5. We can truncate it to integer and get people counts for all ranks except the first (12, 24 and 24) and first rank would get what remains: 40. Seems good enough, though it will not work for edge case when you have multiple first ranks.
There's a little problem, though. For 3000 different names, the sum of coefficients would be 4501500. So, truncated X would be 1, making rank 3000 to rank 2 have 1 to 2999 people respectively, and rank 1 have a little over 500000 (5000000 - 4498500 = 501500). That's not quite good enough. To illustrate with the four names above, assume a total count of 15. With the current algorithm, X will be 1 as well, and the distribution will be 1-2-2-10. Luckily, we'll be processing ranks one by one in a procedure, so we can remove processed people from the equation and recalculate X. E.g. first it's X + 2*2X + 3X = 15 with X=1, then 2*2X + 3X = 14 with X=2. This way, the distribution will be 1-4-4-6, which is far from ideal, but better.
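The recalculation step is easy to check outside the database first. Below is a small Python sketch of the same arithmetic; the `distribute` function and its return shape are made up for this illustration, and like the description above it assumes exactly one name holds the best rank.

```python
from collections import Counter

def distribute(ranks, total):
    """Map each rank to the number of people given to EACH name of that rank.

    ranks: one rank per name, e.g. [1, 2, 2, 3] for Alice, Bob, Betty, Claire.
    """
    counts = Counter(ranks)                   # how many names share each rank
    ordered = sorted(counts, reverse=True)    # worst rank first
    coeff = {rank: i + 1 for i, rank in enumerate(ordered)}
    coeff_sum = sum(counts[r] * coeff[r] for r in ordered)
    remaining = total
    per_name = {}
    for rank in ordered:
        x = remaining // coeff_sum            # recalculate X for what is left
        if rank == ordered[-1]:
            per_name[rank] = remaining        # best rank takes the remainder
        else:
            per_name[rank] = x * coeff[rank]
        remaining -= x * counts[rank] * coeff[rank]
        coeff_sum -= counts[rank] * coeff[rank]
    return per_name

small = distribute([1, 2, 2, 3], 15)    # {3: 1, 2: 4, 1: 6}
large = distribute([1, 2, 2, 3], 100)   # {3: 12, 2: 24, 1: 40}
```

Both results match the hand calculations: 1-4-4-6 for a population of 15, and 12-24-24-40 for a population of 100.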
Now, this can already be expressed as PL/SQL. I suggest creating a table with the following columns: LAST_NAME, FIRST_NAME, BIRTHDAY, CITY, RAND_ROWNO.
First of all, let's fill it with 5M last names. Assuming your table for them is last_names(name, name_rank), you'll need the following:
declare
cursor cur_last_name_ranks is
select name_rank, count(*) cnt, row_number() over (order by name_rank desc) coeff
from last_names l
group by name_rank;
cursor cur_last_names (c_rank number) is
select name from last_names
where name_rank = c_rank;
v_coeff_sum number;
v_total_people_count number:= 5000000;
v_remaining_people number;
v_x number;
v_insert_cnt number;
begin
--Get a sum of all coefficients for our formula
select sum(coeff) into v_coeff_sum
from
(
select count(*) * row_number() over (order by name_rank desc) coeff
from last_names l
group by name_rank
);
v_remaining_people := v_total_people_count;
--Now, loop for all coefficients
for r in cur_last_name_ranks loop
--Recalculate X
v_x := trunc(v_remaining_people / v_coeff_sum);
--First, determine how many rows should be inserted per last name with such rank
if r.name_rank = 1 then
if r.cnt > 1 then
--raise an exception here, we don't allow multiple first ranks
raise TOO_MANY_ROWS;
end if;
v_insert_cnt := v_remaining_people;
else
v_insert_cnt := v_x*r.coeff;
end if;
--Insert last names N times.
--Instead of multiple INSERT statements, use select from dual with connect trick.
for n in cur_last_names(r.name_rank) loop
insert into result_table(last_name)
select n.name from dual connect by level <= v_insert_cnt;
end loop;
commit;
--Calculate remaining people count
v_remaining_people := v_remaining_people - v_x*r.cnt*r.coeff;
--Recalculate remaining coefficients
v_coeff_sum := v_coeff_sum - r.cnt*r.coeff;
end loop;
end;
Now you have 5 million rows with last names filled according to ranks. Next, we'll need to assign a random number from 1 to 5000000 to each row - you'll see why. This is done with a single query using merge on self:
merge into result_table t1
using (select rowid rid, row_number() over (ORDER BY DBMS_RANDOM.VALUE) rnk from result_table) t2
on (t1.rowid = t2.rid)
when matched then update set t1.rand_rowno = t2.rnk
Note that it will take some time because of large size.
Now you must repeat the same procedure for first names. It'll be very similar to last names, except you'll be updating existing records, not inserting new ones. If you keep track of how many rows you've updated already, it'll be as simple as putting this in the inner loop:
update result_table
set first_name = n.name
where rand_rowno between
(v_processed_rows+1) and
(v_processed_rows+v_insert_cnt);
v_processed_rows := v_processed_rows+v_insert_cnt;
That does it - you now have a decent sample of 5M names according to your ranking, last names randomly matched with first names.
Now, for census. I don't really understand your format, but that's relatively simple. If you get data to the form of "N people were born in city C between DATE1 and DATE2", you can update the table in a loop, setting N rows to have CITY = C and BIRTHDAY = a random date between DATE1 and DATE2. You'll need a function to return a random date from a time period, see this. Also, don't forget to assign random row numbers again before doing that.
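The random-date helper mentioned above boils down to picking a random day offset inside the period. A minimal sketch of that logic in Python, just to show the idea (the `random_date` name is hypothetical; in Oracle the offset would come from something like DBMS_RANDOM.VALUE with a low and high bound):

```python
import random
from datetime import date, timedelta

def random_date(start, end, rng=random):
    """Return a uniformly chosen date in the inclusive range [start, end]."""
    span_days = (end - start).days
    # randint is inclusive on both ends, so both start and end can be drawn.
    return start + timedelta(days=rng.randint(0, span_days))

birthday = random_date(date(1990, 1, 1), date(1990, 12, 31))
```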
I'll leave the census part for you to implement, I've spent too much time on writing this already. Thanks for a good brain exercise!

postgreSQL get index of a row that is outside the limit you searched for

I am new to SQL and I am not sure how to properly search my question so I will ask here.
Please see this link to see the SQL tables and queries I am working with
In this example there are 6 rows and I am limiting my search to start at the first index and give me at most 2. However, I would like to know the index of the row that has id 1.
When I use the query I describe in sqlfiddle, it shows me rows with id 5 and 23, but it doesn't include the row with id 1. However, I need to know the index of the row with id 1.
click here to see the full list
The link above prints out all of the rows and we can see that index 3 has the row containing id 1.
However I need to know that index without asking for the entire Array.
Why is this important? Well, let's say that we have 1 million rows. If I ask for a million rows, that would mean allocating an array of one million. I could parse the array until I find the id I am looking for. However, allocating a million is way too costly.
Let's say, for example, that the row I am looking for resides at index 26, but I make my query so that it starts at index 0 and limits to 10. The array that I get from this query would not contain index 26. However, I still need to know that it IS at index 26.
So this magic query would give me two things:
the top ten rows of the sorted rows
the index of a specified id (e.g. id of 1) regardless of its placement in the list.
Is this a possible query?
Clarification:
I use the word index to mean the row number.
If we query a list of names from the db, we could get something like this:
bob
frank
dawn
then bob would be at index 0, frank would be at index 1 and dawn at index 2.
If I ORDER BY name ASC then the list of names would become
bob
dawn
frank
bob would be index 0 dawn would be index 1 and frank would be index 2.
I hope this makes things more clear.
If you want the row number, use the row_number() function:
SELECT *
FROM (SELECT ud.id, ud.team_name, ui.name, ui.date_created,
row_number() over (order by ui.name, ui.id) as rownumber
FROM user_data ud JOIN
user_infos ui
ON ui.id = ud.id
WHERE ui.date_created BETWEEN NOW() - INTERVAL '1 year' AND NOW()
) t
WHERE rownumber <= 10 or id = 1;
If you want them in order, just add order by rownumber as the last statement.
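To see the shape of the result, here is a cut-down sketch of the same pattern against SQLite (window functions need SQLite 3.25 or later). The single `users` table and its six rows are invented for the example; the original query joins two tables and filters by date, which is omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(5, "alice"), (23, "bob"), (7, "carol"),
     (1, "dave"), (9, "erin"), (12, "frank")],
)

# Number every row in sorted order, then keep the first page plus id 1.
rows = conn.execute(
    """SELECT id, name, rownumber
       FROM (SELECT id, name,
                    ROW_NUMBER() OVER (ORDER BY name, id) AS rownumber
             FROM users) t
       WHERE rownumber <= 2 OR id = 1
       ORDER BY rownumber"""
).fetchall()
```

The result contains the two rows of the first page plus the row with id 1, which carries its own row number (4 here). Note that row_number() counts from 1, so subtract 1 if you want the zero-based index used in the question.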

Getting player rank from database

I have a RuneScape private server, which stores the player scores in a database.
The highscores load the player's scores and put them into a table.
But now comes the harder part I can't fix:
I want to display the rank of the player. Like: 'Attack level: 44, ranked 12'. So it has to find the rank the user has.
How can I get this to work? I googled for 2 days now, I did not find anything.
I don't know if there's a way to achieve this using the same query.
You could make another query like:
pos = select count(*) + 1 from players where attack > 44
This query would return the number of players ranked above someone. The "plus one" part is to make the rank start at 1 (because the first one won't have anyone ranked above him).
For example, if the table is:
id attack
0 35
1 22
2 121
3 76
pos(3) = 1 (only player 2 is ranked above) + 1 = 2
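The example table can be checked with a few lines of Python and SQLite. This is only a sketch: the `attack_rank` helper is hypothetical, and it looks up the player's own attack level in a subquery instead of hard-coding 44.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INT, attack INT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(0, 35), (1, 22), (2, 121), (3, 76)],
)

def attack_rank(player_id):
    """Rank = number of players with a strictly higher attack level, plus one."""
    (rank,) = conn.execute(
        """SELECT COUNT(*) + 1 FROM players
           WHERE attack > (SELECT attack FROM players WHERE id = ?)""",
        (player_id,),
    ).fetchone()
    return rank

ranks = {pid: attack_rank(pid) for pid in range(4)}
```

Players with equal attack levels get the same rank under this scheme, since only strictly higher values are counted.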
You can create a view (probably) that shows every player's score. Something along these lines might work.
create view player_scores as
select player_id, sum(score)
from scores
group by player_id
That will give you one row per player, with their total score. Having that view, the rank is simple.
select count(*)
from player_scores
where sum > (select sum from player_scores where player_id = 1)
That query will return the number of players having a higher score than player_id = 1.
Of course, if you know your player's score before you run the query, you can pass that score as a parameter. That will run a lot faster as long as the column is indexed.