Create a random selection weighted on number of points, SQL - sql

I have a table of winners for a prize draw, where each winner has earned a number of points over the year. There are 1300 registered users, with points varying between 50 and 43,000. I need to be able to select a random winner, which is straightforward, but the challenge I am having is building the logic where each point counts as an entry ticket into the prize draw. I would appreciate any help.
John

Your script would look something similar to this:
Script 1 :
DECLARE @Name varchar(100),
        @Points int,
        @i int
DECLARE Cursor1 CURSOR FOR SELECT Name, Points FROM Table1
OPEN Cursor1
FETCH NEXT FROM Cursor1 INTO @Name, @Points
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert one row per point, so each point acts as one ticket
    SET @i = 0
    WHILE @i < @Points
    BEGIN
        INSERT INTO Table2 (Name)
        VALUES (@Name)
        SET @i = @i + 1
    END
    FETCH NEXT FROM Cursor1 INTO @Name, @Points
END
CLOSE Cursor1
DEALLOCATE Cursor1
I created a table (Table1) with only a Name and a Points column (varchar(100) and int). The cursor walks through every record in Table1, and the inner loop inserts the Name into another table (Table2) once per point.
Table2 therefore contains each Name as many times as that user has points.
Script 2 :
DECLARE @Name nvarchar(200),
        @Points int,
        @i int,
        @Count int
CREATE TABLE #temptable(
    UserEmailID nvarchar(200),
    Points int)
DECLARE Cursor1 CURSOR FOR SELECT UserEmailID, Points FROM Table1_TEST
OPEN Cursor1
FETCH NEXT FROM Cursor1 INTO @Name, @Points
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert one row per point, so each point acts as one ticket
    SET @i = 0
    WHILE @i < @Points
    BEGIN
        INSERT INTO #temptable (UserEmailID, Points)
        VALUES (@Name, @Points)
        SET @i = @i + 1
    END
    FETCH NEXT FROM Cursor1 INTO @Name, @Points
END
CLOSE Cursor1
DEALLOCATE Cursor1
SELECT * FROM #temptable
DROP TABLE #temptable
In Script 2 I have imported the result into a temp table as requested.
The script now runs through each record in your Table1 and inserts the individual's UserEmailID and Points into the temp table once for every point they have in Table1.
So if John has a total of 3 points and Sarah 2, the script will insert John's UserEmailID 3 times into the temp table and Sarah's 2 times.
If you apply the random selector to the temp table, it will then randomly select an individual.
John stands a better chance of winning because he has 3 records in the temp table whereas Sarah only has 2.
Suppose John's UserEmailID is 1 and Sarah's is 2:
The OUTPUT of TEMP table would then be:
UserEmailID | Points
1 | 3
1 | 3
1 | 3
2 | 2
2 | 2
Please let me know if you need any clarity.
Hope this helps.
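The ticket idea in this answer can be sketched outside the database as well. The following Python snippet is only an illustration, not the answer's script: the function name `draw_by_tickets` and the sample data are made up, and it assumes points are non-negative integers.

```python
import random

def draw_by_tickets(points_by_user, rng=random):
    """Give each user one ticket per point, then pick one ticket uniformly."""
    tickets = []
    for user, points in points_by_user.items():
        tickets.extend([user] * points)  # a user with 3 points gets 3 tickets
    return rng.choice(tickets), tickets

winner, tickets = draw_by_tickets({"John": 3, "Sarah": 2}, random.Random(0))
print(tickets)  # ['John', 'John', 'John', 'Sarah', 'Sarah']
print(winner)   # either 'John' or 'Sarah'; John is picked 3 times out of 5 on average
```

With points up to 43,000 per user the ticket list can grow to millions of entries, which is the main cost of this approach.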

You can do a weighted draw using the following method:
Calculate the cumulative sum of points.
Divide by the total number of points to get a value between 0 and 1
Each row in the original data will have a range, such as [0, 0.1), [0.1, 0.3), [0.3, 1]
Calculate a random number and choose the row where the value falls in the range
Here is standard'ish SQL for this approach:
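with u as (
    select u.*,
           -- each range starts where the previous row's range ends (0 for the first row)
           coalesce(lag(rangeend) over (order by points), 0) as rangestart
    from (select u.*,
                 sum(points * 1.0) over (order by points) / sum(points) over () as rangeend
          from users u
         ) u
),
r as (
    select random() as rand
)
select u.*
from u cross join r
-- half-open range [rangestart, rangeend), so a boundary value matches only one row;
-- rows with equal points need a unique tie-breaker column added to both "order by" clauses
where r.rand >= u.rangestart and r.rand < u.rangeend;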
In addition to using window functions (which can be replaced by correlated subqueries in many cases), the exact format depends on whether the random number generator is deterministic within a query (such as SQL Server, where rand() returns one value no matter how often it is called in a query) or non-deterministic (as in most other databases). This method only requires one value from the random number generator, so it works either way.
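The cumulative-range method described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a translation of the query: `weighted_pick` and the sample rows are hypothetical names, and the random number is passed in so the mapping from number to row is easy to check.

```python
import bisect
import itertools

def weighted_pick(rows, r):
    """rows: list of (name, points); r: a number in [0, 1)."""
    total = sum(points for _, points in rows)
    # Upper bound of each row's range: cumulative points divided by the total.
    range_ends = [c / total for c in itertools.accumulate(p for _, p in rows)]
    # Index of the first row whose range end is strictly greater than r.
    return rows[bisect.bisect_right(range_ends, r)][0]

# Points 1, 2 and 7 give the ranges [0, 0.1), [0.1, 0.3), [0.3, 1).
rows = [("a", 1), ("b", 2), ("c", 7)]
print(weighted_pick(rows, 0.05))  # a
print(weighted_pick(rows, 0.29))  # b
print(weighted_pick(rows, 0.30))  # c
```

In real use `r` would come from something like `random.random()`, called once per draw.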

So you want a winner with 1000 points to have double the chance of another with only 500 points.
Sort the winners by whatever order and create a running total for the points:
id points
winner1 100
winner2 50
winner3 150
gives:
id points from to
winner1 100 1 100
winner2 50 101 150
winner3 150 151 300
Then compare with a random number from 1 to sum(points), in the example a number between 1 and 300. Find the winner with that number range and you're done.
select winpoints.id_winner
from
(
select
id as id_winner,
coalesce(sum(points) over(order by id rows between unbounded preceding and 1 preceding), 0) + 1 as from_points,
sum(points) over(order by id rows between unbounded preceding and current row) as to_points
from winners
) winpoints
where (select floor(rand() * sum(points)) + 1 from winners)
between winpoints.from_points and winpoints.to_points;
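To make the from/to bookkeeping concrete, here is a small Python sketch of the same running-total idea. The helper names `ticket_ranges` and `find_winner` are invented for this illustration, and points are assumed to be positive integers.

```python
import random

def ticket_ranges(winners):
    """winners: list of (id, points) -> list of (id, from_points, to_points)."""
    ranges, running = [], 0
    for winner_id, points in winners:
        ranges.append((winner_id, running + 1, running + points))
        running += points
    return ranges

def find_winner(ranges, ticket):
    """Return the id whose inclusive range contains the ticket number."""
    for winner_id, low, high in ranges:
        if low <= ticket <= high:
            return winner_id
    raise ValueError("ticket must be between 1 and sum(points)")

ranges = ticket_ranges([("winner1", 100), ("winner2", 50), ("winner3", 150)])
print(ranges)
# [('winner1', 1, 100), ('winner2', 101, 150), ('winner3', 151, 300)]
ticket = random.randint(1, ranges[-1][2])  # 1..300 inclusive
print(ticket, find_winner(ranges, ticket))
```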

This solution also works with fractional points/weights. It creates a helper table usersum.
create table user (id int primary key, points float);
insert into user values (1, 0.5), (2, 0), (3, 1);
create table usersum (id int primary key, pointsum float);
insert into usersum
select id, (select sum(points) from user b where b.id <= a.id)
from user a;
set @r = rand() * (select max(pointsum) from usersum);
select @r, usersum.* from usersum where pointsum >= @r order by id limit 1;
http://sqlfiddle.com/#!2/ae539e/1

Related

Create equal sized, random buckets, with no repetition to the row

Having some difficulty in a scheduling task.
Background: I have 100 members, 10 different sessions, and 10 different activities.
Rules:
Each member must do each activity only once.
Each activity must have the same number of members in each session.
The members must be with (at least mostly) different people in each session.
Each activity must be run in each session with 10 people per activity.
The expected outcome would be something like this:
Person ID | Session ID | Activity ID
1 | S1 | A
2 | S1 | B
3 | S1 | C
1 | S2 | B
2 | S2 | C
3 | S2 | A
In the above example, each activity in each session has only 1 participant, I have to lock that activity in that session out at 10 members.
I have tried a few different solutions in Excel and SQL, but have not been able to meet all of the rules. The hardest is keeping each activity/session slot to 10 people.
The closest solution I've had is the following. It's not pretty, though:
SET STATISTICS TIME, io OFF
-- Create list of applicants
IF OBJECT_ID('process.Numbers') IS NOT NULL DROP TABLE process.Numbers
CREATE TABLE Numbers (ApplicantID INT, SessionID INT, GroupID INT)
DECLARE @i INT,
        @Session INT,
        @Group INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 100
BEGIN
    INSERT INTO Numbers (ApplicantID, SessionID) VALUES (@i, 1);
    SELECT @i = @i + 1;
END;
-- Duplicate ApplicantID list for each different session
SELECT @Session = 1
WHILE @Session <= 10
BEGIN
    IF @Session > 1
    BEGIN
        INSERT INTO Numbers (ApplicantID, SessionID)
        SELECT ApplicantID, @Session FROM Numbers WHERE SessionID = 1
    END
    -- SELECT RANDOM TOP 10 AND SET AS GROUP ID
    SELECT @Group = 1
    WHILE @Group <= 10
    BEGIN
        WITH dups_check AS (SELECT ApplicantID,
                                   GroupID,
                                   COUNT(*) AS vol
                            FROM Numbers
                            GROUP BY ApplicantID,
                                     GroupID),
        cte AS (SELECT TOP 10 *
                FROM Numbers
                WHERE Numbers.GroupID IS NULL
                  AND SessionID = @Session
                  AND NOT EXISTS (SELECT 1
                                  FROM dups_check
                                  WHERE Numbers.ApplicantID = dups_check.ApplicantID
                                    AND dups_check.GroupID = @Group)
                ORDER BY NEWID())
        UPDATE cte SET GroupID = @Group
        SELECT @Group = @Group + 1
    END
    SELECT @Session = @Session + 1
END
SELECT * FROM Numbers
SET NOCOUNT OFF
This code starts to fall over regularly in the higher session numbers when it tries to set an activity that the individual has already done.
Thanks!
I used your code to generate the ApplicantID and SessionID rows and modified the last part to generate the GroupID column using ranking functions.
Below is what I tried:
SET STATISTICS TIME, io OFF
-- Create list of applicants
IF OBJECT_ID('dbo.Numbers') IS NOT NULL DROP TABLE dbo.Numbers
CREATE TABLE dbo.Numbers (ApplicantID INT, SessionID INT, GroupID INT)
DECLARE @i INT,
        @Session INT,
        @Group INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 100
BEGIN
    INSERT INTO Numbers (ApplicantID, SessionID) VALUES (@i, 1);
    SELECT @i = @i + 1;
END;
-- Duplicate ApplicantID list for each different session
SELECT @Session = 1
WHILE @Session <= 10
BEGIN
    IF @Session > 1
    BEGIN
        INSERT INTO Numbers (ApplicantID, SessionID)
        SELECT ApplicantID, @Session FROM Numbers WHERE SessionID = 1
    END
    SELECT @Session = @Session + 1
END
SET NOCOUNT OFF
drop table if exists #temp;
select ApplicantID, SessionID,
       row_number() OVER (PARTITION BY ApplicantID ORDER BY ApplicantID) AS grp_row
into #temp
from Numbers
update a
set a.GroupID = b.grp_row
from Numbers a
join #temp b on a.ApplicantID = b.ApplicantID and a.SessionID = b.SessionID
where a.GroupID is null
Each member must do each activity only once.
There are 100 applicants; as an example, I am showing applicants 1 and 100. Each applicant has each GroupID only once.
Each activity must have the same number of members in each session.
There are 10 GroupIDs, and the number of applicants for each GroupID is the same (100).
The members must be with (at least mostly) different people in each session.
There are 100 applicants, but I am taking the top 10 as an example. Here each SessionID has different applicants.

Convert Excel formula ' =COUNTIF($B$2:B2,[@[reg_no]]) ' to SQL

My Excel sheet has a Count column that counts how many times a registration number has appeared so far, as you can see in the given picture.
Whenever I add a new record to my Excel table, this column looks up the sheet and counts how many records so far have the same reg_no.
Let us take an example:
If we add a new record at the 17th id with
Reg_no = 3591
Name = 'dani'
grade = 'A'
Count ?
Now it will be Count = 4
I want to convert this table into a SQL query, and I am having trouble working out how to calculate this Count column in SQL.
Does anyone know? Please help.
Step 1: create a temp table with an empty column
SELECT *, null as desired_column
into #yourTable_t1
FROM #yourTable j;
Step 2: create a cursor to calculate your desired_column and update the temp table
begin
    declare @row int, @order int, @prod varchar(100), @prod_count int = 0;
    declare prod_cur cursor for
        SELECT row_num, MyColumn1, MyColumn2
        FROM #yourTable_t1;
    open prod_cur;
    fetch next from prod_cur into @row, @order, @prod;
    while (@@FETCH_STATUS = 0)
    begin
        -- count the rows up to and including this one that share the same value
        set @prod_count = (select count(MyColumn2) from #yourTable_t1
                           where MyColumn2 = @prod and ROW_NUM <= @row);
        update #yourTable_t1
        set desired_column = @prod_count
        where ROW_NUM = @row;
        fetch next from prod_cur into @row, @order, @prod;
    end;
    close prod_cur;
    deallocate prod_cur;
    --select * from #yourTable_t1 order by MyColumn2;
end;
Good Luck!
This can be done using window functions:
count(*) over (partition by reg_no order by id) as count
Online example
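For readers who want to see what the running count computes, here is a minimal Python sketch of the same logic as COUNTIF($B$2:B2, ...): for each row, count how many rows so far (including the current one) share its registration number. The function name and sample numbers are made up for illustration.

```python
from collections import Counter

def running_counts(reg_numbers):
    """Return, for each position, how often that reg_no has appeared so far."""
    seen = Counter()
    result = []
    for reg_no in reg_numbers:
        seen[reg_no] += 1
        result.append(seen[reg_no])
    return result

print(running_counts([3591, 1200, 3591, 3591, 1200]))  # [1, 1, 2, 3, 2]
```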

TSQL Divide Up a Table evenly based on a Sort

How can I split a table evenly based on a sort? Here is a mock up script of what I am talking about:
Edit: I want to split the table evenly by balance into 4 different groups. (or any number of groups). It's important so that each group has their fair share of high and low balances.
DECLARE @WorkList TABLE
(
    account_number VARCHAR(10),
    balance MONEY,
    assigned_to INT
)
DECLARE @Loop INT
DECLARE @TotalPartsToSplitEvenly INT
SET @TotalPartsToSplitEvenly = 4
SET @Loop = 1
WHILE @Loop < 50
BEGIN
    INSERT INTO @WorkList (account_number, balance, assigned_to)
    VALUES ((@Loop * 5) * 1234, @Loop * 1000, NULL)
    SET @Loop = @Loop + 1
END
SELECT *
FROM @WorkList
ORDER BY balance DESC
I want to split the result set evenly so that everyone gets their fair share of balance.
account_number balance assigned_to
-------------- --------------------- -----------
302330 49000.00 1
296160 48000.00 2
289990 47000.00 3
283820 46000.00 4
277650 45000.00 1
271480 44000.00 2
265310 43000.00 3
259140 42000.00 4
252970 41000.00 1
246800 40000.00 2
240630 39000.00 3
NTILE doesn't work for this. I am out of ideas.
You seem to want row_number() mod 4:
select wl.*,
(1 + (row_number() over (order by balance desc) - 1) % 4) as assigned_to
from @WorkList wl;
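The row_number() mod 4 trick amounts to dealing the sorted rows out like cards. A short Python sketch of that idea follows; `assign_round_robin` is a hypothetical helper name, not something from the question.

```python
def assign_round_robin(balances, parts):
    """Sort balances descending and deal them out to groups 1..parts in turn."""
    ordered = sorted(balances, reverse=True)
    # index plays the role of row_number() - 1
    return [(balance, 1 + index % parts) for index, balance in enumerate(ordered)]

for balance, assigned_to in assign_round_robin([45000, 49000, 47000, 46000, 48000], 4):
    print(balance, assigned_to)
# 49000 1
# 48000 2
# 47000 3
# 46000 4
# 45000 1
```

Note that group 1 always receives the largest balance in each round, so the totals are close but not exactly equal.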

Repeat query if no results came up

Could someone please advise on how to repeat the query if it returned no results. I am trying to generate a random person out of the DB using RAND, but only if that number was not used previously (that info is stored in the column "allready_drawn").
At this point when the query comes over the number that was drawn before, because of the second condition "is null" it does not display a result.
I would need for query to re-run once again until it comes up with a number.
DECLARE @min INTEGER;
DECLARE @max INTEGER;
set @min = (select top 1 id from [dbo].[persons] where sector = 8 order by id ASC);
set @max = (select top 1 id from [dbo].[persons] where sector = 8 order by id DESC);
select
    ordial,
    name_surname
from [dbo].[persons]
where id = ROUND(((@max - @min) * RAND() + @min), 0) and allready_drawn is NULL
The results (two possible outcomes):
Any suggestion is appreciated and I would like to thank everyone in advance.
Just try this to remove the "id" filter so you only have to run it once
select TOP 1
ordial,
name_surname
from [dbo].[persons]
where allready_drawn is NULL
ORDER BY NEWID()
@gbn that's a correct solution, but it's possible it's too expensive. For very large tables with dense keys, randomly picking a key value between the min and max and re-picking until you find a match is also fair, and cheaper than sorting the whole table.
Also there's a bug in the original post: the min and max rows will be selected only half as often as the others, because each maps to a smaller interval. To fix this, generate a random number from @min to @max + 1 and truncate rather than round. That way you map the interval [N, N+1) to N, ensuring a fair chance for each N.
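The rounding bias can be checked without any randomness by feeding evenly spaced values of u in [0, 1) through both formulas and counting the results. This Python sketch is only an illustration: the function names are invented, and `math.floor(x + 0.5)` stands in for SQL's round-half-up behaviour on positive numbers.

```python
import math
from collections import Counter

def pick_round(lo, hi, u):
    """Original formula: ROUND((max - min) * RAND() + min, 0)."""
    return math.floor((hi - lo) * u + lo + 0.5)

def pick_truncate(lo, hi, u):
    """Fixed formula: truncate a value drawn from [min, max + 1)."""
    return math.floor((hi + 1 - lo) * u + lo)

def spread(pick, lo, hi, steps=12000):
    """Count how often each id is chosen for evenly spaced u in [0, 1)."""
    return Counter(pick(lo, hi, (k + 0.5) / steps) for k in range(steps))

print(sorted(spread(pick_round, 1, 4).items()))
# [(1, 2000), (2, 4000), (3, 4000), (4, 2000)]
print(sorted(spread(pick_truncate, 1, 4).items()))
# [(1, 3000), (2, 3000), (3, 3000), (4, 3000)]
```

With rounding, ids 1 and 4 each receive half the share of ids 2 and 3; with truncation every id gets the same share.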
For this selection method, here's how to repeat until you find a match.
--drop table persons
go
create table persons(id int, ordial int, name_surname varchar(2000), sector int, allready_drawn bit)
insert into persons(id,ordial,name_surname,sector, allready_drawn)
values (1,1,'foo',8,null),(2,2,'foo2',8,null),(100,100,'foo100',8,null)
go
declare @min int = (select top 1 id from [dbo].[persons] where sector = 8 order by id ASC);
declare @max int = 1 + (select top 1 id from [dbo].[persons] where sector = 8 order by id DESC);
set nocount on
declare @results table(ordial int, name_surname varchar(2000))
declare @i int = 0
declare @selected bit = 0
while @selected = 0
begin
    set @i += 1
    insert into @results(ordial, name_surname)
    select
        ordial,
        name_surname
    from [dbo].[persons]
    where id = ROUND(((@max - @min) * RAND() + @min), 0, 1) and allready_drawn is NULL
    if @@ROWCOUNT > 0
    begin
        select *, @i tries from @results
        set @selected = 1
    end
end
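The retry loop can also be sketched in Python to show its shape. This is a hedged illustration with made-up names (`draw_until_match`, the `people` dictionary); like the T-SQL loop, it assumes at least one id in the range is still undrawn, otherwise it never terminates.

```python
import random

def draw_until_match(people, lo, hi, rng=random):
    """people maps id -> already_drawn flag; returns (chosen id, number of tries)."""
    tries = 0
    while True:
        tries += 1
        candidate = rng.randrange(lo, hi + 1)  # uniform over lo..hi inclusive
        # A miss is either a gap in the ids or a person who was already drawn.
        if candidate in people and not people[candidate]:
            return candidate, tries

people = {1: False, 2: True, 100: False}  # sparse ids, id 2 already drawn
chosen, tries = draw_until_match(people, 1, 100, random.Random(42))
print(chosen, tries)
```

With sparse ids like these, most picks miss, which is why the comment above recommends this approach only for dense keys.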

SQL: how to get random number of rows from one table for each row in another

I have two tables where the data is not related.
For each row in table A I want, e.g., 3 random rows from table B.
This is fairly easy using a cursor, but it is awfully slow.
So how can I express this in a single statement to avoid RBAR?
To get a random number between 0 and (N-1), you can use.
abs(checksum(newid())) % N
Which means to get positive values 1-N, you use
1 + abs(checksum(newid())) % N
Note: RAND() doesn't work here. It is evaluated once per query, so you get stuck with the same value for all rows of tableA.
The query:
SELECT *
FROM tableA A
JOIN (select *, rn=row_number() over (order by newid())
from tableB) B ON B.rn <= 1 + abs(checksum(newid())) % 9
(assuming you wanted up to 9 random rows of B per A)
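As a plain illustration of the intended result (a random number of random B rows for each A row), here is a Python sketch. It is not equivalent to the query: it draws a fresh sample of B for every A row, and the name `random_rows_per_row` and the sample data are assumptions made for this example.

```python
import random

def random_rows_per_row(table_a, table_b, max_rows, rng=random):
    """Pair each row of table_a with 1..max_rows distinct random rows of table_b."""
    pairs = []
    for a in table_a:
        n = 1 + rng.randrange(max_rows)  # plays the role of 1 + abs(checksum(newid())) % N
        for b in rng.sample(table_b, min(n, len(table_b))):
            pairs.append((a, b))
    return pairs

pairs = random_rows_per_row(["a1", "a2"], list(range(1, 100)), 9, random.Random(1))
for a, b in pairs:
    print(a, b)
```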
assuming tableB has integer surrogate key, try
Declare @maxRecs integer = 11 -- Maximum number of b records per a record
Select a.*, b.*
From tableA a Join tableB b
On b.PKColumn % (floor(Rand() * @maxRecs)) = 0
If you have a fixed number that you know in advance (such as 3), then:
select a.*, b.*
from a cross join
(select top 3 * from b) b
If you want a random number of rows from "b" for each row in "a", the problem is a bit harder in SQL Server.
Here's an example of how this could be done. The code is self-contained: copy it and press F5 ;)
-- create two tables we can join
DECLARE @datatable TABLE(ID INT)
DECLARE @randomtable TABLE(ID INT)
-- add some dummy data
DECLARE @i INT = 1
WHILE(@i < 3) BEGIN
    INSERT INTO @datatable (ID) VALUES (@i)
    SET @i = @i + 1
END
SET @i = 1
WHILE(@i < 100) BEGIN
    INSERT INTO @randomtable (ID) VALUES (@i)
    SET @i = @i + 1
END
--The key here being the ORDER BY newid() which makes sure that
--the TOP 3 is different every time
SELECT
    d.ID AS DataID
    ,rtable.ID RandomRow
FROM @datatable d
LEFT JOIN (SELECT TOP 3 * FROM @randomtable ORDER BY newid()) as rtable ON 1 = 1
Here's an example of the output