Select random row with different probability - SQL - sql

I want select random row with different probability column based:
ID Type
Bike 1
Moto 1
Asus 2
Car 1
Apple 2
John 3
If i do this i will have random probability:
select top 1 * from Items order by newid()
I want John (type 3) has 70% probability to get, and 5% for type 1 and 25% for type 2.

I would use the RAND() function instead of NEWID().
Using RAND(), we can generate a random number between 1 and 100, and then use a CASE statement to select a type based on the number randomly generated.
According to MSDN:
RAND() returns a pseudo-random float value from 0 through 1,
exclusive
Meaning that multiplying RAND() by 100 will give us a number from 0 to 99. Adding 1 changes the range to 1 to 100.
If after selecting which type to return, you want to randomly select a record from that type, you can then add a SELECT TOP 1... ORDER BY NEWID() to get a random record of that type:
DECLARE #Random INT
SET #Random = (RAND() * 100) + 1
SELECT TOP 1 ID, Type
FROM Items
WHERE Type = CASE
WHEN #Random > 30 THEN 3
WHEN #Random BETWEEN 6 AND 30 THEN 2
ELSE 1
END
ORDER BY NEWID()
See it here... run it a few times to see that the results match the probabilities.

You mean 5% probability for entire type=1 group, or you want every record of type=1 to have 5% probability of being selected?
If it's second option, then you have 70+15+50=135 = no way you can do this.
If it's first option, then you'll have to make 2 draws - first for a type, and then for a row in this type.

Related

SAP HANA random number generator

I want to write a statement that generates a random pick from a set of four values (2,4,6, and 8). Below is the select statement that I have so far
SELECT
CASE
WHEN RN_GENERATOR.RANDOM_NUMBER BETWEEN 0 AND 2.00 THEN 2
WHEN RN_GENERATOR.RANDOM_NUMBER BETWEEN 2.01 AND 4.00 THEN 4
WHEN RN_GENERATOR.RANDOM_NUMBER BETWEEN 4.01 AND 6.00 THEN 6
WHEN RN_GENERATOR.RANDOM_NUMBER BETWEEN 6.01 AND 8.00 THEN 8
END AS ORDER_FREQUENCY
FROM (SELECT ROUND(RAND()*8,2) AS RANDOM_NUMBER FROM DUMMY) RN_GENERATOR
is there a more intelligent way of doing this?
Looks to me as if your requirement can be fulfilled but his statement
select
ROUND(rand()*4, 0, ROUND_CEILING) * 2 as ORDER_FREQUENCY
from dummy;
RAND() * 4 spreads the value range of possible outcomes for the RAND() function from 0..1 to 0..4.
ROUND( ... , 0, ROUND_CEILING) rounds the number to the next larger or equal integer and leaves no decimal places. For this example this means that the output of this rounding can only be 1, 2, 3 or 4.
*2 simply maps the four possible values to your target number range 2, 4, 6, 8. If the multiplication wouldn't suffice, you could also use the MAP() function for this.
And that's it. Random numbers picked from the set of (2, 4, 6, 8).
You can randomly sort a data set by using
ORDER BY Rand()
and select first one as your random value
Here is an example
select top 1 rownum
from Numbers_Table(10) as nt
where rownum in (2,4,6,8)
order by rand();
Numbers_Table function returns a numbers table on HANA database and I filter only the values that you want to see as your possible random values using WHERE clause
TOP 1 clause in SELECT command returns the first integer which is randomly ordered
I hope it helps

To Generate Random Numbers in Ms Sql server [duplicate]

I need a different random number for each row in my table. The following seemingly obvious code uses the same random value for each row.
SELECT table_name, RAND() magic_number
FROM information_schema.tables
I'd like to get an INT or a FLOAT out of this. The rest of the story is I'm going to use this random number to create a random date offset from a known date, e.g. 1-14 days offset from a start date.
This is for Microsoft SQL Server 2000.
Take a look at SQL Server - Set based random numbers which has a very detailed explanation.
To summarize, the following code generates a random number between 0 and 13 inclusive with a uniform distribution:
ABS(CHECKSUM(NewId())) % 14
To change your range, just change the number at the end of the expression. Be extra careful if you need a range that includes both positive and negative numbers. If you do it wrong, it's possible to double-count the number 0.
A small warning for the math nuts in the room: there is a very slight bias in this code. CHECKSUM() results in numbers that are uniform across the entire range of the sql Int datatype, or at least as near so as my (the editor) testing can show. However, there will be some bias when CHECKSUM() produces a number at the very top end of that range. Any time you get a number between the maximum possible integer and the last exact multiple of the size of your desired range (14 in this case) before that maximum integer, those results are favored over the remaining portion of your range that cannot be produced from that last multiple of 14.
As an example, imagine the entire range of the Int type is only 19. 19 is the largest possible integer you can hold. When CHECKSUM() results in 14-19, these correspond to results 0-5. Those numbers would be heavily favored over 6-13, because CHECKSUM() is twice as likely to generate them. It's easier to demonstrate this visually. Below is the entire possible set of results for our imaginary integer range:
Checksum Integer: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Range Result: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 0 1 2 3 4 5
You can see here that there are more chances to produce some numbers than others: bias. Thankfully, the actual range of the Int type is much larger... so much so that in most cases the bias is nearly undetectable. However, it is something to be aware of if you ever find yourself doing this for serious security code.
When called multiple times in a single batch, rand() returns the same number.
I'd suggest using convert(varbinary,newid()) as the seed argument:
SELECT table_name, 1.0 + floor(14 * RAND(convert(varbinary, newid()))) magic_number
FROM information_schema.tables
newid() is guaranteed to return a different value each time it's called, even within the same batch, so using it as a seed will prompt rand() to give a different value each time.
Edited to get a random whole number from 1 to 14.
RAND(CHECKSUM(NEWID()))
The above will generate a (pseudo-) random number between 0 and 1, exclusive. If used in a select, because the seed value changes for each row, it will generate a new random number for each row (it is not guaranteed to generate a unique number per row however).
Example when combined with an upper limit of 10 (produces numbers 1 - 10):
CAST(RAND(CHECKSUM(NEWID())) * 10 as INT) + 1
Transact-SQL Documentation:
CAST(): https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql
RAND(): http://msdn.microsoft.com/en-us/library/ms177610.aspx
CHECKSUM(): http://msdn.microsoft.com/en-us/library/ms189788.aspx
NEWID(): https://learn.microsoft.com/en-us/sql/t-sql/functions/newid-transact-sql
Random number generation between 1000 and 9999 inclusive:
FLOOR(RAND(CHECKSUM(NEWID()))*(9999-1000+1)+1000)
"+1" - to include upper bound values(9999 for previous example)
Answering the old question, but this answer has not been provided previously, and hopefully this will be useful for someone finding this results through a search engine.
With SQL Server 2008, a new function has been introduced, CRYPT_GEN_RANDOM(8), which uses CryptoAPI to produce a cryptographically strong random number, returned as VARBINARY(8000). Here's the documentation page: https://learn.microsoft.com/en-us/sql/t-sql/functions/crypt-gen-random-transact-sql
So to get a random number, you can simply call the function and cast it to the necessary type:
select CAST(CRYPT_GEN_RANDOM(8) AS bigint)
or to get a float between -1 and +1, you could do something like this:
select CAST(CRYPT_GEN_RANDOM(8) AS bigint) % 1000000000 / 1000000000.0
The Rand() function will generate the same random number, if used in a table SELECT query. Same applies if you use a seed to the Rand function. An alternative way to do it, is using this:
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS [RandomNumber]
Got the information from here, which explains the problem very well.
Do you have an integer value in each row that you could pass as a seed to the RAND function?
To get an integer between 1 and 14 I believe this would work:
FLOOR( RAND(<yourseed>) * 14) + 1
If you need to preserve your seed so that it generates the "same" random data every time, you can do the following:
1. Create a view that returns select rand()
if object_id('cr_sample_randView') is not null
begin
drop view cr_sample_randView
end
go
create view cr_sample_randView
as
select rand() as random_number
go
2. Create a UDF that selects the value from the view.
if object_id('cr_sample_fnPerRowRand') is not null
begin
drop function cr_sample_fnPerRowRand
end
go
create function cr_sample_fnPerRowRand()
returns float
as
begin
declare #returnValue float
select #returnValue = random_number from cr_sample_randView
return #returnValue
end
go
3. Before selecting your data, seed the rand() function, and then use the UDF in your select statement.
select rand(200); -- see the rand() function
with cte(id) as
(select row_number() over(order by object_id) from sys.all_objects)
select
id,
dbo.cr_sample_fnPerRowRand()
from cte
where id <= 1000 -- limit the results to 1000 random numbers
select round(rand(checksum(newid()))*(10)+20,2)
Here the random number will come in between 20 and 30.
round will give two decimal place maximum.
If you want negative numbers you can do it with
select round(rand(checksum(newid()))*(10)-60,2)
Then the min value will be -60 and max will be -50.
try using a seed value in the RAND(seedInt). RAND() will only execute once per statement that is why you see the same number each time.
If you don't need it to be an integer, but any random unique identifier, you can use newid()
SELECT table_name, newid() magic_number
FROM information_schema.tables
You would need to call RAND() for each row. Here is a good example
https://web.archive.org/web/20090216200320/http://dotnet.org.za/calmyourself/archive/2007/04/13/sql-rand-trap-same-value-per-row.aspx
The problem I sometimes have with the selected "Answer" is that the distribution isn't always even. If you need a very even distribution of random 1 - 14 among lots of rows, you can do something like this (my database has 511 tables, so this works. If you have less rows than you do random number span, this does not work well):
SELECT table_name, ntile(14) over(order by newId()) randomNumber
FROM information_schema.tables
This kind of does the opposite of normal random solutions in the sense that it keeps the numbers sequenced and randomizes the other column.
Remember, I have 511 tables in my database (which is pertinent only b/c we're selecting from the information_schema). If I take the previous query and put it into a temp table #X, and then run this query on the resulting data:
select randomNumber, count(*) ct from #X
group by randomNumber
I get this result, showing me that my random number is VERY evenly distributed among the many rows:
It's as easy as:
DECLARE #rv FLOAT;
SELECT #rv = rand();
And this will put a random number between 0-99 into a table:
CREATE TABLE R
(
Number int
)
DECLARE #rv FLOAT;
SELECT #rv = rand();
INSERT INTO dbo.R
(Number)
values((#rv * 100));
SELECT * FROM R
select ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) as [Randomizer]
has always worked for me
Use newid()
select newid()
or possibly this
select binary_checksum(newid())
If you want to generate a random number between 1 and 14 inclusive.
SELECT CONVERT(int, RAND() * (14 - 1) + 1)
OR
SELECT ABS(CHECKSUM(NewId())) % (14 -1) + 1
DROP VIEW IF EXISTS vwGetNewNumber;
GO
Create View vwGetNewNumber
as
Select CAST(RAND(CHECKSUM(NEWID())) * 62 as INT) + 1 as NextID,
'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'as alpha_num;
---------------CTDE_GENERATE_PUBLIC_KEY -----------------
DROP FUNCTION IF EXISTS CTDE_GENERATE_PUBLIC_KEY;
GO
create function CTDE_GENERATE_PUBLIC_KEY()
RETURNS NVARCHAR(32)
AS
BEGIN
DECLARE #private_key NVARCHAR(32);
set #private_key = dbo.CTDE_GENERATE_32_BIT_KEY();
return #private_key;
END;
go
---------------CTDE_GENERATE_32_BIT_KEY -----------------
DROP FUNCTION IF EXISTS CTDE_GENERATE_32_BIT_KEY;
GO
CREATE function CTDE_GENERATE_32_BIT_KEY()
RETURNS NVARCHAR(32)
AS
BEGIN
DECLARE #public_key NVARCHAR(32);
DECLARE #alpha_num NVARCHAR(62);
DECLARE #start_index INT = 0;
DECLARE #i INT = 0;
select top 1 #alpha_num = alpha_num from vwGetNewNumber;
WHILE #i < 32
BEGIN
select top 1 #start_index = NextID from vwGetNewNumber;
set #public_key = concat (substring(#alpha_num,#start_index,1),#public_key);
set #i = #i + 1;
END;
return #public_key;
END;
select dbo.CTDE_GENERATE_PUBLIC_KEY() public_key;
Update my_table set my_field = CEILING((RAND(CAST(NEWID() AS varbinary)) * 10))
Number between 1 and 10.
Try this:
SELECT RAND(convert(varbinary, newid()))*(b-a)+a magic_number
Where a is the lower number and b is the upper number
If you need a specific number of random number you can use recursive CTE:
;WITH A AS (
SELECT 1 X, RAND() R
UNION ALL
SELECT X + 1, RAND(R*100000) --Change the seed
FROM A
WHERE X < 1000 --How many random numbers you need
)
SELECT
X
, RAND_BETWEEN_1_AND_14 = FLOOR(R * 14 + 1)
FROM A
OPTION (MAXRECURSION 0) --If you need more than 100 numbers

SQL field constraint that forces a column to be ascending

I am working on a small table that has a user input with a number field. The number that the user inputs has to be larger by a few points than the current highest number. Can I also check that the score has to be for instance 1 higher if the current highest score is < 10 but 5 higher if the current highest 10 <= score < 100?
for instance:
user score
1 1
1 2
1 4
1 5
1 7
Now, I want a constraint that will check on insert that the inserted score is bigger than the current highest score by x amount.
Is such a constraint possible?
Such a constraint is difficult to implement. If you care about performance, can you simply input the difference?
1 1
1 1
1 2
1 1
1 2
If you do the data this way, then you can use check (score > 0) and then use sum(score) over (order by ??), where ?? specifies the ordering of the rows.
Otherwise, you'll need to use either a trigger or user-defined function to implement the constraint.

Updating values with conditions

Let's assume I have a table like that
END
id rank degree
1 4 3
2 3 3
**rank 4 has 4 degrees.
**rank 3 has 4 degrees.
And we will assume that each rank has some number of degrees. I will assume that rank 3 has 4 degrees.
And I want that, when I increase a degree that is the maximum for the current rank, the rank increases by 1 and the degree resets back to 1. For example, I want to increase the degree of the id 2 in the above table by 1. As a result, the rank should be 4 and the degree should be 1.
How can I make that update efficiently in SQL Server 2008?
I assume you want to update all rows with the same rank of the one with id=3.
UPDATE t SET t.rank = t.rank + 1, t.degree = 1
FROM tableName t
WHERE rank = (SELECT rank FROM tableName t2 WHERE id=#id)
Is this what you want?
update t
set degree = (case when degree = 5 then 1 else degree + 1 end),
rank = (case when degree = 5 then rank + 1 else rank end)
where id = 3;
If there is a reference table containing information about the maximum degree for every supported rank (let's call it dbo.ranks), something like this:
rank maxdegree
---- ---------
1 3
2 4
3 5
4 4
... ...
where maxdegree is assumed to be an integer greater than 0, then here's how you could use it:
UPDATE t
SET
t.rank += t.degree / r.maxdegree,
t.degree += t.degree % r.maxdegree + 1
FROM dbo.atable AS t
INNER JOIN dbo.ranks AS r
ON t.rank = r.rank
;
where dbo.atable is assumed to be the name of the table to update.
Note that this query is only intended for increasing by 1. If you want it to be able to increase degree by an arbitrary number, you will need to make more substantial changes than just replacing 1 in the + 1 bit.
Also, this query will not work correctly with some cases of invalid data in your table (like degrees greater than the corresponding maximums), so make sure you've removed any anomalies in your data before trying to use this.

Giving Range to the SQL Column

I have SQL table in which I have column and Probability . I want to select one row from it with randomly but I want to give more chances to the more waighted probability. I can do this by
Order By abs(checksum(newid()))
But the difference between Probabilities are too much so it gives more chance to highest probability.Like After picking 74 times that value it pick up another value for once than again around 74 times.I want to reduce this .Like I want 3-4 times to it and than others and all. I am thinking to give Range to the Probabilies.Its Like
Row[i] = Row[i-1]+Row[i]
How can I do this .Do I need to create function?Is there any there any other way to achieve this.I am neewby.Any help will be appriciated.Thank You
EDIT:
I have solution of my problem . I have one question .
if I have table as follows.
Column1 Column2
1 50
2 30
3 20
can i get?
Column1 Column2 Column3
1 50 50
2 30 80
3 20 100
Each time I want to add value with existing one.Is there any Way?
UPDATE:
Finally get the solution after 3 hours,I just take square root of my probailities that way I can narrow the difference bw them .It is like I add column with
sqrt(sqrt(sqrt(Probability)))....:-)
I'd handle it by something like
ORDER BY rand()*pow(<probability-field-name>,<n>)
for different values of n you will distort the linear probabilities into a simple polynomial. Small values of n (e.g. 0.5) will compress the probabilities to 1 and thus make less probable choices more probable, big values of n (e.g. 2) will do the opposite and further reduce probability of already inprobable values.
Since the difference in probabilities is too great, you need to add a computed field with a revised weighting that has a more even probability distribution. How you do that depends on your data and preferred distribution. One way to do it is to "normalize" the weighting to an integer between 1 and 10 so that the lowest probability is never more than ten times smaller than the highest.
Answer to your recent question:
SELECT t.Column1,
t.Column2,
(SELECT SUM(Column2)
FROM table t2
WHERE t2.Column1 <= t.Column1) Column3
FROM table t
Here is a basic example how to select one row from the table with taking into account the assigned row weights.
Suppose we have table:
CREATE TABLE TableWithWeights(
Id int NOT NULL PRIMARY KEY,
DataColumn nvarchar(50) NOT NULL,
Weight decimal(18, 6) NOT NULL -- Weight column
)
Let's fill table with sample data.
INSERT INTO TableWithWeights VALUES(1, 'Frequent', 50)
INSERT INTO TableWithWeights VALUES(2, 'Common', 30)
INSERT INTO TableWithWeights VALUES(3, 'Rare', 20)
This is the query that returns one random row with taking into account given row weights.
SELECT * FROM
(SELECT tww1.*, -- Select original table data
-- Add column with the sum of all weights of previous rows
(SELECT SUM(tww2.Weight)- tww1.Weight
FROM TableWithWeights tww2
WHERE tww2.id <= tww1.id) as SumOfWeightsOfPreviousRows
FROM TableWithWeights tww1) as tww,
-- Add column with random number within the range [0, SumOfWeights)
(SELECT RAND()* sum(weight) as rnd
FROM TableWithWeights) r
WHERE
(tww.SumOfWeightsOfPreviousRows <= r.rnd)
and ( r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight)
To check query results we can run it for 100 times.
DECLARE #count as int;
SET #count = 0;
WHILE ( #count < 100)
BEGIN
-- This is the query that returns one random row with
-- taking into account given row weights
SELECT * FROM
(SELECT tww1.*, -- Select original table data
-- Add column with the sum of all weights of previous rows
(SELECT SUM(tww2.Weight)- tww1.Weight
FROM TableWithWeights tww2
WHERE tww2.id <= tww1.id) as SumOfWeightsOfPreviousRows
FROM TableWithWeights tww1) as tww,
-- Add column with random number within the range [0, SumOfWeights)
(SELECT RAND()* sum(weight) as rnd
FROM TableWithWeights) r
WHERE
(tww.SumOfWeightsOfPreviousRows <= r.rnd)
and ( r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight)
-- Increase counter
SET #count += 1
END
PS The query was tested on SQL Server 2008 R2. And of course the query can be optimized (it's easy to do if you get the idea)