Random numbers from database [duplicate] - sql

This question already has answers here:
Closed 14 years ago.
EDIT: Duplicate of How do I return random numbers as a column in SQL Server 2005?
Hello
How can I generate random numbers in mssql server 2005.
for example: when I select 500rows from table, each row must have one generated random number, numbers must be generated at runtime
UPDATE: I need the fastest method, generate numbers from large tables, datepart, same magic computations is realatively slow with big amount of rows
UPDATE thanks for answers, but o finally used this solution
SELECT ABS(CHECKSUM(NewId())) % 10 - 5 AS Random
for random numbers from -5 to 5 and bonus is approximately the same number of occurances for each number

SELECT TOP 500
CONVERT(INT, CONVERT(VARBINARY(16), NEWID()))
FROM
dbo.MyTable

I love this trick, its awesome ...
-- 500 random numbers from 0 to 499
select top 500 ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 500 from sysobjects

easy google: 1 2 ...
Returning Random Numbers in a SELECT statement
As it is implemented, the RAND() function in SQL Server doesn't let you return a different random number per row in your SELECT statement. For example, if you execute this:
SELECT Rand() as RandomNumber, *
FROM Northwind..Customers
You will see the same number over and over. However, sometimes, you may want to return a randomly generated number per line in your SELECT. Here's one way to do that, in SQL Server 2000, using a UDF.
First, we need to create a View that returns a single random number:
CREATE VIEW vRandNumber
AS
SELECT RAND() as RandNumber
The view is necessary because normally in a UDF we cannot use the rand() function, because that would make the function non-determistic. We can trick the UDF to accepting a random number by using a View.
Once that is set up, we can then create our function:
CREATE FUNCTION RandNumber()
RETURNS float
AS
BEGIN
RETURN (SELECT RandNumber FROM vRandNumber)
END
Finally, you can use this function in any SELECT to now return a random number between 0 and 1 per row:
SELECT dbo.RandNumber(), *
FROM Northwind..Customers
You can get even fancier in your RandNumber function by accepting a seed if you like, or allowing for range parameters like this:
CREATE FUNCTION RandNumber2(#Min int, #Max int)
RETURNS float
AS
BEGIN
RETURN #Min + (select RandNumber from RetRandNumber) * (#Max-#Min)
END

It might be helpful to know why you want the random number. For example if you just want to sort the results randomly you can do ORDER BY NEWID() and not worry about generating the random number at all. If you really need the random number later, use #Tim quoted solution.

Related

How to permutate an SQL table using a seed?

Background
I have a front-end with a list of items with infinite scrolling, and I fetch pages of items by specifying the page limit and offset.
Problem
Apart from simply ordering the result by some of the columns, I would like to add a "random" option. The thing is, I don't want repetitions, so I need to have the entire dataset permutated before doing the limit and offset, and I need to be able to get the same permutation as long as I supply the same seed.
What I tried
A naive approach was to write a table-valued function that takes an int seed and uses it in the ORDER BY clause like so:
SELECT *
FROM dbo.Entities e
ORDER BY HASHBYTES('MD2', e.Title) ^ #seed
OFFSET 0 ROWS
FETCH NEXT (SELECT COUNT(*) FROM dbo.Entities) ROWS ONLY
This seemed to work well at a first glance, but it turned out it's not very "volatile" for the lack of better word - it becomes more visible with sparse result sets, where most seeds (chosen randomly from between 0 and 2147483647) yield the same order.
I thought I would get better results by hashing the seed as well, but SQL Server doesn't allow me to XOR two varbinary variables. Am I even looking in the right direction? Are there any performance considerations that I should be making and I might not be aware of?
The best way is to create a tally table with two columns: first a sequential integer, (between 1 and 1,000,000), second a random integer number. Then generate a random number to get the first value and then make a join with a computed ROW_NUMBER().
CREATE TABLE T_NUM (SEQUENTIAL INT, RANDOM INT);
GO
WITH
N AS(SELECT 0 AS I
UNION ALL
SELECT I + 1
FROM N
WHERE I < 9)
INSERT INTO T_NUM (SEQUENTIAL)
SELECT N1.I + N2.I * 10 + N3.I * 100 + N4.I * 1000 + N5.I * 10000 + N6.I * 100000
FROM N AS N1
CROSS JOIN N AS N2
CROSS JOIN N AS N3
CROSS JOIN N AS N4
CROSS JOIN N AS N5
CROSS JOIN N AS N6;
GO
WITH T AS
(
SELECT SEQUENTIAL, ROW_NUMBER() OVER (ORDER BY CHECKSUM(NEWID())) AS ALEA
FROM T_NUM
)
UPDATE N
SET RANDOM = ALEA
FROM T_NUM AS N
JOIN T ON T.SEQUENTIAL = N.SEQUENTIAL;
GO
DECLARE #SEED INT = FLOOR(1 + RAND() * 1000000);
Now you have your seed to enter in the alea sequence then join your table on sequential order
ORDER BY HASHBYTES('MD2', e.Title + convert(nvarchar(max), #seed))
should work, but performance-wise it would be a disaster. You would calculate MD2 for all records every time. I would not do this on server side at all. You can generate random sequence on client and then just pick from server rows with row number 158, 7, 1027 and 9. But it has still two problems
if item is deleted, row number of all consecutive records shifts. It would just break the whole sequence and you would get duplicities and missing records
row number over millions of records is not that fast either
I see two options. You can query all ids from the table and use them for generating of random order. But that would be a lot of numbers. Or you have to ensure the id space is dense enough. Then you can query 20 random ids and hope at least 10 of them exist. If you are unlucky, you would have to query again.

How to select a repeatable random number with setseed in postgres sql?

What I am trying to achieve is selecting a control group for a process. To do this I am using random(), and for debugging / consistency I would like to be able to set the random number in a repeatable fashion. Meaning, I run the query once it assigns user 123 random number .001. At a different time I delete the previous data, I call the same query, and once again user 123 is assigned random number .001.
I have tried:
SELECT setseed(0);
SELECT 1, random() from generate_series(1,10);
I receive a different random number with every run.
SELECT 1, setseed(0), random() from generate_series(1,10);
Each row receives the same random number, which is useless.
I'm sure there is something I don't understand here. Any help is appreciated.
Do a union all of the setseed() query with the desired query. It is necessary to match the column types from both queries. setseed() returns void.
select setseed(0), null
union all
select null, random()
from generate_series(1, 10)
offset 1
;
The offset 1 clause eliminates the setseed() row from the result set

How can I find a find a match in a list of values, rounded down to the nearest match, in SQL Server

I'm aware of the ABS and Round functions in SQL Server but I believe my problem is a little different and I'm not sure how to use these to achieve the desired result.
Say I have a number: 8000
And I have a query that returns this list of numbers: 0, 5000, 10000, 15000
If I use the ABS function with this list e.g
DECLARE #target as INT
SET #target = 8000
SELECT TOP(1) #result AS Number
FROM dbo.Numbers
ORDER BY ABS(Number - #target)
I get 10000
Which is expected
But how can I make this return 5000 i.e. I always get the result rounded down?
Understanding your question to mean you want the nearest match without going over:
Add a WHERE Number <= #target.
Try this:
...
WHERE Number <= #target
ORDER BY Number DESC
We simply get the first number (ORDER BY Number DESC) not greater than the #target (WHERE Number <= #target).
The nice thing about losing ABS in ORDER BY this is that the DBMS should now be able to utilize an index on Number (if there is one).

Create a view on an integer?

I have a table named A. it has only one record with one field. It is an integer named number.
I want to create a view that have A.number records, each are one of the numbers less than A.number.
For example:
select A.number -----> 5
the view should show 5 records 0 1 2 3 4
P.S: This is a real problem that I simplified it a lot. The real problem is like dividing a budget in a fixed period to each day.
This sounds a bit like it might be homework, so I'm wary of providing the code outright.
I can give a pointer for how to solve the question, though. You use a recursive CTE where each iteration adds one to the previous iteration. Just be sure to set the MAXRECURSION option if you'll be checking numbers > 101. You can use a scalar sub query to key the view to the original table:
WITH numbers ( n ) AS (
SELECT 0 UNION ALL
SELECT 1 + n FROM numbers WHERE n < (select number from a) -1)
SELECT n FROM numbers
OPTION ( MAXRECURSION 500) --example
If the number of your table will be < 2048 and you are on SQL Server, this will work for you:
CREATE VIEW MyView AS
SELECT number
FROM master..spt_values
WHERE type = 'p'
AND number < (SELECT value FROM yourTable)
Alternatively you could consider creating your own Numbers table with an appropriate size to suit your application if you require a higher limit, or are not on SQL Server that has this provided to you. Here is a link to a blog post on the idea of having a "Numbers table" handy.

Big performance difference (1hr to 1 minute ) found in SQL. Can you explain why?

The following queries are taking 70 minutes and 1 minute respectively on a standard machine for 1 million records. What could be the possible reasons?
Query [01:10:00]
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
ELSE sys.fn_cdc_increment_lsn(0x00) END
, sys.fn_cdc_get_max_lsn()
, 'all with mask')
WHERE __$operation <> 1
Modified Query [00:01:10]
DECLARE #MinLSN binary(10)
DECLARE #MaxLSN binary(10)
SELECT #MaxLSN= sys.fn_cdc_get_max_lsn()
SELECT #MinLSN=CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
ELSE sys.fn_cdc_increment_lsn(0x00) END
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
#MinLSN, #MaxLSN, 'all with mask') WHERE __$operation <> 1
[Modified]
I tried to recreate the scenario with a similar function to see if the parameters are evaluated for each row.
CREATE FUNCTION Fn_Test(#a decimal)RETURNS TABLE
AS
RETURN
(
SELECT #a Parameter, Getdate() Dt, PartitionTest.*
FROM PartitionTest
);
SELECT * FROM Fn_Test(RAND(DATEPART(s,GETDATE())))
But I am getting the same value for the column 'Parameter' for a a million records processed in 38 seconds.
In your first query, your fn_cdc_increment_lsn and fn_cdc_get_min_lsn get executed for every row. In second example, just once.
Even deterministic scalar functions are evaluated at least once per row. If the same deterministic scalar function occurs multiple times on the same "row" with the same parameters, I believe only then will it be evaluated once - e.g. in a CASE WHEN fn_X(a, b, c) > 0 THEN fn_X(a, b, c) ELSE 0 END or something like that.
I think your RAND problem is because you continue to reseed:
Repetitive calls of RAND() with the
same seed value return the same
results.
For one connection, if RAND() is
called with a specified seed value,
all subsequent calls of RAND() produce
results based on the seeded RAND()
call. For example, the following query
will always return the same sequence
of numbers.
I have taken to caching scalar function results as you have indicated - even going so far as to precalculate tables of scalar function results and joining to them. Something has to be done eventually to make scalar functions perform. Right not, the best option is the CLR - apparently these far outperform SQL UDFs. Unfortunately, I cannot use them in my current environment.