Using:
value = arc4random() % x
How can I avoid or eliminate modulo bias?
At least according to Wikipedia, modulo bias is an issue when programming games of chance.
Use arc4random_uniform(x). This does it for you.
According to the man page:
arc4random_uniform() will return a uniformly distributed random number less than upper_bound. arc4random_uniform() is recommended over constructions like arc4random() % upper_bound as it avoids "modulo bias" when the upper bound is not a power of two.
arc4random returns a 32-bit unsigned integer (0 to 2^32 - 1).
There will probably be no noticeable modulo bias for small enough x. However, if you want to be really sure, do this:

Let y = 2^p, where p is chosen so that 2^(p-1) < x ≤ 2^p, then:

    val = arc4random() % y;
    while (val >= x)
        val = arc4random() % y;
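The loop above can be made concrete with a small, portable sketch. `next_rand` below is a hypothetical deterministic stand-in for arc4random() (a simple LCG) so the example runs anywhere; on BSD/macOS you would call arc4random() itself:

```c
#include <stdint.h>

/* Hypothetical stand-in for arc4random(): a simple 32-bit LCG,
   used only to keep this sketch portable and deterministic. */
static uint32_t lcg_state = 12345u;
static uint32_t next_rand(void) {
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* Smallest power of two >= x (valid for 1 <= x <= 2^31). */
static uint32_t next_pow2(uint32_t x) {
    uint32_t y = 1;
    while (y < x)
        y <<= 1;
    return y;
}

/* Rejection sampling: reduce modulo a power of two y >= x (which is
   bias-free, since y divides 2^32), then retry until the value
   lands in [0, x). */
uint32_t uniform_pow2(uint32_t x) {
    uint32_t y = next_pow2(x);
    uint32_t val = next_rand() % y;
    while (val >= x)
        val = next_rand() % y;
    return val;
}
```

Each retry succeeds with probability greater than 1/2 (because y < 2x), so the expected number of draws is below 2.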
although for any x under a million or so, I wouldn't worry about it.
If the range of arc4random (2^32) is not an exact multiple of x, ignore any values greater than or equal to the largest multiple of x that fits in that range, calling arc4random again instead.
    u_int32_t maxValue = ~((u_int32_t) 0);        // equal to 0xffffffff
    maxValue -= maxValue % x;                     // make maxValue a multiple of x
    while ((value = arc4random()) >= maxValue) {  // loop until 0 <= value < maxValue
    }
    value %= x;
Somewhat pedantic objection to cobbal's answer. It "works", that is it removes the modulo bias, but it rejects more values than are necessary. The most extreme case is x = 2^31. All values of arc4random() should be accepted here but the code as written will reject half of them.
Instead, add 1 to the initialization of maxValue (that puts it at 2^32 so you'll have to use a 64 bit int), and then it's right. You can also avoid using a 64 bit int. Test beforehand if 2^32 % x == 0, if so all arc4random() values are acceptable and you can skip the loop, otherwise you can keep maxValue at 32 bits by subtracting 2^32 % x on initialization.
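The 32-bit variant described above can be sketched as follows. In uint32_t arithmetic, 0 - x wraps to 2^32 - x, so (0u - x) % x equals 2^32 mod x; rejecting that many values from the bottom leaves a contiguous range whose length is an exact multiple of x. `next_rand` is a hypothetical deterministic stand-in for arc4random() so the sketch is self-contained:

```c
#include <stdint.h>

/* Hypothetical stand-in for arc4random(), so this compiles anywhere. */
static uint32_t state = 42u;
static uint32_t next_rand(void) {
    state = state * 1664525u + 1013904223u;
    return state;
}

/* Unbiased value in [0, x) without 64-bit math.  (0u - x) % x computes
   2^32 mod x via unsigned wraparound; values below that threshold are
   rejected, so the accepted range [min, 2^32) has a length that is an
   exact multiple of x, and the final % x is bias-free. */
uint32_t uniform_below(uint32_t x) {
    uint32_t min = (0u - x) % x;      /* 2^32 mod x */
    uint32_t r = next_rand();
    while (r < min)
        r = next_rand();
    return r % x;
}
```

For a power-of-two x the threshold is zero, so every draw is accepted, which fixes the over-rejection discussed above.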
The code below has a space complexity of O(1). I know it has something to do with the call stack, but I am unable to visualize it correctly. If somebody could help me understand this a little more clearly, that would be great.
    int pairSumSequence(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += pairSum(i, i + 1);
        }
        return sum;
    }

    int pairSum(int a, int b) {
        return a + b;
    }
How much space does it need in relation to the value of n?
The only variable used is sum.
The space sum occupies doesn't change with n, so it's constant.
If it's constant, then it's O(1).
How many instructions will it execute in relation to the value of n?
Let's first simplify the code, then analyze it row by row.
    int pairSumSequence(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += 2 * i + 1;
        }
        return sum;
    }
The declaration and initialization of a variable takes constant time and doesn't change with the value of n, so this line is O(1):

    int sum = 0;
Similarly, returning a value takes constant time, so that's also O(1):

    return sum;
Finally, let's analyze the inside of the for loop:

    sum += 2 * i + 1;

This is also constant time, since it's basically one multiplication and two additions. Again O(1).
But this O(1) operation is executed inside a for loop:

    for (int i = 0; i < n; i++) {
        sum += 2 * i + 1;
    }
This for loop will execute exactly n times.
Therefore the total complexity of this function is:
C = O(1) + n * O(1) + O(1) = O(n)
Meaning that this function will take time proportional to the value of n.
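As a sanity check (assuming the stray `l` in the original snippet was meant to be the literal 1), the loop computes 1 + 3 + 5 + ... + (2n - 1) = n^2, which a runnable version confirms:

```c
/* pairSum(a, b) just adds its arguments: O(1) time, O(1) space. */
int pair_sum(int a, int b) {
    return a + b;
}

/* Sums pairSum(i, i + 1) for i = 0 .. n-1.  Each term is 2i + 1,
   so the result is the sum of the first n odd numbers: n * n. */
int pair_sum_sequence(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += pair_sum(i, i + 1);
    return sum;
}
```

The running time grows linearly with n, while the storage (sum, i, and one stack frame per pair_sum call, released before the next) stays fixed.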
Time/space complexity O(1) means constant complexity; the constant is not necessarily 1, it can be an arbitrary number, but it has to be constant and independent of n. For example, if you always had 1000 variables (independent of n), it would still be O(1). It may even happen that the constant is so big compared to your n that O(n) would be much better than O(1) with that constant.
Now in your case, the time complexity is O(n) because you enter the loop n times and each iteration has constant time complexity, so it is linearly dependent on n. Your space complexity, however, is independent of n (you always keep the same number of variables) and is constant, hence it is O(1).
Could someone explain how this algorithm is O(log(n)) and not O(n)?
The loop runs once for every digit of the given number, so isn't the complexity O(n)?
    while (x != 0) {
        int pop = x % 10;
        x /= 10;
        if (rev > Integer.MAX_VALUE / 10 || (rev == Integer.MAX_VALUE / 10 && pop > 7))
            return 0;
        if (rev < Integer.MIN_VALUE / 10 || (rev == Integer.MIN_VALUE / 10 && pop < -8))
            return 0;
        rev = rev * 10 + pop;
    }
It depends on what n is. If n is x itself, a numeric value, then the complexity is O(log(n)). If you multiply x by 10, the while loop will only be one iteration longer, not ten times as long. Likewise, multiplying x by 100 will only add two iterations.
On the other hand, if there was a variable s which was the string representation of x, and n was the length of string s, then the complexity would be O(n). Note that in this case, the length of s is proportional to log(x), so the logarithm is implicit from the viewpoint of the numeric value.
Here's a thought experiment:
The algorithm depends on the number of digits in n, not the value of n. n = 10 takes as many iterations as n = 99 because they both have 2 digits.
The number of digits in n grows at a log(n) rate, since adding a single digit requires n to be at least 10 times bigger.
Hence the algorithm has a complexity of O(log(n)).
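The digit-stripping loop can be isolated into a small sketch that makes the growth rate visible: multiplying the input by 10 adds exactly one iteration, so the iteration count is floor(log10(x)) + 1.

```c
/* Counts base-10 digits of x by stripping one digit per iteration,
   exactly like the x /= 10 loop in the reversal algorithm. */
int digit_count(long x) {
    if (x == 0)
        return 1;
    int d = 0;
    while (x != 0) {
        x /= 10;
        ++d;
    }
    return d;
}
```

The return value is the number of loop iterations the reversal algorithm performs, and it grows logarithmically in the numeric value of x.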
I'm looking at the documentation for random():
https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man3/srandomdev.3.html#//apple_ref/c/func/random
It returns successive pseudo-random numbers in the range from 0 to (2**31)-1.
I don't want it to return 0 ever.
I'm thinking about writing:
long rand = random() + 1;
But if I'm not mistaken, long can be 32 bits on a 32-bit processor, so I guess I would risk integer overflow then.
What is the best approach to getting a random number between 1 and (2**31)-1?
NSUInteger r = arc4random_uniform(N) + 1;
This will generate a number between 1 and N. arc4random_uniform(N) generates a number between 0 and N-1.
You should have no problem with overflow.
    long rand = 0;
    while (rand == 0) {
        rand = random();
    }
This will almost certainly run exactly once. Only in the rare case that random() returns 0 (probability 1 in 2^31 per call) does it loop and draw again.
(Note that this is just a simplified version of how arc4random_uniform works. If you can use that function, as suggested by Jeff, you should.)
The maximum value returned by random() is RAND_MAX, so you can do this:

    long rand = 1 + (random() % RAND_MAX);

When random() returns a value between zero and RAND_MAX - 1, inclusive, you offset it by adding 1. When random() returns exactly RAND_MAX, the modulo operator converts the result to zero, so rand is 1 again.
The drawback of this approach is that the probability of getting 1 becomes roughly twice as high as that of getting any other number.
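That doubled probability is easy to verify exhaustively with a small stand-in for RAND_MAX (M below is an illustrative constant, not the real RAND_MAX):

```c
enum { M = 7 };   /* small stand-in for RAND_MAX */

/* Maps every possible generator output 0..M (inclusive) through
   1 + (r % M) and counts how often `target` is produced. */
int count_hits(int target) {
    int hits = 0;
    for (int r = 0; r <= M; ++r)
        if (1 + (r % M) == target)
            ++hits;
    return hits;
}
```

Every result from 2 through M appears once, but 1 appears twice (once from r = 0 and once from r = M), which is exactly the bias described above.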
I am calling arc4random in a function in my iOS application to generate random values from -5 to 6.
    double num;
    for (int i = 0; i < 3; i++) {
        num = (arc4random() % 11) - 5;
        NSLog(@"%0.0f", num);
    }
I get the following output from console.
2012-05-01 20:25:41.120 Project32[8331:fb03] 0
2012-05-01 20:25:41.121 Project32[8331:fb03] 1
2012-05-01 20:25:41.122 Project32[8331:fb03] 4294967295
0 and 1 are values within range, but wowww, where did 4294967295 come from?
Changing arc4random() to rand() fixes the problem, but rand(), of course, requires seeding.
arc4random() returns a u_int32_t -- that's an unsigned integer, one that doesn't represent negative values. Every time arc4random() % 11 comes up with a number 0 ≤ n < 5, you subtract 5 and wrap around to a very large number.
doubles can represent negative numbers, of course, but you're not converting to double until it's too late. Stick a cast in there:

    num = (double)(arc4random() % 11) - 5;

to promote the result of the modulo before the subtraction, and everything will be okay.
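The difference is easy to reproduce in plain C. The two helpers below (hypothetical names, with a fixed argument standing in for the random draw) differ only in when the conversion to double happens:

```c
#include <stdint.h>

/* Subtraction happens in unsigned arithmetic, so 3 - 5 wraps
   around to 2^32 - 2 before the conversion to double. */
double subtract_then_convert(uint32_t r) {
    return (double)(r - 5u);
}

/* Converting first lets the subtraction produce a real -2.0. */
double convert_then_subtract(uint32_t r) {
    return (double)r - 5.0;
}
```

Passing 3 (one of the outputs of arc4random() % 11 below 5) to the first helper yields the mystery value 4294967294, while the second yields -2.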
Try using

    (int)arc4random_uniform(11) - 5;

instead. The cast is still needed, because arc4random_uniform() also returns an unsigned u_int32_t.
From the man page:
arc4random_uniform() will return a uniformly distributed random number less than upper_bound. arc4random_uniform() is recommended over constructions like ``arc4random() % upper_bound'' as it avoids "modulo bias" when the upper bound is not a power of two.
Question: Suppose you have a random number generator randn() that returns a uniformly distributed random number between 0 and n-1. Given any number m, write a random number generator that returns a uniformly distributed random number between 0 and m-1.
My answer:
    -(int)randm() {
        int k = 1;
        while (k*n < m) {
            ++k;
        }
        int x = 0;
        for (int i = 0; i < k; ++i) {
            x += randn();
        }
        if (x < m) {
            return x;
        } else {
            return randm();
        }
    }
Is this correct?
You're close, but the problem with your answer is that there is more than one way to write a number as a sum of two other numbers.

If m < n this works, because the numbers 0, 1, ..., m-1 each appear with equal probability and the algorithm terminates almost surely.

In general, though, it fails for exactly that reason: there is only one way to get 0, but there are many ways to get a value in the middle, so the probabilities will not be equal.
Example: n = 2 and m = 3

    0 = 0+0
    1 = 1+0 or 0+1
    2 = 1+1

so the probability distribution from your method is

    P(0) = 1/4
    P(1) = 1/2
    P(2) = 1/4

which is not uniform.
To fix this, use positional notation. Find the smallest exponent e such that n^e >= m, then find the largest k such that k*m <= n^e. Generate e numbers with randn() and treat them as the base-n digits of a number x; if x < k*m, return x mod m, otherwise try again.
In C (note that ^ is XOR in C, not exponentiation, so the power of n is tracked in a variable instead):

    int randm() {
        // find the smallest power of n, pw = n^e, with pw >= m
        int e = 0;
        long long pw = 1;
        while (pw < m) {
            pw *= n;
            ++e;
        }
        // find the largest k such that k*m <= pw
        long long k = pw / m;
        while (1) {
            // generate a random number x uniform on [0, n^e)
            long long x = 0;
            for (int i = 0; i < e; ++i) {
                x = x * n + randn();
            }
            // if x isn't too large, return x modulo m
            if (x < k * m)
                return (int)(x % m);
        }
    }
It is not correct.
You are adding uniform random numbers, which does not produce a uniformly random result. Say n=2 and m = 3, then the possible values for x are 0+0, 0+1, 1+0, 1+1. So you're twice as likely to get 1 than you are to get 0 or 2.
What you need to do is write m in base n, and then generate 'digits' of the base-n representation of the random number. When you have the complete number, you have to check if it is less than m. If it is, then you're done. If it is not, then you need to start over.
The sum of two uniform random number generators is not uniformly generated. For instance, the sum of two dice is more likely to be 7 than 12, because to get 12 you need to throw two sixes, whereas you can get 7 as 1 + 6 or 6 + 1 or 2 + 5 or 5 + 2 or ...
Assuming that randn() returns an integer between 0 and n - 1, n * randn() + randn() is uniformly distributed between 0 and n * n - 1, so you can increase its range. If randn() returns an integer between 0 and k * m + j - 1, then call it repeatedly until you get a number <= k * m - 1, and then divide the result by k to get a number uniformly distributed between 0 and m - 1.
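For the n = 2, m = 3 case this accept/reject scheme can be checked exhaustively (the names below are illustrative): combine two base-n digits into x in [0, n*n), accept only x < k*m with k = (n*n)/m, and tally x mod m over every digit pair:

```c
enum { N = 2, M_OUT = 3 };

/* Enumerates all pairs of base-N digits (i.e., every equally likely
   outcome of two randn() calls), applies the accept/reject rule, and
   counts how often each residue 0..M_OUT-1 is produced. */
void tally(int counts[M_OUT]) {
    int k = (N * N) / M_OUT;              /* largest k with k*M_OUT <= N*N */
    for (int i = 0; i < M_OUT; ++i)
        counts[i] = 0;
    for (int a = 0; a < N; ++a)
        for (int b = 0; b < N; ++b) {
            int x = a * N + b;            /* uniform on [0, N*N) */
            if (x < k * M_OUT)            /* the pair (1,1) is rejected */
                counts[x % M_OUT]++;
        }
}
```

Each residue is produced by exactly one accepted digit pair, so the accepted outcomes are uniform; the single rejected pair is the retry case.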
Assuming both n and m are positive integers, wouldn't the standard algorithm of scaling work?
    return (int)((float)randn() * m / n);