Numpy RNG non-deterministic even when seeded - numpy

I'm using numpy.random for a Monte Carlo simulation where very small acceptance/rejection probabilities are possible. Although I'm seeding the RNG, I'm unable to reproduce the same sequence of random numbers. In numpy 1.15.1's documentation it says:
Compatibility Guarantee: A fixed seed and a fixed series of calls to
‘RandomState’ methods using the same parameters will always produce
the same results up to roundoff error except when the values were
incorrect. Incorrect values will be fixed and the NumPy version in
which the fix was made will be noted in the relevant docstring.
Extension of existing parameter ranges and the addition of new
parameters is allowed as long the previous behavior remains unchanged.
First of all, what do they mean by incorrect values? Second, how is roundoff error handled? Aren't values always rounded in precisely the same way? Is it possible at all that my code is not fully deterministic even though I provide a seed? I am certain that the seed is not reset anywhere else, because I pass my RNG object to each of my functions as an argument.

It turned out that I was using sets throughout the code and picking from them at random by drawing a random index of an element. The issue was that sets are unordered: their iteration order was outside my control, so the element found at a given index could change from run to run even with a fixed seed.
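A minimal sketch of one way around this, assuming the set elements are sortable: impose a fixed order before indexing, so the same random index always maps to the same element. The helper name `pick_from_set` is hypothetical, and the standard-library `random.Random` stands in for the NumPy RNG object here; with a `RandomState` the index could be drawn with `rng.randint(len(ordered))` instead.

```python
import random


def pick_from_set(rng, items):
    """Pick one element from a set reproducibly.

    Sorting gives the elements a fixed order, so the result depends only
    on the RNG state and the set's contents, not on its iteration order.
    """
    ordered = sorted(items)
    return ordered[rng.randrange(len(ordered))]


# Same seed, same contents, different construction order.
a = pick_from_set(random.Random(12345), {"apple", "banana", "cherry", "date"})
b = pick_from_set(random.Random(12345), {"date", "cherry", "banana", "apple"})
print(a == b)
```

Sorting costs O(n log n) per pick, so for large or frequently sampled collections it may be cheaper to keep the elements in a list (or a sorted container) from the start.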

Related

SciPy Basinhopping not returning lowest-found minimum

I know there is a very similar question, but mine is different. I am running an optimization using basinhopping with the Powell method. Within the function I am optimizing, I also store the parameters and the resulting cost function value for each iteration in an external array, so I can check the results afterwards. I've noticed repeatedly that the result which the basinhopping function returns is not actually the set of parameters which produced the lowest overall error. I assume this is not a bug but rather me misunderstanding how the technique works. For example, in an optimization I just ran, the returned result was actually the 35th-best option when I checked my arrays after completion. The difference in cost is very small (I'm using RMSE as a metric, and the difference is 0.02), but I still don't understand how it selected the minimum.
My first thought was maybe these parameters somehow exceeded the bounds I set, but I checked and that isn't the case.
I don't yet have a shareable reproducible version since I'm using some internal modules in the function call, but I figured I would post my question since it is more about the conceptual aspect of how basinhopping selects its result.
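For readers who want to reproduce the bookkeeping described above, here is a rough sketch of a wrapper that remembers the lowest value the objective ever returned. The class name `BestTracker` and the toy objective are hypothetical; the wrapper only records evaluations and does not by itself explain which point the optimizer chooses to report.

```python
class BestTracker:
    """Wrap an objective function and remember its lowest evaluated value."""

    def __init__(self, func):
        self.func = func
        self.best_x = None
        self.best_f = float("inf")
        self.n_calls = 0

    def __call__(self, x):
        value = self.func(x)
        self.n_calls += 1
        if value < self.best_f:
            self.best_f = value
            self.best_x = list(x)  # copy, in case the caller reuses the array
        return value


# Toy objective with its minimum at x = 1 (an assumption for illustration).
tracker = BestTracker(lambda x: (x[0] - 1.0) ** 2)
for guess in ([3.0], [1.0], [2.0]):
    tracker(guess)
print(tracker.best_x, tracker.best_f, tracker.n_calls)
```

The tracker can be passed to an optimizer in place of the original function, and `tracker.best_x` can then be compared with the result the optimizer returns.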

Numerical Instability in Optim.jl

I'm currently working on a project in Julia where I am starting with an input beta which is assumed to be incorrect. I'm running through a sequence of code that updates this beta to be the correct value and checking the error. As beta gets larger, I expect this error to reach 100%. This code ultimately does a minimization of some parameter chi which is why I've chosen to employ the optimize function from Optim.jl. The output I'm getting is below.
When I perform this calculation by hand (using 1st and 2nd derivative to update) I get this
I see that this still has some numerical instability, but it holds up longer than the Optim way does. I would expect it to behave the other way around. My optimize function is set up as
result = optimize(β -> TEfunc(E,nc,onecut,β,pcutoff,μcutoff,N),β/2,2.2*β,Brent(),abs_tol=tempcutoff,rel_tol=sqrt(tempcutoff))
βstar=Optim.minimizer(result)
Is there an argument that I'm missing in the optimize call? I just want to figure out why I have numerical instability so quickly.

What does the number in parentheses in `np.random.seed(number)` mean?

What is the difference between np.random.seed(0), np.random.seed(42), and np.random.seed(any other number)? What is the function of the number in parentheses?
Python uses the iterative Mersenne Twister algorithm to generate pseudo-random numbers [1]. The seed is simply where we start iterating.
To be clear, most computers do not have a "true" source of randomness. It is kind of an interesting thing that "randomness" is so valuable to so many applications, and is quite hard to come by (you can buy a specialized device devoted to this purpose). Since it is difficult to make random numbers, but they are nevertheless necessary, many, many, many, many algorithms have been developed to generate numbers that are not random, but nevertheless look as though they are. Algorithms that generate numbers that "look randomish" are called pseudo-random number generators (PRNGs). Since PRNGs are actually deterministic, they can't simply create a number from the aether and have it look randomish. They need an input. It turns out that using some complex operations and modular arithmetic, we can take in an input, and get another number that seems to have little or no relation to the input. Using this intuition, we can simply use the previous output of the PRNG as the next input. We then get a sequence of numbers which, if our PRNG is good, will seem to have no relation to each other.
In order to get our iterative PRNG started, we need an initial input. This initial input is called a "seed". Since the PRNG is deterministic, for a given seed, it will generate an identical sequence of numbers. Usually, there is a default seed that is, itself, sort of randomish. The most common one is the current time. However, the current time isn't a very good random number, so this behavior is known to cause problems sometimes. If you want your program to run in an identical manner each time you run it, you can provide a seed (0 is a popular option, but is entirely arbitrary). Then, you get a sequence of randomish numbers, but if you give your code to someone else, they can reproduce the run exactly as you witnessed it when you ran it.
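To make the "previous output becomes the next input" idea concrete, here is a toy linear congruential generator. This is only an illustration: it is not the Mersenne Twister and should not be used for real simulations. The function name `lcg` is made up for this sketch, and the constants are the ones commonly attributed to Numerical Recipes.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random integers from a linear congruential generator.

    Each new state is computed from the previous one with a multiply,
    an add, and a modulo, so the whole sequence is fixed by the seed.
    """
    state = seed
    out = []
    for _ in range(n):
        state = (a * state + c) % m
        out.append(state)
    return out


first_run = lcg(seed=0, n=5)
second_run = lcg(seed=0, n=5)   # same seed: identical sequence
other_seed = lcg(seed=42, n=5)  # different seed: different sequence

print(first_run == second_run)
print(first_run == other_seed)
```

The seed only selects where the iteration starts; no particular value such as 0 or 42 is "more random" than another.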
That would be the starting key of the generator. Typically if you want to get reproducible results you'll use the same seed over and over again throughout your simulations.
You are setting the seed of the random number generator so you can get reproducible results. Example.
import numpy as np
np.random.seed(0)
np.random.randint(0, 100, 10)
Output:
array([44, 47, 64, 67, 67, 9, 83, 21, 36, 87])
Now, if you run the same code on your computer, you should get the same 10 numbers as output: random integers from 0 to 99, since the upper bound of 100 is exclusive.

Using pymc.potential to prevent evaluation of function at meaningless parameters values

I am building a pymc model which must evaluate a very CPU-expensive function (up to 1 sec per call on decent hardware). I am trying to limit the explored parameter space to meaningful solutions by means of a potential (the sum of a list of my variables has to stay within a given range). This works, but I noticed that even when my potential returns an infinite value and forbids the parameter choice, this function still gets evaluated. Is there a way to prevent that? Can one force the sampler to use a given evaluation sequence (pick up the necessary variables, check if the potential is ok, and proceed only if allowed)?
I thought of using the potential inside the function itself and use it to determine whether it must proceed or immediately return, but is there a better way?
Jean-François
I am not aware of a way of ordering the evaluation of the potentials. This might not be the best way of doing so, but you might be able to check whether the parameters are within reasonable bounds at the beginning of the simulation. If they are not, you can return a value that forces your posterior to be zero.
Another option is to create a function for your likelihood. At the beginning of this function you could check if the parameters are within reasonable limits. If they are not you can return -inf without running your simulation. If they are reasonable you can run your model and calculate the log(p).
This is definitely not an elegant solution but it should work.
Full disclosure - I am not by any means a pymc expert.
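A rough, library-agnostic sketch of the second option, in plain Python rather than pymc API calls. All names here (`guarded_log_likelihood`, `toy_model`, the bounds, and `sigma`) are hypothetical, and the Gaussian log-likelihood (up to an additive constant) is only an assumption for illustration.

```python
import math


def guarded_log_likelihood(params, observed, expensive_model,
                           lower, upper, sigma=1.0):
    """Return log(p), skipping the expensive model for forbidden parameters."""
    total = sum(params)
    if not (lower <= total <= upper):
        # Outside the allowed range: reject without running the model.
        return -math.inf
    predicted = expensive_model(params)
    # Gaussian log-likelihood up to a constant (assumed for this sketch).
    return -0.5 * sum(((o - p) / sigma) ** 2
                      for o, p in zip(observed, predicted))


calls = []  # records every time the "expensive" model actually runs


def toy_model(params):
    calls.append(tuple(params))
    return [sum(params)] * 3


data = [1.0, 1.0, 1.0]
rejected = guarded_log_likelihood([5.0, 6.0], data, toy_model, 0.0, 2.0)
accepted = guarded_log_likelihood([0.5, 0.5], data, toy_model, 0.0, 2.0)
print(rejected, accepted, len(calls))
```

The same early-return check would sit at the top of whatever function the model uses to compute its likelihood, so the costly call is only reached for allowed parameter values.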

Testing whether some code works for ALL input numbers?

I've got an algorithm using a single (positive integer) number as an input to produce an output. And I've got the reverse function which should do the exact opposite, going back from the output to the same integer number. This should be a unique one-to-one reversible mapping.
I've tested this for some integers, but I want to be 100% sure that it works for all of them, up to a known limit.
The problem is that if I just test every integer, it takes an unreasonably long time to run. If I use 64-bit integers, that's a lot of numbers to check if I want to check them all. On the other hand, if I only test every 10th or 100th number, I'm not going to be 100% sure at the end. There might be some awkward weird constellation in one of the 90% or 99% which I didn't test.
Are there any general ways to identify edge cases so that just those "interesting" or "risky" numbers are checked? Or should I just pick numbers at random? Or test in increasing increments?
Or to put the question another way, how can I approach this so that I gain 100% confidence that every case will be properly handled?
The general approach is to check every step of the computation for potential flaws. For integer math, that means overflows, underflows, and rounding errors from division: basically, any case where the mathematical result can't be represented exactly. In addition, every operation that depends on such a result inherits the same problem.
The auditing process then looks at each step in turn. For example, if you want to allocate memory for N integers, you need N times the size of an integer in bytes, and this multiplication can overflow. You determine the values at which the multiplication overflows and write corresponding tests that exercise them. Note that for the memory allocation example, proper handling typically means that the function fails instead of allocating memory.
The principle behind this is that you determine, for every operation, the ranges where the outcome is somehow different (e.g. where it overflows) and then make sure via tests that both variants work. This reduces the number of tests from all possible input values to just those where you expect a significant difference.
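As a rough illustration of this principle, the sketch below tests a round trip only at values where integer arithmetic tends to change behavior (around powers of two and at the limit), plus a seeded random sample. The `encode`/`decode` pair is a made-up stand-in for the real algorithm (multiplication by an odd constant modulo 2**64 and by its modular inverse), and `boundary_values` and `candidates` are hypothetical helpers. It requires Python 3.8+ for `pow(x, -1, m)`.

```python
import random

MASK64 = 2**64 - 1
MULTIPLIER = 0x9E3779B97F4A7C15        # odd, hence invertible modulo 2**64
INVERSE = pow(MULTIPLIER, -1, 2**64)   # modular inverse (Python 3.8+)


def encode(n):
    """Stand-in for the real forward mapping."""
    return (n * MULTIPLIER) & MASK64


def decode(m):
    """Stand-in for the real reverse mapping."""
    return (m * INVERSE) & MASK64


def boundary_values(limit):
    """Values near 0, near the limit, and around each power of two."""
    values = {0, 1, limit - 1, limit}
    for k in range(1, limit.bit_length() + 1):
        values.update((2**k - 1, 2**k, 2**k + 1))
    return sorted(v for v in values if 0 <= v <= limit)


def candidates(limit, n_random=1000, seed=0):
    """Boundary values plus a reproducible random sample."""
    rng = random.Random(seed)
    values = set(boundary_values(limit))
    values.update(rng.randint(0, limit) for _ in range(n_random))
    return sorted(values)


failures = [n for n in candidates(MASK64) if decode(encode(n)) != n]
print(len(failures))
```

Passing such tests does not prove correctness for every input; the confidence comes from auditing each operation and choosing the boundaries accordingly, with the tests confirming that both sides of each boundary behave as expected.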