Self-Correcting Probability Distribution - Maintain randomness, while gravitating each outcome's frequency towards its probability - dynamic

This is a common problem when you want to introduce randomness but at the same time want your experiment to stick close to the intended probability distribution, and you cannot (or do not want to) count on the law of large numbers.
Say you have programmed a coin with a 50-50 chance for heads / tails. If you simulate it 100 times, most likely you will get something close to the intended 50-50 (a binomial distribution centered at 50-50).
But what if you wanted similar certainty for any number of repeats of the experiment?
A client of ours asked us this ::
We may also need to add some restrictions on some of the randomizations (e.g. if spatial location of our stimuli is totally random, the program could present too many stimuli in some locations and not very many in others. Locations should be equally sampled, so more of an array that is shuffled instead of randomization with replacement).
So they wanted randomness they could control.

Implementation details aside (arrays, vs other methods), the wanted result for our client's problem was the following ::
Always have as close to 1 / N of the stimuli in each of the N potential locations, yet do so in a randomized (hard-to-predict) way.
This is commonly needed in games (when distributing objects, characters, stats, ..), and I would imagine many other applications.
My preferred method for dealing with this is to dynamically weight the intended probabilities based on how the experiment has gone so far. This effectively moves us away from independently drawn variables.
Let p[i] be the wanted probability of outcome i
Let N[i] be the number of times outcome i has happened up to now
Let N be the sum of N[] for all outcomes i
Let w[i] be the correcting weight for i
Let W_Max be the maximum weight you want to assign (ie. when an outcome has occurred 0 times)
Let P[i] be the unnormalized probability for i
Then p_c[i] is the corrected probability for i
p[i] is fixed and provided by the design. N[i] is an accumulation - every time i happens, increment N[i] by 1.
w[i] is given by
w[i] = CalculateWeight(p[i], N[i], N, W_Max)
{
if (N == 0) return 1;
if (N[i] == 0) return W_Max;
intended = p[i] * N
current = N[i]
return intended / current;
}
And P[i] is given by
P[i] = p[i] * w[i]
Then we calculate p_c[i] as
p_c[i] = P[i] / sum(P[i])
And we run the next iteration of our random experiment (sampling) with p_c[i] instead of p[i] for outcome i.
The main drawback is that you gain control at the cost of unpredictability: the harder the weights correct, the easier the next outcome is to guess. After 4 tails in a row, it's highly likely you will see a head.
Note 1 :: The described method will provide at any step a distribution close to the original if the experiment's results match the intended results, or skewed towards (away) outcomes that have happened less (more) than intended.
Note 2 :: You can introduce a "control" parameter c and add an extra step.
p_c2[i] = c * p_c[i] + (1-c) * p[i]
For c = 1, this reduces to the described method; for c = 0, it reduces to the original probabilities (independently drawn variables).
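A minimal Python sketch of the scheme described above, including the control parameter c from Note 2. The function names corrected_probs and simulate, the default w_max = 10 and the fixed seed are illustrative assumptions, not part of the original description.

```python
import random

def corrected_probs(p, counts, w_max=10.0, c=1.0):
    """Return the corrected probabilities p_c2[i] for the next draw.

    p      -- intended probabilities p[i]
    counts -- N[i], how often each outcome has occurred so far
    w_max  -- weight for an outcome that has not occurred yet (assumed default)
    c      -- control parameter: 1 = full correction, 0 = original p
    """
    total = sum(counts)
    weights = []
    for p_i, n_i in zip(p, counts):
        if total == 0:
            w = 1.0                    # nothing observed yet, no correction
        elif n_i == 0:
            w = w_max                  # outcome never seen, boost it
        else:
            w = p_i * total / n_i      # intended count / actual count
        weights.append(w)
    unnormalized = [p_i * w for p_i, w in zip(p, weights)]
    norm = sum(unnormalized)
    p_c = [u / norm for u in unnormalized]
    return [c * q + (1 - c) * p_i for q, p_i in zip(p_c, p)]

def simulate(p, steps, w_max=10.0, c=1.0, seed=0):
    """Draw `steps` outcomes, re-weighting before every draw."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(steps):
        probs = corrected_probs(p, counts, w_max, c)
        i = rng.choices(range(len(p)), weights=probs)[0]
        counts[i] += 1
    return counts

if __name__ == "__main__":
    print(simulate([0.5, 0.5], 20))
    print(corrected_probs([0.5, 0.5], [4, 0]))
```

With counts [4, 0] the second outcome gets almost all of the probability mass, which is the "after 4 tails, expect a head" behaviour mentioned above.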


Median of Medians using blocks of 3 - why is it not linear?

I understand why, in worst case, where T is the running time of the algorithm, that using the median of medians algorithm with blocks of size three gives a recurrence relation of
T(n) = T(2n / 3) + T(n / 3) + O(n)
The Wikipedia article for the median-of-medians algorithm says that with blocks of size three the runtime is not O(n) because it still needs to check all n elements. I don't quite understand this explanation, and in my homework it says I need to show it by induction.
How would I show that median-of-medians takes time Ω(n log n) in this case?
Since this is a homework problem I'm going to let you figure out a rigorous proof of this result on your own, but it might be helpful to think about this one by looking at the shape of the recursion tree, which will be something like this:
               n                      Total work: n
       2n/3         n/3               Total work: n
   4n/9   2n/9   2n/9   n/9           Total work: n
Essentially, each node's children collectively will do the exact same amount of work as the node itself, so if you sum up the work done across the layers, you should see roughly linear work done per level. It won't be exactly linear work per level because eventually the smaller call starts to bottom out, but for the top layers you'll see this pattern hold.
You can formalize this by induction by guessing that the runtime is something of the form cn log n, possibly with some lower-order terms added in, but (IMHO) it's more important and instructive to see where the runtime comes from than it is to be able to prove it inductively.
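The Master theorem does not apply directly here, because the two recursive calls are on subproblems of different sizes (2n/3 and n/3), so there is no single a and b to plug in. The Akra-Bazzi method handles this case: find p such that (2/3)^p + (1/3)^p = 1, which gives p = 1. Taking the non-recursive work to be Theta(n), the method gives T(n) = Theta(n^p * (1 + integral from 1 to n of u / u^(p+1) du)) = Theta(n * (1 + ln(n))) = Theta(n*log(n)). Intuitively, the fractions 2/3 and 1/3 add up to 1, so every full level of the recursion tree does about n work, and there are at least log_3(n) full levels.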
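The growth rate can also be checked numerically. The sketch below evaluates the recurrence directly, assuming floor division for the subproblem sizes, a cost of exactly n for the linear term, and T(n) = n for n <= 1 (all three are assumptions made for illustration), and prints T(n) / (n log2 n). It is a sanity check, not a proof.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = T(2n/3) + T(n/3) + n with floor division (assumed base case)."""
    if n <= 1:
        return n
    return T(2 * n // 3) + T(n // 3) + n

if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5, 10**6):
        # If T were linear, T(n)/n would stay flat; instead it keeps growing,
        # while T(n) / (n log2 n) stays within a narrow band.
        print(n, T(n) / n, T(n) / (n * math.log2(n)))
```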

Find global maximum in the least number of computations

Let's say I have a function f defined on the interval [0,1], which is smooth and increases up to some point a, after which it starts decreasing. I have a grid x[i] on this interval, e.g. with a constant step size of dx = 0.01, and I would like to find which of those points has the highest value, by doing the smallest number of evaluations of f in the worst-case scenario. I think I can do much better than exhaustive search by applying something inspired by gradient-like methods. Any ideas? I was thinking of something like a binary search perhaps, or parabolic methods.
This is a bisection-like method I coded:
from math import floor

def optimize(f, a, b, fa, fb, dx):
    if b - a <= dx:
        # interval is a single grid step: compare the two end points
        return a if fa > fb else b
    else:
        m1 = 0.5*(a + b)
        m1 = _round(m1, a, dx)           # snap the midpoint down to the grid
        fm1 = fa if m1 == a else f(m1)
        m2 = m1 + dx                     # next grid point to the right
        fm2 = fb if m2 == b else f(m2)
        if fm2 >= fm1:
            return optimize(f, m2, b, fm2, fb, dx)
        else:
            return optimize(f, a, m1, fa, fm1, dx)

def _round(x, a, dx, right = False):
    return a + dx*(floor((x - a)/dx) + right)
The idea is: find the middle of the interval and compute m1 and m2, the grid points just to the left and to the right of it. If the function is increasing there, go for the right interval and do the same, otherwise go for the left. Whenever the interval is too small, just compare the values at the ends. However, this algorithm still does not use the size of the derivatives at the points I computed.
Such a function is called unimodal.
Without computing the derivatives, you can work by
finding where the deltas f(x[i+1])-f(x[i]) change sign, by dichotomy (the deltas are positive before the maximum and negative after it); this takes Log2(n) comparisons; this approach is very close to what you describe;
adapting the Golden section method to the discrete case; it takes Logφ(n) comparisons (φ~1.618).
Apparently, the Golden section is more costly, as φ<2, but actually the dichotomic search takes two function evaluations at a time, hence 2Log2(n)=Log√2(n) .
One can show that this is optimal, i.e. you can't go faster than O(Log(n)) for an arbitrary unimodal function.
If your function is very regular, the deltas will vary smoothly. You can think of the interpolation search, which tries to better predict the searched position by a linear interpolation rather than simple halving. In favorable conditions, it can reach O(Log(Log(n))) performance. I don't know of an adaptation of this principle to the Golden search.
Actually, linear interpolation on the deltas is very close to parabolic interpolation on the function values. The latter approach might be the best for you, but you need to be careful about the corner cases.
If derivatives are allowed, you can use any root solving method on the first derivative, knowing that there is an isolated zero in the given interval.
If only the first derivative is available, use regula falsi. If the second derivative is possible as well, you may consider Newton, but prefer a safe bracketing method.
I guess that the benefits of these approaches (superlinear and quadratic convergence) are made a little useless by the fact that you are working on a grid.
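The dichotomy on the sign of the deltas mentioned earlier in this answer can be sketched in Python as follows. The function name argmax_unimodal and the convention that f takes a grid index are assumptions for illustration; the cache is only there to count distinct evaluations of f.

```python
from functools import lru_cache

def argmax_unimodal(f, n):
    """Return (index of the maximum, number of distinct evaluations of f).

    Assumes f is strictly unimodal over the indices 0 .. n-1:
    increasing up to the maximum, decreasing afterwards.
    """
    g = lru_cache(maxsize=None)(f)   # never evaluate the same point twice
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if g(mid + 1) > g(mid):      # delta is positive: maximum is to the right
            lo = mid + 1
        else:                        # delta is negative: maximum is at mid or left
            hi = mid
    return lo, g.cache_info().misses

if __name__ == "__main__":
    dx = 0.01
    f = lambda i: -((i * dx) - 0.37) ** 2
    print(argmax_unimodal(f, 101))
```

Each step halves the index range at the price of at most two evaluations, which matches the 2 Log2(n) count given above.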
DISCLAIMER: I haven't tested the code. Take this as an "inspiration".
Let's say you have the following 11 points
x,f(x) = (0,3),(1,7),(2,9),(3,11),(4,13),(5,14),(6,16),(7,5),(8,3),(9,1),(10,-1)
you can do something inspired by the bisection method
a = 0, f(a) = 3 | b = 10, f(b) = -1 | c = (0+10)/2 = 5, f(c) = 14
From here you can see that the function is still increasing at c (f(5) > f(4)), so there is no need to search [a,c[ for the maximum. The maximum has to be in the interval [c,b]. So at the next iteration you change the value of a such that a = c
a = 5, f(a) = 14 | b = 10, f(b) = -1 | c = (5+10)/2 = 7 (integer division), f(c) = 5
This time the function is decreasing at c (f(7) < f(6)), so b is moved to the left: b = c
You can iterate the process until the interval cannot shrink any further (a, b and c are no longer distinct).
Here is the code that implements this idea:
#include <stdio.h>

#define STEP (0.01)
#define SIZE 100   /* number of grid points, 1/STEP */

int main(void) {
    double vals[SIZE];
    /* sample a unimodal test function on the grid */
    for (int i = 0; i < SIZE; ++i) {
        double x = i*STEP;
        vals[i] = -(x*x*x*x - (0.6)*(x*x));
    }
    for (int i = 0; i < SIZE; ++i) {
        printf("%f ", vals[i]);
    }
    printf("\n");
    int a = 0, b = SIZE-1, c;
    c = (a+b)/2;
    while (a != b && b != c && a != c) {
        printf("%i %i %i - %f %f %f\n", a, c, b, vals[a], vals[c], vals[b]);
        if (vals[c] - vals[c-1] > 0) { /* is the function increasing at c? */
            a = c;
        } else {
            b = c;
        }
        c = (a+b)/2;
    }
    /* the loop stops with a and b adjacent: keep the larger of the two */
    c = (vals[b] > vals[a]) ? b : a;
    printf("The maximum is %i=%f with %f\n", c, (c*STEP), vals[c]);
    return 0;
}
Find the points where the derivative of f(x), df/dx, is 0.
For the derivative you could use a five-point stencil or a similar algorithm.
This should be O(n).
Then fit those multiple points (where the derivative is 0) with a polynomial regression / least-squares regression.
This should also be O(n), assuming all the points are neighbours.
Then find the top of that curve.
This shouldn't be more than O(M), where M is the resolution of the trials for the fit function.
While taking the derivative, you could leap by k-length steps until the derivative changes sign.
When the derivative changes sign, take the square root of k and continue in the reverse direction.
When the derivative changes sign again, take the square root of the new k and change direction again.
Example: leap by 100 elements, find a sign change, set leap = 10 and reverse direction; at the next change, leap = 3, and so on. Then it could be fixed to 1 element per step to find the exact location.
I am assuming that the function evaluation is very costly.
In the special case that your function can be approximately fitted with a polynomial, you can easily calculate the extrema with the least number of function evaluations. And since you know that there is only one maximum, a polynomial of degree 2 (quadratic) might be ideal.
For example: if f(x) can be represented by a polynomial of some known degree, say 2, then you can evaluate your function at any 3 points and calculate the polynomial coefficients using Newton's divided differences or the Lagrange interpolation method.
Then it's simple to solve for the maximum of this polynomial. For degree 2 you can easily get a closed-form expression for the maximum.
To get the final answer you can then search in the vicinity of the solution.
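A small Python sketch of the degree-2 case, using Newton's divided differences through three sample points. The function name parabola_vertex is an assumption for illustration, and the result is exact only when f really is a quadratic; otherwise it is a starting point for the search in the vicinity.

```python
def parabola_vertex(x0, y0, x1, y1, x2, y2):
    """x-coordinate of the extremum of the parabola through three points.

    Newton form: p(x) = y0 + d1*(x - x0) + a*(x - x0)*(x - x1),
    so p'(x) = d1 + a*(2x - x0 - x1) = 0 gives the vertex.
    """
    d1 = (y1 - y0) / (x1 - x0)          # first divided difference
    d2 = (y2 - y1) / (x2 - x1)
    a = (d2 - d1) / (x2 - x0)           # second divided difference
    if a == 0:
        raise ValueError("points are collinear, no unique extremum")
    return 0.5 * (x0 + x1) - d1 / (2.0 * a)

if __name__ == "__main__":
    f = lambda x: -(x - 0.3) ** 2
    xs = (0.0, 0.5, 1.0)
    print(parabola_vertex(xs[0], f(xs[0]), xs[1], f(xs[1]), xs[2], f(xs[2])))
```

On a grid, one would round the returned x to the nearest grid point and compare it with its neighbours.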

Determine the running time of an algorithm with two parameters

I have implemented an algorithm that uses two other algorithms for calculating the shortest path in a graph: Dijkstra and Bellman-Ford. Based on the time complexity of these algorithms, I can calculate the running time of my implementation, which is easy given the code.
Now, I want to experimentally verify my calculation. Specifically, I want to plot the running time as a function of the size of the input (I am following the method described here). The problem is that I have two parameters - number of edges and number of vertices.
I have tried to fix one parameter and change the other, but this approach results in two plots - one for varying number of edges and the other for varying number of vertices.
This leads me to my question - how can I determine the order of growth based on two plots? In general, how can one experimentally determine the running time complexity of an algorithm that has more than one parameter?
It's very difficult in general.
The usual way you would experimentally gauge the running time in the single variable case is, insert a counter that increments when your data structure does a fundamental (putatively O(1)) operation, then take data for many different input sizes, and plot it on a log-log plot. That is, log T vs. log N. If the running time is of the form n^k you should see a straight line of slope k, or something approaching this. If the running time is like T(n) = n^{k log n} or something, then you should see a parabola. And if T is exponential in n you should still see exponential growth.
You can only hope to get information about the highest order term when you do this -- the low order terms get filtered out, in the sense of having less and less impact as n gets larger.
In the two variable case, you could try to do a similar approach -- essentially, take 3 dimensional data, do a log-log-log plot, and try to fit a plane to that.
However this will only really work if there's really only one leading term that dominates in most regimes.
Suppose my actual function is T(n, m) = n^4 + n^3 * m^3 + m^4.
When m = O(1), then T(n) = O(n^4).
When n = O(1), then T(n) = O(m^4).
When n = m, then T(n) = O(n^6).
In each of these regimes, "slices" along the plane of possible n,m values, a different one of the terms is the dominant term.
So there's no way to determine the function just from taking some points with fixed m, and some points with fixed n. If you did that, you wouldn't get the right answer for n = m -- you wouldn't be able to discover "middle" leading terms like that.
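The effect of the different regimes can be seen with a few lines of Python that estimate the log-log slope of the example function T(n, m) = n^4 + n^3 * m^3 + m^4 along three slices. The helper name loglog_slope and the sample points 10^3 and 10^4 are arbitrary choices for illustration.

```python
import math

def T(n, m):
    return n**4 + n**3 * m**3 + m**4

def loglog_slope(f, x1, x2):
    """Slope of log f(x) against log x between two sample points."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))

slope_fixed_m = loglog_slope(lambda n: T(n, 1), 10**3, 10**4)   # m held at 1
slope_fixed_n = loglog_slope(lambda m: T(1, m), 10**3, 10**4)   # n held at 1
slope_diagonal = loglog_slope(lambda n: T(n, n), 10**3, 10**4)  # n = m

if __name__ == "__main__":
    print(slope_fixed_m, slope_fixed_n, slope_diagonal)
```

The two fixed-parameter slices both suggest an exponent close to 4, while the diagonal slice shows an exponent close to 6 that neither of them reveals.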
I would recommend that the best way to predict asymptotic growth when you have lots of variables / complicated data structures, is with a pencil and piece of paper, and do traditional algorithmic analysis. Or possibly, a hybrid approach. Try to break the question of efficiency into different parts -- if you can split the question up into a sum or product of a few different functions, maybe some of them you can determine in the abstract, and some you can estimate experimentally.
Luckily, two input parameters are still easy to visualize in a 3D scatter plot (the 3rd dimension is the measured running time), and you can check if it looks like a plane (in log-log-log scale) or if it is curved. Naturally, random variations in the measurements play a role here as well.
In Matlab I typically calculate a least-squares solution to two-variable function like this (just concatenates different powers and combinations of x and y horizontally, .* is an element-wise product):
x = log(parameter_x);
y = log(parameter_y);
% Find a least-squares fit
p = [x.^2, x.*y, y.^2, x, y, ones(length(x),1)] \ log(time)
Then this can be used to estimate running times for larger problem instances, ideally those would be confirmed experimentally to know that the fitted model works.
This approach also works for higher dimensions but gets tedious to generate. Maybe there is a more general way to achieve that, and this is just a work-around for my lack of knowledge.

What is a reduction variable? Could anyone give me some examples?

What is a reduction variable?
Could anyone give me some examples?
Here's a simple example in a C-like language of computing the sum of an array:
int x = 0;
for (int i = 0; i < n; i++) {
    x += a[i];
}
In this example,
i is an induction variable - in each iteration it changes by some constant. It can be +1 (as in the above example) or *2 or /3 etc., but the key is that in all the iterations the number is the same.
In other words, in each iteration i_new = i_old op constant, where op is +, *, etc., and neither op nor constant change between iterations.
x is a reduction variable - it accumulates data from one iteration to the next. It always has some initialization (x = 0 in this case), and while the data accumulated can be different in each iteration, the operator remains the same.
In other words, in each iteration x_new = x_old op data, and op remains the same in all iterations (though data may change).
In many languages there's a special syntax for performing something like this - often called a "fold" or "reduce" or "accumulate" (and it has other names) - but in the context of LLVM IR, a reduction variable (like an induction variable) will be represented by a phi node in a loop between a binary operation inside the loop and the initialization value before it.
Commutative* operations in reduction variables (such as addition) are particularly interesting for an optimizing compiler because they appear to show a stronger dependency between iterations than there really is; for instance the above example could be rewritten into a vectorized form - adding, say, 4 numbers at a time, and followed by a small loop to sum the final vector into a single value.
* there are actually more conditions that the reduction variable has to fulfill before a vectorization like this can be applied, but that's really outside the scope here
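To make the vectorized rewrite concrete, here is a plain Python sketch of what "adding 4 numbers at a time" means for the sum example. It only models the idea with four scalar accumulators (the name sum_four_lanes is made up for illustration); it is not what a compiler literally emits.

```python
def sum_four_lanes(a):
    """Sum a list using four independent partial sums ("lanes")."""
    lanes = [0, 0, 0, 0]
    main_len = len(a) - len(a) % 4
    # Main loop: each lane accumulates every 4th element, so the four
    # additions in one iteration do not depend on each other.
    for i in range(0, main_len, 4):
        for k in range(4):
            lanes[k] += a[i + k]
    # Small final reduction of the four lanes into a single value.
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3]
    # Leftover elements when len(a) is not a multiple of 4.
    for x in a[main_len:]:
        total += x
    return total

if __name__ == "__main__":
    print(sum_four_lanes(list(range(10))))
```

The result equals the sequential sum here because integer addition can be regrouped freely; for floating-point data the regrouping can change rounding, which is one of the extra conditions alluded to in the footnote.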

How to make a start on the "crackless wall" problem

Here's the problem statement:
Consider the problem of building a wall out of 2x1 and 3x1 bricks (horizontal×vertical dimensions) such that, for extra strength, the gaps between horizontally-adjacent bricks never line up in consecutive layers, i.e. never form a "running crack".
There are eight ways of forming a crack-free 9x3 wall, written W(9,3) = 8.
Calculate W(32,10). < Generalize it to W(x,y) >
http://www.careercup.com/question?id=67814&form=comments
The above link gives a few solutions, but I'm unable to understand the logic behind them. I'm trying to code this in Perl and have done so far:
input : W(x,y)
find all possible i's and j's such that x == 3(i) + 2(j);
for each pair (i,j) ,
find n = (i+j)C(j) # C:combinations
Adding all these n's should give the count of all possible combinations. But I have no idea on how to find the real combinations for one row and how to proceed further.
Based on the claim that W(9,3)=8, I'm inferring that a "running crack" means any continuous vertical crack of height two or more. Before addressing the two-dimensional problem as posed, I want to discuss an analogous one-dimensional problem and its solution. I hope this will make it more clear how the two-dimensional problem is thought of as one-dimensional and eventually solved.
Suppose you want to count the number of lists of length, say, 40, whose symbols come from a reasonably small set of, say, the five symbols {a,b,c,d,e}. Certainly there are 5^40 such lists. If we add an additional constraint that no letter can appear twice in a row, the mathematical solution is still easy: There are 5*4^39 lists without repeated characters. If, however, we instead wish to outlaw consonant combinations such as bc, cb, bd, etc., then things are more difficult. Of course we would like to count the number of ways to choose the first character, the second, etc., and multiply, but the number of ways to choose the second character depends on the choice of the first, and so on. This new problem is difficult enough to illustrate the right technique. (though not difficult enough to make it completely resistant to mathematical methods!)
To solve the problem of lists of length 40 without consonant combinations (let's call this f(40)), we might imagine using recursion. Can you calculate f(40) in terms of f(39)? No, because some of the lists of length 39 end with consonants and some end with vowels, and we don't know how many of each type we have. So instead of computing, for each length n<=40, f(n), we compute, for each n and for each character k, f(n,k), the number of lists of length n ending with k. Although f(40) cannot be calculated from f(39) alone, f(40,a) can be calculated in terms of f(39,a), f(39,b), etc.
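The one-dimensional recursion can be sketched in Python as follows. The sketch assumes that a and e are the vowels and that any two adjacent consonants are forbidden, including a doubled one such as bb; the text above leaves that last point open, so treat it as an assumption.

```python
VOWELS = "ae"
CONSONANTS = "bcd"
ALPHABET = VOWELS + CONSONANTS

def count_lists(n):
    """Number of length-n lists over {a,b,c,d,e} with no two adjacent consonants."""
    # f[k] = number of valid lists of the current length ending with symbol k
    f = {k: 1 for k in ALPHABET}
    for _ in range(n - 1):
        total = sum(f.values())
        vowel_ended = sum(f[k] for k in VOWELS)
        # A vowel may follow anything; a consonant may only follow a vowel.
        f = {k: (total if k in VOWELS else vowel_ended) for k in ALPHABET}
    return sum(f.values())

if __name__ == "__main__":
    print(count_lists(2))    # 25 pairs minus the 9 consonant-consonant pairs = 16
    print(count_lists(40))
```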
The strategy described above can be used to solve your two-dimensional problem. Instead of characters, you have entire horizontal brick-rows of length 32 (or x). Instead of 40, you have 10 (or y). Instead of a no-consonant-combinations constraint, you have the no-adjacent-cracks constraint.
You specifically ask how to enumerate all the brick-rows of a given length, and you're right that this is necessary, at least for this approach. First, decide how a row will be represented. Clearly it suffices to specify the locations of the 3-bricks, and since each has a well-defined center, it seems natural to give a list of locations of the centers of the 3-bricks. For example, with a wall length of 15, the sequence (1,8,11) would describe a row like this: (ooo|oo|oo|ooo|ooo|oo). This list must satisfy some natural constraints:
1. The initial and final positions cannot be the centers of a 3-brick. Above, 0 and 14 are invalid entries.
2. Consecutive differences between numbers in the sequence must be odd, and at least three.
3. The position of the first entry must be odd.
4. The difference between the last entry and the final position (the length of the wall minus one) must also be odd. Above, 14 - 11 = 3.
There are various ways to compute and store all such lists, but the conceptually easiest is a recursion on the length of the wall, ignoring condition 4 until you're done. Generate a table of all lists for walls of length 2, 3, and 4 manually, then for each n, deduce a table of all lists describing walls of length n from the previous values. Impose condition 4 when you're finished, because it doesn't play nice with recursion.
You'll also need a way, given any brick-row S, to quickly describe all brick-rows S' which can legally lie beneath it. For simplicity, let's assume the length of the wall is 32. A little thought should convince you that
S' must satisfy the same constraints as S, above.
1 is in S' if and only if 1 is not in S.
30 is in S' if and only if 30 is not in S.
For each entry q in S, S' must have a corresponding entry q+1 or q-1, and conversely every element of S' must be q-1 or q+1 for some element q in S.
For example, the list (1,8,11) can legally be placed on top of (7,10,30), (7,12,30), or (9,12,30), but not (9,10,30) since this doesn't satisfy the "at least three" condition. Based on this description, it's not hard to write a loop which calculates the possible successors of a given row.
Now we put everything together:
First, for fixed x, make a table of all legal rows of length x. Next, write a function W(y,S), which is to calculate (recursively) the number of walls of width x, height y, and top row S. For y=1, W(y,S)=1. Otherwise, W(y,S) is the sum over all S' which can be related to S as above, of the values W(y-1,S').
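A compact Python sketch of this recursion. Note one substitution: instead of the list of 3-brick centers described above, each row is represented by the set of its interior crack positions, which makes the "may lie beneath" test a simple disjointness check. The names rows and count_walls are illustrative.

```python
from functools import lru_cache

def rows(width):
    """All rows of 2- and 3-bricks, each as a frozenset of interior crack positions."""
    result = []
    def extend(pos, cracks):
        if pos == width:
            result.append(frozenset(cracks[:-1]))  # drop the right edge of the wall
            return
        for brick in (2, 3):
            if pos + brick <= width:
                extend(pos + brick, cracks + [pos + brick])
    extend(0, [])
    return result

def count_walls(width, height):
    all_rows = rows(width)
    # Rows that may lie directly beneath s: those sharing no crack position.
    below = {s: [t for t in all_rows if s.isdisjoint(t)] for s in all_rows}

    @lru_cache(maxsize=None)
    def W(y, s):
        """Number of crack-free walls of height y whose top row is s."""
        if y == 1:
            return 1
        return sum(W(y - 1, t) for t in below[s])

    return sum(W(height, s) for s in all_rows)

if __name__ == "__main__":
    print(count_walls(9, 3))   # the problem statement gives W(9,3) = 8
```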
This solution is efficient enough to solve the problem W(32,10), but would fail for large x. For example, W(100,10) would almost certainly be infeasible to calculate as I've described. If x were large but y were small, we would break all sensible brick-laying conventions and consider the wall as being built up from left-to-right instead of bottom-to-top. This would require a description of a valid column of the wall. For example, a column description could be a list whose length is the height of the wall and whose entries are among five symbols, representing "first square of a 2x1 brick", "second square of a 2x1 brick", "first square of a 3x1 brick", etc. Of course there would be constraints on each column description and constraints describing the relationship between consecutive columns, but the same approach as above would work this way as well, and would be more appropriate for long, short walls.
I found this python code online here and it works fast and correctly. I do not understand how it all works though. I could get my C++ to the last step (count the total number of solutions) and could not get it to work correctly.
def brickwall(w,h):
    # generate single brick layer of width w (by recursion)
    def gen_layers(w):
        if w in (0,1,2,3):
            return {0:[], 1:[], 2:[[2]], 3:[[3]]}[w]
        return [(layer + [2]) for layer in gen_layers(w-2)] + \
               [(layer + [3]) for layer in gen_layers(w-3)]
    # precompute info about whether pairs of layers are compatible
    def gen_conflict_mat(layers, nlayers, w):
        # precompute internal brick positions for easy comparison
        def get_internal_positions(layer, w):
            acc = 0
            intpos = set()
            for brick in layer:
                acc += brick
                intpos.add(acc)
            intpos.remove(w)
            return intpos
        intpos = [get_internal_positions(layer, w) for layer in layers]
        mat = []
        for i in range(nlayers):
            mat.append([j for j in range(nlayers) \
                        if intpos[i].isdisjoint(intpos[j])])
        return mat
    layers = gen_layers(w)
    nlayers = len(layers)
    mat = gen_conflict_mat(layers, nlayers, w)
    # dynamic programming to recursively compute wall counts
    nwalls = nlayers*[1]
    for i in range(1,h):
        nwalls = [sum(nwalls[k] for k in mat[j]) for j in range(nlayers)]
    return sum(nwalls)

print(brickwall(9,3)) #8
print(brickwall(9,4)) #10
print(brickwall(18,5)) #7958
print(brickwall(32,10)) #806844323190414