What is a reduction variable?
Could anyone give me some examples?
Here's a simple example in a C-like language of computing the sum of an array:
int x = 0;
for (int i = 0; i < n; i++) {
    x += a[i];
}
In this example,
i is an induction variable - in each iteration it changes by some constant step. The step can be +1 (as in the above example) or *2 or /3 etc., but the key is that it is the same in every iteration.
In other words, in each iteration i_new = i_old op constant, where op is +, *, etc., and neither op nor constant change between iterations.
x is a reduction variable - it accumulates data from one iteration to the next. It always has some initialization (x = 0 in this case), and while the data accumulated can be different in each iteration, the operator remains the same.
In other words, in each iteration x_new = x_old op data, and op remains the same in all iterations (though data may change).
In many languages there's special syntax for performing something like this - often called a "fold" or "reduce" or "accumulate" (it goes by other names too) - but in the context of LLVM IR, a reduction variable will be represented by a phi node in a loop, placed between a binary operation inside the loop and the initialization value before it.
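For instance, the same sum expressed as a fold/reduce in Python (purely to illustrate the "fold" concept; this is not the C snippet above):
from functools import reduce
import operator

a = [1, 2, 3, 4]                    # example data
x = reduce(operator.add, a, 0)      # same reduction: initial value 0, accumulate with +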
Commutative* operations in reduction variables (such as addition) are particularly interesting for an optimizing compiler because they appear to impose a stronger dependency between iterations than there really is; for instance, the above example could be rewritten into a vectorized form - adding, say, 4 numbers at a time, followed by a small loop to sum the final vector into a single value.
* there are actually more conditions that the reduction variable has to fulfill before a vectorization like this can be applied, but that's really outside the scope here
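To make that concrete, here is a rough Python sketch (illustration only; a real compiler would use SIMD registers rather than lists, and the function name is mine) of the transformed loop: four partial sums accumulated side by side, then a small loop that reduces the 4-lane vector into a single value, plus a scalar tail for leftovers.
def vectorized_sum(a):
    lanes = [0, 0, 0, 0]                 # four independent accumulators (the "vector")
    n4 = len(a) - len(a) % 4
    for i in range(0, n4, 4):            # main loop: one "vector add" per 4 elements
        for lane in range(4):
            lanes[lane] += a[i + lane]
    x = 0
    for partial in lanes:                # small loop: sum the final vector into one value
        x += partial
    for i in range(n4, len(a)):          # scalar tail for the remaining elements
        x += a[i]
    return x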
This is a common problem when you want to introduce randomness but at the same time want your experiment to stick close to the intended probability distribution, and cannot / do not want to rely on the law of large numbers.
Say you have programmed a coin with a 50-50 chance for heads / tails. If you simulate it 100 times, most likely you will get something close to the intended 50-50 (a binomial distribution centered at 50-50).
But what if you wanted similar certainty for any number of repeats of the experiment?
A client of ours asked us this ::
We may also need to add some restrictions on some of the randomizations (e.g. if spatial location of our stimuli is totally random, the program could present too many stimuli in some locations and not very many in others. Locations should be equally sampled, so more of an array that is shuffled instead of randomization with replacement).
So they wanted randomness they could control.
Implementation details aside (arrays, vs other methods), the wanted result for our client's problem was the following ::
Always have as close to 1 / N of the stimuli in each of the N potential locations, yet do so in a randomized (hard-to-predict) way.
This is commonly needed in games (when distributing objects, characters, stats, ..), and I would imagine many other applications.
My preferred method for dealing with this is to dynamically weight the intended probabilities based on how the experiment has gone so far. This effectively moves us away from independently drawn variables.
Let p[i] be the wanted probability of outcome i
Let N[i] be the number of times outcome i has happened up to now
Let N be the sum of N[] for all outcomes i
Let w[i] be the correcting weight for i
Let W_Max be the maximum weight you want to assign (i.e. the weight used when an outcome has occurred 0 times)
Let P[i] be the unnormalized probability for i
Let p_c[i] be the corrected probability for i
p[i] is fixed and provided by the design. N[i] is an accumulation - every time i happens, increment N[i] by 1.
w[i] is given by
w[i] = CalculateWeight(p[i], N[i], N, W_Max)
{
    if (N == 0) return 1;
    if (N[i] == 0) return W_Max;
    intended = p[i] * N;
    current = N[i];
    return intended / current;
}
And P[i] is given by
P[i] = p[i] * w[i]
Then we calculate p_c[i] as
p_c[i] = P[i] / sum(P[i])
And we run the next iteration of our random experiment (sampling) with p_c[i] instead of p[i] for outcome i.
The main drawback is that the control comes at the price of predictability: after 4 tails in a row, it's highly likely you will see a head.
Note 1 :: At any step, the described method will give a distribution close to the original one if the experiment's results so far match the intended results, and a distribution skewed towards (away from) outcomes that have happened less (more) often than intended.
Note 2 :: You can introduce a "control" parameter c and add an extra step.
p_c2[i] = c * p_c[i] + (1-c) * p[i]
For c = 1 this reduces to the described method; for c = 0 it reduces to the original probabilities (independently drawn variables).
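A minimal Python sketch of the whole scheme, including the control parameter c (the function name, the default w_max = 10 and the use of random.choices for the sampling step are my choices, not part of the description above):
import random

def corrected_probabilities(p, counts, w_max=10.0, c=1.0):
    """p: intended probabilities p[i]; counts: N[i], how often each outcome
    has occurred so far; w_max: weight used for outcomes seen 0 times;
    c: the control parameter from Note 2."""
    n_total = sum(counts)
    weights = []
    for p_i, n_i in zip(p, counts):
        if n_total == 0:
            w = 1.0
        elif n_i == 0:
            w = w_max
        else:
            w = (p_i * n_total) / n_i                 # intended count / actual count
        weights.append(w)
    unnorm = [p_i * w_i for p_i, w_i in zip(p, weights)]         # P[i] = p[i] * w[i]
    total = sum(unnorm)
    p_c = [q / total for q in unnorm]                            # p_c[i]
    return [c * pc + (1 - c) * p_i for pc, p_i in zip(p_c, p)]   # p_c2[i]

# Example: a fair coin that has come up heads 4 times and tails once.
probs = corrected_probabilities([0.5, 0.5], [4, 1])   # -> [0.2, 0.8]; tails is now favoured
next_outcome = random.choices(["heads", "tails"], weights=probs)[0]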
I came across a question asking to describe the computational complexity in Big O of the following code:
i = 1;
while(i < N) {
    i = i * 2;
}
I found this Stack Overflow question asking for the answer, with the most voted answer saying it is Log2(N).
On first thought that answer looks correct, however I remember learning about pseudo-polynomial runtimes, and how computational complexity measures difficulty with respect to the length of the input rather than its value.
So for integer inputs, the complexity should be in terms of the number of bits in the input.
Therefore, shouldn't this function be O(N)? Because every iteration of the loop increases the number of bits in i by 1, until it reaches around the same bits as N.
This code might be found in a function like the one below:
function FindNextPowerOfTwo(N) {
    i = 1;
    while(i < N) {
        i = i * 2;
    }
    return i;
}
Here, the input can be thought of as a k-bit unsigned integer, which we might as well imagine as a string of k bits. The input size is therefore k = floor(log2(N)) + 1 bits.
The assignment i = 1 should be interpreted as creating a new bit string and assigning it the length-one bit string 1. This is a constant time operation.
The loop condition i < N compares the two bit strings to see which represents the larger number. If implemented intelligently, this will take time proportional to the length of the shorter of the two bit strings which will always be i. As we will see, the length of i's bit string begins at 1 and increases by 1 until it is greater than or equal to the length of N's bit string, k. When N is not a power of two, the length of i's bit string will reach k + 1. Thus, the time taken by evaluating the condition is proportional to 1 + 2 + ... + (k + 1) = (k + 1)(k + 2)/2 = O(k^2) in the worst case.
Inside the loop, we multiply i by two over and over. The complexity of this operation depends on how multiplication is to be interpreted. Certainly, it is possible to represent our bit strings in such a way that we could intelligently multiply by two by performing a bit shift and inserting a zero on the end. This could be made to be a constant-time operation. If we are oblivious to this optimization and perform standard long multiplication, we scan i's bit string once to write out a row of 0s and again to write out i with an extra 0, and then we perform regular addition with carry by scanning both of these strings. The time taken by each step here is proportional to the length of i's bit string (really, say that plus one) so the whole thing is proportional to i's bit-string length. Since the bit-string length of i assumes values 1, 2, ..., (k + 1), the total time is 2 + 3 + ... + (k + 2) = (k + 2)(k + 3)/2 - 1 = O(k^2).
Returning i is a constant time operation.
Taking everything together, the runtime is bounded from above and below by functions of the form c * k^2, so the worst-case complexity is Theta(k^2) = Theta(log(N)^2).
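To see the role of k concretely, here is a small Python check (the helper mirrors the pseudocode above; the snippet only demonstrates the iteration count, while the Theta(k^2) bound comes from each iteration costing O(k) bit operations under long multiplication):
def find_next_power_of_two(n):
    """Same loop as FindNextPowerOfTwo, with the iterations counted."""
    i, iterations = 1, 0
    while i < n:
        i *= 2
        iterations += 1
    return i, iterations

for n in (5, 1000, 10**6, 10**18):
    _, its = find_next_power_of_two(n)
    # the iteration count stays within 1 of k = n.bit_length() = floor(log2(n)) + 1
    print(n, n.bit_length(), its)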
In the given example you are not increasing the value of i by 1 but doubling it on every iteration, so it approaches N far more quickly. By multiplying it by two you reduce the size of the search space (between i and N) by half, i.e., you reduce the input space by a factor of 2. Thus the complexity of your program is O(log_2(N)).
If instead you were doing -
i = i * 3;
the complexity of your program would be O(log_3(N)).
It depends on an important question: is multiplication a constant-time operation?
In the real world it is usually treated as constant, because you work with fixed 32- or 64-bit numbers, and multiplying them always takes the same (= constant) time.
On the other hand, you then have the limitation that N must fit in 32/64 bits (or whatever width you use).
In theory, where you do not treat multiplication as a constant-time operation, or for special algorithms where N can grow too large to ignore the cost of multiplication, you are right: you have to start thinking about the complexity of multiplying.
The complexity of multiplying by a constant number (in this case 2): you have to go through each bit each time, and there are log_2(N) bits.
And you have to do this log_2(N) times before you reach N.
That ends with a complexity of log_2(N) * log_2(N) = O(log_2^2(N)).
PS: Akash has a good point that multiplying by 2 can be implemented as a constant-time operation, because the only thing you need to do in binary is append a zero (similar to multiplying by 10 in "human readable" decimal format: you just append a zero, 4333*10 = 43330).
However, if multiplication is not that simple (you have to go through all the bits), the previous answer is correct.
Let's say I have a function f defined on the interval [0,1], which is smooth and increases up to some point a, after which it starts decreasing. I have a grid x[i] on this interval, e.g. with a constant step size of dx = 0.01, and I would like to find which of those points has the highest value, using the smallest number of evaluations of f in the worst case. I think I can do much better than exhaustive search by applying something inspired by gradient-like methods. Any ideas? I was thinking of something like a binary search perhaps, or parabolic methods.
This is a bisection-like method I coded:
from math import floor

def optimize(f, a, b, fa, fb, dx):
    if b - a <= dx:
        return a if fa > fb else b
    else:
        m1 = 0.5*(a + b)
        m1 = _round(m1, a, dx)          # snap the midpoint onto the grid
        fm1 = fa if m1 == a else f(m1)
        m2 = m1 + dx                    # the next grid point to the right
        fm2 = fb if m2 == b else f(m2)
        if fm2 >= fm1:                  # increasing here: the maximum is to the right
            return optimize(f, m2, b, fm2, fb, dx)
        else:                           # decreasing here: the maximum is to the left
            return optimize(f, a, m1, fa, fm1, dx)

def _round(x, a, dx, right=False):
    return a + dx*(floor((x - a)/dx) + right)
The idea is: find the middle of the interval and compute m1 and m2, the grid points just to the left and right of it. If the function is increasing there, recurse on the right interval; otherwise recurse on the left. Whenever the interval becomes too small, just compare the values at its ends. However, this algorithm still does not use the derivative information at the points I have already computed.
Such a function is called unimodal.
Without computing the derivatives, you can work by
finding where the deltas f(x[i+1]) - f(x[i]) change sign, by dichotomy (the deltas are positive before the maximum and negative after it); this takes Log2(n) comparisons; this approach is very close to what you describe;
adapting the Golden section method to the discrete case; it takes Logφ(n) comparisons (φ~1.618).
Apparently, the Golden section is more costly, as φ < 2, but the dichotomic search actually takes two function evaluations per step, hence 2·Log2(n) = Log√2(n), and √2 < φ.
One can show that this is optimal, i.e. you can't go faster than O(Log(n)) for an arbitrary unimodal function.
If your function is very regular, the deltas will vary smoothly. You can think of interpolation search, which tries to better predict the searched position by a linear interpolation rather than simple halving. In favorable conditions, it can reach O(Log(Log(n))) performance. I don't know of an adaptation of this principle to the Golden search.
Actually, linear interpolation on the deltas is very close to parabolic interpolation on the function values. The latter approach might be the best for you, but you need to be careful about the corner cases.
If derivatives are allowed, you can use any root solving method on the first derivative, knowing that there is an isolated zero in the given interval.
If only the first derivative is available, use regula falsi. If the second derivative is available as well, you may consider Newton, but prefer a safe bracketing method.
I guess that the benefits of these approaches (superlinear and quadratic convergence) are made a little useless by the fact that you are working on a grid.
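A minimal sketch of the first approach (dichotomy on the sign of the deltas), assuming only that f increases and then decreases over the grid points x (the function name is mine):
def grid_argmax(f, x):
    """Bisect on the sign of the forward difference f(x[i+1]) - f(x[i]):
    positive deltas lie before the maximum, non-positive ones at or after it.
    Uses about 2*log2(len(x)) evaluations of f (two per halving step)."""
    lo, hi = 0, len(x) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if f(x[mid]) < f(x[mid + 1]):    # still on the increasing side
            lo = mid + 1
        else:                            # at or past the maximum
            hi = mid
    return x[lo]
Caching the evaluations, or switching to the Golden section variant that reuses one of the two probes per step, cuts the number of calls further.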
DISCLAIMER: I haven't tested the code. Take this as an "inspiration".
Let's say you have the following 11 points
x,f(x) = (0,3), (1,7), (2,9), (3,11), (4,13), (5,14), (6,16), (7,5), (8,3), (9,1), (10,-1)
you can do something inspired by the bisection method
a = 0, f(a) = 3 | b = 10, f(b) = -1 | c = (0+10)/2 = 5, f(5) = 14
From here you can see that the function is increasing on [a,c[, so there is no need to search there for the maximum (the function is still rising at c). The maximum has to be in the interval [c,b]. So at the next iteration you change the value of a so that a = c
a = 5, f(a) = 14 | b = 10, f(b) = -1 | c = (5+10)/2 = 7, f(7) = 5
This time the function is already decreasing at c, so the maximum lies in [a,c] and b is moved to the left (b = c)
you can iterate the process until a=b=c.
Here is the code that implements this idea:
#include <stdio.h>

int main(){
    #define STEP (0.01)
    #define SIZE (1/STEP)
    double vals[(int)SIZE];
    /* sample the test function on the grid */
    for (int i = 0; i < SIZE; ++i) {
        double x = i*STEP;
        vals[i] = -(x*x*x*x - (0.6)*(x*x));
    }
    for (int i = 0; i < SIZE; ++i) {
        printf("%f ", vals[i]);
    }
    printf("\n");
    int a = 0, b = SIZE - 1, c;
    double fa = vals[a], fb = vals[b], fc;
    c = (a + b) / 2;
    fc = vals[c];
    while (a != b && b != c && a != c) {
        printf("%i %i %i - %f %f %f\n", a, c, b, vals[a], vals[c], vals[b]);
        if (fc - vals[c-1] > 0) { /* is the function increasing in [a,c]? */
            a = c;
        } else {
            b = c;
        }
        c = (a + b) / 2;
        fa = vals[a];
        fb = vals[b];
        fc = vals[c];
    }
    printf("The maximum is %i=%f with %f\n", c, (c*STEP), vals[a]);
}
Find the points where the derivative of f(x), i.e. df/dx, equals 0.
For the derivative you could use a five-point stencil or a similar algorithm.
This should be O(n).
Then fit those points (where the derivative is 0) with a polynomial / least-squares regression.
This should also be O(N), assuming all those points are neighbours.
Then find the top of that fitted curve.
This shouldn't be more than O(M), where M is the resolution at which you sample the fitted function.
While taking the derivative, you could leap by k-length steps until the derivative changes sign.
When the derivative changes sign, take the square root of k and continue in the reverse direction.
When the derivative changes sign again, take the square root of the new k and change direction again.
Example: leap by 100 elements, find a sign change, set leap=10 and reverse direction, at the next change ==> leap=3 ... then it can be fixed to 1 element per step to find the exact location.
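A rough sketch of this leap-and-shrink idea in Python (the starting step k0, the use of a forward difference as the slope sign, and the stopping rule are all my choices, not prescribed above):
from math import isqrt

def peak_by_leaps(f, x, k0=100):
    """Jump k indices at a time; whenever the local slope sign disagrees with
    the travel direction, shrink the step to about sqrt(k) and reverse.
    Returns the index of the grid maximum of a unimodal f."""
    i, step, direction = 0, k0, +1
    while step >= 1:
        j = min(max(i + direction * step, 0), len(x) - 2)
        if j == i:                          # pinned at a boundary: the peak is at the edge
            break
        rising = f(x[j + 1]) > f(x[j])      # forward difference as a cheap slope sign
        if rising != (direction > 0):       # overshot the peak: shrink and turn around
            step = isqrt(step) if step > 1 else 0
            direction = -direction
        i = j
    lo, hi = max(i - 1, 0), min(i + 2, len(x) - 1)
    return max(range(lo, hi + 1), key=lambda t: f(x[t]))   # finish with a tiny local scan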
I am assuming that the function evaluation is very costly.
In the special case that your function can be approximately fitted by a polynomial, you can calculate the extremum with very few function evaluations. And since you know that there is only one maximum, a polynomial of degree 2 (quadratic) might be ideal.
For example: if f(x) can be represented by a polynomial of some known degree, say 2, then you can evaluate your function at any 3 points and calculate the polynomial coefficients using Newton's divided differences or Lagrange interpolation.
Then it's simple to solve for the maximum of this polynomial. For degree 2 you can easily get a closed-form expression for the maximum.
To get the final answer you can then search in the vicinity of the solution.
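For the degree-2 case, a small sketch (the helper name is mine): fit a parabola through three samples using divided differences and read the extremum off the closed form x = -b/(2a).
def parabola_vertex(x0, y0, x1, y1, x2, y2):
    """Fit y = a*x^2 + b*x + c through three points (Newton form) and return
    the x of the extremum, -b/(2a). Assumes the points are not collinear."""
    d1 = (y1 - y0) / (x1 - x0)          # first divided differences
    d2 = (y2 - y1) / (x2 - x1)
    a = (d2 - d1) / (x2 - x0)           # quadratic coefficient
    b = d1 - a * (x0 + x1)              # linear coefficient
    return -b / (2 * a)

# Example: f(x) = -(x - 0.3)**2 sampled at 0, 0.5 and 1 has its maximum at 0.3.
print(parabola_vertex(0.0, -0.09, 0.5, -0.04, 1.0, -0.49))   # -> 0.3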
I need to do something very similar to what is detailed in this post. But the way the stencils are constructed is not obvious to me... well, the stencil for _flux is, but the ones for temp_bz & temp_bx are not.
I think the picture would get clearer with variables instead of numbers (something like stencil = np.array([[a, b], [c, d]]) with a=0.5, b=...).
As an example, if the recurrence relation is
flux2[i,j] = a*flux2[i-1,j] + b*bz[i-1,j]*dx + c*flux2[i,j-1] - d*bx[i,j-1]*dz
how would the code be changed?
Given the variables flux2, bz and bx, and assuming they are numpy arrays (if they are not, they should be), you could write that equation in a vectorized form as follows:
flux2[1:,1:] = a * flux2[:-1,1:] + b * bz[:-1,1:] * dx + c * flux2[1:,:-1] - d * bx[1:,:-1] * dz
Note that, since you didn't mention dz, I assumed it is a constant; if it is a matrix of the same shape as flux2, replace it with dz[1:, 1:] (the same applies to dx).
That line vectorizes the operation over every i,j of the matrix and thus removes the for loops, giving a considerable speedup. Note, though, that the slices on the right-hand side read the old contents of flux2: if the recurrence is meant to use values already updated in the same sweep (as a sequential double loop over i and j would), the dependency is inherently serial and cannot be collapsed into a single slice assignment.
You would have to define the boundary conditions for row and column 0, as your equation doesn't define what to do in those special cases.
So, in short, since your stencil only uses one position for each variable and has only 4 interactions, I would say it is far faster to calculate it in its analytic form rather than convolving 3 images with almost-all-zero stencils (which would be quite a lot of overkill).
Check if an array of n integers contains 3 numbers which can form a triangle (i.e. the sum of any two of them is bigger than the third).
Apparently, this can be done in O(n) time.
(the obvious O(n log n) solution is to sort the array so please don't)
It's difficult to imagine N numbers (where N is moderately large) such that there is no triangle triplet. But let's try:
Consider a growing sequence where each next value is at the limit N[i] = N[i-1] + N[i-2]. This is nothing other than the Fibonacci sequence. Approximately, it can be seen as a geometric progression with a factor equal to the golden ratio (GRf ≈ 1.618).
It can be seen that if N_largest < N_smallest * GRf**(N-1), then there will surely be a triangle triplet. This criterion is quite fuzzy because of floating point versus integers and because GRf is a limit, not an exact geometric factor. Anyway, carefully implemented, it gives an O(n) test that can check whether a triplet is guaranteed to exist. If not, we have to perform some other tests (still thinking).
EDIT: A direct conclusion from the Fibonacci idea is that for integer input (as specified in the question) a solution is guaranteed to exist for any possible input whenever the size of the array is larger than log_GRf(MAX_INT), which is about 47 for 32 bits or 93 for 64 bits. Actually, we can use the largest value from the input array to tighten the bound.
This gives us a following algorithm:
Step 1) Find MAX_VAL from input data :O(n)
Step 2) Compute the minimum array size that would guarantee the existence of the solution:
N_LIMIT = log_base_GRf(MAX_VAL) : O(1)
Step 3.1) if N > N_LIMIT : return true : O(1)
Step 3.2) else sort and use direct method O(n*log(n))
Because for large values of N (and that's the only case when the complexity matters) the cost is O(n) (or even O(1) when N > log_base_GRf(MAX_INT)), we can say the algorithm is O(n).
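A sketch of the resulting algorithm in Python (the function name is mine; positive integers are assumed, as the Fibonacci argument requires, and exact Fibonacci counting is used instead of a floating-point logarithm to avoid the fuzziness mentioned above):
def has_triangle_triplet(a):
    """If no triplet existed, the sorted array would have to grow at least as
    fast as the Fibonacci sequence, so its maximum would reach the n-th
    Fibonacci number. If max(a) is smaller than that, a triplet is guaranteed
    and we answer in O(n); otherwise n = O(log(max(a))) and sorting is cheap."""
    n = len(a)
    if n < 3:
        return False
    max_val = max(a)                       # Step 1: O(n)
    f1, f2, count = 1, 1, 2                # Step 2: count Fibonacci terms <= max_val
    while f2 <= max_val:                   # about log_GRf(max_val) iterations
        f1, f2 = f2, f1 + f2
        count += 1
    if n >= count:                         # Step 3.1: triplet guaranteed by the growth argument
        return True
    b = sorted(a)                          # Step 3.2: the array is short, so sorting is cheap
    return any(b[i] + b[i + 1] > b[i + 2] for i in range(n - 2))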