Is there an iterative way to calculate radii along a scanline? - optimization

I am processing a series of points which all have the same Y value, but different X values. I go through the points by incrementing X by one. For example, I might have Y = 50 and X is the integers from -30 to 30. Part of my algorithm involves finding the distance to the origin from each point and then doing further processing.
After profiling, I've found that the sqrt call in the distance calculation is taking a significant amount of my time. Is there an iterative way to calculate the distance?
In other words:
I want to efficiently calculate r[n] = sqrt(x[n]*x[n] + y*y). I can save information from the previous iteration. Each iteration changes by incrementing x, so x[n] = x[n-1] + 1. I cannot use sqrt or trig functions because they are too slow, except at the beginning of each scanline.
I can use approximations as long as they are good enough (less than 0.1% error) and the errors introduced are smooth (I can't bin to a pre-calculated table of approximations).
Additional information:
x and y are always integers between -150 and 150
I'm going to try a couple ideas out tomorrow and mark the best answer based on which is fastest.
Results
I did some timings
Distance formula: 16 ms / iteration
Pete's interpolating solution: 8 ms / iteration
wrang-wrang's pre-calculation solution: 8 ms / iteration
I was hoping the test would decide between the two, because I like both answers. I'm going to go with Pete's because it uses less memory.

Just to get a feel for it, for your range y = 50: x = 0 gives r = 50, and x = +/- 30 gives r ~= 58.3. You want an approximation good to +/- 0.1%, or +/- 0.05 absolute. That's much lower accuracy than a typical library sqrt provides.
Two approximate approaches - you calculate r based on interpolating from the previous value, or use a few terms of a suitable series.
Interpolating from previous r
r = ( x^2 + y^2 )^(1/2)
dr/dx = (1/2) * 2x * ( x^2 + y^2 )^(-1/2) = x/r
double r = 50;
for ( int x = 0; x <= 30; ++x ) {
    double r_true = Math.sqrt ( 50*50 + x*x );
    System.out.printf ( "x: %d r_true: %f r_approx: %f error: %f%%\n", x, r_true, r, 100 * Math.abs ( r_true - r ) / r );
    // advance r using the slope dr/dx = x/r, evaluated at the midpoint x + 0.5
    r = r + ( x + 0.5 ) / r;
}
Gives:
x: 0 r_true: 50.000000 r_approx: 50.000000 error: 0.000000%
x: 1 r_true: 50.009999 r_approx: 50.010000 error: 0.000002%
....
x: 29 r_true: 57.801384 r_approx: 57.825065 error: 0.040953%
x: 30 r_true: 58.309519 r_approx: 58.335225 error: 0.044065%
which meets the requirement of 0.1% error, so I didn't bother coding the next one, as it would require quite a bit more calculation per step.
Truncated Series
The Taylor series for sqrt( 1 + x ) for x near zero is
sqrt( 1 + x ) = 1 + (1/2) x - (1/8) x^2 + ... + (-1/2)^(n+1) x^n + ...
Using r = y sqrt( 1 + (x/y)^2 ), you're looking for a term t = (-1/2)^(n+1) 0.36^n with magnitude less than 0.001; log( 0.002 ) > n log( 0.18 ) gives n > 3.6, so taking terms up to x^4 should be OK.
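For illustration, a minimal Java sketch of that series (my own, not from the answer; it uses the exact binomial coefficients 1, 1/2, -1/8, 1/16, -5/128 rather than the bound above). For this question's range, u = (x/y)^2 <= 0.36 and the truncation error is about 0.01%, comfortably within spec:

// Sketch: r = y * sqrt(1 + u) with u = (x/y)^2, series truncated after u^4.
// Only valid when |x| <= |y| (i.e. u <= 1); swap the roles of x and y otherwise.
static double seriesRadius(double x, double y) {
    double u = (x / y) * (x / y);
    double s = 1.0 + u * (0.5 + u * (-0.125 + u * (0.0625 - u * 5.0 / 128.0)));
    return Math.abs(y) * s;
}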

Y=10000
Y2=Y*Y
for x=0..Y do
    D[x]=sqrt(Y2+x*x)

norm(x,y)=
    if (y==0) x
    else if (x>y) norm(y,x)
    else {
        s=Y/y
        D[round(x*s)]/s
    }
If your coordinates are smooth, then the idea can be extended with linear interpolation. For more precision, increase Y.
The idea is that s*(x,y) is on the line y=Y, which you've precomputed distances for. Get the distance, then divide it by s.
I assume you really do need the distance and not its square.
You may also be able to find a general sqrt implementation that sacrifices some accuracy for speed, but I have a hard time imagining that beating what the FPU can do.
By linear interpolation, I mean changing D[round(t)], where t = x*s, to:
f=floor(t)
a=t-f
D[f]*(1-a)+D[f+1]*a
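Putting the table, the scaling, and the interpolation together, a runnable Java sketch (my own rendering of this answer's pseudocode; names are illustrative):

// Precompute distances along the line y = Y once, not per scanline.
static final int Y = 10000;
static final double[] D = new double[Y + 1];
static {
    for (int x = 0; x <= Y; ++x)
        D[x] = Math.sqrt((double) Y * Y + (double) x * x);
}

// Scale (x, y) onto the precomputed line, look up, then scale back.
static double norm(double x, double y) {
    x = Math.abs(x);
    y = Math.abs(y);
    if (y == 0) return x;
    if (x > y) { double tmp = x; x = y; y = tmp; }  // ensure x <= y
    double s = Y / y;
    double t = x * s;          // index into D, in [0, Y]
    int f = (int) t;
    double a = t - f;
    double d = (f < Y) ? D[f] * (1 - a) + D[f + 1] * a : D[Y];
    return d / s;
}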

This doesn't really answer your question, but may help...
The first questions I would ask would be:
"do I need the sqrt at all?".
"If not, how can I reduce the number of sqrts?"
then yours: "Can I replace the remaining sqrts with a clever calculation?"
So I'd start with:
Do you need the exact radius, or would radius-squared be acceptable? There are fast approximations to sqrt, but they're probably not accurate enough for your spec.
Can you process the image using mirrored quadrants or eighths? By processing all pixels at the same radius value in a batch, you can reduce the number of calculations by 8x.
Can you precalculate the radius values? You only need a table that is a quarter (or possibly an eighth) of the size of the image you are processing, and the table would only need to be precalculated once and then re-used for many runs of the algorithm (see the sketch below).
So clever maths may not be the fastest solution.
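For what it's worth, a sketch of the precalculation suggestion above (my own illustration, not the answerer's code), using quadrant symmetry so the table only covers |x|, |y|:

// r(x, y) = r(|x|, |y|), so one quadrant's table covers the whole
// -150..150 range from the question (151 x 151 floats, about 91 KB).
static final int MAX = 150;
static final float[][] RADIUS = new float[MAX + 1][MAX + 1];
static {
    for (int y = 0; y <= MAX; ++y)
        for (int x = 0; x <= MAX; ++x)
            RADIUS[y][x] = (float) Math.sqrt(x * x + y * y);
}

static float radius(int x, int y) {
    return RADIUS[Math.abs(y)][Math.abs(x)];  // one table lookup per pixel
}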

Well, there's always trying to optimize your sqrt; the fastest one I've seen is the old Carmack Quake 3 inverse square root:
http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
That said, since sqrt is non-linear, you're not going to be able to do simple linear interpolation along your line to get your result. The best idea is to use a table lookup since that will give you blazing fast access to the data. And, since you appear to be iterating by whole integers, a table lookup should be exceedingly accurate.
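For reference, a Java port of that trick (a sketch of the well-known magic-constant version described at the link). Note that it computes 1/sqrt(v), so the radius would be v * fastInvSqrt(v) with v = x*x + y*y, and with a single Newton step its worst-case error is around 0.17%, so it would need checking against the question's 0.1% requirement:

// Quake III style fast inverse square root, one Newton-Raphson step.
static float fastInvSqrt(float v) {
    float half = 0.5f * v;
    int i = Float.floatToIntBits(v);
    i = 0x5f3759df - (i >> 1);        // initial guess via bit-level manipulation
    float y = Float.intBitsToFloat(i);
    y = y * (1.5f - half * y * y);    // one refinement step
    return y;
}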

Well, you can mirror around x=0 to start with (you need only compute n >= 0, then dupe those results to the corresponding n < 0). After that, I'd take a look at using the derivative on sqrt(a^2+b^2) (or the corresponding sin) to take advantage of the constant dx.
If that's not accurate enough, may I point out that this is a pretty good job for SIMD, which will provide you with a reciprocal square root op on both SSE and VMX (and shader model 2).

This is sort of related to a HAKMEM item:
ITEM 149 (Minsky): CIRCLE ALGORITHM
Here is an elegant way to draw almost circles on a point-plotting display:

NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X

This makes a very round ellipse centered at the origin with its size determined by the initial point. epsilon determines the angular velocity of the circulating point, and slightly affects the eccentricity. If epsilon is a power of 2, then we don't even need multiplication, let alone square roots, sines, and cosines! The "circle" will be perfectly stable because the points soon become periodic.

The circle algorithm was invented by mistake when I tried to save one register in a display hack! Ben Gurley had an amazing display hack using only about six or seven instructions, and it was a great wonder. But it was basically line-oriented. It occurred to me that it would be exciting to have curves, and I was trying to get a curve display hack with minimal instructions.
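A minimal Java sketch of that recurrence (mine, for flavor; with a power-of-two epsilon the multiplications could become shifts in integer code):

double epsilon = 1.0 / 16;   // power of 2
double x = 50, y = 0;        // the initial point sets the radius
for (int step = 0; step < 100; ++step) {
    x = x - epsilon * y;
    y = y + epsilon * x;     // deliberately uses the NEW x, per the item
    System.out.printf("%.3f %.3f%n", x, y);
}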

Related

Big O notation and measuring time according to it

Suppose we have an algorithm that is of order O(2^n). Furthermore, suppose we multiplied the input size n by 2, so now we have an input of size 2n. How is the time affected? Do we look at the problem as if the original time was 2^n and now it became 2^(2n), so the answer would be that the new time is the square of the previous time?
Big O is not for telling you the actual running time, just how the running time is affected by the size of the input. If you double the size of the input the complexity is still O(2^n), n is just bigger.
number of elements (n)    units of work (2^n)
 1                              2
 2                              4
 3                              8
 4                             16
 5                             32
...                           ...
10                           1024
20                        1048576

Since 2^(2n) = (2^n)^2, doubling n squares the units of work: compare n=10 (1024) with n=20 (1048576 = 1024^2).
There's a misunderstanding here about how Big-O relates to execution time.
Consider the following formulas which define execution time:
f1(n) = 2^n + 5000n^2 + 12300
f2(n) = (500 * 2^n) + 6
f3(n) = 500n^2 + 25000n + 456000
f4(n) = 400000000
Each of these functions is O(2^n); that is, each can be shown to be less than M * 2^n for some constant M and starting value n0. But obviously, the change in execution time you notice for doubling the size from n1 to 2 * n1 varies wildly between them (not at all in the case of f4(n)). You cannot use Big-O analysis to determine effects on execution time. It only defines an upper bound on the execution time (which is not even guaranteed to be the tightest upper bound).
Some related academia below:
There are three notable bounding functions in this category:
O(f(n)): Big-O - This defines an upper bound.
Ω(f(n)): Big-Omega - This defines a lower bound.
Θ(f(n)): Big-Theta - This defines a tight bound.
A given time function f(n) is Θ(g(n)) only if it is also Ω(g(n)) and O(g(n)) (that is, both upper and lower bounded).
You are dealing with Big-O, which is the usual "entry point" to the discussion; we will neglect the other two entirely.
Consider the definition from Wikipedia:
Let f and g be two functions defined on some subset of the real numbers. One writes:
f(x)=O(g(x)) as x tends to infinity
if and only if there is a positive constant M such that for all sufficiently large values of x, the absolute value of f(x) is at most M multiplied by the absolute value of g(x). That is, f(x) = O(g(x)) if and only if there exists a positive real number M and a real number x0 such that
|f(x)| <= M|g(x)| for all x > x0
Going from here, assume we have f(n) = 2^n. If we were to compare that to g(n) = 2^(2n) = 4^n, how would f and g relate to each other in Big-O terms?
Is 2^n <= M * 4^n for some constant M and starting value n0? Of course! Using M = 1 and n0 = 1, it is true. Thus, 2^n is upper-bounded by O(4^n).
Is 4^n <= M * 2^n for some constant M and starting value n0? This is where you run into problems... 4^n / 2^n = 2^n, which grows without bound, so no constant M can keep 4^n below M * 2^n as n gets arbitrarily large. Thus, 4^n is not upper-bounded by O(2^n).
See comments for further explanation, but indeed, this is just an example I came up with to help you grasp the Big-O concept; it is not the actual algorithmic meaning.
Suppose you have an array, arr = [1, 2, 3, 4, 5].
An example of an O(1) operation would be directly accessing an index, such as arr[0] or arr[2].
An example of a O(n) operation would be a loop that could iterate through all your array, such as for elem in arr:.
n would be the size of your array. If your array is twice as big as the original array, n would also be twice as big. That's how variables work.
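In code terms, the two cases above look like this (a trivial Java sketch):

int[] arr = {1, 2, 3, 4, 5};

int first = arr[0];        // O(1): one step, no matter how long arr is
int third = arr[2];        // O(1) as well

int sum = 0;
for (int elem : arr) {     // O(n): the body runs once per element
    sum += elem;
}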
See Big-O Cheat Sheet for complementary information.

Calculating Collision Times Between Two Circles - Physics

I keep stumbling into game/simulation solutions for finding distance while time is running, and it's not what I'm looking for.
I'm looking for an O(1) formula to calculate the (0 or 1 or 2) clock time(s) in which two circles are exactly r1+r2 distance from each other. Negative time is possible. It's possible two circles don't collide, and they may not have an intersection (as in 2 cars "clipping" each other while driving too close to the middle of the road in opposite directions), which is messing up all my mx+b solutions.
Technically, a single point collision should be possible.
I'm about 100 lines of code deep, and I feel sure there must be a better way, and I'm not even sure whether my test cases are correct or not. My initial setup was:
dist( x1+dx1*t, y1+dy1*t, x2+dx2*t, y2+dy2*t ) == r1+r2
By assuming the distance at any time t could be calculated with Pythagoras, I would like to know the two points in time in which the distance from the centers is precisely the sum of the radii. I solved for a, b, and c and applied the quadratic formula, and I believe that if I'm assuming they were phantom objects, this would give me the first moment of collision and the final moment of collision, and I could assume at every moment between, they are overlapping.
I'm working under the precondition that it's impossible for 2 objects to be overlapping at t0, which means infinite collision of "stuck inside each other" is not possible. I'm also filtering out and using special handling for when the slope is 0 or infinite, which is working.
I tried calculating, at the moment object 1 is at the paths' intersection point, its distance from object 2 (and likewise when object 2 is at the intersection point), but this did not work, as it's possible to have a collision when neither object is at the intersection.
I'm having problems when the slopes are equal but the velocities differ in magnitude.
Is there a simple physics/math formula for this already?
Programming language doesn't matter, pseudocode would be great, or any math formula that doesn't have complex symbols (I'm not a math/physics person)... but nothing higher order (I assume Python probably has a collide(p1, p2) method already)
There is a simple(-ish) solution. You already mentioned using the quadratic formula which is a good start.
First define your problem where the quadratic formula can be useful, in this case, distance between two centers, over time.
Let's define our time as t
Because we are using two dimensions we can call our dimensions x & y
First let's define the two center points at t = 0 of our circles as a & b
Let's also define our velocity at t = 0 of a & b as u & v respectively.
Finally, assuming a constant acceleration of a & b as o & p respectively.
The equation for a position along any one dimension (which we'll call i) with respect to time t is as follows: i(t) = 1 / 2 * a * t^2 + v * t + i0; with a being constant acceleration, v being initial velocity, and i0 being initial position along dimension i.
We know the distance between two 2D points at any time t is the square root of ((a.x(t) - b.x(t))^2 + (a.y(t) - b.y(t))^2)
Using the formula for position along a dimension, we can substitute everything in the distance equation in terms of just t and the constants we defined earlier. For shorthand we will call the resulting function d(t).
Finally using that equation, we will know that the t values where d(t) = a.radius + b.radius are where collision starts or ends.
To put this in terms of quadratic formula we move the radius to the left so we get d(t) - (a.radius + b.radius) = 0
We can then expand and simplify the resulting equation so everything is in terms of t and the constant values that we were given. Using that solve for both positive & negative values with the quadratic formula.
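For the constant-velocity case (zero acceleration), which the answer below implements, the expansion works out as follows. Writing dp = a0 - b0 for the initial center offset, dv = u - v for the relative velocity, and R = a.radius + b.radius, squaring both sides of d(t) = R gives A*t^2 + B*t + C = 0 with

A = dv . dv
B = 2 * (dp . dv)
C = dp . dp - R^2

so the collision and exit times are t = (-B +/- sqrt(B^2 - 4*A*C)) / (2*A).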
This also handles the no-collision case: if the two objects never come within the sum of the radii of each other, the discriminant is negative and the quadratic formula yields no real (only imaginary) solutions.
You should be able to translate the rest into code fairly easily. I'm running out of time atm and will write out a simple solution when I can.
Following up on @TinFoilPancakes' answer and heavily using WolframAlpha to simplify the formulae, I've come up with the following pseudocode, well, C# code actually, that I've commented somewhat:
The Ball class has the following properties:
public double X;
public double Y;
public double Xvel;
public double Yvel;
public double Radius;
The algorithm:
public double TimeToCollision(Ball other)
{
double distance = (Radius + other.Radius) * (Radius + other.Radius); // R^2: squared sum of radii
double a = (Xvel - other.Xvel) * (Xvel - other.Xvel) + (Yvel - other.Yvel) * (Yvel - other.Yvel); // |dv|^2
double b = 2 * ((X - other.X) * (Xvel - other.Xvel) + (Y - other.Y) * (Yvel - other.Yvel)); // 2 * dp.dv
double c = (X - other.X) * (X - other.X) + (Y - other.Y) * (Y - other.Y) - distance; // |dp|^2 - R^2
double d = b * b - 4 * a * c; // discriminant
// Ignore glancing collisions that may not cause a response due to limited precision and lead to an infinite loop
if (b > -1e-6 || d <= 0)
return double.NaN;
double e = Math.Sqrt(d);
double t1 = (-b - e) / (2 * a); // Collison time, +ve or -ve
double t2 = (-b + e) / (2 * a); // Exit time, +ve or -ve
// b < 0 => Getting closer
// If we are overlapping and moving closer, collide now
if (t1 < 0 && t2 > 0 && b <= -1e-6)
return 0;
return t1;
}
The method will return the time at which the Balls collide, which can be +ve, -ve, or NaN; NaN means they won't or didn't collide.
Further points to note: we can check the discriminant against zero to bail out early, which will be the common case, and avoid the Sqrt. Also, since I'm using this in a continuous collision detection system, I'm ignoring glancing collisions that will have little or no impact, since it's possible the response to the collision won't change the velocities, leading to the same situation being checked infinitely and freezing the simulation.
The 'b' variable can be used for this check since luckily it's similar to the dot product. If b > -1e-6, i.e. they're not moving closer fast enough, we return NaN, i.e. they don't collide. You can tweak this value to avoid freezes; smaller will allow closer glancing collisions but increase the chance of a freeze when they happen, like when a bunch of circles are packed tightly together. Likewise, to avoid Balls moving through each other, we signal an immediate collision if they're already overlapping and moving closer.

Smooth Coloring Mandelbrot Set Without Complex Number Library

I've coded a basic Mandelbrot explorer in C#, but I have those horrible bands of color, and it's all greyscale.
I have the equation for smooth coloring:
mu = N + 1 - log (log |Z(N)|) / log 2
Where N is the escape count, and |Z(N)| is the modulus of the complex number after the value has escaped, it's this value which I'm unsure of.
My code is based off the pseudo code given on the wikipedia page: http://en.wikipedia.org/wiki/Mandelbrot_set#For_programmers
The complex number is represented by the real values x and y, using this method, how would I calculate the value of |Z(N)| ?
|Z(N)| means the distance to the origin, so you can calculate it via sqrt(x*x + y*y).
If you run into an error with the logarithm, check the iteration count first. If the point is part of the Mandelbrot set (iteration == max_iteration), the value never escapes, so the inner logarithm can be zero or negative and the outer one will fail.
So just use this snippet instead of your old return code:
if (i < iterations)
{
return i + 1 - Math.Log(Math.Log(Math.Sqrt(x * x + y * y))) / Math.Log(2);
}
return i;
Later, divide the result by max_iterations and multiply it by 255. This will give you a nice RGB value.
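Incidentally, since log(sqrt(v)) = log(v) / 2, the Math.Sqrt call in the snippet can be avoided by writing the inner logarithm as Math.Log(x * x + y * y) / 2.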

Is it possible to optimize this Matlab code for doing vector quantization with centroids from k-means?

I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
labels = zeros(size(X, 1), 1);
% for each X, calculate the distance from each centroid
for i = 1:size(X, 1)
% distance of X_i from all j centroids is: sum((X_i - centroid_j)^2)
% note: we leave off the sqrt as an optimization
distances = sum(bsxfun(@minus, centroids, X(i, :)) .^ 2, 2);
[value, label] = min(distances);
labels(i) = label;
end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is the for-loop, which is the bane of good performance on Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone knows of any other way to speed this up, I would greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the binomial form of the euclidean distance ((x-y)^2 -> x^2 + y^2 - 2xy), which from what I've read usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);  % N x 1 squared norms
YY = sum(center .^ 2, 2);   % K x 1 squared norms
[~, labels] = min(bsxfun(@plus, XX, YY') - 2*data*center', [], 2);
Use the following function to calculate your distances. You should see an order of magnitude speed up
The two matrices A and B have the columns as the dimensions and the rows as each point.
A is your matrix of centroids. B is your matrix of datapoints.
function D=getSim(A,B)
Qa=repmat(dot(A,A,2),1,size(B,1));  % squared norms of centroids, replicated
Qb=repmat(dot(B,B,2),1,size(A,1));  % squared norms of datapoints, replicated
D=Qa+Qb'-2*A*B';                    % squared distances: |a|^2 + |b|^2 - 2*a.b
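To get labels from this, take the minimum along the centroid dimension; since D comes back as (number of centroids) x (number of datapoints), something like [~, labels] = min(getSim(centroids, X)) should give one label per datapoint.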
You can vectorize it by converting to cells and using cellfun:
[nRows,nCols]=size(X);
XCell=num2cell(X,2);
dist=reshape(cell2mat(cellfun(@(x)(sum(bsxfun(@minus,centroids,x).^2,2)),XCell,'UniformOutput',false)),size(centroids,1),nRows);
[~,labels]=min(dist);
Explanation:
We assign each row of X to its own cell in the second line
This piece @(x)(sum(bsxfun(@minus,centroids,x).^2,2)) is an anonymous function which is the same as your distances=... line, and using cell2mat, we apply it to each row of X.
The labels are then the indices of the minimum row along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest neighbor search than brute force.
The most popular approach is the k-d tree, with O(log(n)) average query time instead of the O(n) brute-force complexity (though note that k-d trees lose much of their advantage in very high-dimensional spaces such as your 300 features).
Regarding a Matlab implementation of k-d trees, you can have a look here

Fast formula for a "high contrast" curve

My inner loop contains a calculation that profiling shows to be problematic.
The idea is to take a greyscale pixel x (0 <= x <= 1), and "increase its contrast". My requirements are fairly loose, just the following:
for x < .5, 0 <= f(x) < x
for x > .5, x < f(x) <= 1
f(0) = 0
f(x) = 1 - f(1 - x), i.e. it should be "symmetric"
Preferably, the function should be smooth.
So the graph must look something like this:
[figure: an S-shaped curve from (0,0) to (1,1), crossing y = x at x = .5]
I have two implementations (their results differ but both are conformant):
float cosContrastize(float x) {
return .5 - cos(x * pi) / 2;
}
float mulContrastize(float i) {
if (i < .5) return i * i * 2;
i = 1 - i;
return 1 - i * i * 2;
}
So I request either a microoptimization for one of these implementations, or an original, faster formula of your own.
Maybe one of you can even twiddle the bits ;)
Consider the following sigmoid-shaped functions (properly translated to the desired range):
error function
normal CDF
tanh
logit
I generated the above figure using MATLAB. If interested here's the code:
x = -3:.01:3;
plot( x, 2*(x>=0)-1, ...
x, erf(x), ...
x, tanh(x), ...
x, 2*normcdf(x)-1, ...
x, 2*(1 ./ (1 + exp(-x)))-1, ...
x, 2*((x-min(x))./range(x))-1 )
legend({'hard' 'erf' 'tanh' 'normcdf' 'logit' 'linear'})
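As a concrete example of the tanh option, a small Java sketch (the endpoint normalization is my own, not from the answer; k controls the steepness):

// tanh sigmoid remapped so that f(0) = 0, f(.5) = .5, f(1) = 1,
// and the symmetry f(x) = 1 - f(1 - x) holds exactly.
static float tanhContrastize(float x, float k) {
    double num = Math.tanh(k * (x - 0.5));
    double den = 2.0 * Math.tanh(k / 2.0);  // normalizes the endpoints
    return (float) (0.5 + num / den);
}
// e.g. tanhContrastize(x, 4.0f) gives a moderate contrast boost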
Trivially you could simply threshold, but I imagine this is too dumb:
return i < 0.5 ? 0.0 : 1.0;
Since you mention 'increasing contrast' I assume the input values are luminance values. If so, and they are discrete (perhaps it's an 8-bit value), you could use a lookup table to do this quite quickly.
Your 'mulContrastize' looks reasonably quick. One optimization would be to use integer math. Let's say, again, your input values could actually be passed as an 8-bit unsigned value in [0..255]. (Again, possibly a fine assumption?) You could do something roughly like...
int mulContrastize(int i) {
  if (i < 128) return (i * i) >> 7;
  // The shift is really: * 2 / 256
  i = 255 - i;
  return 255 - ((i * i) >> 7);
}
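Combining that with the lookup-table suggestion above, a sketch (mine) that pays the multiplies only once per possible value, at startup:

// 256-entry table for 8-bit luminance, precomputed with the same curve
// as the integer mulContrastize above; per pixel it's one array index.
static final int[] CONTRAST_LUT = new int[256];
static {
    for (int i = 0; i < 256; ++i)
        CONTRAST_LUT[i] = (i < 128) ? (i * i) >> 7
                                    : 255 - (((255 - i) * (255 - i)) >> 7);
}
// usage: out = CONTRAST_LUT[in];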
A piecewise interpolation can be fast and flexible. It requires only a few decisions followed by a multiplication and addition, and can approximate any curve. It also avoids the coarseness that can be introduced by lookup tables (or the additional cost of two lookups followed by an interpolation to smooth this out), though the LUT might work perfectly fine for your case.
With just a few segments, you can get a pretty good match. Here there will be coarseness in the color gradients, which will be much harder to detect than coarseness in the absolute colors.
As Eamon Nerbonne points out in the comments, segmentation can be optimized by "choos[ing] your segmentation points based on something like the second derivative to maximize detail", that is, where the slope is changing the most. Clearly, in my posted example, having three segments in the middle of the five segment case doesn't add much more detail.