Code optimisation of math.asin function vb.net - vb.net

I have the following line of code:
SomeDouble= constant1/ ((a * b) * (Math.Asin((c- a) / (a * d)) + constant2))
The two constants are different and calculated out of the loops, a - d are variables that change each time.
And on the face of it it's pretty fast 0.002ms on average (47,633.588s for 26,508,249 hits). The issue I'm having is it's going to be called billions of times, literally around 20 billion hits each time the software is run. So if I can cut this down to 0.001ms the difference will be substantial. I know that dividing is a very slow process and I expect calculating arcsin is also slow. If anyone can suggest if there's a faster method of calculating arcsin or any other help in speeding up this line of code that would be great. On a side note any advice on whether vb.net's built in math functions are optimised for speed would be great I've noticed that math.sqrt(somevalue) is quicker than (somevalue)^0.5.
Thanks in advanced!

I'd do some tests to make sure that the Math.Asin really is the slowest part of the formula. If it really is slow relative to the other multiplications and divisions, then you could try implementing your own lookup table for Math.Asin. In other words, calculate millions of Math.Asin values in advance, and code them into your program. This would trade the size of your program for speed, so if program size doesn't matter, it might help.

You can use the following approximation for Asin, in my tests it was just under twice as fast as Math.Asin (32bit, better if manually inlined, under 64bit it was slightly better than twice as fast) and it looked reasonably accurate, but you'd have to test whether the accuracy is acceptable.
static double Asin(double x)
{
double x2 = x * x;
double x3 = x2 * x;
const double piover2 = 1.5707963267948966;
const double a = 1.5707288;
const double b = -0.2121144;
const double c = 0.0742610;
const double d = -0.0187293;
return piover2 - Math.Sqrt(1 - x) * (a + b * x + c * x2 + d * x3);[]
}
(this is C# of course, but you can convert it I'm sure)
[EDIT]
Here is the VB version.
Shared Function Asin(ByVal x As Double) As Double
Dim x2 As Double = x * x
Dim x3 As Double = x2 * x
Const piover2 As Double = 1.5707963267948966
Const a As Double = 1.5707288
Const b As Double = -0.2121144
Const c As Double = 0.0742610
Const d As Double = -0.0187293
Return piover2 - Math.Sqrt(1 - x) * (a + b * x + c * x2 + d * x3)
End Function

You could try to implement the series expansion to whatever precision you require, and compare that against Stochastically's suggestion of a lookup table (which may well be how Math.asin ultimately calculates it) built to the precision required. It seems unlikely that you could enjoy the halving of processing time you'd like, however.
If the calculations aren't meaningfully dependent upon one another (or if the dependencies can be isolated into different batches), you might try to run them in parallel (whether on different systems or with different processors), but be wary -- I worked at a space physics lab, and we found that the precision we required generated really annoying anomalies when we ran tests on different systems.
(I also must say, I'm incredibly curious as to why you'd need to run billions of arc-sine calculations.)

Related

Are calculations involving a large matrix using arrays in VBA faster than doing the same calculation manually in Excel?

I am trying to do calculations as part of regression model in Excel.
I need to calculate ((X^T)WX)^(-1)(X^T)WY. Where X, W, Y are matrices and ^T and ^-1 are denoting the matrix transpose and inverting operation.
Now when X, W, Y are of small dimensions I simply run my macro which calculates these values very very fast.
However sometimes I am dealing with the case when say, the dimensions of X, W, Y are 5000 X 5, 5000 X 1 and 5000 X 1 respectively, then the macro can take a lot longer to run.
I have two questions:
Would, instead of using my macro which generates the matrices on Excel sheets and then uses Excel formulas like MMULT and MINVERSE etc. to calculate the output, it be faster for larger dimension matrices if I used arrays in VBA to do all the calculations? (I am not too sure how arrays work in VBA so I don't actually know if it would do anything to excel, and hence if it would be any quicker/less computationally intensive.)
If the answer to the above question is no it would be no quicker. Then does anybody have an idea how to speed such calculations up? Or do I need to simply put up with it and wait.
Thanks for your time.
Considering that the algorithm of the code is the same, the speed ranking is the following:
Dll custom library with C#, C++, C, Java or anything similar
VBA
Excel
I have compared a VBA vs C++ function here, in the long term the result is really bad for VBA.
So, the following Fibonacci with recursion in C++:
int __stdcall FibWithRecursion(int & x)
{
int k = 0;
int p = 0;
if (x == 0)
return 0;
if (x == 1)
return 1;
k = x - 1;
p = x - 2;
return FibWithRecursion(k) + FibWithRecursion(p);
}
is exponentially better, when called in Excel, than the same complexity function in VBA:
Public Function FibWithRecursionVBA(ByRef x As Long) As Long
Dim k As Long: k = 0
Dim p As Long: p = 0
If (x = 0) Then FibWithRecursionVBA = 0: Exit Function
If (x = 1) Then FibWithRecursionVBA = 1: Exit Function
k = x - 1
p = x - 2
FibWithRecursionVBA = FibWithRecursionVBA(k) + FibWithRecursionVBA(p)
End Function
Better late than never:
I use matrices that are bigger, 3 or 4 dimensions, sized like 16k x 26 x 5.
I run through them to find data, apply one or two formulas or make combos with other matrices.
Number one, after starting the macro, open another application like notepad, you might have a nice speed increase ☺ !
Then, I guess you switched of screen updating etc, and turned of automatic calculation
As last: don't put the data in cells, not in arrays.
Just something like:
Dim Matrix1 as String ===>'put it in declarations if you want to use it in other macros as well. Remember you can not do "blabla=activecell.value2" etc anymore!!
In the "Sub()" code, use ReDim Matrix1(1 to a_value, 1 to 2nd_value, ... , 1 to last_value)
Matrix1(45,32,63)="what you want to put there"
After running, just drop the
Matrix1(1 to a_value, 1 to 2nd_value,1) at 1st sheet,
Matrix1(1 to a_value, 1 to 2nd_value,2) at 2nd sheet, etc
Switch on screen updating again, etc
In this way my calculation went from 45 minutes to just one, by avoiding the intermediary screen update
Success, I hope it is useful for somebody

Calculating Collision Times Between Two Circles - Physics

I keep stumbling into game/simulation solutions for finding distance while time is running, and it's not what I'm looking for.
I'm looking for an O(1) formula to calculate the (0 or 1 or 2) clock time(s) in which two circles are exactly r1+r2 distance from each other. Negative time is possible. It's possible two circles don't collide, and they may not have an intersection (as in 2 cars "clipping" each other while driving too close to the middle of the road in opposite directions), which is messing up all my mx+b solutions.
Technically, a single point collision should be possible.
I'm about 100 lines of code deep, and I feel sure there must be a better way, and I'm not even sure whether my test cases are correct or not. My initial setup was:
dist( x1+dx1*t, y1+dy1*t, x2+dx2*t, y2+dy2*t ) == r1+r2
By assuming the distance at any time t could be calculated with Pythagoras, I would like to know the two points in time in which the distance from the centers is precisely the sum of the radii. I solved for a, b, and c and applied the quadratic formula, and I believe that if I'm assuming they were phantom objects, this would give me the first moment of collision and the final moment of collision, and I could assume at every moment between, they are overlapping.
I'm working under the precondition that it's impossible for 2 objects to be overlapping at t0, which means infinite collision of "stuck inside each other" is not possible. I'm also filtering out and using special handling for when the slope is 0 or infinite, which is working.
I tried calculating the distance when, at the moment object 1 is at the intersection point, it's distance from object 2, and likewise when o2 is at the intersection point, but this did not work as it's possible to have collision when they are not at their intersection.
I'm having problems for when the slopes are equal, but different magnitude.
Is there a simple physics/math formula for this already?
Programming language doesn't matter, pseudcode would be great, or any math formula that doesn't have complex symbols (I'm not a math/physics person)... but nothing higher order (I assume python probably has a collide(p1, p2) method already)
There is a simple(-ish) solution. You already mentioned using the quadratic formula which is a good start.
First define your problem where the quadratic formula can be useful, in this case, distance between to centers, over time.
Let's define our time as t
Because we are using two dimensions we can call our dimensions x & y
First let's define the two center points at t = 0 of our circles as a & b
Let's also define our velocity at t = 0 of a & b as u & v respectively.
Finally, assuming a constant acceleration of a & b as o & p respectively.
The equation for a position along any one dimension (which we'll call i) with respect to time t is as follows: i(t) = 1 / 2 * a * t^2 + v * t + i0; with a being constant acceleration, v being initial velocity, and i0 being initial position along dimension i.
We know the distance between two 2D points at any time t is the square root of ((a.x(t) - b.x(t))^2 + (a.y(t) - b.y(t))^2)
Using the formula of position along a dimensions we can substitute everything in the distance equation in terms of just t and the constants we defined earlier. For shorthand we will call the function d(t);
Finally using that equation, we will know that the t values where d(t) = a.radius + b.radius are where collision starts or ends.
To put this in terms of quadratic formula we move the radius to the left so we get d(t) - (a.radius + b.radius) = 0
We can then expand and simplify the resulting equation so everything is in terms of t and the constant values that we were given. Using that solve for both positive & negative values with the quadratic formula.
This will handle errors as well because if you get two objects that will never collide, you will get an undefined or imaginary number.
You should be able to translate the rest into code fairly easily. I'm running out of time atm and will write out a simple solution when I can.
Following up on #TinFoilPancakes answer and heavily using using WolframAlpha to simplify the formulae, I've come up with the following pseudocode, well C# code actually that I've commented somewhat:
The Ball class has the following properties:
public double X;
public double Y;
public double Xvel;
public double Yvel;
public double Radius;
The algorithm:
public double TimeToCollision(Ball other)
{
double distance = (Radius + other.Radius) * (Radius + other.Radius);
double a = (Xvel - other.Xvel) * (Xvel - other.Xvel) + (Yvel - other.Yvel) * (Yvel - other.Yvel);
double b = 2 * ((X - other.X) * (Xvel - other.Xvel) + (Y - other.Y) * (Yvel - other.Yvel));
double c = (X - other.X) * (X - other.X) + (Y - other.Y) * (Y - other.Y) - distance;
double d = b * b - 4 * a * c;
// Ignore glancing collisions that may not cause a response due to limited precision and lead to an infinite loop
if (b > -1e-6 || d <= 0)
return double.NaN;
double e = Math.Sqrt(d);
double t1 = (-b - e) / (2 * a); // Collison time, +ve or -ve
double t2 = (-b + e) / (2 * a); // Exit time, +ve or -ve
// b < 0 => Getting closer
// If we are overlapping and moving closer, collide now
if (t1 < 0 && t2 > 0 && b <= -1e-6)
return 0;
return t1;
}
The method will return the time that the Balls collide, which can be +ve, -ve or NaN, NaN means they won't or didn't collide.
Further points to note are, we can check the discriminant against <zero to bail out early which will be most of the time, and avoid the Sqrt. Also since I'm using this in a continuous collision detection system, I'm ignoring collisions (glancing) that will have little or no impact since it's possible the response to the collision won't change the velocities and lead to the same situation being checked infinitely, freezing the simulation.
The 'b' variable can used for this check since luckily it's similar to the dot product. If b is >-1e-6 ie. they're not moving closer fast enough we return NaN, ie. they don't collide. You can tweak this value to avoid freezes, smaller will allow closer glancing collisions but increase the chance of a freeze when they happen like when a bunch of circles are packed tightly together. Likewise to avoid Balls moving through each other we signal an immediate collison if they're already overlapping and moving closer.

Is *0.25 faster than / 4 in VB.NET

I know similar questions have been answered before but none of the answers I could find were specific to .NET.
Does the VB.NET compiler optimize expressions like:
x = y / 4
by compiling:
x = y * 0.25
And before anyone says don't worry the difference is small, i already know that but this will be executed a lot and choosing one over the other could make a useful difference in total execution time and will be much easier to do than a more major refactoring exercise.
Perhaps I should have mentioned for the benefit of those who live in an environment of total freedom: I am not at liberty to change to a different language. If I were I would probably have written this code in Fortran.
As suggested here is a simple comparison:
Dim y = 1234.567
Dim x As Double
Dim c = 10000000000.0
Dim ds As Date
Dim df As Date
ds = Now
For i = 1 To c
x = y / 4
Next
df = Now
Console.WriteLine("divide " & (df - ds).ToString)
ds = Now
For i = 1 To c
x = y * 0.25
Next
df = Now
Console.WriteLine("multiply " & (df - ds).ToString)
The output is:
divide 00:00:52.7452740
multiply 00:00:47.2607256
So divide does appear to be slower by about 10%. But this difference is so small that I suspected it to be accidental. Another two runs give:
divide 00:00:45.1280000
multiply 00:00:45.9540000
divide 00:00:45.9895985
multiply 00:00:46.8426838
Suggesting that in fact the optimization is made or that the arithmetic operations are a vanishingly small part of the total time.
In either case it means that I don't need to care which is used.
In fact ildasm shows that the IL uses div in the first loop and mul in the second. So it doesn't make the subsitution after all. Unless the JIT compiler does.

Is there an iterative way to calculate radii along a scanline?

I am processing a series of points which all have the same Y value, but different X values. I go through the points by incrementing X by one. For example, I might have Y = 50 and X is the integers from -30 to 30. Part of my algorithm involves finding the distance to the origin from each point and then doing further processing.
After profiling, I've found that the sqrt call in the distance calculation is taking a significant amount of my time. Is there an iterative way to calculate the distance?
In other words:
I want to efficiently calculate: r[n] = sqrt(x[n]*x[n] + y*y)). I can save information from the previous iteration. Each iteration changes by incrementing x, so x[n] = x[n-1] + 1. I can not use sqrt or trig functions because they are too slow except at the beginning of each scanline.
I can use approximations as long as they are good enough (less than 0.l% error) and the errors introduced are smooth (I can't bin to a pre-calculated table of approximations).
Additional information:
x and y are always integers between -150 and 150
I'm going to try a couple ideas out tomorrow and mark the best answer based on which is fastest.
Results
I did some timings
Distance formula: 16 ms / iteration
Pete's interperlating solution: 8 ms / iteration
wrang-wrang pre-calculation solution: 8ms / iteration
I was hoping the test would decide between the two, because I like both answers. I'm going to go with Pete's because it uses less memory.
Just to get a feel for it, for your range y = 50, x = 0 gives r = 50 and y = 50, x = +/- 30 gives r ~= 58.3. You want an approximation good for +/- 0.1%, or +/- 0.05 absolute. That's a lot lower accuracy than most library sqrts do.
Two approximate approaches - you calculate r based on interpolating from the previous value, or use a few terms of a suitable series.
Interpolating from previous r
r = ( x2 + y2 ) 1/2
dr/dx = 1/2 . 2x . ( x2 + y2 ) -1/2 = x/r
double r = 50;
for ( int x = 0; x <= 30; ++x ) {
double r_true = Math.sqrt ( 50*50 + x*x );
System.out.printf ( "x: %d r_true: %f r_approx: %f error: %f%%\n", x, r, r_true, 100 * Math.abs ( r_true - r ) / r );
r = r + ( x + 0.5 ) / r;
}
Gives:
x: 0 r_true: 50.000000 r_approx: 50.000000 error: 0.000000%
x: 1 r_true: 50.010000 r_approx: 50.009999 error: 0.000002%
....
x: 29 r_true: 57.825065 r_approx: 57.801384 error: 0.040953%
x: 30 r_true: 58.335225 r_approx: 58.309519 error: 0.044065%
which seems to meet the requirement of 0.1% error, so I didn't bother coding the next one, as it would require quite a bit more calculation steps.
Truncated Series
The taylor series for sqrt ( 1 + x ) for x near zero is
sqrt ( 1 + x ) = 1 + 1/2 x - 1/8 x2 ... + ( - 1 / 2 )n+1 xn
Using r = y sqrt ( 1 + (x/y)2 ) then you're looking for a term t = ( - 1 / 2 )n+1 0.36n with magnitude less that a 0.001, log ( 0.002 ) > n log ( 0.18 ) or n > 3.6, so taking terms to x^4 should be Ok.
Y=10000
Y2=Y*Y
for x=0..Y2 do
D[x]=sqrt(Y2+x*x)
norm(x,y)=
if (y==0) x
else if (x>y) norm(y,x)
else {
s=Y/y
D[round(x*s)]/s
}
If your coordinates are smooth, then the idea can be extended with linear interpolation. For more precision, increase Y.
The idea is that s*(x,y) is on the line y=Y, which you've precomputed distances for. Get the distance, then divide it by s.
I assume you really do need the distance and not its square.
You may also be able to find a general sqrt implementation that sacrifices some accuracy for speed, but I have a hard time imagining that beating what the FPU can do.
By linear interpolation, I mean to change D[round(x)] to:
f=floor(x)
a=x-f
D[f]*(1-a)+D[f+1]*a
This doesn't really answer your question, but may help...
The first questions I would ask would be:
"do I need the sqrt at all?".
"If not, how can I reduce the number of sqrts?"
then yours: "Can I replace the remaining sqrts with a clever calculation?"
So I'd start with:
Do you need the exact radius, or would radius-squared be acceptable? There are fast approximatiosn to sqrt, but probably not accurate enough for your spec.
Can you process the image using mirrored quadrants or eighths? By processing all pixels at the same radius value in a batch, you can reduce the number of calculations by 8x.
Can you precalculate the radius values? You only need a table that is a quarter (or possibly an eighth) of the size of the image you are processing, and the table would only need to be precalculated once and then re-used for many runs of the algorithm.
So clever maths may not be the fastest solution.
Well there's always trying optimize your sqrt, the fastest one I've seen is the old carmack quake 3 sqrt:
http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
That said, since sqrt is non-linear, you're not going to be able to do simple linear interpolation along your line to get your result. The best idea is to use a table lookup since that will give you blazing fast access to the data. And, since you appear to be iterating by whole integers, a table lookup should be exceedingly accurate.
Well, you can mirror around x=0 to start with (you need only compute n>=0, and the dupe those results to corresponding n<0). After that, I'd take a look at using the derivative on sqrt(a^2+b^2) (or the corresponding sin) to take advantage of the constant dx.
If that's not accurate enough, may I point out that this is a pretty good job for SIMD, which will provide you with a reciprocal square root op on both SSE and VMX (and shader model 2).
This is sort of related to a HAKMEM item:
ITEM 149 (Minsky): CIRCLE ALGORITHM
Here is an elegant way to draw almost
circles on a point-plotting display:
NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X
This makes a very round ellipse
centered at the origin with its size
determined by the initial point.
epsilon determines the angular
velocity of the circulating point, and
slightly affects the eccentricity. If
epsilon is a power of 2, then we don't
even need multiplication, let alone
square roots, sines, and cosines! The
"circle" will be perfectly stable
because the points soon become
periodic.
The circle algorithm was invented by
mistake when I tried to save one
register in a display hack! Ben Gurley
had an amazing display hack using only
about six or seven instructions, and
it was a great wonder. But it was
basically line-oriented. It occurred
to me that it would be exciting to
have curves, and I was trying to get a
curve display hack with minimal
instructions.

What's the fastest way to divide an integer by 3?

int x = n / 3; // <-- make this faster
// for instance
int a = n * 3; // <-- normal integer multiplication
int b = (n << 1) + n; // <-- potentially faster multiplication
The guy who said "leave it to the compiler" was right, but I don't have the "reputation" to mod him up or comment. I asked gcc to compile int test(int a) { return a / 3; } for an ix86 and then disassembled the output. Just for academic interest, what it's doing is roughly multiplying by 0x55555556 and then taking the top 32 bits of the 64 bit result of that. You can demonstrate this to yourself with eg:
$ ruby -e 'puts(60000 * 0x55555556 >> 32)'
20000
$ ruby -e 'puts(72 * 0x55555556 >> 32)'
24
$
The wikipedia page on Montgomery division is hard to read but fortunately the compiler guys have done it so you don't have to.
This is the fastest as the compiler will optimize it if it can depending on the output processor.
int a;
int b;
a = some value;
b = a / 3;
There is a faster way to do it if you know the ranges of the values, for example, if you are dividing a signed integer by 3 and you know the range of the value to be divided is 0 to 768, then you can multiply it by a factor and shift it to the left by a power of 2 to that factor divided by 3.
eg.
Range 0 -> 768
you could use shifting of 10 bits, which multiplying by 1024, you want to divide by 3 so your multiplier should be 1024 / 3 = 341,
so you can now use (x * 341) >> 10
(Make sure the shift is a signed shift if using signed integers), also make sure the shift is an actually shift and not a bit ROLL
This will effectively divide the value 3, and will run at about 1.6 times the speed as a natural divide by 3 on a standard x86 / x64 CPU.
Of course the only reason you can make this optimization when the compiler cant is because the compiler does not know the maximum range of X and therefore cannot make this determination, but you as the programmer can.
Sometime it may even be more beneficial to move the value into a larger value and then do the same thing, ie. if you have an int of full range you could make it an 64-bit value and then do the multiply and shift instead of dividing by 3.
I had to do this recently to speed up image processing, i needed to find the average of 3 color channels, each color channel with a byte range (0 - 255). red green and blue.
At first i just simply used:
avg = (r + g + b) / 3;
(So r + g + b has a maximum of 768 and a minimum of 0, because each channel is a byte 0 - 255)
After millions of iterations the entire operation took 36 milliseconds.
I changed the line to:
avg = (r + g + b) * 341 >> 10;
And that took it down to 22 milliseconds, its amazing what can be done with a little ingenuity.
This speed up occurred in C# even though I had optimisations turned on and was running the program natively without debugging info and not through the IDE.
See How To Divide By 3 for an extended discussion of more efficiently dividing by 3, focused on doing FPGA arithmetic operations.
Also relevant:
Optimizing integer divisions with Multiply Shift in C#
Depending on your platform and depending on your C compiler, a native solution like just using
y = x / 3
Can be fast or it can be awfully slow (even if division is done entirely in hardware, if it is done using a DIV instruction, this instruction is about 3 to 4 times slower than a multiplication on modern CPUs). Very good C compilers with optimization flags turned on may optimize this operation, but if you want to be sure, you are better off optimizing it yourself.
For optimization it is important to have integer numbers of a known size. In C int has no known size (it can vary by platform and compiler!), so you are better using C99 fixed-size integers. The code below assumes that you want to divide an unsigned 32-bit integer by three and that you C compiler knows about 64 bit integer numbers (NOTE: Even on a 32 bit CPU architecture most C compilers can handle 64 bit integers just fine):
static inline uint32_t divby3 (
uint32_t divideMe
) {
return (uint32_t)(((uint64_t)0xAAAAAAABULL * divideMe) >> 33);
}
As crazy as this might sound, but the method above indeed does divide by 3. All it needs for doing so is a single 64 bit multiplication and a shift (like I said, multiplications might be 3 to 4 times faster than divisions on your CPU). In a 64 bit application this code will be a lot faster than in a 32 bit application (in a 32 bit application multiplying two 64 bit numbers take 3 multiplications and 3 additions on 32 bit values) - however, it might be still faster than a division on a 32 bit machine.
On the other hand, if your compiler is a very good one and knows the trick how to optimize integer division by a constant (latest GCC does, I just checked), it will generate the code above anyway (GCC will create exactly this code for "/3" if you enable at least optimization level 1). For other compilers... you cannot rely or expect that it will use tricks like that, even though this method is very well documented and mentioned everywhere on the Internet.
Problem is that it only works for constant numbers, not for variable ones. You always need to know the magic number (here 0xAAAAAAAB) and the correct operations after the multiplication (shifts and/or additions in most cases) and both is different depending on the number you want to divide by and both take too much CPU time to calculate them on the fly (that would be slower than hardware division). However, it's easy for a compiler to calculate these during compile time (where one second more or less compile time plays hardly a role).
For 64 bit numbers:
uint64_t divBy3(uint64_t x)
{
return x*12297829382473034411ULL;
}
However this isn't the truncating integer division you might expect.
It works correctly if the number is already divisible by 3, but it returns a huge number if it isn't.
For example if you run it on for example 11, it returns 6148914691236517209. This looks like a garbage but it's in fact the correct answer: multiply it by 3 and you get back the 11!
If you are looking for the truncating division, then just use the / operator. I highly doubt you can get much faster than that.
Theory:
64 bit unsigned arithmetic is a modulo 2^64 arithmetic.
This means for each integer which is coprime with the 2^64 modulus (essentially all odd numbers) there exists a multiplicative inverse which you can use to multiply with instead of division. This magic number can be obtained by solving the 3*x + 2^64*y = 1 equation using the Extended Euclidean Algorithm.
What if you really don't want to multiply or divide? Here is is an approximation I just invented. It works because (x/3) = (x/4) + (x/12). But since (x/12) = (x/4) / 3 we just have to repeat the process until its good enough.
#include <stdio.h>
void main()
{
int n = 1000;
int a,b;
a = n >> 2;
b = (a >> 2);
a += b;
b = (b >> 2);
a += b;
b = (b >> 2);
a += b;
b = (b >> 2);
a += b;
printf("a=%d\n", a);
}
The result is 330. It could be made more accurate using b = ((b+2)>>2); to account for rounding.
If you are allowed to multiply, just pick a suitable approximation for (1/3), with a power-of-2 divisor. For example, n * (1/3) ~= n * 43 / 128 = (n * 43) >> 7.
This technique is most useful in Indiana.
I don't know if it's faster but if you want to use a bitwise operator to perform binary division you can use the shift and subtract method described at this page:
Set quotient to 0
Align leftmost digits in dividend and divisor
Repeat:
If that portion of the dividend above the divisor is greater than or equal to the divisor:
Then subtract divisor from that portion of the dividend and
Concatentate 1 to the right hand end of the quotient
Else concatentate 0 to the right hand end of the quotient
Shift the divisor one place right
Until dividend is less than the divisor:
quotient is correct, dividend is remainder
STOP
For really large integer division (e.g. numbers bigger than 64bit) you can represent your number as an int[] and perform division quite fast by taking two digits at a time and divide them by 3. The remainder will be part of the next two digits and so forth.
eg. 11004 / 3 you say
11/3 = 3, remaineder = 2 (from 11-3*3)
20/3 = 6, remainder = 2 (from 20-6*3)
20/3 = 6, remainder = 2 (from 20-6*3)
24/3 = 8, remainder = 0
hence the result 3668
internal static List<int> Div3(int[] a)
{
int remainder = 0;
var res = new List<int>();
for (int i = 0; i < a.Length; i++)
{
var val = remainder + a[i];
var div = val/3;
remainder = 10*(val%3);
if (div > 9)
{
res.Add(div/10);
res.Add(div%10);
}
else
res.Add(div);
}
if (res[0] == 0) res.RemoveAt(0);
return res;
}
If you really want to see this article on integer division, but it only has academic merit ... it would be an interesting application that actually needed to perform that benefited from that kind of trick.
Easy computation ... at most n iterations where n is your number of bits:
uint8_t divideby3(uint8_t x)
{
uint8_t answer =0;
do
{
x>>=1;
answer+=x;
x=-x;
}while(x);
return answer;
}
A lookup table approach would also be faster in some architectures.
uint8_t DivBy3LU(uint8_t u8Operand)
{
uint8_t ai8Div3 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, ....];
return ai8Div3[u8Operand];
}