How could I optimize my code to cut down on the cycles it takes to run?

I am a newcomer to MIPS assembly language, but I have prior experience with Java. I have the following block of code and was wondering how I could make it significantly faster. As you can see, the code takes a total of 45 cycles to run, and the div instruction accounts for most of that. Could I replace div with something else to cut down on the cycles?
The code:
li  $t0, -32        # 2 cycles
lw  $t2, 0($s1)     # 1 cycle
div $t2, $t2, $t0   # 41 cycles
sw  $t2, 0($s1)     # 1 cycle
                    # total: 45 cycles
Your help is much appreciated. Thanks.
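One standard trick, since the divisor is a constant power of two: replace the div with an arithmetic shift. div truncates toward zero while a shift floors, so negative dividends need a small bias first, and the negative divisor adds a final negation. A minimal C sketch of the idea (not your exact code); each step maps to a cheap MIPS instruction such as sra, srl, addu, and subu:

#include <stdio.h>

/* Divide x by -32 without a div instruction.
   Assumes arithmetic right shift of negative ints, as with MIPS sra.
   Shifting floors, so negative dividends get a bias of 31
   (divisor magnitude minus 1) to truncate toward zero like div. */
int div_by_minus_32(int x)
{
    int bias = (x >> 31) & 31;  /* 31 if x < 0, else 0 */
    int q = (x + bias) >> 5;    /* truncating x / 32 */
    return -q;                  /* account for the sign of -32 */
}

int main(void)
{
    printf("%d\n", div_by_minus_32(-100)); /* prints 3 */
    printf("%d\n", div_by_minus_32(100));  /* prints -3 */
    return 0;
}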

Related

Break if Newton's method is not convergent

I'm trying to implement Newton's method for polynomials to find a zero of a function. But I must handle the case when the function has no root. I'm wondering how I can detect the moment when the method becomes divergent, so I can stop the procedure.
Thank you in advance for any help.
Generally, if the root is not found after 10 iterations, the initial point was bad; to be safe, allow 15 or 20 iterations. Alternatively, check after 5-10 iterations for quadratic convergence, measured by the function value shrinking to less than a quarter of its previous value from one iteration to the next.
In a bad case, restart with a different initial point.
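A minimal C sketch of that guard, using a hypothetical cubic f and its derivative fp as stand-ins for your polynomial; the iteration cap and the factor-of-0.25 residual test are the heuristics described above:

#include <stdio.h>
#include <math.h>

/* Hypothetical example polynomial f(x) = x^3 - 2x - 5 and its derivative. */
static double f(double x)  { return x * x * x - 2.0 * x - 5.0; }
static double fp(double x) { return 3.0 * x * x - 2.0; }

/* Newton iteration with a hard iteration cap and a residual-decrease
   check; returns 1 on success and stores the root in *root. */
static int newton(double x0, int max_iter, double tol, double *root)
{
    double x = x0;
    double prev = fabs(f(x));
    for (int i = 0; i < max_iter; i++) {
        double d = fp(x);
        if (fabs(d) < 1e-12)              /* derivative vanished: give up */
            return 0;
        x -= f(x) / d;
        double cur = fabs(f(x));
        if (cur < tol) { *root = x; return 1; }
        if (i >= 5 && cur > 0.25 * prev)  /* no quadratic convergence: stop */
            return 0;
        prev = cur;
    }
    return 0;                             /* cap reached: restart elsewhere */
}

int main(void)
{
    double r;
    if (newton(2.0, 20, 1e-10, &r))
        printf("root near %.10f\n", r);
    else
        printf("diverged; restart with a different initial point\n");
    return 0;
}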

Why does floating point addition take longer than multiplication

I am working with a PIC18F4550 and the program is speed-critical. When I multiply two floating-point variables, the PIC takes about 140 cycles to perform the multiplication. I am measuring it with the PIC18F4550's Timer1.
variable_1 = variable_2 * variable_3; // takes about 140 cycles
On the other hand, when I add the same two variables, the PIC takes about 280 cycles to perform the addition.
variable_1 = variable_2 + variable_3; // takes about 280 cycles
I have also seen that the cycle counts vary when the variables change, depending on their exponents.
What is the reason for the extra cycles? I had thought addition was simpler than multiplication.
Is there any solution?
For floating point addition, the operands need to be adjusted so that they have the same exponent before the add, and that involves shifting one of the mantissas across byte boundaries, whereas a multiply is basically multiplying the mantissas and adding the exponents.
Since the PIC apparently has a small hardware multiplier, it may not be surprising that sometimes the multiply can be faster than doing a multi-byte shift (especially if the PIC only has single bit shift instructions).
Unless a processor has direct support for it, floating point is always slow, and you should certainly consider arranging your code to use fixed point if at all possible. Getting rid of the floating point library would probably free up a lot of code space as well.
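A minimal sketch of that fixed-point idea in C, assuming a Q8.8 format (8 integer bits, 8 fraction bits) fits your value range; addition becomes a plain integer add with no exponent alignment at all, and both operations compile to cheap integer instructions on an 8-bit PIC:

#include <stdint.h>
#include <stdio.h>

/* Q8.8 fixed point: stored value = real value * 256. */
typedef int16_t q8_8;
#define Q8_8_ONE 256

static q8_8  q_from_float(float f) { return (q8_8)(f * Q8_8_ONE); }
static float q_to_float(q8_8 q)    { return (float)q / Q8_8_ONE; }

/* Addition is a plain integer add; no mantissa shifting needed. */
static q8_8 q_add(q8_8 a, q8_8 b)  { return a + b; }

/* Multiplication needs a wider intermediate, then a rescale. */
static q8_8 q_mul(q8_8 a, q8_8 b)  { return (q8_8)(((int32_t)a * b) >> 8); }

int main(void)
{
    q8_8 x = q_from_float(1.5f);
    q8_8 y = q_from_float(2.25f);
    printf("sum = %f\n", q_to_float(q_add(x, y))); /* 3.750000 */
    printf("mul = %f\n", q_to_float(q_mul(x, y))); /* 3.375000 */
    return 0;
}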

Time complexity of a program that runs infinitely

Could the time complexity of the following program segment be O(2^n)?
I'm confused.
n = 1;
for j = 1 to n do
    output(j);
    n = 2 * n;
end {for}
No, this is O(n) in the number of iterations.
The loop is just computing a power of two by repeated doubling: after n passes, n holds 2^n.
The running time is linear in the number of iterations the loop performs, regardless of the final answer or of the constant-time computation inside the body.
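To make the distinction concrete, here is a small C sketch with an explicit iteration counter; the cap of 20 passes is added so the demonstration terminates (the pseudocode's loop never would, if the bound is re-read on every pass):

#include <stdio.h>

int main(void)
{
    unsigned long n = 1;
    int iterations = 0;

    /* Double n each pass, as in the pseudocode, but stop after
       20 passes so the demonstration terminates. */
    while (iterations < 20) {
        iterations++;
        n = 2 * n;
    }

    /* 20 passes of constant-time work: the running time is linear
       in the iteration count, even though n has grown to 2^20. */
    printf("iterations = %d, n = %lu\n", iterations, n);
    return 0;
}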

Variable time step bug with Box2D

Can anybody spot what is wrong with the code below? It is supposed to average the frame interval (dt) over the previous TIME_STEPS frames.
I'm using Box2D and cocos2d, although I don't think the cocos2d part is very relevant.
-(void) update: (ccTime) dt
{
    float32 timeStep;
    const int32 velocityIterations = 8;
    const int32 positionIterations = 3;

    // Average the previous TIME_STEPS time steps
    for (int i = 0; i < TIME_STEPS; i++)
    {
        timeStep += previous_time_steps[i];
    }
    timeStep = timeStep / TIME_STEPS;

    // step the world
    [GB2Engine sharedInstance].world->Step(timeStep, velocityIterations, positionIterations);

    // shift the history and record the newest interval
    for (int i = 0; i < TIME_STEPS - 1; i++)
    {
        previous_time_steps[i] = previous_time_steps[i+1];
    }
    previous_time_steps[TIME_STEPS - 1] = dt;
}
The previous_time_steps array is initially filled with whatever the animation interval is set to.
This doesn't do what I would expect it to. On devices with a low frame rate it speeds up the simulation, and on devices with a high frame rate it slows it down. I'm sure it's something stupid I'm overlooking.
I know Box2D likes to work with fixed time steps, but I really don't have a choice. My game runs at a very variable frame rate on the various devices, so a fixed time step just won't work. The game runs at an average of 40 fps, but on some of the crappier devices, like the first-gen iPad, it runs at barely 30 frames per second. The third-gen iPad runs it at 50/60 frames per second.
I'm open to suggestions on other ways of dealing with this problem too. Any advice would be appreciated.
Something else unusual I should note, which somebody might have some insight into, is that debug optimisations on the build have a huge effect on the above. The frame rate isn't changed much when the debug optimisations are set to -Os vs -O0, but with -Os the physics simulation runs much faster than with -O0 when the above code is active. If I just use dt as the interval instead of the above code, the debug optimisations make no difference.
I'm totally confused by that.
On devices with a low frame rate it speeds up the simulation and on devices with a high frame rate it slows it down.
That's what using a variable time step is all about. If you only get 10 fps the physics engine will iterate the world faster because the delta time is larger.
PS: If you do any kind of performance tests like these, run them with the release build. That also ensures that (most) logging is disabled and code optimizations are on. It's possible that you're simply seeing a much greater performance impact from debugging code on older devices.
Also, what value is TIME_STEPS? It shouldn't be more than 10, maybe 20 at most. The alternative to averaging is to use the delta time directly, but if the delta time is greater than a certain threshold (the 30 fps frame time), switch to a fixed delta time, i.e. cap it. A variable time step below 30 fps can get really ugly, so in such cases it's probably better to let the physics simulation slow down with the framerate, or else the game will become harder, if not unplayable, at lower fps.
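A minimal C sketch of that cap, assuming dt arrives in seconds; the 1/30 threshold is the 30 fps frame time mentioned above:

#include <stdio.h>

/* Use the measured dt while the frame rate is healthy, but never
   feed the physics engine a step larger than the 30 fps frame time. */
static float capped_time_step(float dt)
{
    const float max_dt = 1.0f / 30.0f;
    return (dt > max_dt) ? max_dt : dt;
}

int main(void)
{
    printf("%f\n", capped_time_step(1.0f / 60.0f)); /* passes through */
    printf("%f\n", capped_time_step(1.0f / 10.0f)); /* capped at 1/30 */
    return 0;
}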

Analog milliseconds hand for a clock on iPhone

I am trying to make an analog stopwatch app for iOS. Does anybody know the right approach to build an analog clock with a milliseconds hand? My problem is that Core Graphics in the iOS SDK does not support a high enough refresh rate to animate the milliseconds hand smoothly. Can anybody help with OpenGL ES? I have very little experience with OpenGL, so I just need some tips for a head start.
Assuming you know you won't get the same result as your TAG Heuer watch (because of the refresh rates), you should interpolate the time to your needs.
To make things easier, I'll try to demonstrate a pointer that makes one lap each second.
Step 1: Get the elapsed time (assuming each unit is 1/100 second). Example value: 234 (which is 2.34 seconds in our scale).
Step 2: Reduce it to the elapsed time within your timeframe. (If you're measuring 1/100 second, you have already used 200 for 2 full laps; you only need the remainder.) In our case: 34. How to obtain it? In C: 234 % 100 = 34.
Step 3: Rotate your coordinates accordingly, in pure OpenGL: glRotatef(((float)34/100)*360, 0, 1, 0); (this rotates around the Y axis; OpenGL uses degrees, so a full circle = 360).
Step 4: Draw your pointer.
Step 5: Start over (since you're retrieving the time again in Step 1, you'll redraw your pointer at the new location).
Remember that this is just the "drawing" phase, and Step 5 is merely a consequence of your run loop; it's listed only for clarity.
Hope it helps get you started. If you need more specifics, just comment on the answer and I'll try to help you out!
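A minimal C sketch of Steps 1-3, with a hypothetical elapsed_hundredths() standing in for however you read the stopwatch time; the resulting angle is what you would feed to glRotatef before drawing the hand:

#include <stdio.h>

/* Hypothetical stand-in: elapsed stopwatch time in 1/100 s units. */
static int elapsed_hundredths(void)
{
    return 234; /* e.g. 2.34 seconds, as in the example above */
}

int main(void)
{
    int elapsed = elapsed_hundredths();         /* Step 1: 234 */
    int within  = elapsed % 100;                /* Step 2: 34, remainder of the lap */
    float angle = (within / 100.0f) * 360.0f;   /* Step 3: degrees for glRotatef */
    printf("rotate the hand by %.1f degrees\n", angle); /* 122.4 */
    return 0;
}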