Faster calculation for large amounts of data / inner loop - objective-c

So, I am programming a simple Mandelbrot renderer.
My inner loop (which is executed up to ~100,000,000 times each time I draw on screen) looks like this:
Complex position = {re,im};
Complex z = {0.0, 0.0};
uint32_t it = 0;
for (; it < maxIterations; it++)
{
//Square z
double old_re = z.re;
z.re = z.re*z.re - z.im*z.im;
z.im = 2*old_re*z.im;
//Add c
z.re = z.re+position.re;
z.im = z.im+position.im;
//Exit condition (mod(z) > 5)
if (sqrt(z.re*z.re + z.im*z.im) > 5.0f)
break;
}
//Color in the pixel according to value of 'it'
Just some very simple calculations. This takes between 0.5 and a couple of seconds, depending on the zoom and so on, but i need it to be much faster, to enable (almost) smooth scrolling.
My question is: What is my best bet to achieve the maximum possible calculation speed?
OpenCl to use the GPU? Coding it in assembly? Dividing the image into small pieces and dispatch the calculation of each piece on another thread? A combination of those?
Any help is appreciated!

I have written a Mandelbrot set renderer several times... and here are the things that you should keep in mind...
The things that take the longest are the ones that never escape and take all the iterations.
a. so you can make a region in the middle out of a few rectangles and check that first.
any starting point with a real and imaginary part between -1 and 1 will never escape.
you can cache points (20, or 30) in a rolling buffer and if you ever see a point in the buffer that you just calculated means that you have a cycle and it will never escape.
You can use a more general logic that doesn't require a square root... in that if any part is less than -2 or more than 2 it will race out of control and can be considered escaped.
But you can also break this up because each point is its own thing, so you can make a separate thread or gcd dispatch or whatever for each row or quadrant... it is a very easy problem to divide up and run in parallel.

In addition to the comments by #Grady Player you could start just by optimising your code
//Add c
z.re += position.re;
z.im += position.im;
//Exit condition (mod(z) > 5)
if (z.re*z.re + z.im*z.im > 25.0f)
break;
The compiler may optimise the first, but the second will certainly help.
Why are you coding your own complex rather than using complex.h

Related

VTK cutter output

I am seeking a solution of connecting all the lines that have the same slope and share a common point. For example, after I load a STL file and cut it using a plane, the cutter output includes the points defining the contour. Connecting them one by one forms a (or multiple) polyline. However, some lines can be merged when their slopes are the same and they share a common point. E.g., [[0,0,0],[0,0,1]] and [[0,0,1],[0,0,2]] can be represented by one single line [[0,0,0],[0,0,2]].
I wrote a function that can analyse all the lines and connect them if they can be merged. But when the number of lines are huge, this process is slow. I am thinking in the VTK pipeline, is there a way to do the line merging?
Cheers!
plane = vtk.vtkPlane()
plane.SetOrigin([0,0,5])
plane.SetNormal([0,0,1])
cutter = vtk.vtkCutter()
cutter.SetCutFunction(plane)
cutter.SetInput(triangleFilter.GetOutput())
cutter.Update()
cutStrips = vtk.vtkStripper()
cutStrips.SetInputConnection(cutter.GetOutputPort())
cutStrips.Update()
cleanDataFilter = vtk.vtkCleanPolyData()
cleanDataFilter.AddInput(cutStrips.GetOutput())
cleanDataFilter.Update()
cleanData = cleanDataFilter.GetOutput()
print cleanData.GetPoint(0)
print cleanData.GetPoint(1)
print cleanData.GetPoint(2)
print cleanData.GetPoint(3)
print cleanData.GetPoint(4)
The output is:
(0.0, 0.0, 5.0)
(5.0, 0.0, 5.0)
(10.0, 0.0, 5.0)
(10.0, 5.0, 5.0)
(10.0, 10.0, 5.0)
Connect the above points one by one will form a polyline representing the cut result. As we can see, the line [point0, point1] and [point1, point2] can be merged.
Below is the code for merging the lines:
Assume that the LINES are represented by list: [[(p0),(p1)],[(p1),(p2)],[(p2),(p3)],...]
appended = 0
CurrentLine = LINES[0]
CurrentConnectedLine = CurrentLine
tempLineCollection = LINES[1:len(LINES)]
while True:
for HL in tempLineCollection:
QCoreApplication.processEvents()
if checkParallelAndConnect(CurrentConnectedLine, HL):
appended = 1
LINES.remove(HL)
CurrentConnectedLine = ConnectLines(CurrentConnectedLine, HL)
processedPool.append(CurrentConnectedLine)
if len(tempLineCollection) == 1:
processedPool.append(tempLineCollection[0])
LINES.remove(CurrentLine)
if len(LINES) >= 2:
CurrentLine = LINES[0]
CurrentConnectedLine = CurrentLine
tempLineCollection = LINES[1:len(LINES)]
appended = 0
else:
break
Solution:
I figured out a way of further accelerating this process using some vtk data structure. I found out that a polyline line will be stored in a cell, which can be checked by using GetCellType(). Since the point order for a polyline is sorted already, We do not need to search globally which lines are colinear with the current one. For each point on the polyline, I just need to check the point[i-1], point[i], point[i+1]. And if they are colinear, the end of the line will be updated to the next point. This process continues until the end of the polyline is reached. The speed increases by a huge amount compared with the global search approach.
Not sure if it is the main source of slowness (depends on how many positive hits on the colinearity you have), but removing items from a vector is costly (O(n)), since it requires reorganizing the rest of the vector, you should avoid it. But even without hits on colinearity, the LINES.remove(CurrentLine) call is surely slowing things down and there isn't really any need for it - just leave the vector untouched, write the final results to a new vector (processedPool) and get rid of the LINES vector in the end. You can modify your algorithm by making a bool array (vector), initialized at "false" for each item, then when you remove a line, you don't actually remove it, but only mark it as "true" and you skip all lines for which you have "true", i.e. something like this (I don't speak python so the syntax is not accurate):
wasRemoved = bool vector of the size of LINES initialized at false for each entry
for CurrentLineIndex = 0; CurrentLineIndex < sizeof(LINES); CurrentLineIndex++
if (wasRemoved[CurrentLineIndex])
continue // skip a segment that was already removed
CurrentConnectedLine = LINES[CurrentLineIndex]
for HLIndex = CurrentLineIndex + 1; HLIndex < sizeof(LINES); HLIndex++:
if (wasRemoved[HLIndex])
continue;
HL = LINES[HLIndex]
QCoreApplication.processEvents()
if checkParallelAndConnect(CurrentConnectedLine, HL):
wasRemoved[HLIndex] = true
CurrentConnectedLine = ConnectLines(CurrentConnectedLine, HL)
processedPool.append(CurrentConnectedLine)
wasRemoved[CurrentLineIndex] = true // this is technically not needed since you won't go back in the vector anyway
LINES = processedPool
BTW, the really correct data structure for LINES to use for that kind of algorithm would be a linked list, since then you would have O(1) complexity for removal and you wouldn't need the boolean array. But a quick googling showed that that's not how lists are implemented in Python, also don't know if it would not interfere with other parts of your program. Alternatively, using a set might make it faster (though I would expect times similar to my "bool array" solution), see python 2.7 set and list remove time complexity
If this does not do the trick, I suggest you measure times of individual parts of the program to find the bottleneck.

Do - While now i am Logically Stuck

I am proper confused.com now and my lack of programming knowledge is clearly showing... so time to call in the pro's! (as a side note i do feel that i have learned a great deal already from you all)
The Problem
I have a little app that takes the width of a virtual wall then asks for the tile size, it then works out how many tiles you can fit with no gap between the width, if the tiles do not fit equally the user is presented with two buttons "grow wall" or "shrink wall", in this example the code below if for growing the wall. It should then run through the while trying to find out in increments of 0.1 how wide the wall needs to be to allow the tiles to fit exactly. It needs to be able to handle double values such as the wall being 12.5 feet wide with a tile width and length 13.6 that is why i am using doubles not ints
I have the following code (but think i am simplifying this way too much)
either way it is not working i have spent half a day reading playing etc but can't figure it out, am i going down the wrong road or am i on the correct track
As always any advice would be warmly received
Mr H
{
double wWdth;
double tWdth;
double wdivision = 3;
NSString *growstring = [NSString stringWithFormat:#"%#,%#", self.wallWidth.text, self.tileWidth.text];
NSArray *wall = [growstring componentsSeparatedByString:#","];
NSLog(#"%#",wall);
wWdth = [[wall objectAtIndex:0]doubleValue];
blkWdth = [[wall objectAtIndex:1]doubleValue];
do
{
NSLog(#" INSIDE WHILE %.f",wdivision);
wWdth = wWdth;
wWdth += 0.1;
wdivision = fmod(qWdth,blkWdth);
} while (wdivision ==! 0);
NSString* newWdth = [NSString stringWithFormat:#"%.f", wWdth];
self.wallWidth.text=newWdth;
}
Your while loop condition has several problems. It is based on the value of wdivision and the value of wdivision is from fmod(qWdth,blkWdth). But neither qWdth nor blkWdth ever change. So your loop will either execute once or forever.
Your loop condition needs to be based on a value that will eventually become false so the loop terminates at some point.
Also, you are using ==! for (what I assume) to be the "not equal" operator. The "not equal" operator is !=. What you have is really:
while (wdivision == !0);
and !0 equates to 1 so you really have:
while (wdivision == 1);
So your loop will end after one iteration unless is just so happens that wdivision is equal to 1. And if it is, your loop will never end.
Since wdivision is a double based on the fmod function, it may never actually equal 0 exactly. You should probably use this instead:
while (division > DBL_EPSILON);
But again, you need to change how wdivision is calculated so it is different each iteration.
If you want to stick with the code you have, recognize that "getting within half of the increment" is as close as you can get. So the lines
wWdth += 0.1;
wdivision = fmod(qWdth,blkWdth);
} while (wdivision ==! 0);
Should be replaced with
wWdth += increment;
wdivision = fmod(qWdth,blkWdth);
} while (abs(wdivision) > 0.5*increment);
Maybe I'm not understanding the question completely correctly, so I apologize if I don't. You say you ask for a tile size right? And then you grow/shrink the wall if the tiles can't fit perfectly into that wall size? If so, can't you just take the current wall size and set it to the next multiple (up or down) of the tile size based on when you click grow/shrink wall? Setting it to a multiple of the tile size will allow for the tiles to fit exactly in it.

audio sequencer with swing (shuffle) Obj-C

I'm working on a drum computer with sequencer for the iPad. The drum computer is working just fine and writing the sequencer wasn't that much of a problem either. However, the sequencer is currently only capable of a straight beat (each step has equal duration). I would like to add a swing (or shuffle as some seem to call it) option, but I'm having trouble figuring out how.
'Swing' according to Wikipedia
Straight beat (midi, low volume)
Beat with Swing (midi, low volume)
If I understand correctly, swing is pretty much achieved by offsetting the eights notes between the 1-2-3-4 with a configurable amount. So instead of
1 + 2 + 3 + 4 +
it becomes something like
1 +2 +3 +4 +
The linked midi files illustrate this better...
However, the sequencer works with 1/16th or even 1/32th steps, so if the 2/8th (4/16th) note is offset, how would that affect the 5/16th note.
I'm probably not approaching this the correct way. Any pointers?
Sequencer code
This is the basics of how I implemented the sequencer. I figured altering the stepDuration at certain points should give me the swing effect I want, but how?
#define STEPS_PER_BAR 32
// thread
- (void) sequencerLoop
{
while(isRunning)
{
NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
// prepare for step
currentStep++;
if(currentStep >= STEPS_PER_BAR * activePatternNumBars)
currentStep = 0;
// handle the step/tick
...
//calculate the time to sleep until the next step
NSTimeInterval stepDuration = (60.0f / (float)bpm) / (STEPS_PER_BAR / 4);
nextStepStartTime = nextStepStartTime + stepDuration;
NSTimeInterval now = [NSDate timeIntervalSinceReferenceDate];
// sleep if there is time left
if(nextStepStartTime > now)
[NSThread sleepUntilDate:[NSDate dateWithTimeIntervalSinceReferenceDate:nextStepStartTime]];
else {
NSLog(#"WARNING: sequencer loop is lagging behind");
}
[pool release];
}
}
Edit: added code
I'm not familiar with the sequencer on iOS, but usually sequencers subdivide steps or beats into "ticks", so the way to do this would be to shift the notes that don't fall right on a beat back by a few "ticks" durring playback. So if the user programmed:
1 + 2 + 3 + 4 +
Instead of playing it back like that, you shift any notes falling on the "and" back by however many ticks (depending on exactly where it falls, how much "swing" was used, and how many "ticks" per beat)
1 + 2 + 3 + 4 +
Sorry if that's not much help, or if I'm not much more than restating the question, but the point is you should be able to do this, probably using something called "ticks". You may need to access another layer of the API to do this.
Update:
So say there are 32 ticks per beat. That means the "+" in the diagram above is tick # 16 -- that's what needs to be shifted. (that's not really a lot of resolution, so having more ticks is better).
Lets call the amount we move it, the "swing factor", s. For no swing, s = 1, for "infinite" swing, s=2. You probably want to use a value like 1.1 or 1.2. For simplicity, we'll use linear interpolation to determine the new position. (As a side note, for more on linear interpolation and how it pertains to audio, I wrote a little tutorial) we need to break the time before and after 16 into two sections, since the time before is going to be stretched and the time after is going to be compressed.
if( tick <= 16 )
tick *= s; //stretch
else
tick = (2-s)*tick + 32*(s-1) //compress
How you deal with rounding is up to you. Obviously, you'll want to do this on playback only and not store the new values, since you won't be able to recover the original value exactly due to rounding.
Change the number of steps to 12 instead of 16. Then each beat has 3 steps instead of 4. Triplets instead of 16th notes. Put sounds on the first and third triplet and it swings. Musicians playing swing use the second triplet also.
Offsetting the notes to create a shuffle does not give you access to the middle triplet.

Getting any point along an NSBezier path

For a program I'm writing, I need to be able to trace a virtual line (that is not straight) that an object must travel along. I was thinking to use NSBezierPath to draw the line, but I cannot find a way to get any point along the line, which I must do so I can move the object along it.
Can anyone suggest a way to find a point along an NSBezierPath? If thats not possible, can anyone suggest a method to do the above?
EDIT: The below code is still accurate, but there are much faster ways to calculate it. See Introduction to Fast Bezier and Even Faster Bezier.
There are two ways to approach this. If you just need to move something along the line, use a CAKeyframeAnimation. This is pretty straightforward and you never need to calculate the points.
If on the other hand you actually need to know the point for some reason, you have to calculate the Bézier yourself. For an example, you can pull the sample code for Chapter 18 from iOS 5 Programming Pushing the Limits. (It is written for iOS, but it applies equally to Mac.) Look in CurvyTextView.m.
Given control points P0_ through P3_, and an offset between 0 and 1 (see below), pointForOffset: will give you the point along the path:
static double Bezier(double t, double P0, double P1, double P2,
double P3) {
return
pow(1-t, 3) * P0
+ 3 * pow(1-t, 2) * t * P1
+ 3 * (1-t) * pow(t, 2) * P2
+ pow(t, 3) * P3;
}
- (CGPoint)pointForOffset:(double)t {
double x = Bezier(t, P0_.x, P1_.x, P2_.x, P3_.x);
double y = Bezier(t, P0_.y, P1_.y, P2_.y, P3_.y);
return CGPointMake(x, y);
}
NOTE: This code violates one of my cardinal rules of always using accessors rather than accessing ivars directly. It's because in it's called many thousands of times, and eliminating the method call has a significant performance impact.
"Offset" is not a trivial thing to work out. It does not proceed linearly along the curve. If you need evenly spaced points along the curve, you'll need to calculate the correct offset for each point. This is done with this routine:
// Simplistic routine to find the offset along Bezier that is
// aDistance away from aPoint. anOffset is the offset used to
// generate aPoint, and saves us the trouble of recalculating it
// This routine just walks forward until it finds a point at least
// aDistance away. Good optimizations here would reduce the number
// of guesses, but this is tricky since if we go too far out, the
// curve might loop back on leading to incorrect results. Tuning
// kStep is good start.
- (double)offsetAtDistance:(double)aDistance
fromPoint:(CGPoint)aPoint
offset:(double)anOffset {
const double kStep = 0.001; // 0.0001 - 0.001 work well
double newDistance = 0;
double newOffset = anOffset + kStep;
while (newDistance <= aDistance && newOffset < 1.0) {
newOffset += kStep;
newDistance = Distance(aPoint,
[self pointForOffset:newOffset]);
}
return newOffset;
}
I leave Distance() as an exercise for the reader, but it's in the example code of course.
The referenced code also provides BezierPrime() and angleForOffset: if you need those. Chapter 18 of iOS:PTL covers this in more detail as part of a discussion on how to draw text along an arbitrary path.

Bouncing ball not conforming to Conservation of Energy Rule

I am currently busy on writing a small ball physics engine for my programming course in Win32 API and c++. I have finished the GDI backbuffer renderer and the whole GUI (couple of more things to adjust) but i am very near to completion. The only big obstacles that last are ball to ball collision (but i can fix this on my own) but the biggest problem of them all is the bouncing of the balls. What happens is that i throw a ball and it really falls, but once it bounces it will bounce higher than the point were i released it??? the funny thing is, it only happens if below a certain height. This part is the physics code:
(If you need any more code or explanation, please ask, but i would greatly appreciate it if you guys could have a look at my code.)
#void RunPhysics(OPTIONS &o, vector<BALL*> &b)
{
UINT simspeed = o.iSimSpeed;
DOUBLE DT; //Delta T
BOOL bounce; //for playing sound
DT= 1/o.REFRESH;
for(UINT i=0; i<b.size(); i++)
{
for(UINT k=0; k<simspeed; k++)
{
bounce=false;
//handle the X bounce
if( b.at(i)->rBall.left <= 0 && b.at(i)->dVelocityX < 0 ) //ball bounces against the left wall
{
b.at(i)->dVelocityX = b.at(i)->dVelocityX * -1 * b.at(i)->dBounceCof;
bounce=true;
}
else if( b.at(i)->rBall.right >= SCREEN_WIDTH && b.at(i)->dVelocityX > 0) //ball bounces against the right wall
{
b.at(i)->dVelocityX = b.at(i)->dVelocityX * -1 * b.at(i)->dBounceCof;
bounce=true;
}
//handle the Y bounce
if( b.at(i)->rBall.bottom >= SCREEN_HEIGHT && b.at(i)->dVelocityY > 0 ) //ball bounces against the left wall
{
//damping of the ball
if(b.at(i)->dVelocityY < 2+o.dGravity/o.REFRESH)
{
b.at(i)->dVelocityY = 0;
}
//decrease the Velocity of the ball according to the bouncecof
b.at(i)->dVelocityY = b.at(i)->dVelocityY * -1*b.at(i)->dBounceCof;
b.at(i)->dVelocityX = b.at(i)->dVelocityX * b.at(i)->dBounceCof;
bounce=true;
}
//gravity
b.at(i)->dVelocityY += (o.dGravity)/o.REFRESH;
b.at(i)->pOrigin.y += b.at(i)->dVelocityY + (1/2)*o.dGravity/o.REFRESH*DT*METER;
//METER IS DEFINED GLOBALLY AS 100 which is the amount of pixels in a meter
b.at(i)->pOrigin.x += b.at(i)->dVelocityX/o.REFRESH*METER;
b.at(i)->UpdateRect();
}
}
return;
}
You are using the Euler method of integration. It is possible that your time step (DT) is too large. Also there seems to be a mistake on the row that updates the Y coordinate:
b.at(i)->pOrigin.y += b.at(i)->dVelocityY + (1/2)*o.dGravity/o.REFRESH*DT*METER;
You have already added the gravity to the velocity, so you don't need to add it to the position and you are not multiplying the velocity by DT. It should be like this:
b.at(i)->pOrigin.y += b.at(i)->dVelocityY * DT;
Furthermore there appears to be some confusion regarding the units (the way METER is used).
Okay, a few things here.
You have differing code paths for bounce against left wall and against right wall, but the code is the same. Combine those code paths, since the code is the same.
As to your basic problem: I suspect that your problem stems from the fact that you apply the gravity after you apply any damping forces / bounce forces.
When do you call RunPhysics? In a timer loop? This code is just an approximation and no exact calculation. In the short interval of delta t, the ball has already changed his position and velocity a litte bit which isn't considered in your algorithm and produces little mistakes. You'll have to compute the time until the ball hits the ground and predict the changes.
And the gravity is already included in the velocity, so don't add it twice here:
b.at(i)->pOrigin.y += b.at(i)->dVelocityY + (1/2)*o.dGravity/o.REFRESH*DT*METER;
By the way: Save b.at(i) in a temporary variable, so you don't have to recompute it in every line.
Ball* CurrentBall = b.at(i);
ANSWER!!ANSWER!!ANSWER!! but i forgot my other account so i can't flag it :-(
Thanks for all the great replies, it really helped me alot! The answers that you gave were indeed correct, a couple of my formulas were wrong and some code optimisation could be done, but none was really a solution to the problem. So i just sat down with a piece of paper and started calculation every value i got from my program by hand, took me like two hours :O But i did find the solution to my problem:
The problem is that as i update my velocity (whith corrected code) i get a decimal value, no problem at all. Later i increase the position in Y by adding the velocity times the Delta T, which is a verry small value. The result is a verry small value that needs to be added. The problem is now that if you draw a Elipse() in Win32 the point is a LONG and so all the decimal values are lost. That means that only after a verry long period, when the values velocity starts to come out of the decimal values something happens, and that alongside with that, the higher you drop the ball the better the results (one of my symptons) The solution to this problem was really simple, ad an extra DOUBLE value to my Ball class which contained the true position (including decimals) of my ball. During the RenderFrame() you just take the floor or ceiling value of the double to draw the elipse but for all the calculations you use the Double value. Once again thanks alot for all your replies, STACKOVERFLOW PEOPLE ROCK!!!
If your dBounceCof is > 1 then, yes your ball will bounce higher.
We do not have all the values to be able to reply to your question.
I don't think your equation for position is right:
b.at(i)->dVelocityY += (o.dGravity)/o.REFRESH;
This is v=v0+gt - that seems fine, although I'd write dGravity*DT instead of dGravity/REFRESH_FREQ.
b.at(i)->pOrigin.y += b.at(i)->dVelocityY + (1/2)*o.dGravity/o.REFRESH*DT*METER;
But this seems off: It is eqivalent to p = p0+v + 1/2gt^2.
You ought to multiply velocity * time to get the units right
You are scaling the gravity term by pixels/meter, but not the velocity term. So that ought to be multiplied by METER also
You have already accounted for the effect of gravity when you updated velocity, so you don't need to add the gravity term again.
Thanks for the quick replies!!! Sorry, i should have been more clear, the RunPhysics is beiing run after a PeekMessage. I have also added a frame limiter which makes sure that no more calculations are done per second than the refresh rate of the monitor. My dleta t is therefore 1 second devided by the refresh rate. Maybe my DT is actually too small to calculate, although it's a double value??? My cof of restitution is adjustable but starts at 0.9
You need to recompute your position on bounce, to make sure you bounce from the correct place on the wall.
I.e. resolve the exact point in time when the bounce occured, and calculate new velocity/position based on that direction change (partially into a "frame" of calculation) to make sure your ball does not move "beyond" the walls, more and more on each bounce.
W.r.t. time step, you might want to check out my answer here.
In a rigid body simulation, you need to run the integration up to the instant of collision, then adjust the velocities to avoid penetration at the collision, and then resume the integration. It's sort of an instantaneous kludge to cover the fact that rigid bodies are an approximation. (A real ball deforms during a collision. That's hard to model, and it's unnecessary for most purposes.)
You're combining these two steps (integrating the forces and resolving the collisions). For a simple simulation like you've shown, it's probably enough to skip the gravity bit on any iteration where you've handled a vertical bounce.
In a more advanced simulation, you'd split any interval (dt) that contains a collision at the actual instance of collision. Integrate up to the collision, then resolve the collision (by adjusting the velocity), and then integrate for the rest of the interval. But this looks like overkill for your situation.