So I just recently had a bit of a breakthrough with neural nets and made a couple of games with NN AIs. For training, I use frameRate(100000) to jack the frame rate up as high as it will go. However, checking with println(frameRate), I see that the average frame rate is only about 270. Removing all drawing (pretty much just the shape calls) increases it to about 300.
I'd like to make it faster. I noticed the documentation states that frameRate() only goes as high as your processor can handle, but checking with Task Manager I see the program is only using about 20% of my CPU and about 90 MB of memory. I've increased the maximum available memory to 4096 MB in the preferences, but that didn't seem to make a difference.
So I guess my question is: how do I allow Processing to use more of my CPU for a faster frame rate? (Or is there a better option than simply "optimizing my code"? It's already fairly optimized, IMO, though I'm not saying it couldn't be better.)
Keep in mind that even with a very high framerate, the mechanisms that call draw() have some overhead that you don't need if you aren't drawing anything. Your computer might limit the frame rate depending on your graphics settings. Also note that the println() statement itself is very slow, so you should not use it for continuously printing out your frame rate.
If you aren't drawing anything (or if you're only drawing a single frame), you can probably just use a basic loop instead of the draw() function.
Instead, try something like this:
// e.g. put this loop in setup() and don't define draw() at all
boolean done = false;    // set this to true when your processing has finished
boolean running = true;

while (running) {
  // do your processing here

  if (done) {
    running = false;
  }
}
Related
I am playing around on Shadertoy and I kept running into a hard loop count of about 12,000 iterations, so I decided to check just how many calculations per frame it could do without dropping frames. Sure enough, the shaders don't seem to be able to do anything more than about 12,000 calculations per frame without the frame rate dropping. This seems odd, because I had thought that shaders run directly on the GPU, which regularly does way more calculations per polygon (something like 100 calculations per polygon across 150K polygons!) with rasterization APIs like OpenGL and Vulkan. So essentially my question is: how can I send calculations directly through the GPU, the way a rasterizer does, to get the speedy number crunching that rasterizers get?
I have a (desktop) LabVIEW program running several large While loops. Each loop corresponds to the functions on an IO card in a myRIO DAQ system. Each card operates at a different speed; therefore each loop, and the subVIs inside it, runs at a different speed as well.
However, I'm now finding that I need to pass data from a low speed loop to a high speed loop, and I'm not sure how to best go about it.
The low speed loop actually connects via TCP to a Yokogawa power analyzer, and the loop period is 50ms (20Hz). The high speed loop runs at 50kHz, and performs math operations using inputs from a high speed ADC to calculate motor torque; it needs the info from the low speed loop (the power analyzer) to proceed. There's an 816:1 data flow difference.
At runtime, it appears to work fine, until I spin the motor up, then the overtorque routine kicks in and shuts me down.
So I next tried to queue the data, but that just slowed the high speed loop down significantly.
That being said, my thought was to take the incoming data in the low speed loop, fill an array with that data (816 elements deep), and queue it up to the high speed loop, but I'm not quite certain how to go about that either.
How should I accomplish what I'm trying to do in a more efficient and proper manner?
Look to the Real-Time FIFO palette. The functions there create and operate a lockless FIFO explicitly designed for passing data deterministically between loops. Used correctly, they guarantee that the slower loop, trying to write data, will not lock the FIFO in a way that throws the faster loop off its schedule.
You can find a simple example of the RT FIFO code here. You'll find more in the LabVIEW shipping examples.
If the high speed loop is running faster, then it only really needs the latest value, so you need variable/tag-style communication.
Depending on what you are already aware of, there are a few options:
Local/Global Variable
Functional Global Variable (but globals are faster)
Notifier (if you use get status you can read this like a variable)
I would pick one you are comfortable with and try that.
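LabVIEW code is graphical, so there's nothing textual to paste here, but conceptually the variable/tag approach is just a single-slot "latest value" mailbox: the slow loop overwrites the slot, and the fast loop reads whatever is currently in it without ever waiting. A rough C++ sketch of that idea only (the names latestPowerReading, slowLoopIteration, and fastLoopIteration are made up; this is not LabVIEW's actual RT FIFO or variable API):

#include <atomic>

// Conceptual analogy only -- a single-slot "latest value" tag: the slow (20 Hz)
// loop overwrites it, the fast (50 kHz) loop reads whatever is current and never waits.
std::atomic<double> latestPowerReading{0.0};

void slowLoopIteration(double newReading)
{
    latestPowerReading.store(newReading, std::memory_order_release);  // publish newest value
}

double fastLoopIteration()
{
    return latestPowerReading.load(std::memory_order_acquire);        // read without blocking
}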
In a soccer game, I am computing a steering force using steering behaviors. This part is ok.
However, I am looking for the best way to implement simple 2d human locomotion.
For instance, a player should not simply "steer" (i.e. add the acceleration computed from the steering force to its current velocity) when the cosine of the angle between the steering force and the current velocity or heading vector is lower than 0.5, because then it looks as if the player were a vehicle. A human, when there is a major change of direction, slows down first, and only once they have slowed enough do they start accelerating in the new direction.
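For concreteness, here is a minimal C++ sketch of the gate described above; the Vec2 type, the 0.5 threshold, and the maxBrake parameter are just placeholders, not part of any particular engine:

#include <cmath>

struct Vec2 { float x, y; };

static float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
static float len(Vec2 a)         { return std::sqrt(dot(a, a)); }

// Gate the steering force by the angle between it and the current velocity:
// on a sharp change of direction (cos < 0.5, i.e. more than 60 degrees),
// brake first and only steer once the player has slowed down.
void applySteering(Vec2& velocity, Vec2 steering, float maxBrake, float dt)
{
    float v = len(velocity), s = len(steering);
    if (v > 0.0f && s > 0.0f) {
        float cosAngle = dot(velocity, steering) / (v * s);
        if (cosAngle < 0.5f) {
            float brake = std::fmin(maxBrake * dt, v);   // slow down first...
            velocity.x -= velocity.x / v * brake;
            velocity.y -= velocity.y / v * brake;
            return;                                      // ...then steer on a later tick
        }
    }
    velocity.x += steering.x * dt;   // gentle turn: steer as usual
    velocity.y += steering.y * dt;
}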
Does anyone have any advice, ideas on how to achieve this behavior? Thanks in advance.
Make it change direction very quickly, but without perfect friction, e.g. Super Mario.
Edit: but the feet should not slide; use procedural animation for the feet.
This has already been researched and developed in an initiative called RoboCup. They have a Simulation 2D league that should be really similar to what you are trying to accomplish.
Here's a link that should point you in the right direction:
http://wiki.robocup.org/wiki/Main_Page
Maybe you could compute the curvature of the path. If the curvature value is too big, slow the speed down.
http://en.wikipedia.org/wiki/Curvature
At low speed a human can turn on a dime. At high speed only very slight turns require no slowing. The speed and radius of the turn are thus strongly correlated.
How much a human slows down when aiming toward a target is actually a judgment call, not an automatic computation. One human might come almost to a complete stop, turn sharply, and run directly toward the target. Another might slow only a little and make a wide curving arc, even if this increases the total distance to the target. The only caveat is that if the desired target lies inside the turning radius at the current speed, the only reasonable choice is to slow down, since otherwise the unit would have to swing out in a wide loop far from the target in order to reach it (rather than circling it endlessly).
Here's how I would go about doing it. I apologize for the Imperial units if you prefer metric.
The fastest human ever recorded traveled just under 28 mph. Each of your human units should be given a personal top speed between 1 and 28 mph.
Create a 29-element table of the maximum acceleration and deceleration rates of a human traveling at each whole mph in a straight line. It doesn't have to be exact; just approximate accel and decel values for each speed. Create fast, medium, and slow versions of the 29-element table and assign each human to one of these tables. The table chosen can be mapped to the unit's top speed, so a unit with a max of 10 mph would be a slow accelerator.
Create a 29-element table of the sharpest turn radius a human can manage at each whole mph (0-28).
Now, when animating each human unit, if you have target information and must choose an acceleration from that, the task is harder. If instead you just have a force vector, it is easier. Let's start with the force vector.
If the force vector's net acceleration and resultant angle would exceed the limit of the unit's ability, restrict the unit's new vector to the maximum angle allowed, and also decelerate the unit at its maximum rate for its current linear speed.
During the next clock tick, being slower, it will be able to turn more sharply.
If the force vector can be entirely accommodated, but the unit is traveling slower than its maximum speed for that curvature, apply the maximum acceleration the unit has at that speed.
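Here is a minimal C++ sketch of the force-vector case under these rules. The Unit struct, the table arrays, and the function names are assumptions for illustration, unit conversions between mph, distance, and tick length are glossed over, and the "maximum speed for that curvature" check is simplified to the unit's personal top speed:

#include <algorithm>
#include <cmath>

// Filled in elsewhere from the tables described above (index = whole mph, 0..28).
const int kMaxMph = 28;
float maxAccel[kMaxMph + 1];        // max acceleration at each whole mph
float maxDecel[kMaxMph + 1];        // max deceleration at each whole mph
float minTurnRadius[kMaxMph + 1];   // sharpest turn radius at each whole mph

struct Unit {
    float speed;      // current speed in mph
    float heading;    // current heading in radians
    float topSpeed;   // personal top speed, 1..28 mph
};

// One tick of the force-vector rule: if the requested turn is sharper than the
// table allows at this speed, clamp the turn and decelerate at the maximum rate;
// otherwise take the turn and accelerate if there is headroom.
void stepUnit(Unit& u, float desiredHeading, float tickLength)
{
    int   mph     = (int)std::min(u.speed, (float)kMaxMph);
    float maxTurn = (u.speed > 0.0f)
                  ? (u.speed * tickLength) / minTurnRadius[mph]  // arc angle allowed this tick
                  : 3.14159265f;                                 // standing still: turn freely
    float turn    = desiredHeading - u.heading;                  // assumed wrapped to [-pi, pi]

    if (std::fabs(turn) > maxTurn) {
        u.heading += std::copysign(maxTurn, turn);                               // sharpest turn allowed
        u.speed    = std::max(0.0f, u.speed - maxDecel[mph] * tickLength);       // and brake hard
    } else {
        u.heading += turn;
        u.speed    = std::min(u.topSpeed, u.speed + maxAccel[mph] * tickLength); // speed up if possible
    }
}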
I know the details are going to be quite difficult, but I think this is a good start.
For the pathing version, where you have a target and need to choose a force to apply, the problem is a bit different and even harder. I'm out of ideas for now, but suffice it to say that, given the example condition of a human already running away from the target at top speed, there will be a best-time path that lies somewhere between slowing just enough while turning to complete a perfect arc to the target, and stopping completely, turning in place, and running straight at it.
I'm drawing quads in openGL. My question is, is there any additional performance gain from this:
// Method #1
glBegin(GL_QUADS);
// Define vertices for 10 quads
glEnd();
... over doing this for each of the 10 quads:
// Method #2
glBegin(GL_QUADS);
// Define vertices for first quad
glEnd();
glBegin(GL_QUADS);
// Define vertices for second quad
glEnd();
//etc...
All of the quads use the same texture in this case.
Yes, the first is faster, because each extra glBegin/glEnd pair forces OpenGL to set up and tear down its immediate-mode state again.
Even better, however, than one call to glBegin and glEnd (if you have a significant number of vertices), is to pass all of your vertices with glVertexPointer (and friends), and then make one call to glDrawArrays or glDrawElements. This will send all your vertices to the GPU in one fell swoop, instead of incrementally by calling glVertex3f repeatedly.
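For example, a minimal sketch of that, assuming legacy fixed-function OpenGL and a client-side vertex array with the made-up name quadVertices (texture coordinates would be supplied the same way with glTexCoordPointer):

// On Windows, <windows.h> must be included before <GL/gl.h>.
#include <GL/gl.h>

GLfloat quadVertices[10 * 4 * 3];   // 10 quads * 4 vertices * (x, y, z), filled in elsewhere

void drawAllQuads()
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, quadVertices);   // 3 floats per vertex, tightly packed
    glDrawArrays(GL_QUADS, 0, 10 * 4);               // all 40 vertices in one call
    glDisableClientState(GL_VERTEX_ARRAY);
}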
From a function-call-overhead perspective, the second approach is more expensive: if instead of ten quads we used ten thousand, glBegin/glEnd would be called ten thousand times per frame instead of once.
More importantly, glBegin/glEnd have been deprecated as of OpenGL 3.0, and are not supported by OpenGL ES.
Instead, vertices are uploaded as vertex arrays using calls such as glDrawArrays. Tutorials and much more in-depth information can be found on the NeHe site.
I decided to go ahead and benchmark it using a loop of 10,000 quads.
The results:
Method 1: 0.0128 seconds
Method 2: 0.0132 seconds
Method #1 is slightly faster, but the improvement is marginal (about 3%), and it's probably nothing more than the overhead of simply calling more functions in Method #2. So it's likely that OpenGL itself doesn't get any additional optimization out of Method #1.
This is on Windows XP Service Pack 3, using OpenGL 2.0 and Visual Studio 2005.
I believe the answer is yes, but you should try it out yourself. Write something that draws 100k quads and see if one method is much faster. Then report your results here :)
schnaader: What the document you read means is that you should not have non-GL-related code between glBegin and glEnd. It does not mean that you should make many separate glBegin/glEnd pairs instead of batching everything into one.
I suppose that you get the highest performance gain by reusing vertices. To achieve that, you would need to maintain some structure for your primitives yourself.
You would certainly get better performance on the CPU side, simply because less code gets executed.
Whether your drawing performance on the GPU would be better depends entirely on the driver implementation for your 3D graphics card. You could get wildly different results with a different manufacturer's driver, and even with a different version of the driver for the same card.
This is what happens:
The drawGL function is called at the exact end of the frame thanks to a usleep, as suggested. This already maintains a steady framerate.
The actual presentation of the renderbuffer takes place in drawGL(). Measuring the time it takes to do this gives me fluctuating execution times, resulting in a stutter in my animation. This timer uses mach_absolute_time, so it's extremely accurate.
At the end of my frame, I measure timeDifference. Yes, it's on average 1 millisecond, but it deviates a lot, ranging from 0.8 to 1.2 milliseconds, with peaks of more than 2 milliseconds.
Example:
// Every something of a second I call tick
- (void)tick
{
    [self drawGL];
}

- (void)drawGL
{
    // startTime using mach_absolute_time;
    glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
    [context presentRenderbuffer:GL_RENDERBUFFER_OES];
    // endTime using mach_absolute_time;
    // timeDifference = endTime - startTime;
}
My understanding is that once the framebuffer has been created, presenting the renderbuffer should always take roughly the same amount of work, regardless of the complexity of the frame. Is this true? And if not, how can I prevent it?
By the way, this is an example from an iPhone app, so we're talking OpenGL ES here, though I don't think it's a platform-specific problem. If it is, then what is going on, and shouldn't this not be happening? And again, if so, how can I prevent it from happening?
The deviations you encounter may be caused by a lot of factors, including the OS scheduler kicking in and giving the CPU to another process, or similar issues. In fact, a normal human won't notice the difference between 1 ms and 2 ms render times. Motion pictures run at 25 fps, which means each frame is shown for roughly 40 ms, and that looks fluid to the human eye.
As for animation stuttering, you should examine how you maintain a constant animation speed. The most common approach I've seen looks roughly like this:
while (loop)
{
    lastFrameTime;   // time it took for the last frame to render
    timeSinceLastUpdate += lastFrameTime;

    if (timeSinceLastUpdate > (1 second / DESIRED_UPDATES_PER_SECOND))
    {
        updateAnimation(timeSinceLastUpdate);
        timeSinceLastUpdate = 0;
    }

    // do the drawing
    presentScene();
}
Or you could just pass lastFrameTime to updateAnimation every frame and interpolate between animation states. The result will be even more fluid.
If you're already using something like the above, maybe you should look for culprits in other parts of your render loop. In Direct3D, the costly things were draw-primitive calls and render-state changes, so you might want to check the OpenGL analogues of those.
My favorite OpenGL expression of all time: "implementation specific". I think it applies here very well.
A quick search for mach_absolute_time results in this article: Link
Looks like the precision of that timer on an iPhone is only 166.67 ns (and maybe worse).
While that may explain the large deviations, it doesn't explain why there is a difference at all.
The three main reasons are probably:
Different execution paths during renderbuffer presentation. A lot can happen in 1ms and just because you call the same functions with the same parameters doesn't mean the exact same instructions are executed. This is especially true if other hardware is involved.
Interrupts/other processes: there is always something else going on that distracts the CPU. As far as I know, iPhone OS is not a real-time OS, so there's no guarantee that any operation will complete within a certain time limit (and even a real-time OS will have some time variation).
Any other OpenGL calls still being processed by the GPU might delay presentRenderbuffer. That's the easiest one to test: just call glFinish() before getting the start time, as in the sketch below.
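A rough sketch of that test, assuming an Objective-C++ source file; timePresent is a made-up helper name, and the presentRenderbuffer call is shown as a comment since it's an Objective-C message on your context object:

#include <mach/mach_time.h>
#include <OpenGLES/ES1/gl.h>

// Returns roughly how long the presentation took, in milliseconds.
double timePresent()
{
    glFinish();                          // drain queued GL work first, so only
                                         // the present itself is measured
    uint64_t start = mach_absolute_time();
    // [context presentRenderbuffer:GL_RENDERBUFFER_OES];
    uint64_t end = mach_absolute_time();

    mach_timebase_info_data_t timebase;  // convert ticks to nanoseconds, then ms
    mach_timebase_info(&timebase);
    return (double)(end - start) * timebase.numer / timebase.denom / 1.0e6;
}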
It is best not to rely on a high constant frame rate for a number of reasons, the most important being that the OS may do something in the background that slows things down. It is better to sample a timer and work out how much time has passed each frame; this should ensure smooth animation.
Is it possible that the timer is not accurate at the sub-millisecond level, even though it is returning decimal values ranging from 0.8 to 2.0?