Optimizing Actionscript performance - optimization

I am setting out for a visualization project that will generate 1000+ sprites from dynamic data. The toolkit I am using (Flare) requires some optimization. I am trying to figure out some optimization techniques for Flash. How can I make Flash run fast when there are so many sprites on the stage, or maybe there is an optimization technique that doesn't involve generating so many sprites?

One good way of doing is freeze animations which are not visible to the user. But the complication with this is that, you need to remember the state from which the animation has to resume or refers the animation based on the current state of the whole application. Since you have so many sprites generated, make sure that you group them logically. This would help in easily implement the freezing logic.

Related

Choosing game model design

I need help designing a game where characters
have universal actions(sit, jump, etc.) or same across all characters; roughly 50 animations
unique attack patterns(different attacks) roughly 6 animations per character
item usage attacks(same across all characters) roughly 4 animations per item which could scale to 500+
What would be the best way to design this? I use blender for animations. And I just started a week ago.
I’m thinking of using either one model for everything and limiting actions or to create multiple and import those separately. Any help is appreciated!
Edit: also considering optimization since I don’t want lag to incur; making a mmo like game.
There is an initial release (MIT License) of the module GodotAnimationRetargeting that I referenced in comments. Update: There is a GDScript version now.
Usually in Godot you have an animation player with the animations tied to a given model. Which means you would have to add them for all the models. However, this module allows you apply animations from an animation player to another model. You can also apply them partially (e.g. only rotation, or position or scaling of bones).
That should help you have a common set of animation applied to different models.
Being a module it requires to compile Godot using it. See Compiling on the Godot docs.

Can we control progressive rendering in the Viewer based on the distance to the camera?

We have to work with very large models and we're hoping to use the first person camera to walk through them, and eventually do this in VR. The progressive rendering does wonders for improving perceived responsiveness, but it can be disorienting to have so many items around you disappear as you move.
Is there any way to turn progressive rendering off but only for objects closest to the camera? Maybe a number of objects up to a maximum number, or objects within a radius from the camera. Everything further away can load in later and flicker during motion, but it would be nice to keep the nearby objects rendered, especially structural objects like stairs. I've often walked towards stairs just to have them disappear in front of me, forcing me to fly to a platform with E and Q instead of walking.
So far I've only found a way to toggle progressive rendering for the entire model on or off with viewer.setProgressiveRendering(bool) but I haven't found a way to customize the rendering behavior.
Per our Engineering's recommendation can you try set rendering targets with
viewer.impl.setFPSTargets(1, 5, 15) //min, target, max
In fact we've been having similar reports from other developers requesting similar capability to fine tune rendering with large models so our Engineering is considering their options to extend on existing functions and even build them into extensions.

When to use VK_IMAGE_LAYOUT_GENERAL

It isn't clear to me when it's a good idea to use VK_IMAGE_LAYOUT_GENERAL as opposed to transitioning to the optimal layout for whatever action I'm about to perform. Currently, my policy is to always transition to the optimal layout.
But VK_IMAGE_LAYOUT_GENERAL exists. Maybe I should be using it when I'm only going to use a given layout for a short period of time.
For example, right now, I'm writing code to generate mipmaps using vkCmdBlitImage. As I loop through the sub-resources performing the vkCmdBlitImage commands, should I transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL as I scale down into a mip, then transition to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL when I'll be the source for the next mip before finally transitioning to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when I'm all done? It seems like a lot of transitioning, and maybe generating the mips in VK_IMAGE_LAYOUT_GENERAL is better.
I appreciate the answer might be to measure, but it's hard to measure on all my target GPUs (especially because I haven't got anything running on Android yet) so if anyone has any decent rule of thumb to apply it would be much appreciated.
FWIW, I'm writing Vulkan code that will run on desktop GPUs and Android, but I'm mainly concerned about performance on the latter.
You would use it when:
You are lazy
You need to map the memory to host (unless you can use PREINITIALIZED)
When you use the image as multiple incompatible attachments and you have no choice
For Store Images
( 5. Other cases when you would switch layouts too much (and you don't even need barriers) relatively to the work done on the images. Measurement needed to confirm GENERAL is better in that case. Most likely a premature optimalization even then.
)
PS: You could transition all the mip-maps together to TRANSFER_DST by a single command beforehand and then only the one you need to SRC. With a decent HDD, it should be even best to already have them stored with mip-maps, if that's a option (and perhaps even have a better quality using some sophisticated algorithm).
PS2: Too bad, there's not a mip-map creation command. The cmdBlit most likely does it anyway under the hood for Images smaller than half resolution....
If you read from mipmap[n] image for creating the mipmap[n+1] image then you should use the transfer image flags if you want your code to run on all Vulkan implementations and get the most performance across all implementations as the flags may be used by the GPU to optimize the image for reads or writes.
So if you want to go cross-vendor only use VK_IMAGE_LAYOUT_GENERAL for setting up the descriptor that uses the final image and not image reads or writes.
If you don't want to use that many transitions you may copy from a buffer instead of an image, though you obviously wouldn't get the format conversion, scaling and filtering that vkCmdBlitImage does for you for free.
Also don't forget to check if the target format actually supports the BLIT_SRC or BLIT_DST bits. This is independent of whether you use the transfer or general layout for copies.

Cocos2d moving nodes is choppy

In my upcoming iPhone game different scene elements are split up into their own CCNode.
My Obstacle node contains many nodes, each representing an obstacle. Inside every obstacle node are the images that make up the obstacle (1 - 4 images), and there are only ~10 obstacles at a time. Every update my game calls the update function in the Obstacle node, which moves every obstacle to the left. But this slows down my game quite a bit.
At the same time, I have a particle node that just contains images and moves them all every frame exactly the same way the Obstacle node does, but it has no noticeable effect on performance. But it has hundreds of images at a time.
My question is why do the obstacles slow it down so much but the particles don't? I have even tried replacing the images used in the obstacles with the ones in the particles and it makes no (noticeable) difference. Would it be that there is another level of child nodes?
You will dramatically increase the app's performance, run speed, frame rate and more if you put all your images in a texture atlas and rendering them once as a batch using the CCSpriteBatchNode class. If you are moving lots of objects around on the screen a lot, this makes the hardware work a lot less.
Using this class is easy. Create the class with a texture atlas that contains all your images, and then add this class as a child to your layer, just as you would a sprite.
However, when you create sprites, add them as children to this batch node, not as children to the layer.
It's very easy and will probably help you quite a lot here.
From what I recall of the Cocos2d documentation, particles are intended to be VERY lightweight so you can have many, many of them on screen at once. Nodes are heavier, require more processing between frames as they interact with the physics system and requiring node-specific rendering. The last time I looked at the render loop code, it was basically O(n) based on the number of CCnodes you had in a scene. Using NSTimers versus Cocos' built in run loop also makes quite a bit of difference in performance.
Could you provide an example of something that slows down a lot? Exactly how do you update these Obstacles?
The cocos2d documentation has some best practices that all, in one way or another, touch on performance. There's a LOT you can do to optimize your frames per second.
In general, when your code is slow, it helps to use Instruments.app to figure out where your code is spending so much time. Since you're using a framework this will be less helpful because you'll end up finding out what functions your code spends a lot of time in, and then figure out how to reduce that via the framework's best practices or other optimizations. There are a few good blog posts on improving performance, I found this one very helpful.

Per frame optimization for large datasets

Summary
New to iPhone programming, I'm having trouble picking the right optimization strategy to filter a set of view components in a scrollview with huge content. In what area would my app gain the most performance?
Introduction
My current iPad app-in-progress let's users explore fairly large binary tree structures. The trees contain between 30 to 900 nodes, and when drawing inside a scrollview (with limited zoom) it looks like this.
The nodes' contents are stored in a SQLite backed Core Data model. It's a binary tree and if a node has children, there are always exactly two. The x and y positions are part of the model, as are the dimensions of the node connections, shown as dotted lines.
Optimization
Only about 50 nodes fit the screen at any given time. With the largest trees containing up to 900 nodes, it's not possible to put everything in a scrollview controlled and zooming UIView, that's a recipe for crashes. So I have to do per frame filtering of the nodes.
And that's where my troubles start. I don't have the experience to make a well founded decision between the possible filtering options, and in addition I probably don't know about that really fast special magic buried deep in Objective-C or Cocoa Touch. Because the backing store is close to 200 MB in size (some 90.000 nodes in hundreds of trees), it's very time consuming to test every single tree on the iPad device. Which is why I'd like to ask you guys for advice.
For all my attempts I'm putting a filter method in the scrollViewDidScroll: and scrollViewDidZoom:. I'm also blocking the main thread with the filter, because I can't show the content without the nodes anyway. But maybe someone has an idea in that area?
Because all the positioning is present in the Core Data model, I might use NSFetchRequest to do the filtering. Is that really fast though? I have the idea it's not a very optimized method.
From what I've tried, the faulted managed objects seem to fit in memory at once, but it might be tricky for the larger trees once their contents start firing faults. Is it a good idea to loop over the NSSet of nodes and see what items should be on screen?
Are there other tricks to gain performance? Would you see ways where I could use multi threading to get the display set faster, even though the model's context was created on the main thread?
Thanks for your advice,
EP.
Ironically your binary tree could be divided using Binary Space Partitioning done in 2D so rendering will be very fast performant and a number of check close to minimum necessary.