Per frame optimization for large datasets - objective-c

Summary
New to iPhone programming, I'm having trouble picking the right optimization strategy to filter a set of view components in a scrollview with huge content. In what area would my app gain the most performance?
Introduction
My current iPad app-in-progress lets users explore fairly large binary tree structures. The trees contain between 30 and 900 nodes, and when drawn inside a scrollview (with limited zoom) it looks like this.
The nodes' contents are stored in a SQLite backed Core Data model. It's a binary tree and if a node has children, there are always exactly two. The x and y positions are part of the model, as are the dimensions of the node connections, shown as dotted lines.
Optimization
Only about 50 nodes fit on the screen at any given time. With the largest trees containing up to 900 nodes, it's not possible to put everything in a scrollview-controlled, zooming UIView; that's a recipe for crashes. So I have to do per-frame filtering of the nodes.
And that's where my troubles start. I don't have the experience to make a well-founded decision between the possible filtering options, and in addition I probably don't know about that really fast special magic buried deep in Objective-C or Cocoa Touch. Because the backing store is close to 200 MB in size (some 90,000 nodes in hundreds of trees), it's very time-consuming to test every single tree on the iPad device. Which is why I'd like to ask you guys for advice.
For all my attempts I'm calling a filter method from scrollViewDidScroll: and scrollViewDidZoom:. I'm also blocking the main thread with the filter, because I can't show the content without the nodes anyway. But maybe someone has an idea in that area?
Because all the positioning is present in the Core Data model, I might use NSFetchRequest to do the filtering. Is that really fast though? I have the idea it's not a very optimized method.
From what I've tried, the faulted managed objects seem to fit in memory at once, but it might be tricky for the larger trees once their contents start firing faults. Is it a good idea to loop over the NSSet of nodes and see what items should be on screen?
Are there other tricks to gain performance? Do you see ways I could use multithreading to get the display set faster, even though the model's context was created on the main thread?
Thanks for your advice,
EP.

Ironically, your binary tree could itself be divided using binary space partitioning (BSP) in 2D. Rendering would then be fast, and the number of visibility checks per frame would be close to the minimum necessary.
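To make the partitioning idea concrete, here is a minimal sketch in Python rather than Objective-C. It assumes an axis-aligned BSP (a k-d tree) built once over the stored node positions, and a visible rectangle given as (min_x, min_y, max_x, max_y). The names KDNode, build and query are made up for illustration; nothing here comes from Core Data or UIKit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class KDNode:
    point: Point
    axis: int                   # 0 splits on x, 1 splits on y
    left: Optional["KDNode"]    # points with a smaller-or-equal coordinate
    right: Optional["KDNode"]   # points with a larger-or-equal coordinate


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively split the points at the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build(ordered[:mid], depth + 1),
        right=build(ordered[mid + 1:], depth + 1),
    )


def query(node: Optional[KDNode], rect: Rect, out: List[Point]) -> List[Point]:
    """Collect every point inside rect, skipping halves that cannot overlap it."""
    if node is None:
        return out
    min_x, min_y, max_x, max_y = rect
    x, y = node.point
    if min_x <= x <= max_x and min_y <= y <= max_y:
        out.append(node.point)
    lo, hi = (min_x, max_x) if node.axis == 0 else (min_y, max_y)
    split = node.point[node.axis]
    if lo <= split:   # the rect reaches into the smaller half
        query(node.left, rect, out)
    if hi >= split:   # the rect reaches into the larger half
        query(node.right, rect, out)
    return out
```

The tree would be built once per loaded binary tree, and each scroll or zoom callback would then call query with the current visible rectangle instead of looping over all nodes.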

Related

In Unity Combine Meshes Vs Instance Objects the Difference

I am in serious need of optimizing some of my Unity projects. I have many objects imported from 3ds Max, so I am wondering whether combining the meshes would have any effect on memory or performance, or whether I should leave the objects as instances of each other, since that would save me some space.
This raises the question of what the difference is between combined mesh objects and instanced objects. Knowing the difference, and which is better, would save a lot of memory and hassle.
Looking forward to some brief information about the two.
Thanks
Combining is useful if you have a lot of unique assets that only appear once or twice in a scene, e.g. unique buildings in a 3D FPS, but not cloned houses in a SimCity-style game. If you have a model that appears many times in a scene, it's more performant to have Unity batch them automatically, which is Unity's default behaviour. For example, say your scene is an art gallery: if the gallery contains a dozen distinct sculptures, then combine them. If it contains a dozen copies of the same sculpture, don't bother; Unity will batch them for you.
However, be wary of using different materials, because each material adds to the draw call count. So if you had 10 copies of the same model but used 5 different materials, it's going to be expensive. The way round this is to use a texture atlas with a single material, with different UV mapping for each model. This means you can have a lot of different-looking models but still save on render time thanks to the single material.
Also, be aware that transparent shaders are much more expensive than opaque ones; if you have three semi-transparent objects in front of each other, that's at least 4 render passes.
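The UV remapping behind a texture atlas is plain arithmetic. Here is a minimal, engine-agnostic sketch in Python, not Unity API code. It assumes the atlas is a uniform grid of equally sized tiles and that each model's original UVs lie in the 0..1 range; atlas_region and remap_uvs are hypothetical helper names.

```python
from typing import List, Tuple

UV = Tuple[float, float]
Region = Tuple[float, float, float, float]  # (u_offset, v_offset, u_scale, v_scale)


def atlas_region(col: int, row: int, cols: int, rows: int) -> Region:
    """Return the offset and scale of one tile in a uniform cols x rows atlas."""
    u_scale = 1.0 / cols
    v_scale = 1.0 / rows
    return (col * u_scale, row * v_scale, u_scale, v_scale)


def remap_uvs(uvs: List[UV], region: Region) -> List[UV]:
    """Squeeze 0..1 UVs into the tile so the model samples only its own texture."""
    u0, v0, su, sv = region
    return [(u0 + u * su, v0 + v * sv) for u, v in uvs]


# A unit quad that should use the tile in column 1, row 0 of a 2 x 2 atlas.
quad = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
remapped = remap_uvs(quad, atlas_region(1, 0, 2, 2))
```

Every model remapped this way can share the one atlas material, which is what keeps the draw call count down.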
As you probably know this is a complex subject with a lot of variables (many more than I can describe here) and is best judged by using the profiler.
Here are some general rules of thumb I've learned while creating a game for mobile which naturally is performance critical:
Use as few materials as possible
Use as few textures as possible, and share textures between materials
Recycle models as often as possible. Often a model oriented at a different angle or in a different material can look like a whole new model to the player, particularly if their attention is elsewhere in the game
Use LODs extensively
Ensure your models are clean; remove all unnecessary vertices before importing
After importing, think about whether anything in the model could be represented with fewer vertices
Good use of normal mapping can pay off, depending on the platform. If you can trade in 1000 verts for a 256 px normal map and 50 verts then do it; otherwise don't bother normal mapping just to save a few verts
I created a tutorial that explains draw calls, static batching, lightmapping etc.
https://www.youtube.com/watch?v=x0t2xylbTo8&t=253s

Best data structure for insert/update/delete and search of 2d point

I have an infrastructure for massive numbers of entities on a GIS map, using the graphics card for drawing (WebGL).
Today, I'm using a quad-tree for indexing and querying the data, for selection and drawing purposes.
Recently I found a way to let my users update the locations of the entities and draw the change very fast. For example, updating 15000 locations and redrawing took less than 0.05 ms.
The problem now is updating my data structure. It is very, very slow.
I've gone over many data structures, such as R-tree, B-tree and more, but have not yet found a satisfactory result.
My question is
What is the optimal data structure for 2D points from a performance point of view, for inserting/updating and querying (by distance from a point, or by rectangle)?
Maybe there is a WebGL solution for this?
"what is the optimal data structure for 2d points from performances point of view, for inserting/updating and query (by distance from point, rectangle) ?"
It's hard to find a data structure that satisfies all of these beautifully. Typically for searches to be fastest, insertion has to be slower, for insertion to be faster, searches have to be slower, so it's a balancing act.
However, given your needs, a simple NxM grid might actually be ideal. I come originally from a gaming background where sprites can be moving at every single frame. Quad-trees are often too expensive for such dynamic content that wants to update the structure every single frame. As a result, a simple NxM grid often works better for those cases.
The NxM grid can do a rectangular query potentially even faster than the quad-tree, as it simply has to determine which grid cells overlap the rectangle and then check the elements within those cells to see if they overlap or are inside the rectangle. The same goes for a search within a radius (circle).
Insertion and removal is dirt cheap since we only need to update the cells in which the elements overlap. If you're dealing with points, a point only belongs in one cell, and it becomes an extremely simple constant-time operation. In fact, if each point stores a list pointer for the next element, moving it from one cell to another (along with removing it from a cell and inserting it to a cell) only requires changing pointers around -- no memory allocation required. The grid then becomes a grid of head pointers for a singly-linked list.
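The grid of list heads described above can be sketched as follows. This is a minimal illustration in Python, not a tuned implementation. It assumes points are identified by integer ids below a fixed capacity and that positions outside the world are clamped into the edge cells. The prev links are an addition beyond the singly-linked list described above, made here so that unlinking is constant-time; PointGrid and its method names are invented for the example.

```python
from typing import List, Tuple


class PointGrid:
    """Uniform grid; each cell stores the head of a linked list of point ids."""

    def __init__(self, width: float, height: float, cell: float, capacity: int):
        self.cell = cell
        self.cols = int(width // cell) + 1
        self.rows = int(height // cell) + 1
        self.head = [-1] * (self.cols * self.rows)  # first id per cell, -1 if empty
        self.next = [-1] * capacity                 # links indexed by point id
        self.prev = [-1] * capacity
        self.pos: List[Tuple[float, float]] = [(0.0, 0.0)] * capacity
        self.cell_of = [-1] * capacity              # -1 means "not in the grid"

    def _col(self, x: float) -> int:
        return min(max(int(x // self.cell), 0), self.cols - 1)

    def _row(self, y: float) -> int:
        return min(max(int(y // self.cell), 0), self.rows - 1)

    def _unlink(self, i: int) -> None:
        c, p, n = self.cell_of[i], self.prev[i], self.next[i]
        if p != -1:
            self.next[p] = n
        else:
            self.head[c] = n
        if n != -1:
            self.prev[n] = p

    def _link(self, i: int, c: int) -> None:
        h = self.head[c]
        self.prev[i], self.next[i] = -1, h
        if h != -1:
            self.prev[h] = i
        self.head[c] = i
        self.cell_of[i] = c

    def insert(self, i: int, x: float, y: float) -> None:
        self.pos[i] = (x, y)
        self._link(i, self._row(y) * self.cols + self._col(x))

    def move(self, i: int, x: float, y: float) -> None:
        """Update a position; relink only when the point changes cell."""
        self.pos[i] = (x, y)
        c = self._row(y) * self.cols + self._col(x)
        if c != self.cell_of[i]:
            self._unlink(i)
            self._link(i, c)

    def remove(self, i: int) -> None:
        self._unlink(i)
        self.cell_of[i] = -1

    def query_rect(self, min_x: float, min_y: float,
                   max_x: float, max_y: float) -> List[int]:
        """Visit only the overlapped cells, then test each point exactly."""
        out: List[int] = []
        for cy in range(self._row(min_y), self._row(max_y) + 1):
            for cx in range(self._col(min_x), self._col(max_x) + 1):
                i = self.head[cy * self.cols + cx]
                while i != -1:
                    x, y = self.pos[i]
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        out.append(i)
                    i = self.next[i]
        return out
```

A radius query would follow the same pattern: visit the cells overlapped by the circle's bounding box and replace the rectangle test with a squared-distance test.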
As for another approach: I imagine you're currently using one massive quad-tree for the entire world. You can use a hybrid of these techniques by having each quad-tree represent a small region, with the quad-trees tiled together to form a grid of quad-trees. This can significantly reduce the cost of updating a quad-tree for a specific region, since each quad-tree stores only the data for that region.

Cocos2d moving nodes is choppy

In my upcoming iPhone game different scene elements are split up into their own CCNode.
My Obstacle node contains many nodes, each representing an obstacle. Inside every obstacle node are the images that make up the obstacle (1 - 4 images), and there are only ~10 obstacles at a time. Every update my game calls the update function in the Obstacle node, which moves every obstacle to the left. But this slows down my game quite a bit.
At the same time, I have a particle node that just contains images and moves them all every frame exactly the same way the Obstacle node does, but it has no noticeable effect on performance. But it has hundreds of images at a time.
My question is why do the obstacles slow it down so much but the particles don't? I have even tried replacing the images used in the obstacles with the ones in the particles and it makes no (noticeable) difference. Would it be that there is another level of child nodes?
You will dramatically increase the app's performance, run speed, frame rate and more if you put all your images in a texture atlas and render them once as a batch using the CCSpriteBatchNode class. If you are moving lots of objects around on the screen, this makes the hardware work a lot less.
Using this class is easy. Create the class with a texture atlas that contains all your images, and then add this class as a child to your layer, just as you would a sprite.
However, when you create sprites, add them as children to this batch node, not as children to the layer.
It's very easy and will probably help you quite a lot here.
From what I recall of the Cocos2d documentation, particles are intended to be VERY lightweight so you can have many, many of them on screen at once. Nodes are heavier: they require more processing between frames as they interact with the physics system, and they require node-specific rendering. The last time I looked at the render loop code, it was basically O(n) in the number of CCNodes you had in a scene. Using NSTimers versus Cocos' built-in run loop also makes quite a bit of difference in performance.
Could you provide an example of something that slows down a lot? Exactly how do you update these Obstacles?
The cocos2d documentation has some best practices that all, in one way or another, touch on performance. There's a LOT you can do to optimize your frames per second.
In general, when your code is slow, it helps to use Instruments.app to figure out where your code is spending so much time. Since you're using a framework this will be somewhat less helpful: you'll end up finding out which functions your code spends a lot of time in, and then have to figure out how to reduce that via the framework's best practices or other optimizations. There are a few good blog posts on improving performance; I found this one very helpful.

Efficient way to handle large runtime-generated tile maps?

I am coding a 2 dimensional, tile based (orthogonal tiles) iPhone game. All levels are procedurally generated when the app is first played, and then persist until the user wants a new map. Maps are rather large, being 1000 tiles in both width and height, and the terrain is destructible. At the moment it is rather similar to Terraria, but that will change.
To hold map/tile information I am currently using several two-dimensional C-style arrays. This works well, but I am concerned about the amount of memory this takes up, as the arrays are all defined as short array[1000][1000], each of which takes up (1000 * 1000 * sizeof(short)) bytes of space.
This is not particularly desirable when the iPhone doesn't have an incredibly large amount of memory to work with, especially when the user is multitasking. The main problem is that there is no way that I can use a specific tile map format such as .tmx, because all the levels are procedurally generated. Performance could also be an issue, because if a tile is destroyed at index(x, y), then I need to change the data in that index. I have also thought about writing tile map data to a text file, but I think there would be difficulties or performance issues when accessing or changing data.
Keeping all this in mind, what would be an efficient and fast way to handle my tile data?
My gut feeling on this is Core Data structured such that each tile element has relationships to the tiles around it. There's some non-trivial overhead here, but the advantage is that you can release tiles that aren't onscreen from memory and fault them back when you need them. As you move in a direction, you can query for the tiles in that direction, and you can fairly cheaply dump memory when you're in the background. This would get rid of the "several" 2D arrays and move all the data into a single object. In principle, the grid could be infinite in size this way, since everything is by relationship rather than coordinate.
You could similarly approach the problem using SQLite, querying for rows and columns in a given range. You might mark the objects as NSDiscardableContent and put them in an NSCache, which could dramatically improve memory performance. You could still generate an effectively-infinite grid as long as you allow coordinates to be both positive and negative.
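The idea of keeping only nearby tiles in memory and faulting the rest back on demand can be sketched without Core Data or SQLite. The following Python sketch is an analogy only, not the NSCache or NSDiscardableContent API. It assumes the map is cut into square chunks of 32 x 32 tiles (an arbitrary size), uses a plain dict as a stand-in for the on-disk store, and evicts the least recently used chunk; ChunkedTileMap is a hypothetical name.

```python
from array import array
from collections import OrderedDict
from typing import Dict, Tuple

CHUNK = 32  # tiles per chunk side (assumed value, tune for the game)
Key = Tuple[int, int]


class ChunkedTileMap:
    """Keeps at most max_loaded chunks in memory; the rest live in a backing store."""

    def __init__(self, max_loaded: int):
        self.max_loaded = max_loaded             # must be at least 1
        self.loaded: "OrderedDict[Key, array]" = OrderedDict()
        self.store: Dict[Key, bytes] = {}        # stand-in for disk or SQLite

    def _chunk(self, key: Key) -> array:
        chunk = self.loaded.get(key)
        if chunk is not None:
            self.loaded.move_to_end(key)         # mark as most recently used
            return chunk
        chunk = array("h")                       # signed shorts, like short[][]
        if key in self.store:
            chunk.frombytes(self.store[key])     # "fault" the chunk back in
        else:
            chunk.extend([0] * (CHUNK * CHUNK))  # untouched chunk: all zeros
        self.loaded[key] = chunk
        while len(self.loaded) > self.max_loaded:
            old_key, old_chunk = self.loaded.popitem(last=False)
            self.store[old_key] = old_chunk.tobytes()  # persist before dropping
        return chunk

    def get(self, x: int, y: int) -> int:
        cx, lx = divmod(x, CHUNK)                # works for negative coordinates
        cy, ly = divmod(y, CHUNK)
        return self._chunk((cx, cy))[ly * CHUNK + lx]

    def set(self, x: int, y: int, value: int) -> None:
        cx, lx = divmod(x, CHUNK)
        cy, ly = divmod(y, CHUNK)
        self._chunk((cx, cy))[ly * CHUNK + lx] = value
```

Because chunks are keyed by coordinate pairs rather than array indices, negative coordinates work too, which matches the effectively-infinite grid mentioned above. Destroying a tile is still a single indexed write into the chunk that holds it.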

How can I speed up this 3D grid-based rendering system?

I have recently been developing an isometric rendering system to map out 3D grids in JavaScript. All of the items on the grid are cubes of equal dimensions; the only difference between them is a texture that represents the value for that coordinate. My application requires large grids to be graphed, even though only a small portion is visible in the viewport at once.
Because I am using Canvas, which is slow to draw thousands of shapes per frame, I set my script to loop through each block but only draw its faces if they are 1.) next to an empty grid space and 2.) inside the viewport. This system works fine for smaller grids, but as my application will need considerably larger ones (1000+ x 1000+ x 128), I will need to add some performance improvements for the final product.
Does anyone that has worked with rendering systems know any way I can further optimize my engine? One thing that I guess may be effective will be trying to not loop through each grid value, even if it is not being drawn. However, I do not know the most efficient way to know whether to loop through a grid value or not (I am currently going through EVERY value, then calculating whether it should be drawn).
If I have been too vague, please tell me and I will be happy to elaborate. Thank you for your time and expertise; I am a student and any help will greatly aid my learning.
Some pointers for you: you might want to have a look at classic culling algorithms using things like octrees (or quadtrees in your case), ...
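Before reaching for a tree, one simple way to avoid visiting every grid value is to compute which index range the viewport can possibly cover and loop over only that. The sketch below is in Python rather than JavaScript and makes a simplifying assumption: the viewport has already been transformed into grid space, where cell (i, j) covers an axis-aligned square of side cell_size. For an isometric projection that transform has to be applied first and is not shown. visible_range and visible_cells are hypothetical names.

```python
from typing import Iterator, Tuple


def visible_range(view_min: float, view_max: float,
                  cell_size: float, count: int) -> range:
    """Indices of the cells along one axis that overlap [view_min, view_max]."""
    first = max(int(view_min // cell_size), 0)
    last = min(int(view_max // cell_size), count - 1)
    return range(first, last + 1)  # empty when the view misses the grid


def visible_cells(view: Tuple[float, float, float, float],
                  cell_size: float, cols: int, rows: int) -> Iterator[Tuple[int, int]]:
    """Yield (i, j) only for cells that can appear in the view rectangle."""
    min_x, min_y, max_x, max_y = view
    for j in visible_range(min_y, max_y, cell_size, rows):
        for i in visible_range(min_x, max_x, cell_size, cols):
            yield (i, j)


# A 1000 x 1000 grid of 10-unit cells, with a small view near the origin.
cells = list(visible_cells((25.0, 5.0, 47.0, 19.0), 10.0, 1000, 1000))
```

The loop cost then depends on the viewport size rather than the grid size, and the existing "next to an empty grid space" test only runs on the handful of cells that remain.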