Efficient way to handle large runtime-generated tile maps? - objective-c

I am coding a 2 dimensional, tile based (orthogonal tiles) iPhone game. All levels are procedurally generated when the app is first played, and then persist until the user wants a new map. Maps are rather large, being 1000 tiles in both width and height, and the terrain is destructible. At the moment it is rather similar to Terraria, but that will change.
To hold map/tile information I am currently using several two-dimensional C-style arrays. This works well, but I am concerned about the amount of memory this takes up: each array is defined as short array[1000][1000], which takes up 1000 * 1000 * sizeof(short) bytes, or about 2 MB per array with a 2-byte short.
This is not particularly desirable when the iPhone doesn't have an incredibly large amount of memory to work with, especially when the user is multitasking. The main problem is that there is no way that I can use a specific tile map format such as .tmx, because all the levels are procedurally generated. Performance could also be an issue, because if a tile is destroyed at index(x, y), then I need to change the data in that index. I have also thought about writing tile map data to a text file, but I think there would be difficulties or performance issues when accessing or changing data.
Keeping all this in mind, what would be an efficient and fast way to handle my tile data?

My gut feeling on this is Core Data structured such that each tile element has relationships to the tiles around it. There's some non-trivial overhead here, but the advantage is that you can release tiles that aren't onscreen from memory and fault them back when you need them. As you move in a direction, you can query for the tiles in that direction, and you can fairly cheaply dump memory when you're in the background. This would get rid of the "several" 2D arrays and move all the data into a single object. In principle, the grid could be infinite in size this way, since everything is by relationship rather than coordinate.
You could similarly approach the problem using SQLite, querying for rows and columns in a given range. You might mark the objects as NSDiscardableContent and put them in an NSCache, which could dramatically improve memory performance. You could still generate an effectively-infinite grid as long as you allow coordinates to be both positive and negative.
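A minimal sketch of the SQLite idea, written in Python with the standard sqlite3 module purely for illustration (on iOS you would go through the SQLite C API or a wrapper). The table layout and the function names open_map, set_tile and tiles_in_range are assumptions of this sketch, not something from the question. Tiles are keyed by (x, y), so negative coordinates work and only the rows in the requested range are loaded.

```python
import sqlite3

def open_map(path=":memory:"):
    """Open (or create) a tile store. The schema here is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        " x INTEGER NOT NULL, y INTEGER NOT NULL, kind INTEGER NOT NULL,"
        " PRIMARY KEY (x, y))"
    )
    return db

def set_tile(db, x, y, kind):
    # INSERT OR REPLACE overwrites the row if the tile already exists,
    # which is how a destroyed tile gets its new value.
    db.execute(
        "INSERT OR REPLACE INTO tiles (x, y, kind) VALUES (?, ?, ?)",
        (x, y, kind),
    )

def tiles_in_range(db, x0, y0, x1, y1):
    """Return {(x, y): kind} for the inclusive rectangle x0..x1, y0..y1."""
    rows = db.execute(
        "SELECT x, y, kind FROM tiles"
        " WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?",
        (x0, x1, y0, y1),
    )
    return {(x, y): kind for x, y, kind in rows}

db = open_map()
set_tile(db, -3, 5, 2)    # negative coordinates are fine
set_tile(db, 0, 0, 1)
set_tile(db, 40, 40, 7)   # outside the range queried below
set_tile(db, 0, 0, 0)     # tile at (0, 0) destroyed
visible = tiles_in_range(db, -5, -5, 10, 10)
```

The composite primary key doubles as the index for the range query. Whether this is fast enough per frame would have to be measured on the device; caching the visible region in memory and only querying when the view crosses a region boundary is one way to keep the database off the hot path.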

Related

Best data structure for insert/update/delete and search of 2d point

I have an infrastructure for a massive number of entities on a GIS map, using the graphics card for drawing (WebGL).
Today I'm using a quad-tree for indexing and querying the data, for selection and drawing purposes.
Recently I found a way to let my users update the location of the entities and draw the change very fast. For example, updating 15000 locations and redrawing took less than 0.05 ms.
The problem now is updating my data structure. It is very, very slow.
I've gone through many data structures, such as R-tree, B-tree and more, but have not yet found a satisfactory result.
My question is:
What is the optimal data structure for 2D points, from a performance point of view, for inserting/updating and querying (by distance from a point, or by rectangle)?
Maybe there is a WebGL solution for this?
what is the optimal data structure for 2d points from performances point of view, for inserting/updating and query (by distance from point, rectangle) ?
It's hard to find a data structure that satisfies all of these beautifully. Typically for searches to be fastest, insertion has to be slower, for insertion to be faster, searches have to be slower, so it's a balancing act.
However, given your needs, a simple NxM grid might actually be ideal. I come originally from a gaming background where sprites can be moving at every single frame. Quad-trees are often too expensive for such dynamic content that wants to update the structure every single frame. As a result, a simple NxM grid often works better for those cases.
The NxM grid can do a rectangular query potentially even faster than the quad-tree, as it simply has to determine which grid cells overlap the rectangle and then check the elements within those cells to see whether they overlap or are inside the rectangle. The same goes for a search within a given radius (a circle).
Insertion and removal is dirt cheap since we only need to update the cells in which the elements overlap. If you're dealing with points, a point only belongs in one cell, and it becomes an extremely simple constant-time operation. In fact, if each point stores a list pointer for the next element, moving it from one cell to another (along with removing it from a cell and inserting it to a cell) only requires changing pointers around -- no memory allocation required. The grid then becomes a grid of head pointers for a singly-linked list.
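The grid of head pointers described above can be sketched roughly as follows. This is an illustration in Python, not code from the answer: the names Point, Grid, move and query_rect are made up for the example, and points outside the grid are simply clamped to the edge cells.

```python
class Point:
    """A point that stores its own 'next' link (an intrusive list node)."""
    __slots__ = ("x", "y", "next")

    def __init__(self, x, y):
        self.x, self.y, self.next = x, y, None

class Grid:
    def __init__(self, cols, rows, cell_size):
        self.cols, self.rows, self.cell_size = cols, rows, cell_size
        self.heads = [None] * (cols * rows)  # one list head per cell

    def _cell_xy(self, x, y):
        # Clamp so that out-of-range points land in an edge cell.
        cx = min(max(int(x // self.cell_size), 0), self.cols - 1)
        cy = min(max(int(y // self.cell_size), 0), self.rows - 1)
        return cx, cy

    def _cell(self, x, y):
        cx, cy = self._cell_xy(x, y)
        return cy * self.cols + cx

    def insert(self, p):
        i = self._cell(p.x, p.y)
        p.next = self.heads[i]        # push onto the front of the cell's list
        self.heads[i] = p

    def remove(self, p):
        # With a singly-linked list this walks the cell's (short) list.
        i = self._cell(p.x, p.y)
        prev, cur = None, self.heads[i]
        while cur is not None and cur is not p:
            prev, cur = cur, cur.next
        if cur is None:
            return                    # point was not in the grid
        if prev is None:
            self.heads[i] = cur.next
        else:
            prev.next = cur.next
        p.next = None

    def move(self, p, x, y):
        if self._cell(x, y) != self._cell(p.x, p.y):
            self.remove(p)            # only relink when the cell changes
            p.x, p.y = x, y
            self.insert(p)
        else:
            p.x, p.y = x, y

    def query_rect(self, x0, y0, x1, y1):
        """Points inside the inclusive rectangle (assumes x0<=x1, y0<=y1)."""
        cx0, cy0 = self._cell_xy(x0, y0)
        cx1, cy1 = self._cell_xy(x1, y1)
        found = []
        for cy in range(cy0, cy1 + 1):
            for cx in range(cx0, cx1 + 1):
                p = self.heads[cy * self.cols + cx]
                while p is not None:
                    if x0 <= p.x <= x1 and y0 <= p.y <= y1:
                        found.append(p)
                    p = p.next
        return found
```

Insertion is a constant-time push. Removal here walks the cell's list, which stays cheap as long as the cell size keeps the lists short; a doubly-linked list would make removal constant-time at the cost of one more link per point.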
Another approach: I imagine you're currently using one massive quad-tree for the entire world. You can use a hybrid of these techniques by letting each quad-tree represent only a small region and tiling those quad-trees together to form a grid of quad-trees. This can significantly reduce the cost of updating a quad-tree for a specific region, since each quad-tree stores only the data for that region.

Best practice for simple DirectX overlay rendering

I'm creating a DirectX 11 game that renders complex meshes in 3D space. I'm using vertex/index buffers/shaders and this all works fine. However I now want to perform some basic 'overlay' rendering - more specifically, I want to render wireframe boxes in 3D space to show the bounds of a particular area. There would only ever be one or two boxes in view at any one time, and their vertices would change position each frame.
I've therefore been searching for simpler DX11 rendering methods but most articles I find still prepare a vertex/index buffer for very simple rendering. I know that hardware is well optimised for processing vertex streams, but is the overhead of building and filling a vertex buffer every frame just to process 8 vertices really the most efficient method?
My question is therefore, what is the most efficient method for performing this very simple rendering in DX11? Is there any more primitive method ("DrawLine", "DrawLineList(D3DXVECTOR3[])", ...) that would be a better solution? It could be less efficient per-vertex than the standard method of passing vertex buffers because it's only ever going to be used for a handful of vertices per frame.
Thanks in advance
Rob
You should create a single vertex / index buffer for each primitive Shape (box, sphere, ...) and use transformation matrix to place it correctly in the world.
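To illustrate the transformation-matrix idea for a box, here is a small sketch in Python rather than DirectX code. It assumes the bounds are axis-aligned and that the shared vertex buffer holds a unit cube centred on the origin; box_world_matrix and transform are hypothetical helper names. The matrix uses the row-vector layout (translation in the last row), which is the convention DirectXMath follows.

```python
from itertools import product

# The eight corners of the shared unit cube, built once.
UNIT_CUBE = list(product((-0.5, 0.5), repeat=3))

def box_world_matrix(bmin, bmax):
    """4x4 row-major matrix mapping the unit cube onto the box bmin..bmax."""
    sx, sy, sz = (bmax[i] - bmin[i] for i in range(3))        # scale = size
    cx, cy, cz = ((bmax[i] + bmin[i]) / 2 for i in range(3))  # centre
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [cx, cy, cz, 1.0],   # translation lives in the last row
    ]

def transform(point, m):
    """Apply the matrix to a point, treating it as the row vector (x, y, z, 1)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(v[k] * m[k][j] for k in range(4)) for j in range(3))

world = box_world_matrix((1.0, 2.0, 3.0), (3.0, 6.0, 4.0))
corners = [transform(c, world) for c in UNIT_CUBE]
```

With this approach only the 16 matrix values change per frame (typically through a constant buffer), while the eight cube vertices and the line indices stay untouched on the GPU.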

Strange thing in memory management for ios development

I have an app on my iPod.
1. Open the app and look at the memory in Instruments (Activity Monitor): it's 8.95M.
2. Click a button, which adds a UIImageView with a large image to the screen: the memory is now 17.8M.
3. Remove the UIImageView from the screen and wait a second: the memory is now 9.09M.
I am sure the UIImageView is released after it is removed from the screen. It's very simple code.
So once it is removed, the state of the app should be the same as before the UIImageView was added to the screen, am I right? But why is the memory 9.09M rather than 8.95M? If you add a more complex view to the screen, the difference is more obvious.
This is normal. It's due to a "lazy grow, lazy shrink" algorithm. What that means is that you have a data structure that can be sized for small numbers of items or large numbers of items. The sizing for small numbers of items uses very little memory but isn't efficient when handling large numbers of items. The sizing for large numbers is very efficient for managing large collections of things, but uses more memory to index the objects.
A "lazy grow, lazy shrink" algorithm tries to avoid the cost of resizing a structure's index by only growing the index if it's much too small and only shrinking it if it's much too big. For example, a typical algorithm might grow the index only if its ideal size is at least three times bigger than it is and shrink it only if it's more than three times its ideal size. This is also needed to prevent large numbers of resize operations if an application rapidly allocates and frees collections of resources -- you want the index size to be a bit 'sticky'.
When you open the large object and consume GUI objects, you make the index much too small, and it grows. But when you close the large object, you make the index only a bit too big, so it doesn't shrink.
If the device comes under memory pressure, the index will shrink. If the application continues to reduce its use of UI resources, the index will shrink. If the application uses more UI resources, the index will not need to grow again quite as soon.
A good analogy might be stacks of paper on your desk. If you have 30 papers you might need to find, you might keep them in 4 stacks. But if you have 5,000 papers, 4 stacks will make searching tedious. You'll need more stacks in that case. So when the number of papers gets too big for 4 stacks, you need to re-index into a greater number of stacks. But then when the number gets small, you won't bother to constantly re-index until you have way too many stacks, because searching is still pretty fast.
When you're done handling all those papers, your desk has a few extra stacks. That saves it from re-indexing the next time it needs to handle a lot of papers.
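The factor-of-three rule from the example can be played through with a toy model. This is only a sketch of the hysteresis idea, not how iOS actually manages its internal structures; LazyIndex and its fields are invented for the illustration.

```python
class LazyIndex:
    """Toy index whose capacity changes only when it is off by a factor of 3."""
    FACTOR = 3
    MIN_CAPACITY = 4

    def __init__(self):
        self.capacity = self.MIN_CAPACITY
        self.count = 0
        self.resizes = 0

    def _maybe_resize(self):
        ideal = max(self.MIN_CAPACITY, self.count)
        too_small = ideal >= self.capacity * self.FACTOR
        too_big = self.capacity > ideal * self.FACTOR
        if too_small or too_big:
            self.capacity = ideal   # the expensive re-index happens here
            self.resizes += 1

    def add(self, n=1):
        self.count += n
        self._maybe_resize()

    def remove(self, n=1):
        self.count = max(0, self.count - n)
        self._maybe_resize()

idx = LazyIndex()
idx.add(4)       # fits the initial capacity, no resize
idx.add(20)      # 24 items: ideal is 6x the capacity, so the index grows
idx.remove(14)   # 10 items: capacity 24 is only 2.4x the ideal, so it stays
```

After the last step the index still has capacity 24 although 10 would do, which is the same shape as the 9.09M versus 8.95M observation: the extra memory is kept until the structure is far too big or something forces a shrink.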

Per frame optimization for large datasets

Summary
New to iPhone programming, I'm having trouble picking the right optimization strategy to filter a set of view components in a scrollview with huge content. In what area would my app gain the most performance?
Introduction
My current iPad app-in-progress lets users explore fairly large binary tree structures. The trees contain between 30 and 900 nodes, and when drawn inside a scrollview (with limited zoom) it looks like this.
The nodes' contents are stored in a SQLite backed Core Data model. It's a binary tree and if a node has children, there are always exactly two. The x and y positions are part of the model, as are the dimensions of the node connections, shown as dotted lines.
Optimization
Only about 50 nodes fit the screen at any given time. With the largest trees containing up to 900 nodes, it's not possible to put everything in a scrollview controlled and zooming UIView, that's a recipe for crashes. So I have to do per frame filtering of the nodes.
And that's where my troubles start. I don't have the experience to make a well-founded decision between the possible filtering options, and in addition I probably don't know about that really fast special magic buried deep in Objective-C or Cocoa Touch. Because the backing store is close to 200 MB in size (some 90,000 nodes in hundreds of trees), it's very time-consuming to test every single tree on the iPad device. Which is why I'd like to ask you guys for advice.
For all my attempts I'm putting a filter method in the scrollViewDidScroll: and scrollViewDidZoom:. I'm also blocking the main thread with the filter, because I can't show the content without the nodes anyway. But maybe someone has an idea in that area?
Because all the positioning is present in the Core Data model, I might use NSFetchRequest to do the filtering. Is that really fast though? I have the idea it's not a very optimized method.
From what I've tried, the faulted managed objects seem to fit in memory at once, but it might be tricky for the larger trees once their contents start firing faults. Is it a good idea to loop over the NSSet of nodes and see what items should be on screen?
Are there other tricks to gain performance? Would you see ways where I could use multi threading to get the display set faster, even though the model's context was created on the main thread?
Thanks for your advice,
EP.
Ironically, your binary tree could itself be divided using binary space partitioning (BSP) done in 2D, so that rendering is fast and the number of visibility checks stays close to the minimum necessary.
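One simple form of 2D space partitioning is a k-d tree, which splits alternately on x and y. The sketch below is in Python for brevity and is an assumption about what such a partition could look like, not the answerer's implementation; build and query are illustrative names and nodes are reduced to (x, y) tuples.

```python
def build(points, depth=0):
    """Split the points at the median, alternating between the x and y axis."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build(pts[:mid], depth + 1),       # coordinates <= split
        "right": build(pts[mid + 1:], depth + 1),  # coordinates >= split
    }

def query(node, x0, y0, x1, y1, out=None):
    """Collect all points inside the inclusive viewport rectangle."""
    if out is None:
        out = []
    if node is None:
        return out
    px, py = node["point"]
    if x0 <= px <= x1 and y0 <= py <= y1:
        out.append(node["point"])
    split = node["point"][node["axis"]]
    lo, hi = (x0, x1) if node["axis"] == 0 else (y0, y1)
    if lo <= split:   # the viewport reaches into the left half
        query(node["left"], x0, y0, x1, y1, out)
    if hi >= split:   # the viewport reaches into the right half
        query(node["right"], x0, y0, x1, y1, out)
    return out

# 900 node positions laid out on a 30 x 30 grid, 100 units apart.
positions = [(i * 100, j * 100) for i in range(30) for j in range(30)]
tree = build(positions)
on_screen = query(tree, 0, 0, 450, 250)
```

Since the node positions of a tree do not change while scrolling, the partition can be built once per tree and queried in scrollViewDidScroll: with the visible rectangle, which skips whole subtrees that lie outside the viewport.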

Planning a 2D tile engine - Performance concerns

As the title says, I'm fleshing out a design for a 2D platformer engine. It's still in the design stage, but I'm worried that I'll be running into issues with the renderer, and I want to avoid them if they will be a concern.
I'm using SDL for my base library, and the game will be set up to use a single large array of Uint16 to hold the tiles. These index into a second array of "tile definitions" that are used by all parts of the engine, from collision handling to the graphics routine, which is my biggest concern.
The graphics engine is designed to run at a 640x480 resolution, with 32x32 tiles. There are 21x16 tiles drawn per layer per frame (to handle the extra tile that shows up when scrolling), and there are up to four layers that can be drawn. Layers are simply separate tile arrays, but the tile definition array is common to all four layers.
What I'm worried about is that I want to be able to take advantage of transparencies and animated tiles with this engine, and as I'm not too familiar with designs I'm worried that my current solution is going to be too inefficient to work well.
My target FPS is a flat 60 frames per second, and with all four layers being drawn, I'm looking at 21x16x4x60 = 80,640 separate 32x32px tiles needing to be drawn every second, plus however many odd-sized blits are needed for sprites, and this seems just a little excessive. So, is there a better way to approach rendering the tilemap setup I have? I'm looking towards possibilities of using hardware acceleration to draw the tilemaps, if it will help to improve performance much. I also want to hopefully be able to run this game well on slightly older computers as well.
If I'm looking for too much, then I don't think that reducing the engine's capabilities is out of the question.
I think the thing that will be an issue is the sheer amount of draw calls, rather than the total "fill rate" of all the pixels you are drawing. Remember - that is over 80000 calls per second that you must make. I think your biggest improvement will be to batch these together somehow.
One strategy to reduce the fill-rate of the tiles and layers would be to composite static areas together. For example, if you know an area doesn't need updating, it can be cached. A lot depends on whether the layers are scrolled independently (parallax style).
Also, have a look on Google for "dirty rectangles" and see if any schemes may fit your needs.
Personally, I would just try it and see. This probably won't affect your overall game design, and if you have good separation between logic and presentation, you can optimise the tile drawing til the cows come home.
Make sure to use alpha transparency only on tiles that actually use alpha, and skip drawing blank tiles. Make sure the tile surface color depth matches the screen color depth when possible (not really an option for tiles with an alpha channel), and store tiles in video memory, so SDL will use hardware acceleration when it can. Color key transparency will be faster than having a full alpha channel, for simple tiles where partial transparency or blending antialiased edges with the background aren't necessary.
On a 500 MHz system you'll get about 6.8 CPU cycles per pixel per layer, or 27 per screen pixel, which (I believe) isn't going to be enough if you have full alpha channels on every tile of every layer, but should be fine if you take shortcuts like those mentioned where possible.
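The budget figures in this thread can be reproduced with a few lines of arithmetic. The function names below are made up for the example; the inputs are the numbers from the question (640x480, four layers, 60 fps, 21x16 tiles per layer) and the 500 MHz clock assumed in the answer.

```python
def cycles_per_pixel(clock_hz, width, height, fps, layers):
    """Return (cycles per pixel per layer, cycles per screen pixel)."""
    per_screen_pixel = clock_hz / (width * height * fps)
    return per_screen_pixel / layers, per_screen_pixel

def tile_blits_per_second(cols, rows, layers, fps):
    """Number of tile blits per second if every tile is drawn every frame."""
    return cols * rows * layers * fps

per_layer, per_screen = cycles_per_pixel(500_000_000, 640, 480, 60, 4)
blits = tile_blits_per_second(21, 16, 4, 60)
```

This gives roughly 27.1 cycles per screen pixel, 6.8 per pixel per layer, and 80,640 tile blits per second, matching the numbers quoted above.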
I agree with Kombuwa. If this is just a simple tile-based 2D game, you really ought to lower the standards a bit as this is not Crysis. 30FPS is very smooth (research Command & Conquer 3 which is limited to 30FPS). Even still, I had written a remote desktop viewer that ran at 14FPS (1900 x 1200) using GDI+ and it was still pretty smooth. I think that for your 2D game you'll probably be okay, especially using SDL.
Could you render each complete layer into an off-screen buffer the size of the view plus one extra tile on all four sides (if you have vertical scrolling), and then reuse it when scrolling: copy the old buffer minus its first column into a new buffer and draw only the new end column?
This would reduce a lot of needless redrawing.
Additionally, if you want a 60fps, you can look up ways to create frame skip methods for slower systems, skipping every other or every third draw phase.
I think you will be pleasantly surprised by how many of these tiles you can draw a second. Modern graphics hardware can fill a 1600x1200 framebuffer numerous times per frame at 60 fps, so your 640x480 framebuffer will be no problem. Try it and see what you get.
You should definitely take advantage of hardware acceleration. This will give you 1000x performance for very little effort on your part.
If you do find you need to optimise, then the simplest way is to only redraw the areas of the screen that have changed since the last frame. Sounds like you would need to know about any animating tiles, and any tiles that have changed state each frame. Depending on the game, this can be anywhere from no benefit at all, to a massive saving - it really depends on how much of the screen changes each frame.
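A rough sketch of tracking changed areas at tile granularity, written in Python for illustration. DirtyTiles and its methods are hypothetical names, and draw_tile stands in for whatever blit routine the engine uses.

```python
class DirtyTiles:
    """Remembers which tiles changed since the last frame."""

    def __init__(self, cols, rows, tile_size=32):
        self.cols, self.rows, self.tile_size = cols, rows, tile_size
        self.dirty = set()

    def mark(self, col, row):
        # Ignore tiles that lie outside the visible grid.
        if 0 <= col < self.cols and 0 <= row < self.rows:
            self.dirty.add((col, row))

    def mark_rect(self, x, y, w, h):
        """Mark every tile touched by a pixel rectangle, e.g. a moving sprite."""
        ts = self.tile_size
        for row in range(y // ts, (y + h - 1) // ts + 1):
            for col in range(x // ts, (x + w - 1) // ts + 1):
                self.mark(col, row)

    def flush(self, draw_tile):
        """Redraw only the dirty tiles, clear the set, return how many were drawn."""
        count = len(self.dirty)
        for col, row in sorted(self.dirty):
            draw_tile(col, row)
        self.dirty.clear()
        return count
```

Usage would be to call mark for every animated or changed tile and mark_rect for the old and new position of each sprite, then call flush once per frame. When the whole view scrolls, every tile is dirty and the scheme saves nothing, which is the "no benefit at all" end of the range mentioned above.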
You might consider merging neighbouring tiles with the same texture into a larger polygon with texture tiling (sort of a build process).
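As a sketch of such a build step, the function below merges horizontally adjacent tiles with the same id into spans, so each span can be drawn as one wider textured quad. It is written in Python for illustration, merge_runs is a made-up name, and treating tile id 0 as blank is an assumption of the example.

```python
def merge_runs(layer, blank=0):
    """Turn each row into (col, row, length, tile_id) spans of equal tiles.

    Blank tiles are dropped entirely, so they are never drawn.
    """
    spans = []
    for row_index, row in enumerate(layer):
        col = 0
        while col < len(row):
            tile = row[col]
            start = col
            # Advance to the end of the run of identical tiles.
            while col < len(row) and row[col] == tile:
                col += 1
            if tile != blank:
                spans.append((start, row_index, col - start, tile))
    return spans

layer = [
    [1, 1, 1, 0, 2],
    [0, 0, 3, 3, 3],
]
spans = merge_runs(layer)
```

Here ten tiles collapse into three draw calls. This version only merges within a row; merging vertically as well would reduce the count further, and drawing a span with a repeated texture needs hardware texture wrapping or an equivalent in the renderer.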
What about decreasing the frame rate to 30fps? I think it will be good enough for a 2D game.