My compact framework application creates a smooth-scrolling list by rendering all the items to a large bitmap surface, then copying that bitmap to an offset position on the screen so that only the appropriate items show. Older versions only rendered the items that should appear on screen at the time, but this approach was too slow for a smooth scrolling interface.
It occasionally generates an OutOfMemoryException when initially creating the large bitmap. If the user performs a soft-reset of the device and runs the application again, it is able to perform the creation without issue.
It doesn't look like this bitmap is being generated in program memory, since the application uses approximately the same amount of program memory as it did before the new smooth-scrolling methods.
Is there some way I can prevent this exception? Is there any way I can free up the memory I need (wherever it is) before the exception is thrown?
I'd suggest going back to the old mechanism of rendering only part of the data, as the size of the fully-rendered data is obviously an issue. To help prevent rendering problems I would probably pre-render a few rows above and below the current view so they can be "scrolled" in with limited impact.
And just as soon as I posted, I thought of something you can do to fix your problem with the new version. The problem is that the CF has to find one contiguous block of memory big enough for the huge bitmap, and occasionally it can't.
Instead of creating one big bitmap, you can instead create a collection of smaller bitmaps, one for each item, and render each item onto its own little bitmap. During display, you then just copy over the bitmaps you need. CF will have a much easier time creating a bunch of little bitmaps than one big one, and you shouldn't have any memory problems unless this is a truly enormous bunch of items.
I should avoid expressions like "there is no fix".
One other important point: make sure you call Dispose() on each bitmap when you're finished with it.
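To make the idea concrete, here is a minimal sketch of the per-item cache in C++ terms (the original would use one .NET CF Bitmap per item; the 32-bit pixel format, buffer layout, and names here are my assumptions):

    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // One small pixel buffer per list item (stand-in for one CF Bitmap per item).
    struct Tile {
        int width, height;
        std::vector<uint32_t> pixels;   // 32-bit pixels, row-major
    };

    // Copy only the tiles that intersect the visible region into the screen
    // buffer, offset by the current scroll position.
    void blitVisible(const std::vector<Tile>& tiles, int itemHeight, int scrollY,
                     uint32_t* screen, int screenW, int screenH) {
        int first = scrollY / itemHeight;
        int last  = (scrollY + screenH - 1) / itemHeight;
        for (int i = first; i <= last && i < (int)tiles.size(); ++i) {
            const Tile& t = tiles[i];
            int top = i * itemHeight - scrollY;      // may be partially clipped
            for (int row = 0; row < t.height; ++row) {
                int y = top + row;
                if (y < 0 || y >= screenH) continue;
                std::memcpy(screen + (size_t)y * screenW,
                            t.pixels.data() + (size_t)row * t.width,
                            (size_t)std::min(screenW, t.width) * sizeof(uint32_t));
            }
        }
    }

In the CF version, each of those little per-item Bitmaps is what you would Dispose() when the list is torn down.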
Your bitmap definitely is being created in program memory. How much memory the bitmap needs depends on how big it is, and whether or not this required size will generate the OutOfMemoryException depends on how much is available to the PDA (which makes this a randomly-occurring error).
Sorry, but this is generally an inadvisable control rendering technique (especially on the Compact Framework) for which there is no fix short of increasing the physical memory on the PDA, which isn't usually possible (and often won't fix the problem anyway, since a CF process is limited to 32MB no matter how much the device has available).
Your best bet is to go back to the old version and improve its rendering speed. There is also a simple technique available on CF for making a control double-buffered to eliminate flicker.
Since it appears you've run into a device limitation that is restricting the total size of Bitmap space you can create (these are apparently created in video RAM rather than general program memory), one alternative is to replace the big Bitmap object used here with a plain-old block of Windows memory, accessing it for reading and writing by PInvoking the BitBlt API function.
Initially creating the memory block is tricky, and you'd probably want to ask another SO question about that (GCHandle.Alloc can be used here to create a "pinned" object, which means .NET isn't allowed to move it around in memory, which is critical here). I know how to do it, but I'm not sure I do it correctly and I'd rather have an expert's input.
Once you've created the big block, you'd iterate through your items, render each to one small bitmap that you keep re-using (using your existing .NET code), and BitBlt it to the appropriate spot in your memory block.
After creating the entire cache, your rendering code should work just like before, with the difference that instead of copying from the big bitmap to your rendering surface, you BitBlt from your cache block. The arguments for BitBlt are essentially the same as for DrawImage (destination, source, coordinates and sizes etc.).
Since you're creating the cache out of regular memory this way instead of specialized video RAM, I don't think you'll run into the same problem. However, I would definitely get the block creation code working first and test to make sure it can create a big enough block every time.
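For what it's worth, here is the shape of the block creation as a sketch against the raw Win32 GDI API that the C# code would be P/Invoking (a DIB section lives in ordinary process memory rather than specialized bitmap storage; the function name and the 32bpp format are my assumptions):

    #include <windows.h>

    // Create a memory DC backed by a DIB section: a plain block of process
    // memory that GDI can BitBlt into and out of.
    HDC createCacheDC(HDC screenDC, int width, int height, void** bits) {
        BITMAPINFO bmi;
        ZeroMemory(&bmi, sizeof(bmi));
        bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
        bmi.bmiHeader.biWidth       = width;
        bmi.bmiHeader.biHeight      = -height;   // negative = top-down row order
        bmi.bmiHeader.biPlanes      = 1;
        bmi.bmiHeader.biBitCount    = 32;
        bmi.bmiHeader.biCompression = BI_RGB;

        HDC cacheDC = CreateCompatibleDC(screenDC);
        HBITMAP dib = CreateDIBSection(screenDC, &bmi, DIB_RGB_COLORS, bits, NULL, 0);
        SelectObject(cacheDC, dib);
        return cacheDC;
    }

    // At paint time, the copy out of the cache is a single call:
    //   BitBlt(screenDC, 0, 0, viewWidth, viewHeight, cacheDC, 0, scrollY, SRCCOPY);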
Update: actually, the ideal approach would be to have a collection of smaller memory blocks rather than one big one (like I thought was the problem with the Bitmap approach), but you already have enough to do. I've worked with CF apps that deal with 5 and 10MB objects and it's not a huge problem anyway (although it might be a bigger problem when that chunk is pinned - I dunno). BTW, I've always been surprised by the OOMEs on Bitmap creation because I knew the bitmaps were much smaller than the available memory, as did you - now I know why. Sorry I thought this was an easy fix at first.
I can currently acquire a swap chain image, draw to it, and then present it. After vkQueuePresentKHR the image is returned to the swap chain. Is there another way to return the image? I do not want to display the rendered data on screen.
You can probably do what you want here by simply not presenting the images to the device. But the number of images you can get depends on the VkSurfaceCapabilities of your device.
The maximum number of images that the application can simultaneously acquire from this swapchain is derived by subtracting VkSurfaceCapabilitiesKHR::minImageCount from the number of images in the swapchain and adding 1.
On my device, I can have an 8-image swapchain and the minImageCount is 2, letting me acquire 7 images at once.
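As a sketch, that arithmetic against the standard API looks like this (the handle parameters are assumed to have been created already):

    #include <vulkan/vulkan.h>

    // How many images this swapchain lets you hold acquired at once,
    // per the spec rule quoted above.
    uint32_t maxAcquirableImages(VkPhysicalDevice gpu, VkSurfaceKHR surface,
                                 VkDevice device, VkSwapchainKHR swapchain) {
        VkSurfaceCapabilitiesKHR caps = {};
        vkGetPhysicalDeviceSurfaceCapabilitiesKHR(gpu, surface, &caps);

        uint32_t imageCount = 0;
        vkGetSwapchainImagesKHR(device, swapchain, &imageCount, NULL);

        return imageCount - caps.minImageCount + 1;   // e.g. 8 - 2 + 1 = 7
    }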
If you really want, for whatever reason, to scrap the frame, just do not Present the Image and reuse it next iteration (do not Acquire a new Image; use the one you already have).
If there's a possibility you are never going to use some Swapchain Image, you still do not need to worry about it. Acquired Images will be reclaimed (unpresented) when a Swapchain is destroyed.
Seeing your usage comment now, I must add that you still need to synchronize, and that the order of acquired Images is not guaranteed to be round-robin. It also sounds very misguided: creating a Swapchain is about as much programming work as creating an Image and binding memory to it, and the result is not how Swapchains are meant to be used.
From a practical standpoint, you will probably not have a good choice of Swapchain Image formats, types, and usage flags, and you may be limited in the sizes and number of Images you can use. It will probably not work well across platforms, and it may come with a performance hit too.
TL;DR Swapchains are only for interaction with the windowing system (or lack thereof) of the OS. For other uses there are appropriate non-Swapchain commands and objects.
Admittedly, Vulkan is sometimes less than terse to write in (a product of it being C-based, reasonably low-level, and abstracting a wide range of GPU-like hardware), but your proposed technique is not a viable way around that. You need to get used to it and, where appropriate, make your own abstractions (or use a library that does that).
It isn't clear to me when it's a good idea to use VK_IMAGE_LAYOUT_GENERAL as opposed to transitioning to the optimal layout for whatever action I'm about to perform. Currently, my policy is to always transition to the optimal layout.
But VK_IMAGE_LAYOUT_GENERAL exists. Maybe I should be using it when I'm only going to use a given layout for a short period of time.
For example, right now, I'm writing code to generate mipmaps using vkCmdBlitImage. As I loop through the sub-resources performing the vkCmdBlitImage commands, should I transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL as I scale down into a mip, then transition to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL when I'll be the source for the next mip before finally transitioning to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when I'm all done? It seems like a lot of transitioning, and maybe generating the mips in VK_IMAGE_LAYOUT_GENERAL is better.
I appreciate the answer might be to measure, but it's hard to measure on all my target GPUs (especially because I haven't got anything running on Android yet) so if anyone has any decent rule of thumb to apply it would be much appreciated.
FWIW, I'm writing Vulkan code that will run on desktop GPUs and Android, but I'm mainly concerned about performance on the latter.
You would use it when:
1. You are lazy.
2. You need to map the memory to host (unless you can use PREINITIALIZED).
3. You use the image as multiple incompatible attachments and have no choice.
4. You use the image as a storage image.
5. (Other cases where you would otherwise switch layouts too often relative to the work done on the images, and you don't even need the barriers. Measurement is needed to confirm GENERAL is better in such a case; most likely it is a premature optimization even then.)
PS: You could transition all the mip-maps together to TRANSFER_DST with a single command beforehand, and then transition only the one you need to SRC. With a decent HDD, it would be even better to ship the textures with their mip-maps already stored, if that's an option (you could perhaps even get better quality from a more sophisticated downscaling algorithm).
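Not the asker's code, just a minimal sketch of that batched pattern against the standard Vulkan API (the function name and the linear filter are my choices):

    #include <vulkan/vulkan.h>

    // Sketch of the PS above: every mip level is assumed to already be in
    // TRANSFER_DST_OPTIMAL (one barrier beforehand, not shown); each level
    // is promoted to TRANSFER_SRC_OPTIMAL right before it is blitted into
    // the next one.
    void generateMips(VkCommandBuffer cmd, VkImage image,
                      int32_t width, int32_t height, uint32_t mipLevels) {
        VkImageMemoryBarrier barrier{};
        barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
        barrier.image = image;
        barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};

        for (uint32_t i = 1; i < mipLevels; ++i) {
            // Level i-1 was just written (upload or previous blit);
            // make it a transfer source.
            barrier.subresourceRange.baseMipLevel = i - 1;
            barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
            barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
            barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
            barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
            vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_TRANSFER_BIT,
                                 VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
                                 0, nullptr, 0, nullptr, 1, &barrier);

            // Blit level i-1 into level i at half size.
            VkImageBlit blit{};
            blit.srcSubresource = {VK_IMAGE_ASPECT_COLOR_BIT, i - 1, 0, 1};
            blit.srcOffsets[1]  = {width, height, 1};
            blit.dstSubresource = {VK_IMAGE_ASPECT_COLOR_BIT, i, 0, 1};
            blit.dstOffsets[1]  = {width > 1 ? width / 2 : 1,
                                   height > 1 ? height / 2 : 1, 1};
            vkCmdBlitImage(cmd, image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                           image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
                           1, &blit, VK_FILTER_LINEAR);

            if (width > 1)  width /= 2;
            if (height > 1) height /= 2;
        }
        // A final pass would transition the whole chain to
        // SHADER_READ_ONLY_OPTIMAL (last level from TRANSFER_DST, the
        // others from TRANSFER_SRC).
    }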
PS2: Too bad there's not a mip-map creation command. The cmdBlit most likely does something like it under the hood anyway for Images smaller than half resolution...
If you read from the mipmap[n] image to create the mipmap[n+1] image, then you should use the transfer-optimal layouts if you want your code to run on all Vulkan implementations and get the best performance across them, as those layouts may be used by the GPU to optimize the image for reads or writes.
So if you want to go cross-vendor, use VK_IMAGE_LAYOUT_GENERAL only when setting up the descriptor that uses the final image, not for the image reads and writes.
If you don't want to use that many transitions you may copy from a buffer instead of an image, though you obviously wouldn't get the format conversion, scaling and filtering that vkCmdBlitImage does for you for free.
Also don't forget to check if the target format actually supports the BLIT_SRC or BLIT_DST bits. This is independent of whether you use the transfer or general layout for copies.
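For instance, a small sketch of that check for optimal-tiling images (the function name is mine):

    #include <vulkan/vulkan.h>

    // Verify the format supports blitting before relying on vkCmdBlitImage.
    bool formatSupportsBlit(VkPhysicalDevice gpu, VkFormat format) {
        VkFormatProperties props = {};
        vkGetPhysicalDeviceFormatProperties(gpu, format, &props);
        VkFormatFeatureFlags needed =
            VK_FORMAT_FEATURE_BLIT_SRC_BIT | VK_FORMAT_FEATURE_BLIT_DST_BIT;
        return (props.optimalTilingFeatures & needed) == needed;
    }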
I have an image of about 7000x6000px. I need it in a scrollview/imageView in my app, but it is way too huge to display directly. It is supposed to be a kind of map. I was hoping to keep the size of the app to a minimum, and the image is only about 13MB as a .jpg; as a .png it is over 100MB, which is unacceptable. Many have suggested CATiledLayer as an option, but I believe this would result in even bigger file sizes. Anyway, I tried to do it with CATiledLayer, creating my own tiles in TileCutter (tiles in .jpg), and the size wasn't too bad. But I am having errors all over the place. The iOS version of CATiledLayer is a mystery to me, and I can't find a way to solve this. I get an error like the Java equivalent of "index out of bounds of array", even though the array has content at that specific index.
My class has a method that returns an array whose contents come from a .plist. Before the return I log the contents of the array, and the data looks good. The calling code tries to access
[array objectAtIndex:0]
and put it in a dictionary, but that throws the out-of-bounds exception. When I log the whole array I can clearly see the content, but when I log just that element with
NSLog(@"%@", [array objectAtIndex:0]);
I get the same exception.
Anyway, CATiledLayer has given me nothing but problems. I have been reverse-engineering the PhotoScroller project with no luck. Anyone have any other solutions?
Thanks.
Apple has this really neat project, PhotoScroller, that uses CATiledLayer and lets you scroll through several images and zoom them. This seemed really neat until I found out that Apple "cheated" and pre-tiled the images (about 800 tiles saved as files in the bundle!)
I had need for a similar capability, but had to download the images from the network. Thus came about PhotoScrollerNetwork. With the TiledImageBuilder you can download (or read from disk) massive images - I even tested an 18000x18000 image - and it works.
What this class does is start tiling the image as it downloads (when using libjpeg-turbo), or it can save the file first and then tile it (which takes longer). The class figures out how many levels of detail are needed to show the image both at full resolution and sized to fit in the containing view (a scrollview).
The class uses the disk cache to hold the tiles, but uses an old Unix trick of creating a file, opening it, then unlinking it, so that the tiles never really get saved - once the class is dealloced (and the file descriptor closed) the tiles are freed (and if your app crashes, they get freed too).
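The Unix trick in isolation looks like this (a minimal POSIX sketch; the path is a placeholder):

    #include <fcntl.h>
    #include <unistd.h>

    // Create a scratch file and immediately unlink it: the data stays usable
    // through the open descriptor, but the file vanishes automatically when
    // the descriptor closes - even if the app crashes first.
    int makeScratchFile(const char* path) {
        int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd < 0) return -1;
        unlink(path);   // remove the directory entry; fd remains valid
        return fd;      // read/write tiles through fd (pread/pwrite)
    }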
Someone had problems on an iPad 1 with its quite limited memory, so the class now throttles its use of the file system when concurrently loading multiple images. I had a discussion with the iOS kernel manager at WWDC this year, and after explaining the throttling technique to him, he said the algorithm (on managing the amount of disk cache usage) was probably the best technique that could be used.
I think those who suggested CATiledLayer are right. You should really use it! If you need a sample project that displays a huge bitmap using that technology, look here: http://www.cimgf.com/2011/03/01/subduing-catiledlayer/
Many technologies we use as Cocoa/Cocoa Touch developers stand untouched by the faint of heart because often we simply don't understand them and employing them can seem a daunting task. One of those technologies is found in Core Animation and is referred to as the CATiledLayer. It seems like a magical sort of technology because so much of its implementation is a bit of a black box and this fact contributes to it being misunderstood. CATiledLayer simply provides a way to draw very large images without incurring a severe memory hit. This is important no matter where you're deploying, but it especially matters on iOS devices as memory is precious and when the OS tells you to free up memory, you better be able to do so or your app will be brought down. This blog post is intended to demonstrate that CATiledLayer works as advertised and implementing it is not as hard as it may have once seemed.
I want to make an extremely large bitmap (250,000 pixels on each side, to be eventually written out as BigTIFF). I don't see a memory size or dimensional limit anywhere in the docs, can Core Graphics handle it?
CG is not designed for that kind of workload.
(I'd be surprised if you found any general-purpose graphics framework that is, frankly. If you're pushing images that big, you're going to have to write your own code to get anything done in a reasonable amount of time.)
In my experience, images started to fail once dimensions got over 32767 or so. Not in any organized way, just crashes and hard-to-repro failures; certain parts of the API would work, others wouldn't. Things may be better in 64-bit but I wouldn't count on it.
When examining a process in Process Explorer, what does it mean when there is a large number of page faults? The application is processing quite a bit of data and the UI is not very responsive. Are there optimizations to the code that could reduce or eliminate page faults? Would increasing the physical RAM of the system make a difference?
http://en.wikipedia.org/wiki/Page_fault
Increasing the physical RAM on your machine could result in fewer page faults, although design changes to your application will do much better than adding RAM. In general, having a smaller memory footprint, and arranging for things that are often accessed around the same time to sit on the same page, will decrease the number of page faults. It can also help to do everything you can with a piece of data while it is in memory, all at once, so that you don't need to page it back in many separate times (aka thrashing).
It might also be helpful to make sure that memory accessed together is placed near together (e.g. if you have some objects, place them in an array). If these objects have lots of data that is very infrequently used, move it into another class and give the first class a reference to the second one. This way you will touch less memory most of the time.
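For example, a sketch of that hot/cold split (hypothetical types):

    #include <cstdint>
    #include <string>
    #include <vector>

    // Rarely-used data moved out of line...
    struct ColdData {
        std::string description;      // big, infrequently touched
        std::vector<uint8_t> history;
    };

    // ...so an array of the hot part stays small and page-dense.
    struct Item {
        int32_t id;
        float score;
        ColdData* cold;               // followed only on the rare path
    };

    std::vector<Item> items;          // contiguous: the hot loop touches few pages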
A design option would be to write a memory cache system that creates memory lazily (on demand). Such a cache would hold a collection of pre-allocated memory chunks, accessed by their size: for example, an array of N lists, each list holding M buffers, where list i hands out memory in the size range 2^i (i = 0..N-1). Even if you want to use less than 2^i bytes, you just don't use the extra space in the buffer.
This is a tradeoff: a little wasted memory in exchange for caching and fewer page faults.
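A sketch of that scheme (all names are mine; the lists fill lazily as chunks are released rather than being pre-filled):

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // N size classes (powers of two), each holding a free list of reusable
    // chunks. Requests round up to the next class; a freed chunk goes back
    // to its class's list for reuse instead of hitting the allocator again.
    class ChunkCache {
        static const size_t kClasses = 20;            // up to 2^19 bytes
        std::vector<void*> freeLists_[kClasses];

        static size_t classFor(size_t n) {
            size_t c = 0;
            while ((size_t(1) << c) < n) ++c;
            return c;
        }
    public:
        void* acquire(size_t n) {
            size_t c = classFor(n);
            if (!freeLists_[c].empty()) {             // cache hit: reuse
                void* p = freeLists_[c].back();
                freeLists_[c].pop_back();
                return p;
            }
            return std::malloc(size_t(1) << c);       // miss: allocate class size
        }
        void release(void* p, size_t n) {             // n = original request size
            freeLists_[classFor(n)].push_back(p);
        }
    };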
Another option is to use nedmalloc.
Good luck,
Lior