Let's get swapchain's image count straight - vulkan

After getting familiar with tons of books, tutorials and documentation regarding Vulkan, I am still really confused by how the swapchain image count works.
Documentation on swapchain image count:
VkSwapchainCreateInfoKHR::minImageCount is the minimum number of presentable images that the application needs. The implementation will either create the swapchain with at least that many images, or it will fail to create the swapchain.
After reading this field's description, my understanding is that if I create a swapchain with a minImageCount value greater than or equal to VkSurfaceCapabilitiesKHR::minImageCount and less than or equal to VkSurfaceCapabilitiesKHR::maxImageCount, then I will be able to acquire minImageCount images, because that is the number of images the application needs.
Let's assume the following values:
VkSurfaceCapabilitiesKHR::minImageCount == 2
VkSurfaceCapabilitiesKHR::maxImageCount == 8
VkSwapchainCreateInfoKHR::minImageCount == 3
In such a case I expect to be able to acquire 3 images from the swapchain: say, one being presented, one waiting to be presented and one for drawing (just like in the triple buffering case).
On the other hand, many tutorials advise setting VkSwapchainCreateInfoKHR::minImageCount to VkSurfaceCapabilitiesKHR::minImageCount + 1, explaining that not all images created in the swapchain can be acquired by the application, because some of them might be used by the driver internally.
Example: Discussion
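As a rough illustration of the "+ 1" advice from those tutorials, the requested count can be computed and clamped as below. This is only a sketch, not code from any tutorial: SurfaceImageLimits is a hypothetical stand-in for the two VkSurfaceCapabilitiesKHR fields so that it compiles without the Vulkan headers, and chooseMinImageCount is a made-up helper name. A maxImageCount of 0 means the surface reports no upper limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two VkSurfaceCapabilitiesKHR fields used here.
struct SurfaceImageLimits {
    std::uint32_t minImageCount;
    std::uint32_t maxImageCount;  // 0 means "no upper limit"
};

// Hypothetical helper: ask for one image more than the surface minimum,
// but never more than the surface maximum when one is reported.
std::uint32_t chooseMinImageCount(const SurfaceImageLimits& caps) {
    assert(caps.maxImageCount == 0 || caps.maxImageCount >= caps.minImageCount);
    std::uint32_t desired = caps.minImageCount + 1;
    if (caps.maxImageCount != 0) {
        desired = std::min(desired, caps.maxImageCount);
    }
    return desired;
}
```

With the values assumed above (minimum 2, maximum 8) this requests 3 images, which is the value the question uses for VkSwapchainCreateInfoKHR::minImageCount.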
Is there any reliable explanation of how to pick the number of images in the swapchain so that the application won't be forced to wait for image acquisition?

Ultimately, the details of image presentation are not in your control. Asking for more swapchain images makes a CPU blocking situation less likely, but there is no count, or other parameter, which can guarantee that it can't happen.
However, you can easily tell when blocking would happen by calling vkAcquireNextImageKHR with a timeout of 0. If it returns VK_NOT_READY, no image could be acquired and you know that you would need to wait. This gives you the opportunity to decide what to do with this information.
When this happens, you can note that it happened. If it happens frequently enough, it may be worthwhile to recreate the swapchain with more images. Obviously, this is not a lightweight solution, but it would be fairly hardware neutral.
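One way to "note that it happened" is to keep a small sliding window of acquire results. The sketch below is only an assumption about how that bookkeeping might look: AcquireStallTracker, its window size and its threshold are all hypothetical, and the Vulkan call itself is left out so the code compiles on its own. The caller would pass false whenever the zero-timeout vkAcquireNextImageKHR call reported that no image was ready.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: remembers whether the last `window` non-blocking
// acquire attempts found an image, and flags when too many did not.
class AcquireStallTracker {
public:
    AcquireStallTracker(std::size_t window, std::size_t maxStalls)
        : window_(window), maxStalls_(maxStalls) {
        assert(window > 0 && maxStalls < window);
    }

    // Call once per attempt; pass false when no image was available.
    void record(bool gotImage) {
        history_.push_back(gotImage);
        if (!gotImage) ++stalls_;
        if (history_.size() > window_) {
            // Forget the oldest attempt so only the recent window counts.
            if (!history_.front()) --stalls_;
            history_.pop_front();
        }
    }

    // True when stalls inside the current window exceed the chosen threshold.
    bool shouldGrowSwapchain() const { return stalls_ > maxStalls_; }

private:
    std::size_t window_;
    std::size_t maxStalls_;
    std::size_t stalls_ = 0;
    std::deque<bool> history_;
};
```

What counts as "frequently enough" is application specific; the window and threshold are tuning knobs, not values suggested by Vulkan.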

Related

FabricJS v3.4.0: Filters & maxTextureSize - performance/size limitations

Intro:
I've been messing with fabricJS image filtering features in an attempt to start using them in my webapp, but I've run into the following.
It seems fabricJS by default caps the image size (textureSize) on filters at 2048, meaning the largest image is 2048x2048 pixels.
I've attempted to raise the default by calling fabric.isWebGLSupported() and then setting fabric.textureSize = fabric.maxTextureSize, but that still caps it at 4096x4096 pixels, even though maxTextureSize on my device is in the ~16000 range.
I realize that devices usually report the full value without accounting for current memory actually available, but that still seems like a hard limitation.
So I guess the main issues I'm looking at here to start effectively using this feature:
1- Render blocking applyFilters() method:
The current filter application function seems to be render blocking in the browser. Is there a way to call it without blocking the rendering, so I can show an indeterminate loading spinner or something?
Is it as simple as making the apply filter method async and calling it from somewhere else in the app? (I'm using Vue for context, with webpack/babel, which polyfills async/await etc.)
2- Size limits:
Is there a way to bypass the size limit on images? I'm looking to filter images up to 4800x7200 pixels
I can think of at least one way to do this, which is to "break up" the image into smaller images, apply the filters, and then stitch it back together. But I worry it might be a performance hit, as there will be a lot of canvas exports and canvas initializations in this process.
I'm surprised fabricJS doesn't do this "chunking" by default, as it's quite a comprehensive library, and I think they've already gone as far as using WebGL shaders (which are a black box to me) for filtering under the hood for performance. Is there a better way to do this?
My other solution would be to send the image to a service (one I hand-roll, or a pre-existing paid one) that applies the filters somewhere in the cloud and returns the result to the user, but that's not a solution I want to resort to just yet.
For context, I'm mostly using fabric.Canvas and fabric.StaticCanvas to initialize canvases in my app.
Any insights/help with this would be great.
I wrote the filtering backend for fabricJS with Mr. Scott Seaward (credits to him too), and I can give you some answers.
Hard block to 2048
A lot of MacBooks with only an Intel integrated video card report a max texture size of 4096, but then crash the WebGL instance at anything higher than 2280. This was happening widely in 2017, when the WebGL filtering was written. A default of 4096 would have left a LOT of notebooks uncovered. Do not forget mobile phones too.
You know your user base, so you can raise the limit to what your video card and the canvas in your browser allow. However big the texture can be, the final image must be copied into a canvas and displayed (canvas has a different max size depending on browser and device).
Render blocking applyFilters() method
As far as I understand, WebGL is synchronous.
Creating a parallel thread for filtering operations that take in the order of 20-30 ms (sometimes just a couple of ms in Chrome) seems excessive.
Also consider that I tried it, but when more than 4 WebGL contexts were open in Firefox, some of them would be dropped. So I decided on one at a time.
The non-WebGL filtering takes longer of course, and that could probably be done in a separate thread, but fabricJS is a generic library that does vectors, filtering and serialization; it already has a lot on its plate, and filtering performance is not that bad. But I'm open to discussing it.
Chunking
The Shutterstock editor uses fabricJS and is the main reason why a WebGL backend was written. The editor also has chunking and can filter bigger images with tiles of 2048 pixels. We did not release that as open source and I do not plan on asking. That kind of tiling limits the kind of filters you can write, because the code only has knowledge of a limited portion of the image at a time; even just blurring becomes complicated.
Here is a description of the tiling process. It is just a blog post, written for the casual reader and not only for software engineers.
https://tech.shutterstock.com/2019/04/30/canvas-webgl-filtering-concepts
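To make the tiling idea concrete, here is a sketch of just the arithmetic: splitting an image into tiles no larger than the texture limit. It is not the Shutterstock implementation (which is not public) and not part of the fabricJS API; splitIntoTiles and Tile are hypothetical names, and the sketch is in C++ only to show the numbers, since the same loop translates directly to JavaScript. It also ignores the overlap between tiles that neighbourhood filters such as blur would need.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One rectangular piece of the source image, in pixels.
struct Tile {
    int x, y, width, height;
};

// Hypothetical helper: cover the image with tiles of at most tileSize x tileSize,
// row by row. Tiles on the right and bottom edges may be smaller.
std::vector<Tile> splitIntoTiles(int imageWidth, int imageHeight, int tileSize) {
    assert(imageWidth > 0 && imageHeight > 0 && tileSize > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < imageHeight; y += tileSize) {
        for (int x = 0; x < imageWidth; x += tileSize) {
            tiles.push_back({x, y,
                             std::min(tileSize, imageWidth - x),
                             std::min(tileSize, imageHeight - y)});
        }
    }
    return tiles;
}
```

For the 4800x7200 image from the question and a 2048 limit, this gives a 3x4 grid of 12 tiles, the last one being 704x1056 pixels.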
Generic render blocking consideration
So fabricJS has some pre-written filters made with shaders.
The timings I note here are from memory and not re-verified.
The time spent filtering an image consists of:
Uploading the image to the GPU (I do not know how many ms)
Compiling the shader (up to 40 ms, depends)
Running the shader (like 2 ms)
Downloading the result from the GPU (like 0 ms or 13 ms, depending on which method is used)
Now the first time you run a filter on a single image:
The image gets uploaded
Filter compiled
Shader Run
Result downloaded
The second time you do this:
Shader Run
Result downloaded
When a new filter is added or filter is changed:
New filter compiled
Shader or both shaders run
Result downloaded
The most common errors I have noticed in applications built with filtering are:
You forget to remove old filters, leaving them active with a value near 0 that produces no visual change but adds time
You connect the filter to a slider change event without throttling, which, depending on the browser/device, triggers up to 120 filtering operations per second.
Look at the official simple demo:
http://fabricjs.com/image-filters
Use the sliders to filter, apply even more filters, everything seems pretty smooth to me.

Combined image samplers vs separate sampled image and sampler

I want to access multiple textures with the same sampling parameters from a fragment shader (For instance, texture and normal map). Moreover, images change frequently whilst sampler stays stationary (suppose the texture is a video). I've found contradictory information about how it can be done. Vulkan Cookbook states that using combined image samplers might have a performance benefit on some platforms, but this Reddit answer states that combined image samplers don't make any sense.
My question is: Is there any reason to not use separate sampled images and one sampler (for both images) considering it makes the program's logic more simple?
Odds are good that which one you pick will not be the primary limiting factor in your application's performance. Its speed is more likely to be determined by factors on your side: how efficient you are at building CBs, walking through your data structures, and so forth.
So use whichever works best for your needs and move on.
this Reddit answer states that combined image samplers don't make any sense.
Considering that said "answer" claims that this statement from the specification:
On some implementations, it may be more efficient to sample from an image using
a combination of sampler and sampled image that are stored together in the
descriptor set in a combined descriptor.
"warns you that [combined image samplers] may not be as efficient on some platforms", it's best to just ignore whatever they said and move on.

Why do I need resources per swapchain image

I have been following different tutorials and I don't understand why I need resources per swapchain image instead of per frame in flight.
This tutorial:
https://vulkan-tutorial.com/Uniform_buffers
has a uniform buffer per swapchain image. Why would I need that if different images are not in flight at the same time? Can I not start rewriting if the previous frame has completed?
Also, the LunarG tutorial on depth buffers says:
And you need only one for rendering each frame, even if the swapchain has more than one image. This is because you can reuse the same depth buffer while using each image in the swapchain.
This doesn't explain anything, it basically says you can because you can. So why can I reuse the depth buffer but not other resources?
It is to minimize synchronization in the case of the simple Hello Cube app.
Let's say your uniforms change each frame. That means the main loop is something like:
Poll (or simulate)
Update (e.g. your uniforms)
Draw
Repeat
If step #2 did not have its own uniform buffer, it would need to write to a uniform the previous frame is still reading. That means it would have to sync with a fence, and then the previous frame would no longer be considered "in-flight".
It all depends on the way You are using Your resources and the performance You want to achieve.
If, after each frame, You are willing to wait for the rendering to finish and You are still happy with the final performance, You can use only one copy of each resource. Waiting is the easiest synchronization, You are sure that resources are not used anymore, so You can reuse them for the next frame. But if You want to efficiently utilize both CPU's and GPU's power, and You don't want to wait after each frame, then You need to see how each resource is being used.
Depth buffer is usually used only temporarily. If You don't perform any postprocessing, if Your render pass setup uses depth data only internally (You don't specify STORE for storeOp), then You can use only one depth buffer (depth image) all the time. This is because when rendering is done, depth data isn't used anymore, it can be safely discarded. This applies to all other resources that don't need to persist between frames.
But if different data needs to be used for each frame, or if generated data is used in the next frame, then You usually need another copy of a given resource. Updating data requires synchronization, so to avoid waiting in such situations You need to have a copy of the resource. In the case of uniform buffers, You update data in a given buffer and use it in a given frame. You cannot modify its contents until the frame is finished, so to prepare another frame of animation while the previous one is still being processed on the GPU, You need to use another copy.
The same applies if the generated data is required for the next frame (for example a framebuffer used for screen space reflections). Reusing the same resource would cause its contents to be overwritten. That's why You need another copy.
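The uniform buffer case can be sketched as a small ring of per-frame resources, indexed by the frame number. This is only an illustration under assumptions: FrameResources and FrameRing are hypothetical names, a plain float stands in for the contents of a uniform buffer, and the fence wait that must precede reuse of a slot is only mentioned in a comment so the code compiles without the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for everything that must not be overwritten while
// the GPU may still be reading it (e.g. one uniform buffer).
struct FrameResources {
    float uniformTime = 0.0f;
};

// Hypothetical helper: one FrameResources copy per frame in flight.
class FrameRing {
public:
    explicit FrameRing(std::size_t framesInFlight) : frames_(framesInFlight) {
        assert(framesInFlight > 0);
    }

    // Returns the slot that frame `frameNumber` may write to. A real renderer
    // would first wait on that slot's fence, so the GPU is done with it.
    FrameResources& resourcesFor(std::uint64_t frameNumber) {
        return frames_[frameNumber % frames_.size()];
    }

private:
    std::vector<FrameResources> frames_;
};
```

With two frames in flight, frame 2 reuses the slot of frame 0 while frame 1 may still be executing, so the CPU never writes into data the GPU is reading. A depth buffer that is discarded at the end of the render pass would sit outside this ring as a single shared object.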
You can find more information here: https://software.intel.com/en-us/articles/api-without-secrets-the-practical-approach-to-vulkan-part-1

How to return acquired SwapChain image back to the SwapChain?

I can currently acquire a swapchain image, draw to it and then present it. After vkQueuePresentKHR the image is returned to the swapchain. Is there another way to return the image? I do not want to display the rendered data on screen.
You can probably do what you want here by simply not presenting the images to the device. But the number of images you can acquire depends on the VkSurfaceCapabilitiesKHR of your device.
The maximum number of images that the application can simultaneously acquire from this swapchain is derived by subtracting VkSurfaceCapabilitiesKHR::minImageCount from the number of images in the swapchain and adding 1.
On my device, I can have an 8-image swapchain and the minImageCount is 2, letting me acquire 7 images at once.
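The arithmetic from the quoted rule, written out as a tiny function. The name maxSimultaneouslyAcquirable is hypothetical, and plain integers are used instead of the Vulkan structures so the sketch stands alone.

```cpp
#include <cassert>
#include <cstdint>

// Images in the swapchain, minus VkSurfaceCapabilitiesKHR::minImageCount, plus 1.
std::uint32_t maxSimultaneouslyAcquirable(std::uint32_t swapchainImageCount,
                                          std::uint32_t surfaceMinImageCount) {
    assert(surfaceMinImageCount >= 1 && swapchainImageCount >= surfaceMinImageCount);
    return swapchainImageCount - surfaceMinImageCount + 1;
}
```

For the 8-image swapchain with a minImageCount of 2 this gives 8 - 2 + 1 = 7, matching the number above.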
If you really want, for whatever reason, to scrap the frame, just do not Present the Image and reuse it in the next iteration (do not Acquire a new Image; use the one you already have).
If there's a possibility you are never going to use some Swapchain Image, you still do not need to worry about it. Acquired Images will be reclaimed (unpresented) when a Swapchain is destroyed.
Seeing your usage comment now, I must add that you still need to synchronize, that acquisition is not guaranteed to be round-robin, and that the whole idea sounds very misguided. Creating a Swapchain seems like about the same programming work as creating an Image and binding memory to it, and the result is not "how it is meant to be used".
From a practical point of view, you will probably not have a good choice of Swapchain Image formats, types and usage flags, and the size and number of Images you can use may be limited. It will probably not work well across platforms. It may come with a performance hit too.
TL;DR Swapchains are only for interaction with the windowing system (or lack thereof) of the OS. For other uses there are appropriate non-Swapchain commands and objects.
Admittedly Vulkan is sometimes less than terse to write in (a product of it being C-based, reasonably low-level and abstracting a wide range of GPU-like HW), but your proposed technique is not a viable way around it. You need to get used to it and, where appropriate, make your own abstractions (or use a library that does that).

Resizable image resource with embedded cap insets

This is by far not a showstopper problem just something I've been curious about for some time.
There is this well-known -[UIImage resizableImageWithCapInsets:] API for creating resizable images, which comes in really handy when texturing variable-size buttons and frames, especially on the retina iPad and especially if you have lots of those and you want to avoid bloating the app bundle with image resources.
The cap insets are typically constant for a given image, no matter what size we want to stretch it to. We can also put it this way: the cap insets are characteristic of a given image. So here is the thing: if they logically belong to the image, why don't we store them together with the image (as some kind of metadata), instead of having to specify them everywhere we create a new instance?
In daily practice, this could have serious benefits, mainly by eliminating the possibility of human error in the process. If the designer who creates the images could embed the appropriate cap values in the image file itself upon exporting, then the developers would no longer have to write magic numbers in the code and keep them updated each time the image changes. The resizableImage API could read and apply the caps automatically. Heck, even a category on UIImage would do.
Thus my question is: is there any reliable way of embedding metadata in images?
I'd like to emphasize these two words:
reliable: I have already seen some entries on the optional PNG chunks, but I'm afraid those are wiped out of existence once the iOS PNG optimizer kicks in. Or is there a way to prevent that (while still letting the optimizer do its job)?
embedding: I have thought of including the metadata in the filename, similarly to what Apple does, i.e. "@2x", "~ipad" etc., but having kilometer-long names like "image-20.0-20.0-40.0-20.0@2x.png" just doesn't seem to be the right way.
Can anyone come up with a smart solution to this?
Android has a filetype called nine-patch that is basically the pieces of the image and metadata to construct it. Perhaps a class could be made to replicate it. http://developer.android.com/reference/android/graphics/NinePatch.html