What are layout transitions in graphics programming - vulkan

I'm following vulkan-tutorial.com and in the images portion of the tutorial it mentions layout transitions, but doesn't elaborate on what they are. I don't like to copy and paste code without knowing exactly what it does and I can't find a sufficient explanation in the tutorial or on google.

A "layout transition" is exactly what those words mean. It's when you transition the layout of an image sub-resource from one layout to another. So your question really seems to be... what is a layout?
In the Vulkan abstraction, images are composed of sub-resources. These represent distinct sections of an image which can be manipulated independently of other sections. For example, each mipmap level of a mipmapped image is a sub-resource.
At any particular time that an image sub-resource is being used by a GPU process, that sub-resource has a layout. This is part of the Vulkan abstraction of GPU operations, so exactly what it means to the GPU will vary from chip to chip.
The important part is this: layouts restrict how you can use an image sub-resource. Or more to the point, in order to use an image sub-resource in a particular way, it must be in a layout which permits that usage.
When a sub-resource is in the VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL layout, you can only perform operations which read from the sub-resource within a shader. The shader cannot write to the image, nor can the image be used as a render target.
Now, the general layout allows pretty much any use at any time while within that layout. However, this also can represent less optimal performance. Any of the more restricted layouts can make those accesses to the image more performance-friendly (depending on hardware).
So it is your job to keep track of the layout of any image sub-resources you plan to use. Now for most images, you're going to use destination transfer layout to upload to them, and then just leave them as shader read-only, because you don't generally use most images more arbitrarily. So generally, this means keeping track of render targets that you want to read from, as well as swapchain images (you have to transition them to the present layout before presenting them) and storage images.
Layout transitions typically happen as part of an explicit dependency between two operations. This makes sense; if you're uploading data to an image, and you later want to read from it, you need a dependency between the upload and the read. You may as well do the layout transition then, since the transition can modify the way the bytes of the image are stored, so you need the transfer to be done first.

Related

Custom rendering with GPU, Direct3D or OpenGL

I have a Windows application that currently renders graphics largely using MFC that I'd like to change to get better use out of the GPU. Most of the graphics are straightforward and could easily be built up into a scene graph, but some of the graphics could prove very difficult. Specifically, in addition to the normal mesh type objects, I'm also dealing with point clouds which are liable to contain billions of Cartesian stored in a very compact manner that use quite a lot of custom culling techniques to be displayed in real time (Example). What I'm looking for is a mechanism that does the bulk of the scene rendering to a buffer and then gives me access to that buffer, a z buffer, and camera parameters such that I can modify them before putting them out to the display. I'm wondering whether this is possible with Direct3D, OpenGL or possibly use a higher level framework like OpenSceneGraph, and what would be the best starting point? Given the software is Windows based, I'd probably prefer to use Direct3D as this is likely to lead to fewest driver issues which I'm eager to avoid. OpenSceneGraph seems to provide custom culling via octrees, which are close but not identical to what I'm using.
Edit: To clarify a bit more, currently I have the following;
A display list / scene in memory which will typically contain up to a few million triangles, lines, and pieces of text, which I cull in software and output to a bitmap using low performing drawing primitives
A point cloud in memory which may contain billions of points in a highly compressed format (~4.5 bytes per 3d point) which I cull and output to the same bitmap
Cursor information that gets added to the bitmap prior to output
A camera, z-buffer and attribute buffers for navigation and picking purposes
The slow bit is the highlighted part of section 1 which I'd like to replace with GPU rendering of some kind. The solution I envisage is to build a scene for the GPU, render it to a bitmap (with matching z-buffer) based on my current camera parameters and then add my point cloud prior to output.
Alternatively, I could move to a scene based framework that managed the cameras and navigation for me and provide points in view as spheres or splats based on volume and level of detail during the rendering loop. In this scenario I'd also need to be able add cursor information to the view.
In either scenario, the hosting application will be MFC C++ based on VS2017 which would require too much work to change for the purposes of this exercise.
It's hard to say exactly based on your description of a complex problem.
OSG can probably do what you're looking for.
Depending on your timeframe, I'd consider eschewing both OpenGL (OSG) and DirectX in favor of the newer Vulkan 3D API. It's a successor to both D3D and OGL, and is designed by the GPU manufacturers themselves to provide optimal performance exceeding both of its predecessors.
The OSG project is currently developing a Vulkan scenegraph known as VSG, which already demonstrates superior performance to OSG and will have more generalized culling ability.
I've worked a bunch with point clouds and am pretty experienced with them, but I'm not exactly clear on what you're proposing to do.
If you want to actually have a verbal discussion about the matter, I'm pretty easy to find (my company is AlphaPixel -- AlphaPixel.com) and you could call us. I'm in the European time zone right now, it's not clear from your question where you are but you sound US-based.

Why do I need resources per swapchain image

I have been following different tutorials and I don't understand why I need resources per swapchain image instead of per frame in flight.
This tutorial:
https://vulkan-tutorial.com/Uniform_buffers
has a uniform buffer per swapchain image. Why would I need that if different images are not in flight at the same time? Can I not start rewriting if the previous frame has completed?
Also lunarg tutorial on depth buffers says:
And you need only one for rendering each frame, even if the swapchain has more than one image. This is because you can reuse the same depth buffer while using each image in the swapchain.
This doesn't explain anything, it basically says you can because you can. So why can I reuse the depth buffer but not other resources?
It is to minimize synchronization in the case of the simple Hello Cube app.
Let's say your uniforms change each frame. That means main loop is something like:
Poll (or simulate)
Update (e.g. your uniforms)
Draw
Repeat
If step #2 did not have its own uniform, then it needs to write a uniform previous frame is reading. That means it has to sync with a Fence. That would mean the previous frame is no longer considered "in-flight".
It all depends on the way You are using Your resources and the performance You want to achieve.
If, after each frame, You are willing to wait for the rendering to finish and You are still happy with the final performance, You can use only one copy of each resource. Waiting is the easiest synchronization, You are sure that resources are not used anymore, so You can reuse them for the next frame. But if You want to efficiently utilize both CPU's and GPU's power, and You don't want to wait after each frame, then You need to see how each resource is being used.
Depth buffer is usually used only temporarily. If You don't perform any postprocessing, if Your render pass setup uses depth data only internally (You don't specify STORE for storeOp), then You can use only one depth buffer (depth image) all the time. This is because when rendering is done, depth data isn't used anymore, it can be safely discarded. This applies to all other resources that don't need to persist between frames.
But if different data needs to be used for each frame, or if generated data is used in the next frame, then You usually need another copy of a given resource. Updating data requires synchronization - to avoid waiting in such situations You need to have a copy a resource. So in case of uniform buffers, You update data in a given buffer and use it in a given frame. You cannot modify its contents until the frame is finished - so to prepare another frame of animation while the previous one is still being processed on a GPU, You need to use another copy.
Similarly if the generated data is required for the next frame (for example framebuffer used for screen space reflections). Reusing the same resource would cause its contents to be overwritten. That's why You need another copy.
You can find more information here: https://software.intel.com/en-us/articles/api-without-secrets-the-practical-approach-to-vulkan-part-1

How to return acquired SwapChain image back to the SwapChain?

I can currently acquire swap chain image, draw to it and then present it. After vkQueuePresentKHR the image is returned back to the swap chain. Is there other way to return the image back. I do not want to display the rendered data to screen.
You can probably do what you want here by simply not presenting the images to the device. But the number of images you can get depends on the VkSurfaceCapabilities of your device.
The maximum number of images that the application can simultaneously acquire from this swapchain is derived by subtracting VkSurfaceCapabilitiesKHR::minImageCount from the number of images in the swapchain and adding 1.
On my device, I can have an 8-image swapchain and the minImageCount is 2, letting me acquire 7 images at once.
If you really want for whatever reason to scrap the frame just do not Present the Image and reuse it next iteration (do not Acquire new Image; use the one you already have).
If there's a possibility you are never going to use some Swapchain Image, you still do not need to worry about it. Acquired Images will be reclaimed (unpresented) when a Swapchain is destroyed.
Seeing your usage comment now, I must add you still need to synchronize. And it is not guaranteed to be round-robin. And that it sounds very misguided. Creating Swapchain seems like equal programming work to creating and binding memory to the Image. Considering the result is not "how it is meant to be used"...
From a practical point, you will probably not have good choice of Swapchain Image formats, types and usage flags and they can be limited by size and numbers you can use. It will probably not work well across platforms. It may come with performance hit too.
TL;DR Swapchains are only for interaction with the windowing system (or lack thereof) of the OS. For other uses there are appropriate non-Swapchain commands and objects.
Admittedly Vulkan is sometimes less than terse to write in(a product of it being C-based, reasonably low-level and abstracting a wide range of GPU-like HW), but your proposed technique is not a viable way around it. You need to get used to it and where apropriate make your own abstractions (or use a library doing that).

Working around WebGL readPixels being slow

I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of gpu-land and into javascript-land.
As an example, here's my exact use case. The following circuit animation was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing:
The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).
What are some tricks for speeding up readPixels in this kind of use case?
Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
Should I be using a different method to get the information back out of the textures?
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens, because it's sending a request for data to the GPU and then waiting for it to respond, which normal draw calls don't have to do.
Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
This will get you immensely better performance.
If your graphics are all like you show above, you shouldn't need to do any “layout” at all (which is good, because it'd be very awkward to implement) — everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes which point at which simulation-state-texel it should be depending on).
The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
I don't know enough about your use case but just guessing, Why do you need to readPixels at all?
First, you don't need to draw text or your the static parts of your diagram in WebGL. Put another canvas or svg or img over the WebGL canvas, set the css so they overlap. Let the browser composite them. Then you don't have to do it.
Second, let's assume you have a texture that has your computed results in it. Can't you just then make some geometry that matches the places in your diagram that needs to have colors and use texture coords to look up the results from the correct places in the results texture? Then you don't need to call readPixels at all. That shader can use a ramp texture lookup or any other technique to convert the results to other colors to shade the animated parts of your diagram.
If you want to draw numbers based on the result you can use a technique like this so you'd make a shader at references the result shader to look at a result value and then indexes glyphs from another texture based on that.
Am I making any sense?

Resizable image resource with embedded cap insets

This is by far not a showstopper problem just something I've been curious about for some time.
There is this well-known -[UIImage resizableImageWithCapInsets:] API for creating resizable images, which comes really handy when texturing variable size buttons and frames, especially on the retina iPad and especially if you have lots of those and you want to avoid bloating the app bundle with image resources.
The cap insets are typically constant for a given image, no matter what size we want to stretch it to. We can also put that this way: the cap insets are characteristic for a given image. So here is the thing: if they logically belong to the image, why don't we store them together with the image (as some kind of metadata), instead of having to specify them everywhere where we got to create a new instance?
In the daily practice, this could have serious benefits, mainly by means of eliminating the possibility of human error in the process. If the designer who creates the images could embed the appropriate cap values upon exporting in the image file itself then the developers would no longer have to write magic numbers in the code and maintain them updated each time the image changes. The resizableImage API could read and apply the caps automatically. Heck, even a category on UIImage would make do.
Thus my question is: is there any reliable way of embedding metadata in images?
I'd like to emphasize these two words:
reliable: I have already seen some entries on the optional PNG chunks but I'm afraid those are wiped out of existence once the iOS PNG optimizer kicks in. Or is there a way to prevent that? (along with letting the optimizer do its job)
embedding: I have thought of including the metadata in the filename similarly to what Apple does, i.e. "#2x", "~ipad" etc. but having kilometer-long names like "image-20.0-20.0-40.0-20.0#2x.png" just doesn't seem to be the right way.
Can anyone come up with smart solution to this?
Android has a filetype called nine-patch that is basically the pieces of the image and metadata to construct it. Perhaps a class could be made to replicate it. http://developer.android.com/reference/android/graphics/NinePatch.html