I want to load a texture which is layed out in memory as ARGB (with 1 byte components). Therefore when loading it with Vulkan, I need to apply a texture swizzle to it, as there is no ARGB pixel format in Vulkan.
My question is, should I prefer using VK_FORMAT_R8G8B8A8_UNORM over VK_FORMAT_B8G8R8A8_UNORM or the other way around?
PD: This texture will be used only once and then updated
Edit: Please note that the available alternatives are:
VK_FORMAT_R8G8B8A8_UNORM
VK_FORMAT_B8G8R8A8_UNORM
VK_FORMAT_A8B8G8R8_UNORM_PACK32 (equivalent to VK_FORMAT_R8G8B8A8_UNORM on LE systems, therefore I will ignore it)
Both of the two non-packed types are required image formats in Vulkan and therefore will always be present in Vulkan implementations for most kinds of image usage. So long as you're not using the format for buffer views, image load/store, or as an attachment, there is no Vulkan-detectable reason to prefer one format over the other.
Since you are going to apply a swizzle mask to the image view, the order of components in the format isn't all that relevant to texture sampling operations.
As such, which one you use is just a matter of taste.
Related
I'm a little fuzzy on the concepts of RenderPass, Image, and Framebuffer, especially FrameBuffer, but is it a piece of memory for storing pixel data or not, and if so, how is it different from Image?
Or does the rendering flow like this:
The rendered pixels are stored in Framebuffer first, and then in Image?
Objects in Vulkan are pretty much what the API suggests that they are. VkFramebuffer does not get attached to any VkDeviceMemory allocations (directly), so it clearly is not "a piece of memory for storing pixel data". The Vulkan specification lays out the role of a VkFramebuffer pretty simply:
Framebuffers represent a collection of specific memory attachments that a render pass instance uses.
That's all they do: they hold all of the image attachments that a particular render pass instance will use.
In Direct3D12, you can use "ID3D12Resource::WriteToSubresource" to enable zero-copy optimizations for UMA adapters.
What is the equivalent of "ID3D12Resource::WriteToSubresource" in Vulkan?
What WriteToSubresource seems to do (in Vulkan-equivalent terms) is write pixel data from CPU memory to an image whose storage is in CPU-writable memory (hence the requirement that it first be mapped), to do so immediately without the need for a command buffer, and to be able to do so regardless of linear/tiling.
Vulkan doesn't have a way to do that. You can write directly to the backing storage for linear images (in the generic layout), but not for tiled ones. You have to use a proper transfer command for that, even on UMA architectures. Which means building a command buffer and submitting to a transfer-capable queue, since Vulkan doesn't have any immediate copy commands like that.
A Vulkan way to do this would essentially be a function that writes data to a mapped pointer to device memory storage as appropriate for a tiled VkImage in the pre-initialized layout that you intend to store in a particular region of memory. That way, you could then bind the image to that location of memory, and you'd be able to transition the layout to whatever you want.
But that would require adding such a function and allowing the pre-initialized layout to be used for tiled images (so long as the data is written by this function).
So, from ID3D12Resource::WriteToSubresource docunentation I read it performs one copy, with marketeze sprinkled on top.
Vulkan is an explicit API, which does perfectly allow you to do an one-copy on UMA (or on anything else). It even allows you to do real zero-copy, if you stick with linear tiling.
UMA may look like this: https://vulkan.gpuinfo.org/displayreport.php?id=4919#memorytypes
I.e. has only one heap, and the memory type is both DEVICE_LOCAL and HOST_VISIBLE.
So, if you create a linearly tiled image\buffer in Vulkan, vkMapMemory its memory, and then produce your data into that mapped pointer directly, there you have a (real) zero-copy.
Since this is not always practical (i.e. you cannot always choose how things are allocated, e.g. if it is data returned from library function), there is an extension VK_EXT_external_memory_host (assuming your ICD supports it of course), which allows you to import your host data directly, without having to first make a Vulkan memory map.
Now, there are optimally tiled images. Optimal tiling is opaque in Vulkan (so far), and implementation-dependent, so you do not even know the addressing scheme without some reverse engineering. You, generally speaking, want to use optimally tiled images, because supposedly accessing them has better performance characteristics (at least in common situations).
This is where the single copy comes in. You would take your linearly tiled image (or buffer), and vkCmdCopy* it into your optimally tiled image. That copy is performed by the Device\GPU with all its bells and whistles, potentially faster than CPU, i.e. what I suspect they would call "near zero-copy".
Let me say the scenario, we have several meshes with the same shader (material type, e.g. PBR material), but the difference between meshes materials are the uniform buffer and textures for rendering them.
For uniform buffer we have a dynamic uniform buffer technique that uniform buffer offsets can be specify for each draw in the command buffer, but for the image till here I didn't find a way of specifying image view in command buffer for descriptor set. In all the sample codes I have seen till now, for every mesh and every material of that mesh they have a new pipeline, descriptor sets and etc.
I think it is not the best way, there must be a way to only have one pipeline and descriptor set and etc for a material type and only change the uniform buffer offset and texture image-view and sampler, am I right?
If I'm wrong, are these samples doing the best way?
How should I specify the VkDescriptorPoolCreateInfo.maxSets (or other limits like that) for dynamic scene that every minute meshes will add and remove?
Update:
I think it is possible to have a same pipeline and descriptor set layout for all of the objects but problem with VkDescriptorPoolCreateInfo.maxSets (or other limits like that) and the best practice still exist.
It is not duplicate
I was seeking for a way of specifying textures like what we can do with dynamic uniform buffer (to reduce number of descriptor sets) and along with this question there were complementary questions mostly to find best practices for the way that's gonna be suggested with an answer.
You have many options.
The simplest mechanism is to divide your descriptor set layout into sets based on the frequency of changes. Things that change per-scene would be in set 0, things that change per-kind-of-object (character, static mesh, etc), would be in set 1, and things that change per-object would be in set 2. Or whatever. The point is that the things that change with greater frequency go in higher numbered sets.
This texture is per-object, so it would be in the highest numbered set. So you would give each object its own descriptor set containing that texture, then apply that descriptor set when you go to render.
As for VkDescriptorPoolCreateInfo.maxSets, you set that to whatever you feel is appropriate for your system. And if you run out, you can always create another pool; nobody's forcing you to use just one.
However, this is only one option. You can also employ array textures or arrays of textures (depending on your hardware capabilities). In either method, you have an array of different images (either as a single image view or multiple views bound to the same arrayed descriptor). Your per-object uniform data would have that object's texture index, so that it can fetch the index from the array texture/array of textures.
In DirectX12, you render multiple objects in different locations using the equivalent of a single uniform buffer for the world transform like:
// Basic simplified pseudocode
SetRootSignature();
SetPrimitiveTopology();
SetPipelineState();
SetDepthStencilTarget();
SetViewportAndScissor();
for (auto object : objects)
{
SetIndexBuffer();
SetVertexBuffer();
struct VSConstants
{
QEDx12::Math::Matrix4 modelToProjection;
} vsConstants;
vsConstants.modelToProjection = ViewProjMat * object->GetWorldProj();
SetDynamicConstantBufferView(0, sizeof(vsConstants), &vsConstants);
DrawIndexed();
}
However, in Vulkan, if you do something similar with a single uniform buffer, all the objects are rendered in the location of last world matrix:
for (auto object : objects)
{
SetIndexBuffer();
SetVertexBuffer();
UploadUniformBuffer(object->GetWorldProj());
DrawIndexed();
}
Is there a way to draw multiple objects with a single uniform buffer in Vulkan, just like in DirectX12?
I'm aware of Sascha Willem's Dynamic uniform buffer example (https://github.com/SaschaWillems/Vulkan/tree/master/dynamicuniformbuffer) where he packs many matrices in one big uniform buffer, and while useful, is not exactly what I am looking for.
Thanks in advance for any help.
I cannot find a function called SetDynamicConstantBufferView in the D3D 12 API. I presume this is some function of your invention, but without knowing what it does, I can only really guess.
It looks like you're uploading data to the buffer object while rendering. If that's the case, well, Vulkan can't do that. And that's a good thing. Uploading to memory that you're currently reading from requires synchronization. You have to issue a barrier between the last rendering command that was reading the data you're about to overwrite, and the next rendering command. It's just not a good idea if you like performance.
But again, I'm not sure exactly what that function is doing, so my understanding may be wrong.
In Vulkan, descriptors are generally not meant to be changed in the middle of rendering a frame. However, the makers of Vulkan realized that users sometimes want to draw using different subsets of the same VkBuffer object. This is what dynamic uniform/storage buffers are for.
You technically don't have multiple uniform buffers; you just have one. But you can use the offset(s) provided to vkCmdBindDescriptorSets to shift where in that buffer the next rendering command(s) will get their data from. So it's a light-weight way to supply different rendering commands with different data.
Basically, you rebind your descriptor sets, but with different pDynamicOffset array values. To make these work, you need to plan ahead. Your pipeline layout has to explicitly declare those descriptors as being dynamic descriptors. And every time you bind the set, you'll need to provide the offset into the buffer used by that descriptor.
That being said, it would probably be better to make your uniform buffer store larger arrays of matrices, using the dynamic offset to jump from one block of matrices to the other. You would tehn
The point of that is that the uniform data you provide (depending on hardware) will remain in shader memory unless you do something to change the offset or shader. There is some small cost to uploading such data, so minimizing the need for such uploads is probably not a bad idea.
So you should go and upload all of your objects buffer data in a single DMA operation. Then you issue a barrier, and do your rendering, using dynamic offsets and such to tell each offset where it goes.
You either have to use Push constants or have separate uniform buffers for each location. These can be bound either with a descriptor per location of dynamic offset.
In Sasha's example you can have more than just the one matrix inside the uniform.
That means that inside UploadUniformBuffer you append the new matrix to the buffer and bind the new location.
I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of gpu-land and into javascript-land.
As an example, here's my exact use case. The following circuit animation was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing:
The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).
What are some tricks for speeding up readPixels in this kind of use case?
Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
Should I be using a different method to get the information back out of the textures?
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens, because it's sending a request for data to the GPU and then waiting for it to respond, which normal draw calls don't have to do.
Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
This will get you immensely better performance.
If your graphics are all like you show above, you shouldn't need to do any “layout” at all (which is good, because it'd be very awkward to implement) — everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes which point at which simulation-state-texel it should be depending on).
The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
I don't know enough about your use case but just guessing, Why do you need to readPixels at all?
First, you don't need to draw text or your the static parts of your diagram in WebGL. Put another canvas or svg or img over the WebGL canvas, set the css so they overlap. Let the browser composite them. Then you don't have to do it.
Second, let's assume you have a texture that has your computed results in it. Can't you just then make some geometry that matches the places in your diagram that needs to have colors and use texture coords to look up the results from the correct places in the results texture? Then you don't need to call readPixels at all. That shader can use a ramp texture lookup or any other technique to convert the results to other colors to shade the animated parts of your diagram.
If you want to draw numbers based on the result you can use a technique like this so you'd make a shader at references the result shader to look at a result value and then indexes glyphs from another texture based on that.
Am I making any sense?