I'm just getting started with Metal, and am having trouble grasping some basic things. I've been reading a whole bunch of web pages about Metal, and working through Apple's examples, and so forth, but gaps in my understanding remain. I think my key point of confusion is: what is the right way to handle vertex buffers, and how do I know when it's safe to reuse them? This confusion manifests in several ways, as I'll describe below, and maybe those different manifestations of my confusion need to be addressed in different ways.
To be more specific, I'm using an MTKView subclass in Objective-C on macOS to display very simple 2D shapes: an overall frame for the view with a background color inside, 0+ rectangular subframes inside that overall frame with a different background color inside them, and then 0+ flat-shaded squares of various colors inside each subframe. My vertex function is just a simple coordinate transformation, and my fragment function just passes through the color it receives, based on Apple's triangle demo app. I have this working fine for a single subframe with a single square. So far so good.
There are several things that puzzle me.
One: I could design my code to render the whole thing with a single vertex buffer and a single call to drawPrimitives:, drawing all of the (sub)frames and squares in one big bang. This is not optimal, though, as it breaks the encapsulation of my code, in which each subframe represents the state of one object (the thing that contains the 0+ squares); I'd like to allow each object to be responsible for drawing its own contents. It would be nice, therefore, to have each object set up a vertex buffer and make its own drawPrimitives: call. But since the objects will draw sequentially (this is a single-threaded app), I'd like to reuse the same vertex buffer across all of these drawing operations, rather than having each object have to allocate and own a separate vertex buffer. But can I do that? After I call drawPrimitives:, I guess the contents of the vertex buffer have to be copied over to the GPU, and I assume (?) that is not done synchronously, so it wouldn't be safe to immediately start modifying the vertex buffer for the next object's drawing. So: how do I know when Metal is done with the buffer and I can start modifying it again?
Two: Even if #1 has a well-defined answer, such that I could block until Metal is done with the buffer and then start modifying it for the next drawPrimitives: call, is that a reasonable design? I guess it would mean that my CPU thread would be repeatedly blocking to wait for the memory transfers, which is not great. So does that pretty much push me to a design where each object has its own vertex buffer?
Three: OK, suppose each object has its own vertex buffer, or I do one "big bang" render of the whole thing with a single big vertex buffer (this question applies to both designs, I think). After I call presentDrawable: and then commit on my command buffer, my app will go off and do a little work, and then will try to update the display, so my drawing code now executes again. I'd like to reuse the vertex buffers I allocated before, overwriting the data in them to do the new, updated display. But again: how do I know when that is safe? As I understand it, the fact that commit returned to my code doesn't mean Metal is done copying my vertex buffers to the GPU yet, and in the general case I have to assume that could take an arbitrarily long time, so it might not be done yet when I re-enter my drawing code. What's the right way to tell? And again: should I just block waiting until they are available (however I'm supposed to do that), or should I have a second set of vertex buffers that I can use in case Metal is still busy with the first set? (That seems like it just pushes the problem down the pike, since when my drawing code is entered for the third update both previously used sets of buffers might not yet be available, right? So then I could add a third set of vertex buffers, but then the fourth update...)
Four: For drawing the frame and subframes, I'd like to just write a reuseable "drawFrame" type of function that everybody can call, but I'm a bit puzzled as to the right design. With OpenGL this was easy:
- (void)drawViewFrameInBounds:(NSRect)bounds
{
int ox = (int)bounds.origin.x, oy = (int)bounds.origin.y;
glColor3f(0.77f, 0.77f, 0.77f);
glRecti(ox, oy, ox + 1, oy + (int)bounds.size.height);
glRecti(ox + 1, oy, ox + (int)bounds.size.width - 1, oy + 1);
glRecti(ox + (int)bounds.size.width - 1, oy, ox + (int)bounds.size.width, oy + (int)bounds.size.height);
glRecti(ox + 1, oy + (int)bounds.size.height - 1, ox + (int)bounds.size.width - 1, oy + (int)bounds.size.height);
}
But with Metal I'm not sure what a good design is. I guess the function can't just have its own little vertex buffer declared as a local static array, into which it throws vertices and then calls drawPrimitives:, because if it gets called twice in a row Metal might not yet have copied the vertex data from the first call when the second call wants to modify the buffer. I obviously don't want to have to allocate a new vertex buffer every time the function gets called. I could have the caller pass in a vertex buffer for the function to use, but that just pushes the problem out a level; how should the caller handle this situation, then? Maybe I could have the function append new vertices onto the end of a growing list of vertices in a buffer provided by the caller; but this seems to either force the whole render to be completely pre-planned (so that I can preallocate a big buffer of the right size to fit all of the vertices everybody will draw – which requires the top-level drawing code to somehow know how many vertices every object will end up generating, which violates encapsulation), or to do a design where I have an expanding vertex buffer that gets realloc'ed as needed when its capacity proves insufficient. I know how to do these things; but none of them feels right. I'm struggling with what the right design is, because I don't really understand Metal's memory model well enough, I think. Any advice? Apologies for the very long multi-part question, but I think all of this goes to the same basic lack of understanding.
The short answer to you underlying question is: you should not overwrite resources that are used by commands added to a command buffer until that command buffer has completed. The best way to determine that is to add a completion handler. You could also poll the status property of the command buffer, but that's not as good.
First, until you commit the command buffer, nothing is copied to the GPU. Further, as you noted, even after you commit the command buffer, you can't assume the data has been fully copied to the GPU.
Second, you should, in the simple case, put all drawing for a frame into a single command buffer. Creating and committing a lot of command buffers (like one for every object that draws) adds overhead.
These two points combined means you can't typically reuse a resource during the same frame. Basically, you're going to have to double- or triple-buffer to get correctness and good performance simultaneously.
A typical technique is to create a small pool of buffers guarded by a semaphore. The semaphore count is initially the number of buffers in the pool. Code which wants a buffer waits on the semaphore and, when that succeeds, take a buffer out of the pool. It should also add a completion handler to the command buffer that puts the buffer back in the pool and signals the semaphore.
You could use a dynamic pool of buffers. If code wants a buffer and the pool is empty, it creates a buffer instead of blocking. Then, when it's done, it adds the buffer to the pool, effectively increasing the size of the pool. However, there's typically no point in doing that. You would only need more than three buffers if the CPU is running way ahead of the GPU, and there's no real benefit to that.
As to your desire to have each object draw itself, that can certainly be done. I'd use a large vertex buffer along with some metadata about how much of it has been used so far. Each object that needs to draw will append its vertex data to the buffer and encode its drawing commands referencing that vertex data. You would use the vertexStart parameter to have the drawing command reference the right place in the vertex buffer.
You should also consider indexed drawing with the primitive restart value so there's only a single draw command which draws all of the primitives. Each object would add its primitive to the shared vertex data and index buffers and then some high level controller would do the draw.
I thought in two ways to write my opengl es 2.0 code.
First, I write many calls to draw elements in the screen with many VAOs and VBOs or one only VAO and many VBOs.
Second, I save the coordinates of all elements in one list and I write all vertices of these coordinates in one only VAO and one only VBO and draw all vertices in the screen.
What is the better way that I should follow?
These are the ones I thought, what other ways are there?
The VAO is meant to save you some setup calls when setting the vertex attributes pointers and enabling/disabling the pipeline states related to that setup. Having just one VAO isn't saving you anything, because you will repeatedly re-bind the vertex buffers and change some settings. So you should aim to have multiple VAOs, one per "static" rendering batch, but not necessarily one per object drawn.
As to having all vertices in single VBO or many VBOs - that really depends on the task.
Having all data in single VBO has no benefits if you draw that all in many calls. But there's also no point in allocating one VBO per sprite. It's always about the balance between the costs of different calls to setup the pipeline, so ideally you try different approaches and decide what's best for you in your particular case.
There might be restrictions on the buffer sizes, and there's definitely "reasonable" sizes preferred by specific implementations. I remember some issues with old Intel drivers, when rendering the portion of the buffer would process the entire buffer, skipping unneeded vertices.
In my vulkan application i used to draw meshes like this when all the meshes used the same texture
Updatedescriptorsets(texture)
Command buffer record
{
For each mesh
Bind transformubo
Draw mesh
}
But now I want each mesh to have a unique texture so i tried this
Command buffer record
{
For each mesh
Bind transformubo
Updatedescriptorsets (textures[meshindex])
Draw mesh
}
But it gives an error saying descriptorset is destroyed or updated. I looked in vulkan documentation and found out that I can't update descriptorset during command buffer records. So how can I have a unique texture to each mesh?
vkUpdateDescriptorSets is not synchonrized with anything. Therefore, you cannot update a descriptor set while it is in use. You must ensure that all rendering operations that use the descriptor set in question have finished, and that no commands have been placed in command buffers that use the set in question.
It's basically like a global variable; you can't have people accessing a global variable from numerous threads without some kind of synchronization. And Vulkan doesn't synchronize access to descriptor sets.
There are several ways to deal with this. You can give each object its own descriptor set. This is usually done by having the frequently changing descriptor set data be of a higher index than the less frequently changing data. That way, you're not changing every descriptor for each object, only the ones that change on a per-object basis.
You can use push constant data to index into large tables/array textures. So the descriptor set would have an array texture or an array of textures (if you have dynamic indexing for arrays of textures). A push constant would provide an index, which is used by the shader to fetch that particular object's texture from the array texture/array of textures. This makes frequent changes fairly cheap, and the same index can also be used to give each object its own transformation matrices (by fetching into an array of matrices).
If you have the extension VK_KHR_push_descriptor available, then you can integrate changes to descriptors directly into the command buffer. How much better this is than the push constant mechanism is of course implementation-dependent.
If you update a descriptor set then all command buffers that this descriptor set is bound to will become invalid. Invalid command buffers cannot be submitted or be executed by the GPU.
What you basically need to do is to update descriptor sets before you bind them.
This odd behavior is there because in vkCmdBindDescriptorSets some implementations take the vulkan descriptor set, translate it to native descriptor tables and then store it in the command buffer. So if you update the descriptor set after vkCmdBindDescriptorSets the command buffer will be seeing stale data. VK_EXT_descriptor_indexing extension relaxed this behavior under some circumstances.
In DirectX12, you render multiple objects in different locations using the equivalent of a single uniform buffer for the world transform like:
// Basic simplified pseudocode
SetRootSignature();
SetPrimitiveTopology();
SetPipelineState();
SetDepthStencilTarget();
SetViewportAndScissor();
for (auto object : objects)
{
SetIndexBuffer();
SetVertexBuffer();
struct VSConstants
{
QEDx12::Math::Matrix4 modelToProjection;
} vsConstants;
vsConstants.modelToProjection = ViewProjMat * object->GetWorldProj();
SetDynamicConstantBufferView(0, sizeof(vsConstants), &vsConstants);
DrawIndexed();
}
However, in Vulkan, if you do something similar with a single uniform buffer, all the objects are rendered in the location of last world matrix:
for (auto object : objects)
{
SetIndexBuffer();
SetVertexBuffer();
UploadUniformBuffer(object->GetWorldProj());
DrawIndexed();
}
Is there a way to draw multiple objects with a single uniform buffer in Vulkan, just like in DirectX12?
I'm aware of Sascha Willem's Dynamic uniform buffer example (https://github.com/SaschaWillems/Vulkan/tree/master/dynamicuniformbuffer) where he packs many matrices in one big uniform buffer, and while useful, is not exactly what I am looking for.
Thanks in advance for any help.
I cannot find a function called SetDynamicConstantBufferView in the D3D 12 API. I presume this is some function of your invention, but without knowing what it does, I can only really guess.
It looks like you're uploading data to the buffer object while rendering. If that's the case, well, Vulkan can't do that. And that's a good thing. Uploading to memory that you're currently reading from requires synchronization. You have to issue a barrier between the last rendering command that was reading the data you're about to overwrite, and the next rendering command. It's just not a good idea if you like performance.
But again, I'm not sure exactly what that function is doing, so my understanding may be wrong.
In Vulkan, descriptors are generally not meant to be changed in the middle of rendering a frame. However, the makers of Vulkan realized that users sometimes want to draw using different subsets of the same VkBuffer object. This is what dynamic uniform/storage buffers are for.
You technically don't have multiple uniform buffers; you just have one. But you can use the offset(s) provided to vkCmdBindDescriptorSets to shift where in that buffer the next rendering command(s) will get their data from. So it's a light-weight way to supply different rendering commands with different data.
Basically, you rebind your descriptor sets, but with different pDynamicOffset array values. To make these work, you need to plan ahead. Your pipeline layout has to explicitly declare those descriptors as being dynamic descriptors. And every time you bind the set, you'll need to provide the offset into the buffer used by that descriptor.
That being said, it would probably be better to make your uniform buffer store larger arrays of matrices, using the dynamic offset to jump from one block of matrices to the other. You would tehn
The point of that is that the uniform data you provide (depending on hardware) will remain in shader memory unless you do something to change the offset or shader. There is some small cost to uploading such data, so minimizing the need for such uploads is probably not a bad idea.
So you should go and upload all of your objects buffer data in a single DMA operation. Then you issue a barrier, and do your rendering, using dynamic offsets and such to tell each offset where it goes.
You either have to use Push constants or have separate uniform buffers for each location. These can be bound either with a descriptor per location of dynamic offset.
In Sasha's example you can have more than just the one matrix inside the uniform.
That means that inside UploadUniformBuffer you append the new matrix to the buffer and bind the new location.
My current understanding is that when I do instanced rendering, I pack vertex, texture, normal, index and instance data to a single VAO.
This is a bit of a problem, I'd want to separate instance data from the rest. The reason for this is that I desire to make OO code and make a Mesh class that doesn't hold any instance data.
I have no idea how to do this, it might even be impossible. Binding two VAOs is not possible(?) Expanding a VAO is not possible either(?)