Can a Command Buffer be reset while being executed on the GPU in Vulkan? - vulkan

I have the following scenario in mind and I don't know if it is valid:
Create a VkCommandPool at the beginning of the program and allocate a single VkCommandBuffer from it.
In the render loop, record commands to the VkCommandBuffer (implicitly resetting it) referencing the VkFramebuffer appropriate for the current VkImageView from the VkSwapchain.
Submit the command buffer
I am not sure if I can reset the command buffer and rerecord it immediately on the next frame after it has just been submitted for execution. Is this defined behavior and does it allow multiple frames in flight, or is it flawed in some way?
On one hand, it seems that this should be valid as after being submitted, the commands were copied to the GPU, but on the other hand, after seeing the flag VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT it seems like by default command buffers cannot be submitted while they are already "pending".
I think this problem can be generalized to multiple command buffers, and to whether each of them needs as many copies as there are VkImages in the VkSwapchain, or whether a single one would suffice.

Command buffers exist in 5 possible states: initial, recording, executable (can be submitted), pending (submitted for execution), and invalid (can't be used). Which operations are valid depends on the state, and which states are reachable from which other states is pretty well specified in the standard.
If you have submitted a CB, then it is in one of the following states:
pending, if it has yet to complete execution
invalid, if it completed execution but was one-use-only
executable, if it completed execution and can be re-submitted
A CB which is in the pending state cannot be reset, but a CB can be reset if it is executable or invalid.
So if by "while it is being executed by the GPU", you mean "in the pending state", then no.
Note that there is no way to query the state of a CB. It's something you have to keep track of indirectly. For example, if you submit a batch of work, you must assume it is pending unless you do something that synchronizes with the execution of that batch of work. That could be testing a fence (if it returns set, then the CB is no longer pending), waiting on a timeline semaphore, or something similar.
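Since the state cannot be queried, many renderers mirror it in host code. Below is a minimal sketch of such bookkeeping; the type and method names are illustrative, not part of the Vulkan API, and `onExecutionComplete` must only ever be driven by a real synchronization primitive (e.g. `vkGetFenceStatus` returning `VK_SUCCESS`), never by guesswork:

```cpp
#include <cassert>
#include <stdexcept>

// Host-side mirror of the five command buffer states described above.
enum class CbState { Initial, Recording, Executable, Pending, Invalid };

struct TrackedCb {
    CbState state = CbState::Initial;
    bool oneTimeSubmit = false;

    void begin(bool oneTime) {
        // Beginning a command buffer that is still pending is invalid usage.
        if (state == CbState::Pending)
            throw std::logic_error("vkBeginCommandBuffer on a pending CB");
        oneTimeSubmit = oneTime;
        state = CbState::Recording;
    }
    void end()    { state = CbState::Executable; }
    void submit() { state = CbState::Pending; }

    // Call only after a fence/timeline semaphore proves the batch finished.
    void onExecutionComplete() {
        state = oneTimeSubmit ? CbState::Invalid : CbState::Executable;
    }

    // Resetting is allowed in any state except pending.
    bool canReset() const { return state != CbState::Pending; }
};
```

With one `TrackedCb` (and one fence) per frame in flight, this also answers the generalized question: you need as many command buffers as you allow frames in flight, since each must stay pending for a whole frame.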

Related

Vulkan - How to efficiently copy data to CPU *and* wait for it

Let's say I want to execute the following commands:
cmd_buff start
dispatch (write to texture1)
copy (texture1 on gpu to buffer1 host-visible)
dispatch (write to texture2)
cmd_buff end
I'd like to know as soon as possible when buffer1's data are available.
My idea here is to have a waiting thread on which I'd wait for the copy to have completed. What I'd do is first split the above list of cmds into:
cmd_buff_1 start
dispatch (write to texture1)
copy (texture1 on gpu to buffer1 host-visible)
cmd_buff_1 end
and:
cmd_buff_2 start
dispatch (write to texture2)
cmd_buff_2 end
Now, I'd call vkQueueSubmit with cmd_buff_1 and with some fence1, followed by a call to another vkQueueSubmit with cmd_buff_2 with NULL fence.
On the waiting thread I'd call vkWaitForFences( fence1 ).
That's how I see such an operation. However, I'm wondering whether that is optimal, and whether there is actually any way to put a direct sync point within cmd_buff_1 so that I wouldn't need to split the command buffer into two.
Never break up submit operations just to test fences; submit operations are too heavyweight to do that. If the CPU needs to check to see if work on the GPU has reached a specific point, there are many options other than a fence.
The simplest mechanism for something like this is to use an event. Set the event after the transfer operation, then use vkGetEventStatus on the CPU to see when it is ready. That's a polling function, so a waiting CPU thread won't immediately wake up when the data is ready (but then, there's no guarantee that would happen with a non-polling function either).
If timeline semaphores are available to you, you can wait for them to reach a particular counter value on the CPU with vkWaitSemaphores. This requires that you break the batch up into two batches, but they can both be submitted in the same submit command.
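The host-visible behavior of a timeline semaphore — a monotonically increasing 64-bit counter that the CPU can wait on, as vkWaitSemaphores does — can be modeled with ordinary standard-library primitives. This is a toy stand-in to illustrate the pattern, not real Vulkan:

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Toy model of a timeline semaphore: a monotonically increasing counter.
class TimelineSemaphore {
    std::mutex m;
    std::condition_variable cv;
    uint64_t value = 0;
public:
    // Device side: the signal operation attached to a batch at submit time.
    void signal(uint64_t v) {
        std::lock_guard<std::mutex> lk(m);
        if (v > value) { value = v; cv.notify_all(); }
    }
    // Host side: analogous to vkWaitSemaphores.
    void wait(uint64_t v) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return value >= v; });
    }
    // Host side: analogous to vkGetSemaphoreCounterValue.
    uint64_t query() {
        std::lock_guard<std::mutex> lk(m);
        return value;
    }
};
```

In the original example, the first batch would signal value 1 after the copy and the second batch value 2; the waiting thread calls `wait(1)` and wakes as soon as buffer1's data is ready, without waiting for the second dispatch.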

vulkan command buffers synchronization for the case of updating the command buffer

Suppose we have 3 command buffers A, B and C. All enable VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT when they are created. The dependencies are as follows:
B -> A
C -> B, the next image in the swapchain
The synchronization between them is done using semaphores. Most of the time, I can pre-record A, B and C and then just submit them one after another to the rendering queue in the render loop. At some point, I want to modify command buffer A. However, the problem is that several submissions of A are already in the rendering queue. I think at that moment the rendering queue may look like
A B C A B
I cannot modify command buffer A because it is being executed or is queued on the GPU. The most naive way is to call vkQueueWaitIdle on the CPU side to wait for all CBs to finish. Then I can modify A and return to my rendering sequence. The problem with this method is that it waits for all CBs to finish. In my opinion, I only need to wait for the submissions of A in the queue, as opposed to waiting for all CBs. Is it possible to do that? Is there a better way to modify A without calling vkQueueWaitIdle?
Don't modify the command buffer. Create a new one and record into that. It really shouldn't matter whether it's the command buffer object A or some alternative command buffer object A'. What matters are the commands you record into it.
In any case, the typical way to know when an operation is finished with a command buffer (or some set thereof) is to use a fence at queue submission time. Fences are particularly coarse grained, but you can query information about their status from the CPU.
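To wait only for the submissions that actually used A (rather than calling vkQueueWaitIdle), one option is to remember one fence per such submission and recycle A only once they have all signaled. A sketch of that bookkeeping, with a plain bool standing in for a VkFence that real code would test via vkGetFenceStatus:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for a VkFence; real code would call vkGetFenceStatus on it.
struct Fence { bool signaled = false; };

// Track one fence per queue submission that referenced command buffer A.
class CbUseTracker {
    std::vector<std::shared_ptr<Fence>> pendingUses;
public:
    void onSubmitUsingA(std::shared_ptr<Fence> f) {
        pendingUses.push_back(std::move(f));
    }

    // True once every submission that used A has completed; only then
    // may A be reset and re-recorded (or freed, if a new A' replaced it).
    bool safeToRecycleA() {
        pendingUses.erase(
            std::remove_if(pendingUses.begin(), pendingUses.end(),
                           [](const std::shared_ptr<Fence>& f) {
                               return f->signaled;
                           }),
            pendingUses.end());
        return pendingUses.empty();
    }
};
```

Combined with the advice above (record into a fresh A' immediately, keep the old A alive), this never stalls the queue: the old buffer is simply freed later, once its last fence signals.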

Barriers or semaphores for multiple submissions over the same queue?

In this example, we copy some buffer into a vertex buffer and we want to quickly start rendering using this buffer, with two submissions and without waiting on a fence:
vkBeginCommandBuffer(transferCommandBuffer)
vkCmdCopyBuffer(transferCommandBuffer, hostVisibleBuffer, vertexBuffer)
vkEndCommandBuffer(transferCommandBuffer)
vkQueueSubmit(queue, transferCommandBuffer)
vkBeginCommandBuffer(renderCommandBuffer)
...
vkCmdBindVertexBuffers(vertexBuffer)
vkCmdDraw()
...
vkEndCommandBuffer(renderCommandBuffer)
vkQueueSubmit(queue, renderCommandBuffer)
From what I understand, transferCommandBuffer might not have finished when renderCommandBuffer is submitted, so renderCommandBuffer may get scheduled and read stale data from vertexBuffer.
We could attach a semaphore when submitting transferCommandBuffer, to be signaled on completion, and have renderCommandBuffer wait on that semaphore before execution. The issue here is that it blocks the commands in the second batch that do not depend on the buffer.
Or we could insert a barrier after the copy command or before the bind-vertex command, which seems much better since we can specify that access to the buffer is our main concern, and possibly allow part of the batch to execute.
Is there any good reason for using semaphores instead of barriers for similar cases (single queue, multiple submissions)?
Barriers are necessary whenever you change the way in which resources are used, to inform the driver/hardware about that change. So in your example the barrier is probably needed regardless.
But as for the semaphores: when you submit a command buffer, you specify both semaphore handles and the pipeline stages at which the wait should occur on each corresponding semaphore. You do this through the following members of the VkSubmitInfo structure:
pWaitSemaphores is a pointer to an array of semaphores upon which to wait before the command buffers for this batch begin execution. If semaphores to wait on are provided, they define a semaphore wait operation.
pWaitDstStageMask is a pointer to an array of pipeline stages at which each corresponding semaphore wait will occur.
So when you submit a command buffer, the hardware can execute all commands up to the specified stage without waiting; the wait on the semaphore only takes effect at that stage.
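That rule can be illustrated with a toy model: treat the pipeline stages as an ordered list, and note that work logically earlier than the wait stage proceeds without the semaphore, while the wait stage and everything after it must wait. The abbreviated stage names here are illustrative, not real Vulkan enums:

```cpp
#include <cassert>

// Abbreviated, logically ordered pipeline stages
// (index = position in the pipeline, front to back).
enum Stage : int {
    TopOfPipe = 0,
    VertexShader,
    FragmentShader,
    ColorOutput,
    BottomOfPipe
};

// With a semaphore wait placed at `waitStage` (via pWaitDstStageMask),
// stages strictly before it may run without the semaphore being signaled;
// `waitStage` and every later stage must wait.
bool mustWaitAt(Stage stage, Stage waitStage) {
    return stage >= waitStage;
}
```

For example, waiting at the fragment-shader stage still lets vertex work for the batch start immediately, which is exactly why pWaitDstStageMask exists.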

Rerecording secondary command buffers

I tried using secondary command buffers and ran into a problem. After resizing my window, both primary and secondary command buffers are re-recorded.
If the secondary command buffers are updated while the primary command buffers that contain them have not yet been submitted, the validation layers throw the
Calling vkBeginCommandBuffer() on active CB 0x0x166dbc0 before it has completed. You must check CB fence before this call.
error. To fix this, I currently ensure that all primary command buffers have been submitted at least once before updating the secondary command buffers.
Is there an easier way to avoid this problem? With this solution I waste the rendering of up to 7 frames (the number of framebuffers in my swapchain).
A command buffer must not still be in use when you try to re-record it.
You need to use a VkFence (or some equivalent: vkDeviceWaitIdle() or vkQueueWaitIdle()) to make sure it is not.
There's usually a lot to do when resizing, and it is not expected to be a frequent operation, so:
Just use vkDeviceWaitIdle() when reacting to the resize event (then recreate all entities that need it).
As for secondary command buffers, there is this counter-intuitive statement in the spec:
A secondary command buffer is considered to be pending execution from the time its execution is recorded into a primary buffer (via vkCmdExecuteCommands) until the final time that primary buffer’s submission to a queue completes.
So reading it literally, a secondary command buffer is "pending execution" as soon as it is recorded into a primary buffer.
(This might be an unintended interpretation by the spec makers... I raised it as Issue #414.)

process states - new state & ready state

As the OS concepts book illustrates in the section "Process States":
A process has defined states: new, ready, running, waiting and terminated.
I have a conflict between the new and ready states. I know that in the ready state the process is allocated in memory, and all resources needed at creation time are allocated, but it is only waiting for CPU time (scheduling).
But what is the new state? What is the previous stage before allocating it in memory?
Not all the tasks that the OS has to perform can be allocated memory immediately after being submitted to the OS, so they have to remain in the new state. The decision as to when they move to the ready state is taken by the long-term scheduler. More info about the long-term scheduler here: http://en.wikipedia.org/wiki/Scheduling_(computing)#Long-term_scheduling
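The five textbook states and the transitions between them can be sketched as a small state machine; the names below are illustrative, with admit() standing in for the long-term scheduler's decision to move a process from new to ready:

```cpp
#include <cassert>
#include <stdexcept>

enum class ProcState { New, Ready, Running, Waiting, Terminated };

struct Process {
    ProcState state = ProcState::New;

    void admit()    { require(ProcState::New);     state = ProcState::Ready;      } // long-term scheduler
    void dispatch() { require(ProcState::Ready);   state = ProcState::Running;    } // CPU scheduler
    void block()    { require(ProcState::Running); state = ProcState::Waiting;    } // e.g. I/O request
    void wake()     { require(ProcState::Waiting); state = ProcState::Ready;      } // I/O completes
    void preempt()  { require(ProcState::Running); state = ProcState::Ready;      } // time slice expires
    void exitProc() { require(ProcState::Running); state = ProcState::Terminated; }

private:
    // Any other transition (e.g. new -> running) is not in the model.
    void require(ProcState s) {
        if (state != s) throw std::logic_error("invalid transition");
    }
};
```

Note that ready is only ever entered from new (via admission), running (via preemption), or waiting (via wakeup), which is exactly the distinction the question asks about.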
To be more precise, the new state is for those processes which are just being created. They haven't been fully created yet and are still in their growing stage.
Whereas the ready state means that the process, whose information is stored in its PCB (Process Control Block), has got all the resources it requires for execution, but the CPU is not yet running that process's instructions.
Let me give you a simple example:
Say you have 2 processes. Process A is syncing your data over cloud storage, and Process B is printing other data.
While Process B is being created and stored in its PCB, Process A has already been created but is not getting the chance to run because the CPU hasn't reached its instructions. Process B, meanwhile, requires the printer to be found, other drivers to be checked, and the pages to be printed to be verified.
So here Process A has been created and is waiting for CPU time, hence it is in the ready state. Whereas Process B is still waiting for the printer to be initialised and its files to be examined for printing, hence it is in the new state (meaning it hasn't been successfully added to a PCB yet).
One more thing to note: for each process there is a Process Control Block (PCB), which stores the process-specific information.
I hope this clears your doubt. Feel free to comment on whatever you don't understand.