Suppose I have two VkPipelines and within a VkCommandBuffer I record...
vkCmdBeginRenderPass(cmd, /*...*/);
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline1);
vkCmdDraw(cmd, /*...*/); // [1]
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline2);
vkCmdDraw(cmd, /*...*/); // [2]
vkCmdEndRenderPass(cmd);
When the command buffer is submitted and executed, will it be as if the rendering operations of [1] on the framebuffer attachments are fully realized before [2] starts executing?
I.e., will [2] draw over [1]?
Most stages in Vulkan execute in an arbitrary order relative to each other. However, rasterization order is respected with regard to framebuffer attachment processes within a subpass (between subpasses, you have to use subpass dependencies, and outside of the renderpass, you'll need either external subpass dependencies or a barrier). Each primitive is ordered relative to each other primitive, and the implementation must respect rasterization order when doing reordering.
The stages that obey rasterization order include depth/stencil testing, blending, write masking, and the like, but they do not include the fragment shader itself. That is, FS outputs have to go through rasterization order, but FS side effects (i.e., writes via image stores or SSBOs) do not.
The specification defines a set of rules for primitive drawing within a single subpass in its "Rasterization Order" section (numbered 24.2 at the time of writing). According to these rules, the blending operations and color writes of the second primitive happen strictly after the blending operations and color writes of the first primitive.
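The ordering guarantee described in these answers can be illustrated with a toy CPU-side model. This is only a sketch under stated assumptions, not Vulkan API code: Color, blendOver, and resolvePixel are hypothetical names, and a single pixel with "source over" blending stands in for the real blend stage.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one framebuffer pixel as RGBA floats (hypothetical, not a Vulkan type).
using Color = std::array<float, 4>;

// "Source over" blend: out = src * src.a + dst * (1 - src.a).
Color blendOver(const Color& src, const Color& dst) {
    const float alpha = src[3];
    Color out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = src[i] * alpha + dst[i] * (1.0f - alpha);
    }
    return out;
}

// Applies the fragments that cover one pixel strictly in primitive order,
// which is what rasterization order requires of the blend and write steps.
Color resolvePixel(Color framebuffer, const std::vector<Color>& fragmentsInPrimitiveOrder) {
    for (const Color& fragment : fragmentsInPrimitiveOrder) {
        framebuffer = blendOver(fragment, framebuffer);
    }
    return framebuffer;
}
```

In this model, an opaque red fragment from draw [1] followed by an opaque blue fragment from draw [2] leaves the pixel blue, regardless of when the fragment shaders that produced those colors actually ran.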
I have a case where I am writing to integer framebuffers, and I want to use logic operations when writing to pixels in the fragment shader. These are the steps I followed:
When creating the logical device, I set VkPhysicalDeviceFeatures.logicOp to VK_TRUE (so the feature is enabled).
When creating the pipeline, I set VkPipelineColorBlendStateCreateInfo.logicOpEnable to VK_TRUE and VkPipelineColorBlendStateCreateInfo.logicOp to VK_LOGIC_OP_COPY.
My framebuffer format is VK_FORMAT_R32G32B32A32_SINT
Once I render the frame, I see that nothing is getting updated in the frame buffer. Is there any step I am missing? (btw, I don't get any validation errors).
Thanks!
The following code from the Vulkan Tutorial seems to conflict with how synchronization scopes work.
// <dependency> is a subpass dependency.
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
...
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
The above code is trying to set both the srcStageMask and dstStageMask to be the same pipeline stage: VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT.
According to Vulkan Specification:
If a synchronization command includes a source stage mask, its first synchronization scope only includes execution of the pipeline stages specified in that mask, ...
If a synchronization command includes a destination stage mask, its second synchronization scope only includes execution of the pipeline stages specified in that mask, ...
In other words, srcStageMask and dstStageMask create a first synchronization scope with specified stage(s) and a second one with the specified stage(s), respectively.
Also, according to the following:
... for two sets of operations, the first set must happen before the second set.
My confusion is that, since the source and destination stages are the same, the subpass dependency seems to require that this pipeline stage complete before the exact same stage starts to execute.
The color attachment output stage is already guaranteed to be finished (the first scope). How can you ask for the same, already finished stage to start executing again (the second scope)?
So what is this dependency trying to say?
A stage only exists within an action command that executes some portion of itself within that stage. Synchronization scopes are based on commands first. Once you have defined which commands are in the scope, stage masks can specify which stages within those commands are affected by the synchronization.
As such, all synchronization operations define a set of commands that happen before the synchronization and the set of commands that happen after. These represent the "first synchronization scope" and "second synchronization scope".
The source stage mask applies to the commands in the "first synchronization scope". The destination stage mask applies to commands in the "second synchronization scope". The commands in one scope are a distinct set from the other scope. So even if you're talking about the same pipeline stages, they're stages in different commands that execute at different times.
So what that does is exactly what it says: it creates a dependency between all executions of the color attachment stage from the source subpass (aka: the "first synchronization scope") and all executions of the color attachment stage from the destination subpass (aka: the "second synchronization scope").
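A small model may make the "commands first, stages second" idea concrete. The sketch below is not Vulkan API code: Command, StageInstance, scopeOf, and the stage bit values are hypothetical, and it simplifies by selecting commands purely by subpass index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stage bits for the model (not the real VkPipelineStageFlagBits values).
enum StageBit : std::uint32_t {
    kVertexShader = 1u << 0,
    kFragmentShader = 1u << 1,
    kColorAttachmentOutput = 1u << 2,
};

// A recorded action command: which subpass it belongs to and which stages it runs.
struct Command {
    int id;
    std::uint32_t subpass;
    std::uint32_t stages;
};

// One stage of one specific command.
struct StageInstance {
    int commandId;
    std::uint32_t stage;
};

// Step 1: pick the commands in the given subpass. Step 2: keep only the stages in the mask.
std::vector<StageInstance> scopeOf(const std::vector<Command>& commands,
                                   std::uint32_t subpass, std::uint32_t stageMask) {
    std::vector<StageInstance> scope;
    for (const Command& cmd : commands) {
        if (cmd.subpass != subpass) continue;
        for (std::uint32_t bit = 1u; bit != 0u; bit <<= 1) {
            if ((cmd.stages & stageMask & bit) != 0u) scope.push_back({cmd.id, bit});
        }
    }
    return scope;
}

// True if any command appears in both scopes.
bool sharesCommand(const std::vector<StageInstance>& a, const std::vector<StageInstance>& b) {
    for (const StageInstance& x : a) {
        for (const StageInstance& y : b) {
            if (x.commandId == y.commandId) return true;
        }
    }
    return false;
}
```

Calling scopeOf with the same kColorAttachmentOutput mask for subpass 0 and for subpass 1 yields two sets that name the same stage but never the same command, which is why "the stage waits for itself" is not what the dependency says.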
I think the question is clear, but in case the answer is no I'll describe the conundrum I have:
Minimal setup so a single render pass with a single subpass. Two attachments: color and depth, rendering a cube. The Depth attachment layouts (initial, mid, final) are:
VK_IMAGE_LAYOUT_UNDEFINED
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
So there's one automatic layout transition. I know that because of my .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR, I'll get a write-after-write warning if I don't make the transition's write visible. So I'll use this subpass dependency:
constexpr VkSubpassDependency in_dependency{
.srcSubpass = VK_SUBPASS_EXTERNAL,
.dstSubpass = 0,
.srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
.dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT,
.srcAccessMask = 0,
.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT
};
This targets the early fragment tests stage because that's where the depth attachment gets cleared by the load op. But don't I also need to include the _READ_BIT in my .dstAccessMask? Sync validation doesn't seem to care, but I think I do, unless I missed some rule about write visibility implying read visibility.
In case there is such a thing, a pointer to the spec would be nice.
WRITE does not include READ. This is simply a matter of the operation in question.
Clearing an image uses the WRITE access mode. It does not use the READ access mode. So there is no further hazard as far as clearing is concerned.
Once the image is cleared, the subpass can begin executing. Since subpass execution happens-after the clearing operation, there's no need for any further dependency.
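As a minimal sketch of why WRITE does not imply READ: the two access types are separate bits in the access mask. The constants below mirror the values of VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT and VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT from the Vulkan headers so the example builds without the SDK. The names kDepthStencilRead, kDepthStencilWrite, covers, and checkClearIsCovered are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT (real code should use <vulkan/vulkan.h>).
constexpr std::uint32_t kDepthStencilRead = 0x00000200u;
// Mirrors VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT.
constexpr std::uint32_t kDepthStencilWrite = 0x00000400u;

// True if every bit of `access` is present in `accessMask`.
constexpr bool covers(std::uint32_t accessMask, std::uint32_t access) {
    return (accessMask & access) == access;
}

// The two access types do not overlap, so one never implies the other.
static_assert((kDepthStencilRead & kDepthStencilWrite) == 0u, "READ and WRITE are distinct bits");
// A clear is a write, so a mask holding only the WRITE bit covers it.
static_assert(covers(kDepthStencilWrite, kDepthStencilWrite), "WRITE covers the clear");
// The same mask does not cover reads.
static_assert(!covers(kDepthStencilWrite, kDepthStencilRead), "WRITE does not include READ");

// Hypothetical run-time check that a dependency's dstAccessMask covers the clear.
void checkClearIsCovered(std::uint32_t dstAccessMask) {
    assert(covers(dstAccessMask, kDepthStencilWrite));
}
```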
I wrote a VkRenderPass with two color and two depth VkAttachmentDescriptions, which I'll call C0, C1, D0, and D1. The details of each VkAttachmentDescription are as follows:
C0 : LoadOp clear, StoreOp Store.
D0 : LoadOp clear, StoreOp Store.
C1 : LoadOp load, StoreOp Store.
D1 : LoadOp load, StoreOp Store.
I also wrote two subpasses for this VkRenderPass. The first subpass (SP0) uses C0 as its color attachment and D0 as its depth attachment. The second subpass (SP1) uses C1 as its color attachment and D1 as its depth attachment. The VkSubpassDependency between SP0 and SP1 is as follows:
srcStageMask: bottom of pipe
dstStageMask: top of pipe
This means that draw commands in SP1 must wait for SP0 to finish. I am trying to avoid a write-after-write hazard. (I'm not sure whether this is correct usage.)
Now let us see my render flow.
First, I have two VkImages. One is used as the color buffer (Cb) and the other as the depth buffer (Db).
Second, I created four VkImageViews to bind the color and depth buffers to the VkFramebuffer (FB). The four image views are as follows:
ImageView 0 is bound to color buffer Cb.
ImageView 1 is bound to depth buffer Db.
ImageView 2 is bound to color buffer Cb.
ImageView 3 is bound to depth buffer Db.
The draw flow is then as follows:
BeginRenderPass with FB( current is SP0)
vkCmdDraw for quad1. (we call this C00)
vkCmdDraw for quad2. (we call this C01)
vkCmdNextSubpass (current should be SP1)
vkCmdDraw for quad1. (we call this C10)
vkCmdDraw for quad2. (we call this C11)
EndRenderPass
In the end, the result is what I expect, but I have some questions about this rendering flow.
The first is about multiple subpasses. I found some information saying that execution between subpasses is asynchronous. Is that true?
If it is true (command execution is asynchronous between subpasses), here is the second question.
Do the pipeline stages of the draw commands in the same subpass progress step by step?
For example, C00 and C01 are draw commands in SP0. Does the execution of C00 and C01 look like this:
SP0 => C00 TOP -> C01 TOP -> C00 VERTEX INPUT -> C01 VERTEX INPUT -> C00 VERTEX SHADER -> C01 VERTEX SHADER -> … C00 BOTTOM -> C01 BOTTOM.
(I think it should execute in the steps of this example, so that a VkSubpassDependency can be used to synchronize subpasses.)
Those are my two questions.
B.R.
1.
Yes, like almost everything in Vulkan, subpasses are asynchronous to each other. Synchronization exists only where it is given explicitly by subpass dependencies.
For completeness, "asynchronous" means no timing is specified between the subpasses. They might be executed serially, reordered, in parallel, preempted, or anything in between. It is the driver's choice.
2.
In a single subpass, the attachment outputs of draw commands are synchronized implicitly and automatically (one of about two exceptions the specification makes for sanity). This is called Rasterization Order. It respects Primitive Order, meaning the triangles of the second draw will be on top of the triangles of the first draw.
Anything else needs explicit synchronization.
The TOP stage of all commands does technically execute in order, though "execute" is a bit of a misnomer: the stage does nothing.
Your stage order is only one of many possible ones. Another conformant order would be e.g. all stages on C00 -> all stages on C01. Though that is a highly theoretical difference. For it to matter we would have to observe some side-effect of this, which I can't imagine being possible.
The Subpass Dependency works either way, so not sure how the two questions relate. When you mean all stages, I suggest using ALL instead of TOP or BOTTOM; much more readable and less error-prone.
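As a rough sketch of the "use ALL" suggestion, a dependency between two subpasses could be filled in as below. The helper name makeFullDependency is hypothetical, and the choice of VK_ACCESS_MEMORY_WRITE_BIT and VK_ACCESS_MEMORY_READ_BIT as catch-all access masks is an assumption made for illustration; a real render pass would normally name the specific attachment accesses. The fallback definitions mirror the Vulkan header only so that the example builds where the SDK is not installed.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__has_include) && __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Fallback mirrors of the real Vulkan declarations, used only when the header is absent.
using VkFlags = std::uint32_t;
struct VkSubpassDependency {
    std::uint32_t srcSubpass;
    std::uint32_t dstSubpass;
    VkFlags srcStageMask;
    VkFlags dstStageMask;
    VkFlags srcAccessMask;
    VkFlags dstAccessMask;
    VkFlags dependencyFlags;
};
constexpr VkFlags VK_PIPELINE_STAGE_ALL_COMMANDS_BIT = 0x00010000u;
constexpr VkFlags VK_ACCESS_MEMORY_READ_BIT = 0x00008000u;
constexpr VkFlags VK_ACCESS_MEMORY_WRITE_BIT = 0x00010000u;
#endif

// Hypothetical helper: everything in srcSubpass must finish, and its writes must be
// made available, before anything in dstSubpass starts or accesses memory.
VkSubpassDependency makeFullDependency(std::uint32_t srcSubpass, std::uint32_t dstSubpass) {
    VkSubpassDependency dependency{};
    dependency.srcSubpass = srcSubpass;
    dependency.dstSubpass = dstSubpass;
    dependency.srcStageMask = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
    dependency.dstStageMask = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
    dependency.srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
    dependency.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
    dependency.dependencyFlags = 0;
    return dependency;
}
```

For the two-subpass render pass in the question, makeFullDependency(0, 1) would express "SP1 waits for all of SP0". This is deliberately heavy-handed; narrower stage and access masks give the driver more room to overlap work.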
Standard usage of Barriers is relatively straightforward, but I was wondering what is the behavior of two (or more) overlapping Image Barriers (especially with respect to their side effect -- the layout transition). E.g. (pseudocode):
begin( commandBuffer );
1: write( image );
2: imageBarrier(
image,
src=STAGE_FRAGMENT(from the write at 1:),
dst=STAGE_FRAGMENT(intended for read in FS of read at 4:),
appropriate src and dst access flags,
newLayout=A
);
3: imageBarrier(
image,
src=STAGE_FRAGMENT(from the write at 1:),
dst=STAGE_TRANSFER(intended for read by transfer of readT at 5:),
appropriate src and dst access flags,
newLayout=B
);
4: read( image ); // through vkCmdDraw -- expects layout A
5: readT( image ); // different kind of read through Transfer -- expects layout B
end( commandBuffer );
Is this even legal? (can you back it up by spec quote?)
What is the image layout at each point of the program?
For completeness, what is the proper/best way to write this (one producer, two consumers situation)? (Swap lines 3: and 4: and make it Read-Read dependency?)
An image cannot assume multiple layouts simultaneously. In the case of the code you suggested above, since the two barriers have no dependencies on each other, one would happen before the other, but the order is not specified. So the layout of the image afterwards would be one or the other. Which means one of the two reading operations is going to fail.
If you have two operations that use the image from two different layouts, then one of those operations must execute before the other, since they both cannot read the image in the layout they need to. And therefore, there must be an execution dependency between them:
1: write( image );
2: imageBarrier( image, src=COLOR_ATTACHMENT_OUT, dst=FRAGMENT_SHADER, newLayout=A );
3: read( image ); // e.g. through vkCmdDraw -- expects layout A
4: imageBarrier( image, src=FRAGMENT_SHADER, dst=TRANSFER, newLayout=B );
5: readT( image ); // different kind of read e.g. Transfer -- expects layout B
The dependency in #4 says that the layout transition and later TRANSFER commands will not occur until all previous FRAGMENT_SHADER operations have completed.
make it Read-Read dependency
It's not a "Read-Read dependency". A layout transition modifies the image (theoretically at any rate), just as surely as if you had written values to the image directly. So logically what you have is "I need to read from it in the FS. After that, I have to transition it to a new layout. After that, I need to read from it in a transfer operation".
It's a "Read-Write-Read dependency." The middle part needs to wait until the first read is done, and the second read can't happen until the middle part is finished. You need an execution dependency with an associated image memory barrier and layout transition.
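The difference between the two command sequences can be checked with a toy layout tracker. This is a sketch, not Vulkan API code: Layout, Op, and layoutsConsistent are hypothetical names, layouts A and B are the placeholders from the question, and the model assumes the operations take effect in the order listed.

```cpp
#include <cassert>
#include <vector>

// Placeholder layouts from the question; ColorAttachment is the layout during the initial write.
enum class Layout { ColorAttachment, A, B };

enum class OpKind { Transition, Use };

// Either a barrier that transitions to `layout`, or a read that expects `layout`.
struct Op {
    OpKind kind;
    Layout layout;
};

// Returns true if every Use finds the image in the layout it expects.
// An image has exactly one layout at a time, so the tracker keeps a single value.
bool layoutsConsistent(Layout initial, const std::vector<Op>& ops) {
    Layout current = initial;
    for (const Op& op : ops) {
        if (op.kind == OpKind::Transition) {
            current = op.layout;
        } else if (current != op.layout) {
            return false;
        }
    }
    return true;
}
```

Feeding it the question's sequence (transition to A, transition to B, read expecting A, read expecting B) fails in either barrier order, while the interleaved sequence from the answer (transition to A, read A, transition to B, read B) passes.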