So I have a render pass with a single subpass that draws directly to a framebuffer. The specification doesn't force me to use dependencies - if I omit them, the implementation inserts them implicitly (though I don't understand why it uses srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT for the first subpass - this stage means the very beginning, i.e. don't wait for anything).
But as usual with the Vulkan - better to be explicit. And here is the confusion - multiple sources use subpasses differently.
Sdk's cube example does not use them at all.
Vulkan-tutorial uses only one:
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.srcAccessMask = 0;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
Why srcAccessMask is zero here?
API without secrets uses two:
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;
dependency.srcStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
dependency.srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
and
dependency.srcSubpass = 0;
dependency.dstSubpass = VK_SUBPASS_EXTERNAL;
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
dependency.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
It's not very clear why srcStageMaskis
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT in the first subpass - isn't
this stage should be used for execution dependencies, but here we
need a memory dependency? The same question about why dstStageMask
is VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT in the second subpass?
Khronos synchronization examples uses one:
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.srcAccessMask = 0;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
Why srcAccessMask is 0?
And here's my attempt with two dependencies:
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; // need to wait until
presentation is finished reading the image
dependency.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; // we are writing to
the image in this stage
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
and
dependency.srcSubpass = 0;
dependency.dstSubpass = VK_SUBPASS_EXTERNAL;
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; // we are writing to
the image in this stage
dependency.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; // presentation reads
image in this stage (is it?)
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT;
All this is very confusing. As you can see multiple competent sources have a different vision. Which one to use? How to understand these dependencies?
Oh dear, this is a topic that is not that well understood by most people. A lot of misinformation was assumed and proliferated. The proper place for canonical examples is the wikipage in the github repo.
The relevant section about swapchain images acquire/present can be found here.
it's an old example from before a lot of the synchronization specifics were properly nailed down.
The purpose there is to synchronize the semaphore use to wait on the result of vkAcquireNextImage with rendering. This is a write-after-read hazard and the semaphore will include memory visibility synchronization. So srcAccessMask is no necessary.
Again the barrier is meant to sync against the semaphore. When submitting a command buffer to a queue you can set which stages wait on which semaphores. In this case they use the bottom of pipe stage instead of the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
same answer as 2, the only difference is with the dstAccessMask
The wiki page I linked above uses the following:
/* Only need a dependency coming in to ensure that the first
layout transition happens at the right time.
Second external dependency is implied by having a different
finalLayout and subpass layout. */
VkSubpassDependency dependency = {
.srcSubpass = VK_SUBPASS_EXTERNAL,
.dstSubpass = 0,
// .srcStageMask needs to be a part of pWaitDstStageMask in the WSI semaphore.
.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
.srcAccessMask = 0,
.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
.dependencyFlags = 0};
/* Normally, we would need an external dependency at the end as well since we are changing layout in finalLayout,
but since we are signalling a semaphore, we can rely on Vulkan's default behavior,
which injects an external dependency here with
dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
dstAccessMask = 0. */
You mixed up the subpass indices for the second barrier. Though as mentioned in the comments there is already an implicit barrier at the end of the renderpass which in turn syncs against the semaphore or fence you use to sync against the present. So there is no need fo an explicit dependency.
Related
I want to use a vkImage rendered at a previous render pass as Texture to do the composite operation in a fragment shader. From here I learned vkCmdPipelineBarrier is used to wait for GPU finish a rendering operation and I write this code. It works well on Snapdragon devices. But not on Mali-G52. The Write-after-write error is partly happed. Is this code not enough? Any suggestions?
vkCmdEndRenderPass(cb);
vkCmdBeginRenderPass(cb, &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
VkViewport viewport = vks::initializers::viewport((float)offscreenPass.width, (float)offscreenPass.height, 0.0f, 1.0f);
vkCmdSetViewport(cb, 0, 1, &viewport);
VkRect2D scissor = vks::initializers::rect2D(offscreenPass.width, offscreenPass.height, 0, 0);
vkCmdSetScissor(cb, 0, 1, &scissor);
// https://github.com/KhronosGroup/Vulkan-Samples/blob/master/samples/performance/pipeline_barriers/pipeline_barriers.cpp
VkImageMemoryBarrier imageMemoryBarrier = vks::initializers::imageMemoryBarrier();
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
imageMemoryBarrier.image = offscreenPass.color[drawframe].image;
imageMemoryBarrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageMemoryBarrier.subresourceRange.baseMipLevel = 0;
imageMemoryBarrier.subresourceRange.levelCount = 1;
imageMemoryBarrier.subresourceRange.baseArrayLayer = 0;
imageMemoryBarrier.subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL;
imageMemoryBarrier.image = offscreenPass.depth.image;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
I have tried every pattern written here.
If you want to synchronize render passes then your pipeline barrier must be outside of the render pass in the command stream. I.e. it must be after the vkCmdEndRenderPass() of the first pass, and before the vkCmdBeginRenderPass() of the second pass. Pipeline barriers issued inside a render pass, as you are currently doing, are used for synchronization only within the current subpass.
Also, try to avoid:
srcStage=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT
dstStage=VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT
... for pipeline barriers when you only consume the output of the first pass as a fragment shader input in the second. This is overly conservative and needlessly serializes execution of the geometry processing too. In this case, you should use:
srcStage=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT
dstStage=VK_PIPELINE_STAGE_FRAGMENT_BIT
... which allows the non-dependent vertex shading and binning for the second pass to run in parallel to the first pass.
Self solved.
The difference in the precision of sampler2D between Adreno and Mali causes this issue. I can read correct data using "precision highp sampler2D".
I am very much a Vulkan/ graphics APIs beginner. I've read some resources on Vulkan synchronization and understand it more than at the beginning but the code still doesn't work. I'm expecting the ray tracing pipeline to output a flat color bule image, but it flickers intensly between blue and just black. Validation layers scream every frame that "images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_UNDEFINED."
This is more or less what my code looks like:
vkBeginCommandBuffer();
// ... bind pipeline and descriptor sets
vkCmdTraceRaysKHR();
// Prepare current swap chain image as transfer destination
VkImageMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = swapchainImage;
barrier.subresourceRange = subresource_range;
// No need to make anything available
barrier.srcAccessMask = 0;
// The result of this transition should be visible for transfers
barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT | VK_ACCESS_TRANSFER_READ_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, // No need to wait for anything
VK_PIPELINE_STAGE_TRANSFER_BIT, // Should make transfers wait
0,
0, nullptr,
0, nullptr,
1, &barrier
);
// Prepare ray tracing output image as transfer source
VkImageMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = renderImage.image;
barrier.subresourceRange = subresource_range;
// The data written by the ray tracing should be made available
barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
// The transition and data should be visible for transitions
barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT | VK_ACCESS_TRANSFER_READ_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_RAY_TRACING_SHADER_BIT_KHR, // Should wait until ray tracing is done
VK_PIPELINE_STAGE_TRANSFER_BIT, // Should make transfers wait
0,
0, nullptr,
0, nullptr,
1, &barrier
);
vkCmdCopyImage();
// Transition swap chain image back for presentation
VkImageMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = swapchainImage;
barrier.subresourceRange = subresource_range;
// The effects of the transfer should be made available
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
// The effects of the transfer should be made visible for swapchain presentation
barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_TRANSFER_BIT, // Wait for transfers
VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, // Block all commands after this barrier
0,
0, nullptr,
0, nullptr,
1, &barrier
);
// Transition ray tracing output image back to general layout
VkImageMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_GENERAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = renderImage.image;
barrier.subresourceRange = subresource_range;
// The effects of the transfer should be made available (possibly unnecessary?)
barrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
// (possibly unnecessary?)
barrier.dstAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_TRANSFER_BIT, // Wait for transfers
VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, // Block all commands after this barrier
0,
0, nullptr,
0, nullptr,
1, &barrier
);
After all this the queue is submitted with two semaphores:
one wait semaphore that is signalled by vkAcquireNextImageKHR. It's wait stage is set to VK_PIPELINE_STAGE_RAY_TRACING_SHADER_BIT_KHR.
one signal semaphore that is later used as a wait semaphore in vkQueuePresentKHR.
So how do I get rid of the vulkan validation layer message and properly display the rendered image?
Edit: The culrpit was found somwhere else (choosing wrong swapchain image for rendering), but I would still appreciate it if someone could confirm/correct my rationale behind the chosen stage and access masks. Especially that now I can't even make it freak out on purpouse, for example by setting the semaphore wait stage to BOTTOM_OF_PIPE (I thought it would mean that no stages wait so the render runs and writes without a swapchain image)
I've been learning vulkan and following vulkan-tutorial and right now I'm at the Texture mapping part. I'm loading an image and uploading it to the host memory, but I'm having trouble understanding the layout transitions and barriers.
Consider this (pseudo)code for loading and transitioning an image (inspired by this), which will be sampled in a fragment shader:
auto texture = loadTexture(filePath);
auto stagingBuffer = createStagingBuffer(texture.pixels, texture.size);
// Create image with:
// usage - VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT
// properties - VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
auto imageBuffer = createImage();
// -- begin single usage command buffer --
auto cb = beginCommandBuffer();
VkImageMemoryBarrier preCopyBarrier {
// ...
.image = imageBuffer.image,
.srcAccessMask = 0,
.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED,
.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
// ...
};
// PipelineBarrier (preCopyBarrier):
// srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT
// dstStageMask = VK_PIPELINE_STAGE_TRANSFER_BIT
// imageMemoryBarrier = &preCopyBarrier
vkCmdPipelineBarrier(cb, ...);
// Copy stagingBuffer.buffer to imageBuffer.image
// dstImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
vkCmdCopyBufferToImage(cb, ...);
VkImageMemoryBarrier postCopyBarrier {
// ...
.image = imageBuffer.image,
.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
.dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
// ...
};
// PipelineBarrier (postCopyBarrier):
// srcStageMask = VK_PIPELINE_STAGE_TRANSFER_BIT
// dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
// imageMemoryBarrier = &postCopyBarrier
vkCmdPipelineBarrier(cb, ...)
endAndSubmitCommandBuffer(cb);
The preCopyBarrier is there because of the vkCmdCopyBufferToImage(...) command and will be "used"/"activated" only once and that is during this command(?).
The postCopyBarrier is there because of the fact, that it will be sampled in the fragment shader, so the layout transition
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL -> VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
has to happen every single time a frame is rendered(? Please correct me, if I'm wrong).
But (assuming I'm correct, which I'm probably not) I'm having trouble wrapping my head around the fact, that I'm creating a preCopyBarrier, which will be used only once and postCopyBarrier, which will be used continuously. If I were to load for example 200 textures, I'd have a bunch of their single usage preCopyBarriers laying around. Isn't this a...waste?
This might be a stupid question and I'm probably missing/misunderstanding something important, but I feel like I shouldn't move on without understanding this concept correctly.
This occurs in my attempts to render metal with a CAMetalLayer, and in a lot of 'Metal By Example' sample code I download. The problem is with the 'texture' I guess, here's some code, I can't provide all of it, but I'll try to provide the most relevant parts. It doesn't accept the texture descriptor, printing this into the console.
failed assertion `MTLTextureDescriptor: Depth, Stencil, DepthStencil, and Multisample textures must be allocated with the MTLStorageModePrivate or MTLStorageModeMemoryless storage mode.'
- (void)buildDepthTexture
{
CGSize drawableSize = self.layer.drawableSize;
MTLTextureDescriptor *descriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatDepth32Float
width:drawableSize.width
height:drawableSize.height
mipmapped:NO];
self.depthTexture = [self.device newTextureWithDescriptor:descriptor]; // Thread 1: signal SIGABRT
[self.depthTexture setLabel:#"Depth Texture"];
}
Again, this is sample code that presumably worked, but no longer does. So I'm like OK, let's allocate it with private storage mode or some junk. descriptor.storageMode = MTLStorageModePrivate;
But when I do that, the render pass descriptor can't be created in draw.
failed assertion `Texture at depthAttachment has usage (0x01) which doesn't specify MTLTextureUsageRenderTarget (0x04)'
MTLRenderPassDescriptor *renderPass = [self newRenderPassWithColorAttachmentTexture:[drawable texture]];
id<MTLCommandBuffer> commandBuffer = [self.commandQueue commandBuffer];
id<MTLRenderCommandEncoder> commandEncoder = [commandBuffer renderCommandEncoderWithDescriptor:renderPass]; //Thread 1: signal SIGABRT
Here's the code for newRenderPassWithColorAttachmentTexture.
- (MTLRenderPassDescriptor *)newRenderPassWithColorAttachmentTexture:(id<MTLTexture>)texture
{
MTLRenderPassDescriptor *renderPass = [MTLRenderPassDescriptor new];
renderPass.colorAttachments[0].texture = texture;
renderPass.colorAttachments[0].loadAction = MTLLoadActionClear;
renderPass.colorAttachments[0].storeAction = MTLStoreActionStore;
renderPass.colorAttachments[0].clearColor = MBEClearColor;
renderPass.depthAttachment.texture = self.depthTexture;
renderPass.depthAttachment.loadAction = MTLLoadActionClear;
renderPass.depthAttachment.storeAction = MTLStoreActionStore;
renderPass.depthAttachment.clearDepth = 1.0;
return renderPass;
}
So basically, it seems two different stages of rendering require two different mutually exclusive conditions to be the case. If one's works, the other doesn't work, and vice versa. Seems impossible, seriously, what am I supposed to do? What gives?
You should provide texture usage description:
- (void)buildDepthTexture
{
CGSize drawableSize = self.layer.drawableSize;
MTLTextureDescriptor *descriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatDepth32Float
width:drawableSize.width
height:drawableSize.height
mipmapped:NO];
descriptor.storageMode = MTLStorageModePrivate;
descriptor.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite;
self.depthTexture = [self.device newTextureWithDescriptor:descriptor]; // Thread 1: signal SIGABRT
[self.depthTexture setLabel:#"Depth Texture"];
}
I don't quite understand here.:
https://github.com/SaschaWillems/Vulkan/blob/master/examples/computeshader/computeshader.cpp
void draw()
{
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that compute command buffer has finished executin before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
}
drawCmdBuffers[currentBuffer] runs before compute.commandBuffer, but the consumer drawCmdBuffers[currentBuffer] requires the textureComputeTarget produced by the producer compute.commandBuffer.
I don't understand why drawCmdBuffers[currentBuffer] is called before compute.commandBuffer.
In the following code, only the first frame is rendered, while the right picture does not get the textureComputeTarget, so it is rendered with a blue background.
void draw()
{
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that compute command buffer has finished executin before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
sleep(1000) // <-------- Wait
}
Executed when calling vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE):
VkImageMemoryBarrier imageMemoryBarrier = {};
imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
// We won't be changing the layout of the image
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_GENERAL;
imageMemoryBarrier.image = textureComputeTarget.image;
imageMemoryBarrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
imageMemoryBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
vkCmdPipelineBarrier(
drawCmdBuffers[i],
VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
VK_FLAGS_NONE,
0, nullptr,
0, nullptr,
1, &imageMemoryBarrier);
vkCmdBeginRenderPass(drawCmdBuffers[i], &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
Wait for VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, but this phase has not been executed before, why is the pipeline not stuck? Is it because
there is no pipeline before, so there is no need to wait?
In section 6.6 Pipeline Barriers
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass.
void draw()
{
printf("%p, %p\n", queue, compute.queue);
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that compute command buffer has finished executin before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
sleep(1000);
}
Print results:
0x6000039c4a20, 0x6000039c4a20
The current queue and compute.queue are the same queue.But it is possible that the above code may generate different queue.
Can VkImageMemoryBarrier be synchronized in multiple queues?
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or
between commands in the same subpass. why use "or", why not use
"and"?
I don't understand why drawCmdBuffers[currentBuffer] is called before compute.commandBuffer.
Dunno, it is an example. Author was probably not awfully woried what happens in the first frame. It would simply be drawn with one frame delay. Swapping the compute before draw should also work with some effort.
Wait for VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, but this phase has not been executed before, why is the pipeline not stuck? Is it because there is no pipeline before, so there is no need to wait?
Because that is not how pipeline and dependencies work. vkCmdPipelineBarrier makes sure any command\operation in queue before the barrier reaches (and finishes) at least the srcStage stage (i.e. COMPUTE) before any command\op recorded after it reach dstStage.
Such dependency is satisfied even if there are no commands recorded before. I.e. by definition of "nothing", there are no commands that have not reached COMPUTE stage yet.
Can VkImageMemoryBarrier be synchronized in multiple queues?
Yes, with the help of a Semaphore.
For VK_SHARING_MODE_EXCLUSIVE and different queue family it is called Queue Family Ownership Transfer (QFOT).
Otherwisely, a Semaphore already performs a memory dependency and a VkImageMemoryBarrier is not needed.
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass. why use "or", why not use "and"?
vkCmdPipelineBarrier is either outside subpass, then it forms a dependency with commands recorded before and after in the queue.
Or vkCmdPipelineBarrier is inside a subpass, in which case it is called "subpass self-dependency" and its scope is limited only to that subpass (among other restrictions).