Standard usage of Barriers is relatively straightforward, but I was wondering what is the behavior of two (or more) overlapping Image Barriers (especially with respect to their side effect -- the layout transition). E.g. (pseudocode):
begin( commandBuffer );
1: write( image );
2: imageBarrier(
image,
src=STAGE_FRAGMENT(from the write at 1:),
dst=STAGE_FRAGMENT(intended for read in FS of read at 4:),
appropriate src and dst access flags,
newLayout=A
);
3: imageBarrier(
image,
src=STAGE_FRAGMENT(from the write at 1:),
dst=STAGE_TRANSFER(intended for read by transfer of readT at 5:),
appropriate src and dst access flags,
newLayout=B
);
4: read( image ); // through vkCmdDraw -- expects layout A
5: readT( image ); // different kind of read through Transfer -- expects layout B
end( commandBuffer );
Is this even legal? (can you back it up by spec quote?)
What is the image layout at each point of the program?
For completeness, what is the proper/best way to write this (one producer, two consumers situation)? (Swap lines 3: and 4: and make it Read-Read dependency?)
An image cannot assume multiple layouts simultaneously. In the case of the code you suggested above, since the two barriers have no dependencies on each other, one would happen before the other, but the order is not specified. So the layout of the image afterwards would be one or the other. Which means one of the two reading operations is going to fail.
If you have two operations that use the image from two different layouts, then one of those operations must execute before the other, since they both cannot read the image in the layout they need to. And therefore, there must be an execution dependency between them:
1: write( image );
2: imageBarrier( image, src=COLOR_ATTACHMENT_OUT, dst=FRAGMENT_SHADER, newLayout=A );
3: read( image ); // e.g. through vkCmdDraw -- expects layout A
4: imageBarrier( image, src=FRAGMENT_SHADER, dst=TRANSFER, newLayout=B );
5: readT( image ); // different kind of read e.g. Transfer -- expects layout B
The dependency in #4 says that the layout transition and later TRANSFER commands will not occur until all previous FRAGMENT_SHADER operations have completed.
make it Read-Read dependency
It's not a "Read-Read dependency". A layout transition modifies the image (theoretically at any rate), just as surely as if you had written values to the image directly. So logically what you have is "I need to read from it in the FS. After that, I have to transition it to a new layout. After that, I need to read from it in a transfer operation".
It's a "Read-Write-Read dependency." The middle part needs to wait until the first read is done, and the second read can't happen until the middle part is finished. You need an execution dependency with an associated image memory barrier and layout transition.
I have a case where I am writing to integer framebuffers, and I want to use logic operations when writing to pixels in the fragment shader. These are the steps I followed:
When creating the logical device, I set the VkPhysicalDeviceFeatures.logicOp to VK_TRUE (so this feature is enabled)
when creating the pipeline, I set VkPipelineColorBlendStateCreateInfo.logicOpEnable to VK_TRUE, and VkPipelineColorBlendStateCreateInfo.logicOp to VK_LOGIC_OP_COPY.
My framebuffer format is VK_FORMAT_R32G32B32A32_SINT
Once I render the frame, I see that nothing is getting updated in the frame buffer. Is there any step I am missing? (btw, I don't get any validation errors).
Thanks!
I wrote a VkRenderPass with two color and two depth VkAttachmentDescriptions, which I will call C0, C1, D0, and D1. The details of each VkAttachmentDescription are as follows:
C0 : LoadOp clear, StoreOp Store.
D0 : LoadOp clear, StoreOp Store.
C1 : LoadOp load, StoreOp Store.
D1 : LoadOp load, StoreOp Store.
I also wrote two sub-passes for this VkRenderPass. The first sub-pass (SP1) uses C0 as its color attachment and D0 as its depth attachment. The second sub-pass (SP2) uses C1 as its color attachment and D1 as its depth attachment. The VkSubpassDependency between SP1 and SP2 is described as follows:
srcPipelineStage : Bottom.
dstPipelineStage : Top
This means the draw commands in SP2 need to wait until SP1 has finished. I am trying to avoid a write-after-write hazard. (I'm not sure whether this is the correct usage.)
Now let us see my render flow.
First, I have two VkImages: one is used as the color buffer (Cb), the other as the depth buffer (Db).
Second, I created four VkImageViews to bind the color and depth buffers to the VkFramebuffer (FB). The four image views are as follows:
ImageView 0 bind with color buffer Cb.
ImageView 1 bind with depth buffer Db.
ImageView 2 bind with color buffer Cb.
ImageView 3 bind with depth buffer Db.
The draw flow is then as follows:
BeginRenderPass with FB( current is SP0)
vkCmdDraw for quad1. (we call this C00)
vkCmdDraw for quad2. (we call this C01)
vkCmdNextSubpass (current should be SP1)
vkCmdDraw for quad1. (we call this C00)
vkCmdDraw for quad2. (we call this C01)
EndRenderPass
In the end, the result is what I expect, but I have some questions about this rendering flow.
The first is about multiple sub-passes. I have read that execution between sub-passes is asynchronous. Is that true?
If it is true (command execution is asynchronous between sub-passes), here is my second question.
Do the pipeline stages of the draw commands in the same sub-pass progress step by step?
For example, C00 and C01 are draw commands in SP0. Does the execution of C00 and C01 look like this:
SP0 => C00 TOP -> C01 TOP -> C00 VERTEX INPUT -> C01 VERTEX INPUT -> C00 VERTEX SHADER -> C01 VERTEX SHADER -> … C00 BOTTOM -> C01 BOTTOM.
(I think it has to execute in steps like this example for VkSubpassDependency to be usable for synchronizing sub-passes.)
Those are my two questions.
B.R.
1.
Yes, like almost everything in Vulkan, subpasses are asynchronous to each other. Any kind of synchronization is given only explicitly by those Subpass Dependencies.
For completeness, "asynchronous" means no timing is specified between the subpasses. They might be executed serially, reordered, in parallel, preempted, or anything in between. It is the driver's choice.
2.
In a single subpass, the output attachments of draw commands are synchronized implicitly/automatically (one of about two exceptions the specification makes for sanity). This is called Rasterization Order. It respects Primitive Order, meaning the triangles of the second draw will be on top of the triangles of the first draw.
Anything else needs explicit synchronization.
TOP of all commands does technically execute in order, though "execute" is a bit of a misnomer. The stage does nothing.
Your stage order is only one of many possible ones. Another conformant order would be e.g. all stages on C00 -> all stages on C01. Though that is a highly theoretical difference. For it to matter we would have to observe some side-effect of this, which I can't imagine being possible.
The Subpass Dependency works either way, so not sure how the two questions relate. When you mean all stages, I suggest using ALL instead of TOP or BOTTOM; much more readable and less error-prone.
I have multiple "renderers" which should draw to the same attachment (swap chain image to be precise). I don't know the number of such renderers beforehand so I can't use subpasses. This is how I wanted to implement it:
VkCommandBuffer cb{...}; // get current "main" command buffer
for(auto r : renderers)
{
VkRenderPassBeginInfo renderPassBeginInfo{get_render_pass_begin_info(...)};
vkCmdBeginRenderPass(cb, &renderPassBeginInfo, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
array<VkCommandBuffer, 2> buffs{r->getCommandBuffers()}; // renderer build two secondary command buffers...
vkCmdExecuteCommands(cb, 1, &buffs[0]); // first should be used in a render pass
vkCmdEndRenderPass(cb);
vkCmdExecuteCommands(cb, 1, &buffs[1]); // second should be used outside of a render pass
}
The problem here is that each new call to vkCmdBeginRenderPass clears the target. This happens because the attachment was created with loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR because I need to clear it (but only once).
The solution in my case would be to move vkCmdBeginRenderPass and vkCmdEndRenderPass outside of the loop, but in this case, I need to "collect" all secondary command buffers that can't be used inside a render pass and execute them later.
But since the concept of render passes doesn't go into my head I wonder if there may be a way to keep the attachment's data between render passes?
You could stop clearing the attachments on load. Just manually clear them, either before the render pass begins or at the start of the first subpass.
That being said, render passes are not cheap, and this is really not the way to use them. The correct solution is to restructure your rendering code so that you only need a single render pass.
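The "collect the secondary command buffers" bookkeeping mentioned in the question could look roughly like the sketch below. It is only an illustration: CommandBufferHandle is a stand-in for VkCommandBuffer so the snippet compiles without the Vulkan headers, and SplitCommandBuffers and collect are hypothetical names that do not come from the original code.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Stand-in for VkCommandBuffer (assumption: lets the sketch build without Vulkan headers).
using CommandBufferHandle = std::uintptr_t;

// Secondary command buffers split by where they may be executed.
struct SplitCommandBuffers {
    std::vector<CommandBufferHandle> insidePass;   // recorded for use inside the render pass
    std::vector<CommandBufferHandle> outsidePass;  // recorded for use outside any render pass
};

// Each renderer contributes two buffers: [0] for inside the pass, [1] for outside.
SplitCommandBuffers collect(
    const std::vector<std::array<CommandBufferHandle, 2>>& perRenderer) {
    SplitCommandBuffers out;
    out.insidePass.reserve(perRenderer.size());
    out.outsidePass.reserve(perRenderer.size());
    for (const auto& buffs : perRenderer) {
        out.insidePass.push_back(buffs[0]);
        out.outsidePass.push_back(buffs[1]);
    }
    return out;
}
```

With real handles, the two lists would each be submitted with one vkCmdExecuteCommands call (count plus pointer to the array): insidePass between a single vkCmdBeginRenderPass / vkCmdEndRenderPass pair, and outsidePass after vkCmdEndRenderPass. Note that this changes the ordering compared to the loop in the question: every outside-pass buffer now runs after all in-pass work, which is only acceptable if no renderer depends on the old interleaving.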
I have read (after running into the limitation myself) that for copying data from the host to a VK_IMAGE_TILING_OPTIMAL VkImage, you're better off using a VkBuffer rather than a VkImage for the staging image to avoid restrictions on mipmap and layer counts. (Here and Here)
So, when it came to implementing a glReadPixels-esque piece of functionality to read the results of a render-to-texture back to the host, I thought that reading to a staging VkBuffer with vkCmdCopyImageToBuffer instead of using a staging VkImage would be a good idea.
However, I haven't been able to get it to work yet. I'm seeing most of the intended image, but with rectangular blocks of the image in incorrect locations and even some bits duplicated.
There is a good chance that I've messed up my synchronization or layout transitions somewhere and I'll continue to investigate that possibility.
However, I couldn't figure out from the spec whether using vkCmdCopyImageToBuffer with an image source using VK_IMAGE_TILING_OPTIMAL is actually supposed to 'un-tile' the image, or whether I should actually expect to receive a garbled implementation-defined image layout if I attempt such a thing.
So my question is: Does vkCmdCopyImageToBuffer with a VK_IMAGE_TILING_OPTIMAL source image fill the buffer with linearly tiled data or optimally (implementation defined) tiled data?
Section 18.4 describes the layout of the data in the source/destination buffers, relative to the image being copied from/to. This is outlined in the description of the VkBufferImageCopy struct. There is no language in this section which would permit different behavior from tiled images.
The specification even has pseudo code for how copies work (this is for non-block compressed images):
rowLength = region->bufferRowLength;
if (rowLength == 0)
rowLength = region->imageExtent.width;
imageHeight = region->bufferImageHeight;
if (imageHeight == 0)
imageHeight = region->imageExtent.height;
texelSize = <texel size taken from the src/dstImage>;
address of (x,y,z) = region->bufferOffset + (((z * imageHeight) + y) * rowLength + x) * texelSize;
where x,y,z range from (0,0,0) to region->imageExtent.{width,height,depth}.
The x,y,z part is the location of the pixel in question from the image. Since this location is not dependent on the tiling of the image (as evidenced by the lack of anything stating that it would be), buffer/image copies will work equally on both kinds of tiling.
Also, do note that this specification is shared between vkCmdCopyImageToBuffer and vkCmdCopyBufferToImage. As such, if a copy works one way, it by necessity must work the other.
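As a small illustration of the addressing rule quoted above, here is the same formula as a plain C++ function. CopyRegion is a hypothetical stand-in that carries only the VkBufferImageCopy fields the formula uses, and texelAddress is an illustrative name; neither is part of the Vulkan API.

```cpp
#include <cstdint>

// Hypothetical stand-in for the VkBufferImageCopy fields used by the formula.
struct CopyRegion {
    std::uint64_t bufferOffset;       // byte offset of the first texel in the buffer
    std::uint32_t bufferRowLength;    // in texels; 0 means "use imageExtent.width"
    std::uint32_t bufferImageHeight;  // in texels; 0 means "use imageExtent.height"
    std::uint32_t extentWidth;        // imageExtent.width
    std::uint32_t extentHeight;       // imageExtent.height
};

// Byte address in the buffer of texel (x, y, z) for a non-block-compressed format.
std::uint64_t texelAddress(const CopyRegion& r,
                           std::uint32_t x, std::uint32_t y, std::uint32_t z,
                           std::uint32_t texelSize) {
    const std::uint64_t rowLength =
        r.bufferRowLength != 0 ? r.bufferRowLength : r.extentWidth;
    const std::uint64_t imageHeight =
        r.bufferImageHeight != 0 ? r.bufferImageHeight : r.extentHeight;
    return r.bufferOffset + (((z * imageHeight) + y) * rowLength + x) * texelSize;
}
```

Nothing in this calculation refers to the image's tiling, which is the point of the argument: the buffer side is always laid out linearly according to bufferRowLength and bufferImageHeight. When reading the result back on the host, the same function tells you where each texel lands, including any row padding you requested through bufferRowLength.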
In my application I am using a stack of 3 filters and adding that to a stillCamera. I am trying to take the image from filter1, its an empty filter so it returns the actual image.
[stillCamera addTarget:filter1];
[filter1 addTarget:filter2];
[filter2 addTarget:filter3];
[filter3 addTarget:cameraView];
When I call capturePhotoAsImageProcessedUpToFilter, it only ever returns an image when I pass it filter3 like below.
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter3 with...
The two examples below never return images
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter1 with...
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter2 with...
Am I doing something wrong? As a fix I am using:
[filter1 imageFromCurrentlyProcessedOutput]
Is there any difference between calling capturePhotoAsImageProcessedUpToFilter and imageFromCurrentlyProcessedOutput?
I think this is a side effect of a memory conservation optimization I tried to put in place last year. For very large images, like photos, what I try to do is destroy the framebuffer that backs each filter as the filtered image progresses through the filter chain. The idea is to try to minimize memory spikes by only having one or two copies of the large image in memory at any point in time.
Unfortunately, that doesn't seem to work as intended much of the time, and because the framebuffers are deleted as the image progresses, only the last filter in the chain ends up having a valid framebuffer to read from. I'm probably going to yank this optimization out at some point in the near future in favor of an internal framebuffer and texture cache, but I'm not sure what can be done in the meantime to read from these intermediary filters in a chain.