Do you need synchronization between 2 consecutive render passes if the 2nd render pass depends on the 1st one's output - vulkan

Say the first render pass generates an output of a rendered texture, then the second render pass samples the texture in the shader and renders it to the swapchain.
I don't know whether I can do this kind of rendering inside a single render pass using subpasses. Can subpass attachments have different sizes? In other words, can subpasses behave like render textures?

It seems like there are a few separate questions wrapped up here, so I'll have a stab at answering them in order!
First, synchronization between two render passes where the second relies on the output from the first: recording them on the same command buffer in the correct order (or, if recorded on separate command buffers, submitting them in the correct order) only guarantees that the commands begin execution in that order; the GPU is still free to overlap their execution. You therefore do need an explicit execution and memory dependency between the passes, typically a pipeline barrier recorded between them, or subpass dependencies using VK_SUBPASS_EXTERNAL.
If you are consuming the output from a render pass (or subpass) by sampling it inside a shader (which you will need to do if the sizes differ; see below), rather than setting up a subpass output as an input attachment of a later subpass, then you will need to do so in a separate render pass.
If you are consuming the output from a previous subpass as an input attachment, that implies you are using pixel-local load operations inside your shader (where framebuffer attachments written in a previous subpass are read from at the exact same location in a subsequent subpass). This requires that the attachments be the same size, since all operations occur at the same pixel location.
From the Vulkan specification:
The subpasses in a render pass all render to the same dimensions, and fragments for pixel (x,y,layer) in one subpass can only read attachment contents written by previous subpasses at that same (x,y,layer) location. For multi-pixel fragments, the pixel read from an input attachment is selected from the pixels covered by that fragment in an implementation-dependent manner. However, this selection must be made consistently for any fragment with the same shading rate for the lifetime of the VkDevice.
So if your attachments vary in size, this would imply you need to consume your initial output in a separate renderpass.

Related

Vulkan spec clarification regarding subpass dependency

I need some clarity about something in the Vulkan Spec. Section 8.1 says:
Render passes must include subpass dependencies (either directly or via a subpass dependency chain) between any two subpasses that operate on the same attachment or aliasing attachments and those subpass dependencies must include execution and memory dependencies separating uses of the aliases, if at least one of those subpasses writes to one of the aliases. These dependencies must not include the VK_DEPENDENCY_BY_REGION_BIT if the aliases are views of distinct image subresources which overlap in memory.
Does this mean that if subpass S0 writes to attachment X (either as Color, Depth, or Resolve) and a subsequent subpass S1 uses that attachment X (either as Color, Input, Depth, or Resolve), then there must be a subpass dependency from S0->S1 (directly or via a chain)?
EDIT 1:
Upon further thinking, the scenario is not just if S0 writes and S1 reads. If S0 reads and S1 writes, that also needs synchronization from S0->S1.
EDIT 2:
I should say that what I was specifically unsure about before was a color attachment that is written by 2 different subpasses. Assuming the subpasses don't have a logical dependency other than using the same color attachment, they could run in parallel if they used different attachments. Before reading this paragraph, I was under the impression that dependencies were only needed between 2 subpasses if subpass B needs some actual data from subpass A, and so needs to wait until this data is available. That paragraph talks about general memory hazards.
If there is no logical need for 2 subpasses to be ordered in a certain way, the GPU could decide which one is better to run first. But if the developer always has to declare dependencies whenever 2 subpasses touch the same attachment, then that's potential speed lost that only the GPU could recover. It shouldn't be hard for the GPU to figure out that, although 2 subpasses don't have a developer-stated dependency between them, they do read/write the same attachment, so it shouldn't just mindlessly write to it from both subpasses at the same time. Yes, I mean that the GPU would do some simple synchronization for basic cases, so as not to decapitate itself.
If there is a render pass that has two subpasses A and B, and both use the same attachment, and A writes to the shared attachment, then there is logically an ordering requirement between A and B. There has to be.
If there is no ordering requirement between two operations, then it is logically OK for those two operations to be interleaved. That is, partially running one, then partially running another, then completing the first. And they can be interleaved to any degree.
You can't interleave A and B, because the result you get is incoherent. For any shared attachment between A and B, if B writes to a pixel in that attachment, should A read that data? What if B writes twice to it? Should A read the pre-written value, the value after the first write, or the value after the second write? If A also writes to that pixel, should B's write happen before it or after? Or should A's write be between two of B's writes? And if so, which ones?
Interleaving A and B just doesn't make sense. There must be a logical order between them. So the scenario you hypothesize, where there "is no logical need for 2 subpasses to be ordered in a certain way" does not make sense.
Either you want any reads/writes done by B to happen before the writes done by A, or you want them to happen after. Neither choice is better or more correct than the other; they are both equally valid usage patterns.
Vulkan is an explicit, low-level rendering API. It is not Vulkan's job to figure out what you're trying to do. It's your job to tell Vulkan what you want it to do. And since either possibility is perfectly valid, you must tell Vulkan what you want done.
Both A & B need 5 color attachments each, but other than the memory, they don't care about each other. Why can't the GPU share the 5 color attachments intelligently between the subpasses, interleaving as it sees fit?
How would that possibly work?
If the first half of A writes some data to attachments, and the second half of A's operations will read that data, then B can't get in the middle of that and overwrite it. Because then the data would be overwritten with incorrect values, and the second half of A wouldn't have access to the data written by the first half.
If A and B both start by clearing buffers (FYI: calling vkCmdClearAttachments at all should be considered at least potentially dubious), then no interleaving is possible, since the first thing they will each do is overwrite the attachments in their entirety. The rendering commands within those subpasses expect the attachments to contain known data, and having someone come along and mess with it will break those assumptions and yield incorrect results.
Therefore, whatever these subpasses are doing, they must execute in their entirety before the other. You may not care what order they execute in, but they must entirely execute in some order, with no overlap between them.
Vulkan just makes you spell out what that order is.

Vulkan: Is the rendering pipeline executed once per subpass?

Considering a RenderPass that has multiple Subpasses:
Is the implication of multiple subpasses that the entire rendering pipeline is executed once per subpass?
And that the image output of a prior subpass is accessible to subsequent subpasses, assuming correct subpass dependencies?
(with the stipulation that reading prior image data happens at the same pixel location, for tiled-GPU optimization)
I understand that hardware may optimize things out; it's more of a way of thinking about how the multi-subpass processing happens.
Extending this to multiple render passes: is it the same as with subpasses, except that image data from prior render passes can be accessed at any location, and that synchronization between render passes uses different mechanisms than synchronization between subpasses?
Pipeline is not "executed". Pipeline just exists. That's why it is called a pipeline, and not a state machine. The queue operations are the things that are executed.
With Vulkan's render passes it is good to know how tile-based architectures work. First they need to sort everything into tiles; that means they need to know the position of everything up front. So the geometry processing (vertex shader, geometry shader, tessellation shader, and all the relevant fixed-function stages) needs to be finished for all the queue operations before pixel processing (fragment shader, framebuffer writes, and other fixed-function stages) starts for any of them.
From that, the subpass restrictions are derived:
If srcSubpass is equal to dstSubpass and not all of the stages in srcStageMask and dstStageMask are framebuffer-space stages, the logically latest pipeline stage in srcStageMask must be logically earlier than or equal to the logically earliest pipeline stage in dstStageMask
I.e. you cannot have a vertex shader dependency waiting on a fragment shader output of previous ops. But you can have "framebuffer-space" dependencies; e.g. fragment shader waiting on fragment shader of previous ops.
Subpass dependencies are just another abstraction of the Vulkan API of how to express the synchronization between different commands (each of which can run through multiple pipeline stages). W.r.t. render passes, subpass dependencies serve two purposes:
Expressing synchronization between the commands submitted within different render passes (that is, when the VK_SUBPASS_EXTERNAL subpass id is used; see VkSubpassDependency).
Expressing synchronization between the commands submitted within the same or different subpasses. In this case, pairs such as (0 and 0) or (0 and 1) are specified for srcSubpass and dstSubpass in a VkSubpassDependency structure, respectively.
Given the correct synchronization scopes, a subsequent subpass can read the rendered results of a previous subpass. Framebuffer attachments can be passed on via input attachments, which are specified in VkSubpassDescription. You can get an overview of this in this lecture from 43:28 onwards.
Regarding the "rendering pipeline is executed" thing: the lecture mentioned above explains commands and how they go through pipeline stages in quite some detail, starting from 22:29. It should make things a lot clearer.
Regarding tiled GPUs: If you are referring to VK_DEPENDENCY_BY_REGION_BIT for VkSubpassDependency::dependencyFlags, the spec says the following:
VK_DEPENDENCY_BY_REGION_BIT specifies that dependencies will be framebuffer-local.
That means, you can only use the following pipeline stages with that flag:
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
Valuable information about tile-based architectures is already given in the other answer by krOoze.

During which pipeline stage is blending performed?

The Vulkan spec states:
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT specifies the stage of the pipeline after blending where the final color values are output from the pipeline.
This seems to imply some undefined stage between fragment-shader and color-attachment-output where blending takes place.
But let's say after writing to an image I want to use it as color attachment, and add a memory dependency with srcStageMask=VK_PIPELINE_STAGE_TRANSFER_BIT, srcAccessMask=VK_ACCESS_TRANSFER_WRITE_BIT, dstStageMask=VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, dstAccessMask=VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT. If blending took place before color-attachment-output stage, it could read data that's not yet visible.
So what does the spec actually mean in this case?
It's important to remember several facts:
You can only do blending within a render pass.
An image used as an attachment during a render pass cannot be transferred to.
Given these facts, a render pass has to have begun between the transfer to the image and the attempt to blend with that image. And note that your blending operation is relying on the data in the image to be what it was when the render pass began. That means your loadOp for that attachment needs to load the image, not clear it.
And in order for the beginning of the render pass to load the image... it must synchronize with prior modifications to that image. And the specification does spell out which stage actually performs the load operation and how all of those things work:
The load operation for each sample in an attachment happens-before any recorded command which accesses the sample in the first subpass where the attachment is used. Load operations for attachments with a depth/stencil format execute in the VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT pipeline stage. Load operations for attachments with a color format execute in the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT pipeline stage.
So you don't need to synchronize the transfer with the blend operation; you need to synchronize the transfer with the render pass. And the stage for that is COLOR_ATTACHMENT_OUTPUT.
As to the deeper point of your question (what stage does blending), the answer is that Vulkan doesn't allow it to matter. Images being used as an attachment in a render pass can only be used in very limited ways. As previously stated, you can't just perform arbitrary transfer operations to them. You can't perform arbitrary write operations to them. You can only access their data as color/depth/stencil attachments and/or as input attachments.
Synchronization between blending in different rendering commands (in the same render pass) is handled automatically. And you can't write to an image via an input attachment (hence the name). So there's no special need to make blending products visible to other operations.
Basically, blending never needs an explicit stage because of the restrictions of the render pass model and the ordering guarantees of blending and other per-sample operations.

Why do I need resources per swapchain image

I have been following different tutorials and I don't understand why I need resources per swapchain image instead of per frame in flight.
This tutorial:
https://vulkan-tutorial.com/Uniform_buffers
has a uniform buffer per swapchain image. Why would I need that if different images are not in flight at the same time? Can I not start rewriting it once the previous frame has completed?
Also lunarg tutorial on depth buffers says:
And you need only one for rendering each frame, even if the swapchain has more than one image. This is because you can reuse the same depth buffer while using each image in the swapchain.
This doesn't explain anything, it basically says you can because you can. So why can I reuse the depth buffer but not other resources?
It is to minimize synchronization in the case of the simple Hello Cube app.
Let's say your uniforms change each frame. That means the main loop is something like:
Poll (or simulate)
Update (e.g. your uniforms)
Draw
Repeat
If step #2 did not have its own uniform buffer per frame, then it would need to write to a uniform the previous frame is still reading. That means it would have to sync with a Fence. That would mean the previous frame is no longer considered "in-flight".
It all depends on the way You are using Your resources and the performance You want to achieve.
If, after each frame, You are willing to wait for the rendering to finish and You are still happy with the final performance, You can use only one copy of each resource. Waiting is the easiest synchronization, You are sure that resources are not used anymore, so You can reuse them for the next frame. But if You want to efficiently utilize both CPU's and GPU's power, and You don't want to wait after each frame, then You need to see how each resource is being used.
Depth buffer is usually used only temporarily. If You don't perform any postprocessing, if Your render pass setup uses depth data only internally (You don't specify STORE for storeOp), then You can use only one depth buffer (depth image) all the time. This is because when rendering is done, depth data isn't used anymore, it can be safely discarded. This applies to all other resources that don't need to persist between frames.
But if different data needs to be used for each frame, or if generated data is used in the next frame, then You usually need another copy of a given resource. Updating data requires synchronization - to avoid waiting in such situations You need to have a copy of that resource. So in the case of uniform buffers, You update data in a given buffer and use it in a given frame. You cannot modify its contents until the frame is finished - so to prepare another frame of animation while the previous one is still being processed on a GPU, You need to use another copy.
Similarly if the generated data is required for the next frame (for example framebuffer used for screen space reflections). Reusing the same resource would cause its contents to be overwritten. That's why You need another copy.
You can find more information here: https://software.intel.com/en-us/articles/api-without-secrets-the-practical-approach-to-vulkan-part-1

Multiple instances of same Vulkan subpass

I have been reading through many online tutorials on creating a Vulkan renderer, however, the idea of subpasses is still very unclear to me.
Say I have the following scenario: I need a first subpass for setup (filling a depth buffer for testing, etc.), then a subpass for every light in the scene (the number of which could change at any time). Because each lighting subpass is exactly the same, would it be possible to declare 2 subpasses and have multiple instances of the same subpass?
The term "pass" here does not mean "full-screen pass" or something like that. Subpasses only matter in terms of what you're rendering to (and reading from previous subpass renderings as input attachments). Where your data comes from (descriptors/push constants), what vertex data they get, what shaders they use, none of that matters to the subpass. The only things the subpass controls are render targets.
So unless different lights are rendering to different images, then there's no reason to give each light a subpass. You simply issue the rendering commands for all of your lights within the same subpass.