I must copy a color attachment into a buffer. Do I need an image memory barrier between the end of the render pass and the copy operation, to ensure visibility to the transfer?
The render pass has a single subpass with a single attachment. The VkAttachmentDescription.finalLayout differs from the VkAttachmentReference.layout, so an implicit subpass dependency with dstSubpass = VK_SUBPASS_EXTERNAL does indeed exist.
My confusion arises from the description of this implicit external dependency: it has dstAccessMask = 0 and dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT. Since the transfer stage happens before the bottom-of-pipe pseudo-stage, does that mean I need to specify a barrier? Or does the implicit dependency only affect operations in the render pass, so I need a barrier anyway? As a side question, what is the meaning of 0 as a source or destination access mask?
Validation layers do not report any issue either with or without a barrier, and the output is as expected, but I am not sure it is actually correct.
Yes, you need a dependency between the render pass and the transfer operation in Vulkan.
The implicit dependency exists in the spec only for formal reasons. Without it you would not know when the layout transition happens, e.g. when synchronizing with a semaphore. But in practice the implicit dependency is a no-op: its dst half does not cover anything.
The meaning of 0 in an access mask is "no access". E.g. the VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT stage neither reads nor writes anything, so the access mask accompanying it should be 0.
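For example, the following explicit external dependency (a minimal sketch, assuming the copy is the only consumer of the attachment) can be added to VkRenderPassCreateInfo::pDependencies. It replaces the no-op implicit one and makes the attachment writes, the store op, and the transition to finalLayout available and visible to a transfer recorded after vkCmdEndRenderPass():

VkSubpassDependency toCopy = {
    .srcSubpass      = 0,                     // the single subpass
    .dstSubpass      = VK_SUBPASS_EXTERNAL,   // commands recorded after the render pass
    .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .dstStageMask    = VK_PIPELINE_STAGE_TRANSFER_BIT,
    .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .dstAccessMask   = VK_ACCESS_TRANSFER_READ_BIT,
    .dependencyFlags = 0
};

With this in place, vkCmdCopyImageToBuffer() recorded right after vkCmdEndRenderPass() needs no further barrier.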
I'm studying Vulkan render passes as a hobby.
I have the following descriptions for the members of the VkSubpassDependency struct.
The verbiage is a combination of language from various sources (books, the spec, the internet) and
my own word-smithing, so the descriptions might be wrong because of my messing with them.
// .srcSubpass:
// The index of the subpass whose "producing" operations must finish before the second set of "consuming" operations executes.
// If there are no dependencies on previous subpasses (e.g. the first subpass), use VK_SUBPASS_EXTERNAL.
// .srcStageMask:
// A bitmask of the pipeline stages which produce the data read by the "consuming" commands.
// .srcAccessMask:
// The types of memory access that occurred during the "producing" commands.
// ----------
// .dstSubpass:
// The index of the first subpass whose operations depend on the output of this subpass; or VK_SUBPASS_EXTERNAL, if
// there are no dependent subpasses.
// .dstStageMask:
// A bitmask of the pipeline stages which depend on the data generated by the "producing" commands.
// .dstAccessMask:
// The types of memory access that will be performed by the "consuming" commands.
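For concreteness, here is a made-up dependency filled in along the lines of those descriptions. It describes subpass 0 writing a color attachment that subpass 1 reads as an input attachment in its fragment shader; the values are only an illustration, not taken from a real program:

VkSubpassDependency dep = {
    .srcSubpass      = 0,                                             // "producing" subpass
    .dstSubpass      = 1,                                             // "consuming" subpass
    .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // stage that wrote the data
    .dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,         // stage that reads the data
    .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,          // how the data was written
    .dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,           // how the data will be read
    .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT                    // a per-region dependency suffices here
};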
I have some questions.
Let's say that a render pass has 3 subpasses -- S1, S2, S3 -- which are "logically" executed in sequence, but which may be executed out of order by the GPU.
1. srcSubpass:
1.1. Is it possible for there to be a scenario in which S3 depends on neither S1 nor S2? Or do subsequent subpasses always depend on some previous subpass?
1.2. If 1.1 is true, then srcSubpass would be VK_SUBPASS_EXTERNAL, correct?
1.3. Or, does VK_SUBPASS_EXTERNAL only ever apply to S1?
2. dstSubpass:
2.1. Is this simply always the index of the next logical subpass (i.e. S1->S2)? Or is it possible for it to be S1->S3?
2.2. Similar to question 1.1, is it possible for S3 not to depend on S2, and thus this value would be VK_SUBPASS_EXTERNAL for S2?
3. srcStageMask:
3.1. Is it the case that the earlier the source pipeline stage is, the smaller the dependency between the 2 sets of operations would be?
That is, a srcStageMask of VK_PIPELINE_STAGE_VERTEX_INPUT_BIT would impose a shorter dependency wait than VK_PIPELINE_STAGE_VERTEX_SHADER_BIT.
3.2. If 3.1 is true, then this means that the ideal pipeline stage would be VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, correct?
4. dstStageMask:
4.1. For a smaller dependency time between 2 sets of operations, the farther down the pipeline, the better?
4.2. If 4.1 is true, then VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT would be the ideal stage, for minimum dependency. Yeah?
Thanks
1.1 Anything that is not impossible/invalid is by definition possible. It might make sense, e.g., to render to two independent color attachments using the same vertex buffer. That might save on binning, which would then only need to be done once.
1.2 Not necessarily. For example, a subpass might have no color attachments and instead output via storage images. Then it needs no VK_SUBPASS_EXTERNAL dependency, explicit or implicit.
1.3 No, it applies to whichever subpass uses a given attachment first, as that one executes the LoadOp for that attachment.
2.1 Yes, it is a dependency DAG, so it might have S1→S3 and S2→S3 (and no S1→S2).
2.2 It might be VK_SUBPASS_EXTERNAL. Note that you can have multiple dependencies; one can be VK_SUBPASS_EXTERNAL and another can name another subpass. One attachment in S2 might already be finished (and have its early StoreOp there), and therefore VK_SUBPASS_EXTERNAL would be appropriate for it.
3.1 Yes, the logically earlier the stage used in srcStageMask, the lesser (or equal) the restriction imposed by such a dependency.
3.2 Yes, that would be a no-op dependency. Coincidentally, this is how the implicit subpass dependency is specified.
4.1 Yes, same logic. The logically later the stage used in dstStageMask, the lesser (or equal) the restriction imposed by such a dependency.
4.2 That is the no-op dependency. And coincidentally, this is how the implicit subpass dependency is specified.
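For reference, the implicit dependency the spec inserts at the end of a render pass (when you supply no explicit dependency with dstSubpass = VK_SUBPASS_EXTERNAL) looks roughly like the following; lastSubpass stands for the last subpass that uses the attachment. Note that the dst half is the bottom of the pipe with no access, i.e. the no-op discussed above:

VkSubpassDependency implicitDependency = {
    .srcSubpass      = lastSubpass,
    .dstSubpass      = VK_SUBPASS_EXTERNAL,
    .srcStageMask    = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
    .dstStageMask    = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
    .srcAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
                       VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
                       VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                       VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                       VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
    .dstAccessMask   = 0,
    .dependencyFlags = 0
};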
I read the portion of the Vulkan specification relevant to "Render Pass Compatibility", but I'm not sure my understanding is right.
Recording commands inside a render pass involves a VkFramebuffer and a VkPipeline, and each of them is strongly tied to a VkRenderPass: they must only be used with that render pass object, or with one compatible with it. Can I reuse a VkFramebuffer or VkPipeline with a compatible render pass? Please tell me more about this topic.
Not sure what to do with your question except answer "yes".
VkRenderPassBeginInfo VU:
renderPass must be compatible with the renderPass member of the VkFramebufferCreateInfo structure specified when creating framebuffer.
e.g. vkCmdDraw VU:
The current render pass must be compatible with the renderPass member of the VkGraphicsPipelineCreateInfo structure specified when creating the VkPipeline bound to VK_PIPELINE_BIND_POINT_GRAPHICS.
I.e. the VkFramebuffer (resp. VkPipeline) only has to be used with a render pass that is compatible, not necessarily with the very same object handle.
VkGraphicsPipelineCreateInfo expects a VkRenderPass to be assigned to its .renderPass member. I don't really understand why a pipeline must be coupled with a render pass. I mean, VkGraphicsPipelineCreateInfo doesn't directly "talk" to render-pass-related content, like FBOs and their attachments. I may want to use the same pipeline with more than one render pass, e.g. when I want to render the same set of objects in different scenes, so do I have to create another pipeline with exactly the same setup?
Just to add that creating a VkPipeline with .renderPass = nullptr fails with a validation error:
vkCreateGraphicsPipelines: required parameter pCreateInfos[0].renderPass specified as VK_NULL_HANDLE. Invalid VkRenderPass Object 0x0. The Vulkan spec states: renderPass must be a valid VkRenderPass handle (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkGraphicsPipelineCreateInfo-renderPass-parameter)
I mean, VkGraphicsPipelineCreateInfo doesn't directly "talk" to render-pass-related content, like FBOs and their attachments.
Of course it does. What do you think a fragment shader is doing when it writes into a render pass' attachments?
Do I have to create another one with exactly the same setup?
No. As per the specification:
"renderPass is a handle to a render pass object describing the environment in which the pipeline will be used; the pipeline must only be used with an instance of any render pass compatible with the one provided. See Render Pass Compatibility for more information".
... so a pipeline can be used with any render pass that is compatible with the one used to create it.
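A hedged sketch of what that permits (device, renderPassInfo, pipelineInfo, beginInfo, cmd and pipeline are assumed to be set up elsewhere; the point is only which renderPass handle goes where):

// Two render passes created from the same VkRenderPassCreateInfo are
// compatible, so a pipeline created against one may be bound inside a
// render pass instance begun with the other.
VkRenderPass rpSceneA, rpSceneB;
vkCreateRenderPass(device, &renderPassInfo, NULL, &rpSceneA);
vkCreateRenderPass(device, &renderPassInfo, NULL, &rpSceneB);

pipelineInfo.renderPass = rpSceneA;   // pipeline created against rpSceneA
pipelineInfo.subpass    = 0;
vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo, NULL, &pipeline);

beginInfo.renderPass = rpSceneB;      // ...but used in an instance of rpSceneB
vkCmdBeginRenderPass(cmd, &beginInfo, VK_SUBPASS_CONTENTS_INLINE);
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);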
I need to render an image and copy it back to the host. I issue the vkCmdCopyImageToBuffer() from the render pass result to a host-readable buffer right after vkCmdEndRenderPass(). It seems to work, but I am worried the copy might start before the rendering is finished (or before the image is transitioned).
Do I need to perform some kind of synchronization, or is it implicitly guaranteed that the image will be transitioned to the needed VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL (which happens at the end of the render pass) before the copy is initiated? Where in the specs is this sequence defined?
The layout of an image after the execution of a render pass is explicitly specified by you when you constructed that render pass. It is specified by VkAttachmentDescription::finalLayout for the attached image.
As for synchronization, that again is specified by you at render pass creation time. Subpasses can have external dependencies, where they depend on something that happens before the render pass, or where they cause something after the render pass to depend on them.
That being said, if you do not specify a subpass dependency whose destination subpass is VK_SUBPASS_EXTERNAL, then an implicit dependency is created (one of the few times Vulkan does synchronization implicitly). This implicit dependency synchronizes accesses to color, depth, and input attachments from any command in the render pass with the bottom of the pipe. However, because it does not specify any destination access flags in its dstAccessMask, it is not useful, and you should provide an explicit external dependency.
Also, using bottom of the pipe as the destination stage is pretty much never useful.
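(The render-pass-side fix is an explicit external dependency with dstStageMask = VK_PIPELINE_STAGE_TRANSFER_BIT and dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT.) For completeness, here is a barrier-based alternative, as a sketch with placeholder handles (cmd, colorImage). It assumes finalLayout was left equal to the subpass layout, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, so the barrier itself performs the transition to TRANSFER_SRC_OPTIMAL between vkCmdEndRenderPass() and vkCmdCopyImageToBuffer():

VkImageMemoryBarrier toTransferSrc = {
    .sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,     // attachment writes / store op
    .dstAccessMask       = VK_ACCESS_TRANSFER_READ_BIT,              // the copy reads the image
    .oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    .newLayout           = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image               = colorImage,
    .subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }
};
vkCmdPipelineBarrier(cmd,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,   // src: rendering into the attachment
    VK_PIPELINE_STAGE_TRANSFER_BIT,                  // dst: vkCmdCopyImageToBuffer
    0, 0, NULL, 0, NULL, 1, &toTransferSrc);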
I am writing an interpreter implemented functionally using a variation of the Cont Monad. Inspired by Smalltalk's use of images to capture a running program, I am investigating how to serialize the executing hosted program and need help determining how to accomplish this at a high level.
Problem Statement
Using the Cont monad, I can capture the current continuation of a running program in my interpreter. Storing the current continuation allows resuming interpreter execution by calling the continuation. I would like to serialize this continuation so that the state of a running program can be saved to disk or loaded by another interpreter instance. However, my language (I am both targeting and working in Javascript) does not support serializing functions this way.
I would be interested in an approach that can be used to build up the continuation at a given point of execution given some metadata without running the entire program again until it reaches that point. Preferably, minimal changes to the implementation of the interpreter itself would be made.
Considered Approach
One approach that may work is to push all control flow logic into the program state. For example, I currently express a C-style for loop using the host language's recursion for the looping behavior:
var forLoop = function(init, test, update, body) {
    var iter = function() {
        // When test is true, evaluate the loop body and start iter again
        // otherwise, evaluate an empty statement and finish
        return branch(test,
            next(
                next(body, update),
                iter()),
            empty);
    };
    return next(
        init,
        iter());
};
This is a nice solution, but if I pause the program midway through a for loop, I don't know how I can serialize the continuation that has been built up.
However, I know I can serialize a program that has been transformed to use jumps, and the for loop can be constructed from jump operations. A first pass of my interpreter would generate blocks of code and save these in the program state. These blocks would capture some action in the hosted language and potentially execute other blocks. The preprocessed program would look something like this:
Label Actions (Block of code, there is no sequencing)
-----------------------------------
start: init, GOTO loop
loop: IF test GOTO loop_body ELSE GOTO end
loop_body: body, GOTO update
update: update, GOTO loop
end: ...
This makes each block of code independent, only relying on values stored in the program state.
To serialize, I would save off the current label name and the state when it was entered. Deserialization would preprocess the input code to build the labels again and then resume at the given label with the given state. But now I have to think in terms of these blocks when implementing my interpreter. Even using composition to hide some of this seems kind of ugly.
Question
Are there any good existing approaches for addressing this problem? Am I thinking about serializing a program in entirely the wrong way? Is this even possible for structures like this?
After more research, I have some thoughts on how I would do this. However, I'm not sure that adding serialization is something I want to do at this point, as it would affect the rest of the implementation so much.
I'm not satisfied with this approach and would greatly like to hear any alternatives.
Problem
As I noted, transforming the program into a list of statements makes serialization easier. The entire program can be transformed into something like assembly language, but I wanted to avoid this.
Keeping a concept of expressions, what I didn't originally consider is that function calls can occur inside of deeply nested expressions. Take this program for example:
function id(x) { return x; }
10 + id(id(5)) * id(3);
The serializer should be able to serialize the program at any statement, but a statement could potentially be evaluated inside of an expression (through one of these nested function calls).
Host Functions In the State
The reason the program state cannot be easily serialized is that it contains host functions for continuations. These continuations must be transformed into data structures that can be serialized and independently reconstructed into the action the original continuation represented. Defunctionalization is most often used to express a higher order language in a first order language, but I believe it would also enable serialization.
Not all continuations can be easily defunctionalized without drastically rewriting the interpreter. As we are only interested in serialization at specific points, serialization at those points requires the entire continuation stack to be defunctionalized. So all statements and expressions must be defunctionalized, but internal logic can remain unchanged in most cases because we don't want to allow serialization partway through an internal operation.
However, to my knowledge, defunctionalization does not work with the Cont Monad because of bind statements. The lack of a good abstraction makes it difficult to work with.
Thoughts on a Solution
The goal is to create an object made up of only simple data structures that can be used to reconstruct the entire program state.
First, to minimize the amount of work required, I would rewrite the statement-level interpreter to use something more like a state machine that can be more easily serialized. Then, I would defunctionalize expressions. Function calls would push the defunctionalized continuation for the remaining expression onto an internal stack for the state machine.
Making Program State a Serializable Object
Looking at how statements work, I'm not convinced that the Cont Monad is the best approach for chaining statements together (I do think it works pretty well at the expression level and for internal operations, however). A state machine seems like a more natural approach, and it would also be easier to serialize.
A machine that steps between statements would be written. All other structures in the state would also be made serializable. Builtin functions would have to use serializable handles to identify them so that no functions are in the state.
Handling Expressions
Expressions would be rewritten to pass defunctionalized continuations instead of host function continuations. When a function call is encountered in an expression, it captures the defunctionalized current continuation and pushes it onto the statement machine's internal stack (This would only happen for hosted functions, not builtin ones), creating a restore point where computation can resume.
When the function returns, the defunctionalized continuation is passed the result.
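To make "turn the continuation into a data structure plus an apply function" concrete, here is a tiny sketch of defunctionalization, written in C with invented names purely for illustration; in the interpreter these frames would be plain JavaScript objects, which serialize trivially:

#include <stdio.h>

enum frame_tag { DONE, ADD_TO, MUL_BY };    /* which pending operation the frame represents */

struct frame {
    enum frame_tag tag;
    double         captured;                /* operand captured by the frame      */
    struct frame  *next;                    /* the rest of the continuation stack */
};

/* apply() replaces "calling the closure": it dispatches on the tag. */
double apply(struct frame *k, double value) {
    switch (k->tag) {
    case DONE:   return value;
    case ADD_TO: return apply(k->next, k->captured + value);
    case MUL_BY: return apply(k->next, k->captured * value);
    }
    return value;
}

int main(void) {
    /* Continuation for the hole in `10 + 3 * _`: multiply by 3, then add 10. */
    struct frame done  = { DONE,   0.0,  NULL   };
    struct frame add10 = { ADD_TO, 10.0, &done  };
    struct frame mul3  = { MUL_BY, 3.0,  &add10 };
    /* Pretend a hosted call just returned 5 into the hole; resuming via the
       data-only continuation prints 25. The frames contain no functions, so
       the stack can be serialized and rebuilt by another interpreter. */
    printf("%g\n", apply(&mul3, 5.0));
    return 0;
}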
Concerns
Javascript also allows hosted functions to be evaluated inside almost any expression (getters, setters, type conversion, higher order builtins) and this may complicate things if we allow serialization inside of those functions.
Defunctionalization seems to require working directly with continuations and would make the entire interpreter less flexible.