VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT VkAccessFlags set to 0? - vulkan

In the Vulkan spec it defines:
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT is equivalent to VK_PIPELINE_STAGE_ALL_COMMANDS_BIT with
VkAccessFlags set to 0 when specified in the second synchronization scope, but specifies no
stages in the first scope.
and similarly:
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT is equivalent to VK_PIPELINE_STAGE_ALL_COMMANDS_BIT with
VkAccessFlags set to 0 when specified in the first synchronization scope, but specifies no stages
in the second scope.
I'm unclear what it means by "with VkAccessFlags set to 0" in this context?
Technically VkAccessFlags is a type, not a variable, so it can't be set to anything.
(It seems to be adjusting the definitions of TOP/BOTTOM_OF_PIPE for some special property of VK_PIPELINE_STAGE_ALL_COMMANDS_BIT with respect to VkAccessFlags, but I can't quite see what that special property is or where it is specified.)
Anyone know what it's talking about?
(or, put another way: If we removed those two utterances of "with VkAccessFlags set to 0" from the spec, what would break?)

It is a roundabout way of saying that the interpretation of the stage flags differs between the execution dependency and the memory dependency.
For the execution dependency, src takes the stage bits you provide and logically-earlier stages are included automatically; similarly for dst, logically-later stages are included automatically.
But this applies only to the execution dependency. For a memory dependency, only the stage flags you explicitly provide count; none are added automatically.
For example, let's say you have VK_PIPELINE_STAGE_ALL_COMMANDS_BIT + VK_ACCESS_MEMORY_WRITE_BIT in src. That means all memory writes from all previous commands will be made available. But if you have VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT + VK_ACCESS_MEMORY_WRITE_BIT in src, that means all memory writes from only BOTTOM_OF_PIPE stage are made available, so no memory writes are made available (because that particular stage doesn't make any).
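As a rough illustration (a minimal sketch; cmd is an assumed VkCommandBuffer being recorded), the same access mask behaves very differently depending on the stage mask it is paired with:

// Global memory barrier: with ALL_COMMANDS in srcStageMask, the
// MEMORY_WRITE access below covers writes made by every earlier stage.
VkMemoryBarrier barrier = {};
barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;

vkCmdPipelineBarrier(
    cmd,
    VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,   // writes from all previous stages are made available
    VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
    0, 1, &barrier, 0, nullptr, 0, nullptr);

// If srcStageMask were VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT instead, the same
// srcAccessMask would apply only to that stage, which performs no writes,
// so no memory writes would be made available.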
Either way, IMO it is better for code clarity to state the relevant pipeline stages explicitly whenever one can.

Related

Can Vulkan synchronization source and destination stage mask be the same?

The following code from the Vulkan Tutorial seems to conflict with how synchronization scopes work.
// <dependency> is a subpass dependency.
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
...
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
The above code is trying to set both the srcStageMask and dstStageMask to be the same pipeline stage: VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT.
According to Vulkan Specification:
If a synchronization command includes a source stage mask, its first synchronization scope only includes execution of the pipeline stages specified in that mask, ...
If a synchronization command includes a destination stage mask, its second synchronization scope only includes execution of the pipeline stages specified in that mask, ...
In other words, srcStageMask and dstStageMask create a first synchronization scope with specified stage(s) and a second one with the specified stage(s), respectively.
Also, according to the following:
... for two sets of operations, the first set must happen before the second set.
My confusion is that, since the source and destination stages are the same, the subpass dependency seems to require that this pipeline stage complete before the exact same stage starts to execute.
The color attachment output stage is already guaranteed to be finished (the first scope). How can you ask the same, already-finished stage to start executing again (the second scope)?
So what is this dependency trying to say?
A stage only exists within an action command that executes some portion of itself within that stage. Synchronization scopes are based on commands first. Once you have defined which commands are in the scope, stage masks can specify which stages within those commands are affected by the synchronization.
As such, all synchronization operations define a set of commands that happen before the synchronization and the set of commands that happen after. These represent the "first synchronization scope" and "second synchronization scope".
The source stage mask applies to the commands in the "first synchronization scope". The destination stage mask applies to commands in the "second synchronization scope". The commands in one scope are a distinct set from the other scope. So even if you're talking about the same pipeline stages, they're stages in different commands that execute at different times.
So what that does is exactly what it says: it creates a dependency between all executions of the color attachment stage from the source subpass (aka: the "first synchronization scope") and all executions of the color attachment stage from the destination subpass (aka: the "second synchronization scope").
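For reference, this is roughly what the tutorial's dependency looks like with the scopes spelled out (a sketch based on the code quoted in the question; the subpass indices and access masks shown here are assumptions):

// Commands submitted before the render pass form the first scope,
// commands in subpass 0 form the second scope.
VkSubpassDependency dependency = {};
dependency.srcSubpass    = VK_SUBPASS_EXTERNAL;                            // first scope: prior commands
dependency.dstSubpass    = 0;                                              // second scope: subpass 0
dependency.srcStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;  // that stage within the prior commands
dependency.srcAccessMask = 0;
dependency.dstStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;  // that stage within subpass 0's commands
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;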

What does "VkImageMemoryBarrier::srcAccessMask = 0" mean?

I just read the Images chapter of the Vulkan tutorial, and I don't understand "VkImageMemoryBarrier::srcAccessMask = 0".
code:
barrier.srcAccessMask = 0;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
and the tutorial says:
Since the transitionImageLayout function executes a command buffer with only a single command, you could use this implicit synchronization and set srcAccessMask to 0 if you ever needed a VK_ACCESS_HOST_WRITE_BIT dependency in a layout transition.
Q1: If the function's command buffer contains multiple commands, then can this implicit synchronization not be used?
Q2: According to the manual page, VK_ACCESS_HOST_WRITE_BIT is 0x00004000, but the tutorial uses "0". Why?
Does "0" mean implicit,
and "VK_ACCESS_HOST_WRITE_BIT" mean explicit?
Am I understanding this correctly?
An access mask of 0 means "nothing": the barrier introduces no memory dependency.
Implicit synchronization means Vulkan does it for you. As the tutorial says:
One thing to note is that command buffer submission results in implicit VK_ACCESS_HOST_WRITE_BIT synchronization
Specifically, this is the Host Write Ordering Guarantee.
Implicit means you don't have to do anything. Any host write to mapped memory is already automatically visible to any device access of any vkQueueSubmit called after the mapped memory write.
Explicit in this case would mean to submit a barrier with VK_PIPELINE_STAGE_HOST_BIT and VK_ACCESS_HOST_*_BIT.
Note the sync guarantees only work one way. CPU → GPU is automatic/implicit, but GPU → CPU always needs to be explicit (you need a barrier with dst = VK_PIPELINE_STAGE_HOST_BIT to perform the memory domain transfer operation).
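For example, a GPU → CPU readback could be made visible to the host with something like the following sketch (readbackBuffer and cmd are assumed handles, and the transfer stage/access are just an assumption about where the write happened):

// Release a transfer write to the host domain so a mapped read sees it.
VkBufferMemoryBarrier barrier = {};
barrier.sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
barrier.srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;  // GPU write being made available
barrier.dstAccessMask       = VK_ACCESS_HOST_READ_BIT;       // host access it must be visible to
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.buffer              = readbackBuffer;
barrier.offset              = 0;
barrier.size                = VK_WHOLE_SIZE;

vkCmdPipelineBarrier(
    cmd,
    VK_PIPELINE_STAGE_TRANSFER_BIT,   // where the write happened
    VK_PIPELINE_STAGE_HOST_BIT,       // host domain it is released to
    0, 0, nullptr, 1, &barrier, 0, nullptr);

// The host still has to wait on the submit's fence before reading the mapping.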

Is VkDescriptorPoolSize struct really needed when creating descriptor pools?

I'm creating a descriptor pool with poolSizeCount == 0 and pPoolSizes == nullptr, and I can still allocate various numbers of descriptors of any type. There are no validation errors on Linux, only on Windows (but the code works).
Another case: I'm providing a VkDescriptorPoolSize with only 1 VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, but I can allocate more VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER descriptors or even descriptors of other types (in this case errors don't occur on either Linux or Windows).
Why is this happening?
It is not technically invalid usage to exceed the pool limits in general:
If a call to vkAllocateDescriptorSets would cause the total number of descriptor sets allocated from the pool to exceed the value of VkDescriptorPoolCreateInfo::maxSets used to create pAllocateInfo->descriptorPool, then the allocation may fail due to lack of space in the descriptor pool. Similarly, the allocation may fail due to lack of space if the call to vkAllocateDescriptorSets would cause the number of any given descriptor type to exceed the sum of all the descriptorCount members of each element of VkDescriptorPoolCreateInfo::pPoolSizes with a type member equal to that type.
Note the use of the word "may", which allows implementations to fail but doesn't require them to do so. This means that you're supposed to stay within those limits, but nobody's going to stop you if you exceed them and get away with it.
Now, it is a violation of valid usage to pass no sizes at all:
poolSizeCount must be greater than 0
And the appropriate validation layer should catch that. But outside of the layers, you just get undefined behavior, which can very well look like "appears to work".
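For completeness, the intended usage is to declare up front how many descriptors of each type the pool must be able to hold; something like this sketch (the counts and types here are arbitrary, and device is an assumed VkDevice):

VkDescriptorPoolSize poolSizes[] = {
    { VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,         4 },
    { VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, 4 },
};

VkDescriptorPoolCreateInfo poolInfo = {};
poolInfo.sType         = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.maxSets       = 4;   // at most 4 descriptor sets from this pool
poolInfo.poolSizeCount = 2;
poolInfo.pPoolSizes    = poolSizes;

VkDescriptorPool pool;
vkCreateDescriptorPool(device, &poolInfo, nullptr, &pool);

Staying within maxSets and the per-type counts is what keeps allocations from this pool within defined behavior.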

Understanding flags in Vulkan

In trying to set up a debug callback in Vulkan I noticed something weird about the LunarG SDK validation layers.
When setting up the create info struct, I do the following:
VkDebugUtilsMessengerCreateInfoEXT debugCreateInfo;
debugCreateInfo.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_MESSENGER_CREATE_INFO_EXT;
debugCreateInfo.messageSeverity = VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT;
debugCreateInfo.messageType = VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT;
debugCreateInfo.pfnUserCallback = debugCallback;
Everything works, but when I run the application I get the following message:
VUID-VkDebugUtilsMessengerCreateInfoEXT-flags-zerobitmask(ERROR / SPEC): msgNum: 1138790405 - vkCreateDebugUtilsMessengerEXT: parameter pCreateInfo->flags must be 0. The spec valid usage text states 'flags must be 0' (https://www.khronos.org/registry/vulkan/specs/1.0-extensions/html/vkspec.html#VUID-VkDebugUtilsMessengerCreateInfoEXT-flags-zerobitmask)
I do not really understand the message and the link just takes me to the start of the Vulkan specification page. So all I can understand is:
vkCreateDebugUtilsMessengerEXT: parameter pCreateInfo->flags must be 0
If I do explicitly set debugCreateInfo.flags = 0; the error goes away. But this has not been necessary anywhere else, and I have never used the flags and don't understand them at all either.
What I then found is that the error also disappears if I change the struct declaration from:
VkDebugUtilsMessengerCreateInfoEXT debugCreateInfo;
// to
VkDebugUtilsMessengerCreateInfoEXT debugCreateInfo = {};
So my question is what are flags, and what is the connection between the way I declare the struct and the declaration of the flag?
Flags in Vulkan work just like flags anywhere else and are simple bit masks to pass information to the implementation, just like the ones you pass via messageSeverity in your above example.
But as of now, there are no valid flags you can actually set for the debug utils create info structure as per the specs:
flags is 0 and reserved for future use.
And the valid usage chapter clearly states:
flags must be 0
This member is reserved for future usage, e.g. for extensions, so right now it must always be zero.
In your initial code snippet you don't explicitly clear the VkDebugUtilsMessengerCreateInfoEXT structure, which may result in flags holding some random value that does not fit the rules set by the spec.
This also applies to all other Vulkan structures that have a flags member. So if you don't explicitly set any flags, you should always clear the create info structures so that any flags member is zero. Not doing so may result in undefined behavior.
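A minimal way to do that for the struct from the question (a sketch reusing the names from your snippet):

// Empty-brace initialization zeroes every member, including flags and pNext,
// so only the fields you care about need to be set afterwards.
VkDebugUtilsMessengerCreateInfoEXT debugCreateInfo = {};
debugCreateInfo.sType           = VK_STRUCTURE_TYPE_DEBUG_UTILS_MESSENGER_CREATE_INFO_EXT;
debugCreateInfo.messageSeverity = VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT |
                                  VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT;
debugCreateInfo.messageType     = VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT;
debugCreateInfo.pfnUserCallback = debugCallback;
// flags and pNext are already 0 / nullptr thanks to the = {} initializer.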

Atomic updates to variables in TensorFlow

Say I have a TensorFlow variable to track the mean of a value. mean can be updated with the following graph snippet:
mean.assign((step * mean.read() + value) / (step + 1))
Unfortunately, those operations are not atomic, so if two different portions of the graph try to update the same mean variable one of the updates may be lost.
If instead I were tracking sum, I could just do
sum.assign_add(value, use_locking=True)
and everything would be great. Unfortunately, in other cases a more complicated update to mean (or std or etc.) may be required, and it may be impossible to use tf.assign_add.
Question: Is there any way to make the first code snippet atomic?
Unfortunately I believe the answer is no, since (1) I don't remember any such mechanism and (2) one of our reasons for making optimizers C++ ops was to get atomic behavior. My main source of hope is XLA, but I do not know whether this kind of atomicity can be guaranteed there.
The underlying problem of the example is that there are two operations - read and subsequent assign - that together need to be executed atomically.
In early 2018, the TensorFlow team added the CriticalSection class to the codebase. However, this only works for resource variables (as pointed out in Geoffrey's comments). Hence, value in the example below needs to be acquired as:
value = tf.get_variable(..., use_resource=True, ...)
Although I did not test this, according to the class' documentation the atomic update issue should then be solvable as follows:
def update_mean(step, value):
    old_value = mean.read_value()
    with tf.control_dependencies([old_value]):
        return mean.assign((step * old_value + value) / (step + 1))

cs = tf.CriticalSection()
mean_update = cs.execute(update_mean, step, value)
session.run(mean_update)
Essentially, it provides a lock from the beginning of execute() till its end, i.e. covering the whole assignment operation including read and assign.