Questions about Vulkan VkSubpassDependency members - vulkan

I'm studying Vulkan RenderPasses as a hobby.
I have the following descriptions for the members of the VkSubpassDependency struct.
The verbiage is a combination of language from sources (books, spec, internet) and
my own word-smithing. The descriptions might be wrong because of my tinkering with them.
// .srcSubpass:
// The subpass index from which "producing" operations should be finished before the second set of "consuming" operations executes.
// If there are no dependencies on previous subpasses (e.g., the first subpass), use VK_SUBPASS_EXTERNAL.
// .srcStageMask:
// A bitmask of the pipeline stages which produce the data read by the "consuming" commands.
// .srcAccessMask:
// The types of memory operations that occurred during the "producing" commands.
// ----------
// .dstSubpass:
// The index of the first subpass whose operations depend on the output of this subpass; or VK_SUBPASS_EXTERNAL, if
// no later subpass depends on this one.
// .dstStageMask:
// A bitmask of the pipeline stages which depend on the data generated by the "producing" commands.
// .dstAccessMask:
// The types of memory operations that will be performed in "consuming" commands.
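To make these descriptions concrete, here is a sketch of how I picture them mapping onto an actual dependency (my own illustration; the specific stage/access bits are just a made-up example of S1 writing a color attachment that S2 then reads as an input attachment):
VkSubpassDependency dep = {
    .srcSubpass      = 0,                                             // S1: the "producing" subpass
    .dstSubpass      = 1,                                             // S2: the "consuming" subpass
    .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // stage that produced the data
    .dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,         // stage that will consume it
    .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,          // how the data was written
    .dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,           // how the data will be read
    .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,                   // typical for input-attachment reads
};
// dep would go into VkRenderPassCreateInfo::pDependencies / dependencyCount.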
I have some questions
Let's say that a RenderPass has 3 subpasses -- S1, S2, S3 -- which are "logically" executed in sequence, but which may be executed out of order by the GPU.
1. srcSubpass:
1.1. Is it possible for there to be a scenario in which S3 does not depend on either S1 or S2? Or do subsequent subpasses always depend on some previous pass?
1.2. If 1.1 is true, then srcSubpass would be VK_SUBPASS_EXTERNAL, correct?
1.3. Or, does VK_SUBPASS_EXTERNAL only ever apply to S1?
2. dstSubpass:
2.1. Is this simply always the index of the next logical subpass (i.e., S1->S2)? Or is it possible for it to be S1->S3?
2.2. Similar to question 1.1, is it possible for S3 not to depend on S2, and thus this value would be VK_SUBPASS_EXTERNAL for S2?
3. srcStageMask:
3.1. Is it the case that the earlier the pipeline stage, the weaker the dependency between the two sets of operations?
That is, would a srcStageMask of VK_PIPELINE_STAGE_VERTEX_INPUT_BIT impose a shorter wait than VK_PIPELINE_STAGE_VERTEX_SHADER_BIT?
3.2. If 3.1 is true, then this means that the ideal pipeline stage would be VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, correct?
4. dstStageMask:
4.1. For a shorter wait between the two sets of operations, the farther down the pipeline the dstStageMask is, the better?
4.2. If 4.1 is true, then VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT would be the ideal stage, for minimum dependency. Yeah?
Thanks

1.1 Anything that is not impossible/invalid is by definition possible. It might make sense, for example, to render to two independent color attachments using the same vertex buffer; that might save on binning, which need only be done once.
1.2 Not necessarily. For example, a subpass might have no color attachments and instead output via Storage Images. Therefore it needs no VK_SUBPASS_EXTERNAL dependency, explicit or implicit.
1.3 No; it applies to whichever subpass uses a given attachment first, as that one executes the LoadOp for that attachment.
2.1 It is a dependency DAG, so S1→S3 is possible; e.g., you might have S1→S3 and S2→S3 (and no S1→S2) -- see the sketch below.
2.2 It might be VK_SUBPASS_EXTERNAL. Note that you can have multiple dependencies; one can target VK_SUBPASS_EXTERNAL and another can target a different subpass. One attachment in S2 might already be finished (and get its StoreOp early there), so VK_SUBPASS_EXTERNAL would be appropriate for that one.
3.1 Yes: the logically earlier the stage used in srcStageMask, the lesser (or equal) the restriction imposed by such a dependency.
3.2 Yes, that would be a no-op dependency. Coincidentally this is how the implicit subpass dependency is specified.
4.1 Yes, same logic: the logically later the stage used in dstStageMask, the lesser (or equal) the restriction imposed by such a dependency.
4.2 That is the no-op dependency. And coincidentally this is how the implicit subpass dependency is specified.
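As a sketch of 2.1 (illustrative only; the stage and access bits here are arbitrary), the S1→S3 and S2→S3 dependencies of such a DAG could be declared like this and passed via VkRenderPassCreateInfo::pDependencies:
VkSubpassDependency deps[2] = {
    { // S1 (index 0) → S3 (index 2)
        .srcSubpass      = 0,
        .dstSubpass      = 2,
        .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
        .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
    },
    { // S2 (index 1) → S3 (index 2); note there is no S1 → S2 entry
        .srcSubpass      = 1,
        .dstSubpass      = 2,
        .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
        .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
    },
};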

Related

Vulkan ShaderViewportIndexLayerEXT support

I'm in the process of porting some rendering code to Vulkan. I've been using the SPIR-V cross-compiler to avoid the requirement of re-writing all my shaders, which has been working well in most cases, but now I've hit an issue I just can't get past.
I have a vertex shader being compiled to SPIR-V that uses SV_RenderTargetArrayIndex. This maps to ShaderViewportIndexLayerEXT in SPIR-V. The device I'm running on (an NVidia 3090 with the latest drivers) supports the VK_EXT_shader_viewport_index_layer extension (which is core in 1.2), and I'm explicitly enabling both shaderOutputViewportIndex and shaderOutputLayer in the VkPhysicalDeviceVulkan12Features struct (chained off VkPhysicalDeviceFeatures2, which is in turn chained via pNext onto the VkDeviceCreateInfo struct).
I've also added the line:
[[vk::ext_extension("SPV_EXT_shader_viewport_index_layer")]]
to the .hlsl source file, and verified that the SPIR-V output contains the extension reference.
The validation layer is whining, with:
Validation Error: [ VUID-VkShaderModuleCreateInfo-pCode-01091 ] Object 0: handle = 0x1e66c477eb0, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xa7bb8db6 | vkCreateShaderModule(): The SPIR-V Capability (ShaderViewportIndexLayerEXT) was declared, but none of the requirements were met to use it. The Vulkan spec states: If pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied
Is there something else I need to do to enable this feature on the device? Any help/insight would be greatly appreciated.
(I can always drop the requirement to use this feature - realistically in my use case on DX12 it's not gaining me much if anything, but I'd rather figure this out!)
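For reference, the feature chaining I described looks roughly like this (a simplified sketch, not my exact code):
VkPhysicalDeviceVulkan12Features features12 = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_2_FEATURES,
    .shaderOutputViewportIndex = VK_TRUE,
    .shaderOutputLayer         = VK_TRUE,
};
VkPhysicalDeviceFeatures2 features2 = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2,
    .pNext = &features12,
};
VkDeviceCreateInfo deviceInfo = {
    .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
    .pNext = &features2,   // pEnabledFeatures stays NULL when features are chained here
    // ... queue create infos, enabled extensions, etc.
};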

implicit subpass dependency and barriers

I must copy a color attachment into a buffer. Do I need an image memory barrier between the end of the render pass and the copy operation, to ensure visibility to the transfer?
The render pass has a single subpass with a single attachment. The VkAttachmentDescription.finalLayout differs from the VkAttachmentReference.layout, so an implicit subpass dependency with dstSubpass = VK_SUBPASS_EXTERNAL does indeed exist.
My confusion arises from the description of such implicit external dependency: it has a dstAccessMask = 0 and dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT. Since the transfer stage happens before the end of pipe pseudostage, does it mean I need to specify a barrier? Or does the implicit dependency only affect operations in the render pass, so I need a barrier anyway? As a side question, what is the meaning of 0 as a source or destination access mask?
Validation layers do not report any issue either with or without a barrier, and the output is as expected, but I am not sure it is correct though.
Yes, you need a dependency between _______ and _______ in Vulkan.
The implicit dependency exists in the spec only for formal reasons. Without it you would not know when the layout transition happens when the image is used e.g. with a semaphore. But in practice the implicit dependency is a no-op; its dst half does not cover anything.
The meaning of 0 in an access mask is "no access". E.g. the VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT stage neither reads nor writes anything, so the access mask accompanying it should be 0.

How to synchronize vkCmdCopyBufferToImage()?

I need to render an image and copy it back to the host. I issue a vkCmdCopyImageToBuffer() from the render pass result to a host-readable buffer right after vkCmdEndRenderPass(). It seems to work, but I am worried the copy might start before the rendering is finished (or before the image is transitioned).
Do I need to perform some kind of synchronization, or is it implicitly guaranteed that the image will be transitioned to the needed VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL (which happens at the end of the render pass) before the copy is initiated? Where in the spec is this sequence defined?
The layout of an image after the execution of a renderpass is explicitly specified by you, when you constructed that renderpass. This is specified by VkAttachmentDescription::finalLayout for the attached image.
As for synchronization, that again is specified by you at renderpass creation time. Subpasses can have external dependencies, where they depend on something that happens before the renderpass, or where they cause something after the renderpass to depend on them.
That being said, if you do not specify a subpass dependency for which the destination subpass is VK_SUBPASS_EXTERNAL, then an implicit dependency is created (one of the few times Vulkan does synchronization implicitly). This implicit dependency synchronizes color, depth, and input attachments from any command with the bottom of the pipe. However, because it does not specify any destination access forms in its mask, this is not useful and you should provide an explicit external dependency.
Also, using bottom of the pipe as the destination stage is pretty much never useful.
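Such an explicit external dependency could look like this (a sketch, added to VkRenderPassCreateInfo::pDependencies; the dst half is chosen to cover a transfer read of the color attachment after the render pass):
VkSubpassDependency toCopy = {
    .srcSubpass      = 0,                                              // the last (here assumed only) subpass
    .dstSubpass      = VK_SUBPASS_EXTERNAL,
    .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // where the attachment was written
    .dstStageMask    = VK_PIPELINE_STAGE_TRANSFER_BIT,                 // where the copy reads it
    .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .dstAccessMask   = VK_ACCESS_TRANSFER_READ_BIT,
    .dependencyFlags = 0,
};
With that in place, the transition to finalLayout and the subsequent copy are both covered by the dependency, so no separate barrier is needed after vkCmdEndRenderPass().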

hdfsFileStatus and FileStatus difference

What is the main difference between the two classes?
Mainly, in what situation would I use one and not the other?
org.apache.hadoop.hdfs.protocol package
http://www.sching.com/javadoc/hadoop/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.html
org.apache.hadoop.fs package
https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileStatus.html
HdfsFileStatus is marked with the @InterfaceAudience.Private and @InterfaceStability.Evolving annotations (check the source code). The first annotation means it is intended for internal Hadoop implementations. The second annotation means the class might change (backwards-compatible support might not be available between releases). Basically, you should not use HdfsFileStatus in your code.

Aspects scanning too many classes and method cache fills memory

In our application we have several (actually many, about 30) web services. Each web service resides in its own WAR file and has its own Spring context that is initialised when the application starts.
We also have a number of annotation-driven aspect classes that we apply to the web service classes. In the beginning the pointcut expression looked like this:
#Pointcut("execution(public * my.package.service.business.*BusinessServiceImpl.*(..))")
public void methodsToBeLogged() {
}
AOP was enabled on the services through an entry in the configuration.
But when the number of web services grew, we began to experience OutOfMemoryErrors on our servers. After doing some profiling and analysis it appeared that the memory was taken by the cache kept by instances of the AspectJExpressionPointcut class.
Each instance's cache was about 5 MB, and as we had 3 aspects and 30 services, this resulted in 90 instances holding 450 MB of data in total.
After examining the contents of the cache we realised that it contains Java reflection Method instances for all classes existing in the WAR, even those which are not part of the my.package.service.business package. After modifying the pointcut expression to additionally have a within clause:
#Pointcut("execution(public * my.package.service.business.*BusinessServiceImpl.*(..)) &&
within(my.package.service.business..*)")
public void methodsToBeLogged() {
}
Memory usage was back down to normal, and all AspectJExpressionPointcut instances together took less than 1 MB.
Can someone explain why that is? And why is the first pointcut expression not enough? Why is the cache of AspectJExpressionPointcut not shared?
The AspectJExpressionPointcut uses a cache (shadowMatchCache) which speeds up the decision of whether AOP should be applied to a certain method call or not, based on the pointcut expression. This cache possibly consumes a lot of memory.
Additionally, before offering all methods of a specific bean to see if there is a pointcut expression match or not, Spring first checks whether the bean class could possibly match at all, by calling AspectJExpressionPointcut.matches(Class targetClass).
This method delegates to AspectJ's PointcutExpressionImpl.couldPossiblyMatch() method, which performs a fast check of whether a class could 'possibly' match a pointcut expression or will 'definitely' never match.
According to the AspectJ developers, using a within pointcut results in more definite no's. They also recommend never using a standalone kinded pointcut (execution, call, get, set), but combining these with within.
The shadowMatchCache cannot be shared, however, because it contains the match/no-match result per pointcut expression.
But at least you can limit what gets cached. I also think Spring could possibly improve on this by not keeping this whole cache around once the applicationContext is started. For example, they could throw away all the no-matches, at the expense of redoing some of the matching when a new bean is dynamically added to the applicationContext after it has already started.
Another possible memory hog inside the AspectJExpressionPointcut class is the pointCutParser. This parser could possibly be shared across all AspectJExpressionPointcuts in the applicationContext. Take a look at JIRA ticket SPR-7678.