Concurrent use of VkSamplers? - vulkan

So a VkSampler is created with a VkSamplerCreateInfo that just has a bunch of configuration settings, that as far as I can see would just define a pure function of some input image.
They are described as:
VkSampler objects represent the state of an image sampler which is used by the implementation to
read image data and apply filtering and other transformations for the shader.
One use (possibly the only use) of VkSampler is to write them to descriptors (such as VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) for use in descriptor sets that are bound to pipelines/shaders.
My question is: can you write the same VkSampler to multiple different descriptors, whether from the same or from different descriptor pools, even if one of those descriptors is in use by a currently executing render pass?
Can you use the same VkSampler concurrently from multiple different render passes / subpasses / pipelines?
Put another way, are VkSamplers stateless, or do they represent some stateful memory on the device, such that you shouldn't use the same one concurrently?

VkSampler objects definitely have data associated with them, so it would be wrong to call them "stateless". What they are is immutable. Like VkRenderPass, VkPipeline, and similar objects, once they are created, their contents cannot be changed.
Synchronization between accesses is (generally) needed only when one of the accesses is a modification operation. Since VkSamplers are immutable, there are no modification operations. So synchronization is not needed when you're accessing a VkSampler from different threads, commands, or what have you.
The only exception is the obvious one: vkDestroySampler, which requires that submitted commands that use the sampler have completed before calling the function.

Related

When to use multiple MTLRenderCommandEncoders to perform my Metal rendering?

I'm learning Metal, and there's a conceptual question that I'm trying to wrap my head around: at what level, exactly, should my code handle successive drawing operations that require different pipeline states? As I understand it (from answers like this: https://stackoverflow.com/a/43827775/2752221), I can use a single MTLRenderCommandEncoder and change its pipeline state, the vertex buffer it's using, etc., between calls to drawPrimitives:, and the encoder state that was current at the time of each call to drawPrimitives: will be preserved. So that's great. But it also seems like the design of Metal is such that one can make multiple MTLRenderCommandEncoder instances, and use them to sequentially throw batches of commands into an MTLCommandBuffer. Given that the former works (using one MTLRenderCommandEncoder and changing its state), why would one do the latter? Under what circumstances is it correct to do the former, and under what circumstances is it necessary to do the latter? What is an example of a situation where the latter would be necessary/appropriate?
If it matters, I'm working on a macOS app, using Objective-C. Thanks.
Ignoring multithreaded encoding cases, which are somewhat advanced, the main reason you'd want to create multiple render command encoders during a frame is because you need to change which textures you're rendering to.
You'll notice that you need to provide a render pass descriptor when creating a render command encoder. For this reason, we often say that the sequence of commands belonging to a particular encoder constitutes a render pass. The attachments of that descriptor refer to the textures that will be written to by the commands encoded by the encoder.
Many different techniques, including shadow mapping and postprocessing effects like bloom, require multiple passes to produce. Since you can't change attachments in the midst of a pass, creating a new encoder is the only way to encode multiple passes in a frame.
Relatedly, you should ordinarily use one command buffer per frame. You can, however, sometimes reduce frame time by splitting your passes across multiple command buffers, but this is highly dependent on the shape of your workload and should only be done in tandem with profiling, as it's not always an optimization.
In addition to Warren's answer, another way to look at the question is by examining the API. A number of Metal objects are created from descriptors. The properties of the descriptor at the time an object is created from it govern that object for its lifetime. Those are aspects of the object that can't be changed after creation.
By contrast, the object will have various setter methods to modify other properties over its lifetime.
For a render command encoder, the properties that are fixed for its lifetime are those specified by the MTLRenderPassDescriptor used to create it. If you want to render with different values for any of those properties, the only way to do so is to create a new encoder from a different descriptor. On the other hand, if you can do everything you need/want to do by using the encoder's setter methods, then you don't need a new encoder.

Can I use VkDevice from multiple threads concurrently?

In particular, can I create pipelines, allocate device memory and create images and buffers from the same VkDevice concurrently?
Where in the spec is this specified?
In the specification we can read:
Vulkan is intended to provide scalable performance when used on multiple host threads. All commands support being called concurrently from multiple threads, but certain parameters, or components of parameters are defined to be externally synchronized. This means that the caller must guarantee that no more than one thread is using such a parameter at a given time.
Then there is a list of the parameters of various Vulkan functions that must be externally synchronized (meaning they cannot be accessed from multiple threads at the same time). For VkDevice objects, the only function we find in that list is vkDestroyDevice(). So all other uses of a VkDevice object can happen on multiple threads.
There are also practically no vkCreate...() functions in that list (only 3 swapchain-related functions), which means you can create objects from multiple threads at the same time.
Statements in the Vulkan specification of the form "host access to X must be externally synchronized" mean that you cannot cause accesses to X while also calling the function that has this requirement. If a function's specification doesn't say that about a particular parameter, then that parameter can be accessed from multiple threads. So long as all functions that could concurrently access it don't have this specification, of course.
Note that the Valid Usage section of various functions can have additional concurrency requirements.

Is this understanding of VkDescriptorPoolCreateInfo.pPoolSizes correct?

In Vulkan, I understand that a descriptor pool is used to allocate descriptor sets of some layout for use in a shader, but in the VkDescriptorPoolCreateInfo passed to vkCreateDescriptorPool, there is a field pPoolSizes that takes a bunch of objects containing a descriptor type and a number.
The documentation seems somewhat vague, but is this saying that a given descriptor pool can only have a certain, predetermined amount of each type of descriptor allocated from it in descriptor sets? If so, how do I determine how many I will need beforehand? What happens if it runs out?
Your understanding of descriptor pools is correct.
If so, how do I determine how many I will need beforehand?
That's up to you and your application's needs.
If your application needs to be completely flexible and freeform, then you will need to create descriptor pools dynamically as needed. If your application has greater foreknowledge of what the scene will look like, then it will need fewer such gymnastics.
Many serious Vulkan applications try to avoid having the number of descriptor sets be based on the number of objects in the scene. Push constants and/or dynamic UBO/SSBO descriptors allow different per-object state to be used without changing the descriptor itself. Textures for lots of objects can be bundled together into array textures, or depending on the hardware, arrays of textures.
In a perfect world, all meshes of a type (say, skinned meshes) could be rendered with the exact same descriptor set, using some per-object state to fetch the right matrix/texture data for that object.
But that's how they render. Such applications have firm control over the kinds of objects they render, what per-object data looks like, and so forth. Other applications may have different needs.
Vulkan is a tool; how you use it is entirely up to you.
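To make the dynamic UBO idea mentioned above a bit more concrete, here is a minimal sketch of how per-object offsets into one shared uniform buffer might be computed. It makes no Vulkan calls. PerObjectData, alignUp and computeDynamicOffsets are hypothetical names made up for illustration, and the layout of the per-object data is an assumption. The only Vulkan facts relied on are that a VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC descriptor takes its offset at bind time (the pDynamicOffsets argument of vkCmdBindDescriptorSets) and that each such offset must be a multiple of the device limit minUniformBufferOffsetAlignment, which is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round `value` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two, as Vulkan's offset alignment limits are.
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical per-object state: a model matrix plus an index into an array texture.
struct PerObjectData {
    float modelMatrix[16];
    std::uint32_t textureLayer;
};

// Compute one dynamic offset per object, assuming the objects' data is packed
// back to back in a single buffer with an aligned stride. `minAlignment` stands
// in for the minUniformBufferOffsetAlignment limit queried from the device.
std::vector<std::uint32_t> computeDynamicOffsets(std::size_t objectCount,
                                                 std::uint64_t minAlignment) {
    assert(minAlignment != 0 && (minAlignment & (minAlignment - 1)) == 0);
    const std::uint64_t stride = alignUp(sizeof(PerObjectData), minAlignment);
    std::vector<std::uint32_t> offsets;
    offsets.reserve(objectCount);
    for (std::size_t i = 0; i < objectCount; ++i) {
        offsets.push_back(static_cast<std::uint32_t>(i * stride));
    }
    return offsets;
}
```

With offsets like these, every object of a given type could be drawn with the same descriptor set, passing a different offset for each draw, so the number of descriptor sets no longer grows with the number of objects.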
What happens if it runs out?
Then you cannot allocate more descriptors from that pool. If you need to allocate another descriptor set, you will need to create another pool.
My approach was to have a class that initially creates a pool with N entries, and if it runs out, it'll create another pool with N*2 entries. It'll keep doubling in size. It uses a simple linked list, and when it comes to allocating, it just tries the first pool and then moves on to the next if it's full.
That's all pretty inefficient, so I also had my code fire an assert if it ever had to create a second pool. That way I can make sure I choose a value of N that's big enough that the retail version should never have to do it (but if it does somehow manage to, due to some unforeseen set of circumstances, it'll still render correctly).
At the time, I remember cursing the spec and wishing descriptor pools would auto-grow like command pools do. Still, I imagine there's a good reason that they are the way they are.
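The doubling scheme described above can be sketched roughly as follows. This is not the poster's actual code, and it makes no Vulkan calls: SimulatedPool is a made-up stand-in for a VkDescriptorPool that only counts how many sets it has handed out, and GrowingPoolChain and its methods are hypothetical names. Instead of firing an assert when a second pool is created, the sketch exposes grewBeyondFirstPool() so that a debug build could assert on it.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Stand-in for a VkDescriptorPool: it only tracks how many sets it can still hand out.
struct SimulatedPool {
    std::size_t capacity;
    std::size_t used;

    explicit SimulatedPool(std::size_t cap) : capacity(cap), used(0) {}

    // Returns false when the pool is exhausted, mirroring a failed allocation.
    bool tryAllocate() {
        if (used == capacity) return false;
        ++used;
        return true;
    }
};

class GrowingPoolChain {
public:
    explicit GrowingPoolChain(std::size_t initialCapacity) {
        assert(initialCapacity > 0);
        pools_.emplace_back(initialCapacity);
    }

    // Tries each pool in order and returns the index of the pool that served
    // the request. If every pool is full, appends a pool twice the size of
    // the last one and allocates from it.
    std::size_t allocate() {
        std::size_t index = 0;
        for (SimulatedPool& pool : pools_) {
            if (pool.tryAllocate()) return index;
            ++index;
        }
        pools_.emplace_back(pools_.back().capacity * 2);
        pools_.back().tryAllocate();
        return index;
    }

    std::size_t poolCount() const { return pools_.size(); }

    // True once the initial pool turned out to be too small; a debug build
    // could assert on this to flag a badly chosen N.
    bool grewBeyondFirstPool() const { return pools_.size() > 1; }

private:
    std::list<SimulatedPool> pools_;
};
```

In a real renderer the tryAllocate() call would presumably be replaced by vkAllocateDescriptorSets on the corresponding pool, with a failed allocation triggering the move to the next pool.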

When is it useful to have broadcast data in deserialized form?

Reading the docs for Spark I see
The data broadcasted this way is cached in serialized form and deserialized before running each task. This means that explicitly creating broadcast variables is only useful when tasks across multiple stages need the same data or when caching the data in deserialized form is important.
I understand why broadcast variables are useful when re-using them in multiple tasks. You don't want to re-send them with all closures.
However the second part, in bold, says when caching data in deserialized form is important. When and why would that be important? If you're only going to use data in 1 task it will still get serialized/deserialized once, no?
I think you ignored the following part:
and deserialized before running each task.
A single stage typically consists of multiple tasks (it is not common to have only a single partition, is it?), and multiple tasks belonging to the same stage can be processed by the same executor. Since deserialization can be quite expensive, you may prefer to perform it only once.

Which design patterns allow managing the state of involved objects, holding (lazy) (im)mutable state, inspecting and modifying objects passed/returned, etc.?

Consider two problems:
We have a wrapper that detects whether the wrapped object started a transaction, keeps the transaction number, and makes it available to users of the wrapper through a method. Can it be called a facade, assuming it simplifies the interface, of course?
There is a communication layer which provides a high-level interface for the low-level operations required to execute functions on an attached device (these involve pushing bytes through a socket and parsing the answers). Some of the answers contain a special "prompt number" which is required for some other queries. The communication layer detects answers which contain a prompt number and stores that number in a special holder which is available to the caller. Could that be called a facade?
Overall, those questions are related to a more general question:
Which design patterns allow storing or managing mutable or immutable state, and/or inspecting the objects that are passed to wrapped objects or returned from them?
Take a look at the Observer Pattern http://en.wikipedia.org/wiki/Observer_pattern
The State pattern could be of use as well: http://en.wikipedia.org/wiki/State_pattern
and perhaps also Memento http://en.wikipedia.org/wiki/Memento_pattern
depending on what you want to accomplish.
For the Observer, look at Boost signals and slots or at Qt signals and slots for some neat implementations.
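As a rough illustration of how the Observer suggestion could apply to the communication-layer scenario from the question, here is a minimal hand-rolled sketch that uses neither Boost nor Qt. Everything in it is hypothetical: the CommunicationLayer and PromptHolder names, and especially the "PROMPT:<number>" reply format, which is invented purely so there is something to parse.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical communication layer that notifies observers whenever a reply
// from the device carries a prompt number.
class CommunicationLayer {
public:
    using PromptObserver = std::function<void(int)>;

    void subscribe(PromptObserver observer) {
        assert(observer && "observer must be callable");
        observers_.push_back(std::move(observer));
    }

    // Pretend `reply` was just read from the socket. The "PROMPT:<number>"
    // format is an assumption made up for this sketch.
    void handleReply(const std::string& reply) {
        const std::string marker = "PROMPT:";
        if (reply.size() <= marker.size()) return;
        if (reply.compare(0, marker.size(), marker) != 0) return;
        int promptNumber = 0;
        try {
            promptNumber = std::stoi(reply.substr(marker.size()));
        } catch (const std::exception&) {
            return;  // malformed number: treat it as a reply without a prompt
        }
        for (const PromptObserver& observer : observers_) {
            observer(promptNumber);
        }
    }

private:
    std::vector<PromptObserver> observers_;
};

// The "special holder" from the question, kept up to date by an observer.
struct PromptHolder {
    int lastPrompt = -1;
    bool hasPrompt = false;
};
```

A caller would subscribe a lambda that writes into a PromptHolder. The layer itself then stays unaware of who stores the number, which is the separation the Observer pattern is meant to give.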