What is the use of .range field when updating a buffer descriptor? - vulkan

When I went to update a UNIFORM_BUFFER descriptor, I set up the buffer info:
VkDescriptorBufferInfo buffer_info;
buffer_info.buffer = /* SOME BUFFER */;
buffer_info.offset = 0;
buffer_info.range = 0; // I assume this doesn't do any thing for this use
And then vkUpdateDescriptorSets().
Get the validation layer error:
VkDescriptorBufferInfo range is not VK_WHOLE_SIZE and is zero, which
is not allowed.. The Vulkan spec states: If range is not equal to
VK_WHOLE_SIZE, range must be greater than 0
My question is, isn't the job of the buffer info to tell which buffer and what offset to read from the shaders at a particular descriptor set and binding? I didn't think the size of the buffer mattered because generally that's how these things work, you usually specify a buffer and offset and then you read outside that buffer in the shader at your peril.
Let's just I write the wrong range, what would that do? If I write a 32 and in the shader I access 64 bytes in, what happens? Is this argument for validation warnings?
Edit: I just want to clarify the range argument can't mean how much of the buffer I want to copy, what I'm writing to is essentially a pointer. The actual writing of the buffer data is done in a buffer to buffer copy transfer.

A descriptor describes a (usually memory) resource being used by a shader in some capacity. Buffers do have a size, but a shader can use a subset of a buffer's memory range. The descriptor describes which portion of the buffer is being used.
If a descriptor should use the whole size of the buffer assigned to it (starting at offset), that's what VK_WHOLE_SIZE is for.
This allows you to have multiple uniform buffers provided by the same VkBuffer resource. You can even use dynamic uniform block descriptors to change the offset/range without changing the buffer binding itself. This is faster than switching descriptor sets, thus making it easier to provide per-object data.
Let's just I write the wrong range, what would that do?
If the range is smaller than the size of the uniform block specified in the shader, then you'll get a validation failure/undefined behavior.

Related

what's the difference between .descriptorCount in VkDescriptorPoolSize and in VkDescriptorSetLayoutBinding(VkDescriptorSetLayoutCreateInfo)? how set?

what's the difference between .descriptorCount in VkDescriptorPoolSize and in VkDescriptorSetLayoutBinding(VkDescriptorSetLayoutCreateInfo)?
how to set them when there are many shaders, how to set them correctly?
eg.
layout(binding = 0) uniform Buffers {
uint x[];
} buffers[5];
then:
VkDescriptorSetLayoutBinding.descriptorCount = 5,//not 1,
VkDescriptorPoolSize.descriptorCount = 5,//not 1,
what if this same uniform Buffers are in many shaders? should add 5 to descriptorCount when it exist in 1 more shader?
In the case of VkDescriptorSetLayoutBinding::descriptorCount. The spec says:
descriptorCount is the number of descriptors contained in the binding,
accessed in a shader as an array, except if descriptorType is
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK in which case descriptorCount
is the size in bytes of the inline uniform block. If descriptorCount
is zero this binding entry is reserved and the resource must not be
accessed from any stage via this binding within any pipeline using the
set layout.
So most likely be 1, unless you have an array of textures or array of uniform buffers (which is not very common).
VkDescriptorPoolSize::descriptorCount has nothing to do with the previous one. A descriptor pool is used for allocating descriptors. So descriptorCount here is how many descriptors this pool can provide. The spec says:
descriptorCount is the number of descriptors of that type to allocate.
If type is VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK then
descriptorCount is the number of bytes to allocate for descriptors of
this type.
You can use the same pool for allocating multiple descriptor sets of multiple different layouts. So this most likely needs to be a fairly large number.
Anyways, I think you need to keep studying the general basic concepts around descriptors!

How to map SSBO buffer to CPU in Vulkan similar to glMapBuffer() in openGL

I am making a project in Vulkan, and I want to use an SSBO modified in the GPU on CPU; but Vulkan doesn't have a function to map the buffer, only have a memory function. I tried everything about MemoryMapping, but nothing worked.
With Vulkan, after creating the SSBO memory buffer and specifying memory property flag VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT (which will create the buffer from memory accessible by the system/CPU), use command vkMapMemory() and pass it the void *pointer to use to access the shader block.
The memcpy() command can then be used to read and write data to and from the block (be sure to use fences and avoid reading/writing while the GPU is still using the SSBO).
A quick note on casting and offsetting - whilst using the void pointer to write data to an SSBO with a single memcpy() call is fine, it can't be used to read in the same manner. The pointer has to be cast to the data type in use.
Also, offset arithmetic cannot be performed on void pointers to reach individual structs either.
The data type or struct to which the pointer is cast defines how increment/decrement works - it will do so by the size of said data type and not by bytes in the address (the latter may seem more intuitive).
For example:
(copy the fifth int from a block of ints...)
int theInt;
int *ssboBlockPointer = (int*)vTheSSBOMappedPointer;
memcpy(&theInt, ssboBlockPointer + 5, sizeof(int));
(or copy the 5th struct from a block of structs - offset will move 5 structs)
theStruct oneStruct;
theStruct *ssboBlockPointer = (theStruct*)vTheSSBOMappedPointer;
memcpy(&theStruct , ssboBlockPointer + 5, sizeof(theStruct));

how do i know in advance that the buffer size is enough in nanopb?

im trying to use nanopb, according to the example:
https://github.com/nanopb/nanopb/blob/master/examples/simple/simple.c
the buffer size is initialized to 128:
uint8_t buffer[128];
my question is how do i know (in advance) this 128-length buffer is enough to transmit my message? how to decide a proper(enough but not waste too much due to over-large) size of buffer before initial (or coding) it?
looks like a noob question :) , but thx for your quick suggestion.
When possible, nanopb adds a define in the generated .pb.h file that has the maximum encoded size of a message. In the file examples/simple/simple.pb.h you'll find:
/* Maximum encoded size of messages (where known) */
#define SimpleMessage_size 11
And could specify uint8_t buffer[SimpleMessage_size];.
This define will be available only if all repeated and string fields have been specified (nanopb).max_count and (nanopb).max_size options.
For many practical purposes, you can pick a buffer size that you estimate will be large enough, and handle error conditions. It is also possible to use pb_get_encoded_size() to calculate the encoded size and dynamically allocate storage, but in general that is not a great solution in embedded applications. When total system memory size is limited, it is often better to have a constant sized buffer that you can test with, instead of having the available amount of dynamic memory vary at the runtime.

Vullkan compute shader caches and barriers

I'm trying to understand how the entire L1/L2 flushing works. Suppose I have a compute shader like this one
layout(std430, set = 0, binding = 2) buffer Particles{
Particle particles[];
};
layout(std430, set = 0, binding = 4) buffer Constraints{
Constraint constraints[];
};
void main(){
const uint gID = gl_GlobalInvocationID.x;
for (int pass=0;pass<GAUSS_SEIDEL_PASSES;pass++){
// first query the constraint, which contains particle_id_1 and particle_id_1
const Constraint c = constraints[gID*GAUSS_SEIDEL_PASSES+pass];
// read newest positions
vec3 position1 = particles[c.particle_id_1].position;
vec3 position2 = particles[c.particle_id_2].position;
// modify position1 and position2
position1 += something;
position2 -= something;
// update positions
particles[c.particle_id_1].position = position1;
particles[c.particle_id_2].position = position2;
// in the next iteration, different constraints may use the updated positions
}
}
From what I understand, initially all data resides in L2. When I read particles[c.particle_id_1].position I copy some of the data from L2 to L1 (or directly to a register).
Then in position1 += something I modify L1 (or the register). Finally in particles[c.particle_id_2].position = position1, I flush the data from L1 (or a register) back to L2, right? So if I then have a second compute shader that I want to run afterward this one, and that second shader will read positions of particles, I do not need to synchronize Particles. It would be enough to just put an execution barrier, without memory barrier
void vkCmdPipelineBarrier(
VkCommandBuffer commandBuffer,
VkPipelineStageFlags srcStageMask, // here I put VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT
VkPipelineStageFlags dstStageMask, // here I put VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT
VkDependencyFlags dependencyFlags, // here nothing
uint32_t memoryBarrierCount, // here 0
const VkMemoryBarrier* pMemoryBarriers, // nullptr
uint32_t bufferMemoryBarrierCount, // 0
const VkBufferMemoryBarrier* pBufferMemoryBarriers, // nullptr
uint32_t imageMemoryBarrierCount, // 0
const VkImageMemoryBarrier* pImageMemoryBarriers); // nullptr
Vulkan's memory model does not care about "caches" as caches. Its model is built on the notion of availability and visibility. A value produced by GPU command/stage A is "available" to GPU command/stage B if the command/stage A has an execution dependency with command/stage B. A value produced by GPU command/stage A is "visible" to GPU command/stage B if command/stage A has a memory dependency with command/stage B with regard to the particular memory in question and the access modes that A wrote it and B will access it.
If a value is not both available and visible to a command/stage, then attempting to access it yields undefined behavior.
The implementation of availability and visibility will involve clearing caches and the like. But as far as the Vulkan memory model is concerned, this is an implementation detail it doesn't care about. Nor should you: understand the Vulkan memory model and write code that works within it.
Your pipeline barrier creates an execution dependency, but not a memory dependency. Therefore, values written by CS processes before the barrier are available to CS processes afterwards, but not visible to them. You need to have a memory dependency to establish visibility.
However, if you want a GPU level understanding... it all depends on the GPU. Does the GPU have a cache hierarchy, an L1/L2 split? Maybe some do, maybe not.
It's kind of irrelevant anyway, because merely writing a value to an address in memory is not equivalent to a "flush" of the appropriate caches around that memory. Even using the coherent qualifier would only cause a flush for compute shader operations executing within that same dispatch call. It would not be guaranteed to affect later dispatch calls.
Implementation-dependent. For all we know, a device might have no cache at all, or in future it might be some quantum magic bs.
Shader assignment operation does not imply anything about anything. There's no "L1" or "L2" mentioned anywhere in the Vulkan specification. It is a concept that does not exist.
Completely divorce ourselves from the cache stuff, and all mental bagage that comes with it.
What is important here is that when you read something, then that thing needs to be "visible to" the reading agent (irrespective of what kind of device you use, and what obscure memory architecture it might have). If it is not "visible to", then you might be reading garbage.
When you write something, this does not happen automatically. The writes are not "visible to" anyone.
First you put your writes into src* part of a memory dependency (e.g. via a pipeline barrier). That will make your writes "available from".
Then you put your reader into dst* that will take all referenced writes that are "available from" and make them "visible to" the second synchronization scope.
If you really want to shoehorn this into a cache system concept, don't think of it as levels of cache. Think of it as separate caches. That something is already in some cache does not mean it is in the particular cache the consumer needs.

(STM32) Erasing flash and writing to flash gives HAL_FLASH_ERROR_PGP error (using HAL)

Trying to write to flash to store some configuration. I am using an STM32F446ze where I want to use the last 16kb sector as storage.
I specified VOLTAGE_RANGE_3 when I erased my sector. VOLTAGE_RANGE_3 is mapped to:
#define FLASH_VOLTAGE_RANGE_3 0x00000002U /*!< Device operating range: 2.7V to 3.6V */
I am getting an error when writing to flash when I use FLASH_TYPEPROGRAM_WORD. The error is HAL_FLASH_ERROR_PGP. Reading the reference manual I read that this has to do with using wrong parallelism/voltage levels.
From the reference manual I can read
Furthermore, in the reference manual I can read:
Programming errors
It is not allowed to program data to the Flash
memory that would cross the 128-bit row boundary. In such a case, the
write operation is not performed and a program alignment error flag
(PGAERR) is set in the FLASH_SR register. The write access type (byte,
half-word, word or double word) must correspond to the type of
parallelism chosen (x8, x16, x32 or x64). If not, the write operation
is not performed and a program parallelism error flag (PGPERR) is set
in the FLASH_SR register
So I thought:
I erased the sector in voltage range 3
That gives me 2.7 to 3.6v specification
That gives me x32 parallelism size
I should be able to write WORDs to flash.
But, this line give me an error (after unlocking the flash)
uint32_t sizeOfStorageType = ....; // Some uint I want to write to flash as test
HAL_StatusTypeDef flashStatus = HAL_FLASH_Program(TYPEPROGRAM_WORD, address++, (uint64_t) sizeOfStorageType);
auto err= HAL_FLASH_GetError(); // err == 4 == HAL_FLASH_ERROR_PGP: FLASH Programming Parallelism error flag
while (flashStatus != HAL_OK)
{
}
But when I start to write bytes instead, it goes fine.
uint8_t *arr = (uint8_t*) &sizeOfStorageType;
HAL_StatusTypeDef flashStatus;
for (uint8_t i=0; i<4; i++)
{
flashStatus = HAL_FLASH_Program(TYPEPROGRAM_BYTE, address++, (uint64_t) *(arr+i));
while (flashStatus != HAL_OK)
{
}
}
My questions:
Am I understanding it correctly that after erasing a sector, I can only write one TYPEPROGRAM? Thus, after erasing I can only write bytes, OR, half-words, OR, words, OR double words?
What am I missing / doing wrong in above context. Why can I only write bytes, while I erased with VOLTAGE_RANGE_3?
This looks like an data alignment error, but not the one related with 128-bit flash memory rows which is mentioned in the reference manual. That one is probably related with double word writes only, and is irrelevant in your case.
If you want to program 4 bytes at a time, your address needs to be word aligned, meaning that it needs to be divisible by 4. Also, address is not a uint32_t* (pointer), it's a raw uint32_t so address++ increments it by 1, not 4. As far as I know, Cortex M4 core converts unaligned accesses on the bus into multiple smaller size aligned accesses automatically, but this violates the flash parallelism rule.
BTW, it's perfectly valid to perform a mixture of byte, half-word and word writes as long as they are properly aligned. Also, unlike the flash hardware of F0, F1 and F3 series, you can try to overwrite a previously written location without causing an error. 0->1 bit changes are just ignored.