How to correctly use descriptor sets for multiple interleaving buffers? - vulkan

I have a uniform buffer which should be updated every frame. In order to avoid big stalls I want to create 3 buffers (one per framebuffer) which should be interleaved every frame (0-1-2-0-1-2-...). But I can't understand how to create the descriptors and bind them. This is how I'm doing it now:
I created a VkDescriptorSetLayout where I specified that I want to use a uniform buffer at binding position 0 in some shader stage.
I created a VkDescriptorPool with a size for 3 descriptors for uniform buffers.
Next I need to allocate descriptor sets, but how many descriptor sets do I need here? I have only one VkDescriptorSetLayout and I expect to get one VkDescriptorSet.
Next I need to update the created descriptor sets. Since I have only one binding (0) in my shader, I can specify only one buffer in the VkDescriptorBufferInfo that is passed to VkWriteDescriptorSet, which in turn is passed to vkUpdateDescriptorSets. But what about the other two buffers? Where do I specify them?
When I need to record a command buffer I need to bind my descriptor set. But so far I have a descriptor set which is updated for only one buffer. What about the others?
Do I need to create 3 VkDescriptorSetLayouts - one for every frame? Do I then need to allocate and update each corresponding descriptor set with a corresponding buffer? And after this, do I need to create 3 different command buffers, each specifying the corresponding descriptor set?
It seems like a lot of work - the data is almost the same: all bindings and states stay the same, only the buffer changes.
It all sounds very confusing, so please don't hesitate to clarify.

A descriptor set layout defines the contents of a descriptor set - what types of resources (descriptors) a given set contains. When You need several descriptor sets, each with a single uniform buffer, You can create all of them using the same layout (a layout is only a description, a specification). This way You just tell the driver: "Hey, driver! Give me 3 descriptor sets. But all of them should look exactly the same."
But just because they are created from the same layout doesn't mean they must contain the same resource handles. All of them (in Your case) must contain a uniform buffer, but which resource is used for that uniform buffer is up to You. So each descriptor set can be updated with a separate buffer.
Now when You want to use 3 buffers one after another in three consecutive frames, You can do it in several different ways:
You can have a single descriptor set. Then in every frame, before You start preparing command buffers, You update the descriptor set with the next buffer. But when You update a descriptor set, it cannot be in use by any submitted (and not yet finished) command buffers. So this would require additional synchronization and wouldn't be much different from using a single buffer. This way You also cannot "pre-record" command buffers.
You can have a single descriptor set. To change its contents (use a different buffer in it) You can update it through functions added in the VK_KHR_push_descriptor extension. It allows descriptor updates to be recorded in command buffers, so synchronization should be a bit easier, and it should allow You to pre-record command buffers. But it needs an extension to be supported, so it's not an option on platforms that don't expose it.
The method You probably thought of - You can have 3 separate descriptor sets, all of them allocated using the same layout with a uniform buffer. Then You update each descriptor set with a different buffer (the 1st buffer with the 1st descriptor set, the 2nd buffer with the 2nd, and the 3rd buffer with the 3rd). Now, when recording a command buffer that should use the first buffer, You just bind the first descriptor set. In the next frame You bind the second descriptor set, and so on.
Method 3 is probably the easiest to implement, as it requires no synchronization for descriptors (only per-frame synchronization, if You have such). It also allows You to pre-record command buffers and doesn't require any additional extensions to be enabled. But as You noted, it requires more resources to be created and managed.
Don't forget that You need to create a descriptor pool that is big enough to contain 3 uniform buffer descriptors, but at the same time You must also specify that You want to allocate 3 descriptor sets from it (one uniform buffer per descriptor set).
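The per-frame bookkeeping for method 3 boils down to one (buffer, descriptor set) pair per frame, selected by frame index. A minimal C sketch of just that selection logic - plain ints stand in for real Vulkan handles, and FRAME_COUNT, FrameResources and resources_for_frame are illustrative names, not Vulkan API:

```c
#include <assert.h>

#define FRAME_COUNT 3  /* one uniform buffer + descriptor set per frame */

/* Stand-ins for VkBuffer / VkDescriptorSet handles. In real code each
 * descriptor set would be allocated from the same VkDescriptorSetLayout
 * and updated once, at startup, with its own buffer via
 * vkUpdateDescriptorSets. */
typedef struct {
    int buffer;
    int descriptor_set;
} FrameResources;

static FrameResources frame_resources[FRAME_COUNT] = {
    {0, 0}, {1, 1}, {2, 2},
};

/* Frame N uses resource slot N % FRAME_COUNT: 0-1-2-0-1-2-... */
static const FrameResources *resources_for_frame(unsigned frame_number) {
    return &frame_resources[frame_number % FRAME_COUNT];
}
```

Each frame You would write new uniform data into the selected slot's buffer memory and bind that slot's descriptor set with vkCmdBindDescriptorSets; after startup no descriptor update is needed at all.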
You can read more about descriptor sets in Intel's API without Secrets: Introduction to Vulkan - Part 6 tutorial.
As for Your questions:
Do I need to create 3 VkDescriptorSetLayout - one for every frame?
No - a single layout is enough (as long as all descriptor sets contain the same types of resources at the same bindings).
Next do I need to allocate and update corresponding descriptor set with a corresponding buffer?
As per option 3 - yes.
And after this do I need to create 3 different command buffers where I should specify the corresponding descriptor set?
It depends on whether You re-record command buffers every frame or pre-record them up front. Usually command buffers are re-recorded each frame. But since reusing a single command buffer requires waiting until its previous submission has finished, You may need a set of command buffers for each frame, corresponding to Your framebuffer images (and descriptor sets). So in frame 0 You use command buffer #0 (or multiple command buffers reserved for frame 0), in frame 1 You use command buffer #1, etc.
Now You can record a command buffer for a given frame and during recording You provide a descriptor set that is associated with a given frame.

Related

Do vertex buffers or index buffers need to have a minimum requirement in Vulkan?

I know that there are minimum uniform buffer and shader storage buffer alignments, but I was wondering if there is a minimum alignment for vertices (anything that is read by the input assembler) and indices. Also, do staging buffers need to have an alignment for indices and vertices? How about the copy operations from a staging buffer to a device-local buffer and vice versa?
vkCmdBindIndexBuffer's documentation states that offset "must be a multiple of the type indicated by indexType".
The vertex buffer binding functions have similar alignment requirements based on the formats used for them, but they are specified in a more unusual way (and not in the documentation for the function).
There is a section in the specification on how the address for a specific attribute is computed. The wording there puts forth a set of de-facto requirements on the pOffsets parameter to vkCmdBindVertexBuffers and similar functions.
The rules boil down to this: you have to specify offsets (and other fields) such that the eventual address computed for each attribute is not misaligned, relative to the format for that attribute. Packed formats have to be multiples of their pack size, while non-packed formats have to be multiples of their component sizes. So while VK_FORMAT_A8B8G8R8_UNORM_PACK32 must be aligned to 4 bytes, VK_FORMAT_R8G8B8A8_UNORM can handle byte-alignment.
Though personally, I wouldn't test the latter.

How to know the configured size of the Chronicle Map?

We use Chronicle Map as persisted storage. As new data arrives all the time, we keep putting new data into the map. Thus we cannot predict the correct value for net.openhft.chronicle.map.ChronicleMapBuilder#entries(long). Chronicle Map 3 will not break when we put in more data than expected, but performance will degrade. So we would like to recreate this map with a new configuration from time to time.
Now to the real question: given a Chronicle Map file, how can we know which configuration was used for that file? Then we can compare it with the actual amount of data (the source of this knowledge is irrelevant here) and recreate the map if needed.
entries() is a high-level config, that is not stored internally. What is stored internally is the number of segments, expected number of entries per segment, and the number of "chunks" allocated in the segment's entry space. They are configured via ChronicleMapBuilder.actualSegments(), entriesPerSegment() and actualChunksPerSegmentTier() respectively. However, there is no way at the moment to query the last two numbers from the created ChronicleMap, so it doesn't help much. (You can query the number of segments via ChronicleMap.segments().)
You could contribute to Chronicle-Map by adding getters to ChronicleMap to expose those configurations. Or you need to store the number of entries separately, e.g. in a file alongside the persisted ChronicleMap file.

Best practice to store List<T> in StackExchange.Redis

I am trying to find a best-practice (efficient) way of storing a set of List objects against a ReportingDate key.
The List could be serialised as Xml/DataContract or ProtoBuf....
And given that some of the data could be big (for a given slice of key):
I was wondering if there is any way of getting data from the redis cache in an IEnumerable/streamed fashion? Atm we are using ProtoBuf.NET to have a file-based cache, and we retrieve data into memory in a streamed fashion (we also have the option of selecting which props/fields we want in that T object, as ProtoBuf allows us to do this).
Is there any way to force (after some inactivity) a certain part of the data to be offloaded from memory back into the file if it is not being used, but load it up again when it is needed?
Tnx
It sounds like you want a sorted set - see https://redis.io/topics/data-types#sorted-sets. You would use the date as the score, perhaps in epoch time (since it needs to be a number). SE.Redis supports all the operations you would expect to get ranges of values (either positional ranges - the first 20 records, etc. - or absolute ranges based on the score - all items between two dates expressed in the same unit). Look at the methods starting with "SortedSet...".
The value can be binary, so protobuf-net is fine (you would serialize the value for each date separately). Just pass a byte[] as the value. You need to handle serialization separately to the redis library.
As for swapping data out: no. Redis has date-based expiration, but doesn't have hot and cold storage. It is either there, or it isn't. You could use scheduled tasks to purge or move data based on date ranges, again using any of the Z* (redis) or SortedSet* (SE.Redis) methods.
For the complete list of Z* operations, see: https://redis.io/commands#sorted_set. They should all be available in SE.Redis.

variable size buffer simulink

I'm receiving variable-sized data in each simulation step in Simulink. However, I need to wait a certain number of simulation steps before I have received the whole data package, and therefore I need some kind of variable-sized buffer. I have no information about the total amount of data that I'm going to receive. The only information I have is the number of simulation steps that I have to wait until I have received all the data.
I've tried to implement it via a MATLAB Function block and several Delay blocks that delay the output data of the MATLAB Function block by one simulation step, but I keep failing at the variable-size constraints (because the Delay blocks don't support them), and I haven't found any buffer block that supports the functionality that I need here.
Hope, you can help me out!
Given that you know your input and output sample rates, I'd suggest writing a c-mex S-function.
It wouldn't be trivial, but you can
set the input and output ports to have different sample rates
set the input and output ports to have variable signal length
store a pointer to a std::vector<...> in the PWork vector
the std::vector<...> gives you the ability to grow as new input data arrives, and to be emptied when the data is posted to the output.
Update based on comments:
For code generation you need to specify an upper bound for the size of the buffer, which makes a MATLAB Function block suitable.
Specify the maximum size of the buffer, and keep track of how much of it has been filled using an internal persistent variable.
But the only way to have a block with different sample rates at its input and output is to write an S-function. For the MATLAB Function approach I can think of two options:
a) write the code so that it has an internal buffer that fills and only updates the output when the buffer becomes full.
Of course the output sample rate will be the same as the input sample rate, but the data will only change when you specify that it should.
b) have two outputs, one being the buffer, and one being an "I've just become full" logical signal. Then follow the block by a Triggered Subsystem that feeds the buffer straight through it, and is rising edge triggered by the logical signal. The output of the Triggered Subsystem will then only update at the steps when the buffer becomes full.
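The fill-and-flag logic of approach (b), stripped of the Simulink specifics, is just an internal buffer plus a "just became full" output that would drive the Triggered Subsystem. A C sketch under those assumptions (MAX_BUF, StepBuffer and step are illustrative names; the persistent state corresponds to the MATLAB persistent variable):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUF 8  /* code generation needs a fixed upper bound */

typedef struct {
    double data[MAX_BUF];
    size_t fill;  /* persistent state: how much of the buffer is used */
} StepBuffer;

/* One simulation step: append a sample, report 1 when the buffer has
 * just become full (the rising edge for the Triggered Subsystem),
 * copy the buffer to the output, and reset for the next fill cycle. */
static int step(StepBuffer *b, double sample, double out[MAX_BUF]) {
    b->data[b->fill++] = sample;
    if (b->fill == MAX_BUF) {
        for (size_t i = 0; i < MAX_BUF; ++i)
            out[i] = b->data[i];
        b->fill = 0;
        return 1;  /* buffer just became full */
    }
    return 0;
}
```

The output sample rate is still the input rate, as noted above; the downstream subsystem only reacts on the steps where the flag rises.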

Circular Buffer in Flash

I need to store items of varying length in a circular queue in a flash chip. Each item will have its encapsulation so I can figure out how big it is and where the next item begins. When there are enough items in the buffer, it will wrap to the beginning.
What is a good way to store a circular queue in a flash chip?
There is a possibility of tens of thousands of items I would like to store. So starting at the beginning and reading to the end of the buffer is not ideal because it will take time to search to the end.
Also, because it is circular, I need to be able to distinguish the first item from the last.
The last problem is that this is stored in flash, so erasing each block is both time consuming and can only be done a set number of times for each block.
First, block management:
Put a small header at the start of each block. The main thing you need in order to keep track of the "oldest" and "newest" blocks is a block sequence number, which simply increments modulo k. k must be greater than your total number of blocks. Ideally, make k less than the erased value (e.g. 0xFFFF) so you can easily tell which blocks are erased.
At start-up, your code reads the headers of each block in turn, and locates the first and last blocks in the sequence n[i+1] = (n[i] + 1) modulo k. Take care not to get confused by erased blocks (block number e.g. 0xFFFF) or data that is somehow corrupted (e.g. an incomplete erase).
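The start-up scan can be sketched as follows; the modulus K, the block count, and the 0xFFFF erased marker are illustrative values (corruption handling is omitted for brevity):

```c
#include <assert.h>
#include <stdint.h>

#define K      0xFFF0u  /* sequence modulus; > block count, < erased value */
#define ERASED 0xFFFFu  /* an erased block's header reads back all-ones */

static uint16_t next_seq(uint16_t seq) {
    return (uint16_t)((seq + 1) % K);
}

/* Given each block header's sequence number (in physical block order),
 * return the index of the newest block: the one whose physical successor
 * is erased or does not carry next_seq(seq). Because K exceeds the block
 * count, exactly one such break exists when anything is written.
 * Returns -1 if every block is erased. The oldest block is then the
 * newest block's successor. */
static int find_newest(const uint16_t *seq, int nblocks) {
    for (int i = 0; i < nblocks; ++i) {
        int succ = (i + 1) % nblocks;
        if (seq[i] == ERASED)
            continue;
        if (seq[succ] == ERASED || seq[succ] != next_seq(seq[i]))
            return i;
    }
    return -1;
}
```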
Within each block
Each block initially starts empty (each byte is 0xFF). Each record is simply written one after the other. If you have fixed-size records, then you can access it with a simple index. If you have variable-size records, then to read it you have to scan from the start of the block, linked-list style.
If you want to have variable-size records, but avoid linear scan, then you could have a well defined header on each record. E.g. use 0 as a record delimiter, and COBS-encode (or COBS/R-encode) each record. Or use a byte of your choice as a delimiter, and 'escape' that byte if it occurs in each record (similar to the PPP protocol).
At start-up, once you know your latest block, you can do a linear scan for the latest record. Or if you have fixed-size records or record delimiters, you could do a binary search.
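With fixed-size records, that start-up search can be a binary search for the boundary between written records and still-erased bytes. A sketch assuming a record's first byte is never 0xFF (erased flash reads 0xFF; the function name is illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the records already written to a block of fixed-size records,
 * assuming no record starts with 0xFF. Written slots form a prefix of
 * the block, so we can binary-search for the first erased slot instead
 * of scanning linearly. */
static size_t count_records(const uint8_t *block, size_t block_size,
                            size_t record_size) {
    size_t lo = 0, hi = block_size / record_size;  /* slot range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (block[mid * record_size] != 0xFF)
            lo = mid + 1;  /* slot mid is written */
        else
            hi = mid;      /* slot mid is still erased */
    }
    return lo;  /* index of the first erased slot */
}
```

The latest record is then the one at slot count_records(...) - 1; with delimited variable-size records the same idea works on delimiter positions instead of fixed slots.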
Erase scheduling
For some Flash memory chips, erasing a block can take significant time--e.g. 5 seconds. Consider scheduling an erase as a background task a bit "ahead of time". E.g. when the current block is x% full, then start erasing the next block.
Record numbering
You may want to number records. The way I've done it in the past is to put, in the header of each block, the record number of the first record. Then the software has to keep count of the number of each record within the block.
Checksum or CRC
If you want to detect corrupted data (e.g. incomplete writes or erases due to unexpected power failure), then you can add a checksum or CRC to each record, and perhaps to the block header. Note the block header CRC would only cover the header itself, not the records, since it could not be re-written when each new record is written.
Keep a separate block that contains a pointer to the start of the first record and the end of the last record. You can also keep more information like the total number of records, etc.
Until you initially run out of space, adding records is as simple as writing them to the end of the buffer and updating the tail pointer.
As you need to reclaim space, delete enough records so that you can fit your current record. Update the head pointer as you delete records.
You'll need to keep track of how much extra space has been freed. If you keep a pointer to end of the last record, the next time you need to add a record, you can compare that with the pointer to the first record to determine if you need to delete any more records.
Also, if this is NAND, you or the flash controller will need to do deblocking and wear-leveling, but that should all be at a lower layer than allocating space for the circular buffer.
I think I get it now. It seems like your largest issue will be: having filled the available space for recording, what happens next? The new data should overwrite the oldest data, which is, I believe, what you mean by a circular buffer. But since the data is not fixed-length, you may overwrite more than one record.
I'm assuming that the amount of variability in length is high enough that padding everything out to a fixed length isn't an option.
Your write segment needs to keep track of the address that represents the start of the next record to write. If you know the size of a record ahead of time, you can tell if you are going to run past the end of the logical buffer and need to start over at '0'. I wouldn't split a record, with some at the end and some at the beginning.
A separate register can track the beginning; this is the oldest data that hasn't been overwritten yet. If you went to read out the data this is where you would start.
The data writer would then check, given the write-start address and the length of the data it's about to commit, whether it should bump the read register: examine the first record to see its length and advance to the next record, until there is enough room to write the data. There will probably be a gap of junk data between the end of the written data and the start of the oldest data. But this way you are just writing an address or two as overhead, and not rearranging blocks.
At least, that's probably what I would do. HTH
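The scheme above (a write pointer, a read register for the oldest data, no record splitting, and a junk gap before each wrap) can be sketched as an in-RAM simulation; CAP, the 1-byte length header, the 0-byte gap marker, and all names are illustrative choices, not from the answer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP 16  /* illustrative capacity; real flash would be far larger */

/* Variable-length records in a byte ring: 1 length byte + payload
 * (len must be 1..CAP-1). Records are never split across the end;
 * a 0 length byte marks the junk gap left before a wrap. */
typedef struct {
    uint8_t mem[CAP];
    size_t head;  /* start of the oldest record (the "read register") */
    size_t tail;  /* where the next record will be written */
    size_t used;  /* occupied bytes, including any wrap gap */
} Ring;

/* Bump the read register past one record (or past the wrap gap). */
static void advance_head(Ring *r) {
    if (r->head == CAP || r->mem[r->head] == 0) {
        r->used -= CAP - r->head;  /* junk gap runs to the end */
        r->head = 0;
    } else {
        size_t n = 1 + r->mem[r->head];
        r->used -= n;
        r->head += n;
    }
}

/* Append a record, evicting the oldest records until it fits. */
static void ring_write(Ring *r, const uint8_t *payload, uint8_t len) {
    size_t n = 1u + len;
    if (r->tail + n > CAP) {        /* won't fit before the end: wrap */
        if (r->tail < CAP)
            r->mem[r->tail] = 0;    /* mark the leftover bytes as junk */
        r->used += CAP - r->tail;
        r->tail = 0;
    }
    while (CAP - r->used < n)       /* overwrite the oldest data */
        advance_head(r);
    r->mem[r->tail] = len;
    for (size_t i = 0; i < len; ++i)
        r->mem[r->tail + 1 + i] = payload[i];
    r->used += n;
    r->tail += n;
}

/* Length of the oldest record, with *payload pointing at its bytes;
 * 0 if the ring is empty. */
static size_t peek_oldest(const Ring *r, const uint8_t **payload) {
    if (r->used == 0)
        return 0;
    size_t h = r->head;
    if (h == CAP || r->mem[h] == 0)
        h = 0;                      /* skip the wrap gap */
    *payload = &r->mem[h + 1];
    return r->mem[h];
}
```

On real flash the length headers and payloads would be written in place and the head/tail addresses kept in RAM, exactly as described above; only the eviction would additionally trigger block erases.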
The "circularity" in flash can be implemented on the basis of block size, which means that you must declare how many blocks of the flash you allocate for this buffer.
The actual size of the buffer will at any particular time be between n-1 and n blocks (n is the number of blocks).
Each block should start with a header that contains a sequence number or timestamp that can be used to determine which block is older than another.
Each item is encapsulated with a header and a footer. The header contains whatever you want, but from this header you must be able to determine the size of the item. The default footer is 0xFFFFFFFF; this value indicates a null termination.
In your RAM you must keep pointers to the oldest and latest blocks and to the oldest and latest items. On power-up you go over all blocks, find the relevant ones and load these members.
When you want to store a new item, you check whether the latest block contains enough space for it. If it does, you save the item after the end of the previous item and change the previous footer to point to this item. If it does not contain enough space, you need to erase the oldest block. Before you erase it, change the oldest-block members (in RAM) to point to the next block and the oldest item to point to the first item in that block.
Then you can save the new item in this block and change the footer of the latest item to point to it.
I know the explanation may sound complicated, but the process is very simple, and if you write it correctly you can even make it power-fail safe (always keep in mind the order of the writes).
Note that the circularity of the buffer is not itself stored in the flash; the flash only contains blocks with items, and from the block headers and item headers you can determine the order of those items.
I see three options:
Option 1: pad everything out to the same size. This is simple: store a pointer to the head and tail of the buffer so you know where to write and where to start reading from, and use the size of each object to get the offset to the next. This means you need to traverse the buffer as you would a linked list, i.e. it's slow if you need item 5000.
Option 2: store only pointers to the real data in the circular buffer; that way, when you loop around, you don't have to deal with size mismatches. If you store the real data in a circular buffer without padding, you could run into situations where you are overwriting multiple items with 1 new data object; I assume this is not OK.
Store the actual data elsewhere in flash. Most flash will have some sort of wear levelling built in; if so, you don't need to worry about overwriting the same location multiple times - the IC will figure out where to actually store it on chip. Just write to the next available free space.
This means you need to pick a maximum size for the circular buffer; how you do this depends on the data variability. If the size of the data doesn't change much, say by only a few bytes, then you should just pad it out and use option 1. If the size changes wildly and unpredictably, choose the largest size it could be and figure out how many objects of that size would fit in your flash; use that as the max number of entries in the buffer. This means you waste a bunch of space.
Option 3: if the object can really be any size, you're at the point where you should just use a file system. Name the files in order and loop back when you're full, keeping in mind that if your new entry is large you may have to delete multiple old entries to fit it in. This is really just an extension of option 2, as option 2 is in many ways a simple file system.