Partial update of device local buffer in Vulkan

I'm generating vertex data into memory (from voxel data), creating a host-visible staging buffer (vkCreateBuffer), copying the vertex data into the staging buffer, creating a device-local buffer (vkCreateBuffer), and copying from the host-visible buffer to the device-local buffer (vkCmdCopyBuffer).
From what I understand there is a limit to how many buffers I can have, so I probably can't create one buffer per model.
For static models this is fine: just mash them together and upload. But I want to modify a few random vertices "regularly". For this I'm thinking of doing differential updates of the device-local buffers, so that within a big buffer I only update the data that actually changed. Can this be done?
If I don't render anything from a host-visible buffer, will it take up any resources on the GPU? Could I then keep the host-visible buffers around so I don't have to recreate and refill them?

Yes, you should be able to do what you want. Essentially you are sending your updated data without a staging buffer and copy command (similar to how uniform buffers are generally populated, for example).
In pseudo-code:
1. Update the data in your application
2. Map the buffer
3. Copy the changed data
4. Unmap the buffer
5. Synchronize
The last part is the tricky aspect. You could simply allocate the buffer's memory from a memory type with VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, which means host writes become visible to the device by the next call to vkQueueSubmit with no explicit flush needed. So basically you would want to do the above before the next frame is rendered (see the spec). Note that the memory will also need VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT.
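A minimal sketch of the map/copy/unmap steps, assuming the memory was allocated HOST_VISIBLE | HOST_COHERENT and that device, hostVisibleMemory, changedVertices, offset and size are whatever your engine tracks for the changed range:
// Sketch only: placeholder names, memory assumed HOST_VISIBLE | HOST_COHERENT,
// so no vkFlushMappedMemoryRanges is required before the next vkQueueSubmit.
void* mapped = nullptr;
VkResult result = vkMapMemory(device, hostVisibleMemory, offset, size, 0, &mapped);
if (result == VK_SUCCESS) {
    memcpy(mapped, changedVertices, (size_t)size);   // copy only the changed range
    vkUnmapMemory(device, hostVisibleMemory);
}
If the memory type is not HOST_COHERENT, you would call vkFlushMappedMemoryRanges on the written range before unmapping instead.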
Whether you make this 'dynamic' data part of your uber-buffer or a completely separate buffer is up to you.
This approach is used for things like particle systems that are managed by the application. Obviously copying directly to a buffer that is visible to both the host and the GPU is (presumably) slower than the staging/copy approach.
If however you want to send some data via a staging buffer and copy command to a buffer that is only visible to the device, and then periodically modify some or all of that data (e.g. for deformable terrain), then that might be trickier; it's not something I have looked into.
EDIT: I've just seen the following post, which might be related: Best way to store animated vertex data

Related

Aerospike: Device Overload Error when size of map is too big

We got "device overload" error after the program ran successfully on production for a few months. And we find that some maps' sizes are very big, which may be bigger than 1,000.
After I inspected the source code, I found that the reason of "devcie overload" is that the write queue is beyond limitations, and the length of the write queue is related to the effiency of processing.
So I checked the "particle_map" file, and I suspect that the whole map is rewritten even if we just want to insert a single key-value pair into the map.
But I am not so sure about this. Any advice?
So I checked the "particle_map" file, and I suspect that the whole map is rewritten even if we just want to insert a single key-value pair into the map.
You are correct. When using persistence, Aerospike does not update records in place. Each update/insert is buffered into an in-memory write-block which, when full, is queued to be written to disk. This queue allows for short bursts that exceed your disk's maximum IO, but if the burst is sustained for too long the server will begin to fail the writes with the 'device overload' error you have mentioned. How far behind the disk is allowed to get is controlled by the max-write-cache namespace storage-engine parameter.
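For reference, a hedged sketch of where max-write-cache sits in a namespace's storage-engine block of aerospike.conf (the namespace name, device path and sizes below are purely illustrative, not a recommendation):
namespace myNamespace {
    replication-factor 2
    memory-size 4G
    storage-engine device {
        device /dev/sdb           # persistence device
        write-block-size 128K     # size of each in-memory write-block
        max-write-cache 64M       # how far the write queue may fall behind the disk
    }
}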
You can find more about our storage layer at https://www.aerospike.com/docs/architecture/index.html.

Chronicle Map v3.9.0 returning different sizes

I am using v3.9.0 of Chronicle Map. I have two processes: Process A writes to a ChronicleMap, and Process B just initializes it from the same persistent file that A uses. After loading, I print the map size in Process A and Process B, but I get different sizes. I expect both sizes to be the same. In what cases can I see this behaviour?
How can I troubleshoot this problem? Is there any kind of flush operation required?
One thing I tried was dumping the file using the getAll method, but it dumps everything as JSON into a single file, which pretty much kills any of the editors I have. I also tried to use MapEntryOperations in Process B to see if anything interesting is happening, but it seems to be invoked mainly when something is written into the map, not when the map is initialized directly from the persistent store.
I was using createOrRecoverPersistedTo instead of the createPersistedTo method. Because of this, my other process was not seeing all of the data.
As the Recovery section of the tutorial elaborates:
.recoverPersistedTo() needs to access the Chronicle Map exclusively. If a concurrent process is accessing the Chronicle Map while another process is attempting to perform recovery, result of operations on the accessing process side, and results of recovery are unspecified. The data could be corrupted further. You must ensure no other process is accessing the Chronicle Map store when calling for .recoverPersistedTo() on this store.
This seems pretty strange. Size in Chronicle Map is not stored in a single memory location: ChronicleMap.size() walks through and sums each segment's size. So size is "weakly consistent", and if one process constantly writes to a map, size() calls from multiple threads/processes could return slightly different values. But if nobody writes to the map (e.g. in your case just after loading, when Process A hasn't yet started writing), all callers should see the same value.
Instead of getAll() and analyzing output manually, you can try to "count" entries by something like
// count entries by walking the key set rather than relying on size()
int entries = 0;
for (K k : map.keySet()) {
    entries++;
}

Couchbase 3.1.0 - Hard out of memory error when performing full backup

We recently migrated to Couchbase 3.1.0. The odd thing is: when performing a full backup of a bucket, the web UI alerts "Hard Out Of Memory Error. Bucket X on node Y is full. All memory allocated to this bucket is used for metadata". The RAM usage numbers in the web UI contradict that: about 75% is used, not 100%. I looked into the logs but haven't found any similar errors there.
Is that even normal?
This is a known issue in the Couchbase Server 3.x releases.
To understand the problem, we must first understand the Database Change Protocol (DCP), the protocol used to transfer data throughout the system. At a high level, the flow control for DCP is as follows:
1. The Consumer creates a connection with the Producer and sends an Open Connection message. The Consumer then sends a Control message to indicate per-stream flow control. This message will contain “stream_buffer_size” in the key section and the buffer size the Consumer would like each stream to have in the value section.
2. The Consumer will then start opening streams so that it can receive data from the server.
3. The Producer will then continue to send data for the streams that have buffer space available until it reaches the maximum send size.
Steps 1-3 continue until the connection is closed, as the Consumer continues to consume items from the stream.
However, the cbbackup utility does not implement any flow control (data buffer limits), and it will try to stream all vbuckets from all nodes at once, with no cap on the buffer size.
While this does not mean that it will use the same amount of memory as your overall data size (as the streams are being drained slowly by the cbbackup process), it does mean that a large memory overhead is required to be able to store the data streams.
When you are in a heavy DGM (disk greater than memory) scenario, the amount of memory required to store the streams is likely to grow more rapidly than cbbackup can drain them, as the server is streaming large quantities of data off of disk; this leads to very large streams, which take up a lot of memory as previously mentioned.
The slightly misleading message about metadata taking up all of the memory is displayed as there is no memory left for the data, so all of the remaining memory is allocated to the metadata, which when using value eviction cannot be ejected from memory.
The reason that this only affects Couchbase Server versions prior to 4.0 is that in 4.0 a server-side improvement to DCP stream management was made that allows DCP streams to be paused to keep the memory footprint down; this is tracked as MB-12179.
As a result, you should not experience the same issue on Couchbase Server versions 4.x+, regardless of how DGM your bucket is.
Workaround
If you find yourself in a situation where this issue is occurring, then terminating the backup job should release all of the memory consumed by the streams immediately.
Unfortunately if you have already had most of your data evicted from memory as a result of the backup, then you will have to retrieve a large quantity of data off of disk instead of RAM for a small period of time, which is likely to increase your get latencies.
Over time, 'hot' data will be brought back into memory when requested, so this will only be a problem for a short period of time; however, it is still a fairly undesirable situation to be in.
The workaround to avoid this issue completely is to only stream a small number of vbuckets at once when performing the backup, as opposed to all vbuckets which cbbackup does by default.
This can be achieved using cbbackupwrapper which comes bundled with all Couchbase Server releases 3.1.0 and later, details of using cbbackupwrapper can be found in the Couchbase Server documentation.
In particular the parameter to pay attention to is the -n flag, which specifies the number of vbuckets to be backed up in a batch at once.
As the name suggests, cbbackupwrapper is simply a wrapper script on top of cbbackup which partitions the vbuckets up and automatically handles all of the directory creation and backup generation, while still using cbbackup under the hood.
As an example, with a batch size of 50, cbbackupwrapper would backup vbuckets 0-49 first, followed by 50-99, then 100-149 etc.
It is suggested that you test with cbbackupwrapper in a testing environment which mirrors your production environment to find a suitable value for -n and -P (which controls how many backup processes run at once, the combination of these two controls the amount of memory pressure caused by backup as well as the overall speed).
You should not find that lowering the value of -n from its default of 100 decreases the backup speed; in some cases you may find that the backup speed actually increases because there is far less memory pressure on the server.
You may, however, wish to sensibly adjust the -P parameter if you want to speed up the backup further.
Below is an example command:
cbbackupwrapper http://[host]:8091 [backup_dir] -u [user_name] -p [password] -n 50
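And if you also tune parallelism, the same command with the -P flag discussed above might look like this (the values are purely illustrative):
cbbackupwrapper http://[host]:8091 [backup_dir] -u [user_name] -p [password] -n 50 -P 2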
It should be noted that if you use cbbackupwrapper to perform your backup then you must also use cbrestorewrapper to restore the data, as cbrestorewrapper is automatically aware of the directory structures used by cbbackupwrapper.
When you run a full backup, by default the backup tool streams data from all nodes over the network. This is not the best way, because it causes a lot of extra load and increased memory usage, especially if you run cbbackup on one of the Couchbase nodes. I would use the data-copy mode of cbbackup, which copies data directly from the files on disk:
> sudo /opt/couchbase/bin/cbbackup couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup
Of course, change the data path to wherever your Couchbase data is actually stored. (In my example it runs as sudo because only root has read access to /opt/couchbase/blabla..) Do this on every node, then collect all the backup folders and put them somewhere. Note that the backups are very compressible, so you might want to zip them before copying over the network.

Does writeToFile:atomically: block asynchronous reading?

At several points while my application is in use, I process some large data in the background (so it is ready when the user needs it; it's a kind of indexing). When this background process finishes, it needs to save the data to a cache file, but since the data is really large this takes some seconds.
But at the same time the user may open a dialog which displays images and text loaded from disk. If this happens while the background data is being saved, the user interface has to wait until the saving process is completed. (This is not wanted, since the user then has to wait 3-4 seconds until the images and text are loaded from disk!)
So I am looking for a way to throttle the writing to disk. I thought of splitting the data into chunks and inserting a short delay between saving the different chunks. During this delay, the user interface would be able to load the needed text and images, so the user would not notice a delay.
At the moment I am using [[array componentsJoinedByString:@"\n"] writeToFile:@"some name.dic" atomically:YES]. This is a very high-level solution which doesn't allow any customization. How can I write this large data into one file without saving it all in one shot?
Does writeToFile:atomically: block asynchronous reading?
No. It works like writing to a temporary file and, once that completes successfully, renaming the temporary file to the destination (replacing the pre-existing file at the destination, if it exists).
You should consider how you can break your data up so it is not so slow. If it's all divided into strings/lines and it takes seconds, an easy approach would be to divide the database by first character. Of course, a better solution could likely be imagined, based on how you access, search, and update the index/database.
…inserting a short delay between saving the different chunks. During this delay, the user interface would be able to load the needed text and images, so the user would not notice a delay.
Don't. Just implement the move/replace of the atomic write yourself (write to a temporary file while indexing, then move it into place when done). Then your app can serialize read and write commands explicitly for fast, consistent and correct access to these shared resources.
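As a rough sketch of that write-to-a-temporary-file-then-move idea (shown here in plain C++ rather than Cocoa; the function name and paths are made up for illustration):
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Illustrative only: write the index line by line to "<destination>.tmp",
// then move it over the real cache file in a single step.
void saveIndexAtomically(const std::vector<std::string>& lines,
                         const std::filesystem::path& destination)
{
    std::filesystem::path tmp = destination;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        for (const std::string& line : lines) {
            out << line << '\n';   // written in small pieces, not in one shot
        }
    } // stream flushed and closed here
    std::filesystem::rename(tmp, destination); // replace the old cache file
}
On the same filesystem the final rename replaces the old file in one step, so readers see either the old cache file or the new one, never a half-written file.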
Have a look at the NSFileHandle class. Using a combination of seekToEndOfFile and writeData:(NSData *)data, you can do the work you wish.

Fault tolerant system design

There is a DB as the data store and y (>5) other machines. A machine A has data that is updated every x minutes. The y machines get the data from machine A every x minutes and update the data in the database. Having every machine do the same work is for some fault tolerance. Is there a clean way to model this setup with fault tolerance?
Any pointers are appreciated.
This is a problem with very large scope. How is the data structured? How do the "db loaders" get the data from the "data producing" machine? What happens if an update fails: is the data lost, or must it be persisted at any cost?
I will make some assumptions and suggest a solution:
1. The data can be partitioned.
2. You have access to a central persistent buffer, e.g. MSMQ or WebSphere MQ.
The machine generating the data puts chunks into a central queue. Each chunk is composed of a set of record IDs and the new values for the relevant properties; you decide the granularity.
The "db loaders" listen to the queue, and each dequeues a chunk (the contention is only at the dequeue stage and is very optimized) and updates its own set of IDs.
This way the insert work is distributed among the machines: each handles its own portion, and if one crashes, well, the others will simply work a bit harder.
In case of a failure to update, you can return the chunk to the queue and retry later (transactional read).
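A minimal, in-process sketch of that dequeue / apply / requeue-on-failure loop (in C++; a real deployment would use the durable broker and a real database write, and every name below is made up for illustration):
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One unit of work: a batch of (record ID, new value) updates.
struct Chunk {
    std::vector<std::pair<long, std::string>> updates;
};

// Toy stand-in for the central persistent queue (MSMQ / WebSphere MQ above).
class ChunkQueue {
public:
    void push(Chunk c) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(c));
        }
        cv_.notify_one();
    }
    // Blocks until a chunk is available, or returns nothing once closed and drained.
    std::optional<Chunk> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        Chunk c = std::move(q_.front());
        q_.pop();
        return c;
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Chunk> q_;
    bool closed_ = false;
};

// Stand-in for "update the database"; return false to simulate a failed update.
bool applyChunk(const Chunk& chunk) {
    for (const auto& update : chunk.updates) {
        std::cout << "UPDATE record " << update.first << " -> " << update.second << '\n';
    }
    return true;
}

// Each "db loader" machine runs this loop.
void loaderLoop(ChunkQueue& queue) {
    while (auto chunk = queue.pop()) {
        if (!applyChunk(*chunk)) {
            queue.push(*chunk);   // return the chunk so another loader can retry it
        }
    }
}

int main() {
    ChunkQueue queue;
    std::thread loaderA(loaderLoop, std::ref(queue));
    std::thread loaderB(loaderLoop, std::ref(queue));

    // The data-producing machine enqueues chunks every x minutes.
    queue.push(Chunk{{{1, "alpha"}, {2, "beta"}}});
    queue.push(Chunk{{{3, "gamma"}}});

    queue.close();   // no more work; loaders drain the queue and exit
    loaderA.join();
    loaderB.join();
}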