Given a use case where you append objects to the object store and then update a ref to point to a new commit, is libgit2 safe, where "safe" is defined as one of the following outcomes:
Power is lost before the ref is updated: no "visible" changes to the head of the repository.
Power is lost after the ref is updated: the head points to the new commit and all data is available.
The key point is at what moment libgit2 guarantees the data is flushed to disk. Before the ref is updated, do we guarantee that the data in the object database has been flushed to disk?
In other words: in what cases could the ref be updated while the object store has not yet persisted the data?
UPDATE: I found that libgit2 now implements optional fsync support (https://github.com/libgit2/libgit2/pull/4030), which means that all loose object writes (appends) should be on disk before the ref is updated.
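For illustration, here is a minimal C sketch of turning that behavior on, assuming the GIT_OPT_ENABLE_FSYNC_GITDIR option added in that PR (error handling omitted; this is not a definitive recipe, just what the option looks like in use):

```c
/* Sketch: enabling fsync for gitdir writes in libgit2, assuming the
 * GIT_OPT_ENABLE_FSYNC_GITDIR option from PR #4030 is available. */
#include <git2.h>

int main(void)
{
    git_libgit2_init();

    /* Ask libgit2 to fsync loose objects, packfiles, and refs as they
     * are written, so object data reaches disk before the ref update. */
    git_libgit2_opts(GIT_OPT_ENABLE_FSYNC_GITDIR, 1);

    /* ... open a repository, write objects, update refs ... */

    git_libgit2_shutdown();
    return 0;
}
```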
Nobody guarantees that anything gets written to disk when you lose power. There are only two solutions to this problem: disable the write cache, or use a UPS (at the very least, a battery to flush the RAID controller cache).
Firstly, I am assuming that data structures, like a hash map for example, can only be stored in memory and not on disk unless they are serialized. I want to understand why that is.
What is holding us back from dumping the block of memory that stores the data structure directly to disk, without any modifications?
Something like JSON could be thought of as a "serialized" Python dictionary, and we can very well store JSON in files, so why not a dict?
You may ask how we would represent non-string values like booleans or objects on disk. I could argue: "the same way you store them in memory." Am I missing something here?
Naming a few problems:
Big-endian vs. little-endian CPUs make reading data from disk depend on the architecture, so if you just dumped the bytes you wouldn't be able to read them back on a different device.
Items are not contiguous in memory: a list (or dictionary), for example, only contains pointers to things that exist "somewhere" in memory. You can only dump contiguous memory; otherwise you are only storing the locations the data happened to occupy, which won't be the same when you load the program again (see the sketch after this list).
The way structures are laid out in memory can change between two compiled versions of the same program, so if you just recompile your application you may get a different layout for your structures, and your data is lost.
Different versions of the same application may wish to change the shape of the structures to allow extra functionality; this won't be possible if the data shape on disk must match the one in memory. (Which is one of the reasons you shouldn't be using pickle for portable data storage, despite it using a memory serializer.)
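To make the pointer problem concrete, here is a small C sketch (not from the question, just an illustration): it dumps a struct containing a pointer and reads it back. The pointer field survives as bytes, but those bytes are a stale address from the previous process:

```c
/* Sketch: why dumping a structure's raw bytes is not a usable storage
 * format. The struct holds a pointer; the value written to disk is an
 * address in *this* process's memory, meaningless on reload (and the
 * struct's padding/layout can change across compilers and versions). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
    int   value;
    char *name;   /* points "somewhere" in memory, not stored contiguously */
};

int main(void)
{
    struct node n = { 42, strdup("hello") };

    FILE *f = fopen("dump.bin", "wb");
    fwrite(&n, sizeof n, 1, f);   /* dumps the *address* in n.name, not "hello" */
    fclose(f);

    struct node m;
    f = fopen("dump.bin", "rb");
    fread(&m, sizeof m, 1, f);
    fclose(f);

    /* m.value is fine, but m.name is a dangling address from the earlier
     * write: dereferencing it in a fresh process is undefined behavior. */
    printf("value = %d, name pointer = %p\n", m.value, (void *)m.name);

    free(n.name);
    return 0;
}
```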
We are planning to implement a distributed cache (Redis) for our application. We have data stored in a map that is around 2 GB in size, and it is a single object. Currently it is stored in context scope, and similarly we have plenty of other objects stored in context scope.
Now we are planning to move all of this context data into the Redis cache. The map data takes a large amount of memory, and we would have to store it as a single key-value object.
Is Redis suitable for this requirement, and which data type is suitable for storing this data in Redis?
Please suggest a way to implement this.
So, you didn't finish the discussion in the other question and started a new one? 2 GB is A LOT. Suppose you have a 1 Gb/s link between your servers: you need 16 seconds just to transfer the raw data. Add protocol costs, add deserialization costs, and you're at 20 seconds. These are hardware limitations. Of course you may get a 10 Gb/s link, or even multiplex it for 20 Gb/s, but is that the way? The real solution is to break this data into parts and perform only partial updates.
To the topic: use the String (basic) type; there are no other options. The other types are complex structures, and you need just one value.
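If you instead go the partitioning route suggested above, a Redis hash with one field per map entry lets you read and update entries individually. A minimal sketch using the hiredis C client; the key name big:map and the sample entry are made-up for illustration:

```c
/* Sketch: storing the big map as a Redis hash (one field per entry)
 * instead of one 2 GB string, so reads and updates can be partial. */
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void)
{
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) {
        fprintf(stderr, "connection error\n");
        return 1;
    }

    /* Write one entry at a time instead of the whole object. */
    redisReply *r = redisCommand(c, "HSET big:map %s %s",
                                 "user:1001", "{\"name\":\"alice\"}");
    freeReplyObject(r);

    /* Read back just the entry you need, not the full 2 GB value. */
    r = redisCommand(c, "HGET big:map %s", "user:1001");
    if (r != NULL && r->type == REDIS_REPLY_STRING)
        printf("user:1001 -> %s\n", r->str);
    freeReplyObject(r);

    redisFree(c);
    return 0;
}
```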
So I create a bunch of buffers and images, and I need to set up a memory barrier for some reason.
How do I know what to specify in the srcAccessMask field for the barrier struct of a newly created buffer or image, seeing as at that point I wouldn't have specified the access flags for it? How do I decide what initial access flags to specify for the first memory barrier applied to a buffer or image?
Specifying initial values for the other parameters in Vk*MemoryBarrier is easy, since I can clearly know, say, the original layout of an image, but it isn't apparent to me what the value of srcAccessMask should be the first time I set up a barrier.
Is it based on the usage flags specified during creation of the object concerned? Or is there some other way that can be used to find out?
So, let's assume vkCreateImage and VK_IMAGE_LAYOUT_UNDEFINED.
Nowhere does the specification say that vkCreateImage defines some scheduled operation, so it is healthy to assume all its work is done as soon as it returns. Besides, the image does not even have memory yet.
So any synchronization needs would concern the memory you bind to it. Let's assume it is just fresh memory from vkAllocateMemory. Similarly, nowhere does the specification say that it defines some scheduled operation.
Even so, there are really only two options: either the implementation does nothing with the memory, or it zero-fills it (for security reasons). If it zero-fills it, that must be done in a way that you cannot access the original data (even by exploiting synchronization errors). So it is healthy to assume the memory carries no "synchronization baggage".
So simply srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT (no previous outstanding scheduled operation) and srcAccessMask = 0 (no previous writes) should be correct.
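Put together, a first barrier on a fresh image could look like the C sketch below. The destination stage, access mask, and new layout are example choices for an upcoming copy into the image, and cmd/image are assumed to have been created earlier:

```c
#include <vulkan/vulkan.h>

/* First barrier on a freshly created image: nothing to wait on, no
 * prior writes. Destination stage/access and the new layout are example
 * choices for an upcoming transfer into the image. */
void first_barrier(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask       = 0,                          /* no previous writes */
        .dstAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT,
        .oldLayout           = VK_IMAGE_LAYOUT_UNDEFINED,  /* fresh image */
        .newLayout           = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image               = image,
        .subresourceRange    = {
            .aspectMask     = VK_IMAGE_ASPECT_COLOR_BIT,
            .baseMipLevel   = 0,
            .levelCount     = 1,
            .baseArrayLayer = 0,
            .layerCount     = 1,
        },
    };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, /* nothing to wait on */
                         VK_PIPELINE_STAGE_TRANSFER_BIT,
                         0,                /* dependencyFlags */
                         0, NULL, 0, NULL, /* no global/buffer barriers */
                         1, &barrier);
}
```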
I've read some articles with suggestions on using NSCache; many of them mention a recommendation to use NSPurgeableData inside an NSCache.
However, I just can't catch the point: since NSCache is already able to evict its contents when memory is tight or when it reaches its count/cost limit, why do we still need NSPurgeableData here? Isn't that just potentially slower than using the data object we already have? What kind of advantage can we gain here?
The count limit and the total-cost limit are not strictly enforced. That is, when the cache goes over one of its limits, some of its objects might get evicted immediately, later, or never, all depending on the implementation details of the cache.
So the advantages of using NSPurgeableData here are:
By using purgeable memory, you allow the system to quickly recover memory if it needs to, thereby increasing performance. Memory that is marked as purgeable is not paged to disk when it is reclaimed by the virtual memory system because paging is a time-consuming process. Instead, the data is discarded, and if needed later, it will have to be recomputed.
It works like a locking mechanism, or we could say it works like synchronization: if the data is being accessed by one thread, then no other thread can access it until the first one has finished.
The btr/bts instructions are simple, and they can lock a shared resource.
Why does the cmpxchg instruction exist? What's the difference between these two instructions?
IIRC (it's been a while) lock btr is more expensive than cmpxchg, which was designed to automatically lock the bus for atomicity and to do so as quickly as possible. (Specifically, lock INSTR holds the bus lock for the entire instruction cycle, and does full invalidation, but the microcode for cmpxchg locks and invalidates only when absolutely needed so as to be the fastest possible synchronization primitive.)
(Edit: it also enables fancier lock-free strategies in user space, per this message.)
CMPXCHG [memaddr], reg compares a memory location to EAX (or AX, or AL); if they are the same, it writes the source operand to the memory location. This can obviously be used in the same way as XCHG, but it can be used in another very interesting way as well, for lock-free synchronization.
Suppose you have a process that updates a shared data structure. To ensure atomicity, it generates a private updated copy of the data structure; when it is finished, it atomically updates a single pointer, which used to point to the old data structure, so that it now points to the new data structure.
The straightforward way of doing this will be useful if there's some possibility of the process failing, and it gives you atomicity. But we can modify this procedure only a little bit to allow multiple simultaneous updates while ensuring correctness.
The process simply atomically compares the pointer to the value it had when it started its work and, if it is unchanged, makes the pointer point to the new data structure. If some other process has updated the data structure in the meantime, the comparison will fail and the exchange will not happen. In this case, the process must start over from the newly-updated data structure.
(This is essentially a primitive form of Software Transactional Memory.)
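As a rough illustration of the scheme described in that message, here is a C sketch using C11 atomics (a compare-exchange like this compiles down to lock cmpxchg on x86). build_updated_copy is a hypothetical helper, and reclamation of discarded copies is deliberately ignored:

```c
/* Sketch: lock-free pointer publication via compare-and-swap. */
#include <stdatomic.h>

struct data;                                   /* the shared structure */
struct data *build_updated_copy(struct data *old); /* hypothetical helper */

_Atomic(struct data *) shared;

void update(void)
{
    struct data *old, *fresh;
    do {
        old   = atomic_load(&shared);          /* snapshot the current pointer */
        fresh = build_updated_copy(old);       /* private updated copy */
        /* Publish only if nobody replaced the pointer in the meantime;
         * on failure, a real implementation would free the discarded
         * copy before retrying against the newer version. */
    } while (!atomic_compare_exchange_weak(&shared, &old, fresh));
}
```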
BTR and BTS work at the bit level, whereas CMPXCHG works on a wider data type (generally 32, 64, or 128 bits at once). They also function differently; the Intel developer manuals give a good summary of how they work. It may also help to note that certain processors may have implemented BTR and BTS poorly (due to them not being so widely utilised), making CMPXCHG the better option for high-performance locks.
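To make the contrast concrete, here is a sketch of both idioms using GCC/Clang __atomic builtins; compilers typically lower the bit-level fetch-or to a locked read-modify-write (such as lock bts or lock or) on x86, and the compare-exchange to lock cmpxchg:

```c
/* Sketch: a bit-level test-and-set lock (the BTS/BTR pattern) versus a
 * single-word compare-and-swap (the CMPXCHG pattern). Assumes GCC/Clang;
 * bit 0 of lock_word is the lock bit. */
static unsigned long lock_word;

void bit_lock(void)
{
    /* Atomically set bit 0 and test its old value: the BTS idiom. */
    while (__atomic_fetch_or(&lock_word, 1UL, __ATOMIC_ACQUIRE) & 1UL)
        ;                                /* spin while already locked */
}

void bit_unlock(void)
{
    /* Clear bit 0: the BTR idiom. */
    __atomic_fetch_and(&lock_word, ~1UL, __ATOMIC_RELEASE);
}

int try_cas(unsigned long *word, unsigned long expected, unsigned long desired)
{
    /* One-shot compare-and-swap on the whole word: the CMPXCHG idiom.
     * Returns nonzero if *word was `expected` and is now `desired`. */
    return __atomic_compare_exchange_n(word, &expected, desired, 0,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```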