Overcoming the race condition in lock-free reference-counted dereferences

Imagine a structure like this:
struct my_struct {
    uint32_t refs;
    /* ... */
};
for which a pointer is acquired through a lookup table:
struct my_struct** table;

struct my_struct* my_struct_lookup(const char* name)
{
    struct my_struct* s = table[hash(name)];
    /* EDIT: Race condition here. */
    atomic_inc(&s->refs);
    return s;
}
A race exists between the dereference and the atomic increment in a multi-threaded model. Given that this is very performance-critical code, I was wondering how this race between the dereference and the atomic increment is typically resolved or worked around?
EDIT: When acquiring a pointer to a my_struct structure via the lookup table, it is necessary to first dereference the structure in order to increment its reference count. This creates a problem in multi-threaded code when other threads could be altering the reference count and potentially deallocating the object itself while another thread would then dereference a pointer to non-existent memory. Combined with preemption and some bad luck, this could be a recipe for disaster.
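To make the window concrete, here is one possible interleaving of the failure case (a sketch):

/* Thread A                          Thread B
 *
 * s = table[hash(name)];
 *                                   drops the last reference
 *                                   free(s);               -- object is gone
 * atomic_inc(&s->refs);             -- use-after-free
 */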

As someone said above, you can keep a linked list of memory to free at some later time, so your pointers are never invalid. This is a handy method in some cases.
Or you can make a 64-bit struct with your 32-bit pointer and use the other 32 bits for a ref count and other flags. You can use 64-bit atomic ops on the struct if you wrap it in a union:
union my_struct_ref {
    struct {
        unsigned int cRef : 16,
                     fDeleted : 1; // etc.
        struct my_struct *s;
    } Data;
    uint64_t n64;
};
You can work with the Data part of the union in a human-readable way, and you can use CAS on the 64-bit n64 part.
struct my_struct* my_struct_lookup(const char* name)
{
    union my_struct_ref Old, New;
    int iHash = hash(name);
    // concurrency loop
    while (1) {
        Old.n64 = table[iHash].n64; // table is now an array of union my_struct_ref
        if (Old.Data.fDeleted)
            return NULL;
        New.n64 = Old.n64;
        New.Data.cRef++;
        if (CAS(&table[iHash].n64, Old.n64, New.n64)) // CAS = atomic compare-and-swap
            return New.Data.s; // success
        // We get here if some other thread changed the count or deleted our
        // pointer between when we copied it into Old and the CAS. Just loop to try again.
    }
}
If you are using 64-bit pointers you will need to do a 128-bit CAS.
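If C++11 atomics are available, the same pack-and-CAS pattern can be written portably, without a platform-specific CAS macro. A minimal sketch; the field layout, bit positions, and names here are illustrative, not from the answer above:

#include <atomic>
#include <cstdint>

constexpr uint64_t DELETED_BIT = 1ull << 63; // deleted flag in the top bit
constexpr uint64_t REF_ONE     = 1ull << 32; // ref count occupies bits 32..62

struct Entry {
    std::atomic<uint64_t> word; // low 32 bits: object index/offset
};

// Returns true if the packed ref count was bumped, false if the slot
// was marked deleted.
bool try_acquire(Entry& e) {
    uint64_t old = e.word.load(std::memory_order_acquire);
    for (;;) {
        if (old & DELETED_BIT)
            return false;
        if (e.word.compare_exchange_weak(old, old + REF_ONE,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire))
            return true;
        // compare_exchange_weak reloaded 'old' on failure; retry.
    }
}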

One solution is to use a freelist, rather than malloc() and free(). This has obvious drawbacks.
Another is to implement lock-free garbage collection (also known as Safe Memory Reclamation).
There are MANY patents in this field, but it appears that epoch-based LFGC is unencumbered.
The upshot of using this method is that elements are only deallocated when no threads are pointing at them.
The former solution is very easy to implement. You need a lock-free freelist, of course, or your overall system is no longer lock-free.
The latter is really not complex, but requires learning the algorithm in question, which takes some time and research.
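For the freelist route, here is a minimal sketch of a lock-free freelist (a Treiber stack) using C++11 atomics. Note the ABA caveat marked below: production implementations pair the head with a version counter (tagged pointer or double-width CAS).

#include <atomic>

struct Node {
    Node* next;
    // ... payload ...
};

std::atomic<Node*> free_head{nullptr};

void free_node(Node* n) {
    Node* old = free_head.load(std::memory_order_relaxed);
    do {
        n->next = old;
    } while (!free_head.compare_exchange_weak(old, n,
                 std::memory_order_release, std::memory_order_relaxed));
}

Node* alloc_node() {
    Node* old = free_head.load(std::memory_order_acquire);
    // Naive pop: reading old->next here is the classic ABA hazard if 'old'
    // was popped and pushed back between our load and the CAS.
    while (old && !free_head.compare_exchange_weak(old, old->next,
                      std::memory_order_acquire, std::memory_order_acquire)) {
    }
    return old; // may be nullptr if the freelist is empty
}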

Besides the race you identified, you have a general problem of memory consistency.
Even if you could make the table modifications atomic in a lock-free fashion, the block of memory my_struct* points to could still be "stale" when seen from a different thread compared to the thread that last modified it. This does not apply to my_struct.refs (provided you always access it using atomics), but does apply to all other fields. This is the consequence of write buffers and caches that are "private" to each CPU core.
The only way to guarantee you are seeing the correct memory content is to use a memory barrier. Yet, a typical lock is also a memory barrier, so why not just use the lock in the first place?
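In C++11 terms, that barrier corresponds to release/acquire ordering on the pointer publication itself. A minimal sketch, with illustrative names:

#include <atomic>

struct my_struct {
    int payload;
    // ...
};

std::atomic<my_struct*> slot{nullptr};

// Writer: fill in the fields, then publish with release ordering so the
// payload stores become visible no later than the pointer itself.
void publish(my_struct* s) {
    s->payload = 42;
    slot.store(s, std::memory_order_release);
}

// Reader: acquire ordering guarantees that if we observe the pointer,
// we also observe the writer's stores to the pointed-to fields.
my_struct* get() {
    return slot.load(std::memory_order_acquire);
}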
Lock-free programming is much trickier than it may initially seem. On the other hand, locks can be very fast, especially when contention is rare. Have you actually benchmarked a lock-based implementation and confirmed that locking is indeed your bottleneck?

Related

Using flatbuffers struct as a key

I am considering using flatbuffers' serialized struct as a key in a key-value store. Here is an example of the structs that I want to use as a key in rocksdb.
struct Foo {
    foo_id: int64;
    foo_type: int32;
}
I read the documentation and figured that the layout of a struct is deterministic. Does that mean it is suitable to be used as a key? If yes, how do I serialize a struct and deserialize it back? It seems like Table has an API for serialization/deserialization but struct does not (?).
I tried serializing a struct as follows:
#include <array>
#include <cstring>

constexpr int key_size = sizeof(Foo);
using FooKey = std::array<char, key_size>;

FooKey get_foo_key(const Foo& foo_object) {
    FooKey key;
    std::memcpy(&key, &foo_object, key_size);
    return key;
}

const Foo* get_foo(const FooKey& key) {
    return reinterpret_cast<const Foo*>(&key);
}
I did some sanity checks and the above seems to work in my Ubuntu 18 docker image and is blazing fast. So my questions are as follows:
Is this a safe thing to do on a machine if it passes FLATBUFFERS_LITTLEENDIAN and uint8/char equivalence checks? Or are there any other checks needed?
Are there any other caveats that I should be aware of when doing it as demonstrated above?
Thanks in advance!
You don't actually need to go via std::array, the Foo struct is already a block of memory that is safe to copy or cast as you wish. It needs no serialization functions.
Like you said, that memory contains little endian data, so FLATBUFFERS_LITTLEENDIAN must pass. Actually even on a big endian machine you may copy these structures all you want, as long as you use the accessors to read the fields (which do a byteswap on access on big endian). The only thing that won't work on big endian is casting the struct to, say, an int64_t * to read the first field without using the accessor methods.
The other caveat to certain casting operations is strict aliasing: with strict aliasing enabled, certain casts may be undefined behavior.
Also note that in this example Foo will be 16 bytes in size on all platforms, because of alignment.
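As a cheap guard against the caveats above, one might add compile-time checks next to these helpers. A sketch; it assumes the generated Foo is trivially copyable, which is exactly what the memcpy approach relies on, and the 16-byte figure comes from the note above:

#include <type_traits>

static_assert(std::is_trivially_copyable<Foo>::value,
              "Foo must be safe to memcpy if it is used as a raw key");
static_assert(sizeof(Foo) == 16,
              "key layout changed unexpectedly");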

Confusion regarding reentrant functions

My understanding of a "reentrant function" is that it's a function that can be interrupted (e.g. by an ISR or a recursive call) and later resumed such that the overall output of the function isn't affected in any way by the interruption.
The following is an example of a reentrant function from Wikipedia: https://en.wikipedia.org/wiki/Reentrancy_(computing)
int t;

void swap(int *x, int *y)
{
    int s;

    s = t; // save global variable
    t = *x;
    *x = *y;
    // hardware interrupt might invoke isr() here!
    *y = t;
    t = s; // restore global variable
}

void isr()
{
    int x = 1, y = 2;
    swap(&x, &y);
}
I was thinking, what if we modify the ISR like this:
void isr()
{
    t = 0;
}
Now suppose the main function calls swap and an interrupt occurs in the middle of it. The output would surely get corrupted, since the swap wouldn't complete properly, which in my mind makes this function non-reentrant.
Is my thinking right or wrong? Is there some mistake in my understanding of reentrancy?
The answer to your question:
that the main function calls the swap function, but then suddenly an interrupt occurs, then the output would surely get distorted as the swap wouldn't be proper, which in my mind makes this function non-reentrant.
is no, it does not, because re-entrancy is (by definition) defined with respect to self. If isr calls swap, the other swap would be safe. Note, however, that swap is still not thread-safe.
The correct way of thinking about it depends on the precise definitions of re-entrancy and thread-safety (see, say, Threadsafe vs re-entrant).
Wikipedia, the source of the code in question, selected the definition of reentrant function to be "if it can be interrupted in the middle of its execution and then safely called again ("re-entered") before its previous invocations complete execution".
I have never heard the term re-entrancy used in the context of interrupt service routines. It is generally the responsibility of the ISR (and/or the operating system) to maintain consistency - application code should not need to know anything about what an interrupt might do.
That a function is re-entrant usually means that it can be called from multiple threads simultaneously - or by itself recursively (either directly or through a more elaborate call chain) - and still maintain internal consistency.
For functions to be re-entrant they must generally avoid using static variables and of course avoid calls to other functions that are not themselves re-entrant.
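For contrast, a version of swap that uses only automatic storage is trivially re-entrant (a minimal sketch):

void swap(int *x, int *y)
{
    int tmp = *x; // only local (automatic) storage, no shared state
    *x = *y;
    *y = tmp;
}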

How does gcc push local variables on to the stack?

void f()
{
    int a[1];
    int b;
    int c;
    int d[1];
}
I have found that these local variables, for this example, are not pushed onto the stack in order. b and c are pushed in the order of their declaration, but a and d are grouped together. So the compiler is allocating arrays differently from any other built-in type or object.
Is this a C/C++ requirement or a gcc implementation detail?
The C standard says nothing about the order in which local variables are allocated. It doesn't even use the word "stack". It only requires that local variables have a lifetime that begins on entry to the nearest enclosing block (basically when execution reaches the {) and ends on exit from that block (reaching the }), and that each object has a unique address. It does acknowledge that two unrelated variables might happen to be adjacent in memory (for obscure technical reasons involving pointer arithmetic), but doesn't say when this might happen.
The order in which variables are allocated is entirely up to the whim of the compiler, and you should not write code that depends on any particular ordering. A compiler might lay out local variables in the order in which they're declared, or alphabetically by name, or it might group some variables together if that happens to result in faster code.
If you need variables to be allocated in a particular order, you can wrap them in an array or a structure, as sketched below.
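A minimal sketch of that approach: unlike independent locals, struct members are guaranteed to be laid out in declaration order (with possible padding), so wrapping the locals fixes their relative placement:

struct locals {
    int a[1]; // struct members get increasing addresses
    int b;    // in declaration order, possibly with padding
    int c;
    int d[1];
};

void f(void)
{
    struct locals v; // one stack object with a fixed internal layout
    (void)v;         // suppress the unused-variable warning
}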
(If you were to look at the generated machine code, you'd most likely find that the variables are not "pushed onto the stack" one by one. Instead, the compiler will probably generate a single instruction to adjust the stack pointer by a certain number of bytes, effectively allocating a single chunk of memory to hold all the local variables for the function or block. Code that accesses a given variable will then use its offset within the stack frame.)
And since your function doesn't do anything with its local variables, the compiler might just not bother allocating space for them at all, particularly if you request optimization with -O3 or something similar.
The compiler can order the local variables however it wants. It may even choose not to allocate them at all (for example, if they're not used, or are optimized away through propagation, being kept in a register, etc.) or to allocate the same stack location for multiple locals that have disjoint live ranges.
There is no common implementation detail to outline here, because how a particular compiler does it may change at any time.
Typically, compilers will try to group variables of similar sizes (and/or alignments) together to minimize wasted space through "gaps", but there are many other factors involved.
structs and arrays have slightly different requirements, but that's beyond the scope of this question I believe.

How do programmers solve the dilemma of reusing old variables instead of creating new ones?

For example:
... some code ...
int sizeOfSomeObject = someObject.length();
... some code; sizeOfSomeObject is not needed anymore ...
Now I need another int variable for another purpose (for example, a position in some object), and I have a dilemma: create a new variable, or reuse sizeOfSomeObject? In the first case I keep readability but lose performance. In the second case, the contrary. What do programmers usually do in this situation?
In the first case I keep readability but lose performance. In the second case, the contrary.
So did you benchmark it? I suspect not. Most modern compilers do a lot of aggressive analysis during register allocation, so if the optimizer perceives that a variable is no longer used but there's a new variable of the same type, it will just merge the two variables into the same memory region or processor register. No need to worry about performance penalties.
And anyway, don't do premature optimization (which this is). In 90% of the cases, readability is more important than "performance".
All in all, go ahead and create a new variable with an appropriate, different, descriptive name. And just for fun, compile this version and the version in which you used the same variable name, and look at the generated assembly (or bytecode, or...) - and find out that they're identical.
I would use different named variables for different things.
For something like this, I don't think just one variable would make a massive performance difference. In most languages you have the option to clear variables from memory in some way when they are no longer in use, so I would recommend doing that, so the code still means something to you or others when read at a later date.
In C++, you can use blocks for objects to be destroyed as soon as they are not needed anymore:
void some_function() {
    {
        MyClass c;
        // ... here we use c ...
    }
    // now c has been destroyed

    {
        MyClass d;
        // ... here we use d ...
    }
    // now d has been destroyed
}
In your example (with int variables), there is no reason to worry about performance. The worst thing that could probably happen is memory for two variables being used instead of one, but (i) that's negligible and (ii) the ints will probably live in a CPU register anyway. If you really worry, use the block approach for your int example.
It depends on how often such an int would be initialized. If it's not in some deeply nested for loop, most (if not all) programmers will go for the first option. Besides, most modern programming languages have a garbage collector, which cleans up leftover objects.
A decent compiler will optimize out your second variable, so that shouldn't be an issue.
That said, there are situations where variable reuse makes sense. E.g., you might have some variable that holds generic output populated from a call to some external API. Depending on the context and the parameters passed to the API, you'll process the data differently, but it's probably better (more readable, etc.) to reuse the same data variable.
For example, something like this:
void* data = getSomeData(params);
//process data
//change params
data = getSomeData(params);
//process data
//change params
data = getSomeData(params);

const vs enum in D

Check out this quote from here, towards the bottom of the page. (I believe the quoted comment about consts applies to invariants as well.)
Enumerations differ from consts in that they do not consume any space
in the final outputted object/library/executable, whereas consts do.
So apparently value1 will bloat the executable, while value2 is treated as a literal and doesn't appear in the object file.
const int value1 = 0xBAD;
enum int value2 = 42;
Back in C++ I always assumed this was for legacy reasons, and old compilers that couldn't optimize away constants. But if this is still true in D, there must be a deeper reason behind this. Anyone know why?
Just like in C++, an enum in D seems to be a "conserved integer literal" (edit: amazing, D2 even supports floats and strings). Its enumerators have no location; they are just immaterial values without identity.
Using enum this way is new in D2. It defines a manifest constant rather than a true variable; it is not an lvalue (so you also cannot take its address). An
enum int a = 10; // new in D2
is like
enum : int { a = 10 }
if I can trust my poor D knowledge. So a here is not an lvalue (it has no location, and you can't take its address). A const, however, does have an address. If you have a global (not sure whether this is the right D terminology) const variable, the compiler usually can't optimize it away, because it doesn't know which modules can access that variable or take its address. So it has to allocate storage for it.
I think if you have a local const, the compiler can still optimize it away just as in C++, because the compiler knows by looking at its scope whether or not anyone is interested in its address or whether everyone just takes its value.
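For comparison, the same address/no-address distinction can be seen in C++ (a sketch):

enum : int { a = 10 };  // manifest constant: no storage, not an lvalue
const int b = 10;       // may get storage...
const int* p = &b;      // ...because its address can be taken
// &a would not compile: 'a' is a value, not an object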
Your actual question, why enum/const is the same in D as in C++, seems to be unanswered. Sadly, there exists no good reason for this choice whatsoever. I believe that this was just an unintentional side effect in C++ that became a de facto pattern. In D the same pattern was needed, and Walter Bright decided that it should be done as in C++ so that those coming from C++ would recognize what to do. In fact, before this rather (IMHO) silly decision, the keyword manifest was used instead of enum for this use case.
I think a good compiler/linker should still remove the constant. It's just that with the enum, it's actually guaranteed in the spec. The difference is primarily a matter of semantics. (Also keep in mind that 2.0 isn't complete yet)
The real purpose of enum being expanded syntactically to support single manifest constants, from what I understand, is that Don Clugston, a D template guru, was doing some crazy stuff with templates. He kept running into long build times, ridiculous compiler memory usage, etc. because the compiler kept creating internal data structures for const variables. One key thing about const/immutable variables compared to enums is that const/immutable variables are lvalues and can have their address taken. This means there is some extra overhead for the compiler. This usually doesn't matter, but when you're executing really complicated compile-time metaprograms, even if const variables are optimized away, this is still significant overhead at compile time.
It sounds like the enum value will be used "inline" in expressions, whereas the const will actually take storage and any expression referencing it will load the value from that storage.
This sounds similar to the difference between const and readonly in C#. The former is a compile-time constant and the latter is a run-time constant. This definitely affected versioning of assemblies (since assemblies referencing a const would receive a copy of the value at compile time and would not pick up a change to the value if the referenced assembly was rebuilt with a different value).