Is BOOL read/write atomic in Objective-C?

What happens when two threads set a BOOL to YES "at the same time"?

Here is the code for the solution suggested by Jacko: use a volatile uint32_t with OSAtomicOr32Barrier and OSAtomicAnd32Barrier. (Note that Apple has since deprecated the OSAtomic family in favor of C11 <stdatomic.h>; see the later answer.)

#import <libkern/OSAtomic.h>

volatile uint32_t _IsRunning;

- (BOOL)isRunning {
    return _IsRunning != 0;
}

- (void)setIsRunning:(BOOL)allowed {
    if (allowed) {
        OSAtomicOr32Barrier(1, &_IsRunning);  // atomic bitwise OR of two 32-bit values with barrier
    } else {
        OSAtomicAnd32Barrier(0, &_IsRunning); // atomic bitwise AND of two 32-bit values with barrier
    }
}

No. Without a locking construct, reading/writing a variable of any type is NOT atomic in Objective-C.
If two threads write YES at the same time to a BOOL, the result is YES regardless of which one gets in first.
Please see: Synchronizing Thread Execution

I would have to diverge from the accepted answer. Sorry.
While Objective-C does not guarantee that BOOL properties declared as nonatomic are in fact atomic, I'd have to guess that the hardware you most care about (all iOS and macOS devices) has instructions to perform byte reads and stores atomically. So, unless Apple comes out with Road Light OS running on an IBM microcontroller that has a 5-bit-wide bus to send 10-bit bytes over, you could just as well use nonatomic BOOLs in a situation that calls for atomic BOOLs. The code would not be portable to Road Light OS, but if you can sacrifice that futureproofing, nonatomic is fine for this use case.
I'm sure there are hardened individuals on S.O. who would rise to the challenge of disassembling the synthesized BOOL getter and setter for the atomic and nonatomic cases to see what the difference is, at least on ARM.
Your takeaway from this is likely the following:
you can declare BOOL properties as atomic and it won't cost you a dime on any hardware iOS and macOS intrinsically support.
memory barriers are orthogonal to atomicity.
you most definitely should not use 4-byte properties to store booleans in, unless you are into [very] fuzzy logic. It's idiotic and wasteful; you don't want to be a clone of a Java programmer who can't tell a float from a double, or do you?
BOOL variables (which obviously do not support the atomic/nonatomic decorators) would not be atomic on some narrow-bus architectures that Objective-C would not be used on anyway (microcontrollers, with or without some [very] micro OS, are C and assembly territory, I suppose; they don't typically need the luggage the Obj-C runtime would bring).

What happens when two threads set a BOOL to YES "at the same time"?
Then its value will be YES. If two threads write the same value to the same memory location, the memory location will have that value; whether the write is atomic or not plays no role. Atomicity would only play a role if two threads wrote different values to the same memory location, or if one thread wrote to it while another one was reading from it.
Is BOOL read/write atomic in Objective C?
It is if your hardware is a Macintosh running macOS. BOOL is uint32_t on PPC systems and char on Intel systems, and writing these data types is atomic on their respective systems.
The Obj-C language makes no such guarantee, though. On other systems it depends on the compiler used and on how BOOL is defined for that platform. Most compilers (gcc, clang, ...) guarantee that writing a variable of int size is always atomic; whether other sizes are atomic depends on the CPU.
Note that atomic is not the same as thread-safe. Writing a BOOL is not a memory barrier. The compiler and the CPU may reorder instructions around a BOOL write:
a = 10;
b = YES;
c = 20;
There is no guarantee that these instructions are executed in that order. The fact that b is YES does not mean that a is 10. The compiler and the CPU are free to shuffle these three instructions around as desired, since they don't depend on each other. Explicit atomic instructions, as well as locks, mutexes, and semaphores, are usually memory barriers: they instruct the compiler and the CPU not to move instructions located before the operation beyond it, and not to move instructions located after it before it (a hard border that instructions may not pass).
Also cache consistency is not guaranteed. Even after you set a BOOL to YES, some other thread may still see it as NO for a limited amount of time. Memory barrier operations are usually also operations that ensure cache synchronization among all threads/cores/CPUs in the system.
And to add something really useful here as well: this is how you can ensure, in 2020, that setting a boolean value is atomic and acts as a memory barrier, using C11, which also works in Obj-C code:
#import <stdatomic.h>
// ...
volatile atomic_bool b = true;
// ...
atomic_store(&b, true);
// ...
atomic_store(&b, false);
Not only will this code guarantee atomic writes to the bool (for which the system will choose an appropriate type), it will also act as a memory barrier (Sequentially Consistent).
To read the boolean atomically from another thread, you'd use
bool x = atomic_load(&b);
You can also use atomic_load_explicit and atomic_store_explicit and pass an explicit memory order, which gives you finer-grained control over which kinds of memory reordering are allowed and which are not.
Learn more about your possibilities here:
http://llvm.org/docs/Atomics.html
Always read "Notes for optimizers" to see which memory reordering is allowed. If in doubt, always use Sequentially Consistent (memory_order_seq_cst, which is the default if not specified). It will not result in fastest performance but it's the safest option and you really should only use something else if you know what you are doing.

Related

Thread safety of primitive value type properties - Objective C

Question
I am working on a project where I am concerned about the thread safety of an object's properties. I know that when a property is an object such as an NSString, I can run into situations where multiple threads are reading and writing simultaneously. In this case you can get a corrupt read and the app will either crash or result in corrupted data.
My question is for primitive value type properties such as BOOLs or NSIntegers. I am wondering if I can get into a similar situation where I read a corrupt value when reading and writing from multiple threads (and the app will crash)? In either case, I am interested in why.
Clarification - 1/13/17
I am mostly interested in if a primitive value type property is differently susceptible to crashing due to multiple threads accessing it at the same time than an object such as NSMutableString, custom created object, etc. In addition, if there is a difference when accessing memory on the stack vs heap relative to multithreading.
Clarification - 12/1/17
Thank you to @Rob for pointing me to the answer here: stackoverflow.com/a/34386935/1271826! This answer has a great example that shows that, depending on the architecture you are on (32-bit vs 64-bit), you can get an undefined result when using a primitive property.
Although this is a great step towards answering my question, I still wonder two things:
Is there a multithreading difference between accessing a primitive value property on the stack vs the heap (as noted in my previous clarification)?
If you restrict a program to running on one architecture, can you still find yourself in an undefined state when accessing a primitive value property, and why?
I should note that there has been a lot of conversation around atomic vs nonatomic in response to this question. Although this is generally an important concept, this question has little to do with preventing undefined multithreading behavior by using the atomic property modifier or any other thread-safety approach such as using GCD.
If your primitive value type property is atomic, then you're assured it cannot be corrupted by reading it from one thread while setting it from another (as long as you only use the accessor methods and don't interact with the backing ivar directly). That's the entire purpose of atomic. And, as you suggest, this is only applicable to fundamental data types (or objects that are both immutable and stateless). But in these narrow cases, atomic can be useful.
Having said that, this is a far cry from concluding that the app is thread-safe. It only assures you that the access to that one property is thread-safe. But often thread-safety must be considered within a broader context. (I know you assure us that this is not the case here, but I qualify this for future readers who too quickly jump to the conclusion that atomic is sufficient to achieve thread-safety. It often is not.)
For example, if your NSInteger property is "how many items are in this cache object", then not only must access to that NSInteger be synchronized, but it must also be synchronized in conjunction with all interactions with the cache object (e.g. the "add item to cache" and "remove item from cache" tasks, too). And in these cases, since you'll synchronize all interaction with this broader object somehow (e.g. with a GCD queue, locks, the @synchronized directive, whatever), making the NSInteger property atomic becomes redundant and therefore modestly less efficient.
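To make the cache example concrete, here is a minimal sketch (names are mine, not from the question) of guarding the count together with the cache mutations behind one serial GCD queue; note that the count itself then no longer needs to be atomic:

#include <dispatch/dispatch.h>

static dispatch_queue_t cache_queue;  // serial queue guarding the cache
static int cache_count;               // "how many items are in this cache"

static void cache_init(void) {
    cache_queue = dispatch_queue_create("com.example.cache", DISPATCH_QUEUE_SERIAL);
}

static void cache_add_item(void) {    // item parameter omitted for brevity
    dispatch_sync(cache_queue, ^{
        // ... insert the item into the cache storage ...
        cache_count += 1;             // only changes together with the cache
    });
}

static int cache_get_count(void) {
    __block int n;
    dispatch_sync(cache_queue, ^{ n = cache_count; });
    return n;
}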
Bottom line, in limited situations, atomic can provide thread-safety for fundamental data types, but frequently it is insufficient when viewed in a broader context.
You later say that you don't care about race conditions. For what it's worth, Apple argues that there is no such thing as a benign race. See WWDC 2016 video Thread Sanitizer and Static Analysis (about 14:40 into it).
Anyway, you suggest you are merely concerned whether the value can be corrupted or whether the app will crash:
I am wondering if I can get into a similar situation where I read a corrupt value when reading and writing from multiple threads (and the app will crash)?
The bottom line is that if you're reading from one thread while mutating on another, the behavior is simply undefined. It could vary. You are simply well advised to avoid this scenario.
In practice, it's a function of the target architecture. For example, with a 64-bit type (e.g. long long) on a 32-bit x86 target, you can easily retrieve a corrupt value, where one half of the 64-bit value is set and the other is not. (See https://stackoverflow.com/a/34386935/1271826 for an example.) With primitive types this results merely in nonsensical, invalid numeric values; for pointers to objects, it would obviously have catastrophic implications.
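Here is a self-contained sketch of that torn-read scenario (my own demonstration, assuming a POSIX system; build it as a 32-bit binary, e.g. with -m32, to see the effect):

#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

volatile uint64_t shared = 0;            // written with no synchronization

static void *writer(void *arg) {
    (void)arg;
    for (;;) {
        shared = 0;                      // all bits zero
        shared = 0xFFFFFFFFFFFFFFFFULL;  // all bits one
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    for (;;) {
        uint64_t v = shared;
        // On a 32-bit target the two halves of the store may be written
        // separately, so the reader can observe a value nobody ever wrote:
        if (v != 0 && v != 0xFFFFFFFFFFFFFFFFULL)
            printf("torn read: %#llx\n", (unsigned long long)v);
    }
}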
But even if you're in an environment where no problems manifest, eschewing synchronization to achieve thread-safety is an incredibly fragile approach. It could easily break when run on a new, unanticipated hardware architecture or when compiled under a different configuration. I'd encourage you to watch that Thread Sanitizer and Static Analysis video for more information.

What is the significance of the "volatile" key word with respect to Embedded Systems?

I have recently been working on learning embedded systems programming on my own, and I have observed a fairly high usage of the volatile qualifier when declaring variables.
What is the significance of volatile when declaring a variable in embedded systems programming?
Basically, when should the keyword be used? I read this StackOverflow post, but I didn't understand how it applied in an embedded environment. I did read something about compiler optimization and the use of the keyword, and something related to memory-mapped registers, but I don't understand when to use it.
Let's have a look at an example. When you look at C header files for PIC microcontrollers, you will see that many elements are declared volatile:
extern volatile unsigned char PORTB @ 0x006;
As you have read, the volatile keyword disables compiler optimizations on accesses to that variable. Suppose you write a program that does the following:
PORTB = 0x00; // set all of port B low
while (PORTB == 0x00); // wait for any pin to get high
// do something else
When the compiler optimises this code, it will recognise the second line as an infinite loop: the condition is true and never becomes false within the loop's body. Therefore, everything after the infinite loop does not need to be compiled, as it will never be run. Hence, the compiler may decide not to include that part of the code in the generated assembly.
However, this PORTB is actually linked to a physical port. It is a hardware port whose value may be altered by the external circuitry. This means that although the loop seems to be infinite, it doesn't have to be. The compiler can't possibly know this.
That's where volatile comes in. When PORTB is declared volatile, the compiler won't do any optimisation based on reasoning about PORTB. It will assume that its value may be changed at any time by external factors.
In the embedded systems world, one of the key aspects of the volatile keyword is that it denotes a variable that may change at any time (e.g. an external/hardware data input, such as an ADC reading), and that the compiler therefore must not optimise away accesses to it.
But specifically, when used with a control register, it indicates that a read access may in fact change the data!
As a general rule of thumb, I would recommend the use of the volatile qualifier in all of the following:
All hardware register accesses (read and write)
All variables that are accessible from multiple threads (especially interrupt handlers); see the sketch after the note below
Note: accessing a volatile is not necessarily atomic, so it is imperative that you know your hardware and your code structure.
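As a sketch of that second bullet (my own example; the names are illustrative, not from any vendor's headers):

#include <stdbool.h>

// Flag shared between an interrupt handler and the main loop.
// 'volatile' forces the compiler to re-read it on every iteration;
// note that it does NOT make read-modify-write sequences atomic.
static volatile bool data_ready = false;

void uart_rx_interrupt_handler(void) {  // hypothetical ISR
    data_ready = true;
}

int main(void) {
    for (;;) {
        if (data_ready) {
            data_ready = false;
            // ... process the received data ...
        }
    }
}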
The volatile keyword is primarily used to tell the compiler that the value of the variable may change at any time. It also tells the compiler not to apply optimizations to accesses to the variable. I am not too much of an expert on this, but below is a good reference that I have referred to in the past.
volatile is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time-without any action being taken by the code the compiler finds nearby. The implications of this are quite serious. However, before we examine them, let's take a look at the syntax.
Reference:
Introduction to the volatile keyword
Let me put it in another perspective: it is exactly the opposite of the const keyword.
When the compiler encounters the const qualifier on a variable, it checks whether any function or statement modifies the variable after it is initialized, and flags an error if one does.
volatile is exactly the opposite: the variable can be changed by any function, or by something outside the program entirely, hence the compiler does not apply optimizations to it.
You see it mostly in embedded systems programming because of interrupts, which can make some program logic constructs seem redundant to the compiler.
While the statements about optimization are correct, they seem a little unclear to me. Here is what is really going on.
If you don't use the volatile keyword, C may optimize that variable into a register it isn't using at the time. This makes for fewer assembly instructions, and the code will execute faster.
For example, consider the following...
extern int my_port; // my_port is defined in a different module somewhere,
                    // presumably a memory-mapped hardware port
while (my_port > 0) { /* do stuff */ }
The compiler may decide to read my_port into a register only once before the actual while statement, then each time to test my_port it will look at the register not the memory location.
If, however, my_port is a hardware port, the port may change but the register won't, and so the while condition will never change.
The loop variable (the register) will be "out of phase" with the actual variable (my_port).
Thus the need for the keyword volatile.
Volatile tells C: "Don't optimize this variable into a register; read it each and every time you need it."
More instructions are generated and the code is a bit slower, but it is always accurate.

How to do binary instrumentation of the brk syscall? (x86-64 Linux) (maybe Valgrind?)

I'd like to instrument the brk syscall (and other calls too, but this one first; it's the most important to me) in a given binary, preferably at the level of the actual syscall/sysenter instruction (x86-64 and x86) that makes the sys_brk call.
Main goal:
A part of a sandbox which gives a fixed amount of memory to the jailed process
So I'd like to get rid of brk system calls (and, preferably, others next) and simulate memory allocations under a fixed limit. The fixed limit is the memory space available to the program. (You can think of it as making a kind of sandbox with a fixed amount of available memory.)
How to implement it; some possible solutions (or your own):
just changing the instructions to NOPs
as brk returns 0 on success, simulate its success by setting the memory/register state as if brk had been called successfully
more complex: instrument with code (or a function call) which simulates successful memory allocations under a fixed limit
most flexible (maybe overkill in my case): change the syscall into a function call and add the provided function to the binary
The given binary is code that can be malicious, in one of two forms (most preferably both :) ):
shared library - here I can set up the environment before the function call (for example, do the brk call in a controlled way)
program binary - in this case we need to give the program a fixed amount of memory up front (by the caller, or at the beginning of the program with "one syscall"), since it cannot allocate memory itself. An example of calling such a program should be included in the answer.
As the problem is highly connected with many other aspects, I tried to do my best in separating it out as a question, but please advise me if I should specify anything more or less.
Answers with implementations and links to resources (books, tutorials) are welcome.
(I am most interested in Linux, and in a solution that is reliable, so that people preparing binaries, even in assembler, would not have to worry about the execution of their code.)
LD_PRELOAD will trap C calls to brk(), but it won't trap the actual system call (int/syscall instruction). There's no portable way to trap those, but on Linux, ptrace will do it. Memory can also be allocated to a program by mmap(), so you'll need to intercept that call too.
Of course, what it seems you're really looking for is setrlimit().
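For the ptrace route, here is a minimal sketch (my own, for x86-64 Linux, assuming glibc and the <sys/reg.h> register offsets) that stops a child at every syscall boundary and spots brk; a real sandbox would rewrite the registers instead of just logging:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/reg.h>      /* ORIG_RAX */
#include <sys/syscall.h>  /* SYS_brk */
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[1], &argv[1]);
        return 1;
    }
    int status;
    waitpid(child, &status, 0);                      /* child stops at exec */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);   /* run to next syscall stop */
        waitpid(child, &status, 0);
        if (WIFSTOPPED(status)) {
            long nr = ptrace(PTRACE_PEEKUSER, child, 8 * ORIG_RAX, NULL);
            if (nr == SYS_brk)
                fprintf(stderr, "brk observed\n");   /* fires on entry and exit */
        }
    }
    return 0;
}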
Yeah, I don't think you want valgrind for this.
You can use LD_PRELOAD or linker tricks to capture brk(2): see these other discussions:
Function interposition in Linux without dlsym
Overriding 'malloc' using the LD_PRELOAD mechanism
Code might look like this:

#define _GNU_SOURCE   /* for RTLD_NEXT */
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

/* prototype: int brk(void *addr); */
static int (*real_brk)(void *addr) = NULL;

int brk(void *addr) {
    if (real_brk == NULL) {
        real_brk = (int (*)(void *))dlsym(RTLD_NEXT, "brk");
        if (real_brk == NULL) {
            fprintf(stderr, "error mapping brk: %s\n", dlerror());
            return -1;
        }
    }
    printf("calling brk(2) for %p\n", addr);
    return real_brk(addr);
}
and then LD_PRELOAD that library to intercept brk(2).
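A typical build-and-run sequence for such a shim might look like this (file names are illustrative):

cc -shared -fPIC -o libbrkshim.so brkshim.c -ldl
LD_PRELOAD=./libbrkshim.so ./target_program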

How does compiler arrange local variables on stack?

As we know, local variables are located on the stack. However, what is their order? Are they arranged in the order of their declaration? Does that mean the first declared variable is placed at a higher address on the stack (the stack grows toward lower addresses)? As an example:
void foo(){
int iArray[4];
int iVar;
}
On the stack, are the local variables iArray and iVar arranged as follows?
Only if you have optimisation turned off!
Once the optimiser gets hold of your code all bets are off. Common strategies for aggressive optimisations are:
Drop the variable if it's never used or is just a copy of another variable.
Reorder variables in the order they are used. This helps greatly if your app is using swap space and also helps cache utilisation (on some machines).
Move often-used variables into registers. Common on RISC machines with 32 lovely general-purpose registers; not so common on Intel with its measly eight special-purpose registers.
Change the data type, e.g. promoting small ints to integers often speeds up register loading and caching.
Reorder storage to minimise slack bytes, e.g. char a, double b, char c, int d could be reordered to double b, int d, char a, char c, thus saving 10 bytes (see the snippet below).
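The slack-byte arithmetic in that last bullet is easy to check with sizeof (a snippet of mine; exact sizes depend on the ABI, the comments assume typical 64-bit alignment):

#include <stdio.h>

struct declared  { char a; double b; char c; int d; };  // heavily padded
struct reordered { double b; int d; char a; char c; };  // minimal slack

int main(void) {
    // Typically prints 24 and 16 on a 64-bit ABI; as bare locals the
    // saving can reach 10 bytes, since no trailing padding is needed.
    printf("declared:  %zu\n", sizeof(struct declared));
    printf("reordered: %zu\n", sizeof(struct reordered));
    return 0;
}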
There is no rule you can depend on. Most compilers will use the declaration order unless you start to optimize the code.
Enabling optimizations can cause reuse of stack space, reordering of local variables, or even movement of the variables into CPU registers, so that they don't show up on the stack anymore.
[EDIT] On some systems, the stack grows to bigger addresses. So it starts with 0x1000 and the next address is 0x1001 instead of starting with 0xffff and the next address is 0xfffe.
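One quick way to observe this yourself (my own snippet, not from the answers): print the addresses of the locals and compare builds at different optimisation levels:

#include <stdio.h>

int main(void) {
    int iArray[4];
    int iVar;
    // Taking the addresses forces the variables into memory; compare
    // the output of -O0 and -O2 builds to watch the layout change.
    printf("iArray: %p\n", (void *)iArray);
    printf("iVar:   %p\n", (void *)&iVar);
    return 0;
}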
The simplest implementations make it very easy to predict where various variables will end up on the stack. However, those implementations also allow certain security problems (mainly, overflowing a buffer and predicting what the extra data will overwrite, allowing the injection of shellcode).
Since the layout of the stack is implementation defined in most stack-based languages (technically, many such languages don't mandate the use of a stack, but instead have semantics that are easy to implement with a stack), compiler writers have gone to great lengths to make it hard to predict the stack layout at runtime.

Locking details of synthesized atomic @properties in Obj-C 2.0

The documentation for properties in Obj-C 2.0 says that atomic properties use a lock internally, but it doesn't document the specifics of the lock. Does anybody know if this is a per-property lock, a per-object lock separate from the implicit one used by @synchronized(self), or the equivalent of @synchronized(self)?
Looking at the generated code (iOS SDK GCC 4.0/4.2 for ARM),
32-bit assign properties (including struct {int32_t v;}) are accessed directly.
Larger-than-32-bit structs are accessed with objc_copyStruct().
double and int64_t are accessed with objc_copyStruct, except on GCC 4.0 where they're accessed directly with stmia/ldmia (I'm not sure if this is guaranteed to be atomic in case of interrupts).
retain/copy accessors call objc_getProperty and objc_setProperty.
Cocoa with Love: Memory and thread-safe custom property methods gives some details on how they're implemented in runtime version objc4-371.2; obviously the precise implementation can vary between runtimes (for example, on some platforms you can use atomic swap/CAS to spin on the ivar itself instead of using another lock).
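As an illustration of that last point, the CAS-based spin idea looks roughly like this in C11 (a sketch of my own, not the runtime's actual code; here the lock word sits next to the value, whereas the variant described above spins on the ivar itself):

#include <stdatomic.h>
#include <stdint.h>

// A tiny spinlock guarding a value; real runtimes are considerably
// more sophisticated. Initialize with:
//   guarded_int64 g = { ATOMIC_FLAG_INIT, 0 };
typedef struct {
    atomic_flag lock;
    int64_t value;
} guarded_int64;

static int64_t guarded_get(guarded_int64 *g) {
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;  // spin until we own the lock
    int64_t v = g->value;
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
    return v;
}

static void guarded_set(guarded_int64 *g, int64_t v) {
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;  // spin
    g->value = v;
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}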
The lock used by atomic @properties is an implementation detail: for appropriate types on appropriate platforms, atomic operations without a lock are possible, and I'd be surprised if Apple were not taking advantage of them. There is no public access to the lock in any case, so you can't @synchronized on the same lock. Several Apple engineers have pointed out that atomic properties do not guarantee thread safety; atomic properties only guarantee that gets/sets of that value are atomic. For correct thread safety you will have to make use of higher-level locking or synchronization, and you almost certainly would not want to use the same lock as the synthesized getter/setter(s) might be using.