are all instructions in jvm atomic? - jvm

I remember I read the somewhere before,but I can't find the offical document now.
are all instructions in jvm all atomic?
like:
iinc
iload
aload
all atomic?

The bytecode instructions you mentioned (iinc, iload, aload etc.) operate on the values from the operand stack and on the local variables. These areas are thread-local (see JVMS 2.5, 2.6), that is, speaking about atomicity here is meaningless. I.e. they are NOT implemented using atomic CPU instructions like lock xadd, but nobody should care.
The bytecode instructions that read or write fields and array elements (getfield, putfiled, iastore etc.) are atomic except for long and double types (see JLS 17.7). 32-bit JVM may implement (actually, HotSpot JVM does implement) reading and writing of 64-bit fields with two 32-bit loads or stores, unless the field is declared volatile.

Related

Thread safety of primitive value type properties - Objective C

Question
I am working on a project where I am concerned about the thread safety of an object's properties. I know that when a property is an object such as an NSString, I can run into situations where multiple threads are reading and writing simultaneously. In this case you can get a corrupt read and the app will either crash or result in corrupted data.
My question is for primitive value type properties such as BOOLs or NSIntegers. I am wondering if I can get into a similar situation where I read a corrupt value when reading and writing from multiple threads (and the app will crash)? In either case, I am interested in why.
Clarification - 1/13/17
I am mostly interested in if a primitive value type property is differently susceptible to crashing due to multiple threads accessing it at the same time than an object such as NSMutableString, custom created object, etc. In addition, if there is a difference when accessing memory on the stack vs heap relative to multithreading.
Clarification - 12/1/17
Thank you to #Rob for pointing me to the answer here: stackoverflow.com/a/34386935/1271826! This answer has a great example that shows that depending on the type of architecture you are on (32-bit vs 64-bit), you can get an undefined result when using a primitive property.
Although this is a great step towards answering my question, I still wonder two things:
If there is a multithreading difference when accessing a primitive value property on the stack vs heap (as noted in my previous clarification)?
If you restrict a program to running on one architecture, can you still find yourself in an undefended state when access a primitive value property and why?
I should note that here has been a lot of conversation around atomic vs nonatomic in response to this question. Although this is generally an important concept, this question has little to do with preventing undefined multithreading behavior by using the atomic property modifier or any other thread safety approach such as using GCD.
If your primitive value type property is atomic, then you're assured it cannot be corrupted because your reading it from one thread while setting it from another (as long as you only use the accessor methods, and not interact with the backing ivar directly). That's the entire purpose of atomic. And, as you suggest, this only applicable to fundamental data types (or objects that are both immutable and stateless). But in these narrow cases, atomic can be useful.
Having said that, this is a far cry from concluding that the app is thread-safe. It only assures you that the access to that one property is thread-safe. But often thread-safety must be considered within a broader context. (I know you assure us that this is not the case here, but I qualify this for future readers who too quickly jump to the conclusion that atomic is sufficient to achieve thread-safety. It often is not.)
For example, if your NSInteger property is "how many items are in this cache object", then not only must that NSInteger have its access synchronized, but it must be also be synchronized in conjunction with all interactions with the cache object (e.g. the "add item to cache" and "remove item from cache" tasks, too). And, in these cases, since you'll synchronize all interaction with this broader object somehow (e.g. with GCD queue, locks, #synchronized directive, whatever), making the NSInteger property atomic then becomes redundant and therefore modestly less efficient.
Bottom line, in limited situations, atomic can provide thread-safety for fundamental data types, but frequently it is insufficient when viewed in a broader context.
You later say that you don't care about race conditions. For what it's worth, Apple argues that there is no such thing as a benign race. See WWDC 2016 video Thread Sanitizer and Static Analysis (about 14:40 into it).
Anyway, you suggest you are merely concerned whether the value can be corrupted or whether the app will crash:
I am wondering if I can get into a similar situation where I read a corrupt value when reading and writing from multiple threads (and the app will crash)?
The bottom line is that if you're reading from one thread while mutating on another, the behavior is simply undefined. It could vary. You are simply well advised to avoid this scenario.
In practice, it's a function of the target architecture. For example on 64-bit type (e.g. long long) on 32-bit x86 target, you can easily retrieve a corrupt value, where one half of the 64-bit value is set and the other is not. (See https://stackoverflow.com/a/34386935/1271826 for example.) This results in merely non-sensical, invalid numeric values when dealing with primitive types. For pointers to objects, this obviously would have catestrophic implications.
But even if you're in an environment where no problems are manifested, it's an incredibly fragile approach to eschew synchronization to achieve thread-safety. It could easily break when run on new, unanticipated hardware architectures or compiled under different configuration. I'd encourage you to watch that Thread Sanitizer and Static Analysis video for more information.

How to add memory to the heap at runtime?

I am using Keil's ARM-MDK 4.11. I have a statically allocated block of memory that is used only at startup. It is used before the scheduler is initialised and due to the way RL-RTX takes control of the heap-management, cannot be dynamically allocated (else subsequent allocations after the scheduler starts cause a hard-fault).
I would like to add this static block as a free-block to the system heap after the scheduler is initialised. It would seem that __Heap_ProvideMemory() might provide the answer, this is called during initialisation to create the initial heap. However that would require knowledge of the heap descriptor address, and I can find no documented method of obtaining that.
Any ideas?
I have raised a support request with ARM/Keil for this, but they are more interested in questioning why I would want to do this, and offering alternative solutions. I am well aware of the alternatives, but in this case if this could be done it would be the cleanest solution.
We use the Rowley Crossworks compiler but had a similar issue - the heap was being set up in the compiler CRT startup code. Unfortunately the SDRAM wasn't initialised till the start of main() and so the heap wasn't set up properly. I worked around it by reinitialising the heap at the start of main(), after the SDRAM was initialised.
I looked at the assembler code that the compiler uses at startup to work out the structure - it wasn't hard. Subsequently I have also obtained the malloc/free source code from Rowley - perhaps you could ask Keil for their version?
One method I've used is to incorporate my own simple heap routines and take over the malloc()/calloc()/free() functions from the library.
The simple, custom heap routines had an interface that allowed adding blocks of memory to the heap.
The drawback to this (at least in my case) was that the custom heap routines were far less sophisticated than the built-in library routines and were probably more prone to fragmentation than the built-in routines. That wasn't a serious issue in that particular application. If you want the capabilities of the built-in library routines, you could probably have your malloc() defer to the built-in heap routines until it returns a failure, then try to allocate from your custom heap.
Another drawback is that I found it much more painful to make sure the custom routines were bug-free than I thought it would be at first glance, even though I wasn't trying to do anything too fancy (just a simple list of free blocks that could be split on allocation and coalesced when freed).
The one benefit to this technique is that it's pretty portable (as long as your custom routines are portable) and doesn't break if the toolchain changes it's internals. The only part that requires porting is taking over the malloc()/free() interface and making sure you get initialized early enough.

Is BOOL read/write atomic in Objective C?

What happens when two threads set a BOOL to YES "at the same time"?
Here is code for solution suggested by Jacko.
Use volatile uint32_t with OSAtomicOr32Barrier and OSAtomicAnd32Barrier
#import <libkern/OSAtomic.h>
volatile uint32_t _IsRunning;
- (BOOL)isRunning {
return _IsRunning != 0;
}
- (void)setIsRunning:(BOOL)allowed {
if (allowed) {
OSAtomicOr32Barrier(1, & _IsRunning); //Atomic bitwise OR of two 32-bit values with barrier
} else {
OSAtomicAnd32Barrier(0, & _IsRunning); //Atomic bitwise AND of two 32-bit values with barrier.
}
}
No. Without a locking construct, reading/writing any type variable is NOT atomic in Objective C.
If two threads write YES at the same time to a BOOL, the result is YES regardless of which one gets in first.
Please see: Synchronizing Thread Execution
I would have to diverge from the accepted answer. Sorry.
While objective c does not guarantee that BOOL properties declared as nonatomic
are in fact atomic I'd have to guess that the hardware you most
care about (all iOS and macos devices) have instructions to perform byte reads and stores atomically. So, unless Apple comes out with Road Light OS running
on an IBM microcontroller that has 5 bit wide bus to send 10 bit bytes over
you could just as well use nonatomic BOOLs in a situation that calls for atomic BOOLs. The code would not be portable to Road Light OS but if you can
sacrifice the futureproofing of your code nonatomic is fine for this use case.
I'm sure there are hardened individuals on s.o. that would raise to the challenge of disassembling synthesized BOOL getter and setter for atomic/nonatomic cases to see what's the difference. At least on ARM.
Your takeaway from this is likely this
you can declare BOOL properties as atomic and it won't cost you a dime
on all HW iOS and macOS intrinsically supports.
memory barriers are orthogonal to atomicity
you most definitely should not use 4 byte properties to store booleans into
unless you are into [very] fuzzy logic.
It's idiotic and wasteful, you don't want to be a clone of a Java programmer,
who can't tell a float from a double, or do you?
BOOL variables (which do not obviously support atomic/nonatomic decorators
would not be atomic on some narrow bus architectures objective C would
not be used on anyway (microcontrollers with or without some [very] micro OS are C & assembly territory I suppose. they don't typically need the luggage
objc runtime would bring)
What happens when two threads set a BOOL to YES "at the same time"?
Then its value will be YES. If to threads write the same value to the same memory location, the memory location will have that value, whether that is atomic or not plays no role. It would only play a role if two threads write a different value to the same memory location or if one thread writes to it while another one is reading from it.
Is BOOL read/write atomic in Objective C?
It is if your hardware is a Macintosh running macOS. BOOL is uint32_t on PPC systems and char on Intel systems and writing these data types is atomic on their respective systems.
The Obj-C language makes no such guarantee, though. On other systems it depends on your compiler used and how BOOL is defined for that platfrom. Most compilers (gcc, clang, ...) guarantee that writing a variable of int-size is always atomic, whether other sizes are atomic depends on the CPU.
Note that atomic is not the same as thread-safe. Writing a BOOL is not a memory barrier. The compiler and the CPU may reorder instructions around a BOOL write:
a = 10;
b = YES;
c = 20;
There is no guarantee that instructions are executed in that order. The fact that b is YES does not mean that a is 10. The compiler and CPU are free to shuffle these three instructions around as desired since they don't depend on each other. Explicit atomic instructions as well as locks, mutexes and semaphores are usually memory barriers, that means they instruct the compiler and CPU to not move instructions located before that operation beyond it and not move instructions located after that operation before it (it's a hard border, that instructions may not pass).
Also cache consistency is not guaranteed. Even after you set a BOOL to YES, some other thread may still see it as NO for a limited amount of time. Memory barrier operations are usually also operations that ensure cache synchronization among all threads/cores/CPUs in the system.
And to add something really useful here as well, this is how you can ensure that setting a boolean value is atomic and acts as a memory barrier in 2020 using C11 which will also work in Obj-C Code:
#import <stdatomic.h>
// ...
volatile atomic_bool b = true;
// ...
atomic_store(&b, true);
// ...
atomic_store(&b, false);
Not only will this code guarantee atomic writes to the bool (for which the system will choose an appropriate type), it will also act as a memory barrier (Sequentially Consistent).
To read the boolean atomically from another thread, you'd use
bool x = atomic_load(&b);
You can also use atomic_load_explicit and atomic_store_explicit and pass an explicit memory order, which allows you to control more fine grained which kind of memory reordering is allowed and which ones is not.
Learn more about your possibilities here:
http://llvm.org/docs/Atomics.html
Always read "Notes for optimizers" to see which memory reordering is allowed. If in doubt, always use Sequentially Consistent (memory_order_seq_cst, which is the default if not specified). It will not result in fastest performance but it's the safest option and you really should only use something else if you know what you are doing.

How does it know where my value is in memory?

When I write a program and tell it int c=5, it puts the value 5 into a little bit of it's memory, but how does it remember which one? The only way I could think of would be to have another bit of memory to tell it, but then it would have to remember where it kept that as well, so how does it remember where everything is?
Your code gets compiled before execution, at that step your variable will be replaced by the actual reference of the space where the value will be stored.
This at least is the general principle. In reality it will be way more complecated, but still the same basic idea.
There are lots of good answers here, but they all seem to miss one important point that I think was the main thrust of the OP's question, so here goes. I'm talking about compiled languages like C++, interpreted ones are much more complex.
When compiling your program, the compiler examines your code to find all the variables. Some variables are going to be global (or static), and some are going to be local. For the static variables, it assigns them fixed memory addresses. These addresses are likely to be sequential, and they start at some specific value. Due to the segmentation of memory on most architectures (and the virtual memory mechanisms), every application can (potentially) use the same memory addresses. Thus, if we assume the memory space programs are allowed to use starts at 0 for our example, every program you compile will put the first global variable at location 0. If that variable was 4 bytes, the next one would be at location 4, etc. These won't conflict with other programs running on your system because they're actually being mapped to an arbitrary sequential section of memory at run time. This is why it can assign a fixed address at compile time without worrying about hitting other programs.
For local variables, instead of being assigned a fixed address, they're assigned a fixed address relative to the stack pointer (which is usually a register). When a function is called that allocates variables on the stack, the stack pointer is simply moved by the required number of bytes, creating a gap in the used bytes on the stack. All the local variables are assigned fixed offsets to the stack pointer that put them into that gap. Every time a local variable is used, the real memory address is calculated by adding the stack pointer and the offset (neglecting caching values in registers). When the function returns, the stack pointer is reset to the way it was before the function was called, thus the entire stack frame including local variables is free to be overwritten by the next function call.
read Variable (programming) - Memory allocation:
http://en.wikipedia.org/wiki/Variable_(programming)#Memory_allocation
here is the text from the link (if you don't want to actually go there, but you are missing all the links within the text):
The specifics of variable allocation
and the representation of their values
vary widely, both among programming
languages and among implementations of
a given language. Many language
implementations allocate space for
local variables, whose extent lasts
for a single function call on the call
stack, and whose memory is
automatically reclaimed when the
function returns. (More generally, in
name binding, the name of a variable
is bound to the address of some
particular block (contiguous sequence)
of bytes in memory, and operations on
the variable manipulate that block.
Referencing is more common for
variables whose values have large or
unknown sizes when the code is
compiled. Such variables reference the
location of the value instead of the
storing value itself, which is
allocated from a pool of memory called
the heap.
Bound variables have values. A value,
however, is an abstraction, an idea;
in implementation, a value is
represented by some data object, which
is stored somewhere in computer
memory. The program, or the runtime
environment, must set aside memory for
each data object and, since memory is
finite, ensure that this memory is
yielded for reuse when the object is
no longer needed to represent some
variable's value.
Objects allocated from the heap must
be reclaimed—especially when the
objects are no longer needed. In a
garbage-collected language (such as
C#, Java, and Lisp), the runtime
environment automatically reclaims
objects when extant variables can no
longer refer to them. In
non-garbage-collected languages, such
as C, the program (and the programmer)
must explicitly allocate memory, and
then later free it, to reclaim its
memory. Failure to do so leads to
memory leaks, in which the heap is
depleted as the program runs, risking
eventual failure from exhausting
available memory.
When a variable refers to a data
structure created dynamically, some of
its components may be only indirectly
accessed through the variable. In such
circumstances, garbage collectors (or
analogous program features in
languages that lack garbage
collectors) must deal with a case
where only a portion of the memory
reachable from the variable needs to
be reclaimed
There's a multi-step dance that turns c = 5 into machine instructions to update a location in memory.
The compiler generates code in two parts. There's the instruction part (load a register with the address of C; load a register with the literal 5; store). And there's a data allocation part (leave 4 bytes of room at offset 0 for a variable known as "C").
A "linking loader" has to put this stuff into memory in a way that the OS will be able to run it. The loader requests memory and the OS allocates some blocks of virtual memory. The OS also maps the virtual memory to physical memory through an unrelated set of management mechanisms.
The loader puts the data page into one place and instruction part into another place. Notice that the instructions use relative addresses (an offset of 0 into the data page). The loader provides the actual location of the data page so that the instructions can resolve the real address.
When the actual "store" instruction is executed, the OS has to see if the referenced data page is actually in physical memory. It may be in the swap file and have to get loaded into physical memory. The virtual address being used is translated to a physical address of memory locations.
It's built into the program.
Basically, when a program is compiled into machine language, it becomes a series of instructions. Some instructions have memory addresses built into them, and this is the "end of the chain", so to speak. The compiler decides where each variable will be and burns this information into the executable file. (Remember the compiler is a DIFFERENT program to the program you are writing; just concentrate on how your own program works for the moment.)
For example,
ADD [1A56], 15
might add 15 to the value at location 1A56. (This instruction would be encoded using some code that the processor understands, but I won't explain that.)
Now, other instructions let you use a "variable" memory address - a memory address that was itself loaded from some location. This is the basis of pointers in C. You certainly can't have an infinite chain of these, otherwise you would run out of memory.
I hope that clears things up.
I'm going to phrase my response in very basic terminology. Please don't be insulted, I'm just not sure how proficient you already are and want to provide an answer acceptable to someone who could be a total beginner.
You aren't actually that far off in your assumption. The program you run your code through, usually called a compiler (or interpreter, depending on the language), keeps track of all the variables you use. You can think of your variables as a series of bins, and the individual pieces of data are kept inside these bins. The bins have labels on them, and when you build your source code into a program you can run, all of the labels are carried forward. The compiler takes care of this for you, so when you run the program, the proper things are fetched from their respective bin.
The variables you use are just another layer of labels. This makes things easier for you to keep track of. The way the variables are stored internally may have very complex or cryptic labels on them, but all you need to worry about is how you are referring to them in your code. Stay consistent, use good variable names, and keep track of what you're doing with your variables and the compiler/interpreter takes care of handling the low level tasks associated with that. This is a very simple, basic case of variable usage with memory.
You should study pointers.
http://home.netcom.com/~tjensen/ptr/ch1x.htm
Reduced to the bare metal, a variable lookup either reduces to an address that is some statically known offset to a base pointer held in a register (the stack pointer), or it is a constant address (global variable).
In an interpreted language, one register if often reserved to hold a pointer to a data structure (the "environment") that associates variable names with their current values.
Computers ultimately only undertand on and off - which we conveniently abstract to binary. This language is the basest level and is called machine language. I'm not sure if this is folklore - but some programmers used to (or maybe still do) program directly in machine language. Typing or reading in binary would be very cumbersome, which is why hexadecimal is often used to abbreviate the actual binary.
Because most of us are not savants, machine language is abstracted into assembly language. Assemply is a very primitive language that directly controls memory. There are a very limited number of commands (push/pop/add/goto), but these ultimately accomplish everything that is programmed. Different machine architectures have different versions of assembly, but the gist is that there are a few dozen key memory registers (physically in the CPU) - in a x86 architecture they are EAX, EBX, ECX, EDX, ... These contain data or pointers that the CPU uses to figure out what to do next. The CPU can only do 1 thing at a time and it uses these registers to figure out what to do next. Computers seem to be able to do lots of things simultaneously because the CPU can process these instructions very quickly - (millions/billions instructions per second). Of course, multi-core processors complicate things, but let's not go there...
Because most of us are not smart or accurate enough to program in assembly where you can easily crash the system, assembly is further abstracted into a 3rd generation language (3GL) - this is your C/C++/C#/Java etc... When you tell one of these languages to put the integer value 5 in a variable, your instructions are stored in text; the assembler compiles your text into an assembly file (executable); when the program is executed, the program and its instructions are queued by the CPU, when it is show time for that specific line of code, it gets read in the the CPU register and processed.
The 'not smart enough' comments about the languages are a bit tongue-in-cheek. Theoretically, the further you get away from zeros and ones to plain human language, the more quickly and efficiently you should be able to produce code.
There is an important flaw here that a few people make, which is assuming that all variables are stored in memory. Well, unless you count the CPU registers as memory, then this won't be completely right. Some compilers will optimize the generated code and if they can keep a variable stored in a register then some compilers will make use of this!
Then, of course, there's the complex matter of heap and stack memory. Local variables can be located in both! The preferred location would be in the stack, which is accessed way more often than the heap. This is the case for almost all local variables. Global variables are often part of the data segment of the final executable and tend to become part of the heap, although you can't release these global memory areas. But the heap is often used for on-the-fly allocations of new memory blocks, by allocating memory for them.
But with Global variables, the code will know exactly where they are and thus write their exact location in the code. (Well, their location from the beginning of the data segment anyways.) Register variables are located in the CPU and the compiler knows exactly which register, which is also just told to the code. Stack variables are located at an offset from the current stack pointer. This stack pointer will increase and decrease all the time, depending on the number of levels of procedures calling other procedures.
Only heap values are complex. When the application needs to store data on the heap, it needs a second variable to store it's address, otherwise it could lose track. This second variable is called a pointer and is located as global data or as part of the stack. (Or, on rare occasions, in the CPU registers.)
Oh, it's even a bit more complex than this, but already I can see some eyes rolling due to this information overkill. :-)
Think of memory as a drawer into which you decide how to devide it according to your spontaneous needs.
When you declare a variable of type integer or any other type, the compiler or interpreter (whichever) allocates a memory address in its Data Segment (DS register in assembler) and reserves a certain amount of following addresses depending on your type's length in bit.
As per your question, an integer is 32 bits long, so, from one given address, let's say D003F8AC, the 32 bits following this address will be reserved for your declared integer.
On compile time, whereever you reference your variable, the generated assembler code will replace it with its DS address. So, when you get the value of your variable C, the processor queries the address D003F8AC and retrieves it.
Hope this helps, since you already have much answers. :-)

Locking details of synthesized atomic #properties in Obj-C 2.0

The documentation for properties in Obj-C 2.0 say that atomic properties use a lock internally, but it doesn't document the specifics of the lock. Does anybody know if this is a per-property lock, a per-object lock separate from the implicit one used by #synchronized(self), or the equivalent of #synchronized(self)?
Looking at the generated code (iOS SDK GCC 4.0/4.2 for ARM),
32-bit assign properties (including struct {int32_t v;}) are accessed directly.
Larger-than-32-bit structs are accessed with objc_copyStruct().
double and int64_t are accessed with objc_copyStruct, except on GCC 4.0 where they're accessed directly with stmia/ldmia (I'm not sure if this is guaranteed to be atomic in case of interrupts).
retain/copy accessors call objc_getProperty and objc_setProperty.
Cocoa with Love: Memory and thread-safe custom property methods gives some details on how they're implemented in runtime version objc4-371.2; obviously the precise implementation can vary between runtimes (for example, on some platforms you can use atomic swap/CAS to spin on the ivar itself instead of using another lock).
The lock used by atomic #properties is an implementation detail--for appropriate types on appropriate platforms, atomic operations without a lock are possible and I'd be surprised if Apple was not taking advantage of them. There is no public access to the lock in any case, so you can't #synchronize on the same lock. Several Apple engineers have pointed out that atomic properties do not guarantee thread safety; atomic properties only guarantee that gets/sets of that value are atomic. For correct thread safety, you will have to make use of higher-level locking or synchronization and you almost certainly would not want to use the same lock as the synthesize getter/setter(s) might be using.