Storing things in isa - objective-c

The 64-bit runtime took away the ability to directly access the isa field of an object, something Clang engineers had been warning us about for a while. It has been replaced by a rather inventive (and magic) set of ever-changing ABI rules about which sections of the newly christened isa header contain information about the object, or even other state (in the case of NSNumber/NSString). There seems to be a loophole, in that you can opt out of the new "magic" isa and use one of your own (a raw isa) at the expense of taking the slow road through certain runtime code paths.
My question is twofold, then:
If it's possible to opt out and object_setClass() an arbitrary class into an object in +allocWithZone:, is it also possible to put anything up there in the extra space with the class, or will the runtime try to read it through the fast paths?
What exactly in the isa header is tagged to let the runtime differentiate it from a normal isa?

If it's possible to opt out and object_setClass() an arbitrary class into an object in +allocWithZone:
According to this article by Greg Parker
If you override +allocWithZone:, you may initialize your object's isa field to a "raw" isa pointer. If you do, no extra data will be stored in that isa field and you may suffer the slow path through code like retain/release. To enable these optimizations, instead set the isa field to zero (if it is not already) and then call object_setClass().
So yes, you can opt out and manually set a raw isa pointer. To inform the runtime about this, you have to set the least significant bit of the isa to 0 (see below).
Also, there's an environment variable that you can set, named OBJC_DISABLE_NONPOINTER_ISA, which is pretty self-explanatory.
is it also possible to put anything up there in the extra space with the class, or will the runtime try to read it through the fast paths?
The extra space is not being wasted. It's used by the runtime for useful in-place information about the object, such as the current state and - most importantly - its retain count (this is a big improvement since it used to be fetched every time from an external hash table).
So no, you cannot use the extra space for your own purposes, unless you opt out (as discussed above). In that case the runtime will go through the long path, ignoring the information contained in the extra bits.
Again according to Greg Parker's article, here's the new layout of the isa (note that this is very likely to change over time, so don't trust it):
(LSB)
1 bit | indexed | 0 is raw isa, 1 is non-pointer isa.
1 bit | has_assoc | Object has or once had an associated reference. Object with no associated references can deallocate faster.
1 bit | has_cxx_dtor | Object has a C++ or ARC destructor. Objects with no destructor can deallocate faster.
30 bits | shiftcls | Class pointer's non-zero bits.
9 bits | magic | Equals 0xd2. Used by the debugger to distinguish real objects from uninitialized junk.
1 bit | weakly_referenced | Object is or once was pointed to by an ARC weak variable. Objects not weakly referenced can deallocate faster.
1 bit | deallocating | Object is currently deallocating.
1 bit | has_sidetable_rc | Object's retain count is too large to store inline.
19 bits | extra_rc | Object's retain count above 1. (For example, if extra_rc is 5 then the object's real retain count is 6.)
(MSB)
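The table above can be turned into shifts and masks. The following is a minimal sketch in C that decodes a 64-bit word using exactly the field widths listed; the type isa_fields and the function names are made up for illustration, and since the layout is not stable this is only useful for understanding the table, not for shipping code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the fields in the table (hypothetical struct). */
typedef struct {
    bool     indexed;
    bool     has_assoc;
    bool     has_cxx_dtor;
    uint64_t class_ptr;          /* shiftcls with its 3 low zero bits restored */
    uint32_t magic;
    bool     weakly_referenced;
    bool     deallocating;
    bool     has_sidetable_rc;
    uint32_t extra_rc;
} isa_fields;

/* Bit 0 clear means the word is a plain (raw) class pointer. */
static bool isa_is_raw(uint64_t isa) {
    return (isa & 1u) == 0;
}

/* Split a non-pointer isa into its fields, LSB first:
   1 + 1 + 1 + 30 + 9 + 1 + 1 + 1 + 19 = 64 bits. */
static isa_fields isa_decode(uint64_t isa) {
    isa_fields f;
    f.indexed           = (isa >> 0) & 1;
    f.has_assoc         = (isa >> 1) & 1;
    f.has_cxx_dtor      = (isa >> 2) & 1;
    f.class_ptr         = ((isa >> 3) & ((UINT64_C(1) << 30) - 1)) << 3;
    f.magic             = (uint32_t)((isa >> 33) & 0x1FF);
    f.weakly_referenced = (isa >> 42) & 1;
    f.deallocating      = (isa >> 43) & 1;
    f.has_sidetable_rc  = (isa >> 44) & 1;
    f.extra_rc          = (uint32_t)((isa >> 45) & 0x7FFFF);
    return f;
}

/* Retain count implied by extra_rc; only meaningful for a non-pointer isa. */
static uint32_t isa_inline_retain_count(isa_fields f) {
    assert(f.indexed);
    return f.extra_rc + 1;
}
```

Because shiftcls holds 30 bits and the three low bits of a class pointer are zero, the recovered class pointer covers a 33-bit address space.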
What exactly in the isa header is tagged to let the runtime differentiate it from a normal isa?
As noted above, you can discriminate between a raw isa and a new rich isa by looking at the least significant bit.
To wrap it up, while it looks feasible to opt out and start messing with the extra bits available on a 64-bit architecture, I personally discourage it. The new isa layout is carefully crafted to optimize runtime performance, and it's far from guaranteed to stay the same over time.
Apple may also decide in the future to drop backward compatibility with the raw isa representation, preventing opt-out. Any code assuming the isa to be a pointer would then break.

You can't safely do this, since if (when, really) the usable address space expands beyond 33 bits, the layout will presumably need to change again. Currently though, the bottom bit of the isa controls whether it's treated as having extra info or not.

Related

does Java allocate memory for objects without instance variables?

Please help with the following question.
Assume that I have a class that contains only methods. Will space in the heap be allocated for objects created from this class? If yes, then what does it contain?
The question linked by Fairoz contains most relevant data, but I'll try to narrow information to your case.
Yes. The JVM will take a contiguous space off the heap to store these objects.
The contents are specific to the JVM implementation. In HotSpot, you can see the specifics in the source code.
There will be a machine word called the "mark" word, which is defined here, and is used to keep the hashCode, locking state, and garbage collection information. On a 64-bit JVM this takes 8 bytes.
Next will be a pointer to the Klass, which contains information about the class, such as methods.
If you're on a 64-bit JVM with compressed oops enabled (the default on Java 8), the Klass pointer will take only 4 bytes. Since you have no fields, the total size is 12 bytes. However, the JVM aligns objects to 8 bytes, so your object will get 4 bytes of padding. In total, 16 bytes.
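The arithmetic above is just "add up the header, then round up to the alignment". A minimal sketch of that calculation, written in C for illustration only; the helper names are made up, and the 8-byte alignment is an assumption matching the HotSpot default.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of alignment (must be a power of two). */
static size_t align_up(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: total object size from mark word, Klass pointer
   and field bytes, assuming objects are aligned to 8 bytes. */
static size_t object_size(size_t mark_bytes, size_t klass_bytes, size_t field_bytes) {
    return align_up(mark_bytes + klass_bytes + field_bytes, 8);
}
```

With an 8-byte mark word, a 4-byte Klass pointer and no fields, object_size(8, 4, 0) gives 16. An uncompressed 8-byte Klass pointer also gives 16, since the header then fills two full words with no padding.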
Some useful documentation:
- https://www.infoq.com/articles/Introduction-to-HotSpot
- https://psy-lob-saw.blogspot.com.es/2013/05/know-thy-java-object-memory-layout.html

Is there any reason to use NSInteger instead of uint8_t with NS_ENUM?

The general standard appears to be to use NS_ENUM with NSInteger as the base type. Why is this the case? Assuming fewer than 256 cases (which covers almost any enumeration), is there any reason to use that instead of uint8_t, which could use less memory space? Either imports into Swift fine.
This is different than NS_OPTIONS, where a larger type makes sense, since you shouldn't be doing any bit math with enumerations, and you can use every number representable by the base type as a value.
The answer to the question in the title:
Is there any reason to use NSInteger instead of uint8_t with NS_ENUM?
is probably not.
When declaring an enum in C, if no underlying type is specified, the compiler is free to choose any suitable type from char and the signed and unsigned integer types, as long as it can represent all the values required. The current Xcode/Clang compiler picks a 4-byte integer. One could reasonably assume the compiler writers made an informed choice - some balance of performance and storage.
Smaller types, such as uint8_t, will usually be aligned on smaller boundaries in memory (or on disk), but that is only of benefit if the adjacent field matches the alignment. For example, if a 2-byte field follows a 1-byte field then, unless otherwise specified (e.g. with #pragma pack), there will probably be an intervening unused byte.
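The intervening unused byte can be observed with offsetof and sizeof. A minimal sketch in plain C, assuming a typical ABI where uint16_t is 2-byte aligned; the struct names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 1-byte field followed by a 2-byte field: the compiler usually inserts
   one padding byte so that `wide` starts on a 2-byte boundary. */
struct padded {
    uint8_t  mode;
    uint16_t wide;
};

/* Two adjacent 1-byte fields need no padding between them. */
struct compact {
    uint8_t mode;
    uint8_t other;
};

static_assert(sizeof(struct compact) == 2, "two adjacent bytes need no padding");
```

On such an ABI, offsetof(struct padded, wide) is 2 and sizeof(struct padded) is 4, so the byte saved by the small field is given straight back as padding.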
Whether any performance or storage differences are significant will be heavily dependent on the application. Follow the usual rule of thumb - don't optimise until an issue is found.
However if you find semantic benefit in limiting the size then certainly do so - there is no general reason you shouldn't. The choice is similar to picking signed vs. unsigned integers, some programmers avoid unsigned types for values that will be ≥ 0 unless absolutely required for the extra range, while others appreciate the semantic benefit.
Summary: There is no right answer; it's largely a subjective issue.
HTH
First of all: the memory footprint is close to meaningless. You are talking about 1 byte vs. 4/8 bytes (if memory alignment does not force the usage of 4/8 bytes whatever you chose). How many NS_ENUM (C) objects do you want to have in your running app?
I guess that the reason is pretty simple: NSInteger is a kind of "catch-all" integer type in Cocoa. That makes assignments easier; in particular, you do not have to care about assigning a bigger integer type to a smaller one, which would lead to warnings without a cast.
Having more than one integer type in a desktop app with a 32/64-bit model is something of an anachronism. Neither a Mac nor a MacBook nor an iPhone is an embedded microcontroller …
You can use any integer data type, including uint8_t, with NS_ENUM, for example:
typedef NS_ENUM(uint8_t, eEnumAddEditViewMode) {
eWBEnumAddMode,
eWBEnumEditMode
};
By convention NSInteger is the default, because NSInteger is a kind of "catch-all" integer type in Objective-C, and developers can easily box and unbox values of that type in their own variables. This is just a developer-friendly best practice.

Memory addresses, pointers, variables, values - what goes on behind the scenes

This is going to be a pretty loaded question but ever since I started learning about pointers I've been very curious about what happens behind the scenes when a program is run.
As far as I know, computer memory is commonly thought of as a long strip of memory divided evenly into individual bytes. Certainly pictures such as the following evoke such a metaphor:
One thing I've been wondering, what do the memory addresses themselves represent? I'm sure it's no coincidence that memory addresses appear as 8 digit hexadecimal values (e.g. 00EB5748). Why is this?
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
Now suppose x is an unsigned int that occupies 2 bytes of memory (i.e. values ranging from 0 to 65535). When I declare x = 12, what is happening? What is it that I'm making equal to 12? When I draw conceptual diagrams, I usually have a box for an address (say &x) pointing to a variable (x) that occupies seemingly nothing, and I'm sure that can't be a fully accurate picture of what's going on.
And what's happening at the binary level? Is the address 00EB5748 treated as 111010110101011101001000 and storing a value of 12 somewhere, or 1100?
Mostly my confusion & curiosity stems from the relationship between memory addresses and actual values being declared (e.g. 12, 'a', -355.2). As another example, suppose our address 00EB5748 is pointing to a char 's' whose value is 115 according to ASCII charts. Is the address describing a position that stores the value 115 in 1 byte, by flipping the appropriate 1s and 0s at that position in memory?
Just open any book. You will see pages. Every page has a number. Consecutive pages are numbered with consecutive numbers. Do you have any confusion with numbered pages? I think not. Then you should not have any confusion with computer memory.
Books were the main memory storage devices before the computer era. Computer memory derived its basic concept from books: a book has pages -> computer memory has memory cells; a book has page numbers -> computer memory has memory addresses.
One thing I've been wondering, what do the memory addresses themselves represent?
Numbers. Every memory cell has a number, like every page in a book.
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
The memory manager marks some memory cells as occupied and tells the compiler the address of the first reserved cell. The compiler associates the name and type of the variable with this address. (This picture is from my head; it may be inaccurate.)
When I declare x = 12, what is happening?
When you declared the variable x, memory cells were reserved for it. Now you write 12 into these memory cells. Note that 12 is binary-coded in some way, depending on the type of x. If x is an unsigned int that occupies 2 memory cells, then one cell will contain 0 and the other will contain 12, because the binary integer representation of 12 is
0000 0000 0000 1100
|_______| |_______|
   cell      cell
If 12 is a floating-point number, it will be coded in another way.
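The two cells can be inspected directly. A minimal sketch in C, using uint16_t as the 2-byte unsigned type; the function names are made up for illustration. Which of the two cells comes first in memory depends on the machine's byte order, so bytes_in_memory only promises that one cell holds 0 and the other holds 12.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Most significant byte (cell) of a 16-bit value. */
static uint8_t high_byte(uint16_t value) {
    return (uint8_t)(value >> 8);
}

/* Least significant byte (cell) of a 16-bit value. */
static uint8_t low_byte(uint16_t value) {
    return (uint8_t)(value & 0xFF);
}

/* Copy the two cells exactly as they are laid out in memory. */
static void bytes_in_memory(uint16_t value, uint8_t out[2]) {
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}
```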
A memory address is simply the position of a given byte in memory. The zeroth byte is at 0x00000000. The tenth at 0x0000000A. The 65535th at 0x0000FFFF. And so on.
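That addresses are just byte positions can be checked with pointer arithmetic. A minimal sketch in C, with a made-up helper name; it assumes both pointers point into the same object, which is what the C standard requires for pointer subtraction.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes between two addresses inside the same object. */
static size_t byte_distance(const unsigned char *from, const unsigned char *to) {
    assert(to >= from);
    return (size_t)(to - from);
}
```

For a byte buffer buf, byte_distance(&buf[0], &buf[10]) is 10; for an int array, neighbouring elements are sizeof(int) bytes apart.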
Local variables live on the stack*. When compiling a block of code, the compiler counts how many bytes are needed to hold all the local variables, and then increments the stack pointer so that all the variables can fit below it (along with some other stuff like frame pointers and return addresses and whatnot). Then it just remembers that, for example, local variable x is at an offset -2 from the stack pointer, foo is at an offset -4 and so on, and uses those addresses whenever those variables are referenced in the following code.
Since the compiler knows that x is at address (stack pointer - 2), that's the location that is set to the value 12 when you do x = 12.
Not entirely sure if I understand this question, but say you want to read the memory at address 0x00EB5748. The control unit in the CPU reads the instruction, sees that it is a load instruction, and passes the address (in binary of course) to the load/store unit, along with some other junk like how many bytes to read. Then the LSU sends that address to some memory (probably L1 cache), and after a certain time gets the value 12 back. Then this data is available to, say, put in a register, or send to the ALU to do arithmetic, or whatever.
That seems to be accurate, yes. Going back to the first question, an address simply means "byte number 0xWHATEVER in memory".
Hope this clarified things a bit at least.
*I should probably explain the stack as well. A stack is a portion of memory reserved for local variables (and some other stuff). It starts at a fixed location in memory, and stops at the memory address contained in a special register called the stack pointer. To begin with, the stack is empty, so the stack pointer just contains the start of the stack. As you put more data on the stack, the SP is incremented. This means that you can always put more data on it simply by putting it at the address in the SP, and then incrementing the SP so that once again anything past that address is free memory.

Do integers, whose size is not a power of two, make sense?

This is an 8 bit architecture, with a word size of 16 bits. I now need to use a 48-bit integer variable. My understanding is that libm implements 8, 16, 32, 64 bit operations (addition, multiplication, signed and unsigned).
So in order to make calculations, I must store the value in a 64-bit signed or unsigned integer. Correct?
If so, what is there to prevent general routines from being used? For example, for addition:
1) start with the LSB of both variables
2) add them up (including any carry from the previous byte)
3) if more bytes are available continue, otherwise goto ready
4) shift both variables 1 byte to the right
5) goto 1)
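A general byte-wise routine along these lines is short to write. Below is a minimal sketch in C for an unsigned 48-bit value stored as 6 bytes, least significant byte first; the type uint48 and all function names are hypothetical. The loop indexes the bytes instead of shifting, and it propagates a carry from each byte into the next, which byte-wise addition needs in order to be correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UINT48_BYTES 6

/* Hypothetical 48-bit unsigned integer, least significant byte first. */
typedef struct {
    uint8_t bytes[UINT48_BYTES];
} uint48;

/* Add a and b byte by byte, carrying into the next byte.
   The result wraps modulo 2^48; the return value is the final carry. */
static uint8_t uint48_add(const uint48 *a, const uint48 *b, uint48 *sum) {
    assert(a != NULL && b != NULL && sum != NULL);
    unsigned carry = 0;
    for (size_t i = 0; i < UINT48_BYTES; i++) {
        unsigned partial = (unsigned)a->bytes[i] + b->bytes[i] + carry;
        sum->bytes[i] = (uint8_t)(partial & 0xFF);
        carry = partial >> 8;   /* 0 or 1 */
    }
    return (uint8_t)carry;
}

/* Conversion helpers, only used to make the example easy to check. */
static uint48 uint48_from_u64(uint64_t value) {
    uint48 out;
    for (size_t i = 0; i < UINT48_BYTES; i++) {
        out.bytes[i] = (uint8_t)(value >> (8 * i));
    }
    return out;
}

static uint64_t uint48_to_u64(const uint48 *value) {
    uint64_t out = 0;
    for (size_t i = 0; i < UINT48_BYTES; i++) {
        out |= (uint64_t)value->bytes[i] << (8 * i);
    }
    return out;
}
```

The cost of this approach is that the compiler sees an ordinary function call rather than a built-in operator.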
libm implements the routines for the standard sizes of types, and the compiler chooses the right one to use for each expression.
If you want to implement your own types, you can. If you want to use the usual operators, then you have to get into the compilation process to get the compiler to choose yours.
You could implement the operations as functions, say add(int48_t, int48_t), but then the compiler won't be able to do optimizations like constant folding, etc.
So, there is nothing stopping you from implementing your own custom compiler, but is it really necessary? Do you really need to save that space? If so, then go for it!
That is correct, saving a couple of bits is (in almost all cases) not worth the trouble of implementing your own logic.

Rule of thumb: size of boost archive in relation to original serialized object?

For reasons that I will gloss over, I need to set aside space of a fixed size, and then use boost serialization to store an object there. The choice of archive format is arbitrary, and portability is not a concern.
The class is fairly complex (members include fundamental types, arrays, pointers, and child classes) and guaranteed to grow over time.
Does anyone have worthwhile sizing guesstimates they trust? Space is important, but it's not at a premium. I'm looking for relatively simple answers like "2*(sizeof X) for binary" or "4 * number of members + 3*sizeof(X) if you like text archives".
Thanks
No responses, so here's what experimentation showed.
From our application, one class had ~190 members, sizeof(A) = 12704. That's a little shy of actual total size due to pointers.
Size of binary_oarchive was 13981 and text_oarchive was 21237. This was for default traits, and an archive with a half-dozen derived types registered too.
So, I'm going to use 2*sizeof(A) as an upper bound for a text archive, and maybe 1.5* for a binary.
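For reference, the measured ratios are about 1.10 for the binary archive (13981 / 12704) and about 1.67 for the text archive (21237 / 12704), so both chosen factors leave some headroom. A minimal sketch of the reservation arithmetic, written in C with a made-up helper name; the factor is passed as a fraction to stay in integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to reserve: object_size * num / den, rounded up (hypothetical helper). */
static size_t reserve_bytes(size_t object_size, size_t num, size_t den) {
    assert(den != 0);
    return (object_size * num + den - 1) / den;
}
```

With sizeof(A) = 12704, reserve_bytes(12704, 2, 1) gives 25408 bytes for a text archive and reserve_bytes(12704, 3, 2) gives 19056 bytes for a binary one. These bounds come from a single class, so they are worth re-measuring as the class grows.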