Is there any reason to use NSInteger instead of uint8_t with NS_ENUM? - objective-c

The general standard appears to be to use NS_ENUM with NSInteger as the base type. Why is this the case? Assuming fewer than 256 cases (which covers almost any enumeration), is there any reason to use that instead of uint8_t, which could use less memory? Either imports into Swift fine.
This is different from NS_OPTIONS, where a larger type makes sense: you shouldn't be doing any bit math with plain enumerations, and every number representable by the base type can be used as a value.
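For concreteness, a minimal sketch of what the two declarations look like and how their storage differs (the enum names here are hypothetical; the sizes assume a 64-bit Apple platform):

#import <Foundation/Foundation.h>

// Same cases, different underlying types.
typedef NS_ENUM(NSInteger, ColorDefault) {
    ColorDefaultRed,
    ColorDefaultGreen
};
typedef NS_ENUM(uint8_t, ColorCompact) {
    ColorCompactRed,
    ColorCompactGreen
};

int main(void) {
    NSLog(@"sizeof(ColorDefault) = %zu", sizeof(ColorDefault)); // 8 on arm64/x86_64
    NSLog(@"sizeof(ColorCompact) = %zu", sizeof(ColorCompact)); // 1
    return 0;
}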

The answer to the question in the title:
Is there any reason to use NSInteger instead of uint8_t with NS_ENUM?
is probably not.
When declaring an enum in C, if no underlying type is specified, the compiler is free to choose any suitable type from char and the signed and unsigned integer types that can represent all the required values. The current Xcode/Clang compiler picks a 4-byte integer. One could reasonably assume the compiler writers made an informed choice - some balance of performance and storage.
Smaller types, such as uint8_t, will usually be aligned on smaller boundaries in memory (or on disk) - but that is only of benefit if the adjacent field matches the alignment, e.g. if a field with a 2-byte type follows a field with a 1-byte type then, unless otherwise specified (e.g. with #pragma pack), there will probably be an intervening unused padding byte.
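To illustrate, here is a small C sketch of that padding (struct and field names are made up; the exact layout is implementation-defined, but this is the typical result without packing):

#include <stdint.h>
#include <stdio.h>

struct Record {
    uint8_t  mode;    // 1-byte field, e.g. an enum backed by uint8_t
    // the compiler typically inserts 1 padding byte here
    uint16_t length;  // 2-byte field, kept on a 2-byte boundary
};

int main(void) {
    printf("sizeof(struct Record) = %zu\n", sizeof(struct Record)); // commonly 4, not 3
    return 0;
}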
Whether any performance or storage differences are significant will be heavily dependent on the application. Follow the usual rule of thumb - don't optimise until an issue is found.
However if you find semantic benefit in limiting the size then certainly do so - there is no general reason you shouldn't. The choice is similar to picking signed vs. unsigned integers: some programmers avoid unsigned types for values that will be ≥ 0 unless absolutely required for the extra range, while others appreciate the semantic benefit.
Summary: There is no right answer, it's largely a subjective issue.
HTH

First of all: the memory footprint is close to completely meaningless. You are talking about 1 byte vs. 4/8 bytes (and that is only if memory alignment does not force the use of 4/8 bytes regardless of what you chose). How many NS_ENUM (C) values do you expect to have live in your running app?
I guess the reason is pretty simple: NSInteger is the "catch-all" integer type in Cocoa. That makes assignments easier; in particular, you do not have to worry about assigning a bigger integer type to a smaller one, which without casting would lead to warnings.
Having more than one integer type in a desktop app with a 32/64-bit model is something of an anachronism. Neither a Mac, nor a MacBook, nor an iPhone is an embedded microcontroller …

You can use any integer data type, including uint8_t, with NS_ENUM, for example:
typedef NS_ENUM(uint8_t, eEnumAddEditViewMode) {
    eWBEnumAddMode,
    eWBEnumEditMode
};
By the old C-style convention NSInteger is the default, because NSInteger is the "catch-all" integer type in Objective-C, and developers can easily box and unbox it with their own variables. It is just a developer-friendly best practice.

Related

Is it possible to get the native CPU size of an integer in Rust?

For fun, I'm writing a bignum library in Rust. My goal (as with most bignum libraries) is to make it as efficient as I can. I'd like it to be efficient even on unusual architectures.
It seems intuitive to me that a CPU will perform arithmetic faster on integers with the native number of bits for the architecture (i.e., u64 for 64-bit machines, u16 for 16-bit machines, etc.). As such, since I want to create a library that is efficient on all architectures, I need to take the target architecture's native integer size into account. The obvious way to do this would be to use the cfg attribute target_pointer_width. For instance, to define the smallest type which will always be able to hold more than the maximum native int size:
#[cfg(target_pointer_width = "16")]
type LargeInt = u32;
#[cfg(target_pointer_width = "32")]
type LargeInt = u64;
#[cfg(target_pointer_width = "64")]
type LargeInt = u128;
However, while looking into this, I came across this comment. It gives an example of an architecture where the native int size is different from the pointer width. Thus, my solution will not work for all architectures. Another potential solution would be to write a build script which codegens a small module which defines LargeInt based on the size of a usize (which we can acquire like so: std::mem::size_of::<usize>().) However, this has the same problem as above, since usize is based on the pointer width as well. A final obvious solution is to simply keep a map of native int sizes for each architecture. However, this solution is inelegant and doesn't scale well, so I'd like to avoid it.
So, my questions: is there a way to find the target's native int size, preferably before compilation, in order to reduce runtime overhead? Is this effort even worth it? That is, is there likely to be a significant difference between using the native int size as opposed to the pointer width?
It's generally hard (or impossible) to get compilers to emit optimal code for BigNum stuff; that's why https://gmplib.org/ has its low-level primitive functions (mpn_... docs) hand-written in assembly for various target architectures, with tuning for different micro-architectures, e.g. https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/core2/mul_basecase.asm for the general case of multi-limb * multi-limb numbers. And https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/coreisbr/aors_n.asm for mpn_add_n and mpn_sub_n (Add OR Sub = aors), tuned for the SandyBridge family, which doesn't have partial-flag stalls so it can loop with dec/jnz.
Understanding what kind of asm is optimal may be helpful when writing code in a higher-level language, although in practice you can't even get close to it, so it sometimes makes sense to use a different technique, like only using values up to 2^30 in 32-bit integers (like CPython does internally, getting the carry-out via a right shift; see the section about Python in this). In Rust you do have access to overflowing_add to get the carry-out, but using it is still hard.
For practical use, writing Rust bindings for GMP is probably your best bet, unless that already exists.
Using the largest chunks possible is very good; on all current CPUs, add reg64, reg64 has the same throughput and latency as add reg32, reg32 or reg8. So you get twice as much work done per unit. And carry propagation through 64 bits of result in 1 cycle of latency.
(There are alternate ways to store BigInteger data that can make SIMD useful; @Mysticial explains in Can long integer routines benefit from SSE?. e.g. 30 value bits per 32-bit int, allowing you to defer normalization until after a few addition steps. But every use of such numbers has to be aware of these issues so it's not an easy drop-in replacement.)
In Rust, you probably want to just use u64 regardless of the target, unless you really care about small-number (single-limb) performance on 32-bit targets. Let the compiler build u64 operations for you out of add / adc (add with carry).
The only thing that might need to be ISA-specific is if u128 is not available on some targets. You want to use 64 * 64 => 128-bit full multiply as your building block for multiplication; if the compiler can do that for you with u128 then that's great, especially if it inlines efficiently.
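For reference, the same building block expressed in plain C via the GCC/Clang __int128 extension on 64-bit targets (a sketch; the function name is made up):

#include <stdint.h>

// 64 x 64 -> 128-bit full multiply, split back into low and high 64-bit halves.
static inline void mul_64x64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
}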
See also discussion in comments under the question.
One stumbling block for getting compilers to emit efficient BigInt addition loops (even inside the body of one unrolled loop) is writing an add that takes a carry input and produces a carry output. Note that x += 0xff..ff + carry=1 needs to produce a carry out even though 0xff..ff + 1 wraps to zero. So in C or Rust, x += y + carry has to check for carry out in both the y+carry and the x+= parts.
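A plain-C sketch of that double check (names are illustrative); this is essentially the pattern a compiler would have to recognize and collapse into an adc:

#include <stdint.h>

// One limb of a BigInt addition: returns x + y + carry_in and reports carry_out.
static inline uint64_t add_limb(uint64_t x, uint64_t y,
                                unsigned carry_in, unsigned *carry_out) {
    uint64_t t = y + carry_in;       // may wrap to 0 when y == UINT64_MAX and carry_in == 1
    unsigned c1 = (t < y);           // carry from the y + carry_in part
    uint64_t sum = x + t;
    unsigned c2 = (sum < x);         // carry from the x + t part
    *carry_out = c1 | c2;            // at most one of the two can be set
    return sum;
}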
It's really hard (probably impossible) to convince compiler back-ends like LLVM to emit a chain of adc instructions. An add/adc is doable when you don't need the carry-out from adc. Or probably if the compiler is doing it for you for u128.overflowing_add
Often compilers will turn the carry flag into a 0 / 1 in a register instead of using adc. You can hopefully avoid that for at least pairs of u64 in addition by combining the input u64 values to u128 for u128.overflowing_add. That will hopefully not cost any asm instructions because a u128 already has to be stored across two separate 64-bit registers, just like two separate u64 values.
So combining up to u128 could just be a local optimization for a function that adds arrays of u64 elements, to get the compiler to suck less.
In my library ibig what I do is:
Select architecture-specific size based on target_arch.
If I don't have a value for an architecture, select 16, 32 or 64 based on target_pointer_width.
If target_pointer_width is not one of these values, use 64.

Why does the C standard provide unsized types (int, long long, char vs. int32_t, int64_t, uint8_t etc.)?

Why weren't the contents of stdint.h the standard when it was included in the standard (no int, no short, no float, but int32_t, int16_t, float32_t etc.)? What advantage did/do ambiguous type sizes provide?
In objective-C, why was it decided that CGFloat, NSInteger, NSUInteger have different sizes on different platforms?
When C was designed, there were computers with different word sizes. Not just multiples of 8, but other sizes like the 18-bit word size on the PDP-7. So sometimes an int was 16 bits, but maybe it was 18 bits, or 32 bits, or some other size entirely. On a Cray-1 an int was 64 bits. As a result, int meant "whatever is convenient for this computer, but at least 16 bits".
That was about forty years ago. Computers have changed, so it certainly looks odd now.
NSInteger is used to denote the computer's word size, since it makes no sense to ask for the 5 billionth element of an array on a 32-bit system, but it makes perfect sense on a 64-bit system.
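That is visible in how the type is declared; simplified (the real header adds a few more conditions), Foundation's NSObjCRuntime.h defines it roughly like this:

#if __LP64__
typedef long NSInteger;             // 8 bytes on 64-bit platforms
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;              // 4 bytes on 32-bit platforms
typedef unsigned int NSUInteger;
#endif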
I can't speak for why CGFloat is a double on 64-bit system. That baffles me.
C is meant to be portable from embedded devices, over your phone, to desktops, mainframes and beyond. These don't have the same base types, e.g. the latter may have uint128_t where others don't. Writing code with fixed-width types would severely restrict portability in some cases.
This is why, by preference, you should use neither uintX_t nor int, long, etc., but the semantic typedefs such as size_t and ptrdiff_t. These are really the ones that make your code portable.
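A small C example of what that looks like in practice (nothing here assumes a particular platform's integer widths):

#include <stddef.h>
#include <stdio.h>

int main(void) {
    int values[] = { 3, 1, 4, 1, 5 };
    size_t count = sizeof values / sizeof values[0];  // size_t: object sizes and indices
    for (size_t i = 0; i < count; i++) {
        printf("%d ", values[i]);
    }
    ptrdiff_t span = &values[count - 1] - &values[0]; // ptrdiff_t: pointer differences
    printf("\nspan = %td\n", span);
    return 0;
}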

Handling magic constants during 64-bit migration

I confess I did something dumb and it now bites me. I used a magic number constant defined as NSUIntegerMax to define a special case index. The value is normally used as index to access selected item in NSArray. In the special case, denoted by the magic number I get the value from elsewhere, instead of from the array.
This index value is serialized in User Defaults as NSNumber.
With Xcode 5.1 my iOS app gets compiled with the standard architectures, which now also include arm64. This changed the value of NSUIntegerMax, so now after deserialization I get the 32-bit value of NSUIntegerMax, which no longer matches in comparisons with the magic number, whose value is now the 64-bit NSUIntegerMax. And it results in an NSRangeException with reason: -[__NSArrayI objectAtIndex:]: index 4294967295 beyond bounds [0 .. 10].
It is a minor issue in my code; given the normal range of that array is small, I might just get away with redefining my magic number as 4294967295. But it doesn't feel right. How should I have handled this issue properly?
I guess avoiding the magic number altogether would be the most robust approach?
Note
I think the problem with my magic number is roughly equivalent to what happened to the NSNotFound constant. Apple's 64-bit Transition Guide for Cocoa Touch says in the section about Common Type-Conversion Problems in Cocoa Touch:
Working with constants defined in the framework as NSInteger. Of particular note is the NSNotFound constant. In the 64-bit runtime, its value is larger than the maximum range of an int type, so truncating its value often causes errors in your app.
… but it does not say what should be done, except to be careful ;-)
If you use NSInteger/NSUInteger it's 4 bytes on a 32-bit OS and 8 bytes on a 64-bit OS.
If you want to use the same size integer on both OSs you should consider using int (4 bytes), long long (8 bytes), or the fixed-width int32_t/int64_t. For the maximum value use the matching limits constant:
INT_MAX
// or LLONG_MAX / INT32_MAX
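One way to make the sentinel architecture-independent while still accepting values written by an old 32-bit build (a sketch; the key, function, and constant names are hypothetical):

#import <Foundation/Foundation.h>
#include <stdint.h>

// UINT32_MAX has the same value on every architecture, unlike NSUIntegerMax.
static NSString * const kSelectedIndexKey = @"selectedItemIndex";
static const NSUInteger kSpecialItemIndex = (NSUInteger)UINT32_MAX;

static NSUInteger loadSelectedIndex(void) {
    NSNumber *stored = [[NSUserDefaults standardUserDefaults] objectForKey:kSelectedIndexKey];
    if (stored == nil) {
        return kSpecialItemIndex;
    }
    NSUInteger value = stored.unsignedIntegerValue;
    // A value saved by a 32-bit build (old NSUIntegerMax) and one saved by a
    // 64-bit build (NSUIntegerMax) both map back to the same sentinel.
    if (value == (NSUInteger)UINT32_MAX || value == NSUIntegerMax) {
        return kSpecialItemIndex;
    }
    return value;
}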

Do integers, whose size is not a power of two, make sense?

This is an 8-bit architecture with a word size of 16 bits. I now need to use a 48-bit integer variable. My understanding is that libm implements 8-, 16-, 32- and 64-bit operations (addition, multiplication, signed and unsigned).
So in order to make calculations, I must store the value in a 64-bit signed or unsigned integer. Correct?
If so, what is there to prevent general routines from being used? For example, for addition (a C sketch of this loop follows the steps):
start with the LSB of both variables, then:
1. add them up (together with the carry from the previous round)
2. if more bytes are available continue, otherwise goto ready
3. shift both variables 1 byte to the right
4. goto 1)
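Such a loop might look roughly like this in C (little-endian byte order and the function name are assumptions made for illustration):

#include <stddef.h>
#include <stdint.h>

// Adds the n-byte little-endian integer b into the n-byte little-endian integer a.
// For a 48-bit value, n would be 6. Returns the final carry out of the top byte.
static uint8_t add_bytes(uint8_t *a, const uint8_t *b, size_t n) {
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {         // start at the LSB of both variables
        unsigned sum = a[i] + b[i] + carry;  // add them up, plus the previous carry
        a[i]  = (uint8_t)sum;                // keep the low 8 bits
        carry = sum >> 8;                    // propagate the carry to the next byte
    }
    return (uint8_t)carry;
}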
libm implements the routines for the standard sizes of types, and the compiler chooses the right one to use for each expression.
If you want to implement your own types, you can. If you want to use the usual operators, then you have to get into the compilation process to get the compiler to choose yours.
You could implement the operations as functions, say add(int48_t, int48_t), but then the compiler won't be able to do optimizations like constant folding, etc.
So, there is nothing stopping you from implementing your own custom compiler, but is it really necessary? Do you really need to save that space? If so, then go for it!
That is correct; saving a couple of bits is (in almost all cases) not worth the trouble of implementing your own logic.

How are the digits in ObjC method type encoding calculated?

This is a follow-up to my previous question:
What are the digits in an ObjC method type encoding string?
Say there is an encoding:
v24#0:4:8#12B16#20
How are those numbers calculated? B is a char so it should occupy just 1 byte (not 4 bytes). Does it have something to do with "alignment"? What is the size of void?
Is it correct to calculate the numbers as follows? Ask sizeof on every item and round up the result to multiple of 4? And the first number becomes the sum of all the other ones?
The numbers were used in the m68K days to denote stack layout. That is, you could literally decode the method signature and, for just about all types, know exactly which bytes at what offset within the stack frame you could diddle to get/set arguments.
This worked because the m68K's ABI was entirely [IIRC -- been a long, long time] stack-based argument/return passing. There wasn't anything shoved into registers across call boundaries.
However, as Objective-C was ported to other platforms, always-on-the-stack was no longer the calling convention. Arguments and return values are often passed in registers.
Thus, those offsets are now useless. As well, the type encoding used by the compiler is no longer complete (because it never was terribly useful) and there will be types that won't be encoded. Not to mention that encoding some C++ templatized types yields method type encoding strings that can be many kilobytes in size (I think the record I ran into was around 30K of type information).
So, no, it isn't correct to use sizeof() to generate the numbers because they are effectively meaningless to everything. The only reason why they still exist is for binary compatibility; there are bits of esoteric code here and there that still parse the type encoding string with the expectation that there will be random numbers sprinkled here and there.
Note that there are vestiges of API in the ObjC runtime that still lead one to believe that it might be possible to encode/decode stack frames on the fly. It really isn't as the C ABI doesn't guarantee that argument registers will be preserved across call boundaries in the face of optimization. You'd have to drop to assembly and things get ugly really really fast (>shudder<).
The full encoding string is constructed (in clang) by the method ASTContext::getObjCEncodingForMethodDecl, which you can find in lib/AST/ASTContext.cpp.
The method that does the size rounding is ASTContext::getObjCEncodingTypeSize, in the same file. It forces each size to be at least the size of an int. On all of Apple's current platforms, an int is 4 bytes.
The stack frame size and argument offsets are calculated by the compiler. I'm actually trying to track this down in the Clang source myself this week; it possibly has something to do with CodeGenTypes::arrangeObjCMessageSendSignature. (Looks like Rob just made my life a lot easier!)
The first number is the sum of the others, yes -- it's the total space occupied by the arguments. To get the size of the type represented by an ObjC type encoding in your code, you should use NSGetSizeAndAlignment().
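For example, a short sketch of walking a (digit-free) list of argument type encodings with NSGetSizeAndAlignment(); the encoding string used here is made up:

#import <Foundation/Foundation.h>

int main(void) {
    // Hypothetical argument types: Class, SEL, object, bool, object.
    const char *types = "#:@B#";
    const char *p = types;
    while (*p) {
        NSUInteger size = 0, align = 0;
        // Parses one type encoding, fills in its size and alignment, and
        // returns a pointer just past it.
        p = NSGetSizeAndAlignment(p, &size, &align);
        NSLog(@"size=%lu align=%lu", (unsigned long)size, (unsigned long)align);
    }
    return 0;
}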