Why is CGFloat float on 32 bit and double on 64 bit?

From "CoreGraphics/CGBase.h":
#if defined(__LP64__) && __LP64__
# define CGFLOAT_TYPE double
# define CGFLOAT_IS_DOUBLE 1
# define CGFLOAT_MIN DBL_MIN
# define CGFLOAT_MAX DBL_MAX
#else
# define CGFLOAT_TYPE float
# define CGFLOAT_IS_DOUBLE 0
# define CGFLOAT_MIN FLT_MIN
# define CGFLOAT_MAX FLT_MAX
#endif
Why did Apple do this? What's the advantage?
I can only seem to think of downsides. Please enlighten me.

Apple explicitly says they did it "to provide a wider range and accuracy for graphical quantities." You can debate whether the wider range and accuracy have been really helpful in practice, but Apple is clear on what they were thinking.
It's worth remembering, BTW, that CGFloat was added in OS X 10.5, long before iPhones (and certainly long before 64-bit iPhones). Going 64-bit is more obviously beneficial on "big memory" machines like Macs. And Apple made "local architecture" types that were supposed to make it easier to transition between the "old" and "new" worlds.
I think it's interesting that Swift brought over NSInteger as the default Int type (i.e. Int is architecture-specific), but made Float and Double architecture-independent. There is no equivalent of CGFloat in the language. I read this as a tacit acknowledgement that CGFloat wasn't the greatest idea.
Also note that NEON only supports single-precision floating-point math; double-precision math has to be done on the VFP. (Not that NEON was a consideration when CGFloat was invented.)

It's a performance thing.
On a 32-bit CPU, a single precision, 32-bit float can be stored in a single register, and moved around quickly and efficiently, because it's the same size as an architecture-native pointer.
On a 64-bit CPU architecture, a 64-bit IEEE double has the same advantage of being the same size as a native pointer/register/etc.
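A minimal sketch of that point, using a plain-C stand-in for CGFloat so it compiles without the CoreGraphics framework (CGFloatLike is a made-up name; the #if mirrors the CGBase.h excerpt quoted in the question):
#include <stdio.h>

/* Mirrors the CGBase.h definition quoted above; CGFloatLike is a
   stand-in name, not an Apple type. */
#if defined(__LP64__) && __LP64__
typedef double CGFloatLike;
#else
typedef float CGFloatLike;
#endif

int main(void) {
    /* On an LP64 build both sizes print 8; on a 32-bit build both print 4. */
    printf("sizeof(CGFloatLike) = %zu, sizeof(void *) = %zu\n",
           sizeof(CGFloatLike), sizeof(void *));
    return 0;
}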

Related

gpu optimization when multiplying by powers of 2 in a shader

Do modern GPUs optimize multiplication by powers of 2 by doing a bit shift? For example suppose I do the following in a shader:
float t = 0;
t *= 16;
t *= 17;
Is it possible the first multiplication will run faster than the second?
Floating-point multiplication cannot be done by a bit shift. However, in theory floating-point multiplication by power-of-2 constants can be optimized. A floating-point value is normally stored in the form S * M * 2^E, where S is the sign, M is the mantissa and E is the exponent. Multiplying by a power-of-2 constant can be done by adding to or subtracting from the exponent part of the float, without modifying the other parts. But in practice, I would bet that on GPUs a generic multiply instruction is always used.
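For illustration, here is a hedged CPU-side C sketch of that exponent trick (mul_pow2_bits is a made-up helper; it assumes x is a normal, finite float and deliberately ignores zeros, subnormals, infinities and NaN — ldexpf is the portable library form of the same idea):
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Multiply a normal float by 2^k by adding k to the biased exponent
   field (bits 23..30). Edge cases deliberately ignored. */
static float mul_pow2_bits(float x, int k) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += (uint32_t)k << 23;
    memcpy(&x, &bits, sizeof bits);
    return x;
}

int main(void) {
    float t = 3.25f;
    /* All three expressions print 52.000000. */
    printf("%f %f %f\n", t * 16.0f, ldexpf(t, 4), mul_pow2_bits(t, 4));
    return 0;
}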
I had an interesting observation regarding power-of-2 constants while studying the disassembly output of PVRShaderEditor (PowerVR GPUs). I noticed that a certain range of power-of-2 constants ([2^(-16), 2^10] in my case) uses a special notation, e.g. C65, implying that they are predefined. Whereas arbitrary constants, such as 3.0 or 2.3, use shared-register notation (e.g. SH12), which implies they are stored as a uniform and probably incur some setup cost. Thus using power-of-2 constants may yield some optimization benefit, at least on some hardware.

Explaining the different types in Metal and SIMD

When working with Metal, I find there's a bewildering number of types and it's not always clear to me which type I should be using in which context.
In Apple's Metal Shading Language Specification, there's a pretty clear table of which types are supported within a Metal shader file. However, there's plenty of sample code available that seems to use additional types that are part of SIMD. On the macOS (Objective-C) side of things, the Metal types are not available but the SIMD ones are, and I'm not sure which ones I'm supposed to be using.
For example:
In the Metal Spec, there's float2, which is described as a "vector" data type representing two floating-point components.
On the app side, the following all seem to be used or represented in some capacity:
float2, which is typedef ::simd_float2 float2 in vector_types.h
Noted: "In C or Objective-C, this type is available as simd_float2."
vector_float2, which is typedef simd_float2 vector_float2
Noted: "This type is deprecated; you should use simd_float2 or simd::float2 instead"
simd_float2, which is typedef __attribute__((__ext_vector_type__(2))) float simd_float2
::simd_float2 and simd::float2 ?
A similar situation exists for matrix types:
matrix_float4x4, simd_float4x4, ::simd_float4x4 and float4x4.
Could someone please shed some light on why there are so many typedefs with seemingly overlapping functionality? If you were writing a new application today (2018) in Objective-C / Objective-C++, which type should you use to represent two floating values (x/y) and which type for matrix transforms that can be shared between app code and Metal?
The types with vector_ and matrix_ prefixes have been deprecated in favor of those with the simd_ prefix, so the general guidance (using float4 as an example) would be:
In C code, use the simd_float4 type. (You have to include the prefix unless you provide your own typedef, since C doesn't have namespaces.)
Same for Objective-C.
In C++ code, use the simd::float4 type, which you can shorten to float4 by using namespace simd;.
Same for Objective-C++.
In Metal code, use the float4 type, since float4 is a fundamental type in the Metal Shading Language [1].
In Swift code, use the float4 type, since the simd_ types are typealiased to shorter names.
Update: In Swift 5, float4 and related types have been deprecated in favor of SIMD4<Float> and related types.
These types are all fundamentally equivalent, and all have the same size and alignment characteristics so you can use them across languages. That is, in fact, one of the design goals of the simd framework.
I'll leave a discussion of packed types to another day, since you didn't ask.
[1] Metal is an unusual case since it defines float4 in the global namespace, then imports it into the metal namespace, which is also exported as the simd namespace. It additionally aliases float4 as vector_float4. So, you can use any of the above names for this vector type (except simd_float4). Prefer float4.
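As a hedged illustration of that guidance for C / Objective-C app code (SharedUniforms is a made-up struct name, not an Apple type):
#include <simd/simd.h>
#include <stdio.h>

/* Plain C using the simd_ prefixed types from <simd/simd.h>; the same
   struct layout can be shared with Metal shader code. */
typedef struct {
    simd_float2   position;   /* two floats (x/y) */
    simd_float4x4 transform;  /* 4x4 matrix for transforms */
} SharedUniforms;

int main(void) {
    SharedUniforms u;
    u.position  = simd_make_float2(1.0f, 2.0f);
    u.transform = matrix_identity_float4x4;
    printf("x = %f, y = %f\n", u.position.x, u.position.y);
    return 0;
}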
which type should you use to represent two floating values (x/y)
If you can avoid it, don't use a single SIMD vector to represent a single geometry x,y vector if you're using CPU SIMD.
CPU SIMD works best when you have many of the same thing in each SIMD vector, because they're actually stored in 16-byte or 32-byte vector registers where "vertical" operations between two vectors are cheap (packed add or multiply), but "horizontal" operations can mostly only be done with a shuffle + a vertical operation.
For example a vector of 4 x values and another vector of 4 y values lets you do 4 dot-products or 4 cross-products in parallel with no shuffling, so the overall throughput is significantly more dot-products per clock cycle than if you had a vector of [x1, y1, x2, y2].
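A hedged sketch of that layout using x86 SSE intrinsics (chosen purely for illustration; ax/ay/bx/by are arbitrary names): four 2D dot products computed with only vertical multiplies and adds, no shuffles.
#include <immintrin.h>
#include <stdio.h>

/* result[i] = ax[i]*bx[i] + ay[i]*by[i] for i = 0..3 -- purely vertical. */
static __m128 dot2_soa(__m128 ax, __m128 ay, __m128 bx, __m128 by) {
    return _mm_add_ps(_mm_mul_ps(ax, bx), _mm_mul_ps(ay, by));
}

int main(void) {
    __m128 ax = _mm_setr_ps(1, 2, 3, 4), ay = _mm_setr_ps(5, 6, 7, 8);
    __m128 bx = _mm_set1_ps(2.0f),       by = _mm_set1_ps(0.5f);
    float out[4];
    _mm_storeu_ps(out, dot2_soa(ax, ay, bx, by));
    /* prints 4.500000 7.000000 9.500000 12.000000 */
    printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
    return 0;
}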
See https://stackoverflow.com/tags/sse/info, and especially these slides: SIMD at Insomniac Games (GDC 2015) for more about planning your data layout and program design for doing many similar operations in parallel instead of trying to accelerate single operations.
The one exception to this rule is if you're only adding / subtracting to translate coordinates, because that's still purely a vertical operation even with an array-of-structs. And thus fine for CPU short-vector SIMD based on 16-byte vectors. (e.g. the 2nd element in one vector only interacts with the 2nd element in another vector, so no shuffling is needed.)
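For contrast, a sketch of that exception with the same illustrative SSE intrinsics: translating interleaved [x0, y0, x1, y1] data is still a vertical add, so it maps cleanly onto a 16-byte vector even in array-of-structs layout.
#include <immintrin.h>

/* Element i of the data only meets element i of the delta vector. */
__m128 translate_xy_pairs(__m128 xy_pairs, float dx, float dy) {
    const __m128 delta = _mm_setr_ps(dx, dy, dx, dy);
    return _mm_add_ps(xy_pairs, delta);
}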
GPU SIMD is different, and I think has no problem with interleaved data. I'm not a GPU expert.
(I don't use Objective C or Metal, so I can't help you with the details of their type names, just what the underlying CPU hardware is good at. That's basically the same for x86 SSE/AVX, ARM NEON / AArch64 SIMD, or PowerPC Altivec. Horizontal operations are slower.)

How do I print a half-precision float using printf in the AMD OpenCL SDK?

The Programming Guide has instructions for doubles (%ld) and vector types (e.g. %v4f), but not half-precision floats.
Normally in C, varargs arguments are automatically promoted to larger datatypes, such as float to double. The OpenCL documentation seems to imply that a similar promotion applies there.
Therefore a simple %f should work also for half-length floats.
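A small host-side C analogue of that promotion argument (plain C, not OpenCL, just to show why there is no separate specifier for float):
#include <stdio.h>

int main(void) {
    float f = 1.5f;
    /* A float passed through "..." is promoted to double, so plain %f
       is the right specifier; there is no float-only variant. The
       answer above says OpenCL's printf promotes half the same way. */
    printf("%f\n", f);
    return 0;
}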

CGFloat: round, floor, abs, and 32/64 bit precision

TLDR: How do I call standard floating-point functions in a way that compiles with both 32-bit and 64-bit CGFloat without warnings?
CGFloat is defined as either double or float, depending on the compiler settings and platform. I'm trying to write code that works well in both situations, without generating a lot of warnings.
When I use functions like floor, abs, ceil, and other simple floating point operations, I get warnings about values being truncated. For example:
warning: implicit conversion shortens 64-bit value into a 32-bit value
I'm not concerned about correctness or loss of precision in these calculations, as I realize that I could just use the double-precision versions of all functions all of the time (floor instead of floorf, etc.); however, I would have to tolerate all the resulting warnings.
Is there a way to write code cleanly that supports both 32-bit and 64-bit floats without having to either use a lot of #ifdef __LP64__'s, or write wrapper functions for all of the standard floating point functions?
You may use those functions from tgmath.h.
#include <tgmath.h>
...
double d = 1.5;
double e = floor(d); // will choose the 64-bit version of 'floor'
float f = 1.5f;
float g = floor(f); // will choose the 32-bit version, 'floorf'
If you only need a few functions you can use this instead:
#if CGFLOAT_IS_DOUBLE
#define roundCGFloat(x) round(x)
#define floorCGFloat(x) floor(x)
#define ceilCGFloat(x) ceil(x)
#else
#define roundCGFloat(x) roundf(x)
#define floorCGFloat(x) floorf(x)
#define ceilCGFloat(x) ceilf(x)
#endif
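A hedged usage sketch of those macros (assumes an Apple SDK so CGFloat and CGPointMake are available; pixelAlign is a hypothetical helper, not an API):
#include <CoreGraphics/CGGeometry.h>
#include <math.h>

/* Uses the floorCGFloat macro defined above; compiles without
   truncation warnings whether CGFloat is float or double. */
CGPoint pixelAlign(CGPoint p) {
    return CGPointMake(floorCGFloat(p.x), floorCGFloat(p.y));
}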

Why might different computers calculate different arithmetic results in VB.NET?

I have some software written in VB.NET that performs a lot of calculations, mostly extracting jpegs to bitmaps and computing calculations on the pixels like convolutions and matrix multiplication. Different computers are giving me different results despite having identical inputs. What might be the reason?
Edit: I can't provide the algorithm because it's proprietary but I can provide all the relevant operations:
ULong \ ULong (Truncating division)
Bitmap.Load("filename.bmp") (Load a bitmap into memory)
Bitmap.GetPixel(Integer, Integer) (Get a pixel's brightness)
Double + Double
Double * Double
Math.Sqrt(Double)
Math.PI
Math.Cos(Double)
ULong - ULong
ULong * ULong
ULong << ULong
List.OrderBy(Of Double)(Func)
Hmm... Is it possible that OrderBy is using a non-stable QuickSort and that QuickSort is using a random pivot? Edit: Just tested, nope. The sort is stable.
Turns out that Bitmap.Load("filename.jpeg") doesn't always produce the same bitmap on each computer. I still don't know why that is, however.
one or more bugs in the software (e.g. uninitialised variables)?
old Intel CPU floating point division bug?
numerically unstable algorithm?
Screen drivers - each driver may report the pixel values differently. While the pixel count is the same, the color depth may differ depending on the screen driver. If you set the values up in an array and compare that array across those machines, you may see a difference of several bytes.
I would print the totals and see what they add up to.