Numpy: How does np.abs actually work under the hood?

I'm trying to implement my own absolute-value function for gonum dense vectors in Go. I'm wondering if there's a better way of getting the absolute value of an array than squaring and then square-rooting?
My main issue is that I've had to implement my own element-wise Newtonian square-root function on these vectors, and there's a trade-off between implementation speed and accuracy. If I could avoid using this square-root function I'd be happy.

NumPy's source code can be tricky to navigate, because it has so many functions for so many data types. You can find the C-level source code for the absolute value function in the file scalarmath.c.src. This file is actually a template with function definitions that are later replicated by the build system for several data types. Note that each function is the "kernel" that is run for each element of the array (looping through the array is done somewhere else). The functions are always called <name of the type>_ctype_absolute, where <name of the type> is the data type it applies to, and they are generally generated from templates. Let's go through them.
/**begin repeat
 * #name = ubyte, ushort, uint, ulong, ulonglong#
 */
#define #name#_ctype_absolute #name#_ctype_positive
/**end repeat**/
This one is for unsigned types. In this case, the absolute value is the same as np.positive, which just copies the value without doing anything (it is what you get if you have an array a and you do +a).
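For unsigned types the kernel is a no-op copy. As a sketch (not the literal generated source), after template expansion for ubyte this amounts to something like:
#define ubyte_ctype_absolute ubyte_ctype_positive

static void
ubyte_ctype_positive(npy_ubyte a, npy_ubyte *out)
{
    *out = a;   /* unsigned values are already non-negative */
}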
/**begin repeat
 * #name = byte, short, int, long, longlong#
 * #type = npy_byte, npy_short, npy_int, npy_long, npy_longlong#
 */
static void
#name#_ctype_absolute(#type# a, #type# *out)
{
    *out = (a < 0 ? -a : a);
}
/**end repeat**/
This one is for signed integers. Pretty straightforward.
/**begin repeat
 * #name = float, double, longdouble#
 * #type = npy_float, npy_double, npy_longdouble#
 * #c = f,,l#
 */
static void
#name#_ctype_absolute(#type# a, #type# *out)
{
    *out = npy_fabs#c#(a);
}
/**end repeat**/
This is for floating-point values. Here the npy_fabsf, npy_fabs and npy_fabsl functions are used. These are declared in npy_math.h, but defined through templated C code in npy_math_internal.h.src, essentially calling the C/C99 counterparts (unless C99 is not available, in which case fabsf and fabsl are emulated with fabs). You might think that the previous code should work just as well for floating-point types, but these are actually more complicated: they have things like NaN, infinity and signed zeros, so it is better to use the standard C functions, which deal with all of that reliably.
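To see why the signed zero matters, here is a small standalone C sketch (plain C, not NumPy code) comparing the integer-style kernel with fabs:
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = -0.0;
    /* integer-style kernel: -0.0 < 0 is false, so -0.0 comes back unchanged */
    double naive = (a < 0 ? -a : a);
    double good  = fabs(a);   /* fabs simply clears the sign bit: +0.0 */
    printf("naive: %g (signbit=%d)\n", naive, signbit(naive) ? 1 : 0);  /* -0 (1) */
    printf("fabs:  %g (signbit=%d)\n", good,  signbit(good) ? 1 : 0);   /* 0 (0) */
    return 0;
}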
static void
half_ctype_absolute(npy_half a, npy_half *out)
{
    *out = a&0x7fffu;
}
This one is actually not templated; it is the absolute value function for half-precision floating-point values. It turns out you can take the absolute value just by doing that bitwise operation (clearing the sign bit, the most significant bit), since half-precision values are handled as plain 16-bit patterns, half-precision being simpler (if more limited) than other floating-point types. (The sign-bit trick would work for those too, but their implementations go through the C library functions, which also handle the special cases.)
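As an illustration, here is a self-contained C sketch of the same trick on raw IEEE 754 half-precision bit patterns (npy_half is essentially a 16-bit unsigned integer holding the bits):
#include <stdint.h>
#include <stdio.h>

/* layout of a half: sign (bit 15) | 5 exponent bits | 10 mantissa bits */
static uint16_t half_abs(uint16_t h) {
    return h & 0x7fffu;   /* clear the sign bit */
}

int main(void) {
    uint16_t neg_two = 0xC000;   /* bit pattern of -2.0 in IEEE 754 half */
    printf("0x%04X -> 0x%04X\n", (unsigned)neg_two,
           (unsigned)half_abs(neg_two));   /* -> 0x4000, i.e. +2.0 */
    return 0;
}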
/**begin repeat
 * #name = cfloat, cdouble, clongdouble#
 * #type = npy_cfloat, npy_cdouble, npy_clongdouble#
 * #rtype = npy_float, npy_double, npy_longdouble#
 * #c = f,,l#
 */
static void
#name#_ctype_absolute(#type# a, #rtype# *out)
{
    *out = npy_cabs#c#(a);
}
/**end repeat**/
This last one is for complex types. These use the npy_cabsf, npy_cabs and npy_cabsl functions, again declared in npy_math.h but in this case template-implemented in npy_math_complex.c.src using the C99 functions (unless C99 complex support is not available, in which case it is emulated with hypot).
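The complex absolute value is just the magnitude sqrt(re*re + im*im). A minimal C99 sketch of the standard functions these wrappers come down to:
#include <complex.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    double complex z = 3.0 + 4.0 * I;
    printf("cabs:  %g\n", cabs(z));                    /* 5 */
    /* hypot is the fallback: it avoids overflow in re*re + im*im */
    printf("hypot: %g\n", hypot(creal(z), cimag(z)));  /* 5 */
    return 0;
}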

Related

How does Kotlin optimize checks that a number belongs to a range?

I'm investigating Kotlin by decompiling it to Java code.
I've found one interesting nuance and can't understand how it is implemented.
Here's the Kotlin code:
val result = 50 in 1..100
I'm using IntelliJ IDEA's decompiler to look at the equivalent Java code, and here's what we have:
public final class Test14Kt {
    private static final boolean result = true;

    public static final boolean getResult() {
        return result;
    }
}
So as I understand it, kotlinc somehow knows that the element is in the range and saves true into the result variable at compile time.
It's cool. But how is it achieved?
This is very simple constant folding:
Terms in constant expressions are typically simple literals, such as the integer literal 2, but they may also be variables whose values are known at compile time. Consider the statement:
i = 320 * 200 * 32;
Most modern compilers would not actually generate two multiply instructions and a store for this statement. Instead, they identify constructs such as these and substitute the computed values at compile time (in this case, 2,048,000). The resulting code would load the computed value and store it rather than loading and multiplying several values.
Constant folding can even use arithmetic identities. When x is an integer type, the value of 0 * x is zero even if the compiler does not know the value of x.
Here,
50 in 1..100 ==
1 <= 50 && 50 <= 100 ==
true && true ==
true
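The same folding can be observed in C. As a sketch, compilers typically reduce both initializers below to compile-time constants, so no multiplication or comparison is performed at run time:
int main(void) {
    int i = 320 * 200 * 32;               /* folded to 2048000 at compile time */
    int in_range = 1 <= 50 && 50 <= 100;  /* folded to 1, like the Kotlin example */
    return (i == 2048000 && in_range) ? 0 : 1;
}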

Obj-C: Is it really safe to compare BOOL variables?

I used to think that in the 64-bit Obj-C runtime BOOL is actually _Bool, i.e. a real boolean type, so it's safe to write like this:
BOOL a = YES;
BOOL b = NO;
if (a != b) {...}
It's been working seemingly fine, but today I found a problem when I use bit-field structs like this:
typedef struct
{
    BOOL flag1 : 1;
} FlagsType;

FlagsType f;
f.flag1 = YES;
BOOL b = YES;

if (f.flag1 != b)
{
    // DOES GET HERE!!!
}
It seems that the BOOL returned from the bit field is equal to -1 while the regular BOOL is 1, and they are not equal!!!
Note that I am aware of the situation where an arbitrary integer is cast to BOOL and therefore becomes a "strange" BOOL that is not safe to compare.
However, in this situation both the flag1 field and b were declared as BOOL and never cast. What is the problem? Is this a compiler bug?
The bigger question is whether it's really safe to compare BOOLs at all, or should I write a XORing helper function? (That would be such a chore, because boolean comparisons are so ubiquitous...)
I won't repeat the point that using a C boolean type solves most of the problems one can have with BOOL. That's true – in particular here, as you can read below – but most of those problems result from storing a wrong value into a boolean (C) object. In this case, however, _Bool or unsigned int seem to be the only possible solutions (except for solutions involving extra code). There is a reason for that:
I cannot find precise documentation of the new behavior of BOOL in Objective-C, but the behavior you found is something between bad and buggy. I would have expected the behavior to be analogous to _Bool, but that's not true in your case. (Thanks for finding that out!) Maybe this is for backwards compatibility. To tell the full story:
In C, an object of type int is a signed int. (This differs from char, whose signedness is implementation-defined.)
— int, signed, or signed int
ISO/IEC 9899:TC3, 6.7.2-2
Each of the comma-separated sets designates the same type, […]
ISO/IEC 9899:TC3, 6.7.2-5
But there is a weird exception for historical reasons:
If the int object is a bit-field, it is implementation-defined whether it is a signed int or an unsigned int. (Likely this is because some CPUs in the past could not automatically extend the sign of a partial-byte integer, so an unsigned integer was easier: zeroing the top bits is enough.)
On clang the default is signed int. So, as with full-width integers, int always denotes a signed integer, even if it has only one bit. An int member : 1 can only store 0 and -1! (Therefore using int instead is no solution.)
Each of the comma-separated sets designates the same type, except that for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.
ISO/IEC 9899:TC3, 6.7.2-5
The C standard says that a boolean bit-field is an integer type and is therefore subject to the weird integer signedness rule for bit-fields:
A bit-field is interpreted as a signed or unsigned integer type consisting of the specified number of bits.
ISO/IEC 9899:TC3, 6.7.2.1-9
This is the behavior you found. Because this is meaningless for 1-bit boolean types, the C standard explicitly requires that storing a 1 into a boolean bit-field compares equal to 1 in every case:
If the value 0 or 1 is stored into a nonzero-width bit-field of type _Bool, the value of the bit-field shall compare equal to the value stored.
ISO/IEC 9899:TC3, 6.7.2.1-9
This leads to the strange situation that an implementation may represent booleans of width 1 as { 0, -1 }, yet must still fulfill 1 == -1. Great.
So, the short story: BOOL behaves like an integer bit-field (conforming to the standard), but is not subject to the extra requirement for _Bool.
I think this is because of legacy code. (One could expect -1 in the past.)
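For illustration, here is a small C sketch of the effect with a plain int bit-field (how the out-of-range 1 is stored is implementation-defined; clang on common platforms stores -1), together with the usual normalization idiom for comparing such values:
#include <stdio.h>

struct Flags {
    int flag : 1;   /* a plain int bit-field is signed on clang: it holds only 0 and -1 */
};

int main(void) {
    struct Flags f = {0};
    f.flag = 1;     /* 1 does not fit; clang even warns that the value changes to -1 */
    int b = 1;

    printf("f.flag == b   -> %d\n", f.flag == b);        /* typically 0: not equal! */
    printf("!f.flag == !b -> %d\n", (!f.flag) == (!b));  /* 1: normalize, then compare */
    return 0;
}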

Make Realloc behave like Calloc

How can I force realloc to behave like calloc?
For instance:
I have the following structs:
typedef struct bucket0 {
    int hashID;
    Registry registry;
} Bucket;

typedef struct table0 {
    int tSize;
    int tElements;
    Bucket** content;
} Table;
and I have the following code in order to grow the table:
int grow(Table* table){
    Bucket** tempPtr;
    //grow will add 1 to the number of available buckets, and double it.
    table->tSize++;    //add 1
    table->tSize *= 2; //double it
    if(!table->content){
        //table will be generated for the first time
        table->content = (Bucket**)(calloc(sizeof(Bucket*), table->tSize));
    } else {
        //realloc content
        tempPtr = (Bucket**)realloc(table->content, sizeof(Bucket)*table->tSize);
        if(tempPtr){
            table->content = tempPtr;
            return 0;
        }else{
            return 1000;//table could not grow
        }
    }
}
When I execute it, the table grows properly, and MOST of the "buckets" in it are initialized as NULL pointers. However, not all of them are.
How can I make realloc behave like calloc, in the sense that the new "buckets" it creates are initialized to NULL?
Strictly speaking, you shouldn't be relying on calloc (or memset, for that matter) to set pointers to null. C doesn't guarantee that null pointers are represented by all-zero bytes in memory.
Quoting from the comp.lang.C FAQ question 7.31:
Don't rely on calloc's zero fill too much (see below); usually, it's best to initialize data structures yourself, on a field-by-field basis, especially if there are pointer fields.
calloc's zero fill is all-bits-zero, and is therefore guaranteed to yield the value 0 for all integral types (including '\0' for character types). But it does not guarantee useful null pointer values (see section 5 of this list) or floating-point zero values.
It's safer to initialize the individual structure fields yourself. You can create a static const one as a template, with its content initialized to NULL, and then memcpy it to each element of your dynamically-allocated array.
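Under those caveats, a sketch of a corrected grow path (it assumes the question's Bucket/Table typedefs and <stdlib.h>; note that the question's realloc call sizes elements as sizeof(Bucket) where sizeof(Bucket*) was presumably intended). realloc never zeroes the added region, so the new tail is NULL-filled by hand, which stays portable even where a null pointer is not all-bits-zero:
int grow(Table* table) {
    int oldSize = table->tSize;   /* assumes tSize is 0 before the first call */
    int newSize = (oldSize + 1) * 2;

    /* realloc(NULL, n) acts like malloc, so one path covers both cases */
    Bucket** tempPtr = (Bucket**)realloc(table->content, sizeof(Bucket*) * newSize);
    if (!tempPtr)
        return 1000;   /* table could not grow; the old content is still valid */
    table->content = tempPtr;
    table->tSize = newSize;

    /* the newly added slots are uninitialized: set each to NULL by hand */
    for (int i = oldSize; i < newSize; i++)
        table->content[i] = NULL;
    return 0;
}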

In what cases do we need functions for both double, float and long double?

In the math-headers we see
extern float fabsf(float);
extern double fabs(double);
extern long double fabsl(long double);
...
extern float fmodf(float, float);
extern double fmod(double, double);
extern long double fmodl(long double, long double);
Why is there one function for each type?
Isn't this a lot of duplicate code? If I were to, say, write a lerp function or a clamp function, would I need to write one for each type?
It seems like we end up with duplicate code where only one thing changes – the type.
extern float clampf(float value, float min, float max)
{
    if(value > max)
        return max;
    if(value < min)
        return min;
    return value;
}

extern double clamp(double value, double min, double max)
{
    if(value > max)
        return max;
    if(value < min)
        return min;
    return value;
}
Question 1: What is the historical reason for this structure?
Question 2: Should I follow the same pattern? Or should I only implement the double version, since it is the most common?
Question 3: Or should I just use macros to overcome the type issue altogether?
Historically (circa C89 and before), the math library contained only the double-precision versions of these functions, which is why those versions have no suffix. If you needed to compute the sine of a float, you either wrote your own implementation, or (more likely!) you simply wrote:
float x;
float y = sin(x);
However, this introduces some overhead on modern architectures. Specifically, on the most common architectures today, it is necessary for the compiler to emit code that looks something like this:
convert x to double
call sin
convert result to float
These conversions are pretty fast (about the same as an addition, usually), but they still have some cost. On top of the cost of conversion, sin needs to deliver a result that has ~53 bits of precision, more than half of which are completely wasted if the result is just going to be converted back to single precision. Between these two factors, it is possible for a dedicated single-precision sin routine to be about twice as fast; that’s a significant win for some very frequently-used library functions!
If we look at functions like fabs (and assume that the compiler does not simply inline and lower them), the situation is much, much worse. fabs, on a typical modern architecture, is a simple bitwise-and operation. So the two conversions bracketing the call (if all you have is double) are significantly more expensive than the operation itself, and can easily cause a 5x slowdown. That’s why multiple versions of these functions were added to support each FP type.
If you don’t want to keep track of all of them, you can #include <tgmath.h>, which will infer the correct function to use based on the type of the argument (meaning
sin((float)x)
will generate a call to sinf(x), whereas
sin((long double)x)
will call sinl(x)).
In your own code, you usually know a priori what the type of your arguments is, and only need to support one or maybe two types. clamp and lerp in particular are graphics operations, and almost universally are used only in single-precision variants.
Incidentally, the fact that you’re using clamp and lerp is a pretty good indication that you might want to look at writing your code in OpenCL instead of C/Obj-C; the OpenCL math library implements these operations (and many other similar operations) for you, and provides implementations that work with a wide range of basic types, including vectors.
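As for Question 3: if you do want a single name covering several types in C, one option is C11's _Generic selection (the mechanism tgmath.h can be built on). A hedged sketch; clampf, clampd and clampl are illustrative names here, not standard library functions:
static inline float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
static inline double clampd(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
static inline long double clampl(long double v, long double lo, long double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* picks the implementation from the static type of the first argument */
#define clamp(v, lo, hi) _Generic((v), \
    float:       clampf,               \
    long double: clampl,               \
    default:     clampd)((v), (lo), (hi))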
float and double are different data types, just as int and long int are. You can use the functions that operate on double with float values, and implicit conversion will make it work as expected in most circumstances; but if you use the functions that operate on float with double values, you will almost inevitably lose precision.
There are other longer explanations available, e.g. What's the difference between a single precision and double precision floating point operation? .

How do I implement a bit array in C / Objective C

iOS / Objective-C: I have a large array of boolean values.
This is an inefficient way to store these values – at least eight bits are used for each element when only one is needed.
How can I optimise?
See CFMutableBitVector / CFBitVector for a CFType option.
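For example, a minimal sketch using CFMutableBitVector (CoreFoundation, so it compiles on Apple platforms; see the CFBitVector reference for the full API):
#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

int main(void) {
    /* a capacity of 0 means "no fixed limit" */
    CFMutableBitVectorRef bits = CFBitVectorCreateMutable(kCFAllocatorDefault, 0);
    CFBitVectorSetCount(bits, 64);          /* 64 bits, initially all 0 */
    CFBitVectorSetBitAtIndex(bits, 40, 1);  /* set bit 40 */
    CFBit b = CFBitVectorGetBitAtIndex(bits, 40);
    printf("bit 40 = %d\n", (int)b);        /* 1 */
    CFRelease(bits);
    return 0;
}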
Try this:
#define BITOP(a,b,op) \
((a)[(size_t)(b)/(8*sizeof *(a))] op ((size_t)1<<((size_t)(b)%(8*sizeof *(a)))))
Then for any array of unsigned integer elements no larger than size_t, the BITOP macro can access the array as a bit array. For example:
unsigned char array[16] = {0};
BITOP(array, 40, |=); /* sets bit 40 */
BITOP(array, 41, ^=); /* toggles bit 41 */
if (BITOP(array, 42, &)) return 0; /* tests bit 42 */
BITOP(array, 43, &=~); /* clears bit 43 */
etc.
You use the bitwise logical operations and bit-shifting. (A Google search for these terms might give you some examples.)
Basically you declare an integer type (int, char, etc.), then you "shift" a 1 to the bit you want and do an OR or an AND with the integer.
Some quick illustrative examples (in C++):
inline bool bit_is_on(int bit_array, int bit_number)
{
    return ((bit_array) & (1 << bit_number)) ? true : false;
}

inline void set_bit(int &bit_array, int bit_number)
{
    bit_array |= (1 << bit_number);
}

inline void clear_bit(int &bit_array, int bit_number)
{
    bit_array &= ~(1 << bit_number);
}
Note that this provides "bit arrays" of constant size (sizeof(int) * 8 bits). Maybe that's OK for you, or maybe you will want to build something on top of this. (Or re-use whatever some library provides.)
This will use less memory than bool arrays... HOWEVER... The code the compiler generates to access these bits will be larger and slower. So unless you have a large number of objects that need to contain these bit arrays, it might have a net-negative impact on both speed and memory usage.
#define BITOP(a,b,op) \
((a)[(size_t)(b)/(8*sizeof *(a))] op (size_t)1<<((size_t)(b)%(8*sizeof *(a))))
will not work for an op like &=~, because the cast and the ~ bind tighter than <<, so ~(size_t)1 gets shifted, producing the wrong mask.
Fix (parenthesize the shifted bit):
#define BITOP(a,b,op) \
((a)[(size_t)(b)/(8*sizeof *(a))] op ((size_t)1<<((size_t)(b)%(8*sizeof *(a)))))
I came across this question as I am writing a bit array framework intended to manage large numbers of 'bits', similar to Java's BitSet. I was checking whether the name I had decided on conflicted with other Objective-C frameworks.
Anyway, I'm just starting this and am deciding whether to post it on SourceForge or another open source hosting site.
Let me know if you are interested.
Edit: I've created the project, called BitArray, on SourceForge. The source is in the SF SVN repository and I've also uploaded a compiled framework. This LINK will get you there.
Frank