How to round down a float to the nearest value that can be divided by two without rest?

How to round down a float to the nearest value that can be divided by two without rest? - objective-c

For example I have a float 55.2f and want to round it down such that the result can be divided by two without rest.
So 55.2 would become 54 as that is the nearest smaller "step" that can be divided by two. Is there a function for this or must I write my own algorithm?

If your result must remain a float, you can do:
float f=55.2f;
f=floorf(f/2.f)*2.f;

First convert to an integral type, such as int or long, and then clear the lowest bit.
float f = 55.2f;
int i = (int)f & ~1;
Explanation
~ means the bitwise inverse, i.e. all the 0 bits become 1, and vice versa.
So, if 1 has the bit pattern
0...0001
then ~1 is
1...1110
(Here I'm using ... to represent all the in-between bits depending on how big an integer is on your platform.)
When you & (bitwise AND) your integer with 1...1110, you are preserving the value of each bit apart from the lowest bit, which is forced to 0. See this description of the bitwise AND operator if you still don't get it.
By forcing the lowest bit to be 0, you are rounding the number down to the nearest even number.

You can write your own algorithm, for example with bitwise operators.
The following code works with clearing the last bit of your number. An even number has indeed the last bit not set.
int
f(float x)
{
return (int)x & ~1;
}

How about long int f = lrintf(x / 2);, where x is your float?
You could also just say int f = x / 2;, but some people have argued that that's more expensive, because the C standard mandates a specific rounding mode which may or may not be native to the CPU. The lrintf function on the other hand uses the CPU's native rounding mode. You need to #include <math.h>.

Related

Given no modulus or if even/odd function, how would one check for an odd or even number?

I have recently sat a computing exam in university in which we were never taught beforehand about the modulus function or any other check for odd/even function and we have no access to external documentation except our previous lecture notes. Is it possible to do this without these and how?

Bitwise AND (&)
Extract the last bit of the number using the bitwise AND operator. If the last bit is 1, then it's odd, else it's even. This is the simplest and most efficient way of testing it. Examples in some languages:
C / C++ / C#
bool is_even(int value) {
return (value & 1) == 0;
}
Java
public static boolean is_even(int value) {
return (value & 1) == 0;
}
Python
def is_even(value):
return (value & 1) == 0

I assume this is only for integer numbers as the concept of odd/even eludes me for floating point values.
For these integer numbers, the check of the Least Significant Bit (LSB) as proposed by Rotem is the most straightforward method, but there are many other ways to accomplish that.
For example, you could use the integer division operation as a test. This is one of the most basic operation which is implemented in virtually every platform. The result of an integer division is always another integer. For example:
>> x = int64( 13 ) ;
>> x / 2
ans =
7
Here I cast the value 13 as a int64 to make sure MATLAB treats the number as an integer instead of double data type.
Also here the result is actually rounded towards infinity to the next integral value. This is MATLAB specific implementation, other platform might round down but it does not matter for us as the only behavior we look for is the rounding, whichever way it goes. The rounding allow us to define the following behavior:
If a number is even: Dividing it by 2 will produce an exact result, such that if we multiply this result by 2, we obtain the original number.
If a number is odd: Dividing it by 2 will result in a rounded result, such that multiplying it by 2 will yield a different number than the original input.
Now you have the logic worked out, the code is pretty straightforward:
%% sample input
x = int64(42) ;
y = int64(43) ;
%% define the checking function
% uses only multiplication and division operator, no high level function
is_even = #(x) int64(x) == (int64(x)/2)*2 ;
And obvisouly, this will yield:
>> is_even(x)
ans =
1
>> is_even(y)
ans =
0

I found out from a fellow student how to solve this simplistically with maths instead of functions.
Using (-1)^n :
If n is odd then the outcome is -1
If n is even then the outcome is 1
This is some pretty out-of-the-box thinking, but it would be the only way to solve this without previous knowledge of complex functions including mod.

How do you multiply two fixed point numbers?

I am currently trying to figure out how to multiply two numbers in fixed point representation.
Say my number representation is as follows:
[SIGN][2^0].[2^-1][2^-2]..[2^-14]
In my case, the number 10.01000000000000 = -0.25.
How would I for example do 0.25x0.25 or -0.25x0.25 etc?
Hope you can help!

You should use 2's complement representation instead of a seperate sign bit. It's much easier to do maths on that, no special handling is required. The range is also improved because there's no wasted bit pattern for negative 0. To multiply, just do as normal fixed-point multiplication. The normal Q2.14 format will store value x/214 for the bit pattern of x, therefore if we have A and B then
So you just need to multiply A and B directly then divide the product by 214 to get the result back into the form x/214 like this
AxB = ((int32_t)A*B) >> 14;
A rounding step is needed to get the nearest value. You can find the way to do it in Q number format#Math operations. The simplest way to round to nearest is just add back the bit that was last shifted out (i.e. the first fractional bit) like this
AxB = (int32_t)A*B;
AxB = (AxB >> 14) + ((AxB >> 13) & 1);
You might also want to read these
Fixed-point arithmetic.
Emulated Fixed Point Division/Multiplication
Fixed point math in c#?
With 2 bits you can represent the integer range of [-2, 1]. So using Q2.14 format, -0.25 would be stored as 11.11000000000000. Using 1 sign bit you can only represent -1, 0, 1, and it makes calculations more complex because you need to split the sign bit then combine it back at the end.

Multiply into a larger sized variable, and then right shift by the number of bits of fixed point precision.

Here's a simple example in C:
int a = 0.25 * (1 << 16);
int b = -0.25 * (1 << 16);
int c = (a * b) >> 16;
printf("%.2f * %.2f = %.2f\n", a / 65536.0, b / 65536.0 , c / 65536.0);
You basically multiply everything by a constant to bring the fractional parts up into the integer range, then multiply the two factors, then (optionally) divide by one of the constants to return the product to the standard range for use in future calculations. It's like multiplying prices expressed in fractional dollars by 100 and then working in cents (i.e. $1.95 * 100 cents/dollar = 195 cents).
Be careful not to overflow the range of the variable you are multiplying into. Your constant might need to be smaller to avoid overflow, like using 1 << 8 instead of 1 << 16 in the example above.

What do the operators '<<' and '>>' do?

I was following 'A tour of GO` on http://tour.golang.org.
The table 15 has some code that I cannot understand. It defines two constants with the following syntax:
const (
Big = 1<<100
Small = Big>>99
)
And it's not clear at all to me what it means. I tried to modify the code and run it with different values, to record the change, but I was not able to understand what is going on there.
Then, it uses that operator again on table 24. It defines a variable with the following syntax:
MaxInt uint64 = 1<<64 - 1
And when it prints the variable, it prints:
uint64(18446744073709551615)
Where uint64 is the type. But I can't understand where 18446744073709551615 comes from.

They are Go's bitwise shift operators.
Here's a good explanation of how they work for C (they work in the same way in several languages).
Basically 1<<64 - 1 corresponds to 2^64 -1, = 18446744073709551615.
Think of it this way. In decimal if you start from 001 (which is 10^0) and then shift the 1 to the left, you end up with 010, which is 10^1. If you shift it again you end with 100, which is 10^2. So shifting to the left is equivalent to multiplying by 10 as many times as the times you shift.
In binary it's the same thing, but in base 2, so 1<<64 means multiplying by 2 64 times (i.e. 2 ^ 64).

That's the same as in all languages of the C family : a bit shift.
See http://en.wikipedia.org/wiki/Bitwise_operation#Bit_shifts
This operation is commonly used to multiply or divide an unsigned integer by powers of 2 :
b := a >> 1 // divides by 2
1<<100 is simply 2^100 (that's Big).
1<<64-1 is 2⁶⁴-1, and that's the biggest integer you can represent in 64 bits (by the way you can't represent 1<<64 as a 64 bits int and the point of table 15 is to demonstrate that you can have it in numerical constants anyway in Go).

The >> and << are logical shift operations. You can see more about those here:
http://en.wikipedia.org/wiki/Logical_shift
Also, you can check all the Go operators in their webpage

It's a logical shift:
every bit in the operand is simply moved a given number of bit
positions, and the vacant bit-positions are filled in, usually with
zeros
Go Operators:
<< left shift integer << unsigned integer
>> right shift integer >> unsigned integer

Properly subtracting float values

I am trying to create an array of values. These values should be "2.4,1.6,.8,0". I am subtracting .8 at every step.
This is how I am doing it (code snippet):
float mean = [[_scalesDictionary objectForKey:#"M1"] floatValue]; //3.2f
float sD = [[_scalesDictionary objectForKey:#"SD1"] floatValue]; //0.8f
nextRegion = mean;
hitWall = NO;
NSMutableArray *minusRegion = [NSMutableArray array];
while (!hitWall) {
nextRegion -= sD;
if(nextRegion<0.0f){
nextRegion = 0.0f;
hitWall = YES;
}
[minusRegion addObject:[NSNumber numberWithFloat:nextRegion]];
}
I am getting this output:
minusRegion = (
"2.4",
"1.6",
"0.8000001",
"1.192093e-07",
0
)
I do not want the incredibly small number between .8 and 0. Is there a standard way to truncate these values?

Neither 3.2 nor .8 is exactly representable as a 32-bit float. The representable number closest to 3.2 is 3.2000000476837158203125 (in hexadecimal floating-point, 0x1.99999ap+1). The representable number closest to .8 is 0.800000011920928955078125 (0x1.99999ap-1).
When 0.800000011920928955078125 is subtracted from 3.2000000476837158203125, the exact mathematical result is 2.400000035762786865234375 (0x1.3333338p+1). This result is also not exactly representable as a 32-bit float. (You can see this easily in the hexadecimal floating-point. A 32-bit float has a 24-bit significand. “1.3333338” has one bit in the “1”, 24 bits in the middle six digits, and another bit in the ”8”.) So the result is rounded to the nearest 32-bit float, which is 2.400000095367431640625 (0x1.333334p+1).
Subtracting 0.800000011920928955078125 from that yields 1.6000001430511474609375 (0x1.99999cp+0), which is exactly representable. (The “1” is one bit, the five nines are 20 bits, and the “c” has two significant bits. The low bits two bits in the “c” are trailing zeroes and may be neglected. So there are 23 significant bits.)
Subtracting 0.800000011920928955078125 from that yields 0.800000131130218505859375 (0x1.99999ep-1), which is also exactly representable.
Finally, subtracting 0.800000011920928955078125 from that yields 1.1920928955078125e-07 (0x1p-23).
The lesson to be learned here is the floating-point does not represent all numbers, and it rounds results to give you the closest numbers it can represent. When writing software to use floating-point arithmetic, you must understand and allow for these rounding operations. One way to allow for this is to use numbers that you know can be represented. Others have suggested using integer arithmetic. Another option is to use mostly values that you know can be represented exactly in floating-point, which includes integers up to 224. So you could start with 32 and subtract 8, yielding 24, then 16, then 8, then 0. Those would be the intermediate values you use for loop control and continuing calculations with no error. When you are ready to deliver results, then you could divide by 10, producing numbers near 3.2, 2.4, 1.6, .8, and 0 (exactly). This way, your arithmetic would introduce only one rounding error into each result, instead of accumulating rounding errors from iteration to iteration.

You're looking at good old floating-point rounding error. Fortunately, in your case it should be simple to deal with. Just clamp:
if( val < increment ){
val = 0.0;
}
Although, as Eric Postpischil explained below:
Clamping in this way is a bad idea, because sometimes rounding will cause the iteration variable to be slightly less than the increment instead of slightly more, and this clamping will effectively skip an iteration. For example, if the initial value were 3.6f (instead of 3.2f), and the step were .9f (instead of .8f), then the values in each iteration would be slightly below 3.6, 2.7, 1.8, and .9. At that point, clamping converts the value slightly below .9 to zero, and an iteration is skipped.
Therefore it might be necessary to subtract a small amount when doing the comparison.
A better option which you should consider is doing your calculations with integers rather than floats, then converting later.
int increment = 8;
int val = 32;
while( val > 0 ){
val -= increment;
float new_float_val = val / 10.0;
};

Another way to do this is to multiply the numbers you get by subtraction by 10, then convert to an integer, then divide that integer by by 10.0.
You can do this easily with the floor function (floorf) like this:
float newValue = floorf(oldVlaue*10)/10;

Objective c division of two ints

I'm trying to produce a a float by dividing two ints in my program. Here is what I'd expect:
1 / 120 = 0.00833
Here is the code I'm using:
float a = 1 / 120;
However it doesn't give me the result I'd expect. When I print it out I get the following:
inf

Do the following
float a = 1./120.

You need to specify that you want to use floating point math.
There's a few ways to do this:
If you really are interested in dividing two constants, you can specify that you want floating point math by making the first constant a float (or double). All it takes is a decimal point.
float a = 1./120;
You don't need to make the second constant a float, though it doesn't hurt anything.
Frankly, this is pretty easy to miss so I'd suggest adding a trailing zero and some spacing.
float a = 1.0 / 120;
If you really want to do the math with an integer variable, you can type cast it:
float a = (float)i/120;

float a = 1/120;
float b = 1.0/120;
float c = 1.0/120.0;
float d = 1.0f/120.0f;
NSLog(#"Value of A:%f B:%f C:%f D:%f",a,b,c,d);
Output: Value of A:0.000000 B:0.008333 C:0.008333 D:0.008333
For float variable a : int / int yields integer which you are assigning to float and printing it so 0.0000000
For float variable b: float / int yields float, assigning to float and printing it 0.008333
For float variable c: float / float yields float, so 0.008333
Last one is more precise float. Previous ones are of type double: all floating point values are stored as double data types unless the value is followed by an 'f' to specifically specify a float rather than as a double.

In C (and therefore also in Objective-C), expressions are almost always evaluated without regard to the context in which they appear.
The expression 1 / 120 is a division of two int operands, so it yields an int result. Integer division truncates, so 1 / 120 yields 0. The fact that the result is used to initialize a float object doesn't change the way 1 / 120 is evaluated.
This can be counterintuitive at times, especially if you're accustomed to the way calculators generally work (they usually store all results in floating-point).
As the other answers have said, to get a result close to 0.00833 (which can't be represented exactly, BTW), you need to do a floating-point division rather than an integer division, by making one or both of the operands floating-point. If one operand is floating-point and the other is an integer, the integer operand is converted to floating-point first; there is no direct floating-point by integer division operation.
Note that, as #0x8badf00d's comment says, the result should be 0. Something else must be going wrong for the printed result to be inf. If you can show us more code, preferably a small complete program, we can help figure that out.
(There are languages in which integer division yields a floating-point result. Even in those languages, the evaluation isn't necessarily affected by its context. Python version 3 is one such language; C, Objective-C, and Python version 2 are not.)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas