Getting the Leftmost Bit - objective-c

I have a 5 bit integer that I'm working with. Is there a native function in Objective-C that will let me know which bit is the leftmost?
i.e. if I have 01001, it would return 8, or alternatively the bit's position.
Thanks

You can build a lookup table with 32 elements: 0, 1, 2, 2, 3, etc.
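A minimal C sketch of such a table for a 5-bit value, here mapping the value straight to its leftmost bit's value (so index 01001 yields 8) rather than to its position:
static const unsigned char leftmost_bit[32] = {
     0,  1,  2,  2,  4,  4,  4,  4,
     8,  8,  8,  8,  8,  8,  8,  8,
    16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16
};
// leftmost_bit[9] == 8   (9 is 01001 in binary)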

This is effectively the same operation as counting the number of leading 0s. Some CPUs have an instruction for this, otherwise you can use tricks such as those found in Hacker's Delight.
It's also equivalent to rounding down to the nearest power of 2, and again you can find efficient methods for doing this in Hacker's Delight, e.g.
uint8_t flp2(uint8_t x)
{
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    return x - (x >> 1);
}
See also: Previous power of 2

NSInteger value = 9;
NSInteger shift = 0; // after the loop, shift holds the position of the leftmost set bit
for (NSInteger bit = value; bit > 1; bit = value >> ++shift);
NSInteger leftmostbit = 1 << shift;
Works for any number of bits (for any value >= 1).

If you don't want to use a table lookup, I would use 31 - __builtin_clz(yourNumber).
__builtin_clz( ) is a compiler intrinsic supported by gcc, llvm-gcc, and clang (and possibly other compilers as well). It returns the number of leading zero bits in an integer argument. Subtracting that from 31 gives you the position of the highest-order set bit. It should generate reasonably fast code on any target architecture.
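A minimal usage sketch (the guard matters because __builtin_clz(0) is undefined):
#include <stdint.h>

static inline int highestSetBit(uint32_t x)
{
    return x ? 31 - __builtin_clz(x) : -1; // bit position of the leftmost 1, or -1 for 0
}
// highestSetBit(9) == 3, and 1 << 3 == 8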

Stanford Bit Twiddling Hacks have lots of examples of how to accomplish this.

If you mean the value of whatever bit is in position five from the right (the "leftmost" of a five-bit value), then:
int value = 17;
int bit = (value >> 4) & 1; // bit is 1
If you mean the position of the leftmost bit that is 1:
int value = 2;
int position;
for (position = 4; position >= 0; position--) { // scan from the leftmost of the five bits downwards
    int bit = (value >> position) & 1;
    if (bit == 1)
        break;
}
// position is 1
Position will be 4 for the leftmost bit of your five-bit value, 0 for the bit furthest to the right, or -1 if all bits were zero.
Note: this is not the most efficient solution in terms of clock cycles. It is hopefully a reasonably clear and educational one. :)

To clear all bits below the most significant bit:
while ( x & (x-1) ) x &= x - 1;
// 01001 => 01000
To clear all bits above the least significant bit:
x &= -x;
// 01001 => 00001
To get the position of the only set bit in a byte:
position = ((0x56374210>>(((((x)&-(x))*0x17)>>3)&0x1C))&0x07);
// 01000 => 3
In libkern.h there is a clz function defined to count leading zeros in a 32-bit int. That is the closest thing to a native Objective-C function. To get the position of the most significant bit in an int:
position = 31 - clz( x );
// 01001 => 3

I don't know Objective-C, but this is how I would do it in C:
pow(2, (int)log2(number))
This should give you the leftmost 1 bit's value.
PLEASE SEE STEPHEN CANON'S COMMENT BELOW BEFORE USING THIS SOLUTION.

With VC++, have a look at
_BitScanReverse / _BitScanReverse64 in <intrin.h>
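A minimal sketch of how the intrinsic might be wrapped (it stores the index of the highest set bit through a pointer and returns 0 when the input is 0):
#include <intrin.h>

int highestBitPosition(unsigned long x)
{
    unsigned long index;
    return _BitScanReverse(&index, x) ? (int)index : -1;
}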

Related

16-digit number manipulation on a 32-bit programming language

I have a simple problem, but because this "programming language" I am using is 32-bit and only supports basic functions such as addition, subtraction, multiplication, division, and concatenation (literally that's it), I am having some trouble.
For the input, I have a 16 digit number like so: 3334,5678,9523,4567
I want to then subtract 2 other random 16 digit numbers from this number and check if the first and last digits are 1.
For example, if the two other numbers are 1111,1111,1111,1111 and 1234,5678,9123,4565.
My final number would be: 0988,8888,9288,8891.
Here, the last number is 1, but the first number is 0, so the test would fail.
The issue is that on 32-bit systems there are massive errors, because the bits don't provide enough precision. What are some ways to bypass this issue?
If you're using a language like C or Java you should be able to use a long to create a 64 bit integer. If that's not possible you could divide the numbers into two 32 bit numbers, one to hold the upper half and one to hold the lower half.
Something like this:
//Each half is 8 digits to represent 8 of the 16
//Because of this each half should be less than 100000000
int upperHalf = 33345678;
int lowerHalf = 95234567;
//randomInt represents a function to generate a random
//integer equal to or greater than 0 and less than the
//argument passed to it
int randUpperHalf = randomInt(100000000);
int randLowerHalf = randomInt(100000000);
lowerHalf = lowerHalf - randLowerHalf;
//If lowerHalf was a negative number you need to borrow from the upperHalf
if (lowerHalf < 0) {
    upperHalf = upperHalf - 1;
    lowerHalf = lowerHalf + 100000000;
}
upperHalf = upperHalf - randUpperHalf;
//Check that the first and last digits are 1
//(upperHalf holds 8 digits, so its first digit is upperHalf / 10000000, i.e. 10^7)
if ((upperHalf / 10000000) == 1 && (lowerHalf % 10) == 1) {
    //The first and last digits are 1
}
Edit: Comments have been added to explain the code better. (lowerHalf % 2) == 1 has been changed to (lowerHalf % 10) == 1 and should now be able to tell if the number ends in a 1.
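If 64-bit integers are available, the whole check collapses to a few lines. A minimal C sketch of that first suggestion (the names are illustrative, and it assumes the difference stays non-negative):
#include <stdint.h>

int firstAndLastAreOne(int64_t start, int64_t r1, int64_t r2)
{
    int64_t result = start - r1 - r2;                  // 16-digit values fit easily in 64 bits
    int64_t firstDigit = result / 1000000000000000LL;  // 10^15 isolates the first of 16 digits
    int64_t lastDigit = result % 10;
    return firstDigit == 1 && lastDigit == 1;
}
// 3334567895234567 - 1111111111111111 - 1234567891234565 = 988888892888891;
// read as 16 digits it is 0988888892888891, whose first digit is 0, so the test fails.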

Difference between 1 and 1'b1 in Verilog

What is the difference between just giving 1 and giving 1'b1 in verilog code?
The 1 is 32 bits wide, and thus is equivalent to 32'b00000000_00000000_00000000_00000001.
The 1'b1 is one bit wide.
There are several places where you should be aware of the difference in width, but the one most likely to catch you out is in concatenations: {}
wire [7:0] A;
wire [8:0] B;
assign A = 8'b10100101;
assign B = {1'b1,A}; // B is 9'b110100101
assign B = {1,A}; // B is 9'b110100101
assign B = {A,1'b1}; // B is 9'b101001011
assign B = {A,1}; // B is 9'b000000001 !!!!
So, what's the difference between, say,
logic [7:0] count;
...
count <= count + 1'b1;
and
logic [7:0] count;
...
count <= count + 1;
Not a lot. In the first case your simulator/synthesiser will do this:
i) expand the 1'b1 to 8'b1 (because count is 8 bits wide)
ii) do all the maths using 8 bits (because now everything is 8 bits wide).
In the second case your simulator/synthesiser will do this:
i) do all the maths using 32 bits (because 1 is 32 bits wide)
ii) truncate the 32-bit result to 8 bits wide (because count is 8 bits wide)
The behaviour will be the same. However, that is not always the case. This:
count <= (count * 8'd255) >> 8;
and this:
count <= (count * 255) >> 8;
will behave differently. In the first case, 8 bits will be used for the multiplication (the width of the 8 in the >> 8 is irrelevant) and so the multiplication will overflow; in the second case, 32 bits will be used for the multiplication and so everything will be fine.
1'b1 is a binary, unsigned, 1-bit wide integral value. In the original Verilog specification, 1 had the same type as integer. It was signed, but its width was unspecified. A tool could choose the width based on its host implementation of the int type.
Since Verilog 2001 and SystemVerilog 2005, the width of integer and int has been fixed at 32 bits. However, because of this original unspecified width, and the fact that so many people write 0 or 1 without realizing that it is now 32 bits wide, the standard does not allow you to use an unsized literal inside a concatenation. {A,1} is illegal.

How does this color blending trick that works on color components in parallel work?

I saw this Java code that does a perfect 50% blend between two RGB888 colors extremely efficiently:
public static int blendRGB(int a, int b) {
    return (a + b - ((a ^ b) & 0x00010101)) >> 1;
}
That's apparently equivalent to extracting and averaging the channels individually. Something like this:
public static int blendRGB_(int a, int b) {
    int aR = a >> 16;
    int bR = b >> 16;
    int aG = (a >> 8) & 0xFF;
    int bG = (b >> 8) & 0xFF;
    int aB = a & 0xFF;
    int bB = b & 0xFF;
    int cR = (aR + bR) >> 1;
    int cG = (aG + bG) >> 1;
    int cB = (aB + bB) >> 1;
    return (cR << 16) | (cG << 8) | cB;
}
But the first way is much more efficient. My questions are: How does this magic work? What else can I do with it? And are there more tricks similar to this?
(a ^ b) & 0x00010101 is what the least significant bits of the channels would have been in a + b if no carry had come from the right.
Subtracting it from the sum guarantees that the bit that is shifted into the most significant bit of the next channel is just the carry from that channel, untainted by this channel. Of course that also means that this channel is no longer affected by the carry from the next channel.
Another way to look at it (not the way it actually does it, but a way that may help you understand) is that effectively the inputs are changed so that their sum is even for all channels. The carries then go nicely into the least significant bits (which are zero, because even), without disturbing anything. Of course what it actually does is sort of the other way around: first it just sums them, and only then does it ensure that the sums are even for all channels. But the order doesn't matter.
More concretely, there are 4 cases (before the carry from the next channel is applied):
the lsb of a channel is 0 and there is no carry from the next channel.
the lsb of a channel is 0 and there is a carry from the next channel.
the lsb of a channel is 1 and there is no carry from the next channel.
the lsb of a channel is 1 and there is a carry from the next channel.
The first two cases are trivial. The shift puts the carried bit back in the channel it belongs to; it doesn't even matter whether it was 0 or 1.
Case 3 is more interesting. If the lsb is 1, that means the shift would shift that bit into the most significant bit of the next channel. That's bad. That bit has to be unset somehow - but you can't just mask it away because maybe you're in case 4.
Case 4 is the most interesting. If the lsb is 1 and there is a carry into that bit, it rolls over to a 0 and the carry is propagated. That can't be undone by masking, but it can be done by reversing the process, ie subtracting 1 from the lsb (which puts it back to 1 and undoes any damage done by the propagated carry).
As you can see, in both case 3 and case 4, the cure is subtracting 1 from the lsb, and those are also the cases in which the lsb really wanted to be 1 (though maybe it isn't anymore, due to a carry from the next channel); and in both case 1 and 2, you don't have to do anything (in other words, subtract 0). That exactly corresponds to subtracting "what the lsb would have been in a + b if no carry had come from the right".
Also, the blue channel can only fall into cases 1 or 3 (there is no next channel which could carry), and the shift would just discard that bit instead of putting it in the next channel (because there is none). So alternatively, you may write (note the mask has lost the least significant 1)
public static int blendRGB(int a, int b) {
    return (a + b - ((a ^ b) & 0x00010100)) >> 1;
}
Doesn't really make any difference, though.
To make it work for ARGB8888, you can switch to the good old "SWAR average":
// channel-by-channel average, no alpha blending
public static int blendARGB(int a, int b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFE) >>> 1);
}
Which is a variation on a recursive way to define addition: x + y = (x ^ y) + ((x & y) << 1) which computes the sum without carries, then adds the carries in separately. The base case is when one of the operands is zero.
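As an aside, that decomposition can be iterated until the carry term vanishes. A toy C sketch (not from the answer) of adding via XOR and carries:
#include <stdint.h>

uint32_t addViaXor(uint32_t x, uint32_t y)
{
    while (y != 0)
    {
        uint32_t carry = (x & y) << 1; // carries generated at each bit position
        x = x ^ y;                     // sum without the carries
        y = carry;                     // add the carries back in next round
    }
    return x;
}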
In the average expression, both halves are effectively shifted right by 1, in such a way that the carry out of the most significant bit of a channel is never lost. The mask ensures that bits don't move into the channel to the right, and simultaneously ensures that a carry won't propagate out of its channel.
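For the curious, here is a quick self-contained C translation (not from the answer; C's >> on unsigned values plays the role of Java's >>>) that checks the SWAR average against the per-channel version:
#include <stdint.h>
#include <stdio.h>

static uint32_t blendARGB(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

static uint32_t blendPerChannel(uint32_t a, uint32_t b)
{
    uint32_t c = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t avg = (((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu)) >> 1;
        c |= avg << shift;
    }
    return c;
}

int main(void)
{
    uint32_t a = 0xFF112233u, b = 0x80FFEEDDu;
    printf("%08X %08X\n", blendARGB(a, b), blendPerChannel(a, b)); // both print BF888888
    return 0;
}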

Objective C - Random results is either 1 or -1

I am trying to randomly generate a positive or negative number, and rather than worry about the bigger range I am hoping to randomly generate either 1 or -1 to just multiply by my other random number.
I know this can be done with a longer rule of generating 0 or 1 and then checking the return value and using that to either multiply by 1 or -1.
Hoping someone knows of an easier way to just randomly set the sign on a number. Trying to keep my code as clean as possible.
I like to use arc4random() because it doesn't require you to seed the random number generator. It also conveniently returns a uint32_t, so you don't have to worry about the result being between 0 and 1, etc. It'll just give you a random integer.
int myRandom() {
    return (arc4random() % 2 ? 1 : -1);
}
If I understand the question correctly, you want a pseudorandom sequence of 1 and -1:
int f(void)
{
    return random() & 1 ? 1 : -1;
    // or...
    // return 2 * (random() & 1) - 1;
    // or...
    // return ((random() & 1) << 1) - 1;
    // or...
    // return (random() & 2) - 1; // This one from Chris Lutz
}
Update: Ok, something has been bothering me since I wrote this. One of the frequent weaknesses of common RNGs is that the low order bits can go through short cycles. It's probably best to test a higher-order bit: random() & 0x80000 ? 1 : -1
To generate either 1 or -1 directly, you could do:
int PlusOrMinusOne() {
    return (rand() % 2) * 2 - 1;
}
But why are you worried about the broader range?
return ( (arc4random() & 2) - 1 ); // & 2 yields 0 or 2, so this gives -1 or 1
This extra step won't give you any additional "randomness". Just generate your number straight away in the range that you need (e.g. -10..10).
Standard rand() in C returns an integer value from the range 0..RAND_MAX; divide by RAND_MAX to get a value from the range 0..1.
You can multiply it by a constant to increase the span of the range, or you can add a constant to push it left/right on the X-axis.
E.g. to generate random values from the (-5..10) range you will have:
(rand() / (double)RAND_MAX) * 15 - 5
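Sticking to integers, the equivalent mapping might look like this (a sketch; % 16 gives the 16 values of the -5..10 range):
int r = rand() % 16 - 5; // 0..15, shifted to -5..10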
rand will give you a number from 0 to RAND_MAX, which (assuming RAND_MAX == INT_MAX, as on many platforms) covers every bit in an int except for the sign. By shifting that result left 1 bit you turn the signed MSB into the sign, but have zeroed out the 0th bit, which you can repopulate with a random bit from another call to rand. The code will look something like:
int my_rand()
{
    // cast to unsigned so the left shift can't overflow a signed int
    return (int)(((unsigned)rand() << 1) + (rand() & 1));
}

What's the fastest way to divide an integer by 3?

int x = n / 3; // <-- make this faster
// for instance
int a = n * 3; // <-- normal integer multiplication
int b = (n << 1) + n; // <-- potentially faster multiplication
The guy who said "leave it to the compiler" was right, but I don't have the "reputation" to mod him up or comment. I asked gcc to compile int test(int a) { return a / 3; } for an ix86 and then disassembled the output. Just for academic interest, what it's doing is roughly multiplying by 0x55555556 and then taking the top 32 bits of the 64-bit result of that. You can demonstrate this to yourself with e.g.:
$ ruby -e 'puts(60000 * 0x55555556 >> 32)'
20000
$ ruby -e 'puts(72 * 0x55555556 >> 32)'
24
$
The Wikipedia page on Montgomery division is hard to read, but fortunately the compiler guys have done it so you don't have to.
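In C, the same trick looks something like this (a sketch; it assumes arithmetic right shift on negative values, which mainstream compilers provide, and the final term corrects negative inputs so the result truncates toward zero like C division):
#include <stdint.h>

int div3(int n)
{
    int hi = (int)(((int64_t)n * 0x55555556LL) >> 32); // top 32 bits of the 64-bit product
    return hi - (n >> 31);                             // adds 1 back for negative n
}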
This is the fastest option, as the compiler will optimize it if it can, depending on the target processor.
int a;
int b;
a = some value;
b = a / 3;
There is a faster way to do it if you know the ranges of the values: for example, if you are dividing a signed integer by 3 and you know the range of the value to be divided is 0 to 768, then you can multiply it by a scaled reciprocal and shift the result right by the corresponding power of 2.
eg.
Range 0 -> 768
You could use a shift of 11 bits, which is multiplying by 2048; you want to divide by 3, so your multiplier should be round(2048 / 3) = 683,
so you can now use (x * 683) >> 11, which is exact for the whole range 0..2047. (Beware the tempting 10-bit version, (x * 341) >> 10: since 3 * 341 = 1023 < 1024, it comes out one too low for every exact multiple of 3.)
(Make sure the shift is a signed shift if using signed integers); also make sure the shift is an actual shift and not a bit ROLL.
This will effectively divide the value by 3, and will run at about 1.6 times the speed of a natural divide by 3 on a standard x86 / x64 CPU.
Of course the only reason you can make this optimization when the compiler can't is because the compiler does not know the maximum range of X and therefore cannot make this determination, but you as the programmer can.
Sometimes it may even be more beneficial to move the value into a larger type and then do the same thing; i.e. if you have an int of full range you could make it a 64-bit value and then do the multiply and shift instead of dividing by 3.
I had to do this recently to speed up image processing: I needed to find the average of 3 color channels, each color channel with a byte range (0 - 255): red, green and blue.
At first I just simply used:
avg = (r + g + b) / 3;
(So r + g + b has a maximum of 765 and a minimum of 0, because each channel is a byte 0 - 255)
After millions of iterations the entire operation took 36 milliseconds.
I changed the line to:
avg = (r + g + b) * 683 >> 11;
And that took it down to 22 milliseconds; it's amazing what can be done with a little ingenuity.
This speed up occurred in C# even though I had optimisations turned on and was running the program natively without debugging info and not through the IDE.
See How To Divide By 3 for an extended discussion of more efficiently dividing by 3, focused on doing FPGA arithmetic operations.
Also relevant:
Optimizing integer divisions with Multiply Shift in C#
Depending on your platform and depending on your C compiler, a native solution like just using
y = x / 3
can be fast or it can be awfully slow (even if division is done entirely in hardware; if it is done using a DIV instruction, this instruction is about 3 to 4 times slower than a multiplication on modern CPUs). Very good C compilers with optimization flags turned on may optimize this operation, but if you want to be sure, you are better off optimizing it yourself.
For optimization it is important to have integer numbers of a known size. In C, int has no known size (it can vary by platform and compiler!), so you are better off using C99 fixed-size integers. The code below assumes that you want to divide an unsigned 32-bit integer by three and that your C compiler knows about 64-bit integer numbers (NOTE: even on a 32-bit CPU architecture most C compilers can handle 64-bit integers just fine):
static inline uint32_t divby3(uint32_t divideMe)
{
    return (uint32_t)(((uint64_t)0xAAAAAAABULL * divideMe) >> 33);
}
As crazy as this might sound, the method above indeed does divide by 3. All it needs for doing so is a single 64-bit multiplication and a shift (like I said, multiplications might be 3 to 4 times faster than divisions on your CPU). In a 64-bit application this code will be a lot faster than in a 32-bit application (in a 32-bit application, multiplying two 64-bit numbers takes 3 multiplications and 3 additions on 32-bit values) - however, it might still be faster than a division on a 32-bit machine.
On the other hand, if your compiler is a very good one and knows how to optimize integer division by a constant (latest GCC does, I just checked), it will generate the code above anyway (GCC will create exactly this code for "/3" if you enable at least optimization level 1). For other compilers... you cannot rely on or expect them to use tricks like that, even though this method is very well documented and mentioned everywhere on the Internet.
The problem is that it only works for constant divisors, not for variable ones. You always need to know the magic number (here 0xAAAAAAAB) and the correct operations after the multiplication (shifts and/or additions in most cases); both are different depending on the number you want to divide by, and both take too much CPU time to calculate on the fly (that would be slower than hardware division). However, it's easy for a compiler to calculate these during compile time (where a second more or less of compile time hardly plays a role).
For 64-bit numbers:
uint64_t divBy3(uint64_t x)
{
    return x * 12297829382473034411ULL;
}
However this isn't the truncating integer division you might expect.
It works correctly if the number is already divisible by 3, but it returns a huge number if it isn't.
For example, if you run it on 11, it returns 6148914691236517209. This looks like garbage, but it's in fact the correct answer: multiply it by 3 and you get back the 11!
If you are looking for the truncating division, then just use the / operator. I highly doubt you can get much faster than that.
Theory:
64-bit unsigned arithmetic is arithmetic modulo 2^64.
This means for each integer which is coprime with the 2^64 modulus (essentially all odd numbers) there exists a multiplicative inverse which you can use to multiply with instead of division. This magic number can be obtained by solving the 3*x + 2^64*y = 1 equation using the Extended Euclidean Algorithm.
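Besides extended Euclid, a compact way to compute such an inverse (a sketch, not from the answer) is Newton's iteration, where each multiplication doubles the number of correct low-order bits:
#include <stdint.h>

uint64_t inverseMod2to64(uint64_t a) // a must be odd
{
    uint64_t x = a;         // correct to 3 bits, since a*a == 1 (mod 8) for odd a
    for (int i = 0; i < 5; i++)
        x *= 2 - a * x;     // 3 -> 6 -> 12 -> 24 -> 48 -> 96 correct bits
    return x;
}
// inverseMod2to64(3) == 12297829382473034411ULL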
What if you really don't want to multiply or divide? Here is an approximation I just invented. It works because (x/3) = (x/4) + (x/12). But since (x/12) = (x/4) / 3, we just have to repeat the process until it's good enough.
#include <stdio.h>
int main(void)
{
    int n = 1000;
    int a, b;
    a = n >> 2;
    b = (a >> 2);
    a += b;
    b = (b >> 2);
    a += b;
    b = (b >> 2);
    a += b;
    b = (b >> 2);
    a += b;
    printf("a=%d\n", a);
    return 0;
}
The result is 330. It could be made more accurate using b = ((b+2)>>2); to account for rounding.
If you are allowed to multiply, just pick a suitable approximation for (1/3), with a power-of-2 divisor. For example, n * (1/3) ~= n * 43 / 128 = (n * 43) >> 7.
This technique is most useful in Indiana.
I don't know if it's faster, but if you want to use a bitwise operator to perform binary division you can use the shift-and-subtract method described at this page (a C sketch follows the steps below):
Set quotient to 0
Align leftmost digits in dividend and divisor
Repeat:
If that portion of the dividend above the divisor is greater than or equal to the divisor:
Then subtract the divisor from that portion of the dividend and
Concatenate 1 to the right hand end of the quotient
Else concatenate 0 to the right hand end of the quotient
Shift the divisor one place right
Until dividend is less than the divisor:
quotient is correct, dividend is remainder
STOP
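A minimal C sketch of those steps (restoring division; the remainder is held in 64 bits so the shift can't overflow, and divisor must be nonzero):
#include <stdint.h>

uint32_t divShiftSubtract(uint32_t dividend, uint32_t divisor)
{
    uint32_t quotient = 0;
    uint64_t remainder = 0;
    for (int i = 31; i >= 0; i--)
    {
        remainder = (remainder << 1) | ((dividend >> i) & 1); // bring down the next bit
        quotient <<= 1;
        if (remainder >= divisor) // the divisor fits: subtract and record a 1
        {
            remainder -= divisor;
            quotient |= 1;
        }
    }
    return quotient; // remainder now holds dividend % divisor
}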
For really large integer division (e.g. numbers bigger than 64 bit) you can represent your number as an int[] and perform division quite fast by taking two digits at a time and dividing them by 3. The remainder will be part of the next two digits, and so forth.
e.g. for 11004 / 3 you say:
11/3 = 3, remainder = 2 (from 11-3*3)
20/3 = 6, remainder = 2 (from 20-6*3)
20/3 = 6, remainder = 2 (from 20-6*3)
24/3 = 8, remainder = 0
hence the result 3668
internal static List<int> Div3(int[] a)
{
    int remainder = 0;
    var res = new List<int>();
    for (int i = 0; i < a.Length; i++)
    {
        var val = remainder + a[i];
        var div = val / 3;
        remainder = 10 * (val % 3);
        if (div > 9)
        {
            res.Add(div / 10);
            res.Add(div % 10);
        }
        else
            res.Add(div);
    }
    if (res[0] == 0) res.RemoveAt(0);
    return res;
}
If you really want to, see this article on integer division, but it only has academic merit... it would be interesting to find an application that actually benefited from that kind of trick.
Easy computation, at most n iterations where n is your number of bits:
uint8_t divideby3(uint8_t x)
{
    // Sums the alternating series x/2 - x/4 + x/8 - x/16 + ...,
    // which converges to x/3; because each term is truncated,
    // the result can be off by one.
    int term = x;
    int sign = 1;
    int answer = 0;
    while (term >>= 1)
    {
        answer += sign * term;
        sign = -sign;
    }
    return (uint8_t)answer;
}
A lookup table approach would also be faster on some architectures.
uint8_t DivBy3LU(uint8_t u8Operand)
{
    static const uint8_t ai8Div3[256] = {0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, /* ... up to 255/3 = 85 */};
    return ai8Div3[u8Operand];
}