How do you multiply two fixed point numbers? - multiplication

I am currently trying to figure out how to multiply two numbers in fixed point representation.
Say my number representation is as follows:
[SIGN][2^0].[2^-1][2^-2]..[2^-14]
In my case, the number 10.01000000000000 = -0.25.
How would I for example do 0.25x0.25 or -0.25x0.25 etc?
Hope you can help!

You should use 2's complement representation instead of a seperate sign bit. It's much easier to do maths on that, no special handling is required. The range is also improved because there's no wasted bit pattern for negative 0. To multiply, just do as normal fixed-point multiplication. The normal Q2.14 format will store value x/214 for the bit pattern of x, therefore if we have A and B then
So you just need to multiply A and B directly then divide the product by 214 to get the result back into the form x/214 like this
AxB = ((int32_t)A*B) >> 14;
A rounding step is needed to get the nearest value. You can find the way to do it in Q number format#Math operations. The simplest way to round to nearest is just add back the bit that was last shifted out (i.e. the first fractional bit) like this
AxB = (int32_t)A*B;
AxB = (AxB >> 14) + ((AxB >> 13) & 1);
You might also want to read these
Fixed-point arithmetic.
Emulated Fixed Point Division/Multiplication
Fixed point math in c#?
With 2 bits you can represent the integer range of [-2, 1]. So using Q2.14 format, -0.25 would be stored as 11.11000000000000. Using 1 sign bit you can only represent -1, 0, 1, and it makes calculations more complex because you need to split the sign bit then combine it back at the end.

Multiply into a larger sized variable, and then right shift by the number of bits of fixed point precision.

Here's a simple example in C:
int a = 0.25 * (1 << 16);
int b = -0.25 * (1 << 16);
int c = (a * b) >> 16;
printf("%.2f * %.2f = %.2f\n", a / 65536.0, b / 65536.0 , c / 65536.0);
You basically multiply everything by a constant to bring the fractional parts up into the integer range, then multiply the two factors, then (optionally) divide by one of the constants to return the product to the standard range for use in future calculations. It's like multiplying prices expressed in fractional dollars by 100 and then working in cents (i.e. $1.95 * 100 cents/dollar = 195 cents).
Be careful not to overflow the range of the variable you are multiplying into. Your constant might need to be smaller to avoid overflow, like using 1 << 8 instead of 1 << 16 in the example above.

Related

32-bit fractional multiplication with cross-multiplication method (no 64-bit intermediate result)

I am programming a fixed-point speech enhancement algorithm on a 16-bit processor. At some point I need to do 32-bit fractional multiplication. I have read other posts about doing 32-bit multiplication byte by byte and I see why this works for Q0.31 formats. But I use different Q formats with varying number of fractional bits.
So I have found out that for fractional bits less than 16, this works:
(low*low >> N) + low*high + high*low + (high*high << N)
where N is the number of fractional bits. I have read that the low*low result should be unsigned as well as the low bytes themselves. In general this gives exactly the result I want in any Q format with less than 16 fractional bits.
Now it gets tricky when the fractional bits are more than 16. I have tried out several numbers of shifts, different shifts for low*low and high*high I have tried to put it on paper, but I can't figure it out.
I know it may be very simple but the whole idea eludes me and I would be grateful for some comments or guidelines!
It's the same formula. For N > 16, the shifts just mean you throw out a whole 16-bit word which would have over- or underflowed. low*low >> N means just shift N-16 bit in the high word of the 32-bit result of the multiply and add to the low word of the result. high * high << N means just use the low word of the multiply result shifted left N-16 and add to the high word of the result.
There are a few ideas at play.
First, multiplication of 2 shorter integers to produce a longer product. Consider unsigned multiplication of 2 32-bit integers via multiplications of their 16-bit "halves", each of which produces a 32-bit product and the total product is 64-bit:
a * b = (a_hi * 216 + a_lo) * (b_hi * 216 + b_lo) =
a_hi * b_hi * 232 + (a_hi * b_lo + a_lo * b_hi) * 216 + a_lo * b_lo.
Now, if you need a signed multiplication, you can construct it from unsigned multiplication (e.g. from the above).
Supposing a < 0 and b >= 0, a *signed b must be equal
264 - ((-a) *unsigned b), where
-a = 232 - a (because this is 2's complement)
IOW,
a *signed b =
264 - ((232 - a) *unsigned b) =
264 + (a *unsigned b) - (b * 232), where 264 can be discarded since we're using 64 bits only.
In exactly the same way you can calculate a *signed b for a >= 0 and b < 0 and must get a symmetric result:
(a *unsigned b) - (a * 232)
You can similarly show that for a < 0 and b < 0 the signed multiplication can be built on top of the unsigned multiplication this way:
(a *unsigned b) - ((a + b) * 232)
So, you multiply a and b as unsigned first, then if a < 0, you subtract b from the top 32 bits of the product and if b < 0, you subtract a from the top 32 bits of the product, done.
Now that we can multiply 32-bit signed integers and get 64-bit signed products, we can finally turn to the fractional stuff.
Suppose now that out of those 32 bits in a and b N bits are used for the fractional part. That means that if you look at a and b as at plain integers, they are going to be 2N times greater than what they really represent, e.g. 1.0 is going to look like 2N (or 1 << N).
So, if you multiply two such integers the product is going to be 2N*2N = 22*N times greater than what it should represent, e.g. 1.0 * 1.0 is going to look like 22*N (or 1 << (2*N)). IOW, plain integer multiplication is going to double the number of fractional bits. If you want the product to
have the same number of fractional bits as in the multiplicands, what do you do? You divide the product by 2N (or shift it arithmetically N positions right). Simple.
A few words of caution, just in case...
In C (and C++) you cannot legally shift a variable left or right by the same or greater number of bits contained in the variable. The code will compile, but not work as you may expect it to. So, if you want to shift a 32-bit variable, you can shift it by 0 through 31 positions left or right (31 is the max, not 32).
If you shift signed integers left, you cannot overflow the result legally. All signed overflows result in undefined behavior. So, you may want to stick to unsigned.
Right shifts of negative signed integers are implementation-specific. They can either do an arithmetic shift or a logical shift. Which one, it depends on the compiler. So, if you need one of the two you need to either ensure that your compiler just supports it directly
or implement it in some other ways.

What do the operators '<<' and '>>' do?

I was following 'A tour of GO` on http://tour.golang.org.
The table 15 has some code that I cannot understand. It defines two constants with the following syntax:
const (
Big = 1<<100
Small = Big>>99
)
And it's not clear at all to me what it means. I tried to modify the code and run it with different values, to record the change, but I was not able to understand what is going on there.
Then, it uses that operator again on table 24. It defines a variable with the following syntax:
MaxInt uint64 = 1<<64 - 1
And when it prints the variable, it prints:
uint64(18446744073709551615)
Where uint64 is the type. But I can't understand where 18446744073709551615 comes from.
They are Go's bitwise shift operators.
Here's a good explanation of how they work for C (they work in the same way in several languages).
Basically 1<<64 - 1 corresponds to 2^64 -1, = 18446744073709551615.
Think of it this way. In decimal if you start from 001 (which is 10^0) and then shift the 1 to the left, you end up with 010, which is 10^1. If you shift it again you end with 100, which is 10^2. So shifting to the left is equivalent to multiplying by 10 as many times as the times you shift.
In binary it's the same thing, but in base 2, so 1<<64 means multiplying by 2 64 times (i.e. 2 ^ 64).
That's the same as in all languages of the C family : a bit shift.
See http://en.wikipedia.org/wiki/Bitwise_operation#Bit_shifts
This operation is commonly used to multiply or divide an unsigned integer by powers of 2 :
b := a >> 1 // divides by 2
1<<100 is simply 2^100 (that's Big).
1<<64-1 is 2⁶⁴-1, and that's the biggest integer you can represent in 64 bits (by the way you can't represent 1<<64 as a 64 bits int and the point of table 15 is to demonstrate that you can have it in numerical constants anyway in Go).
The >> and << are logical shift operations. You can see more about those here:
http://en.wikipedia.org/wiki/Logical_shift
Also, you can check all the Go operators in their webpage
It's a logical shift:
every bit in the operand is simply moved a given number of bit
positions, and the vacant bit-positions are filled in, usually with
zeros
Go Operators:
<< left shift integer << unsigned integer
>> right shift integer >> unsigned integer

How to round down a float to the nearest value that can be divided by two without rest?

For example I have a float 55.2f and want to round it down such that the result can be divided by two without rest.
So 55.2 would become 54 as that is the nearest smaller "step" that can be divided by two. Is there a function for this or must I write my own algorithm?
If your result must remain a float, you can do:
float f=55.2f;
f=floorf(f/2.f)*2.f;
First convert to an integral type, such as int or long, and then clear the lowest bit.
float f = 55.2f;
int i = (int)f & ~1;
Explanation
~ means the bitwise inverse, i.e. all the 0 bits become 1, and vice versa.
So, if 1 has the bit pattern
0...0001
then ~1 is
1...1110
(Here I'm using ... to represent all the in-between bits depending on how big an integer is on your platform.)
When you & (bitwise AND) your integer with 1...1110, you are preserving the value of each bit apart from the lowest bit, which is forced to 0. See this description of the bitwise AND operator if you still don't get it.
By forcing the lowest bit to be 0, you are rounding the number down to the nearest even number.
You can write your own algorithm, for example with bitwise operators.
The following code works with clearing the last bit of your number. An even number has indeed the last bit not set.
int
f(float x)
{
return (int)x & ~1;
}
How about long int f = lrintf(x / 2);, where x is your float?
You could also just say int f = x / 2;, but some people have argued that that's more expensive, because the C standard mandates a specific rounding mode which may or may not be native to the CPU. The lrintf function on the other hand uses the CPU's native rounding mode. You need to #include <math.h>.

Recognizing when to use the modulus operator

I know the modulus (%) operator calculates the remainder of a division. How can I identify a situation where I would need to use the modulus operator?
I know I can use the modulus operator to see whether a number is even or odd and prime or composite, but that's about it. I don't often think in terms of remainders. I'm sure the modulus operator is useful, and I would like to learn to take advantage of it.
I just have problems identifying where the modulus operator is applicable. In various programming situations, it is difficult for me to see a problem and realize "Hey! The remainder of division would work here!".
Imagine that you have an elapsed time in seconds and you want to convert this to hours, minutes, and seconds:
h = s / 3600;
m = (s / 60) % 60;
s = s % 60;
0 % 3 = 0;
1 % 3 = 1;
2 % 3 = 2;
3 % 3 = 0;
Did you see what it did? At the last step it went back to zero. This could be used in situations like:
To check if N is divisible by M (for example, odd or even)
or
N is a multiple of M.
To put a cap of a particular value. In this case 3.
To get the last M digits of a number -> N % (10^M).
I use it for progress bars and the like that mark progress through a big loop. The progress is only reported every nth time through the loop, or when count%n == 0.
I've used it when restricting a number to a certain multiple:
temp = x - (x % 10); //Restrict x to being a multiple of 10
Wrapping values (like a clock).
Provide finite fields to symmetric key algorithms.
Bitwise operations.
And so on.
One use case I saw recently was when you need to reverse a number. So that 123456 becomes 654321 for example.
int number = 123456;
int reversed = 0;
while ( number > 0 ) {
# The modulus here retrieves the last digit in the specified number
# In the first iteration of this loop it's going to be 6, then 5, ...
# We are multiplying reversed by 10 first, to move the number one decimal place to the left.
# For example, if we are at the second iteration of this loop,
# reversed gonna be 6, so 6 * 10 + 12345 % 10 => 60 + 5
reversed = reversed * 10 + number % 10;
number = number / 10;
}
Example. You have message of X bytes, but in your protocol maximum size is Y and Y < X. Try to write small app that splits message into packets and you will run into mod :)
There are many instances where it is useful.
If you need to restrict a number to be within a certain range you can use mod. For example, to generate a random number between 0 and 99 you might say:
num = MyRandFunction() % 100;
Any time you have division and want to express the remainder other than in decimal, the mod operator is appropriate. Things that come to mind are generally when you want to do something human-readable with the remainder. Listing how many items you could put into buckets and saying "5 left over" is good.
Also, if you're ever in a situation where you may be accruing rounding errors, modulo division is good. If you're dividing by 3 quite often, for example, you don't want to be passing .33333 around as the remainder. Passing the remainder and divisor (i.e. the fraction) is appropriate.
As #jweyrich says, wrapping values. I've found mod very handy when I have a finite list and I want to iterate over it in a loop - like a fixed list of colors for some UI elements, like chart series, where I want all the series to be different, to the extent possible, but when I've run out of colors, just to start over at the beginning. This can also be used with, say, patterns, so that the second time red comes around, it's dashed; the third time, dotted, etc. - but mod is just used to get red, green, blue, red, green, blue, forever.
Calculation of prime numbers
The modulo can be useful to convert and split total minutes to "hours and minutes":
hours = minutes / 60
minutes_left = minutes % 60
In the hours bit we need to strip the decimal portion and that will depend on the language you are using.
We can then rearrange the output accordingly.
Converting linear data structure to matrix structure:
where a is index of linear data, and b is number of items per row:
row = a/b
column = a mod b
Note above is simplified logic: a must be offset -1 before dividing & the result must be normalized +1.
Example: (3 rows of 4)
1 2 3 4
5 6 7 8
9 10 11 12
(7 - 1)/4 + 1 = 2
7 is in row 2
(7 - 1) mod 4 + 1 = 3
7 is in column 3
Another common use of modulus: hashing a number by place. Suppose you wanted to store year & month in a six digit number 195810. month = 195810 mod 100 all digits 3rd from right are divisible by 100 so the remainder is the 2 rightmost digits in this case the month is 10. To extract the year 195810 / 100 yields 1958.
Modulus is also very useful if for some crazy reason you need to do integer division and get a decimal out, and you can't convert the integer into a number that supports decimal division, or if you need to return a fraction instead of a decimal.
I'll be using % as the modulus operator
For example
2/4 = 0
where doing this
2/4 = 0 and 2 % 4 = 2
So you can be really crazy and let's say that you want to allow the user to input a numerator and a divisor, and then show them the result as a whole number, and then a fractional number.
whole Number = numerator/divisor
fractionNumerator = numerator % divisor
fractionDenominator = divisor
Another case where modulus division is useful is if you are increasing or decreasing a number and you want to contain the number to a certain range of number, but when you get to the top or bottom you don't want to just stop. You want to loop up to the bottom or top of the list respectively.
Imagine a function where you are looping through an array.
Function increase Or Decrease(variable As Integer) As Void
n = (n + variable) % (listString.maxIndex + 1)
Print listString[n]
End Function
The reason that it is n = (n + variable) % (listString.maxIndex + 1) is to allow for the max index to be accounted.
Those are just a few of the things that I have had to use modulus for in my programming of not just desktop applications, but in robotics and simulation environments.
Computing the greatest common divisor
Determining if a number is a palindrome
Determining if a number consists of only ...
Determining how many ... a number consists of...
My favorite use is for iteration.
Say you have a counter you are incrementing and want to then grab from a known list a corresponding items, but you only have n items to choose from and you want to repeat a cycle.
var indexFromB = (counter-1)%n+1;
Results (counter=indexFromB) given n=3:
`1=1`
`2=2`
`3=3`
`4=1`
`5=2`
`6=3`
...
Best use of modulus operator I have seen so for is to check if the Array we have is a rotated version of original array.
A = [1,2,3,4,5,6]
B = [5,6,1,2,3,4]
Now how to check if B is rotated version of A ?
Step 1: If A's length is not same as B's length then for sure its not a rotated version.
Step 2: Check the index of first element of A in B. Here first element of A is 1. And its index in B is 2(assuming your programming language has zero based index).
lets store that index in variable "Key"
Step 3: Now how to check that if B is rotated version of A how ??
This is where modulus function rocks :
for (int i = 0; i< A.length; i++)
{
// here modulus function would check the proper order. Key here is 2 which we recieved from Step 2
int j = [Key+i]%A.length;
if (A[i] != B[j])
{
return false;
}
}
return true;
It's an easy way to tell if a number is even or odd. Just do # mod 2, if it is 0 it is even, 1 it is odd.
Often, in a loop, you want to do something every k'th iteration, where k is 0 < k < n, assuming 0 is the start index and n is the length of the loop.
So, you'd do something like:
int k = 5;
int n = 50;
for(int i = 0;i < n;++i)
{
if(i % k == 0) // true at 0, 5, 10, 15..
{
// do something
}
}
Or, you want to keep something whitin a certain bound. Remember, when you take an arbitrary number mod something, it must produce a value between 0 and that number - 1.

What's the fastest way to divide an integer by 3?

int x = n / 3; // <-- make this faster
// for instance
int a = n * 3; // <-- normal integer multiplication
int b = (n << 1) + n; // <-- potentially faster multiplication
The guy who said "leave it to the compiler" was right, but I don't have the "reputation" to mod him up or comment. I asked gcc to compile int test(int a) { return a / 3; } for an ix86 and then disassembled the output. Just for academic interest, what it's doing is roughly multiplying by 0x55555556 and then taking the top 32 bits of the 64 bit result of that. You can demonstrate this to yourself with eg:
$ ruby -e 'puts(60000 * 0x55555556 >> 32)'
20000
$ ruby -e 'puts(72 * 0x55555556 >> 32)'
24
$
The wikipedia page on Montgomery division is hard to read but fortunately the compiler guys have done it so you don't have to.
This is the fastest as the compiler will optimize it if it can depending on the output processor.
int a;
int b;
a = some value;
b = a / 3;
There is a faster way to do it if you know the ranges of the values, for example, if you are dividing a signed integer by 3 and you know the range of the value to be divided is 0 to 768, then you can multiply it by a factor and shift it to the left by a power of 2 to that factor divided by 3.
eg.
Range 0 -> 768
you could use shifting of 10 bits, which multiplying by 1024, you want to divide by 3 so your multiplier should be 1024 / 3 = 341,
so you can now use (x * 341) >> 10
(Make sure the shift is a signed shift if using signed integers), also make sure the shift is an actually shift and not a bit ROLL
This will effectively divide the value 3, and will run at about 1.6 times the speed as a natural divide by 3 on a standard x86 / x64 CPU.
Of course the only reason you can make this optimization when the compiler cant is because the compiler does not know the maximum range of X and therefore cannot make this determination, but you as the programmer can.
Sometime it may even be more beneficial to move the value into a larger value and then do the same thing, ie. if you have an int of full range you could make it an 64-bit value and then do the multiply and shift instead of dividing by 3.
I had to do this recently to speed up image processing, i needed to find the average of 3 color channels, each color channel with a byte range (0 - 255). red green and blue.
At first i just simply used:
avg = (r + g + b) / 3;
(So r + g + b has a maximum of 768 and a minimum of 0, because each channel is a byte 0 - 255)
After millions of iterations the entire operation took 36 milliseconds.
I changed the line to:
avg = (r + g + b) * 341 >> 10;
And that took it down to 22 milliseconds, its amazing what can be done with a little ingenuity.
This speed up occurred in C# even though I had optimisations turned on and was running the program natively without debugging info and not through the IDE.
See How To Divide By 3 for an extended discussion of more efficiently dividing by 3, focused on doing FPGA arithmetic operations.
Also relevant:
Optimizing integer divisions with Multiply Shift in C#
Depending on your platform and depending on your C compiler, a native solution like just using
y = x / 3
Can be fast or it can be awfully slow (even if division is done entirely in hardware, if it is done using a DIV instruction, this instruction is about 3 to 4 times slower than a multiplication on modern CPUs). Very good C compilers with optimization flags turned on may optimize this operation, but if you want to be sure, you are better off optimizing it yourself.
For optimization it is important to have integer numbers of a known size. In C int has no known size (it can vary by platform and compiler!), so you are better using C99 fixed-size integers. The code below assumes that you want to divide an unsigned 32-bit integer by three and that you C compiler knows about 64 bit integer numbers (NOTE: Even on a 32 bit CPU architecture most C compilers can handle 64 bit integers just fine):
static inline uint32_t divby3 (
uint32_t divideMe
) {
return (uint32_t)(((uint64_t)0xAAAAAAABULL * divideMe) >> 33);
}
As crazy as this might sound, but the method above indeed does divide by 3. All it needs for doing so is a single 64 bit multiplication and a shift (like I said, multiplications might be 3 to 4 times faster than divisions on your CPU). In a 64 bit application this code will be a lot faster than in a 32 bit application (in a 32 bit application multiplying two 64 bit numbers take 3 multiplications and 3 additions on 32 bit values) - however, it might be still faster than a division on a 32 bit machine.
On the other hand, if your compiler is a very good one and knows the trick how to optimize integer division by a constant (latest GCC does, I just checked), it will generate the code above anyway (GCC will create exactly this code for "/3" if you enable at least optimization level 1). For other compilers... you cannot rely or expect that it will use tricks like that, even though this method is very well documented and mentioned everywhere on the Internet.
Problem is that it only works for constant numbers, not for variable ones. You always need to know the magic number (here 0xAAAAAAAB) and the correct operations after the multiplication (shifts and/or additions in most cases) and both is different depending on the number you want to divide by and both take too much CPU time to calculate them on the fly (that would be slower than hardware division). However, it's easy for a compiler to calculate these during compile time (where one second more or less compile time plays hardly a role).
For 64 bit numbers:
uint64_t divBy3(uint64_t x)
{
return x*12297829382473034411ULL;
}
However this isn't the truncating integer division you might expect.
It works correctly if the number is already divisible by 3, but it returns a huge number if it isn't.
For example if you run it on for example 11, it returns 6148914691236517209. This looks like a garbage but it's in fact the correct answer: multiply it by 3 and you get back the 11!
If you are looking for the truncating division, then just use the / operator. I highly doubt you can get much faster than that.
Theory:
64 bit unsigned arithmetic is a modulo 2^64 arithmetic.
This means for each integer which is coprime with the 2^64 modulus (essentially all odd numbers) there exists a multiplicative inverse which you can use to multiply with instead of division. This magic number can be obtained by solving the 3*x + 2^64*y = 1 equation using the Extended Euclidean Algorithm.
What if you really don't want to multiply or divide? Here is is an approximation I just invented. It works because (x/3) = (x/4) + (x/12). But since (x/12) = (x/4) / 3 we just have to repeat the process until its good enough.
#include <stdio.h>
void main()
{
int n = 1000;
int a,b;
a = n >> 2;
b = (a >> 2);
a += b;
b = (b >> 2);
a += b;
b = (b >> 2);
a += b;
b = (b >> 2);
a += b;
printf("a=%d\n", a);
}
The result is 330. It could be made more accurate using b = ((b+2)>>2); to account for rounding.
If you are allowed to multiply, just pick a suitable approximation for (1/3), with a power-of-2 divisor. For example, n * (1/3) ~= n * 43 / 128 = (n * 43) >> 7.
This technique is most useful in Indiana.
I don't know if it's faster but if you want to use a bitwise operator to perform binary division you can use the shift and subtract method described at this page:
Set quotient to 0
Align leftmost digits in dividend and divisor
Repeat:
If that portion of the dividend above the divisor is greater than or equal to the divisor:
Then subtract divisor from that portion of the dividend and
Concatentate 1 to the right hand end of the quotient
Else concatentate 0 to the right hand end of the quotient
Shift the divisor one place right
Until dividend is less than the divisor:
quotient is correct, dividend is remainder
STOP
For really large integer division (e.g. numbers bigger than 64bit) you can represent your number as an int[] and perform division quite fast by taking two digits at a time and divide them by 3. The remainder will be part of the next two digits and so forth.
eg. 11004 / 3 you say
11/3 = 3, remaineder = 2 (from 11-3*3)
20/3 = 6, remainder = 2 (from 20-6*3)
20/3 = 6, remainder = 2 (from 20-6*3)
24/3 = 8, remainder = 0
hence the result 3668
internal static List<int> Div3(int[] a)
{
int remainder = 0;
var res = new List<int>();
for (int i = 0; i < a.Length; i++)
{
var val = remainder + a[i];
var div = val/3;
remainder = 10*(val%3);
if (div > 9)
{
res.Add(div/10);
res.Add(div%10);
}
else
res.Add(div);
}
if (res[0] == 0) res.RemoveAt(0);
return res;
}
If you really want to see this article on integer division, but it only has academic merit ... it would be an interesting application that actually needed to perform that benefited from that kind of trick.
Easy computation ... at most n iterations where n is your number of bits:
uint8_t divideby3(uint8_t x)
{
uint8_t answer =0;
do
{
x>>=1;
answer+=x;
x=-x;
}while(x);
return answer;
}
A lookup table approach would also be faster in some architectures.
uint8_t DivBy3LU(uint8_t u8Operand)
{
uint8_t ai8Div3 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, ....];
return ai8Div3[u8Operand];
}