How to prove that ClickHouse's 256-bit multiplication is correct? - multiplication

ClickHouse uses a wide integer type for 256-bit integers, but its multiplication does not look straightforward.
https://github.com/ClickHouse/ClickHouse/blob/master/base/base/wide_integer_impl.h#L550
How can its correctness be proved? I am confused by it.
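One way to reason about such a routine is to compare it with plain schoolbook multiplication on 64-bit limbs. Writing x = Σ a_i·2^(64i) and y = Σ b_j·2^(64j), the product is Σ a_i·b_j·2^(64(i+j)), and every term with i + j ≥ 4 is a multiple of 2^256, so it vanishes in a 256-bit result. The sketch below is a generic illustration under that assumption, not ClickHouse's actual code; the names `to_limbs`, `mul_limbs` and `check` are hypothetical. The randomized comparison against Python's big integers is a sanity check, not a proof.

```python
import random

LIMB_BITS = 64
LIMBS = 4                                   # 4 x 64 = 256 bits
MASK = (1 << LIMB_BITS) - 1
MOD = 1 << (LIMB_BITS * LIMBS)

def to_limbs(n):
    """Split n (taken mod 2^256) into four little-endian 64-bit limbs."""
    return [(n >> (LIMB_BITS * i)) & MASK for i in range(LIMBS)]

def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def mul_limbs(a, b):
    """Schoolbook multiplication that keeps only the low 256 bits."""
    res = [0] * LIMBS
    for i in range(LIMBS):
        carry = 0
        # Partial products with i + j >= LIMBS are multiples of 2^256: skipped.
        for j in range(LIMBS - i):
            t = a[i] * b[j] + res[i + j] + carry   # always fits in 128 bits
            res[i + j] = t & MASK
            carry = t >> LIMB_BITS
        # The carry left over here belongs to limb index LIMBS and is dropped.
    return res

def check(trials=1000, seed=0):
    """Compare the limb routine with Python's arbitrary-precision product."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(256), rng.getrandbits(256)
        if from_limbs(mul_limbs(to_limbs(x), to_limbs(y))) != (x * y) % MOD:
            return False
    return True
```

The same approach (write down the limb expansion, then test against a trusted big-integer product) can be applied to the real implementation once its limb layout is known.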

Related

MPFR - Loss of precision after addition

First, sorry if this question looks "silly", because I'm new to MPFR, LOL.
I have two mpfr_t variables with precision of 1024, and they have the value of 0.2 and 0.06 stored in them.
But when I add these variables, things go wrong and the result (which is also a mpfr_t variable) has the value 0.2599999...
This is strange because the MPFR library should maintain the precision (shouldn't it?).
Could you please help me with this? Thanks so much, so much in advance.
MPFR numbers are represented in binary (base 2). In this system, the only numbers that can be represented exactly have the form N·2^k, where N and k are integers. Neither 0.2 = 1/5 nor 0.06 = 3/50 has this form, so they are approximated with some small error. When you add these variables, you are seeing a consequence of this error (there may also be another error in the addition operation, since in binary these numbers have many nonzero digits, unlike in decimal).
This is the same issue as the one described in: Is floating point math broken?
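The representation argument can be checked with exact rational arithmetic. The sketch below uses Python's 53-bit doubles instead of 1024-bit MPFR numbers (an assumption made only to keep the example self-contained); the effect is the same at any finite binary precision.

```python
from fractions import Fraction

def is_power_of_two(n):
    """True when n is 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0

a = Fraction(0.2)    # exact value of the binary number nearest to 0.2
b = Fraction(0.06)   # exact value of the binary number nearest to 0.06

# The stored value is not 1/5: it is some N / 2^k close to it.
exactly_one_fifth = (a == Fraction(1, 5))

# Both stored values have power-of-two denominators, i.e. the form N * 2^k.
dyadic = is_power_of_two(a.denominator) and is_power_of_two(b.denominator)

# The rounded sum cannot equal 26/100 either, because 26/100 is not dyadic.
sum_is_exact = (Fraction(0.2 + 0.06) == Fraction(26, 100))
```

Here `exactly_one_fifth` and `sum_is_exact` are both false while `dyadic` is true.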
EDIT:
To answer the question in comment "Is there a way to avoid this situation?", no, there is no way to avoid this situation in practice, except in very specific cases. For instance, if all your numbers (inputs and results of each intermediate operation) are decimal numbers, representable with a small enough number of digits, you can use decimal arithmetic (but MPFR can't do that). Computer algebra systems may help in some cases. There's also iRRAM... I'll come back to it later.
However, there are solutions to attempt to hide issues with numerical errors. You need to estimate the maximum possible error on a computed value. With an error analysis, you can obtain rigorous bounds, but this may be difficult or take time to do. Note that rigorous bounds are pessimistic in general, but if you use arbitrary precision (e.g. with MPFR), this is less an issue. The analysis can be done dynamically with interval arithmetic (still pessimistic, even worse). But perhaps a simple estimate is sufficient for you. Once you have an estimate of the maximum error:
For the output, choose the number of displayed digits so that the error is less than the weight of the last displayed digit.
For discontinuous functions (e.g. equality test, floor, ceil): if the distance between the computed value and a discontinuity point is less than the maximum error, assume that the actual value is equal to the discontinuity point. Note that this is just a heuristic, but if it fails (this may remain unnoticed and will probably invalidate your estimate), this means that you have not done your computations with enough precision.
Note: MPFR won't do that for you. But you can write code to take these rules into account.
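The two rules above can be sketched in a few lines. This is only an illustration with ordinary Python floats; the function names `displayed_decimals` and `snapped_floor` are hypothetical, and `max_error` is assumed to come from your own error estimate.

```python
import math

def displayed_decimals(max_error):
    """Number of decimals d (at least 0) such that the weight of the last
    displayed digit, 10**-d, is still larger than max_error."""
    if max_error <= 0:
        raise ValueError("max_error must be positive")
    d = 0
    while 10.0 ** -(d + 1) > max_error:
        d += 1
    return d

def snapped_floor(x, max_error):
    """floor(x), except that a value closer than max_error to an integer
    (a discontinuity point of floor) is assumed to be that integer."""
    nearest = round(x)
    if abs(x - nearest) < max_error:
        return nearest
    return math.floor(x)
```

For example, a computed value of 2.9999999 with an estimated error of 1e-6 is treated as 3, whereas a plain floor would return 2.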
The iRRAM package, which is based on MPFR, can track the error in a rigorous way (like with interval arithmetic) and automatically redo all the computations in a higher precision if it notices that the accuracy is too low. However, if some mathematical result is a discontinuity point, iRRAM won't help. In particular, it cannot provide a rigorous equality test.
Finally, I suggest that you have a look at Goldberg's paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, in particular the notion of cancellation.

Why does floating point addition take longer than multiplication

I am working with a PIC18F4550 and the program is speed-critical. When I multiply two floating-point variables, the PIC takes about 140 cycles to perform the multiplication. I am measuring this with the PIC18F4550's Timer1.
variable_1 = variable_2 * variable_3; // took 140 cycles to execute
On the other hand, when I add the same two variables, the PIC takes 280 cycles to perform the addition.
variable_1 = variable_2 + variable_3; // took 280 cycles to execute
I have seen that the number of cycles varies with the values of the variables, depending on their exponents.
What is the reason for the extra cycles? I thought addition was simpler than multiplication.
Is there any solution?
Is there any solution?
For floating point addition, the operands need to be adjusted so that they have the same exponent before the add, and that involves shifting one of the mantissas across byte boundaries, whereas a multiply is basically multiplying the mantissas and adding the exponents.
Since the PIC apparently has a small hardware multiplier, it may not be surprising that sometimes the multiply can be faster than doing a multi-byte shift (especially if the PIC only has single bit shift instructions).
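The difference in the amount of work can be seen in a toy model where a number is a pair (mantissa, exponent) with value mantissa·2^exponent. This is a simplified sketch, not the PIC floating-point library: it uses unbounded Python integers and skips normalisation and rounding, and the names `fmul`, `fadd` and `value` are made up for the illustration.

```python
def fmul(a, b):
    """Multiply: multiply the mantissas and add the exponents."""
    (ma, ea), (mb, eb) = a, b
    return (ma * mb, ea + eb)

def fadd(a, b):
    """Add: shift the mantissas to a common exponent first, then add.
    The shift distance depends on the operands, which is why the cycle
    count of a software addition varies with the exponents."""
    (ma, ea), (mb, eb) = a, b
    e = min(ea, eb)
    return ((ma << (ea - e)) + (mb << (eb - e)), e)

def value(x):
    """Convert a (mantissa, exponent) pair to a Python float."""
    m, e = x
    return m * 2.0 ** e
```

With a = (3, -1), i.e. 1.5, and b = (5, 2), i.e. 20, the addition has to shift one mantissa by three bit positions before it can add, while the multiplication needs no alignment at all.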
Unless a processor has direct support for it, floating point is always slow, and you should certainly consider arranging your code to use fixed point if at all possible. Getting rid of the floating point library would probably free up a lot of code space as well.
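As a rough idea of what fixed point looks like, here is a sketch of an 8.8 format (8 fractional bits). The format and the helper names are assumptions chosen for the example; on the PIC the same idea would be written with integer types in C, and the number of fractional bits would depend on the range and resolution you need.

```python
FRAC_BITS = 8                      # assumed format: 8 fractional bits
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a real number to its nearest fixed-point integer."""
    return int(round(x * ONE))

def to_float(a):
    """Convert a fixed-point integer back to a float (for display only)."""
    return a / ONE

def fixed_add(a, b):
    """Addition is a plain integer addition: no exponent alignment."""
    return a + b

def fixed_mul(a, b):
    """Multiplication is one integer multiply followed by one shift."""
    return (a * b) >> FRAC_BITS
```

Addition becomes the cheapest operation again, because both operands always share the same implicit exponent.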

Is multiplying y by 2^x and subtracting y faster than multiplying y by [(2^x)-1] directly?

I have a rather theoretical question:
Is multiplying y by 2^x and subtracting y faster than
multiplying y by [(2^x)-1] directly?
(y*(2^x) - y) vs (y*((2^x)-1))
I implemented a moving average filter on some data I get from a sensor. The basic idea is that I want to average the last 2^x values by taking the old average, multiplying that by [(2^x)-1], adding the new value, and dividing again by 2^x. But because I have to do this more than 500 times a second, I want to optimize it as much as possible.
I know that floating point numbers are represented in IEEE754 and therefore, multiplying and dividing by a power of 2 should be rather fast (basically just changing the mantissa), but how to do that most efficiently? Should I simply stick with just multiplying ((2^x)-1), or is multiplying by 2.0f and subtracting y better, or could I even do that more efficiently by performing a leftshift on the mantissa? And if that is possible, how to implement that properly?
Thank you very much!
I don't think that multiplying a floating-point number by a power of two is faster in practice than a generic multiplication (though I agree that in theory it should be faster, assuming no overflow/underflow). Said otherwise, I don't think that there is a hardware optimization.
Now, I can assume that you have a modern processor, i.e. one with an FMA. In this case, (y*(2^x) - y) is faster if performed as fma(y,2^x,-y) (the way you have to write the expression depends on your language and implementation): an FMA should be as fast as a multiplication in practice.
Note also that the speed may also depend on the context. For instance, I've observed on simple code that doing more work can surprisingly yield faster code! So, you need to test (on your real code, not with an arbitrary benchmark).
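For reference, the two expressions and the filter update from the question can be written as below. This is a sketch in Python rather than the asker's (unstated) language, and the function names are hypothetical. In IEEE 754 arithmetic, scaling by 2^x is exact (barring overflow or underflow), so both forms round the same exact value y·(2^x - 1) once and return the same result; only their speed can differ, which is what has to be measured on the real code.

```python
import math

def scale_then_subtract(y, x):
    """y * 2**x - y; ldexp scales by a power of two exactly."""
    return math.ldexp(y, x) - y

def multiply_directly(y, x):
    """y * (2**x - 1), with the constant converted to float once."""
    return y * float(2 ** x - 1)

def update_average(avg, new_value, x):
    """Moving-average update from the question:
    (avg * (2**x - 1) + new_value) / 2**x."""
    return math.ldexp(multiply_directly(avg, x) + new_value, -x)
```

For example, with x = 2, an old average of 4.0 and a new value of 8.0 give (4·3 + 8)/4 = 5.0.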

approximating log10[x^k0 + k1]

Greetings. I'm trying to approximate the function
Log10[x^k0 + k1], where .21 < k0 < 21, 0 < k1 < ~2000, and x is integer < 2^14.
k0 & k1 are constant. For practical purposes, you can assume k0 = 2.12, k1 = 2660. The desired accuracy is 5*10^-4 relative error.
This function is virtually identical to Log[x], except near 0, where it differs a lot.
I have already come up with a SIMD implementation that is ~1.15x faster than a simple lookup table, but would like to improve it if possible, which I think is very hard due to the lack of efficient instructions.
My SIMD implementation uses 16bit fixed point arithmetic to evaluate a 3rd degree polynomial (I use least squares fit). The polynomial uses different coefficients for different input ranges. There are 8 ranges, and range i spans (64)2^i to (64)2^(i + 1).
The rationale behind this is that the derivatives of Log[x] drop rapidly with x, meaning a polynomial will fit it more accurately, since polynomials are an exact fit for functions that have a derivative of 0 beyond a certain order.
SIMD table lookups are done very efficiently with a single _mm_shuffle_epi8(). I use SSE's float to int conversion to get the exponent and significand used for the fixed point approximation. I also software pipelined the loop to get ~1.25x speedup, so further code optimizations are probably unlikely.
What I'm asking is if there's a more efficient approximation at a higher level?
For example:
Can this function be decomposed into functions with a limited domain like
log2((2^x) * significand) = x + log2(significand)
hence eliminating the need to deal with different ranges (table lookups). The main problem I think is adding the k1 term kills all those nice log properties that we know and love, making it not possible. Or is it?
Iterative method? don't think so because the Newton method for log[x] is already a complicated expression
Exploiting locality of neighboring pixels? - if the range of the 8 inputs fall in the same approximation range, then I can look up a single coefficient, instead of looking up separate coefficients for each element. Thus, I can use this as a fast common case, and use a slower, general code path when it isn't. But for my data, the range needs to be ~2000 before this property hold 70% of the time, which doesn't seem to make this method competitive.
Please, give me some opinion, especially if you're an applied mathematician, even if you say it can't be done. Thanks.
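The range-reduction identity quoted in the question can be checked directly with `math.frexp`, which splits a float into significand and exponent. The sketch below applies it to the whole argument v = x^k0 + k1, using the constants k0 = 2.12 and k1 = 2660 given above; the function names are hypothetical. It only illustrates the identity: computing v still needs a power of x, so this is not by itself a faster method.

```python
import math

K0, K1 = 2.12, 2660.0          # constants from the question

def log2_by_range_reduction(v):
    """log2(v) = e + log2(m), where v == m * 2**e and 0.5 <= m < 1."""
    m, e = math.frexp(v)
    return e + math.log2(m)    # log2(m) only needs the range [0.5, 1)

def target_via_reduction(x):
    """log10(x**K0 + K1) evaluated through the range-reduced log2."""
    return log2_by_range_reduction(x ** K0 + K1) * math.log10(2.0)
```

After the split, a single approximation of log2 on [0.5, 1) covers every input, which is the property the question hopes to exploit.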
You should be able to improve on least-squares fitting by using Chebyshev approximation. (The idea is, you're looking for the approximation whose worst-case deviation in a range is least; least-squares instead looks for the one whose summed squared difference is least.) I would guess this doesn't make a huge difference for your problem, but I'm not sure -- hopefully it could reduce the number of ranges you need to split into, somewhat.
If there's already a fast implementation of log(x), maybe compute P(x) * log(x) where P(x) is a polynomial chosen by Chebyshev approximation. (Instead of trying to do the whole function as a polynomial approx -- to need less range-reduction.)
I'm an amateur here -- just dipping my toe in as there aren't a lot of answers already.
One observation:
You can find an expression for how large x needs to be as a function of k0 and k1, such that the term x^k0 dominates k1 enough for the approximation:
x^k0 +k1 ~= x^k0, allowing you to approximately evaluate the function as
k0*Log(x).
This would take care of all x's above some value.
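A brute-force way to find that value is to scan x and test the relative error against the 5*10^-4 target from the question. The sketch below assumes k0 = 2.12 and k1 = 2660 as stated there; the function names are hypothetical. Because the relative error shrinks as x grows, the first x that passes also bounds every larger x.

```python
import math

K0, K1 = 2.12, 2660.0          # constants from the question
TOLERANCE = 5e-4               # desired relative error

def exact(x):
    return math.log10(x ** K0 + K1)

def large_x_approx(x):
    return K0 * math.log10(x)

def relative_error(x):
    return abs(exact(x) - large_x_approx(x)) / exact(x)

def smallest_safe_x(limit=2 ** 14):
    """Smallest integer x in [1, limit) whose relative error is within
    TOLERANCE, or None if there is none."""
    for x in range(1, limit):
        if relative_error(x) <= TOLERANCE:
            return x
    return None
```

Inputs below the returned threshold still need the piecewise polynomial or a table; inputs above it only need a plain logarithm.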
I recently read how the sRGB model compresses physical tri stimulus values into stored RGB values.
It basically is very similar to the function I try to approximate, except that it's defined piece wise:
k0 x, x < 0.0031308
k1 x^0.417 - k2 otherwise
I was told the constant addition in Log[x^k0 + k1] was to make the beginning of the function more linear. But that can easily be achieved with a piece wise approximation. That would make the approximation a lot more "uniform" - with only 2 approximation ranges. This should be cheaper to compute due to no longer needing to compute an approximation range index (integer log) and doing SIMD coefficient lookup.
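For comparison, the sRGB encoding can be sketched as follows. The post only names the constants k0, k1 and k2; the values 12.92, 1.055, 0.055 and the exponent 1/2.4 (about 0.417) used here are the usual sRGB ones and are filled in as an assumption about what was meant.

```python
def srgb_encode(x):
    """Map a linear intensity x in [0, 1] to its sRGB-encoded value."""
    if x < 0.0031308:
        return 12.92 * x                       # linear segment near zero
    return 1.055 * x ** (1 / 2.4) - 0.055      # power-law segment

# The two pieces almost meet at the breakpoint.
gap_at_breakpoint = abs(12.92 * 0.0031308 - srgb_encode(0.0031308))
```

Only one comparison selects the branch, so no integer log or coefficient lookup is needed.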
For now, I conclude this will be the best approach, even though it doesn't approximate the function precisely. The hard part will be proposing this change and convincing people to use it.

Mathematical analysis of a sound sample (as an array of numbers)

I need to find the frequency of a sample, stored (in VB) as an array of bytes. The sample is a sine wave of known frequency (so I can check the result), but the numbers are a bit odd, and my maths-fu is weak.
Full range of values 0-255. 99% of numbers are in range 235 to 245, but there are some outliers down to 0 and 1, and up to 255 in the remaining 1%.
How do I normalise this to remove outliers, (calculating the 235-245 interval as it may change with different samples), and how do I then calculate zero-crossings to get the frequency?
Apologies if this description is rubbish!
The FFT is probably the best answer, but if you really want to do it by your method, try this:
To normalize, first make a histogram to count how many occurrences there are of each value from 0 to 255. Then throw out X percent of the values from each end with something like:
for (i = lower = 0; i < N * X / 100.0; lower++)   // 100.0 avoids integer division
    i += count[lower];
// repeat in the other direction for upper
Now normalize with
A[i] = 255*(A[i]-lower)/(upper-lower)-128
Throw away results outside the -128..127 range.
Now you can count zero crossings. To make sure you are not fooled by noise, you might want to keep track of the slope over the last several points, and only count crossings when the average slope is going the right way.
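Put together, the histogram approach might look like the sketch below. It is a Python rendering rather than VB, the function names are hypothetical, and the trimming loop is a slight variant of the one above (it stops before the discarded count would exceed the budget). It assumes a non-empty buffer of byte values and a trim percentage well below 50.

```python
def trim_bounds(samples, percent):
    """Return (lower, upper) after discarding at most `percent` % of the
    samples from each end of the 0..255 histogram."""
    count = [0] * 256
    for s in samples:
        count[s] += 1
    budget = len(samples) * percent / 100.0
    lower, seen = 0, 0
    while seen + count[lower] <= budget:
        seen += count[lower]
        lower += 1
    upper, seen = 255, 0
    while seen + count[upper] <= budget:
        seen += count[upper]
        upper -= 1
    return lower, upper

def normalise(samples, lower, upper):
    """Rescale to -128..127 and drop values that fall outside that range."""
    out = []
    for s in samples:
        v = 255 * (s - lower) / (upper - lower) - 128
        if -128 <= v <= 127:
            out.append(v)
    return out

def rising_zero_crossings(values):
    """Count transitions from a negative value to a non-negative one."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev < 0 <= cur)
```

The number of rising crossings is the number of cycles in the buffer, so dividing it by the buffer duration gives the frequency. The slope check suggested above is not included in this sketch.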
The standard method to attack this problem is to consider one block of data, hopefully sampled at at least twice the actual frequency (taking more data isn't bad, so it's good to overestimate a bit), then take the FFT and guess that the frequency corresponds to the largest value in the resulting FFT spectrum.
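The idea can be sketched without any library by using a naive O(n²) discrete Fourier transform as a stand-in for a real FFT (the result is the same, it is only slower). The function name and the `sample_rate` parameter are assumptions for the example; the frequency resolution is sample_rate / n.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Frequency (in the units of sample_rate) of the strongest non-DC bin."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]      # remove the constant offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):                 # bins below the Nyquist limit
        acc = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(centred))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * sample_rate / n
```

With a real FFT routine the loop over k is replaced by one transform call followed by a search for the largest magnitude.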
By the way, very similar problems have been asked here before - you could search for those answers as well.
Use the Fourier transform, it's much more noise insensitive than counting zero crossings
Edit: @WaveyDavey
I found an F# library to do an FFT: From here
As it turns out, the best free implementation that I've found for F# users so far is still the fantastic FFTW library. Their site has a precompiled Windows DLL. I've written minimal bindings that allow thread-safe access to FFTW from F#, with both guru and simple interfaces. Performance is excellent, 32-bit Windows XP Pro is only up to 35% slower than 64-bit Linux.
Now I'm sure you can call F# lib from VB.net, C# etc, that should be in their docs
If I understood well from your description, what you have is a signal which is a combination of a sine plus a constant plus some random glitches. Say, like
x[n] = A*sin(f*n + phi) + B + N[n]
where N[n] is the "glitch" noise you want to get rid of.
If the glitches are one sample long, you can remove them with a median filter, whose window has to be longer than the glitch on both sides of it. For glitches of length 1, a median over 3 samples is enough.
y[n] = median3(x[n])
The median is computed so: Take the samples of x you want to filter (x[n-1],x[n],x[n+1]), sort them, and your output is the middle one.
Now that the noise signal is gone, get rid of the constant signal. I understand the buffer is of a limited and known length, so you can just compute the mean of the whole buffer and subtract it.
Now you have your single sine signal. You can now compute the fundamental frequency by counting zero crossings: count the number of samples above 0 whose preceding sample was below 0. The period is the total number of samples in your buffer divided by this count, and the frequency is the reciprocal (1/x) of the period.
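The three steps (median filter, mean removal, zero-crossing count) can be sketched as follows. This is a Python rendering rather than VB; the function names and the `sample_rate` parameter are assumptions. The two end samples are left unfiltered, and a sample that lands exactly on zero is counted as "above".

```python
def median3(x):
    """Replace each interior sample by the median of itself and its two
    neighbours; the first and last samples are copied unchanged."""
    out = list(x)
    for n in range(1, len(x) - 1):
        out[n] = sorted(x[n - 1:n + 2])[1]
    return out

def remove_mean(x):
    """Subtract the buffer mean to remove the constant offset."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def frequency_from_zero_crossings(x, sample_rate):
    """Estimate the frequency from the number of upward zero crossings."""
    y = remove_mean(median3(x))
    crossings = sum(1 for prev, cur in zip(y, y[1:]) if prev < 0 <= cur)
    if crossings == 0:
        return 0.0
    period = len(y) / crossings          # samples per cycle
    return sample_rate / period
```

The estimate is only as fine as one crossing more or less over the buffer, so longer buffers give a better frequency resolution.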
Although I would go with the majority and say that it seems like what you want is an fft solution (fft algorithm is pretty quick), if fft is not the answer for whatever reason you may want to try fitting a sine curve to the data using a fitting program and reading off the fitted frequency.
Using Fityk, you can load the data and fit to a*sin(b*x-c), where b/(2*pi) will give you the frequency after fitting (2*pi/b is the period).
Fityk can be used from a gui, from a command-line for scripting and has a C++ API so could be included in your programs directly.
I googled for "basic fft": Visual Basic FFT. Your question screams FFT, but be careful: using an FFT without understanding even a little bit about DSP can lead to results that you don't understand or whose origin you can't explain.
Get the Frequency Analyzer at http://www.relisoft.com/Freeware/index.htm, run it, and look at the code.