genfromtxt() artifact when displaying floats - numpy

In numpy, I'm reading an ASCII file (see below) using np.genfromtxt()
0.085 102175 0.00025
0.094 103325 0.00030
raw = genfromtxt(fn)
When checking raw I get the following:
>>> raw[0,0]
0.085000000000000006
How do I prevent the artifact 6 at the end and where does it come from?

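This is normal behaviour, and is due to the fundamental imprecision of floating point arithmetic. In other words, 0.085 cannot be represented exactly in binary floating point, so the nearest representable value is stored instead, and printing it with 17 significant digits exposes the difference. You cannot change the stored value, only how many digits you display. For this reason, it's generally a good idea to assume a bit of noise in any numerical calculation.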
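As a rough illustration using only the standard library (no numpy needed; the variable names are just for this sketch), you can look at the exact value that ends up stored for 0.085 and see how the number of printed digits decides whether the trailing 6 shows up:

```python
from decimal import Decimal

x = 0.085  # parsed to the nearest binary double, as any text-to-float reader does

# Decimal(x) expands the stored double exactly, digit for digit.
exact = Decimal(x)
print(exact)

# 17 significant digits reproduce the "artifact" from the question.
print(f"{x:.17g}")

# Fewer digits (or Python 3's shortest round-trip repr) hide it.
# The stored value is identical in every case.
print(f"{x:.3f}")
print(repr(x))
```

In numpy itself, np.set_printoptions(precision=...) controls how many digits arrays are displayed with; it does not alter the data.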

Related

MPFR - Loss precision after addition

First, sorry if this question looks "silly", because I'm new to MPFR, LOL.
I have two mpfr_t variables with precision of 1024, and they have the value of 0.2 and 0.06 stored in them.
But when I add these variables, things go wrong and the result (which is also a mpfr_t variable) has the value of 0.2599999...
This is strange because the MPFR library should maintain the precision (shouldn't it?).
Could you please help me with this? Thanks so much, so much in advance.
MPFR numbers are represented in binary (base 2). In this system, the only numbers that can be represented exactly have the form N·2^k, where N and k are integers. Neither 0.2 = 1/5 nor 0.06 = 3/50 has this form, so they are approximated with some small error. When you add these variables, you are seeing a consequence of this error (there may also be another error in the addition operation, since in binary these numbers have many nonzero digits, unlike in decimal).
This is the same issue as the one described in: Is floating point math broken?
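To make the N·2^k condition concrete, here is a small Python sketch. It uses ordinary 53-bit doubles rather than 1024-bit MPFR values, and is_dyadic is a made-up helper name, but the reasoning is the same at any binary precision:

```python
from fractions import Fraction

def is_dyadic(q: Fraction) -> bool:
    """True if q = N * 2**k for integers N, k, i.e. the reduced denominator is a power of two."""
    d = q.denominator
    return d & (d - 1) == 0

print(is_dyadic(Fraction(1, 5)))    # 0.2
print(is_dyadic(Fraction(3, 50)))   # 0.06
print(is_dyadic(Fraction(3, 8)))    # 0.375, exactly representable

# Fraction(float) recovers the exact value that was actually stored.
stored = Fraction(0.2)
print(is_dyadic(stored))            # the stored value is of the form N * 2**k ...
print(stored == Fraction(1, 5))     # ... so it cannot be equal to 1/5
print(float(stored - Fraction(1, 5)))  # size of the representation error
```

Raising the precision makes the error smaller, but it never reaches zero for 1/5 or 3/50.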
EDIT:
To answer the question in comment "Is there a way to avoid this situation?", no, there is no way to avoid this situation in practice, except in very specific cases. For instance, if all your numbers (inputs and results of each intermediate operation) are decimal numbers, representable with a small enough number of digits, you can use decimal arithmetic (but MPFR can't do that). Computer algebra systems may help in some cases. There's also iRRAM... I'll come back to it later.
However, there are solutions to attempt to hide issues with numerical errors. You need to estimate the maximum possible error on a computed value. With an error analysis, you can obtain rigorous bounds, but this may be difficult or take time to do. Note that rigorous bounds are pessimistic in general, but if you use arbitrary precision (e.g. with MPFR), this is less of an issue. The analysis can be done dynamically with interval arithmetic (still pessimistic, even worse). But perhaps a simple estimate is sufficient for you. Once you have an estimate of the maximum error:
For the output, choose the number of displayed digits so that the error is less than the weight of the last displayed digit.
For discontinuous functions (e.g. equality test, floor, ceil): if the distance between the computed value and a discontinuity point is less than the maximum error, assume that the actual value is equal to the discontinuity point. Note that this is just a heuristic, but if it fails (this may remain unnoticed and will probably invalidate your estimate), this means that you have not done your computations with enough precision.
Note: MPFR won't do that for you. But you can write code to take these rules into account.
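A minimal sketch of what such code could look like, in Python with plain doubles instead of MPFR. The helper names (safe_decimals, floor_with_tolerance) and the error estimate of 3e-12 are assumptions made for the example; in real code the estimate has to come from your own error analysis and must be positive:

```python
import math

def safe_decimals(max_error: float) -> int:
    """Number of decimals d such that max_error < 10**-d (weight of the last shown digit)."""
    d = math.floor(-math.log10(max_error))
    while d > 0 and 10.0 ** -d <= max_error:  # guard against rounding at exact powers of ten
        d -= 1
    return max(d, 0)

def floor_with_tolerance(x: float, max_error: float) -> int:
    """floor(x), but treat x as an integer if it lies within max_error of one (heuristic)."""
    nearest = round(x)
    if abs(x - nearest) < max_error:
        return nearest
    return math.floor(x)

value = (0.1 + 0.7) * 10   # mathematically 8, computed slightly below 8 in doubles
max_error = 3e-12          # assumed error estimate for this computation

# Rule 1: only display digits that the error estimate supports.
print(f"{value:.{safe_decimals(max_error)}f}")

# Rule 2: near a discontinuity of floor, assume the value sits on it.
print(math.floor(value), floor_with_tolerance(value, max_error))
```

As the answer says, the second rule is only a heuristic: if the true value really is just below the integer, the snap gives the wrong result.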
The iRRAM package, which is based on MPFR, can track the error in a rigorous way (like with interval arithmetic) and automatically redo all the computations in a higher precision if it notices that the accuracy is too low. However, if some mathematical result is a discontinuity point, iRRAM won't help. In particular, it cannot provide a rigorous equality test.
Finally, I suggest that you have a look at Goldberg's paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, in particular the notion of cancellation.

Pytorch copying inexact value of numpy floating point number

I'm converting a floating point number (or numpy array) to a Pytorch tensor and it seems to be copying an inexact value to the tensor. The error appears in the 8th significant digit and beyond. This is significant (no pun intended) for my work as I deal with chaotic dynamics, which is very sensitive to slight changes in the initial conditions.
I'm already using torch.set_printoptions(precision=16) to print 16 significant digits.
np_x = state
print(np_x)
x = torch.tensor(np_x,requires_grad=True,dtype=torch.float32)
print(x.data[0])
and the output is :
0.7575408585008059
tensor(0.7575408816337585)
It would be helpful to know what is going wrong and how it could be resolved.
This happens because you're using the float32 dtype, which carries only about 7 significant decimal digits. If you convert these two numbers to binary, you will find they are actually the same. Strictly speaking, the most accurate representations of those two numbers in float32 format are the same.
0.7575408585008059
Most accurate representation = 7.57540881633758544921875E-1
0.7575408816337585
Most accurate representation = 7.57540881633758544921875E-1
Binary: 00111111 01000001 11101110 00110011
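You can check this without torch. The sketch below uses Python's struct module to round a double to the nearest float32 and to dump its bits; to_float32 and float32_bits are helper names invented for the example:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value and widen it back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def float32_bits(x: float) -> str:
    """Bit pattern of x after rounding to binary32, grouped by byte (big-endian)."""
    return " ".join(f"{byte:08b}" for byte in struct.pack(">f", x))

a = 0.7575408585008059   # value printed from the float64 numpy data
b = 0.7575408816337585   # value printed from the float32 tensor

print(to_float32(a) == to_float32(b))   # both round to the same float32
print(float32_bits(a))
print(float32_bits(b))
print(f"{to_float32(a):.16f}")          # what the tensor shows at 16 digits
```

If the extra digits matter, as they would for chaotic dynamics, creating the tensor with dtype=torch.float64 keeps the full double-precision value.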

How are quantized DCT coefficients serialised in JPEG?

I've read in dozens of articles, scientific papers, and toy implementations that the steps in JPEG compression are roughly as follows
Take 8x8 DCT
Divide by quantization matrix
Round to integers
Run-length & Huffman coding
And then the inverse is pretty much the same. What is left out in everything on the topic I've found so far is the magnitude of the data and the corresponding serialization.
It appears implicitly assumed that all the coefficients are stored as unsigned bytes. However, as I understand it, the DC coefficient is in the range 0-255, while the AC coefficients can be negative. Are the AC coefficients in the range ±255, or ±127, or something else?
What is the common way to store these coefficients in a compact way?
The first-hand source to read is of course the ITU-T T.81 standard document.
Looks like the first Google link leads to a paywall, but it's on the w3 site: https://www.w3.org/Graphics/JPEG/itu-t81.pdf
Take 8-bit input samples (0..255)
Subtract 128 (-128..127)
Do N*N fDCT, where N=8
Output can have log2(N)+8 bits = 11 bits (-1024..1023)
DC coefficients are stored as a difference, so they can have 12 bits.
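The ranges above can be checked numerically. This is a naive (slow) 8x8 forward DCT in Python with the usual JPEG normalisation, written only to probe the extreme cases; the function names are mine, and the values are before quantization:

```python
import math

N = 8

def c(k: int) -> float:
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def fdct_8x8(block):
    """Naive 2-D forward DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = 0.25 * c(u) * c(v) * total
    return out

def level_shift(samples):
    """Subtract 128 so that 0..255 becomes -128..127."""
    return [[s - 128 for s in row] for row in samples]

black = fdct_8x8(level_shift([[0] * N for _ in range(N)]))
white = fdct_8x8(level_shift([[255] * N for _ in range(N)]))

dc_min, dc_max = black[0][0], white[0][0]
print(round(dc_min), round(dc_max))   # extreme DC values fit in 11 bits
print(round(dc_max - dc_min))         # largest DC difference needs 12 bits
```

A flat black block gives the most negative DC value and a flat white block the most positive one, so the difference between consecutive DC values can be about twice as large as any single coefficient, which is where the extra bit comes from.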
The encoding process depends upon whether you have a sequential scan or a progressive scan. The details of the encoding process are too complicated to fit within an answer here.
I highly recommend this book:
https://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=sr_1_2?ie=UTF8&qid=1531091178&sr=8-2&keywords=JPEG&dpID=5168QFRTslL&preST=_SX258_BO1,204,203,200_QL70_&dpSrc=srch
It is the only source I know of that explains JPEG end-to-end in plain English.

tensorflow float32 decimal precision

While trying the Tensorflow intro I came across the following code:
w=tf.Variable(.3,tf.float32)
b=tf.Variable(-.3,tf.float32)
Printing these values gives the following output:
print(sess.run(w))
print(sess.run(b))
print(sess.run([w]))
print(sess.run([b]))
Output
0.3
-0.3
[0.30000001]
[-0.30000001]
Why does printing as an array show extra floating point precision?
Is there any documentation related to this topic?
Here is a great resource to answer this question. To paraphrase the first paragraph on that web page :
TensorFlow isn't broken, it's doing floating point math. Computers can only natively store integers, so they need some way of representing decimal numbers. This representation comes with some degree of inaccuracy. That's why, more often than not, .3 == .30000001.
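The stored float32 value is the same in both printouts; only the number of displayed digits differs. The sketch below assumes, as an illustration, that the scalar is shown with about 6 significant digits and the array element with 8 (the exact formatting rules depend on the numpy version), and uses struct to emulate float32 without TensorFlow:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32 value (helper for this example)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

w = to_float32(0.3)

print(f"{w:.6g}")    # 0.3         (few digits: looks exact)
print(f"{w:.8g}")    # 0.30000001  (more digits: the rounding error shows)
print(f"{w:.20g}")   # even more digits of the same stored value

# In float32, 0.3 and 0.30000001 round to the very same stored number.
print(to_float32(0.3) == to_float32(0.30000001))
```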

Mathematical analysis of a sound sample (as an array of numbers)

I need to find the frequency of a sample, stored (in vb) as an array of bytes. (The sample is a sine wave of known frequency, so I can check.) But the numbers are a bit odd, and my maths-foo is weak.
Full range of values 0-255. 99% of numbers are in range 235 to 245, but there are some outliers down to 0 and 1, and up to 255 in the remaining 1%.
How do I normalise this to remove outliers, (calculating the 235-245 interval as it may change with different samples), and how do I then calculate zero-crossings to get the frequency?
Apologies if this description is rubbish!
The FFT is probably the best answer, but if you really want to do it by your method, try this:
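To normalize, first make a histogram to count how many occurrences there are of each value from 0 to 255. Then throw out X percent of the values from each end with something like:
/* N = number of samples, count[v] = number of samples with value v */
for (i = lower = 0; i < N * X / 100.0; lower++)  /* 100.0 avoids integer division */
    i += count[lower];
// repeat in the other direction for upper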
Now normalize with
A[i] = 255*(A[i]-lower)/(upper-lower)-128
Throw away results outside the -128..127 range.
Now you can count zero crossings. To make sure you are not fooled by noise, you might want to keep track of the slope over the last several points, and only count crossings when the average slope is going the right way.
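Here is a rough Python version of the histogram trimming and normalization steps, in case it helps to see them end to end. It is a sketch, not a drop-in translation of the loop above: trim_bounds and normalise are made-up names, the synthetic data only mimics the distribution described in the question, and it stops at the first bin that would exceed the discard budget instead of stepping past it:

```python
def trim_bounds(samples, percent):
    """Return (lower, upper) after discarding at most `percent` % of samples from each end."""
    count = [0] * 256
    for s in samples:
        count[s] += 1
    budget = len(samples) * percent / 100.0

    lower, seen = 0, 0
    while lower < 255 and seen + count[lower] <= budget:
        seen += count[lower]
        lower += 1

    upper, seen = 255, 0
    while upper > lower and seen + count[upper] <= budget:
        seen += count[upper]
        upper -= 1
    return lower, upper

def normalise(samples, lower, upper):
    """Map lower..upper onto -128..127 and drop anything outside that range."""
    if upper == lower:
        return []  # degenerate case: no spread left to scale
    scaled = (255 * (s - lower) / (upper - lower) - 128 for s in samples)
    return [v for v in scaled if -128 <= v <= 127]

# Synthetic data: 99% of values in 235..245, 1% outliers at 0 and 255.
samples = [235 + (i % 11) for i in range(990)] + [0] * 5 + [255] * 5

lower, upper = trim_bounds(samples, 1)
clean = normalise(samples, lower, upper)
print(lower, upper, len(clean), min(clean), max(clean))
```

Note that dropping samples changes the spacing in time, so for frequency estimation it may be safer to replace outliers (for instance with a neighbouring value) than to delete them.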
The standard method to attack this problem is to consider one block of data, hopefully covering at least two periods of the actual frequency (taking more data isn't bad, so it's good to overestimate a bit), then take the FFT and guess that the frequency corresponds to the largest magnitude in the resulting FFT spectrum.
By the way, very similar problems have been asked here before - you could search for those answers as well.
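To show the idea of picking the largest spectral peak, here is a small Python sketch. It uses a naive O(n²) DFT from the standard library as a stand-in for a real FFT routine, and the 8000 Hz sample rate and 440 Hz test tone are assumptions made up for the example:

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the largest non-DC peak of a naive DFT of `samples`."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]   # remove the constant offset
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):              # skip bin 0 and the mirrored half
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centred))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n       # bin index -> Hz

sample_rate = 8000   # assumed sample rate in Hz
n = 400              # block length, so each bin is 20 Hz wide
samples = [round(240 + 5 * math.sin(2 * math.pi * 440 * i / sample_rate))
           for i in range(n)]

print(dominant_frequency(samples, sample_rate))
```

The resolution is sample_rate / n, so a longer block gives a finer frequency estimate. For real data you would use an FFT library rather than this loop.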
Use the Fourier transform, it's much more noise insensitive than counting zero crossings
Edit: @WaveyDavey
I found an F# library to do an FFT. From here:
"As it turns out, the best free implementation that I've found for F# users so far is still the fantastic FFTW library. Their site has a precompiled Windows DLL. I've written minimal bindings that allow thread-safe access to FFTW from F#, with both guru and simple interfaces. Performance is excellent, 32-bit Windows XP Pro is only up to 35% slower than 64-bit Linux."
Now I'm sure you can call an F# lib from VB.net, C# etc.; that should be in their docs.
If I understood well from your description, what you have is a signal which is a combination of a sine plus a constant plus some random glitches. Say, like
x[n] = A*sin(f*n + phi) + B + N[n]
where N[n] is the "glitch" noise you want to get rid of.
If the glitches are one sample long, you can remove them using a median filter. The filter has to be longer than the glitch, covering samples on both sides of it, so for glitches of length 1 a median over 3 samples is enough.
y[n] = median3(x[n])
The median is computed as follows: take the samples of x you want to filter (x[n-1],x[n],x[n+1]), sort them, and your output is the middle one.
Now that the glitch noise is removed, get rid of the constant signal. I understand the buffer is of a limited and known length, so you can just compute the mean of the whole buffer. Subtract it.
Now you have your pure sine signal. You can now compute the fundamental frequency by counting zero crossings. Count the number of samples above 0 for which the previous sample was below 0. The period (in samples) is the total number of samples in your buffer divided by this count, and the frequency is the reciprocal (1/x) of the period.
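The whole chain (median filter, mean removal, zero-crossing count) could look roughly like this in Python. The function names, the 8000 Hz sample rate and the synthetic test signal are assumptions for the sketch, and a sample exactly equal to the mean is counted as "above":

```python
import math

def median3(x):
    """3-point median filter; the first and last samples are copied unchanged."""
    y = list(x)
    for n in range(1, len(x) - 1):
        y[n] = sorted(x[n - 1:n + 2])[1]
    return y

def estimate_frequency(samples, sample_rate):
    """Estimate the frequency in Hz from upward zero crossings."""
    filtered = median3(samples)                  # remove one-sample glitches
    mean = sum(filtered) / len(filtered)
    centred = [s - mean for s in filtered]       # remove the constant B
    rising = sum(1 for prev, cur in zip(centred, centred[1:]) if prev < 0 <= cur)
    if rising == 0:
        return 0.0
    period = len(samples) / rising               # samples per cycle
    return sample_rate / period                  # cycles per second

sample_rate = 8000   # assumed sample rate in Hz
samples = [round(240 + 5 * math.cos(2 * math.pi * 440 * i / sample_rate))
           for i in range(800)]
for index, glitch in ((100, 0), (300, 255), (555, 1)):
    samples[index] = glitch                      # inject one-sample glitches

print(estimate_frequency(samples, sample_rate))
```

The estimate is only as fine as one cycle per buffer length, so a longer buffer (or interpolating the crossing positions) improves it.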
Although I would go with the majority and say that it seems like what you want is an fft solution (fft algorithm is pretty quick), if fft is not the answer for whatever reason you may want to try fitting a sine curve to the data using a fitting program and reading off the fitted frequency.
Using Fityk, you can load the data and fit to a*sin(b*x-c), where b/(2*pi) will give you the frequency after fitting (2*pi/b is the period).
Fityk can be used from a gui, from a command-line for scripting and has a C++ API so could be included in your programs directly.
I googled for "basic fft": Visual Basic FFT. Your question screams FFT, but be careful: using an FFT without understanding even a little bit about DSP can lead to results that you don't understand or can't trace back to their source.
Get the Frequency Analyzer at http://www.relisoft.com/Freeware/index.htm, run it, and look at the code.