Different result of same numpy mean calculation on two computers - numpy

I have two computers with python 2.7.2 [MSC v.1500 32 bit (Intel)] on win32 and numpy 1.6.1.
But
numpy.mean(data)
returns
1.13595094681 on my old computer
and
1.13595104218 on my new computer
where
data = [ 0.20227873 -0.02738848 0.59413314 0.88547146 1.26513398 1.21090782
1.62445402 1.80423951 1.58545554 1.26801944 1.22551131 1.16882968
1.19972098 1.41940248 1.75620842 1.28139281 0.91190684 0.83705413
1.19861531 1.30767155]
In both cases
s = 0
for n in data[:20]:
    s += n
print s/20
gives
1.1359509334
Can anyone explain why, and how to avoid it?
Mads

If you want to avoid any differences between the two, then make them explicitly 32-bit or 64-bit float arrays. NumPy uses several other libraries that may be 32 or 64 bit. Note that rounding can occur in your print statements as well:
>>> import numpy as np
>>> a = [0.20227873, -0.02738848, 0.59413314, 0.88547146, 1.26513398,
...      1.21090782, 1.62445402, 1.80423951, 1.58545554, 1.26801944,
...      1.22551131, 1.16882968, 1.19972098, 1.41940248, 1.75620842,
...      1.28139281, 0.91190684, 0.83705413, 1.19861531, 1.30767155]
>>> x32 = np.array(a, np.float32)
>>> x64 = np.array(a, np.float64)
>>> x32.mean()
1.135951042175293
>>> x64.mean()
1.1359509335
>>> print x32.mean()
1.13595104218
>>> print x64.mean()
1.1359509335
Another point to note is that if you have lower level libraries (e.g., atlas, lapack) that are multi-threaded, then for large arrays, you may have differences in your result regardless, due to possible variable order of operations and floating point precision.
Also, you are at the limit of precision for 32 bit numbers:
>>> x32.sum()
22.719021
>>> np.array(sorted(x32)).sum()
22.719019

This is happening because you have float32 (single precision) arrays. Single precision is only accurate to about 6 decimal places, so your results agree up to the 6th decimal place (after the decimal point, with the last digit rounded) but not beyond that. Different architectures/machines/compilers will yield different results past that point. If you want identical results you should use higher-precision arrays (e.g. float64).
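To put a number on those limits, here is a quick sketch using np.finfo (my own addition, not part of the original answer; the exact repr of the outputs varies between numpy versions):
>>> import numpy as np
>>> np.finfo(np.float32).precision   # approximate number of reliable decimal digits
6
>>> np.finfo(np.float64).precision
15
>>> np.finfo(np.float32).eps         # smallest relative spacing between float32 values
1.1920929e-07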


Numpy returns different results on windows and unix [duplicate]

Given the following code:
import numpy as np
c = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
c = np.array(c)
print((c * c.transpose()).prod())
On my Windows machine it returns "-1462091776" (not sure how it got a negative from all those positives).
On Ubuntu it returns "131681894400".
Anyone know what's going on here?
Edit: Apparently this is an overflow problem. (Thanks @rafaelc !)
But it is reproducible (also thanks to @richardec for testing that).
So now the question becomes.. is this a bug I should report? Who do I report it to?
I have enough comments that I think an "answer" is warranted.
What happened?
Not sure how it got a negative from all those positives
As @rafaelc points out, you ran into an integer overflow. You can read more details at the wikipedia link that was provided.
What caused the overflow?
According to this thread, numpy uses the operating system's C long type as the default dtype for integers. So when you write this line of code:
c = np.array(c)
The dtype defaults to numpy's default integer data type, which is the operating system's C long. The size of a long in Microsoft's C implementation for Windows is 4 bytes (x8 bits/byte = 32 bits), so your dtype defaults to a 32-bit integer.
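A quick way to see what you get on a given machine (my own sketch, not from the original answer; note that the Windows default may differ in newer numpy releases):
import numpy as np

# The default integer dtype follows the platform's C long:
# typically int32 on Windows and int64 on most 64-bit Linux builds.
print(np.array([1]).dtype)
print(np.iinfo(np.int_))   # the representable range of that default type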
Why did this calculation overflow?
In [1]: import numpy as np
In [2]: np.iinfo(np.int32)
Out[2]: iinfo(min=-2147483648, max=2147483647, dtype=int32)
The largest number a 32-bit, signed integer data type can represent is 2147483647. If you take a look at your product across just one axis:
In [5]: c * c.T
Out[5]:
array([[ 1,  8, 21],
       [ 8, 25, 48],
       [21, 48, 81]])
In [6]: (c * c.T).prod(axis=0)
Out[6]: array([ 168, 9600, 81648])
In [7]: 168 * 9600 * 81648
Out[7]: 131681894400
You can see that 131681894400 >> 2147483647 (in mathematics, the notation >> means "is much, much larger"). Since 131681894400 is much larger than the maximum integer the 32-bit long can represent, an overflow occurs.
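As a side note (my own illustration, not part of the original answer), the exact negative number reported on Windows is just this product wrapped into a signed 32-bit integer, which you can reproduce with plain Python arithmetic:
val = 131681894400
wrapped = val % 2**32      # keep only the low 32 bits
if wrapped >= 2**31:       # values above 2**31 - 1 wrap around to negatives
    wrapped -= 2**32
print(wrapped)             # -1462091776, the result seen on Windows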
But it's fine in Linux
In Linux, a long is 8 bytes (x8 bits/byte = 64 bits). Why? Here's an SO thread that discusses this in the comments.
"Is it a bug?"
No, although it's pretty annoying, I'll admit.
For what it's worth, it's usually a good idea to be explicit about your data types, so next time:
c = np.array(c, dtype='int64')
# or
c = np.array(c, dtype=np.int64)
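Alternatively (my suggestion, not from the original answer), you can keep the array as-is and ask prod to accumulate in a wider type:
print((c * c.transpose()).prod(dtype=np.int64))   # 131681894400 on both platforms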
Who do I report a bug to?
Again, this isn't a bug, but if it were, you'd open an issue on the numpy github (where you can also peruse the source code). Somewhere in there is proof of how numpy uses the operating system's default C long, but I don't have it in me to go digging around to find it.

numpy.random.multinomial at version 1.16.6 is 10x faster than later version

Here are the code and results:
python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4])', number=1000000, globals=globals())))"
numpy version 1.16.6: 1.5 seconds # 10x faster
numpy version 1.18.1: 15.5 seconds
numpy version 1.19.0: 17.4 seconds
numpy version 1.21.4: 15.1 seconds
Note that with a fixed random seed, the output is the same across different numpy versions:
python -c "import numpy as np; np.random.seed(0); print(np.__version__); print(np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4], size=10000))" /tmp/tt
Any advice on why numpy versions after 1.16.6 are 10x slower?
We have upgraded pandas to the latest version, 1.3.4, which requires a numpy version newer than 1.16.6.
TL;DR: this is a local performance regression caused by the overhead of additional checks in the numpy.random.multinomial function. Very small arrays are strongly impacted due to the relative execution time of the required checks.
Under the hood
A binary search on the Git commits of the Numpy code shows that the performance regression first appears in mid-April 2019. It can be reproduced at commit dd77ce3cb but not at 7e8e19f9a. There are build issues for the commits in between, but with some quick fixes we can show that commit 0f3dd0650 is the first to cause the issue. Its commit message says that it:
Extend multinomial to allow broadcasting
Fix zipf changes missed in NumPy
Enable 0 as valid input for hypergeometric
A deeper analysis of this commit shows that it modifies the multinomial function defined in the Cython file mtrand.pyx to perform the two additional checks marked [HERE] below:
def multinomial(self, np.npy_intp n, object pvals, size=None):
    cdef np.npy_intp d, i, sz, offset
    cdef np.ndarray parr, mnarr
    cdef double *pix
    cdef int64_t *mnix
    cdef int64_t ni
    d = len(pvals)
    parr = <np.ndarray>np.PyArray_FROM_OTF(pvals, np.NPY_DOUBLE, np.NPY_ALIGNED)
    pix = <double*>np.PyArray_DATA(parr)
    check_array_constraint(parr, 'pvals', CONS_BOUNDED_0_1)  # <==========[HERE]
    if kahan_sum(pix, d-1) > (1.0 + 1e-12):
        raise ValueError("sum(pvals[:-1]) > 1.0")
    if size is None:
        shape = (d,)
    else:
        try:
            shape = (operator.index(size), d)
        except:
            shape = tuple(size) + (d,)
    multin = np.zeros(shape, dtype=np.int64)
    mnarr = <np.ndarray>multin
    mnix = <int64_t*>np.PyArray_DATA(mnarr)
    sz = np.PyArray_SIZE(mnarr)
    ni = n
    check_constraint(ni, 'n', CONS_NON_NEGATIVE)  # <==========[HERE]
    offset = 0
    with self.lock, nogil:
        for i in range(sz // d):
            random_multinomial(self._brng, ni, &mnix[offset], pix, d, self._binomial)
            offset += d
    return multin
These two checks are required for the code to be robust. However, they are currently pretty expensive considering their purpose.
Indeed, on my machine, the first check is responsible for ~75% of the overhead and the second for ~20%. The checks take only a few microseconds each, but since your input is very small, the overhead is huge compared to the computation time.
One workaround is to write a dedicated Numba function for this case, since your input array is very small. On my machine, calling np.random.multinomial from a trivial Numba function gives good performance.
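Below is a minimal sketch of that idea (my own code, not the answer author's), assuming Numba is installed and that your Numba version supports np.random.multinomial with its first two arguments in nopython mode:
import numba as nb
import numpy as np

@nb.njit
def multinomial_nb(n, pvals):
    # Thin nopython wrapper: avoids the Python-level argument checks
    # that dominate the cost for tiny inputs.
    return np.random.multinomial(n, pvals)

pvals = np.array([0.1, 0.2, 0.3, 0.4])
multinomial_nb(1, pvals)   # first call compiles; subsequent calls are fast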
I checked some of the generators under the hood and saw little change in their timings.
I guessed the difference may be due to some overhead, because you are sampling only a single value, and that seems to be a good hypothesis. When I increased the size of the generated random samples to 1000, the difference between 1.16.6 and 1.19.2 (my current Numpy version) diminished to ~20%.
python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.
multinomial(1, [0.1, 0.2, 0.3, 0.4], size=1000)', number=10000, globals=globals())))"
numpy version 1.16.6: 1.1 seconds
numpy version 1.19.2: 1.3 seconds
Note that both versions have this overhead; the newer version just has much more of it. In both versions it is much faster to sample 1000 values at once than to sample 1 value 1000 times.
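A hedged sketch of that comparison (my own code; the actual timings will vary by machine and numpy version):
import numpy as np
from timeit import timeit

pvals = [0.1, 0.2, 0.3, 0.4]

# 1000 separate draws of a single sample: pays the per-call overhead 1000 times.
one_by_one = timeit(lambda: [np.random.multinomial(1, pvals) for _ in range(1000)],
                    number=10)

# One call that draws 1000 samples: pays the overhead once.
batched = timeit(lambda: np.random.multinomial(1, pvals, size=1000), number=10)

print('one by one: {:.3f} s, batched: {:.3f} s'.format(one_by_one, batched))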
The code changed a lot between 1.16.6 and 1.17.0 (see for example this commit), and it's hard to analyse. Sorry that I can't help you more - I propose opening an issue on Numpy's github.

SciPy UnivariateSpline Specifying Axis?

Using scipy.interpolate.interp1d it is possible to pass in a (1080, 4) ndarray and compute an interpolation function for each 'row' in a single command:
spline = interp1d(np.arange(1, 5), np.random.random((1080, 4)), kind='cubic')
I am getting slightly different interpolation results (off the knots) than some existing Fortran code. I believe this is because the SciPy source is using a b-spline and the Fortran code is using splines derived from numerical recipes.
I am attempting to perform the same interpolation using UnivariateSpline with s=0, so InterpolatedUnivariateSpline.
I am able to get this working if I pass the data row by row, i.e. using an iterator to step over all 1080 rows - this is highly inefficient.
Using:
spline = UnivariateSpline(np.arange(1, 5).reshape(-1, 1), np.random.random((1080, 4)), s=0, k=3)
I am seeing:
failed in converting 2nd argument `y' of dfitpack.fpcurf0 to C/Fortran array
I believe this is an issue getting the multi-dimensional array into Fitpack? Any insight into how to avoid an iterator? Additionally, any insight into a SciPy interpolation function that matches the one described in Numerical Recipes (section 3.3, p. 120)? You have to type in the page number; I cannot link directly, it is a Flash viewer...
In older versions of SciPy (I observed it in 0.14) the splines returned by interp1d were of relatively poor quality. In versions 0.19 and later, interp1d is consistent with the other spline routines, and since it accepts vector inputs, I think that answers the question. Here is a comparison of three spline constructors; the latter two only take one row as input.
import numpy as np
from scipy.interpolate import interp1d, UnivariateSpline, splrep, splev
x = np.arange(1, 5)
y = np.random.normal(size=(1080, 4))
spl1 = interp1d(x, y, kind='cubic')
spl2 = UnivariateSpline(x, y[123, :], s=0, k=3)
spl3 = splrep(x, y[123, :], s=0, k=3)
t = 2.345
print(spl1(t)[123], spl2(t), splev(t, spl3))
This prints (with my random numbers)
-0.333973049011 -0.333973049011 -0.333973049011
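For completeness, a small usage note (my own hedged addition): the interp1d object is also vectorized over query points, so all 1080 per-row splines can be evaluated in one call:
ts = np.linspace(1, 4, 7)   # several query points inside the data range
print(spl1(ts).shape)       # (1080, 7): one interpolated curve per row of y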

erratic results for numpy/scipy eigendecompositions

I am finding that scipy.linalg.eig sometimes gives inconsistent results. But not every time.
>>> import numpy as np
>>> import scipy.linalg as lin
>>> modmat=np.random.random((150,150))
>>> modmat=modmat+modmat.T # the data i am interested in is described by real symmetric matrices
>>> d,v=lin.eig(modmat)
>>> dx=d.copy()
>>> vx=v.copy()
>>> d,v=lin.eig(modmat)
>>> np.all(d==dx)
False
>>> np.all(v==vx)
False
>>> e,w=lin.eigh(modmat)
>>> ex=e.copy()
>>> wx=w.copy()
>>> e,w=lin.eigh(modmat)
>>> np.all(e==ex)
True
>>> e,w=lin.eigh(modmat)
>>> np.all(e==ex)
False
While I am not the greatest linear algebra wizard, I do understand that eigendecomposition is inherently subject to weird rounding errors, but I don't understand why repeating the computation would result in a different value, or why even the reproducibility itself varies from run to run.
What exactly is the nature of the problem -- well, sometimes the results are acceptably different, and sometimes they aren't. Here are some examples:
>>> d[1]
(9.8986888573772465+0j)
>>> dx[1]
(9.8986888573772092+0j)
The difference above of ~4e-14 does not seem like an enormously big deal. Instead, the real problem (at least for my present project) is that some of the eigenvalues cannot seem to agree on the proper sign.
>>> np.all(np.sign(d)==np.sign(dx))
False
>>> np.nonzero(np.sign(d)!=np.sign(dx))
(array([ 38,  39,  40,  41,  42,  45,  46,  47,  79,  80,  81,  82,  83,
         84, 109, 112]),)
>>> d[38]
(-6.4011617320002525+0j)
>>> dx[38]
(6.1888785138080209+0j)
Similar code in MATLAB does not seem to have this problem.
The eigenvalue decomposition satisfies A V = V Lambda, which is all that is guaranteed --- for instance, the order of the eigenvalues is not.
Answer to the second part of your question:
Modern compilers/linear algebra libraries produce/contain code that does different things
depending on whether the data is aligned in memory on (e.g.) 16-byte boundaries. This affects rounding error in computations, as floating point operations are done in a different order. Small changes in rounding error can then affect things such as ordering of the eigenvalues if the algorithm (here, LAPACK/xGEEV) does not guarantee numerical stability in this respect.
(If your code is sensitive to things like this, it is incorrect! Running it on, e.g., a different platform or with a different library version would lead to a similar problem.)
The results usually are quasi-deterministic --- for instance you get one of 2 possible results, depending if the array happens to be aligned in memory or not. If you are curious about alignment, check A.__array_interface__['data'][0] % 16.
See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf for more
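A quick sketch of that alignment check (my own illustration, not part of the original answer):
import numpy as np

modmat = np.random.random((150, 150))
modmat = modmat + modmat.T

# Address of the data buffer modulo 16: 0 means the array is 16-byte aligned.
print(modmat.__array_interface__['data'][0] % 16)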
I think your problem is you are expecting the eigenvalues to be returned in a particular order, and they don't always come out the same. Sort them, and you'll be on your way. If I run your code to generate d and dx with eig I get the following:
>>> np.max(d - dx)
(19.275224236664116+0j)
But...
>>> d_i = np.argsort(d)
>>> dx_i = np.argsort(dx)
>>> np.max(d[d_i] - dx[dx_i])
(1.1368683772161603e-13+0j)
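Since the matrix here is real and symmetric, a further hedged suggestion (mine, not the answer author's): scipy.linalg.eigh is the natural choice, because it returns real eigenvalues already in ascending order, which sidesteps the ordering comparison entirely (it does not, of course, remove the tiny rounding differences between runs):
import numpy as np
import scipy.linalg as lin

modmat = np.random.random((150, 150))
modmat = modmat + modmat.T

e, w = lin.eigh(modmat)          # real eigenvalues, ascending order
assert np.all(np.diff(e) >= 0)   # already sorted, so no argsort needed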

Why the difference between octave's prctile and numpy's percentile?

I've been rewriting a matlab/octave program into numpy and ran across a difference in some resultant values.
This occurs with both the percentile/prctile and the standard-deviation functions.
In Numpy:
>>> import matplotlib.mlab as ml
>>> import numpy
>>> t = numpy.linspace(0,100, 100)
>>> numpy.percentile(t,95)
95.0
>>> numpy.std(t)
29.157646512850626
>>> ml.prctile(t,95)
95.000000000000014
In Octave:
octave:1> t = linspace(0,100,100)';
octave:2> prctile(t,95)
ans = 95.454545
octave:3> std(t)
ans = 29.304537
Although the array values of 't' are the same, the results differ more than I would expect.
In the numpy help(numpy.std) they specifically mention that the algorithm is:
std = sqrt(mean(abs(x - x.mean())**2))
So I implemented that in octave and got the exact answer numpy gives. So it seems the std-deviation function differs.
But why/how? And which is correct? (if there is such a thing)
And even prctile/percentile?
Just in case since I'm in Linux aptosid...
GNU Octave, version 3.6.2
numpy.__version__ '1.6.2rc1'
Numpy simply uses a different algorithm when the percentile lies between two data points. Octave, Matlab and R always center it exactly between two points when needed (I believe), numpy does a bit more than that... if you check http://en.wikipedia.org/wiki/Percentile you will see there are a couple of ways to calculate percentiles.
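As a hedged follow-up (my own addition; it needs a far newer numpy than the 1.6.x used in this question, since the method keyword only appeared in numpy 1.22): Octave/Matlab's prctile appears to use the 'hazen' plotting-position definition, and selecting that method reproduces the Octave number:
import numpy as np

t = np.linspace(0, 100, 100)

print(np.percentile(t, 95))                    # 95.0, numpy's default 'linear' method
print(np.percentile(t, 95, method='hazen'))    # 95.4545..., matches Octave's prctile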
It seems like Octave assumes ddof=1, at least by default, and numpy uses 0 by default:
>>> numpy.std(t, ddof=0)
29.157646512850633
>>> numpy.std(t, ddof=1)
29.304537349375785
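In formula terms (my own hedged sketch), ddof=1 just changes the divisor from n to n-1:
import numpy

t = numpy.linspace(0, 100, 100)
n = t.size
ss = ((t - t.mean()) ** 2).sum()

print((ss / n) ** 0.5)        # population std, what numpy.std(t) gives (ddof=0)
print((ss / (n - 1)) ** 0.5)  # sample std, what Octave's std(t) gives (ddof=1)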