a possible bug in numpy.isclose when comparing matrices with nans

Consider the following code:
In [90]: m1 = np.matrix([1,2,3], dtype=np.float32)
In [91]: m2 = np.matrix([1,2,3], dtype=np.float32)
In [92]: m3 = np.matrix([1,2,'nan'], dtype=np.float32)
In [93]: np.isclose(m1, m2, equal_nan=True)
Out[93]: matrix([[ True, True, True]], dtype=bool)
In [94]: np.isclose(m1, m3, equal_nan=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-94-5d2b979bc263> in <module>()
----> 1 np.isclose(m1, m3, equal_nan=True)
/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in isclose(a, b, rtol, atol, equal_nan)
2571 # Ideally, we'd just do x, y = broadcast_arrays(x, y). It's in
2572 # lib.stride_tricks, though, so we can't import it here.
-> 2573 x = x * ones_like(cond)
2574 y = y * ones_like(cond)
2575 # Avoid subtraction with infinite/nan values...
/usr/local/lib/python2.7/dist-packages/numpy/matrixlib/defmatrix.pyc in __mul__(self, other)
341 if isinstance(other, (N.ndarray, list, tuple)) :
342 # This promotes 1-D vectors to row vectors
--> 343 return N.dot(self, asmatrix(other))
344 if isscalar(other) or not hasattr(other, '__rmul__') :
345 return N.dot(self, other)
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
When comparing arrays with nans, it works as expected:
In [95]: np.isclose(np.array(m1), np.array(m3), equal_nan=True)
Out[95]: array([[ True, True, False]], dtype=bool)
Why is np.isclose failing? From the documentation it seems that it should work.
Thanks.

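The ValueError itself comes from np.matrix: as the traceback shows, isclose evaluates x * ones_like(cond), and for a matrix the * operator means matrix multiplication (N.dot), which fails for two (1,3) operands. Converting to arrays first, as in In [95], avoids it. As for the nan handling, np.nan == np.nan is False under IEEE 754 floating-point rules.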
In [39]: np.nan == np.nan
Out[39]: False
The `equal_nan` parameter makes two `nan` values compare as equal; it does not make `nan` equal to any other value.
In [37]: np.isclose(m3,m3)
Out[37]: array([ True, True, False], dtype=bool)
In [38]: np.isclose(m3,m3,equal_nan=True)
Out[38]: array([ True, True, True], dtype=bool)
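To make the behaviour of equal_nan concrete, here is a minimal sketch that uses plain arrays instead of np.matrix. The variable names (with_nan, without_nan, and so on) are only illustrative and do not come from the question.

```python
import numpy as np

# Plain ndarrays instead of np.matrix, so that * stays element-wise.
with_nan = np.array([1.0, 2.0, np.nan], dtype=np.float32)
without_nan = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# By default nan is not close to anything, including itself.
default_cmp = np.isclose(with_nan, with_nan)

# equal_nan=True treats nan as equal to nan at the same position...
nan_vs_nan = np.isclose(with_nan, with_nan, equal_nan=True)

# ...but nan is still not equal to an ordinary number such as 3.
nan_vs_number = np.isclose(without_nan, with_nan, equal_nan=True)

print(default_cmp, nan_vs_nan, nan_vs_number)
```

If the data starts out as np.matrix, calling np.asarray on it before the comparison should give the same element-wise result.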

Related

Pandas: Find row with a ndarray

I am fairly new to pandas.
To find all rows with a certain value, I can run
data[data['category'] == 'name']
which returns the matching rows as expected.
One of my columns holds 1x2 numpy arrays. However, if I do
data[data['list'] == np.array([0, 0])]
I get ValueError: Lengths must match to compare
How would I find the row with a certain numpy array in it?

You can use apply with a lambda function, like df[df.list.apply(lambda x: (x == c).all())]
Ex.:
>>> df
list
0 [0, 0]
1 [1, 1]
2 [0, 0]
3 [1, 0]
>>> c
array([0, 0])
>>> df.list.apply(lambda x: x == c)
0 [True, True]
1 [False, False]
2 [True, True]
3 [False, True]
Name: list, dtype: object
>>> df.list.apply(lambda x: (x == c).all())
0 True
1 False
2 True
3 False
Name: list, dtype: bool
>>> df[df.list.apply(lambda x: (x == c).all())]
list
0 [0, 0]
2 [0, 0]
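The session above can be reproduced with a small self-contained script. This is only a sketch: the DataFrame is rebuilt by hand from the printed output, and np.array_equal is swapped in for (x == c).all() because it also returns False, rather than raising or broadcasting, when a cell has a different shape.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: one column whose cells are small arrays.
df = pd.DataFrame({
    "list": [np.array([0, 0]), np.array([1, 1]),
             np.array([0, 0]), np.array([1, 0])]
})
c = np.array([0, 0])

# Compare each cell to c as a whole, giving one boolean per row.
row_mask = df["list"].apply(lambda x: np.array_equal(x, c))

# Boolean indexing keeps only the rows whose cell equals c.
matches = df[row_mask]
print(matches)
```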

Numpy masked array initialization from another array

Is there a cleaner way to initialize a numpy masked array from a non-ma, with all masked values False, than this?
masked_array = np.ma.masked_array(array, mask=np.zeros_like(array, dtype='bool'))
The duplicate reference to array seems unnecessary and clunky. If you do not give the mask= parameter, the mask defaults to a scalar boolean, which prevents sliced access to the mask.

You should be able to just set the mask to False:
>>> array = np.array([1,2,3])
>>> masked_array = np.ma.masked_array(array, mask=False)
>>> masked_array
masked_array(data = [1 2 3],
mask = [False False False],
fill_value = 999999)
I saw hpaulj's comment and tried several ways of solving this, comparing their performance. I can't explain the difference, but @hpaulj seems to have a much deeper understanding of how numpy works. Any input on why m3() executes so much faster would be most appreciated.
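import timeit
import numpy as np

def origM():
    # Original approach: build an explicit all-False mask.
    array = np.array([1,2,3])
    return np.ma.masked_array(array, mask=np.zeros_like(array, dtype='bool'))

def m():
    # Pass mask=False to the constructor.
    array = np.array([1,2,3])
    return np.ma.masked_array(array, mask=False)

def m2():
    # Construct first, then assign the mask.
    array = np.array([1,2,3])
    m = np.ma.masked_array(array)
    m.mask = False
    return m

def m3():
    # View the array as a masked array, then assign the mask.
    array = np.array([1,2,3])
    m = array.view(np.ma.masked_array)
    m.mask = False
    return m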
All four return the same result:
>>> origM()
masked_array(data = [1 2 3],
mask = [False False False],
fill_value = 999999)
>>> m()
masked_array(data = [1 2 3],
mask = [False False False],
fill_value = 999999)
>>> m2()
masked_array(data = [1 2 3],
mask = [False False False],
fill_value = 999999)
>>> m3()
masked_array(data = [1 2 3],
mask = [False False False],
fill_value = 999999)
m3() executes the fastest:
>>> timeit.timeit(origM, number=1000)
0.024451958015561104
>>> timeit.timeit(m, number=1000)
0.0393978749634698
>>> timeit.timeit(m2, number=1000)
0.024049583997111768
>>> timeit.timeit(m3, number=1000)
0.018082750029861927
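As a quick check that mask=False really produces a full-size mask rather than a scalar, something like the following can be run. It is a sketch only; np.ma.getmaskarray is used at the end because it returns a full boolean array whether or not the mask was ever expanded.

```python
import numpy as np

data = np.array([1, 2, 3])

# mask=False is expanded to an all-False mask with the same shape as the data.
masked = np.ma.masked_array(data, mask=False)
mask_shape = masked.mask.shape

# Individual elements can then be masked without touching the data.
masked[1] = np.ma.masked

# getmaskarray always returns a full boolean array.
full_mask = np.ma.getmaskarray(masked)
print(mask_shape, full_mask)
```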

Fill nan in numpy array

Is there a straightforward way of filling nan values in a numpy array when the left and right non-nan values match?
For example, if I have an array that has False, False, NaN, NaN, False, I want the NaN values to also be False. If the left and right values do not match, I want it to keep the NaN.

Your first task is to reliably identify the np.nan elements. Because it's a unique float value, testing isn't trivial. np.isnan is the best numpy tool.
To mix boolean and float (np.nan) you have to use object dtype:
In [68]: arr = np.array([False, False, np.nan, np.nan, False],object)
In [69]: arr
Out[69]: array([False, False, nan, nan, False], dtype=object)
Converting to float changes the False to 0 (and True to 1):
In [70]: arr.astype(float)
Out[70]: array([ 0., 0., nan, nan, 0.])
np.isnan is a good test for nan, but it only works on floats:
In [71]: np.isnan(arr)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-71-25d2f1dae78d> in <module>
----> 1 np.isnan(arr)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
In [72]: np.isnan(arr.astype(float))
Out[72]: array([False, False, True, True, False])
You could test the object array (or a list) with a helper function and a list comprehension:
In [73]: def foo(x):
    ...:     try:
    ...:         return np.isnan(x)
    ...:     except TypeError:
    ...:         return x
    ...:
In [74]: [foo(x) for x in arr]
Out[74]: [False, False, True, True, False]
Having reliably identified the nan values, you can then apply the before/after logic. I'm not sure whether it's easier with lists or arrays (your logic isn't entirely clear).
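One possible reading of the before/after logic is sketched below. It assumes that a run of consecutive nans should be filled only when the values immediately to its left and right both exist and are equal; the helper names is_nan and fill_matching_gaps are made up for this example.

```python
import numpy as np

def is_nan(x):
    """True only for float nan; booleans and other objects are not nan."""
    return isinstance(x, float) and np.isnan(x)

def fill_matching_gaps(values):
    """Fill each run of nans whose left and right neighbours are equal."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not is_nan(out[i]):
            i += 1
            continue
        # out[i:j] is a maximal run of nans.
        j = i
        while j < n and is_nan(out[j]):
            j += 1
        # Fill only when both neighbours exist and agree.
        if i > 0 and j < n and out[i - 1] == out[j]:
            out[i:j] = [out[i - 1]] * (j - i)
        i = j
    return np.array(out, dtype=object)

arr = np.array([False, False, np.nan, np.nan, False], dtype=object)
filled = fill_matching_gaps(arr)

# Neighbours differ here, so the nan is kept.
kept = fill_matching_gaps(np.array([True, np.nan, False], dtype=object))
print(filled, kept)
```

A plain Python loop is used because the array has object dtype, where vectorised float operations such as np.isnan do not apply directly.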

Numpy creating logical array in the presence of NaNs

I have an array x, from which I would like to extract a logical mask. x contains nan values, and the mask operation raises a warning, which is what I am trying to avoid.
Here is my code:
import numpy as np
x = np.array([[0, 1], [2.0, np.nan]])
mask = np.isfinite(x) & (x > 0)
The resulting mask is correct (array([[False, True], [ True, False]], dtype=bool)), but a warning is raised:
__main__:1: RuntimeWarning: invalid value encountered in greater
How can I construct the mask in a way that avoids comparing against NaNs? I am not trying to suppress the warning (which I know how to do).

We could do it in two steps. First, create a mask of the finite elements. Then use that mask to index both itself and x, so that x > 0 is evaluated only on the finite elements and the results are written back into the positions of the mask that were True. So, we would have an implementation like so:
In [35]: x
Out[35]:
array([[ 0., 1.],
[ 2., nan]])
In [36]: mask = np.isfinite(x)
In [37]: mask[mask] = x[mask]>0
In [38]: mask
Out[38]:
array([[False, True],
[ True, False]], dtype=bool)
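The same two steps as a standalone script, with warnings turned into errors so that any comparison against nan would fail loudly. The second variant, np.greater with the where= and out= arguments, is an alternative not mentioned above and is included only as a sketch of another way to skip the non-finite entries.

```python
import warnings
import numpy as np

x = np.array([[0, 1], [2.0, np.nan]])

with warnings.catch_warnings():
    # Turn any RuntimeWarning into an exception for the duration of this block.
    warnings.simplefilter("error")

    # Step 1: mask of finite entries.
    mask = np.isfinite(x)
    # Step 2: compare only the finite entries and write the results back.
    mask[mask] = x[mask] > 0

    # Alternative: let the ufunc skip non-finite entries via where=.
    # Entries that are skipped keep the False they were initialised with.
    alt = np.zeros(x.shape, dtype=bool)
    np.greater(x, 0, out=alt, where=np.isfinite(x))

print(mask)
print(alt)
```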

It looks like masked arrays work in this case:
In [214]: x = np.array([[0, 1], [2.0, np.nan]])
In [215]: xm = np.ma.masked_invalid(x)
In [216]: xm
Out[216]:
masked_array(data =
[[0.0 1.0]
[2.0 --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [217]: xm>0
Out[217]:
masked_array(data =
[[False True]
[True --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [218]: _.data
Out[218]:
array([[False, True],
[ True, False]], dtype=bool)
But other than propagating the masking I don't know how it handles element by element operations like this. The usual fill and compressed steps don't seem relevant.
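If a plain boolean array is wanted at the end, one option is to fill the masked positions with False explicitly instead of reading _.data. This is only a sketch; it makes no claim about whether the masked comparison itself avoids the warning on every numpy version.

```python
import numpy as np

x = np.array([[0, 1], [2.0, np.nan]])

# Mask out nan and inf entries.
xm = np.ma.masked_invalid(x)

# The comparison propagates the mask; filled(False) then turns the
# masked positions into False and returns an ordinary ndarray.
positive = (xm > 0).filled(False)
print(positive)
```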

Creating matrix out of an array of categories in numpy

I have a length-n numpy array, y, of integers in the range [0...k-1]. From this, I would like to create an n-by-k numpy matrix M, where M[i,j] is 1 if y[i]==j, and 0 else.
What is the best way to do this in numpy?

Use broadcasting:
a = np.array([1, 2, 3, 1, 2, 2, 3, 0])
m = a[:, None] == np.arange(max(a)+1)
the result is:
array([[False, True, False, False],
[False, False, True, False],
[False, False, False, True],
[False, True, False, False],
[False, False, True, False],
[False, False, True, False],
[False, False, False, True],
[ True, False, False, False]], dtype=bool)
Or create a zero array and fill it, which I think is faster:
m2 = np.zeros((len(a), a.max()+1), bool)
m2[np.arange(len(a)), a] = True
print(m2)
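Since the question asks for 1 and 0 rather than True and False, here is a short sketch of both approaches producing an integer matrix. It assumes k can be taken as y.max() + 1; the third variant, indexing rows of an identity matrix, is an additional option not mentioned above.

```python
import numpy as np

y = np.array([1, 2, 3, 1, 2, 2, 3, 0])
k = y.max() + 1  # assumes every category up to the maximum is wanted

# Broadcasting: compare a column of labels against a row of category ids.
M_broadcast = (y[:, None] == np.arange(k)).astype(int)

# Zero-fill, then set one entry per row with integer array indexing.
M_index = np.zeros((len(y), k), dtype=int)
M_index[np.arange(len(y)), y] = 1

# Identity-matrix lookup: row y[i] of eye(k) is the one-hot vector for y[i].
M_eye = np.eye(k, dtype=int)[y]

print(M_broadcast)
```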

This is maybe a bit out there, but it's a pretty extensible solution and at least worth noting. If you've already got scikit-learn, the DictVectorizer class is used to transform categorical features in a dataset to column-wise binary representations, just like you described:
import numpy as np
from sklearn.feature_extraction import DictVectorizer
# starting with your numpy array
y = np.array([1, 2, 3, 1, 2, 2, 3, 0])
# transform the array to a list of dicts, with original
# int values now as strings, and a throw-away key ''
y_dict = [{'':str(x)} for x in y.tolist()]
# create the vectorizer and transform the list of dicts
vec = DictVectorizer(sparse=False, dtype=int)
M = vec.fit_transform(y_dict)
print(M)
[[0 1 0 0]
[0 0 1 0]
[0 0 0 1]
[0 1 0 0]
[0 0 1 0]
[0 0 1 0]
[0 0 0 1]
[1 0 0 0]]
Again, probably overkill but it's kind of cute and I thought I'd throw it out there.
