Numpy creating logical array in the presence of NaNs - numpy

I have an array x, from which I would like to extract a logical mask. x contains nan values, and the mask operation raises a warning, which is what I am trying to avoid.
Here is my code:
import numpy as np
x = np.array([[0, 1], [2.0, np.nan]])
mask = np.isfinite(x) & (x > 0)
The resulting mask is correct (array([[False, True], [ True, False]], dtype=bool)), but a warning is raised:
__main__:1: RuntimeWarning: invalid value encountered in greater
How can I construct the mask in a way that avoids comparing against NaNs? I am not trying to suppress the warning (which I know how to do).

We could do it in two steps - Create the mask of finite ones and then use the same mask to index into itself and also to select the valid mask of remaining finite elements off x for testing and setting into the remaining elements in that mask. So, we would have an implementation like so -
In [35]: x
Out[35]:
array([[ 0., 1.],
[ 2., nan]])
In [36]: mask = np.isfinite(x)
In [37]: mask[mask] = x[mask]>0
In [38]: mask
Out[38]:
array([[False, True],
[ True, False]], dtype=bool)

Looks like masked arrays works with this case:
In [214]: x = np.array([[0, 1], [2.0, np.nan]])
In [215]: xm = np.ma.masked_invalid(x)
In [216]: xm
Out[216]:
masked_array(data =
[[0.0 1.0]
[2.0 --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [217]: xm>0
Out[217]:
masked_array(data =
[[False True]
[True --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [218]: _.data
Out[218]:
array([[False, True],
[ True, False]], dtype=bool)
But other than propagating the masking I don't know how it handles element by element operations like this. The usual fill and compressed steps don't seem relevant.

Related

How do you concatenate several 2D arrays in numpy?

I would like
np.concatenate((np.array([[5,5],[2,3]]),np.array([[6,4],[7,8]])))
to yield
[ [[5,5],[2,3]], [[6,4],[7,8]] ]
Concatenate doesn't do the trick, but I am lost on how else to do it!
you can use numpy.stack() or numpy.append() (I suggest append if you have a large code). just pay attention it is the append of numpy. not built-in append of python.
>>> import numpy as np
>>> a = np.array([[5,5],[2,3]])
>>> b = np.array([[6,4],[7,8]])
>>> np.append([a], [b], axis = 0)
# answer:
array([[[5, 5],
[2, 3]],
[[6, 4],
[7, 8]]])
now if we go with np.stack():
>>> d = np.stack((a,b))
>>> c == d
# answer:
array([[[ True, True],
[ True, True]],
[[ True, True],
[ True, True]]])
as you can see they are the same.
you can see the user guide of numpy.append here and user guide of numpy.vstack here.
for anyone wondering np.stack((a,b)) does the trick :)

Fill nan in numpy array

Is there a straight forward way of filling nan values in a numpy array when the left and right non nan values match?
For example, if I have an array that has False, False , NaN, NaN, False, I want the NaN values to also be False. If the left and right values do not match, I want it to keep the NaN
Your first task is to reliably identify the np.nan elements. Because it's a unique float value, testing isn't trivail. np.isnan is the best numpy tool.
To mix boolean and float (np.nan) you have to use object dtype:
In [68]: arr = np.array([False, False, np.nan, np.nan, False],object)
In [69]: arr
Out[69]: array([False, False, nan, nan, False], dtype=object)
converting to float changes the False to 0 (and True to 1):
In [70]: arr.astype(float)
Out[70]: array([ 0., 0., nan, nan, 0.])
np.isnan is a good test for nan, but it only works on floats:
In [71]: np.isnan(arr)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-71-25d2f1dae78d> in <module>
----> 1 np.isnan(arr)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
In [72]: np.isnan(arr.astype(float))
Out[72]: array([False, False, True, True, False])
You could test the object array (or a list) with a helper function and a list comprehension:
In [73]: def foo(x):
...: try:
...: return np.isnan(x)
...: except TypeError:
...: return x
...:
In [74]: [foo(x) for x in arr]
Out[74]: [False, False, True, True, False]
Having reliably identified the nan values, you can then apply the before/after logic. I'm not sure if it's easier with lists or array (your logic isn't entirely clear).

a possible bug in numpy.isclose when comparing matrices with nans

consider the next piece of code:
In [90]: m1 = np.matrix([1,2,3], dtype=np.float32)
In [91]: m2 = np.matrix([1,2,3], dtype=np.float32)
In [92]: m3 = np.matrix([1,2,'nan'], dtype=np.float32)
In [93]: np.isclose(m1, m2, equal_nan=True)
Out[93]: matrix([[ True, True, True]], dtype=bool)
In [94]: np.isclose(m1, m3, equal_nan=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-94-5d2b979bc263> in <module>()
----> 1 np.isclose(m1, m3, equal_nan=True)
/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in isclose(a, b, rtol, atol, equal_nan)
2571 # Ideally, we'd just do x, y = broadcast_arrays(x, y). It's in
2572 # lib.stride_tricks, though, so we can't import it here.
-> 2573 x = x * ones_like(cond)
2574 y = y * ones_like(cond)
2575 # Avoid subtraction with infinite/nan values...
/usr/local/lib/python2.7/dist-packages/numpy/matrixlib/defmatrix.pyc in __mul__(self, other)
341 if isinstance(other, (N.ndarray, list, tuple)) :
342 # This promotes 1-D vectors to row vectors
--> 343 return N.dot(self, asmatrix(other))
344 if isscalar(other) or not hasattr(other, '__rmul__') :
345 return N.dot(self, other)
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
when comparing arrays with nans it's working as expected:
In [95]: np.isclose(np.array(m1), np.array(m3), equal_nan=True)
Out[95]: array([[ True, True, False]], dtype=bool)
why is np.isclose failing? from the documentation it seems that it should work
thanks
The problem comes from np.nan == np.nan, which is False in the float logic.
In [39]: np.nan == np.nan
Out[39]: False
The `equal_nan` parameter is to force two `nan` values to be considered as equal , not to consider any value to be equal to `nan`.
In [37]: np.isclose(m3,m3)
Out[37]: array([ True, True, False], dtype=bool)
In [38]: np.isclose(m3,m3,equal_nan=True)
Out[38]: array([ True, True, True], dtype=bool)

Setting a value in masked location with NaNs present in numpy

I have an array with NaNs, say
>>> a = np.random.randn(3, 3)
>>> a[1, 1] = a[2, 2] = np.nan
>>> a
array([[-1.68425874, 0.65435007, 0.55068277],
[ 0.71726307, nan, -0.09614409],
[-1.45679335, -0.12772348, nan]])
I would like to set negative numbers in this array to -1. Doing this the "straightforward" way results in a warning, which I am trying to avoid:
>>> a[a < 0] = -1
__main__:1: RuntimeWarning: invalid value encountered in less
>>> a
array([[-1. , 0.65435007, 0.55068277],
[ 0.71726307, nan, -1. ],
[-1. , -1. , nan]])
Applying AND to the masks results in the same warning because of course a < 0 is computed as a separate temp array:
>>> n = ~np.isnan(a)
>>> a[n & (a < 0)] = -1
__main__:1: RuntimeWarning: invalid value encountered in less
When I try to apply a mask the nans out of a, the masked portion is not written back to the original array:
>>> n = ~np.isnan(a)
>>> a[n][a[n] < 0] = -1
>>> a
array([[-1.68425874, 0.65435007, 0.55068277],
[ 0.71726307, nan, -0.09614409],
[-1.45679335, -0.12772348, nan]])
The only way I could figure out of solving this is by using a gratuitous intermediate masked version of a:
>>> n = ~np.isnan(a)
>>> b = a[n]
>>> b[b < 0] = -1
>>> a[n] = b
>>> a
array([[-1. , 0.65435007, 0.55068277],
[ 0.71726307, nan, -1. ],
[-1. , -1. , nan]])
Is there a simpler way to perform this masked assignment with the presence of NaNs? I would like to solve this without the use of masked arrays if possible.
NOTE
The snippets above are best run with
import numpy as np
import warnings
np.seterr(all='warn')
warnings.simplefilter("always")
as per https://stackoverflow.com/a/30496556/2988730.
If you want to avoid that warning occurring at a < 0 with a containing NaNs, I would think alternative ways would involve using flattened or row-column indices of non-Nan positions and then performing the comparison. Thus, we would have two approaches with that philosophy.
One with flattened indices -
idx = np.flatnonzero(~np.isnan(a))
a.ravel()[idx[a.ravel()[idx] < 0]] = -1
Another with subscripted-indices -
r,c = np.nonzero(~np.isnan(a))
mask = a[r,c] < 0
a[r[mask],c[mask]] = -1
You can suppress the warning temporarily, is this what you're after?
In [9]: a = np.random.randn(3, 3)
In [10]: a[1, 1] = a[2, 2] = np.nan
In [11]: with np.errstate(invalid='ignore'):
....: a[a < 0] = -1
....:
Poking around the np.nan... functions I found np.nan_to_num
In [569]: a=np.arange(9.).reshape(3,3)-5
In [570]: a[[1,2],[1,2]]=np.nan
In [571]: a
Out[571]:
array([[ -5., -4., -3.],
[ -2., nan, 0.],
[ 1., 2., nan]])
In [572]: np.nan_to_num(a) # replace nan with 0
Out[572]:
array([[-5., -4., -3.],
[-2., 0., 0.],
[ 1., 2., 0.]])
In [573]: np.nan_to_num(a)<0 # and safely do the <
Out[573]:
array([[ True, True, True],
[ True, False, False],
[False, False, False]], dtype=bool)
In [574]: a[np.nan_to_num(a)<0]=-1
In [575]: a
Out[575]:
array([[ -1., -1., -1.],
[ -1., nan, 0.],
[ 1., 2., nan]])
Looking at the nan_to_num code, it looks like it uses a masked copyto:
In [577]: a1=a.copy(); np.copyto(a1, 0.0, where=np.isnan(a1))
In [578]: a1
Out[578]:
array([[-1., -1., -1.],
[-1., 0., 0.],
[ 1., 2., 0.]])
So it's like your version with the 'gratuitous' mask, but it's hidden in the function.
np.place, np.putmask are other functions that use a mask.

Creating matrix out of an array of categories in numpy

I have a length-n numpy array, y, of integers in the range [0...k-1]. From this, I would like to create an n-by-k numpy matrix M, where M[i,j] is 1 if y[i]==j, and 0 else.
What is the best way to do this in numpy?
Use broadcasting:
a = np.array([1, 2, 3, 1, 2, 2, 3, 0])
m = a[:, None] == np.arange(max(a)+1)
the result is:
array([[False, True, False, False],
[False, False, True, False],
[False, False, False, True],
[False, True, False, False],
[False, False, True, False],
[False, False, True, False],
[False, False, False, True],
[ True, False, False, False]], dtype=bool)
Or create a zero array and fill, I think it's faster:
m2 = np.zeros((len(a), a.max()+1), np.bool)
m2[np.arange(len(a)), a] = True
print m2
This is maybe a bit out there, but its a pretty extensible solution and at least worth noting. If you've already got scikit-learn, the DictVectorizer class is used to transform categorical features in a dataset to column-wise binary representations just like you described:
import numpy as np
from sklearn.feature_extraction import DictVectorizer
# starting with your numpy array
y = np.array([1, 2, 3, 1, 2, 2, 3, 0])
# transform the array to a list of dicts, with original
# int values now as strings, and a throw-away key ''
y_dict = [{'':str(x)} for x in y.tolist()]
# create the vectorizer and transform the list of dicts
vec = DictVectorizer(sparse=False, dtype=int)
M = vec.fit_transform(y_dict)
print M
[[0 1 0 0]
[0 0 1 0]
[0 0 0 1]
[0 1 0 0]
[0 0 1 0]
[0 0 1 0]
[0 0 0 1]
[1 0 0 0]]
Again, probably overkill but it's kind of cute and I thought I'd throw it out there.