Numpy creating logical array in the presence of NaNs

I have an array x, from which I would like to extract a logical mask. x contains nan values, and the mask operation raises a warning, which is what I am trying to avoid.
Here is my code:
import numpy as np
x = np.array([[0, 1], [2.0, np.nan]])
mask = np.isfinite(x) & (x > 0)
The resulting mask is correct (array([[False, True], [ True, False]], dtype=bool)), but a warning is raised:
__main__:1: RuntimeWarning: invalid value encountered in greater
How can I construct the mask in a way that avoids comparing against NaNs? I am not trying to suppress the warning (which I know how to do).

This can be done in two steps: first build the mask of finite elements, then use that mask to index both x and the mask itself, so that the comparison runs only on the finite elements and its result is written back into the True positions of the mask. An implementation would look like this:
In [35]: x
array([[ 0., 1.],
[ 2., nan]])
In [36]: mask = np.isfinite(x)
In [37]: mask[mask] = x[mask]>0
In [38]: mask
array([[False, True],
[ True, False]], dtype=bool)
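The two-step idea can also be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: the name positive_finite_mask is hypothetical, and it writes into a separate output array instead of indexing the mask with itself. np.errstate(invalid="raise") is there only to show that no invalid comparison takes place.

```python
import numpy as np


def positive_finite_mask(x):
    """Return a boolean mask that is True where x is finite and > 0.

    Hypothetical helper: the comparison only ever sees finite values.
    """
    finite = np.isfinite(x)
    out = np.zeros(x.shape, dtype=bool)
    # x[finite] contains no NaN or inf, so the comparison is safe.
    out[finite] = x[finite] > 0
    return out


x = np.array([[0, 1], [2.0, np.nan]])
with np.errstate(invalid="raise"):  # turn invalid-value warnings into errors
    mask = positive_finite_mask(x)
print(mask)
```

The input array is left untouched, and the same pattern works for any other elementwise comparison.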

Looks like masked arrays work in this case:
In [214]: x = np.array([[0, 1], [2.0, np.nan]])
In [215]: xm =
In [216]: xm
masked_array(data =
[[0.0 1.0]
[2.0 --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [217]: xm>0
masked_array(data =
[[False True]
[True --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [218]:
array([[False, True],
[ True, False]], dtype=bool)
But other than propagating the masking I don't know how it handles element by element operations like this. The usual fill and compressed steps don't seem relevant.
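For reference, here is one way the masked-array route could be written as a script. It is only a sketch: the transcript above does not show how xm was built or how the final plain array was obtained, so np.ma.masked_invalid and filled(False) are assumptions made for illustration.

```python
import numpy as np

x = np.array([[0, 1], [2.0, np.nan]])

# Assumption: build the masked array by masking NaN and inf entries.
xm = np.ma.masked_invalid(x)

# The comparison returns a masked array; masked entries stay masked.
cmp = xm > 0

# Assumption: convert to a plain boolean array, treating masked
# entries as False.
mask = cmp.filled(False)
print(mask)
```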


How do you concatenate several 2D arrays in numpy?

I would like
to yield
[ [[5,5],[2,3]], [[6,4],[7,8]] ]
Concatenate doesn't do the trick, but I am lost on how else to do it!
You can use numpy.stack() or numpy.append() (I suggest append if you have a lot of code). Just note that this is NumPy's append, not the built-in append of Python lists.
>>> import numpy as np
>>> a = np.array([[5,5],[2,3]])
>>> b = np.array([[6,4],[7,8]])
>>> c = np.append([a], [b], axis=0)
>>> c
# answer:
array([[[5, 5],
[2, 3]],
[[6, 4],
[7, 8]]])
Now if we go with np.stack():
>>> d = np.stack((a,b))
>>> c == d
# answer:
array([[[ True, True],
[ True, True]],
[[ True, True],
[ True, True]]])
As you can see, they are the same.
See the NumPy documentation for numpy.append and numpy.vstack for details.
for anyone wondering np.stack((a,b)) does the trick :)
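To make the difference from concatenate concrete, here is a small sketch using the arrays from the answer above. The names stacked and via_newaxis are just illustrative. np.stack joins along a new axis, whereas np.concatenate joins along an existing one, so concatenate only gives the same result if each input is first given a leading axis.

```python
import numpy as np

a = np.array([[5, 5], [2, 3]])
b = np.array([[6, 4], [7, 8]])

# np.stack inserts a new leading axis: two (2, 2) arrays -> (2, 2, 2).
stacked = np.stack((a, b))

# Equivalent with concatenate, after adding the new axis by hand.
via_newaxis = np.concatenate((a[np.newaxis], b[np.newaxis]), axis=0)

# Plain concatenate joins along the existing axis 0 -> shape (4, 2).
flat = np.concatenate((a, b), axis=0)

print(stacked.shape, flat.shape)
```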

Fill nan in numpy array

Is there a straightforward way of filling nan values in a numpy array when the left and right non-nan values match?
For example, if I have an array that has False, False, NaN, NaN, False, I want the NaN values to also be False. If the left and right values do not match, I want it to keep the NaN.
Your first task is to reliably identify the np.nan elements. Because np.nan is a special float value that does not compare equal to itself, testing for it isn't trivial. np.isnan is the best numpy tool.
To mix boolean and float (np.nan) you have to use object dtype:
In [68]: arr = np.array([False, False, np.nan, np.nan, False],object)
In [69]: arr
Out[69]: array([False, False, nan, nan, False], dtype=object)
Converting to float changes the False to 0 (and True to 1):
In [70]: arr.astype(float)
Out[70]: array([ 0., 0., nan, nan, 0.])
np.isnan is a good test for nan, but it only works on floats:
In [71]: np.isnan(arr)
TypeError Traceback (most recent call last)
<ipython-input-71-25d2f1dae78d> in <module>
----> 1 np.isnan(arr)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
In [72]: np.isnan(arr.astype(float))
Out[72]: array([False, False, True, True, False])
You could test the object array (or a list) with a helper function and a list comprehension:
In [73]: def foo(x):
    ...:     try:
    ...:         return np.isnan(x)
    ...:     except TypeError:
    ...:         return x
In [74]: [foo(x) for x in arr]
Out[74]: [False, False, True, True, False]
Having reliably identified the nan values, you can then apply the before/after logic. I'm not sure if it's easier with lists or array (your logic isn't entirely clear).
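As a rough sketch of the before/after logic, the following assumes a 1-D input that can be converted to float (so False becomes 0.0 and True becomes 1.0) and assumes that NaNs at either end, which have only one neighbour, are left alone. The function name fill_matching_gaps is hypothetical, and this is one possible reading of the question, not the only one.

```python
import numpy as np


def fill_matching_gaps(values):
    """Fill NaN runs whose nearest left and right non-NaN values match.

    Hypothetical helper for 1-D input; returns a float array.
    """
    arr = np.asarray(values, dtype=float)
    isnan = np.isnan(arr)
    idx = np.arange(arr.size)

    # Index of the nearest non-NaN element to the left (-1 if none).
    left = np.maximum.accumulate(np.where(~isnan, idx, -1))
    # Index of the nearest non-NaN element to the right (size if none).
    right = np.where(~isnan, idx, arr.size)
    right = np.minimum.accumulate(right[::-1])[::-1]

    # Only NaNs with a valid neighbour on both sides are candidates.
    candidates = np.flatnonzero(isnan & (left >= 0) & (right < arr.size))
    li, ri = left[candidates], right[candidates]
    match = arr[li] == arr[ri]

    out = arr.copy()
    out[candidates[match]] = arr[li[match]]
    return out


filled = fill_matching_gaps([False, False, np.nan, np.nan, False])
kept = fill_matching_gaps([False, np.nan, True])
print(filled, kept)
```

If boolean output is needed, the result can be converted back afterwards, keeping in mind that any remaining NaN has no boolean equivalent.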

a possible bug in numpy.isclose when comparing matrices with nans

Consider the following piece of code:
In [90]: m1 = np.matrix([1,2,3], dtype=np.float32)
In [91]: m2 = np.matrix([1,2,3], dtype=np.float32)
In [92]: m3 = np.matrix([1,2,'nan'], dtype=np.float32)
In [93]: np.isclose(m1, m2, equal_nan=True)
Out[93]: matrix([[ True, True, True]], dtype=bool)
In [94]: np.isclose(m1, m3, equal_nan=True)
ValueError Traceback (most recent call last)
<ipython-input-94-5d2b979bc263> in <module>()
----> 1 np.isclose(m1, m3, equal_nan=True)
/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in isclose(a, b, rtol, atol, equal_nan)
2571 # Ideally, we'd just do x, y = broadcast_arrays(x, y). It's in
2572 # lib.stride_tricks, though, so we can't import it here.
-> 2573 x = x * ones_like(cond)
2574 y = y * ones_like(cond)
2575 # Avoid subtraction with infinite/nan values...
/usr/local/lib/python2.7/dist-packages/numpy/matrixlib/defmatrix.pyc in __mul__(self, other)
341 if isinstance(other, (N.ndarray, list, tuple)) :
342 # This promotes 1-D vectors to row vectors
--> 343 return N.dot(self, asmatrix(other))
344 if isscalar(other) or not hasattr(other, '__rmul__') :
345 return N.dot(self, other)
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
When comparing arrays with NaNs, it works as expected:
In [95]: np.isclose(np.array(m1), np.array(m3), equal_nan=True)
Out[95]: array([[ True, True, False]], dtype=bool)
Why is np.isclose failing? From the documentation it seems that it should work.
The problem comes from np.nan == np.nan, which is False under floating-point comparison rules.
In [39]: np.nan == np.nan
Out[39]: False
The `equal_nan` parameter forces two `nan` values to be considered equal to each other; it does not make any other value equal to `nan`.
In [37]: np.isclose(m3,m3)
Out[37]: array([ True, True, False], dtype=bool)
In [38]: np.isclose(m3,m3,equal_nan=True)
Out[38]: array([ True, True, True], dtype=bool)
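The behaviour of equal_nan can be checked with plain ndarrays, which sidesteps the matrix multiplication path shown in the traceback. This is a small sketch; the variable names are illustrative only.

```python
import numpy as np

with_nan = np.array([1, 2, np.nan], dtype=np.float32)
no_nan = np.array([1, 2, 3], dtype=np.float32)

# By default NaN is not close to anything, including itself.
default = np.isclose(with_nan, with_nan)

# equal_nan=True makes NaN match NaN in the same position...
nan_vs_nan = np.isclose(with_nan, with_nan, equal_nan=True)

# ...but NaN still does not match an ordinary number.
nan_vs_number = np.isclose(no_nan, with_nan, equal_nan=True)

print(default, nan_vs_nan, nan_vs_number)
```

For np.matrix inputs, converting with np.asarray before calling np.isclose mirrors what In [95] does with np.array.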

Setting a value in masked location with NaNs present in numpy

I have an array with NaNs, say
>>> a = np.random.randn(3, 3)
>>> a[1, 1] = a[2, 2] = np.nan
>>> a
array([[-1.68425874, 0.65435007, 0.55068277],
[ 0.71726307, nan, -0.09614409],
[-1.45679335, -0.12772348, nan]])
I would like to set negative numbers in this array to -1. Doing this the "straightforward" way results in a warning, which I am trying to avoid:
>>> a[a < 0] = -1
__main__:1: RuntimeWarning: invalid value encountered in less
>>> a
array([[-1. , 0.65435007, 0.55068277],
[ 0.71726307, nan, -1. ],
[-1. , -1. , nan]])
Applying AND to the masks results in the same warning because of course a < 0 is computed as a separate temp array:
>>> n = ~np.isnan(a)
>>> a[n & (a < 0)] = -1
__main__:1: RuntimeWarning: invalid value encountered in less
When I try to mask the NaNs out of a, the masked portion is not written back to the original array, because a[n] creates a copy:
>>> n = ~np.isnan(a)
>>> a[n][a[n] < 0] = -1
>>> a
array([[-1.68425874, 0.65435007, 0.55068277],
[ 0.71726307, nan, -0.09614409],
[-1.45679335, -0.12772348, nan]])
The only way I could figure out of solving this is by using a gratuitous intermediate masked version of a:
>>> n = ~np.isnan(a)
>>> b = a[n]
>>> b[b < 0] = -1
>>> a[n] = b
>>> a
array([[-1. , 0.65435007, 0.55068277],
[ 0.71726307, nan, -1. ],
[-1. , -1. , nan]])
Is there a simpler way to perform this masked assignment with the presence of NaNs? I would like to solve this without the use of masked arrays if possible.
The snippets above are best run with
import numpy as np
import warnings
as per
If you want to avoid the warning raised by a < 0 when a contains NaNs, one alternative is to collect the flattened or row-column indices of the non-NaN positions and perform the comparison only at those positions. That gives two approaches.
One with flattened indices -
idx = np.flatnonzero(~np.isnan(a))
a.ravel()[idx[a.ravel()[idx] < 0]] = -1
Another with subscripted-indices -
r,c = np.nonzero(~np.isnan(a))
mask = a[r,c] < 0
a[r[mask],c[mask]] = -1
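The subscripted-indices approach can be packaged as a function. This is a sketch under a few assumptions: the name set_negatives and its value parameter are hypothetical, the input is a 2-D float array, and it is modified in place. The errstate block only demonstrates that no NaN takes part in the comparison.

```python
import numpy as np


def set_negatives(a, value=-1):
    """Set negative, non-NaN entries of a 2-D array to value, in place."""
    # Row and column indices of the non-NaN entries.
    r, c = np.nonzero(~np.isnan(a))
    # Compare only those entries, so NaN is never tested.
    neg = a[r, c] < 0
    a[r[neg], c[neg]] = value
    return a


a = np.array([[-1.5, 0.5, 0.25],
              [0.75, np.nan, -0.1],
              [-1.25, -0.125, np.nan]])
with np.errstate(invalid="raise"):  # turn invalid-value warnings into errors
    set_negatives(a)
print(a)
```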
You can suppress the warning temporarily. Is this what you're after?
In [9]: a = np.random.randn(3, 3)
In [10]: a[1, 1] = a[2, 2] = np.nan
In [11]: with np.errstate(invalid='ignore'):
   ....:     a[a < 0] = -1
Poking around the np.nan... functions I found np.nan_to_num
In [569]: a=np.arange(9.).reshape(3,3)-5
In [570]: a[[1,2],[1,2]]=np.nan
In [571]: a
array([[ -5., -4., -3.],
[ -2., nan, 0.],
[ 1., 2., nan]])
In [572]: np.nan_to_num(a) # replace nan with 0
array([[-5., -4., -3.],
[-2., 0., 0.],
[ 1., 2., 0.]])
In [573]: np.nan_to_num(a)<0 # and safely do the <
array([[ True, True, True],
[ True, False, False],
[False, False, False]], dtype=bool)
In [574]: a[np.nan_to_num(a)<0]=-1
In [575]: a
array([[ -1., -1., -1.],
[ -1., nan, 0.],
[ 1., 2., nan]])
Looking at the nan_to_num code, it looks like it uses a masked copyto:
In [577]: a1=a.copy(); np.copyto(a1, 0.0, where=np.isnan(a1))
In [578]: a1
array([[-1., -1., -1.],
[-1., 0., 0.],
[ 1., 2., 0.]])
So it's like your version with the 'gratuitous' mask, but it's hidden in the function. np.putmask is another function that uses a mask.
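As a sketch of how np.putmask could fit in here, the mask below is built from np.nan_to_num(a), so the comparison itself never sees a NaN. Combining the two calls this way is an illustration, not something the answer above spells out.

```python
import numpy as np

a = np.arange(9.0).reshape(3, 3) - 5
a[[1, 2], [1, 2]] = np.nan

# nan_to_num returns a copy with NaN replaced by 0, so the
# comparison is NaN-free; putmask then writes -1 into a in place.
np.putmask(a, np.nan_to_num(a) < 0, -1)
print(a)
```

Because NaN is replaced by 0 before the test, NaN positions are never selected and stay NaN in a.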

Creating matrix out of an array of categories in numpy

I have a length-n numpy array, y, of integers in the range [0...k-1]. From this, I would like to create an n-by-k numpy matrix M, where M[i,j] is 1 if y[i]==j, and 0 else.
What is the best way to do this in numpy?
Use broadcasting:
a = np.array([1, 2, 3, 1, 2, 2, 3, 0])
m = a[:, None] == np.arange(max(a)+1)
the result is:
array([[False, True, False, False],
[False, False, True, False],
[False, False, False, True],
[False, True, False, False],
[False, False, True, False],
[False, False, True, False],
[False, False, False, True],
[ True, False, False, False]], dtype=bool)
Or create a zero array and fill, I think it's faster:
m2 = np.zeros((len(a), a.max()+1), dtype=bool)  # np.bool was removed in NumPy 1.24; use the built-in bool
m2[np.arange(len(a)), a] = True
print(m2)
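The zero-and-fill idea can be turned into a reusable function that returns the 0/1 integer matrix asked for in the question. This is a minimal sketch: the name one_hot and the optional k parameter are hypothetical, and the np.eye variant is shown only for comparison.

```python
import numpy as np


def one_hot(y, k=None):
    """Return an n-by-k 0/1 matrix M with M[i, j] == 1 iff y[i] == j."""
    y = np.asarray(y)
    if k is None:
        # Assumption: infer k from the data when it is not given.
        k = int(y.max()) + 1
    m = np.zeros((y.size, k), dtype=int)
    m[np.arange(y.size), y] = 1
    return m


y = np.array([1, 2, 3, 1, 2, 2, 3, 0])
M = one_hot(y)

# Alternative: pick rows of the identity matrix by category index.
M_eye = np.eye(4, dtype=int)[y]
print(M)
```

Passing k explicitly is useful when some categories do not occur in y, since inferring it from y.max() would otherwise produce too few columns.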
This is maybe a bit out there, but it's a pretty extensible solution and at least worth noting. If you've already got scikit-learn, the DictVectorizer class is used to transform categorical features in a dataset to column-wise binary representations just like you described:
import numpy as np
from sklearn.feature_extraction import DictVectorizer
# starting with your numpy array
y = np.array([1, 2, 3, 1, 2, 2, 3, 0])
# transform the array to a list of dicts, with original
# int values now as strings, and a throw-away key ''
y_dict = [{'':str(x)} for x in y.tolist()]
# create the vectorizer and transform the list of dicts
vec = DictVectorizer(sparse=False, dtype=int)
M = vec.fit_transform(y_dict)
print(M)
[[0 1 0 0]
[0 0 1 0]
[0 0 0 1]
[0 1 0 0]
[0 0 1 0]
[0 0 1 0]
[0 0 0 1]
[1 0 0 0]]
Again, probably overkill but it's kind of cute and I thought I'd throw it out there.