numpy.nanmean() on a subclass of numpy.ndarray returns unexpected type - numpy

Starting from a popular example:
import numpy as np

class TestArray(np.ndarray):
    def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                strides=None, order=None):
        obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
                                 strides, order)
        return obj

obj = TestArray(shape=(3,))
obj[:] = [1, 2, 3]
print type(np.nanmean(obj))
print type(np.nanmean(np.array(obj)))
gives the output
<class '__main__.TestArray'>
<type 'numpy.float64'>
I'd like numpy.nanmean(obj) to also return a numpy.float64.
Obviously I could cast to numpy.array first, but I don't want to do this.
What do I need to modify in the class definition such that numpy.nanmean() (and probably others) return always the same type as if called with a numpy.ndarray directly?

Related

different dimension numpy array broadcasting issue with '+=' operator

I'm new to numpy, and I have an interesting observation about broadcasting. When I add a 3x5 array to a 3x1 array with + and rebind the original 3x1 variable to the result, there is no broadcasting issue.
import numpy as np
total = np.random.uniform(-1, 1, size=(3,))[:, np.newaxis]
print(f'init = \n {total}')
for i in range(3):
    total = total + np.ones(shape=(3, 5))
    print(f'total_{i} = \n {total}')
However, if I use the '+=' operator to increment the 3x1 array by the 3x5 array, there is a broadcasting error. May I know which numpy broadcasting rule I violated in the latter case?
total = np.random.uniform(-1, 1, size=(3,))[:, np.newaxis]
print(f'init = \n {total}')
for i in range(3):
    total += np.ones(shape=(3, 5))
    print(f'total_{i} = \n {total}')
Thank you!
According to the docstring of numpy's add function (the ufunc behind + on arrays):
def add(x1, x2, *args, **kwargs):  # real signature unknown; NOTE: unreliably restored from __doc__
    """
    add(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

    Add arguments element-wise.

    Parameters
    ----------
    x1, x2 : array_like
        The arrays to be added.
        If ``x1.shape != x2.shape``, they must be broadcastable to a common
        shape (which becomes the shape of the output).
    out : ndarray, None, or tuple of ndarray and None, optional
        A location into which the result is stored. If provided, it must have
        a shape that the inputs broadcast to. If not provided or None,
        a freshly-allocated array is returned. A tuple (possible only as a
        keyword argument) must have length equal to the number of outputs.
    """
So add returns a freshly-allocated array unless out is provided, and an explicit out must have a shape that the inputs broadcast to.
In Python, a = a + b and a += b aren't exactly the same: + calls __add__, while += calls __iadd__.
a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a = a + b
second_id = id(a)
print(first_id == second_id)  # False: a + b allocated a new array

a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a += b
second_id = id(a)
print(first_id == second_id)  # True: += updated a in place
+= does not create a new object; it writes the result back to the same address. numpy's add can update an existing instance when the result has the same shape, but it must allocate a new object when the broadcast result is larger. So with +=, the right-hand array must broadcast to the shape of the left-hand array, because the result of the addition has to be stored back into that same object.
For example:
total = np.random.uniform(-1, 1, size=(3,))[:, np.newaxis]
print(id(total))
for i in range(3):
    total += np.ones(shape=(3, 1))
    print(id(total))
The printed id(total) values are all the same because add just updates the instance at the same address; the two arrays have matching (3, 1) shapes, so the result fits in place.
In [29]: arr = np.zeros((1,3))
The actual error message is:
In [30]: arr += np.ones((2,3))
Traceback (most recent call last):
Input In [30] in <cell line: 1>
arr += np.ones((2,3))
ValueError: non-broadcastable output operand with shape (1,3) doesn't match the broadcast shape (2,3)
I read that as saying that arr on the left is "non-broadcastable", whereas arr + np.ones((2,3)) is the result of broadcasting. The wording may be awkward; it's probably produced in some deep compiled function where it makes more sense.
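Another way to see that the constraint is on the output operand (a minimal sketch of my own, not part of the original answer): a += b is essentially np.add(a, b, out=a), and passing out explicitly reproduces the same error:
import numpy as np

arr = np.zeros((1, 3))
try:
    # equivalent in spirit to `arr += np.ones((2, 3))`
    np.add(arr, np.ones((2, 3)), out=arr)
except ValueError as e:
    print(e)  # non-broadcastable output operand with shape (1,3) doesn't match the broadcast shape (2,3)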
We get a variant on this when we try to assign an array to a slice of an array:
In [31]: temp = arr + np.ones((2,3))
In [32]: temp.shape
Out[32]: (2, 3)
In [33]: arr[:] = temp
Traceback (most recent call last):
Input In [33] in <cell line: 1>
arr[:] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (1,3)
This is clearer, saying that the RHS (2,3) cannot be put into the LHS (1,3) slot.
Or trying to put the (2,3) into one "row" of arr:
In [35]: arr[0] = temp
Traceback (most recent call last):
Input In [35] in <cell line: 1>
arr[0] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (3,)
arr[0] = arr works because it tries to put a (1,3) into a (3,) shape - that's a workable broadcasting combination.
arr[0] = arr.T tries to put a (3,1) into a (3,), and fails.

Tensorflow 2 custom dataset Sequence

I have a dataset in a Python dictionary. The structure is as follows:
data.data['0']['input'],data.data['0']['target'],data.data['0']['length']
Both input and target are arrays of size (n,) and length is an int.
I have created a class based on tf.keras.utils.Sequence and specified __getitem__ like this:
def __getitem__(self, idx):
    idx = str(idx)
    return {
        'input': np.asarray(self.data[idx]['input']),
        'target': np.asarray(self.data[idx]['target']),
        'length': self.data[idx]['length']
    }
How can I iterate over such a dataset using tf.data.Dataset? I am getting this error if I try to use from_tensor_slices:
ValueError: Attempt to convert a value with an unsupported type (<class 'dict'>) to a Tensor.
I think you should convert the dictionary to a tensor, as proposed here: convert a dictionary to a tensor. Alternatively, change the dictionary to a text file or to TFRecords. Hope this helps!
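Another option, sketched here under my own assumptions about the data layout (the stand-in dictionary below is hypothetical, not from the question), is tf.data.Dataset.from_generator, which does accept dictionaries of tensors via output_signature:
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the dictionary described in the question.
data = {'0': {'input': np.arange(4.0), 'target': np.arange(4.0), 'length': 4}}

def gen():
    for key in data:
        yield {
            'input': np.asarray(data[key]['input'], dtype=np.float32),
            'target': np.asarray(data[key]['target'], dtype=np.float32),
            'length': data[key]['length'],
        }

dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature={
        'input': tf.TensorSpec(shape=(None,), dtype=tf.float32),
        'target': tf.TensorSpec(shape=(None,), dtype=tf.float32),
        'length': tf.TensorSpec(shape=(), dtype=tf.int32),
    },
)

for example in dataset:
    print(example['input'].shape, example['length'].numpy())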

How to safely subclass ndarray and get behavior consistent with ndarray - odd nanmin/max results?

I'm trying to subclass ndarray so that I can add some additional fields. When I do this, however, I get odd behavior in a variety of numpy functions. For example, nanmin now returns an object of the type of my new array class, whereas previously I'd get a float64. Why? Is this a bug in nanmin or in my class?
import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = obj.view(cls)
        obj.columns = tuple(columns)
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.columns = getattr(obj, 'columns', None)
NAN = float("nan")
r = np.array([1.,0.,1.,0.,1.,0.,1.,0.,NAN, 1., 1.])
print "MIN", np.nanmin(r), type(np.nanmin(r))
gives:
MIN 0.0 <type 'numpy.float64'>
but
>>> r = NDArrayWithColumns(r, ["a"])
>>> print "MIN", np.nanmin(r), type(np.nanmin(r))
MIN 0.0 <class '__main__.NDArrayWithColumns'>
>>> print r.shape
(11,)
Note the change in type, and also that str(np.nanmin(r)) shows 1 field, not 11.
In case you're interested, I'm subclassing because I'd like to track column names in matrices of a single type (structured arrays and record arrays allow varying types).
You need to implement the __array_wrap__ method that gets called at the end of ufuncs, per the docs:
def __array_wrap__(self, out_arr, context=None):
    print('In __array_wrap__:')
    print('   self is %s' % repr(self))
    print('   arr is %s' % repr(out_arr))
    # then just call the parent
    return np.ndarray.__array_wrap__(self, out_arr, context)
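If the goal is to get back a plain numpy.float64 (as in both questions above), a common variant is to unwrap zero-dimensional results inside __array_wrap__. This is a sketch of my own, not quoted from the docs:
import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = np.asarray(obj).view(cls)
        obj.columns = tuple(columns) if columns is not None else None
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.columns = getattr(obj, 'columns', None)

    def __array_wrap__(self, out_arr, context=None):
        ret = np.ndarray.__array_wrap__(self, out_arr, context)
        if ret.ndim == 0:
            # unwrap 0-d results into a plain numpy scalar
            return ret[()]
        return ret

r = NDArrayWithColumns(np.array([1., 0., float("nan")]), ["a"])
print type(np.nanmin(r))  # expected: <type 'numpy.float64'>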

parent method to append vector to attribute of derived class

My goal is to create a method anotherVar in a class Delta that appends an array to an existing array, and which I can call from a derived class (in this case myClass1).
The code I have written here doesn't accomplish this. Where am I going wrong? Presumably it's my definition of anotherVar?
import numpy as np

class Delta(object):
    def anotherVar(self):
        return np.vstack(self)

class myClass1(Delta):
    def __init__(self, *myVars):
        self.__myArray = np.vstack(myVars)

    @property
    def myArray(self):
        return self.__myArray
someVars1 = [1,2,3]
someVars2 = [4,5,6]
someVars3 = [7,8,9]
myResult = myClass1(someVars1,someVars2,someVars2)
myResult.anotherVar = someVars3
print myResult.myArray
[[1 2 3]
[4 5 6]
[4 5 6]]
There are 2 issues with your original code:
You're rebinding the identifier anotherVar of Delta to a variable. Most likely, you wanted to call
myResult.anotherVar(someVars3)
rather than
myResult.anotherVar = someVars3
as the latter rebinds the method anotherVar to the variable someVars3.
When you use double underscores, you're invoking name mangling. If it's merely to make an attribute/method "private", you shouldn't: any developer who sees a single leading underscore will understand that the attribute is liable to change and should not be relied on as public API.
After changing 2 lines in Delta and changing double underscores into single underscores, your code works as you expect:
import numpy as np

class Delta(object):
    def anotherVar(self, arr):
        self._myArray = np.vstack((self._myArray, arr))

class myClass1(Delta):
    def __init__(self, *myVars):
        self._myArray = np.vstack(myVars)

    @property
    def myArray(self):
        return self._myArray
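Calling the method, as suggested above, then appends the row as intended:
someVars1 = [1, 2, 3]
someVars2 = [4, 5, 6]
someVars3 = [7, 8, 9]
myResult = myClass1(someVars1, someVars2, someVars2)
myResult.anotherVar(someVars3)  # call the method instead of rebinding it
print myResult.myArray
# [[1 2 3]
#  [4 5 6]
#  [4 5 6]
#  [7 8 9]]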

numpy matrix trace behaviour

If X is a NumPy matrix object, why does np.trace(X) return a scalar (as expected) but X.trace() return a 1x1 matrix object?
>>> X = np.matrix([[1, 2], [3, 4]])
>>> np.trace(X)
5
>>> X.trace()
matrix([[5]]) # Why not just 5, which would be more useful?
I'm using NumPy 1.7.1, but don't see anything in the release notes of 1.8 to suggest anything's changed.
def __array_finalize__(self, obj):
    self._getitem = False
    if (isinstance(obj, matrix) and obj._getitem): return
    ndim = self.ndim
    if (ndim == 2):
        return
    if (ndim > 2):
        newshape = tuple([x for x in self.shape if x > 1])
        ndim = len(newshape)
        if ndim == 2:
            self.shape = newshape
            return
        elif (ndim > 2):
            raise ValueError("shape too large to be a matrix.")
    else:
        newshape = self.shape
    if ndim == 0:
        self.shape = (1, 1)
    elif ndim == 1:
        self.shape = (1, newshape[0])
    return
This is from the matrix definition, which subclasses ndarray. trace itself is not overridden, so it is actually the same method being called. This __array_finalize__ runs every time a matrix object is created, and the problem is that if ndim is less than 2, the shape is forced up to 2-D.
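A quick illustration of that forcing (my own example, not from the original answer): viewing a 1-D array as a matrix triggers __array_finalize__, which bumps the shape to 2-D.
>>> import numpy as np
>>> a = np.arange(3)        # shape (3,)
>>> m = a.view(np.matrix)   # __array_finalize__ runs on the new view
>>> m
matrix([[0, 1, 2]])
>>> m.shape
(1, 3)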
Then here comes some educated guesswork, which I think should be true, but I'm not familiar enough with the numpy codebase to pin it down exactly.
np.trace and ndarray.trace are two different functions.
np.trace is defined in "core/fromnumeric.py"
ndarray.trace is defined in "core/src/multiarray/methods.c or calculation.c"
np.trace converts the object to an ndarray
ndarray.trace will try to keep the object as the subclassed object.
(Unsure about this; I did not fully understand that code, to be honest.)
Both trace functions keep the result as an array object (subclassed or not); if it collapses to a single value they return that single value, otherwise they return the array object.
Since the result here is kept as a matrix object, it is forced to be two-dimensional by the function above, and because of this it is returned not as a single value but as a matrix.
This conclusion is further backed up by editing the __array_finalize__ function like this:
def __array_finalize__(self, obj):
    self._getitem = False
    if (isinstance(obj, matrix) and obj._getitem): return
    ndim = self.ndim
    if (ndim == 2):
        return
    if (ndim > 2):
        newshape = tuple([x for x in self.shape if x > 1])
        ndim = len(newshape)
        if ndim == 2:
            self.shape = newshape
            return
        elif (ndim > 2):
            raise ValueError("shape too large to be a matrix.")
    else:
        newshape = self.shape
        return
    if ndim == 0:
        self.shape = (1, 1)
    elif ndim == 1:
        self.shape = (1, newshape[0])
    return
Notice the new return before the last if-else check. Now the result of X.trace() is a single value.
THIS IS NOT A FIX; revert the change if you try this yourself. The numpy developers have good reasons for doing it this way.
np.trace does not have this problem since it converts the input to a plain ndarray directly.
The code for np.trace is (without the docstring):
def trace(a, offset=0, axis1=0, axis2=1, dtype=None, out=None):
    return asarray(a).trace(offset, axis1, axis2, dtype, out)
From the docstring of asarray:
Array interpretation of a. No copy is performed if the input
is already an ndarray. If a is a subclass of ndarray, a base
class ndarray is returned.
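So asarray strips the matrix subclass before trace is called, which is why the scalar survives. A quick check of my own:
>>> X = np.matrix([[1, 2], [3, 4]])
>>> type(np.asarray(X))
<type 'numpy.ndarray'>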
Because X.trace is coded that way! The matrix documentation says:
A matrix is a specialized 2-D array that retains its 2-D nature
through operations.
np.trace is coded as (using ndarray.trace):
return asarray(a).trace(offset, axis1, axis2, dtype, out)
It's harder to follow how the matrix trace is evaluated. But looking at https://github.com/numpy/numpy/blob/master/numpy/matrixlib/defmatrix.py
I suspect it is equivalent to:
np.asmatrix(X.A.trace())
In that same file, sum is defined as:
return N.ndarray.sum(self, axis, dtype, out, keepdims=True)._collapse(axis)
mean, prod, etc. do the same. _collapse returns a scalar if axis is None. There isn't an explicit definition for a matrix trace, so it presumably relies on __array_finalize__; in other words, trace returns the default matrix type.
Several constructs that return the scalar are:
X.A.trace()
X.diagonal().sum()
X.trace()._collapse(None)
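A quick verification of those three (my own session; note that _collapse is a private matrix method, so it shouldn't be relied on in real code):
>>> X = np.matrix([[1, 2], [3, 4]])
>>> X.A.trace()               # .A is the plain-ndarray view
5
>>> X.diagonal().sum()        # matrix.sum collapses to a scalar when axis is None
5
>>> X.trace()._collapse(None)
5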