numpy.asarray fails for cython memoryview - numpy

I have encountered some strange behavior using numpy.asarray with a memoryview object that I can't explain. Here's a brief example using the cython magic in a jupyter notebook -- I simply make a function that expects two struct array buffers and returns them. One has two ints and the other a long and an int:
cdef struct S1:
    int iGroup
    int iOrder

cdef struct S2:
    long iGroup
    int iOrder

def test_struct(S1[:] s1, S2[:] s2):
    return s1, s2
Now I make two arrays in python to pass to this function:
dt1 = np.dtype([('iGroup', 'i4'), ('iOrder', 'i4')], align=True)
dt2 = np.dtype([('iGroup', 'i8'), ('iOrder', 'i4')], align=True)
a = np.zeros(10, dtype=dt1)
b = np.zeros(10, dtype=dt2)
x, y = test_struct(a,b)
print x
print y
<MemoryView of 'ndarray' object>
<MemoryView of 'ndarray' object>
Both are successfully returned as MemoryView objects. Now I want to turn them into a numpy array:
np.asarray(x)
array([(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0),
(0, 0), (0, 0)],
dtype=[('iGroup', '<i4'), ('iOrder', '<i4')])
np.asarray(y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-204-ca5459515bfd> in <module>()
----> 1 np.asarray(y)
/Users/rok/miniconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
480
481 """
--> 482 return array(a, dtype, copy=False, order=order)
483
484 def asanyarray(a, dtype=None, order=None):
TypeError: expected a readable buffer object
What am I missing here? Why does this second struct not work? Any tips would be greatly appreciated!

Related

different dimension numpy array broadcasting issue with '+=' operator

I'm new to numpy, and I have an interesting observation on the broadcasting. When I'm adding a 3x5 array directly to a 3x1 array, and update the original 3x1 array with the result, there is no broadcasting issue.
import numpy as np
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(f'init = \n {total}')
for i in range(3):
    total = total + np.ones(shape=(3,5))
    print(f'total_{i} = \n {total}')
However, if I use the '+=' operator to increment the 3x1 array by the 3x5 array, there is a broadcasting error. Which numpy broadcasting rule did I violate in the latter case?
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(f'init = \n {total}')
for i in range(3):
    total += np.ones(shape=(3,5))
    print(f'total_{i} = \n {total}')
Thank you!
According to the docstring of numpy's add function:
def add(x1, x2, *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__
    """
    add(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

    Add arguments element-wise.

    Parameters
    ----------
    x1, x2 : array_like
        The arrays to be added.
        If ``x1.shape != x2.shape``, they must be broadcastable to a common
        shape (which becomes the shape of the output).
    out : ndarray, None, or tuple of ndarray and None, optional
        A location into which the result is stored. If provided, it must have
        a shape that the inputs broadcast to. If not provided or None,
        a freshly-allocated array is returned. A tuple (possible only as a
        keyword argument) must have length equal to the number of outputs.
    """
When no out argument is given, add returns a freshly allocated array whose shape is the broadcast shape of the inputs.
In Python, a = a + b and a += b are not exactly the same: + calls __add__, while += calls __iadd__.
a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a = a + b
second_id = id(a)
print(first_id == second_id)  # False: a now refers to a new array
a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a += b
second_id = id(a)
print(first_id == second_id)  # True: a was updated in place
+= does not create a new object; it writes the result into the existing array.
For a numpy array, a += b behaves like np.add(a, b, out=a), so the result has to fit the shape of a. When a is (3,1) and b is (3,5), the inputs broadcast to (3,5), and a (3,5) result cannot be stored in the (3,1) array, hence the error. a = a + b has no such restriction, because add allocates a new (3,5) array and the name a is simply rebound to it.
For example,
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(id(total))
for i in range(3):
    total += np.ones(shape=(3,1))
    print(id(total))
The values of id(total) are all the same: the broadcast shape of (3,1) and (3,1) is still (3,1), so the result fits in the existing array and is written in place.
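The difference between rebinding and in-place update can be sketched as follows. This is a minimal illustration using small zero-filled arrays rather than the random values from the question; the variable names are only for demonstration.

```python
import numpy as np

total = np.zeros((3, 1))
ones = np.ones((3, 5))

# Out-of-place: a new (3, 5) array is allocated for the result.
result = total + ones

# In-place: the (3, 5) result cannot be written into the (3, 1) array.
try:
    total += ones
    err = None
except ValueError as exc:
    err = str(exc)

# In-place with a matching shape works and keeps the same object.
before = id(total)
total += np.ones((3, 1))
same_object = id(total) == before

print(result.shape, err, same_object)
```

Here the in-place attempt fails while the out-of-place addition succeeds, which matches the behaviour described in the question.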
The actual error message is:
In [29]: arr = np.zeros((1,3))
In [30]: arr += np.ones((2,3))
Traceback (most recent call last):
Input In [30] in <cell line: 1>
arr += np.ones((2,3))
ValueError: non-broadcastable output operand with shape (1,3) doesn't match the broadcast shape (2,3)
I read that as saying that arr on the left is "non-broadcastable", whereas arr + np.ones((2,3)) is the result of broadcasting. The wording may be awkward; it's probably produced in some deep compiled function where it makes more sense.
We get a variant on this when we try to assign an array to a slice of an array:
In [31]: temp = arr + np.ones((2,3))
In [32]: temp.shape
Out[32]: (2, 3)
In [33]: arr[:] = temp
Traceback (most recent call last):
Input In [33] in <cell line: 1>
arr[:] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (1,3)
This is clearer, saying that the RHS (2,3) cannot be put into the LHS (1,3) slot.
Or trying to put the (2,3) into one "row" of arr:
In [35]: arr[0] = temp
Traceback (most recent call last):
Input In [35] in <cell line: 1>
arr[0] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (3,)
arr[0] = arr works because it tries to put a (1,3) into a (3,) shape - that's a workable broadcasting combination.
arr[0] = arr.T tries to put a (3,1) into a (3,), and fails.
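The two assignment cases just mentioned can be checked with a short sketch. The array contents are arbitrary and chosen only so the successful assignment is visible.

```python
import numpy as np

arr = np.arange(3.0).reshape(1, 3)

# (1, 3) into the (3,) slot arr[0]: the leading length-1 axis is dropped.
arr[0] = arr + 10
row_after = arr[0].tolist()

# (3, 1) into the (3,) slot: no broadcasting rule makes this fit.
try:
    arr[0] = arr.T
    failed = False
except ValueError:
    failed = True

print(row_after, failed)
```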

How do I correct this Value Error due to Buffer having the wrong dimensions in a quadprog SVM implementation?

I'm using the quadprog module to set up an SVM for speech recognition. I took a QP implementation from here: https://github.com/stephane-caron/qpsolvers/blob/master/qpsolvers/quadprog_.py
Here is their implementation:
def quadprog_solve_qp(P, q, G=None, h=None, A=None, b=None, initvals=None,
                      verbose=False):
    if initvals is not None:
        print("quadprog: note that warm-start values ignored by wrapper")
    qp_G = P
    qp_a = -q
    if A is not None:
        if G is None:
            qp_C = -A.T
            qp_b = -b
        else:
            qp_C = -vstack([A, G]).T
            qp_b = -np.insert(h, 0, 0, axis=0)
        meq = A.shape[0]
    else:  # no equality constraint
        qp_C = -G.T if G is not None else None
        qp_b = -h if h is not None else None
        meq = 0
    try:
        return solve_qp(qp_G, qp_a, qp_C, qp_b, meq)[0]
    except ValueError as e:
        if "matrix G is not positive definite" in str(e):
            # quadprog writes G the cost matrix that we write P in this package
            raise ValueError("matrix P is not positive definite")
        raise
Shapes:
P: (127, 127)
h: (254, 1)
q: (127, 1)
A: (1, 127)
G: (254, 127)
Originally, qp_b was assigned an hstack of arr = array([0]) with h, but the (1,) shape of arr prevented numpy from concatenating the arrays. I fixed that error by inserting a 0 with np.insert instead.
When I try quadprog_solve_qp(P, q, G, h, A) I get a:
File "----------------------------.py", line 95, in quadprog_solve_qp
return solve_qp(qp_G, qp_a, qp_C, qp_b, meq)[0]
File "quadprog/quadprog.pyx", line 12, in quadprog.solve_qp
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
And I have no idea where it's coming from, nor what I can do. If anyone has any idea how the quadprog module works or simply what I might be doing wrong I would be pleased to hear.
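The message says a 1-D buffer was expected where a 2-D one was passed, and in the shapes listed above q and h are column vectors of shape (n, 1). A possible workaround, assuming solve_qp wants its vector arguments to be one-dimensional (an assumption based on the error message, not checked against the quadprog source here), is to flatten them first. The sketch below uses only numpy with zero-filled placeholder data and does not call quadprog.

```python
import numpy as np

n = 127
q = np.zeros((n, 1))       # column vector, shape as in the question
h = np.zeros((2 * n, 1))   # column vector, shape as in the question

# Flatten to 1-D before negating and handing over to the solver.
qp_a = -q.ravel()

# One leading 0 for the single equality row, then the 254 inequality bounds.
qp_b = -np.insert(h.ravel(), 0, 0.0)

print(qp_a.shape, qp_b.shape)
```

If this assumption holds, passing qp_a and qp_b built this way would avoid handing 2-D arrays to the solver.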

TypeError: Input 'input_sizes' of 'Conv3DBackpropInputV2' Op has type int64 that does not match expected type of int32

deconv_shape1 = layer3.get_shape()
de_W1 = tf.Variable(tf.truncated_normal(shape=(4, 4, 4,
    deconv_shape1[4].value, 2), mean = mu, stddev = sigma))
de_b1 = tf.Variable(tf.zeros(deconv_shape1[4].value))
output_shape=x.get_shape().as_list()
output_shape[1] *= 2
output_shape[2] *= 2
output_shape[3] *= 2
output_shape[4] = deconv_shape1[4].value
output_shape=np.asarray(output_shape)
output_shape=tf.convert_to_tensor(output_shape)
print(type(output_shape))
x = tf.nn.conv3d_transpose(x, de_W1, output_shape, strides=[1, 2, 2, 2, 1], padding="SAME")
x = tf.nn.bias_add(x,de_b1)
first_down_layer=x
x is of type int32.
I get the error shown in the title from tensorflow. What am I doing wrong? I am not even calling Conv3DBackpropInputV2().
I am a newbie to tensorflow, please help!!
Just as a preface: why don't you use the ready-made conv3d_transpose layer, tf.layers.conv3d_transpose(), instead of putting it together yourself with all these moving parts? But hey, maybe you have a good reason. So:
output_shape is of type int64. Run this code:
import tensorflow as tf
import numpy as np
a = tf.zeros( ( 5, 5, 5, 5, 5 ) )
b = a.get_shape().as_list()
c = np.asarray( b )
print( c.dtype )
will output
int64
So do this when converting to array:
output_shape = np.asarray( output_shape, dtype = np.int32 )
That should fix it.
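The dtype difference can be seen with numpy alone. This is a small sketch with a made-up shape list; the default integer dtype that np.asarray picks depends on the platform and NumPy version, so only the explicit int32 conversion is relied on here.

```python
import numpy as np

output_shape = [5, 10, 10, 10, 2]  # hypothetical shape, for illustration only

default_arr = np.asarray(output_shape)                 # platform default integer
int32_arr = np.asarray(output_shape, dtype=np.int32)   # explicit 32-bit integers

print(default_arr.dtype)
print(int32_arr.dtype)
```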

Cython Typing List of Strings

I'm trying to use cython to improve the performance of a loop, but I'm running
into some issues declaring the types of the inputs.
How do I include a field in my typed struct which is a string that can be
either 'front' or 'back'?
I have a np.recarray that looks like the following (note the length of the
recarray is unknown at compile time)
import numpy as np
weights = np.recarray(4, dtype=[('a', np.int64), ('b', np.str_, 5), ('c', np.float64)])
weights[0] = (0, "front", 0.5)
weights[1] = (0, "back", 0.5)
weights[2] = (1, "front", 1.0)
weights[3] = (1, "back", 0.0)
as well as inputs of a list of strings and a pandas.Timestamp
import pandas as pd
ts = pd.Timestamp("2015-01-01")
contracts = ["CLX16", "CLZ16"]
I am trying to cythonize the following loop
def ploop(weights, contracts, timestamp):
    cwts = []
    for gen_num, position, weighting in weights:
        if weighting != 0:
            if position == "front":
                cntrct_idx = gen_num
            elif position == "back":
                cntrct_idx = gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((gen_num, contracts[cntrct_idx], weighting, timestamp))
    return cwts
My attempt involved typing the weights input as a struct in cython,
in a file struct_test.pyx as follows
import numpy as np
cimport numpy as np
cdef packed struct tstruct:
    np.int64_t gen_num
    char[5] position
    np.float64_t weighting

def cloop(tstruct[:] weights_array, contracts, timestamp):
    cdef tstruct weights
    cdef int i
    cdef int cntrct_idx
    cwts = []
    for k in xrange(len(weights_array)):
        w = weights_array[k]
        if w.weighting != 0:
            if w.position == "front":
                cntrct_idx = w.gen_num
            elif w.position == "back":
                cntrct_idx = w.gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((w.gen_num, contracts[cntrct_idx], w.weighting,
                         timestamp))
    return cwts
But I am receiving runtime errors, which I believe are related to the
char[5] position.
import pyximport
pyximport.install()
import struct_test
struct_test.cloop(weights, contracts, ts)
ValueError: Does not understand character buffer dtype format string ('w')
In addition I am a bit unclear how I would go about typing contracts as well
as timestamp.
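One possible reading of the error, offered as a guess rather than a confirmed diagnosis: on Python 3, np.str_ yields a fixed-width Unicode field, which takes 4 bytes per character and so does not match a 5-byte char[5] member. A bytes field ('S5') would have the same size as the struct member. The sketch below only compares the dtype layouts in numpy; it does not compile or run the Cython code.

```python
import numpy as np

unicode_dt = np.dtype([('a', np.int64), ('b', np.str_, 5), ('c', np.float64)])
bytes_dt = np.dtype([('a', np.int64), ('b', 'S5'), ('c', np.float64)])

print(unicode_dt['b'], unicode_dt['b'].itemsize)  # Unicode field
print(bytes_dt['b'], bytes_dt['b'].itemsize)      # bytes field
print(bytes_dt.itemsize)                          # packed record size
```

With the bytes layout, the record size equals 8 + 5 + 8 bytes, which is what a packed struct with those three members would occupy.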
Your ploop (without the timestamp variable) produces:
In [226]: ploop(weights, contracts)
Out[226]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]
Equivalent function without a loop:
def ploopless(weights, contracts):
    arr_contracts = np.array(contracts) # to allow array indexing
    wgts1 = weights[weights['c']!=0]
    mask = wgts1['b']=='front'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]]
    mask = wgts1['b']=='back'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]+1]
    return wgts1.tolist()
In [250]: ploopless(weights, contracts)
Out[250]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]
I'm taking advantage of the fact that the returned list of tuples has the same (int, str, float) layout as the input weights array. So I'm just making a copy of weights and replacing selected values of the b field.
Note that I use the field selection index before the mask one. The boolean mask produces a copy, so we have to be careful about indexing order.
I'm guessing that the loop-less array version will be competitive in time with the cloop (on realistic arrays). The string and list operations in cloop probably limit its speedup.
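The point about indexing order can be illustrated with a small structured array. This is a sketch with made-up data; the replacement value 'X' is arbitrary.

```python
import numpy as np

arr = np.zeros(3, dtype=[('a', np.int64), ('b', 'U5')])
arr['b'] = ['front', 'back', 'front']
mask = arr['b'] == 'front'

# Mask first: arr[mask] is a copy, so the assignment never reaches arr.
arr[mask]['b'] = 'X'
after_mask_first = arr['b'].tolist()

# Field first: arr['b'] is a view, so the masked assignment modifies arr.
arr['b'][mask] = 'X'
after_field_first = arr['b'].tolist()

print(after_mask_first)
print(after_field_first)
```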

numpy matrix trace behaviour

If X is a NumPy matrix object, why does np.trace(X) return a scalar (as expected) but X.trace() return a 1x1 matrix object?
>>> X = np.matrix([[1, 2], [3, 4]])
>>> np.trace(X)
5
>>> X.trace()
matrix([[5]]) # Why not just 5, which would be more useful?
I'm using NumPy 1.7.1, but don't see anything in the release notes of 1.8 to suggest anything's changed.
def __array_finalize__(self, obj):
    self._getitem = False
    if (isinstance(obj, matrix) and obj._getitem): return
    ndim = self.ndim
    if (ndim == 2):
        return
    if (ndim > 2):
        newshape = tuple([x for x in self.shape if x > 1])
        ndim = len(newshape)
        if ndim == 2:
            self.shape = newshape
            return
        elif (ndim > 2):
            raise ValueError("shape too large to be a matrix.")
    else:
        newshape = self.shape
    if ndim == 0:
        self.shape = (1, 1)
    elif ndim == 1:
        self.shape = (1, newshape[0])
    return
This is from the matrix definition, subclassing ndarray. The trace function is not changed so it is actually the same function getting called.
This function is called every time a matrix object is created. The problem is that if ndim is less than 2, the shape is forced up to two dimensions.
What follows is educated guesswork, which I think is correct, but I'm not familiar enough with the numpy codebase to work it out exactly.
np.trace and ndarray.trace are two different functions.
np.trace is defined in "core/fromnumeric.py"
ndarray.trace is defined in "core/src/multiarray/methods.c or calculation.c"
np.trace converts the object to a ndarray
ndarray.trace will try to keep the object as the subclassed object.
Unsure about this, i did not understand squat of that code tbh
Both trace functions keep the result as an array object (subclassed or not). If it is a single value, that value is returned; otherwise the array object is returned.
Since the result is kept as a matrix object, it will be forced to be two dimensional by the function above here. And because of this, it will not be returned as a single value, but as the matrix object.
This conclusion is further backed up by editing the __array_finalize__ function like this:
def __array_finalize__(self, obj):
    self._getitem = False
    if (isinstance(obj, matrix) and obj._getitem): return
    ndim = self.ndim
    if (ndim == 2):
        return
    if (ndim > 2):
        newshape = tuple([x for x in self.shape if x > 1])
        ndim = len(newshape)
        if ndim == 2:
            self.shape = newshape
            return
        elif (ndim > 2):
            raise ValueError("shape too large to be a matrix.")
    else:
        newshape = self.shape
    return
    if ndim == 0:
        self.shape = (1, 1)
    elif ndim == 1:
        self.shape = (1, newshape[0])
    return
Notice the new return before the last if-else check. Now the result of X.trace() is a single value.
THIS IS NOT A FIX, revert the change if you try this yourself.
They have good reasons for doing this
np.trace does not have this problem since it converts the input to an array object directly.
The code for np.trace is (without the docstring):
def trace(a, offset=0, axis1=0, axis2=1, dtype=None, out=None):
    return asarray(a).trace(offset, axis1, axis2, dtype, out)
From the docstring of asarray
Array interpretation of a. No copy is performed if the input
is already an ndarray. If a is a subclass of ndarray, a base
class ndarray is returned.
Because X.trace is coded that way! The matrix documentation says:
A matrix is a specialized 2-D array that retains its 2-D nature
through operations.
np.trace is coded as (using ndarray.trace):
return asarray(a).trace(offset, axis1, axis2, dtype, out)
It's harder to follow how the matrix trace is evaluated. But looking at https://github.com/numpy/numpy/blob/master/numpy/matrixlib/defmatrix.py
I suspect it is equivalent to:
np.asmatrix(X.A.trace())
In that same file, sum is defined as:
return N.ndarray.sum(self, axis, dtype, out, keepdims=True)._collapse(axis)
mean, prod etc do the same. _collapse returns a scalar if axis is None. There isn't an explicit definition for a matrix trace, so it probably uses __array_finalize__. In other words, trace returns the default matrix type.
Several constructs that return the scalar are:
X.A.trace()
X.diagonal().sum()
X.trace()._collapse(None)
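The public constructs above can be compared side by side in a short sketch. It leaves out _collapse, which is a private method, and assumes a NumPy version that still ships np.matrix.

```python
import numpy as np

X = np.matrix([[1, 2], [3, 4]])

via_function = np.trace(X)         # converts to a plain ndarray first
via_method = X.trace()             # result stays a matrix
via_array = X.A.trace()            # trace of the underlying ndarray
via_diagonal = X.diagonal().sum()  # matrix sum collapses to a scalar

print(via_function, via_method, via_array, via_diagonal)
```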