Numpy C-Api array_equal - numpy

I've tried to find a C-API function that compares two PyArrayObjects, something like numpy's array_equal, but I haven't found anything. Do you know of a function like this?
If not, how can I use numpy's array_equal from my C code?

Here's the code for array_equal:
def array_equal(a1, a2):
    try:
        a1, a2 = asarray(a1), asarray(a2)
    except:
        return False
    if a1.shape != a2.shape:
        return False
    return bool(asarray(a1 == a2).all())
As you can see, it is not a C-API level function. After making sure both inputs are arrays and that their shapes match, it performs an element-wise == test, followed by all.
This does not work reliably with floats, because == tests for exact equality. It's ok with ints and booleans.
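To illustrate the caveat about floats, here is a small sketch (the sample values are my own, not from the question). It compares exact equality with a tolerance-based check such as np.allclose, and shows that NaN never compares equal to itself under the default settings:

```python
import numpy as np

# Two arrays that are "the same" mathematically but differ by rounding error.
computed = np.array([0.1 + 0.2, 1.0])
expected = np.array([0.3, 1.0])

# Exact element-wise equality, as array_equal does it.
exact = np.array_equal(computed, expected)

# Tolerance-based comparison, usually what you want for floats.
close = np.allclose(computed, expected)

# NaN is never equal to NaN under ==, so this is False as well.
nan_equal = np.array_equal(np.array([np.nan]), np.array([np.nan]))

print(exact, close, nan_equal)
```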
There probably is some sort of equality function in the c-api, but a clone of this probably isn't what you need.
PyArray_CountNonzero(PyArrayObject* self)
might be a good function. I remember from digging into the code earlier that PyArray_Nonzero uses it to determine how big an array to allocate and return. You could give it an object that compares the elements of your 2 arrays (in whatever way is appropriate given the dtype), and then test for a nonzero count.
Or you could construct your own iterator that bails out as soon as it gets a not-equal pair of elements. Use nditer to get the full array broadcasting power.
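The early-exit idea can be prototyped in Python before writing the C version. This is only a sketch under my own assumptions: arrays_equal_early_exit is a hypothetical name, it assumes numeric dtypes, and it checks shapes first because nditer would otherwise broadcast mismatched shapes against each other:

```python
import numpy as np

def arrays_equal_early_exit(a1, a2):
    """Hypothetical helper: compare element by element, stop at the first mismatch."""
    a1, a2 = np.asarray(a1), np.asarray(a2)
    # nditer broadcasts its operands, so reject differing shapes up front.
    if a1.shape != a2.shape:
        return False
    # 'zerosize_ok' lets the loop run (zero times) on empty arrays.
    for x, y in np.nditer([a1, a2], flags=['zerosize_ok']):
        if x != y:
            return False  # bail out as soon as one pair differs
    return True

print(arrays_equal_early_exit([1, 2, 3], [1, 2, 3]))
print(arrays_equal_early_exit([1, 2, 3], [1, 2, 4]))
```

The C-API counterpart would use NpyIter in the same way; the Python loop is slow and is only meant to show the control flow.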

Related

Gdal: how to assign values to pixel based on condition?

I would like to change the pixel values of a GeoTIFF raster so that a pixel is 1 if its value is between 50 and 100, and 0 otherwise.
Following this post, this is what I am doing:
gdal_calc.py -A input.tif --outfile=output.tif --calc="1*(50<=A<=100)" --NoDataValue=0
but I got the following error
0.. evaluation of calculation 1*(50<=A<=100) failed
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I think such a notation would only work if the expression returns a single boolean, but this returns an array of booleans. Hence the suggestion to aggregate the array to a scalar with something like any() or all().
You should be able to write it in a way compatible with Numpy arrays with something like this:
1 * ((50 <= A) & (A <=100))
Your original expression has an implicit and in it (Python expands 50<=A<=100 to (50<=A) and (A<=100)), whereas this uses an explicit &, which for boolean arrays behaves like np.logical_and: an element-wise test of whether the values on both sides are True.
I'm not sure what the multiplication by one adds in this case; it casts the bool array to an integer datatype. Even if you need to write the result as an integer type, you can probably still leave the casting to GDAL in this case.
A toy example replicating this would be:
a = np.random.randint(0,2,5, dtype=np.bool_)
b = np.random.randint(0,2,5, dtype=np.bool_)
With this data a and b would fail in the same way, because it can't evaluate an entire array as True/False, whereas a & b would return a new array with the element-wise result.
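The same behaviour can be reproduced outside gdal_calc.py with a plain NumPy array. This is a minimal sketch with made-up pixel values standing in for the raster band A:

```python
import numpy as np

# Made-up values standing in for raster band A.
A = np.array([10, 50, 75, 100, 150])

# The chained comparison expands to (50 <= A) and (A <= 100),
# which needs a single truth value for a whole array and raises.
try:
    1 * (50 <= A <= 100)
    chained_failed = False
except ValueError:
    chained_failed = True

# The element-wise version combines the two boolean masks with &.
mask = (50 <= A) & (A <= 100)
result = 1 * mask

print(chained_failed)
print(result)
```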

How do you use/view memoryview objects in Cython?

I've got a project where a handful of nested for-loops are slowing down the runtime of the code, so I've started adding Cython typing. It sped up the loops significantly, but I've run into a new problem: the type I'm using doesn't allow any computations to be done on it. Here's a mock sketch of my code:
cdef double[:,:] my_matrix = np.zeros([width, height])
for i in range(0,width):
    for j in range(0,height):
        a = v1[i] - v2[j]
        my_matrix[i,j] = np.sqrt(a**2)
After that I want to compute a product from my_matrix using a complex number, two constants, the exponential function, and the matrix itself, like so:
product = constant1 * np.exp(-1j * constant2 * my_matrix) / my_matrix
By attempting this I get the error:
TypeError: unsupported operand type(s) for *: 'complex' and 'my_cython_function_cy._memoryviewslice'
I understand the implication of this error, but I don't get how to use the contents of the memoryview object as an array. I tried doing this:
new_matrix = my_matrix
but that won't compile. I'm new to both C and Cython and the documentation isn't very helpful for these rookie-questions so I would be very grateful for any help here.
The best thing to do is:
new_matrix = np.asarray(my_matrix)
That lets you access the full set of Numpy operations on the array. It should be a pretty lightweight transformation (they'll share the same underlying data).
You could also get the wrapped object with my_matrix.base (this would probably be the original Numpy array that you initialized it with). However, depending on what you've done with slicing this might not be quite the same as the memoryview, so be a bit wary of this approach.
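The data sharing can be checked in plain Python. In this sketch the builtin memoryview is only a stand-in for a Cython typed memoryview (both expose the buffer protocol, but they are different types), and the array values and constants are made up:

```python
import numpy as np

original = np.array([[1.0, 2.0], [3.0, 4.0]])
view = memoryview(original)  # stand-in for a Cython double[:, :] memoryview

# Arithmetic directly on the view fails, much like the error in the question.
try:
    1j * view
    direct_math_failed = False
except TypeError:
    direct_math_failed = True

# Wrapping the view gives a real ndarray without copying the data.
new_matrix = np.asarray(view)
shares = np.shares_memory(original, new_matrix)

constant1, constant2 = 2.0, 0.5  # made-up constants
product = constant1 * np.exp(-1j * constant2 * new_matrix) / new_matrix

print(direct_math_failed, shares, product.dtype)
```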

Canonical Tensorflow "for loop"

What is the canonical way of running a Tensorflow "for loop"?
Specifically, suppose we have some body function which does NOT depend on the loop iteration, but must be run n times.
One might think that a good method might be to run this inside of a tf.while_loop like this:
def body(x):
    return ...
def while_body(i, x):
    return i+1, body(x)
# the condition receives every loop variable, so it must accept both i and x
i, x = tf.while_loop(lambda i, x: tf.less(i, n), while_body, [tf.constant(0), x])
In fact, that is precisely what the highest rated answer in this question suggests:
How can I run a loop with a tensor as its range? (in tensorflow)
However, the tf.while_loop docs specifically say
For correct programs, while_loop should return the same result for any parallel_iterations > 0.
If you put a counter in the body, then it seems that that condition is violated. So it seems that there must be a different way of setting up a "for loop".
Furthermore, even if there is no explicit error, doing so seems like it will create a dependency between iterations meaning that I do not think they will run in parallel.
After some investigation, it seems that the tf.while_loop idiom used above is quite common. Alternatively, one can use tf.scan:
def body(x):
    return ...
def scan_body(previous_output, iteration):
    return body( ... )
x = tf.scan(scan_body, tf.range(n), initializer=[x])
although I have no idea if one is preferable from a performance point of view. Note in the above that we have to wrap the body function to accept the previous output.

How do I output the shape in CNTK?

I wrote this code:
matrix = C.softmax(model).eval(data)
But matrix.shape and matrix.size give me errors. So I'm wondering, how can I output the shape of a CNTK variable?
First note that eval() will not give you a CNTK variable, it will give you a numpy array (or a list of numpy arrays, see the next point).
Second, depending on the nature of the model, it is possible that what comes out of eval() is not a numpy array but a list. The reason for this is that if the output is a sequence, then CNTK cannot guarantee that all sequences will be of the same length, and it therefore returns a list of arrays, each array being one sequence.
Finally, if you truly have a CNTK variable, you can get the dimensions with .shape.
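Since the result may be either a single array or a list of arrays, a small helper can report shapes in both cases. This is a sketch using only numpy; describe_shapes is a hypothetical name of mine, not part of CNTK, and the sample data just imitates the two kinds of output:

```python
import numpy as np

def describe_shapes(result):
    """Hypothetical helper: return the shape(s) of an eval()-style result."""
    if isinstance(result, np.ndarray):
        return [result.shape]
    # Otherwise assume a list with one array per sequence.
    return [np.asarray(seq).shape for seq in result]

dense_output = np.zeros((2, 3))                        # imitates a plain array result
sequence_output = [np.zeros((4, 3)), np.zeros((2, 3))]  # imitates variable-length sequences

print(describe_shapes(dense_output))
print(describe_shapes(sequence_output))
```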

Using vectorize to apply function to each row in Numpy 2d array

I have a 1000x784 matrix of data (10000 examples and 784 features) called X_valid and I'd like to apply the following function to each row in this matrix and get the numerical result:
def predict_prob(x_valid, cov, mean, prior):
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(
        np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(
        prior)
(x_valid is simply a row of data). I'm using numpy's vectorize to do this with the following code:
v_predict_prob = np.vectorize(predict_prob)
scores = v_predict_prob(X_valid, covariance[num], means[num], priors[num])
(covariance[num], means[num], and priors[num] are just constants.)
However, I get the following error when running this:
File "problem_5.py", line 48, in predict_prob
return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(prior)
AttributeError: 'numpy.float64' object has no attribute 'dot'
That is, it's not passing in each row of the matrix individually. Instead, it is passing in each entry of the matrix (not what I want).
How can I alter this to get the desired behavior?
vectorize is NOT a general substitute for iteration, nor does it claim to be faster. It mainly streamlines access to the numpy broadcasting functionality. In general the function that you vectorize will take scalar inputs, not rows or 1d arrays.
I don't think there is a way of configuring vectorize to pass an array to your function as opposed to an item.
You describe x_valid as 2d that you want to evaluate row by row. And the other terms as 'constants' which you select with [num]. What shape are those constants?
Your function treats a lot of these terms as 2d arrays:
x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) +
mean.T.dot(np.linalg.inv(cov)).dot(mean) +
np.linalg.slogdet(cov)[1]) + np.log(prior)
x_valid.T is meaningful only if x_valid is 2d. If it is 1d, the transpose does nothing.
np.linalg.inv(cov) only makes sense if cov is 2d.
mean.T.dot... assumes mean is 2d.
np.linalg.slogdet(cov)[1] assumes np.linalg.slogdet(cov) has 2 or more elements (or rows).
You need to show us that the function works with some real arrays before jumping into iteration or 'vectorize'.
I suggest just using a for loop:
def v_predict_prob(X_valid, c, m, p):
    out = []
    for row in X_valid:
        out.append(predict_prob(row, c, m, p))
    return np.array(out)
Under the hood np.vectorize is doing the same thing: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.vectorize.html
I know this question is a bit outdated, but I thought I would provide an answer for 2020.
Since the release of numpy 1.12, there is a new optional argument, "signature", which should allow 2D array functionality in most cases. Additionally, you will want to "exclude" the constants since they will not be vectorized.
All you would need to change is:
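v_predict_prob = np.vectorize(predict_prob, excluded=['cov', 'mean', 'prior'], signature='(n)->()')
This signifies that the function should expect a 1-D array of length n and output a scalar, and that cov, mean, and prior will not be vectorized. Note that excluding by name only takes effect when those arguments are passed as keywords; for a positional call, exclude the positions instead (excluded={1, 2, 3}).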
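Here is a small runnable sketch of the signature approach, checked against the plain for loop. The data is made up (5 rows, 3 features, a diagonal covariance), and predict_prob is lightly rewritten to compute the inverse once:

```python
import numpy as np

def predict_prob(x_valid, cov, mean, prior):
    inv_cov = np.linalg.inv(cov)
    return -0.5 * (x_valid.dot(inv_cov).dot(x_valid)
                   + mean.dot(inv_cov).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)

# Made-up data: 5 examples with 3 features each.
rng = np.random.default_rng(0)
X_valid = rng.normal(size=(5, 3))
cov = np.eye(3) * 2.0
mean = np.array([0.5, -1.0, 2.0])
prior = 0.25

# '(n)->()' means: consume one 1-D row of length n, produce one scalar.
v_predict_prob = np.vectorize(predict_prob,
                              excluded=['cov', 'mean', 'prior'],
                              signature='(n)->()')

# The excluded arguments are passed by keyword so the exclusion applies.
scores = v_predict_prob(X_valid, cov=cov, mean=mean, prior=prior)

# Reference result from an explicit loop over the rows.
loop_scores = np.array([predict_prob(row, cov, mean, prior) for row in X_valid])

print(scores.shape, np.allclose(scores, loop_scores))
```

Like the loop, this still calls the Python function once per row, so it is a convenience rather than a speedup.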