Using vectorize to apply function to each row in Numpy 2d array - numpy

I have a 1000x784 matrix of data (10000 examples and 784 features) called X_valid and I'd like to apply the following function to each row in this matrix and get the numerical result:
def predict_prob(x_valid, cov, mean, prior):
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(
        np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(
        prior)
(x_valid is simply a row of data). I'm using numpy's vectorize to do this with the following code:
v_predict_prob = np.vectorize(predict_prob)
scores = v_predict_prob(X_valid, covariance[num], means[num], priors[num])
(covariance[num], means[num], and priors[num] are just constants.)
However, I get the following error when running this:
File "problem_5.py", line 48, in predict_prob
return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(prior)
AttributeError: 'numpy.float64' object has no attribute 'dot'
That is, it's not passing in each row of the matrix individually. Instead, it is passing in each entry of the matrix (not what I want).
How can I alter this to get the desired behavior?

vectorize is NOT a general substitute for iteration, nor does it claim to be faster. It mainly streamlines access to the numpy broadcasting functionality. In general the function that you vectorize will take scalar inputs, not rows or 1d arrays.
I don't think there is a way of configuring vectorize to pass an array to your function as opposed to an item.
You describe x_valid as 2d that you want to evaluate row by row. And the other terms as 'constants' which you select with [num]. What shape are those constants?
Your function treats a lot of these terms as 2d arrays:
x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) +
mean.T.dot(np.linalg.inv(cov)).dot(mean) +
np.linalg.slogdet(cov)[1]) + np.log(prior)
x_valid.T is meaningful only if x_valid is 2d. If it is 1d, the transpose does nothing.
np.linalg.inv(cov) only makes sense if cov is 2d.
mean.T.dot... assumes mean is 2d.
np.linalg.slogdet(cov)[1] assumes np.linalg.slogdet(cov) has 2 or more elements (or rows).
You need to show us that the function works with some real arrays before jumping into iteration or 'vectorize'.

I suggest just using a for loop:
def v_predict_prob(X_valid, c, m, p):
    out = []
    for row in X_valid:
        out.append(predict_prob(row, c, m, p))
    return np.array(out)
Under the hood np.vectorize is doing the same thing: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.vectorize.html

I know this question is a bit outdated, but I thought I would provide an answer for 2020.
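Since the release of numpy 1.12, np.vectorize accepts an optional argument, signature, which lets the wrapped function receive whole 1-D (or higher-dimensional) slices instead of scalars in most cases. Additionally, you will want to list the constants in excluded since they should not be vectorized.
All you would need to change is:
v_predict_prob = np.vectorize(predict_prob, excluded=['cov', 'mean', 'prior'], signature='(n)->()')
This signifies that the function should expect a 1-D array of length n and output a scalar, and that cov, mean, and prior will not be vectorized. Names listed in excluded only apply to arguments passed by keyword, so call it as v_predict_prob(X_valid, cov=covariance[num], mean=means[num], prior=priors[num]); to keep passing them positionally, use excluded=[1, 2, 3] instead.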
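A minimal sketch of the signature approach, using made-up data in place of X_valid, covariance[num], means[num] and priors[num] (the sizes and random values are assumptions for illustration only). It also computes the same scores with np.einsum, which avoids calling the Python function once per row; that variant is an alternative, not something proposed in the answers above.

```python
import numpy as np

def predict_prob(x_valid, cov, mean, prior):
    # Same formula as in the question, with the inverse computed once per call.
    cov_inv = np.linalg.inv(cov)
    return -0.5 * (x_valid.T.dot(cov_inv).dot(x_valid)
                   + mean.T.dot(cov_inv).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)

# Hypothetical stand-ins for the question's data (kept small for the demo).
rng = np.random.default_rng(0)
n_rows, n_features = 6, 4
X_valid = rng.normal(size=(n_rows, n_features))
A = rng.normal(size=(n_features, n_features))
cov = A @ A.T + n_features * np.eye(n_features)  # symmetric positive definite
mean = rng.normal(size=n_features)
prior = 0.1

# Row-wise application: each call receives one 1-D row of X_valid.
v_predict_prob = np.vectorize(predict_prob,
                              excluded=['cov', 'mean', 'prior'],
                              signature='(n)->()')
scores = v_predict_prob(X_valid, cov=cov, mean=mean, prior=prior)

# Alternative without per-row Python calls: quad[i] = X[i] @ cov_inv @ X[i].
cov_inv = np.linalg.inv(cov)
quad = np.einsum('ij,jk,ik->i', X_valid, cov_inv, X_valid)
scores_fast = -0.5 * (quad + mean @ cov_inv @ mean
                      + np.linalg.slogdet(cov)[1]) + np.log(prior)
```

Both scores and scores_fast have one entry per row of X_valid. np.vectorize is still a Python-level loop, so the einsum form is likely the faster of the two on a large matrix.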

Related

Simple question about slicing a Numpy Tensor

I have a Numpy Tensor,
X = np.arange(64).reshape((4,4,4))
I wish to grab the 2,3,4 entries of the first dimension of this tensor, which you can do with,
Y = X[[1,2,3],:,:]
Is there a simpler way of writing this instead of explicitly writing out the indices [1,2,3]? I tried something like [1,:], which gave me an error.
Context: for my real application, the shape of the tensor is something like (30000,100,100). I would like to grab the entries from index 10000 to 30000 along the first dimension of this tensor.
The simplest way in your case is to use X[1:4]. This is the same as X[[1,2,3]], but notice that with X[1:4] you only need one pair of brackets because 1:4 already represents a range of values.
For an N-dimensional array in NumPy, if you specify indices for fewer than N dimensions, you get all elements of the remaining dimensions. That is, for N equal to 3, X[1:4] is the same as X[1:4, :, :] or X[1:4, :]. You only need to pass : explicitly when you want to index a dimension while taking all elements of a dimension that comes before it, as in X[:, 2:4].
If you wish to select from some row to the end of the array, simply use Python slicing notation as below:
X[10000:,:,:]
This will select all rows from index 10000 to the end of the array, along with all columns and depths for them.
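A short check of the equivalences described above, run on the small 4x4x4 tensor from the question (the variable names are made up for this sketch). It also shows one practical difference: basic slicing returns a view of the original data, while indexing with a list of integers makes a copy.

```python
import numpy as np

X = np.arange(64).reshape((4, 4, 4))

Y_fancy = X[[1, 2, 3], :, :]  # integer-list indexing, produces a copy
Y_slice = X[1:4]              # basic slicing, produces a view
Y_tail = X[1:]                # from index 1 to the end of the first axis

# All three select the same entries.
same_values = (np.array_equal(Y_fancy, Y_slice)
               and np.array_equal(Y_slice, Y_tail))

slice_is_view = np.shares_memory(X, Y_slice)
fancy_is_view = np.shares_memory(X, Y_fancy)
```

For a tensor of shape (30000, 100, 100), the view returned by X[10000:] avoids copying the selected 20000 blocks.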

Coefficients of 2D Chebyshev series in numpy.polynomial.chebyshev

I understand that chebvander2d and chebval2d return the Vandermonde matrix and fitted values for 2D inputs, and chebfit returns the coefficients for 1D-input series, but how do I get the coefficients for 2D-input series?
Short answer: It looks to me like this is not yet implemented. The whole of 2D polynomials seems more like a draft with some stub functions (as of June 2020).
Long answer (I came looking for the same thing, so I dug a little deeper):
First of all, this applies to all of the polynomial classes, not only chebyshev, so you also cannot fit an "ordinary" polynomial (power series). In fact, you cannot even construct one.
To understand the programming problem, let me recap what a 2D polynomial looks like as a math formula, using an example polynomial of degree 2:
p(x, y) = c_00 + c_10 x + c_01 y + c_20 x^2 + c_11 xy + c_02 y^2
Here the indices of c refer to the powers of x and y (the sum of the exponents must be <= degree).
First thing to notice is that, for degree d, there are (d+1)(d+2)/2 coefficients.
They could be stored in the upper left part of a matrix or in a 1D array, e.g. arranged as in the formula above.
The documentation of functions like numpy.polynomial.polynomial.polyval2d implies that numpy expects the matrix variant: p(x, y) = sum_i,j c_i,j * x^i * y^j.
Side note: it may be confusing that the row index i ("y-coordinate") of the matrix is used as exponent of x, not y; maybe the roles of i and j should be switched if this is eventually implemented, or at least there should be a note in the documentation.
This leads to the core problem: the data structure for the 2D coefficients is not defined anywhere; only indirectly, like above, it can be guessed that a matrix should be used. But compared to a 1D array this is a waste of space, and evaluation of the polynomial takes two nested loops instead of just one. Also: does the matrix have to be initialized with np.zeros or do the implemented functions make sure that the lower right part is never touched so that np.empty can be used?
If the whole (d+1)^2 matrix were used, as the polyval2d function doc suggests, the degree of the polynomial would actually be d*2 (if c_d,d != 0)
To test this, I wanted to construct a numpy.polynomial.polynomial.Polynomial (yes, three times polynomial) and check the degree attribute:
import numpy as np
import numpy.polynomial.polynomial as poly
coef = np.array([
    [5.00, 5.01, 5.02],
    [5.10, 5.11, 0.  ],
    [5.20, 0.  , 0.  ]
])
polyObj = poly.Polynomial(coef)
print(polyObj.degree)
This gave a ValueError: Coefficient array is not 1-d before the print statement was reached. So while polyval2d expects a 2D coefficient array, it is not (yet) possible to construct such a polynomial - not manually like this at least. With this insight, it is not surprising that there is no function (yet) that computes a fit for 2D polynomials.
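Even without a dedicated fitting function, the coefficients can be obtained by solving a least-squares problem on the pseudo-Vandermonde matrix that chebvander2d returns. The sketch below is one way to do that, not an official numpy API for 2D fitting; the degrees, sample points and true_coef are invented for the demonstration. Note that it fits the full (deg_x+1) x (deg_y+1) coefficient matrix that chebval2d expects, not a polynomial limited by total degree.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

rng = np.random.default_rng(0)

# Hypothetical ground truth: degree 2 in x and degree 3 in y.
deg = [2, 3]
true_coef = rng.normal(size=(deg[0] + 1, deg[1] + 1))

# Scattered sample points in [-1, 1] x [-1, 1] and the values to fit.
x = rng.uniform(-1, 1, size=200)
y = rng.uniform(-1, 1, size=200)
z = cheb.chebval2d(x, y, true_coef)

# One row per sample, one column per coefficient c[i, j] (flattened row-major).
V = cheb.chebvander2d(x, y, deg)

# Least-squares solve, then reshape back into the matrix chebval2d expects.
flat_coef, *_ = np.linalg.lstsq(V, z, rcond=None)
fit_coef = flat_coef.reshape(deg[0] + 1, deg[1] + 1)

fitted_values = cheb.chebval2d(x, y, fit_coef)
```

The same pattern should carry over to the other polynomial modules, for example polyvander2d with polyval2d for ordinary power series.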

struggling minimizing non linear function

I am trying to minimize a nonlinear function of 3 arguments (x1, x2 and x3).
My sources of information are:
the explanation of the minimization function:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
And an example they provide:
https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
I do not belong to a mathematical area, so first off forgive me if I am using incorrect wording / expressions.
This is my code :
import numpy as np
from scipy.optimize import minimize
def rosen(x1,x2,x3):
    return np.sqrt(((x1**2)*0.002)+((x2**2)*0.0035)+((x3**2)*0.0015)+(2*x1*x2*0.015)+(2*x1*x3*0.01)+(2*x2*x3*0.02))
I think the first step is okay up to here.
Then it is required to state the:
x0 : ndarray
Initial guess. len(x0) is the dimensionality of the minimization problem.
Given that the function being minimized takes 3 arguments, should I pass a 3-element array, like this?
x0=np.array([1,1,1])
res = minimize(rosen, x0)
print(res.x)
The undesired output is:
rosen() missing 2 required positional arguments: 'x2' and 'x3'
I do not really understand where I should state the positional arguments.
Apart from that, I would like to set some bounds on the output values for x1, x2, x3.
I tried:
res = minimize(rosen, x0, bounds=([0,None]),options={"disp": False})
Which also outputs:
ValueError: length of x0 != length of bounds
How should I express the bounds in the call to minimize, then?
The desired output is simply an array of x1, x2, x3 that minimizes the function, where each value is at least 0, as stated in the bounds, and the values sum to 1.
Function-definition
Read the docs carefully, e.g. for your function-def:
fun : callable
The objective function to be minimized. Must be in the form f(x, *args). The
optimizing argument, x, is a 1-D array of points, and args is a tuple of any
additional fixed parameters needed to completely specify the function.
Your function should take a single 1d-array, but you implemented it with one argument per variable!
Changing:
def rosen(x1,x2,x3):
    return np.sqrt(((x1**2)*0.002)+((x2**2)*0.0035)+((x3**2)*0.0015)+(2*x1*x2*0.015)+(2*x1*x3*0.01)+(2*x2*x3*0.02))
to:
def rosen(x):
    x1,x2,x3 = x # unpack vector for your kind of calculations
    return np.sqrt(((x1**2)*0.002)+((x2**2)*0.0035)+((x3**2)*0.0015)+(2*x1*x2*0.015)+(2*x1*x3*0.01)+(2*x2*x3*0.02))
should work. This is a bit of a repair-it-to-keep-the-rest-of-my-code approach, but it won't hurt much in this example. Usually you write your function definition assuming a 1d-array input from the start!
Bounds
Again from the docs:
bounds : sequence, optional
Bounds for variables (only for L-BFGS-B, TNC and SLSQP). (min, max) pairs for each
element in x, defining the bounds on that parameter. Use None for one of min or max
when there is no bound in that direction.
So you need n_vars pairs! Easily achieved by using a list-comprehension, deducing the necessary info from x0.
res = minimize(rosen, x0, bounds=[[0,None] for i in range(len(x0))],options={"disp": False})
Make variables sum up to 1 / Constraints
Your comment implies you want the variables to sum up to 1. You would need to use an equality-constraint then (only 1 solver supporting this and inequality-constraints; one other only inequality-constraints; the rest no constraints; solver will be picked automatically if none explicitly given).
It looks somewhat like:
cons = ({'type': 'eq', 'fun': lambda x: sum(x) - 1}) # read docs to understand!
# to think about:
# sum vs. np.sum
# (not much diff here)
res = minimize(rosen, x0, bounds=[[0,None] for i in range(len(x0))],options={"disp": False}, constraints=cons)
For the case of x nonnegative, the constraint is usually called the probability-simplex.
(untested code; conceptually correct!)
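Putting the three pieces together, a runnable sketch might look like the following. The objective is the question's formula rewritten as a quadratic form (COEF and objective are names chosen here, not from the original code), the starting point is an assumed feasible guess, and SLSQP is requested explicitly because it handles bounds together with an equality constraint.

```python
import numpy as np
from scipy.optimize import minimize

# Symmetric coefficient matrix: x @ COEF @ x expands to the question's
# expression (squared terms on the diagonal, cross terms counted twice).
COEF = np.array([[0.002, 0.015, 0.010],
                 [0.015, 0.0035, 0.020],
                 [0.010, 0.020, 0.0015]])

def objective(x):
    # x is the 1-D array of variables that minimize passes in.
    return np.sqrt(x @ COEF @ x)

x0 = np.full(3, 1.0 / 3.0)            # assumed initial guess, sums to 1
bounds = [(0, None)] * len(x0)        # each variable >= 0, no upper bound
cons = {'type': 'eq', 'fun': lambda x: np.sum(x) - 1}  # variables sum to 1

res = minimize(objective, x0, method='SLSQP',
               bounds=bounds, constraints=cons)
print(res.x, res.fun)
```

Because the cross terms dominate, this objective is not convex, so the result may be a local minimum that depends on x0; trying a few starting points is a cheap sanity check.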

Matrices with different row lengths in numpy

Is there a way of defining a matrix (say m) in numpy with rows of different lengths, but such that m stays 2-dimensional (i.e. m.ndim = 2)?
For example, if you define m = numpy.array([[1,2,3], [4,5]]), then m.ndim = 1. I understand why this happens, but I'm interested if there is any way to trick numpy into viewing m as 2D. One idea would be padding with a dummy value so that rows become equally sized, but I have lots of such matrices and it would take up too much space. The reason why I really need m to be 2D is that I am working with Theano, and the tensor which will be given the value of m expects a 2D value.
Here is some very new information about Theano. We have a new TypedList() type that allows a Python list whose elements all have the same type, such as 1d ndarrays. Everything is done except the documentation.
There is only limited functionality available for them, but we added it to allow looping over a typed list with scan. It is not yet integrated with scan, but you can use it now like this:
import theano
import theano.typed_list
a = theano.typed_list.TypedListType(theano.tensor.fvector)()
s, _ = theano.scan(fn=lambda i, tl: tl[i].sum(),
                   non_sequences=[a],
                   sequences=[theano.tensor.arange(2, dtype='int64')])
f = theano.function([a], s)
f([[1, 2, 3], [4, 5]])
One limitation is that the output of scan must be an ndarray, not a typed list.
No, this is not possible. NumPy arrays need to be rectangular in every pair of dimensions. This is due to the way they map onto memory buffers, as a pointer, itemsize, stride triple.
As for this taking up space: np.array([[1,2,3], [4,5]]) actually takes up more space than a 2×3 array, because it's an array of two pointers to Python lists (and even if the elements were converted to arrays, the memory layout would still be inefficient).
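To make the two options concrete, here is a small sketch contrasting a ragged object array with the padding workaround mentioned in the question. The mask array is a hypothetical helper added here to remember which entries are real; dtype=object is passed explicitly because recent NumPy versions refuse to build a ragged array without it.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5]]

# Ragged data can only be stored as a 1-D array of Python objects.
ragged = np.array(rows, dtype=object)

# Padding every row to the longest length gives a true 2-D array.
max_len = max(len(r) for r in rows)
padded = np.zeros((len(rows), max_len), dtype=np.int64)
mask = np.zeros((len(rows), max_len), dtype=bool)
for i, r in enumerate(rows):
    padded[i, :len(r)] = r     # copy the real values
    mask[i, :len(r)] = True    # mark which positions hold real values

print(ragged.ndim, padded.ndim)
```

The padded form costs extra memory when row lengths vary a lot, which is the trade-off the question is worried about.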

Numpy sum over planes of 3d array, return a scalar

I'm making the transition from MATLAB to Numpy and feeling some growing pains.
I have a 3D array, let's say it's 3x3x3, and I want the scalar sum of each plane.
In MATLAB, I would use:
sum_vec = sum(array3d,3);
TIA
wbg
EDIT: I was wrong about my MATLAB code. MATLAB only vectorizes in one dim, so a loop would be required. So numpy turns out to be more elegant...cool.
MATLAB
for i = 1:3
    sum_vec(i) = sum(sum(array3d(:,:,i)));
end
You can do
sum_vec = np.array([plane.sum() for plane in cube])
or simply
sum_vec = cube.sum(-1).sum(-1)
where cube is your 3d array. You can specify 0 or 1 instead of -1 (or 2) depending on the orientation of the planes. The latter version is also better because it doesn't use a Python loop, which usually helps to improve performance when using numpy.
You should use the axis keyword in np.sum. Like in many other numpy functions, axis lets you perform the operation along a specific axis. For example, if you want to sum along the last dimension of the array, you would do:
import numpy as np
sum_vec = np.sum(array3d, axis=-1)
You'll get a 2D array whose entry [i, k] is the sum over the last dimension of the slice array3d[i, k, :].
UPDATE
I didn't understand exactly what you wanted. You want to sum over two dimensions (a plane). In this case you can do two sums. For example, summing over the first two dimensions:
sum_vec = np.sum(np.sum(array3d, axis=0), axis=0)
Instead of applying the same sum function twice, you may perform the sum on the reshaped array:
a = np.random.rand(10, 10, 10) # 3D array
b = a.view()
b.shape = (a.shape[0], -1)
c = np.sum(b, axis=1)
The above should be faster because you only sum once.
sumvec = np.sum(array3d, axis=2)
or this works as well
sumvec = array3d.sum(2)
Remember that Python indexing starts at 0, so axis=2 represents the 3rd dimension.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.sum.html
If you're trying to sum over a plane (and avoid loops, which is usually a good idea with NumPy), you can use np.sum and pass two axes as a tuple for the axis argument.
For example, if you have an array of shape (n, 3, 3), then
np.sum(a, (1,2))
will give an array of shape (n,), summing over a plane rather than a single axis. Pass keepdims=True if you want the result to have shape (n, 1, 1).
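The variants suggested in these answers can be compared side by side on a small 3x3x3 array. In this sketch cube is example data, and the last line sums over the first two axes, which is the orientation the MATLAB loop in the question uses (one sum per cube[:, :, i]).

```python
import numpy as np

cube = np.arange(27).reshape(3, 3, 3)

# One sum per plane cube[i, :, :], computed four equivalent ways.
by_loop = np.array([plane.sum() for plane in cube])
by_two_sums = cube.sum(-1).sum(-1)
by_axis_tuple = cube.sum(axis=(1, 2))
by_reshape = cube.reshape(cube.shape[0], -1).sum(axis=1)

# Same reduction, keeping the summed axes as length-1 dimensions.
kept = cube.sum(axis=(1, 2), keepdims=True)

# One sum per plane cube[:, :, i], matching the MATLAB loop's orientation.
matlab_like = cube.sum(axis=(0, 1))

print(by_axis_tuple)  # [ 36 117 198]
print(matlab_like)    # [108 117 126]
```

The axis tuple form states the intent most directly and avoids the intermediate array created by chaining two sums.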