Indexing a 4D array using another array of 3D indices - numpy

I have a 4D array M (a x b x c x d) and an array I of indices (3 x f), e.g.
I = np.array([[1,2,3, ...], [2,1,3, ...], [4,1,6, ...]])
I would like to use I to arrive at a matrix X that has f rows and d columns, where:
X[0,:] = M[1,2,4,:]
X[1,:] = M[2,1,1,:]
X[2,:] = M[3,3,6,:]
...
I know I can use M[I[0], I[1], I[2]]; however, is there a more concise solution?
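A more concise spelling of the same indexing is M[tuple(I)]: converting the (3, f) index array to a tuple hands NumPy one index array per leading axis, which is exactly M[I[0], I[1], I[2]]. The sketch below assumes made-up sizes for a, b, c, d and f (the question leaves them open) and random data.

```python
import numpy as np

# Hypothetical sizes; the question leaves a, b, c, d and f unspecified
a, b, c, d, f = 5, 4, 7, 3, 6
rng = np.random.default_rng(0)
M = rng.random((a, b, c, d))

# I has shape (3, f): row k holds the indices into axis k of M
I = np.stack([rng.integers(0, n, size=f) for n in (a, b, c)])

# tuple(I) unpacks the three rows into separate index arrays,
# so this is the same as M[I[0], I[1], I[2]]
X = M[tuple(I)]
print(X.shape)  # (6, 3), i.e. (f, d)
```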

You can use, for example:
import numpy as np
I = np.array([[1,2,3], [2,1,3], [4,1,6]])
M = np.random.rand(10,10,10,10)
# each column of I is one (i, j, k) triple, so iterate over I.T
X = np.array([M[tuple(t)] for t in I.T])

This would be one way to do it -
import numpy as np
# Get row indices for use when M is reshaped to a 2D array of d-columns format
row_idx = np.sum(I*np.append(1,np.cumprod(M.shape[1:-1][::-1]))[::-1][:,None],0)
# Reshape M to d-columns 2D array and use row_idx to get final output
out = M.reshape(-1,M.shape[-1])[row_idx]
As an alternative way to find row_idx, if you would like to avoid np.append, you can do -
row_idx = np.sum(I[:-1]*np.cumprod(M.shape[1:-1][::-1])[::-1][:,None],0) + I[-1]
Or, a slightly less scary way to get row_idx -
_,p2,p3,_ = M.shape
row_idx = np.sum(I*np.array([p3*p2,p3,1])[:,None],0)
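The row_idx arithmetic above is the row-major flattening of (i, j, k) into i*b*c + j*c + k. NumPy also ships this computation as np.ravel_multi_index, so a sketch comparing the two (with hypothetical shapes and random data) might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((5, 4, 7, 3))  # hypothetical (a, b, c, d)
I = np.stack([rng.integers(0, n, size=6) for n in M.shape[:-1]])  # shape (3, f)

# Manual row index, as in the answer: i*b*c + j*c + k
_, p2, p3, _ = M.shape
row_idx = np.sum(I * np.array([p3 * p2, p3, 1])[:, None], 0)

# Built-in equivalent for C-ordered (row-major) flattening of the first three axes
row_idx_builtin = np.ravel_multi_index(tuple(I), M.shape[:-1])

# Reshape to d columns and pick the rows
out = M.reshape(-1, M.shape[-1])[row_idx_builtin]
```

Both versions rely on M.reshape using C order, which is NumPy's default.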

Related

Numpy: multivariate indexing?

I wonder: is it possible to index several dimensions at once, with some broadcasting? Example:
Suppose I have an array A with shape (n,d), and an indexing array I with integer values between 0 and d-1. Set B = A[:,I].
If shape(I) == (k,), for whatever k, then B has shape (n,k) and B[x,y] = A[x,I[y]].
But if shape(I) == (k,p) for whatever (k,p), then I want B to have shape (n,k,p) with B[x,y,z] = A[x,I[y,z]].
1. How can I get this behavior?
2. Does it have a drawback I did not see?
You can do it exactly as you described it:
import numpy as np
n = 100
d = 20
k = 10
p = 17
A = np.random.random((n, d))
I = np.random.randint(low=0, high=d, size=(k, p))
B = A[:, I]
print(B.shape) # (n, k, p)
# Testing if the new array B is constructed as expected
x = 3
y = 5
z = 7
print(B[x, y, z])
print(A[x, I[y, z]])
print(B[x, y, z] == A[x, I[y, z]])
It's hard to say whether this is a good implementation without more context, but in general it is a good idea to use NumPy vectorization if you have speed in mind.
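The rule behind this is that an integer index array on one axis contributes its own shape to the result, and np.take(A, I, axis=1) performs the same selection. One point relevant to question 2: integer-array (advanced) indexing always returns a copy, so B holds n*k*p new elements rather than being a view of A. A sketch checking every entry, with random data:

```python
import numpy as np

n, d, k, p = 100, 20, 10, 17
rng = np.random.default_rng(2)
A = rng.random((n, d))
I = rng.integers(0, d, size=(k, p))

# Integer-array indexing on axis 1: the result takes I's shape there
B = A[:, I]                     # shape (n, k, p)

# np.take along axis 1 selects the same elements
B_take = np.take(A, I, axis=1)  # shape (n, k, p)

# Check B[x, y, z] == A[x, I[y, z]] for every (x, y, z) at once
x, y, z = np.indices(B.shape)
ok = np.array_equal(B, A[x, I[y, z]])
```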

In numpy, what is the efficient way to find the maximum values and their indices of a 3D ndarray across two axes?

How to find the correlation-peak values and coordinates of a set of 2D cross-correlation functions?
Given a 3D ndarray that contains a set of 2D cross-correlation functions, what is the efficient way to find the maximum (peak) values and their coordinates (x and y indices)?
The code below does the work, but I think it is inefficient.
import numpy as np
import numpy.matlib
ccorr = np.random.rand(7,5,5)
xind = ccorr.argmax(axis=-1)
mccorr = ccorr[np.matlib.repmat(np.arange(0,7)[:,np.newaxis],1,5),np.matlib.repmat(np.arange(0,5)[np.newaxis,:],7,1), xind]
yind = mccorr.argmax(axis=-1)
xind = xind[np.arange(0,7),yind]
values = mccorr[np.arange(0,7),yind]
print("cross-correlation functions (z,y,x)")
print(ccorr)
print("x and y indices of the maximum values")
print(xind,yind)
print("Maximum values")
print(values)
You'll want to flatten the dimensions you're searching over and then use unravel_index and take_along_axis to get the coordinates and values, respectively.
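import numpy as np
ccorr = np.random.rand(7,5,5)
# flatten each 5x5 slice into one row of 25 values
cc_rav = ccorr.reshape(ccorr.shape[0], -1)
idx = np.argmax(cc_rav, axis = -1)
# (y, x) coordinates of each peak within its slice
indices_2d = np.unravel_index(idx, ccorr.shape[1:])
# take_along_axis needs an index array with the same ndim as cc_rav
vals = np.take_along_axis(cc_rav, idx[:, None], axis = -1)[:, 0]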
If you're using a NumPy version older than 1.15 (which lacks take_along_axis):
vals = cc_rav[np.arange(ccorr.shape[0]), idx]
or:
vals = ccorr[np.arange(ccorr.shape[0]), indices_2d[0], indices_2d[1]]
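A quick way to sanity-check the vectorized approach is to compare it with a per-slice loop. Note that unravel_index returns the row (y) index first and the column (x) index second. The data below is random and only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
ccorr = rng.random((7, 5, 5))

# Vectorized: flatten each slice, take argmax, convert back to (y, x)
cc_rav = ccorr.reshape(ccorr.shape[0], -1)
idx = cc_rav.argmax(axis=-1)
yind, xind = np.unravel_index(idx, ccorr.shape[1:])
vals = cc_rav[np.arange(ccorr.shape[0]), idx]

# Loop reference: peak of each 2D slice, found one slice at a time
ref = [np.unravel_index(s.argmax(), s.shape) for s in ccorr]
```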

How to optimize the linear coefficients for numpy arrays in a maximization function?

I have to optimize the coefficients for three numpy arrays so that they maximize my evaluation function.
I have a target array called train['target'] and three prediction arrays named array1, array2 and array3.
I want to find the best linear coefficients, i.e., x, y, z for these three arrays, that maximize the function
roc_auc_score(train['target'], x*array1 + y*array2 + z*array3)
The above function is maximized when the prediction is closer to the target,
i.e., x*array1 + y*array2 + z*array3 should be close to train['target'].
The weights are bounded: 0 <= x, y, z <= 1.
Basically, I am trying to find the weights x, y, z for the three arrays that make
x*array1 + y*array2 + z*array3 closer to train['target'].
Any help in getting this would be appreciated.
I used pulp.LpProblem('Giapetto', pulp.LpMaximize) to do the maximization. It works for plain numbers, integers, etc., but fails when I try it with arrays.
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
prob += score
coef = x+y+z
prob += (coef==1)
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y, z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
I get an error at the line
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
TypeError: unsupported operand type(s) for /: 'int' and 'LpVariable'
I can't progress beyond this line when using arrays, and I'm not sure my approach is correct. Any help in optimizing the function would be appreciated.
When you add sums of array elements to a PuLP model, you have to use built-in PuLP constructs like lpSum to do it -- you can't just add arrays together (as you discovered).
So your score definition should look something like this:
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
A few notes about this:
[+] You didn't provide the definition of roc_auc_score so I just pretended that it equals the sum of the element-wise difference between the target array and the weighted sum of the other 3 arrays.
[+] I suspect your actual calculation for roc_auc_score is nonlinear; more on this below.
[+] arr_ind is a list of the indices of the arrays, which I created like this:
# build array index
arr_ind = range(len(array1))
[+] You also didn't include the arrays, so I created them like this:
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
train = {}
train['target'] = np.ones((10, 1))
Here is my complete code, which compiles and executes, though I'm sure it doesn't give you the result you are hoping for, since I just guessed about target and roc_auc_score:
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# dummy arrays since arrays weren't in OP code
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
# build array index
arr_ind = range(len(array1))
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
# dummy roc_auc_score since roc_auc_score wasn't in OP code
train = {}
train['target'] = np.ones((10, 1))
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
prob += score
coef = x + y + z
prob += coef == 1
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y, z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Output:
Optimal weekly number of x to produce: 0
Optimal weekly number of y to produce: 0
Optimal weekly number of z to produce: 1
Now, if your roc_auc_score function is nonlinear, you will have additional troubles. I would encourage you to try to formulate the score in a way that is linear, possibly using additional variables (for example, if you want the score to be an absolute value).
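If the real objective is ROC AUC, it depends only on the ranking of the blended scores, so it is piecewise constant in the weights and cannot be written as an LP objective. With just three weights and the answer's constraint x + y + z = 1, one simple alternative (a swapped-in technique, not a PuLP feature) is a brute-force grid search over that simplex. The sketch assumes binary 0/1 targets with both classes present; auc_score and grid_search_weights are hypothetical helpers, and auc_score uses the pairwise Mann-Whitney formulation, which needs memory proportional to n_pos * n_neg and so suits small data only. If scikit-learn is available, sklearn.metrics.roc_auc_score could replace auc_score.

```python
import numpy as np

def auc_score(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive gets the higher score, counting ties as 1/2 (Mann-Whitney)."""
    y_true = np.asarray(y_true).ravel().astype(bool)
    y_score = np.asarray(y_score, dtype=float).ravel()
    # Pairwise score differences: positives along rows, negatives along columns
    diff = y_score[y_true][:, None] - y_score[~y_true][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def grid_search_weights(target, arrays, steps=20):
    """Try every (x, y, z) on a grid with x + y + z == 1 and all weights >= 0;
    return the best AUC and the first weights that reach it."""
    best_score, best_w = -np.inf, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = np.array([i, j, steps - i - j]) / steps
            blend = w[0] * arrays[0] + w[1] * arrays[1] + w[2] * arrays[2]
            score = auc_score(target, blend)
            if score > best_score:
                best_score, best_w = score, w
    return best_score, best_w

# Toy usage with made-up data
target = np.array([0, 0, 1, 1])
array1 = np.array([0.1, 0.2, 0.8, 0.9])  # separates the classes perfectly
array2 = np.array([0.9, 0.8, 0.2, 0.1])  # perfectly inverted
array3 = np.array([0.5, 0.5, 0.5, 0.5])  # uninformative
best_score, best_w = grid_search_weights(target, [array1, array2, array3])
```

Because AUC is flat over large regions of weight space, many weight vectors can tie for the best score; this sketch simply keeps the first one it finds.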

Numpy masking 3d array

I'm not sure how to achieve the following (preferably without a loop).
I have a numpy array A having dimensions 100*100*3.
I also have a numpy array M having the same dimensions (100*100*3). M is actually a mask, and M[i,j] is [0,0,0] for most pairs (i,j) but for some pairs (i,j) it is not equal to [0,0,0].
What I would like to do is the following:
A[i,j] = M[i,j] when M[i,j] != [0,0,0]
A[M != [0,0,0]] = M[M != [0,0,0]] doesn't seem to work.
How can this be done efficiently with numpy?
You need to look for an ALL match along the last axis and use that mask for boolean indexing/masking -
mask = ~(M==0).all(-1) # or (M!=0).any(-1)
A[mask] = M[mask]
Or use np.where -
mask = ~(M==0).all(-1,keepdims=1)
Aout = np.where(mask, M, A)
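The difference between the element-wise comparison in the question and the per-pixel mask in the answer shows up for a pixel where M has some, but not all, channels equal to zero. A small sketch with made-up 2 x 2 x 3 arrays:

```python
import numpy as np

A = np.full((2, 2, 3), 9)
M = np.zeros((2, 2, 3), dtype=int)
M[0, 1] = [5, 0, 7]  # partially zero pixel: all 3 channels should be replaced
M[1, 0] = [1, 2, 3]

# Element-wise mask from the question: only the non-zero channels are copied
A_elementwise = A.copy()
A_elementwise[M != 0] = M[M != 0]

# Per-pixel mask from the answer: True where any channel of M is non-zero
mask = (M != 0).any(-1)  # shape (2, 2)
A_pixel = A.copy()
A_pixel[mask] = M[mask]

# np.where variant: keepdims gives shape (2, 2, 1), which broadcasts over channels
A_where = np.where((M != 0).any(-1, keepdims=True), M, A)
```

Here A_elementwise[0, 1] keeps the old 9 in its middle channel, while A_pixel[0, 1] becomes [5, 0, 7].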

Tensorflow indexing into 2d tensor with 1d tensor

I have a 2D tensor A with shape [batch_size, D], and a 1D tensor B with shape [batch_size]. Each element of B is a column index into the corresponding row of A, e.g. B[i] in [0, D).
What is the best way in TensorFlow to get the values A[B], i.e. A[i, B[i]] for each row i?
For example:
A = tf.constant([[0,1,2],
                 [3,4,5]])
B = tf.constant([2,1])
with desired output:
some_slice_func(A, B) -> [2,4]
There is another constraint. In practice, batch_size is actually None.
Thanks in advance!
I was able to get it working using a linear index:
import tensorflow as tf

def vector_slice(A, B):
    """ Returns values of rows i of A at column B[i]
    where A is a 2D Tensor with shape [None, D]
    and B is a 1D Tensor with shape [None]
    with type int32 elements in [0,D)
    Example:
      A =[[1,2], B = [0,1], vector_slice(A,B) -> [1,4]
          [3,4]]
    """
    # Flat offset of the start of each row: i * D
    linear_index = (tf.shape(A)[1]
                    * tf.range(0, tf.shape(A)[0]))
    linear_A = tf.reshape(A, [-1])
    # Row i, column B[i] sits at flat position i * D + B[i]
    return tf.gather(linear_A, B + linear_index)
This feels slightly hacky though.
If anyone knows a better way (as in clearer or faster), please also leave an answer! (I won't accept my own for a while.)
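The linear-index trick is ordinary row-major arithmetic (row i, column B[i] lives at flat position i * D + B[i]), so it is easy to inspect in NumPy using the A and B from the question. NumPy stands in for TensorFlow here only to make the arithmetic visible; np.ravel_multi_index computes the same flat positions.

```python
import numpy as np

A = np.array([[0, 1, 2],
              [3, 4, 5]])
B = np.array([2, 1])

# Same arithmetic as linear_index in the TensorFlow answer: offsets 0, D, 2D, ...
D = A.shape[1]
linear_index = D * np.arange(A.shape[0])
flat = A.reshape(-1)[B + linear_index]

# np.ravel_multi_index turns (row, column) pairs into the same flat positions
flat2 = A.reshape(-1)[np.ravel_multi_index((np.arange(A.shape[0]), B), A.shape)]
```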
Code for what @Eugene Brevdo suggested:
def vector_slice(A, B):
    """ Returns values of rows i of A at column B[i]
    where A is a 2D Tensor with shape [None, D]
    and B is a 1D Tensor with shape [None]
    with type int32 elements in [0,D)
    Example:
      A =[[1,2], B = [0,1], vector_slice(A,B) -> [1,4]
          [3,4]]
    """
    B = tf.expand_dims(B, 1)
    # Row numbers 0..batch_size-1 as a column (named rows to avoid shadowing range)
    rows = tf.expand_dims(tf.range(tf.shape(B)[0]), 1)
    # batch_size x 2 matrix of (row, column) pairs
    ind = tf.concat([rows, B], 1)
    return tf.gather_nd(A, ind)
The least hacky way is probably to build a proper 2D index by concatenating range(batch_size) and B to get a batch_size x 2 matrix, then pass this to tf.gather_nd.
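The batch_size x 2 index described here is just a list of (row, column) pairs. A NumPy sketch with the question's A and B shows what that matrix looks like; here, indexing with its two columns plays the role of tf.gather_nd, and np.take_along_axis reaches the same values without building the pairs explicitly.

```python
import numpy as np

A = np.array([[0, 1, 2],
              [3, 4, 5]])
B = np.array([2, 1])

# batch_size x 2 matrix whose rows are (row, column) pairs
ind = np.stack([np.arange(A.shape[0]), B], axis=1)

# Using the two columns as separate index arrays mirrors tf.gather_nd(A, ind)
vals = A[ind[:, 0], ind[:, 1]]

# Same selection along axis 1, with B reshaped to (batch_size, 1)
vals2 = np.take_along_axis(A, B[:, None], axis=1)[:, 0]
```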
The simplest approach is to do:
def tensor_slice(target_tensor, index_tensor):
    indices = tf.stack([tf.range(tf.shape(index_tensor)[0]), index_tensor], 1)
    return tf.gather_nd(target_tensor, indices)
Consider using tf.one_hot, tf.math.multiply and tf.reduce_sum to solve it.
e.g.
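def vector_slice(inputs, inds, axis=None):
    # Default: select along the axis that follows the batch dims of inds
    # (axis 1 for a 2D inputs and a 1D inds, as in the question)
    axis = axis if axis is not None else inds.shape.rank
    # One-hot mask along the selected axis: shape inds.shape + [depth]
    mask = tf.one_hot(inds, inputs.shape[axis])
    # Append singleton dims so the mask broadcasts over any trailing axes
    for _ in range(inputs.shape.rank - mask.shape.rank):
        mask = tf.expand_dims(mask, axis=-1)
    mask = tf.cast(mask, dtype=inputs.dtype)
    x = tf.multiply(inputs, mask)
    return tf.reduce_sum(x, axis=axis)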
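The one-hot approach builds a mask that is 1 only at column B[i] of row i, multiplies, and sums the rest away. The same idea in NumPy, with the question's A and B, makes the mask visible. Note that it does batch_size x D multiplications, whereas a gather reads only batch_size values.

```python
import numpy as np

A = np.array([[0, 1, 2],
              [3, 4, 5]])
B = np.array([2, 1])

# Row i of np.eye(D)[B] is the one-hot vector for column B[i]
onehot = np.eye(A.shape[1], dtype=A.dtype)[B]  # shape (batch_size, D)

# Multiplying zeroes every entry except A[i, B[i]]; summing over columns keeps it
vals = (A * onehot).sum(axis=1)
```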