I am trying to array index a 4 dimensional numpy array. - numpy

i have a 4 dimensional array -- say a=numpy.array(40,40,4,1000)
I also have an index array -- say b = np.arrange(35)
I am looking to make an array doing something like c = a[b,b,3,999] where the resulting array would look something like d = numpy.array(35,35). Would appreciate any thoughts on what the right way to do this is. Thank you. Neela.

Since b=np.arange(35) is just the first 35 indices, use slices instead:
c = a[:35,:35,3,999]
If the values in b are not contiguous, then you will need to adjust its shape
c = a[b[:,None], b[None,:], 3, 999]
e.g.
In [754]: a=np.arange(3*4*5).reshape(3,4,5)
In [755]: b=np.array([2,0,1])
In [756]: a[b[:,None],b[None,:],3]
Out[756]:
array([[53, 43, 48],
[13, 3, 8],
[33, 23, 28]])
b[:,None] is a (3,1) array, b[None,:] a (1,3), together they broadcast to (3,3) arrays.
You may need to read up on broadcasting and advanced indexing.
More explicitly this indexing is:
a[[[2],[0],[1]], [[2,0,1]], 3]
np.ix_ is a handy tool for generating indexes like this:
In [795]: I,J = np.ix_(b,b)
In [796]: I
Out[796]:
array([[2],
[0],
[1]])
In [797]: J
Out[797]: array([[2, 0, 1]])
In [798]: a[I,J,3]
Out[798]:
array([[53, 43, 48],
[13, 3, 8],
[33, 23, 28]])

Related

Numpy subarrays and relative indexing

I have been searching if there is an standard mehtod to create a subarray using relative indexes. Take the following array into consideration:
>>> m = np.arange(25).reshape([5, 5])
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
I want to access the 3x3 matrix at a specific array position, for example [2,2]:
>>> x = 2, y = 2
>>> m[slice(x-1,x+2), slice(y-1,y+2)]
array([[ 6, 7, 8],
[11, 12, 13],
[16, 17, 18]])
For example for the above somethig like m.subarray(pos=[2,2], shape=[3,3])
I want to sample a ndarray of n dimensions on a specific position which might change.
I did not want to use a loop as it might be inneficient. Scipy functions correlate and convolve do this very efficiently, but for all positions. I am interested only in the sampling of one.
The best answer could solve the issues at edges, in my case I would like for example to have wrap mode:
(a b c d | a b c d | a b c d)
--------------------EDITED-----------------------------
Based on the answer from #Carlos Horn, I could create the following function.
def cell_neighbours(array, index, shape):
pads = [(floor(dim/2), ceil(dim / 2)) for dim in shape]
array = np.pad(self.configuration, pads, "wrap")
views = np.lib.stride_tricks.sliding_window_view
return views(array, shape)[tuple(index)]
Last concern might be about speed, from docs: For many applications using a sliding window view can be convenient, but potentially very slow. Often specialized solutions exist.
From here maybe is easier to get a faster solution.
You could build a view of 3x3 matrices into the array as follows:
import numpy as np
m = np.arange(25).reshape(5,5)
m3x3view = np.lib.stride_tricks.sliding_window_view(m, (3,3))
Note that it will change slightly your indexing on half the window size meaning
x_view = x - 3//2
y_view = y - 3//2
print(m3x3view[x_view,y_view]) # gives your result
In case a copy operation is fine, you could use:
mpad = np.pad(m, 1, mode="wrap")
mpad3x3view = np.lib.stride_tricks.sliding_window_view(mpad, (3,3))
print(mpad3x3view[x % 5,y % 5])
to use arbitrary x, y integer values.

Explicit slicing across a particular dimension

I've got a 3D tensor x (e.g 4x4x100). I want to obtain a subset of this by explicitly choosing elements across the last dimension. This would have been easy if I was choosing the same elements across last dimension (e.g. x[:,:,30:50] but I want to target different elements across that dimension using the 2D tensor indices which specifies the idx across third dimension. Is there an easy way to do this in numpy?
A simpler 2D example:
x = [[1,2,3,4,5,6],[10,20,30,40,50,60]]
indices = [1,3]
Let's say I want to grab two elements across third dimension of x starting from points specified by indices. So my desired output is:
[[2,3],[40,50]]
Update: I think I could use a combination of take() and ravel_multi_index() but some of the platforms that are inspired by numpy (like PyTorch) don't seem to have ravel_multi_index so I'm looking for alternative solutions
Iterating over the idx, and collecting the slices is not a bad option if the number of 'rows' isn't too large (and the size of the sizes is relatively big).
In [55]: x = np.array([[1,2,3,4,5,6],[10,20,30,40,50,60]])
In [56]: idx = [1,3]
In [57]: np.array([x[j,i:i+2] for j,i in enumerate(idx)])
Out[57]:
array([[ 2, 3],
[40, 50]])
Joining the slices like this only works if they all are the same size.
An alternative is to collect the indices into an array, and do one indexing.
For example with a similar iteration:
idxs = np.array([np.arange(i,i+2) for i in idx])
But broadcasted addition may be better:
In [58]: idxs = np.array(idx)[:,None]+np.arange(2)
In [59]: idxs
Out[59]:
array([[1, 2],
[3, 4]])
In [60]: x[np.arange(2)[:,None], idxs]
Out[60]:
array([[ 2, 3],
[40, 50]])
ravel_multi_index is not hard to replicate (if you don't need clipping etc):
In [65]: np.ravel_multi_index((np.arange(2)[:,None],idxs),x.shape)
Out[65]:
array([[ 1, 2],
[ 9, 10]])
In [66]: x.flat[_]
Out[66]:
array([[ 2, 3],
[40, 50]])
In [67]: np.arange(2)[:,None]*x.shape[1]+idxs
Out[67]:
array([[ 1, 2],
[ 9, 10]])
along the 3D axis:
x = [x[:,i].narrow(2,index,2) for i,index in enumerate(indices)]
x = torch.stack(x,dim=1)
by enumerating you get the index of the axis and index from where you want to start slicing in one.
narrow gives you a zero-copy length long slice from a starting index start along a certain axis
you said you wanted:
dim = 2
start = index
length = 2
then you simply have to stack these tensors back to a single 3D.
This is the least work intensive thing i can think of for pytorch.
EDIT
if you just want different indices along different axis and indices is a 2D tensor you can do:
x = [x[:,i,index] for i,index in enumerate(indices)]
x = torch.stack(x,dim=1)
You really should have given a proper working example, making it unnecessarily confusing.
Here is how to do it in numpy, now clue about torch, though.
The following picks a slice of length n along the third dimension starting from points idx depending on the other two dimensions:
# example
a = np.arange(60).reshape(2, 3, 10)
idx = [(1,2,3),(4,3,2)]
n = 4
# build auxiliary 4D array where the last two dimensions represent
# a sliding n-window of the original last dimension
j,k,l = a.shape
s,t,u = a.strides
aux = np.lib.stride_tricks.as_strided(a, (j,k,l-n+1,n), (s,t,u,u))
# pick desired offsets from sliding windows
aux[(*np.ogrid[:j, :k], idx)]
# array([[[ 1, 2, 3, 4],
# [12, 13, 14, 15],
# [23, 24, 25, 26]],
# [[34, 35, 36, 37],
# [43, 44, 45, 46],
# [52, 53, 54, 55]]])
I came up with below using broadcasting:
x = np.array([[1,2,3,4,5,6,7,8,9,10],[10,20,30,40,50,60,70,80,90,100]])
i = np.array([1,5])
N = 2 # number of elements I want to extract along each dimension. Starting points specified in i
r = np.arange(x.shape[-1])
r = np.broadcast_to(r, x.shape)
ii = i[:, np.newaxis]
ii = np.broadcast_to(ii, x.shape)
mask = np.logical_and(r-ii>=0, r-ii<=N)
output = x[mask].reshape(2,3)
Does this look alright?

numpy find values of maxima pointed to by argmax [duplicate]

This question already has answers here:
Index n dimensional array with (n-1) d array
(3 answers)
Closed 4 years ago.
I have a 3-d array. I find the indexes of the maxima along an axis using argmax. How do I now use these indexes to obtain the maximal values?
2nd part: How to do this for arrays of N-d?
Eg:
u = np.arange(12).reshape(3,4,1)
In [125]: e = u.argmax(axis=2)
Out[130]: e
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
It would be nice if u[e] produced the expected results, but it doesn't work.
The return value of argmax along an axis can't be simply used as an index. It only works in a 1d case.
In [124]: u = np.arange(12).reshape(3,4,1)
In [125]: e = u.argmax(axis=2)
In [126]: u.shape
Out[126]: (3, 4, 1)
In [127]: e.shape
Out[127]: (3, 4)
e is (3,4), but its values only index the last dimension of u.
In [128]: u[e].shape
Out[128]: (3, 4, 4, 1)
Instead we have to construct indices for the other 2 dimensions, ones which broadcast with e. For example:
In [129]: I,J=np.ix_(range(3),range(4))
In [130]: I
Out[130]:
array([[0],
[1],
[2]])
In [131]: J
Out[131]: array([[0, 1, 2, 3]])
Those are (3,1) and (1,4). Those are compatible with (3,4) e and the desired output
In [132]: u[I,J,e]
Out[132]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
This kind of question has been asked before, so probably should be marked as a duplicate. The fact that your last dimension is size 1, and hence e is all 0s, distracting readers from the underlying issue (using a multidimensional argmax as index).
numpy: how to get a max from an argmax result
Get indices of numpy.argmax elements over an axis
Assuming you've taken the argmax on the last dimension
In [156]: ij = np.indices(u.shape[:-1])
In [157]: u[(*ij,e)]
Out[157]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
or:
ij = np.ix_(*[range(i) for i in u.shape[:-1]])
If the axis is in the middle, it'll take a bit more tuple fiddling to arrange the ij elements and e.
so for general N-d array
dims = np.ix_(*[range(x) for x in u.shape[:-1]])
u.__getitem__((*dims,e))
You can't write u[*dims,e], that's a syntax error, so I think you must use getitem directly.

Numpy Indexing Behavior

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In this example that I am working with, let's say that I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 100x1 1D array of values between 0-9 (indices for A). In MATLAB, I would use A(sub2ind(size(A), 1:size(A,1)', B) to return for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, but it isn't here. How is : different from using range(sizeArray) which just creates an array from [0:sizeArray] inclusive, to use an indices?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3, 2, 1],
[ 7, 6, 5],
[11, 10, 9]])
If instead I use a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3],X[1,2],X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I gave it a column vector to index rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3, 2, 1],
[ 7, 6, 5],
[11, 10, 9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]]),
array([[3, 2, 1],
[3, 2, 1],
[3, 2, 1]])]
numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

Numpy rebinning a 2D array

I am looking for a fast formulation to do a numerical binning of a 2D numpy array. By binning I mean calculate submatrix averages or cumulative values. For ex. x = numpy.arange(16).reshape(4, 4) would have been splitted in 4 submatrix of 2x2 each and gives numpy.array([[2.5,4.5],[10.5,12.5]]) where 2.5=numpy.average([0,1,4,5]) etc...
How to perform such an operation in an efficient way... I don't have really any ideay how to perform this ...
Many thanks...
You can use a higher dimensional view of your array and take the average along the extra dimensions:
In [12]: a = np.arange(36).reshape(6, 6)
In [13]: a
Out[13]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
In [14]: a_view = a.reshape(3, 2, 3, 2)
In [15]: a_view.mean(axis=3).mean(axis=1)
Out[15]:
array([[ 3.5, 5.5, 7.5],
[ 15.5, 17.5, 19.5],
[ 27.5, 29.5, 31.5]])
In general, if you want bins of shape (a, b) for an array of (rows, cols), your reshaping of it should be .reshape(rows // a, a, cols // b, b). Note also that the order of the .mean is important, e.g. a_view.mean(axis=1).mean(axis=3) will raise an error, because a_view.mean(axis=1) only has three dimensions, although a_view.mean(axis=1).mean(axis=2) will work fine, but it makes it harder to understand what is going on.
As is, the above code only works if you can fit an integer number of bins inside your array, i.e. if a divides rows and b divides cols. There are ways to deal with other cases, but you will have to define the behavior you want then.
See the SciPy Cookbook on rebinning, which provides this snippet:
def rebin(a, *args):
'''rebin ndarray data into a smaller ndarray of the same rank whose dimensions
are factors of the original dimensions. eg. An array with 6 columns and 4 rows
can be reduced to have 6,3,2 or 1 columns and 4,2 or 1 rows.
example usages:
>>> a=rand(6,4); b=rebin(a,3,2)
>>> a=rand(6); b=rebin(a,2)
'''
shape = a.shape
lenShape = len(shape)
factor = asarray(shape)/asarray(args)
evList = ['a.reshape('] + \
['args[%d],factor[%d],'%(i,i) for i in range(lenShape)] + \
[')'] + ['.sum(%d)'%(i+1) for i in range(lenShape)] + \
['/factor[%d]'%i for i in range(lenShape)]
print ''.join(evList)
return eval(''.join(evList))
I assume that you only want to know how to generally build a function that performs well and does something with arrays, just like numpy.reshape in your example. So if performance really matters and you're already using numpy, you can write your own C code for that, like numpy does. For example, the implementation of arange is completely in C. Almost everything with numpy which matters in terms of performance is implemented in C.
However, before doing so you should try to implement the code in python and see if the performance is good enough. Try do make the python code as efficient as possible. If it still doesn't suit your performance needs, go the C way.
You may read about that in the docs.