Numpy: convert index in one dimension into many dimensions - numpy

Many array methods return a single index despite the fact that the array is multidimensional. For example:
a = rand(2,3)
z = a.argmax()
For two dimensions, it is easy to find the matrix indices of the maximum element:
a[z//3, z%3]
But for more dimensions, it can become annoying. Does Numpy/Scipy have a simple way of returning the indices in multiple dimensions given an index in one (collapsed) dimension? Thanks.

Got it!
a = X.argmax()
(i,j) = unravel_index(a, X.shape)
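The same call works unchanged for arrays with more than two dimensions. A minimal sketch, assuming NumPy is imported as np; the array name x and its shape are made up for illustration:

```python
import numpy as np

# Hypothetical 4-D array; the shape is arbitrary.
rng = np.random.default_rng(0)
x = rng.random((2, 3, 4, 5))

flat_index = x.argmax()                      # index into the flattened array
idx = np.unravel_index(flat_index, x.shape)  # tuple with one index per axis

# Indexing with the tuple recovers the maximum element.
assert x[idx] == x.max()

# np.ravel_multi_index is the inverse operation.
assert np.ravel_multi_index(idx, x.shape) == flat_index
```

Because idx is a tuple, x[idx] indexes all axes at once, so no unpacking into (i, j, k, ...) is needed.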

I don't know of a built-in function that does what you want, but where this
has come up for me, I realized that what I really wanted to do was this:
given two arrays a and b with the same shape, find the element of b that is in
the same position (same [i,j,k...] position) as the maximum element of a.
For this, the quick numpy-ish solution is:
j = a.flatten().argmax()
corresponding_b_element = b.flatten()[j]
Vince Marchetti
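Since flatten() always returns a copy, the same lookup can probably be done more cheaply with flat indexing. A small sketch, with two arrays invented for illustration:

```python
import numpy as np

a = np.array([[1, 9, 3], [4, 5, 6]])
b = np.array([[10, 20, 30], [40, 50, 60]])

j = a.argmax()                       # argmax already works on the flattened array
corresponding_b_element = b.flat[j]  # flat indexing, no copy of b is made

# Equivalent, going through the multi-dimensional index instead.
same_element = b[np.unravel_index(j, a.shape)]
```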

Related

How to select elements of an array from a specific axis in Python

I am working with multidimensional arrays whose number of axes changes dynamically. Now I want to select elements of the array along a specific axis.
For example, if I have a 3-dimensional array, I want to pick the elements like this
b = a[:, :, 1]
Now my problem is that after one iteration of code the same array becomes 4 dimensional. And again I want to pick the elements like this
b = a[:,:,1,:]
Thus I am looking for a general solution to pick all elements at index 1 of the 3rd axis of the array. This is very simple if I had to choose a[1], since that gives a[1,:,:,:], but I do not know how to do the same for other axes.
Edit:
Also, I would be interested in a solution where the axis of interest also changes. For example, with the same code, in the next iteration I would like to get
b = a[:,:,:,1]
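One possible way to do this, sketched under the assumption that a plain integer index on one axis is all that is needed: build the index tuple programmatically, or use np.take with its axis argument. The helper name select_along_axis is hypothetical, not a NumPy function.

```python
import numpy as np

def select_along_axis(arr, index, axis):
    """Apply `index` on `axis` and ':' on every other axis (hypothetical helper)."""
    indexer = [slice(None)] * arr.ndim  # slice(None) is what ':' means
    indexer[axis] = index
    return arr[tuple(indexer)]

a3 = np.arange(24).reshape(2, 3, 4)
a4 = np.arange(120).reshape(2, 3, 4, 5)

b3 = select_along_axis(a3, 1, axis=2)   # same as a3[:, :, 1]
b4 = select_along_axis(a4, 1, axis=2)   # same as a4[:, :, 1, :]
b4_last = np.take(a4, 1, axis=3)        # same as a4[:, :, :, 1]
```

The tuple-of-slices version returns a view, while np.take returns a copy, which may matter if the result is modified afterwards.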

Simple question about slicing a Numpy Tensor

I have a Numpy Tensor,
X = np.arange(64).reshape((4,4,4))
I wish to grab the 2,3,4 entries of the first dimension of this tensor, which you can do with,
Y = X[[1,2,3],:,:]
Is there a simpler way of writing this instead of explicitly writing out the indices [1,2,3]? I tried something like [1,:], which gave me an error.
Context: for my real application, the shape of the tensor is something like (30000,100,100). I would like to grab the last part of this tensor, from index 10000 to 30000 along the first dimension.
The simplest way in your case is to use X[1:4]. This selects the same elements as X[[1,2,3]], but notice that with X[1:4] you only need one pair of brackets because 1:4 already represents a range of values.
For an N-dimensional array in NumPy, if you specify indices for fewer than N dimensions you get all elements of the remaining dimensions. That is, for N equal to 3, X[1:4] is the same as X[1:4, :, :] or X[1:4, :]. You only need to pass : explicitly when you want to index some dimension while getting all elements of a dimension that comes before it, as in X[:, 2:4].
If you wish to select from some row to the end of array, simply use python slicing notation as below:
X[10000:,:,:]
This will select all rows from 10000 to the end of array and all columns and depths for them.
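A quick check of the equivalences described above, using the 4x4x4 array from the question. It also illustrates one difference worth knowing: a slice returns a view, while indexing with a list of integers returns a copy.

```python
import numpy as np

X = np.arange(64).reshape((4, 4, 4))

fancy = X[[1, 2, 3], :, :]   # integer-list ("fancy") indexing, returns a copy
sliced = X[1:4]              # slice, returns a view on X
to_end = X[1:]               # open-ended slice: from index 1 to the end

assert np.array_equal(fancy, sliced)
assert np.array_equal(sliced, to_end)
assert np.shares_memory(sliced, X) and not np.shares_memory(fancy, X)
```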

use numpy to make a collection dictionary for some statistical objects

I want to use numpy to build a dictionary that collects some statistical objects. A simplified version of the problem is as follows.
There is a 1D array of scalars, denoted
a = np.array([n1,n2,n3...])
and a 2D-array as
b = np.array([[q1_1,q1_2],[q2_1,q2_2],[q3_1,q3_2]...])
For each element ni in a, I want to pick out all the rows qi = [qi_1, qi_2] of b that contain ni, and build a dict that collects them under the key ni.
Here is a clumsy way of doing this (assuming a and b are given):
import numpy as np
a = np.array([i+1 for i in range(100)])
b = np.array([[2*i+1,2*(i+1)] for i in range(50)])
dict = {}
for i in a: dict[i] = [j for j in b if i in j]
No doubt this will be very slow when a and b are large.
Is there a more efficient way to do this?
Seeking your help!
Thanks for your idea. It completely solves my problem. Your core concept is to compare a and b and get a Boolean array as the result, so it is much faster to use this Boolean index on the array b to build the dictionary. Following this idea, I rewrote your code in my own way:
dict = {}
for item in a:
    # rows of b whose first or second column equals item
    index_left, index_right = (b[:,0]==item), (b[:,1]==item)
    index = np.logical_or(index_left, index_right)
    dict[item] = b[index]
This code is still not faster than yours, but it avoids the memory error even for large a and b (e.g. a of size 100000 and b of size 200000).
Numpy arrays allow elementwise comparison:
equal = b[:,:,np.newaxis]==a #np.newaxis to broadcast
# if one of the two is equal, we will include this element
index = np.logical_or(equal[:,0], equal[:,1])
# indexing by a boolean array to get the result
dictionary = {a[i]: b[index[:,i]] for i in range(len(a))}
As a final remark: Are you sure you want to use a dictionary? By this you lose a lot of the numpy advantages
Edit, answer to your comment:
With a and b this large, equal will have size 10^10, which makes 8*10^10 bytes, which is approximately 72 G. That's why you get this error.
The main question you should ask is: do I really need arrays this big? If yes, are you sure that the dictionary will not be too large as well?
The problem can be solved by not computing everything at once, but in n steps. In your case n should be about 72/16 (the ratio of the memory needed to the memory available). However, making n a little larger will probably speed up the process:
stride = int(np.ceil(len(b) / n))
dictionary = {key: [] for key in a}
for k in range(n):
    # splitting b into several parts
    part = b[k*stride:(k+1)*stride]
    equal = part[:,:,np.newaxis]==a
    index = np.logical_or(equal[:,0], equal[:,1])
    # append this part's matches instead of overwriting earlier ones
    for i, key in enumerate(a):
        dictionary[key].extend(part[index[:,i]])
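For reference, here is a self-contained sketch of the chunked broadcasting idea on the small arrays from the question. The function name collect_rows and the n_chunks parameter are made up for illustration; np.array_split is used to cut b into pieces, and the per-chunk matches are concatenated at the end so that no chunk overwrites another.

```python
import numpy as np

a = np.arange(1, 101)                                    # 1..100, as in the question
b = np.array([[2 * i + 1, 2 * (i + 1)] for i in range(50)])

def collect_rows(a, b, n_chunks=5):
    """Map each value in `a` to the rows of `b` that contain it, chunk by chunk."""
    parts = {int(key): [] for key in a}
    for chunk in np.array_split(b, n_chunks):
        # (rows, 2, 1) compared with (len(a),) broadcasts to (rows, 2, len(a))
        equal = chunk[:, :, np.newaxis] == a
        index = equal.any(axis=1)          # True where either column matches
        for i, key in enumerate(a):
            parts[int(key)].append(chunk[index[:, i]])
    # join the per-chunk pieces into one array per key
    return {key: np.concatenate(rows) for key, rows in parts.items()}

result = collect_rows(a, b)
```

The temporary Boolean array only ever has shape (rows_per_chunk, 2, len(a)), so peak memory can be tuned through n_chunks.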

julia index matrix with vector

Suppose I have a 20-by-10 matrix m and a 20-by-1 vector v, where each element is an integer between 1 and 10.
Is there a smart indexing command, something like m[:,v], that would give a vector in which element i is the element of m at index [i,v[i]]?
No, it seems that you cannot do it. Documentation (http://docs.julialang.org/en/stable/manual/arrays/) says:
If all the indices are scalars, then the result X is a single element from the array A. Otherwise, X is an array with the same number of dimensions as the sum of the dimensionalities of all the indices.
So, to get a 1-D result from an indexing operation, one of the indices must have dimensionality 0, i.e. be just a scalar, and then you won't get what you want.
Use comprehension, as proposed in the comment to your question.
To be explicit about the comprehension approach:
[m[i,v[i]] for i = 1:length(v)]
This is concise and clear enough that having a special syntax seems unnecessary.

Numpy sum over planes of 3d array, return a scalar

I'm making the transition from MATLAB to Numpy and feeling some growing pains.
I have a 3D array, let's say it's 3x3x3, and I want the scalar sum of each plane.
In matlab, I would use:
sum_vec = sum(3dArray,3);
TIA
wbg
EDIT: I was wrong about my MATLAB code. MATLAB only vectorizes in one dim, so a loop would be required. So numpy turns out to be more elegant...cool.
MATLAB
for i = 1:3
    sum_vec(i) = sum(sum(3dArray(:,:,i)));
end
You can do
sum_vec = np.array([plane.sum() for plane in cube])
or simply
sum_vec = cube.sum(-1).sum(-1)
where cube is your 3d array. You can specify 0 or 1 instead of -1 (or 2) depending on the orientation of the planes. The latter version is also better because it doesn't use a Python loop, which usually helps to improve performance when using numpy.
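To make the remark about plane orientation concrete, here is a small sketch on a 3x3x3 array filled with 0..26 (the values are arbitrary). Summing twice over the last axis gives one total per cube[i, :, :], while summing twice over axis 0 gives one total per cube[:, :, i], which matches the layout of the MATLAB loop in the question.

```python
import numpy as np

cube = np.arange(27).reshape(3, 3, 3)

# Planes indexed by the first axis: cube[i, :, :]
by_loop = np.array([plane.sum() for plane in cube])
by_first_axis = cube.sum(-1).sum(-1)

# Planes indexed by the last axis: cube[:, :, i]
by_last_axis = cube.sum(0).sum(0)
```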
You should use the axis keyword in np.sum. As in many other numpy functions, axis lets you perform the operation along a specific axis. For example, if you want to sum along the last dimension of the array (called array3d here, since a Python name cannot start with a digit), you would do:
import numpy as np
sum_vec = np.sum(array3d, axis=-1)
The result is a 2D array whose element [i, k] is the sum of the slice array3d[i, k, :] along the last dimension.
UPDATE
I didn't understand exactly what you wanted. You want to sum over two dimensions (a plane). In this case you can do two sums. For example, summing over the first two dimensions of a 3D array array3d:
sum_vec = np.sum(np.sum(array3d, axis=0), axis=0)
Instead of applying the same sum function twice, you may perform the sum on the reshaped array:
a = np.random.rand(10, 10, 10) # 3D array
b = a.view()
b.shape = (a.shape[0], -1)
c = np.sum(b, axis=1)
The above should be faster because you only sum once.
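Note that this reshape collapses the last two axes, so the result has one entry per index of the first axis. A quick check of that, using reshape as a shorter alternative to assigning to shape on a view (no timing claim is made here):

```python
import numpy as np

a = np.random.rand(10, 10, 10)

# Collapse the last two axes into one, then sum once.
c = a.reshape(a.shape[0], -1).sum(axis=1)

# Reference: two successive sums over the last axis.
reference = a.sum(axis=-1).sum(axis=-1)
```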
With array3d as your 3D array:
sumvec = np.sum(array3d, axis=2)
or this works as well
sumvec = array3d.sum(2)
Remember that Python indexing starts at 0, so axis=2 refers to the 3rd dimension.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.sum.html
If you're trying to sum over a plane (and avoid loops, which is usually a good idea with numpy) you can use np.sum and pass two axes as a tuple for the axis argument.
For example, if you have an array of shape (n, 3, 3), then
np.sum(a, (1,2))
gives an array of shape (n,), summing over each plane rather than a single axis. Pass keepdims=True if you want the shape (n, 1, 1).
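A short sketch of the tuple form, with n = 4 chosen arbitrarily. The summed axes are dropped from the result unless keepdims=True is passed:

```python
import numpy as np

n = 4  # arbitrary number of planes
a = np.arange(n * 3 * 3).reshape(n, 3, 3)

plane_sums = np.sum(a, axis=(1, 2))                       # shape (n,)
plane_sums_kept = np.sum(a, axis=(1, 2), keepdims=True)   # shape (n, 1, 1)
```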