Numpy 3D indexing specific column elements - numpy

I have a multidimensional (3D) array like the following:
>>>arr=np.array([[[0, 0],
                  [1, 1]],
                 [[2, 0],
                  [3, 1]],
                 [[3, 0],
                  [4, 1]]])
For each of the 2D submatrices, I would like to obtain the element of column 0 in a particular row, where the row to select is given by a value associated with that submatrix. That is, I would like to achieve:
>>>arr[:,[0,1,0],0]
array([0,3,3])
However, with the above command what I get is:
>>>arr[:,[0,1,0],0]
array([[0, 1, 0],
       [2, 3, 2],
       [3, 4, 3]])
Per the documentation, I was able to achieve the goal using the following command:
>>>arr[range(arr.shape[0]),[0,1,0],0]
array([0,3,3])
But I would like to know if there is a better way, one where I don't need to pass a list of all the indices for the first axis, as in the first example.
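Two other spellings of the same selection are sketched below, assuming NumPy 1.15 or later (for np.take_along_axis). np.arange builds the first-axis indices directly, which is equivalent to passing range(arr.shape[0]); np.take_along_axis avoids writing first-axis indices at all. The name idx is hypothetical, introduced only for illustration.

```python
import numpy as np

arr = np.array([[[0, 0],
                 [1, 1]],
                [[2, 0],
                 [3, 1]],
                [[3, 0],
                 [4, 1]]])
idx = np.array([0, 1, 0])  # hypothetical name: row to pick in each 2D submatrix

# Option 1: pair each submatrix index with its row index explicitly.
picked_fancy = arr[np.arange(arr.shape[0]), idx, 0]

# Option 2: take column 0 first (shape (3, 2)), then pick one row per submatrix.
# take_along_axis needs an index array with the same number of dimensions,
# hence idx[:, None] with shape (3, 1); the trailing [:, 0] drops that axis.
picked_along = np.take_along_axis(arr[:, :, 0], idx[:, None], axis=1)[:, 0]

print(picked_fancy)  # [0 3 3]
print(picked_along)  # [0 3 3]
```

Both index arrays are paired element by element, which is why the result has one value per submatrix instead of the (3, 3) grid produced by the slice.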

Related

What is the difference between np.array([val1, val2]) and np.array([[val1, val2]])?

What is the difference between np.array([1, 2]) and np.array([[1, 2]])?
Which one of them is a matrix?
I also do not understand the shapes of these two arrays: the former returns (2,) and the latter returns (1,2).
np.array([1, 2]) builds an array from a single list, giving you a 1D array with shape (2,) because that list contains two elements.
With double brackets you are passing a list of lists, which gives you a 2D array (a "matrix" in the everyday sense, though still an ndarray rather than np.matrix) with shape (1, 2): one row of two elements.
With the latter you are able to build more complex matrices like:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rendering a 3x3 matrix:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
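A quick way to see the difference is to inspect shape and ndim directly. This small sketch also shows that adding a leading axis with np.newaxis turns the 1D array into the (1, 2) one; the variable names are illustrative.

```python
import numpy as np

flat = np.array([1, 2])    # one list -> 1D array
row = np.array([[1, 2]])   # list of lists -> 2D array with one row

print(flat.shape, flat.ndim)  # (2,) 1
print(row.shape, row.ndim)    # (1, 2) 2

# Both are plain ndarrays; neither is an np.matrix.
print(type(flat).__name__, type(row).__name__)  # ndarray ndarray

# Inserting a new leading axis gives the same (1, 2) array as the double brackets.
row_from_flat = flat[np.newaxis, :]
print(row_from_flat.shape)  # (1, 2)
```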

special vectorial product numpy

Let's say I have a vector s and I want to produce the matrix m (see image) using only NumPy functions. How could I do that? I imagined transposing the vector s and finding a special product between s and s^T, but I couldn't manage to find it. Do you have any idea?
This looks like an outer product:
import numpy as np
s = np.arange(1, 4)  # array([1, 2, 3]); np.arange(3) would give array([0, 1, 2])
np.multiply.outer(s, s)
output:
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
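The same matrix can be built in a couple of equivalent ways. This sketch compares the ufunc's outer method, the dedicated np.outer helper, and plain broadcasting of a column against a row (which is the "s times s transposed" idea from the question).

```python
import numpy as np

s = np.arange(1, 4)                # array([1, 2, 3])

m_ufunc = np.multiply.outer(s, s)  # works for any binary ufunc, e.g. np.add.outer
m_outer = np.outer(s, s)           # helper for the multiply case only
m_bcast = s[:, None] * s[None, :]  # column (3, 1) times row (1, 3) broadcasts to (3, 3)

print(m_outer)
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```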

Bitwise OR along one axis of a NumPy array

For a given NumPy array, it is easy to perform a "normal" sum along one dimension. For example:
X = np.array([[1, 0, 0], [0, 2, 2], [0, 0, 3]])
X.sum(0)
=array([1, 2, 5])
X.sum(1)
=array([1, 4, 3])
Instead, is there an "efficient" way of computing the bitwise OR along one dimension of an array similarly? Something like the following, except without requiring for-loops or nested function calls.
Example: bitwise OR along the zeroth dimension, as I am currently doing it:
np.bitwise_or(np.bitwise_or(X[0],X[1]),X[2])
=array([1, 2, 3])
What I would like:
X.bitwise_sum(0)
=array([1, 2, 3])
numpy.bitwise_or.reduce(X, axis=whichever_one_you_wanted)
Use the reduce method of the numpy.bitwise_or ufunc.
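A short check of the reduce approach on the question's X. Chaining np.bitwise_or over the rows X[0], X[1], X[2] matches axis=0, while chaining over the columns X[:, 0], X[:, 1], X[:, 2] matches axis=1; for this particular X the two results happen to coincide.

```python
import numpy as np

X = np.array([[1, 0, 0], [0, 2, 2], [0, 0, 3]])

or_axis0 = np.bitwise_or.reduce(X, axis=0)  # OR down each column
or_axis1 = np.bitwise_or.reduce(X, axis=1)  # OR across each row

# Equivalent nested calls, for comparison.
nested_rows = np.bitwise_or(np.bitwise_or(X[0], X[1]), X[2])
nested_cols = np.bitwise_or(np.bitwise_or(X[:, 0], X[:, 1]), X[:, 2])

print(or_axis0, or_axis1)  # [1 2 3] [1 2 3]
```

The same reduce method exists on every binary ufunc, so X.sum(0) is itself the same as np.add.reduce(X, axis=0).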

numpy dot or einsum with arbitrary operator

I would like to use something like np.dot or (preferably) np.einsum to efficiently perform the same kind of operation, but with an alternate ufunc in place of np.multiply. For example, consider these two arrays:
>>> a
array([[0, 1],
       [1, 1],
       [1, 0]])
>>> b
array([[0, 0],
       [1, 0],
       [1, 0],
       [0, 0]])
Now suppose I want to count the number of elements in each row of a equal to the corresponding elements in each row of b. I'd like to be able to do the equivalent of the following (note: the output below is fabricated but the values are what I would expect to see):
>>> np.dot(a, b.T, ufunc=np.equal)
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [1, 2, 2, 1]])
Is there a way to do this?
You can use the broadcasting from Divakar's answer together with numexpr:
numexpr.evaluate('sum(1*(a == b), axis=2)', {'a': a[:,None]})
The 1*() is a workaround: it converts the boolean comparison result to integers before sum reduces it. I have confirmed this doesn't allocate a big temporary array.
You could use broadcasting for a match-counting problem like this:
(a[:,None] == b).sum(2)
Sample run -
In [36]: a
Out[36]:
array([[0, 1],
       [1, 1],
       [1, 0]])
In [37]: b
Out[37]:
array([[0, 0],
       [1, 0],
       [1, 0],
       [0, 0]])
In [38]: (a[:,None] == b).sum(2)
Out[38]:
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [1, 2, 2, 1]])
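To make the broadcasting step explicit, the sketch below prints the intermediate shapes: a[:, None] inserts an axis so that every row of a is compared against every row of b, and the sum over the last axis counts the matching positions. It also checks that the einsum variant sums over the same axis.

```python
import numpy as np

a = np.array([[0, 1], [1, 1], [1, 0]])          # shape (3, 2)
b = np.array([[0, 0], [1, 0], [1, 0], [0, 0]])  # shape (4, 2)

a3 = a[:, None]          # shape (3, 1, 2): one slot per row of b
matches = a3 == b        # b (4, 2) broadcasts against (3, 1, 2) -> (3, 4, 2)
counts = matches.sum(2)  # count True values along the last axis -> (3, 4)

print(a3.shape, matches.shape, counts.shape)  # (3, 1, 2) (3, 4, 2) (3, 4)

# The einsum form 'ijk->ij' sums over the same axis k.
counts_einsum = np.einsum('ijk->ij', matches.astype(int))
```

Note that matches holds all m * n * k comparisons at once, which is the temporary array the numexpr version avoids.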
If you really want to employ np.einsum and np.equal, here's a way to mold the earlier approach to give the desired result:
np.einsum('ijk->ij',np.equal(a[:,None],b).astype(int))
There's an old issue on the NumPy GitHub asking for a generalization of einsum that would allow the use of other functions. The current version just implements a sum of products. As far as I know, no one has taken on that project.
Several years ago I patched einsum, fixing the handling of the '...' notation, so I have a good idea of how it is implemented and could probably adapt my Python/Cython emulator to add this feature. The actual einsum code is written in C.
My guess is that if you don't like Divakar's approach, you'll have to write your own version with Cython.
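Short of writing Cython, Divakar's broadcasting approach can be wrapped in a small helper that takes the elementwise ufunc and the reducing ufunc as parameters. This is a minimal sketch: ufunc_dot is a hypothetical name, not a NumPy API, and it does not extend einsum; it simply materializes the full (m, n, k) intermediate before reducing it.

```python
import numpy as np

def ufunc_dot(a, b, op=np.multiply, reduce_op=np.add):
    """Hypothetical helper: for 2D a (m, k) and b (n, k), return an (m, n)
    array whose [i, j] entry is reduce_op.reduce(op(a[i], b[j])).

    Plain broadcasting: the whole (m, n, k) intermediate is held in memory.
    """
    pairwise = op(a[:, None, :], b[None, :, :])  # shape (m, n, k)
    return reduce_op.reduce(pairwise, axis=2)    # shape (m, n)

a = np.array([[0, 1], [1, 1], [1, 0]])
b = np.array([[0, 0], [1, 0], [1, 0], [0, 0]])

# np.add.reduce upcasts booleans to integers, so this counts matches.
eq_counts = ufunc_dot(a, b, op=np.equal)

# With the default ufuncs it reproduces an ordinary np.dot(a, b.T).
plain = ufunc_dot(a, b)
```

Other combinations follow the same pattern, for example op=np.bitwise_and with reduce_op=np.bitwise_or.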

Numpy Indexing Behavior

I am having a lot of trouble understanding NumPy indexing for multidimensional arrays. In the example I am working with, let's say I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 1D array of 100 integers between 0 and 9 (column indices into A). In MATLAB, I would use A(sub2ind(size(A), 1:size(A,1)', B)) to return, for each row of A, the value at the column index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,B] and A[range(A.shape[0]),B] should return the same thing, just as A[:,5] and A[range(A.shape[0]),5] do, but here they don't. How is : different from using range(sizeArray), which just creates the indices 0 through sizeArray-1?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
If instead I use a list (or array) of 3 values for the rows, it pairs them up with the column values, effectively picking 3 values, X[0,3], X[1,2], X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I give it a column vector to index the rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]),
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
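Applying the same reasoning to the question's A and B: with the slice, B selects 100 columns for every row; with an index array for the rows, the row and column indices are paired. The sketch assumes NumPy 1.17+ for np.random.default_rng (seeded so the run is reproducible); test1, test2 and test3 are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 10))
B = rng.integers(0, 10, size=100)     # one column index per row of A

test1 = A[:, B]                       # all rows x the 100 columns in B -> (100, 100)
test2 = A[np.arange(A.shape[0]), B]   # row i paired with column B[i] -> (100,)
print(test1.shape, test2.shape)       # (100, 100) (100,)

# test1[i, j] is A[i, B[j]], so its diagonal is exactly the paired result.
diag_matches = np.array_equal(np.diagonal(test1), test2)

# A column vector of row indices broadcasts against B, reproducing the slice.
test3 = A[np.arange(A.shape[0])[:, None], B]
bcast_matches = np.array_equal(test3, test1)
```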
NumPy indexing can be confusing, but a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html