I'm looking for a vectorized formulation instead of a loop for the following problem.
import numpy as np
ny, nx, na = 6, 4, 2
aa = np.arange(ny*nx, dtype=np.int32).reshape(ny, nx)
print('aa : ', aa)
# ix1 has length nx
ix1 = np.array([0, 2, 1, 3])
For each value xx of the second (column) index of aa, I want to take from aa a slice of rows that starts at ix1[xx] and has length na.
With a loop:
- 1
bb = np.empty((na, nx), dtype=np.int32)
for xx in range(nx):
    bb[:, xx] = aa[ix1[xx]:ix1[xx]+na, xx]
print('bb : ', bb)
- 2
bb = np.empty((na, nx), dtype=np.int32)
for xx in range(nx):
    bb[:, xx] = aa[slice(ix1[xx], ix1[xx]+na), xx]
print('bb : ', bb)
- 3
bb = np.empty((na, nx), dtype=np.int32)
for xx in range(nx):
    bb[:, xx] = aa[np.s_[ix1[xx]:ix1[xx]+na], xx]
print('bb : ', bb)
Is there a vectorized form of this?
None of the following work:
print(np.ix_(ix1, ix1+na))
aa[np.ix_(ix1, ix1+na)]
print(np.s_[ix1:ix1+na])
aa[np.s_[ix1:ix1+na]]
print(slice(ix1, ix1+na))
aa[slice(ix1, ix1+na)]
print((slice(ix1, ix1+na), slice(None, None)))
aa[(slice(ix1, ix1+na), slice(None, None))]
Look at the problem cases. np.s_ is just a way of creating a slice object. It doesn't add any functionality:
In [562]: ix1
Out[562]: array([0, 2, 1, 3])
In [563]: slice(ix1,ix1+na)
Out[563]: slice(array([0, 2, 1, 3]), array([2, 4, 3, 5]), None)
In [564]: np.s_[ix1: ix1+na]
Out[564]: slice(array([0, 2, 1, 3]), array([2, 4, 3, 5]), None)
Using either as index is the same as (your previous loops showed the equivalence of these slice notations):
In [569]: aa[ix1:ix1+na]
Traceback (most recent call last):
File "<ipython-input-569-f4db64c86100>", line 1, in <module>
aa[ix1:ix1+na]
TypeError: only integer scalar arrays can be converted to a scalar index
While it's possible to create a slice object with array values, it does not work in an actual index.
Think of it as the equivalent of trying to create a range of numbers:
In [572]: np.arange(ix1[0], ix1[0]+na)
Out[572]: array([0, 1])
In [573]: np.arange(ix1, ix1+na)
Traceback (most recent call last):
File "<ipython-input-573-94cfee666466>", line 1, in <module>
np.arange(ix1, ix1+na)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
A range between 0 and 2 is fine, but not a range between two arrays. Slice bounds used for indexing must be scalars, not arrays.
linspace does allow us to create multidimensional ranges:
In [574]: np.linspace(ix1,ix1+na,2,endpoint=False, dtype=int)
Out[574]:
array([[0, 2, 1, 3],
[1, 3, 2, 4]])
As long as every range has the same number of values (here 2), the rest is just a matter of scaling or offset, so broadcasting works too:
In [576]: ix1 + np.arange(0,2)[:,None]
Out[576]:
array([[0, 2, 1, 3],
[1, 3, 2, 4]])
That 2d linspace index can be used to index the rows of aa, along with an arange for the columns:
In [579]: aa[Out[574],np.arange(4)]
Out[579]:
array([[ 0, 9, 6, 15],
[ 4, 13, 10, 19]], dtype=int32)
Basically, the only alternative to joining the results of multiple indexing operations is to construct joined indexing array(s). Here that's easy to do; in the more general case, building that joined index might itself require concatenation.
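As a rough illustration of that more general case, here is a sketch where each column takes a slice of a different length. The `lengths` array is a made-up example, not part of the question. The row and column index arrays are joined with np.repeat and np.cumsum so that one fancy-indexing call gathers everything, and np.split cuts the flat result back into per-column pieces.

```python
import numpy as np

# Same aa and ix1 as in the question.
ny, nx = 6, 4
aa = np.arange(ny * nx, dtype=np.int32).reshape(ny, nx)
ix1 = np.array([0, 2, 1, 3])

# Hypothetical per-column slice lengths (assumption, not from the question).
lengths = np.array([2, 1, 3, 2])

# Position of each element inside its own slice: 0..lengths[xx]-1 per column.
starts_in_flat = np.cumsum(lengths) - lengths
offsets = np.arange(lengths.sum()) - np.repeat(starts_in_flat, lengths)

# Joined row and column indices for all columns, end to end.
rows = np.repeat(ix1, lengths) + offsets
cols = np.repeat(np.arange(nx), lengths)

flat = aa[rows, cols]
# Cut the flat result back into one piece per column.
pieces = np.split(flat, np.cumsum(lengths)[:-1])
print(pieces)
```

When all lengths are equal to na, the pieces are exactly the columns of bb.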
For comparison, here are aa and bb:
In [580]: aa
Out[580]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]], dtype=int32)
In [581]: bb
Out[581]:
array([[ 0, 9, 6, 15],
[ 4, 13, 10, 19]], dtype=int32)
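The same gather can also be written with np.take_along_axis, which accepts the (na, nx) row-index array directly and pairs each of its columns with the matching column of aa. A minimal sketch, reusing the question's aa, ix1 and na:

```python
import numpy as np

ny, nx, na = 6, 4, 2
aa = np.arange(ny * nx, dtype=np.int32).reshape(ny, nx)
ix1 = np.array([0, 2, 1, 3])

# (na, nx) row indices: column xx holds ix1[xx], ..., ix1[xx]+na-1
rows = ix1 + np.arange(na)[:, None]

# Gather along axis 0: bb[k, xx] = aa[rows[k, xx], xx]
bb = np.take_along_axis(aa, rows, axis=0)
print(bb)
```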
I am trying to index a PyTorch tensor with a matrix of indices, and I recently found this bit of code that fails for a reason I cannot figure out.
The code below is split into two parts. The first half works, whilst the second raises an error. I fail to see why. Could someone shed some light on this?
import torch
import numpy as np
a = torch.rand(32, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # WORKS for a torch.tensor of size M >= 32. It doesn't work otherwise.
a = torch.rand(16, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # IndexError: too many indices for tensor of dimension 2
If I change it to a = np.random.rand(16, 16), it works as well.
To whoever comes looking for an answer: it looks like it's a bug in PyTorch.
Indexing with NumPy arrays is not well defined; it works reliably only when tensors are indexed with tensors. So, in my example code, this works flawlessly:
import torch
M, N = 16, 16
a = torch.rand(M, N)
m, n = a.shape
xx, yy = torch.meshgrid(torch.arange(m), torch.arange(m), indexing='xy')
result = a[xx]  # WORKS
I made a gist to check it, and it's available here
First, let me give you a quick insight into the idea of indexing a tensor with a numpy array and another tensor.
Example: two index arrays holding the same values, and the target tensor to be indexed:
import numpy as np
import torch
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3]])  # 2D numpy array
tensor_indices = torch.tensor([[0, 1, 2, 7],
                               [0, 1, 2, 3]])  # 2D tensor
t = torch.tensor([[1, 2, 3, 4],  # target tensor
                  [5, 6, 7, 8],
                  [9, 10, 11, 12],
                  [13, 14, 15, 16],
                  [17, 18, 19, 20],
                  [21, 22, 23, 24],
                  [25, 26, 27, 28],
                  [29, 30, 31, 32]])
numpy_result = t[numpy_indices]
tensor_result = t[tensor_indices]
Indexing with a 2D NumPy array: the index is read as (row, column) pairs, tensor[row, column], i.e. t[0,0], t[1,1], t[2,2], and t[7,3].
print(numpy_result) # tensor([ 1, 6, 11, 32])
Indexing with a 2D tensor: the index tensor is walked row by row, and each value is the index of a row in the target tensor, i.e. [[t[0], t[1], t[2], t[7]], [t[0], t[1], t[2], t[3]]]. As the example below shows, the shape of tensor_result after indexing is (tensor_indices.shape[0], tensor_indices.shape[1], t.shape[1]) = (2, 4, 4).
print(tensor_result) # tensor([[[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12],
# [29, 30, 31, 32]],
# [[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12],
# [ 13, 14, 15, 16]]])
If you add a third row to numpy_indices, you get the same error you saw, because the index is then read as 3D coordinates, e.g. (0,0,0) ... (7,3,3), which is one index too many for a 2D tensor:
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3],
                          [0, 1, 2, 3]])
numpy_result = t[numpy_indices]  # IndexError: too many indices for tensor of dimension 2
This is not the case when indexing with a tensor: the result just gets a bigger shape, (3,4,4).
Finally, as you can see, the outputs of the two types of indexing are completely different. To solve your problem, you can convert the index to a tensor:
xx = torch.tensor(xx).long() # convert a numpy array to a tensor
What happens with advanced indexing in your situation (numpy_indices with more than two rows) is still ambiguous and unresolved; you can check 1, 2, 3.
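In NumPy terms, the two behaviours described above correspond to indexing with a tuple of index arrays (coordinate pairs) versus a single integer index array (whole rows). The sketch below is NumPy-only and needs no PyTorch; it only illustrates the two semantics and makes no claim about which PyTorch versions apply which one.

```python
import numpy as np

# Same values as the tensor t above, but as a NumPy array.
t = np.arange(1, 33).reshape(8, 4)
idx = np.array([[0, 1, 2, 7],
                [0, 1, 2, 3]])

# Tuple of index arrays: (row, column) pairs -> t[0,0], t[1,1], t[2,2], t[7,3]
pairs = t[tuple(idx)]
print(pairs.tolist())   # [1, 6, 11, 32]

# Single integer array: every value picks a whole row -> shape (2, 4, 4)
rows = t[idx]
print(rows.shape)       # (2, 4, 4)

# With three rows, the tuple form asks for three coordinates on a 2-D array.
idx3 = np.array([[0, 1, 2, 7],
                 [0, 1, 2, 3],
                 [0, 1, 2, 3]])
try:
    t[tuple(idx3)]
except IndexError as exc:
    print("IndexError:", exc)
print(t[idx3].shape)    # (3, 4, 4)
```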
For example, I got several sub-arrays by splitting one array A based on the list B:
A = np.array([[1,1,1],
[2,2,2],
[2,3,4],
[5,8,10],
[5,9,9],
[7,9,6],
[1,1,1],
[2,2,2],
[9,2,4],
[9,3,6],
[10,3,3],
[11,2,2]])
B = np.array([5,7])
C = np.split(A,B.cumsum()[:-1])
>>> print(C)
[array([[1,1,1],
        [2,2,2],
        [2,3,4],
        [5,8,10],
        [5,9,9]]),
 array([[7,9,6],
        [1,1,1],
        [2,2,2],
        [9,2,4],
        [9,3,6],
        [10,3,3],
        [11,2,2]])]
How can I get only the rows that appear once across all the sub-arrays (deleting those that appear twice)? Because [1,1,1] and [2,2,2] appear twice in C, I want a result like:
[array([[2,3,4],
        [5,8,10],
        [5,9,9]]),
 array([[7,9,6],
        [9,2,4],
        [9,3,6],
        [10,3,3],
        [11,2,2]])]
You can use np.unique to identify the duplicates:
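# first: index of the first occurrence of each distinct row; counts: how often it occurs
_, first, counts = np.unique(A, axis=0, return_index=True, return_counts=True)
# boolean mask: True for rows whose value occurs exactly once in A
keep = np.isin(np.arange(len(A)), first[counts == 1])
out = [sub[mask] for sub, mask in zip(np.split(A, B.cumsum()[:-1]),
                                      np.split(keep, B.cumsum()[:-1]))]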
output:
[array([[ 2, 3, 4],
[ 5, 8, 10],
[ 5, 9, 9]]),
array([[ 7, 9, 6],
[ 9, 2, 4],
[ 9, 3, 6],
[10, 3, 3],
[11, 2, 2]])]
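An alternative way to build the same mask, offered as a sketch rather than as the answer's method, is np.unique with return_inverse: counts[inverse] gives, for every row of A, how many times that row occurs, so no np.isin lookup is needed.

```python
import numpy as np

A = np.array([[1, 1, 1], [2, 2, 2], [2, 3, 4], [5, 8, 10], [5, 9, 9], [7, 9, 6],
              [1, 1, 1], [2, 2, 2], [9, 2, 4], [9, 3, 6], [10, 3, 3], [11, 2, 2]])
B = np.array([5, 7])

# inverse maps each row of A to its position among the unique rows;
# counts says how often each unique row occurs in A.
_, inverse, counts = np.unique(A, axis=0, return_inverse=True, return_counts=True)
# reshape(-1) makes sure inverse is 1-D before using it as an index.
keep = counts[inverse.reshape(-1)] == 1

splits = B.cumsum()[:-1]
out = [sub[mask] for sub, mask in zip(np.split(A, splits), np.split(keep, splits))]
print(out)
```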
I'd like to select elements from an array along a specific axis given an index array. For example, given the arrays
a = np.arange(30).reshape(5,2,3)
idx = np.array([0,1,1,0,0])
I'd like to select from the second dimension of a according to idx, such that the resulting array is of shape (5,3). Can anyone help me with that?
You could use fancy indexing
a[np.arange(5),idx]
Output:
array([[ 0, 1, 2],
[ 9, 10, 11],
[15, 16, 17],
[18, 19, 20],
[24, 25, 26]])
More verbosely, this is the same as:
x,y,z = np.arange(a.shape[0]), idx, slice(None)
a[x,y,z]
x and y are broadcast together to shape (5,), pairing each row index with its value from idx. z could be used to select any subset of columns in the output.
I think this gives the results you are after - it uses np.take_along_axis, but first you need to reshape your idx array so that it is also a 3d array:
a = np.arange(30).reshape(5, 2, 3)
idx = np.array([0, 1, 1, 0, 0]).reshape(5, 1, 1)
results = np.take_along_axis(a, idx, 1).reshape(5, 3)
Giving:
[[ 0 1 2]
[ 9 10 11]
[15 16 17]
[18 19 20]
[24 25 26]]
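The reshape to (5, 1, 1) above hard-codes the sizes. A sketch of the same np.take_along_axis call that adds the two singleton axes with None instead, so it adapts to any length of idx; the trailing [:, 0, :] drops the gathered axis.

```python
import numpy as np

a = np.arange(30).reshape(5, 2, 3)
idx = np.array([0, 1, 1, 0, 0])

# idx[:, None, None] has shape (5, 1, 1): same ndim as a, broadcasting
# against a along axes 0 and 2; axis 1 is the one being gathered.
picked = np.take_along_axis(a, idx[:, None, None], axis=1)  # shape (5, 1, 3)
result = picked[:, 0, :]                                     # shape (5, 3)
print(result)
```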
For Python lists, sum() can concatenate multiple slices from left to right:
import numpy as np
_list = list(range(15))
print("List is {}".format(_list))
print(sum(
[ _list[_slice] for _slice in np.s_[1:3, 5:7, 9:11] ],
start=[]
))
---
List is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
[1, 2, 5, 6, 9, 10]
This cannot simply be applied to a NumPy array:
import numpy as np
_list = np.arange(15)
print("List is {}\n".format(_list))
print(sum(
[ _list[_slice] for _slice in np.s_[1:3, 5:7, 9:11] ],
start=[]
))
---
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-66-a9d278e659c8> in <module>
3 print("List is {}\n".format(_list))
4
----> 5 print(sum(
6 [ _list[_slice] for _slice in np.s_[1:3, 5:7, 9:11] ],
7 start=[]
ValueError: operands could not be broadcast together with shapes (0,) (2,)
I suppose the NumPy way is something like the following:
import numpy as np
a = np.arange(15).astype(np.int32)
slices = np.s_[1:3, 5:7, 9:11]
print("array is {}\n".format(a))
print([a[_slice] for _slice in slices])
np.concatenate([a[_slice] for _slice in slices])
---
array is [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
[array([1, 2], dtype=int32), array([5, 6], dtype=int32), array([ 9, 10], dtype=int32)]
array([ 1, 2, 5, 6, 9, 10], dtype=int32)
Question
Is there a way to apply sum() here? Is there a better way than np.concatenate?
In [38]: np.s_[1:3, 5:7, 9:11]
Out[38]: (slice(1, 3, None), slice(5, 7, None), slice(9, 11, None))
np.r_ can make a composite index, basically a concatenation of aranges:
In [39]: np.r_[1:3, 5:7, 9:11]
Out[39]: array([ 1, 2, 5, 6, 9, 10])
Alternatively, create the slice objects, index and concatenate:
In [40]: x = np.s_[1:3, 5:7, 9:11]
In [41]: y = np.arange(20)
In [42]: np.concatenate([y[s] for s in x])
Out[42]: array([ 1, 2, 5, 6, 9, 10])
When I looked at this in the past, the performance was similar.
Ways of creating the indices by joining lists:
In [46]: list(range(1,3))+list(range(5,7))+list(range(9,11))
Out[46]: [1, 2, 5, 6, 9, 10]
In [50]: sum([list(range(i,j)) for i,j in [(1,3),(5,7),(9,11)]],start=[])
Out[50]: [1, 2, 5, 6, 9, 10]
sum(..., start=[]) is just a list way of concatenating, using the + operator defined for lists (passing start as a keyword requires Python 3.8 or later).
In [55]: alist = []
In [56]: for i,j in [(1,3),(5,7),(9,11)]: alist.extend(range(i,j))
In [57]: alist
Out[57]: [1, 2, 5, 6, 9, 10]
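Combining the two ideas above, the slice objects from np.s_ can be turned into one index array (via slice.indices, which resolves each slice against the array length) and applied in a single indexing step. `concat_slices` is a hypothetical helper name used only for this sketch.

```python
import numpy as np

def concat_slices(arr, slices):
    """Hypothetical helper: index a NumPy array with several slices at once."""
    n = len(arr)
    # slice.indices(n) turns each slice into concrete (start, stop, step)
    index = np.concatenate([np.arange(*s.indices(n)) for s in slices])
    return arr[index]

a = np.arange(15).astype(np.int32)
result = concat_slices(a, np.s_[1:3, 5:7, 9:11])
print(result)
```

Because the slices are resolved with slice.indices, negative bounds and steps behave as they would in ordinary slicing.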
Let's say I have a NumPy array a:
a = numpy.array([1,2,3,4,5,6,7,8,9,10,11])
I want to create a matrix of subsequences of length 5 from this array, with stride 3. The resulting matrix will look as follows:
numpy.array([[1,2,3,4,5],[4,5,6,7,8],[7,8,9,10,11]])
One possible way of implementing this would be using a for-loop.
import numpy as np
result_matrix = np.zeros((3, 5))
for row, i in enumerate(range(0, len(a) - 5 + 1, 3)):
    result_matrix[row] = a[i:i+5]
Is there a cleaner way to implement this in Numpy?
Approach #1 : Using broadcasting -
def broadcasting_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    return a[S*np.arange(nrows)[:,None] + np.arange(L)]
Approach #2 : Using more efficient NumPy strides -
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))
Sample run -
In [143]: a
Out[143]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [144]: broadcasting_app(a, L = 5, S = 3)
Out[144]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
In [145]: strided_app(a, L = 5, S = 3)
Out[145]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
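The two approaches return the same values but differ in memory: fancy indexing in broadcasting_app builds a new array, while as_strided returns a view onto a's buffer in which overlapping windows alias the same elements. A small sketch to check this with np.shares_memory:

```python
import numpy as np

def broadcasting_app(a, L, S):
    nrows = ((a.size - L) // S) + 1
    return a[S * np.arange(nrows)[:, None] + np.arange(L)]

def strided_app(a, L, S):
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
b = broadcasting_app(a, L=5, S=3)
s = strided_app(a, L=5, S=3)

# Fancy indexing copies; as_strided returns a view onto a's buffer.
print(np.shares_memory(a, b))  # False
print(np.shares_memory(a, s))  # True
```

With the strided view, s[0, 3], s[1, 0] and a[3] are the same memory location, so writing to s would silently change several windows at once.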
Starting in Numpy 1.20, we can make use of the new sliding_window_view to slide/roll over windows of elements.
And coupled with a stepping [::3], it simply becomes:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
values = np.array([1,2,3,4,5,6,7,8,9,10,11])
sliding_window_view(values, window_shape = 5)[::3]
# array([[ 1, 2, 3, 4, 5],
# [ 4, 5, 6, 7, 8],
# [ 7, 8, 9, 10, 11]])
where the intermediate result of the sliding is:
sliding_window_view(values, window_shape = 5)
# array([[ 1, 2, 3, 4, 5],
# [ 2, 3, 4, 5, 6],
# [ 3, 4, 5, 6, 7],
# [ 4, 5, 6, 7, 8],
# [ 5, 6, 7, 8, 9],
# [ 6, 7, 8, 9, 10],
# [ 7, 8, 9, 10, 11]])
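Wrapping this in a small function makes the window length and stride parameters explicit; `windows` is a hypothetical name for this sketch. It returns a read-only view (sliding_window_view defaults to writeable=False), and trailing elements that cannot start a window at a stride position are dropped, matching the ((a.size-L)//S)+1 row count used in the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows(values, L, S):
    """Hypothetical helper: windows of length L taken every S elements."""
    return sliding_window_view(values, window_shape=L)[::S]

values = np.arange(1, 13)   # 12 elements: the last one is not covered
w = windows(values, L=5, S=3)
print(w)
print(w.flags.writeable)    # False
```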
Modified version of Divakar's code that checks that the memory is contiguous and makes the returned array read-only (variable names changed for my DSP application).
import numpy as np

def frame(a, framelen, frameadv):
    """frame - Frame a 1D array
    a - 1D array
    framelen - Samples per frame
    frameadv - Samples between starts of consecutive frames
        Set to framelen for non-overlapping consecutive frames

    Modified from Divakar's 10/17/16 11:20 solution:
    https://stackoverflow.com/questions/40084931/taking-subarrays-from-numpy-array-with-given-stride-stepsize

    CAVEATS:
    Assumes array is contiguous
    Output is not writable as there are multiple views on the same memory
    """
    if not isinstance(a, np.ndarray) or \
            not (a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']):
        raise ValueError("Input array a must be a contiguous numpy array")

    # Output
    nrows = ((a.size-framelen)//frameadv)+1
    oshape = (nrows, framelen)

    # Size of each element in a
    n = a.strides[0]

    # Indexing in the new object will advance by frameadv * element size
    ostrides = (frameadv*n, n)
    return np.lib.stride_tricks.as_strided(a, shape=oshape,
                                           strides=ostrides, writeable=False)
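For the non-overlapping case mentioned in the docstring (frameadv equal to framelen), no stride tricks are needed: trimming the incomplete tail and reshaping gives the same frames. `frame_nonoverlapping` is a hypothetical helper for this sketch. Since (size - L)//L + 1 equals size//L, the frame count matches frame(). Unlike frame(), the result is writable, and writes go straight back into the input array.

```python
import numpy as np

def frame_nonoverlapping(a, framelen):
    """Hypothetical helper: non-overlapping frames of a 1-D array via reshape."""
    nframes = a.size // framelen
    # Drop the incomplete tail frame, then reshape; for a contiguous
    # 1-D input this is a view, not a copy.
    return a[:nframes * framelen].reshape(nframes, framelen)

x = np.arange(10)
f = frame_nonoverlapping(x, 4)
print(f)
```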