Broadcasting index operations in numpy

How can I take elements from a NumPy array given multiple index arrays with broadcasting? Or: how can I simplify/vectorize this loop:
elems = np.random.rand(3, 10, 7) # shape N x I x M
ind = np.array([[1, 2], [3, 4], [0, 9]]) # shape N x J
res = np.stack([elems[i, ind[i]] for i in range(len(elems))]) # shape N x J x M

Translate the loop index to an arange and use broadcasting:
>>> elems = np.arange(2*3*4).reshape(2,3,4)
>>> ind = np.arange(0,8,2).reshape(2, 2) % 3
>>>
>>> elems
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> elems[np.arange(2)[:, None], ind]
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[16, 17, 18, 19],
        [12, 13, 14, 15]]])
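To check that this matches the loop from the question, here is a minimal sketch using the question's own shapes. The seeded random generator and the names rows, loop_res, vec_res and alt are only illustrative, and the np.take_along_axis line is an alternative spelling, not something taken from the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)               # seeded only to make the sketch repeatable
elems = rng.random((3, 10, 7))               # shape N x I x M
ind = np.array([[1, 2], [3, 4], [0, 9]])     # shape N x J

# Loop version from the question
loop_res = np.stack([elems[i, ind[i]] for i in range(len(elems))])

# Broadcast version: rows has shape (N, 1) and ind has shape (N, J),
# so the two index arrays broadcast to (N, J); the last axis is kept whole.
rows = np.arange(len(elems))[:, None]
vec_res = elems[rows, ind]                   # shape N x J x M

# Alternative spelling: the indices need a trailing axis to broadcast over M
alt = np.take_along_axis(elems, ind[:, :, None], axis=1)

print(vec_res.shape, np.array_equal(loop_res, vec_res), np.array_equal(alt, vec_res))
```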

pytorch tensor stride - how it works

PyTorch doesn't seem to have documentation for tensor.stride().
Can someone confirm my understanding?
My questions are three-fold.
Stride is used to locate an element in the underlying storage, so the stride tuple has one entry per tensor dimension. Correct?
For each dimension, the corresponding stride entry tells how many elements to skip in the 1-dimensional storage to advance by one along that dimension. Correct?
For example:
In [15]: x = torch.arange(1,25)
In [16]: x
Out[16]:
tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
        19, 20, 21, 22, 23, 24])
In [17]: a = x.view(4,3,2)
In [18]: a
Out[18]:
tensor([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16],
[17, 18]],
[[19, 20],
[21, 22],
[23, 24]]])
In [20]: a.stride()
Out[20]: (6, 2, 1)
How does having this information help perform tensor operations efficiently? Basically this is showing the memory layout. So how does it help?
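As a rough sketch of what the stride tuple means, the snippet below uses NumPy instead of PyTorch (an assumption made so it runs without torch). NumPy reports strides in bytes, so they are divided by the item size to get element counts comparable to the (6, 2, 1) shown above. The helper flat_offset is a hypothetical name, not a library function.

```python
import numpy as np

x = np.arange(1, 25)          # the flat, 1-dimensional storage
a = x.reshape(4, 3, 2)        # NumPy counterpart of x.view(4, 3, 2)

# Convert NumPy's byte strides to element strides
elem_strides = tuple(s // a.itemsize for s in a.strides)   # (6, 2, 1)

def flat_offset(index, strides):
    """Hypothetical helper: position in the flat storage for a multi-index."""
    return sum(i * s for i, s in zip(index, strides))

offset = flat_offset((2, 1, 0), elem_strides)   # 2*6 + 1*2 + 0*1 = 14
value = x[offset]                               # same element as a[2, 1, 0]

# In NumPy, transposing only permutes the strides; no data is copied.
t = a.transpose(2, 0, 1)
t_strides = tuple(s // t.itemsize for s in t.strides)      # (1, 6, 2)
print(elem_strides, offset, value, t_strides, np.shares_memory(a, t))
```

This is one way strides pay off: operations such as reshaping or transposing can be expressed by changing shape and stride metadata while the storage itself stays untouched.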

Adding a row-dependent value to each row

I have a 2D array containing the following numbers:
A = [[1, 5, 9, 42],
[20, 2, 71, 0],
[2, 44, 4, 9]]
I want to add a different constant value to each row without using loops. The value is n*c, where n is the index of the current row and c is a constant. For example, with c=100:
B = [[1, 5, 9, 42],
[120, 102, 171, 100],
[202, 244, 204, 209]]
Any help would be greatly appreciated
You can do that as follows:
>>> A = [[1, 5, 9, 42],
... [20, 2, 71, 0],
... [2, 44, 4, 9]]
...
>>> a = np.array(A)
>>> c = 100
>>> addto = np.arange(len(a))[:, None] * c
>>> a + addto
array([[ 1, 5, 9, 42],
[120, 102, 171, 100],
[202, 244, 204, 209]])
np.arange(len(a)) gets you a 1-dimensional array of the indices, array([0, 1, 2]), which you can then multiply by c.
The hitch is that you then need to make this conform to NumPy's broadcasting rules by expanding its dimensionality:
>>> np.arange(len(a)).shape
(3,)
>>> np.arange(len(a))[:, None].shape
(3, 1)
You could also do something like np.linspace(0, 100*(len(a)-1), num=len(a))[:, None], but that returns floats and is probably overkill here.

special tiling of matrix in tensorflow or numpy

Consider a 3D tensor T of shape (w x h x d).
The goal is to create a tensor R of shape (w x h x K), where K = d x k, by tiling along the 3rd dimension in a particular way.
The tensor should repeat each slice along the 3rd dimension k times, meaning:
T[:,:,0]=R[:,:,0:k] and T[:,:,1]=R[:,:,k:2*k]
This differs subtly from standard tiling, which gives T[:,:,0]=R[:,:,::d], i.e. each slice reappears at every d-th position along the 3rd dimension.
Use np.repeat along that axis -
np.repeat(T,k,axis=2)
Sample run -
In [688]: # Setup
     ...: w,h,d = 2,3,4
     ...: k = 2
     ...: T = np.random.randint(0,9,(w,h,d))
     ...:
     ...: # Original approach
     ...: R = np.zeros((w,h,d*k),dtype=T.dtype)
     ...: for i in range(d):
     ...:     R[:,:,i*k:(i+1)*k] = T[:,:,i][...,None]
     ...:
In [692]: T
Out[692]:
array([[[4, 5, 6, 4],
[5, 4, 4, 3],
[8, 0, 0, 8]],
[[7, 3, 8, 0],
[8, 7, 0, 8],
[3, 6, 8, 5]]])
In [690]: R
Out[690]:
array([[[4, 4, 5, 5, 6, 6, 4, 4],
[5, 5, 4, 4, 4, 4, 3, 3],
[8, 8, 0, 0, 0, 0, 8, 8]],
[[7, 7, 3, 3, 8, 8, 0, 0],
[8, 8, 7, 7, 0, 0, 8, 8],
[3, 3, 6, 6, 8, 8, 5, 5]]])
In [691]: np.allclose(R, np.repeat(T,k,axis=2))
Out[691]: True
Alternatively with np.tile and reshape -
np.tile(T[...,None],k).reshape(w,h,-1)
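To make the difference between the two layouts concrete, here is a small sketch on a deterministic array. The arange-based T and the names R_repeat, R_tile and R_alt are only for illustration.

```python
import numpy as np

w, h, d = 2, 3, 4
k = 2
T = np.arange(w * h * d).reshape(w, h, d)

R_repeat = np.repeat(T, k, axis=2)    # each slice repeated k times in place
R_tile = np.tile(T, (1, 1, k))        # whole block of d slices repeated k times

print(T[0, 0])          # [0 1 2 3]
print(R_repeat[0, 0])   # [0 0 1 1 2 2 3 3]
print(R_tile[0, 0])     # [0 1 2 3 0 1 2 3]

# The tile + reshape alternative gives the same layout as np.repeat
R_alt = np.tile(T[..., None], k).reshape(w, h, -1)
print(np.array_equal(R_alt, R_repeat))   # True
```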

Most efficient way to reshape tensor into sequences

I am working with audio in TensorFlow, and would like to obtain a series of sequences which could be obtained from sliding a window over my data, so to speak. Examples to illustrate my situation:
Current Data Format:
Shape = [batch_size, num_features]
example = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12],
[13, 14, 15]
]
What I want:
Shape = [batch_size - window_length + 1, window_length, num_features]
example = [
[
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
],
[
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]
],
[
[7, 8, 9],
[10, 11, 12],
[13, 14, 15]
],
]
My current solution is to do something like this:
list_of_windows_of_data = []
for x in range(batch_size - window_length + 1):
    list_of_windows_of_data.append(
        tf.slice(data, [x, 0], [window_length, num_features]))
windowed_data = tf.squeeze(tf.stack(list_of_windows_of_data, axis=0))
And this does the transform. However, it also creates 20,000 operations which slows TensorFlow down a lot when creating a graph. If anyone else has a fun and more efficient way to do this, please do share.
You can do that using tf.map_fn as follows:
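import tensorflow as tf  # written against the TensorFlow 1.x API

example = tf.constant([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
    [13, 14, 15]
])
# One window of 3 consecutive rows per start index i
res = tf.map_fn(lambda i: example[i:i+3], tf.range(example.shape[0]-2), dtype=tf.int32)
# In TensorFlow 2.x, InteractiveSession lives under tf.compat.v1
sess = tf.InteractiveSession()
res.eval()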
This prints
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]],
[[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]]])
You could use the built-in tf.extract_image_patches:
example = tf.constant([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
    [13, 14, 15]
])
res = tf.reshape(
    tf.extract_image_patches(example[None,...,None],
                             [1,3,3,1], [1,1,1,1], [1,1,1,1], 'VALID'),
    [-1,3,3])
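Another option, sketched here in NumPy so it runs without TensorFlow, is to build all window indices at once by broadcasting and then index a single time. The names idx and windows are illustrative; I would expect tf.gather(example, idx) to do the same job in TensorFlow, but that is an assumption and not tested here.

```python
import numpy as np

example = np.arange(1, 16).reshape(5, 3)     # same data as in the question
window_length = 3
n_windows = example.shape[0] - window_length + 1

# (n_windows, 1) start offsets + (1, window_length) in-window offsets
# broadcast to an (n_windows, window_length) table of row indices.
idx = np.arange(n_windows)[:, None] + np.arange(window_length)[None, :]

windows = example[idx]    # shape (n_windows, window_length, num_features)
print(idx)
print(windows.shape)      # (3, 3, 3)
```

This replaces the per-window slice operations with one indexing operation, at the cost of materialising the index table.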

numpy: Efficient way to use a 1D array as an index into 2D array

X.shape == (10,4)
y.shape == (10,)
I'd like to produce M, where each entry in M is defined as M[r,c] == X[r, y[r]]; that is, use y to index into the appropriate column of X.
How can I do this efficiently (without loops)?
M could have a single column, though eventually I need to broadcast it so that it has the same shape as X. c runs from the first column of X (0) to the last (3).
Just do:
X=np.arange(40).reshape(10,4)
Y=np.random.randint(0,4,10)
M=X[range(10),Y]
For example:
In [8]: X
Out[8]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35],
[36, 37, 38, 39]])
In [9]: Y
Out[9]: array([1, 1, 3, 3, 1, 2, 2, 3, 2, 1])
In [10]: M
Out[10]: array([ 1, 5, 11, 15, 17, 22, 26, 31, 34, 37])
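Since the question also asks for M to end up with the same shape as X, here is a minimal sketch of that last step, reusing the Y printed above. np.take_along_axis and np.broadcast_to are my additions, not part of the answer, and M_col and M_full are illustrative names.

```python
import numpy as np

X = np.arange(40).reshape(10, 4)
Y = np.array([1, 1, 3, 3, 1, 2, 2, 3, 2, 1])   # the Y shown in the sample run

# One selected element per row, as in the answer
M = X[np.arange(len(X)), Y]                    # shape (10,)

# Same selection, kept as a column so it broadcasts against X
M_col = np.take_along_axis(X, Y[:, None], axis=1)   # shape (10, 1)

# Read-only view with X's shape; every column repeats the selected value
M_full = np.broadcast_to(M_col, X.shape)       # shape (10, 4)
print(M)
print(M_full[:3])
```

If a writable array is needed, calling M_full.copy() materialises it.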