Append a "layer" to a 3D array with NumPy

I have a NumPy array with dimensions 12 x 12 x 4. I'm trying to add an extra layer to this cube, resulting in a 12 x 13 x 4 array. This 13th layer should contain the corresponding index along the first axis, so, for example, addressing [7, 12, :] (the 13th entry, counting from zero) results in [7, 7, 7, 7].
Hard to explain but maybe someone has some advice on how to achieve this with numpy?
EDIT:
I've found a solution, though it seems a little overcomplicated:
# Generate extra layer
layer = np.repeat(np.arange(0, 12)[:, np.newaxis], data.shape[2], axis=1)
# Get dimensions right...
layer = np.expand_dims(layer, axis=1)
# ... and finally append to data
result = np.append(data, layer, axis=1)
Still open for better suggestions.

You have the right idea. A slight simplification:
layer = np.repeat(np.arange(data.shape[0])[:, None, None], data.shape[2], axis=2)
result = np.concatenate((data, layer), axis=1)
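For what it's worth, here is a small self-contained check of the same idea. It is only a sketch: the zero-filled `data` array is a stand-in I made up for the real cube, and `np.broadcast_to` is swapped in for `np.repeat` so the extra layer is not copied before concatenation.

```python
import numpy as np

# Stand-in for the 12 x 12 x 4 array from the question (contents are arbitrary).
data = np.zeros((12, 12, 4))

# Index values along the first axis, shaped (12, 1, 1) so they broadcast.
idx = np.arange(data.shape[0])[:, None, None]

# Broadcast to (12, 1, 4) as a read-only view, then append along axis 1.
layer = np.broadcast_to(idx, (data.shape[0], 1, data.shape[2]))
result = np.concatenate((data, layer), axis=1)

print(result.shape)               # (12, 13, 4)
print(result[7, 12, :].tolist())  # [7.0, 7.0, 7.0, 7.0]
```

Note that the result takes the common dtype of both inputs, so with a float `data` the indices come back as floats.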

Related

how to avoid split and sum of pieces in pytorch or numpy

I want to split a long vector into smaller unequal pieces, do a summation on each piece and gather the results into a new vector.
I need to do this in pytorch but I am also interested to see how this is done with numpy.
This can easily be accomplished by splitting the vector.
sizes = [3, 7, 5, 9]
X = torch.ones(sum(sizes))
Y = torch.tensor([s.sum() for s in torch.split(X, sizes)])
or with np.ones and np.split.
Is there a more efficient way to do this?
Edit:
Inspired by the first comment:
indices = np.cumsum([0]+sizes)[:-1]
Y = np.add.reduceat(X, indices.tolist())
solves it for numpy. I am still looking for a solution with pytorch.
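A quick way to convince yourself that the reduceat version matches the split-and-sum version is to run both on data with distinct values. This is just a sketch; the names `starts`, `Y_reduceat` and `Y_split` are mine. One caveat worth knowing: for a zero-length piece, `np.add.reduceat` does not return 0 but the element sitting at that start index, so this approach assumes every size is at least 1.

```python
import numpy as np

sizes = [3, 7, 5, 9]
X = np.arange(sum(sizes), dtype=float)  # distinct values make mistakes visible

# Start offset of each piece: [0, 3, 10, 15]
starts = np.cumsum([0] + sizes)[:-1]

# Sum of each piece via reduceat, and via the explicit split for comparison.
Y_reduceat = np.add.reduceat(X, starts)
Y_split = np.array([piece.sum() for piece in np.split(X, starts[1:])])

print(Y_reduceat.tolist())  # [3.0, 42.0, 60.0, 171.0]
print(Y_split.tolist())     # [3.0, 42.0, 60.0, 171.0]
```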
index_add_ is your friend!
# inputs
sizes = torch.tensor([3, 7, 5, 9], dtype=torch.long)
x = torch.ones(sizes.sum())
# prepare an index vector for summation (what elements of x are summed to each element of y)
ind = torch.zeros(sizes.sum(), dtype=torch.long)
ind[torch.cumsum(sizes, dim=0)[:-1]] = 1
ind = torch.cumsum(ind, dim=0)
# prepare the output
y = torch.zeros(len(sizes))
# do the actual summation
y.index_add_(0, ind, x)
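A possible variation, as a sketch rather than a definitive answer: the index vector can also be built with `torch.repeat_interleave`, which repeats each segment id as many times as its size. The name `seg` is mine. As far as I can tell this also copes with zero-length pieces, since an id repeated zero times simply never appears.

```python
import torch

sizes = torch.tensor([3, 7, 5, 9])
x = torch.arange(int(sizes.sum()), dtype=torch.float32)

# Segment id for every element of x: 0,0,0,1,1,1,1,1,1,1,2,...
seg = torch.repeat_interleave(torch.arange(len(sizes)), sizes)

# Accumulate each element of x into the slot named by its segment id.
y = torch.zeros(len(sizes), dtype=x.dtype)
y.index_add_(0, seg, x)

print(y.tolist())  # [3.0, 42.0, 60.0, 171.0]
```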

Cleaner way to whiten each image in a batch using keras

I would like to whiten each image in a batch. The code I have to do so is this:
def whiten(self, x):
    shape = x.shape
    x = K.batch_flatten(x)
    mn = K.mean(x, 0)
    std = K.std(x, 0) + K.epsilon()
    r = (x - mn) / std
    # reshape the normalized tensor back to the input layout
    r = K.reshape(r, (-1, shape[1], shape[2], shape[3]))
    return r
where x is (?, 320,320,1). I am not keen on the reshape function with a -1 arg. Is there a cleaner way to do this?
Let's see what the -1 does. From the TensorFlow documentation (the Keras documentation is sparse by comparison):
If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant.
So what this means:
import tensorflow as tf
from keras import backend as K
X = tf.constant([1,2,3,4,5])
K.reshape(X, [-1, 5])
# Add one more dimension, the number of columns should be 5, and keep the number of elements to be constant
# [[1 2 3 4 5]]
X = tf.constant([1,2,3,4,5,6])
K.reshape(X, [-1, 3])
# Add one more dimension, the number of columns should be 3
# For the number of elements to be constant the number of rows should be 2
# [[1 2 3]
# [4 5 6]]
I think it is simple enough. So what happens in your code:
# Let's assume we have 5 images, 320x320 with 3 channels
X = tf.ones((5, 320, 320, 3))
shape = X.shape
# Let's flatten the tensor so we can perform the rest of the computation
flatten = K.batch_flatten(X)
# What this did is: Turn a nD tensor into a 2D tensor with same 0th dimension. (Taken from the documentation directly, let's see that below)
flatten.shape
# (5, 307200)
# So all the other elements were squeezed in 1 dimension while keeping the batch_size the same
# ...The rest of the stuff in your code is executed here...
# So we did all we wanted and now we want to revert the tensor in the shape it had previously
r = K.reshape(flatten, (-1, shape[1],shape[2],shape[3]))
r.shape
# (5, 320, 320, 3)
Besides, I can't think of a cleaner way to do what you want to do. If you ask me, your code is already clear enough.
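One option that may sidestep the flatten/reshape round trip entirely is to reduce over the non-batch axes with keepdims, so the statistics broadcast back against the input. Below is a sketch in NumPy only; the function name `whiten_per_image` and the `eps` parameter (standing in for K.epsilon()) are my own. K.mean and K.std take `axis` and `keepdims` arguments too, so the same pattern should carry over to the backend. Also note that reducing over axis 0, as in the question, averages across the batch rather than within each image.

```python
import numpy as np

def whiten_per_image(x, eps=1e-7):
    """Standardize each image in a (batch, H, W, C) array independently."""
    axes = (1, 2, 3)  # reduce over everything except the batch axis
    mean = x.mean(axis=axes, keepdims=True)      # shape (batch, 1, 1, 1)
    std = x.std(axis=axes, keepdims=True) + eps  # eps avoids division by zero
    return (x - mean) / std                      # broadcasts, shape unchanged

# Made-up batch of 4 single-channel 32x32 images for illustration.
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(4, 32, 32, 1))
out = whiten_per_image(batch)

print(out.shape)  # (4, 32, 32, 1)
```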

pytorch equivalent tf.gather

I'm having some trouble porting some code over from tensorflow to pytorch.
So I have a matrix with dimensions 10x30 representing 10 examples, each with 30 features. Then I have another matrix with dimensions 10x5 containing the indices of the 5 closest examples for each example in the first matrix. Using the indices in the second matrix, I want to 'gather' the 5 closest examples for each example in the first matrix, leaving me with a 3D tensor of shape 10x5x30.
In tensorflow this is done with tf.gather(matrix1, matrix2). Does anyone know how I could do this in pytorch?
How about this?
matrix1 = torch.randn(10, 30)
matrix2 = torch.randint(high=10, size=(10, 5))
gathered = matrix1[matrix2]
It uses the trick of indexing with an array of integers.
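To make the shapes concrete, here is a small sketch that repeats the indexing trick and cross-checks it against `torch.index_select` on the flattened indices. The variable `alt` is mine and exists only for the comparison.

```python
import torch

matrix1 = torch.randn(10, 30)                   # 10 examples, 30 features
matrix2 = torch.randint(high=10, size=(10, 5))  # 5 neighbour indices per example

# Integer-tensor indexing picks a whole row of matrix1 for every index.
gathered = matrix1[matrix2]

# Same rows selected explicitly, then reshaped to (10, 5, 30).
alt = torch.index_select(matrix1, 0, matrix2.reshape(-1)).reshape(10, 5, 30)

print(gathered.shape)              # torch.Size([10, 5, 30])
print(torch.equal(gathered, alt))  # True
```

So gathered[i, j] is the feature vector of the j-th neighbour of example i.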
I had a scenario where I had to apply gather() to a tensor of integers.
Example 1: the method form, my_tensor.gather(dim, index)
import torch
# input_tensor is the index tensor; gather() expects it to be int64
input_tensor = torch.tensor([1])
my_list = [0, 1, 2, 3, 4]
my_tensor = torch.IntTensor(my_list)
output = my_tensor.gather(0, input_tensor) # 0 is the dimension
Example 2: the function form, torch.gather(input, dim, index)
import torch
# input_tensor is the index tensor; gather() expects it to be int64
input_tensor = torch.tensor([1])
my_list = [0, 1, 2, 3, 4]
my_tensor = torch.IntTensor(my_list)
output = torch.gather(my_tensor, 0, input_tensor) # 0 is the dimension

Multiply certain columns of a 2D tensor by a scalar

Is there a way using tf functions to multiply certain columns of a 2D tensor by a scalar?
e.g. multiply the second and third column of a matrix by 2:
[[2,3,4,5],[4,3,4,3]] -> [[2,6,8,5],[4,6,8,3]]
Thanks for any help.
EDIT:
Thank you Psidom for the reply. Unfortunately I am not using a tf.Variable, so it seems I have to use tf.slice.
What I am trying to do is to multiply all components by 2 of a single-sided PSD, except for the DC component and the Nyquist frequency component, to conserve the total power when going from a double-sided spectrum to a single-sided spectrum.
This would correspond to: 2*PSD[:,1:-1] if it was a numpy array.
Here is my attempt with tf.assign and tf.slice:
x['PSD'] = tf.assign(tf.slice(x['PSD'], [0, 1], [tf.shape(x['PSD'])[0], tf.shape(x['PSD'])[1] - 2]),
tf.scalar_mul(2, tf.slice(x['PSD'], [0, 1], [tf.shape(x['PSD'])[0], tf.shape(x['PSD'])[1] - 2]))) # single-sided power spectral density.
However:
AttributeError: 'Tensor' object has no attribute 'assign'
If the tensor is a variable, you can do this by slicing the columns you want to update and then use tf.assign:
x = tf.Variable([[2,3,4,5],[4,3,4,3]])
x = tf.assign(x[:,1:3], x[:,1:3]*2) # update the second and third columns and assign
                                    # the new tensor to x
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(x))
#[[2 6 8 5]
# [4 6 8 3]]
Ended up taking 3 different slices and concatenating them together, with the middle slice multiplied by 2. Probably not the most efficient way, but it works:
x['PSD'] = tf.concat([tf.slice(x['PSD'], [0, 0], [tf.shape(x['PSD'])[0], 1]),
tf.scalar_mul(2, tf.slice(x['PSD'], [0, 1], [tf.shape(x['PSD'])[0], tf.shape(x['PSD'])[1] - 2])),
tf.slice(x['PSD'], [0, tf.shape(x['PSD'])[1] - 1], [tf.shape(x['PSD'])[0], 1])], 1) # single-sided power spectral density.
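Another way to look at it, which may avoid the slicing altogether: multiply the whole matrix by a per-column scale vector that is 1 for the first and last column and 2 everywhere else. The sketch below is in NumPy, and the names `psd`, `scale` and `single_sided` are mine. Elementwise multiplication in TensorFlow broadcasts the same way, so I'd assume the same scale vector would work there, but I have not run it against the original graph.

```python
import numpy as np

psd = np.array([[2, 3, 4, 5],
                [4, 3, 4, 3]])

# Per-column scale factors: 1 for the first and last column, 2 elsewhere.
scale = np.full(psd.shape[1], 2)
scale[0] = 1
scale[-1] = 1

# The (4,) vector broadcasts over every row of the (2, 4) matrix.
single_sided = psd * scale

print(single_sided.tolist())  # [[2, 6, 8, 5], [4, 6, 8, 3]]
```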

Tensorflow embedding lookup with unequal sized lists

Hi guys,
I'm trying to project multi labeled categorical data into a dense space using embeddings.
Here's a toy example. Let's say I have four categories and want to project them into a 2D space. Furthermore, I have two instances, the first one belonging to category 0 and the second one to category 1.
The code will look something like this:
sess = tf.InteractiveSession()
embeddings = tf.Variable(tf.random_uniform([4, 2], -1.0, 1.0))
sess.run(tf.global_variables_initializer())
y = tf.nn.embedding_lookup(embeddings, [0,1])
y.eval()
and returns something like this:
array([[ 0.93999457, -0.83051205],
[-0.1699729 , 0.73936272]], dtype=float32)
So far, so good. Now imagine an instance belongs to two categories. The embedding lookup will return two vectors which I can reduce by mean for example:
y = tf.nn.embedding_lookup(embeddings, [[0,1],[1,2]]) # two categories
y_ = tf.reduce_mean(y, axis=1)
y_.eval()
This works just as I expect, too. My problem arises when the instances in my batch do not all belong to the same number of categories, e.g.:
y = tf.nn.embedding_lookup(embeddings, [[0,1],[1,2,3]]) # unequal sized lists
y_ = tf.reduce_mean(y, axis=1)
y_.eval()
ValueError: Argument must be a dense tensor: [[0, 1], [1, 2, 3]] - got shape [2], but wanted [2, 2].
Any idea about how to get around this problem?
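One common workaround is to pad the id lists to a common length and mask out the padding before averaging. The sketch below shows that idea in NumPy only; the `PAD` sentinel and all variable names are hypothetical, and the random `embeddings` matrix just mimics the one above. TensorFlow also provides tf.nn.embedding_lookup_sparse with a combiner argument, which may be the more direct route, though I have not tested it for this case.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.uniform(-1.0, 1.0, size=(4, 2))  # 4 categories, 2-D space

ragged_ids = [[0, 1], [1, 2, 3]]  # unequal sized lists

# Pad every list to the same length; PAD is a hypothetical sentinel value.
PAD = -1
max_len = max(len(ids) for ids in ragged_ids)
padded = np.array([ids + [PAD] * (max_len - len(ids)) for ids in ragged_ids])

mask = padded != PAD                  # True where an id is real
safe_ids = np.where(mask, padded, 0)  # padding looks up row 0, masked out below
looked_up = embeddings[safe_ids]      # shape (2, 3, 2)

# Zero out the padded positions, then divide by the number of real ids.
summed = (looked_up * mask[..., None]).sum(axis=1)
mean = summed / mask.sum(axis=1, keepdims=True)

print(mean.shape)  # (2, 2)
```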