I have a convolutional autoencoder model. While an autoencoder typically focuses on reconstructing the input without using any label information, I want to use the class label to perform class conditional scaling/shifting after convolutions. I am curious if utilizing the label in this way might help produce better reconstructions.
num_filters = 32
input_img = layers.Input(shape=(28, 28, 1)) # input image
label = layers.Input(shape=(10,)) # label
# separate scale value for each of the filter dimensions
scale = layers.Dense(num_filters, activation=None)(label)
# conv_0 produces something of shape (None,14,14,32)
conv_0 = layers.Conv2D(num_filters, (3, 3), strides=2, activation=None, padding='same')(input_img)
# TODO: Need help here. Multiply conv_0 by scale along each of the filter dimensions.
# This still outputs something of shape (None,14,14,32)
# Essentially each 14x14x1 slice has its own scalar multiplier
In the example above, the output of the convolutional layer is (14,14,32) and the scale layer is of shape (32,). I want the convolutional output to be multiplied by the corresponding scale value along each filter dimension. For example, if these were numpy arrays I could do something like conv_0[:, :, i] * scale[i] for i in range(32).
I looked at tf.keras.layers.Multiply, but based on the documentation I believe it expects input tensors of the same shape. How do I work around this?
You don't have to loop. Simply do the following by making two tensors broadcast-compatible,
out = layers.Multiply()([conv_0, tf.expand_dims(tf.expand_dims(scale,axis=1), axis=1)])
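If you prefer to stay entirely within Keras layers, here is a minimal sketch of the same idea using layers.Reshape to make the scale broadcastable (the exact layer arrangement below is an illustrative assumption, not the only way to wire it):
num_filters = 32
input_img = layers.Input(shape=(28, 28, 1))
label = layers.Input(shape=(10,))
scale = layers.Dense(num_filters, activation=None)(label)        # (None, 32)
scale = layers.Reshape((1, 1, num_filters))(scale)               # (None, 1, 1, 32), broadcastable
conv_0 = layers.Conv2D(num_filters, (3, 3), strides=2, padding='same')(input_img)  # (None, 14, 14, 32)
out = layers.Multiply()([conv_0, scale])                         # (None, 14, 14, 32)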
I don't know if I actually understood what you are trying to achieve, but I did a quick numpy test. I believe it should hold in TensorFlow as well:
conv_0 = np.ones([14, 14, 32])
scale = np.array([ i + 1 for i in range(32)])
result = conv_0 * scale
# check that a channel-wise slice was indeed scaled element-wise, in this case by the element at index 1 of scale, which is 2
conv_0_slice_1 = conv_0[:, :, 1]
result_slice_1 = result[:, :, 1]
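For completeness, a quick sanity check of that slice (assuming numpy is imported as np):
assert np.array_equal(result_slice_1, conv_0_slice_1 * 2)
print(result_slice_1[0, 0])  # 2.0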
I am trying to understand Bahdanau's attention using the following tutorial:
https://www.tensorflow.org/tutorials/text/nmt_with_attention
The calculation is the following:
self.attention_units = attention_units
self.W1 = Dense(self.attention_units)
self.W2 = Dense(self.attention_units)
self.V = Dense(1)
score = self.V(tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)))
I have two problems:
1) I cannot understand why the shape of tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) is (batch_size, max_len, attention_units)?
Using the rules of matrix multiplication I got the following results:
a) Shape of self.W1(last_inp_dec) -> (1,hidden_units_dec) * (hidden_units_dec,attention_units) = (1,attention_units)
b) Shape of self.W2(input_enc) -> (max_len, hidden_units_enc) * (hidden_units_enc, attention_units) = (max_len, attention_units)
Then we add up the quantities from a) and b). How do we end up with the dimensionality (max_len, attention_units) or (batch_size, max_len, attention_units)? How can we perform the addition when the dimensions differ (1 vs max_len)?
2) Why do we multiply tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) by self.V? Because we want the alphas to be scalars?
1) I cannot understand why the shape of tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) is (batch_size, max_len, attention_units)?
From the comments section of the code in class BahdanauAttention
query_with_time_axis shape = (batch_size, 1, hidden size)
Note that the dimension of size 1 was added using tf.expand_dims to make the shape compatible with values for the addition; that added dimension gets broadcast during the addition operation. Otherwise the incoming shape would have been (batch_size, hidden size), which is not compatible.
values shape = (batch_size, max_len, hidden size)
Adding the query_with_time_axis-shaped tensor (broadcast over the time axis) to the values-shaped tensor gives a result of shape (batch_size, max_len, hidden size); since W1 and W2 map the last dimension to attention_units before the addition in the code, the tanh output has shape (batch_size, max_len, attention_units).
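A minimal sketch of that broadcast (the sizes are arbitrary assumptions, just to show the shapes):
import tensorflow as tf
batch_size, max_len, hidden = 4, 19, 256
query_with_time_axis = tf.zeros([batch_size, 1, hidden])
values = tf.zeros([batch_size, max_len, hidden])
print((query_with_time_axis + values).shape)  # (4, 19, 256)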
2) Why do we multiply tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) by self.V? Because we want the alphas to be scalars?
self.V is the final layer, whose output gives us the score. The random weight initialization of the self.V layer is handled behind the scenes by Keras in the line self.V = tf.keras.layers.Dense(1).
We are not multiplying tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) by self.V.
The construct self.V(tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc))) means that the tanh activations resulting from tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) form the input matrix to the single-unit output layer represented by self.V.
The shapes are slightly different from the ones you have given. Perhaps it is best understood with a concrete example:
Assuming 10 units in the alignment layer, 128 embedding dimensions on the decoder, 256 dimensions on the encoder, and 19 timesteps:
The last_inp_dec and input_enc shapes would be (?,128) and (?,19,256). We now need to expand last_inp_dec over the time axis to make it (?,1,128) so that the addition is possible.
The kernel weights for w1, w2 and v will be (128,10), (256,10) and (10,1) respectively (Dense weights carry no batch dimension). Notice how self.w1(last_inp_dec) works out to (?,1,10). This is added, broadcasting over the time axis, to self.w2(input_enc) to give a shape of (?,19,10). The result is fed to self.v and the output is (?,19,1), which is the shape we want - a set of 19 weights. Softmaxing this gives the attention weights.
Multiplying these attention weights with each encoder hidden state and summing up returns the context.
To your question as to why 'v' is needed: Bahdanau provides the option of using 'n' units in the alignment layer (which determines the sizes of w1 and w2), and we need one more layer on top to massage the tensor back to the shape we want - a set of attention weights, one for each time step.
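A minimal sketch that reproduces these shapes with Keras layers (the sizes 128/256/19/10 are the ones assumed above, and the random inputs are only there for the shape check):
import tensorflow as tf

batch, max_len, enc_units, dec_emb, att_units = 2, 19, 256, 128, 10
last_inp_dec = tf.random.normal([batch, dec_emb])            # (?, 128)
input_enc = tf.random.normal([batch, max_len, enc_units])    # (?, 19, 256)

W1 = tf.keras.layers.Dense(att_units)
W2 = tf.keras.layers.Dense(att_units)
V = tf.keras.layers.Dense(1)

query = tf.expand_dims(last_inp_dec, 1)                      # (?, 1, 128)
score = V(tf.nn.tanh(W1(query) + W2(input_enc)))             # (?, 19, 1)
attention_weights = tf.nn.softmax(score, axis=1)             # (?, 19, 1)
print(score.shape, attention_weights.shape)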
I just posted an answer at Understanding Bahdanau's Attention Linear Algebra with all the shapes of the tensors and weights involved.
I have been going through the implementation of the neural network in OpenAI's code for the Vanilla Policy Gradient (as a matter of fact, this part is used nearly everywhere). The code looks something like this:
def mlp_categorical_policy(x, a, hidden_sizes, activation, output_activation, action_space):
    act_dim = action_space.n
    logits = mlp(x, list(hidden_sizes) + [act_dim], activation, None)
    logp_all = tf.nn.log_softmax(logits)
    pi = tf.squeeze(tf.random.categorical(logits, 1), axis=1)
    logp = tf.reduce_sum(tf.one_hot(a, depth=act_dim) * logp_all, axis=1)
    logp_pi = tf.reduce_sum(tf.one_hot(pi, depth=act_dim) * logp_all, axis=1)
    return pi, logp, logp_pi
and this multi-layer perceptron network is defined as follows:
def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(inputs=x, units=h, activation=activation)
    return tf.layers.dense(inputs=x, units=hidden_sizes[-1], activation=output_activation)
My question is: what is the return from this mlp function? I mean the structure or shape. Is it an N-dimensional tensor? If so, how is it given as an input to tf.random.categorical? If not, and it just has the shape [hidden_layer2, output], then what happened to the other layers? As per the documentation, tf.random.categorical only takes a 2-D input. The complete code of OpenAI's VPG algorithm can be found here. The mlp is implemented here. I would be highly grateful if someone could tell me what this mlp_categorical_policy() is doing.
Note: The hidden size is [64, 64], the action dimension is 3
Thanks and cheers
Note that this is a discrete action space - there are action_space.n different possible actions at every step, and the agent chooses one.
To do this, the MLP returns the logits (which are a function of the probabilities) of the different actions. This is specified in the code by + [act_dim], which appends the action count of the action_space as the size of the final MLP layer. Note that the last layer of an MLP is the output layer. The input layer is not specified in TensorFlow; it is inferred from the inputs.
tf.random.categorical takes the logits and samples a policy action pi from them, which is returned as a number.
mlp_categorical_policy also returns logp, the log probability of the action a (used to assign credit), and logp_pi, the log probability of the policy action pi.
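As a small TF2-eager sketch of how logp is gathered from logp_all (the numbers are made up and only illustrate the one_hot masking trick):
import tensorflow as tf
logp_all = tf.math.log(tf.constant([[0.2, 0.5, 0.3]]))   # log-probabilities of 3 actions, batch of 1
a = tf.constant([1])                                      # the action that was actually taken
logp = tf.reduce_sum(tf.one_hot(a, depth=3) * logp_all, axis=1)
print(logp.numpy())                                       # approx. [log(0.5)] = [-0.693]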
It seems your question is more about the return from the mlp.
The mlp creates a series of fully connected layers in a loop. In each iteration of the loop, the mlp creates a new layer using the previous layer x as an input and assigns its output to overwrite x, with this line: x = tf.layers.dense(inputs=x, units=h, activation=activation).
So the output is not the same as the input; on each iteration x is overwritten with the value of the new layer. This is the same kind of coding trick as x = x + 1, which increments x by 1. This effectively chains the layers together.
The output of tf.layers.dense is a tensor of shape [:, h] where : is the batch dimension (and can usually be ignored). The creation of the last layer happens outside the loop; it can be seen that the number of nodes in this layer is act_dim (so its shape is [:, 3]). You can check the shape by doing this:
import tensorflow.compat.v1 as tf
import numpy as np
def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(x, units=h, activation=activation)
    return tf.layers.dense(x, units=hidden_sizes[-1], activation=output_activation)
obs = np.array([[1.0,2.0]])
logits = mlp(obs, [64, 64, 3], tf.nn.relu, None)
print(logits.shape)
result: TensorShape([1, 3])
Note that the observation in this case is [1.,2.], it is nested inside a batch of size 1.
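Continuing that sketch, the logits can then be fed to tf.random.categorical to sample an action, just as mlp_categorical_policy does (this is only a shape illustration):
pi = tf.squeeze(tf.random.categorical(logits, 1), axis=1)
print(pi.shape)  # (1,) - one sampled action index (0, 1 or 2) for the single observation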
I have a 1D input signal. I want to compute autocorrelation as the part of the neural net for further use inside the network.
I need to perform convolution of input with input itself.
To perform convolution in a Keras custom layer / TensorFlow, we need the following parameters:
data shape is "[batch, in_height, in_width, in_channels]"
filter shape is "[filter_height, filter_width, in_channels, out_channels]"
There is no batch dimension in the filter shape, but in my case the filter needs to be the input itself (which does have a batch dimension).
TensorFlow now has an auto_correlation function. It should be in release 1.6. If you build from source you can use it right now (see e.g. the github code).
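In more recent stacks, similar functionality also lives in TensorFlow Probability; here is a minimal sketch, assuming tfp.stats.auto_correlation is available in your installed version (the exact location has moved between releases, so treat this as an assumption):
import tensorflow as tf
import tensorflow_probability as tfp

x = tf.random.normal([4, 100])                   # batch of 4 signals of length 100
acf = tfp.stats.auto_correlation(x, axis=-1)     # autocorrelation along the time axis
print(acf.shape)                                 # (4, 100)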
Here is a possible solution.
By self convolution, I understood a regular convolution where the filter is exactly the same as the input (if it's not that, sorry for my misunderstanding).
We need a custom function for that, and a Lambda layer.
At first I used padding = 'same', which gives outputs with the same length as the inputs. I'm not sure exactly what output length you want, but if you want more, you should add the padding yourself before doing the convolution. (In the example with length 7, for a complete convolution from one end to the other, this manual padding would include 6 zeros before and 6 zeros after the input, and would use padding = 'valid'. Find the backend functions here.)
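For reference, a sketch of that manual padding, assuming K.temporal_padding is available in your Keras backend (length - 1 = 6 zeros on each side of the time axis, to be combined with padding='valid' in the convolution):
x_padded = K.temporal_padding(x, (length - 1, length - 1))  # (batch, 7 + 6 + 6, channels)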
Working example - Input (5,7,2)
from keras.models import Model
from keras.layers import *
import keras.backend as K
import numpy as np
batch_size = 5
length = 7
channels = 2
channels_batch = batch_size*channels
def selfConv1D(x):
    #this function unfortunately needs to know the shapes beforehand,
    #mainly because of the for loop; for the other lines there are workarounds,
    #but those workarounds are not necessary since we have this limitation anyway

    #original x: (batch_size, length, channels)
    #bring channels to the batch position:
    x = K.permute_dimensions(x, [2,0,1]) #(channels, batch_size, length)

    #treat channels as individual samples (since we don't mix channels)
    x = K.reshape(x, (channels_batch, length, 1))

    #here, we get a copy of x reshaped to match filter shapes:
    filters = K.permute_dimensions(x, [1,2,0]) #(length, 1, channels_batch)

    #now, in the lack of a suitable available conv function, we make a loop
    allChannels = []
    for i in range(channels_batch):
        f = filters[:, :, i:i+1]
        allChannels.append(
            K.conv1d(
                x[i:i+1],
                f,
                padding='same',
                data_format='channels_last'))
        #although channels_last is my default config, I found this bug:
        #https://github.com/fchollet/keras/issues/8183
        #convolution output: (1, length, 1)

    #concatenate all results as samples
    x = K.concatenate(allChannels, axis=0) #(channels_batch, length, 1)

    #restore the original form (passing channels to the end)
    x = K.reshape(x, (channels, batch_size, length))
    return K.permute_dimensions(x, [1,2,0]) #(batch_size, length, channels)
#input data for the test:
x = np.array(range(70)).reshape((5,7,2))
#little model that just performs the convolution
inp= Input((7,2))
out = Lambda(selfConv1D)(inp)
model = Model(inp,out)
#checking results
p = model.predict(x)
for i in range(5):
    print("x", x[i])
    print("p", p[i])
You can just use tf.nn.conv3d by treating the "batch size" as "depth":
# treat the batch size as depth
data = tf.reshape(input_data, [1, batch, in_height, in_width, in_channels])
# the kernel must be an actual 5-D tensor with this shape
# (filter_data below is a placeholder for whatever tensor you use as the filter)
kernel = tf.reshape(filter_data, [filter_depth, filter_height, filter_width, in_channels, out_channels])
out = tf.nn.conv3d(data, kernel, strides=[1, 1, 1, 1, 1], padding='SAME')
As part of my master's project I am implementing a neural network using Google's TensorFlow library. I would like to determine several labels in parallel at the output layer of my feed-forward neural network, and I want to use the softmax function as the activation of the output layer.
Specifically, what I want is an output vector that looks like this:
vec = [0.1, 0.8, 0.1, 0.3, 0.2, 0.5]
Here the first three numbers are the probabilities of the three classes of the first classification and the other three numbers are the probabilities of the three classes of the second classification. So in this case I would say that the labels are:
[ class2 , class3 ]
In a first attempt I tried to implement this by first reshaping the (1x6) vector to a (2x3) matrix with tf.reshape(), then applying the softmax function on the matrix with tf.nn.softmax(), and finally reshaping the matrix back to a vector. Unfortunately, due to the reshaping, the gradient descent optimizer had problems calculating the gradient, so I tried something different.
What I do now is take the (1x6) vector and multiply it by a matrix that has a (3x3) identity matrix in the upper part and a (3x3) zero matrix in the lower part. With this I extract the first three entries of the vector. Then I can apply the softmax function and bring it back into the old (1x6) form by another matrix multiplication. This has to be repeated for the other three vector entries as well.
mask_first = tf.constant([[1,0,0],[0,1,0],[0,0,1],[0,0,0],[0,0,0],[0,0,0]], dtype=tf.float32)
mask_second = tf.constant([[0,0,0],[0,0,0],[0,0,0],[1,0,0],[0,1,0],[0,0,1]], dtype=tf.float32)
outputSoftmax = tf.matmul(tf.nn.softmax(tf.matmul(vec, mask_first)), tf.transpose(mask_first)) \
              + tf.matmul(tf.nn.softmax(tf.matmul(vec, mask_second)), tf.transpose(mask_second))
It works so far, but I don't like this solution.
Because in my real problem I have to determine not just two labels at a time but 91, I would have to repeat the procedure from above 91 times.
Does anyone have a solution for how I can obtain the desired vector, where the softmax function is applied to only three entries at a time, without writing the "same" code 91 times?
You could apply the tf.split function to obtain 91 tensors (one for each classification), then apply softmax to each of them.
classes_split = tf.split(0, 91, all_in_one)
for c in classes_split:
    softmax_class = tf.nn.softmax(c)
    # use softmax_class to compute some loss, add it to overall loss
or instead of computing the loss directly, you could also concatenate them together again:
classes_split = tf.split(0, 91, all_in_one)
# softmax each split individually
classes_split_softmaxed = [tf.nn.softmax(c) for c in classes_split]
# Concatenate again
all_in_one_softmaxed = tf.concat(0, classes_split_softmaxed)
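Note that the snippets above use the argument order of the old TF 0.x API. Here is a sketch of the same idea with the newer signatures tf.split(value, num_or_size_splits, axis) and tf.concat(values, axis), assuming all_in_one has shape (batch, 91*3) and the groups of three sit along the last axis:
classes_split = tf.split(all_in_one, 91, axis=1)                    # 91 tensors of shape (batch, 3)
classes_split_softmaxed = [tf.nn.softmax(c) for c in classes_split] # softmax over each group of 3
all_in_one_softmaxed = tf.concat(classes_split_softmaxed, axis=1)   # back to (batch, 91*3)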