I'm trying out something from a paper[1], where it is required to have a common bias term for all the neurons in the FC layer, instead of individual biases for each neuron.
How do I do it in Tensorflow?
Is it possible to create an FC layer without biases, and finally add tf.nn.bias_add()? Is that the right way to do it?
If so, there is no flag to set tf.nn.bias_add() as trainable. Will it work?
Or any other suggestions?
References:
[1]: Deep Reinforcement Learning with Double Q-Learning (Pg.6 right side)
tf.nn.bias_add(value, bias) requires A 1-D Tensor with size matching the last dimension of value. so just tf.add will do:
X = tf.placeholder(tf.float32, [None, 32])
W = tf.Variable(tf.truncated_normal([32, 16]))
#Single bias for all the 16 output
bias = tf.Variable(tf.truncated_normal([1]))
#Works as expected
y = tf.add(tf.matmul(X,W), bias)
#ValueError: Dimensions must be equal, but are 16 and 1 for 'BiasAdd'
z = tf.nn.bias_add(tf.matmul(X,W), bias)
Related
So recently i am working on a project which i am supposed to take images as input to a CNN and extract the features and feed them to LSTM for training. I am using 2 Layer CNN for feature extraction and im taking the features form fully connected layer and trying to feed them to LSTM. Problem is when i want to feed the FC layer to LSTM as input i get error regarding to wrong dimension. my FC layer is a Tensor with (128,1024) dimension. I tried to reshape it like this tf.reshape(fc,[-1]) which gives me a tensor ok (131072, )
dimension and still wont work. Could anyone give me any ideas of how im suppose to feed the FC to LSTM?here i just write part of my code and teh error i get.
Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 32 filters and a kernel size of 5
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to work correctly).
If you are taking a sequence of images as input your input tensor should be 5D (batch, sequence_index, x, y, channel) or something permutation like that. conv2d should complain about the extra dimension but you probably missing one of them. You should try to fix it first.
Next use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have 5D tensor, but x and y dimensions should be 1 (you should check this, and fix the operation if that's not the case).
The RNN part expects 3D tensors (batch, sequence_index, fature_index). You can use tf.squeeze to remove the 1 sized dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, it's OK, but the operations will still expect the dimension to be there (but for you it will be 1). Missing the dimension will cause problems with shapes down the line.
I've learned ML and have been learning DL from Andrew N.G's coursera courses, and every time he talks about a linear classifier, the weights are just a 1-D vector.
Even during the assignments, when we roll an image into a 1-D vector(pixels * 3), the weights would still be a 1-D vector.
I now have started O'Reilly's "Learning TensorFlow" book, and came across the first example. The weights initialization in tensorflow was a bit different.
The book says the following(Page 14):
"Since we are not going to use the spatial information at this point, we will unroll our image pixels as a single long vector denoted x (Figure 2-2). Then
$xw^0 = ∑x_i w^0_i$
will be the evidence for the image containing the digit 0 (and in the same way we will have $w^d$ weight vectors for each one of the other digits, d = 1, . . . , 9)."
and the corresponding TensorFlow code:
data = input_data.read_data_sets(DATA_DIR, one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y_true = tf.placeholder(tf.float32, [None, 10])
y_pred = tf.matmul(x, W)
Why are the weights 2-D here. Are weights 2-D in softmax Linear Classifier?
In the coursera course, when he taught Softmax Linear Classifier, he still says the weights are 1-D. Can anyone explain this?
Yes, You are right that weights are 1-D, but that is just for 1 neuron.
If you consider a straightforward layered neural network, it will have some number of layers(Just 1 layer with 10 neurons in your code). So, in tensorflow, the weights variable contains weights for entire layer and not a single neuron, which makes it a 2-D array.
W = tf.Variable(tf.zeros([784, 10]))
This line means that there are 10 neurons, each with a weight array of length 784.
One rule of thumb to understand this in tensorflow is that weight dimentions are written as..
W = tf.Variable(tf.zeros([output_of_previous_layer, output_of_current_layer]))
or
W = tf.Variable(tf.zeros([input_of_current_layer, input_of_next_layer]))
You can read more about this at Intro to Neural Networks
I'm studying LSTM with CNN in tensorflow.
I want to put some scalar label into LSTM network as a condition.
Does anybody know which LSTM is what I meant?
If available, please let me know the usage of that
Thank you.
This thread might interest you: Adding Features To Time Series Model LSTM.
You have basically 3 possible ways:
Let's take an example with weather data from two different cities: Paris and San Francisco. You want to predict the next temperature based on historical data. But at the same time, you expect the weather to change based on the city. You can either:
Combine the auxiliary features with the time series data, at the beginning or at the end (ugly!).
Concatenate the auxiliary features with the output of the RNN layer. It's some kind of post-RNN adjustment since the RNN layer won't see this auxiliary info.
Or just initialize the RNN states with a learned representation of the condition (e.g. Paris or San Francisco).
I wrote a library to condition on auxiliary inputs. It abstracts all the complexity and has been designed to be as user-friendly as possible:
https://github.com/philipperemy/cond_rnn/
The implementation is in tensorflow (>=1.13.1) and Keras.
Hope it helps!
Heres an example of applying CNN and LSTM over the output probabilities of a sequence, like you asked:
def build_model(inputs):
BATCH_SIZE = 4
NUM_CLASSES = 2
NUM_UNITS = 128
H = 224
W = 224
C = 3
TIME_STEPS = 4
# inputs is assumed to be of shape (BATCH_SIZE, TIME_STEPS, H, W, C)
# reshape your input such that you can apply the CNN for all images
input_cnn_reshaped = tf.reshape(inputs, (-1, H, W, C))
# define CNN, for instance vgg 16
cnn_logits_output, _ = vgg_16(input_cnn_reshaped, num_classes=NUM_CLASSES)
cnn_probabilities_output = tf.nn.softmax(cnn_logits_output)
# reshape back to time series convention
cnn_probabilities_output = tf.reshape(cnn_probabilities_output, (BATCH_SIZE, TIME_STEPS, NUM_CLASSES))
# perform LSTM over the probabilities per image
cell = tf.contrib.rnn.LSTMCell(NUM_UNITS)
_, state = tf.nn.dynamic_rnn(cell, cnn_probabilities_output)
# employ FC layer over the last state
logits = tf.layers.dense(state, NUM_UNITS)
# logits is of shape (BATCH_SIZE, NUM_CLASSES)
return logits
By the way, a better approach would be to employ the LSTM over the last hidden layer, i.e to use the CNN as feature extractor and make the prediction over sequences of features.
Are the bias terms added by default when creating the tensorflow neural network models?
To rephrase, if x is the input to a particular layer, y is the ouput, W is the weight matrix and b is the bias , then the output of the layer is given by,
y = W^t x + b
So is the bias added by default when we create the model ?
If you are creating your own model from scratch, you have to create your own trainable variables for the weights and biases explicitly. Tensorflow does not create them by default.
x = tf.placeholder(tf.float32, shape=(None, n))
W = tf.Variable(tf.random_normal([n,m], stddev=0.01))
b = tf.Variable(tf.zeros([m]))
y = tf.matmul(x,W) + b
No, You need to define biases and all the variable by yourself. If you not defined biases then your model will be
y = w^tx
I am not clear which case did you refer in the three possibilities(at least) bellow:
suppose the layer is a basic rnn layer(though you provided only one input and one output), then the answer is yes. Check out this source code. The third parameter of the linear function is true, indicating the bias is added in default.
suppose that you did a linear projection, then you can custom it yourself. A bias can be omitted, for example in this.
and using this API: tf.nn.xw_plus_b, you can define your own bias.
I'm trying to use the Tensorflow's CTC implementation under contrib package (tf.contrib.ctc.ctc_loss) without success.
First of all, anyone know where can I read a good step-by-step tutorial? Tensorflow's documentation is very poor on this topic.
Do I have to provide to ctc_loss the labels with the blank label interleaved or not?
I could not be able to overfit my network even using a train dataset of length 1 over 200 epochs. :(
How can I calculate the label error rate using tf.edit_distance?
Here is my code:
with graph.as_default():
max_length = X_train.shape[1]
frame_size = X_train.shape[2]
max_target_length = y_train.shape[1]
# Batch size x time steps x data width
data = tf.placeholder(tf.float32, [None, max_length, frame_size])
data_length = tf.placeholder(tf.int32, [None])
# Batch size x max_target_length
target_dense = tf.placeholder(tf.int32, [None, max_target_length])
target_length = tf.placeholder(tf.int32, [None])
# Generating sparse tensor representation of target
target = ctc_label_dense_to_sparse(target_dense, target_length)
# Applying LSTM, returning output for each timestep (y_rnn1,
# [batch_size, max_time, cell.output_size]) and the final state of shape
# [batch_size, cell.state_size]
y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes), # num_proj=num_classes
data,
dtype=tf.float32,
sequence_length=data_length,
)
# For sequence labelling, we want a prediction for each timestamp.
# However, we share the weights for the softmax layer across all timesteps.
# How do we do that? By flattening the first two dimensions of the output tensor.
# This way time steps look the same as examples in the batch to the weight matrix.
# Afterwards, we reshape back to the desired shape
# Reshaping
logits = tf.transpose(y_rnn1, perm=(1, 0, 2))
# Get the loss by calculating ctc_loss
# Also calculates
# the gradient. This class performs the softmax operation for you, so inputs
# should be e.g. linear projections of outputs by an LSTM.
loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))
# Define our optimizer with learning rate
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)
# Decoding using beam search
decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)
Thanks!
Update (06/29/2016)
Thank you, #jihyeon-seo! So, we have at input of RNN something like [num_batch, max_time_step, num_features]. We use the dynamic_rnn to perform the recurrent calculations given the input, outputting a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need to do an affine projection in each tilmestep with weight sharing, so we've to reshape to [num_batch*max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], sum a bias undo the reshape, transpose (so we will have [max_time_steps, num_batch, num_classes] for ctc loss input), and this result will be the input of ctc_loss function. Did I do everything correct?
This is the code:
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights accross timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
Update (07/11/2016)
Thank you #Xiv. Here is the code after the bug fix:
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights accross timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
self._logits = tf.transpose(self._logits, (1,0,2))
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
Update (07/25/16)
I published on GitHub part of my code, working with one utterance. Feel free to use! :)
I'm trying to do the same thing.
Here's what I found you may be interested in.
It was really hard to find the tutorial for CTC, but this example was helpful.
And for the blank label, CTC layer assumes that the blank index is num_classes - 1, so you need to provide an additional class for the blank label.
Also, CTC network performs softmax layer. In your code, RNN layer is connected to CTC loss layer. Output of RNN layer is internally activated, so you need to add one more hidden layer (it could be output layer) without activation function, then add CTC loss layer.
See here for an example with bidirectional LSTM, CTC, and edit distance implementations, training a phoneme recognition model on the TIMIT corpus. If you train on that corpus's training set, you should be able to get phoneme error rates down to 20-25% after 120 epochs or so.