Regarding shape values in the convolutional layer of the CIFAR10 example - TensorFlow

In the CIFAR10 example, conv2 is defined as follows. How do I know that shape=[5, 5, 64, 64] in kernel = _variable_with_weight_decay should be given those values (5, 5, 64, 64)? In addition, in biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1)), the shape is defined as [64]. How are those values obtained?
# conv2
with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv2)

Looking at the source, we see that a call to _variable_with_weight_decay boils down to a tf.get_variable call. We are retrieving a weight tensor (creating one if it doesn't already exist).
In a Convolutional Neural Network, the weight tensor defines a mapping from one layer to the next, but it differs from the weight matrix of a vanilla NN. The convolution implies that you apply a convolutional filter to your input as you map from one layer to the next. This filter is defined by hyper-parameters, which are the ones fed into shape.
There are four parameters fed into shape. The first two give the size of the convolution filter; in this case we have a 5x5 filter. The third parameter defines the input dimension, which in this case is the same as the output dimension of the previous convolution, whose kernel is defined as:
kernel = _variable_with_weight_decay('weights',
                                     shape=[5, 5, 3, 64],
                                     stddev=5e-2,
                                     wd=0.0)
The fourth parameter defines the output dimension of the tensor, that is, the number of filters (output channels) the layer produces.
The bias is a learnable offset that is added to the output of the convolution. One bias value is added per output channel, so the bias vector must have the same size as the output dimension, which in this case is 64.
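The shape rule described above can be sketched in plain Python. The helper name conv_variable_shapes is hypothetical and is not part of the CIFAR10 example; the sketch only does the shape bookkeeping and makes no TensorFlow calls. It assumes that the layers between the two convolutions (pooling and normalization) leave the number of channels unchanged.

```python
def conv_variable_shapes(filter_height, filter_width, in_channels, out_channels):
    """Return (kernel_shape, bias_shape) for a 2-D convolution layer.

    Hypothetical helper: it mirrors the
    [height, width, in_channels, out_channels] kernel layout of tf.nn.conv2d.
    """
    kernel_shape = [filter_height, filter_width, in_channels, out_channels]
    bias_shape = [out_channels]  # one bias value per output channel
    return kernel_shape, bias_shape


# First convolution: 3 input channels, 64 filters (shape taken from the answer).
conv1_kernel, conv1_bias = conv_variable_shapes(5, 5, 3, 64)

# Second convolution: its in_channels is the previous layer's out_channels.
conv2_kernel, conv2_bias = conv_variable_shapes(5, 5, conv1_kernel[-1], 64)

print("conv1:", conv1_kernel, conv1_bias)
print("conv2:", conv2_kernel, conv2_bias)
```

Only the filter size and the number of filters are free choices; the third entry is always dictated by the layer that feeds the convolution.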
Cheers!

Related

Why does TensorFlow Conv2D have two weights matrices?

I have a tf.keras.layers.Conv2D constructed like so:
>>> conv2d_layer = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2)
For reference that layer is part of a network where the prior layer is prior_layer = Conv2D(filters=64, kernel_size=(3, 3), strides=2).
When I call conv2d_layer.get_weights(), it returns a list with two entries:
>>> [w.shape for w in conv2d_layer.get_weights()]
[(3, 3, 64, 128), (128,)]
Why are there two np.ndarrays in conv2d_layer.get_weights()? What are their respective meanings?
The first shape is for the kernel weights of your Conv2D layer, and the second one is the bias for the same layer, which is represented by a vector.
Looking at the documentation, you can see:
"For example, a Dense layer returns a list of two values: the kernel matrix and the bias vector. These can be used to set the weights of another Dense layer."
You have 128 convolution filters, and each filter has a kernel and a bias. Each kernel is 3x3 spatially, and its depth equals the input depth (64 in this example), so a single filter's kernel has shape (3, 3, 64). Stacking all 128 filters gives a kernel tensor of shape (3, 3, 64, 128). There is also one bias per filter, so the second weight has shape (128,).
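As a quick check of the reasoning above, the number of trainable parameters can be counted directly from the two shapes returned by get_weights(). This is a plain-Python sketch; the variable names are illustrative only.

```python
from math import prod

# Shapes reported by conv2d_layer.get_weights() in the question.
weight_shapes = [(3, 3, 64, 128), (128,)]

kernel_params = prod(weight_shapes[0])  # 3 * 3 * 64 weights for each of 128 filters
bias_params = prod(weight_shapes[1])    # one bias per filter
total_params = kernel_params + bias_params

print(kernel_params, bias_params, total_params)
```

The total should agree with the parameter count that the Keras model summary reports for this layer.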

CNN features(dimensions) feed to LSTM Tensorflow

Recently I have been working on a project in which I am supposed to take images as input to a CNN, extract the features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction, and I am taking the features from the fully connected layer and trying to feed them to the LSTM. The problem is that when I feed the FC layer to the LSTM as input, I get an error about a wrong dimension. My FC layer is a tensor with dimension (128, 1024). I tried to reshape it with tf.reshape(fc, [-1]), which gives me a tensor of dimension (131072,), and it still won't work. Could anyone give me any ideas on how I am supposed to feed the FC layer to the LSTM? Here is part of my code and the error I get.
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
Here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to work correctly).
If you are taking a sequence of images as input, your input tensor should be 5D (batch, sequence_index, x, y, channel) or some permutation of that. conv2d should complain about the extra dimension, so you are probably missing one of them. You should fix that first.
Next use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the operation if that's not the case).
The RNN part expects 3D tensors (batch, sequence_index, feature_index). You can use tf.squeeze to remove the size-1 dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, it's OK, but the operations will still expect the dimension to be there (but for you it will be 1). Missing the dimension will cause problems with shapes down the line.
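The shape bookkeeping described above can be traced without TensorFlow. In this sketch squeeze_shape is a hypothetical helper that applies to a shape tuple what tf.squeeze does to a tensor, and the sizes (8 sequences of 16 frames, 1024 features) are assumptions chosen only so that the element count matches the 131072 from the question.

```python
def squeeze_shape(shape, axes):
    """Drop the given size-1 axes from a shape tuple (hypothetical helper)."""
    for axis in axes:
        if shape[axis] != 1:
            raise ValueError(
                "axis %d has size %d, expected 1" % (axis, shape[axis]))
    return tuple(dim for i, dim in enumerate(shape) if i not in axes)


# Assumed sizes, for illustration only.
batch, frames, features = 8, 16, 1024

# After the convolution and pooling stages: (batch, sequence_index, x, y, channel)
# with x and y reduced to 1.
cnn_output = (batch, frames, 1, 1, features)

# What the RNN expects: (batch, sequence_index, feature_index).
lstm_input = squeeze_shape(cnn_output, axes=(2, 3))

# Flattening to 1-D keeps the same number of elements but discards the
# batch and time structure that the LSTM needs.
flattened = (batch * frames * features,)

print(lstm_input, flattened)
```

If x or y is not 1 at this point, the helper raises an error, which corresponds to the check the answer recommends before squeezing.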

How to use tf.layers.conv2d to train a autoencoder with tied weights

If I want to train an autoencoder with tied weights (the encoder and decoder share the same weight parameters), how do I use tf.layers.conv2d to do that correctly?
I cannot simply share variables between the corresponding conv2d layers of the encoder and decoder, because the weights of the decoder are the transpose of those of the encoder.
Maybe tied weights are rarely used nowadays, but I am just curious.
Use tf.nn.conv2d (and tf.nn.conv2d_transpose correspondingly). It's a low-level function that accepts the kernel variable as an argument.
kernel = tf.get_variable('kernel', [5, 5, 1, 32])
...
encoder_conv = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding='SAME')
...
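# conv2d_transpose also requires output_shape, and it takes the encoded tensor
# (encoder_conv here, or whatever the elided code produces), not images.
decoder_conv = tf.nn.conv2d_transpose(encoder_conv, kernel, output_shape=tf.shape(images), strides=[1, 1, 1, 1], padding='SAME')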
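Why reusing one kernel in conv2d and conv2d_transpose gives tied weights can be illustrated in one dimension with plain Python. A stride-1 convolution is a multiplication by a matrix W built from the kernel, and the transposed convolution with the same kernel is a multiplication by the transpose of W. This is a simplified sketch (1-D, single channel, no padding), not the TensorFlow implementation, and all names in it are hypothetical.

```python
def conv_matrix(kernel, n):
    """Matrix W such that W times x is the stride-1, unpadded
    cross-correlation of a length-n signal x with kernel."""
    k = len(kernel)
    rows = n - k + 1
    return [[kernel[j - i] if 0 <= j - i < k else 0.0 for j in range(n)]
            for i in range(rows)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


kernel = [1.0, 2.0, -1.0]        # the single shared set of weights
x = [3.0, 0.0, 1.0, 2.0, 4.0]    # toy input signal

W = conv_matrix(kernel, len(x))
code = matvec(W, x)                       # encoder: convolution
reconstruction = matvec(transpose(W), code)  # decoder: transposed convolution

print(code)
print(reconstruction)
```

The decoder output has the same length as the input, and no second set of weights was created: both directions read the same kernel values.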

conv1d and conv2d in tensorflow

For conv2d, assume an input 2D matrix with shape (W, H) and a conv kernel of size (Wk, H), which means the height of the kernel is the same as the height of the input matrix. In this case, can we think that conv1d with kernel size Wk carries out the same computation as conv2d?
For example:
tf.layers.conv2d(
    kernel_size=tf.Variable(tf.truncated_normal([Wk, H, 1, out_dim], stddev=0.1)),
    input=...
)
is equal to:
tf.layers.conv1d(kernel_size=Wk, input=...)
They're not the same; the conv2d kernel has many more weights and is going to train differently because of that. Also, depending on what padding is set to, the output size of the conv2d operation may not be 1D either.
tf.nn.conv1d just calls tf.nn.conv2d.
This is the description of tf.nn.conv1d:
Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.
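The reshaping described in that passage can be traced with a small shape-only sketch. The function name conv1d_as_conv2d_shapes is hypothetical; it assumes the default channels-last layout (data_format not starting with "NC") and uses the usual 'SAME' and 'VALID' output-size rules. It does not call TensorFlow.

```python
import math


def conv1d_as_conv2d_shapes(input_shape, filter_shape, stride, padding):
    """Return the 4-D input shape, 4-D filter shape and final 3-D output
    shape for a conv1d carried out through conv2d (shape arithmetic only)."""
    batch, in_width, in_channels = input_shape
    filter_width, filter_in_channels, out_channels = filter_shape
    if filter_in_channels != in_channels:
        raise ValueError("filter in_channels must match input in_channels")

    input_4d = (batch, 1, in_width, in_channels)
    filter_4d = (1, filter_width, in_channels, out_channels)

    if padding == 'SAME':
        out_width = math.ceil(in_width / stride)
    elif padding == 'VALID':
        out_width = math.ceil((in_width - filter_width + 1) / stride)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")

    return input_4d, filter_4d, (batch, out_width, out_channels)


# Example sizes are arbitrary.
valid = conv1d_as_conv2d_shapes((4, 100, 8), (5, 8, 16), stride=1, padding='VALID')
same = conv1d_as_conv2d_shapes((4, 100, 8), (5, 8, 16), stride=2, padding='SAME')
print(valid)
print(same)
```

Note that the inserted dimension has size 1 in both the input and the filter, so the 2-D convolution never slides along it.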

TensorFlow post-LSTM fully connected layer outputs return the same values as each other

I was trying to train a sequence-to-sequence LSTM model with a dataset with three labels: [1, 0] for detection of class 1, [0, 1] for detection of class 2, and [0, 0] for detection of nothing. After getting the outputs from the LSTM network, I applied a fully connected layer to each cell's output the following way:
outputs, state = tf.nn.dynamic_rnn(cell, input)
# Shape of outputs is [batch_size, n_time_steps, n_hidden]
# As matmul works only on matrices, reshape to get the
# time dimension into the batch dimension
outputs = tf.reshape(outputs, [-1, n_hidden])
# Shape is [batch_size * n_time_steps, n_hidden]
w = tf.Variable(tf.truncated_normal(shape=[n_hidden, 2], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[2]))
logit = tf.add(tf.matmul(outputs, w), b, name='logit')
# Reshape back to [batch_size, n_time_steps, 2]
logit = tf.reshape(logit, [batch_size, -1, 2])
On the output, I apply tf.nn.sigmoid_cross_entropy_with_logits and reduce the mean. The model seems to work just fine achieving high accuracy and recall, except for the fact that in almost all the cases it outputs either [0, 0], or [1, 1]. The two logit outputs from the fully connected layer always have very similar values (but not the same). This effectively puts a hard-cap on precision of 50%, which the model converges to (but not a fraction of a percent above).
Now, my intuition would tell me that something must be wrong with the training step and both fully connected outputs are trained on the same data, but curiously enough when I replace my own implementation with the prepackaged one from tf.contrib:
outputs, state = tf.nn.dynamic_rnn(cell, input)
logit = tf.contrib.layers.fully_connected(outputs, 2, activation_fn=None)
without changing a single other thing, the model starts training properly. Now, the obvious solution would be to just use that implementation, but why doesn't the first one work?