conv1d and conv2d in tensorflow

For conv2d, assume an input 2D matrix with shape (W, H) and a conv kernel of size (Wk, H), i.e. the height of the kernel is the same as the height of the input matrix. In this case, can we say that conv1d with kernel size Wk carries out the same computation as conv2d?
For example:
tf.layers.conv2d(
    kernel_size=tf.Variable(tf.truncated_normal([Wk, H, 1, out_dim], stddev=0.1)),
    input=...
)
is equivalent to:
tf.layers.conv1d(kernel_size=Wk, input=...)

They're not the same; the conv2d kernel has many more weights and is going to train differently because of that. Also, depending on what padding is set to, the output of the conv2d operation may not be one-dimensional either.

tf.nn.conv1d just calls tf.nn.conv2d internally. This is the description from tf.nn.conv1d:
Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.
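As a minimal sanity check of that equivalence (a sketch with made-up sizes, assuming TF 1.x), the conv1d result can be reproduced with conv2d by adding and removing the height-1 axis by hand:
import tensorflow as tf

batch, in_width, in_channels, out_channels, Wk = 8, 32, 4, 16, 3
x = tf.random_normal([batch, in_width, in_channels])
f = tf.random_normal([Wk, in_channels, out_channels])  # conv1d filter

y1 = tf.nn.conv1d(x, f, stride=1, padding='SAME')

# The same computation via conv2d, mirroring what conv1d does internally
x2 = tf.expand_dims(x, 1)   # [batch, 1, in_width, in_channels]
f2 = tf.expand_dims(f, 0)   # [1, Wk, in_channels, out_channels]
y2 = tf.squeeze(tf.nn.conv2d(x2, f2, strides=[1, 1, 1, 1], padding='SAME'), axis=1)
# y1 and y2 both have shape [batch, in_width, out_channels] and hold the same values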

Understanding shapes in keras layers

I am learning TensorFlow and Keras to implement an LSTM many-to-many model where the length of the input sequence is equal to the length of the output sequence.
Sample Code:
Inputs:
voc_size = 10000
embed_dim = 64
lstm_units = 75
size_batch = 30
count_classes = 5
Model:
from tensorflow.keras.layers import (Bidirectional, LSTM,
                                     Dense, Embedding, TimeDistributed)
from tensorflow.keras import Sequential

def sample_build(embed_dim, voc_size, batch_size, lstm_units, count_classes):
    model = Sequential()
    model.add(Embedding(input_dim=voc_size,
                        output_dim=embed_dim, input_length=50))
    model.add(Bidirectional(LSTM(units=lstm_units, return_sequences=True),
                            merge_mode="ave"))
    model.add(Dense(200))
    model.add(TimeDistributed(Dense(count_classes + 1)))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    model.summary()
    return model

sample_model = sample_build(embed_dim, voc_size,
                            size_batch, lstm_units,
                            count_classes)
I am having trouble understanding the shapes of input and output for each layer. For example, the shape of the output of Embedding_Layer is (BATCH_SIZE, time_steps, length_of_input) and in this case, it is (30, 50, 64).
Similarly, the output shape of the Bidirectional LSTM layer is (30, 50, 75). This will be the input for the next Dense layer with 200 units. But the shape of the weight matrix of the Dense layer is (number of units in the current layer, number of units in the previous layer), which is (200, 75) in this case. So how does the matrix calculation happen between the 2D weight of the Dense layer and the 3D output of the Bidirectional layer? Any explanation of the shapes would be helpful.
Dense can handle 3D input: it effectively flattens the input to shape (batch_size * time_steps, features), applies the dense layer, and reshapes the result back to the original (batch_size, time_steps, units). Keras's documentation of the Dense layer says:
Note: If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 1 of the kernel (using tf.tensordot). For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
Another point regarding the output of the Embedding layer: as you said, it is indeed 3D, but the shape actually corresponds to (BATCH_SIZE, input_length, embedding_dim), i.e. (30, 50, 64) here.
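For example (a small sketch using the question's shapes, assuming TF 2.x), you can check that the Dense kernel is 2D and shared across time steps, while the time dimension is preserved:
import tensorflow as tf

x = tf.random.normal([30, 50, 75])   # (batch_size, time_steps, features)
dense = tf.keras.layers.Dense(200)
y = dense(x)
print(dense.kernel.shape)            # (75, 200): one kernel, applied at every time step
print(y.shape)                       # (30, 50, 200): the time dimension is preserved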

How can I multiply a tensor with an unknown dimension to a tensorflow variable?

I'm working in Keras (Tensorflow 2). I'd like to multiply each element of a tensor with its own trainable weight. Let's say that my input tensor is 1D, with 10 elements; so I try to define the input as a Keras input tensor, the weights as a tf.Variable, and I try to use the Keras Multiply layer, thus:
import tensorflow as tf
inputs = tf.keras.layers.Input(shape=(10), name='inputs')
weights = tf.Variable(tf.random.normal([10]), name='weights')
outputs = tf.keras.layers.Multiply()([inputs, weights])
Now when I inspect the dimensions they are:
inputs: shape=(None, 10)
weights: shape=(10,)
outputs: shape=(10, 10)
The input dimension has a None dimension, for the batch size, which is what I expect and want. However I expected outputs to have shape=(None, 10). Instead, the initial dimension for the batch size seems to have taken a fixed size of 10. How should I correct this?
You need to make weights broadcast along dimension 0 (the batch dimension), which means giving it an explicit leading axis of size 1.
That is, weights must have the shape (1, 10), not (10,).
This can be done using:
weights = tf.Variable(tf.random.normal([1, 10]), name='weights')
or
weights = tf.Variable(tf.random.normal([10]), name='weights')
...
weights = tf.expand_dims(weights, axis=0)
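Putting it together (a quick sketch of the corrected snippet, assuming TF 2.x):
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(10,), name='inputs')
weights = tf.Variable(tf.random.normal([1, 10]), name='weights')  # leading 1 broadcasts over the batch
outputs = tf.keras.layers.Multiply()([inputs, weights])
print(outputs.shape)  # (None, 10)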

CNN features(dimensions) feed to LSTM Tensorflow

So recently I have been working on a project in which I am supposed to take images as input to a CNN, extract the features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction, taking the features from the fully connected layer, and trying to feed them to the LSTM. The problem is that when I want to feed the FC layer to the LSTM as input, I get an error about the wrong dimension. My FC layer is a tensor with (128, 1024) dimensions. I tried to reshape it like this, tf.reshape(fc, [-1]), which gives me a tensor of (131072,)
dimension, and it still won't work. Could anyone give me any ideas of how I am supposed to feed the FC to the LSTM? Here I include part of my code and the error I get.
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to work correctly).
If you are taking a sequence of images as input, your input tensor should be 5D (batch, sequence_index, x, y, channel) or some permutation of that. conv2d should complain about the extra dimension, but you are probably missing one of them. You should fix that first.
Next use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the operations if that's not the case).
The RNN part expects 3D tensors (batch, sequence_index, feature_index). You can use tf.squeeze to remove the size-1 dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, it's OK, but the operations will still expect the dimension to be there (but for you it will be 1). Missing the dimension will cause problems with shapes down the line.
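A rough sketch of that pipeline (assuming TF 1.x; the 4x4 frame size, sequence length of 10, and layer sizes are made up for illustration, not your actual ones):
import tensorflow as tf
from tensorflow.contrib import rnn

# Hypothetical 5D input: (batch, sequence_index, x, y, channel)
x = tf.placeholder(tf.float32, [None, 10, 4, 4, 1])

# Window of 1 along the sequence axis so the frames do not interact yet
c1 = tf.layers.conv3d(x, 32, kernel_size=(1, 3, 3), padding='same', activation=tf.nn.relu)
p1 = tf.layers.max_pooling3d(c1, pool_size=(1, 2, 2), strides=(1, 2, 2))
c2 = tf.layers.conv3d(p1, 64, kernel_size=(1, 2, 2), activation=tf.nn.relu)
# c2 now has shape (batch, 10, 1, 1, 64): x and y have collapsed to 1

# Drop the size-1 spatial axes -> (batch, sequence_index, feature_index)
seq = tf.squeeze(c2, axis=[2, 3])

rnn_cell = rnn.BasicLSTMCell(128, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(rnn_cell, seq, dtype=tf.float32)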

regarding tf.nn.conv1d and its corresponding transpose operation

In the latest TensorFlow version, there is tf.nn.conv2d_transpose for the 2D deconvolution operation. However, there is no corresponding 1D deconvolution operation for tf.nn.conv1d. How can I perform deconvolution on 1D data?
Well, conv1d is actually conv2d with in_height=1. The docstring of conv1d in nn_ops.py states:
Internally, this op reshapes the input tensors and invokes `tf.nn.conv2d`.
For example, if `data_format` does not start with "NC", a tensor of shape
[batch, in_width, in_channels]
is reshaped to
[batch, 1, in_width, in_channels],
and the filter is reshaped to
[1, filter_width, in_channels, out_channels].
The result is then reshaped back to
[batch, out_width, out_channels]
(where out_width is a function of the stride and padding as in conv2d) and
returned to the caller.
Thus, tf.nn.conv2d_transpose can do the job.
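A minimal sketch of how that can look (assuming TF 1.x; the helper name and shape conventions below are my own, mirroring the reshape trick from the docstring above):
import tensorflow as tf

def conv1d_transpose(value, filters, out_width, stride, padding='SAME'):
    # value:   [batch, in_width, in_channels]
    # filters: [filter_width, out_channels, in_channels] (conv2d_transpose filter order)
    batch = tf.shape(value)[0]
    out_channels = int(filters.shape[1])
    value_4d = tf.expand_dims(value, 1)      # [batch, 1, in_width, in_channels]
    filters_4d = tf.expand_dims(filters, 0)  # [1, filter_width, out_channels, in_channels]
    output_shape = tf.stack([batch, 1, out_width, out_channels])
    result = tf.nn.conv2d_transpose(value_4d, filters_4d, output_shape,
                                    strides=[1, 1, stride, 1], padding=padding)
    return tf.squeeze(result, axis=1)        # [batch, out_width, out_channels]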

regarding shape values in convolutional layer of CIFAR10 example

In the CIFAR10 example, conv2 is defined as follows. How do we know that shape=[5, 5, 64, 64] in kernel = _variable_with_weight_decay should be given those values, i.e. 5, 5, 64, 64? In addition, in biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1)), the shape is defined as [64]; how do we get that value?
# conv2
with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv2)
Looking at the source, we see that a call to _variable_with_weight_decay boils down to a tf.get_variable call: we are retrieving a weight tensor (creating one if it doesn't already exist).
In a Convolutional Neural Network, the weight tensor defines a mapping from one layer to the next, but it differs from a vanilla NN. The convolution means you are applying a convolutional filter to your input as you map from one layer to the next. This filter is defined by hyper-parameters, which are the values fed into shape.
There are four values fed into shape; the first two give the size of the convolution filter, in this case a 5x5 filter. The third value defines the input depth, which must match the number of channels produced by the previous layer (64 from conv1, in the case of conv2). For the first convolutional layer it is the 3 colour channels of the input image:
kernel = _variable_with_weight_decay('weights',
                                     shape=[5, 5, 3, 64],
                                     stddev=5e-2,
                                     wd=0.0)
The fourth value defines the output depth, i.e. the number of filters (feature maps) the layer produces.
The bias is a perturbation of the system used for better learning. It is added to the output of the convolution, so it needs one entry per output channel, which is 64 in this case.
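To make the shapes concrete (a toy check with a hypothetical 12x12 norm1, not the actual CIFAR-10 code):
import tensorflow as tf

norm1 = tf.zeros([128, 12, 12, 64])   # output of the previous conv/pool/norm block
kernel = tf.zeros([5, 5, 64, 64])     # 5x5 filter, 64 input channels, 64 output channels
conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
biases = tf.zeros([64])               # one bias value per output channel
bias = tf.nn.bias_add(conv, biases)
print(bias.shape)                     # (128, 12, 12, 64)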
Cheers!