regarding tf.nn.conv1d and its corresponding transpose operation - tensorflow

In the latest TensorFlow version, there is tf.nn.conv2d_transpose for the 2D deconvolution operation. However, there is no corresponding 1D deconvolution operation for tf.nn.conv1d. How can I perform deconvolution on 1D data?

Well, conv1d is actually conv2d with in_height=1. The docstring of conv1d in nn_ops.py states:
Internally, this op reshapes the input tensors and invokes `tf.nn.conv2d`.
For example, if `data_format` does not start with "NC", a tensor of shape
[batch, in_width, in_channels]
is reshaped to
[batch, 1, in_width, in_channels],
and the filter is reshaped to
[1, filter_width, in_channels, out_channels].
The result is then reshaped back to
[batch, out_width, out_channels]
(where out_width is a function of the stride and padding as in conv2d) and
returned to the caller.
Thus, tf.nn.conv2d_transpose can do the job.
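To make the reshaping idea concrete, here is a minimal NumPy sketch (not TensorFlow code) of a 1D transposed convolution done by adding a height axis of size 1 and calling a 2D transposed convolution. The function names, the VALID-style output size, and the example shapes are assumptions for illustration. The filter layout [k_h, k_w, out_channels, in_channels] mirrors the one tf.nn.conv2d_transpose expects.

```python
import numpy as np

def conv2d_transpose_valid(y, w, strides):
    """Naive NHWC transposed convolution, VALID-style output size.

    y: [batch, height, width, in_channels]
    w: [k_h, k_w, out_channels, in_channels]
    """
    batch, h, wd, _ = y.shape
    k_h, k_w, out_c, _ = w.shape
    s_h, s_w = strides
    out = np.zeros((batch, (h - 1) * s_h + k_h, (wd - 1) * s_w + k_w, out_c))
    for i in range(h):
        for j in range(wd):
            # Each input position scatters a scaled copy of the kernel into the output.
            contrib = np.einsum("bi,hwoi->bhwo", y[:, i, j, :], w)
            out[:, i * s_h:i * s_h + k_h, j * s_w:j * s_w + k_w, :] += contrib
    return out

def conv1d_transpose_valid(y, w, stride):
    """1D transposed convolution via a height axis of size 1."""
    y4 = y[:, np.newaxis, :, :]   # [batch, 1, width, in_channels]
    w4 = w[np.newaxis, :, :, :]   # [1, k_w, out_channels, in_channels]
    out4 = conv2d_transpose_valid(y4, w4, strides=(1, stride))
    return out4[:, 0, :, :]       # drop the height axis again

rng = np.random.default_rng(0)
y = rng.normal(size=(2, 5, 3))    # [batch, width, in_channels]
w = rng.normal(size=(4, 6, 3))    # [k_w, out_channels, in_channels]
up = conv1d_transpose_valid(y, w, stride=2)
print(up.shape)  # (2, 12, 6)
```

With TensorFlow itself, the same pattern would be to expand the input and filter with a size-1 height axis, call tf.nn.conv2d_transpose with a height stride of 1, and squeeze the height axis out of the result.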

Related

What does the keras ConvLSTM2D layer do?

I would like to understand the ConvLSTM2D Keras layer a bit better.
Does it execute a 2D convolution on a 2D input (image) and then average/flatten its output and feed that into an LSTM module?
But I guess it is basically an LSTM cell, where the matrix multiplications are replaced with convolution operations. Is that correct?
Thank you
Yes, you are right about the concept of ConvLSTM2D.
The ConvLSTM2D architecture combines the gating of an LSTM with 2D convolutions.
As you mentioned, a ConvLSTM layer does a similar job to an LSTM, but it uses convolution operations instead of matrix multiplications, so the spatial layout of the input is retained.
A different approach would be to pass the images through convolution layers, flatten the result into a 1D array, and feed that to the LSTM layers as a set of features over time.
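Input of the Keras ConvLSTM2D layer: a 5D tensor with shape
(samples, time, channels, rows, cols) if it is channels first.
(samples, time, rows, cols, channels) if it is channels last.
Output of a ConvLSTM2D layer:
If return_sequences = True then it is a 5D tensor with shape
(samples, time, filters, new_rows, new_cols) if it is channels first.
(samples, time, new_rows, new_cols, filters) if it is channels last.
If return_sequences = False then it is a 4D tensor with shape
(samples, filters, new_rows, new_cols) if it is channels first.
(samples, new_rows, new_cols, filters) if it is channels last.
Here new_rows and new_cols may differ from rows and cols depending on the padding and strides.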
You can refer to this paper, on which the ConvLSTM implementation is based.

Data format and actual shape

I'm trying to migrate TensorFlow checkpoint weights to PyTorch.
When I extract some weights with cp.load_variable(<CKPT>, <FIELD_NAME>), I get a 4D list ordered as HWCN, for example [1, 1, 512, 1024] which is clearly HWCN.
However, all convolution blocks data_format are set to NHWC.
So the question is: why is there a mismatch?
Which should I believe? Is the 4D list from cp.load_variable correct, so that all that is left to do is permute the dimensions?
Thanks!
The weights are not given as HWCN, as the weights do not have any batch dimension (N), otherwise that would apply a different weight for each sample in the batch. The shape is [kernel_height, kernel_width, in_channels, out_channels]. There is no mismatch, because data_format specifies which format the input and output use.
In PyTorch the weight of convolutions is given as [out_channels, in_channels, kernel_height, kernel_width], therefore you only need to permute the dimensions.
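The permutation described above can be sketched with NumPy. The array below is a hypothetical stand-in for what cp.load_variable returns, using the [1, 1, 512, 1024] shape from the question.

```python
import numpy as np

# Hypothetical stand-in for the array returned by cp.load_variable(...).
# TensorFlow layout: [kernel_height, kernel_width, in_channels, out_channels]
tf_kernel = np.zeros((1, 1, 512, 1024), dtype=np.float32)

# PyTorch layout: [out_channels, in_channels, kernel_height, kernel_width]
torch_kernel = np.ascontiguousarray(np.transpose(tf_kernel, (3, 2, 0, 1)))

print(torch_kernel.shape)  # (1024, 512, 1, 1)
```

The resulting array could then be wrapped with torch.from_numpy and copied into the matching convolution's weight. Making the array contiguous first avoids handing PyTorch a strided view.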

CNN features(dimensions) feed to LSTM Tensorflow

I am working on a project in which I take images as input to a CNN, extract features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction, taking the features from the fully connected layer, and trying to feed them to the LSTM. The problem is that when I feed the FC layer to the LSTM as input, I get an error about wrong dimensions. My FC layer is a tensor with dimensions (128, 1024). I tried to reshape it with tf.reshape(fc,[-1]), which gives me a tensor of dimension (131072,), and it still won't work. Could anyone give me ideas on how I am supposed to feed the FC layer to the LSTM? Here is part of my code and the error I get.
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
# This reshape is the call that raises the ValueError quoted below
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
Here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to work correctly).
If you are taking a sequence of images as input, your input tensor should be 5D (batch, sequence_index, x, y, channel) or some permutation of that. conv2d should complain about the extra dimension, so you are probably missing one of them. You should fix that first.
Next, use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the operation if that's not the case).
The RNN part expects 3D tensors (batch, sequence_index, feature_index). You can use tf.squeeze to remove the size-1 dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, it's OK, but the operations will still expect the dimension to be there (but for you it will be 1). Missing the dimension will cause problems with shapes down the line.
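The final squeeze step can be sketched with NumPy, whose np.squeeze behaves like tf.squeeze for this purpose. The batch size, sequence length, and feature count below are made-up values, and the zero array is a hypothetical stand-in for the output of the convolutional stack.

```python
import numpy as np

# Made-up sizes for illustration only.
batch, seq_len, features = 4, 10, 1024

# Stand-in for the conv/pool output: x and y have been reduced to size 1.
conv_out = np.zeros((batch, seq_len, 1, 1, features), dtype=np.float32)

# Remove only the two size-1 spatial axes, leaving (batch, sequence, feature).
rnn_in = np.squeeze(conv_out, axis=(2, 3))

print(rnn_in.shape)  # (4, 10, 1024)
```

Naming the axes explicitly keeps the batch dimension in place even when the batch size happens to be 1.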

conv1d and conv2d in tensorflow

For conv2d, assume an input 2D matrix with shape (W,H) and a conv kernel of size (Wk,H), which means the height of the kernel is the same as the height of the input matrix. In this case, can we say that conv1d with kernel size Wk carries out the same computation as conv2d?
For example:
tf.layers.conv2d(
    kernel_size=tf.Variable(tf.truncated_normal([Wk, H, 1, out_dim], stddev=0.1)),
    input=...
)
is equivalent to:
tf.layers.conv1d(kernel_size=Wk, input=...)
They're not the same; the conv2d kernel has many more weights and is going to train differently because of that. Also, depending on what padding is set to, the output size of the conv2d operation may not be 1D either.
tf.nn.conv1d just calls tf.nn.conv2d.
This is the description of tf.nn.conv1d:
Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.
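The reshaping in that description can be checked with a small NumPy sketch. This is not TensorFlow's implementation: the naive conv2d_valid below (stride 1, VALID padding, NHWC layout) and the example shapes are assumptions for illustration.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive NHWC cross-correlation with stride 1 and VALID padding."""
    batch, in_h, in_w, _ = x.shape
    k_h, k_w, _, out_c = w.shape
    out = np.zeros((batch, in_h - k_h + 1, in_w - k_w + 1, out_c))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + k_h, j:j + k_w, :]
            # Contract height, width and in_channels against the kernel.
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv1d_valid(x, w):
    """conv1d expressed as conv2d with in_height = 1."""
    x4 = x[:, np.newaxis, :, :]   # [batch, 1, in_width, in_channels]
    w4 = w[np.newaxis, :, :, :]   # [1, filter_width, in_channels, out_channels]
    return conv2d_valid(x4, w4)[:, 0, :, :]   # back to [batch, out_width, out_channels]

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 7, 3))    # [batch, in_width, in_channels]
w = rng.normal(size=(4, 3, 5))    # [filter_width, in_channels, out_channels]
out = conv1d_valid(x, w)
print(out.shape)  # (2, 4, 5)
```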

Why does TensorFlow use a 'dim' parameter for the softmax function?

Why does TensorFlow use a 'dim' parameter for the softmax function? What kind of tensors can we use as input?
tf.nn.softmax accepts any nonempty tensor as input.
You can apply softmax along whichever dimension you want.
Usually, softmax is applied to the last dimension of the input tensor (that's the default behavior). This is because softmax is usually applied to a neural network's output, which typically has a shape of [batch_size, num_classes].
However, you could decide to apply softmax to a tensor with a shape of [batch_size, num_classes, 2, 1] and compute the softmax only over the second dimension of the tensor: tf.nn.softmax(tensor, axis=1)
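What choosing the axis means can be sketched in NumPy. The softmax function below is a hand-written illustration of the computation, not TensorFlow's implementation, and the batch size of 8 and 5 classes are made-up values.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# Shape [batch_size, num_classes, 2, 1] as in the example above.
logits = np.random.default_rng(0).normal(size=(8, 5, 2, 1))
probs = softmax(logits, axis=1)

print(probs.shape)                          # (8, 5, 2, 1)
print(np.allclose(probs.sum(axis=1), 1.0))  # True
```

The output keeps the input's shape. Only the values along the chosen axis are normalized to sum to 1.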