Convolution-Deconvolution pair gives slightly different dimensionality - tensorflow

I am using a convolution layer followed by a deconvolution layer like so:
slim.conv2d(num_outputs=1, kernel_size=[21, 11], stride=[2, 2], padding="SAME", rate=1)
slim.conv2d_transpose(num_outputs=1, kernel_size=[21, 11], stride=[2, 2], padding="SAME")
My idea is to make the initial image smaller with the convolution, then bring it back to its original size with the deconvolution. I am using the tf.slim layer functions, with the arguments shown above.
When I look at the input and output, I have a small difference:
Input shape : (16, 161, 511, 1)
Output shape: (16, 162, 512, 1)
I think it could be due to my stride size or kernel size. I've tried multiple values but none seem to reproduce the original dimensions.

The mismatch comes from the odd input dimensions: with "SAME" padding and stride 2, the convolution produces ceil(161/2) = 81 rows, and the transposed convolution then restores 81 * 2 = 162, so a 161-row input can never come back exactly. A popular remedy is to pad the input image so that the output after the convolution and deconvolution has the same size as the padded input, and then crop the output back to the original, unpadded size.
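A minimal sketch of that pad-and-crop approach, using plain tf.nn ops and the shapes from the question (the random data and the single-channel filter are only placeholders):
import tensorflow as tf

x = tf.random.normal([16, 161, 511, 1])
stride = 2

# Pad 161 -> 162 and 511 -> 512, the next multiples of the stride.
padded = tf.pad(x, [[0, 0], [0, 1], [0, 1], [0, 0]])        # [16, 162, 512, 1]

# One channel in, one channel out, so the same [21, 11, 1, 1] kernel layout
# works for both the convolution and the transposed convolution here.
w = tf.random.normal([21, 11, 1, 1])

down = tf.nn.conv2d(padded, w, strides=[1, stride, stride, 1], padding="SAME")   # [16, 81, 256, 1]
up = tf.nn.conv2d_transpose(down, w, output_shape=[16, 162, 512, 1],
                            strides=[1, stride, stride, 1], padding="SAME")      # [16, 162, 512, 1]

restored = up[:, :161, :511, :]   # crop back to the original, unpadded size
print(restored.shape)             # (16, 161, 511, 1)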

Related

Understanding basic Keras Conv2DTranspose example

This is definitely a basic question, but I'm having trouble understanding exactly what is going on with Keras's layers.Conv2DTranspose function. I have the following three lines:
Setup
model = tf.keras.Sequential()
...
model.add(layers.Reshape((10, 10, 256)))
model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
assert model.output_shape == (None, 10, 10, 128)
The first occurrence of Reshape gets me a tensor of shape [10x10x256].
In the Conv2DTranspose layer, somehow I'm sliding a filter of shape [5x5] along this tensor and ending up with a new tensor of shape [10x10x128].
Question
What mathematically is happening to get me from the first tensor [10x10x256] to the second [10x10x128]?
It's almost the same as a convolution, but with fancy paddings to get the feeling of doing a backward convolution.
The sliding window in your picture is correctly positioned.
But it's not really a "window"; it is a "sliding block" whose depth is 256, the full depth of the input.
At each position, the block multiplies and sums across all 256 channels.
There are 128 different sliding blocks (as you defined in your layer with filters=128), and each of them produces a separate output channel.
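As a shape check, here is a minimal standalone sketch reproducing those numbers (the batch size of 1 is arbitrary):
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([1, 10, 10, 256])

# 128 "sliding blocks", each 5x5 spatially and 256 deep; with strides=(1, 1) and
# padding='same' the spatial size stays 10x10 and only the depth changes.
deconv = layers.Conv2DTranspose(128, (5, 5), strides=(1, 1),
                                padding='same', use_bias=False)
print(deconv(x).shape)         # (1, 10, 10, 128)

# Each of the 128 output channels has its own 5 * 5 * 256 block of weights:
print(deconv.count_params())   # 819200 = 5 * 5 * 256 * 128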
Great explanations about transposed convolutions: https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers

Deconvolutions/Transpose_Convolutions with tensorflow

I am attempting to use tf.nn.conv3d_transpose; however, I am getting an error indicating that my filter and output shape are not compatible.
I have a tensor of size [1,16,16,4,192]
I am attempting to use a filter of [1,1,1,192,192]
I believe that the output shape would be [1,16,16,4,192]
I am using "same" padding and a stride of 1.
Eventually, I want to have an output shape of [1,32,32,7,"does not matter"], but I am attempting to get a simple case to work first.
Since these tensors are compatible in a regular convolution, I believed that the opposite, a deconvolution, would also be possible.
Why is it not possible to perform a deconvolution on these tensors? Could I get an example of a valid filter size and output shape for a deconvolution on a tensor of shape [1,16,16,4,192]?
Thank you.
Yes, with "SAME" padding and a stride of 1, the output shape will be [1,16,16,4,192].
Here is a simple example showing that the dimensions are compatible:
import tensorflow as tf
# input: [batch, dim1, dim2, dim3, in_channels]
i = tf.Variable(tf.constant(1., shape=[1, 16, 16, 4, 192]))
# filter: [depth, height, width, out_channels, in_channels]
w = tf.Variable(tf.constant(1., shape=[1, 1, 1, 192, 192]))
# with a 1x1x1 kernel, stride 1 and the default "SAME" padding,
# the output shape matches the input shape
o = tf.nn.conv3d_transpose(i, w, [1, 16, 16, 4, 192], strides=[1, 1, 1, 1, 1])
print(o.get_shape())  # (1, 16, 16, 4, 192)
There must be some other problem in your implementation than the dimensions.
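For the eventual [1,32,32,7,...] target, one option (a sketch, not the only way; the [2, 2, 1] kernel and the 64 output channels are arbitrary choices) is to switch to "VALID" padding with a stride of 2, since a "VALID" transposed convolution produces (in - 1) * stride + kernel per spatial dimension, i.e. 16 -> 15 * 2 + 2 = 32 and 4 -> 3 * 2 + 1 = 7:
import tensorflow as tf

i = tf.constant(1., shape=[1, 16, 16, 4, 192])
# filter: [depth, height, width, out_channels, in_channels]
w = tf.constant(1., shape=[2, 2, 1, 64, 192])
o = tf.nn.conv3d_transpose(i, w, output_shape=[1, 32, 32, 7, 64],
                           strides=[1, 2, 2, 2, 1], padding="VALID")
print(o.get_shape())  # (1, 32, 32, 7, 64)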

When would I want to set a stride in the batch or channel dimension for TensorFlow convolution?

TensorFlow implements a basic convolution operation with tf.nn.conv2d.
I am specifically interested in the "strides" parameter, which lets you set the stride of the convolution filter -- how far across the image you shift the filter each time.
The example given in one of the early tutorials, with an image stride of 1 in each direction, is
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
The strides array is explained more in the linked docs:
In detail, with the default NHWC format...
Must have strides[0] = strides[3] = 1. For the most common case of the same horizontal and vertical strides, strides = [1, stride, stride, 1].
Note the order of "strides" matches the order of inputs: [batch, height, width, channels] in the NHWC format.
Obviously, a stride other than 1 for the batch and channel dimensions wouldn't make sense, right? (Your filter should always cover every batch element and every channel.)
But why is it even an option to put something other than 1 in strides[0] and strides[3]? (It's an "option" only in the sense that you could put something other than 1 in the Python list you pass in, despite the documentation quote above.)
Is there a situation where I would have a non-one stride for the batch or channels dimension, e.g.
tf.nn.conv2d(x, W, strides=[2, 1, 1, 2], padding='SAME')
If so, what would that example even mean in terms of the convolution operation?
There might be a situation where you send a video in chunks, so your batch is a sequence of frames. Assuming that nearby frames are quite similar, you could skip some of them by increasing the batch stride. That's as far as I understand it; I don't know of a use case for the channel stride, though.
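To make that frame-skipping idea concrete: since the documentation requires strides[0] = strides[3] = 1 anyway, the same effect is usually obtained by slicing the batch before the convolution. A small sketch with made-up shapes:
import tensorflow as tf

# A "batch" of 8 consecutive video frames, 64x64 RGB.
frames = tf.random.normal([8, 64, 64, 3])
kernel = tf.random.normal([3, 3, 3, 16])   # [k_h, k_w, in_channels, out_channels]

# A batch stride of 2 would mean "convolve only every other frame";
# dropping frames first gives the same result with a standard stride.
every_other_frame = frames[::2]            # shape [4, 64, 64, 3]
out = tf.nn.conv2d(every_other_frame, kernel,
                   strides=[1, 1, 1, 1], padding='SAME')
print(out.shape)                           # (4, 64, 64, 16)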

How to imagine convolution/pooling on images with 3 color channels

I am a beginner and I understood the MNIST tutorials. Now I want to get something going on the SVHN dataset. In contrast to MNIST, it comes with 3 color channels. I am having a hard time visualizing how convolution and pooling work with the additional dimensionality of the color channels.
Has anyone a good way to think about it or a link for me ?
I appreciate all input :)
This is very simple, the difference only lies in the first convolution:
in grey images, the input shape is [batch_size, W, H, 1] so your first convolution (let's say 3x3) has a filter of shape [3, 3, 1, 32] if you want to have 32 dimensions after.
in RGB images, the input shape is [batch_size, W, H, 3] so your first convolution (still 3x3) has a filter of shape [3, 3, 3, 32].
In both cases, the output shape (with stride 1) is [batch_size, W, H, 32].
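A quick sanity check of those shapes (the sizes below are made up):
import tensorflow as tf

batch_size, H, W = 16, 32, 32
rgb_images = tf.random.normal([batch_size, H, W, 3])   # 3 color channels
filters = tf.random.normal([3, 3, 3, 32])              # [k_h, k_w, in_channels, out_channels]

out = tf.nn.conv2d(rgb_images, filters, strides=[1, 1, 1, 1], padding='SAME')
print(out.shape)   # (16, 32, 32, 32): the 3 input channels are summed over, 32 feature maps come out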

How to use tf.train.batch with enqueue_many=true

I'm looking for an example of using tf.train.batch with enqueue_many=True.
In my case, I have an image tensor of shape [299,299,3] and when I call a function get_distortions(image) it will return a new tensor of shape [10,299,299,3] (in this example, it will apply 10 distortions to the image and return them all as a new tensor). I'd then like to enqueue all these by calling tf.train.batch.
I tried this:
example_batch = tf.train.batch(tf.unpack(distortions), 5, enqueue_many=True)
But when I sess.run(example_batch) I get back a list of length 10 (I was expecting a batch of size 5).
Also, how would I include the label to tf.train.batch in this case? The label is the same for all 10 distortions.
Don't unpack distortions. The semantics of enqueue_many are that you feed it a tensor whose first dimension is the batching dimension, so a [10, 299, 299, 3] tensor with enqueue_many will result in ten separate items, each of shape [299, 299, 3], being enqueued -- which is what you want.
Documentation for tf.train.batch tells you:
If enqueue_many is True, tensors is assumed to represent a batch of examples, where the first dimension is indexed by example, and all members of tensors should have the same size in the first dimension. If an input tensor has shape [*, x, y, z], the output will have shape [batch_size, x, y, z]. The capacity argument controls how long the prefetching is allowed to grow the queues.
Which is exactly what happens in your case: [10, 299, 299, 3], where 10 is the number of examples being enqueued. So you do not need to do any unpacking; tf.train.batch(distortions, 5, enqueue_many=True) will do the job.
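As for the label: since it is the same for all 10 distortions, you can repeat it so its first dimension matches and pass both tensors to tf.train.batch together. A minimal sketch, assuming the TF1 queue-based pipeline (the zero images and the constant label are stand-ins for your real data):
import tensorflow as tf

# Stand-ins for the real data: 10 distorted versions of one image and its label.
distortions = tf.zeros([10, 299, 299, 3])
label = tf.constant(7)

# Repeat the label so its first dimension matches the distortions.
labels = tf.fill([10], label)   # shape [10]

# With enqueue_many=True, dim 0 is "per example": 10 items go into the queue,
# and batches of 5 come out.
image_batch, label_batch = tf.train.batch(
    [distortions, labels], batch_size=5, enqueue_many=True)
# image_batch: [5, 299, 299, 3], label_batch: [5]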