While a convolution layer in TensorFlow has a complete description https://www.tensorflow.org/api_guides/python/nn#Convolution, transposed convolution does not have one.
Although tf.keras.layers.Conv2DTranspose has a reference to https://arxiv.org/pdf/1603.07285.pdf, it is not complete.
Is there any documentation that describes how tf.keras.layers.Conv2DTranspose behaves?
Conv2DTranspose is often used for upsampling an image/feature map. The code below uses a 1x1 filter kernel to show how the input is padded with zeros. The code is for TensorFlow 2.0; for TensorFlow 1.x, add tf.enable_eager_execution().
import tensorflow as tf
from tensorflow.keras import layers

data = tf.ones([2, 2], tf.float32, name="input_data")
input_layer = tf.reshape(data, [-1, 2, 2, 1])  # [batch, height, width, channels]
transpose2d = layers.Conv2DTranspose(1, (1, 1), kernel_initializer='ones', strides=(2, 2), padding='valid', use_bias=False)
x = transpose2d(input_layer)
print(x)
The input is
1,1
1,1
The x is
1,0,1,0
0,0,0,0
1,0,1,0
0,0,0,0
You can change the stride value to see the difference.
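For example, here is a small sketch of my own (assuming TensorFlow 2.x with eager execution and the same 2x2 input as above) that prints just the output shapes for a few stride values; larger strides spread the input pixels further apart and fill the gaps with zeros:
import tensorflow as tf
from tensorflow.keras import layers

input_layer = tf.reshape(tf.ones([2, 2], tf.float32), [-1, 2, 2, 1])

for s in (1, 2, 3):
    t = layers.Conv2DTranspose(1, (1, 1), kernel_initializer='ones',
                               strides=(s, s), padding='valid', use_bias=False)
    # With a 1x1 kernel, each input pixel lands s steps apart in the output.
    print('stride', s, '->', t(input_layer).shape)
# stride 1 -> (1, 2, 2, 1), stride 2 -> (1, 4, 4, 1), stride 3 -> (1, 6, 6, 1)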
I was reading this tutorial on Keras convolutional autoencoders, and I realized that I don't get the dimension (8, 4, 4) after these layers in my calculation - the dimension of the images should already drop to 3 after the second convolutional layer, since the stride is large. So how does it obtain this dimension? Can anyone explain the calculation process?
I am also confused about how "same" padding is handled in this situation, since people always mention that "when stride=1, same padding keeps the same dimension". I totally get that. But what happens when the stride isn't 1? How many zeros do I get on each side? I know the equation for the output dimension, floor((h + 2p - k) / s) + 1, but what is p in this case?
Thanks
input_img = Input(shape=(1, 28, 28))
x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(input_img)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
encoded = MaxPooling2D((2, 2), border_mode='same')(x)
# at this point the representation is (8, 4, 4) i.e. 128-dimensional
Oh no, I think I know what happens: the code in the tutorial is wrong. I found this question which cites the same tutorial with the correct code. They forgot to put parentheses in all the Convolution2D layers (it's a translated version); it should actually be 16, (3, 3), which means the stride is 1, not 3. So that explains it. If the stride were 3, we couldn't get this dimension.
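For completeness, with 'same' padding TensorFlow pads just enough (possibly asymmetrically) so that the output size is ceil(h / s), independent of the kernel size. A quick arithmetic sketch of my own, tracing the spatial dimension through the layers above:
import math

h = 28
for name, stride in [('conv1', 1), ('pool1', 2), ('conv2', 1),
                     ('pool2', 2), ('conv3', 1), ('pool3', 2)]:
    h = math.ceil(h / stride)  # 'same' padding: output = ceil(input / stride)
    print(name, h)
# Prints 28, 14, 14, 7, 7, 4 - so the encoded representation is indeed (8, 4, 4).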
When taking the one-dimensional convolution of a one-dimensional array, I receive an error which suggests my second dimension is not big enough.
Here is the overview of the relevant code:
inputs_ = tf.placeholder(tf.float32, (None, 45), name='inputs')
x1 = tf.expand_dims(inputs_, axis=1)
x1 = tf.layers.conv1d(x1, filters=64, kernel_size=1, strides=1, padding='valid')
I am hoping to increase the kernel size to 3 so that neighbouring points also influence the output of each input node; however, I get the following error:
ValueError: Negative dimension size caused by subtracting 3 from 1 for
'conv1d_4/convolution/Conv2D' (op: 'Conv2D') with input shapes:
[?,1,1,45], [1,3,45,64].
My guess is that TensorFlow expects me to reshape my input into two dimensions so that some depth can be used for the kernel multiplication. The question is: why is this the case, and what should I expect for the layer's behaviour based on the input dimensions?
You need to add a channel dimension as the last dimension, even if you only have one channel.
So this code works:
inputs_ = tf.placeholder(tf.float32, (None, 45), name='inputs')
x1 = tf.expand_dims(inputs_, axis=-1)  # shape becomes (None, 45, 1): [batch, width, channels]
x1 = tf.layers.conv1d(x1, filters=64, kernel_size=3, strides=1, padding='valid')
So basically the error was caused because your tensor looked like it had a width of 1 with 45 channels, and TensorFlow was trying to convolve a kernel of size 3 along a dimension of size 1.
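With the channel axis last, the layer sees a length-45 signal with one channel, so a kernel of size 3 with 'valid' padding yields 45 - 3 + 1 = 43 output steps. A quick shape check (a sketch assuming TensorFlow 1.x, where tf.placeholder and tf.layers.conv1d are available):
import tensorflow as tf

inputs_ = tf.placeholder(tf.float32, (None, 45), name='inputs')
x1 = tf.expand_dims(inputs_, axis=-1)   # (None, 45, 1)
out = tf.layers.conv1d(x1, filters=64, kernel_size=3, strides=1, padding='valid')
print(out.get_shape())                  # (?, 43, 64)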
In the expert MNIST tutorial on the TensorFlow website, there is something like this:
x_image = tf.reshape(x, [-1,28,28,1])
I know that the reshape is like
tf.reshape(input, [batch_size, width, height, channel])
Q1: Why does the batch_size equal -1? What does the -1 mean?
And further down in the code there's one more thing I cannot understand:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
Q2: What does the image_size * 64 mean?
Q1: Why does the batch_size equal -1? What does the -1 mean?
-1 means "figure this part out for me". For example, if I run:
reshape([1, 2, 3, 4, 5, 6, 7, 8], [-1, 2])
It creates two columns, and whatever number of rows it needs to get everything to fit:
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
Q2: What does the image_size * 64 mean?
The 64 is the number of filters (output channels) of the preceding convolutional layer. Shapes of filters in conv layers follow the format [height, width, # of input channels (i.e., the number of filters in the previous layer), # of filters].
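For instance, the second conv layer in that tutorial defines its weights with a shape like the following (shown here as an illustration of the format rather than an exact excerpt from the tutorial):
# [filter_height, filter_width, in_channels, out_channels]
W_conv2 = weight_variable([5, 5, 32, 64])  # 64 filters of size 5x5 over 32 input channels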
When you pass -1 as a dimension in tf.reshape, that dimension is inferred so that the total number of elements stays the same. From the docs:
If one component of shape is the special value -1, the size of that
dimension is computed so that the total size remains constant. In
particular, a shape of [-1] flattens into 1-D. At most one component
of shape can be -1.
The reference to 7 x 7 x 64 is because the convolution and pooling layers applied before this point have reduced the image to a shape of [7, 7, 64], and the input to the next fully connected layer needs to be one-dimensional, so in the next line of the example the tensor is reshaped from [7, 7, 64] to [7*7*64] so it can connect to the FC layer.
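As a rough sketch of that step (the variable names follow the tutorial's conventions but are reproduced from memory, so treat them as assumptions):
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])        # -1 infers the batch size
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)  # connect to the FC layer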
For more info on how convolutions and max pooling work, the Wikipedia page has some helpful graphics, e.g. of a typical network architecture and of the pooling operation.
I have a problem with which I've been struggling. It is related to tf.matmul() and its absence of broadcasting.
I am aware of a similar issue on https://github.com/tensorflow/tensorflow/issues/216, but tf.batch_matmul() doesn't look like a solution for my case.
I need to encode my input data as a 4D tensor:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
The first dimension is the size of a batch, the second the number of entries in the batch.
You can imagine each entry as a composition of a number of objects (third dimension). Finally, each object is described by a vector of 100 float values.
Note that I used None for the second and third dimensions because the actual sizes may change in each batch. However, for simplicity, let's shape the tensor with actual numbers:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
These are the steps of my computation:
compute a function of each vector of 100 float values (e.g., linear function)
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.matmul(X, W)
problem: no broadcasting for tf.matmul() and no success using tf.batch_matmul()
expected shape of Y: (5, 10, 4, 50)
applying average pooling for each entry of the batch (over the objects of each entry):
Y_avg = tf.reduce_mean(Y, 2)
expected shape of Y_avg: (5, 10, 50)
I expected that tf.matmul() would have supported broadcasting. Then I found tf.batch_matmul(), but it still looks like it doesn't apply to my case (e.g., W needs to have at least 3 dimensions, and it is not clear why).
BTW, above I used a simple linear function (the weights of which are stored in W). But in my model I have a deep network instead. So, the more general problem I have is automatically computing a function for each slice of a tensor. This is why I expected that tf.matmul() would have had a broadcasting behavior (if so, maybe tf.batch_matmul() wouldn't even be necessary).
Look forward to learning from you!
Alessio
You could achieve that by reshaping X to shape [n, d], where d is the dimensionality of one single "instance" of computation (100 in your example) and n is the number of those instances in your multi-dimensional object (5*10*4=200 in your example). After reshaping, you can use tf.matmul and then reshape back to the desired shape. The fact that the first three dimensions can vary makes that a little tricky, but you can use tf.shape to determine the actual shapes at run time. Finally, you can perform the second step of your computation, which should be a simple tf.reduce_mean over the respective dimension. All in all, it would look like this:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))

X_ = tf.reshape(X, [-1, 100])                      # collapse the first three dimensions
Y_ = tf.matmul(X_, W)                              # ordinary 2-D matmul
X_shape = tf.gather(tf.shape(X), [0, 1, 2])        # extract the first three dimensions
target_shape = tf.concat([X_shape, [50]], axis=0)  # [batch, entries, objects, 50]
Y = tf.reshape(Y_, target_shape)
Y_avg = tf.reduce_mean(Y, 2)
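To sanity-check the shapes, you could feed a random array through the graph (a sketch of my own, assuming TensorFlow 1.x session semantics):
import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y, y_avg = sess.run([Y, Y_avg], feed_dict={X: np.random.rand(5, 10, 4, 100)})
    print(y.shape)      # (5, 10, 4, 50)
    print(y_avg.shape)  # (5, 10, 50)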
As the renamed title of the GitHub issue you linked suggests, you should use tf.tensordot(). It enables contraction of axes pairs between two tensors, in line with Numpy's tensordot(). For your case:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.tensordot(X, W, [[3], [0]]) # gives shape=[5, 10, 4, 50]
So, here is what I want to do:
Right now, I have padding = 'SAME' for all of my neural net layers. I would like to make my code more generic, so I can build my nets with arbitrary paddings, and I don't want to have to calculate how big the output tensors of the layers of my net are. I would like to just access the dimension at initialization/run time, the way the tf.nn functions apparently do internally, so I can initialize my weight and bias tensors in the correct dimension...
So,
How do I access the "shape" function/object of the output placeholder of a convolution?
There are two kinds of shapes -- tensor.get_shape(), which gives the static shape computed by the Python wrappers during graph construction (whenever possible), and tf.shape(tensor), which is an op that can be executed at runtime to get the shape of the tensor (always possible). Both of these work for convolutions.
a = tf.Variable(tf.ones((1, 3, 3, 1)))
b = tf.Variable(tf.ones((3, 3, 1, 1)))
c = tf.nn.conv2d(a, b, [1, 1, 1, 1], padding="VALID")
sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(c.get_shape())          # static shape, known at graph-construction time
print(sess.run(tf.shape(c)))  # dynamic shape, computed at run time
This gives
(1, 1, 1, 1)
[1 1 1 1]
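When some dimensions are unknown at graph-construction time (e.g. a None batch size, which is the usual case when building nets generically), get_shape() returns partially known dimensions while tf.shape() still yields the concrete values at run time. A small sketch of that difference, assuming the same TF 1.x-era API as above and a hypothetical 28x28 single-channel input:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, (None, 28, 28, 1))   # batch size unknown statically
w = tf.Variable(tf.ones((3, 3, 1, 8)))
y = tf.nn.conv2d(x, w, [1, 1, 1, 1], padding="SAME")

print(y.get_shape())   # (?, 28, 28, 8) -- batch dimension unknown at graph time

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # tf.shape(y) is evaluated at run time once a concrete batch is fed in:
    print(sess.run(tf.shape(y), feed_dict={x: np.zeros((4, 28, 28, 1))}))  # [ 4 28 28  8]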