Understanding basic Keras Conv2DTranspose example - tensorflow

This is definitely a basic question, but I'm having trouble understanding exactly what is going on with Keras's layers.Conv2DTranspose function. I have the following three lines:
Setup
model = tf.keras.Sequential()
...
model.add(layers.Reshape((10, 10, 256)))
model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
assert model.output_shape == (None, 10, 10, 128)
The first occurrence of Reshape gets me a tensor of shape [10x10x256].
In the Conv2DTranspose layer, somehow I'm sliding a filter of shape [5x5] along this tensor and ending up with a new tensor of shape [10x10x128].
Question
What mathematically is happening to get me from the first tensor [10x10x256] to the second [10x10x128]?

It's almost the same as a convolution, but with fancy padding to get the feeling of doing a backward convolution.
The sliding window in your picture is correctly positioned.
But it's not a "window"; it is actually a "sliding block". The block is 256 deep.
So, at each stride it multiplies and sums across all the channels.
But then there are 128 different sliding blocks (as you defined in your layer with filters=128). Each of these 128 sliding blocks produces a separate output channel.
Great explanations about transposed convolutions: https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers
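For shape intuition, here is a minimal sketch (assuming TF 2.x with eager execution; the batch size of 1 is arbitrary) that rebuilds just this layer and prints the shapes the answer describes:
import tensorflow as tf

x = tf.random.normal((1, 10, 10, 256))  # one input with 256 channels
layer = tf.keras.layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False)
y = layer(x)

print(y.shape)             # (1, 10, 10, 128): one output channel per filter
print(layer.kernel.shape)  # (5, 5, 128, 256): each 5x5 filter spans all 256 input channels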

Related

Why does TensorFlow Conv2D have two weights matrices?

I have a tf.keras.layers.Conv2D constructed like so:
>>> conv2d_layer = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2)
For reference that layer is part of a network where the prior layer is prior_layer = Conv2D(filters=64, kernel_size=(3, 3), strides=2).
When I call conv2d_layer.get_weights(), it returns a list with two entries:
>>> [w.shape for w in conv2d_layer.get_weights()]
[(3, 3, 64, 128), (128,)]
Why are there two np.ndarrays in conv2d_layer.get_weights()? What are their respective meanings?
The first shape is for the weights of your conv2D, and the second one is the bias for the same layer, which is represented by a vector.
Looking at the documentation, you can see
For example, a Dense layer returns a list of two values: the kernel matrix and the bias vector. These can be used to set the weights of another Dense layer:
You have 128 convolution filters; each filter has a kernel and a bias. Each kernel has size 3x3. Furthermore, the kernel depth is equal to the input depth (64 in this example). So a single kernel has shape (3, 3, 64), and since there are 128 filters, the full kernel tensor has shape (3, 3, 64, 128). There is also one bias per filter, so the second weight has shape (128,).
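A quick way to check this (a sketch assuming TF 2.x; the 64x64 RGB input is an arbitrary choice just to build the layers):
import tensorflow as tf

prior_layer = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=2)
conv2d_layer = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2)

x = tf.random.normal((1, 64, 64, 3))  # dummy batch of one 64x64 RGB image
h = prior_layer(x)                    # builds prior_layer; its output has 64 channels
_ = conv2d_layer(h)                   # builds conv2d_layer against a 64-channel input

print([w.shape for w in conv2d_layer.get_weights()])  # [(3, 3, 64, 128), (128,)] -> kernel, bias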

Using tf.where() and tf.gather_nd() with None dimension

I am tackling a machine learning problem in which I feed my network with data of shape (batch_size, n_objects, n_features). So, each training instance comes with a given number of objects, each of them having a given number of features. Among these features I have electric charge, and while writing a custom loss function I would like to use only the neutral objects to compute it. Thus, starting from a tensor of shape (batch_size, n_objects, n_features) I would like to get a tensor of shape (batch_size, n_neutral_objects, n_features). In doing this, I'm facing a couple of problems.
First of all, I tried creating a tensor by hand. I have 3 training instances, each one having 2 objects, each of them having 3 features. I try to get the neutral objects using the tf.where() and tf.gather_nd() methods in the following way (suppose that electric charge is the 2nd feature):
a = tf.constant([[[3.5, 0, 6], [2.1, 1, 2.9]], [[1.5, 1, 4.5], [2.0, 0, 4.2]], [[6.2, 0, 6.1], [4.8, 1, 3.4]]]) #toy input tensor
b = tf.where(a[:,:,1] == 0) #find neutral objects (charge is 2nd feature)
c = tf.gather_nd(a,b) #gather them
print(c)
This kind of works, as I get
tf.Tensor(
[[3.5 0.  6. ]
 [2.  0.  4.2]
 [6.2 0.  6.1]], shape=(3, 3), dtype=float32)
as an output, which are the desired objects. But I've somehow lost the first dimension, as I don't want a tensor of shape (3, 3), but rather one of shape (3, 1, 3), namely still 3 input instances, each one having only one neutral object, each of them having 3 features.
Things get worse if I plug my approach into my TF model. In this real-life case, my batch size is None and I am thus dealing with tensors of shape (None, 4000, 14) (so 4000 objects for each training instance, 14 features each). This is the code I tried
def get_neutrals(tensor):
    print("tensor.get_shape()", tensor.get_shape())
    charges = tensor[:, :, 4]  # charge is the 5th feature in this case
    print("charges.get_shape()", charges.get_shape())
    where_neutrals = tf.where(charges == 0)  # get the neutrals only
    print("where_neutrals.get_shape()", where_neutrals.get_shape())
    print("tf.gather_nd(tensor, where_neutrals).get_shape()", tf.gather_nd(tensor, where_neutrals).get_shape())
    return tf.gather_nd(tensor, where_neutrals)
and this is what I get printed if I call my method:
tensor.get_shape() (None, 4000, 14)
charges.get_shape() (None, 4000)
where_neutrals.get_shape() (None, 2)
tf.gather_nd(tensor, where_neutrals).get_shape() (None, 14)
The last two shapes are completely unexpected and I don't know why they look like this. Can anyone here help with this?
Thanks a lot, cheers,
F.
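For reference, a small eager-mode sketch (using only the toy tensor from the question) showing where those index shapes come from:
import tensorflow as tf

a = tf.constant([[[3.5, 0, 6], [2.1, 1, 2.9]],
                 [[1.5, 1, 4.5], [2.0, 0, 4.2]],
                 [[6.2, 0, 6.1], [4.8, 1, 3.4]]])  # toy input from the question

b = tf.where(a[:, :, 1] == 0)  # one (instance, object) index pair per neutral object
print(b.shape)                 # (3, 2): 3 matches, each described by 2 indices
c = tf.gather_nd(a, b)         # one feature row per index pair
print(c.shape)                 # (3, 3): the instance and object dimensions are merged away
The number of matches is only known at run time, which is why the corresponding graph-mode shapes above show up as (None, 2) and (None, 14).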

When the input shape is incompatible, what will tensorflow actually do?

Thanks for reading.
I trained an LSTM predictor with a fixed input dimension (None, 5, 2), and when I test the predictor with a smaller dimension (None, 1, 2) I get this warning:
WARNING:tensorflow:Model was constructed with shape (None, 5, 2) for input Tensor("input_1_1:0", shape=(None, 5, 2), dtype=float32), but it was called on an input with incompatible shape (None, 1, 2).
However, the results are fine.
I just wonder what TensorFlow actually does when this happens. Does it automatically pad with zeros so that the dimensions match?
Again, thanks for reading, and I'm looking forward to an answer.
Tensor computations are executed as a TensorFlow graph - see https://www.tensorflow.org/guide/intro_to_graphs. Normally graph execution is faster.
The second (time) dimension of the LSTM input is dynamic. In such cases Keras has to rebuild the graph every time the input shape changes, which is slow. If your input shape changes frequently, graph execution can end up slower than eager execution. That is why Keras issues the warning.
Keras does not pad your data.
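A minimal sketch (assuming TF 2.x and tf.keras; the 8 LSTM units and batch size of 4 are arbitrary) that reproduces the situation and shows the model still runs, with no padding involved:
import tensorflow as tf

inputs = tf.keras.Input(shape=(5, 2))   # model constructed for inputs of shape (None, 5, 2)
outputs = tf.keras.layers.LSTM(8)(inputs)
model = tf.keras.Model(inputs, outputs)

x_short = tf.random.normal((4, 1, 2))   # shape (None, 1, 2): fewer timesteps, nothing is padded
y = model(x_short)                      # logs the "incompatible shape" warning but still runs
print(y.shape)                          # (4, 8): the LSTM simply unrolls over a single timestep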

output dimension of reshape layer

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((7, 7, 256)))
The dense layer takes an input of dimension 1*100 and has 7*7*256 nodes. The Reshape layer takes 1*(7*7*256) as input, so what is its output? I mean, what does (7, 7, 256) mean?
Is it an image of dimension 7*7 if we give an image of 1*100 as input? What is it?
I'm sorry, I know I have understood this in a completely wrong way, so I want to understand it properly.
Here your model will take an input_shape of (*, 100), the first dense layer will output a shape of (*, 7*7*256), and finally the last Reshape layer will reshape that output to an array of shape (*, 7, 7, 256),
with * being your batch_size.
So yeah, basically, your 'image' of shape (*, 100) will be reshaped to an array of shape
(*, 7, 7, 256).
Hope this helps.
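A quick shape check of the layers from the question (a sketch assuming tf.keras; the batch size of 2 is arbitrary):
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((7, 7, 256)))

x = tf.random.normal((2, 100))  # a batch of 2 vectors of length 100
print(model(x).shape)           # (2, 7, 7, 256): the flat 7*7*256 output rearranged into a 7x7 grid with 256 channels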
This is in reference to Google's TensorFlow MNIST DCGAN tutorial.
The first dense layer at the input is configured to have 7*7*256 units, and the tutorial does not explain this choice.
My initial impression about this is as follows:
Remember we want a 28x28 greyscale image as output. That means the required output shape is (None, 28, 28, 1), where the first entry is the batch size (None when it is left unspecified).
Now note that a Conv2DTranspose layer with strides=(2, 2) essentially upsamples the input shape by a factor of 2: it doubles it. Secondly, the number of filters of the Conv2DTranspose layer becomes the number of channels; if I want the output to be greyscale, the number of filters should be one. Thus, if I want (None, 28, 28, 1) at the output of a Conv2DTranspose layer, the shape of its input should be (None, 14, 14, x). (The number of channels is decided by the current layer, so x can be any value at the input.)
Suppose I put one more Conv2DTranspose layer with strides=(2, 2) before this layer; obviously its input should then be (None, 7, 7, x), where x is the number of filters.
In general, if a batch of images of size (h, w) is input to a Conv2DTranspose layer with strides = (2,2), its output will have shape (batch_size, 2 * h, 2 * w , no_of_filters)
The Google tutorial further puts one more Conv2DTranspose layer [but with strides=(1, 1), so it does not have the upsampling effect] and a Dense layer on top of it. These layers do no upsampling, so the input shape remains 7x7; 7x7 is the image shape here. The first dense layer's output is flat, so if it has 7*7*x units, we can always reshape it to get a (7, 7, x) image.
This is the theory behind the 7*7*x number of units in the first dense layer. The value 256 they have used is an arbitrary value which they might have arrived at empirically or intuitively, I guess.
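To see the doubling rule and the final 28x28x1 output in isolation, a small sketch (assuming tf.keras; the 5x5 kernels and filter counts loosely follow the DCGAN tutorial, but any values work with padding='same'):
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 7, 7, 256))                                         # output of the Dense + Reshape block
h1 = layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same')(x)  # no upsampling: stays 7x7
h2 = layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same')(h1)  # 7x7 -> 14x14
h3 = layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same')(h2)   # 14x14 -> 28x28, 1 channel = greyscale
print(h1.shape, h2.shape, h3.shape)                                          # (1, 7, 7, 128) (1, 14, 14, 64) (1, 28, 28, 1)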

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
tf.layers.batch_normalization should be added as a layer, not as a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network. For example, you might want to penalize networks that produce high activations. You can see how activity_regularizer is called on the outputs and its result added to the loss here.
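A sketch of what the fix could look like, using the same TF 1.x tf.layers API as the question; the input placeholder and the is_training flag are assumptions (the 16x16x32 input shape is just chosen to be consistent with the shapes in the error message):
import tensorflow as tf  # assumes TF 1.x, where tf.layers and tf.placeholder are available

inputs = tf.placeholder(tf.float32, [None, 16, 16, 32])  # hypothetical input
is_training = tf.placeholder(tf.bool)                    # batch norm needs to know train vs. inference mode

h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer)
h1 = tf.layers.batch_normalization(h1, training=is_training)  # applied as its own layer on the output

h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer)
h2 = tf.layers.batch_normalization(h2, training=is_training)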