output dimension of reshape layer - tensorflow

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((7, 7, 256)))
The dense layer takes an input of dimension 1*100. It has 7*7*256 units in its layer. The Reshape layer takes 1*(7*7*256) as input, but what is its output? I mean, what does (7, 7, 256) mean?
Is it an image of dimension 7*7 if we give an image of 1*100 as input? What is it?
I am sorry, I know that I have understood it in a completely wrong way. So I wanted to understand it.

Here your model will take an input shape of (*, 100), the first Dense layer will output a shape of (*, 7*7*256), and finally the last Reshape layer will reshape that output into an array of shape (*, 7, 7, 256),
with * being your batch_size.
So yeah, basically, your 'image' of shape (*, 100) will be reshaped to an array of shape (*, 7, 7, 256).
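You can check this quickly; below is a minimal sketch (assuming TensorFlow 2.x) that just rebuilds the layers from the question and prints the output shape after each layer:
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(7*7*256, use_bias=False, input_shape=(100,)),
    layers.BatchNormalization(),
    layers.LeakyReLU(),
    layers.Reshape((7, 7, 256)),
])
for layer in model.layers:
    print(layer.name, layer.output_shape)
# dense               (None, 12544)
# batch_normalization (None, 12544)
# leaky_re_lu         (None, 12544)
# reshape             (None, 7, 7, 256)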
Hope this helps.

This has reference to Google's TensorFlow MNIST DCGAN tutorial.
The first dense layer at the input is configured to have 7 * 7 * 256 units, and we are not able to find an explanation for this in the tutorial.
My initial impression about this is as follows:
Remember we want a 28x28 grayscale image as output. That means the required output shape is (None, 28, 28, 1), where the first entry is the batch size, which is None if a single image is required.
Now note that a Conv2DTranspose layer with strides=(2,2) essentially upsamples the input shape by a factor of 2, i.e. it doubles it. Secondly, the number of filters of a Conv2DTranspose layer becomes the number of channels of its output; if I want the output to be grayscale, the number of filters should be one. Thus, if I want (None, 28, 28, 1) at the output of the Conv2DTranspose layer, the shape of its input should be (None, 14, 14, x). (The number of output channels is decided by the current layer, so x can be any value at the input.)
Suppose I put one more Conv2DTranspose layer with strides=(2,2) before this layer; obviously the input to that layer should be (None, 7, 7, x), where x is the number of channels.
In general, if a batch of images of size (h, w) is input to a Conv2DTranspose layer with strides=(2,2), its output will have shape (batch_size, 2 * h, 2 * w, no_of_filters).
The Google tutorial further puts one more Conv2DTranspose layer [but with strides=(1,1), so it does not have the upsampling effect] before these, with a Dense layer at the input. These layers do not upsample, so the spatial shape remains 7x7; 7x7 is the "image" shape at this stage. The first dense layer's output is flattened, so if it has 7 * 7 * x units, we can always reshape it to get a (7, 7, x) feature map.
This is the theory behind the 7 * 7 * x units of the first dense layer. The value 256 they have used is an arbitrary value which they might have derived empirically or intuitively, I guess.
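For concreteness, here is a shape walk-through in the spirit of the tutorial's generator (a sketch, not the exact tutorial code; the filter counts 128 and 64 follow the tutorial but are otherwise arbitrary):
import tensorflow as tf
from tensorflow.keras import layers

generator = tf.keras.Sequential([
    layers.Dense(7*7*256, use_bias=False, input_shape=(100,)),   # (None, 12544)
    layers.BatchNormalization(),
    layers.LeakyReLU(),
    layers.Reshape((7, 7, 256)),                                 # (None, 7, 7, 256)
    layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),  # (None, 7, 7, 128)
    layers.BatchNormalization(),
    layers.LeakyReLU(),
    layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),   # (None, 14, 14, 64)
    layers.BatchNormalization(),
    layers.LeakyReLU(),
    layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'),  # (None, 28, 28, 1)
])
print(generator.output_shape)  # (None, 28, 28, 1)
Each stride-2 transposed convolution doubles the spatial size, and the last layer's single filter gives the grayscale channel.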

Related

Understanding basic Keras Conv2DTranspose example

This is definitely a basic question, but I'm having trouble understanding exactly what is going on with Keras's layers.Conv2DTranspose function. I have the following three lines:
Setup
model = tf.keras.Sequential()
...
model.add(layers.Reshape((10, 10, 256)))
model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
assert model.output_shape == (None, 10, 10, 128)
The first occurrence of Reshape gets me a tensor of shape [10x10x256].
In the Conv2DTranspose layer, somehow I'm sliding a filter of shape [5x5] along this tensor and ending up with a new tensor of shape [10x10x128].
Question
What mathematically is happening to get me from the first tensor [10x10x256] to the second [10x10x128]?
It's almost the same as a convolution, but with fancy padding to give the feeling of doing a backward convolution.
The sliding window in your picture is correctly positioned.
But it's not a "window", it is actually a "sliding block". The block is 256 deep.
So, it multiplies and sums over all the channels at each stride.
But then there are 128 different sliding blocks (as you defined in your layer with filters=128). Each of these 128 sliding blocks produces a separate output channel.
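As a quick sanity check on the shapes (a minimal sketch, assuming TensorFlow 2.x):
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Reshape((10, 10, 256), input_shape=(10 * 10 * 256,)),
    layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),
])
print(model.output_shape)             # (None, 10, 10, 128)
# each of the 128 filters is a 5x5 block spanning all 256 input channels
print(model.layers[-1].kernel.shape)  # (5, 5, 128, 256)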
Great explanations about transposed convolutions: https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers

Convolutional Neural Network (CNN) input shape

I am new to CNNs and I have a question regarding them. I am a bit confused about the input shape of a CNN (specifically with Keras).
My data is 2D (let's say 10x10) in different time slots, so overall I have 3D data.
I am going to feed this data to my model to predict the coming time slot. So I will have a certain number of time slots for prediction (let's say 10 slots; so far, I may have 10x10x10 data).
Now, my question is whether I have to treat this data as a 2D image with 10 channels (like ordinary data in CNNs, e.g. RGB images) or as 3D data (Conv2D or Conv3D in Keras).
Thank you in advance for your help.
In your case, Conv2D will be useful. Please refer to the description below for understanding the input shape of a Convolutional Neural Network (CNN) using Conv2D.
Let's see what the input shape looks like. We are assuming that our data is a collection of images.
The input shape is (batch_size, height, width, channels). An RGB image would have 3 channels and a grayscale image would have 1 channel.
Let’s look at the following code
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

model = tf.keras.models.Sequential()
model.add(Conv2D(filters=64, kernel_size=1, input_shape=(10, 10, 3)))
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 10, 10, 64) 256
=================================================================
Though it looks like our input shape is 3D, you have to pass a 4D array at the time of fitting the data, which should be like (batch_size, 10, 10, 3). Since there is no batch size value in the input_shape argument, we can go with any batch size while fitting the data.
The output shape is (None, 10, 10, 64). The first dimension represents the batch size, which is None at the moment. Because the network does not know the batch size in advance.
Note: Once you fit the data, None would be replaced by the batch size you give while fitting the data.
Let’s look at another code with batch Size
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

model = tf.keras.models.Sequential()
model.add(Conv2D(filters=64, kernel_size=1, batch_input_shape=(16, 10, 10, 3)))
model.summary()
Output:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (16, 10, 10, 64) 256
=================================================================
Here I have replaced the input_shape argument with batch_input_shape. As the name suggests, this argument asks for the batch size in advance, and you cannot provide any other batch size at the time of fitting the data.
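Applied to the question's 10x10 grids over 10 time slots, the two options look roughly like this (a sketch; whether to treat the 10 time slots as channels or as an explicit third axis is exactly the choice being asked about):
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Conv3D

# Option 1: treat the 10 time slots as channels of a 2D "image"
# input per sample: (height=10, width=10, channels=10)
model_2d = tf.keras.models.Sequential([
    Conv2D(filters=64, kernel_size=3, padding='same', input_shape=(10, 10, 10)),
])

# Option 2: treat time as a third spatial axis with a single channel
# input per sample: (depth=10, height=10, width=10, channels=1)
model_3d = tf.keras.models.Sequential([
    Conv3D(filters=64, kernel_size=3, padding='same', input_shape=(10, 10, 10, 1)),
])

print(model_2d.output_shape)  # (None, 10, 10, 64)
print(model_3d.output_shape)  # (None, 10, 10, 10, 64)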

Difference between Global Pooling and (normal) Pooling Layers in keras

Is there any significant difference between the pooling layers? For both max and average pooling (apart from the 1D/2D/3D variants) there are two kinds, GlobalPooling and (normal) Pooling. The Keras documentation does not provide much explanation of the difference.
What is the difference between these layers?
Normal pooling layers pool according to the specified pool_size, strides, and padding.
For example
from tensorflow.keras.layers import Input, MaxPooling2D
inp = Input((224, 224, 3))
x = MaxPooling2D()(inp)  # default pool_size is (2, 2); strides default to pool_size
The output will have shape (112, 112, 3).
Global pooling effectively makes the pool size equal to the input's width and height and then flattens, so each channel is reduced to a single value. If the input shape is (224, 224, 3) you will get a tensor of shape (3); if the input is (7, 7, 1024) you will get (1024).
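Here is a small shape comparison of the two (a sketch, assuming tf.keras):
import tensorflow as tf
from tensorflow.keras.layers import Input, MaxPooling2D, GlobalMaxPooling2D

inp = Input((224, 224, 3))
print(MaxPooling2D()(inp).shape)        # (None, 112, 112, 3) - spatial dims halved
print(GlobalMaxPooling2D()(inp).shape)  # (None, 3) - one value per channel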

CNN features(dimensions) feed to LSTM Tensorflow

So recently I have been working on a project in which I am supposed to take images as input to a CNN, extract the features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction, and I am taking the features from the fully connected layer and trying to feed them to the LSTM. The problem is that when I want to feed the FC layer to the LSTM as input, I get an error about a wrong dimension. My FC layer is a tensor with dimension (128, 1024). I tried to reshape it like this, tf.reshape(fc, [-1]), which gives me a tensor of
dimension (131072,), and it still won't work. Could anyone give me any ideas of how I am supposed to feed the FC layer to the LSTM? Here I write part of my code and the error I get.
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to work correctly).
If you are taking a sequence of images as input, your input tensor should be 5D, (batch, sequence_index, x, y, channel), or some permutation of that. conv2d should complain about the extra dimension, so you are probably missing one of them. You should fix that first.
Next, use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the operations if that's not the case).
The RNN part expects 3D tensors of shape (batch, sequence_index, feature_index). You can use tf.squeeze to remove the 1-sized dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, that's OK, but the operations will still expect the batch dimension to be there (for you it will just be 1). Leaving the dimension out will cause problems with shapes down the line.
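A rough sketch of that shape flow (TF 1.x style to match the question's code; the image size, filter counts, and n_hidden are illustrative placeholders, and tf.nn.dynamic_rnn is used here instead of static_rnn because it accepts the 3D (batch, sequence, features) tensor directly):
import tensorflow as tf

n_frames, n_hidden = 10, 64
# 5D input: (batch, sequence_index, height, width, channels)
x = tf.placeholder(tf.float32, [None, n_frames, 28, 28, 1])
# depth window of 1 so the frames do not interact at this stage
net = tf.layers.conv3d(x, 32, (1, 5, 5), activation=tf.nn.relu)    # (?, 10, 24, 24, 32)
net = tf.layers.max_pooling3d(net, (1, 2, 2), (1, 2, 2))           # (?, 10, 12, 12, 32)
net = tf.layers.conv3d(net, 64, (1, 3, 3), activation=tf.nn.relu)  # (?, 10, 10, 10, 64)
net = tf.layers.max_pooling3d(net, (1, 2, 2), (1, 2, 2))           # (?, 10, 5, 5, 64)
net = tf.layers.max_pooling3d(net, (1, 5, 5), (1, 5, 5))           # (?, 10, 1, 1, 64)
# squeeze the 1-sized spatial dims -> 3D (batch, sequence_index, feature_index)
feat = tf.squeeze(net, axis=[2, 3])                                # (?, 10, 64)
cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(cell, feat, dtype=tf.float32)  # outputs: (?, 10, 64)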

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
tf.layers.batch_normalization should be added as a layer, not as a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network. For example, you might want to penalize networks that produce high activations. You can see in the TensorFlow source that activity_regularizer is called on the outputs and its result is added to the loss.
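A sketch of the same two layers with batch norm applied as a layer instead (TF 1.x style to match the question; is_training is an assumed boolean placeholder, and the initializer arguments are omitted for brevity):
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME')
h1 = tf.layers.batch_normalization(h1, training=is_training)  # is_training: assumed tf.placeholder(tf.bool)
h1 = tf.nn.relu(h1)
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME')
# note: tf.layers.batch_normalization creates update ops for its moving statistics;
# in TF 1.x you must run them, e.g. by wrapping the train op in
# tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS))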