Difference between Global Pooling and (normal) Pooling Layers in keras - tensorflow

Is there any significant difference between the pooling layers? Both Max and Average pooling come in two forms (apart from the 1D/2D/3D variants), basically named GlobalPooling and (normal) Pooling. The Keras documentation does not provide much explanation of the difference.
What is the difference between these layers?

Normal pooling layers pool according to the specified pool_size, strides, and padding.
For example:
inp = Input((224, 224, 3))
x = MaxPooling2D()(inp)  # default pool_size and strides are 2
The output will have shape (112, 112, 3).
Global pooling effectively sets the pool size equal to the input's width and height, then flattens. If the input shape is (224, 224, 3) you get a tensor of shape (3); if the input is (7, 7, 1024) you get a (1024,) tensor.
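A quick way to see the difference is to compare the output shapes directly; a minimal sketch (the printed shapes include the batch dimension):
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input((224, 224, 3))
print(layers.MaxPooling2D()(inp).shape)        # (None, 112, 112, 3): spatial dims halved
print(layers.GlobalMaxPooling2D()(inp).shape)  # (None, 3): one value per channel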

Related

Why does TensorFlow Conv2D have two weights matrices?

I have a tf.keras.layers.Conv2D constructed like so:
>>> conv2d_layer = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2)
For reference, that layer is part of a network in which the preceding layer is prior_layer = Conv2D(filters=64, kernel_size=(3, 3), strides=2).
When I call conv2d_layer.get_weights(), it returns a list with two entries:
>>> [w.shape for w in conv2d_layer.get_weights()]
[(3, 3, 64, 128), (128,)]
Why are there two np.ndarrays in conv2d_layer.get_weights()? What are their respective meanings?
The first shape is for the weights (kernel) of your Conv2D layer, and the second one is its bias, which is represented by a vector.
Looking at the documentation, you can see:
For example, a Dense layer returns a list of two values: the kernel matrix and the bias vector. These can be used to set the weights of another Dense layer:
You have 128 convolution filters; each filter has a kernel and a bias. The kernel has size 3x3, and its depth is equal to the input depth (64 in this example). So a single kernel has shape (3, 3, 64), and with 128 filters the full kernel tensor has shape (3, 3, 64, 128). There is also one bias per filter, so the second weight has shape (128,).
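If you want to verify this yourself, you can build the layer against a known input shape (the 56x56 spatial size here is just a placeholder; only the 64 input channels matter for the weight shapes) and inspect the weights:
import tensorflow as tf

conv2d_layer = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2)
conv2d_layer.build((None, 56, 56, 64))  # 64 input channels, as in the example

kernel, bias = conv2d_layer.get_weights()
print(kernel.shape)             # (3, 3, 64, 128)
print(bias.shape)               # (128,)
print(kernel.size + bias.size)  # 73856 = 3*3*64*128 + 128 trainable parameters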

How Keras can calculate the number of parameters at early stage when there are still None dimensions?

Sorry for the very basic question (I'm new to Keras). I was wondering how Keras can calculate the number of parameters for each layer at an early stage (before fit), given that model.summary() shows dimensions that still have None values at that point. Are these values already determined in some way, and if so, why not show them in the summary?
I ask because I'm having a hard time figuring out my "tensor shape bug" (I'm trying to determine the output dimensions of the C5 block of my ResNet50 model, but I cannot see them in model.summary() even though I can see the number of parameters).
Below is an example based on the C5_reduced layer in RetinaNet, which is fed by the C5 layer of ResNet50. C5_reduced is:
Conv2D(256, kernel_size=1, strides=1, padding='same')
Based on model.summary for this particular layer:
C5_reduced (Conv2D) (None, None, None, 256) 524544
I've made the guess that C5 is (None, 1, 1, 2048) because 2048*256 + 256 = 524544 (I don't know how to confirm or refute that hypothesis). So if it's already known, why not show it in the summary? If dimensions 2 and 3 had been different, the number of parameters would have been different too, right?
If you pass an exact input shape to your very first layer (or an Input layer) in your network, you will get the output shapes you want. For instance, I used an Input layer here:
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
The input was passed as (224, 224, 3), where 3 represents the depth (number of channels). Note that the parameter calculation for convolutional layers differs from that for Dense layers.
If you do the following:
tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3))
You will see:
conv2d (Conv2D) ---> (None, 148, 148, 16)
The spatial dimensions are reduced to 148x148: in Keras, padding is 'valid' by default and strides is 1, so a 3x3 kernel on a 150x150 input gives a 148x148 output, as the formula below shows.
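For reference, the output size under 'valid' padding follows the usual formula:
# valid padding: floor((input_size - kernel_size) / stride) + 1
(150 - 3) // 1 + 1  # -> 148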
So then, what are the None values?
The first None value is the batch size; in Keras, the first dimension is always the batch size. You can fix it in advance, or leave it to be determined when fitting or predicting.
In 2D convolution the expected input is (batch_size, height, width, channels). You can also have shapes such as (None, None, None, 3), which means varying image sizes are allowed.
Edit:
tf.keras.layers.Input(shape = (None, None, 3)),
tf.keras.layers.Conv2D(16, (3,3), activation='relu')
Produces:
conv2d_21 (Conv2D) (None, None, None, 16) 448
Regarding your question: how are the parameters calculated even though we passed the image height and width as None?
Convolution parameters are calculated according to:
(filter_height * filter_width * input_image_channels + 1) * number_of_filters
When we put the numbers into the formula:
filter_height = 3
filter_width = 3
input_image_channel = 3
number_of_filters = 16
Parameters = (3 x 3 x 3 + 1) * 16 = 28 * 16 = 448
Notice that we only needed the input image's channel count, which is 3, representing an RGB image.
If you want to calculate the params for later convolutions, remember that the number of filters in the previous layer becomes the number of channels of the current layer.
That is why you can have None dimensions (other than the batch size) and still get parameter counts; Keras only needs to know whether your image is RGB or not in that case. Alternatively, you can leave the dimensions unspecified while creating the model and pass them when fitting the model on the dataset.
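The same formula can be wrapped in a small helper; note that it also reproduces the 524544 figure for the C5_reduced layer from the question (a 1x1 kernel over 2048 input channels, 256 filters):
def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    # (filter_height * filter_width * input_image_channels + 1) * number_of_filters
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

print(conv2d_params(3, 3, 3, 16))      # 448, the conv2d layer above
print(conv2d_params(1, 1, 2048, 256))  # 524544, matching C5_reduced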
You need to define an input layer for your model. The total number of trainable parameters is unknown until you either (a) compile the model and feed it data, at which point the model builds a graph based on the dimensions of the input and you can then determine the number of params, or (b) define an input layer for the model with the input dimensions stated, after which you can find the number of params with model.summary().
The point is that the model cannot know the number of parameters between the input and the first hidden layer until that input is defined, or until you run inference and give it the shape of the input.
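A minimal sketch of option (b): give the model a fully specified Input and the summary shows concrete output shapes (only the batch dimension stays None):
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, (3, 3), activation='relu'),
])
model.summary()
# conv2d (Conv2D)  (None, 222, 222, 16)  448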

output dimension of reshape layer

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((7, 7, 256)))
The Dense layer takes an input of dimension 1*100 and has 7*7*256 units. The Reshape layer then takes the 1*(7*7*256) output as its input, but what is its output? I mean, what does (7, 7, 256) mean?
Is it an image of dimension 7*7 if we give an image of 1*100 as input? What is it?
I'm sorry, I know I have understood this in a completely wrong way, so I want to understand it properly.
Here your model will take an input of shape (*, 100), the first Dense layer will output a shape of (*, 7*7*256), and finally the last Reshape layer will reshape that output to an array of shape (*, 7, 7, 256),
with * being your batch_size.
So yes, basically, your 'image' of shape (*, 100) will be reshaped to an array of shape (*, 7, 7, 256).
Hope this helps.
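Assuming the model was built as in the question (with input_shape given on the first Dense layer), you can confirm this directly:
print(model.output_shape)  # (None, 7, 7, 256): batch, height, width, channels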
This refers to Google's TensorFlow MNIST DCGAN tutorial.
The first Dense layer at the input is configured to have 7 * 7 * 256 units, and the tutorial gives no explanation for this.
My initial impression about this is as follows:
Remember we want a 28x28 grayscale image as output. That means the required output shape is (None, 28, 28, 1), where the first entry is the batch size (None, i.e. unspecified).
Now note that a Conv2DTranspose layer with strides=(2,2) essentially upsamples the input shape by a factor of 2; it doubles it. Secondly, the number of filters of the Conv2DTranspose layer becomes the number of output channels, so if I want the output to be grayscale, the number of filters should be one. Thus, if I want (None, 28, 28, 1) at the output of the Conv2DTranspose layer, the shape of its input should be (None, 14, 14, x). (Since the number of channels is decided by the current layer, x can be any value at the input.)
Suppose I put one more Conv2DTranspose layer with strides=(2,2) before this layer; obviously the input to that layer should be (None, 7, 7, x), where x is the number of filters.
In general, if a batch of images of size (h, w) is input to a Conv2DTranspose layer with strides=(2,2), its output will have shape (batch_size, 2 * h, 2 * w, no_of_filters).
The Google tutorial further adds one more Conv2DTranspose layer [but with strides=(1,1), so it has no upsampling effect] and the Dense layer at the very start. These layers do no upsampling, so the spatial shape remains 7x7; 7x7 is the 'image' shape here. The first Dense layer's output is flat, so if it has 7 * 7 * x units, we can always reshape it to get a (7, 7, x) image.
This is the theory behind the 7 * 7 * x units of the first Dense layer. The value 256 they used is an arbitrary choice, which they presumably arrived at empirically or intuitively.
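A sketch of the shape progression, loosely following the structure of the tutorial's generator (the kernel sizes are illustrative and the BatchNormalization/LeakyReLU layers are omitted for brevity):
import tensorflow as tf
from tensorflow.keras import layers

gen = tf.keras.Sequential([
    layers.Dense(7*7*256, use_bias=False, input_shape=(100,)),    # (None, 12544)
    layers.Reshape((7, 7, 256)),                                   # (None, 7, 7, 256)
    layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),  # (None, 7, 7, 128)
    layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),   # (None, 14, 14, 64)
    layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', activation='tanh'), # (None, 28, 28, 1)
])
gen.summary()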

CNN features(dimensions) feed to LSTM Tensorflow

So recently I have been working on a project where I take images as input to a CNN, extract the features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction, and I take the features from the fully connected layer and try to feed them to the LSTM. The problem is that when I want to feed the FC layer to the LSTM as input, I get an error about wrong dimensions. My FC layer is a tensor with shape (128, 1024). I tried to reshape it with tf.reshape(fc, [-1]), which gives me a tensor of shape (131072,),
and it still won't work. Could anyone give me ideas on how I'm supposed to feed the FC layer to the LSTM? Below is part of my code and the error I get.
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
Here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to work correctly).
If you are taking a sequence of images as input, your input tensor should be 5D, (batch, sequence_index, x, y, channel), or some permutation of that. conv2d should complain about the extra dimension, but you are probably missing one of them. You should try to fix that first.
Next, use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the operations if that's not the case).
The RNN part expects 3D tensors, (batch, sequence_index, feature_index). You can use tf.squeeze to remove the size-1 dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, that's OK, but the operations will still expect the dimension to be there (for you it will simply be 1). Dropping the dimension will cause shape problems down the line.
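A minimal sketch of that approach, written with tf.keras layers and hypothetical sizes (sequences of 8 frames of 28x28 RGB images). Here the per-timestep features are flattened with Reshape instead of being pooled down to 1x1 and squeezed, but the idea is the same:
import tensorflow as tf
from tensorflow.keras import layers

seq_len, h, w, c = 8, 28, 28, 3
inputs = tf.keras.Input(shape=(seq_len, h, w, c))            # 5D: (batch, sequence, x, y, channel)
x = layers.Conv3D(32, (1, 5, 5), activation='relu')(inputs)  # window of 1 along the sequence axis
x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
x = layers.Conv3D(64, (1, 3, 3), activation='relu')(x)
x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
x = layers.Reshape((seq_len, -1))(x)                         # 3D: (batch, sequence, features)
x = layers.LSTM(128)(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
model.summary()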

Convolutional neural network Conv1d input shape

I am trying to create a CNN to classify data. My data is X[N_data, N_features].
I want to create a neural net capable of classifying it. My problem concerns the input shape of a Conv1D for the Keras backend.
I want to repeat a filter over, let's say, 10 features and then keep the same weights for the next ten features.
For each sample, my convolutional layer would create N_features/10 new neurons.
How can I do so? What should I put in input_shape?
def cnn_model():
    model = Sequential()
    model.add(Conv1D(filters=1, kernel_size=10, strides=10,
                     input_shape=(1, 1, N_features), kernel_initializer='uniform',
                     activation='relu'))
    model.flatten()
    model.add(Dense(N_features/10, init='uniform', activation='relu'))
Any advice?
Thank you!
Try:
def cnn_model():
    model = Sequential()
    model.add(Conv1D(filters=1, kernel_size=10, strides=10,
                     input_shape=(N_features, 1), kernel_initializer='uniform',
                     activation='relu'))
    model.add(Flatten())
    model.add(Dense(N_features // 10, kernel_initializer='uniform', activation='relu'))
....
And reshape your x to shape (nb_of_examples, nb_of_features, 1).
EDIT:
Conv1D was designed for sequence analysis: to have convolutional filters that are the same no matter which part of the sequence we are in. The second dimension is the so-called feature dimension, where you can have a vector of multiple features at each timestep. One may think of the sequence dimension the same way as the spatial dimensions in Conv2D, and of the feature dimension the same way as the channel (color) dimension. As @putonspectacles mentioned in his comment, you may set the sequence dimension to None in order to make your network input-length invariant.
@Marcin's answer might work, but my suggestion, given the documentation here:
When using this layer as the first layer in a model, provide an
input_shape argument (tuple of integers or None, e.g. (10, 128) for
sequences of 10 vectors of 128-dimensional vectors, or (None, 128) for
variable-length sequences of 128-dimensional vectors.
would be:
model = Sequential()
model.add(Conv1D(filters=1, kernel_size=10, strides=10,
                 input_shape=(None, N_features), kernel_initializer='uniform',
                 activation='relu'))
Note that the input data has shape (N_Data, N_features); the first dimension of input_shape (the sequence length) is left unspecified (None). The strides argument controls the size of the timesteps in this case.
To feed an ordinary feature table of shape (nrows, ncols) to Conv1D in Keras, the following two steps are needed:
xtrain = xtrain.reshape(nrows, ncols, 1)
# For conv1d statement:
input_shape = (ncols, 1)
For example, taking the first 4 features of the iris dataset:
To see the usual format and its shape:
iris_array = np.array(irisdf.iloc[:,:4].values)
print(iris_array[:5])
print(iris_array.shape)
The output shows the usual format and its shape:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]]
(150, 4)
The following code alters the format:
nrows, ncols = iris_array.shape
iris_array = iris_array.reshape(nrows, ncols, 1)
print(iris_array[:5])
print(iris_array.shape)
Output of the above code, showing the new data format and its shape:
[[[5.1]
[3.5]
[1.4]
[0.2]]
[[4.9]
[3. ]
[1.4]
[0.2]]
[[4.7]
[3.2]
[1.3]
[0.2]]
[[4.6]
[3.1]
[1.5]
[0.2]]
[[5. ]
[3.6]
[1.4]
[0.2]]]
(150, 4, 1)
This works well for Conv1D in Keras. The required input_shape is (4, 1).
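A minimal sketch of a Conv1D model that accepts this reshaped iris array (layer sizes and the training call are illustrative):
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv1D(8, kernel_size=2, activation='relu', input_shape=(4, 1)),
    layers.Flatten(),
    layers.Dense(3, activation='softmax'),  # 3 iris classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
# model.fit(iris_array, labels, epochs=10)  # labels: integer class ids, shape (150,)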