Why does TensorFlow Conv2D have two weights matrices? - tensorflow

I have a tf.keras.layers.Conv2D constructed like so:
>>> conv2d_layer = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2)
For reference, that layer is part of a network where the prior layer is prior_layer = Conv2D(filters=64, kernel_size=(3, 3), strides=2).
When I call conv2d_layer.get_weights(), it returns a list with two entries:
>>> [w.shape for w in conv2d_layer.get_weights()]
[(3, 3, 64, 128), (128,)]
Why are there two np.ndarrays in conv2d_layer.get_weights()? What are their respective meanings?

The first shape is for the weights of your conv2D, and the second one is the bias for the same layer, which is represented by a vector.
Looking at the documentation, you can see
For example, a Dense layer returns a list of two values: the kernel matrix and the bias vector. These can be used to set the weights of another Dense layer:

You have 128 convolution filters; each filter has a kernel and a bias. The kernel is 3x3 spatially, and its depth equals the input depth (64 in this example, since the prior layer has 64 filters). So a single kernel has shape (3, 3, 64), and stacking all 128 filters gives the weight shape (3, 3, 64, 128). There is also one bias per filter, so the second array has shape (128,).
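As a quick check, here is a minimal sketch (with a hypothetical 64-channel input, so the spatial size is arbitrary) that builds such a layer and prints the two weight arrays:
import tensorflow as tf

x = tf.zeros((1, 32, 32, 64))  # hypothetical input: one sample with 64 channels, matching the prior Conv2D
conv2d_layer = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2)
_ = conv2d_layer(x)  # calling the layer builds (creates) its weights

kernel, bias = conv2d_layer.get_weights()
print(kernel.shape)  # (3, 3, 64, 128): (kernel_h, kernel_w, input_channels, filters)
print(bias.shape)    # (128,): one bias per filter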

Related

Difference between Global Pooling and (normal) Pooling Layers in keras

Is there any significant difference between the pooling layers? Apart from the 1-D, 2-D, and 3-D variants, there are basically two types of Max and Average pooling, named GlobalPooling and (normal) Pooling. The Keras documentation does not explain the difference in much detail.
What is the difference between these layers?
Normal pooling layers perform pooling according to the specified pool_size, strides, and padding.
For example
inp = Input((224, 224, 3))
x = MaxPooling2D()(inp)  # default pool_size and strides are both 2
The output will have shape (112, 112, 3).
Global pooling is like making the pool size equal to the width and height of the input and then flattening. If the input shape is (224, 224, 3) you get a tensor of shape (3); if the input is (7, 7, 1024) you get (1024).
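A small shape-only sketch of the difference, using the standard Keras layers and arbitrary input sizes:
import tensorflow as tf

x = tf.zeros((1, 224, 224, 3))                            # a batch of one RGB image
print(tf.keras.layers.MaxPooling2D()(x).shape)            # (1, 112, 112, 3)
print(tf.keras.layers.GlobalMaxPooling2D()(x).shape)      # (1, 3)

y = tf.zeros((1, 7, 7, 1024))
print(tf.keras.layers.GlobalAveragePooling2D()(y).shape)  # (1, 1024)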

Why Conv2D has different number of filters in each layer

Learning from this Keras documentation example:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',  # why is the filter count 32?
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))  # why is the filter count not changed?
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))  # why is the filter count changed to 64?
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))  # why 512 Dense neurons and not 1024? what is the rule for setting this number?
Here are my questions:
Why is the number of filters 32 in the first Conv2D, and why is it left unchanged in the second Conv2D of the same block?
Why is the number of filters changed to 64 in the next block? What is the rule for setting this number?
Why does the Dense layer have 512 neurons and not 1024? What is the rule for setting this number?
Why is the number of filters 32 in the first Conv2D, and why is it left unchanged in the second Conv2D of the same block?
The number of filters can be any arbitrary number; it is just a matter of how many kernels that layer has. Each filter performs a separate convolution over all channels of its input, so 32 filters perform 32 separate convolutions over all RGB channels of the input.
Why is the number of filters changed to 64 in the next block? What is the rule for setting this number?
Again, following the first answer, the number of filters in each layer can be anything. Here, for example, the second block has 64 filters performing 64 separate convolutions over all 32 channels of the output of the first block.
Why does the Dense layer have 512 neurons and not 1024? What is the rule for setting this number?
Again, the Dense layer can have any number of neurons. For example, if you have a 64x64x3 RGB input, the last convolution block will produce a (batch_size, 16, 16, 64) output (assuming padding='same' on the convolutions and a stride of (2, 2) on the max-pooling layers).
After going through the Flatten() layer this becomes a (batch_size, 16*16*64) output. That is the input to the Dense layer, which produces a (batch_size, 512) output because the Dense layer has 512 neurons. To be exact, the Dense layer performs the matrix multiplication (batch_size, 16*16*64) x (16*16*64, 512), which results in a (batch_size, 512) output.
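To make the shape bookkeeping concrete, here is a small sketch of that example, assuming a 64x64x3 input and padding='same' on every Conv2D so that only the pooling layers shrink the spatial grid:
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),   # 64x64 -> 32x32
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),   # 32x32 -> 16x16
    layers.Flatten(),              # (batch_size, 16*16*64) = (batch_size, 16384)
    layers.Dense(512),             # multiplies by a (16384, 512) kernel
])
print(model.layers[-1].kernel.shape)  # (16384, 512)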
Note: To set these parameters, best way would be to do hyperparameter optimization w.r.t your dataset.
Edit: what I mean by separate convolutions
In the illustration from the original answer, each filter is drawn in a single color. That illustration is for a 1D convolution (with padding='valid'), but the idea carries over: the filters are separate and randomly initialized, and over time they learn different features.

output dimension of reshape layer

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((7, 7, 256)))
The Dense layer takes an input of dimension 1*100 and has 7*7*256 nodes. The Reshape layer takes the 1*(7*7*256) output as its input, but what is its output? What does (7, 7, 256) mean?
Is it an image of dimension 7*7 if we give an image of 1*100 as input? What is it?
I am sorry, I know I may have understood this completely wrong, so I want to understand it.
Here your model takes an input of shape (*, 100), the first Dense layer outputs a shape of (*, 7*7*256), and finally the Reshape layer reshapes that output to an array of shape (*, 7, 7, 256), with * being your batch_size.
So basically, your 'image' of shape (*, 100) is reshaped to an array of shape (*, 7, 7, 256).
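A minimal sketch that builds the same layers and prints the resulting output shape:
import tensorflow as tf
from tensorflow.keras import layers

# Same layers as in the question; None in the printed shape is the batch size (*).
model = tf.keras.Sequential([
    layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(100,)),
    layers.BatchNormalization(),
    layers.LeakyReLU(),
    layers.Reshape((7, 7, 256)),
])
print(model.output_shape)  # (None, 7, 7, 256)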
Hope this will help you
This refers to Google's TensorFlow MNIST DCGAN tutorial.
The first Dense layer at the input is configured to have 7 * 7 * 256 units, and the tutorial gives no explanation for this choice.
My initial impression about this is as follows:
Remember we want a 28x28 greyscale image as output. That means the required output shape is (None, 28, 28, 1), where the first entry is the batch size, which is None because it is not fixed.
Now note that a Conv2DTranspose layer with strides=(2, 2) essentially upsamples the input, doubling its spatial dimensions. Secondly, the number of filters of the Conv2DTranspose layer becomes the number of output channels, so if I want the output to be greyscale, the number of filters should be one. Thus, if I want (None, 28, 28, 1) at the output of the Conv2DTranspose layer, the shape of its input should be (None, 14, 14, x). (The number of channels is decided by the current layer, so x can be any value at the input.)
Suppose I put one more Conv2DTranspose layer with strides=(2, 2) before this layer; obviously the input to that layer should then be (None, 7, 7, x), where x is the number of channels at that point.
In general, if a batch of images of size (h, w) is input to a Conv2DTranspose layer with strides=(2, 2), its output will have shape (batch_size, 2 * h, 2 * w, no_of_filters).
The Google tutorial further puts one more Conv2DTranspose layer [but with strides=(1, 1), so it has no upsampling effect] before that, preceded by the Dense layer. These layers do no upsampling, so the shape stays 7x7; 7x7 is the image shape at that point. The first Dense layer's output is flat, so if it has 7 * 7 * x units we can always reshape it into a (7, 7, x) image.
This is the theory behind the 7 * 7 * x number of units in the first Dense layer. The value 256 they used is an arbitrary one which, I guess, they derived empirically or intuitively.
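A shape-only sketch of the upsampling path described above (the filter counts 128 and 64 are illustrative rather than the tutorial's exact values, and the BatchNormalization/LeakyReLU layers are omitted):
import tensorflow as tf
from tensorflow.keras import layers

generator = tf.keras.Sequential([
    layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(100,)),
    layers.Reshape((7, 7, 256)),                                          # (None, 7, 7, 256)
    layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same'),  # (None, 7, 7, 128)
    layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same'),   # (None, 14, 14, 64)
    layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same'),    # (None, 28, 28, 1)
])
print(generator.output_shape)  # (None, 28, 28, 1)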

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
tf.layers.batch_normalization should be added as a layer, not as a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network; for example, you might want to penalize networks that produce high activations. You can see in the TensorFlow layer implementation how activity_regularizer is called on the outputs and its result is added to the loss.
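A sketch of the fix in the same TF 1.x style as the question, applying batch normalization as its own layer on each conv output (inputs and is_training are hypothetical placeholders used only for illustration):
import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 16, 16, 128])  # hypothetical input
is_training = tf.placeholder(tf.bool)

h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer())
h1 = tf.layers.batch_normalization(h1, training=is_training)  # batch norm as a layer, not a regularizer

h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer())
h2 = tf.layers.batch_normalization(h2, training=is_training)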

regarding shape values in convolutional layer of CIFAR10 example

In the CIFAR10 example, conv2 is defined as follows. How do we know that shape=[5, 5, 64, 64] in kernel = _variable_with_weight_decay should be given those values, i.e., 5, 5, 64, 64? In addition, in biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1)), the shape is defined as [64]; how are those values obtained?
# conv2
with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv2)
Looking at the source, we see that a call to _variable_with_weight_decay boils down to a tf.get_variable call: we are retrieving a weight tensor (creating one if it doesn't already exist).
In a Convolutional Neural Network, the weight tensor defines a mapping from one layer to the next, but it differs from a vanilla NN: the convolution means you are applying a convolutional filter to your input as you map from one layer to the next. This filter is defined by hyper-parameters, which are the ones fed into shape.
Four parameters are fed into shape. The first two give the spatial size of the convolution filter; in this case we have a 5x5 filter. The third parameter defines the input depth, which in this case is the same as the output depth of the previous convolution (conv1):
kernel = _variable_with_weight_decay('weights',
                                     shape=[5, 5, 3, 64],
                                     stddev=5e-2,
                                     wd=0.0)
The fourth parameter defines the output depth of the tensor, i.e. the number of filters.
The bias is a perturbation added to the output of the convolution to help learning. By basic linear algebra (the bias is added channel-wise), the bias vector must have the same size as the number of output channels, which in this case is 64.
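A quick shape check in the same TF 1.x style as the example (the 12x12 spatial size of norm1 is only an assumption for illustration; it depends on the earlier layers):
import tensorflow as tf

norm1 = tf.placeholder(tf.float32, [None, 12, 12, 64])  # 64 channels coming from conv1
kernel = tf.get_variable('conv2_weights', shape=[5, 5, 64, 64])
conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
biases = tf.get_variable('conv2_biases', [64], initializer=tf.constant_initializer(0.1))
bias = tf.nn.bias_add(conv, biases)
print(bias.shape)  # (?, 12, 12, 64): the depth matches the kernel's 64 output filters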
Cheers!