TensorflowJS: What is the parameter filters?

I'm learning TensorFlow.js and I'm working on CNNs.
I'm following this tutorial, and in it you have to configure the first layer like this:
// In the first layer of our convolutional neural network we have
// to specify the input shape. Then we specify some parameters for
// the convolution operation that takes place in this layer.
model.add(tf.layers.conv2d({
  inputShape: [IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS],
  kernelSize: 5,
  filters: 8,
  strides: 1,
  activation: 'relu',
  kernelInitializer: 'varianceScaling'
}));
filters. The number of filter windows of size kernelSize to apply to the input data. Here, we will apply 8 filters to the data.
Despite that short explanation, I still don't understand what the filters are :( Can somebody explain it to me?
Thank you.

This isn't a precise definition, but perhaps it will help with intuition. Filters are like channels. If you have a 28 x 28 pixel image that holds RGB colors, you can say the picture has dimensions 28 x 28 x 3, where 3 = [red, green, blue]. If you set filters to 10 (and assume the first two dimensions stay the same), you get a 28 x 28 x 10 output for your original input. This is very useful for feature detection, but it is expensive to compute.
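To make the effect of filters concrete, here is a minimal Python/Keras sketch (the question uses TensorFlow.js, but tf.layers.conv2d there behaves the same way); the 28 x 28 RGB input and the choice of 10 filters are just illustrative:

import tensorflow as tf

# A dummy batch with one 28 x 28 RGB image: (batch, height, width, channels).
x = tf.zeros([1, 28, 28, 3])

# 10 filters: each filter is a 5 x 5 x 3 kernel that scans the image and
# produces one output channel (one feature map).
conv = tf.keras.layers.Conv2D(filters=10, kernel_size=5, padding='same')
y = conv(x)

print(y.shape)            # (1, 28, 28, 10) -> one output channel per filter
print(conv.kernel.shape)  # (5, 5, 3, 10)   -> [height, width, in_channels, filters]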

Related

understanding tensorflow conv1d trainable variable shape

I'm using Keras with TensorFlow 2 and I have a trained model with the weights corresponding to each layer of my model, but the shape of some conv1d layers confused me.
I set the convolutional layers to have 64 filters with a length of 16, but the shape of my weight tensor ends up as (16, 64, 64).
Can someone explain this to me? I suppose that 16 is the length of every filter and the last 64 is my num_filters, but what is the other one? I mean, how is this 3-dimensional? It should be (16, 64) or something.
And besides, isn't it odd to specify the length of every filter on the z-axis? (Assuming the computer-science convention of representing dimensions as z, x, y instead of x, y, z.)
What I get is something like this:
name: conv1d/kernel:0  shape: (16, 64, 64)  dtype: <dtype: 'float32'>  numpy=...
Thank you guys in advance.
To answer my own question: the first 64 corresponds to the depth of the data we are facing. For instance, if you want 5 filters of length 32 for data that has 10 features (in other words, the input depth of the conv layer is 10), your variable shape will be (32, 10, 5).
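A quick way to verify this is to build a throwaway Conv1D layer and inspect its kernel; the numbers below mirror the self-answer (5 filters of length 32 on data with 10 features), and the 100 time steps are just an illustrative choice:

import tensorflow as tf

# A batch of sequences with 100 time steps and 10 features (input depth 10).
x = tf.zeros([1, 100, 10])

conv = tf.keras.layers.Conv1D(filters=5, kernel_size=32)
conv(x)  # call the layer once so the kernel variable gets created

# Kernel layout is (kernel_size, input_channels, filters).
print(conv.kernel.shape)  # (32, 10, 5)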

How does the conv2D function change the input layer?

In my ResNet32 network coded using TensorFlow, the input size is 32 x 32 x 3 and the output of the layer is 32 x 32 x 32. Why are 32 channels used?
tf.contrib.layers.conv2d(
    inputs,
    num_outputs,  # how do I determine the number of channels to be used in my layer?
    kernel_size,
    stride=1,
    padding='SAME',
    data_format=None,
    rate=1,
    activation_fn=tf.nn.relu,
    normalizer_fn=None,
    normalizer_params=None,
    weights_initializer=initializers.xavier_initializer(),
    weights_regularizer=None,
    biases_initializer=tf.zeros_initializer(),
    biases_regularizer=None,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    scope=None
)
Thanks in advance.
The 3 in the input indicates that the input image is RGB (a color image); these are also known as color channels. If it were a black and white image, it would have been 1 (a monochrome image).
The 32 in the output represents the number of neurons / features / channels the layer produces, so you are re-representing the 3-channel image with 32 channels.
This helps the network learn a more complex and varied set of features of the image. For example, it can help the network learn better edges.
By setting stride=2 you can reduce the spatial size of the input tensor so that the height and width of the output tensor become half those of the input. That means that if your input tensor has shape (batch, 32, 32, 3) (the 3 is the RGB channels) and it goes through a convolution layer with 32 kernels/filters and stride=2, the output tensor will have shape (batch, 16, 16, 32). Alternatively, pooling is also widely used to reduce the output tensor size.
The ability to learn hierarchical representations by stacking conv layers is considered key to the success of CNNs. In a CNN, as we go deeper the spatial size of the tensor shrinks while the number of channels grows, which helps handle variation in the appearance of complex target objects. This reduction of spatial size also drastically decreases the number of arithmetic operations and the computation time needed to extract the prominent features that contribute to the final output/decision. However, finding the optimal number of filters/kernels/output channels is time consuming, so people usually follow proven earlier architectures, e.g. VGG.
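A small Keras sketch of the shapes described above (32 filters on a 32 x 32 x 3 input, with and without stride 2); it only illustrates the shape bookkeeping and is not the original tf.contrib code:

import tensorflow as tf

x = tf.zeros([8, 32, 32, 3])  # a batch of 8 RGB images

# 32 filters, stride 1, 'same' padding: spatial size is preserved.
y1 = tf.keras.layers.Conv2D(32, 3, strides=1, padding='same')(x)
print(y1.shape)  # (8, 32, 32, 32)

# The same layer with stride 2: height and width are halved.
y2 = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same')(x)
print(y2.shape)  # (8, 16, 16, 32)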

Does the shape of a tensor for an image affect the resulting output?

I am representing images of size 100px by 100px, so I can use the shape (None, 100, 100, 3) or the shape (None, 10000, 3).
I can't find any clear explanation on Google; will the following two shapes give similar results?
(None, 100, 100, 3)
(None, 10000, 3)
I assume either is sufficient, as I would have thought the neural network will still learn just as well if the image is laid out in a single row. Your thoughts?
For the 1st shape, (100, 100, 3):
This is a 3-dimensional tensor. If you are working with Dense layers, they require two-dimensional input. Yes, 1D convolutional layers exist, but they are reserved for quite different use cases.
A convolutional layer passes a kernel over the input with definite strides and gathers spatial information. The output is then pooled so that the information is retained but with fewer dimensions.
Hence, learning with this shape is far better, as spatial features are learned. This is excellent for image classification.
For the 2nd shape, (10000, 3):
This is a 2-dimensional tensor and would work with 1D convolutional layers and Dense layers.
1D convolutions pass the kernel along only one axis. The features of the image get lined up in a single row (all the columns end up concatenated), which destroys the spatial structure of the image.
Hence, in the end, an image is a 2D object and must be kept in its original dimensions to facilitate learning. A 1D tensor has other uses such as text classification, human activity recognition, etc.
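A minimal sketch of the two layouts, assuming 100 x 100 RGB images (the batch size and filter counts are arbitrary); note how the 2D path keeps spatial neighbourhoods while the flattened path does not:

import tensorflow as tf

x = tf.zeros([4, 100, 100, 3])            # spatial layout intact
flat = tf.reshape(x, [4, 100 * 100, 3])   # the same pixels as one long row

# A 2D convolution sees 3 x 3 neighbourhoods of pixels.
y2d = tf.keras.layers.Conv2D(16, 3, padding='same')(x)
print(y2d.shape)  # (4, 100, 100, 16)

# A 1D convolution only sees 3 consecutive entries of the flattened row, so
# pixels that were vertical neighbours are now 100 positions apart.
y1d = tf.keras.layers.Conv1D(16, 3, padding='same')(flat)
print(y1d.shape)  # (4, 10000, 16)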

Reusing layer weights in Tensorflow

I am using tf.slim to implement an autoencoder. It's fully convolutional with the following architecture:
[conv, outputs = 1] => [conv, outputs = 15] => [conv, outputs = 25] =>
[conv_transpose, outputs = 25] => [conv_transpose, outputs = 15] => [conv_transpose, outputs = 1]
It has to be fully convolutional and I cannot do pooling (limitations of the larger problem). I want to use tied weights, so
encoder_W_3 = decoder_W_1_Transposed
(so the weights of the first decoder layer are the ones of the last encoder layer, transposed).
If I reuse the weights the regular way tf.slim lets you reuse them, i.e. reuse = True and then just providing the scope name of the layer you want to reuse, I get a size issue:
ValueError: Trying to share variable cnn_block_3/weights, but specified shape (21, 11, 25, 25) and found shape (21, 11, 15, 25).
This makes sense, if you do not transpose the weights of the previous model. Does anyone have an idea on how I can transpose those weights?
PS: I know this is very abstract and hand-waving, but I am working with a custom api, on top of tfslim, so I can't post code examples here.
Does anyone have an idea on how I can transpose those weights?
Transposition is simple:
new_weights = tf.transpose(weights, perm=[0, 1, 3, 2])
will swap the last two axes.
However, as @Seven mentioned, that wouldn't be enough to address the error, as the total number of weights changed.
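A hypothetical sketch with the shapes from the error message shows why (the kernel layout is [height, width, in_channels, out_channels]):

import tensorflow as tf

# Encoder weights of the last conv layer: 15 input channels -> 25 output channels.
encoder_w = tf.zeros([21, 11, 15, 25])

# Swapping the last two axes gives a kernel that maps 25 channels back to 15...
decoder_w = tf.transpose(encoder_w, perm=[0, 1, 3, 2])
print(decoder_w.shape)  # (21, 11, 25, 15)

# ...but the decoder layer in the error expects (21, 11, 25, 25), which contains
# more weights, so the variable cannot simply be shared and transposed here.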

How to understand the convolution parameters in tensorflow?

When I was reading the chapter "Deep MNIST for Experts" in the TensorFlow tutorial,
it gave the function below for the weights of the first layer. I can't understand why the patch size is 5*5 and why the number of features is 32. Are they arbitrary numbers that you can pick freely, or are there rules that must be followed? And is the feature number "32" the "convolution kernel"?
W_conv1 = weight_variable([5, 5, 1, 32])
First Convolutional Layer
We can now implement our first layer. It will consist of convolution, followed by max pooling. The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]. The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We will also have a bias vector with a component for each output channel.
The patch size and the number of features are network hyper-parameters, therefore they are essentially arbitrary.
There are rules of thumb, though, to follow in order to define a working and performing network.
The kernel size should be small, due to the equivalence between applying many small kernels and applying a smaller number of big kernels (it's an image-processing topic and it's well explained in the VGG paper). In addition, operations with small filters are much faster to execute.
The number of features to extract (32 in your example) is completely arbitrary, and finding the right number is something of an art.
Yes, both of them are hyperparameters, selected mostly arbitrarily for this tutorial. A lot of effort currently goes into finding appropriate kernel sizes, but for this tutorial it is not important.
The tutorial says:
The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32].
The tf.nn.conv2d() documentation says that the second parameter represents your filter and consists of [filter_height, filter_width, in_channels, out_channels]. So [5, 5, 1, 32] means that your in_channels is 1: you have a greyscale image, so no surprises here.
32 means that during the learning phase, the network will try to learn 32 different kernels which will be used during prediction. You can change this number to any other number, as it is a hyperparameter that you can tune.
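A short sketch of the resulting shapes, using tf.nn.conv2d directly as in the tutorial (the input is a dummy batch of MNIST-sized greyscale images, and the truncated-normal initializer stands in for the tutorial's weight_variable helper):

import tensorflow as tf

# One greyscale 28 x 28 image: (batch, height, width, in_channels).
x = tf.zeros([1, 28, 28, 1])

# 32 different 5 x 5 kernels over 1 input channel:
# [filter_height, filter_width, in_channels, out_channels].
W_conv1 = tf.Variable(tf.random.truncated_normal([5, 5, 1, 32], stddev=0.1))

h_conv1 = tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME')
print(h_conv1.shape)  # (1, 28, 28, 32) -> one feature map per learned kernel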