In a CNN, should the input image go to all neurons in the first convolution layer (I mean the first hidden layer) or not? - tensorflow

I am new to CNNs, and I have one basic question regarding the mapping between the input image and the neurons in the first convolution layer.
My question is:
should the input image go to all neurons in the first convolution layer (I mean the first hidden layer) or not?
For example: if my first hidden layer in the CNN has 8 neurons, is the complete input image passed to all 8 of these neurons, or is only a subset of the input image's pixels passed to each neuron?

I am not sure you realize what you are asking, because the number of neurons in a convolutional layer is not something you normally concern yourself with unless you are building your own implementation of a CNN.
To answer your question: no, each neuron in the first convolutional layer is connected only to the pixels in its own receptive field (given by the kernel size). The same logic applies to the following convolutional layers as well, except that they are connected to the lower-layer neurons in their receptive fields.
For example: if my first hidden layer in the CNN has 8 neurons
How do you know that it has 8 neurons? Unless you are doing some low-level CNN programming, you do not specify the number of neurons. The number of neurons you need is usually determined by a combination of the kernel size, stride, type of padding, and the number of filters you have chosen to use. These four things (together with the input size of the image) tell you exactly how many neurons you need.
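As a rough sketch, the spatial output size along one dimension follows the standard convolution formula floor((input - kernel + 2*padding) / stride) + 1, and the number of neuron positions is that size squared times the number of filters. The numbers below are just an illustration matching the Keras example that follows:
def conv_output_size(in_size, kernel, stride=1, padding=0):
    # Standard convolution output-size formula along one spatial dimension.
    return (in_size - kernel + 2 * padding) // stride + 1

out = conv_output_size(100, kernel=3, stride=1, padding=0)
print(out)              # 98
print(out * out * 128)  # 1229312 "neuron" positions for 128 filters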
For example, in Keras (since you have tagged this question with tensorflow) you might see convolutional layer like this one:
keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, activation="relu",
input_shape=(100, 100, 1))
As you can see, you don't specify anything like the number of neurons here (at least not directly). With these settings, you end up with an output whose width and height are reduced by 2 (because the default padding is 'valid'), so the output shape of this layer is (98, 98, 128) (technically it has a shape of (None, 98, 98, 128), where None is the batch size). If you flattened this and fed the output into a single neuron (say, in a dense layer), you would end up with 128 * 98 * 98 = 1,229,312 weights just between these two layers.
So, for the analogy between dense and convolutional layers: the above convolutional layer with 128 filters connected to one output neuron is similar to having a dense layer with 1,229,312 neurons connected to one output neuron.
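If you want to verify those shapes yourself, a minimal sketch (assuming TensorFlow 2.x with the bundled Keras API) is:
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, activation="relu",
                        input_shape=(100, 100, 1)),
])
model.summary()
# Output shape: (None, 98, 98, 128)
# Parameters:   3*3*1*128 weights + 128 biases = 1,280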

Related

How does the conv2D function change the input layer?

In my ResNet32 network coded using TensorFlow, the input size is 32 x 32 x 3 and the output of the layer is 32 x 32 x 32. Why are 32 channels used?
tf.contrib.layers.conv2d(
inputs,
num_outputs,  # <-- how do I determine the number of channels to use in my layer?
kernel_size,
stride=1,
padding='SAME',
data_format=None,
rate=1,
activation_fn=tf.nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=tf.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None
)
Thanks in advance,
The 3 in the input is the number of color channels: it indicates that the input image is RGB (a color image); if it were a black-and-white (monochrome) image, it would have been 1.
The 32 in the output represents the number of neurons / features / channels you are using, so you are re-representing the 3-channel color image with 32 feature channels.
This helps the network learn a more complex and varied set of features of the image. For example, it can make the network learn better edges.
By setting stride=2 you can reduce the spatial size of the input tensor, so that the height and width of the output tensor become half those of the input tensor. That means, if you feed an input tensor of shape (batch, 32, 32, 3) (3 is for the RGB channels) to a convolution layer with 32 kernels/filters and stride=2, the shape of the output tensor will be (batch, 16, 16, 32). Alternatively, pooling is also widely used to reduce the output tensor size.
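As a minimal sketch (assuming TensorFlow 2.x with the Keras API and 'same' padding), such a strided convolution looks like this:
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(filters=32, kernel_size=3, strides=2, padding="same",
                        activation="relu", input_shape=(32, 32, 3)),
])
model.summary()  # output shape: (None, 16, 16, 32) -- height and width halved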
The ability to learn hierarchical representations by stacking conv layers is considered the key to the success of CNNs. In a CNN, as we go deeper, the spatial size of the tensor decreases while the number of channels increases, which helps handle variations in the appearance of complex target objects. This reduction in spatial size drastically decreases the required number of arithmetic operations and the computation time, while still extracting the prominent features that contribute to the final output/decision. However, finding the optimal number of filters/kernels/output channels is time-consuming, so people usually follow proven earlier architectures, e.g. VGG.

Keras CIFAR-10 Dense Layer Code Why 512 Neurons in Last Layer?

I am using Keras to build a CNN to work with the CIFAR-10 dataset. I am slightly confused by one of the last lines of an online tutorial. They take 50,000 32x32 color images and process them through 4 convolutional layers and one fully connected layer. The last part is accomplished by:
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
I am trying to understand why it is model.add(Dense(512)) and not some other number. For example, I thought the 32x32 images could be flattened into a 1024-element vector. But why did they choose 512 here?
Thanks!
Actually it's not 32x32, it's 32x32x3 because of the color channels. Flatten and Dense are different operations, and you won't see this from the high-level code, so here is a low-level implementation:
import tensorflow as tf  # TensorFlow 1.x style

W1 = tf.Variable(tf.random_normal([32*32*3, 512]), name="W1")  # weight variable
x = tf.placeholder(tf.float32, [batch, 32, 32, 3])  # placeholder for inputs; batch is the batch size
flat = tf.reshape(x, [batch, 32*32*3])  # model.add(Flatten())
mul1 = tf.matmul(flat, W1)  # model.add(Dense(512))
relu = tf.nn.relu(mul1)  # model.add(Activation('relu'))
flat's shape = [batch, 32*32*3]
mul1's shape = [batch, 512]
Of course it could have been 1024 or 5000 instead of 512, but with more units the model becomes harder to optimize.
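For comparison, here is a minimal Keras sketch of the same idea; note that it assumes, purely for illustration, that Flatten is applied directly to the raw 32x32x3 image rather than to the output of the tutorial's conv layers:
from tensorflow.keras.layers import Flatten, Dense, Activation
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))  # 32*32*3 = 3072 inputs
model.add(Dense(512))                        # 3072*512 weights + 512 biases = 1,573,376 parameters
model.add(Activation('relu'))
model.summary()
The 512 is simply the chosen number of hidden units; it is not derived from the input size.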

Why not use Flatten followed by a Dense layer instead of TimeDistributed?

I am trying to understand the Keras layers better. I am working on a sequence-to-sequence model where I embed a sentence and pass it to an LSTM that returns sequences. After that, I want to apply a Dense layer to each timestep (word) in the sentence, and it seems like TimeDistributed does the job for three-dimensional tensors like this one.
In my understanding, Dense layers only work on two-dimensional tensors and TimeDistributed just applies the same dense layer to every timestep in three dimensions. Could one then not simply flatten the timesteps, apply a dense layer, and perform a reshape to obtain the same result, or are these not equivalent in some way that I am missing?
Imagine you have a batch of 4 time steps, each containing a 3-element vector, i.e. a 4x3 grid of values.
Now you want to transform this batch using a dense layer so that you get 5 features per time step; the output of the layer can then be represented as a 4x5 grid.
You are considering two options: a TimeDistributed dense layer, or reshaping to a flat input, applying a dense layer, and reshaping back to time steps.
In the first option, you would apply a dense layer with 3 inputs and 5 outputs to every single time step.
Each of the five units in that dense layer is connected to the 3 values of one time step, and by doing this with every input time step you get the total output. Importantly, these five units are the same for all the time steps, so you only have the parameters of a single dense layer with 3 inputs and 5 outputs.
The second option would involve flattening the input into a 12-element vector, applying a dense layer with 12 inputs and 20 outputs, and then reshaping that back into time steps.
Here every output unit is connected to every input value. You obviously have many more parameters (those of a dense layer with 12 inputs and 20 outputs), and note that each output value is influenced by every input value, so values in one time step affect outputs in other time steps. Whether this is good or bad depends on your problem and model, but it is an important difference from the first option, where each time step's input and output were independent. In addition, this configuration requires you to use a fixed number of time steps in each batch, whereas the first one works independently of the number of time steps.
You could also consider the option of having four dense layers, each applied independently to its own time step. That would be similar to the first option, except that each unit would receive input connections only from its respective time step's inputs. I don't think there is a straightforward way to do that in Keras; you would have to split the input into four parts, apply dense layers to each part, and merge the outputs. Again, in this case the number of time steps would be fixed.
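A minimal sketch of the parameter difference between the two options (assuming the Keras API, with 4 time steps, 3 input features, and 5 output features as above):
from tensorflow.keras.layers import Dense, TimeDistributed, Flatten, Reshape
from tensorflow.keras.models import Sequential

# Option 1: the same 3-in/5-out dense layer applied to every time step.
m1 = Sequential([TimeDistributed(Dense(5), input_shape=(4, 3))])
m1.summary()  # output (None, 4, 5), 3*5 + 5 = 20 parameters

# Option 2: flatten, apply one big dense layer, reshape back to time steps.
m2 = Sequential([
    Flatten(input_shape=(4, 3)),
    Dense(20),          # 12 inputs, 20 outputs -> 12*20 + 20 = 260 parameters
    Reshape((4, 5)),
])
m2.summary()  # output (None, 4, 5), but every output depends on every input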
A Dense layer can act on a tensor of any rank, not necessarily rank 2. And I don't think the TimeDistributed wrapper changes anything about the way the Dense layer acts: applying a Dense layer to a rank-3 tensor does exactly the same thing as applying a TimeDistributed wrapper around the Dense layer. Here is an illustration:
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
model = Sequential()
model.add(Dense(5,input_shape=(50,10)))
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_5 (Dense) (None, 50, 5) 55
=================================================================
Total params: 55
Trainable params: 55
Non-trainable params: 0
_________________________________________________________________
model1 = Sequential()
model1.add(TimeDistributed(Dense(5),input_shape=(50,10)))
model1.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_3 (TimeDist (None, 50, 5) 55
=================================================================
Total params: 55
Trainable params: 55
Non-trainable params: 0
_________________________________________________________________
Adding to the above answers, here are a few pictures comparing the output shapes of the two layers; using one of these layers after an LSTM, for example, can therefore behave differently.
"Could one then not simply flatten the timesteps, apply a dense layer and perform a reshape to obtain the same result"
No, flattening the timesteps into the input dimension (input_dim) is the wrong operation. As illustrated by yuva-rajulu, if you flatten a 3D input (batch_size, timesteps, input_dim) = (1000, 50, 10), you end up with a flattened input (batch_size, input_dim) = (1000, 500), resulting in a network architecture where the timesteps interact with each other (see jdehesa's answer). This is not what is intended (i.e., we want to apply the same dense layer to each timestep independently).
What needs to be done instead is to reshape the 3D input to (batch_size * timesteps, input_dim) = (50000, 10) and then apply the dense layer to this 2D input. That way the same dense layer operates 50,000 times, once per 10-element input vector, independently. You end up with a (50000, n_units) output that you then reshape back into a (1000, 50, n_units) output. Fortunately, when you pass a 3D input to a dense layer, Keras does this for you automatically. See the official reference:
"If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 0 of the kernel (using tf.tensordot). For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units)."
Another way to see it is that Dense() computes the output simply by applying the kernel, i.e. the weight matrix of shape (input_dim, n_units), to the last dimension of your 3D input, treating all the other dimensions like batch dimensions, and then shaping the output accordingly.
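A minimal sketch checking this equivalence numerically (assuming TensorFlow 2.x; the shapes match the example above):
import numpy as np
import tensorflow as tf

dense = tf.keras.layers.Dense(8)
x = tf.random.normal((1000, 50, 10))         # (batch_size, timesteps, input_dim)

out_3d = dense(x)                            # Dense applied directly to the 3D input
out_2d = dense(tf.reshape(x, (-1, 10)))      # reshape to (batch*timesteps, input_dim), then Dense
out_2d = tf.reshape(out_2d, (1000, 50, -1))  # reshape back to (batch, timesteps, units)

print(np.allclose(out_3d.numpy(), out_2d.numpy()))  # True: the two computations match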
I think there may have been a time when the TimeDistributed layer was needed in Keras with Dense() (see the discussion here). Today we do not need the TimeDistributed wrapper, as Dense() and TimeDistributed(Dense()) do exactly the same thing; see Andrey Kite Gorin's or mujjiga's answers.

Feeding data into RNN in tensorflow with Convolutional Neural Network as feature extractor

I have a convolutional neural network of 3 layers followed by a dense layer and, finally, a softmax layer. The purpose is to train the convolutional layers, then replace the dense layer with an RNN layer. My problem is that I need a batch_size of 128, a number of time_steps of 100, and a hidden_state size of 100. Therefore, any input has to be of shape [batch_size, n_steps, number_features], where number_features is the reshaped pool layer after the 3rd conv layer.
So, feeding 128 (batch_size) * 100 (num_steps) images at once will not fit into my memory.
What I need to achieve is the following: I need to process 128 images at a time as a mini-batch, extract their features, and hold on to them. After 100 mini-batches, I can feed the [128, 100, num_features] tensor into the RNN. That way I will not run out of memory.
So, how can I achieve this in tensorflow?
Any help is much appreciated!!
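As a rough sketch of the accumulation scheme described in the question (assuming TensorFlow 2.x/Keras; feature_extractor, get_next_image_batch, and rnn_model are hypothetical names for the trained conv stack, the data pipeline, and the RNN respectively):
import numpy as np

feature_buffer = []
for step in range(100):                        # accumulate 100 mini-batches
    images = get_next_image_batch()            # shape (128, height, width, channels)
    feats = feature_extractor(images)          # shape (128, num_features)
    feature_buffer.append(np.asarray(feats))

rnn_input = np.stack(feature_buffer, axis=1)   # shape (128, 100, num_features)
rnn_output = rnn_model(rnn_input)              # feed the accumulated features to the RNN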

TF-slim layers count

Would the code below represent one or two layers? I'm confused because isn't there also supposed to be an input layer in a neural net?
input_layer = slim.fully_connected(input, 6000, activation_fn=tf.nn.relu)
output = slim.fully_connected(input_layer, num_output)
Does that contain a hidden layer? I'm just trying to be able to visualize the net. Thanks in advance!
You have a neural network with one hidden layer. In your code, input corresponds to the input layer, input_layer is the hidden layer, and output is the output layer.
Remember that the "input layer" of a neural network isn't a traditional fully connected layer, since it's just the raw data without an activation, so the name is a bit of a misnomer. The neurons in the input layer are not the same kind of thing as the neurons in the hidden layer or the output layer.
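In terms of the code in the question, an annotated sketch of the same two calls would be:
input = ...  # input "layer": just the raw data, no weights or activation
input_layer = slim.fully_connected(input, 6000, activation_fn=tf.nn.relu)  # hidden layer (6000 units)
output = slim.fully_connected(input_layer, num_output)  # output layer (num_output units)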
From tensorflow-slim:
Furthermore, TF-Slim's slim.stack operator allows a caller to repeatedly apply the same operation with different arguments to create a stack or tower of layers. slim.stack also creates a new tf.variable_scope for each operation created. For example, a simple way to create a Multi-Layer Perceptron (MLP):
# Verbose way:
x = slim.fully_connected(x, 32, scope='fc/fc_1')
x = slim.fully_connected(x, 64, scope='fc/fc_2')
x = slim.fully_connected(x, 128, scope='fc/fc_3')
# Equivalent, TF-Slim way using slim.stack:
slim.stack(x, slim.fully_connected, [32, 64, 128], scope='fc')
So the network in this example is a [32, 64, 128] network: three stacked fully connected layers with 32, 64, and 128 units respectively.