deeplearning4j defining layers - layer

The code below comes from https://deeplearning4j.org.
I don't quite get the nIn and nOut params. Does the definition below create 2 layers, or 3 with one hidden layer of 1.000 neurons?
And what would happen if the nOut of layer 0 would not match nIn of layer 1? Does this always have to be the same number (in this case 1.000)?
.layer(0, new DenseLayer.Builder()
.nIn(numRows * numColumns) // Number of input datapoints.
.nOut(1000) // Number of output datapoints.
.activation("relu") // Activation function.
.weightInit(WeightInit.XAVIER) // Weight initialization.
.build())
.layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
.nIn(1000)
.nOut(outputNum)
.activation("softmax")
.weightInit(WeightInit.XAVIER)
.build())
.pretrain(false).backprop(true)
.build();

This is standard neural networks. I would suggest reading background material if you are new to neural networks in general.
Dense layers are your standard neural network defining the number of inputs and outputs of the various hidden layers.

Related

Best way to add features after last convolutional layer, before fully-connected layers?

I am working on a regression problem related to chess. The output will depend on about 68 values that are given by Stockfish's static evaluation function (example output shown here), as well as the state of the board. However, the static eval features should not be passed through the CNN, only through the final fully-connected layers. Therefore I want to have some convolutional layers take the (one-hot encoded) board state down to a flat vector, then extend it with the other features before passing the full vector to a fully-connected layer.
How can I use Tensorflow to combine these two feature vectors (the result from the CNN and the other game-related features) within a single Layer type that can be added to a Sequential? I couldn't find anything that would handle this in the docs. Would subclassing Layer be the only way to go?

What are the uses of layers in keras/Tensorflow

So I am new to computer vision, and I do not really know what the layers do in keras. What is the use of adding layers (dense, Conv2D, etc) in keras? What do they add to it?
Convolution neural network has 4 main steps: Convolution, Pooling, Flatten, and Full connection.
Conv2D(), Conv3D(), etc. is for Feature extraction (It's a Convolution Layer).
Pooling layers (MaxPool2D(), AvgPool2D(), etc) is for Feature extraction as well (It has different operation though).
Flattening layers (Flatten() ) are to convert the extracted feature map into Vector before being fed into the Fully connection layers (The Dense layers).
Dense layers are for Fully connected step in Computer vision that acts as Classifier (The Neural network classify each extracted features from the Convolution layers.)
There are also optimization layers such as Dropout(), BatchNormalization(), etc.
For more information, just open the keras documentation.
If you want to start learning Convolution neural network, this article may help.
A layer in an Artificial Neural Network is a bunch of nodes bound together at a specific depth in a Neural Network. Keras is a high level API used over NN modules like TensorFlow or CNTK in order to simplify tasks. A Keras layer comprises 3 main parts:
Input Layer - Which contains the raw data
Hidden layer - Where the nodes of a layer learn some aspects about
the raw data which is input. It's similar to levels of abstraction
to form a Neural network.
Output Layer - Consists of a single output which is mostly a single
node and can be subjected to classification.
Keras, as a whole consists of many different types of layers. A Convolutional layer creates a kernel which is convoluted with the input over a single temporal space to derive a group of outputs. Pooling layers provide sampling of the feature maps by simplifying features in a map into patches. Max Pooling and Average Pooling are commonly used methods in a Pool layer.
Other commonly used layers in Keras are Embedding layers, Noise layers and Core layers. A single NN layer can represent only a Linearly seperable method. Most prediction problems are complicated and more than just one layer is required. This is where Multi Layer concept is required.
I think i clear your doubts and for any other queries you can see on https://www.tensorflow.org/api_docs/python/tf/keras
Neural networks are a great tool nowadays to automate classification problems. However when it comes to computer vision the amount of input data is too great to be handled efficiently by simple neural networks.
To reduce the network workload, your data needs to be preprocessed and certain features need to be identified. To find features in images we can use certain filters (like sobel edge detection), which will highlight the essential features needed for classification.
Again the amount of filters required to classify one image is too great, and thus the selection of those filters needs to be automated.
That's where the convolutional layer comes in.
We use a convolutional layer to generate multiple random (at first) filters that will highlight certain features in an image. While the network is training those filters are optimized to do a better job at highlighting features.
In Tensorflow we use Conv2D() to add one of those layers. An example of parameters is : Conv2D(64, 3, activation='relu'). 64 denotes the number of filters used, 3 denotes the size of the filters (in this case 3x3) and activation='relu' denotes the activation function
After the convolutional layer we use a pooling layer to further highlight the features produced by the previous convolutional layer. In Tensorflow this is usually done with MaxPooling2D() which takes the filtered image and applies a 2x2 (by default) layer every 2 pixels. The filter applied by MaxPooling is basically looking for the maximum value in that 2x2 area and adds it in a new image.
We can use this set of convolutional layer and pooling layers multiple times to make the image easier for the network to work with.
After we are done with those layers, we need to pass the output to a conventional (Dense) neural network.
To do that, we first need to flatten the image data from a 2D Tensor(Matrix) to a 1D Tensor(Vector). This is done by calling the Flatten() method.
Finally we need to add our Dense layers which are used to train on the flattened data. We do this by calling Dense(). An example of parameters is Dense(64, activation='relu')
where 64 is the number of nodes we are using.
Here is an example CNN structure I used recently:
# Build model
model = tf.keras.models.Sequential()
# Convolution and pooling layers
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))) # Input layer
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
# Flattened layers
model.add(tf.keras.layers.Flatten())
# Dense layers
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax')) # Output layer
Of course this worked for a certain classification problem and the number of layers and method parameters differ depending on the problem.
The Youtube channel The Coding Train has a very helpful video explaining the Convolutional and Pooling layer.

Fully Connected Layer dimensions

I have a few uncertainties regarding the fully connected layer of a convolutional neural network. Lets say the the input is the output of a convolutional layer. I understand the previous layer is flattened. But can it have multiple channels? (for example, can the input to the fully connected layer be 16x16x3 (3 channels, flattened into a vector of 768 elements?)
Next, I understand the equation for outputs is,
outputs = activation(inputs * weights' + bias)
Is there 1 weight per input? (for example, in the example above, would there be 768 weights?)
Next, how many bias's are there? 1 per channel (so 3)? 1 no matter what? Something else?
Lastly, how do filters work in the fully connected layer? Can there be more than 1?
You might have a misunderstanding of how the fully connected neural network works. To get a better understanding of it, you could always check some good tutorials such as online courses from Stanford HERE
To answer your first question: yes, whatever dimensions you have, you need to flatten it before sending to fully connected layers.
To answer your second question, you have to understand that fully connected layer is actually a process of matrix multiplication followed by a vector addition:
input^T * weights + bias = output
where you have an input of dimension 1xIN, weights of size INxOUT, and output of size 1xOUT, so you have 1xIN * INxOUT = 1xOUT. Altogether, you will have INxOUT weights, and OUT weights for each input. You will also need OUT biases, such that the full equation is 1xIN * INxOUT + 1xOUT(bias term).
There is no filters since you are not doing convolution.
Note that fully connected layer is also equal to 1x1 convolution layer, and many implementations use later for fully connected layer, this could be confusing for beginners. For details, please refer to HERE

Last fc layers in VGG16

The VGG16 architecture has input: 224x224x3 images.I want to have 48x48x3 inputs but to do this in keras, we remove the last fc layers which have 4096 neurons each.Why we have to do this? and is it needed to add another size of fc layers for this input?
Final pooling layer of VGG16 has dimension 7x7x512 for 224x224 input image. From there VGG16 uses fully connected layer of (7x7x512)x4096 to get 4096 dimensional output. However, since your input size is different your feature output dimension from final pooling layer will also be different (2x2x512 I think). So you need to change matrix dimension for fully connected layer to make it work. You have two other options though
use a global average pooling across spatial dimension to get 512 dimensional feature and then use few fully connected layers to get to your number of classes.
Resize you input image to 224x224x3 and you won't need to change anything in model architecture.
Removing the last FC layers is for fine-tuning or transfer learning, where you adapt an existing network to a new problem, such as changing the number of categories that your classifier can choose between.
You are adapting the network to take a different sized input, so you need to adjust the first layer(s) of the network.

How to derive the shapes of tensors given the networks architecture, number of outputs and number of samples

How can we calculate the shapes of various tensors involved in the computation graph once we know the network architecture (number of hidden layers, number of units in each layer), number of outputs, number of inputs and number of samples in the training set. (Assuming the network is fully connected).
For example, lets say there are 100 features, 10000 samples, 2 hidden layers (H0, H1) and 10 outputs. H1 has 500 units and H2 has 5000 units. Assume RELU activation is used in H0/H1 and softmax is used for output layer.
In this case, although the sequence of calculations that need to happen is very clear, finding correct shape for each constant/variable/place holder is difficult.
I am trying to understand if there is a standard method which we can follow to do this.