I'm working on a toy Keras/Tensorflow project targeting the MNIST dataset. I want to build something akin to a 2D convolutional network, but instead of a stack of filters, I want to produce a dense vector representation.
Here is an example of a model that I used to create an autoencoder for a 3x3 sub-sample of the input:
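from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Reshape

model = Sequential()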
model.add(Flatten(input_shape=(3, 3)))
model.add(Dense(32, activation='elu'))
model.add(Dense(4, activation='elu'))
model.add(Dense(32, activation='elu'))
model.add(Dense(9, activation='sigmoid'))
model.add(Reshape((3, 3)))
Using this model, I know that the topology is close to what I want for my 3x3 kernel. What I am trying to figure out is how to replicate/tile the first three layers of this model over my 2D image. I would like to have all of the features of the Conv2D layer, such as strides/padding, but it's not clear to me if/how I could replace the kernel of that layer with an entire multi-layer "sub model".
Some properties that I would like:
The "kernel" needs to be shared across the tiled instances so that we only have to train a single kernel.
However we define this kernel, it would be nice if it could be expressed in Keras layers.
It should have all of the sampling features of Conv2D, like padding/strides/dilation.
Some things I have tried:
Keras Conv2D custom kernel initialization - seems to require the kernel to be reduced to a single tensor?
Using K.tile, but that seems to require me to reimplement large parts of Conv2D, and it's not clear whether the variables that are created are shared or new instances.
You're in luck, because there's a TensorFlow function that does exactly what you want: tf.image.extract_patches. You can put it in a tf.keras.layers.Lambda layer to wrap it as a Keras layer. A cleaner way is to subclass tf.keras.layers.Layer, though that takes slightly more effort. More info on how to do that can be found in the docs for tf.keras.layers.Lambda.
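Here is a minimal sketch of the subclassing route, assuming TensorFlow 2.x with eager execution. The PatchEncoder class, its argument names, and the 28x28x1 input are made up for illustration; only tf.image.extract_patches and the Keras layers are real APIs. Since one sub_model instance is applied to every patch, its weights are shared across all positions.

```python
import tensorflow as tf


class PatchEncoder(tf.keras.layers.Layer):
    """Hypothetical helper: applies one shared sub-model to every image patch."""

    def __init__(self, sub_model, size=3, stride=1, rate=1, padding="VALID", **kwargs):
        super().__init__(**kwargs)
        self.sub_model = sub_model  # the shared "kernel", trained once
        self.patch_size = size
        self.patch_stride = stride
        self.patch_rate = rate      # dilation
        self.patch_padding = padding

    def call(self, images):
        # images: (batch, H, W, C) -> patches: (batch, out_H, out_W, size*size*C)
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_stride, self.patch_stride, 1],
            rates=[1, self.patch_rate, self.patch_rate, 1],
            padding=self.patch_padding,
        )
        # Dense layers act on the last axis, so every patch uses the same weights.
        return self.sub_model(patches)


# Encoder half of the 3x3 autoencoder from the question: 9 -> 32 -> 4.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="elu"),
    tf.keras.layers.Dense(4, activation="elu"),
])

layer = PatchEncoder(encoder, size=3, stride=1, padding="VALID")
images = tf.random.uniform((2, 28, 28, 1))  # stand-in for a batch of MNIST digits
codes = layer(images)
print(codes.shape)  # (2, 26, 26, 4)
```

Each of the 26x26 positions gets a 4-dimensional code, and changing stride, rate, or padding behaves like the corresponding Conv2D options.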
I was reading the data augmentation article on Keras, and it shows how to make preprocessing layers part of the model:
model = tf.keras.Sequential([
    resize_and_rescale,
    data_augmentation,
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    # Rest of your model
])
I'm wondering whether one or both of the resize_and_rescale and data_augmentation layers are also applied to the validation set during training?
It depends on which type of layer you are using. For example, a Resizing or Rescaling layer is applied even in inference mode, so it would be applied to the validation data in model.fit. For other augmentation layers, such as RandomFlip, the documentation states:
During inference time, the output will be identical to input.
So you have to look up the information on the type of layer you are using. Documentation is here. From what I could gather I think only the resizing and rescaling layers remain active during inference mode.
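You can check this yourself by calling the layers directly. A small sketch, assuming TensorFlow 2.6 or later (where Rescaling and RandomFlip live under tf.keras.layers); the input array is made-up data:

```python
import numpy as np
import tensorflow as tf

# Fake batch: 2 RGB images of 4x4 pixels with distinct values so a flip would be visible.
x = np.arange(2 * 4 * 4 * 3, dtype="float32").reshape(2, 4, 4, 3)

rescale = tf.keras.layers.Rescaling(1.0 / 255)
flip = tf.keras.layers.RandomFlip("horizontal")

# Rescaling behaves the same in training and inference: it always transforms its input.
rescaled = np.asarray(rescale(x))

# RandomFlip passes its input through unchanged when training=False,
# which is the mode used for validation data and for predict().
flipped_inference = np.asarray(flip(x, training=False))

print(np.allclose(rescaled, x / 255.0))      # True
print(np.array_equal(flipped_inference, x))  # True
```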
I was working on an image recognition problem. After training the model, I saved the architecture as well as weights. Now I want to use the model for extracting features from other images and perform SVM on that. For this, I want to remove the last two layers of my model and get the values calculated by the CNN and fully connected layers till then. How can I do that in Keras?
# a simple model
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Input((32, 32, 3)),
    keras.layers.Conv2D(16, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax')
])
# after training
feature_only_model = keras.models.Model(model.inputs, model.layers[-2].output)
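feature_only_model takes a (32, 32, 3) image as input and outputs the feature vector (here, the flattened convolutional features that feed the final Dense layer). To drop more layers, index further back, e.g. model.layers[-3].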
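Since the question asks about removing the last two layers, here is a sketch of that variant. The architecture and layer sizes are invented for illustration, and the model is untrained; with a real model you would load your saved weights first.

```python
import numpy as np
import tensorflow as tf

# Hypothetical classifier with two layers we want to cut off at the end.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),     # dropped
    tf.keras.layers.Dense(10, activation="softmax"),  # dropped
])

# layers[-3] is the last layer we keep, so the final two layers are removed.
feature_only_model = tf.keras.Model(model.inputs, model.layers[-3].output)

images = np.random.rand(5, 32, 32, 3).astype("float32")  # stand-in images
features = feature_only_model.predict(images, verbose=0)
print(features.shape)  # (5, 64)
```

The resulting features array is a plain NumPy matrix with one row per image, which is the shape most SVM implementations expect as input.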
If your model is subclassed, just change its call() method.
If not:
if your model is complicated, wrap it in a subclassed model and change the forward pass in call(), or
if your model is simple, create a new model without the last layers and load the weights into each layer separately.
So I am new to computer vision, and I do not really know what the layers do in keras. What is the use of adding layers (dense, Conv2D, etc) in keras? What do they add to it?
A convolutional neural network has four main steps: convolution, pooling, flattening, and full connection.
Conv2D(), Conv3D(), etc. are convolution layers; they perform feature extraction.
Pooling layers (MaxPool2D(), AvgPool2D(), etc.) are also part of feature extraction, but with a different operation: they downsample the feature maps.
Flattening layers (Flatten()) convert the extracted feature maps into a vector before it is fed into the fully connected (Dense) layers.
Dense layers form the fully connected step, which acts as the classifier: the network classifies the features extracted by the convolution layers.
There are also regularization and normalization layers such as Dropout(), BatchNormalization(), etc.
For more information, see the Keras documentation.
If you want to start learning about convolutional neural networks, this article may help.
A layer in an artificial neural network is a group of nodes that sit at the same depth in the network. Keras is a high-level API that runs on top of backends such as TensorFlow or CNTK in order to simplify common tasks. A Keras model comprises three main kinds of layers:
Input layer - contains the raw data.
Hidden layers - where the nodes learn some aspect of the raw input data. Stacking them gives the successive levels of abstraction that form the network.
Output layer - produces the final result, often from a single node, and can be used for classification.
Keras as a whole offers many different types of layers. A convolutional layer creates a kernel that is convolved with the input (over one temporal or spatial dimension for Conv1D, over two spatial dimensions for Conv2D) to produce a group of outputs. Pooling layers downsample the feature maps by summarizing each patch of a map with a single value. Max pooling and average pooling are the commonly used methods in a pooling layer.
Other commonly used layers in Keras are Embedding layers, Noise layers and Core layers. A single-layer network can only represent linearly separable functions. Most prediction problems are more complicated than that, so more than one layer is required. This is where the multi-layer concept comes in.
I hope this clears up your doubts. For any other queries, see https://www.tensorflow.org/api_docs/python/tf/keras
Neural networks are a great tool nowadays to automate classification problems. However when it comes to computer vision the amount of input data is too great to be handled efficiently by simple neural networks.
To reduce the network workload, your data needs to be preprocessed and certain features need to be identified. To find features in images we can use certain filters (like Sobel edge detection), which will highlight the essential features needed for classification.
Again, the number of filters required to classify one image is too great to choose by hand, and thus the selection of those filters needs to be automated.
That's where the convolutional layer comes in.
We use a convolutional layer to generate multiple random (at first) filters that will highlight certain features in an image. While the network is training those filters are optimized to do a better job at highlighting features.
In TensorFlow we use Conv2D() to add one of those layers. An example of parameters is Conv2D(64, 3, activation='relu'): 64 denotes the number of filters used, 3 denotes the size of the filters (in this case 3x3), and activation='relu' denotes the activation function.
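To make "a filter highlights a feature" concrete, here is a plain NumPy sketch that slides a fixed Sobel kernel over a tiny made-up image. The conv2d_valid helper is hypothetical, written just for this example; a Conv2D layer does the same kind of sliding sum per filter, except that its kernel values are learned.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Naive sliding-window sum without padding (cross-correlation, as CNN layers use)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# 5x6 image: dark left half (0), bright right half (1), i.e. one vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Sobel kernel that responds to horizontal intensity changes (vertical edges).
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

response = conv2d_valid(image, sobel_x)
print(response)
# [[0. 4. 4. 0.]
#  [0. 4. 4. 0.]
#  [0. 4. 4. 0.]]
```

The output is zero in the flat regions and large only where the window straddles the edge.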
After the convolutional layer we use a pooling layer to further highlight the features produced by the previous convolutional layer. In TensorFlow this is usually done with MaxPooling2D(), which takes the filtered image and slides a window (2x2 by default) over it in steps of 2 pixels. For each 2x2 area, max pooling takes the maximum value and writes it into a new, smaller image.
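The same operation in a few lines of NumPy, as a sketch of what the default 2x2 window with stride 2 does to a single feature map. The max_pool_2x2 helper and the input values are made up for the example.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; odd trailing rows/columns are dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Split into (h2, 2, w2, 2) blocks and take the max inside each 2x2 block.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))


feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```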
We can use this set of convolutional layer and pooling layers multiple times to make the image easier for the network to work with.
After we are done with those layers, we need to pass the output to a conventional (Dense) neural network.
To do that, we first need to flatten the image data from a multi-dimensional tensor (the stack of feature maps) to a 1D tensor (a vector) per image. This is done by adding a Flatten() layer.
Finally we need to add our Dense layers, which are used to train on the flattened data. We do this by calling Dense(). An example of parameters is Dense(64, activation='relu'), where 64 is the number of nodes we are using.
Here is an example CNN structure I used recently:
import tensorflow as tf

# Build model (IMG_SIZE is the side length of the square grayscale input images)
model = tf.keras.models.Sequential()
# Convolution and pooling layers
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))) # Input layer
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
# Flattened layers
model.add(tf.keras.layers.Flatten())
# Dense layers
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax')) # Output layer
Of course this worked for a certain classification problem and the number of layers and method parameters differ depending on the problem.
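If you want to see how the layer sizes work out for that structure, the arithmetic can be done by hand. A sketch in plain Python, assuming IMG_SIZE = 50 (an arbitrary value chosen for the example, the original value is not stated), no padding on the convolutions, and the default 2x2 pooling:

```python
def conv_out(size, kernel=3):
    # Conv2D with default 'valid' padding and stride 1 shrinks each side by kernel - 1.
    return size - kernel + 1


def pool_out(size, pool=2):
    # MaxPooling2D defaults: 2x2 window, stride 2, 'valid' padding.
    return size // pool


IMG_SIZE = 50  # assumed value, for illustration only

size = IMG_SIZE
size = pool_out(conv_out(size))  # 50 -> 48 -> 24
size = pool_out(conv_out(size))  # 24 -> 22 -> 11

flattened = size * size * 64          # 64 feature maps from the last Conv2D
dense_params = flattened * 64 + 64    # weights + biases of Dense(64)

print(flattened)     # 7744
print(dense_params)  # 495680
```

Most of the parameters end up in the first Dense layer, which is one reason the pooling layers in front of it matter.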
The Youtube channel The Coding Train has a very helpful video explaining the Convolutional and Pooling layer.
I would like to create my own layer in Keras. To be more precise, I would like to create a simple convolution layer using only the NumPy library (without the TensorFlow part). I have two reasons for doing that: first, to learn something new, and second, I have an idea for how to modify that layer, so I have to write it from scratch. To make the problem easier, we can assume that I only need a convolutional layer with a 3x3 kernel size and defaults for the other parameters.
I know I have to base on: https://keras.io/layers/writing-your-own-keras-layers/
In the def build(self, input_shape): section I have to add the weights. A convolutional layer needs one 3x3 kernel matrix per filter.
In the def call(self, x): section I can use those weights. But I have some problems with that.
Problems:
I need something that slides over the input, the typical convolutional layer task (moving a 3x3 matrix across the image). But I can't do that, because x in def call(self, x): has ? or None as the first value of its shape. I know it is the batch size, but because of that I can't loop over that tensor. So how can I get all the data (numbers) out of x to perform operations on them?
Maybe you have some general tips on how I can make my own convolutional layer from scratch in Keras?
The problem for me is not writing a convolutional layer in NumPy (there are materials about that, for example: https://github.com/Eyyub/numpy-convnet ) but merging it with Keras without using the TensorFlow backend.
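For reference, the sliding-window part itself does not need a loop over the batch axis. Below is a forward-pass-only sketch in NumPy, assuming NumPy 1.20 or later for sliding_window_view; the conv2d_forward function and all shapes are made up for illustration. It does not solve the Keras integration, since inside call() x is a symbolic tensor rather than an array of numbers.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_forward(x, kernel, bias):
    """x: (batch, H, W, C_in); kernel: (kh, kw, C_in, filters); bias: (filters,)."""
    kh, kw = kernel.shape[:2]
    # windows: (batch, H-kh+1, W-kw+1, C_in, kh, kw); the batch size is never used explicitly.
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))
    # Sum over window rows (i), window columns (j) and input channels (c) for each filter (f).
    return np.einsum("bhwcij,ijcf->bhwf", windows, kernel) + bias


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8, 3))       # any batch size works
kernel = rng.normal(size=(3, 3, 3, 5))  # 3x3 kernel, 3 input channels, 5 filters
bias = np.zeros(5)

out = conv2d_forward(x, kernel, bias)
print(out.shape)  # (4, 6, 6, 5)
```

The same pattern (extract windows, then contract them with the kernel) can be expressed with backend tensor ops, which is what a trainable Keras layer would need.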
I want to predict the estimated wait time based on images using a CNN. So I would imagine that this would use a CNN to output a regression type output using a loss function of RMSE which is what I am using right now, but it is not working properly.
Can someone point out examples that use CNN image recognition to produce a scalar/regression output (instead of a class output), similar to wait time, so that I can use their techniques to get this to work? I haven't been able to find a suitable example.
All of the CNN examples that I found are for the MNIST data or for distinguishing between cats and dogs, which produce a class output, not a number/scalar output like wait time.
Can someone give me an example using TensorFlow of a CNN giving a scalar or regression output based on image recognition?
Thanks so much! I am honestly super stuck and am getting no progress and it has been over two weeks working on this same problem.
Check out the Udacity self-driving-car models which take an input image from a dash cam and predict a steering angle (i.e. continuous scalar) to stay on the road...usually using a regression output after one or more fully connected layers on top of the CNN layers.
https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models
Here is a typical model:
https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models/autumn
...it uses tf.atan(), or you can use tf.tanh() or just a linear output, to get your final output y.
Use MSE for your loss function.
Here is another example in keras...
from tensorflow.keras import layers, models, optimizers

model = models.Sequential()
model.add(layers.Conv2D(16, (3, 3), input_shape=(32, 128, 3), activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(500, activation='relu'))
model.add(layers.Dropout(.5))
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dropout(.25))
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dense(1))  # single linear unit: the regression output
model.compile(optimizer=optimizers.Adam(learning_rate=1e-04), loss='mean_squared_error')
The key difference from the MNIST examples is that instead of funneling down to an N-dim vector of logits fed into softmax with cross-entropy loss, for your regression output you take it down to a 1-dim vector with MSE loss. (You can also have a mix of multiple classification and regression outputs in the final layer, like in YOLO object detection.)
The key is to have NO activation function in your last Fully Connected (output) layer. Note that you must have at least 1 FC layer beforehand.
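Putting those two points together, here is a tiny end-to-end sketch, assuming TensorFlow 2.x. The data is random and stands in for real images and wait times, and the layer sizes are arbitrary. It trains on MSE and reports RMSE as a metric, which keeps the error readable in the target's units.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # no activation: unbounded scalar output
])
model.compile(
    optimizer="adam",
    loss="mse",
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)

# Synthetic stand-in data: random images and made-up wait times in minutes.
images = np.random.rand(16, 32, 32, 3).astype("float32")
wait_minutes = np.random.uniform(0, 60, size=(16, 1)).astype("float32")

model.fit(images, wait_minutes, epochs=1, batch_size=8, verbose=0)
predictions = model.predict(images, verbose=0)
print(predictions.shape)  # (16, 1)
```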