How can I create a RoI pooling layer in TensorFlow/Keras? - tensorflow

I've programmed a VGG16-based CNN and now I want to build a Faster R-CNN from it. Every architecture diagram I've seen includes a RoI pooling layer, but I don't know how to implement one. Is there a function for this?

Keras/TensorFlow does not provide a built-in RoI pooling layer, so you need to implement it yourself.
You can use the code in this repository as a reference.
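As a starting point, here is a minimal NumPy sketch of RoI max pooling in the Fast R-CNN style: each box is divided into a fixed pooled_h x pooled_w grid and every grid cell is max-pooled, so boxes of any size produce the same output shape. It assumes a single channels-last feature map and integer boxes (x1, y1, x2, y2) already scaled to feature-map coordinates; the function name roi_max_pool is illustrative. Inside a Keras model the same computation has to be written with tensor ops in a custom layer; a common approximation in TensorFlow implementations is tf.image.crop_and_resize followed by max pooling.

```python
import math
import numpy as np

def roi_max_pool(feature_map, rois, pooled_h=7, pooled_w=7):
    """RoI max pooling on a single (H, W, C) feature map.

    rois: sequence of integer boxes (x1, y1, x2, y2) in feature-map
    coordinates, inclusive on both ends (assumed convention).
    Returns an array of shape (len(rois), pooled_h, pooled_w, C).
    """
    H, W, C = feature_map.shape
    out = np.empty((len(rois), pooled_h, pooled_w, C), dtype=feature_map.dtype)
    for r, (x1, y1, x2, y2) in enumerate(rois):
        # Clip the box to the feature map and guarantee at least a 1x1 region.
        x1, y1 = min(max(0, x1), W - 1), min(max(0, y1), H - 1)
        x2, y2 = min(max(x1, x2), W - 1), min(max(y1, y2), H - 1)
        roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
        for i in range(pooled_h):
            # floor/ceil bin edges: bins may overlap but are never empty.
            ys = y1 + math.floor(i * roi_h / pooled_h)
            ye = y1 + math.ceil((i + 1) * roi_h / pooled_h)
            for j in range(pooled_w):
                xs = x1 + math.floor(j * roi_w / pooled_w)
                xe = x1 + math.ceil((j + 1) * roi_w / pooled_w)
                # Max over the spatial bin, separately for each channel.
                out[r, i, j] = feature_map[ys:ye, xs:xe].max(axis=(0, 1))
    return out

fmap = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
pooled = roi_max_pool(fmap, [(0, 0, 3, 3)], pooled_h=2, pooled_w=2)
print(pooled[0, :, :, 0])  # the max of each 2x2 quadrant: 5, 7, 13, 15
```

The fixed output grid is what lets region proposals of different sizes feed the same fully connected classifier head.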

Related

Trouble with implementing local response normalization in TensorFlow

I'm trying to implement a local response normalization layer in TensorFlow for use in a Keras model.
Here is an image of the operation I am trying to implement:
Here is the paper link; please refer to Section 3.3 for the description of this layer.
I have a working NumPy implementation; however, it uses for loops and Python's built-in min and max to compute the summation. These Python operations cause errors inside a custom Keras layer, so I can't use this implementation.
The issue is that I need to iterate over all the elements in the feature map and generate a normalized value for each of them. Additionally, the upper and lower bounds of the summation change depending on which value I am currently normalizing. I can't think of a way to handle this without nested for loops, but those will not work in a Keras custom layer because they aren't native TensorFlow operations.
Could anyone point me towards TensorFlow/Keras backend functions that could help me implement this layer?
EDIT: I know this layer is already implemented as a Keras layer, but I want to build intuition about custom layers, so I want to implement it using tensor ops.
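One loop-free approach, sketched here in NumPy so it runs standalone: assuming the paper is AlexNet (Krizhevsky et al., 2012), Section 3.3 defines b_i = a_i / (k + alpha * sum_j a_j^2)^beta, where j runs over the n channels around channel i, clipped to [0, N-1]. If the squared activations are zero-padded by n // 2 on both sides of the channel axis, the padded positions contribute 0 to the sum, so the clipped min/max bounds disappear and the window sum becomes a sum of n shifted slices. The function name lrn, the channels-last layout, and odd n are assumptions; the defaults are the constants reported in that paper.

```python
import numpy as np

def lrn(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel local response normalization, channels-last (..., N).

    Assumes n is odd, so the window is channels i - n//2 .. i + n//2.
    """
    half = n // 2
    N = a.shape[-1]
    sq = np.square(a)
    # Zero-pad only the channel axis; out-of-range channels then add 0,
    # which reproduces the max(0, .) / min(N - 1, .) summation bounds.
    pad_width = [(0, 0)] * (a.ndim - 1) + [(half, half)]
    sq_padded = np.pad(sq, pad_width)
    # window_sum[..., i] = sum of sq over channels i - half .. i + half.
    # The loop runs over the n window offsets, not over feature-map elements.
    window_sum = sum(sq_padded[..., s:s + N] for s in range(n))
    return a / np.power(k + alpha * window_sum, beta)

x = np.random.default_rng(0).standard_normal((1, 4, 4, 8)).astype(np.float32)
y = lrn(x)
print(y.shape)  # (1, 4, 4, 8)
```

Each NumPy call here has a direct TensorFlow counterpart (tf.square, tf.pad, tensor slicing, tf.add_n, tf.pow), so the same body should work in a custom layer's call() as long as the channel count is known when the layer is built. For a sanity check, tf.nn.local_response_normalization with depth_radius=n // 2, bias=k, alpha=alpha, beta=beta should compute the same formula.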

Multi-Head attention layers - what is a wrapper multi-head layer in Keras?

I am new to attention mechanisms and I want to learn more about them through some practical examples. I came across a Keras implementation of multi-head attention on this website: PyPI keras multi-head. I found two different ways to implement it in Keras.
One way is to use multi-head attention as a Keras wrapper layer around either an LSTM or a CNN.
This is a snippet implementing multi-head as a wrapper layer around an LSTM in Keras. This example is taken from the keras multi-head website:
import keras
from keras_multi_head import MultiHead
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=100, output_dim=20, name='Embedding'))
model.add(MultiHead(keras.layers.LSTM(units=64), layer_num=3, name='Multi-LSTMs'))
model.add(keras.layers.Flatten(name='Flatten'))
model.add(keras.layers.Dense(units=4, activation='softmax', name='Dense'))
model.build()
model.summary()
The other way is to use it separately, as a stand-alone layer.
This is a snippet of the second implementation, multi-head as a stand-alone layer, also taken from keras multi-head:
import keras
from keras_multi_head import MultiHeadAttention
input_layer = keras.layers.Input(shape=(2, 3), name='Input')
att_layer = MultiHeadAttention(head_num=3, name='Multi-Head')(input_layer)
model = keras.models.Model(inputs=input_layer, outputs=att_layer)
model.compile(optimizer='adam', loss='mse', metrics={})
I have been looking for documentation that explains this, but I have not found any yet.
Update:
What I have found is that the second implementation (MultiHeadAttention) is closer to the Transformer paper, "Attention Is All You Need". However, I am still struggling to understand the first implementation, the wrapper layer.
Does the first one (the wrapper layer) combine the outputs of the multiple heads with the LSTM?
I was wondering if someone could explain the idea behind them, especially the wrapper layer.
I understand your confusion. In my experience, the MultiHead wrapper duplicates (parallelizes) the wrapped layer to form a kind of multichannel architecture, and each channel can be used to extract different features from the input.
For instance, each channel can have a different configuration, and the channel outputs are later concatenated to make an inference. So MultiHead can be used to wrap conventional layers to form a multi-head CNN, a multi-head LSTM, etc.
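The wrapper idea can be illustrated without the library. The NumPy sketch below assumes each head is an independent copy of the same layer type with its own randomly initialized weights, applied to the same input, and that the head outputs are stacked along a new last axis. A simple dense tanh layer (the hypothetical make_dense_head) stands in for the wrapped LSTM or CNN to keep it short; this illustrates the concept and is not the keras_multi_head source.

```python
import numpy as np

def make_dense_head(in_dim, units, rng):
    """Hypothetical stand-in for one copy of the wrapped layer."""
    w = rng.standard_normal((in_dim, units)) * 0.1  # each head gets its own weights
    b = np.zeros(units)
    return lambda x: np.tanh(x @ w + b)

def multi_head(x, heads):
    """Apply every head to the same input; stack outputs on a new last axis."""
    return np.stack([head(x) for head in heads], axis=-1)

rng = np.random.default_rng(0)
heads = [make_dense_head(in_dim=20, units=64, rng=rng) for _ in range(3)]
x = rng.standard_normal((8, 20))  # batch of 8 feature vectors
y = multi_head(x, heads)
print(y.shape)  # (8, 64, 3)
```

Because the heads start from different weights, they can learn different features; a following layer such as the Flatten and Dense in the first snippet then combines them.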
Note that the attention layer is different. You may stack attention layers to form a new architecture. You may also parallelize the attention layer (MultiHeadAttention) and configure each head as explained above. See here for different implementations of the attention layer.
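For contrast, the core of a stand-alone multi-head attention layer, as described in "Attention Is All You Need", can be sketched in NumPy: the model dimension is split into head_num slices, each head runs scaled dot-product attention, and the heads are concatenated and passed through an output projection. This is a simplified self-attention sketch with no masking, biases, or dropout; the weight names wq, wk, wv, wo are illustrative and are not keras_multi_head's parameter names.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, head_num):
    """x: (batch, seq_len, d_model); every weight matrix: (d_model, d_model)."""
    batch, seq_len, d_model = x.shape
    assert d_model % head_num == 0, "d_model must be divisible by head_num"
    d_k = d_model // head_num

    def split_heads(t):
        # (batch, seq, d_model) -> (batch, heads, seq, d_k)
        return t.reshape(batch, seq_len, head_num, d_k).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    # Scaled dot-product attention, computed for all heads at once.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_k)  # (batch, heads, seq, seq)
    context = softmax(scores, axis=-1) @ v               # (batch, heads, seq, d_k)
    # Concatenate the heads back to d_model, then apply the output projection.
    concat = context.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
    return concat @ wo

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 2, 3))  # same (2, 3) input shape as the snippet
weights = [rng.standard_normal((3, 3)) for _ in range(4)]
y = multi_head_self_attention(x, *weights, head_num=3)
print(y.shape)  # (1, 2, 3)
```

Unlike the wrapper, the heads here share one input projection split into slices, and each position's output is a weighted mix of all positions in the sequence.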

Freezing an LSTM layer

Is it possible to freeze one LSTM layer and build another LSTM layer on top of it?
The idea is quite simple, but the mechanics of TensorFlow make it harder. All I need to do is build one LSTM layer and save the model. Then I restore this model to get the kernel (weight) matrix and biases using:
tf.get_default_graph().get_tensor_by_name("rnn/multi_rnn_cell/cell_0/LSTM_cell/kernel:0")
tf.get_default_graph().get_tensor_by_name("rnn/multi_rnn_cell/cell_0/LSTM_cell/bias:0")
Then I want to take these two tensors, keep them fixed (untrained), build another LSTM layer on top of them, and then fetch the variables of the second layer.
However, the only related thread I found is here, and it says this is not possible because the weights cannot be set manually.
Doing this is simple in a feed-forward neural network.
Does anyone have any idea?
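Conceptually, freezing the first layer means its restored weights are treated as constants and only the second layer's parameters receive updates. In TensorFlow 1.x this is what passing only the second layer's variables as var_list to Optimizer.minimize achieves; in Keras, layer.set_weights() loads restored arrays and layer.trainable = False excludes a layer from training. The NumPy sketch below shows the idea with plain dense layers instead of LSTMs (a simplification), with hypothetical "restored" weights W1 and synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "restored" first-layer weights; they are never updated below.
W1 = rng.standard_normal((4, 8))
W1_before = W1.copy()

# Trainable second layer stacked on top of the frozen first layer.
W2 = np.zeros((8, 1))

x = rng.standard_normal((64, 4))
y = np.tanh(x @ W1) @ rng.standard_normal((8, 1))  # synthetic target

def loss(W2):
    h = np.tanh(x @ W1)  # frozen layer acts as a fixed feature extractor
    return float(np.mean((h @ W2 - y) ** 2))

lr = 0.1
initial = loss(W2)
for _ in range(200):
    h = np.tanh(x @ W1)
    grad_W2 = 2 * h.T @ (h @ W2 - y) / len(x)  # gradient w.r.t. W2 only
    W2 -= lr * grad_W2                         # W1 is deliberately not updated
final = loss(W2)
print(initial, final)
```

Since the frozen layer's output does not change during training, it could even be computed once and cached, which is exactly how a fixed feature extractor is usually treated.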

Custom Object detection using tensorflow

I have trained the Object Detection API with the ssd_mobilenet_v1_coco_2017_11_17 model to detect a custom object. However, after training, it detects only the custom object and no longer detects the objects it was originally trained on (the ssd_mobilenet_v1_coco_2017_11_17 model detects 90 classes).
Is there any way to add more classes to an existing model so that it can detect new objects along with the one it has been trained for?
This question has already been asked here, and some answers can be found here.
The very last layer of the network is a softmax layer. When the network is trained, its weights are optimized for the exact number of classes in the training set. So if you need to detect a new class as well as the classes it was originally trained on, the easiest way is to get the original dataset it was trained on, add your new class images, and then start training from the pre-trained model weights. Training should converge faster because only relatively small adjustments are needed.
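Because the last layer's shape is tied to the number of classes, starting from pre-trained weights with one extra class means growing that layer. One way to do this, assuming for simplicity a single dense (features x classes) classification layer (a real SSD head is convolutional and per-anchor), is to copy the existing columns and initialize only the new ones. The function name expand_softmax_head and the 1024-dimensional feature size are hypothetical placeholders.

```python
import numpy as np

def expand_softmax_head(W, b, num_new, rng, scale=0.01):
    """Grow a (features, classes) classification layer by num_new classes.

    Existing columns and biases are copied unchanged, so the logits of the
    old classes are identical at the start of fine-tuning; the new columns
    get small random weights and zero bias.
    """
    new_cols = scale * rng.standard_normal((W.shape[0], num_new))
    new_W = np.concatenate([W, new_cols], axis=1)
    new_b = np.concatenate([b, np.zeros(num_new)])
    return new_W, new_b

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 90)) * 0.01  # stand-in for pre-trained weights
b = np.zeros(90)
W_new, b_new = expand_softmax_head(W, b, num_new=1, rng=rng)
print(W_new.shape, b_new.shape)  # (1024, 91) (91,)
```

The expanded model would then be fine-tuned on the original data plus the new class images, as described above, so the softmax learns to separate all 91 classes.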

Faster RCNN for TensorFlow

Has anyone implemented Faster R-CNN (FRCNN) in TensorFlow?
I found some related repos:
1. Implement roi pool layer
2. Implement fast RCNN based on py-faster-rcnn repo
But for 1: assuming the RoI pooling layer works (I haven't tried it), the following still need to be implemented:
RoI data layer, e.g. roidb.
Bounding-box regression loss, e.g. SmoothL1Loss.
RoI pooling post-processing for end-to-end training, which converts the RoI pooling layer's output so it can be fed into the CNN classifier.
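For reference, the SmoothL1Loss item is the box-regression loss from Fast R-CNN: with d = prediction - target, it is 0.5 * d^2 when |d| < 1 and |d| - 0.5 otherwise, so it behaves like L2 near zero and like L1 for large errors. A minimal NumPy sketch follows; the optional sigma parameter follows the variant used in py-faster-rcnn, and summing over all elements is an assumption (implementations differ in how they normalize).

```python
import numpy as np

def smooth_l1(pred, target, sigma=1.0):
    """Smooth L1 loss summed over all elements.

    Per element, with d = |pred - target| and s2 = sigma**2:
    0.5 * s2 * d**2 if d < 1 / s2, else d - 0.5 / s2.
    """
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    s2 = sigma ** 2
    per_elem = np.where(d < 1.0 / s2, 0.5 * s2 * d ** 2, d - 0.5 / s2)
    return float(per_elem.sum())

print(smooth_l1([0.5, 2.0], [0.0, 0.0]))  # 0.125 + 1.5 = 1.625
```

The two branches meet at |d| = 1 / sigma^2 with equal value and slope, which keeps gradients bounded for outlier boxes.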
For 2: it seems to rely on py-faster-rcnn, which is based on Caffe, to do the pre-processing (e.g. roidb) and then feed the data into TensorFlow to train the model. That seems odd, so I haven't tried it.
So what I want to know is: will TensorFlow support Faster R-CNN in the future? If not, have I misunderstood anything mentioned above? Or is there a repo or someone that supports it?
TensorFlow has just released an official Object Detection API here, which can be used, for instance, with their various slim models.
This API contains implementations of various object detection pipelines, including the popular Faster R-CNN, along with pre-trained models.