SegNet Implementation - tensorflow

I am working on biomedical image segmentation, and for this I need an implementation of the SegNet model. I have searched for SegNet implementations in many places, but none of them is correct: the ones I found do not use a pre-trained encoder. According to the SegNet paper, SegNet uses a pre-trained encoder, which is a trimmed portion of the VGG-16 network trained on the ImageNet dataset. I need the implementation in Keras.
N.B. There is a pre-trained VGG-16 network available in Keras, but it lacks the Batch Normalization layers that are present in SegNet in the original paper.
P.S. I cannot retrain the VGG-16 network on my own because I lack the computational resources.

Related

Quantization aware training examples?

I want to do quantization-aware training with a basic convolutional neural network that I define directly in TensorFlow (I don't want to use other APIs such as Keras). The only resource that I am aware of is the README here:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize
However, it's not clear exactly where the different quantization commands should go in the overall process of training and then freezing the graph for actual inference.
Therefore I am wondering if there is any code example out there that shows how to define, train, and freeze a simple convolutional neural network with quantization aware training in tensorflow?
It seems that others have had the same question as well, see for instance here.
Thanks!

SSD Inception v2. Is the VGG16 feature extractor replaced by the Inception v2?

In the original SSD paper they used a VGG16 network for the feature extraction. I am using the SSD Inception v2 model from the TensorFlow model zoo and I do not know what the difference in architecture is. This Stack Overflow post suggests that for other models like SSD MobileNet the VGG16 feature extractor is replaced by the MobileNet feature extractor.
I thought this would be the same case here with SSD Inception, but this paper has me confused. From it, it seems that Inception is added to the SSD part of the model and the VGG16 feature extractor remains at the beginning of the architecture.
What is the architecture of the SSD Inception v2 model?
In the TensorFlow Object Detection API, the ssd_inception_v2 model uses inception_v2 as the feature extractor; that is, the VGG16 part in the first figure (figure (a)) is replaced with inception_v2.
In SSD models, the feature layer extracted by the feature extractor (e.g. vgg16, inception_v2, mobilenet) is further processed to produce extra feature layers of different resolutions. In the above figure (a), there are 6 output feature layers, and the first two (19x19) are taken directly from the feature extractor. How are the other 4 layers (10x10, 5x5, 3x3, 1x1) generated?
They are generated by extra convolutional operations (these conv operations are somewhat like very shallow feature extractors, aren't they?). The implementation details are here, and they are well documented. The documentation says:
Note that the current implementation only supports generating new layers
using convolutions of stride 2 (resulting in a spatial resolution reduction
by a factor of 2)
That is how each extra feature map shrinks by a factor of 2. If you read the function multi_resolution_feature_maps, you will find slim.conv2d operations being used, which indicates that these extra layers are obtained with extra convolution layers (just one layer each!).
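To make the size arithmetic concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The function names and the per-layer settings are hypothetical, chosen only to reproduce the 19, 10, 5, 3, 1 sequence from the figure; they are not read from the API's configuration. In particular, the last two steps assume unpadded 3x3 convolutions. With stride-2 'SAME' convolutions throughout, the sequence would instead be 19, 10, 5, 3, 2.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # TensorFlow-style 'SAME' padding: the output depends only on the stride.
        return math.ceil(size / stride)
    if padding == "valid":
        # No padding: the kernel has to fit entirely inside the input.
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


def feature_map_sizes(start, layers):
    """Apply each (kernel, stride, padding) layer in turn and record the sizes."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(conv_output_size(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical layer settings, assumed for illustration only.
extra_layers = [
    (3, 2, "same"),   # 19 -> 10
    (3, 2, "same"),   # 10 -> 5
    (3, 1, "valid"),  # 5 -> 3
    (3, 1, "valid"),  # 3 -> 1
]

sizes = feature_map_sizes(19, extra_layers)
print(sizes)  # [19, 10, 5, 3, 1]
```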
Now we can explain what is improved in the paper you linked. They propose replacing the extra feature layers with an Inception block. There is no inception_v2 model involved, simply an Inception block. The paper reports improved classification accuracy from using the Inception block.
So the answer to the question should now be clear: SSD models with vgg16, inception_v2, or mobilenet as the feature extractor are all valid, but the "inception" in the paper refers only to an Inception block, not to the Inception network.

Pre Trained LeNet Model for License plate Recognition

I have implemented a form of the LeNet model with TensorFlow and Python for a car number plate recognition system. My model was trained solely on my training data and tested on the test data. My dataset contains segmented images in which every image contains exactly one character. This is what my data looks like. My model does not perform very well, so I'm now looking for models that I can use via transfer learning. Since most models are already trained on a huge dataset, I looked over a few like AlexNet, ResNet, GoogLeNet and Inception v2. Most of these models have not been trained on the type of data that I want, which would be letters and digits.
Question: Should I still go forward with one of these models and train it on my dataset, or are there better models which would help? For such models, would Keras be a better option since it is more high-level than TensorFlow?
Question: I'd prefer to work with the LeNet model itself, since training the other models would definitely take a long time on my underpowered laptop. Is there any implementation of the model trained on machine-printed character images that I could use, so that I only train the final layers of the model on my data?
To get good results you should use a model explicitly designed for text recognition.
First, (roughly) crop the input image to the region around the text.
Then, feed the image of the text into a neural network (NN) to recognize the text.
A typical NN for text recognition extracts relevant features (with convolutional NN), propagates those features through the image (with recurrent NN) and finally predicts a character score for each position in the image.
Usually, those networks are trained with the CTC loss.
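As a small illustration of what "a character score for each position" means under the CTC convention, here is a sketch of greedy (best-path) decoding in plain Python. It is not taken from the CRNN code: the function name, the label set, and the score values are all made up for the example, and index 0 is assumed to be the CTC blank. Training itself would use the CTC loss provided by your framework.

```python
def ctc_greedy_decode(scores, labels, blank=0):
    """Best-path CTC decoding: argmax per position, merge repeats, drop blanks."""
    best_path = [max(range(len(step)), key=step.__getitem__) for step in scores]
    decoded = []
    previous = blank
    for index in best_path:
        # Emit a character only when it differs from the previous position
        # and is not the blank symbol.
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)


# Hypothetical label set: index 0 is the blank, followed by three characters.
labels = ["-", "A", "B", "7"]

# Made-up scores for 6 horizontal positions (one row per position).
scores = [
    [0.10, 0.80, 0.05, 0.05],  # A
    [0.10, 0.70, 0.10, 0.10],  # A again (merged with the previous A)
    [0.90, 0.04, 0.03, 0.03],  # blank (separates two real A's)
    [0.10, 0.60, 0.20, 0.10],  # A
    [0.10, 0.10, 0.70, 0.10],  # B
    [0.10, 0.10, 0.10, 0.70],  # 7
]

text = ctc_greedy_decode(scores, labels)
print(text)  # AAB7
```

The blank is what lets the network output a doubled character such as "AA": without a blank in between, the two repeated A positions would be merged into one.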
As a starting point I would suggest looking at the CRNN implementation (they also provide a pre-trained model) [1] and the corresponding paper [2]. There is, as far as I remember, also a TensorFlow implementation on GitHub.
You can use any framework you like (e.g. TensorFlow or CNTK or ...) as long as it features convolutional and recurrent NNs and the CTC loss.
I once attended a presentation about CNTK where they claimed that they have a very fast implementation of recurrent NN - so maybe CNTK would be a good choice for your slow computer?
[1] CRNN implementation: https://github.com/bgshih/crnn
[2] Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Tensorflow SSD-Mobilenet model accuracy drop after quantization using transform_graph

I am working on the recently released "SSD-Mobilenet" model by google for object detection.
Model downloaded from following location: https://github.com/tensorflow/models/blob/master/object_detection/g3doc/detection_model_zoo.md
The frozen graph file downloaded from the site is working as expected, however after quantization the accuracy drops significantly (mostly random predictions).
I built TensorFlow r1.2 from source and used the following command to quantize:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_inference_graph.pb \
  --out_graph=optimized_graph.pb \
  --inputs='image_tensor' \
  --outputs='detection_boxes','detection_scores','detection_classes','num_detections' \
  --transforms='
    add_default_attributes
    strip_unused_nodes(type=float, shape="1,224,224,3")
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms
    quantize_weights
    strip_unused_nodes
    sort_by_execution_order'
I tried various combinations in the "transforms" part. The transforms mentioned above sometimes gave correct predictions, but nowhere close to the original model.
Is there any other way to improve performance of the quantized model?
In this case, SSD uses MobileNet as its feature extractor in order to increase speed. As the MobileNet paper describes, it is a lightweight convolutional neural network that uses separable convolutions specifically to reduce the number of parameters.
As I understand it, separable convolution can lose information because of the channel-wise convolution.
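To give a feel for how much lighter a separable convolution is, here is a rough parameter count in plain Python. The layer size (3x3 kernel, 256 input and 256 output channels) is an arbitrary example, not a specific MobileNet layer, and biases and batch-norm parameters are ignored.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a regular convolution: one k x k filter per (input, output) pair."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_params(kernel, in_channels, out_channels):
    """Weights in a depthwise separable convolution."""
    depthwise = kernel * kernel * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels     # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Arbitrary example layer, assumed for illustration only.
standard = standard_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)

print(standard)   # 589824
print(separable)  # 67840
print(f"reduction: {standard / separable:.1f}x")
```

With far fewer weights doing the same job, each individual weight matters more, which is one plausible reason such a network is less tolerant of perturbations.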
When you quantize a graph with the TF implementation, it converts the 32-bit floating-point ops and weights to 8 bits. The TF quantization tutorial clearly explains that this operation is much like adding some noise to an already trained network, in the hope that the model has generalized well enough to tolerate it.
So this works really well, and is almost lossless in terms of accuracy, for a heavy model like Inception, ResNet, etc. But with the lightness and simplicity of SSD with MobileNet, it really can cause an accuracy loss.
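The "quantization is like adding noise" idea can be seen in a few lines of plain Python. This is only a sketch of simple min/max linear quantization to 8 bits; the function names and the random weights are made up for the example, and it is not the exact algorithm used by the quantize_weights transform.

```python
import random


def quantize_8bit(weights):
    """Map floats linearly onto the integer codes 0..255 using their min/max range."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float values from the 8-bit codes."""
    return [lo + c * scale for c in codes]


random.seed(0)
# Made-up "trained" weights, assumed for illustration only.
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]

codes, lo, scale = quantize_8bit(weights)
restored = dequantize(codes, lo, scale)

# The difference between original and restored weights is the "noise".
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"step size: {scale:.6f}, largest rounding error: {max_error:.6f}")
```

Every weight is shifted by at most half a quantization step. A large, redundant network can usually absorb that perturbation; a small network has less slack.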
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
How to Quantize Neural Networks with TensorFlow

Manipulating pretrained layers of convnet in Tensorflow

I am learning convolutional networks in TensorFlow. I wonder if there are any tutorials on using TF to investigate a pre-trained convnet model, like these excellent tutorials for Caffe: this and this. I mean, how to access the middle layers, get their learned parameters and blobs, customize the input shape to accept an arbitrary image size or batch size, etc.
It's not quite the same thing, but there's a codelab here that shows you how to remove the top layer of a pretrained network and train up a new one on your own data:
https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html?index=..%2F..%2Findex#0
It might give you some ideas on how to approach this in TensorFlow.