Is it a good idea to use transfer learning in real-world projects? - tensorflow

SCENARIO
Suppose I want to train on a dataset of medical images and I have chosen a COCO pre-trained model.
My Doubts
1. Since I have chosen medical images, there is no point in starting from a model trained on the COCO dataset, right? If so, what is a possible alternative?
2. Will adding more layers to a pre-trained model ruin the entire model, given around 10 or more classes and tens of thousands of training images?
3. Without training from scratch, what are the possible solutions, such as fine-tuning the model?
PS: Let's assume this scenario involves deploying the model for business purposes.
Thanks

Yes, reusing pre-trained models (transfer learning) is a good idea in real-world projects: it saves computation time, and the architectures are proven.
If your use case is classifying medical images, that is, image classification, then:
Since I have chosen medical images there is no point of train it on COCO dataset, right? if so what is a possible solution to do the same?
Yes, a COCO pre-trained model is not a good fit for image classification, because COCO is an object detection dataset. You can instead reuse a classification model such as VGGNet, ResNet, Inception, or EfficientNet. For more information, refer to the TF Hub modules.
Adding more layers to a pre-trained model will screw the entire model? with classes of around 10 plus and 10000's of training datasets?
No. We can remove the top layer of the pre-trained model and add our own custom layers without affecting the performance of the pre-trained model.
Without train from scratch what are the possible solutions , like fine-tuning the model?
In addition to using pre-trained models, you can tune the hyperparameters of the model (the custom layers you added) using the HParams dashboard in TensorBoard.
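The approach described above (drop the top layer, keep the pre-trained base, add custom layers) can be sketched roughly as follows, assuming TensorFlow 2.x with tf.keras. The function names build_classifier and unfreeze_for_fine_tuning, the choice of ResNet50, the dropout rate, and the learning rates are illustrative assumptions, not something the answer prescribes.

```python
import tensorflow as tf


def build_classifier(num_classes, input_shape=(224, 224, 3), weights="imagenet"):
    """Pre-trained base without its top layer, plus a small custom head."""
    # include_top=False drops the original 1000-class ImageNet classifier.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape, pooling="avg"
    )
    base.trainable = False  # freeze the pre-trained weights at first

    # Inputs are assumed to be already preprocessed, for example with
    # tf.keras.applications.resnet50.preprocess_input.
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base


def unfreeze_for_fine_tuning(model, base, learning_rate=1e-5):
    """Optional second stage: train the whole network with a small learning rate."""
    base.trainable = True
    # Recompile so the change to `trainable` takes effect.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

A typical use would be model, base = build_classifier(num_classes=10), then model.fit(...) on the medical images, and optionally unfreeze_for_fine_tuning(model, base) for a few more epochs. With weights="imagenet" the weights are downloaded on first use; weights=None gives a randomly initialized base.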

Related

Object detector with multiple datasets

I am interested in building a YOLO detector trained on multiple datasets, where each dataset has its own detection head. It is a multi-task learning approach. I am not sure how to convert the YOLO detector architecture to support multiple heads.
I came across the following projects, but I need your help implementing a similar approach.
https://github.com/xingyizhou/UniDet
https://link.springer.com/chapter/10.1007/978-981-16-6963-7_27
This approach has some difficulties. First, the article you sent uses a two-stage detection model with separate classification "branches". YOLO, by contrast, is a one-stage, fully convolutional detector: there are no fully connected layers, and the (1D) class predictions are taken from the whole 3D tensor (see the image).
You can take a look at the YOLO9000 paper: the model was trained on detection and classification datasets at the same time, and only the loss function changed.

Why are Keras models instantiated with imagenet weights only?

If we look at the list of available models in Keras as shown here we see that almost all of them are instantiated with weights='imagenet'. For instance:
model = VGG16(weights='imagenet', include_top=False)
Why always imagenet? Is it because it is the baseline? If not, what other options are available?
Thank you
ImageNet is a de facto standard for image classification. A yearly contest is run with millions of training images in 1000 categories. The models entered in the ImageNet classification competitions are measured against each other for performance, so it provides a "standard" measure of how good a model is at image classification. That is why many commonly used transfer learning models use the ImageNet weights. If you are using transfer learning, you can customize the model for your application by adding layers to it. You do not have to use the ImageNet weights, but doing so is generally beneficial because it helps the model converge in fewer epochs. I use them, but I also set all layers to be trainable, which helps adapt the model's weights to the application.
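As a hedged illustration of the other options: in the Keras applications API, the weights argument accepts "imagenet", None (random initialization), or a path to a weights file. The helper name make_backbone and the default input shape below are assumptions made for this sketch.

```python
from tensorflow.keras.applications import VGG16


def make_backbone(weights="imagenet", input_shape=(224, 224, 3), trainable=True):
    """Build a VGG16 feature extractor without its classification top.

    weights can be "imagenet", None (random initialization),
    or a path to a weights file saved earlier.
    """
    backbone = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    # trainable=True lets every layer adapt to the new data, as the answer
    # describes; trainable=False keeps the loaded weights fixed.
    backbone.trainable = trainable
    return backbone
```

Calling make_backbone() reproduces the VGG16(weights='imagenet', include_top=False) line from the question with an explicit input shape, while make_backbone(weights=None) starts from random weights.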

How best to do transfer learning using Dopamine for Reinforcement Learning?

I am using Google's Dopamine framework to train a specific reinforcement learning use case. I am using an autoencoder to pre-train the convolutional layers of the Deep Q Network and then transfer those pre-trained weights into the final network.
To that end, I have created a separate model (in this case an autoencoder), which I train, and then I save the resulting model and weights.
The DQN model is created using Keras's model subclassing method, and the model used to save the trained convolutional layers' weights was built using the Sequential API. My issue arises when I try to load the pre-trained weights into my final DQN model. Depending on whether I use the load_model() or load_weights() functionality from TensorFlow's API, I get two different overall behaviors of my network, and I would like to understand why. Specifically, I have the following two scenarios:
1. Loading the weights into the final model with the load_weights() method. The weights are those of the encoder plus one additional layer (added just before saving the weights) to fit the architecture of the final network implemented in Dopamine, where they are loaded.
2. First loading the saved model with load_model(), and then, when defining the new model in the __init__() method, extracting the relevant layers from the loaded model and using them in the final model.
Overall, I would expect the two approaches to yield similar results in terms of the average reward achieved per episode when I use the same pre-trained weights. However, the two approaches differ (approach 1 yields a higher average reward than approach 2, despite using the same pre-trained weights), and I don't understand why.
Furthermore, to validate this behavior, I tried loading random weights with the two approaches to see whether the behavior changes. In both cases, depending on which of the two loading methods I use, I end up with behavior very similar to the respective case with the trained weights. It seems that the pre-trained weights in each case have no effect on the overall training behavior. This might be irrelevant to the issue I am investigating, though, since it is also possible that the pre-trained weights simply offer no benefit overall.
Any thoughts and ideas on this would be much appreciated.
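One check that may help narrow this down is to confirm that the pre-trained values actually end up in the final network under each loading method. The sketch below is only a diagnostic under assumptions: weights_match is a hypothetical helper, and it compares the lists of NumPy arrays that a Keras layer's get_weights() returns.

```python
import numpy as np


def weights_match(weights_a, weights_b, atol=1e-6):
    """Return True if two lists of weight arrays have the same shapes and values."""
    if len(weights_a) != len(weights_b):
        return False
    return all(
        a.shape == b.shape and np.allclose(a, b, atol=atol)
        for a, b in zip(weights_a, weights_b)
    )
```

For example, weights_match(pretrained_layer.get_weights(), dqn_layer.get_weights()) could be evaluated right after loading and again after the agent has been built, where pretrained_layer and dqn_layer stand for the corresponding convolutional layers in the two models. A False result in either scenario would suggest the weights were not transferred, or were re-initialized later.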

training image datasets for object detection

Which version of YOLO-tensorflow (a customised CNN like GoogLeNet) is preferred for traffic science?
If the training images are blurred and noisy, is it okay to train on them, or what steps should be considered for the training images?
You may need to curate your own dataset using frames from a traffic camera and manually tagging images with cars where the passengers' seatbelts are or are not buckled, as this is a very specialized task. From there, you can do data augmentation (perhaps using the Keras ImageDataGenerator class). If a human can identify a seatbelt in an image that is blurred or noisy, a model can learn from it. From there, you can use transfer learning from a pre-trained CNN model like Inception (this is a helpful tutorial for how to do that), or train your own binary classifier with your tagged images, where your inputs are frames of traffic camera video.
I'd suggest diving into a more complicated model like YOLO only after learning the basics of CNNs with these models.
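To make the data augmentation step concrete, here is a NumPy-only sketch rather than the Keras ImageDataGenerator class mentioned above. The function name augment, the flip probability, and the brightness and noise ranges are assumptions chosen for illustration; whether a horizontal flip is appropriate depends on the camera setup.

```python
import numpy as np


def augment(image, rng):
    """Randomly flip, rescale brightness, and add noise to one image.

    image: float array of shape (height, width, channels) with values in [0, 1].
    rng:   a numpy.random.Generator, e.g. np.random.default_rng(0).
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    out = out * rng.uniform(0.8, 1.2)  # random brightness scaling
    out = out + rng.normal(0.0, 0.02, size=out.shape)  # mild Gaussian noise
    return np.clip(out, 0.0, 1.0)  # keep pixel values in the valid range
```

Each call produces a slightly different variant of the same frame, so a small hand-tagged dataset can be stretched further during training.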

Pre Trained LeNet Model for License plate Recognition

I have implemented a form of the LeNet model in TensorFlow and Python for a car number plate recognition system. My model was trained solely on my training data and tested on the test data. My dataset contains segmented images, each containing only one character. This is what my data looks like. The model does not perform very well, so I'm now looking for models I can use via transfer learning. Since most models are already trained on a huge dataset, I looked at a few, such as AlexNet, ResNet, GoogLeNet and Inception v2. Most of these models have not been trained on the type of data I need, namely letters and digits.
Question: Should I still go forward with one of these models and train it on my dataset, or are there better models that would help? For such models, would Keras be a better option, since it is more high-level than TensorFlow?
Question: I'd prefer to work with the LeNet model itself since training the other models would definitely take a long time due to the insufficient specs of my laptop. So is there any implementation of the model which uses machine printed character images to train the model which I could use to then train the final layers of the model on my data?
To get good results, you should use a model explicitly designed for text recognition.
First, (roughly) crop the input image to the region around the text.
Then, feed the image of the text into a neural network (NN) to recognize the text.
A typical NN for text recognition extracts relevant features (with convolutional NN), propagates those features through the image (with recurrent NN) and finally predicts a character score for each position in the image.
Usually, those networks are trained with the CTC loss.
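To show how per-position character scores turn into text, here is a small sketch of best-path (greedy) CTC decoding: take the highest-scoring label at each position, collapse consecutive repeats, and drop the blank label. The function name ctc_greedy_decode and the convention that the alphabet string holds a placeholder at the blank index are assumptions for this example; it covers decoding only, not the CTC loss itself.

```python
def ctc_greedy_decode(scores, alphabet, blank=0):
    """Decode per-position label scores into a string with best-path CTC decoding.

    scores:   list of per-position score lists, one score per label.
    alphabet: string of labels; alphabet[blank] is a placeholder for the blank.
    """
    # Pick the highest-scoring label index at every position.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in scores]

    decoded = []
    previous = blank
    for index in best_path:
        # Skip blanks, and skip a label that merely repeats the previous position.
        if index != blank and index != previous:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)
```

A repeated character in the plate is still recoverable because the network can emit a blank between the two occurrences, which is why the blank label exists.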
As a starting point, I would suggest looking at the CRNN implementation (they also provide a pre-trained model) [1] and the corresponding paper [2]. There is, as far as I remember, also a TensorFlow implementation on GitHub.
You can use any framework you like (e.g. TensorFlow or CNTK or ...) as long as it features convolutional and recurrent NNs and the CTC loss.
I once attended a presentation about CNTK where they claimed that they have a very fast implementation of recurrent NN - so maybe CNTK would be a good choice for your slow computer?
[1] CRNN implementation: https://github.com/bgshih/crnn
[2] Shi et al.: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition