I am using Google's Dopamine framework to train a specific reinforcement learning use-case. I am using an auto encoder to pre-train the convolutional layers of the Deep Q Network and then transfer those pre-trained weights in the final network.
To that end, I have created a separate model (in this case an auto-encoder) which I train and save the resulting model and weights.
The DQN model is created using Keras's model sub-classing method and the model used to save the trained convolutional layers weights was build using the Sequential API. My issue is with when trying to load the pre-trained weights to my final DQN model. Based on whether I use the load_model() or load_weights() functionality from Tensorflow's API I get two different overall behaviors of my network and I would like to understand why. Specifically I have the two following scenarios:
Loading the weights with theload_weights() method to the final model. The weights are the weights of the encoder plus one additional layer(added just before saving the weights) to fit the architecture of the final network implemented in dopamine where they are loaded.
First load the saved model with load_model() and then when defining the new model in the __init__() method, extract the relevant layers from the loaded model and then use them for the final model.
Overall, I would expect the two approaches to yield similar results with regards to the average reward achieved per episode , when I use the same pre-trained weights. However the two approaches differ ( 1. yield higher average reward than 2. although using the same pre-trained weights) and I don't understand why.
Furthermore, in order to validate this behavior I have tried loading random weights with the two aforementioned approaches in order to see a change in behavior. In both cases, based on which of the two aforementioned loading methods I am using, I end up with very similar resulting behavior with the respected case when loading the trained weights. It's seems like the pre-trained weights in each respected case have no effect on the overall resulting training behavior. Although, this might be irrelevant to the issue I am trying to investigate here as it might be the case that the pre-trained weights don't offer any benefit overall which is also possible.
Any thoughts and ideas on this would be much appreciated.
Related
I am implementing a generative adversarial autoencoder in TF2. I have got it working but not optimally and could use some high level advise to improve it.
The model is inspired by the paper “Adversarial Factorization Autoencoder for Look-alike Modeling” https://dmkd.cs.vt.edu/papers/CIKM19.pdf
The model consists of three parts: an encoder/generator, a decoder and a discriminator.
I have implemented these three parts as custom classes, each subclassing from tf.keras.Model (which is primary source of my issues). I have three different loss functions (autoencoder loss, generator loss, discriminator loss) and two custom training step functions.
The first training function trains first the autoencoder (using the autoencoder loss function) and then the generator (using the generator loss function). The generator is just the encoder part of the autoencoder but having the double purpose of also fooling the discriminator).
The second training function trains the discriminator using the discriminator loss function.
This approach works ok but subclassing all three parts from tf.keras.Model has limitations. I can’t utilize keras compile and fit functionalities. Callbacks are a nightmare and I really do need early stopping, keep best model, tensorboard integration and so on. It appears the best approach is to subclass each part from tf.keras.layers.Layer and then combine them in a single custom model. But I am not sure if it is at all possible to wire up multiple loss functions and multiple training steps to different layer blocks of a custom model?
Any hints and insights are greatly appreciated.
First of all I want to state out that I am familiar with the benefits of transfer learning. Moreover I am able to train a pretrained model from 'modelzoo' on my dataset. But for research purposes I want to train my model from scratch without transferlearning.
I want to adopt the Faster-RCNN Resnet 101 implementation from tensorsflow's Object Detection API to my dataset. If I use one of the pretrained models the training goes as expected and the loss is always in 'normal' ranges (never above about 6). But if I do not use transferlearning the loss jumps very frequently to extrem high values (about 80,000,000), but between those values the loss is in normal ranges. In addition to this I do not see any predictions of the network on images in TensorBoard. It seems like the network does not make any predictions at all. The only thing which I change is to comment out those two lines in the model.config file:
# fine_tune_checkpoint: 'path'
# from_detection_checkpoint: true
I tried a lot of things to find the reason: Changed optimizer, changed the learning rate, used gradient clipping, changed the initializer used different machines to train on but nothing helps. Moreover I inspected my label_map as well as my record file. To ensure that those files are correct I redid the steps mentioned above by using the pascal voc dataset, the script to create records and the label map from the api, but even with this code from the Object Detection API without any code changes, the loss explodes (Tensorflow Object Detection API own inputs).
SCENARIO
What if my intention is to train for a dataset of medical images and I have chosen a coco pre-trained model.
My Doubts
1 Since I have chosen medical images there is no point of train it on COCO dataset, right? if so what is a possible solution to do the same?
2 Adding more layers to a pre-trained model will screw the entire model? with classes of around 10 plus and 10000's of training datasets?
3 Without train from scratch what are the possible solutions , like fine-tuning the model?
PS - let's assume this scenario is based on deploying the model for business purposes.
Thanks-
Yes, it is a good idea to reuse the Pre-Trained Models or Transfer Learning in Real World Projects, as it saves Computation Time and as the Architectures are proven.
If your use case is to classify the Medical Images, that is, Image Classification, then
Since I have chosen medical images there is no point of train it on
COCO dataset, right? if so what is a possible solution to do the same?
Yes, COCO Dataset is not a good idea for Image Classification as it is efficient for Object Detection. You can reuse VGGNet or ResNet or Inception Net or EfficientNet. For more information, refer TF HUB Modules.
Adding more layers to a pre-trained model will screw the entire model?
with classes of around 10 plus and 10000's of training datasets?
No. We can remove the Top Layer of the Pre-Trained Model and can add our Custom Layers, without affecting the performance of the Pre-Trained Model.
Without train from scratch what are the possible solutions , like
fine-tuning the model?
In addition to using the Pre-Trained Models, you can Tune the Hyper-Parameters of the Model (Custom Layers added by you) using HParams of Tensorboard.
Assume that I have a CNN which I am training on some dataset. The most important part of the model is the CNN architecture.
Now when I write a code, I define the model structure in a Python class. However, outside that class, I define a number of other nodes such as loss, accuracy, tf.Variable to keep count of epochs and so on.
When I am training, for properly resuming the training, I'd like to save all these nodes (e.g - loss, epoch variable etc), and not just the CNN structure.
However, once I am done with training, I would like to save only the CNN architecture and no nodes for loss, accuracy etc. This is because it will enable people using my model to exercise freedom in writing their own finetuning codes.
How to achieve this in TF code ? Can someone show an example ?
Is this approach towards saving followed by others also ? I just want to know if my approach is right.
I have trained the object detection API using ssd_mobilenet_v1_coco_2017_11_17 model to detect a custom object. But after training, the API only detects the custom object and not the objects for which the API is already trained. ssd_mobilenet_v1_coco_2017_11_17 model detect 90 objects.
Is there any way to add more classes to an existing model so that it can detect new objects along with the one it has been trained for?
This question is already asked here and some answer could be found here.
The very last layer of the networks is softmax layer. when the network is trained, the weights of the network is optimized for the exact number of classes on the training set. So, if you need to add a new class and also the classes it was trained, the easiest way is to get the original dataset it was trained on along with your new class images. Then start the training from the pre-trained model weights. The training should converge faster as it has to do relatively little adjustments.