TensorFlow model saving to be approached differently during training Vs. deployment? - tensorflow

Assume that I have a CNN which I am training on some dataset. The most important part of the model is the CNN architecture.
Now when I write a code, I define the model structure in a Python class. However, outside that class, I define a number of other nodes such as loss, accuracy, tf.Variable to keep count of epochs and so on.
When I am training, for properly resuming the training, I'd like to save all these nodes (e.g - loss, epoch variable etc), and not just the CNN structure.
However, once I am done with training, I would like to save only the CNN architecture and no nodes for loss, accuracy etc. This is because it will enable people using my model to exercise freedom in writing their own finetuning codes.
How to achieve this in TF code ? Can someone show an example ?
Is this approach towards saving followed by others also ? I just want to know if my approach is right.

Related

How to best transfer learning using Dopamine for Reinforcement Learning?

I am using Google's Dopamine framework to train a specific reinforcement learning use-case. I am using an auto encoder to pre-train the convolutional layers of the Deep Q Network and then transfer those pre-trained weights in the final network.
To that end, I have created a separate model (in this case an auto-encoder) which I train and save the resulting model and weights.
The DQN model is created using Keras's model sub-classing method and the model used to save the trained convolutional layers weights was build using the Sequential API. My issue is with when trying to load the pre-trained weights to my final DQN model. Based on whether I use the load_model() or load_weights() functionality from Tensorflow's API I get two different overall behaviors of my network and I would like to understand why. Specifically I have the two following scenarios:
Loading the weights with theload_weights() method to the final model. The weights are the weights of the encoder plus one additional layer(added just before saving the weights) to fit the architecture of the final network implemented in dopamine where they are loaded.
First load the saved model with load_model() and then when defining the new model in the __init__() method, extract the relevant layers from the loaded model and then use them for the final model.
Overall, I would expect the two approaches to yield similar results with regards to the average reward achieved per episode , when I use the same pre-trained weights. However the two approaches differ ( 1. yield higher average reward than 2. although using the same pre-trained weights) and I don't understand why.
Furthermore, in order to validate this behavior I have tried loading random weights with the two aforementioned approaches in order to see a change in behavior. In both cases, based on which of the two aforementioned loading methods I am using, I end up with very similar resulting behavior with the respected case when loading the trained weights. It's seems like the pre-trained weights in each respected case have no effect on the overall resulting training behavior. Although, this might be irrelevant to the issue I am trying to investigate here as it might be the case that the pre-trained weights don't offer any benefit overall which is also possible.
Any thoughts and ideas on this would be much appreciated.

is it a good practice to overfit your deep learning model on just one data instance for testing the correctness

Currently i am implementing an object detection model for face detection , and tried training my model on just one single instance of data. (Input : image , target : labels).
But after training my model for a long time, its not able to converge optimally. or in other words i should say it was achieving a saddle point from which it couldnt come out. So i was wondering whether its because the data is just one instance and model is somehow not able to learn or something is wrong with my loss function .
I am using yolo architechture and yolo like loss function to train my model.
Adam optimizer for minimizing the loss over time.
thanks,

Why does the loss explode during training from scratch? - Tensorflow Object Detection Models

First of all I want to state out that I am familiar with the benefits of transfer learning. Moreover I am able to train a pretrained model from 'modelzoo' on my dataset. But for research purposes I want to train my model from scratch without transferlearning.
I want to adopt the Faster-RCNN Resnet 101 implementation from tensorsflow's Object Detection API to my dataset. If I use one of the pretrained models the training goes as expected and the loss is always in 'normal' ranges (never above about 6). But if I do not use transferlearning the loss jumps very frequently to extrem high values (about 80,000,000), but between those values the loss is in normal ranges. In addition to this I do not see any predictions of the network on images in TensorBoard. It seems like the network does not make any predictions at all. The only thing which I change is to comment out those two lines in the model.config file:
# fine_tune_checkpoint: 'path'
# from_detection_checkpoint: true
I tried a lot of things to find the reason: Changed optimizer, changed the learning rate, used gradient clipping, changed the initializer used different machines to train on but nothing helps. Moreover I inspected my label_map as well as my record file. To ensure that those files are correct I redid the steps mentioned above by using the pascal voc dataset, the script to create records and the label map from the api, but even with this code from the Object Detection API without any code changes, the loss explodes (Tensorflow Object Detection API own inputs).

What are the purposes of each step in train-evaluate-predict in tensorflow?

What do each of the stages do? I understand that for neural nets in nlp, the train will find the best parameters for the word embedding. But what is the purpose of the evaluation step? What is it supposed to do? How is that different from the prediction phase?
Training, evaluation and prediction are the three main steps of training a model ( basically in any ML framework ) and to move a model from research/development to production.
Training:
A suitable ML architecture is selected based on the problem which needs to be solved. Hyperparameter optimization is carried out to fine-tune the model. The model is then trained on the data for a certain number of epochs. Metrics such as loss, accuracy, MSE are monitored.
Evaluation:
We need to move the model to production. The model in the production
stage will only make inferences and hence we require the best model
possible. So, in order to evaluate or test the model based on some
predefined levels, the evaluation phase is carried out.
Evaluation is mostly carried out on the data which is a subset of the original dataset. Training and evaluations splits are made while preprocessing the data. Metrics are calculated in order to check the performance of the model on the evaluation dataset.
The evaluation data has been never seen by the model as it is not trained on it. Hence, the model's best performance is expected here.
Prediction:
After the testing of the model, we can move it to production. In the production phase, models only make an inference ( predictions ) on the data given to them. No training takes place here.
Even after a thorough examination, the model tends to make
mispredictions. Hence, in the production stage, we can receive
interactive feedback from the users about the performance of the
model.
Now,
But what is the purpose of the evaluation step? What is it supposed to
do? How is that different from the prediction phase?
Evaluation is to make the model better for most cases through which it will come across. Predictions are made to check for other problems which are not related to performance.

Training object detectors from scratch leads to really bad performance

I am trying to train a Faster-RCNN network with Inception-v3 architecture (reference paper: Google's paper) as my fixed feature extractor using keras on my own dataset (number of classes = 4) which is very different compared to the Image-net. Still I initialized it with Image-net weights because this paper gives evidence that initializing with pre-trained weights is always better compared to random initialization.
Upon Training for 60 Epochs my Training accuracy is at 96% and my validation accuracy is at 84% ,Over-fit! (severe maybe?). But what is more worrying is that my loss did not converge at all. Upon testing the network it failed miserably! like, it didn't even detect.
Then I took a slightly different approach. I did a two step training. First I trained the Inception-v3 on my dataset like a classification problem (Still initialized it with Image-net weights) it converged well. Then I used those weights to initialize the Faster-RCNN network. This worked! But, I am confused why this two staged approach works but Training from scratch didn't work. Given I initialized both the methods with the pre-trained image-net weights initially.
Is there a way to train Faster RCNN from scratch?