Do people train object detection methods with early stopping, and what were their settings? - object-detection

I'm working on some things related to object detection methods (YOLOv3, Faster-RCNN, RetinaNet, ...) and I need to train on VOC2007 and VOC2012 (using pretrained models, of course). However, when I read the relevant papers I do not see the authors describe whether they trained with early stopping or just a fixed number of iterations. And if they did use early stopping, how many steps of patience did they set before stopping? When I tried a patience of 100 steps, I got really poor results. Please help me, thank you very much.

I found an implementation trained on the PASCAL VOC2012 dataset for semantic segmentation that uses the following early stopping parameters:
from tensorflow.keras.callbacks import EarlyStopping

earlyStopping = EarlyStopping(
    monitor='val_loss', patience=30, verbose=2, mode='auto')
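The callback is then simply passed to fit(); a minimal usage sketch (the model and data variables here are placeholders, not from that implementation):
history = model.fit(train_images, train_labels,
                    validation_data=(val_images, val_labels),
                    epochs=300,
                    callbacks=[earlyStopping])
Note that patience=30 means training stops only after the monitored val_loss has failed to improve for 30 consecutive epochs.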

Related

Tensorflow Object Detection API low loss low confidence - checkpoint not saving weights

A few months ago I trained a custom object detector on Stanford Dogs using efficientdet_d0_512x512 and only 2 classes of dogs, with success. Without changing the code, I tried doing that again and the model was producing really low confidence scores (<1%), even though the loss during training was low.
I then tried resuming the training using the checkpoint generated after the initial training and the loss starts high as if the checkpoint did not exist.
I also tried working with Faster R-CNN, getting the same results. Here's the code:
https://colab.research.google.com/drive/1fE3TYRyRrvKI2sVSQOPUaOzA9JItkuFL?usp=sharing
I'm thinking that the exporting is not working and the trained weights are not saved. Any ideas?
It does seem that your checkpoint cannot be loaded, judging by the many warnings you're getting:
WARNING:tensorflow:Unresolved object in checkpoint:
(root). .......
After some research I found this issue on the TensorFlow Object Detection API models repository: https://github.com/tensorflow/models/issues/8892#issuecomment-680207038.
Basically, it says that you should change:
fine_tune_checkpoint_type = "detection"
to:
fine_tune_checkpoint_type = "fine_tune"
so that the loaded checkpoint is used for fine-tuning, and a mismatch in the number of classes between your configuration file and the one you're starting from doesn't cause issues.
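For reference, that field lives in the train_config block of pipeline.config; a rough sketch of the surrounding lines (the checkpoint path and step count are placeholders, not values from your notebook):
train_config {
  fine_tune_checkpoint: "path/to/pretrained_model/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "fine_tune"  # was "detection"
  num_steps: 25000
}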
It also suggests checking whether your model_dir (where the custom checkpoints and TensorBoard events will be saved) is empty or not, for the same reasons, but it seems that you're fine on that point in your Colab notebook.
Also, on a different subject, be careful with your learning rate: right now you're using a cosine_decay_learning_rate, which requires some warmup steps, 2500 in this case. However, you are training for only 800 steps, so the warmup is not even completed when you stop the training! If for some reason you want to keep the number of steps low, you should switch to a manual_step_learning_rate or an exponential_decay_learning_rate. Otherwise, you should keep your training going for much longer.
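For illustration, this is roughly what that block looks like in the pipeline config; the numbers below are placeholders rather than your actual values, the point being that the total number of training steps has to comfortably exceed warmup_steps:
optimizer {
  momentum_optimizer {
    learning_rate {
      cosine_decay_learning_rate {
        learning_rate_base: 0.04
        total_steps: 25000          # overall schedule length
        warmup_learning_rate: 0.001
        warmup_steps: 2500          # stopping at 800 steps never finishes warmup
      }
    }
    momentum_optimizer_value: 0.9
  }
}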
EDIT: After further investigation, the problem might be a bit deeper; see this issue on GitHub: https://github.com/tensorflow/models/issues/9229
You might want to keep an eye on this to see where this is going.

Different evaluation accuracy when loading BERT from checkpoint

For some reason, I get wildly different loss and accuracy when I evaluate my BERT test set right after training vs. when I load from a saved checkpoint. I thought it might be my adaptation of BERT, so I tried modifying the run_classifier.py script as little as possible to fit my use case, and I am still seeing this problem.
The only reason I can think of is that the model isn't loading correctly, but I don't know how to fix it. I believe I'm loading it the way it was originally intended. For the init_checkpoint parameter, I pass path/to/classifier/model.ckpt-{last_step}. There are three model files (meta, index, data), but there are also the checkpoint, events, and graph files. Do I need to be doing something with those other three files as well? I'm used to Keras, and this pure TensorFlow saving/loading process seems unnecessarily convoluted to me.
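For context, passing the common prefix is usually all the TF1 checkpoint API needs; it resolves the .meta, .index, and .data files itself. A minimal sketch of that restore pattern (the path and step number are placeholders, not my actual values):
import tensorflow as tf

# Rebuild the graph from the .meta file, then restore variables using the
# checkpoint prefix; TF finds the matching .index and .data shards itself.
saver = tf.train.import_meta_graph("path/to/classifier/model.ckpt-1000.meta")
with tf.Session() as sess:
    saver.restore(sess, "path/to/classifier/model.ckpt-1000")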
Thank you in advance for any help/insight regarding BERT or pure TF saving/loading! If you're unfamiliar with BERT, here's the GitHub link: BERT GitHub

Quantization aware training examples?

I want to do quantization-aware training with a basic convolutional neural network that I define directly in TensorFlow (I don't want to use other APIs such as Keras). The only resource that I am aware of is the README here:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize
However, it's not clear exactly where the different quantization calls should go in the overall process of training and then freezing the graph for actual inference.
Therefore I am wondering if there is any code example out there that shows how to define, train, and freeze a simple convolutional neural network with quantization-aware training in TensorFlow.
It seems that others have had the same question as well; see for instance here.
Thanks!
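In case it helps, here is a rough sketch of where those calls typically go, assuming the TF 1.x contrib API described in that README (build_model and all hyperparameters are placeholders, not a definitive recipe):
import tensorflow as tf

# Training graph: build the model first, insert fake-quant ops, then add the
# optimizer so training runs against the rewritten (quantized) graph.
train_graph = tf.Graph()
with train_graph.as_default():
    loss = build_model(is_training=True)  # your plain conv net, returning a loss
    tf.contrib.quantize.create_training_graph(input_graph=train_graph,
                                              quant_delay=2000)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    # ...run a normal training loop here and save a checkpoint...

# Eval graph: rebuild without training ops, rewrite for inference, restore the
# checkpoint, then freeze the GraphDef as usual for deployment.
eval_graph = tf.Graph()
with eval_graph.as_default():
    logits = build_model(is_training=False)
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)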

Does TensorFlow support the Keras model fit() method with eager execution?

I am training a Keras model (tf.keras.models.Sequential) by calling its fit() method.
Since I enabled eager execution, training time (for the same number of epochs) went up from 20.1s to 49.4s. Also, training didn't seem to converge anymore, as the loss remained around 9 (without eager execution it went down to 1), and fit() didn't even report the requested "accuracy" metric anymore.
Is eager execution supported for Keras models? Note that I am calling fit() on the model, not using an estimator.
Here is the snippet of code that declares the model and does the training. I am using TF 1.7 for GPU, installed with pip3.
tf.enable_eager_execution()

model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(11,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(11, activation='softmax')
])

optimizer = tf.train.AdamOptimizer()
# optimizer = 'adam'

model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=train_X, y=train_y, epochs=200, batch_size=64, verbose=2)
UPDATE: I filed issue #18642 on the TensorFlow GitHub.
The issue I reported on TensorFlow got this answer:
Thank you for the bug report. We have a fix for this issue, that will
show up on GitHub soon.
See issue #18642 on GitHub for TensorFlow.
Based on this, I understand that the fit() method of Keras models will be supported with eager execution once the bug is fixed.
Here is a quote from the TensorFlow site, found here:
There are many parameters to optimize when calculating derivatives. TensorFlow code is easier to read when structured into reusable classes and objects instead of a single top-level function. Eager execution encourages the use of the Keras-style layer classes in the tf.keras.layers module. Additionally, the tf.train.Optimizer classes provide sophisticated techniques to calculate parameter updates.
That means Keras layers, and models built from them, can be used with eager execution.
As for your timing, the same page also mentions how eager execution skips the graph-building step.
TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without an extra graph-building step. Operations return concrete values instead of constructing a computational graph to run later.
This may make your model slower to run, given the number of Dense layers you have. Someone may correct me on that, because I have not done much work with Dense layers, or it has been a long time since I have. If that does not explain it, I would look into your loss function. This answer may help if that becomes a problem.
Everything else looks alright though. Hope this helps.
EDIT
OK, I see what you are saying, Fate. Yes, the first link uses a Sequential model, but GradientTape for gradient descent. Reading deeper into the eager tutorial shows that they only use GradientTape as well. Here is what the tutorial says about training:
Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tfe.GradientTape to trace operations for computing gradients later. tfe.GradientTape is an opt-in feature to provide maximal performance when not tracing. Since different operations can occur during each call, all forward-pass operations get recorded to a "tape". To compute the gradient, play the tape backwards and then discard. A particular tfe.GradientTape can only be computed once; subsequent calls throw a runtime error.
So maybe, as of right now, GradientTape and the Estimator approach are what you are supposed to use with eager execution.
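For what it's worth, a minimal sketch of that GradientTape-style training step for a model like yours (TF 1.x eager; the loss function and variable names are my assumptions, not the tutorial's exact code):
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()

def train_step(model, optimizer, x, y):
    # Record the forward pass on the tape, then replay it to get gradients.
    with tfe.GradientTape() as tape:
        logits = model(x)
        loss = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=logits)
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(zip(grads, model.variables))
    return loss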
When reading the documentation for the compile() method on Model, you can find an argument, run_eagerly:
run_eagerly: Bool. Defaults to False. If True, this Model's logic will not be wrapped in a tf.function. Recommended to leave this as None unless your Model cannot be run inside a tf.function.
So by default, a tf.keras.Model runs through graph execution, not eager execution.
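If you do want fit() to run eagerly in a modern TF 2.x setup (for debugging, say), a small sketch would be the following, assuming the same model and data as in the question:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'],
              run_eagerly=True)  # the train step runs eagerly instead of inside a tf.function
model.fit(x=train_X, y=train_y, epochs=200, batch_size=64, verbose=2)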

Neural Network - how to test that it is implemented properly?

I've implemented a neural network using TensorFlow. During the implementation and training, I've found several not-so-trivial bugs.
Example: during training I had the same mini-batch loss for different steps/epochs, but different accuracy.
Now the neural network seems to be ready and working properly. I haven't managed to train it well yet, but I am working on it.
Anyway, I would like to somehow check that I haven't made any computational errors there. I am thinking about generating some artificial data for a "fake" classification problem with, let's say, 4 features. The classification should have a very clear, human-understandable dependency between the output and the 4 features. The idea is to try to train the NN on it and see how it performs.
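For instance, a minimal sketch of the kind of artificial data I have in mind (the rule here is arbitrary, just something a correct implementation should learn almost perfectly):
import numpy as np

# Tiny synthetic problem with an obvious rule: label is 1 whenever the sum of
# the 4 features is positive, 0 otherwise.
rng = np.random.RandomState(0)
X = rng.uniform(-1.0, 1.0, size=(10000, 4)).astype(np.float32)
y = (X.sum(axis=1) > 0).astype(np.int64)

# Simple train/validation split.
X_train, y_train = X[:8000], y[:8000]
X_val, y_val = X[8000:], y[8000:]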
What do you think?
Stanford's CS231n has a couple of general tips for this, like gradient checking.
If you're just learning neural networks, why don't you try to run your implementation on some known data? Many courses provide error and loss curves from models with specified hyperparameters, so you can check whether your implementation's behavior differs significantly from a correct implementation.