Can I continue training from the final .weights file with more train and test images? - object-detection

I trained my custom object detector with Darknet YOLOv3 until the average loss decreased to 0.06, but now I want to train it with more training and test images (and maybe also delete some of the image files). Can I do this and continue training from the final .weights file, or should I start from the beginning?

Yes, you can use the currently trained model (.weights file) as the pre-trained model for the new training session. For example, if you use the AlexeyAB repository, you can train your model with a command like this:
darknet.exe detector train data/obj.data yolo-obj.cfg darknet53.conv.74
where darknet53.conv.74 is the pre-trained model.
In the new training session you can add or remove images, but the basic configuration must stay correct (the number of classes, etc.).
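Concretely, continuing from your own trained weights uses the same command with the weights file swapped in; darknet writes these to the backup/ folder during training (the file names here are illustrative):
darknet.exe detector train data/obj.data yolo-obj.cfg backup/yolo-obj_final.weights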
According to the page I mentioned:

in the original repository the weights-file is saved only once every 10 000 iterations

If you have only modified the dataset and are not interested in changing the model architecture, you can directly resume from the previously saved model using AlexeyAB/darknet. For example:
darknet.exe detector train cfg/obj.data cfg/yolov3.cfg yolov3_weights_last.weights -clear -map
The -clear flag resets the iteration counter saved in the weights, which is appropriate when the dataset changes: the learning-rate schedule usually depends on the iteration count, and you probably don't want to change the configuration.

You need to specify more epochs if you resume. For example, if you trained to 300/300, resuming will also stop at 300 (starting at 300) unless you specify more epochs.
python train.py --resume
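Assuming an Ultralytics-style train.py, extending the run might look like the line below; whether --resume honors a new --epochs value varies between versions of the script, so treat this as a sketch and check your script's supported flags:
python train.py --resume --epochs 600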

You can resume training from the previously saved weights of your custom model: use yolov3_custom_last.weights instead of the pre-trained default weights.
In case you run into issues while resuming, try changing the batch size.
This should work and resume your model training with the new set of images :)
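For example, following the darknet command pattern shown above (file names are illustrative):
darknet.exe detector train data/obj.data yolov3_custom.cfg backup/yolov3_custom_last.weights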

Open the .cfg file, find the max_batches setting (around line 22 of the file), and set it to a bigger value:
max_batches = 500200
max_batches is the total number of training iterations.
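For context, in the stock yolov3.cfg this value sits next to the learning-rate schedule, which is defined in terms of max_batches (the lines below are the defaults shipped with that file):
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1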
See also: How to continue training after 50000 iterations? #2633

Related

Image classification model re-calibration

I built an image classification model (CNN) using TensorFlow/Keras. I have some new images which I need to feed into the same model in order to increase the accuracy of the existing model.
I tried using the following code, but it decreases the accuracy.
re_calibrated_model = loaded_model.fit_generator(new_training_set,
                                                 steps_per_epoch=int(stp),
                                                 epochs=int(epc),
                                                 validation_data=new_test_set,
                                                 verbose=1,
                                                 validation_steps=50)
Is there any method that I can use to re-calibrate my CNN model?
Your new training session will not pick up from the previous training accuracy if you use a completely different dataset for the second run.
To achieve what you want, you need to feed the combined data (old_images + new_images).
What I normally do is train the CNN model on the first batch of images and save that model. If I need to "retrain" the model with additional images, I load the previously saved model from disk, apply the inputs (test and train) and call the fit method. As mentioned before, this only works if your outputs are exactly the same, i.e. if you are using the same input and output classes.
In my experience, training the model on separate image batches does not necessarily make it more or less accurate, but it does change the training time. Since I am using a CPU to train, my training is about 30% faster if I train two batches of 1000 images each as opposed to one batch of 2000 images, for example.
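A minimal Keras sketch of this save/reload/fit cycle, assuming TensorFlow 2.x; the paths, directory layout, and epoch count are placeholders, and the output classes must match the original model:
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Generators over the combined (old + new) images; directory layout is hypothetical
datagen = ImageDataGenerator(rescale=1.0 / 255)
combined_training_set = datagen.flow_from_directory("data/train_combined", target_size=(224, 224))
new_test_set = datagen.flow_from_directory("data/test", target_size=(224, 224))

# Reload the previously saved model and continue fitting on the combined data
model = keras.models.load_model("cnn_model.h5")  # hypothetical path
model.fit(combined_training_set,
          epochs=5,
          validation_data=new_test_set,
          validation_steps=50,
          verbose=1)
model.save("cnn_model.h5")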

Using ssd_inception_v2 to train on different resolution

The dataset contains images of different sizes.
The pretrained weights are trained on 300x300 resolution.
I am training on widerface dataset where objects are as small as 15x15.
Q1. I want to train with 800x800 resolution. Do I need to resize all the images manually, or will this be done by TensorFlow automatically?
I am using the following command to train:
python3 /opt/github/models/research/object_detection/legacy/train.py --logtostderr --train_dir=/opt/github/object_detection_retraining/wider_face_checkpoint/ --pipeline_config_path=/opt/github/object_detection_retraining/models/ssd_inception_v2_coco_2018_01_28/pipeline.config
Q2. I also tried training it using model_main.py, but after 1000 iterations it evaluates the dataset at every iteration.
I am using the following command to train:
python3 /opt/github/models/research/object_detection/model_main.py --num_train_steps=200000 --logtostderr --model_dir=/opt/github/object_detection_retraining/wider_face_checkpoint/ --pipeline_config_path=/opt/github/object_detection_retraining/models/ssd_inception_v2_coco_2018_01_28/pipeline.config
Q3. Also, if you can suggest any model I should use for real-time face detection apart from MobileNet and Inception, please do.
Thanks.
Q1. No, you do not need to resize manually. See this detailed answer.
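For reference, the input size is controlled by the image_resizer block in the pipeline config, so training at 800x800 is a config change rather than a manual resize (illustrative SSD excerpt):
model {
  ssd {
    image_resizer {
      fixed_shape_resizer {
        height: 800
        width: 800
      }
    }
  }
}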
Q2. By 1000 iterations you mean steps, right? (An iteration counts as a complete cycle over the dataset.) Usually the model performs evaluation after a certain amount of time, e.g. every 10 minutes. So every 10 minutes the checkpoints are saved and an evaluation of the model on the evaluation set is performed.
Q3. SSD models with MobileNet are among the fastest detectors; apart from that, you can try YOLO models for real-time detection.

Tensorflow Object-Detection API - How does the Fine-Tuning of a model works?

This is a more general question about the Tensorflow Object-Detection API.
I am using this API; to be more concrete, I fine-tune a model on my dataset. According to the description of the API, I use model_main.py to retrain a model from a given checkpoint/frozen graph.
However, it is not clear to me how the fine-tuning works within the API. Does a re-initialization of the last layer happen automatically, or do I have to implement something myself?
In the README files I did not find any hints on this topic. Maybe somebody could help me.
Whether training from scratch or from a checkpoint, model_main.py is the main program; besides this program, all you need is a correct pipeline config file.
Fine-tuning can be separated into two steps: restoring weights and updating weights. Both steps can be configured via the train proto file, which corresponds to train_config in the pipeline config file.
train_config: {
  batch_size: 24
  optimizer { }
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {}
}
Step 1, restoring weights.
In this step, you can configure which variables are restored by setting fine_tune_checkpoint_type; the options are detection and classification. Setting it to detection essentially restores almost all variables from the checkpoint, while setting it to classification restores only variables from the feature_extractor scope (all the layers in backbone networks, like VGG, ResNet and MobileNet, are called feature extractors).
Previously this was controlled by from_detection_checkpoint and load_all_detection_checkpoint_vars, but these two fields are now deprecated.
Also notice that after you have configured fine_tune_checkpoint_type, the actual restore operation checks whether each variable in the graph exists in the checkpoint; if not, the variable is initialized with the routine initialization operation.
For example, suppose you want to fine-tune an ssd_mobilenet_v1_custom_data model and you downloaded the checkpoint ssd_mobilenet_v1_coco. When you set fine_tune_checkpoint_type: detection, all variables in the graph that are also available in the checkpoint file are restored, including the box predictor (last layer) weights. If you instead set fine_tune_checkpoint_type: classification, only the weights of the MobileNet layers are restored. And if you use a different model's checkpoint, say faster_rcnn_resnet_xxx, then because the variables in the graph are not available in the checkpoint, you will see the output log saying Variable XXX is not available in checkpoint, and those variables won't be restored.
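Put side by side, the two restore modes differ only in one field of the pipeline config (the path is a placeholder):
# Restore backbone + detection head, including the box predictor
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
fine_tune_checkpoint_type: "detection"

# Restore only the feature-extractor (backbone) weights
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
fine_tune_checkpoint_type: "classification"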
Step 2, updating weights.
Now you have all weights restored and you want to keep training (fine-tuning) on your own dataset; normally this is enough.
But if you want to experiment and freeze some layers during training, you can customize the training by setting freeze_variables. Say you want to freeze all the weights of the MobileNet and only update the weights for the box predictor: you can set freeze_variables: [feature_extractor] so that all variables that have feature_extractor in their names won't be updated. For detailed info, please see another answer that I wrote.
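In the pipeline config this is just an extra field in train_config; the pattern below is illustrative (it is matched against variable names, which in SSD/MobileNet graphs start with FeatureExtractor):
train_config: {
  ...
  freeze_variables: "FeatureExtractor"
}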
So to fine-tune a model on your custom dataset, you should prepare a custom config file. You can start with the sample config files and then modify some fields to suit your needs.

when to stop training object detection tensorflow

I am training a Faster R-CNN model on a fruit dataset using a pretrained model provided in the Google API (faster_rcnn_inception_resnet_v2_atrous_coco).
I made a few changes to the default configuration (number of classes: 12, fine_tune_checkpoint: path to the pretrained checkpoint model, and from_detection_checkpoint: true). The total number of annotated images I have is around 12000.
After training for 9000 steps, the results I got have an accuracy below 1 percent, though I was expecting it to be at least 50% (in evaluation nothing is getting detected, as accuracy is almost 0). The loss fluctuates between 0 and 4.
How many steps should I train it for? I read an article which says to run around 800k steps, but is that the number of steps when training from scratch?
The FC layers of the model are changed because of the different number of classes, but that should not affect the classes which are already present in the pre-trained model, like 'apple', should it?
Any help would be much appreciated!
You shouldn't look at your training loss to determine when to stop. Instead, you should run your model through the evaluator periodically, and stop training when the evaluation mAP stops improving.
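A minimal sketch of that stopping rule, assuming you evaluate each saved checkpoint in order; evaluate() and the checkpoint paths are placeholders for your own evaluator:
# Checkpoints saved periodically during training (paths are placeholders)
checkpoints = ["ckpt-1000", "ckpt-2000", "ckpt-3000"]

def evaluate(ckpt):
    """Hypothetical: run your evaluator on this checkpoint and return its mAP."""
    raise NotImplementedError

best_map, bad_rounds, patience = 0.0, 0, 5
for ckpt in checkpoints:
    current_map = evaluate(ckpt)
    if current_map > best_map:
        best_map, bad_rounds = current_map, 0   # mAP improved; keep training
    else:
        bad_rounds += 1
    if bad_rounds >= patience:                  # mAP has stopped improving
        print(f"Stop training around {ckpt}; best mAP = {best_map:.3f}")
        break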

Training trained seq2seq model on additional training data

I have trained a seq2seq model with 1M samples and saved the latest checkpoint. Now, I have some additional training data of 50K sentence pairs which has not been seen in previous training data. How can I adapt the current model to this new data without starting the training from scratch?
You do not have to re-run the whole network initialization; you can run incremental training.
Training from pre-trained parameters
Another use case is to use a base model and train it further with new training options (in particular the optimization method and the learning rate). Using -train_from without -continue will start a new training with parameters initialized from a pre-trained model.
Remember to tokenize your 50K corpus the same way you tokenized the previous one.
Also, beginning with OpenNMT 0.9 you do not have to use the same vocabulary. See the Updating the vocabularies section and use the appropriate value for the -update_vocab option.
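Assuming the LuaTorch OpenNMT command line that documentation describes, an incremental run over the newly tokenized 50K corpus might look like this (file names are placeholders; check the flags of your OpenNMT version):
th train.lua -data new_data-train.t7 -train_from old_model_final.t7 -update_vocab merge -save_model model_incremental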