Training YOLOv3 on COCO gives me bad mean average precision - yolo

I want to train YOLO to only detect the class person. Therefor I downloaded the COCO dataset, adjusted the labels to only this class and changed the config files accordingly, I then trained YOLO by following the steps described in the section "Training YOLO on COCO" on this site https://pjreddie.com/darknet/yolo/.
But the mean average precision (map) with my trained weights for the class person is much worse than the map for the same class when I use the trained weights from the same page under the caption "Performance on the COCO Dataset". I was wondering what could be the reason for this, and which data was used to train the weights available at the homepage.

Probably there's something wrong when you modify the cfg file (classes, filters, etc). Anyway what's the purpose of your task? Do you really need to retrain the model, or you only need to filter 1 class and make detection?
If you want to filter the Person label only out of 80 classes, you can simply do this workaround method. You don't need to retrain the model, you just need to use the weight provided by the author on yolo website.
For easy and simple way using COCO dataset, follow these steps :
Modify (or copy for backup) the coco.names file in darknet\data\coco.names
Delete all other classes except person
Modify your cfg file (e.g. yolov3.cfg), change the 3 classes on line 610, 696, 783 from 80 to 1
Change the 3 filters in cfg file on line 603, 689, 776 from 255 to 18 (derived from (classes+5)x3)
Run the detector ./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/your_image.jpg
For more advance way using COCO dataset you can use this repo to create yolo datasets based on voc, coco or open images. https://github.com/holger-prause/yolo_utils .
Also refer to this : How can I download a specific part of Coco Dataset?

It would be much simpler, IMMO, to use the pretrained weights with the presupplied *.names, *.data and *.cfg. Then you run the detector for single images or for a list of image file names in the mode (I do not remember the details) where YOLO outputs a list of detections in the form of class name, confidence and bbox. You then simply ignore anything other than the "person" class.

Related

Feed image data without class label

I am trying to implement image super resolution using SRGAN. In the process, I used DIV2K dataset (http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_HR.zip) as my source.
I have worked with image classification using CNN (I used keras.layers.convolutional.Conv2D). But in this case we don't have class label in my data source.
I have unzipped the file and kept in D:\Unzipped\DIV2K_train_HR. Then used following command to read the files.
img_dataset = tensorflow.keras.utils.image_dataset_from_directory("D:\\unzipped")
Then created the model as follows
model = Sequential()
model.add(Conv2D(filters=64,kernel_size=(3,3),activation="relu",input_shape=(256,256,3)))
model.add(AveragePooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,kernel_size=(3,3),activation="relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.compile(optimizer='sgd', loss='mse')
model.fit(img_dataset,batch_size=32, epochs=10)
But I am getting error : "Graph execution error". I am unable to find the root cause behind this error. Is this error appearing as the class label is missing (I think as per code DIV2K_train_HR is treated as one class label)? Or is this happening due to images don't have one specific size?
Note: This code does not match with SRGAN architecture. I am new to GAN and trying to move ahead step by step. I got stuck in the first step itself.
Yes, the error message is because you don't have labels in your dataset.
As a first step in GAN network you need to create a discriminator model: given some image it should recognize if it is a real or fake image. You can take images from your dataset and label them as 1 ("real images"). Then generate "fake images" by down-sampling and up-sampling images from your dataset and label them as 0. Train your discriminator model so that it can distinguish between original and processed images.
After that, you create generator model. The generator model takes a down-sampled version of the image as an input and creates an up-sampled version in original resolution. GAN model combines generator and discriminator models by passing output from generator to discriminator. The target label is 1, i.e. we want generator create up-sampled versions of images, which discriminator can't distinguish from the real ones. Now train GAN network (set 'trainable' to false for discriminator model weights).
After your generator manages to produce images, which discriminator can't distinguish from the real, you take them, label as 0 and train discriminator again. Then train generator again etc.
The process continues until discriminator can't distinguish fake images from the real ones anymore (i.e. accuracy doesn't exceed 0.5).
Please see a simple example on ("Generative Adversarial Networks"):
https://github.com/ageron/handson-ml3/blob/main/17_autoencoders_gans_and_diffusion_models.ipynb
This code is explained in ch. 17 in book "Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow (3rd edition)" by Aurélien Géron.

variational autoencoder with limited data

Im working on a binary classificaton project, and im using VAE (variational autoencoder) to handle the imbalance between the 2 classes by generating new samples for the minority class.
the first class (majority class) contains 20000 samples, and the second one (minority class) contains 500 samples.
After training VAE model on the minority class, i generated new samples for this class and add them to the training set, then i trained two classification models, a model on trained on the imbalanced data (only training set) and the second one trained with training set + data generated by VAE). The problem is the first model is giving results better than the second(f1-score, Roc auc...), and i thought that maybe the problem was because of the limited amount of data that the VAE was trained on.
Any help please.
Though 500 training Images are not good enough to generate diversified images from a VAE, you can still try producing some. It's better to take mean of latents of 10 different images (or even more) and pass it through the decoder ( if you're already doing this, ignore it. If you're doing some other method, try this).
If it's still not working, then, I suggest you to build a Conditional VAE on your entire dataset. In conditional VAE, you train VAE using the labels so that your models learns not only reconstruction but also what class of image it is reconstructing. This helps you to generate an Image of any particular class.

Load data for Mask RCNN

I want to train Mask-RCNN on my own dataset. I already have the segmented images (ground truths) of leaves which look something like the image below:
How can I load the dataset for training Mask RCNN?
Since Mask RCNN is pre-trained on COCO dataset,you need to train it with these images. For that purpose you have to label them and train it. Since a mask is involved use a tool such as VGG annotator to do the necessary annotation and labelling, it will generate a json file depending on your classes. Later based on your requirement you have to run the .py files for your classes, train and then generate it for testing.
You will have to convert this to TFrecords for the MASK RCNN model to be able to read the image and its annotations. Please refer this medium article 'https://medium.com/#vijendra1125/custom-mask-rcnn-using-tensorflow-object-detection-api-101149ce0765'
You can use coco annotations ( here is an example ) to annotate your dataset and then just run it like you use coco dataset.
Also you can check this code : https://github.com/matterport/Mask_RCNN/blob/master/samples/shapes/train_shapes.ipynb

Can I continue to training from final .weight with more train and test images?

I trained my custom object detection with darknet yolov3 untill the average loss decreased down to 0.06 but now i want to train it with more training and test images (maybe also deleting some of the image files). Can I do these steps and continue to training with final .weights file or I should start it from the beginning?
Yes, you can use the currently trained model (.weights file) as the pre-trained model for the new training session. For example, if you use AlexeyAB repository you can train your model by a command like this:
darknet.exe detector train data/obj.data yolo-obj.cfg darknet53.conv.74
where darknet53.conv.74 is the pre-trained model.
In the new training session, you can add or remove images. However, the basic configurations should be correct (like the number of classes, etc).
According to the page I mentioned:
in the original repository original repository the
weights-file is saved only once every 10 000 iterations
If you have just modified the data set, but are not interested in changing the model architecture,you can directly resume from the previously saved model using DarkNet in AlexeyAB/darknet. For example,
darknet.exe detector train cfg/obj.data cfg/yolov3.cfg yolov3_weights_last.weights -clear -map
The clear flag will reset iterations saved in the weights, which is appropriate in case of data set changes. That is because the learning rate often depends on the iterations, and you probably don't want to change the configurations.
You need to specify more epochs if you resume. For example if you train to 300/300 then resume will also train to 300 also (starting at 300) unless you specify more epochs..
python train.py --resume
you can resume your training from the previously saved weights, of your custom model.
use the "yolov3_custom_last.weights" instead of the pre-trained default weights.
Incase you find some issues with resuming, try changing the batch size .
this should work and resume your model training with new set of images :)
open the .cfg, find the max_batches code may be in 22 row, set the bigger value:
max_batches = 500200
max_batches is the same to the tranning iteration.
How to continute training after 50000 iteration? #2633

Tensorflow : Is it possible to identify the data is used for training?

I have created text classification model(.pb) using tensorflow. Prediction is good.
Is it possible to check the sentence using for prediction is already used to train the model or not. I need to retrain the model when new sentence is given to model to predict.
I did some research and couldn't find a way to get the train data only with the pb file because that file only stores the features and not the actual train data(obviously),but if you have the dataset,then you can easily verify duh....
I don't think you can ever find the exact train data with only the trained model,cause the model only contains the features and not the actual train data