How to prepare your own dataset for a CNN - TensorFlow

I am building a CNN model to classify my own dataset in Python using TensorFlow 2. How should I arrange my dataset in directories so I can load it into the model?

Create the train dataset and test dataset and extract them into two separate folders named "train" and "test". The train folder should contain 'n' subfolders, each containing the images of the respective class. For example, in the Dogs vs. Cats dataset, the train folder should have two subfolders, namely "Dogs" and "Cats", with the respective images inside them. The same should be done for the test folder.
Then use tf.keras.preprocessing.image.ImageDataGenerator and its flow_from_directory method.
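A minimal sketch of that setup, assuming the "train"/"test" layout described above with the two Dogs/Cats classes; the image size, batch size, and rescaling are illustrative choices, not requirements:

    import tensorflow as tf

    # Rescale pixel values to [0, 1]; add augmentation options here if needed.
    train_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
    test_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

    train_data = train_gen.flow_from_directory(
        "train",                 # contains train/Dogs and train/Cats
        target_size=(150, 150),  # resize every image to 150x150
        batch_size=32,
        class_mode="binary",     # two classes; use "categorical" for more
    )
    test_data = test_gen.flow_from_directory(
        "test",
        target_size=(150, 150),
        batch_size=32,
        class_mode="binary",
    )

    # train_data and test_data can be passed directly to model.fit()/evaluate().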

Related

TensorFlow image_dataset_from_directory: split into train, validation, test

I have a dataset containing pictures in 2 classes.
The dataset is organized in 1 folder that contains 2 folders, 1 for each class.
class 1 pictures path: "Project\Class_A"
class 2 pictures path: "Project\Class_B"
I want to read all the data (quickly),
split it into train, validation, and test sets,
and train a TensorFlow model.
I saw that the image_dataset_from_directory function does the trick nicely, but I cannot find how to split off a test dataset (it only supports train and validation).
I want the data to be split randomly (and not just split manually by moving some pictures to a different folder).
Do you have any idea for a quick way to read all the data and split it into train, validation, and test datasets?
Thank you.
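One common approach, shown here as a sketch rather than official guidance: use image_dataset_from_directory (under tf.keras.utils in recent TF 2 releases) with validation_split to get disjoint train and validation sets, then carve the test set out of the validation set with take/skip. The 30% holdout, seed, image size, and batch size below are illustrative assumptions:

    import tensorflow as tf

    # Hold out 30% of the data; the same seed keeps the two subsets disjoint.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "Project",               # contains Class_A and Class_B
        validation_split=0.3,
        subset="training",
        seed=123,
        image_size=(180, 180),
        batch_size=32,
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        "Project",
        validation_split=0.3,
        subset="validation",
        seed=123,
        image_size=(180, 180),
        batch_size=32,
    )

    # Split the 30% holdout in half: ~15% validation, ~15% test.
    val_batches = tf.data.experimental.cardinality(val_ds)
    test_ds = val_ds.take(val_batches // 2)
    val_ds = val_ds.skip(val_batches // 2)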

How to select annotated data for target classes from a multi-class public dataset to create TFRecord files for object detection model training?

I am trying to build an object detection model using the TF2 Object Detection API.
I have two public datasets with multi-class bounding box annotations
(the annotations are in Pascal VOC .xml format) from which I wish to create a model for selected classes only.
For example:
Dataset1 has classes ['Tiger', 'Cat', 'Leopard']
Dataset2 has classes ['Car', 'Auto', 'Bicycle']
Now, my target classes are ['Tiger', 'Cat', 'Car'].
My question is: what is the best method to create the TFRecord files containing only the data for the target classes?
Also, a possible solution from my side: is it possible to do the following?
Merge both datasets into a single dataset.
Split it into train and test.
Create a labelmap.pbtxt with my target classes.
While generating the train and test TFRecord files, parse the annotations and select only those matching my target labelmap classes.
Here are the answers to your questions.
If you are using the generate_tfrecord.py script, just add your target classes to this function; your model will then only be trained on these classes and will ignore the other labeled classes in the annotation files.
You can merge both datasets into a single dataset, but there can be an issue when Dataset1 objects are present in Dataset2: they will be unannotated there, so the model will treat them as background, and vice versa.
Yes, you can split them.
Yes, you can create the labelmap.pbtxt.
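As a hedged sketch of that filtering idea: generate_tfrecord.py variants typically map class names to integer ids in a helper often called class_text_to_int; returning None for non-target classes (and skipping those boxes) keeps only the selected annotations in the record. The helper name and the surrounding loop vary between versions of the script, so treat this as illustrative:

    # Must match the ids in your labelmap.pbtxt (names here are from the question).
    TARGET_CLASSES = {"Tiger": 1, "Cat": 2, "Car": 3}

    def class_text_to_int(row_label):
        # Return the label id for target classes, None for everything else.
        return TARGET_CLASSES.get(row_label)

    # Inside the per-image example builder, skip non-target boxes, e.g.:
    # for _, row in group.object.iterrows():
    #     if class_text_to_int(row["class"]) is None:
    #         continue  # drop annotations outside the target classes
    #     ...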

Training YOLOv3 on COCO gives me bad mean average precision

I want to train YOLO to detect only the class person. Therefore, I downloaded the COCO dataset, adjusted the labels to keep only this class, and changed the config files accordingly. I then trained YOLO by following the steps described in the section "Training YOLO on COCO" on this site: https://pjreddie.com/darknet/yolo/.
But the mean average precision (mAP) for the class person with my trained weights is much worse than the mAP for the same class when I use the trained weights from the same page, listed under the caption "Performance on the COCO Dataset". I was wondering what could be the reason for this, and which data was used to train the weights available on the homepage.
Probably there's something wrong in how you modified the cfg file (classes, filters, etc.). Anyway, what's the purpose of your task? Do you really need to retrain the model, or do you only need to filter one class and run detection?
If you want to filter only the Person label out of the 80 classes, you can simply use this workaround. You don't need to retrain the model; you just need to use the weights provided by the author on the YOLO website.
For an easy and simple way using the COCO dataset, follow these steps:
Modify (or copy for backup) the coco.names file in darknet\data\coco.names
Delete all classes except person
Modify your cfg file (e.g. yolov3.cfg): change the 3 classes entries on lines 610, 696, and 783 from 80 to 1
Change the 3 filters entries in the cfg file on lines 603, 689, and 776 from 255 to 18 (derived from (classes + 5) x 3)
Run the detector: ./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/your_image.jpg
For a more advanced way of using the COCO dataset, you can use this repo to create YOLO datasets based on VOC, COCO, or Open Images: https://github.com/holger-prause/yolo_utils
Also refer to this: How can I download a specific part of Coco Dataset?
It would be much simpler, IMO, to use the pretrained weights with the supplied *.names, *.data, and *.cfg files. Then you run the detector for single images, or for a list of image file names, in the mode (I do not remember the details) where YOLO outputs a list of detections in the form of class name, confidence, and bounding box. You then simply ignore anything other than the "person" class.
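A small sketch of that post-filtering idea, assuming you keep the stock weights and configs and simply discard non-person lines from darknet's console output. The "label: confidence%" output parsing below is an assumption, since the exact format varies between darknet forks:

    import subprocess

    # Run the unmodified detector and capture its console output.
    result = subprocess.run(
        ["./darknet", "detector", "test", "cfg/coco.data",
         "cfg/yolov3.cfg", "yolov3.weights", "data/your_image.jpg"],
        capture_output=True, text=True,
    )

    # Keep only the "person" detections and ignore the other 79 classes.
    for line in result.stdout.splitlines():
        if line.startswith("person:"):
            print(line)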

Train dataset progressively using TensorFlow

Can we train an image dataset progressively? For example, my previous training dataset was created using 500 images, but now I want to add more images to it.
Should we retrain on the old dataset together with the new images?
In TensorFlow there are checkpoints for this. You import the already-learned weights for an existing model and continue training on new (or existing) data. You can just add the new images to your dataset. For repeatability of the training procedure, it is useful to create a new record file. Of course, you then have to refer to the new record file during training.
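A minimal sketch of that continue-from-saved-weights idea in TF 2 with Keras; the file names, image size, and epoch count are illustrative assumptions:

    import tensorflow as tf

    # Load the previously trained model (architecture + learned weights).
    model = tf.keras.models.load_model("my_model.h5")

    # Build a dataset from the enlarged image folder (old + new images);
    # assumes the saved model was compiled with a loss matching integer labels.
    new_ds = tf.keras.utils.image_dataset_from_directory(
        "train", image_size=(180, 180), batch_size=32
    )

    # Continue training from the saved weights instead of starting over.
    model.fit(new_ds, epochs=5)
    model.save("my_model_v2.h5")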

How to train on new data continuously in TensorFlow

I use TF-slim to train on the flowers dataset with this script. The flowers dataset has only 5 classes. If I add some new image data to the roses class, or add a new class, what should I do after training for 1000 steps? Do I need to delete the already-trained data, such as the checkpoint files?
There is a similar question on Data Science Stack Exchange, with an answer that considers your scenario:
Once a model is trained and you get new data which can be used for training, you can load the previous model and train onto it. For example, you can save your model as a .pickle file and load it and train further onto it when new data is available. Do note that for the model to predict correctly, the new training data should have a similar distribution as the past data.
I do the same in my own project, where I started with a small dataset that grew bigger over time. After adding new data I retrain the model from the last checkpoint.
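For reference, a hedged sketch of the resume-from-last-checkpoint pattern with tf.train.CheckpointManager; the tiny model, optimizer, and checkpoint directory are illustrative assumptions, not the original project's code:

    import tensorflow as tf

    # Illustrative model and optimizer; substitute your real ones here.
    model = tf.keras.Sequential([tf.keras.layers.Dense(5, activation="softmax")])
    optimizer = tf.keras.optimizers.Adam()

    ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
    manager = tf.train.CheckpointManager(ckpt, "./ckpts", max_to_keep=3)

    # Restores the last saved state if one exists; if not, training simply
    # starts fresh, so old checkpoint files do not need to be deleted.
    ckpt.restore(manager.latest_checkpoint)
    if manager.latest_checkpoint:
        print("Resumed from", manager.latest_checkpoint)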