How to train a model with one type of training image? - object-detection

I need to train a model to count the number of cylinders with a particular type of occlusion in the image.
Following is the image:
[image: cylinders occluded by the top railing of a truck]
As you can see, the occlusion is of only one type: the top railing of the truck.
I have only one image. Even with augmentation, I won't have many images to train on.
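By augmentation I mean something like this tf.keras sketch (the transform ranges and the filename are only illustrative):

```python
from tensorflow.keras.preprocessing.image import (
    ImageDataGenerator, load_img, img_to_array)

# Illustrative transforms only; occlusion-specific augmentation is harder.
augmenter = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)

image = img_to_array(load_img("cylinders.jpg"))  # hypothetical filename
batch = image[None, ...]  # ImageDataGenerator expects a batch dimension

# Generate a handful of augmented variants of the single image.
variants = [next(augmenter.flow(batch, batch_size=1))[0] for _ in range(20)]
```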
How can I proceed?

Related

How to detect images not in the trained-for categories when using ResNet50

I have trained ResNet50 on four categories of images. It works fantastically when I feed it an image from any one of the four categories; I have essentially 100% accuracy on images in these categories.
However, when I feed my trained ResNet50 model an image of a similar object that is not in one of the original four categories, the prediction comes back as one of the four existing classes. By this I mean that in the returned array of per-category likelihoods, the likelihood of one of the categories is in many cases basically 1. For example, when I query the model about an image that is not in one of the four categories, the prediction array will look like
[1.3492944e-07 9.9999988e-01 8.3132584e-14 1.4716975e-24]
Here is the prediction array for an image that the model was trained on:
[1.8217645e-27 1.0000000e+00 3.6731971e-32 0.0000000e+00]
These scores are different, but not by much. Many of the images that are not in one of the trained-for categories still get a 1.00000000 for one of the labels.
I had been planning to deal with the oddball images by checking whether the maximum category prediction was below some threshold. But most of my maximum predictions are above .99999, so I can't differentiate between images in the training set and images not part of the training set.
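Concretely, the check I had in mind is something like this sketch (the 0.95 cutoff is arbitrary):

```python
import numpy as np

THRESHOLD = 0.95  # arbitrary cutoff; in practice nearly all my maxima exceed it

def classify_or_reject(model, image):
    # model.predict returns one probability per trained-for category
    probs = model.predict(image[np.newaxis, ...])[0]
    if probs.max() < THRESHOLD:
        return "unknown"          # reject: not confident in any bucket
    return int(np.argmax(probs))  # index of the winning bucket
```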
I plan to train my model for N buckets. When the system is running, I will occasionally get images that do not belong to any of the N buckets, and I need to know that. I don't care what they are; I just want to know when an image is not in one of the N buckets.
ResNet50 does a great job of forcing everything into one of the categories, even when the image doesn't belong to any of them.
My images are super well defined! I wonder if I am somehow overtraining or overlooking some other obvious error.
Here is an example of an image that was correctly categorized:
[image: in training set and correctly categorized]
Here is an image that is not part of the training set but was nonetheless assigned to one of the categories:
[image: not in training set and incorrectly categorized]
In summary: I am trying to sort images, and I need to know when one of the images is not part of the training categories so I can reject it. Restated, I want to sort images into buckets: the known, trained-for buckets, plus one unknown bucket.
Is there any way to do this?
Should I use a different classifier than ResNet50?
My images are grayscale, 150x150, resized down from larger originals with bicubic interpolation. I have about 1,600 training images and 200 validation images per category. My accuracy and val_accuracy are .9997 after 3 epochs.
[plots: training and validation accuracy; training and validation loss]
Your model only knows about 4 classes. It, or any other model (say MobileNet), will always look at an image and assign probabilities to each of the 4 classes. You could put in a picture of a water buffalo and it would still try to classify it. Usually, but not always, if the out-of-class image you put in is very different from your training images, the class with the highest probability will have a value well below 1.0. However, in your case the out-of-class images are NOT all that different from the images in your dataset, hence a fairly high false probability prediction.
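To see why, remember that softmax normalizes whatever logits the network produces so they sum to 1 over the known classes, so some class "wins" no matter what you feed in. A tiny NumPy illustration with made-up logits:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Hypothetical logits for an out-of-class image: the network still
# produces some 4-dimensional output, and softmax normalizes it.
logits = np.array([2.1, 9.7, -3.0, 0.4])
probs = softmax(logits)
print(probs)        # e.g. [5.0e-04 9.99e-01 3.0e-06 9.1e-05]
print(probs.sum())  # always 1.0 -- one class "wins" no matter what
```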
All I can think of is this: if your out-of-class images will be generically similar to each other, you could create a 5th class, gather some "typical" out-of-class images, and train your model on these 5 classes together with the data you have. I made a model that classified 50 different dog breeds. It was extremely accurate. I put in a picture of Donald Trump and he was predicted as being a chihuahua!
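A minimal sketch of that 5-class setup in tf.keras (assuming a recent TF 2.x; the directory layout and training details are illustrative, not your actual pipeline):

```python
import tensorflow as tf

# Assumed layout: data/train/<class> with the four real classes plus
# a fifth "unknown" folder of typical out-of-class images.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(150, 150), batch_size=32)

base = tf.keras.applications.ResNet50(
    include_top=False, pooling="avg", input_shape=(150, 150, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # 4 classes + "unknown"
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=3)
```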

How many training samples should I take for an object detection model with 62 classes?

I'm trying to train a YOLOv3 model for 62 classes using https://github.com/wizyoung/YOLOv3_TensorFlow.
How many samples should I take for each class?
I'm using an Nvidia GTX 1050 Ti GPU, so what should my batch size be with each image at 300x300?
Is an 80-20 train/test split ideal?
The 80-20 train-test (val) split depends on the number of samples, not on the number of classes. The more data you have, the more skewed the train/test (val) percentages can be (with millions of samples you can use a 95%-5% split).
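For example, a plain sketch of such a split with scikit-learn (the file names and labels here are made up):

```python
from sklearn.model_selection import train_test_split

samples = [f"img_{i}.jpg" for i in range(1000)]  # hypothetical file list
labels = [i % 62 for i in range(1000)]           # hypothetical class ids

# test_size=0.2 is the 80-20 case; stratify keeps class ratios intact.
train_files, val_files, train_labels, val_labels = train_test_split(
    samples, labels, test_size=0.2, random_state=42, stratify=labels)
```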
Normally, a minimum of 200 bounding-box annotations per object class should be present. That is, each of your classes should have at least 200 annotations.
The 1050 Ti has only 4 GB of VRAM. Depending on your image size, you can increase or decrease the batch size. However, take into consideration that you do not have much VRAM available; most likely a batch size of 2 will be the maximum you can achieve for 300x300 images (decrease it to 1 if you hit OOM issues).
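If you want to find that limit empirically rather than by trial and error, you can probe batch sizes until TensorFlow runs out of memory. A rough sketch, where build_model is a placeholder for whatever constructs and compiles your network:

```python
import numpy as np
import tensorflow as tf

def find_max_batch_size(build_model, input_shape=(300, 300, 3),
                        candidates=(8, 4, 2, 1)):
    """Return the largest candidate batch size that fits in GPU memory."""
    for bs in candidates:
        try:
            model = build_model()  # placeholder: returns a compiled model
            x = np.zeros((bs, *input_shape), dtype=np.float32)
            y = np.zeros((bs,), dtype=np.int32)
            model.fit(x, y, batch_size=bs, epochs=1, verbose=0)
            return bs
        except tf.errors.ResourceExhaustedError:
            # Out of VRAM at this batch size; free memory and try smaller.
            tf.keras.backend.clear_session()
    raise RuntimeError("even batch size 1 does not fit in memory")
```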

Understanding image classification ResNet accuracy

Currently I have trained a ResNet50 for image classification with 85% classification accuracy. The doubt I am encountering is that when I input an image that is far from the training set, instead of getting 0% confidence in all the labels, I am getting 100% in one of them.
In other words, if I train my network to classify dogs and cats, it classifies correctly with 90-99% confidence between the image classes, but if I present a bicycle it tends to output 100% confidence for one of the labels.
Is this the expected response from the network?
If so, how can I make the network report 0% confidence across my training label set when I present an image from outside the training set?
Thanks

Do data augmentations in the TensorFlow Object Detection API result in more samples than the original?

Let's say my original raw dataset has 100 images, and I apply the random_horizontal_flip data augmentation, which by default flips horizontally with 50% probability. Just for the sake of example, let's say it flips 50 of the 100 images. So:
1. Does that mean my algorithm will now be trained with 150 images (the 100 originals plus 50 flipped versions), or will it still be trained with 100 images, 50 of them being flipped versions of the originals?
2. Is the answer to question #1 generalizable to all data augmentation options provided by the TensorFlow Object Detection API?
I read as much of the official documentation as possible and looked into the preprocessor code, but couldn't find my answer.
The default augmentation probability of 50% is applied independently to each image every time it is sampled. How many samples your model is trained on depends on the number of training steps.
Say your batch size is 1 and you train for 100 steps: your algorithm is trained on 100 samples, roughly 50 of them flipped versions of the originals. In this case the model never sees the unflipped versions of those 50 images, because the number of steps is too low.
Say your batch size is 1 and you train for 200 steps: your algorithm is trained on 200 samples, roughly 100 of them flipped versions of the originals.
As a result, as long as the number of steps is not limiting you, flipping each image with 50% probability has the effect of doubling your dataset. If you add the vertical flip (random_vertical_flip) as well, each image has four possible variants (original, horizontally flipped, vertically flipped, or both), so you roughly quadruple the effective dataset.
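A quick standalone simulation (not the actual TF OD API code) that shows this per-step behavior:

```python
import random

DATASET = list(range(100))  # stand-ins for 100 original images
FLIP_PROB = 0.5

def training_stream(steps):
    """Yield (image_id, flipped?) pairs the way the input pipeline would."""
    for step in range(steps):
        image = DATASET[step % len(DATASET)]       # cycle through the dataset
        yield image, random.random() < FLIP_PROB   # independent 50% flip

seen = set(training_stream(2000))
# With enough steps, both variants of nearly every image appear:
print(len(seen))  # approaches 200 distinct (image, flipped) combinations
```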

Improving training accuracy on ResNet

I am training an object detector using MXNet/ResNet50.
After the last training run, the mAP was 78% and the loss was 0.37.
When I run the detector on my test set (independent of the train/val data), I am getting false positive results, some with rather high 30-60% confidence levels. I think I need to add some train/val images that do not contain ANY of the objects I'm training the detector for.
I'm planning on adding about 20% more images with a label of -1, which I read somewhere is how you designate an image having no label in MXNet.
Does this seem reasonable? Is -1 the right way to designate it? Any downside?
Thanks,
John
One method for an unbalanced object detection task is to put a classifier before the object detection stage, which determines whether the image contains any object of interest at all. You can weight the loss for each class in this classifier by its inverse frequency (i.e., higher weight for classes that appear less frequently). You should test on data with a class balance similar to the real world. You might find this post useful.
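A minimal sketch of that inverse-frequency weighting (the counts are made up, and tf.keras is used for illustration; the idea is framework-agnostic):

```python
# Hypothetical label counts: "object present" is rare vs. "no object".
counts = {0: 9000, 1: 1000}  # class id -> number of training samples
total = sum(counts.values())

# Inverse-frequency weighting: rarer classes get proportionally more weight.
class_weight = {c: total / (len(counts) * n) for c, n in counts.items()}
print(class_weight)  # {0: 0.56, 1: 5.0} -- class 1 counts 9x more in the loss

# In tf.keras this plugs straight into fit():
# model.fit(x, y, class_weight=class_weight, ...)
```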