How to make TensorFlow object detection work on data having the same shape but different color?

I have a dataset of 3 types of cars: CarA, CarB, CarC.
All cars have the same shape but a different color and/or logo.
I have trained an object detection model with the TensorFlow Object Detection API, using SSD ResNet50 V1 FPN 640x640 (RetinaNet50) as the base model. Training ran for 10000 steps and was stopped when the loss reached 0.15.
When I test the model, it labels any car of that shape as all three classes, CarA, CarB, and CarC. Is the model unable to distinguish the cars by color/logo, so that it only works on shape? Can this color aspect be handled better with a specific base model?
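One common workaround (a sketch, not something the Object Detection API does for you) is to train a single generic "car" detector and add a small second-stage classifier that looks only at the detected crops, where color and logo cues dominate. The model path, crop size, and classifier architecture below are illustrative assumptions:

```python
import tensorflow as tf

# Hypothetical two-stage setup: a generic car detector (an OD API SavedModel)
# followed by a tiny CarA/CarB/CarC classifier operating on the crops.
detect_fn = tf.saved_model.load('exported_model/saved_model')  # assumed export path

classifier = tf.keras.Sequential([
    tf.keras.layers.Input((96, 96, 3)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation='softmax'),  # CarA, CarB, CarC
])

def classify_cars(image):
    """image: uint8 tensor of shape [H, W, 3]."""
    detections = detect_fn(image[tf.newaxis, ...])
    boxes = detections['detection_boxes'][0]  # normalized [ymin, xmin, ymax, xmax]
    crops = tf.image.crop_and_resize(
        tf.cast(image[tf.newaxis, ...], tf.float32), boxes,
        tf.zeros_like(boxes[:, 0], dtype=tf.int32), (96, 96))
    return classifier(crops / 255.0)  # per-detection CarA/B/C scores
```

The second-stage classifier would of course need to be trained on the cropped cars; the point is that at that stage color and logo are the only remaining signal.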

Related

How to generate the labels of custom data for YOLO

The labels for YOLO are of the form [class, x, y, width, height]. Since the dataset is very large, is there any shortcut to generate the labels for YOLO, or do we have to hardcode them through measurement?
Method 1: Using Pre-trained YOLOv4 models.
YOLOv4 models were pre-trained on the COCO dataset. So, if your object(s) can be found in this list, you can use the pre-trained weights to pseudo-label your object(s).
To process a list of images data/new_train.txt and save detection results in YOLO training format as a label file <image_name>.txt for each image, use:
darknet.exe detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights -thresh 0.25 -dont_show -save_labels < data/new_train.txt
Method 2: Using Other Pre-trained Models. It's the same concept: use other pre-trained models to detect your object (as long as they were trained on your object), then export/convert the labels to YOLO format (a conversion sketch follows this list).
Method 3: Use hand-crafted feature descriptors. Examples are shape detection, color-based detection, etc.
Method 4: Manual labelling. If everything else fails, do the labelling yourself or hire some data labelling services. Here's a list of tools that you can use if you want to label them yourself.
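For the export/convert step in Method 2, the conversion itself is simple arithmetic. Below is a minimal sketch, assuming the source labels are Pascal-VOC-style absolute pixel boxes (xmin, ymin, xmax, ymax); the function and variable names are illustrative:

```python
def voc_to_yolo(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert an absolute-pixel VOC box to a normalized YOLO label line."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a box from (50, 80) to (250, 180) in a 640x480 image, class 0.
print(voc_to_yolo(0, 50, 80, 250, 180, 640, 480))
# -> "0 0.234375 0.270833 0.312500 0.208333"
```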

TensorFlow Object Detection API - what do the losses mean in the object detection api?

What does each of the following losses mean? (in the TensorFlow Object Detection API, while training Faster R-CNN based models)
Loss/BoxClassifierLoss/classification_loss/mul_1
Loss/BoxClassifierLoss/localization_loss/mul_1
Loss/RPNLoss/localization_loss/mul_1
Loss/RPNLoss/objectness_loss/mul_1
clone_loss_1
The losses for the Region Proposal Network:
Loss/RPNLoss/localization_loss/mul_1: Localization Loss or the Loss of the Bounding Box regressor for the RPN
Loss/RPNLoss/objectness_loss/mul_1: Loss of the Classifier that classifies if a bounding box is an object of interest or background
The losses for the Final Classifier:
Loss/BoxClassifierLoss/classification_loss/mul_1: Loss for the classification of detected objects into the various classes (Cat, Dog, Airplane, etc.)
Loss/BoxClassifierLoss/localization_loss/mul_1: Localization Loss or the Loss of the Bounding Box regressor
clone_loss_1 is relevant only if you train on multiple GPUs: TensorFlow creates a clone of the model to train on each GPU and reports the loss for each clone. If you are training the model on a single GPU/CPU, you will see only clone_loss_1, which is the same as TotalLoss.
The other losses are as described in Rohit's answer.
There are four losses that you will encounter when using the Faster R-CNN network:
1. RPN loss / localization loss: In the Faster R-CNN architecture, a CNN produces the region proposals from the feature map. This is the localization loss for the bounding boxes of the anchors generated at that stage.
2. RPN loss / objectness loss: Also computed while extracting the region proposals; it measures whether an object is present in the anchor box or not.
3. Box classifier loss / classification loss: Computed at the final layer; it measures which class the detected object belongs to (e.g. dog or cat).
4. Box classifier loss / localization loss: Also computed at the final layer, for the bounding-box coordinates of the detected object (e.g. the box coordinates for the dog and the cat).
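To make the distinction between the two classification-style losses concrete, here is an illustrative sketch (not the API's actual code): the RPN objectness loss is a binary object-vs-background term, the box classifier loss is a multi-class term, and both localization losses are typically a smooth-L1/Huber penalty on box offsets. All tensor values below are made up:

```python
import tensorflow as tf

# RPN objectness: binary object-vs-background score per anchor.
objectness_logits = tf.constant([[2.1], [-1.3]])
objectness_labels = tf.constant([[1.0], [0.0]])     # 1 = object, 0 = background
rpn_objectness_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        labels=objectness_labels, logits=objectness_logits))

# Box classifier: multi-class score per detection (e.g. cat/dog/airplane).
class_logits = tf.constant([[1.5, 0.2, -0.3]])
class_labels = tf.constant([0])                     # ground-truth class index
classification_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=class_labels, logits=class_logits))

# Localization (both stages): smooth-L1/Huber on predicted vs. target boxes.
box_preds = tf.constant([[0.10, 0.20, 0.90, 0.80]])
box_targets = tf.constant([[0.15, 0.25, 0.85, 0.75]])
localization_loss = tf.keras.losses.Huber()(box_targets, box_preds)

# The reported TotalLoss is (roughly) the weighted sum of all four terms.
```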

can not detect the correct class when using Tensorflow object detection API

I am using the TensorFlow Object Detection API to train on and detect my dataset. The dataset has 5 classes (50 images per class), and it contains two very similar classes (red and black). After training, I ran detection on the test images and found that the model always detects a red-class target as a black-class target; the other classes are detected correctly.
I trained the model with faster_rcnn_resnet101_breads.config, using a fine_tune_checkpoint. I set the learning_rate to 0.003 (the original is 0.0003).
Can you tell me what is wrong with my model, and what learning_rate I should set?
A comparison of my config file with the sample config file: compare result
Training curves: train curves
> black class: https://i.stack.imgur.com/eWrlK.jpg
> red class: https://i.stack.imgur.com/TuRjg.jpg
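For reference, a heavily hedged sketch of putting the learning rate back to the sample value programmatically, assuming the OD API's config_util module and the momentum optimizer with manual step decay used in the faster_rcnn_resnet101 sample configs (the field path varies with the optimizer actually configured):

```python
from object_detection.utils import config_util

# Assumed config/optimizer layout; adjust to match your pipeline file.
configs = config_util.get_configs_from_pipeline_file(
    'faster_rcnn_resnet101_breads.config')
lr_config = (configs['train_config'].optimizer
             .momentum_optimizer.learning_rate.manual_step_learning_rate)
lr_config.initial_learning_rate = 0.0003  # back to the sample value

pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, 'configs_out')
```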

Retrain last inception or mobilenet layer to work with INPUT_SIZE 64x64 or 32x32

I want to retrain the last Inception or MobileNet layer so it classifies my own objects (about 5-15 classes).
I also want this to work with INPUT_SIZE == 64x64 or 32x32 (not 224 as for the default Inception model).
I found some articles about retraining models:
https://hackernoon.com/creating-insanely-fast-image-classifiers-with-mobilenet-in-tensorflow-f030ce0a2991
https://medium.com/@daj/creating-an-image-classifier-on-android-using-tensorflow-part-3-215d61cb5fcd
For mobilenet they say
the input image size, either '224', '192', '160', or '128'
so I can't train with 64 or 32 (which is bad): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py#L80
What about inception models? Can I somehow train models to work with small image input sizes (to get results faster)?
The objects I want to classify from such small images will already be cropped from their parent image (for example from camera frames); they could be traffic/road signs cropped by fast cascade classifiers (LBP/Haar) trained to detect anything that looks like a sign shape (triangle, rhombus, circle).
So 64x64 images that fully contain only the object of interest should be enough for classification.
You still can: use the smallest option, which would be 128. It will just scale your 32 or 64 pixel image up, which is fine.
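A minimal sketch of that upscaling approach, assuming TF 2.x and Keras applications (the class count of 5 and the crop size are illustrative, and the Dense head would still need training on your own data):

```python
import tensorflow as tf

# Pretrained MobileNet at its smallest supported input, 128x128.
base = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3), include_top=False,
    weights='imagenet', pooling='avg')
head = tf.keras.layers.Dense(5, activation='softmax')  # e.g. 5 sign classes

def classify_small_crop(crop):
    """crop: float32 tensor of shape [64, 64, 3] (or [32, 32, 3])."""
    x = tf.image.resize(crop, (128, 128))               # just scale it up
    x = tf.keras.applications.mobilenet.preprocess_input(x)
    return head(base(x[tf.newaxis, ...]))               # class probabilities
```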
It's not possible for the classifiers above, but it becomes possible with the TensorFlow Object Detection API (we can set any input size): https://github.com/tensorflow/models/tree/master/research/object_detection

Changing Inception-v4 architecture to do Multi-label classification in Tensorflow

I am working on an image tagging and annotation problem where an image may contain multiple objects. I want to train Inception-v4 for multi-label classification. My training data will be an image and a vector whose length equals the number of classes, with a 1 at each index whose object exists in the image. For example, if I have four classes (person, car, tree, buildings) and an image contains a person and a car, then my vector will be (1, 1, 0, 0).
What changes do I need to make to train inception-v4 for the tagging and annotation problem?
Do I only need to change the input format and change the loss function from softmax to sigmoid_cross_entropy_with_logits in the inception-v4 architecture?
https://github.com/tensorflow/models/blob/master/slim/nets/inception_v4.py
Thank you in advance.
If you'd like to retrain a model to output different labels, check out the image_retraining example: https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/examples/image_retraining/retrain.py
In that example, we retrain the standard inception v3 model to recognize flowers instead of the standard ImageNet categories.
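On the loss-function part of the question, swapping softmax for a per-class sigmoid is the standard move for multi-label outputs. A minimal sketch with made-up tensors (four classes, matching the person/car example above):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.5, -1.0, -2.0]])   # raw network outputs
labels = tf.constant([[1.0, 1.0, 0.0, 0.0]])     # person + car present
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

# At inference time, threshold each sigmoid independently instead of argmax.
predictions = tf.cast(tf.sigmoid(logits) > 0.5, tf.int32)  # -> [[1, 1, 0, 0]]
```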