Object Detection MASK RCNN only for 2 classes - object-detection

I use (https://github.com/matterport/Mask_RCNN) MASKRCNN for object detection and everything works fine. Is there a way to detect only certain objects? There are more than 80 classes of which I only need 2 pieces (like car and person). I would like to have the remaining classes not detected. How can they be removed?

I guess you are using the default pre-trained coco model for detection which comes with 80 classes. You can train your own model.
First of all you have to use VIA (VGG image annotator) to label the classes you want to predict.
Once that is done, you have to make some changes to the code of the model. For example, if you are using the file "balloon.py", you have to add classes, make some changes in load_mask() function and few other parts of the codes. After that you can start training your model, then use for detection and segmentation.
Yes, and for VIA, try using the version 1.0.0 as the format of the .json file is slightly changed in in the updated versions, which generally makes them incompatible with training on your custom data sets.
Check one example here

Related

Multi-label image classification vs. object detection

For my next TF2-based computer vision project I need to classify images to a pre-defined set of classes. However, multiple objects of different classes can occur on one such image. That sounds like an object detection task, so I guess I could go for that.
But: I don't need to know where on an image each of these objects are, I just need to know which classes of objects are visible on an image.
Now I am thinking which route I should take. I am in particular interested in a high accuracy/quality of the solution. So I would prefer the approach that leads to better results. Thus from your experience, should I still go for an object detector, even though I don't need to know the location of the detected objects on the image, or should I rather build an image classifier, which could output all the classes that are located on an image? Is this even an option, can a "normal" classifier output multiple classes?
Since you don't need the object localization, stick to classification only.
Although you will be tempted to use the standard off-the-shelf network of multi-class multi-label object detection because of its re-usability, but realize that you are asking the model to do more things. If you have tons of data - not a problem. Or if your objects are similar to the ones used in ImageNet/COCO etc, you can simply use standard off-the-shelf object detection architecture and fine-tune on your dataset.
However, if you have less data and you need to train from scratch (e.g. medical images, weird objects), then object detection will be an overkill and will give you inferior results.
Remember, most of the object detection networks re-cycle the classification architectures with modifications added to last layers to incorporate additional outputs for object detection coordinates. There is a loss function associated with those additional outputs. During training in order to get best loss value, some of the classification accuracy is compromised for the sake of getting better object localization coordinates. You don't need that compromise. So, you can modify the last layer of object detection network and remove the outputs for coordinates.
Again, all this hassle is worth only if you have less data and you really need to train from scratch.

Is it possible to run two TFLite models at the same time on a Flutter App? / make Teachable Machine recognize when an object is not present?

I am using a Teachable Machine model which i trained to recognize some specific objects, the issue with it, however, is that it does not recognize when there is nothing, basically it always assumes that one of the objects is there. One potential solution I am considering is combining two models like the YOLO V2 Tflite model in the same app. Would this be even possible/efficient? If it is what would be the best way to do it?
If anyone knows a solution to get teachable machine to recognize when the object is not present that would probably be a much better solution.
Your problem can be solved making a model ensemble: Train a classifier that learns to know if your specific objects are not in the visual space, and then use your detection model.
However, I really recommend you to upload your model to an online service and consume it via an API. As I know tflite package just supports well MobileNet based models.
I had the same problem, just create another class called whatever you want(for example none) and put some non-related images in it, then train the model.
Now whenever there is nothing in the field, it should output none.

Image Detection & Classification - general approach?

I'm trying to build a detection + classification model that will recognize an object in an image and classify it. Every image will contain at most 1 object among my 10 classes (i.e. same image cannot contains 2 classes). An image can, however, contain none of my classes/objects. I'm struggling with the general approach to this problem, especially due to the nature of my problem; my objects have different sizes. This is what I have tried:
Trained a classifier with images that only contains my objects/classes, i.e. every image is the object itself with background pre-removed. Now, since the objects/images have different shapes (aspect ratios) I had to reshape the images to the same size (destroying the aspect ratios). This would work just fine if my purpose was to only build a classifier, but since I also need to detect the objects, this didn't work so good.
The second approach was similar to (1), except that I didn't reshape the objects naively, but kept the aspect ratios by padding the image with 0 (black). This completely destroyed my classifiers ability to perform well (accuracy < 5%).
Mask RCNN - I followed this blogpost to try build a detector + classifier in the same model. The approach took forever and I wasn't sure it was the right approach. I even used external tools (RectLabel) to generate annotated image files containing information about the bounding boxes.
Question:
How should I approach this problem, on a general level:
Should I build 2 separate models? (One for detection/localization and one for classification?)
Should I be annotating my images using annotations file as in approach (3)?
Do I have to reshape my images at any stage?
Thanks,
PS. In all of my approaches, I augmented the images to generate ~500-1000 images per class.
To answer your questions:
No, you don't have to build two separate models. What you are describing is called Object detection, which is classification along with localization. There are many models which do this: Mask_RCNN, Yolo, Detectron, SSD, etc..
Yes, you do need to annotate your images for training a model for your custom classes. Each of the models mentioned above has needs a different way of annotation.
No, you don't need to do any image resizing. Most of the time it is done when the model loads the data for training or inference.
You are on the right track with trying MaskRCNN.
Other than MaskRCNN, you could also try Yolo. There is also an accompanying easy-to-use annotating tool Yolo-Mark.
If you go through this tutorial, you would understand what you care about.
How to train your own Object Detector with TensorFlow’s Object Detector API
The SSD model is small so that it would not take so much time for training.
There are some object detection models.
On RectLabel, you can save bounding boxes in the PASCAL VOC format.
You can export TFRecord for Tensorflow.
https://rectlabel.com/help#tf_record

how to use tensorflow object detection API for face detection

Open CV provides a simple API to detect and extract faces from given images. ( I do not think it works perfectly fine though because I experienced that it cuts frames from the input pictures that have nothing to do with face images. )
I wonder if tensorflow API can be used for face detection. I failed finding relevant information but hoping that maybe an experienced person in the field can guide me on this subject. Can tensorflow's object detection API be used for face detection as well in the same way as Open CV does? (I mean, you just call the API function and it gives you the face image from the given input image.)
You can, but some work is needed.
First, take a look at the object detection README. There are some useful articles you should follow. Specifically: (1) Configuring an object detection pipeline, (3) Preparing inputs and (3) Running locally. You should start with an existing architecture with a pre-trained model. Pretrained models can be found in Model Zoo, and their corresponding configuration files can be found here.
The most common pre-trained models in Model Zoo are on COCO dataset. Unfortunately this dataset doesn't contain face as a class (but does contain person).
Instead, you can start with a pre-trained model on Open Images, such as faster_rcnn_inception_resnet_v2_atrous_oid, which does contain face as a class.
Note that this model is larger and slower than common architectures used on COCO dataset, such as SSDLite over MobileNetV1/V2. This is because Open Images has a lot more classes than COCO, and therefore a well working model need to be much more expressive in order to be able to distinguish between the large amount of classes and localizing them correctly.
Since you only want face detection, you can try the following two options:
If you're okay with a slower model which will probably result in better performance, start with faster_rcnn_inception_resnet_v2_atrous_oid, and you can only slightly fine-tune the model on the single class of face.
If you want a faster model, you should probably start with something like SSDLite-MobileNetV2 pre-trained on COCO, but then fine-tune it on the class of face from a different dataset, such as your own or the face subset of Open Images.
Note that the fact that the pre-trained model isn't trained on faces doesn't mean you can't fine-tune it to be, but rather that it might take more fine-tuning than a pre-trained model which was pre-trained on faces as well.
just increase the shape of the input, I tried and it's work much better

How to build tensorflow object detection model for custom classes and also include the 90 classess the SSD Mobilenet model contains

I am building a new tensorflow model based off of SSD V1 coco model in order to perform real time object detection in a video but i m trying to find if there is a way to build a model where I can add a new class to the existing model so that my model has all those 90 classes available in SSD MOBILENET COCO v1 model and also contains the new classes that i want to classify.
For example, I have created training data for two classes: man, woman
Now, I built a new tensorflow model that identifies a man and/or woman in a video. However, my model does not have the other 90 classes present in original SSD Mobilenet model. I am looking for a way to concatenate both models or pass more than one model to my code to detect the objects.
If you have any questions or if I am not clear, please feel free to probe me further.
The only way i find is you need to get dataset of SSD Mobilenet model on which it was trained.
Make sure all the images are present in one directory and annotations in another directory.
We should have a corresponding annotation file for each image file
ex: myimage.jpg and myimage.xml
If all the images of your customed dataset are of same formate with SSD Mobilenet model then annotate it with a tool called LabelImg.
Add that images and annotated files to respective images and annotations directory where we have already saved SSD Mobilenet.
Try regenerate new TFrecord and continue with remaining procedure on it.
You can use transfer learning with Tensorflow API.
Transfer learning allows you to load re-trained network and modify the fully connected layer by introducing your classes.
There is full description for this in the following references:
Codelab
A good explanation here
Tensorflow API here for more details
Also you can use google cloud platform for better and faster results:
I wish this helps you.
I don't think there is a way you can add your classes to the existing 90 classes without using the dataset it is previously trained with. Your only way is to use that dataset plus your own and retrain the model.