CoreML - Image Classifier vs Object Detection

I was wondering, which would be better for the following:
I want to create a model to help distinguish car models; take the Mercedes C250 (2014) and Mercedes C63 (2014) as an example.
I understand that object detection helps to identify multiple, well... objects in a given image. However, after looking at a tutorial online and seeing how IBM Cloud lets you annotate specifics such as the badge on the car, certain detailing, etc., would object detection work better for me than just an image classifier?
I understand that the more data is fed in, the better the results, but in a general sense, what should the approach be? Image classifier or object detection? Or maybe something else? I've used and trained multiple image classifiers, but I am not happy at all with the results.
Any help or suggestions would be much appreciated.

Object detection is better, because a simple image classifier breaks down if you have more than one kind of car in a single photo.

Related

Which model should I use for object recognition on mobile devices?

I am working on a project where I need the app to recognize a special character, just one small graphic, from a photographed document. Something similar to the example from the picture. More specifically, the app would use this character to determine the corners of the document.
Something like this
Which model would be suitable for that, Mobile SSD, Yolo or something completely different? Approximately how many photos and how much time would it take me to successfully train the model to 90%+ detection? And is TensorFlow Model Maker a good option?
I already tried to train it with Model Maker but the results were really disappointing. I used the efficientdet_lite0 model. The photos were taken with a phone in high resolution and tagged with labelImg. About 40 for training, five each for validation and test.
It would mean a lot to me if someone would tell me if I am at least on the right track. Thank you very much in advance.
EfficientDet or YOLO should be good enough for your use case. YOLOv5 recommends 10000 annotated instances of each class for good results, assuming the variability of the class representation is large. I would not start with anything less than a few hundred.

Is it possible to combine two different custom YOLOv4 models

I'm working on an object detection project where I have to identify the type of animals and their posture given an image/video. For this purpose, I have two custom YOLOv4 models which are trained separately. Model 1 identifies the type of animal and Model 2 identifies the posture of the animal. I have converted these models to TensorFlow models.
Now, since both the models use the same image/video as input, I want to combine the outputs of both the models and the final output should display the bounding box of both the models.
I'm stuck at this point, I have been researching the solution for this and I'm confused with various methods. Could anyone help me with this?
I don't think you need an object detection model as the pose identifier, because you've already localized the animal with the first network.
The easiest (and clearly not very accurate) solution I see is to use a classifier on top of the detections (with the cropped bounding box as input). In that case the animal's anatomy is not taken into account explicitly, but I guess that approach is still a good baseline.
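A minimal sketch of that baseline, assuming the first model's output has already been converted to pixel-coordinate boxes of the form (x_min, y_min, x_max, y_max) paired with an animal label. The names `crop_detections`, `label_postures` and `dummy_posture_classifier` are hypothetical, and the stand-in classifier only exists so the example runs; a trained posture classifier would take its place.

```python
import numpy as np

def crop_detections(image, boxes):
    """Return one crop per box; boxes are (x_min, y_min, x_max, y_max) in pixels."""
    height, width = image.shape[:2]
    crops = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp to the image bounds so boxes that stick out of the frame still slice cleanly
        x0, y0 = max(0, int(x_min)), max(0, int(y_min))
        x1, y1 = min(width, int(x_max)), min(height, int(y_max))
        crops.append(image[y0:y1, x0:x1])
    return crops

def label_postures(image, animal_detections, posture_classifier):
    """Attach a posture label to each (box, animal_label) detection from the first model."""
    boxes = [box for box, _ in animal_detections]
    crops = crop_detections(image, boxes)
    results = []
    for (box, animal), crop in zip(animal_detections, crops):
        results.append({"box": box, "animal": animal, "posture": posture_classifier(crop)})
    return results

# Stand-in classifier (assumption, not a real model):
# a crop that is wider than tall is called "lying", otherwise "standing".
def dummy_posture_classifier(crop):
    crop_height, crop_width = crop.shape[:2]
    return "lying" if crop_width > crop_height else "standing"

image = np.zeros((100, 200, 3), dtype=np.uint8)  # blank 200x100 test frame
detections = [((10, 10, 90, 40), "dog"), ((120, 5, 150, 95), "cat")]
results = label_postures(image, detections, dummy_posture_classifier)
for result in results:
    print(result)
```

Each result then carries a single box with both labels, so only one set of bounding boxes needs to be drawn.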
For further experiments you can take a look at these and these solutions for animal pose estimation, but they are more complex to use.

Is this the correct way of using YOLO for image classification in a custom project?

I'm a beginner in computer vision. Could anyone tell me whether what I'm planning to do is correct or not? I want to detect a certain cyst in teeth. My dataset consists of parts of dental x-rays that contain that cyst, and I train my model with these pictures. The one with the colored area contains the cyst (infected teeth), and the one below it shows uninfected teeth.
Image with cyst
Uninfected teeth
After training my model, I want to use it on a full dental x-ray, and determine if this picture has the cyst or not. A full dental x-ray is shown below.
Full dental X-Ray
Does this work? Or am I completely wrong?
Instead of treating this as an object detection problem, you would get far better results if you were to treat this as a classification problem.
There are already various architectures for such classification tasks, and several of them are available in TensorFlow to get you started.
Take a look at this. If you have enough data, you can train them from scratch instead of using pre-trained weights.
Note - The architectures provided in TensorFlow will almost always give you better results than architectures that you create yourself.
Object detection is suitable for cases where you have well-defined objects. If you take a look at recently published research papers you can see that these types of problems are considered as classification problems instead of object detection problems.

How to train your own(w/o YOLO etc.) object detector in tf/keras

I successfully trained a multi-class classifier model. That was really easy with a simple class-related folder structure and keras.preprocessing.image.ImageDataGenerator with flow_from_directory (no one-hot encoding by hand btw!); after that I just compile, fit and evaluate - an extremely well done pipeline by Keras!
BUT! When I decided to make my own (not cats, not dogs, not you_named) object detector, it became a nightmare...
TFRecord and tf.Example are just madness! but ok, i almost get it (my dataset is small, i have plenty of ram, but who cares, write f. boilerplate, so much meh...)
The main thing - I just can't find any docs/tutorial on how to do it with plain simple tf/keras. Everyone just wants to build on top of someone else's model, YOLO SSD FRCNN, even if they trying to detect completely new objects!!!
There are two links about OD in the official docs, and they both use some model underneath.
So my main question is WHY ??? Or am I just blind..? -__-
It becomes a nightmare because object detection is way, way harder than classification. The simplest object detector is this: first train a classifier on all your objects. Then, when you want to detect objects in your image, slide a window over the image and classify each window. If your classifier is confident that a given window is one of the objects, mark it as a successful detection.
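As a rough illustration of that sliding-window idea, here is a sketch in plain Python. Everything named here is an assumption for the example: `sliding_windows`, `detect_objects`, the "background" label, and the `fake_classifier` that stands in for a trained classifier returning a (label, score) pair.

```python
def sliding_windows(image_width, image_height, window, stride):
    """Yield square (x0, y0, x1, y1) boxes that tile the image at a fixed stride."""
    for y in range(0, image_height - window + 1, stride):
        for x in range(0, image_width - window + 1, stride):
            yield (x, y, x + window, y + window)

def detect_objects(image, image_width, image_height, classify_window,
                   window=64, stride=32, threshold=0.9):
    """Classify every window and keep the confident, non-background ones."""
    detections = []
    for box in sliding_windows(image_width, image_height, window, stride):
        label, score = classify_window(image, box)
        if label != "background" and score >= threshold:
            detections.append((box, label, score))
    return detections

# Stand-in classifier (hypothetical): reports a "cat" whenever the window
# contains the pixel (100, 100), and background otherwise.
def fake_classifier(image, box):
    x0, y0, x1, y1 = box
    if x0 <= 100 < x1 and y0 <= 100 < y1:
        return "cat", 0.95
    return "background", 0.99

# The image itself is unused by the stand-in classifier, so None is passed here.
detections = detect_objects(None, 256, 128, fake_classifier)
for box, label, score in detections:
    print(box, label, score)
```

Even this toy 256x128 frame needs 21 classifier calls at a single window size, and overlapping windows report the same object more than once; a real image scanned at several window sizes multiplies that cost quickly.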
But this approach has a lot of problems; mainly, it's way (like waaaay) too slow. So researchers improved it and invented R-CNNs. Those had their own problems, so they invented Faster R-CNN, YOLO and SSD, all to make it faster and more accurate.
You won't find any tutorials online on how to implement the sliding window technique because it's not useful anyway, and you won't find any tutorials on how to implement the more advanced stuff because, well, the networks get complicated pretty quickly.
Also note that using YOLO doesn't mean you have to use the same weights as in YOLO. You can always train YOLO from scratch on your own data if you want, by randomly initializing all the weights in the network layers. So the "even if they trying to detect completely new objects!!!" point you mentioned isn't really valid. Also also note that I would still advise you to use the weights they used in the YOLO network. Transfer learning is generally regarded as a good idea, especially when starting out and especially in the image processing world, as many images share common features (like edges, for example).
I am having pretty much the same problem: my images are B/W diagrams, quite different from regular pictures, and I want to train a custom model on diagrams only.
I have found this documentation section in the TensorFlow models repo:
https://github.com/tensorflow/models/blob/master/research/object_detection/README.md
It has a couple of sections under "extras" explaining how to bring your own model and dataset, which could be a starting point.

Counting Pedestrians Using TensorFlow's Object Detection

I am new to the machine learning field, and based on what I have seen on YouTube and read on the internet, I conjectured that it might be possible to count pedestrians in a video using TensorFlow's object detection API.
Consequently, I did some research on TensorFlow, read the documentation on how to install it, and then finally downloaded and installed it. Using the sample files provided on GitHub, I adapted the code from the object_detection notebook provided here -> https://github.com/tensorflow/models/tree/master/research/object_detection.
I ran the adapted code on the videos that I collected, after changing the visualization_utils.py script to report the number of objects that cross a defined region of interest on the screen. That is, I collected the bounding box coordinates (left, right, top, bottom) of the person class and counted all the detections that crossed the defined region of interest (imagine a pair of virtual vertical lines on the video frame at fixed left and right pixel values, with each detected bounding box's left and right values compared against those predefined values). However, when I use this procedure I miss a lot of pedestrians even though they are detected by the program. That is, the program correctly classifies them as persons, but sometimes they don't meet the criteria that I defined for counting, and so they are not counted. I want to know if there is a better way of counting unique pedestrians than the simplistic method that I am trying to develop. Is the approach that I am using the right one? Could there be other, better approaches? I would appreciate any kind of help.
Please go easy on me as I am not a machine learning expert and just a novice.
You are using a pretrained model which is trained to identify people in general. I think you're saying that some people are pedestrians whereas some other people are not pedestrians, for example, someone standing waiting at the light is a pedestrian, but someone standing in their garden behind the street is not a pedestrian.
If I'm right, then you've reached the limitations of what you'll get with this model and you will probably have to train a model yourself to do what you want.
Since you're new to ML, building your own dataset and training your own model probably sounds like a tall order; there's a learning curve, to be sure. So I'll suggest the easiest way forward. That is, use the object detection model to identify people, then train a new binary classification model (about the easiest model to train) to identify whether a particular person is a pedestrian or not (you will create a dataset of images and 1/0 values to identify them as pedestrian or not). I suggest this because a boolean classification model is about as easy a model as you can get, and there are dozens of tutorials you can follow. Here's a good one:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/neural_network.ipynb
A few things to note when doing this:
When you build your dataset you will want a set of images, at least a few thousand, along with the 1/0 classification for each (pedestrian or not pedestrian).
You will get much better results if you start with a model that is pretrained on ImageNet than if you train it from scratch (though this might be a reasonable step 2, as it's an extra task), especially if you only have a few thousand images to train it on.
Since your images will have multiple people in them, you have the problem of indicating which person you want the model to classify as a pedestrian or not. There's no one right way to do this necessarily. If you draw a yellow box around the person, the network may be successful in learning this notation. Another valid approach might be to remove the other people that were detected in the image by deleting them and leaving that area black. Centering on the target person may also be a reasonable approach.
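The "leave the other people black" option from the last point could look roughly like this. It is only a sketch: it assumes the detector's boxes are already integer pixel coordinates in (x_min, y_min, x_max, y_max) order, and `mask_other_people` is a hypothetical helper name.

```python
import numpy as np

def mask_other_people(image, boxes, target_index):
    """Return a copy of the image with every detected person except the target blacked out."""
    masked = image.copy()
    x0, y0, x1, y1 = boxes[target_index]
    # Keep the target's pixels aside in case another box overlaps them
    target_pixels = image[y0:y1, x0:x1].copy()
    for index, (bx0, by0, bx1, by1) in enumerate(boxes):
        if index != target_index:
            masked[by0:by1, bx0:bx1] = 0
    # Put the target back so overlapping boxes do not erase it
    masked[y0:y1, x0:x1] = target_pixels
    return masked

# A white 100x50 test frame with two detected people
image = np.full((50, 100, 3), 255, dtype=np.uint8)
boxes = [(10, 10, 30, 40), (60, 5, 90, 45)]
masked = mask_other_people(image, boxes, target_index=0)
```

One masked image per detected person would then be fed to the binary classifier, with the 1/0 label referring to the person left visible.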
My last bullet point illustrates a problem with the idea as I've proposed it. The best solution would be to alter the object detection network to output both a bounding box per person and a pedestrian/non-pedestrian classification with it, or to train the model to identify only pedestrians in the first place. I mention this as more optimal, but I consider it a more advanced task than my first suggestion, and a more complex dataset to manage. It's probably not the first thing you want to tackle as you learn your way around ML.