how to combine SSD with tracker to accomplish multiple object tracking - tracking

how to combine object detection algorithm (e.g. SSD, YOLO, Faster RCNN and so on) with tracker(e.g. KCF TLD and so on) to accomplish multiple object tracking? Somebody could share concerned papers, blogs, wedsite or github code?? thank you very much

First of all, You only need a single combination of detection and tracking such as Yolo-KCF or SSD-KCF to track multiple objects.
You can use Darknet Yolo for detection:
https://github.com/pjreddie/darknet
and SORT for tracking:
https://github.com/abewley/sort
or combined:
https://github.com/Qidian213/deep_sort_yolov3
you can try the different combination with same rules.

One approach is to do object detection via the mentioned object detection architecture like YOLO, SSD and so on and pass those results to a tracker for object detection over multiple frames.
A simple introduction is here to find: https://www.pyimagesearch.com/2018/08/06/tracking-multiple-objects-with-opencv/

Related

Multi-label image classification vs. object detection

For my next TF2-based computer vision project I need to classify images to a pre-defined set of classes. However, multiple objects of different classes can occur on one such image. That sounds like an object detection task, so I guess I could go for that.
But: I don't need to know where on an image each of these objects are, I just need to know which classes of objects are visible on an image.
Now I am thinking which route I should take. I am in particular interested in a high accuracy/quality of the solution. So I would prefer the approach that leads to better results. Thus from your experience, should I still go for an object detector, even though I don't need to know the location of the detected objects on the image, or should I rather build an image classifier, which could output all the classes that are located on an image? Is this even an option, can a "normal" classifier output multiple classes?
Since you don't need the object localization, stick to classification only.
Although you will be tempted to use the standard off-the-shelf network of multi-class multi-label object detection because of its re-usability, but realize that you are asking the model to do more things. If you have tons of data - not a problem. Or if your objects are similar to the ones used in ImageNet/COCO etc, you can simply use standard off-the-shelf object detection architecture and fine-tune on your dataset.
However, if you have less data and you need to train from scratch (e.g. medical images, weird objects), then object detection will be an overkill and will give you inferior results.
Remember, most of the object detection networks re-cycle the classification architectures with modifications added to last layers to incorporate additional outputs for object detection coordinates. There is a loss function associated with those additional outputs. During training in order to get best loss value, some of the classification accuracy is compromised for the sake of getting better object localization coordinates. You don't need that compromise. So, you can modify the last layer of object detection network and remove the outputs for coordinates.
Again, all this hassle is worth only if you have less data and you really need to train from scratch.

How to train your own(w/o YOLO etc.) object detector in tf/keras

I successfully trained multi-classificator model, that was really easy with simple class related folder structure and keras.preprocessing.image.ImageDataGenerator with flow_from_directory (no one-hot encoding by hand btw!) after i just compile fit and evaluate - extremely well done pipeline by Keras!
BUT! when i decided to make my own (not cats, not dogs, not you_named) object detector - this is became a nightmare...
TFRecord and tf.Example are just madness! but ok, i almost get it (my dataset is small, i have plenty of ram, but who cares, write f. boilerplate, so much meh...)
The main thing - i just can't find any docs/tutorial how to make it with plain simple tf/keras, everyone just want to build up it on top of someone model, YOLO SSD FRCNN, even if they trying to detect completely new objects!!!
There two links about OD in official docs, and they both using some models underneath.
So my main question WHY ??? or i just blind..? -__-
It becomes a nightmare because Object Detection is way way harder than classification. The most simple object detector is this: first train a classifier on all your objects. Then when you want to detect objects in your image, slide a window over your image, and classify each window. Then, if your classifier is certain that a certain window is one of the objects, mark it as a successful detection.
But this approach has a lot of problems, mainly it's way (like waaaay) too slow. So, researcher improved it and invented RCNNs. That had it problems, so they invented Faster-RCNN, YOLO and SSD, all to make it faster and more accurate.
You won't find any tutorials online on how to implement the sliding window technique because it's not useful anyway, and you won't find any tutorials on how to implement the more advanced stuff because, well, the networks get complicated pretty quick.
Also note that using YOLO doesn't mean you should use the same weights as in YOLO. You can always train YOLO from scratch on your own data if you want by randomly initiliazing all the weights in the network layers. So the even if they trying to detect completely new objects!!! you mentioned isn't really valid. Also also note that I still would advise you to do use the weights they used in Yolo network. Transfer Learning is generally looked at as being a good idea, especially when starting out and especially in the image processing world, as many images share common features (like edges, for example).
I am having pretty much the same problem as my images are B/W diagrams, quite different from regular pictures, I want to train a custom model on just only diagrams.
I have found this documentation section in Tensorflow models repo:
https://github.com/tensorflow/models/blob/master/research/object_detection/README.md
It has a couple of sections explaining how to bring your own model and dataset in "extras" that could be a starting point.

how to use tensorflow object detection API for face detection

Open CV provides a simple API to detect and extract faces from given images. ( I do not think it works perfectly fine though because I experienced that it cuts frames from the input pictures that have nothing to do with face images. )
I wonder if tensorflow API can be used for face detection. I failed finding relevant information but hoping that maybe an experienced person in the field can guide me on this subject. Can tensorflow's object detection API be used for face detection as well in the same way as Open CV does? (I mean, you just call the API function and it gives you the face image from the given input image.)
You can, but some work is needed.
First, take a look at the object detection README. There are some useful articles you should follow. Specifically: (1) Configuring an object detection pipeline, (3) Preparing inputs and (3) Running locally. You should start with an existing architecture with a pre-trained model. Pretrained models can be found in Model Zoo, and their corresponding configuration files can be found here.
The most common pre-trained models in Model Zoo are on COCO dataset. Unfortunately this dataset doesn't contain face as a class (but does contain person).
Instead, you can start with a pre-trained model on Open Images, such as faster_rcnn_inception_resnet_v2_atrous_oid, which does contain face as a class.
Note that this model is larger and slower than common architectures used on COCO dataset, such as SSDLite over MobileNetV1/V2. This is because Open Images has a lot more classes than COCO, and therefore a well working model need to be much more expressive in order to be able to distinguish between the large amount of classes and localizing them correctly.
Since you only want face detection, you can try the following two options:
If you're okay with a slower model which will probably result in better performance, start with faster_rcnn_inception_resnet_v2_atrous_oid, and you can only slightly fine-tune the model on the single class of face.
If you want a faster model, you should probably start with something like SSDLite-MobileNetV2 pre-trained on COCO, but then fine-tune it on the class of face from a different dataset, such as your own or the face subset of Open Images.
Note that the fact that the pre-trained model isn't trained on faces doesn't mean you can't fine-tune it to be, but rather that it might take more fine-tuning than a pre-trained model which was pre-trained on faces as well.
just increase the shape of the input, I tried and it's work much better

tensorflow object detection: using more feature extractors with faster RCNN

I'm trying to perform object detection on a custom, relatively easy dataset (with ~30k samples). I've successfully used Faster_RCNN with Resnet101_v1 (final mAP 0.9) and inception_resnet_v2 feature extractors (training in progress). Now I would like my model to run faster but still keep good performance, so I'd like to compare the ones I have, with SSD running with various versions of mobile_net. However, to know which changes in performance come from SSD and which come from the feature extractor, I'd like to also try Faster-RCNN with mobile_nets. It's also possible that this yields the tradeoff I need between performance and inference time (faster RCNN being good and slow, and mobile_nets fast).
The original MobileNets paper mentions using it with Faster RCNN, and I guess they used the tensorflow model detection API, so maybe they've released the files to adapt MobileNets to Faster RCNN ?
How can I make mobile_nets compatible with Faster-RCNN?
In a nutshell, a MobileNet version of the Faster-RCNN Feature Extractor will need to be created. This is something we are looking at adding, but is not a current priority.
I am not an expert apparently, but as far as know, you cann't use mobilenets with faster_rcnn, mobilenets is based on yolo which is a different architecture from faster_rcnn.
Google released its Object Detection Model recently.
https://github.com/tensorflow/models/tree/master/object_detection
You can replace feature extractor easily with this API (Xception, Inception ResNet, DenseNet, or Mobile Net) with a current object detector.
There are two common parts in many Object Recognition Systems. The first part is feature extractor (extracting features such as edges, lines, colors from image input). The second part is Object Detector (Faster R-CNN, SSD, YOLOv2).

How can i detect and localize object using tensorflow and convolutional neural network?

My problem statement is as follows :
" Object Detection and Localization using Tensorflow and convolutional neural network "
What i did ?
I am done with the cat detection from images using tflearn library.I successfully trained a model using 25000 images of cats and its working fine with good accuracy.
Current Result :
What i wanted to do?
If my image consist of two or more than two objects in the same image for example cat and dog together so my result should be 'cat and dog' and apart from this i have to find the exact location of these two objects on the image(bounding box)
I came across many high level libraries like darknet , SSD but not able to get the concept behind it.
Please guide me about the approach to solve the problem.
Note : I am using supervised learning techniques.
Expected Result :
You have several ways to go about it.
The most straight forward way is to get some suggested bounding boxes using some bounding box suggestion algorithm like selective search and run on each on of the suggestion the classification net that you already trained. This approach is the approach taken by R-CNN.
For more advanced algorithm based on the above approach i suggest you read about Fast-R-CNN and Faster R-CNN.
Look at Object detection with R-CNN? for some basic explanation.
Darknet and SSD are based on a different approach if you want to undestand them you can read about them on
http://www.cs.unc.edu/~wliu/papers/ssd.pdf
https://pjreddie.com/media/files/papers/yolo.pdf
Image localization is a complex problem with many different implementations achieving the same result with different efficiency.
There are 2 main types of implementation
-Localize objects with regression
-Single Shot Detectors
Read this https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html to get a better idea.
Cheers
I have done a similar project (detection + localization) on Indian Currencies using PyTorch and ResNet34. Following is the link of my kaggle notebook, hope you find it helpful. I have manually collected images from the internet and made bounding box around them and saved their annotation file (Pascal VOC) using "LabelImg" annotation tool.
https://www.kaggle.com/shweta2407/objectdetection-on-custom-dataset-resnet34