How do object-detection methods really work? - tensorflow

I'm in a school group project and we are using the TensorFlow Object Detection API on a Raspberry Pi 3, but we do not know how the object detection methods, SSD (single shot detector) and CNN (convolutional neural network), work underneath.
Can someone give a simple yet non-trivial explanation of how SSD and CNNs work, and recommend factors that might optimize the speed of these object detection methods?
Please link us to good articles if you know any!
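For what it's worth, here is a minimal sketch of the SSD idea (my own illustration, with made-up sizes, not from any of the answers below): a CNN backbone compresses the image into a coarse feature map, and small convolutional heads predict class scores and box offsets for a fixed set of anchor boxes at every cell of that map, all in one forward pass. Shrinking the backbone or the input resolution is the usual speed lever.

```python
import tensorflow as tf

# Simplified illustration of the SSD idea (sizes are assumptions):
# a CNN backbone turns the image into a coarse feature map, and small
# convolutional heads predict class scores and box offsets for a fixed set
# of anchor boxes at every cell of that map, in a single forward pass.
num_classes = 20        # hypothetical number of object classes
anchors_per_cell = 6    # anchors of different shapes/scales per feature-map cell

backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights=None, input_shape=(300, 300, 3))
features = backbone.output  # coarse feature map, roughly (None, 10, 10, 1280)

# "Single shot": detection is just two convolutions over the feature map.
class_scores = tf.keras.layers.Conv2D(
    anchors_per_cell * num_classes, 3, padding="same")(features)
box_offsets = tf.keras.layers.Conv2D(
    anchors_per_cell * 4, 3, padding="same")(features)

ssd_like = tf.keras.Model(backbone.input, [class_scores, box_offsets])
ssd_like.summary()  # a smaller backbone or input size is the usual way to speed this up
```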

Related

In object detection with SSD MobileNetV2 + FPN (TF2 Object Detection API), what are the possible reasons for hard example mining not to work well?

I am using the TensorFlow Object Detection API to build a detection model. Hard example mining seemed to work really well with the SSD+MobileNetV2 model (used with the TF1 version of the API). However, with similar settings in the TF2 version using the SSD+MobileNetV2+FPN model, I achieve similar mAP on the relevant category but see a lot more false positives in evaluation, even after adding hard example mining. What could be the possible reasons for that, and are there any other ways to reduce false positives?
The solution is to not use hard negative mining, but to use focal loss instead. This is in line with the Focal Loss / RetinaNet paper: https://arxiv.org/pdf/1708.02002v2.pdf
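To make the suggestion concrete, here is a minimal sketch of sigmoid focal loss as described in that paper (alpha=0.25 and gamma=2.0 are the paper's defaults). In the TF2 Object Detection API you would normally enable this through the classification-loss section of the pipeline config rather than writing it yourself, so treat this as an illustration of the idea:

```python
import tensorflow as tf

def sigmoid_focal_loss(labels, logits, alpha=0.25, gamma=2.0):
    """Focal loss down-weights easy examples instead of mining hard negatives."""
    labels = tf.cast(labels, logits.dtype)
    ce = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    probs = tf.sigmoid(logits)
    p_t = labels * probs + (1.0 - labels) * (1.0 - probs)       # prob. of the true class
    alpha_t = labels * alpha + (1.0 - labels) * (1.0 - alpha)   # class-balancing weight
    return alpha_t * tf.pow(1.0 - p_t, gamma) * ce              # (1 - p_t)^gamma shrinks easy examples
```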

Deep learning for computer vision: What after MNIST stage?

I am trying to explore computer vision using deep learning techniques. I have gone through the basic literature, made a NN of my own to classify digits using the MNIST data (without using any library like TF, Keras, etc., and in the process understood concepts like loss functions, optimization, backpropagation, etc.), and then also explored Fashion MNIST using TF Keras.
I applied the knowledge gained so far to a Kaggle problem (identifying a plant type), but the results are not very encouraging.
So, what should my next step be? What should I do to improve my knowledge and models so I can solve more complex problems? What books, literature, etc. should I read to move past the beginner stage?
You should try hyperparameter tuning; it will help improve your model's performance. Feel free to browse various articles. Fine-tuning a pretrained model is a good next step now that you have a fundamental understanding of how models work.
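As one concrete illustration of that fine-tuning step, here is a minimal transfer-learning sketch in Keras: freeze an ImageNet backbone, train a small classification head on top, then optionally unfreeze and continue with a low learning rate. The input size, class count, and dataset objects are placeholders for whatever the Kaggle task actually uses:

```python
import tensorflow as tf

NUM_CLASSES = 12  # hypothetical number of plant types

# Pretrained backbone, frozen to start with.
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds/val_ds: your own datasets
```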

Tensorflow object detection API: How to isolate the backbone network?

I'm a user of the TensorFlow Object Detection API. I use it a lot to train models like Faster R-CNN on my images. From what I understand, there is a backbone network (in my experiments, ResNet) used to extract features.
I would like to re-use the weights of this specific network, but when I save my model it's a Faster R-CNN model, and even after hours in the documentation and in the source files, I don't see how to isolate the weights of the backbone network.
Has somebody already done this before? Or is the TensorFlow OD API not the right tool for what I need?
Thank you for your help or advice!
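One possible approach (a sketch, not a confirmed answer): read the checkpoint directly and keep only the variables whose names start with the backbone's scope. The checkpoint path and the prefix below are assumptions; print a few variable names from your own checkpoint first to find the real prefix:

```python
import tensorflow as tf

ckpt_path = "model.ckpt"  # hypothetical checkpoint path
reader = tf.train.load_checkpoint(ckpt_path)
var_shapes = reader.get_variable_to_shape_map()

# Typical scope name for the backbone in TF1 Faster R-CNN checkpoints (assumption).
backbone_prefix = "FirstStageFeatureExtractor"
backbone_weights = {name: reader.get_tensor(name)
                    for name in var_shapes
                    if name.startswith(backbone_prefix)}

print(f"extracted {len(backbone_weights)} backbone tensors")
# These arrays can then be assigned, layer by layer, to a standalone ResNet
# built with the same architecture.
```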

Tensorflow on Raspberry pi for image processing

I want to use a Raspberry Pi to take pictures and then process them through TensorFlow (train it to find a given object in an image and count it), for example to count my objects. I couldn't find any example; do you know if this is possible? I know OpenCV may be easier, but do you know if this is possible with TensorFlow?
As far as I know you can't train a TensorFlow model on a Raspberry Pi; there simply isn't the processing power. However, you could train a TensorFlow model on a laptop/PC and then deploy the model on a Raspberry Pi to do object recognition.
Have a read of this blog post on PyImageSearch; there are some really in-depth tutorials on TensorFlow/Keras on the Raspberry Pi.
https://www.pyimagesearch.com/2017/12/18/keras-deep-learning-raspberry-pi/
For future questions on SO try to have a go yourself and then post a question once you get stuck explaining what you've tried and any code you're using. You're more likely to learn more this way.
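To make the "train on a PC, deploy on the Pi" idea concrete, here is a minimal inference sketch using the TensorFlow Lite runtime on the Pi. The model file name and the exact output layout are assumptions that depend on how the model was converted and exported:

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime that fits on a Pi

# Model trained and converted to .tflite on a PC, then copied to the Pi.
interpreter = tflite.Interpreter(model_path="detect.tflite")  # hypothetical file name
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Stand-in image; in practice, load a picture taken with the Pi camera.
h, w = inp["shape"][1], inp["shape"][2]
image = np.zeros((1, h, w, 3), dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
print(result.shape)  # e.g. boxes/scores; counting objects means thresholding the scores
```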

Real Time Object detection using TensorFlow

I have just started experimenting with deep learning and computer vision technologies. I came across this awesome tutorial. I set up the TensorFlow environment using Docker and trained it on my own sets of objects, and it gave good accuracy when I tested it.
Now I want to make this work in real time. For example, instead of giving an image of an object as the input, I want to use a webcam and make it recognize the object with the help of TensorFlow. Can you guys point me to the right place to start with this work?
You may want to look at TensorFlow Serving so that you can decouple compute from sensors (and distribute the computation), or at our C++ API. Beyond that, TensorFlow was written with an emphasis on throughput rather than latency, so batch samples as much as you can. You don't need to run TensorFlow on every frame, so input from a webcam should definitely be in the realm of possibility. Making the network smaller and buying better hardware are popular options.
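As a starting point, here is a rough sketch of a webcam loop with OpenCV feeding a TF2 SavedModel exported from the Object Detection API. The "saved_model" path, the output keys, and the frame-skip interval are assumptions to adapt to your setup:

```python
import cv2
import numpy as np
import tensorflow as tf

# Exported detection model (assumed to be a TF2 OD API SavedModel).
detect_fn = tf.saved_model.load("saved_model")

cap = cv2.VideoCapture(0)   # default webcam
frame_skip = 5              # don't run the detector on every frame
frame_id = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_id % frame_skip == 0:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        batch = tf.convert_to_tensor(rgb[np.newaxis, ...], dtype=tf.uint8)
        outputs = detect_fn(batch)
        scores = outputs["detection_scores"][0].numpy()
        boxes = outputs["detection_boxes"][0].numpy()
        # draw boxes whose score exceeds a threshold (omitted for brevity)
    cv2.imshow("detections", frame)
    frame_id += 1
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```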