ML.NET prediction speed improvement for object detection

I've created an ONNX model for object detection with Visual Studio and ML.NET Model Builder, using VoTT to define the 4 objects I want to detect.
I'm testing the model as explained in the tutorial, and it works well; the result is correct:
var sampleData = new MLModel1.ModelInput()
{
    ImageSource = @"C:\Data\sample1.jpg",
};
// Load model and predict output
var result = MLModel1.Predict(sampleData);
The problem is that a single prediction takes about 5 seconds (10 seconds on the first run, 5 on the following ones).
sample1.jpg is a 700x400 pixel image (85 KB), and the computer has an Intel i7 at 2.9 GHz.
Why is it so slow? Am I doing something wrong, or is this the speed I should expect?
Here's the image; the objects to detect are REF, LOT, the hourglass icon and the factory icon.
Is there any other technique I could use to detect these objects faster?
Thanks

Related

Keras model in Tensorflow.js: good predictions on images but awful on video?

I have converted a custom Keras model to a layersModel for TensorFlow.js. I tested the model by uploading an image and calling the prediction after the upload was done. Snippet for the prediction:
let img = document.getElementById('image')
let offset = tf.scalar(255)
let tensorImg = tf.browser.fromPixels(img).resizeNearestNeighbor([224,224]).toFloat().expandDims();
let tensorImg_scaled = tensorImg.div(offset)
prediction = await model.predict(tensorImg_scaled).data();
With this code, my predictions follow the original model, with confidence values changing constantly like they should. However, my intention is to analyze a webcam feed every second. A function containing this code is called every second:
const video = document.querySelector("video");
let offset = tf.scalar(255)
let tensorImg = tf.browser.fromPixels(video).resizeNearestNeighbor([224,224]).toFloat().expandDims();
let tensorImg_scaled = tensorImg.div(offset)
prediction = await model.predict(tensorImg_scaled).data();
With video I get awful results, where the prediction is always something like Float32Array(3) [6.18722574920633e-16, 1, 3.5979095258653615e-8], the middle confidence value always being 1 or 0.9999.
What could be the problem here? Calling the video prediction snippet less often, like every 5 seconds, does not help.
Any help with video predictions is super appreciated; it is a final project for uni and the panic is starting to creep in... Many thanks!
Even though video is technically made of individual frames, the important point is that those frames form a sequence. Your model is not performing well because it was trained to do well on a single frame at a time. When dealing with video data, the usual approach is a CNN (for spatial features) followed by an LSTM (for temporal features).
In your case, a simpler option is to implement a rolling prediction over K frames, i.e. the prediction reported at a frame is the average of the predictions over the last K frames, as sketched below.
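Here is a minimal sketch of that rolling average (shown in Python/NumPy for brevity, since the idea is language independent; the window size K=5 is an assumption, not something from the question):
from collections import deque
import numpy as np

K = 5
recent = deque(maxlen=K)  # holds the last K per-frame prediction vectors

def smoothed_prediction(frame_prediction):
    # Append the newest per-frame prediction and return the rolling average.
    recent.append(np.asarray(frame_prediction, dtype=np.float32))
    return np.mean(list(recent), axis=0)

The class you report at each frame is then np.argmax(smoothed_prediction(pred)) instead of np.argmax(pred), which suppresses single-frame spikes like the ones you are seeing.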

min_scale and max_scale in the model config of Tensorflow Object Detection API

The TensorFlow Object Detection API provides model config files for training; these config files have min_scale and max_scale parameters for detection, which are set to 0.2 and 0.95 respectively by default.
I have some questions about these parameters:
Are these params for detecting the size of objects?
If we set the input size of the network to 300x300 and min_scale=0.2, is the network then not able to detect objects smaller than 300 x 0.2 = 60 pixels?
As far as I know, ssd_mobilenet_v2_coco has problems detecting small objects. If we set min_scale = 0.05 and train the network on small objects with the same model, is it possible to detect small objects of size 300 x 0.05 = 15 pixels?
Are these params for detecting the size of objects?
Well, yes and no. Those parameters are inside the ssd_anchor_generator definition, which is one kind of anchor_generator. That part of the system takes care of providing the anchor boxes used for the subsequent box prediction, as sketched below.
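For intuition, the ssd_anchor_generator spreads the anchor scales roughly linearly between min_scale and max_scale across the feature-map layers (this follows the SSD convention; the snippet below only illustrates that arithmetic, using the default num_layers=6 and a 300x300 input):
min_scale, max_scale, num_layers, input_size = 0.2, 0.95, 6, 300

scales = [min_scale + (max_scale - min_scale) * k / (num_layers - 1)
          for k in range(num_layers)]
anchor_sizes = [round(s * input_size) for s in scales]
print(scales)        # approximately [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
print(anchor_sizes)  # [60, 105, 150, 195, 240, 285] pixels

So min_scale only fixes the size of the smallest generated anchors, not a hard lower bound on what the network can detect.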
If we set the input size of the network to 300x300 and min_scale=0.2, is the network then not able to detect objects smaller than 300 x 0.2 = 60 pixels?
No. The size of a detectable object is not determined by min_scale alone (which only affects anchor generation); it is also affected by, for example, the data the network was trained on, the network depth, etc.
As far as I know, ssd_mobilenet_v2_coco has problems detecting small objects. If we set min_scale = 0.05 and train the network on small objects with the same model, is it possible to detect small objects of size 300 x 0.05 = 15 pixels?
Maybe? That depends entirely on your data. Modifying the min_scale parameter might help (and indeed it might make sense to select another range for those parameters), but experimentation with your data is necessary.

TF Api Dataset: initialization

The tf.data Dataset works really great; I was able to speed up learning ~2x. But I still have a performance problem: GPU utilization is low (despite using tf.data with several workers).
My use case is the following:
~400 training examples, each with 10 input channels (~5 GB in total)
The task is segmentation using ResNet50. The forward-backward pass takes ~0.15 s. Batch size = 32.
Data loading is fast, taking ~0.06 s.
But after each epoch (400/32 ≈ 13 iterations), data loading takes ~3.5 seconds, the same as the initialization of the loader (which is more than processing the whole epoch). This makes learning very slow.
My question is: is there an option to eliminate the initialization after each epoch and just feed the data continuously?
I tried setting dataset.repeat(10) but it does not help.
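Roughly, what I am after is a pipeline like this (just an illustrative sketch, not my actual code, which is in the gist below; parse_fn here stands in for the real parsing function):
dataset = (tf.data.TFRecordDataset('train.tfrecord')
           .map(parse_fn, num_parallel_calls=4)
           .shuffle(buffer_size=400)
           .repeat()    # repeat indefinitely instead of re-initializing per epoch
           .batch(32)
           .prefetch(1))
next_batch = dataset.make_one_shot_iterator().get_next()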
The loading code and training code are here: https://gist.github.com/melgor/0e681a4fe8f125d25573aa30d8ace5f3
The model is just ResNet transformed into an encoder-decoder architecture for image segmentation. Most of the code is taken from https://github.com/argman/EAST, but since loading there is very slow, I would like to switch it to TFRecords.
I partly resolved my problem with the long initialization: I just made the tfrecord file smaller.
In my base implementation I stored images as raw strings (i.e. the raw bytes of the numpy array). The new tfrecord contains images compressed as JPEG or PNG. Thanks to that, the file is about 50x smaller, which makes initialization much faster. But there is also a downside: your images need to be uint8 (JPEG) or uint16 (PNG). In the case of float data you can use uint16, but there will be a loss of information.
For encoding a numpy array to a compressed string you can use TensorFlow itself:
# img / png_image are uint8 / uint16 numpy arrays; sess is an open tf.Session
encoded_jpeg = tf.image.encode_jpeg(tf.constant(img), format='rgb').eval(session=sess)
encoded_png = tf.image.encode_png(tf.constant(png_image)).eval(session=sess)
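To complete the picture, here is a minimal sketch of putting the encoded bytes into a tfrecord and decoding them again on the reading side (TF 1.x API; the feature key 'image/encoded' and the file name are illustrative choices, not fixed names):
# Writing: one encoded image per record
with tf.python_io.TFRecordWriter('train.tfrecord') as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[encoded_jpeg])),
    }))
    writer.write(example.SerializeToString())

# Reading: decode inside the dataset's map function
def parse_fn(serialized):
    parsed = tf.parse_single_example(
        serialized, {'image/encoded': tf.FixedLenFeature([], tf.string)})
    return tf.image.decode_jpeg(parsed['image/encoded'], channels=3)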

Detection of multiple objects (using OpenCV)

I want to find multiple objects in a scene (the objects look the same, but may differ in scale and rotation, and I don't know in advance what the object to be detected will be). I have implemented the following idea, based on the feature detectors in OpenCV, which works:
detect and compute keypoints from the object
for i < max_objects_todetect; i++
1. detect and compute keypoints from the whole scene
2. match scene and object keypoints with the FlannMatcher
3. use findHomography/RANSAC to compute the bounding box of the first object (the object that has the most keypoints in the scene with multiple objects)
4. set the pixels in the scene that are within the computed bounding box to 0 -> in the next loop cycle there are no keypoints left to detect for this object
The problem with this implementation is that I need to compute the keypoints for the scene multiple times, which takes a lot of computing time (250 ms per pass). Does anyone have a better idea for detecting multiple objects?
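For reference, here is a rough sketch of that loop (written in Python/OpenCV for brevity; the SURF/FLANN choice, file names and thresholds are illustrative, not my exact code):
import cv2
import numpy as np

MAX_OBJECTS = 5        # illustrative upper bound
MIN_GOOD_MATCHES = 10  # illustrative threshold

detector = cv2.xfeatures2d.SURF_create(100)   # requires opencv-contrib
matcher = cv2.FlannBasedMatcher()

obj_img = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)
scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
obj_kp, obj_desc = detector.detectAndCompute(obj_img, None)

for _ in range(MAX_OBJECTS):
    # 1. keypoints of the whole (possibly already masked) scene
    scene_kp, scene_desc = detector.detectAndCompute(scene, None)
    if scene_desc is None:
        break

    # 2. match object and scene descriptors, keep the good matches (ratio test)
    matches = matcher.knnMatch(obj_desc, scene_desc, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.7 * m[1].distance]
    if len(good) < MIN_GOOD_MATCHES:
        break

    # 3. homography via RANSAC, then project the object outline into the scene
    src = np.float32([obj_kp[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([scene_kp[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        break
    h, w = obj_img.shape
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    box = cv2.perspectiveTransform(corners, H)

    # 4. zero out the detected region so its keypoints disappear next iteration
    cv2.fillConvexPoly(scene, np.int32(box.reshape(-1, 2)), 0)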
Thanks Drian
Hello everyone, I tried ORB, which is indeed faster, and I will try AKAZE next.
While testing ORB I have encountered the following problem:
While changing the size of my picture doesn't affect the keypoints detected by SURF (it finds the same keypoints in the small and in the big picture, shown on the right in the linked picture), it does affect the keypoints detected by ORB. In the small picture I'm not able to find these keypoints. I tried to experiment with the ORB parameters but couldn't make it work.
Picture: http://www.fotos-hochladen.net/view/bildermaf6d3zt.png
SURF:
cv::Ptr<cv::xfeatures2d::SURF> detector = cv::xfeatures2d::SURF::create(100);
ORB:
cv::Ptr<cv::ORB> detector = cv::ORB::create(1500, 1.05f, 16, 31, 0, 2, ORB::HARRIS_SCORE, 2, 10);
Do you know if, and how, it's possible to detect the same keypoints independently of the size of the pictures?
Greetings Drian

How to count objects detected in an image using Tensorflow?

I have trained my custom object detector using faster_rcnn_inception_v2 and tested it using object_detection_tutorial.ipynb, and it works perfectly: I can find bounding boxes for the objects inside the test image. My problem is how to actually count the number of those bounding boxes, or simply, how to count the number of objects detected for each class.
Because of low reputation I cannot comment.
As far as I know, the Object Detection API unfortunately has no built-in function for this.
You have to write this function yourself. I assume you run eval.py for evaluation? To access the individual detected objects for each image you have to follow this chain of scripts:
eval.py -> evaluator.py -> object_detection_evaluation.py -> per_image_evaluation.py
In the last script you can count the detected objects and bounding boxes per image. You just have to save the numbers and sum them up over your entire dataset.
Does this already help you?
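If you run inference the same way as in object_detection_tutorial.ipynb, a minimal sketch of such a counting function could look like this (the output_dict keys follow the tutorial's inference code; the 0.5 score threshold is an assumption you may want to tune):
import collections

def count_objects(output_dict, category_index, score_threshold=0.5):
    # Count detections per class name whose score exceeds the threshold.
    counts = collections.Counter()
    for cls, score in zip(output_dict['detection_classes'],
                          output_dict['detection_scores']):
        if score >= score_threshold:
            counts[category_index[cls]['name']] += 1
    return counts

# e.g. print(dict(count_objects(output_dict, category_index)))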
I solved this using the TensorFlow Object Counting API. It has an example of counting objects in an image in single_image_object_counting.py. I just replaced ssd_mobilenet_v1_coco_2017_11_17 with my own model containing the inference graph:
input_video = "image.jpg"
detection_graph, category_index = backbone.set_model(MODEL_DIR)