Benchmark YOLOv3,4,5 on VOC 2007 - object-detection

I want to compare the R-CNN family vs the YOLO family. I found the evaluation results of R-CNN, Fast R-CNN, Faster R-CNN, YOLOv1 and YOLOv2 on the Pascal VOC 2007 dataset, but I did not find the evaluation results of YOLOv3, YOLOv4 and YOLOv5 on Pascal VOC. (Image: comparison table of R-CNN vs YOLO results.)
Where can I find the evaluation results of YOLOv3,4,5 on the VOC dataset?

Related

Tensorflow op: ParseExampleV2 is super slow

When using native TensorFlow for training and prediction/serving, we found that the bottleneck of training/serving speed is the ParseExampleV2 op, i.e. the op that transforms a vector of tf.Example protos (as strings) into typed tensors. (Screenshot: time spent in each TensorFlow op within the graph, based on a logistic regression model.)
Is there any solution to that problem?
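One common mitigation, sketched below under the assumption of a hypothetical feature spec (a float label plus variable-length integer ids), is to batch the serialized strings first and call tf.io.parse_example once per batch, so ParseExampleV2 runs on a whole vector of protos instead of record by record:

```python
import tensorflow as tf

# Hypothetical feature spec for illustration; the real spec depends on your tf.Example layout.
feature_spec = {
    "label": tf.io.FixedLenFeature([], tf.float32),
    "ids": tf.io.VarLenFeature(tf.int64),
}

def make_dataset(file_pattern, batch_size=1024):
    files = tf.data.Dataset.list_files(file_pattern)
    ds = files.interleave(tf.data.TFRecordDataset,
                          num_parallel_calls=tf.data.AUTOTUNE)
    # Batch the serialized strings first, then parse once per batch:
    # ParseExampleV2 then runs on a vector of protos rather than one at a time.
    ds = ds.batch(batch_size)
    ds = ds.map(lambda x: tf.io.parse_example(x, feature_spec),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)
```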

Lower validation accuracy on ImageNet when evaluating Keras pre-trained models

I want to work with Keras models pre-trained on ImageNet. The models and information about their performance are here.
I downloaded ILSVRC 2012 (ImageNet) dataset and evaluated ResNet50 on the validation dataset. The top-1 accuracy should be 0.749 but I get 0.68. The top-5 accuracy should be 0.921, mine is 0.884. I also tried VGG16 and MobileNet with similar discrepancies.
I preprocess the images using the built-in preprocess_input function (e.g. tensorflow.keras.applications.resnet50.preprocess_input()).
My guess is that the dataset is different. How can I make sure that the validation dataset that I use for evaluation is the same as the one that was used by the authors? Could there be any other reason why I get different results?
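For reference, a minimal evaluation sketch is given below, assuming the ILSVRC-2012 validation images are arranged in class sub-folders under a hypothetical val_dir. One common source of discrepancy, beyond a mismatched dataset, is the resize/crop protocol: the usual ImageNet evaluation resizes the short side to 256 and takes a 224 center crop, whereas plain resizing to 224x224 distorts the image and can lower the reported numbers.

```python
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

model = ResNet50(weights="imagenet")

# Hypothetical layout: val_dir/<wnid>/<image>.JPEG, with the class folders
# named so that their sorted order matches the Keras ImageNet class indices.
ds = tf.keras.utils.image_dataset_from_directory(
    "val_dir", image_size=(224, 224), batch_size=32, shuffle=False)

# Model-specific preprocessing (channel-wise mean subtraction, RGB->BGR for ResNet50).
ds = ds.map(lambda x, y: (preprocess_input(x), y))

model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=[
        tf.keras.metrics.SparseTopKCategoricalAccuracy(k=1, name="top1"),
        tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5, name="top5"),
    ])
model.evaluate(ds)
```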

TensorFlow and TFLite - loss of accuracy / runtime

Is there a general statement about whether, or by how much, accuracy and runtime at inference decrease when using a TFLite model (.tflite) instead of the original TensorFlow model (.h5)?
The simple answer is no. It really depends on what the model looks like and on what sort of optimization was applied when converting the H5 model to TFLite. But generally, if you do not apply any additional optimization and just perform a float32 conversion to TFLite, I'd say most of the time TFLite will give an equivalent level of accuracy compared to the original model.
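For illustration, a plain float32 conversion with no additional optimization might look like the sketch below ("model.h5" is a placeholder path, and the dummy input is only there to compare the two models on the same data):

```python
import tensorflow as tf
import numpy as np

# Load the original Keras model ("model.h5" is a placeholder path).
model = tf.keras.models.load_model("model.h5")

# Plain float32 conversion: no quantization or other optimization applied.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Run the TFLite model on one sample and compare against the Keras output.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

sample = np.random.rand(*inp["shape"]).astype(np.float32)  # dummy input for illustration
interpreter.set_tensor(inp["index"], sample)
interpreter.invoke()
print("max abs diff:",
      np.abs(interpreter.get_tensor(out["index"]) - model(sample).numpy()).max())
```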

What is the difference between tensorflow inception and mobilenet

Recently I have been working with TensorFlow Inception V3 and MobileNet to deploy them for use on Android. While converting a retrained Inception V3 model to "tflite" there were some issues, as the resulting "tflite" model was empty, but when I tried with a retrained MobileNet model it was successfully converted to "tflite". So basically I have two questions:
Is it possible to convert inception V3 retrained model to "tflite"?
What is the difference between inception V3 and MobileNet?
PS: I have gone through the official documentation link, which only hinted at MobileNet being
https://www.tensorflow.org/tutorials/image_retraining#other_model_architectures
Yes, both of the models can be converted to tflite format. For a step-by-step procedure please go through this link: Convert to tflite.
The major difference between Inception V3 and MobileNet is that MobileNet uses depthwise separable convolution while Inception V3 uses standard convolution. This results in a lower number of parameters in MobileNet compared to Inception V3. However, it also results in a slight decrease in performance.
In a standard convolution the filter operates on all M channels of the input image together and outputs N feature maps, i.e. the matrix multiplication between the input and the filter is multidimensional. To make it clear, take the filter as a cube of size Dk x Dk x M; in a standard convolution each element of the cube multiplies with the corresponding element in the input feature map, and after the multiplication the results are summed to produce the N output feature maps.
However, in a depthwise separable convolution each of the M single-channel filters operates on a single input channel, and once the M filter outputs are obtained, a pointwise filter of size 1 x 1 x M operates on them to give the N output feature maps. This can be understood from the corresponding figure in the MobileNet paper.
To make it more clear, please go through the DataScience link. It has a concrete example of how this reduces the parameter count; a similar sketch is given below.
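As an illustrative sketch (my own example, assuming Dk = 3, M = 32 input channels and N = 64 output maps): a standard convolution has Dk x Dk x M x N weights (18,432 here), while the depthwise separable version has Dk x Dk x M + M x N weights (288 + 2,048 = 2,336). In Keras:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(56, 56, 32))  # M = 32 input channels

# Standard convolution: one 3x3x32 filter per output map -> 3*3*32*64 = 18,432 weights.
standard = tf.keras.layers.Conv2D(64, 3, padding="same", use_bias=False)(inputs)

# Depthwise separable: 32 depthwise 3x3 filters (288 weights) followed by a
# 1x1x32 pointwise convolution producing 64 maps (2,048 weights) -> 2,336 total.
depthwise = tf.keras.layers.DepthwiseConv2D(3, padding="same", use_bias=False)(inputs)
separable = tf.keras.layers.Conv2D(64, 1, use_bias=False)(depthwise)

print(tf.keras.Model(inputs, standard).count_params())   # 18432
print(tf.keras.Model(inputs, separable).count_params())  # 2336
```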

Float ops found in quantized TensorFlow MobileNet model

As you can see in the screenshot of a quantized MobileNet model implemented in TensorFlow, there are still some float operations. The quantization is done in TensorFlow via the graph_transform tools.
The red ellipse in the image has its description in the right-hand-side text box. The "depthwise" node is a "DepthwiseConv2dNative" operation that expects "DT_FLOAT" inputs.
Although the lower Relu6 performs an 8-bit quantized operation, its result has to go through "(Relu6)", which is a "Dequantize" op, in order to produce "DT_FLOAT" inputs for the depthwise convolution.
Why are depthwise conv operations left out by the TF graph_transform tools? Thank you.
Unfortunately there isn't a quantized version of depthwise conv in standard TensorFlow, so it falls back to the float implementation with conversions before and after. For a full eight-bit implementation of MobileNet, you'll need to look at TensorFlow Lite, which you can learn more about here:
https://www.tensorflow.org/mobile/tflite/
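With today's TFLite converter (rather than the graph_transform tool used in the question), a full eight-bit post-training quantization, in which the depthwise convolution also runs in eight bit, roughly looks like the sketch below; the representative_data generator is a placeholder you would replace with real, preprocessed calibration images:

```python
import tensorflow as tf
import numpy as np

model = tf.keras.applications.MobileNet(weights="imagenet")  # or your own model

def representative_data():
    # Placeholder calibration data; use a few hundred real, preprocessed images in practice.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force all ops (including the depthwise conv) onto the 8-bit integer kernels.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
```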