What is the difference between an Estimator and a Regressor in TensorFlow tf.estimator? - tensorflow

TensorFlow Estimators (tf.estimator) are "high level tools for working with models". Instead of defining your own DNN classifier layer by layer there is already an estimator for that.
Everyone knows the difference between classifying features with a label and estimating a numeric value. However, what is the difference between a Regressor and an Estimator? For example DNNEstimator and DNNRegressor. The description says
DNNEstimator - An estimator for TensorFlow DNN models with user-specified head.
DNNRegressor - A regressor for TensorFlow DNN models.
A "head" in turn is an "interface for the head/top of a model". Would it be fair to say that DNNRegressor is a DNNEstimator with a RegressionHead or what is an Estimator?

Related

SSD Inception v2. Is the VGG16 feature extractor replaced by the Inception v2?

In the original SSD paper they used a VGG16 network to the the feature extraction. I am using the SSD Inception v2 model from the TensorFlow model zoo and I do not know what the difference in architecture is. This stack overflow post suggest that for other models like SSD MobileNet the VGG16 feature extractor is replaced by the MobileNet feature extractor.
I thought this would be the same case here with the SSD Inception but this paper has me confused. From here it seems that the Inception is added to the SSD part of the model and the VGG16 feature extractor remains in the beginning of the architecture.
What is the architecture of the SSD Inception v2 model?
In tensorflow object detection api, the ssd_inception_v2 model uses inception_v2 as the feature extractor, namely, the vgg16 part in the first figure (figure (a)) is replaced with inception_v2.
In ssd models, the feature layer extracted by feature extractor (i.e. vgg16, inception_v2, mobilenet) will be further processed to produce extra feature layers of different resolutions. In the above figure (a), there are 6 output feature layers, the first two (19x19) are directly taken from the feature extractor. How are the other 4 layers (10x10, 5x5, 3x3, 1x1) generated?
They are generated by extra convolutional operations (these conv operations are sort of like using very shallow feature extractors, aren't they?). The implementation details are here provided with good documents. In the documentation it says
Note that the current implementation only supports generating new layers
using convolutions of stride 2 (resulting in a spatial resolution reduction
by a factor of 2)
that is how the extra feature map decreases by a factor of 2, and if you read the function multi_resolution_feature_maps, you will find slim.conv2d operations being used, which indicates these extra layers are obtained with extra convolution layer (just one layer each!).
Now we can explain what is improved in the paper you linked. They proposed to replace the extra feature layers with inception block. There is no inception_v2 model but simply a inception block. The paper reported improving classification accuracy by using inception block.
Now it should be clear to the question, ssd model with vgg16, inceptioin_v2 or mobilenet are alright but the inception in the paper only refers to a inception block, not the inception network.

Can I add Tensorflow Fake Quantization in a Keras sequential model?

I have searched this for a while, but it seems Keras only has quantization feature after the model is trained. I wish to add Tensorflow fake quantization to my Keras sequential model. According to Tensorflow's doc, I need these two functions to do fake quantization: tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph().
My question is has anyone managed to add these two functions in a Keras model? If yes, where should these two function be added? For example, before model.compile or after model.fit or somewhere else? Thanks in advance.
I worked around by post-training quantization. Since my final goal is to train a mdoel for mobile device, instead of fake quantization during training, I exported keras .h5 file and converted to Tenforflow lite .tflite file directly (with post_training_quantize flag set to true). I tested this on a simple cifar-10 model. The original keras model and the quantized tflite model have very close accuracy (the quantized one a bit lower).
Post-training quantization: https://www.tensorflow.org/performance/post_training_quantization
Convert Keras model to tensorflow lite: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/python_api.md
Used the tf-nightly tensorflow here: https://pypi.org/project/tf-nightly/
If you still want to do fake quantization (because for some model, post-training quantization may give poor accuracy according to Google), the original webpage is down last week. But you can find it from github: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize
Update: Turns out post-quantization does not really quantize the model. During inference, it still uses float32 kernels to do calculations. Thus, I've switched to quantization-aware training. The accuracy is pretty good for my cifar10 model.

Image learning network other than gan series

What is the famous network that I can use in the network that inputs and outputs images of the same size of deep learning that can be used with tesorflow other than gan series?
There are plenty of network architectures which can perform auto-encoding / image-to-image translation, e.g. ResNet 1, UNet 2, Variational Auto-Encoders (VAEs) 3, etc.
Note that any of those architectures can be used for GANs generators. GAN methods can be seen as a slightly more complexed way to train these generative networks, using an adversarial loss along the generative one.
Some Resources:
Github - tensorflow/models/resnet
Github - jakeret/tf_unet
Article - Building Variational Auto-Encoders in TensorFlow by Danijar Hafner

Why use keras as backend instead of using tensorflow?

I see that there are many similar functions between tensorflow and keras like argmax, boolean_mask...I wonder why people have to use keras as backend along with tensorflow instead of using tensorflow alone.
Keras is not a backend, but it is a high-level API for building and training Neural Networks. Keras is capable of running on top of Tensorflow, Theano and CNTK. Most of the people prefer Keras due to its simplicity compared to other libraries like Tensorflow. I recommend Keras for beginners in Deep Learning.
A Keras tensor is a tensor object from the underlying backend (Theano,
TensorFlow or CNTK), which we augment with certain attributes that
allow us to build a Keras model just by knowing the inputs and outputs
of the model.
Theano vs Tensorflow
Tensorflow is necessary if you wish to use coremltools. Apple has promised support for architectures created using Theano but I haven't seen it yet.
Keras will require unique syntax sugar depending on the backend in use. I like the flexibility of Tensorflow input layers and easy-access to strong Google neural networks.

Tensorflow SSD-Mobilenet model accuracy drop after quantization using transform_graph

I am working on the recently released "SSD-Mobilenet" model by google for object detection.
Model downloaded from following location: https://github.com/tensorflow/models/blob/master/object_detection/g3doc/detection_model_zoo.md
The frozen graph file downloaded from the site is working as expected, however after quantization the accuracy drops significantly (mostly random predictions).
I built tensorflow r1.2 from source, and used following method to quantize:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=frozen_inference_graph.pb --out_graph=optimized_graph.pb --inputs='image_tensor' --outputs='detection_boxes','detection_scores','detection_classes','num_detections' --transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,224,224,3") fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms quantize_weights strip_unused_nodes sort_by_execution_order'
I tried various combinations in the "transforms" part, and the transforms mentioned above gave sometimes correct predictions, however no where close to the original model.
Is there any other way to improve performance of the quantized model?
In this case SSD uses mobilenet as it's feature extractor . In-order to increase the speed. If you read the mobilenet paper , it's a lightweight convolutional neural nets specially using separable convolution inroder to reduce parameters .
As I understood separable convolution can loose information because of the channel wise convolution.
So when quantifying a graph according to TF implementation it makes 16 bits ops and weights to 8bits . If you read the tutorial in TF for quantization they clearly have mentioned how this operation is more like adding some noise in to already trained net hoping our model has well generalized .
So this will work really well and almost lossless interms of accuracy for a heavy model like inception , resnet etc. But with the lightness and simplicity of ssd with mobilenet it really can make a accuracy loss .
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
How to Quantize Neural Networks with TensorFlow