Normalization in the Object Detection API - tensorflow

I'm quite confused about the normalization process when using the Object Detection API.
I'm using the SSD MobileNet v2 320x320 from the model zoo. In the pipeline config used for training I don't specify any additional preprocessing steps beyond what is already defined by default.
Inference from the checkpoint files works fine. However, TFLite inference only seems to work if I normalize the image before feeding it to the net. I use the following line for this:
image = (image - 127.5) / 127.5
But I don't get why this preprocessing helps when I didn't use it during training. The documentation also says that normalization only needs to be applied at inference when it was also used during training.
What am I missing? Is there any preprocessing done during training by default that is not in the defined pipeline? If so, I couldn't find it.

I found it!
In the pipeline config a feature extractor is defined.
feature_extractor {
  type: "ssd_mobilenet_v2_keras"
  depth_multiplier: 1.0
  ...
}
The config points to this feature extractor, in which the following preprocessing function is defined:
def preprocess(self, resized_inputs):
  """SSD preprocessing.

  Maps pixel values to the range [-1, 1].

  Args:
    resized_inputs: a [batch, height, width, channels] float tensor
      representing a batch of images.

  Returns:
    preprocessed_inputs: a [batch, height, width, channels] float tensor
      representing a batch of images.
  """
  return (2.0 / 255.0) * resized_inputs - 1.0
If you do the math you'll see that this is exactly the same as
image = (image - 127.5) / 127.5
since 2.0/255.0 equals 1/127.5, so (2.0/255.0) * x - 1.0 = x/127.5 - 1.0 = (x - 127.5)/127.5. It's the same formula, just written in a different way.
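As a sanity check, here is a minimal sketch of applying this normalization before TFLite inference (the model path and the 'image' array are assumptions for illustration, not from the original question):
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite')  # assumed path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

image = image.astype(np.float32)       # 'image' assumed uint8, 320x320x3
image = (image - 127.5) / 127.5        # same mapping as (2.0/255.0)*x - 1.0
image = np.expand_dims(image, axis=0)  # add the batch dimension

interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()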

Related

Are masks in Tensorflow automatically consumed by loss and metric?

This answer says:
If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in a correct way, the loss on the padding placeholders would be ignored.
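For context, a minimal sketch of this built-in behaviour (entirely my own illustration, with made-up layer sizes): with mask_zero=True, the Embedding layer attaches a Keras mask that downstream layers and the loss consume automatically.
import tensorflow as tf

# mask_zero=True makes the Embedding layer emit a mask for padded (0) tokens.
# With return_sequences=True the mask propagates through the LSTM and the
# Dense layer, and Keras applies it to the per-timestep loss automatically.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=16, mask_zero=True),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(1000),
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))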
However, in TensorFlow's tutorial on Transformers, the author has implemented a custom loss and metric where masks are computed and applied internally. Is this necessary?
Note that in the code of the Transformer model, the author has deleted the Keras mask:
....
....
try:
  # Drop the keras mask, so it doesn't scale the losses/metrics.
  # b/250038731
  del logits._keras_mask
except AttributeError:
  pass

# Return the final output and the attention weights.
return logits
Do we need to implement a custom loss and metric with a mask, or can we use the built-in ones?
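For reference, the tutorial's custom masked loss looks roughly like this (a sketch from memory, so verify against the tutorial): padding positions (label == 0) are zeroed out and excluded from the average.
import tensorflow as tf

def masked_loss(label, pred):
    # Per-token cross-entropy with no reduction, so we can mask it ourselves.
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction='none')
    loss = loss_object(label, pred)

    # Zero out the loss at padding positions (label == 0).
    mask = tf.cast(label != 0, dtype=loss.dtype)
    loss *= mask

    # Average only over the non-padded tokens.
    return tf.reduce_sum(loss) / tf.reduce_sum(mask)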

Different results in TfLite model vs model before quantization

I took an Object Detection model from the TF zoo v2 (MobileNet) and trained it on my own TFRecords.
I am using MobileNet because it is often found in examples of converting to TFLite, and that is what I need because I run it on an RPi3.
I am following ideas from the official example from the Sagemaker docs
and the GitHub repo you can find here.
What is interesting is that the accuracy after step 2 (training) and step 3 (deploying) is pretty nice! My trucks are detected nicely with the custom-trained model.
However, when converted to TFLite the accuracy goes down, no matter whether I use the tflite_convert tool or the Python tf.lite.TFLiteConverter.
What is more, all detections are on the borders of the images, and usually in the bottom-right corner. Maybe I am not preparing the images correctly? Or is this some misunderstanding of the results?
You can check the images I uploaded.
https://ibb.co/fSzfZvz
https://ibb.co/0GF101s
What could possibly go wrong?
I was lacking proper preprocessing of the image.
I used the pipeline config to build a detection object, which has a preprocess function that I used to build the tensor before feeding it into the Interpreter.
from object_detection.utils import config_util
from object_detection.builders import model_builder

num_classes = 2
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
model_config.ssd.num_classes = num_classes
model_config.ssd.freeze_batchnorm = True
detection_model = model_builder.build(
    model_config=model_config, is_training=True)
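A sketch of how that preprocess function can then be used before invoking the TFLite interpreter (the model path and raw_image are illustrative assumptions, not from the original answer):
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite')  # assumed path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# 'raw_image' is assumed to be an HxWx3 uint8 numpy array.
image = tf.convert_to_tensor(raw_image[np.newaxis, ...], dtype=tf.float32)

# preprocess() resizes and normalizes; it returns the processed batch and
# the true image shapes, which we don't need here.
preprocessed, _ = detection_model.preprocess(image)

interpreter.set_tensor(input_details[0]['index'], preprocessed.numpy())
interpreter.invoke()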

Keras: Custom loss function with training data not directly related to model

I am trying to convert my CNN written with tensorflow layers to use the Keras API in TensorFlow (I am using the Keras API provided by TF 1.x), and am having issues writing a custom loss function to train the model.
According to this guide, when defining a loss function it expects the arguments (y_true, y_pred)
https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
def basic_loss_function(y_true, y_pred):
    return ...
However, in every example I have seen, y_true is somehow directly related to the model (in the simple case it is the output of the network). In my problem, this is not the case. How do I implement this if my loss function depends on some training data that is unrelated to the tensors of the model?
To be concrete, here is my problem:
I am trying to learn an image embedding trained on pairs of images. My training data includes image pairs and annotations of matching points between the image pairs (image coordinates). The input feature is only the image pairs, and the network is trained in a siamese configuration.
I am able to implement this successfully with tensorflow layers and train it successfully with tensorflow estimators.
My current implementation builds a tf Dataset from a large database of TFRecords, where the features are a dictionary containing the images and arrays of matching points. Before, I could easily feed these arrays of image coordinates to the loss function, but here it is unclear how to do so.
There is a hack I often use: calculating the loss within the model by means of Lambda layers. (Useful when the loss is independent of the true data, for instance, and the model doesn't really have an output to be compared.)
In a functional API model:
def loss_calc(x):
    loss_input_1, loss_input_2 = x  # arbitrary inputs, you choose
                                    # according to what you gave to the Lambda layer

    # here you use some external data that doesn't relate to the samples
    externalData = K.constant(external_numpy_data)

    # calculate the loss
    loss = ...  # any tensor expression of the inputs and externalData
    return loss
Using the outputs of the model itself (the tensor(s) that are used in your loss):
loss = Lambda(loss_calc)([model_output_1, model_output_2])
Create the model outputting the loss instead of the outputs:
model = Model(inputs, loss)
Create a dummy keras loss function for compilation:
def dummy_loss(y_true, y_pred):
    return y_pred  # y_pred is the loss itself, the output of the model above
model.compile(loss=dummy_loss, ...)
Use any dummy array, correctly sized with regard to the number of samples, for training; it will be ignored:
model.fit(your_inputs, np.zeros((number_of_samples,)), ...)
Another way of doing it is using a custom training loop.
This is much more work, though.
Although you're using TF1, you can still turn eager execution on at the very beginning of your code and do things the way they're done in TF2 (tf.enable_eager_execution()).
Follow the tutorial for custom training loops: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
Here, you calculate the gradients yourself, of any result with respect to whatever you want. This means you don't need to follow Keras' standards of training, as in the sketch below.
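A minimal sketch of such a training step, assuming eager execution is enabled (my_custom_loss and the variable names are placeholders, not part of the original question):
import tensorflow as tf

optimizer = tf.train.AdamOptimizer()  # TF1 eager-compatible optimizer

def train_step(model, images, matching_points):
    # Record operations so we can differentiate the loss ourselves.
    with tf.GradientTape() as tape:
        embeddings = model(images)
        # my_custom_loss is a placeholder for a loss that consumes the
        # matching-point coordinates directly, outside Keras' y_true/y_pred.
        loss = my_custom_loss(embeddings, matching_points)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss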
Finally, you can use the approach you suggested of model.add_loss.
In this case, you calculate the loss exactly the same way as in the first approach, and pass this loss tensor to add_loss.
You can probably compile the model with loss=None then (not sure), because you're going to use other losses, not the standard one.
In this case, your model's output will probably be None too, and you should fit with y=None.
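A sketch of that add_loss variant, reusing loss_calc from the first approach (the compile/fit details are my assumption, given the "not sure" caveats above):
from tensorflow.keras.layers import Lambda
from tensorflow.keras.models import Model

# Build the loss tensor exactly as in the first approach.
loss = Lambda(loss_calc)([model_output_1, model_output_2])

model = Model(inputs, [model_output_1, model_output_2])
model.add_loss(loss)

# No standard loss: Keras trains on the added loss tensor alone.
model.compile(optimizer='adam', loss=None)
model.fit(your_inputs, y=None, epochs=10)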

Mask R-CNN for TPU on Google Colab

We are trying to build an image segmentation deep learning model using Google Colab TPU. Our model is Mask R-CNN.
import os
import tensorflow as tf

TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model.keras_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
However, I am running into issues while converting our Mask R-CNN model to a TPU model, as pasted below.
ValueError: Layer <keras.engine.topology.InputLayer object at 0x7f58574f1940> has a variable shape in a non-batch dimension. TPU models must have constant shapes for all operations.
You may have to specify `input_length` for RNN/TimeDistributed layers.
Layer: <keras.engine.topology.InputLayer object at 0x7f58574f1940>
Input shape: (None, None, None, 3)
Output shape: (None, None, None, 3)
Appreciate any help.
Google recently released a tutorial on getting Mask R-CNN going on their TPUs. For this, they are using an experimental model for Mask R-CNN on Google's TPU GitHub repository (under models/experimental/mask_rcnn). Looking through the code, it looks like they define the model with a fixed input size to overcome the issue you are seeing.
See below for more explanation:
As @aman2930 points out, the shape of your input tensor is not static. This won't work because TensorFlow compiles models with XLA to run on a TPU, and XLA must have all tensor shapes defined at compile time. In the link above, the website specifically calls this out:
Static shapes
During regular usage TensorFlow attempts to determine the shapes of each tf.Tensor during graph construction. During execution any unknown shape dimensions are determined dynamically; see Tensor Shapes for more details.
To run on Cloud TPUs, TensorFlow models are compiled using XLA. XLA uses a similar system for determining shapes at compile time. XLA requires that all tensor dimensions be statically defined at compile time. All shapes must evaluate to a constant and not depend on external data, or stateful operations like variables or a random number generator.
That said, further down the document they mention that the input function is run on the CPU, so it isn't limited by static XLA sizes. They point to batch size being the issue, not image size:
Static shapes and batch size
The input pipeline generated by your input_fn is run on CPU, so it is mostly free from the strict static shape requirements imposed by the XLA/TPU environment. The one requirement is that the batches of data fed from your input pipeline to the TPU have a static shape, as determined by the standard TensorFlow shape inference algorithm. Intermediate tensors are free to have dynamic shapes. If shape inference has failed, but the shape is known, it is possible to impose the correct shape using tf.set_shape().
So you could fix this by reformulating your model to have a fixed batch size, or by using tf.contrib.data.batch_and_drop_remainder as they suggest; a short sketch follows.
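Here is what that batching suggestion could look like (TF 1.x contrib API; filenames and parse_fn are illustrative placeholders):
import tensorflow as tf

BATCH_SIZE = 8  # assumed value; any static batch size works

dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(parse_fn)  # parse_fn is assumed to decode one record

# Drops the last partial batch so every batch has a static first dimension,
# which is what the XLA/TPU compiler requires.
dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(BATCH_SIZE))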
Could you please share the input data function? It is hard to tell the exact issue, but it seems that the shape of the tensor representing the input sample is not static.

How to retrain inception-v1 model?

I have successfully gone through the official tutorial, which explains how to retrain the inception-v3 model, and later successfully retrained the same model for specific purposes.
The model, however, is complex and slow compared to other, simpler models such as inception-v1, whose accuracy is good enough for some tasks. Specifically, I would like to retrain the model to use it on Android, and ideally the speed should be comparable to the original TensorFlow Android demo. Anyway, I tried to retrain the inception-v1 model from this link with the following modifications in retrain.py:
BOTTLENECK_TENSOR_NAME = 'avgpool0/reshape:0'
BOTTLENECK_TENSOR_SIZE = 2048
MODEL_INPUT_WIDTH = 224
MODEL_INPUT_HEIGHT = 224
MODEL_INPUT_DEPTH = 3
JPEG_DATA_TENSOR_NAME = 'input'
RESIZED_INPUT_TENSOR_NAME = 'input'
As opposed to inception v3, inception v1 does not have any decodeJpeg or resize nodes:
inception v3 nodes:
DecodeJpeg/contents
DecodeJpeg
Cast
ExpandDims/dim
ExpandDims
ResizeBilinear/size
ResizeBilinear
...
pool_3
pool_3/_reshape/shape
pool_3/_reshape
softmax/weights
softmax/biases
softmax/logits/MatMul
softmax/logits
softmax
inception v1 nodes:
input
conv2d0_w
conv2d0_b
conv2d1_w
conv2d1_b
conv2d2_w
conv2d2_b
...
softmax1_pre_activation
softmax1
avgpool0/reshape/shape
avgpool0/reshape
softmax2_pre_activation/matmul
softmax2_pre_activation
softmax2
output
output1
output2
so I guess the images have to be reshaped before being fed into the graph.
Right now the error occurs when hitting the following function:
def run_bottleneck_on_image(sess, image_data, image_data_tensor,
                            bottleneck_tensor):
  """Runs inference on an image to extract the 'bottleneck' summary layer.

  Args:
    sess: Current active TensorFlow Session.
    image_data: Numpy array of image data.
    image_data_tensor: Input data layer in the graph.
    bottleneck_tensor: Layer before the final softmax.

  Returns:
    Numpy array of bottleneck values.
  """
  bottleneck_values = sess.run(
      bottleneck_tensor,
      {image_data_tensor: image_data})
  bottleneck_values = np.squeeze(bottleneck_values)
  return bottleneck_values
Error:
TypeError: Cannot interpret feed_dict key as Tensor: Can not convert a Operation into a Tensor.
I guess the data on the input node of the inception v1 graph has to be reshaped to match the data after passing through the following nodes in inception v3:
DecodeJpeg/contents
DecodeJpeg
Cast
ExpandDims/dim
ExpandDims
ResizeBilinear/size
ResizeBilinear
If anyone has already managed to retrain the inception v1 model or has an idea how to reshape the data in inception v1 case to match inception v3, I would be very thankful for any tips or suggestions.
Not sure if you have solved this or not but I am working on a similar problem.
I am trying to use a different model (not Inception-v1 or Inception-v3) with the Inception-v3 transfer learning tutorial. This post seems to be on the right track of remapping the input of the new model (in your case inception-v1) to play nice with the jpeg encoding used in the rest of the tutorial:
feeding image data in tensorflow for transfer learning
The only problem I am having is an error in my input saying "Cannot convert a tensor of type uint8 to an input type of float32", but this may at least put you on the right track.
Good Luck!
(For the ones who are still interested)
The bottleneck tensor size should be 1024 for inception-v1. For me, the following setup works with the mentioned inception-v1 model and this retrain script. There is no need for a JPEG data tensor or anything else.
bottleneck_tensor_name = 'avgpool0/reshape:0'
bottleneck_tensor_size = 1024
input_width = 224
input_height = 224
input_depth = 3
resized_input_tensor_name = 'input:0'
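For completeness, a sketch of feeding an image with this setup (the file name is illustrative and sess comes from the surrounding script). Note that 'input' names an Operation while 'input:0' names its output Tensor, which is why feeding plain 'input' raised the TypeError above; and since inception-v1 has no DecodeJpeg/ResizeBilinear nodes, the image must be decoded and resized outside the graph:
import numpy as np
from PIL import Image

# Decode and resize outside the graph; inception-v1 expects an already
# resized float batch at 'input:0'.
img = Image.open('example.jpg').resize((224, 224))
image_data = np.asarray(img, dtype=np.float32)[np.newaxis, ...]  # [1, 224, 224, 3]

# Feed the Tensor 'input:0', not the Operation 'input'.
bottleneck_values = sess.run('avgpool0/reshape:0', {'input:0': image_data})
bottleneck_values = np.squeeze(bottleneck_values)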