TF object detection: return subset of inference payload - tensorflow

Problem
I'm working on training and deploying an instance segmentation model using TF's object detection API. I'm able to successfully train the model, package it into a TF Serving Docker image (latest tag as of Oct 2020), and process inference requests via the REST interface. However, the amount of data returned from an inference request is very large (hundreds of Mb). This is a big problem when the inference request and processing don't happen on the same machine because all that returned data has to go over the network.
Is there a way to trim down the number of outputs (either during model export or within the TF Serving image) so allow faster round trip times during inference?
Details
I'm using TF OD API (with TF2) to train a Mask RCNN model, which is a modified version of this config. I believe the full list of outputs is described in code here. The list of items I get during inference is also pasted below. For a model with 100 object proposals, that information is ~270 Mb if I just write the returned inference as json to disk.
inference_payload['outputs'].keys()
dict_keys(['detection_masks', 'rpn_features_to_crop', 'detection_anchor_indices', 'refined_box_encodings', 'final_anchors', 'mask_predictions', 'detection_classes', 'num_detections', 'rpn_box_predictor_features', 'class_predictions_with_background', 'proposal_boxes', 'raw_detection_boxes', 'rpn_box_encodings', 'box_classifier_features', 'raw_detection_scores', 'proposal_boxes_normalized', 'detection_multiclass_scores', 'anchors', 'num_proposals', 'detection_boxes', 'image_shape', 'rpn_objectness_predictions_with_background', 'detection_scores'])
I already encode the images within my inference requests as base64, so the request payload is not too large when going over the network. It's just that the inference response is gigantic in comparison. I only need 4 or 5 of the items out of this response, so it'd be great to exclude the rest and avoid passing such a large package of bits over the network.
Things I've tried
I've tried setting the score_threshold to a higher value during the export (code example here) to reduce the number of outputs. However, this seems to just threshold the detection_scores. All the extraneous inference information is still returned.
I also tried just manually excluding some of these inference outputs by adding the names of keys to remove here. That also didn't seem to have any effect, and I'm worried this is a bad idea because some of those keys might be needed during scoring/evaluation.
I also searched here and on tensorflow/models repo, but I wasn't able to find anything.

I was able to find a hacky workaround. In the export process (here), some of the components of the prediction dict are deleted. I added additional items to the non_tensor_predictions list, which contains all keys that will get removed during the postprocess step. Augmenting this list cut down my inference outputs from ~200MB to ~12MB.
Full code for the if self._number_of_stages == 3 block:
if self._number_of_stages == 3:
non_tensor_predictions = [
k for k, v in prediction_dict.items() if not isinstance(v, tf.Tensor)]
# Add additional keys to delete during postprocessing
non_tensor_predictions = non_tensor_predictions + ['raw_detection_scores', 'detection_multiclass_scores', 'anchors', 'rpn_objectness_predictions_with_background', 'detection_anchor_indices', 'refined_box_encodings', 'class_predictions_with_background', 'raw_detection_boxes', 'final_anchors', 'rpn_box_encodings', 'box_classifier_features']
for k in non_tensor_predictions:
tf.logging.info('Removing {0} from prediction_dict'.format(k))
prediction_dict.pop(k)
return prediction_dict
I think there's a more "proper" way to deal with this using signature definitions during the creation of the TF Serving image, but this worked for a quick and dirty fix.

I've ran into the same problem. In the exporter_main_v2 code there is stated that the outputs should be:
and the following output nodes returned by the model.postprocess(..):
* `num_detections`: Outputs float32 tensors of the form [batch]
that specifies the number of valid boxes per image in the batch.
* `detection_boxes`: Outputs float32 tensors of the form
[batch, num_boxes, 4] containing detected boxes.
* `detection_scores`: Outputs float32 tensors of the form
[batch, num_boxes] containing class scores for the detections.
* `detection_classes`: Outputs float32 tensors of the form
[batch, num_boxes] containing classes for the detections.
I've submitted an issue on the tensorflow object detection github repo, I hope we will get feedback from the tensorflow dev team.
The github issue can be found here

If you are using exporter_main_v2.py file to export your model, you can try this hack way to solve this problem.
Just add following codes in the function _run_inference_on_images of exporter_lib_v2.py file:
detections[classes_field] = (
tf.cast(detections[classes_field], tf.float32) + label_id_offset)
############# START ##########
ignored_model_output_names = ["raw_detection_boxes", "raw_detection_scores"]
for key in ignored_model_output_names:
if key in detections.keys(): del detections[key]
############# END ##########
for key, val in detections.items():
detections[key] = tf.cast(val, tf.float32)
Therefore, the generated model will not output the values of ignored_model_output_names.
Please let me know if this can solve your problem.

Another approach would be to alter the signatures of the saved model:
model = tf.saved_model.load(path.join("models", "efficientdet_d7_coco17_tpu-32", "saved_model"))
infer = model.signatures["serving_default"]
outputs = infer.structured_outputs
for o in ["raw_detection_boxes", "raw_detection_scores"]:
outputs.pop(o)
tf.saved_model.save(
model,
export_dir="export",
signatures={"serving_default" : infer},
options=None
)

Related

How to batch an object detection dataset?

I am working on implementing a face detection model on the wider face dataset. I learned it was built into Tensorflow datasets and I am using it.
However, I am facing an issue while batching the data. Since, an Image can have multiple faces, therefore the number of bounding boxes output are different for each Image. For example, an Image with 2 faces will have 2 bounding box, whereas one with 4 will have 4 and so on.
But the problem is, these unequal number of bounding boxes is causing each of the Dataset object tensors to be of different shapes. And in TensorFlow afaik we cannot batch tensors of unequal shapes ( source - Tensorflow Datasets: Make batches with different shaped data). So I am unable to batch the dataset.
So after loading the following code and batching -
ds,info = tfds.load('wider_face', split='train', shuffle_files=True, with_info= True)
ds1 = ds.batch(12)
for step, (x,y,z) in enumerate(ds1) :
print(step)
break
I am getting this kind of error on run Link to Error Image
In general any help on how can I batch the Tensorflow object detection datasets will be very helpfull.
It might be a bit late but I thought I should post this anyways. The padded_batch feature ought to do the trick here. It kind of goes around the issue by matching dimension via padding zeros
ds,info = tfds.load('wider_face', split='train', shuffle_files=True, with_info= True)
ds1 = ds.padded_batch(12)
for step, (x,y,z) in enumerate(ds1) :
print(step)
break
Another solution would be to process not use batch and process with custom buffers with for loops but that kind of defeats the purpose. Just for posterity I'll add the sample code here as an example of a simple workaround.
ds,info = tfds.load('wider_face', split='train', shuffle_files=True, with_info= True)
batch_size = 12
image_annotations_pair = [x['image'], x['faces']['bbox'] for n, x in enumerate(ds) if n < batch_size]
Then use a train_step modified for this.
For details one may refer to - https://www.kite.com/python/docs/tensorflow.contrib.autograph.operators.control_flow.dataset_ops.DatasetV2.padded_batch

Non Max Suppression settings and postprocessing for EfficientDet

I've downloaded and installed the Tensorflow Object Detection API and downloaded one of the EfficientDet models. As I want to do some work on the raw scores directly before Non-Max Suppression reduces it to class output, my first goal was to try and get the same final outputs from the raw scores, using the downloaded model config as a guide.
post_processing {
batch_non_max_suppression {
score_threshold: 9.99999993922529e-09
iou_threshold: 0.5
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
As the Object Detection API has no score converter method under postprocessing, I'm not sure what this does, but the only batch NMS method in utils seems to be batch_multiclass_non_max_suppression.
So, having fed an image into the network and got an output detections, to try and replicate its results:
result = post_processing.batch_multiclass_non_max_suppression(tf.expand_dims(detections['raw_detection_boxes'], 2), detections['raw_detection_scores'], 9.99999993922529e-09, 0.5, 100, max_total_size=100)
detections['detection_boxes'] = result[0]
detections['detection_scores'] = result[1]
detections['detection_classes'] = result[2]
i.e., substitute the relevant scores in the detections with the output of NMS, and insert the dimension needed for the batch function to work. This is then visualised following per the TensorFlow Hub colab.
The problem is that whilst the input image (this from the MSCOCO dataset) should produce this:
It instead produces this:
The bounding boxes are all (seemingly) shifted upwards and the categories are simply off, which suggests there's more processing being done between the raw scores, NMS, and output, but it's entirely unclear what. The scores are correct, so it appears to be pruning correctly.
Edit: I suspect, after looking at the SSD model template, that the problem with the misaligned bounding boxes is because I'm not passing the resized image dimensions along to NMS, which is generated by the preprocessing step, which should be easy enough to address via generating the image resize function. However, after applying the slice operation to remove a background class doesn't address the incorrect labels:
Instead, it seems to have lost the person class entirely--this makes sense; it isn't configured to include a background class of any sort and if Person (id 1) is instead coming out as index 0, then this would cut them off.
EDIT 2: I looked at the original meta-architecture further and copied the image-resizing function, i.e.:
from object_detection.protos import image_resizer_pb2
from object_detection.utils import config_util as c
from object_detection.utils import shape_utils
config = c.get_configs_from_pipeline_file(r"C:\Users\Person\.keras\datasets\efficientdet_d7_coco17_tpu-32\pipeline.config")
image_config = c.get_image_resizer_config(config['model'])
resize = image_resizer_builder.build(image_config)
def compute_clip_window(preprocessed_images, true_image_shapes):
# identical to the meta-arch definition
# image resizing
im = tf.cast(input_tensor, tf.float32)
channel_offset = [0.485, 0.456, 0.406]
channel_scale = [0.229, 0.224, 0.225]
im = ((im / 255.0) - [[channel_offset]]) / [[channel_scale]]
resized = shape_utils.resize_images_and_return_shapes(im, resize)
clip = compute_clip_window(resized[0], resized[1])
Therefore allowing the clip argument to be supplied to NMS. However, this doesn't change anything, and it still returns the same mis-aligned boxes as the second image. This is incredibly confusing, as this seems like it should replicate everything the model needs in both the preprocessing and postprocessing steps to generate its own output: the image is normalized and resized; the true image size is retained alongside the resized image; no further processing of the raw boxes or raw scores happens before they get passed to the NMS (the returned versions of the raw values are identical to the values passed to NMS except with one dimension and the model itself doesn't interfere with the post-processing at all--and the call signature calls preprocessing, prediction, and postprocessing in turn, so nothing else should be happening in the interim.
Edit 3: I added another line was added (to no effect)--setting the multiclass scores in the NMS additional fields to the detection scores with backgrounds (i.e., the raw scores). By adding +1 to all the label classes, I got the following image:
Whilst this is correct, this only corrects for the earlier parts of the dataset, i.e. where the only empty class is the 0th. It still appears that there must be some mapping step I'm not following, alongside whatever is causing the image misalignment.
The easiest solution in my case was to load the model from the checkpoint and configs, rather than use the saved model directly, in order to access the original preprocess, predict, and postprocess methods, rather than having a single function call.

When forward using MXNet, how to do with varying 'batch size' in data_shapes?

Hi,I have a question that, how can I make predict with unfixed input data? I will try to describe in detail clear:
I use MTCNN for face detection(it's ok unfamiliar with that), and it employs 3 networks: PNet, RNet, ONet. PNet detects a mass of proposal face bounding boxes, then these boxes are coarse-to-fine by the rest net one after another, finally get precise face bbox(s). When taking an image as input to PNet, image's size is unfixed, and the output proposal box number from PNet is also unfixed, so as RNet, ONet. Reference to another MTCNN code I set a large data_shapes(e.g., image size, batch size) when I bind the module, and initialize all to zero,then make predict. That works though, Isn't that a redundant calculation? (Question 1)
PNet:
max_img_w=1000
max_img_h=1000
sym, arg_params, aux_params = mx.model.load_checkpoint(‘det1’, 0)
self.PNets = mx.mod.Module(symbol=sym, context=ctx,label_names=None)
self.PNets.bind(data_shapes=[(‘data’, (1, 3, max_img_w, max_img_h))],for_training=False)
self.PNets.set_params(arg_params,aux_params)
RNet
sym, arg_params, aux_params = mx.model.load_checkpoint(‘det2’, 0)
self.RNet = mx.mod.Module(symbol=sym, context=ctx,label_names=None)
self.RNet.bind(data_shapes=[(‘data’, (2048,3, 24, 24))],for_training=False)
self.RNet.set_params(arg_params,aux_params,allow_missing=True)
ONet
sym, arg_params, aux_params = mx.model.load_checkpoint(‘det3’, 0)
self.ONet = mx.mod.Module(symbol=sym, context=ctx,label_names=None)
self.ONet.bind(data_shapes=[(‘data’, (256, 3, 48, 48))],for_training=False)
self.ONet.set_params(arg_params,aux_params,allow_missing=True)
And I try mx.mod.Module.reshape before predict, which will adjust data'shape according to last network's output, but I get this error:(Question 2)
AssertionError: Shape of unspecified array arg:prob1_label changed. This can cause the new executor to not share parameters with the old one. Please check for error in the network. If this is intended, set partial_shaping=True to suppress this warning.
One more thing is that The MTCNN code (https://github.com/pangyupo/mxnet_mtcnn_face_detection) primary use deprecated function to load models:
self.PNet = mx.model.FeedForward.load(‘det1’,0)
One single line to work with arbitrary data_shapes, why this function be deprecated..?(Question 3)
I found a little difference that after load model, FeedFroward takes 0MB memory before make one predict, but mx.mod.Module takes up memory once loaded, and increase obviously after making one prediction.
You can use MXNet imperative API Gluon and that will let you use different batch-sizes.
If like in this case, your model was trained using the symbolic API or has been exported in the serialized MXNet format ('-0001.params', '-symbol.json' for e.g), you can load it in Gluon that way:
ctx = mx.cpu()
sym = mx.sym.load_json(open('det1-symbol.json', 'r').read())
PNet = gluon.nn.SymbolBlock(outputs=sym, inputs=mx.sym.var('data'))
PNet.load_params('det1-0001.params', ctx=ctx)
Then you can use it the following way:
# a given batch size (1)
data1 = mx.nd.ones((1, C, W, H))
output1 = PNet(data1)
# a different batch size (5)
data2 = mx.nd.ones((5, C, W, H))
output2 = PNet(data2)
And it would work.
You can get started with MXNet Gluon with the official 60 minutes crash course

Tensorflow word2vec InvalidArgumentError: Assign requires shapes of both tensors to match

I am using this code to train a word2vec model. I am trying to train it incrementally, with using saver.restore(). I am using new data after restoring the model. Since vocabulary size for the old data and new data are not the same, I got an exception like this:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [28908,200] rhs shape= [71291,200]
Here 71291 is vocabulary size for the old data and 28908 is for new data.
It gets the vocabulary words from the train_data file here, and constructs the network model using size of the vocabulary. I thought that if I could set vocabulary size the same for my old data and new data, I can solve this problem.
So, my question is: Can I do that in this code? As far as I understand, I cannot reach skipgram_word2vec() function.
Or, is there any other way of solving this issue in this code beside what I thought? If it is not possible using this code, I will try other ways for my purpose.
Any help is appreciated.
Having taken a look at the source of word2vec_optimized.py I'd say you will need to change the code there. It operates by opening a text file right up front as "training data". For your purposes, you have to change the build_graph method and allow it to get an option to set all that data ( words, counts, words_per_epoch, current_epoch, total_words_processed, examples, labels, opts.vocab_words, opts.vocab_counts, opts.words_per_epoch ) when initializing, and not from a text file.
Then you need to merge the two text files, and load them once, to produce the vocabulary. Then save all the data above, and use that to restore the network at each subsequent run.
If you use more than 2 texts, you need to include all the text you plan to use in the first data to produce the vocabulary, however.

Incorporating very large constants in Tensorflow

For example, the comments for the Tensorflow image captioning example model state:
NOTE: This script will consume around 100GB of disk space because each image
in the MSCOCO dataset is replicated ~5 times (once per caption) in the output.
This is done for two reasons:
1. In order to better shuffle the training data.
2. It makes it easier to perform asynchronous preprocessing of each image in
TensorFlow.
The primary goal of this question is to see if there is an alternative to this type of duplication. In my use case, storing the data in this way would require each image to be duplicated in the TFRecord files many more times, on the order of 20 - 50 times.
I should note first that I have already fed the images through VGGnet to extract 4096 dim features, and I have these stored as a mapping between filename and the vectors.
Before switching over to Tensorflow, I had been feeding batches containing filename strings and then looking up the corresponding vector on a per-batch basis. This allows me to store all of the image data in ~15GB without needing to duplicate the data on disk.
My first attempt to do this in in Tensorflow involved storing indices in the TFExample buffers and then doing a "preprocessing" step to slice into the corresponding matrix:
img_feat = pd.read_pickle("img_feats.pkl")
img_matrix = np.stack(img_feat)
preloaded_images = tf.Variable(img_matrix)
first_image = tf.slice(preloaded_images, [0,0], [1,4096])
However, in this case, Tensorflow disallows a variable larger than 2GB. So my next thought was to partition this across several variables:
img_tensors = []
for i in range(NUM_SPLITS):
with tf.Graph().as_default():
img_tensors.append(tf.Variable(img_matrices[i], name="preloaded_images_%i"%i))
first_image = tf.concat(1, [tf.slice(t, [0,0], [1,4096//NUM_SPLITS]) for t in img_tensors])
In this case, I'm forced to store each partition on a separate graph, because it seems any one graph cannot be this large either. However, now the concat fails because each tensor I am concatenating is on a separate graph.
Any advice on incorporating a large amount (~15GB) of preloaded into the Tensorflow graph.
Potentially related is this question; however in this case I'd like to override the decoding of the actual JPEG file with the preprocessed value in a tensor op.