How to reduce input size for a Mask R-CNN trained model while running prediction on Google Cloud Platform - TensorFlow

I am trying to use Google AI Platform prediction to perform object recognition using Mask R-CNN. After spending close to two weeks, I was able to:
- find out how to train on Google Cloud
- convert the model from h5 to the SavedModel format required by the AI Platform
- create AI Platform models and deploy the trained models there.
Now that I am trying to perform prediction, it says that my input exceeds 1.5 MB, the maximum allowed input size. When I checked, the code that converts the image (65 KB) to the format required for prediction produces a 57 MB input file.
I have no idea how a 65 KB image file can turn into a 57 MB JSON file when converted, and I want to know how I can reduce this. I am not sure if I am doing something wrong.
I have tried to perform local prediction using gcloud local predict, and I am able to get a response with the 57 MB file, so the file itself is correct.
I tried setting the maximum dimension of the image to 400x400, which reduced the file size from 57 MB to around 7 MB, which is still far too high. I cannot keep reducing it, as that leads to loss of information.

As per the online prediction documentation:
Binary data cannot be formatted as the UTF-8 encoded strings that JSON supports. If you have binary data in your inputs, you must use base64 encoding to represent it.
You need to have your input_image tensor called input_image_bytes, and you will send it data like so:
{'input_image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}
If you need help correcting your model's inputs, see def _encoded_image_string_tensor_input_placeholder() in exporter.py, called from export_inference_graph.py.
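For reference, here is a minimal sketch of building such a request, assuming the exported SavedModel has an input tensor named input_image_bytes (the _bytes suffix tells AI Platform to base64-decode the payload). Encoding the compressed JPEG bytes adds only about 33% overhead, instead of expanding every pixel into a JSON array of floats:

import base64
import json

# Read the raw (compressed) JPEG bytes.
with open('image.jpg', 'rb') as f:
    jpeg_data = f.read()

# base64 adds ~33% to the JPEG size, instead of blowing every
# pixel up into a JSON float array.
instance = {'input_image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}

# One instance per line, as expected by
# `gcloud ai-platform predict --json-instances=request.json`.
with open('request.json', 'w') as f:
    f.write(json.dumps(instance) + '\n')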

Related

VGG19 .h5 file modifying

I'm using a pretrained VGG19 in my modified neural style transfer code (Gatys' algorithm), but my PC doesn't allow me to use the input image at its original size (the original height is 2499 px, but with 20 GB of RAM I can use at most 1000 px).
From what I have read, the solution is to decrease batch_size. So my question is: how can I modify the VGG19 .h5 file to change the batch_size inside it? Or can I override its batch_size in my code?
Assuming the pretrained model was trained on ImageNet, the expected input size for a single sample is 224x224.
If you try to pass a larger input, it's possible your deep learning framework will reshape it into many images to be classified at once.
By resizing your input data to 224x224, you will run with a single image (a batch size of 1).
You could make a custom implementation of your model to take larger input sizes; however, sizing down to 224x224 generally gets good results, depending on the task.
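Here is a minimal sketch of loading an image at that resolution with Keras (the file name input.jpg is a placeholder):

import numpy as np
from tensorflow.keras.applications import vgg19
from tensorflow.keras.preprocessing import image

# Load and resize to the ImageNet-pretrained resolution.
img = image.load_img('input.jpg', target_size=(224, 224))
x = image.img_to_array(img)        # shape (224, 224, 3)
x = np.expand_dims(x, axis=0)      # shape (1, 224, 224, 3): batch size of 1
x = vgg19.preprocess_input(x)      # VGG-style channel mean subtraction

model = vgg19.VGG19(weights='imagenet', include_top=False)
features = model.predict(x)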

How to manage batches for model.provide_groundtruth

I'm trying to use the TensorFlow 2 Object Detection API with a custom multi-class dataset to train an SSD. I took as a base the example provided in the documentation: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb
My current problem appears when I start the fine-tuning:
InvalidArgumentError: The first dimension of paddings must be the rank
of inputs[2,2] [6] [Op:Pad]
That seems to be related to the model.provide_groundtruth call in train_step_fn. As mentioned, I took my data from a TensorFlow record, mapped it to a dataset, and divided it into batches using padded_batch(tf.data.TFRecordDataset). That seems to be the correct way to feed the images into training, but my problem now is the ground truth, because it is also converted to batches of shape [batch_size, num_detections, coordinate_bbox]. Is this the problem? Any ideas on how to fix this issue?
Thanks
P.S. I tried the approach of modifying the pipeline.config file and running model_main_tf2.py, as was done with TensorFlow 1, but this method is buggy.
Just to share with everyone, this resolved my issue: I was splitting the images and the ground truth into batches correctly, but I never converted my labels to one-hot vector encoding.
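For anyone hitting the same thing, here is a minimal sketch of the conversion (num_classes and the integer labels are placeholders); the one-hot tensors are what gets passed to model.provide_groundtruth:

import tensorflow as tf

num_classes = 3
groundtruth_classes = tf.constant([0, 2, 1])   # one integer class id per box

# Shape [num_boxes, num_classes]; the OD API expects one-hot class
# tensors, not raw integer ids.
one_hot_classes = tf.one_hot(groundtruth_classes, depth=num_classes)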

Data augmentation in Tensorflow using Estimator API and TFRecords dataset

I'm using TensorFlow 1.3's Estimator API to perform some image classification. Since I have a considerable amount of data, I gave TFRecords a go. I saved the file and can read the examples into a Dataset using a parser function inside the input_fn of the estimator model. So far so good.
The issue is when I want to do some image augmentation (rotating and shearing in this case).
1) I tried using tf.contrib.keras.preprocessing.image.random_shear and the like. It turns out Keras doesn't like the format of TF's shape ('Dimension'), and I can't cast it to a list because its arguments are the axis indexes, not the actual values.
2) Then I tried using tf.contrib.image.rotate and tf.contrib.image.transform with random values in my chosen range. This time I get the error NotFoundError: Op type not registered 'ImageProjectiveTransform' in binary running on MYPC. Make sure the Op and Kernel are registered in the binary running in this process., which is an open issue (https://github.com/tensorflow/tensorflow/issues/9672). At the moment I can't move away from Windows, so I would be very interested in possible alternatives.
3) I searched for a way to read the TFRecords, transform them to numpy arrays, and do the augmentation with other tools, but I can't find a way to do this from within the input_fn, where I can't access the session.
Thanks!
Have you tried using the function from the answer to this question: tensorflow: how to rotate an image for data augmentation?
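A common Windows-friendly workaround is a sketch along these lines (assuming TF 1.x): wrap scipy's rotation in tf.py_func so the unregistered ImageProjectiveTransform op is never needed:

import numpy as np
import scipy.ndimage
import tensorflow as tf

def _rotate(image):
    # Random angle in a chosen range, in degrees.
    angle = np.random.uniform(-15.0, 15.0)
    return scipy.ndimage.rotate(image, angle, reshape=False).astype(np.float32)

def augment(image, label):
    rotated = tf.py_func(_rotate, [image], tf.float32)
    rotated.set_shape(image.get_shape())   # py_func drops static shape info
    return rotated, label

# Inside input_fn: dataset = dataset.map(parser).map(augment)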

Deploy retrained Inception model on Google Cloud Machine Learning

I managed to retrain my specific classification model using the generic Inception model, following this tutorial. I would now like to deploy it on Google Cloud Machine Learning following these steps.
I already managed to export it as a MetaGraph, but I can't manage to get the proper inputs and outputs.
Using it locally, my entry point into the graph is DecodeJpeg/contents:0, which is fed a JPEG image in binary format. The outputs are my predictions.
The code I use locally (which is working) is:
softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})
Should the input tensor be DecodeJpeg? What changes would I need to make if I wanted to have a base64 image as input?
I defined the output as:
outputs = {'prediction':softmax_tensor.name}
Any help is highly appreciated.
In your example, the input tensor is 'DecodeJpeg/contents:0', so you would have something like:
inputs = {'image': 'DecodeJpeg/contents:0'}
outputs = {'prediction': 'final_result:0'}
(Be sure to follow all of the instructions for preparing a model).
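For context, a minimal sketch of how those maps were attached to the graph under the legacy Cloud ML export convention (treat the collection names as an assumption and defer to the official instructions):

import json
import tensorflow as tf

inputs = {'image': 'DecodeJpeg/contents:0'}
outputs = {'prediction': 'final_result:0'}

# The legacy Cloud ML export stored the tensor maps as JSON strings
# in graph collections before exporting the MetaGraph.
tf.add_to_collection('inputs', json.dumps(inputs))
tf.add_to_collection('outputs', json.dumps(outputs))

# Then export the graph and checkpoint to the deployment directory, e.g.:
# saver = tf.train.Saver()
# saver.export_meta_graph('gs://my_bucket/path/to/model/export.meta')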
The model directory you intend to export should have files such as:
gs://my_bucket/path/to/model/export.meta
gs://my_bucket/path/to/model/checkpoint*
When you deploy your model, be sure to set gs://my_bucket/path/to/model as the deployment_uri.
To send an image to the service, as you suggest, you will need to base64 encode the image bytes. The body of your request should look like the following (note the 'b64' key, indicating the data is base64 encoded):
{'instances': [{'b64': base64.b64encode(image).decode()}]}
We've now released a tutorial on how to retrain the Inception model, including instructions for how to deploy the model on the CloudML service.
https://cloud.google.com/blog/big-data/2016/12/how-to-train-and-classify-images-using-google-cloud-machine-learning-and-cloud-dataflow
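As a usage example, here is a hedged sketch of calling the deployed model from Python with the Google API client (the project and model names are placeholders):

import base64
from googleapiclient import discovery

service = discovery.build('ml', 'v1')
name = 'projects/my-project/models/my_model'

with open('image.jpg', 'rb') as f:
    body = {'instances': [{'b64': base64.b64encode(f.read()).decode()}]}

# Sends an online prediction request to the deployed model.
response = service.projects().predict(name=name, body=body).execute()
print(response['predictions'])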

textsum beam search decoder gives all <UNK> results

I have been testing textsum with both the binary data and the Gigaword data; I trained models and tested them. The beam search decoder gives me all '<UNK>' results with both sets of data and models. I was using the default parameter settings.
I first changed the data interface in data.py and batch_reader.py to read and parse the article and abstract from the Gigaword dataset. I trained a model for over 90K mini-batches on roughly 1.7 million documents. Then I tested the model on a different test set, but it returned all <UNK> results.
decoder result from model trained with gigaword
Then I used the binary data that comes along with the textsum code to train a small model for less than 1K mini-batches. I tested on the same binary data. It gives all <UNK> results in the decoding file except a few 'for's and '.'s.
decoder result from model trained with binary data
I also viewed the TensorBoard for the training loss, and it shows that training converged.
In training and testing, I didn't change any of the default settings.
Has anyone tried the same thing as I did and found the same issue?
I think I found why this is happening, at least with the given toy data set. In my case, I trained and tested with the same given toy set (the data & vocab files). The reason I'm getting [UNK]s in the decoder result is that the vocab file doesn't contain the words that appear in the summaries of the toy data set. Because of that, the decoder couldn't find the words to decode with, hence the [UNK]s in the final result.
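A quick way to verify this is to measure the out-of-vocabulary rate of the target summaries against the vocab file (a sketch; the file names and whitespace-token format are assumptions):

from collections import Counter

# textsum vocab files have one "<word> <count>" entry per line.
with open('vocab') as f:
    vocab = {line.split()[0] for line in f if line.strip()}

counts = Counter()
with open('abstracts.txt') as f:
    for line in f:
        for token in line.lower().split():
            counts['oov' if token not in vocab else 'known'] += 1

total = counts['known'] + counts['oov']
print('OOV rate: %.1f%%' % (100.0 * counts['oov'] / total))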