Generate SavedModel from TensorFlow model to serve it on Google Cloud ML

I used TF Hub to retrain a model for image classification. Now I would like to serve it in the cloud, and for that I need a SavedModel. The retrain.py script from TF Hub uses tf.saved_model.simple_save to generate the SavedModel after training is done.
What confuses me is that the .pb file inside the SavedModel folder I get from that method is much smaller than the final .pb saved after training.
simple_save is also now deprecated, so I tried to build my SavedModel after training following this SO issue.
But my variables folder is empty. How can I incorporate the building of the SavedModel into retrain.py to replace the simple_save method? Tips would be much appreciated.

To deploy your model to Google Cloud ML, you need a SavedModel, which can be produced with the tf.saved_model API.
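Since simple_save is deprecated, one lower-level way to produce a SavedModel (and get a populated variables folder) is the SavedModelBuilder API. A minimal sketch, assuming a TF1-style session and hypothetical tensor names (retrain.py's real input/output tensors may differ):

import tensorflow as tf

def export_saved_model(sess, export_dir, input_tensor, output_tensor):
    # Map logical signature names to the graph's actual tensors.
    signature = tf.compat.v1.saved_model.predict_signature_def(
        inputs={'image': input_tensor},
        outputs={'prediction': output_tensor})
    builder = tf.compat.v1.saved_model.Builder(export_dir)
    # add_meta_graph_and_variables snapshots the live session's variables,
    # which is what writes the variables/ folder.
    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.compat.v1.saved_model.tag_constants.SERVING],
        signature_def_map={'serving_default': signature})
    builder.save()

If the variables folder comes out empty, the usual cause is exporting from a session other than the one that holds the trained variables.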
Below are the steps for hosting your trained model in the cloud with Cloud ML Engine.
Upload your SavedModel to a Cloud Storage bucket. First set a bucket name: BUCKET_NAME="your_bucket_name"
Select a region for your bucket and set a REGION environment variable:
REGION=us-central1
Create a new bucket: gsutil mb -l $REGION gs://$BUCKET_NAME
Upload using
SAVED_MODEL_DIR=./your-export-dir-base/$(ls ./your-export-dir-base | tail -1)
gsutil cp -r $SAVED_MODEL_DIR gs://$BUCKET_NAME
Create a Cloud ML Engine model resource and model version.
As for your question on incorporating the SavedModel into retrain.py: you can pass the module directory as the --tfhub_module argument, like so:
python retrain.py --image_dir <path to your image folder> --tfhub_module <path to the saved model directory>

Related

How to save a TensorFlow checkpoint to S3 on SageMaker? It throws OSError: Unable to create file

I'm using Amazon SageMaker Studio to train my custom TensorFlow model. The problem is that when I run model.fit(callbacks=[...], ...), it won't save checkpoints to AWS S3 even though the bucket exists and the training data is retrieved from it. It throws
OSError: Unable to create file (unable to open file: name = 's3://mybucket/01-0.70.hdf5', errno = 2, error message = 'No such file or directory', flags = 13, o_flags = 242)
How can I make it work?
The container I'm using is python==3.8, tensorflow==2.6, CPU Optimized.
The relevant discussion is found here:
https://github.com/tensorflow/tensorflow/issues/13796
TensorFlow is probably not able to treat S3 URIs as files. Indeed, S3 is not a file system, so most libraries and APIs written for POSIX-compliant file interaction will fail to read an S3 URI. You have two options:
If you want to stay in SageMaker Studio, write your checkpoint locally, then copy it to S3 afterwards with aws s3 cp or boto3's upload_file (see the sketch below)
Recommended option: do not train in a notebook; instead run an asynchronous SageMaker Training job. This has multiple benefits from a cost, scalability, reproducibility and instrumentation standpoint. SageMaker Training has a checkpoint path connected to the S3 bucket of your choice. I suggest you write checkpoints to /opt/ml/checkpoints/, and in your SDK call set checkpoint_s3_uri=s3://mybucket/<add a job ID to have only files from this job>
Read more about SageMaker managed checkpointing here: https://docs.aws.amazon.com/sagemaker/latest/dg/model-checkpoints.html
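For option 1, a minimal sketch of a Keras callback that mirrors locally written checkpoints to S3 with boto3 (bucket name and paths below are placeholders):

import os
import boto3
import tensorflow as tf

class S3CheckpointSync(tf.keras.callbacks.Callback):
    """Uploads locally written checkpoint files to S3 after each epoch."""
    def __init__(self, local_dir, bucket, prefix):
        super().__init__()
        self.local_dir = local_dir
        self.bucket = bucket
        self.prefix = prefix
        self.s3 = boto3.client('s3')

    def on_epoch_end(self, epoch, logs=None):
        # Mirror whatever ModelCheckpoint has written so far to S3.
        for fname in os.listdir(self.local_dir):
            self.s3.upload_file(os.path.join(self.local_dir, fname),
                                self.bucket, self.prefix + '/' + fname)

Usage: point ModelCheckpoint at a local path such as /tmp/ckpt/{epoch:02d}-{val_loss:.2f}.hdf5 and pass S3CheckpointSync('/tmp/ckpt', 'mybucket', 'checkpoints') in the same callbacks list.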

[TF1.14][TPU] Cannot use custom TFRecord dataset on Colab using TPU

I have created a TFRecord dataset file consisting of elements and their corresponding labels. I want to use it for training a model on Colab using the free TPU. I can load the TFRecord file and even run an iterator just to see the contents; however, before the beginning of the epoch it throws the following error:
UnimplementedError: From /job:worker/replica:0/task:0:
File system scheme '[local]' not implemented (file: '/content/gdrive/My Drive/data/encodeddata_inGZIP.tfrecord')
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
[[RemoteCall]]
[[IteratorGetNextAsOptional_1]]
In my understanding, it wants the TFRecord file in a bucket the TPU can access, but I don't know how to do that on Colab. How can one use a TFRecord file directly with a Colab TPU?
You need to host it on Google Cloud Storage:
All input files and the model directory must use a cloud storage bucket path (gs://bucket-name/...), and this bucket must be accessible from the TPU server. Note that all data processing and model checkpointing is performed on the TPU server, not the local machine.
As mentioned on Google's troubleshooting page: https://cloud.google.com/tpu/docs/troubleshooting#cannot_use_local_filesystem
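In practice that means authenticating the Colab runtime against GCP, copying the file to a GCS bucket you own, and pointing the input pipeline at the gs:// path. A sketch (bucket and object names are placeholders):

from google.colab import auth
import tensorflow as tf

# Give the Colab runtime credentials that can read your bucket.
auth.authenticate_user()

# One-time copy from Drive to GCS, run in a notebook cell:
# !gsutil cp '/content/gdrive/My Drive/data/encodeddata_inGZIP.tfrecord' gs://your-bucket/data/

# TPU workers can read gs:// paths directly; compression_type matches
# the GZIP-compressed file from the question.
dataset = tf.data.TFRecordDataset(
    'gs://your-bucket/data/encodeddata_inGZIP.tfrecord',
    compression_type='GZIP')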
Hope this helps!

Deploying a trained model

I have trained my neural style transfer model and got .ckpt files after training. Now I want to deploy this model using TensorFlow Serving. How can I proceed?
Install Docker and pull the TensorFlow Serving Docker image:
$ docker pull tensorflow/serving
Copy your SavedModel to the container's model folder (note that TF Serving loads SavedModels, not raw .ckpt checkpoints; see the conversion sketch at the end of this answer):
$ docker cp models/ serving_base:/models/
Follow the instructions from https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/docker.md and you should be able to run the serving image to host your model.
See the link below for more details:
https://www.tensorflow.org/tfx/serving/docker
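Since TF Serving loads SavedModels rather than raw checkpoints, you first need to restore the .ckpt files and re-export. A rough sketch, assuming a TF1 graph and hypothetical tensor names (substitute your network's real input/output names):

import tensorflow as tf

with tf.compat.v1.Session() as sess:
    # Rebuild the graph from the checkpoint's .meta file and restore the weights.
    saver = tf.compat.v1.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    graph = tf.compat.v1.get_default_graph()
    inp = graph.get_tensor_by_name('input:0')    # hypothetical name
    out = graph.get_tensor_by_name('output:0')   # hypothetical name
    # Export into the versioned directory layout TF Serving expects (models/<name>/1).
    builder = tf.compat.v1.saved_model.Builder('models/style_transfer/1')
    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.compat.v1.saved_model.tag_constants.SERVING],
        signature_def_map={
            'serving_default': tf.compat.v1.saved_model.predict_signature_def(
                inputs={'image': inp}, outputs={'stylized': out})})
    builder.save()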

Google Cloud ML Engine: Trouble With Local Prediction Given saved_model.pb

I've trained a Keras model using the tf.data.Dataset API and am trying to see if I've saved it (as saved_model.pb) correctly, so I can use it on ML Engine. Here's what I've done:
estimator = tf.keras.estimator.model_to_estimator(my_model)
# create serving function...
estimator.export_savedmodel('./export', serving_fn)
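(The elided serving function is usually a serving_input_receiver_fn; a typical TF1-style sketch, where the feature name and shape are hypothetical and must match the Keras model's input:

import tensorflow as tf

def serving_fn():
    # The key must match the Keras model's input layer name.
    inputs = {'input_1': tf.compat.v1.placeholder(
        tf.float32, shape=[None, 224, 224, 3])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
)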
So now I'm trying to use gcloud ml-engine local predict to see if I can get a prediction back. I'm doing:
gcloud ml-engine local predict --model-dir=~/path/to/folder --json-instances=instances.json
Unfortunately, I get:
cloud.ml.prediction.prediction_utils.PredictionError: Failed to load model: Cloud ML only supports TF 1.0 or above and models saved in SavedModel format. (Error code: 0)
So then I try adding --runtime-version=1.2 to my command like this:
gcloud ml-engine local predict --model-dir=~/path/to/folder --json-instances=instances.json --runtime-version=1.2
and I get back:
ERROR: (gcloud.ml-engine.local.predict) unrecognized arguments: --runtime-version=1.2
Any idea what I'm doing incorrectly / how to fix?
Thanks!
For posterity: the problem turned out to be an incorrect path. If anybody else encounters this issue, try using the full absolute path and ensure you are pointing to the directory containing the saved_model.pb file.
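A quick sanity check before running local predict is to load the export back in Python (a sketch; the path is a hypothetical absolute export directory):

import os
import tensorflow as tf

export_dir = '/home/user/export/1234567890'  # hypothetical timestamped export dir
assert os.path.isfile(os.path.join(export_dir, 'saved_model.pb'))

with tf.compat.v1.Session() as sess:
    # Loads the graph and variables under the 'serve' tag; raises if malformed.
    tf.compat.v1.saved_model.loader.load(
        sess, [tf.compat.v1.saved_model.tag_constants.SERVING], export_dir)

The saved_model_cli show --dir <export_dir> --all command performs a similar inspection from the shell.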

Google Cloud ML Engine can't locate local TFRecords

I am trying to use Google Cloud ML Engine to optimize hyperparameters for my variational autoencoder model, but the job fails because the .tfrecord files I specify for my input are not found. In my model code, I pass train.tfrecords to my input tensor as in the canonical cifar10 example and specify the location of train.tfrecords with the full path.
Relevant information:
JOB_DIR points to the trainer directory
This image shows my directory structure
My setup.py file is below:
from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['tensorflow==1.3.0', 'opencv-python']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My trainer application package.'
)
When the job executes, it will not be able to read data from your local machine. The easiest way to make TFRecord files available to your job is to copy them to GCS, pass the GCS locations to your program as flags, and use those flags to configure your readers and writers.
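A sketch of that pattern (the flag name and bucket path are placeholders; note that tf.data.TFRecordDataset assumes TF >= 1.4, while this setup.py pins 1.3, where the equivalent lives in tf.contrib.data):

import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument('--train-files',
                    default='gs://your-bucket/data/train.tfrecords',
                    help='GCS path to the training TFRecords')
args, _ = parser.parse_known_args()

# TensorFlow reads gs:// paths natively, both locally and on ML Engine.
dataset = tf.data.TFRecordDataset(args.train_files)

When submitting with gcloud ml-engine jobs submit training, user flags like --train-files go after the bare -- separator so they reach your trainer instead of gcloud.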