Method to load image data for TPU - tensorflow

So currently both keras.ImageDataGenerator and tf.data.Dataset are not working with tensorflow TPU, Is there any Other method to load image data for training model on Tpu ?

You can use GCS buckets.
Or you read from files:
with open(image_path, "rb") as local_file:
img = local_file.read()
And then convert to tensor and create dataset with tf.data.Dataset.from_tensor_slices.

Related

How to deploy custom tensorflow model to web?

so im facing a problem about deployment my custom sign-language recognition model. I converted my_ssd_mobnet with exporter_main_v2.py to saved_model.pb and then i tried to use the tensorflowjs convertor with this code:
from tensorflow import keras
import tensorflowjs as tfjs
def importModel(modelPath):
model = tf.keras.models.load_model(modelPath)
tfjs.converters.save_tf_model(model, "tfjsmodel")
importModel("saved_model")
#importModel("modelDirectory")
then i got an error like this..
ValueError: Unable to create a Keras model from this SavedModel. This SavedModel was created with tf.saved_model.save, and lacks the Keras metadata.Please save your Keras model by calling model.saveor tf.keras.models.save_model.
Finally i decide to convert my model to h5, but.. i don't know how.
How can i convert my_ssd_mobnet model to h5?
Thanks!
If you're creating a custom Keras layer in python and wanting to export it to tfjs for the browser to predict, then you'll most likely encounter "Unknown layer" and will have to implement them yourself in JS.
Instead of exporting the layers, it's best to export a graph since you're only using it for prediction and not training in the browser.
tf.saved_model.save(model, 'saved_model')
This will save the files in the saved_model folder and contains the .pb file.
Use the tensorflowjs_converter tool to convert the model into a graph tfjs model.
tensorflow_converter --input_format=tf_saved_model saved_model model
This will convert your saved model into the browser-compatible tfjs model without the custom layer. (The Keras layers will be built in.)
Move this folder to your website's public folder.
In the browser:
const model = await tf.loadGraphModel('/model/model.json')
const img = tf.browser.fromPixels(imageData, 3) // imageElement, videoElement, ImageData
.toFloat().resizeBilinear([224, 224]) // mobilenet dims
.div(tf.scalar(255)) // mobilenet [0,1] normalization
.expandDims()
const { values, indices } = model.predict(img).topk()
const label = indices.dataSync()[0]
const confidence = values.dataSync()[0]
NOTE: The .bin files will end up in the 10's of MB so put this inside a webworker. You can send a buffered data from the main thread to the worker thread for processing.
First and foremost, if you have used "exporter_main_v2.py" script to export the model, you will only get the model format in tensorflow model. This way of exporting is mainly used to make inference on the trained model. So the main problem in your code is that you are trying to import a "keras model" with that tf.keras.models.load_model() function. Instead of using "exporter_main_v2.py" you have to use tf.keras.models.save_model() function to export/save your model.
I am also giving you a simple video explanation link to clarify a few things for you
https://www.youtube.com/watch?v=Lx7OCFXPG8o
After watching the video you might want to checkout the following colab notebook
https://colab.research.google.com/github/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l07c01_saving_and_loading_models.ipynb
This is a material provided by Udacity from its introduction to tensorflow training course. That should be very helpful in your case to understand the difference between tensorflow model file and keras model file.
Have a nice day.
Edit:
HDF5 format
Keras provides a basic save format using the HDF5 standard.
Create and train a new model instance.
model = create_model()
model.fit(train_images, train_labels, epochs=5)
Save the entire model to a HDF5 file.
The '.h5' extension indicates that the model should be saved to HDF5.
model.save('my_model.h5')
You should add '.h5' extension to filename when calling model.save function, by this way the model will be saved in h5 format.

Best way to process terabytes of data on gcloud ml-engine with keras

I want to train a model on about 2TB of image data on gcloud storage. I saved the image data as separate tfrecords and tried to use the tensorflow data api following this example
https://medium.com/#moritzkrger/speeding-up-keras-with-tfrecord-datasets-5464f9836c36
But it seems like keras' model.fit(...) doesn't support validation for tfrecord datasets based on
https://github.com/keras-team/keras/pull/8388
Is there a better approach for processing large amounts of data with keras from ml-engine that I'm missing?
Thanks a lot!
If you are willing to use tf.keras instead of actual Keras, you can instantiate a TFRecordDataset with the tf.data API and pass that directly to model.fit(). Bonus: you get to stream directly from Google Cloud storage, no need to download the data first:
# Construct a TFRecordDataset
ds_train tf.data.TFRecordDataset('gs://') # path to TFRecords on GCS
ds_train = ds_train.shuffle(1000).batch(32)
model.fit(ds_train)
To include validation data, create a TFRecordDataset with your validation TFRecords and pass that one to the validation_data argument of model.fit(). Note: this is possible as of TensorFlow 1.9.
Final note: you'll need to specify the steps_per_epoch argument. A hack that I use to know the total number of examples in all TFRecordfiles, is to simply iterate over the files and count:
import tensorflow as tf
def n_records(record_list):
"""Get the total number of records in a collection of TFRecords.
Since a TFRecord file is intended to act as a stream of data,
this needs to be done naively by iterating over the file and counting.
See https://stackoverflow.com/questions/40472139
Args:
record_list (list): list of GCS paths to TFRecords files
"""
counter = 0
for f in record_list:
counter +=\
sum(1 for _ in tf.python_io.tf_record_iterator(f))
return counter
Which you can use to compute steps_per_epoch:
n_train = n_records([gs://path-to-tfrecords/record1,
gs://path-to-tfrecords/record2])
steps_per_epoch = n_train // batch_size

Tensorflow Dataset API: input pipeline with parquet files

I am trying to design an input pipeline with Dataset API. I am working with parquet files. What is a good way to add them to my pipeline?
We have released Petastorm, an open source library that allows you to use Apache Parquet files directly via Tensorflow Dataset API.
Here is a small example:
with Reader('hdfs://.../some/hdfs/path') as reader:
dataset = make_petastorm_dataset(reader)
iterator = dataset.make_one_shot_iterator()
tensor = iterator.get_next()
with tf.Session() as sess:
sample = sess.run(tensor)
print(sample.id)

Tensorflow Keras GCP ML engine model serving

I'm working on an image classifier with tensorflow estimator + keras retraining the last layer of a pretrained application inception_v3 on GCP ML engine.
The keras model is exported with tf.keras.estimator.model_to_estimator and the input function receive the path of the image stored on GCP cloud storage open the image with tf.image.decode_jpeg and return a dataset with the following format dict(zip(['inception_v3_input'], [image])), label
I'm trying to define the tf.estimator.export.ServingInputReceiver but I'm having some trouble defining it.
The model is serving correctly the prediction with the predict method using the input function without the labels.
My idea was to reuse the input_function to decode the image passing only the path of the image on cloud storage to the prediction also for the google endpoint, but I can't understand how to do it.
Thank's for your help
If I'm understanding correctly, your question is how to get the file from Cloud Storage, considering that you want to decode the image this way:
image_decoded = tf.image.decode_jpeg(image_string)
So, in this case, you can use:
image_string = file_io.FileIO(filename, mode='r')
By importing file_io first:
from tensorflow.python.lib.io import file_io
According to the comments on this question about reading input data from GCS, using the file_read function should provide the same results since " there was a bunch of work done to abstract file io and file systems, so there all the io functionality works consistently". So you can try also with read_file function.

How to augment data in tensorflow tfrecords?

I am storing my data using tfrecords and I read them as tensors using Dataset API and then I use the Estimator API to perform training. Now, I want to do online data-augmentation on each item in the dataset, but after trying for a while I cannot find a way out to do it. I want randomly flipping, randomly rotation and other manipulators.
I am following the instructions given in this tutorial with a custom estimator which is the my CNN and I am not sure where the data augmentation step occurs.
Using TFRecords doesn't prevent you from doing data augmentation.
Following the tutorial you linked in your comment, here is what roughly happens:
You create the dataset from the TFRecords files, and parse the file to get an image and a label
dataset = tf.data.TFRecordDataset(filenames=filenames)
dataset = dataset.map(parse)
You can now apply a new preprocessing function to do some data augmentation during training
# Only do it when we are training
if train:
dataset = dataset.map(train_preprocess)
The train_preprocess function can be something like this:
def train_preprocess(image, label):
flip_image = tf.image.random_flip_left_right(image)
# Other transformations...
return flip_image, label