Using TF 2.0 and TFP (TensorFlow Probability) layers, I have constructed a keras.Sequential model. I would like to export it for serving with TensorFlow Serving, and I would like to include the preprocessing and postprocessing steps in the servable.
My preprocessing steps are fairly simple: filling NAs with explicit values, encoding a few strings as floats, normalizing inputs, and denormalizing outputs. For training, I have been doing the pre/postprocessing with pandas and NumPy.
I know that I can export my Keras model's weights, wrap the keras.Sequential model's architecture in a bigger TensorFlow graph, use low-level ops like tf.math.subtract(inputs, vector_of_feature_means) for the pre/postprocessing operations, define tf.placeholder ops for my inputs and outputs, and make a servable, but I feel like there has to be a cleaner way of doing this.
Is it possible to use keras.layers.Add() and keras.layers.Multiply() in a keras.Sequential model for explicit preprocessing steps, or is there some more standard way of doing these things?
As per my understanding, the standard and efficient way of doing these things is to use TensorFlow Transform. That doesn't mean we have to use the entire TFX pipeline just to use TF Transform; it can be used standalone as well.
TensorFlow Transform creates a Beam transformation graph, which injects these transformations as constants in the TensorFlow graph. Because the transformations are represented as constants in the graph, they will be consistent across training and serving. The advantages of that consistency are:
Eliminates training-serving skew.
Eliminates the need for preprocessing code in the serving system, which improves latency.
Sample code for TF Transform is shown below.
Code for importing all the dependencies:
try:
  import tensorflow_transform as tft
  import apache_beam as beam
except ImportError:
  print('Installing TensorFlow Transform. This will take a minute, ignore the warnings')
  !pip install -q tensorflow_transform
  print('Installing Apache Beam. This will take a minute, ignore the warnings')
  !pip install -q apache_beam
  import tensorflow_transform as tft
  import apache_beam as beam

import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import dataset_schema
Below is the preprocessing function, where we define all the transformations. As of now, TF Transform doesn't provide a direct API for missing-value imputation, so for that step alone we have to write our own code using low-level APIs.
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs`.
  outputs = inputs.copy()

  # Scale numeric columns to have range [0, 1].
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(outputs[key])

  for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
    # This is a SparseTensor because it is optional. Here we fill in a default
    # value when it is missing.
    dense = tf.sparse_to_dense(outputs[key].indices,
                               [outputs[key].dense_shape[0], 1],
                               outputs[key].values,
                               default_value=0.)
    # Reshaping from a batch of vectors of size 1 to a batch of scalars.
    dense = tf.squeeze(dense, axis=1)
    outputs[key] = tft.scale_to_0_1(dense)

  # For all categorical columns except the label column, we generate a
  # vocabulary but do not modify the feature. This vocabulary is instead
  # used in the trainer, by means of a feature column, to convert the feature
  # from a string to an integer id.
  for key in CATEGORICAL_FEATURE_KEYS:
    tft.vocabulary(inputs[key], vocab_filename=key)

  # For the label column we provide the mapping from string to index.
  # (tf.contrib is TF1-only; in TF 2.x, tf.lookup.StaticHashTable can be
  # used instead.)
  table = tf.contrib.lookup.index_table_from_tensor(['>50K', '<=50K'])
  outputs[LABEL_KEY] = table.lookup(outputs[LABEL_KEY])

  return outputs
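Once preprocessing_fn is defined, the analysis and transformation can be run with a Beam pipeline. Below is a minimal sketch of that step with small in-memory data; the feature keys, sample values, and temp directory are hypothetical and should be adapted to your own schema:

# A minimal sketch (hypothetical feature names and values).
NUMERIC_FEATURE_KEYS = ['age']
OPTIONAL_NUMERIC_FEATURE_KEYS = []
CATEGORICAL_FEATURE_KEYS = ['workclass']
LABEL_KEY = 'label'

raw_data = [
    {'age': 39.0, 'workclass': 'Private', 'label': '<=50K'},
    {'age': 50.0, 'workclass': 'Self-emp', 'label': '>50K'},
]
raw_data_metadata = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec({
        'age': tf.io.FixedLenFeature([], tf.float32),
        'workclass': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.string),
    }))

with tft_beam.Context(temp_dir='/tmp/tft'):
  # transform_fn is the TensorFlow graph (with the transformations baked
  # in as constants) that can be attached to the serving graph.
  (transformed_data, transformed_metadata), transform_fn = (
      (raw_data, raw_data_metadata)
      | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))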
You can refer to the links below for detailed information and a tutorial on TF Transform:
https://www.tensorflow.org/tfx/transform/get_started
https://www.tensorflow.org/tfx/tutorials/transform/census
Related
I am training a TensorFlow model with multiple images as input and a segmentation mask at the output. I want to perform random rotation augmentation in my dataset pipeline.
I have a list of parallel image file names (input and output files aligned), which I convert into a tf.data Dataset object using tf.data.Dataset.from_generator, and then use the Dataset.map function to load the images with the tf.io.read_file and tf.image.decode_png commands.
How can I perform random rotations on the input-output image pairs? I tried using the random_transform function of the ImageDataGenerator class, but it expects NumPy data as input and does not work on tensors (and since TensorFlow does not execute the data pipeline eagerly, I cannot convert them to NumPy either). I suppose I could use tf.numpy_function, but I expect there should be some simple solution to this simple problem.
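A minimal sketch of one possible approach, assuming the dataset yields (image, mask) pairs: draw a single random rotation per element inside Dataset.map and apply it to both tensors so the pair stays aligned. tf.image.rot90 covers 90-degree multiples; for arbitrary angles, tfa.image.rotate from TensorFlow Addons accepts a tensor of radians.

import tensorflow as tf

def random_rotate(image, mask):
  # Draw one random k in {0, 1, 2, 3} and rotate both tensors by
  # k * 90 degrees, so image and mask stay aligned.
  k = tf.random.uniform([], minval=0, maxval=4, dtype=tf.int32)
  return tf.image.rot90(image, k), tf.image.rot90(mask, k)

# Assumes `dataset` yields (image, mask) pairs.
dataset = dataset.map(random_rotate)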
I would like to know how to extract the same performance results from a model's output events file as TensorBoard does: specifically, the precision, recall, and loss numbers are of most interest. Here is a subset of them as displayed in TensorBoard for the model checkpoint directory.
I'm not sure whether there is self-documenting information or other metadata available for these models. This one in particular is Faster R-CNN Inception, but are these outputs tied to a particular model, or are they generic in format?
Found the approach in the tensorboard package:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# evtf is the path to the events file on disk
event_acc = EventAccumulator(evtf)
event_acc.Reload()
One of the entries is:
scal_losses = event_acc.Scalars('Loss/total_loss')
From that list we can extract attributes such as step (the step number) and value (here, the loss value):
losses = sorted([[sevt.step, sevt.value] for sevt in scal_losses])
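If the tag names are not known in advance, the same EventAccumulator object can list them first; a small sketch:

# List every scalar tag recorded in the events file, e.g. to discover
# 'Loss/total_loss' and any precision/recall tags.
print(event_acc.Tags()['scalars'])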
I'm relatively new to TensorFlow and I'm having trouble modifying some of the examples to use batch/stream processing with input functions. More specifically, what is the 'best' way to modify this script to make it suitable for training and serving deployment on Google Cloud ML?
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py
Something akin to this example:
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census/estimator/trainer
I can package it up and train it in the cloud, but I can't figure out how to apply even the simple vocab_processor transformations to an input tensor. I know how to do it with pandas, but I can't apply the transformation to batches there (using the chunk_size parameter). I would be very happy if I could reuse my pandas preprocessing pipelines in TensorFlow.
I think you have three options:
1) You cannot reuse pandas preprocessing pipelines in TF. However, you could start TF from the output of your pandas preprocessing: build a vocab and convert the text words to integers in pandas, save the new preprocessed dataset to disk, and then read the integer data (which encodes your text) in TF to do training.
2) You could build a vocab outside of TF in pandas. Then, inside TF, after reading the words, you can make a table to map the text to integers. But if you are going to build a vocab outside of TF, you might as well do the transformation at the same time outside of TF, which is option 1.
3) Use tensorflow_transform. You can call tft.string_to_int() on the text column to automatically build the vocab and convert it to integers. The output of tensorflow_transform is preprocessed data in tf.Example format, so training can start from the tf.Example files. This is again option 1, but with tf.Example files. If you want to run prediction on raw text data, this option lets you export a graph that has the same text preprocessing built in, so you don't have to manage the preprocessing step at prediction time. However, this option is the most complicated, as it introduces two additional ideas: tf.Example files and Beam pipelines.
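As a sketch of option 3 (with hypothetical column names), the preprocessing function might look like this:

import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # Split each text example into words (a SparseTensor), then build a
  # vocabulary over the words and map each word to its integer id.
  words = tf.string_split(inputs['text'])
  return {
      'word_ids': tft.string_to_int(words),
      'label': inputs['label'],
  }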
For examples of tensorflow_transform see https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/criteo_tft
and
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/reddit_tft
I have string labels such as "cat" and "dog". Can I feed string labels directly to deep learning models in TensorFlow and get string labels back as predictions? I am looking for the equivalent of sklearn's LabelEncoder (from sklearn.preprocessing import LabelEncoder).
If this is not possible, is there a way to pack the labels into the SavedModel protobuf file and retrieve them by index at serving time? I am using the Estimator's export_savedmodel API. Is assets_extra the right way? The approach at https://github.com/tensorflow/serving/issues/55 does not use the SavedModel format.
The typical way to handle label data in deep learning is to embed the labels in a vector space. Language models do it routinely with word embeddings. TensorFlow provides embedding lookup operations that you can use for your purposes.
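A minimal sketch of that idea, with a hypothetical two-label vocabulary and embedding size, plus the reverse (index to string) table for recovering string labels at serving time:

import tensorflow as tf

# Map string labels to integer ids with a static lookup table
# (the LabelEncoder-style step), then embed the ids in a vector space.
vocab = tf.constant(['cat', 'dog'])
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        vocab, tf.range(2, dtype=tf.int64)),
    default_value=-1)

embedding_dim = 8  # hypothetical
embeddings = tf.Variable(tf.random.normal([2, embedding_dim]))

ids = table.lookup(tf.constant(['dog', 'cat']))    # -> [1, 0]
vectors = tf.nn.embedding_lookup(embeddings, ids)  # -> shape [2, 8]

# The reverse mapping (index -> string), e.g. for serving-time predictions.
inverse_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        tf.range(2, dtype=tf.int64), vocab),
    default_value='unknown')
predicted = inverse_table.lookup(tf.constant([0, 1], dtype=tf.int64))  # -> ['cat', 'dog']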
I tried to follow the CIFAR-10 example. However, I want to replace the file reading with a NumPy array. There are a few benefits to doing that:
Simpler code (I want to remove the binary file parsing)
Simpler graph and visualization --> easier to explain to other audiences
A small performance improvement (from skipping I/O and parsing)?
What would be a simple way to do it?
You need to get the tensor reshaped_image by either:
giving it a name
finding its default name, with Tensorboard for instance
reshaped_image = tf.cast(read_input.uint8image, tf.float32, name="float_image")
Then you can feed your numpy array using a feed_dict like:
reshaped_image = tf.get_default_graph().get_tensor_by_name("float_image")
sess.run(loss, feed_dict={reshaped_image: your_numpy})
The same goes for labels.