Related
I'm trying to build the model illustrated in this picture:
I obtained a pre-trained BERT and respective tokenizer from HuggingFace's transformers in the following way:
from transformers import AutoTokenizer, TFBertModel
model_name = "dbmdz/bert-base-italian-xxl-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = TFBertModel.from_pretrained(model_name)
The model will be fed a sequence of italian tweets and will need to determine if they are ironic or not.
I'm having problems building the initial part of the model, which takes the inputs and feeds them to the tokenizer in order to get a representation I can feed to BERT.
I can do it outside of the model-building context:
my_phrase = "Ciao, come va?"
# an equivalent version is tokenizer(my_phrase, other parameters)
bert_input = tokenizer.encode(my_phrase, add_special_tokens=True, return_tensors='tf', max_length=110, padding='max_length', truncation=True)
attention_mask = bert_input > 0
outputs = bert(bert_input, attention_mask)['pooler_output']
but I'm having troubles building a model that does this. Here is the code for building such a model (the problem is in the first 4 lines ):
def build_classifier_model():
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
encoder_inputs = tokenizer(text_input, return_tensors='tf', add_special_tokens=True, max_length=110, padding='max_length', truncation=True)
outputs = bert(encoder_inputs)
net = outputs['pooler_output']
X = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(net)
X = tf.keras.layers.Concatenate(axis=-1)([X, input_layer])
X = tf.keras.layers.MaxPooling1D(20)(X)
X = tf.keras.layers.SpatialDropout1D(0.4)(X)
X = tf.keras.layers.Flatten()(X)
X = tf.keras.layers.Dense(128, activation="relu")(X)
X = tf.keras.layers.Dropout(0.25)(X)
X = tf.keras.layers.Dense(2, activation='softmax')(X)
model = tf.keras.Model(inputs=text_input, outputs = X)
return model
And when I call the function for creating this model I get this error:
text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
One thing I thought was that maybe I had to use the tokenizer.batch_encode_plus function which works with lists of strings:
class BertPreprocessingLayer(tf.keras.layers.Layer):
def __init__(self, tokenizer, maxlength):
super().__init__()
self._tokenizer = tokenizer
self._maxlength = maxlength
def call(self, inputs):
print(type(inputs))
print(inputs)
tokenized = tokenizer.batch_encode_plus(inputs, add_special_tokens=True, return_tensors='tf', max_length=self._maxlength, padding='max_length', truncation=True)
return tokenized
def build_classifier_model():
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
encoder_inputs = BertPreprocessingLayer(tokenizer, 100)(text_input)
outputs = bert(encoder_inputs)
net = outputs['pooler_output']
# ... same as above
but I get this error:
batch_text_or_text_pairs has to be a list (got <class 'keras.engine.keras_tensor.KerasTensor'>)
and beside the fact I haven't found a way to convert that tensor to a list with a quick google search, it seems weird that I have to go in and out of tensorflow in this way.
I've also looked up on the huggingface's documentation but there is only a single usage example, with a single phrase, and what they do is analogous at my "out of model-building context" example.
EDIT:
I also tried with Lambdas in this way:
tf.executing_eagerly()
def tokenize_tensor(tensor):
t = tensor.numpy()
t = np.array([str(s, 'utf-8') for s in t])
return tokenizer(t.tolist(), return_tensors='tf', add_special_tokens=True, max_length=110, padding='max_length', truncation=True)
def build_classifier_model():
text_input = tf.keras.layers.Input(shape=(1,), dtype=tf.string, name='text')
encoder_inputs = tf.keras.layers.Lambda(tokenize_tensor, name='tokenize')(text_input)
...
outputs = bert(encoder_inputs)
but I get the following error:
'Tensor' object has no attribute 'numpy'
EDIT 2:
I also tried the approach suggested by #mdaoust of wrapping everything in a tf.py_function and got this error.
def py_func_tokenize_tensor(tensor):
return tf.py_function(tokenize_tensor, [tensor], Tout=[tf.int32, tf.int32, tf.int32])
eager_py_func() missing 1 required positional argument: 'Tout'
Then I defined Tout as the type of the value returned by the tokenizer:
transformers.tokenization_utils_base.BatchEncoding
and got the following error:
Expected DataType for argument 'Tout' not <class
'transformers.tokenization_utils_base.BatchEncoding'>
Finally I unpacked the value in the BatchEncoding in the following way:
def tokenize_tensor(tensor):
t = tensor.numpy()
t = np.array([str(s, 'utf-8') for s in t])
dictionary = tokenizer(t.tolist(), return_tensors='tf', add_special_tokens=True, max_length=110, padding='max_length', truncation=True)
#unpacking
input_ids = dictionary['input_ids']
tok_type = dictionary['token_type_ids']
attention_mask = dictionary['attention_mask']
return input_ids, tok_type, attention_mask
And get an error in the line below:
...
outputs = bert(encoder_inputs)
ValueError: Cannot take the length of shape with unknown rank.
For now I solved by taking the tokenization step out of the model:
def tokenize(sentences, tokenizer):
input_ids, input_masks, input_segments = [],[],[]
for sentence in sentences:
inputs = tokenizer.encode_plus(sentence, add_special_tokens=True, max_length=128, pad_to_max_length=True, return_attention_mask=True, return_token_type_ids=True)
input_ids.append(inputs['input_ids'])
input_masks.append(inputs['attention_mask'])
input_segments.append(inputs['token_type_ids'])
return np.asarray(input_ids, dtype='int32'), np.asarray(input_masks, dtype='int32'), np.asarray(input_segments, dtype='int32')
The model takes two inputs which are the first two values returned by the tokenize funciton.
def build_classifier_model():
input_ids_in = tf.keras.layers.Input(shape=(128,), name='input_token', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(128,), name='masked_token', dtype='int32')
embedding_layer = bert(input_ids_in, attention_mask=input_masks_in)[0]
...
model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs = X)
for layer in model.layers[:3]:
layer.trainable = False
return model
I'd still like to know if someone has a solution which integrates the tokenization step inside the model-building context so that an user of the model can simply feed phrases to it to get a prediction or to train the model.
text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
Solution to the above error:
Just use text_input = 'text'
instead of
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
It looks like this is not TensorFlow compatible.
https://huggingface.co/dbmdz/bert-base-italian-xxl-cased#model-weights
Currently only PyTorch-Transformers compatible weights are available. If you need access to TensorFlow checkpoints, please raise an issue!
But remember that some things are easier if you don't use keras's functional-model-api. That's what got <class 'keras.engine.keras_tensor.KerasTensor'> is complaining about.
Try passing a tf.Tensor to see if that works.
What happens when you try:
text_input = tf.constant('text')
Try writing your model as a subclass of model.
Yeah, my first answer was wrong.
The problem is that tensorflow has two types of tensors. Eager tensors (these have a value). And "symbolic tensors" or "graph tensors" that don't have a value, and are just used to build up a calculation.
Your tokenize_tensor function expects an eager tensor. Only eager tensors have a .numpy() method.
def tokenize_tensor(tensor):
t = tensor.numpy()
t = np.array([str(s, 'utf-8') for s in t])
return tokenizer(t.tolist(), return_tensors='tf', add_special_tokens=True, max_length=110, padding='max_length', truncation=True)
But keras Input is a symbolic tensor.
text_input = tf.keras.layers.Input(shape=(1,), dtype=tf.string, name='text')
encoder_inputs = tf.keras.layers.Lambda(tokenize_tensor, name='tokenize')(text_input)
To fix this, you can use tf.py_function. It works in graph mode, and will call the wrapped function with eager tensors when the graph is executed, instead of passing it the graph-tensors while the graph is being constructed.
def py_func_tokenize_tensor(tensor):
return tf.py_function(tokenize_tensor, [tensor])
...
encoder_inputs = tf.keras.layers.Lambda(py_func_tokenize_tensor, name='tokenize')(text_input)
Found this Use `sentence-transformers` inside of a keras model and this amazing articles https://www.philschmid.de/tensorflow-sentence-transformers, which explain you how to do what you're trying to achieve.
The first one is using the py_function approach, the second uses tf.Model to wrap everything into model classes.
Hope this helps anyone arriving here in the future.
This is how to use tf.py_function correctly to create a model that takes string as an input:
model_name = "dbmdz/bert-base-italian-xxl-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = TFBertModel.from_pretrained(model_name)
def build_model():
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
def encode_text(text):
inputs = [tf.compat.as_str(x) for x in text.numpy().tolist()]
tokenized = tokenizer(
inputs,
return_tensors='tf',
add_special_tokens=True,
max_length=110,
padding='max_length',
truncation=True)
return tokenized['input_ids'], tokenized['attention_mask']
input_ids, attention_mask = tf.py_function(encode_text, inp=[text_input], Tout=[tf.int32, tf.int32])
input_ids = tf.ensure_shape(input_ids, [None, 110])
attention_mask = tf.ensure_shape(attention_mask, [None, 110])
outputs = bert(input_ids, attention_mask)
net = outputs['last_hidden_state']
# Some other layers, this part is not important
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True))(net)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1, name='classifier'))(x)
return tf.keras.Model(inputs=text_input, outputs=x)
I use last_hidden_state instead of pooler_output, that's where outputs for each token in the sequence are located. (See discussion here on difference between last_hidden_state and pooler_output). We usually use last_hidden_state when doing token level classification (e.g. named entity recognition).
To use pooler_output would be even simpler, e.g:
net = outputs['pooler_output']
x = tf.keras.layers.Dense(1, name='classifier')(net)
return tf.keras.Model(inputs=text_input, outputs=x)
pooler_output can be used in simpler classification problems (like irony detection), but of course it's still possible to use last_hidden_state to create more powerful models. (When you use bert(input_ids_in, attention_mask=input_masks_in)[0] in your solution, it actually returns last_hidden_state.)
Making sure the model works:
model = build_model()
my_phrase = "Ciao, come va?"
model(tf.constant([my_phrase]))
>>> <tf.Tensor: shape=(1, 110, 1), dtype=float32, numpy=...>,
Making sure HuggingFace part of the model is trainable:
model.summary(show_trainable=True)
Problem
As the title suggests I have been trying to create a pipeline for training an Autoencoder model using TFX. The problem I'm having is fitting the tf.Dataset returned by the DataAccessor.tf_dataset_factory object to the Autoencoder.
Below I summarise the steps I've taken through this project, and have some Questions at the bottom if you wish to skip the background information.
Intro
TFX Pipeline
The TFX components I have used so far have been:
CsvExampleGenerator (the dataset has 82 columns, all numeric, and the sample csv has 739 rows)
StatisticsGenerator / SchemaGenerator, the schema has been edited as is now loaded in using an Importer
Transform
Trainer (this is the component I am currently having problems with)
Model
The model that I am attempting to train is based off of the example laid out here https://www.tensorflow.org/tutorials/generative/autoencoder. However, my model is being trained on tabular data, searching for anomalous results, as opposed to image data.
As I have tried a couple of solutions I have tried using both the Keras.layers and Keras.model format for defining the model and I outline both below:
Subclassing Keras.Model
class Autoencoder(keras.models.Model):
def __init__(self, features):
super(Autoencoder, self).__init__()
self.encoder = tf.keras.Sequential([
keras.layers.Dense(82, activation = 'relu'),
keras.layers.Dense(32, activation = 'relu'),
keras.layers.Dense(16, activation = 'relu'),
keras.layers.Dense(8, activation = 'relu')
])
self.decoder = tf.keras.Sequential([
keras.layers.Dense(16, activation = 'relu'),
keras.layers.Dense(32, activation = 'relu'),
keras.layers.Dense(len(features), activation = 'sigmoid')
])
def call(self, x):
inputs = [keras.layers.Input(shape = (1,), name = f) for f in features]
dense = keras.layers.concatenate(inputs)
encoded = self.encoder(dense)
decoded = self.decoder(encoded)
return decoded
Subclassing Keras.Layers
def _build_keras_model(features: List[str]) -> tf.keras.Model:
inputs = [keras.layers.Input(shape = (1,), name = f) for f in features]
dense = keras.layers.concatenate(inputs)
dense = keras.layers.Dense(32, activation = 'relu')(dense)
dense = keras.layers.Dense(16, activation = 'relu')(dense)
dense = keras.layers.Dense(8, activation = 'relu')(dense)
dense = keras.layers.Dense(16, activation = 'relu')(dense)
dense = keras.layers.Dense(32, activation = 'relu')(dense)
outputs = keras.layers.Dense(len(features), activation = 'sigmoid')(dense)
model = keras.Model(inputs = inputs, outputs = outputs)
model.compile(
optimizer = 'adam',
loss = 'mae'
)
return model
TFX Trainer Component
For creating the Trainer Component I have been mainly following the implementation details laid out here: https://www.tensorflow.org/tfx/guide/trainer
As well as following the default penguins example: https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple#write_model_training_code
run_fn defintion
def run_fn(fn_args: tfx.components.FnArgs) -> None:
tft_output = tft.TFTransformOutput(fn_args.transform_output)
train_dataset = _input_fn(
file_pattern = fn_args.train_files,
data_accessor = fn_args.data_accessor,
tf_transform_output = tft_output,
batch_size = fn_args.train_steps
)
eval_dataset = _input_fn(
file_pattern = fn_args.eval_files,
data_accessor = fn_args.data_accessor,
tf_transform_output = tft_output,
batch_size = fn_args.custom_config['eval_batch_size']
)
# model = Autoencoder(
# features = fn_args.custom_config['features']
# )
model = _build_keras_model(features = fn_args.custom_config['features'])
model.compile(optimizer = 'adam', loss = 'mse')
model.fit(
train_dataset,
steps_per_epoch = fn_args.train_steps,
validation_data = eval_dataset,
validation_steps = fn_args.eval_steps
)
...
_input_fn definition
def _apply_preprocessing(raw_features, tft_layer):
transformed_features = tft_layer(raw_features)
return transformed_features
def _input_fn(
file_pattern,
data_accessor: tfx.components.DataAccessor,
tf_transform_output: tft.TFTransformOutput,
batch_size: int) -> tf.data.Dataset:
"""
Generates features and label for tuning/training.
Args:
file_pattern: List of paths or patterns of input tfrecord files.
data_accessor: DataAccessor for converting input to RecordBatch.
tf_transform_output: A TFTransformOutput.
batch_size: representing the number of consecutive elements of returned
dataset to combine in a single batch
Returns:
A dataset that contains features where features is a
dictionary of Tensors.
"""
dataset = data_accessor.tf_dataset_factory(
file_pattern,
tfxio.TensorFlowDatasetOptions(batch_size = batch_size),
tf_transform_output.transformed_metadata.schema
)
transform_layer = tf_transform_output.transform_features_layer()
def apply_transform(raw_features):
return _apply_preprocessing(raw_features, transform_layer)
return dataset.map(apply_transform).repeat()
This differs from the _input_fn example given above as I was following the example in the next tfx tutorial found here: https://www.tensorflow.org/tfx/tutorials/tfx/penguin_tft#run_fn
Also for reference, there is no Target within the example data so there is no label_key to be passed to the tfxio.TensorFlowDatasetOptions object.
Error
When trying to run the Trainer component using a TFX InteractiveContext object I receive the following error.
ValueError: No gradients provided for any variable: ['dense_460/kernel:0', 'dense_460/bias:0', 'dense_461/kernel:0', 'dense_461/bias:0', 'dense_462/kernel:0', 'dense_462/bias:0', 'dense_463/kernel:0', 'dense_463/bias:0', 'dense_464/kernel:0', 'dense_464/bias:0', 'dense_465/kernel:0', 'dense_465/bias:0'].
From my own attempts to solve this I believe the problem lies in the way that an Autoencoder is trained. From the Autoencoder example linked here https://www.tensorflow.org/tutorials/generative/autoencoder the data is fitted like so:
autoencoder.fit(x_train, x_train,
epochs=10,
shuffle=True,
validation_data=(x_test, x_test))
therefore it stands to reason that the tf.Dataset should also mimic this behaviour and when testing with plain Tensor objects I have been able to recreate the error above and then solve it when adding the target to be the same as the training data in the .fit() function.
Things I've Tried So Far
Duplicating Train Dataset
model.fit(
train_dataset,
train_dataset,
steps_per_epoch = fn_args.train_steps,
validation_data = eval_dataset,
validation_steps = fn_args.eval_steps
)
Raises error due to Keras not accepting a 'y' value when a dataset is passed.
ValueError: `y` argument is not supported when using dataset as input.
Returning a dataset that is a tuple with itself
def _input_fn(...
dataset = data_accessor.tf_dataset_factory(
file_pattern,
tfxio.TensorFlowDatasetOptions(batch_size = batch_size),
tf_transform_output.transformed_metadata.schema
)
transform_layer = tf_transform_output.transform_features_layer()
def apply_transform(raw_features):
return _apply_preprocessing(raw_features, transform_layer)
dataset = dataset.map(apply_transform)
return dataset.map(lambda x: (x, x))
This raises an error where the keys from the features dictionary don't match the output of the model.
ValueError: Found unexpected keys that do not correspond to any Model output: dict_keys(['feature_string', ...]). Expected: ['dense_477']
At this point I switched to using the keras.model Autoencoder subclass and tried to add output keys to the Model using an output which I tried to create dynamically in the same way as the inputs.
def call(self, x):
inputs = [keras.layers.Input(shape = (1,), name = f) for f in x]
dense = keras.layers.concatenate(inputs)
encoded = self.encoder(dense)
decoded = self.decoder(encoded)
outputs = {}
for feature_name in x:
outputs[feature_name] = keras.layers.Dense(1, activation = 'sigmoid')(decoded)
return outputs
This raises the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
I've been looking into solving this issue but am no longer sure if the data is being passed correctly and am beginning to think I'm getting side-tracked from the actual problem.
Questions
Has anyone managed to get an Autoencoder working when connected via TFX examples?
Did you alter the tf.Dataset or handled the examples in a different way to the _input_fn demonstrated?
So I managed to find an answer to this and wanted to leave what I found here in case anyone else stumbles onto a similar problem.
It turns out my feelings around the error were correct and the solution did indeed lie in how the tf.Dataset object was presented.
This can be demonstrated when I ran some code which simulated the incoming data using randomly generated tensors.
tensors = [tf.random.uniform(shape = (1, 82)) for i in range(739)]
# This gives us a list of 739 tensors which hold 1 value for 82 'features' simulating the dataset I had
dataset = tf.data.Dataset.from_tensor_slices(tensors)
dataset = dataset.map(lambda x : (x, x))
# This returns a dataset which marks the training set and target as the same
# which is what the Autoecnoder model is looking for
model.fit(dataset ...)
Following this I proceeded to do the same thing with the dataset returned by the _input_fn. Given that the tfx DataAccessor object returns a features_dict however I needed to combine the tensors in that dict together to create a single tensor.
This is how my _input_fn looks now:
def create_target_values(features_dict: Dict[str, tf.Tensor]) -> tuple:
value_tensor = tf.concat(list(features_dict.values()), axis = 1)
return (features_dict, value_tensor)
def _input_fn(
file_pattern,
data_accessor: tfx.components.DataAccessor,
tf_transform_output: tft.TFTransformOutput,
batch_size: int) -> tf.data.Dataset:
"""
Generates features and label for tuning/training.
Args:
file_pattern: List of paths or patterns of input tfrecord files.
data_accessor: DataAccessor for converting input to RecordBatch.
tf_transform_output: A TFTransformOutput.
batch_size: representing the number of consecutive elements of returned
dataset to combine in a single batch
Returns:
A dataset that contains (features, target_tensor) tuple where features is a
dictionary of Tensors, and target_tensor is a single Tensor that is a concatenated tensor of all the
feature values.
"""
dataset = data_accessor.tf_dataset_factory(
file_pattern,
tfxio.TensorFlowDatasetOptions(batch_size = batch_size),
tf_transform_output.transformed_metadata.schema
)
dataset = dataset.map(lambda x: create_target_values(features_dict = x))
return dataset.repeat()
Keras introduced tf.keras.preprocessing.image_dataset_from_directory function recently, which is more efficient than previously ImageDataGenerator.flow_from_directory method in tensorflow 2.x.
I am practising on the catsvsdogs problems and using this function to build a data pipeline for my model. After training the model, I use preds = model.predict(test_ds) to get the predictions for my test dataset. How should I match the preds with the name of pictures? (There is generator.filenames before, but doesn't exist in the new method any more.) Thanks!
Expanding on #Daniel Woolcott's and #Almog David's answers, the file paths are returned by the image_dataset_from_directory() function in Tensorflow v2.4. already. No need to change the source code of the function.
To be more exact – you can easily retrieve the paths with the file_paths attribute.
Try this:
img_folder = "your_image_folder/"
img_generator = keras.preprocessing.image_dataset_from_directory(
img_folder,
batch_size=32,
image_size=(224,224)
)
file_paths = img_generator.file_paths
print(file_paths)
Prints out:
your_file_001.jpg
your_file_002.jpg
…
I had a similar issue. The solution was to take the underlying tf.keras.preprocessing.image_dataset_from_directory function and add the 'image_paths' variable to the return statement. This incurs no computational overhead as the filenames have already been retrieved.
The main function code is taken from the GitHub at: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/python/keras/preprocessing/image_dataset.py#L34-L206
See below:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from tensorflow.python.data.ops import dataset_ops
from tensorflow.python.keras.layers.preprocessing import image_preprocessing
from tensorflow.python.keras.preprocessing import dataset_utils
from tensorflow.python.ops import image_ops
from tensorflow.python.ops import io_ops
from tensorflow.python.util.tf_export import keras_export
WHITELIST_FORMATS = ('.bmp', '.gif', '.jpeg', '.jpg', '.png')
## Tensorflow override method to return fname as list as well as dataset
def image_dataset_from_directory(directory,
labels='inferred',
label_mode='int',
class_names=None,
color_mode='rgb',
batch_size=32,
image_size=(256, 256),
shuffle=True,
seed=None,
validation_split=None,
subset=None,
interpolation='bilinear',
follow_links=False):
if labels != 'inferred':
if not isinstance(labels, (list, tuple)):
raise ValueError(
'`labels` argument should be a list/tuple of integer labels, of '
'the same size as the number of image files in the target '
'directory. If you wish to infer the labels from the subdirectory '
'names in the target directory, pass `labels="inferred"`. '
'If you wish to get a dataset that only contains images '
'(no labels), pass `label_mode=None`.')
if class_names:
raise ValueError('You can only pass `class_names` if the labels are '
'inferred from the subdirectory names in the target '
'directory (`labels="inferred"`).')
if label_mode not in {'int', 'categorical', 'binary', None}:
raise ValueError(
'`label_mode` argument must be one of "int", "categorical", "binary", '
'or None. Received: %s' % (label_mode,))
if color_mode == 'rgb':
num_channels = 3
elif color_mode == 'rgba':
num_channels = 4
elif color_mode == 'grayscale':
num_channels = 1
else:
raise ValueError(
'`color_mode` must be one of {"rbg", "rgba", "grayscale"}. '
'Received: %s' % (color_mode,))
interpolation = image_preprocessing.get_interpolation(interpolation)
dataset_utils.check_validation_split_arg(
validation_split, subset, shuffle, seed)
if seed is None:
seed = np.random.randint(1e6)
image_paths, labels, class_names = dataset_utils.index_directory(
directory,
labels,
formats=WHITELIST_FORMATS,
class_names=class_names,
shuffle=shuffle,
seed=seed,
follow_links=follow_links)
if label_mode == 'binary' and len(class_names) != 2:
raise ValueError(
'When passing `label_mode="binary", there must exactly 2 classes. '
'Found the following classes: %s' % (class_names,))
image_paths, labels = dataset_utils.get_training_or_validation_split(
image_paths, labels, validation_split, subset)
dataset = paths_and_labels_to_dataset(
image_paths=image_paths,
image_size=image_size,
num_channels=num_channels,
labels=labels,
label_mode=label_mode,
num_classes=len(class_names),
interpolation=interpolation)
if shuffle:
# Shuffle locally at each iteration
dataset = dataset.shuffle(buffer_size=batch_size * 8, seed=seed)
dataset = dataset.batch(batch_size)
# Users may need to reference `class_names`.
dataset.class_names = class_names
return dataset, image_paths
def paths_and_labels_to_dataset(image_paths,
image_size,
num_channels,
labels,
label_mode,
num_classes,
interpolation):
"""Constructs a dataset of images and labels."""
# TODO(fchollet): consider making num_parallel_calls settable
path_ds = dataset_ops.Dataset.from_tensor_slices(image_paths)
img_ds = path_ds.map(
lambda x: path_to_image(x, image_size, num_channels, interpolation))
if label_mode:
label_ds = dataset_utils.labels_to_dataset(labels, label_mode, num_classes)
img_ds = dataset_ops.Dataset.zip((img_ds, label_ds))
return img_ds
def path_to_image(path, image_size, num_channels, interpolation):
img = io_ops.read_file(path)
img = image_ops.decode_image(
img, channels=num_channels, expand_animations=False)
img = image_ops.resize_images_v2(img, image_size, method=interpolation)
img.set_shape((image_size[0], image_size[1], num_channels))
return img
Which would then work as:
train_dir = '/content/drive/My Drive/just_monkeying_around/monkey_training'
BATCH_SIZE = 32
IMG_SIZE = (224, 224)
train_dataset, train_paths = image_dataset_from_directory(train_dir,
shuffle=True,
batch_size=BATCH_SIZE,
image_size=IMG_SIZE)
train_paths returns a list of file strings.
As of Tensorflow 2.4 the dataset has a field named: file_paths
So it can be used in order to get the file paths.
If you use shuffle=True in the dataset creation please pay attention that you have to disable this line in the dataset creation code (method: image_dataset_from_directory):
if shuffle:
# Shuffle locally at each iteration
dataset = dataset.shuffle(buffer_size=batch_size * 8, seed=seed)
I am building a simple Sequential model in Keras (tensorflow backend). During training I want to inspect the individual training batches and model predictions. Therefore, I am trying to create a custom Callback that saves the model predictions and targets for each training batch. However, the model is not using the current batch for prediction, but the entire training data.
How can I hand over only the current training batch to the Callback?
And how can I access the batches and targets that the Callback saves in self.predhis and self.targets?
My current version looks as follows:
callback_list = [prediction_history((self.x_train, self.y_train))]
self.model.fit(self.x_train, self.y_train, batch_size=self.batch_size, epochs=self.n_epochs, validation_data=(self.x_val, self.y_val), callbacks=callback_list)
class prediction_history(keras.callbacks.Callback):
def __init__(self, train_data):
self.train_data = train_data
self.predhis = []
self.targets = []
def on_batch_end(self, epoch, logs={}):
x_train, y_train = self.train_data
self.targets.append(y_train)
prediction = self.model.predict(x_train)
self.predhis.append(prediction)
tf.logging.info("Prediction shape: {}".format(prediction.shape))
tf.logging.info("Targets shape: {}".format(y_train.shape))
NOTE: this answer is outdated and only works with TF1. Check #bers's answer for a solution tested on TF2.
After model compilation, the placeholder tensor for y_true is in model.targets and y_pred is in model.outputs.
To save the values of these placeholders at each batch, you can:
First copy the values of these tensors into variables.
Evaluate these variables in on_batch_end, and store the resulting arrays.
Now step 1 is a bit involved because you'll have to add an tf.assign op to the training function model.train_function. Using current Keras API, this can be done by providing a fetches argument to K.function() when the training function is constructed.
In model._make_train_function(), there's a line:
self.train_function = K.function(inputs,
[self.total_loss] + self.metrics_tensors,
updates=updates,
name='train_function',
**self._function_kwargs)
The fetches argument containing the tf.assign ops can be provided via model._function_kwargs (only works after Keras 2.1.0).
As an example:
from keras.layers import Dense
from keras.models import Sequential
from keras.callbacks import Callback
from keras import backend as K
import tensorflow as tf
import numpy as np
class CollectOutputAndTarget(Callback):
def __init__(self):
super(CollectOutputAndTarget, self).__init__()
self.targets = [] # collect y_true batches
self.outputs = [] # collect y_pred batches
# the shape of these 2 variables will change according to batch shape
# to handle the "last batch", specify `validate_shape=False`
self.var_y_true = tf.Variable(0., validate_shape=False)
self.var_y_pred = tf.Variable(0., validate_shape=False)
def on_batch_end(self, batch, logs=None):
# evaluate the variables and save them into lists
self.targets.append(K.eval(self.var_y_true))
self.outputs.append(K.eval(self.var_y_pred))
# build a simple model
# have to compile first for model.targets and model.outputs to be prepared
model = Sequential([Dense(5, input_shape=(10,))])
model.compile(loss='mse', optimizer='adam')
# initialize the variables and the `tf.assign` ops
cbk = CollectOutputAndTarget()
fetches = [tf.assign(cbk.var_y_true, model.targets[0], validate_shape=False),
tf.assign(cbk.var_y_pred, model.outputs[0], validate_shape=False)]
model._function_kwargs = {'fetches': fetches} # use `model._function_kwargs` if using `Model` instead of `Sequential`
# fit the model and check results
X = np.random.rand(10, 10)
Y = np.random.rand(10, 5)
model.fit(X, Y, batch_size=8, callbacks=[cbk])
Unless the number of samples can be divided by the batch size, the final batch will have a different size than other batches. So K.variable() and K.update() can't be used in this case. You'll have to use tf.Variable(..., validate_shape=False) and tf.assign(..., validate_shape=False) instead.
To verify the correctness of the saved arrays, you can add one line in training.py to print out the shuffled index array:
if shuffle == 'batch':
index_array = _batch_shuffle(index_array, batch_size)
elif shuffle:
np.random.shuffle(index_array)
print('Index array:', repr(index_array)) # Add this line
batches = _make_batches(num_train_samples, batch_size)
The shuffled index array should be printed out during fitting:
Epoch 1/1
Index array: array([8, 9, 3, 5, 4, 7, 1, 0, 6, 2])
10/10 [==============================] - 0s 23ms/step - loss: 0.5670
And you can check if cbk.targets is the same as Y[index_array]:
index_array = np.array([8, 9, 3, 5, 4, 7, 1, 0, 6, 2])
print(Y[index_array])
[[ 0.75325592 0.64857277 0.1926653 0.7642865 0.38901153]
[ 0.77567689 0.13573623 0.4902501 0.42897559 0.55825652]
[ 0.33760938 0.68195038 0.12303088 0.83509441 0.20991668]
[ 0.98367778 0.61325065 0.28973401 0.28734073 0.93399794]
[ 0.26097574 0.88219054 0.87951941 0.64887846 0.41996446]
[ 0.97794604 0.91307569 0.93816428 0.2125808 0.94381495]
[ 0.74813435 0.08036688 0.38094272 0.83178364 0.16713736]
[ 0.52609421 0.39218962 0.21022047 0.58569125 0.08012982]
[ 0.61276627 0.20679494 0.24124858 0.01262245 0.0994412 ]
[ 0.6026137 0.25620512 0.7398164 0.52558182 0.09955769]]
print(cbk.targets)
[array([[ 0.7532559 , 0.64857274, 0.19266529, 0.76428652, 0.38901153],
[ 0.77567691, 0.13573623, 0.49025011, 0.42897558, 0.55825651],
[ 0.33760938, 0.68195039, 0.12303089, 0.83509439, 0.20991668],
[ 0.9836778 , 0.61325067, 0.28973401, 0.28734073, 0.93399793],
[ 0.26097575, 0.88219053, 0.8795194 , 0.64887846, 0.41996446],
[ 0.97794604, 0.91307569, 0.93816429, 0.2125808 , 0.94381493],
[ 0.74813437, 0.08036689, 0.38094273, 0.83178365, 0.16713737],
[ 0.5260942 , 0.39218962, 0.21022047, 0.58569127, 0.08012982]], dtype=float32),
array([[ 0.61276627, 0.20679495, 0.24124858, 0.01262245, 0.0994412 ],
[ 0.60261369, 0.25620511, 0.73981643, 0.52558184, 0.09955769]], dtype=float32)]
As you can see, there are two batches in cbk.targets (one "full batch" of size 8 and the final batch of size 2), and the row order is the same as Y[index_array].
Long edit (almost a new answer) for the following reasons:
Yu-Yang's 2017 answer relies on the private _make_train_function and _function_kwargs APIs, which work only in TF1 (and maybe in TF1 compatibility, so-called non-eager mode).
Similarly, Binyan Hu's 2020 answer relies on _make_test_function and does not work in TF2 by default (requiring non-eager mode as well).
My own Jan 2020 answer, which was already subject to several required configuration settings, seems to have stopped working with (or before) TF 2.5, and I was not able to make model.inputs or model.outputs work any longer.
Finally, the earlier version of this answer requires potentially expensive model evaluation to obtain the predictions for each batch. A similar solution to obtain activation histograms even led to OOM issues with repeated training of different models.
So I set out find a way to obtain all possible quantities (inputs, targets, predictions, activations), batch-wise, without using any private APIs. The aim was to be able to call .numpy() on the intended quantities, so Keras callbacks can run ordinary Python code to ease debugging (I suppose that is what this question is mainly about - for maximum performance, one would probably try to integrate as many computations as possible into TensorFlow's graph operations anyway).
This is the common base model for all solutions:
"""Demonstrate batch data access."""
import tensorflow as tf
from tensorflow import keras
class DataCallback(keras.callbacks.Callback):
"""This class is where all implementations differ."""
def tf_nan(dtype):
"""Create NaN variable of proper dtype and variable shape for assign()."""
return tf.Variable(float("nan"), dtype=dtype, shape=tf.TensorShape(None))
def main():
"""Run main."""
model = keras.Sequential([keras.layers.Dense(1, input_shape=(2,))])
callback = DataCallback()
model.compile(loss="mse", optimizer="adam")
model.fit(
x=tf.transpose(tf.range(7.0) + [[0.2], [0.4]]),
y=tf.transpose(tf.range(7.0) + 10 + [[0.5]]),
validation_data=(
tf.transpose(tf.range(11.0) + 30 + [[0.6], [0.7]]),
tf.transpose(tf.range(11.0) + 40 + [[0.9]]),
),
shuffle=False,
batch_size=3,
epochs=2,
verbose=0,
callbacks=[callback],
)
model.save("tmp.tf")
if __name__ == "__main__":
main()
The following three snippets show one possible solution each, each with their own pros and cons. The core trick is always the same: allocate a tf.Variable and use tf.Variable.assign to export the intended quantity, from some Keras code run in graph mode, into the callback. The methods differ slightly in callback initialization and (in one case) model compilation, and most importantly, in the quantities they can access, which is why I summarize them above each snippet.
Custom metric
Using a custom (fake) metric (similar to my Jan 2020 answer), while we cannot seem to access model.inputs nor model.outputs any more (and model.(_)targets does not even exist any longer), we can access y_true and y_pred, which represent the model targets and outputs:
[ ] Inputs/Samples (x)
[ ] Weights (w)
[+] Targets/Labels (y_true)
[+] Outputs/Predictions (y_pred)
[ ] All layers (or only final input/output layers)
"""Demonstrate batch data access using a custom metric."""
import tensorflow as tf
from tensorflow import keras
class DataCallback(keras.callbacks.Callback): # diff
"""Callback to operate on batch data from metric."""
def __init__(self):
"""Offer a metric to access batch data."""
super().__init__()
self.y_true = None
self.y_pred = None
def set_model(self, model):
"""Initialize variables when model is set."""
self.y_true = tf_nan(model.output.dtype)
self.y_pred = tf_nan(model.output.dtype)
def metric(self, y_true, y_pred):
"""Fake metric."""
self.y_true.assign(y_true)
self.y_pred.assign(y_pred)
return 0
def on_train_batch_end(self, _batch, _logs=None):
"""See keras.callbacks.Callback.on_train_batch_end."""
print("y_true =", self.y_true.numpy())
print("y_pred =", self.y_pred.numpy())
def on_train_end(self, _logs=None):
"""Clean up."""
del self.y_true, self.y_pred
def tf_nan(dtype):
"""Create NaN variable of proper dtype and variable shape for assign()."""
return tf.Variable(float("nan"), dtype=dtype, shape=tf.TensorShape(None))
def main():
"""Run main."""
model = keras.Sequential([keras.layers.Dense(1, input_shape=(2,))])
callback = DataCallback()
model.compile(loss="mse", optimizer="adam", metrics=[callback.metric]) # diff
model.fit(
x=tf.transpose(tf.range(7.0) + [[0.2], [0.4]]),
y=tf.transpose(tf.range(7.0) + 10 + [[0.5]]),
validation_data=(
tf.transpose(tf.range(11.0) + 30 + [[0.6], [0.7]]),
tf.transpose(tf.range(11.0) + 40 + [[0.9]]),
),
shuffle=False,
batch_size=3,
epochs=2,
verbose=0,
callbacks=[callback],
)
model.save("tmp.tf")
if __name__ == "__main__":
main()
Custom training step
A custom training step is what I used in an earlier version of this answer. The idea still works in principle, but y_pred can be expensive and it might make sense to use a custom metric (see above) if that is required.
[+] Inputs/Samples (x)
[+] Weights (w)
[+] Targets/Labels (y_true)
[~] Outputs/Predictions (y_pred) [expensive!]
[ ] All layers (or only final input/output layers)
"""Demonstrate batch data access using a custom training step."""
import tensorflow as tf
from tensorflow import keras
class DataCallback(keras.callbacks.Callback): # diff
"""Callback to operate on batch data from training step."""
def __init__(self):
"""Initialize tf.Variables."""
super().__init__()
self.x = None
self.w = None
self.y_true = None
self.y_pred = None
def set_model(self, model):
"""Wrap the model.train_step function to access training batch data."""
self.x = tf_nan(model.input.dtype)
# pylint:disable=protected-access (replace by proper dtype if you know it)
if model.compiled_loss._user_loss_weights is not None:
self.w = tf_nan(model.compiled_loss._user_loss_weights.dtype)
self.y_true = tf_nan(model.output.dtype)
self.y_pred = tf_nan(model.output.dtype)
model_train_step = model.train_step
def outer_train_step(data):
# https://github.com/keras-team/keras/blob/v2.7.0/keras/engine/training.py
x, y_true, w = keras.utils.unpack_x_y_sample_weight(data)
self.x.assign(x)
if w is not None:
self.w.assign(w)
self.y_true.assign(y_true)
result = model_train_step(data)
y_pred = model(x)
self.y_pred.assign(y_pred)
return result
model.train_step = outer_train_step
def on_train_batch_end(self, _batch, _logs=None):
"""See keras.callbacks.Callback.on_train_batch_end."""
print("x =", self.x.numpy())
if self.w is not None:
print("w =", self.w.numpy())
print("y_true =", self.y_true.numpy())
print("y_pred =", self.y_pred.numpy())
def on_train_end(self, _logs=None):
"""Clean up."""
del self.x, self.w, self.y_true, self.y_pred
def tf_nan(dtype):
"""Create NaN variable of proper dtype and variable shape for assign()."""
return tf.Variable(float("nan"), dtype=dtype, shape=tf.TensorShape(None))
def main():
"""Run main."""
model = keras.Sequential([keras.layers.Dense(1, input_shape=(2,))])
callback = DataCallback()
model.compile(loss="mse", optimizer="adam")
model.fit(
x=tf.transpose(tf.range(7.0) + [[0.2], [0.4]]),
y=tf.transpose(tf.range(7.0) + 10 + [[0.5]]),
validation_data=(
tf.transpose(tf.range(11.0) + 30 + [[0.6], [0.7]]),
tf.transpose(tf.range(11.0) + 40 + [[0.9]]),
),
shuffle=False,
batch_size=3,
epochs=2,
verbose=0,
callbacks=[callback],
)
model.save("tmp.tf")
if __name__ == "__main__":
main()
Custom layer call
A custom layer call is a super-flexible way of accessing each layer's inputs and outputs. The callback handles patching of the call functions for a list of layers. While we cannot access weights and targets (as these quantitities do not make sense at the level of individual layers), it allows us to access individual layer activations, which can be handy for questions such as How does one log activations using `tf.keras.callbacks.TensorBoard`?.
[+] Inputs/Samples (x)
[ ] Weights (w)
[ ] Targets/Labels (y_true)
[+] Outputs/Predictions (y_pred)
[+] All layers (or only final input/output layers)
"""Demonstrate batch data access using custom layer calls."""
import tensorflow as tf
from tensorflow import keras
class DataCallback(keras.callbacks.Callback): # diff
"""Callback to operate on batch data from selected (to be wrapped) layers."""
def __init__(self, layers):
"""Wrap the calls of an iterable of model layers to access layer batch data."""
super().__init__()
self.data = {}
self.inner_calls = {}
self.outer_calls = {}
for layer in layers:
self.data[layer] = {
"inputs": tf_nan(layer.input.dtype),
"outputs": tf_nan(layer.output.dtype),
}
self.inner_calls[layer] = layer.call
def outer_call(inputs, layer=layer, layer_call=layer.call):
self.data[layer]["inputs"].assign(inputs)
outputs = layer_call(inputs)
self.data[layer]["outputs"].assign(outputs)
return outputs
self.outer_calls[layer] = outer_call
def on_train_batch_begin(self, _epoch, _logs=None):
"""Wrap layer calls during each batch."""
for layer, call in self.outer_calls.items():
layer.call = call
def on_train_batch_end(self, _epoch, _logs=None):
"""Restore original layer calls for ModelCheckpoint, model.save, ..."""
for layer, call in self.inner_calls.items():
layer.call = call
for layer, data in self.data.items():
print("Layer =", layer)
print("Inputs =", data["inputs"].numpy())
print("Outputs =", data["outputs"].numpy())
def tf_nan(dtype):
"""Create NaN variable of proper dtype and variable shape for assign()."""
return tf.Variable(float("nan"), dtype=dtype, shape=tf.TensorShape(None))
def main():
"""Run main."""
model = keras.Sequential([keras.layers.Dense(1, input_shape=(2,))])
callback = DataCallback(model.layers) # diff
model.compile(loss="mse", optimizer="adam")
model.fit(
x=tf.transpose(tf.range(7.0) + [[0.2], [0.4]]),
y=tf.transpose(tf.range(7.0) + 10 + [[0.5]]),
validation_data=(
tf.transpose(tf.range(11.0) + 30 + [[0.6], [0.7]]),
tf.transpose(tf.range(11.0) + 40 + [[0.9]]),
),
shuffle=False,
batch_size=3,
epochs=2,
verbose=0,
callbacks=[callback],
)
model.save("tmp.tf")
if __name__ == "__main__":
main()
When to use which and open to-dos
I think the snippets above each solution nicely summarize what each approach is capable of. Generally,
a custom training step will be ideal to access the model input, such as batched dataset generators, effects of shuffling, etc;
a custom layer call is ideal to access the in-betweens of the model; and
a custom metric is ideal to access the outputs of the model.
I am fairly certain (but have not tried) that one can combine all approaches to be able to access all batch quantities simultaneously. I have not tested anything but training mode - each method can have further pros and cons relating to their usefulness in testing or prediction mode. Finally, I assume, but have not tested either, that their should be only minor differences between tf.keras and keras. Having tested this code on TF2.8.rc1 and Keras 2.8.0, which has moved the tf.keras code back into the keras pip package, and not using any private APIs, I believe this assumption is justified.
It would be great if this approach could be extended to access model.inputs and model.outputs again. Currently, I am getting errors such as this one:
TypeError: You are passing KerasTensor(...), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as tf.cond, tf.function, gradient tapes, or tf.map_fn. Keras Functional model construction only supports TF API calls that do support dispatching, such as tf.math.add or tf.reshape. Other APIs cannot be called directly on symbolic Kerasinputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer call and calling that layer on this symbolic input/output.
Previous answer
From TF 2.2 on, you can use custom training steps rather than callbacks to achieve what you want. Here's a demo that works with tensorflow==2.2.0rc1, using inheritance to improve the keras.Sequential model. Performance-wise, this is not ideal as predictions are made twice, once in self(x, training=True) and once in super().train_step(data). But you get the idea.
This works in eager mode and does not use private APIs, so it should be pretty stable. One caveat is that you have to use tf.keras (standalone keras does not support Model.train_step), but I feel standalone keras is becoming more and more deprecated anyway. (In fact, tf.keras migrates to keras in TF2.8.)
"""Demonstrate access to Keras batch tensors in a tf.keras custom training step."""
import numpy as np
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.python.keras.engine import data_adapter
in_shape = (2,)
out_shape = (1,)
batch_size = 3
n_samples = 7
class SequentialWithPrint(keras.Sequential):
def train_step(self, original_data):
# Basically copied one-to-one from https://git.io/JvDTv
data = data_adapter.expand_1d(original_data)
x, y_true, w = data_adapter.unpack_x_y_sample_weight(data)
y_pred = self(x, training=True)
# this is pretty much like on_train_batch_begin
K.print_tensor(w, "Sample weight (w) =")
K.print_tensor(x, "Batch input (x) =")
K.print_tensor(y_true, "Batch output (y_true) =")
K.print_tensor(y_pred, "Prediction (y_pred) =")
result = super().train_step(original_data)
# add anything here for on_train_batch_end-like behavior
return result
# Model
model = SequentialWithPrint([keras.layers.Dense(out_shape[0], input_shape=in_shape)])
model.compile(loss="mse", optimizer="adam")
# Example data
X = np.random.rand(n_samples, *in_shape)
Y = np.random.rand(n_samples, *out_shape)
model.fit(X, Y, batch_size=batch_size)
print("X: ", X)
print("Y: ", Y)
Finally, here is a simpler example without inheritance:
"""Demonstrate access to Keras batch tensors in a tf.keras custom training step."""
import tensorflow as tf
IN_SHAPE = (2,)
OUT_SHAPE = (1,)
BATCH_SIZE = 3
N_SAMPLES = 7
def make_print_data_and_train_step(keras_model):
"""Return a train_step function that prints data batches."""
original_train_step = keras_model.train_step
def print_data_and_train_step(data):
# Adapted from https://git.io/JvDTv, skipping data_adapter.expand_1d
x, y_true, w = tf.keras.utils.unpack_x_y_sample_weight(data)
y_pred = keras_model(x, training=True)
# this is pretty much like on_train_batch_begin
tf.keras.backend.print_tensor(w, "Sample weight (w) =")
tf.keras.backend.print_tensor(x, "Batch input (x) =")
tf.keras.backend.print_tensor(y_true, "Batch output (y_true) =")
tf.keras.backend.print_tensor(y_pred, "Prediction (y_pred) =")
result = original_train_step(data)
# add anything here for on_train_batch_end-like behavior
return result
return print_data_and_train_step
# Model
model = tf.keras.Sequential([tf.keras.layers.Dense(OUT_SHAPE[0], input_shape=IN_SHAPE)])
model.train_step = make_print_data_and_train_step(model)
model.compile(loss="mse", optimizer="adam")
# Example data
X = tf.random.normal((N_SAMPLES, *IN_SHAPE))
Y = tf.random.normal((N_SAMPLES, *OUT_SHAPE))
model.fit(X, Y, batch_size=BATCH_SIZE)
print("X: ", X)
print("Y: ", Y)
Update: This approach has stopped working. See my other answer a number of solutions compatible with TF2.8 (and hopefully beyond).
One problem with #Yu-Yang's solution is that it relies on model._function_kwargs, which is not guaranteed to work as it is not part of the API. In particular, in TF2 with eager execution, session kwargs seem to be either not accepted at all or run preemptively due to eager mode.
Therefore, here is my solution tested on tensorflow==2.1.0. The trick is to replace fetches by a Keras metric, in which the assignment operations from fetches are made during training.
This even enables a Keras-only solution if the batch size divides the number of samples; otherwise, another trick has to be applied when initializing TensorFlow variables with a None shape, similar to validate_shape=False in earlier solutions (compare https://github.com/tensorflow/tensorflow/issues/35667).
Importantly, tf.keras behaves differently from keras (sometimes just ignoring assignments, or seeing variables as Keras symbolic tensors), so this updated solution takes care of both implementations (Keras==2.3.1 and tensorflow==2.1.0).
"""Demonstrate access to Keras symbolic tensors in a (tf.)keras.Callback."""
import numpy as np
import tensorflow as tf
use_tf_keras = True
if use_tf_keras:
from tensorflow import keras
from tensorflow.keras import backend as K
tf.config.experimental_run_functions_eagerly(False)
compile_kwargs = {"run_eagerly": False, "experimental_run_tf_function": False}
else:
import keras
from keras import backend as K
compile_kwargs = {}
in_shape = (2,)
out_shape = (1,)
batch_size = 3
n_samples = 7
class CollectKerasSymbolicTensorsCallback(keras.callbacks.Callback):
"""Collect Keras symbolic tensors."""
def __init__(self):
"""Initialize intermediate variables for batches and lists."""
super().__init__()
# Collect batches here
self.inputs = []
self.targets = []
self.outputs = []
# # For a pure Keras solution, we need to know the shapes beforehand;
# # in particular, batch_size must divide n_samples:
# self.input = K.variable(np.empty((batch_size, *in_shape)))
# self.target = K.variable(np.empty((batch_size, *out_shape)))
# self.output = K.variable(np.empty((batch_size, *out_shape)))
# If the shape of these variables will change (e.g., last batch), initialize
# arbitrarily and specify `shape=tf.TensorShape(None)`:
self.input = tf.Variable(0.0, shape=tf.TensorShape(None))
self.target = tf.Variable(0.0, shape=tf.TensorShape(None))
self.output = tf.Variable(0.0, shape=tf.TensorShape(None))
def on_batch_end(self, batch, logs=None):
"""Evaluate the variables and save them into lists."""
self.inputs.append(K.eval(self.input))
self.targets.append(K.eval(self.target))
self.outputs.append(K.eval(self.output))
def on_train_end(self, logs=None):
"""Print all variables."""
print("Inputs: ", *self.inputs)
print("Targets: ", *self.targets)
print("Outputs: ", *self.outputs)
#tf.function
def assign_keras_symbolic_tensors_metric(_foo, _bar):
"""
Return the assignment operations as a metric to have them evaluated by Keras.
This replaces `fetches` from the TF1/non-eager-execution solution.
"""
# Collect assignments as list of (dest, src)
assignments = (
(callback.input, model.inputs[0]),
(callback.target, model._targets[0] if use_tf_keras else model.targets[0]),
(callback.output, model.outputs[0]),
)
for (dest, src) in assignments:
dest.assign(src)
return 0
callback = CollectKerasSymbolicTensorsCallback()
metrics = [assign_keras_symbolic_tensors_metric]
# Example model
model = keras.Sequential([keras.layers.Dense(out_shape[0], input_shape=in_shape)])
model.compile(loss="mse", optimizer="adam", metrics=metrics, **compile_kwargs)
# Example data
X = np.random.rand(n_samples, *in_shape)
Y = np.random.rand(n_samples, *out_shape)
model.fit(X, Y, batch_size=batch_size, callbacks=[callback])
print("X: ", X)
print("Y: ", Y)
Inspired by the way tf.keras.callbacks.TesnsorBoard saves v1 (graph) summaries.
No variable assignments and no redundant metrics.
For use with tensorflow>=2.0.0, graph (disable eager) mode during evaluating.
Extensive operations on the numpy predictions can be implemented by overriding SavePrediction._pred_callback.
import numpy as np
import tensorflow as tf
from tensorflow import keras
tf.compat.v1.disable_eager_execution()
in_shape = (2,)
out_shape = (1,)
batch_size = 2
n_samples = 32
class SavePrediction(keras.callbacks.Callback):
def __init__(self):
super().__init__()
self._get_pred = None
self.preds = []
def _pred_callback(self, preds):
self.preds.append(preds)
def set_model(self, model):
super().set_model(model)
if self._get_pred is None:
self._get_pred = self.model.outputs[0]
def on_test_begin(self, logs):
# pylint: disable=protected-access
self.model._make_test_function()
# pylint: enable=protected-access
if self._get_pred not in self.model.test_function.fetches:
self.model.test_function.fetches.append(self._get_pred)
self.model.test_function.fetch_callbacks[self._get_pred] = self._pred_callback
def on_test_end(self, logs):
if self._get_pred in self.model.test_function.fetches:
self.model.test_function.fetches.remove(self._get_pred)
if self._get_pred in self.model.test_function.fetch_callbacks:
self.model.test_function.fetch_callbacks.pop(self._get_pred)
print(self.preds)
model = keras.Sequential([
keras.layers.Dense(out_shape[0], input_shape=in_shape)
])
model.compile(loss="mse", optimizer="adam")
X = np.random.rand(n_samples, *in_shape)
Y = np.random.rand(n_samples, *out_shape)
model.evaluate(X, Y,
batch_size=batch_size,
callbacks=[SavePrediction()])
I know about the "Serving a Tensorflow Model" page
https://www.tensorflow.org/serving/serving_basic
but those functions assume you're using tf.Session() which the DNNClassifier tutorial does not... I then looked at the api doc for DNNClassifier and it has an export_savedmodel function (the export function is deprecated) and it seems simple enough but I am getting a "'NoneType' object is not iterable" error... which is suppose to mean I'm passing in an empty variable but I'm unsure what I need to change... I've essentially copied and pasted the code from the get_started/tflearn page on tensorflow.org but then added
directoryName = "temp"
def serving_input_fn():
print("asdf")
classifier.export_savedmodel(
directoryName,
serving_input_fn
)
just after the classifier.fit function call... the other parameters for export_savedmodel are optional I believe... any ideas?
Tutorial with Code:
https://www.tensorflow.org/get_started/tflearn#construct_a_deep_neural_network_classifier
API Doc for export_savedmodel
https://www.tensorflow.org/api_docs/python/tf/contrib/learn/DNNClassifier#export_savedmodel
There are two kind of TensorFlow applications:
The functions that assume you are using tf.Session() are functions from "low level" Tensorflow examples, and
the DNNClassifier tutorial is a "high level" Tensorflow application.
I'm going to explain how to export "high level" Tensorflow models (using export_savedmodel).
The function export_savedmodel requires the argument serving_input_receiver_fn, that is a function without arguments, which defines the input from the model and the predictor. Therefore, you must create your own serving_input_receiver_fn, where the model input type match with the model input in the training script, and the predictor input type match with the predictor input in the testing script.
On the other hand, if you create a custom model, you must define the export_outputs, defined by the function tf.estimator.export.PredictOutput, which input is a dictionary that define the name that has to match with the name of the predictor output in the testing script.
For example:
TRAINING SCRIPT
def serving_input_receiver_fn():
serialized_tf_example = tf.placeholder(dtype=tf.string, shape=[None], name='input_tensors')
receiver_tensors = {"predictor_inputs": serialized_tf_example}
feature_spec = {"words": tf.FixedLenFeature([25],tf.int64)}
features = tf.parse_example(serialized_tf_example, feature_spec)
return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
def estimator_spec_for_softmax_classification(logits, labels, mode):
predicted_classes = tf.argmax(logits, 1)
if (mode == tf.estimator.ModeKeys.PREDICT):
export_outputs = {'predict_output': tf.estimator.export.PredictOutput({"pred_output_classes": predicted_classes, 'probabilities': tf.nn.softmax(logits)})}
return tf.estimator.EstimatorSpec(mode=mode, predictions={'class': predicted_classes, 'prob': tf.nn.softmax(logits)}, export_outputs=export_outputs) # IMPORTANT!!!
onehot_labels = tf.one_hot(labels, 31, 1, 0)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
if (mode == tf.estimator.ModeKeys.TRAIN):
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
eval_metric_ops = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=predicted_classes)}
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
def model_custom(features, labels, mode):
bow_column = tf.feature_column.categorical_column_with_identity("words", num_buckets=1000)
bow_embedding_column = tf.feature_column.embedding_column(bow_column, dimension=50)
bow = tf.feature_column.input_layer(features, feature_columns=[bow_embedding_column])
logits = tf.layers.dense(bow, 31, activation=None)
return estimator_spec_for_softmax_classification(logits=logits, labels=labels, mode=mode)
def main():
# ...
# preprocess-> features_train_set and labels_train_set
# ...
classifier = tf.estimator.Estimator(model_fn = model_custom)
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={"words": features_train_set}, y=labels_train_set, batch_size=batch_size_param, num_epochs=None, shuffle=True)
classifier.train(input_fn=train_input_fn, steps=100)
full_model_dir = classifier.export_savedmodel(export_dir_base="C:/models/directory_base", serving_input_receiver_fn=serving_input_receiver_fn)
TESTING SCRIPT
def main():
# ...
# preprocess-> features_test_set
# ...
with tf.Session() as sess:
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], full_model_dir)
predictor = tf.contrib.predictor.from_saved_model(full_model_dir)
model_input = tf.train.Example(features=tf.train.Features( feature={"words": tf.train.Feature(int64_list=tf.train.Int64List(value=features_test_set)) }))
model_input = model_input.SerializeToString()
output_dict = predictor({"predictor_inputs":[model_input]})
y_predicted = output_dict["pred_output_classes"][0]
(Code tested in Python 3.6.3, Tensorflow 1.4.0)
If you try to use predictor with tensorflow > 1.6 you can get this Error :
signature_def_key "serving_default". Available signatures are ['predict']. Original error:
No SignatureDef with key 'serving_default' found in MetaGraphDef.
Here is working example which is tested on 1.7.0 :
SAVING :
First you need to define features length in dict format like this:
feature_spec = {'x': tf.FixedLenFeature([4],tf.float32)}
Then you have to build a function which have placeholder with same shape of features and return using tf.estimator.export.ServingInputReceiver
def serving_input_receiver_fn():
serialized_tf_example = tf.placeholder(dtype=tf.string,
shape=[None],
name='input_tensors')
receiver_tensors = {'inputs': serialized_tf_example}
features = tf.parse_example(serialized_tf_example, feature_spec)
return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
Then just save with export_savedmodel :
classifier.export_savedmodel(dir_path, serving_input_receiver_fn)
full example code:
import os
from six.moves.urllib.request import urlopen
import numpy as np
import tensorflow as tf
dir_path = os.path.dirname('.')
IRIS_TRAINING = os.path.join(dir_path, "iris_training.csv")
IRIS_TEST = os.path.join(dir_path, "iris_test.csv")
feature_spec = {'x': tf.FixedLenFeature([4],tf.float32)}
def serving_input_receiver_fn():
serialized_tf_example = tf.placeholder(dtype=tf.string,
shape=[None],
name='input_tensors')
receiver_tensors = {'inputs': serialized_tf_example}
features = tf.parse_example(serialized_tf_example, feature_spec)
return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
def main():
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
filename=IRIS_TRAINING,
target_dtype=np.int,
features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
filename=IRIS_TEST,
target_dtype=np.int,
features_dtype=np.float32)
feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
hidden_units=[10, 20, 10],
n_classes=3,
model_dir=dir_path)
# Define the training inputs
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={"x": np.array(training_set.data)},
y=np.array(training_set.target),
num_epochs=None,
shuffle=True)
# Train model.
classifier.train(input_fn=train_input_fn, steps=200)
classifier.export_savedmodel(dir_path, serving_input_receiver_fn)
if __name__ == "__main__":
main()
Restoring
Now let's restore the model :
import tensorflow as tf
import os
dir_path = os.path.dirname('.') #current directory
exported_path= os.path.join(dir_path, "1536315752")
def main():
with tf.Session() as sess:
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], exported_path)
model_input= tf.train.Example(features=tf.train.Features(feature={
'x': tf.train.Feature(float_list=tf.train.FloatList(value=[6.4, 3.2, 4.5, 1.5]))
}))
predictor= tf.contrib.predictor.from_saved_model(exported_path)
input_tensor=tf.get_default_graph().get_tensor_by_name("input_tensors:0")
model_input=model_input.SerializeToString()
output_dict= predictor({"inputs":[model_input]})
print(" prediction is " , output_dict['scores'])
if __name__ == "__main__":
main()
Here is Ipython notebook demo example with data and explanation :
There are two possible questions and answers possible. First you encounter a missing session for the DNNClassifier which uses the more higher level estimators API (as opposed to the more low level API's where you manipulate the ops yourself). The nice thing about tensorflow is that all high and low level APIs are more-or-less interoperable, so if you want a session and do something with that session, it is as simple as adding:
sess = tf.get_default_session()
The you can start hooking in the remainder of the tutorial.
The second interpretation of your question is, what about the export_savedmodel, well actually export_savedmodel and the sample code from the serving tutorial try to achieve the same goal. When you are training your graph you set up some infrastructure to feed input to the graph (typically batches from a training dataset) however when you switch to 'serving' you will often read your input from somewhere else, and you need some separate infrastructure which replaces the input of the graph used for training. The bottomline is that the serving_input_fn() which you filled with a print should in essence return an input op. This is also said in the documentation:
serving_input_fn: A function that takes no argument and returns an
InputFnOps.
Hence instead of print("asdf") it should do something similar as adding an input chain (which should be similar to what builder.add_meta_graph_and_variables is also adding).
Examples of serving_input_fn()'s can for example be found (in the cloudml sample)[https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/customestimator/trainer/model.py#L240]. Such as the following which serves input from JSON:
def json_serving_input_fn():
"""Build the serving inputs."""
inputs = {}
for feat in INPUT_COLUMNS:
inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)
return tf.estimator.export.ServingInputReceiver(inputs, inputs)