Tensorflow 2.1.0 - An op outside of the function building code is being passed a "Graph" tensor

I am trying to implement a recent paper. Part of this implementation involves moving from tf 1.14 to tf 2.1.0. The code was working with tf 1.14 but no longer works after the move.
NOTE: If I disable eager execution via tf.compat.v1.disable_eager_execution(), then the code works as expected.
Is this the solution? I've built plenty of models in TF 2.x before and never had to disable eager execution to achieve normal functionality.
I have distilled the problem to a very short gist that shows what's happening.
Links & Code First Followed By Detailed Error Message
Link to Gist -- https://gist.github.com/darien-schettler/fd5b25626e9eb5b1330cce670bf9cc17
Code
# version 2.1.0
import tensorflow as tf
# version 1.18.1
import numpy as np
# ######## DEFINE CUSTOM FUNCTION FOR TF LAMBDA LAYER ######## #
def resize_like(input_tensor, ref_tensor):
    """ Resize an image tensor to the same size/shape as a reference image tensor

    Args:
        input_tensor : (image tensor) Input image tensor that will be resized
        ref_tensor : (image tensor) Reference image tensor that we want to resize the input tensor to.

    Returns:
        reshaped tensor
    """
    reshaped_tensor = tf.image.resize(images=input_tensor,
                                      size=tf.shape(ref_tensor)[1:3],
                                      method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
                                      preserve_aspect_ratio=False,
                                      antialias=False,
                                      name=None)
    return reshaped_tensor
# ############################################################# #
# ############ DEFINE MODEL USING TF.KERAS FN API ############ #
# INPUTS
model_input_1 = tf.keras.layers.Input(shape=(160,160,3))
model_input_2 = tf.keras.layers.Input(shape=(160,160,3))
# OUTPUTS
model_output_1 = tf.keras.layers.Conv2D(filters=64,
                                        kernel_size=(1, 1),
                                        use_bias=False,
                                        kernel_initializer='he_normal',
                                        name='conv_name_base')(model_input_1)
model_output_2 = tf.keras.layers.Lambda(function=resize_like,
                                        arguments={'ref_tensor': model_output_1})(model_input_2)
# MODEL
model = tf.keras.models.Model(inputs=[model_input_1, model_input_2],
                              outputs=model_output_2,
                              name="test_model")
# ############################################################# #
# ######### TRY TO UTILIZE PREDICT WITH DUMMY INPUT ########## #
dummy_input = [np.ones((1,160,160,3)), np.zeros((1,160,160,3))]
model.predict(x=dummy_input) # >>>>ERROR OCCURS HERE<<<<
# ############################################################# #
Full Error
>>> model.predict(x=dummy_input) # >>>>ERROR OCCURS HERE<<<<
Traceback (most recent call last):
File "/Users/<username>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 61, in quick_execute
num_outputs)
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
@tf.function
def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
        added = my_constant * 2
The graph tensor has name: conv_name_base_1/Identity:0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1013, in predict
use_multiprocessing=use_multiprocessing)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 498, in predict
workers=workers, use_multiprocessing=use_multiprocessing, **kwargs)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 475, in _model_iteration
total_epochs=1)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
batch_outs = execution_function(iterator)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
distributed_function(input_fn))
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
result = self._call(*args, **kwds)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 638, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
self.captured_inputs)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
ctx=ctx)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 75, in quick_execute
"tensors, but found {}".format(keras_symbolic_tensors))
tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'conv_name_base_1/Identity:0' shape=(None, 160, 160, 64) dtype=float32>]
One potential solution I thought of would be to replace the Lambda layer with a custom layer... this seems to fix the issue as well. Not sure what the best practices are surrounding this though. Code below.
# version 2.1.0
import tensorflow as tf
# version 1.18.1
import numpy as np
# ######## DEFINE CUSTOM LAYER DIRECTLY BY SUBCLASSING ######## #
class ResizeLike(tf.keras.layers.Layer):
    """ tf.keras layer to resize a tensor to the reference tensor shape.

    Attributes:
        keras.layers.Layer: Base layer class.
            This is the class from which all layers inherit.
            - A layer is a class implementing common neural networks
              operations, such as convolution, batch norm, etc.
            - These operations require managing weights,
              losses, updates, and inter-layer connectivity.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, inputs, **kwargs):
        """TODO: docstring

        Args:
            inputs (TODO): TODO
            **kwargs: TODO

        Returns:
            TODO
        """
        input_tensor, ref_tensor = inputs
        return self.resize_like(input_tensor, ref_tensor)

    def resize_like(self, input_tensor, ref_tensor):
        """ Resize an image tensor to the same size/shape as a reference image tensor

        Args:
            input_tensor: (image tensor) Input image tensor that will be resized
            ref_tensor: (image tensor) Reference image tensor that we want to resize the input tensor to.

        Returns:
            reshaped tensor
        """
        reshaped_tensor = tf.image.resize(images=input_tensor,
                                          size=tf.shape(ref_tensor)[1:3],
                                          method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
                                          preserve_aspect_ratio=False,
                                          antialias=False)
        return reshaped_tensor
# ############################################################# #
# ############ DEFINE MODEL USING TF.KERAS FN API ############ #
# INPUTS
model_input_1 = tf.keras.layers.Input(shape=(160,160,3))
model_input_2 = tf.keras.layers.Input(shape=(160,160,3))
# OUTPUTS
model_output_1 = tf.keras.layers.Conv2D(filters=64,
                                        kernel_size=(1, 1),
                                        use_bias=False,
                                        kernel_initializer='he_normal',
                                        name='conv_name_base')(model_input_1)
model_output_2 = ResizeLike(name="resize_layer")([model_input_2, model_output_1])
# MODEL
model = tf.keras.models.Model(inputs=[model_input_1, model_input_2],
                              outputs=model_output_2,
                              name="test_model")
# ############################################################# #
# ######### TRY TO UTILIZE PREDICT WITH DUMMY INPUT ########## #
dummy_input = [np.ones((1,160,160,3)), np.zeros((1,160,160,3))]
model.predict(x=dummy_input)  # No error -- this approach works
# ############################################################# #
Thoughts??
Thanks in advance!!
Let me know if you would like me to provide anything else.

You can try the following steps:
Change resize_like as follows:
def resize_like(inputs):
    input_tensor, ref_tensor = inputs
    reshaped_tensor = tf.image.resize(images=input_tensor,
                                      size=tf.shape(ref_tensor)[1:3],
                                      method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
                                      preserve_aspect_ratio=False,
                                      antialias=False,
                                      name=None)
    return reshaped_tensor
Then, in the Lambda layer:
model_output_2 = tf.keras.layers.Lambda(function=resize_like)([model_input_2, model_output_1])

Related

Use `sentence-transformers` inside of a Tensorflow-recommendation keras model in SageMaker

I've been going crazy for a few days over a problem that I thought trivial. My end-goal is to deploy to AWS Sagemaker a Tensorflow model that uses a simple string as input, calculates the embedding using a 'sentence-transformer' pre-trained model and eventually uses TensorFlow Recommenders to suggest the knn among a collection of embedding I already have calculated. I would like to do this entirely from the model, including the preprocessing (tokenization).
I made the predictions work with different approaches in my notebook. I started having trouble when I tried to save my model.
The problem seems to be that HF's AutoTokenizer needs a pure list of strings as input, and I hit a roadblock whenever I try to save my model using tf.saved_model.save, and trying to get around this with tf.py_function using this approach results in problems with SageMaker.
My approaches so far:
1. THE 'I THOUGHT IT WAS SO SIMPLE'
def text_to_startup_model(
        startups_ids: list, startup_vectors
):
    import tensorflow as tf
    import tensorflow_recommenders as tfrs
    import numpy as np
    from random import randint
    from sentence_transformers import SentenceTransformer  # needed for the encode call below

    exported_model = tfrs.layers.factorized_top_k.BruteForce(SentenceTransformer("all-mpnet-base-v2").encode)
    exported_model.index(np.array(startup_vectors), np.array(startups_ids))

    # TESTS the model
    # for some reason this seems to be needed in order to save the model :/
    # https://github.com/tensorflow/recommenders/issues/131
    test = exported_model(['Test Text Query'])
    print(test)

    return exported_model

text_to_startup_model(search_db_ids, search_db_embeddings)
# --> WORKS PERFECTLY, AS I GET SOME SUGGESTIONS

tf.saved_model.save(text_to_startup_model(search_db_ids, search_db_embeddings), export_dir="/home/nicholas/test_model_save/1")
#TypeError Traceback (most recent call last)
# /home/nicholas/Documents/Dev/Rialto-predict-1/notebooks/t2s_different_approaches.ipynb Cell 5 in <cell line: 22>()
# 19 text_to_startup_model(search_db_ids, search_db_embeddings)
# 20 #--> WORKS PERFECTLY, AS I GET SOME SUGGESTIONS
# ---> 22 tf.saved_model.save(text_to_startup_model(search_db_ids, search_db_embeddings), export_dir="/home/nicholas/test_model_save/1")
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/saved_model/save.py:1334, in save(obj, export_dir, signatures, options)
# 1332 # pylint: enable=line-too-long
# 1333 metrics.IncrementWriteApi(_SAVE_V2_LABEL)
# -> 1334 save_and_return_nodes(obj, export_dir, signatures, options)
# 1335 metrics.IncrementWrite(write_version="2")
#
# .........
#
#
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/eager/def_function.py:677, in Function._defun_with_scope.<locals>.wrapped_fn(*args, **kwds)
# 673 with default_graph._variable_creator_scope(scope, priority=50): # pylint: disable=protected-access
# 674 # __wrapped__ allows AutoGraph to swap in a converted function. We give
# 675 # the function a weak reference to itself to avoid a reference cycle.
# 676 with OptionalXlaContext(compile_with_xla):
# --> 677 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
# 678 return out
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/framework/func_graph.py:1147, in func_graph_from_py_func.<locals>.autograph_handler(*args, **kwargs)
# 1145 except Exception as e: # pylint:disable=broad-except
# 1146 if hasattr(e, "ag_error_metadata"):
# -> 1147 raise e.ag_error_metadata.to_exception(e)
# 1148 else:
# 1149 raise
# TypeError: in user code:
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/keras/saving/saving_utils.py", line 138, in _wrapped_model *
# outputs = model(*args, **kwargs)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
# raise e.with_traceback(filtered_tb) from None
# TypeError: Exception encountered when calling layer "brute_force_3" (type BruteForce).
# in user code:
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow_recommenders/layers/factorized_top_k.py", line 567, in call *
# queries = self.query_model(queries)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 160, in encode *
# features = self.tokenize(sentences_batch)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 318, in tokenize *
# return self._first_module().tokenize(texts)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 102, in tokenize *
# batch1.append(text_tuple[0])
# TypeError: 'NoneType' object is not subscriptable
# ...
# Call arguments received:
# • queries=['None']
# • k=None
2. THE tf.py_function
As I understand it, the problem with the first approach is that it has no knowledge of the input type/value. This second approach, from Use `sentence-transformers` inside of a keras model, was supposedly going to work, as it uses tf.py_function to accept a list of strings as the first input without complaining.
def approach_2(startups_ids: list, startup_vectors):
    import tensorflow as tf
    import tensorflow_recommenders as tfrs
    import numpy as np
    from transformers import MPNetTokenizer, TFMPNetModel

    # Here it loads the specific pre-trained model we are using for Rialto
    tokenizer = MPNetTokenizer.from_pretrained(
        "sentence-transformers/all-mpnet-base-v2"
    )
    model = TFMPNetModel.from_pretrained(
        "sentence-transformers/all-mpnet-base-v2", from_pt=True
    )

    class SBert(tf.keras.layers.Layer):
        def __init__(self, tokenizer, model):
            super(SBert, self).__init__()
            self.tokenizer = tokenizer
            self.model = model

        def tf_encode(self, inputs):
            def encode(inputs):
                inputs = [x[0].decode("utf-8") for x in inputs.numpy()]
                outputs = self.tokenizer(
                    inputs, padding=True, truncation=True, return_tensors="tf"
                )
                return outputs["input_ids"], outputs["attention_mask"]

            return tf.py_function(
                func=encode, inp=[inputs], Tout=[tf.int32, tf.int32]
            )

        def process(self, i, a):
            def __call(i, a):
                model_output = self.model(
                    {"input_ids": i.numpy(), "attention_mask": a.numpy()}
                )
                return model_output[0]

            return tf.py_function(func=__call, inp=[i, a], Tout=[tf.float32])

        def mean_pooling(self, model_output, attention_mask):
            token_embeddings = tf.squeeze(tf.stack(model_output), axis=0)
            input_mask_expanded = tf.cast(
                tf.broadcast_to(
                    tf.expand_dims(attention_mask, -1), tf.shape(token_embeddings)
                ),
                tf.float32,
            )
            a = tf.math.reduce_sum(token_embeddings * input_mask_expanded, axis=1)
            b = tf.clip_by_value(
                tf.math.reduce_sum(input_mask_expanded, axis=1),
                1e-9,
                tf.float32.max,
            )
            embeddings = a / b
            embeddings, _ = tf.linalg.normalize(embeddings, 2, axis=1)
            return embeddings

        def call(self, inputs):
            input_ids, attention_mask = self.tf_encode(inputs)
            model_output = self.process(input_ids, attention_mask)
            embeddings = self.mean_pooling(model_output, attention_mask)
            return embeddings

    # Uses the keras-ified model in a Keras model
    sbert = SBert(tokenizer, model)
    inputs = tf.keras.layers.Input((1,), dtype=tf.string)
    outputs = sbert(inputs)
    model = tf.keras.Model(inputs, outputs)

    # Implements the model we just built for top-KNN retrieval, from the pool of pre-calculated startup embeddings.
    exported_model = tfrs.layers.factorized_top_k.BruteForce(model)
    exported_model.index(np.array(startup_vectors), np.array(startups_ids))

    # TESTS the model
    # for some reason this seems to be needed in order to save the model :/
    # https://github.com/tensorflow/recommenders/issues/131
    print(exported_model(tf.constant(["'Test Text Query'"])))

    return exported_model

model_to_store_1 = approach_2(search_db_ids, search_db_embeddings)
tf.saved_model.save(model_to_store_1, export_dir="/home/nicholas/test_model_save/2")
# THIS ONE WORKS LIKE A CHARM, saving the model and everything. Deploy on sagemaker is successful.
# FAILS TO WORK ON SAGEMAKER. BELOW THE LOGS WHEN THE MODEL IS CALLED
# ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
# "error": "No OpKernel was registered to support Op 'EagerPyFunc' used by {{node StatefulPartitionedCall/brute_force/model/s_bert/EagerPyFunc}} with these attrs: [is_async=false, Tin=[DT_STRING], _output_shapes=[<unknown>, <unknown>], Tout=[DT_INT32, DT_INT32], token=\"pyfunc_4\"]\nRegistered devices: [CPU]\nRegistered kernels:\n <no registered kernels>\n\n\t [[StatefulPartitionedCall/brute_force/model/s_bert/EagerPyFunc]]\n\t [[StatefulPartitionedCall]]"
# }". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/rialto-t2s-model-endpoint in account 634470116418 for more information
As you can see from the log, the problem seems to be with eager mode and py_functions. I tried to Google it and found absolutely nothing on how to address this issue.
3. THE Classes approach
I've tried implementing something building upon this article, but I am running into similar issues as with the first approach: when I go to save the model, the expected input clashes with the requirements of the tokenizer.
EDIT 1 - here is a Colab showcasing the approach: https://colab.research.google.com/drive/1gibFdEoHTs0hzD5yiXzLT_-asmilUoAQ?usp=sharing#scrollTo=TibAssWm3D5e
All of this journey triggered some questions:
Question 1: Is this even a best practice? Should I serve my model the tokenized sentences as a tensor?
Question 2: How the hell do I make it work? :)

tensorflow - Invalid argument: Input size should match but they differ by 2

I am trying to train a DL model with tf.keras. I have 67 classes of images inside the image directory, like airport, bookstore, casino, and for each class I have at least 100 images. The data is from the MIT Indoor Scene dataset. But when I try to train the model, I constantly get this error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_7]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_1570]
Function call stack:
train_function -> train_function
I tried to resolve the problem by resizing the images with a resizing layer. I also included labels='inferred' and label_mode='categorical' in the image_dataset_from_directory method and loss='categorical_crossentropy' in the model compile method. Previously, labels and label_mode were not set and the loss was sparse_categorical_crossentropy, which I think is not right, so I changed them as described above. But I am still having problems.
There is one question related to this on Stack Overflow, but the person did not mention how he solved the problem, just updated it with: "My suggestion is to check the metadata of the dataset. It helped to fix my problem." He did not mention what metadata to look for or what he did to solve the problem.
The code that I am using to train the model -
import os
import PIL
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers import Flatten, Dropout, BatchNormalization, Rescaling
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.regularizers import l1, l2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# define directory paths
PROJECT_PATH = Path.cwd()
DATA_PATH = PROJECT_PATH.joinpath('data', 'Images')
# create a dataset
batch_size = 32
img_height = 180
img_width = 180
train = tf.keras.utils.image_dataset_from_directory(
    DATA_PATH,
    validation_split=0.2,
    subset="training",
    labels="inferred",
    label_mode="categorical",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
valid = tf.keras.utils.image_dataset_from_directory(
    DATA_PATH,
    validation_split=0.2,
    subset="validation",
    labels="inferred",
    label_mode="categorical",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
class_names = train.class_names
for image_batch, label_batch in train.take(1):
    print("\nImage shape:", image_batch.shape)
    print("Label Shape", label_batch.shape)
# resize image
resize_layer = tf.keras.layers.Resizing(img_height, img_width)
train = train.map(lambda x, y: (resize_layer(x), y))
valid = valid.map(lambda x, y: (resize_layer(x), y))
# standardize the data
normalization_layer = tf.keras.layers.Rescaling(1./255)
train = train.map(lambda x, y: (normalization_layer(x), y))
valid = valid.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(train))
first_image = image_batch[0]
print("\nImage (min, max) value:", (np.min(first_image), np.max(first_image)))
print()
# configure the dataset for performance
AUTOTUNE = tf.data.AUTOTUNE
train = train.cache().prefetch(buffer_size=AUTOTUNE)
valid = valid.cache().prefetch(buffer_size=AUTOTUNE)
# create a basic model architecture
num_classes = len(class_names)
# initiate a sequential model
model = Sequential()
# CONV1
model.add(Conv2D(filters=64, kernel_size=3, activation="relu",
                 input_shape=(img_height, img_width, 3)))
model.add(BatchNormalization())
# CONV2
model.add(Conv2D(filters=64, kernel_size=3,
                 activation="relu"))
model.add(BatchNormalization())
# Pool + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# CONV3
model.add(Conv2D(filters=128, kernel_size=3,
                 activation="relu"))
model.add(BatchNormalization())
# CONV4
model.add(Conv2D(filters=128, kernel_size=3,
                 activation="relu"))
model.add(BatchNormalization())
# POOL + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# FC5
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(num_classes, activation="softmax"))
# compile the model
model.compile(loss="categorical_crossentropy",
              optimizer="adam", metrics=['accuracy'])
# train the model
epochs = 25
early_stopping_cb = EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train, validation_data=valid, epochs=epochs,
                    callbacks=[early_stopping_cb], verbose=2)
result = pd.DataFrame(history.history)
print()
print(result.head())
Note -
I just modified the code to make it as simple as possible in order to narrow down the error. The model ran for a few batches, then again got the above error.
Epoch 1/10
732/781 [===========================>..] - ETA: 22s - loss: 3.7882
Traceback (most recent call last):
File ".\02_model1.py", line 139, in <module>
model.fit(train, epochs=10, validation_data=valid)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\keras\engine\training.py", line 1184, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 917, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 3039, in __call__
return graph_function._call_flat(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
outputs = execute.execute(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_2]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_11840]
Function call stack:
train_function -> train_function
Modified code -
# create a dataset
batch_size = 16
img_height = 256
img_width = 256
train = image_dataset_from_directory(
    DATA_PATH,
    validation_split=0.2,
    subset="training",
    labels="inferred",
    label_mode="categorical",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
valid = image_dataset_from_directory(
    DATA_PATH,
    validation_split=0.2,
    subset="validation",
    labels="inferred",
    label_mode="categorical",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
model = tf.keras.applications.Xception(
    weights=None, input_shape=(img_height, img_width, 3), classes=67)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(train, epochs=10, validation_data=valid)
I think it might be a corrupted file. It is throwing an exception after a data integrity check in the DecodeBMPv2 function (https://github.com/tensorflow/tensorflow/blob/0b6b491d21d6a4eb5fbab1cca565bc1e94ca9543/tensorflow/core/kernels/image/decode_image_op.cc#L594)
If that's the issue and you want to find out which file(s) are throwing the exception, you can try something like this below on the directory containing the files. Remove/replace any files you find and it should train normally.
import glob
import os
import tensorflow as tf

# assuming you point to the directory containing the label folders
img_paths = glob.glob(os.path.join(<path_to_dataset>, '*/*.*'))

bad_paths = []
for image_path in img_paths:
    try:
        img_bytes = tf.io.read_file(image_path)
        decoded_img = tf.io.decode_image(img_bytes)
    except tf.errors.InvalidArgumentError as e:
        print(f"Found bad path {image_path}...{e}")
        bad_paths.append(image_path)
    else:
        print(f"{image_path}: OK")

print("BAD PATHS:")
for bad_path in bad_paths:
    print(f"{bad_path}")
This is in fact a corrupted file problem. However, the underlying issue is far more subtle. Here is an explanation of what is going on and how to circumvent this obstacle. I encountered the very same problem on the very same MIT Indoor Scene Classification dataset. All the images are JPEG files (spoiler alert: well, are they?).
It has been correctly noted that the exception is raised exactly here, in a C++ file related to the tf.io.decode_image() function. The issue lies in the decode_image() function, which is called by tf.keras.utils.image_dataset_from_directory().
On the other hand, tf.keras.preprocessing.image.ImageDataGenerator().flow_from_directory() relies on Pillow under the hood (shown here, which is called from here). This is the reason why adopting the ImageDataGenerator class works.
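For comparison, here is a minimal sketch of that Pillow-backed loader (the directory path mirrors the question's layout and is an assumption; the arguments are the standard flow_from_directory ones):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Pillow decodes the images here, so the mislabeled BMP-as-JPEG files load fine.
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
train = datagen.flow_from_directory(
    'data/Images',              # assumed path: one subfolder per class
    target_size=(256, 256),
    batch_size=16,
    class_mode='categorical',
    subset='training',
    seed=123)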
After closer inspection of the corresponding C++ source file, one can observe that the function is actually called DecodeBmpV2(...), as defined here. This raises the question of why a JPEG image is being treated as a BMP one. The aforementioned function is actually called here, as part of a basic switch statement whose aim is to direct further data conversion according to the determined type. Thus, the piece of code that determines the file type deserves deeper analysis. The file type is determined from the value of the starting bytes (see here). Long story short, a simple comparison of so-called magic bytes that signify the file type is performed.
Here is a code extract with the corresponding magic bytes.
static const char kPngMagicBytes[] = "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A";
static const char kGifMagicBytes[] = "\x47\x49\x46\x38";
static const char kBmpMagicBytes[] = "\x42\x4d";
static const char kJpegMagicBytes[] = "\xff\xd8\xff";
After identifying which files raise the exception, I saw that they were supposed to be JPEG files; however, their starting bytes indicated a BMP format instead.
Here is an example of 3 files and their first 10 bytes.
laundromat\laundry_room_area.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_Edens1A.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_bmp.jpg
b'424d3800030000000000'
Look at the last one. It even contains the word bmp in the file name. Why is that so? I do not know. The dataset does contain corrupted image files. Someone probably converted the file from BMP to JPEG, yet the tool used did not work correctly. We can just guess the real reason, but that is now irrelevant.
The method by which the file type is determined differs from the one performed by the Pillow package, so there is nothing we can do about it. The recommendation is to identify the corrupted files, which is actually easy, or to rely on ImageDataGenerator. However, I would advise against the latter, as that class has been marked as deprecated. It is not a bug in the code per se, but rather bad data inadvertently introduced into the dataset.
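If you want to hunt down such files yourself, here is a minimal sketch based on the magic bytes listed above (pure Python, no TensorFlow needed; the dataset root path is an assumption):

import os

JPEG_MAGIC = b"\xff\xd8\xff"
BMP_MAGIC = b"\x42\x4d"  # "BM"

def find_mislabeled_jpegs(root_dir):
    """Flag files with a .jpg/.jpeg extension whose leading bytes are not the JPEG signature."""
    bad_files = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith((".jpg", ".jpeg")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                header = f.read(3)
            if not header.startswith(JPEG_MAGIC):
                kind = "BMP" if header.startswith(BMP_MAGIC) else "unknown"
                print(f"{path}: starts with {header!r} ({kind})")
                bad_files.append(path)
    return bad_files

bad = find_mislabeled_jpegs("data/Images")  # assumed dataset root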

ValueError: Tensor Tensor("dense_1/Sigmoid:0", shape=(?, 1), dtype=float32) is not an element of this graph

I'm using tf.keras to load a model I made previously with tf.keras, but when I try to make the prediction I just get this:
[ERROR] [1560045312.143498]: bad callback: <function callback at 0x7f16fe94b8c0>
Traceback (most recent call last):
File "/opt/ros/kinetic/lib64/python2.7/site-packages/rospy/topics.py", line 750, in _invoke_callback
cb(msg)
File "/home/franky/catkin_ws_kinetic/src/tfm/scripts/nnet_predictor.py", line 50, in callback
true_face.eyes[1].height
File "/usr/lib64/python2.7/site-packages/tensorflow/python/keras/engine/training.py", line 1113, in predict
self, x, batch_size=batch_size, verbose=verbose, steps=steps)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 195, in model_iteration
f = _make_execution_function(model, mode)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 122, in _make_execution_function
return model._make_execution_function(mode)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/keras/engine/training.py", line 1989, in _make_execution_function
self._make_predict_function()
File "/usr/lib64/python2.7/site-packages/tensorflow/python/keras/engine/training.py", line 1979, in _make_predict_function
**kwargs)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/keras/backend.py", line 3201, in function
return GraphExecutionFunction(inputs, outputs, updates=updates, **kwargs)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/keras/backend.py", line 2939, in __init__
with ops.control_dependencies(self.outputs):
File "/usr/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 5028, in control_dependencies
return get_default_graph().control_dependencies(control_inputs)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 4528, in control_dependencies
c = self.as_graph_element(c)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3478, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/usr/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3557, in _as_graph_element_locked
raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("dense_1/Sigmoid:0", shape=(?, 1), dtype=float32) is not an element of this graph.
I should also say that I am using this in a ROS framework (Robot Operating System; it is not an operating system, it just has a super misleading name, and I am on Linux), so I know that callback() is being called in a thread, and I can't avoid using ROS.
Also, I have tested that if I use the prediction function in the main thread, everything works fine.
I have already tried the with graph.as_default(): and clear_session() solutions, but no luck.
I have already checked that every import is from tf.keras and that I'm not mixing tf.keras with keras.
I also tried to use a Lock() to avoid having the predict() function called two or more times concurrently.
#!/usr/bin/python2
import os
from tensorflow import keras
from tensorflow.keras.models import model_from_json
from tfm_msgs.msg import IsLooking
import numpy as np
from tensorflow.keras.backend import clear_session
import tensorflow as tf
from threading import Thread, Lock
# other non relevant imports
def callback(face_array_stamped):
    global mutex
    mutex.acquire()
    try:
        global graph
        # with graph.as_default():
        global my_model
        global pub
        true_faces = []
        for face in face_array_stamped.faces:
            if len(face.eyes) == 2:
                true_faces.append(face)
        if len(true_faces) == 1:
            true_face = true_faces[0]
            prediction = my_model.predict(np.array([[
                # all the data here
            ]]))[0]
            # ↑↑↑↑↑ It crashes here ↑↑↑↑↑
        # more non relevant stuff
    finally:
        mutex.release()

if __name__ == '__main__':
    # clear_session()
    model_dir = str(os.path.dirname(os.path.abspath(__file__))) + "/../nnet_models/"
    json_file = open(model_dir + 'model.json', 'r')
    my_model = model_from_json(json_file.read())
    json_file.close()
    my_model.load_weights(model_dir + 'model.h5')
    # my_model._make_predict_function()
    my_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # my_model.summary()
I would expect the code not to crash
I think you should add graph = tf.get_default_graph() and with graph.as_default():
What about this?
from tensorflow import keras
from tensorflow.keras.models import model_from_json
from tfm_msgs.msg import IsLooking
import numpy as np
from tensorflow.keras.backend import clear_session
import tensorflow as tf
from threading import Thread, Lock
# other non relevant imports

graph = tf.get_default_graph()

def callback(face_array_stamped):
    global mutex
    mutex.acquire()
    try:
        global my_model
        global pub
        true_faces = []
        for face in face_array_stamped.faces:
            if len(face.eyes) == 2:
                true_faces.append(face)
        if len(true_faces) == 1:
            true_face = true_faces[0]
            with graph.as_default():
                prediction = my_model.predict(np.array([[
                    # all the data here
                ]]))[0]
        # more non relevant stuff
    finally:
        mutex.release()

if __name__ == '__main__':
    # clear_session()
    model_dir = str(os.path.dirname(os.path.abspath(__file__))) + "/../nnet_models/"
    json_file = open(model_dir + 'model.json', 'r')
    my_model = model_from_json(json_file.read())
    json_file.close()
    my_model.load_weights(model_dir + 'model.h5')
    # my_model._make_predict_function()
    my_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # my_model.summary()
In the end I couldn't figure out what was happening. I don't know if it's because of how ROS works, but I ended up using the solution at
Execute Python function in Main thread from call in Dummy thread
so my code ended up like this:
callback_queue = Queue.Queue()

def prediction_callback(true_face, face_header):
    # non relevant stuff
    prediction = my_model.predict(np.array([[
        # all the variables
    ]]))
    # more non relevant stuff

def face_callback(face_array_stamped):  # this is the original callback
    # ...
    callback_queue.put(lambda: prediction_callback(true_face, face_array_stamped.header))
    # ...

if __name__ == '__main__':
    # ...
    while not rospy.is_shutdown():
        try:
            callback_queue.get(True, 2)()
        except Queue.Empty:
            pass
I had the same problem, but the solution above didn't work out for me. I wanted to subscribe to an image and predict something with keras/tensorflow. Doing so, I got the same errors described above.
The following solution worked for me:
def method_to_predict(msg):
    # ...
    model.predict(...)
    # ...

if __name__ == '__main__':
    rospy.init_node('my_node', anonymous=False)
    while not rospy.is_shutdown():
        msg = rospy.wait_for_message('topic', msg_type)
        method_to_predict(msg)
Hope this helps if the solution above doesn't.
I am subscribing to a topic in ROS and calling model.predict() in the callback method. Adding:
import tensorflow as tf

global graph, model
graph = tf.get_default_graph()
And
with graph.as_default():
    steering_angle = float(model.predict(cropped[None, :, :, :], batch_size=1))
to the callback solved my problem, as Nattaphon advised.
(Tensorflow 1.12, Keras 2.0.6, Ubuntu 18.04)

Tensorflow Estimator.predict() fails

I am recreating the DnCNN, i.e. a Gaussian denoiser, which does image-to-image prediction with a series of convolutional layers. It trains perfectly fine, but when I try to do list(model.predict(...)),
I get the error:
Labels must not be none
I actually put all of the spec's arguments into my EstimatorSpec explicitly, since they are evaluated lazily depending on which method (train/eval/predict) is called on the Estimator.
def DnCNN_model_fn(features, labels, mode):
    # some convolutions here
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=conv_last + input_layer,
        loss=tf.losses.mean_squared_error(
            labels=labels,
            predictions=conv_last + input_layer),
        train_op=tf.train.AdamOptimizer(learning_rate=0.001, epsilon=1e-08).minimize(
            loss=tf.losses.mean_squared_error(
                labels=labels,
                predictions=conv_last + input_layer),
            global_step=tf.train.get_global_step()),
        eval_metric_ops={
            "accuracy": tf.metrics.mean_absolute_error(
                labels=labels,
                predictions=conv_last + input_layer)}
    )
Putting it into an estimator:
d = datetime.datetime.now()
DnCNN = tf.estimator.Estimator(
    model_fn=DnCNN_model_fn,
    model_dir=root + 'model/' +
              "DnCNN_{}_{}_{}_{}".format(d.month, d.day, d.hour, d.minute),
    config=tf.estimator.RunConfig(save_summary_steps=2,
                                  log_step_count_steps=10)
)
After training the model i do the predictions as follows:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=test_data[0:2, :, :, :],
    y=None,
    batch_size=1,
    num_epochs=1,
    shuffle=False)
predicted = DnCNN.predict(input_fn=test_input_fn)
list(predicted)  # this is where the error occurs
The traceback says that tf.losses.mean_squared_error is causing this.
Traceback (most recent call last):
File "<input>", line 16, in <module>
File "...\venv2\lib\site-packages\tensorflow\python\estimator\estimator.py", line 551, in predict
features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
File "...\venv2\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1169, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "<input>", line 95, in DnCNN_model_fn
File "...\venv2\lib\site-packages\tensorflow\python\ops\losses\losses_impl.py", line 663, in mean_squared_error
raise ValueError("labels must not be None.")
ValueError: labels must not be None.
From estimator.predict raises "ValueError: None values not supported":
"In your model_fn, you define the loss in every mode (train/eval/predict). This means that even in predict mode, the labels will be used and need to be provided.
When you are in predict mode, you actually just need to return the predictions, so you can return early from the function:"
def model_fn(features, labels, mode):
    # ...
    y = ...
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=y)
    # ...
I am not entirely sure what the exact error was, but I managed to get my model predicting.
What I changed (apart from adding the batch norm UPDATE_OPS, which did not solve my issue) was short-circuiting (i.e. returning early and separately) the tf.estimator.EstimatorSpec in the case of tf.estimator.ModeKeys.PREDICT:
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=conv_last + input_layer
    )
Apparently there seems to be something wrong with the doc statement (or I did not understand it correctly) found at tf.estimator.EstimatorSpec:
model_fn can populate all arguments independent of mode. In this case, some arguments will be ignored by an Estimator. E.g. train_op will be ignored in eval and infer modes.
BTW: when the mode is predict, at some point labels are automatically replaced by None in any case.
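Putting the two pieces together, here is a minimal sketch of the restructured model_fn (the convolutions are elided exactly as in the question, so conv_last and input_layer stand in for the question's tensors):

def DnCNN_model_fn(features, labels, mode):
    # some convolutions here producing conv_last from the input_layer
    predictions = conv_last + input_layer

    # Short-circuit before anything touches `labels`, which is None in predict mode.
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = tf.losses.mean_squared_error(labels=labels, predictions=predictions)
    train_op = tf.train.AdamOptimizer(learning_rate=0.001, epsilon=1e-08).minimize(
        loss=loss, global_step=tf.train.get_global_step())
    eval_metric_ops = {
        "accuracy": tf.metrics.mean_absolute_error(labels=labels, predictions=predictions)}
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions, loss=loss,
                                      train_op=train_op, eval_metric_ops=eval_metric_ops)

Computing the loss once and reusing it also avoids building the duplicate mean_squared_error ops that the original model_fn created.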

Tensorflow object_detection: unable to find input and output tensors

I've successfully trained and saved a Faster R-CNN model for TensorFlow using their object detection API. I'm now trying to run some inference, taking bits of code from this tutorial.
However, after I successfully restore the metagraph and the checkpoint, the system can't find the input and output nodes, and I get the following error:
KeyError: "The name 'image_tensor:0' refers to a Tensor which does not
exist. The operation, 'image_tensor', does not exist in the graph."
The checkpoint and metagraph were created by the train.py script, on my own data, following the instructions given here.
This is my code:
OUTPUT_DIR = "my_path/models/SSD_v1/train"
CKPT_DIR = OUTPUT_DIR
LATEST_CKPT_FILENAME = "checkpoint"
LAST_CKPT_FILE = os.path.join(CKPT_DIR, LATEST_CKPT_FILENAME)
MODEL_FILENAME_PATH = os.path.join(OUTPUT_DIR, "model.ckpt.meta")

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

def test_model(images_list, path_to_ckpt=None,
               meta_graph=None):
    if path_to_ckpt is None:
        path_to_ckpt = tf.train.latest_checkpoint(CKPT_DIR, LATEST_CKPT_FILENAME)
    if meta_graph is None:
        meta_graph = MODEL_FILENAME_PATH
    print("test_model launched")
    tf.reset_default_graph()
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            # Restore graph
            saver = tf.train.import_meta_graph(meta_graph, clear_devices=True)
            print('metagraph restored')
            saver.restore(sess, path_to_ckpt)
            print('graph restored')
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')  # This is where the error happens
            # Each box represents a part of the image where a particular object was detected.
            detected_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # The score is shown on the result image, together with the class label.
            detected_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detected_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            print("Output tensors: ")
            print(detected_boxes)
            print(detected_scores)
            print(detected_classes)
            print('')
            for i, image in enumerate(images_list):
                detected_boxes, detected_scores, detected_classes, num_detect = sess.run(
                    [detected_boxes, detected_scores, detected_classes, num_detections],
                    feed_dict={image_tensor: image})
                print(i, num_detect, detected_boxes, detected_scores, detected_classes)

def main():
    directory_path = "../data/samples/"
    image_files = [f for f in os.listdir(directory_path) if os.path.isfile(os.path.join(directory_path, f))]
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_list = [np.expand_dims(load_image_into_numpy_array(Image.open(os.path.join(directory_path, f))), axis=0) for f in image_files]
    test_model(images_list=image_list)

if __name__ == "__main__":
    main()
Full error stacktrace:
Traceback (most recent call last):
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/pano_faster_rcnn/src/run_faster_rcnn_inference.py", line 99, in <module>
    main()
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/pano_faster_rcnn/src/run_faster_rcnn_inference.py", line 95, in main
    test_model(images_list=image_list)
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/pano_faster_rcnn/src/run_faster_rcnn_inference.py", line 48, in test_model
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2733, in get_tensor_by_name
    return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2584, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2626, in _as_graph_element_locked
    "graph." % (repr(name), repr(op_name)))
KeyError: "The name 'image_tensor:0' refers to a Tensor which does not exist. The operation, 'image_tensor', does not exist in the graph."
In the train graph, the input/output nodes are not given those names. What you will need to do is to "export" your trained model via the export_inference_graph.py tool. I believe it currently exports it to a frozen graph or a SavedModel, but in future releases, it will export to ordinary checkpoint as well.
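For reference, the export step looks roughly like this (paths are placeholders, and you should check the exact flags against your version of the object detection API):

python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/pipeline.config \
    --trained_checkpoint_prefix path/to/model.ckpt-NNNN \
    --output_directory path/to/exported_model

The exported graph is what contains the image_tensor, detection_boxes, detection_scores, detection_classes and num_detections nodes the inference code expects.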
If you want sample code for finding the node names of the graph, referring to the object_detection_tutorial.ipynb, after the "Load a (frozen) Tensorflow model into memory." block:
for node in od_graph_def.node:
    print(node.name)
That should list all the node names that you can then enter in the subsequent blocks.
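For context, the loading block that snippet refers to looks roughly like this in the tutorial (the .pb path is a placeholder for whatever export_inference_graph.py produced):

import tensorflow as tf

PATH_TO_FROZEN_GRAPH = 'path/to/exported_model/frozen_inference_graph.pb'

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

# After this, names like 'image_tensor:0' resolve via detection_graph.get_tensor_by_name().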