Custom Object Detection TFLite model error (pascal voc) - ValueError: The size of the train_data (0) couldn't be smaller than batch_size (2) - tensorflow

I am trying to build a custom object detection model in jupyter notebook using tflite model maker but I have some problems.
I am getting images with pascal_voc (not csv) so I splited the train data/test data into different files
os.mkdir('C:/Users/user/Desktop/GradProject5/train_test_split/train')
os.mkdir('C:/Users/user/Desktop/GradProject5/train_test_split/test')
image_paths = os.listdir('C:/Users/user/anaconda3/envs/DarkflowTest/data/dataset')
random.shuffle(image_paths)
for i, image_path in enumerate(image_paths):
if i < int(len(image_paths) * 0.8):
shutil.copy(f'C:/Users/user/anaconda3/envs/DarkflowTest/data/dataset/{image_path}', 'C:/Users/user/Desktop/GradProject5/train_test_split/train')
shutil.copy(f'C:/Users/user/anaconda3/envs/DarkflowTest/data/annotations/{image_path.replace("jpg", "xml")}', 'C:/Users/user/Desktop/GradProject5/train_test_split/train')
else:
shutil.copy(f'C:/Users/user/anaconda3/envs/DarkflowTest/data/dataset/{image_path}', 'C:/Users/user/Desktop/GradProject5/train_test_split/test')
shutil.copy(f'C:/Users/user/anaconda3/envs/DarkflowTest/data/annotations/{image_path.replace("jpg", "xml")}', 'C:/Users/user/Desktop/GradProject5/train_test_split/test')
test_image_dir='C:/Users/user/Desktop/GradProject/\train_test_split/test/'
#annotations_dir = 'C:/Users/user/anaconda3/envs/DarkflowTest/data/annotations/'
train_data=object_detector.DataLoader.from_pascal_voc(train_image_dir+'image/',train_image_dir+'xml/',label_map={1:"pill",2:"text"})
test_datal=object_detector.DataLoader.from_pascal_voc(test_image_dir+'image/',test_image_dir+'xml/',label_map={1:"pill",2:"text"})
Then loaded train data and test data with DataLoader
model = object_detector.create(train_data, model_spec=spec, batch_size=2, train_whole_model=True)
Then I tried to create a model but I am getting this error.
[[1]: https://i.stack.imgur.com/ZBQtk.png][1]
ValueError Traceback (most recent call last)
in
----> 1 model = object_detector.create(train_data, model_spec=spec, batch_size=2, train_whole_model=True)
~\anaconda3\lib\site-packages\tensorflow_examples\lite\model_maker\core\task\object_detector.py in create(cls, train_data, model_spec, validation_data, epochs, batch_size, train_whole_model, do_train)
285 if do_train:
286 tf.compat.v1.logging.info('Retraining the models...')
--> 287 object_detector.train(train_data, validation_data, epochs, batch_size)
288 else:
289 object_detector.create_model()
~\anaconda3\lib\site-packages\tensorflow_examples\lite\model_maker\core\task\object_detector.py in train(self, train_data, validation_data, epochs, batch_size)
137 # TODO(b/171449557): Upstream this to the parent class.
138 if len(train_data) < batch_size:
--> 139 raise ValueError('The size of the train_data (%d) couldn't be smaller '
140 'than batch_size (%d). To solve this problem, set '
141 'the batch_size smaller or increase the size of the '
ValueError: The size of the train_data (0) couldn't be smaller than batch_size (2). To solve this problem, set the batch_size smaller or increase the size of the train_data.
Am I getting error because
train_data=object_detector.DataLoader.from_pascal_voc(train_image_dir+'image/',train_image_dir+'xml/',label_map={1:"pill",2:"text"})
it failed to load train data..?
I think I did everything right and still struggling with this error..
Please help me if you know the solution to this problem!

I met the same issue, and simply remove the trailing / from the test image dir, then the issue is gone.
train_image_dir = '/Users/twinssbc/VSProject/mobile_phone/train'
train_process_image_dir = '/Users/twinssbc/VSProject/mobile_phone/train_preprocess'
train_label_dir = '/Users/twinssbc/VSProject/mobile_phone/train_label'
train_data_loader = object_detector.DataLoader.from_pascal_voc(train_process_image_dir, train_label_dir, label_map=label_map)
spec = model_spec.get('efficientdet_lite4')
model = object_detector.create(train_data_loader, spec, batch_size=5, epochs=50, train_whole_model=True)
test_data_loader = object_detector.DataLoader.from_pascal_voc(test_process_image_dir, test_label_dir, label_map=label_map)
model.evaluate(test_data_loader)

check the format of your pascal voc xml..... I solved this issue by reformating the xml which had <?xml version= .bla bla bla uft bla bla> you might need to remove this line.. and if you still get another file not found error... check that the extension of your xml filename is same as what your file name is in the folder in use. ps this happened to me because I used label studio to annotate and tried to use the data in tflie model builder directly

Related

Save and resuse deeplearning model in Keras/Tensoflow [duplicate]

Setting
As already mentioned in the title, I got a problem with my custom loss function, when trying to load the saved model. My loss looks as follows:
def weighted_cross_entropy(weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
y_pred = K.clip(y_pred, K.epsilon(), 1-K.epsilon())
loss = y_true * K.log(y_pred) * weights
loss = -K.sum(loss, -1)
return loss
return loss
weighted_loss = weighted_cross_entropy([0.1,0.9])
So during training, I used the weighted_loss function as loss function and everything worked well. When training is finished I save the model as .h5file with the standard model.save function from keras API.
Problem
When I am trying to load the model via
model = load_model(path,custom_objects={"weighted_loss":weighted_loss})
I am getting a ValueError telling me that the loss is unknown.
Error
The error message looks as follows:
File "...\predict.py", line 29, in my_script
"weighted_loss": weighted_loss})
File "...\Continuum\anaconda3\envs\processing\lib\site-packages\keras\engine\saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "...\Continuum\anaconda3\envs\processing\lib\site-packages\keras\engine\saving.py", line 312, in _deserialize_model
sample_weight_mode=sample_weight_mode)
File "...\Continuum\anaconda3\envs\processing\lib\site-packages\keras\engine\training.py", line 139, in compile
loss_function = losses.get(loss)
File "...\Continuum\anaconda3\envs\processing\lib\site-packages\keras\losses.py", line 133, in get
return deserialize(identifier)
File "...\Continuum\anaconda3\envs\processing\lib\site-packages\keras\losses.py", line 114, in deserialize
printable_module_name='loss function')
File "...\Continuum\anaconda3\envs\processing\lib\site-packages\keras\utils\generic_utils.py", line 165, in deserialize_keras_object
':' + function_name)
ValueError: Unknown loss function:loss
Questions
How can I fix this problem? May it be possible that the reason for that is my wrapped loss definition? So keras doesn't know, how to handle the weights variable?
Your loss function's name is loss (i.e. def loss(y_true, y_pred):). Therefore, when loading back the model you need to specify 'loss' as its name:
model = load_model(path, custom_objects={'loss': weighted_loss})
For full examples demonstrating saving and loading Keras models with custom loss functions or models, please have a look at the following GitHub gist files:
Custom loss function defined using a wrapper:
https://gist.github.com/ashkan-abbasi66/a81fe4c4d588e2c187180d5bae734fde
Custom loss function defined by subclassing:
https://gist.github.com/ashkan-abbasi66/327efe2dffcf9788847d26de934ef7bd
Custom model:
https://gist.github.com/ashkan-abbasi66/d5a525d33600b220fa7b095f7762cb5b
Note:
I tested the above examples on Python 3.8 with Tensorflow 2.5.

tensorflow - Invalid argument: Input size should match but they differ by 2

I am trying to train a dl model with tf.keras. I have 67 classes of images inside the image directory like airports, bookstore, casino. And for each classes i have at least 100 images. The data is from mit indoor scene dataset But when I am trying to train the model, I am constantly getting this error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_7]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_1570]
Function call stack:
train_function -> train_function
I tried to resolve the problem by resizing the image with the resizing layer, also included the labels='inferred' and label_mode='categorical' in the image_dataset_from_directory method and included loss='categorical_crossentropy' in the model compile method. Previously labels and label_model were not set and loss was sparse_categorical_crossentropy which i think is not right. so I changed them as described above.But I am still having problems.
There is one question related to this in stackoverflow but the person did not mentioned how he solved the problem just updated that - My suggestion is to check the metadata of the dataset. It helped to fix my problem. But did not mentioned what metadata to look for or what he did to solve the problem.
The code that I am using to train the model -
import os
import PIL
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers import Flatten, Dropout, BatchNormalization, Rescaling
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.regularizers import l1, l2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# define directory paths
PROJECT_PATH = Path.cwd()
DATA_PATH = PROJECT_PATH.joinpath('data', 'Images')
# create a dataset
batch_size = 32
img_height = 180
img_width = 180
train = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="training",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
valid = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="validation",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
class_names = train.class_names
for image_batch, label_batch in train.take(1):
print("\nImage shape:", image_batch.shape)
print("Label Shape", label_batch.shape)
# resize image
resize_layer = tf.keras.layers.Resizing(img_height, img_width)
train = train.map(lambda x, y: (resize_layer(x), y))
valid = valid.map(lambda x, y: (resize_layer(x), y))
# standardize the data
normalization_layer = tf.keras.layers.Rescaling(1./255)
train = train.map(lambda x, y: (normalization_layer(x), y))
valid = valid.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(train))
first_image = image_batch[0]
print("\nImage (min, max) value:", (np.min(first_image), np.max(first_image)))
print()
# configure the dataset for performance
AUTOTUNE = tf.data.AUTOTUNE
train = train.cache().prefetch(buffer_size=AUTOTUNE)
valid = valid.cache().prefetch(buffer_size=AUTOTUNE)
# create a basic model architecture
num_classes = len(class_names)
# initiate a sequential model
model = Sequential()
# CONV1
model.add(Conv2D(filters=64, kernel_size=3, activation="relu",
input_shape=(img_height, img_width, 3)))
model.add(BatchNormalization())
# CONV2
model.add(Conv2D(filters=64, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# Pool + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# CONV3
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# CONV4
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# POOL + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# FC5
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(num_classes, activation="softmax"))
# compile the model
model.compile(loss="categorical_crossentropy",
optimizer="adam", metrics=['accuracy'])
# train the model
epochs = 25
early_stopping_cb = EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train, validation_data=valid, epochs=epochs,
callbacks=[early_stopping_cb], verbose=2)
result = pd.DataFrame(history.history)
print()
print(result.head())
Note -
I just modified the code to make it as simple as possible to reduce the error. The model run for few batches than again got the above error.
Epoch 1/10
732/781 [===========================>..] - ETA: 22s - loss: 3.7882Traceback (most recent call last):
File ".\02_model1.py", line 139, in <module>
model.fit(train, epochs=10, validation_data=valid)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\keras\engine\training.py", line 1184, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 917, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 3039, in __call__
return graph_function._call_flat(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
outputs = execute.execute(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_2]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_11840]
Function call stack:
train_function -> train_function
Modified code -
# create a dataset
batch_size = 16
img_height = 256
img_width = 256
train = image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="training",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
valid = image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="validation",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
model = tf.keras.applications.Xception(
weights=None, input_shape=(img_height, img_width, 3), classes=67)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(train, epochs=10, validation_data=valid)
I think it might be a corrupted file. It is throwing an exception after a data integrity check in the DecodeBMPv2 function (https://github.com/tensorflow/tensorflow/blob/0b6b491d21d6a4eb5fbab1cca565bc1e94ca9543/tensorflow/core/kernels/image/decode_image_op.cc#L594)
If that's the issue and you want to find out which file(s) are throwing the exception, you can try something like this below on the directory containing the files. Remove/replace any files you find and it should train normally.
import glob
img_paths = glob.glob(os.path.join(<path_to_dataset>,'*/*.*') # assuming you point to the directory containing the label folders.
bad_paths = []
for image_path in img_paths:
try:
img_bytes = tf.io.read_file(path)
decoded_img = tf.io.decode_image(img_bytes)
except tf.errors.InvalidArgumentError as e:
print(f"Found bad path {image_path}...{e}")
bad_paths.append(image_path)
print(f"{image_path}: OK")
print("BAD PATHS:")
for bad_path in bad_paths:
print(f"{bad_path}")
This is in fact a corrupted file problem. However, the underlying issue is far more subtle. Here is an explanation of what is going on and how to circumvent this obstacle. I encountered the very same problem on the very same MIT Indoor Scene Classification dataset. All the images are JPEG files (spoiler alert: well, are they?).
It has been correctly noted that the exception is raised exactly here, in a C++ file related to the tf.io.decode_image() function. It is the decode_image() function where the issue lies, which is called by the
tf.keras.utils.image_dataset_from_directory().
On the other hand, tf.keras.preprocessing.image.ImageDataGenerator().flow_from_directory() relies on Pillow under the hood (shown here, which is called from here). This is the reason why adopting the ImageDataGenerator class works.
After closer inspection of the corresponding C++ source file, one can observe that the function is actually called DecodeBmpV2(...), as defined here. This raises the question of why a JPEG image is being treated as a BMP one. The aforementioned function is actually called here, as part of a basic switch statement the aim of which is further direct data conversion according to the determined type. Thus, the piece of code that determines the file type should be subjected to deeper analysis. The file type is determined according to the value of starting bytes (see here). Long story short, a simple comparison of so-called magic bytes that signify file type is performed.
Here is a code extract with the corresponding magic bytes.
static const char kPngMagicBytes[] = "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A";
static const char kGifMagicBytes[] = "\x47\x49\x46\x38";
static const char kBmpMagicBytes[] = "\x42\x4d";
static const char kJpegMagicBytes[] = "\xff\xd8\xff";
After identifying which files raise the exception, I saw that they were supposed to be JPEG files, however, their starting bytes indicated a BMP format instead.
Here is an example of 3 files and their first 10 bytes.
laundromat\laundry_room_area.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_Edens1A.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_bmp.jpg
b'424d3800030000000000'
Look at the last one. It even contains the word bmp in the file name. Why is that so? I do not know. The dataset does contain corrupted image files. Someone probably converted the file from BMP to JPEG, yet the tool used did not work correctly. We can just guess the real reason, but that is now irrelevant.
The method by which the file type is determined is different from the one performed by the Pillow package, thus, there is nothing we can do about it. The recommendation is to identify the corrupted files, which is actually easy or to rely on the ImageDataGenerator. However, I would advise against doing so as this class has been marked as deprecated. It is not a bug in code per se, but rather bad data inadvertently introduced into the dataset.

tensorflow warning - Found untraced functions such as lstm_cell_6_layer_call_and_return_conditional_losses

I'm using tensorflow2.4, and new to tensorflow
Here's the code
model = Sequential()
model.add(LSTM(32, input_shape=(X_train.shape[1:])))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='rmsprop', loss='mean_absolute_error', metrics='mae')
model.summary()
save_weights_at = 'basic_lstm_model'
save_best = ModelCheckpoint(save_weights_at, monitor='val_loss', verbose=0,
save_best_only=True, save_weights_only=False, mode='min',
period=1)
history = model.fit(x=X_train, y=y_train, batch_size=16, epochs=20,
verbose=1, callbacks=[save_best], validation_data=(X_val, y_val),
shuffle=True)
And in some epochs, got this warning:
Do you know why did I get this warning?
I think this warning can be safely ignored as you can find the same warning even in a tutorial given by tensorflow. I often see this warning when saving custom models such as graph NNs. You should be good to go as long as you don't want to access those non-callable functions.
However, if you're annoyed by this big chunk of text, you can suppress this warning by adding the following at the top of the code.
import absl.logging
absl.logging.set_verbosity(absl.logging.ERROR)
If you ignore this warning, you will not be able to reload your model.
What this warning is telling you is that you are using customized layers and losses in your model architecture.
The ModelCheckpoint callback saves your model after each epoch with a lower validation loss achieved. Models can be saved in HDF5 format or SavedModel format (default, specific to TensorFlow and Keras). You are using the SavedModel format here, as you didn't explicitly specify the .h5 extension.
Every time a new epoch reaches a lower validation loss, you model is automatically saved, but your customized objects (layers and losses) are not traced. Btw, this is why the warning is only prompted after several training epochs.
Without your traced customized objects, you will not be able to reload your model successfully with keras.models.load_model().
If you do not intent to reload your best model in the future, you can safely ignore this warning. In any case, you can still use your best model in your current local environment after training.
saving models in H5 format seems to work for me.
model.save(filepath, save_format="h5")
Here is how to use H5 with model checkpointing (I've not tested this extensively, caveat emptor!)
from tensorflow.keras.callbacks import ModelCheckpoint
class ModelCheckpointH5(ModelCheckpoint):
# There is a bug saving models in TF 2.4
# https://github.com/tensorflow/tensorflow/issues/47479
# This forces the h5 format for saving
def __init__(self,
filepath,
monitor='val_loss',
verbose=0,
save_best_only=False,
save_weights_only=False,
mode='auto',
save_freq='epoch',
options=None,
**kwargs):
super(ModelCheckpointH5, self).__init__(filepath,
monitor='val_loss',
verbose=0,
save_best_only=False,
save_weights_only=False,
mode='auto',
save_freq='epoch',
options=None,
**kwargs)
def _save_model(self, epoch, logs):
from tensorflow.python.keras.utils import tf_utils
logs = logs or {}
if isinstance(self.save_freq,
int) or self.epochs_since_last_save >= self.period:
# Block only when saving interval is reached.
logs = tf_utils.to_numpy_or_python_type(logs)
self.epochs_since_last_save = 0
filepath = self._get_file_path(epoch, logs)
try:
if self.save_best_only:
current = logs.get(self.monitor)
if current is None:
logging.warning('Can save best model only with %s available, '
'skipping.', self.monitor)
else:
if self.monitor_op(current, self.best):
if self.verbose > 0:
print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
' saving model to %s' % (epoch + 1, self.monitor,
self.best, current, filepath))
self.best = current
if self.save_weights_only:
self.model.save_weights(
filepath, overwrite=True, options=self._options)
else:
self.model.save(filepath, overwrite=True, options=self._options,save_format="h5") # NK edited here
else:
if self.verbose > 0:
print('\nEpoch %05d: %s did not improve from %0.5f' %
(epoch + 1, self.monitor, self.best))
else:
if self.verbose > 0:
print('\nEpoch %05d: saving model to %s' % (epoch + 1, filepath))
if self.save_weights_only:
self.model.save_weights(
filepath, overwrite=True, options=self._options)
else:
self.model.save(filepath, overwrite=True, options=self._options,save_format="h5") # NK edited here
self._maybe_remove_file()
except IOError as e:
# `e.errno` appears to be `None` so checking the content of `e.args[0]`.
if 'is a directory' in six.ensure_str(e.args[0]).lower():
raise IOError('Please specify a non-directory filepath for '
'ModelCheckpoint. Filepath used is an existing '
'directory: {}'.format(filepath))
# Re-throw the error for any other causes.
raise
Try appending the extension to the file.
save_weights_at = 'basic_lstm_model'
For:
save_weights_at = 'basic_lstm_model.h5'
save the model in .tf format if there is issue in saving the model
model.save('you_model_name.tf')

Unable to save model with tensorflow 2.0.0 beta1

I have tried all the options described in the documentation but none of them allowed me to save my model in tensorflow 2.0.0 beta1. I've also tried to upgrade to the (also unstable) TF2-RC but that ruined even the code I had working in beta so I quickly rolled back for now to beta.
See a minimal reproduction code below.
What I have tried:
model.save("mymodel.h5")
NotImplementedError: Saving the model to HDF5 format requires the
model to be a Functional model or a Sequential model. It does not work
for subclassed models, because such models are defined via the body of
a Python method, which isn't safely serializable. Consider saving to
the Tensorflow SavedModel format (by setting save_format="tf") or
using save_weights.
model.save("mymodel", format='tf')
ValueError: Model <main.CVAE object at 0x7f1cac2e7c50> cannot be
saved because the input shapes have not been set. Usually, input
shapes are automatically determined from calling .fit() or .predict().
To manually set the shapes, call model._set_inputs(inputs).
3.
model._set_input(input_sample)
model.save("mymodel", format='tf')
AssertionError: tf.saved_model.save is not supported inside a traced
#tf.function. Move the call to the outer eagerly-executed context.
And this is where I am stuck now because it gives me no reasonable hint whatsoever. That's because I am NOT calling the save() function from a #tf.function, I'm already calling it from the outermost scope possible. In fact, I have no #tf.function at all in this minimal reproduction script below and still getting the same error.
So I really have no idea how to save my model, I've tried every options and they all throw errors and provide no hints.
The minimal reproduction example below works fine if you set save_model=False and it reproduces the error when save_model=True.
It may seem unnecessary in this simplified auto-encoder code example to use a subclassed model but I have lots of custom functions added to it in my original VAE code that I need it for.
Code:
import tensorflow as tf
save_model = True
learning_rate = 1e-4
BATCH_SIZE = 100
TEST_BATCH_SIZE = 10
color_channels = 1
imsize = 28
(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images[:5000, ::]
test_images = train_images[:1000, ::]
train_images = train_images.reshape(-1, imsize, imsize, 1).astype('float32')
test_images = test_images.reshape(-1, imsize, imsize, 1).astype('float32')
train_images /= 255.
test_images /= 255.
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).batch(TEST_BATCH_SIZE)
class AE(tf.keras.Model):
def __init__(self):
super(AE, self).__init__()
self.network = tf.keras.Sequential([
tf.keras.layers.InputLayer(input_shape=(imsize, imsize, color_channels)),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(imsize**2 * color_channels),
tf.keras.layers.Reshape(target_shape=(imsize, imsize, color_channels)),
])
def decode(self, input):
logits = self.network(input)
return logits
optimizer = tf.keras.optimizers.Adam(learning_rate)
model = AE()
def compute_loss(data):
logits = model.decode(data)
loss = tf.reduce_mean(tf.losses.mean_squared_error(logits, data))
return loss
def train_step(data):
with tf.GradientTape() as tape:
loss = compute_loss(data)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
return loss, 0
def test_step(data):
loss = compute_loss(data)
return loss
input_shape_set = False
epoch = 0
epochs = 20
for epoch in range(epochs):
for train_x in train_dataset:
train_step(train_x)
if epoch % 1 == 0:
loss = 0.0
num_batches = 0
for test_x in test_dataset:
loss += test_step(test_x)
num_batches += 1
loss /= num_batches
print("Epoch: {}, Loss: {}".format(epoch, loss))
if save_model:
print("Saving model...")
if not input_shape_set:
# Note: Why set input shape manually and why here:
# 1. If I do not set input shape manually: ValueError: Model <main.CVAE object at 0x7f1cac2e7c50> cannot be saved because the input shapes have not been set. Usually, input shapes are automatically determined from calling .fit() or .predict(). To manually set the shapes, call model._set_inputs(inputs).
# 2. If I set input shape manually BEFORE the first actual train step, I get: RuntimeError: Attempting to capture an EagerTensor without building a function.
model._set_inputs(train_dataset.__iter__().next())
input_shape_set = True
# Note: Why choose tf format: model.save('MNIST/Models/model.h5') will return NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.
model.save('MNIST/Models/model', save_format='tf')
I have tried the same minimal reproduction example in tensorflow-gpu 2.0.0-rc0 and the error was more revealing than what the beta version gave me. The error in RC says:
NotImplementedError: When subclassing the Model class, you should
implement a call method.
This got me read through https://www.tensorflow.org/beta/guide/keras/custom_layers_and_models where I found examples of how to do subclassing in TF2 in a way that allows saving. I was able to resolve the error and have the model saved by replacing my 'decode' method by 'call' in the above example (although this will be more complicated with my actual code where I had various methods defined for the class). This solved the error both in beta and in rc. Strangely, the training (or the saving) got also much faster in rc.
You should change two things:
Change the decode method to call, as you pointed out
As your model is of type Sequential, and not built inside the class, you want to call the save method on the self.network attribute of the model, i.e.,
model.network.save('mymodel.h5')
alternatively, to keep things more standard, you can implement this method inside the AE class, as follows:
def save(self, save_dir):
self.network.save(save_dir)
Cheers mate

Custom loss function: perform a model.predict on the data in y_pred

I am training a network to denoise images, for this I am using the CIFAR10 dataset. I am trying to generate a custom loss function so that the loss is mse / classification_accuracy.
Given that my network receives as input 32x32 (noisy) images and predicts 32x32 (denoised) images, I am assuming that y_pred and Y_true would be arrays of 32x32 images. Thus my custom loss functions looks like this:
def custom_loss():
def joint_optimized_loss(y_true, y_pred):
mse = K.mean(K.square(y_pred - y_true), axis=-1)
preds = classif_model.predict(y_pred)
correctPreds = 0
totPreds = 0
for pred in preds:
predictedClass = pred.index(max(pred))
totPreds += 1
if predictedClass == currentClass:
correctPreds += 1
classifAccuracy = correctPreds / totPreds
loss = mse / classifAccuracy
return loss
return joint_optimized_loss
myModel.compile(optimizer='adadelta', loss=custom_loss())
classif_model is a pre-trained model that classifies CIFAR10 images into one of the 10 classes. It receives an array of 32x32 images.
However when I run my code I get the following error:
Traceback (most recent call last):
File "myCode.py", line 94, in
myModel.compile(optimizer='adadelta', loss=custom_loss())
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 850, in compile
sample_weight, mask)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 450, in weighted
score_array = fn(y_true, y_pred)
File "myCode.py", line 57, in joint_optimized_loss
preds = classif_model.predict(y_pred)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/models.py",
line 913, in predict
return self.model.predict(x, batch_size=batch_size, verbose=verbose)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 1713, in predict
verbose=verbose, steps=steps)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 1260, in _predict_loop
batches = _make_batches(num_samples, batch_size)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 374, in _make_batches
num_batches = int(np.ceil(size / float(batch_size)))
AttributeError: 'Dimension' object has no attribute 'ceil'
I think this has something to do with the fact that y_true and y_pred are both tensors that, before training, are empty thus classif_model.predict fails as it is expecting an array. However I am not sure on how to fix this...
I tried getting instead the value of y_pred using K.get_value(y_pred), but that gives me the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape
[-1,32,32,3] has negative dimensions [[Node: input_1 =
Placeholderdtype=DT_FLOAT, shape=[?,32,32,3],
_device="/job:localhost/replica:0/task:0/cpu:0"]]
You cannot use accuracy as a loss function, as it is not differentiable. This is why upper bounds on accuracy like the cross-entropy are used instead.
Additionally, the way you implemented accuracy is also non-symbolic, you should have used only functions in keras.backend to implement a loss for it to work properly.
I had almost same problem, and I tried this and it worked for me.
Instead of:
preds = classif_model.predict(y_pred)
try:
preds = classif_model(y_pred)
I am not sure about the reason but it is because when we use model.predict(y) it need batch_size and while compiling we don't have any, so we can not use model.predict(y).
Please correct me if this is wrong.