tensorflow warning - Found untraced functions such as lstm_cell_6_layer_call_and_return_conditional_losses

tensorflow warning - Found untraced functions such as lstm_cell_6_layer_call_and_return_conditional_losses - tensorflow

I'm using tensorflow2.4, and new to tensorflow
Here's the code
model = Sequential()
model.add(LSTM(32, input_shape=(X_train.shape[1:])))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='rmsprop', loss='mean_absolute_error', metrics='mae')
model.summary()
save_weights_at = 'basic_lstm_model'
save_best = ModelCheckpoint(save_weights_at, monitor='val_loss', verbose=0,
save_best_only=True, save_weights_only=False, mode='min',
period=1)
history = model.fit(x=X_train, y=y_train, batch_size=16, epochs=20,
verbose=1, callbacks=[save_best], validation_data=(X_val, y_val),
shuffle=True)
And in some epochs, got this warning:
Do you know why did I get this warning?

I think this warning can be safely ignored as you can find the same warning even in a tutorial given by tensorflow. I often see this warning when saving custom models such as graph NNs. You should be good to go as long as you don't want to access those non-callable functions.
However, if you're annoyed by this big chunk of text, you can suppress this warning by adding the following at the top of the code.
import absl.logging
absl.logging.set_verbosity(absl.logging.ERROR)

If you ignore this warning, you will not be able to reload your model.
What this warning is telling you is that you are using customized layers and losses in your model architecture.
The ModelCheckpoint callback saves your model after each epoch with a lower validation loss achieved. Models can be saved in HDF5 format or SavedModel format (default, specific to TensorFlow and Keras). You are using the SavedModel format here, as you didn't explicitly specify the .h5 extension.
Every time a new epoch reaches a lower validation loss, you model is automatically saved, but your customized objects (layers and losses) are not traced. Btw, this is why the warning is only prompted after several training epochs.
Without your traced customized objects, you will not be able to reload your model successfully with keras.models.load_model().
If you do not intent to reload your best model in the future, you can safely ignore this warning. In any case, you can still use your best model in your current local environment after training.

saving models in H5 format seems to work for me.
model.save(filepath, save_format="h5")
Here is how to use H5 with model checkpointing (I've not tested this extensively, caveat emptor!)
from tensorflow.keras.callbacks import ModelCheckpoint
class ModelCheckpointH5(ModelCheckpoint):
# There is a bug saving models in TF 2.4
# https://github.com/tensorflow/tensorflow/issues/47479
# This forces the h5 format for saving
def __init__(self,
filepath,
monitor='val_loss',
verbose=0,
save_best_only=False,
save_weights_only=False,
mode='auto',
save_freq='epoch',
options=None,
**kwargs):
super(ModelCheckpointH5, self).__init__(filepath,
monitor='val_loss',
verbose=0,
save_best_only=False,
save_weights_only=False,
mode='auto',
save_freq='epoch',
options=None,
**kwargs)
def _save_model(self, epoch, logs):
from tensorflow.python.keras.utils import tf_utils
logs = logs or {}
if isinstance(self.save_freq,
int) or self.epochs_since_last_save >= self.period:
# Block only when saving interval is reached.
logs = tf_utils.to_numpy_or_python_type(logs)
self.epochs_since_last_save = 0
filepath = self._get_file_path(epoch, logs)
try:
if self.save_best_only:
current = logs.get(self.monitor)
if current is None:
logging.warning('Can save best model only with %s available, '
'skipping.', self.monitor)
else:
if self.monitor_op(current, self.best):
if self.verbose > 0:
print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
' saving model to %s' % (epoch + 1, self.monitor,
self.best, current, filepath))
self.best = current
if self.save_weights_only:
self.model.save_weights(
filepath, overwrite=True, options=self._options)
else:
self.model.save(filepath, overwrite=True, options=self._options,save_format="h5") # NK edited here
else:
if self.verbose > 0:
print('\nEpoch %05d: %s did not improve from %0.5f' %
(epoch + 1, self.monitor, self.best))
else:
if self.verbose > 0:
print('\nEpoch %05d: saving model to %s' % (epoch + 1, filepath))
if self.save_weights_only:
self.model.save_weights(
filepath, overwrite=True, options=self._options)
else:
self.model.save(filepath, overwrite=True, options=self._options,save_format="h5") # NK edited here
self._maybe_remove_file()
except IOError as e:
# `e.errno` appears to be `None` so checking the content of `e.args[0]`.
if 'is a directory' in six.ensure_str(e.args[0]).lower():
raise IOError('Please specify a non-directory filepath for '
'ModelCheckpoint. Filepath used is an existing '
'directory: {}'.format(filepath))
# Re-throw the error for any other causes.
raise

Try appending the extension to the file.
save_weights_at = 'basic_lstm_model'
For:
save_weights_at = 'basic_lstm_model.h5'

save the model in .tf format if there is issue in saving the model
model.save('you_model_name.tf')

Related

tensorflow - Invalid argument: Input size should match but they differ by 2

I am trying to train a dl model with tf.keras. I have 67 classes of images inside the image directory like airports, bookstore, casino. And for each classes i have at least 100 images. The data is from mit indoor scene dataset But when I am trying to train the model, I am constantly getting this error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_7]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_1570]
Function call stack:
train_function -> train_function
I tried to resolve the problem by resizing the image with the resizing layer, also included the labels='inferred' and label_mode='categorical' in the image_dataset_from_directory method and included loss='categorical_crossentropy' in the model compile method. Previously labels and label_model were not set and loss was sparse_categorical_crossentropy which i think is not right. so I changed them as described above.But I am still having problems.
There is one question related to this in stackoverflow but the person did not mentioned how he solved the problem just updated that - My suggestion is to check the metadata of the dataset. It helped to fix my problem. But did not mentioned what metadata to look for or what he did to solve the problem.
The code that I am using to train the model -
import os
import PIL
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers import Flatten, Dropout, BatchNormalization, Rescaling
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.regularizers import l1, l2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# define directory paths
PROJECT_PATH = Path.cwd()
DATA_PATH = PROJECT_PATH.joinpath('data', 'Images')
# create a dataset
batch_size = 32
img_height = 180
img_width = 180
train = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="training",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
valid = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="validation",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
class_names = train.class_names
for image_batch, label_batch in train.take(1):
print("\nImage shape:", image_batch.shape)
print("Label Shape", label_batch.shape)
# resize image
resize_layer = tf.keras.layers.Resizing(img_height, img_width)
train = train.map(lambda x, y: (resize_layer(x), y))
valid = valid.map(lambda x, y: (resize_layer(x), y))
# standardize the data
normalization_layer = tf.keras.layers.Rescaling(1./255)
train = train.map(lambda x, y: (normalization_layer(x), y))
valid = valid.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(train))
first_image = image_batch[0]
print("\nImage (min, max) value:", (np.min(first_image), np.max(first_image)))
print()
# configure the dataset for performance
AUTOTUNE = tf.data.AUTOTUNE
train = train.cache().prefetch(buffer_size=AUTOTUNE)
valid = valid.cache().prefetch(buffer_size=AUTOTUNE)
# create a basic model architecture
num_classes = len(class_names)
# initiate a sequential model
model = Sequential()
# CONV1
model.add(Conv2D(filters=64, kernel_size=3, activation="relu",
input_shape=(img_height, img_width, 3)))
model.add(BatchNormalization())
# CONV2
model.add(Conv2D(filters=64, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# Pool + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# CONV3
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# CONV4
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# POOL + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# FC5
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(num_classes, activation="softmax"))
# compile the model
model.compile(loss="categorical_crossentropy",
optimizer="adam", metrics=['accuracy'])
# train the model
epochs = 25
early_stopping_cb = EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train, validation_data=valid, epochs=epochs,
callbacks=[early_stopping_cb], verbose=2)
result = pd.DataFrame(history.history)
print()
print(result.head())
Note -
I just modified the code to make it as simple as possible to reduce the error. The model run for few batches than again got the above error.
Epoch 1/10
732/781 [===========================>..] - ETA: 22s - loss: 3.7882Traceback (most recent call last):
File ".\02_model1.py", line 139, in <module>
model.fit(train, epochs=10, validation_data=valid)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\keras\engine\training.py", line 1184, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 917, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 3039, in __call__
return graph_function._call_flat(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
outputs = execute.execute(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_2]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_11840]
Function call stack:
train_function -> train_function
Modified code -
# create a dataset
batch_size = 16
img_height = 256
img_width = 256
train = image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="training",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
valid = image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="validation",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
model = tf.keras.applications.Xception(
weights=None, input_shape=(img_height, img_width, 3), classes=67)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(train, epochs=10, validation_data=valid)

I think it might be a corrupted file. It is throwing an exception after a data integrity check in the DecodeBMPv2 function (https://github.com/tensorflow/tensorflow/blob/0b6b491d21d6a4eb5fbab1cca565bc1e94ca9543/tensorflow/core/kernels/image/decode_image_op.cc#L594)
If that's the issue and you want to find out which file(s) are throwing the exception, you can try something like this below on the directory containing the files. Remove/replace any files you find and it should train normally.
import glob
img_paths = glob.glob(os.path.join(<path_to_dataset>,'*/*.*') # assuming you point to the directory containing the label folders.
bad_paths = []
for image_path in img_paths:
try:
img_bytes = tf.io.read_file(path)
decoded_img = tf.io.decode_image(img_bytes)
except tf.errors.InvalidArgumentError as e:
print(f"Found bad path {image_path}...{e}")
bad_paths.append(image_path)
print(f"{image_path}: OK")
print("BAD PATHS:")
for bad_path in bad_paths:
print(f"{bad_path}")

This is in fact a corrupted file problem. However, the underlying issue is far more subtle. Here is an explanation of what is going on and how to circumvent this obstacle. I encountered the very same problem on the very same MIT Indoor Scene Classification dataset. All the images are JPEG files (spoiler alert: well, are they?).
It has been correctly noted that the exception is raised exactly here, in a C++ file related to the tf.io.decode_image() function. It is the decode_image() function where the issue lies, which is called by the
tf.keras.utils.image_dataset_from_directory().
On the other hand, tf.keras.preprocessing.image.ImageDataGenerator().flow_from_directory() relies on Pillow under the hood (shown here, which is called from here). This is the reason why adopting the ImageDataGenerator class works.
After closer inspection of the corresponding C++ source file, one can observe that the function is actually called DecodeBmpV2(...), as defined here. This raises the question of why a JPEG image is being treated as a BMP one. The aforementioned function is actually called here, as part of a basic switch statement the aim of which is further direct data conversion according to the determined type. Thus, the piece of code that determines the file type should be subjected to deeper analysis. The file type is determined according to the value of starting bytes (see here). Long story short, a simple comparison of so-called magic bytes that signify file type is performed.
Here is a code extract with the corresponding magic bytes.
static const char kPngMagicBytes[] = "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A";
static const char kGifMagicBytes[] = "\x47\x49\x46\x38";
static const char kBmpMagicBytes[] = "\x42\x4d";
static const char kJpegMagicBytes[] = "\xff\xd8\xff";
After identifying which files raise the exception, I saw that they were supposed to be JPEG files, however, their starting bytes indicated a BMP format instead.
Here is an example of 3 files and their first 10 bytes.
laundromat\laundry_room_area.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_Edens1A.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_bmp.jpg
b'424d3800030000000000'
Look at the last one. It even contains the word bmp in the file name. Why is that so? I do not know. The dataset does contain corrupted image files. Someone probably converted the file from BMP to JPEG, yet the tool used did not work correctly. We can just guess the real reason, but that is now irrelevant.
The method by which the file type is determined is different from the one performed by the Pillow package, thus, there is nothing we can do about it. The recommendation is to identify the corrupted files, which is actually easy or to rely on the ImageDataGenerator. However, I would advise against doing so as this class has been marked as deprecated. It is not a bug in code per se, but rather bad data inadvertently introduced into the dataset.

KeyError: 'Failed to format this callback filepath: Reason: \'lr\''

I recently switched form Tensorflow 2.2.0 to 2.4.1 and now I have a problem with ModelCheckpoint callback path. This code works fine if I use an environment with tf 2.2 but get an error when I use tf 2.4.1.
checkpoint_filepath = 'path_to/temp_checkpoints/model/epoch-{epoch}_loss-{lr:.2e}_loss-{val_loss:.3e}'
checkpoint = ModelCheckpoint(checkpoint_filepath, monitor='val_loss')
history = model.fit(training_data, training_data,
epochs=10,
batch_size=32,
shuffle=True,
validation_data=(validation_data, validation_data),
verbose=verbose, callbacks=[checkpoint])
Error:
KeyError: 'Failed to format this callback filepath: "path_to/temp_checkpoints/model/epoch-{epoch}_loss-{lr:.2e}_loss-{val_loss:.3e}". Reason: 'lr''

In ModelCheckpoint, formatted name of filepath argument, can only be contain: epoch + keys in logs after epoch ends.
You can see available keys in logs like this:
class CustomCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs=None):
keys = list(logs.keys())
print("Log keys: {}".format(keys))
model.fit(..., callbacks=[CustomCallback()])
If you run code above, you will see something like this:
Log keys: ['loss', 'mean_absolute_error', 'val_loss', 'val_mean_absolute_error']
Which shows you available keys you can use (plus epoch) and lr is not available for you (You have used 3 keys: epoch, lr and val_loss in filepath name).
Solution:
You can add learning rate to logs yourself:
import tensorflow.keras.backend as K
class CustomCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs=None):
logs.update({'lr': K.eval(self.model.optimizer.lr)})
keys = list(logs.keys())
print("Log keys: {}".format(keys)) #you will see now `lr` available
checkpoint_filepath = 'path_to/temp_checkpoints/model/epoch-{epoch}_loss-{lr:.2e}_loss-{val_loss:.3e}'
checkpoint = ModelCheckpoint(checkpoint_filepath, monitor='val_loss')
history = model.fit(training_data, training_data,
epochs=10,
batch_size=32,
shuffle=True,
validation_data=(validation_data, validation_data),
verbose=verbose, callbacks=[checkpoint, CustomCallback()])

Keras load_weights() not loading checkpoints

I have been following the RNN tutorial of Tensorflow
https://www.tensorflow.org/tutorials/text/text_generation
The model.load_weights() is not working, and is throwing the error
Traceback (most recent call last):
File "C:/Users/swati.srivastava/PycharmProjects/TensorFlow Practice/main.py", line 1232, in <module>
model.load_weights(tf.train.load_checkpoint("./training_checkpoints/ckpt_" + str(checkpoint_num)))
File "C:\Users\swati.srivastava\PycharmProjects\TensorFlow Practice\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2260, in load_weights
filepath, save_format = _detect_save_format(filepath)
File "C:\Users\swati.srivastava\PycharmProjects\TensorFlow Practice\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2868, in _detect_save_format
if saving_utils.is_hdf5_filepath(filepath):
File "C:\Users\swati.srivastava\PycharmProjects\TensorFlow Practice\venv\lib\site-packages\tensorflow\python\keras\saving\saving_utils.py", line 327, in is_hdf5_filepath
return (filepath.endswith('.h5') or filepath.endswith('.keras') or
AttributeError: 'tensorflow.python.util._pywrap_checkpoint_reader.C' object has no attribute 'endswith'
Process finished with exit code 1
My code is
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)
EMBEDDING_DIM = 256
RNN_UNITS = 1024
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
model = tf.keras.Sequential([
tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),
tf.keras.layers.LSTM(rnn_units,
return_sequences=True,
stateful=True,
recurrent_initializer='glorot_uniform'),
tf.keras.layers.Dense(vocab_size)
])
return model
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_prefix,
save_weights_only=True
)
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)
checkpoint_num = 2
model.load_weights(tf.train.load_checkpoint("./training_checkpoints/ckpt_" + str(checkpoint_num)))
model.build(tf.TensorShape([1, None]))
My project directory looks like
which means that the training checkpoints are created, and exist. None of the checkpoint files are empty.
The only solution I could find was at https://github.com/tensorflow/tensorflow/issues/38745, where it says to do save_weights_only=True, which I have already done.
I think it is some sort of version conflict, but am not sure.
Edit: Added the checkpoint_callback snippet. training_checkpoints directory is created as can be seen in the project directory image

The issue is that load_weights api expects an HDF5 format file, but as per the your code, you do not provide it.
I am assuming that you are using ModelCheckpoint API for checkpoint creation like below:
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
save_weights_only=True,
verbose=1)
Then you will just provide the checkpoint path which is in your case './training_checkpoints/'
model.load_weights(checkpoint_path)
You can look at the documentation for more details.

Tensorflow TensorBoard not showing acc, loss, acc_val, and loss_val only only epoch_accuracy and epoch_loss

I am looking to have a TensorBoard display graphs corresponding to acc, loss, acc_val, and loss_val but they do not appear for some reason. Here is what I am seeing.
I am looking to have this:
I am following the instruction here to be able to use tensorboard in a google colab notebook
This is the code used to generate the tensorboard:
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME),
histogram_freq=1,
write_graph=True,
write_grads=True,
batch_size=BATCH_SIZE,
write_images=True)
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=opt,
metrics=['accuracy']
)
# Train model
history = model.fit(
train_x, train_y,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_data=(validation_x, validation_y),
callbacks=[tensorboard]
)
How do I go about solving this issue? Any ideas? Your help is very much appreciated!

That's the intended behavior. If you want to log custom scalars such as a dynamic learning rate, you need to use the TensorFlow Summary API.
Retrain the regression model and log a custom learning rate. Here's how:
Create a file writer, using tf.summary.create_file_writer().
Define a custom learning rate function. This will be passed to the Keras LearningRateScheduler callback.
Inside the learning rate function, use tf.summary.scalar() to log the custom learning rate.
Pass the LearningRateScheduler callback to Model.fit().
In general, to log a custom scalar, you need to use tf.summary.scalar() with a file writer. The file writer is responsible for writing data for this run to the specified directory and is implicitly used when you use the tf.summary.scalar().
logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir + "/metrics")
file_writer.set_as_default()
def lr_schedule(epoch):
"""
Returns a custom learning rate that decreases as epochs progress.
"""
learning_rate = 0.2
if epoch > 10:
learning_rate = 0.02
if epoch > 20:
learning_rate = 0.01
if epoch > 50:
learning_rate = 0.005
tf.summary.scalar('learning rate', data=learning_rate, step=epoch)
return learning_rate
lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
model = keras.models.Sequential([
keras.layers.Dense(16, input_dim=1),
keras.layers.Dense(1),
])
model.compile(
loss='mse', # keras.losses.mean_squared_error
optimizer=keras.optimizers.SGD(),
)
training_history = model.fit(
x_train, # input
y_train, # output
batch_size=train_size,
verbose=0, # Suppress chatty output; use Tensorboard instead
epochs=100,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback, lr_callback],
)

Tensorflow Eager: Store and Restore Trainable variables

I've written a custom model with Tensorflow Eager(similar to this example). I want to store/restore just my trainable variables - something similar to the following non-eager logic. How can I do this within Eager?:
def store(self, sess_var, model_path):
if model_path is not None:
saver = tf.train.Saver(var_list=tf.trainable_variables())
save_path = saver.save(sess_var, model_path)
print("Model saved in path: %s" % save_path)
else:
print("Model path is None - Nothing to store")
def restore(self, sess_var, model_path):
if model_path is not None:
if os.path.exists("{}.index".format(model_path)):
saver = tf.train.Saver(var_list=tf.trainable_variables())
saver.restore(sess_var, model_path)
print("Model at %s restored" % model_path)
else:
print("Model path does not exist, skipping...")
else:
print("Model path is None - Nothing to restore")

Eager execution in TensorFlow encourages encapsulating model state in objects, for example in tf.keras.Model objects. The state of these objects (the "checkpointed" values of variables) can then be saved and restored using tf.contrib.eager.Checkpoint
Note that the tf.contrib.eager.Checkpoint class is compatible with both eager and graph execution.
You'll see this used in examples in the tensorflow repository such as this and this
Hope that helps.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

tensorflow warning - Found untraced functions such as lstm_cell_6_layer_call_and_return_conditional_losses - tensorflow

Try appending the extension to the file. save_weights_at = 'basic_lstm_model' For: save_weights_at = 'basic_lstm_model.h5'

save the model in .tf format if there is issue in saving the model model.save('you_model_name.tf')

Related

tensorflow - Invalid argument: Input size should match but they differ by 2

KeyError: 'Failed to format this callback filepath: Reason: \'lr\''

Keras load_weights() not loading checkpoints

Tensorflow TensorBoard not showing acc, loss, acc_val, and loss_val only only epoch_accuracy and epoch_loss

Tensorflow Eager: Store and Restore Trainable variables

Categories

Resources