Keras load_weights() not loading checkpoints

I have been following the TensorFlow RNN tutorial at
https://www.tensorflow.org/tutorials/text/text_generation
The call to model.load_weights() is not working and throws the following error:
Traceback (most recent call last):
File "C:/Users/swati.srivastava/PycharmProjects/TensorFlow Practice/main.py", line 1232, in <module>
model.load_weights(tf.train.load_checkpoint("./training_checkpoints/ckpt_" + str(checkpoint_num)))
File "C:\Users\swati.srivastava\PycharmProjects\TensorFlow Practice\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2260, in load_weights
filepath, save_format = _detect_save_format(filepath)
File "C:\Users\swati.srivastava\PycharmProjects\TensorFlow Practice\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2868, in _detect_save_format
if saving_utils.is_hdf5_filepath(filepath):
File "C:\Users\swati.srivastava\PycharmProjects\TensorFlow Practice\venv\lib\site-packages\tensorflow\python\keras\saving\saving_utils.py", line 327, in is_hdf5_filepath
return (filepath.endswith('.h5') or filepath.endswith('.keras') or
AttributeError: 'tensorflow.python.util._pywrap_checkpoint_reader.C' object has no attribute 'endswith'
Process finished with exit code 1
My code is
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)
EMBEDDING_DIM = 256
RNN_UNITS = 1024
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)
checkpoint_num = 2
model.load_weights(tf.train.load_checkpoint("./training_checkpoints/ckpt_" + str(checkpoint_num)))
model.build(tf.TensorShape([1, None]))
My project directory looks like this:
[image: project directory]
This shows that the training checkpoints are created and exist. None of the checkpoint files are empty.
The only solution I could find was at https://github.com/tensorflow/tensorflow/issues/38745, which says to set save_weights_only=True, and I have already done that.
I think it is some sort of version conflict, but I am not sure.
Edit: Added the checkpoint_callback snippet. training_checkpoints directory is created as can be seen in the project directory image

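The issue is that load_weights() expects a file path given as a string (either a TensorFlow checkpoint prefix or an HDF5 file), but your code passes it the checkpoint reader object returned by tf.train.load_checkpoint(). That is why the traceback ends with an object that has no attribute 'endswith'.
I am assuming that you are using the ModelCheckpoint API for checkpoint creation, like below:
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
Then you just pass the checkpoint path as a string. In your case that is the prefix "./training_checkpoints/ckpt_" + str(checkpoint_num), or the value returned by tf.train.latest_checkpoint(checkpoint_dir):
model.load_weights(checkpoint_path)
You can look at the documentation for more details.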
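A minimal sketch of the point above, assuming the checkpoints were written by ModelCheckpoint with save_weights_only=True in the TensorFlow checkpoint format, so that each epoch leaves a ckpt_<epoch>.index file next to its data shards. The helper name checkpoint_prefix is hypothetical. It uses only the standard library and returns the plain string that model.load_weights() expects.

```python
import os


def checkpoint_prefix(checkpoint_dir, epoch):
    """Return the checkpoint prefix for `epoch` as a plain string.

    Raises FileNotFoundError if no matching .index file exists, which
    gives a clearer message than a failure deep inside load_weights().
    """
    prefix = os.path.join(checkpoint_dir, f"ckpt_{epoch}")
    if not os.path.isfile(prefix + ".index"):
        raise FileNotFoundError(f"No checkpoint found for prefix {prefix!r}")
    return prefix


# Usage in the question's script (not executed here):
#   model.load_weights(checkpoint_prefix("./training_checkpoints", 2))
```

The important part is that the argument is a str, not the object returned by tf.train.load_checkpoint().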

Related

tensorflow - Invalid argument: Input size should match but they differ by 2

I am trying to train a deep learning model with tf.keras. I have 67 classes of images inside the image directory, such as airports, bookstore, and casino, and for each class I have at least 100 images. The data is from the MIT Indoor Scene dataset. But when I try to train the model, I constantly get this error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_7]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_1570]
Function call stack:
train_function -> train_function
I tried to resolve the problem by resizing the images with the Resizing layer. I also set labels='inferred' and label_mode='categorical' in the image_dataset_from_directory call and loss='categorical_crossentropy' in model.compile. Previously labels and label_mode were not set and the loss was sparse_categorical_crossentropy, which I think is not right, so I changed them as described above. But I am still having problems.
There is one related question on Stack Overflow, but the person did not mention how he solved the problem. He only posted this update: "My suggestion is to check the metadata of the dataset. It helped to fix my problem." He did not say what metadata to look for or what he did to solve the problem.
The code that I am using to train the model -
import os
import PIL
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers import Flatten, Dropout, BatchNormalization, Rescaling
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.regularizers import l1, l2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# define directory paths
PROJECT_PATH = Path.cwd()
DATA_PATH = PROJECT_PATH.joinpath('data', 'Images')
# create a dataset
batch_size = 32
img_height = 180
img_width = 180
train = tf.keras.utils.image_dataset_from_directory(
    DATA_PATH,
    validation_split=0.2,
    subset="training",
    labels="inferred",
    label_mode="categorical",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
valid = tf.keras.utils.image_dataset_from_directory(
    DATA_PATH,
    validation_split=0.2,
    subset="validation",
    labels="inferred",
    label_mode="categorical",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
class_names = train.class_names
for image_batch, label_batch in train.take(1):
    print("\nImage shape:", image_batch.shape)
    print("Label Shape", label_batch.shape)
# resize image
resize_layer = tf.keras.layers.Resizing(img_height, img_width)
train = train.map(lambda x, y: (resize_layer(x), y))
valid = valid.map(lambda x, y: (resize_layer(x), y))
# standardize the data
normalization_layer = tf.keras.layers.Rescaling(1./255)
train = train.map(lambda x, y: (normalization_layer(x), y))
valid = valid.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(train))
first_image = image_batch[0]
print("\nImage (min, max) value:", (np.min(first_image), np.max(first_image)))
print()
# configure the dataset for performance
AUTOTUNE = tf.data.AUTOTUNE
train = train.cache().prefetch(buffer_size=AUTOTUNE)
valid = valid.cache().prefetch(buffer_size=AUTOTUNE)
# create a basic model architecture
num_classes = len(class_names)
# initiate a sequential model
model = Sequential()
# CONV1
model.add(Conv2D(filters=64, kernel_size=3, activation="relu",
input_shape=(img_height, img_width, 3)))
model.add(BatchNormalization())
# CONV2
model.add(Conv2D(filters=64, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# Pool + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# CONV3
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# CONV4
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# POOL + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# FC5
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(num_classes, activation="softmax"))
# compile the model
model.compile(loss="categorical_crossentropy",
optimizer="adam", metrics=['accuracy'])
# train the model
epochs = 25
early_stopping_cb = EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train, validation_data=valid, epochs=epochs,
callbacks=[early_stopping_cb], verbose=2)
result = pd.DataFrame(history.history)
print()
print(result.head())
Note -
I simplified the code as much as possible to narrow down the error. The model ran for a few batches, then hit the same error again.
Epoch 1/10
732/781 [===========================>..] - ETA: 22s - loss: 3.7882Traceback (most recent call last):
File ".\02_model1.py", line 139, in <module>
model.fit(train, epochs=10, validation_data=valid)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\keras\engine\training.py", line 1184, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 917, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 3039, in __call__
return graph_function._call_flat(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
outputs = execute.execute(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_2]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_11840]
Function call stack:
train_function -> train_function
Modified code -
# create a dataset
batch_size = 16
img_height = 256
img_width = 256
train = image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="training",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
valid = image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="validation",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
model = tf.keras.applications.Xception(
weights=None, input_shape=(img_height, img_width, 3), classes=67)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(train, epochs=10, validation_data=valid)

I think it might be a corrupted file. It is throwing an exception after a data integrity check in the DecodeBMPv2 function (https://github.com/tensorflow/tensorflow/blob/0b6b491d21d6a4eb5fbab1cca565bc1e94ca9543/tensorflow/core/kernels/image/decode_image_op.cc#L594)
If that's the issue and you want to find out which file(s) are throwing the exception, you can try something like this below on the directory containing the files. Remove/replace any files you find and it should train normally.
import glob
import os
import tensorflow as tf

# Point this at the directory containing the label folders.
path_to_dataset = "<path_to_dataset>"
img_paths = glob.glob(os.path.join(path_to_dataset, '*/*.*'))
bad_paths = []
for image_path in img_paths:
    try:
        img_bytes = tf.io.read_file(image_path)
        decoded_img = tf.io.decode_image(img_bytes)
    except tf.errors.InvalidArgumentError as e:
        # Decoding failed, so record the file for later removal/replacement.
        print(f"Found bad path {image_path}...{e}")
        bad_paths.append(image_path)
    else:
        print(f"{image_path}: OK")
print("BAD PATHS:")
for bad_path in bad_paths:
    print(f"{bad_path}")

This is in fact a corrupted file problem. However, the underlying issue is far more subtle. Here is an explanation of what is going on and how to circumvent this obstacle. I encountered the very same problem on the very same MIT Indoor Scene Classification dataset. All the images are JPEG files (spoiler alert: well, are they?).
It has been correctly noted that the exception is raised in a C++ file related to the tf.io.decode_image() function. The issue lies in decode_image(), which is called by tf.keras.utils.image_dataset_from_directory().
On the other hand, tf.keras.preprocessing.image.ImageDataGenerator().flow_from_directory() relies on Pillow under the hood. This is the reason why adopting the ImageDataGenerator class works.
After closer inspection of the corresponding C++ source file, one can observe that the function is actually called DecodeBmpV2(...). This raises the question of why a JPEG image is being treated as a BMP one. That function is called as part of a basic switch statement that dispatches the data conversion according to the determined file type. Thus, the piece of code that determines the file type deserves deeper analysis. The file type is determined from the value of the starting bytes. Long story short, a simple comparison of so-called magic bytes that signify the file type is performed.
Here is a code extract with the corresponding magic bytes.
static const char kPngMagicBytes[] = "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A";
static const char kGifMagicBytes[] = "\x47\x49\x46\x38";
static const char kBmpMagicBytes[] = "\x42\x4d";
static const char kJpegMagicBytes[] = "\xff\xd8\xff";
After identifying which files raise the exception, I saw that they were supposed to be JPEG files, however, their starting bytes indicated a BMP format instead.
Here is an example of 3 files and their first 10 bytes.
laundromat\laundry_room_area.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_Edens1A.jpg
b'ffd8ffe000104a464946'
laundromat\Laundry_Room_bmp.jpg
b'424d3800030000000000'
Look at the last one. It even contains the word bmp in the file name. Why is that so? I do not know. The dataset does contain corrupted image files. Someone probably converted the file from BMP to JPEG, yet the tool used did not work correctly. We can just guess the real reason, but that is now irrelevant.
The method by which TensorFlow determines the file type differs from the one used by the Pillow package, so there is nothing we can do about it. The recommendation is either to identify the corrupted files, which is actually easy, or to rely on ImageDataGenerator. However, I would advise against the latter, as this class has been marked as deprecated. It is not a bug in the code per se, but rather bad data inadvertently introduced into the dataset.
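The magic-byte comparison described above can be sketched in plain Python, without TensorFlow. This is a rough illustration rather than the library's actual logic: the byte prefixes are copied from the C++ constants quoted earlier, while the names sniff_format, find_mismatched_files, and EXTENSION_TO_FORMAT are hypothetical, and the extension mapping is an assumption about how the dataset names its files.

```python
import os

# Magic-byte prefixes, copied from the C++ constants quoted above.
MAGIC_BYTES = {
    "png": b"\x89\x50\x4e\x47\x0d\x0a\x1a\x0a",
    "gif": b"\x47\x49\x46\x38",
    "bmp": b"\x42\x4d",
    "jpeg": b"\xff\xd8\xff",
}

# Assumed mapping from file extension to the format the file claims to be.
EXTENSION_TO_FORMAT = {
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".png": "png",
    ".gif": "gif",
    ".bmp": "bmp",
}


def sniff_format(header):
    """Return the format whose magic bytes prefix `header`, or None."""
    for name, magic in MAGIC_BYTES.items():
        if header.startswith(magic):
            return name
    return None


def find_mismatched_files(root):
    """List (path, claimed, actual) for files under `root` whose
    extension disagrees with their leading bytes."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            extension = os.path.splitext(filename)[1].lower()
            claimed = EXTENSION_TO_FORMAT.get(extension)
            if claimed is None:
                continue  # not an image extension we know about
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as f:
                actual = sniff_format(f.read(8))  # longest prefix is 8 bytes
            if actual != claimed:
                mismatches.append((path, claimed, actual))
    return mismatches
```

Running find_mismatched_files on the dataset root would flag a file such as Laundry_Room_bmp.jpg, whose extension says JPEG while its first two bytes are the BMP signature.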

Keras model compiles well outside SageMaker, but as soon as I try to train it in SageMaker with the TensorFlow instance I get an error

Here is the error: ValueError: Output tensors to a Model must be the output of a TensorFlow Layer (thus holding past layer metadata)
I am trying to train and deploy a multi-input Keras model with AWS SageMaker, but there seem to be some showstopper issues with the required libraries, which expect a single input for Keras models.
I have 3 categorical input variables and one numeric variable. The target variable is also categorical. I have no test or validation data; I am only interested in training without errors.
I merged the arrays after data preparation as follows and then stored them in S3:
input_train = np.column_stack((input_cat1, input_cat2, input_num, input_cat3))
training_input_path = sage_maker_session.upload_data('data/training.npz', key_prefix=prefix + training_folder)
print(training_input_path)
s3://sagemaker-eu-central-1-xxxxxxxxxxxxx/user_tracking/training/training.npz
In the train.py script (entry_point), I fetch the file from S3 again. When I run train.py as if I were outside SageMaker, it works without problems.
%%writefile train.py
### import library ###
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=60)
    parser.add_argument('--batch-size', type=int, default=50)
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    #parser.add_argument('--model-dir', type=str)
    parser.add_argument('--training', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
    #parser.add_argument('--training', type=str, default='data')
    args, _ = parser.parse_known_args()
    epochs = args.epochs
    batch_size = args.batch_size
    model_dir = args.model_dir
    training_dir = args.training
    input_train = np.load(os.path.join(training_dir, 'training.npz'))['train_input']
    target = np.load(os.path.join(training_dir, 'training.npz'))['train_output']
    input_cat1 = input_train[:,0].astype(np.int32)
    input_cat2 = input_train[:,1].astype(np.int32)
    input_cat3 = input_train[:,3:].astype(np.int32)
    input_num = input_train[:,2].astype(np.float32)
    n_steps = 2 # number of timesteps in each sample
    num_unique_os = 5 #len(le_betriebsystem.classes_)+1
    num_unique_browser = 10 #len(le_browser.classes_)+1
    num_unique_actions = 210 #len(le_actionen.classes_)+1
    #numeric Input
    numerical_input = tf.keras.Input(shape=(1,), name='numeric_input')
    #categorical Input
    os_input = tf.keras.Input(shape=(1,), name='os_input')
    browser_input = tf.keras.Input(shape=(1,), name='browser_input')
    action_input = tf.keras.Input(shape=(max_seq_len,), name='action_input')
    emb_os = tf.keras.layers.Embedding(num_unique_os, 32)(os_input)
    emb_browser = tf.keras.layers.Embedding(num_unique_browser, 32)(browser_input)
    emb_actions = tf.keras.layers.Embedding(num_unique_actions, 64)(action_input)
    actions_repr = tf.keras.layers.LSTM(300, return_sequences=True)(emb_actions)
    actions_repr = tf.keras.layers.LSTM(200)(emb_actions)
    emb_os = tf.squeeze(emb_os, axis=1)
    emb_browser = tf.squeeze(emb_browser, axis=1)
    activity_repr = tf.keras.layers.Concatenate()([emb_os, emb_browser, actions_repr,
                                                   numerical_input])
    x = tf.keras.layers.RepeatVector(n_steps)(activity_repr)
    x = tf.keras.layers.LSTM(288, return_sequences=True)(x)
    next_n_actions = tf.keras.layers.Dense(num_unique_actions-1, activation='softmax')(x)
    model = tf.keras.Model(inputs=[numerical_input, os_input, browser_input, action_input], outputs =
                           next_n_actions)
    model.summary()
    model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
    history = model.fit({'numeric_input': input_num,
                         'os_input': input_cat1,
                         'browser_input': input_cat2,
                         'action_input': input_cat3}, target, batch_size=50, epochs=130)
    tf.saved_model.simple_save(
        tf.keras.backend.get_session(),
        os.path.join(model_dir, '1'),
        inputs={'inputs': model.input},
        outputs={t.name: t for t in model.outputs})
I received this:
[image: Model Summary]
[image: Metric Tendency]
When I tried to do the whole thing again with the TensorFlow instance, the following error occurred:
Traceback (most recent call last):
File "train.py", line 105, in <module>
model = tf.keras.Model(inputs=[numerical_input, os_input, browser_input, action_input], outputs = next_n_actions)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 121, in __init__
super(Model, self).__init__(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 80, in __init__
self._init_graph_network(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpointable/base.py", line 474, in _method_wrapper
method(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 224, in _init_graph_network
'(thus holding past layer metadata). Found: ' + str(x))
ValueError: Output tensors to a Model must be the output of a TensorFlow Layer (thus holding past layer metadata). Found: Tensor("dense/truediv:0", shape=(?, 2, 209), dtype=float32)
2021-03-08 21:52:04,761 sagemaker-containers ERROR ExecuteUserScriptError: Command "/usr/bin/python train.py --batch-size 50 --epochs 150--model_dir s3://sagemaker-eu-central-1-xxxxxxxxxxxxxxxxx/sagemaker-tensorflow-scriptmode
I used TensorFlow versions '2.0.4' and '1.15.4', respectively, with the kernels conda_tensorflow_p36 and conda_tensorflow2_p36.
For more of the code: https://gitlab.com/patricksardin08/data-science/-/tree/master/
Please, I need your help. I'm here around the clock if anyone wants me to explain the question in more detail.

L2-normalization with Keras Backend?

I'd like to normalize the inputs going into my neural network but, as I'm defining my model in this way:
df = pd.read_csv(r'C:\Users\Davide Mori\PycharmProjects\pythonProject\Dataset.csv')
print(df)
target_column = ['W_mag', 'W_phase']
predictors = list(set(list(df.columns)) - set(target_column))
X = df[predictors].values
Y = df[target_column].values
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(1000, input_dim=n_inputs, activation='relu'))
    #model.add(Lambda(lambda x: K.l2_normalize(x, axis=1)))
    model.add(Dense(1000, activation='linear', activity_regularizer=regularizers.l1(0.0001)))
    model.add(Activation('relu'))
    model.add(Dense(n_outputs, activation='linear'))
    model.compile(optimizer="adam", loss="mean_squared_error", metrics=["mean_squared_error"])
    model.summary()
    return model
n_inputs, n_outputs = X.shape[1], Y.shape[1]
model = get_model(n_inputs, n_outputs)
# fit the model on all data
model.fit(X, Y, epochs=100, batch_size=1)
How do I apply the Lambda layer to my inputs? Isn't the position of the commented line wrong? If I put the Lambda layer there, I am normalizing what has already been transformed by the first hidden layer, right? How can I solve this problem?
This is the error I get when putting the Lambda layer before everything else:
2020-10-12 15:08:46.036872: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Program Files\JetBrains\PyCharm 2020.2.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2020.2.2\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/Davide Mori/PycharmProjects/pythonProject/prova_rete_sfs.py", line 60, in <module>
model = get_model(n_inputs, n_outputs)
File "C:/Users/Davide Mori/PycharmProjects/pythonProject/prova_rete_sfs.py", line 52, in get_model
model.summary()
File "C:\Users\Davide Mori\Anaconda3\envs\pythonProject\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1302, in summary
raise ValueError('This model has not yet been built. '
ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.
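As an alternative to the Lambda layer, the same row-wise L2 normalization can be applied to X before calling model.fit. The sketch below uses NumPy only. The function name l2_normalize_rows is hypothetical, and the epsilon guard is an assumption modeled on how backend L2 normalization avoids division by zero.

```python
import numpy as np


def l2_normalize_rows(x, epsilon=1e-12):
    """Scale each row of `x` to unit L2 norm (axis=1).

    Rows whose squared norm is below `epsilon` are divided by
    sqrt(epsilon) instead, so all-zero rows stay zero.
    """
    x = np.asarray(x, dtype=np.float64)
    squared_norms = np.sum(np.square(x), axis=1, keepdims=True)
    return x / np.sqrt(np.maximum(squared_norms, epsilon))


# Usage with the question's data (not executed here):
#   X = l2_normalize_rows(df[predictors].values)
```

If the Lambda layer is kept as the first layer instead, the error message above suggests giving that first layer an input_shape argument so the model can be built before summary() is called.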

Unable to create group (name already exists)

import tensorflow as tf
from ..models.encoder import encoder_build
from ..models.decoder import decoder_build
def compute_attention_maps(inputs,name,upsample=False):
    attMap = tf.reduce_sum(tf.square(inputs),axis=-1,keepdims=True,name= str(name)+"reducSum")
    if upsample:
        attMap = tf.keras.layers.UpSampling2D(size=(2, 2),
                                              interpolation='bilinear',
                                              name = str(name)+"bilinear")(attMap)
    attMap = tf.squeeze(attMap,axis=-1,name = str(name)+"squeeze")
    attMap = tf.reshape(attMap,
                        (tf.shape(attMap)[0],tf.shape(attMap)[1]*tf.shape(attMap)[2]),
                        name = str(name)+"reshape")
    attMap = tf.nn.softmax(attMap,
                           axis=-1,
                           name = str(name)+"spatialSoftmax")
    return attMap
def compute_mse(x,y,name):
    diff = tf.math.squared_difference(x,y,name = str(name)+"squError")
    diff = tf.reduce_mean(diff,axis=0, name = str(name)+"mean")
    diff = tf.reduce_sum(diff, name = str(name)+"sum")
    return diff
def compute_distillation(attention_inputs):
    inp1,inp2,inp3,inp4 = attention_inputs
    attMap1 = compute_attention_maps(inp1,"attmap1_")
    attMap2_upsample = compute_attention_maps(inp2,"attmap2UP_",upsample=True)
    attMap2 = compute_attention_maps(inp2,"attmap2_")
    attMap3_upsample = compute_attention_maps(inp3,"attmap3UP_",upsample=True)
    attMap3 = compute_attention_maps(inp3,"attmap3_")
    attMap4 = compute_attention_maps(inp4,"attmap4_")
    distillation1 = compute_mse(attMap1,attMap2_upsample,"distil1_")
    distillation2 = compute_mse(attMap2,attMap3_upsample,"distil2_")
    distillation3 = compute_mse(attMap3,attMap4,"distil3_")
    return tf.math.add_n([distillation1,distillation2,distillation3], name="distill_loss")
if __name__ == '__main__':
    inputs = tf.keras.layers.Input(shape=(None, None, 3), name='image')
    encoderTuple = encoder_build(inputs) # import from encoder.py file
    attention_inputs = encoderTuple[1]
    outputs = decoder_build(encoderTuple) # import from decoder.py file
    model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
    model.add_loss(compute_distillation(attention_inputs))
    model.summary()
    model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate=0.00001, clipnorm=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    model.fit(x = train_generator,
              epochs=epochs,
              verbose=1,
              callbacks=callbacks,
              validation_data=validation_generator,
              shuffle=True)
I have created a Keras segmentation model for lane detection (https://arxiv.org/pdf/1908.00821.pdf). I am able to compile it, start training, and save the model after each epoch without any errors. But if I add my custom loss to the model with model.add_loss(compute_distillation(attention_inputs)), the model trains for 1 epoch, after which it fails to save and displays the error below. How can I resolve this error?
374/375 [============================>.] - ETA: 0s - loss: 4.4717 - acc: 0.9781Epoch 1/50
78/78[============================>.] - ETA: 37:38 - val_loss: 4.5855 - val_acc: 0.9758
Epoch 00001: saving model to /workspace/work/enet_sad_naiveresize/snapshot/enetNRSAD_Tusimple_L_4.4718_VL_4.5855.h5
Traceback (most recent call last):
File "/workspace/work/enet_sad_naiveresize/bin/train.py", line 82, in <module>
shuffle=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 332, in model_iteration
callbacks.on_epoch_end(epoch, epoch_logs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 299, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 968, in on_epoch_end
self._save_model(epoch=epoch, logs=logs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 1015, in _save_model
self.model.save(filepath, overwrite=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1171, in save
signatures)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 109, in save_model
model, filepath, overwrite, include_optimizer)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 103, in save_model_to_hdf5
save_weights_to_hdf5_group(model_weights_group, model_layers)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 619, in save_weights_to_hdf5_group
g = f.create_group(layer.name)
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/group.py", line 68, in create_group
gid = h5g.create(self.id, name, lcpl=lcpl, gcpl=gcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5g.pyx", line 161, in h5py.h5g.create
ValueError: Unable to create group (name already exists)

The issue arises because you are stacking layers (and naming them wrongly) in the compute_distillation function by calling other functions such as compute_attention_maps and compute_mse.
You would have gotten a similar error even if you had not named them, and the error persists after naming them because the h5 format expects names in a certain format, as explained here: https://github.com/keras-team/keras/issues/12195.
A good solution would be to use Keras Lambda layers in the compute_distillation function to create attMap1, attMap2, etc., or to define your own custom AttentionMaps layer as shown below.
class AttentionMaps(tf.keras.layers.Layer):
    def __init__(self, upsample=False):
        super(AttentionMaps, self).__init__()
        self.upsample = upsample
    def call(self, inputs):
        attMap = tf.reduce_sum(
            tf.square(inputs),
            axis=-1,
            keepdims=True
        )
        if self.upsample:
            attMap = tf.keras.layers.UpSampling2D(
                size=(2, 2),
                interpolation='bilinear'
            )(attMap)
        attMap = tf.squeeze(attMap,axis=-1)
        attMap = tf.reshape(
            attMap,
            (tf.shape(attMap)[0],tf.shape(attMap)[1]*tf.shape(attMap)[2]))
        attMap = tf.nn.softmax(attMap,
                               axis=-1,)
        return attMap
This custom layer can then be added to your model as per the example below. The names of the layers are no longer required, so I removed them (this assumes compute_mse is likewise changed to drop its name argument).
def compute_distillation(attention_inputs):
    inp1,inp2,inp3,inp4 = attention_inputs
    attention_layer_1 = AttentionMaps()
    attMap1 = attention_layer_1(inp1)
    attention_layer_2 = AttentionMaps(upsample=True)
    attMap2_upsample = attention_layer_2(inp2)
    attention_layer_3 = AttentionMaps()
    attMap2 = attention_layer_3(inp2)
    attention_layer_4 = AttentionMaps(upsample=True)
    attMap3_upsample = attention_layer_4(inp3)
    attention_layer_5 = AttentionMaps()
    attMap3 = attention_layer_5(inp3)
    # inp4 is not upsampled, matching the question's original code
    attention_layer_6 = AttentionMaps()
    attMap4 = attention_layer_6(inp4)
    distillation1 = compute_mse(attMap1,attMap2_upsample)
    distillation2 = compute_mse(attMap2,attMap3_upsample)
    distillation3 = compute_mse(attMap3,attMap4)
    return tf.math.add_n([distillation1,distillation2,distillation3], name="distill_loss")

There are a few issues opened in the keras github related to this.
https://github.com/keras-team/keras/issues/6005
https://github.com/keras-team/keras/issues/12195
This issue is not due to the custom loss function but due to the way the model is defined.
You could try out the solutions provided in the above mentioned links such as saving the model weights as a tf file rather than h5 or avoid adding the same instance of activation layer at multiple places in the model. If that doesn't resolve your issue please update the question to include the models.
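Before changing the save format, it may help to check whether any layer names collide. The sketch below is a hedged, standard-library-only illustration: the function name find_hdf5_name_conflicts is hypothetical, and the prefix check rests on the assumption that each layer name becomes an HDF5 group path, so a name such as "block/relu" implicitly creates a group "block" that can clash with a layer named "block", depending on creation order.

```python
from collections import Counter


def find_hdf5_name_conflicts(layer_names):
    """Return (duplicates, prefix_conflicts) for a list of layer names.

    duplicates: names that occur more than once.
    prefix_conflicts: (short, long) pairs where `long` starts with
    `short + "/"`, so both would map to the same HDF5 group path.
    """
    counts = Counter(layer_names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    unique = sorted(counts)
    prefix_conflicts = sorted(
        (short, long)
        for short in unique
        for long in unique
        if long.startswith(short + "/")
    )
    return duplicates, prefix_conflicts


# Usage with a Keras model (not executed here):
#   find_hdf5_name_conflicts([layer.name for layer in model.layers])
```

An empty result for both lists does not rule out every cause of the error, but a non-empty one points directly at the names to change.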

Save the Keras model error: AttributeError: 'numpy.dtype' object has no attribute 'item'

I have tried to save my Keras model in pycharm where I got the error, this is how I created the model:
main_input = Input(shape=(X_train.shape[1],), dtype=X_train.dtype,
name='main_input')
xx = Embedding(output_dim=512, input_dim=3000, input_length=len(X))(main_input)
xx= SpatialDropout1D(0.4)(xx)
lstm_out = LSTM(64)(xx)
#lstm_out = Dense(3,activation='softmax')(lstm_out)
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
auxiliary_input = Input(shape=(Z_train.shape[1],), name='aux_input')
auxB= Input(shape=(hasB_train.shape[1],), name='aux_B')
auxM = Input(shape=(hasM_train.shape[1],), name='aux_M')
auxBM_input = keras.layers.concatenate([ auxB, auxM])
auxiliary_output = Dense(3, activation='softmax', name='aux_output') (lstm_out)
auxBM_output = Dense(3, activation='softmax', name='auxBM_output') (auxBM_input)
x = keras.layers.concatenate([lstm_out, auxiliary_input, auxBM_input])
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
main_output = Dense(3, activation='sigmoid', name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input, auxB, auxM], outputs= [main_output, auxiliary_output, auxBM_output])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy' ,metrics = ['accuracy'], loss_weights=[4, 1, 10])
model.summary()
When I run model.save('model.h5'), I receive the error below:
Traceback (most recent call last):
File "C:/.../ENV/newDataset/combined3.py", line 209, in <module>
model.save('blah.h5')
File "C:\ProgramData\Anaconda2\envs\Building_Deep_Learning_Keras\lib\site-packages\keras\engine\network.py", line 1085, in save
save_model(self, filepath, overwrite, include_optimizer)
File "C:\ProgramData\Anaconda2\envs\Building_Deep_Learning_Keras\lib\site-packages\keras\engine\saving.py", line 117, in save_model
}, default=get_json_type).encode('utf8')
File "C:\ProgramData\Anaconda2\envs\Building_Deep_Learning_Keras\lib\json\__init__.py", line 237, in dumps
**kw).encode(obj)
File "C:\ProgramData\Anaconda2\envs\Building_Deep_Learning_Keras\lib\json\encoder.py", line 198, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\ProgramData\Anaconda2\envs\Building_Deep_Learning_Keras\lib\json\encoder.py", line 256, in iterencode
return _iterencode(o, 0)
File "C:\ProgramData\Anaconda2\envs\Building_Deep_Learning_Keras\lib\site-packages\keras\engine\saving.py", line 84, in get_json_type
return obj.item()
AttributeError: 'numpy.dtype' object has no attribute 'item'
I have no problem, if I run the below code:
model = Sequential()
model.add(Embedding(max_fatures, embed_dim,input_length = X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(3,activation='softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
X_train, X_test, Y_train, Y_test = train_test_split(X,Y,train_size=0.8, random_state = 42)
model.fit(X_train, Y_train, epochs = 1, batch_size=32,shuffle=True)
model.save('test.h5')

I believe you are having this issue because of how Keras handles the dtype argument when you are creating a Functional Model. Keras expects the dtype to be just a simple string and not a numpy.dtype object, and therefore, it will have difficulty saving the model when you pass a numpy object into this argument.
To adjust, I would use one of the strings to describe the data input type, as suggested at https://keras.io/backend/.
I had a similar issue, and when I changed the dtype argument to what Keras was expecting (a string), I was able to save the model without any additional problem.
To fix your issue, I would suggest changing the dtype=X_train.dtype argument to dtype=X_train.dtype.name, as this produces the string form of the dtype, which Keras can handle.
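The difference between the two forms can be seen without Keras. The sketch below is only an analogy using the standard json module, not Keras's own serializer: x_train is a hypothetical stand-in for X_train, and the point is that a numpy.dtype object is not JSON-serializable while its .name string is.

```python
import json

import numpy as np

# Hypothetical stand-in for the question's X_train.
x_train = np.zeros((4, 10), dtype=np.int32)

dtype_object = x_train.dtype       # a numpy.dtype instance
dtype_string = x_train.dtype.name  # a plain Python string

# Plain json cannot serialize the dtype object...
try:
    json.dumps({"dtype": dtype_object})
    object_serializable = True
except TypeError:
    object_serializable = False

# ...but it handles the string form without trouble.
string_payload = json.dumps({"dtype": dtype_string})
```

Saving a model to HDF5 stores the layer configuration as JSON, which is why the string form passes through cleanly.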
