I trained my model and saved it like this:
network.save(os.path.join(args.logdir, "cifar_model.h5"),
             include_optimizer=False)
Now I would like to load it and continue training like this, but that doesn't work:
model = tf.keras.models.load_model("...\\cifar_model.h5", compile="False")
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001, decay=1e-6),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
)
model.tb_callback = tf.keras.callbacks.TensorBoard(args.logdir, update_freq=1000, profile_batch=1)
model.tb_callback.on_train_end = lambda *_: None
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
datagen.fit(cifar.train.data["images"])
model.fit_generator(
    # cifar.train.data["images"], cifar.train.data["labels"],
    datagen.flow(cifar.train.data["images"], cifar.train.data["labels"], batch_size=args.batch_size),
    # batch_size=args.batch_size,
    steps_per_epoch=200,
    epochs=args.epochs,
    validation_data=(cifar.dev.data["images"], cifar.dev.data["labels"]),
    callbacks=[model.tb_callback],
)
It throws an error:
AttributeError: 'Network' object has no attribute 'compile'
This should work according to https://www.tensorflow.org/alpha/guide/keras/saving_and_serializing
Note that I'm saving without the optimizer so I can avoid a bug with loading the optimizer.
UPDATE:
I found out how to do this when I know the exact structure of the layers, which I do. I can recreate the model and copy the weights over from the loaded model like this:
load = tf.keras.models.load_model("...\\cifar_model.h5", compile="False")
model.set_weights(load.get_weights())
But I couldn't apply the same approach via load.layers; I think that's only possible if you don't have sequential layers.
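For reference, a minimal sketch of that weight-transplant approach, assuming a hypothetical build_model() helper that recreates the exact original architecture (build_model is my placeholder, not from the original code):

import os
import tensorflow as tf

# Rebuild the architecture from scratch (hypothetical helper).
model = build_model()

# Load the saved model only to harvest its weights; note that compile
# expects a boolean, so pass compile=False rather than the string "False".
loaded = tf.keras.models.load_model(os.path.join(args.logdir, "cifar_model.h5"), compile=False)
model.set_weights(loaded.get_weights())

# Re-compile and resume training as usual.
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001, decay=1e-6),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
)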
I'm trying to train a sequence-to-sequence model for machine translation using Keras on a Google Colab TPU.
I have a dataset which I can load in memory, but I have to preprocess it to feed it to the model. In particular, I need to convert the target words to one-hot vectors, and with many examples I can't load the entire conversion in memory, so I need to make batches of data.
I'm using this function as a batch generator:
def generate_batch_bert(X_ids, X_masks, y, batch_size=1024):
    '''Generate a batch of data.'''
    while True:
        for j in range(0, len(X_ids), batch_size):
            # batch of encoder and decoder data
            encoder_input_data_ids = X_ids[j:j+batch_size]
            encoder_input_data_masks = X_masks[j:j+batch_size]
            y_decoder = y[j:j+batch_size]
            # decoder target and input for teacher forcing
            decoder_input_data = y_decoder[:, :-1]
            decoder_target_seq = y_decoder[:, 1:]
            # batch of decoder target data
            decoder_target_data = to_categorical(decoder_target_seq, vocab_size_fr)
            # keep only batches with the right number of instances for training on TPU
            if encoder_input_data_ids.shape[0] == batch_size:
                yield [encoder_input_data_ids, encoder_input_data_masks, decoder_input_data], decoder_target_data
The problem is that whenever I try to run the fit function as follows:
model.fit(x=generate_batch_bert(X_train_ids, X_train_masks, y_train, batch_size=batch_size),
          steps_per_epoch=train_samples // batch_size,
          epochs=epochs,
          callbacks=callbacks,
          validation_data=generate_batch_bert(X_val_ids, X_val_masks, y_val, batch_size=batch_size),
          validation_steps=val_samples // batch_size)
I get the following error:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:445 make_tensor_proto
raise ValueError("None values not supported.")
ValueError: None values not supported.
Not sure what's wrong and how I can solve this problem.
EDIT
I tried loading a smaller amount of data into memory so that the conversion of the target words to one-hot encoding doesn't crash the kernel, and it actually works. So there is obviously something wrong with how I generate batches.
It's hard to tell what's wrong since you don't provide your model definition or any sample data. However, I'm fairly certain that you're running into the same TensorFlow bug that I recently got bitten by.
The workaround is to use the tensorflow.data API, which works much better with TPUs. Like this:
from tensorflow.data import Dataset
import tensorflow as tf

def map_fn(X_id, X_mask, y):
    decoder_target_data = tf.one_hot(y[1:], vocab_size_fr)
    return (X_id, X_mask, y[:-1]), decoder_target_data

...

X_ids = Dataset.from_tensor_slices(X_ids)
X_masks = Dataset.from_tensor_slices(X_masks)
y = Dataset.from_tensor_slices(y)

ds = Dataset.zip((X_ids, X_masks, y)).map(map_fn).batch(1024)

model.fit(x=ds, ...)
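One detail worth noting: the original generator silently drops the final partial batch (the shape[0] == batch_size check), and TPUs generally need fixed batch shapes. The tf.data equivalent (my addition, not in the original answer) is drop_remainder:

# Drop the last partial batch, matching the generator's batch-size filter.
ds = Dataset.zip((X_ids, X_masks, y)).map(map_fn).batch(1024, drop_remainder=True)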
I'm trying to build a model with a DenseVariational layer so that it can report epistemic uncertainties, something like https://www.tensorflow.org/probability/examples/Probabilistic_Layers_Regression#figure_3_epistemic_uncertainty
The model training works just fine and now I would like to save the model and load it in a production environment. However, when I tried model.save('path/model.h5'), I got
Layer DenseVariational has arguments in `__init__` and therefore must override `get_config`.
Then I added
class CustomVariational(tfp.layers.DenseVariational):
    def get_config(self):
        config = super().get_config().copy()
        config.update({
            'units': self.units,
            'make_posterior_fn': self._make_posterior_fn,
            'make_prior_fn': self._make_prior_fn,
        })
        return config
but it failed with a new error:
Unable to create link (name already exists)
Is the DenseVariational layer for research only?
I think we can circumvent this problem by using the save_weights method.
When you add a with tf.name_scope(...) to the prior & posterior functions, it should be resolved; otherwise they end up with the same name for both tensors.
We're also fixing the example tutorial colab, should be online soon, thanks.
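For illustration, a minimal sketch of that workaround, based on the posterior_mean_field and prior_trainable helpers from the linked tutorial (the helper bodies are assumed from that tutorial, not from the question):

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.0))
    # Distinct name scope so the posterior's tensors don't collide
    # with the prior's when the model is serialized.
    with tf.name_scope("posterior"):
        return tf.keras.Sequential([
            tfp.layers.VariableLayer(2 * n, dtype=dtype),
            tfp.layers.DistributionLambda(lambda t: tfd.Independent(
                tfd.Normal(loc=t[..., :n],
                           scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
                reinterpreted_batch_ndims=1)),
        ])

def prior_trainable(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    with tf.name_scope("prior"):
        return tf.keras.Sequential([
            tfp.layers.VariableLayer(n, dtype=dtype),
            tfp.layers.DistributionLambda(lambda t: tfd.Independent(
                tfd.Normal(loc=t, scale=1.0),
                reinterpreted_batch_ndims=1)),
        ])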
Update:
Instead of fixing it at the applications, we fixed it in the library instead: https://github.com/tensorflow/probability/commit/0ca065fb526b50ce38b68f7d5b803f02c78c8f16. Once it is updated, the duplicate tensor names should be resolved. Thanks.
It's been almost 2 years, and the problem is still going on.
A workaround is to store only the weights:
tf.keras.Model.save_weights(filepath, overwrite=True)
Then, you can use the same model structure and load the weights.
For example:
# model
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(input_shape,), name="input"),
    tfp.layers.DenseVariational(32, posterior_mean_field, prior_trainable),  # trainable
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# save weights after compiling and training your model
model.save_weights('model_weights.h5')
Initialize a new model with the same structure:
# different model, same weights
model2 = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(input_shape,), name="input"),
    tfp.layers.DenseVariational(32, posterior_mean_field, prior_trainable),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# load weights
model2.load_weights('model_weights.h5')
I hope this helps!
Suppose I have the following Keras model:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import Accuracy
from tensorflow.keras.optimizers import RMSprop

model = Sequential()
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.compile(
    loss=CategoricalCrossentropy(label_smoothing=0.01),
    optimizer=RMSprop(learning_rate=0.001, momentum=0.0),
    metrics=[Accuracy()],
)
I have two questions:
How can I view compilation settings (like the learning_rate)?
How can I change compilation settings (like the learning_rate)?
Remarks:
I noticed I can view layer settings using model.summary() or model.get_config() but that does not show compilation settings.
I know I can change the learning_rate by running the compile statement again with a different learning_rate. But I would like a "cleaner"/more readable way to do this, something like: model['compilation']['optimizer']['learning_rate'] = xxx. (Many sklearn models can be adjusted this way.)
Use .lr:
rate = model.optimizer.lr
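For completeness, a short sketch (TF 2.x) covering both questions; get_config() is the standard way to inspect an optimizer's settings, and the learning rate is exposed as a settable property:

# 1) View compilation settings:
model.optimizer.get_config()  # e.g. {'name': 'RMSprop', 'learning_rate': 0.001, 'momentum': 0.0, ...}
model.loss                    # the compiled loss object

# 2) Change the learning rate in place, without recompiling:
model.optimizer.learning_rate = 0.0005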
I fine-tuned a BERT model from TensorFlow Hub to build a simple sentiment analyzer. The model trains and runs fine. On export, I simply used:
tf.saved_model.save(model, export_dir='models')
And this works just fine... until I reboot.
On a reboot, the model no longer loads. I've tried using a Keras loader as well as the Tensorflow Server, and I get the same error.
I get the following error message:
Not found: /tmp/tfhub_modules/09bd4e665682e6f03bc72fbcff7a68bf879910e/assets/vocab.txt; No such file or directory
The model is trying to load assets from the tfhub modules cache, which is wiped by reboot. I know I could persist the cache, but I don't want to do that because I want to be able to generate models and then copy them over to a separate application without worrying about the cache.
The crux of it is that I don't think it's necessary to look in the cache for the assets at all. The model was saved with an assets folder wherein vocab.txt was generated, so in order to find the assets it just needs to look in its own assets folder (I think). However, it doesn't seem to be doing that.
Is there any way to change this behaviour?
Added the code for building and exporting the model (it's not a clever model, just prototyping my workflow):
bert_model_name = "bert_en_uncased_L-12_H-768_A-12"

BATCH_SIZE = 64
EPOCHS = 1  # Initial

def build_bert_model(bert_model_name):
    input_layer = tf.keras.layers.Input(shape=(), dtype=tf.string, name="inputs")
    preprocessing_layer = hub.KerasLayer(
        map_model_to_preprocess[bert_model_name], name="preprocessing"
    )
    encoder_inputs = preprocessing_layer(input_layer)
    bert_model = hub.KerasLayer(
        map_name_to_handle[bert_model_name], name="BERT_encoder"
    )
    outputs = bert_model(encoder_inputs)
    net = outputs["pooled_output"]
    net = tf.keras.layers.Dropout(0.1)(net)
    net = tf.keras.layers.Dense(1, activation=None, name="classifier")(net)
    return tf.keras.Model(input_layer, net)

def main():
    train_ds, val_ds = load_sentiment140(batch_size=BATCH_SIZE, epochs=EPOCHS)
    steps_per_epoch = tf.data.experimental.cardinality(train_ds).numpy()
    init_lr = 3e-5
    optimizer = tf.keras.optimizers.Adam(learning_rate=init_lr)
    model = build_bert_model(bert_model_name)
    model.compile(optimizer=optimizer, loss='mse', metrics='mse')
    model.fit(train_ds, validation_data=val_ds, steps_per_epoch=steps_per_epoch)
    tf.saved_model.save(model, export_dir='models')
This problem comes from a TensorFlow bug triggered by versions /1 and /2 of https://tfhub.dev/tensorflow/bert_en_uncased_preprocess. The updated models tensorflow/bert_*_preprocess/3 (released last Friday) avoid this bug. Please update to the newest version.
The Classify Text with BERT tutorial has been updated accordingly.
Thanks for bringing this up!
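For anyone building the model from an explicit handle rather than a lookup table, the fix amounts to pointing the preprocessing layer at version /3 (the handle below is the one named in the answer above):

preprocessing_layer = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3", name="preprocessing"
)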
Hi, I am having some serious problems saving and loading a TensorFlow model that combines Hugging Face transformers with some custom layers to do classification. I am using the latest Hugging Face transformers TensorFlow Keras version. The idea is to extract features using DistilBERT and then run the features through a CNN to do classification and extraction. I have got everything working as far as getting the correct classifications.
The problem is in saving the model once trained and then loading the model again.
I am using tf.keras with TensorFlow version 2.2.
Following is the code to design the model, train it, evaluate it, and then save and load it:
bert_config = DistilBertConfig(dropout=0.2, attention_dropout=0.2, output_hidden_states=False)
bert_config.output_hidden_states = False
transformer_model = TFDistilBertModel.from_pretrained(DISTIL_BERT, config=bert_config)

input_ids_in = tf.keras.layers.Input(shape=(BERT_LENGTH,), name='input_token', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(BERT_LENGTH,), name='masked_token', dtype='int32')

embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1,
                         recurrent_dropout=0, recurrent_activation="sigmoid",
                         unroll=False, use_bias=True, activation="tanh"))(embedding_layer)
x = tf.keras.layers.GlobalMaxPool1D()(x)

outputs = []
# lots of code here to define the dense layers to generate the outputs
# .....
# .....

model = Model(inputs=[input_ids_in, input_masks_in], outputs=outputs)

for model_layer in model.layers[:3]:
    logger.info(f"Setting layer {model_layer.name} to not trainable")
    model_layer.trainable = False

rms_optimizer = RMSprop(learning_rate=0.001)
model.compile(loss=SigmoidFocalCrossEntropy(), optimizer=rms_optimizer)

# the code to fit the model (which works)
# then code to evaluate the model (which also works)
# finally saving the model. This too works.
tf.keras.models.save_model(model, save_url, overwrite=True, include_optimizer=True, save_format="tf")
However, when I try to load the saved model using the following:
tf.keras.models.load_model(
    path, custom_objects={"Addons>SigmoidFocalCrossEntropy": SigmoidFocalCrossEntropy})
I get the following load error:
ValueError: The two structures don't have the same nested structure.
First structure: type=TensorSpec str=TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs')
Second structure: type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='inputs/input_ids')}
More specifically: Substructure "type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='inputs/input_ids')}" is a sequence, while substructure "type=TensorSpec str=TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs')" is not
Entire first structure:
.
Entire second structure:
{'input_ids': .}
I believe the issue is that the TFDistilBertModel layer can be called with a dictionary input from DistilBertTokenizer.encode(), and that happens to be the first layer. So on load, the model compiler expects that to be the input signature to the call model. However, the inputs defined for the model are two tensors of shape (None, 128).
So how do I tell the load function or the save function to assume the correct signatures?
I solved the issue.
The issue was that the object transformer_model in the above code is itself not a layer. So if we want to embed it inside another Keras model, we should use the internal Keras layer that is wrapped in the model.
So changing the line
embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
to
embedding_layer = transformer_model.distilbert([input_ids_in, input_masks_in])[0]
makes everything work. Hope this helps someone else. It took a long time of debugging through the tf.keras code to figure this one out, although in hindsight it is obvious. :)
I ran into the same problem, coincidentally, just yesterday. My solution is very similar to yours: I supposed the problem was due to how tf.keras processes custom models, so the idea was to use the layers of the custom model inside my model. This has the advantage of not calling the layer explicitly by its name (in my case, it is useful for easily building more generic models using different pretrained encoders):
sent_encoder = getattr(transformers, self.model_name).from_pretrained(self.shortcut_weights).layers[0]
I haven't explored all the models on Hugging Face, but a few that I tested seem to be custom models with only one custom layer.
Your solution also works like a charm; in fact, both solutions are the same if "distilbert" refers to ".layers[0]".
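To make the equivalence concrete, a small sketch of both variants, using the identifiers from the question's code:

# Variant 1: call the wrapped layer via its attribute name.
embedding_layer = transformer_model.distilbert([input_ids_in, input_masks_in])[0]

# Variant 2: call it by position, which generalizes across pretrained encoders.
sent_encoder = transformer_model.layers[0]
embedding_layer = sent_encoder([input_ids_in, input_masks_in])[0]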