How to save Tensorflow encoder decoder model? - tensorflow

I followed this tutorial about building an encoder-decoder language translation model and built one for my native language.
Now I want to save it, deploy on cloud ML engine and make predictions with HTTP request.
I couldn't find a clear example on how to save this model,
I am new to ML and found TF save guide v confusing..
Is there a way to save this model using something like
tf.keras.models.save_model

Create the train saver after opening the session and after the training is done save the model:
with tf.Session() as sess:
saver = tf.train.Saver()
# Training of the model
save_path = saver.save(sess, "logs/encoder_decoder")
print(f"Model saved in path {save_path}")

You can save a Keras model in Keras's HDF5 format, see:
https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
You will want to do something like:
import tf.keras
model = tf.keras.Model(blah blah)
model.save('my_model.h5')
If you migrate to TF 2.0, it's more straightforward to build a model in tf.keras and deploy using the TF SavedModel format. This 2.0 tutorial shows using a pretrained tf.keras model, saving the model in SavedModel format, deploying to the cloud and then doing an HTTP request for a prediction:
https://www.tensorflow.org/beta/guide/saved_model

I know I am a little late but was having the same problem (see How do I save an encoder-decoder model with TensorFlow? for more details) and figured out a solution. It's a little hacky, but it works!
Step 1 - Saving your model
Save your tokenizer (if applicable). Then individually save the weights of the model you used to train your data (naming your layers helps here).
# Save the tokenizer
with open('tokenizer.pickle', 'wb') as handle:
pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
# save the weights individually
for layer in model.layers:
weights = layer.get_weights()
if weights != []:
np.savez(f'{layer.name}.npz', weights)
Step 2 - reloading the weights
You will want to reload the tokenizer (as applicable) then load the weights you just saved. The loaded weights are in an npz format so can't be used directly, but the very short documentation will tell you everything you need to know about this file type https://numpy.org/doc/stable/reference/generated/numpy.savez.html
# load the tokenizer
with open('tokenizer.pickle', 'rb') as handle:
tokenizer = pickle.load(handle)
# load the weights
w_encoder_embeddings = np.load('encoder_embeddings.npz', allow_pickle=True)
w_decoder_embeddings = np.load('decoder_embeddings.npz', allow_pickle=True)
w_encoder_lstm = np.load('encoder_lstm.npz', allow_pickle=True)
w_decoder_lstm = np.load('decoder_lstm.npz', allow_pickle=True)
w_dense = np.load('dense.npz', allow_pickle=True)
Step 3 - Recreate the your training model and apply the weights
You'll want to re-run the code you used to create your model. In my case this was:
encoder_inputs = Input(shape=(None,), name="encoder_inputs")
encoder_embeddings = Embedding(vocab_size, embedding_size, mask_zero=True, name="encoder_embeddings")(encoder_inputs)
# Encoder lstm
encoder_lstm = LSTM(512, return_state=True, name="encoder_lstm")
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embeddings)
# discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,), name="decoder_inputs")
# target word embeddings
decoder_embeddings = Embedding(vocab_size, embedding_size, mask_zero=True, name="decoder_embeddings")
training_decoder_embeddings = decoder_embeddings(decoder_inputs)
# decoder lstm
decoder_lstm = LSTM(512, return_sequences=True, return_state=True, name="decoder_lstm")
decoder_outputs, _, _ = decoder_lstm(training_decoder_embeddings,
initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(vocab_size, activation='softmax'), name="dense")
decoder_outputs = decoder_dense(decoder_outputs)
# While training, model takes input and traget words and outputs target strings
loaded_model = Model([encoder_inputs, decoder_inputs], decoder_outputs, name="training_model")
Now you can apply your saved weights to these layers! It takes a little bit of investigation which weight goes to which layer, but this is made a lot easier by naming your layers and inspecting your model layers with model.layers.
# set the weights of the model
loaded_model.layers[2].set_weights(w_encoder_embeddings['arr_0'])
loaded_model.layers[3].set_weights(w_decoder_embeddings['arr_0'])
loaded_model.layers[4].set_weights(w_encoder_lstm['arr_0'])
loaded_model.layers[5].set_weights(w_decoder_lstm['arr_0'])
loaded_model.layers[6].set_weights(w_dense['arr_0'])
Step 4 - Create the inference model
Finally, you can now create your inference model based on this training model! Again in my case this was:
encoder_model = Model(encoder_inputs, encoder_states)
# Redefine the decoder model with decoder will be getting below inputs from encoder while in prediction
decoder_state_input_h = Input(shape=(512,))
decoder_state_input_c = Input(shape=(512,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
inference_decoder_embeddings = decoder_embeddings(decoder_inputs)
decoder_outputs2, state_h2, state_c2 = decoder_lstm(inference_decoder_embeddings, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2)
# sampling model will take encoder states and decoder_input(seed initially) and output the predictions(french word index) We dont care about decoder_states2
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs2] + decoder_states2)
And voilĂ ! You can now make inferences using the previously trained model!

Related

Question on prediction using Seq2seq model with embedding layers

I am a newbie in tensorflow and Seq2seq. When I wrote a code for Seq2Seq model with embedding layers based on others' codes with no embedding layers, I got errors when using the trained model to predict values.
Here are the codes for my Seq2Seq model:
# Build encoder
encoder_input = tf.keras.Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(
input_dim=max_eng_vocabulary,
output_dim=embedding_output_length,
mask_zero=True,
)
encoder_input_1 = encoder_embedding(encoder_input)
encoder_lstm = tf.keras.layers.LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_input_1)
encoder_states = [state_h, state_c]
# Build decoder model
decoder_input = tf.keras.Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(
input_dim=max_chi_vocabulary,
output_dim=embedding_output_length,
mask_zero=True,
)
decoder_input_1 = decoder_embedding(decoder_input)
decoder_lstm = tf.keras.layers.LSTM(
50,
return_state=True,
return_sequences=True,
)
decoder_outputs, _, _ = decoder_lstm(decoder_input_1, initial_state=encoder_states)
decoder_dense = tf.keras.layers.Dense(max_chi_vocabulary, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Combine the encoder and decoder
model = tf.keras.Model([encoder_input, decoder_input], decoder_outputs)
When I tried to use it for prediction, the code is similar with this:
model.predict([encoder_input_data[0], decoder_input_data[0]])
The input set in the above code is exactly one of the data sets in the training data set. After running the prediction code, I got errors: Layer lstm_1 expects 7 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'model/embedding_1/embedding_lookup/Identity_1:0' shape=(None, 1, 100) dtype=float32>]
A sketch of the model structure is also attached.
Model Structure
I have an additional question: it seems the masking function for the embeddings doesn't work. Is there anything wrong with my model definition?
Thanks for your help in advance!
Turns out the error is because of the environment. Using Colab leads to no error.

How to merge ReLU after quantization aware training

I have a network which contains Conv2D layers followed by ReLU activations, declared as such:
x = layers.Conv2D(self.hparams['channels_count'], kernel_size=(4,1))(x)
x = layers.ReLU()(x)
And it is ported to TFLite with the following representaiton:
Basic TFLite network without Q-aware training
However, after performing quantization-aware training on the network and porting it again, the ReLU layers are now explicit in the graph:
TFLite network after Q-aware training
This results in them being processed separately on the target instead of during the evaluation of the Conv2D kernel, inducing a 10% performance loss in my overall network.
Declaring the activation with the following implicit syntax does not produce the problem:
x = layers.Conv2D(self.hparams['channels_count'], kernel_size=(4,1), activation='relu')(x)
Basic TFLite network with implicit ReLU activation
TFLite network with implicit ReLU after Q-aware training
However, this restricts the network to basic ReLU activation, whereas I would like to use ReLU6 which cannot be declared in this way.
Is this a TFLite issue? If not, is there a way to prevent the ReLU layer from being split? Or alternatively, is there a way to manually merge the ReLU layers back into the Conv2D layers after the quantization-aware training?
Edit:
QA training code:
def learn_qaware(self):
quantize_model = tfmot.quantization.keras.quantize_model
self.model = quantize_model(self.model)
training_generator = SCDataGenerator(self.training_set)
validate_generator = SCDataGenerator(self.validate_set)
self.model.compile(
optimizer=self.configure_optimizers(qa_learn=True),
loss=self.get_LLP_loss(),
metrics=self.get_metrics(),
run_eagerly=config['eager_mode'],
)
self.model.fit(
training_generator,
epochs = self.hparams['max_epochs'],
batch_size = 1,
shuffle = self.hparams['shuffle_curves'],
validation_data = validate_generator,
callbacks = self.get_callbacks(qa_learn=True),
)
Quantized TFLite model generation code:
def tflite_convert(classifier):
output_file = get_tflite_filename(classifier.model_path)
# Convert the model to the TensorFlow Lite format without quantization
saved_shape = classifier.model.input.shape.as_list()
fixed_shape = saved_shape
fixed_shape[0] = 1
classifier.model.input.set_shape(fixed_shape) # Force batch size to 1 for generation
converter = tf.lite.TFLiteConverter.from_keras_model(classifier.model)
classifier.model.input.set_shape(saved_shape)
# Set the optimization flag.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Enforce integer only quantization
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Provide a representative dataset to ensure we quantize correctly.
if config['eager_mode']:
tf.executing_eagerly()
def representative_dataset():
for x in classifier.validate_set.get_all_inputs():
rs = x.reshape(1, x.shape[0], 1, 1).astype(np.float32)
yield([rs])
converter.representative_dataset = representative_dataset
model_tflite = converter.convert()
# Save the model to disk
open(output_file, "wb").write(model_tflite)
return TFLite_model(output_file)
I have found a workaround which works by instantiating a non-trained version of the model, then copying over the weights from the quantization aware trained model before converting to TFLite.
This seems like quite a hack, so I'm still on the lookout for a cleaner solution.
Code for the workaround:
def dequantize(self):
if not hasattr(self, 'fp_model') or not self.fp_model:
self.fp_model = self.get_default_model()
def find_layer_in_model(name, model):
for layer in model.layers:
if layer.name == name:
return layer
return None
def find_weight_group_in_layer(name, layer):
for weight_group in quant_layer.trainable_weights:
if weight_group.name == name:
return weight_group
return None
for layer in self.fp_model.layers:
if 'input' in layer.name or 'quantize_layer' in layer.name:
continue
QUANT_TAG = "quant_"
quant_layer = find_layer_in_model(QUANT_TAG+layer.name,self.model)
if quant_layer is None:
raise RuntimeError('Failed to match layer ' + layer.name)
for i, weight_group in enumerate(layer.trainable_weights):
quant_weight_group = find_weight_group_in_layer(QUANT_TAG+weight_group.name, quant_layer)
if quant_weight_group is None:
quant_weight_group = find_weight_group_in_layer(weight_group.name, quant_layer)
if quant_weight_group is None:
raise RuntimeError('Failed to match weight group ' + weight_group.name)
layer.trainable_weights[i].assign(quant_weight_group)
self.model = self.fp_model
You can pass activation=tf.nn.relu6 to use ReLU6 activation.

Adding attention layer to the Encoder-Decoder model architecture gives worse results

I initially defined a Encoder-Decoder Model architecture for Next Phrase Prediction and trained it on some data, I was successfully able to predict using the same model. But when I tried to insert an Attention layer in the architecture the model training was successful but I was not able to define encoder and decoder models separately for predictions. Here is the new Model Architecture defined by me:
# Model architecture along with Attention Layer
# Create the Encoder layers first.
encoder_inputs = Input(shape=(len_input,))
encoder_emb = Embedding(input_dim=vocab_in_size, output_dim=embedding_dim)
# Bidirectional LSTM or Simple LSTM
encoder_lstm = Bidirectional(LSTM(units=units, return_sequences=True, return_state=True)) # Bidirectional(
encoder_out, fstate_h, fstate_c, bstate_h, bstate_c = encoder_lstm(encoder_emb(encoder_inputs))
state_h = Concatenate()([fstate_h,bstate_h])
state_c = Concatenate()([bstate_h,bstate_c])
encoder_states = [state_h, state_c]
# Now create the Decoder layers.
decoder_inputs = Input(shape=(None,))
decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)
decoder_lstm = LSTM(units=units*2, return_sequences=True, return_state=True) # units=units*2
decoder_lstm_out, _, _ = decoder_lstm(decoder_emb(decoder_inputs), initial_state=encoder_states)
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm_out])
# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_lstm_out, attn_out])
# Two dense layers
decoder_d1 = TimeDistributed(Dense(units, activation="relu"))
decoder_d2 = TimeDistributed(Dense(vocab_out_size, activation="softmax"))
decoder_out = decoder_d2(Dropout(rate=.2)(decoder_d1(Dropout(rate=.2)(decoder_concat_input))))
#decoder_out = decoder_d2(Dropout(rate=.2)(decoder_concat_input))
# combining the encoder and the decoder layers together
model = Model(inputs = [encoder_inputs, decoder_inputs], outputs= decoder_out)
model.compile(optimizer=tf.optimizers.Adam(), loss="sparse_categorical_crossentropy", metrics=['sparse_categorical_accuracy'])
model.summary()
Trained this model and defined another encoder and decoder using the same tensors:
# Changed infmodel
# Create the encoder model from the tensors we previously declared, while training
encoder_model = Model(encoder_inputs, [encoder_out, state_h, state_c], name = 'Encoder')
# decoder model
# Generate a new set of tensors for our new inference decoder
state_input_h = Input(shape=(units*2,), name="state_input_h") # units*2 if Bidirectional LSTM else units*1
state_input_c = Input(shape=(units*2,), name="state_input_c") # units*2
inf_decoder_inputs = Input(shape=(len_input, units), name="inf_decoder_inputs")
# similar decoder model architecture with state from encoder model
decoder_res, decoder_h, decoder_c = decoder_lstm(decoder_emb(decoder_inputs),
initial_state=[state_input_h, state_input_c])
# Attention inference
attn_out_res, attn_states_res = attn_layer([inf_decoder_inputs, decoder_res])
# Concat attention input and decoder LSTM output
decoder_out_concat_res = Concatenate(axis=-1, name='concat_layer')([decoder_res, attn_out_res])
inf_decoder_out = decoder_d2(decoder_d1(decoder_out_concat_res))
# finalizing the deocder model
inf_model = Model(inputs=[decoder_inputs] + [inf_decoder_inputs, state_input_h, state_input_c],
outputs=[inf_decoder_out, decoder_h, decoder_c], name = 'Decoder')
The results after model training have become worse, I believe I have some problems with model architecture. I have got to this architecture after trying many permutations. Go through the Model architecture once.

How to get output from a specific layer in keras.tf, the bottleneck layer in autoencoder?

I am developing an autoencoder for clustering certain groups of images.
input_images->...->bottleneck->...->output_images
I have calibrated the autoencoder to my satisfaction and saved the model; everything has been developed using keras.tensorflow on python3.
The next step is to apply the autoencoder to a ton of images and cluster them according to cosine distance in the bottleneck layer. Oops, I just realized that I don't know the syntax in keras.tf for running the model on a batch up to a specific layer rather than to the output layer. Thus the question:
How do I run something like Model.predict_on_batch or Model.predict_generator up to the certain "bottleneck" layer and retrieve the values on that layer rather than the values on the output layer?
You need to define a new model (if you didn't define the encoder and decoder as separate models initially, which is usually the easiest option).
If your model was defined without reusing layers, it's just:
inputs = model.input
outputs= model.get_layer('bottleneck').output
encoder = Model(inputs, outputs)
Use the encoder model as any other model.
The full code would be like this,
# ENCODER
encoding_dim = 37310
input_layer = Input(shape=(encoding_dim,))
encoder = Dense(500, activation='tanh')(input_layer)
encoder = Dense(100, activation='tanh')(encoder)
encoder = Dense(50, activation='tanh', name='bottleneck_layer')(encoder)
decoder = Dense(100, activation='tanh')(encoder)
decoder = Dense(500, activation='tanh')(decoder)
decoder = Dense(37310, activation='sigmoid')(decoder)
# full model
model_full = models.Model(input_layer, decoder)
model_full.compile(optimizer='adam', loss='mse')
model_full.fit(x, y, epochs=20, batch_size=16)
# bottleneck model
bottleneck_output = model_full.get_layer('bottleneck_layer').output
model_bottleneck = models.Model(inputs = model_full.input, outputs = bottleneck_output)
bottleneck_predictions = model_bottleneck.predict(X_test)

Add tf.image.decode_jpeg to Keras created graph

I would like to try to use Keras Sequential model in order to train a convnet on an image classification problem.
My training set is 18K images 455x255 which is probably too big to fit into memory and so I would like to use some kind of a batch pipeline.
In my original tensorflow implementation I have this code which is simlar to the MNIST tensorflow example
How can I feed this pipeline into the Sequential model, to create something like the Keras cifa10_cnn example
with tf.name_scope('input'):
# Input data
images_initializer = tf.placeholder(
dtype=tf.string,
shape=[len_all_filepaths])
labels_initializer = tf.placeholder(
dtype=tf.int32,
shape=[len_all_filepaths])
input_images = tf.Variable(
images_initializer, trainable=False, collections=[])
input_labels = tf.Variable(
labels_initializer, trainable=False, collections=[])
image, label = tf.train.slice_input_producer(
[input_images, input_labels], num_epochs=FLAGS.num_epochs)
# process path and string tensor into an image and a label
file_contents = tf.read_file(image)
image_contents = tf.image.decode_jpeg(file_contents, channels=NUM_CHANNELS)
image_contents.set_shape([None, None, NUM_CHANNELS])
# Rotate if necessary
rotated_image_contents, = tf.py_func(rotate, [image_contents], [tf.uint8])
rotated_image_contents.set_shape([IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS])
rotated_image_contents = tf.image.per_image_whitening(rotated_image_contents)
images, labels = tf.train.batch(
[rotated_image_contents, label],
batch_size=FLAGS.batch_size,
num_threads=16,
capacity=3 * FLAGS.batch_size
)
# Build a Graph that computes predictions from the inference model.
logits = model.inference(images, len(correct_labels))
# Add to the Graph the Ops for loss calculation.
loss = model.loss(logits, labels)
# Add to the Graph the Ops that calculate and apply gradients.
train_op = model.training(loss, FLAGS.learning_rate)
...
I think the ImageDataGenerator from Keras already does batching for you. I don't understand why the Keras datagen.fit() with a specified batch size and a standard generator doesn't work for your use case.