Building Keras Seq2Seq Inference Model - tensorflow

I am building a Seq2Seq model with the encoder-decoder architecture. The model is intended to summarise input text. The training model has been built and the training seems fine. Below is the source code of the training model, which closely follows the Keras documentation example.
from tensorflow import keras
#Hyperparameters
latent_dim = 256
batch_size = 32 # Batch size for training.
epochs = 10 # Number of epochs to train for.
# Define an input sequence and process it.
encoder_inputs = keras.Input(shape=(max_encoder_seq_length,))
enc_embedding = keras.layers.Embedding(input_dim=num_encoder_tokens, output_dim=128,)
enc_embedding_context = enc_embedding(encoder_inputs)
encoder = keras.layers.LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(enc_embedding_context)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = keras.Input(shape=(max_decoder_seq_length,))
dec_embedding = keras.layers.Embedding(input_dim=num_decoder_tokens, output_dim=128,)
dec_embedding_context = dec_embedding(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_embedding_context, initial_state=encoder_states)
decoder_dense = keras.layers.TimeDistributed(keras.layers.Dense(num_decoder_tokens, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
The model summary.
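Training is run roughly like this (a sketch on my side; the loss and optimizer here are placeholders, and the data arrays are integer token IDs padded to the max sequence lengths):
# Rough sketch of the training step (placeholder loss/optimizer, not the exact call).
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.2,
)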
I am facing a problem when building the inference model. Initially, the last layer of the decoder model was connected to the second output tensor of the second-to-last layer (it should connect to the first output tensor), and each time the code below was executed, that index kept incrementing.
encoder_inputs = model.input[0] # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[4].output # lstm
encoder_states = [state_h_enc, state_c_enc]
encoder_model = keras.Model(encoder_inputs, encoder_states)
decoder_inputs = model.input[1] # input_2
decoder_embedding = model.layers[3].output
# Get the state from encoder
decoder_state_input_h = keras.Input(shape=(latent_dim,))
decoder_state_input_c = keras.Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[5] #lstm_1
decoder_outputs_lstm, state_h_dec, state_c_dec = decoder_lstm(
    decoder_embedding, initial_state=decoder_states_inputs
)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[6]
decoder_outputs = decoder_dense(decoder_outputs_lstm)
decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states
)
The decoder model summary showed this incorrect connection; the encoder model seems fine.
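One workaround I am considering (a sketch, assuming the auto-assigned layer names from the training model above; the names should be checked with model.summary() since they can differ between sessions) is to fetch layers by name with model.get_layer instead of by position:
# Sketch: look layers up by name rather than by positional index
# (layer names are assumptions; verify them with model.summary()).
encoder_inputs = model.input[0]
_, state_h_enc, state_c_enc = model.get_layer('lstm').output
encoder_model = keras.Model(encoder_inputs, [state_h_enc, state_c_enc])
decoder_inputs = model.input[1]
decoder_embedding = model.get_layer('embedding_1').output
decoder_state_input_h = keras.Input(shape=(latent_dim,))
decoder_state_input_c = keras.Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs_lstm, state_h_dec, state_c_dec = model.get_layer('lstm_1')(
    decoder_embedding, initial_state=decoder_states_inputs
)
decoder_outputs = model.get_layer('time_distributed')(decoder_outputs_lstm)
decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs, state_h_dec, state_c_dec]
)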

Related

Keras GRU: An `initial_state` was passed that is not compatible with `cell.state_size`

For a deep learning project I have built a sequence-to-sequence model using LSTM. Now I want to use GRU instead of LSTM, but I don't have enough knowledge in the domain of deep learning. I got this error and can't solve it.
Error Message
An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=ListWrapper([InputSpec(shape=(None, 80, 512), ndim=3), InputSpec(shape=(None, 512), ndim=2)]); however `cell.state_size` is [512]
My code
time_steps_encoder=80
num_encoder_tokens=4096
latent_dim=512
time_steps_decoder=10
num_decoder_tokens=1500
batch_size=320
# Setting up the encoder
encoder_inputs = Input(shape=(time_steps_encoder, num_encoder_tokens), name="encoder_inputs")
encoder = GRU(latent_dim, return_state=True,return_sequences=True, name='endcoder')
state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]
#Set up the decoder
decoder_inputs = Input(shape=(time_steps_decoder, num_decoder_tokens), name= "decoder_inputs")
decoder = GRU(latent_dim, return_sequences=True, return_state=True, name='decoder')
decoder_outputs, _ , _= decoder(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='decoder_relu')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
plot_model(model, to_file='model_train.png', show_shapes=True, show_layer_names=True)
The error was raised at the line where the decoder is called with initial_state=encoder_states.
I think batch_size might be the first dimension of your initial_states. To put it another way, each element in a batch may start out with a distinct initial state.
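Note also that a GRU keeps a single state vector, which is what the "cell.state_size is [512]" part of the message refers to; with return_state=True it returns [outputs, state] rather than the LSTM's [outputs, state_h, state_c]. A minimal standalone sketch (not your exact code, shapes taken from the question's hyperparameters) of wiring the GRU encoder state into the decoder:
# Standalone sketch: a GRU has one state vector, so the decoder's
# initial_state is a single (None, latent_dim) tensor.
from tensorflow.keras.layers import Input, GRU, Dense
from tensorflow.keras.models import Model
latent_dim = 512
encoder_inputs = Input(shape=(80, 4096), name='encoder_inputs')
encoder_outputs, encoder_state = GRU(latent_dim, return_sequences=True,
                                     return_state=True, name='encoder')(encoder_inputs)
decoder_inputs = Input(shape=(10, 1500), name='decoder_inputs')
decoder_outputs, _ = GRU(latent_dim, return_sequences=True, return_state=True,
                         name='decoder')(decoder_inputs, initial_state=[encoder_state])
decoder_outputs = Dense(1500, activation='softmax')(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()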

Question on prediction using Seq2seq model with embedding layers

I am a newbie in TensorFlow and Seq2Seq. I wrote code for a Seq2Seq model with embedding layers, based on others' code that had no embedding layers, and I got errors when using the trained model to predict values.
Here is the code for my Seq2Seq model:
# Build encoder
encoder_input = tf.keras.Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(
    input_dim=max_eng_vocabulary,
    output_dim=embedding_output_length,
    mask_zero=True,
)
encoder_input_1 = encoder_embedding(encoder_input)
encoder_lstm = tf.keras.layers.LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_input_1)
encoder_states = [state_h, state_c]
# Build decoder model
decoder_input = tf.keras.Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(
    input_dim=max_chi_vocabulary,
    output_dim=embedding_output_length,
    mask_zero=True,
)
decoder_input_1 = decoder_embedding(decoder_input)
decoder_lstm = tf.keras.layers.LSTM(
    50,
    return_state=True,
    return_sequences=True,
)
decoder_outputs, _, _ = decoder_lstm(decoder_input_1, initial_state=encoder_states)
decoder_dense = tf.keras.layers.Dense(max_chi_vocabulary, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Combine the encoder and decoder
model = tf.keras.Model([encoder_input, decoder_input], decoder_outputs)
When I tried to use it for prediction, the code was similar to this:
model.predict([encoder_input_data[0], decoder_input_data[0]])
The input in the above code is exactly one of the samples from the training data set. After running the prediction code, I got this error: Layer lstm_1 expects 7 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'model/embedding_1/embedding_lookup/Identity_1:0' shape=(None, 1, 100) dtype=float32>]
A sketch of the model structure is also attached.
I have an additional question: it seems the masking function for the embeddings doesn't work. Is there anything wrong with my model definition?
Thanks for your help in advance!
It turns out the error was caused by the environment; running the same code in Colab produces no error.
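As for the masking sub-question: an Embedding layer with mask_zero=True does produce a padding mask that downstream layers consume. A quick way to check (a small standalone sketch, not part of the model above) is to call the layer's compute_mask directly:
# Standalone check: verify that mask_zero=True yields a padding mask
# for zero-valued token IDs.
import numpy as np
import tensorflow as tf
emb = tf.keras.layers.Embedding(input_dim=100, output_dim=8, mask_zero=True)
sample = np.array([[7, 3, 0, 0]])        # trailing zeros are padding tokens
print(emb.compute_mask(sample))          # [[ True  True False False]]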

Why does the seq2seq model return negative loss if I use a pre-trained embedding model

I am following this example code to build a seq2seq model using Keras: https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py
When I train that code it works fine and the results are good. But when I try to train it using a pre-trained embedding model, the loss and the crossentropy always take negative values.
I have tried using a dataset of only 5 examples to make the model overfit on them, just to make sure everything works correctly, but the loss and the crossentropy are still negative.
I use a FastText embedding model; here is the code that loads the dataset with the embedding vectors:
encoder_input_data = np.zeros(
    (input_texts_len, max_encoder_seq_length, vector_length),
    dtype='float32')
decoder_input_data = np.zeros(
    (input_texts_len, max_decoder_seq_length, vector_length),
    dtype='float32')
decoder_target_data = np.zeros(
    (input_texts_len, max_decoder_seq_length, vector_length),
    dtype='float32')
padding = np.zeros((vector_length), dtype='float32')
for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, word in enumerate(input_text):
        encoder_input_data[i, t] = w2v.get_vector(word)
    encoder_input_data[i, t + 1:] = padding
    for t, word in enumerate(target_text):
        decoder_input_data[i, t] = w2v.get_vector(word)
        if t > 0:
            decoder_target_data[i, t - 1] = w2v.get_vector(word)
    decoder_input_data[i, t + 1:] = padding
    decoder_target_data[i, t] = padding
Here is the model code itself:
encoder_inputs = Input(shape=(max_encoder_seq_length,vec_leng,))
x = Masking(mask_value=0.0)(encoder_inputs)
encoder = LSTM(latent_dim,name='lstm_1')
encoder_outputs, state_h, state_c = encoder(x)
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(max_decoder_seq_length,vec_leng,))
a = Masking(mask_value=0.0)(decoder_inputs)
decoder_lstm = LSTM(latent_dim,name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(a, initial_state=encoder_states)
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])
decoder_dense = Dense(vec_leng, activation='softmax')
dense_time = TimeDistributed(decoder_dense, name='time_distributed_layer')
decoder_pred = dense_time(decoder_concat_input)
model = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_pred, name='main_model')
encoder_model = Model(inputs=encoder_inputs, outputs=[encoder_outputs, state_h, encoder_states], name='encoder_model')
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
encoder_states_ = Input(batch_shape=(1,max_encoder_seq_length, latent_dim))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
a = Input(shape=(max_decoder_seq_length,vec_leng,))
decoder_outputs, state_h, state_c = decoder_lstm(a, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
attn_inf_out, attn_inf_states = attn_layer([encoder_states_, decoder_outputs])
decoder_inf_concat = Concatenate(axis=-1)([decoder_outputs, attn_inf_out])
decoder_inf_pred = TimeDistributed(decoder_dense)(decoder_inf_concat)
decoder_model = Model(
    [encoder_states_, decoder_states_inputs, a],
    [decoder_inf_pred, attn_inf_states, decoder_states], name='decoder_model')
The training output shows these negative loss values. What is the reason I get these negative values, and how can I fix them?
You get negative loss values because your target vector elements are not valid; with categorical crossentropy, the one-hot target vector elements must be 0 or 1 integers.
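A small self-contained illustration (made-up numbers, not the asker's data): categorical crossentropy computes -sum(t * log(p)) and assumes the target t is a 0/1 (or probability) vector; if the target is a raw embedding vector with negative components, the loss can become negative:
# Illustration only: a valid one-hot target vs. an embedding-like target
# containing negative components.
import tensorflow as tf
pred = tf.constant([[0.7, 0.2, 0.1]])                    # softmax output
one_hot_target = tf.constant([[1.0, 0.0, 0.0]])          # valid target
embedding_like_target = tf.constant([[0.1, 0.1, -0.9]])  # not a valid distribution
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(one_hot_target, pred).numpy())          # ~0.36 (positive)
print(cce(embedding_like_target, pred).numpy())   # ~-1.88 (negative)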

Adding attention layer to the Encoder-Decoder model architecture gives worse results

I initially defined an Encoder-Decoder model architecture for next-phrase prediction and trained it on some data; I was able to predict successfully using that model. But when I inserted an Attention layer into the architecture, training still succeeded, yet I was not able to define separate encoder and decoder models for prediction. Here is the new model architecture I defined:
# Model architecture along with Attention Layer
# Create the Encoder layers first.
encoder_inputs = Input(shape=(len_input,))
encoder_emb = Embedding(input_dim=vocab_in_size, output_dim=embedding_dim)
# Bidirectional LSTM or Simple LSTM
encoder_lstm = Bidirectional(LSTM(units=units, return_sequences=True, return_state=True)) # Bidirectional(
encoder_out, fstate_h, fstate_c, bstate_h, bstate_c = encoder_lstm(encoder_emb(encoder_inputs))
state_h = Concatenate()([fstate_h,bstate_h])
state_c = Concatenate()([bstate_h,bstate_c])
encoder_states = [state_h, state_c]
# Now create the Decoder layers.
decoder_inputs = Input(shape=(None,))
decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)
decoder_lstm = LSTM(units=units*2, return_sequences=True, return_state=True) # units=units*2
decoder_lstm_out, _, _ = decoder_lstm(decoder_emb(decoder_inputs), initial_state=encoder_states)
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm_out])
# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_lstm_out, attn_out])
# Two dense layers
decoder_d1 = TimeDistributed(Dense(units, activation="relu"))
decoder_d2 = TimeDistributed(Dense(vocab_out_size, activation="softmax"))
decoder_out = decoder_d2(Dropout(rate=.2)(decoder_d1(Dropout(rate=.2)(decoder_concat_input))))
#decoder_out = decoder_d2(Dropout(rate=.2)(decoder_concat_input))
# combining the encoder and the decoder layers together
model = Model(inputs = [encoder_inputs, decoder_inputs], outputs= decoder_out)
model.compile(optimizer=tf.optimizers.Adam(), loss="sparse_categorical_crossentropy", metrics=['sparse_categorical_accuracy'])
model.summary()
Trained this model and defined another encoder and decoder using the same tensors:
# Changed infmodel
# Create the encoder model from the tensors we previously declared, while training
encoder_model = Model(encoder_inputs, [encoder_out, state_h, state_c], name = 'Encoder')
# decoder model
# Generate a new set of tensors for our new inference decoder
state_input_h = Input(shape=(units*2,), name="state_input_h") # units*2 if Bidirectional LSTM else units*1
state_input_c = Input(shape=(units*2,), name="state_input_c") # units*2
inf_decoder_inputs = Input(shape=(len_input, units), name="inf_decoder_inputs")
# similar decoder model architecture with state from encoder model
decoder_res, decoder_h, decoder_c = decoder_lstm(
    decoder_emb(decoder_inputs), initial_state=[state_input_h, state_input_c])
# Attention inference
attn_out_res, attn_states_res = attn_layer([inf_decoder_inputs, decoder_res])
# Concat attention input and decoder LSTM output
decoder_out_concat_res = Concatenate(axis=-1, name='concat_layer')([decoder_res, attn_out_res])
inf_decoder_out = decoder_d2(decoder_d1(decoder_out_concat_res))
# finalizing the deocder model
inf_model = Model(inputs=[decoder_inputs] + [inf_decoder_inputs, state_input_h, state_input_c],
                  outputs=[inf_decoder_out, decoder_h, decoder_c], name='Decoder')
The results after model training have become worse, and I believe there are problems with the model architecture. I arrived at this architecture after trying many permutations. Please go through the model architecture once.

How to connect multi-layered Bi-directional LSTM encoder to a decoder?

I'm making a seq2seq model which uses a Bi-LSTM encoder and an attention mechanism in the decoder. With a single LSTM layer the model works fine. My encoder looks something like this:
Encoder:
def encoding_layer(self, rnn_inputs, rnn_size, num_layers, keep_prob,
                   source_vocab_size,
                   encoding_embedding_size,
                   source_sequence_length,
                   emb_matrix):
    embed = tf.nn.embedding_lookup(emb_matrix, rnn_inputs)
    stacked_cells = tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
    outputs, state = tf.nn.bidirectional_dynamic_rnn(cell_fw=stacked_cells,
                                                     cell_bw=stacked_cells,
                                                     inputs=embed,
                                                     sequence_length=source_sequence_length,
                                                     dtype=tf.float32)
    concat_outputs = tf.concat(outputs, 2)
    cell_state_fw, cell_state_bw = state
    cell_state_final = tf.concat([cell_state_fw.c, cell_state_bw.c], 1)
    hidden_state_final = tf.concat([cell_state_fw.h, cell_state_bw.h], 1)
    encoder_final_state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_final, h=hidden_state_final)
    return concat_outputs, encoder_final_state
Decoder:
def decoding_layer_train(self, encoder_outputs, encoder_state, dec_cell, dec_embed_input,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob, rnn_size, batch_size):
    rnn_size = 2 * rnn_size
    dec_cell = tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
    train_helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length)
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(rnn_size, encoder_outputs,
                                                               memory_sequence_length=target_sequence_length)
    attention_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism,
                                                         attention_layer_size=rnn_size/2)
    state = attention_cell.zero_state(dtype=tf.float32, batch_size=batch_size)
    state = state.clone(cell_state=encoder_state)
    decoder = tf.contrib.seq2seq.BasicDecoder(cell=attention_cell, helper=train_helper,
                                              initial_state=state,
                                              output_layer=output_layer)
    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, impute_finished=True, maximum_iterations=max_summary_length)
    return outputs
With the above single-layer Bi-LSTM configuration my model works fine. But now I want to use a multi-layered Bi-LSTM encoder and decoder, so in the encoder and decoder I change the cell to:
stacked_cells = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob) for _ in range(num_layers)])
After changing the cell I get this error:
AttributeError: 'tuple' object has no attribute 'c'
here,
num_layers = 2
rnn_size = 128
embedding_size = 50
So I want to know what exactly is returned as state in the second case, and how to pass that state to the decoder.
Full code: https://github.com/sainimohit23/Text-Summarization
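For reference, a sketch of what changes with a MultiRNNCell (an assumption about the structure based on the TF1 API, not tested against the linked repo): bidirectional_dynamic_rnn then returns state = (fw_states, bw_states), where each element is a tuple of num_layers LSTMStateTuples, so .c and .h have to be read per layer, and the decoder cell must be a matching MultiRNNCell so that the cloned state lines up:
# Sketch (assumption, plugs into encoding_layer above): per-layer state
# handling when stacked_cells is a MultiRNNCell with num_layers cells.
state_fw, state_bw = state   # each is a tuple of num_layers LSTMStateTuples
encoder_final_state = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(
        c=tf.concat([state_fw[layer].c, state_bw[layer].c], 1),
        h=tf.concat([state_fw[layer].h, state_bw[layer].h], 1))
    for layer in range(num_layers))
# The decoder side then also needs a MultiRNNCell of num_layers cells of size
# 2 * rnn_size so that state.clone(cell_state=encoder_final_state) matches.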