Decoder does not accept Output of Bidirectional Encoder - tensorflow

I'm trying to implement an Encoder-Decoder model with TensorFlow. The encoder is a bidirectional cell.
def encoder(hidden_units, encoder_embedding, sequence_length):
    forward_cell = tf.contrib.rnn.LSTMCell(hidden_units)
    backward_cell = tf.contrib.rnn.LSTMCell(hidden_units)
    bi_outputs, final_states = tf.nn.bidirectional_dynamic_rnn(
        forward_cell, backward_cell, encoder_embedding,
        sequence_length=sequence_length, dtype=tf.float32)
    encoder_outputs = tf.concat(bi_outputs, 2)
    forward_cell_state, backward_cell_state = final_states
    cell_state_final = tf.concat([forward_cell_state.c, backward_cell_state.c], 1)
    hidden_state_final = tf.concat([forward_cell_state.h, backward_cell_state.h], 1)
    encoder_final_state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_final, h=hidden_state_final)
    return encoder_outputs, encoder_final_state
Something goes wrong between the encoder and the decoder; I get an error like ValueError: Shapes (?, 42) and (12, 21) are not compatible ....
The decoder has an attention mechanism and looks like this:
def decoder(decoder_embedding, vocab_size, hidden_units, sequence_length, encoder_output, encoder_state, batchsize):
    projection_layer = Dense(vocab_size)
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_embedding, sequence_length=sequence_length)
    # Decoder cell
    decoder_cell = tf.contrib.rnn.LSTMCell(hidden_units)
    # Attention mechanism
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(hidden_units, encoder_output)
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(decoder_cell, attention_mechanism, attention_layer_size=hidden_units)
    # Initial attention state
    attn_zero = attn_cell.zero_state(batch_size=batchsize, dtype=tf.float32)
    ini_state = attn_zero.clone(cell_state=encoder_state)
    decoder = tf.contrib.seq2seq.BasicDecoder(cell=attn_cell, initial_state=ini_state, helper=helper, output_layer=projection_layer)
    decoder_outputs, _final_state, _final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder)
    return decoder_outputs
How can this be fixed?

The problem is that the hidden state of the encoder is twice as large as that of the decoder, and you are trying to use Luong-style attention. Luong attention computes the attention energies (state similarities) as dot products between the decoder state and all encoder states, and a dot product requires both vectors to have the same dimensionality.
You have several options:
Use Bahdanau-style attention, which adds a non-linear layer that projects both states into a shared encoder-decoder space.
Change the dimensionality of your encoder or decoder so that the hidden states match, i.e. set the decoder's hidden_units to 2 × the encoder's hidden_units (the encoder's forward and backward states are concatenated).
Add a linear dense layer after the encoder that projects the encoder outputs (and final state) down to the decoder's hidden dimension, as sketched below.
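For the third option, a minimal sketch in the same tf.contrib-era API as the question; wiring it through the question's encoder/decoder functions and the exact variable names used here are assumptions, not code from the original post:
# Project the concatenated bidirectional outputs/states (size 2*hidden_units)
# down to hidden_units before attention and before cloning the initial state.
encoder_outputs, encoder_final_state = encoder(hidden_units, encoder_embedding, sequence_length)
projected_memory = tf.layers.dense(encoder_outputs, hidden_units, use_bias=False)
projected_state = tf.nn.rnn_cell.LSTMStateTuple(
    c=tf.layers.dense(encoder_final_state.c, hidden_units, use_bias=False),
    h=tf.layers.dense(encoder_final_state.h, hidden_units, use_bias=False))
decoder_outputs = decoder(decoder_embedding, vocab_size, hidden_units, sequence_length,
                          projected_memory, projected_state, batchsize)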

Related

Question on prediction using Seq2seq model with embedding layers

I am a newbie in TensorFlow and seq2seq. When I wrote code for a Seq2Seq model with embedding layers, based on others' code without embedding layers, I got errors when using the trained model to predict values.
Here is the code for my Seq2Seq model:
# Build encoder
encoder_input = tf.keras.Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(
    input_dim=max_eng_vocabulary,
    output_dim=embedding_output_length,
    mask_zero=True,
)
encoder_input_1 = encoder_embedding(encoder_input)
encoder_lstm = tf.keras.layers.LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_input_1)
encoder_states = [state_h, state_c]

# Build decoder model
decoder_input = tf.keras.Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(
    input_dim=max_chi_vocabulary,
    output_dim=embedding_output_length,
    mask_zero=True,
)
decoder_input_1 = decoder_embedding(decoder_input)
decoder_lstm = tf.keras.layers.LSTM(
    50,
    return_state=True,
    return_sequences=True,
)
decoder_outputs, _, _ = decoder_lstm(decoder_input_1, initial_state=encoder_states)
decoder_dense = tf.keras.layers.Dense(max_chi_vocabulary, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Combine the encoder and decoder
model = tf.keras.Model([encoder_input, decoder_input], decoder_outputs)
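For context, a minimal compile-and-fit sketch for a model like this; the optimizer, loss, and the decoder_target_data array are assumptions rather than code from the post:
# Hypothetical training step: decoder_target_data holds the integer token ids
# that the decoder should predict (the decoder inputs shifted by one step).
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=10)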
When I tried to use it for prediction, the code was similar to this:
model.predict([encoder_input_data[0], decoder_input_data[0]])
The input above is exactly one of the samples from the training set. After running the prediction code, I got this error: Layer lstm_1 expects 7 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'model/embedding_1/embedding_lookup/Identity_1:0' shape=(None, 1, 100) dtype=float32>]
A sketch of the model structure is also attached (image: Model Structure).
I have an additional question: it seems the masking function for the embeddings doesn't work. Is there anything wrong with my model definition?
Thanks for your help in advance!
Turns out the error was caused by the environment. Running the same code in Colab produces no error.

Adding attention layer to the Encoder-Decoder model architecture gives worse results

I initially defined an Encoder-Decoder model architecture for next-phrase prediction and trained it on some data; I was able to predict with that model successfully. But when I tried to insert an attention layer into the architecture, the model trained successfully, yet I was not able to define separate encoder and decoder models for prediction. Here is the new model architecture I defined:
# Model architecture along with Attention Layer
# Create the Encoder layers first.
encoder_inputs = Input(shape=(len_input,))
encoder_emb = Embedding(input_dim=vocab_in_size, output_dim=embedding_dim)
# Bidirectional LSTM or Simple LSTM
encoder_lstm = Bidirectional(LSTM(units=units, return_sequences=True, return_state=True))
encoder_out, fstate_h, fstate_c, bstate_h, bstate_c = encoder_lstm(encoder_emb(encoder_inputs))
state_h = Concatenate()([fstate_h,bstate_h])
state_c = Concatenate()([bstate_h,bstate_c])
encoder_states = [state_h, state_c]
# Now create the Decoder layers.
decoder_inputs = Input(shape=(None,))
decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)
decoder_lstm = LSTM(units=units*2, return_sequences=True, return_state=True) # units=units*2
decoder_lstm_out, _, _ = decoder_lstm(decoder_emb(decoder_inputs), initial_state=encoder_states)
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm_out])
# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_lstm_out, attn_out])
# Two dense layers
decoder_d1 = TimeDistributed(Dense(units, activation="relu"))
decoder_d2 = TimeDistributed(Dense(vocab_out_size, activation="softmax"))
decoder_out = decoder_d2(Dropout(rate=.2)(decoder_d1(Dropout(rate=.2)(decoder_concat_input))))
#decoder_out = decoder_d2(Dropout(rate=.2)(decoder_concat_input))
# combining the encoder and the decoder layers together
model = Model(inputs = [encoder_inputs, decoder_inputs], outputs= decoder_out)
model.compile(optimizer=tf.optimizers.Adam(), loss="sparse_categorical_crossentropy", metrics=['sparse_categorical_accuracy'])
model.summary()
I trained this model and then defined another encoder and decoder using the same tensors:
# Changed infmodel
# Create the encoder model from the tensors we previously declared, while training
encoder_model = Model(encoder_inputs, [encoder_out, state_h, state_c], name='Encoder')

# Decoder model
# Generate a new set of tensors for our new inference decoder
state_input_h = Input(shape=(units*2,), name="state_input_h")  # units*2 if Bidirectional LSTM else units*1
state_input_c = Input(shape=(units*2,), name="state_input_c")  # units*2
inf_decoder_inputs = Input(shape=(len_input, units), name="inf_decoder_inputs")

# Similar decoder model architecture, with states from the encoder model
decoder_res, decoder_h, decoder_c = decoder_lstm(decoder_emb(decoder_inputs),
                                                 initial_state=[state_input_h, state_input_c])
# Attention inference
attn_out_res, attn_states_res = attn_layer([inf_decoder_inputs, decoder_res])
# Concat attention input and decoder LSTM output
decoder_out_concat_res = Concatenate(axis=-1, name='concat_layer')([decoder_res, attn_out_res])
inf_decoder_out = decoder_d2(decoder_d1(decoder_out_concat_res))

# Finalizing the decoder model
inf_model = Model(inputs=[decoder_inputs] + [inf_decoder_inputs, state_input_h, state_input_c],
                  outputs=[inf_decoder_out, decoder_h, decoder_c], name='Decoder')
The results after training this model are worse, so I believe there is a problem with the model architecture. I arrived at this architecture after trying many permutations. Please go through the model architecture once.

TimeDistributed(Dense) vs Dense in seq2seq

Given the code below
encoder_inputs = Input(shape=(16, 70))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(59, 93))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(93, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
If I change
decoder_dense = TimeDistributed(Dense(93, activation='softmax'))
to
decoder_dense = Dense(93, activation='softmax')
it still works, but which method is more effective?
If your data is time-dependent, like time-series data or the frames of a video, then a time-distributed Dense layer is more effective than a simple Dense layer.
TimeDistributed(Dense) applies the same Dense layer to every time step during GRU/LSTM cell unrolling, so the loss is computed between the predicted label sequence and the actual label sequence.
With return_sequences=False, the Dense layer is applied only once, to the output of the last cell. This is normally the case when RNNs are used for classification problems.
With return_sequences=True, the Dense layer is applied at every timestep, just like TimeDistributed(Dense).
In your models both are the same, but if you change your second model to return_sequences=False, then the Dense layer will be applied only at the last cell. A quick check is sketched below.
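For example, a quick shape check (a sketch; the latent_dim value is an assumption, while the sequence length 59 and output size 93 come from the question's code):
import tensorflow as tf

latent_dim = 256  # assumed value
seq = tf.keras.Input(shape=(59, latent_dim))
td_out = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(93, activation='softmax'))(seq)
plain_out = tf.keras.layers.Dense(93, activation='softmax')(seq)
print(td_out.shape, plain_out.shape)  # both (None, 59, 93): identical per-timestep projections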
Hope this helps. Happy Learning!

How to get output from a specific layer in keras.tf, the bottleneck layer in autoencoder?

I am developing an autoencoder for clustering certain groups of images.
input_images->...->bottleneck->...->output_images
I have calibrated the autoencoder to my satisfaction and saved the model; everything has been developed using keras.tensorflow on python3.
The next step is to apply the autoencoder to a ton of images and cluster them according to cosine distance in the bottleneck layer. Oops, I just realized that I don't know the syntax in keras.tf for running the model on a batch up to a specific layer rather than to the output layer. Thus the question:
How do I run something like Model.predict_on_batch or Model.predict_generator up to the certain "bottleneck" layer and retrieve the values on that layer rather than the values on the output layer?
You need to define a new model (if you didn't define the encoder and decoder as separate models initially, which is usually the easiest option).
If your model was defined without reusing layers, it's just:
inputs = model.input
outputs = model.get_layer('bottleneck').output
encoder = Model(inputs, outputs)
Use the encoder model as any other model.
The full code would look like this:
# ENCODER
encoding_dim = 37310
input_layer = Input(shape=(encoding_dim,))
encoder = Dense(500, activation='tanh')(input_layer)
encoder = Dense(100, activation='tanh')(encoder)
encoder = Dense(50, activation='tanh', name='bottleneck_layer')(encoder)
decoder = Dense(100, activation='tanh')(encoder)
decoder = Dense(500, activation='tanh')(decoder)
decoder = Dense(37310, activation='sigmoid')(decoder)
# full model
model_full = models.Model(input_layer, decoder)
model_full.compile(optimizer='adam', loss='mse')
model_full.fit(x, y, epochs=20, batch_size=16)
# bottleneck model
bottleneck_output = model_full.get_layer('bottleneck_layer').output
model_bottleneck = models.Model(inputs=model_full.input, outputs=bottleneck_output)
bottleneck_predictions = model_bottleneck.predict(X_test)
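From there, the cosine-distance clustering described in the question could look roughly like this (a sketch, not part of the answer; the number of clusters is an assumption, and L2-normalizing the features makes Euclidean k-means behave like cosine-distance clustering):
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

features = model_bottleneck.predict(X_test)        # (n_samples, 50) bottleneck activations
features_unit = normalize(features)                # L2-normalize each row
cluster_labels = KMeans(n_clusters=10).fit_predict(features_unit)  # n_clusters is assumed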

How to connect multi-layered Bi-directional LSTM encoder to a decoder?

I'm making a seq2seq model that uses a Bi-LSTM as the encoder and an attention mechanism in the decoder. For a single LSTM layer the model works fine. My encoder looks something like this.
Encoder:
def encoding_layer(self, rnn_inputs, rnn_size, num_layers, keep_prob,
                   source_vocab_size,
                   encoding_embedding_size,
                   source_sequence_length,
                   emb_matrix):
    embed = tf.nn.embedding_lookup(emb_matrix, rnn_inputs)
    stacked_cells = tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
    outputs, state = tf.nn.bidirectional_dynamic_rnn(cell_fw=stacked_cells,
                                                     cell_bw=stacked_cells,
                                                     inputs=embed,
                                                     sequence_length=source_sequence_length,
                                                     dtype=tf.float32)
    concat_outputs = tf.concat(outputs, 2)
    cell_state_fw, cell_state_bw = state
    cell_state_final = tf.concat([cell_state_fw.c, cell_state_bw.c], 1)
    hidden_state_final = tf.concat([cell_state_fw.h, cell_state_bw.h], 1)
    encoder_final_state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_final, h=hidden_state_final)
    return concat_outputs, encoder_final_state
Decoder:
def decoding_layer_train(self, encoder_outputs, encoder_state, dec_cell, dec_embed_input,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob, rnn_size, batch_size):
    rnn_size = 2 * rnn_size
    dec_cell = tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
    train_helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length)
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(rnn_size, encoder_outputs,
                                                               memory_sequence_length=target_sequence_length)
    attention_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism,
                                                         attention_layer_size=rnn_size/2)
    state = attention_cell.zero_state(dtype=tf.float32, batch_size=batch_size)
    state = state.clone(cell_state=encoder_state)
    decoder = tf.contrib.seq2seq.BasicDecoder(cell=attention_cell, helper=train_helper,
                                              initial_state=state,
                                              output_layer=output_layer)
    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, impute_finished=True,
                                                      maximum_iterations=max_summary_length)
    return outputs
With the above single-layer Bi-LSTM configuration my model works fine. But now I want to use a multi-layered Bi-LSTM encoder and decoder. So, in the encoder and decoder, I change the cell to:
stacked_cells = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob) for _ in range(num_layers)])
After changing the cell I am getting this error:
AttributeError: 'tuple' object has no attribute 'c'
here,
num_layers = 2
rnn_size = 128
embedding_size = 50
So, I want to know what exactly is returned as the state in the second case, and how to pass that state to the decoder.
Full code: https://github.com/sainimohit23/Text-Summarization
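Not an answer from the thread, but as a sketch of what changes (names follow the question's code): with MultiRNNCell, tf.nn.bidirectional_dynamic_rnn returns state as a pair of per-layer state tuples, one LSTMStateTuple per layer and direction, which is why indexing it like a single LSTMStateTuple raises 'tuple' object has no attribute 'c'. The per-layer states would have to be combined explicitly, for example:
# Sketch (tf.contrib era): combine forward/backward states layer by layer.
cell_state_fw, cell_state_bw = state   # each is a tuple of num_layers LSTMStateTuples
encoder_final_state = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(
        c=tf.concat([cell_state_fw[layer].c, cell_state_bw[layer].c], 1),
        h=tf.concat([cell_state_fw[layer].h, cell_state_bw[layer].h], 1))
    for layer in range(num_layers))
# The decoder would then need a MultiRNNCell of the same depth with
# 2 * rnn_size units per layer, so that AttentionWrapper's
# zero_state(...).clone(cell_state=encoder_final_state) accepts this tuple.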