TimeDistributed(Dense) vs Dense in seq2seq - tensorflow

Given the code below
encoder_inputs = Input(shape=(16, 70))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(59, 93))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(93, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
if I change
decoder_dense = TimeDistributed(Dense(93, activation='softmax'))
to
decoder_dense = Dense(93, activation='softmax')
it still works, but which method is more effective?

If your data is time-dependent, like time-series data or the frames of a video, then a TimeDistributed Dense layer is more effective than a plain Dense layer.
TimeDistributed(Dense) applies the same Dense layer to every time step during GRU/LSTM cell unrolling, so the loss is computed between the predicted label sequence and the actual label sequence.
With return_sequences=False, the Dense layer is applied only once, to the output of the last cell. This is normally the case when RNNs are used for classification problems.
With return_sequences=True, the Dense layer is applied at every timestep, just like TimeDistributed(Dense).
In your models both are the same, but if you change your second model to return_sequences=False, then the Dense layer will be applied only at the last cell.
Hope this helps. Happy learning!
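To see the equivalence concretely, here is a minimal sketch (with made-up toy shapes, not taken from the question) showing that on a 3D sequence tensor the two layers produce the same output shape:
from tensorflow.keras.layers import Input, Dense, TimeDistributed

# toy sequence input: 59 timesteps, 64 features per step
seq = Input(shape=(59, 64))
out_dense = Dense(93, activation='softmax')(seq)                # Dense acts on the last axis at every timestep
out_td = TimeDistributed(Dense(93, activation='softmax'))(seq)  # explicit per-timestep application
print(out_dense.shape, out_td.shape)                            # both: (None, 59, 93)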

Related

Building Keras Seq2Seq Inference Model

I am building a Seq2Seq model with the Encoder-Decoder architecture, aimed at summarising input text. The training model has been built and training seems fine. Below is the source code of the training model, which is similar to the Keras documentation.
from tensorflow import keras
#Hyperparameters
latent_dim = 256
batch_size = 32 # Batch size for training.
epochs = 10 # Number of epochs to train for.
# Define an input sequence and process it.
encoder_inputs = keras.Input(shape=(max_encoder_seq_length,))
enc_embedding = keras.layers.Embedding(input_dim=num_encoder_tokens, output_dim=128,)
enc_embedding_context = enc_embedding(encoder_inputs)
encoder = keras.layers.LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(enc_embedding_context)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = keras.Input(shape=(max_decoder_seq_length,))
dec_embedding = keras.layers.Embedding(input_dim=num_decoder_tokens, output_dim=128,)
dec_embedding_context = dec_embedding(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_embedding_context, initial_state=encoder_states)
decoder_dense = keras.layers.TimeDistributed(keras.layers.Dense(num_decoder_tokens, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
The model summary.
I am facing a problem when building the inference model. Initially, the last layer of the decoder model was connected to the second output tensor of the second-to-last layer (it should connect to the first output tensor). And each time the code below was executed, the index kept incrementing.
encoder_inputs = model.input[0] # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[4].output # lstm
encoder_states = [state_h_enc, state_c_enc]
encoder_model = keras.Model(encoder_inputs, encoder_states)
decoder_inputs = model.input[1] # input_2
decoder_embedding = model.layers[3].output
# Get the state from encoder
decoder_state_input_h = keras.Input(shape=(latent_dim,))
decoder_state_input_c = keras.Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[5] #lstm_1
decoder_outputs_lstm, state_h_dec, state_c_dec = decoder_lstm(
    decoder_embedding, initial_state=decoder_states_inputs
)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[6]
decoder_outputs = decoder_dense(decoder_outputs_lstm)
decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states
)
The decoder model summary is shown below.
The encoder model seems fine.
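A general Keras note, not an answer from the original thread: looking layers up by name, and rebuilding the inference models from a freshly loaded copy of the trained model on each re-run, avoids both brittle positional indices and the duplicate graph nodes that make indices drift. A minimal sketch, assuming the model was saved to a hypothetical path 's2s_model':
from tensorflow import keras

keras.backend.clear_session()                        # drop stale graph state from earlier cell executions
model = keras.models.load_model('s2s_model')         # hypothetical save path
encoder_lstm = model.get_layer('lstm')               # by name instead of model.layers[4]
decoder_lstm = model.get_layer('lstm_1')             # by name instead of model.layers[5]
decoder_dense = model.get_layer('time_distributed')  # assumed default name of the TimeDistributed layer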

Question on prediction using Seq2seq model with embedding layers

I am a newbie in TensorFlow and Seq2Seq. When I wrote code for a Seq2Seq model with embedding layers, based on other people's code that had no embedding layers, I got errors when using the trained model to predict values.
Here are the codes for my Seq2Seq model:
# Build encoder
encoder_input = tf.keras.Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(
    input_dim=max_eng_vocabulary,
    output_dim=embedding_output_length,
    mask_zero=True,
)
encoder_input_1 = encoder_embedding(encoder_input)
encoder_lstm = tf.keras.layers.LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_input_1)
encoder_states = [state_h, state_c]
# Build decoder model
decoder_input = tf.keras.Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(
    input_dim=max_chi_vocabulary,
    output_dim=embedding_output_length,
    mask_zero=True,
)
decoder_input_1 = decoder_embedding(decoder_input)
decoder_lstm = tf.keras.layers.LSTM(
    50,
    return_state=True,
    return_sequences=True,
)
decoder_outputs, _, _ = decoder_lstm(decoder_input_1, initial_state=encoder_states)
decoder_dense = tf.keras.layers.Dense(max_chi_vocabulary, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Combine the encoder and decoder
model = tf.keras.Model([encoder_input, decoder_input], decoder_outputs)
When I tried to use it for prediction, the code was similar to this:
model.predict([encoder_input_data[0], decoder_input_data[0]])
The input in the above code is exactly one of the samples from the training data set. After running the prediction code, I got this error: Layer lstm_1 expects 7 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'model/embedding_1/embedding_lookup/Identity_1:0' shape=(None, 1, 100) dtype=float32>]
A sketch of the model structure is also attached.
I have an additional question: it seems the masking function for the embeddings doesn't work. Is there anything wrong with my model definition?
Thanks for your help in advance!
It turns out the error was caused by the environment; running the same code in Colab produces no error.
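Independently of the environment issue, note that model.predict expects a leading batch axis; indexing a NumPy array with [0] drops that axis, while slicing with [0:1] keeps it. A minimal sketch, assuming encoder_input_data and decoder_input_data are the training arrays:
enc_sample = encoder_input_data[0:1]      # shape (1, timesteps), batch axis preserved
dec_sample = decoder_input_data[0:1]
preds = model.predict([enc_sample, dec_sample])
print(preds.shape)                        # (1, decoder timesteps, max_chi_vocabulary)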

Decoder does not accept Output of Bidirectional Encoder

I'm trying to implement an Encoder-Decoder model with TensorFlow. The encoder is a bidirectional LSTM.
def encoder(hidden_units, encoder_embedding, sequence_length):
    forward_cell = tf.contrib.rnn.LSTMCell(hidden_units)
    backward_cell = tf.contrib.rnn.LSTMCell(hidden_units)
    bi_outputs, final_states = tf.nn.bidirectional_dynamic_rnn(
        forward_cell, backward_cell, encoder_embedding,
        sequence_length=sequence_length, dtype=tf.float32)
    encoder_outputs = tf.concat(bi_outputs, 2)
    forward_cell_state, backward_cell_state = final_states
    cell_state_final = tf.concat([forward_cell_state.c, backward_cell_state.c], 1)
    hidden_state_final = tf.concat([forward_cell_state.h, backward_cell_state.h], 1)
    encoder_final_state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_final, h=hidden_state_final)
    return encoder_outputs, encoder_final_state
Something goes wrong between the encoder and the decoder. I get an error like ValueError: Shapes (?, 42) and (12, 21) are not compatible ....
The decoder has an attention mechanism and looks like this:
def decoder(decoder_embedding, vocab_size, hidden_units, sequence_length, encoder_output, encoder_state, batchsize):
    projection_layer = Dense(vocab_size)
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_embedding, sequence_length=sequence_length)
    # Decoder cell
    decoder_cell = tf.contrib.rnn.LSTMCell(hidden_units)
    # Attention mechanism
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(hidden_units, encoder_output)
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(decoder_cell, attention_mechanism, attention_layer_size=hidden_units)
    # Initial attention state, cloned from the encoder's final state
    attn_zero = attn_cell.zero_state(batch_size=batchsize, dtype=tf.float32)
    ini_state = attn_zero.clone(cell_state=encoder_state)
    decoder = tf.contrib.seq2seq.BasicDecoder(cell=attn_cell, initial_state=ini_state, helper=helper, output_layer=projection_layer)
    decoder_outputs, _final_state, _final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder)
    return decoder_outputs
How can this be fixed?
The problem is that the number of hidden units in the encoder state is twice that of the decoder, and you are trying to use Luong-style attention. Luong attention computes the attention energies (state similarities) as dot products between the decoder state and all encoder states, and a dot product requires both vectors to have the same dimensionality.
You have several options:
Use Bahdanau-style attention, which adds a non-linear layer that projects the encoder and decoder states into a shared space.
Change the dimensionality of your encoder or decoder so that the hidden states match, i.e. decoder's hidden_units = 2 × encoder's hidden_units (the bidirectional encoder states get concatenated).
Add a linear dense layer after the encoder output that projects the encoder outputs (and final state) down to the decoder's hidden dimension; a sketch of this option follows the list.
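A minimal sketch of the third option, assuming TF 1.x (to match the tf.contrib code above) and reusing the function and variable names from the snippets; how the graph is wired outside the two functions is an assumption:
import tensorflow as tf

# project the concatenated (2 * hidden_units) encoder tensors down to the decoder's size
encoder_outputs, encoder_final_state = encoder(hidden_units, encoder_embedding, sequence_length)
encoder_outputs_proj = tf.layers.dense(encoder_outputs, hidden_units, use_bias=False)
cell_state_proj = tf.layers.dense(encoder_final_state.c, hidden_units, use_bias=False)
hidden_state_proj = tf.layers.dense(encoder_final_state.h, hidden_units, use_bias=False)
encoder_state_proj = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_proj, h=hidden_state_proj)
# feed the projected tensors to the decoder instead of the raw bidirectional ones
decoder_outputs = decoder(decoder_embedding, vocab_size, hidden_units, sequence_length,
                          encoder_outputs_proj, encoder_state_proj, batchsize)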

Adding attention layer to the Encoder-Decoder model architecture gives worse results

I initially defined an Encoder-Decoder model architecture for next-phrase prediction and trained it on some data; I was able to predict successfully with that model. But when I inserted an attention layer into the architecture, training was still successful, yet I was no longer able to define separate encoder and decoder models for prediction. Here is the new model architecture I defined:
# Model architecture along with Attention Layer
# Create the Encoder layers first.
encoder_inputs = Input(shape=(len_input,))
encoder_emb = Embedding(input_dim=vocab_in_size, output_dim=embedding_dim)
# Bidirectional LSTM or Simple LSTM
encoder_lstm = Bidirectional(LSTM(units=units, return_sequences=True, return_state=True)) # Bidirectional(
encoder_out, fstate_h, fstate_c, bstate_h, bstate_c = encoder_lstm(encoder_emb(encoder_inputs))
state_h = Concatenate()([fstate_h,bstate_h])
state_c = Concatenate()([fstate_c, bstate_c])
encoder_states = [state_h, state_c]
# Now create the Decoder layers.
decoder_inputs = Input(shape=(None,))
decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)
decoder_lstm = LSTM(units=units*2, return_sequences=True, return_state=True) # units=units*2
decoder_lstm_out, _, _ = decoder_lstm(decoder_emb(decoder_inputs), initial_state=encoder_states)
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm_out])
# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_lstm_out, attn_out])
# Two dense layers
decoder_d1 = TimeDistributed(Dense(units, activation="relu"))
decoder_d2 = TimeDistributed(Dense(vocab_out_size, activation="softmax"))
decoder_out = decoder_d2(Dropout(rate=.2)(decoder_d1(Dropout(rate=.2)(decoder_concat_input))))
#decoder_out = decoder_d2(Dropout(rate=.2)(decoder_concat_input))
# combining the encoder and the decoder layers together
model = Model(inputs = [encoder_inputs, decoder_inputs], outputs= decoder_out)
model.compile(optimizer=tf.optimizers.Adam(), loss="sparse_categorical_crossentropy", metrics=['sparse_categorical_accuracy'])
model.summary()
Trained this model and defined another encoder and decoder using the same tensors:
# Changed infmodel
# Create the encoder model from the tensors we previously declared, while training
encoder_model = Model(encoder_inputs, [encoder_out, state_h, state_c], name = 'Encoder')
# decoder model
# Generate a new set of tensors for our new inference decoder
state_input_h = Input(shape=(units*2,), name="state_input_h") # units*2 if Bidirectional LSTM else units*1
state_input_c = Input(shape=(units*2,), name="state_input_c") # units*2
inf_decoder_inputs = Input(shape=(len_input, units), name="inf_decoder_inputs")
# similar decoder model architecture with state from encoder model
decoder_res, decoder_h, decoder_c = decoder_lstm(decoder_emb(decoder_inputs),
                                                 initial_state=[state_input_h, state_input_c])
# Attention inference
attn_out_res, attn_states_res = attn_layer([inf_decoder_inputs, decoder_res])
# Concat attention input and decoder LSTM output
decoder_out_concat_res = Concatenate(axis=-1, name='concat_layer')([decoder_res, attn_out_res])
inf_decoder_out = decoder_d2(decoder_d1(decoder_out_concat_res))
# finalizing the decoder model
inf_model = Model(inputs=[decoder_inputs] + [inf_decoder_inputs, state_input_h, state_input_c],
                  outputs=[inf_decoder_out, decoder_h, decoder_c], name='Decoder')
The results after training have become worse, so I believe there are some problems with the model architecture. I arrived at this architecture after trying many permutations. Please go through the model architecture once.

How to get output from a specific layer in keras.tf, the bottleneck layer in autoencoder?

I am developing an autoencoder for clustering certain groups of images.
input_images->...->bottleneck->...->output_images
I have calibrated the autoencoder to my satisfaction and saved the model; everything has been developed using keras.tensorflow on python3.
The next step is to apply the autoencoder to a ton of images and cluster them according to cosine distance in the bottleneck layer. Oops, I just realized that I don't know the syntax in keras.tf for running the model on a batch up to a specific layer rather than to the output layer. Thus the question:
How do I run something like Model.predict_on_batch or Model.predict_generator up to the certain "bottleneck" layer and retrieve the values on that layer rather than the values on the output layer?
You need to define a new model (if you didn't define the encoder and decoder as separate models initially, which is usually the easiest option).
If your model was defined without reusing layers, it's just:
inputs = model.input
outputs= model.get_layer('bottleneck').output
encoder = Model(inputs, outputs)
Use the encoder model as any other model.
The full code would look like this:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import models

# ENCODER
encoding_dim = 37310
input_layer = Input(shape=(encoding_dim,))
encoder = Dense(500, activation='tanh')(input_layer)
encoder = Dense(100, activation='tanh')(encoder)
encoder = Dense(50, activation='tanh', name='bottleneck_layer')(encoder)
decoder = Dense(100, activation='tanh')(encoder)
decoder = Dense(500, activation='tanh')(decoder)
decoder = Dense(37310, activation='sigmoid')(decoder)
# full model
model_full = models.Model(input_layer, decoder)
model_full.compile(optimizer='adam', loss='mse')
model_full.fit(x, y, epochs=20, batch_size=16)
# bottleneck model
bottleneck_output = model_full.get_layer('bottleneck_layer').output
model_bottleneck = models.Model(inputs = model_full.input, outputs = bottleneck_output)
bottleneck_predictions = model_bottleneck.predict(X_test)
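Since the stated goal is clustering by cosine distance in the bottleneck space, here is a short follow-up sketch in plain NumPy (reusing bottleneck_predictions from above):
import numpy as np

Z = bottleneck_predictions                              # shape (n_samples, 50)
Z_norm = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # unit-length rows
cosine_sim = Z_norm @ Z_norm.T                          # pairwise cosine similarity
cosine_dist = 1.0 - cosine_sim                          # pairwise cosine distance matrix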