Stacked autoencoders for data denoising with keras not training the encoder? - tensorflow

I looked at several samples on the web for building a stacked autoencoder for data denoising, but I don't seem to understand a fundamental point about the encoder part:
https://blog.keras.io/building-autoencoders-in-keras.html
Following the examples, I built the autoencoder like this:
inputs = Input(shape=(timesteps, 50))
encoded1 = Dense(30, activation="relu")(inputs)
encoded2 = Dense(15, activation="relu")(encoded1)
encoded3 = Dense(5, activation="relu")(encoded2)
decoded1 = Dense(15, activation="relu")(encoded3)
decoded2 = Dense(30, activation="relu")(decoded1)
decoded = Dense(50, activation="sigmoid")(decoded2)
autoencoder = Model(inputs=inputs, outputs=decoded)
encoder = Model(inputs, encoded3)
autoencoder.compile(loss='mse', optimizer='rmsprop')
autoencoder.fit(trainX, trainX,
                epochs=epochs,
                batch_size=512,
                callbacks=callbacks,
                validation_data=(trainX, trainX))
In the examples there is usually a model for the encoder and a separate model for the decoder. I always see that only the decoder model gets trained; the encoder is not trained. But for my use case I only need the encoder model to denoise the data. Why does the encoder need no training?

Your interpretation of the encoder-decoder is wrong. The encoder encodes your input data into a compressed, abstract latent representation, which is very powerful if you want to use it as features for further prediction. To make sure the encoded output stays as close as possible to your actual input, you have a decoder, which decodes the encoded representation back into the original input. During training both the encoder and the decoder are involved, i.e. the weights of the encoder layers and the decoder layers are both updated. If the encoder were not trained, how would it learn the encoding mechanism? During inference you use only the encoder module, since you only want to encode the input.
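A minimal sketch of that last point, reusing the question's models: the encoder model shares its layers (and therefore its trained weights) with the autoencoder, so fitting the autoencoder trains the encoder too, and afterwards the encoder can be used on its own. Here testX is an assumed held-out array shaped like trainX:
# After autoencoder.fit(...) above, the encoder's layers are already trained,
# because encoder and autoencoder reference the same layer objects.
encoded_features = encoder.predict(testX)   # shape: (n_samples, timesteps, 5)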

Related

Understanding states of a bidirectional LSTM in a seq2seq model (tf keras)

I am creating a language model: a seq2seq model with 2 bidirectional LSTM layers. I have got the model to train and the accuracy seems good, but while trying to figure out the inference model, I've found myself confused by the states that are returned by each LSTM layer.
I am using this tutorial as a guide, though the example in this link does not use bidirectional layers: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
Note: I am using a pretrained word embedding.
lstm_units = 100
# Set up embedding layer using pretrained weights
embedding_layer = Embedding(total_words+1, emb_dimension, input_length=max_input_len, weights=[embedding_matrix], name="Embedding")
# Encoder
encoder_input_x = Input(shape=(None,), name="Enc_Input")
encoder_embedding_x = embedding_layer(encoder_input_x)
encoder_lstm_x, enc_state_h_fwd, enc_state_c_fwd, enc_state_h_bwd, enc_state_c_bwd = Bidirectional(LSTM(lstm_units, dropout=0.5, return_state=True, name="Enc_LSTM1"), name="Enc_Bi1")(encoder_embedding_x)
encoder_states = [enc_state_h_fwd, enc_state_c_fwd, enc_state_h_bwd, enc_state_c_bwd]
# Decoder
decoder_input_x = Input(shape=(None,), name="Dec_Input")
decoder_embedding_x = embedding_layer(decoder_input_x)
decoder_lstm_layer = Bidirectional(LSTM(lstm_units, return_state=True, return_sequences=True, dropout=0.5, name="Dec_LSTM1"))
decoder_lstm_x, _, _, _, _= decoder_lstm_layer(decoder_embedding_x, initial_state=encoder_states)
decoder_dense_layer = TimeDistributed(Dense(total_words+1, activation="softmax", name="Dec_Softmax"))
decoder_output_x = decoder_dense_layer(decoder_lstm_x)
model = Model(inputs=[encoder_input_x, decoder_input_x], outputs=decoder_output_x)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I believe the diagram of the model looks like this, with 60 time steps:
I want the encoder to pass the enc_state_h_fwd and enc_state_c_fwd forward to the decoder. This connection is highlighted by the orange arrow.
But since the model is bidirectional, I have some questions:
Do I need to pass the decoder states backwards to the encoder? And how would one possibly do this? It seems like a chicken-and-egg scenario.
The encoder_states that come out of the encoder LSTM layer comprise 4 states: h and c states going forward, and h and c states going backward. I believe the "backward" states are denoted in my diagram by the pink arrow going left out of the encoder. I am passing these to the decoder, but why does it need them? Am I incorrectly connecting the pink arrow on the left to the purple arrow going into the decoder from the right?
This model is not valid. It is set up as a translation model, which during inference would predict one word at a time: starting with the start-of-sequence token to predict y1, then looping and feeding in the start-of-sequence token plus y1 to get y2, and so on.
A bidirectional LSTM cannot be used for real-time, many-to-many prediction unless the entire decoder input is available. In this case, the decoder input only becomes available one predicted step at a time, so the first prediction (of y1) is invalid without the rest of the sequence (y2-yt).
The decoder should therefore not be a bidirectional LSTM.
As for the states, the encoder Bidirectional LSTM does indeed output h and c states going forward (orange arrow), and h and c states going backward (pink arrow).
By concatenating these states and feeding them to the decoder, we can give the decoder more information. This is possible because we do have the entire encoder input at inference time.
Also note that a bidirectional encoder with lstm_units (e.g. 100) effectively has 200 LSTM units, half going forward and half going backward. To feed these states into the decoder, the decoder must have 200 units too.
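A minimal sketch of that wiring, reusing the question's variable names and assuming a unidirectional decoder (the layer names are illustrative):
from tensorflow.keras.layers import Concatenate, LSTM, TimeDistributed, Dense

# Concatenate forward and backward states -> shape (batch, 2 * lstm_units)
dec_init_h = Concatenate()([enc_state_h_fwd, enc_state_h_bwd])
dec_init_c = Concatenate()([enc_state_c_fwd, enc_state_c_bwd])

# Unidirectional decoder sized to match the concatenated encoder states
decoder_lstm_layer = LSTM(2 * lstm_units, return_sequences=True, return_state=True,
                          dropout=0.5, name="Dec_LSTM1")
decoder_lstm_x, _, _ = decoder_lstm_layer(decoder_embedding_x,
                                          initial_state=[dec_init_h, dec_init_c])
decoder_output_x = TimeDistributed(
    Dense(total_words + 1, activation="softmax", name="Dec_Softmax"))(decoder_lstm_x)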

Stacked Autoencoder for Classification

I have trained a stacked autoencoder which only contains the encoder part and has a classifier attached at the end. The model looks like this:
input_ = layers.Input(shape=(78,))
encoder = layers.Dense(50,activation='relu')(input_)
encoder_one = layers.Dense(30,activation='relu')(encoder)
encoder_two = layers.Dense(15,activation='relu')(encoder_one)
classifier = layers.Dense(11,activation='softmax')(encoder_two)
autoencoder = Model(inputs=input_, outputs=classifier)
To check whether the model is working fine, I want to predict classes for it, which I can do for other models like CNNs or RNNs, but I cannot figure out how to do it here. How do I achieve that? I used a stacked autoencoder with the last layer as a classifier long ago in TensorFlow 1.6. Previously I used to do
y_pred = autoencoder.predict(X_test).ravel()
But this code does not seem to work in TensorFlow 2.3 anymore.
The way to predict classes with a classifier layer on top of an encoder is the following:
import numpy as np
predicted_probs = autoencoder.predict(X_test)            # softmax scores, shape (n_samples, 11)
predicted_classes = np.argmax(predicted_probs, axis=1)   # index of the highest score per sample
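As a quick sanity check (assuming y_test holds integer class labels in 0..10, which is an assumption about your data, not something stated in the question):
accuracy = np.mean(predicted_classes == y_test)   # fraction of correctly predicted classes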

How to have an LSTM autoencoder model predict over the whole vocab while presenting words as embeddings

So I have been working on an LSTM autoencoder model, and I have created various versions of it.
1. Create the model using already trained word embeddings:
In this scenario, I used the weights of already trained GloVe vectors as the weights for the features (text data).
This is the structure:
inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
encoded =Lambda(rev_entropy)(encoded)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = Bidirectional(LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
checkpoint = ModelCheckpoint(filepath='checkpoint/{epoch}.hdf5')
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS, validation_data=test_gen, validation_steps=num_test_steps, callbacks=[checkpoint])
2. In the second scenario, I implemented the word embedding layer in the model itself:
This is the structure:
inputs = Input(shape=(SEQUENCE_LEN, ), name="input")
embedding = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_SIZE, input_length=SEQUENCE_LEN,trainable=False)(inputs)
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(EMBED_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='categorical_crossentropy')
autoencoder.summary()
checkpoint = ModelCheckpoint(filepath=os.path.join('Data/', "simple_ae_to_compare"))
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS, validation_data=test_gen, validation_steps=num_test_steps)
3. In the third scenario, I did not use any embedding technique but used one-hot encoding for the features. This is the structure of the model:
inputs = Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE, kernel_initializer="glorot_normal",), merge_mode="sum", name="encoder_lstm")(inputs)
encoded = Lambda(score_cooccurance, name='Modified_layer')(encoded)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(VOCAB_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
autoencoder.compile(optimizer=sgd, loss='categorical_crossentropy')
autoencoder.summary()
checkpoint = ModelCheckpoint(filepath='checkpoint/50/{epoch}.hdf5')
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS, callbacks=[checkpoint])
As you can see, in the first and second models EMBED_SIZE is the number of units in the decoder layer, which causes the output shape of the encoder layer to become [Latent_size, Embed_size].
In the third model, the output shape of the encoder is [Latent_size, Vocab_size].
Now my question:
Is it doable to change the structure of the model in such a way that I have an embedding for representing my words to the model, and at the same time have vocab_size in the decoder layer?
I need the output shape of the encoder layer to be [Latent_size, Vocab_size], and at the same time I don't want to represent my features as one-hot encodings, for the obvious reason.
I would appreciate it if you could share your ideas with me.
One idea could be adding more layers; just consider that, at any cost, I don't want to have Embed_size in the last layer.
Your questions:
Is it doable to change the structure of the model in such a way that I have an embedding for representing my words to the model, and at the same time have vocab_size in the decoder layer?
I like to use the TensorFlow transformer model as a reference:
https://github.com/tensorflow/models/tree/master/official/transformer
In language translation tasks the model input tends to be a token index, which is then subject to an embedding lookup, resulting in a shape of (sequence_length, embedding_dims); the encoder itself works on this shape.
The decoder output also tends to have the shape (sequence_length, embedding_dims). For instance, the model above then transforms the decoder output into logits by taking a dot product between the output and the embedding vectors. This is the transformation they use: https://github.com/tensorflow/models/blob/master/official/transformer/model/embedding_layer.py#L94
I would recommend an approach similar to the language translation models:
pre-stage: input_shape=(sequence_length, 1), i.e. a token_index in [0, vocab_size)
encoder: input_shape=(sequence_length, embedding_dims), output_shape=(latent_dims)
decoder: input_shape=(latent_dims), output_shape=(sequence_length, embedding_dims)
Pre-processing converts token indexes into embedding_dims; this can be used to generate both the encoder input and the decoder targets.
Post-processing converts embedding_dims back to logits (in the vocab_index space).
I need the output shape of the encoder layer to be [Latent_size, Vocab_size], and at the same time I don't want to represent my features as one-hot encodings, for the obvious reason.
That doesn't sound right. Typically what one is trying to achieve with an autoencoder is to have an embedding vector for the sentence, so the output of the encoder is typically [latent_dims]. The output of the decoder needs to be translatable into [sequence_length, vocab_index (1)], which is typically done by converting from embedding space to logits and then taking the argmax to convert to a token index.
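A minimal sketch of that recipe in the question's Keras style (assumed sizes and a random stand-in for the GloVe matrix; this is not the referenced repository's code): the decoder still emits EMBED_SIZE activations, and a dot product with the embedding matrix turns them into VOCAB_SIZE logits, so the targets can stay as integer token indices instead of one-hot vectors.
import numpy as np
from keras import backend as K
from keras.models import Model
from keras.layers import Input, LSTM, Bidirectional, RepeatVector, Lambda, Activation

VOCAB_SIZE, EMBED_SIZE, SEQUENCE_LEN, LATENT_SIZE = 5000, 100, 30, 64
embedding_matrix = np.random.rand(VOCAB_SIZE, EMBED_SIZE).astype("float32")  # stand-in for GloVe
emb_const = K.constant(embedding_matrix)                                     # (VOCAB_SIZE, EMBED_SIZE)

inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(EMBED_SIZE, return_sequences=True, name="decoder_lstm")(decoded)

# Dot product with the embedding matrix: (SEQUENCE_LEN, EMBED_SIZE) -> (SEQUENCE_LEN, VOCAB_SIZE)
logits = Lambda(lambda x: K.dot(x, K.transpose(emb_const)), name="vocab_logits")(decoded)
probs = Activation("softmax", name="vocab_probs")(logits)

autoencoder = Model(inputs, probs)
# Targets: integer token indices of shape (SEQUENCE_LEN, 1), so no one-hot encoding is needed.
autoencoder.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
autoencoder.summary()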

How to save a TensorFlow dynamic_rnn model and restore it as the decoder in a new encoder-decoder model?

I am trying to train an encoder-decoder model to automatically generate summaries. The encoder part uses a CNN to encode the article's abstract; the decoder part is an RNN that generates the article's title.
So the skeleton looks like:
encoder_state = CNNEncoder(encoder_inputs)
decoder_outputs, _ = RNNDecoder(encoder_state,decoder_inputs)
But I want to pre-train the RNN decoder first, to teach the model how to speak. The decoder part is:
def RNNDecoder(encoder_state, decoder_inputs):
    decoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, decoder_inputs)
    # from tensorflow.models.rnn import rnn_cell, seq2seq
    cell = rnn.GRUCell(memory_dim)
    decoder_outputs, decoder_final_state = tf.nn.dynamic_rnn(
        cell, decoder_inputs_embedded,
        initial_state=encoder_state,
        dtype=tf.float32, scope="plain_decoder1")
    return decoder_outputs, decoder_final_state
So my concern is how to save and restore the RNNDecoder part separately?
Here you can first take the output of the dynamic RNN.
decoder_cell = tf.contrib.rnn.LSTMCell(decoder_hidden_units)
decoder_outputs, decoder_final_state = tf.nn.dynamic_rnn(
    decoder_cell, decoder_inputs_embedded, initial_state=encoder_final_state,
    dtype=tf.float32, time_major=True, scope="plain_decoder")
Take the decoder_outputs and project them onto the vocabulary with a fully connected (linear) layer:
decoder_logits = tf.contrib.layers.linear(decoder_outputs, vocab_size)
Then you can create a softmax loss with decoder_logits and train it in the normal way.
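A sketch of that loss, assuming decoder_targets is a tensor of token indices aligned with decoder_logits (the name is illustrative, not from the original code):
# Cross-entropy over the vocabulary at every timestep, averaged into a scalar loss.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=decoder_targets,
                                                   logits=decoder_logits))
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)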
When you want to restore the parameters, you can use this kind of approach in a session:
with tf.Session() as session:
    saver = tf.train.Saver()
    saver.restore(session, checkpoint_file)
Here the checkpoint file should be your exact checkpoint file. When this runs, it restores your decoder weights, which then train together with the main model.
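One caveat, as a hedged sketch: a plain tf.train.Saver() tries to restore every variable in the new graph, which will fail if the encoder's variables are not present in the decoder checkpoint. Restricting the Saver to the decoder's variable scope (here reusing the "plain_decoder" scope from the snippet above) restores only those weights:
import tensorflow as tf

decoder_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="plain_decoder")
decoder_saver = tf.train.Saver(var_list=decoder_vars)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())    # fresh init for the whole graph
    decoder_saver.restore(session, checkpoint_file)   # then overwrite only the decoder weights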

Implementing a many-to-many LSTM in TensorFlow?

I am using TensorFlow to make predictions on time-series data. So it is like I have 50 tags and I want to find out the next possible 5 tags.
As shown in the following picture, I want to make it like the 4th structure.
I went through the tutorial demo: Recurrent Neural Networks
But I found that it only provides something like the 5th one in the picture above, which is different.
I am wondering which model I could use. I am thinking of seq2seq models, but I am not sure if that is the right way.
You are right that you can use a seq2seq model. For brevity, I've written up an example of how you can do it in Keras, which has a TensorFlow backend. I've not run the example, so it might need tweaking. If your tags are one-hot, you need to use a cross-entropy loss instead.
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
# The input shape is your sequence length and your token embedding size
inputs = Input(shape=(seq_len, embedding_size))
# Build a RNN encoder
encoder = LSTM(128, return_sequences=False)(inputs)
# Repeat the encoding for every input to the decoder
encoding_repeat = RepeatVector(5)(encoder)
# Pass your (5, 128) encoding to the decoder
decoder = LSTM(128, return_sequences=True)(encoding_repeat)
# Output each timestep into a fully connected layer
sequence_prediction = TimeDistributed(Dense(1, activation='linear'))(decoder)
model = Model(inputs, sequence_prediction)
model.compile('adam', 'mse') # Or categorical_crossentropy
model.fit(X_train, y_train)
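For reference, dummy arrays with shapes that would work with the sketch above (the sizes are assumptions, not values from the question): 50 known timesteps go in, 5 predicted timesteps come out.
import numpy as np

seq_len, embedding_size, n_samples = 50, 16, 256   # define these before building the model
X_train = np.random.rand(n_samples, seq_len, embedding_size)
y_train = np.random.rand(n_samples, 5, 1)          # the next 5 values per sample
# model.predict(X_train[:3]) would then return an array of shape (3, 5, 1)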