how to have a LSTM Autoencoder model over the whole vocab prediction while presenting words as embedding - tensorflow

So I have been working on LSTM Autoencoder model. I have also created various version of this model.
1. create the model using the already trained word embedding:
in this scenario, I used the weights of already trained Glove vector, as the weight of features(text data).
This is the structure:
inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
encoded =Lambda(rev_entropy)(encoded)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = Bidirectional(LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
checkpoint = ModelCheckpoint(filepath='checkpoint/{epoch}.hdf5')
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS, validation_data=test_gen, validation_steps=num_test_steps, callbacks=[checkpoint])
in the second scenario, I implemented the word embedding layer in the model itself:
This is the structure:
inputs = Input(shape=(SEQUENCE_LEN, ), name="input")
embedding = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_SIZE, input_length=SEQUENCE_LEN,trainable=False)(inputs)
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(EMBED_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='categorical_crossentropy')
autoencoder.summary()
checkpoint = ModelCheckpoint(filepath=os.path.join('Data/', "simple_ae_to_compare"))
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS, validation_steps=num_test_steps)
in the third scenario, I did not use any embedding techniques but used the one hot encoding for the features. and this is the structure of the model:
`inputs = Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE, kernel_initializer="glorot_normal",), merge_mode="sum", name="encoder_lstm")(inputs)
encoded = Lambda(score_cooccurance, name='Modified_layer')(encoded)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(VOCAB_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
autoencoder.compile(optimizer=sgd, loss='categorical_crossentropy')
autoencoder.summary()
checkpoint = ModelCheckpoint(filepath='checkpoint/50/{epoch}.hdf5')
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS, callbacks=[checkpoint])`
As you see, in the first and second model Embed_size in the decoding is the number of neurons in that layer. it causes the output shape of encoder layer becomes [Latent_size, Embed_size].
in the third model, the output shape of the encoder is [Latent_size, Vocab_size].
Now my question
Is it doable to change the structure of the model in a way I have embedding for representing my words to the model, and at the same time having vocab_size in the decoder layer?
I need to have output_shape of encoder layer be [Latent_size, Vocab_size] and at the same time I don't want to represent my features as the one_hot encoding for the obvious reason.
I appreciate if you can share your idea with me.
One idea could be adding more layers, consider that with any cost I don't want to have Embed_size in the last layer.

Your questions:
Is it doable to change the structure of the model in a way I have embedding for representing my words to the model, and at the same time having vocab_size in the decoder layer?
I like to use as reference the Tensorflow transformer model:
https://github.com/tensorflow/models/tree/master/official/transformer
In language translation tasks the model input tends to be a token index, which then is subject to an embedding lookup resulting in a shape of (sequence_length, embedding_dims); the encoder itself works on this shape.
The decoder output tends to be in the shape of (sequence_length, embedding_dims) also. For instance the model above, then transforms the decoder output into logits by doing a dot product between the output and the embedding vectors. This is the transformation they use: https://github.com/tensorflow/models/blob/master/official/transformer/model/embedding_layer.py#L94
I would recommend an approach similar to the language translation models:
pre-stage:
input_shape=(sequence_length, 1) [ i.e. token_index in [0.. vocab_size)
encoder:
input_shape=(sequence_length, embedding_dims)
output_shape=(latent_dims)
decoder:
input_shape=(latent_dims)
output_shape=(sequence_length, embedding_dims)
Pre-processing converts token indexes into embedding_dims. This can be used to generate both the encoder input as well as the decoder targets.
Post processing to convert embedding_dims to logits (in the vocab_index space).
I need to have output_shape of encoder layer be [Latent_size, Vocab_size] and at the same time I don't want to represent my features as the one_hot encoding for the obvious reason.
That doesn't sound right. Typically what one is trying to achieve with an auto-encoder is to have a embedding vector for the sentence. So the output of the encoder in typically [latent_dims]. The output of the decoder needs to be translatable into [sequence_length, vocab_index (1) ] which is typically done by converting from embedding space to logits and then taking the argmax to convert to token index.

Related

Understanding states of a bidirectional LSTM in a seq2seq model (tf keras)

I am creating a language model: A seq2seq model with 2 Bidirectional LSTM layers. I have got the model to train and the accuracy seems good, but whilst stuck on figuring out the inference model, I've found myself a bit confused by the states that are returned by each LSTM layer.
I am using this tutorial as a guide, though the example in this link is not using bidriectional layers: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
Note: I am using a pretrained word embedding.
lstm_units = 100
# Set up embedding layer using pretrained weights
embedding_layer = Embedding(total_words+1, emb_dimension, input_length=max_input_len, weights=[embedding_matrix], name="Embedding")
# Encoder
encoder_input_x = Input(shape=(None,), name="Enc_Input")
encoder_embedding_x = embedding_layer(encoder_input_x)
encoder_lstm_x, enc_state_h_fwd, enc_state_c_fwd, enc_state_h_bwd, enc_state_c_bwd = Bidirectional(LSTM(lstm_units, dropout=0.5, return_state=True, name="Enc_LSTM1"), name="Enc_Bi1")(encoder_embedding_x)
encoder_states = [enc_state_h_fwd, enc_state_c_fwd, enc_state_h_bwd, enc_state_c_bwd]
# Decoder
decoder_input_x = Input(shape=(None,), name="Dec_Input")
decoder_embedding_x = embedding_layer(decoder_input_x)
decoder_lstm_layer = Bidirectional(LSTM(lstm_units, return_state=True, return_sequences=True, dropout=0.5, name="Dec_LSTM1"))
decoder_lstm_x, _, _, _, _= decoder_lstm_layer(decoder_embedding_x, initial_state=encoder_states)
decoder_dense_layer = TimeDistributed(Dense(total_words+1, activation="softmax", name="Dec_Softmax"))
decoder_output_x = decoder_dense_layer(decoder_lstm_x)
model = Model(inputs=[encoder_input_x, decoder_input_x], outputs=decoder_output_x)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I believe diagram of the model looks like this, with 60 time steps.:
I want the encoder to pass the enc_state_h_fwd and enc_state_c_fwd forward to the decoder. This connection is highlighted by the orange arrow.
But since the model is bidirectional, I have some questions:
Do I need to pass the decoder states backwards to the encoder? And how would one possibly do this, it seems like a chicken and egg scenario.
The encoder_states that come from the encoder lstm layer output 4 states. h and c states going forward and backward. I feel like the "backward" states are denoted in my diagram by the pink arrow going left out of the encoder. I am passing these to the decoder, but why does it need them? Am I incorrectly connecting the pink arrow on the left to the purple arrow going into the decoder from the right?
This model is not valid. It is set up as a translation model, which during inference would predict one word at a time, starting with the start of sequence token, to predict y1, then looping and feeding in the start of sequence token, y1 to get y2 etc.
A bidirectional LSTM cannot be used for real time predictions in a many to many prediction unless the entire decoder input is available. In this case, the decoder input is only available after predicting one step at a time, so the first prediction (of y1) is invalid without the rest of the sequence (y2-yt).
The decoder should therefore not be an LSTM.
As for the states, the encoder Bidirectional LSTM does indeed output h and c states going forward (orange arrow), and h and c states going backward (pink arrow).
By concatenating these states and feeding them to the decoder, we can give the decoder more information. This is possible as we do have the entire encoder input at time of inference.
Also to be noted is that the bidirectional encoder with lstm_units (eg. 100) effectively has 200 lstm units, half going forward, half going backward. To feed these into the decoder, the decoder must have 200 units too.

How to use embedding models in tensorflow hub with LSTM layer?

I'm learning tensorflow 2 working through the text classification with TF hub tutorial. It used an embedding module from TF hub. I was wondering if I could modify the model to include a LSTM layer. Here's what I've tried:
train_data, validation_data, test_data = tfds.load(
name="imdb_reviews",
split=('train[:60%]', 'train[60%:]', 'test'),
as_supervised=True)
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True)
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Embedding(10000, 50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(optimizer='adam',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
epochs=10,
validation_data=validation_data.batch(512),
verbose=1)
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
print("%s: %.3f" % (name, value))
I don't know how to get the vocabulary size from the hub_layer. So I just put 10000 there. When run it, it throws this exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[480,1] = -6 is not in [0, 10000)
[[node sequential/embedding/embedding_lookup (defined at .../learning/tensorflow/text_classify.py:36) ]] [Op:__inference_train_function_36284]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/34017 (defined at Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
I stuck here. My questions are:
how should I use the embedding module from TF hub to feed an LSTM layer? it looks like embedding lookup has some issues with the setting.
how do I get the vocabulary size from the hub layer?
Thanks
Finally figured out the way to link pre-trained embeddings to LSTM or other layers. Just post the steps here in case anyone feels helpful.
Embedding layer has to be the first layer in the model. (hub_layer is the same as Embedding layer.) The not very intuitive part is that any text input to the hub layer will be converted to only one vector of shape [embedding_dim]. You need to do sentence splitting and tokenization to make sure whatever input to the model is a sequence in the form of array of arrays. e.g., "Let us prepare the data." should be converted to [["let"],["us"],["prepare"], ["the"], ["data"]]. You will also need to pad the sequences if you are using batch mode.
In addition, you will need to convert your target tokens to int if your training labels are strings. The input to the model is array of strings with shape [batch, seq_length], the hub embedding layer converts it to [batch, seq_length, embed_dim]. (If you add a LSTM or other RNN layer, the output from the layer is [batch, seq_length, rnn_units]. ) The output dense layer will output index of text instead of actual text. The index of text is stored in the downloaded tfhub directory as "tokens.txt". You can load the file and convert text to the corresponding index. Otherwise you cannot compute the loss.

Stacked autoencoders for data denoising with keras not training the encoder?

I looked for several samples on the web to build a stacked autoencoder for data denoising but I don't seem to understand a fundamental part of the encoder part:
https://blog.keras.io/building-autoencoders-in-keras.html
Following the examples I built the autoencoder like that:
inputs = Input(shape=(timesteps, 50))
encoded1 = Dense(30, activation="relu")(inputs)
encoded2 = Dense(15, activation="relu")(encoded1)
encoded3 = Dense(5, activation="relu")(encoded2)
decoded1 = Dense(15, activation="relu")(encoded3)
decoded2 = Dense(30, activation="relu")(decoded1)
decoded = Dense(50, activation="sigmoid")(decoded2)
autoencoder = Model(inputs=inputs, outputs=decoded)
encoder = Model(inputs, encoded3)
autoencoder.compile(loss='mse', optimizer='rmsprop')
autoencoder.fit(trainX,
trainX,
epochs=epochs,
batch_size=512,
callbacks=callbacks,
validation_data=(trainX, trainX))
On the examples there is mostly a model with the encoder and a seperate model with the decoder. I always see that only the decoder model get's trained. The encoder is not trained. But for my usecase I only need the encoder model to denoise the data. Why does the encoder need no training?
Your interpretation about encoder-decoder is wrong. Encoder encodes your input data into some high dimensional representation which is abstract but it's very powerful if you want use that as features for further prediction. To make sure encoded output is as close to your actual input, you have decoder which decodes your encoded high-dimensional input back to original input. During training, both encoder and decoder are involved i.e. the weights of the encoder layers and decoder layers both are updated. If the encoder is not trained how it's going to learn the encoding mechanism. During inference, you use only the encoder module as you want to encode the input.

How can I get a tensor output by a tensorflow.layer

I created a CNN model using higher level tensorflow layers, like
conv1 = tf.layers.conv2d(...)
maxpooling1 = tf.layers.max_pooling2d(...)
conv2 = tf.layers.conv2d(...)
maxpooling2 = tf.layers.max_pooling2d(...)
flatten = tf.layers.flatten(...)
logits = tf.layers.dense(...)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))
optimizer = tf.train.AdadeltaOptimizer(init_lr).minimize(loss)
acc = tf.reduce_mean(...)
The model is well trained and saved, everything is good so far. Next, I want to load this saved model, make a change to the learning rate, and continue to train (I know tensorflow provides exponential_decay() function to allow a decay learning rate, here i just want to be in full control of learning rate, and change it manually). To do this, my idea is like:
saver = tf.train.import_meta_grah(...)
saver.restore(sess, tf.train.latest_chechpoint(...))
graph = tf.get_default_graph()
inputImg_ = graph.get_tensor_by_name(...) # this is place_holder in model
labels_ = graph.get_tensor_by_name(...) # place_holder in model
logits = graphget_tensor_by_name(...) # output of dense layer
loss = grah.get_tensor_by_name(...) # loss
optimizer = tf.train.AdadeltaOptimizer(new_lr).minimize(loss) # I give it a new learning rate
acc = tf.reduce_mean(...)
Now I got a problem. the code above can successfully obtain inputmg_, labels_, because I named them when I defined them. But I cannot obtain logits because logits = tf.layers.dense(name='logits') the name is actually given to the dense layer instead of the output tensor logits. That means, I cannot obtain the tensor conv1, conv2 either. It seems tensorflow cannot name a tensor output by a layer. In this case, is there a way to obtain these tensors, like logits, conv1, maxpooling1? I've searched for the answer for a while but failed.
I was having the same problem and solved it using tf.identity.
Since the dense layer has bias and weights parameters, when you name it, you are naming the layer, not the output tensor.
The tf.identity returns a tensor with the same shape and contents as input.
So just leave the dense layer unamed and use it as input to the tf.identity
self.output = tf.layers.dense(hidden_layer3, 2)
self.output = tf.identity(self.output, name='output')
Now you can load the output
output = graph.get_tensor_by_name('output:0')

Implementing a many-to-many LSTM in TensorFlow?

I am using TensorFlow to make predictions on time-series data. So it is like I have 50 tags and I want to find out the next possible 5 tags.
As shown in the following picture, I want to make it like the 4th structure.
I went through the tutorial demo: Recurrent Neural Networks
But I found it can provide like the 5th one in the above picture, which is different.
I am wondering which model could I use? I am thinking of the seq2seq models, but not sure if it is the right way.
You are right that you can use a seq2seq model. For brevity I've written up an example of how you can do it in Keras which also has a Tensorflow backend. I've not run the example so it might need tweaking. If your tags are one-hot you need to use cross-entropy loss instead.
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector
# The input shape is your sequence length and your token embedding size
inputs = Input(shape=(seq_len, embedding_size))
# Build a RNN encoder
encoder = LSTM(128, return_sequences=False)(inputs)
# Repeat the encoding for every input to the decoder
encoding_repeat = RepeatVector(5)(encoder)
# Pass your (5, 128) encoding to the decoder
decoder = LSTM(128, return_sequences=True)(encoding_repeat)
# Output each timestep into a fully connected layer
sequence_prediction = TimeDistributed(Dense(1, activation='linear'))(decoder)
model = Model(inputs, sequence_prediction)
model.compile('adam', 'mse') # Or categorical_crossentropy
model.fit(X_train, y_train)