Preprocessing strings in keras for seq2seq machine translation - tensorflow

I am trying to learn how to implement seq2seq networks in Keras for machine translation, and I copied the code straight from here. Unfortunately, I have no experience working with recurrent neural networks, so I have almost no idea what the inputs to this model are supposed to look like. Basically, I want to know how I can transform strings of characters into something usable by the model I copied.
I'm only going to ask about encoder inputs here since it's my first roadblock. I've pasted the original code for the encoder below.
# Define an input sequence and process it.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
encoder = keras.layers.LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
Generally, I understand that each input sentence should be encoded as a matrix in which row i is the one-hot encoding of the ith word in the sentence. I also understand that I can use Tokenizer to get me there. I tried using texts_to_matrix, but that gives a bag-of-words representation of each sentence, which is not what I want. So I decided to use texts_to_sequences and an Embedding layer to get the model above to accept my input.
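To illustrate the difference, here is a small sketch of the two Tokenizer methods (the exact indices depend on the fitted vocabulary):
from tensorflow.keras.preprocessing.text import Tokenizer
tok = Tokenizer()
tok.fit_on_texts(["the cat is black", "the cat has long hair"])
# texts_to_sequences: one integer word index per word, per sentence
print(tok.texts_to_sequences(["the cat is black"]))  # e.g. [[1, 2, 3, 4]]
# texts_to_matrix: one fixed-length bag-of-words row per sentence
print(tok.texts_to_matrix(["the cat is black"]))     # shape (1, len(tok.word_index) + 1)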
Here's what I have so far. I keep getting errors, which seem to be due to mismatching dimensions between the layers and/or the need for all input sentences to be the same length.
df = pd.DataFrame([
    {"English": "My name is migs", "French": "Je m'appelle mig"},
    {"English": "mig has long hair", "French": "mig a les cheveux longs"},
    {"English": "The cat is black", "French": "Le chat est noir"},
    {"English": "mig has black hair", "French": "mig a les cheveux noirs"},
])
english_tokenizer = text.Tokenizer()
english_tokenizer.fit_on_texts(df.English)
tokenized_english = english_tokenizer.texts_to_sequences(df.English)
num_encoder_tokens = len(english_tokenizer.word_index)+1
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_input_embedding = Embedding(input_dim=num_encoder_tokens, output_dim=8)(encoder_inputs)
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_input_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
Is padding absolutely necessary, or can I modify the model or preprocessing steps to accommodate variable-length input? And how can I modify my existing code to get it to run without throwing errors?
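Based on my reading of the Keras docs, I think the change would look roughly like this (an untested sketch): pad the integer sequences so they can be batched, and switch the Input shape to plain word indices so it matches the Embedding layer.
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Input, Embedding, LSTM

latent_dim = 256
# Pad the integer sequences so every sentence in a batch has the same length.
encoder_input_data = pad_sequences(tokenized_english, padding="post")
# Each timestep is now a single word index, not a one-hot vector, so the shape is (None,).
encoder_inputs = Input(shape=(None,))
encoder_input_embedding = Embedding(input_dim=num_encoder_tokens, output_dim=8, mask_zero=True)(encoder_inputs)
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_input_embedding)
encoder_states = [state_h, state_c]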

Related

RoBERTa example from tfhub produces error "During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string"

I would like to use the roberta-base model from tfhub. I am trying to run the example below, but I get an error when I feed sentences to the model as input: Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string. I am using Python 3.7, TensorFlow 2.5, and tensorflow_hub 0.12.
If I replace the preprocessor and encoder with the corresponding BERT versions (the two lines directly below), the code works. However, I would like it to work for RoBERTa as well (as in the full example that follows).
preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4", trainable=True)
# define a text embedding model
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
preprocessor = hub.KerasLayer("https://tfhub.dev/jeongukjae/roberta_en_cased_preprocess/1")
encoder_inputs = preprocessor(text_input)
encoder = hub.KerasLayer("https://tfhub.dev/jeongukjae/roberta_en_cased_L-12_H-768_A-12/1", trainable=True)
encoder_outputs = encoder(encoder_inputs)
pooled_output = encoder_outputs["pooled_output"] # [batch_size, 768].
sequence_output = encoder_outputs["sequence_output"] # [batch_size, seq_length, 768].
model = tf.keras.Model(text_input, pooled_output)
# You can embed your sentences as follows
sentences = tf.constant(["(your text here)"])
print(model(sentences))
Additionally, the code above with the RoBERTa preprocessor/encoder seems to work if I use CPU instead of GPU (adding with tf.device('/cpu:0')), but this is not feasible because I need to fine-tune a model on lots of data.
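For reference, the CPU workaround I mean is wrapping the model construction and call in a CPU device scope, roughly:
with tf.device('/cpu:0'):
    # ... the RoBERTa preprocessor/encoder definitions from above go here ...
    model = tf.keras.Model(text_input, pooled_output)
    print(model(tf.constant(["(your text here)"])))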

Keras autoencoder and getting the compressed feature vector representation

I have one sentence per row in a file, and the sentences are no more than 30 words long. I am building an autoencoder using Keras and I am very new to this, so I may be doing a few things incorrectly. So, help me out here.
I am trying to use autoencoder to get the intermediate context vector - the compressed feature vectors after the encode step.
Vocabulary is nothing but the list of distinct words in my file. 300 is the dimension of the word embedding matrix. 30 is the maximum number of words a sentence can have. X_train is a (# of sentences, 30) matrix of numbers, where each number is simply the index of that word in the dictionary.
print len(vocabulary)
model = Sequential()
model.add(Embedding(len(vocabulary), 300))
model.compile('rmsprop', 'mse')
input_i = Input(shape=(30, 300))
encoded_h1 = Dense(64, activation='tanh')(input_i)
encoded_h2 = Dense(32, activation='tanh')(encoded_h1)
encoded_h3 = Dense(16, activation='tanh')(encoded_h2)
encoded_h4 = Dense(8, activation='tanh')(encoded_h3)
encoded_h5 = Dense(4, activation='tanh')(encoded_h4)
latent = Dense(2, activation='tanh')(encoded_h5)
decoder_h1 = Dense(4, activation='tanh')(latent)
decoder_h2 = Dense(8, activation='tanh')(decoder_h1)
decoder_h3 = Dense(16, activation='tanh')(decoder_h2)
decoder_h4 = Dense(32, activation='tanh')(decoder_h3)
decoder_h5 = Dense(64, activation='tanh')(decoder_h4)
output = Dense(300, activation='tanh')(decoder_h5)
autoencoder = Model(input_i,output)
autoencoder.compile('adadelta','mse')
X_embedded = model.predict(X_train)
autoencoder.fit(X_embedded,X_embedded,epochs=10, batch_size=256, validation_split=.1)
print autoencoder.summary()
The idea is taken from Keras - Autoencoder for Text Analysis
So, after training (if I have done correctly) how should I just run the encoding part for each sentence to get the feature representation? Help is appreciated. Thanks!
Make a standalone model for the encoder:
encoder = Model(input_i, latent)
Then, for example on MNIST data, encoding a single sample would look like:
encoder.predict(x_train[0:1])
This gives you the latent-space vector as output.
Alternatively, you can 'pop off' the last layer: after training, remove it with model.pop(), then use model.predict(X_train) to get the representation.
https://keras.io/getting-started/faq/#how-can-i-remove-a-layer-from-a-sequential-model
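Putting the first answer in terms of the question's own variables, a minimal sketch would be (note that the Dense layers above act on the last axis, so this 'latent' output is a 2-dimensional code per word position, not one vector per sentence):
encoder = Model(input_i, latent)
sentence_codes = encoder.predict(X_embedded)  # shape: (num_sentences, 30, 2)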

How to use tensorflow seq2seq without embeddings?

I have been working on LSTMs for time-series forecasting using TensorFlow. Now I want to try sequence to sequence (seq2seq). The official site has a tutorial showing NMT with embeddings. So, how can I use this new seq2seq module without embeddings (directly on time-series sequences)?
# 1. Encoder
encoder_cell = tf.contrib.rnn.BasicLSTMCell(LSTM_SIZE)
encoder_outputs, encoder_state = tf.nn.static_rnn(
    encoder_cell,
    x,
    dtype=tf.float32)
# Decoder
decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_SIZE)
helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_emb_inp, decoder_lengths, time_major=True)
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper, encoder_state)
# Dynamic decoding
outputs, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
outputs = outputs[-1]
# output is result of linear activation of last layer of RNN
weight = tf.Variable(tf.random_normal([LSTM_SIZE, N_OUTPUTS]))
bias = tf.Variable(tf.random_normal([N_OUTPUTS]))
predictions = tf.matmul(outputs, weight) + bias
What should be the args for TrainingHelper() if I use input_seq=x and output_seq=label?
decoder_emb_inp ???
decoder_lengths ???
Here input_seq is the first 8 points of the sequence, and output_seq is the last 2 points.
Thanks in advance!
I got it to work without embeddings using a very rudimentary InferenceHelper:
inference_helper = tf.contrib.seq2seq.InferenceHelper(
    sample_fn=lambda outputs: outputs,
    sample_shape=[dim],
    sample_dtype=dtypes.float32,
    start_inputs=start_tokens,
    end_fn=lambda sample_ids: False)
My inputs are floats with the shape [batch_size, time, dim]. For the example below dim would be 1, but this can easily be extended to more dimensions. Here's the relevant part of the code:
projection_layer = tf.layers.Dense(
    units=1,  # = dim
    kernel_initializer=tf.truncated_normal_initializer(
        mean=0.0, stddev=0.1))

# Training Decoder
training_decoder_output = None
with tf.variable_scope("decode"):
    # output_data doesn't exist during prediction phase.
    if output_data is not None:
        # Prepend the "go" token
        go_tokens = tf.constant(go_token, shape=[batch_size, 1, 1])
        dec_input = tf.concat([go_tokens, target_data], axis=1)

        # Helper for the training process.
        training_helper = tf.contrib.seq2seq.TrainingHelper(
            inputs=dec_input,
            sequence_length=[output_size] * batch_size)

        # Basic decoder
        training_decoder = tf.contrib.seq2seq.BasicDecoder(
            dec_cell, training_helper, enc_state, projection_layer)

        # Perform dynamic decoding using the decoder
        training_decoder_output = tf.contrib.seq2seq.dynamic_decode(
            training_decoder, impute_finished=True,
            maximum_iterations=output_size)[0]

# Inference Decoder
# Reuses the same parameters trained by the training process.
with tf.variable_scope("decode", reuse=tf.AUTO_REUSE):
    start_tokens = tf.constant(
        go_token, shape=[batch_size, 1])

    # The sample_ids are the actual output in this case (not dealing with any logits here).
    # My end_fn is always False because I'm working with a generator that will stop giving
    # more data. You may extend the end_fn as you wish. E.g. you can append end_tokens
    # and make end_fn be true when the sample_id is the end token.
    inference_helper = tf.contrib.seq2seq.InferenceHelper(
        sample_fn=lambda outputs: outputs,
        sample_shape=[1],  # again because dim=1
        sample_dtype=dtypes.float32,
        start_inputs=start_tokens,
        end_fn=lambda sample_ids: False)

    # Basic decoder
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(
        dec_cell, inference_helper, enc_state, projection_layer)

    # Perform dynamic decoding using the decoder
    inference_decoder_output = tf.contrib.seq2seq.dynamic_decode(
        inference_decoder, impute_finished=True,
        maximum_iterations=output_size)[0]
Have a look at this question. I also found this tutorial very useful for understanding seq2seq models, although it does use embeddings; just replace its GreedyEmbeddingHelper with an InferenceHelper like the one I posted above.
P.S. I posted the full code at https://github.com/Andreea-G/tensorflow_examples

How to modify the Tensorflow Sequence2Sequence model to implement Bidirectional LSTM rather than Unidirectional one?

Refer to this post to know the background of the problem:
Does the TensorFlow embedding_attention_seq2seq method implement a bidirectional RNN Encoder by default?
I am working on the same model, and want to replace the unidirectional LSTM layer with a Bidirectional layer. I realize I have to use static_bidirectional_rnn instead of static_rnn, but I am getting an error due to some mismatch in the tensor shape.
I replaced the following line:
encoder_outputs, encoder_state = core_rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)
with the line below:
encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)
That gives me the following error:
InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5,1,256] vs. [16,1,1,256]
[[Node: gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape, gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape_1)]]
I understand that the outputs of the two methods are different, but I do not know how to modify the attention code to account for that. How do I send both the forward and backward states to the attention module: do I concatenate both hidden states?
From the error message it looks like the batch sizes of two tensors don't match somewhere: one is 32 and the other is 16. I suppose this is because the output list of the bidirectional RNN is twice the size of the unidirectional one, and the code that follows is not adjusted for that.
How do I send both the forward and backward states to the attention module - do I concatenate both the hidden states?
You can reference this code:
def _reduce_states(self, fw_st, bw_st):
    """Add to the graph a linear layer to reduce the encoder's final FW and BW state
    into a single initial state for the decoder. This is needed because the encoder
    is bidirectional but the decoder is not.

    Args:
        fw_st: LSTMStateTuple with hidden_dim units.
        bw_st: LSTMStateTuple with hidden_dim units.
    Returns:
        state: LSTMStateTuple with hidden_dim units.
    """
    hidden_dim = self._hps.hidden_dim
    with tf.variable_scope('reduce_final_st'):
        # Define weights and biases to reduce the cell and reduce the state
        w_reduce_c = tf.get_variable('w_reduce_c', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        w_reduce_h = tf.get_variable('w_reduce_h', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        bias_reduce_c = tf.get_variable('bias_reduce_c', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        bias_reduce_h = tf.get_variable('bias_reduce_h', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)

        # Apply linear layer
        old_c = tf.concat(axis=1, values=[fw_st.c, bw_st.c])  # Concatenation of fw and bw cell
        old_h = tf.concat(axis=1, values=[fw_st.h, bw_st.h])  # Concatenation of fw and bw state
        new_c = tf.nn.relu(tf.matmul(old_c, w_reduce_c) + bias_reduce_c)  # Get new cell from old cell
        new_h = tf.nn.relu(tf.matmul(old_h, w_reduce_h) + bias_reduce_h)  # Get new state from old state
        return tf.contrib.rnn.LSTMStateTuple(new_c, new_h)  # Return new cell and state
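To connect this to the question, a sketch of how it would be used after the bidirectional encoder (treating _reduce_states as a method on your model class, and using the question's own variable names):
encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(
    encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)
# Collapse the forward and backward final states into a single initial state for the decoder.
decoder_initial_state = self._reduce_states(encoder_state_fw, encoder_state_bw)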

TensorFlow 1.2 How to Setup Time Series Prediction at Inference Time Using Seq2Seq

I am trying to study the tf.contrib.seq2seq section of the TensorFlow library using a toy model. Currently, my graph is as follows:
tf.reset_default_graph()
# Placeholders
enc_inp = tf.placeholder(tf.float32, [None, n_steps, n_input])
expect = tf.placeholder(tf.float32, [None, n_steps, n_output])
expect_length = tf.placeholder(tf.int32, [None])
keep_prob = tf.placeholder(tf.float32, [])
# Encoder
cells = [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(n_hidden), output_keep_prob=keep_prob) for i in range(layers_stacked_count)]
cell = tf.contrib.rnn.MultiRNNCell(cells)
encoded_outputs, encoded_states = tf.nn.dynamic_rnn(cell, enc_inp, dtype=tf.float32)
# Decoder
de_cells = [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(n_hidden), output_keep_prob=keep_prob) for i in range(layers_stacked_count)]
de_cell = tf.contrib.rnn.MultiRNNCell(de_cells)
training_helper = tf.contrib.seq2seq.TrainingHelper(expect, expect_length)
decoder = tf.contrib.seq2seq.BasicDecoder(cell=de_cell, helper=training_helper, initial_state=encoded_states)
final_outputs, final_state, final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder)
decoder_logits = final_outputs.rnn_output
h = tf.contrib.layers.fully_connected(decoder_logits, n_output)
diff = tf.squared_difference(h, expect)
batch_loss = tf.reduce_sum(diff, axis=1)
loss = tf.reduce_mean(batch_loss)
optimiser = tf.train.AdamOptimizer(1e-3)
training_op = optimiser.minimize(loss)
The graph trains very well and executes fine. However, I am not sure what to do at inference time, since this graph always requires the expect variable (the value which I am trying to predict).
As I understand it, TrainingHelper uses the ground truth as the decoder input, so what I need is a different helper function at inference time.
Most implementations of the seq2seq model I've seen appear to be outdated (tf.contrib.legacy_seq2seq). Some of the most up-to-date models use GreedyEmbeddingHelper, which I'm not sure is appropriate for continuous time-series predictions.
Another possible solution I've found is the CustomHelper function. However, there is little material out there for me to learn from, and I've just kept banging my head against the wall.
If I am trying to implement a seq2seq model for time series prediction, what should I do at inference time?
Any help or advice would be greatly appreciated. Thanks in advance!
You are right that you need another helper function for inference, and you need to share weights between training and inference.
You can do this with tf.variable_scope()
with tf.variable_scope("decode"):
training_helper = ...
with tf.variable_scope("decode", reuse = True):
inference_helper = ...
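A sketch (untested) of how this could look with the graph above, using tf.contrib.seq2seq.InferenceHelper so the decoder feeds its own predictions back in at inference time. Here go_frame is an illustrative name for a fixed start input, e.g. a zero frame of shape [batch_size, n_output], and the projection layer is shared so the decoder outputs have n_output features in both phases:
projection = tf.layers.Dense(n_output)
with tf.variable_scope("decode"):
    training_helper = tf.contrib.seq2seq.TrainingHelper(expect, expect_length)
    training_decoder = tf.contrib.seq2seq.BasicDecoder(
        de_cell, training_helper, encoded_states, output_layer=projection)
    training_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(training_decoder)
with tf.variable_scope("decode", reuse=True):
    inference_helper = tf.contrib.seq2seq.InferenceHelper(
        sample_fn=lambda outputs: outputs,   # the prediction itself becomes the next input
        sample_shape=[n_output],
        sample_dtype=tf.float32,
        start_inputs=go_frame,
        end_fn=lambda sample_ids: False)     # stop via maximum_iterations instead
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(
        de_cell, inference_helper, encoded_states, output_layer=projection)
    inference_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
        inference_decoder, maximum_iterations=n_steps)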
For a more complete example, see one of these two examples:
https://github.com/pplantinga/tensorflow-examples/blob/master/TensorFlow%201.2%20seq2seq%20example.ipynb
https://github.com/udacity/deep-learning/blob/master/seq2seq/sequence_to_sequence_implementation.ipynb