Bi-directional LSTM for variable-length sequence in Tensorflow - tensorflow

I want to train a bi-directional LSTM in tensorflow to perform a sequence classification problem (sentiment classification).
Because sequences are of variable lengths, batches are normally padded with vectors of zero. Normally, I use the sequence_length parameter in the uni-directional RNN to avoid training on the padding vectors.
How can this be managed with bi-directional LSTM. Does the "sequence_length" parameter work automatically starts from an advanced position in the sequence for the backward direction?
Thank you

bidirectional_dynamic_rnn also has a sequence_length parameter that takes care of sequences of variable lengths.
https://www.tensorflow.org/api_docs/python/tf/nn/bidirectional_dynamic_rnn (mirror):
sequence_length: An int32/int64 vector, size [batch_size], containing the actual lengths for each of the sequences.
You can see an example here: https://github.com/Franck-Dernoncourt/NeuroNER/blob/master/src/entity_lstm.py

In forward pass, rnn cell will stop at sequence_length which is the no-padding length of the input and is a parameter in tf.nn.bidirectional_dynamic_rnn. In backward pass, it firstly use function tf.reverse_sequence to reverse the first sequence_length elements and then traverse like that in the forward pass.
https://tensorflow.google.cn/api_docs/python/tf/reverse_sequence
This op first slices input along the dimension batch_axis, and for each slice i, reverses the first seq_lengths[i] elements along the dimension seq_axis.

Related

Keras variable input

Im working through a Keras example at https://www.tensorflow.org/tutorials/text/text_generation
The model is built here:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
model = tf.keras.Sequential([
tf.keras.layers.Embedding(vocab_size, embedding_dim,
batch_input_shape=[batch_size, None]),
tf.keras.layers.GRU(rnn_units,
return_sequences=True,
stateful=True,
recurrent_initializer='glorot_uniform'),
tf.keras.layers.Dense(vocab_size)
])
return model
During training, they always pass in a length 100 array of ints.
But during prediction, they are able to pass in any length of input and the output is the same length as the input. I was always under the impression that the lengths of the time steps had to be the same. Is that not the case and the # of time steps of the RNN somehow can change?
RNNs are sequence models, ie. they take in a sequence of input and give out a sequence of outputs. The sequence length is also called the time steps is number of time the RNN cell is unwrapped and for each unwrapping an input is passed and RNN cell using its gates gives out an output (per each unwrapping). So in theory you can have as long sequence as you want. Now lets assume you have different inputs of different size, since you cannot have variable size inputs in a single batches you have to collect the inputs of same size an make a batch if you want to train using batches. You can as well use batch size of 1 and not worry about all this, but training become painfully slow.
In ptractical situations, while training we divide input into same sizes so that training become fast. There are situations like language translation models where this is not feasible.
So in theory RNNs does not have any limitation on the sequence length, however large sequence will start to loose the context at the begging as the sequence length increases.
While predictions you can use any sequence length you want to.
In you case your output size is same as input size because of return_sequences=True. You can as well have single output by using return_sequences=False where in only the output of last unwrapping is returned by keras.
Length of training sequences should not be equal to predicted length.
RNN deals with two vectors: new word and hidden state (accumulated from the previous words). It doesn't keep length of sequence.
But to get good prediction of long sequences - you have to train RNN with long sequences - because RNN should learn a long context.

how to deal with padded 0 during feedforward process

Assume I have a lists of inputs of different sizes, for example, some are of the shape[10,9,5] some are [7,6,5], I have to pad 0s to feed them into tensor flow with the same size, say [10,9,5], I need to do matrix multiplication and add the biases during the forward process which will introduce numbers in the padded 0 positions. So I have to create a mask matrix by myself to mask them? Or is there an easier way from tensorflow? Thanks!
BTW, I'm not feeding sequences nor using rnn. so I cannot use dynamic rnn
I think you may use attention mechanism to convert the variable-length inputs to some fixed length tensor before you feed them into a feed forward network.

What is a dynamic RNN in TensorFlow?

I am confused about what dynamic RNN (i.e. dynamic_rnn) is. It returns an output and a state in TensorFlow. What are these state and output? What is dynamic in a dynamic RNN, in TensorFlow?
Dynamic RNN's allow for variable sequence lengths. You might have an input shape (batch_size, max_sequence_length), but this will allow you to run the RNN for the correct number of time steps on those sequences that are shorter than max_sequence_length.
In contrast, there are static RNNs, which expect to run the entire fixed RNN length. There are cases where you might prefer to do this, such as if you are padding your inputs to max_sequence_length anyway.
In short, dynamic_rnn is usually what you want for variable length sequential data. It has a sequence_length parameter, and it is your friend.
While AlexDelPiero's answer was what I was googling for, the original question was different. You can take a look at this detailed description about LSTMs and intuition behind them. LSTM is the most common example of an RNN.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The short answer is: the state is an internal detail that is passed from one timestep to another. The output is a tensor of outputs on each timestep. You usually need to pass all outputs to the next RNN layer or the last output for the last RNN layer. To get the last output you can use output[:,-1,:]

Dynamic LSTM model in Tensorflow

I am looking to design a LSTM model using Tensorflow, wherein the sentences are of different length. I came across a tutorial on PTB dataset (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/ptb_word_lm.py). How does this model capture the instances of varying length? The example does not discuss anything about padding or other technique to handle the variable size sequences.
If I use padding, what should be the unrolling dimension?
You can do this in two way.
TF has a way to specify the input size. Look for a parameter called "sequence_length", I have used this in tf.nn.bidirectional_rnn. So the TF will unroll your cell only up to sequence_length but not to the step size.
Pad your input with predefined dummy input and predefined dummy output (for the dummy output). The lstm cell will learn to predict dummy output for the dummy input. When using it (say for matrix calculation) chop of the dummy parts.
The PTB model is truncated in time -- it always back-propagates a fixed number of steps (num_steps in the configs). So there is no padding -- it just reads the data and tries to predict the next word, and always reads num_steps words at a time.

Tensorflow unrolled LSTM longer than input sequence

I want to create an LSTM in tensorflow to predict time-series data. My training data is a set of input/output sequences of different lengths. Can I include multiple sequences of different lengths in the same training batch? Or do I need to pad them to equal lengths? If so, how?
Also: What will tensorflow do if the unrolled RNN is longer than the input sequence? The rnn() method contains an optional sequence_length argument which appears designed to handle this eventuality, but I'm not clear what it does.
Do you want to build the model from scratch? Otherwise you might want to look into the translate.py-model. Here your issue is taken care of by:
- padding the input (and output) sequences with a PAD-symbol (basically a neutral "no info"-symbol)
- buckets: For different groups of lengths you can create different buckets (makes sense only if your sequence-lengths are very different shortest to longest
You DONT have to batch inputs/output sequence of same length into a batch. TF has a way to specify the input size. The parameter "sequence_length", controls the number of time steps a cell is unrolled. So the TF will unroll your cell only up to sequence_length but not to the step size.
So while feeding the inputs and outputs also feed a sequence_length array which contain the length of each input
tf.nn.bidirectional_rnn(fwd_stacked_lstm_cells, bwd_stacked_lstm_cells,
reshaped_inputs,
sequence_length=sequence_length)
.....
feed_dict={
model.inputs: x,
model.targets: y,
model.sequence_length: lengths})
where
len(lengths) == batch_size and
for all i, lengths[i] == length of input x[i] (same as length of outpu y[i])