Tensorflow dynamic_rnn parameters meaning - tensorflow

I'm struggling to understand the cryptic RNN docs. Any help with the following will be greatly appreciated.
tf.nn.dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
I'm struggling to understand how these parameters relate to the mathematical LSTM equations and the RNN definition. Where is the cell unroll size? Is it defined by the 'max_time' dimension of the inputs? Is the batch_size only a convenience for splitting long data, or is it related to minibatch SGD? Is the output state passed across batches?

tf.nn.dynamic_rnn takes in a batch (with the minibatch meaning) of unrelated sequences.
cell is the actual cell that you want to use (LSTM, GRU,...)
inputs has a shape of batch_size x max_time x input_size in which max_time is the number of steps in the longest sequence (but all sequences could be of the same length)
sequence_length is a vector of size batch_size in which each element gives the length of the corresponding sequence in the batch (leave it as the default if all your sequences are the same length). This parameter is the one that defines how many steps the cell is unrolled for each sequence.
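To make the pairing of inputs and sequence_length concrete, here is a minimal sketch in plain Python (no TensorFlow) that pads variable-length sequences to max_time and builds the matching length vector. The helper name pad_batch and the zero padding value are assumptions chosen for illustration; they are not part of the TensorFlow API.

```python
def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences to a common max_time.

    Each sequence is a list of time steps; each time step is a list of
    input_size features. Returns (inputs, sequence_length).
    """
    max_time = max(len(seq) for seq in sequences)
    input_size = len(sequences[0][0])
    inputs = []
    for seq in sequences:
        # Append all-pad time steps until the sequence reaches max_time.
        padding = [[pad_value] * input_size for _ in range(max_time - len(seq))]
        inputs.append(seq + padding)
    sequence_length = [len(seq) for seq in sequences]
    return inputs, sequence_length


batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # true length 3
    [[7.0, 8.0]],                          # true length 1
]
inputs, sequence_length = pad_batch(batch)
# inputs is now batch_size x max_time x input_size = 2 x 3 x 2,
# and sequence_length records the true length of each row.
```

The padded nested list plays the role of the inputs argument and the length list the role of sequence_length.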
Hidden state handling
The usual way of handling hidden state is to define an initial state tensor before the dynamic_rnn, like this for instance :
hidden_state_in = cell.zero_state(batch_size, tf.float32)
output, hidden_state_out = tf.nn.dynamic_rnn(cell,
                                             inputs,
                                             initial_state=hidden_state_in,
                                             ...)
In the above snippet, both hidden_state_in and hidden_state_out have the same shape [batch_size, ...] (the actual shape depends on the type of cell you use but the important thing is that the first dimension is the batch size).
This way, dynamic_rnn has an initial hidden state for each sequence. It will pass on the hidden state from time step to time step for each sequence in the inputs parameter on its own, and hidden_state_out will contain the final output state for each sequence in the batch. No hidden state is passed between sequences of the same batch, but only between time steps of the same sequence.
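The following is a toy sketch of that behaviour in plain Python, not the TensorFlow implementation. It assumes a scalar tanh cell with made-up weights (rnn_step and run_rnn are hypothetical names) and shows that there is one state per sequence, carried across time steps but never across sequences.

```python
import math


def rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a toy scalar RNN cell (weights are arbitrary)."""
    return math.tanh(w_x * x + w_h * h)


def run_rnn(batch, initial_state):
    """Run the toy cell over a batch of scalar sequences.

    Returns (outputs, final_state) with one entry per sequence.
    """
    outputs, final_state = [], []
    for seq, h in zip(batch, initial_state):
        seq_outputs = []
        for x in seq:
            # The state flows from one time step to the next...
            h = rnn_step(x, h)
            seq_outputs.append(h)
        outputs.append(seq_outputs)
        # ...but each sequence starts from its own initial state.
        final_state.append(h)
    return outputs, final_state


batch = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
zero_state = [0.0] * len(batch)  # first dimension is the batch size
outputs, final_state = run_rnn(batch, zero_state)

# Replacing sequence 1 does not affect the results for sequence 0.
other_batch = [batch[0], [9.0, 9.0, 9.0]]
other_outputs, _ = run_rnn(other_batch, zero_state)
```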
When do I need to feed back the hidden state manually?
Usually, when you're training, every batch is unrelated so you don't have to feed back the hidden state when doing a session.run(output).
However, if you're testing, and you need the output at each time step, (i.e. you have to do a session.run() at every time step) you'll want to evaluate and feed back the output hidden state using something like this :
output_value, hidden_state = sess.run([output, hidden_state_out],
                                      feed_dict={hidden_state_in: hidden_state})
otherwise tensorflow will just use the default cell.zero_state(batch_size, tf.float32) at each time step which equates to reinitialising the hidden state at each time step.
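A small plain-Python sketch of why the feedback matters. The function run is a hypothetical stand-in for one session.run() call on a toy scalar cell; it is an assumption for illustration, not TensorFlow code.

```python
import math


def rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a toy scalar RNN cell (weights are arbitrary)."""
    return math.tanh(w_x * x + w_h * h)


def run(inputs, initial_state=0.0):
    """Stand-in for one session.run(): returns (outputs, final_state)."""
    h = initial_state
    outputs = []
    for x in inputs:
        h = rnn_step(x, h)
        outputs.append(h)
    return outputs, h


sequence = [1.0, 2.0, 3.0, 4.0]

# Reference: the whole sequence processed in a single call.
full_outputs, _ = run(sequence)

# One call per time step, feeding the returned state back in.
state = 0.0
fed_back = []
for x in sequence:
    out, state = run([x], initial_state=state)
    fed_back.extend(out)

# One call per time step WITHOUT feeding the state back:
# every step starts again from the zero state.
reset_each_step = []
for x in sequence:
    out, _ = run([x])
    reset_each_step.extend(out)
```

Here fed_back matches full_outputs, while reset_each_step agrees only on the first step, because every later step has lost its history.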

Related

Keras variable input

I'm working through a Keras example at https://www.tensorflow.org/tutorials/text/text_generation
The model is built here:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
During training, they always pass in a length 100 array of ints.
But during prediction, they are able to pass in an input of any length, and the output is the same length as the input. I was always under the impression that the number of time steps had to be fixed. Is that not the case, and can the number of time steps of the RNN somehow change?
RNNs are sequence models, i.e. they take in a sequence of inputs and give out a sequence of outputs. The sequence length, also called the number of time steps, is the number of times the RNN cell is unrolled; at each unrolling an input is passed in and the RNN cell, using its gates, gives out an output. So in theory you can have as long a sequence as you want. Now assume you have inputs of different sizes. Since you cannot have variable-size inputs in a single batch, you have to collect inputs of the same size into a batch if you want to train using batches. You can also use a batch size of 1 and not worry about any of this, but training becomes painfully slow.
In practical situations, we divide the input into equal-sized pieces while training so that training is fast. There are situations, like language translation models, where this is not feasible.
So in theory RNNs do not have any limitation on the sequence length; however, a long sequence will start to lose the context from its beginning as the sequence length increases.
When predicting, you can use any sequence length you want.
In your case the output length is the same as the input length because of return_sequences=True. You can also get a single output by using return_sequences=False, in which case Keras returns only the output of the last unrolling.
The length of the training sequences does not have to equal the length used for prediction.
An RNN deals with two vectors: the new word and the hidden state (accumulated from the previous words). It doesn't keep the length of the sequence.
But to get good predictions on long sequences, you have to train the RNN with long sequences, because the RNN has to learn a long context.
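The points above can be sketched in plain Python. This is a toy scalar cell, not Keras; the name toy_rnn_layer and its weights are assumptions for illustration. The same cell handles any number of time steps, and the return_sequences flag only decides whether all per-step outputs or just the last one are returned.

```python
import math


def toy_rnn_layer(sequence, return_sequences=True, w_x=0.5, w_h=0.8):
    """Apply one toy recurrent cell to a scalar sequence of any length."""
    h = 0.0
    outputs = []
    for x in sequence:
        # The cell only ever sees the current input and the previous
        # state; the sequence length is not stored anywhere.
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs if return_sequences else outputs[-1]


train_like = [0.1] * 100  # e.g. the length used during training
predict_like = [0.1] * 7  # a different length at prediction time

out_100 = toy_rnn_layer(train_like)
out_7 = toy_rnn_layer(predict_like)
last_only = toy_rnn_layer(predict_like, return_sequences=False)
```

With return_sequences=True the output has one entry per input step; with return_sequences=False only the final entry is returned.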

Understanding the functioning of a recurrent neural network with LSTM cells

Context:
I have a recurrent neural network with LSTM cells
The input to the network is a batch of size (batch_size, number_of_timesteps, one_hot_encoded_class) in my case (128, 300, 38)
The different rows of the batch (1-128) are not necessarily related to each other
The target for one time step is given by the value of the next time step.
My questions:
When I train the network using an input batch of (128,300,38) and a target batch of the same size,
does the network always consider only the last time-step t to predict the value of the next timestep t+1?
or does it consider all time steps from the beginning of the sequence up to time step t?
or does the LSTM cell internally remember all previous states?
I am confused about the functioning because the network is trained on multiple time steps simultaneously, so I am not sure how the LSTM cell can still have knowledge of the previous states.
I hope somebody can help. Thanks in advance!
Code for discussion:
cells = []
for i in range(self.n_layers):
    cell = tf.contrib.rnn.LSTMCell(self.n_hidden)
    cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
init_state = cell.zero_state(self.batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=init_state)
self.logits = tf.contrib.layers.linear(outputs, self.num_classes)
softmax_ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=self.logits)
self.loss = tf.reduce_mean(softmax_ce)
self.train_step = tf.train.AdamOptimizer(self.lr).minimize(self.loss)
[Figure: a simple RNN unrolled to the neuron level over 3 time steps.]
As the unrolled network shows, the output at time step t depends on all time steps from the beginning. The network is trained using back-propagation through time, where the weights are updated by the contributions of all error gradients across time. The weights are shared across time, so there is no such thing as a simultaneous update on all time steps.
The knowledge of the previous states is transferred through the state variable s_t, as it is a function of the previous inputs. So at any time step, the prediction is made based on the current input as well as (a function of) the previous inputs captured by the state variable.
NOTE: A basic RNN was used instead of an LSTM for simplicity.
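For reference, the recurrence behind the state variable s_t can be written out for a basic RNN. Only s_t comes from the answer above; the weight names U, W, V and the activations f, g are assumed here, following a common textbook convention.

```latex
\begin{aligned}
s_t &= f(U x_t + W s_{t-1}), \qquad o_t = g(V s_t), \\
s_3 &= f\bigl(U x_3 + W f\bigl(U x_2 + W f(U x_1 + W s_0)\bigr)\bigr).
\end{aligned}
```

The second line is the 3-step unrolling: o_3 depends on x_1, x_2 and x_3 through s_3, and the same U, W and V appear at every step.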
Here's what would be helpful to keep in mind for your case specifically:
Given the input shape of [128, 300, 38]
One call to dynamic_rnn will propagate through all 300 steps, and if you are using something like LSTM, the state will also be carried through those 300 steps
However, each subsequent call to dynamic_rnn will not automatically remember the state from the previous call. By the second call, the weights etc. will have been updated thanks to the first call, but you still need to pass the state that resulted from the first call into the second call. That's why dynamic_rnn has an initial_state parameter and why one of its outputs is final_state (i.e. the state after processing all 300 steps in one call). So you are meant to take the final state from call N and pass it back as the initial state for call N+1. This all relates specifically to LSTM, since that is what you asked about.
You are right to note that elements in one batch don't necessarily relate to each other within the same batch. This is something you need to consider carefully. Because with successive calls to dynamic_rnn, batch elements in your input sequences have to relate to their respective counterparts in the previous/following sequence, but not to each other. I.e. element 3 in the first call may have nothing to do with the other 127 elements within the same batch, but element 3 in the NEXT call has to be the temporal/logical continuation of element 3 in the PREVIOUS call, and so forth. This way, the state that you keep passing forward makes sense continuously
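One way to arrange the data so that this holds is sketched below in plain Python. The helper make_stateful_batches is a hypothetical name, and the layout (cut one long series into batch_size contiguous rows, then slice the rows time-wise) is just one common convention, assumed here for illustration.

```python
def make_stateful_batches(series, batch_size, num_steps):
    """Split one long series into successive batches.

    Row i of batch n+1 is the direct continuation of row i of batch n,
    so the final state of call n is a valid initial state for call n+1.
    """
    row_len = len(series) // batch_size
    # Cut the series into batch_size contiguous rows.
    rows = [series[i * row_len:(i + 1) * row_len] for i in range(batch_size)]
    num_batches = row_len // num_steps
    # Slice every row time-wise into windows of num_steps.
    return [
        [row[n * num_steps:(n + 1) * num_steps] for row in rows]
        for n in range(num_batches)
    ]


series = list(range(24))
batches = make_stateful_batches(series, batch_size=2, num_steps=4)
# Row 0 runs 0..3, then 4..7, then 8..11 across successive batches;
# row 1 runs 12..15, then 16..19, then 20..23.
```

Rows within one batch are unrelated to each other, but each row continues where the same row of the previous batch stopped.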

Analysis of the output from tf.nn.dynamic_rnn tensorflow function

I am not able to understand the output of the tf.nn.dynamic_rnn TensorFlow function. The documentation only gives the size of the output; it doesn't say what each row/column means. From the documentation:
outputs: The RNN output Tensor.
If time_major == False (default), this will be a Tensor shaped:
[batch_size, max_time, cell.output_size].
If time_major == True, this will be a Tensor shaped:
[max_time, batch_size, cell.output_size].
Note, if cell.output_size is a (possibly nested) tuple of integers
or TensorShape objects, then outputs will be a tuple having the
same structure as cell.output_size, containing Tensors having shapes
corresponding to the shape data in cell.output_size.
state: The final state. If cell.state_size is an int, this will
be shaped [batch_size, cell.state_size]. If it is a
TensorShape, this will be shaped [batch_size] + cell.state_size.
If it is a (possibly nested) tuple of ints or TensorShape, this will
be a tuple having the corresponding shapes.
The outputs tensor is a 3-D matrix but what does each row/column represent?
tf.nn.dynamic_rnn provides two outputs, outputs and state.
outputs contains the output of the RNN cell at every time instant. Assuming the default time_major == False, let's say you have an input composed of 10 examples with 7 time steps each and a feature vector of size 5 for every time step. Then your input would be 10 x 7 x 5 (batch_size x max_time x features). Now you give this as an input to an RNN cell with output size 15. Conceptually, each time step of each example is input to the RNN, and you get a 15-long vector for each of those. So that is what outputs contains: a tensor, in this case of size 10 x 7 x 15 (batch_size x max_time x cell.output_size), with the output of the RNN cell at each time step. If you are only interested in the last output of the cell, you can slice the time dimension to pick just the last element (e.g. outputs[:, -1, :]).
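The shapes in this example can be reproduced with a toy sketch in plain Python using nested lists. The cell function below is an arbitrary stand-in with made-up weights, not a real TensorFlow cell; only the sizes 10, 7, 5 and 15 come from the text above.

```python
import math

BATCH, MAX_TIME, FEATURES, UNITS = 10, 7, 5, 15


def cell(x, h):
    """Toy cell: maps a FEATURES-long input and a UNITS-long state
    to a new UNITS-long vector (used as both output and state)."""
    mean_x = sum(x) / len(x)
    return [math.tanh(0.1 * (j + 1) * mean_x + 0.5 * h[j]) for j in range(UNITS)]


# Input of shape batch_size x max_time x features = 10 x 7 x 5.
inputs = [
    [[0.01 * (b + t + f) for f in range(FEATURES)] for t in range(MAX_TIME)]
    for b in range(BATCH)
]

outputs, final_state = [], []
for seq in inputs:
    h = [0.0] * UNITS
    seq_outputs = []
    for x in seq:
        h = cell(x, h)
        seq_outputs.append(h)  # one UNITS-long vector per time step
    outputs.append(seq_outputs)
    final_state.append(h)

# Equivalent of outputs[:, -1, :]: the last time step of every example.
last_outputs = [seq_outputs[-1] for seq_outputs in outputs]
```

For this toy cell the last output and the final state coincide; for an LSTM the state additionally carries the cell state, so the two are related but not identical.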
state contains the state of the RNN after processing all the inputs. Note that, unlike outputs, this doesn't contain information about every time step, but only about the last one (that is, the state after the last one). Depending on your case, the state may or may not be useful. For example, if you have very long sequences, you may not want or be able to process them in a single batch, and you may need to split them into several subsequences. If you ignore the state, then whenever you give a new subsequence it will be as if you were beginning a new sequence; if you remember the state, however (e.g. by outputting it or storing it in a variable), you can feed it back later (through the initial_state parameter of tf.nn.dynamic_rnn) in order to correctly keep track of the state of the RNN, and only reset it to the initial state (generally all zeros) after you have completed the whole sequences. The shape of state can vary depending on the RNN cell that you are using, but, in general, you have some state for each of the examples (one or more tensors of size batch_size x state_size, where state_size depends on the cell type and size).

Tensorflow - LSTM state reuse within batch

I am working on a Tensorflow NN which uses an LSTM to track a parameter (time series data regression problem). A batch of training data contains a batch_size of consecutive observations. I would like to use the LSTM state as input to the next sample. So, if I have a batch of data observations, I would like to feed the state of the first observation as input to the second observation and so on. Below I define the lstm state as a tensor of size = batch_size. I would like to reuse the state within a batch:
cell = tf.nn.rnn_cell.BasicLSTMCell(100)
state = tf.Variable(cell.zero_state(batch_size, tf.float32), trainable=False)
output, curr_state = tf.nn.rnn(cell, data, initial_state=state)
In the API there is a tf.nn.state_saving_rnn but the documentation is kinda vague. My question: How to reuse curr_state within a training batch.
You are basically there, just need to update state with curr_state:
state_update = tf.assign(state, curr_state)
Then, make sure you either call run on state_update itself or an operation that has state_update as a dependency, or the assignment will not actually happen. For example:
with tf.control_dependencies([state_update]):
model_output = ...
As suggested in the comments, the typical case for RNNs is that you have a batch where the first dimension (0) is the number of sequences and the second dimension (1) is the maximum length of each sequence (if you pass time_major=True when you build the RNN these two are swapped). Ideally, in order to get good performance, you stack multiple sequences into one batch, and then split that batch time-wise. But that's all a different topic really.

Tensorflow unrolled LSTM longer than input sequence

I want to create an LSTM in tensorflow to predict time-series data. My training data is a set of input/output sequences of different lengths. Can I include multiple sequences of different lengths in the same training batch? Or do I need to pad them to equal lengths? If so, how?
Also: What will tensorflow do if the unrolled RNN is longer than the input sequence? The rnn() method contains an optional sequence_length argument which appears designed to handle this eventuality, but I'm not clear what it does.
Do you want to build the model from scratch? Otherwise you might want to look into the translate.py-model. Here your issue is taken care of by:
- padding the input (and output) sequences with a PAD-symbol (basically a neutral "no info"-symbol)
- buckets: For different groups of lengths you can create different buckets (makes sense only if your sequence lengths differ widely from shortest to longest)
You don't have to group input/output sequences of the same length into a batch. TF has a way to specify the input size: the sequence_length parameter controls the number of time steps for which a cell is unrolled. So TF will unroll your cell only up to sequence_length, not up to the full step size.
So while feeding the inputs and outputs, also feed a sequence_length array that contains the length of each input:
tf.nn.bidirectional_rnn(fwd_stacked_lstm_cells, bwd_stacked_lstm_cells,
                        reshaped_inputs,
                        sequence_length=sequence_length)
.....
feed_dict={
    model.inputs: x,
    model.targets: y,
    model.sequence_length: lengths})
where
len(lengths) == batch_size and
for all i, lengths[i] == length of input x[i] (same as the length of output y[i])
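As a rough illustration of what sequence_length does, here is a plain-Python emulation on a toy scalar cell. It assumes the behaviour documented for tf.nn.dynamic_rnn (outputs past a sequence's length are zeroed and the state is copied through); the function names are hypothetical and this is not the TensorFlow implementation.

```python
import math


def rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a toy scalar RNN cell (weights are arbitrary)."""
    return math.tanh(w_x * x + w_h * h)


def run_with_lengths(inputs, sequence_length):
    """Run the toy cell over padded sequences, honouring each length."""
    outputs, final_state = [], []
    for seq, length in zip(inputs, sequence_length):
        h = 0.0
        seq_outputs = []
        for t, x in enumerate(seq):
            if t < length:
                # Real time step: update the state and emit it.
                h = rnn_step(x, h)
                seq_outputs.append(h)
            else:
                # Padded step: emit zero and leave the state untouched.
                seq_outputs.append(0.0)
        outputs.append(seq_outputs)
        final_state.append(h)
    return outputs, final_state


x = [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]  # second row is padded
lengths = [3, 1]
outputs, final_state = run_with_lengths(x, lengths)
```

The padded steps of the second row contribute nothing: its outputs after step 0 are zero and its final state is the state after its single real step.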