Confused on how to run tensorflow LSTM - tensorflow

I have seen two different ways of calling an LSTM in TensorFlow and I am confused about the difference between the two methods, and about which situation calls for one or the other.
The first one is to create an LSTM cell and then call it immediately in a loop, like the code below:
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)
And the second one is to call the LSTM cell through rnn.rnn(), like below:
# Define a lstm cell with tensorflow
lstm = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Split data because rnn cell needs a list of inputs for the RNN inner loop
inputToLstmSplited = tf.split(0, n_steps, inputToLstm) # n_steps * (batch_size, n_hidden)
inputToLstmSplitedFiltered = tf.matmul(inputToLstmSplited, weights['hidden']) + biases['hidden']
# Get lstm cell out
outputs, states = rnn.rnn(lstm, inputToLstmSplited, initial_state=istate)

The second effectively does the same as the loop in the first: it returns a list of all the outputs collected in the loop, plus the final state. It does it a bit more efficiently, though, and with a number of safety checks. It also supports useful features like variable sequence lengths. The first option is presented in the TensorFlow tutorials to give you an idea of how an RNN is unrolled, but the second option is preferred for "production" code.
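For reference, here is a minimal sketch of the second pattern with variable-length sequences, assuming the old TF 1.x API (tf.nn.dynamic_rnn supersedes rnn.rnn in later versions); input_dim is a stand-in for whatever feature size your data uses:
import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
# inputs: (batch_size, num_steps, input_dim); seq_lens holds the true length of each example
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
seq_lens = tf.placeholder(tf.int32, [batch_size])
initial_state = lstm.zero_state(batch_size, tf.float32)
# outputs: (batch_size, num_steps, lstm_size); final_state: state after the last valid step of each sequence
outputs, final_state = tf.nn.dynamic_rnn(lstm, inputs,
                                         sequence_length=seq_lens,
                                         initial_state=initial_state)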

Related

Getting keras LSTM layer to accept two inputs?

I'm working with padded sequences of maximum length 50. I have two types of sequence data:
1) A sequence, seq1, of integers (1-100) that correspond to event types (e.g. [3,6,3,1,45,45,...,3])
2) A sequence, seq2, of integers representing time, in minutes, from the last event in seq1. So the last element is zero, by definition. So for example [100, 96, 96, 45, 44, 12,... 0]. seq1 and seq2 are the same length, 50.
I'm trying to run the LSTM primarily on the event/seq1 data, but have the time/seq2 data strongly influence the forget gate within the LSTM. The reason for this is that I want the LSTM to heavily penalize older events and be more likely to forget them. I was thinking about multiplying the forget weight by the inverse of the current value of the time/seq2 sequence, or maybe 1/(seq2_element + 1) to handle cases where it's zero minutes.
I see in the keras code (LSTMCell class) where the change would have to be:
f = self.recurrent_activation(x_f + K.dot(h_tm1_f, self.recurrent_kernel_f))
So I need to modify keras' LSTM code to accept multiple inputs. As an initial test, within the LSTMCell class, I changed the call function to look like this:
def call(self, inputs, states, training=None):
    time_input = inputs[1]
    inputs = inputs[0]
So that it can handle two inputs given as a list.
When I try running the model with the Functional API:
# Input 1: event type sequences
# Take the event integer sequences, run them through an embedding layer to get float vectors, then run through LSTM
main_input = Input(shape =(max_seq_length,), dtype = 'int32', name = 'main_input')
x = Embedding(output_dim = embedding_length, input_dim = num_unique_event_symbols, input_length = max_seq_length, mask_zero=True)(main_input)
## Input 2: time vectors
auxiliary_input = Input(shape=(max_seq_length,1), dtype='float32', name='aux_input')
m = Masking(mask_value = 99999999.0)(auxiliary_input)
lstm_out = LSTM(32)(x, time_vector = m)
# Auxiliary loss here from first input
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
# An arbitrary number of dense, hidden layers here
x = Dense(64, activation='relu')(lstm_out)
# The main output node
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
## Compile and fit the model
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'], loss_weights=[1., 0.2])
print(model.summary())
np.random.seed(21)
model.fit([train_X1, train_X2], [train_Y, train_Y], epochs=1, batch_size=200)
However, I get the following error:
An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 50, 1), ndim=3)]; however `cell.state_size` is (32, 32)
Any advice?
You can't pass a list of inputs to the default recurrent layers in Keras. The input_spec is fixed and the recurrent code is implemented based on a single tensor input, as also pointed out in the documentation, i.e. it doesn't magically iterate over 2 inputs with the same timesteps and pass that to the cell. This is partly because of how the iterations are optimised and the assumptions made if the network is unrolled, etc.
If you want 2 inputs, you can pass constants (doc) to the cell, which will pass the tensor as is. This is mainly there to implement attention models in the future. So 1 input will iterate over timesteps while the other will not. If you really want 2 inputs to be iterated like a zip() in Python, you will have to implement a custom layer.
I would like to throw in a couple of different ideas here. They don't require you to modify the Keras code.
After the embedding layer of the event types, stack the embeddings with the elapsed time. The Keras function is keras.layers.Concatenate(axis=-1). Imagine this: a single event type is mapped to an n-dimensional vector by the embedding layer. You just add the elapsed time as one more dimension after the embedding so that it becomes an (n+1)-dimensional vector.
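A minimal sketch of that idea, reusing the shapes and names from the question (max_seq_length, embedding_length, num_unique_event_symbols); mask_zero is left out here to keep the concatenation simple, and this is only an illustration, not tested code:
from keras.layers import Input, Embedding, Concatenate, LSTM, Dense

main_input = Input(shape=(max_seq_length,), dtype='int32', name='main_input')
aux_input = Input(shape=(max_seq_length, 1), dtype='float32', name='aux_input')
x = Embedding(output_dim=embedding_length, input_dim=num_unique_event_symbols,
              input_length=max_seq_length)(main_input)        # (batch, 50, n)
x = Concatenate(axis=-1)([x, aux_input])                      # (batch, 50, n + 1)
lstm_out = LSTM(32)(x)
main_output = Dense(1, activation='sigmoid', name='main_output')(lstm_out)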
Another idea, somewhat related to your problem/question and which may help here, is a 1D convolution. The convolution can happen right after the concatenated embeddings. The intuition for applying convolution to the event types and elapsed time is really a 1x1 convolution, such that you linearly combine the two and the combination parameters are trained. Note that in convolution terms, the dimensions of the vectors are called channels. Of course, you can also convolve over more than 1 event per step. Just try it. It may or may not help.
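A hedged sketch of that 1x1 convolution, applied to the concatenated tensor from the previous sketch; the filter count is arbitrary:
from keras.layers import Conv1D, LSTM

# x is the concatenated (batch, 50, n + 1) tensor from the previous sketch.
# kernel_size=1 means each timestep's channels (embedding dims + elapsed time)
# are linearly mixed by trained weights, independently of neighbouring timesteps.
x = Conv1D(filters=embedding_length, kernel_size=1, activation='relu')(x)
lstm_out = LSTM(32)(x)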

tf.zeros vs tf.placeholder as RNN initial state

Tensorflow newbie here! I understand that Variables will be trained over time, while placeholders are used for input data that doesn't change as your model trains (like input images, and class labels for those images).
I'm trying to implement the forward propagation of an RNN using TensorFlow, and I'm wondering what type I should use to save the output of the RNN cell. A numpy RNN implementation uses
hiddenStates = np.zeros((T, self.hidden_dim))  # T is the length of the sequence
and then iteratively saves the output in that np.zeros array.
In case of TF, which one should I use, tf.zeros or tf.placeholder?
What is the best practice in this case? I think it should be fine to use tf.zeros but wanted to double check.
First of all, it is important for you to understand that everything inside TensorFlow is a Tensor. So when you are performing some kind of computation (e.g. an RNN implementation like outputs = rnn(...)) the output of this computation is returned as a Tensor. So you don't need to store it inside any kind of structure. You can retrieve it by running the corresponding node (i.e. output), like session.run(output, feed_dict).
That said, I think you need to take the final state of an RNN and provide it as the initial state of a subsequent computation. Two ways:
A) If you are using RNNCell implementations: during the construction of your model you can construct the zero state like this:
cell = (some RNNCell implementation)
initial_state = cell.zero_state(batch_size, tf.float32)
B) If you are implementing your own stuff: define the state as a zero Tensor:
initial_state = tf.zeros([batch_size, hidden_size])
Then, in both cases you will have something like:
output, final_state = rnn(input, initial_state)
In your execution loop you can initialize your state first and then provide the final_state as initial_state inside your feed_dict:
state = session.run(initial_state)
for step in range(epochs):
    feed_dict = {initial_state: state}
    _, state = session.run((train_op, final_state), feed_dict)
How you actually construct your feed_dict depends on the implementation of the RNN.
For a BasicLSTMCell, for example, the state is an LSTMStateTuple object and you need to provide both c and h:
feed_dict = {initial_state.c: state.c, initial_state.h: state.h}
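Putting the pieces above together, a minimal end-to-end sketch of the feeding pattern (TF 1.x API; next_batch() and the various size constants are hypothetical stand-ins):
import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_size])
initial_state = cell.zero_state(batch_size, tf.float32)        # an LSTMStateTuple(c, h)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    state = session.run(initial_state)                         # start from zeros
    for step in range(num_batches):
        feed_dict = {inputs: next_batch(),                     # hypothetical data helper
                     initial_state.c: state.c,
                     initial_state.h: state.h}
        out, state = session.run((outputs, final_state), feed_dict)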

Tensorflow RNN-LSTM - reset hidden state

I'm building a stateful LSTM used for language recognition.
Being stateful, I can train the network with smaller files, and a new batch will be like the next sentence in a discussion.
However, for the network to be properly trained I need to reset the hidden state of the LSTM between some batches.
I'm using a variable to store the hidden_state of the LSTM for performance:
with tf.variable_scope('Hidden_state'):
    hidden_state = tf.get_variable("hidden_state", [self.num_layers, 2, self.batch_size, self.hidden_size],
                                   tf.float32, initializer=tf.constant_initializer(0.0), trainable=False)
    # Arrange it to a tuple of LSTMStateTuple as needed
    l = tf.unstack(hidden_state, axis=0)
    rnn_tuple_state = tuple([tf.contrib.rnn.LSTMStateTuple(l[idx][0], l[idx][1])
                             for idx in range(self.num_layers)])

# Build the RNN
with tf.name_scope('LSTM'):
    rnn_output, _ = tf.nn.dynamic_rnn(cell, rnn_inputs, sequence_length=input_seq_lengths,
                                      initial_state=rnn_tuple_state, time_major=True)
Now I'm confused about how to reset the hidden state. I've tried two solutions but neither is working:
First solution
Reset the "hidden_state" variable with :
rnn_state_zero_op = hidden_state.assign(tf.zeros_like(hidden_state))
It doesn't work, and I think it's because the unstack and tuple construction are not "re-played" into the graph after running the rnn_state_zero_op operation.
Second solution
Following LSTMStateTuple vs cell.zero_state() for RNN in Tensorflow, I tried to reset the cell state with:
rnn_state_zero_op = cell.zero_state(self.batch_size, tf.float32)
It doesn't seem to work either.
Question
I have another solution in mind, but it's guesswork at best: I'm not currently keeping the state returned by tf.nn.dynamic_rnn. I've thought of keeping it, but I get a tuple and I can't find a way to build an op to reset the tuple.
At this point I have to admit that I don't quite understand the internal workings of TensorFlow, or whether it's even possible to do what I'm trying to do.
Is there a proper way to do it?
Thanks!
Thanks to this answer to another question, I was able to find a way to have complete control over whether (and when) the internal state of the RNN should be reset to zeros.
First, you need to define some variables to store the state of the RNN; this way you will have control over it:
with tf.variable_scope('Hidden_state'):
    state_variables = []
    for state_c, state_h in cell.zero_state(self.batch_size, tf.float32):
        state_variables.append(tf.nn.rnn_cell.LSTMStateTuple(
            tf.Variable(state_c, trainable=False),
            tf.Variable(state_h, trainable=False)))
    # Return as a tuple, so that it can be fed to dynamic_rnn as an initial state
    rnn_tuple_state = tuple(state_variables)
Note that this version defines the variables used by the LSTM directly. This is much better than the version in my question, because you don't have to unstack and build the tuple, which adds some ops to the graph that you cannot run explicitly.
Secondly, build the RNN and retrieve the final state:
# Build the RNN
with tf.name_scope('LSTM'):
    rnn_output, new_states = tf.nn.dynamic_rnn(cell, rnn_inputs,
                                               sequence_length=input_seq_lengths,
                                               initial_state=rnn_tuple_state,
                                               time_major=True)
So now you have the new internal state of the RNN. You can define two ops to manage it.
The first one will update the variables for the next batch, so that in the next batch the "initial_state" of the RNN will be fed with the final state of the previous batch:
# Define an op to keep the hidden state between batches
update_ops = []
for state_variable, new_state in zip(rnn_tuple_state, new_states):
    # Assign the new state to the state variables on this layer
    update_ops.extend([state_variable[0].assign(new_state[0]),
                       state_variable[1].assign(new_state[1])])
# Return a tuple in order to combine all update_ops into a single operation.
# The tuple's actual value should not be used.
rnn_keep_state_op = tf.tuple(update_ops)
You should run this op whenever you want to process a batch and keep the internal state.
Beware: if you run batch 1 with this op, then batch 2 will start with batch 1's final state; but if you don't call it again when running batch 2, then batch 3 will also start with batch 1's final state. My advice is to run this op every time you run the RNN.
The second op will be used to reset the internal state of the RNN to zeros:
# Define an op to reset the hidden state to zeros
update_ops = []
for state_variable in rnn_tuple_state:
    # Assign the new state to the state variables on this layer
    update_ops.extend([state_variable[0].assign(tf.zeros_like(state_variable[0])),
                       state_variable[1].assign(tf.zeros_like(state_variable[1]))])
# Return a tuple in order to combine all update_ops into a single operation.
# The tuple's actual value should not be used.
rnn_state_zero_op = tf.tuple(update_ops)
You can call this op whenever you want to reset the internal state.
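A hypothetical training-loop sketch showing when each op would be run; train_op, batches, rnn_inputs, targets and new_conversation_every are stand-in names, not part of the code above:
for batch_idx, (x_batch, y_batch) in enumerate(batches):
    if batch_idx % new_conversation_every == 0:
        session.run(rnn_state_zero_op)                 # forget everything: start a fresh "discussion"
    # Train on the batch and carry the final state over to the next batch
    session.run([train_op, rnn_keep_state_op],
                feed_dict={rnn_inputs: x_batch, targets: y_batch})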
Simplified version of AMairesse's post for one LSTM layer:
zero_state = tf.zeros(shape=[1, units[-1]])
self.c_state = tf.Variable(zero_state, trainable=False)
self.h_state = tf.Variable(zero_state, trainable=False)
self.init_encoder = tf.nn.rnn_cell.LSTMStateTuple(self.c_state, self.h_state)
self.output_encoder, self.state_encoder = tf.nn.dynamic_rnn(cell_encoder, layer, initial_state=self.init_encoder)
# save or reset states
self.update_ops += [self.c_state.assign(self.state_encoder.c, use_locking=True)]
self.update_ops += [self.h_state.assign(self.state_encoder.h, use_locking=True)]
Or you can use a replacement for init_encoder to reset the states at step == 0 (you need to pass self.step_tf into session.run() as a placeholder):
self.step_tf = tf.placeholder_with_default(tf.constant(-1, dtype=tf.int64), shape=[], name="step")
self.init_encoder = tf.cond(tf.equal(self.step_tf, 0),
                            true_fn=lambda: tf.nn.rnn_cell.LSTMStateTuple(zero_state, zero_state),
                            false_fn=lambda: tf.nn.rnn_cell.LSTMStateTuple(self.c_state, self.h_state))
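Usage sketch for the tf.cond variant, assuming this lives in the same class as the code above; train_op, batches and inputs are hypothetical names. Feed the step counter so the condition picks the zero state only on the first batch:
for step, batch in enumerate(batches):
    # update_ops saves the final state for the next step; step 0 starts from zeros
    sess.run([train_op, self.update_ops],
             feed_dict={inputs: batch, self.step_tf: step})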

Understanding CNN + LSTM RNN in Tensorflow

Hey there, I want to use a CNN + RNN in order to do a regression task on images and I am not sure how to properly handle sequence length and states.
I thought about doing the following: use the CNN to extract features for one frame, put the flattened activation maps into the LSTM and save the state, then reduce the LSTM output to my regression value. For the next frame I would do the same, restoring the state of the LSTM from the previous iteration.
But that feels completely wrong, since I am building an RNN around my LSTM cell, which is not how it's supposed to be, right?
But if I input a sequence of frames into the LSTM (after applying the CNN to all of them) I get multiple outputs and a state. If I reuse that state, I don't see the point of the frame sequence at all. I am totally confused.
Currently I am doing this, but it is not working any better than just a CNN applied to every frame...
with tf.variable_scope('CNN'):
    for time_step in xrange(sequence_length):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        cnn_res = CNN(images[time_step], normalizer_params=normalizer_params, regularizer=regularizer)
        cnn_outputs.append(cnn_res)

cnn_outputs = tf.pack(cnn_outputs)

with tf.variable_scope('RNN'):
    lstm_cell = LSTMBlockCell(128)
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * 3)
    (rnn_outputs, state) = tf.nn.dynamic_rnn(cell, cnn_outputs, initial_state=initial_state,
                                             time_major=True, dtype=tf.float32)
    rnn_outputs = rnn_outputs[sequence_length-1]  # Using only the last output of the sequence; also tried taking every output into account.
    rnn_outputs = layers.flatten(rnn_outputs)
Some fully connected layers then reduce rnn_outputs to my single value.
Actually, what I want to do is something like this (only that I want to get a value for the currently received frame without having any future frames): How do you pass video features from a CNN to an LSTM? But I have a hard time realizing this in TensorFlow.

Tensorflow RNN sequence training

I'm making my first steps learning TF and have some trouble training RNNs.
My toy problem goes like this: a two-layer LSTM + dense-layer network is fed with raw audio data and should test whether a certain frequency is present in the sound.
So the network should map, one to one, float (audio data sequence) to float (pre-chosen frequency volume).
I've got this to work in Keras and have seen a similar TFLearn solution, but I would like to implement this in bare TensorFlow in a relatively efficient way.
What I've done:
lstm = rnn_cell.BasicLSTMCell(LSTM_SIZE, state_is_tuple=True, forget_bias=1.0)
lstm = rnn_cell.DropoutWrapper(lstm)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * 2, state_is_tuple=True)
# `inputs` is the (BATCH_SIZE, SEQUENCE_LEN, 1) input tensor ("in" is a reserved word in Python)
outputs, states = rnn.dynamic_rnn(stacked_lstm, inputs, dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])
last = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
network = tf.matmul(last, W) + b
# cost function, optimizer etc...
During training I fed this with (BATCH_SIZE, SEQUENCE_LEN, 1) batches and it seems like the loss converged correctly, but I can't figure out how to predict with the trained network.
My (awful lot of) questions:
How do I make this network return a sequence right from TensorFlow without going back to Python for each sample (feed a sequence and predict a sequence of the same size)?
If I do want to predict one sample at a time and iterate in Python, what is the correct way to do it?
During testing, is dynamic_rnn needed or is it just used for unrolling for BPTT during training? Why is dynamic_rnn returning all the back-propagation step tensors? These are the outputs of each layer of the unrolled network, right?
After some research:
How do I make this network return a sequence right from TensorFlow without going back to Python for each sample (feed a sequence and predict a sequence of the same size)?
You can use state_saving_rnn:
class Saver():
    def __init__(self):
        self.d = {}
    def state(self, name):
        if not name in self.d:
            return tf.zeros([1, LSTM_SIZE], tf.float32)
        return self.d[name]
    def save_state(self, name, val):
        self.d[name] = val
        return tf.identity('save_state_name')  # <- important for control_dependencies

outputs, states = rnn.state_saving_rnn(stacked_lstm, inx, Saver(),
                                       ('lstmstate', 'lstmstate2', 'lstmstate3', 'lstmstate4'),
                                       sequence_length=[EVAL_SEQ_LEN])
# 4 states are for the two layers of lstm: each has hidden and CEC variables to restore
network = [tf.matmul(outputs[-1], W) for i in xrange(EVAL_SEQ_LEN)]
One problem is that state_saving_rnn uses rnn() and not dynamic_rnn(), and therefore unrolls EVAL_SEQ_LEN steps at graph-construction time; you might want to re-implement state_saving_rnn with dynamic_rnn if you want to feed long sequences.
If I do want to predict one sample at a time and iterate in Python, what is the correct way to do it?
You can use dynamic_rnn and supply initial_state. This is probably just as efficient as state_saving_rnn. Look at the state_saving_rnn implementation for reference.
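A sketch of that one-sample-at-a-time loop, assuming the graph from the question plus initial_state = stacked_lstm.zero_state(1, tf.float32) passed to dynamic_rnn, an input placeholder with dynamic batch/time dimensions, and a hypothetical iterable of single samples:
state = sess.run(initial_state)
for sample in samples:
    feed = {inputs: sample.reshape(1, 1, 1)}           # (batch=1, time=1, features=1)
    # initial_state is a tuple of LSTMStateTuple (one per layer); feed each c and h tensor
    for layer_tensors, layer_values in zip(initial_state, state):
        feed[layer_tensors.c] = layer_values.c
        feed[layer_tensors.h] = layer_values.h
    prediction, state = sess.run((network, states), feed)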
During testing, is dynamic_rnn needed or is it just used for unrolling for BPTT during training? Why is dynamic_rnn returning all the back-propagation step tensors? These are the outputs of each layer of the unrolled network, right?
dynamic_rnn does the unrolling at runtime, similarly to what rnn() does at graph-construction time. I guess it returns all the steps so you can branch the graph in other places, i.e. after fewer time steps. In a network that uses [one time-step input * current state -> one output, new state] like the one described above, it's not needed in testing, but it could be used for training with truncated backpropagation through time.