I am writing a LSTM network recently.There always a question confuse me. When I use a placeholder as a input of graph, how can I initialize the state of LSTM? Because of the dynamic shape of place holder(like batch size of the data)I don't know how to initialize hidden state dynamically with batch size through graph.
Input data like this:
input_seq = tf.compat.v1.placeholder(dtype=tf.float32,shape=\[2,None,3\])#time_step=2,batch_size=None
Input first time step:
seq = input_seq\[0\]
I'd like to set the hidden state as follows:
self.hidden_state = tf.zeros_like(tf.compat.v1.placeholder(dtype=tf.float32,shape=\[seq.get_shape()\[1\],self.hidden_dims\]))
I's all right before I execute the graph. But when I flow the graph, I have to feed the hidden state placeholder. So I'm finding a method don't to do this.
Related
I am using the following implementation of the Seq2Seq model. Now, if I want to pass some inputs and get the corresponding values of encoder's hidden state (self.encoder_last_state), how can I do it?
https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py
You need to first assemble input_feed, similar to the predict routine. Once you have that, just execute sess.run over the required hidden layer.
To assmeble the input_feed:
input_feed = self.check_feeds(encoder_inputs, encoder_inputs_length, decoder_inputs=None, decoder_inputs_length=None, decode=True)
input_feed[self.keep_prob_placeholder.name] = 1.0
sess.run over self.encoder_last_state:
encoder_last_state_activations = sess.run(self.encoder_last_state, input_feed)
Tensorflow newbie here! I understand that Variables will be trained over time, placeholders are used input data that doesn't change as your model trains (like input images, and class labels for those images).
I'm trying to implement the forward propagation of RNN using Tensorflow, and wondering on what type I should save the output of the RNN cell. In numpy RNN implementation, it uses
hiddenStates = np.zeros((T, self.hidden_dim)) #T is the length of the sequence
Then it iteratively saves the output in the np.zeros array.
In case of TF, which one should I use, tf.zeros or tf.placeholder?
What is the best practice in this case? I think it should be fine to use tf.zeros but wanted to double check.
First of all, it is important to you to understand that everything inside Tensorflow is a Tensor. So when you are performing some kind of computation (e.g. an rnn implementation like outputs = rnn(...)) the output of this computation is returned as a Tensor. So you don't need to store it inside any kind of structure. You can retrieve it by running the correspondent node (i.e. output) like session.run(output, feed_dict).
Told this, I think you need to take the final state of an RNN and provide it as initial state of a subsequent computation. Two ways:
A) If you are using RNNCell implementations During the construction of your model you can construct the zero state like this:
cell = (some RNNCell implementation)
initial_state = cell.zero_state(batch_size, tf.float32)
B) If you are uimplementing your own staff Define the state as a zero Tensor:
initial_state = tf.zeros([batch_size, hidden_size])
Then, in both cases you will have something like:
output, final_state = rnn(input, initial_state)
In your execution loop you can initialize your state first and then provide the final_state as initial_stateinside your feed_dict:
state = session.run(initial_state)
for step in range(epochs):
feed_dict = {initial_state: state}
_, state = session.run((train_op,final_state), feed_dict)
How you actually construct your feed_dict depends on the implementation of the RNN.
For an BasicLSTMCell, for example, a state is an LSTMState object and you need to provide both c and h:
feed_dict = {initial_state.c=state.c, initial_state.h: state.h}
Question : I have an LSTM cell with variable scope name 'rnn' and assign it as 'scope'.
If I use scope.reuse_variables() within a graph, I know that the weights are reused for a new input X...
But if weights are reused, does the LSTM hidden state automatically reset? ... or do I have to explicitly reset the hidden state every time I call scope.reuse_variables()
Thank you !
The hidden state is not saved with the model. It depends on the input data (which gets fed/queued/etc.).
I'm struggling to understand the cryptic RNN docs. Any help with the following will be greatly appreciated.
tf.nn.dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
I'm struggling to understand how these parameters relate to the mathematical LSTM equations and RNN definition. Where is the cell unroll size? Is it defined by the 'max_time' dimension of the inputs? Is the batch_size only a convenience for splitting long data or it's related to minibatch SGD? Is the output state passed across batches?
tf.nn.dynamic_rnn takes in a batch (with the minibatch meaning) of unrelated sequences.
cell is the actual cell that you want to use (LSTM, GRU,...)
inputs has a shape of batch_size x max_time x input_size in which max_time is the number of steps in the longest sequence (but all sequences could be of the same length)
sequence_length is a vector of size batch_size in which each element gives the length of each sequence in the batch (leave it as default if all your sequences are of the same size. This parameter is the one that defines the cell unroll size.
Hidden state handling
The usual way of handling hidden state is to define an initial state tensor before the dynamic_rnn, like this for instance :
hidden_state_in = cell.zero_state(batch_size, tf.float32)
output, hidden_state_out = tf.nn.dynamic_rnn(cell,
inputs,
initial_state=hidden_state_in,
...)
In the above snippet, both hidden_state_in and hidden_state_out have the same shape [batch_size, ...] (the actual shape depends on the type of cell you use but the important thing is that the first dimension is the batch size).
This way, dynamic_rnn has an initial hidden state for each sequence. It will pass on the hidden state from time step to time step for each sequence in the inputs parameter on its own, and hidden_state_out will contain the final output state for each sequence in the batch. No hidden state is passed between sequences of the same batch, but only between time steps of the same sequence.
When do I need to feed back the hidden state manually?
Usually, when you're training, every batch is unrelated so you don't have to feed back the hidden state when doing a session.run(output).
However, if you're testing, and you need the output at each time step, (i.e. you have to do a session.run() at every time step) you'll want to evaluate and feed back the output hidden state using something like this :
output, hidden_state = sess.run([output, hidden_state_out],
feed_dict={hidden_state_in:hidden_state})
otherwise tensorflow will just use the default cell.zero_state(batch_size, tf.float32) at each time step which equates to reinitialising the hidden state at each time step.
I have been working with the "dynamic_rnn" to create a model.
The model is based upon a 80 time period signal, and I want to zero the "initial_state" before each run so I have setup the following code fragment to accomplish this:
state = cell_L1.zero_state(self.BatchSize,Xinputs.dtype)
outputs, outState = rnn.dynamic_rnn(cell_L1,Xinputs,initial_state=state, dtype=tf.float32)
This works great for the training process. The problem is once I go to the inference, where my BatchSize = 1, I get an error back as the rnn "state" doesn't match the new Xinputs shape. So what I figured is I need to make "self.BatchSize" based upon the input batch size rather than hard code it. I tried many different approaches, and none of them have worked. I would rather not pass a bunch of zeros through the feed_dict as it is a constant based upon the batch size.
Here are some of my attempts. They all generally fail since the input size is unknown upon building the graph:
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
.....
state = tf.zeros([Xinputs.get_shape()[0], self.state_size], Xinputs.dtype, name="RnnInitializer")
Another approach, thinking the initializer might not get called until run-time, but still failed at graph build:
init = lambda shape, dtype: np.zeros(*shape)
state = tf.get_variable("state", shape=[Xinputs.get_shape()[0], self.state_size],initializer=init)
Is there a way to get this constant initial state to be created dynamically or do I need to reset it through the feed_dict with tensor-serving code? Is there a clever way to do this only once within the graph maybe with an tf.Variable.assign?
The solution to the problem was how to obtain the "batch_size" such that the variable is not hard coded.
This was the correct approach from the given example:
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
The problem is the use of "get_shape()[0]", this returns the "shape" of the tensor and takes the batch_size value at [0]. The documentation doesn't seem to be that clear, but this appears to be a constant value so when you load the graph into an inference, this value is still hard coded (maybe only evaluated at graph creation?).
Using the "tf.shape()" function, seems to do the trick. This doesn't return the shape, but a tensor. So this seems to be updated more at run-time. Using this code fragment solved the problem of a training batch of 128 and then loading the graph into TensorFlow-Service inference handling a batch of just 1.
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
batch_size = tf.shape(Xinputs)[0]
state = self.cell_L1.zero_state(batch_size,Xinputs.dtype)
Here is a good link to TensorFlow FAQ which describes this approach 'How do I build a graph that works with variable batch sizes?':
https://www.tensorflow.org/resources/faq