How to use multilayered bidirectional LSTM in Tensorflow? - tensorflow

I want to know how to use a multilayer bidirectional LSTM in TensorFlow.
I have already implemented a single-layer bidirectional LSTM, but I want to compare it with a model that has multiple layers.
What code should I add to this part?
x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))
#print(x[0].get_shape())
# Define lstm cells with tensorflow
# Forward direction cell
lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Backward direction cell
lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Get lstm cell output
try:
    outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                                 dtype=tf.float32)
except Exception: # Old TensorFlow version only returns outputs not states
    outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                           dtype=tf.float32)
# Linear activation, using rnn inner loop last output
outputs = tf.stack(outputs, axis=1)
outputs = tf.reshape(outputs, (batch_size*n_steps, n_hidden*2))
outputs = tf.matmul(outputs, weights['out']) + biases['out']
outputs = tf.reshape(outputs, (batch_size, n_steps, n_classes))

There are two different approaches to building a multilayer BiLSTM model:
1) Use the output of the previous BiLSTM layer as the input to the next BiLSTM layer. First create two lists of forward and backward cells (cell_forw and cell_back below), each of length num_layers. Then:
# `output` must hold the network input before the loop starts
for n in range(num_layers):
    cell_fw = cell_forw[n]
    cell_bw = cell_back[n]
    state_fw = cell_fw.zero_state(batch_size, tf.float32)
    state_bw = cell_bw.zero_state(batch_size, tf.float32)
    (output_fw, output_bw), last_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, output,
                                                                         initial_state_fw=state_fw,
                                                                         initial_state_bw=state_bw,
                                                                         scope='BLSTM_'+ str(n),
                                                                         dtype=tf.float32)
    # The concatenated outputs become the input of the next layer
    output = tf.concat([output_fw, output_bw], axis=2)
2) Another approach worth a look is the stacked BiLSTM.
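To make the first approach concrete without depending on a TensorFlow version, here is a minimal NumPy sketch of the same data flow. It swaps the LSTM cells for a plain tanh RNN with random weights (an assumption made only to keep the example short), and the names rnn_pass and stacked_birnn are hypothetical helpers, not TensorFlow functions. The point is only that each layer concatenates its forward and backward outputs and hands the result to the next layer.

```python
import numpy as np

def rnn_pass(x, w_in, w_rec, reverse=False):
    """Run a plain tanh RNN over x of shape [batch, time, features]."""
    batch, time, _ = x.shape
    hidden = w_rec.shape[0]
    h = np.zeros((batch, hidden))
    outputs = np.zeros((batch, time, hidden))
    # The backward direction simply visits the time steps in reverse order
    steps = range(time - 1, -1, -1) if reverse else range(time)
    for t in steps:
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs[:, t, :] = h
    return outputs

def stacked_birnn(x, hidden, num_layers, seed=0):
    """Stack bidirectional layers; each layer has its own weights."""
    rng = np.random.default_rng(seed)
    output = x
    for _ in range(num_layers):
        in_size = output.shape[-1]
        fw = rnn_pass(output,
                      rng.normal(0, 0.1, (in_size, hidden)),
                      rng.normal(0, 0.1, (hidden, hidden)))
        bw = rnn_pass(output,
                      rng.normal(0, 0.1, (in_size, hidden)),
                      rng.normal(0, 0.1, (hidden, hidden)),
                      reverse=True)
        # Concatenate along the feature axis, as tf.concat(..., axis=2) does
        output = np.concatenate([fw, bw], axis=2)
    return output

x = np.zeros((5, 10, 32))
print(stacked_birnn(x, hidden=21, num_layers=3).shape)  # (5, 10, 42)
```

Note that every layer after the first receives 2 * hidden input features, which is why each layer needs its own cells and its own variable scope.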

This is essentially the same as the first answer, with a small variation in how the scope names are used and with dropout wrappers added. It also takes care of the variable scope error that the first answer produces.
def bidirectional_lstm(input_data, num_layers, rnn_size, keep_prob):
    output = input_data
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer), reuse=tf.AUTO_REUSE):
            # By giving a different variable scope to each layer, I've ensured that
            # the weights are not shared among the layers. If you want to share the
            # weights, you can do that by giving variable_scope as "encoder" but do
            # make sure first that reuse is set to tf.AUTO_REUSE
            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob=keep_prob)
            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob=keep_prob)
            outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw,
                                                              cell_bw,
                                                              output,
                                                              dtype=tf.float32)
            # Concat the forward and backward outputs
            output = tf.concat(outputs, 2)
    return output

Building on Taras's answer, here is another example: a 2-layer bidirectional RNN with GRU cells.
embedding_weights = tf.Variable(tf.random_uniform([vocabulary_size, state_size], -1.0, 1.0))
embedding_vectors = tf.nn.embedding_lookup(embedding_weights, tokens)
#First BLSTM
cell = tf.nn.rnn_cell.GRUCell(state_size)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1-dropout)
(forward_output, backward_output), _ = \
tf.nn.bidirectional_dynamic_rnn(cell, cell, inputs=embedding_vectors,
sequence_length=lengths, dtype=tf.float32,scope='BLSTM_1')
outputs = tf.concat([forward_output, backward_output], axis=2)
#Second BLSTM using the output of previous layer as an input.
cell2 = tf.nn.rnn_cell.GRUCell(state_size)
cell2 = tf.nn.rnn_cell.DropoutWrapper(cell2, output_keep_prob=1-dropout)
(forward_output, backward_output), _ = \
tf.nn.bidirectional_dynamic_rnn(cell2, cell2, inputs=outputs,
sequence_length=lengths, dtype=tf.float32,scope='BLSTM_2')
outputs = tf.concat([forward_output, backward_output], axis=2)
By the way, don't forget to give each layer a different scope name. Hope this helps.

As @Taras pointed out, you can use:
(1) tf.nn.bidirectional_dynamic_rnn()
(2) tf.contrib.rnn.stack_bidirectional_dynamic_rnn().
All previous answers only cover (1), so I will give some details on (2), in particular since it usually outperforms (1). For an intuition about the different connectivities, see here.
Let's say you want to create a stack of 3 BLSTM layers, each with 64 nodes:
num_layers = 3
num_nodes = 64
# Define LSTM cells (assumes e.g. `from tensorflow.contrib.rnn import LSTMCell`)
enc_fw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
enc_bw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
# Connect LSTM cells bidirectionally and stack
(all_states, fw_state, bw_state) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    cells_fw=enc_fw_cells, cells_bw=enc_bw_cells, inputs=input_embed, dtype=tf.float32)
# Concatenate results
for k in range(num_layers):
    if k == 0:
        con_c = tf.concat((fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((fw_state[k].h, bw_state[k].h), 1)
    else:
        con_c = tf.concat((con_c, fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((con_h, fw_state[k].h, bw_state[k].h), 1)
output = tf.contrib.rnn.LSTMStateTuple(c=con_c, h=con_h)
In this case, I use the final states of the stacked biRNN rather than the states at all timesteps (saved in all_states), since I was using an encoder-decoder scheme in which the above code was only the encoder.
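As a rough check of what the concatenation loop produces, the following NumPy sketch mirrors it with a namedtuple standing in for LSTMStateTuple (StateTuple and concat_final_states are hypothetical names used only for illustration). Under that assumption, 3 layers of 64 nodes give c and h vectors of width 2 * 3 * 64 = 384 per example.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for tf.contrib.rnn.LSTMStateTuple
StateTuple = namedtuple("StateTuple", ["c", "h"])

def concat_final_states(fw_states, bw_states):
    """Concatenate per-layer forward/backward final states along axis 1."""
    c_parts, h_parts = [], []
    for fw, bw in zip(fw_states, bw_states):
        # Same ordering as the loop above: fw layer k, then bw layer k
        c_parts += [fw.c, bw.c]
        h_parts += [fw.h, bw.h]
    return StateTuple(c=np.concatenate(c_parts, axis=1),
                      h=np.concatenate(h_parts, axis=1))

batch, num_layers, num_nodes = 4, 3, 64
fw_state = [StateTuple(np.zeros((batch, num_nodes)), np.zeros((batch, num_nodes)))
            for _ in range(num_layers)]
bw_state = [StateTuple(np.ones((batch, num_nodes)), np.ones((batch, num_nodes)))
            for _ in range(num_layers)]

encoded = concat_final_states(fw_state, bw_state)
print(encoded.c.shape, encoded.h.shape)  # (4, 384) (4, 384)
```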

Related

Tensorflow reusing of Multi-Layered LSTM Network

I am trying to use the same LSTM architecture for different inputs, so I pass the same cells each time I unroll the bidirectional LSTM over a different input. I am not sure whether this creates two entirely separate LSTM networks. It looks like there are two different nodes in my graph. My code and graph look something like this:
def get_multirnn_cell(self):
    cells = []
    for _ in range(config.n_layers):
        cell = tf.nn.rnn_cell.LSTMCell(config.n_hidden, initializer=tf.glorot_uniform_initializer())
        dropout_cell = tf.nn.rnn_cell.DropoutWrapper(cell=cell,
                                                     input_keep_prob=config.keep_prob,
                                                     output_keep_prob=config.keep_prob)
        cells.append(dropout_cell)
    return cells

def add_lstm_op(self):
    with tf.variable_scope('lstm'):
        cells_fw = self.get_multirnn_cell()
        cells_bw = self.get_multirnn_cell()
        cell_fw = tf.nn.rnn_cell.MultiRNNCell(cells_fw)
        cell_bw = tf.nn.rnn_cell.MultiRNNCell(cells_bw)
        (_, _), (state_one_fw, state_one_bw) = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw,
                                                                               inputs=self.question_one,
                                                                               sequence_length=self.seql_one,
                                                                               dtype=tf.float32)
        self.state_one = tf.concat([state_one_fw[-1].h, state_one_bw[-1].h], name='state_one', axis=-1)
        # self.state_one = tf.concat([state_one_fw, state_one_bw], axis=-1)
        # [batch_size, 2*hidden_size]
        (_, _), (state_two_fw, state_two_bw) = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw,
                                                                               inputs=self.question_two,
                                                                               sequence_length=self.seql_two,
                                                                               dtype=tf.float32)
        self.state_two = tf.concat([state_two_fw[-1].h, state_two_bw[-1].h], name='state_two', axis=-1)
If you want to reuse the multirnn_cell, you can pass reuse=tf.AUTO_REUSE to the variable_scope:
with tf.variable_scope('lstm', reuse=tf.AUTO_REUSE):
See the doc.
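For intuition about what the reuse flag changes, here is a toy pure-Python model of the get-or-create behaviour. It is not TensorFlow code: ToyVariableStore and the AUTO_REUSE constant are hypothetical stand-ins, and the sketch only imitates the documented rule that reuse=False refuses to recreate an existing variable, reuse=True requires the variable to exist, and AUTO_REUSE creates the variable on first use and returns it afterwards.

```python
AUTO_REUSE = "auto"  # hypothetical stand-in for tf.AUTO_REUSE

class ToyVariableStore:
    """Toy model of variable sharing by name; not the TensorFlow implementation."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, name, reuse=False):
        exists = name in self._variables
        if exists and reuse is False:
            raise ValueError("Variable %s already exists" % name)
        if not exists and reuse is True:
            raise ValueError("Variable %s does not exist" % name)
        if not exists:
            # First use: create the variable
            self._variables[name] = object()
        return self._variables[name]

store = ToyVariableStore()
# Two unrollings asking for the same name get the same object back
first = store.get_variable("lstm/kernel", reuse=AUTO_REUSE)
second = store.get_variable("lstm/kernel", reuse=AUTO_REUSE)
print(first is second)  # True
```

In the question's setting this corresponds to both bidirectional_dynamic_rnn calls resolving to one set of weights rather than two.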

TensorFlow: How to embed float sequences to fixed size vectors?

I am looking for methods to embed variable-length sequences of float values into fixed-size vectors. The input format is as follows:
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
...
[f1,f2,f3,f4]-> ... -> ->[f1,f2,f3,f4]
Each line is a variable-length sequence with a maximum length of 60. Each unit in a sequence is a tuple of 4 float values. I have already padded all sequences with zeros to the same length.
The following architecture seems to solve my problem if I use the input as the target output; I need the thought vector in the center as the embedding for the sequences.
In TensorFlow, I have found two candidate methods: tf.contrib.legacy_seq2seq.basic_rnn_seq2seq and tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq.
However, these two methods seem to be designed for NLP problems, and the input must be discrete values representing words.
So, are there other functions that solve my problem?
All you need is an RNN, not a seq2seq model, since seq2seq comes with an additional decoder, which is unnecessary in your case.
Example code:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn
input_size = 4
max_length = 60
hidden_size=64
output_size = 4
x = tf.placeholder(tf.float32, shape=[None, max_length, input_size], name='x')
seqlen = tf.placeholder(tf.int64, shape=[None], name='seqlen')
lstm_cell = rnn.BasicLSTMCell(hidden_size, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=x, sequence_length=seqlen, dtype=tf.float32)
encoded_states = states[-1]
W = tf.get_variable(
name='W',
shape=[hidden_size, output_size],
dtype=tf.float32,
initializer=tf.random_normal_initializer())
b = tf.get_variable(
name='b',
shape=[output_size],
dtype=tf.float32,
initializer=tf.random_normal_initializer())
z = tf.matmul(encoded_states, W) + b
results = tf.sigmoid(z)
###########################
## cost computing and training components goes here
# e.g.
# targets = tf.placeholder(tf.float32, shape=[None, input_size], name='targets')
# cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=z))
# optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(cost)
###############################
init = tf.global_variables_initializer()
batch_size = 4
data_in = np.zeros((batch_size, max_length, input_size), dtype='float32')
data_in[0, :4, :] = np.random.rand(4, input_size)
data_in[1, :6, :] = np.random.rand(6, input_size)
data_in[2, :20, :] = np.random.rand(20, input_size)
data_in[3, :, :] = np.random.rand(60, input_size)
data_len = np.asarray([4, 6, 20, 60], dtype='int64')
with tf.Session() as sess:
    sess.run(init)
    #########################
    # training process goes here
    #########################
    res = sess.run(results,
                   feed_dict={
                       x: data_in,
                       seqlen: data_len})
    print(res)
To encode a sequence into a fixed-length vector you typically use recurrent neural networks (RNNs) or convolutional neural networks (CNNs).
If you use a recurrent neural network, you can use the output at the last time step (the last element in your sequence). This corresponds to the thought vector in your question. Have a look at tf.nn.dynamic_rnn. dynamic_rnn requires you to specify the type of RNN cell you want to use; tf.contrib.rnn.LSTMCell and tf.contrib.rnn.GRUCell are the most common.
If you want to use CNNs, you need 1-dimensional convolutions. To build CNNs you need tf.layers.conv1d and tf.layers.max_pooling1d.
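One detail matters with zero-padded, variable-length input: the "last time step" differs per sequence. When sequence_length is passed, tf.nn.dynamic_rnn returns zeros for the outputs past each sequence's length, so the output at index max_length - 1 is not the useful one for short sequences. A minimal NumPy sketch of picking the last valid output per sequence follows; last_relevant is a hypothetical helper name, and in TensorFlow the returned final state serves the same purpose.

```python
import numpy as np

def last_relevant(outputs, lengths):
    """Select outputs[i, lengths[i] - 1, :] for every sequence i in the batch."""
    batch = outputs.shape[0]
    return outputs[np.arange(batch), lengths - 1]

# Toy RNN outputs of shape [batch=2, time=3, hidden=2]
outputs = np.arange(2 * 3 * 2, dtype=np.float32).reshape(2, 3, 2)
lengths = np.array([1, 3])  # the first sequence has 1 valid step, the second has 3

print(last_relevant(outputs, lengths))
# [[ 0.  1.]
#  [10. 11.]]
```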
I have found a solution to my problem, using the following architecture.
The LSTM layers at the bottom encode the series x1,x2,...,xn. The last output (the green one) is duplicated as many times as there are inputs and fed to the decoding LSTM layers above. The TensorFlow code is as follows:
series_input = tf.placeholder(tf.float32, [None, conf.max_series, conf.series_feature_num])
print("Encode input Shape", series_input.get_shape())
# encoding layer
encode_cell = tf.contrib.rnn.MultiRNNCell(
[tf.contrib.rnn.BasicLSTMCell(conf.rnn_hidden_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
encode_output, _ = tf.nn.dynamic_rnn(encode_cell, series_input, dtype=tf.float32, scope='encode')
print("Encode output Shape", encode_output.get_shape())
# last output
encode_output = tf.transpose(encode_output, [1, 0, 2])
last = tf.gather(encode_output, int(encode_output.get_shape()[0]) - 1)
# duplicate the last output of the encoding layer
decoder_input = tf.stack([last for _ in range(conf.max_series)], axis=1)
print("Decoder input shape", decoder_input.get_shape())
# decoding layer
decode_cell = tf.contrib.rnn.MultiRNNCell(
[tf.contrib.rnn.BasicLSTMCell(conf.series_feature_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
decode_output, _ = tf.nn.dynamic_rnn(decode_cell, decoder_input, dtype=tf.float32, scope='decode')
print("Decode output", decode_output.get_shape())
# Loss Function
loss = tf.losses.mean_squared_error(labels=series_input, predictions=decode_output)
print("Loss", loss)
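The duplication step in the code above (transpose, gather, then tf.stack) can be checked in isolation. This NumPy sketch assumes an encoder output of shape [batch, time, hidden] and uses a hypothetical helper name, repeat_last_output; it copies the output of the final time step once per decoder step.

```python
import numpy as np

def repeat_last_output(encode_output, num_steps):
    """Copy the final time step's output num_steps times along a new time axis."""
    last = encode_output[:, -1, :]                 # [batch, hidden]
    return np.stack([last] * num_steps, axis=1)    # [batch, num_steps, hidden]

encode_output = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
decoder_input = repeat_last_output(encode_output, num_steps=3)
print(decoder_input.shape)  # (2, 3, 4)
```

Every time step of decoder_input then holds the same vector, which is the fixed-size embedding of the series.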

How to find intermediate outputs of LSTM by running tf.nn.dynamic_rnn in tensorflow

I am new to TensorFlow and have recently read about LSTMs in various blog posts, such as Understanding LSTM Networks (Colah) and The Unreasonable Effectiveness of Recurrent Neural Networks (Karpathy).
I found this code on the web:
import numpy as np
import tensorflow as tf
def length(sequence):
    used = tf.sign(tf.reduce_max(tf.abs(sequence), reduction_indices=2))
    length = tf.reduce_sum(used, reduction_indices=1)
    length = tf.cast(length, tf.int32)
    return length
num_neurons = 10
num_layers = 3
max_length = 8
frame_size = 5
# dropout = tf.placeholder(tf.float32)
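# Build a separate cell per layer; reusing one cell object ([cell] * num_layers)
# can fail in TensorFlow 1.x because the layers would have to share one set of weights
cells = [tf.contrib.rnn.LSTMCell(num_neurons, state_is_tuple=True) for _ in range(num_layers)]
# cells = [DropoutWrapper(cell, output_keep_prob=dropout) for cell in cells]
cell = tf.contrib.rnn.MultiRNNCell(cells)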
sequence = tf.placeholder(tf.float32, [None, max_length, frame_size])
output, state = tf.nn.dynamic_rnn(
cell,
sequence,
dtype=tf.float32,
sequence_length=length(sequence),
)
if __name__ == '__main__':
    sample = np.random.random((8, max_length, frame_size)) + 0.1
    # sample[np.ix_([0,1],range(50,max_length))] = 0
    # drop = 0.2
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        o, s = sess.run([output, state], feed_dict={sequence: sample})
        # print("Output shape is ", o.shape)
        # print("state shape is ", s.shape)
        print("Output is ", o)
        print("State is ", s)
I have some doubts about the above code with state_is_tuple=True.
Q. What is the plain meaning of the outputs and state that tf.nn.dynamic_rnn returns?
I read on the internet that output is the output of the last layer at several time steps and
state is the final state.
My first doubt is: what do we mean by "output of the last layer at several time steps"?
Q. My main task is to find all the intermediate outputs of the LSTM when calling dynamic_rnn in the same fashion as in the above code. How can I do that?
I looked into the dynamic_rnn code and read that dynamic_rnn internally calls _dynamic_rnn.
_dynamic_rnn returns final_output and final_state. Apart from final_output, I want all the intermediate outputs.
My idea is to write a custom _dynamic_rnn based on the one defined in
https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/python/ops/rnn.py
Please help.
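For readers following the code above, the length helper infers each sequence's length from zero padding: a time step counts as used if any of its features is non-zero. A NumPy sketch of the same computation is below (sequence_lengths is a hypothetical name mirroring the TensorFlow function, not part of any library).

```python
import numpy as np

def sequence_lengths(batch):
    """Count non-padded time steps per sequence in a [batch, time, features] array."""
    # 1.0 where a time step has at least one non-zero feature, else 0.0
    used = np.sign(np.max(np.abs(batch), axis=2))
    return used.sum(axis=1).astype(np.int32)

sample = np.zeros((2, 8, 5))
sample[0, :3] = 0.5   # first sequence: 3 real steps, 5 padded steps
sample[1, :8] = 1.0   # second sequence: uses all 8 steps

print(sequence_lengths(sample))  # [3 8]
```

This only works when real time steps are never all-zero, which is why the sample data in the question adds 0.1 to every value.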

Tensorflow: stacked bidirectional LSTMs

I want to stack two LSTMs without using the MultiRNN wrapper. However, the following code fails with ValueError: Shapes (3,) and (2,) are not compatible because of inputs=states_fw_1 in the second LSTM. How can I pass the hidden state of the first LSTM as input to the second?
# LSTM 1
with tf.name_scope("BiLSTM_1"):
    with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw_1,
        cell_bw=cell_bw_1,
        dtype=tf.float64,
        sequence_length=self.input_seq_len,
        inputs=self.embedded_input_layer,
        scope='BiLSTM_1')
# State is tuple
states_fw_1, states_bw_1 = states_1
# LSTM 2
with tf.name_scope("BiLSTM_2"):
    with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw,
        cell_bw=cell_bw,
        dtype=tf.float64,
        sequence_length=self.input_seq_len,
        inputs=states_fw_1,
        scope="BiLSTM_extraction")
I have been learning TF for 2 days (so I'm no expert), and I found this problem interesting to solve.
Here are my findings:
What you want to do is not possible with the 'LSTMCell' implementation. Here is why:
You want to feed "states_fw_1" to the next BiLSTM. So the first question should be: what are the dimensions of "states_fw_1"? Any RNN implementation needs input of shape [batch_size, seq_len, input_size]. "states_fw_1" has shape [batch_size, hidden_size] (I checked the size of "states_fw_1" by running the code below). So you can see that it does not fit the RNN's requirements. This is because the model outputs only the last state of the LSTM cell, not the whole history (see the documentation). And you are not interested in the last state, because you want to feed state[t-step] to the layer above. 'states_fw_1' is useful when you want to classify the whole sequence (not each element in the sequence).
Edit: 'states_fw_1' contains the last "hidden_state" and the last "memory_cell". For classification, only "hidden_state" will be useful, I think.
So you just need to use the merged output (from the forward and backward passes). The merged output has size [batch_size, seq_len, hidden_size*2] (*2 for forward and backward), so it is the right shape for the next stacked RNN (the output comes from every time step, unlike the state).
Here is the code which I was testing:
import tensorflow as tf
import numpy as np
hidden_size = 21
seq_len = tf.placeholder(tf.int32, [None])
inputs = tf.placeholder(tf.float32, [None, None, 32])
with tf.variable_scope('BiLSTM_1'):
    with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw_1,
        cell_bw=cell_bw_1,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=inputs,
        scope='BiLSTM_1')
# Merge Output tensor from forward and backward pass. It size is [batch_size, seq_len, 2*hidden_size]
outputs_1 = tf.concat(outputs_1, 2)
with tf.name_scope("BiLSTM_2"):
    with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw,
        cell_bw=cell_bw,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=outputs_1,
        scope="BiLSTM_2")
# Initialize the weights and biases
# (tf.initialize_all_variables is the deprecated name of this function)
init = tf.global_variables_initializer()
batch_size = 5
seq_len_val = 10
train_inputs = np.zeros((batch_size, seq_len_val, 32))
train_seq_len = np.ones(batch_size) * seq_len_val
with tf.Session() as session:
    session.run(init)
    feed = {inputs: train_inputs, seq_len: train_seq_len}
    out, state, state_1 = session.run([outputs, states, states_1], feed)
    print("State size: ", state_1[0].c.shape, " Out Size: ", out[0][0].shape)
    print("Batch_size: ", batch_size, " Sequence Len: ", seq_len_val, " Hidden Size: ", hidden_size)
'outputs_1' returned by LSTM 1 is a tuple containing 'outputs_fw' and 'outputs_bw'.
'outputs_fw' and 'outputs_bw' will be of dimension: [batch_size, sequence_length, hidden_size].
You have to concatenate the 'outputs_fw' and 'outputs_bw' hidden states (use tf.concat with axis=2) and pass the result as input to LSTM 2, instead of passing 'states_fw_1'.
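The shape argument in both answers can be checked with a few lines of NumPy. The sketch below assumes the sizes used in the test code above (batch 5, sequence length 10, hidden size 21); is_valid_rnn_input is a hypothetical helper that only checks the rank an RNN layer expects.

```python
import numpy as np

batch_size, seq_len, hidden_size = 5, 10, 21

# Per-time-step outputs of the forward and backward passes
outputs_fw = np.zeros((batch_size, seq_len, hidden_size))
outputs_bw = np.zeros((batch_size, seq_len, hidden_size))
# Final hidden state of one direction: no time axis
state_h = np.zeros((batch_size, hidden_size))

def is_valid_rnn_input(array):
    """An RNN layer expects [batch_size, seq_len, input_size], i.e. rank 3."""
    return array.ndim == 3

merged = np.concatenate([outputs_fw, outputs_bw], axis=2)
print(merged.shape, is_valid_rnn_input(merged))    # (5, 10, 42) True
print(state_h.shape, is_valid_rnn_input(state_h))  # (5, 21) False
```

The rank-2 state is what triggers the shape incompatibility error in the question; the rank-3 merged output is what the second layer needs.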

how to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Define weights
weights = {
'out' : tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
'out' : tf.Variable(tf.random_normal([n_classes]))
}
I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around for a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers(with 512 neurons in each),
time step(sequence length): 10
Could anyone guide me in building this with TensorFlow (defining weights, building the input shape, training, predicting, choosing an optimizer or cost function, etc.)? Any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the cells it should wrap. In the code below I unroll it manually, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    # zero_state builds one zero tensor per layer (state_size is a tuple for MultiRNNCell)
    state = rnn_cell.zero_state(BATCH_SIZE, tf.float32)
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    # tf.pack was renamed to tf.stack in TensorFlow 1.0
    y = tf.stack(output, 1)
First you need some placeholders to hold your training data (one batch):
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state (a very good guide is here: https://arxiv.org/pdf/1506.00019.pdf). Every layer in the LSTM has one cell state and one hidden state.
The problem is that TensorFlow stores these in an LSTMStateTuple, which you cannot feed through a single placeholder. So you need to store the state in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unstack(state_placeholder, axis=0)  # tf.unpack was renamed to tf.unstack
rnn_tuple_state = tuple(
[tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
for idx in range(num_layers)]
)
Then you can use the built-in Tensorflow API to create the stacked LSTM layer.
# One cell object per layer, so that each layer gets its own weights
cells = [tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate logits, and then a loss with respect to y_output.
Then you run each batch with sess.run, using truncated backpropagation (a good explanation is here: http://r2rt.com/styles-of-truncated-backpropagation.html):
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a NumPy array before feeding it again.
Perhaps it is better to use a library like TFLearn or Keras instead?
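The state layout used by this answer, [num_layers, 2, batch_size, state_size], can be tried out without TensorFlow. In the NumPy sketch below, StateTuple is a hypothetical stand-in for LSTMStateTuple and unpack_state / pack_state are illustrative helper names; the round trip shows why np.array(current_state) yields an array that fits the state placeholder again.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for tf.nn.rnn_cell.LSTMStateTuple
StateTuple = namedtuple("StateTuple", ["c", "h"])

def unpack_state(packed):
    """Split a [num_layers, 2, batch, state] array into one (c, h) tuple per layer."""
    return tuple(StateTuple(c=layer[0], h=layer[1]) for layer in packed)

def pack_state(state_tuples):
    """Inverse of unpack_state: stack the per-layer tuples back into one array."""
    return np.array([[s.c, s.h] for s in state_tuples])

num_layers, batch_size, state_size = 2, 3, 4
init_state = np.zeros((num_layers, 2, batch_size, state_size))

states = unpack_state(init_state)
repacked = pack_state(states)
print(len(states), states[0].c.shape, repacked.shape)  # 2 (3, 4) (2, 2, 3, 4)
```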