How to extract all weights from an LSTM cell in vanilla Tensorflow? - tensorflow

I am training an LSTM network:
cell_fw = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE)
cell_bw = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE)
rnn_outputs, final_state_fw, final_state_bw = tf.contrib.rnn.static_bidirectional_rnn(
    cell_fw=cell_fw,
    cell_bw=cell_bw,
    inputs=rnn_inputs,
    dtype=tf.float32
)
Then I try to save its coefficients:
d = {}
with tf.Session() as sess:
    # train code ...
    variables_names = [v.name for v in tf.global_variables()]
    values = sess.run(variables_names)
    for k, v in zip(variables_names, values):
        d[k] = v
The dictionary d has only 2 entries for each LSTM cell:
[(k,v.shape) for (k,v) in sorted(d.items(), key=lambda x:x[0])]
[('bidirectional_rnn/bw/basic_lstm_cell/biases:0', (1024,)),
('bidirectional_rnn/bw/basic_lstm_cell/weights:0', (272, 1024)),
('bidirectional_rnn/fw/basic_lstm_cell/biases:0', (1024,)),
('bidirectional_rnn/fw/basic_lstm_cell/weights:0', (272, 1024)),
('char_embedding:0', (70, 16)),
('softmax_biases:0', (5068,)),
('softmax_weights:0', (5068, 512))]
I'm puzzled. Shouldn't each LSTM cell contain 4 sets of trainable weights? If so, how do I get all the weights out of an LSTM cell?

The 4 weight matrices (and biases) of an LSTM cell are stored as a single tensor, where slices along the second axis correspond to the different kinds of weights (input gate, forget gate, etc.).
For instance, I guess that in your case the value of HIDDEN_SIZE is 256, since 4 × 256 = 1024.
To access the different parts, you should slice the tensors along the axis of length 1024 (but I don't know in which order the different kinds of weights are stored...)
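For example, a minimal sketch of the slicing, assuming HIDDEN_SIZE = 256 and reusing the dictionary d built in the question; the i, j, f, o gate order (input gate, candidate, forget gate, output gate) is what the TF 1.x BasicLSTMCell source uses, but verify it against your version:
import numpy as np

HIDDEN_SIZE = 256  # assumed, since 4 * 256 = 1024
kernel = d['bidirectional_rnn/fw/basic_lstm_cell/weights:0']  # (272, 1024)
bias = d['bidirectional_rnn/fw/basic_lstm_cell/biases:0']     # (1024,)

# one (272, 256) weight block and one (256,) bias block per gate
w_i, w_j, w_f, w_o = np.split(kernel, 4, axis=1)
b_i, b_j, b_f, b_o = np.split(bias, 4, axis=0)

# each block stacks the input weights on top of the recurrent weights,
# because the cell multiplies the concatenation [x_t, h_{t-1}]: 272 = 16 + 256
input_depth = kernel.shape[0] - HIDDEN_SIZE
w_i_x, w_i_h = w_i[:input_depth], w_i[input_depth:]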

Related

Parameters computation of LSTM

I am trying to compute the total number of parameters of an LSTM model, and I am confused about some details.
I have searched for answers, such as this post and this post, but I don't understand what role the number of hidden units plays in the parameter computation (h1=64, h2=128 in my case).
import tensorflow as tf
b, t, d_in, d_out = 32, 256, 161, 257
data = tf.placeholder("float", [b, t, d_in])     # [batch, timestep, dim_in]
labels = tf.placeholder("float", [b, t, d_out])  # [batch, timestep, dim_out]
myinput = data
batch_size, seq_len, dim_in = myinput.shape
rnn_layers = []
h1 = 64
c1 = tf.nn.rnn_cell.LSTMCell(h1)
rnn_layers.append(c1)
h2 = 128
c2 = tf.nn.rnn_cell.LSTMCell(h1)  # note: this cell is also built with h1
rnn_layers.append(c2)
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
rnnoutput, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell,
                                     inputs=myinput, dtype=tf.float32)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
all_trainable_vars = tf.reduce_sum([tf.reduce_prod(v.shape) for v in tf.trainable_variables()])
print(sess.run(all_trainable_vars))
I printed the total number of parameters using Tensorflow, and it reports 90880. How can I get this result step by step? Thank you.
In your case, you defined an LSTM cell via the line c1 = tf.nn.rnn_cell.LSTMCell(h1). To answer your question, I will first introduce the mathematical definition of the LSTM, following the standard diagram (picture source: wikipedia-lstm). In that notation:
t: means at time t.
f_t is named forget gate.
i_t is named input gate.
o_t is named output gate.
c_t, h_t are named the cell state and hidden state of the LSTM cell, respectively.
For tf.nn.rnn_cell.LSTMCell(h1), h1=64 is the dimension of h_t, i.e. dim(h_t) = 64.
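To get from there to 90880, here is a short sketch of the standard per-layer parameter count (a kernel of shape [input_depth + num_units, 4 * num_units] plus a bias of length 4 * num_units). Note that the second cell in your code is also built with h1, so both layers have 64 units:
# per-layer parameter count of a standard LSTM cell
def lstm_params(input_depth, num_units):
    return (input_depth + num_units) * 4 * num_units + 4 * num_units

d_in, h1 = 161, 64
layer1 = lstm_params(d_in, h1)  # (161 + 64) * 256 + 256 = 57856
# the second cell's input is the 64-dimensional output of the first layer
layer2 = lstm_params(h1, h1)    # (64 + 64) * 256 + 256 = 33024
print(layer1 + layer2)          # 90880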

Understanding Tensorflow BasicLSTMCell Kernel and Bias shape

I want to better understand the shapes of TensorFlow's BasicLSTMCell kernel and bias.
@tf_export("nn.rnn_cell.BasicLSTMCell")
class BasicLSTMCell(LayerRNNCell):
    # ... (excerpt from the cell's build method)
    input_depth = inputs_shape[1].value
    h_depth = self._num_units
    self._kernel = self.add_variable(
        _WEIGHTS_VARIABLE_NAME,
        shape=[input_depth + h_depth, 4 * self._num_units])
    self._bias = self.add_variable(
        _BIAS_VARIABLE_NAME,
        shape=[4 * self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))
Why does the kernel have shape [input_depth + h_depth, 4 * self._num_units] and the bias shape [4 * self._num_units]? Maybe the factor 4 comes from the forget gate, block input, input gate and output gate? And what is the reason for summing input_depth and h_depth?
More information about my LSTM Network:
num_input = 12, timesteps = 820, num_hidden = 64, num_classes = 2.
With tf.trainable_variables() I get the following information:
Variable name: Variable:0 Shape: (64, 2) Parameters: 128
Variable name: Variable_1:0 Shape: (2,) Parameters: 2
Variable name: rnn/basic_lstm_cell/kernel:0 Shape: (76, 256) Parameters: 19456
Variable name: rnn/basic_lstm_cell/bias:0 Shape: (256,) Parameters: 256
The following code defines my LSTM network.
def RNN(x, weights, biases):
    x = tf.unstack(x, timesteps, 1)
    lstm_cell = rnn.BasicLSTMCell(num_hidden)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
First, about summing input_depth and h_depth: RNNs generally follow equations like h_t = W*h_{t-1} + V*x_t to compute the state h at time t. That is, we apply one matrix multiplication to the last state and one to the current input, and add the two results. This is actually equivalent to concatenating h_{t-1} and x_t (let's just call this c), "stacking" the two matrices W and V side by side (let's just call this S), and computing S*c.
Now we only have one matrix multiplication instead of two; I believe this can be parallelized more effectively, so this is done for performance reasons. Since h_{t-1} has size h_depth and x_t has size input_depth, we need to add the two dimensionalities for the concatenated vector c.
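To make that concrete, here is a small numpy sketch (made-up shapes, not the TF source) showing that the two formulations give the same result:
import numpy as np

h_depth, input_depth = 64, 12
h_prev = np.random.randn(h_depth)            # h_{t-1}
x_t = np.random.randn(input_depth)           # x_t
W = np.random.randn(h_depth, h_depth)        # recurrent weights
V = np.random.randn(h_depth, input_depth)    # input weights

two_matmuls = W @ h_prev + V @ x_t           # h_t = W*h_{t-1} + V*x_t

c = np.concatenate([h_prev, x_t])            # concatenated vector c
S = np.concatenate([W, V], axis=1)           # stacked matrix S, shape (64, 76)
one_matmul = S @ c                           # S*c

print(np.allclose(two_matmuls, one_matmul))  # True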
Second, you are right about the factor 4 coming from the gates. This is essentially the same as above: Instead of carrying out four separate matrix multiplications for the input and each of the gates, we carry out one multiplication that results in a big vector that is the input and all four gate values concatenated. Then we can just split this vector into four parts. In the LSTM cell source code this happens in lines 627-633.
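Applied to the numbers in your question (num_input = 12, num_hidden = 64), this is easy to check:
# kernel: (num_input + num_hidden, 4 * num_hidden), bias: (4 * num_hidden,)
num_input, num_hidden = 12, 64
kernel_shape = (num_input + num_hidden, 4 * num_hidden)   # (76, 256)
bias_shape = (4 * num_hidden,)                            # (256,)
print(kernel_shape, bias_shape)           # matches the shapes listed above
print(kernel_shape[0] * kernel_shape[1])  # 19456 kernel parameters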

Tensorflow 1.0 LSTM Cell in dynamic_rnn throws dimension error

I am trying to implement an LSTM model as a model_fn input to an Estimator. My X is just a .txt file with a time series of prices. Before going into my first hidden layer, I try to define the LSTM cell as:
def lstm_cell():
    return tf.contrib.rnn.BasicLSTMCell(
        size, forget_bias=0.0, state_is_tuple=True)

attn_cell = lstm_cell
if is_training and keep_prob < 1:
    def attn_cell():
        return tf.contrib.rnn.DropoutWrapper(
            lstm_cell(), output_keep_prob=keep_prob)

cell = tf.contrib.rnn.MultiRNNCell(
    [attn_cell() for _ in range(num_layers)], state_is_tuple=True)
initial_state = cell.zero_state(batch_size, data_type())

inputs = tf.unstack(X, num=num_steps, axis=0)
outputs = []
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   initial_state=initial_state)
This then is supposed to go into:
first_hidden_layer = tf.contrib.layers.relu(outputs, 1000)
Unfortunately, it throws an error indicating "ValueError: Dimension must be 1 but is 3 for 'transpose' (op: 'Transpose') with input shapes: [1], [3]."
I gather that my problem is the "inputs" tensor. According to its documentation, the inputs argument is supposed to be a tensor of shape [batch_size, max_time, ...], but I have no idea how to translate this into the above structure since, through the Estimator, only input values X and target values y are fed to the system. So my question would be how to create a tensor that can serve as the inputs argument to dynamic_rnn.
Thanks a lot.
I believe you don't need the line:
inputs = tf.unstack(X, num=num_steps, axis=0)
you can supply X directly to dynamic_rnn, since dynamic_rnn doesn't take a list of tensors; it takes one tensor where the time axis is dimension 0 (if time_major == True) or dimension 1 (if time_major == False).
Actually, it seems that X has only 2 dimensions, since inputs is a list of 1-dimensional tensors (as indicated by the error message), so you should replace the unstack line with:
inputs = tf.expand_dims(X, axis=2)
This will add a 3rd dimension of size 1, which is what dynamic_rnn needs.
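Put together, the end of the model_fn would then look roughly like this (a sketch; cell, initial_state, size and the 2-D shape of X are assumed from the code above):
inputs = tf.expand_dims(X, axis=2)   # [batch_size, num_steps] -> [batch_size, num_steps, 1]
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   initial_state=initial_state)
# outputs: [batch_size, num_steps, size], which can feed the relu layer
first_hidden_layer = tf.contrib.layers.relu(outputs, 1000)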

How to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around to find a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers (with 512 neurons in each),
time step(sequence length): 10
Could anyone guide me on how to build this using TensorFlow (defining the weights, building the input shape, training, predicting, the choice of optimizer or cost function, etc.)? Any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the multiple cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.pack(output, 1)
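If you want the LSTM version with the sizes from your question (two layers of 512 units, fed from the placeholder x defined above), a minimal sketch using tf.nn.dynamic_rnn could look like this:
# two stacked LSTM layers of n_hidden = 512 units each (sketch only)
cells = [tf.nn.rnn_cell.BasicLSTMCell(n_hidden) for _ in range(2)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, states = tf.nn.dynamic_rnn(stacked_cell, x, dtype=tf.float32)
# outputs: [batch, n_steps, n_hidden]; project the last time step to
# n_classes with the 'out' weights and biases defined earlier
pred = tf.matmul(outputs[:, -1, :], weights['out']) + biases['out']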
First you need some placeholders to put your training data (one batch)
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state; there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.
The problem is that Tensorflow stores this in an LSTMStateTuple, which you cannot send into a placeholder. So you need to store it in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you can use the built-in Tensorflow API to create the stacked LSTM layer.
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate the logits and then a loss with respect to y_output.
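For example, a sketch with a 1-dimensional regression target matching the y_output placeholder above (W_out and b_out are made-up names):
# flatten the time dimension, project each step's output to one value,
# and compute a mean squared error against y_output
flat_outputs = tf.reshape(outputs, [-1, state_size])   # [batch*time, state_size]
W_out = tf.get_variable("W_out", [state_size, 1])
b_out = tf.get_variable("b_out", [1])
predictions = tf.matmul(flat_outputs, W_out) + b_out   # [batch*time, 1]
targets = tf.reshape(y_output, [-1, 1])
loss = tf.reduce_mean(tf.square(predictions - targets))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)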
Then you run each batch with the sess.run-command, with truncated backpropagation (good explanation here http://r2rt.com/styles-of-truncated-backpropagation.html)
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
Perhaps it is better to use a library like Tflearn or Keras instead?

RNN & Batches in Tensorflow

The batch approach for RNNs in Tensorflow is not clear to me. For example, tf.nn.rnn takes as input a list of tensors [BATCH_SIZE x INPUT_SIZE]. We normally feed batches of data to the session, so why does it take a list of batches rather than a single batch?
This leads to next confusion for me:
data = []
for _ in range(0, len(train_input)):
    data.append(tf.placeholder(tf.float32, [CONST_BATCH_SIZE, CONST_INPUT_SIZE]))

lstm = tf.nn.rnn_cell.BasicLSTMCell(CONST_NUM_OF_HIDDEN_STATES)
val, state = tf.nn.rnn(lstm, data, dtype=tf.float32)
I pass a list of tensors [CONST_BATCH_SIZE x CONST_INPUT_SIZE] to tf.nn.rnn and get an output value that is a list of tensors [CONST_BATCH_SIZE x CONST_NUM_OF_HIDDEN_STATES]. Now I want to apply softmax to all of the hidden-state outputs, which requires calculating weights with matmul + bias.
Should I use a for loop for the matmul:
weight = tf.Variable(tf.zeros([CONST_NUM_OF_HIDDEN_STATES, CONST_OTPUT_SIZE]))
bias = tf.Variable(tf.zeros([CONST_OTPUT_SIZE]))
for i in val:
    mult = tf.matmul(i, weight)
    prediction = tf.nn.softmax(mult + bias)
Or should I create a 2D array from val and then use tf.matmul without the for loop?
This should work. Here output is the batched output data from the RNN, and probs will hold the probabilities for the whole batch.
logits = tf.matmul(output, softmax_w) + softmax_b
probs = tf.nn.softmax(logits)
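If you prefer to avoid the for loop, you can first merge the per-timestep outputs into one 2-D tensor and then apply a single matmul; a sketch reusing the names from the question (tf.concat is shown with the TF 1.x signature; older versions used tf.concat(0, val)):
# val is a list of [CONST_BATCH_SIZE, CONST_NUM_OF_HIDDEN_STATES] tensors;
# concatenating along axis 0 lets one matmul cover every time step
output = tf.concat(val, axis=0)
softmax_w = tf.Variable(tf.zeros([CONST_NUM_OF_HIDDEN_STATES, CONST_OTPUT_SIZE]))
softmax_b = tf.Variable(tf.zeros([CONST_OTPUT_SIZE]))
logits = tf.matmul(output, softmax_w) + softmax_b
probs = tf.nn.softmax(logits)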