combining DropoutWrapper and ResidualWrapper with variational_recurrent=True

I'm trying to create a MultiRNNCell of LSTM cells wrapped with both DropoutWrapper and ResidualWrapper. To use variational_recurrent=True, we must provide the input_size parameter to DropoutWrapper. I'm not able to figure out what input_size should be passed to each LSTM layer, since ResidualWrapper also adds skip connections that augment the input at each layer.
I'm using the following utility function to create one LSTM layer:
def create_cell(units, residual_connections, keep_prob, input_size):
    lstm_cell = tf.nn.rnn_cell.LSTMCell(units,
                                        activation=tf.nn.tanh,
                                        initializer=tf.truncated_normal_initializer(),
                                        cell_clip=5.)
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell,
                                              dtype=tf.float32,
                                              input_keep_prob=keep_prob,
                                              output_keep_prob=keep_prob,
                                              state_keep_prob=keep_prob,
                                              variational_recurrent=True,
                                              input_size=input_size)
    if residual_connections:
        lstm_cell = tf.nn.rnn_cell.ResidualWrapper(lstm_cell)
    return lstm_cell
And the following code to create the complete cell:
net = tf.layers.dense(inputs,
                      128,
                      activation=tf.nn.relu,
                      kernel_initializer=tf.variance_scaling_initializer())
net = tf.layers.batch_normalization(net, training=training)
cells = [create_cell(64, False, keep_prob, ??)]
for _ in range(5):
    cells.append(create_cell(64, True, keep_prob, ??))
multirnn_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
net, rnn_s1 = tf.nn.dynamic_rnn(cell=multirnn_cell, inputs=net, initial_state=rnn_s0, dtype=tf.float32)
What values should be passed to input_size for the first and subsequent LSTM layers?
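Not a confirmed answer, but a hedged sketch of one consistent wiring, given the wrapper order above (DropoutWrapper inside ResidualWrapper): the first cell's input is the 128-unit dense output, and every later cell's input is the previous cell's 64-unit output, which ResidualWrapper in any case requires to match its own output size:
# Illustrative values only, not a verified answer: the first layer sees the
# 128-unit dense output; each later layer sees the previous 64-unit cell output.
cells = [create_cell(64, False, keep_prob, tf.TensorShape([128]))]
for _ in range(5):
    cells.append(create_cell(64, True, keep_prob, tf.TensorShape([64])))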

Related

Keras Model fit throws shape mismatch error

I am building a Siamese network using Keras (TensorFlow) where the target is a binary column, i.e., match or mismatch (1 or 0). But the model fit method throws an error saying that the y_pred shape is not compatible with the y_true shape. I am using the binary_crossentropy loss function.
Here is the code I am using:
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[tf.keras.metrics.Recall()])
history = model.fit([X_train_entity_1.todense(), X_train_entity_2.todense()], np.array(y_train),
                    epochs=2,
                    batch_size=32,
                    verbose=2,
                    shuffle=True)
My input data shapes are as follows:
Inputs:
X_train_entity_1.shape is (700,2822)
X_train_entity_2.shape is (700,2822)
Target:
y_train.shape is (700,1)
In the error it throws, y_pred is a variable that was created internally. Why is the y_pred dimension 2822 when I have a binary target? That dimension of 2822 actually matches the input size, but how do I make sense of this?
Here is the model I created:
in_layers = []
out_layers = []
for i in range(2):
    input_layer = Input(shape=(1,))
    embedding_layer = Embedding(embed_input_size+1, embed_output_size)(input_layer)
    lstm_layer_1 = Bidirectional(LSTM(1024, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(embedding_layer)
    lstm_layer_2 = Bidirectional(LSTM(512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(lstm_layer_1)
    in_layers.append(input_layer)
    out_layers.append(lstm_layer_2)
merge = concatenate(out_layers)
dense1 = Dense(256, activation='relu', kernel_initializer='he_normal', name='data_embed')(merge)
drp1 = Dropout(0.4)(dense1)
btch_norm1 = BatchNormalization()(drp1)
dense2 = Dense(32, activation='relu', kernel_initializer='he_normal')(btch_norm1)
drp2 = Dropout(0.4)(dense2)
btch_norm2 = BatchNormalization()(drp2)
output = Dense(1, activation='sigmoid')(btch_norm2)
model = Model(inputs=in_layers, outputs=output)
model.summary()
Since my data is very sparse, I used todense(). The types are as follows:
type(X_train_entity_1) is scipy.sparse.csr.csr_matrix
type(X_train_entity_1.todense()) is numpy.matrix
type(X_train_entity_2) is scipy.sparse.csr.csr_matrix
type(X_train_entity_2.todense()) is numpy.matrix
The mismatched shape is in the Input layer. The input shape needs to match the shape of a single element passed as x, i.e. dataset.shape[1:]. Since your dataset has shape (700, 2822), that is 700 samples of size 2822, so your input shape should be (2822,).
Change:
input_layer = Input(shape=(1,))
To:
input_layer = Input(shape=(2822,))
You need to set return_sequences in the lstm_layer_2 to False:
lstm_layer_2 = Bidirectional(LSTM(512, return_sequences=False, recurrent_dropout=0.2, dropout=0.2))(lstm_layer_1)
Otherwise, the timestep dimension of your input is kept in the output, which is why y_pred has the shape (None, 2822, 1). You can also add a Flatten layer prior to your output layer, but I would recommend setting return_sequences=False.
Note that a Dense layer computes the dot product between the inputs and the kernel along the last axis of the inputs.
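As a quick illustration of that last point (a minimal sketch, not from the original post), applying Dense(1) to a 3-D tensor leaves the timestep axis intact, which is exactly why y_pred came out with dimension 2822:
import tensorflow as tf

x = tf.keras.Input(shape=(2822, 1))   # (batch, timesteps, features)
y = tf.keras.layers.Dense(1)(x)       # dot product along the last axis only
print(y.shape)                        # (None, 2822, 1): the timestep axis survives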

Multilayer LSTM without tf.contrib.rnn.MultiRNNCell

To implement a multilayer LSTM network, I usually use the following code:
def lstm_cell():
    return tf.contrib.rnn.LayerNormBasicLSTMCell(model_settings['rnn_size'])
attn_cell = lstm_cell
def attn_cell():
    return tf.contrib.rnn.DropoutWrapper(lstm_cell(), output_keep_prob=0.7)
cell = tf.contrib.rnn.MultiRNNCell([attn_cell() for _ in range(num_layers)], state_is_tuple=True)
outputs_, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
But this way I do not have access to the hidden layers' outputs, in case I want to manipulate how those outputs are arranged.
Is there any other way to make a multilayer LSTM network without using tf.contrib.rnn.MultiRNNCell?
You can simply stack several LSTM layers, for example via the Sequential module:
model = Sequential()
model.add(layers.LSTM(..., return_sequences=True, input_shape=(...)))
model.add(layers.LSTM(..., return_sequences=True))
...
model.add(layers.LSTM(...))
In this case the return_sequences keyword is crucial for the intermediate layers.
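Since the original question was about keeping access to the hidden layers' outputs, here is a minimal functional-API sketch (layer sizes and the feature dimension are illustrative assumptions) in which each intermediate layer's per-timestep output remains available as a tensor you can manipulate:
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(None, 13))             # 13 features; illustrative
h1 = layers.LSTM(128, return_sequences=True)(inputs)  # per-timestep outputs, layer 1
h2 = layers.LSTM(128, return_sequences=True)(h1)      # per-timestep outputs, layer 2
out = layers.LSTM(128)(h2)                            # last output only
model = tf.keras.Model(inputs, [out, h1, h2])         # expose hidden outputs as well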

tensorflow RNN implementation

I'm building an RNN model to do image classification. I used a pipeline to feed in the data. However, it returns
ValueError: Variable rnn/rnn/basic_rnn_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
I wonder what I can do to fix this, since there are not many examples of implementing an RNN with an input pipeline. I know it would work if I used a placeholder, but my data is already in the form of tensors. Unless I can feed the placeholder with tensors, I prefer just to use the pipeline.
def RNN(inputs):
    with tf.variable_scope('cells', reuse=True):
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size)
    with tf.variable_scope('rnn'):
        outputs, states = tf.nn.dynamic_rnn(basic_cell, inputs, dtype=tf.float32)
    fc_drop = tf.nn.dropout(states, keep_prob)
    logits = tf.contrib.layers.fully_connected(fc_drop, batch_size, activation_fn=None)
    return logits

#Training
with tf.name_scope("cost_function") as scope:
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch)))
    train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(cost)

#Accuracy
with tf.name_scope("accuracy") as scope:
    correct_prediction = tf.equal(tf.argmax(RNN(test_image), 1), tf.argmax(test_image_label, 0))
    accuracy = tf.cast(correct_prediction, tf.float32)
You need to use the reuse option correctly. The following changes would solve it. For prediction, you need to reuse the variables that already exist in the graph.
def RNN(inputs, reuse):
    with tf.variable_scope('cells', reuse=reuse):
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size, reuse=reuse)
    ...
...
#Training
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch, reuse=None)))
#Accuracy
...
correct_prediction = tf.equal(tf.argmax(RNN(test_image, reuse=True), 1), tf.argmax(test_image_label, 0))
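To make the reuse mechanism concrete, here is a minimal self-contained sketch of the create-then-reuse pattern (the names build_net, train_x, and test_x are illustrative, not from the answer):
import tensorflow as tf

def build_net(x, reuse):
    # Same scope and variable name on both calls, so reuse=True binds the
    # second call to the variables created by the first.
    with tf.variable_scope('net', reuse=reuse):
        w = tf.get_variable('w', shape=[32, 10])
        return tf.matmul(x, w)

train_x = tf.placeholder(tf.float32, [None, 32])
test_x = tf.placeholder(tf.float32, [None, 32])
train_logits = build_net(train_x, reuse=None)  # first call creates 'net/w'
test_logits = build_net(test_x, reuse=True)    # second call reuses 'net/w'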

Tensorflow: stacked bidirectional LSTMs

I want to stack two LSTMs without using the MultiRNN wrapper. However, the following code results in ValueError: Shapes (3,) and (2,) are not compatible because of inputs=states_fw_1 in the second LSTM. How can I pass the hidden state of the first LSTM as input to the second?
LSTM 1
with tf.name_scope("BiLSTM_1"):
with tf.variable_scope('forward_1'):
cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
with tf.variable_scope('backward_srl'):
cell_bw_srl = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
cell_fw=cell_fw_1,
cell_bw=cell_bw_1,
dtype=tf.float64,
sequence_length=self.input_seq_len,
inputs=self.embedded_input_layer,
scope='BiLSTM_1')
The state is a tuple:
states_fw_1, states_bw_1 = states_1
LSTM 2
with tf.name_scope("BiLSTM_2"):
with tf.variable_scope('forward'):
cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
with tf.variable_scope('backward'):
cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
outputs, states = tf.nn.bidirectional_dynamic_rnn(
cell_fw=cell_fw,
cell_bw=cell_bw,
dtype=tf.float64,
sequence_length=self.input_seq_len,
inputs=states_fw_1,
scope="BiLSTM_extraction")
I've been learning TF for two days (so I'm no pro), but I found this problem interesting to resolve.
Here are my findings:
What you want to do is not possible with the 'LSTMCell' implementation. Here is why:
You want to feed "states_fw_1" to the next Bi-LSTM. So the first question should be: what are the dimensions of "states_fw_1"? Any RNN implementation needs inputs of shape [batch_size, seq_len, input_size], but "states_fw_1" has shape [batch_size, hidden_size] (I have just checked the size of "states_fw_1" by running the code below). So you can see that this tensor does not fit the RNN's input requirements. That is because the model outputs just the last state of the LSTM cell, not all of the history (see the documentation), and you are not interested in the last state alone, because you want to feed the state at each time step to the layer above. "states_fw_1" is useful when you want to classify the whole sequence (not each element in the sequence).
Edit: "states_fw_1" contains the last "hidden_state" and the last "memory_cell". For classification, only the "hidden_state" will be useful, I think.
So you just need to use the merged output (from the forward and backward passes). The merged 'LSTMCell' output has size [batch_size, seq_len, hidden_size*2] (*2 because of forward and backward), so it is right for the next stacked RNN (the output comes from every time step, unlike the state).
Here is the code which I was testing:
import tensorflow as tf
import numpy as np

hidden_size = 21
seq_len = tf.placeholder(tf.int32, [None])
inputs = tf.placeholder(tf.float32, [None, None, 32])

with tf.variable_scope('BiLSTM_1'):
    with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw_1,
        cell_bw=cell_bw_1,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=inputs,
        scope='BiLSTM_1')

# Merge the output tensors from the forward and backward passes.
# The merged size is [batch_size, seq_len, 2*hidden_size].
outputs_1 = tf.concat(outputs_1, 2)

with tf.name_scope("BiLSTM_2"):
    with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw,
        cell_bw=cell_bw,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=outputs_1,
        scope="BiLSTM_2")

# Initialize the variables
init = tf.initialize_all_variables()

batch_size = 5
seq_len_val = 10
train_inputs = np.zeros((batch_size, seq_len_val, 32))
train_seq_len = np.ones(batch_size) * seq_len_val

with tf.Session() as session:
    session.run(init)
    feed = {inputs: train_inputs, seq_len: train_seq_len}
    out, state, state_1 = session.run([outputs, states, states_1], feed)
    print("State size: ", state_1[0].c.shape, " Out size: ", out[0][0].shape)
    print("Batch size: ", batch_size, " Sequence len: ", seq_len_val, " Hidden size: ", hidden_size)
'outputs_1' returned by LSTM 1 is a tuple containing 'outputs_fw' and 'outputs_bw'.
'outputs_fw' and 'outputs_bw' will be of dimension: [batch_size, sequence_length, hidden_size].
You have to concatenate the 'outputs_fw' and 'outputs_bw' hidden states (use tf.concat with axis=2) and pass that as input to LSTM 2, instead of passing 'states_fw_1'.

how to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around for a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers (with 512 neurons in each),
time step (sequence length): 10
Could anyone guide me through building this using TensorFlow (from defining weights and building the input shape to training, predicting, and the use of an optimizer or cost function)? Any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.pack(output, 1)
First you need some placeholders to put your training data in (one batch):
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state; a very good guide is here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.
The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot send into a placeholder. So you need to store it in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you can use the built-in TensorFlow API to create the stacked LSTM layers.
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate logits and then a loss with respect to the y_output placeholder.
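A minimal sketch of that step (my own illustration, assuming a regression target shaped like the y_output placeholder above; W_out, b_out, and train_op are hypothetical names):
# Hypothetical logits/loss step -- not from the original answer.
W_out = tf.get_variable('W_out', [state_size, 1])
b_out = tf.get_variable('b_out', [1])
flat = tf.reshape(outputs, [-1, state_size])         # (batch*steps, state_size)
logits = tf.reshape(tf.matmul(flat, W_out) + b_out,
                    [batch_size, truncated_series_length, 1])
loss = tf.reduce_mean(tf.square(logits - y_output))  # MSE for a regression target
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)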
Then you run each batch with sess.run, using truncated backpropagation (a good explanation is here: http://r2rt.com/styles-of-truncated-backpropagation.html):
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
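Putting the pieces together, a hedged sketch of the feed loop (batches is a hypothetical iterator of (input, target) pairs; train_op comes from the loss sketch above):
# Illustrative training loop -- `batches` and `train_op` are assumptions.
current_state = np.zeros((num_layers, 2, batch_size, state_size))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for batch_in, batch_out in batches:
        _, current_state = sess.run(
            [train_op, state],
            feed_dict={x_input: batch_in,
                       y_output: batch_out,
                       state_placeholder: current_state})
        # The state comes back as nested LSTMStateTuples; convert it to an
        # ndarray so it can be fed into state_placeholder on the next step.
        current_state = np.array(current_state)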
Perhaps it is better to use a library like TFLearn or Keras instead?
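For comparison, a hedged Keras sketch of the same two-layer, 512-unit architecture, reusing the question's parameters (n_steps=10, n_input=13, n_hidden=512, n_classes=13):
import tensorflow as tf

# Sketch only; layer sizes follow the question's parameters.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(512, return_sequences=True, input_shape=(10, 13)),
    tf.keras.layers.LSTM(512),
    tf.keras.layers.Dense(13),  # logits over n_classes
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))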