How to train different LSTMs in the same TensorFlow session?

I would like to train two different LSTMs and make them interact in a dialogue context (i.e. one RNN generates a sequence, which is used as context for the second RNN, which answers, and so on). However, I do not know how to train them separately in TensorFlow (I think I have not fully understood the logic behind TF graphs). When I execute my code, I get the following error:
Variable rnn/basic_lstm_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
The error happens when I create my second RNN. Do you know how to fix this?
My code is the following:
#User LSTM
no_units=100
_seq_user = tf.placeholder(tf.float32, [batch_size, max_length_user, user_inputShapeLen], name='seq')
_seq_length_user = tf.placeholder(tf.int32, [batch_size], name='seq_length')
cell = tf.contrib.rnn.BasicLSTMCell(
    no_units)
output_user, hidden_states_user = tf.nn.dynamic_rnn(
    cell,
    _seq_user,
    dtype=tf.float32,
    sequence_length=_seq_length_user
)
out2_user = tf.reshape(output_user, shape=[-1, no_units])
out2_user = tf.layers.dense(out2_user, user_outputShapeLen)
out_final_user = tf.reshape(out2_user, shape=[-1, max_length_user, user_outputShapeLen])
y_user_ = tf.placeholder(tf.float32, [None, max_length_user, user_outputShapeLen])
softmax_user = tf.nn.softmax(out_final_user, dim=-1)
loss_user = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_user, labels=y_user_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_user)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(user_train_X, user_train_Y, sizes_user_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_user:data_, _seq_length_user:size_, y_user_:target_})
#System LSTM
no_units_system=100
_seq_system = tf.placeholder(tf.float32, [batch_size, max_length_system, system_inputShapeLen], name='seq_')
_seq_length_system = tf.placeholder(tf.int32, [batch_size], name='seq_length_')
cell_system = tf.contrib.rnn.BasicLSTMCell(
    no_units_system)
output_system, hidden_states_system = tf.nn.dynamic_rnn(
    cell_system,
    _seq_system,
    dtype=tf.float32,
    sequence_length=_seq_length_system
)
out2_system = tf.reshape(output_system, shape=[-1, no_units])
out2_system = tf.layers.dense(out2_system, system_outputShapeLen)
out_final_system = tf.reshape(out2_system, shape=[-1, max_length_system, system_outputShapeLen])
y_system_ = tf.placeholder(tf.float32, [None, max_length_system, system_outputShapeLen])
softmax_system = tf.nn.softmax(out_final_system, dim=-1)
loss_system = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_system, labels=y_system_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_system)
for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(system_train_X, system_train_Y, sizes_system_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_system:data_, _seq_length_system:size_, y_system_:target_})

Regarding the variable scope error, try building each network under its own variable scope:
with tf.variable_scope('User_LSTM'):
    ...  # build the user LSTM graph here
with tf.variable_scope('System_LSTM'):
    ...  # build the system LSTM graph here
Also, avoid reusing the same Python name for different objects (e.g. optimizer). The second assignment overwrites the first, which becomes confusing when you inspect the graph in TensorBoard.
By the way, I would recommend training the model end to end rather than training the two networks separately. Try feeding the output tensor of the first LSTM into the second LSTM, with a single loss function and a single optimizer.

In short, to fix the error (Variable rnn/basic_lstm_cell/weights already exists), you need two separate variable scopes, as @J-min mentioned. TensorFlow identifies variables by name, so placing the two sets of variables in different scopes lets it tell them apart.
By "train them separately", I assume you want to define two distinct loss functions and optimize the two LSTM networks with two optimizers, one per loss.
In that case, collect the variables that belong to each scope and pass each list to the corresponding optimizer through var_list, like this:
opt1 = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op1 = opt1.minimize(loss1, var_list=<list of variables from scope 1>)
opt2 = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op2 = opt2.minimize(loss2, var_list=<list of variables from scope 2>)
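To illustrate why separate scopes resolve the name clash, here is a toy pure-Python model of a name-indexed variable store. It is only an analogy: VariableStore, build_lstm and in_scope are hypothetical names invented for this sketch, not TensorFlow APIs. It shows that the same default variable name collides when created twice, but not when each network is built under its own scope prefix, and that filtering by prefix yields the per-network lists you would hand to each optimizer.

```python
from contextlib import contextmanager


class VariableStore:
    """Toy stand-in for a graph's name-indexed variables (hypothetical, not TensorFlow code)."""

    def __init__(self):
        self._vars = {}
        self._prefix = []

    @contextmanager
    def scope(self, name):
        # Push a scope name for the duration of the with-block.
        self._prefix.append(name)
        try:
            yield
        finally:
            self._prefix.pop()

    def create(self, name, value):
        # The full name is the active scope path joined with the local name.
        full_name = "/".join(self._prefix + [name])
        if full_name in self._vars:
            raise ValueError("Variable %s already exists" % full_name)
        self._vars[full_name] = value
        return full_name

    def in_scope(self, scope_name):
        # Names of all variables created under the given top-level scope.
        prefix = scope_name + "/"
        return sorted(n for n in self._vars if n.startswith(prefix))


def build_lstm(store):
    # Both networks ask for the same default variable name.
    return store.create("rnn/basic_lstm_cell/weights", 0.0)


store = VariableStore()
with store.scope("User_LSTM"):
    build_lstm(store)
with store.scope("System_LSTM"):
    build_lstm(store)

user_vars = store.in_scope("User_LSTM")
system_vars = store.in_scope("System_LSTM")
print(user_vars)
print(system_vars)
```

In TensorFlow 1.x itself, the per-scope lists are commonly obtained with tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='User_LSTM') and the equivalent call for the second scope.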

Just pass a different name argument to each cell when you create it. For example:
user_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='user')
system_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='system')
This way, TensorFlow will not try to share variables between the two cells. Then you can get the outputs as:
output_user, hidden_states_user = tf.nn.dynamic_rnn(
    user_cell,
    _seq_user,
    dtype=tf.float32,
    sequence_length=_seq_length_user
)
output_system, hidden_states_system = tf.nn.dynamic_rnn(
    system_cell,
    _seq_system,
    dtype=tf.float32,
    sequence_length=_seq_length_system
)

Related

Provide Tensorflow Seq2Seq output as input at next step (inference)

I would like to create a Seq2Seq model to forecast time series data. I am using the InferenceHelper and I am struggling with the sample_fn parameter. I would like to pass the decoder output of each cell through a dense layer in order to generate a single output at each time step. So I'm providing a function that does this to the sample_fn parameter.
Later on I would like to concatenate the rnn cell outputs with other non-time-series features and build more dense layers on top of it.
The network does fine at training time but not during inference. I think this is caused by the fact that I'm not sharing the same dense layer between training and inference time.
I tried to set the reuse parameter and used a with tf.variable_scope() environment. However, the sample_fn is already called within a specific scope in dynamic_decode and so I fail to use the same scope as I did during training.
The relevant part of my code looks as follows:
The placeholders:
inputs = tf.placeholder(shape=(None, 100, 1), dtype=tf.float32, name='inputs')
input_lengths = tf.placeholder(shape=(None,), dtype=tf.int32, name='input_lengths')
targets = tf.placeholder(shape=(None, 100), dtype=tf.float32, name='targets')
target_lengths = tf.placeholder(shape=(None,), dtype=tf.int32, name='target_lengths')
The encoder:
encoder_cell = tf.nn.rnn_cell.MultiRNNCell([tf.contrib.rnn.GRUCell(num_units=16, name='encoder_cell_0')])
decoder_cell = tf.nn.rnn_cell.MultiRNNCell([tf.contrib.rnn.GRUCell(num_units=16, name='decoder_cell_0')])
_, final_encoder_states = tf.nn.dynamic_rnn(cell=encoder_cell, inputs=inputs,
                                            sequence_length=input_lengths, dtype=tf.float32)
The decoder (training)
start_tokens = tf.fill([tf.shape(inputs)[0]], start_token)
start_tokens = tf.cast(tf.expand_dims(start_tokens, 1), dtype=tf.float32)
targets_as_inputs = tf.concat([start_tokens, targets], axis=1)
targets_as_inputs = tf.reshape(targets_as_inputs, (-1, targets_as_inputs.shape[1], 1))
training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=targets_as_inputs, sequence_length=target_lengths, name='training_helper')
training_decoder = tf.contrib.seq2seq.BasicDecoder(cell=decoder_cell, helper=training_helper, initial_state=final_encoder_states)
train_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder=training_decoder, maximum_iterations=max_target_sequence_length, impute_finished=True)
train_predictions = train_outputs.rnn_output
train_predictions = tf.layers.dense(train_predictions, 1, activation=None, name='output_dense_layer')
The decoder (inference). The incorrect part:
def sample_fn(outputs):
    return tf.layers.dense(outputs, 1, activation=None,
                           name='output_dense_layer', reuse=tf.AUTO_REUSE)
infer_helper = tf.contrib.seq2seq.InferenceHelper(sample_fn=sample_fn, sample_shape=(1),
    sample_dtype=tf.float32, start_inputs=start_tokens, end_fn=lambda sample_ids: False, next_inputs_fn=None)
infer_decoder = tf.contrib.seq2seq.BasicDecoder(cell=decoder_cell, helper=infer_helper, initial_state=final_encoder_states)
infer_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder=infer_decoder, maximum_iterations=max_target_sequence_length, impute_finished=True)
infer_predictions = infer_outputs.rnn_output
infer_predictions = sample_fn(infer_predictions)
There is a similar question: How to use tensorflow seq2seq without embeddings?
The author uses sample_fn=lambda outputs: outputs. But this returns a ValueError in my case because the dimensions don't match. How could they with multiple cells? sample_fn should return a single value.
For now, I have solved my problem by writing my own dynamic_decode function. I copied everything from tf.contrib.seq2seq.dynamic_decode except
with variable_scope.variable_scope(scope, "decoder", reuse=reuse) as varscope:
as well as a related if condition on varscope and another if condition that checks the decoder class.
Not a nice solution, but good enough for now.

Do the operations defined in array ops in Tensorflow have gradient defined?

I want to know whether the TensorFlow operations in this link have a gradient defined. I am asking because I am implementing a custom loss function, and when I run it I always get this error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
This is my custom Loss function:
def calculate_additional_loss(y_true,y_pred):
    #additional loss
    x_decoded_normalized = original_dim* y_pred
    #y_true = K.print_tensor(y_true, message='y_true = ')
    #y_pred = K.print_tensor(y_pred, message='y_pred = ')
    error = tf.constant(0, dtype= tf.float32)
    additional_loss= tf.constant(0, dtype= tf.float32)
    final_loss= tf.constant(0, dtype= tf.float32)
    for k in range(batch_size):
        #add padding
        reshaped_elem_1 = K.reshape(x_decoded_normalized[k], [DIM,DIM])
        a = K.reshape(reshaped_elem_1[:,DIM-1], [DIM,1])
        b = K.reshape(reshaped_elem_1[:,1], [DIM,1])
        reshaped_elem_1 = tf.concat ([b,reshaped_elem_1], axis= 1)
        reshaped_elem_1 = tf.concat ([reshaped_elem_1,a], axis= 1)
        c= K.reshape(reshaped_elem_1[DIM-1,:], [1,DIM+2])
        d= K.reshape(reshaped_elem_1[1,:], [1,DIM+2])
        reshaped_elem_1 = tf.concat ([d,reshaped_elem_1],axis=0)
        reshaped_elem_1 = tf.concat ([reshaped_elem_1,c],axis=0)
        for (i,j) in range(reshaped_elem_1.shape[0],reshaped_elem_1.shape[1]):
            error = tf.add(error, tf.pow((reshaped_elem_1[i,j]-reshaped_elem_1[i,j+1]),-2),
                           tf.pow((reshaped_elem_1[i,j]-reshaped_elem_1[i,j-1]),-2),
                           tf.pow((reshaped_elem_1[i,j]-reshaped_elem_1[i-1,j]),-2),
                           tf.pow((reshaped_elem_1[i,j]-reshaped_elem_1[i+1,j]),-2))
        additional_loss = tf.add(additional_loss, tf.divide(error, original_dim))
    final_loss += tf.divide(additional_loss, batch_size)
    print('final_loss', final_loss)
    return final_loss
And this is where I am calling it:
models = (encoder, decoder)
additional_loss = calculate_additional_loss(inputs,outputs)
vae.add_loss(additional_loss)
vae.compile(optimizer='adam')
vae.summary()
plot_model(vae,to_file='vae_mlp.png',show_shapes=True)
vae.fit(x_train, epochs=epochs, batch_size=batch_size, validation_data=(x_test, None), verbose = 1, callbacks=[CustomMetrics()])
Thank you in advance.
Most ops have a defined gradient. A few do not, and the error message you got lists some examples.
Having said that, I see a couple of mistakes in your code:
final_loss is defined as a tf.constant, but you are trying to increment it.
You are unpacking a tuple (i, j) from range, which only yields single integers.
error is defined as a tf.constant, but you are trying to increment it.
Don't loop over batch_size like this. Use TensorFlow functions that handle the batch dimension directly; the Python loop just multiplies the number of nodes in the graph.
The way the code is written suggests you are treating TensorFlow as plain Python. It is not: you define a graph first and then execute it inside a session. So inside the function, use TF functions only to define the computation.
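As a rough illustration of handling the batch dimension with array operations instead of Python loops, here is a NumPy sketch. It is not the asker's loss: it assumes wrap-around (periodic) neighbours in place of the manual padding, and the function names and the power parameter are hypothetical. The loop version is included only to check that both compute the same per-sample sums.

```python
import numpy as np


def neighbour_penalty(batch, power=-2):
    """Vectorized: batch has shape (B, D, D); returns one value per sample.

    Sums (x - neighbour) ** power over the four wrap-around neighbours
    of every grid cell, without any Python loop over samples or cells.
    """
    total = np.zeros(batch.shape[0])
    for axis in (1, 2):          # rows, then columns
        for shift in (1, -1):    # previous and next neighbour along that axis
            diff = batch - np.roll(batch, shift, axis=axis)
            total += np.sum(diff ** power, axis=(1, 2))
    return total


def neighbour_penalty_loop(batch, power=-2):
    """Reference version with explicit loops, for comparison only."""
    n_samples, dim, _ = batch.shape
    total = np.zeros(n_samples)
    for b in range(n_samples):
        for i in range(dim):
            for j in range(dim):
                x = batch[b, i, j]
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    total[b] += (x - batch[b, ni % dim, nj % dim]) ** power
    return total


rng = np.random.RandomState(0)
batch = rng.normal(size=(3, 4, 4))  # 3 samples of a 4x4 grid (toy data)
print(np.allclose(neighbour_penalty(batch), neighbour_penalty_loop(batch)))
```

The same idea applies to tensor operations in TensorFlow: express the computation over the whole batch at once, so the graph contains a handful of ops rather than one set of nodes per sample and per cell.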

Simple softmax classifier in tensorflow

So I am trying to write a simple softmax classifier in TensorFlow.
Here is the code:
# Neural network parameters
n_hidden_units = 500
n_classes = 10
# training set placeholders
input_X = tf.placeholder(dtype='float32',shape=(None,X_train.shape[1], X_train.shape[2]),name="input_X")
input_y = tf.placeholder(dtype='int32', shape=(None,), name="input_y")
# hidden layer
dim = X_train.shape[1]*X_train.shape[2] # dimension of each traning data point
flatten_X = tf.reshape(input_X, shape=(-1, dim))
weights_hidden_layer = tf.Variable(initial_value=np.zeros((dim,n_hidden_units)), dtype ='float32')
bias_hidden_layer = tf.Variable(initial_value=np.zeros((1,n_hidden_units)), dtype ='float32')
hidden_layer_output = tf.nn.relu(tf.matmul(flatten_X, weights_hidden_layer) + bias_hidden_layer)
# output layer
weights_output_layer = tf.Variable(initial_value=np.zeros((n_hidden_units,n_classes)), dtype ='float32')
bias_output_layer = tf.Variable(initial_value=np.zeros((1,n_classes)), dtype ='float32')
output_logits = tf.matmul(hidden_layer_output, weights_output_layer) + bias_output_layer
predicted_y = tf.nn.softmax(output_logits)
# loss
one_hot_labels = tf.one_hot(input_y, depth=n_classes, axis = -1)
loss = tf.losses.softmax_cross_entropy(one_hot_labels, output_logits)
# optimizer
optimizer = tf.train.MomentumOptimizer(0.01, 0.5).minimize(
loss, var_list=[weights_hidden_layer, bias_hidden_layer, weights_output_layer, bias_output_layer])
This compiles, and I have checked the shapes of all the tensors; they match what I expect.
However, I tried to run the optimizer using the following code:
# running the optimizer
s = tf.InteractiveSession()
s.run(tf.global_variables_initializer())
for i in range(5):
    s.run(optimizer, {input_X: X_train, input_y: y_train})
    loss_i = s.run(loss, {input_X: X_train, input_y: y_train})
    print("loss at iter %i:%.4f" % (i, loss_i))
And the loss kept being the same in all iterations!
I must have messed up something, but I fail to see what.
Any ideas? I also appreciate if somebody leaves comments regarding code style and/or tensorflow tips.
You are initializing your weights with np.zeros, which is the mistake. Use np.random.normal instead. You can choose the standard deviation of this Gaussian distribution based on the number of inputs going into a particular neuron. You can read more about it here.
The reason to initialize from a Gaussian distribution is to break symmetry. If all the weights start at zero, you can work through backpropagation and see that they all receive the same update, so they evolve identically.
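To make the symmetry argument concrete, here is a small NumPy sketch (not the asker's TensorFlow code) of the same kind of two-layer ReLU network with hand-written backpropagation. The function name, the toy data and the layer sizes are assumptions made for illustration. With all-zero parameters the hidden activations are zero, so the gradients reaching both weight matrices vanish and only the output bias can move.

```python
import numpy as np


def gradients(X, y_onehot, W1, b1, W2, b2):
    """Backprop through relu(X W1 + b1) W2 + b2 with mean softmax cross-entropy."""
    pre = X @ W1 + b1
    hidden = np.maximum(pre, 0.0)
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    d_logits = (probs - y_onehot) / X.shape[0]
    d_W2 = hidden.T @ d_logits
    d_b2 = d_logits.sum(axis=0, keepdims=True)
    d_pre = (d_logits @ W2.T) * (pre > 0)  # ReLU passes gradient only where pre > 0
    d_W1 = X.T @ d_pre
    d_b1 = d_pre.sum(axis=0, keepdims=True)
    return d_W1, d_b1, d_W2, d_b2


rng = np.random.RandomState(0)
n_in, n_hidden, n_classes = 6, 5, 3  # toy sizes, chosen arbitrarily
X = rng.normal(size=(8, n_in))
y_onehot = np.eye(n_classes)[rng.randint(0, n_classes, size=8)]

# Case 1: everything initialized to zero, as in the question.
zero_grads = gradients(
    X, y_onehot,
    np.zeros((n_in, n_hidden)), np.zeros((1, n_hidden)),
    np.zeros((n_hidden, n_classes)), np.zeros((1, n_classes)))

# Case 2: small random weights, zero biases.
rand_grads = gradients(
    X, y_onehot,
    rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros((1, n_hidden)),
    rng.normal(scale=0.1, size=(n_hidden, n_classes)), np.zeros((1, n_classes)))

print("zero init, max |dW1|:", np.abs(zero_grads[0]).max())
print("random init, max |dW1|:", np.abs(rand_grads[0]).max())
```

Because the weight gradients stay at zero after every step, the zero-initialized network can never learn anything beyond the class frequencies, which is consistent with a loss that barely changes.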
One could visualize the weight histogram using TensorBoard to make it easier. I executed your code for this. A few more lines are needed to set up Tensorboard logging but the histogram summary of weights can be easily added.
Initialized to zeros
weights_hidden_layer = tf.Variable(initial_value=np.zeros((784,n_hidden_units)), dtype ='float32')
tf.summary.histogram("weights_hidden_layer",weights_hidden_layer)
Xavier initialization
initializer = tf.contrib.layers.xavier_initializer()
weights_hidden_layer = tf.Variable(initializer(shape=(784,n_hidden_units)), dtype ='float32')
tf.summary.histogram("weights_hidden_layer",weights_hidden_layer)

tensorflow RNN implementation

I'm building a RNN model to do the image classification. I used a pipeline to feed in the data. However it returns
ValueError: Variable rnn/rnn/basic_rnn_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
I wonder what can I do to fix this since there are not many examples of implementing RNN with an input pipeline. I know it would work if I use the placeholder, but my data is already in the form of tensors. Unless I can feed the placeholder with tensors, I prefer just to use the pipeline.
def RNN(inputs):
    with tf.variable_scope('cells', reuse=True):
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size)
    with tf.variable_scope('rnn'):
        outputs, states = tf.nn.dynamic_rnn(basic_cell, inputs, dtype=tf.float32)
    fc_drop = tf.nn.dropout(states, keep_prob)
    logits = tf.contrib.layers.fully_connected(fc_drop, batch_size, activation_fn=None)
    return logits
#Training
with tf.name_scope("cost_function") as scope:
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch)))
    train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(cost)
#Accuracy
with tf.name_scope("accuracy") as scope:
    correct_prediction = tf.equal(tf.argmax(RNN(test_image), 1), tf.argmax(test_image_label, 0))
    accuracy = tf.cast(correct_prediction, tf.float32)
You need to use the reuse option correctly. The following changes should solve it: the variables are created on the first call (training), and the prediction call must reuse the variables that already exist in the graph.
def RNN(inputs, reuse):
    with tf.variable_scope('cells', reuse=reuse):
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size, reuse=reuse)
    ...
    ...
#Training
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch, reuse=None)))
#Accuracy
...
correct_prediction = tf.equal(tf.argmax(RNN(test_image, reuse=True), 1), tf.argmax(test_image_label, 0))

TensorFlow 1.2 How to Setup Time Series Prediction at Inference Time Using Seq2Seq

I am trying to study the tf.contrib.seq2seq section of the TensorFlow library using a toy model. Currently, my graph is as follows:
tf.reset_default_graph()
# Placeholders
enc_inp = tf.placeholder(tf.float32, [None, n_steps, n_input])
expect = tf.placeholder(tf.float32, [None, n_steps, n_output])
expect_length = tf.placeholder(tf.int32, [None])
keep_prob = tf.placeholder(tf.float32, [])
# Encoder
cells = [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(n_hidden), output_keep_prob=keep_prob) for i in range(layers_stacked_count)]
cell = tf.contrib.rnn.MultiRNNCell(cells)
encoded_outputs, encoded_states = tf.nn.dynamic_rnn(cell, enc_inp, dtype=tf.float32)
# Decoder
de_cells = [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(n_hidden), output_keep_prob=keep_prob) for i in range(layers_stacked_count)]
de_cell = tf.contrib.rnn.MultiRNNCell(de_cells)
training_helper = tf.contrib.seq2seq.TrainingHelper(expect, expect_length)
decoder = tf.contrib.seq2seq.BasicDecoder(cell=de_cell, helper=training_helper, initial_state=encoded_states)
final_outputs, final_state, final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder)
decoder_logits = final_outputs.rnn_output
h = tf.contrib.layers.fully_connected(decoder_logits, n_output)
diff = tf.squared_difference(h, expect)
batch_loss = tf.reduce_sum(diff, axis=1)
loss = tf.reduce_mean(batch_loss)
optimiser = tf.train.AdamOptimizer(1e-3)
training_op = optimiser.minimize(loss)
The graph trains very well and executes fine. However, I am not sure what to do at inference time, since this graph always requires the expect variable (the value which I am trying to predict).
As I understand, the TrainingHelper function is using the ground truth as input, so what I need is another helper function at inference time.
Most implementations of seq2seq models I've seen appear to be outdated (tf.contrib.legacy_seq2seq). Some of the most up-to-date models use GreedyEmbeddingHelper, which I'm not sure is appropriate for continuous time series predictions.
Another possible solution I've found is to use CustomHelper. However, there is little material out there for me to learn from, and I've just kept banging my head against the wall.
If I am trying to implement a seq2seq model for time series prediction, what should I do at inference time?
Any help or advice would be greatly appreciated. Thanks in advance!
You are right that you need a different helper for inference, but you also need to share the weights between training and inference.
You can do this with tf.variable_scope():
with tf.variable_scope("decode"):
    training_helper = ...
with tf.variable_scope("decode", reuse=True):
    inference_helper = ...
For a more complete example, see one of these two examples:
https://github.com/pplantinga/tensorflow-examples/blob/master/TensorFlow%201.2%20seq2seq%20example.ipynb
https://github.com/udacity/deep-learning/blob/master/seq2seq/sequence_to_sequence_implementation.ipynb