Provide Tensorflow Seq2Seq output as input at next step (inference)

I would like to create a Seq2Seq model to forecast time series data. I am using the InferenceHelper and I am struggling with the sample_fn parameter. I would like to pass the decoder output of each cell through a dense layer in order to generate a single output at each time step. So I'm providing a function that does this to the sample_fn parameter.
Later on I would like to concatenate the rnn cell outputs with other non-time-series features and build more dense layers on top of it.
The network does fine at training time but not during inference. I think this is caused by the fact that I'm not sharing the same dense layer between training and inference time.
I tried to set the reuse parameter and used a with tf.variable_scope() environment. However, the sample_fn is already called within a specific scope in dynamic_decode and so I fail to use the same scope as I did during training.
The relevant part of my code looks as follows:
The placeholders:
inputs = tf.placeholder(shape=(None, 100, 1), dtype=tf.float32, name='inputs')
input_lengths = tf.placeholder(shape=(None,), dtype=tf.int32, name='input_lengths')
targets = tf.placeholder(shape=(None, 100), dtype=tf.float32, name='targets')
target_lengths = tf.placeholder(shape=(None,), dtype=tf.int32, name='target_lengths')
The encoder:
encoder_cell = tf.nn.rnn_cell.MultiRNNCell([tf.contrib.rnn.GRUCell(num_units=16, name='encoder_cell_0')])
decoder_cell = tf.nn.rnn_cell.MultiRNNCell([tf.contrib.rnn.GRUCell(num_units=16, name='decoder_cell_0')])
_, final_encoder_states = tf.nn.dynamic_rnn(cell=encoder_cell, inputs=inputs,
                                            sequence_length=input_lengths, dtype=tf.float32)
The decoder (training):
start_tokens = tf.fill([tf.shape(inputs)[0]], start_token)
start_tokens = tf.cast(tf.expand_dims(start_tokens, 1), dtype=tf.float32)
targets_as_inputs = tf.concat([start_tokens, targets], axis=1)
targets_as_inputs = tf.reshape(targets_as_inputs, (-1, targets_as_inputs.shape[1], 1))
training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=targets_as_inputs, sequence_length=target_lengths, name='training_helper')
training_decoder = tf.contrib.seq2seq.BasicDecoder(cell=decoder_cell, helper=training_helper, initial_state=final_encoder_states)
train_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder=training_decoder, maximum_iterations=max_target_sequence_length, impute_finished=True)
train_predictions = train_outputs.rnn_output
train_predictions = tf.layers.dense(train_predictions, 1, activation=None, name='output_dense_layer')
The decoder (inference). The incorrect part:
def sample_fn(outputs):
    return tf.layers.dense(outputs, 1, activation=None,
                           name='output_dense_layer', reuse=tf.AUTO_REUSE)
infer_helper = tf.contrib.seq2seq.InferenceHelper(sample_fn=sample_fn, sample_shape=(1),
sample_dtype=tf.float32, start_inputs=start_tokens, end_fn=lambda sample_ids: False, next_inputs_fn=None)
infer_decoder = tf.contrib.seq2seq.BasicDecoder(cell=decoder_cell, helper=infer_helper, initial_state=final_encoder_states)
infer_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder=infer_decoder, maximum_iterations=max_target_sequence_length, impute_finished=True)
infer_predictions = infer_outputs.rnn_output
infer_predictions = sample_fn(infer_predictions)
There is a similar question: How to use tensorflow seq2seq without embeddings?
The author uses sample_fn=lambda outputs: outputs. But this returns a ValueError in my case because the dimensions don't match. How could they with multiple cells? sample_fn should return a single value.

For now, I have solved my problem by creating my own dynamic_decode function. I copied everything from tf.contrib.seq2seq.dynamic_decode except the line
with variable_scope.variable_scope(scope, "decoder", reuse=reuse) as varscope:
along with the related if condition that uses varscope and the if condition that checks the decoder class.
Not a nice solution but good enough for now.
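An alternative that should achieve the same weight sharing, sketched here as an untested idea (TF 1.x, tf.layers), is to create a single tf.layers.Dense object and call that same object both on the training outputs and inside sample_fn. A layer object reuses its own variables on every call, regardless of the scope dynamic_decode wraps around the helper.

# Sketch only: 'output_layer' is a name introduced here, the rest follows the code above.
output_layer = tf.layers.Dense(1, activation=None, name='output_dense_layer')

# Training: apply the layer object to the decoder outputs.
train_predictions = output_layer(train_outputs.rnn_output)

# Inference: the same object inside sample_fn, so the same kernel/bias are reused.
def sample_fn(outputs):
    return output_layer(outputs)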

Related

How do you fit a tf.Dataset to a Keras Autoencoder Model when the Dataset has been generated using TFX?

Problem
As the title suggests I have been trying to create a pipeline for training an Autoencoder model using TFX. The problem I'm having is fitting the tf.Dataset returned by the DataAccessor.tf_dataset_factory object to the Autoencoder.
Below I summarise the steps I've taken through this project, and have some Questions at the bottom if you wish to skip the background information.
Intro
TFX Pipeline
The TFX components I have used so far have been:
CsvExampleGenerator (the dataset has 82 columns, all numeric, and the sample csv has 739 rows)
StatisticsGenerator / SchemaGenerator; the schema has been edited and is now loaded in using an Importer
Transform
Trainer (this is the component I am currently having problems with)
Model
The model that I am attempting to train is based on the example laid out here: https://www.tensorflow.org/tutorials/generative/autoencoder. However, my model is being trained on tabular data and searches for anomalous results, as opposed to image data.
While trying out a couple of solutions I have used both the Keras.layers and Keras.Model formats for defining the model, and I outline both below:
Subclassing Keras.Model
class Autoencoder(keras.models.Model):
    def __init__(self, features):
        super(Autoencoder, self).__init__()
        self.encoder = tf.keras.Sequential([
            keras.layers.Dense(82, activation = 'relu'),
            keras.layers.Dense(32, activation = 'relu'),
            keras.layers.Dense(16, activation = 'relu'),
            keras.layers.Dense(8, activation = 'relu')
        ])
        self.decoder = tf.keras.Sequential([
            keras.layers.Dense(16, activation = 'relu'),
            keras.layers.Dense(32, activation = 'relu'),
            keras.layers.Dense(len(features), activation = 'sigmoid')
        ])

    def call(self, x):
        inputs = [keras.layers.Input(shape = (1,), name = f) for f in features]
        dense = keras.layers.concatenate(inputs)
        encoded = self.encoder(dense)
        decoded = self.decoder(encoded)
        return decoded
Using Keras.layers (functional API)
def _build_keras_model(features: List[str]) -> tf.keras.Model:
    inputs = [keras.layers.Input(shape = (1,), name = f) for f in features]
    dense = keras.layers.concatenate(inputs)
    dense = keras.layers.Dense(32, activation = 'relu')(dense)
    dense = keras.layers.Dense(16, activation = 'relu')(dense)
    dense = keras.layers.Dense(8, activation = 'relu')(dense)
    dense = keras.layers.Dense(16, activation = 'relu')(dense)
    dense = keras.layers.Dense(32, activation = 'relu')(dense)
    outputs = keras.layers.Dense(len(features), activation = 'sigmoid')(dense)
    model = keras.Model(inputs = inputs, outputs = outputs)
    model.compile(
        optimizer = 'adam',
        loss = 'mae'
    )
    return model
TFX Trainer Component
For creating the Trainer Component I have been mainly following the implementation details laid out here: https://www.tensorflow.org/tfx/guide/trainer
As well as following the default penguins example: https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple#write_model_training_code
run_fn definition
def run_fn(fn_args: tfx.components.FnArgs) -> None:
    tft_output = tft.TFTransformOutput(fn_args.transform_output)
    train_dataset = _input_fn(
        file_pattern = fn_args.train_files,
        data_accessor = fn_args.data_accessor,
        tf_transform_output = tft_output,
        batch_size = fn_args.train_steps
    )
    eval_dataset = _input_fn(
        file_pattern = fn_args.eval_files,
        data_accessor = fn_args.data_accessor,
        tf_transform_output = tft_output,
        batch_size = fn_args.custom_config['eval_batch_size']
    )
    # model = Autoencoder(
    #     features = fn_args.custom_config['features']
    # )
    model = _build_keras_model(features = fn_args.custom_config['features'])
    model.compile(optimizer = 'adam', loss = 'mse')
    model.fit(
        train_dataset,
        steps_per_epoch = fn_args.train_steps,
        validation_data = eval_dataset,
        validation_steps = fn_args.eval_steps
    )
    ...
_input_fn definition
def _apply_preprocessing(raw_features, tft_layer):
    transformed_features = tft_layer(raw_features)
    return transformed_features

def _input_fn(
    file_pattern,
    data_accessor: tfx.components.DataAccessor,
    tf_transform_output: tft.TFTransformOutput,
    batch_size: int) -> tf.data.Dataset:
    """
    Generates features and label for tuning/training.
    Args:
      file_pattern: List of paths or patterns of input tfrecord files.
      data_accessor: DataAccessor for converting input to RecordBatch.
      tf_transform_output: A TFTransformOutput.
      batch_size: representing the number of consecutive elements of returned
        dataset to combine in a single batch
    Returns:
      A dataset that contains features where features is a
      dictionary of Tensors.
    """
    dataset = data_accessor.tf_dataset_factory(
        file_pattern,
        tfxio.TensorFlowDatasetOptions(batch_size = batch_size),
        tf_transform_output.transformed_metadata.schema
    )
    transform_layer = tf_transform_output.transform_features_layer()
    def apply_transform(raw_features):
        return _apply_preprocessing(raw_features, transform_layer)
    return dataset.map(apply_transform).repeat()
This differs from the _input_fn example given above as I was following the example in the next tfx tutorial found here: https://www.tensorflow.org/tfx/tutorials/tfx/penguin_tft#run_fn
Also for reference, there is no Target within the example data so there is no label_key to be passed to the tfxio.TensorFlowDatasetOptions object.
Error
When trying to run the Trainer component using a TFX InteractiveContext object I receive the following error.
ValueError: No gradients provided for any variable: ['dense_460/kernel:0', 'dense_460/bias:0', 'dense_461/kernel:0', 'dense_461/bias:0', 'dense_462/kernel:0', 'dense_462/bias:0', 'dense_463/kernel:0', 'dense_463/bias:0', 'dense_464/kernel:0', 'dense_464/bias:0', 'dense_465/kernel:0', 'dense_465/bias:0'].
From my own attempts to solve this I believe the problem lies in the way that an Autoencoder is trained. From the Autoencoder example linked here https://www.tensorflow.org/tutorials/generative/autoencoder the data is fitted like so:
autoencoder.fit(x_train, x_train,
epochs=10,
shuffle=True,
validation_data=(x_test, x_test))
Therefore it stands to reason that the tf.Dataset should mimic this behaviour. When testing with plain Tensor objects I was able to recreate the error above, and then solve it by setting the target to be the same as the training data in the .fit() function.
Things I've Tried So Far
Duplicating Train Dataset
model.fit(
train_dataset,
train_dataset,
steps_per_epoch = fn_args.train_steps,
validation_data = eval_dataset,
validation_steps = fn_args.eval_steps
)
Raises error due to Keras not accepting a 'y' value when a dataset is passed.
ValueError: `y` argument is not supported when using dataset as input.
Returning a dataset that is a tuple with itself
def _input_fn(...
    dataset = data_accessor.tf_dataset_factory(
        file_pattern,
        tfxio.TensorFlowDatasetOptions(batch_size = batch_size),
        tf_transform_output.transformed_metadata.schema
    )
    transform_layer = tf_transform_output.transform_features_layer()
    def apply_transform(raw_features):
        return _apply_preprocessing(raw_features, transform_layer)
    dataset = dataset.map(apply_transform)
    return dataset.map(lambda x: (x, x))
This raises an error where the keys from the features dictionary don't match the output of the model.
ValueError: Found unexpected keys that do not correspond to any Model output: dict_keys(['feature_string', ...]). Expected: ['dense_477']
At this point I switched to using the Keras.Model Autoencoder subclass and tried to add output keys to the model, building the outputs dynamically in the same way as the inputs.
def call(self, x):
    inputs = [keras.layers.Input(shape = (1,), name = f) for f in x]
    dense = keras.layers.concatenate(inputs)
    encoded = self.encoder(dense)
    decoded = self.decoder(encoded)
    outputs = {}
    for feature_name in x:
        outputs[feature_name] = keras.layers.Dense(1, activation = 'sigmoid')(decoded)
    return outputs
This raises the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
I've been looking into solving this issue but am no longer sure if the data is being passed correctly and am beginning to think I'm getting side-tracked from the actual problem.
Questions
Has anyone managed to get an Autoencoder working when connected via TFX examples?
Did you alter the tf.Dataset or handle the examples in a different way from the _input_fn demonstrated?
So I managed to find an answer to this and wanted to leave what I found here in case anyone else stumbles onto a similar problem.
It turns out my feelings around the error were correct and the solution did indeed lie in how the tf.Dataset object was presented.
This can be demonstrated with some code I ran that simulates the incoming data using randomly generated tensors.
tensors = [tf.random.uniform(shape = (1, 82)) for i in range(739)]
# This gives us a list of 739 tensors which hold 1 value for 82 'features', simulating the dataset I had
dataset = tf.data.Dataset.from_tensor_slices(tensors)
dataset = dataset.map(lambda x : (x, x))
# This returns a dataset which marks the training set and target as the same,
# which is what the Autoencoder model is looking for
model.fit(dataset ...)
Following this I proceeded to do the same thing with the dataset returned by the _input_fn. However, since the TFX DataAccessor object returns a features dict, I needed to combine the tensors in that dict into a single target tensor.
This is how my _input_fn looks now:
def create_target_values(features_dict: Dict[str, tf.Tensor]) -> tuple:
    value_tensor = tf.concat(list(features_dict.values()), axis = 1)
    return (features_dict, value_tensor)

def _input_fn(
    file_pattern,
    data_accessor: tfx.components.DataAccessor,
    tf_transform_output: tft.TFTransformOutput,
    batch_size: int) -> tf.data.Dataset:
    """
    Generates features and label for tuning/training.
    Args:
      file_pattern: List of paths or patterns of input tfrecord files.
      data_accessor: DataAccessor for converting input to RecordBatch.
      tf_transform_output: A TFTransformOutput.
      batch_size: representing the number of consecutive elements of returned
        dataset to combine in a single batch
    Returns:
      A dataset that contains (features, target_tensor) tuples where features is a
      dictionary of Tensors, and target_tensor is a single Tensor that is a
      concatenation of all the feature values.
    """
    dataset = data_accessor.tf_dataset_factory(
        file_pattern,
        tfxio.TensorFlowDatasetOptions(batch_size = batch_size),
        tf_transform_output.transformed_metadata.schema
    )
    dataset = dataset.map(lambda x: create_target_values(features_dict = x))
    return dataset.repeat()
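For illustration, here is a minimal standalone sketch (outside the TFX pipeline, with made-up feature names) of the shape the model now receives: a dict of per-feature tensors as input and the concatenated tensor as target, which lines up with the functional model's single output of len(features) units.

import tensorflow as tf
from tensorflow import keras

# Hypothetical feature names standing in for the 82 real columns.
features = ['feature_a', 'feature_b', 'feature_c']

# Simulate the per-feature dict the DataAccessor / Transform layer would yield.
feature_dict = {f: tf.random.uniform(shape = (16, 1)) for f in features}
dataset = tf.data.Dataset.from_tensor_slices(feature_dict).batch(4)
dataset = dataset.map(lambda x: create_target_values(features_dict = x))

model = _build_keras_model(features)  # the functional model defined above
model.fit(dataset, epochs = 1)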

Why does my model learn with Ragged Tensors but not Dense Tensors?

I have strings of letters that follow a "grammar." I also have boolean labels on my training set indicating whether each string follows "the grammar" or not. Basically, my model is trying to learn to determine if a string of letters follows the rules. It's a fairly simple problem (I got it out of a textbook).
I am generating my dataset like this:
def generate_dataset(size):
    good_strings = [string_to_ids(generate_string(embedded_reber_grammar))
                    for _ in range(size // 2)]
    bad_strings = [string_to_ids(generate_corrupted_string(embedded_reber_grammar))
                   for _ in range(size - size // 2)]
    all_strings = good_strings + bad_strings
    X = tf.ragged.constant(all_strings, ragged_rank=1)
    # X = X.to_tensor(default_value=0)
    y = np.array([[1.] for _ in range(len(good_strings))] +
                 [[0.] for _ in range(len(bad_strings))])
    return X, y
Notice the line X = X.to_tensor(default_value=0). If this line is commented out, my model learns just fine. However, if it is not commented out, it fails to learn and the validation set performs the same as chance (50-50).
Here is my actual model:
np.random.seed(42)
tf.random.set_seed(42)
embedding_size = 5
model = keras.models.Sequential([
    keras.layers.InputLayer(input_shape=[None], dtype=tf.int32, ragged=True),
    keras.layers.Embedding(input_dim=len(POSSIBLE_CHARS) + 1, output_dim=embedding_size),
    keras.layers.GRU(30),
    keras.layers.Dense(1, activation="sigmoid")
])
optimizer = keras.optimizers.SGD(lr=0.02, momentum = 0.95, nesterov=True)
model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=5, validation_data=(X_valid, y_valid))
I am using 0 as the default value for the dense tensors. The string_to_ids function doesn't use 0 for any of the values but instead starts at 1. Also, when I switch to using a dense tensor I change ragged=True to False. I have no idea why using a dense tensor causes the model to fail, as I've used dense tensors before in similar exercises.
For additional details, see the solution from the book (exercise 8) or my own colab notebook.
So it turns out the answer was that the shape of the dense tensor differed between the training set and the validation set, because the longest sequence differed in length between the two sets (and the same for the test set).
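One way to avoid that, sketched here as a suggestion rather than the book's solution: pad every split to the same fixed length so the dense tensors agree in shape, and set mask_zero=True on the Embedding so the padding value 0 is ignored downstream. MAX_LEN is an assumed constant, not from the original code.

from tensorflow import keras

MAX_LEN = 50  # must be at least as long as the longest string in train/valid/test

def to_dense(all_strings):
    # pad_sequences pads with 0, which string_to_ids never produces
    return keras.preprocessing.sequence.pad_sequences(all_strings, maxlen=MAX_LEN, padding='post')

model = keras.models.Sequential([
    keras.layers.Embedding(input_dim=len(POSSIBLE_CHARS) + 1, output_dim=embedding_size,
                           mask_zero=True),  # downstream layers ignore the padded zeros
    keras.layers.GRU(30),
    keras.layers.Dense(1, activation="sigmoid")
])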

tf.keras.backend.function for transforming embeddings inside tf.data.dataset

I am trying to use the output of a neural network to transform data inside tf.data.dataset. Specifically, I am using a Delta-Encoder to manipulate embeddings inside the tf.data pipeline. In so doing, however, I get the following error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
I have searched the dataset pipeline page and stack overflow, but I could not find something that addresses my question. In the code below I am using an Autoencoder, as it yields an identical error with more concise code.
The offending part seems to be
[[x,]] = tf.py_function(Auto_Func, [x], [tf.float32])
inside
tf_auto_transform.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense

num_embeddings = 100
input_dims = 1000
embeddings = np.random.normal(size = (num_embeddings, input_dims)).astype(np.float32)
target = np.zeros(num_embeddings)
#creating Autoencoder
inp = Input(shape = (input_dims,), name ='input')
hidden = Dense(10, activation = 'relu', name = 'hidden')(inp)
out = Dense(input_dims, activation = 'relu', name='output')(hidden)
auto_encoder = tf.keras.models.Model(inputs =inp, outputs=out)
Auto_Func = tf.keras.backend.function(inputs = auto_encoder.get_layer(name='input').input,
                                      outputs = auto_encoder.get_layer(name='output').input)
#Autoencoder transform for dataset.map
def tf_auto_transform(x, target):
    x_shape = x.shape
    # @tf.function
    # def func(x):
    #     return tf.py_function(Auto_Func, [x], [tf.float32])
    # [[x,]] = func(x)
    [[x,]] = tf.py_function(Auto_Func, [x], [tf.float32])
    x.set_shape(x_shape)
    return x, target
def get_dataset(X, y, batch_size = 32):
    train_ds = tf.data.Dataset.from_tensor_slices((X, y))
    train_ds = train_ds.map(tf_auto_transform)
    train_ds = train_ds.batch(batch_size)
    return train_ds
dataset = get_dataset(embeddings, target, 2)
The above code yields the following error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
I tried to eliminate the error by running the commented out section of the tf_auto_transform function, but the error persisted.
SideNote: While it is true that the Delta encoder paper has code, it is written in tf 1.x. I am trying to use tf 2.x with the tf functional API instead. Thank you for your help!
At the risk of outing myself as a n00b, the answer is to switch the order of the map and batch functions. I am trying to apply a neural network to make some changes to the data. tf.keras models take batches as input, not individual samples. By batching the data first, I can run batches through my nn.
def get_dataset(X, y, batch_size = 32):
    train_ds = tf.data.Dataset.from_tensor_slices((X, y))
    # The changed order
    train_ds = train_ds.batch(batch_size)
    train_ds = train_ds.map(tf_auto_transform)
    return train_ds
It really is that simple.
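As a quick sanity check (a sketch using the toy names defined above), iterating over one element of the corrected pipeline should now yield whole transformed batches rather than individual samples:

# Hedged usage sketch: each element is now a full batch of transformed embeddings.
dataset = get_dataset(embeddings, target, batch_size = 2)
for x_batch, y_batch in dataset.take(1):
    print(x_batch.shape)  # expected to be (2, 1000) for the toy data above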

How to backprop through a model that predicts the weights for another in Tensorflow

I am currently trying to train a model (hypernetwork) that can predict the weights for another model (main network) such that the main network's cross-entropy loss decreases. However, when I use tf.assign to assign the new weights to the network it does not allow backpropagation into the hypernetwork, thus rendering the system non-differentiable. I have tested whether my weights are properly updated and they seem to be, since subtracting the initial weights from the updated ones gives a non-zero sum.
This is a minimal sample of what I am trying to achieve.
import numpy as np
import tensorflow as tf
from tensorflow.contrib.layers import softmax
def random_addition(variables):
    addition_update_ops = []
    for variable in variables:
        update = tf.assign(variable, variable + tf.random_normal(shape=variable.get_shape()))
        addition_update_ops.append(update)
    return addition_update_ops

def network_predicted_addition(variables, network_preds):
    addition_update_ops = []
    for idx, variable in enumerate(variables):
        if idx == 0:
            print(variable)
            update = tf.assign(variable, variable + network_preds[idx])
            addition_update_ops.append(update)
    return addition_update_ops

def dense_weight_update_net(inputs, reuse):
    with tf.variable_scope("weight_net", reuse=reuse):
        output = tf.layers.conv2d(inputs=inputs, kernel_size=(3, 3), filters=16, strides=(1, 1),
                                  activation=tf.nn.leaky_relu, name="conv_layer_0", padding="SAME")
        output = tf.reduce_mean(output, axis=[0, 1, 2])
        output = tf.reshape(output, shape=(1, output.get_shape()[0]))
        output = tf.layers.dense(output, units=(16*3*3*3))
        output = tf.reshape(output, shape=(3, 3, 3, 16))
    return output

def conv_net(inputs, reuse):
    with tf.variable_scope("conv_net", reuse=reuse):
        output = tf.layers.conv2d(inputs=inputs, kernel_size=(3, 3), filters=16, strides=(1, 1),
                                  activation=tf.nn.leaky_relu, name="conv_layer_0", padding="SAME")
        output = tf.reduce_mean(output, axis=[1, 2])
        output = tf.layers.dense(output, units=2)
        output = softmax(output)
    return output
input_x_0 = tf.zeros(shape=(32, 32, 32, 3))
target_y_0 = tf.zeros(shape=(32), dtype=tf.int32)
input_x_1 = tf.ones(shape=(32, 32, 32, 3))
target_y_1 = tf.ones(shape=(32), dtype=tf.int32)
input_x = tf.concat([input_x_0, input_x_1], axis=0)
target_y = tf.concat([target_y_0, target_y_1], axis=0)
output_0 = conv_net(inputs=input_x, reuse=False)
target_y = tf.one_hot(target_y, 2)
crossentropy_loss_0 = tf.losses.softmax_cross_entropy(onehot_labels=target_y, logits=output_0)
conv_net_parameters = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="conv_net")
weight_net_parameters = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="weight_net")
print(conv_net_parameters)
weight_updates = dense_weight_update_net(inputs=input_x, reuse=False)
#updates_0 = random_addition(conv_net_parameters)
updates_1 = network_predicted_addition(conv_net_parameters, network_preds=[weight_updates])
with tf.control_dependencies(updates_1):
    output_1 = conv_net(inputs=input_x, reuse=True)

crossentropy_loss_1 = tf.losses.softmax_cross_entropy(onehot_labels=target_y, logits=output_1)
check_sum = tf.reduce_sum(tf.abs(output_0 - output_1))
c_opt = tf.train.AdamOptimizer(beta1=0.9, learning_rate=0.001)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # Needed for correct batch norm usage
with tf.control_dependencies(update_ops):  # Needed for correct batch norm usage
    train_variables = weight_net_parameters  # + conv_net_parameters
    c_error_opt_op = c_opt.minimize(crossentropy_loss_1,
                                    var_list=train_variables,
                                    colocate_gradients_with_ops=True)
init=tf.global_variables_initializer()
with tf.Session() as sess:
    init = sess.run(init)
    loss_list_0 = []
    loss_list_1 = []
    for i in range(1000):
        _, checksum, crossentropy_0, crossentropy_1 = sess.run([c_error_opt_op, check_sum, crossentropy_loss_0,
                                                                crossentropy_loss_1])
        loss_list_0.append(crossentropy_0)
        loss_list_1.append(crossentropy_1)
        print(checksum, np.mean(loss_list_0), np.mean(loss_list_1))
Does anyone know how I can get tensorflow to compute the gradients for this? Thank you.
In this case your weights aren't variables, they are computed tensors based on the hypernetwork. All you really have is one network during training. If I understand you correctly you are then proposing to discard the hypernetwork and be able to use just the main network to perform predictions.
If this is the case then you can either save the weight values manually and reload them as constants, or you could use tf.cond and tf.assign to assign them as you are doing during training, but use tf.cond to choose to use the variable or the computed tensor depending on whether you're doing training or inference.
During training you will need to use the computed tensor from the hypernetwork in order to enable backprop.
Example from the comments: w is the weight you'll use. You can assign a variable during training to keep track of it, but then use tf.cond to either use the variable (during inference) or the computed value from the hypernetwork (during training). In this example you need to pass in a boolean placeholder is_training_placeholder to indicate whether you're running training or inference.
tf.assign(w_variable, w_from_hypernetwork)
w = tf.cond(is_training_placeholder, true_fn=lambda: w_from_hypernetwork, false_fn=lambda: w_variable)
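To make the backprop point concrete, here is a hedged sketch in the same TF 1.x style. The names w_variable, w_from_hypernetwork and is_training_placeholder come from the example above; the kernel shape and the direct tf.nn.conv2d call are illustrative assumptions, not the asker's code. The main network consumes the hypernetwork's output tensor directly, so gradients flow back into the hypernetwork, while the stored variable is only read at inference time.

is_training_placeholder = tf.placeholder(tf.bool, shape=(), name='is_training')

w_variable = tf.get_variable('main_conv_kernel', shape=(3, 3, 3, 16), trainable=False)
# assign_op must be run explicitly (or via control dependencies) whenever the
# stored copy should be refreshed from the hypernetwork's latest prediction.
assign_op = tf.assign(w_variable, w_from_hypernetwork)

w = tf.cond(is_training_placeholder,
            true_fn=lambda: w_from_hypernetwork,   # differentiable path into the hypernetwork
            false_fn=lambda: w_variable)           # frozen copy used at inference

# Use the selected kernel directly instead of a variable created by tf.layers.conv2d.
output = tf.nn.conv2d(input_x, w, strides=[1, 1, 1, 1], padding='SAME')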

How to train different LSTM on the same tensorflow session?

I would like to train two different LSTMs to make them interact in a dialogue context (i.e. one rnn generates a sequence, which will be used as a context for the second rnn, which will answer, etc.). However, I do not know how to train them separately in tensorflow (I think that I did not fully understand the logic behind tf graphs). When I execute my code, I get the following error:
Variable rnn/basic_lstm_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
The error happens when I create my second RNN. Do you know how to fix this ?
My code is the following:
#User LSTM
no_units=100
_seq_user = tf.placeholder(tf.float32, [batch_size, max_length_user, user_inputShapeLen], name='seq')
_seq_length_user = tf.placeholder(tf.int32, [batch_size], name='seq_length')
cell = tf.contrib.rnn.BasicLSTMCell(
no_units)
output_user, hidden_states_user = tf.nn.dynamic_rnn(
cell,
_seq_user,
dtype=tf.float32,
sequence_length=_seq_length_user
)
out2_user = tf.reshape(output_user, shape=[-1, no_units])
out2_user = tf.layers.dense(out2_user, user_outputShapeLen)
out_final_user = tf.reshape(out2_user, shape=[-1, max_length_user, user_outputShapeLen])
y_user_ = tf.placeholder(tf.float32, [None, max_length_user, user_outputShapeLen])
softmax_user = tf.nn.softmax(out_final_user, dim=-1)
loss_user = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_user, labels=y_user_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_user)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(user_train_X, user_train_Y, sizes_user_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_user: data_, _seq_length_user: size_, y_user_: target_})
#System LSTM
no_units_system=100
_seq_system = tf.placeholder(tf.float32, [batch_size, max_length_system, system_inputShapeLen], name='seq_')
_seq_length_system = tf.placeholder(tf.int32, [batch_size], name='seq_length_')
cell_system = tf.contrib.rnn.BasicLSTMCell(
no_units_system)
output_system, hidden_states_system = tf.nn.dynamic_rnn(
cell_system,
_seq_system,
dtype=tf.float32,
sequence_length=_seq_length_system
)
out2_system = tf.reshape(output_system, shape=[-1, no_units_system])
out2_system = tf.layers.dense(out2_system, system_outputShapeLen)
out_final_system = tf.reshape(out2_system, shape=[-1, max_length_system, system_outputShapeLen])
y_system_ = tf.placeholder(tf.float32, [None, max_length_system, system_outputShapeLen])
softmax_system = tf.nn.softmax(out_final_system, dim=-1)
loss_system = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_system, labels=y_system_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_system)
for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(system_train_X, system_train_Y, sizes_system_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_system: data_, _seq_length_system: size_, y_system_: target_})
Regarding the variable scope error, try setting different variable scope for each graph.
with tf.variable_scope('User_LSTM'):
    ...  # your user_lstm graph

with tf.variable_scope('System_LSTM'):
    ...  # your system_lstm graph
Also, avoid using the same name for different Python objects (e.g. optimizer). The second declaration will override the first, which will confuse you when you use TensorBoard.
By the way, I would recommend training the model in an end-to-end fashion rather than running two sessions separately. Try feeding the output tensor of the first LSTM into the second LSTM with a single optimizer and loss function, roughly as sketched below.
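A rough sketch of that end-to-end wiring, under the assumption that the user LSTM's per-step outputs can be fed directly as the system LSTM's inputs and that both sequences share the same length here; shapes and names are illustrative, taken from the question where possible:

with tf.variable_scope('User_LSTM'):
    user_cell = tf.contrib.rnn.BasicLSTMCell(no_units)
    output_user, _ = tf.nn.dynamic_rnn(user_cell, _seq_user, dtype=tf.float32,
                                       sequence_length=_seq_length_user)

with tf.variable_scope('System_LSTM'):
    system_cell = tf.contrib.rnn.BasicLSTMCell(no_units_system)
    # The first network's output sequence becomes the second network's input.
    output_system, _ = tf.nn.dynamic_rnn(system_cell, output_user, dtype=tf.float32,
                                         sequence_length=_seq_length_user)

# One loss over the final output and one optimizer updating both scopes.
logits = tf.layers.dense(output_system, system_outputShapeLen)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_system_))
train_op = tf.train.AdamOptimizer(learning_rate=10**-4).minimize(loss)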
To be short, to solve the problem (Variable rnn/basic_lstm_cell/weights already exists), what you need are 2 separate variable scopes (as mentioned by J-min). In tensorflow, variables are organized by their names, and by managing these two sets of variables in the two scopes, tensorflow will be able to distinguish them from each other.
And by "train them separately on tensorflow", I suppose you mean that you want to define two distinct loss functions and optimize these two LSTM networks with two optimizers, each corresponding to one of the loss functions.
Under such circumstances, you need to get the lists of these two sets of variables, and pass the lists to your optimizers, like this:
opt1 = GradientDescentOptimizer(learning_rate=0.1)
opt_op1 = opt1.minimize(loss1, var_list=<list of variables from scope 1>)
opt2 = GradientDescentOptimizer(learning_rate=0.1)
opt_op2 = opt2.minimize(loss2, var_list=<list of variables from scope 2>)
Just change the name argument of the cells when initializing them. For example:
user_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='user')
system_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='system')
In this way, TensorFlow won't share the variables of the two cells. Then you can get the outputs as:
output_user, hidden_states_user = tf.nn.dynamic_rnn(
    user_cell,
    _seq_user,
    dtype=tf.float32,
    sequence_length=_seq_length_user
)
output_system, hidden_states_system = tf.nn.dynamic_rnn(
    system_cell,
    _seq_system,
    dtype=tf.float32,
    sequence_length=_seq_length_system
)