Memory leak tf.data + Keras - tensorflow

I have a memory leak in my training pipeline and don't know how to fix it.
I am using TensorFlow 1.9.0 and tf.keras 2.1.6-tf with Python 3.5.2.
This is what my training pipeline looks like:
for i in range(num_epochs):
    training_data = training_set.make_one_shot_iterator().get_next()
    hist = model.fit(training_data[0],
                     [training_data[1], training_data[2], training_data[3]],
                     steps_per_epoch=steps_per_epoch_train, epochs=1, verbose=1,
                     callbacks=[history, MemoryCallback()])
    # custom validation
It looks like the memory of the iterator is not freed after the iterator is exhausted. I have already tried del training_data after model.fit; it didn't work.
Can anybody give some hints?
Edit:
This is how I create the dataset.
dataset = tf.data.TFRecordDataset(tfrecords_filename)
dataset = dataset.map(map_func=preprocess_fn, num_parallel_calls=8)
dataset = dataset.shuffle(100)
dataset = dataset.batch(batch_size=batch_size)
dataset = dataset.prefetch(1)

Including the repeat() method to reinitialize your iterator might solve your problem. You can take a look at the Input Pipeline Performance Guide to figure out a good, optimized order of transformations for your requirements.
dataset = dataset.shuffle(100)
dataset = dataset.repeat() # Can specify num_epochs as input if needed
dataset = dataset.batch(batch_size=batch_size)
dataset = dataset.prefetch(1)
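Putting this together with the dataset creation above, the full pipeline would look roughly like this (same preprocess_fn and batch_size as before):
dataset = tf.data.TFRecordDataset(tfrecords_filename)
dataset = dataset.map(map_func=preprocess_fn, num_parallel_calls=8)
dataset = dataset.shuffle(100)
dataset = dataset.repeat()  # or repeat(num_epochs)
dataset = dataset.batch(batch_size=batch_size)
dataset = dataset.prefetch(1)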
If you can afford to do the validation as part of the fit method, you can use something like the code below and drop the loop altogether to make your life easier.
training_data = training_set.make_one_shot_iterator().get_next()
# val_data refers to your validation data and steps_per_epochs_val to the number of validation batches
hist = model.fit(training_data[0],
                 [training_data[1], training_data[2], training_data[3]],
                 validation_data=val_data.make_one_shot_iterator(),
                 validation_steps=steps_per_epochs_val,
                 steps_per_epoch=steps_per_epoch_train, epochs=num_epochs, verbose=1,
                 callbacks=[history, MemoryCallback()])
Reference: https://github.com/keras-team/keras/blob/master/examples/mnist_dataset_api.py

Training with Dataset API and numpy array yields completely different results

I have a CNN regression model and the features come in (2000, 3000, 1) shape, where 2000 is the total number of samples, each being a (3000, 1) 1D array. The batch size is 8, and 20% of the full dataset is used for validation.
However, zipping the features and labels into a tf.data.Dataset gives completely different scores from feeding the numpy arrays in directly.
The tf.data.Dataset code looks like:
# Load features and labels
features = np.array(features) # shape is (2000, 3000, 1)
labels = np.array(labels) # shape is (2000,)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=2000)
dataset = dataset.batch(8)
train_dataset = dataset.take(200)
val_dataset = dataset.skip(200)
# Training model
model.fit(train_dataset, validation_data=val_dataset,
          batch_size=8, epochs=1000)
The numpy code looks like:
# Load features and labels
features = np.array(features) # exactly the same as previous
labels = np.array(labels) # exactly the same as previous
# Training model
model.fit(x=features, y=labels, shuffle=True, validation_split=0.2,
          batch_size=8, epochs=1000)
Apart from this, the rest of the code is exactly the same, for example:
# Set global random seed
tf.random.set_seed(0)
np.random.seed(0)
# No preprocessing of feature at all
# Load model (exactly the same)
model = load_model()
# Compile model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=[tf.keras.metrics.mean_absolute_error],
)
The former method, via the tf.data.Dataset API, yields a mean absolute error (MAE) of around 1e-3 on both the training and validation sets, which looks quite suspicious since the model doesn't have any dropout or regularization to prevent overfitting. Feeding the numpy arrays in directly, on the other hand, gives a training MAE of around 0.1 and a validation MAE of around 1.
The low MAE of the tf.data.Dataset method looks very suspicious, but I just couldn't find anything wrong with the code. I could also confirm that the number of training batches is 200 and the number of validation batches is 50, so I wasn't validating on the training set.
I tried varying the global random seed and using different shuffle seeds, which didn't change the results much. Training was done on NVIDIA V100 GPUs, and I tried TensorFlow versions 2.9, 2.10 and 2.11, which didn't make much difference.
The problem lies in the default behaviour of the shuffle method of tf.data.Dataset, more specifically its reshuffle_each_iteration argument, which defaults to True. This means that if I implement the following code:
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=2000)
dataset = dataset.batch(8)
train_dataset = dataset.take(200)
val_dataset = dataset.skip(200)
model.fit(train_dataset, validation_data=val_dataset, batch_size=8, epochs=1000)
the dataset is actually reshuffled after each epoch, even though it may not look that way. As a result, the validation data leaks into the training set (in fact, there is no longer any distinction between the two sets, since the order is reshuffled every epoch).
So make sure to set reshuffle_each_iteration to False if you want to shuffle the dataset and then do a train/validation split.
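For reference, a minimal sketch of the corrected pipeline (same features, labels and model as above; the fixed seed is only an assumption for reproducibility):
# shuffle once with a stable order so take()/skip() see a consistent split;
# reshuffle_each_iteration=False stops the split from changing every epoch
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=2000, seed=0, reshuffle_each_iteration=False)
dataset = dataset.batch(8)
train_dataset = dataset.take(200)  # first 200 batches for training
val_dataset = dataset.skip(200)    # remaining 50 batches for validation
model.fit(train_dataset, validation_data=val_dataset, epochs=1000)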
UPDATE: TensorFlow has confirmed this issue, and a warning will be added to the docs in a future release.
PS: It's a hard lesson for me, as I had been using this model to analyse results for several months (as a graduating MPhil student).

Training seq2seq model on Google Colab TPU with big dataset - Keras

I'm trying to train a sequence-to-sequence model for machine translation using Keras on a Google Colab TPU.
I have a dataset which I can load in memory, but I have to preprocess it before feeding it to the model. In particular, I need to convert the target words to one-hot vectors, and with this many examples I can't hold the entire converted array in memory, so I need to generate batches of data.
I'm using this function as a batch generator:
def generate_batch_bert(X_ids, X_masks, y, batch_size=1024):
    '''Generate a batch of data'''
    while True:
        for j in range(0, len(X_ids), batch_size):
            # batch of encoder and decoder data
            encoder_input_data_ids = X_ids[j:j+batch_size]
            encoder_input_data_masks = X_masks[j:j+batch_size]
            y_decoder = y[j:j+batch_size]
            # decoder target and input for teacher forcing
            decoder_input_data = y_decoder[:, :-1]
            decoder_target_seq = y_decoder[:, 1:]
            # batch of decoder target data
            decoder_target_data = to_categorical(decoder_target_seq, vocab_size_fr)
            # keep only batches with the right number of instances for training on TPU
            if encoder_input_data_ids.shape[0] == batch_size:
                yield ([encoder_input_data_ids, encoder_input_data_masks, decoder_input_data],
                       decoder_target_data)
The problem is that whenever I try to run the fit function as follows:
model.fit(x=generate_batch_bert(X_train_ids, X_train_masks, y_train, batch_size=batch_size),
          steps_per_epoch=train_samples//batch_size,
          epochs=epochs,
          callbacks=callbacks,
          validation_data=generate_batch_bert(X_val_ids, X_val_masks, y_val, batch_size=batch_size),
          validation_steps=val_samples//batch_size)
I get the following error:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:445 make_tensor_proto
raise ValueError("None values not supported.")
ValueError: None values not supported.
Not sure what's wrong and how I can solve this problem.
EDIT
I tried loading a smaller amount of data into memory, so that the one-hot conversion of the target words doesn't crash the kernel, and it actually works. So there is obviously something wrong with how I generate batches.
It's hard to tell what's wrong since you don't provide your model definition or any sample data. However, I'm fairly certain that you're running into the same TensorFlow bug that I was recently bitten by.
The workaround is to use the tensorflow.data API, which works much better with TPUs. Like this:
from tensorflow.data import Dataset
import tensorflow as tf
def map_fn(X_id, X_mask, y):
    decoder_target_data = tf.one_hot(y[1:], vocab_size_fr)
    return (X_id, X_mask, y[:-1]), decoder_target_data
...
X_ids = Dataset.from_tensor_slices(X_ids)
X_masks = Dataset.from_tensor_slices(X_masks)
y = Dataset.from_tensor_slices(y)
ds = Dataset.zip((X_ids, X_masks, y)).map(map_fn).batch(1024)
model.fit(x = ds, ...)
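As a side note, if the TPU requires full batches (which the size check in the original generator suggests), batching with drop_remainder=True achieves the same effect, assuming a TensorFlow version recent enough to support that argument:
ds = Dataset.zip((X_ids, X_masks, y)).map(map_fn).batch(1024, drop_remainder=True)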

End of Sequence Error when using tf.estimator and tf.data

I am using tf.estimator.train_and_evaluate and tf.data.Dataset to feed data to the estimator:
Input Data function:
def data_fn(data_dict, batch_size, mode, num_epochs=10):
    dataset = {}
    if mode == tf.estimator.ModeKeys.TRAIN:
        dataset = tf.data.Dataset.from_tensor_slices(data_dict['train_data'].astype(np.float32))
        dataset = dataset.cache()
        dataset = dataset.shuffle(buffer_size=batch_size * 10).repeat(num_epochs).batch(batch_size)
    else:
        dataset = tf.data.Dataset.from_tensor_slices(data_dict['valid_data'].astype(np.float32))
        dataset = dataset.cache()
        dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    next_element = iterator.get_next()
    return next_element
Train Function:
def train_model(data):
    tf.logging.set_verbosity(tf.logging.INFO)
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=False)
    config.gpu_options.allow_growth = True
    run_config = tf.contrib.learn.RunConfig(
        save_checkpoints_steps=10,
        keep_checkpoint_max=10,
        session_config=config
    )
    train_input = lambda: data_fn(data, 100, tf.estimator.ModeKeys.TRAIN, num_epochs=1)
    eval_input = lambda: data_fn(data, 1000, tf.estimator.ModeKeys.EVAL)
    estimator = tf.estimator.Estimator(model_fn=model_fn, params=hps, config=run_config)
    train_spec = tf.estimator.TrainSpec(train_input, max_steps=100)
    eval_spec = tf.estimator.EvalSpec(eval_input,
                                      steps=None,
                                      throttle_secs=30)
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
The training goes fine, but when it comes to evaluation I get this error:
OutOfRangeError (see above for traceback): End of sequence
If I don't use Dataset.batch on the evaluation dataset (by omitting the line dataset = dataset.batch(batch_size) in data_fn), I get the same error, just after a much longer time.
I can only avoid this error if I don't batch the data and use steps=1 for evaluation, but does that perform the evaluation on the whole dataset?
I don't understand what causes this error as the documentation suggests I should be able to evaluate on batches too.
Note: I get the same error when using tf.estimator.evaluate on data batches.
I posted this question as a github issue and here is the response from the Tensorflow team:
https://github.com/tensorflow/tensorflow/issues/19541
Copying from "xiejw" for completeness:
If I understand correctly, this issue is "once give estimator an input_fn with dataset inside, the evaluate process will error out with OutOfRangeError."
Estimator can actually handle this correctly. However, a known common root cause is that the metrics defined in model_fn have a bug. We need to rule that part out first.
@mrezak, if possible, can you show the code of the model_fn? Or if you have a minimal reproducible script, that would be extremely helpful. -- Thanks in advance.
A common problem here is that a metric in TensorFlow should return two ops: update_op and value_op. Estimator calls the update_op for each batch of data in the input source and, once it is exhausted, calls the value_op to get the metric values. The value_op should only have dependencies on variable reads.
Many model_fns make the value_op depend on the input pipeline, so estimator.evaluate triggers the input pipeline one more time, which errors out with OutOfRangeError.
The problem was indeed how I defined the eval_metric in my model_fn. In my actual code, the total loss to be optimized was composed of multiple losses (reconstruction + L2 + KL), and in the evaluation part I wanted to get the reconstruction loss (on the validation data), which depended on the input data pipeline. My actual reconstruction cost was more complex than MSE (and didn't match any of the other tf.metrics functions either), so it was not straightforward to implement using the basic tf.metrics functions.
This is "xiejw"'s suggestion which fixed the issue:
my_total_loss = ...  # the loss you care about; pay attention to how you reduce the loss
eval_metric_ops = {'total_loss': tf.metrics.mean(my_total_loss)}
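As an illustration only, a hedged sketch of how such a metric might sit inside a model_fn (build_network is a hypothetical model-building helper, not part of the original post):
def model_fn(features, labels, mode, params):
    predictions = build_network(features, params)  # hypothetical helper
    my_total_loss = tf.losses.mean_squared_error(labels, predictions)
    train_op = tf.train.AdamOptimizer().minimize(
        my_total_loss, global_step=tf.train.get_global_step())
    # tf.metrics.mean returns (value_op, update_op); the value_op only reads
    # metric variables, so evaluate() does not touch the input pipeline again
    eval_metric_ops = {'total_loss': tf.metrics.mean(my_total_loss)}
    return tf.estimator.EstimatorSpec(mode=mode, loss=my_total_loss,
                                      train_op=train_op,
                                      eval_metric_ops=eval_metric_ops)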

How to speed up batch preparation when using Estimators API combined with tf.data.Dataset

I'd like to speed up my training routine, which uses the Estimator API with an input_fn written using tf.data.Dataset.
My implementation takes 2 seconds to prepare a batch of data, then runs training on the GPU for 1 second, and then starts over preparing the next batch, which is really inefficient.
I'm looking for a way to prepare the batches asynchronously and upload them to the GPU to speed up training. Alternatively, I'm looking for a way to cache datasets between invocations of input_fn (dataset.cache() doesn't seem to be a good choice, as the dataset has to be recreated on each input_fn invocation).
Here is a simplified version of my code:
def input_fn(filenames, labels, epochs):
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.map(_read_wav, num_parallel_calls=num_map_threads)
    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(labels))
    dataset = dataset.map(_post_process, num_parallel_calls=num_map_threads)
    dataset = dataset.map(lambda wav, label: ({'wav': wav}, label))
    dataset = dataset.batch(128)
    dataset = dataset.repeat(epochs)  # to iterate over the training set forever
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels
train_input_fn = lambda : input_fn(train_files, train_labels, None)
eval_input_fn = lambda : input_fn(eval_files, eval_labels, 1)
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=45000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
I've noticed that the Estimator API is under active development and that in the master branch of TensorFlow the input_fn can already return a Dataset directly, so maybe I'm asking too early and this feature isn't ready yet. But if so, please provide a ticket where this implementation can be tracked.
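For reference, a rough sketch of that newer style (input_fn returning the Dataset itself rather than tensors, assuming the same _read_wav, _post_process and num_map_threads as above) might look like this:
def input_fn(filenames, labels, epochs):
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.map(_read_wav, num_parallel_calls=num_map_threads)
    dataset = dataset.map(_post_process, num_parallel_calls=num_map_threads)
    dataset = dataset.map(lambda wav, label: ({'wav': wav}, label))
    dataset = dataset.batch(128)
    dataset = dataset.repeat(epochs)
    dataset = dataset.prefetch(1)
    return dataset  # the Estimator creates the iterator itself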
Using tf.data.Dataset.cache() is indeed not a good choice since it will cache the whole dataset into memory, which takes time and might overflow your memory.
The way to go is to use tf.data.Dataset.prefetch() at the end of your pipeline, which will always make sure that the data pipeline holds buffer_size elements. It is usually enough to have buffer_size = 1 at the end:
dataset = ...
dataset = dataset.batch(128)
dataset = dataset.prefetch(1) # prefetch one batch
As explained by @mrry in this answer, you can also try to increase the number of prefetched batches a bit.
Typically it is most useful to add a small prefetch buffer (with perhaps just a single element) at the very end of the pipeline, but more complex pipelines can benefit from additional prefetching, especially when the time to produce a single element can vary.
If you still have a slow input pipeline compared to your GPU computations, you need to increase the number of threads working in parallel using the num_parallel_calls argument of tf.data.Dataset.map().
A few points to add to Olivier's answer, mostly from this post:
repeat before shuffle is slightly faster, at the cost of blurred epoch boundaries. This may matter in rare cases, but I doubt it.
shuffle before mapping - this reduces the memory footprint of your shuffle buffer, since it only needs to buffer the filenames rather than the file contents.
it makes more sense to me to apply the third map transform to the output of get_next() rather than to the dataset - I'm not sure if that affects speed much. You could also consider putting the other two map calls into a single one to reduce scheduling issues.
experiment with repeat before batching. It probably won't make a difference, but any effect should be minor. If you repeat before shuffle, as mentioned above, you'll have to anyway.
as mentioned by Olivier, use prefetch.
Code with modifications:
def input_fn(filenames, labels, epochs):
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.repeat(epochs)
    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(labels))

    def combined_map_fn(*args):
        return _post_process(_read_wav(*args))

    dataset = dataset.map(combined_map_fn, num_parallel_calls=num_map_threads)
    dataset = dataset.batch(128)
    dataset = dataset.prefetch(1)
    iterator = dataset.make_one_shot_iterator()
    wavs, labels = iterator.get_next()
    features = {'wav': wavs}
    return features, labels

Tensorflow : Trainning and test into the same graph with input queues

I am facing an issue that I can't solve with what I have found on the internet.
I have built my neural network and connected it to an input pipeline.
I read data from TFRecords with tf.train.batch, QueueRunners, Coordinators, etc.
I have built my NN in a Python class named "Model" that I use like:
model = Model(...all hyperparameter here...)
...
model.predict()
or
model.step()
The whole training phase works very well.
But now I would like to add a test phase every X epochs/steps of training.
I really don't know how to do this.
I have several ideas but can't find the best one:
Duplicate the code in my class to get loss_train, loss_test, and so on for each node of my graph (using variable sharing between train and test)?
Create 2 instances of my model:
model_train = Model(reuse=False)
model_test = Model(reuse=True)
Use tf.make_template? I really can't find any good examples of this function...
Any other solution?
I would appreciate any suggestions.
I came across the same problem when experimenting with TFRecord datasets. There are several possibilities. Since I wanted to do this on a computer with only one GPU anyway, I implemented it as follows:
# Training Dataset
train_dataset = tf.contrib.data.TFRecordDataset(train_files)
train_dataset = train_dataset.map(parse_function)
train_dataset = train_dataset.shuffle(buffer_size=10000)
train_dataset = train_dataset.batch(200)
# Validation Dataset
validation_dataset = tf.contrib.data.TFRecordDataset(val_files)
validation_dataset = validation_dataset.map(parse_function)
validation_dataset = validation_dataset.batch(200)
# A feedable iterator is defined by a handle placeholder and its structure. We
# could use the `output_types` and `output_shapes` properties of either
# `training_dataset` or `validation_dataset` here, because they have
# identical structure.
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.contrib.data.Iterator.from_string_handle(handle,
                                                       train_dataset.output_types,
                                                       train_dataset.output_shapes)
next_element = iterator.get_next()
# Generate the Iterators
training_iterator = train_dataset.make_initializable_iterator()
validation_iterator = validation_dataset.make_one_shot_iterator()
# The `Iterator.string_handle()` method returns a tensor that can be evaluated
# and used to feed the `handle` placeholder.
training_handle = sess.run(training_iterator.string_handle())
validation_handle = sess.run(validation_iterator.string_handle())
Then, to access the elements, you can simply do:
img, lbl = sess.run(next_element, feed_dict={handle: training_handle})
And exchange the handle depending on what you want to do at the moment.
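A rough usage sketch of that handle swap (num_train_steps and eval_every are assumed placeholder values, and the one-shot validation iterator will raise OutOfRangeError once it is exhausted):
sess.run(training_iterator.initializer)
for step in range(num_train_steps):
    # training step: feed the training handle
    img, lbl = sess.run(next_element, feed_dict={handle: training_handle})
    # ... run your train op on img, lbl ...
    if step % eval_every == 0:
        # validation step: same next_element op, but the handle selects validation data
        val_img, val_lbl = sess.run(next_element, feed_dict={handle: validation_handle})
        # ... compute validation metrics on val_img, val_lbl ...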
Keep in mind, however, that this is not parallelizable. Following this link, you can get insight into the different methods of creating multiple input pipelines: Tensorflow | Reading Data.