About the usage of tf.contrib.data.Dataset (from TensorFlow 1.2; see here and here):
When I use repeat (for multiple epochs) together with shuffle (as read_batch_features does internally), how will I know when an epoch ends, and what the current epoch is? Also, when an epoch ends, will the ShuffleDataset wait to dequeue everything first, or will it already be filled with more data from the next epoch? In the last epoch, or if I don't use repeat, will the ShuffleDataset dequeue all remaining data, just like tf.RandomShuffleQueue dequeuing does after close?
My current solution, which also gives me more control: I would not use repeat but go over the data once, using ShuffleDataset to get shuffling like RandomShuffleQueue; at some point I get an OutOfRangeError and know that I have reached the end of the epoch. Then I reinitialize the iterator, as described here.
The behavior of Dataset.shuffle() depends on where in your pipeline it appears relative to the Dataset.repeat():
If you shuffle before the repeat, the sequence of outputs will produce all records from epoch i before any record from epoch i + 1.
If you shuffle after the repeat, the sequence of outputs may produce records from epoch i before or after records from epoch i + 1 (and from epoch i + k, with a probability that increases with buffer_size and decreases with k).
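To make the ordering difference concrete, here is a small sketch (the three-element toy dataset and the buffer sizes are illustrative, not from the question):

import tensorflow as tf
Dataset = tf.contrib.data.Dataset  # TF 1.2-style alias.

elements = Dataset.range(3)  # Toy single-epoch dataset: 0, 1, 2.

# Shuffle *before* repeat: each epoch is exhausted before the next one starts,
# e.g. 2, 0, 1, 1, 2, 0.
shuffle_then_repeat = elements.shuffle(buffer_size=3).repeat(2)

# Shuffle *after* repeat: records from adjacent epochs can interleave,
# e.g. 0, 2, 0, 1, 1, 2.
repeat_then_shuffle = elements.repeat(2).shuffle(buffer_size=3)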
If you want to perform some computation between epochs, and avoid mixing data from different epochs, it is probably easiest to avoid repeat() and catch the OutOfRangeError at the end of each epoch.
There are some more interesting pipelines you could build to track the epoch number. For example, you could encode an epoch number as a component of each element:
dataset = (
    Dataset.range(None).flat_map(lambda epoch_num:
        Dataset.zip(
            (Dataset.from_tensors(epoch_num).repeat(),  # Infinite repeat of `epoch_num`.
             ...,  # Definition of a Dataset over a single epoch.
            )
        )
    )
)
...where ... is the expression that defines a Dataset for a single epoch, and includes batching and shuffling.
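As a concrete sketch (the epoch count and the toy ten-element per-epoch dataset below are purely illustrative placeholders, not part of the original pipeline), each element the pipeline produces is then a pair of (epoch_num, batch):

import tensorflow as tf
Dataset = tf.contrib.data.Dataset  # TF 1.2-style alias.

NUM_EPOCHS = 3  # Hypothetical finite number of epochs.

def single_epoch_dataset():
    # Toy stand-in for the real per-epoch dataset: ten integers, shuffled and batched.
    return Dataset.range(10).shuffle(buffer_size=10).batch(2)

dataset = Dataset.range(NUM_EPOCHS).flat_map(lambda epoch_num:
    Dataset.zip(
        (Dataset.from_tensors(epoch_num).repeat(),  # Infinite repeat of `epoch_num`.
         single_epoch_dataset())))                  # Finite, so the zip ends with the epoch.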
Related
What is an epoch when you're using a generator for your model.fit data?
It makes sense with a standard NumPy-array dataset - an epoch is one processing of the entire dataset.
However, with a generator, there's no length - hence no "epochs".
Does the epoch simply represent an arbitrarily sized group of steps, when using a generator-dataset?
Is there something special that happens at the end of an epoch?
Yes, an epoch is an arbitrary group of steps, but generally it's one pass through the whole dataset.
However, you don't define that in the generator. You write a generator that yields batches, compute something like steps_per_epoch = int(training_samples / batch_size), and then pass steps_per_epoch to the training / fit_generator function (in Keras, for example).
Regarding the second question: yes, you can evaluate the model at the end of each epoch and log the results to see the improvement; you can also save model checkpoints.
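A minimal sketch of that pattern in Keras (the generator, batch size, and the model/x_train/y_train names are assumptions for illustration, not from the question):

import numpy as np

def batch_generator(x, y, batch_size):
    # Yields (inputs, targets) batches indefinitely, reshuffling on every pass.
    n = len(x)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch_idx = idx[start:start + batch_size]
            yield x[batch_idx], y[batch_idx]

batch_size = 32
steps_per_epoch = int(len(x_train) / batch_size)  # x_train/y_train assumed to exist.

model.fit_generator(batch_generator(x_train, y_train, batch_size),
                    steps_per_epoch=steps_per_epoch,
                    epochs=10)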
As I understand it, one of the differences between a regular GAN and a WGAN is that we train the discriminator/critic on more examples in each epoch. Where a regular GAN uses one batch for both modules in each step, a WGAN uses 5 batches (or more) for the discriminator and one for the generator.
So basically we have another inner loop for the discriminator:
real_images_labels = np.ones((BATCH_SIZE, 1))
fake_images_labels = -real_images_labels

for epoch in range(epochs):
    for batch in range(NUM_BACHES):
        for critic_iter in range(n_critic):
            random_batches_idx = np.random.randint(0, NUM_BACHES)  # Choose a random batch from the dataset.
            imgs_data = dataset_list[random_batches_idx]
            c_loss_real = critic.train_on_batch(imgs_data, real_images_labels)  # Critic update on real images.
            noise = tf.random.normal([imgs_data.shape[0], noise_dim])  # Generate noise data.
            generated_images = generator(noise, training=True)
            c_loss_fake = critic.train_on_batch(generated_images, fake_images_labels)  # Critic update on fake images.
        imgs_data = dataset_list[batch]
        noise = tf.random.normal([imgs_data.shape[0], noise_dim])  # Generate noise data.
        gen_loss_batch = gen_loss_batch + gan.train_on_batch(noise, real_images_labels)  # Generator update through the GAN model.
The training is taking me a lot of time, about 3 minutes per epoch. The idea I had to decrease the training time is: instead of running the forward pass n_critic times for each batch, I could increase the batch_size for the discriminator and run the forward pass once with a bigger batch_size.
I am seeking feedback: does it sound reasonable?
(I didn't paste my entire code; this is just part of it.)
Yes, it does sound reasonable: increasing batch_size typically decreases the training time, at the cost of using more memory and lower accuracy (lower generalization ability).
Having said this, you should always do some trial and error with regard to batching, as extreme values may or may not increase the training time.
For further discussion, you can refer to this question.
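A rough sketch of the proposed variant, reusing the (hypothetical) names from the snippet above and not tested against the full code: build one critic batch that is n_critic times larger and update the critic once per step.

for epoch in range(epochs):
    for batch in range(NUM_BACHES):
        # One big real batch instead of n_critic separate critic updates.
        random_idx = np.random.randint(0, NUM_BACHES, size=n_critic)
        big_real_batch = np.concatenate([dataset_list[i] for i in random_idx], axis=0)
        big_real_labels = np.ones((big_real_batch.shape[0], 1))

        noise = tf.random.normal([big_real_batch.shape[0], noise_dim])
        big_fake_batch = generator(noise, training=True)

        c_loss_real = critic.train_on_batch(big_real_batch, big_real_labels)
        c_loss_fake = critic.train_on_batch(big_fake_batch, -big_real_labels)

        # Generator update is unchanged: one regular-sized batch per step.
        imgs_data = dataset_list[batch]
        noise = tf.random.normal([imgs_data.shape[0], noise_dim])
        gen_loss = gan.train_on_batch(noise, np.ones((imgs_data.shape[0], 1)))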
I am training a GAN with text data. When I train the discriminator, I randomly sample m positive examples from the dataset and generate m negative examples with the generator. I found that many papers mention implementation details such as the number of training epochs. Regarding training epochs, I have a question about sampling positive data:
Sample from the dataset (maybe shuffled) in order; when the whole dataset is covered, we call that 1 epoch.
As I did: randomly sample positive data; when the total amount of sampled data matches the size of the dataset, we call that 1 epoch.
Which one is right? Which one is commonly used? Which one is better?
In my opinion, an epoch is when you have passed through the whole training set once, and I think that is also what the papers mean when they mention an epoch.
However, an epoch can also be defined as processing k elements, where k can be smaller than n (the size of the training set). Such a definition might make sense when you want to get some evaluation of your model on the dev set, which you normally do after every epoch.
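The two sampling schemes in the question could be sketched like this (hypothetical NumPy code, with data standing in for the array of positive examples and m for the batch size):

import numpy as np

n = len(data)   # data: array of positive examples (assumed).
m = 64          # Discriminator batch size (assumed).

# Scheme 1: walk through a (shuffled) permutation in order; one full pass = 1 epoch.
perm = np.random.permutation(n)
for start in range(0, n, m):
    positive_batch = data[perm[start:start + m]]
    # ... train the discriminator on positive_batch ...

# Scheme 2: sample randomly with replacement; after ~n sampled examples, call it 1 epoch.
for _ in range(n // m):
    positive_batch = data[np.random.randint(0, n, size=m)]
    # ... train the discriminator on positive_batch ...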
In any case, that is my opinion and my understanding of the GAN papers.
Good luck!
I am using the TensorFlow Dataset API for my input data pipeline. I am wondering how to run training without shuffling the data in the first epoch and then start shuffling from the second epoch onwards.
The graph is usually built before iterative training starts, and during training it does not seem straightforward to change the Dataset's shuffling behavior, since that looks to me like changing the graph.
Any ideas?
Thanks,
Harry
The buffer_size argument to Dataset.shuffle() can be a computed tf.Tensor, so you can use the following code that uses Dataset.range(NUM_EPOCHS).flat_map(...) to transform a sequence of epoch numbers to the (shuffled or otherwise) elements of a per_epoch_dataset:
NUM_EPOCHS = ...          # The total number of epochs.
BUFFER_SIZE = ...         # The shuffle buffer size to use from the second epoch on.
per_epoch_dataset = ...   # A `Dataset` representing the elements of a single epoch.

def shuffle_after_first_epoch(epoch):
    # Set `epoch_buffer_size` to 1 (i.e. no shuffling) in the 0th epoch,
    # and `BUFFER_SIZE` thereafter.
    epoch_buffer_size = tf.cond(tf.equal(epoch, 0),
                                lambda: tf.constant(1, tf.int64),
                                lambda: tf.constant(BUFFER_SIZE, tf.int64))
    return per_epoch_dataset.shuffle(epoch_buffer_size)

dataset = tf.data.Dataset.range(NUM_EPOCHS).flat_map(shuffle_after_first_epoch)
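A minimal way to consume this in TF 1.x graph mode (assuming the ... placeholders above have been filled in with concrete values) is a one-shot iterator:

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    try:
        while True:
            value = sess.run(next_element)  # Epoch 0 arrives in order; later epochs are shuffled.
    except tf.errors.OutOfRangeError:
        pass  # All NUM_EPOCHS epochs have been consumed.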
I have a question about the behavior of tf.train.batch when it is intended to provide batches for multiple epochs. I have a TFRecord with multiple SequenceExamples, which I want to process for a number of epochs.
I noticed that the batch operation accepts a parameter allow_smaller_final_batch. The way I intended to run an epoch was as follows: write a loop that fetches a number of batches equal to the total number of examples divided by the batch size, and run my model on those batches.
My batches would be based on a string input queue which produces the same file name num_epochs times. My question now is: will the final batch of each epoch be of a smaller size, with that parameter set, or will only the final batch of my final epoch be smaller? Because if that is the case, I would think I will get an OutOfRangeError before all of my epochs are finished.
So: how exactly does tf.train.batch handle cases like these? And is my approach the correct one for running multiple epochs, or is there a better one?
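For reference, a minimal sketch of the kind of pipeline described above (the file name, feature spec, and batch size are placeholders, not taken from the question):

import tensorflow as tf

NUM_EPOCHS = 10  # Placeholder.

# Queue that yields the file name num_epochs times, then closes.
filename_queue = tf.train.string_input_producer(["data.tfrecords"], num_epochs=NUM_EPOCHS)

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

# Parse a SequenceExample (the feature spec here is illustrative).
context, sequence = tf.parse_single_sequence_example(
    serialized_example,
    context_features={"length": tf.FixedLenFeature([], tf.int64)},
    sequence_features={"tokens": tf.FixedLenSequenceFeature([], tf.int64)})

# allow_smaller_final_batch=True lets the last dequeue return fewer than batch_size
# elements once the input queue has been closed and drained.
batch = tf.train.batch(
    [context["length"], sequence["tokens"]],
    batch_size=32,
    dynamic_pad=True,
    allow_smaller_final_batch=True)

# Note: using num_epochs requires running tf.local_variables_initializer()
# before starting the queue runners.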