What is fit_generator doing with a sub-sample of training and validation data? - tensorflow

I'm using a data generator with fit_generator in keras (for both training and validation data).
I was getting unexpected results so I instrumented the generator to output the batch index and count the number of steps since the last epoch. I have added ['acc'] to the model metrics.
When fit_generator runs I see it do several things:
1) It queues up the validation data (but I'm guessing it doesn't evaluate it yet).
2) It iterates through all the training data and calls on_epoch_end().
3) It calls another 10 steps of training data. I assume this must be coming from a callback. What is it doing?
4) It completes iterating through the validation data and calls on_epoch_end().
5) It calls another 10 steps of validation data. Again, what is it doing?
6) fit_generator prints the train/validation loss and accuracy and returns.
on_epoch_end() is never called after the 10 steps at 3 and 5. This is probably a bug, since we need the generators to be reset before the next epoch.
I'm mainly interested in understanding what is going on at 3 and 5: why are the generators called, and why for only ten steps?
Versions:
print(keras.__version__)
2.2.2
print(tf.__version__)
1.9.0

Per the comments from Matias: the additional batches correspond to pre-queued batches for the next epoch. They are dropped on the floor when fit_generator returns. It's up to the user to reset the generators before calling fit_generator again.
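For reference, here is a minimal sketch of the kind of instrumented generator described above, written as a keras.utils.Sequence subclass; the class name, shapes and print statements are illustrative, not the original code:

import numpy as np
import keras

class LoggingSequence(keras.utils.Sequence):
    """Yields batches while printing the batch index, which makes the
    extra pre-queued batches after the last epoch visible."""

    def __init__(self, x, y, batch_size, name):
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.name = name
        self.steps_since_epoch_end = 0

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        self.steps_since_epoch_end += 1
        print("%s: batch %d (%d steps since last epoch end)"
              % (self.name, idx, self.steps_since_epoch_end))
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]

    def on_epoch_end(self):
        print("%s: on_epoch_end()" % self.name)
        self.steps_since_epoch_end = 0

With fit_generator fed train and validation instances of such a class, the extra __getitem__ calls after the final on_epoch_end() are the pre-fetched batches for a next epoch that never runs; per the accepted explanation above, they come from the queueing workers, not from a callback.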

Related

TF/Keras: ModelCheckpoint "period" and "save_best_only"

If I use the Keras callback ModelCheckpoint and set save_best_only=True and period=3, how will the model be saved? Does it save the best result from each block of 3 epochs, or does it just save the best one across all epochs?
Piece of code that I used:
mcp = tf.keras.callbacks.ModelCheckpoint("my_model.h5", monitor="val_accuracy",
                                         save_best_only=True, period=3)
First of all, according to the documentation, the period argument is deprecated in favor of the save_freq argument (which, if assigned to an int, counts the number of seen batches rather than epochs). For backwards compatibility, however, the period argument still works.
But to answer your question, we need to inspect the source code of the ModelCheckpoint callback. The best value of the monitored metric seen so far is updated only once period epochs have passed since the last checkpoint. And since that best value is compared only against the monitored metric of the current epoch, we can conclude that only the models at epochs period, 2*period, 3*period, etc. are compared and possibly saved; the performance of the model in between those epochs is ignored.
Setting save_freq=3 will attempt to save the model every 3 batches (the deprecated period=3 counts epochs instead). If you want it to save at the end of every epoch, set save_freq='epoch'.
If save_best_only=True, it will check whether the validation accuracy is higher this time than the best seen so far, and only then save the model. If the validation accuracy is not higher, it will not save the model.
Source: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint#arguments_1
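For reference, a minimal sketch of the non-deprecated spelling, assuming the goal is one checkpoint attempt per epoch with only improvements kept; the fit() call is commented out since the model and data are not shown here:

import tensorflow as tf

# save_freq='epoch' checks the monitored metric at the end of every epoch;
# save_best_only=True overwrites the file only when val_accuracy improves
# on the best value seen so far.
mcp = tf.keras.callbacks.ModelCheckpoint(
    "my_model.h5",
    monitor="val_accuracy",
    save_best_only=True,
    save_freq="epoch")

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=30, callbacks=[mcp])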

Number of train and validation samples is not shown as a return of model.fit

I noticed while editing an old notebook that model.fit (in Keras), with model = Sequential(), always printed the number of train and validation samples (for example: Train on 2508 samples, validate on 250 samples) just before showing the epoch progress. Yet I don't see it when I run the training process again; I immediately see the epoch progress. (Note: verbose is set to 1.)
I even checked keras.io/guides; none of the outputs shown for Sequential.fit() include this line either.
Did that happen due to a new update or do I need to add a certain parameter?
tf.compat.v1.disable_eager_execution()
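If the goal is just to get the old "Train on X samples, validate on Y samples" header back, the line above has to run before the model is built, since it switches fit() back to the TF1-style training loop that prints it. A sketch follows; the layer sizes and data are placeholders, and note that disabling eager execution affects far more than the log output:

import numpy as np
import tensorflow as tf

# Must be called before any model, layer or tensor is created.
tf.compat.v1.disable_eager_execution()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(1000, 10)
y = np.random.rand(1000, 1)

# With eager execution disabled, fit() falls back to the v1 loop and
# prints e.g. "Train on 900 samples, validate on 100 samples" before
# the epoch progress bar.
model.fit(x, y, validation_split=0.1, epochs=1, verbose=1)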

What is Keras doing if my sample size is smaller than my batch size?

I'm fairly new to LSTMs, but I already searched for a solution and could not find anything satisfying or even similar enough.
So here is my problem:
I am dealing with sleep classification and have annotated records for about 6k patients.
To train my bidirectional LSTM, I pick one patient and fit the model on that data, instead of putting the data from all patients into one big matrix, because I want to prevent patient samples from mixing when Keras trains with mini-batches.
The sequence length, or sample_size, is not the same for every patient.
Then I loop over all patients and do an additional loop for the number of epochs I want to train the model for (as described in the Developer Guides).
So, since LSTMs (if not stateful) reset their cell and hidden state after each batch, and the default batch_size for tf.keras.Sequential.fit() is 32, I wanted the batch size to match the sample_size of the patient I am showing to the network. If I do so, I get a warning and the training process errors out after some time. The warning is:
WARNING:tensorflow:6 out of the last 11 calls to .distributed_function at 0x0000023F9D517708> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/beta/tutorials/eager/tf_function#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
So I looked up what my longest sample_size is and set my batch_size accordingly.
tl;dr: What is Keras doing in all the instances where my variable sample_size is not matching my batch_size=max(len(sample_size))?
Is it just showing the available samples to the network?
If so: Why is there the warning mentioned above where setting the batch_size=sample_size leads to the failed training?
Or is it showing the available samples to the network and filling up the rest with zeros to match the given batch_size?
If so: Why is there the necessity of masking when using e.g. stateful mode?
edit:
So, I tried some additional workarounds and built my own data generator, which provides the data of one patient as one batch. I then set steps_per_epoch=len(train_patients) to include all patients in one epoch. No warnings about retracing, which I do not understand either.
It seems to solve my problem of showing one patient per batch without mixing patient data while allowing a variable sample_size, but I really do not understand the differences between all these possibilities and their different warnings.
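For comparison, here is a minimal sketch of the kind of data generator described in the edit: each patient is served as its own batch, so sequence lengths can differ between batches and no patient data is mixed. The names and shapes are illustrative, not the original code:

import numpy as np
import tensorflow as tf

class PatientSequence(tf.keras.utils.Sequence):
    """One patient per batch: every __getitem__ returns arrays of shape
    (1, timesteps_of_that_patient, n_features) and (1, timesteps, n_classes),
    so no padding or masking is needed inside a batch."""

    def __init__(self, records, labels):
        self.records = records   # list of (timesteps_i, n_features) arrays
        self.labels = labels     # list of (timesteps_i, n_classes) arrays

    def __len__(self):
        return len(self.records)   # one step per patient per epoch

    def __getitem__(self, idx):
        x = self.records[idx][np.newaxis, ...]   # add the batch dimension
        y = self.labels[idx][np.newaxis, ...]
        return x, y

# The LSTM then needs a free time dimension, e.g.
# tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True),
#                               input_shape=(None, n_features))
# model.fit(PatientSequence(train_records, train_labels), epochs=10)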

Training Estimators for less than one epoch using the Dataset API?

I am trying to train a model on a large dataset. I would like to run the evaluation step multiple times before one epoch of training has completed. Looking at the implementation of the Dataset API with Estimators, it looks like every time I restart training after the evaluation step, the Estimator creates a fresh dataset from scratch, and training never covers the full data.
I have written an input function very similar to one provided on the tensorflow website.
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Repeat (a single epoch) and batch the examples.
    dataset = dataset.repeat(1).batch(batch_size)
    # Return the read end of the pipeline.
    return dataset
I then use tf.estimator.Estimator.train to call my input function. I call the above input function with the following method:
classifier.train(input_fn=lambda: train_input_fn(features, labels, batch_size),
                 steps=n_steps)
where n_steps is a number smaller than the total number of steps needed to complete one epoch.
I then call an evaluation function like this.
classifier.evaluate(input_fn=lambda: eval_input_fn())
I want to run both steps in a loop.
Every time the loop reaches the training step, it reinitializes the dataset in train_input_fn. This applies training only to the first n_steps of the training data.
If you want to evaluate multiple times during training, you can check InMemoryEvaluatorHook.
You can refer to this discussion about train_and_evaluate and InMemoryEvaluatorHook for more details on how to best use them.
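A rough sketch of the hook-based approach, reusing the classifier and input functions from the question and assuming TF 1.x, where the hook lives in tf.contrib.estimator:

import tensorflow as tf

# Evaluate every 500 training steps inside a single train() call, so the
# training dataset is never restarted between evaluations.
evaluator = tf.contrib.estimator.InMemoryEvaluatorHook(
    classifier,
    input_fn=lambda: eval_input_fn(),
    every_n_iter=500)

classifier.train(
    input_fn=lambda: train_input_fn(features, labels, batch_size),
    hooks=[evaluator])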

What is the relationship between steps and epochs in TensorFlow?

I am going through the TensorFlow getting-started tutorial. In the tf.contrib.learn example, these are two lines of code:
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y, batch_size=4, num_epochs=1000)
estimator.fit(input_fn=input_fn, steps=1000)
I am wondering what the difference is between the argument steps in the call to the fit function and num_epochs in the numpy_input_fn call. Shouldn't there be just one argument? How are they connected?
I have found that the code somehow takes the min of these two as the number of steps in the toy example of the tutorial.
At least one of the two parameters, num_epochs or steps, has to be redundant. We can calculate one from the other. Is there a way I can know how many steps (the number of times the parameters get updated) my algorithm actually took?
I am curious about which one takes precedence. And does it depend on some other parameters?
TL;DR: An epoch is when your model goes through your whole training data once. A step is when your model trains on a single batch (or a single sample if you send samples one by one). Training for 5 epochs on 1000 samples, with 10 samples per batch, will take 500 steps.
The contrib.learn.io module is not documented very well, but it seems that the numpy_input_fn() function takes some numpy arrays and batches them together as input for a classifier. So the number of epochs probably means "how many times to go through the input data I have before stopping". In this case, they feed two arrays of length 4 in 4-element batches, so it just means that the input function will do this at most 1000 times before raising an "out of data" exception. The steps argument in the estimator fit() function is how many times the estimator should run the training loop. This particular example is somewhat perverse, so let me make up another one to make things a bit clearer (hopefully).
Let's say you have two numpy arrays (samples and labels) that you want to train on. They are 100 elements each. You want your training to take batches of 10 samples each, so after 10 batches you will have gone through all of your training data. That is one epoch. If you set your input generator to 10 epochs, it will go through your training set 10 times before stopping, that is, it will generate at most 100 batches.
Again, the io module is not documented, but considering how other input-related APIs in TensorFlow work, it should be possible to make it generate data for an unlimited number of epochs, so the only thing controlling the length of training is going to be steps. This gives you some extra flexibility over how your training progresses. You can go a number of epochs at a time, or a number of steps at a time, or both, or whatever.
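As a concrete sketch of that last point, using the tutorial's variables: if numpy_input_fn is given num_epochs=None it should cycle over the data indefinitely, so only steps limits training (the estimator construction is omitted, as in the tutorial snippet above):

import numpy as np
import tensorflow as tf

x = np.array([1., 2., 3., 4.])
y = np.array([0., -1., -2., -3.])

# num_epochs=None means the input function never raises "out of data",
# so the fit() below would stop purely because steps=1000 is reached.
input_fn = tf.contrib.learn.io.numpy_input_fn(
    {"x": x}, y, batch_size=4, num_epochs=None)

# estimator.fit(input_fn=input_fn, steps=1000)  # estimator as built in the tutorial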
Epoch: One pass through the entire data set.
Batch size: The number of examples seen in one batch.
If there are 1000 examples and the batch size is 100, then there will be 10 steps per epoch.
The epochs and batch size completely define the number of steps:
steps_cal = (num_examples / batch_size) * num_epochs
estimator.fit(input_fn=input_fn)
If you just write the above code, then the number of steps executed is 'steps_cal' from the formula above.
estimator.fit(input_fn=input_fn, steps = steps_less)
If you give a value (say 'steps_less') that is less than 'steps_cal', then only 'steps_less' steps will be executed. In this case, training will not cover the entire number of epochs that was specified.
estimator.fit(input_fn=input_fn, steps = steps_more)
If you give a value (say 'steps_more') that is greater than 'steps_cal', then still only 'steps_cal' steps will be executed.
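A quick worked check of the three cases above; this only reproduces the arithmetic described here, not the internals of estimator.fit():

def steps_cal(num_examples, batch_size, num_epochs):
    """Maximum number of steps the input pipeline can supply."""
    return (num_examples // batch_size) * num_epochs

def steps_executed(num_examples, batch_size, num_epochs, steps=None):
    """Steps actually run by one fit() call: the requested steps,
    capped by what the input pipeline can provide."""
    available = steps_cal(num_examples, batch_size, num_epochs)
    return available if steps is None else min(steps, available)

print(steps_executed(1000, 100, 10))             # 100 -> steps omitted
print(steps_executed(1000, 100, 10, steps=40))   # 40  -> steps_less
print(steps_executed(1000, 100, 10, steps=500))  # 100 -> steps_more is capped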
Let's start in the opposite order:
1) Steps - number of times the training loop in your learning algorithm will run to update the parameters in the model. In each loop iteration, it will process a chunk of data, which is basically a batch. Usually, this loop is based on the Gradient Descent algorithm.
2) Batch size - the size of the chunk of data you feed in each loop of the learning algorithm. You can feed the whole data set, in which case the batch size is equal to the data set size. You can also feed one example at a time. Or you can feed some number N of examples.
3) Epoch - the number of times you run over the data set extracting batches to feed the learning algorithm.
Say you have 1000 examples. Setting batch size = 100, epoch = 1 and steps = 200 gives a process with one pass (one epoch) over the entire data set. In each pass it will feed the algorithm a batch with 100 examples. The algorithm will run 200 steps in each batch. In total, 10 batches are seen. If you change the epoch to 25, then it will do this 25 times, and you get 25x10 batches seen altogether.
Why do we need this? There are many variations on gradient descent (batch, stochastic, mini-batch) as well as other algorithms for optimizing the learning parameters (e.g., L-BFGS). Some of them need to see the data in batches, while others see one datum at a time. Also, some of them include random factors/steps, hence you might need multiple passes on the data to get good convergence.
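Schematically, the three terms map onto a nested loop like the toy one below; it is plain Python, not any particular TensorFlow API, and the parameter update itself is left as a comment:

import numpy as np

def train(x, y, batch_size, num_epochs, max_steps=None):
    """Toy loop showing how epochs, batches and steps relate:
    one step is one parameter update on one batch."""
    step = 0
    for epoch in range(num_epochs):                  # one pass over the data
        for start in range(0, len(x), batch_size):   # one batch per iteration
            batch_x = x[start:start + batch_size]
            batch_y = y[start:start + batch_size]
            # ... compute gradients on (batch_x, batch_y) and update weights ...
            step += 1
            if max_steps is not None and step >= max_steps:
                return step                          # a step limit can end training mid-epoch
    return step

x, y = np.zeros((1000, 5)), np.zeros(1000)
print(train(x, y, batch_size=100, num_epochs=1))                 # 10 steps
print(train(x, y, batch_size=100, num_epochs=25))                # 250 steps
print(train(x, y, batch_size=100, num_epochs=25, max_steps=40))  # 40 steps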
This answer is based on the experimentation I have done on the getting started tutorial code.
Mad Wombat has given a detailed explanation of the terms num_epochs, batch_size and steps. This answer is an extension of his answer.
num_epochs - The maximum number of times the program can iterate over the entire dataset in one train(). Using this argument, we can restrict the number of batches that can be processed during execution of one train() method.
batch_size - The number of examples in a single batch emitted by the input_fn
steps - Number of batches the LinearRegressor.train() method can process in one execution
max_steps is another argument of the LinearRegressor.train() method. This argument defines the maximum number of steps (batches) that can be processed in the LinearRegressor() object's lifetime.
Let's see what this means. The following experiments change two lines of the code provided by the tutorial; the rest of the code remains as is.
Note: For all the examples, assume the number of training samples, i.e. the length of x_train, is 4.
Ex 1:
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, we defined the batch_size = 4 and num_epochs = 2. So, the input_fn can emit just 2 batches of input data for one execution of train(). Even though we defined steps = 10, the train() method stops after 2 steps.
Now, execute estimator.train(input_fn=input_fn, steps=10) again. We can see that 2 more steps have been executed. We can keep executing the train() method again and again; if we execute train() 50 times, a total of 100 steps will have been executed.
Ex 2:
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, the value of batch_size is changed to 2 (it was 4 in Ex 1). Now, in each execution of the train() method, 4 steps are processed. After the 4th step, there are no batches left to run on. If the train() method is executed again, another 4 steps are processed, making a total of 8 steps.
Here, the value of steps doesn't matter because the train() method can get a maximum of 4 batches. For the case where the value of steps is less than (num_epochs x training_size) / batch_size, see Ex 3.
Ex 3:
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=8, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
Now, let batch_size = 2, num_epochs = 8 and steps = 10. The input_fn can emit a total of 16 batches in one run of the train() method. However, steps is set to 10, which means that even though the input_fn can provide 16 batches, train() must stop after 10 steps. Of course, the train() method can be re-executed for more steps cumulatively.
From examples 1, 2, & 3, we can clearly see how the values of steps, num_epoch and batch_size affect the number of steps that can be executed by train() method in one run.
The max_steps argument of the train() method restricts the total number of steps that can be run cumulatively by train().
Ex 4:
If batch_size = 4 and num_epochs = 2, the input_fn can emit 2 batches per train() execution. But if max_steps is set to 20, then no matter how many times train() is executed, only 20 optimization steps will ever run. This is in contrast to Ex 1, where the optimizer can reach 200 steps if the train() method is executed 100 times.
Hope this gives a detailed understanding of what these arguments mean.
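For completeness, the max_steps variant mentioned in Ex 4 would look roughly like this, reusing x_train, y_train and estimator from the earlier examples:

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=2, shuffle=True)

# No matter how many times this call is repeated, optimization stops for
# good once the estimator's global step reaches 20.
estimator.train(input_fn=input_fn, max_steps=20)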
num_epochs: the maximum number of epochs (seeing each data point).
steps: the number of updates (of parameters).
You can update multiple times in an epoch when the batch size is smaller than the number of training examples.