As per the definitions from the documentation:
Batch size: Number of samples per gradient update.
Steps per epoch: Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch.
How are they different, and how do they depend on each other, if at all?
Here is a simple example. Assume that you have 1,000 training samples and you set the batch size to 50. In that case you will need to run 1000/50 = 20 batches of data if you want to go through all of your training data once per epoch, so you set steps_per_epoch = 20. Many people set steps_per_epoch = number_of_train_samples // batch_size. This is a good approximation for going through all your training examples in an epoch, but it covers them EXACTLY once only if batch_size is a divisor of the number of train samples. Below is a piece of code I wrote that determines the batch_size and steps_per_epoch needed to go through the samples EXACTLY once per epoch. In the code, length is the number of samples and b_max is the maximum batch size you will allow based on memory constraints.
# largest batch size that divides length evenly and does not exceed b_max
batch_size = max(length // n for n in range(1, length + 1)
                 if length % n == 0 and length // n <= b_max)
steps = length // batch_size
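As a quick self-contained check of the snippet above (assuming, for illustration, length = 1000 and b_max = 64):

```python
length, b_max = 1000, 64

# largest batch size that divides length evenly and does not exceed b_max
batch_size = max(length // n for n in range(1, length + 1)
                 if length % n == 0 and length // n <= b_max)
steps = length // batch_size

print(batch_size, steps)  # 50 20
```

Note that batch_size * steps == 1000, so every sample is seen exactly once per epoch.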
For training, going through the training set exactly once isn't that important if you shuffle your data. However, for validation and test, going through the validation set or the test set exactly once is important to get a precisely true result.
I am training a GAN with text data. When I train the discriminator, I randomly sample m positive data from the dataset and generate m negative data with the generator. I found that many papers mention implementation details such as the number of training epochs. About the training epochs, I have a question about sampling positive data:
Sample from the dataset (maybe shuffled) in order; when the whole dataset is covered, we call that 1 epoch.
Just as I did: randomly sample positive data; when the total amount of sampled data equals the size of the dataset, we call that 1 epoch.
Which one is right? Which one is commonly used? Which one is better?
In my opinion, an epoch is when you have passed through the whole training data once, and I think that is also what the papers mean when they mention an epoch: a pass through the whole training set.
However, an epoch can also be defined as having processed k elements, where k can be smaller than n (the size of the training set). Such a definition can make sense when you want to get some evaluation of your model on the dev set, which you normally do after every single epoch.
After all, that is my opinion and my understandings of GAN papers.
Good luck!
I have a large training dataset created by a generator, about 60,000 batches (size 32). Due to the time required for training, I need to use a callback to save the model periodically. However, I want to save it more frequently than once per epoch of 60,000, because that takes about 2 hours on Colab.
As I understand it, setting steps_per_epoch will give me smaller epochs, say 10,000 steps each. What is not clear to me from the documentation is whether this will still cycle through my whole 60k batches, or whether it will stop at 10k and just repeat that same 10k. I.e., does a new epoch start from where the last one left off when using steps_per_epoch?
Thanks, Julian
While I don't know about that option specifically, it wouldn't reuse old data, because a generator can only be consumed forwards. To repeat the data, it would have to store a copy of everything it has already consumed, since you can't rewind a generator, and that wouldn't be practical on a large dataset. So a new epoch continues from where the last one left off.
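A minimal sketch of this behavior in plain Python (integers standing in for batches): pulling a fixed number of items per "epoch" from one generator simply continues where the previous epoch stopped.

```python
def batch_stream():
    """Stand-in for a data generator; yields batch indices forever."""
    i = 0
    while True:
        yield i
        i += 1

stream = batch_stream()
epoch_1 = [next(stream) for _ in range(3)]  # steps_per_epoch = 3
epoch_2 = [next(stream) for _ in range(3)]  # continues, does not restart

print(epoch_1, epoch_2)  # [0, 1, 2] [3, 4, 5]
```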
In Keras, both model.fit and model.predict have a batch_size parameter. My understanding is that the batch size in model.fit relates to batch optimization; what is the physical meaning of batch_size in model.predict? Does it need to be equal to the one used by model.fit?
No, it doesn't. Imagine there is a function inside your model which increases the amount of memory significantly. You might therefore run into resource errors if you try to predict all your data in one go; this is often the case when predicting on a GPU with limited memory. So instead you predict only small batches at a time. The batch_size parameter in the predict function will not alter your results in any way, so you can choose any batch_size you want for prediction.
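A small NumPy sketch of why this holds (predict_in_batches and the doubling function are made up for illustration): batching only chunks the forward pass, so for a stateless model the concatenated outputs are identical regardless of chunk size.

```python
import numpy as np

def predict_in_batches(f, x, batch_size):
    """Apply f to x chunk by chunk and concatenate the outputs."""
    chunks = [f(x[i:i + batch_size]) for i in range(0, len(x), batch_size)]
    return np.concatenate(chunks)

forward = lambda chunk: chunk * 2.0  # stand-in for a stateless model
x = np.arange(10.0)

a = predict_in_batches(forward, x, batch_size=3)
b = predict_in_batches(forward, x, batch_size=10)
print(np.array_equal(a, b))  # True: only memory use per step differs
```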
It depends on your model and whether the batch size when training must match the batch size when predicting. For example, if you're using a stateful LSTM then the batch size matters because the entire sequence of data is spread across multiple batches, i.e. it's one long sequence that transcends the batches. In that case the batch size used to predict should match the batch size when training because it's important they match in order to define the whole length of the sequence. In stateless LSTM, or regular feed-forward perceptron models the batch size doesn't need to match, and you actually don't need to specify it for predict().
Just to add: this is different from predict_on_batch(), where you supply a batch of input samples and get an equal number of prediction outputs in a single call. So, if you create a batch of 100 samples and submit it to predict_on_batch(), you get 100 predictions, i.e. one for each sample. This can have performance benefits over issuing samples one at a time to predict().
As said above, batch size sets the number of training samples fed in at one go (a batch). Increasing it raises the chance of your computer running out of resources, assuming you are running on your personal computer. If you are running on the cloud with more resources, you should be fine. You can adjust the number as you want, but don't jump straight to a big number; I suggest increasing it slowly. Also, you may want to read this before you increase your batch size:
https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network
I am going through the TensorFlow Getting Started tutorial. In the tf.contrib.learn example, these are two lines of code:
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y, batch_size=4, num_epochs=1000)
estimator.fit(input_fn=input_fn, steps=1000)
I am wondering what the difference is between the steps argument in the call to the fit function and num_epochs in the numpy_input_fn call. Shouldn't there be just one argument? How are they connected?
I have found that the code somehow takes the min of these two as the number of steps in the toy example of the tutorial.
At least one of the two parameters, num_epochs or steps, has to be redundant: we can calculate one from the other. Is there a way I can know how many steps (the number of times the parameters get updated) my algorithm actually took?
I am curious about which one takes precedence. And does it depend on some other parameters?
TL;DR: An epoch is when your model goes through your whole training data once. A step is when your model trains on a single batch (or a single sample if you send samples one by one). Training for 5 epochs on 1000 samples with 10 samples per batch will take 500 steps.
The contrib.learn.io module is not documented very well, but it seems that the numpy_input_fn() function takes some numpy arrays and batches them together as input for a classifier. So, the number of epochs probably means "how many times to go through the input data I have before stopping". In this case, they feed two arrays of length 4 in 4-element batches, so it just means that the input function will do this at most 1000 times before raising an "out of data" exception. The steps argument in the estimator fit() function is how many times the estimator should run the training loop. This particular example is somewhat perverse, so let me make up another one to make things a bit clearer (hopefully).
Let's say you have two numpy arrays (samples and labels) that you want to train on, 100 elements each. You want your training to take batches of 10 samples per batch. So after 10 batches you will have gone through all of your training data; that is one epoch. If you set your input generator to 10 epochs, it will go through your training set 10 times before stopping, that is, it will generate at most 100 batches.
Again, the io module is not documented, but considering how other input-related APIs in TensorFlow work, it should be possible to make it generate data for an unlimited number of epochs, so that the only thing controlling the length of training is the number of steps. This gives you some extra flexibility in how you want your training to progress: you can go a number of epochs at a time, or a number of steps at a time, or both.
Epoch: One pass through the entire data.
Batch size: The number of examples seen in one batch.
If there are 1000 examples and the batch size is 100, then there will be 10 steps per epoch.
The epochs and batch size completely define the number of steps:
steps_cal = (number of examples / batch_size) * number of epochs
estimator.fit(input_fn=input_fn)
If you just write the above code, then the value of 'steps' is as given by 'steps_cal' in the above formula.
estimator.fit(input_fn=input_fn, steps = steps_less)
If you give a value (say 'steps_less') less than 'steps_cal', then only 'steps_less' steps will be executed. In this case, the training will not cover the entire number of epochs that were mentioned.
estimator.fit(input_fn=input_fn, steps = steps_more)
If you give a value (say 'steps_more') greater than 'steps_cal', then still only 'steps_cal' steps will be executed.
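The precedence above can be sketched in a few lines of plain Python (the helper name is made up for illustration):

```python
def effective_steps(num_examples, batch_size, num_epochs, steps=None):
    """Steps actually executed by one fit() call under the rules above."""
    steps_cal = (num_examples // batch_size) * num_epochs
    return steps_cal if steps is None else min(steps, steps_cal)

print(effective_steps(1000, 100, 2))            # 20 (steps omitted: steps_cal)
print(effective_steps(1000, 100, 2, steps=5))   # 5  (steps_less wins)
print(effective_steps(1000, 100, 2, steps=50))  # 20 (capped at steps_cal)
```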
Let's start in the opposite order:
1) Steps - the number of times the training loop in your learning algorithm will run to update the parameters in the model. In each loop iteration, it processes a chunk of data, which is basically a batch. Usually, this loop is based on the gradient descent algorithm.
2) Batch size - the size of the chunk of data you feed in each loop of the learning algorithm. You can feed the whole data set, in which case the batch size equals the data set size. You can also feed one example at a time, or some number N of examples.
3) Epoch - the number of times you run over the data set extracting batches to feed the learning algorithm.
Say you have 1000 examples. Setting batch size = 100 and epochs = 1 gives one pass over the entire data set: the input pipeline feeds the algorithm 10 batches of 100 examples each, so one epoch corresponds to 10 steps. If you change the epochs to 25, it will do this 25 times, and 25 x 10 = 250 batches (steps) are seen altogether.
Why do we need this? There are many variations on gradient descent (batch, stochastic, mini-batch) as well as other algorithms for optimizing the learning parameters (e.g., L-BFGS). Some of them need to see the data in batches, while others see one datum at a time. Also, some of them include random factors/steps, hence you might need multiple passes on the data to get good convergence.
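As a concrete sketch of these three quantities (a toy least-squares problem, not any particular library's API), mini-batch gradient descent with 1000 examples, batch size 100, and 25 epochs performs 250 update steps:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                      # noiseless targets for the toy problem

w = np.zeros(3)
batch_size, num_epochs, lr = 100, 25, 0.1
steps = 0
for _ in range(num_epochs):                     # 25 epochs (passes over data)
    for i in range(0, len(X), batch_size):      # 10 batches per epoch
        xb, yb = X[i:i + batch_size], y[i:i + batch_size]
        grad = 2.0 / batch_size * xb.T @ (xb @ w - yb)
        w -= lr * grad                          # one step = one update
        steps += 1

print(steps)  # 250
```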
This answer is based on the experimentation I have done on the Getting Started tutorial code.
Mad Wombat has given a detailed explanation of the terms num_epochs, batch_size and steps. This answer is an extension of his answer.
num_epochs - The maximum number of times the program can iterate over the entire dataset in one train() call. Using this argument, we can restrict the number of batches that can be processed during one execution of the train() method.
batch_size - The number of examples in a single batch emitted by the input_fn
steps - Number of batches the LinearRegressor.train() method can process in one execution
max_steps is another argument of the LinearRegressor.train() method. This argument defines the maximum number of steps (batches) that can be processed in the LinearRegressor() object's lifetime.
Let's see what this means. The following experiments change two lines of the code provided by the tutorial; the rest of the code remains as-is.
Note: For all the examples, assume the number of training samples, i.e. the length of x_train, to be equal to 4.
Ex 1:
input_fn = tf.estimator.inputs.numpy_input_fn(
{"x": x_train}, y_train, batch_size=4, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, we defined the batch_size = 4 and num_epochs = 2. So, the input_fn can emit just 2 batches of input data for one execution of train(). Even though we defined steps = 10, the train() method stops after 2 steps.
Now, execute estimator.train(input_fn=input_fn, steps=10) again. We can see that 2 more steps have been executed. We can keep executing the train() method again and again; if we execute train() 50 times, a total of 100 steps will have been executed.
Ex 2:
input_fn = tf.estimator.inputs.numpy_input_fn(
{"x": x_train}, y_train, batch_size=2, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, the value of batch_size is changed to 2 (it was equal to 4 in Ex 1). Now, in each execution of train() method, 4 steps are processed. After the 4th step, there are no batches to run on. If the train() method is executed again, another 4 steps are processed making it a total of 8 steps.
Here, the value of steps doesn't matter, because the train() method can get a maximum of 4 batches. To see what happens when the value of steps is less than (num_epochs x training_size) / batch_size, see Ex 3.
Ex 3:
input_fn = tf.estimator.inputs.numpy_input_fn(
{"x": x_train}, y_train, batch_size=2, num_epochs=8, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
Now, let batch_size = 2, num_epochs = 8 and steps = 10. The input_fn can emit a total of 16 batches in one run of the train() method. However, steps is set to 10, which means that even though input_fn can provide 16 batches for execution, train() must stop after 10 steps. Of course, the train() method can be re-executed to accumulate more steps.
From examples 1, 2, & 3, we can clearly see how the values of steps, num_epoch and batch_size affect the number of steps that can be executed by train() method in one run.
The max_steps argument of the train() method restricts the total number of steps that can be run cumulatively by train().
Ex 4:
If batch_size = 4 and num_epochs = 2, the input_fn can emit 2 batches for one train() execution. But if max_steps is set to 20, then no matter how many times train() is executed, only 20 optimization steps will run. This is in contrast to Ex 1, where the optimizer could reach 200 steps if the train() method were executed 100 times.
Hope this gives a detailed understanding of what these arguments mean.
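The three experiments above follow one rule, sketched below in plain Python (the helper name is made up): a single train() call runs min(steps, batches the input_fn can still emit).

```python
def steps_per_train_call(training_size, batch_size, num_epochs, steps):
    """Steps executed by a single train() call in the experiments above."""
    available = (training_size * num_epochs) // batch_size  # batches input_fn can emit
    return min(steps, available)

print(steps_per_train_call(4, 4, 2, steps=10))  # Ex 1 -> 2
print(steps_per_train_call(4, 2, 2, steps=10))  # Ex 2 -> 4
print(steps_per_train_call(4, 2, 8, steps=10))  # Ex 3 -> 10
```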
num_epochs: the maximum number of epochs (passes over each data point).
steps: the number of updates (of the parameters).
You can update multiple times per epoch when the batch size is smaller than the number of training samples.