Why does Keras accept a batch size option for model.evaluate? - tensorflow

Why does the evaluate function of the Keras API in TensorFlow accept a batch_size? To my knowledge, this parameter should only be relevant for controlling how many samples are used per iteration during training. What influence does this choice have during model evaluation?

Batch size matters mainly for sequence-based or time-series prediction.
These are the cases where you have to set a batch size during prediction:
In time-series use cases it may be desirable to use a large batch size when training the network and a batch size of 1 when making predictions, in order to predict the next step in the sequence.
A stateful RNN requires a fixed batch size during prediction/evaluation, because the output state of the current batch is used as the initial state for the next batch; the state carries information from one batch to the next.
If your model doesn't fall into one of these categories, you technically don't need to provide a batch size when evaluating. If you do provide one, it only controls how many samples are fed to the GPU at a time.
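To illustrate the last point, here is a minimal NumPy sketch (not Keras itself) of evaluating a stateless model batch by batch. The helper evaluate_in_batches, the linear "model" and the MSE loss are all made up for the example; the only point is that the size-weighted average of per-batch losses equals the loss over the whole set, whatever the batch size.

```python
import numpy as np

def evaluate_in_batches(loss_fn, x, y, batch_size):
    """Mean per-sample loss, computed one batch at a time."""
    total, n = 0.0, len(x)
    for start in range(0, n, batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Weight each batch by its size so a short final batch is not over-counted.
        total += loss_fn(xb, yb) * len(xb)
    return total / n

rng = np.random.default_rng(0)
x = rng.normal(size=(103, 4))   # 103 samples, deliberately not a multiple of the batch sizes
y = rng.normal(size=103)
w = rng.normal(size=4)          # hypothetical fixed weights of a stateless linear model

def mse(xb, yb):
    return float(np.mean((xb @ w - yb) ** 2))

full = mse(x, y)
for bs in (1, 32, 103):
    # Same loss regardless of batch size; only memory use per step changes.
    assert np.isclose(evaluate_in_batches(mse, x, y, bs), full)
```

So for a stateless model the batch size in evaluation is a memory/throughput knob, not something that changes the metric.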

Related

Batch normalisation during testing

I am working on a 2D time series problem with input size 140*6 for binary classification using a CNN. I have not used any scaling or normalising techniques; instead I fed the data directly to a CNN with 3 hidden layers and batch normalisation layers, with batch size 256 during training. Since I also have to test it in real time with batch size 1, how would batch normalisation work then, given that I have not calculated any mean or standard deviation for any layer during training? And should batch normalisation be applied in the forward pass during final testing, or should the mean and standard deviation only be calculated during training and then reused?
Batch normalization is not used during testing. The reason is that batch normalization is used to alleviate the problem of covariate shift between different batches in the training data. Covariate shift leads to badly trained models, which is why we use it. It has no role to play during testing.
And if you have used batch normalization with batch size 1, then, that is simply instance normalization.
This question was asked two years ago, but I don't think the accepted answer is correct! Batch normalization IS used during testing (at least, you keep the batch normalisation LAYERS), but with the saved running averages of mean and variance from the training data. So it is not actual batch normalisation during testing but rather a linear transformation with the saved training statistics. Therefore, if you are testing with a batch size of 1, you would just use the saved running averages of the training data.
The following thread answers the question: Batch normalization during testing
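A rough NumPy sketch of that mechanism, in case it helps. The class BatchNorm1D is a toy written for this example (no learned scale/shift), and the momentum and epsilon values are just assumptions. It shows running averages being collected during training and then reused at inference, so a batch of 256 and 256 batches of size 1 give the same outputs.

```python
import numpy as np

class BatchNorm1D:
    """Toy batch-norm layer (no learned scale/shift), for illustration only."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training):
        if training:
            # Training: normalize with the statistics of the current batch...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ...and update the running averages that inference will use later.
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Inference: a fixed linear transform with the saved statistics.
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
bn = BatchNorm1D(6)
for _ in range(500):  # simulated training steps with batch size 256
    bn(rng.normal(loc=5.0, scale=2.0, size=(256, 6)), training=True)

test_data = rng.normal(loc=5.0, scale=2.0, size=(256, 6))
batched = bn(test_data, training=False)
one_by_one = np.vstack([bn(row[None, :], training=False) for row in test_data])
assert np.allclose(batched, one_by_one)  # batch size 1 gives the same result
```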

Difference between fit and evaluate in Keras

I used 100000 samples to train a general model in Keras and achieved good performance. Then, for one particular sample, I want to use the trained weights as initialization and continue optimizing them to further reduce the loss on that sample.
However, a problem occurred. First, I load the trained weights with the Keras API, then I evaluate the loss on the particular sample, and it is close to the validation loss seen during training. I think this is normal. However, when I use the trained weights as the initial weights and further optimize them on that one sample with model.fit(), the loss is really strange. It is much higher than the evaluate result and gradually becomes normal after several epochs.
I find it strange that, for the same sample and the same loaded weights, model.fit() and model.evaluate() return different results. I used batch normalization layers in my model and I wonder whether that may be the reason. The result of model.evaluate() seems normal, as it is close to what I have seen on the validation set before.
So what causes the difference between fit and evaluate? How can I solve it?
I think your core issue is that you are observing two different loss values during fit and evaluate. This has been extensively discussed here, here, here and here.
The fit() function loss includes contributions from:
Regularizers: L1/L2 regularization loss will be added during training, increasing the loss value
Batch norm variations: during batch norm, running mean and variance of the batch will be collected and then those statistics will be used to perform normalization irrespective of whether batch norm is set to trainable or not. See here for more discussion on that.
Multiple batches: Of course, the training loss will be averaged over multiple batches. So if you take the average over the first 100 batches and evaluate on the 100th batch only, the results will be different.
Whereas evaluate just does forward propagation and gives you the loss value; there is nothing random here.
The bottom line is that you should not compare train and validation loss (or fit and evaluate loss). Those functions do different things. Look at other metrics to check whether your model is training fine.
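For the batch norm point specifically, a small NumPy sketch may make the gap concrete. All numbers here are invented: the stored running statistics, the sample shape and its distribution are assumptions for illustration. The same input is normalized once with stored statistics (inference mode, which is what evaluate uses) and once with its own statistics (training mode, which is what fit uses), and the layer outputs differ even though the weights are identical.

```python
import numpy as np

def normalize(x, mean, var, eps=1e-3):
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Hypothetical running statistics saved from training on the large dataset.
running_mean, running_var = np.full(6, 5.0), np.full(6, 4.0)

# One sample of shape (timesteps, channels) whose own statistics differ
# from the running averages.
sample = rng.normal(loc=8.0, scale=0.5, size=(50, 6))

# Inference mode: normalize with the stored statistics.
eval_out = normalize(sample, running_mean, running_var)

# Training mode: normalize with the statistics of the current batch,
# which here is just this one sample.
fit_out = normalize(sample, sample.mean(axis=0), sample.var(axis=0))

# Same weights, same input, different activations, hence a different loss.
assert not np.allclose(eval_out, fit_out)
```

Everything downstream of the batch norm layer sees a different input distribution in the two modes, which is one plausible reason the first fit() losses look off until the weights adapt.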

batch size in model.fit and model.predict

In Keras, both model.fit and model.predict have a batch_size parameter. My understanding is that batch size in model.fit is related to batch optimization; what is the physical meaning of batch_size in model.predict? Does it need to be equal to the one used by model.fit?
No, it doesn't. Imagine that inside your model there is a function which increases the amount of memory significantly. You might then run into resource errors if you try to predict all your data in one go. This is often the case when you predict on a GPU with limited memory. So instead you choose to predict only small batches at a time. The batch_size parameter in the predict function will not alter your results in any way, so you can choose any batch_size you want for prediction.
Whether the batch size when predicting must match the batch size when training depends on your model. For example, if you're using a stateful LSTM then the batch size matters, because the entire sequence of data is spread across multiple batches, i.e. it's one long sequence that transcends the batches. In that case the batch size used to predict should match the batch size used when training, since together they define the whole length of the sequence. In a stateless LSTM or a regular feed-forward perceptron model the batch size doesn't need to match, and you actually don't need to specify it for predict().
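A toy NumPy sketch of the stateful case. StatefulToyRNN is a made-up class, not the Keras LSTM; it only mimics the idea of keeping one state row per batch slot between calls. It shows that a long sequence fed in consecutive chunks gives the same result as feeding it whole, and that the stored state ties the model to one batch size.

```python
import numpy as np

class StatefulToyRNN:
    """Minimal recurrent cell that keeps its hidden state between calls."""

    def __init__(self, batch_size, units=4, features=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(size=(features, units))
        self.w_rec = rng.normal(size=(units, units))
        # One state row per batch slot: this is what fixes the batch size.
        self.state = np.zeros((batch_size, units))

    def __call__(self, x):  # x has shape (batch, timesteps, features)
        if x.shape[0] != self.state.shape[0]:
            raise ValueError("batch size must match the stored state")
        h = self.state
        for t in range(x.shape[1]):
            h = np.tanh(x[:, t] @ self.w_in + h @ self.w_rec)
        self.state = h  # carried over to the next call
        return h

rng = np.random.default_rng(1)
seq = rng.normal(size=(2, 20, 3))  # 2 parallel sequences, 20 timesteps each

whole = StatefulToyRNN(batch_size=2)(seq)

chunked_model = StatefulToyRNN(batch_size=2)
chunked_model(seq[:, :10])               # first half, state is kept
chunked = chunked_model(seq[:, 10:])     # second half continues from it
assert np.allclose(whole, chunked)
```

Calling chunked_model with a batch of 3 sequences would raise the ValueError above, because there is no stored state for the third slot.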
Just to add: this is different from predict_on_batch(), where you supply one batch of input samples and get an equal number of prediction outputs. So if you create a batch of 100 samples and submit it to predict_on_batch(), you get 100 predictions, i.e. one for each sample. This can have performance benefits over issuing one sample at a time to predict().
As said above, batch size just sets how many samples are fed in at one go (one batch). Increasing it raises the chance of running out of resources, assuming you are running on your personal computer. If you are running in the cloud with more resources, you should be fine. You can adjust the number as you want, but don't start with a big number; I suggest going up slowly. Also, you may want to read this before you increase your batch size:
https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network

Does batch normalization work on balanced dataset?

I trained a classification network using TensorFlow with batch normalization in every convolutional layer. When I predict on a balanced test set that includes every category, the accuracy is normal. However, if I pick any one specific category from the test set, the accuracy is low, even zero.
But when 3 categories are included in the test set, the accuracy becomes higher. As we all know, the weights are fixed once the model has finished training. Yet I find that the balance of the test set greatly influences prediction accuracy.
I suspected that batch normalization has an influence on this, so I removed all batch normalization and retrained the model. This time, when I predict on pictures from only one category, the accuracy is normal.
Does anyone know why? Thanks!
You're right. If your training set is unbalanced you compute and accumulate mean values (for every layer) that are skewed in favor of the majority class.
In fact, you're not "normalizing" but instead, you're making the unbalancing problem worse.
Use batch normalization when you have a balanced training set and you can be sure that your batches will contain a balanced number of samples. This gives you optimal results.
However, since you added in the comments that you're using tf.contrib.layers.conv2d(x, num_output, kernel_size, stride, padding, activation_fn, normalizer_fn=tf.contrib.layers.batch_norm), I spotted the problem: normalizer_fn calls the function you pass (batch_norm), but with its default parameters. By default, is_training is True, so you're computing the mean and the variance over the batch even during the test phase. Read the documentation of tf.contrib.layers.conv2d carefully and use normalizer_params to pass is_training=True when training and is_training=False when testing/validating.
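To see why leaving is_training at True hurts on a one-category test set, here is a small NumPy sketch. It is a toy, not tf.contrib: the single feature, the class means of -2 and +2, the sign-based "classifier" and the batch_norm helper are all assumptions made up for the example.

```python
import numpy as np

def batch_norm(x, is_training, running_mean, running_var, eps=1e-3):
    if is_training:
        mean, var = x.mean(), x.var()          # statistics of the current batch
    else:
        mean, var = running_mean, running_var  # statistics saved from training
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Balanced training data: class 0 centered at -2, class 1 centered at +2.
train = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(2, 1, 5000)])
running_mean, running_var = train.mean(), train.var()

# Test batch that contains only class 1.
test_batch = rng.normal(2, 1, 1000)

def accuracy_class1(normalized):
    # Toy classifier: predict class 1 when the normalized feature is positive.
    return float(np.mean(normalized > 0))

acc_inference = accuracy_class1(
    batch_norm(test_batch, False, running_mean, running_var))
acc_batch_stats = accuracy_class1(
    batch_norm(test_batch, True, running_mean, running_var))

# With saved statistics almost every sample is classified correctly; with
# batch statistics the one-class batch is re-centered around zero and the
# class signal is lost.
assert acc_inference > acc_batch_stats
```

With a balanced test batch the batch mean is close to the training mean, which would explain why the accuracy looks normal there and collapses only on single-category batches.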

How to partition a single batch into many invocations to save memory

I have a somewhat big model that can only be trained on a GPU with a small batch size, but I need to use a larger batch size (from other experiments, I know this improves final accuracy and convergence time).
Caffe provides a nice solution to this problem through the 'iter_size' option, which splits a batch into n smaller batches, accumulates the n gradients, and then updates once.
How can this be implemented efficiently in TensorFlow?
You could use smaller batches, compute the gradients manually, and then add them up and apply them at once. For example, if you want a batch size of 100, compute gradients for 10 batches of 10, then add the gradients and apply them. This is explained here.
You can use the tf.gradients() op to compute the gradients for each batch separately and add them. Then use the apply_gradients() method on whatever optimizer you want to perform the training step.
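The accumulate-then-apply idea can be sketched in plain NumPy (this is not TensorFlow code; the linear model, the MSE loss and the helper mse_gradient are assumptions chosen so the gradient can be written by hand). Averaging the gradients of 10 micro-batches of 10 gives the same gradient as one batch of 100, so a single update with the accumulated gradient is equivalent to the large-batch step.

```python
import numpy as np

def mse_gradient(w, x, y):
    """Gradient of the mean squared error of a linear model x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = np.zeros(5)

# Accumulate gradients over 10 micro-batches of 10 samples each.
n_micro = 10
accumulated = np.zeros_like(w)
for xb, yb in zip(np.split(x, n_micro), np.split(y, n_micro)):
    accumulated += mse_gradient(w, xb, yb)
# Average so the result matches a mean loss over all 100 samples;
# summing instead only rescales the step by a constant factor.
accumulated /= n_micro

full_batch = mse_gradient(w, x, y)
assert np.allclose(accumulated, full_batch)

# One weight update for the whole effective batch of 100.
learning_rate = 0.1
w = w - learning_rate * accumulated
```

In TensorFlow the per-micro-batch gradients would come from tf.gradients() and the single update from the optimizer's apply_gradients(), as described above; only one micro-batch of activations has to fit in GPU memory at a time.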