Keras Batchnormalization, differing results in trainin and evaluation on training dataset

Keras Batchnormalization, differing results in trainin and evaluation on training dataset - tensorflow

I'm am training a CNN, for the sake of debugging a my problem I am working on a small subset of the actual training data.
During training the loss and accuracy seem very reasonable and pretty good. (In the example I used the same small subset for validation, the problem shows here already)
Fit on x_train and validate on x_train, using batch_size=32
Epoch 10/10
1/10 [==>...........................] - ETA: 2s - loss: 0.5126 - acc: 0.7778
2/10 [=====>........................] - ETA: 1s - loss: 0.3873 - acc: 0.8576
3/10 [========>.....................] - ETA: 1s - loss: 0.3447 - acc: 0.8634
4/10 [===========>..................] - ETA: 1s - loss: 0.3320 - acc: 0.8741
5/10 [==============>...............] - ETA: 0s - loss: 0.3291 - acc: 0.8868
6/10 [=================>............] - ETA: 0s - loss: 0.3485 - acc: 0.8848
7/10 [====================>.........] - ETA: 0s - loss: 0.3358 - acc: 0.8879
8/10 [=======================>......] - ETA: 0s - loss: 0.3315 - acc: 0.8863
9/10 [==========================>...] - ETA: 0s - loss: 0.3215 - acc: 0.8885
10/10 [==============================] - 3s - loss: 0.3106 - acc: 0.8863 - val_loss: 1.5021 - val_acc: 0.2707
When I evaluate on the same training dataset however the accuracy is really off from what I saw during training ( I would expect it to be at least as good as during training on the same dataset).
When evaluating straight forward or using
K.set_learning_phase(0)
I get, similar to the validation (Evaluating on x_train using batch_size=32):
Evaluation Accuracy: 0.266318537392, Loss: 1.50756853772
If I set the backend to learning phase the results get pretty good again, so the per batch normalization seems to work well. I suspect that the cumulated mean and variance are not properly being used.
So after
K.set_learning_phase(1)
I get (Evaluating on x_train using batch_size=32):
Evaluation Accuracy: 0.887728457507, Loss: 0.335956037511
I added the the batchnormalization layer after the first convolutional layer like this:
model = models.Sequential()
model.add(Conv2D(80, first_conv_size, strides=2, activation="relu", input_shape=input_shape, padding=padding_name))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(first_max_pool_size, strides=4, padding=padding_name))
...
Further down the line I would also have some dropout layers, which I removed to investigate the Batchnormalization behavior. My intend would be to use the model in non-training phase for normal prediction.
Shouldn't it work like that, or am I missing some additional configuration?
Thanks!
I'm using keras 2.0.8 with tensorflow 1.1.0 (anaconda)

This is really annoying. When you set the learning_phase to be True - a BatchNormalization layer is getting normalization statistics straight from data what might be a problem when you have a small batch_size. I came across similar issue some time ago - and here you have my solution:
When building a model - add an option if the model would predict in either learning or not-learning phase and in this used in learning phase use the following class instead of BatchNormalization:
class NonTrainableBatchNormalization(BatchNormalization):
"""
This class makes possible to freeze batch normalization while Keras
is in training phase.
"""
def call(self, inputs, training=None):
return super(
NonTrainableBatchNormalization, self).call(inputs, training=False)
Once you train your model - reset its weights to a NonTrainable copy:
learning_phase_model.set_weights(learned_model.get_weights())
Now you can fully enjoy using BatchNormalization in a learning_phase.

Related

How do you add additional layers to a TensorFlow Neural Network?

How do you add additional layers to a TensorFlow Neural Network and know the additional layer will not be an overfit??? It seems that 2 layers wont be very helpful however it did give me a 91% accuracy and I wanted a 100% accuracy. So I wanted to add 5 to 10 additional layers and try and "overfit" the neural network. Would an overfit always give 100% accuracy on a training set?
The basic building block of a neural network is the layer.
I'm using the model example from https://www.tensorflow.org/tutorials/keras/classification
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10)
])
model.fit(train_images, train_labels, epochs=10)
The first layer in this network, transforms the format of the images from a two-dimensional array (of 28 by 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data.
Currently this example after the pixels are flattened, the network consists of a sequence of two tf.keras.layers.Dense layers or fully connected, neural layers. The first Dense layer has 128 nodes (or neurons). The second (and last) layer returns a array with length of 10.
QUESTION: I wanted to start by adding ONE additional layer and then overfit with say 5 layers. How do manually add an additional layer and fit this layer? can I specify 5 additional layers without having to specify each layer? Whats a typical estimate for "overfit" on a image data set with a given size of say 30x30 pixels?
Adding One Addtional Layer gave me the same accuracy.
Epoch 1/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.4866 - accuracy: 0.8266
Epoch 2/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.3619 - accuracy: 0.8680
Epoch 3/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.3278 - accuracy: 0.8785
Epoch 4/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.3045 - accuracy: 0.8874
Epoch 5/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.2885 - accuracy: 0.8929
Epoch 6/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.2727 - accuracy: 0.8980
Epoch 7/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.2597 - accuracy: 0.9014
Epoch 8/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.2475 - accuracy: 0.9061
Epoch 9/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.2386 - accuracy: 0.9099
Epoch 10/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2300 - accuracy: 0.9125

You can add layers to a neural network as follows.
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10)
])
Overfitting:
Overfitting occurs when a model fits exactly against its training data. When this happens, the algorithm unfortunately cannot perform accurately against unseen data. For example, if our model saw 95% accuracy on the training set but only 45% accuracy on the test set then the model is overfitting against its training data. It does not always give 100% accuracy on a training set.
It can be identified by checking validation metrics such as accuracy and loss.When the model is impacted by overfitting, the validation measures typically rise until a point at which they stagnate or start to decline. The model searches for a good fit during an upward trend, and when it finds one, the trend begins to decline or stagnate. For more information, please refer this.

Tensorflow binary classification training loss won't decrease, accuracy stuck at around 50%

I'm fairly new to this, and could use some advice on where to go from here.
I'm using tensorflow 2.3.0 with keras to build a binary classification model. I am unable to share the dataset since it's proprietary data owned by my company, but the features are all numerical financial data, representing a sort of histogram for a customer.
I've tried two models, one with 300 features and one with 600, the one with 600 simply representing a longer history. The features are normalized first, and the labels are all 0 or 1, to indicate whether the account should be flagged or not.
I have 500,000 training samples, and 60,000 test samples. The 0/1 label split is roughly half.
This is the code I have currently:
import pandas as pd
import numpy as np
# Make numpy values easier to read.
np.set_printoptions(precision=3, suppress=True)
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import utils
features = pd.read_csv('train.csv')
labels = np.array(features.pop('target'))
features = np.array(features)
num_features = features.shape[1]
features = utils.normalize(features)
model = tf.keras.Sequential([
layers.Dense(512, activation='relu', input_shape=(num_features,)),
layers.Dropout(0.5),
layers.Dense(512, activation='relu'),
layers.Dropout(0.5),
layers.Dense(512, activation='relu'),
layers.Dropout(0.5),
layers.Dense(1, activation='sigmoid')
])
model.compile(loss = tf.losses.BinaryCrossentropy(), optimizer = tf.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])
model.fit(features, labels, epochs=100)
This is likely the wrong topology, it's just my most recent attempt. I have tried a few different topologies - ranging from tiny single-layer networks with a small number of units to what you see here. I have tried different learning rates and epochs, and with or without the dropout. All of them give basically this same pattern:
Epoch 1/100
15625/15625 [==============================] - 46s 3ms/step - loss: 0.6932 - accuracy: 0.5113
Epoch 2/100
15625/15625 [==============================] - 46s 3ms/step - loss: 0.6929 - accuracy: 0.5127
Epoch 3/100
15625/15625 [==============================] - 46s 3ms/step - loss: 0.6929 - accuracy: 0.5135
Epoch 4/100
15625/15625 [==============================] - 47s 3ms/step - loss: 0.6928 - accuracy: 0.5142
Epoch 5/100
15625/15625 [==============================] - 48s 3ms/step - loss: 0.6928 - accuracy: 0.5138
The loss essentially flatlines here and the accuracy hovers around this point. If I use a very high learning rate the loss starts high but eventually flatlines around this same point.
To test if the model is working at all, I tried with a very small subset of the data (only 5 rows or so), and it quickly reduces the loss to near zero with 100% accuracy, which of course is greatly overfit but was only meant to test the code/data.
What are some next steps I can try to improve this? Does this look like maybe just poorly designed features that the NN can't figure out how to correlate, or is this perhaps not the right choice of algorithm?
EDIT:
Based on comments and responses (thanks!), I've tried a few more tweaks and I'm making some progress. I've adjusted the batch size, tweaked the topology, and lowered the learning rate. I also didn't really understand how validation data fit into the picture, so I have been running a training session now with validation_split=0.2 - my new problem is that now my training loss is decreasing/accuracy increasing, but the inverse is true for the validation loss/accuracy. Here's some epoch snapshots:
Epoch 1/1000
1563/1563 [==============================] - 25s 16ms/step - loss: 0.6926 - accuracy: 0.5150 - val_loss: 0.6927 - val_accuracy: 0.5134
Epoch 20/1000
1563/1563 [==============================] - 24s 15ms/step - loss: 0.6746 - accuracy: 0.5760 - val_loss: 0.7070 - val_accuracy: 0.5103
Epoch 50/1000
1563/1563 [==============================] - 24s 15ms/step - loss: 0.5684 - accuracy: 0.7015 - val_loss: 0.8222 - val_accuracy: 0.5043
I assume this is overfitting in action?

I would change the dense layer units to 512,128,64,1. Remove all dropout layers except the last one. Set the drop out rate of the last one to something like .3. Use your test samples as validation data so you can see if the model is over/under fitting. Also recommend you try using an adjustable learning using the keras callback ReduceLROnPlateau, and early stopping using the keras callback EarlyStopping. Documentation is [here.][1] Set each callback to monitor validation loss. My suggested code is shown below:
reduce_lr=tf.keras.callbacks.ReduceLROnPlateau(
monitor="val_loss",factor=0.5, patience=2, verbose=1)
e_stop=tf.keras.callbacks.EarlyStopping( monitor="val_loss", patience=5,
verbose=0, restore_best_weights=True)
callbacks=[reduce_lr, e_stop]
In model.fit include
callbacks=callbacks

Keras: Validation accuracy stays the exact same but validation loss decreases

I know that the problem can't be with the dataset because I've seen other projects use the same dataset.
Here is my data preprocessing code:
import pandas as pd
dataset = pd.read_csv('political_tweets.csv')
dataset.head()
dataset = pd.read_csv('political_tweets.csv')["tweet"].values
y_train = pd.read_csv('political_tweets.csv')["dem_or_rep"].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(dataset, y_train, test_size=0.1)
max_words = 10000
print(max_words)
max_len = 25
tokenizer = Tokenizer(num_words = max_words, filters='!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~\t\n1234567890', lower=False,oov_token="<OOV>")
tokenizer.fit_on_texts(x_train)
x_train = tokenizer.texts_to_sequences(x_train)
x_train = pad_sequences(x_train, max_len, padding='post', truncating='post')
tokenizer.fit_on_texts(x_test)
x_test = tokenizer.texts_to_sequences(x_test)
x_test = pad_sequences(x_test, max_len, padding='post', truncating='post')
And my model:
model = Sequential([
Embedding(max_words+1,64,input_length=max_len),
Bidirectional(GRU(64, return_sequences = True), merge_mode='concat'),
GlobalMaxPooling1D(),
Dense(64,kernel_regularizer=regularizers.l2(0.02)),
Dropout(0.5),
Dense(1, activation='sigmoid'),
])
model.summary()
model.compile(loss='binary_crossentropy', optimizer=RMSprop(learning_rate=0.0001), metrics=['accuracy'])
model.fit(x_train,y_train, batch_size=128, epochs=500, verbose=1, shuffle=True, validation_data=(x_test, y_test))
Both of my losses decrease, my training accuracy increases, but the validation accuracy stays at 50% (which is awful considering I am doing a binary classification model).
Epoch 1/500
546/546 [==============================] - 35s 64ms/step - loss: 1.7385 - accuracy: 0.5102 - val_loss: 1.2458 - val_accuracy: 0.5102
Epoch 2/500
546/546 [==============================] - 34s 62ms/step - loss: 0.9746 - accuracy: 0.5137 - val_loss: 0.7886 - val_accuracy: 0.5102
Epoch 3/500
546/546 [==============================] - 34s 62ms/step - loss: 0.7235 - accuracy: 0.5135 - val_loss: 0.6943 - val_accuracy: 0.5102
Epoch 4/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6929 - accuracy: 0.5135 - val_loss: 0.6930 - val_accuracy: 0.5102
Epoch 5/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6928 - accuracy: 0.5135 - val_loss: 0.6931 - val_accuracy: 0.5102
Epoch 6/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6927 - accuracy: 0.5135 - val_loss: 0.6931 - val_accuracy: 0.5102
Epoch 7/500
546/546 [==============================] - 37s 68ms/step - loss: 0.6925 - accuracy: 0.5136 - val_loss: 0.6932 - val_accuracy: 0.5106
Epoch 8/500
546/546 [==============================] - 34s 63ms/step - loss: 0.6892 - accuracy: 0.5403 - val_loss: 0.6958 - val_accuracy: 0.5097
Epoch 9/500
546/546 [==============================] - 35s 63ms/step - loss: 0.6815 - accuracy: 0.5633 - val_loss: 0.7013 - val_accuracy: 0.5116
Epoch 10/500
546/546 [==============================] - 34s 63ms/step - loss: 0.6747 - accuracy: 0.5799 - val_loss: 0.7096 - val_accuracy: 0.5055
I've seen other posts on this topic and they say to add dropout, crossentropy, decrease the learning rate, etc. I have done all of this and none of it works.
Any help is greatly appreciated.
Thanks in advance!

A couple of observations for your problem:
Though not particularly familiar with the dataset, I trust that it is used in many circumstances without problems. However, you could try to check for its balance. In train_test_split() there is a parameter called stratify which, if fed the y, it will ensure the same number of samples for each class are in training set and test set proportionally.
Your phenomenon with validation loss and validation accuracy is not something out of the ordinary. Imagine that in the first epochs, the neural network considers some ground truth positive examples (ys) with GT == 1 with 55% confidence. While the training advances, the neural network learns better, and now it is 90% confident for a ground truth positive example (ys) with GT == 1. Since the threshold for calculating the accuracy is 50% , in both situations you have the same accuracy. Nevertheless, the loss has changed significantly, since 90% >> 55%.
You training seems to advance(slowly but surely). Have you considered using Adam as an off-the-shelves optimizer?
If the low accuracy is still maintained over some epochs, you may very well suffer from a well known phenomenon called underfitting, in which your model is unable to capture the dependencies between your data. To mitigate/avoid underfitting altogether, you may want to use a more complex model (2 LSTMs / 2 GRUs) stacked.
At this stage, remove the Dropout() layer, since you have underfitting, not overfitting.
Decrease the batch_size. Very big batch_size can lead to local minima, rendering you network unable to properly learn/generalize.
If none of these work, try starting with a lower learning rate, say 0.00001 instead of 0.0001.
Reiterate over the dataset preprocessing steps. Ensure the sentences are converted properly.

I have had a similar issue and I think it might be because dropout is right before the output layer. Try moving it to one layer before that.

Discrepancy in validation accuracy and validation loss during training

I'm training my CNN model with 14k+ images for 30 epochs and in the 28th epoch, I find an abnormal validation accuracy and loss as shown below :
- 67s - loss: 0.0767 - acc: 0.9750 - val_loss: 0.6755 - val_acc: 0.8412
Epoch 27/30
- 67s - loss: 0.1039 - acc: 0.9630 - val_loss: 0.3671 - val_acc: 0.9018
Epoch 28/30
- 67s - loss: 0.0639 - acc: 0.9775 - val_loss: 1.1921e-07 - val_acc: 0.1190
Epoch 29/30
- 66s - loss: 0.0767 - acc: 0.9744 - val_loss: 0.8091 - val_acc: 0.8306
Epoch 30/30
- 66s - loss: 0.0534 - acc: 0.9815 - val_loss: 0.2091 - val_acc: 0.9433
Using TensorFlow backend.
Can anyone explain why this happened?

To me it's looks like overfitting. Your training loss is approaching zero and training accuracy approaches 100, whereas validation loss and accuracy jump around.
I would recommend you to tune your regularization (dropout, l2/l1, data augmentation ...) or model capacity.
Usually, it's a good practice to have high-capacity model with tuned regularization.

Like Arkady. A said, its strongly looks like overfitting. It means your model memorized the images so your accuracy raise and raise. But on validation data you achieve bad result.
Example: You memorize that 2*8=16 without understand how multiplication in math realy works. So for the question 2*8 you give as answer 16. But for 2*9=? you dont know what the anwser is.
How to avoid:
Use strong Image Augmentation like imgaug or augmentor.
Use Dropout Layer
Calculate and Save 2 Graphs for each epoch, one for train data Accuracy, one for validation.Normaly both graphs going up at the beginning, and after epoch X the validation graph start to jumping or decreasing. This epoch or epoch-1 is your last good state.
Use more metrics like ROC AUC
Use EarlyStop Callback with monitoring val_acc

What is the difference between Loss, accuracy, validation loss, Validation accuracy?

At the end of each epoch, I am getting for example the following output:
Epoch 1/25
2018-08-06 14:54:12.555511:
2/2 [==============================] - 86s 43s/step - loss: 6.0767 - acc: 0.0469 - val_loss: 4.1037 - val_acc: 0.2000
Epoch 2/25
2/2 [==============================] - 26s 13s/step - loss: 3.6901 - acc: 0.0938 - val_loss: 2.5610 - val_acc: 0.0000e+00
Epoch 3/25
2/2 [==============================] - 66s 33s/step - loss: 3.1491 - acc: 0.1406 - val_loss: 2.4793 - val_acc: 0.0500
Epoch 4/25
2/2 [==============================] - 44s 22s/step - loss: 3.0686 - acc: 0.0694 - val_loss: 2.3159 - val_acc: 0.0500
Epoch 5/25
2/2 [==============================] - 62s 31s/step - loss: 2.5884 - acc: 0.1094 - val_loss: 2.4601 - val_acc: 0.1500
Epoch 6/25
2/2 [==============================] - 41s 20s/step - loss: 2.7708 - acc: 0.1493 - val_loss: 2.2542 - val_acc: 0.4000
.
.
.
.
Can anyone explain me what's the difference between loss, accuracy, validation loss and validation accuracy?

When we mention validation_split as fit parameter while fitting DL model, it splits data into two parts for every epoch i.e. training data and validation data.
It trains the model on training data and validate the model on validation data by checking its loss and accuracy.
Usually with every epoch increasing, loss goes lower and accuracy goes higher. But with val_loss and val_acc, many cases can be possible:
val_loss starts increasing, val_acc starts decreasing(means model is cramming values not learning)
val_loss starts increasing, val_acc also increases.(could be case of overfitting or diverse probability values in cases softmax is used in output layer)
val_loss starts decreasing, val_acc starts increasing(Correct, means model build is learning and working fine)
This is a link to refer as well in which there is more description given. Thanks. How to interpret "loss" and "accuracy" for a machine learning model
I have tried to explain at https://www.javacodemonk.com/difference-between-loss-accuracy-validation-loss-validation-accuracy-when-training-deep-learning-model-with-keras-ff358faa

In your model.compile function you have defined a loss function and a metrics function.
Your "loss" is the value of your loss function (unknown as you do not show your code)
Your "acc" is the value of your metrics (in this case accuracy)
The val_* simply means that the value corresponds to your validation data.
Only the loss function is used to update your model's parameters, the accuracy is only used for you to see how well your model is doing.
You should seek to minimize your loss and maximize your accuracy.
Ideally the difference between your validation data results and your training data results should be similar (allthough some difference are expected)

I think here is another answer that worth notifying:
val_loss is the value of cost function for your cross-validation data and loss is the value of cost function for your training data
https://datascience.stackexchange.com/a/25269

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas