val_loss: 1.1921e-07 - val_acc: 0.0715 How is that possible? - tensorflow

I am currently training this model: https://pastebin.com/F7dQvmZP. When i trained it with only 1 feature (raw data) per timestep i got a loss of ~1.3 and an accuracy of ~57%. After adding the direction of change (1 if increased 0 if same -1 if decreased) as a second feature to each timestep my loss went down to ~0.8 and my accuracy increased to ~70%. Then i added a differently scaled version of the raw data as a third feature. This data is basically scaled such that the maximum reading during that timeseries is 1.0. Training this quickly results in a loss of ~1e-7 but the accuracy stays at ~7%. The input is composed like this
np.dstack((measurements, change, scaled))
I dont really know how that is possible since my outputs are one hot encoded and I only have 22 classes. The training data includes 291300 training and 97100 validation samples. It trains normal until I add the third feature (Even if I only use the third feature). Any help would be appreciated.

Related

Neural Network has accuracy of 75% but has bad predictions

I'm building a convolutional neural networtk in order to predict 5 emotions from a data set of faces.
After working in the construction of the weights I could get an accuracy of 75%
score = model_2_emotion.evaluate(test_datagen.flow(X_test, Y_test, batch_size = 4))
print('Accuracy: {}'.format(score[1]))
308/308 [==============================] - 17s 56ms/step - loss: 0.6139 - accuracy: 0.7575
Accuracy: 0.7575264573097229
But model_2_emotion.predict(X_test) returns me this array
array([[0.6594997 , 0.00083318, 0.19473663, 0.08065161, 0.06427888],
[0.6610887 , 0.0008383 , 0.19332188, 0.08035047, 0.06440066],
[0.66172844, 0.00082645, 0.19264877, 0.08032911, 0.06446711],
...,
[0.66067713, 0.00084266, 0.19318439, 0.08052441, 0.06477145],
[0.66050553, 0.00085838, 0.19319515, 0.08056776, 0.06487323],
[0.6602842 , 0.00084602, 0.19372217, 0.08054546, 0.06460217]],
dtype=float32)
Where we can see it's just predecting "correcty" the first emotion (the first column) with the accuracy of 60% and from this array produces me this heat map:
Heat map
Which I think there is something wrong since its passing through the first emotion. Since I got 75% of accuracy but bad predictions, someone knows what's going on?
Looking at your confusion matrix (this is not called a heat map), seems like your model is only predicting a single class, and that your data is unbalanced.
How many samples you have for each class (is it unbalanced)?
How many epochs is your model training?
how many neurons your neural network have in the last layer (it is supposed to have 5 neurons) ?
Only looking closer to the data/problem (and in the train/test accuracy curve over epochs) a better suggestion could be made, but your problem seems to be Under/Overfiting, and that you can benefit of better theoretical basis.
Take a look on any source about bias-variance trade off.
https://quantdare.com/mitigating-overfitting-neural-networks/
here are some generic tips: get more data, improve pre processing, improve model (more layers, different kernel sizes, skip connections, batch normalization, different optimization/learning rates etc ...).

What to do when accuracy increasing but loss is also increasing on validation data?

I'm currently working on a multi-class classification problem which is highly imbalanced. I want to save my model weights for best epoch but I'm confused on which metric I should choose?
Here's my training progress bar :
I am using ModelCheckpoint callback in tf.keras and monitoring val_loss as a metric to save best model weights.
As you can see in the image,
At 8th epoch I got an val_acc = 0.9845 but val_loss = 0.629 and precision and recall is also high here.
But at 3rd epoch I got val_acc = 0.9840 but val_loss = 0.590
I understand the difference is not huge but in such cases what's the ideal metric to believe on imbalanced dataset?
The most important factors are the the validation and training error.
If the validation loss (error) is going to increase so means overfitting. You must set the number of epochs as high as possible and avoid the overfitting and terminate training based on the error rates. . As long as it keeps dropping training should continue. Till model start to converge at nth epochs. Indeed it should converge quite well to a low val_loss.
Just bear in mind an epoch is one learning cycle where the learner can see the whole training data set. If you have two batches, the learner needs to go through two iterations for one epoch.
This link can be helpful.
You can divide data in 3 data sets, training, validation and evaluation. Train each network along enough number of epochs to track the training Mean Squared Error to be stuck in a minimum.
The training process uses training data-set and should be executed epoch by epoch, then calculate the Mean Squared Error of the network in each epoch for the validation set. The network for the epoch with the minimum validation MSE is selected for the evaluation process.
This can happen for several reasons. Assuming you have used proper separation of train, test and validation set and preprocessing of datasets like min-max scaler, adjusting missing values, you can do the following.
First run the model for several epoch and plot the validation loss graph.
If the loss is first reducing and after reaching a certain point it is now increasing, if the graph is in U shape, then you can do early stopping.
In other scenario, when loss is steadily increasing, early stopping won't work. In this case, add dropout layer of 0.2-0.3 in between the major layers. This will introduce randomness in the layers and will stop the model from memorising.
Now once you add dropouts, your model may suddenly start to behave strange. Tweak with activation functions and number of output nodes or Dense layer and it will eventually get right.

model.evaluate() varies wildly with number of steps when using generators

Running tensorflow 2.x in Colab with its internal keras version (tf.keras). My model is a 3D convolutional UNET for multiclass segmentation (not sure if it's relevant).
I've successfully trained (high enough accuracy on validation) this model the traditional way but I'd like to do augmentation to improve it, therefore I'm switching to (hand-written) generators. When I use generators I see my loss increasing and my accuracy decreasing a lot (e.g.: loss increasing 4-fold, not some %) in the fit.
To try to localize the issue I've tried loading my trained weights and computing the metrics on the data returned by the generators. And what's happening makes no sense. I can see that the results visually are ok.
model.evaluate(validationGenerator,steps=1)
2s 2s/step - loss: 0.4037 - categorical_accuracy: 0.8716
model.evaluate(validationGenerator,steps=2)
2s/step - loss: 1.7825 - categorical_accuracy: 0.7158
model.evaluate(validationGenerator,steps=4)
7s 2s/step - loss: 1.7478 - categorical_accuracy: 0.7038
Why would the loss vary with the number of steps? I could guess some % due to statistical variations... not 4 fold increase!
If I try
x,y = next(validationGenerator)
nSamples = x.shape[0]
meanLoss = np.zeros(nSamples)
meanAcc = np.zeros(nSamples)
for pIdx in range(nSamples):
y_pred = model.predict(np.expand_dims(x[pIdx,:,:,:,:],axis=0))
meanAcc[pIdx]=np.mean(tf.keras.metrics.categorical_accuracy(np.expand_dims(y[pIdx,:,:,:,:],axis=0),y_pred))
meanLoss[pIdx]=np.mean(tf.keras.metrics.categorical_crossentropy(np.expand_dims(y[pIdx,:,:,:,:],axis=0),y_pred))
print(np.mean(meanAcc))
print(np.mean(meanLoss))
I get accuracy~85% and loss ~0.44. Which is what I expect from the previous fit, and it varies by vary little from one batch to the other. And these are the same exact numbers that I get if I do model.evaluate() with 1 step (using the same generator function).
However I need about 30 steps to run trough my whole training dataset. What should I do?
If I fit my already good model to this generator it indeed worsen the performances a lot (it goes from a nice segmentation of the image to uniform predictions of 25% for each of the 4 classes!!!!)
Any idea on where to debud the issue? I've also visually looked at the images produced by the generator and at the model predictions and everything looks correct (as testified by the numbers I found when evaluating using a single step). I've tried writing a minimal working example with a 2 layers model but... in it the issue does not happen.
UPDATE: Generators code
So, as I've been asked, these are the generators code. They're handwritten
def dataGen (X,Y_train):
patchS = 64 #set the size of the patch I extract
batchS = 16 #number of samples per batch
nSamples = X.shape[0] #get total number of samples
immSize = X.shape[1:] #get the shape of the iamge to crop
#Get 4 patches from each image
#extract them randomly, and in random patient order
patList = np.array(range(0,nSamples),dtype='int16')
patList = patList.reshape(nSamples,1)
patList = np.tile(patList,(4,2))
patList[:nSamples,0]=0 #Use this index to tell the code where to get the patch from
patList[nSamples:2*nSamples,0]=1
patList[2*nSamples:3*nSamples,0]=2
patList[3*nSamples:4*nSamples,0]=3
np.random.shuffle(patList)
patStart=0
Xout = np.zeros((batchS,patchS,patchS,patchS,immSize[3])) #allocate output vector
while True:
Yout = np.zeros((batchS,patchS,patchS,patchS)) #allocate vector of labels
for patIdx in range(batchS):
XSR = 32* (patList[patStart+patIdx,0]//2) #get the index of where to extract the patch
YSR = 32* (patList[patStart+patIdx,0]%2)
xStart = random.randrange(XSR,XSR+32) #get a patch randomly somewhere between a range
yStart = random.randrange(YSR,YSR+32)
zStart = random.randrange(0,26)
patInd = patList[patStart+patIdx,1]
Xout[patIdx,:,:,:,:] = X[patInd,xStart:(xStart+patchS),yStart:(yStart+patchS),zStart:(zStart+patchS),:]
Yout[patIdx,:,:,:] = Y_train[patInd,xStart:(xStart+patchS),yStart:(yStart+patchS),zStart:(zStart+patchS)]
if((patStart+patIdx)>(patList.shape[0]-2)):
np.random.shuffle(patList) #after going through the whole list restart
patStart=0
patStart = patStart+batchS
Yout = tf.keras.utils.to_categorical (Yout, num_classes=4, dtype='float32') #convert to one hot encoding
yield Xout, Yout
Posting the workaround I've found for the future person coming here from google.
Apparently the issue lies in how keras calls a handwritten generator. When it was called multiple times in a row by using evaluate(gen, steps=N) apparently it returned wrong outputs. There's no documentation around about how to address this or how a generator should be written.
I ended up writing my code using a tf.keras.utils.sequence class and the same previous code now works perfectly. No way to know why.
Here are different factors that affect loss & accuracy:
For Accuracy, we know that it measures the accuracy of the prediction: i.e. correct-classes /total-classes.
While loss tracks the inverse-confidence of the prediction.
A high Loss indicates that although the model is performing well with the prediction, It is becoming uncertain of the prediction it is making.
For example, For an image classification scenario, The image of a cat is passed into two models. Model A predicts {cat: 0.8, dog: 0.2} and model B predicts {cat: 0.6, dog: 0.4}.
Both models will score the same accuracy, but model B will have a higher loss.
On your evaluation part, Based on the documentation
Steps: Integer or None. Total number of steps (batches of samples) before declaring the evaluation round finished. Ignored with the default value of None. If x is a tf.data dataset and steps is None, 'evaluate' will run until the dataset is exhausted. This argument is not supported by array inputs.
So for simplify, it's getting the Nth batch of your validation samples.
It could be that the model prediction is becoming uncertain since the majority of the unknown data falls on those specific steps. which in your case, steps 2 & 3.
So, As the evaluation steps progress, The prediction becomes more uncertain leading to a higher loss.
You might need to retrain your model with more training samples but of course, you need to be careful since you might encounter overfitting.
In terms of data augmentation, you might wanna check this link
In Training Perspective, proper data augmentation is one of the factors that leads to good model performance.

What does 'Epoch' mean in training Generative Adversarial Networks

I am training a GAN with text data. When I train the discriminator, I randomly sample m positive data from the dataset and generate m negative data with the generator. I found many papers mention about details of implementation such as training epochs. About the training epochs, I have a question about sampling positive data:
Sample from the dataset (maybe shuffled) in order, when the whole dataset is covered, we call 1 epoch
Just as I did, randomly sample positive data, when the total amount of sampled data is the same size as the dataset, we call 1 epoch
Which one is right? or which one is commonly used? or which one is better?
In my opinion, an epoch is when you passed through the whole training data once. and I think in the paper also they mean a pass through the whole training set when they mention an epoch.
However, the epoch can be also defined as after processing k elements, where k can be smaller than n (the size of the training set). Such a definition might make sense when you want to capture get some evaluation about your model on the dev set, and you normally do it after every single epoch.
After all, that is my opinion and my understandings of GAN papers.
Good luck!

Why does the accuracy drop to zero in each epoch, while training convlstm layers in keras?

I am trying to use ConvLSTM layers in Keras 2 to train an action recognition model. The model has 3 ConvLSTM layers and 2 Fully Connected ones.
At each and every epoch the accuracy for the first batch (usually more than one) is zero and then it increases to some amount more than the previous epoch. For example, the first epoch finishes at 0.3 and the next would finish at 0.4 and so on.
My question is why does it get back to zero at each epoch?
p.s.
The ConvLSTM is stateless.
The model is compiled with SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True), for some reason it does not converge using Adam.
So - in order to understand why something like this happening you need to understand how keras computes accuracy during the batch computation:
Before each batch - a number of positively classified examples is stored.
After each batch - a number of positively classified examples is stored and it's printed after division by all examples used in training.
As your accuracy is pretty low it's highly probable that in a first few batches none of the examples will be classified properly. Especially when you have a small batch. This makes accuracy to be 0 at the beginning of your training.