What could be reasons for high MAE and MSE in Keras?

What could be reasons for high MAE and MSE in Keras? - pandas

My MAE and MSE quite high. But the training data (not including test data 20%) (1030, 23) instances (after applied IQR and Z-score). By the way, all the categorical columns had been fully encoded.
Epoch: 1900, loss:50195632.3010, mae:3622.3535, mse:50195636.0000, val_loss:65308249.2427, val_mae:4636.2290, val_mse:65308244.0000,
Below is my setting for Keras.
model = keras.Sequential([
layers.Dense(64, activation='relu', input_shape=[len(dftrain.keys())]),
layers.Dense(64, activation='relu'),
layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae', 'mse'])
EPOCHS = 2000
history = model.fit(
normed_train_data,
train_labels,
epochs=EPOCHS,
validation_split = 0.2,
verbose=0,
callbacks=[tfdocs.modeling.EpochDots()])
What do you think?

"High" MAE itself is relative and varies according to the data and there could be multiple factors contributing towards it.
If you are getting started, I d recommend you to perform Exploratory Data
Analysis (EDA) and come up with features and also prepare that data for training.
Once you verify the data, try tuning the parameters of the model to suit your usecase. ML is more about experimenting than about coding.
Notebooks like these in Kaggle will help you get started.
Neural Network Model for House Prices
Comprehensive data exploration with Python

There could be many reasons actually. My quick guesses would be your dataset. The data for training. Is it compatible to the model's expectations? (shapes, formats etc.) Like, in case of text classification, are the texts encoded before feeding to the model.
Are the labels correctly, transformed to neural network expectations?
If yes, rest will be on your network definition, are you using the right loss function, layers etc?
Try a basic model architecture for your problem, this basic architecture model can be taken from implementations for the similar problem found on internet. This will give you a good starting point.

The other answers have already mentioned some good points, but another thing you can do is to normalize your data if you haven't already. NNs are highly sensitive to this. Some methods you can try here are Batch Normalization, Standard Scaler or Min-Max Scaler.
Also, if your model is overfitting (training loss decreasing, but not validation loss), consider adding regularization in the form of Dropout between your layers and see if it improves.
These links might be helpful:
link1
link2

Related

Why doesn't my CNN validation accuracy increase? [closed]

Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 months ago.
Improve this question
I am attempting to create a simple CNN to be able to distinguish eye (retinal) scans of different severities. It is a multi-class classification problem, 5 classes. This by now is probably a fairly standard, textbook case for CNNs. I am using the Kaggle EyePACs dataset. The photos are very big, so I'm using a dataset that has rescaled them.
My issue is, when I'm training the model, I expect to see the usual learning curves where both training and validation curves increase together like this example from google:
However my curves look like this:
I haven't done any image pre-processing on the data, I was hoping that there would be some rudimentary learning going on which I can then improve upon using CLAHE and what have you. I've changed the classes so that instead of trying to predict the grades from 0 to 4, I've removed the middle classes so that we just have the extremes: 0 and 4 (and thus it became a binary classification problem, where class 4 was relabelled 1 and so it's 0 and 1). However the curve didn't change much and still looks like this:
What could be the issue? I thought that as the model gets better with the training data, it must improve on the validation. Yes, this is overfitting, but I assumed that kicks in after some positive learning, not straight away. Validation set doesn't seem to be learning at all. Also, shouldn't these models start with random parameters, so that the initial accuracy would be random; but instead it's around 0.75 from the get-go. It just doesn't learn after that. What's going on? What should I look at changing? Is this a data problem or a hyperparameter problem? Shall I include the code here? Many thanks.
=============================Edit=============================
Here's the code I used. I know it's rudimentary, it's a mishmash of both the 'image classification from scratch' Keras tutorial as well as some standard MNIST tutorials you get around the web. Grateful for any pointers.
Creating the image-label dataset objects for train (+validation split) and test:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
"/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
labels="inferred",
label_mode="binary",
validation_split=0.2,
seed=1337,
subset="training",
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
"/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
labels="inferred",
label_mode="binary",
validation_split=0.2,
seed=1337,
subset="validation",
)
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
"/content/drive/MyDrive/Colab Notebooks/resized test 15/0-4/",
labels="inferred",
label_mode="binary",
)
Found 26518 files belonging to 2 classes.
Using 21215 files for training.
Found 26518 files belonging to 2 classes.
Using 5303 files for validation.
Found 36759 files belonging to 2 classes.
#To make it run faster (I think?):
train_ds = train_ds.prefetch(buffer_size=32)
val_ds = val_ds.prefetch(buffer_size=32)
test_ds = test_ds.prefetch(buffer_size=32)
#The architecture:
from keras.models import Sequential
from keras.layers import Dense, Rescaling, Conv2D, MaxPool2D, Flatten
model = Sequential()
model.add(Rescaling(1.0 / 255))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3)))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Flatten())
model.add(Dense(units=2, activation='sigmoid'))
#Compile it:
from keras import optimizers
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
#And then finally train (run) it:
history = model.fit(
x=train_ds,
epochs=30,
validation_data=val_ds,
)
#I think this is how I evaluate the trained model against the test data:
loss, acc = model.evaluate(test_ds)
print("Accuracy", acc)
#It prints out the following output:
1149/1149 [==============================] - 278s 238ms/step - loss: 0.1408 - accuracy: 0.9672
Accuracy 0.9671916961669922
And then of course I end it with model.save('Binary CNN 0-4').
I think I have spotted one thing I can change already -- that's to change the loss function to binary_crossentropy and adjust the number of units at the final dense layer to 1 (instead of 2)(?). But surely that little change won't actually address why the validation set isn't learning.

You've not included code, so I hope it's OK to give a couple of
tentative general answers
Q) initial accuracy, how can it be as high as 0.75?
A) Tensorflow reports the average training accuracy over the epoch, and if
there are many batches then it learns during epoch 0.
The first accuracy reported is the average over epoch 0
and can be much better than random.
If, for example, the input data is unbalanced and has 75% of
labels in one category, the model may learn very quickly that
it can achieve 75% accuracy by allocating 100% of training data
to that category.
Q) Can overfitting start at the beginning?
A) It can start very close to the beginning. A network may in effect just be memorising the training set.
There are standard approaches to overfitting, which include
i) Try a simpler network. It makes sense anyway to start simple and add
complexity as required.
ii) Regularization of layers - add (e.g.) L2 regularizers to your layers
iii) Add dropout layers between hidden layers
iv) Batch normalisation between hidden layers.
v) Image augmentation (randomly add some rotation, shift, flipping if appropriate)
vi) Get more training data
vii) Use transfer learning
as another answer has suggested. This is most likely appropriate if
you don't have much training data. You can then just add a layer
or two to the pre-built model (probably removing its last
layer or two), and train only the new layers.
Only trial and error will show what works

Is validation curve slight greater or lower in CNN models good?

Can you tell me which one among the two is a good validation vs train plot?
Both of them are trained with same keras sequential layers, but the second one is trained using more number of samples, i.e. augmented the dataset.
I'm a little bit confused about the zigzags in the first plot, otherwise I think it is better than the second.
In the second plot, there are no zigzags but the validation accuracy tends to be a little high than train, is it overfitting or considerable?
It is an image detection model where the first model's dataset size is 5170 and the second had 9743 samples.
The convolutional layers defined for the model building:
tf.keras.layers.Conv2D(128,(3,3), activation = 'relu', input_shape = (150,150,3)),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(64,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(32,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512,activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(1,activation='sigmoid')
Can the model be improved?

From the graphs the second graph where you have more samples is better. The reason is with more samples the model is trained on a much wider probability distribution of images. So when validation is run you have a better chance of correctly classifying the image. You have a lot of dropout in your model. This is good to prevent over fitting, however it will lower the training accuracy relative to the validation accuracy. Your model seems to be doing well. It might improve if you add additional convolution- max pooling layers. Alternative of course is to use transfer learning. I would recommend efficientnetb3. I also recommend using an adjustable learning rate. The Keras callback ReduceLROnPlateau works well for that purpose. Documentation is here.. Code below shows my recommended settings.
rlronp=tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=2,
verbose=1,
mode='auto'
)
in model.fit include callbacks=[rlronp]

Augmenting on the fly: running out of data and validation accuracy=0.5

My validation accuracy is stuck at 50% while my training accuracy manages to converge to 100%. The pitfall is that i have very few data: 46 images in train set and 12 in validation set.
Therefore, I am augmenting my data while training but i am running out of data too early. and as i saw from previous answers that i should specify steps_per_epoch.
however, using steps_per_epoch=46/batch_size is not returning that much of iteration (maximum of 10 if i specify a very low batch size).
I assume data augmentation is not being applied? How can i be sure my data is indeed being augmented? Below is my data augmentation code:
gen=ImageDataGenerator(rotation_range=180,
horizontal_flip=True,
vertical_flip=True,
)
train_batches=gen.flow(
x=x_train,
y=Y_train,
batch_size=5,
subset=None,
shuffle=True
)
val_batches=gen.flow(
x=x_val,
y=Y_val,
batch_size=3,
subset=None,
shuffle=True
)
history= model.fit(
train_batches,
batch_size=32,
# steps_per_epoch=len(x_train)/batch_size,
epochs=50,
verbose=2,
validation_data=val_batches,
validation_steps=len(x_val)/batch_size)
I will really appreciate your help!

I think the mistake is not in your code.
You have a very small dataset, you are using only 2 augmentations, and (I assume) you initialize your model with random weights. Your model expectedly overfits.
Here are a couple of ideas that may help you:
Add more argumentations. Vertical and horizontal flips - are just not enough (with your small dataset). Think about crops, rotations, color changes etc. BTW here is a good tutorial on image augmentation where you'll find more ideas on what types of data augmentation you can use for your task: https://notrocketscience.blog/complete-guide-to-data-augmentation-for-computer-vision/
Transfer learning - is a must-do for small datasets. If you are using popular/default architecture, PyTorch and Tensorflow allow you to load model weights trained on ImageNet, for instance. If your architecture is custom - download some open-source dataset (better similar to your task) and pretrain model with this data.
Appropriate validation. Consider n-fold cross-validation, because a fixed train and test set is not a good idea for the small datasets. Your validation accuracy may be low by chance (for instance, all "hard" images are in the test set), but not because the model is bad.
Let me know if it helps!

RNN Text Generation: How to balance training/test lost with validation loss?

I'm working on a short project that involves implementing a character RNN for text generation. My model uses a single LSTM layer with varying units (messing around with between 50 and 500), dropout at a rate of 0.2, and softmax activation. I'm using RMSprop with a learning rate of 0.01.
My issue is that I can't find a good way to characterize the validation loss. I'm using a validation split of 0.3 and I'm finding that the validation loss starts to become constant after only a few epochs (maybe 2-5 or so) while the training loss keeps decreasing. Does validation loss carry much weight in this sort of problem? The purpose of the model is to generate new strings, so quantifying the validation loss with other strings seems... pointless?
It's hard for me to really find the best model since qualitatively I get the sense that the best model is trained for more epochs than it takes for the validation loss to stop changing but also for fewer epochs than it takes for the training loss to start increasing. I would really appreciate any advice you have regarding this problem as well as any general advice about RNN's for text generation, especially regarding dropout and overfitting. Thanks!
This is the code for fitting the model for every epoch. The callback is a custom callback that just prints a few tests. I'm now realizing that history_callback.history['loss'] is probably the training loss isn't it...
for i in range(num_epochs):
history_callback = model.fit(x, y,
batch_size=128,
epochs=1,
callbacks=[print_callback],
validation_split=0.3)
loss_history.append(history_callback.history['loss'])
validation_loss_history.append(history_callback.history['val_loss'])
My intention for this model isn't to replicate sentences from the training data, rather, I'd like to generate sentence from the same distribution that I'm training on.

Yes history_callback.history['loss'] is Training Loss and history_callback.history['val_loss'] is the Validation Loss.
Yes, Validation Loss carries weight in this sort of problem because you just don't want to replicate the sentences which are given during Training but you want to learn the patterns from the Training Data and generate new sentences when it sees a new data.
From the information you mentioned in the question and from the insights identified from comments (thanks to Brian Bartoldson), it is understood that your model is overfitting. In addition to EarlyStopping and dropout, you can try the below mentioned techniques to mitigate overfitting problem.
3.a. Shuffle the Data, by using shuffle=True in model.fit. Code is shown below
3.b. Use recurrent_dropout. For example, If we set the value of Recurrent Dropout as 0.2 in a Recurrent Layer (LSTM), it means that it will consider only 80% of the Time Steps for that Recurrent Layer (LSTM).
3.c. Use Regularization. You can try l1 Regularization or l1_l2 Regularization as well for the arguments, kernel_regularizer, recurrent_regularizer, bias_regularizer, activity_regularizer of the LSTM Layer.
Sample code to use Shuffle, Early Stopping, Recurrent_Dropout, Regularization is shown below:
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Sequential
model = Sequential()
Regularizer = l2(0.001)
model.add(tf.keras.layers.LSTM(units = 50, activation='relu',kernel_regularizer=Regularizer ,
recurrent_regularizer=Regularizer , bias_regularizer=Regularizer , activity_regularizer=Regularizer, dropout=0.2, recurrent_dropout=0.3))
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
history_callback = model.fit(x, y,
batch_size=128,
epochs=1,
callbacks=[print_callback, callback],
validation_split=0.3, shuffle = True)
Hope this helps. Happy Learning!

Is my training data set too complex for my neural network?

I am new to machine learning and stack overflow, I am trying to interpret two graphs from my regression model.
Training error and Validation error from my machine learning model
my case is similar to this guy Very large loss values when training multiple regression model in Keras but my MSE and RMSE are very high.
Is my modeling underfitting? if yes what can I do to solve this problem?
Here is my neural network I used for solving a regression problem
def build_model():
model = keras.Sequential([
layers.Dense(128, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
layers.Dense(64, activation=tf.nn.relu),
layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(loss='mean_squared_error',
optimizer=optimizer,
metrics=['mean_absolute_error', 'mean_squared_error'])
return model
and my data set
I have 500 samples, 10 features and 1 target

Quite the opposite: it looks like your model is over-fitting. When you have low error rates for your training set, it means that your model has learned from the data well and can infer the results accurately. If your validation data is high afterwards however, that means that the information learned from your training data is not successfully being applied to new data. This is because your model has 'fit' onto your training data too much, and only learned how to predict well when its based off of that data.
To solve this, we can introduce common solutions to reduce over-fitting. A very common technique is to use Dropout layers. This will randomly remove some of the nodes so that the model cannot correlate with them too heavily - therefor reducing dependency on those nodes and 'learning' more using the other nodes too. I've included an example that you can test below; try playing with the value and other techniques to see what works best. And as a side note: are you sure that you need that many nodes within your dense layer? Seems like quite a bit for your data set, and that may be contributing to the over-fitting as a result too.
def build_model():
model = keras.Sequential([
layers.Dense(128, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
Dropout(0.2),
layers.Dense(64, activation=tf.nn.relu),
layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(loss='mean_squared_error',
optimizer=optimizer,
metrics=['mean_absolute_error', 'mean_squared_error'])
return model

Well i think your model is overfitting
There are several ways that can help you :
1-Reduce the network’s capacity Which you can do by removing layers or reducing the number of elements in the hidden layers
2- Dropout layers, which will randomly remove certain features by setting them to zero
3-Regularization
If i want to give a brief explanation on these:
-Reduce the network’s capacity:
Some models have a large number of trainable parameters. The higher this number, the easier the model can memorize the target class for each training sample. Obviously, this is not ideal for generalizing on new data.by lowering the capacity of the network, it's going to learn the patterns that matter or that minimize the loss. But remember،reducing the network’s capacity too much will lead to underfitting.
-regularization:
This page can help you a lot
https://towardsdatascience.com/handling-overfitting-in-deep-learning-models-c760ee047c6e
-Drop out layer
You can use some layer like this
model.add(layers.Dropout(0.5))
This is a dropout layer with a 50% chance of setting inputs to zero.
For more details you can see this page:
https://machinelearningmastery.com/how-to-reduce-overfitting-with-dropout-regularization-in-keras/

As mentioned in the existing answer by #omoshiroiii your model in fact seems to be overfitting, that's why RMSE and MSE are too high.
Your model learned the detail and noise in the training data to the extent that it is now negatively impacting the performance of the model on new data.
The solution is therefore randomly removing some of the nodes so that the model cannot correlate with them too heavily.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas