LSTM training difficulties - tensorflow

I wanted to train LSTM model for tabular time series data. My data shape is
((7342689, 50, 5), (7342689,))
I was having a hard time to handle the training loss. Initially I tried with default learning rate , but it didn't help. My class label is severely skewed. I have added focal loss and class weights to handle class imbalance issues. I have tried with adding one more layer with 50 neurons, but that loss started to increase instead of decrease. I appreciate your suggestions. Thanks!
Here is my current model architecture:
adam = Adam(learning_rate=0.0001)
model = keras.Sequential()
model.add(LSTM(100, input_shape = (50, 5)))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss=tfa.losses.SigmoidFocalCrossEntropy()
, metrics=[keras.metrics.binary_accuracy]
, optimizer=adam)
model.summary()
class_weights = dict(zip(np.unique(y_train), class_weight.compute_class_weight('balanced', classes=np.unique(y_train),y=y_train)))
history=model.fit(X_train, y_train, batch_size=64, epochs=50,class_weight=class_weights)

The loss of the model first decreased and then increased, which may be because the optimization process got stuck in a local optimal solution. Maybe you can try reducing the learning rate and increasing the epoch.

Related

machine learning/ deep learning model python

I have made a deep learning model for predicting the real estate prices based on different parameters such as: area, number of rooms, floor, number of floors, age of the building, type of building, etc.
My dataset is scrapped, data is prepared to analysis (discrete columns as dummies itp). My model is shown in the following code. The problem is that the model's accuracy is always 0.
Here is my code:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=6, activation='relu'))
model.add(tf.keras.layers.Dense(units=6, activation='relu'))
model.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
model.fit(X_train, y_train, batch_size = 32, epochs = 100)
accuracy_score(y_test, y_pred)
I couldn't find tutorial for similar case.
Predicting real estate prices needs a regression deep learninng model.
so the problem in that code is
1- in the output layer you're using 'sigmoid' function, change it to 'linear'
2- change the loss to mean absolute error or mean squarred error
3- change the metrics too to mae or mse
4-probably add more epochs, a neural network regression probably needs way more epochs to learn
Try those changes they should fix the problems

Can this be considered overfitting?

I have 4 classes, each with 1350 images. The validation set has 20% of the total images (it is generated automatically). The training model uses MobilenetV2 network:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
The model is created:
model = tf.keras.Sequential([
base_model,
tf.keras.layers.Conv2D(32, 3, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.MaxPool2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(4, activation='softmax', kernel_regularizer=regularizers.l2(0.001))
])
The model is trained through 20 epochs and then fine tunning is done in 15 epochs. The result is as follows:
Image of the model trained before fine tunning
Image of the model trained after 15 epochs and fine tunning
A bit difficult to tell without the numeric values of validation loss but I would say the results before fine tuning are slightly over fitting and for after fine tuning less over fitting. A couple of additional things you could do. One is try using an adjustable learning rate using the callback tf.keras.callbacks.ReduceLROnPlateau. Set it up to monitor validation loss. Documentation is here. I set factor=.5 and patience=1. Second replace the Flatten layer with tf.keras.layers.GlobalMaxPool2D and see if it improves the validation loss.

Deep LSTM accuracy not crossing 50%

I am working on a classification problem of the semeval 2017 task 4A dataset can be found here
and I am using deep LSTM network for it. In pre-processing, I have done lower casing->tokenization->lemmatization->removing stop words->removing punctuations. For word embeddings, I have used WORD2VEC model. There are 18,000 samples in my training set and 2000 samples in testing.
The code for my model is
model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True), input_shape=(128, 1,64)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
The value of max_words is 2000 and max_len is 300
But even after this, my testing accuracy is not crossing 50%. I can't figure out the problem.
PS - I am using validation technique too. The loss function is 'Binary Crossentropy' and optimizer is 'Adam'.
Training "LSTM" is very different with other common deep learning model.
I recommend a higher dropout rate like 0.7,0.8. and Adam optimizer is particularly unstable in LSTM with real world data. So, i recommend SGD scheduled for a momentum of 0.9 and ReduceLROnPlateau. You have to do very long training, and if spark loss is observed, the training is going very well. (Spark Loss is a word used by NVIDIA researchers. It refers to a phenomenon in which the value of Loss that appears to converge increases significantly.)

Keras model not learning and predicting only one class out of three classes

New to the field of deep learning and currently working on this competition for predicting the earthquake damage to buildings.
The model I created starts at an accuracy of .56 but remains at this for any number of epochs i let it run. When finished, the model only predicts one of the three classes (which I one hot encoded into a dataframe with three columns). Changing the number of layers, optimizers, data preparation, dropout wont change anything. Even trying to overfit my model with the over-parameterization of the neural network will still have the same accuracy and a non-learning model.
What am I doing wrong?
This is my code:
model = keras.models.Sequential()
model.add(keras.layers.Dense(64, input_dim = 85, activation = "relu"))
keras.layers.Dropout(0.3)
model.add(keras.layers.Dense(128, activation = "relu"))
keras.layers.Dropout(0.3)
model.add(keras.layers.Dense(256, activation = "relu"))
keras.layers.Dropout(0.3)
model.add(keras.layers.Dense(512, activation = "relu"))
model.add(keras.layers.Dense(3, activation = "softmax"))
adam = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer = adam,
loss='categorical_crossentropy',
metrics = ['accuracy'])
history = model.fit(traindata, trainlabels,
epochs = 5,
validation_split = 0.2,
verbose = 1,)
There's nothing visually wrong with your model, but it may be too haevy to learn any useful features.
Try normalizing your input with https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
Start with only 2 layers, and a few numbers of neurons.
Increase batch_size and try learning_rate scheduling.
Observe the validation_accuracy, stop when it starts to overfit.
Finally, for a 3-class classification, 56% accuracy is better than baseline, remmeber it's a competition so the data is not dummy playground data which you can expect to get a 90% accuracy with an MLP in the first try.
Finally, try hyperparameter optimization with tuner.

Training and Loss not changing in Keras CNN model

I am running a CNN for left and right shoeprint classfication. I have 190,000 training images and I use 10% of it for validation. My model is setup as shown below. I get the paths of all the images, read them in and resize them. I normalize the image, and then fit it to the model. My issue is that I have stuck at a training accuracy of 62.5% and a loss of around 0.6615-0.6619. Is there something wrong that I am doing? How can I stop this from happening?
Just some interesting points to note:
I first tested this on 10 images I was having the same issue but changing the optimizer to adam and batch size to 4 worked.
I then tested on more and more images, but each time I would need to change the batch size to get improvements in the accuracy and loss. With 10,000 images I had to use a batch size of 500 and optimizer rmsprop. However, the accuracy and loss only really began to change after epoch 10.
I am now training on 190,000 images and I cannot increase the batch size as my GPU is at is max.
imageWidth = 50
imageHeight = 150
def get_filepaths(directory):
file_paths = []
for filename in files:
filepath = os.path.join(root, filename)
file_paths.append(filepath) # Add it to the list.
return file_paths
def cleanUpPaths(fullFilePaths):
cleanPaths = []
for f in fullFilePaths:
if f.endswith(".png"):
cleanPaths.append(f)
return cleanPaths
def getTrainData(paths):
trainData = []
for i in xrange(1,190000,2):
im = image.imread(paths[i])
im = image.imresize(im, (150,50))
im = (im-255)/float(255)
trainData.append(im)
trainData = np.asarray(trainData)
right = np.zeros(47500)
left = np.ones(47500)
trainLabels = np.concatenate((left, right))
trainLabels = np_utils.to_categorical(trainLabels)
return (trainData, trainLabels)
#create the convnet
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(imageWidth,imageHeight,1),strides=1))#32
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu',strides=1))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(1, 3)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (1, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
#prepare the training data*/
trainPaths = get_filepaths("better1/train")
trainPaths = cleanUpPaths(trainPaths)
(trainData, trainLabels) = getTrainData(trainPaths)
trainData = np.reshape(trainData,(95000,imageWidth,imageHeight,1)).astype('float32')
trainData = (trainData-255)/float(255)
#train the convnet***
model.fit(trainData, trainLabels, batch_size=500, epochs=50, validation_split=0.2)
#/save the model and weights*/
model.save('myConvnet_model5.h5');
model.save_weights('myConvnet_weights5.h5');
I've had this issue a number of times now, so thought to make a little recap of it and possible solutions etc. to help people in the future.
Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*
Confirming issue is occurring: Method 1: accuracy for model stays around 0.5 while training (or 1/n where n is number of classes). Method 2: Get the counts of each class in predictions and confirm it's predicting all one class.
Fixes/Checks (in somewhat of an order):
Double Check Model Architecture: use model.summary(), inspect the model.
Check Data Labels: make sure the labelling of your train data hasn't got mixed up somewhere in the preprocessing etc. (it happens!)
Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance if using ImageDataGenerator().flow_from_directory(PATH), check that param shuffle=True and that batch_size is greater than 1.
Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code.
Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note you will have to fully reinitialize the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
If any of you know of more fixes/checks that could possible get the model training properly then please do contribute and I'll try to update the list.
**Note that is common to make more of the pretrained model trainable, once the new layers have been initially trained "enough"
*Other names for the issue to help searches get here...
keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output
You can try to add a BatchNornmalization() layer after MaxPooling2D(). It works for me.
I just have 2 things more to add to the great list of DBCerigo.
Check activation functions: some layers have linear activation function by default, if you do not insert some non linearity into your model it wont be able to generalize, so the net will try to learn how to separate linearly a feature space that is not linear. Making sure you have your non linearity set is a good checkpoint.
Check Model Complexity: if you have a relatively simple model and it learns only till the 1st or the 2nd epoch and then it stalls, it may be that it is trying to learn something too complex. Try making the model deeper. This usually happens when working with frozen models with only 1 or 2 layers unfrozen.
Although the 2nd one may be obvious, I run into his problem once and I lost lots of time checking everythin (data, batches, LR...) before figuring out.
Hope this helps
I would try a couple of things. A lower learning rate should help with more data. Generally, adapting the optimizer should help. Additionally your network seems really small, you might want to increase the capacity of the model by adding layers or increasing the number of filters in the layers.
A better description on how to apply deep learning in practice is given here.
in my case it is the activification function matters. I change from 'sgd' to 'a'