A neural network that can't overfit? - tensorflow

I am fitting a model to some noisy satellite data. The labels are measurements of rock on the bars of a river. There is a noisy but significant relationship. I only have 250 points, but the method would expand and eventually run on much bigger datasets. I'm looking at a mix of models (RANSAC, Huber, SVM regression) and DNNs. My DNN results seem too good to be true. The network looks like:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import regularizers

def build_model(NetworkDims):
    model = Sequential()
    model.add(Dense(128, kernel_regularizer=regularizers.l2(0.001), input_dim=NetworkDims, kernel_initializer='he_normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(32, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
When I save the history and plot training loss (green dots) and validation loss (cyan line) against epoch, I get this:
Training and validation loss just creep down. With a small dataset, I was expecting the validation loss to go its own way at some point. In fact, if I run 10-fold cross-validation with this network, the error reported by cross_val_score also creeps down. This just looks too good to be true. It implies that I could train this thing for 1000 epochs and still improve the results. If it looks too good to be true, it usually is, but why?
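For reference, the 10-fold check can also be written as a plain KFold loop so no scikit-learn wrapper for Keras is needed; this is only a sketch, where X and y stand in for the 250-point feature matrix and labels, and the epoch count and batch size are illustrative:
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_mse = []
for train_idx, val_idx in kf.split(X):
    fold_model = build_model(X.shape[1])  # fresh model for every fold
    fold_model.fit(X[train_idx], y[train_idx], epochs=100, batch_size=16, verbose=0)
    fold_mse.append(fold_model.evaluate(X[val_idx], y[val_idx], verbose=0))
print("mean CV MSE:", np.mean(fold_mse))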
EDIT: More results.
So I tried cutting dropout to 0.1 at each layer and removing the L2. Interesting. With the toned-down dropout, I get even better results:
(Plot: 10% dropout rate)
Without the L2, there is overfitting:
(Plot: no L2 regularization)

My guess would be that you have such high dropout on every layer that the network is having trouble just overfitting the training data. My prediction is that if you lower that dropout and regularization, it'll learn the training data much faster.
I'm not sure the results are too good to be true, because it's hard to judge how good a model is from the loss alone. But it is most likely the dropout and regularization that are preventing it from overfitting in a few epochs.
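If you want a quick sanity check of that explanation, one option is to train the same stack of layers with the dropout and L2 penalties stripped out and watch what happens. This is only a sketch of that idea, reusing the layer sizes from the question:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_unregularized(NetworkDims):
    # Same layer sizes as above, but no Dropout and no L2 penalty.
    model = Sequential()
    model.add(Dense(128, input_dim=NetworkDims, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(128, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(64, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(64, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(32, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
If this variant drives the training loss close to zero on 250 points while the validation loss climbs, the original behaviour is simply heavy regularization at work.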

Related

Keras binary classification model's AUC score doesn't increase

I have an imbalanced dataset with 57,000 zeros and 2,500 ones. I passed class weights to my model, tried changing optimizers, and tried resizing the number of layers and neurons. I finally stuck with the setup below because it was the only one that seemed systematic, and I tried changing the layer weight regularization rules, but nothing has helped me yet. And I am not only talking about my validation AUC score; even the training score doesn't rise satisfyingly.
Here is how I declared my model; don't mind if you think the problem is the layer and node sizes, I think I have tried everything that sounds sensible.
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers

class_weight = {0: 23.59,
                1: 1.}

model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(33, activation='relu', kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4), bias_regularizer=regularizers.l2(1e-4), activity_regularizer=regularizers.l2(1e-5)))
model.add(Dense(28, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(Dense(9, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='sigmoid', kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4), bias_regularizer=regularizers.l2(1e-4), activity_regularizer=regularizers.l2(1e-5)))

opt = keras.optimizers.SGD(learning_rate=0.1)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['AUC'])
model.fit(x_train, y_train, epochs=600, verbose=1, validation_data=(x_test, y_test), class_weight=class_weight)
After approximately 100 epochs, it gets stuck at 0.73-0.75 AUC and doesn't rise anymore. I couldn't even overfit my model.
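As a side note on the class weights: the question doesn't say how 23.59 was chosen, but a common convention is to derive the weights from inverse class frequencies, which up-weights the minority class. A minimal sketch with scikit-learn's "balanced" heuristic (purely illustrative, not the asker's code):
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
class_weight = dict(zip(np.unique(y_train), weights))
# With 57,000 zeros and 2,500 ones this gives roughly {0: 0.52, 1: 11.9}.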

EEG Deep Learning application with limited accuracy (around 70%) no matter the architecture

I have been trying to replicate some experiments involving EEG data and epileptic seizure detection. The dataset I've been using, CHB-MIT, is a very well-known one for this task, and I have tried a couple of different approaches, but I never seem to be able to replicate the results shown in some of the papers I've read.
I wanted to use a CNN to classify raw EEG data. I split the original signal into 2-second segments; before this, I scaled the data to have unit variance and also applied a band-pass filter from 4-40 Hz to remove unwanted artifacts. Since there is a very high class imbalance, I applied SMOTE to increase the minority class and undersampled the majority class. The dataset has 23 patients, and I have been training on the data from all patients except one and doing the validation on the left-out patient. The results always hover around 70% accuracy. The main issue here is that I have used multiple architectures, and now I'm using a simple MLP, and I always get the 70% accuracy, with the training and validation losses diverging in an overfitting manner in just a few epochs. The training accuracy is at around 98%, by the way.
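For context, this is roughly the shape of the preprocessing described above, written as a simplified sketch; the sampling rate fs, the arrays raw_signal and labels, and the leave-one-patient-out split are placeholders, and the SMOTE call assumes the imbalanced-learn package:
import numpy as np
from scipy.signal import butter, filtfilt
from imblearn.over_sampling import SMOTE

def bandpass(signal, fs, low=4.0, high=40.0, order=4):
    # 4-40 Hz band-pass filter applied along the time axis
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype='band')
    return filtfilt(b, a, signal, axis=-1)

def make_windows(signal, fs, seconds=2):
    # non-overlapping 2-second segments
    window = int(fs * seconds)
    n = signal.shape[-1] // window
    return np.stack(np.split(signal[..., :n * window], n, axis=-1))

filtered = bandpass(raw_signal, fs)
filtered = (filtered - filtered.mean()) / filtered.std()   # unit variance
X = make_windows(filtered, fs)
X_res, y_res = SMOTE().fit_resample(X.reshape(len(X), -1), labels)  # oversample the minority class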
I am out of options. Some of the papers I read claim to have over 90% accuracy in this cross-patient task.
Here's one of my CNN architectures. I'm using keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Reshape, Flatten, Dense

model = Sequential()
# self.channels and WINDOW_SIZE come from the surrounding class
model.add(Input(shape=(self.channels, WINDOW_SIZE, 1)))
model.add(Conv2D(8, kernel_size=(1, 10), activation='relu'))
model.add(Conv2D(16, (18, 8), activation='relu'))
model.add(MaxPooling2D((1, 2)))
model.add(Dropout(0.2))
model.add(Reshape((16, 632, 1)))
model.add(Conv2D(32, (16, 10), activation='relu'))
model.add(MaxPooling2D((1, 2)))
model.add(Dropout(0.2))
model.add(Reshape((32, 311, 1)))
model.add(Conv2D(64, (32, 10), activation='relu'))
model.add(MaxPooling2D((1, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
I don't want to convert my input data into images, like some of the papers I've read do. My idea is to use the raw data.
Some of the papers:
https://dl.acm.org/doi/fullHtml/10.1145/3241056
http://150.162.46.34:8080/icassp2019/ICASSP2019/pdfs/0001120.pdf
A small part of one of my training sessions (accuracy and loss plots):

Deep LSTM accuracy not crossing 50%

I am working on a classification problem with the SemEval 2017 Task 4A dataset (can be found here), and I am using a deep LSTM network for it. For pre-processing, I have done lower-casing -> tokenization -> lemmatization -> stop-word removal -> punctuation removal. For word embeddings, I have used a word2vec model. There are 18,000 samples in my training set and 2,000 samples in my test set.
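The pre-processing chain listed above would look roughly like this; this is only a sketch assuming NLTK, since the question doesn't say which library was actually used:
import string
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess(text):
    tokens = word_tokenize(text.lower())                # lower-case + tokenize
    tokens = [lemmatizer.lemmatize(t) for t in tokens]  # lemmatize
    # drop stop words and punctuation
    return [t for t in tokens if t not in stop_words and t not in string.punctuation]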
The code for my model is
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, BatchNormalization, Activation, Dropout, Bidirectional, LSTM, Dense
from keras_self_attention import SeqSelfAttention

model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True), input_shape=(128, 1, 64)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
The value of max_words is 2000 and max_len is 300.
But even after this, my testing accuracy is not crossing 50%. I can't figure out the problem.
PS: I am using a validation set too. The loss function is binary cross-entropy and the optimizer is Adam.
Training "LSTM" is very different with other common deep learning model.
I recommend a higher dropout rate like 0.7,0.8. and Adam optimizer is particularly unstable in LSTM with real world data. So, i recommend SGD scheduled for a momentum of 0.9 and ReduceLROnPlateau. You have to do very long training, and if spark loss is observed, the training is going very well. (Spark Loss is a word used by NVIDIA researchers. It refers to a phenomenon in which the value of Loss that appears to converge increases significantly.)
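A minimal sketch of that optimizer and schedule; the initial learning rate, patience, and factor are illustrative choices, and X_train, y_train, X_val, y_val are placeholders:
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ReduceLROnPlateau

opt = SGD(learning_rate=0.01, momentum=0.9)
# halve the learning rate whenever validation loss stops improving for 10 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=1e-5)

model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=500, callbacks=[reduce_lr])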

Loss increasing with batch normalization (tf.Keras)

I have a feed-forward NN with 2 hidden layers for a regression problem. Compared to when I do not add BN, the loss (MSE) is about double when training for the same number of epochs, and the execution time also increases by about 20%. Why is that?
If I had to take a guess: BN is not worth it on a 2-layer network, and the extra overhead introduced by BN is actually higher than whatever decrease in processing time it causes.
That would explain the execution time, but I am not sure why the loss is higher, too.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

model = Sequential()
model.add(Dense(128, 'relu'))
model.add(BatchNormalization())
model.add(Dense(128, 'relu'))
model.add(BatchNormalization())
model.add(Dense(1, 'linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
I've tried a variety of optimizers, activation functions, numbers of epochs, batch sizes, etc., but it makes no difference.
For regression, you should not use BatchNorm before the output layer.
On the other hand, you could use BatchNorm right after the input layer and before the first Dense layer to normalize inputs.
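A minimal sketch of that rearrangement, assuming the same two hidden layers; n_features is a placeholder for the input dimension:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, BatchNormalization

model = Sequential()
model.add(Input(shape=(n_features,)))     # n_features: number of inputs (placeholder)
model.add(BatchNormalization())           # normalize the raw inputs
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='linear'))  # no BatchNorm right before the output
model.compile(loss='mean_squared_error', optimizer='adam')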

Why does increasing features lead to worse neural network performance?

I have a regression problem and configured a multi-layered neural network using Keras. The original dataset had 286 features, and using 20 epochs, the NN converged to an MSE loss of ~0.0009. This is using the Adam optimizer.
I then added three more features, and using the same configuration, the NN won't converge. After 1 epoch, it gets stuck at a loss of 0.003, so significantly worse.
After checking that the new features are represented correctly, I have tried the following with no success:
adjusting number of layers
adjusting number of neurons in each layer
including dropout layers
adjusting the learning rate
Here is my original configuration:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(300, activation='relu', input_dim=training_set.shape[1]))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='Adam', loss='mse')
Anybody have any ideas?