Why am I getting bad results with Keras vs random forest or knn? - tensorflow

I'm learning deep learning with Keras and trying to compare the results (accuracy) with machine learning algorithms from sklearn (e.g. random forest, k-nearest neighbors).
It seems that with Keras I'm getting the worst results.
I'm working on a simple classification problem: the iris dataset.
My Keras code looks like this:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Dense
import pandas as pd

samples = datasets.load_iris()
X = samples.data
y = samples.target
df = pd.DataFrame(data=X)
df.columns = samples.feature_names
df['Target'] = y
# prepare data
X = df[df.columns[:-1]]
y = df[df.columns[-1]]
# one-hot encoding
encoder = LabelEncoder()
y1 = encoder.fit_transform(y)
y = pd.get_dummies(y1).values
# split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
# build model
model = Sequential()
model.add(Dense(1000, activation='tanh', input_shape = ((df.shape[1]-1),)))
model.add(Dense(500, activation='tanh'))
model.add(Dense(250, activation='tanh'))
model.add(Dense(125, activation='tanh'))
model.add(Dense(64, activation='tanh'))
model.add(Dense(32, activation='tanh'))
model.add(Dense(9, activation='tanh'))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train)
score, acc = model.evaluate(X_test, y_test, verbose=0)
#results:
#score = 0.77
#acc = 0.711
I have tried to add layers and/or change the number of units per layer and/or change the activation function (to relu), but it seems that the results are not higher than 0.85.
With sklearn random forest or k_neighbors I'm getting results (on the same dataset) above 0.95.
What am I missing?
With sklearn I put in little effort and got good results, while with Keras I made a lot of changes but still can't match the sklearn results. Why is that?
How can I get the same results with Keras?

In short, you need:
ReLU activations
Simpler model
Data normalization
More epochs
In detail:
The first issue here is that nowadays we never use activation='tanh' for the intermediate network layers. In such problems, we practically always use activation='relu'.
The second issue is that you have built quite a large Keras model, and it might very well be the case that, with only 100 iris samples in your training set, you have too little data to effectively train such a large model. Try reducing drastically both the number of layers and the number of nodes per layer. Start simpler.
Large neural networks really thrive when we have lots of data, but in cases of small datasets, like here, their expressiveness and flexibility may become a liability instead, compared with simpler algorithms, like RF or k-nn.
The third issue is that, in contrast to tree-based models like Random Forests, neural networks generally require normalized data, which you don't provide. The truth is that k-nn also requires normalized data, but in this special case, since all iris features are on the same scale, it does not affect the performance negatively.
Last but not least, you seem to run your Keras model for only one epoch (the default value if you don't specify anything in model.fit); this is somewhat equivalent to building a random forest with a single tree (which, BTW, is still much better than a single decision tree).
All in all, with the following changes in your code:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
model = Sequential()
model.add(Dense(150, activation='relu', input_shape = ((df.shape[1]-1),)))
model.add(Dense(150, activation='relu'))
model.add(Dense(y.shape[1], activation='softmax'))
model.fit(X_train, y_train, epochs=100)
and everything else as is, we get:
score, acc = model.evaluate(X_test, y_test, verbose=0)
acc
# 0.9333333373069763
We can do better: use slightly more training data and stratify them, i.e.
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.20,  # a few more samples for training
                                                    stratify=y)
And with the same model & training epochs you can get a perfect accuracy of 1.0 in the test set:
score, acc = model.evaluate(X_test, y_test, verbose=0)
acc
# 1.0
(Details might differ due to some randomness imposed by default in such experiments).

Adding some dropout might help you improve accuracy. See Tensorflow's documentation for more information.
Essentially, the way you add a Dropout layer is very similar to how you added those Dense() layers:
model.add(Dropout(0.2))
Note: the parameter 0.2 means that 20% of the layer's outputs are randomly dropped during training, which reduces the interdependencies between units and thus reduces overfitting.
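For example, placed between the Dense layers of the simpler model from the answer above (a minimal sketch; the rate and placement are choices to experiment with, not prescribed values):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(150, activation='relu', input_shape=(4,)))  # 4 iris features
model.add(Dropout(0.2))  # randomly zeroes 20% of this layer's outputs during training
model.add(Dense(150, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3, activation='softmax'))  # 3 iris classes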

Related

My model fits too slowly; the angle of the val_loss curve is 90

I have a task to write a neural network with 9 input neurons and 4 output neurons for a multiclass classification problem. I have tried different models, and for all of them:
A drop-out mechanism is used.
Batch normalization is used.
The resulting neural networks all overfit. Precision is below 80%, and I want at least 90% precision. The median loss is 0.8.
Please, can you suggest what model I should use?
Dataset:
TMS_coefficients.RData file
Part of my code:
(trainX, testX, trainY, testY) = train_test_split(dataset,
                                                  values, test_size=0.25, random_state=42)
# neural network model
visible = layers.Input(shape=(9,))
hidden0 = layers.Dense(64, activation="tanh")(visible)
batch0 = layers.BatchNormalization()(hidden0)
drop0 = layers.Dropout(0.3)(batch0)
hidden1 = layers.Dense(32, activation="tanh")(drop0)
batch1 = layers.BatchNormalization()(hidden1)
drop1 = layers.Dropout(0.2)(batch1)
hidden2 = layers.Dense(128, activation="tanh")(drop1)
batch2 = layers.BatchNormalization()(hidden2)
drop2 = layers.Dropout(0.5)(batch2)
hidden3 = layers.Dense(64, activation="tanh")(drop2)
batch3 = layers.BatchNormalization()(hidden3)
output = layers.Dense(4, activation="softmax")(batch3)
model = tf.keras.Model(inputs=visible, outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
              loss='categorical_crossentropy',
              metrics=['Precision'])
history = model.fit(trainX, trainY, validation_data=(testX, testY),
                    epochs=5000, batch_size=256)
From the loss curve, I can say it is not overfitting at all! In fact, your model is underfitting. Why? Because when you stopped training, the loss curve for the validation set had not flattened yet. That means your model still has the potential to do better if trained for longer.
The model overfits when the training loss is decreasing (or stays the same) but the validation loss gradually increases without decreasing. This is clearly not the case here.
So, what you can do:
Try training longer.
Add more layers.
Try different activation functions, e.g. ReLU instead of tanh.
Use lower dropout rates (your model is probably struggling to learn with such high dropout values); see the sketch after this list.
Make sure you have shuffled your data before the train-test split (if you are using sklearn's train_test_split(), this is done by default), and check that the test data is similar to the training data and that both go through the same preprocessing steps.
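A minimal sketch of the ReLU and lower-dropout suggestions applied to the model above (the layer sizes, rates, and epoch count are illustrative choices, not tuned values):
# same functional-API style as the original, with ReLU and lighter dropout
visible = layers.Input(shape=(9,))
hidden0 = layers.Dense(64, activation="relu")(visible)
drop0 = layers.Dropout(0.1)(layers.BatchNormalization()(hidden0))
hidden1 = layers.Dense(32, activation="relu")(drop0)
output = layers.Dense(4, activation="softmax")(layers.BatchNormalization()(hidden1))
model = tf.keras.Model(inputs=visible, outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
              loss='categorical_crossentropy',
              metrics=['Precision'])
# train longer, watching the validation loss until it flattens
history = model.fit(trainX, trainY, validation_data=(testX, testY),
                    epochs=10000, batch_size=256)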

Need help applying a 1D CNN to a dataset

I am working on my own dataset, which is stored in a CSV file. It has three columns: val1 | val2 | label. There are 6 labels in total. The data has 2000 rows and 3 columns. I want to create a 1D CNN network that takes val1 and val2 as input and predicts the label. So far I have tried:
import numpy as np
import pandas as pd
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")
x = df.drop(["label"], axis=1)  # x.shape = (2000, 2)
x = np.expand_dims(x, -1)       # x.shape = (2000, 2, 1)
y = df.label                    # y.shape = (2000,)
y = to_categorical(y)           # y.shape = (2000, 6)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.2)
model = Sequential()
model.add(Conv1D(filters=256, kernel_size=2, activation='relu', input_shape=(2, 1)))
model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=1))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(6, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=64,
          epochs=100,
          verbose=1,
          validation_data=(X_valid, y_valid),
          shuffle=True)
The above model gives a maximum validation and training accuracy of only 30%.
Things that I tried:
Data augmentation.
Changing the number of filters.
Increasing the number of layers.
How can I increase the accuracy of the model?
There are plenty of options you can try:
play around with the learning rate
try a different model architecture
try a fully-connected neural network; see the sketch at the end of this answer. With just two 1D inputs, do you really have a grid structure in your input that a CNN can exploit? An FCNN may be a more appropriate architecture for your task
remove dropout, as it may be contributing to underfitting
assuming that you are underfitting, increase the number of neurons in your network
Try an altogether different model type, e.g. a decision tree, logistic regression, an SVM or a random forest
check your data. Maybe it is not clean enough for the network to infer anything from it. Apply data cleaning, e.g. fix any inconsistencies.
supply more data. It always depends on your problem, but 2000 data points may not be that much.
This is not an exhaustive list. The first step would definitely be to check your data. Your result that both training and validation performance are low suggests that you are underfitting. That would suggest that your model is too small or too heavily regularized (Dropout). I rather feel like your model is too large and too complex, but that will depend on your task. Give logistic regression, an SVM or an FCNN a shot. If it turns out that your task is indeed very complex, try to gather more data or infer more structure of your problem.
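As a rough illustration of the fully-connected suggestion (a sketch assuming the two features are fed directly as an (n, 2) matrix, without the extra channel axis; the layer sizes are not tuned):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),             # val1 and val2, no np.expand_dims needed
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# reshape drops the channel axis that was added for the CNN version
model.fit(X_train.reshape(-1, 2), y_train, epochs=100,
          validation_data=(X_valid.reshape(-1, 2), y_valid))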

Arbitrary threshold for sigmoid activation function for CNN binary classification?

I am classifying the sentiment of reviews - 0 or 1 - using gensim Doc2Vec and a CNN in Tensorflow 2.2.0:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              input_length=maxlen,
                              embeddings_initializer=Constant(embedding),
                              trainable=False),
    tf.keras.layers.Conv1D(128, 5, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=8,
                    validation_split=0.3,
                    batch_size=10)
I then make predictions and convert the sigmoid probabilities to 0 or 1 using np.round():
predicted = model.predict(X_test)
predicted = np.round(predicted).astype(np.int32)
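An explicit comparison makes the same cutoff visible (a small sketch, effectively equivalent to the rounding above):
predicted = (model.predict(X_test) >= 0.5).astype(np.int32)  # 1 if probability >= 0.5, else 0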
I get great results (~96% accuracy) indicating that the threshold of 0.5 is working as expected...
However, when I try to predict on a set of new data, the model seems to separate bad reviews from good ones, but the split happens around approximately 0.0 instead:
# Example sigmoid outputs for new test reviews:
good_review_1: 0.000052
good_review_2: 0.000098
bad_review_1: 0.112334
bad_review_2: 0.214934
Mind you, the model never saw X_test during training, and it is able to predict on it just fine. It's only when I introduce a new set of review text strings that I run into incorrect predictions. For new reviews, the only preprocessing I do before calling model.predict() is feeding them through the same tokenizer used for model training:
s = 'This is a sample bad review.'
s = tokenizer.texts_to_sequences(pd.Series(s))  # assign the tokenized sequence back
s = pad_sequences(s, maxlen=maxlen, padding='pre', truncating='pre')
model.predict(s)
I've been trying to make sense of this conundrum, but I'm making little progress. I ran into this post, and it indicates:
Some sigmoid functions will have this at 0, while some will have it set to a different 'threshold'.
But this still doesn't explain why my model was able to predict on np.round()'s 0.5 threshold for the X_test dataset (which the model never learned from) and is then unable to predict on the new dataset at the same 0.5 threshold...

Keras model not learning and predicting only one class out of three classes

I'm new to the field of deep learning and currently working on this competition for predicting earthquake damage to buildings.
The model I created starts at an accuracy of .56 and remains there for any number of epochs I let it run. When finished, the model only ever predicts one of the three classes (which I one-hot encoded into a dataframe with three columns). Changing the number of layers, optimizers, data preparation, or dropout won't change anything. Even trying to overfit the model by over-parameterizing the neural network still gives the same accuracy and a non-learning model.
What am I doing wrong?
This is my code:
model = keras.models.Sequential()
model.add(keras.layers.Dense(64, input_dim=85, activation="relu"))
model.add(keras.layers.Dropout(0.3))  # bare Dropout(...) calls were no-ops; they must be added to the model
model.add(keras.layers.Dense(128, activation="relu"))
model.add(keras.layers.Dropout(0.3))
model.add(keras.layers.Dense(256, activation="relu"))
model.add(keras.layers.Dropout(0.3))
model.add(keras.layers.Dense(512, activation="relu"))
model.add(keras.layers.Dense(3, activation="softmax"))
adam = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(traindata, trainlabels,
                    epochs=5,
                    validation_split=0.2,
                    verbose=1)
There's nothing visually wrong with your model, but it may be too heavy to learn any useful features.
Try normalizing your input with https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
Start with only 2 layers and a small number of neurons.
Increase batch_size and try learning_rate scheduling.
Observe the validation accuracy and stop when the model starts to overfit.
Also, for a 3-class classification, 56% accuracy is better than the baseline; remember it's a competition, so the data is not dummy playground data on which you can expect to get 90% accuracy with an MLP on the first try.
Finally, try hyperparameter optimization with a tuner, as in the sketch below.
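Assuming the tuner meant here is KerasTuner, a minimal sketch might look like this (the search space and trial count are illustrative choices, not recommendations):
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential()
    # tune the width of a single hidden layer
    model.add(tf.keras.layers.Dense(hp.Int('units', min_value=32, max_value=256, step=32),
                                    input_dim=85, activation='relu'))
    model.add(tf.keras.layers.Dense(3, activation='softmax'))
    # tune the learning rate as well
    model.compile(optimizer=tf.keras.optimizers.Adam(hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=10)
tuner.search(traindata, trainlabels, epochs=20, validation_split=0.2)
best_model = tuner.get_best_models(num_models=1)[0]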

What does it mean when training and validation accuracy are 1.000 but results are still poor?

I am using Keras to perform landmark detection - specifically, locating parts of the body in a picture of a human. I have gathered around 2,000 training samples and am using rmsprop with an mse loss function. After training my CNN, I am left with loss: 3.1597e-04 - acc: 1.0000 - val_loss: 0.0032 - val_acc: 1.0000
I figured this would mean my model would perform well on the test data; however, the predicted points are instead way off from the labeled points. Any ideas or help would be greatly appreciated!
IMG_SIZE = 96
NUM_KEYPOINTS = 15
NUM_EPOCHS = 50
NUM_CHANNELS = 1
TESTING = True
def load(test=False):
    # load data from CSV file
    df = pd.read_csv(fname)
    # convert Image to numpy arrays
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
    df = df.dropna()  # drop rows with missing values
    X = np.vstack(df['Image'].values) / 255.  # scale pixel values to [0, 1]
    X = X.reshape(X.shape[0], IMG_SIZE, IMG_SIZE, NUM_CHANNELS)
    X = X.astype(np.float32)
    y = df[df.columns[:-1]].values
    y = (y - (IMG_SIZE / 2)) / (IMG_SIZE / 2)  # scale target coordinates to [-1, 1]
    X, y = shuffle(X, y, random_state=42)  # shuffle train data
    y = y.astype(np.float32)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
    return X_train, X_test, y_train, y_test

def build_model():
    # construct the neural network
    model = Sequential()
    model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, NUM_CHANNELS)))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(2, 2))
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(500, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_KEYPOINTS * 2))
    return model

if __name__ == '__main__':
    X_train, X_test, y_train, y_test = load(test=TESTING)
    model = build_model()
    sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
    hist = model.fit(X_train, y_train, epochs=NUM_EPOCHS, verbose=1, validation_split=0.2)
    # save the model
    model.save_weights("/output/model_weights.h5")
    histFile = open("/output/training_history", "wb")
    pickle.dump(hist.history, histFile)
According to this question, How does keras define "accuracy" and "loss"?, your "accuracy" is defined as categorical accuracy, which makes absolutely no sense for your problem.
After training you are left with a 10x difference between your training loss and validation loss, which would suggest overfitting (hard to say for sure without a graph and some examples).
To start fixing it:
Use a metric that makes sense in your context, one where you understand what it does and how it's computed.
Take random examples where the metric is very good and where it is very bad, and manually validate that that is really the case (otherwise you need a different metric).
In your case, I would imagine a metric based on the distance between the desired locations and the predicted ones. This is not a default thing, and you would have to implement it yourself; see the sketch after this list.
Always be suspicious if the model says it's perfect.
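A minimal sketch of such a metric, assuming the model's flat output vector alternates coordinates as (x1, y1, x2, y2, ...) as in the code above:
import tensorflow as tf

def mean_keypoint_distance(y_true, y_pred):
    # reshape flat coordinate vectors into (batch, NUM_KEYPOINTS, 2) pairs
    true_pts = tf.reshape(y_true, (-1, NUM_KEYPOINTS, 2))
    pred_pts = tf.reshape(y_pred, (-1, NUM_KEYPOINTS, 2))
    # Euclidean distance per keypoint, averaged over keypoints and the batch
    return tf.reduce_mean(tf.norm(true_pts - pred_pts, axis=-1))

# used in place of the meaningless 'accuracy' metric:
model.compile(optimizer=sgd, loss='mse', metrics=[mean_keypoint_distance])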
It is impossible to tell from your question, but I will venture a guess based on some implications of your data split.
Typically, when one splits one's data into more than two sets, one is using all but one of them to train on some parameter or another. For example, the first split is used to choose the model weights, the second split to choose the model architecture, etc. Presumably you are training something with your 'validation' set, otherwise you wouldn't have it. Thus, the problem is almost certainly overfitting. The way you usually detect overfitting is the difference between the accuracy of your model on the data used to train it (usually everything except a single split), which you are calling your 'training' and 'validation' splits, and the accuracy on a split your model has never touched, which you are calling your 'test' split.
So, per your question-comment, "I assume if the validation accuracy is that high then there is no overfitting, right?": no. If the accuracy of your model on any data you've used to train anything at all is higher than its accuracy on data your model has never touched in any way, shape, or form, then you've overfit. Which seems to be the case with you.
OTOH, it may be the case that you've simply not shuffled your data. It's impossible to tell without having a look-see at the training/testing pipeline.
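To make the split distinction concrete, a minimal sketch of a three-way split where the test set is never used for any training or tuning decision (the 60/20/20 proportions are an illustrative choice):
from sklearn.model_selection import train_test_split

# hold out 20% as a test set that no training step ever sees
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
# split the rest into training (60% of total) and validation (20% of total)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)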