I'm running a fairly simple Keras classification based on 30 features. What I don't yet understand is why the loss becomes far more volatile when I increase the number of rows going into the model:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cancer_classification.csv")
df = df.iloc[:50]
# split data
X = df.drop("benign_0__mal_1", axis=1).values
y = df["benign_0__mal_1"].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=101)
# scale
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
print(X_train.shape)
# ---> (25, 30)  (25 of the 50 rows end up in the training split)
model = Sequential()
model.add(Dense(30, activation="relu"))
model.add(Dense(15, activation="relu"))
model.add(Dense(5, activation="relu"))
# binary classification - so last layer has sigmoid activation function
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam")
# we will overfit on purpose to show what it looks like - so 1000 epochs
model.fit(X_train, y_train, epochs=1000, validation_data=(X_test, y_test))
# plotting it out - we drop the first 10 epochs so the high initial loss doesn't skew the chart
loss_df = pd.DataFrame(model.history.history)
loss_df = loss_df.iloc[10:]
loss_df.plot()
plt.show()
The original idea was to visualize overfitting: the training loss keeps dropping while val_loss starts to rise. What I wonder is why using 500 rows creates such wild oscillations in the loss curves.
[Plot: loss/val_loss over epochs, 50 df rows]
[Plot: loss/val_loss over epochs, 500 df rows]
Try shuffling your data, and check the class distribution for the first 50 rows versus the first 500. I think this might be a consequence of not shuffling. It could also be caused by the features you are feeding into the model.
Related
I am trying to solve the XOR problem using the following code:
import numpy as np
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers import SGD, Adam
# input data
x = np.array([[0,0], [0,1], [1,0], [1,1]], 'float32')
y = np.array([[0], [1], [1], [0]], 'float32')
### Model
model = Sequential()
# add layers (architecture)
model.add(Dense(2, activation='relu'))
model.add(Dense(1, activation = 'sigmoid'))
# compile
model.compile(loss='mean_squared_error',
              optimizer=SGD(learning_rate=0.1, momentum=0.8),
              metrics=['accuracy'])
# train
model.fit(x, y, epochs = 25000, batch_size = 1)
# evaluate
ev = model.evaluate(x, y)
I already tested:
using different activation functions in the hidden layer (sigmoid and tanh)
using different learning rates and momentum
Also, I am running a high number of epochs (25000). Still, the model only predicts all four outputs correctly some of the time; most of the time the accuracy is 0.5 or 0.75.
I have read that this is the minimal configuration that can solve the problem. However, it also seems that the error surface has a number of regions with local minima.
My question is:
Should I assume that the model is correct and can learn the problem, even though it sometimes gets 'stuck' in a local minimum, OR do I still need to improve my model somehow to solve the XOR problem more accurately and consistently?
Here is my code for distributed training via spark-tensorflow-distributor, which uses TensorFlow's MultiWorkerMirroredStrategy to train across multiple servers:
https://github.com/tensorflow/ecosystem/blob/master/spark/spark-tensorflow-distributor/spark_tensorflow_distributor/mirrored_strategy_runner.py
import sys
from spark_tensorflow_distributor import MirroredStrategyRunner
import mlflow.keras

mlflow.keras.autolog()
mlflow.log_param("learning_rate", 0.001)

import tensorflow as tf
import time
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

def train():
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
    # tf.distribute.experimental.CollectiveCommunication.NCCL
    model = None
    with strategy.scope():
        data = load_breast_cancer()
        X_train, X_test, y_train, y_test = train_test_split(
            data.data, data.target, test_size=0.3)
        N, D = X_train.shape  # number of observations and variables
        from sklearn.preprocessing import StandardScaler
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        model = tf.keras.models.Sequential([
            tf.keras.layers.Input(shape=(D,)),
            tf.keras.layers.Dense(1, activation='sigmoid')  # sigmoid output for binary classification
        ])
        model.compile(optimizer='adam',  # Adam: adaptive moment estimation
                      loss='binary_crossentropy',
                      metrics=['accuracy'])
        # Train the model
        r = model.fit(X_train, y_train, validation_data=(X_test, y_test))
        mlflow.keras.log_model(model, "mymodel")

MirroredStrategyRunner(num_slots=4, use_custom_strategy=True).run(train)
I realize that saving via mlflow.keras.log_model produces 4 models in the Databricks experiment, and each of the 4 models is a poor predictor.
If I change num_slots from 4 to 1, only 1 model is saved in the Databricks experiment, and that model is a good predictor during inference.
My question is:
Do I need an extra step to merge the 4 models into 1 model that predicts as well as with num_slots = 1? Or am I doing something wrong? I was expecting only the chief node to save a model.
So, you do not want to call log_model in all 4 of the TensorFlow workers; you want to log it from just 1 of them. I believe you would use https://www.tensorflow.org/api_docs/python/tf/distribute/get_replica_context to figure out which worker you are, and perhaps only log if you are worker 0. That's what I do when using Horovod for a similar purpose.
You do not merge the models; they are the same model in all 4 replicas. That's the point of what this is doing.
If the model is 'worse' than with 1 replica, I would suspect other, subtler issues are at play. For example, with 4 workers your effective batch size changes unless you compensate for that. See https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras#train_the_model for a discussion.
I'm working on a project where I have 3 inputs (v, f, n) and 1 output (delta(t)).
I'm trying to test the effect of the inputs on the output and to figure out which input is the most influential in different situations, so I would like to predict new output values from new input values.
I have been testing this system and collected a data table of 1000 rows (the table itself is not reproduced here).
I'm new to this whole neural network thing, so I don't know what the activation function, the loss function, etc. should be.
I've been trying some Keras models, but I'm getting wrong predictions when calling model.predict() on new input values.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(3,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(optimizer=Adam(), loss='mse')
data = np.array(pd.read_excel(r'Data.xlsx'))
x = data[:, :3]
y = data[:, 3]
target = model.fit(x, y, validation_split=0.2, epochs=15000,
batch_size=256)
# check some predictions:
print(model.predict([[0.9, 840370875, 240]]))
I'm trying to experiment with a simple TensorFlow model built with Keras, but I can't figure out why I'm getting such poor predictions. Here's the model:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense

x_train = np.asarray([[.5], [1.0], [.4], [5], [25]])
y_train = np.asarray([.25, .5, .2, 2.5, 12.5])
opt = keras.optimizers.Adam(lr=0.01)
model = Sequential()
model.add(Dense(1, activation="relu", input_shape=(x_train.shape[1:])))
model.add(Dense(9, activation="relu"))
model.add(Dense(1, activation="relu"))
model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)
print(model.predict(np.asarray([[5]])))
As you can see, it should learn to divide the input by two. However, the loss is 32.5705 and it refuses to change at all over the epochs (even if I do something crazy like 100 epochs, it stays at that loss). Is there anything you can see that I'm doing horribly wrong here? The prediction for any value seems to be 0.
It also seems to switch randomly between performing as expected and the weird behavior described above. I re-ran it and got a loss of 0.0019 after 200 epochs, but if I re-run it with all the same parameters a second later, the loss stays at 30 like before. What's going on here?
Some reasons that I can think of:
the training set is too small
the learning rate is high
the last layer should just be a linear layer
for some runs the ReLU units are dying (see the dying ReLU problem), and your network weights don't change after that, so you see the same loss value. In this case a tanh activation may provide better conditioning for optimization.
I made a few changes to your code based on what I commented, and I get decent results.
import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
x_train = np.random.random((50000, 1))  # was: np.asarray([[.5], [1.0], [.4], [5], [25]])
y_train = x_train / 2.  # TODO: add a small amount of noise to y; was: np.asarray([.25, .5, .2, 2.5, 12.5])
opt = keras.optimizers.Adam(lr=0.0005, clipvalue=0.5)
model = Sequential()
model.add(Dense(1, activation="tanh", input_shape=x_train.shape[1:]))
model.add(Dense(9, activation="tanh"))
model.add(Dense(1, activation=None))
model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)
print(model.predict(np.asarray([[.4322]])))  # shape (1, 1): one sample with one feature
Output:
[[0.21410337]]
I'm trying to train a classifier on Google QuickDraw drawings using Keras:
import numpy as np
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=5, data_format="channels_last", activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(data_format="channels_last"))
model.add(Conv2D(filters=16, kernel_size=3, data_format="channels_last", activation="relu"))
model.add(MaxPooling2D(data_format="channels_last"))
model.add(Flatten(data_format="channels_last"))
model.add(Dense(units=128, activation="relu"))
model.add(Dense(units=64, activation="relu"))
model.add(Dense(units=4, activation="softmax"))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
x = np.load("./x.npy")
y = np.load("./y.npy")
model.fit(x=x, y=y, batch_size=100, epochs=40, validation_split=0.2)
The input data is a 4-D array with 12000 normalized images (28 x 28 x 1) per class. The output data is an array of one-hot encoded vectors.
If I train this model on four classes, it produces convincing results:
[Plot: training curves; red is training data, blue is validation data]
I know the model is slightly overfitted. However, I want to keep the architecture as simple as possible, so I accepted that.
My problem is that as soon as I add just one arbitrary class, the model starts to overfit extremely:
[Plot: training curves for five classes, showing severe overfitting]
I tried many different things to prevent it from overfitting, such as batch normalization, dropout, kernel regularizers, much more training data, and different batch sizes, none of which led to any significant improvement.
What could be the reason why my CNN overfits so much?
EDIT: This is the code I used to create x.npy and y.npy:
import numpy as np
from tensorflow.keras.utils import to_categorical
files = ['cat.npy', 'dog.npy', 'apple.npy', 'banana.npy', 'flower.npy']
SAMPLES = 12000
x = np.concatenate([np.load(f'./data/{f}')[:SAMPLES] for f in files]) / 255.0
y = np.concatenate([np.full(SAMPLES, i) for i in range(len(files))])
# (samples, rows, cols, channels)
x = x.reshape(x.shape[0], 28, 28, 1).astype('float32')
y = to_categorical(y)
np.save('./x.npy', x)
np.save('./y.npy', y)
The .npy files come from here.
The problem lies in how the data split is done. Notice that there are 5 classes and you use validation_split=0.2. By default there is no shuffling, and in your code you feed the data in sequential class order. That means:
The training data consists entirely of 4 classes: 'cat.npy', 'dog.npy', 'apple.npy', 'banana.npy'. That's the 0.8 training split.
The validation data is 'flower.npy'. That's your 0.2 validation split. The model was never trained on it, so it gets terrible accuracy.
Such results are only possible because the unshuffled validation_split=0.2 gives you an almost perfect separation of classes between the two sets.
Solution
x = np.load("./x.npy")
y = np.load("./y.npy")
# Shuffle the data!
p = np.random.permutation(len(x))
x = x[p]
y = y[p]
model.fit(x=x, y=y, batch_size=100, epochs=40, validation_split=0.2)
If my hypothesis is correct, setting validation_split to e.g. 0.5 should also get you much better results (though it's not a real solution).