Getting bad performance with a simple TensorFlow model

I'm trying to experiment with a simple TensorFlow model built with keras, but I can't figure out why I'm getting such poor predictions. Here's the model:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense

x_train = np.asarray([[.5], [1.0], [.4], [5], [25]])
y_train = np.asarray([.25, .5, .2, 2.5, 12.5])
opt = keras.optimizers.Adam(lr=0.01)
model = Sequential()
model.add(Dense(1, activation="relu", input_shape=x_train.shape[1:]))
model.add(Dense(9, activation="relu"))
model.add(Dense(1, activation="relu"))
model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)
print(model.predict(np.asarray([[5]])))
As you can see, it should learn to divide the input by two. However, the loss is 32.5705, and over the epochs it refuses to change whatsoever (even if I do something crazy like 100 epochs, it's always that loss). Is there anything you can see that I'm doing horribly wrong here? The prediction for any value, it seems, is 0.
It also seems to switch randomly between performing as expected and the weird behavior described above. I re-ran it and got a loss of 0.0019 after 200 epochs, but if I re-run it with all the same parameters a second later, the loss stays at 30 like before. What's going on here?

Some reasons that I can think of:
the training set is too small
the learning rate is too high
the last layer should just be a linear layer
for some runs the ReLU units are dying (see the dead ReLU problem), and your network weights don't change after that, so you keep seeing the same loss value; a way to check for this is sketched below
In that case a tanh activation may provide better conditioning for optimization.
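If you want to verify the dead-ReLU hypothesis on a bad run, here is a minimal sketch (my own addition, not from the original answer) that uses the Keras backend API to inspect the hidden layer's activations after training; a unit whose output is exactly zero for every training sample is dead:

from keras import backend as K

# build a function mapping the model input to the hidden Dense(9) layer's output
get_hidden = K.function([model.input], [model.layers[1].output])
acts = get_hidden([x_train])[0]  # shape (5, 9): activation per sample and unit
print((acts == 0).all(axis=0))   # True marks units that never fire on this data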
I made a few changes to your code based on what I commented, and I get decent results.
import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# use a much larger synthetic training set instead of the original 5 samples
x_train = np.random.random((50000, 1))
y_train = x_train / 2.  # TODO: add a small amount of noise to y

# lower learning rate, plus gradient clipping
opt = keras.optimizers.Adam(lr=0.0005, clipvalue=0.5)
model = Sequential()
model.add(Dense(1, activation="tanh", input_shape=x_train.shape[1:]))
model.add(Dense(9, activation="tanh"))
model.add(Dense(1, activation=None))  # linear output layer
model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)
print(model.predict(np.asarray([[.4322]])))
Output:
[[0.21410337]]

Related

Validation loss and loss stuck at 0

I am trying to train a classification problem with two labels to predict. For some reason, my validation_loss and my loss are always stuck at 0 during training. What could be the reason? Is there something wrong in how I call the loss functions? Are they the appropriate ones for multi-label classification?
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=12, shuffle=True)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=12, shuffle=True)
model = keras.Sequential([
    #keras.layers.Flatten(batch_input_shape=(None,24)),
    keras.layers.Dense(first_neurona, activation='relu'),
    keras.layers.Dense(second_neurona, activation='relu'),
    keras.layers.Dense(third_neurona, activation='relu'),
    keras.layers.Dense(fourth_neurona, activation='relu'),
    keras.layers.BatchNormalization(),  # WE NORMALIZE THE INPUT DATA
    keras.layers.Dropout(0.25),
    keras.layers.Dense(2, activation='softmax'),
    #keras.layers.BatchNormalization()  # for multi-class problems we use softmax? 2 classes: forehand or backhand
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=n_epochs, batch_size=batchSize, validation_data=(X_val, y_val))
test_loss, test_acc = model.evaluate(X_test, y_test)
EDIT:
See the shape of my training data:
X_train shape: (280, 14)
X_val shape: (94, 14)
y_train shape: (280, 2)
y_val shape: (94, 2)
the parameters when calling the function:
first neuron units: 4
second neuron units: 8
learning rate = 0.0001
epochs = 1000
batch_size = 32
Also see the metrics plots (not reproduced here).
A softmax activation should be the last step in multi-class classification: it converts the logits to probabilities, with no further (non-)linear transformations.
Pay attention to the output layer: the last layer should have the same number of units (output_units) as the number of target classes in the training dataset, because you get a probability of belonging to each class and then select the argmax as the most probable one. That is why the labels are one-hot encoded before model.fit(); alternatively, use sparse_categorical_crossentropy with integer labels (see the sketch after this list).
A common suggestion for this kind of trouble is to decrease the learning_rate, to guard against exploding gradients.
Be careful with activation='relu' and the vanishing gradient problem: better use LeakyReLU or PReLU, or apply gradient clipping.
P.S. I cannot test this without your data.
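To illustrate the two labeling options, here is a minimal sketch (hypothetical 2-class labels, since I cannot test your actual data):

import numpy as np
from tensorflow import keras

# option 1: one-hot encoded labels, paired with categorical_crossentropy
y_onehot = keras.utils.to_categorical([0, 1, 1, 0], num_classes=2)  # shape (4, 2)
# model.compile(..., loss='categorical_crossentropy')

# option 2: plain integer labels, paired with sparse_categorical_crossentropy
y_sparse = np.array([0, 1, 1, 0])  # shape (4,)
# model.compile(..., loss='sparse_categorical_crossentropy')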
From the plot of validation accuracy it is clear that it is changing, so the loss must be changing as well. I suspect you are plotting the wrong values for the loss, which is why I wanted to see the actual training data printed by model.fit. Make sure you have these assignments for the loss (note the square brackets: history.history is a dictionary):
val_loss = history.history['val_loss']
train_loss = history.history['loss']
I also recommend you increase the learning_rate to .001.
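As a quick sketch of how the plotting might then look (assuming matplotlib, which the question's plots suggest but do not show):

import matplotlib.pyplot as plt

# history is the object returned by model.fit(...)
train_loss = history.history['loss']     # square brackets: history.history is a dict
val_loss = history.history['val_loss']

plt.plot(train_loss, label='training loss')
plt.plot(val_loss, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()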

Why am I getting bad results with Keras vs random forest or knn?

I'm learning deep learning with keras and trying to compare the results (accuracy) with machine learning algorithms from sklearn (e.g. random forest, k_neighbors).
It seems that with keras I'm getting the worst results.
I'm working on a simple classification problem: the iris dataset.
My keras code looks like this:
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense

samples = datasets.load_iris()
X = samples.data
y = samples.target
df = pd.DataFrame(data=X)
df.columns = samples.feature_names
df['Target'] = y
# prepare data
X = df[df.columns[:-1]]
y = df[df.columns[-1]]
# one-hot encoding
encoder = LabelEncoder()
y1 = encoder.fit_transform(y)
y = pd.get_dummies(y1).values
# split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
# build model
model = Sequential()
model.add(Dense(1000, activation='tanh', input_shape = ((df.shape[1]-1),)))
model.add(Dense(500, activation='tanh'))
model.add(Dense(250, activation='tanh'))
model.add(Dense(125, activation='tanh'))
model.add(Dense(64, activation='tanh'))
model.add(Dense(32, activation='tanh'))
model.add(Dense(9, activation='tanh'))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train)
score, acc = model.evaluate(X_test, y_test, verbose=0)
#results:
#score = 0.77
#acc = 0.711
I have tried adding layers and/or changing the number of units per layer and/or changing the activation function (to relu), but it seems the results never get higher than 0.85.
With sklearn's random forest or k_neighbors I'm getting results (on the same dataset) above 0.95.
What am I missing?
With sklearn I made little effort and got good results, while with keras I made a lot of tweaks and still can't match the sklearn results. Why is that?
How can I get the same results with keras?
In short, you need:
ReLU activations
Simpler model
Data normalization
More epochs
In detail:
The first issue here is that nowadays we never use activation='tanh' for the intermediate network layers. In such problems, we practically always use activation='relu'.
The second issue is that you have built quite a large Keras model, and it might very well be the case that, with only 100 iris samples in your training set, you have too little data to effectively train such a large model. Try reducing drastically both the number of layers and the number of nodes per layer. Start simpler.
Large neural networks really thrive when we have lots of data, but in cases of small datasets, like here, their expressiveness and flexibility may become a liability instead, compared with simpler algorithms, like RF or k-nn.
The third issue is that, in contrast to tree-based models, like Random Forests, neural networks generally require normalizing the data, which you don't do. Truth is that knn also requires normalized data, but in this special case, since all iris features are in the same scale, it does not affect the performance negatively.
Last but not least, you seem to run your Keras model for only one epoch (the default value if you don't specify anything in model.fit); this is somewhat equivalent to building a random forest with a single tree (which, BTW, is still much better than a single decision tree).
All in all, with the following changes in your code:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
model = Sequential()
model.add(Dense(150, activation='relu', input_shape = ((df.shape[1]-1),)))
model.add(Dense(150, activation='relu'))
model.add(Dense(y.shape[1], activation='softmax'))
model.fit(X_train, y_train, epochs=100)
and everything else as is, we get:
score, acc = model.evaluate(X_test, y_test, verbose=0)
acc
# 0.9333333373069763
We can do better: use slightly more training data and stratify them, i.e.
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.20,  # a few more samples for training
                                                    stratify=y)
And with the same model & training epochs you can get a perfect accuracy of 1.0 in the test set:
score, acc = model.evaluate(X_test, y_test, verbose=0)
acc
# 1.0
(Details might differ due to some randomness imposed by default in such experiments).
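If you want repeatable runs, a minimal sketch is to fix the random seeds up front (assuming TF 2.x; exact determinism can still depend on the backend and hardware):

import numpy as np
import tensorflow as tf

np.random.seed(42)      # numpy-level randomness (also pass random_state to train_test_split)
tf.random.set_seed(42)  # TensorFlow weight initialization (tf.set_random_seed in TF 1.x)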
Adding some dropout might help you improve accuracy. See TensorFlow's documentation for more information.
Essentially, you add a Dropout layer very much like you added those Dense() layers:
model.add(Dropout(0.2))
Note: the parameter 0.2 means that 20% of the layer's outputs are randomly dropped during training, which reduces the interdependencies between units and thereby reduces overfitting.
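For instance, dropped into the simplified iris model above (a sketch only; the rate and placement are tunable choices, not prescribed by the documentation):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(150, activation='relu', input_shape=((df.shape[1]-1),)))
model.add(Dropout(0.2))  # randomly zero 20% of the previous layer's outputs during training
model.add(Dense(150, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))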

Multivariate Binary Classification Prediction Tensorflow 2 LSTM

I am currently working on the implementation of an LSTM to predict a binary outcome (either 0 or 1) for a given set of normed scaled features.
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True, input_shape=(data.x_train.shape[1], data.x_train.shape[2])))
self._regressor.add(Dropout(0.2))
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.3))
self._regressor.add(LSTM(units=80, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.4))
self._regressor.add(LSTM(units=120, activation='relu'))
self._regressor.add(Dropout(0.5))
# this is the output layer
self._regressor.add(Dense(units=1, activation='sigmoid'))
self._logger.info("TensorFlow Summary\n {}".format(self._regressor.summary()))
# compile and run the regressor
self._regressor.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
self._regressor.fit(data.x_train, data.y_train, epochs=1, batch_size=32)
data.y_pred_scaled = self._regressor.predict(data.x_test)
data.y_pred = self._scaler_target.inverse_transform(data.y_pred_scaled)
scores = self._regressor.evaluate(data.x_test, data.y_test, verbose=0)
My issue here is that my predictions fall in an extremely narrow range (max 0.5188445, min 0.518052), implying to me that all of my classifications are positive (which is definitely incorrect). I even tried predict_classes and this yielded an array of 1's.
I am struggling to find where my issue is despite numerous searches online. I have ensured that my final output layer consists of a sigmoid function and that the loss is binary_crossentropy. My data has been scaled using sklearn's MinMaxScaler with feature_range=(0,1). I am running my code through a debugger and everything up to self._regressor.fit looks good so far. I am just struggling with interpreting the output of the predictions.
Any help would be greatly appreciated.

Sequential models without `input_shape` passed to the first layer cannot reload optimizer state

WARNING:tensorflow:Sequential models without an input_shape passed to the first layer cannot reload their optimizer state. As a result, your model is starting with a freshly initialized optimizer.
I encountered the warning above from TensorFlow while trying to load a saved model.
import tensorflow.keras as keras
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)
model.save('epic_num_reader.model')
new_model = tf.keras.models.load_model('epic_num_reader.model')
predictions = new_model.predict(x_test)
I had the same problem after upgrading to TF 1.14; I fixed it by changing the definition of the first layer from this:
model.add(tf.keras.layers.Flatten())
to this:
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
where 28 is the side length of the input map to be flattened (the MNIST image pixels in our case)
As the warning suggests, your first layer needs the input_shape argument. In your case this would be the Flatten layer.
The Keras documentation has an extra section about the Sequential API; see it for further information.
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
Since TF 1.14, the first layer requires the input_shape argument, i.e. the dimensions of the input (the image dimensions in this case).
Otherwise you get this warning when reloading the model, and the optimizer state will not be restored properly.

How to use a Keras LSTM with batch size of 1? (avoid zero padding)

I want to train a Keras LSTM using a batch size of one.
In that case I wouldn't need zero padding, as far as I understand. The necessity for zero padding comes from equalizing the size of the batches, right?
It turns out this is not as easy as I thought.
My network looks like this:
model = Sequential()
model.add(Embedding(output_dim=embeddings.shape[1],
                    input_dim=embeddings.shape[0],
                    weights=[embeddings],
                    trainable=False))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(100,
                             return_sequences=True,
                             activation="tanh", kernel_initializer="glorot_uniform")))
model.add(TimeDistributed(Dense(maxLabel)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=1, shuffle=True)
I naively provide my training data and training labels as simple numpy arrays with these shape properties:
X: (2161,)
Y: (2161,)
I now get a ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (2161, 1)
I am not sure how to satisfy this 3-D requirement without zero padding, which is what I wanted to avoid by using a batch size of one in the first place.
Does anyone see what I am doing wrong?