RNN model not learning anything - tensorflow

I am practicing with RNNs. I randomly generate 5 integers. If the first integer is odd, the y value is 1; otherwise y is 0 (so only the first x counts). The problem is that when I run this model, it does not 'learn': val_loss and val_accuracy do not change over epochs. What could be the cause?
from keras.layers import SimpleRNN, LSTM, GRU, Dropout, Dense
from keras.models import Sequential
import numpy as np
data_len = 300
x = []
y = []
for i in range(data_len):
    a = np.random.randint(1,10,5)
    if a[0] % 2 == 0:
        y.append('0')
    else:
        y.append('1')
    a = a.reshape(5, 1)
    x.append(a)
print(x)
X = np.array(x)
Y = np.array(y)
model = Sequential()
model.add(GRU(units=24, activation='relu', return_sequences=True, input_shape=[5,1]))
model.add(Dropout(rate=0.5))
model.add(GRU(units=12, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(units=1, activation='softmax'))
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
model.summary()
history = model.fit(X[:210], Y[:210], epochs=20, validation_split=0.2)
Epoch 1/20
168/168 [==============================] - 1s 6ms/step - loss: 0.4345 - accuracy: 0.5655 - val_loss: 0.5000 - val_accuracy: 0.5000
...
Epoch 20/20
168/168 [==============================] - 0s 315us/step - loss: 0.4345 - accuracy: 0.5655 - val_loss: 0.5000 - val_accuracy: 0.5000

You're using softmax activation with 1 neuron, which always returns [1]. Use sigmoid activation with 1 neuron for binary classification, and softmax with multiple neurons for multiclass classification.
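To see why a single softmax unit cannot learn, note that softmax normalizes across the units of a layer, so with one unit the output is exp(z) / exp(z) = 1 whatever the input. A minimal sketch in plain Python follows; the helper names `softmax` and `sigmoid` are illustrative, not the Keras functions.

```python
import math

def softmax(logits):
    """Softmax over a list of logits (numerically stabilized)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic sigmoid of a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

# With a single unit, softmax ignores the logit entirely.
single_unit_outputs = [softmax([z])[0] for z in (-5.0, 0.0, 3.2)]

# Sigmoid on the same logits still varies with the input.
sigmoid_outputs = [sigmoid(z) for z in (-5.0, 0.0, 3.2)]

print(single_unit_outputs)
print(sigmoid_outputs)
```

With every prediction fixed at 1, the MSE equals the fraction of samples labelled 0, which is consistent with the log in the question: loss 0.4345 and accuracy 0.5655 add up to 1.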

Change data_len to a higher number like 30000 and it will be able to learn. Right now the amount of data is very small. And of course, you'll need to change the activation to sigmoid, as suggested by Yoskutik.

Related

Keras Callback - Save the per-epoch outputs

I am trying to create a callback for extracting the per-epoch outputs in the 1st hidden layer of my model. With what I have written, self.model.layers[0].output outputs a Tensor object but I could not see the actual entries.
Ideally I would like to save these output tensors, and visualise using an epoch vs mean-output plot. This has been implemented in Glorot & Bengio (2010) but the source code is not available.
How shall I edit my code in order to make the model fitting process save the outputs in each epoch? Thanks in advance.
class PerEpochOutputCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print('First layer output of epoch:', epoch+1, self.model.layers[0].output)
model_relu_3= Sequential()
# Use ReLU hidden layer
model_relu_3.add(Dense(3, input_dim= 8, activation= 'relu', kernel_initializer= 'uniform'))
model_relu_3.add(Dense(5, input_dim= 3, activation= 'relu', kernel_initializer= 'uniform'))
model_relu_3.add(Dense(5, input_dim= 5, activation= 'relu', kernel_initializer= 'uniform'))
model_relu_3.add(Dense(1, activation= 'sigmoid', kernel_initializer= 'uniform'))
model_relu_3.compile(loss='binary_crossentropy', optimizer='adam', metrics= ['accuracy'])
# Train model
tr_results = model_relu_3.fit(X, y, validation_split=0.2, epochs=10, batch_size=32,
                              verbose=2, callbacks=[PerEpochOutputCallback()])
====
Train on 614 samples, validate on 154 samples
Epoch 1/10
First layer output of epoch: 1 Tensor("dense_42/Relu:0", shape=(None, 3), dtype=float32)
614/614 - 0s - loss: 0.6915 - accuracy: 0.6531 - val_loss: 0.6897 - val_accuracy: 0.6429
Epoch 2/10
First layer output of epoch: 2 Tensor("dense_42/Relu:0", shape=(None, 3), dtype=float32)
614/614 - 0s - loss: 0.6874 - accuracy: 0.6531 - val_loss: 0.6853 - val_accuracy: 0.6429
Epoch 3/10
First layer output of epoch: 3 Tensor("dense_42/Relu:0", shape=(None, 3), dtype=float32)
614/614 - 0s - loss: 0.6824 - accuracy: 0.6531 - val_loss: 0.6783 - val_accuracy: 0.6429

LSTM: loss value is not changing

I am working on predicting stock trend (up, or down).
Below is how I am handling my pre-processing.
index_ = len(df.columns) - 1
x = df.iloc[:,:index_]
x = x[['Relative_Volume', 'CurrentPrice', 'MarketCap']]
x = x.values.astype(float)
# x = x.reshape(len(x), 1, x.shape[1]).astype(float)
x = x.reshape(*x.shape, 1)
y = df.iloc[:,index_:].values.astype(float)
# x.shape = (44930, 3, 1)
# y.shape = (44930, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=98 )
Then I am building my BILSTM model:
def build_nn():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, return_sequences=True, input_shape = (x_train.shape[0], 1) , name="one")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(128, return_sequences=True , name="two")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(64, return_sequences=False , name="three")))
    model.add(Dropout(0.20))
    model.add(Dense(1,activation='sigmoid'))
    # opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
    opt = SGD(lr=0.01)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
filepath = "bilstmv1.h5"
chkp = ModelCheckpoint(monitor = 'val_accuracy', mode = 'auto', filepath=filepath, verbose = 1, save_best_only=True)
model = build_nn()
# model.summary()
model.fit(x_train, y_train,
epochs=3,
batch_size=256,
validation_split=0.1, callbacks=[chkp])
model.summary()
Below is the output of the loss_value:
Epoch 1/3
127/127 [==============================] - 27s 130ms/step - loss: 0.6829 - accuracy: 0.5845 - val_loss: 0.6797 - val_accuracy: 0.5803
Epoch 00001: val_accuracy improved from -inf to 0.58025, saving model to bilstmv1.h5
Epoch 2/3
127/127 [==============================] - 14s 112ms/step - loss: 0.6788 - accuracy: 0.5851 - val_loss: 0.6798 - val_accuracy: 0.5803
Epoch 00002: val_accuracy did not improve from 0.58025
Epoch 3/3
127/127 [==============================] - 14s 112ms/step - loss: 0.6800 - accuracy: 0.5822 - val_loss: 0.6798 - val_accuracy: 0.5803
Epoch 00003: val_accuracy did not improve from 0.58025
I have tried changing the optimizer, the loss function, and other settings. As you can expect, all the predictions are the same since the loss is not changing.
You have an issue with the input shape in your first LSTM layer. Keras's input_shape describes a single sample and leaves out the batch dimension. Keras adds None for that dimension itself, since the number of inputs to the model can vary: you can have 1 input, 2 inputs, or any number of inputs, and None is how that dynamic dimension is represented. Your x has shape (44930, 3, 1), so the shape of one sample is (3, 1), not (x_train.shape[0], 1), which puts the number of training samples where the number of timesteps belongs. The quickest way to get this right is to pass x_train.shape[1:].
Your build function will then become:
def build_nn():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, return_sequences=True, input_shape = x_train.shape[1:] , name="one")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(128, return_sequences=True , name="two")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(64, return_sequences=False , name="three")))
    model.add(Dropout(0.20))
    model.add(Dense(1,activation='sigmoid'))
    # opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
    opt = SGD(lr=0.01)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
Though I still advise having a look at your optimizer, as that might affect your results. When reshaping an array you can also use -1 for one dimension, which means it is filled in automatically, but you can only use it once.
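As a quick check of the shapes involved, here is a minimal sketch in plain Python that uses the array shape quoted in the question, (44930, 3, 1). It assumes the Keras convention that `input_shape` describes a single sample and that the batch dimension is added by Keras; the variable names are illustrative.

```python
# Shape of the full feature array from the question: (samples, timesteps, features).
x_shape = (44930, 3, 1)

# Per-sample shape: drop the leading samples axis.
per_sample_shape = x_shape[1:]

# Shape of a batch as the model sees it, with an unspecified batch size.
batch_shape = (None, *per_sample_shape)

# What prepending None to the full array shape would give instead.
mistaken_shape = (None, *x_shape)

print(per_sample_shape)
print(batch_shape)
print(mistaken_shape)
```

The last tuple has four dimensions, one more than an LSTM layer accepts, which is why the samples axis has to be dropped before the shape is handed to the layer.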

NaN loss and 0 accuracy from the start itself: Encoder Decoder Model Keras

I have built an encoder-decoder model with Keras for a chatbot. I cannot find any issues with my model, yet during training the loss is NaN from the very first epoch and the accuracy remains at zero.
I have tried different batch sizes, learning rates, and optimizers, but the output values do not change at all. I even tried gradient clipping and regularization, with no sign of improvement. The output that the model gives is completely random.
The code takes up inputs of shape:
(BATCH, MAX_LENGTH) for encoder input -> Converted to (BATCH, MAX_LENGTH, EMB_SIZE) by embedding layer
(BATCH, MAX_LENGTH) for decoder input -> Converted to (BATCH, MAX_LENGTH, EMB_SIZE) by embedding layer
Output shape is:
(BATCH, MAX_LENGTH, 1) for decoder target (hence the loss that I use is 'sparse_categorical_crossentropy')
Here is the code of my model:
# Define an input sequence and process it.
encoder_inputs = Input(name='encoder_input', shape=(None,))
encoder_embedding = Embedding(name='encoder_emb', input_dim=VOCAB_SIZE,
output_dim=EMB_SIZE,
weights=[embedding_matrix],
trainable=False,
input_length=MAX_LENGTH)(encoder_inputs)
encoder = LSTM(HIDDEN_DIM, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(name='decoder_input', shape=(None, ))
decoder_embedding = Embedding(name='decoder_emb', input_dim=VOCAB_SIZE,
output_dim=EMB_SIZE,
weights=[embedding_matrix],
trainable=False,
input_length=MAX_LENGTH)(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(HIDDEN_DIM, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding,
initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
The word embeddings (embedding_matrix) are built from GloVe embeddings.
This is how the results come up for the training...
Epoch 1/100
1329/1329 [==============================] - 1s 868us/step - loss: nan - accuracy: 4.7655e-04
Epoch 2/100
1329/1329 [==============================] - 0s 353us/step - loss: nan - accuracy: 4.7655e-04
Epoch 3/100
1329/1329 [==============================] - 0s 345us/step - loss: nan - accuracy: 4.7655e-04
Epoch 4/100
1329/1329 [==============================] - 0s 354us/step - loss: nan - accuracy: 4.7655e-04
Epoch 5/100
1329/1329 [==============================] - 0s 349us/step - loss: nan - accuracy: 4.7655e-04
The issue was in my data. The model is perfect!
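The answer does not say what was wrong with the data. As a hedged sketch of the kind of checks that can surface such problems, the plain-Python helpers below (hypothetical names, not part of Keras) look for two things that can produce a NaN loss in this kind of setup: token ids outside `[0, VOCAB_SIZE)` and non-finite values in the embedding matrix. It is an assumption that either applied here, and the toy data only stands in for the real targets and GloVe matrix.

```python
import math

def find_bad_token_ids(sequences, vocab_size):
    """Return the sorted set of token ids that fall outside [0, vocab_size)."""
    return sorted({t for seq in sequences for t in seq
                   if t < 0 or t >= vocab_size})

def count_non_finite(matrix):
    """Count NaN or infinite entries in a list-of-lists matrix."""
    return sum(1 for row in matrix for v in row if not math.isfinite(v))

# Toy data standing in for the real decoder targets and embedding matrix.
VOCAB_SIZE = 5
targets = [[1, 2, 4], [0, 5, 3]]  # 5 is out of range for VOCAB_SIZE = 5
embedding_matrix = [[0.1, 0.2], [float("nan"), 0.3]]

bad_ids = find_bad_token_ids(targets, VOCAB_SIZE)
bad_weights = count_non_finite(embedding_matrix)

print(bad_ids)
print(bad_weights)
```

Running checks like these before training makes it easier to tell a data problem apart from a modelling problem.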

Forward Pass calculation on current batch in "get_updates" method of Keras SGD Optimizer

I am trying to implement a stochastic Armijo rule in the get_updates method of the Keras SGD optimizer.
Therefore, I need to calculate another forward pass to check if the learning_rate chosen was good. I don't want another calculation of the gradients, but I want to use the updated weights.
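For reference, the Armijo (sufficient decrease) condition accepts a step size lr when f(w - lr * g) <= f(w) - c * lr * ||g||^2 and shrinks lr otherwise. The sketch below shows this as a framework-free backtracking search on a toy quadratic. The function names and the constants `c` and `shrink` are illustrative assumptions, not Keras API, and in the stochastic variant both losses would be evaluated on the same mini-batch.

```python
def armijo_step(loss_fn, grad_fn, w, lr=1.0, c=1e-4, shrink=0.5, max_tries=30):
    """Backtracking line search along the negative gradient.

    Returns (new_w, accepted_lr). Shrinks lr until
    loss(w - lr * g) <= loss(w) - c * lr * ||g||^2.
    """
    g = grad_fn(w)
    g_norm_sq = sum(gi * gi for gi in g)
    base_loss = loss_fn(w)
    for _ in range(max_tries):
        candidate = [wi - lr * gi for wi, gi in zip(w, g)]
        # Second forward pass with the tentative weights, no new gradient.
        if loss_fn(candidate) <= base_loss - c * lr * g_norm_sq:
            return candidate, lr
        lr *= shrink
    return w, 0.0  # no acceptable step found; keep the weights


# Toy problem: f(w) = 10 * w0^2 + w1^2, minimized at the origin.
def loss(w):
    return 10.0 * w[0] ** 2 + w[1] ** 2

def grad(w):
    return [20.0 * w[0], 2.0 * w[1]]

w0 = [1.0, 1.0]
w1, lr_used = armijo_step(loss, grad, w0)
print(lr_used, loss(w0), loss(w1))
```

The call `loss_fn(candidate)` plays the role of the extra forward pass with the updated weights: the gradient is computed once and only the loss is re-evaluated for each trial step size.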
Using Keras Version 2.3.1 and Tensorflow Version 1.14.0
def get_updates(self, loss, params):
    grads = self.get_gradients(loss, params)
    self.updates = [K.update_add(self.iterations, 1)]
    lr = self.learning_rate
    if self.initial_decay > 0:
        lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                                  K.dtype(self.decay))))
    # momentum
    shapes = [K.int_shape(p) for p in params]
    moments = [K.zeros(shape, name='moment_' + str(i))
               for (i, shape) in enumerate(shapes)]
    self.weights = [self.iterations] + moments
    for p, g, m in zip(params, grads, moments):
        v = self.momentum * m - lr * g # velocity
        self.updates.append(K.update(m, v))
        if self.nesterov:
            new_p = p + self.momentum * v - lr * g
        else:
            new_p = p + v
        # Apply constraints.
        if getattr(p, 'constraint', None) is not None:
            new_p = p.constraint(new_p)
        self.updates.append(K.update(p, new_p))
    ### own changes ###
    if self.armijo:
        inputs = (model._feed_inputs +
                  model._feed_targets +
                  model._feed_sample_weights)
        input_layer = model.layers[0].input
        armijo_function = K.function(inputs=input_layer, outputs=[loss],
                                     updates=self.updates,name='armijo')
        loss_next= armijo_function(inputs)
        [....change updates if learning rate was not good enough...]
    return self.updates
Unfortunately, I don't understand the error message when trying to calculate "loss_next":
tensorflow.python.framework.errors_impl.InvalidArgumentError: Requested Tensor connection between nodes "conv2d_1_input" and "conv2d_1_input" would create a cycle.
Two questions here:
how to access the current batch I am working on? The forward calculation should only consider the actual batch and as the gradients also belong only to that batch.
any better ideas to not use K.function for updating and evaluating a forward pass to calculate the loss function on that batch?
Anyone who can help? Thanks in advance.
how to access the current batch I am working on? The forward calculation should only consider the actual batch and as the gradients also belong only to that batch.
For this you can set batch_size to the total number of training records in model.fit(), so that every epoch has just one forward pass and one backpropagation step. You can then analyze the gradients in epoch 1 and modify the learning rate for epoch 2. If you are using a custom training loop, modify the code accordingly.
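The effect of that choice on the number of weight updates can be checked with a little arithmetic. This is a minimal sketch, assuming the usual rule that one epoch runs ceil(samples / batch_size) steps; the helper name is hypothetical.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches (and weight updates) in one pass over the data."""
    return math.ceil(num_samples / batch_size)

full_batch_steps = steps_per_epoch(500, 500)  # batch covers the whole training set
mini_batch_steps = steps_per_epoch(500, 32)   # ordinary mini-batches

print(full_batch_steps)
print(mini_batch_steps)
```

With 500 records and a batch size of 500 there is a single step per epoch, so the gradient computed in that step belongs to the one and only batch.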
any better ideas to not use K.function for updating and evaluating a forward pass to calculate the loss function on that batch?
I do not recall any option for evaluating gradients in TensorFlow 1.x other than using from tensorflow.keras import backend as K. The best option is to update TensorFlow to the latest version, 2.2.0, and use tf.GradientTape.
I would recommend going through this answer to capture gradients using from tensorflow.keras import backend as K in TensorFlow 1.x.
Below is sample code that is close to your requirement. I am using TensorFlow version 2.2.0. You can build on this program for your own needs.
The program does the following:
We alter the learning rate after every epoch. You can do that using the callbacks argument of model.fit. Here I increment the learning rate by 0.01 every epoch using tf.keras.callbacks.LearningRateScheduler and also display it at the beginning of every epoch using tf.keras.callbacks.Callback.
We compute the gradient using tf.GradientTape() at the end of every epoch and collect the gradients of every epoch in a list using append.
We also set batch_size=len(train_images), as per your requirement.
Note: I am training on just 500 records from the CIFAR dataset due to memory constraints.
Code -
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
import os
import numpy as np
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
train_images = train_images[:500]
train_labels = train_labels[:500]
test_images = test_images[:50]
test_labels = test_labels[:50]
model = Sequential([
Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
MaxPooling2D(),
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D(),
Conv2D(64, 3, padding='same', activation='relu'),
MaxPooling2D(),
Flatten(),
Dense(512, activation='relu'),
Dense(10)
])
lr = 0.01
adam = Adam(lr)
# Define the Gradient Function
epoch_gradient = []
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Recompute the loss on the full training set and record its gradients
        with tf.GradientTape() as tape:
            logits = model(train_images, training=True)
            loss = loss_fn(train_labels, logits)
        grad = tape.gradient(loss, model.trainable_weights)
        model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
        epoch_gradient.append(grad)
gradcalc = GradientCalcCallback()
# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr)
        Epoch_count = epoch + 1
        print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))
printlr = printlearningrate()
def scheduler(epoch):
    # Return the current learning rate increased by 0.01
    optimizer = model.optimizer
    return K.eval(optimizer.lr + 0.01)
updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.compile(optimizer=adam,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
epochs = 10
history = model.fit(train_images, train_labels, epochs=epochs, batch_size=len(train_images),
validation_data=(test_images, test_labels),
callbacks = [printlr,updatelr,gradcalc])
# Convert to a 2-dimensional array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epochs)
print("Gradient Array has the shape:",gradient.shape)
Output -
Epoch: 1 , LR: 0.01
Epoch 1/10
1/1 [==============================] - 0s 427ms/step - loss: 30.1399 - accuracy: 0.0820 - val_loss: 2114.8201 - val_accuracy: 0.1800 - lr: 0.0200
Epoch: 2 , LR: 0.02
Epoch 2/10
1/1 [==============================] - 0s 329ms/step - loss: 141.6176 - accuracy: 0.0920 - val_loss: 41.7008 - val_accuracy: 0.0400 - lr: 0.0300
Epoch: 3 , LR: 0.03
Epoch 3/10
1/1 [==============================] - 0s 328ms/step - loss: 4.1428 - accuracy: 0.1160 - val_loss: 2.3883 - val_accuracy: 0.1800 - lr: 0.0400
Epoch: 4 , LR: 0.04
Epoch 4/10
1/1 [==============================] - 0s 329ms/step - loss: 2.3545 - accuracy: 0.1060 - val_loss: 2.3471 - val_accuracy: 0.1800 - lr: 0.0500
Epoch: 5 , LR: 0.05
Epoch 5/10
1/1 [==============================] - 0s 340ms/step - loss: 2.3208 - accuracy: 0.1060 - val_loss: 2.3047 - val_accuracy: 0.1800 - lr: 0.0600
Epoch: 6 , LR: 0.06
Epoch 6/10
1/1 [==============================] - 0s 331ms/step - loss: 2.3048 - accuracy: 0.1300 - val_loss: 2.3069 - val_accuracy: 0.0600 - lr: 0.0700
Epoch: 7 , LR: 0.07
Epoch 7/10
1/1 [==============================] - 0s 337ms/step - loss: 2.3041 - accuracy: 0.1340 - val_loss: 2.3432 - val_accuracy: 0.0600 - lr: 0.0800
Epoch: 8 , LR: 0.08
Epoch 8/10
1/1 [==============================] - 0s 341ms/step - loss: 2.2871 - accuracy: 0.1400 - val_loss: 2.6009 - val_accuracy: 0.0800 - lr: 0.0900
Epoch: 9 , LR: 0.09
Epoch 9/10
1/1 [==============================] - 1s 515ms/step - loss: 2.2810 - accuracy: 0.1440 - val_loss: 2.8530 - val_accuracy: 0.0600 - lr: 0.1000
Epoch: 10 , LR: 0.10
Epoch 10/10
1/1 [==============================] - 0s 343ms/step - loss: 2.2954 - accuracy: 0.1300 - val_loss: 2.3049 - val_accuracy: 0.0600 - lr: 0.1100
Total number of epochs run: 10
Gradient Array has the shape: (10, 10)
Hope this answers your question. Happy Learning.

Train accuracy improving but validation remains unchanged?

I am using TF 2.0. I was trying to train a network on my own data. It was not going well. The validation accuracy was close to 0 and stagnant. I tried many regularizations to no effect. Then I tried training a network on 3 classes of data where all images in each class are the same so as to eliminate the possibility of variability. But this is not working either. Since all in-class images are the same, I would expect the validation accuracy to perfectly match the training accuracy since there is no new data. Why is that not the case? Here is my code:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.applications.mobilenet import preprocess_input
import matplotlib.pyplot as plt
base_model = tf.keras.applications.MobileNet(weights='imagenet', include_top=False)
def turn_off(n):
    for layer in model.layers[:n]:
        layer.trainable = False
    for layer in model.layers[n:]:
        layer.trainable = True
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x) # we add dense layers so that the model can learn more complex functions and classify for better results.
x = Dense(1024, activation='relu')(x) # dense layer 2
x = Dense(512, activation='relu')(x) # dense layer 3
preds = Dense(3, activation='softmax')(x) # final layer with softmax activation
model = Model(inputs=base_model.input, outputs=preds)
turn_off(87)
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
rescale=1. / 255,
validation_split=0.2) # set validation split
train_generator = train_datagen.flow_from_directory(
'/users/josh.flori/desktop/colors/',
target_size=(224, 224),
batch_size=32,
color_mode='rgb',
class_mode='categorical',
subset='training',
shuffle=True) # set as training data
validation_generator = train_datagen.flow_from_directory(
'/users/josh.flori/desktop/colors/',
target_size=(224, 224),
batch_size=32,
color_mode='rgb',
class_mode='categorical',
subset='validation',
shuffle=True) # set as validation data
# Adam optimizer
# loss function will be categorical cross entropy
# evaluation metric will be accuracy
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit_generator(
train_generator,
steps_per_epoch=train_generator.samples // train_generator.batch_size,
validation_data=validation_generator,
validation_steps=validation_generator.samples // train_generator.batch_size,
epochs=6)
Here is the training output
9/9 [==============================] - 19s 2s/step - loss: 0.2645 - accuracy: 0.9134 - val_loss: 1.6668 - val_accuracy: 0.3438
Epoch 2/6
9/9 [==============================] - 20s 2s/step - loss: 0.0417 - accuracy: 0.9567 - val_loss: 2.6176 - val_accuracy: 0.3438
Epoch 3/6
9/9 [==============================] - 17s 2s/step - loss: 0.4771 - accuracy: 0.9422 - val_loss: 4.0694 - val_accuracy: 0.3438
Epoch 4/6
9/9 [==============================] - 18s 2s/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 2.1304 - val_accuracy: 0.3125
Epoch 5/6
9/9 [==============================] - 18s 2s/step - loss: 9.7658e-07 - accuracy: 1.0000 - val_loss: 3.1633 - val_accuracy: 0.3125
Epoch 6/6
9/9 [==============================] - 18s 2s/step - loss: 2.2571e-05 - accuracy: 1.0000 - val_loss: 3.4949 - val_accuracy: 0.3125
My image folders look like this (image not shown), with exactly 128 identical images per folder.
I've been reading all day, trying different images, I can't seem to get anywhere. What is causing this particular behavior? It has to be something obvious but I'm not sure.
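One detail in the generator setup that may be worth ruling out is that it passes both `preprocessing_function=preprocess_input` and `rescale=1. / 255`. The sketch below works through the arithmetic in plain Python under two assumptions: that MobileNet's `preprocess_input` maps pixel values from [0, 255] to [-1, 1] via x / 127.5 - 1, and that `ImageDataGenerator` applies the preprocessing function before the rescale factor. The helper names are illustrative, and this is not claimed to be the cause of the stagnant validation accuracy.

```python
def mobilenet_preprocess(pixel):
    """Assumed MobileNet scaling: [0, 255] -> [-1, 1]."""
    return pixel / 127.5 - 1.0

def generator_pipeline(pixel, rescale=1.0 / 255):
    """Assumed order: preprocessing function first, then rescale."""
    return mobilenet_preprocess(pixel) * rescale

pixels = [0, 127.5, 255]
expected_by_model = [mobilenet_preprocess(p) for p in pixels]
actually_fed = [generator_pipeline(p) for p in pixels]

print(expected_by_model)
print(actually_fed)
```

If both assumptions hold, every input lands within about ±0.004 instead of spanning [-1, 1], so training once without `rescale` would be a cheap experiment.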