The second epoch's initial loss is not consistently related to the first epoch's final loss, and the loss and accuracy remain constant every epoch - tensorflow

The second epoch's initial loss is not consistently related to the first epoch's final loss. After that, the loss stays the same every epoch, and so do all the other metrics. I have some background in deep learning, but this is my first time implementing my own model, so I want to understand intuitively what is going wrong with it. The dataset consists of cropped face images in two classes, each with 300 pictures. I highly appreciate your help.
import tensorflow as tf
from tensorflow import keras
from IPython.display import Image
import matplotlib.pyplot as plt
from keras.layers import ActivityRegularization
from keras.layers import Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
image_generator = ImageDataGenerator(
    featurewise_center=False, samplewise_center=False,
    featurewise_std_normalization=False, samplewise_std_normalization=False,
    rotation_range=0, width_shift_range=0.0, height_shift_range=0.0,
    brightness_range=None, shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0,
    horizontal_flip=False, vertical_flip=False, rescale=1./255
)
image = image_generator.flow_from_directory('./util/untitled folder', batch_size=938)
x, y = image.next()
x_train = x[:500]
y_train = y[:500]
x_test = x[500:600]
y_test = y[500:600]
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(4)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(4)
plt.imshow(x_train[0])
def convolutional_model(input_shape):
    input_img = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(64, (7, 7), padding='same')(input_img)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=1, padding='same')(x)
    x = Dropout(0.5)(x)
    x = tf.keras.layers.Conv2D(128, (3, 3), padding='same', strides=1)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), padding='same', strides=4)(x)
    x = tf.keras.layers.Flatten()(x)
    x = ActivityRegularization(0.1, 0.2)(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
    model = tf.keras.Model(inputs=input_img, outputs=outputs)
    return model
conv_model = convolutional_model((256, 256, 3))
conv_model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.SGD(lr=1),
                   metrics=['accuracy'])
conv_model.summary()
conv_model.fit(train_dataset,epochs=100, validation_data=test_dataset)
Epoch 1/100
2021-12-23 15:06:22.165763: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2021-12-23 15:06:22.172255: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
125/125 [==============================] - ETA: 0s - loss: 804.6805 - accuracy: 0.4860
2021-12-23 15:06:50.936870: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
125/125 [==============================] - 35s 275ms/step - loss: 804.6805 - accuracy: 0.4860 - val_loss: 0.7197 - val_accuracy: 0.4980
Epoch 2/100
125/125 [==============================] - 34s 270ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980
Epoch 3/100
125/125 [==============================] - 34s 276ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980

As you have a constant loss and accuracy, it is highly likely that your network does not learn anything (since you have two classes, it always predicts one of them).
The activation function, loss function and number of neurons in the last layer are correct.
The problem is not related to the way you load the images, but to the learning rate, which is 1. At such a high learning rate, it is impossible for the network to learn anything.
You should start with a much smaller learning rate, for example 0.0001 or 0.00001, and then try to debug the data-loading process if you still see poor performance.
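For example, a minimal sketch of the corrected compile call for the model above (note that newer Keras versions use learning_rate in place of the deprecated lr argument):
conv_model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.SGD(learning_rate=1e-4),  # instead of lr=1
                   metrics=['accuracy'])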

I am quite certain that it has something to do with how you load the data, and more specifically the x, y = image.next() part. If you are able to split the data from ./util/untitled folder into separate folders containing training and validation data respectively, you could use the same kind of pipeline as in the examples section of the TensorFlow documentation:
train_datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    rotation_range=0,
    width_shift_range=0.0,
    height_shift_range=0.0,
    brightness_range=None,
    shear_range=0.0,
    zoom_range=0.0,
    channel_shift_range=0.0,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=1./255)

test_datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    rotation_range=0,
    width_shift_range=0.0,
    height_shift_range=0.0,
    brightness_range=None,
    shear_range=0.0,
    zoom_range=0.0,
    channel_shift_range=0.0,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(256, 256),
    batch_size=4)

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(256, 256),
    batch_size=4)

model.fit(
    train_generator,
    epochs=100,
    validation_data=validation_generator)

Related

Training CNN Model and accuracy stays at 1

I've been trying to train this CNN model. It's from a TensorFlow tutorial, and I just changed the dataset (I used the Fruits 360 dataset) without altering the core of the code. When it finishes training, the accuracy stays constant at 0.8565, and when I try to test some images it is almost always wrong.
What am I doing wrong?
Code output after executing
Here's the code I used
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
import tarfile
from tensorflow import keras
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.models import Sequential
import pathlib
dataset_url = "https://file.io/z5JM3sYAWXv4"
data_dir = tf.keras.utils.get_file(origin=dataset_url,
                                   fname='tomatos',
                                   untar=True,
                                   extract=True)
data_dir = pathlib.Path(data_dir)
print(data_dir)
file_count = sum(len(files) for _, _, files in os.walk(r'tomatos'))
print(file_count)
batch_size = 32
img_height = 180
img_width = 180
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
class_names = train_ds.class_names
print(class_names)
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
num_classes = len(class_names)
model = Sequential([
    layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()
epochs = 2
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal",
                          input_shape=(img_height,
                                       img_width,
                                       3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ]
)
model = Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()
epochs = 4
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
sunflower_url = "https://puffycarrot.com/wp-content/uploads/2017/04/Green-tomatoes.jpg"
sunflower_path = tf.keras.utils.get_file('tomato2', origin=sunflower_url)
img = tf.keras.utils.load_img(
    sunflower_path, target_size=(img_height, img_width)
)
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)
@Yaman Tarawneh, I tried replicating your above-mentioned code in Google Colab (using TF 2.8) and in PyCharm (using TF 2.7) and did not find the error.
Please check the output below from PyCharm; I got the same output in Google Colab:
Total params: 3,988,898
Trainable params: 3,988,898
Non-trainable params: 0
_________________________________________________________________
Epoch 1/4
78/78 [==============================] - 8s 41ms/step - loss: 0.0309 - accuracy: 0.9835 - val_loss: 5.6374e-07 - val_accuracy: 1.0000
Epoch 2/4
78/78 [==============================] - 2s 25ms/step - loss: 5.7533e-07 - accuracy: 1.0000 - val_loss: 2.7360e-07 - val_accuracy: 1.0000
Epoch 3/4
78/78 [==============================] - 2s 25ms/step - loss: 3.0400e-07 - accuracy: 1.0000 - val_loss: 1.3978e-07 - val_accuracy: 1.0000
Epoch 4/4
78/78 [==============================] - 2s 25ms/step - loss: 1.7403e-07 - accuracy: 1.0000 - val_loss: 7.2102e-08 - val_accuracy: 1.0000
This image most likely belongs to Tomato not Ripened with a 100.00 percent confidence.
For further analysis, if the issue still persists, please let us know which Python and TensorFlow versions you are using.

LSTM: loss value is not changing

I am working on predicting stock trend (up, or down).
Below is how I am handling my pre-processing.
index_ = len(df.columns) - 1
x = df.iloc[:, :index_]
x = x[['Relative_Volume', 'CurrentPrice', 'MarketCap']]
x = x.values.astype(float)
# x = x.reshape(len(x), 1, x.shape[1]).astype(float)
x = x.reshape(*x.shape, 1)
y = df.iloc[:, index_:].values.astype(float)
# x.shape = (44930, 3, 1)
# y.shape = (44930, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=98)
Then I am building my BILSTM model:
def build_nn():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, return_sequences=True, input_shape=(x_train.shape[0], 1), name="one")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(128, return_sequences=True, name="two")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(64, return_sequences=False, name="three")))
    model.add(Dropout(0.20))
    model.add(Dense(1, activation='sigmoid'))
    # opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
    opt = SGD(lr=0.01)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model

filepath = "bilstmv1.h5"
chkp = ModelCheckpoint(monitor='val_accuracy', mode='auto', filepath=filepath, verbose=1, save_best_only=True)
model = build_nn()
# model.summary()
model.fit(x_train, y_train,
          epochs=3,
          batch_size=256,
          validation_split=0.1, callbacks=[chkp])
model.summary()
Below is the training output showing the loss values:
Epoch 1/3
127/127 [==============================] - 27s 130ms/step - loss: 0.6829 - accuracy: 0.5845 - val_loss: 0.6797 - val_accuracy: 0.5803
Epoch 00001: val_accuracy improved from -inf to 0.58025, saving model to bilstmv1.h5
Epoch 2/3
127/127 [==============================] - 14s 112ms/step - loss: 0.6788 - accuracy: 0.5851 - val_loss: 0.6798 - val_accuracy: 0.5803
Epoch 00002: val_accuracy did not improve from 0.58025
Epoch 3/3
127/127 [==============================] - 14s 112ms/step - loss: 0.6800 - accuracy: 0.5822 - val_loss: 0.6798 - val_accuracy: 0.5803
Epoch 00003: val_accuracy did not improve from 0.58025
I have tried changing the optimizer, the loss function, and other modifications. As you can expect, all the predictions are the same, since the loss value is not changing.
You have an issue with your input shape in your first LSTM layer. Keras takes (None, your_shape) as its input shape, since the number of inputs to the model can vary. You can have 1 input, 2 inputs, or infinitely many. The only way to represent a dynamic size is to use None as the first dimension. The quickest way to do this is to change the input shape to (None, *input_shape), since the * will expand your input shape.
Your build function will then become:
def build_nn():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, return_sequences=True, input_shape=(None, *x_train.shape), name="one")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(128, return_sequences=True, name="two")))
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(64, return_sequences=False, name="three")))
    model.add(Dropout(0.20))
    model.add(Dense(1, activation='sigmoid'))
    # opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
    opt = SGD(lr=0.01)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
Though I still advise having a look at your optimizer, as that might affect your results. You can also use -1 as an input dimension, which means auto-fill, but you can only use it once.
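As a sketch of what looking at the optimizer could mean here, one option is Adam with a small learning rate; this is a suggestion under my own assumptions, not a confirmed fix (learning_rate replaces the deprecated lr argument in newer Keras versions):
from tensorflow.keras.optimizers import Adam

opt = Adam(learning_rate=1e-4)  # smaller, adaptive steps instead of SGD(lr=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])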

Tensorflow 2 Metrics produce wrong results with 2 GPUs

I took this piece of code from the TensorFlow documentation about distributed training with a custom loop https://www.tensorflow.org/tutorials/distribute/custom_training, and I just adapted it to work with tf.keras.metrics.AUC and ran it with 2 GPUs (2 NVIDIA V100s from a DGX machine).
# Import TensorFlow
import tensorflow as tf
# Helper libraries
import numpy as np
print(tf.__version__)
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Adding a dimension to the array -> new shape == (28, 28, 1)
# We are doing this because the first layer in our model is a convolutional
# layer and it requires a 4D input (batch_size, height, width, channels).
# batch_size dimension will be added later on.
train_images = train_images[..., None]
test_images = test_images[..., None]
# One hot
train_labels = tf.keras.utils.to_categorical(train_labels, 10)
test_labels = tf.keras.utils.to_categorical(test_labels, 10)
# Getting the images in [0, 1] range.
train_images = train_images / np.float32(255)
test_images = test_images / np.float32(255)
# If the list of devices is not specified in the
# `tf.distribute.MirroredStrategy` constructor, it will be auto-detected.
GPUS = [0, 1]
devices = ["/gpu:" + str(gpu_id) for gpu_id in GPUS]
strategy = tf.distribute.MirroredStrategy(devices=devices)
print ('Number of devices: {}'.format(strategy.num_replicas_in_sync))
BUFFER_SIZE = len(train_images)
BATCH_SIZE_PER_REPLICA = 64
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
EPOCHS = 10
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(BUFFER_SIZE).batch(GLOBAL_BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(GLOBAL_BATCH_SIZE)
train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)
test_dist_dataset = strategy.experimental_distribute_dataset(test_dataset)
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    return model
with strategy.scope():
    # Set reduction to `none` so we can do the reduction afterwards and divide by
    # global batch size.
    loss_object = tf.keras.losses.CategoricalCrossentropy(
        from_logits=True,
        reduction=tf.keras.losses.Reduction.NONE)

    def compute_loss(labels, predictions):
        per_example_loss = loss_object(labels, predictions)
        return tf.nn.compute_average_loss(per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
with strategy.scope():
    test_loss = tf.keras.metrics.Mean(name='test_loss')
    train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')
    test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')
    train_auc = tf.keras.metrics.AUC(name='train_auc')
    test_auc = tf.keras.metrics.AUC(name='test_auc')
# model, optimizer, and checkpoint must be created under `strategy.scope`.
with strategy.scope():
    model = create_model()
    optimizer = tf.keras.optimizers.Adam()

def train_step(inputs):
    images, labels = inputs
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = compute_loss(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_accuracy(labels, predictions)
    train_auc(labels, predictions)
    return loss

def test_step(inputs):
    images, labels = inputs
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss.update_state(t_loss)
    test_accuracy(labels, predictions)
    test_auc(labels, predictions)
# `run` replicates the provided computation and runs it
# with the distributed input.
@tf.function
def distributed_train_step(dataset_inputs):
    per_replica_losses = strategy.run(train_step, args=(dataset_inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses,
                           axis=None)

@tf.function
def distributed_test_step(dataset_inputs):
    return strategy.run(test_step, args=(dataset_inputs,))
for epoch in range(EPOCHS):
    # TRAIN LOOP
    total_loss = 0.0
    num_batches = 0
    for x in train_dist_dataset:
        total_loss += distributed_train_step(x)
        num_batches += 1
    train_loss = total_loss / num_batches

    # TEST LOOP
    for x in test_dist_dataset:
        distributed_test_step(x)

    template = ("Epoch {}, Loss: {}, Accuracy: {}, AUC: {},"
                "Test Loss: {}, Test Accuracy: {}, Test AUC: {}")
    print(template.format(epoch + 1,
                          train_loss, train_accuracy.result() * 100, train_auc.result() * 100,
                          test_loss.result(), test_accuracy.result() * 100, test_auc.result() * 100))

    test_loss.reset_states()
    train_accuracy.reset_states()
    test_accuracy.reset_states()
    train_auc.reset_states()
    test_auc.reset_states()
The problem is that the AUC evaluation is definitely wrong, because it exceeds its range (it should be 0-100 here, since the code multiplies by 100), and I get these results from a single run of the above code:
Epoch 1, Loss: 1.8061423301696777, Accuracy: 66.00833892822266, AUC: 321.8688659667969,Test Loss: 1.742477536201477, Test Accuracy: 72.0999984741211, Test AUC: 331.33709716796875
Epoch 2, Loss: 1.7129968404769897, Accuracy: 74.9816665649414, AUC: 337.37017822265625,Test Loss: 1.7084736824035645, Test Accuracy: 75.52999877929688, Test AUC: 337.1878967285156
Epoch 3, Loss: 1.643971562385559, Accuracy: 81.83333587646484, AUC: 355.96209716796875,Test Loss: 1.6072628498077393, Test Accuracy: 85.3499984741211, Test AUC: 370.603759765625
Epoch 4, Loss: 1.5887378454208374, Accuracy: 87.27833557128906, AUC: 373.6204528808594,Test Loss: 1.5906082391738892, Test Accuracy: 87.13999938964844, Test AUC: 371.9998474121094
Epoch 5, Loss: 1.581775426864624, Accuracy: 88.0, AUC: 373.9468994140625,Test Loss: 1.5964380502700806, Test Accuracy: 86.68000030517578, Test AUC: 371.0227355957031
Epoch 6, Loss: 1.5764907598495483, Accuracy: 88.49166870117188, AUC: 375.2404479980469,Test Loss: 1.5832056999206543, Test Accuracy: 87.94000244140625, Test AUC: 373.41998291015625
Epoch 7, Loss: 1.5698528289794922, Accuracy: 89.19166564941406, AUC: 376.473876953125,Test Loss: 1.5770654678344727, Test Accuracy: 88.58000183105469, Test AUC: 375.5516662597656
Epoch 8, Loss: 1.564456820487976, Accuracy: 89.71833801269531, AUC: 377.8564758300781,Test Loss: 1.5792100429534912, Test Accuracy: 88.27000427246094, Test AUC: 373.1791687011719
Epoch 9, Loss: 1.5612279176712036, Accuracy: 90.02000427246094, AUC: 377.9949645996094,Test Loss: 1.5729509592056274, Test Accuracy: 88.9800033569336, Test AUC: 375.5257263183594
Epoch 10, Loss: 1.5562015771865845, Accuracy: 90.54000091552734, AUC: 378.9789123535156,Test Loss: 1.56815767288208, Test Accuracy: 89.3499984741211, Test AUC: 375.8636474609375
Accuracy is OK, but it seems to be the only metric that behaves nicely. I tried other metrics too, but they are not evaluated correctly either. The problems seem to appear when using more than one GPU: when I run this code with a single GPU it produces the right results.
When you use a distribution strategy, metrics must be constructed and used inside the strategy.scope() block. So when you want to call the metric.result() method, remember to put it inside a with strategy.scope() block.
with strategy.scope():
    print(metric.result())
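Applied to the training loop above, a minimal sketch of the fix, reusing the template string and the metric objects already created under the scope:
with strategy.scope():
    print(template.format(epoch + 1,
                          train_loss, train_accuracy.result() * 100, train_auc.result() * 100,
                          test_loss.result(), test_accuracy.result() * 100, test_auc.result() * 100))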

RNN model not learning anything

I am practicing with RNNs. I randomly create 5 integers. If the first integer is odd, the y value is 1; otherwise y is 0 (so only the first x counts). The problem is, when I run this model, it does not 'learn': val_loss and val_accuracy do not change over epochs. What could be the cause?
from keras.layers import SimpleRNN, LSTM, GRU, Dropout, Dense
from keras.models import Sequential
import numpy as np
data_len = 300
x = []
y = []
for i in range(data_len):
    a = np.random.randint(1, 10, 5)
    if a[0] % 2 == 0:
        y.append('0')
    else:
        y.append('1')
    a = a.reshape(5, 1)
    x.append(a)
print(x)
X = np.array(x)
Y = np.array(y)
model = Sequential()
model.add(GRU(units=24, activation='relu', return_sequences=True, input_shape=[5,1]))
model.add(Dropout(rate=0.5))
model.add(GRU(units=12, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(units=1, activation='softmax'))
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
model.summary()
history = model.fit(X[:210], Y[:210], epochs=20, validation_split=0.2)
Epoch 1/20
168/168 [==============================] - 1s 6ms/step - loss: 0.4345 - accuracy: 0.5655 - val_loss: 0.5000 - val_accuracy: 0.5000
...
Epoch 20/20
168/168 [==============================] - 0s 315us/step - loss: 0.4345 - accuracy: 0.5655 - val_loss: 0.5000 - val_accuracy: 0.5000
You're using a softmax activation with 1 neuron, which always returns [1]. Use a sigmoid activation with 1 neuron for binary classification, and softmax with multiple neurons for multiclass classification.
Change data_len to a higher number like 30000 and it will be able to learn; right now the amount of data is very small. And of course, you'll need to change the activation to sigmoid, as suggested by Yoskutik.
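Putting both suggestions together, a minimal sketch of the corrected output layer and compile call; swapping the loss from mse to binary_crossentropy is my own assumption about the intended fix, since it is the natural pairing with a single sigmoid unit:
model.add(Dense(units=1, activation='sigmoid'))  # sigmoid, not softmax, for a single-unit binary output
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])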

Train accuracy improving but validation remains unchanged?

I am using TF 2.0. I was trying to train a network on my own data, and it was not going well: the validation accuracy was close to 0 and stagnant, and I tried many regularizations to no effect. Then I tried training a network on 3 classes of data where all images in each class are identical, so as to eliminate the possibility of variability. But this is not working either. Since all in-class images are identical, I would expect the validation accuracy to perfectly match the training accuracy, because there is no new data. Why is that not the case? Here is my code:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.applications.mobilenet import preprocess_input
import matplotlib.pyplot as plt
base_model = tf.keras.applications.MobileNet(weights='imagenet', include_top=False)
def turn_off(n):
    for layer in model.layers[:n]:
        layer.trainable = False
    for layer in model.layers[n:]:
        layer.trainable = True

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)  # we add dense layers so that the model can learn more complex functions and classify for better results
x = Dense(1024, activation='relu')(x)  # dense layer 2
x = Dense(512, activation='relu')(x)  # dense layer 3
preds = Dense(3, activation='softmax')(x)  # final layer with softmax activation

model = Model(inputs=base_model.input, outputs=preds)
turn_off(87)
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                   rescale=1. / 255,
                                   validation_split=0.2)  # set validation split

train_generator = train_datagen.flow_from_directory(
    '/users/josh.flori/desktop/colors/',
    target_size=(224, 224),
    batch_size=32,
    color_mode='rgb',
    class_mode='categorical',
    subset='training',
    shuffle=True)  # set as training data

validation_generator = train_datagen.flow_from_directory(
    '/users/josh.flori/desktop/colors/',
    target_size=(224, 224),
    batch_size=32,
    color_mode='rgb',
    class_mode='categorical',
    subset='validation',
    shuffle=True)  # set as validation data
# Adam optimizer
# loss function will be categorical cross entropy
# evaluation metric will be accuracy
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // train_generator.batch_size,
    epochs=6)
Here is the training output:
Epoch 1/6
9/9 [==============================] - 19s 2s/step - loss: 0.2645 - accuracy: 0.9134 - val_loss: 1.6668 - val_accuracy: 0.3438
Epoch 2/6
9/9 [==============================] - 20s 2s/step - loss: 0.0417 - accuracy: 0.9567 - val_loss: 2.6176 - val_accuracy: 0.3438
Epoch 3/6
9/9 [==============================] - 17s 2s/step - loss: 0.4771 - accuracy: 0.9422 - val_loss: 4.0694 - val_accuracy: 0.3438
Epoch 4/6
9/9 [==============================] - 18s 2s/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 2.1304 - val_accuracy: 0.3125
Epoch 5/6
9/9 [==============================] - 18s 2s/step - loss: 9.7658e-07 - accuracy: 1.0000 - val_loss: 3.1633 - val_accuracy: 0.3125
Epoch 6/6
9/9 [==============================] - 18s 2s/step - loss: 2.2571e-05 - accuracy: 1.0000 - val_loss: 3.4949 - val_accuracy: 0.3125
My image folders contain exactly 128 identical images per folder.
I've been reading all day and trying different images, but I can't seem to get anywhere. What is causing this particular behavior? It must be something obvious, but I'm not sure what.