Great validation accuracy but terrible prediction - tensorflow

I have got the following CNN:
import os
import numpy as np
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Sequential
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from tqdm import tqdm
# Load the data
data_dir = PATH_DIR
x_train = []
y_train = []
total_files = 0
for subdir in os.listdir(data_dir):
subdir_path = os.path.join(data_dir, subdir)
if os.path.isdir(subdir_path):
total_files += len([f for f in os.listdir(subdir_path) if f.endswith('.npy')])
with tqdm(total=total_files, unit='file') as pbar:
for subdir in os.listdir(data_dir):
subdir_path = os.path.join(data_dir, subdir)
if os.path.isdir(subdir_path):
for image_file in os.listdir(subdir_path):
if image_file.endswith('.npy'):
image_path = os.path.join(subdir_path, image_file)
image = np.load(image_path)
x_train.append(image)
y_train.append(subdir)
pbar.update()
x_train = np.array(x_train)
y_train = np.array(y_train)
# Preprocess the labels
label_encoder = LabelEncoder()
y_train = label_encoder.fit_transform(y_train)
y_train = to_categorical(y_train)
# Create the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(57, 57, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(8, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
model.save('GeneratedModels/units_model_np.h5')
And then the following function that is called within a loop about 15 times a second. Where image is a numpy array.
def guess_unit(image, classList):
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
model = tf.keras.models.load_model(MODEL_PATH)
image = np.expand_dims(image, axis=0)
prediction = model.predict(image, verbose=0)
index = np.argmax(prediction)
# Return the predicted unit
return classList[index]
The problem is that when i train the model the accuracy is very high (99,99976%) but when I am using the predict the output is terribily wrong, to the point it does not make any sense. Sometimes the image received will be the same but the predict will return 2 different things.
I have no idea what am I doing wrong. It's the first time I am tinkering with Neural Networks.
I have tried to use the model.predict with the images that it was trained on and it's always getting them right. Is just when it receives dynamic images that it's terribly wrong.
NOTE: I have 8 classes and it was trained using about 13000 images.

Generally to get performance on your training data you have to split your data into training, testing and validation (which I see you haven't done). This can be done manually or done via adding validation_split into your fit function.
Without seeing any curves on how your loss and accuracy it's behaving it's difficult to make any suggestions. However it might be the case that your are underfitting or overfitting to your data (I would assume that your facing overfitting in your case). In case you are overfitting to your data, I would suggest you to add some regularization or change your model architecture as the one used might not be appropriate. Options that one could think of would be to add regularization via Dropout or adding regularization to your weights.

Related

Issue with EarlyStopping Keras when training two similar models

With the aim of training a model with a known performance, I run the same model twice. The first is following a 90/10 split, where I can measure the performance of the model with the test set. The second one uses the same parameters as the former, but now on the entire dataset for deployment, which I call "full model" (a common approach using shallow ML algorithms).
I'm using a MLP from the Keras/TensorFlow package running on GPU. I also decided to apply a callback function, EarlyStopping, to stop after the result at the validation dataset (10% of the training set) does not improve after 50 iterations, and to get the best configuration once the fit is complete.
What has been weird to see is that the training of the first model usually goes until the end of the epochs (around 300 depending on the run due to the GPU random seeds), but the second model, the "full model" takes between 40-60 epochs and gives back a very poor performance.
My doubt is if this is due to the callback function being shared by the two models. Is it possible that the +-50 trials of patience of the "full model" end up being compared to the best case of the first model, and therefore ending the testing?
Code below:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from keras.layers import Activation
from keras.layers import BatchNormalization
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# from tensorflow.keras.layers import Dropout
from scikeras.wrappers import KerasRegressor
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
def create_model():
model = Sequential()
model.add(Dense(400, input_dim=len(X_train.columns)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(400))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(400))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(400))
model.add(BatchNormalization())
model.add(Activation('relu'))
# model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
# compile the keras model
model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(0.01), metrics=['mean_squared_error','mean_absolute_error'])
return model
callback_model = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)
callback_fullmodel = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)
model_rf = Pipeline([
('scaler', StandardScaler()),
('estimator', KerasRegressor(model=create_model, epochs=300, batch_size=1024, verbose=1,validation_split=0.1, callbacks=callback_model)) #
])
full_model_rf = Pipeline([
('scaler', StandardScaler()),
('estimator', KerasRegressor(model=create_model, epochs=3000, batch_size=1024, verbose=1, validation_split=0.1, callbacks=callback_fullmodel)) #
])
model = model_rf.fit(X_train, y_train)
full_model = full_model_rf.fit(X, y)

Methods to increase the Accuracy of the AI base Heart Sound's Classification

We have one GitHub Project for classification of heart sounds (link), with below README content:
Technology can play a role in addressing the above problem. The
Phonocardiogram (PCG) is the method of retrieving the sound of the
heart. This sound can capture through simple stethoscope. In this
work, we are proposing an artificial intelligence model which have the
potential to detect the heart abnormality from the heart sounds.
The dataset can be downloaded from https://physionet.org. This data is
also available in the link below
https://drive.google.com/open?id=13ehWqXt8YDrmmjQc7XAUqcCk6Dwb69hy The
data was gathered from two sources: (A) from the public via the
iStethoscope Pro iPhone app, and (B) from a clinic trial in hospitals
using the digital stethoscope DigiScope. There were two tasks
associated with this data:
Heart Sound Feature Extraction The first task is to extract the features from the heart sounds within audio data.
Heart Sound Classification The task is to produce a method that can classify real heart sound into one of four categories (Normal, Murmur,
Extra-Heart Sound and Artifact).
So if possible, i asked here to find out some idea to improve the validation accuracy in
the deep learning algorithm for classification of heart sounds which the codes and blocks could be seen below (link):
import keras
from keras.models import Sequential
from keras.layers import Conv1D, MaxPool1D, GlobalAvgPool1D, Dropout, BatchNormalization, Dense
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, EarlyStopping
from keras.utils import np_utils
from keras.regularizers import l2
from scipy.signal import decimate
#new_labels = np.array(new_labels, dtype='int')
Y_train = np_utils.to_categorical(y_train)
Y_test=np_utils.to_categorical(y_test)
model = Sequential()
model.add(Conv1D(filters=4, kernel_size=9, activation='relu', input_shape = x_train.shape[1:],kernel_regularizer = l2(0.025)))
model.add(MaxPool1D(strides=4))
model.add(BatchNormalization())
model.add(Conv1D(filters=4, kernel_size=(9), activation='relu',
kernel_regularizer = l2(0.05)))
model.add(MaxPool1D(strides=4))
model.add(BatchNormalization())
model.add(Conv1D(filters=8, kernel_size=(9), activation='relu',
kernel_regularizer = l2(0.1)))
model.add(MaxPool1D(strides=4))
model.add(BatchNormalization())
model.add(Conv1D(filters=16, kernel_size=(9), activation='relu'))
model.add(MaxPool1D(strides=4))
model.add(BatchNormalization())
model.add(Dropout(0.25))
model.add(Conv1D(filters=64, kernel_size=(4), activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Conv1D(filters=32, kernel_size=(1), activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.75))
model.add(GlobalAvgPool1D())
model.add(Dense(3, activation='softmax'))
def batch_generator(x_train, y_train, batch_size):
"""
Rotates the time series randomly in time
"""
x_batch = np.empty((batch_size, x_train.shape[1], x_train.shape[2]), dtype='float32')
y_batch = np.empty((batch_size, y_train.shape[1]), dtype='float32')
full_idx = range(x_train.shape[0])
while True:
batch_idx = np.random.choice(full_idx, batch_size)
x_batch = x_train[batch_idx]
y_batch = y_train[batch_idx]
for i in range(batch_size):
sz = np.random.randint(x_batch.shape[1])
x_batch[i] = np.roll(x_batch[i], sz, axis = 0)
yield x_batch, y_batch
weight_saver = ModelCheckpoint('set_a_weights.h5', monitor='val_loss',
save_best_only=True, save_weights_only=True)
model.compile(optimizer=Adam(1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
annealer = LearningRateScheduler(lambda x: 1e-3 * 0.8**x)
x_train.shape
hist = model.fit_generator(batch_generator(x_train, Y_train, 8),
epochs=10, steps_per_epoch=1000,
validation_data=(x_test, Y_test),
callbacks=[weight_saver, annealer],
verbose=2)
model.load_weights('set_a_weights.h5')
import matplotlib.pyplot as plt
Thanks.
Try having a look to related publications. For example, work based on
The Heat Sounds Shenzhen Corpus may be of use to you?
Otherwise, I agree with the above that this may not be a suitable question for stack overflow.

Selecting Metrics in Keras CNN

I am trying to use CNN for trying to classify cats/dogs and noticed something strange.
When i define the model compile statement as below -
cat_dog_model.compile(optimizer =optimizers.Adam(),
metrics= [metrics.Accuracy()], loss=losses.binary_crossentropy)
my accuracy is very bad - something like 0.15% after 25 epochs.
When i define the same as
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
my accuracy shoots upto 55% in the first epoch and almost 80% by epoch 25.
When I read the Keras doc - https://keras.io/api/optimizers/ they mention explicitly that
You can either instantiate an optimizer before passing it to model.compile(), as in the above example, or you can pass it by its string identifier. In the latter case, the default parameters for the optimizer will be used.
Also the metrics parameter are also as per the API - Keras Metrics API
So as per my understanding i am using default parameters on both. Also when i change the metrics parameter to hardcode I get the same accuracy. So somehow the accuracy metrics is causing this issue. But I cant figure out why - Any help is appreciated.
My qn is why is hard coding metrics better than defining it as parameter?
Some more details : I am trying to use 8k images for training and about 2k images for validation.
sample code (you can change the line number 32 to get different results) :
from keras import models, layers, losses, metrics, optimizers
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator, load_img,img_to_array
train_datagen = ImageDataGenerator(rescale = 1./255,shear_range = 0.2,zoom_range = 0.2,horizontal_flip = True)
train_set = train_datagen.flow_from_directory('/content/drive/MyDrive/....../training_set/',
target_size = (64, 64),batch_size = 32,class_mode = 'binary')
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory(
'/content/drive/MyDrive/........./test_set/',
target_size = (64, 64),batch_size = 32,class_mode = 'binary')
cat_dog_model = models.Sequential()
cat_dog_model.add(layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))
cat_dog_model.add(layers.MaxPool2D(pool_size=2, strides=2))
cat_dog_model.add(layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cat_dog_model.add(layers.MaxPool2D(pool_size=2, strides=2) )
cat_dog_model.add(layers.Flatten())
cat_dog_model.add(layers.Dense(units=128, activation='relu'))
cat_dog_model.add(layers.Dense(units=1, activation='sigmoid'))
cat_dog_model.compile(optimizer =optimizers.Adam(), metrics= [metrics.Accuracy()], loss=losses.binary_crossentropy)
cat_dog_model.summary()
cat_dog_model.fit(x=train_set,validation_data=test_set, epochs=25)

ValueError: Error when checking target: expected activation_6 to have shape(None,2) but got array with shape (5760,1)

I am trying to adapt Python code for a Convolutional Neural Network (in Keras) with 8 classes to work on 2 classes. My problem is that I get the following error message:
ValueError: Error when checking target: expected activation_6 to have
shape(None,2) but got array with shape (5760,1).
My Model is as follows (without the indentation issues):
class MiniVGGNet:
#staticmethod
def build(width, height, depth, classes):
# initialize the model along with the input shape to be
# "channels last" and the channels dimension itself
model = Sequential()
inputShape = (height, width, depth)
chanDim = -1
# if we are using "channels first", update the input shape
# and channels dimension
if K.image_data_format() == "channels_first":
inputShape = (depth, height, width)
chanDim = 1
# first CONV => RELU => CONV => RELU => POOL layer set
model.add(Conv2D(32, (3, 3), padding="same",
input_shape=inputShape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# second CONV => RELU => CONV => RELU => POOL layer set
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# first (and only) set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax"))
# return the constructed network architecture
return model
Where classes = 2, and inputShape=(32,32,3).
I know that my error has something to do with my classes/use of binary_crossentropy and occurs in the model.fit line below, but haven't been able to figure out why it is problematic, or how to fix it.
By changing model.add(Dense(classes)) above to model.add(Dense(classes-1)) I can get the model to train, but then my labels size and target_names are mismatched, and I have only one category which everything is categorized as.
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from pyimagesearch.nn.conv import MiniVGGNet
from pyimagesearch.preprocessing import ImageToArrayPreprocessor
from pyimagesearch.preprocessing import SimplePreprocessor
from pyimagesearch.datasets import SimpleDatasetLoader
from keras.optimizers import SGD
#from keras.datasets import cifar10
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-o", "--output", required=True,
help="path to the output loss/accuracy plot")
args = vars(ap.parse_args())
# grab the list of images that we'll be describing
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
# initialize the image preprocessors
sp = SimplePreprocessor(32, 32)
iap = ImageToArrayPreprocessor()
# load the dataset from disk then scale the raw pixel intensities
# to the range [0, 1]
sdl = SimpleDatasetLoader(preprocessors=[sp, iap])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.astype("float") / 255.0
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
test_size=0.25, random_state=42)
# convert the labels from integers to vectors
trainY = LabelBinarizer().fit_transform(trainY)
testY = LabelBinarizer().fit_transform(testY)
# initialize the label names for the items dataset
labelNames = ["mint", "used"]
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=0.01, decay=0.01 / 10, momentum=0.9, nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=2)
model.compile(loss="binary_crossentropy", optimizer=opt,
metrics=["accuracy"])
# train the network
print("[INFO] training network...")
H = model.fit(trainX, trainY, validation_data=(testX, testY),
batch_size=64, epochs=10, verbose=1)
print ("Made it past training")
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=64)
print(classification_report(testY.argmax(axis=1),
predictions.argmax(axis=1), target_names=labelNames))
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 10), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 10), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 10), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, 10), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on items dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["output"])
I have looked at these questions already, but cannot workout how to get around this problem based on the responses.
Stackoverflow Question 1
Stackoverflow Question 2
Stackoverflow Question 3
Any advice or help would be much appreciated, as I've spent the last couple of days on this.
Matt's comment was absolutely correct in that the problem lay with using LabelBinarizer and this hint led me to a solution that did not require me to give up using softmax, or change the last layer to have classes = 1. For posterity and for others, here's the section of code that I altered and how I was able to avoid LabelBinarizer:
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder
# load the dataset from disk then scale the raw pixel intensities
# to the range [0,1]
sp = SimplePreprocessor (32, 32)
iap = ImageToArrayPreprocessor()
# encode the labels, converting them from strings to integers
le=LabelEncoder()
labels = le.fit_transform(labels)
data = data.astype("float") / 255.0
labels = np_utils.to_categorical(labels,2)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
....
I believe the problem lies in the use of LabelBinarizer.
From this example:
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
[0],
[0],
[1]])
I gather that the output of your transformation has the same format, i. e. a single 1 or 0 encoding "is new" or "is used".
If your problem only calls for classification among these two classes, that format is preferable because it contains all the information and uses less space than the alternative, i. e. [1,0], [0,1], [0,1], [1,0].
Therefore, using classes = 1 would be correct, and the output should be a float indicating the network's confidence in a sample being in the first class. Since these values have to sum to one, the probability of it being in the second class could easily be inferred by subtracting from 1.
You would need to replace softmax with any other activation, because softmax on a single value always returns 1. I'm not completely sure about the behaviour of binary_crossentropy with a single-valued result, and you may want to try mean_squared_error as the loss.
If you are looking to expand your model to cover more than two classes, you would want to convert your target vector to a One-hot encoding. I believe inverse_transform from LabelBinarizer would do this, although that would seem to be quite a roundabout way to get there. I see that sklearn also has OneHotEncoder which may the more appropriate replacement.
NB: You can specify the activation function for any layer more easily with, for example:
Dense(36, activation='relu')
This may be helpful in keeping your code to a manageable size.

Tensorboard event file size is growing after consecutive model training

I'm training 8 models in a for loop and saving each tensorboard log file into a seperate directory. Folder structure is like Graph is my main directory for graphs and directories under Graph such as net01 net02... net08 are the ones I'm outputting my event files. By doing this I can visualize training logs in Tensorboard in that fancy fashion with every single training process gets its own colour.
My problem is the growing sizes of eventfiles. The first event file is apporoximately 300KB's, but the second event file have a size of 600KB's, third is 900 KB and so on. They each reside in their own seperate directory and each of them are different training sessions from each other but somehow tensorboard appends the earlier sessions into last one. In the end I should've a total size of 12*300Kb= 3600 KB of session files, but I endup with something like 10800KB of session files. As the nets are getting deeper I endup with session file sizes of like 600 MB. So clearly I'm missing something out.
I tried to visualize last file with the biggest size to check whether it includes all the previous training sessions and can draw like 8 nets but it failed. SO a big bunch of irrelevant information is stored in this session file.
I'm using Anaconda3-Spyder on Win7-64. Database is divided into 8 and for each run I'm leaving one out for validation and using the rest as training. Here is a simplified version of my code:
from keras.models import Model
from keras.layers import Dense, Flatten, Input, Conv2D, MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import TensorBoard, ModelCheckpoint, CSVLogger
import os.path
import shutil
import numpy
# ------------------------------------------------------------------
img_width, img_height = 48, 48
num_folds=8
folds_path= "8fold_folds"
nets_path = "8fold_nets_simplenet"
csv_logpath = 'simplenet_log.csv'
nets_string = "simplenet_nets0"
nb_epoch = 50
batch_size = 512
cvscores = []
#%%
def foldpath(foldnumber):
pathbase= os.path.join(folds_path,'F')
train_data_dir = os.path.join(pathbase+str(foldnumber),"train")
valid_data_dir = os.path.join(pathbase+str(foldnumber),"test")
return train_data_dir,valid_data_dir
#%%
for i in range(1, num_folds+1):
modelpath= os.path.join(nets_path,nets_string+str(i))
if os.path.exists(modelpath):
shutil.rmtree(modelpath)
os.makedirs(modelpath)
[train_data_dir, valid_data_dir]=foldpath(i)
img_input = Input(shape=(img_width,img_height,1),name='input')
x = Conv2D(32, (3,3), activation='relu', padding='same', name='conv1-'+str(i))(img_input)
x = MaxPooling2D((2, 2), strides=(2, 2), name='pool1-'+str(i))(x)
x = Conv2D(64, (3,3), activation='relu', padding='same', name='conv2-'+str(i))(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='pool2-'+str(i))(x)
x = Conv2D(128, (3,3), activation='relu', padding='same', name='conv3-'+str(i))(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='pool3-'+str(i))(x)
x = Flatten()(x)
x = Dense(512, name='dense1-'+str(i))(x)
#x = Dropout(0.5)(x)
x = Dense(512, name='dense2-'+str(i))(x)
#x = Dropout(0.5)(x)
predictions = Dense(6, activation='softmax', name='predictions-'+str(i))(x)
model = Model(inputs=img_input, outputs=predictions)
# compile model-----------------------------------------------------------
model.compile(optimizer='Adam', loss='binary_crossentropy',
metrics=['accuracy'])
# ----------------------------------------------------------------
# prepare data augmentation configuration
train_datagen = ImageDataGenerator(rescale=1./255,
featurewise_std_normalization=True,
featurewise_center=True)
valid_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_width, img_height),
batch_size=batch_size,
color_mode='grayscale',
classes = ['1','3','4','5','6','7'],
class_mode='categorical',
shuffle='False'
)
validation_generator = valid_datagen.flow_from_directory(
valid_data_dir,
target_size=(img_width, img_height),
batch_size=batch_size,
color_mode='grayscale',
classes = ['1','3','4','5','6','7'],
class_mode='categorical',
shuffle='False'
)
# --------------------callbacks---------------------------
csv_logger = CSVLogger(csv_logpath, append=True, separator=';')
graph_path = os.path.join('Graphs',modelpath)
os.makedirs(graph_path)
tensorboard = TensorBoard(log_dir= graph_path, write_graph=True, write_images=False)
callbacks_list=[csv_logger,tensorboard]
# ------------------
print("Starting to fit the model")
model.fit_generator(train_generator,
steps_per_epoch = train_generator.samples/batch_size,
validation_data = validation_generator,
validation_steps = validation_generator.samples/batch_size,
epochs = nb_epoch, verbose=1, callbacks=callbacks_list)
Not sure about this one but my guess would be that it has to do with your graphs being stored after each loop iteration. To check if your graphs are responsible for this, you could try write_graph = False, and see if you still have the same problem. To make sure the graph is reset, you could try to clear the tensorflow graph at the end of each iteration using this:
keras.backend.clear_session()
The problem is that with training of each model, the next model still contains all the graph elements of previous trainings. Thus before training each model, reset the Tensorflow graph and then continue with the training.