What's wrong with my ResNet50 on two machines? - tensorflow

First, I trained a ResNet50 from scratch as a six-class classifier on Kaggle and got the results shown below.
As you can see, the training and validation accuracy both improved steadily.
After that, I rented a cloud host for a better GPU (a 1080 Ti), copied my code over (I uploaded my Jupyter notebook) and ran it. But strange things happened: my validation accuracy is extremely unsteady and fluctuates widely (around 0.3). Here's the screenshot.
Also, training on the host goes much worse than on the Kaggle kernel.
Here are the screenshots after some epochs (the host run was actually trained for many more epochs than the Kaggle one).
And here is my ImageDataGenerator code:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.1
)
test_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.1
)
train_generator = train_datagen.flow_from_directory(
    base_path,
    target_size=(300, 300),
    batch_size=16,
    class_mode='categorical',
    subset='training',
    seed=0
)
validation_generator = test_datagen.flow_from_directory(
    base_path,
    target_size=(300, 300),
    batch_size=16,
    class_mode='categorical',
    subset='validation',
    seed=0
)
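Since both generators apply validation_split=0.1 to the same base_path, one thing worth checking on both machines is whether the training and validation subsets end up identical and non-overlapping. A minimal diagnostic sketch, assuming the generators defined above (this is not part of the original code):

# Compare this output between the Kaggle kernel and the cloud host.
train_files = set(train_generator.filenames)
val_files = set(validation_generator.filenames)
print('train samples:', len(train_files))
print('validation samples:', len(val_files))
print('overlap between subsets:', len(train_files & val_files))  # should be 0
print('class indices:', train_generator.class_indices)  # should be identical on both hosts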

Related

Tensorflow: train multiple models in parallel with the same ImageDataGenerator

I'm doing HPO on a small custom CNN. During training the GPU is under-utilised and I'm finding a bottleneck in the CPU: the data augmentation process is too slow. Looking online, I found that I could use multiple CPU cores for the generator and speed up the process. I set workers=n_cores and this did improve things, but not as much as I'd like.
So I thought that I could train multiple models simultaneously on the GPU and feed the same augmented data to all of them. However, I can't come up with a way to do this and I couldn't find any similar question.
Here's a minimal example (I'm leaving out imports for brevity):
# load model and set only last layer as trainable
def create_model(learning_rate, alpha, dropout):
    model_path = '/content/drive/My Drive/Progetto Advanced Machine Learning/Model Checkpoints/Custom Model 1 2020-06-01 10:56:21.010759.hdf5'
    model = tf.keras.models.load_model(model_path)
    x = model.layers[-2].output
    x = Dropout(dropout)(x)
    predictions = Dense(120, activation='softmax', name='prediction', kernel_regularizer=tf.keras.regularizers.l2(alpha))(x)
    model = Model(inputs=model.inputs, outputs=predictions)
    for layer in model.layers[:-2]:
        layer.trainable = False
    model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate), metrics=['accuracy'])
    return model

# declare the search space
SEARCH_SPACE = [skopt.space.Real(0.0001, 0.1, name='learning_rate', prior='log-uniform'),
                skopt.space.Real(1e-9, 1, name='alpha', prior='log-uniform'),
                skopt.space.Real(0.0001, 0.95, name='dropout', prior='log-uniform')]

# declare generator
train_datagenerator = ImageDataGenerator(rescale=1. / 255, rotation_range=30, zoom_range=0.2, horizontal_flip=True, validation_split=0.2, data_format='channels_last')

# training function to be called by the optimiser
@use_named_args(SEARCH_SPACE)
def fitness(learning_rate, alpha, dropout):
    model = create_model(learning_rate, alpha, dropout)
    # build generators
    train_batches = train_datagenerator.flow_from_directory(train_out_path, target_size=image_size, color_mode="rgb", class_mode="categorical", batch_size=32, subset='training', seed=20052020)
    val_batches = train_datagenerator.flow_from_directory(directory=train_out_path, target_size=image_size, color_mode="rgb", class_mode="categorical", batch_size=32, subset='validation', shuffle=False, seed=20052020)
    # train
    early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
    training_results = model.fit(train_batches, epochs=5, verbose=1, shuffle=True, validation_data=val_batches, workers=2)
    history[hyperpars] = training_results.history
    with open(dict_save_path, 'wb') as f:
        pickle.dump(history, f)
    return training_results.history['val_accuracy'][-1]

# HPO
result = skopt.forest_minimize(fitness, SEARCH_SPACE, n_calls=10, callback=checkpoint_saver)
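One way to feed the same augmented batches to several models, sketched here under the assumption that all the models fit in GPU memory at the same time, is to draw batches from the generator manually and call train_on_batch on each model. The hyperparameter tuples below are placeholders, and create_model / train_datagenerator / train_out_path / image_size are the names from the snippet above:

# Sketch only: train several models on the same augmented batches.
models = [create_model(lr, alpha, dropout)
          for (lr, alpha, dropout) in [(1e-3, 1e-4, 0.2), (1e-4, 1e-3, 0.5)]]  # placeholder values
train_batches = train_datagenerator.flow_from_directory(
    train_out_path, target_size=image_size, class_mode='categorical',
    batch_size=32, subset='training', seed=20052020)
for epoch in range(5):
    for step in range(len(train_batches)):
        x, y = next(train_batches)      # augment the batch once on the CPU...
        for m in models:
            m.train_on_batch(x, y)      # ...and reuse it for every model

This only pays off if the augmentation really is the bottleneck; the models still compete for the GPU when they run their own forward and backward passes.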

keras epoch runs twice

When I run model.fit_generator, each epoch runs twice: the first time it only goes up to 39/40, then the second time 40/40. Any reason why this is happening?
Here is a GIF; you can also see that Epoch 1/2 actually appears during the Epoch 2/2 run. This only happens when I pass validation_data=validation_generator.
Update: here is the code.
The dataset is from here
https://tiny-imagenet.herokuapp.com/
Packages are:
absl-py==0.9.0
astor==0.7.1
attrs==19.3.0
autopep8==1.4.4
backcall==0.1.0
bleach==3.1.4
brotlipy==0.7.0
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
colorama==0.4.3
cryptography==2.8
cycler==0.10.0
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
future==0.18.2
gast==0.2.2
google-pasta==0.2.0
grpcio==1.23.0
h5py==2.10.0
idna==2.9
imageio==2.8.0
importlib-metadata==1.6.0
ipykernel==5.2.0
ipython==7.13.0
ipython-genutils==0.2.0
jedi==0.17.0
Jinja2==2.11.2
joblib==0.14.1
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-tensorboard==0.2.0
jupyterlab==2.1.0
jupyterlab-server==1.1.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.2.0
llvmlite==0.31.0
Markdown==3.2.1
MarkupSafe==1.1.1
matplotlib==3.2.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.6
notebook==6.0.3
numba==0.48.0
numpy==1.18.1
olefile==0.46
opt-einsum==0+untagged.56.g2664021.dirty
pandas==1.0.3
pandocfilters==1.4.2
parso==0.7.0
pickleshare==0.7.5
Pillow==7.1.1
prometheus-client==0.7.1
prompt-toolkit==3.0.5
protobuf==3.11.4
pycparser==2.20
Pygments==2.6.1
pyOpenSSL==19.1.0
pyparsing==2.4.7
PyQt5==5.12.3
PyQt5-sip==4.19.18
PyQtWebEngine==5.12.1
pyreadline==2.1
pyrsistent==0.16.0
PySocks==1.7.1
python-dateutil==2.8.1
pytz==2019.3
pywin32==227
pywinpty==0.5.7
pyzmq==19.0.0
requests==2.23.0
scikit-learn==0.22.2.post1
scipy==1.2.1
Send2Trash==1.5.0
six==1.14.0
tensorboard==1.15.0
tensorflow==1.15.0
tensorflow-estimator==1.15.1
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
tornado==6.0.4
traitlets==4.3.3
urllib3==1.25.9
wcwidth==0.1.9
webencodings==0.5.1
Werkzeug==0.16.1
win-inet-pton==1.1.0
wincertstore==0.2
wrapt==1.12.1
zipp==3.1.0
The code is:
train_datagen = ImageDataGenerator(validation_split=0.9)
train_generator = train_datagen.flow_from_directory(directory='tiny-imagenet-200/train/',
                                                    target_size=(64, 64),
                                                    batch_size=256,
                                                    class_mode='categorical',
                                                    shuffle=True,
                                                    seed=42,
                                                    subset="training"
                                                    )
val_data = pd.read_csv('./tiny-imagenet-200/val/val_annotations.txt', sep='\t', header=None, names=['File', 'Class', 'X', 'Y', 'H', 'W'])
val_data.drop(['X', 'Y', 'H', 'W'], axis=1, inplace=True)
valid_datagen = ImageDataGenerator(validation_split=0.9)
validation_generator = valid_datagen.flow_from_dataframe(dataframe=val_data,
                                                         directory='./tiny-imagenet-200/val/images/',
                                                         x_col='File',
                                                         y_col='Class',
                                                         target_size=(64, 64),
                                                         color_mode='rgb',
                                                         class_mode='categorical',
                                                         batch_size=256,
                                                         shuffle=True,
                                                         seed=42,
                                                         subset="training")
history = model.fit_generator(train_generator,
                              epochs=2,
                              validation_data=validation_generator,
                              #callbacks=[tensorboard_callback]
                              )
You are using validation_split when instantiating ImageDataGenerator and passing subset="training" to validation_generator, but your training and validation sets are actually already separated into different directories. Now, I'm not 100% sure, but I think it may have something to do with that.
Also, I would use the same common arguments for both the training and validation generators: x_col, y_col, target_size, color_mode, etc.
Take a look at the examples shown here (official docs):
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)
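Applied to the generators from the question, that suggestion amounts to dropping validation_split and subset altogether, since the Tiny ImageNet training and validation data already live in separate directories. A rough sketch, reusing the paths and the val_data dataframe from above:

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    directory='tiny-imagenet-200/train/',
    target_size=(64, 64),
    batch_size=256,
    class_mode='categorical',
    shuffle=True,
    seed=42)

valid_datagen = ImageDataGenerator()
validation_generator = valid_datagen.flow_from_dataframe(
    dataframe=val_data,  # built from val_annotations.txt as above
    directory='./tiny-imagenet-200/val/images/',
    x_col='File',
    y_col='Class',
    target_size=(64, 64),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=256,
    shuffle=False)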

Keras, training with an empty category

Maybe it is a naive question.
I want to try a small experiment for research: train the model with an extra, empty category on top of the ones I have in the training and validation sets, and see how the predictions for this extra category go down with the number of samples and epochs.
In particular, I added a 5th, phantom category to the pandas dataframe.
I am also using an ImageDataGenerator.
train_datagen = ImageDataGenerator(
    rotation_range=0,
    rescale=1./255,
    shear_range=0.0,
    zoom_range=0.2,
    horizontal_flip=False,
    width_shift_range=0.0,
    height_shift_range=0.0
)
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    "/mypath/",
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_dataframe(
    validate_df,
    "/mypath/",
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)
history = model.fit_generator(
    train_generator,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=total_validate//batch_size,
    steps_per_epoch=total_train//batch_size,
    callbacks=callbacks
)
However, when I try to train the CNN I get the following error:
Error when checking target: expected dense_2 to have shape (5,) but got array with shape (4,)
Can someone suggest a workaround?
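The error suggests the generators only see the four categories that actually occur in the dataframes, so they produce one-hot targets of length 4 while the final Dense layer expects 5. One possible workaround, sketched with hypothetical category names, is to pass the full class list (phantom class included) through the classes argument of flow_from_dataframe, for both the training and validation generators:

# Hypothetical class names; replace with the real four plus the phantom one.
ALL_CLASSES = ['cat1', 'cat2', 'cat3', 'cat4', 'phantom']
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    "/mypath/",
    x_col='filename',
    y_col='category',
    classes=ALL_CLASSES,  # forces 5 classes, so targets become one-hot vectors of length 5
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)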

Low accuracy - Transfer learning + bottleneck keras-tensorflow (resnet50)

I'm trying to do transfer learning / bottleneck features with keras/tensorflow on a Google Colaboratory notebook. My problem is that the accuracy doesn't go over 6% (Kaggle's dog breed challenge, 120 classes, data generated with datagen.flow_from_directory).
Below is my code, is there something I'm missing?
tr_model = ResNet50(include_top=False,
                    weights='imagenet',
                    input_shape=(224, 224, 3))

datagen = ImageDataGenerator(rescale=1. / 255)

#### Training ####
train_generator = datagen.flow_from_directory(train_data_dir,
                                              target_size=(image_size, image_size),
                                              class_mode=None,
                                              batch_size=batch_size,
                                              shuffle=False)
bottleneck_features_train = tr_model.predict_generator(train_generator)
train_labels = to_categorical(train_generator.classes, num_classes=num_classes)

#### Validation ####
validation_generator = datagen.flow_from_directory(validation_data_dir,
                                                   target_size=(image_size, image_size),
                                                   class_mode=None,
                                                   batch_size=batch_size,
                                                   shuffle=False)
bottleneck_features_validation = tr_model.predict_generator(validation_generator)
validation_labels = to_categorical(validation_generator.classes, num_classes=num_classes)

#### Model creation ####
model = Sequential()
model.add(Flatten(input_shape=bottleneck_features_train.shape[1:]))
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(bottleneck_features_train, train_labels,
                    epochs=30,
                    batch_size=batch_size,
                    validation_data=(bottleneck_features_validation, validation_labels))
I get a val_acc = 0.0592.
When I use ResNet50 with the top layer included, I get a score of 82%.
Can anyone spot what's wrong with my code?
Removing the rescale and adding the ResNet50 preprocessing helped a lot.
These modifications helped immensely:
from keras.applications.resnet50 import preprocess_input
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
I now get an accuracy of 80%.
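For context, a minimal sketch of how that fix slots into the pipeline above (the remaining variables are the ones from the question): preprocess_input for ResNet50 expects raw pixel values in the 0-255 range, which is why the rescale=1./255 is removed rather than combined with it.

from keras.applications.resnet50 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator
# Same preprocessing ResNet50 was trained with; no extra rescaling.
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = datagen.flow_from_directory(train_data_dir,
                                              target_size=(image_size, image_size),
                                              class_mode=None,
                                              batch_size=batch_size,
                                              shuffle=False)
bottleneck_features_train = tr_model.predict_generator(train_generator)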

Keras / Tensorflow not utilising all CPU cores

I have a rather straightforward script for training and validating. I'm using tensorflow-gpu and I can see GPU:0 being used. However, the Python process itself appears to be using just a single core, at around 90% utilisation. My GPU isn't getting maxed out during training either; it gets fully utilised during validation, however.
I wonder whether the use of a single core is preventing the GPU from being utilised more. Is there a way to use more CPU cores? I've tried setting config.intra_op_parallelism_threads = 4, but only a single core is still used.
Here's my script:
import model
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.preprocessing.image import ImageDataGenerator
from visual_callbacks import AccLossPlotter
import numpy as np

def main():
    np.random.seed(45)
    nb_class = 2
    width, height = 224, 224

    sn = model.SqueezeNet(nb_classes=nb_class, inputs=(3, height, width))
    print('Build model')

    sgd = SGD(lr=0.001, decay=0.0002, momentum=0.9, nesterov=True)
    sn.compile(
        optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
    print(sn.summary())

    # Training
    train_data_dir = 'data/train'
    validation_data_dir = 'data/validation'
    nb_train_samples = 2000
    nb_validation_samples = 800
    nb_epoch = 500

    # Generator
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
    #train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(width, height),
        batch_size=32,
        class_mode='categorical')
    validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(width, height),
        batch_size=32,
        class_mode='categorical')

    # Instantiate AccLossPlotter to visualise training
    plotter = AccLossPlotter(graphs=['acc', 'loss'], save_graph=True)
    early_stopping = EarlyStopping(monitor='val_loss', patience=3, verbose=0)
    checkpoint = ModelCheckpoint(
        'weights.{epoch:02d}-{val_loss:.2f}.h5',
        monitor='val_loss',
        verbose=0,
        save_best_only=True,
        save_weights_only=True,
        mode='min',
        period=1)

    sn.fit_generator(
        train_generator,
        samples_per_epoch=nb_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=nb_validation_samples,
        callbacks=[plotter, checkpoint])

    sn.save_weights('weights.h5')

if __name__ == '__main__':
    main()
    input('Press ENTER to exit...')
You cannot fully utilize both the CPU and the GPU at the same time here. When you are using the GPU for computation, your CPU is not doing the actual computation; it is only doing the book-keeping for the GPU kernels, and for that book-keeping it does not need all of its cores (a single core is enough).
My GPU isn't getting maxed out during training either. It gets fully utilised during validation, however.
That is because during training you are calculating gradients and doing back-prop, which are not as massively parallel as a simple forward pass (you have to update the weights after every batch's forward pass), so they cannot fully utilize the GPU. During validation you are only computing the forward pass, which is why the GPU is fully utilized then.
You may, however, get more GPU utilization if you increase the batch_size.
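On the question of using more CPU cores for the generator: intra_op_parallelism_threads controls parallelism inside TensorFlow ops, not the Python-side image loading and augmentation. A hedged sketch of what usually helps with an ImageDataGenerator bottleneck is the workers / use_multiprocessing arguments of fit_generator (Keras 2 names; the values below are illustrative, not from the question):

# Parallelise the Python-side image loading/augmentation across worker processes.
sn.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // 32,
    epochs=nb_epoch,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // 32,
    callbacks=[plotter, checkpoint],
    workers=4,                  # number of generator worker processes (illustrative)
    max_queue_size=20,          # how many batches to keep prefetched
    use_multiprocessing=True)   # processes instead of threads, to sidestep the GIL

On the legacy Keras 1.x signature used in the script above (samples_per_epoch / nb_epoch), the corresponding arguments were named nb_worker, max_q_size and pickle_safe.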