Training model on yolov3-tiny, but average loss always equals -nan - yolo

I'm experimenting with yolov3-tiny using darknet on Windows 10, CPU only. However, I keep getting an average loss of nan. I followed the instructions at https://github.com/AlexeyAB/darknet.git. I edited my cfg file so that all three filters for both [yolo] layers are set to 21 (since I have only two classes). I set subdivisions to 8 and batch to 64. I'm using a little over 500 images that I made myself for custom detection: I want yolo to determine whether the image shows a thumbs up or a thumbs down. I have run the train command numerous times but I never get past 100 iteration an
#config file:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
###########
[convolutional]
batch_normalize=1
filters=21
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=21
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear
[yolo]
mask = 3,4,5
anchors = 38, 93, 55,120, 66,156, 90,259, 110,239, 118,283
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=21
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 8
[convolutional]
batch_normalize=1
filters=21
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear
[yolo]
mask = 0,1,2
anchors = 38, 93, 55,120, 66,156, 90,259, 110,239, 118,283
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

Try setting random = 0 in the [yolo] sections.
That worked for me when training yolov3-tiny.


I am trying to run a TensorFlow visual recognition training on an M1 MacBook Pro, but I always get the same error.

'''import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.optimizers import Adam
from keras.applications.inception_resnet_v2 import InceptionResNetV2
import glob
n_data = len(glob.glob("raw_data\*"))*64
n_epoch = 50
n_batch = 32
datagen = ImageDataGenerator(
rotation_range=20,
brightness_range=[0.8, 1.2],
shear_range=0.2,
zoom_range=0.2,
fill_mode='nearest',
horizontal_flip=True,
rescale=1. /255,
data_format=None,
validation_split=0.2
)
train_gen = datagen.flow_from_directory(
'./dataset',
target_size = (400, 400),
class_mode = 'categorical',
color_mode = 'rgb',
batch_size = n_batch,
subset="training",
shuffle=True
)
val_gen = datagen.flow_from_directory(
'./dataset',
target_size = (400, 400),
class_mode = 'categorical',
color_mode = 'rgb',
batch_size = n_batch,
subset="validation",
shuffle=True
)
model = Sequential()
base_model = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=(400,400,3))
model.add(base_model)
model.add(GlobalAveragePooling2D())
model.add(Dense(units = 1024, activation="relu"))
#model.add(Dropout(0.2))
#model.add(Dense(units = 1024, activation="relu"))
model.add(Dense(units=13, activation="softmax"))
base_total = len(base_model.layers)
for layer in base_model.layers[:base_total]:
    layer.trainable=False
for layer in model.layers[base_total:]:
    layer.trainable=True
for layer in model.layers[1:]:
    layer.trainable = True
opt = Adam(learning_rate=0.0001)
model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=['accuracy'], run_eagerly=True)
model.summary()
checkpoint = ModelCheckpoint("chess_check.h5", monitor="val_accuracy", verbose=1, save_best_only=True, save_weights_only=False, mode="auto", period=1)
early = EarlyStopping(monitor="val_accuracy", min_delta=0, patience=10, verbose=1, mode="auto", restore_best_weights=True)
hist = model.fit_generator(steps_per_epoch = int((0.8*n_data)//n_batch), generator = train_gen, validation_data = val_gen, validation_steps = int((0.2*n_data)//n_batch), epochs=n_epoch, verbose=1, callbacks=[checkpoint, early])
model.save_weights('chess.h5')'''
This is the terminal output:
File "/Users/Coden/.Trash/vs-r/tutorial-en/lib/python3.10/site-packages/keras/engine/training.py", line 1420, in fit
raise ValueError('Unexpected result of train_function '
ValueError: Unexpected result of train_function (Empty logs). Please use Model.compile(..., run_eagerly=True), or tf.config.run_functions_eagerly(True) for more information of where went wrong, or file a issue/bug to tf.keras.
These are the installed packages:
'''
absl-py==1.1.0
astunparse==1.6.3
cachetools==5.2.0
certifi==2022.6.15
charset-normalizer==2.0.12
flatbuffers==1.12
gast==0.4.0
google-auth==2.8.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.46.3
h5py==3.7.0
idna==3.3
keras==2.9.0
Keras-Preprocessing==1.1.2
libclang==14.0.1
Markdown==3.3.7
numpy==1.22.4
oauthlib==3.2.0
opt-einsum==3.3.0
packaging==21.3
Pillow==9.1.1
protobuf==3.19.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==3.0.9
requests==2.28.0
requests-oauthlib==1.3.1
rsa==4.8
scipy==1.8.1
six==1.16.0
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow-estimator==2.9.0
tensorflow-macos==2.9.2
termcolor==1.1.0
typing_extensions==4.2.0
urllib3==1.26.9
Werkzeug==2.1.2
wrapt==1.14.1
'''
The same code works on a weak Windows laptop.
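One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the pattern "raw_data\*" uses a Windows path separator. On macOS the backslash is not a separator, so glob will most likely return an empty list, n_data becomes 0, and steps_per_epoch evaluates to 0, which would be consistent with the "Empty logs" error. A minimal sketch of a separator-neutral count follows; the helper names count_entries and steps_for are hypothetical and not part of the original script.

```python
import glob
import os


def count_entries(root):
    """Count entries directly under `root` using a separator-neutral pattern."""
    return len(glob.glob(os.path.join(root, "*")))


def steps_for(n_samples, batch_size, fraction):
    """Mirror the original formula: int((fraction * n_data) // n_batch)."""
    return int((fraction * n_samples) // batch_size)


if __name__ == "__main__":
    # "raw_data" is the directory name used in the question.
    n_data = count_entries("raw_data") * 64
    print("n_data:", n_data)
    print("steps_per_epoch:", steps_for(n_data, 32, 0.8))
```

If steps_per_epoch prints 0 on the Mac but a positive number on the Windows machine, the path pattern is a plausible culprit.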

Getting wrong y_pred values from model.predict

First of all I am very new to deep learning. Here I want to create a confusion matrix. For this reason, I need y_pred and y_true. I calculated y_true and y_pred the following way:
y_true = test_gen.classes
y_pred = (model.predict(test_gen)>0.5).astype("int32")
My confusion matrix code is:
from sklearn.metrics import classification_report, confusion_matrix
print('Confusion Matrix')
print(confusion_matrix(y_true, y_pred ))
mat = confusion_matrix(y_true, y_pred)
I set metrics=['accuracy','TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives'] in my model.compile
best_model = model
best_model.load_weights('./classify_model.h5')
best_model.evaluate(test_gen)
The values that I get for TruePositives, TrueNegatives, FalsePositives, and FalseNegatives from best_model.evaluate(test_gen) don't match the values in my confusion matrix.
My Train dataset:
My test dataset:
target_size=(224,224)
batch_size=64
train_datagen = ImageDataGenerator(
preprocessing_function=tf.keras.applications.resnet_v2.preprocess_input,
horizontal_flip=True, zoom_range=0.1
)
test_datagen = ImageDataGenerator(
preprocessing_function=tf.keras.applications.resnet_v2.preprocess_input
)
train_gen = train_datagen.flow_from_dataframe(
train_df,
directory=train_path,
x_col='file_paths',
y_col='labels',
target_size=target_size,
batch_size=batch_size,
color_mode='rgb',
class_mode='binary'
)
valid_gen = test_datagen.flow_from_dataframe(
valid_df,
directory=train_path,
x_col='file_paths',
y_col='labels',
target_size=target_size,
batch_size=batch_size,
color_mode='rgb',
class_mode='binary'
)
test_gen = test_datagen.flow_from_dataframe(
test_df,
directory=test_path,
x_col='file_paths',
y_col='labels',
target_size=target_size,
batch_size=batch_size,
color_mode='rgb',
class_mode='binary'
)
base_model = tf.keras.applications.ResNet50V2(include_top=False, input_shape=(224,224,3))
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(1, activation='sigmoid')
])
lr=0.001
model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=lr), metrics=['accuracy', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives'])
I am having trouble calculating y_true and y_pred correctly. Please help me construct a confusion matrix for this code.
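A likely cause, stated as an assumption since nothing in the post confirms it: flow_from_dataframe shuffles by default, so test_gen.classes may not line up with the order in which model.predict(test_gen) sees the samples. Creating test_gen with shuffle=False should keep the two aligned. The sketch below uses stand-in lists in place of test_gen.classes and the flattened sigmoid outputs, and counts the four cells in plain Python; the function name binary_confusion is hypothetical.

```python
def binary_confusion(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for aligned labels and predicted probabilities."""
    tn = fp = fn = tp = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob > threshold else 0
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Stand-in data: in the real script these would come from a test generator
# built with shuffle=False, so that both sequences share the same order.
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.7, 0.8, 0.4, 0.9]

tn, fp, fn, tp = binary_confusion(y_true, y_prob)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
```

With aligned inputs, these four counts should agree with the TruePositives, TrueNegatives, FalsePositives, and FalseNegatives metrics, which use a 0.5 threshold by default.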

HIGH Resolution Image and Colab PRO Crash

I have 1700 images, each 1000*1000 (height and width). They contain minor details, so I prefer to keep this size. With the code below, my Google Colab Pro session crashes. Please help.
'''
##title IMAGE TO DATA, NORMALIZATION AND AUGMENTATION
#Directories with Subdirectories as Classes for training and validation datasets
%%capture
train_dir = '/content/Dataset/Training'
validation_dir = '/content/Dataset/Validation'
# Set batch size and Image Height and Width
batch_size = 32
IMG_HEIGHT, IMG_WIDTH = (1000,1000)
#Image to Data Transform using ImageDataGenerator of Keras
#Image to Data for Training Data
Dataset_Image_Training = ImageDataGenerator(rescale = 1./255, zoom_range=[0.8, 1.5], brightness_range= [0.8, 2.0])
train_data_gen = Dataset_Image_Training.flow_from_directory(
batch_size= batch_size,
directory=train_dir,
shuffle=True,
target_size=(IMG_HEIGHT,IMG_WIDTH),
class_mode='binary')
#Image to Data for Validation Data
validation_image_generator = ImageDataGenerator(rescale=1./255, zoom_range=[0.8, 1.5], brightness_range= [0.8, 2.0])
val_data_gen = validation_image_generator.flow_from_directory(
batch_size=batch_size,
directory= validation_dir,
shuffle=True,
target_size=(IMG_HEIGHT,IMG_WIDTH),
class_mode= 'binary')
#Check Classes in Dataset
train_data_gen.class_indices
##title Deep Learning CNN Model with Keras Sequential with **Dropout**
#%%capture
model = Sequential([
Conv2D(32, (3,3), padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
MaxPool2D(2,2),
Dropout(0.5),
Conv2D(64, (3,3), padding='same', activation='relu'),
MaxPool2D(2,2),
Dropout(0.5),
Conv2D(128, (3,3), padding='same', activation='relu'),
MaxPool2D(2,2),
Dropout(0.5),
Conv2D(256, (3,3), padding='same', activation='relu'),
MaxPool2D(2,2),
Dropout(0.5),
Flatten(),
Dense(512, activation='relu'),
Dropout(0.5),
Dense(1, activation='sigmoid')])
# Model Compilation
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
#Tensorboard Set up
import tensorflow as tf
import datetime
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
#Checkpoint and earlystop setting
filepath = '/content/drive/My Drive/DL_Model.hdf5'
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(filepath, monitor='val_accuracy', mode='max', save_best_only=True, save_weights_only=False, verbose=1),
    tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=15, verbose=1),
    tensorboard_callback,
]
#Model Fitting
hist = model.fit(
train_data_gen,
steps_per_epoch=None,
epochs=500,
validation_data=val_data_gen,
validation_steps=None,
callbacks=callbacks
)
#Accuracy Print
train_acc = max(hist.history['accuracy'])
val_acc = max(hist.history['val_accuracy'])
train_loss = min(hist.history['loss'])
val_loss = min(hist.history['val_loss'])
print('Training accuracy is')
print(train_acc)
print('Validation accuracy is')
print(val_acc)
print('Training loss is')
print(train_loss)
print('Validation loss is')
print(val_loss)
#Load Tensorboard
%load_ext tensorboard
%tensorboard --logdir logs
'''
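A back-of-the-envelope estimate may help show where the memory goes. This is a rough sketch under stated assumptions (float32 values, 'same' convolutions that keep the spatial size, 2x2 pooling that floors odd sizes), not a measurement of the actual Colab session; the helper name flattened_units is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def flattened_units(side, conv_filters, pool=2):
    """Units entering Flatten when each 'same' conv is followed by a 2x2 pool."""
    for _ in conv_filters:
        side //= pool  # pooling with default 'valid' padding floors odd sizes
    return side * side * conv_filters[-1]


side, batch_size = 1000, 32
conv_filters = [32, 64, 128, 256]

flat = flattened_units(side, conv_filters)
dense_params = flat * 512 + 512  # weights plus biases of Dense(512)
first_conv_bytes = batch_size * side * side * conv_filters[0] * BYTES_PER_FLOAT32

print("units entering Flatten:", flat)
print("Dense(512) parameters:", dense_params)
print("Dense(512) weights, GB:", dense_params * BYTES_PER_FLOAT32 / 1e9)
print("first conv output for one batch, GB:", first_conv_bytes / 1e9)
```

If these numbers are in the right ballpark, the first Dense layer alone holds roughly half a billion parameters, and a single batch of 32 produces several gigabytes of activations in the first convolution. A smaller batch size, or pooling more aggressively before the Dense layer, would be natural things to try.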

Error: l.outputs == params.inputs filters= in the [convolutional]-layer doesn't correspond to classes= or mask= in [yolo]-layer

I want to train yolov3-tiny weights using the darknet commands. I set the filters in the cfg file according to the number of classes, and set the number of classes in the data file. Finally, I used the yolov3-tiny pre-trained weights for training, but it still reports the error above. Please help.
This is the data file:
classes = 1
train = data/train_crosswalk_pos.txt
valid = data/val_crosswalk_pos.txt
names = data/crosswalk_pos.names
backup = backup/
This is the names file:
crosswalk
This is the cfg file:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=24
subdivisions=8
width=416
height=416
channels=1
momentum=0.9
decay=0.0005
angle=30
saturation = 1.5
exposure = 1.8
hue=.2
learning_rate=0.001
burn_in=1000
max_batches = 10000
policy=steps
steps=8000,9000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
###########
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear
[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 8
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear
[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
A few more settings in the cfg file need attention:
Search for each [yolo] section and, in the [convolutional] layer directly above it, set filters = (number of classes + 5) * 3.
Note: this must be done before every [yolo] layer (three places in the full cfg, two in the tiny cfg shown here). For classes = 1, filters is 18.
max_batches should be (number of classes * 2000), but not less than 6000, so in this scenario it should be 6000.
steps should be 80% and 90% of max_batches, so here it should be 4800,5400.
Your question is about yolov3 and I am describing yolov4, but the rules are probably the same.
In your case:
classes: 1 (in the [yolo] layers), max_batches: 10000, steps: 8000,9000, filters: 18
Instead, it should be:
classes: 1 (in the [yolo] layers), max_batches: 6000, steps: 4800,5400, filters: 18
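The arithmetic in this answer can be sketched as a small helper. This is only an illustration: the function name yolo_cfg_values is hypothetical, it assumes three anchors per [yolo] layer (as in the masks shown), and it assumes the 6000 floor used in the corrected values at the end of the answer.

```python
def yolo_cfg_values(num_classes, anchors_per_layer=3, min_batches=6000):
    """Derive cfg values from the class count using the rules quoted above."""
    # filters in the [convolutional] layer directly before each [yolo] layer
    filters = (num_classes + 5) * anchors_per_layer
    # classes * 2000, but never below the assumed floor
    max_batches = max(num_classes * 2000, min_batches)
    # 80% and 90% of max_batches, computed with integer arithmetic
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return {"filters": filters, "max_batches": max_batches, "steps": steps}


for n in (1, 2):
    print(n, "class(es):", yolo_cfg_values(n))
```

For one class this gives filters 18, max_batches 6000, and steps 4800,5400; for two classes the filters value becomes 21.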

model.fit 0 validation performance. Model evaluate is correct

The validation accuracy seems to be zero.
I am training on the MNIST dataset. I copied the code from an online resource, and it isn't working properly.
Output
Epoch 1/8
235/235 [==============================] - 13s 55ms/step - loss: 0.7066 - accuracy: 0.8715 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
...
Epoch 8/8
235/235 [==============================] - 13s 55ms/step - loss: 0.0176 - accuracy: 0.9941 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Score: [0.052960414439439774, 0.9857000112533569]
Code
imports...
# Dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Cast to np.float32
x_train = x_train.astype(np.float32)
y_train = y_train.astype(np.float32)
x_test = x_test.astype(np.float32)
y_test = y_test.astype(np.float32)
# Reshape the images to a depth dimension
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
# Dataset variables
train_size = x_train.shape[0]
test_size = x_test.shape[0]
width, height, depth = x_train.shape[1:]
num_features = width * height * depth
num_classes = 10
# Compute the categorical classes_list
y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)
# Model params
lr = 0.001
optimizer = Adam(lr=lr)
epochs = 8
batch_size = 256
dropout_rate = 0.1
# Define the DNN
input_img = Input(shape=x_train.shape[1:])
x = Conv2D(filters=8, kernel_size=5, padding='same')(input_img)
...
y_pred = Activation("softmax")(x)
# Build the model
model = Model(inputs=[input_img], outputs=[y_pred])
# Compile and train (fit) the model, afterwards evaluate the model
model.summary()
model.compile(
loss="categorical_crossentropy",
optimizer=optimizer,
metrics=["accuracy"])
model.fit(
x=x_train,
y=y_train,
epochs=epochs,
batch_size=batch_size,
validation_data=[x_test, y_test])
score = model.evaluate(
x_test,
y_test,
verbose=0)
print("Score: ", score)
Providing the solution here (Answer Section) even though it is present in the Comment Section (Thanks to Medrik), for the benefit of the community.
validation_data was passed as a list (square brackets []) instead of a tuple (parentheses ()). The issue was resolved by changing the code from
model.fit( x=x_train, y=y_train, epochs=epochs, batch_size=batch_size, validation_data=[x_test, y_test])
to
model.fit( x=x_train, y=y_train, epochs=epochs, batch_size=batch_size, validation_data=(x_test, y_test))