Multi-class object detection with tensorflow: weird behavior on evaluation - tensorflow

I am using Tensorflow Object detection, with faster_rcnn_inception_v2_coco as pretrained model. I'm on Windows 10, with tensorflow-gpu 1.6 on NVIDIA GeForce GTX 1080, CUDA 9.0 and CUDNN 7.0.
I'm trying to training a multi-class object detection with a custom dataset, but I had some weird behavior. I have 2 classes: Pistol and Knife (with respectively 876 and 664 images, all with similar size from 360x200 to 640x360, and similar ratio). So, I think that the dataset is balanced. I splitted it into Train set (1386 images: 594 knife, 792 pistol) and Test set (154 images: 70 knife, 84 pistol)
The CNN seems that can detect only one of the two object with good accuracy, and which object can detect (of the two classes) changes randomly during the training steps and in the same image (example: step 10000 it detect only pistol, step 20000 only knife, step 30000 knife, step 40000 pistol, step 50000 knife, etc..), as showen below:
]
Moreover, the Loss looks weird, and the accuracy during the evaluation are never high for both classes together.
During the training phase, the loss seems to oscillate at every training step.
Loss:
Total Loss:
From the mAp (image below) you can see that the two objects are never identified together at the same step:
If I trained these two classes separately, I can achieve a good 50-60% accuracy. If I train these two classes together, the results is what you have seen.
Here you can find the generate_tfrecord.py and the model configuration file (that I changed to made it multi-class). The label map isthe following:
item {
id: 1
name: 'knife'
}
item {
id: 2
name: 'pistola'
}
Any suggest are welcome.
UPDATES
After 600k iterations, loss is still oscillating.
The scenario is the following: Loss, Total Loss, and mAp.

Finally, I solved my issue.
I follow the advice of #Suleiman, but at first time I shuffled only the test.csv and train.csv. I saw that inside my generate_tfrecords.py the items will be reordered by filename, so the shuffle from before was useless.
I shuffled the dataset inside generate_tfrecords.py by changing
examples = pd.read_csv(FLAGS.csv_input)
grouped = split(examples, 'filename')
for group in grouped:
tf_example = create_tf_example(group, path)
to this:
examples = pd.read_csv(FLAGS.csv_input)
grouped = split(examples, 'filename')
shuffle(grouped) // shuffling list of entries
for group in grouped:
tf_example = create_tf_example(group, path)
adding the shuffle of the list of entries. The results improved a lot, as you can see in the plots of Loss, Total Loss and mAp:
Loss and Total Loss:
mAp:
Now there's only a peak in the loss, maybe for some faults in the dataset that I'll will clean. Obviously, also the evaluation and the detection are now quite good.
SO REMEMBER: the order of the images in your TFRecords are very important (expecially when batch size is 1)!
Thanks Suleiman for the hint.

Related

YOLOv4 loss too high

I am using YOLOv4-tiny for a custom dataset of 26 classes that I collected from Open Images Dataset. The dataset is almost balanced(850 images per class but different number of bounding boxes). When I used YOLOv4-tiny to train on just 3 classes the loss was near 0.5, it was fairly accurate. But for 26 classes as soon as the loss goes below 2 the model starts to overfit. The prediction are also very inaccurate.
I have tried to change the parameters like the learning rate, the momentum and the size but whatever I do the models becomes worse then before. Using regular YOLOv4 model rather then YOLO-tiny does not help either. How can I bring the loss further down?
Have you tried training with mAP? You can take a subset of your training set and make it the validation set. This can be done in the same way you made your training and test set. Then, you can run darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map. This will keep track of the loss in your validation set. When the error in the validation say goes up, this is the time to stop training and prevent overfitting (this is called: early stopping).
You need to run the training for (classes*2000)iterations. However, for the best scores, you need to train your model for at least 6000 iterations (also known as max_batches). Also please remember if you are using a b&w image, change the channels=3 to channels=1. You can stop your training once the avg loss becomes something like this: 0.XXXX.
Here's my mAP graph for 6000 iterations that ran for 6.2 hours:
avg loss with 6000 max_batches.
Moreover, you can follow this FAQ documentation here by Stéphane Charette.

TensorFlow image classification colab sheet from training material: newbie questions

Apologies if my questions are relatively simple, but I have been approaching the TensorFlow bit recently with the aim to learn new skills.
In the example, but there are several things I can't get:
in the explore data section, the size of the datasets return as 60/10k respectively for train and test.
where the size of the train/test size declared?
packages like SkLearn allows this to be specified in percentage when invoking the split methods.
in the training model part, when the 5 epochs are trained, the 1875 number appear below.
- what is that?
- I was expecting the training to run over the 60k items, but even by multiplying 1875 by 5 the number doesn't reach the 10k.
Dataset is loaded using tensorflow datasets API
The source itself has the split of 60K (Train) and 10K (Test)
https://www.tensorflow.org/datasets/catalog/fashion_mnist
An Epoch is a complete run with all the training samples. The training is done in batches. In the example you refer to, a batch size of 32 is used. So to complete one epoch, 1875 batches (60000 / 32) are run.
Hope this helps.

model.evaluate() varies wildly with number of steps when using generators

Running tensorflow 2.x in Colab with its internal keras version (tf.keras). My model is a 3D convolutional UNET for multiclass segmentation (not sure if it's relevant).
I've successfully trained (high enough accuracy on validation) this model the traditional way but I'd like to do augmentation to improve it, therefore I'm switching to (hand-written) generators. When I use generators I see my loss increasing and my accuracy decreasing a lot (e.g.: loss increasing 4-fold, not some %) in the fit.
To try to localize the issue I've tried loading my trained weights and computing the metrics on the data returned by the generators. And what's happening makes no sense. I can see that the results visually are ok.
model.evaluate(validationGenerator,steps=1)
2s 2s/step - loss: 0.4037 - categorical_accuracy: 0.8716
model.evaluate(validationGenerator,steps=2)
2s/step - loss: 1.7825 - categorical_accuracy: 0.7158
model.evaluate(validationGenerator,steps=4)
7s 2s/step - loss: 1.7478 - categorical_accuracy: 0.7038
Why would the loss vary with the number of steps? I could guess some % due to statistical variations... not 4 fold increase!
If I try
x,y = next(validationGenerator)
nSamples = x.shape[0]
meanLoss = np.zeros(nSamples)
meanAcc = np.zeros(nSamples)
for pIdx in range(nSamples):
y_pred = model.predict(np.expand_dims(x[pIdx,:,:,:,:],axis=0))
meanAcc[pIdx]=np.mean(tf.keras.metrics.categorical_accuracy(np.expand_dims(y[pIdx,:,:,:,:],axis=0),y_pred))
meanLoss[pIdx]=np.mean(tf.keras.metrics.categorical_crossentropy(np.expand_dims(y[pIdx,:,:,:,:],axis=0),y_pred))
print(np.mean(meanAcc))
print(np.mean(meanLoss))
I get accuracy~85% and loss ~0.44. Which is what I expect from the previous fit, and it varies by vary little from one batch to the other. And these are the same exact numbers that I get if I do model.evaluate() with 1 step (using the same generator function).
However I need about 30 steps to run trough my whole training dataset. What should I do?
If I fit my already good model to this generator it indeed worsen the performances a lot (it goes from a nice segmentation of the image to uniform predictions of 25% for each of the 4 classes!!!!)
Any idea on where to debud the issue? I've also visually looked at the images produced by the generator and at the model predictions and everything looks correct (as testified by the numbers I found when evaluating using a single step). I've tried writing a minimal working example with a 2 layers model but... in it the issue does not happen.
UPDATE: Generators code
So, as I've been asked, these are the generators code. They're handwritten
def dataGen (X,Y_train):
patchS = 64 #set the size of the patch I extract
batchS = 16 #number of samples per batch
nSamples = X.shape[0] #get total number of samples
immSize = X.shape[1:] #get the shape of the iamge to crop
#Get 4 patches from each image
#extract them randomly, and in random patient order
patList = np.array(range(0,nSamples),dtype='int16')
patList = patList.reshape(nSamples,1)
patList = np.tile(patList,(4,2))
patList[:nSamples,0]=0 #Use this index to tell the code where to get the patch from
patList[nSamples:2*nSamples,0]=1
patList[2*nSamples:3*nSamples,0]=2
patList[3*nSamples:4*nSamples,0]=3
np.random.shuffle(patList)
patStart=0
Xout = np.zeros((batchS,patchS,patchS,patchS,immSize[3])) #allocate output vector
while True:
Yout = np.zeros((batchS,patchS,patchS,patchS)) #allocate vector of labels
for patIdx in range(batchS):
XSR = 32* (patList[patStart+patIdx,0]//2) #get the index of where to extract the patch
YSR = 32* (patList[patStart+patIdx,0]%2)
xStart = random.randrange(XSR,XSR+32) #get a patch randomly somewhere between a range
yStart = random.randrange(YSR,YSR+32)
zStart = random.randrange(0,26)
patInd = patList[patStart+patIdx,1]
Xout[patIdx,:,:,:,:] = X[patInd,xStart:(xStart+patchS),yStart:(yStart+patchS),zStart:(zStart+patchS),:]
Yout[patIdx,:,:,:] = Y_train[patInd,xStart:(xStart+patchS),yStart:(yStart+patchS),zStart:(zStart+patchS)]
if((patStart+patIdx)>(patList.shape[0]-2)):
np.random.shuffle(patList) #after going through the whole list restart
patStart=0
patStart = patStart+batchS
Yout = tf.keras.utils.to_categorical (Yout, num_classes=4, dtype='float32') #convert to one hot encoding
yield Xout, Yout
Posting the workaround I've found for the future person coming here from google.
Apparently the issue lies in how keras calls a handwritten generator. When it was called multiple times in a row by using evaluate(gen, steps=N) apparently it returned wrong outputs. There's no documentation around about how to address this or how a generator should be written.
I ended up writing my code using a tf.keras.utils.sequence class and the same previous code now works perfectly. No way to know why.
Here are different factors that affect loss & accuracy:
For Accuracy, we know that it measures the accuracy of the prediction: i.e. correct-classes /total-classes.
While loss tracks the inverse-confidence of the prediction.
A high Loss indicates that although the model is performing well with the prediction, It is becoming uncertain of the prediction it is making.
For example, For an image classification scenario, The image of a cat is passed into two models. Model A predicts {cat: 0.8, dog: 0.2} and model B predicts {cat: 0.6, dog: 0.4}.
Both models will score the same accuracy, but model B will have a higher loss.
On your evaluation part, Based on the documentation
Steps: Integer or None. Total number of steps (batches of samples) before declaring the evaluation round finished. Ignored with the default value of None. If x is a tf.data dataset and steps is None, 'evaluate' will run until the dataset is exhausted. This argument is not supported by array inputs.
So for simplify, it's getting the Nth batch of your validation samples.
It could be that the model prediction is becoming uncertain since the majority of the unknown data falls on those specific steps. which in your case, steps 2 & 3.
So, As the evaluation steps progress, The prediction becomes more uncertain leading to a higher loss.
You might need to retrain your model with more training samples but of course, you need to be careful since you might encounter overfitting.
In terms of data augmentation, you might wanna check this link
In Training Perspective, proper data augmentation is one of the factors that leads to good model performance.

same train and eval dataset but get a different result

Versions:
TensorFlow: 1.8.0
TensorBoard: 1.8.0
What i did:
I'm training a model with imbalanced dataset with tf.estimator.DNNClassifier. When i did two times of training process both start from a totally new beginning(AKA, no checkpoint for each training) with the same data. I got two results which are very different from each other as shown in the following pictures.
1st-train
2nd-train
A few points to comment:
There is not difference between the two training process (no code or data changes), they both start from a new beginning.
The training dataset size is about 100M.
Both training results are from 6 epochs. (And each result cost $25 on google ml-engine.)
From the two pictures we can tell:
The 1st training learns nothing for 6 epochs.
The 2nd training learns (it got a AUC over 0.6).
Although the difference of AUC values between two trainings is only 0.1 (0.6 - 0.5), but it has big different in the meaning (a-random-guess versus a-non-random-guess).
Problems:
Why is this happen: same training data but get a totally different result?

when to stop training object detection tensorflow

I am training faster rcnn model on fruit dataset using a pretrained model provided in google api(faster_rcnn_inception_resnet_v2_atrous_coco).
I made few changes to the default configuration. (number of classes : 12 fine_tune_checkpoint: path to the pretrained checkpoint model and from_detection_checkpoint: true). Total number of annotated images I have is around 12000.
After training for 9000 steps, the results I got have an accuracy percent below 1, though I was expecting it to be atleast 50% (In evaluation nothing is getting detected as accuracy is almost 0). The loss fluctuates in between 0 and 4.
What should be the number of steps I should train it for. I read an article which says to run around 800k steps but its the number of step when you train from scratch?
FC layers of the model are changed because of the different number of the classes but it should not effect those classes which are already present in the pre-trained model like 'apple'?
Any help would be much appreciated!
You shouldn't look at your training loss to determine when to stop. Instead, you should run your model through the evaluator periodically, and stop training when the evaluation mAP stops improving.