I'm running TensorFlow 2.x in Colab with its internal Keras version (tf.keras). My model is a 3D convolutional U-Net for multiclass segmentation (not sure if that's relevant).
I've successfully trained this model the traditional way (high enough accuracy on validation), but I'd like to add augmentation to improve it, so I'm switching to (hand-written) generators. When I use the generators I see my loss increasing and my accuracy decreasing a lot during the fit (e.g. the loss increases 4-fold, not by some percentage).
To try to localize the issue I've loaded my trained weights and computed the metrics on the data returned by the generators, and what happens makes no sense. I can see that the results are visually OK.
model.evaluate(validationGenerator,steps=1)
2s 2s/step - loss: 0.4037 - categorical_accuracy: 0.8716
model.evaluate(validationGenerator,steps=2)
2s/step - loss: 1.7825 - categorical_accuracy: 0.7158
model.evaluate(validationGenerator,steps=4)
7s 2s/step - loss: 1.7478 - categorical_accuracy: 0.7038
Why would the loss vary with the number of steps? I could understand some variation due to statistics... not a 4-fold increase!
If I try
import numpy as np
import tensorflow as tf

x, y = next(validationGenerator)
nSamples = x.shape[0]
meanLoss = np.zeros(nSamples)
meanAcc = np.zeros(nSamples)
for pIdx in range(nSamples):
    # predict one sample at a time and compute the metrics per sample
    y_pred = model.predict(np.expand_dims(x[pIdx, :, :, :, :], axis=0))
    meanAcc[pIdx] = np.mean(tf.keras.metrics.categorical_accuracy(np.expand_dims(y[pIdx, :, :, :, :], axis=0), y_pred))
    meanLoss[pIdx] = np.mean(tf.keras.metrics.categorical_crossentropy(np.expand_dims(y[pIdx, :, :, :, :], axis=0), y_pred))
print(np.mean(meanAcc))
print(np.mean(meanLoss))
I get accuracy ~85% and loss ~0.44, which is what I expect from the previous fit, and it varies very little from one batch to the other. These are also the exact same numbers I get if I call model.evaluate() with 1 step (using the same generator function).
However, I need about 30 steps to run through my whole training dataset. What should I do?
If I fit my already-good model to this generator it indeed worsens the performance a lot (it goes from a nice segmentation of the image to uniform predictions of 25% for each of the 4 classes!).
Any idea where to debug the issue? I've also visually inspected the images produced by the generator and the model predictions, and everything looks correct (as confirmed by the numbers I get when evaluating with a single step). I've tried writing a minimal working example with a 2-layer model, but the issue does not happen there.
UPDATE: Generators code
So, as I've been asked, this is the generator code. It's handwritten:
import random
import numpy as np
import tensorflow as tf

def dataGen(X, Y_train):
    patchS = 64   # size of the patch I extract
    batchS = 16   # number of samples per batch
    nSamples = X.shape[0]   # total number of samples
    immSize = X.shape[1:]   # shape of the image to crop
    # Get 4 patches from each image, extracted randomly and in random patient order
    patList = np.array(range(0, nSamples), dtype='int16')
    patList = patList.reshape(nSamples, 1)
    patList = np.tile(patList, (4, 2))
    patList[:nSamples, 0] = 0   # use this index to tell the code where to get the patch from
    patList[nSamples:2*nSamples, 0] = 1
    patList[2*nSamples:3*nSamples, 0] = 2
    patList[3*nSamples:4*nSamples, 0] = 3
    np.random.shuffle(patList)
    patStart = 0
    Xout = np.zeros((batchS, patchS, patchS, patchS, immSize[3]))   # allocate output vector
    while True:
        Yout = np.zeros((batchS, patchS, patchS, patchS))   # allocate vector of labels
        for patIdx in range(batchS):
            XSR = 32 * (patList[patStart+patIdx, 0] // 2)   # index of where to extract the patch from
            YSR = 32 * (patList[patStart+patIdx, 0] % 2)
            xStart = random.randrange(XSR, XSR+32)   # pick the patch start randomly within a range
            yStart = random.randrange(YSR, YSR+32)
            zStart = random.randrange(0, 26)
            patInd = patList[patStart+patIdx, 1]
            Xout[patIdx, :, :, :, :] = X[patInd, xStart:(xStart+patchS), yStart:(yStart+patchS), zStart:(zStart+patchS), :]
            Yout[patIdx, :, :, :] = Y_train[patInd, xStart:(xStart+patchS), yStart:(yStart+patchS), zStart:(zStart+patchS)]
            if (patStart+patIdx) > (patList.shape[0]-2):
                np.random.shuffle(patList)   # after going through the whole list, restart
                patStart = 0
        patStart = patStart + batchS
        Yout = tf.keras.utils.to_categorical(Yout, num_classes=4, dtype='float32')   # convert to one-hot encoding
        yield Xout, Yout
Posting the workaround I've found for the future person coming here from Google.
Apparently the issue lies in how Keras calls a handwritten generator: when it is called multiple times in a row via evaluate(gen, steps=N), it apparently returns wrong outputs. There's no documentation around on how to address this or on how such a generator should be written.
I ended up rewriting my code as a tf.keras.utils.Sequence subclass, and the same logic as before now works perfectly. No way to know why.
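For reference, a minimal sketch of what the Sequence-based version could look like (this is my own illustration, not the exact code: the quadrant-based patch selection of the original generator is simplified to uniform random crops, and names like PatchSequence are made up):

import numpy as np
import tensorflow as tf

class PatchSequence(tf.keras.utils.Sequence):
    # Serves batches of random 64^3 patches; a fresh output array is built per batch
    def __init__(self, X, Y, batch_size=16, patch_size=64, n_classes=4):
        self.X, self.Y = X, Y
        self.batch_size = batch_size
        self.patch_size = patch_size
        self.n_classes = n_classes
        self.indices = np.repeat(np.arange(X.shape[0]), 4)  # 4 patches per volume
        np.random.shuffle(self.indices)

    def __len__(self):
        return len(self.indices) // self.batch_size

    def __getitem__(self, idx):
        batch_idx = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        p = self.patch_size
        Xout = np.zeros((self.batch_size, p, p, p, self.X.shape[-1]))
        Yout = np.zeros((self.batch_size, p, p, p))
        for i, patInd in enumerate(batch_idx):
            # pick a random corner so the patch fits inside the volume
            xs = np.random.randint(0, self.X.shape[1] - p + 1)
            ys = np.random.randint(0, self.X.shape[2] - p + 1)
            zs = np.random.randint(0, self.X.shape[3] - p + 1)
            Xout[i] = self.X[patInd, xs:xs + p, ys:ys + p, zs:zs + p, :]
            Yout[i] = self.Y[patInd, xs:xs + p, ys:ys + p, zs:zs + p]
        return Xout, tf.keras.utils.to_categorical(Yout, num_classes=self.n_classes)

    def on_epoch_end(self):
        np.random.shuffle(self.indices)

It can then be passed directly to fit and evaluate, e.g. model.evaluate(PatchSequence(X_val, Y_val)).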
Here are different factors that affect loss & accuracy:
For accuracy, we know that it measures how often the prediction is right: i.e. correct predictions / total predictions.
Loss, on the other hand, tracks the inverse confidence of the prediction.
A high loss indicates that, even though the model may be predicting the right class, it is becoming uncertain about the predictions it makes.
For example, in an image classification scenario, the image of a cat is passed to two models. Model A predicts {cat: 0.8, dog: 0.2} and model B predicts {cat: 0.6, dog: 0.4}.
Both models will score the same accuracy, but model B will have a higher loss.
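A quick sketch of that cat/dog example with tf.keras (the probability values are the hypothetical ones above):

import numpy as np
import tensorflow as tf

y_true = np.array([[1.0, 0.0]])    # the image is a cat
model_a = np.array([[0.8, 0.2]])   # confident, correct prediction
model_b = np.array([[0.6, 0.4]])   # correct but less confident

print(tf.keras.metrics.categorical_accuracy(y_true, model_a).numpy())    # [1.]
print(tf.keras.metrics.categorical_accuracy(y_true, model_b).numpy())    # [1.]
print(tf.keras.losses.categorical_crossentropy(y_true, model_a).numpy()) # ~0.22
print(tf.keras.losses.categorical_crossentropy(y_true, model_b).numpy()) # ~0.51

Both predictions count as correct, but the less confident one pays a higher cross-entropy loss.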
On your evaluation part, based on the documentation:
Steps: Integer or None. Total number of steps (batches of samples) before declaring the evaluation round finished. Ignored with the default value of None. If x is a tf.data dataset and steps is None, 'evaluate' will run until the dataset is exhausted. This argument is not supported by array inputs.
So, to simplify, with steps=N it consumes N batches of your validation samples.
It could be that the model's predictions become uncertain because the majority of the unknown data falls in those specific steps, which in your case are steps 2 & 3.
So, as the evaluation steps progress, the predictions become more uncertain, leading to a higher loss.
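One way to check which batches drive the loss up is to evaluate them one at a time instead of in a single call; a minimal sketch, reusing the model and validationGenerator from the question (it assumes the model is compiled with a single accuracy metric):

import numpy as np

per_batch_loss = []
for _ in range(4):                       # same number of batches as steps=4
    xb, yb = next(validationGenerator)
    loss, acc = model.evaluate(xb, yb, verbose=0)
    per_batch_loss.append(loss)
print(per_batch_loss, np.mean(per_batch_loss))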
You might need to retrain your model with more training samples, but of course be careful, since you might then run into overfitting.
In terms of data augmentation, you might want to check this link.
From a training perspective, proper data augmentation is one of the factors that leads to good model performance.
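As a simple illustration (my own sketch, not from the question), geometric augmentation for 3D patches can be as small as random flips applied to both the image patch and its label volume before yielding them:

import numpy as np

def augment_patch(x, y):
    # x: (64, 64, 64, channels) image patch, y: (64, 64, 64) label patch
    for axis in range(3):                # the three spatial axes
        if np.random.rand() < 0.5:
            x = np.flip(x, axis=axis)
            y = np.flip(y, axis=axis)
    return x, y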
I am using TensorFlow Object Detection, with faster_rcnn_inception_v2_coco as the pretrained model. I'm on Windows 10, with tensorflow-gpu 1.6, an NVIDIA GeForce GTX 1080, CUDA 9.0 and cuDNN 7.0.
I'm trying to train a multi-class object detector on a custom dataset, but I see some weird behavior. I have 2 classes: Pistol and Knife (with respectively 876 and 664 images, all of similar size, from 360x200 to 640x360, and similar aspect ratio), so I think the dataset is balanced. I split it into a train set (1386 images: 594 knife, 792 pistol) and a test set (154 images: 70 knife, 84 pistol).
The CNN seems able to detect only one of the two objects with good accuracy, and which of the two classes it detects changes randomly during the training steps, even on the same image (example: at step 10000 it detects only pistols, at step 20000 only knives, at step 30000 knives, at step 40000 pistols, at step 50000 knives, etc.), as shown below:
Moreover, the loss looks weird, and the accuracy during evaluation is never high for both classes together.
During the training phase, the loss seems to oscillate at every training step.
Loss:
Total Loss:
From the mAP (image below) you can see that the two objects are never identified together at the same step:
If I train these two classes separately, I can achieve a good 50-60% accuracy. If I train them together, the result is what you have seen.
Here you can find the generate_tfrecord.py and the model configuration file (which I changed to make it multi-class). The label map is the following:
item {
  id: 1
  name: 'knife'
}
item {
  id: 2
  name: 'pistola'
}
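For context, the class names written into the TFRecord must map to exactly these ids; a typical sketch of that mapping inside generate_tfrecord.py (class_text_to_int is the usual helper name in that script, and the strings have to match the CSV labels):

def class_text_to_int(row_label):
    # ids must agree with the label map: 1 = knife, 2 = pistola
    if row_label == 'knife':
        return 1
    elif row_label == 'pistola':
        return 2
    else:
        return None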
Any suggestion is welcome.
UPDATES
After 600k iterations, the loss is still oscillating.
The scenario is the following: Loss, Total Loss, and mAP.
Finally, I solved my issue.
I followed the advice of @Suleiman, but at first I shuffled only test.csv and train.csv. Then I saw that inside my generate_tfrecord.py the items get reordered by filename, so that earlier shuffle was useless.
I shuffled the dataset inside generate_tfrecord.py by changing
examples = pd.read_csv(FLAGS.csv_input)
grouped = split(examples, 'filename')
for group in grouped:
    tf_example = create_tf_example(group, path)
to this:
from random import shuffle

examples = pd.read_csv(FLAGS.csv_input)
grouped = split(examples, 'filename')
shuffle(grouped)  # shuffle the list of entries so the classes are interleaved
for group in grouped:
    tf_example = create_tf_example(group, path)
adding the shuffle of the list of entries. The results improved a lot, as you can see in the plots of Loss, Total Loss and mAP:
Loss and Total Loss:
mAP:
Now there's only one peak in the loss, probably due to some faults in the dataset that I will clean up. Obviously, the evaluation and the detection are now also quite good.
SO REMEMBER: the order of the images in your TFRecord is very important (especially when the batch size is 1)!
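If it helps anyone, a quick hypothetical way to check the record order is to print the class labels of the first few serialized examples; this sketch assumes TF 1.x, a file called train.record and the standard image/object/class/text feature key written by generate_tfrecord.py:

import tensorflow as tf

# Knife and pistol records should appear interleaved, not grouped by class
for i, serialized in enumerate(tf.python_io.tf_record_iterator('train.record')):
    example = tf.train.Example.FromString(serialized)
    labels = example.features.feature['image/object/class/text'].bytes_list.value
    print(i, [l.decode() for l in labels])
    if i >= 9:
        break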
Thanks Suleiman for the hint.
I am training a Faster R-CNN model on a fruit dataset using a pretrained model provided in the Google API (faster_rcnn_inception_resnet_v2_atrous_coco).
I made a few changes to the default configuration (number of classes: 12, fine_tune_checkpoint: path to the pretrained checkpoint model, and from_detection_checkpoint: true). The total number of annotated images I have is around 12000.
After training for 9000 steps, the results I got have an accuracy below 1 percent, though I was expecting at least 50% (in evaluation nothing is getting detected, as the accuracy is almost 0). The loss fluctuates between 0 and 4.
How many steps should I train it for? I read an article which says to run around 800k steps, but is that the number of steps for training from scratch?
The FC layers of the model are changed because of the different number of classes, but that should not affect the classes which are already present in the pre-trained model, like 'apple', should it?
Any help would be much appreciated!
You shouldn't look at your training loss to determine when to stop. Instead, run your model through the evaluator periodically and stop training when the evaluation mAP stops improving.
In the sample pipeline config file of TensorFlow object detection, there is this snippet:
eval_config: {
  num_examples: 2000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
Does "num_examples" mean each evaluation run uses the same first 2000 images, or it treats the test set as a circular buffer and uses different 2000 images each time?
Actually, this means that only the same first num_examples samples of your evaluation dataset will be used in each run of evaluation.
num_examples should be equal to the number of test images you are feeding into the API.
TL;DR: a circular buffer, if num_epochs is large enough and there is no shuffle.
I believe it works in "collaboration" with the input reader config. If in eval_input_reader you set num_epochs to 1, it will process the first 2000 images from the input queue, provided shuffle = false; otherwise some random 2000 images. If you don't have 2000 images, it will probably fail, as the queue is emptied.
The relevant code is here and here
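For illustration, an eval_input_reader along those lines might look like the snippet below (field names are from the input reader proto as I remember it; the paths are placeholders):

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_TEST_RECORD"
  }
  label_map_path: "PATH_TO_LABEL_MAP"
  shuffle: false
  num_epochs: 1
  num_readers: 1
}

With num_epochs: 1 and shuffle: false, the evaluator reads the first num_examples records and then the queue is exhausted.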
I am using Keras with the TensorFlow backend. The dataset I am working with is sequence data with a Y value that is continuous between 0 and 1. The dataset is split into a training set of size 1900 and a test set of size 400. I am using the VGG19 architecture, which I built from scratch in Keras, and I am training for 30 epochs.
My question is: if I run this architecture multiple times, I get very different results; my RMSE can be anywhere between 0.15 and 0.5. Is this normal for this type of data? Is it because I am not running enough epochs? The loss from the network seems to stabilize around 0.024 at the end of the run. Any ideas?