understanding Image Classification ResNet accuracy - tensorflow

I have trained a ResNet50 for image classification that reaches 85% classification accuracy. My question is this: when I input an image that is far from the training set, instead of getting 0% confidence for all labels, I get 100% for one of them.
In other words, if I train my network to classify dogs and cats, it classifies images from those classes correctly with 90-99% accuracy, but if I present a bicycle it tends to output 100% confidence for one of the labels.
Is this the expected response from the network?
If so, how can I make the network report 0% confidence for every label in my training set when I present an image that is outside the training set?
Thanks
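One likely explanation, assuming the network ends in a softmax layer (the question does not say so, but it is the usual setup for a ResNet50 classifier): softmax normalizes the class scores so that they always sum to 1, so it cannot report 0% for every label. The sketch below uses made-up logits to show this. The per-class sigmoid with a threshold is only one possible alternative, not something taken from the question, and a real model would have to be trained with such an output for the threshold to mean anything.

```python
import math

def softmax(logits):
    """Normalize scores into probabilities that always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent per-class score in (0, 1); scores need not sum to 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits for [cat, dog]; the numbers are invented for illustration.
cat_logits = [6.0, -1.0]       # a clear cat
bicycle_logits = [-2.0, -9.0]  # both classes score low, one is just less low

cat_probs = softmax(cat_logits)
bicycle_probs = softmax(bicycle_logits)

# Softmax only sees the difference between logits (7.0 in both cases),
# so the bicycle gets the same near-100% "cat" confidence as the real cat.
print(cat_probs, bicycle_probs)

# With independent sigmoids, both bicycle scores stay low and can be rejected.
bicycle_sigmoid = [sigmoid(z) for z in bicycle_logits]
is_unknown = all(p < 0.5 for p in bicycle_sigmoid)
print(bicycle_sigmoid, is_unknown)
```

In practice, adding an explicit "other" class with varied negative examples is another common option; which approach works best depends on the data.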

Related

Poor classification accuracy when varying input image size with convolutional neural network (CNN)

I'm using Keras and TensorFlow to perform image classification and I obtain very high accuracy with image sets of a fixed size (where all training images have the same dimensions). However, accuracy is very poor when I let image height and width vary by specifying an input_shape of (None, None, 3). It's not that surprising that performance drops when image dimensions vary but what was surprising to me is the effect on training time. When images all have the same dimensions, each training epoch takes about 20 minutes; however, when training images are of varying sizes, each training epoch is taking less than 5 minutes (everything else including GPU and CNN architecture being the same).
Why would varying the input image size result in such a short training time? This likely is tied to why classification performance is decreasing significantly and I'd like a better understanding as to why this is occurring.

Training dataset repeatedly - Keras

I am doing an image classification task using Keras.
I used the VGG16 architecture because I thought it would be easier to work with. The task is to classify whether or not an MRI image shows a tumor.
As usual, I read all the images, resized them to the same shape (224×224×3), and normalised them by dividing by 255. Then I did a train/test split, with 25% of the data for testing and 75% for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
Then I trained the model and got a val_loss of 0.64 and a val_accuracy of 0.7261.
I saved the trained model to my Google Drive.
The next day, I followed the same procedure and loaded the saved model in order to improve its performance.
I didn't change the model architecture; I simply loaded the saved model, which had scored 0.7261 accuracy.
This time I got better performance: the val_loss was 0.58 and the val_accuracy was 0.7976.
I wondered how the accuracy got higher. Then I found that the dataset is split at random, so some of the test data from the first training run became training data in the second run. So the model had already learned those images and predicted them well in the second run.
I need to clarify: is this model truly learning the tumor patterns, or is it as if we are training and testing the model on the same dataset or the same image samples?
Thanks
When using train_test_split and validating in different sessions, always set your random seed. Otherwise, you will be using different splits, and leaking data like you stated. The model is not "learning" more, rather is being validated on data that it has already trained on. You will likely get worse real-world performance.
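A minimal sketch of why the seed matters, using only the standard library. The helper `split_indices` is hypothetical and stands in for the real split; with scikit-learn, the equivalent is passing a fixed `random_state` to `train_test_split`.

```python
import random

def split_indices(n_samples, test_fraction=0.25, seed=None):
    """Shuffle indices 0..n_samples-1 and split them into (train, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

# Two sessions with the same seed produce the identical split,
# so no session-2 test sample was ever trained on in session 1.
train_1, test_1 = split_indices(100, seed=42)
train_2, test_2 = split_indices(100, seed=42)
leaked_same_seed = set(train_1) & set(test_2)

# A second session with a different seed (or no seed) reshuffles the data,
# so part of its test set was already seen during the first training run.
train_3, test_3 = split_indices(100, seed=7)
leaked_other_seed = set(train_1) & set(test_3)

print(len(leaked_same_seed), len(leaked_other_seed))
```

Saving the test indices (or the test files themselves) alongside the model is another way to guarantee that later sessions evaluate on data the model has never trained on.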

Resnet training - L2 loss decreases while cross-entropy stays around 0.69

I am using the official TensorFlow implementation of ResNet (https://github.com/tensorflow/models/tree/master/official/resnet) to train a binary classifier on my own dataset. I slightly modified the input_fn in imagenet_main.py to do my own image loading and preprocessing. But after many rounds of parameter tuning, I can't get my model to train properly. The only parameter settings I can find let training accuracy climb to 100% while validation accuracy stays around 50% indefinitely. The implementation uses a piecewise learning rate schedule. I tried initial learning rates from 0.1 to 1e-5 and weight decay from 1e-2 to 1e-5, and found no convergence on the validation set.
A suspicious observation is that during training, the L2 loss decreases slowly and steadily while the cross-entropy is very reluctant to decrease, staying around 0.69.
Any ideas about what I can try next?
Regarding my dataset and image preprocessing: the training set is around 100K images and the validation set is around 10K. I just resize each image to 224*224 while keeping the aspect ratio, subtract 127 from each channel, and divide by 255.
@Hua, ResNet has a very large number of trainable parameters, and it was designed for ImageNet, which has 1,000 classes, whereas your dataset has only two. The number of parameters is directly related to the risk of overfitting, so the stock ResNet may not be suitable for your data. Try changing the network to reduce the number of parameters; that may help.
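As a side note on the 0.69 value in the question: it is close to ln 2 ≈ 0.693, which is the cross-entropy of a two-class classifier that assigns probability 0.5 to the true class, i.e. chance level. The short calculation below only illustrates that arithmetic; it does not diagnose the asker's training run.

```python
import math

def cross_entropy(p_true_class):
    """Cross-entropy for one example, given the probability of its true class."""
    return -math.log(p_true_class)

chance_loss = cross_entropy(0.5)      # classifier that always outputs 50/50
confident_loss = cross_entropy(0.99)  # classifier that is confidently right

print(round(chance_loss, 4), round(confident_loss, 4))
```

A cross-entropy that sits at about 0.69 therefore suggests the predicted probabilities are staying near 0.5, while the L2 term can keep shrinking simply because weight decay reduces the weight norms.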

when to stop training object detection tensorflow

I am training a Faster R-CNN model on a fruit dataset using a pretrained model provided in the Google API (faster_rcnn_inception_resnet_v2_atrous_coco).
I made a few changes to the default configuration (number of classes: 12; fine_tune_checkpoint: path to the pretrained checkpoint model; from_detection_checkpoint: true). I have around 12,000 annotated images in total.
After training for 9000 steps, the accuracy I get is below 1%, though I was expecting at least 50% (in evaluation nothing is detected, as the accuracy is almost 0). The loss fluctuates between 0 and 4.
How many steps should I train for? I read an article that says to run around 800k steps, but is that the number of steps needed when training from scratch?
The FC layers of the model are changed because of the different number of classes, but that should not affect classes that are already present in the pretrained model, like 'apple', should it?
Any help would be much appreciated!
You shouldn't look at your training loss to determine when to stop. Instead, you should run your model through the evaluator periodically, and stop training when the evaluation mAP stops improving.
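The stopping rule described above can be sketched as a simple patience check over periodic evaluation results. The function name, the `patience` and `min_delta` parameters, and the mAP values are all hypothetical; in a real run the values would come from the evaluator.

```python
def find_stopping_point(eval_maps, patience=3, min_delta=0.0):
    """Return (best_index, stop_index) for a sequence of evaluation mAP values.

    Training stops at the first evaluation where the best mAP has not
    improved by more than min_delta for `patience` evaluations in a row.
    stop_index is None if that never happens.
    """
    best_index = None
    best_map = float("-inf")
    for i, current in enumerate(eval_maps):
        if current > best_map + min_delta:
            best_index, best_map = i, current
        elif i - best_index >= patience:
            return best_index, i
    return best_index, None

# Made-up mAP values from successive evaluation runs.
maps = [0.10, 0.22, 0.31, 0.35, 0.34, 0.35, 0.33, 0.32]
best, stop = find_stopping_point(maps, patience=3)
print(best, stop)  # best checkpoint is evaluation 3; stop at evaluation 6
```

Keeping the checkpoint from `best_index` rather than the last one avoids using a model from after the metric started to degrade.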

patch-wise training and fully convolutional training in FCN

In the FCN paper, the authors discuss the patch wise training and fully convolutional training. What is the difference between these two?
Please refer to section 4.4 attached in the following.
It seems to me that the training mechanism is as follows,
Assume the original image is M*M; we then iterate over the M*M pixels to extract N*N patches (where N<M). The iteration stride can be some number like N/3 to generate overlapping patches. Moreover, assume each image yields 20 patches; then we can put these 20 patches, or 60 patches (if we want to use 3 images), into a single mini-batch for training. Is this understanding right? It seems to me that this so-called fully convolutional training is the same as patch-wise training.
The term "Fully Convolutional Training" just means replacing the fully connected layers with convolutional layers so that the whole network contains only convolutional layers (and pooling layers).
The term "Patchwise training" is intended to avoid the redundancies of full image training.
In semantic segmentation, given that you are classifying each pixel in the image, using the whole image adds a lot of redundancy to the input. A standard approach to avoid this when training segmentation networks is to feed the network batches of random patches (small image regions surrounding the objects of interest) from the training set instead of full images. This "patchwise sampling" ensures that the input has enough variance and is a valid representation of the training dataset (the mini-batch should have the same distribution as the training set). This technique also helps training converge faster and helps balance the classes. In this paper, they claim that it is not necessary to use patch-wise training, and that if you want to balance the classes you can weight or sample the loss.
From a different perspective, the problem with full image training in per-pixel segmentation is that the input image has a lot of spatial correlation. To fix this, you can either sample patches from the training set (patchwise training) or sample the loss from the whole image. That is why the subsection is called "Patchwise training is loss sampling".
As the paper puts it, "restricting the loss to a randomly sampled subset of its spatial terms excludes patches from the gradient computation." They tried this "loss sampling" by randomly ignoring cells from the last layer so that the loss is not calculated over the whole image.
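A rough sketch of the loss-sampling idea, under simplifying assumptions: the per-pixel loss map is filled with random numbers here (in a real network it would be the per-pixel loss of the last layer), and `sampled_loss` and `keep_prob` are hypothetical names, not taken from the paper.

```python
import random

def sampled_loss(per_pixel_loss, keep_prob, rng):
    """Average the loss over a random subset of spatial positions.

    Positions that are not kept contribute nothing to the loss, so they
    would also contribute nothing to the gradient.
    """
    kept = []
    for row in per_pixel_loss:
        for value in row:
            if rng.random() < keep_prob:
                kept.append(value)
    if not kept:
        return 0.0, 0
    return sum(kept) / len(kept), len(kept)

rng = random.Random(0)
height, width = 32, 32
# Stand-in for the network's per-pixel loss on one full image.
loss_map = [[rng.random() for _ in range(width)] for _ in range(height)]

full_loss = sum(sum(row) for row in loss_map) / (height * width)
sub_loss, n_kept = sampled_loss(loss_map, keep_prob=0.25, rng=rng)
print(full_loss, sub_loss, n_kept)
```

Each kept position corresponds to the patch (receptive field) behind that output cell, which is the sense in which sampling the loss over one full image resembles training on a random subset of its patches.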