How can I batch images with arbitrary sizes in TensorFlow?

Actually, in Caffe there seems to be a way to maintain the aspect ratio when resizing images so that, for example, the smaller dimension equals 500.
But I cannot find any way to do this in TensorFlow.
In this paper, we can see that:
We implemented our model using the Caffe library [15] and optimized it using SGD with momentum. Training on the AVA dataset’s approximately 250k training images took 2 weeks on a single Nvidia M40 GPU. Although our network can train and evaluate with images of arbitrary dimensions, very large images drastically decrease training and evaluation speed and pose memory issues due to GPU memory constraints. Therefore, in practice we resize each image such that the smaller image dimension equaled 500, while maintaining the original aspect ratio. This resulted in significant loss of resolution in some cases, but is a significantly higher resolution than is typically used for convolutional networks. We used a batch size of 128, a learning rate of 10^-3, momentum of 0.9 and weight decay of 5 · 10^-4. We reduced the learning rate after every 20k iterations. The convolutional layers were pre-trained on ImageNet [6].
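For reference, that training recipe (SGD with momentum 0.9, learning rate 10^-3 stepped down every 20k iterations, weight decay 5 · 10^-4) maps onto TensorFlow/Keras roughly as in the sketch below; the decay factor of 0.1 is an assumption, since the quoted text only says the rate was reduced.

    import tensorflow as tf

    # Sketch of the quoted recipe; the 0.1 decay factor is an assumption,
    # the paper only states that the rate was reduced every 20k iterations.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3,
        decay_steps=20_000,
        decay_rate=0.1,
        staircase=True,  # step-wise drops rather than a smooth decay
    )
    optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
    # The 5e-4 weight decay would be added via kernel_regularizer on the layers
    # (or the optimizer's weight_decay argument in newer Keras versions).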

You can simply resize the images as they describe with your favorite library; for instance, you could use scipy.misc.imresize (note that it has been removed from recent SciPy versions, where Pillow or tf.image.resize work instead).
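As a concrete example, here is a small TensorFlow sketch of the preprocessing the paper describes (scale the smaller side to 500 while keeping the aspect ratio); the function name and the dummy image are just for illustration:

    import tensorflow as tf

    def resize_smaller_side(image, target=500):
        """Resize `image` so its smaller spatial dimension equals `target`,
        keeping the original aspect ratio."""
        shape = tf.cast(tf.shape(image)[:2], tf.float32)            # (height, width)
        scale = tf.cast(target, tf.float32) / tf.reduce_min(shape)  # factor for the smaller side
        new_size = tf.cast(tf.round(shape * scale), tf.int32)       # new (height, width)
        return tf.image.resize(image, new_size)

    # A dummy 800x600 RGB image becomes roughly 667x500.
    image = tf.random.uniform([800, 600, 3])
    print(resize_smaller_side(image).shape)  # (667, 500, 3)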

Related

Why does training time not reduce when training a Keras model after increasing the batch size beyond a certain amount?

I am currently training an NLP model in Keras with TF 2.8, experimenting with GRU and LSTM layers. When I train the model, I used different batch sizes to see the impact they had on accuracy and overall training time.
What I noticed was that after increasing the batch size beyond a certain amount, the training time no longer decreases; it stays the same.
I started with a batch size of 2 and slowly increased it up to 4096 in powers of two, yet after 512 the training time remained the same.
It's often wrongly claimed that batch learning is as fast as or faster than on-line training. In fact, batch learning changes the weights once, after the complete set of data (the batch) has been presented to the network, so the weight-update frequency is rather low. This explains why the processing speed in your measurements behaves the way you observed.
Even though it is a matrix operation, each row-column multiplication might happen on one GPU core, so the full matrix multiplication is divided across as many cores as possible. For one matrix multiplication, each GPU core takes some time, and when you add more images that time increases because there are more rows. If at a batch size of 4 your GPU is already at full capacity, i.e. all cores are busy, then increasing the batch size is not going to give any advantage; the added data just sits in GPU memory and is processed once a core is free of the previous operation.
For a deeper understanding of these training techniques, have a look at the 2003 paper The general inefficiency of batch training for gradient descent learning, which compares batch and on-line learning.
Also generally, RNN kernels can have O(timesteps) complexity, with batch size having a smaller effect than you might anticipate.
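A quick way to see the saturation effect described above is to time a toy model at a few batch sizes; everything in this sketch (model, data shapes, sizes) is made up for illustration, not taken from the question:

    import time
    import numpy as np
    import tensorflow as tf

    # Toy benchmark: time one epoch of a small GRU model at several batch sizes
    # and watch where the speed-up flattens out once the GPU is saturated.
    x = np.random.rand(8192, 64, 32).astype("float32")   # (samples, timesteps, features)
    y = np.random.randint(0, 2, size=(8192, 1))

    model = tf.keras.Sequential([
        tf.keras.layers.GRU(64, input_shape=(64, 32)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    for batch_size in [32, 128, 512, 2048]:
        start = time.time()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        print(f"batch_size={batch_size}: {time.time() - start:.1f}s")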

Training SSD-MobileNet V1 and the loss does not decrease

I'm new to everything about CNNs and TensorFlow. I'm training a pretrained ssd_mobilenet_v1_pets.config to detect columns of buildings; it has been training for about one day, but the loss is between 1 and 2 and hasn't decreased for the last 10 hours.
I realized that my input images are 128x128 and SSD resizes the images to 300x300.
Does the size of the input images affect the training?
If that is the case, should I retrain the network with larger input images, or what would be another option to decrease the loss? My training dataset has 660 images and the test set has 166; I don't know if those are enough images.
I really appreciate your help.
Loss values of ssd_mobilenet can be different from faster_rcnn. From EdjeElectronics' TensorFlow Object Detection Tutorial:
For my training on the Faster-RCNN-Inception-V2 model, it started at
about 3.0 and quickly dropped below 0.8. I recommend allowing your
model to train until the loss consistently drops below 0.05, which
will take about 40,000 steps, or about 2 hours (depending on how
powerful your CPU and GPU are). Note: The loss numbers will be
different if a different model is used. MobileNet-SSD starts with a
loss of about 20, and should be trained until the loss is consistently
under 2.
For more information: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10#6-run-the-training
The SSD MobileNet architecture demands additional training to suffice the loss accuracy values of the R-CNN model, however, offers practicality, scalability, and easy accessibility on smaller devices which reveals the SSD model as a promising candidate for further assessment (Fleury and Fleury, 2018).
For more information: Fleury, D. & Fleury, A. (2018). Implementation of Regional-CNN and SSD machine learning object detection architectures for the real time analysis of blood borne pathogens in dark field microscopy. MDPI AG.
I would recommend taking 15%-20% of the images for testing, covering all the variety present in the training data. As you said, you have 660 images for training and 166 for testing, which is roughly an 80/20 split. It looks like you have enough images to start with. I know the more, the merrier, but make sure your model also has sufficient data to learn from!
Resizing the images does not contribute to the loss. It ensures consistency across all images so the model can recognize them without bias; the loss has nothing to do with image resizing as long as every image is resized identically.
You have to stop and restore from checkpoints again and again if you want your model to fit well. Usually, you can get good accuracy by re-training the SSD MobileNet until the loss is consistently under 1. Ideally we want the loss to be as low as possible, but we also want to make sure the model is not over-fitting. It is all about trial and error. (A loss between 0.5 and 1 seems to do the job well, but again it all depends on you.)
The reason I think your model is underperforming is that your test data has variety that the training data does not cover sufficiently.
The model has not seen enough training examples to learn the new variety present in the test set (for example, your test data may contain new angles of buildings that are not sufficiently represented in the training data). In that case, I recommend putting the full variety of images into the training data and then picking test images while making sure you still have sufficient training examples of the new viewpoints. That's why I recommend holding out 15%-20% as test data, as in the split sketched below.
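As a hedged sketch of that 80/20 split (the file names and per-image "variety" tags below are made up), scikit-learn's train_test_split with stratification keeps every variety represented in both sets:

    from sklearn.model_selection import train_test_split

    # Hypothetical 80/20 split of 826 annotated images, stratified on a
    # per-image "variety" tag (e.g. viewing angle) so both sets cover it.
    image_files = [f"img_{i:03d}.jpg" for i in range(826)]
    variety = ["frontal" if i % 2 == 0 else "oblique" for i in range(826)]

    train_files, test_files = train_test_split(
        image_files, test_size=0.2, random_state=42, stratify=variety)
    print(len(train_files), len(test_files))  # roughly 660 train / 166 test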

What is the computational power required for training high-resolution (4024 x 3036) images using VGG16-Net?

I am working on classification of high-resolution images using VGG16-Net in Keras.
But I am unable to use images beyond (600 x 600) resolution for training, even with batch size 1, on an NVIDIA GeForce GTX 1080 GPU;
I am facing a ResourceExhaustedError (OOM), i.e. it is unable to allocate a tensor of shape [18, 64, 600, 600].
Can anyone please suggest a solution for this?
I want to use the large images, since I am labeling the images as Good or Bad based on very small differences.
Thanks in advance!!
The whole network plus batch data need to be able to fit into VRAM. If you really do need to use high resolution images then you need to use a smaller network.
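A quick back-of-envelope calculation on the tensor from the error message shows why; keep in mind that VGG-16 holds many activation tensors of this scale at once, plus their gradients:

    # Back-of-envelope size of the failing allocation from the question:
    # one float32 activation tensor of shape [18, 64, 600, 600].
    elements = 18 * 64 * 600 * 600       # 414,720,000 values
    bytes_needed = elements * 4          # float32 = 4 bytes per value
    print(bytes_needed / 1024**3)        # ~1.5 GiB for this single tensor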
VGG-16 is old and inefficient anyway and not recommended for a new project. You could look up things like MobileNetV2 or MnasNet, but bear in mind that all of these commonly used models are generally optimized for inputs around 600x600 or much smaller. Out of interest, I have tried training CNNs on very high-resolution images just to see what would happen, and I found that, of course, they train and run painfully slowly with much reduced accuracy: if all of the features in the images are very large with respect to the convolutional filters, then the filters won't be able to pick up on them.
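As a hedged sketch of that suggestion, a lighter backbone such as MobileNetV2 can be built at a larger input size when training from scratch; the 1024x1024 resolution and the single sigmoid output for the Good/Bad label are assumptions:

    import tensorflow as tf

    # Lighter backbone at a larger input size. The 1024x1024 resolution is an
    # assumption; pick whatever fits in VRAM. weights=None (training from
    # scratch) keeps the input shape unconstrained.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(1024, 1024, 3),
        include_top=False,
        weights=None,
        pooling="avg",
    )
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(1, activation="sigmoid"),  # Good / Bad label
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()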

How to improve recall of faster rcnn object detection model

I'm retraining a Faster R-CNN Inception COCO model for detecting brands of products on a shelf.
I stopped the model at around 400k steps, when the total loss had dropped under 0.1 over a period of time. The recall was around 65% and the precision was 40% with a 95% confidence cut-off threshold.
Learning rate started at 0.00001 and configured to reduce to 0.000005 after 200k steps.
The dataset size is 15 classes with at least 100 annotated boxes for each class. Total number of images is 300.
How to improve recall of the model?
Should I change to Faster R-CNN NAS (which has a higher mAP, although I don't think precision is as important as recall in my use case)?
Another question is: what recall does an object detection model usually achieve? Is it very challenging to reach higher than 90%?
Many thanks in advance!
You could try using image augmentation to expand your training dataset. 300 images is not much. Try looking at https://github.com/aleju/imgaug.
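For example, a minimal imgaug pipeline that augments an image together with its bounding boxes might look like the sketch below; the image, box coordinates, label, and augmenter ranges are placeholders, not values from the question:

    import numpy as np
    import imgaug.augmenters as iaa
    from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

    # Placeholder image and annotation; imgaug keeps boxes in sync with
    # the geometric transforms applied to the image.
    image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a shelf photo
    boxes = BoundingBoxesOnImage(
        [BoundingBox(x1=100, y1=120, x2=220, y2=300, label="brand_a")],
        shape=image.shape)

    seq = iaa.Sequential([
        iaa.Fliplr(0.5),                               # horizontal flip half the time
        iaa.Affine(rotate=(-5, 5), scale=(0.9, 1.1)),  # small rotation and zoom
        iaa.Multiply((0.8, 1.2)),                      # brightness jitter
    ])

    image_aug, boxes_aug = seq(image=image, bounding_boxes=boxes)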
As you are asking about the Faster RCNN model, you can track two different metrics.
Precision and Recall for the Region Proposal Network (RPN).
Precision and Recall for the RCNN Final output.
The above two metrics can give us a better understanding of how the model is performing.
Case 1: When the recall of the RPN is high but low for the RCNN output, then it is clear that you don't have enough positive labels for the classification network to learn from.
Case 2: When the recall of the RPN is low but high for the RCNN output, then you might not have enough training data and too few classes.
Case 3: When both recalls are low, then try a larger dataset, as your model is already converging.
-- Experimenting with learning rate always helps.
-- Simple hack: you can use multiple aspect ratios (close to your original aspect ratios) so that you get more labels for training (not sure how well it helps in your case).
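To make the two recall numbers above concrete, here is a rough sketch of computing recall for a set of boxes (RPN proposals or final detections) against the ground truth at a 0.5 IoU threshold; all box values are made up:

    import numpy as np

    def iou(box_a, box_b):
        """IoU of two boxes given as (x1, y1, x2, y2)."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def recall(detections, ground_truth, iou_thresh=0.5):
        """Fraction of ground-truth boxes matched by at least one detection.
        Apply it to RPN proposals or to final detections to get the two
        recall numbers discussed above."""
        matched = sum(
            1 for gt in ground_truth
            if any(iou(det, gt) >= iou_thresh for det in detections))
        return matched / max(len(ground_truth), 1)

    # Toy usage with made-up boxes:
    gts  = [(10, 10, 50, 50), (100, 100, 160, 180)]
    dets = [(12, 8, 48, 52), (300, 300, 340, 360)]
    print(recall(dets, gts))  # 0.5, only the first ground-truth box is found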

Training Resnet deep neural network from scratch

I need to gain some knowledge about deep neural networks.
For a 'ResNet' very deep neural network, we can use transfer learning to train a model.
But ResNet has been trained on the ImageNet dataset, so its pre-trained weights can be used to train the model on another dataset (for example, training a model for lung cancer detection with CT lung images).
I feel that this approach will not be accurate, as the pre-trained weights have been trained entirely on other objects and not on medical data.
Instead of transfer learning, is it possible to train the ResNet from scratch (given that the number of images available to train it is only around 1500)? Is it something that is possible to do on a normal computer?
Can someone please share your valuable ideas with me?
is it possible to train the resnet from scratch?
Yes, it is possible, but the amount of time needed to get to good accuracy greatly depends on the data. For instance, training the original ResNet-50 on an NVIDIA M40 GPU took 14 days (~10^18 single-precision ops). The most expensive operation in a CNN is the convolution in the early layers.
ImageNet contains about 14M 224x224x3 images. Since your dataset is ~10,000x smaller, each epoch will take ~10,000x fewer ops. On top of that, if you pass grayscale instead of RGB images, the first convolution will take 3x fewer ops. Likewise, the spatial image size affects training time; training on smaller images also lets you increase the batch size, which usually speeds things up due to vectorization.
All in all, I estimate that a machine with a single consumer GPU, such as a 1080 or 1080 Ti, can train ~100 epochs of a ResNet-50 model in a day. Obviously, training on a 2-GPU machine would be even faster. If that is what you mean by a normal computer, the answer is yes.
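As a hedged sketch of training from scratch in Keras (the 224x224 grayscale input, the two classes, and the optimizer settings are assumptions for a small CT dataset, not part of the answer above):

    import tensorflow as tf

    # ResNet-50 from scratch. A single-channel input is allowed because no
    # ImageNet weights are loaded; sizes and hyperparameters are assumptions.
    model = tf.keras.applications.ResNet50(
        weights=None,               # random initialization, no transfer learning
        input_shape=(224, 224, 1),  # grayscale CT slices (assumed size)
        classes=2,                  # e.g. cancer / no cancer
    )
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    # model.fit(train_ds, validation_data=val_ds, epochs=100)  # tf.data datasets assumed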
But since your dataset is very small, there's a big chance of overfitting. This looks like the biggest issue that your approach faces.