What's the value of random scale / crop / brightness in image classifier - tensorflow

When we retrain the image classifier layer in Mobilenet, the retrain script allows us to specific several parameters to preprocess the input images:
random_scale
random_crop
random_brightness
I would like to know how to determine these values? I saw in some articles they set random_brightness and random_scale to 30, and random_crop to 0.
Can someone help me to understand these parameters?

Found the answer from this link: https://github.com/tensorflow/hub/blob/master/docs/tutorials/image_retraining.md
A common way of improving the results of image training is by deforming, cropping, or brightening the training inputs in random ways. This has the advantage of expanding the effective size of the training data thanks to all the possible variations of the same images, and tends to help the network learn to cope with all the distortions that will occur in real-life uses of the classifier. The biggest disadvantage of enabling these distortions in our script is that the bottleneck caching is no longer useful, since input images are never reused exactly. This means the training process takes a lot longer (many hours), so it's recommended you try this as a way of polishing your model only after you have one that you're reasonably happy with.

Related

Different results in image classifier using JPG BMP

I trained an Image Classifier with Tensforflow using a bunch of JPG images.
Let's say I have 3 classifiers, ClassifierA, ClassifierB, ClassifierC.
When testing the classifiers, I have no issues at all in 90% of the images I use as a test. But in some cases, I have misclassifications due to the image quality.
For example, the image below is the same, saved as BMP and JPG. You'll see little differences due to the format quality.
When I test the BMP version using tf.image.decode_bmp I get misclassifications, let's say ClassifierA 70%
When I test the JPG version using tf.image.decode_jpeg I get the right one, ClassifierB 90%
When I test the JPG version using tf.image.decode_jpeg and dct_method="INTEGER_ACCURATE" I get the right one with the much better result, ClassifierB 99%
What could be the issue here? Such difference between BMP and JPG, and how can I solve this if there's a solution?
update1: I retrained my Classifier using different effects and randomly changing the quality in which I save the images I use as a dataset.
Now, I get the right output, but still the percentages changes a lot, for example44% with BMP and +90% with JPG
This is a fabulous question, and even more fabulous of an observation. I'm going to use this in my own work in the future!
I expect you have just identified a rather fascinating issue with the dataset. It appears that your model is overfitting to features specific to JPG compression. The solution is to increase data augmentation. In particular, convert your training samples between various formats randomly.
This issue also makes me think that sharpening and blurring operations would make good data augmentation features. It's common to alter color, contrast, rotation, scale, orientation, and translation of the image to augmentat the training dataset, but I don't commonly see blur and sharpness used. I suspect these two data augmentation techniques will go a long way to resolving your issue by themselves.
In case the OP (or others reading this) are not terribly familiar with what "data augmentation" is, I'll define it. It is common to warp your training images in various ways to generate endlessly unique images from your (otherwise finite) dataset. For example, randomly flipping the image left/right is quite simple, common, and effectively doubles your dataset. Changing contrast and brightness settings further alter your images. Adding these and other data augmentation transformations to your pipeline creates a much richer dataset and trains a network that is more robust to these common variations in images.
It's important that the data augmentation techniques you use produce realistic variations. For example, rotating an image is quite a realistic augmentation technique. If your training image is a cat standing horizontally, it's realistically possible that a future sample might be a cat at a 25-degree angle.

Do I need every class in a training image for object detection?

I just try to dive into TensorFlows Object Detection. I have a very small training set of circa 40 images yet. Each image can have up to 3 classes. But now the question came into my mind: Does every training image need every class? Is that important for efficient training? Or is it okay if an image may only have one of the object classes?
I get a very high total loss with ~8.0 and thought this might be the reason for this but I couldn't find an answer.
In general machine learning systems can cope with some amount of noise.
An image missing labels or having the wrong labels is fine as long as overall you have sufficient data for the model to figure it out.
40 examples for image classification sounds very small. It might work if you start with a pre-trained image network and there are few classes that are very easy to distinguish.
Ignore absolute the loss value, it doesn't mean anything. Look at the curve to see that the loss is decreasing and stop the training when the curve flattens out. Compare the loss value to a test dataset to check if the values are sufficiently similar (you are not overfitting). You might be able to compare to another training of the exact same system (to check if the training is stable for example).

Convolutional Neural Networks - reasons for changing image data range

When looking at data augmentation techniques for input images in a Convolutional Neural Network, it is often mentioned that you can change/rescale the range of image values from [0,255] to [0,1].
What is the reasoning behind this?
This is scaling (part of preprocessing inputs for any network, not just CNN). Why is it done? This is done to keep the ranges of all the features in the same region. You can refer this answer for more information about the same.
But, here in your case, you only have features regarding the pixel intensities of the image. So, why do you need scaling in this case? This is because most of the parameter initialization, that is being automatically done by the framework you are using, assumes that the data being passed to it is normalized. It tends to make network converge faster, as many researchers have spent time figuring out the right initialization for the network parameters.

How to fix incorrect guess in image recognition

I'm very new to this stuff so please bear with me. I followed a quick simple video about image recognition/classification in YT and the program indeed could classify the image with a high percentage. But then I do have some other images that was incorrectly classified.
On tensorflow site: https://www.tensorflow.org/tutorials/image_retraining#distortions
However, one should generally avoid point-fixing individual errors in
the test set, since they are likely to merely reflect more general
problems in the (much larger) training set.
so here are my questions:
What would be the best way to correct the program's guess? eg. image is B but the app returned with the results "A - 70%, B - 30%"
If the answer to one would be to retrain again, how do I go about retraining the program again without deleting the previous bottlenecks files created? ie. I want the program to keep learning while retaining previous data I already trained it to recognize.
Unfortunately there is often no easy fix, because the model you are training is highly complex and very hard for a human to interpret.
However, there are techniques you can use to try and reduce your test error. First make sure your model isn't overfitting or underfitting by observing the difference between train and test errors. If either is the case then try applying standard techniques, such as choosing a deeper model and/or using more filters if underfitting or adding regularization if overfitting.
Since you say you are already classifying correctly a high percentage of the time, I would start inspecting misclassified examples directly to try and gain insight into what you might be able to improve.
If possible, try and observe what your misclassified images have in common. If you are lucky they will all fall into one or a small number of categories. Here are some examples of what you might see and possible solutions:
Problem: Dogs facing left are misclassified as cats
Solution: Try augmenting your training set with rotations
Problem: Darker images are being misclassified
Solution: Make sure you are normalizing your images properly
It is also possible that you have reached the limits of your current approach. If you still need to do better consider trying a different approach like using a pretrained network for image recognition, such as VGG.

Image classification / detection - Objects being used in real life vs. stock photo images?

When training detection models, are images that are used in real life better (i.e. higher accuracy / mAP) than images of the same object but in the form of stock photo?
The more variety the better. If you train a network on images that all have a white background and expect it to perform under conditions with noisy backgrounds you should expect the results on unseen data to perform worse because the network never had a chance to learn distinguiting features of target object vs. background objects.
If you have images with transparent backgrounds one form of data augmentation that would be expected to improve results would be to place that image against many random backgrounds. The closer you come to realistic renderings of an image the better you can expect your results to be.
The more realistic examples you can augment your training dataset with, the better. Note that it generally does not help to add random noise to your data to generate larger training datasets, it only improves results when your expanded dataset contains realistic variants of the original images in the dataset.
My motto when training neural networks is this: The network will cheat any chance it gets. It will learn impressively well, but given the opportunity, it will take shortcuts. Don't let it take shortcuts. That often translates to: Make the problem harder such that no shortcut exists for it to take. Neural networks often perform better under more difficult conditions because the simplest solution it can arrive at is also the most general purpose. Read up on multi-task learning for some exciting examples that provide great food-for-thought.