I trained an image classifier with TensorFlow using a bunch of JPG images.
Let's say I have 3 classifiers, ClassifierA, ClassifierB, ClassifierC.
When testing the classifiers, I have no issues at all in 90% of the images I use as a test. But in some cases, I have misclassifications due to the image quality.
For example, the image below is the same, saved as BMP and JPG. You'll see little differences due to the format quality.
When I test the BMP version using tf.image.decode_bmp I get misclassifications, let's say ClassifierA 70%
When I test the JPG version using tf.image.decode_jpeg I get the right one, ClassifierB 90%
When I test the JPG version using tf.image.decode_jpeg and dct_method="INTEGER_ACCURATE" I get the right one with the much better result, ClassifierB 99%
What could be the issue here? Why is there such a difference between BMP and JPG, and how can I solve it, if there is a solution?
update1: I retrained my Classifier using different effects and randomly changing the quality in which I save the images I use as a dataset.
Now I get the right output, but the percentages still change a lot, for example 44% with BMP and over 90% with JPG.
This is a fabulous question, and even more fabulous of an observation. I'm going to use this in my own work in the future!
I expect you have just identified a rather fascinating issue with the dataset. It appears that your model is overfitting to features specific to JPG compression. The solution is to increase data augmentation. In particular, convert your training samples between various formats randomly.
This issue also makes me think that sharpening and blurring operations would make good data augmentation features. It's common to alter color, contrast, rotation, scale, orientation, and translation of the image to augment the training dataset, but I don't commonly see blur and sharpness used. I suspect these two data augmentation techniques will go a long way to resolving your issue by themselves.
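To make the blur and sharpen idea concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows with values in 0 to 255, and the function names (box_blur, sharpen, random_blur_or_sharpen) are my own, not part of TensorFlow. In a real pipeline you would use your image library's filters instead.

```python
import random


def box_blur(img):
    """3x3 box blur on a grayscale image (list of rows); edge pixels are clamped."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def sharpen(img, amount=1.0):
    """Unsharp mask: original + amount * (original - blurred), clipped to [0, 255]."""
    blurred = box_blur(img)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(img, blurred)
    ]


def random_blur_or_sharpen(img, rng=random):
    """Apply blur, sharpen, or no change, each with equal probability."""
    choice = rng.choice(["blur", "sharpen", "none"])
    if choice == "blur":
        return box_blur(img)
    if choice == "sharpen":
        return sharpen(img)
    return [list(row) for row in img]
```

Applying random_blur_or_sharpen to each training image before it is fed to the network gives the model both softer and crisper versions of the same content, which is roughly the kind of difference a lossy format introduces.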
In case the OP (or anyone else reading this) is not terribly familiar with what "data augmentation" is, I'll define it. It is common to warp your training images in various ways to generate endlessly unique images from your (otherwise finite) dataset. For example, randomly flipping the image left/right is quite simple, common, and effectively doubles your dataset. Changing contrast and brightness settings further alters your images. Adding these and other data augmentation transformations to your pipeline creates a much richer dataset and trains a network that is more robust to these common variations in images.
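As a rough illustration of the flip and contrast transformations just described, here is a small sketch in plain Python. The image is assumed to be a grayscale list of rows with values in 0 to 255; the function names and the 0.8 to 1.2 contrast range are arbitrary choices of mine, not recommended settings.

```python
import random


def flip_left_right(img):
    """Mirror each row of the image."""
    return [row[::-1] for row in img]


def adjust_contrast(img, factor):
    """Scale each pixel's distance from the image mean by factor, clipped to [0, 255]."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [
        [min(255.0, max(0.0, mean + factor * (p - mean))) for p in row]
        for row in img
    ]


def augment(img, rng):
    """Flip with probability 0.5, then apply a random contrast factor."""
    if rng.random() < 0.5:
        img = flip_left_right(img)
    factor = rng.uniform(0.8, 1.2)  # assumed range, tune for your data
    return adjust_contrast(img, factor)
```

Calling augment on the same image in every epoch produces a slightly different sample each time, which is where the "endlessly unique images" come from.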
It's important that the data augmentation techniques you use produce realistic variations. For example, rotating an image is quite a realistic augmentation technique. If your training image is a cat standing horizontally, it's realistically possible that a future sample might be a cat at a 25-degree angle.
Related
I am trying to build a CNN model using TensorFlow on my own dataset. The problem is that my pictures come in many different sizes. There is one kind of object in my pictures. If I make all the pictures the same size, the objects in them are not the same size. How can I fix this so that I can run a CNN model with TensorFlow? I have heard that, regardless of differing input sizes, using tf.reduce_max or tf.reduce_mean is the best solution. If that is true, how do I use it in my CNN model?
If I make all the pictures the same size, the objects in them are not the same size.
If you already know how to make your input images the same size, you are ready to train your CNN model. Unless you have a strict requirement that the object appear at the same size in every picture, it does not matter to the network.
The usual approach is to resize the images to a fixed size that is accepted by the network as input. This means distorting the aspect ratio of objects.
If that bothers you, you could try padding the images to a square (supposing the network input is a square) and then resizing. This will keep the aspect ratio, but add some extra information (the padding).
Another option is to crop the image to a square, if you are confident you are not losing important information and your task allows it.
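The geometry of the pad and crop options can be sketched as below. This only computes the padding amounts and the crop box; the actual pixel operations are left to whatever image library you use. Both function names are hypothetical.

```python
def pad_to_square(width, height):
    """Return (left, top, right, bottom) padding that makes the image square."""
    side = max(width, height)
    pad_w = side - width
    pad_h = side - height
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

For a 4000x3000 image, padding adds 500 rows above and below, while cropping removes 500 columns from each side. After either step the image is square and can be resized to the network input without changing the aspect ratio.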
I am trying to use the TensorFlow For Poets Google CodeLab as a template for an image classification project.
I use tens (maybe hundreds) of thousands of images with varying (relatively high) resolutions for retraining, but they are taking up too much disk space (over 10 GB) and I would like to downscale them to save some space.
As far as I understand, image resolution is not much of a concern here and it should not be an issue to scale down all the images (from roughly 4000x3000 to something much smaller).
I tried using 224x224 resolution and everything worked fine, but then I noticed some existing SO questions mentioning that the input images are being scaled to 299x299 rather than 224x224.
This made me wonder: What is the optimal input image resolution when using the code from the said CodeLab to make sure the images take up as little space as possible without making any sacrifice to the performance of the retrained model?
Was I sabotaging the process by overly downscaling the images? I use the mobilenet_v1_0.50_224 model, which is why I thought using 224x224 images for retraining would be the best way to go.
Given all my images have a high enough resolution, would I benefit from modifying the scripts to accept a larger image size?
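For the disk space side of this question, the arithmetic can be sketched as follows. It assumes the trailing number in a name like mobilenet_v1_0.50_224 is the input resolution the network sees, and it shrinks each image so that its shorter side matches a chosen target while keeping the aspect ratio. Both helper names are hypothetical, and whether to keep some margin above the network input (for example for random crops) is a judgment call this sketch does not settle.

```python
def input_size_from_name(architecture):
    """Read the trailing integer of a name such as 'mobilenet_v1_0.50_224'."""
    return int(architecture.rsplit("_", 1)[-1])


def downscaled_size(width, height, target_short_side):
    """Shrink so the shorter side equals target_short_side, keeping aspect ratio."""
    scale = target_short_side / min(width, height)
    if scale >= 1.0:
        return width, height  # never upscale a small image
    return round(width * scale), round(height * scale)
```

With these assumptions, a 4000x3000 source image and a 224 target come out at 299x224.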
When we retrain the image classifier layer in MobileNet, the retrain script allows us to specify several parameters to preprocess the input images:
random_scale
random_crop
random_brightness
I would like to know how to determine these values. I saw that some articles set random_brightness and random_scale to 30, and random_crop to 0.
Can someone help me to understand these parameters?
Found the answer from this link: https://github.com/tensorflow/hub/blob/master/docs/tutorials/image_retraining.md
A common way of improving the results of image training is by deforming, cropping, or brightening the training inputs in random ways. This has the advantage of expanding the effective size of the training data thanks to all the possible variations of the same images, and tends to help the network learn to cope with all the distortions that will occur in real-life uses of the classifier. The biggest disadvantage of enabling these distortions in our script is that the bottleneck caching is no longer useful, since input images are never reused exactly. This means the training process takes a lot longer (many hours), so it's recommended you try this as a way of polishing your model only after you have one that you're reasonably happy with.
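As far as I can tell from reading the retrain script, the three flags are percentages: random_scale and random_crop enlarge the image by up to that percentage before it is cropped back to the network input size, and random_brightness multiplies pixel values by a factor within plus or minus that percentage. The sketch below reproduces that arithmetic in plain Python under that assumption. The function name is mine, and the script itself should be treated as the authority.

```python
import random


def sample_distortion(input_size, random_crop, random_scale, random_brightness, rng):
    """Sample one pre-crop size and one brightness multiplier from percentage flags."""
    margin_scale = 1.0 + random_crop / 100.0        # fixed margin around the crop
    max_resize_scale = 1.0 + random_scale / 100.0   # upper bound of the random enlargement
    resize_scale = rng.uniform(1.0, max_resize_scale)
    precrop_size = int(round(input_size * margin_scale * resize_scale))
    brightness = rng.uniform(
        1.0 - random_brightness / 100.0,
        1.0 + random_brightness / 100.0,
    )
    return precrop_size, brightness
```

With random_scale=30, random_brightness=30 and random_crop=0 on a 224 input, the image is resized to somewhere between 224 and 291 pixels before a 224 crop is taken, and its pixel values are multiplied by a factor between 0.7 and 1.3. Setting all three to 0 disables the distortions.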
I'm very new to this stuff, so please bear with me. I followed a quick, simple video about image recognition/classification on YouTube, and the program could indeed classify the image with a high percentage. But I have some other images that were incorrectly classified.
On the TensorFlow site: https://www.tensorflow.org/tutorials/image_retraining#distortions
However, one should generally avoid point-fixing individual errors in
the test set, since they are likely to merely reflect more general
problems in the (much larger) training set.
So here are my questions:
1. What would be the best way to correct the program's guess? E.g. the image is B but the app returned the results "A - 70%, B - 30%".
2. If the answer to question 1 is to retrain, how do I go about retraining the program without deleting the bottleneck files created previously? I.e. I want the program to keep learning while retaining the data I have already trained it to recognize.
Unfortunately there is often no easy fix, because the model you are training is highly complex and very hard for a human to interpret.
However, there are techniques you can use to try and reduce your test error. First make sure your model isn't overfitting or underfitting by observing the difference between train and test errors. If either is the case then try applying standard techniques, such as choosing a deeper model and/or using more filters if underfitting or adding regularization if overfitting.
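A rough way to turn the train/test comparison into a first check is sketched below. The thresholds (10% error counted as "high", a 5-point gap counted as "large") are arbitrary assumptions for illustration, and the function name is hypothetical.

```python
def diagnose_fit(train_error, test_error, high_error=0.10, max_gap=0.05):
    """Classify a model as underfitting, overfitting, or ok from its error rates."""
    if train_error > high_error:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if test_error - train_error > max_gap:
        # The model fits the training data much better than unseen data.
        return "overfitting"
    return "ok"
```

For example, diagnose_fit(0.01, 0.20) reports overfitting, which points toward regularization or more augmentation, while diagnose_fit(0.30, 0.32) reports underfitting, which points toward a larger model.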
Since you say you are already classifying correctly a high percentage of the time, I would start inspecting misclassified examples directly to try and gain insight into what you might be able to improve.
If possible, try and observe what your misclassified images have in common. If you are lucky they will all fall into one or a small number of categories. Here are some examples of what you might see and possible solutions:
Problem: Dogs facing left are misclassified as cats
Solution: Try augmenting your training set with horizontal flips and rotations
Problem: Darker images are being misclassified
Solution: Make sure you are normalizing your images properly
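To look for patterns like the ones above, it helps to tally the mistakes. This sketch assumes you have, for each test image, the true label, the predicted label, and a hand-assigned tag describing the image (such as "facing_left" or "dark"). The record layout and the function name are my own invention.

```python
from collections import Counter


def misclassification_summary(records):
    """records: iterable of (true_label, predicted_label, tag) tuples.

    Returns the most common (true, predicted) confusions and the most common
    tags among misclassified images, both sorted by count.
    """
    confusions = Counter()
    tags = Counter()
    for true_label, predicted, tag in records:
        if true_label != predicted:
            confusions[(true_label, predicted)] += 1
            tags[tag] += 1
    return confusions.most_common(), tags.most_common()
```

If one tag dominates the second list, that is a strong hint about which augmentation or preprocessing step to try first.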
It is also possible that you have reached the limits of your current approach. If you still need to do better, consider trying a different approach, like using a pretrained network for image recognition, such as VGG.
When training detection models, do images taken in real-life conditions give better results (i.e. higher accuracy / mAP) than stock photos of the same object?
The more variety, the better. If you train a network on images that all have a white background and expect it to perform under conditions with noisy backgrounds, you should expect worse results on unseen data, because the network never had a chance to learn the features that distinguish the target object from background objects.
If you have images with transparent backgrounds, one form of data augmentation that would be expected to improve results is to place each image against many random backgrounds. The closer you come to realistic renderings of an image, the better you can expect your results to be.
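The background swap can be sketched with plain alpha blending. This assumes the foreground is a grid of (r, g, b, a) tuples and each background is a grid of (r, g, b) tuples of the same size, all with values in 0 to 255. The function names are hypothetical, and a real pipeline would use an image library for this.

```python
import random


def composite_over(foreground, background):
    """Alpha-blend RGBA foreground pixels over RGB background pixels of equal size."""
    out = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for (r, g, b, a), bg_pixel in zip(fg_row, bg_row):
            alpha = a / 255.0
            row.append(tuple(
                round(alpha * f + (1.0 - alpha) * k)
                for f, k in zip((r, g, b), bg_pixel)
            ))
        out.append(row)
    return out


def random_background_variants(foreground, backgrounds, count, rng):
    """Create count training variants, each over a randomly chosen background."""
    return [composite_over(foreground, rng.choice(backgrounds)) for _ in range(count)]
```

Fully opaque pixels keep the object, fully transparent pixels show the background, and the same object ends up in front of many different scenes.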
The more realistic examples you can augment your training dataset with, the better. Note that it generally does not help to add random noise to your data to generate larger training datasets; results improve only when your expanded dataset contains realistic variants of the original images.
My motto when training neural networks is this: The network will cheat any chance it gets. It will learn impressively well, but given the opportunity, it will take shortcuts. Don't let it take shortcuts. That often translates to: Make the problem harder, such that no shortcut exists for it to take. Neural networks often perform better under more difficult conditions because the simplest solution they can arrive at is also the most general-purpose. Read up on multi-task learning for some exciting examples that provide great food for thought.