I have been wondering whether data augmentation in YOLO is configured through the cfg file or implemented in src/image.c. I am asking with reference to darknet and issue #1408.
I am raising this after a lot of research. The answer to darknet issue #1408 makes it clear that augmentation is implemented in image.c, located in the src folder of the darknet repository. On the other hand, lines 13, 14, 15 and 16 of the yolo.cfg file define augmentation parameters: angle, saturation, exposure and hue.
I am confused. In image.c in the src directory, the augmentation functions are load_data_detection() and rotate_crop_image, but these are just functions. How are values supplied to them?
Another point I would like to raise is rotation support in YOLO. I read somewhere that YOLO does not support the angle parameter. Could someone elaborate on this point?
While training a YOLO model, the terminal output showed an image count much larger than my dataset. I was training with 400 images, but during training I got the output below:
1180: 1.007143, 1.776118 avg loss, 0.001000 rate, 3.467960 seconds, 75520 images, 3.077078 hours left
Any helpful pointers or links would be appreciated.
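A quick check on the log line above suggests that the "images" figure is a running count of images processed, not the dataset size. The sketch below is only arithmetic on the quoted line; it assumes the counter is the number of iterations multiplied by the images consumed per iteration, and the variable names are illustrative.

```python
# Log line copied from the training output above.
log_line = ("1180: 1.007143, 1.776118 avg loss, 0.001000 rate, "
            "3.467960 seconds, 75520 images, 3.077078 hours left")

# The iteration number precedes the first colon.
iteration = int(log_line.split(":")[0])

# The image counter is the comma-separated field that ends with " images".
images_seen = next(
    int(field.split()[0])
    for field in log_line.split(", ")
    if field.endswith(" images")
)

# Assumption: images_seen = iterations * images per iteration.
images_per_iteration = images_seen / iteration

dataset_size = 400  # number of training images mentioned in the question
approx_passes = images_seen / dataset_size

print(images_per_iteration)  # images consumed per iteration
print(approx_passes)         # approximate passes over the 400 images
```

The ratio works out to 64 images per iteration, which would match a batch value of 64 if that is what the cfg file sets. With 400 training images, 75520 corresponds to roughly 189 passes over the same dataset.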
I am reconstructing some images using dual photography. Next, I want to train a network to reconstruct clear images by removing noise (a denoising autoencoder).
The input for training the network is the reconstructed images, whereas the target output is the ground truth, i.e. the computer-based standard test images. The input, e.g. Lena, is not an exact version of Lena: the image is shifted in position and has some artifacts.
If I use my reconstructed image as the input and the Lena test image (the standard test image) as the training target, will it work?
I only want to know whether training would still work if the input and output are shifted relative to each other, or if some details are missing in one of them (due to cropping).
It depends on many factors, such as your training images and the architecture of the network.
However, what you want is a network that learns the noise or low-level information, and for this purpose Generative Adversarial Networks (GANs) are very popular. You can read about them here. If you have tried your approach and the results are not satisfactory, try using GANs such as DCGAN (Deep Convolutional GAN).
Also, share your outcomes with the community if you like.
Denoising Autoencoders! Love it!
There is no reason not to train your model with those images. The autoencoder, if well trained, will eventually learn the transformation if there is enough data.
However, if you have the 'positive' images, I strongly recommend that you create your own noisy images and then train in that controlled setting. You will simplify your problem and it will be easier to solve.
What is stopping you from doing just that?
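The idea of building your own noisy inputs from clean images can be sketched as below. This is a minimal illustration in plain Python, assuming a grayscale image stored as a list of rows with values in [0, 1]. The function names `add_gaussian_noise` and `shift_right`, the noise level and the toy image are all made up for the example.

```python
import random


def add_gaussian_noise(image, sigma=0.1, seed=None):
    """Return a noisy copy of `image` with values clipped to [0, 1]."""
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def shift_right(image, pixels, fill=0.0):
    """Return a copy of `image` shifted right by `pixels` columns."""
    shifted = []
    for row in image:
        k = max(0, min(pixels, len(row)))
        # Pad on the left and drop the columns pushed past the right edge.
        shifted.append([fill] * k + list(row[:len(row) - k]))
    return shifted


# Toy 8 x 8 gradient standing in for a clean test image.
clean = [[(x + y) / 14.0 for x in range(8)] for y in range(8)]

# Controlled corruption: known noise level and a known one-pixel shift.
noisy = shift_right(add_gaussian_noise(clean, sigma=0.05, seed=0), pixels=1)

# (input, target) pair for a denoising autoencoder.
training_pair = (noisy, clean)
```

Because the corruption is generated, its strength and the size of the shift are known, so they can be increased gradually to see where training starts to struggle.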
I'm currently working on a university project where we use Python, TensorFlow and Keras to train an image object detector that detects different parts of the root system of Arabidopsis.
Our current results are pretty bad, as we only have about 100 images to train the model with at the moment, but we are cultivating more plants in order to get more images (more data) to train the TensorFlow model.
We have implemented the following Mask_RCNN model: Github - Mask_RCNN tensorflow
We are looking to detect three object classes: stem, main root and secondary root.
But the model incorrectly detects main roots where the secondary roots are located.
It should be able to detect something like this: Root detection example
The training root data set that we are using right now: training images
What is the usual sample size needed to train a neural network to give accurate results?
First off: I think there is no simple rule to estimate the sample size, but it depends at least on:
1. Quality of your images
I downloaded the images, and I think you need to preprocess them before you can use them, to reduce the "problem complexity". In some projects where I worked with biological data, background removal (image minus low-pass filter) was the key to getting better results. You should definitely remove/crop the area outside your region of interest (like the tape and the ruler). I would try to get the cleanest data set possible (including manual adjustments with cv2, GIMP, etc.) to focus the network on solving "the right problem". After that you could apply some random distortion so that it also works on fuzzy/bad/realistic images.
2. The way you work with your data
There are a few tricks that enable you to "expand" your dataset.
Sometimes it's very helpful to let a generator method crop random small patches from your input data. This allows you to work with more batches (on small GPUs) and gives your network more "variety" (just think about the conv2d task: if you don't use random cropping, your filters will slide over the same areas of the same image over and over again). For the same reason, apply random distortion, and flip and rotate your images.
3. Network architecture
In your case I would prefer a U-Net architecture with a final conv2d output of 3 feature maps (your classes), a final softmax activation and a categorical_crossentropy loss. This lets you play with the depth: sometimes you need sophisticated architectures to solve a problem (close to 100%), but in your case you just want to see a first working result, so fewer layers and a simple architecture could also help you get things working. Maybe there are trained network weights for a U-Net that meets your requirements (search on Kaggle, for example). It is also helpful (to reduce the data you need) to use "transfer learning": use the first layers (weights) of a network that is already trained. With semantic segmentation, the first filters will become something like edge detectors for most given problems/images.
4. Your mental model of "accurate results"
This is the hardest part, because it evolves during your project. E.g. the moment your network starts to perform well on preprocessed input images, you will start to think about architecture/data changes to make it work on fuzzy images as well. This is why you should start with a feasible problem but always improve your dataset (including rare kinds of roots) and tune your network architecture step by step.
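The random cropping and flipping from point 2 can be sketched with a small generator. This is a plain-Python illustration, not tied to any framework: the image is assumed to be a list of rows, and the names `random_patch`, `random_flip` and `patch_generator` are hypothetical.

```python
import random


def random_patch(image, patch_h, patch_w, rng):
    """Cut one patch of size patch_h x patch_w at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - patch_h)    # randint is inclusive
    left = rng.randint(0, width - patch_w)
    return [row[left:left + patch_w] for row in image[top:top + patch_h]]


def random_flip(patch, rng):
    """Mirror the patch horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in patch]
    return patch


def patch_generator(image, patch_h, patch_w, count, seed=None):
    """Yield `count` randomly cropped and flipped patches from one image."""
    rng = random.Random(seed)
    for _ in range(count):
        yield random_flip(random_patch(image, patch_h, patch_w, rng), rng)


# Toy 5 x 6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * y + x for x in range(6)] for y in range(5)]
patches = list(patch_generator(image, patch_h=3, patch_w=4, count=8, seed=1))
```

For segmentation, the same crop position and flip would have to be applied to the label mask as well, so that every patch stays aligned with its ground truth.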
I am just starting to dive into TensorFlow's Object Detection. I have a very small training set of about 40 images so far. Each image can contain up to 3 classes. But now a question came to mind: does every training image need to contain every class? Is that important for efficient training? Or is it okay if an image has only one of the object classes?
I get a very high total loss of ~8.0 and thought this might be the reason, but I couldn't find an answer.
In general machine learning systems can cope with some amount of noise.
An image missing labels or having the wrong labels is fine as long as overall you have sufficient data for the model to figure it out.
40 examples for image classification sounds very small. It might work if you start with a pre-trained image network and there are few classes that are very easy to distinguish.
Ignore the absolute loss value; it doesn't mean anything on its own. Look at the curve to see that the loss is decreasing, and stop the training when the curve flattens out. Compare the training loss with the loss on a test dataset to check that the values are sufficiently similar (i.e. you are not overfitting). You might also be able to compare against another training run of the exact same system (to check whether the training is stable, for example).
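One way to make "the curve flattens out" concrete is to compare the average loss over the most recent steps with the average over the steps just before. The sketch below is one possible heuristic, not a standard routine; the function names, window size, tolerance and loss values are all illustrative.

```python
def has_flattened(losses, window=5, tolerance=0.01):
    """True when the mean of the last `window` losses improved by less than
    `tolerance` (relative) over the mean of the `window` losses before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (previous - recent) < tolerance * abs(previous)


def generalization_gap(train_loss, test_loss):
    """Relative difference between test loss and training loss."""
    return (test_loss - train_loss) / train_loss


# Made-up loss curves: one still decreasing, one that has levelled off.
still_falling = [8.0, 7.0, 6.0, 5.0, 4.0, 3.5, 3.0, 2.6, 2.3, 2.0]
levelled_off = [8.0, 5.0, 3.0, 2.0, 1.5,
                1.21, 1.20, 1.20, 1.19, 1.20,
                1.20, 1.19, 1.20, 1.20, 1.20]

print(has_flattened(still_falling))
print(has_flattened(levelled_off))
print(generalization_gap(train_loss=1.2, test_loss=1.5))
```

A large positive gap between test and training loss is the usual sign of overfitting; how large is too large depends on the task, so the threshold is left to the reader.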
I am using a tensorflow framework and I have noticed that there are major variances in the size of the tensorflow model files.
For example the framework provides 2 models:
one pretrained model, to be used with fine-tuning for example
and one which contains an untrained version.
They both have a size of 172.539 kb
When I apply fine tuning in my model with some minor changes in the graph (there is a module in framework for that) and save my model the size remains essentially the same: 178.525 kb.
First, I am a bit surprised that my fine-tuned model is somewhat bigger: I only changed the last layer from 21 to 14 classes, so I would expect a somewhat smaller model file. Since the difference is so small, I didn't pay much attention to it.
But when I trained the same model starting from the same model file (the pretrained one, I mean) and saved the model to disk, the file size was quite different: 340.097 kb. By "train" I mean that I allow the network to modify all parameters, not just the parameters of the last layer.
The model being implemented is a variation of ResNet for semantic image segmentation (in case someone can deduce the expected model file size from the model itself).
So, my questions are why I have such a variance in the model file sizes and how come my saved fine-tuned model is larger than the original model? Is there a way to include/exclude parameters in the model to be saved?
P.S.1 Some information that might be handy:
I am using tensorflow v2 model saving while I think the framework files use v1. I am not sure how to identify this besides the fact that the former produces 3 files.
The framework is called tensorflow-deeplab-resnet and can be found here and the models are here.
I am not sure Stack Overflow is the right place for this question either.
That is because, when training models and saving them, Tensorflow will also save the gradients of your ops.
So allowing training on the last layer will increase the size of your saved model a little. And allowing training on the whole model will essentially double the size of the save file because each op will have its gradients saved.
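The sizes quoted in the question can be checked against this with simple arithmetic. The sketch assumes the figures are kilobytes written with "." as a thousands separator, which is a guess about the notation; the dictionary keys are just labels for the three files.

```python
# File sizes quoted in the question, read as kilobytes with "." used as a
# thousands separator (an assumption about the notation).
sizes_kb = {
    "pretrained": 172_539,
    "fine_tuned_last_layer": 178_525,
    "trained_all_layers": 340_097,
}

# Express every file size relative to the pretrained model.
base = sizes_kb["pretrained"]
ratios = {name: round(size / base, 2) for name, size in sizes_kb.items()}

print(ratios)
```

The fully trained file comes out at about 1.97 times the pretrained one, and the fine-tuned file at about 1.03 times, which matches the "a little" and "essentially double" description above.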
I'm attempting to train a Faster R-CNN model for small digit detection. I'm using the newly released TensorFlow Object Detection API and so far have been fine-tuning a pre-trained faster_rcnn_resnet101_coco from the zoo. All my training attempts have resulted in models with high precision but low recall. Out of the ~120 objects (digits) in each image, only ~20 are ever detected, but when they are detected the classification is accurate. (Also, I am able to train a simple convnet from scratch on my cropped images with high accuracy, so the problem is in the detection aspect of the model.) Each digit is on average 60x30 in the original images (and probably about half that size after the image is resized before being fed into the model). Here is an example image with detected boxes of what I'm seeing:
What is odd to me is how it is able to correctly detect neighboring digits but completely miss the rest that are very similar in terms of pixel dimensions.
I have tried adjusting the hyperparameters around anchor box generation and first_stage_max_proposals, but nothing has improved the results so far. Here is an example config file I have used. What other hyperparameters should I try adjusting? Any other suggestions on how to diagnose the problem? Should I be looking into other architectures, or does my task look doable with Faster R-CNN and/or SSD?
In the end, the immediate problem was that I was not using the visualizer correctly. By updating the parameters for visualize_boxes_and_labels_on_image_array as described by Johnathan in the comments, I was able to see that I am at least detecting more boxes than I had thought.
I checked your config file: you are decreasing the resolution of your images to 1024. The region of each digit will not contain many pixels, and you are losing some information. What I suggest is to train the model on another dataset with smaller images. You can, for example, crop each image into four areas.
If you have a good GPU, increase the max dimension in the image_resizer, though I guess you will run out of memory.
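Cropping each image into four areas also means moving the ground-truth boxes into the coordinates of each crop. A minimal sketch of that bookkeeping follows; the function names, the 2048 x 1536 image size and the two example boxes are hypothetical, and boxes are assumed to be (left, top, right, bottom) in pixels.

```python
def quadrant_boxes(width, height):
    """Return four (left, top, right, bottom) crop boxes covering the image."""
    mid_x, mid_y = width // 2, height // 2
    return [
        (0, 0, mid_x, mid_y),            # top-left
        (mid_x, 0, width, mid_y),        # top-right
        (0, mid_y, mid_x, height),       # bottom-left
        (mid_x, mid_y, width, height),   # bottom-right
    ]


def boxes_in_crop(boxes, crop):
    """Keep boxes fully inside `crop` and shift them to crop coordinates."""
    left, top, right, bottom = crop
    kept = []
    for x0, y0, x1, y1 in boxes:
        if x0 >= left and y0 >= top and x1 <= right and y1 <= bottom:
            kept.append((x0 - left, y0 - top, x1 - left, y1 - top))
    return kept


# Hypothetical 2048 x 1536 image with two 60 x 30 digit boxes.
crops = quadrant_boxes(2048, 1536)
digits = [(100, 200, 160, 230), (1500, 1000, 1560, 1030)]
per_crop = [boxes_in_crop(digits, crop) for crop in crops]
```

Boxes that straddle a crop boundary are simply dropped in this sketch. Using overlapping crops instead of a strict four-way split would be one way to avoid losing those digits.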