I started with the TF Object Detection API two weeks ago and managed to train a model to recognize a custom object, in my case a Mecanum wheel.
Here's the details:
No. of training images = 125
All training images are around 500 x 500 pixels (plus or minus)
Transfer Learning
Model used = ssd_mobilenet_v1_coco
batch size = 2
total steps ran = 12715
loss is around 0.5 - 2.5; sometimes it fluctuates to more than 10, and I am not sure why
Here's the result:
The first image is encouraging.
The second image starts to disappoint me a little. I expected the model to detect FOUR Mecanum wheels (four boxes). Why doesn't it?
Then I suspected that there was something wrong with my trained model. I tried the sample test images (the third and fourth images), and now I am sure this is not at all the model I was aiming for.
I have been reading this post, whose problem I think is quite similar to mine (and the author managed to solve it). He mentioned that the input image needs to be less than 600 x 1024, so I tried that with the fifth image, and unsurprisingly the result is again disappointing.
I went through the tutorial series by sentdex, and in the comment sections I noticed that many people face this problem too. So, what should I do now?
125 images? You will not be able to get very good results with so few images. If you want to validate that this is indeed the problem, try training on subsets of your original 125 images.
For example, how bad is the output when you train on 10 images?
Does it get better when you use 50 images?
Does it get better yet when you use 125 images?
If the accuracy improves with increasing dataset size, you can extrapolate and guess that with 1000 images you will do even better. I would guess that dataset size is your problem.
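If it helps, here is a minimal sketch of that subset experiment. `list_images` and `train_and_evaluate` are hypothetical placeholders for your own data loading and TF Object Detection training/evaluation calls:

```python
import random

def list_images():
    # Stand-in for collecting your 125 labelled image files.
    return ["img_%03d.jpg" % i for i in range(125)]

def train_and_evaluate(images):
    # Placeholder: hook up your TF Object Detection training/eval here,
    # restarting from the same pre-trained checkpoint each run.
    raise NotImplementedError

random.seed(0)  # fixed seed so the subsets are reproducible
all_images = list_images()
for n in (10, 50, 125):
    subset = random.sample(all_images, n)
    train_and_evaluate(subset)
```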
I have a dataset of many images, where each sample has 5 magnifications (x10, x20, x30, x40, x50) of the same class. They are not sequence data; all images are in RGB mode with size 512x512. I want to give these 5 images as one input to the CNN, and I don't know how.
There is also another problem: once the model has been trained on the 5-image pipeline, will it still work when I have only one image (one magnification, x10 for example)?
You have asked two questions.
For the first question, one straightforward way is to design the model so that the input size is 5×512×512×3, and then train the model.
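As a rough illustration of that option, here is a minimal Keras sketch where the five magnifications are stacked into a single input of shape 5×512×512×3. The layer choices (TimeDistributed convolutions, a sigmoid output) are my own assumptions, not part of the original answer:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(5, 512, 512, 3))
# Apply the same 2D convolution to each of the five magnifications.
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(16, 3, activation="relu"))(inputs)
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.GlobalAveragePooling2D())(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # e.g. binary label
model = tf.keras.Model(inputs, outputs)
model.summary()
```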
For your second question, you need to design your model to handle absent or missing features. One more elaborate design I can think of is the following:
You have 5 inputs, one per image; each image goes through one or more convolutional layers, and after one or a few layers you merge the branches together.
For each input, add an additional feature: a boolean indicating whether the current image is present or absent. During training, feed combinations of all 5 images, and also combinations with some absent, so that your model learns to handle the absence of one or more of the 5 input images.
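A minimal Keras sketch of that design, with made-up layer sizes (the presence flag zeroes out a branch's features when its image is absent):

```python
import tensorflow as tf

def branch(img):
    # One small convolutional branch per magnification; sizes are arbitrary.
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(img)
    return tf.keras.layers.GlobalAveragePooling2D()(x)

image_inputs = [tf.keras.Input(shape=(512, 512, 3), name="mag_%d" % i)
                for i in range(5)]
presence_inputs = [tf.keras.Input(shape=(1,), name="present_%d" % i)
                   for i in range(5)]

features = []
for img, present in zip(image_inputs, presence_inputs):
    f = branch(img)
    # Zero out the branch's features when the image is absent (flag is 0 or 1).
    features.append(tf.keras.layers.Lambda(lambda t: t[0] * t[1])([f, present]))

merged = tf.keras.layers.Concatenate()(features + presence_inputs)
x = tf.keras.layers.Dense(64, activation="relu")(merged)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(image_inputs + presence_inputs, outputs)
```

At training time you would then feed batches with all flags set to 1 as well as batches where some flags are 0 (with the corresponding images zeroed out), so the model sees both cases.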
I hope I was clear enough and it helps.
Good luck.
I've followed an object detection tutorial from pythonprogramming.net to recognize a small robot (my custom object) based on the ssd_mobilenet_v1_coco model.
I have about 450 labelled images of my robot.
I used the official sample config for ssd_mobilenet_v1_coco and only made the necessary changes, like num_classes = 1, reduced the batch size to 7, and trained until I had a loss that was consistently between 1 and 2 (about 10,000 training steps).
The problem is that the model detects everything it used to recognize in its pre-trained state as my small robot. It identifies everything as a robot even when it isn't one.
I faced this issue before and fixed it by adding images containing the pre-trained objects as negative examples. Another way to fix it is to train longer. If you do both, I think that will fix the problem. Also try increasing your dataset, by the way (I was training with 6,000 images).
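In the TF Object Detection API, a negative example is simply a TFRecord entry whose object lists are empty. A hedged sketch of building one, using the API's standard field names (the helper names are my own):

```python
import tensorflow as tf

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def make_negative_example(encoded_jpeg, height, width, filename):
    # All object lists are empty: this image contains no target objects.
    feature = {
        "image/encoded": bytes_feature([encoded_jpeg]),
        "image/format": bytes_feature([b"jpeg"]),
        "image/filename": bytes_feature([filename.encode()]),
        "image/height": int64_feature([height]),
        "image/width": int64_feature([width]),
        "image/object/bbox/xmin": float_feature([]),
        "image/object/bbox/xmax": float_feature([]),
        "image/object/bbox/ymin": float_feature([]),
        "image/object/bbox/ymax": float_feature([]),
        "image/object/class/text": bytes_feature([]),
        "image/object/class/label": int64_feature([]),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```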
I am just trying to dive into TensorFlow's Object Detection. So far I have a very small training set of about 40 images. Each image can contain up to 3 classes. Now a question came to mind: does every training image need every class? Is that important for efficient training? Or is it okay if an image contains only one of the object classes?
I get a very high total loss of ~8.0 and thought this might be the reason, but I couldn't find an answer.
In general, machine learning systems can cope with some amount of noise.
An image missing labels or having the wrong labels is fine as long as overall you have sufficient data for the model to figure it out.
40 examples for image classification sounds very small. It might work if you start with a pre-trained image network and there are few classes that are very easy to distinguish.
Ignore the absolute loss value; on its own it doesn't mean much. Look at the curve to see that the loss is decreasing, and stop training when the curve flattens out. Compare the loss on a held-out test dataset to check that the values are sufficiently similar (i.e., that you are not overfitting). You might also compare against another training run of the exact same system (to check that training is stable, for example).
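For instance, if you export the loss curves from TensorBoard, a quick side-by-side comparison might look like this (a sketch; the `(step, loss)` pair format is an assumption about how you exported the data):

```python
import matplotlib.pyplot as plt

def plot_losses(train_loss, eval_loss):
    # train_loss / eval_loss: lists of (step, loss) pairs.
    steps_t, vals_t = zip(*train_loss)
    steps_e, vals_e = zip(*eval_loss)
    plt.plot(steps_t, vals_t, label="train")
    plt.plot(steps_e, vals_e, label="eval")
    plt.xlabel("step")
    plt.ylabel("total loss")
    plt.legend()
    plt.show()  # look for a flattening train curve and a similar eval curve
```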
Currently I am working on a project with a convolutional network using TensorFlow. I have set up the network and now I need to train it. But I have no clue what the training images should look like, e.g. what percentage of the image the target object should occupy.
It's a cigarette that I have to detect, and I have tried around 280 individual pictures where the cigarette is about 2-5% of the image. I'm thinking of scrapping those pictures and taking new ones where the cigarette is about 30-50% of the image.
All the pictures are taken outside in a street environment.
So my question is: are there any rules regarding good pictures in a training set?
I will report back when I have tried my own solution.
The object you are trying to recognise is too small. Of your samples, I think the first one will be the best bet for you. A convolutional neural network works by performing convolution operations on image pixels. In the second picture, the background is too large compared to the object you are trying to recognise; training on such data will not help you.
Just trying to answer your rule question:
Make sure the cigarette occupies most of the image; in my experience, 50% to 90% works. You can still identify cigarettes that take up only 2-3% of the area, but you would need millions of images with varying backgrounds.
A CNN learns from the input image. Looking at the sample images you shared (I guess they are all taken on roadside pavements and grass areas), the CNN may not learn to find the cigarette; instead, it will learn to detect the common background if the background occupies most of the image. Please make sure to use varied background patterns.
I'm very new to this stuff, so please bear with me. I followed a quick, simple video about image recognition/classification on YouTube, and the program could indeed classify the image with high confidence. But I also have some other images that were incorrectly classified.
On tensorflow site: https://www.tensorflow.org/tutorials/image_retraining#distortions
However, one should generally avoid point-fixing individual errors in the test set, since they are likely to merely reflect more general problems in the (much larger) training set.
So here are my questions:
What would be the best way to correct the program's guess? E.g. the image is B, but the app returns the result "A - 70%, B - 30%".
If the answer to the first question is to retrain, how do I retrain without deleting the previously created bottleneck files? I.e., I want the program to keep learning while retaining the data it has already been trained to recognize.
Unfortunately there is often no easy fix, because the model you are training is highly complex and very hard for a human to interpret.
However, there are techniques you can use to try to reduce your test error. First, make sure your model isn't overfitting or underfitting by observing the difference between train and test errors. If either is the case, try applying standard techniques, such as choosing a deeper model and/or using more filters if underfitting, or adding regularization if overfitting.
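As a rough illustration, here is an informal version of that check, assuming a compiled Keras classifier with accuracy as a metric; the function name and thresholds are arbitrary assumptions:

```python
def diagnose_fit(model, train_ds, test_ds):
    """Informal overfitting/underfitting check for a compiled Keras
    classifier with accuracy as its metric. Thresholds are arbitrary."""
    _, train_acc = model.evaluate(train_ds, verbose=0)
    _, test_acc = model.evaluate(test_ds, verbose=0)
    if train_acc - test_acc > 0.10:
        return "large train/test gap: likely overfitting; add regularization"
    if train_acc < 0.70:
        return "low training accuracy: likely underfitting; try a bigger model"
    return "train and test accuracy look consistent"
```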
Since you say you are already classifying correctly a high percentage of the time, I would start inspecting misclassified examples directly to try and gain insight into what you might be able to improve.
If possible, try to observe what your misclassified images have in common. If you are lucky, they will all fall into one or a small number of categories. Here are some examples of what you might see and possible solutions (a short sketch of both fixes follows the list):
Problem: Dogs facing left are misclassified as cats
Solution: Try augmenting your training set with rotations
Problem: Darker images are being misclassified
Solution: Make sure you are normalizing your images properly
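A minimal sketch of both fixes, assuming TensorFlow 2.6+ with Keras preprocessing layers (the input size is an arbitrary example):

```python
import tensorflow as tf

# Normalization plus simple geometric augmentation at the front of the model.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),      # normalize pixels to [0, 1]
    tf.keras.layers.RandomFlip("horizontal"),  # covers left/right-facing objects
    tf.keras.layers.RandomRotation(0.1),       # small random rotations
])

inputs = tf.keras.Input(shape=(224, 224, 3))   # example input size
x = data_augmentation(inputs)
# ... the rest of your classifier goes here ...
```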
It is also possible that you have reached the limits of your current approach. If you still need to do better, consider trying a different approach, like using a pretrained network for image recognition, such as VGG.