Get more accuracy from DeepLab than the pretrained model - tensorflow

I have a question about DeepLab v3 accuracy. Right now the best accuracy (mIOU) achieved by the model xception65_coco_voc_trainval is 87.80. I was wondering whether I can increase it further by training the model on fewer classes, let's say person only, using the same dataset, or whether that will make no difference.
The problem is that along the edges the model has trouble classifying pixels correctly. The original image had an all-green background, and after applying the mask (roughly as in the sketch below) some greenish fringe remains. So, two questions in short:
1. If I train for a single class, will accuracy improve?
2. If not, is there any other method I can apply to achieve better results?
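For reference, "applying the mask" means roughly the following (a minimal sketch; `PERSON_CLASS = 15` assumes the PASCAL VOC label map that this model was trained on):

```python
import numpy as np

PERSON_CLASS = 15  # person class id in the PASCAL VOC label map (assumed)

# image: H x W x 3 uint8; seg_map: H x W class ids from DeepLab's output.
def apply_person_mask(image, seg_map):
    mask = (seg_map == PERSON_CLASS).astype(np.uint8)
    # A hard 0/1 mask keeps every pixel the model labels "person" as-is,
    # so any mislabeled boundary pixel retains its green background color;
    # that is the fringe described above.
    return image * mask[..., None]
```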
Note:
The background doesn't have to be green in all cases.
I saw the remove.bg website and they have done a good job. Does anyone know what approach they took to achieve such good results?
Thanks

Related

Is it possible to feed multiple images as input to a convolutional neural network

I have a dataset of many images where each sample has 5 magnifications (x10, x20, x30, x40, x50) of the same class. They are not sequence data; all images are in RGB mode with size 512x512. I want to give these 5 images as one input to the CNN, and I don't know how.
Also, there is another problem: once the model has been trained well on the 5-image pipeline, is it okay, and will it still work, when I have only one image (one magnification, x10 as an example)?
You have asked two questions.
For the 1st one, there are two ways to do it. The first: design the model so that the input size is 5×512×512×3 (equivalently, 512×512×15 with the magnifications stacked along the channel axis) and train the model on the stacked images, as in the sketch below. The second is the multi-branch design described under your 2nd question.
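A minimal Keras sketch of the stacked-input idea, assuming the 5 RGB magnifications are concatenated along the channel axis (layer sizes and `num_classes` are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # hypothetical; replace with your number of classes

# Five 512x512 RGB images stacked along the channel axis: 512 x 512 x (5 * 3).
inputs = layers.Input(shape=(512, 512, 15))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```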
For your 2nd question, you need to design your model to handle an absent or missing input. One design I can think of is the following:
You have 5 inputs, one per image; each image goes through one or more CNN layers, and after one or a few layers you merge the branches together.
For each input you can add one extra feature, a boolean indicating whether the current image should be considered (present or absent). During training you should present combinations of all 5 images, and also combinations where some are absent, so that the model learns to handle the absence of one or more of the 5 inputs. A sketch of this design follows.
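A minimal Keras sketch of that multi-branch design, with a presence flag per magnification (all names and layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # hypothetical; replace with your number of classes

# One branch per magnification; a presence flag zeroes out absent inputs.
image_inputs, flag_inputs, branches = [], [], []
for i in range(5):
    img = layers.Input(shape=(512, 512, 3), name=f"mag_{i}")
    flag = layers.Input(shape=(1,), name=f"present_{i}")  # 1.0 present, 0.0 absent
    x = layers.Conv2D(16, 3, activation="relu")(img)
    x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Multiply()([x, flag])  # an absent image contributes nothing
    image_inputs.append(img)
    flag_inputs.append(flag)
    branches.append(x)

merged = layers.Concatenate()(branches)
merged = layers.Dense(128, activation="relu")(merged)
outputs = layers.Dense(num_classes, activation="softmax")(merged)

model = models.Model(image_inputs + flag_inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

During training, randomly zero out some flags (and the corresponding images) so the model sees both complete and incomplete inputs.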
I hope I was clear enough and it helps.
Good luck.

Do I need every class in a training image for object detection?

I'm just starting to dive into TensorFlow's Object Detection API. So far I have a very small training set of roughly 40 images. Each image can contain up to 3 classes. Now a question came to mind: does every training image need to contain every class? Is that important for efficient training, or is it okay if an image contains only one of the object classes?
I get a very high total loss of ~8.0 and thought this might be the reason, but I couldn't find an answer.
In general, machine learning systems can cope with some amount of noise.
An image missing labels or having wrong labels is fine as long as you have sufficient data overall for the model to figure it out.
40 examples for image classification sounds very small. It might work if you start from a pre-trained image network and there are only a few classes that are very easy to distinguish.
Ignore the absolute loss value; on its own it doesn't mean anything. Look at the curve to see that the loss is decreasing, and stop training when the curve flattens out. Compare the loss on a test dataset to check that the values are sufficiently similar, i.e. that you are not overfitting, as in the sketch below. You might also compare against another training run of the exact same system (to check whether training is stable, for example).
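A minimal sketch of that check, assuming a Keras-style training run with a held-out validation set (`history` and the dataset names are illustrative):

```python
import matplotlib.pyplot as plt

# history = model.fit(train_ds, validation_data=val_ds, epochs=50)
def plot_loss_curves(history):
    plt.plot(history.history["loss"], label="train loss")
    plt.plot(history.history["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()

# Stop training once both curves flatten; a large, widening gap between
# train and validation loss is the overfitting signal described above.
```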

TensorFlow Custom Object Detection Disappointing Result - Why?

I started with the TF Object Detection API two weeks ago and managed to train a model to recognize a custom object, in my case a Mecanum wheel.
Here are the details:
No. of training images = 125
All training images are around 500 x 500 (plus or minus)
Transfer Learning
Model used = ssd_mobilenet_v1_coco
batch size = 2
total steps ran = 12715
loss is around 0.5000 - 2.5000; sometimes it fluctuates to more than 10, and I am not sure why
Here's the result:
The first image is encouraging.
The second image starts to disappoint me a little. I expected the model to detect FOUR Mecanum wheels (four boxes). Why doesn't it?
Then I suspected that there was something wrong with my trained model. I tried the sample test images (the third and fourth images), and then I was sure that this is not at all the model I was aiming for.
I have been reading this post, and I think our problems are quite similar (and he managed to solve it). He mentioned that the input image needs to be less than 600 x 1024, so I tried with the fifth image and, unsurprisingly, the result is again disappointing.
I went through the tutorial series by sentdex, and in the comment sections I noticed that many people face this problem too. So, what should I do now?
125 images? You will not be able to get very good results with that few images. If you want to validate that this is indeed the problem, try training on subsets of your original 125 images, as in the sketch after these questions.
For example, how bad is the output when you train on 10 images?
Does it get better when you use 50 images?
Does it get better yet when you use 125 images?
If the accuracy improves with increasing dataset size, you can extrapolate and guess that with 1000 images you will be able to do even better. My guess is that dataset size is your problem.
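A minimal sketch of that subset experiment; `train_and_evaluate` is a hypothetical helper standing in for your own training-plus-validation pipeline:

```python
import random

# Hypothetical helper: trains your detector on `images` and returns a
# validation metric such as mAP. Wire this to your actual pipeline.
def train_and_evaluate(images):
    raise NotImplementedError

def learning_curve(all_images, sizes=(10, 50, 125), seed=0):
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        subset = rng.sample(all_images, n)
        results[n] = train_and_evaluate(subset)
    return results

# If the metric keeps improving from 10 -> 50 -> 125 images, more data
# (hundreds to thousands of images) is likely the biggest win.
```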

Deep Q-learning is not converging

I'm experimenting with deep Q-learning using Keras, and I want to teach an agent to perform a task.
In my problem I want to teach the agent to avoid hitting objects in its path by changing its speed (accelerating or decelerating).
The agent moves horizontally and the objects to avoid move vertically, and I want it to learn to change its speed so as to avoid hitting them.
I based my code on this: Keras-FlappyBird
I tried 3 different models (I'm not using a convolutional network):
a model with 10 dense hidden layers with sigmoid activation and 400 output nodes
a model with 10 dense hidden layers with Leaky ReLU activation
a model with 10 dense hidden layers with ReLU activation and 400 output nodes
I feed the coordinates and speeds of all the objects in my world to the network,
and I trained it for 1 million frames but still can't see any result.
Here are my Q-value plots for the 3 models:
Model 1 : q-value
Model 2 : q-value
Model 3 : q-value
Model 3 : q-value zoomed
As you can see, the Q-values aren't improving at all, and the same goes for the reward... please help me figure out what I'm doing wrong.
I am a little confused by your environment. I am assuming that your problem is not Flappy Bird and that you are trying to port code from Flappy Bird into your own environment. So even though I don't know your environment or your code, I still think there is enough here to point out some potential issues and get you on the right track.
First, you mention the three models that you have tried. Picking the right function approximator is of course very important for generalized reinforcement learning, but there are many more hyperparameters that could matter in solving your problem: gamma, the learning rate, the exploration rate and its decay, the replay memory length in certain cases, the training batch size, etc. Your Q-values not changing in states where you believe they should leads me to believe that too little exploration is being done for models one and two. In the code example, epsilon starts at 0.1; try different values there, up to 1, as in the sketch below. That will also require adjusting the decay rate of the exploration rate. If your Q-values are shooting up drastically across episodes, I would also look at the learning rate (although in the code sample it looks pretty small). On the same note, gamma can be extremely important: if it is too small, your learner will be myopic.
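A minimal sketch of epsilon-greedy action selection with a decaying exploration rate (the starting value, floor, and decay factor are illustrative, not tuned for your environment):

```python
import numpy as np

EPSILON_START = 1.0    # begin fully exploratory
EPSILON_MIN = 0.05     # never stop exploring entirely
EPSILON_DECAY = 0.999  # multiplicative decay per step

epsilon = EPSILON_START

def select_action(model, state, num_actions):
    """Pick an action epsilon-greedily from a Keras Q-network."""
    global epsilon
    if np.random.rand() < epsilon:
        action = np.random.randint(num_actions)       # explore
    else:
        q_values = model.predict(state[np.newaxis], verbose=0)
        action = int(np.argmax(q_values[0]))          # exploit: best Q-value
    epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)
    return action
```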
You also mention you have 400 output nodes. Does your environment have 400 actions? Large action spaces come with their own set of challenges; here is a good white paper to look at if you really do have 400 actions: https://arxiv.org/pdf/1512.07679.pdf. If you do not have 400 actions, something is wrong with your network structure: in a DQN there should be one output node per action, each estimating that action's Q-value, and you select the action with the highest value. For example, in the code example you posted, they have two actions and use ReLU.
Getting the parameters of deep Q-learning right is very difficult, especially when you account for how slow training is.

Tensorflow - retrain Inception for ethnicity recognition and hair color

I'm new to TensorFlow and the Inception model. I found the following tutorial online (https://www.tensorflow.org/versions/master/how_tos/image_retraining/) and wanted to test it on my own project.
I'm trying to get the model to recognize ethnicity based on the people in a picture. I have made a training set of approximately 850 images per category.
For some reason I'm unable to get more than a 65% accuracy level. I tried increasing the number of training steps and the number of images as well.
Maybe the Inception model is not the correct model to use for this?
Could someone point me in a good direction of what I can do to improve the results?
Regards,
P.
Do you get 65% accuracy on the train set or on the test set?
If it is on the train set, you are probably doing something wrong in your code.
If it is on the test set, you are indeed using the wrong model. The Inception model is a very, very big model, and having only 850 images per category won't give you a good general model; it will simply "remember" those 850 images (think of memorizing the answer to each question on a test instead of learning the material).
Maybe you can try building a simpler, smaller model first, like the sketch below, and see how well that model learns!
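A minimal Keras sketch of such a smaller model (input size, layer sizes, and `num_classes` are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 4  # hypothetical; replace with your number of categories

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),  # regularization matters with only ~850 images per class
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

If this small model reaches similar accuracy, the data, not the architecture, is the bottleneck; if it does much worse, the transfer-learning setup is worth revisiting.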