Problems recognizing color with pre-trained Inception v3 - TensorFlow

I'm retraining the last layer of a pre-trained Inception V3 model: https://www.tensorflow.org/tutorials/image_retraining
My image recognition task is to identify mushroom species.
When I look at the prediction results for misclassified images, the model often assigns a mushroom to a species with a similar shape but a totally different color. So it seems the Inception model is more sensitive to shape and less sensitive to color. Unfortunately, color is a very important feature for this problem.
Any idea what I can do?

Related

TensorFlow False Detection

I have trained a TensorFlow model with 200 car images, and it detects cars successfully. But when there is no car in the image, a false car detection occurs. Why does this happen, and how can I prevent it?
Your model needs to be trained with negative examples (i.e. images without any cars in them). It needs to learn that it is possible for an image to contain no car at all.
Further, it looks like the model has picked up some unrelated features as car features, e.g. the parking-lot area. To avoid this, use random erasing augmentation while training the model.
Random erasing: inspired by the mechanism of dropout regularization, random erasing can be seen as analogous to dropout, except applied in the input space rather than embedded in the network architecture. By removing random input patches, the model is forced to find other descriptive characteristics.
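As an illustration, here is a minimal random-erasing sketch in Python/NumPy; the function name, patch-size range, and noise fill are illustrative choices, not taken from any particular library:

```python
import numpy as np

def random_erase(image, p=0.5, area_range=(0.02, 0.2), aspect_range=(0.3, 3.3)):
    """Randomly erase a rectangular patch from an HxWxC uint8 image (illustrative sketch)."""
    if np.random.rand() > p:
        return image          # apply the augmentation only with probability p
    img = image.copy()
    h, w = img.shape[:2]
    for _ in range(10):       # retry a few times until a patch fits inside the image
        area = np.random.uniform(*area_range) * h * w
        aspect = np.random.uniform(*aspect_range)
        ph = int(round(np.sqrt(area * aspect)))
        pw = int(round(np.sqrt(area / aspect)))
        if 0 < ph < h and 0 < pw < w:
            top = np.random.randint(0, h - ph)
            left = np.random.randint(0, w - pw)
            # fill the patch with random noise so the model cannot rely on that region
            img[top:top + ph, left:left + pw] = np.random.randint(
                0, 256, (ph, pw, img.shape[2]), dtype=img.dtype)
            return img
    return img
```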

Multiple labels on a single bounding box for tensorflow SSD mobilenet

I have configured SSD MobileNet v1 and have trained the model before. However, in my dataset each bounding box carries multiple class labels: the dataset consists of faces, and each face has two labels, age and gender, sharing the same bounding box coordinates.
After training on this dataset, the problem I encounter is that the model only labels the gender of the face and not the age. In YOLO, however, both gender and age can be shown.
Is it possible to get multiple labels on a single bounding box using SSD MobileNet?
It depends on the implementation, but SSD uses a softmax layer to predict a single class per bounding box, whereas YOLO predicts an individual sigmoid confidence score for each class. So in SSD a single class gets picked via argmax, while in YOLO you can accept every class whose score is above a threshold.
However, what you really have is a multi-task learning problem with two types of outputs, so you should extend these models to predict both types of classes jointly.
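To illustrate the multi-task direction, here is a minimal Keras sketch with one shared feature extractor and two softmax heads. The layer sizes, class counts, and names are placeholders, and a real SSD would attach such heads to each anchor box rather than to a single pooled feature:

```python
import tensorflow as tf

# Shared backbone with two classification heads (multi-task sketch, not full SSD).
inputs = tf.keras.Input(shape=(64, 64, 3))
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)

gender = tf.keras.layers.Dense(2, activation="softmax", name="gender")(x)
age = tf.keras.layers.Dense(8, activation="softmax", name="age")(x)  # e.g. 8 age buckets

model = tf.keras.Model(inputs, [gender, age])
model.compile(
    optimizer="adam",
    # one loss per head; Keras sums them into a joint training objective
    loss={"gender": "sparse_categorical_crossentropy",
          "age": "sparse_categorical_crossentropy"},
)
```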

Tensorflow RGB-D Training

I have RGB-D (color & depth) images of a given scene. I would like to use TensorFlow to train a classification model based on a pre-trained network such as Inception. As far as I understand, these pre-trained models were built using 3-channel RGB images, so a 4th channel cannot be handled.
How do I use RGB-D images directly? Do I need to pre-process the images and separate RGB and D? If so, how do I use the D (1-channel) part alone?
Thank you!
If you want to use a pre-trained model, you can only use RGB, as the weights were only trained to understand RGB. In that case, it is as you said: separate the channels and discard depth.
To use a 4-channel image like this, you would need to retrain the network from scratch rather than loading a pre-trained set of weights.
You will probably get good results using the same architecture as is used for 3-channel images (save for the minor change required to support the 4-channel input), so retraining shouldn't be terribly hard.
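For example, here is a minimal sketch of that retrain-from-scratch route in Keras: the same Inception v3 architecture is instantiated with a 4-channel input and random weights, since ImageNet weights only exist for 3-channel RGB input (the class count is a placeholder):

```python
import tensorflow as tf

# Build the Inception v3 architecture with a 4-channel input and random weights
# (weights=None), since pre-trained weights only exist for 3-channel RGB.
backbone = tf.keras.applications.InceptionV3(
    weights=None,               # no pre-trained weights exist for 4 channels
    include_top=False,
    input_shape=(299, 299, 4),  # RGB + depth
)
x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)  # 10 classes, placeholder
model = tf.keras.Model(backbone.input, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```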

Pre Trained LeNet Model for License plate Recognition

I have implemented a variant of the LeNet model in TensorFlow and Python for a car number-plate recognition system. My model was trained solely on my training data and tested on my test data. The dataset contains segmented images in which every image holds only one character. The model does not perform very well, so I'm now looking for models I can use via transfer learning. Since most well-known models are already trained on huge datasets, I looked at a few such as AlexNet, ResNet, GoogLeNet and Inception v2. Most of these models have not been trained on the type of data I need, which is letters and digits.
Question: Should I still go forward with one of these models and train it on my dataset, or are there better models that would help? For such models, would Keras be a better option, since it is more high-level than TensorFlow?
Question: I'd prefer to work with the LeNet model itself, since training the other models would take a long time given my laptop's limited specs. So is there an implementation of the model trained on machine-printed character images that I could use, so that I only need to train the final layers on my data?
To get good results, you should use a model explicitly designed for text recognition.
First, (roughly) crop the input image to the region around the text.
Then, feed the image of the text into a neural network (NN) to recognize the text.
A typical NN for text recognition extracts relevant features (with a convolutional NN), propagates those features along the image (with a recurrent NN), and finally predicts a character score for each position in the image.
Usually, those networks are trained with the CTC loss.
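Here is a minimal Keras sketch of that convolutional → recurrent → per-position character-scores pipeline; the image size, layer sizes, and alphabet are placeholder assumptions rather than the exact CRNN configuration:

```python
import tensorflow as tf

num_chars = 36  # e.g. digits + letters; one extra class is reserved for the CTC blank

inputs = tf.keras.Input(shape=(32, 128, 1))              # height x width x channels
x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)              # -> 16 x 64 x 64
x = tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)              # -> 8 x 32 x 128

# Make the width axis the time axis: each of the 32 horizontal positions
# becomes one timestep for the recurrent part.
x = tf.keras.layers.Permute((2, 1, 3))(x)                # -> 32 x 8 x 128
x = tf.keras.layers.Reshape((32, 8 * 128))(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True))(x)

# Per-timestep character scores; the extra output is the CTC blank label.
logits = tf.keras.layers.Dense(num_chars + 1)(x)

model = tf.keras.Model(inputs, logits)
# Training would pair these per-timestep logits with a CTC loss, e.g. tf.nn.ctc_loss.
```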
As a starting point, I would suggest looking at the CRNN implementation (they also provide a pre-trained model) [1] and the corresponding paper [2]. As far as I remember, there is also a TensorFlow implementation on GitHub.
You can use any framework you like (e.g. TensorFlow or CNTK), as long as it provides convolutional and recurrent NN layers and the CTC loss.
I once attended a presentation about CNTK where they claimed to have a very fast implementation of recurrent NNs, so maybe CNTK would be a good choice for your slow computer.
[1] CRNN implementation: https://github.com/bgshih/crnn
[2] Shi et al., "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition"

deep learning for shape localization and recognition

There is a set of images, each of which contains different shape entities. I am trying to localize and recognize these shapes: for instance, adding a bounding box around each shape and perhaps even labeling it. What are the major research papers / deep learning models that have been able to solve this kind of problem?
Object detection papers such as R-CNN, Faster R-CNN, YOLO and SSD would help you solve this if you are set on using a deep learning approach.
It's easy to say this is a trivial problem that could be solved with tools in OpenCV and that deep learning is overkill, but I can see many reasons to use deep learning tools, and saying so would not answer your question.
We assume that your shapes have different scales and rotations. Your full-size input image is really too large for direct training, and it would need a lot of training samples to reach good accuracy on test samples. In this case it is better to train a convolutional neural network on small images (e.g. 128x128) containing only one shape each, and then use the sliding-window trick.
This project will have three main steps:
1. Generate training and test samples, where each image contains only one shape.
2. Train a classifier to recognize a single shape within each input image.
3. Use the sliding-window trick: break your original image containing many shapes into overlapping 128x128 blocks, and pass each block to the model trained in the second step.
This way, at the end you will have a label for each shape from your trained model, and also the location of each shape from the sliding window.
For the classifier, you can use exactly the CNN structure from TensorFlow's MNIST tutorial.
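Here is a minimal sketch of the sliding-window step, assuming a trained Keras classifier `model` that takes 128x128 crops; the stride, threshold, and function name are illustrative:

```python
import numpy as np

def sliding_window_detect(image, model, window=128, stride=64, threshold=0.9):
    """Classify overlapping window x window blocks of a large image (illustrative sketch).

    image: HxWxC float array; model: a classifier trained on single-shape crops.
    Returns (top, left, class_id, score) tuples for confident windows.
    """
    detections = []
    h, w = image.shape[:2]
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            block = image[top:top + window, left:left + window]
            probs = model.predict(block[np.newaxis], verbose=0)[0]
            cls = int(np.argmax(probs))
            if probs[cls] >= threshold:
                detections.append((top, left, cls, float(probs[cls])))
    return detections
```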
Here is a paper that applies exactly the same method to fingerprint images to extract local features:
A direct fingerprint minutiae extraction approach based on convolutional neural networks