I have a TensorFlow model that works reasonably well for detecting an object in an image and generating a bounding rectangle. The output includes one softmax and 4 analog values for the location. I need to add one more analog output for predicting the object's orientation. How can I import the pre-trained model weights and freeze them so that only the part of the last layer dealing with the orientation is trained?
I am trying to apply GradCAM to my pre-trained CNN model to generate heat maps of its layers. My custom CNN design is as follows:
- It adopts all the convolutional layers and their pre-trained weights from the VGG16 model.
- It extracts lower-level features (from the early convolutional layers) of VGG16.
- It trains fully connected layers on both the normal/high-level and the lower-level features from VGG16.
- It concatenates the outputs of the normal/high-level and lower-level fully connected (f.c.) layers and then trains more f.c. layers before the final prediction.
[Figure: model design]
I want to use GradCAM to visualize the feature maps of the low-level route and of the normal/high-level route. I have already produced such heatmaps for a non-concatenated fine-tuned VGG using its last convolutional layer. My question is: on a concatenated CNN model, can the Grad-CAM method still work using the gradient of the prediction with respect to the low-level and high-level feature maps, respectively? If not, are there other methods that can produce heatmap visualizations for such a model? Is using the shared fully connected layer an option?
Any idea and suggestions are much appreciated!
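For reference, the core Grad-CAM computation only needs two things: a convolutional feature map A (K channels) and the gradient of the class score y^c with respect to that map. Each channel weight alpha_k is the spatial average of dy^c/dA^k, and the heatmap is ReLU(sum_k alpha_k * A^k). Because concatenation is differentiable, in principle this gradient can be taken with respect to either branch's feature map. The pure-Python sketch below assumes the gradients have already been computed by the framework (in TensorFlow this would typically be done with tf.GradientTape, not shown); the function name `grad_cam` is hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps into one Grad-CAM heatmap.

    feature_maps, gradients: lists of K channels, each an H x W list of floats;
    gradients[k][i][j] is d(class score) / d(feature_maps[k][i][j]).
    """
    h = len(feature_maps[0])
    w = len(feature_maps[0][0])
    # alpha_k: global-average-pooled gradient for channel k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    heatmap = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(h):
            for j in range(w):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that increase the class score
    return [[max(0.0, v) for v in row] for row in heatmap]


A = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]          # two 2x2 channels
G = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]      # their gradients
print(grad_cam(A, G))
```

Running it once on the low-level map and once on the high-level map gives one heatmap per route; each map would usually be upsampled to the input size before overlaying it on the image.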
I have configured SSD MobileNet v1 and have trained the model before as well. However, in my dataset each bounding box has multiple class labels. My dataset consists of faces, and each face has two labels: age and gender. Both labels share the same bounding box coordinates.
After training on this dataset, the problem I encounter is that the model only labels the gender of the face and not the age. In YOLO, however, both gender and age can be shown.
Is it possible to get multiple labels on a single bounding box using SSD MobileNet?
It depends on the implementation, but SSD uses a softmax layer to predict a single class per bounding box, whereas YOLO predicts an individual sigmoid confidence score for each class. So SSD picks a single class via argmax, while in YOLO you can accept every class whose score exceeds a threshold.
However, you are really solving a multi-task learning problem with two types of outputs, so you should extend these models to predict both types of classes jointly.
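The difference between the two scoring schemes can be illustrated with a small pure-Python sketch. It is simplified on purpose: it ignores SSD's background class and the box regression, and the flat class list and logits are hypothetical values for one box.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ssd_style(logits, classes):
    # Softmax: classes compete, and only the argmax survives
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [classes[best]]


def yolo_style(logits, classes, threshold=0.5):
    # Independent sigmoids: every class above the threshold is kept
    return [c for c, x in zip(classes, logits) if sigmoid(x) > threshold]


classes = ["male", "female", "young", "old"]  # hypothetical flat label set
logits = [2.0, -1.0, 1.5, -2.0]               # hypothetical scores for one box
print(ssd_style(logits, classes))   # ['male']
print(yolo_style(logits, classes))  # ['male', 'young']
```

With a single softmax over a mixed gender/age label set, the age label can never win alongside the gender label, which matches the behaviour described in the question.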
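One way to read "predict both types of classes jointly" is to give each box two separate softmax heads, one for gender and one for age, and train them with the sum of their cross-entropy losses. The pure-Python sketch below shows only that loss and the decoding step, not a change to SSD itself; the age buckets and the `age_weight` parameter are hypothetical.

```python
import math


def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target_index):
    return -log_softmax(logits)[target_index]


def multitask_loss(gender_logits, age_logits, gender_target, age_target,
                   age_weight=1.0):
    # Each head has its own softmax, so both labels can be predicted per box
    return (cross_entropy(gender_logits, gender_target)
            + age_weight * cross_entropy(age_logits, age_target))


def predict(gender_logits, age_logits, gender_names, age_names):
    g = max(range(len(gender_logits)), key=gender_logits.__getitem__)
    a = max(range(len(age_logits)), key=age_logits.__getitem__)
    return gender_names[g], age_names[a]


genders = ["male", "female"]
ages = ["0-20", "21-40", "41+"]  # hypothetical age buckets
print(predict([0.3, 1.2], [0.1, 2.0, -0.5], genders, ages))  # ('female', '21-40')
```

The weight between the two losses is a design choice; 1.0 simply treats both tasks as equally important.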
I have implemented a form of the LeNet model in TensorFlow and Python for a car number plate recognition system. My model was trained solely on my training data and tested on the test data. My dataset contains segmented images in which every image contains only one character. This is what my data looks like. My model does not perform very well, so I'm now looking for models I can use via transfer learning. Since most models are already trained on huge datasets, I looked at a few such as AlexNet, ResNet, GoogLeNet and Inception v2. Most of these models have not been trained on the type of data I need, namely letters and digits.
Question: Should I still go ahead with one of these models and train it on my dataset, or are there better models that would help? For such models, would Keras be a better option since it is higher-level than TensorFlow?
Question: I'd prefer to work with the LeNet model itself, since training the other models would definitely take a long time given the limited specs of my laptop. So is there an implementation of the model trained on machine-printed character images, whose final layers I could then train on my data?
To get good results, you should use a model explicitly designed for text recognition.
First, (roughly) crop the input image to the region around the text.
Then, feed the image of the text into a neural network (NN) to recognize the text.
A typical NN for text recognition extracts relevant features (with convolutional NN), propagates those features through the image (with recurrent NN) and finally predicts a character score for each position in the image.
Usually, those networks are trained with the connectionist temporal classification (CTC) loss.
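With CTC, the network outputs a score per symbol for every horizontal position (frame), where one extra symbol is a special "blank". The simplest way to turn those scores into text is best-path (greedy) decoding: take the argmax per frame, merge consecutive repeats, then drop blanks. Beam-search decoding is a common, more accurate alternative. The pure-Python sketch below assumes blank is index 0; the small plate alphabet is hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_scores: one list of per-symbol scores for each frame.
    alphabet: maps symbol indices to characters; index `blank` is the CTC blank.
    """
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    out = []
    prev = None
    for idx in best_path:
        # Merge repeats, then skip blanks; a blank between two equal symbols
        # is what allows genuinely doubled characters such as "AA"
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


alphabet = ["-", "A", "B", "1"]  # "-" is the blank (hypothetical alphabet)
path = [1, 1, 0, 1, 2, 2, 0, 3]
frames = [[1.0 if i == p else 0.0 for i in range(len(alphabet))] for p in path]
print(ctc_greedy_decode(frames, alphabet))  # AAB1
```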
As a starting point, I would suggest looking at the CRNN implementation (they also provide a pre-trained model) [1] and the corresponding paper [2]. As far as I remember, there is also a TensorFlow implementation on GitHub.
You can use any framework you like (e.g. TensorFlow, CNTK, ...) as long as it features convolutional and recurrent NNs and the CTC loss.
I once attended a presentation about CNTK where they claimed that they have a very fast implementation of recurrent NN - so maybe CNTK would be a good choice for your slow computer?
[1] CRNN implementation: https://github.com/bgshih/crnn
[2] Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
I retrained the Inception-v3 model on my own classes and I am encountering a problem:
When I predict the class of a specific image, I get exactly the same result as when I rotate that image by 90 or 180 degrees and predict the class of the rotated image.
So I got confused, and I am asking myself: is the TensorFlow Inception-v3 model rotation invariant?
In my case the rotation of the object is important: an image x can be of class A, but when x is rotated it becomes an object of class B (for example, when classifying digits, a 6 rotated by 180 degrees becomes a 9).
InceptionV3 is not rotationally invariant. Indeed, InceptionV3 consists of convolutional layers, which means that each small (say, 3x3) block of the input is multiplied element-wise by a trained 3x3 set of weights and summed. Those weights are not restricted to being rotationally invariant, so the network can and will produce different activations when the input is rotated.
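A tiny pure-Python example makes the point concrete: apply the same 3x3 kernel to a patch and to rotated copies of it, and the response changes. The kernel here is a hypothetical vertical-edge detector standing in for trained weights; only a kernel with the right symmetry would give identical responses.

```python
def conv_at(patch, kernel):
    # One convolution output: element-wise product of two 3x3 grids, summed
    return sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))


def rotate90(m):
    # Rotate a square matrix 90 degrees clockwise
    return [list(row) for row in zip(*m[::-1])]


kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]  # hypothetical "trained" vertical-edge weights
patch = [[1, 1, 0],
         [1, 1, 0],
         [1, 1, 0]]

print(conv_at(patch, kernel))                      # 3
print(conv_at(rotate90(patch), kernel))            # 0
print(conv_at(rotate90(rotate90(patch)), kernel))  # -3
```

The 180-degree rotation even flips the sign of the response, which is exactly the kind of signal a network could use to tell a 6 from a 9 if it is trained to.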
That said, Inception is a fairly smart network, and if you feed it (say) an image of a rotated dog, it should have no difficulty figuring out that this is still a dog (or at least that it is more similar to a dog than to any other class). You should notice, though, that the class probabilities change somewhat for the rotated image.
I am trying to use TensorFlow for transfer learning with a pre-trained VGG16 model.
However, the input to the model in my problem is an RGB image with an extra channel that functions as a binary mask. This differs from the original input on which the model was trained (224x224 RGB images).
I think that using the pre-trained model is still possible in this case. How do I assign weights for the connections between the extra channel and the first convolutional layer? Is transfer learning still applicable in such a scenario?
Thanks!