Is the tensorflow inception-v3 model rotation invariant?

I retrained the inception-v3 model on my own classes and I am encountering a problem there:
When I predict the class of a specific image I get exactly the same result as when I rotate that image by 90 or 180 degrees and predict the class of the rotated image.
So I got confused and am asking myself: is the tensorflow inception-v3 model rotation invariant?
In my case the rotation of the object is important: an image x can be of class A, but when x is rotated it becomes an object of class B (for example, when classifying digits, a 6 rotated by 180 degrees becomes a 9).

InceptionV3 is not rotationally invariant. Indeed, InceptionV3 consists of convolutional layers, which means that a small (say, 3x3) block of the input is multiplied by a trained 3x3 set of weights. Those weights are not restricted to being rotationally invariant, so the network can and will produce different activations when the input is rotated.
That said, Inception is a fairly smart network, and if you feed it (say) an image of a rotated dog, it should have no difficulty figuring out that this is still a dog (or at least, more similar to a dog than to any other class). You should notice, though, that the class probabilities change somewhat for the rotated image.
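You can check this yourself quite quickly. Here is a minimal sketch, assuming a recent TF 2.x with the bundled ImageNet weights; "dog.jpg" is just a placeholder path for whatever image you want to test:

```python
import numpy as np
import tensorflow as tf

# Load InceptionV3 with ImageNet weights (tf.keras).
model = tf.keras.applications.InceptionV3(weights="imagenet")

def top_prediction(img_array):
    """Preprocess a (299, 299, 3) uint8 array and return the top-1 class."""
    x = tf.keras.applications.inception_v3.preprocess_input(
        img_array.astype("float32")[np.newaxis, ...])
    preds = model.predict(x, verbose=0)
    return tf.keras.applications.inception_v3.decode_predictions(preds, top=1)[0][0]

# "dog.jpg" is a placeholder -- use any image you like.
img = np.array(tf.keras.utils.load_img("dog.jpg", target_size=(299, 299)))

print("original:", top_prediction(img))
print("rotated :", top_prediction(np.rot90(img)))  # 90-degree rotation
# The predicted class is often the same, but the probabilities differ,
# which shows the network is not strictly rotation invariant.
```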

Related

What kind of neural network to use for classifying a dotted image to a number depending on the number and size of dots?

I am currently trying to train a CNN to classify images that consist of dots, where the class is a value depending on the number and size of the dots: more dots should land in a higher-numbered class and fewer dots in a lower-numbered class.
I wonder if there is an alternative to a CNN for this task. I started designing a CNN since it is an image problem, but then realized that, unlike other object classification problems, these images don't really have the properties, such as edges, that object images have.
The main goal is to get a number out of the network when the input is an image of this kind, and I don't have a preference for how to do it except that it must be a machine learning solution.
This is how the images look. I can use two different kinds, where one is the original and the other is a binarized grayscale black-and-white version.
Binary black and white image
Original image
You can convert the image to binary, where each pixel is 0 or 1; assume 0 is background and 1 is the dots. You can then sum all the 1s in the image to get your class value, and to normalize the output you can divide it by some number.
If you want a machine learning solution, just feed that binary image to a single Dense layer and treat it as a regression problem rather than a classification problem.
Your output activation function should be ReLU and your loss function MSE.
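A minimal sketch of that setup in Keras, assuming (for illustration only) 64x64 binary images and a target normalized by the image area:

```python
import numpy as np
import tensorflow as tf

# Toy data: binary 64x64 images, target = normalized count of "dot" pixels.
x_train = (np.random.rand(256, 64, 64) > 0.95).astype("float32")
y_train = x_train.sum(axis=(1, 2)) / (64 * 64)  # divide by a constant to normalize

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="relu"),  # single Dense output, ReLU activation
])

# Regression setup: mean squared error loss, as suggested above.
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=5, verbose=0)

print(model.predict(x_train[:1], verbose=0))
```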

Is it reasonable to change the input shape for a trained convolutional neural network

I've seen a number of super-resolution networks that seem to imply it's fine to train a network on inputs of shape (x, y, d) but then pass images of arbitrary size into the model for prediction; in Keras, for example, this is specified with the placeholder shape (None, None, 3), which accepts any size.
For example, https://github.com/krasserm/super-resolution is trained on inputs of 24x24x3 but accepts arbitrarily sized images for resizing; the demo code uses 124x118x3.
Is this a sane practice? Does the network when given a larger input simply slide a window over it applying the same weights as it learnt on the smaller size image?
Your guess is correct. Convolutional layers learn to distinguish features at the scale of their kernel, not at the scale of the image as a whole. A layer with a 3x3 kernel will learn to identify a feature up to 3x3 pixels large and will be able to identify that feature in an image whether the image is itself 3x3, 100x100, or 1080x1920.
There will be absolutely no problem with the convolutions; they will work exactly as expected, with the same weights, the same kernel size, and so on.
The only possible problem is: the model may not have learned the new scale of your images (because it has never seen this scale before) and may give you poor results.
On the other hand, the model could be trained with many sizes/scales, becoming more robust to variation.
There will, however, be a problem with Flatten, Reshape, etc., because those layers depend on a fixed spatial size.
Only GlobalMaxPooling2D and GlobalAveragePooling2D will support different sizes.
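For instance, here is a sketch of a fully convolutional Keras model (the layer choices are arbitrary) with a (None, None, 3) input and GlobalAveragePooling2D; the same weights run on images of different sizes, which a Flatten-based head would not allow:

```python
import numpy as np
import tensorflow as tf

# Input shape (None, None, 3): height and width are left unspecified.
inputs = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)  # collapses any spatial size
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# The same model accepts different image sizes.
print(model.predict(np.zeros((1, 24, 24, 3)), verbose=0).shape)    # (1, 10)
print(model.predict(np.zeros((1, 124, 118, 3)), verbose=0).shape)  # (1, 10)
```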

Tensorflow: Load pre-trained model weights into a different architecture

I have a Tensorflow model that works reasonably well for detecting an object in an image and generating a bounding rectangle. The output includes one softmax and 4 analog values for the location. I need to add one more analog output for predicting the object orientation. How can I import the pre-trained model weights and freeze them so that only the part of the last layer dealing with the orientation will be trained?
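For reference, the usual Keras pattern looks roughly like the sketch below: load the trained model, freeze its layers, and attach a new trainable head. This is not the asker's actual model; the file name, layer index, and head size are placeholders.

```python
import tensorflow as tf

# Load the previously trained detector (the file name is a placeholder).
base_model = tf.keras.models.load_model("detector.h5")
base_model.trainable = False  # freeze all imported weights

# Branch off an intermediate feature layer (the index here is illustrative)
# and add a new analog output for the orientation.
features = base_model.layers[-2].output
orientation = tf.keras.layers.Dense(1, name="orientation")(features)

new_model = tf.keras.Model(inputs=base_model.input,
                           outputs=base_model.outputs + [orientation])

# Only the new orientation head is left trainable.
print([w.name for w in new_model.trainable_weights])
```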

Multiple labels on a single bounding box for tensorflow SSD mobilenet

I have configured SSD MobileNet v1 and have trained the model previously as well. However, in my dataset each bounding box has multiple class labels. My dataset is of faces, and each face has 2 labels: age and gender. Both labels share the same bounding box coordinates.
After training on this dataset, the problem I encounter is that the model only labels the gender of the face and not the age. In YOLO, however, both gender and age can be shown.
Is it possible to achieve multiple labels on a single bounding box using SSD MobileNet?
It depends on the implementation, but SSD uses a softmax layer to predict a single class per bounding box, whereas YOLO predicts individual sigmoid confidence scores for each class. So in SSD a single class gets picked with argmax, but in YOLO you can accept multiple classes above a threshold.
However, you are really doing a multi-task learning problem with two types of outputs, so you should extend these models to predict both types of classes jointly.
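A sketch of what such a multi-task head could look like on top of a shared feature extractor (this assumes plain Keras; the backbone, input size, and number of age buckets are placeholders, not the actual SSD MobileNet code):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 3))

# Shared feature extractor (stand-in for the detector's backbone).
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)

# One head per task, predicted jointly for the same box.
gender = tf.keras.layers.Dense(2, activation="softmax", name="gender")(x)
age = tf.keras.layers.Dense(10, activation="softmax", name="age")(x)  # e.g. 10 age buckets

model = tf.keras.Model(inputs, [gender, age])
model.compile(optimizer="adam",
              loss={"gender": "sparse_categorical_crossentropy",
                    "age": "sparse_categorical_crossentropy"})
```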

How can I normalize my loss function in relation to the number of elements in a class in Tensorflow?

In an image segmentation problem, we will usually get a background label that dominates the mask and the object(s) occupying only a small area in the mask.
As such, during training, I have observed that my neural network does very well in classifying the background label but really poor at classifying the objects.
I am using Tensorflow's tf.nn.sparse_softmax_cross_entropy_with_logits as my loss function. Perhaps related to this problem, my loss value is also not decreasing/converging.
I was told that maybe I should consider adding a weighting factor to my loss function so that the dominant class will have a small weight compared to the other, non-dominant classes. Can anyone share some insights?
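One common approach is to compute the per-pixel cross entropy and scale it by a class weight before averaging. A minimal sketch, assuming per-pixel logits and integer labels; the weight values are placeholders you would tune for your class imbalance:

```python
import tensorflow as tf

def weighted_sparse_softmax_loss(labels, logits, class_weights):
    """labels: [batch, H, W] int32, logits: [batch, H, W, num_classes],
    class_weights: [num_classes] float32 (e.g. a small weight for background)."""
    per_pixel_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    # Look up the weight of each pixel's ground-truth class.
    pixel_weights = tf.gather(class_weights, labels)
    return tf.reduce_mean(per_pixel_loss * pixel_weights)

# Example: background (class 0) down-weighted relative to the object classes.
class_weights = tf.constant([0.1, 1.0, 1.0], dtype=tf.float32)
```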