how to calculate the confidence of a softmax layer - tensorflow

I am working on a multi-class computer vision classification task, using a CNN with fully connected layers stacked on top and a softmax activation. The problem: say I am classifying animal categories. If I feed in an image of a rock, the model will still return a high probability for the most similar animal category, because softmax always returns a probability distribution over the known classes, compressed between 0 and 1. What can I use to determine the confidence of my model's probability output, i.e. to say whether I can rely on these probabilities or not?
PS: I don't want to add a no_label class.
Also, is it possible with the Keras functional API to give the model two outputs, the pre-softmax (linear) logits and the softmax output, without the extra linear output affecting how the weights are updated during training?

Yes, you can do it like this:
import tensorflow as tf

inputs = tf.keras.layers.Input((128, 128, 3))
x = tf.keras.layers.Conv2D(32, 3)(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128)(x)
non_softmax_output = tf.keras.layers.Dense(10)(x)
softmax_output = tf.keras.layers.Softmax()(non_softmax_output)
model = tf.keras.models.Model(inputs=inputs, outputs=[non_softmax_output, softmax_output])
model.summary()
>>>
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 128, 128, 3)] 0
conv2d_1 (Conv2D) (None, 126, 126, 32) 896
max_pooling2d_1 (MaxPooling2D) (None, 63, 63, 32) 0
flatten_1 (Flatten) (None, 127008) 0
dense_23 (Dense) (None, 128) 16257152
dense_24 (Dense) (None, 10) 1290
softmax (Softmax) (None, 10) 0
=================================================================
Total params: 16,259,338
Trainable params: 16,259,338
Non-trainable params: 0
_________________________________________________________________
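As a sanity check on the two outputs: the Softmax head adds no parameters, so the second output is always recoverable from the first by applying the softmax function. A minimal numpy sketch (with made-up logits for a batch of two samples) illustrates the relationship:

```python
import numpy as np

def softmax(logits):
    # Shift by the row max for numerical stability, then normalize
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Pretend pre-softmax output for 2 samples and 10 classes
logits = np.array([[2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                   [0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
probs = softmax(logits)

# Each row is a valid probability distribution
print(probs.sum(axis=1))
```

As for training being affected: when compiling, a loss can be supplied only for the softmax output (Keras accepts a dict of losses keyed by output name), so the linear head contributes nothing to the gradients.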
The easier alternative is to just work with the predictions from the softmax layer. You don't gain much from the linear layer without the activation; those raw logits by themselves do not mean much. You could instead define a function outside the model that changes the predictions based on some threshold value.
Assume you define only one output in the above model, with a softmax layer. You can then define a function like this to get predictions based on some threshold value you choose:
import numpy as np

def modify_predict(test_images, threshold):
    predictions = model.predict(test_images)
    max_values = np.max(predictions, axis=1)
    labels = np.argmax(predictions, axis=1)
    # Any indicator value can be used here instead of 999 for the no_label class
    new_predictions = np.where(max_values > threshold, labels, 999)
    return new_predictions
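The thresholding step is plain numpy, so it can be checked without a trained model. Here predictions is a hypothetical stand-in for the output of model.predict(test_images), with a threshold of 0.7:

```python
import numpy as np

# Hypothetical softmax outputs: 3 samples, 4 classes
predictions = np.array([[0.90, 0.05, 0.03, 0.02],   # confident
                        [0.40, 0.30, 0.20, 0.10],   # unsure
                        [0.10, 0.10, 0.75, 0.05]])  # confident

max_values = np.max(predictions, axis=1)
labels = np.argmax(predictions, axis=1)
# Confident rows keep their argmax label; the unsure row becomes 999
new_predictions = np.where(max_values > 0.7, labels, 999)
print(new_predictions.tolist())  # [0, 999, 2]
```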

On the first part of your question: the only way you can know how your model will behave on non-animal pictures is by having non-animal pictures in your data. There are two options.
The first is to include non-animal pictures in the training set (and dev and test sets), and to train the model to distinguish between animal and non-animal. You could either build a separate binary classification model to distinguish animal/non-animal (as already suggested in the comments), or you could integrate it into one model by having a 'non-animal' class (although I recognise you indicate this last option is not something you want to do).
The second is to include non-animal pictures in the dev and test sets, but not in the training set. You can't then train the model to distinguish between animal and non-animal, but you can at least measure how it behaves on non-animal pictures, and perhaps create some sort of heuristic for selecting only some of your model's predictions. This seems like the worse option to me, even though it's generally accepted that dev and test sets can come from a different distribution to the training set. It's something one might do if only a small number of non-animal pictures were available, but that surely can't be the case here.
There is, for example, a large labelled image database available at https://www.image-net.org/index.php

Related

How to feed new vectors into recurrent and convolutional keras model for real-time/streaming/live inference?

I have successfully trained a Keras/TensorFlow model consisting of the layers SimpleRNN→Conv1D→GRU→Dense. The model is meant to run on an Apple Watch for real-time inference, which means I want to feed it a new feature vector and predict a new output at each time step. My problem is that I don't know how to feed data into it such that the convolutional layer receives the latest k outputs from the RNN layer.
I can see three options:
Feed it with one feature vector at a time, i.e. (1,1,6). In this case I assume that the convolutional layer will receive only one time step and hence zero pad for all the previous samples.
Feed it with the last k feature vectors for each time step, i.e. (1,9,6), where k = 9 is the CNN kernel length. In this case I assume that the state flow in the recurrent layers will not work.
Feed it with the last k feature vectors every k-th time step, again where k = 9 is the CNN kernel length. I assume this would work, but it introduces unnecessary latency that I wish to avoid.
What I want is a model that I can feed with a new single feature vector for each time step, and it will automatically feed the last k outputs of the SimpleRNN layer into the following Conv1D layer. Is this possible with my current model? If not, can I work with the layer arguments, or can I introduce some kind of FIFO buffer layer between the SimpleRNN and Conv1D layer?
Here is my current model:
import tensorflow as tf
from tensorflow.keras.layers import Input, SimpleRNN, Conv1D, GRU, Dropout, Dense

feature_vector_size = 6
model = tf.keras.models.Sequential([
    Input(shape=(None, feature_vector_size)),
    SimpleRNN(16, return_sequences=True, name="rnn"),
    Conv1D(16, 9, padding="causal", activation="relu"),
    GRU(12, return_sequences=True, name="gru"),
    Dropout(0.2),
    Dense(1, activation=tf.nn.sigmoid, name="dense")
])
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rnn (SimpleRNN) (None, None, 16) 368
_________________________________________________________________
conv1d (Conv1D) (None, None, 16) 2320
_________________________________________________________________
gru (GRU) (None, None, 12) 1080
_________________________________________________________________
dropout (Dropout) (None, None, 12) 0
_________________________________________________________________
dense (Dense) (None, None, 1) 13
=================================================================
Edit:
After having researched the problem a bit, I have realized:
The Conv1D layer will zero pad in all three cases that I described, so option 3 won't work either. Setting padding="valid" solves this particular problem.
The SimpleRNN and GRU layers must have stateful=True. I found this description of how to make a model stateful after it has been trained stateless: How to implement a forward pass in a Keras RNN in real-time?
Keras sequence models seem to be made for complete, finite sequences only. The infinite streaming use case with one time step at a time isn't really supported.
However, the original question remains open: How can I build and/or feed new feature vectors into the model such that the convolutional layer receives the latest k outputs from the RNN layer?
For anyone else with the same problem: I couldn't solve the SimpleRNN to Conv1D data flow easily, so I ended up replacing the SimpleRNN layer with another Conv1D layer and setting padding="valid" on both Conv1D layers. The resulting model outputs exactly one time step when fed with a sequence of c * k - 1 time steps, where c is the number of Conv1D layers and k is the convolutional kernel length (c = 2 and k = 9 in my case):
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv1D, GRU, Dropout, Dense

feature_vector_size = 6
model = tf.keras.models.Sequential([
    Input(shape=(None, feature_vector_size)),
    Conv1D(16, 9, padding="valid", name="conv1d1"),
    Conv1D(16, 9, padding="valid", name="conv1d2"),
    GRU(12, return_sequences=True, name="gru"),
    Dropout(0.2),
    Dense(1, activation=tf.nn.sigmoid, name="dense")
])
After training, I make the GRU layer stateful according to How to implement a forward pass in a Keras RNN in real-time?. For real-time inference I keep a FIFO queue of the 17 latest feature vectors and feed all these 17 vectors into the model as an input sequence for each new time step.
I don't know if this is the best possible solution, but at least it works.
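The FIFO bookkeeping described above might look roughly like the sketch below. This is only an illustration: dummy_predict is a stand-in for the real model.predict, and the window of 17 comes from c * k - 1 with c = 2 and k = 9.

```python
import numpy as np
from collections import deque

FEATURE_VECTOR_SIZE = 6
WINDOW = 2 * 9 - 1  # c * k - 1 = 17 time steps

buffer = deque(maxlen=WINDOW)  # oldest vectors fall off automatically

def predict_step(new_vector, predict_fn):
    """Push one feature vector; return a prediction once the buffer is full."""
    buffer.append(new_vector)
    if len(buffer) < WINDOW:
        return None  # still warming up
    batch = np.stack(buffer)[np.newaxis, ...]  # shape (1, 17, 6)
    return predict_fn(batch)

# Placeholder for model.predict (an assumption for illustration only)
dummy_predict = lambda batch: batch.mean()

out = None
for t in range(20):
    out = predict_step(np.full(FEATURE_VECTOR_SIZE, float(t)), dummy_predict)
```

With `padding="valid"` on both Conv1D layers, a full 17-step window fed this way yields exactly one output time step, as described above.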

Keras output names are incorrect when tf ops are used as outputs

I'm having trouble setting output names correctly in a Keras model.
The use case here is a Tensorflow Serving model, which names inputs and outputs based on the layer names.
Inputs are easy enough to name. But outputs, if they aren't instances of keras.Layer, don't seem to have their names properly set as the output names in the model.
See the following example:
import tensorflow as tf
import tensorflow.keras as keras
input_0 = tf.keras.Input(shape=(10,), name="my_input_0")
x = keras.layers.Dense(units=1)(input_0)
output_0 = tf.math.log(x, name="my_output_0")
output_1 = tf.math.exp(x, name="my_output_1")
inputs = {
    "my_input_0": input_0
}
outputs = {
    "my_output_0": output_0,
    "my_output_1": output_1
}
model = keras.Model(inputs, outputs)
model.summary()
The model summary has the correct name for the input layer, but does not have the correct names for either output layer, despite the fact that the name was specified both in the output dict keys, as well as the layer name itself.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
my_input_0 (InputLayer) [(None, 10)] 0 []
dense_1 (Dense) (None, 1) 11 ['my_input_0[0][0]']
tf.math.log_1 (TFOpLambda) (None, 1) 0 ['dense_1[0][0]']
tf.math.exp_1 (TFOpLambda) (None, 1) 0 ['dense_1[0][0]']
==================================================================================================
Total params: 11
Trainable params: 11
Non-trainable params: 0
__________________________________________________________________________________________________
It's possible to work around this issue by wrapping the outputs in keras.layers.Layer():
output_0 = keras.layers.Layer(name="my_output_0")(tf.math.log(x))
output_1 = keras.layers.Layer(name="my_output_1")(tf.math.exp(x))
But this adds extra layers to the model; I imagine the runtime cost is negligible, but it feels ugly and clogs up the model summary.
Is there a better way to accomplish this output naming? Remember, the core issue is that a Tensorflow Serving model sets the output names based on the output layer names, which are the same as those displayed in the summary.
One solution would be to set model.layers[i]._name for every layer you want to rename.
For example:
model.layers[2]._name="my_output_0"
model.summary()
This should give the output:

Change fully convolutional network input shape in TF 2.3 and tf.keras

I'm working with tensorflow 2.3 and tf.keras
I've trained a network on images with input shape (None, 120, 120, 12). Actually, due to a coding error, I've also been able to train the model while declaring the input as (None, 128, 128, 12) but feeding (None, 120, 120, 12) batches; TF just printed a warning and didn't care. This wasn't the behavior in previous versions. My network has only convolutional layers and, provided the input size has enough powers of 2 given the depth, it produces an output image of the same shape as the input.
I've now fully trained this model and I'd like to apply it to images of a different shape. Is there a proper way to change the input shape of my trained model? Should I define a new model and copy the weights layer by layer? Or should I just accept the warnings, since it works anyway?
Ah, you again. I think your problem is basically simple: once you train your model with one input size, the input must have the same shape when you run the model. However, if you want to take advantage of the trained model and believe the features it has learnt are not much different, you can apply transfer learning and, of course, retrain. You don't have to copy weights; just freeze the base model and train only the input and output. You can check this basic example with your VGG:
import tensorflow
from tensorflow import keras
from tensorflow.keras import layers

base_model = tensorflow.keras.applications.VGG19(
    weights='imagenet',  # Load weights pre-trained on ImageNet.
    input_shape=(224, 224, 3),
    include_top=False)
base_model.trainable = False

inputs = layers.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 150, 150, 3)] 0
_________________________________________________________________
vgg19 (Model) multiple 20024384
_________________________________________________________________
global_average_pooling2d (Gl (None, 512) 0
_________________________________________________________________
dense (Dense) (None, 1) 513
=================================================================

Why does a 1x1 convolution layer work for feature reduction in a Neural Network Regression?

I would love some insight on this question - I've tried to find explanations in the literature, but I'm stumped. I am building a neural network (using Keras) to solve a regression problem. I have ~500,000 samples with 20,000 features each, and am trying to predict a numerical output. Think predicting a house price based on a bunch of numerical measurements of the house, yard, etc. The features are arranged alphabetically, so the ordering of neighboring features is fairly meaningless.
When I first tried to create a neural network, it suffered from severe overfitting if I provided all 20,000 features - manually reducing it to 1,000 features improved performance massively.
I read about 1x1 convolutional neural networks being used for feature reduction, but it was all used for images and 2D inputs.
So I built a basic neural network with 3 layers:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Flatten, Dense

model = Sequential()
model.add(Conv1D(128, kernel_size=1, activation="relu", input_shape=(n_features, 1)))
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
I also reshaped my training set input from (n_samples, n_features) to:
reshaped = X_train.reshape(n_samples, n_features, 1)
to conform with the expected input of Conv1D.
Contrary to normal dense neural networks, this works as though I had manually selected the top-performing features. My question is: why does this work? Replacing the convolution layer with a dense layer completely kills the performance. Does this even have anything to do with feature reduction, or is something else going on entirely?
I thought 2d images use 1x1 convolutions to reduce the channel dimensions of the image - but I only have 1 channel with 1x1 convolution, so what's being reduced? Does setting my 1D convolution layer filters to 128 mean I have selected 128 features which are subsequently fed to the next layer? Are the features selected based on loss back propagation?
I'm having a lot of trouble visualizing what is happening to the information from my features.
Lastly, what if I were to then add another convolution layer down the road? Is there a way to conceptualize what would happen if I added another 1x1 layer? Is it further subsampling of features?
Thank you!
Let's replace the Conv1D layer in your model with a Dense layer of 128 units and observe the summaries of the two models.
Conv Model
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, Sequential
n_features = 1000 # your sequence length
model = Sequential()
model.add(Conv1D(128, kernel_size=1, activation="relu", input_shape=(n_features,1)))
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 1000, 128) 256
_________________________________________________________________
flatten_1 (Flatten) (None, 128000) 0
_________________________________________________________________
dense_8 (Dense) (None, 100) 12800100
_________________________________________________________________
dense_9 (Dense) (None, 1) 101
=================================================================
Total params: 12,800,457
Trainable params: 12,800,457
Non-trainable params: 0
FC Model
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, Sequential
n_features = 1000 # your sequence length
model = Sequential()
model.add(Dense(128, activation="relu", input_shape=(n_features,1)))
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_10 (Dense) (None, 1000, 128) 256
_________________________________________________________________
flatten_2 (Flatten) (None, 128000) 0
_________________________________________________________________
dense_11 (Dense) (None, 100) 12800100
_________________________________________________________________
dense_12 (Dense) (None, 1) 101
=================================================================
Total params: 12,800,457
Trainable params: 12,800,457
Non-trainable params: 0
_____________________________
As you can see both models have an identical number of parameters in each layer. But inherently they are completely different.
Let's say we have an input of length 4 only. A 1x1 convolution with 3 filters will use 3 separate kernels on those 4 inputs, and each kernel will operate on a single element of the input at a time, since we have chosen kernel_size=1. So each kernel is just a single scalar value that is multiplied with the input array one element at a time (with a bias added). The key point is that the 1x1 convolution doesn't look anywhere besides the current input element: it has no spatial freedom and only looks at one input point at a time. (This will become useful below.)
Now, with a dense/FC layer, each neuron is connected to every input, meaning the FC layer has full spatial freedom: it looks everywhere. The equivalent Conv layer would be one with kernel_size=1000 (the actual input length).
So, why might the 1x1 convolution perform better?
Well, it's hard to tell without actually looking at the properties of the data, but one guess is that your features don't have any spatial dependency.
Since the features are ordered arbitrarily, looking at many input features at once doesn't help the model; it just picks up extra noise. This could be why you get better performance with a Conv layer that looks at only one feature at a time, rather than an FC layer that looks at all of them and mixes them.
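The "one scalar weight per filter, one input element at a time" behaviour of a kernel_size=1 convolution can be verified directly with numpy (single input channel, as in the question; the weights w and biases b here are made up for illustration):

```python
import numpy as np

x = np.arange(5, dtype=float)       # input of length 5, one channel
w = np.array([2.0, -1.0, 0.5])      # one scalar kernel per filter (3 filters)
b = np.array([0.0, 1.0, 0.0])       # one bias per filter

# kernel_size=1 means output[i, f] = x[i] * w[f] + b[f]:
# no mixing across positions, just an outer product plus bias.
out = x[:, None] * w[None, :] + b   # shape (5, 3)
```

Each output position depends only on the single input element at the same position, which is exactly the "no spatial freedom" property described above.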

Want to check Intermediate Operations inside Keras Layer

I am facing floating-point resolution loss during the convolution operation when porting my code to an embedded processor that supports only half precision, so I want to test the intermediate operations performed layer by layer in my Keras model, which performs well at full precision on my desktop.
In the following snippet I want to compute the 1D convolution on input data of shape 1500x3. The kernel size is 10 and the kernel shape is (10, 3, 16).
To compute the 1D convolution, Keras expands the dimensions of the input, adding one more dimension to make it suitable for a 2D convolution. Then a series of operations is called: Conv2D, followed by Squeeze and finally BiasAdd. The output of the Conv1D layer is then pushed into the conv1d_20/Elu layer.
Please see the attached picture for a full description of the operations involved.
Now, I want to test the output well before the final output of the layer is produced. Please see the code below:
Input_sequence = keras.layers.Input(shape=(1500,3))
encoder_conv1 = keras.layers.Conv1D(filters=16, kernel_size=10, padding='same', activation=tf.nn.elu)(Input_sequence)
The Model summary shows:
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 1500, 3)] 0
_________________________________________________________________
conv1d_20 (Conv1D) (None, 1500, 16) 496
I want to define the model output at conv1d_20/Conv2D, but that gives me an error. The following, however, is accepted at compilation:
encoder = keras.Model(inputs=autoencoder.input, outputs=autoencoder.get_layer('conv1d_20').output)
encoder.get_output_at(0)
It outputs
<tf.Tensor 'conv1d_20/Elu:0' shape=(?, 1500, 16) dtype=float32>
I want to test the output of the Conv2D operation, but this produces the output of conv1d_20/Elu. How can I run this test? Please help me.
Conv1D operation
You can disable the bias (use_bias=False) and the activation function (activation=None) when defining the Conv1D operation, so that the layer's output is the raw convolution result:
Input_sequence = keras.layers.Input(shape=(1500, 3))
encoder_conv1 = keras.layers.Conv1D(filters=16, kernel_size=10,
                                    padding='same', use_bias=False,
                                    activation=None)(Input_sequence)
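To see the resolution loss itself, one output position of such a bias-free convolution (kernel size 10, 3 input channels) can be accumulated in both precisions with numpy. This is a sketch of the float32-vs-float16 comparison, not the exact Keras kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

# One receptive field of the Conv1D: kernel_size=10, 3 input channels
patch = rng.standard_normal((10, 3)).astype(np.float32)
kernel = rng.standard_normal((10, 3)).astype(np.float32)

full = np.sum(patch * kernel, dtype=np.float32)                  # float32 reference
half = np.sum(patch.astype(np.float16) * kernel.astype(np.float16),
              dtype=np.float16)                                  # half precision

error = abs(float(full) - float(half))  # drift introduced by float16
```

Comparing such per-layer references against the embedded processor's outputs is one way to locate where the half-precision drift becomes significant.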