cost function after converting tf.layers to tf.keras.layers - tensorflow

I have a CNN whose output has shape [None, 10].
It is a multi-label problem: the output indicates the categories that x might belong to (e.g., an image can be classified as "cat", "dark", and so on).
Below is what I have now. How can I change the code to the Keras version?
I can't find an equivalent of sigmoid_cross_entropy_with_logits.
model = tf.layers.dense(L3, category_num, activation=None)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=Y)
cost = tf.reduce_mean(tf.reduce_sum(cross_entropy, axis=1))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

The direct alternative in Keras is to use a sigmoid activation in your output layer and binary_crossentropy as the cost function.
net.add(Dense(..., activation='sigmoid'))
net.compile(optimizer, loss='binary_crossentropy')
Take a look at https://github.com/keras-team/keras/issues/741
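To see why sigmoid plus binary_crossentropy corresponds to the original graph code, here is a minimal pure-Python sketch (no TensorFlow needed) of the numerically stable formula documented for tf.nn.sigmoid_cross_entropy_with_logits. The helper names and the sample numbers are illustrative assumptions, not part of either API. It also shows one difference worth knowing: the original code sums over the label axis, whereas Keras' binary_crossentropy averages over it, so the two costs differ by a factor equal to the number of labels.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_xent_with_logits(logit, label):
    # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def binary_xent(prob, label):
    # Textbook binary cross-entropy evaluated on a probability
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))

# One example with three labels (made-up values)
logits = [2.0, -1.0, 0.5]
labels = [1.0, 0.0, 1.0]

per_label = [sigmoid_xent_with_logits(x, z) for x, z in zip(logits, labels)]
via_probs = [binary_xent(sigmoid(x), z) for x, z in zip(logits, labels)]

summed = sum(per_label)             # analogue of reduce_sum(..., axis=1)
averaged = summed / len(per_label)  # analogue of a mean over the label axis
```

If the absolute scale of the loss matters (for example when reusing a tuned learning rate), the averaged value can be multiplied by the number of labels to recover the summed one.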

In Keras:
# your model here -- last layer:
model.add(Dense(10))
model.add(Activation('sigmoid'))
# binary_crossentropy treats each of the 10 sigmoid outputs as an independent yes/no label
model.compile(loss='binary_crossentropy',
              optimizer="adam", metrics=['accuracy'])

Related

CNN with LSTM-Layer

I have implemented a CNN with an LSTM layer. My input consists of four images. The images were transformed into a tensor by feature extraction. The input shape is (4,256,256,3).
The following is the structure of my model:
model = keras.models.Sequential()
model.add(TimeDistributed(Conv2D(32,(3,3),padding = 'same', activation = 'relu'),input_shape = (4,256,256,3)))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(64,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((4,4))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(128,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(128, activation='tanh'))# finalize with standard Dense, Dropout...
model.add(Dense(64, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1, activation='relu'))
optim = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optim, loss=['MSE'])
history = model.fit(x=X, y=Y, batch_size=4, epochs=5, validation_split=0.2, validation_data=(X,Y))
My problem is that my model predicts the same values for all inputs.
What could be the problem?
You use the same data for training and validation, which defeats the purpose of validation (when both arguments are given, validation_data overrides validation_split). Perhaps the mistake lies there. Try splitting the data, or apply cross-validation.
Also, applying the relu activation to the last layer in combination with the MSE loss looks strange. At the very least, relu can produce unbounded outputs, so the data should be normalized.
I hope this will help you
If you are working on a classification problem, specifically binary classification, then use a sigmoid activation (rather than softmax) in the last layer. MSE loss is also not a good choice for binary classification.
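One way a relu output unit can end up predicting the same value for every input is the "dead relu" case: once the pre-activation is negative for all inputs, the output is 0 and no gradient flows back through the unit. This is only one possible explanation for the symptom, sketched here with made-up numbers and hypothetical helper names.

```python
def relu(z):
    return max(z, 0.0)

def relu_grad(z):
    # Derivative of relu: 1 for positive input, 0 otherwise
    return 1.0 if z > 0 else 0.0

def mse_grad_wrt_preactivation(z, target):
    # d/dz (relu(z) - target)^2 = 2 * (relu(z) - target) * relu'(z)
    return 2.0 * (relu(z) - target) * relu_grad(z)

# Pre-activations of the final unit for three different inputs (all negative)
pre_activations = [-0.7, -2.3, -0.1]
targets = [1.5, 0.8, 2.0]

predictions = [relu(z) for z in pre_activations]
gradients = [mse_grad_wrt_preactivation(z, t)
             for z, t in zip(pre_activations, targets)]
```

Every prediction is 0 and every gradient is 0, so the unit cannot recover by gradient descent alone; a linear output layer avoids this for regression targets.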

weighted loss function for multilabel classification

I am working on a multilabel classification problem for images. I have 5 classes and I am using a sigmoid for the last classification layer. My data is imbalanced because of the multilabel setup, and I thought I could use:
tf.nn.weighted_cross_entropy_with_logits( labels, logits, pos_weight, name=None)
However, I don't know how to get the logits from my model. I also think I shouldn't use a sigmoid in the last layer, since this loss function applies the sigmoid to the logits itself.
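First of all, I suggest you have a look at the TensorFlow tutorial for classification on imbalanced datasets. However, keep in mind that this tutorial is for binary classification and uses a sigmoid as the activation of the last dense layer. Note that a softmax activation is meant for multi-class classification, where exactly one class is correct; for a multi-label problem, where several classes can be active at once, the outputs should stay independent sigmoids. The example below follows the multi-class pattern, so treat it as a way of working with logits rather than a drop-in multi-label solution.
The softmax function normalizes a set of N real numbers into a probability distribution, so that they sum to 1.
For K = 2 classes, softmax and sigmoid are equivalent: the softmax over the logits [x, 0] gives sigmoid(x) for the first class.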
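The relationship between the two-class softmax and the sigmoid, and the difference between the two activations for more than two outputs, can be checked numerically. This is a minimal pure-Python sketch; the helper names and logit values are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

x = 1.3
two_class = softmax([x, 0.0])  # first entry equals sigmoid(x)

logits = [2.0, 1.0, -1.0]
multi_class = softmax(logits)               # entries compete and sum to 1
multi_label = [sigmoid(v) for v in logits]  # entries are independent
```

With softmax, raising one class probability lowers the others; with per-output sigmoids, several labels can be close to 1 at the same time, which is what a multi-label problem needs.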
I don't know your model, but you could create something like this (following the tutorial):
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=None)
])
To obtain the predictions you could do:
predictions = model(x_train[:1]).numpy() # obtains the prediction logits
tf.nn.softmax(predictions).numpy() # converts the logits to probabilities
In order to train you can define the following loss, compile the model, and train:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
Now, since you have an imbalanced dataset and want to add weights: if you look at the documentation of SparseCategoricalCrossentropy, you can see that the __call__ method has an optional parameter sample_weight:
Optional sample_weight acts as a coefficient for the loss. If a scalar
is provided, then the loss is simply scaled by the given value. If
sample_weight is a tensor of size [batch_size], then the total loss
for each sample of the batch is rescaled by the corresponding element
in the sample_weight vector.
I suggest you have a look at this answer if you have doubts on how to proceed. I think it answers perfectly what you want to achieve.
Also I find that this tutorial explains pretty well the multi-label classification problem.
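Regarding the original question about tf.nn.weighted_cross_entropy_with_logits: that function expects raw logits, so the last Dense layer would use activation=None and the sigmoid would be applied only when predictions are needed. The sketch below reproduces, in plain Python, the formula given in the TensorFlow documentation for that function, so the effect of pos_weight can be inspected without a model. The helper names and sample values are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_xent_with_logits(label, logit, pos_weight):
    # Documented stable form:
    #   (1 - z) * x + l * (log(1 + exp(-|x|)) + max(-x, 0)),
    #   with l = 1 + (pos_weight - 1) * z
    scale = 1.0 + (pos_weight - 1.0) * label
    softplus_neg = math.log1p(math.exp(-abs(logit))) + max(-logit, 0.0)
    return (1.0 - label) * logit + scale * softplus_neg

def naive_weighted_xent(label, logit, pos_weight):
    # Direct definition: only the positive term is multiplied by pos_weight
    p = sigmoid(logit)
    return -(pos_weight * label * math.log(p)
             + (1.0 - label) * math.log(1.0 - p))

logit = -0.8
loss_pos_plain = weighted_xent_with_logits(1.0, logit, pos_weight=1.0)
loss_pos_weighted = weighted_xent_with_logits(1.0, logit, pos_weight=3.0)
loss_neg_plain = weighted_xent_with_logits(0.0, logit, pos_weight=1.0)
loss_neg_weighted = weighted_xent_with_logits(0.0, logit, pos_weight=3.0)
```

A pos_weight above 1 scales the loss of positive labels only, which penalizes missed positives more heavily; negative labels are unaffected.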

One back-propagation pass in keras [duplicate]

I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
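from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
import numpy as np       # used below to build a random input example
import tensorflow as tf  # used below to evaluate the gradient tensors in a session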
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables with respect to which the gradient is taken.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually run the gradients given an input, we need to use a bit of Tensorflow.
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())  # replaces the deprecated tf.initialize_all_variables()
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And that's it!
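As a sanity check on what k.gradients(outputTensor, listOfVariableTensors) computes, namely the gradient of the output (not the loss) with respect to each weight, here is a framework-free sketch for a single sigmoid unit. The analytic gradient y * (1 - y) * x_i is compared against a central finite difference. The unit, its weights, and the input are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def analytic_gradient(weights, bias, inputs):
    # d sigmoid(w.x + b) / d w_i = y * (1 - y) * x_i
    y = unit_output(weights, bias, inputs)
    return [y * (1.0 - y) * x for x in inputs]

def numeric_gradient(weights, bias, inputs, eps=1e-6):
    # Central finite difference, one weight at a time
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        diff = unit_output(up, bias, inputs) - unit_output(down, bias, inputs)
        grads.append(diff / (2.0 * eps))
    return grads

weights = [0.4, -0.3, 0.8]
bias = 0.1
inputs = [1.0, 2.0, -0.5]

exact = analytic_gradient(weights, bias, inputs)
approx = numeric_gradient(weights, bias, inputs)
```

The same finite-difference comparison is a cheap way to validate gradients returned by any framework on a small model.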
The answer below uses the cross-entropy loss; feel free to swap in your own function.
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(labels, outputTensor)  # Keras losses take (y_true, y_pred), in that order
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
print(evaluated_gradients)

Why does the loss of vgg16 equal nan, but performs normally when adding an extra Softmax layer?

I am coding a vgg16 net with the TensorFlow low-level API. The model is tested on the imagenet12 dataset. Due to the computation cost, I split the validation set into 80% training data and 20% test data.
First, the last layer fc8 outputs raw values without a softmax activation, and I use tf.nn.softmax_cross_entropy_with_logits_v2(labels, logits) to compute the loss. The loss eventually becomes nan during training.
Then I tried adding a softmax layer after fc8, while still using tf.nn.softmax_cross_entropy_with_logits_v2(labels, logits) to compute the loss. Surprisingly, the loss is then computed normally instead of being nan.
Here is the code before adding softmax layer:
def vgg16():
    ...
    fc8_layer = FullConnectedLayer(y, self.weight_dict, regularizer_fc=self.regularizer_fc)
    self.op_logits = fc8_layer.layer_output
def loss(self):
    entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=self.Y, logits=self.op_logits)
    l2_loss = tf.losses.get_regularization_loss()
    self.op_loss = tf.reduce_mean(entropy, name='loss') + l2_loss
Then I changed the vgg16 output to:
def vgg16():
    ...
    fc8_layer = FullConnectedLayer(y, self.weight_dict, regularizer_fc=self.regularizer_fc)
    self.op_logits = tf.nn.softmax(fc8_layer.layer_output)
Besides, here is my optimizer:
def optimize(self):
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
            self.opt = tf.train.MomentumOptimizer(learning_rate=self.config.learning_rate, momentum=0.9,
                                                  use_nesterov=True)
            self.op_opt = self.opt.minimize(self.op_loss)
I don't understand why adding a softmax layer works. As I understand it, two softmax layers won't affect the final loss, since the extra softmax doesn't change the proportions between the output units.
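One point in the reasoning above can be checked directly: applying softmax twice does not leave the distribution unchanged. The first softmax squeezes the values into [0, 1], so the second one (applied inside softmax_cross_entropy_with_logits_v2) sees inputs that differ by at most 1 and returns a much flatter distribution. That bounded input is a plausible reason why the loss no longer blows up to nan, although it also means the network is trained against a distorted objective. A minimal pure-Python sketch with made-up logits:

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [10.0, 2.0, -5.0]

once = softmax(logits)  # sharply peaked on the first class
twice = softmax(once)   # same ranking, but much flatter
```

The ranking of the classes survives, but the confidence does not, so the nan in the single-softmax setup is more likely a symptom of diverging logits (for example from a learning rate that is too high) than something the extra softmax genuinely fixes.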

Keras dense layer outputs are 'nan'

I'm using Keras to build an RNN model with CTC loss.
I found that when I pass a tensor to a Dense layer with activation=None, the outputs of this layer are all nan.
But when I set activation='softmax', the outputs are normal, not nan.
Problem code (the elements of logits are all nan):
logits = Dense(out_shape, activation = None, name="logits")(x_permute)#x_permute is a tensor with shape (?,1876,96)
loss_ctc = Lambda(ctc_lambda_func, name='ctc_my')(
[logits, labels, x_len, lab_len])
model = Model(inputs=[x, labels, x_len, lab_len], outputs=[loss_ctc])
model.compile(loss={'ctc_my': lambda y_true,y_pred: y_pred}, optimizer='adadelta')
Working code (the elements of logits are not nan):
logits = Dense(out_shape, activation = None, name="logits")(x_permute)#x_permute is a tensor with shape (?,1876,96)
output = Activation(activation="softmax", name="softmax")(logits)
loss_ctc = Lambda(ctc_lambda_func, name='ctc_my')(
[output, labels, x_len, lab_len])
model = Model(inputs=[x, labels, x_len, lab_len], outputs=[loss_ctc])
model.compile(loss={'ctc_my': lambda y_true,y_pred: y_pred}, optimizer='adadelta')
def ctc_lambda_func(args):
    y_pred, y_true, input_length, label_length = args
    return ctc_batch_cost(y_true, y_pred, input_length, label_length)
Can anyone help? Many thanks.
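I may be misunderstanding you, but why would you want activation=None?
Maybe what you want is a linear activation? Note that in Keras, activation=None already means that no activation is applied, i.e. a linear activation a(x) = x.
Have a look at the Keras activation functions.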
As per Klemen Grm:
your neural network is completely linear. You might consider different activation functions (eg: tanh, sigmoid, linear) for your hidden and output layers. This both lets you constrain the output range, and will probably improve the learning properties of your network.
In addition to what Klemen says, for the last one you want a softmax,
that normalizes the outputs into probabilities.
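A likely explanation for the nan values, offered here as an assumption rather than a confirmed diagnosis: Keras' ctc_batch_cost is documented as taking the softmax output as y_pred and works with its logarithm internally, so raw logits, which can be negative, lead to the logarithm of a negative number. The plain-Python sketch below mimics that with made-up values; safe_log is a hypothetical helper that maps the domain error to nan, the way array libraries do.

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def safe_log(value):
    # math.log raises ValueError for non-positive input; return nan instead
    try:
        return math.log(value)
    except ValueError:
        return float("nan")

raw_logits = [1.2, -0.4, -3.0]

log_of_logits = [safe_log(v) for v in raw_logits]          # nan for negatives
log_of_probs = [safe_log(p) for p in softmax(raw_logits)]  # always finite
```

Once a nan enters the loss, the gradients and then the layer weights become nan as well, which would match every element of the logits being nan after training starts.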
Neural networks have to implement complex mapping functions, so they need non-linear activation functions; the non-linearity is what enables them to approximate arbitrary functions. A neuron without an activation function is equivalent to a neuron with a linear activation function.
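The claim that a network without non-linear activations is "completely linear" can be verified in a few lines: two stacked linear layers give the same result as one layer whose weight matrix is the product of the two. This is a minimal pure-Python sketch with made-up weights; biases are omitted for brevity (they fold into a single bias in the same way).

```python
def matvec(matrix, vector):
    # Matrix-vector product for a list-of-rows matrix
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    # Matrix-matrix product for list-of-rows matrices
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

w1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # first layer: 2 inputs -> 3 units
w2 = [[2.0, -1.0, 0.5]]                     # second layer: 3 units -> 1 output
x = [0.7, -1.2]

two_layers = matvec(w2, matvec(w1, x))  # apply the layers one after another
one_layer = matvec(matmul(w2, w1), x)   # apply the single collapsed layer
```

Adding a non-linearity such as relu or tanh between the layers breaks this equivalence, which is what gives depth its extra expressive power.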