I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables that are in respect to the gradient.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually run the gradients given an input, we need to use a bit of Tensorflow.
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And thats it!
The below answer is with the cross entropy function, feel free to change it your function.
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(outputTensor, labels)
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
print(evaluated_gradients)
Related
I have implemented a CNN with an LSTM layer. My input consists of four images. The images were transformed into a tensor by feature extraction. The input shape is (4,256,256,3).
The following is the structure of my model:
model = keras.models.Sequential()
model.add(TimeDistributed(Conv2D(32,(3,3),padding = 'same', activation = 'relu'),input_shape = (4,256,256,3)))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(64,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((4,4))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(128,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(128, activation='tanh'))# finalize with standard Dense, Dropout...
model.add(Dense(64, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1, activation='relu'))
optim = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optim, loss=['MSE'])
history = model.fit(x=X, y=Y, batch_size=4, epochs=5, validation_split=0.2, validation_data=(X,Y))
My problem is that my model predicts the same values for all inputs.
What could be the problem?
you use the same data for training and validation. this kills the whole point of validation. Perhaps the mistake lies in this. Try to split the data, or apply cross validation.
Also, the application of the relu activation function to the last layer in combination with the mse error looks strange. At least the real can give an unlimited result, and the data should be normalized.
I hope this will help you
if you are working with a classification problem specifically binary classification, then using sigmoid activation instead softmax And MSE loss is not a good choice for binary classification.
I am trying to a resnet-50 model in tensorflow by cifar-100 dataset.I have used builtin resnet_v1_50 to create model in tensorflow with two fully connected layer on it's head.But my validation accuracy stuck at nearly 37%.What is the problem???am I configure wrongly define and configure resnet_v1_50??? my model creation code is given below.
import tensorflow as tf
from tensorflow.contrib.slim.python.slim.nets import resnet_v1
X = tf.placeholder(dtype=tf.float32, shape=[None, 32, 32, 3])
Y = tf.placeholder(dtype=tf.float32, shape=[None, 100])
net, end_points = resnet_v1.resnet_v1_50(X,global_pool=False,is_training=True)
flattened = tf.contrib.layers.flatten(net)
dense_fc1 = tf.layers.dense(inputs=flattened,units=625, activation=tf.nn.relu,kernel_initializer=tf.contrib.layers.xavier_initializer())
dropout_fc1 = tf.layers.dropout(inputs=dense_fc1,rate=0.5, training=self.training)
logits = tf.layers.dense(inputs=dropout_fc1, units=num_classes,kernel_initializer = tf.contrib.layers.xavier_initializer())
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
I think you have an extra dense layer. ResNet uses single fully-connected layer with softmax and size=num_classes.
You might also need to make sure that your hyperparameters are set correctly, like learning_rate and weight_decay and your input processing pipeline is also correct.
Here is an extra link to see if your pipeline is similar to a working solution.
I am using a bidirectional RNN in Keras and need to use Tensoflows LazyAdamOptimizer. I need to do Gradient Normalization. How can I implement gradient normalization with tensorflows LazyAdamOptimizer and than use the functional keras model further on?
I am training a unsupervised RNN to predict a input sequence of lenght 10. The Problem is, that i am using a keras functional model. Because of the sparsity of the embedding layer i need to use Tensorflows LazyAdamOptimizer, which is not a default optimizer in keras. When using a default keras optimizer i can do gradient normalization just by setting the argument 'clipnorm=1' in the optimizer function. Because i am using LazyAdam i need to do this with tensorflow and than pass it back to my keras model, but i can't get the code going.
#model architecture
model_input = Input(shape=(seq_len, ))
embedding_a = Embedding(len(port_fwd_dict), 50, input_length=seq_len, mask_zero=True)(model_input)
lstm_a = Bidirectional(GRU(25, return_sequences=True,implementation=2, reset_after=True, recurrent_activation='sigmoid'), merge_mode="concat (embedding_a)
dropout_a = Dropout(0.2)(lstm_a)
lstm_b = Bidirectional(GRU(25, return_sequences=False, activation="relu", implementation=2, reset_after=True, recurrent_activation='sigmoid'), merge_mode="concat")(dropout_a)
dropout_b = Dropout(0.2)(lstm_b)
dense_layer = Dense(100, activation="linear")(dropout_b)
dropout_c = Dropout(0.2)(dense_layer)
model_output = Dense(len(port_fwd_dict)-1, activation="softmax(dropout_c)
# trying to implement gradient normalization
optimizer = tf.contrib.opt.LazyAdamOptimizer()
optimizer = tf.contrib.estimator.clip_gradients_by_norm(optimizer, 1)
loss = tf.reduce_mean(categorical_crossentropy(model_input, model_output))
train_op = optimizer.minimize(loss, tf.train.get_global_step())
model = Model(inputs=model_input, outputs=model_output)
model.compile(optimizer=train_op, loss='categorical_crossentropie', metrics = [ 'categorical_accuracy'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_split=validation_split, class_weight = 'auto')
Blockquote
I get the following Error: NameError: name 'categorical_crossentropy' is not defined
But even if this error is solved, i do not know if this code will work. Because I need to use the keras function model.compile and in this function there need to be a loss specified. but when i do this in the tensorflow part above, it is not working.
I need a way to do gradient normalization and use my normal keras functional model?!
maybe you can try my implement of lazy optimizer:
https://github.com/bojone/keras_lazyoptimizer
It is a pure keras implement, wrapping a existing optimizer to be a lazy version.
I have a simple frozen tensorflow model (frozen in Keras) that I load and then try to use for prediction. I do this first in python (code below), and then using C and libtensorflow (and get the same results). The examples I have found provide the the logits (before activation) as the final output, rather than the class label after activation. Is there a way to obtain the label through the graph itself?
I understand I can operate the sigmoid / softmax operators on the logits, but that's not what I want to do. (I'm porting the code to use the libtensorflow C api, and would prefer to let the graph do the math.)
My understanding is that the session runs the graph to the operation / tensor, and stops before that operation. Is there a way to get the operation after the activation?
Keras Model:
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(21,)))
model.add(Dense(1, activation='sigmoid'))
Tensorflow code to load frozen model and predict:
from tensorflow.python.platform import gfile
with tf.Session() as sess:
with gfile.FastGFile('slopemodel/slopemodel.pb', 'rb') as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
sess.graph.as_default()
g_in = tf.import_graph_def(graph_def)
tensor_output = sess.graph.get_tensor_by_name('import/dense_2/Sigmoid:0')
tensor_input = sess.graph.get_tensor_by_name('import/dense_1_input:0')
predictions = sess.run(tensor_output, {tensor_input:sample})
print(predictions)
Truncated list of important nodes in the graph:
['import/dense_1_input',
'import/dense_1/kernel',
'import/dense_1/kernel/read',
'import/dense_1/bias',
'import/dense_1/bias/read',
'import/dense_1/MatMul',
'import/dense_1/BiasAdd',
'import/dense_1/Relu',
'import/dense_2/kernel',
'import/dense_2/kernel/read',
'import/dense_2/bias',
'import/dense_2/bias/read',
'import/dense_2/MatMul',
'import/dense_2/BiasAdd',
'import/dense_2/Sigmoid',
'import/Adam/iterations',
.
.
.]
Yes, you simply need to change the tensor_output to the one you want to obtain. Note that you will not receive the labels themselves but a one-hot vector from which you'll need to find the corresponding label yourself.
I have a CNN where output dimension is [None, 10]
It is a multi-label problem, where output signifies possible categories which x might belong. (eg, an image can be classified as cat dark and so on)
Following is what I have now, how can I change the code to keras version?
I can't find equivalent of sigmoid_cross_entropy_with_logits
model = tf.layers.dense(L3, category_num, activation=None)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=Y)
cost = tf.reduce_mean(tf.reduce_sum(cross_entropy, axis=1))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
Direct alternative in Keras is to use sigmoid activation in your output layer and binary_crossentropy as cost function.
net.add(Dense(..., activation='sigmoid'))
net.compile(optimizer, loss='binary_crossentropy')
Take a look https://github.com/keras-team/keras/issues/741
In Keras:
#you model here -- last layer:
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.compile(loss='categorical_crossentropy',
optimizer="adam",metrics=['accuracy'])