What's the default activation function of CudnnLSTM in TensorFlow? How can I set an activation function such as ReLU? Or maybe it's just linear? I read the documentation, but I did not find it.
For example, the code is below:
lstmcell = tf.contrib.cudnn_rnn.CudnnLSTM(1, encoder_size, direction="bidirectional")
hq, _ = lstmcell(query)
And I read the TensorFlow documentation from this link.
The __init__ signature is below:
__init__(
    num_layers,
    num_units,
    input_mode=CUDNN_INPUT_LINEAR_MODE,
    direction=CUDNN_RNN_UNIDIRECTION,
    dropout=0.0,
    seed=None,
    dtype=tf.float32,
    kernel_initializer=None,
    bias_initializer=None,
    name=None
)
There is no keyword argument to set the activation, such as activation="tanh", like there is in tf.nn.rnn_cell.LSTMCell.
So what's the default activation function of CudnnLSTM in TensorFlow, and how can I change it to leaky_relu?
tf.contrib.cudnn_rnn.CudnnLSTM(): Tanh
This is given in the Keras GitHub:
https://github.com/keras-team/keras/issues/8510#issuecomment-429255318
and in the NVIDIA documentation:
https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/
To answer the OP's second question, which was edited in later: there is currently no way to set a custom activation function for CudnnLSTM or CudnnGRU.
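If you really need a different activation such as leaky_relu, a common fallback (just a sketch, not from the original answer; encoder_size and query are assumed to be defined as in the question) is the regular, non-cuDNN LSTMCell, which does expose an activation argument:
import tensorflow as tf

# Sketch only: CudnnLSTM has no activation argument, so fall back to the
# standard LSTMCell (slower, but configurable) for a custom activation.
cell_fw = tf.nn.rnn_cell.LSTMCell(encoder_size, activation=tf.nn.leaky_relu)
cell_bw = tf.nn.rnn_cell.LSTMCell(encoder_size, activation=tf.nn.leaky_relu)
(hq_fw, hq_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, query, dtype=tf.float32)
hq = tf.concat([hq_fw, hq_bw], axis=-1)  # mirrors direction="bidirectional" output
The trade-off is speed: the cuDNN kernels are fused and fast precisely because the activation is fixed.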
Related
How do you change the activation function in YOLOv7? For example, the default activation function of YOLOv7 (except the tiny variant) is SiLU according to the YOLOv7 paper, and I want to change it to ReLU or another activation function.
I found a solution, but it's for YOLOv5.
I want to use ResNet50 with Imagenet weights.
The last layer of ResNet50 is (from here)
x = layers.Dense(1000, activation='softmax', name='fc1000')(x)
I need to keep the weights of this layer but remove the softmax function.
I want to manually change it so my last layer looks like this
x = layers.Dense(1000, name='fc1000')(x)
but the weights stay the same.
Currently I build my net like this:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input
from tensorflow.keras.applications import ResNet50

resnet = Sequential([
    Input(shape=(224, 224, 3)),
    ResNet50(weights='imagenet', input_shape=(224, 224, 3))
])
I need the Input layer because otherwise model.compile complains that placeholders aren't filled.
Generally there are two ways of achieving this:
Quick way - supported functions:
To change the final layer's activation function, you can pass the classifier_activation argument.
So in order to get rid of the activation altogether, the model can be built like this:
import tensorflow as tf

resnet = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.applications.ResNet50(
        weights='imagenet',
        input_shape=(224, 224, 3),
        pooling="avg",
        classifier_activation=None
    )
])
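As a quick sanity check (not part of the original answer), the outputs are now raw logits rather than probabilities, so they will generally not sum to 1:
import numpy as np

# Sketch: with classifier_activation=None the model returns logits, so the
# 1000 output values for a random input should not sum to 1 the way softmax
# probabilities would.
dummy = np.random.rand(1, 224, 224, 3).astype("float32")
logits = resnet.predict(dummy)
print(logits.shape)         # (1, 1000)
print(float(logits.sum()))  # usually far from 1.0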
This, however, is not going to work if you want a different function that is not supported by the Keras classifier_activation parameter (e.g. a custom activation function).
To achieve that, you can use the workaround solution:
Long way - copy the model's weights
This solution proposes copying the original model's weights onto your custom one. This approach works because, apart from the activation function, you are not changing the model's architecture.
You need to:
1. Download original model.
2. Save its weights.
3. Declare your modified version of the model (in your case, without the activation function).
4. Set the weights of the new model.
The snippet below explains this concept in more detail:
import tensorflow as tf

# 1. Download the original resnet
resnet = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.applications.ResNet50(
        weights='imagenet',
        input_shape=(224, 224, 3),
        pooling="avg"
    )
])

# 2. Hold the weights in memory
imagenet_weights = resnet.get_weights()

# 3. Declare the model, but without softmax
resnet_no_softmax = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.applications.ResNet50(
        include_top=False,
        weights='imagenet',
        input_shape=(224, 224, 3),
        pooling="avg"
    ),
    tf.keras.layers.Dense(1000, name='fc1000')
])

# 4. Pass the ImageNet weights onto the second resnet
resnet_no_softmax.set_weights(imagenet_weights)
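A quick way to verify the copy worked (a sketch, not from the original answer, continuing from the snippet above) is to confirm that applying softmax to the new model's logits reproduces the original model's probabilities:
import numpy as np

# Sketch: the modified model's logits, passed through softmax, should match
# the original model's output probabilities.
dummy = np.random.rand(1, 224, 224, 3).astype("float32")
probs_original = resnet.predict(dummy)
probs_rebuilt = tf.nn.softmax(resnet_no_softmax.predict(dummy)).numpy()
print(np.allclose(probs_original, probs_rebuilt, atol=1e-5))  # expect True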
Hope this helps!
It seems setting model.trainable=False in TensorFlow Keras does nothing except print a wrong model.summary(). Here is the code to reproduce the issue:
import tensorflow as tf
import numpy as np

IMG_SHAPE = (160, 160, 3)

# Create the base model from the pre-trained model MobileNetV2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False
# for layer in base_model.layers:
#     layer.trainable = False

bc = []  # before compile
ac = []  # after compile

for layer in base_model.layers:
    bc.append(layer.trainable)
print(np.all(bc))  # True

print(base_model.summary())  # this shows no trainable parameters, but that is wrong given the output of the previous np.all(bc)

base_model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.001),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])

for layer in base_model.layers:
    ac.append(layer.trainable)
print(np.all(ac))  # True

print(base_model.summary())  # this also shows no trainable parameters, but that is wrong given the output of the previous np.all(ac)
In light of this, what is the expected behavior and purpose of model.trainable=False in TensorFlow Keras?
I think this issue could help: https://github.com/tensorflow/tensorflow/issues/29535
If you are looking for a way to not update some weights in your model, I would suggest using the var_list parameter of your optimizer's minimize function.
For some reason, when creating a model from Keras, TensorFlow switches all tf.Variables to trainable=True, and since they are all tensors we are not able to flip the value back to False.
What I do in my code is create scope names for all pretrained models and loop over the trainable variables, collecting every layer that is not from my pretrained model:
trainable_variables = []
variables_collection = tf.get_collection('learnable_variables')

for layer in tf.trainable_variables():
    if 'vgg_model' not in layer.name:
        trainable_variables.append(layer)
        tf.add_to_collection('learnable_variables', layer)

grad = tf.train.GradientDescentOptimizer(lr)
train_step = grad.minimize(tf.reduce_sum([loss]), var_list=trainable_variables)
Watch out for the global initializer as well, since it will overwrite your pretrained weights too. You can solve that by using tf.variables_initializer and passing only the list of variables you want to initialize:
sess.run(tf.variables_initializer(variables_collection))
Sources I used when trying to solve this problem:
Is it possible to make a trainable variable not trainable?
TensorFlow: Using tf.global_variables_initializer() after partially loading pre-trained weights
Trying to use non-Keras-backend functions for custom loss calculation in Keras models.
I am trying to make my Keras CNN model use a custom loss function (Kappa score). However, since kappa is not defined in the Keras backend, I need to use a scikit-learn-based kappa implementation. This sklearn function takes arrays of labels as arguments, unlike Keras backend functions which take tensors. The loss function call within Keras mostly sends the tensors y_pred and y_true. I did the implementation below using a guide I found online, but I get errors.
import tensorflow as tf
import keras.backend as K
from sklearn.metrics import cohen_kappa_score

def cohen_kappa_score_func(y_true, y_pred):
    sess = tf.Session()
    with sess.as_default():
        # idea is to convert the tensors to arrays
        score = cohen_kappa_score(type(y_true.eval()), type(y_pred.eval()), weights='linear')
    sess.close()
    return score

# use this later to compile the keras model with the custom loss function as
model.compile(optimizer=optimizers.SGD(lr=0.001, momentum=0.9),
              loss=cohen_kappa_score_func,
              metrics=['categorical_crossentropy', 'mae', 'categorical_accuracy'])
This doesn't work and I get the following error:
"InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'dense_15_target' with dtype float and shape [?,?]
[[node dense_15_target "
Please give me suggestions to solve this.
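For context, a common pattern in TF 1.x for calling numpy/sklearn code on tensor values is tf.py_func. The sketch below is not from the original post (kappa_metric is a hypothetical name) and only illustrates the idea; note that such an op has no gradient, so it is usable as a metric but not as a trainable loss:
import numpy as np
import tensorflow as tf
from sklearn.metrics import cohen_kappa_score

# Sketch only: tf.py_func wraps a numpy-based function so it runs on the
# evaluated tensor values at graph execution time.
def kappa_metric(y_true, y_pred):
    def _kappa(yt, yp):
        return np.float32(cohen_kappa_score(yt.argmax(axis=1),
                                            yp.argmax(axis=1),
                                            weights='linear'))
    return tf.py_func(_kappa, [y_true, y_pred], tf.float32)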
The official Tensorflow API doc claims that the parameter kernel_initializer defaults to None for tf.layers.conv2d and tf.layers.dense.
However, reading the layers tutorial (https://www.tensorflow.org/tutorials/layers), I noted that this parameter is not set in the code. For example:
# Convolutional Layer #1
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
The example code from the tutorial runs without any errors, so I think the default kernel_initializer is not None. So, which initializer is used?
In other code, I did not set the kernel_initializer of the conv2d and dense layers, and everything was fine. However, when I tried to set the kernel_initializer to tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32), I got NaN errors. What is going on here? Can anyone help?
Great question! It is quite a trick to find out!
As you can see, it is not documented in tf.layers.conv2d
If you look at the definition of the function you see that the function calls variable_scope.get_variable:
In code:
self.kernel = vs.get_variable('kernel',
                              shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              regularizer=self.kernel_regularizer,
                              trainable=True,
                              dtype=self.dtype)
Next step: what does the variable scope do when the initializer is None?
Here it says:
If initializer is None (the default), the default initializer passed in
the constructor is used. If that one is None too, we use a new
glorot_uniform_initializer.
So the answer is: it uses the glorot_uniform_initializer
For completeness the definition of this initializer:
The Glorot uniform initializer, also called Xavier uniform initializer.
It draws samples from a uniform distribution within [-limit, limit]
where limit is sqrt(6 / (fan_in + fan_out))
where fan_in is the number of input units in the weight tensor
and fan_out is the number of output units in the weight tensor.
Reference: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
Edit: this is what I found in the code and documentation. Perhaps you could verify that the initialization looks like this by running eval on the weights!
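Here is a minimal sketch of such a check (TF 1.x; the layer shapes are chosen just for illustration): build a conv layer without specifying kernel_initializer, initialize it, and confirm the kernel values lie inside the Glorot uniform bound:
import numpy as np
import tensorflow as tf

# Sketch only: inspect a freshly initialized kernel and check it stays within
# +/- sqrt(6 / (fan_in + fan_out)), as Glorot uniform predicts.
x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
conv = tf.layers.conv2d(x, filters=32, kernel_size=[5, 5], padding="same")

kernel_var = [v for v in tf.trainable_variables() if 'kernel' in v.name][0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    kernel = sess.run(kernel_var)      # shape (5, 5, 1, 32)
    fan_in = 5 * 5 * 1                 # receptive field * input channels
    fan_out = 5 * 5 * 32               # receptive field * output channels
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    print(kernel.min() >= -limit and kernel.max() <= limit)  # expect True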
According to this course by Andrew Ng and the Xavier documentation, if you are using ReLU as the activation function, it is better to change the default weight initializer (which is Xavier uniform) to Xavier normal:
y = tf.layers.conv2d(x, kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False))
2.0 Compatible Answer: Even in TensorFlow 2.0, the default kernel initializer in tf.keras.layers.Conv2D and tf.keras.layers.Dense is glorot_uniform.
This is specified on the TensorFlow.org website.
The link for Conv2D is https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D?version=nightly#init
and the link for Dense is
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense?version=nightly#init
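A quick way to confirm this in TF 2.x (just a sketch) is to inspect a freshly built layer's configuration:
import tensorflow as tf

# Sketch: the default kernel_initializer reported by new layers should be
# GlorotUniform for both Conv2D and Dense.
print(tf.keras.layers.Conv2D(32, 3).kernel_initializer)
print(tf.keras.layers.Dense(10).get_config()["kernel_initializer"])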