It seems that the Keras trainable attribute is ignored by TensorFlow, which makes it very inconvenient to use Keras as a syntactic shortcut in TensorFlow.
For example:
import keras
import tensorflow as tf
import numpy as np
import keras.backend as K
Conv2 = keras.layers.Conv2D(filters=16, kernel_size=3, padding='same')
Conv2.trainable = False  # This layer has been set to not trainable.
A=keras.layers.Input(batch_shape=(1,16,16,3))
B = Conv2(A)
x = np.random.randn(1, 16, 16,3)
y = np.random.randn(1,16, 16, 16)
True_y = tf.placeholder(shape=(1,16,16,16), dtype=tf.float32)
loss = tf.reduce_sum((B - True_y) ** 2)
opt_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
print(tf.trainable_variables())
# [<tf.Variable 'conv2d_1/kernel:0' shape=(3, 3, 3, 16) dtype=float32_ref>, <tf.Variable 'conv2d_1/bias:0' shape=(16,) dtype=float32_ref>]
sess = K.get_session()
for _ in range(10):
    out = sess.run([opt_op, loss], feed_dict={A: x, True_y: y})
    print(out[1])
Output:
5173.94
4968.7754
4785.889
4624.289
4482.1
4357.5757
4249.1504
4155.329
4074.634
4005.6482
This simply means the loss is decreasing and the weights are still being trained, even though the layer was set to not trainable.
I read the blog post "Keras as a simplified interface to TensorFlow", but it mentions nothing about this trainable problem.
Any suggestions are appreciated.
Your conclusion is basically correct. Keras is a wrapper around TensorFlow, but not all Keras functionality transfers directly into TensorFlow, so you need to be careful when you mix Keras and raw TF.
Specifically, in this case, if you want to call the minimize function yourself, you need to specify which variables you want to train on using the var_list argument of minimize.
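For illustration, here is a minimal sketch reusing the names from the question (one possible wiring, not the only one): collect the trainable weights of the layers you actually want to update and pass them via var_list. Keras already honours layer.trainable when it builds layer.trainable_weights, so a frozen layer contributes nothing to the list.
# Sketch: only the variables passed via var_list get updated by the optimizer.
trainable_vars = []
for layer in [Conv2]:  # list every Keras layer in your graph here
    trainable_vars += layer.trainable_weights  # empty for the frozen Conv2
# minimize() complains if var_list is empty, so guard for the all-frozen case.
if trainable_vars:
    opt_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(
        loss, var_list=trainable_vars)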
Related
I am porting a model from PyTorch to Keras/Tensorflow, and I want to make sure I'm using the same algorithm for weight initialization. How do I mimic PyTorch's weight initialization in Keras?
If you read through the PyTorch initialization code, you'll find that the weight initialization algorithm is surprisingly simple. The comment in that code describes it accurately; just read that comment and mimic it.
Here's working Keras / Tensorflow code that mimics it:
import tensorflow as tf
from tensorflow.keras import layers

class PytorchInitialization(tf.keras.initializers.VarianceScaling):
    def __init__(self, seed=None):
        super().__init__(
            scale=1 / 3, mode='fan_in', distribution='uniform', seed=seed)

# Conv layer
conv = layers.Conv2D(32, 3, activation="relu", padding="SAME",
                     input_shape=(28, 28, 1),
                     kernel_initializer=PytorchInitialization(),
                     bias_initializer=PytorchInitialization())
# Dense / linear layer
classifier = layers.Dense(10,
                          kernel_initializer=PytorchInitialization(),
                          bias_initializer=PytorchInitialization())
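As a quick sanity check (my own addition, under the assumption that PyTorch's default is kaiming_uniform_ with a=sqrt(5), i.e. samples drawn from U(-1/sqrt(fan_in), +1/sqrt(fan_in))): the VarianceScaling limit sqrt(3 * scale / fan_in) with scale = 1/3 reduces to exactly that bound.
import numpy as np

fan_in = 3 * 3 * 1  # kernel_h * kernel_w * in_channels for the conv above
w = PytorchInitialization()(shape=(3, 3, 1, 32)).numpy()
bound = np.sqrt(1 / fan_in)  # sqrt(3 * (1/3) / fan_in)
print(w.min() >= -bound, w.max() <= bound)  # expect: True True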
I followed this tutorial to build a siamese network for my problem.
I was using Tensorflow 2.4.1 and have now upgraded to 2.5.
This code worked wonderfully before:
base_cnn = resnet.ResNet50(
    weights="imagenet", input_shape=target_shape + (3,), include_top=False
)
flatten = layers.Flatten()(base_cnn.output)
dense1 = layers.Dense(512, activation="relu")(flatten)
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)
embedding = Model(base_cnn.input, output, name="Embedding")

trainable = False
for layer in base_cnn.layers:
    if layer.name == "conv5_block1_out":
        trainable = True
    layer.trainable = trainable
Now every backbone (ResNet, MobileNet, EfficientNet; I tried them all)
throws warnings like this:
WARNING:tensorflow:
The following Variables were used a Lambda layer's call (tf.nn.convolution_620), but
are not present in its tracked objects:
<tf.Variable 'stem_conv/kernel:0' shape=(3, 3, 3, 48) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
It compiles and seems to fit.
But do we have to initialize the models somewhat differently in 2.5?
Thanks for any pointers!
I'm not sure what the main reason for your issue is, as it isn't generally reproducible. But here are some notes about that warning message. The traceback shown in your question is not from ResNet but from EfficientNet.
Now, we know that the Lambda layer exists so that arbitrary expressions can be used as a Layer when constructing Sequential and Functional API models. Lambda layers are best suited for simple operations or quick experimentation. While it is possible to use Variables with Lambda layers, this practice is discouraged as it can easily lead to bugs. For example:
import tensorflow as tf
x_input = tf.range(12.).numpy().reshape(-1, 4)
weights = tf.Variable(tf.random.normal((4, 2)), name='w')
bias = tf.ones((1, 2), name='b')
# lambda custom layer
mylayer1 = tf.keras.layers.Lambda(lambda x: tf.add(tf.matmul(x, weights),
                                                   bias), name='lambda1')
mylayer1(x_input)
WARNING:tensorflow:
The following Variables were used a Lambda layer's call (lambda1), but
are not present in its tracked objects:
<tf.Variable 'w:0' shape=(4, 2) dtype=float32, numpy=
array([[-0.753139 , -1.1668463 ],
[-1.3709341 , 0.8887151 ],
[ 0.3157893 , 0.01245957],
[-1.3878908 , -0.38395467]], dtype=float32)>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ -3.903028 , 0.7617702],
[-16.687727 , -1.8367348],
[-29.472424 , -4.43524 ]], dtype=float32)>
This is because the mylayer1 layer doesn't track the tf.Variables directly, so those parameters won't appear in mylayer1.trainable_weights:
mylayer1.trainable_weights
[]
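For contrast, here is a small sketch (my own example, with illustrative names) of the same matmul-plus-bias written as a subclassed Layer; variables created with add_weight in build are tracked properly:
class MyDense(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
    def build(self, input_shape):
        # Weights created here are tracked by the layer automatically.
        self.w = self.add_weight(name='w', shape=(input_shape[-1], self.units),
                                 initializer='random_normal')
        self.b = self.add_weight(name='b', shape=(self.units,),
                                 initializer='ones')
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

mylayer2 = MyDense(2, name='subclassed1')
mylayer2(x_input)
print(mylayer2.trainable_weights)  # now contains both 'w' and 'b'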
In general, Lambda layers are convenient for simple stateless computation, but anything more complex should use a subclassed Layer instead. From your traceback, it seems like such a scenario may apply to the stem_conv layer:
from tensorflow.keras.applications import EfficientNetB0

for layer in EfficientNetB0(weights=None).layers:
    if layer.name == 'stem_conv':
        print(layer)
<tensorflow.python.keras.layers.convolutional.Conv2D object..
A quick survey of the source code of tf.compat.v1.nn.conv2d leads to a lambda expression that might be the cause.
There is no need to revert back to TF 2.4.1 here. I would always recommend trying the latest version because it addresses many performance issues and adds new features.
I was able to execute the above code without any issues in TF 2.5:
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model
img_width, img_height = 224, 224
target_shape = (img_width, img_height, 3)
base_cnn = ResNet50(
    weights="imagenet", input_shape=target_shape, include_top=False
)
flatten = layers.Flatten()(base_cnn.output)
dense1 = layers.Dense(512, activation="relu")(flatten)
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)
embedding = Model(base_cnn.input, output, name="Embedding")
trainable = False
for layer in base_cnn.layers:
    if layer.name == "conv5_block1_out":
        trainable = True
    layer.trainable = trainable
Output:
2.5.0
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94773248/94765736 [==============================] - 1s 0us/step
As per @Olli, restarting the kernel and clearing the session resolved the problem.
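For reference, the programmatic way to clear the session state (my own note; in a notebook you may still need to restart the kernel to free GPU memory):
import tensorflow as tf
tf.keras.backend.clear_session()  # resets the global graph and layer-name counters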
pip install tensorflow==2.3.0 worked for me instead of TF 2.5; I was facing an issue related to using a Lambda layer.
bce = tf.keras.losses.BinaryCrossentropy()
ll=bce(y_test[0], model.predict(X_test[0].reshape(1,-1)))
print(ll)
<tf.Tensor: shape=(), dtype=float32, numpy=0.04165391>
print(model.input)
<tf.Tensor 'dense_1_input:0' shape=(None, 195) dtype=float32>
model.output
<tf.Tensor 'dense_3/Sigmoid:0' shape=(None, 1) dtype=float32>
grads=K.gradients(ll, model.input)[0]
print(grads)
None
So here I have trained a neural network with 2 hidden layers; the input has 195 features and the output has size 1. I want to feed the network the validation instances in X_test one by one, with their correct labels in y_test, and for each instance compute the gradients of the output with respect to the input, but grads prints as None. Your help is appreciated.
You can do this using tf.GradientTape. I wrote the following code to learn a sine wave and get its derivative, in the spirit of this question. It should be possible to extend this code to compute partial derivatives.
Importing the needed libraries:
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses
import tensorflow as tf
Create the data:
x = np.linspace(0, 6*np.pi, 2000)
y = np.sin(x)
Defining a Keras NN:
def model_gen(Input_shape):
    X_input = Input(shape=Input_shape)
    X = Dense(units=64, activation='sigmoid')(X_input)
    X = Dense(units=64, activation='sigmoid')(X)
    X = Dense(units=1)(X)
    model = Model(inputs=X_input, outputs=X)
    return model
Training the model:
model = model_gen(Input_shape=(1,))
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.001)
model.compile(loss=losses.mean_squared_error, optimizer=opt)
model.fit(x,y, epochs=200)
To obtain the gradient of the network w.r.t. the input:
x = list(x)
x = tf.constant(x)
with tf.GradientTape() as t:
    t.watch(x)
    y = model(x)
dy_dx = t.gradient(y, x)
dy_dx.numpy()
One can further visualise dy_dx to check how smooth the derivative is. Finally, note that one gets a smoother derivative when using a smooth activation (e.g. sigmoid) instead of ReLU, as noted here.
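For example, a quick sketch of such a visualisation (my own addition), plotting the tape's derivative against the analytic derivative cos(x):
import matplotlib.pyplot as plt

plt.plot(x.numpy(), dy_dx.numpy(), label='dy/dx from GradientTape')
plt.plot(x.numpy(), np.cos(x.numpy()), '--', label='cos(x) (analytic)')
plt.legend()
plt.show()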
I'm trying to obtain the gradients from a keras model. The backend function keras.backend.gradients creates a symbolic function which needs to be evaluated on some specific input. The following code does work for this problem but it makes use of the old tensorflow sessions and in particular of feed_dict.
import numpy as np
import keras
from keras import backend as K
import tensorflow as tf
model = keras.Sequential()
model.add(keras.layers.Dense(16, activation='relu', input_shape = (49, )))
model.add(keras.layers.Dense(11, activation='softmax'))
model.compile(optimizer='rmsprop', loss='mse')
trainingExample = np.random.random((1, 49))
gradients = K.gradients(model.output, model.trainable_weights)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,\
feed_dict={model.input:trainingExample})
sess.close()
How can I rewrite this in TensorFlow 2 style, i.e. without sessions? There is an alternative method described here. However, I don't understand why it should be necessary to provide some explicit output to evaluate the gradients, or how to make the solution work without these outputs.
In TensorFlow 2 you can get gradients very easily using tf.GradientTape.
I am citing the official tutorial code here -
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)
You can find the complete tutorial on the official TensorFlow website: https://www.tensorflow.org/tutorials/quickstart/advanced
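Applied to the model from the question, a minimal sketch (my own adaptation, reusing the same Sequential model and trainingExample) could look like this; the tape records the eager forward pass, so no session or feed_dict is needed:
import numpy as np
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(49,)),
    keras.layers.Dense(11, activation='softmax'),
])
model.compile(optimizer='rmsprop', loss='mse')

trainingExample = np.random.random((1, 49)).astype('float32')
with tf.GradientTape() as tape:
    output = model(trainingExample)  # eager forward pass, no feed_dict
evaluated_gradients = tape.gradient(output, model.trainable_weights)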
I am having an issue. I run the same code on my local machine with a CPU and Tensorflow 1.14.0, and it works fine. However, when I run it on a GPU with Tensorflow 2.0, I get:
CancelledError: [_Derived_]RecvAsync is cancelled. [[{{node Adam/Adam/update/AssignSubVariableOp/_65}}]] [[Reshape_13/_62]] [Op:__inference_distributed_function_3722]
Function call stack: distributed_function
Reproducible code is here:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
import matplotlib.pyplot as plt
%matplotlib inline
batch_size = 32
num_obs = 100
num_cats = 1 # number of categorical features
n_steps = 10 # number of timesteps in each sample
n_numerical_feats = 18 # number of numerical features in each sample
cat_size = 12 # number of unique categories in each categorical feature
embedding_size = 1 # embedding dimension for each categorical feature
labels = np.random.random(size=(num_obs*n_steps,1)).reshape(-1,n_steps,1)
print(labels.shape)
#(100, 10, 1)
#18 numerical variables
num_data = np.random.random(size=(num_obs*n_steps,n_numerical_feats))
print(num_data.shape)
#(1000, 18)
#Reshaping numeric features to fit into an LSTM network
features = num_data.reshape(-1,n_steps, n_numerical_feats)
print(features.shape)
#(100, 10, 18)
#one categorical variable with 12 levels
cat_data = np.random.randint(0,cat_size,num_obs*n_steps)
print(cat_data.shape)
#(1000,)
idx = cat_data.reshape(-1, n_steps)
print(idx.shape)
#(100, 10)
numerical_inputs = keras.layers.Input(shape=(n_steps, n_numerical_feats), name='numerical_inputs', dtype='float32')
#<tf.Tensor 'numerical_inputs:0' shape=(None, 10, 18) dtype=float32>
cat_input = keras.layers.Input(shape=(n_steps,), name='cat_input')
#<tf.Tensor 'cat_input:0' shape=(None, 10) dtype=float32>
cat_embedded = keras.layers.Embedding(cat_size, embedding_size, embeddings_initializer='uniform')(cat_input)
#<tf.Tensor 'embedding_1/Identity:0' shape=(None, 10, 1) dtype=float32>
merged = keras.layers.concatenate([numerical_inputs, cat_embedded])
#<tf.Tensor 'concatenate_1/Identity:0' shape=(None, 10, 19) dtype=float32>
lstm_out = keras.layers.LSTM(64, return_sequences=True)(merged)
#<tf.Tensor 'lstm_2/Identity:0' shape=(None, 10, 64) dtype=float32>
Dense_layer1 = keras.layers.Dense(32, activation='relu', use_bias=True)(lstm_out)
#<tf.Tensor 'dense_4/Identity:0' shape=(None, 10, 32) dtype=float32>
Dense_layer2 = keras.layers.Dense(1, activation='linear', use_bias=True)(Dense_layer1 )
#<tf.Tensor 'dense_5/Identity:0' shape=(None, 10, 1) dtype=float32>
model = keras.models.Model(inputs=[numerical_inputs, cat_input], outputs=Dense_layer2)
#compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse',
              optimizer=optimizer,
              metrics=['mae', 'mse'])
EPOCHS =5
#fit the model
#you can use input layer names instead
history = model.fit([features, idx],
                    y=labels,
                    epochs=EPOCHS,
                    batch_size=batch_size)
Does anyone have similar issues? Obviously this is a bug, but I don't know how to work around it because I want to use Tensorflow 2.0.
I found that tensorflow-gpu 2.0.0 was compiled against cuDNN 7.6.0.
Then I updated my cuDNN from 7.4.2 to 7.6.4, and the problem was solved.
In short:
Update cuDNN to 7.6.2;
Use TF_FORCE_GPU_ALLOW_GROWTH=true to force GPU memory growth.
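For example, one way to set that flag from Python (my own sketch; exporting it in the shell before launching works just as well):
import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'  # must be set before TF touches the GPU
import tensorflow as tf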
I have faced similar issues; these steps may help you with code on TF 2.0:
Check the GPU memory and make sure nothing else is running on it.
Run this script before importing Keras or Tensorflow (restart the runtime, then execute it first):
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
gpu = gpus[0]
tf.config.experimental.set_memory_growth(gpu, True)
Try reducing your model size and batch size if possible, until it works.