Flag for training and test for custom layer in Keras - tensorflow

I want to create a custom keras layer which does something during training and something else for validation or testing.
from tensorflow import keras
K = keras.backend
from keras.layers import Layer
import tensorflow as tf
class MyCustomLayer(Layer):
    def __init__(self, ratio=0.5, **kwargs):
        self.ratio = ratio
        super(MyCustomLayer, self).__init__(**kwargs)

    #tf.function
    def call(self, x, is_training=None):
        is_training = K.learning_phase()
        tf.print("training: ", is_training)
        if is_training is 1 or is_training is True:
            xs = x * 4
            return xs
        else:
            xs = x * 0
            return xs
model = Sequential()
model.add(Dense(16, input_dim=input_dim))
model.add(MyCustomLayer(0.5))
model.add(ReLU())
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(output_dim, activation='softmax', kernel_regularizer=l2(0.01)))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, validation_split=0.05, epochs=5)
In the output I always get:
training: 0
training: 0
training: 0
training: 0
training: 0
training: 0
training: 0
training: 0
Does anyone know how to fix this?

There are some issues and misconceptions here. First, you are mixing plain keras and tf.keras imports; you should use only one of them. Second, the parameter of call is named training, not is_training.
I think the issue is that tf.print does not really print the value of the training variable, since it is a TensorFlow symbolic variable and may change value indirectly. There are other ways to check whether the layer behaves differently during training and inference, for example:
class MyCustomLayer(Layer):
    def __init__(self, ratio=0.5, **kwargs):
        super(MyCustomLayer, self).__init__(**kwargs)

    def call(self, inputs, training=None):
        train_x = inputs * 4
        test_x = inputs * 0
        return K.in_train_phase(train_x, test_x, training=training)
Then, using this model:
model = Sequential()
model.add(Dense(1, input_dim=10))
model.add(MyCustomLayer(0.5))
model.compile(loss='mse', optimizer='adam')
And building a function that explicitly receives the K.learning_phase() variable:
fun = K.function([model.input, K.learning_phase()], [model.output])
If you call it with K.learning_phase() set to 1 or 0, you do see different outputs:
d = np.random.random(size=(2,10))
print(fun([d, 1]))
print(fun([d, 0]))
Result:
[array([[4.1759257], [3.9988194]], dtype=float32)]
[array([[0.], [0.]], dtype=float32)]
This indicates that the layer behaves differently during training and inference/testing.
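If everything is built from tf.keras and eager execution is enabled (TF 2.x), a quicker check, sketched under that assumption, is to call the model directly with the training flag:

import numpy as np

d = np.random.random(size=(2, 10)).astype("float32")

# Passing the flag explicitly reaches the layer's call(..., training=...):
print(model(d, training=True))   # expected: the Dense output multiplied by 4
print(model(d, training=False))  # expected: all zeros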

So, I just figured out what was going wrong. I was mixing two different types of classes:
from keras import Sequential
from tensorflow import keras
K = keras.backend
So the model was built with keras while I was reading the flag from tensorflow.keras, and for this reason K.learning_phase() was not working as expected.
To fix it, I used:
from tensorflow.keras import Sequential
from tensorflow import keras
K = keras.backend
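For reference, a consistent set of imports for the original snippet, taking everything from tensorflow.keras (a sketch; the l2 import is assumed from the kernel_regularizer=l2(0.01) line above):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Layer, Dense, ReLU
from tensorflow.keras.regularizers import l2

K = keras.backend  # the same backend module the rest of the code uses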

Related

Get gradients with respect to inputs in Keras ANN model

bce = tf.keras.losses.BinaryCrossentropy()
ll=bce(y_test[0], model.predict(X_test[0].reshape(1,-1)))
print(ll)
<tf.Tensor: shape=(), dtype=float32, numpy=0.04165391>
print(model.input)
<tf.Tensor 'dense_1_input:0' shape=(None, 195) dtype=float32>
model.output
<tf.Tensor 'dense_3/Sigmoid:0' shape=(None, 1) dtype=float32>
grads=K.gradients(ll, model.input)[0]
print(grads)
None
So here I have trained a neural network with two hidden layers; the input has 195 features and the output has size 1. I want to feed the validation instances X_test into the network one by one, together with their correct labels in y_test, and for each instance calculate the gradients of the output with respect to the input, but grads prints as None. Your help is appreciated.
One can do this using tf.GradientTape. I wrote the following code to learn a sine wave and get its derivative, in the spirit of this question. I think it should be possible to extend this code to compute partial derivatives.
Importing the needed libraries:
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses
import tensorflow as tf
Create the data:
x = np.linspace(0, 6*np.pi, 2000)
y = np.sin(x)
Defining a Keras NN:
def model_gen(Input_shape):
    X_input = Input(shape=Input_shape)
    X = Dense(units=64, activation='sigmoid')(X_input)
    X = Dense(units=64, activation='sigmoid')(X)
    X = Dense(units=1)(X)
    model = Model(inputs=X_input, outputs=X)
    return model
Training the model:
model = model_gen(Input_shape=(1,))
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.001)
model.compile(loss=losses.mean_squared_error, optimizer=opt)
model.fit(x,y, epochs=200)
To obtain the gradient of the network w.r.t. the input:
x = list(x)
x = tf.constant(x)
with tf.GradientTape() as t:
    t.watch(x)
    y = model(x)

dy_dx = t.gradient(y, x)
dy_dx.numpy()
One can further visualise dy_dx to check how smooth the derivative is. Finally, note that one gets a smoother derivative when using a smooth activation (e.g. sigmoid) instead of ReLU, as noted here.
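Applied back to the original question, a minimal sketch (assuming model, X_test and y_test from the question, and a tf.keras model running eagerly in TF 2.x) of the gradient of the loss with respect to a single input instance:

import numpy as np
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

x_instance = tf.constant(X_test[0].reshape(1, -1), dtype=tf.float32)
y_instance = tf.constant(np.reshape(y_test[0], (1, 1)), dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(x_instance)        # watch the input tensor, not a model weight
    y_pred = model(x_instance)    # forward pass must happen inside the tape
    loss = bce(y_instance, y_pred)

grads = tape.gradient(loss, x_instance)  # shape (1, 195) instead of None
print(grads)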

Tensorflow 2.0 keras fit overrides the "training" parameter in call function

When creating a model using tensorflow 2.0 I am getting two different behaviours depending on how I do a forward pass:
1) If I do the forward pass using model(X) then it is fine and the "training" parameter in the call method works normally
vs.
2) If I use model.fit(X, y) to run the model instead, then the "training" parameter seems to get overridden and set to None, regardless of whether its default is True or False.
Does anyone know why this is happening? It means for example that I can't set up the model so that dropout only occurs when training is set to True.
!pip install tensorflow-gpu==2.0.0-alpha0
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model
import numpy as np

X = np.random.random((250, 5))
y = X[:, 0] > 0 * 1.0
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
        self.dropout = tf.keras.layers.Dropout(0.5)

    def call(self, inputs, training=True):
        print("Training ", training)
        x = self.dense1(inputs)
        if training:
            x = self.dropout(x, training=training)
        return self.dense2(x)

model = MyModel()
Then this prints out Training True as expected:
model(X) # prints out: Training True
But this prints out Training None ?
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=1) # prints out: Training None
This was a bug:
The training value of tf.keras.Model.call() becomes None when calling tf.keras.Model.fit(). (tf2.0.0-alpha0)
https://github.com/tensorflow/tensorflow/issues/27275
This was resolved in the newly released TF 2.x.
Please replace !pip install tensorflow-gpu==2.0.0-alpha0 with !pip install tensorflow-gpu==2.3 in the above code. Please check the gist here. Thanks!
The output is shown below.
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=1) # prints out: Training None
Training True
Training True
8/8 [==============================] - 0s 1ms/step - loss: 0.6448
<tensorflow.python.keras.callbacks.History at 0x7f1f5961edd8>
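As a side note (a sketch, not from the thread): Dropout can be handed the flag directly and left to resolve it when it is None, so call does not have to depend on its default value at all; fit() then runs the layer with training=True, while predict() and evaluate() run it with training=False:

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
        self.dropout = tf.keras.layers.Dropout(0.5)

    def call(self, inputs, training=None):
        x = self.dense1(inputs)
        # Dropout is only active when the resolved training flag is True.
        x = self.dropout(x, training=training)
        return self.dense2(x)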

Tensorflow dense layers worse than keras sequential

I am trying to train an agent on the inverted-pendulum problem (similar to cart-pole), which is a benchmark in reinforcement learning. I use the neural fitted Q-iteration algorithm, which uses a multi-layer neural network to evaluate the Q function.
I build the neural network with Keras.Sequential and with tf.layers.dense respectively, leaving everything else the same. However, Keras gives me good results and TensorFlow does not. In fact, the TensorFlow version doesn't work at all: its loss keeps increasing and the agent learns nothing from the training.
Here I present the code for Keras as follows
def build_model():
    model = Sequential()
    model.add(Dense(5, input_dim=3))
    model.add(Activation('sigmoid'))
    model.add(Dense(5))
    model.add(Activation('sigmoid'))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    adam = Adam(lr=1E-3)
    model.compile(loss='mean_squared_error', optimizer=adam)
    return model
and the tensorflow version is
class NFQ_fit(object):
    """
    Neural network approximator for NFQ iteration.
    """
    def __init__(self, sess, N_feature, learning_rate=1E-3, batch_size=100):
        self.sess = sess
        self.N_feature = N_feature
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        # DNN structure
        self.inputs = tf.placeholder(tf.float32, [None, N_feature], 'inputs')
        self.labels = tf.placeholder(tf.float32, [None, 1], 'labels')
        self.l1 = tf.layers.dense(inputs=self.inputs,
                                  units=5,
                                  activation=tf.sigmoid,
                                  use_bias=True,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                  bias_initializer=tf.constant_initializer(0.0),
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                  name='hidden-layer-1')
        self.l2 = tf.layers.dense(inputs=self.l1,
                                  units=5,
                                  activation=tf.sigmoid,
                                  use_bias=True,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                  bias_initializer=tf.constant_initializer(0.0),
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                  name='hidden-layer-2')
        self.outputs = tf.layers.dense(inputs=self.l2,
                                       units=1,
                                       activation=tf.sigmoid,
                                       use_bias=True,
                                       kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                       bias_initializer=tf.constant_initializer(0.0),
                                       kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                       name='outputs')
        # optimization
        # self.mean_loss = tf.losses.mean_squared_error(self.labels, self.outputs)
        self.mean_loss = tf.reduce_mean(tf.square(self.labels - self.outputs))
        self.regularization_loss = tf.losses.get_regularization_loss()
        self.loss = self.mean_loss  # + self.regularization_loss
        self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
The two models are the same: both have two hidden layers of the same size. I expect the problem may come from the kernel initialization, but I don't know how to fix it.
Using Keras is great. If you want better TensorFlow integration, check out tf.keras. There's no particular reason to use tf.layers if the Keras (or tf.keras) defaults work better.
In this case, glorot_uniform looks like the default initializer. It is also the global TensorFlow default, so consider removing the explicit truncated-normal kernel_initializer argument from the code in your question (or passing Glorot explicitly).
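If you prefer to keep the argument explicit, a sketch in the TF 1.x API used in the question (assuming tf.glorot_uniform_initializer, which is the same scheme Keras Dense layers use by default) would look like:

import tensorflow as tf  # TF 1.x API, matching the question

inputs = tf.placeholder(tf.float32, [None, 3], 'inputs')

# Glorot/Xavier uniform instead of a truncated normal with std 1E-2.
l1 = tf.layers.dense(inputs=inputs,
                     units=5,
                     activation=tf.sigmoid,
                     kernel_initializer=tf.glorot_uniform_initializer(),
                     name='hidden-layer-1')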

Keras,models.add() missing 1 required positional argument: 'layer'

I'm classifying digits of the MNIST dataset using a simple feed forward neural net with Keras. So I execute the code below.
import os
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data', one_hot=True)
# Path to Computation graphs
LOGDIR = './graphs_3'
# start session
sess = tf.Session()
#Hyperparameters
LEARNING_RATE = 0.01
BATCH_SIZE = 1000
EPOCHS = 10
# Layers
HL_1 = 1000
HL_2 = 500
# Other Parameters
INPUT_SIZE = 28*28
N_CLASSES = 10
model = Sequential
model.add(Dense(HL_1, input_dim=(INPUT_SIZE,), activation="relu"))
#model.add(Activation(activation="relu"))
model.add(Dense(HL_2, activation="relu"))
#model.add(Activation("relu"))
model.add(Dropout(rate=0.9))
model.add(Dense(N_CLASSES, activation="softmax"))
model.compile(
    optimizer="Adam",
    loss="categorical_crossentropy",
    metrics=['accuracy'])

# one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)
model.fit(
    x=mnist.train.images,
    y=mnist.train.labels,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE)

score = model.evaluate(
    x=mnist.test.images,
    y=mnist.test.labels)

print("score = ", score)
However, I get the following error:
model.add(Dense(1000, input_dim=(INPUT_SIZE,), activation="relu"))
TypeError: add() missing 1 required positional argument: 'layer'
The syntax is exactly as shown in the keras docs. I am using keras 2.0.9, so I don't think it's a version problem. Did I do something wrong?
It seems perfect indeed....
But I noticed you're not creating an instance of a Sequential model; you're using the class name instead:
#yours: model = Sequential
#correct:
model = Sequential()
Since methods in a class are declared with self as the first argument, calling a method on the class itself (without an instance) means the instance (self) would have to be supplied as the first argument.
The method's definition is def add(self, layer, ...):
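To make the difference concrete, a minimal sketch:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential            # a reference to the class, not an instance
# model.add(Dense(10))        # TypeError: add() missing 1 required positional argument: 'layer'

model = Sequential()          # an instance; self is bound automatically
model.add(Dense(10, input_dim=4))  # works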

Calling a Keras model on a TensorFlow tensor but keep weights

In the Keras as a simplified interface to TensorFlow tutorial they describe how one can call a Keras model on a TensorFlow tensor.
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(10, activation='softmax'))
# this works!
x = tf.placeholder(tf.float32, shape=(None, 784))
y = model(x)
They also say:
Note: by calling a Keras model, you are reusing both its architecture and its weights. When you are calling a model on a tensor, you are creating new TF ops on top of the input tensor, and these ops are reusing the TF Variable instances already present in the model.
I interpret this as meaning that the weights of the model will be the same in y as in model. However, to me it seems like the weights in the resulting TensorFlow node are reinitialized. A minimal example can be seen below:
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
# Create model with weight initialized to 1
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='ones',
bias_initializer='zeros'))
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
# Save the weights
model.save_weights('file')
# Create another identical model except with weight initialized to 0
model2 = Sequential()
model2.add(Dense(1, input_dim=1, kernel_initializer='zeros',
bias_initializer='zeros'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
# Load the weight from the first model
model2.load_weights('file')
# Call model with Tensorflow tensor
v = tf.Variable([[1, ], ], dtype=tf.float32)
node = model2(v)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(node), model2.predict(np.array([[1, ], ])))
# Prints (array([[ 0.]], dtype=float32), array([[ 1.]], dtype=float32))
Why I want to do this:
I want to use a trained network in another minimization scheme were the network "punish" places in the search space that are not allowed. So if you have ideas not involving this specific approach, that is also very appreciated.
Finally found the answer. There are two problems in the example from the question.
1:
The first and most obvious was that I called the tf.global_variables_initializer() function, which re-initializes all variables in the session. Instead I should have called tf.variables_initializer(var_list), where var_list is a list of the variables to initialize.
2:
The second problem was that Keras did not use the same session as the native TensorFlow objects. This meant that to run the TensorFlow object model2(v) with my session sess, it needed to be reinitialized. Again, the Keras as a simplified interface to TensorFlow tutorial was able to help:
We should start by creating a TensorFlow session and registering it with Keras. This means that Keras will use the session we registered to initialize all variables that it creates internally.
import tensorflow as tf
sess = tf.Session()
from keras import backend as K
K.set_session(sess)
If we apply these changes to the example provided in my question, we get the following code, which does exactly what is expected:
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
sess = tf.Session()
# Register session with Keras
K.set_session(sess)
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='ones',
bias_initializer='zeros'))
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
model.save_weights('test')
model2 = Sequential()
model2.add(Dense(1, input_dim=1, kernel_initializer='zeros',
bias_initializer='zeros'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
model2.load_weights('test')
v = tf.Variable([[1, ], ], dtype=tf.float32)
node = model2(v)
init = tf.variables_initializer([v, ])
sess.run(init)
print(sess.run(node), model2.predict(np.array([[1, ], ])))
# prints: (array([[ 1.]], dtype=float32), array([[ 1.]], dtype=float32))
Conclusion:
The lesson is that when mixing Tensorflow and Keras, make sure everything uses the same session.
Thanks for asking this question and answering it; it helped me! In addition to setting the same TF session in the Keras backend, it is also important to note that if you want to load a Keras model from a file, you need to run a global variable initializer op before you load the model.
import tensorflow as tf
from tensorflow.keras import models

sess = tf.Session()
# make sure keras has the same session as this code
tf.keras.backend.set_session(sess)
# Do this BEFORE loading a keras model
init_op = tf.global_variables_initializer()
sess.run(init_op)
model = models.load_model('path/to/your/model.h5')