How can I tell Keras the learning phase when I use train_on_batch to train a model? - tensorflow

I have dropout layers in my model, so I want Keras to distinguish between the training and test phases to apply or skip the dropout layers. I found that K.set_learning_phase can do this, but how do I add it to the training and test processes? My code is like this:
def discriminator(self):
    x_A = Input(shape=self.shape)
    x_B = Input(shape=self.shape)
    x = concatenate([x_A, x_B], axis=-1)
    self.model = Sequential()
    self.model.add(Dropout(0.5, input_shape=self.shape_double))
    self.model.add(LSTM(200, return_sequences=True, kernel_constraint=unit_norm()))
    self.model.add(Dropout(0.5))
    self.model.add(LSTM(200, return_sequences=True, kernel_constraint=unit_norm()))
    self.model.add(Dropout(0.5))
    self.model.add(Flatten())
    self.model.add(Dense(8, activation="softmax", kernel_constraint=unit_norm()))
    label = self.model(x)
    return Model([x_A, x_B], label)
...
def train(self, epochs, batch_size):
    for epoch in range(epochs):
        for batch, (train_A, train_B, train_label) in enumerate(Load_train(batch_size)):
            Dloss = self.discriminator.train_on_batch([train_A, train_B], train_label)
            ...

def test(self, test_A, test_B, test_label):
    predicted_label_dist = self.discriminator.predict([test_A, test_B])
    ...
Any suggestions will be appreciated. Thanks.

By default, Keras figures out the appropriate learning phase on its own when you call fit or predict, so your dropout will only be applied during training, not during testing. However, if you still wish to set the learning phase yourself, i.e. override the default behaviour, you can do it like this (from the Keras docs):
keras.backend.set_learning_phase(value)
Where:
value: Learning phase value, either 0 or 1 (integers).
Simply add this call to your training and testing functions.
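For example, a minimal sketch of how this could look in your train and test methods (reusing the names from your question, with keras.backend imported as K):
from keras import backend as K

def train(self, epochs, batch_size):
    K.set_learning_phase(1)  # 1 = training phase: dropout layers are active
    for epoch in range(epochs):
        for batch, (train_A, train_B, train_label) in enumerate(Load_train(batch_size)):
            Dloss = self.discriminator.train_on_batch([train_A, train_B], train_label)

def test(self, test_A, test_B, test_label):
    K.set_learning_phase(0)  # 0 = test phase: dropout layers are skipped
    predicted_label_dist = self.discriminator.predict([test_A, test_B])
Note, however, that train_on_batch and predict already manage the learning phase for you, so this is only needed if you want to override that default.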

Related

Validation loss reported seems to be wrong, can preprocessing be the reason?

I'm training a ResNet model with Keras, fine-tuned on my own images. While training, TensorBoard constantly reports a validation loss that seems unrelated to the training loss (much higher; see the image below, where training is the orange line and validation the blue line). Furthermore, when training is finished (for example, final losses as reported by TensorBoard could be 0.06 and 0.57 respectively), I evaluate the model "manually" and the validation loss seems to be in the same range as the training loss (e.g. 0.07).
I suspect that preprocessing could be the reason for this strange result. Essentially, the inputs and outputs of the model are created like this:
inp = tf.keras.Input(input_shape)
resnet = tf.keras.applications.ResNet50V2(include_top=False, input_shape=input_shape, input_tensor=inp, pooling="avg")
# Add ResNet50V2 specific preprocessing method into the model.
preprocessed = tf.keras.layers.Lambda(lambda x: tf.keras.applications.resnet_v2.preprocess_input(x))(inp)
out = resnet(preprocessed)
out = tf.keras.layers.Dense(num_outputs, activation=None)(out)
and the training:
model.compile(
    optimizer=tf.keras.optimizers.Adam(lrate),
    loss='mse',
    metrics=[tf.keras.metrics.MeanSquaredError()],
)
model.fit(
    train_dataset,
    epochs=epochs,
    validation_data=val_dataset,
    callbacks=callbacks
)
It is as if preprocessing does not occur when the validation loss is calculated, but I don't know why.
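One way to check this hypothesis could be to build a small debug model whose output is the preprocessed tensor and inspect its value range on a validation batch, since resnet_v2.preprocess_input scales pixels to [-1, 1]. The snippet below is just a sketch, reusing inp, preprocessed and val_dataset from above and assuming val_dataset yields (images, labels) pairs:
# Debug model that exposes the output of the Lambda preprocessing layer.
debug_model = tf.keras.Model(inputs=inp, outputs=preprocessed)

for images, _ in val_dataset.take(1):
    pre = debug_model(images, training=False)
    # Values far outside [-1, 1] would suggest the preprocessing is not applied.
    print(float(tf.reduce_min(pre)), float(tf.reduce_max(pre)))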

Shouldn't same neural network weights produce same results?

So I am working with different deep learning frameworks as part of my research and have observed something weird (at least I cannot explain the cause of it).
I trained a fairly simple MLP model (on the MNIST dataset) in TensorFlow, extracted the trained weights, created the same model architecture in PyTorch, and applied the trained weights to the PyTorch model. My expectation was to get the same test accuracy from both the TensorFlow and PyTorch models, but this isn't the case: I get different results.
So my question is: if a model is trained to some optimal value, shouldn't the trained weights produce the same results every time testing is done on the same dataset (regardless of the framework used)?
PyTorch Model:
class Net(nn.Module):
    def __init__(self) -> None:
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 24)
        self.fc2 = nn.Linear(24, 10)

    def forward(self, x: Tensor) -> Tensor:
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
Tensorflow Model:
def build_model() -> tf.keras.Model:
    # Build model layers
    model = models.Sequential()
    # Flatten layer
    model.add(layers.Flatten(input_shape=(28, 28)))
    # Fully connected layers
    model.add(layers.Dense(24, activation='relu'))
    model.add(layers.Dense(10))
    # Compile the model
    model.compile(
        optimizer='sgd',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )
    # Return the newly built model
    return model
To extract weights from the TensorFlow model and apply them to the PyTorch model, I use the following functions:
Extract Weights:
def get_weights(model):
    # fetch latest weights
    weights = model.get_weights()
    # transpose weights (Keras Dense kernels are (in, out); PyTorch Linear expects (out, in))
    t_weights = []
    for w in weights:
        t_weights.append(np.transpose(w))
    # return
    return t_weights
Apply Weights:
def set_weights(model, weights):
    """Set model weights from a list of NumPy ndarrays."""
    state_dict = OrderedDict(
        {k: torch.Tensor(v) for k, v in zip(model.state_dict().keys(), weights)}
    )
    model.load_state_dict(state_dict, strict=True)
Providing the solution in the answer section for the benefit of the community. From the comments:
If you are using the same weights in the same manner then the results should be the same, though floating-point rounding error should also be accounted for. Also, it doesn't matter whether the model is trained at all; you can think of your model architecture as a chain of matrix multiplications with element-wise nonlinearities in between. How big is the difference? Are you comparing model outputs, or metrics computed over the dataset? As a suggestion, initialize the model with some random values in Keras and do a forward pass for a single batch. (Paraphrased from jdehesa and Taras Sereda.)
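A minimal sketch of that suggestion, reusing the question's build_model, Net, get_weights and set_weights and comparing the logits of both frameworks on a single random batch (the batch itself is just illustrative):
import numpy as np
import torch

keras_model = build_model()
torch_model = Net()
set_weights(torch_model, get_weights(keras_model))

batch = np.random.rand(32, 28, 28).astype(np.float32)
keras_logits = keras_model(batch, training=False).numpy()
with torch.no_grad():
    torch_logits = torch_model(torch.from_numpy(batch)).numpy()

# Any difference should be on the order of float32 rounding error.
print(np.abs(keras_logits - torch_logits).max())
print(np.allclose(keras_logits, torch_logits, atol=1e-5))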

What is the correct way to implement a 'useless loss' with Keras?

I have a Keras model that has two outputs:
output is the true output of the network on which the loss is going to be computed
additional is used for an external task during inference (no loss should be computed on this output)
When I build the model, I write something like that:
model = Model(inputs=inp, outputs=[output, additional])
Since my Model has two outputs, I need to provide two losses when compiling the model, so I created a useless loss like this:
class NoopLoss(object):
    def __call__(self, y_true, y_pred, **kwargs):
        return self.compute_loss(y_true, y_pred)

    def compute_loss(self, y_true, y_pred):
        return tf.math.square(0.0)
I integrate it in the compile step like this:
loss = UsefulLoss() # the real loss I'm using
noop_loss = NoopLoss()
model.compile(loss=[loss, noop_loss], optimizer=optimizer, metrics=['binary_accuracy'])
It works, but it feels a bit hackish. Is there a correct way to implement this behaviour? I didn't find any official useless loss in the Keras documentation.
In my opinion, Keras was not designed with cases like this in mind, and I often use such hacks myself too.
But, although I'm not sure it's a better solution (it might not be), you can create a training model and an inference model, both sharing the trainable part:
inputs = Input(...)
trainable_out = SomeLayer(...)(inputs)
....
trainable_out = ....
extra_output = SomeLayer(...)(something)
training_model = Model(inputs, trainable_out)
inference_model = Model(inputs, [trainable_out, extra_output])
You can train training_model, and the other model will automatically be trained as well, since the two share their layers.
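A quick usage sketch (x_train, y_train and x_test are placeholders; loss and optimizer are the ones from your compile call): only training_model is compiled and fitted, while inference_model reuses the same trained layers at prediction time, so no NoopLoss is needed.
# Train only the single-output model.
training_model.compile(loss=loss, optimizer=optimizer, metrics=['binary_accuracy'])
training_model.fit(x_train, y_train, epochs=10)

# Both outputs are available at inference time, no dummy loss required.
main_out, extra_out = inference_model.predict(x_test)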

Combine the prediction output of one frozen model and then use those predictions in the loss of the model I am training?

I want to combine the prediction output of one frozen model into the training phase of another.
I have tried using different graph sessions, but that resets the default graph in the training phase.
predictions = model1.model(input1, input2, mode)
predictions2 = model2.predict(predictions)
loss1 = mean_squared_error(predictions, labels)
loss2 = mean_squared_error(input2, predictions2)
total_loss = loss1+loss2
optimizer.minimize(total_loss)
ValueError: Tensor Tensor("output_layer/BiasAdd:0", shape=(?, 100), dtype=float32) is not an element of this graph
I just figured this out!
In TensorFlow's Estimator framework, load the model inside the model_fn of your Estimator and set the attribute:
keras_model.trainable = False
E.g. snippet:
def model_fn(inputs):
    .......
    # some operations
    .......
    model2 = load_model('frozen_model.h5')
    model2.trainable = False
    model2.summary()
    predictions = model2(inputs=predictions)
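Putting this together with the loss code from the question, a hedged sketch of a single model_fn could look like the following; the feature keys, optimizer and EstimatorSpec wiring are assumptions, not from the original posts, and everything lives in one graph so the "not an element of this graph" error no longer applies.
def model_fn(features, labels, mode):
    input1, input2 = features['input1'], features['input2']   # assumed feature keys
    predictions = model1.model(input1, input2, mode)

    # Load the frozen model once and keep it non-trainable.
    model2 = tf.keras.models.load_model('frozen_model.h5')
    model2.trainable = False
    predictions2 = model2(predictions)

    loss1 = tf.losses.mean_squared_error(labels, predictions)
    loss2 = tf.losses.mean_squared_error(input2, predictions2)
    total_loss = loss1 + loss2

    optimizer = tf.train.AdamOptimizer()   # assumed optimizer
    train_op = optimizer.minimize(total_loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=total_loss, train_op=train_op)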

How to handle BN and DO behavioural changes in subclassed models?

So, batch normalization and dropout are layers that change behaviour depending on whether you're in the training or the inference phase. Usually, Keras takes care of that on my behalf. But if I'm doing custom training, how can I handle that?
What I've done: added an if statement to bypass the dropout layer while in inference mode
class mymodel(tf.keras.models.Model):
    def __init__(self, **kwargs):
        super(mymodel, self).__init__(**kwargs)
        self.l1 = tf.keras.layers.Dense(3, input_shape=(2,))
        self.l2 = tf.keras.layers.Dropout(0.9)

    def call(self, x, training=None):
        x = self.l1(x)
        if training:
            x = self.l2(x)
        return x
I'm not sure if that's all? And what about Batch normalization?
EDIT: my 'custom training loop' for the toy example above is:
def train_one_step(model, batch):
    with tf.GradientTape() as tape:
        output = model(batch)
    grad = tape.gradient(output, model.trainable_weights)
    optimizer.apply_gradients(zip(grad, model.trainable_weights))
For this you can control the learning phase manually, using K.set_learning_phase(1) during training, and K.set_learning_phase(0) during testing/inference. Here K is the module keras.backend.
Also note that to run one training step with a given batch, you can use model.train_on_batch(x, y), in which case Keras will manage the learning phase for you.
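For example, a hedged sketch of the train_on_batch route with the question's toy model; the 'mse' loss and the data iterators below are assumed placeholders, not from the original post.
import tensorflow as tf

model = mymodel()
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')   # assumed loss

for x_batch, y_batch in train_batches:               # assumed (inputs, labels) iterator
    loss = model.train_on_batch(x_batch, y_batch)    # Keras calls the model with training=True

predictions = model(test_batch, training=False)      # inference: the dropout branch is skipped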