Multiple loss functions on (somewhat) overlapping sub-models in Keras - tensorflow

I have a model in Keras where I would like to use two loss functions. The model consists of an autoencoder and a classifier on top of it. I would like to have one loss function that makes sure the autoencoder is fitted reasonably well (for example, it can be mse) and another loss function that evaluates the classifier (for example, categorical_crossentropy). I would like to fit my model and use a loss function that would be a linear combination of the two loss functions.
# loss functions
from tensorflow.keras import backend as K

def ae_mse_loss(x_true, x_pred):
    ae_loss = K.mean(K.square(x_true - x_pred), axis=1)
    return ae_loss

def clf_loss(y_true, y_pred):
    return K.sum(K.categorical_crossentropy(y_true, y_pred), axis=-1)

def combined_loss(y_true, y_pred):
    ???
    return ae_loss + w1*clf_loss
where w1 is some weight that defines "importance of clf_loss" in the final combined loss.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# autoencoder
ae_in_layer = Input(shape=(in_dim,), name='ae_in_layer')
ae_interm_layer1 = Dense(interm_dim, activation='relu', name='ae_interm_layer1')(ae_in_layer)
ae_mid_layer = Dense(latent_dim, activation='relu', name='ae_mid_layer')(ae_interm_layer1)
ae_interm_layer2 = Dense(interm_dim, activation='relu', name='ae_interm_layer2')(ae_mid_layer)
ae_out_layer = Dense(in_dim, activation='linear', name='ae_out_layer')(ae_interm_layer2)

ae_model = Model(ae_in_layer, ae_out_layer)
ae_model.compile(optimizer='adam', loss=ae_mse_loss)

# classifier
clf_in_layer = Dense(interm_dim, activation='sigmoid', name='clf_in_layer')(ae_out_layer)
clf_out_layer = Dense(3, activation='softmax', name='clf_out_layer')(clf_in_layer)

clf_model = Model(ae_in_layer, clf_out_layer)
clf_model.compile(optimizer='adam', loss=combined_loss, metrics=[ae_mse_loss, clf_loss])
What I'm not sure about is how to distinguish y_true and y_pred in the two loss functions (since they refer to true and predicted data at different stages in the model). What I had in mind is something like this (I'm not sure how to implement it since obviously I need to pass only one set of arguments y_true & y_pred):
def combined_loss(y_true, y_pred):
    ae_loss = ae_mse_loss(x_true_ae, x_pred_ae)
    clf_loss = clf_loss(y_true_clf, y_pred_clf)
    return ae_loss + w1*clf_loss
I could define this problem as two separate models and train each model separately, but I would really prefer to do it all at once if possible (since it would optimize both problems simultaneously). I realize this model doesn't make much sense, but it demonstrates, in a simple way, the (much more complicated) problem I'm trying to solve.
Any suggestions would be appreciated.

Everything you need is available natively in Keras: you can automatically combine multiple losses using the loss_weights parameter.
In the example below I reproduce your setup, combining an mse loss for the regression task and a categorical_crossentropy for the classification task.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

in_dim = 10
interm_dim = 64
latent_dim = 32
n_class = 3
n_sample = 100

X = np.random.uniform(0, 1, (n_sample, in_dim))
y = tf.keras.utils.to_categorical(np.random.randint(0, n_class, n_sample))

# autoencoder
ae_in_layer = Input(shape=(in_dim,), name='ae_in_layer')
ae_interm_layer1 = Dense(interm_dim, activation='relu', name='ae_interm_layer1')(ae_in_layer)
ae_mid_layer = Dense(latent_dim, activation='relu', name='ae_mid_layer')(ae_interm_layer1)
ae_interm_layer2 = Dense(interm_dim, activation='relu', name='ae_interm_layer2')(ae_mid_layer)
ae_out_layer = Dense(in_dim, activation='linear', name='ae_out_layer')(ae_interm_layer2)

# classifier
clf_in_layer = Dense(interm_dim, activation='sigmoid', name='clf_in_layer')(ae_out_layer)
clf_out_layer = Dense(n_class, activation='softmax', name='clf_out_layer')(clf_in_layer)

model = Model(ae_in_layer, [ae_out_layer, clf_out_layer])
model.compile(optimizer='adam',
              loss={'ae_out_layer': 'mse', 'clf_out_layer': 'categorical_crossentropy'},
              loss_weights={'ae_out_layer': 1., 'clf_out_layer': 0.5})

model.fit(X, [X, y], epochs=10)
In this specific case, the loss is the result of 1*ae_out_layer_loss + 0.5*clf_out_layer_loss
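If you prefer to keep custom loss callables instead of the built-in strings, the same loss dictionary also accepts functions, and loss_weights still provides the w1 factor from your question. A minimal sketch under that assumption (ae_mse_loss as defined in your post, w1 chosen arbitrarily):
import tensorflow as tf

def ae_mse_loss(x_true, x_pred):
    # same reconstruction loss as in the question
    return tf.reduce_mean(tf.square(x_true - x_pred), axis=-1)

w1 = 0.5  # "importance of clf_loss" in the combined objective
model.compile(optimizer='adam',
              loss={'ae_out_layer': ae_mse_loss,
                    'clf_out_layer': 'categorical_crossentropy'},
              loss_weights={'ae_out_layer': 1., 'clf_out_layer': w1})

# training logs then report ae_out_layer_loss and clf_out_layer_loss separately,
# while the total optimized loss is 1*ae_out_layer_loss + w1*clf_out_layer_loss
model.fit(X, [X, y], epochs=10)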

Related

Completely different results using Tensorflow and Pytorch for MobilenetV3 Small

I am using transfer learning from MobileNetV3 Small to predict 5 different points on an image. I am doing this as a regression task.
For both models:
Setting the last 50 layers trainable and adding the same fully connected layers to the end.
Learning rate 3e-2
Batch size 32
Adam optimizer with the same betas
100 epochs
The inputs consist of RGB unscaled images
Pytorch
Model
def _init_weights(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

def get_mob_v3_small():
    model = torchvision.models.mobilenet_v3_small(pretrained=True)
    children_list = get_children(model)
    for c in children_list[:-50]:
        for p in c.parameters():
            p.requires_grad = False
    return model

class TransferMobileNetV3_v2(nn.Module):
    def __init__(self, num_keypoints: int = 5):
        super(TransferMobileNetV3_v2, self).__init__()
        self.classifier_neurons = num_keypoints*2
        self.base_model = get_mob_v3_small()
        self.base_model.classifier = nn.Sequential(
            nn.Linear(in_features=1024, out_features=1024),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=512),
            nn.ReLU(),
            nn.Linear(in_features=512, out_features=self.classifier_neurons)
        )
        self.base_model.apply(_init_weights)

    def forward(self, x):
        out = self.base_model(x)
        return out
Training Script
def train(net, trainloader, testloader, train_loss_fn, optimizer, scaler, args):
    len_dataloader = len(trainloader)
    for epoch in range(1, args.epochs+1):
        net.train()
        for batch_idx, sample in enumerate(trainloader):
            inputs, labels = sample
            inputs, labels = inputs.to(args.device), labels.to(args.device)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast(args.use_amp):
                prediction = net(inputs)
                loss = train_loss_fn(prediction, labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

def main():
    args = make_args_parser()
    args.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    seed = args.seed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    loss_fn = nn.MSELoss()
    optimizer = optim.Adam(net.parameters(), lr=3e-2, betas=(0.9, 0.999))
    scaler = torch.cuda.amp.GradScaler(enabled=args.use_amp)
    train(net, train_loader, test_loader, loss_fn, optimizer, scaler, args)
Tensorflow
Model
base_model = tf.keras.applications.MobileNetV3Small(weights='imagenet',
                                                    input_shape=(224, 224, 3))
x_in = base_model.layers[-6].output
x = Dense(units=1024, activation="relu")(x_in)
x = Dense(units=512, activation="relu")(x)
x = Dense(units=10, activation="linear")(x)
model = Model(inputs=base_model.input, outputs=x)

for layer in model.layers[:-50]:
    layer.trainable = False
Training Script
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.Adam(learning_rate=3e-2))
history = model.fit(input_numpy, output_numpy,
                    verbose=1,
                    batch_size=32, epochs=100, validation_split=0.2)
Results
The PyTorch model predicts one single point around the center for all 5 different points.
The Tensorflow model predicts the points quite well and is quite accurate.
The loss in the Pytorch model is much higher than the Tensorflow model.
Please do let me know what is going wrong, as I am trying my best to shift to PyTorch for this work and I need this model to give me similar/identical results.
Note: I also noticed that the MobileNetV3 Small architecture seems to differ between PyTorch and Tensorflow. I do not know if I am interpreting it wrong, but I'm putting it here just in case; a rough way to compare the two backbones is sketched below.
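As an illustrative sketch only (not part of my training code), comparing the parameter counts of the two pretrained backbones is one quick way to check that observation:
import torchvision
import tensorflow as tf

# PyTorch MobileNetV3 Small backbone
pt_model = torchvision.models.mobilenet_v3_small(pretrained=True)
pt_params = sum(p.numel() for p in pt_model.parameters())

# TensorFlow/Keras MobileNetV3 Small backbone
tf_model = tf.keras.applications.MobileNetV3Small(weights='imagenet',
                                                  input_shape=(224, 224, 3))
tf_params = tf_model.count_params()

print('PyTorch parameters:   ', pt_params)
print('TensorFlow parameters:', tf_params)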

Trying to visualize activations in Tensorflow

I created a simple CNN to detect custom digits and I am trying to visualize the activations of my layers. When I run the following code, layer_outputs = [layer.output for layer in model.layers[:9]], I get the error "Layer conv2d has no inbound nodes".
When I searched online, the advice was to define the input shape of the first layer, but I've already done that and I'm not sure why this is happening. Below is my model.
class myModel(Model):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding='same',
                            input_shape=(image_height, image_width, num_channels))
        self.maxPool1 = MaxPool2D(pool_size=(2,2))
        self.conv2 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')
        self.maxPool2 = MaxPool2D(pool_size=(2,2))
        self.conv3 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')
        self.maxPool3 = MaxPool2D(pool_size=(2,2))
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.maxPool1(x)
        x = self.conv2(x)
        x = self.maxPool2(x)
        x = self.conv3(x)
        x = self.maxPool3(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.d2(x)
        return x
Based on your stated goal and what you've posted, I believe the problem here is slightly (and very understandably) misunderstanding the way the TensorFlow APIs work. The model object and its constituent parts only store state for the model, not the evaluation of it; for example, the hyperparameters you've set and the parameters the model learns when it's fed training data. Even if you worked to fix the problem with what you're trying, the .output of the layer objects as part of the model wouldn't return the activations you want to visualize. It instead returns the part of the TensorFlow graph that represents that part of the computation.
For what you want to do, you'll need to manipulate an object that's the result of calling the .predict function on the model that you've set up and trained. Or you could drop down to below the Keras abstractions and manipulate the tensors directly.
If I gave this more thought, there's probably a reasonably elegant way to get this by only evaluating your graph (i.e., calling .predict) once, but the most obvious naïve way is simply to instantiate several new models (or several subclasses of your model) with each of the layers of interest as the terminal output, which should get you what you want.
For example, you could do something like this for each of the layers whose outputs you're interested in:
my_test_image = # get an image
# build a sub-model that maps the model's original input to the layer of interest
outputs_of_interest = Model(my_model.input, my_model.layers[-2].output)
outputs_of_interest.predict(my_test_image)  # <=== this has the output you want
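And as a rough sketch of the "evaluate the graph only once" idea: if the model is built with the functional API (so every layer has a defined .output in the graph), one multi-output model can return several intermediate activations from a single predict call. The names my_functional_model and my_test_images below are hypothetical placeholders:
from tensorflow.keras.models import Model

# layers whose activations we want to visualize (skipping the Input layer itself)
layers_of_interest = my_functional_model.layers[1:9]

activation_model = Model(inputs=my_functional_model.input,
                         outputs=[layer.output for layer in layers_of_interest])

# one forward pass returns a list of activation arrays, one per chosen layer
activations = activation_model.predict(my_test_images)
for layer, act in zip(layers_of_interest, activations):
    print(layer.name, act.shape)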

Keras gives 'Not JSON Serializable' error when saving the model

I'm implementing a fully convolutional neural network for image segmentation, using the U-Net defined here:
https://github.com/zhixuhao
To give different weights to the pixels of different classes I defined an extra Lambda layer, as suggested here
Keras, binary segmentation, add weight to loss function
The problem is that Keras raises this error when saving the model
.....
self.model.save(filepath, overwrite=True)
.....
TypeError: ('Not JSON Serializable:', b'\n\x15clip_by_value/Minimum\x12\x07Minimum\x1a\x12conv2d_23/Identity\x1a\x17clip_by_value/Minimum/y*\x07\n\x01T\x12\x020\x01')
My network is defined in an external function
def weighted_binary_loss(X):
    y_pred, y_true, weights = X
    loss = binary_crossentropy(y_true, y_pred)
    weights_mask = y_true*weights[0] + (1.-y_true)*weights[1]
    loss = multiply([loss, weights_mask])
    return loss

def identity_loss(y_true, y_pred):
    return y_pred

def net():
    .....
    ....
    conv10 = Conv2D(1, 1, activation='sigmoid')(conv9)
    w_loss = Lambda(weighted_binary_loss, output_shape=input_size, name='loss_output')([conv10, inputs, weights])
    model = Model(inputs=inputs, outputs=w_loss)
    model.compile(optimizer=Adam(lr=1e-5), loss=identity_loss, metrics=['accuracy'])
that I call in my main function
...
model_checkpoint = ModelCheckpoint('temp_model.hdf5', monitor='loss',verbose=1, save_best_only=True)
model.fit_generator(imgs,steps_per_epoch=20,epochs=1,callbacks=[model_checkpoint])
When I erase the Lambda layer, the error disappears
...
conv10 = Conv2D(1, 1, activation = 'sigmoid')(conv9)
model = Model(inputs = inputs, outputs = conv10)
model.compile(optimizer = Adam(lr = 1e-5), loss = 'binary_crossentropy', metrics = ['accuracy'])
I'm using
Keras==2.2.4, tensorflow-gpu==2.0.0b1
It appears that you are computing the loss inside a layer of the model. It is not good practice to accommodate the loss function as a layer. You can compute your weighted loss using a custom loss function.
So your code can be rewritten as follows:
def weighted_binary_loss(y_true, y_pred):
    weights = [0.5, 0.6]  # Define your weights here
    loss = binary_crossentropy(y_true, y_pred)
    weights_mask = y_true*weights[0] + (1.-y_true)*weights[1]
    loss = multiply([loss, weights_mask])
    return loss

conv10 = Conv2D(1, 1, activation='sigmoid')(conv9)
model = Model(inputs=inputs, outputs=conv10)
model.compile(optimizer=Adam(lr=1e-5), loss=weighted_binary_loss, metrics=['accuracy'])
If weights needs to be a dynamic property and you have to pass it as a separate parameter to the loss function, you can follow this question; a rough sketch of that pattern follows.
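As a minimal sketch of that parametrized-weights pattern (the weight values are placeholders, and the element-wise K.binary_crossentropy is used so the per-pixel mask can be applied before averaging):
import tensorflow.keras.backend as K

def make_weighted_binary_loss(weights):
    # weights[0] scales positive pixels, weights[1] scales negative pixels
    def loss(y_true, y_pred):
        bce = K.binary_crossentropy(y_true, y_pred)  # element-wise, same shape as y_true
        mask = y_true*weights[0] + (1. - y_true)*weights[1]
        return K.mean(bce * mask, axis=-1)
    return loss

model.compile(optimizer=Adam(lr=1e-5),
              loss=make_weighted_binary_loss([0.5, 0.6]),
              metrics=['accuracy'])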

Tensorflow dense layers worse than keras sequential

I am trying to train an agent on the inverted-pendulum problem (similar to cart-pole), which is a standard reinforcement-learning benchmark. I use the neural fitted Q-iteration (NFQ) algorithm, which uses a multi-layer neural network to evaluate the Q function.
I use Keras.Sequential and tf.layers.dense to build the neural network respectively, and leave everything else the same. However, Keras gives me good results and tensorflow does not. In fact, the tensorflow version doesn't work at all: its loss keeps increasing and the agent learns nothing from the training.
Here is the Keras code:
def build_model():
    model = Sequential()
    model.add(Dense(5, input_dim=3))
    model.add(Activation('sigmoid'))
    model.add(Dense(5))
    model.add(Activation('sigmoid'))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    adam = Adam(lr=1E-3)
    model.compile(loss='mean_squared_error', optimizer=adam)
    return model
and the tensorflow version is
class NFQ_fit(object):
    """
    neural network approximator for NFQ iteration
    """
    def __init__(self, sess, N_feature, learning_rate=1E-3, batch_size=100):
        self.sess = sess
        self.N_feature = N_feature
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        # DNN structure
        self.inputs = tf.placeholder(tf.float32, [None, N_feature], 'inputs')
        self.labels = tf.placeholder(tf.float32, [None, 1], 'labels')
        self.l1 = tf.layers.dense(inputs=self.inputs,
                                  units=5,
                                  activation=tf.sigmoid,
                                  use_bias=True,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                  bias_initializer=tf.constant_initializer(0.0),
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                  name='hidden-layer-1')
        self.l2 = tf.layers.dense(inputs=self.l1,
                                  units=5,
                                  activation=tf.sigmoid,
                                  use_bias=True,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                  bias_initializer=tf.constant_initializer(0.0),
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                  name='hidden-layer-2')
        self.outputs = tf.layers.dense(inputs=self.l2,
                                       units=1,
                                       activation=tf.sigmoid,
                                       use_bias=True,
                                       kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                       bias_initializer=tf.constant_initializer(0.0),
                                       kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                       name='outputs')
        # optimization
        # self.mean_loss = tf.losses.mean_squared_error(self.labels, self.outputs)
        self.mean_loss = tf.reduce_mean(tf.square(self.labels - self.outputs))
        self.regularization_loss = tf.losses.get_regularization_loss()
        self.loss = self.mean_loss  # + self.regularization_loss
        self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
The two models are the same: both have two hidden layers of the same size. I suspect the problem may come from the kernel initialization, but I don't know how to fix it.
Using Keras is great. If you want better TensorFlow integration check out tf.keras. There's no particular reason to use tf.layers if the Keras (or tf.keras) defaults work better.
In this case glorot_uniform looks like the default initializer for Dense layers, and it is also the global TensorFlow default, so consider dropping the explicit truncated-normal kernel_initializer argument from the code in your question (or passing Glorot explicitly), as in the sketch below.
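For instance, a minimal tf.keras version of the same three-layer network that relies on the default glorot_uniform kernel initializer; this is only a sketch of the suggestion, not the original NFQ code:
import tensorflow as tf

def build_model(n_feature):
    # Dense layers default to glorot_uniform, so no kernel_initializer is passed
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(5, activation='sigmoid', input_shape=(n_feature,)),
        tf.keras.layers.Dense(5, activation='sigmoid'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='mean_squared_error',
                  optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
    return model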

Generative Adversarial Networks in Keras doesn't work like expected

I'm a beginner in Keras machine learning. I'm trying to understand Generative Adversarial Networks (GANs). For this purpose I'm trying to program a simple example. I'm generating data with the following function:
def genReal(l):
    realX = []
    for i in range(l):
        x = []
        y = []
        for i in np.arange(0.0, 1.0, 0.02):
            x.append(i + np.random.normal(0, 0.01))
            y.append(-abs(i-0.5) + 0.5 + np.random.normal(0, 0.01))
        data = np.array(list(zip(x, y)))
        data = np.reshape(data, (100))
        data = data.clip(0, 1)  # clip returns a new array, so assign it back
        realX.append(data)
    realX = np.array(realX)
    return realX
Data that is generated with this function looks similar to these examples:
Now the aim is to train a neural network to generate similar data.
For the GAN we need a generator network, which I modeled like this:
generator = Sequential()
generator.add(Dense(128, input_shape=(100,), activation='relu'))
generator.add(Dropout(rate=0.2))
generator.add(Dense(128, activation='relu'))
generator.add(Dropout(rate=0.2))
generator.add(Dense(100, activation='sigmoid'))
generator.compile(loss='mean_squared_error', optimizer='adam')
and a discriminator which looks like this:
discriminator = Sequential()
discriminator.add(Dense(128, input_shape=(100,), activation='relu'))
discriminator.add(Dropout(rate=0.2))
discriminator.add(Dense(128, activation='relu'))
discriminator.add(Dropout(rate=0.2))
discriminator.add(Dense(1, activation='sigmoid'))
discriminator.compile(loss='mean_squared_error', optimizer='adam')
the combined model:
ganInput = Input(shape=(100,))
x = generator(ganInput)
ganOutput = discriminator(x)
GAN = Model(inputs=ganInput, outputs=ganOutput)
GAN.compile(loss='binary_crossentropy', optimizer='adam')
I have a function that generates noise (a random array)
def noise(l):
    noise = np.array([np.random.uniform(0, 1, size=[l, ])])
    return noise
And then I'm training the model:
for i in range(1000000):
    fake = generator.predict(noise(100))
    print(i, "==>", discriminator.predict(fake))
    discriminator.train_on_batch(genReal(1), np.array([1]))
    discriminator.train_on_batch(fake, np.array([0]))
    discriminator.trainable = False
    GAN.train_on_batch(noise(100), np.array([1]))
    discriminator.trainable = True
As you can see, I've already tried training the model for 1 million iterations, but afterwards the generator outputs data that looks like this (despite different inputs):
Definitely not what I wanted. So my question is: are 1 million iterations not enough, or is there something wrong with the concept of my program?
Edit:
This is the function with which I plot my data:
def plotData(data):
    x = np.reshape(data, (50, 2))
    x = x.tolist()
    plt.scatter(list(zip(*x))[0], list(zip(*x))[1], c=col)
The problem with your implementation is that discriminator.trainable = False doesn't have any effect after compiling discriminator. Therefore, all the weights (both from the discriminator and the generator networks) are trainable when you execute GAN.train_on_batch.
The solution to this problem is to set discriminator.trainable = False right after compiling discriminator and before compiling GAN:
discriminator.compile(loss='mean_squared_error', optimizer='adam')
discriminator.trainable = False
ganInput = Input(shape=(100,))
x = generator(ganInput)
ganOutput = discriminator(x)
GAN = Model(inputs=ganInput, outputs=ganOutput)
GAN.compile(loss='binary_crossentropy', optimizer='adam')
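With that ordering, the training loop from the question can stay essentially as it is, except that it no longer needs to toggle discriminator.trainable inside the loop. A sketch of one iteration under that setup:
for i in range(1000000):
    fake = generator.predict(noise(100))
    # discriminator was compiled before trainable was set to False,
    # so these calls still update the discriminator's weights
    discriminator.train_on_batch(genReal(1), np.array([1]))  # real sample -> label 1
    discriminator.train_on_batch(fake, np.array([0]))        # generated sample -> label 0
    # GAN was compiled with the discriminator frozen,
    # so this step only updates the generator
    GAN.train_on_batch(noise(100), np.array([1]))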
NOTE. I have plotted your data and it looks more like this: