I am working on a small framework for myself built on top of TensorFlow and Keras. As a start, I wrote just the core of the framework and implemented a first toy example: a classic feed-forward network solving XOR.
It's probably not necessary to explain everything around it but I implemented the loss function like this:
class MeanSquaredError(Modality):
    def loss(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, dtype=y_pred.dtype)
        loss = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)(y_true, y_pred)
        return tf.reduce_sum(loss) / self.model_hparams.model.batch_size
This will be used in the actual model class like this:
class Model(keras.Model):
    def loss(self, y_true, y_pred, weights=None):
        target_modality = self.modalities['targets'](self.problem.hparams, self.hparams)
        return target_modality.loss(y_true, y_pred)
Now, when it comes to training, I can train the model like this:
model.compile(
    optimizer=keras.optimizers.Adam(0.001),
    loss=model.loss,  # Simply setting 'mse' works as well here
    metrics=['accuracy']
)
or I can just set loss='mse'. Both cases work as expected without any problems.
However, I have another Modality class which I am using for sequence-to-sequence (e.g. translation) tasks. It looks like this:
class CategoricalCrossentropy(Modality):
    """Simple SymbolModality with one hot as embeddings."""
    def loss(self, y_true, y_pred, sample_weight=None):
        labels = tf.reshape(y_true, shape=(tf.shape(y_true)[0], tf.reduce_prod(tf.shape(y_true)[1:])))
        y_pred = tf.reshape(y_pred, shape=(tf.shape(y_pred)[0], tf.reduce_prod(tf.shape(y_pred)[1:])))
        loss = tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.NONE, from_logits=True)(labels, y_pred)
        return tf.reduce_mean(loss) / self.model_hparams.model.batch_size
What this does is reshape the y_true and y_pred tensors from [batch_size, seq_len, embedding_size] to [batch_size, seq_len * embedding_size], effectively flattening each example. From this, the categorical cross-entropy is calculated and normalized.
Now, the model I am using is a very simple LSTM, but this isn't important. When I train the model like this:
model.compile(
    optimizer=keras.optimizers.Adam(0.001),
    loss='categorical_crossentropy',  # <-- Setting the loss via string argument (works)
    metrics=['accuracy']
)
the model does learn the task as expected. However, if I use the CategoricalCrossentropy modality from above by setting loss=model.loss, the model does not converge at all; the loss just oscillates randomly.
And this is where I am scratching my head. Since the simple XOR example works both ways, and since setting 'categorical_crossentropy' works as well, I do not quite see why using said modality doesn't work.
Am I doing something obviously wrong?
I am sorry that I cannot provide a minimal example here, but that is not possible since the framework already consists of quite a few lines of code. Empirically speaking, everything should work.
Any ideas how I could track down the issue or what might be causing this?
You're creating a tuple of tensors for shape. That might not work.
Why not just this?
labels = tf.keras.backend.batch_flatten(y_true)
y_pred = tf.keras.backend.batch_flatten(y_pred)
The standard 'categorical_crossentropy' loss does not perform any kind of flattening, and it treats the last axis as the class axis.
Are you sure you want to flatten your data? If you flatten, you will multiply the number of classes by the number of steps, which doesn't seem to make much sense.
Also, the standard 'categorical_crossentropy' loss uses from_logits=False!
The standard loss expects outputs from a "softmax" activation, while from_logits=True expects outputs without that activation.
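If you want your modality to mimic the built-in behaviour, here is a minimal sketch (my own, assuming your model ends in a softmax layer, so from_logits=False, and dropping the flattening) that you could use as the body of your Modality's loss method:
import tensorflow as tf

def categorical_crossentropy_loss(y_true, y_pred):
    # y_true and y_pred keep their [batch_size, seq_len, num_classes] shape;
    # the loss reduces over the last (class) axis only.
    y_true = tf.cast(y_true, dtype=y_pred.dtype)
    per_position = tf.keras.losses.CategoricalCrossentropy(
        reduction=tf.keras.losses.Reduction.NONE,
        from_logits=False)(y_true, y_pred)  # shape: [batch_size, seq_len]
    return tf.reduce_mean(per_position)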
Related
I am trying to implement a GAN called the SimGAN proposed by Apple researchers. The SimGAN is used to refine labelled synthetic images so that they look more like the unlabelled real images.
The link to the paper can be found on arXiv here.
In the paper, the loss function of the combined model, which comprises the generator and the discriminator, has a self-regularization component in the form of an L1 loss that penalizes too great a difference between the synthetic images and the images after refinement. In other words, the refinement should not be too drastic.
I would like to know how I can implement this self-regularization loss in Keras. Here is what I tried:
def self_regularization_loss(refined_images, syn_images):
    def l1loss(y_true, y_pred):
        return keras.metrics.mean_absolute_error(refined_images, syn_images)
    return l1loss
However, I do not think I can compile the model in the way below as the batches of refined and synthetic images change during training time.
model.compile(loss=[self_regularization_loss(current_batch_of_refined, current_batch_of_synthetic),
                    local_adversarial_loss],
              optimizer=opt)
What is the way to implement this loss?
Try using the tf.function decorator and tf.GradientTape():
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(model, batch):
    with tf.GradientTape() as tape:
        refined_images, syn_images = batch
        loss = self_regularization_loss(model, refined_images, syn_images)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Your training loop can then look something like:
for image_batch in dataset:
    train_step(model, image_batch)
Here it is assumed that model is of type tf.keras.Model. More details on the Model class can be found here. Note that model is also passed to self_regularization_loss. In this function your model receives the images as input and gives you the respective output, from which you then calculate your loss.
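The answer leaves self_regularization_loss itself open; a minimal sketch matching the call above, under my own assumptions (the refiner model is run on the synthetic batch, and the L1 term compares its output against those synthetic inputs; the regularization weight from the paper is omitted), might be:
import tensorflow as tf

def self_regularization_loss(model, refined_images, syn_images):
    # Assumption: the refiner produces its own refined output here; if
    # refined_images already holds that output, the model call can be skipped.
    model_output = model(syn_images, training=True)
    # L1 penalty keeping the refined output close to the synthetic input.
    return tf.reduce_mean(tf.abs(model_output - syn_images))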
I want to train a Neural Network for a classification task in Keras using a TensorFlow backend with a custom loss function. In my loss, I want to give different weights to different training examples. I have some datapoints I consider important and some I do not consider as important. I want my loss function to take this into account and punish errors in important examples more than in less important ones.
I have already built my model:
input = tf.keras.Input(shape=(16,))
hidden_layer_1 = tf.keras.layers.Dense(5, kernel_initializer='glorot_uniform', activation='relu')(input)
output = tf.keras.layers.Dense(1, kernel_initializer='normal', activation='softmax')(hidden_layer_1)
model = tf.keras.Model(input, output)
model.compile(loss=custom_loss(input), optimizer='adam', run_eagerly=True, metrics = [tf.keras.metrics.Accuracy(), 'acc'])
and the current state of my loss function is:
def custom_loss(input):
    def loss(y_true, y_pred):
        return ...
    return loss
I'm struggling with implementing the loss function in the way I explained above, mainly because I don't exactly know what input, y_pred, and y_true are (KerasTensors, I know, but what is their content? And are they for one training example only or for the whole batch?). I'd appreciate help with:
printing out the values of input, y_true and y_pred
converting the input value to a numpy ndarray ([1,3,7] for example) so I can use the array to look up my weight for this specific training data point
once I have my weight as a number (0.5 for example), how do I implement the computation of the loss function in Keras? My loss for one training example should be 0 if the classification was correct and weight if it was incorrect (a rough sketch of per-example weighting follows below).
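For illustration, per-example weighting of this kind can also be expressed with Keras' built-in sample_weight argument to fit(), which multiplies each example's loss by its weight. A minimal sketch with made-up data and weights (this is not the custom-loss approach from the question, just the built-in mechanism):
import numpy as np
import tensorflow as tf

# Dummy data just to make the sketch runnable: 100 examples, 16 features.
x_train = np.random.rand(100, 16).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1)).astype("float32")

# One weight per training example; "important" points get a larger weight.
sample_weights = np.where(np.arange(100) < 20, 2.0, 0.5)

inputs = tf.keras.Input(shape=(16,))
hidden = tf.keras.layers.Dense(5, activation='relu')(inputs)
# A single sigmoid unit for binary classification (softmax on one unit
# would always output 1.0).
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(hidden)
model = tf.keras.Model(inputs, outputs)

model.compile(loss='binary_crossentropy', optimizer='adam')
# Keras multiplies each example's loss contribution by its sample weight.
model.fit(x_train, y_train, sample_weight=sample_weights, epochs=5)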
I am currently trying to build a deep learning model with three different loss functions in Keras. The first loss function is the typical mean squared error loss. The other two loss functions are ones I built myself, which find the difference between a calculation made on the input image and the same calculation made on the output image (this code is a simplified version of what I'm doing).
def p_autoencoder_loss(yTrue, yPred):
    def loss(yTrue, yPred):
        return K.mean(K.square(yTrue - yPred), axis=-1)

    def a(image):
        return K.mean(K.sin(image))

    def b(image):
        return K.sqrt(K.cos(image))

    a_pred = a(yPred)
    a_true = a(yTrue)
    b_pred = b(yPred)
    b_true = b(yTrue)

    empirical_loss = loss(yTrue, yPred)
    a_loss = K.mean(K.square(a_true - a_pred))
    b_loss = K.mean(K.square(b_true - b_pred))
    final_loss = K.mean(empirical_loss + a_loss + b_loss)
    return final_loss
However, when I train with this loss function, it is simply not converging well. What I want to try is to minimize the three loss functions separately, not together by adding them into one loss function.
I essentially want to do the second option here: Tensorflow: Multiple loss functions vs Multiple training ops, but in Keras form. I also want the loss functions to be independent of each other. Is there a simple way to do this?
You could have 3 outputs in your Keras model, each with your specified loss, and Keras has support for weighting these losses. It will also generate a final combined loss in the output, but it will be optimising to reduce all three losses. Be wary with this, though: depending on your data/problem/losses you might find it stalls slightly or is slow if the losses fight each other. This requires the functional API. I'm unsure whether this actually implements separate optimiser instances, but I think this is as close as you will get in pure Keras, as far as I'm aware, without having to start writing more complex TF training regimes.
For example:
loss_out1 = layers.Dense(1, activation='sigmoid', name='loss1')(x)
loss_out2 = layers.Dense(1, activation='sigmoid', name='loss2')(x)
loss_out3 = layers.Dense(1, activation='sigmoid', name='loss3')(x)

model = keras.Model(inputs=[input],
                    outputs=[loss_out1, loss_out2, loss_out3])

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=['binary_crossentropy', 'categorical_crossentropy', custom_loss1],  # custom_loss1 being your own loss function
              loss_weights=[1., 1., 1.])
This should compile a model with 3 outputs at the end, coming from (x), which would be defined above. When you compile, you set the outputs as a list, and you set the losses and loss weights as lists too. Note that when you fit() you'll need to supply your target outputs three times as a list as well, e.g. [y, y, y], since your model now has three outputs.
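For instance, the fit call could look like this (a minimal sketch; x_train and y_train are placeholder names):
# The same target array is supplied once per output head.
model.fit(x_train, [y_train, y_train, y_train], batch_size=32, epochs=10)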
I'm not a Keras expert, but it's a pretty high-level API and I'm not aware of another way using pure Keras. Hopefully someone can correct me with a better solution!
Since there is only one output, a few things can be done:
1. Monitor the individual loss components to see how they vary.
def a_loss(y_true, y_pred):
    a_pred = a(y_pred)
    a_true = a(y_true)
    return K.mean(K.square(a_true - a_pred))

model.compile(..., metrics=[..., a_loss, b_loss])
2. Weight the loss components, where lambda_a and lambda_b are hyperparameters:
final_loss = K.mean(empirical_loss + lambda_a * a_loss + lambda_b * b_loss)
3. Use a different loss function like SSIM (see the sketch below):
https://www.tensorflow.org/api_docs/python/tf/image/ssim
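For example, an SSIM-based loss can be built directly on top of that function; a minimal sketch, assuming the images are scaled to [0, 1]:
import tensorflow as tf

def ssim_loss(y_true, y_pred):
    # tf.image.ssim returns one similarity score per image in the batch;
    # turning similarity into a loss means minimizing 1 - SSIM.
    return 1.0 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))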
In TensorFlow 2.0 you don't have to worry about the training phase (batch size, number of epochs, etc.), because everything can be passed to the fit method: model.fit(X_train, Y_train, batch_size=64, epochs=100).
But I have seen the following code style:
optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

@tf.function
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        regularization_loss = tf.math.add_n(model.losses)
        pred_loss = loss_fn(labels, predictions)
        total_loss = pred_loss + regularization_loss
    gradients = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

for epoch in range(NUM_EPOCHS):
    for inputs, labels in train_data:
        train_step(inputs, labels)
    print("Finished epoch", epoch)
So here you can see "more detailed" code, where you manually define your training procedure with for loops.
I have the following question: what is the best practice in TensorFlow 2.0? I haven't found any complete tutorial.
Use what is best for your needs.
Both methods are documented in the TensorFlow tutorials.
If you don't need anything special (no extra losses, strange metrics, or intricate gradient computation), just use model.fit() or model.fit_generator(). This is totally OK and makes your life easier.
A custom training loop might come in handy when you have complicated models with non-trivial loss/gradients calculation.
Up to now, two applications I tried were easier with this:
Training a GAN's generator and discriminator simultaneously without having to do the generation step twice. (It's complicated because you have a loss function that applies to different y_true values, and each case should update only a part of the model.) The other option would require a few separate models, each with its own trainable=True/False configuration, trained in separate phases.
Training inputs (good for style transfer models) -- Alternatively, create a custom layer that takes dummy inputs and that outputs its own trainable weights. But it gets complicated to compile several loss functions for each of the outputs of the base and style networks.
I have a Keras model that has two outputs:
output is the true output of the network on which the loss is going to be computed
additional is used for an external task during inference (no loss should be computed for this output)
When I build the model, I write something like that:
model = Model(inputs=inp, outputs=[output, additional])
Since my Model has two outputs, I need to provide two losses when compiling the model so I created a useless loss like this:
class NoopLoss(object):
    def __call__(self, y_true, y_pred, **kwargs):
        return self.compute_loss(y_true, y_pred)

    def compute_loss(self, y_true, y_pred):
        return tf.math.square(0.0)
Which I integrate in the compile step like this:
loss = UsefulLoss() # the real loss I'm using
noop_loss = NoopLoss()
model.compile(loss=[loss, noop_loss], optimizer=optimizer, metrics=['binary_accuracy'])
It works, but I feel it is a bit hackish, is there a correct way to implement this behavior? I didn't find any official useless loss in the Keras documentation.
In my opinion, Keras was not designed with things like this in mind.
I often use these hacks myself too.
But (not sure it's a better solution, actually it might not be) you can create a training model and an inference model, both sharing the trainable part:
inputs = Input(...)
trainable_out = SomeLayer(...)(inputs)
....
trainable_out = ....
extra_output = SomeLayer(...)(something)
training_model = Model(inputs, trainable_out)
inference_model = Model(inputs, [trainable_out, extra_output])
You can train training_model, and the other model will automatically be trained as well, since they share the same layers.
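For example, usage could look roughly like this (a sketch; the data, loss, and compile arguments are placeholders):
# Only the training model is compiled and fitted...
training_model.compile(loss='binary_crossentropy', optimizer='adam')
training_model.fit(x_train, y_train, epochs=10)

# ...while the inference model reuses the same (now trained) layers and
# returns both the main output and the extra one.
main_out, extra_out = inference_model.predict(x_test)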