I am learning from an example in the TensorFlow documentation, https://www.tensorflow.org/tutorials/generative/cvae#define_the_loss_function_and_the_optimizer:
VAEs train by maximizing the evidence lower bound (ELBO) on the
marginal log-likelihood.
In practice, optimize the single sample Monte Carlo estimate of this
expectation: log p(x|z) + log p(z) - log q(z|x).
The loss function was implemented as:
def log_normal_pdf(sample, mean, logvar, raxis=1):
    log2pi = tf.math.log(2. * np.pi)
    return tf.reduce_sum(
        -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi),
        axis=raxis)

def compute_loss(model, x):
    mean, logvar = model.encode(x)
    z = model.reparameterize(mean, logvar)
    x_logit = model.decode(z)
    cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
    logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])
    logpz = log_normal_pdf(z, 0., 0.)
    logqz_x = log_normal_pdf(z, mean, logvar)
    return -tf.reduce_mean(logpx_z + logpz - logqz_x)
Since this example uses the MNIST dataset, x can be normalized to [0, 1] and sigmoid_cross_entropy_with_logits is used here.
My questions are:
What if x > 1? What kind of loss could be used then?
Can we use other loss functions as the reconstruction loss in a VAE, such as the Huber loss (https://en.wikipedia.org/wiki/Huber_loss)?
Another example uses an MSE loss (as follows); is MSE a valid ELBO loss for measuring p(x|z)?
https://www.tensorflow.org/guide/keras/custom_layers_and_models#putting_it_all_together_an_end-to-end_example
# Iterate over the batches of the dataset.
for step, x_batch_train in enumerate(train_dataset):
    with tf.GradientTape() as tape:
        reconstructed = vae(x_batch_train)
        # Compute reconstruction loss
        loss = mse_loss_fn(x_batch_train, reconstructed)
        loss += sum(vae.losses)  # Add KLD regularization loss
In the loss function of a variational autoencoder, you jointly optimize two terms:
The reconstruction loss between prediction and label, like in a normal autoencoder
The distance between the parameterized posterior q(z|x) and the assumed prior p(z). In practice, the prior is usually assumed to be a standard Gaussian and the distance is measured by the Kullback-Leibler divergence
For the reconstruction loss part, you can pick any loss function that fits your data, including MSE and Huber. It is generally still a good idea to normalize your input features though.
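If it helps, here is a minimal sketch of how the tutorial's compute_loss could be adapted to such a reconstruction loss. It reuses the CVAE model interface from the snippet above, assumes the decoder output x_hat is on the data scale rather than logits (an assumption on my part, not part of the original example), and uses a Huber reconstruction term together with the closed-form Gaussian KL; swapping in MSE works the same way.

import tensorflow as tf

def compute_loss_huber(model, x):
    # Encode and sample z via the reparameterization trick
    mean, logvar = model.encode(x)
    z = model.reparameterize(mean, logvar)
    x_hat = model.decode(z)  # assumed to be on the data scale, not logits

    # Reconstruction term: Huber loss summed over pixels (MSE works the same way)
    huber = tf.keras.losses.Huber(reduction=tf.keras.losses.Reduction.NONE)
    recon = tf.reduce_sum(huber(x, x_hat), axis=[1, 2])

    # Closed-form KL divergence between q(z|x) = N(mean, exp(logvar)) and p(z) = N(0, I)
    kl = -0.5 * tf.reduce_sum(1. + logvar - tf.square(mean) - tf.exp(logvar), axis=1)

    # Negative ELBO: reconstruction + KL, averaged over the batch
    return tf.reduce_mean(recon + kl)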
Related
I'm having a lot of trouble understanding why we have to do np.sum(..., axis=0) when computing the gradient of the bias w.r.t. the loss in a neural net.
I'm studying with a book and there's this nice 2-layer NN diagram:
now, if I want to know by how much I have to update (for example) B_2, I just do:
gradient of bias wrt loss = ∂(L)/∂(b_2) = ∂(L)/∂(P) * ∂(P)/∂(b_2) = -(Y - P) * np.ones_like(b_2)
Then just bias -= LR * (gradient of bias wrt loss).
And that's it, no? Where is calculus saying that we have to sum along the axis 0?
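For what it's worth, here is a minimal NumPy sketch (the shapes are my own placeholders, not from the book) of where the sum over axis 0 comes from: the bias is broadcast across the batch dimension in the forward pass, so the chain rule accumulates one gradient contribution per example in the backward pass.

import numpy as np

batch_size, num_units = 4, 3
X = np.random.randn(batch_size, num_units)   # per-example pre-bias activations
b = np.zeros(num_units)                      # one bias vector shared by every example

# Forward: b is broadcast over the batch, i.e. added to every row of X
P = X + b                                    # shape (batch_size, num_units)

# Backward: suppose dL/dP has shape (batch_size, num_units)
dL_dP = np.random.randn(batch_size, num_units)

# Because every example's row of P depends on the same b, the chain rule
# sums each example's contribution: dL/db = sum_i dL/dP_i
dL_db = dL_dP.sum(axis=0)                    # shape (num_units,)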
I am using a variational autoencoder to reconstruct images in TensorFlow 2.0 with the Keras API. My model's architecture looks like this:
The lambda layer uses a function to sample from a normal distribution, which looks like this:
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(1, 1, 16))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon
My hyperparameters are as follows:
epochs = 50
batch size = 16
num_training = 1800
num_val = 100
num_test = 100
learning rate = 0.001
exponential decay = 0.9 * initial learning rate (calculated every 5 epochs)
optimizer = Adam
shuffle = True
I am using the following loss:
def vae_loss(y_pred, y_gt):
    mse_loss = mse(y_pred, y_gt)
    z_mean = model.get_layer('z_mean_layer').output
    z_log_var = model.get_layer('z_log_var_layer').output
    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    kl_loss *= -0.5
    return K.mean(mse_loss + kl_loss)
My weights are initialized the default way: kernel_initializer='glorot_uniform', bias_initializer='zeros'.
My dataset's images consist of a randomly placed circle, which looks like this:
The background has the value 0 and the circle's value is sampled from a uniform distribution between -1 and 1, e.g. 0.987 for all circle pixels.
When I train with this configuration, I get the following loss.
The KL divergence is of magnitude 1e-8, whereas the MSE loss stays at 0.101.
And I always get the same reconstruction regardless of the input: an image with a constant pixel intensity.
Now, if I multiply all input images by 500 (e.g. the background stays zero, circle pixel values are uniformly distributed in the range (-500, 500)), the network miraculously starts to learn,
with a KL loss of magnitude 50 and an MSE loss of magnitude 250 (in the last epochs).
And the image reconstruction works well. Basically, the MSE metric is high, but the circle contour is positioned in the right place.
My question is: how come the network cannot reconstruct images in the range (-1, 1), but does so in the range (-500, 500)?
Machine precision is set to float32.
I have used numerous learning rates, e.g. 0.00001, but this does not solve the problem. I have also trained for many epochs, e.g. 200, still no result.
As mentioned in the comments, there is probably a problem with the scaling of the loss. Your current implementation of the MSE loss uses the mean of the squared differences, which is fairly small. Instead of using the mean, try using the sum of the squared differences over your image. The Keras VAE example (https://keras.io/examples/variational_autoencoder/) does this by scaling the computed MSE loss by the original image size (in PyTorch this can be specified directly: https://github.com/pytorch/examples/blob/234bcff4a2d8480f156799e6b9baae06f7ddc96a/vae/main.py#L74).
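A minimal sketch of that change applied to the vae_loss from the question; original_dim (64 * 64 here) is a placeholder for the number of pixels in your images, not a value taken from the question:

from tensorflow.keras import backend as K

original_dim = 64 * 64  # placeholder: number of pixels per image, adjust to your data

def vae_loss(y_pred, y_gt):
    # Per-pixel MSE, scaled back up by the number of pixels so it behaves like a per-image sum
    reconstruction_loss = original_dim * mse(y_pred, y_gt)

    z_mean = model.get_layer('z_mean_layer').output
    z_log_var = model.get_layer('z_log_var_layer').output
    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    kl_loss *= -0.5
    return K.mean(reconstruction_loss + kl_loss)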
I'm not able to train the network using keras, getting the following error, at epoch 1, first batch:
InvalidArgumentError: In[0] is not a matrix. Instead it has shape []
[[{{node training/SGD/gradients/dense_1/MatMul_grad/MatMul}}]]
I'm trying to solve a regression problem using Keras and a custom function provided by https://github.com/farrell236/DeepPose
The network is a fairly simple VGG-like CNN.
I think the problem is the loss function. In particular, I suppose that the weight initialization is the issue (take a look at the Tensorflow example: https://github.com/farrell236/DeepPose/blob/master/tensorflow/example)
That's my loss function:
def custom_loss(y_true, y_pred):
    loss = SE3GeodesicLoss(np.ones((1, 6)))
    tf.initializers.constant([loss])

    y_pred = tf.cast(y_pred, dtype=tf.float32)
    y_true = tf.cast(y_true, dtype=tf.float32)

    loss = SE3GeodesicLoss(np.ones(6))
    geodesic_loss = loss.geodesic_loss(y_pred, y_true)
    geodesic_loss = tf.cast(geodesic_loss, dtype=tf.float32)

    return geodesic_loss
What's strange is that I'm able to use this function as a metric for the training.
Further information:
What I'm trying to do is to estimate the position of an object, having images as input and the relative Euler angles and distance of the target as labels (i.e. 6 parameters [r_x, r_y, r_z, t_x, t_y, t_z]). I'm trying to implement this loss function in order to solve the attitude estimation problem. Other losses (e.g. MSE, MAE) are not effective enough for the attitude regression problem.
Do you have any suggestion?
I am attempting object segmentation using a custom loss function as defined below:
def chamfer_loss_value(y_true, y_pred):
    # flatten the batch
    y_true_f = K.batch_flatten(y_true)
    y_pred_f = K.batch_flatten(y_pred)
    # ==========
    # get chamfer distance sum
    # error here
    y_pred_mask_f = K.cast(K.greater_equal(y_pred_f, 0.5), dtype='float32')
    finalChamferDistanceSum = K.sum(y_pred_mask_f * y_true_f, axis=1, keepdims=True)
    return K.mean(finalChamferDistanceSum)

def chamfer_loss(y_true, y_pred):
    return chamfer_loss_value(y_true, y_pred)
y_pred_f is the result of my U-net. y_true_f is the result of a Euclidean distance transform on the ground truth label mask x, as shown below:
distTrans = ndimage.distance_transform_edt(1 - x)
To compute the Chamfer distance, you multiply the predicted image (ideally a binary mask of 1s and 0s) with the ground truth distance transform and simply sum over all pixels. To do this, I needed to get a mask y_pred_mask_f by thresholding y_pred_f, then multiply it with y_true_f and sum over all pixels.
y_pred_f provides a continuous range of values in [0, 1], and I get the error None type not supported at the evaluation of y_pred_mask_f. I know the loss function has to be differentiable, and greater_equal and cast are not. But is there a way to circumvent this in Keras? Perhaps using some workaround in TensorFlow?
Well, this was tricky. The reason behind your error is that there is no continuous dependence between your loss and your network. In order to compute gradients of the loss w.r.t. the network, the loss would need the gradient of the indicator of whether your output is greater than 0.5 (as this is the only connection between your final loss value and the output y_pred from your network). This is impossible, as this indicator is piecewise constant and not continuous.
Possible solution: smooth your indicator:
def chamfer_loss_value(y_true, y_pred):
    # flatten the batch
    y_true_f = K.batch_flatten(y_true)
    y_pred_f = K.batch_flatten(y_pred)
    y_pred_mask_f = K.sigmoid(y_pred_f - 0.5)
    finalChamferDistanceSum = K.sum(y_pred_mask_f * y_true_f, axis=1, keepdims=True)
    return K.mean(finalChamferDistanceSum)
The sigmoid is a continuous version of a step function. If your output already comes from a sigmoid, you could simply use y_pred_f instead of y_pred_mask_f.
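If the smooth mask turns out to be too soft, a steeper sigmoid can push it closer to a hard 0.5 threshold while staying differentiable. The sharpness factor k below is my own illustrative addition, not part of the original code:

from tensorflow.keras import backend as K

def chamfer_loss_value(y_true, y_pred, k=10.0):
    # flatten the batch
    y_true_f = K.batch_flatten(y_true)
    y_pred_f = K.batch_flatten(y_pred)
    # steeper sigmoid: approaches a hard 0.5 threshold as k grows, but remains differentiable
    y_pred_mask_f = K.sigmoid(k * (y_pred_f - 0.5))
    finalChamferDistanceSum = K.sum(y_pred_mask_f * y_true_f, axis=1, keepdims=True)
    return K.mean(finalChamferDistanceSum)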
I am trying to edit my own model by adding some code to cifar10.py, and here is my question.
In cifar10.py, the tutorial says:
EXERCISE: The output of inference are un-normalized logits. Try editing the network architecture to return normalized predictions using tf.nn.softmax().
So I directly fed the output of "local4" into tf.nn.softmax(). This gives me scaled logits, meaning they sum to 1 over all classes.
But in the loss function, the cifar10.py code uses:
tf.nn.sparse_softmax_cross_entropy_with_logits()
and the description of this function says:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
Also, according to the description, the logits passed to the above function must have shape [batch_size, num_classes], which means they should be the unscaled outputs, as in the sample code below that computes the unnormalized logits:
# softmax, i.e. softmax(WX + b)
with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)
Does this mean I don't have to use tf.nn.softmax in the code?
You can use tf.nn.softmax in the code if you want, but then you will have to compute the loss yourself:
softmax_logits = tf.nn.softmax(logits)
loss = tf.reduce_mean(- labels * tf.log(softmax_logits) - (1. - labels) * tf.log(1. - softmax_logits))
In practice, you don't use tf.nn.softmax for computing the loss. However, you do need tf.nn.softmax if, for instance, you want to compute the predictions of your algorithm and compare them to the true labels (to compute accuracy).
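As a minimal sketch of that last point, reusing the logits and labels names from the snippets above and assuming one-hot labels (an assumption on my part):

# Turn logits into probabilities, then pick the most likely class per example
softmax_logits = tf.nn.softmax(logits)            # shape [batch_size, num_classes]
predictions = tf.argmax(softmax_logits, axis=1)

# Compare against the true classes (labels assumed one-hot here) to get accuracy
correct = tf.equal(predictions, tf.argmax(labels, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))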