I want to train a model with a custom loss function. To weight the loss calculated inside the function, I want to use an external value. This value relates to the label (same shape), but is not part of the labels dataset, so it is something like a "hidden relation" organized in an array.
Pseudocode of the loss function I need looks like this:
def custom_loss(hidden_relation[i]):
    def loss(y_true, y_pred):
        if y_true == y_pred:
            p_loss = <some math with y_true, y_pred, hidden_relation[i]>
            return p_loss
        else:
            p_loss = <some other math with y_true, y_pred, hidden_relation[i]>
            return p_loss
    return loss
My problem is the index i. Inside the loss function, y_true and y_pred are not given as something like y_true = [[label], [i]] and y_pred = [[prediction], [i]]. So how can I know which i-th row of the labels dataset is being used in the current training iteration? (My batch size is 1.)
Some more background: my application is time series forecasting. Parallel to the time series, there is a list of delayed events. Whenever such an event occurs, it leads to higher or lower values in the time series (like a feedback loop).
I think that patterns in the time series can predict these (delayed) events. So I would like to penalize the model further if it misses these patterns, because that leads to bad predictions. On the other hand, if a delayed event does not occur it doesn't matter, and no extra penalty is necessary.
When training on historical data, I know whether a prediction coincides with an event. In reality, though, I do not have this additional information, so I cannot use the event list as part of the input labels since it doesn't exist (yet).
Thank you for your help!
Best
hippo
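A common workaround for this kind of setup (not from the original post; a minimal sketch with dummy data, where the weighting math is just a placeholder) is to pack the auxiliary array into the label tensor itself. Each batch row of y_true then carries its own hidden-relation value, so no index i is needed inside the loss:

import numpy as np
import tensorflow as tf

# Dummy data purely for illustration; in practice X, y and hidden_relation come
# from the real time series (same length and row order).
X = np.random.rand(100, 10).astype("float32")
y = np.random.rand(100, 1).astype("float32")
hidden_relation = np.random.rand(100, 1).astype("float32")

# Pack label and auxiliary value side by side: column 0 = label, column 1 = hidden relation.
y_packed = np.concatenate([y, hidden_relation], axis=-1)

def custom_loss(y_true, y_pred):
    label = y_true[:, 0:1]      # the real target
    relation = y_true[:, 1:2]   # the matching hidden-relation value for this row
    # Placeholder math: weight the squared error by the hidden relation.
    return tf.reduce_mean(relation * tf.square(label - y_pred), axis=-1)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer="adam", loss=custom_loss)
model.fit(X, y_packed, batch_size=1, epochs=1)

(The label tensor having more columns than the model output is usually fine with a custom loss, since the loss slices y_true explicitly.)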
I am using a custom loss to try to decrease the peak-to-average power ratio (PAPR) of OFDM symbols. To break it down: the input is of length N and can take only 4 values. The output can take any floating-point value in [-1, 1] (because I can't go over the power threshold). I generate the training and validation sets randomly, since the data can take any random combination of the 4 values.
The problem is that changing and tweaking the model and its parameters only improves the training loss; the validation loss stays constant from the first epoch.
I am using a custom loss function that concatenates the model output with the input, spreading it among the input tones, applies an IFFT, and then computes the max/mean ratio over all elements.
In short, it reserves some array elements (tones) whose values the model can pick, sacrificing those elements in order to reduce the peaks of the final signal.
I feed the input data one-hot encoded for each of the 4 values, and pass the same data once more as labels in complex form, so I can operate on it inside the custom loss function below.
def PAPR_Loss(y_true, y_pred):
    # Reserved tone positions; L (the number of reserved tones) is assumed to be defined elsewhere.
    Reserved_phases = [0, 32, 62, 93, 124, 155, 186, 217, 248]
    # Interleave the data tones (y_true, complex) with the reserved tones built from y_pred.
    data = tf.concat(
        [tf.concat([y_true[:, Reserved_phases[i]:Reserved_phases[i + 1]],
                    tf.complex(y_pred[:, 4 * (i + 1) - 4] - y_pred[:, 4 * (i + 1) - 2],
                               y_pred[:, 4 * (i + 1) - 3] - y_pred[:, 4 * (i + 1) - 1])[:, tf.newaxis]], 1)
         for i in range(L)], 1)
    x = tf.signal.ifft(data)                     # time-domain signal
    temp = tf.square(tf.abs(x))                  # instantaneous power
    loss = tf.reduce_max(temp, axis=-1) / tf.reduce_mean(temp, axis=-1)  # PAPR
    return 10 * tf.experimental.numpy.log10(loss)  # PAPR in dB
[Figure: loss and validation loss vs. epochs]
I am using 80k unique data combinations for training and 20k different combinations for validation.
I am also using dropout after each layer, so I don't think it's an overfitting problem.
When I remove the tanh activation at the output (so the output can take any value), I start getting improvements on the validation loss and a better training loss as well. But I suspect this happens only because the model is allowed to increase the mean power term, which is inversely proportional to the loss; it doesn't learn where the peaks are or how to cancel them, it just increases the mean as much as possible so that the max is no longer that big relative to it.
Could the model also be failing to train because of the concatenation, and because the input is reused in a different form as a label? I thought I could get away with this since the input isn't trainable, so it shouldn't matter.
Note: the model doesn't even beat the classical (non-deep-learning) method, which just searches a limited candidate set for the best combinations that reduce these peaks. The problem with the classical method is that it is computationally expensive; if I could even match its performance, this approach would be very rewarding.
What could be going wrong here? What can I try changing next?
Thanks in advance.
I have a SimpleRNN/LSTM that I'm trying to train on a sequential classification task using TensorFlow. There is a sequence of data (300 time steps) that predicts a label at t=300. For my task I would like the RNN to evaluate the error at every time step (not just at the final time point) and propagate it backwards (as in the figure below).
After some responses below, it seems I need to do a few things: use the return_sequences flag, use the TimeDistributed layer to access the output from the LSTM/RNN, and also define a custom loss function.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed
from tensorflow.keras import backend as K

# n_neurons and length (number of time steps) are defined elsewhere
model = Sequential()
layer1 = LSTM(n_neurons, input_shape=(length, 1), return_sequences=True)
model.add(layer1)
layer2 = TimeDistributed(Dense(1))
model.add(layer2)

# Define custom loss
def custom_loss(layer1):
    # Create a loss function
    def loss(y_true, y_pred):
        # access layer1 at every time point and compute the mean error
        # UNCLEAR HOW TO RUN AT EVERY TIME STEP (X, the model input, is not available here)
        err = K.mean(layer1(X) - y_true, axis=-1)
        return err
    # Return a function
    return loss

# Compile the model
model.compile(optimizer='adam', loss=custom_loss(layer1), metrics=['accuracy'])
For now I'm a bit confused about the custom_loss function, as it's not clear how I can pass in layer1 and compute the error inside the innermost loss function.
Does anyone have a suggestion, or can point me to a more detailed answer?
The question is not easy to answer, since it is not clear what you're trying to achieve (it shouldn't be the same using an FFNN or an RNN, and what works best definitely depends on the application).
Anyway, you might be confusing the training steps (say, the forward and backward propagation over a minibatch of sequences) with the "internal" steps of the RNN. A single sequence (or a single minibatch) will always "unroll" entirely through time during the forward pass before any output is made available: only afterwards (thus, at the end of the training step) can you use the predictions and compute the losses to backpropagate.
What you can do is return a sequence of outputs (one y_predicted for every internal time step) by including the argument return_sequences=True in SimpleRNN(...). This gives you a sequence of 300 predictions, each of which depends only on the inputs up to the considered internal time step. You can then use whichever outputs you need to compute the loss, possibly in a custom loss function.
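As a minimal sketch of that idea (layer sizes and the per-time-step mean-squared-error loss are just placeholders, and the dummy targets simply repeat the final label across all time steps):

import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, TimeDistributed

length, n_neurons = 300, 32

model = Sequential([
    SimpleRNN(n_neurons, input_shape=(length, 1), return_sequences=True),
    TimeDistributed(Dense(1)),     # one prediction per time step
])

# y_true and y_pred both have shape (batch, 300, 1): one target/prediction per time step.
def per_timestep_loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=[1, 2])   # average the error over all steps

model.compile(optimizer='adam', loss=per_timestep_loss)

# Dummy data: the label at t=300 is repeated across all 300 time steps.
X = np.random.rand(8, length, 1).astype("float32")
y = np.repeat(np.random.rand(8, 1, 1).astype("float32"), length, axis=1)
model.fit(X, y, epochs=1)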
I hope I've been clear enough. Otherwise, let me know if I can help further.
I have written my custom training loop using tf.GradientTape(). My data has 2 classes. The classes are not balanced: class1 contributes almost 80% of the data and class2 the remaining 20%. Therefore, in order to account for this imbalance, I am trying to write a custom loss function that applies the corresponding class weights when calculating the loss, i.e. I want to use class_weights = [0.2, 0.8]. I am not able to find similar examples.
However, all the examples I am seeing use the model.fit approach, where it is easier to pass class_weights. I cannot find an example that uses class_weights with a custom training loop using tf.GradientTape.
I did go through the suggestions of using sample_weight; however, I don't have data for which I can specify per-sample weights, so my preference is to use class weights.
I am using BinaryCrossentropy as the loss function, but I want to adjust the loss based on the class_weights. That's where I am stuck: how do I tell BinaryCrossentropy to consider the class_weights?
Is my approach of using a custom loss function correct, or is there a better way to make use of class_weights when training with a custom training loop (not using model.fit)?
You can write your own loss function: in that loss function, call BinaryCrossentropy, multiply the result by the weight you want, and return that.
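A minimal sketch of that idea (assuming labels of shape (batch, 1) with values 0/1, the majority class labelled 0, and the [0.2, 0.8] weights from the question):

import tensorflow as tf

class_weights = tf.constant([0.2, 0.8])   # weight for class 0 (~80% of data) and class 1 (~20%)

bce = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)

def weighted_bce(y_true, y_pred):
    per_example = bce(y_true, y_pred)                      # shape (batch,), no reduction yet
    idx = tf.cast(tf.squeeze(y_true, axis=-1), tf.int32)   # true class (0 or 1) per example
    weights = tf.gather(class_weights, idx)                # weight of each example's true class
    return tf.reduce_mean(weights * per_example)

This can be called directly inside a tf.GradientTape loop in place of the plain BinaryCrossentropy loss.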
Here's an implementation that should work for n classes instead of just 2.
For your example of 80:20 split, calculate weights as below (assuming 100 samples in total).
Weight calculation (ref: Handling Class Imbalance: TensorFlow):
count_for_class_0, count_for_class_1 = 80, 20   # 80:20 split of 100 samples in total
total_samples, num_classes = 100, 2
weight_class_0 = (1 / count_for_class_0) * (total_samples / num_classes)  # 0.625
weight_class_1 = (1 / count_for_class_1) * (total_samples / num_classes)  # 2.5
class_wts = tf.constant([weight_class_0, weight_class_1])
Loss function: Requires labels to be sparse and logits unscaled (no activations applied).
# Example: logits = [[-3.2, 2.0], [1.2, 0.5], ...], (sparse) labels = [0, 1, ...]
def weighted_sparse_categorical_crossentropy(labels, logits, weights):
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    class_weights = tf.gather(weights, labels)   # weight of each sample's true class
    return tf.reduce_mean(class_weights * loss)
You can supply this loss function to custom training loops, for example with tf.GradientTape as sketched below.
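A minimal sketch of such a loop (the tiny model, optimizer and random data are placeholders standing in for the real ones; class_wts is the tensor computed above):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])   # outputs unscaled logits for 2 classes
optimizer = tf.keras.optimizers.Adam()

x = tf.random.normal((100, 8))
y = tf.random.uniform((100,), maxval=2, dtype=tf.int32)   # sparse labels: 0 or 1
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

for batch_x, batch_y in dataset:
    with tf.GradientTape() as tape:
        logits = model(batch_x, training=True)
        loss = weighted_sparse_categorical_crossentropy(batch_y, logits, class_wts)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))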
I am having trouble with a Keras custom loss function. I want to be able to access the ground truth (truth) as a NumPy array.
Because it is a callback function, I think I am not in eager execution, which means I can't access it using the backend.get_value() function. I also tried different methods, but it always comes back to the fact that this 'Tensor' object doesn't exist.
Do I need to create a session inside the custom loss function?
I am using TensorFlow 2.2, which is up to date.
def custom_loss(y_true, y_pred):
    # y_true is a 4D array holding the label ([..., 0]) and an input-dependent multiplier ([..., 1])
    truth = backend.get_value(y_true)
    loss = backend.square((y_pred - truth[:, :, 0]) * truth[:, :, 1])
    loss = backend.mean(loss, axis=-1)
    return loss

model.compile(loss=custom_loss, optimizer='Adam')
model.fit(X, np.stack([labels, X[:, 0]], axis=3), batch_size=16)
I want to be able to access truth. It has two components (the label and a multiplier that is different for each item). I saw a solution that is input-dependent, but I am not sure how to access the value: Custom loss function in Keras based on the input data
I think you can do this by enabling run_eagerly=True in model.compile as shown below.
model.compile(loss=custom_loss(weight_building, weight_space), optimizer=keras.optimizers.Adam(), metrics=['accuracy'], run_eagerly=True)
I think you also need to update custom_loss as shown below.
def custom_loss(weight_building, weight_space):
    def loss(y_true, y_pred):
        truth = backend.get_value(y_true)   # works here because of run_eagerly=True
        error = backend.square(y_pred - y_true)
        mse_error = backend.mean(error, axis=-1)
        return mse_error
    return loss
I am demonstrating the idea with simple MNIST data. Please take a look at the code here.
I am using autoencoders for anomaly detection. I have finished training my model and now I want to calculate the reconstruction loss for each entry in the dataset, so that I can flag data points with a high reconstruction loss as anomalies.
This is my current code to calculate the reconstruction loss (shown below).
But this is really slow. By my estimate, it would take 5 hours to go through the dataset, whereas training one epoch takes approximately 55 minutes.
I feel that the convert-to-tensor operation is bottlenecking the code, but I can't find a better way to do it.
I've tried changing the batch size, but it does not make much of a difference. I have to use the convert-to-tensor part because K.eval throws an error if I do it without.
for i in range(0, encoded_dataset.shape[0], batch_size):
    y_true = tf.convert_to_tensor(encoded_dataset[i:i + batch_size].values, np.float32)
    y_pred = tf.convert_to_tensor(ae1.predict(encoded_dataset[i:i + batch_size].values), np.float32)
    # Append the batch losses (numpy array) to the list
    reconstruction_loss_transaction.append(K.eval(loss_function(y_true, y_pred)))
I was able to train in 55 minutes per epoch, so I feel that prediction should not take 5 hours. encoded_dataset is a variable that holds the entire dataset in main memory as a DataFrame.
I am using an Azure VM instance.
K.eval(loss_function(y_true, y_pred)) is there to find the loss for each row of the batch. So y_true will be of size (batch_size, 2000), and so will y_pred. K.eval(loss_function(y_true, y_pred)) will give me an output of (batch_size, 1), evaluating the binary cross-entropy on each row of y_true and y_pred.
Moved from comments:
My suspicion is that ae1.predict and K.eval(loss_function) are behaving in unexpected ways. ae1.predict should normally be used to output the loss function value as well as y_pred. When you create the model, specify that the loss value is another output (you can have a list of multiple outputs), then just call predict once to get both y_pred and the loss value in one call.
But I want the loss for each row. Won't the loss returned by the predict method be the mean loss for the entire batch?
The answer depends on how the loss function is implemented. Both ways produce perfectly valid and identical results in TF under the hood. You could average the loss over the batch before taking the gradient w.r.t. the loss, or take the gradient w.r.t. a vector of losses. The gradient operation in TF will perform the averaging of the losses for you if you use the latter approach (see SO articles on taking the per-sample gradient, it's actually hard to do).
If Keras implements the loss with a reduce_mean built into it, you can just define your own loss. If you're using a square loss, replace 'mean_squared_error' with lambda y_true, y_pred: tf.square(y_pred - y_true). That produces the squared error instead of the MSE (no difference for the gradient), but look here for the variant including the mean.
In any case this produces a per-sample loss, so long as you don't use tf.reduce_mean, which is purely optional in the loss. Another option is to simply compute the loss separately from what you optimize for and make it an output of the model; that is also perfectly valid.
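As a minimal sketch of the per-row idea (reusing the names ae1 and encoded_dataset from the question and assuming binary cross-entropy as the reconstruction loss), the per-row losses can also be computed with a single predict call and no Python loop:

import numpy as np
import tensorflow as tf

# One predict call over the whole dataset; Keras batches it internally.
x = encoded_dataset.values.astype(np.float32)          # shape (n_rows, 2000)
x_hat = ae1.predict(x, batch_size=256)

# Per-row binary cross-entropy: reduces over the 2000 features, keeps the row dimension.
bce = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
reconstruction_loss = bce(x, x_hat).numpy()            # shape (n_rows,)
# Rows with the highest reconstruction loss are the anomaly candidates.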