I define a custom RMSE loss function for my LSTM model as follows:
def RMSE(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
Everything is fine so far, but the issue is that I scale my input data to the range [-1, 1], so the reported loss is associated with that scale. I want the model to report the training loss in the range of my original data, for example by somehow applying scaler.inverse_transform to y_true and y_pred, but I have had no luck doing it: they are tensors, and scaler.inverse_transform requires a NumPy array.
Any idea how to rescale the data and report the loss values in the original scale?
scaler.inverse_transform essentially uses the scaler.min_ and scaler.scale_ parameters of sklearn.preprocessing.MinMaxScaler to convert the data back. An example:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
scaler = MinMaxScaler()
data_trans = scaler.fit_transform(data)
print('transform:\n', data_trans)
data_inverse = (data_trans - scaler.min_) / scaler.scale_
print('inverse transform:\n', data_inverse)
# print
transform:
 [[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]]
inverse transform:
 [[-1.   2. ]
 [-0.5  6. ]
 [ 0.  10. ]
 [ 1.  18. ]]
So you just need to use them inside your RMSE function to achieve your goal.
def RMSE_inverse(y_true, y_pred):
    y_true = (y_true - K.constant(scaler.min_)) / K.constant(scaler.scale_)
    y_pred = (y_pred - K.constant(scaler.min_)) / K.constant(scaler.scale_)
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
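For example, here is a minimal, self-contained sketch of how it could be wired in (the toy data, layer sizes and optimizer are placeholders, not from the question); RMSE_inverse is passed as a metric so the logs show the value in the original scale, but it could equally be used as the loss itself:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import backend as K

# Toy data: 100 sequences of length 10, targets scaled to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
X = np.random.rand(100, 10, 1)
y_scaled = scaler.fit_transform(np.random.rand(100, 1) * 50)

def RMSE_inverse(y_true, y_pred):
    # Undo the MinMaxScaler transform inside the graph, then compute RMSE
    y_true = (y_true - K.constant(scaler.min_)) / K.constant(scaler.scale_)
    y_pred = (y_pred - K.constant(scaler.min_)) / K.constant(scaler.scale_)
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

model = keras.Sequential([layers.LSTM(8, input_shape=(10, 1)), layers.Dense(1)])
model.compile(optimizer='adam', loss='mse', metrics=[RMSE_inverse])
model.fit(X, y_scaled, epochs=2, verbose=1)  # RMSE_inverse is logged in the original scale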
Outputs of an LSTM layer in TensorFlow differ between model(X) and model.predict(X) when using dropout.
Let's call the output of model(X) the Fwd Pass and the output of model.predict(X) the Prediction.
For a regular Dropout layer we can specify the seed, but the LSTM layer doesn't have such an argument. I'm guessing this is what causes the difference between the Fwd Pass and the Prediction.
In the following code sample, the outputs differ when dropout=0.4 but match exactly when dropout=0.0. This makes me believe that every evaluation uses a different operation-level seed.
Is there a way to set that? I've already set the global seed for TensorFlow.
Is there something else going on that I am not aware of?
PS: I want to use dropout during inference, so that is by design.
Code
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.initializers import GlorotUniform
SEED = 200
HIDDEN_UNITS = 4
N_OUTPUTS = 1
N_INPUTS = 4
BATCH_SIZE = 4
N_SAMPLES = 4
np.random.seed(SEED)
tf.random.set_seed(SEED)
# Simple LSTM Model
def my_model():
    inputs = x = keras.Input(shape=(N_INPUTS, 1))
    initializer = GlorotUniform(seed=SEED)
    x = layers.LSTM(HIDDEN_UNITS,
                    kernel_initializer=initializer,
                    recurrent_dropout=0.0,
                    dropout=0.4,
                    # return_sequences=True,
                    use_bias=False)(x, training=True)
    output = x
    model = keras.Model(inputs=inputs, outputs=[output])
    return model
# Create Sample Data
# Target Function
def f_x(x):
    y = x[:, 0] + x[:, 1] ** 2 + np.sin(x[:, 2]) + np.sin(x[:, 3] ** 3)
    y = y[:, np.newaxis]
    return y
# Generate random inputs
d = np.linspace(0.1, 1, N_SAMPLES)
X = np.transpose(np.vstack([d*0.25, d*0.5, d*0.75, d]))
X = X[:, :, np.newaxis]
Y = f_x(X)
# PRINT FWD PASS
model = my_model()
n_out = model(X).numpy()
print('FWD PASS:')
print(n_out, '\n')
# PRINT PREDICT OUTPUT
print('PREDICT:')
out = model.predict(X)
print(out)
Output (dropout=0.4) - outputs do not match
FWD PASS:
[[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0.0526864 -0.13284351 0.02326298 -0.30357683]
[ 0.06297918 -0.14084947 0.02214929 -0.44425806]]
PREDICT:
[[ 0.00975818 -0.029404 0.00678372 -0.03232396]
[ 0.0347842 -0.0974849 0.01938616 -0.15696262]
[ 0. 0. 0. 0. ]
[ 0.06297918 -0.14084947 0.02214929 -0.44425806]]
Output (dropout=0.0) - no dropout, outputs match
FWD PASS:
[[ 0.00593475 -0.01799661 0.00424165 -0.01876264]
[ 0.02226446 -0.06519517 0.01399653 -0.08595844]
[ 0.03620889 -0.10084937 0.01987283 -0.1663805 ]
[ 0.0475584 -0.12453148 0.02269932 -0.2541136 ]]
PREDICT:
[[ 0.00593475 -0.01799661 0.00424165 -0.01876264]
[ 0.02226446 -0.06519517 0.01399653 -0.08595844]
[ 0.03620889 -0.10084937 0.01987283 -0.1663805 ]
[ 0.0475584 -0.12453148 0.02269932 -0.2541136 ]]
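One workaround I'm considering, appended to the script above (just a sketch, and I'm not sure it actually resets the operation-level seed the way I hope), is to re-seed immediately before each call so both draws start from the same random state:
# Untested idea: re-set the global seed right before each call so the
# op-level random sequence restarts from the same point for both paths.
tf.random.set_seed(SEED)
fwd_out = model(X).numpy()

tf.random.set_seed(SEED)
pred_out = model.predict(X)

print(np.allclose(fwd_out, pred_out))  # hoping for True if the dropout masks now line up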
I have a logistic regression model which I created by referring to this link.
The label is a Boolean value (0 or 1).
Do we need to one-hot encode the label in this case?
The reason for asking: I use the function below to compute the cross-entropy, and the loss always comes out as zero.
def cross_entropy(y_true, y_pred):
    y_true = tf.one_hot([y_true.numpy()], 2)
    print(y_pred)
    print(y_true)
    loss_row = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)
    print('Loss')
    print(loss_row)
    return tf.reduce_mean(loss_row)
EDIT: The gradient is returning [None, None] (for the following code).
def grad(x, y):
    with tf.GradientTape() as tape:
        y_pred = logistic_regression(x)
        loss_val = cross_entropy(y, y_pred)
    return tape.gradient(loss_val, [w, b])
Example values:
loss_val => tf.Tensor(307700.47, shape=(), dtype=float32)
w => tf.Variable 'Variable:0' shape=(171, 1) dtype=float32, numpy=
array([[ 0.7456649 ], [-0.35111237],[-0.6848465 ],[ 0.22605407]]
b => tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([1.1982833], dtype=float32)
In the case of binary logistic regression, you don't need one-hot encoding. It is generally used in multinomial logistic regression.
If you are doing ordinary (binary) logistic regression (with 0/1 labels), then use the loss function tf.nn.sigmoid_cross_entropy_with_logits().
If you are doing multiclass logistic regression (a.k.a. softmax regression or multinomial logistic regression), then you have two choices:
Define your labels in one-hot format (e.g. [1, 0, 0], [0, 1, 0], ...) and use the loss function tf.nn.softmax_cross_entropy_with_logits()
Define your labels as single integers (e.g. 1, 2, 3, ...) and use the loss function tf.nn.sparse_softmax_cross_entropy_with_logits()
For the latter two, you can find more information in this StackOverflow question:
What's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?
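To make the distinction concrete, here is a small sketch (assuming TF 2.x eager execution; the toy logits and labels below are made up for illustration):
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])            # one sample, 3 classes

# Binary case: a single logit per example and a 0/1 float label
bin_logits = tf.constant([0.8])
bin_labels = tf.constant([1.0])
bce = tf.nn.sigmoid_cross_entropy_with_logits(labels=bin_labels, logits=bin_logits)

# Multiclass, one-hot labels
onehot_labels = tf.constant([[0.0, 1.0, 0.0]])
cce = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)

# Multiclass, integer labels
int_labels = tf.constant([1])
scce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=int_labels, logits=logits)

print(bce.numpy(), cce.numpy(), scce.numpy())  # the last two should be equal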
I have a small model used in a reinforcement learning context.
I can input a 2d tensor of states, and I get a 2d tensor of action weights.
Let's say I input two states and get the following action weights out:
[[0.1, 0.2],
[0.3, 0.4]]
Now I have another 2d tensor which holds the action numbers whose weights I want to get:
[[1],
[0]]
How can I use this tensor to get the weights of those actions?
In this example I'd like to get:
[[0.2],
[0.3]]
Similar to Tensorflow tf.gather with axis parameter, the indices are handled a little differently here:
a = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])
# convert to full indices
full_indices = tf.stack([tf.range(indices.shape[0])[..., tf.newaxis], indices], axis=2)
# gather
result = tf.gather_nd(a, full_indices)
with tf.Session() as sess:
    print(sess.run(result))
# [[0.2]
#  [0.3]]
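As an aside, if you are on TF 2.x with eager execution, the same result can also be obtained with tf.gather and its batch_dims argument (a quick sketch):
import tensorflow as tf

a = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])

# batch_dims=1 treats the first axis as the batch, so result[i, j] = a[i, indices[i, j]]
result = tf.gather(a, indices, axis=1, batch_dims=1)
print(result.numpy())
# [[0.2]
#  [0.3]]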
A simple way to do this is to squeeze the dimensions of indices, multiply element-wise with the corresponding one-hot vectors, and then expand the dimensions again afterwards.
import tensorflow as tf
weights = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])
# Reduce from 2d (2, 1) to 1d (2,)
indices1d = tf.squeeze(indices)
# One-hot vector corresponding to the indices. shape (2, 2)
action_one_hot = tf.one_hot(indices=indices1d, depth=weights.shape[1])
# Element-wise multiplication and sum across axis 1 to pick the weight. Shape (2,)
action_taken_weight = tf.reduce_sum(action_one_hot * weights, axis=1)
# Expand the dimension back to have a 2d. Shape (2, 1)
action_taken_weight2d = tf.expand_dims(action_taken_weight, axis=1)
sess = tf.InteractiveSession()
print("weights\n", sess.run(weights))
print("indices\n", sess.run(indices))
print("indices1d\n", sess.run(indices1d))
print("action_one_hot\n", sess.run(action_one_hot))
print("action_taken_weight\n", sess.run(action_taken_weight))
print("action_taken_weight2d\n", sess.run(action_taken_weight2d))
Should give you the following output:
weights
[[0.1 0.2]
[0.3 0.4]]
indices
[[1]
[0]]
indices1d
[1 0]
action_one_hot
[[0. 1.]
[1. 0.]]
action_taken_weight
[0.2 0.3]
action_taken_weight2d
[[0.2]
[0.3]]
Note: You can also do action_taken_weight = tf.reshape(action_taken_weight, tf.shape(indices)) instead of expand_dims.
I am trying to write a custom Keras loss function but I am having issues implementing and debugging my code. My target vector is:
y_pred = [p_conf, p_class_1, p_class_2]
where p_conf = confidence that an event of interest was detected
y_true examples:
[0, 0, 0] = no event of interest
[1, 1, 0] = first class event
[1, 0, 1] = second class event
I get relatively good results using multi-label classification (i.e. using a sigmoid activation in my final layer and binary_crossentropy loss function) but I want to experiment and improve my results using a custom loss function that calculates the:
binary_crossentropy loss for when y_true = [0, ..., ...]
categorical_crossentropy loss for when y_true = [1, ..., ...]
This is a simplified loss function used by the YOLO object detection algorithm. I tried adapting an existing Keras / TensorFlow implementation of the YOLO loss function but have not been successful.
Here is my current working code. It runs but generates unstable results, i.e. loss and accuracy decrease over time. Any assistance would be greatly appreciated.
import tensorflow as tf
from keras import losses
def custom_loss(y_true, y_pred):
    # Initialisation
    mask_shape = tf.shape(y_true)[:0]
    conf_mask = tf.zeros(mask_shape)
    class_mask = tf.zeros(mask_shape)

    # Labels
    true_conf = y_true[..., 0]
    true_class = tf.argmax(y_true[..., 1:], -1)

    # Predictions
    pred_conf = tf.sigmoid(y_pred[..., 0])
    pred_class = y_pred[..., 1:]

    # Masks for selecting rows based on confidence = {0, 1}
    conf_mask = conf_mask + (1 - y_true[..., 0])
    class_mask = y_true[..., 0]
    nb_class = tf.reduce_sum(tf.to_float(class_mask > 0.0))

    # Calculate loss
    loss_conf = losses.binary_crossentropy(true_conf, pred_conf)
    loss_class = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_class, logits=pred_class)
    loss_class = tf.reduce_sum(loss_class * class_mask) / nb_class
    loss = loss_conf + loss_class
    return loss
I'm using a softmax function for a binary classification task.
My test labels are one-hot encoded and look like:
test_y = [[1. 0.] [1. 0.]…]
And the predicted labels are lists of probabilities:
test_y_pred = [[ 4.39091297e-09 1.00000000e+00]
[ 1.75207238e-10 1.00000000e+00] …]
When I try to use f1_score, I get an error:
ValueError: Can't handle mix of binary and continuous
How can I handle this issue?
Thanks
f1_score will not classify the results for you.
Change your predictions to vectors of classes, for example:
import numpy as np
test_y = [np.argmax(prediction) for prediction in test_y]
test_y_pred= [np.argmax(prediction) for prediction in test_y_pred]
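After that conversion, the usual call should work; a short sketch (assuming sklearn.metrics.f1_score, with toy values shaped like the ones in the question):
import numpy as np
from sklearn.metrics import f1_score

# Toy values shaped like the question's data: one-hot truths, probability predictions
test_y = [[1., 0.], [1., 0.], [0., 1.]]
test_y_pred = [[4.4e-09, 1.0], [0.9, 0.1], [1.8e-10, 1.0]]

test_y = [np.argmax(prediction) for prediction in test_y]
test_y_pred = [np.argmax(prediction) for prediction in test_y_pred]

print(f1_score(test_y, test_y_pred))  # both inputs are now class indices, so no ValueError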