Binary Logistic Regression - do we need to one_hot encode label? - tensorflow

I have a logistic regression model which I created by referring to this link.
The label is a Boolean value (0 or 1).
Do we need to one-hot encode the label in this case?
The reason for asking: I use the function below to compute the cross entropy, and the loss always comes out as zero.
def cross_entropy(y_true, y_pred):
    y_true = tf.one_hot([y_true.numpy()], 2)
    print(y_pred)
    print(y_true)
    loss_row = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)
    print('Loss')
    print(loss_row)
    return tf.reduce_mean(loss_row)
EDIT: The gradient returns [None, None] for the following code.
def grad(x, y):
    with tf.GradientTape() as tape:
        y_pred = logistic_regression(x)
        loss_val = cross_entropy(y, y_pred)
    return tape.gradient(loss_val, [w, b])
Example values (w truncated for brevity):
loss_val => tf.Tensor(307700.47, shape=(), dtype=float32)
w => tf.Variable 'Variable:0' shape=(171, 1) dtype=float32, numpy=array([[ 0.7456649 ], [-0.35111237], [-0.6848465 ], [ 0.22605407], ...])
b => tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([1.1982833], dtype=float32)

In the case of binary logistic regression, you don't require one-hot encoding. It is generally used in multinomial logistic regression.

If you are doing ordinary (binary) logistic regression (with 0/1 labels), then use the loss function tf.nn.sigmoid_cross_entropy_with_logits().
If you are doing multiclass logistic regression (a.k.a. softmax regression or multinomial logistic regression), then you have two choices:
Define your labels in 1-hot format (e.g. [1, 0, 0], [0, 1, 0], ...) and use the loss function tf.nn.softmax_cross_entropy_with_logits()
Define your labels as single integers (e.g. 1, 2, 3, ...) and use the loss function tf.nn.sparse_softmax_cross_entropy_with_logits()
For the latter two, you can find more information in this StackOverflow question:
What's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?
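For the binary case, a minimal sketch of how sigmoid_cross_entropy_with_logits is used (the tensors below are made up for illustration; your logits would come from logistic_regression(x) before any sigmoid is applied):
import tensorflow as tf

# Labels stay as plain 0/1 values; no one-hot encoding is needed.
y_true = tf.constant([[0.], [1.], [1.]], dtype=tf.float32)

# One raw logit per example (i.e. x @ w + b, before the sigmoid).
logits = tf.constant([[-1.2], [0.7], [2.3]], dtype=tf.float32)

loss_per_example = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=logits)
loss = tf.reduce_mean(loss_per_example)
print(loss)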

Related

Tensorflow Macro F1 Score for multiclass and also for binary classification

I am trying to train two 1D conv neural networks: one for a multiclass classification problem and a second for a binary classification problem. One of my metrics has to be the macro F1 score for both problems. However, I am having a problem using tfa.metrics.F1Score from TensorFlow Addons.
Multiclass classification
I have 3 classes encoded as 0, 1, 2.
The last layer of the network and the compile method look like this (int_sequences_input is the input layer):
preds = layers.Dense(3, activation="softmax")(x)
model = keras.Model(int_sequences_input, preds)
f1_macro = F1Score(num_classes=3, average='macro')
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy',f1_macro])
However, when I run model.fit(), I get the following error:
ValueError: Dimension 0 in both shapes must be equal, but are 3 and 1. Shapes are [3] and [1]. for '{{node AssignAddVariableOp_7}} = AssignAddVariableOp[dtype=DT_FLOAT](AssignAddVariableOp_7/resource, Sum_6)' with input shapes: [], [1].
shapes of data:
X_train - (23658, 150)
y_train - (23658,)
Binary classification
I have 2 classes encoded as 0,1
The last layer of the network and the compile method look like this (int_sequences_input is the input layer):
preds = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(int_sequences_input, preds)
print(model.summary())
f1_macro = F1Score(num_classes=2, average='macro')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy',f1_macro])
Again, when I run model.fit(), I get an error:
ValueError: Dimension 0 in both shapes must be equal, but are 2 and 1. Shapes are [2] and [1]. for '{{node AssignAddVariableOp_4}} = AssignAddVariableOp[dtype=DT_FLOAT](AssignAddVariableOp_4/resource, Sum_3)' with input shapes: [], [1].
shapes of data:
X_train - (15770, 150)
y_train - (15770,)
So my question is: how to evaluate both of my models using macro F1 score? How can I fix my implementation to make it work with tfa.metrics.F1Score? Or is there any other way to calculate macro F1 score without using tfa.metrics.F1Score? Thanks.
Have a look at the usage example from its doc page.
metric = tfa.metrics.F1Score(num_classes=3, threshold=0.5)
y_true = np.array([[1, 1, 1],
                   [1, 0, 0],
                   [1, 1, 0]], np.int32)
y_pred = np.array([[0.2, 0.6, 0.7],
                   [0.2, 0.6, 0.6],
                   [0.6, 0.8, 0.0]], np.float32)
metric.update_state(y_true, y_pred)
You can see that it expects the label to be in one-hot format.
But given the shapes you mentioned above:
shapes of data:
X_train - (23658, 150)
y_train - (23658,)
It looks like your labels are in index format. Try converting them to one-hot with tf.one_hot(y_train, num_classes). You'll also need to change your loss to loss='categorical_crossentropy'.
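A minimal, self-contained sketch of that setup for the multiclass model (the data, network, and epoch count here are stand-ins, not your actual code):
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 3
X_train = np.random.random((100, 150)).astype('float32')   # stand-in features
y_train = np.random.randint(0, num_classes, (100,))         # integer labels, shape (100,)

# Convert integer labels to one-hot, shape (100, 3)
y_train_oh = tf.one_hot(y_train, num_classes)

inputs = keras.Input(shape=(150,))
x = layers.Dense(64, activation='relu')(inputs)
preds = layers.Dense(num_classes, activation='softmax')(x)
model = keras.Model(inputs, preds)

f1_macro = tfa.metrics.F1Score(num_classes=num_classes, average='macro')
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy', f1_macro])
model.fit(X_train, y_train_oh, epochs=2)
One way to make the binary model fit the same pattern is to give it a two-unit softmax output with one-hot labels, though other setups are possible.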

Create a weighted MSE loss function in Tensorflow

I want to train a recurrent neural network using Tensorflow. My model outputs a 1 by 100 vector for each training sample. Assume that y = [y_1, y_2, ..., y_100] is my output for training sample x and the expected output is y'= [y'_1, y'_2, ..., y'_100].
I wish to write a custom loss function that calculates the loss of this specific sample as follows:
Loss = 1/sum(weights) * sqrt(w_1*(y_1-y'_1)^2 + ... + w_100*(y_100-y'_100)^2)
where weights = [w_1, ..., w_100] is a given weight array.
Could someone help me with implementing such a custom loss function? (I also use mini-batches while training)
I want to underline that you have two possibilities, depending on your problem:
[1] If the weights are equal for all your samples:
You can build a loss wrapper. Here is a dummy example:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

n_sample = 200
X = np.random.uniform(0, 1, (n_sample, 10))
y = np.random.uniform(0, 1, (n_sample, 100))
W = np.random.uniform(0, 1, (100,)).astype('float32')

def custom_loss_wrapper(weights):
    def loss(true, pred):
        sum_weights = tf.reduce_sum(weights) * tf.cast(tf.shape(pred)[0], tf.float32)
        resid = tf.sqrt(tf.reduce_sum(weights * tf.square(true - pred)))
        return resid / sum_weights
    return loss

inp = Input((10,))
x = Dense(256)(inp)
pred = Dense(100)(x)
model = Model(inp, pred)
model.compile('adam', loss=custom_loss_wrapper(W))
model.fit(X, y, epochs=3)
[2] If the weights are different between samples:
You should build your model using add_loss in order to dynamically take the weights of each sample into account. Here is a dummy example:
n_sample = 200
X = np.random.uniform(0, 1, (n_sample, 10))
y = np.random.uniform(0, 1, (n_sample, 100))
W = np.random.uniform(0, 1, (n_sample, 100))

def custom_loss(true, pred, weights):
    sum_weights = tf.reduce_sum(weights)
    resid = tf.sqrt(tf.reduce_sum(weights * tf.square(true - pred)))
    return resid / sum_weights

inp = Input((10,))
true = Input((100,))
weights = Input((100,))
x = Dense(256)(inp)
pred = Dense(100)(x)
model = Model([inp, true, weights], pred)
model.add_loss(custom_loss(true, pred, weights))
model.compile('adam', loss=None)
model.fit([X, y, W], y=None, epochs=3)
When using add_loss you should pass all the tensors involved in the loss as input layers and pass them into the loss for the computation.
At inference time you can compute predictions as usual, simply dropping the true and weights inputs:
final_model = Model(model.input[0], model.output)
final_model.predict(X)
You can implement a custom weighted MSE in the following way:
import numpy as np
from tensorflow.keras import backend as K

def custom_mse(class_weights):
    def weighted_mse(gt, pred):
        # Formula:
        # (w_1*(y_1-y'_1)^2 + ... + w_100*(y_100-y'_100)^2) / sum(weights)
        return K.sum(class_weights * K.square(gt - pred)) / K.sum(class_weights)
    return weighted_mse

y_true = np.array([[0., 1., 1., 0.], [0., 0., 1., 1.]])
y_pred = np.array([[0., 1., 0., 1.], [1., 0., 1., 1.]])
weights = np.array([0.25, 0.50, 1., 0.75])

print(y_true.shape, y_pred.shape, weights.shape)
# (2, 4) (2, 4) (4,)

loss = custom_mse(class_weights=weights)
loss(y_true, y_pred).numpy()
# 0.8
Using it with model compilation:
model.compile(loss=custom_mse(weights))
This will compute MSE with the provided weight array. However, in your question you wrote sqrt(...), from which I presume you meant root MSE (RMSE). For that, use K.sqrt(K.sum(...)) / K.sum(...) inside custom_mse.
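For instance, a minimal sketch of that RMSE variant (same wrapper pattern as above; the function names are just illustrative):
from tensorflow.keras import backend as K

def custom_weighted_rmse(class_weights):
    def weighted_rmse(gt, pred):
        # sqrt of the weighted sum of squared errors, divided by the sum of weights,
        # matching Loss = 1/sum(weights) * sqrt(sum_i w_i*(y_i - y'_i)^2)
        return K.sqrt(K.sum(class_weights * K.square(gt - pred))) / K.sum(class_weights)
    return weighted_rmse

model.compile(loss=custom_weighted_rmse(weights))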
FYI, you may also be interested in the class_weight and sample_weight arguments of Model.fit. From the source:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss
function (during training only). This can be useful to tell the model
to "pay more attention" to samples from an under-represented class.
sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only).
You can either pass a flat (1D) Numpy array with the same length as
the input samples (1:1 mapping between weights and samples), or in the
case of temporal data, you can pass a 2D array with shape (samples,
sequence_length), to apply a different weight to every timestep of
every sample. This argument is not supported when x is a dataset,
generator, or keras.utils.Sequence instance, instead provides the
sample_weights as the third element of x.
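As a quick illustration of sample_weight (a self-contained sketch; all names, shapes, and values here are made up, and note that it weights whole samples rather than individual output dimensions):
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

X = np.random.uniform(0, 1, (200, 10)).astype('float32')
y = np.random.uniform(0, 1, (200, 100)).astype('float32')
sample_w = np.random.uniform(0, 1, (200,)).astype('float32')  # one weight per sample

inp = Input((10,))
out = Dense(100)(Dense(256)(inp))
model = Model(inp, out)
model.compile('adam', loss='mse')

# Each sample's loss contribution is scaled by its entry in sample_w.
model.fit(X, y, sample_weight=sample_w, epochs=2)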
And also loss_weights in Model.compile, from the source:
loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of
different model outputs. The loss value that will be minimized by the
model will then be the weighted sum of all individual losses, weighted
by the loss_weights coefficients. If a list, it is expected to have a
1:1 mapping to the model's outputs. If a dict, it is expected to map
output names (strings) to scalar coefficients.
A class version of the weighted mean squared error loss function:
class WeightedMSE(object):
    def __init__(self):
        pass

    def __call__(self, y_true, y_pred, weights):
        sum_weights = tf.reduce_sum(weights)
        resid = tf.reduce_sum(weights * tf.square(y_true - y_pred))
        return resid / sum_weights
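A quick usage sketch of that class (the example tensors are made up; note that because __call__ takes an extra weights argument, it cannot be passed directly as loss= to model.compile):
import tensorflow as tf

wmse = WeightedMSE()
y_true = tf.constant([[0., 1., 1., 0.]])
y_pred = tf.constant([[0., 1., 0., 1.]])
weights = tf.constant([0.25, 0.50, 1.00, 0.75])

# Weighted sum of squared errors divided by the weight sum.
print(wmse(y_true, y_pred, weights).numpy())  # 0.7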

How does BatchNormalization work on an example?

I am trying to understand batchnorm.
My humble example
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.,4.]])
out = layer1(x)
print(out)
Prints
tf.Tensor([[2.99850112 3.9980015 ]], shape=(1, 2), dtype=float64)
My attempt to reproduce it
e=0.001
m = np.sum(x)/2
b = np.sum((x - m)**2)/2
x_=(x-m)/np.sqrt(b+e)
print(x_)
It prints
[[-0.99800598 0.99800598]]
What am I doing wrong?
Two problems here.
First, batch norm has two "modes": training, where normalization is done via the batch statistics, and inference, where normalization is done via "population statistics" that are collected from batches during training. By default, Keras layers/models run in inference mode, and you need to specify training=True in their call to change this (there are other ways, but that is the simplest one).
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.,4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
This prints tf.Tensor([[0. 0.]], shape=(1, 2), dtype=float32). Still not right!
Second, batch norm normalizes over the batch axis, separately for each feature. However, the way you specify the input (as a 1x2 array) is basically a single input (batch size 1) with two features. Batch norm just normalizes each feature to mean 0 (standard deviation is not defined). Instead, you want two inputs with a single feature:
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.],[4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
This prints
tf.Tensor(
[[-0.99800634]
[ 0.99800587]], shape=(2, 1), dtype=float32)
Alternatively, specify the "feature axis":
layer1 = tf.keras.layers.BatchNormalization(axis=0, scale=False, center=False)
x = np.array([[3.,4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
Note that the input shape is "wrong", but we told batchnorm that axis 0 is the feature axis (it defaults to -1, the last axis). This will also give the desired result:
tf.Tensor([[-0.99800634 0.99800587]], shape=(1, 2), dtype=float32)
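For completeness, a small sketch reproducing the training-mode numbers by hand, assuming the layer's default epsilon of 1e-3; it is the same computation the question attempted, just taken over the batch axis:
import numpy as np

x = np.array([3., 4.], dtype=np.float32)  # two samples, one feature each
eps = 1e-3                                # BatchNormalization's default epsilon

mean = x.mean()                           # 3.5
var = x.var()                             # 0.25 (population variance)
x_hat = (x - mean) / np.sqrt(var + eps)

print(x_hat)                              # approximately [-0.998  0.998]
The inference-mode output in the question, [[2.9985 3.998]], comes from the freshly initialized moving statistics (mean 0, variance 1): 3 / sqrt(1 + 0.001) ≈ 2.9985 and 4 / sqrt(1 + 0.001) ≈ 3.998.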

Why the gradient of categorical crossentropy loss with respect to logits is 0 with gradient tape in TF2.0?

I am learning TensorFlow 2.0 and I am trying to figure out how gradient tapes work. I have this simple example, in which I evaluate the cross-entropy loss between logits and labels. I am wondering why the gradients with respect to logits are zero. (Please look at the code below.)
The version of TF is tensorflow-gpu==2.0.0-rc0.
logits = tf.Variable([[1, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=tf.float32)
labels = tf.constant([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=tf.float32)

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_sum(tf.losses.categorical_crossentropy(labels, logits))

grads = tape.gradient(loss, logits)
print(grads)
I am getting
tf.Tensor(
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]], shape=(3, 3), dtype=float32)
as a result, but should not it tell me how much should I change logits in order to minimize the loss?
When calculating the cross-entropy loss, set from_logits=True in tf.losses.categorical_crossentropy(). By default it is False, which means you are directly calculating the cross-entropy loss as -p*log(q). With from_logits=True, the loss is computed as -p*log(softmax(q)).
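A minimal sketch of the corrected call on the question's tensors (TF 2.x eager mode assumed):
import tensorflow as tf

logits = tf.Variable([[1, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=tf.float32)
labels = tf.constant([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=tf.float32)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(
        tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True))

# With from_logits=True the per-row gradient is softmax(logits) - labels,
# so it is no longer all zeros.
print(tape.gradient(loss, logits))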
Update:
I just found an interesting result.
logits = tf.Variable([[0.8, 0.1, 0.1]], dtype=tf.float32)
labels = tf.constant([[1, 0, 0]], dtype=tf.float32)

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_sum(tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=False))

grads = tape.gradient(loss, logits)
print(grads)
The grads will be tf.Tensor([[-0.25 1. 1. ]], shape=(1, 3), dtype=float32).
Previously, I thought TensorFlow would use loss = -Σ_i p_i*log(q_i), so the derivative with respect to q_i would be -p_i/q_i and the expected grads would be [-1.25, 0, 0]. But the output grads look like they have all been increased by 1 (which won't affect the optimization process).
For now, I'm still trying to figure out why the grads are increased by one. After reading the source code of tf.keras.losses.categorical_crossentropy, I found that even though we set from_logits=False, it still normalizes the probabilities. That changes the final gradient expression, specifically to -p_i/q_i + p_i/sum_j(q_j). If p_i = 1 and sum_j(q_j) = 1, the final gradient gets +1 added. That explains why the first gradient is -0.25; however, I haven't figured out why the last two gradients are 1.
To prove that all gradients are increased by 1/sum_j(q_j):
logits = tf.Variable([[0.5, 0.1, 0.1]], dtype=tf.float32)
labels = tf.constant([[1, 0, 0]], dtype=tf.float32)

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_sum(tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=False))

grads = tape.gradient(loss, logits)
print(grads)
The grads are tf.Tensor([[-0.57142866 1.4285713 1.4285713]]), whereas the naive -p_i/q_i formula would give [-2, 0, 0].
It shows that all gradients are increased by 1/(0.5+0.1+0.1). For p_i == 1, the gradient being increased by 1/(0.5+0.1+0.1) makes sense to me. But I don't understand why the gradient is still increased by 1/(0.5+0.1+0.1) when p_i == 0.
I finally figured it out.
The Keras categorical cross entropy calculates the gradient in the following way:
sum(target) / sum(input) - target / input
This follows because, with from_logits=False, Keras first normalizes the predictions, so the loss is -Σ_i p_i*log(q_i / Σ_j q_j); differentiating with respect to q_k gives -p_k/q_k + (Σ_i p_i)/(Σ_j q_j). Since Σ_i p_i = 1, every component, including those with p_k = 0, is shifted by 1/Σ_j q_j, which is exactly the offset observed above. You just sum the values for both targets and inputs, only where input_i is different from zero.
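A quick numerical check of that formula against GradientTape (a sketch, assuming the inputs stay away from Keras' clipping epsilon):
import tensorflow as tf

logits = tf.Variable([[0.5, 0.1, 0.1]], dtype=tf.float32)
labels = tf.constant([[1., 0., 0.]], dtype=tf.float32)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(
        tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=False))
grads = tape.gradient(loss, logits)

analytic = tf.reduce_sum(labels) / tf.reduce_sum(logits) - labels / logits
print(grads)     # [[-0.5714  1.4286  1.4286]]
print(analytic)  # same values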

Select weight of action from a tensorflow model

I have a small model used in a reinforcement learning context.
I can input a 2d tensor of states, and I get a 2d tensor of action weigths.
Let say I input two states and I get the following action weights out:
[[0.1, 0.2],
[0.3, 0.4]]
Now I have another 2d tensor which have the action number from which I want to get the weights:
[[1],
[0]]
How can I use this tensor to get the weight of actions?
In this example I'd like to get:
[[0.2],
[0.3]]
Similar to Tensorflow tf.gather with axis parameter, but the indices are handled a little differently here:
a = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])

# convert to full indices
full_indices = tf.stack([tf.range(indices.shape[0])[..., tf.newaxis], indices], axis=2)

# gather
result = tf.gather_nd(a, full_indices)

with tf.Session() as sess:
    print(sess.run(result))
    # [[0.2]
    #  [0.3]]
A simple way to do this is to squeeze the dimensions of indices, element-wise multiply with the corresponding one-hot vector, and then expand the dimensions back afterwards.
import tensorflow as tf
weights = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])
# Reduce from 2d (2, 1) to 1d (2,)
indices1d = tf.squeeze(indices)
# One-hot vector corresponding to the indices. shape (2, 2)
action_one_hot = tf.one_hot(indices=indices1d, depth=weights.shape[1])
# Element-wise multiplication and sum across axis 1 to pick the weight. Shape (2,)
action_taken_weight = tf.reduce_sum(action_one_hot * weights, axis=1)
# Expand the dimension back to have a 2d. Shape (2, 1)
action_taken_weight2d = tf.expand_dims(action_taken_weight, axis=1)
sess = tf.InteractiveSession()
print("weights\n", sess.run(weights))
print("indices\n", sess.run(indices))
print("indices1d\n", sess.run(indices1d))
print("action_one_hot\n", sess.run(action_one_hot))
print("action_taken_weight\n", sess.run(action_taken_weight))
print("action_taken_weight2d\n", sess.run(action_taken_weight2d))
Should give you the following output:
weights
[[0.1 0.2]
[0.3 0.4]]
indices
[[1]
[0]]
indices1d
[1 0]
action_one_hot
[[0. 1.]
[1. 0.]]
action_taken_weight
[0.2 0.3]
action_taken_weight2d
[[0.2]
[0.3]]
Note: You can also do action_taken_weight = tf.reshape(action_taken_weight, tf.shape(indices)) instead of expand_dims.
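If you're on TF 2.x, a sketch of the same selection with tf.gather and batch_dims, which avoids building the full index tensor (eager mode assumed):
import tensorflow as tf

weights = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])

# For each row i, pick weights[i, indices[i]]; the output keeps the (2, 1) shape.
result = tf.gather(weights, indices, axis=1, batch_dims=1)
print(result)  # [[0.2], [0.3]]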