How are the gradient and hessian of logarithmic loss computed in the custom objective function example script in xgboost's GitHub repository? - numpy

I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.
I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the script.
Here is a simplified example:
import numpy as np

def loglikelihoodloss(y_hat, y_true):
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    grad = prob - y_true
    hess = prob * (1.0 - prob)
    return grad, hess

y_hat = np.array([1.80087972, -1.82414818, -1.82414818, 1.80087972, -2.08465433,
                  -1.82414818, -1.82414818, 1.80087972, -1.82414818, -1.82414818])
y_true = np.array([1., 0., 0., 1., 0., 0., 0., 1., 0., 0.])

loglikelihoodloss(y_hat, y_true)
The log loss function is the sum of $-y_i \log(p_i) - (1 - y_i)\log(1 - p_i)$ where $p_i = \frac{1}{1 + e^{-\hat{y}_i}}$.
The gradient (with respect to $p$) is then $\frac{p - y}{p(1 - p)}$, however in the code it's $p - y$.
Likewise the second derivative (with respect to $p$) is $\frac{y}{p^2} + \frac{1 - y}{(1 - p)^2}$, however in the code it is $p(1 - p)$.
How are the equations equal?

The log loss function is given as:

$\ell = \sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$

where

$p_i = \frac{1}{1 + e^{-\hat{y}_i}}$

The derivatives in the script are taken with respect to the raw score $\hat{y}$, not with respect to $p$. By the chain rule, $\frac{\partial \ell}{\partial \hat{y}} = \frac{\partial \ell}{\partial p} \cdot \frac{\partial p}{\partial \hat{y}}$ with $\frac{\partial p}{\partial \hat{y}} = p(1 - p)$, which cancels the denominator of your expression. Taking the partial derivative we get the gradient as

$\frac{\partial \ell}{\partial \hat{y}_i} = y_i - p_i$

Thus we get the negative of the gradient as $p - y$.
Similar calculations (differentiating once more with respect to $\hat{y}$) can be done to obtain the hessian, $p(1 - p)$.
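As a quick sanity check (a minimal sketch of my own, not part of the xgboost script), the analytic gradient and hessian can be compared against finite differences of the negative log likelihood taken with respect to y_hat:

import numpy as np

def logloss(y_hat, y_true):
    # negative log likelihood with p = sigmoid(y_hat)
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    return -(y_true * np.log(prob) + (1.0 - y_true) * np.log(1.0 - prob))

y_hat = np.array([1.80087972, -1.82414818])
y_true = np.array([1., 0.])
eps = 1e-5

# central differences with respect to y_hat (not with respect to prob)
num_grad = (logloss(y_hat + eps, y_true) - logloss(y_hat - eps, y_true)) / (2 * eps)
num_hess = (logloss(y_hat + eps, y_true) - 2 * logloss(y_hat, y_true)
            + logloss(y_hat - eps, y_true)) / eps**2

prob = 1.0 / (1.0 + np.exp(-y_hat))
print(num_grad, prob - y_true)        # approximately equal: grad = prob - y_true
print(num_hess, prob * (1.0 - prob))  # approximately equal: hess = prob * (1 - prob)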

Related

Jacobian of a vector in Tensorflow

I think this question has never been properly answered (see How to calculate the Jacobian of a vector function with tensorflow or Computing Jacobian in TensorFlow 2.0), so I will try again:
I want to compute the jacobian of the vector valued function z = [x**2 + 2*y, y**2], that is, I want to obtain the matrix of the partial derivatives
[[2x, 0],
[2, 2y]]
(since this is automatic differentiation, the matrix will be evaluated at a specific point).
with tf.GradientTape() as g:
    x = tf.Variable(1.0)
    y = tf.Variable(4.0)
    z = tf.convert_to_tensor([x**2 + 2*y, y**2])

jacobian = g.jacobian(z, [x, y])
print(jacobian)
Obtaining
[<tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 0.], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 8.], dtype=float32)>]
What I naturally want to obtain is the tensor
[[2., 0.],
[2., 8.]]
not that intermediate result. Can it be done?
Try something like this:
import numpy as np
import tensorflow as tf

with tf.GradientTape() as g:
    x = tf.Variable(1.0)
    y = tf.Variable(4.0)
    z = tf.convert_to_tensor([x**2 + 2*y, y**2])

jacobian = g.jacobian(z, [x, y])
print(np.array([jacob.numpy() for jacob in jacobian]))
Result
[[2. 0.]
[2. 8.]]
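If you would rather stay in TensorFlow instead of round-tripping through numpy, stacking the per-variable gradients should work as well (a small sketch continuing from the snippet above):

jacobian_tensor = tf.stack(jacobian)  # shape (2, 2): row 0 is dz/dx, row 1 is dz/dy
print(jacobian_tensor)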

tf.keras.losses.CategoricalCrossentropy gives different values than plain implementation

Does anyone know why a raw implementation of the categorical crossentropy function is so different from tf.keras's API function?
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()

y_true = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
y_pred = np.array([[.9, .05, .05], [.5, .89, .6], [.05, .01, .94]])

ce = tf.keras.losses.CategoricalCrossentropy()
res = ce(y_true, y_pred).numpy()
print("use api:")
print(res)
print()

print("implementation:")
step1 = -y_true * np.log(y_pred)
step2 = np.sum(step1, axis=1)
print("step1.shape:", step1.shape)
print(step1)
print("sum step1:", np.sum(step1))
print("mean step1", np.mean(step1))
print()

print("step2.shape:", step2.shape)
print(step2)
print("sum step2:", np.sum(step2))
print("mean step2", np.mean(step2))
Above gives:
use api:
0.3239681124687195
implementation:
step1.shape: (3, 3)
[[0.10536052 0. 0. ]
[0. 0.11653382 0. ]
[0. 0. 0.0618754 ]]
sum step1: 0.2837697356318653
mean step1 0.031529970625762814
step2.shape: (3,)
[0.10536052 0.11653382 0.0618754 ]
sum step2: 0.2837697356318653
mean step2 0.09458991187728844
Now, with another y_true and y_pred:
y_true = np.array([[0, 1]])
y_pred = np.array([[0.99999999999, 0.00000000001]])
It gives:
use api:
16.11809539794922
implementation:
step1.shape: (1, 2)
[[-0. 25.32843602]]
sum step1: 25.328436022934504
mean step1 12.664218011467252
step2.shape: (1,)
[25.32843602]
sum step2: 25.328436022934504
mean step2 25.328436022934504
The difference is because of these values: [.5, .89, .6], since their sum is not equal to 1. I think you have made a mistake and you meant this instead: [.05, .89, .06].
If you provide values whose sum is equal to 1, then both formulas give the same result:
import tensorflow as tf
import numpy as np

y_true = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
y_pred = np.array([[.9, .05, .05], [.05, .89, .06], [.05, .01, .94]])
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
print(np.sum(-y_true * np.log(y_pred), axis=1))
# output
# [0.10536052 0.11653382 0.0618754 ]
# [0.10536052 0.11653382 0.0618754 ]
However, let's explore how the loss is calculated if the y_pred tensor is not scaled (i.e., the sum of its values is not equal to 1). If you look at the source code of categorical crossentropy here, you will see that it scales y_pred so that the class probabilities of each sample sum to 1:
if not from_logits:
    # scale preds so that the class probas of each sample sum to 1
    output /= tf.reduce_sum(output,
                            reduction_indices=len(output.get_shape()) - 1,
                            keep_dims=True)
Since we passed predictions whose probabilities do not sum to 1, let's see how this operation changes our tensor [.5, .89, .6]:
output = tf.constant([.5, .89, .6])
output /= tf.reduce_sum(output,
                        axis=len(output.get_shape()) - 1,
                        keepdims=True)
print(output.numpy())
# array([0.2512563 , 0.44723618, 0.30150756], dtype=float32)
So, the two should be equal if we take the output of the operation above (the scaled y_pred) and pass it to your own implementation of categorical crossentropy, while passing the unscaled y_pred to the TensorFlow implementation:
y_true = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
# unscaled y_pred
y_pred = np.array([[.9, .05, .05], [.5, .89, .6], [.05, .01, .94]])
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
# scaled y_pred (categorical_crossentropy scales the tensor above to this internally)
y_pred = np.array([[.9, .05, .05], [0.2512563, 0.44723618, 0.30150756], [.05, .01, .94]])
print(np.sum(-y_true * np.log(y_pred), axis=1))
Output:
[0.10536052 0.80466845 0.0618754 ]
[0.10536052 0.80466846 0.0618754 ]
Now, let's explore your second example. Why does it show a different output?
If you check the source code again, you will see this line:

output = tf.clip_by_value(output, epsilon, 1. - epsilon)

which clips values to the range [epsilon, 1 - epsilon]. Your input [0.99999999999, 0.00000000001] is converted to [0.9999999, 0.0000001] by this line, so it gives you a different result:
y_true = np.array([[0, 1]])
y_pred = np.array([[0.99999999999, 0.00000000001]])
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
print(np.sum(-y_true * np.log(y_pred), axis=1))
#now let's first clip the values less than epsilon, then compare loss
epsilon=1e-7
y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
print(np.sum(-y_true * np.log(y_pred), axis=1))
Output:
#results without clipping values
[16.11809565]
[25.32843602]
#results after clipping values if there is a value less than epsilon (1e-7)
[16.11809565]
[16.11809565]
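Putting the two steps together, here is a minimal numpy sketch (my own summary of the behaviour above, not the actual keras source) that reproduces the tf.keras result by scaling and then clipping before taking the log:

import numpy as np

def categorical_crossentropy_np(y_true, y_pred, epsilon=1e-7):
    # 1. scale preds so that the class probas of each sample sum to 1
    y_pred = y_pred / np.sum(y_pred, axis=-1, keepdims=True)
    # 2. clip to [epsilon, 1 - epsilon] to avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1.0 - epsilon)
    return np.sum(-y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0., 1.]])
y_pred = np.array([[0.99999999999, 0.00000000001]])
print(categorical_crossentropy_np(y_true, y_pred))  # ~[16.118096], matching the api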

How to specify spearman rank correlation as a loss function in keras?

I wanted to write a loss function that maximizes the Spearman rank correlation between two vectors in Keras. Unfortunately I could not find an existing implementation, nor a good method to calculate the rank of a vector in Keras so that I could use the formula $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ to implement it myself.
def rank_correlation(y_true, y_pred):
    pass

model = tensorflow.keras.Sequential()
#### More model code
model.compile(loss=rank_correlation)
Can anyone please help me implement rank_correlation?
You can try something like the following:
import numpy as np
from scipy.stats import spearmanr

def compute_spearmanr(y, y_pred):
    spearsum = 0
    cnt = 0
    for col in range(y_pred.shape[1]):
        v = spearmanr(y_pred[:, col], y[:, col]).correlation
        if np.isnan(v):
            continue
        spearsum += v
        cnt += 1
    res = spearsum / cnt
    return res

a = np.array([[2., 1., 2., 3.], [3., 3., 4., 5.]])
b = np.array([[1., 0., 0., 3.], [1., 0., 3., 3.]])
compute_spearmanr(a, b)
0.9999999999999999
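Note that spearmanr runs outside the TensorFlow graph, so to plug this into Keras you would have to wrap it, for example with tf.py_function (a sketch under that assumption; since no gradient flows through scipy, this is realistically only usable as a metric rather than as a loss to minimize):

import tensorflow as tf

def rank_correlation(y_true, y_pred):
    # wrap the scipy-based computation so Keras can call it on tensors
    return tf.py_function(compute_spearmanr, inp=[y_true, y_pred], Tout=tf.float32)

# e.g. model.compile(loss='mse', metrics=[rank_correlation])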

T5 Encoder model output all zeros?

I am trying out a project where I use the T5EncoderModel from HuggingFace in order to obtain hidden representations of my input sentences. I have 100K sentences which I tokenize and pad as follows:
for sentence in dataset[original]:
    sentence = tokenizer(sentence, max_length=40, padding='max_length',
                         return_tensors='tf', truncation=True)
    original_sentences.append(sentence.input_ids)
    org_mask.append(sentence.attention_mask)
This gives me the right outputs and tokenizes everything decently. The problem arises when I try to actually train the model. The setup is a bit complex and is taken from https://keras.io/examples/vision/semantic_image_clustering/, which I am trying to apply to text.
The set-up for training is as follows:
def create_encoder(rep_dim):
    encoder = TFT5EncoderModel.from_pretrained('t5-small', output_hidden_states=True)
    encoder.trainable = True
    original_input = Input(shape=(max_length), name='originalIn', dtype=tf.int32)
    augmented_input = Input(shape=(max_length), name='originalIn', dtype=tf.int32)
    concat = keras.layers.Concatenate(axis=1)([original_input, augmented_input])
    # Take 0-index because it returns a TFBERTmodel type, and 0 returns a tensor
    encoded = encoder(input_ids=concat)[0]
    # This outputs shape: [sentences, max_length, encoded_dims]
    output = Dense(rep_dim, activation='relu')(encoded)
    return encoder
This function is fed into the RepresentationLearner class from the above link, as such:
class RepresentationLearner(keras.Model):
    def __init__(
        self,
        encoder,
        projection_units,
        temperature=0.8,
        dropout_rate=0.1,
        l2_normalize=False,
        **kwargs
    ):
        super(RepresentationLearner, self).__init__(**kwargs)
        self.encoder = encoder
        # Create projection head.
        self.projector = keras.Sequential(
            [
                layers.Dropout(dropout_rate),
                layers.Dense(units=projection_units, use_bias=False),
                layers.BatchNormalization(),
                layers.ReLU(),
            ]
        )
        self.temperature = temperature
        self.l2_normalize = l2_normalize
        self.loss_tracker = keras.metrics.Mean(name="loss")

    @property
    def metrics(self):
        return [self.loss_tracker]

    def compute_contrastive_loss(self, feature_vectors, batch_size):
        num_augmentations = tf.shape(feature_vectors)[0] // batch_size
        if self.l2_normalize:
            feature_vectors = tf.math.l2_normalize(feature_vectors, -1)
        # The logits shape is [num_augmentations * batch_size, num_augmentations * batch_size].
        logits = (
            tf.linalg.matmul(feature_vectors, feature_vectors, transpose_b=True)
            / self.temperature
        )
        # Apply log-max trick for numerical stability.
        logits_max = tf.math.reduce_max(logits, axis=1)
        logits = logits - logits_max
        # The shape of targets is [num_augmentations * batch_size, num_augmentations * batch_size].
        # targets is a matrix consisting of num_augmentations submatrices of shape [batch_size * batch_size].
        # Each [batch_size * batch_size] submatrix is an identity matrix (diagonal entries are ones).
        targets = tf.tile(tf.eye(batch_size), [num_augmentations, num_augmentations])
        # Compute cross entropy loss
        return keras.losses.categorical_crossentropy(
            y_true=targets, y_pred=logits, from_logits=True
        )

    def call(self, inputs):
        features = self.encoder(inputs[0])[0]
        # Apply projection head.
        return self.projector(features[0])

    def train_step(self, inputs):
        batch_size = tf.shape(inputs)[0]
        # Run the forward pass and compute the contrastive loss
        with tf.GradientTape() as tape:
            feature_vectors = self(inputs, training=True)
            loss = self.compute_contrastive_loss(feature_vectors, batch_size)
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update loss tracker metric
        self.loss_tracker.update_state(loss)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, inputs):
        batch_size = tf.shape(inputs)[0]
        feature_vectors = self(inputs, training=False)
        loss = self.compute_contrastive_loss(feature_vectors, batch_size)
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}
In order to train it, I use the Colab TPU and train it as such:
with strategy.scope():
encoder = create_encoder(rep_dim)
training_model = RepresentationLearner(encoder=encoder, projection_units=128, temperature=0.1)
lr_scheduler = keras.experimental.CosineDecay(initial_learning_rate=0.001, decay_steps=500, alpha=0.1)
training_model.compile(optimizer=tfa.optimizers.AdamW(learning_rate=lr_scheduler, weight_decay=0.0001))
history = training_model.fit(x = [original_train, augmented_train], batch_size=32*8, epocs = 10)
training_model.save_weights('representation_learner.h5', overwrite=True)
Note that I am giving my model two inputs. When I predict on my input data, I get all zeros, and I cannot seem to understand why. I predict as follows:
training_model.load_weights('representation_learner.h5')
feature_vectors = training_model.predict([[original_train, augmented_train]], verbose=1)
And the output is:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
With a far too large shape of (1000000, 128).

To create a custom metric for regression according to the rule book of Keras' documentation

I found two main sources about it:
A tutorial, not done according to the rule book (which I would rather avoid)
Keras' documentation (which I prefer, to avoid surprises)
I prefer to follow Keras' documentation to avoid the memory leaks that some people run into when they try custom approaches with Keras.
But what Keras shows in the documentation is about classification, which is not my case.
So I tried to look at the source code of Keras, precisely in the file /lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py. It does not help me at all, because most metrics (the exceptions being some classification metrics) are implemented with a wrapper, as in the following code:
@keras_export('keras.metrics.MeanSquaredError')
class MeanSquaredError(MeanMetricWrapper):
  """Computes the mean squared error between `y_true` and `y_pred`.

  For example, if `y_true` is [0., 0., 1., 1.], and `y_pred` is [1., 1., 1., 0.],
  the mean squared error is 3/4 (0.75).

  Usage:

  ```python
  m = tf.keras.metrics.MeanSquaredError()
  m.update_state([0., 0., 1., 1.], [1., 1., 1., 0.])
  print('Final result: ', m.result().numpy())  # Final result: 0.75
  ```

  Usage with the tf.keras API:

  ```python
  model = tf.keras.Model(inputs, outputs)
  model.compile('sgd', metrics=[tf.keras.metrics.MeanSquaredError()])
  ```
  """

  def __init__(self, name='mean_squared_error', dtype=None):
    super(MeanSquaredError, self).__init__(
        mean_squared_error, name, dtype=dtype)
As you can see, there is only the constructor method, so no good inspiration is available for the update_state method that I need.
Where can I find it?
python 3.7.7
tensorflow 2.1.0
keras-applications 1.0.8
keras-preprocessing 1.1.0
You can use a loss function as a metric, so you can extend keras.losses.Loss instead. You only need to override call, as shown in the documentation:
import tensorflow as tf

class MeanSquaredError(tf.keras.losses.Loss):
    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        return tf.math.reduce_mean(tf.math.square(y_pred - y_true), axis=-1)
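A quick usage sketch for the subclass above, checking it against the example values from the docstring quoted earlier (the instance can then be passed to model.compile, with the model assumed to be defined elsewhere):

m = MeanSquaredError()
y_true = tf.constant([[0., 0., 1., 1.]])
y_pred = tf.constant([[1., 1., 1., 0.]])
print(m(y_true, y_pred).numpy())  # 0.75, as in the docstring example

# model.compile('sgd', loss='mse', metrics=[MeanSquaredError()])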