Keras: Calculating loss as *median* across datapoints instead of mean - optimization

The Keras losses page says that if we have a custom loss function, then "the actual optimized objective is the mean of the output array across all datapoints." Is there any way we can optimize the median of the output array across all datapoints (instead of the mean)?

To do this, you need to drop down to the TensorFlow level. Note that tensorflow.contrib exists only in TensorFlow 1.x:
import keras
import tensorflow
def pick_median(arg_tensor):
    # Average the 'higher' and 'lower' 50th percentiles so that an even number
    # of values yields the midpoint of the two middle values.
    the_upper_tensor = tensorflow.contrib.distributions.percentile(arg_tensor, 50, interpolation='higher')
    the_lower_tensor = tensorflow.contrib.distributions.percentile(arg_tensor, 50, interpolation='lower')
    final_tensor = (the_upper_tensor + the_lower_tensor) / 2
    return final_tensor
Here is how you would define, say, a median_squared_error loss function:
def median_squared_error(arg_y_true,
                         arg_y_pred):
    # Squared error per element, then the median instead of the mean
    final_tensor = keras.backend.square(arg_y_pred - arg_y_true)
    final_tensor = pick_median(arg_tensor=final_tensor)
    return final_tensor
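To sanity-check what this loss evaluates to, here is a minimal NumPy sketch (not the Keras loss itself) that reproduces the same arithmetic: the average of the 'lower' and 'higher' middle elements of the squared errors. The function name median_squared_error_np and the sample arrays are made up for illustration.

```python
import numpy as np

def median_squared_error_np(y_true, y_pred):
    """NumPy reference for the median of squared errors (illustrative helper)."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    sq_err = np.sort(np.square(diff).ravel())
    n = sq_err.size
    lower = sq_err[(n - 1) // 2]  # 'lower' middle element
    upper = sq_err[n // 2]        # 'higher' middle element
    return (lower + upper) / 2.0

# Hypothetical sample data
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 5.0, 4.0])
result = median_squared_error_np(y_true, y_pred)
print(result)
```

One thing to keep in mind: a median only depends on the middle element(s), so in a gradient-based setting most datapoints in a batch are likely to receive zero gradient from such a loss.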

Related

How to avoid memory leakage in an autoregressive model within tensorflow

Recently I have been training an LSTM with an attention mechanism for regression in TensorFlow 2.9, and I ran into a problem during training with model.fit():
At the beginning the training time is okay, around 7s/step. However, it keeps increasing during training, and after about 1000 steps it can reach 50s/step. Here is part of the code for my model:
class AttentionModel(tf.keras.Model):
    def __init__(self, encoder_output_dim, dec_units, dense_dim, batch):
        super().__init__()
        self.dense_dim = dense_dim
        self.batch = batch
        encoder = Encoder(encoder_output_dim)
        decoder = Decoder(dec_units,dense_dim)
        self.encoder = encoder
        self.decoder = decoder
    def call(self, inputs):
        # Create a list to record the results
        tempt = list()
        encoder_output, encoder_state = self.encoder(inputs)
        new_features = np.zeros((self.batch, 1, 1))
        dec_initial_state = encoder_state
        for i in range(6):
            dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
            dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
            tempt.append(dec_result.logits)
            new_features = dec_result.logits
            dec_initial_state = dec_state
        result=tf.concat(tempt,1)
        return result
In the official documentation for tf.function, I noticed: "Don't rely on Python side effects like object mutation or list appends".
Since I use a dynamic Python list with append() to record the intermediate results, I guess a new tf.Graph is added each time during training. Is this the reason my training is getting slower and slower?
Additionally, what should I use instead of a Python list to avoid this? I tried a numpy.zeros matrix, but it leads to another problem:
tempt = np.zeros(shape=(1,6))
...
for i in range(6):
    dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
    dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
    tempt[i]=(dec_result.logits)
...
Cannot convert a symbolic tf.Tensor (decoder/dense_3/BiasAdd:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.

Converting a fully connected neural network with variable number of hidden layers from tensorflow to pytorch

I recently started learning PyTorch, and I am trying to convert part of a large script, including an MLP with a variable number of hidden layers, from TensorFlow to PyTorch.
import tensorflow as tf
### Base neural network
def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = {'w':[], 'b':[]}
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params['w'].append(tf.Variable(tf.random_normal([n_in, n_out], stddev=std)))
        params['b'].append(tf.Variable(tf.mul(bias_init, tf.ones([n_out,]))))
    return params
def mlp(X, params):
    h = [X]
    for w,b in zip(params['w'][:-1], params['b'][:-1]):
        h.append( tf.nn.relu( tf.matmul(h[-1], w) + b ) )
        #h.append( tf.nn.tanh( tf.matmul(h[-1], w) + b ) )
    return tf.matmul(h[-1], params['w'][-1]) + params['b'][-1]
def compute_nll(x, x_recon_linear):
    return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)
def gauss_cross_entropy(mean_post, std_post, mean_prior, std_prior):
    d = (mean_post - mean_prior)
    d = tf.mul(d,d)
    return tf.reduce_sum(-tf.div(d + tf.mul(std_post,std_post),(2.*std_prior*std_prior)) - tf.log(std_prior*2.506628), reduction_indices=1, keep_dims=True)
How could I similarly write down the weight and bias variables and attach them to each hidden layer in PyTorch?
How could I convert the gauss_cross_entropy and compute_nll functions as well (finding equivalent syntax)?
Are these two pieces of code compatible?
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as func
from torch.distributions import Normal, Categorical, Independent
from copy import
device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)
def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = {'w':[], 'b':[]}
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params['w'].append(torch.tensor(Normal([n_in, n_out], torch.tensor([std])) ,requires_grad=True))
        params['b'].append(torch.tensor(torch.mul(bias_init, torch.ones([n_out,])),requires_grad=True))
    return params
def mlp(X, params):
    h = [X]
    for w,b in zip(params['w'][:-1], params['b'][:-1]):
        h.append( torch.nn.ReLU( tf.matmul(h[-1], w) + b ) )
    return torch.matmul(h[-1], params['w'][-1]) + params['b'][-1]
def compute_nll(x, x_recon_linear):
    return torch.sum(func.binary_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)
def gauss_cross_entropy(mu_post, sigma_post, mu_prior, sigma_prior):
    d = (mu_post - mu_prior)
    d = torch.mul(d,d)
    return torch.sum(-torch.div(d + torch.mul(sigma_post,sigma_post),(2.*sigma_prior*sigma_prior)) - torch.log(sigma_prior*2.506628), reduction_indices=1, keep_dims=True)
What is the substitute function for tf.placeholder in pytorch? For instance here:
class VAE(object):
    def __init__(self, hyperParams):
        self.X = tf.placeholder("float", [None, hyperParams['input_d']])
        self.prior = hyperParams['prior']
        self.K = hyperParams['K']
        self.encoder_params = self.init_encoder(hyperParams)
        self.decoder_params = self.init_decoder(hyperParams)
and also how should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))
How could I write down similar weights and bias variables and attach them in each hidden layer in PyTorch?
An easier way to define those is to create a list containing the params as (weight, bias) pairs:
def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params.append([
            # pass std through so the argument is actually used
            nn.init.normal_(torch.empty(n_in, n_out), std=std).requires_grad_(True),
            torch.empty(n_out).fill_(bias_init).requires_grad_(True)])
    return params
Above I define my parameters as 'empty' (created with uninitialized data) tensors with torch.empty. I have used in-place functions such as nn.init.normal_ (there are many others available) and torch.Tensor.fill_ to fill the tensor with an arbitrary value (maybe it is .mul_(bias_init) you are looking for, based on your TensorFlow sample?).
For the inference code, you don't actually need to store the intermediate layer results:
def mlp(x, params):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        # apply ReLU on every layer except the last one
        if i < len(params) - 1:
            x = torch.relu(x)
    return x
How could I convert gauss_cross_entropy and compute_nll functions as well (finding equivalent syntax)?
You can use PyTorch functions and mathematical operators to define your logic. For compute_nll you were already using the built-in binary_cross_entropy_with_logits, which does not require a summation after it: by default, the losses are averaged over all elements. Here F is torch.nn.functional (imported as func in your code):
def compute_loss(y_pred, y_true):
    return F.binary_cross_entropy_with_logits(y_pred, y_true)
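If you would rather keep the per-row sum of the TensorFlow version (reduce_sum over axis 1 with keep_dims=True) instead of the default averaging, a minimal sketch could look like the following. It assumes the inputs are 2-D float tensors of shape (batch, features) and that the logits are passed first, as in your TensorFlow call; the sample tensors are made up.

```python
import math
import torch
import torch.nn.functional as F

def compute_nll(x, x_recon_linear):
    # Per-element BCE on logits, then sum over the feature axis (dim=1),
    # keeping that axis so the result has shape (batch, 1).
    per_element = F.binary_cross_entropy_with_logits(x_recon_linear, x, reduction="none")
    return per_element.sum(dim=1, keepdim=True)

def gauss_cross_entropy(mean_post, std_post, mean_prior, std_prior):
    # Same expression as the TensorFlow code; 2.506628 there is sqrt(2 * pi).
    d = (mean_post - mean_prior) ** 2
    term = -(d + std_post ** 2) / (2.0 * std_prior ** 2) \
           - torch.log(std_prior * math.sqrt(2.0 * math.pi))
    return term.sum(dim=1, keepdim=True)

# Hypothetical inputs: zero logits give log(2) per element
x = torch.tensor([[0.0, 1.0], [1.0, 1.0]])
logits = torch.zeros(2, 2)
nll = compute_nll(x, logits)

zeros, ones = torch.zeros(2, 3), torch.ones(2, 3)
gce = gauss_cross_entropy(zeros, ones, zeros, ones)
print(nll.shape, gce.shape)
```

The keyword names changed between the two libraries: reduction_indices and keep_dims in the old TensorFlow API correspond to dim and keepdim in torch.sum.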
What is the substitute function for tf.placeholder in Pytorch?
There are no placeholders in PyTorch. You compute your outputs explicitly using PyTorch operators, and you can then backpropagate through those operators to get the gradients for each parameter.
How should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))
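tf.shape returns the shape of the tensor. In PyTorch you read the torch.Tensor.shape attribute or call torch.Tensor.size(), i.e. self.sigma[-1].shape or self.sigma[-1].size(). For this particular line, torch.randn_like(self.sigma[-1]) draws standard-normal samples with the same shape directly.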

how to calculate entropy on float numbers over a tensor in python keras

I have been struggling with this and could not get it to work. I hope someone can help me.
I want to calculate the entropy of each row of the tensor. Because my data are float numbers, not integers, I think I need to use bin_histogram.
For example, a sample of my data is tensor = [[0.2, -0.1, 1],[2.09,-1.4,0.9]]
Just for information, my model is seq2seq and written in Keras with the TensorFlow backend.
This is my code so far. I need to correct rev_entropy:
class entropy_measure(Layer):
    def __init__(self, beta,batch, **kwargs):
        self.beta = beta
        self.batch = batch
        self.uses_learning_phase = True
        self.supports_masking = True
        super(entropy_measure, self).__init__(**kwargs)
    def call(self, x):
        return K.in_train_phase(self.rev_entropy(x, self.beta,self.batch), x)
    def get_config(self):
        config = {'beta': self.beta}
        base_config = super(entropy_measure, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
    def rev_entropy(self, x, beta,batch):
        for i in x:
            i = pd.Series(i)
            p_data = i.value_counts() # counts occurrence of each value
            entropy = entropy(p_data) # get entropy from counts
            rev = 1/(1+entropy)
            return rev
        new_f_w_t = x * (rev.reshape(rev.shape[0], 1))*beta
        return new_f_w_t
Any input is much appreciated:)
It looks like you have a series of questions that come together on this issue. I'll settle it here.
In your code you compute the entropy with scipy.stats.entropy, which is documented as follows:
scipy.stats.entropy(pk, qk=None, base=None)
Calculate the entropy of a distribution for given probability values.
If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0).
Tensorflow does not provide a direct API to calculate entropy on each row of the tensor. What we need to do is to implement the above formula.
import tensorflow as tf
import pandas as pd
from scipy.stats import entropy
a = [1.1,2.2,3.3,4.4,2.2,3.3]
res = entropy(pd.value_counts(a))
_, _, count = tf.unique_with_counts(tf.constant(a))
# [1 2 2 1]
prob = count / tf.reduce_sum(count)
# [0.16666667 0.33333333 0.33333333 0.16666667]
tf_res = -tf.reduce_sum(prob * tf.log(prob))
with tf.Session() as sess:
    print('scipy version: \n',res)
    print('tensorflow version: \n',sess.run(tf_res))
scipy version:
1.329661348854758
tensorflow version:
1.3296613488547582
Then we define a per-row function and replace the Python for loop with tf.map_fn in your custom layer, based on the code above.
def rev_entropy(self, x, beta,batch):
    def row_entropy(row):
        _, _, count = tf.unique_with_counts(row)
        # cast so the division yields float32, matching dtype in tf.map_fn
        count = tf.cast(count, tf.float32)
        prob = count / tf.reduce_sum(count)
        return -tf.reduce_sum(prob * tf.log(prob))
    value_ranges = [-10.0, 100.0]
    nbins = 50
    new_f_w_t = tf.histogram_fixed_width_bins(x, value_ranges, nbins)
    rev = tf.map_fn(row_entropy, new_f_w_t,dtype=tf.float32)
    # add a trailing axis so the per-row factor broadcasts across the columns of a 2-D x
    new_f_w_t = x * 1/(1+tf.expand_dims(rev, -1))*beta
    return new_f_w_t
Note that no gradient will propagate backwards through the entropy term in this layer, since the entropy is calculated from counts (binning and counting unique values are not differentiable). Maybe you need to rethink your hidden layer structure.
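To check the binned per-row entropy outside of a TensorFlow session, here is a minimal NumPy sketch of the same idea. The helper name row_entropy_np and the sample data are made up; the range and bin count are the ones used above. It clips values into the range first, which as far as I understand mirrors how tf.histogram_fixed_width_bins maps out-of-range values into the first and last bins.

```python
import numpy as np

def row_entropy_np(row, value_range=(-10.0, 100.0), nbins=50):
    """Entropy of one row after fixed-width binning (illustrative helper)."""
    lo, hi = value_range
    clipped = np.clip(row, lo, hi)  # out-of-range values end up in the edge bins
    counts, _ = np.histogram(clipped, bins=nbins, range=(lo, hi))
    counts = counts[counts > 0]     # empty bins contribute nothing to the sum
    prob = counts / counts.sum()
    return float(-np.sum(prob * np.log(prob)))

# Hypothetical data: the first row falls into two bins (three values each),
# the second row falls into a single bin.
data = np.array([[1.1, 2.2, 3.3, 4.4, 2.2, 3.3],
                 [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]])
entropies = np.array([row_entropy_np(r) for r in data])
rev = 1.0 / (1.0 + entropies)
print(entropies)
print(rev)
```

With bins this wide (2.2 units each), nearby floats collapse into the same bin, so the bin count and range strongly affect the resulting entropy.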

Custom Keras metric, changing

I am currently trying to create my own loss function for Keras (using Tensorflow backend). This is a simple categorical crossentropy but I am applying a factor on the 1st column to penalize more loss from the 1st class.
Yet I am new to Keras and I can't figure out how to translate my function (below) as I have to use symbolic expressions and it seems I can't go element-wise:
def custom_categorical_crossentropy(y_true, y_pred):
    y_pred = np.clip(y_pred, _EPSILON, 1.0-_EPSILON)
    out = np.zeros(y_true.shape).astype('float32')
    for i in range(0,y_true.shape[0]):
        for j in range (0,y_true.shape[1]):
            #penalize more all elements on class 1 so that loss takes its low proportion in the dataset into account
            if(j==0):
                out[i][j] = -(prop_database*(y_true[i][j] * np.log(y_pred[i][j]) + (1.0 - y_true[i][j]) * np.log(1.0 - y_pred[i][j])))
            else:
                out[i][j] = -(y_true[i][j] * np.log(y_pred[i][j]) + (1.0 - y_true[i][j]) * np.log(1.0 - y_pred[i][j]))
    out = np.mean(out.astype('float32'), axis=-1)
    return tf.convert_to_tensor(out,
                                dtype=tf.float32,
                                name='custom_loss')
Can someone help me?
Many thanks!
You can use class_weight in the fit method to penalize classes without creating functions:
weights = {
    0:2,
    1:1,
    2:1,
    3:1,
    ...
}
model.compile(optimizer=chooseOne, loss='categorical_crossentropy')
model.fit(......., class_weight = weights)
This will make the first class be twice as important as the others.

Output error rate per label / confusion matrix

I trained an image classifier using Keras up to around 98% test accuracy. Now I know that the overall accuracy is 98%, but I want to know the accuracy/error per distinct class/label.
Does Keras have a built-in function for that, or would I have to test this myself per class/label?
Update: Thanks @gionni. I didn't know the actual term was "Confusion Matrix". But that's what I am actually looking for. That being said, is there a function to generate one? I have to use Keras 1.2.2 by the way.
I had a similar issue, so I can share my code with you. The following function computes a single-class accuracy:
def single_class_accuracy(interesting_class_id):
    def fn(y_true, y_pred):
        class_id_preds = K.argmax(y_pred, axis=-1)
        # Replace class_id_preds with class_id_true for recall here
        positive_mask = K.cast(K.equal(class_id_preds, interesting_class_id), 'int32')
        true_mask = K.cast(K.equal(y_true, interesting_class_id), 'int32')
        acc_mask = K.cast(K.equal(positive_mask, true_mask), 'float32')
        class_acc = K.mean(acc_mask)
        return class_acc
    return fn
Now, if you want the accuracy for class 0, you can add it to the metrics while compiling the model:
model.compile(..., metrics=[..., single_class_accuracy(0)])
If you want the accuracy for all classes, you can write:
model.compile(...,
              metrics=[...] + [single_class_accuracy(i) for i in range(nb_of_classes)])
There may be better options, but you can use this:
import numpy as np
#gather each true label
distinct, counts = np.unique(trueLabels,axis=0,return_counts=True)
for dist,count in zip(distinct, counts):
    selector = (trueLabels == dist).all(axis=-1)
    selectedX = testData[selector]
    selectedY = trueLabels[selector]
    print('\n\nEvaluating for ' + str(count) + ' occurrences of class ' + str(dist))
    print(model.evaluate(selectedX,selectedY,verbose=0))
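Since the update asks for the confusion matrix itself, here is a minimal NumPy sketch that builds one from integer class ids and derives the per-class accuracy from it. It does not depend on the Keras version. The helper name confusion_matrix_np and the label arrays are made up; with a real model the ids would come from something like np.argmax(model.predict(testData), axis=-1) and np.argmax(trueLabels, axis=-1), assuming one-hot labels.

```python
import numpy as np

def confusion_matrix_np(true_ids, pred_ids, n_classes):
    """Rows are true classes, columns are predicted classes (illustrative helper)."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (true_ids, pred_ids), 1)  # count each (true, predicted) pair
    return cm

# Hypothetical integer labels standing in for real predictions
true_ids = np.array([0, 0, 1, 1, 2, 2, 2])
pred_ids = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix_np(true_ids, pred_ids, n_classes=3)
# Fraction of each true class that was predicted correctly;
# np.maximum guards against division by zero for absent classes.
per_class_acc = cm.diagonal() / np.maximum(cm.sum(axis=1), 1)
print(cm)
print(per_class_acc)
```

The off-diagonal entries show which classes get confused with which, something the per-class accuracy alone does not reveal.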