I'm trying to define a loss function on tensors of shape (NUM_OF_STROKES, STROKE_LEN, 2).
For example, with NUM_OF_STROKES=1 and STROKE_LEN=4 a tensor could look like:
[[[x1,y1], [x2,y2], [x3,y3], [x4,y4]]]
I want my loss to be the sum of the Euclidean distances between corresponding points. For example:
p1 = [[[a1,b1], [a2,b2], [a3,b3], [a4,b4]]]
p2 = [[[c1,d1], [c2,d2], [c3,d3], [c4,d4]]]
loss = sqrt((a1-c1)^2 + (b1-d1)^2) + ... + sqrt((a4-c4)^2 + (b4-d4)^2)
In NumPy I can do:
np.sum(np.linalg.norm(np.array(p1) - np.array(p2), axis=-1))
But I don't know how to do that in TensorFlow.
I'm working with TensorFlow 2 and Keras.
I think what you are looking for is:
tf.keras.backend.sum(tf.sqrt(tf.keras.backend.sum(tf.square(labels - predictions), axis=3)))
The inner sum uses axis=3 because, with the batch dimension in front, labels and predictions have shape (batch, NUM_OF_STROKES, STROKE_LEN, 2), so the coordinate pairs sit on the last axis (you could also write axis=-1).
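If you prefer the plain tf.* ops over the backend aliases, an equivalent sketch is below (the function name is just illustrative); tf.norm(..., axis=-1) plays the same role as np.linalg.norm(..., axis=-1), so it works with or without the extra batch dimension:

import tensorflow as tf

def stroke_distance_loss(y_true, y_pred):
    # per-point Euclidean distance over the last (x, y) axis,
    # then summed over strokes, points and the batch
    return tf.reduce_sum(tf.norm(y_true - y_pred, axis=-1))

# quick check
p1 = tf.constant([[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]])
p2 = tf.constant([[[1.0, 0.0], [1.0, 2.0], [2.0, 2.0], [0.0, 3.0]]])
print(stroke_distance_loss(p1, p2).numpy())  # 1 + 1 + 0 + 3 = 5.0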
I am trying to move a model from TF1 to PyTorch.
The model is quite involved and I have been unable to get a portion of it to work. In particular, I have found that a function appears to return a result in PyTorch that is around 10% off the result of the equivalent function in TensorFlow or NumPy.
I believe that this 10% difference is an error that impacts my loss function and prevents the model from learning.
I have isolated the function here and show both the Torch and NumPy 'equivalents'. Attached is a link to the Torch model and the comparison data needed. Below are two code segments. I believe the NumPy result is the better one because it both agrees with the TensorFlow v1 result to an accuracy of 10e-05 and, in the model I'm dealing with, this function trains successfully when the Torch equivalent does not.
My question is: why does the NumPy function return better results than the Torch function, and is there a way of arranging the Torch function so that its accuracy is closer to the NumPy function?
Regards,
Simon
The data needed to run this review is saved here:
https://drive.google.com/file/d/1lClIUWuHDGtibSXN2h5X-cyMaalU-cbX/view?usp=sharing
The full torch model is saved in a pickle for use with torch.load:
https://drive.google.com/file/d/1bFJYC5bHme7YmIbqTOjaxXvd-yrKczxH/view?usp=sharing
The data load and two functions:
import pickle
from typing import Any, Dict

import numpy as np
import torch

with open('recovered_autoencoder_network.pkl', 'rb') as f:
    recovered_autoencoder_network = pickle.load(f)

# parameters needed for this issue
params: Dict[str, Any] = {'weight_precision': torch.float64,
                          'sindy_precision': torch.float64,
                          'target_device': 'cuda'}

sindy_autoencoder = torch.load('saved_model.pkl')
sindy_autoencoder.to(params['target_device'])
# this is a version of the 'problem' function in torch.
def calculate_first_and_second_derivative_with_torch(input_and_derivatives, stack):
    x, dx, ddx = input_and_derivatives
    layer_count = len(stack)
    for i in range(layer_count - 1):
        x = torch.mm(x, stack[i].weights) + stack[i].bias
        x = torch.sigmoid(x)
        dx_prev = torch.mm(dx, stack[i].weights)
        sigmoid_first_derivative = torch.mul(x, 1 - x)
        sigmoid_second_derivative = torch.mul(sigmoid_first_derivative, 1 - 2 * x)
        dx = torch.mul(sigmoid_first_derivative, dx_prev)
        ddx = torch.mul(sigmoid_second_derivative, torch.square(dx_prev)) \
            + torch.mul(sigmoid_first_derivative, torch.mm(ddx, stack[i].weights))
    dx = torch.mm(dx, stack[layer_count - 1].weights)
    ddx = torch.mm(ddx, stack[layer_count - 1].weights)
    return dx, ddx
# this is the equivalent 'problem' function in numpy.
def calculate_first_and_second_derivative_with_np(input, dx, ddx, weights, biases):
    dz = dx
    ddz = ddx

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    for i in range(len(weights) - 1):
        input = np.matmul(input, weights[i]) + biases[i]
        input = sigmoid(input)
        dz_prev = np.matmul(dz, weights[i])
        sigmoid_derivative = np.multiply(input, 1 - input)
        sigmoid_derivative2 = np.multiply(sigmoid_derivative, 1 - 2 * input)
        dz = np.multiply(sigmoid_derivative, dz_prev)
        ddz = np.multiply(sigmoid_derivative2, np.square(dz_prev)) \
            + np.multiply(sigmoid_derivative, np.matmul(ddz, weights[i]))
    dz = np.matmul(dz, weights[-1])
    ddz = np.matmul(ddz, weights[-1])
    return dz, ddz
dx_decode_np_test, ddx_decode_np_test = \
    calculate_first_and_second_derivative_with_np(
        recovered_autoencoder_network['v2_in_z'],
        recovered_autoencoder_network['v2_in_dz'],
        recovered_autoencoder_network['v2_in_sindy_predict'],
        recovered_autoencoder_network['v2_in_decoder_weights'],
        recovered_autoencoder_network['v2_in_decoder_biases'])
# Here I access the tensors recovered from the saved Tensorflow model and convert them to torch.
converted_stack = [torch.tensor(recovered_autoencoder_network['v2_in_z'],
                                device=torch.device(params['target_device']),
                                dtype=params['sindy_precision']),
                   torch.tensor(recovered_autoencoder_network['v2_in_dz'],
                                device=torch.device(params['target_device']),
                                dtype=params['sindy_precision']),
                   torch.tensor(recovered_autoencoder_network['v2_in_sindy_predict'],
                                device=torch.device(params['target_device']),
                                dtype=params['sindy_precision'])]
# Here I use the tensors captured from the tensorflow model (converted to torch)
# with the torch version of the function and the layers from the model.
dx_decode_torch_test, ddx_decode_torch_test = \
    calculate_first_and_second_derivative_with_torch(converted_stack,
                                                     sindy_autoencoder.ψ_decoder_to_x)

# Here I show the error between the two functions.
print(dx_decode_np_test - dx_decode_torch_test.cpu().detach().numpy(),
      ddx_decode_np_test - ddx_decode_torch_test.cpu().detach().numpy())
# Here I show that the Torch weights in the model feeding the Torch
# function are equivalent to the Numpy arrays feeding the Numpy
# function (the weights were initialized from those arrays after conversion to torch tensors).
print("\n\nWeight and bias comparison for two models (imported from np source)\n")
for i in range(4):
    w_np = recovered_autoencoder_network['v2_in_decoder_weights'][i]
    b_np = recovered_autoencoder_network['v2_in_decoder_biases'][i]
    w_diff = np.sum(sindy_autoencoder.ψ_decoder_to_x[i].weights.cpu().detach().numpy() - w_np)
    b_diff = np.sum(sindy_autoencoder.ψ_decoder_to_x[i].bias.cpu().detach().numpy() - b_np)
    print("weights l{}: {:.5f} ({:.2%})".format(i + 1, w_diff, w_diff / np.sum(w_np)))
    print("bias    b{}: {:.5f} ({:.2%})".format(i + 1, b_diff, b_diff / np.sum(b_np)))
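In case it is useful, here is a self-contained sketch of how the two functions can be compared in isolation from the saved model (the layer sizes and data below are hypothetical random float64 values, not the real network):

import numpy as np
import torch
from types import SimpleNamespace

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 3]   # hypothetical layer widths
batch = 5

np_weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
np_biases = [rng.standard_normal(n) for n in sizes[1:-1]]   # no bias on the last layer
x, dx, ddx = (rng.standard_normal((batch, sizes[0])) for _ in range(3))

dz_np, ddz_np = calculate_first_and_second_derivative_with_np(
    x.copy(), dx.copy(), ddx.copy(), np_weights, np_biases)

# wrap the same float64 arrays as the .weights / .bias attributes the torch version expects
stack = [SimpleNamespace(weights=torch.tensor(w, dtype=torch.float64),
                         bias=torch.tensor(b, dtype=torch.float64))
         for w, b in zip(np_weights, np_biases + [np.zeros(sizes[-1])])]
dz_t, ddz_t = calculate_first_and_second_derivative_with_torch(
    [torch.tensor(a, dtype=torch.float64) for a in (x, dx, ddx)], stack)

print(np.max(np.abs(dz_np - dz_t.numpy())),
      np.max(np.abs(ddz_np - ddz_t.numpy())))

On identical float64 inputs like these the two implementations should agree to machine precision, which would point the 10% difference at the inputs (dtype, device or layer contents) rather than at the function logic itself.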
I am currently trying to create my own loss function for Keras (using the TensorFlow backend). It is a plain categorical crossentropy, except that I apply a factor to the first column to penalize errors on the first class more heavily.
I am new to Keras, though, and I can't figure out how to translate my function (below), since I have to use symbolic expressions and it seems I can't go element-wise:
def custom_categorical_crossentropy(y_true, y_pred):
    y_pred = np.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = np.zeros(y_true.shape).astype('float32')
    for i in range(0, y_true.shape[0]):
        for j in range(0, y_true.shape[1]):
            # penalize all elements of class 1 more, so that the loss takes its
            # low proportion in the dataset into account
            if j == 0:
                out[i][j] = -(prop_database * (y_true[i][j] * np.log(y_pred[i][j]) + (1.0 - y_true[i][j]) * np.log(1.0 - y_pred[i][j])))
            else:
                out[i][j] = -(y_true[i][j] * np.log(y_pred[i][j]) + (1.0 - y_true[i][j]) * np.log(1.0 - y_pred[i][j]))
    out = np.mean(out.astype('float32'), axis=-1)
    return tf.convert_to_tensor(out,
                                dtype=tf.float32,
                                name='custom_loss')
Can someone help me?
Many thanks!
You can use class_weight in the fit method to penalize classes without creating functions:
weights = {
0:2,
1:1,
2:1,
3:1,
...
}
model.compile(optimizer=chooseOne, loss='categorical_crossentropy')
model.fit(......., class_weight = weights)
This will make the first class be twice as important as the others.
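If you do want to keep a custom loss instead, the element-wise NumPy version above can be translated into vectorized backend ops. A rough sketch follows; prop_database is assumed to be a plain scalar factor, as in the original function, and the number of classes is assumed to be readable from the static output shape:

import tensorflow as tf
from tensorflow.keras import backend as K

prop_database = 2.0  # assumed scalar penalty factor for class 0

def custom_categorical_crossentropy(y_true, y_pred):
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    # same element-wise term as the NumPy double loop, computed on whole tensors
    ce = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    # per-column weights: prop_database for column 0, 1.0 for the rest
    n_classes = K.int_shape(y_pred)[-1]
    col_weights = K.constant([prop_database] + [1.0] * (n_classes - 1))
    return K.mean(ce * col_weights, axis=-1)

It can then be passed to model.compile(loss=custom_categorical_crossentropy) like any built-in loss.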
I am trying to train an autoencoder NN (3 layers - 2 visible, 1 hidden) using numpy and scipy on the MNIST digit images dataset. The implementation is based on the notation given here. Below is my code:
def autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, data):
    """
    The input theta is a 1-dimensional array because scipy.optimize.minimize expects
    the parameters being optimized to be a 1d array.
    First convert theta from a 1d array to the (W1, W2, b1, b2)
    matrix/vector format, so that this follows the notation convention of the
    lecture notes and tutorial.
    You must compute the:
        cost : scalar representing the overall cost J(theta)
        grad : array representing the corresponding gradient of each element of theta
    """
    training_size = data.shape[1]

    # unroll theta to get (W1, W2, b1, b2) #
    W1 = theta[0:hidden_size*visible_size]
    W1 = W1.reshape(hidden_size, visible_size)
    W2 = theta[hidden_size*visible_size:2*hidden_size*visible_size]
    W2 = W2.reshape(visible_size, hidden_size)
    b1 = theta[2*hidden_size*visible_size:2*hidden_size*visible_size + hidden_size]
    b2 = theta[2*hidden_size*visible_size + hidden_size:2*hidden_size*visible_size + hidden_size + visible_size]

    # feedforward pass
    a_l1 = data
    z_l2 = W1.dot(a_l1) + numpy.tile(b1, (training_size, 1)).T
    a_l2 = sigmoid(z_l2)
    z_l3 = W2.dot(a_l2) + numpy.tile(b2, (training_size, 1)).T
    a_l3 = sigmoid(z_l3)

    # backprop
    delta_l3 = numpy.multiply(-(data - a_l3), numpy.multiply(a_l3, 1 - a_l3))
    delta_l2 = numpy.multiply(W2.T.dot(delta_l3),
                              numpy.multiply(a_l2, 1 - a_l2))
    b2_derivative = numpy.sum(delta_l3, axis=1) / training_size
    b1_derivative = numpy.sum(delta_l2, axis=1) / training_size
    W2_derivative = numpy.dot(delta_l3, a_l2.T) / training_size + lambda_*W2
    #print(W2_derivative.shape)
    W1_derivative = numpy.dot(delta_l2, a_l1.T) / training_size + lambda_*W1
    W1_derivative = W1_derivative.reshape(hidden_size*visible_size)
    W2_derivative = W2_derivative.reshape(visible_size*hidden_size)
    b1_derivative = b1_derivative.reshape(hidden_size)
    b2_derivative = b2_derivative.reshape(visible_size)
    grad = numpy.concatenate((W1_derivative, W2_derivative, b1_derivative, b2_derivative))
    cost = 0.5*numpy.sum((data - a_l3)**2)/training_size + 0.5*lambda_*(numpy.sum(W1**2) + numpy.sum(W2**2))
    return cost, grad
I have also implemented a function to estimate the numerical gradient and verify the correctness of my implementation (below).
def compute_gradient_numerical_estimate(J, theta, epsilon=0.0001):
    """
    :param J: a loss (cost) function that computes the real-valued loss given parameters and data
    :param theta: array of parameters
    :param epsilon: amount to vary each parameter in order to estimate
                    the gradient by numerical difference
    :return: array of numerical gradient estimate
    """
    gradient = numpy.zeros(theta.shape)
    eps_vector = numpy.zeros(theta.shape)
    for i in range(0, theta.size):
        eps_vector[i] = epsilon
        cost1, grad1 = J(theta + eps_vector)
        cost2, grad2 = J(theta - eps_vector)
        gradient[i] = (cost1 - cost2) / (2 * epsilon)
        eps_vector[i] = 0
    return gradient
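For completeness, the check that produces the difference quoted just below is invoked roughly like this (a sketch using the same variable names as above, assuming both functions are in scope):

J = lambda t: autoencoder_cost_and_grad(t, visible_size, hidden_size, lambda_, patches_train)
cost, grad = J(theta)
numerical_grad = compute_gradient_numerical_estimate(J, theta)
print(numpy.linalg.norm(numerical_grad - grad))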
The norm of the difference between the numerical estimate and the gradient computed by the function is around 6.87165125021e-09, which seems acceptable. My main problem is getting the "L-BFGS-B" optimizer to work via scipy.optimize.minimize, as below:
# theta is the 1-D array of(W1,W2,b1,b2)
J = lambda x: utils.autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, patches_train)
options_ = {'maxiter': 4000, 'disp': False}
result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
I get the below output from this:
scipy.optimize.minimize() details:
fun: 90.802022224079778
hess_inv: <16474x16474 LbfgsInvHessProduct with dtype=float64>
jac: array([ -6.83667742e-06, -2.74886002e-06, -3.23531941e-06, ...,
1.22425735e-01, 1.23425062e-01, 1.28091250e-01])
message: b'ABNORMAL_TERMINATION_IN_LNSRCH'
nfev: 21
nit: 0
status: 2
success: False
x: array([-0.06836677, -0.0274886 , -0.03235319, ..., 0. ,
0. , 0. ])
Now, this post seems to indicate that the error could mean the gradient function implementation is wrong, but my numerical gradient estimate seems to confirm that my implementation is correct. I have also tried varying the initial weights by using a uniform distribution as specified here, but the problem persists. Is there anything wrong with my backprop implementation?
Turns out the issue was a very silly mistake in this line:
J = lambda x: utils.autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, patches_train)
The lambda takes a parameter x but never uses it, so the initial theta array was being passed to autoencoder_cost_and_grad on every call to J and the optimizer never saw the effect of its updates.
This fixed it:
J = lambda x: utils.autoencoder_cost_and_grad(x, visible_size, hidden_size, lambda_, patches_train)
I trained an image classifier using Keras up to around 98% test accuracy. Now I know the overall accuracy is 98%, but I want to know the accuracy/error per distinct class/label.
Does Keras have a built-in function for that, or would I have to test this myself per class/label?
Update: Thanks @gionni. I didn't know the actual term was "confusion matrix", but that's what I'm actually looking for. That being said, is there a function to generate one? I have to use Keras 1.2.2, by the way.
I had a similar issue, so I can share my code with you. The following function computes the accuracy for a single class:
def single_class_accuracy(interesting_class_id):
    def fn(y_true, y_pred):
        class_id_preds = K.argmax(y_pred, axis=-1)
        # Replace class_id_preds with class_id_true for recall here
        positive_mask = K.cast(K.equal(class_id_preds, interesting_class_id), 'int32')
        true_mask = K.cast(K.equal(y_true, interesting_class_id), 'int32')
        acc_mask = K.cast(K.equal(positive_mask, true_mask), 'float32')
        class_acc = K.mean(acc_mask)
        return class_acc
    return fn
Now, if you want the accuracy for class 0, you can add it to the metrics when compiling the model:
model.compile(..., metrics=[..., single_class_accuracy(0)])
If you want the accuracy for all classes, you could write:
model.compile(...,
metrics=[...] + [single_class_accuracy(i) for i in range(nb_of_classes)])
There may be better options, but you can use this:
import numpy as np

# gather each distinct true label
distinct, counts = np.unique(trueLabels, axis=0, return_counts=True)

for dist, count in zip(distinct, counts):
    selector = (trueLabels == dist).all(axis=-1)
    selectedX = testData[selector]
    selectedY = trueLabels[selector]
    print('\n\nEvaluating for ' + str(count) + ' occurrences of class ' + str(dist))
    print(model.evaluate(selectedX, selectedY, verbose=0))
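Regarding the confusion matrix asked about in the update: as far as I know Keras itself (including 1.2.2) does not ship one, but a minimal sketch with scikit-learn looks like this (model, testData and trueLabels are the same names as above, with one-hot labels assumed):

import numpy as np
from sklearn.metrics import confusion_matrix

# predicted and true class indices from one-hot encoded data
y_pred = np.argmax(model.predict(testData), axis=-1)
y_true = np.argmax(trueLabels, axis=-1)

# rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))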
I'm using tensorflow batch normalization in my deep neural network successfully. I'm doing it the following way:
if apply_bn:
    with tf.variable_scope('bn'):
        beta = tf.Variable(tf.constant(0.0, shape=[out_size]), name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[out_size]), name='gamma', trainable=True)
        batch_mean, batch_var = tf.nn.moments(z, [0], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        mean, var = tf.cond(self.phase_train,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))

        self.z_prebn.append(z)
        z = tf.nn.batch_normalization(z, mean, var, beta, gamma, 1e-3)
        self.z.append(z)
        self.bn.append((mean, var, beta, gamma))
And it works fine for both the training and testing phases.
However, I run into problems when I try to use the computed network parameters in another project, where I need to compute all the matrix multiplications and related operations myself. The problem is that I can't reproduce the behavior of the tf.nn.batch_normalization function:
feed_dict = {
    self.tf_x: np.array([range(self.x_cnt)]) / 100,
    self.keep_prob: 1,
    self.phase_train: False
}

for i in range(len(self.z)):
    # print layer 0's value at index 1 of each array
    print(self.sess.run([
        self.z_prebn[i][0][1],  # before bn
        self.bn[i][0][1],       # mean
        self.bn[i][1][1],       # var
        self.bn[i][2][1],       # offset
        self.bn[i][3][1],       # scale
        self.z[i][0][1],        # after bn
    ], feed_dict=feed_dict))
# prints
# [-0.077417567, -0.089603029, 0.000436493, -0.016652612, 1.0055743, 0.30664611]
According to the formula on the page https://www.tensorflow.org/versions/r1.2/api_docs/python/tf/nn/batch_normalization:
bn = scale * (x - mean) / (sqrt(var) + 1e-3) + offset
But as we can see,
1.0055743 * (-0.077417567 - -0.089603029)/(0.000436493^0.5 + 1e-3) + -0.016652612
= 0.543057
Which differs from the value 0.30664611, computed by Tensorflow itself.
So what am I doing wrong here, and why can't I just calculate the batch-normalized value myself?
Thanks in advance!
The formula used is slightly different from:
bn = scale * (x - mean) / (sqrt(var) + 1e-3) + offset
It should be:
bn = scale * (x - mean) / (sqrt(var + 1e-3)) + offset
The variance_epsilon argument is added to the variance, not to sigma, the square root of the variance.
After the correction, the formula yields the correct value:
1.0055743 * (-0.077417567 - -0.089603029)/((0.000436493 + 1e-3)**0.5) + -0.016652612
# 0.30664642276945747
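To answer the original question of reproducing tf.nn.batch_normalization outside TensorFlow, a minimal NumPy sketch of the corrected formula, checked against the values printed above, would be:

import numpy as np

def batch_norm(x, mean, var, offset, scale, variance_epsilon=1e-3):
    # same formula as tf.nn.batch_normalization: the epsilon goes inside the sqrt
    return scale * (x - mean) / np.sqrt(var + variance_epsilon) + offset

print(batch_norm(x=-0.077417567, mean=-0.089603029, var=0.000436493,
                 offset=-0.016652612, scale=1.0055743))
# ~0.306646, matching the value TensorFlow printed above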