I want to create a custom loss function in keras.
Let's say I have yTrue and yPred which are tensors (n x m) of true and predicted labels.
For each sample (that is, each row of yTrue and yPred), let's call the true and predicted rows yT and yP.
Then I want a loss function that computes (yT-yP)^2 when yT[0] == 1, otherwise it will compute (yT[0]-yP[0])^2.
That is: for each sample I always want to calculate the squared error for the first element - but I want to calculate the squared error of the other elements only if the first element of the true label == 1.
How do I do this in a custom loss function?
This is what I have gotten so far:
I need to do this with tensor operations.
First I can compute
Y = (yTrue - yPred)^2
Then I can define a masking matrix where the first column is always one, and the others are 1 depending on the value of the first element for each row of yTrue.
So I can get something like
1 0 0 0 0
1 0 0 0 0
1 1 1 1 1
1 1 1 1 1
1 0 0 0 0
I can then multiply this matrix element-wise with Y and obtain what I want.
However, how do I generate the masking matrix? In particular, how do I express the condition "if the first element of the row is 1" in tensorflow/keras?
Maybe there is a better way to do this?
You can use a conditional switch K.switch in the backend. Something along the lines of:
mse = K.mean(K.square(y_pred - y_true), axis=-1) # standard mse
msep = K.square(y_pred[:,0] - y_true[:,0])
return K.switch(K.equal(y_true[:,0], 1), mse, msep)
Edit for handling per sample condition.
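The masking-matrix idea from the question can be sketched in plain NumPy, as a stand-in for the tensor version. This is only an illustration under assumptions: the function name masked_squared_error and the sample arrays are made up, and the per-sample loss is taken as the sum of the masked squared errors. In Keras the same steps would presumably be written with backend operations such as K.equal and K.cast.

```python
import numpy as np

def masked_squared_error(y_true, y_pred):
    """Per-sample loss: column 0 always counts, the rest only if y_true[:, 0] == 1."""
    sq_err = (y_true - y_pred) ** 2                      # shape (n, m)
    row_on = (y_true[:, :1] == 1).astype(sq_err.dtype)   # shape (n, 1), 1.0 where first label is 1
    mask = np.repeat(row_on, sq_err.shape[1], axis=1)    # copy the row flag across all columns
    mask[:, 0] = 1.0                                     # first column is always kept
    return (sq_err * mask).sum(axis=1)                   # one loss value per sample

# Made-up data: first row has yT[0] == 1, second row does not
y_true = np.array([[1.0, 2.0, 3.0], [0.0, 2.0, 3.0]])
y_pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
losses = masked_squared_error(y_true, y_pred)
print(losses)  # per-sample losses: 14 for the first row, 1 for the second
```

Building the mask with a comparison and a cast avoids any Python-level if, which is what makes the approach translate to tensor operations.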
I am training a UNet shaped CNN and have to deal with data imbalances. I want to minimise false negatives, so I want to implement a custom loss function that does so. I created the following loss function:
from tensorflow.keras import backend as K
def fbeta_loss(y_true, y_pred, beta=2., epsilon=K.epsilon()):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    tp = K.sum(y_true_f * y_pred_f)
    predicted_positive = K.sum(y_pred_f)
    actual_positive = K.sum(y_true_f)
    precision = tp/(predicted_positive+epsilon) # calculating precision
    recall = tp/(actual_positive+epsilon) # calculating recall
    # calculating fbeta
    beta_squared = K.square(beta)
    fb = (1+beta_squared)*precision*recall / (beta_squared*precision + recall + epsilon)
    return 1-fb
However, I am not sure if y_pred is binary, or a float number between 0 and 1. In my final layer I use a sigmoid activation. Does that mean that in a custom loss function y_pred is a float between 0 and 1, and that I should add a step that maps every value higher than a threshold (0.5) to 1 and every lower value to 0? Or is that step already included in the Keras model? I ask because similar custom loss implementations often do not include that step.
Hopefully this is sort of clear, I am relatively new to stackoverflow. Let me know if anything is missing! Thanks in advance.
The output of sigmoid activation function is always between 0 and 1.
In the limit of x tending towards infinity, S(x) converges to 1, and in the limit of x tending towards negative infinity, S(x) converges to 0. Here, "converges" does not mean that S(x) ever reaches 0 or 1, only that it gets arbitrarily close to them.
And so the output of S(x) is always a float between 0 and 1.
Range of S(x):
0 < S(x) < 1
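A small NumPy check can make this concrete. It is only a sketch: the input grid is arbitrary, and the 0.5 threshold is the one mentioned in the question. The loss function sees the float values; a hard threshold is usually applied only when reporting predictions, because a step function has zero gradient almost everywhere and so cannot be trained through.

```python
import numpy as np

def sigmoid(x):
    # S(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)     # arbitrary inputs -5, -4, ..., 5
probs = sigmoid(x)                 # floats strictly between 0 and 1 on this grid
hard = (probs >= 0.5).astype(int)  # thresholded labels, for reporting only

print(probs.min(), probs.max())
print(hard)
```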
I'm working on a simple classification problem. I proceeded through the example and created a model.
I arranged the tag column as given below.
label 0 1 1 0 0 1
Then I wanted to test the system with samples, but it gives the result as a percentage.
I expect it to give one of two values, either 0 or 1.
Example code:
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = reloaded_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
print(
    "This particular pet had a %.1f percent probability "
    "of getting adopted." % (100 * prob)
)
What code will result in 0 and 1?
thank you.
What to do depends on how your model was constructed. With only two labels you are doing binary classification. If in your model the last dense layer has 1 neuron then it is set up for binary classification. In that case your loss function in model.compile should be
Model.predict in that case will produce a single probability value as output. You can just use an if statement to determine the class: if the probability is less than 0.5 it is one class, if it is greater than or equal to 0.5 it is the other class. Now, you may have constructed your model so that the last dense layer has 2 neurons. In that case your loss function should be either sparse_categorical_crossentropy if the labels were integers, or categorical_crossentropy if the labels were one-hot encoded. Model.predict in this case will produce two probabilities as the output. You want to select the index with the highest probability as the class.
You can do that with predicted_class = np.argmax(predictions) (class itself is a reserved word in Python, so it cannot be used as a variable name).
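Both cases can be sketched with NumPy on made-up probabilities. The arrays below are illustrative values, not output from the asker's model, and the variable names are hypothetical.

```python
import numpy as np

# Case 1: one output neuron, so one probability per sample (made-up values)
single_probs = np.array([0.12, 0.50, 0.93])
single_classes = (single_probs >= 0.5).astype(int)  # below 0.5 is class 0, otherwise class 1

# Case 2: two output neurons, so two probabilities per sample (made-up values)
two_probs = np.array([[0.8, 0.2],
                      [0.3, 0.7]])
two_classes = np.argmax(two_probs, axis=1)          # index of the larger probability in each row

print(single_classes)
print(two_classes)
```

Passing axis=1 matters when predicting on several samples at once: without it, np.argmax returns a single index into the flattened array.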
I am using this function to calculate the distance between two word2vec vectors a and b of size 300. I get the distance between 'hot' and 'cold' to be equal to 1.
How do I add this value (1) to a vector? I thought I could simply do new_vec=model['hot']+1, but when I calculate dist(new_vec,model['hot']) I get 17.
import numpy
def dist(a,b):
    return numpy.linalg.norm(a-b)
I expected dist(a,c) to give me back 1!
You should review what the norm is. In the case of numpy, the default is to use the L-2 norm (a.k.a the Euclidean norm). When you add 1 to a vector, the call is to add 1 to all of the elements in the vector.
>> vec1 = np.random.normal(0,1,size=300)
>> print(vec1[:5])
... [ 1.18469795 0.04074346 -1.77579852 0.23806222 0.81620881]
>> vec2 = vec1 + 1
>> print(vec2[:5])
... [ 2.18469795 1.04074346 -0.77579852 1.23806222 1.81620881]
Now, your call to norm is saying sqrt( (a1-b1)**2 + (a2-b2)**2 + ... + (aN-bN)**2 ) where N is the length of the vector and a is the first vector and b is the second vector (and ai being the ith element in a). Since (a1-b1)**2 == (a2-b2)**2 == ... == (aN-bN)**2 == 1 we expect this sum to produce N which in your case is 300. So sqrt(300) = 17.3 is the expected answer.
>> print(np.linalg.norm(vec1-vec2))
... 17.320508075688775
To answer the question, "How to add a value to a vector": you have done this correctly. If you'd like to add a value to a specific element then you can do vec2[ix] += value where ix indexes the element that you wish to change. If you want to add the same value to every element so that the new vector ends up at distance 1 from the original, then add np.sqrt(1/300), since 300 copies of (1/300) sum to 1 under the square root.
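The arithmetic can be checked with a short NumPy sketch. The random vector stands in for a real word2vec embedding, and the seed and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility
vec1 = rng.normal(0.0, 1.0, size=300)    # stand-in for a 300-dimensional word vector
n = vec1.size

shifted_by_one = vec1 + 1.0              # adds 1 to every element
shifted_small = vec1 + 1.0 / np.sqrt(n)  # adds sqrt(1/300) to every element

d_one = np.linalg.norm(shifted_by_one - vec1)   # sqrt(300), about 17.32
d_small = np.linalg.norm(shifted_small - vec1)  # 1, up to rounding

print(d_one, d_small)
```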
Also possibly relevant is a more commonly used distance metric for word2vec vectors: the cosine distance which measures the angle between two vectors.
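A minimal sketch of cosine distance, taken here as one minus the cosine of the angle between the vectors. The helper name cosine_distance and the toy vectors are illustrative only, and the function assumes neither vector is all zeros.

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cos(angle between a and b); assumes both vectors have non-zero norm
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([0.0, 2.0])   # orthogonal to a
c = np.array([3.0, 0.0])   # same direction as a, different length

d_ab = cosine_distance(a, b)   # orthogonal vectors give 1
d_ac = cosine_distance(a, c)   # parallel vectors give 0, regardless of length
print(d_ab, d_ac)
```

Unlike the Euclidean norm, this measure ignores vector length, so adding a constant to every element changes it in a less obvious way.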
To keep the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY)) #cross entropy
cost = -np.sum(loss)/m #num of examples in batch is m
Probability of Y
predY is computed using sigmoid, and logits can be thought of as the output of a neural network before it reaches the classification step
predY = sigmoid(logits) #binary case
def sigmoid(X):
    return 1/(1 + np.exp(-X))
Suppose we are running a feed-forward net.
Inputs: [3, 5]: 3 is number of examples and 5 is feature size (fabricated data)
Num of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
Such an arrangement is set to overfit. When it overfits, it predicts the probability for the training examples perfectly; in other words, sigmoid outputs exactly 1 or 0, because the exponential overflows or underflows in floating point. If this is the case, we would have np.log(0), which is undefined. How do you usually handle this issue?
If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression
np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
with
xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:
xlogy(Y, predY) + xlog1py(1 - Y, -predY)
Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:
Yis1 = Y == 1
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
How do you usually handle this issue?
Add a small number (something like 1e-15) inside each log, that is, to predY and to 1 - predY. This number barely changes the predictions, and it solves the log(0) issue.
BTW, if your algorithm outputs exact zeros and ones, it might be useful to check the histogram of returned probabilities: when an algorithm is that sure about its predictions, it can be a sign of overfitting.
One common way to deal with log(x) and y / x where x is always non-negative but can become 0 is to add a small constant (as written by Jakub).
You can also clip the value (e.g. tf.clip_by_value or np.clip).
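The clipping variant can be sketched with np.clip as follows. The function name binary_cross_entropy, the eps value of 1e-15 (borrowed from the answer above), and the sample arrays are assumptions for illustration.

```python
import numpy as np

def binary_cross_entropy(Y, predY, eps=1e-15):
    # Keep predictions inside [eps, 1 - eps] so neither log sees an exact 0
    predY = np.clip(predY, eps, 1.0 - eps)
    loss = Y * np.log(predY) + (1 - Y) * np.log(1 - predY)
    m = Y.shape[0]                       # number of examples in the batch
    return -np.sum(loss) / m

# Saturated predictions: two are exactly right, the third is exactly wrong
Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.0])
cost = binary_cross_entropy(Y, predY)
print(cost)  # finite, dominated by the confidently wrong third prediction
```

Clipping both ends matters because the second term takes the log of 1 - predY, which hits zero when predY is exactly 1.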
I am looking for an elegant way to select a subset of a torch tensor which satisfies some constraints.
For example, say I have:
A = torch.rand(10,2)-1
and S is a 10x1 tensor,
sel = torch.ge(S,5) -- this is a ByteTensor
I would like to be able to do logical indexing, as follows:
A1 = A[sel]
But that doesn't work.
So there's the index function which accepts a LongTensor but I could not find a simple way to convert S to a LongTensor, except the following:
sel = torch.nonzero(sel)
which returns a K x 2 tensor (K being the number of values of S >= 5). So then I have to convert it to a 1 dimensional array, which finally allows me to index A:
This is very cumbersome; in e.g. Matlab all I'd have to do is
Can anyone suggest a better way?
One possible alternative is:
sel = S:ge(5):expandAs(A) -- now you can use this mask with the [] operator
A1 = A[sel]:unfold(1, 2, 2) -- unfold to get back a 2D tensor
> A = torch.rand(3,2)-1
-0.0047 -0.7976
-0.2653 -0.4582
-0.9713 -0.9660
[torch.DoubleTensor of size 3x2]
> S = torch.Tensor{{6}, {1}, {5}}
[torch.DoubleTensor of size 3x1]
> sel = S:ge(5):expandAs(A)
1 1
0 0
1 1
[torch.ByteTensor of size 3x2]
> A[sel]
[torch.DoubleTensor of size 4]
> A[sel]:unfold(1, 2, 2)
-0.0047 -0.7976
-0.9713 -0.9660
[torch.DoubleTensor of size 2x2]
There are two simpler alternatives:
Use maskedSelect:
Use a simple element-wise multiplication, for example
The second one is very useful if you need to keep the shape of the original matrix (i.e. A), for example to select neurons in a layer at backprop. However, since it puts zeros in the resulting matrix wherever the condition dictated by the ByteTensor doesn't apply, you can't use it to compute the product (or median, etc.). The first one only returns the elements that satisfy the condition, so this is what I'd use to compute products or medians or any other thing where I don't want zeros.
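For readers more familiar with Python, the contrast between the two approaches can be sketched with NumPy. This is an analogue, not the Lua Torch API discussed above; the values of A and S are copied from the example session earlier in the thread.

```python
import numpy as np

A = np.array([[-0.0047, -0.7976],
              [-0.2653, -0.4582],
              [-0.9713, -0.9660]])
S = np.array([[6.0], [1.0], [5.0]])

# Selection: keep only the rows whose S value is >= 5 (result has fewer rows)
row_mask = (S >= 5).ravel()      # boolean vector of length 3
selected_rows = A[row_mask]      # shape (2, 2)

# Multiplication: keep the original shape and zero out the other rows
zeroed = A * (S >= 5)            # (3, 1) mask broadcast against (3, 2)

print(selected_rows)
print(zeroed)
```

The product of selected_rows is meaningful, while the product of zeroed is always zero whenever at least one row is masked out, which is the trade-off the answer describes.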