Tensorflow classification label 0 and 1 - tensorflow

I'm working on a simple classification problem. I followed the example and created a model.
I arranged the label column as shown below.
label 0 1 1 0 0 1
I then wanted to test the model with samples, but it outputs a value as a percentage.
I expect it to give one of two values, either 0 or 1.
Example code:
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = reloaded_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
print(
"This particular pet had a %.1f percent probability "
"of getting adopted." % (100 * prob)
)
What code will give 0 or 1 as the result?
Thank you.

What to do depends on how your model was constructed. With only two labels you are doing binary classification. If the last dense layer in your model has 1 neuron, then it is set up for binary classification. In that case your loss function in model.compile should be
loss=BinaryCrossentropy
Model.predict in that case will produce a single probability as output. You can just use an if statement to determine the class: if the probability is less than 0.5 it is one class; if it is greater than or equal to 0.5 it is the other class. Alternatively, you may have constructed your model so that the last dense layer has 2 neurons. In that case you should be using either sparse_categorical_crossentropy (if the labels are integers) or categorical_crossentropy (if the labels are one-hot encoded) as your loss function. Model.predict in this case will produce two probabilities as the output. You want to select the index with the highest probability as the class.
You can do that with predicted_class = np.argmax(predictions, axis=-1) (class itself is a reserved word in Python and cannot be used as a variable name).
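Both cases can be sketched with NumPy as follows. The arrays are made-up stand-ins for what Model.predict might return, not output from the asker's model. Note that in the question's code the sigmoid is applied after predict, so there the 0.5 threshold belongs on prob rather than on the raw prediction.

```python
import numpy as np

# Case 1: one output neuron. These are hypothetical raw outputs (logits);
# if your model already ends in a sigmoid, skip the sigmoid step.
logits = np.array([[-2.3], [0.4], [1.7]])
probs = 1.0 / (1.0 + np.exp(-logits))        # sigmoid
labels = (probs >= 0.5).astype(int).ravel()  # threshold at 0.5
print(labels)

# Case 2: two output neurons. Hypothetical per-class probabilities.
two_class_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
classes = np.argmax(two_class_probs, axis=-1)  # index of the larger probability per row
print(classes)
```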

Related

Custom Weighted Cross Entropy loss in Keras

Ok, so I have a neural network that classifies fire size into 3 groups, 0-1, 1 - 100, and over 100 acres. I need a loss function that weights the loss as double when the classifier guesses a class that is off by 2 (Actual = 0, predicted = 3)
I need a loss function that weights the loss as double when the classifier guesses a class that is off by 2 (Actual = 0, predicted = 3)
Double of what?
A) Is it double the loss value when the classifier guesses correctly,
B) or double the loss value when the classifier is off by 1?
C) Can we relax this 'double' constraint, and can we assume that any suitable higher power would suffice?
Let us assume A).
Let f(x) denote the probability that your input belongs to a particular class. Note that, in f(x), x is the absolute value of the difference between the class indices.
Then we see that f(0)=0.5 is a solution for assumption A. This means that f(1)=0.25 and f(2)=0.25. By the way, the fact that f(1)==f(2) doesn't look natural.
Assume that your classifier calculates a function f(x), and uses it as follows.
def classifier_output(firesize):
    if 0 <= firesize < 1.0:
        return [f(0), f(1), f(2)]
    elif 1.0 <= firesize < 100.0:
        return [f(1), f(0), f(1)]
    else:
        # >= rather than >, so a size of exactly 100.0 does not trip the assert
        assert firesize >= 100.0
        return [f(2), f(1), f(0)]
The constraints are
C1)
f(x) >=0
C2)
the components of your output vector should always sum to 1.0,
i.e. the sum of all three components of the return value should always be 1.
C3)
When the true class and predicted class differ by 2, the one-hot cross-entropy loss
will be -log(f(2)). According to assumption A, this should equal -2*log(f(0)),
i.e.:
log(f(2))=2*log(f(0))
This translates to
f(2) = f(0)*f(0)
Let us put z=f(0). Now f(2)=z*z. We don't know f(1). Let us assume, f(1)=y.
From constraint C2, applied to the two distinct output vectors [f(0), f(1), f(2)] and [f(1), f(0), f(1)],
we have the following equations:
z+ z*z + y=1
z + 2*y=1
A solution to the above is z=0.5, y=0.25
If you assume B), you won't be able to find such a function.
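The solution for assumption A can be checked by hand or with a few lines of Python. This is only a sketch of the algebra: substituting y = (1 - z) / 2 from the second equation into the first gives the quadratic 2*z*z + z - 1 = 0, and the root that is a valid probability is kept.

```python
import math

# z + z*z + y = 1 and z + 2*y = 1  =>  2*z**2 + z - 1 = 0
a, b, c = 2.0, 1.0, -1.0
disc = b * b - 4 * a * c
roots = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]

# Keep only the root that can be a probability (constraint C1 and C2)
z = next(r for r in roots if 0.0 <= r <= 1.0)
y = (1.0 - z) / 2.0
print(z, y)  # z = f(0), y = f(1); f(2) = z * z
```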

Tensorflow/Keras custom loss function

I want to create a custom loss function in keras.
Let's say I have yTrue and yPred which are tensors (n x m) of true and predicted labels.
Let's call each sample n (that is, each row in yTrue and yPred) yT and yP.
Then I want a loss function that computes (yT-yP)^2 when yT[0] == 1, otherwise it will compute (yT[0]-yP[0])^2.
That is: for each sample I always want to calculate the squared error for the first element - but I want to calculate the squared error of the other elements only if the first element of the true label == 1.
How do I do this in a custom loss function?
This is what I have gotten so far:
I need to do this with tensors operations.
First I can compute
Y = (yTrue - yPred)^2
Then I can define a masking matrix where the first column is always one, and the others are 1 depending on the value of the first element for each row of yTrue.
So I can get something like
1 0 0 0 0
1 0 0 0 0
1 1 1 1 1
1 1 1 1 1
1 0 0 0 0
I can then multiply element wise this matrix with Y and obtain what I want.
However, how do I go about generating the masking matrix? In particular, how do I express the condition "if the first element of the row is 1" in tensorflow/keras?
Maybe there is a better way to do this?
Thanks
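You can use the conditional switch K.switch from the Keras backend (K below). Something along the lines of the following, where custom_loss is just an illustrative name:
def custom_loss(y_true, y_pred):
    mse = K.mean(K.square(y_pred - y_true), axis=-1)  # standard MSE over each row
    msep = K.square(y_pred[:, 0] - y_true[:, 0])  # squared error of the first element only
    # the backend function is K.equal (K.equals does not exist); one condition per sample
    return K.switch(K.equal(y_true[:, 0], 1), mse, msep)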
Edit for handling per sample condition.
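The masking-matrix idea described in the question can also be written directly with broadcasting. The sketch below uses NumPy so the mask is easy to inspect; it is not taken from the answer, the function name is hypothetical, and it sums the squared errors per row rather than averaging them. The same steps should carry over to tensor operations.

```python
import numpy as np

def masked_squared_error(y_true, y_pred):
    """Per-row squared error; columns 1..m-1 count only when y_true[:, 0] == 1."""
    sq = (y_true - y_pred) ** 2
    # (n, 1) column holding 1.0 where the first true label is 1, else 0.0
    first_is_one = (y_true[:, :1] == 1).astype(sq.dtype)
    mask = np.repeat(first_is_one, sq.shape[1], axis=1)  # (n, m)
    mask[:, 0] = 1.0  # the first column is always kept
    return (sq * mask).sum(axis=1)

y_true = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
y_pred = np.full((2, 3), 0.5)
per_row = masked_squared_error(y_true, y_pred)
print(per_row)
```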

How to handle log(0) when using cross entropy

In order to make the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY)) #cross entropy
cost = -np.sum(loss)/m #num of examples in batch is m
Probability of Y
predY is computed using sigmoid, and logits can be thought of as the output of a neural network before the classification step
def sigmoid(X):
    return 1/(1 + np.exp(-X))
predY = sigmoid(logits) #binary case
Problem
Suppose we are running a feed-forward net.
Inputs: [3, 5]: 3 is number of examples and 5 is feature size (fabricated data)
Num of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
Such an arrangement is bound to overfit. When it overfits, the network predicts the training examples perfectly; in other words, sigmoid outputs exactly 1 or 0 because the exponential saturates in floating point. In that case np.log(0) is undefined. How do you usually handle this issue?
If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression
np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
with
xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:
xlogy(Y, predY) + xlog1py(1 - Y, -predY)
Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:
Yis1 = Y == 1
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
How do you usually handle this issue?
Add a small number (something like 1e-15) to predY; this number doesn't change the predictions much, and it solves the log(0) issue.
BTW, if your algorithm outputs zeros and ones, it might be useful to check the histogram of the returned probabilities: when the algorithm is that sure about something, it can be a sign of overfitting.
One common way to deal with log(x) and y / x where x is always non-negative but can become 0 is to add a small constant (as written by Jakub).
You can also clip the value (e.g. tf.clip_by_value or np.clip).
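A minimal sketch of the clipping variant with np.clip, reusing the names Y and predY from the question. The function name and the eps default are illustrative choices. Clipping to [eps, 1 - eps] guards both log(predY) and log(1 - predY), whereas only adding a constant to predY leaves the predY == 1 case unprotected.

```python
import numpy as np

def binary_cross_entropy(Y, predY, eps=1e-15):
    # Keep probabilities strictly inside (0, 1) so neither log sees 0
    predY = np.clip(predY, eps, 1.0 - eps)
    loss = Y * np.log(predY) + (1 - Y) * np.log(1 - predY)
    return -np.sum(loss) / Y.shape[0]

Y = np.array([0.0, 1.0, 1.0])
predY = np.array([0.0, 1.0, 0.5])  # saturated outputs like an overfit net would give
cost = binary_cross_entropy(Y, predY)
print(cost)
```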

How are the leaves' scores calculated in these XGBoost trees?

I am looking at the below image.
Can someone explain how they are calculated?
I thought it was -1 for a no and +1 for a yes, but then I can't figure out how the little girl has .1. That doesn't work for tree 2 either.
I agree with @user1808924. I think it's still worth explaining how XGBoost works under the hood though.
What is the meaning of the leaves' scores?
First, the scores you see in the leaves are not probabilities. They are regression values.
In gradient boosted trees, there are only regression trees. To predict whether a person likes computer games or not, the model (XGBoost) treats it as a regression problem. The labels here become 1.0 for Yes and 0.0 for No. Then XGBoost fits regression trees during training. The trees will of course return something like +2, +0.1, -1, which is what we see at the leaves.
We sum up all the "raw scores" and then convert them to probabilities by applying the sigmoid function.
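As a small illustration of that last step, the sketch below sums the two tree outputs quoted for the little boy elsewhere in this thread (+2 from tree 1 and +0.9 from tree 2) and passes the sum through a sigmoid. It assumes a logistic objective, as this answer describes; with a plain regression objective the summed score would be used directly.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

tree_scores = [2.0, 0.9]           # leaf values reached in tree 1 and tree 2
raw_score = sum(tree_scores)       # the "raw score" is the sum over all trees
probability = sigmoid(raw_score)   # about 0.948
print(raw_score, round(probability, 3))
```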
How to calculate the score in the leaves?
The leaf score (w) is calculated by this formula:
w = - (sum(gi) / (sum(hi) + lambda))
where g and h are the first derivative (gradient) and the second derivative (hessian).
For the sake of demonstration, let's pick the leaf of the first tree which has the value -1. Suppose our objective function is mean squared error (mse) and we choose lambda = 0.
With mse, we have g = (y_pred - y_true) and h=1. I just dropped the constant 2; in fact, you can keep it and the result stays the same, because with lambda = 0 it cancels between g and h. Another note: at the t-th iteration, y_pred is the prediction we have after the (t-1)-th iteration (the best we've got until that time).
Some assumptions:
The girl, grandpa, and grandma do NOT like computer games (y_true = 0 for each person).
The initial prediction is 1 for all the 3 people (i.e., we guess all people love games. Note that, I choose 1 on purpose to get the same result with the first tree. In fact, the initial prediction can be the mean (default for mean squared error), median (default for mean absolute error),... of all the observations' labels in the leaf).
We calculate g and h for each individual:
g_girl = y_pred - y_true = 1 - 0 = 1. Similarly, we have g_grandpa = g_grandma = 1.
h_girl = h_grandpa = h_grandma = 1
Putting the g, h values into the formula above, we have:
w = -( (g_girl + g_grandpa + g_grandma) / (h_girl + h_grandpa + h_grandma) ) = -1
Last note: In practice, the leaf score we see when plotting the tree is a bit different. It will be multiplied by the learning rate, i.e., w * learning_rate.
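The worked example above can be reproduced in a few lines. This is a sketch of the formula only, not XGBoost's own code; the function name and the learning rate value are illustrative.

```python
def leaf_weight(gradients, hessians, reg_lambda=0.0):
    """w = -sum(g) / (sum(h) + lambda) over the samples that fall in one leaf."""
    return -sum(gradients) / (sum(hessians) + reg_lambda)

# Squared-error objective with the constant 2 dropped: g = y_pred - y_true, h = 1
y_true = [0.0, 0.0, 0.0]  # girl, grandpa, grandma do not like computer games
y_pred = [1.0, 1.0, 1.0]  # initial prediction assumed in the example
g = [p - t for p, t in zip(y_pred, y_true)]
h = [1.0] * len(y_true)

w = leaf_weight(g, h)
learning_rate = 0.3       # illustrative value, not taken from the example
plotted_score = w * learning_rate
print(w, plotted_score)
```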
The values of leaf elements (aka "scores") - +2, +0.1, -1, +0.9 and -0.9 - were devised by the XGBoost algorithm during training. In this case, the XGBoost model was trained using a dataset where little boys (+2) appear somehow "greater" than little girls (+0.1). If you knew what the response variable was, then you could probably interpret/rationalize those contributions further. Otherwise, just accept those values as they are.
As for scoring samples, the first addend is produced by tree1, and the second addend is produced by tree2. For little boys (age < 15, is male == Y, and use computer daily == Y), tree1 yields 2 and tree2 yields 0.9.
Read this
https://towardsdatascience.com/xgboost-mathematics-explained-58262530904a
and then this
https://medium.com/@gabrieltseng/gradient-boosting-and-xgboost-c306c1bcfaf5
and the appendix
https://gabrieltseng.github.io/appendix/2018-02-25-XGB.html

how to find weight vector from the libsvm model file?

I have the following model file from LIBSVM:
svm_type c_svc
kernel_type linear
nr_class 2
total_sv 3
rho 0.0666415
label 1 -1
nr_sv 2 1
SV
0.004439511653718091 1:4.5 2:0.5
0.07111595083031433 1:2 2:2
-0.07555546248403242 1:-0.5 2:-2.5
My question is how do I figure out the weight vector from this information?
The weights of the support vectors are the first numbers on each of the support vector lines (the last three). Despite using a linear kernel, libsvm is for general kernel SVMs, so it isn't storing a weight vector and bias explicitly.
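For a linear kernel the weight vector can still be recovered from those numbers, because the decision value LIBSVM computes is the sum over support vectors of coefficient times kernel value, minus rho. With a linear kernel that collapses to w = sum(coef_i * sv_i) and b = -rho. A minimal sketch using the three SV lines and the rho from the model file in the question; the variable names are illustrative.

```python
# Each SV line is "coef index:value index:value ..."; coef is alpha_i * y_i.
sv_lines = [
    "0.004439511653718091 1:4.5 2:0.5",
    "0.07111595083031433 1:2 2:2",
    "-0.07555546248403242 1:-0.5 2:-2.5",
]
rho = 0.0666415
n_features = 2

w = [0.0] * n_features
for line in sv_lines:
    coef_str, *features = line.split()
    coef = float(coef_str)
    for item in features:
        index, value = item.split(":")
        w[int(index) - 1] += coef * float(value)  # feature indices in the file are 1-based

b = -rho  # decision value for a point x is dot(w, x) + b
print(w, b)
```

For this model the result is roughly w = (0.2, 0.333). A positive decision value corresponds to the first entry of the label line (1 here).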
If you know you want a linear kernel, and you want that information, you can use liblinear (from the same folks as libsvm). Given this trivial data:
1 1:1 2:1
0 1:-1 2:-1
you can get this model, which has explicit weight and bias:
solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 0
nr_feature 2
bias -1
w
0.4327936
0.4327936