How to find the weight vector from a libsvm model file? - libsvm

I have the following model file from LIBSVM:
svm_type c_svc
kernel_type linear
nr_class 2
total_sv 3
rho 0.0666415
label 1 -1
nr_sv 2 1
SV
0.004439511653718091 1:4.5 2:0.5
0.07111595083031433 1:2 2:2
-0.07555546248403242 1:-0.5 2:-2.5
My question is how do I figure out the weight vector from this information?

The first number on each support-vector line (the last three lines of the file) is the coefficient of that support vector. Even though you used a linear kernel, libsvm is built for general kernel SVMs, so it does not store a weight vector and bias explicitly.
If you know you want a linear kernel, and you want that information, you can use liblinear (from the same folks as libsvm). Given this trivial data:
1 1:1 2:1
0 1:-1 2:-1
you can get this model, which has explicit weight and bias:
solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 0
nr_feature 2
bias -1
w
0.4327936
0.4327936
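For the libsvm model in the question, the weight vector can also be recovered by hand. The sketch below assumes the usual linear-kernel convention: w is the sum of each support vector scaled by its coefficient, and the bias is -rho. It also assumes the first label listed in the model file (1 here) is the positive class; if the first label were -1, the signs of w and b would flip. The names sv_coef, support_vectors and decision_value are illustrative only.

```python
import numpy as np

# Values copied from the libsvm model file in the question.
sv_coef = np.array([0.004439511653718091,
                    0.07111595083031433,
                    -0.07555546248403242])
support_vectors = np.array([[4.5, 0.5],
                            [2.0, 2.0],
                            [-0.5, -2.5]])
rho = 0.0666415

# Linear kernel: w = sum_i coef_i * sv_i, b = -rho (assumed sign convention).
w = sv_coef @ support_vectors
b = -rho

def decision_value(x):
    """Signed score w.x + b; positive means the first label in the file."""
    return float(np.dot(w, x) + b)

print("w =", w, "b =", b)
```

As a sanity check, the support vectors should land close to +1 or -1 under decision_value, since they sit on the margin.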

Related

Tensorflow classification label 0 and 1

I'm working on a simple classification problem. I followed the example and created a model.
I arranged the label column as shown below.
label 0 1 1 0 0 1
I then wanted to test the model with samples, but it outputs a percentage.
I expect it to output one of two values, either 0 or 1.
Example code:
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = reloaded_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
print(
"This particular pet had a %.1f percent probability "
"of getting adopted." % (100 * prob)
)
What code will output 0 or 1?
Thank you.
What to do depends on how your model was constructed. With only two labels you are doing binary classification. If the last dense layer in your model has 1 neuron, it is set up for binary classification. In that case the loss function in model.compile should be
loss=BinaryCrossentropy
Model.predict in that case will produce a single probability value (if the last layer has no sigmoid activation, as in your snippet, apply tf.nn.sigmoid to the output first). You can just use an if statement to determine the class: if the probability is less than 0.5 it is one class, and if it is greater than or equal to 0.5 it is the other class.
Alternatively, you may have constructed your model so that the last dense layer has 2 neurons. In that case you should use sparse_categorical_crossentropy as the loss function if the labels are integers, or categorical_crossentropy if the labels are one-hot encoded. Model.predict in this case will produce two probabilities as the output. You want to select the index with the highest probability as the class.
You can do that with predicted_class = np.argmax(predictions), or np.argmax(predictions, axis=-1) for a batch of samples (class itself is a reserved word in Python, so it cannot be used as a variable name).
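Both cases can be sketched in a few lines. This is a minimal NumPy illustration rather than the asker's actual model: it assumes the single-neuron model returns raw logits (as in the question, where the sigmoid is applied after predict), and the helper names logits_to_labels and probs_to_labels are made up for the example.

```python
import numpy as np

def logits_to_labels(logits, threshold=0.5):
    """Single-neuron case: sigmoid the logits, then threshold to 0 or 1."""
    logits = np.asarray(logits, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs >= threshold).astype(int)

def probs_to_labels(probs):
    """Two-neuron case: pick the index of the largest probability per row."""
    return np.argmax(np.asarray(probs), axis=-1)

# Two samples, one output neuron each (logits are hypothetical values).
print(logits_to_labels([[-1.2], [0.3]]).ravel())

# Two samples, two output neurons each (probabilities are hypothetical values).
print(probs_to_labels([[0.8, 0.2], [0.1, 0.9]]))
```

With a real Keras model, the array passed in would be the output of reloaded_model.predict(input_dict).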

How to handle log(0) when using cross entropy

To keep the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY)) #cross entropy
cost = -np.sum(loss)/m #num of examples in batch is m
Probability of Y
predY is computed with a sigmoid; the logits can be thought of as the output of a neural network before the classification step.
def sigmoid(X):
    return 1/(1 + np.exp(-X))
predY = sigmoid(logits) #binary case
Problem
Suppose we are running a feed-forward net.
Inputs: [3, 5]: 3 is number of examples and 5 is feature size (fabricated data)
Num of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
Such an arrangement is bound to overfit. Once it overfits, the network predicts the training examples perfectly; in other words, the sigmoid outputs exactly 1 or 0, because the exponential saturates in floating point. In that case np.log(0) is undefined. How do you usually handle this issue?
If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression
np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
with
xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:
xlogy(Y, predY) + xlog1py(1 - Y, -predY)
Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:
Yis1 = Y == 1
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
How do you usually handle this issue?
Add a small number (something like 1e-15) to predY. It barely changes the predictions, and it solves the log(0) issue.
BTW, if your algorithm outputs exact zeros and ones, it might be useful to check the histogram of the returned probabilities. When the algorithm is that sure about everything, it can be a sign of overfitting.
One common way to deal with log(x) and y / x where x is always non-negative but can become 0 is to add a small constant (as written by Jakub).
You can also clip the value (e.g. tf.clip_by_value or np.clip).
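A minimal sketch of the clipping approach with np.clip follows. The function name binary_cross_entropy and the default eps of 1e-15 are choices made for this example (the eps value echoes the constant suggested above). Clipping on both sides guards log(predY) and log(1 - predY) at the same time.

```python
import numpy as np

def binary_cross_entropy(Y, predY, eps=1e-15):
    """Mean binary cross-entropy with predictions clipped away from 0 and 1."""
    Y = np.asarray(Y, dtype=float)
    # Keep every probability inside [eps, 1 - eps] so both logs stay finite.
    predY = np.clip(np.asarray(predY, dtype=float), eps, 1 - eps)
    loss = Y * np.log(predY) + (1 - Y) * np.log(1 - predY)
    m = Y.shape[0]  # number of examples in the batch
    return -np.sum(loss) / m

Y = np.array([1.0, 0.0, 1.0])

# Saturated but correct predictions: the cost is tiny instead of nan.
cost_perfect = binary_cross_entropy(Y, np.array([1.0, 0.0, 1.0]))

# Saturated and wrong on the last example: the cost is large but finite.
cost_wrong = binary_cross_entropy(Y, np.array([1.0, 0.0, 0.0]))

print(cost_perfect, cost_wrong)
```

The price of clipping is that the loss is capped at roughly -log(eps) per example, so a confidently wrong prediction is penalized heavily but never infinitely.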

Tensorflow, how to compute broadcast product of 2 gradients?

I'm trying to compute a matrix H as follows:
where L is a tensor of shape (?, N) and z is a variable vector of shape (M).
Each element of H is a broadcast product of two gradients of L with respect to two elements of the vector z.
tf.gradients(L, z[i]) * tf.gradients(L, z[j]) does not work because it returns a product of two sums, while I need a sum of element-wise products. If anyone has done this before, please help.

How are the leaves' scores calculated in these XGBoost trees?

I am looking at the below image.
Can someone explain how they are calculated?
I thought it was -1 for a No and +1 for a Yes, but then I can't figure out how the little girl gets 0.1. And that doesn't work for tree 2 either.
I agree with @user1808924. I think it's still worth explaining how XGBoost works under the hood, though.
What is the meaning of the leaves' scores?
First, the scores you see in the leaves are not probabilities. They are regression values.
In gradient boosted trees, there are only regression trees. To predict whether a person likes computer games or not, the model (XGBoost) treats it as a regression problem. The labels here become 1.0 for Yes and 0.0 for No. Then XGBoost fits regression trees during training. The trees will of course return values such as +2, +0.1, -1, which are what we see at the leaves.
We sum up all the "raw scores" and then convert them to probabilities by applying the sigmoid function.
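As a small illustration of that last step, the sketch below sums the two leaf scores that the little boy receives in the figure (+2 from the first tree and +0.9 from the second) and passes the total through a sigmoid. The function name raw_score_to_probability is made up for the example, and it assumes a logistic objective with no extra base score added to the sum.

```python
import math

def raw_score_to_probability(leaf_scores):
    """Sum the per-tree leaf scores, then squash the total with a sigmoid."""
    raw = sum(leaf_scores)
    return 1.0 / (1.0 + math.exp(-raw))

# Leaf scores for the little boy: +2 from tree 1, +0.9 from tree 2.
boy_probability = raw_score_to_probability([2.0, 0.9])
print(round(boy_probability, 4))
```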
How to calculate the score in the leaves?
The leaf score (w) is calculated by this formula:
w = - (sum(gi) / (sum(hi) + lambda))
where g and h are the first derivative (gradient) and the second derivative (hessian).
For the sake of demonstration, let's pick the leaf with value -1 in the first tree. Suppose our objective function is mean squared error (MSE) and we choose lambda = 0.
With MSE, we have g = (y_pred - y_true) and h = 1. I dropped the constant 2; you can keep it and, with lambda = 0, the result stays the same. Another note: at the t-th iteration, y_pred is the prediction we have after the (t-1)-th iteration (the best we have so far).
Some assumptions:
The girl, grandpa, and grandma do NOT like computer games (y_true = 0 for each person).
The initial prediction is 1 for all 3 people (i.e., we guess that everyone loves games). Note that I chose 1 on purpose to get the same result as the first tree. In fact, the initial prediction can be the mean (default for mean squared error), the median (default for mean absolute error), ... of the labels of all the observations in the leaf.
We calculate g and h for each individual:
g_girl = y_pred - y_true = 1 - 0 = 1. Similarly, we have g_grandpa = g_grandma = 1.
h_girl = h_grandpa = h_grandma = 1
Putting the g, h values into the formula above, we have:
w = -( (g_girl + g_grandpa + g_grandma) / (h_girl + h_grandpa + h_grandma) ) = -1
Last note: in practice, the leaf score we see when plotting the tree is a bit different. It is multiplied by the learning rate, i.e., w * learning_rate.
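The worked example above can be reproduced in a few lines. This is only a sketch of the formula w = -sum(g) / (sum(h) + lambda) under the same assumptions (MSE with the constant 2 dropped, lambda = 0, initial prediction 1). The function name leaf_weight and the learning rate of 0.3 are chosen for illustration.

```python
def leaf_weight(gradients, hessians, reg_lambda=0.0):
    """Leaf score: w = -sum(g) / (sum(h) + lambda)."""
    return -sum(gradients) / (sum(hessians) + reg_lambda)

# Girl, grandpa and grandma: none of them likes computer games.
y_true = [0.0, 0.0, 0.0]
# Initial prediction of 1 for everyone, as assumed in the text.
y_pred = [1.0, 1.0, 1.0]

# MSE with the constant 2 dropped: g = y_pred - y_true, h = 1.
g = [p - t for p, t in zip(y_pred, y_true)]
h = [1.0] * len(g)

w = leaf_weight(g, h, reg_lambda=0.0)
print(w)  # -1.0

# What a plotted tree would show with an illustrative learning rate of 0.3.
learning_rate = 0.3
shown_score = w * learning_rate
```

Raising reg_lambda above 0 shrinks the leaf score toward zero, which is how the regularization term acts in this formula.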
The values of leaf elements (aka "scores") - +2, +0.1, -1, +0.9 and -0.9 - were devised by the XGBoost algorithm during training. In this case, the XGBoost model was trained using a dataset where little boys (+2) appear somehow "greater" than little girls (+0.1). If you knew what the response variable was, then you could probably interpret/rationalize those contributions further. Otherwise, just accept those values as they are.
As for scoring samples, the first addend is produced by tree1 and the second addend by tree2. For little boys (age < 15, is male == Y, and use computer daily == Y), tree1 yields 2 and tree2 yields 0.9.
Read this
https://towardsdatascience.com/xgboost-mathematics-explained-58262530904a
and then this
https://medium.com/@gabrieltseng/gradient-boosting-and-xgboost-c306c1bcfaf5
and the appendix
https://gabrieltseng.github.io/appendix/2018-02-25-XGB.html

How to do pairwise addition in tensorflow

I am new to TensorFlow, so this might be an easy question, but I am really stuck.
I am trying to implement this paper in Keras with the TensorFlow backend.
In the first stage of training, the author uses softmax_pair.
Suppose we get this output from the last fc layer.
The vertical axis is the batch dimension, whose size is None.
x11 x12 x13 x14...
x21 x22 x23 x24...
x31 x32 x33 x34...
...
and we do exponential, so we have
e11 e12 e13 e14...
e21 e22 e23 e24...
e31 e32 e33 e34...
...
and then, I am stuck here
e11/(e11+e12) e12/(e11+e12) e13/(e13+e14) e14/(e13+e14)...
e21/(e21+e22) e22/(e21+e22) e23/(e23+e24) e24/(e23+e24)...
e31/(e31+e32) e32/(e31+e32) e33/(e33+e34) e34/(e33+e34)...
...
I don't know how to do the pairwise addition.
tf.transpose and tf.segment_sum might work,
but after some research I found that transpose is expensive.
Furthermore, after tf.segment_sum the tensor is only half the size,
and I don't know how to double it.
I am also wondering how to produce the segment_ids.
So how can I do this calculation?
Thanks!!
----------update
The part of the paper I am talking about is Fig. 3.
The fc outputs are P2c-1 and P2c, which represent the probability of class c appearing or not appearing in the image,
c = 1, 2, 3, ..., number of classes.
Is transpose not expensive? I sometimes see it described that way, e.g. in the comment; perhaps I misunderstood?
The TensorFlow docs for tf.transpose state that, unlike NumPy, TensorFlow returns a new tensor, which costs memory.
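Assuming X is your tensor of size R x C (with C even and known statically):
C = X.get_shape().as_list()[1]  # number of columns
X_split = tf.split(X, C // 2, axis=1)  # list of C/2 sub-tensors, two columns each
Y_split = [tf.nn.softmax(pair) for pair in X_split]  # ordinary softmax within each pair
Y = tf.concat(Y_split, axis=1)  # join the pairs back into an R x C tensor
C is the number of columns, X_split is a list of sub-tensors with two columns each, Y_split holds the regular softmax of each sub-tensor, and Y joins the softmax results. The argument order shown is the one used since TensorFlow 1.0; older releases took the axis as the first argument.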
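A different way to get the same result, without building a list of slices, is to reshape the columns into pairs and take the softmax over the last axis. This is an alternative technique, not the split/concat recipe above. It is sketched here in NumPy so it can be run without a TensorFlow install; the same pattern should carry over with tf.reshape and tf.nn.softmax. The function name pairwise_softmax is made up for the example, and an even number of columns is assumed.

```python
import numpy as np

def pairwise_softmax(x):
    """Softmax over consecutive column pairs (0,1), (2,3), ... of a 2-D array."""
    x = np.asarray(x, dtype=float)
    rows, cols = x.shape
    if cols % 2 != 0:
        raise ValueError("number of columns must be even")
    # View each row as cols/2 pairs: shape (rows, cols/2, 2).
    pairs = x.reshape(rows, cols // 2, 2)
    # Subtract the pair-wise max for numerical stability (result is unchanged).
    pairs = pairs - pairs.max(axis=-1, keepdims=True)
    e = np.exp(pairs)
    out = e / e.sum(axis=-1, keepdims=True)
    # Restore the original (rows, cols) layout.
    return out.reshape(rows, cols)

x = np.array([[1.0, 2.0, 3.0, 5.0],
              [0.0, 0.0, -1.0, 1.0]])
y = pairwise_softmax(x)
print(y)
```

Each adjacent pair in a row of y sums to 1, matching the e11/(e11+e12), e12/(e11+e12) layout in the question, and no transpose or segment_ids are needed.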