Chance/Probability in GAMS - gams-math

I would appreciate it if anyone could help me with this problem of mine. I have a random variable N that follows a Poisson distribution, and another variable named M that is calculated from other quantities; I know M is being calculated correctly because I solved the example by hand. The problem is the part where GAMS has to calculate the probability for N.
*Number of emergency patients arriving at time period t on day d
Table N(t,d)
        1   2
   1    1   1
   2    1   1
   3    1   1;
file emp / '%emp.info%' /;put emp '* problem %gams.i%'/;
$onput
randvar N(1,1) poisson Ltotal
randvar N(2,1) poisson Ltotal
randvar N(3,1) poisson Ltotal
randvar N(1,2) poisson Ltotal
randvar N(2,2) poisson Ltotal
randvar N(3,2) poisson Ltotal
$offput
put "randvar N(1,1) poisson Ltotal", N('1','1')/;
put "randvar N(2,1) poisson Ltotal", N('2','1')/;
put "randvar N(3,1) poisson Ltotal", N('3','1')/;
put "randvar N(1,2) poisson Ltotal", N('1','2')/;
put "randvar N(2,2) poisson Ltotal", N('2','2')/;
put "randvar N(3,2) poisson Ltotal", N('3','2')/;
file emp1 / '%emp.info%' /;put emp '* problem %gams.i%'/;
$onput
chance (N(t,d)<=M(t,d)) Re(t,d)
$offput

Related

Spline fitting to data how to predict for particular value

After fitting a spline model
fit <- lm(wage ~ bs(age, knots = c(30, 50, 60)), data = Wage)
how do I predict for a particular age?
Try this:
predict(fit, newdata = list(age = 30))
Now you will ask how I know age should be 30.
One word for you - 'Magic'

How are leaves' scores calculated in these XGBoost trees?

I am looking at the image below.
Can someone explain how the leaf scores are calculated?
I thought it was -1 for a No and +1 for a Yes, but then I can't figure out how the little girl gets 0.1, and that doesn't work for tree 2 either.
I agree with @user1808924. I think it's still worth explaining how XGBoost works under the hood, though.
What is the meaning of the leaves' scores?
First, the scores you see in the leaves are not probabilities. They are regression values.
In gradient boosted trees, there are only regression trees. To predict whether a person likes computer games or not, the model (XGBoost) treats it as a regression problem: the labels become 1.0 for Yes and 0.0 for No. XGBoost then fits regression trees, which of course return values like +2, +0.1, -1, and those are what we see at the leaves.
We sum up all the "raw scores" and then convert the sum to a probability by applying the sigmoid function.
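For example, using the boy's raw leaf values from the question's figure (+2 from tree 1 and +0.9 from tree 2, as also noted in the other answer below), a minimal sketch of that conversion:

import math

# Raw leaf scores for the little boy: tree 1 gives +2, tree 2 gives +0.9
raw_score = 2.0 + 0.9

# Sigmoid turns the summed raw score into a probability
probability = 1.0 / (1.0 + math.exp(-raw_score))
print(probability)  # about 0.948 -> the model predicts he likes computer games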
How to calculate the score in leaves ?
The leaf score (w) is calculated by this formula:
w = - sum(g_i) / (sum(h_i) + lambda)
where g_i and h_i are the first derivative (gradient) and the second derivative (hessian) of the loss for each observation falling in that leaf.
For the sake of demonstration, let's pick the leaf of the first tree that has the value -1. Suppose our objective function is mean squared error (mse) and we choose lambda = 0.
With mse, we have g = (y_pred - y_true) and h = 1. (I dropped the constant 2; you can keep it and the result stays the same.) Another note: at the t-th iteration, y_pred is the prediction we have after the (t-1)-th iteration (the best we've got until that point).
Some assumptions:
The girl, grandpa, and grandma do NOT like computer games (y_true = 0 for each of them).
The initial prediction is 1 for all three people, i.e., we guess that everyone loves games. Note that I choose 1 on purpose to reproduce the result of the first tree; in practice the initial prediction can be the mean (default for mean squared error), the median (default for mean absolute error), etc. of the observations' labels.
We calculate g and h for each individual:
g_girl = y_pred - y_true = 1 - 0 = 1. Similarly, we have g_grandpa = g_grandma = 1.
h_girl = h_grandpa = h_grandma = 1
Putting the g, h values into the formula above, we have:
w = -( (g_girl + g_grandpa + g_grandma) / (h_girl + h_grandpa + h_grandma + lambda) ) = -(3 / (3 + 0)) = -1
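The same hand calculation as a small sketch (using the assumptions above: three people with y_true = 0, initial prediction 1, squared-error loss, and lambda = 0):

# Gradients and hessians for the girl, grandpa, and grandma
# (squared-error loss: g = y_pred - y_true = 1 - 0, h = 1)
g = [1.0, 1.0, 1.0]
h = [1.0, 1.0, 1.0]
lam = 0.0

w = -sum(g) / (sum(h) + lam)
print(w)  # -1.0, matching the leaf value in the first tree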
Last note: in practice, the score shown in a leaf when you plot the tree is slightly different: it has already been multiplied by the learning rate, i.e., it is w * learning_rate.
The values of the leaf elements (aka "scores") - +2, +0.1, -1, +0.9 and -0.9 - were learned by the XGBoost algorithm during training. In this case, the XGBoost model was trained on a dataset where little boys (+2) appear somehow "greater" than little girls (+0.1). If you knew what the response variable was, you could probably interpret/rationalize those contributions further. Otherwise, just accept those values as they are.
As for scoring samples, the first addend is produced by tree1 and the second addend by tree2. For little boys (age < 15, is male == Y, and uses computer daily == Y), tree1 yields 2 and tree2 yields 0.9.
Read this
https://towardsdatascience.com/xgboost-mathematics-explained-58262530904a
and then this
https://medium.com/@gabrieltseng/gradient-boosting-and-xgboost-c306c1bcfaf5
and the appendix
https://gabrieltseng.github.io/appendix/2018-02-25-XGB.html

K value vs Accuracy in KNN

I am trying to learn KNN by working on the Breast Cancer dataset provided by the UCI repository. The total size of the dataset is 699, with 9 continuous variables and 1 class variable.
I tested my accuracy on a cross-validation set. For K = 21 and K = 19, accuracy is 95.7%.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

neigh = KNeighborsClassifier(n_neighbors=21)
neigh.fit(X_train, y_train)
y_pred_val = neigh.predict(X_val)
print(accuracy_score(y_val, y_pred_val))
But for K = 1, I am getting accuracy = 97.85%,
and for K = 3, accuracy = 97.14%.
I read here:
Choice of k is very critical – a small value of k means that noise will have a higher influence on the result. A large value makes it computationally expensive and somewhat defeats the basic philosophy behind KNN (that points that are near are likely to have similar densities or classes). A simple approach to selecting k is to set k = n^(1/2).
Which value of K should I choose for my model? Can you elaborate on the logic behind it?
Thanks in advance!
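One common way to pick K empirically is to sweep a range of values with cross-validation and look at the accuracy curve; for n = 699, the n^(1/2) rule of thumb from the quote would land around K = 26. A minimal sketch, reusing the X_train and y_train arrays from the snippet above:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Try odd values of K (odd avoids ties in binary classification)
for k in range(1, 40, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=10)
    print(k, scores.mean())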

how to find weight vector from the libsvm model file?

I have the following model file from LIBSVM:
svm_type c_svc
kernel_type linear
nr_class 2
total_sv 3
rho 0.0666415
label 1 -1
nr_sv 2 1
SV
0.004439511653718091 1:4.5 2:0.5
0.07111595083031433 1:2 2:2
-0.07555546248403242 1:-0.5 2:-2.5
My question is how do I figure out the weight vector from this information?
The weights of the support vectors are the first numbers on each of the support vector lines (the last three lines). Even though you used a linear kernel, libsvm is built for general kernel SVMs, so it does not store a weight vector and bias explicitly.
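That said, for a linear kernel you can reconstruct them by hand: w = sum_i coef_i * SV_i and b = -rho, where coef_i is the first number on each support-vector line. A small sketch using the numbers from the model file above (treating the sparse "1:x 2:y" entries as dense 2-vectors):

# sv_coef values and support vectors copied from the model file above
coefs = [0.004439511653718091, 0.07111595083031433, -0.07555546248403242]
svs   = [[4.5, 0.5], [2.0, 2.0], [-0.5, -2.5]]
rho   = 0.0666415

# w = sum_i coef_i * SV_i,  b = -rho
w = [sum(c * sv[j] for c, sv in zip(coefs, svs)) for j in range(2)]
b = -rho
print(w, b)  # roughly [0.20, 0.33] and -0.0666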
If you know you want a linear kernel, and you want that information, you can use liblinear (from the same folks as libsvm). Given this trivial data:
1 1:1 2:1
0 1:-1 2:-1
you can get this model, which has explicit weight and bias:
solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 0
nr_feature 2
bias -1
w
0.4327936
0.4327936
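For completeness, a tiny sketch of how you would use those explicit weights (w and the label order are read straight from the model file above; bias -1 means no bias term was trained):

# Weights from the liblinear model file above
w = [0.4327936, 0.4327936]
x = [1.0, 1.0]  # the first point from the trivial training data

decision = sum(wi * xi for wi, xi in zip(w, x))
print(decision)  # about 0.866 > 0, so the predicted label is 1 (the first label in "label 1 0")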

Poisson cumulative distribution in GAMS

Can anyone help me with this? I defined a parameter N and then defined the chance of N being no greater than M (a variable), but it's not working. Could you please take a look at it?
*Number of emergency patients arriving at time period t on day d
Parameter N(t,d);
file emp / '%emp.info%' /;put emp '* problem %gams.i%'/;
$onput
randvar N(t,d) poisson Ltotal
chance (N(t,d)<=M(t,d)) Re(t,d)
$offput
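As a side note, the quantity the chance constraint encodes, P(N <= M) with N ~ Poisson(Ltotal), can be sanity-checked outside GAMS. A minimal sketch with made-up values for Ltotal and M (placeholders for illustration only, not values from your model):

from scipy.stats import poisson

# Illustrative placeholder values for the Poisson mean and the threshold
Ltotal = 3.0
M = 5
print(poisson.cdf(M, Ltotal))  # P(N <= M), about 0.916 for these values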