Form of cubic spline hazard function for Cox Regression - spline

I am trying to understand the form of the piecewise hazard function in piecewise Cox regression. My Python code is below, and I want to find the formula that corresponds to its output. What is the formula for the hazard function this model fits? Below is what I thought it might be, but I still need help: where does log lambda go, and where do the betas enter?
What I tried as a form for just one variable:
h(t) = h0(t) * exp(beta1x1t) for t < t1
h(t) = h0(t) * exp(beta1x1t + beta2*(t-t1)) for t1 <= t < t2
h(t) = h0(t) * exp(beta1x1t + beta2*(t-t1) + beta3*(t-t2)) for t2 <= t
Code:
from lifelines import CoxPHFitter
cph_piecewise = CoxPHFitter(penalizer=0.4, baseline_estimation_method="piecewise",
                            breakpoints=[20, 30])
cph_piecewise.fit(df_new, duration_col='tenure', event_col='Churn')
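One possible reading of the model, sketched under an assumption rather than taken from the lifelines source: if the "piecewise" baseline is piecewise-constant, then h(t | x) = h0(t) * exp(beta1*x1 + ...), where h0(t) = exp(log_lambda_k) on the k-th interval between breakpoints, and the betas multiply the covariates, not t. The function name, the parameter values and the boundary convention (t equal to a breakpoint belongs to the earlier interval) are all hypothetical.

```python
import math
from bisect import bisect_left

def piecewise_hazard(t, x, beta, log_lambdas, breakpoints):
    """Hazard at time t for covariates x, assuming a piecewise-constant baseline."""
    # Index of the interval containing t (assumed convention: t equal to a
    # breakpoint falls in the earlier interval).
    k = bisect_left(breakpoints, t)
    baseline = math.exp(log_lambdas[k])  # h0(t) on interval k
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline * math.exp(linear_predictor)

# Hypothetical values, only to show the shape of the formula.
breakpoints = [20, 30]
log_lambdas = [-3.0, -2.0, -1.0]  # one per interval: t<=20, 20<t<=30, t>30
beta = [0.5]
x = [2.0]
for t in (10, 25, 40):
    print(t, piecewise_hazard(t, x, beta, log_lambdas, breakpoints))
```

Under this reading there is one log lambda per interval and a single set of betas shared by all intervals.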


Calculate intercepting vector?

I am trying to calculate an intercepting vector based on the velocity, location and time of two objects.
I found a post covering my problem but was left with some technical questions that I could not ask there because my reputation is below 50.
Calculating Intercepting Vector
The answer marked as best goes over the process of solving my problem; however, when I tried the calculation myself, I could not understand how the position and velocity vectors are converted to a real number.
Using the data provided here for the positions and speeds of the target and the interceptor, the solving equation is the following:
Plugging in the numbers, the coefficients of the quadratic equation in t are:
s_t = [120, 40]; v_t = [5,2]; s_i = [80, 80]; v_i = 10;
a = dot(v_t, v_t)-10^2
b = 2*dot((s_t - s_i),v_t)
c = dot(s_t - s_i, s_t - s_i)
Solving for t yields:
delta = sqrt(b^2 - 4*a*c)
t1 = (-b + delta)/(2*a)
t2 = (-b - delta)/(2*a)
With the data at hand, t1 turns out to be negative, and can be discarded.
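The step from vectors to real numbers is the dot product: each of a, b and c is a scalar. A minimal sketch of the calculation, assuming the interceptor moves in a straight line at constant speed, so that |s_t - s_i + v_t*t| = v_i*t, which squares to a*t^2 + b*t + c = 0. The helper names dot and intercept are made up for this example.

```python
import math

def dot(u, v):
    """Dot product: maps two vectors to a single real number."""
    return sum(a * b for a, b in zip(u, v))

def intercept(s_t, v_t, s_i, speed_i):
    """Return (t, v_i) for the earliest positive intercept, or None if there is none."""
    d = [p - q for p, q in zip(s_t, s_i)]  # target position relative to interceptor
    a = dot(v_t, v_t) - speed_i ** 2
    b = 2 * dot(d, v_t)
    c = dot(d, d)
    if a == 0:
        # Equal speeds: the quadratic degenerates to a linear equation.
        roots = [-c / b] if b != 0 else []
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None
        root = math.sqrt(disc)
        roots = [(-b + root) / (2 * a), (-b - root) / (2 * a)]
    times = [t for t in roots if t > 0]
    if not times:
        return None
    t = min(times)
    # Velocity the interceptor needs in order to reach the meeting point at time t.
    v_i = [(dk + vk * t) / t for dk, vk in zip(d, v_t)]
    return t, v_i

t, v_i = intercept([120, 40], [5, 2], [80, 80], 10)
print(t, v_i, math.hypot(*v_i))
```

With the numbers above, the positive root is about 8.61 and the resulting velocity vector has length 10, the interceptor's speed.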

Implementing the Square Non-linearity (SQNL) activation function in Keras

I have been trying to implement the square non-linearity (SQNL) activation function as a custom activation function for a Keras model. It's the 10th function on this list: https://en.wikipedia.org/wiki/Activation_function.
I tried using the Keras backend but got nowhere with the multiple if/else statements I require, so I also tried the following:
import tensorflow as tf
def square_nonlin(x):
    orig = x
    x = tf.where(orig >2.0, (tf.ones_like(x)) , x)
    x = tf.where(0.0 <= orig <=2.0, (x - tf.math.square(x)/4), x)
    x = tf.where(-2.0 <= orig < 0, (x + tf.math.square(x)/4), x)
    return tf.where(orig < -2.0, -1, x)
As you can see, there are 4 different clauses I need to evaluate. But when I try to compile the Keras model I still get the error:
Using a `tf.Tensor` as a Python `bool` is not allowed
Could anyone help me to get this working in Keras? Thanks a lot.
I started digging into TensorFlow just a week ago and am actively playing around with different activation functions. I think I know what two of your problems are. In your second and third assignments you have compound conditionals, which need to be wrapped in tf.logical_and. The other problem is that the last tf.where, on the return line, returns -1, which is a scalar rather than the tensor that TensorFlow expects. I haven't tried the function with Keras, but in my "activation function" tester this code works.
def square_nonlin(x):
    orig = x
    x = tf.where(orig >2.0, (tf.ones_like(x)) , x)
    x = tf.where(tf.logical_and(0.0 <= orig, orig <=2.0), (x - tf.math.square(x)/4.), x)
    x = tf.where(tf.logical_and(-2.0 <= orig, orig < 0), (x + tf.math.square(x)/4.), x)
    return tf.where(orig < -2.0, 0*x-1.0, x)
As I said, I'm new at this, so to "vectorize" -1 I multiplied the x vector by 0 and subtracted 1, which produces an array filled with -1 of the right shape. Perhaps one of the more seasoned TensorFlow practitioners can suggest the proper way to do that.
Hope this helps.
BTW, the > operator on a tensor is overloaded (via __gt__) to call tf.greater, which means that orig > 2.0 expands under the covers in Python to tf.greater(orig, 2.0).
Just a follow up. I tried it with the MNIST demo in Keras and the activation function works as coded above.
UPDATE:
The less hacky way to "vectorize" -1 is to use the tf.ones_like function
so replace the last line with
return tf.where(orig < -2.0, -tf.ones_like(x), x)
for a cleaner solution
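As a cross-check on the four clauses, here is a plain-Python scalar sketch (not TensorFlow code) comparing the piecewise definition with a branch-free form: clip x to [-2, 2], then subtract sign(x)*x^2/4. The function names are made up for this example. The same idea should translate to tensors with tf.clip_by_value, tf.sign and tf.square, though I have not run that version.

```python
def sqnl_piecewise(x):
    """SQNL written as the four clauses from the definition."""
    if x > 2.0:
        return 1.0
    if x >= 0.0:
        return x - x * x / 4.0
    if x >= -2.0:
        return x + x * x / 4.0
    return -1.0

def sqnl_clipped(x):
    """Branch-free form: clip to [-2, 2], then apply c - sign(c) * c^2 / 4."""
    c = max(-2.0, min(2.0, x))
    sign = (c > 0) - (c < 0)
    return c - sign * c * c / 4.0

for value in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(value, sqnl_piecewise(value), sqnl_clipped(value))
```

Both forms agree on these inputs, including the boundaries at -2, 0 and 2.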

Matlab: how do I run the optimization (fmincon) repeately?

I am trying to follow the tutorial on using the Optimization Toolbox in MATLAB. Specifically, I have a function
f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1)+b
subject to the constraint:
(x(1))^2+x(2)-1=0,
-x(1)*x(2)-10<=0.
and I want to minimize this function for a range of b=[0,20]. (That is, I want to minimize this function for b=0, b=1, b=2 ... and so on.)
Below are the steps taken from MATLAB's tutorial webpage (http://www.mathworks.com/help/optim/ug/nonlinear-equality-and-inequality-constraints.html). How should I change the code so that the optimization runs once for each value of b and saves the optimal values for each b?
Step 1: Write a file objfun.m.
function f = objfun(x)
f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1)+b;
Step 2: Write a file confuneq.m for the nonlinear constraints.
function [c, ceq] = confuneq(x)
% Nonlinear inequality constraints
c = -x(1)*x(2) - 10;
% Nonlinear equality constraints
ceq = x(1)^2 + x(2) - 1;
Step 3: Invoke constrained optimization routine.
x0 = [-1,1]; % Make a starting guess at the solution
options = optimoptions(@fmincon,'Algorithm','sqp');
[x,fval] = fmincon(@objfun,x0,[],[],[],[],[],[],...
    @confuneq,options);
After 21 function evaluations, the solution produced is
x, fval
x =
-0.7529 0.4332
fval =
1.5093
Update:
I tried your answer, but I am encountering a problem with your step 2. Basically, I just filled my step 3 into your step 2 (below the comment "optimization just like before").
%initialize list of targets
b = 0:1:20;
%preallocate/initialize result vectors using zeros (increases speed)
opt_x = zeros(length(b));
opt_fval = zeros(length(b));
>> for idx = 1, length(b)
    objfun = @(x)objfun_builder(x,b)
    %optimization just like before
    x0 = [-1,1]; % Make a starting guess at the solution
    options = optimoptions(@fmincon,'Algorithm','sqp');
    [x,fval] = fmincon(@objfun,x0,[],[],[],[],[],[],...
        @confuneq,options);
    %end the stuff I fill in
    opt_x(idx) = x
    opt_fval(idx) = fval
end
However, the output it gave me is:
Error: "objfun" was previously used as a variable, conflicting
with its use here as the name of a function or command.
See "How MATLAB Recognizes Command Syntax" in the MATLAB
documentation for details.
There are two things you need to change about your code:
Creation of the objective function.
Multiple optimizations using a loop.
1st Step
For more flexibility with regard to b, you need to set up another function that takes b as an additional argument, e.g.
function f = objfun_builder(x, b)
    f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1) + b;
end
A more elegant and shorter approach is an anonymous function, e.g.
objfun_builder = @(x,b)(exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1) + b);
In the end, this works out to be the same as above. It might be less intuitive for a MATLAB beginner, though.
2nd Step
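Instead of placing an .m-file objfun.m in your path, you will need to call
objfun = @(x)(objfun_builder(x,myB));
to create an objective function in your workspace, where myB is the value of b to use. Since objfun is then a variable holding a function handle, pass it to fmincon as objfun, not @objfun. In order to loop over the interval b=[0,20], use the following loop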
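%initialize list of targets
b = 0:1:20;
%preallocate/initialize result arrays using zeros (increases speed)
opt_x = zeros(length(b), 2);    % one row [x(1) x(2)] per b
opt_fval = zeros(length(b), 1);
x0 = [-1,1]; % Make a starting guess at the solution
options = optimoptions(@fmincon,'Algorithm','sqp');
%start optimization of list of targets (`b`s)
for idx = 1:length(b)
    objfun = @(x)objfun_builder(x, b(idx));
    %optimization just like before, but passing the handle objfun directly
    [x,fval] = fmincon(objfun,x0,[],[],[],[],[],[],...
        @confuneq,options);
    opt_x(idx,:) = x;
    opt_fval(idx) = fval;
end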
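The closure pattern in this answer, one objective per value of b, can be sketched in plain Python. No optimizer is run here: the sketch only evaluates each objective at the solution the question reports for b = 0. The names make_objective and x_star are made up for this example. Because b is an additive constant, it does not move the minimizer, so each run should return the same x with fval shifted by b.

```python
import math

def make_objective(b):
    """Return an objective f(x) with the constant b baked in."""
    def objective(x):
        return math.exp(x[0]) * (4 * x[0] ** 2 + 2 * x[1] ** 2
                                 + 4 * x[0] * x[1] + 2 * x[1] + 1) + b
    return objective

x_star = (-0.7529, 0.4332)  # solution reported in the question for b = 0

results = {}
for b in range(0, 21):
    objective = make_objective(b)   # a fresh function for this b
    results[b] = objective(x_star)  # stand-in for the optimizer call

print(results[0], results[20])
```

The first value reproduces the reported fval of about 1.5093, and the second is exactly 20 larger.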

Looping and if statements in SPSS

I'm new to SPSS and I'm a bit stuck on a problem. I have about 200 variables and I want to loop through pairs of them looking for variables with correlation coefficients above 0.7. I know that I can use CORRELATIONS to get a matrix of coefficients but it would be huge and difficult to look through. Basically, in pseudocode, what I want to do is:
for (i = W1_1 to W1_200) {
for (j = i to W1_200) {
if CORRELATIONS(i,j)>0.7 {
print i, j, CORRELATIONS(i,j)
}
}
}
I can't for the life of me work out how to do any of this in SPSS. Help!
SPSS has a helper function on the CORRELATIONS command to export the correlation matrix. From there you can manipulate the data to give the correlation pairs that meet your criteria. So first, let's make some fake data to illustrate.
*Making fake data.
set seed 5.
input program.
  loop i = 1 to 100.
    end case.
  end loop.
  end file.
end input program.
dataset name test.
compute #base = RV.NORMAL(0,1).
vector X(20).
loop #i = 1 to 20.
  compute X(#i) = #base*(#i/20) + RV.NORMAL(0,1).
end loop.
exe.
Now, we can run the CORRELATIONS command and export the table to a new dataset (which I named here Corrs).
DATASET DECLARE Corrs.
CORRELATIONS
  /VARIABLES=X1 to X20
  /MATRIX=OUT('Corrs').
Unfortunately SPSS returns the full matrix (plus other info on the sample size). We can select only the rows we are interested in (the ones with "CORR" in the ROWTYPE_ column) and then use a DO REPEAT to set the upper half of the matrix, including the diagonal, to system missing values.
DATASET ACTIVATE Corrs.
SELECT IF ROWTYPE_ = "CORR".
*Now only making lower half of matrix.
COMPUTE #iter = 0.
DO REPEAT X = X1 TO X20.
  COMPUTE #iter = #iter + 1.
  IF #iter > ($casenum-1) X = $SYSMIS.
END REPEAT.
I set them to system missing values because in the next part I reshape the data using VARSTOCASES. This drops missing values by default, so we won't end up with redundant correlation pairs.
VARSTOCASES
  /MAKE Corr FROM X1 TO X20
  /INDEX X2 (Corr)
  /DROP ROWTYPE_.
RENAME VARIABLES (VARNAME_ = X1).
Now you have your correlation pairs list and can just select out the correlations that meet your criteria.
SELECT IF ABS(Corr) >= .5.
Building the list of correlation pairs can be wrapped in a MACRO pretty easily. Below is that macro, recreating the exact steps used here.
DEFINE !CorrPairs (!POSITIONAL !CMDEND)
DATASET DECLARE Corrs.
CORRELATIONS
  /VARIABLES=!1
  /MATRIX=OUT('Corrs').
DATASET ACTIVATE Corrs.
SELECT IF ROWTYPE_ = "CORR".
COMPUTE #iter = 0.
DO REPEAT X = !1.
  COMPUTE #iter = #iter + 1.
  IF #iter > ($casenum-1) X = $SYSMIS.
END REPEAT.
VARSTOCASES
  /MAKE Corr FROM !1
  /INDEX X2 (Corr)
  /DROP ROWTYPE_.
RENAME VARIABLES (VARNAME_ = X1).
!ENDDEFINE.
The macro just takes a list of variables (in the active dataset) to correlate, and returns a second dataset named Corrs with the correlation pairs, the variable names being in the X1 and X2 columns. Once the macro is defined, the steps above can be recreated simply with the commands below.
!CorrPairs X1 to X20.
SELECT IF ABS(Corr) >= .5.
EXECUTE.
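For readers who want to check the logic outside SPSS, the same pipeline (each unordered pair once, then filter by threshold) can be sketched in plain Python. The helper names and the toy data are made up for this example; the 0.7 cutoff is the one from the question.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(columns, threshold=0.7):
    """List (name_i, name_j, r) for each unordered pair with |r| >= threshold."""
    pairs = []
    # combinations() visits each pair once, like keeping one half of the matrix.
    for (name_i, col_i), (name_j, col_j) in combinations(columns.items(), 2):
        r = pearson(col_i, col_j)
        if abs(r) >= threshold:
            pairs.append((name_i, name_j, r))
    return pairs

# Toy data: A and B are nearly collinear, C is only weakly related to either.
columns = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10.5],
    "C": [5, 1, 4, 2, 3],
}
print(correlated_pairs(columns))
```

Only the (A, B) pair passes the cutoff for this toy data.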
My suggestion is to use OMS to extract your correlation values from the output into a datafile. Use a macro to only run the correlations you need:
DATASET DECLARE Correlations.
OMS /SELECT TABLES /IF COMMANDS=['Correlations'] SUBTYPES=['Correlations']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='Correlations' VIEWER=YES.
define runCorrs ()
!do !i1=1 !to 200
  !do !i2=!i1 !to 200
    !if (!i2<>!i1) !then
      corr !concat("W_",!i1) with !concat("W_",!i2).
    !ifend
  !doend
!doend
!enddefine.
runCorrs.
OMSEND.
dataset activate Correlations.
select if var2="Pearson Correlation".
VARSTOCASES /make crlVal from W_2 to W_200/index=withvar(crlVal)
  /drop TableNumber_ Command_ Subtype_ Label_ Var2.
Now you have a nice list of all the correlations to work with:
select if crlVal>0.7.
exe.

Normal Distribution Best Approach

I'm trying to build a simple program to price call options using the Black-Scholes formula http://en.wikipedia.org/wiki/Black%E2%80%93Scholes. I'm trying to figure out the best way to get probabilities from a normal distribution. For example, if I were to do this by hand and got the value d1=0.43, then I'd look up 0.43 in this table http://www.math.unb.ca/~knight/utility/NormTble.htm and get the value 0.6664.
I believe that there are no functions in C or Objective-C to find the normal distribution. I'm also thinking about creating a 2-dimensional array and looping through until I find the desired value. Or maybe I can define 300 doubles with the corresponding values and loop through those until I get the appropriate result. Any thoughts on the best approach?
You need to define what it is you are looking for more clearly. Based on what you posted, it appears you are looking for the cumulative distribution function, or P(d < d1), where d1 is measured in standard deviations and d is a normally distributed variable: by your example, if d1 = 0.43 then P(d < d1) = 0.6664.
The function you want is called the error function erf(x) and there are some good approximations for it.
Apparently erf(x) is part of the standard math.h in C (not sure about Objective-C, but I assume it is available there as well).
But erf(x) is not exactly the function you need. The general form P(d < d1) can be calculated from erf(x) with the following formula:
P(d<d1) = f(d1,sigma) = (erf(d1/sigma/sqrt(2))+1)/2
where sigma is the standard deviation (in your case you can use sigma = 1).
You can test this on Wolfram Alpha for example: f(0.43,1) = (erf(0.43/sqrt(2))+1)/2 = 0.666402 which matches your table.
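The same check can be run in a few lines of Python, whose standard math module provides erf. This is only a sketch of the formula above; the function name norm_cdf is made up for this example.

```python
import math

def norm_cdf(d1, sigma=1.0):
    """P(d < d1) for a zero-mean normal variable with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(d1 / (sigma * math.sqrt(2.0))))

print(round(norm_cdf(0.43), 4))  # 0.6664, matching the table lookup
```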
There are two other things that are important:
If you are looking for P(d < d1) where d1 is large (greater in absolute value than about 3.0 * sigma) then you should really be using the complementary error function erfc(x) = 1-erf(x), which tells you how close P(d < d1) is to 0 or 1 without running into numerical errors. For d1 < -3*sigma, P(d < d1) = (erf(d1/sigma/sqrt(2))+1)/2 = erfc(-d1/sigma/sqrt(2))/2, and for d1 > 3*sigma, P(d < d1) = (erf(d1/sigma/sqrt(2))+1)/2 = 1 - erfc(d1/sigma/sqrt(2))/2 -- but don't actually compute that; instead leave it as 1 - K where K = erfc(d1/sigma/sqrt(2))/2. For example, if d1 = 5*sigma, then P(d < d1) = 1 - 2.866516*10^-7.
If for example your programming environment doesn't have erf(x) built into the available libraries, you need a good approximation. (I thought I had an easy one to use but I can't find it and I think it was actually for the inverse error function.) I found this 1969 article by W. J. Cody which gives a good approximation for erf(x) if |x| < 0.5, and it's better to use erf(x) = 1 - erfc(x) for |x| > 0.5. For example, let's say you want erf(0.2) ≈ 0.2227025892105 from Wolfram Alpha; Cody says evaluate with x * R(x^2) where R is a rational function you can get from his table.
If I try this in Javascript (coefficients from Table II of the Cody paper):
// use only for |x| <= 0.5
function erf1(x)
{
    var y = x*x;
    return x*(3.6767877 - 9.7970565e-2*y)/(3.2584593 + y);
}
then I get erf1(0.2) = 0.22270208866303123 which is pretty close, for a 1st-order rational function. Cody gives tables of coefficients for rational functions up to degree 4; here's degree 2:
// use only for |x| <= 0.5
function erf2(x)
{
var y = x*x;
return x*(21.3853322378 + 1.72227577039*y + 0.316652890658*y*y)
/ (18.9522572415 + 7.8437457083*y + y*y);
}
which gives you erf2(0.2) = 0.22270258922638206 which is correct out to 10 decimal places. The Cody paper also gives you similar formulas for erfc(x) where |x| is between 0.5 and 4.0, and a third formula for erfc(x) where |x| > 4.0, and you can check your results with Wolfram Alpha or known erfc(x) tables for accuracy if you like.
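To illustrate the earlier advice about large |d1|, here is a small Python sketch that uses math.erfc for the tails and compares it with the naive 1 - CDF computation. The function names are made up for this example.

```python
import math

def lower_tail(d1, sigma=1.0):
    """P(d < d1) via erfc; stays accurate when d1 is far below zero."""
    return 0.5 * math.erfc(-d1 / (sigma * math.sqrt(2.0)))

def upper_tail(d1, sigma=1.0):
    """K = P(d > d1) via erfc; stays accurate when d1 is far above zero."""
    return 0.5 * math.erfc(d1 / (sigma * math.sqrt(2.0)))

# At 5 sigma, K is the small number to keep around instead of 1 - K.
print(upper_tail(5.0))

# At 10 sigma the naive form loses the tail entirely in double precision.
naive = 1.0 - 0.5 * (1.0 + math.erf(10.0 / math.sqrt(2.0)))
print(upper_tail(10.0), naive)
```

At 10 sigma the erf-based value rounds to exactly 0, while the erfc-based value is still a small positive number.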
Hope this helps!