Hyperpriors for hierarchical models with Stan

I'm looking to fit a model to estimate multiple probabilities for binomial data with Stan. I was using beta priors for each probability, but I've been reading about using hyperpriors to pool information and encourage shrinkage on the estimates.
I've seen this example of defining the hyperprior in pymc, but I'm not sure how to do something similar with Stan:
@pymc.stochastic(dtype=np.float64)
def beta_priors(value=[1.0, 1.0]):
    a, b = value
    if a <= 0 or b <= 0:
        return -np.inf
    else:
        return np.log(np.power((a + b), -2.5))

a = beta_priors[0]
b = beta_priors[1]
With a and b then being used as parameters for the beta prior.
Can anybody give me any pointers on how something similar would be done with Stan?

To properly normalize that, you need a Pareto distribution. For example, if you want a distribution p(a, b) ∝ (a + b)^(-2.5), you can use
a + b ~ pareto(L, 1.5);
where a + b > L. There's no way to normalize the density with support for all values greater than or equal to zero---it needs a finite L as a lower bound. There's a discussion of using just this prior as the count component of a hierarchical prior for a simplex.
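To see why pareto(L, 1.5) gives the intended power law (this is just the standard Pareto density, nothing Stan-specific): the density is p(y | L, alpha) = alpha * L^alpha / y^(alpha + 1) for y > L, so with alpha = 1.5 it is proportional to y^(-2.5). Taking y = a + b reproduces p(a, b) ∝ (a + b)^(-2.5), restricted to a + b > L.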
If a and b are parameters, they can either both be constrained to be positive, or you can leave a unconstrained and declare
real<lower = L - a> b;
to ensure a + b > L. Here L can be a small constant or something more reasonable given your knowledge of a and b.
You should be careful because this will not identify a + b. We use this construction as a hierarchical prior for simplexes as:
parameters {
  real<lower = 1> kappa;
  real<lower = 0, upper = 1> phi;
  vector<lower = 0, upper = 1>[K] theta;
  ...
}
model {
  kappa ~ pareto(1, 1.5);                        // power law prior
  phi ~ beta(a, b);                              // prior for phi, the mean of theta's beta prior
  theta ~ beta(kappa * phi, kappa * (1 - phi));  // vectorized
  ...
}
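In this parameterization kappa = a + b and phi = a / (a + b), so the pareto(1, 1.5) prior on kappa is exactly the (a + b)^(-2.5) power law discussed above, with L = 1.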
There's an extended example in my Stan case study of repeated binary trials, which is reachable from the case studies page on the Stan web site (the case study directory is currently linked under the documentation link from the users tab).

Following the suggestions in the comments I'm not sure I will take this approach after all, but for reference I thought I'd at least post the answer to my question of how this can be accomplished in Stan.
After asking around on the Stan Discourse forums and some further investigation, I found that the solution is to define a custom density and use the target += syntax. So the Stan equivalent of the pymc example would be:
parameters {
  real<lower=0> a;
  real<lower=0> b;
  real<lower=0,upper=1> p;
  ...
}
model {
  target += log((a + b)^-2.5);
  p ~ beta(a, b);
  ...
}
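For completeness, here is a minimal sketch of compiling and sampling such a program from Python with CmdStanPy. It assumes the model above is completed with a data block and a binomial likelihood; the file name pooled_binomial.stan, the data names J, n, y, and the numbers are made up for illustration.
import cmdstanpy

# Hypothetical binomial data: J groups, n[j] trials, y[j] successes.
data = {"J": 3, "n": [20, 25, 30], "y": [12, 14, 22]}

model = cmdstanpy.CmdStanModel(stan_file="pooled_binomial.stan")  # compile the Stan program
fit = model.sample(data=data, chains=4, iter_sampling=1000)       # run the sampler
print(fit.summary())                                              # posterior summaries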

Related

Calculate intercepting vector?

I am trying to calculate an intercepting vector based on the velocity, location, and time of two objects.
I found a post covering my problem but was left with some technical questions I could not ask because my reputation is below 50.
Calculating Intercepting Vector
The answer marked as best goes over the process of how to solve my problem; however, when I tried to calculate it myself, I could not understand how the vectors of position and velocity are converted to a real number.
Using the data provided here for the positions and speeds of the target and the interceptor, the interception condition |s_t + v_t*t - s_i| = v_i*t leads to the following quadratic equation in t:
(dot(v_t, v_t) - v_i^2)*t^2 + 2*dot(s_t - s_i, v_t)*t + dot(s_t - s_i, s_t - s_i) = 0
Plugging in the numbers, the coefficients are:
s_t = [120, 40]; v_t = [5,2]; s_i = [80, 80]; v_i = 10;
a = dot(v_t, v_t)-10^2
b = 2*dot((s_t - s_i),v_t)
c = dot(s_t - s_i, s_t - s_i)
Solving for t with the quadratic formula yields:
delta = sqrt(b^2 - 4*a*c)
t1 = (-b + delta)/(2*a)
t2 = (-b - delta)/(2*a)
With the data at hand, t1 turns out to be negative and can be discarded; t2 is the interception time.
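For reference, the same calculation worked through in Python with numpy (the variable names are my own):
import numpy as np

s_t = np.array([120.0, 40.0])   # target position
v_t = np.array([5.0, 2.0])      # target velocity
s_i = np.array([80.0, 80.0])    # interceptor position
v_i = 10.0                      # interceptor speed

d = s_t - s_i
a = np.dot(v_t, v_t) - v_i**2
b = 2 * np.dot(d, v_t)
c = np.dot(d, d)

delta = np.sqrt(b**2 - 4*a*c)
roots = [(-b + delta) / (2*a), (-b - delta) / (2*a)]
t = max(r for r in roots if r > 0)                    # keep the positive intercept time
intercept_point = s_t + v_t * t                       # where the two objects meet
aim_direction = (intercept_point - s_i) / (v_i * t)   # unit vector for the interceptor
print(t, intercept_point, aim_direction)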

systemfit 3SLS Testing for Overidentification Restrictions

I'm struggling to find a good way to perform the Hansen/Sargan test of overidentifying restrictions for a three-stage least squares (3SLS) model on panel data in R. I've been digging through different sites all day and couldn't find a way to obtain the test in R using the well-known systemfit package.
Currently, my code is simple.
violence_c_3sls <- Crime ~ ln_GDP + I(ln_GDP^2) + ln_Gini
income_c_3sls <- ln_GDP ~ Crime + ln_Gini
gini_c_3sls <- ln_Gini ~ ln_GDP + I(ln_GDP^2) + Crime
inst <- ~ Educ_Gvmnt_Exp + I(Educ_Gvmnt_Exp^2) + Health_Exp + Pov_Head_Count_1.9
system_c_3sls <- list(violence_c_3sls, income_c_3sls, gini_c_3sls)
fitsur_c_3sls <- systemfit(system_c_3sls, "3SLS", inst = inst, data = df_new,
                           methodResidCov = "noDfCor")
summary(fitsur_c_3sls)
However, adding more instruments to create an over-identified system does not produce any Hansen/Sargan test output, so I assume the test has to be computed separately from the fitted systemfit object.
Thanks in advance.
With g equations, l exogenous variables, and k regressors, the Sargan test statistic for 3SLS is
S = u' (\Sigma^{-1} \otimes P_W) u,
which is asymptotically \chi^2 with g*l - k degrees of freedom, where u is the stacked residual vector, \Sigma is the estimated residual covariance, and P_W is the projection matrix onto the exogenous variables. See Ch. 12.4 of Davidson & MacKinnon, ETM.
Calculating the Sargan test from systemfit should look something like this:
# requires the stringr package for str_detect() / str_remove()
library(stringr)

sargan.systemfit <- function(results3sls){
  result <- list()
  u <- as.matrix(resid(results3sls))      # model residuals, n x n_eq
  n_eq <- length(results3sls$eq)          # number of equations
  n <- nrow(u)                            # number of observations
  n_reg <- length(coef(results3sls))      # total number of regressors
  w <- model.matrix(results3sls, which = 'z')  # matrix of instruments, block diagonal with one block per equation

  # Aggregate into a single block (in case different instruments are used per equation)
  w_list <- lapply(X = 1:n_eq, FUN = function(eq_i){
    this_eq_label <- results3sls$eq[[eq_i]]$eqnLabel
    this_w <- w[str_detect(rownames(w), this_eq_label), str_detect(colnames(w), this_eq_label)]
    colnames(this_w) <- str_remove(colnames(this_w), paste0(this_eq_label, '_'))
    return(this_w)
  })
  w <- do.call(cbind, w_list)
  w <- w[, !duplicated(colnames(w))]
  n_inst <- ncol(w)   # w is n x n_inst, where n_inst is the number of unique instruments/exogenous variables

  # Estimate the residual covariance (or use residCov; asymptotically equivalent)
  var_u <- crossprod(u) / n                    # var_u <- results3sls$residCov
  P_w <- w %*% solve(crossprod(w)) %*% t(w)    # projection matrix on the instruments w

  # as.numeric(u) stacks the residuals into an (n_eq * n) x 1 vector
  result$statistic <- as.numeric(t(as.numeric(u)) %*% kronecker(solve(var_u), P_w) %*% as.numeric(u))
  result$df <- n_inst * n_eq - n_reg
  result$p.value <- 1 - pchisq(result$statistic, result$df)
  result$method <- paste("Sargan over-identifying restrictions test")
  return(result)
}

Finding n-tuple that minimizes expensive cost function

Suppose there are three variables that take on discrete integer values, say w1 = {1,2,3,4,5,6,7,8,9,10,11,12}, w2 = {1,2,3,4,5,6,7,8,9,10,11,12}, and w3 = {1,2,3,4,5,6,7,8,9,10,11,12}. The task is to pick one value from each set such that the resulting triplet minimizes some (black box, computationally expensive) cost function.
I've tried the surrogate optimization in Matlab but I'm not sure it is appropriate. I've also heard about simulated annealing but found no implementation applied to this instance.
Which algorithm, apart from exhaustive search, can solve this combinatorial optimization problem?
Any help would be much appreciated.
The requirement for (and benefit of) Simulated Annealing (SA) is that the objective surface is somewhat smooth, i.e. that nearby states tend to have similar costs, so a state near the optimum is already a reasonably good solution.
For a completely random, spiky surface you might as well do a random search.
If the surface is at all smooth, even only in places, it makes sense to try SA.
The idea is that (sometimes) changing only 1 of the 3 values has little effect on our black-box function.
Here is a basic example to do this with Simulated Annealing, using frigidum in Python
import numpy as np
w1 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w2 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w3 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
W = np.array([w1,w2,w3])
LENGTH = 12
I define a black-box using the Rastrigin function.
def rastrigin_function_n(x):
    """
    N-dimensional Rastrigin
    https://en.wikipedia.org/wiki/Rastrigin_function
    x_i is in [-5.12, 5.12]
    """
    A = 10
    n = x.shape[0]
    return A*n + np.sum(x**2 - A*np.cos(2*np.pi*x))

def black_box(x):
    """
    Transform from domain [1,12] to [-5,5]
    to be able to push to rastrigin
    """
    x = (x - 6.5) * (5/5.5)
    return rastrigin_function_n(x)
Simulated Annealing needs to modify the state X. Instead of taking/modifying the values directly, we keep track of indices. This simplifies creating new proposals, since an index is always an integer to which we can simply add or subtract 1, modulo LENGTH.
def random_start():
    """
    returns 3 random indices
    """
    return np.random.randint(0, LENGTH, size=3)

def random_small_step(x):
    """
    change only 1 index
    """
    d = np.array( [1,0,0] )
    if np.random.random() < .5:
        d = np.array( [-1,0,0] )
    np.random.shuffle(d)
    return (x+d) % LENGTH

def random_big_step(x):
    """
    change 2 indices
    """
    d = np.array( [1,-1,0] )
    np.random.shuffle(d)
    return (x+d) % LENGTH

def obj(x):
    """
    We have a triplet of indices:
    1. Calculate the corresponding values in W = [w1,w2,w3]
    2. Push the values into our black-box function
    """
    indices = x
    values = W[np.array([0,1,2]), indices]
    return black_box(values)
And throw an SA scheme at it:
import frigidum

local_opt = frigidum.sa(random_start=random_start,
                        neighbours=[random_small_step, random_big_step],
                        objective_function=obj,
                        T_start=10**4,
                        T_stop=0.000001,
                        repeats=10**3,
                        copy_state=frigidum.annealing.naked)
I am not sure what the minimum of this function should be, but it found an objective value of 47.9095 at indices np.array([9, 2, 2]).
Edit:
To change the cooling schedule in frigidum, use alpha=.9. My experience is that all the work of experimenting with which cooling scheme works best doesn't outweigh simply letting it run a little longer. The multiplicative scheme you proposed (sometimes called geometric cooling) is the standard one and is what frigidum implements, so to get Tn+1 = 0.9*Tn you need alpha=.9. Be aware that this cooling step is applied after N repeats, so with repeats=100 it will first make 100 proposals before lowering the temperature by the factor alpha.
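As a rough back-of-the-envelope check (assuming the run stops once the temperature falls below T_stop): going from T_start = 10^4 to T_stop = 10^-6 with alpha = 0.9 takes about log(10^-10) / log(0.9) ≈ 219 cooling steps, and with repeats = 10^3 that is roughly 2.2 * 10^5 proposals in total.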
Simple variations on the current state often work best. Since it is best practice to set the initial temperature high enough that most proposals (>90%) are accepted, it doesn't matter that the steps are small. But if you fear they are too small, try 2 or 3 variations. Frigidum accepts a list of proposal functions, and combinations can reinforce each other.
I have no experience with MINLP. But even so, experiments can often surprise us. So if the time/cost of bringing another competitor to the table is small, then yes, try it!
Try every possible combination of the three values and see which has the lowest cost.
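As a rough sketch of what that exhaustive search looks like in Python: 12*12*12 = 1728 evaluations are needed, which is feasible whenever a single evaluation isn't prohibitively slow. The cost function below is a made-up placeholder for the real black box.
import itertools

def brute_force(cost, w1, w2, w3):
    # Evaluate the cost at every (w1, w2, w3) triple and keep the cheapest one.
    return min(itertools.product(w1, w2, w3), key=lambda triple: cost(*triple))

# Placeholder cost function standing in for the expensive black box.
best = brute_force(lambda a, b, c: (a - 7)**2 + (b - 3)**2 + (c - 11)**2,
                   range(1, 13), range(1, 13), range(1, 13))
print(best)   # (7, 3, 11) for this toy cost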

Algorithms for factorizing a 30 decimal digit number

My professor has given me an RSA factoring problem as an assignment. The given modulus is 30 decimal digits long. I have been reading a lot about factoring algorithms, but it has been quite a headache to choose one for my requirements. Which algorithms give good performance for 30-decimal-digit numbers?
Note: So far I have read about Brute force approach and Quadratic Sieve. The latter is complex and the former time consuming.
There's another method called Pollard's Rho algorithm, which is not as fast as the GNFS but is capable of factoring 30-digit numbers in minutes rather than hours.
The algorithm is very simple. It stops when it finds any factor, so you'll need to call it recursively to obtain a complete factorisation. Here's a basic implementation in Python:
def rho(n):
    def gcd(a, b):
        while b > 0:
            a, b = b, a % b
        return a
    g = lambda z: (z**2 + 1) % n
    x, y, d = 2, 2, 1
    while d == 1:
        x = g(x)
        y = g(g(y))
        d = gcd(abs(x - y), n)
    if d == n:
        print("Can't factor this, sorry.")
        print("Try a different polynomial for g(), maybe?")
    else:
        print("%d = %d * %d" % (n, d, n // d))

rho(441693463910910230162813378557) # = 763728550191017 * 578338290221621
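To get the complete factorisation mentioned above, one possible sketch is to have the rho step return a factor and wrap it recursively. The helper names and structure here are my own; math.gcd and sympy.isprime are used purely for brevity.
from math import gcd
from sympy import isprime

def rho_factor(n, c=1):
    # Same Pollard rho loop as above, but returns a factor (possibly n on failure)
    # and uses z^2 + c so a failed run can be retried with a different polynomial.
    # (Pollard rho can stall on even numbers and small prime powers; strip small
    # factors by trial division first in general.)
    g = lambda z: (z*z + c) % n
    x, y, d = 2, 2, 1
    while d == 1:
        x = g(x)
        y = g(g(y))
        d = gcd(abs(x - y), n)
    return d

def factorize(n):
    # Recursively split n into prime factors.
    if n == 1:
        return []
    if isprime(n):
        return [n]
    d, c = n, 1
    while d == n:          # retry with another polynomial if rho fails
        d = rho_factor(n, c)
        c += 1
    return factorize(d) + factorize(n // d)

print(factorize(441693463910910230162813378557))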
Or you could just use an existing software library. I can't see much point in reinventing this particular wheel.

How to find time-varying coefficients for a VAR model by using the Kalman Filter

I'm trying to write some code in R to reproduce the model I found in this article.
The idea is to model the signal as a VAR model, but fit the coefficients by a Kalman-filter model. This would essentially enable me to create a robust time-varying VAR(p) model and analyze non-stationary data to a degree.
The model to track the coefficients is:
X(t) = F(t) X(t-1) + W(t)
Y(t) = H(t) X(t) + E(t),
where H(t) is the Kronecker product between the lagged measurements in my time series Y and a unit vector, and X(t) fills the role of the regression coefficients. F(t) is taken to be the identity matrix, which means the coefficients are assumed to evolve as a random walk.
In the article, the covariance matrix Q(t) of the state noise W(t) is set to 10^-3 at first and then fitted by an iteration scheme. The covariance matrix R(t) of the measurement noise E(t) is substituted by the covariance of the part of the signal left unexplained by the model: Y(t) - H(t)Xhat(t).
I have the a priori covariance matrix of estimation error (denoted Σ in the article) written as P (based on other sources) and the a posteriori as Pmin, since it will be used in the next recursion as a priori, if that makes sense.
So far I've written the following, based on the article's Appendix A 1.2:
Y <- *my timeseries, for test purposes two channels of 3000 points*
F <- diag(8)                        # F is (m^2*p by m^2*p), where m = 2 dimensions and p = 2 lags
H <- diag(2) %x% t(vec(Y[,1:2]))    # the Kronecker product of the vectorized lags Y-1 and Y-2
Xhatminus <- matrix(1,8,1)          # an arbitrary *a priori* coefficient matrix
Q <- diag(8) %x% (10**-7)           # a diagonal matrix with a really low number; found it used in some examples
R <- 1                              # didn't know what else to put here just yet
Pmin <- diag(8)                     # *a priori* error estimate, just some 1-s...
Now the recursion should start. To test, I just took the first 3000 points of one trial of my data:
Xhatstorage <- matrix(0,8,3000)

for(j in 3:3000){
  H <- diag(2) %x% t(vec(Y[,(j-2):(j-1)]))
  K <- (Pmin %*% t(H)) %*% solve((H %*% Pmin %*% t(H) + R))   # solve() gives the inverse matrix ()^-1
  P <- Pmin - K %*% H %*% Pmin
  Xhatplus <- F %*% (Xhatminus + K %*% (Y[,j] - H %*% Xhatminus))
  Pplus <- (F %*% P %*% F) + Q
  Xhatminus <- Xhatplus
  Xhatstorage[,j] <- Xhatplus
  Pmin <- Pplus
}
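For reference, these are the textbook Kalman filter equations I'm trying to match with that loop (written in the notation above, with F taken as the identity; this is just the standard predict/update recursion, not something specific to the article):
K(t) = Pmin(t) H(t)' [ H(t) Pmin(t) H(t)' + R(t) ]^-1
Xhat(t) = Xhatminus(t) + K(t) [ Y(t) - H(t) Xhatminus(t) ]
P(t) = [ I - K(t) H(t) ] Pmin(t)
Xhatminus(t+1) = F Xhat(t)
Pmin(t+1) = F P(t) F' + Q(t)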
I extracted the Xhatplus values into a storage matrix and used them in this primitive VAR model:
Yhat <- array(0,3000)

for(t in 3:3000){
  Yhat[t] <- t(vec(Y[,(t-2)])) %*% Xhatstorage[c(1,3),t] + t(vec(Y[,(t-1)])) %*% Xhatstorage[c(2,4),t]
}
The result looks like this [plot not included]: the blue line is the VAR with the Kalman-filter-estimated coefficients, the black line is the original data.
I'm having trouble understanding how I can better evaluate my coefficients. Why is the fit so far off?
How should I better choose the initial a priori and a posteriori estimates to start the recursion? I'm sure that adding more lags to the VAR is not the issue; it's that I don't know how to choose the initial values for Pmin and Xhatminus. Most places I pieced this together from start with arbitrary zero assumptions in toy models, but in this case choosing any of those matrices as 0 just collapses the entire algorithm.
Lastly, is this recursion even a correct implementation of what Oya et al. describe in the article? I know I'm still missing the R evaluation based on previously unexplained errors (V(t) in Appendix A 1.2), but is it correct in general?