Conditional prior in PyMC3 - bayesian

I am trying to build a model in which the prior assigned to a distribution is contingent on a particular value, and that value is another variable that is sampled. For example, a student answering a question correctly is modeled according to a Bernoulli trial with probability p. If the student has the given prerequisites (themselves part of the model), p should be drawn from Beta(20,5). If not, p should be drawn from Beta(5,20).
I got this to work in PyMC2 with the following code:
# prior for thetas - same for all students
lambda1 = pymc.Beta('lambda1', alpha=20, beta=5)

# top-level node - one for each student
theta1 = []
for i in range(num_students):
    theta1.append(pymc.Bernoulli('theta1_%i' % i, p=lambda1, plot=False))

lambda2 = [
    pymc.Beta('lambda2_0', alpha=5, beta=20),
    pymc.Beta('lambda2_1', alpha=20, beta=5)
]

lambda2_choices = []
theta2 = []
for i in range(num_students):
    @pymc.deterministic(name='lambda2_choice_%i' % (i), plot=False)
    def lambda2_choice(theta1=theta1[i],
                       lambda2=lambda2):
        if theta1 == False:
            return lambda2[0]
        elif theta1 == True:
            return lambda2[1]
    lambda2_choices.append(lambda2_choice)
    theta2.append(pymc.Bernoulli('theta2_%i' % i, p=lambda2_choice))
In other words, the prior assigned to the Bernoulli random variable is a deterministic function that returns a stochastic variable depending on the SAMPLED value of another variable, in this case theta1[i].
I can't figure out how to do this in PyMC3, as the @deterministic decorator no longer exists and deterministic functions have to take and return Theano variables.
I'd really appreciate any insight or suggestions!!

Here you can use pymc3's switch:
pymc3.switch(theta1[i], lambda2[1], lambda2[0])
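For illustration, here is a minimal sketch of how the whole model might look in PyMC3 using pm.math.switch and vectorized variables. This is an assumption-laden sketch, not the original poster's code; num_students and any observed data would come from the actual dataset.
import pymc3 as pm

num_students = 50  # assumed; comes from the data in the original post

with pm.Model() as model:
    # probability that a student has the prerequisite
    lambda1 = pm.Beta('lambda1', alpha=20, beta=5)
    # one prerequisite indicator per student
    theta1 = pm.Bernoulli('theta1', p=lambda1, shape=num_students)

    # the two candidate priors for the second question's success probability
    lambda2_0 = pm.Beta('lambda2_0', alpha=5, beta=20)
    lambda2_1 = pm.Beta('lambda2_1', alpha=20, beta=5)

    # choose a prior per student based on the sampled theta1
    p2 = pm.math.switch(theta1, lambda2_1, lambda2_0)
    theta2 = pm.Bernoulli('theta2', p=p2, shape=num_students)

    trace = pm.sample(2000)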

Related

Scipy Optimization for Inventory allocation

I am new to optimization in Python with SciPy and very enthusiastic about it!
I am trying to optimize inventory allocation for spare parts (very low demand) using SciPy.
The goal is to minimize the inventory value while reaching a target service rate (otif, for on-time in full).
I defined a function otif that returns a scalar for the objective.
There is also a function stock_val to evaluate the value of the stock (taking the allocation and the prices).
Finally I have minimize_stock, in which I define an initial guess and provide bounds and a constraint.
The constraint is simply to reach the targeted otif.
My issue is that minimize fails and just returns the initial guess.
As input:
- part_demand is a dataframe with a column for the material part number and columns of ordered quantity per month.
- part_price is a dataframe with a column for the material part number and a price column.
import numpy as np
from scipy.optimize import minimize

def otif(part_demand, stocks):
    """
    Returns the OnTimeInFull covered with stock using parts demand and stocks
    """
    df_occ = part_demand[(part_demand != 0)].count(axis=1)
    df_cov = part_demand[(part_demand != 0) & (part_demand.le(stocks, axis=0))].count(axis=1)
    return df_cov.sum() / df_occ.sum()

def stock_val(stocks, part_price):
    """
    Returns the stock value of stock using parts prices
    """
    return np.dot(stocks, part_price)

def minimize_stock(target_otif, part_demand, part_price):
    """
    Returns the optimal stock value and distribution for a part demand and targeted OnTimeInFull
    """
    n = part_demand.shape[0]
    init_guess = np.repeat(10, n)
    bounds = ((0, 10),) * n  # an N-tuple of 2-tuples
    # construct the constraints
    return_is_target = {'type': 'eq',
                        'fun': lambda stocks: target_otif - otif(part_demand, stocks)}
    stocks = minimize(stock_val, init_guess,
                      args=(part_price,), method='SLSQP',
                      options={'disp': True},
                      constraints=(return_is_target),
                      bounds=bounds)
    return stocks
# Define entry Variables
material_items = pnMonthly[dataMthColMask]
#display(material_items)
df_groupby = PNtable.groupby("Material").last()
material_price = df_groupby['UnitPrice']
n = material_items.shape[0]
init_guess = np.repeat(10000,n)
print(stock_val(init_guess, material_price))
sol = minimize_stock(target_otif=0.89, part_demand=material_items, part_price=material_price)
display(sol)
print('OTIF is ', format(otif(material_items, sol.x)))
print('Stock value is ', format(stock_val(sol.x, material_price)))
Thank you in advance for your help!
I tried different algorithms apart from SLSQP and still get failures.
I am not sure which algorithm is best for this kind of optimization problem.
I am also not sure about the syntax for minimize or what conditions the algorithm needs in order to work.
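For reference, here is a minimal, self-contained SLSQP example on a smooth toy problem, just to illustrate the call pattern for bounds and an equality constraint (it is unrelated to the inventory data above). Note that SLSQP estimates gradients by finite differences, so it generally expects the objective and constraint functions to be smooth in the decision variables; the otif function above is piecewise constant in stocks, which may be part of why the solver stalls at the initial guess.
import numpy as np
from scipy.optimize import minimize

target = np.array([1.0, 2.0, 3.0])

def objective(x):
    # smooth, differentiable objective
    return np.sum((x - target) ** 2)

# equality constraint: the components of x must sum to 5
budget = {'type': 'eq', 'fun': lambda x: np.sum(x) - 5.0}
bounds = ((0, 10),) * 3

result = minimize(objective, x0=np.zeros(3), method='SLSQP',
                  bounds=bounds, constraints=[budget],
                  options={'disp': True})
print(result.x)                       # approximately [0.667, 1.667, 2.667]
print(result.success, result.message)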

Finding n-tuple that minimizes expensive cost function

Suppose there are three variables that take on discrete integer values, say w1 = {1,2,3,4,5,6,7,8,9,10,11,12}, w2 = {1,2,3,4,5,6,7,8,9,10,11,12}, and w3 = {1,2,3,4,5,6,7,8,9,10,11,12}. The task is to pick one value from each set such that the resulting triplet minimizes some (black box, computationally expensive) cost function.
I've tried the surrogate optimization in Matlab but I'm not sure it is appropriate. I've also heard about simulated annealing but found no implementation applied to this instance.
Which algorithm, apart from exhaustive search, can solve this combinatorial optimization problem?
Any help would be much appreciated.
Simulated Annealing (SA) pays off when the objective surface is somewhat smooth, that is, when states close to a good solution also tend to be good.
For a completely random, spiky surface you might as well do a random search.
If the surface is even somewhat smooth, it makes sense to try SA.
The idea is that (sometimes) changing only 1 of the 3 values has little effect on the black-box function.
Here is a basic example doing this with Simulated Annealing, using frigidum in Python.
import numpy as np
w1 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w2 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w3 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
W = np.array([w1,w2,w3])
LENGTH = 12
I define a black-box using the Rastrigin function.
def rastrigin_function_n( x ):
    """
    N-dimensional Rastrigin
    https://en.wikipedia.org/wiki/Rastrigin_function
    x_i is in [-5.12, 5.12]
    """
    A = 10
    n = x.shape[0]
    return A*n + np.sum( x**2 - A*np.cos(2*np.pi * x) )

def black_box( x ):
    """
    Transform from domain [1,12] to [-5,5]
    to be able to push to rastrigin
    """
    x = (x - 6.5) * (5/5.5)
    return rastrigin_function_n(x)
Simulated Annealing needs to modify state X. Instead of taking/modifying values directly, we keep track of indices. This simplifies creating new proposals, since an index is always an integer to which we can simply add/subtract 1 modulo LENGTH.
def random_start():
    """
    returns 3 random indices
    """
    return np.random.randint(0, LENGTH, size=3)

def random_small_step(x):
    """
    change only 1 index
    """
    d = np.array( [1,0,0] )
    if np.random.random() < .5:
        d = np.array( [-1,0,0] )
    np.random.shuffle(d)
    return (x+d) % LENGTH

def random_big_step(x):
    """
    change 2 indices
    """
    d = np.array( [1,-1,0] )
    np.random.shuffle(d)
    return (x+d) % LENGTH

def obj(x):
    """
    We have a triplet of indices:
    1. Look up the corresponding values in W = [w1,w2,w3]
    2. Push those values into our black-box function
    """
    indices = x
    values = W[np.array([0,1,2]), indices]
    return black_box(values)
And throw a SA Scheme at it
import frigidum

local_opt = frigidum.sa(random_start=random_start,
                        neighbours=[random_small_step, random_big_step],
                        objective_function=obj,
                        T_start=10**4,
                        T_stop=0.000001,
                        repeats=10**3,
                        copy_state=frigidum.annealing.naked)
I am not sure what the minimum of this function should be, but it found an objective value of 47.9095 with indices np.array([9, 2, 2]).
Edit:
To change the cooling schedule in frigidum, use alpha=.9. In my experience, all the work of experimenting with which cooling scheme works best doesn't outweigh simply letting it run a little longer. The multiplicative schedule you proposed (sometimes called geometric) is the standard one and is also implemented in frigidum, so to get Tn+1 = 0.9*Tn you need alpha=.9. Be aware that this cooling step is applied after N repeats, so with repeats=100 it will first make 100 proposals before lowering the temperature by the factor alpha.
Simple variations on the current state often work best. Since it is best practice to set the initial temperature high enough that most proposals (>90%) are accepted, it doesn't matter that the steps are small. But if you fear they are too small, try 2 or 3 variations. Frigidum accepts a list of proposal functions, and combinations can reinforce each other.
I have no experience with MINLP. But even so, experiments can often surprise us, so if the time/cost of bringing another competitor to the table is small: yes, try it!
Try every possible combination of the three values (12 × 12 × 12 = 1,728 triplets) and see which one has the lowest cost.
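A minimal sketch of that exhaustive search, reusing the w1, w2, w3 arrays and the black_box stand-in defined above; this only makes sense when a single evaluation of the real black box is cheap enough to afford 1,728 calls.
import itertools
import numpy as np

best_cost, best_triplet = float('inf'), None
for triplet in itertools.product(w1, w2, w3):       # 12 * 12 * 12 = 1728 candidates
    cost = black_box(np.array(triplet, dtype=float))
    if cost < best_cost:
        best_cost, best_triplet = cost, triplet

print(best_triplet, best_cost)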

How to apply bounds on a variable when performing optimisation in Pytorch?

I am trying to use Pytorch for non-convex optimisation, trying to maximise my objective (so minimise in SGD). I would like to bound my dependent variable x > 0, and also have the sum of my x values be less than 1000.
I think I have the penalty implemented correctly in the form of a ramp penalty, but am struggling with the bounding of the x variable. In PyTorch you can set bounds using clamp, but it doesn't seem appropriate in this case; I think this is because optim needs the gradients left free under the hood. Full working example:
import torch
from torch.autograd import Variable
import numpy as np
def objective(x, a, b, c):   # Want to maximise this quantity (so minimise in SGD)
    d = 1 / (1 + torch.exp(-a * (x)))

    # Checking constraint
    exceeded_limit = constraint(x).item()
    #print(exceeded_limit)

    obj = torch.sum(d * (b * c - x))

    # If over the limit, add ramp penalty
    if exceeded_limit < 0:
        obj = obj - (exceeded_limit * 10)
        print("Exceeded limit")

    return - obj

def constraint(x, limit = 1000): # Must be > 0
    return limit - x.sum()

N = 1000

# x is the variable to optimise for
x = Variable(torch.Tensor([1 for ii in range(N)]), requires_grad=True)
a = Variable(torch.Tensor(np.random.uniform(0,100,N)), requires_grad=True)
b = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
c = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)

# Would like to include the clamp
# x = torch.clamp(x, min=0)

# Non-convex method
opt = torch.optim.SGD([x], lr=.01)

for i in range(10000):
    # Zeroing gradients
    opt.zero_grad()

    # Evaluating the objective
    obj = objective(x, a, b, c)

    # Calculate gradients
    obj.backward()
    opt.step()

    if i % 1000 == 0: print("Objective: %.1f" % -obj.item())

print("\nObjective: {}".format(-obj))
print("Limit: {}".format(constraint(x).item()))

if torch.sum(x < 0) > 0: print("Bounds not met")
if constraint(x).item() < 0: print("Constraint not met")
Any suggestions as to how to impose the bounds would be appreciated, either using clamp or otherwise, as would general advice on non-convex optimisation using PyTorch. This is a much simpler and scaled-down version of the problem I'm working on, so I am trying to find a lightweight solution if possible. I am considering a workaround such as transforming the x variable using an exponential function, but then you'd have to scale the function to avoid the positive values becoming infinite, and I want some flexibility in being able to set the constraint.
I met the same problem: I also want to apply bounds on a variable in PyTorch, and I solved it with Way 3 below.
Your example is a little complex, so I give a simpler example below.
For example, there is a trainable variable v whose bounds are (-1, 1):
v = torch.tensor((0.5, ), requires_grad=True)
v_loss = xxxx
optimizer.zero_grad()
v_loss.backward()
optimizer.step()
Way 1 raises "RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.":
v.clamp_(-1, 1)
Way 2 raises "RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.":
v = torch.clamp(v, -1, +1)  # equal to v = v.clamp(-1, +1)
Way 3 raises no error; this is how I solved the problem:
with torch.no_grad():
    v[:] = v.clamp(-1, +1)  # You must use v[:] = xxx instead of v = xxx
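Applied back to the original question, here is a minimal sketch of the same pattern as a projection step after each optimizer update; the objective is a stand-in, not the poster's, and the penalty weight 10 is arbitrary.
import torch

N = 1000
x = torch.full((N,), 1.0, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.01)

for step in range(10000):
    opt.zero_grad()
    loss = -torch.sum(torch.sigmoid(x) * (1.0 - x))    # stand-in objective to maximise
    loss = loss + 10.0 * torch.relu(x.sum() - 1000.0)  # ramp penalty for sum(x) <= 1000
    loss.backward()
    opt.step()
    # project back onto x >= 0 without breaking future gradient computation (Way 3 above)
    with torch.no_grad():
        x[:] = x.clamp(min=0)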

Hyperpriors for hierarchical models with Stan

I'm looking to fit a model to estimate multiple probabilities for binomial data with Stan. I was using beta priors for each probability, but I've been reading about using hyperpriors to pool information and encourage shrinkage on the estimates.
I've seen this example to define the hyperprior in pymc, but I'm not sure how to do something similar with Stan
@pymc.stochastic(dtype=np.float64)
def beta_priors(value=[1.0, 1.0]):
    a, b = value
    if a <= 0 or b <= 0:
        return -np.inf
    else:
        return np.log(np.power((a + b), -2.5))

a = beta_priors[0]
b = beta_priors[1]
With a and b then being used as parameters for the beta prior.
Can anybody give me any pointers on how something similar would be done with Stan?
To properly normalize that, you need a Pareto distribution. For example, if you want a distribution p(a, b) ∝ (a + b)^(-2.5), you can use
a + b ~ pareto(L, 1.5);
where a + b > L. There's no way to normalize the density with support for all values greater than or equal to zero---it needs a finite L as a lower bound. There's a discussion of using just this prior as the count component of a hierarchical prior for a simplex.
If a and b are parameters, they can either both be constrained to be positive, or you can leave a unconstrained and declare
real<lower = L - a> b;
to ensure a + b > L. L can be a small constant or something more reasonable given your knowledge of a and b.
You should be careful because this will not identify a + b. We use this construction as a hierarchical prior for simplexes as:
parameters {
  real<lower = 1> kappa;
  real<lower = 0, upper = 1> phi;
  vector<lower = 0, upper = 1>[K] theta;
  ...
}
model {
  kappa ~ pareto(1, 1.5);                        // power law prior
  phi ~ beta(a, b);                              // choose your prior for theta
  theta ~ beta(kappa * phi, kappa * (1 - phi));  // vectorized
  ...
}
There's an extended example in my Stan case study of repeated binary trials, which is reachable from the case studies page on the Stan web site (the case study directory is currently linked under the documentation link from the users tab).
Following suggestions in the comments I'm not sure that I will follow this approach, but for reference I thought I'd at least post the answer to my question of how this could be accomplished in Stan.
After some asking around on the Stan Discourse and further investigation, I found that the solution was to define a custom density and use the target += syntax. So the Stan equivalent of the pymc example would be:
parameters {
  real<lower=0> a;
  real<lower=0> b;
  real<lower=0,upper=1> p;
  ...
}
model {
  target += log((a + b)^-2.5);
  p ~ beta(a, b);
  ...
}
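For completeness, here is a sketch of driving such a model from Python with CmdStanPy. The data block, the binomial likelihood, and the variable names N, trials and successes are illustrative assumptions added here, not part of the original question or answer, and the hyperprior is written as target += -2.5 * log(a + b), which is the same increment algebraically.
from cmdstanpy import CmdStanModel

# Hypothetical complete program: the hyperprior from the answer above, plus an
# assumed data block and binomial likelihood so the model can actually be fit.
stan_program = """
data {
  int<lower=1> N;                      // number of groups
  array[N] int<lower=0> trials;        // attempts per group
  array[N] int<lower=0> successes;     // successes per group
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
  vector<lower=0, upper=1>[N] p;
}
model {
  target += -2.5 * log(a + b);         // same as log((a + b)^-2.5)
  p ~ beta(a, b);
  successes ~ binomial(trials, p);
}
"""

with open("hier_beta.stan", "w") as f:
    f.write(stan_program)

model = CmdStanModel(stan_file="hier_beta.stan")
fit = model.sample(
    data={"N": 5, "trials": [20, 15, 30, 25, 10], "successes": [12, 7, 21, 14, 3]},
    chains=4,
    iter_sampling=1000,
)
print(fit.summary())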

knapsack algorithm not returning optimal value

I am trying to write an algorithm in Python for the knapsack problem. I went through quite a few iterations and arrived at the following solution. It seems right to me, but when I run it on test sets:
it does not output the optimal value,
and it sometimes gives a maximum recursion depth error.
After changing the maxValues function it does output the optimal value, but it takes a very long time for datasets with more items. How can I refine it?
For the second problem, I have inspected the data for which it gives the error. The dataset is huge and only a couple of items exceed the knapsack capacity, so it unnecessarily goes through the entire list.
So what I plan to do, at the start of each call of the recursive function, is to look through the weights list for weights that are less than the current capacity and prune the rest. The following is the code I am planning to implement.
#~ weights_jump = find_indices(temp_w, lambda e: e < capacity)
#~ if(len(weights_jump)>0):
#~ temp_w[0:weights_jump[0]-1] = []
#~ temp_v[0:weights_jump[0]-1] = []
My main problem remains why it does not output the optimal value. Please help me with this, and also with integrating the above code into the current algorithm.
The following is the main function. The input for this function is as follows:
A knapsack input contains n + 1 lines. The first line contains two integers: the first is the number of items in the problem, n; the second is the capacity of the knapsack, K. The remaining lines present the data for each of the items. Each line i ∈ 0 ... n − 1 contains two integers, the item's value v_i followed by its weight w_i.
e.g. input:
n K
v_0 w_0
v_1 w_1
...
v_n-1 w_n-1
def solveIt(inputData):
    # parse the input
    lines = inputData.split('\n')

    firstLine = lines[0].split()
    items = int(firstLine[0])
    capacity = int(firstLine[1])
    K = capacity
    values = []
    weights = []

    for i in range(1, items+1):
        line = lines[i]
        parts = line.split()
        values.append(int(parts[0]))
        weights.append(int(parts[1]))

    items = len(values)
    #end of parsing

    value = 0
    weight = 0
    taken = []   # flags for items taken so far, starts empty
    print(weights)
    print(values)

    v = node(value, weights, values, K, 0, taken)

    # prepare the solution in the specified output format
    outputData = str(v[0]) + ' ' + str(0) + '\n'
    outputData += ' '.join(map(str, v[1]))
    return outputData
The following is the recursive function.
I'll try to explain it. Say I'm starting off at the root node; I now have two decisions to make: either take the first element or not.
Before that I call the maxValues function to see the maximum value that could be obtained by following this branch. If it is less than existing_max there is no need to search, so prune.
I follow the left branch if the weight of the first element is less than the capacity, so append(1),
update the values and weights lists, etc., and call node again.
So it first traverses the entire left branch and then traverses the right branch.
In the right branch I just update the values and weights lists and call node.
The inputs to this function are:
value -- the current value of the problem; it is initially set to zero and increases as the search proceeds
the weights list
the values list
the current capacity
the current maximum value found by the algorithm; if this existing_max is greater than the maximum value that can be obtained by following a branch,
there is no need to search that branch, so the entire branch is pruned
existing_nodes, the list which records whether a particular item is taken (1) or not (0)
from termcolor import colored

def node(value, weights, values, capacity, existing_max, existing_nodes):
    v1 = []; e1 = []  # values we get from the left branch
    v2 = []; e2 = []  # values we get from the right branch
    e = existing_nodes[:]
    temp_w = weights[:]
    temp_v = values[:]

    # first check if the list is empty
    if len(values) == 0:
        r = [value, existing_nodes[:]]
        return r

    # centre check: can this entire branch be pruned? Compare the maximum value
    # obtainable from here against the best value found so far.
    max_value = value + maxValues(weights, values, capacity)
    print('existing_max is ' + str(existing_max))
    print('weight in concern ' + str(weights[0]) + ' value is ' + str(value))
    if max_value <= existing_max:
        return [0, []]

    # Traversing the left branch
    # Traverse only if the weight does not exceed the capacity
    print colored('leftbranch', 'red')
    # search for indices of weights where weight < capacity
    #~ weights_jump = find_indices(temp_w, lambda e: e < capacity)
    #~ if(len(weights_jump)>0):
    #~     temp_w[0:weights_jump[0]-1] = []
    #~     temp_v[0:weights_jump[0]-1] = []
    if temp_w[0] <= capacity:
        updated_value = temp_v[0] + value
        k = capacity - temp_w[0]
        temp_w.pop(0)
        temp_v.pop(0)
        e1 = e[:]
        e1.append(1)
        print(str(updated_value) + ' ' + str(k) + ' ')
        raw_input('press ')
        v1 = node(updated_value, temp_w, temp_v, k, existing_max, e1)
        # after traversing the left node, update existing_max
        if v1[0] > existing_max:
            existing_max = v1[0]
    else:
        v1 = [0, []]

    # Traverse the right branch
    # this means we are not including the current item, so remove it from weights and values
    print('rightbranch')
    #~ print(str(value)+' '+str(capacity)+' ')
    raw_input("Press Enter to continue...")
    weights.pop(0)
    values.pop(0)
    e2 = e[:]
    e2.append(0)
    v2 = node(value, weights, values, capacity, existing_max, e2)

    if v1[0] > v2[0]:
        return v1
    else:
        return v2
The following is the helper function maxValues, which is called from the recursive function:
import operator

def maxValues(weights, values, K):
    a = []
    l = 0
    #~ print('hello')
    items = len(weights)
    #~ print(items)
    max = 0
    k = K
    for i in range(0, items):
        t = (i, float(values[i]) / weights[i])
        a.append(t)
        #~ print(i)
    a = sorted(a, key=operator.itemgetter(1), reverse=True)
    #~ print(a)
    length = len(a)
    for (i, v) in a:
        #~ print('this is loop '+str(l)+' and k is '+str(k)+' weight is '+str(weights[i]))
        if weights[i] <= k:
            max = max + values[i]
            w = weights[i]
            #~ print('this '+str(w))
            k = k - w
            if k == 0:
                break
        else:
            max = max + (float(k) / weights[i]) * values[i]
            w = k
            k = k - w
            #~ print('this is w '+str(max))
            if k == 0:
                break
        l = l + 1
    return max
I spent days on this and could not do anything.
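As a cross-check, here is a minimal dynamic-programming sketch (not the branch-and-bound approach above) that is guaranteed to return the optimal value for integer weights and a moderate capacity K; it can be used to verify the recursive solution on small test sets.
def knapsack_dp(values, weights, K):
    """0/1 knapsack solved exactly by dynamic programming, O(n*K) time and space."""
    n = len(values)
    # dp[i][c] = best value using only the first i items with capacity c
    dp = [[0] * (K + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(K + 1):
            dp[i][c] = dp[i - 1][c]                      # skip item i-1
            if w <= c and dp[i - 1][c - w] + v > dp[i][c]:
                dp[i][c] = dp[i - 1][c - w] + v          # take item i-1
    # walk back through the table to recover which items were taken
    taken, c = [0] * n, K
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            taken[i - 1] = 1
            c -= weights[i - 1]
    return dp[n][K], taken

# small sanity check: the optimum is value 25, taking items 1 and 2
print(knapsack_dp([8, 10, 15, 4], [4, 5, 8, 3], 13))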