Timeout for Z3 Optimize - optimization

How do you set a timeout for the z3 optimizer such that it will give you the best known solution when it runs out of time?
from z3 import *
s = Optimize()
# Hard Problem
print(s.check())
print(s.model())
Follow-up question, can you set z3 to randomized hill climbing or does it always perform a complete search?

Long story short, you can't. That's simply not how the optimizer works. That is, it doesn't find a solution and then try to improve it. If you interrupt it or set a time-out, it may not even have a satisfying solution when the timer expires, let alone an "improved" one. You should look at the optimization paper for details: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/nbjorner-nuz.pdf
It is true, however, that z3 does keep track of bounds of variables, for numerical quantities. You might be able to extract these, though in general, you'll have no means of knowing what values out of those intervals you'd need to pick to get a satisfying solution for the overall problem. See this answer for a discussion: Is it possible to get a legit range info when using a SMT constraint with Z3
Questions about this sort of "hill-climbing" come up often in this forum, and the answer is simply that this is not how z3's optimizer works. Some prior questions along these lines:
Z3 Time Restricted Optimization
z3 minimization and timeout
There are a few other questions along these lines on Stack Overflow. Search for "optimize" and "timeout".
Your best bet
That's the theory side of it. In practice, I believe the best approach to deal with a problem of this sort is not to use the optimizer at all. Instead do the following:
1. State your problem.
2. Ask for a model. If there's no model, respond unsat. Quit.
3. Hold on to the current model as "best-so-far".
4. Out of time? Return the model you have as "best-so-far". You are done.
5. Still have time?
5a. Compute the "cost" of this model, i.e., the metric you are trying to minimize or maximize. If you store the cost as a variable in your model, you can simply query its value from the model.
5b. Assert a new constraint saying the cost should be lower than the cost of the current model. (Or higher if you are maximizing.) Depending on how fancy you want to get, you might want to "double" the cost function, or implement some sort of binary-search to converge on a value faster. But all that is really dependent on the exact details of the problem.
5c. Ask for a new model. If unsat, return the last model you got as "optimal." Otherwise, repeat from step 3.
I believe this is the most practical approach for time-constrained optimization in z3. It gives you full control over how many times to iterate, and lets you guide the search in any way you want. (For instance, you can query various variables in each model and direct the search by saying "find me a bigger x, or a smaller y, etc.", instead of looking at just one metric.) Hope that makes sense.
Summary
Note that an SMT solver can work the way you're describing, i.e., give you an optimal-so-far solution when the time-out goes off. It's just that z3's optimizer does not work that way. For z3, I have found the iterative loop described above to be the most practical solution to this sort of timeout-based optimization.
You can also look at OptiMathSAT (http://optimathsat.disi.unitn.it/), which might offer better facilities in this regard. @Patrick Trentin, who reads this forum often, is an expert on that and he might opine separately regarding its usage.

In general, @alias is right when he states that an OMT solver does not provide any guarantee that a solution is available when the optimization search is interrupted by a timeout signal.
An OMT solver can look for an optimal solution in one of two ways:
by starting from an initial Model of the formula and trying to improve the value of the objective function. This is the case of the standard OMT approach, which enumerates a number of partially optimized solutions until it finds the optimal one.
by starting from a more-than-optimal, unsatisfiable assignment and progressively relaxing it until it yields an optimal solution. AFAIK, this is only the case of the Maximum Resolution engine for dealing with MaxSMT problems.
When the OMT solver uses an optimization technique that falls in the first category, then it is possible to retrieve the best known solution when it runs out of time, provided that the OMT solver stores it in a safe place during the optimization search. This is not the case with the second MaxRes engine (see this Q/A).
A possible workaround. (CAVEAT: I haven't tested this)
z3 keeps track of the lower and upper bound of the objective function along the optimization search. When minimizing, the upper bound corresponds to the value of the objective function in the most recent partial solution found by the OMT solver (dual for maximization). After a timeout signal occurred when minimizing (resp. maximizing) an obj instance obtained from minimize() (resp. maximize()), one should be able to retrieve the latest approximation v of the optimal value of obj by calling obj.upper() (resp. obj.lower()). Assuming that such value v is different from +oo (resp. -oo), one can incrementally learn a constraint of the form cost = v and perform an incremental SMT check of satisfiability to reconstruct the model corresponding to the sub-optimal solution that was hit by z3.
OptiMathSAT is one OMT solver that stores in a safe place the latest solution it encounters during the optimization search. This makes it easy to achieve what you want to do.
There are two types of timeout signals in OptiMathSAT:
hard timeout: as soon as the timeout fires, the optimization search is stopped immediately; if the OMT solver found any solution, the result of the optimization search (accessible via msat_objective_result(env, obj)) is MSAT_OPT_SAT_PARTIAL and the Model corresponding to the latest sub-optimal solution can be extracted and printed; if instead the OMT solver didn't find any solution, the result of the optimization search is MSAT_UNKNOWN and no Model is available.
soft timeout: if a timeout fires after the OMT solver found any solution, then the search is stopped immediately as in the case of a hard timeout. Otherwise, the timeout is ignored until the OMT solver finds one solution.
The type of timeout signal can be set using the option opt.soft_timeout=[true|false].
Example: The following example is the timeout.py unit-test contained in my omt_python_examples github repository that features a number of examples of how to use the Python API interface of OptiMathSAT.
"""
timeout unit-test.
"""
###
### SETUP PATHS
###
import os
import sys
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
INCLUDE_DIR = os.path.join(BASE_DIR, '..', 'include')
LIB_DIR = os.path.join(BASE_DIR, '..', 'lib')
sys.path.append(INCLUDE_DIR)
sys.path.append(LIB_DIR)
from wrapper import * # pylint: disable=unused-wildcard-import,wildcard-import
###
### DATA
###
OPTIONS = {
    "model_generation" : "true", # !IMPORTANT!
    "opt.soft_timeout" : "false",
    "opt.verbose" : "true",
}
###
### TIMEOUT UNIT-TEST
###
with create_config(OPTIONS) as cfg:
    with create_env(cfg) as env:
        # Load Hard Problem from file
        with open(os.path.join(BASE_DIR, 'smt2', 'bacp-19.smt2'), 'r') as f:
            TERM = msat_from_smtlib2(env, f.read())
            assert not MSAT_ERROR_TERM(TERM)
            msat_assert_formula(env, TERM)
        # Impose a timeout of 3.0 seconds
        CALLBACK = Timer(3.0)
        msat_set_termination_test(env, CALLBACK)
        with create_minimize(env, "objective", lower="23", upper="100") as obj:
            assert_objective(env, obj)
            solve(env) # optimization search until timeout
            get_objectives_pretty(env) # print latest range of optimization search
            load_model(env, obj) # retrieve sub-optimal model
            dump_model(env) # print sub-optimal model
This is the verbose output of the optimization search:
# obj(.cost_0) := objective
# obj(.cost_0) - search start: [ 23, 100 ]
# obj(.cost_0) - linear step: 1
# obj(.cost_0) - new: 46
# obj(.cost_0) - update upper: [ 23, 46 ]
# obj(.cost_0) - linear step: 2
# obj(.cost_0) - new: 130/3
# obj(.cost_0) - update upper: [ 23, 130/3 ]
# obj(.cost_0) - linear step: 3
# obj(.cost_0) - new: 40
# obj(.cost_0) - update upper: [ 23, 40 ]
# obj(.cost_0) - linear step: 4
# obj(.cost_0) - new: 119/3
# obj(.cost_0) - update upper: [ 23, 119/3 ]
# obj(.cost_0) - linear step: 5
# obj(.cost_0) - new: 112/3
# obj(.cost_0) - update upper: [ 23, 112/3 ]
# obj(.cost_0) - linear step: 6
# obj(.cost_0) - new: 104/3
# obj(.cost_0) - update upper: [ 23, 104/3 ]
# obj(.cost_0) - linear step: 7
# obj(.cost_0) - new: 34
# obj(.cost_0) - update upper: [ 23, 34 ]
# obj(.cost_0) - linear step: 8
# obj(.cost_0) - new: 133/4
# obj(.cost_0) - update upper: [ 23, 133/4 ]
# obj(.cost_0) - linear step: 9
# obj(.cost_0) - new: 161/5
# obj(.cost_0) - update upper: [ 23, 161/5 ]
# obj(.cost_0) - linear step: 10
# obj(.cost_0) - new: 32
# obj(.cost_0) - update upper: [ 23, 32 ]
# obj(.cost_0) - linear step: 11
# obj(.cost_0) - new: 158/5
# obj(.cost_0) - update upper: [ 23, 158/5 ]
# obj(.cost_0) - linear step: 12
# obj(.cost_0) - new: 247/8
# obj(.cost_0) - update upper: [ 23, 247/8 ]
# obj(.cost_0) - linear step: 13
# obj(.cost_0) - new: 123/4
# obj(.cost_0) - update upper: [ 23, 123/4 ]
# obj(.cost_0) - linear step: 14
# obj(.cost_0) - new: 61/2
# obj(.cost_0) - update upper: [ 23, 61/2 ]
# obj(.cost_0) - linear step: 15
unknown ;; <== Timeout!
(objectives
(objective 61/2), partial search, range: [ 23, 61/2 ]
) ;; sub-optimal value, latest search interval
course_load__ARRAY__1 : 9 ;; and the corresponding sub-optimal model
course_load__ARRAY__2 : 1
course_load__ARRAY__3 : 2
course_load__ARRAY__4 : 10
course_load__ARRAY__5 : 3
course_load__ARRAY__6 : 4
course_load__ARRAY__7 : 1
course_load__ARRAY__8 : 10
course_load__ARRAY__9 : 4
course_load__ARRAY__10 : 1
course_load__ARRAY__11 : 1
course_load__ARRAY__12 : 5
course_load__ARRAY__13 : 10
course_load__ARRAY__14 : 9
course_load__ARRAY__15 : 1
...
;; the sub-optimal model is pretty long, it has been cut to fit this answer!
...

Related

Treatment of constraints in SLSQP optimization with openMDAO

With openMDAO, I am using FD derivatives and trying to solve a non-linearly constrained optimization problem with the SLSQP method. Any time the optimizer arrives at a point that violates one of the constraints, it just crashes with the message:
Optimization FAILED. Positive directional derivative for linesearch
For instance, if I intentionally set the initial point to an unfeasible design point, the optimizer performs 1 iteration and exits with the above error (the same happens when I start from a feasible point, but then optimizer arrives at an unfeasible point after a few iterations).
Based on the answer to the question in In OpenMDAO, is there a way to ensure that the constraints are respected before proceeding with a computation?, I'm assuming that raising the AnalysisError exception will not work in my case, is that correct? Is there any other way to prevent the optimizer from going into unfeasible regions or at least backtrack on the linesearch and try a different direction/distance? Or should the SLSQP method be only used when analytic derivatives are available?
Reproducible test case:
import numpy as np
import openmdao.api as om

class d1(om.ExplicitComponent):
    def setup(self):
        # Global design variables
        self.add_input('r', val= [3,3,3])
        self.add_input('T', val= 20)
        # Coupling output
        self.add_output('M', val=0)
        self.add_output('cost', val=0)
    def setup_partials(self):
        # Finite difference all partials.
        self.declare_partials('*', '*', method='fd')
    def compute(self, inputs, outputs):
        # define inputs
        r = inputs['r']
        T = inputs['T'][0]
        cost = 174.42 * T * (r[0]**2 + 2*r[1]**2 + r[2]**2 + r[0]*r[1] + r[1]*r[2])
        M = 456.19 * T * (r[0]**2 + 2*r[1]**2 + r[2]**2 + r[0]*r[1] + r[1]*r[2]) - 599718
        outputs['M'] = M
        outputs['cost'] = cost

class MDA(om.Group):
    class ObjCmp(om.ExplicitComponent):
        def setup(self):
            # Global Design Variable
            self.add_input('cost', val=0)
            # Output
            self.add_output('obj', val=0.0)
        def setup_partials(self):
            # Finite difference all partials.
            self.declare_partials('*', '*', method='fd')
        def compute(self, inputs, outputs):
            outputs['obj'] = inputs['cost']
    class ConCmp(om.ExplicitComponent):
        def setup(self):
            # Global Design Variable
            self.add_input('M', val=0)
            # Output
            self.add_output('con', val=0.0)
        def setup_partials(self):
            # Finite difference all partials.
            self.declare_partials('*', '*', method='fd')
        def compute(self, inputs, outputs):
            # assemble outputs
            outputs['con'] = inputs['M']
    def setup(self):
        self.add_subsystem('d1', d1(), promotes_inputs=['r','T'],
                           promotes_outputs=['M','cost'])
        self.add_subsystem('con_cmp', self.ConCmp(), promotes_inputs=['M'],
                           promotes_outputs=['con'])
        self.add_subsystem('obj_cmp', self.ObjCmp(), promotes_inputs=['cost'],
                           promotes_outputs=['obj'])

# Build the model
prob = om.Problem(model=MDA())
model = prob.model
model.add_design_var('r', lower= [3,3,3], upper= [10,10,10])
model.add_design_var('T', lower= 20, upper= 220)
model.add_objective('obj', scaler=1)
model.add_constraint('con', lower=0)
# Setup the optimization
prob.driver = om.ScipyOptimizeDriver(optimizer='SLSQP', tol=1e-3, disp=True)
prob.setup()
prob.set_solver_print(level=0)
prob.run_driver()
# Printout
print('minimum found at')
print(prob.get_val('T')[0])
print(prob.get_val('r'))
print('constraint')
print(prob.get_val('con')[0])
print('minimum objective')
print(prob.get_val('obj')[0])
Based on your provided test case, the problem here is that you have a really poorly scaled objective and constraint (you also have some very strange coding choices, which I modified).
Running the OpenMDAO scaling report shows that your objective and constraint values are both around 1e6 in magnitude.
This is quite large, and is the source of your problems. A (very rough) rule of thumb is that your objectives and constraints should be around order 1. That's not a hard and fast rule, but it is generally a good starting point. Sometimes other scaling will work better, for example if you have very large or very small derivatives, but there are parts of SQP methods that are sensitive to the scaling of objective and constraint values directly. So trying to keep them roughly in the range of 1 is a good idea.
Adding ref=1e6 to both objective and constraints gave enough resolution for the numerical methods to converge the problem:
Current function value: [0.229372]
Iterations: 8
Function evaluations: 8
Gradient evaluations: 8
Optimization Complete
-----------------------------------
minimum found at
20.00006826587515
[3.61138704 3. 3.61138704]
constraint
197.20821903413162
minimum objective
229371.99547899762
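To make the effect of ref concrete, here is a small standalone sketch of the scaling arithmetic, evaluated at the initial design point of the test case (T = 20, r = [3, 3, 3]). The scale helper is hypothetical and only mirrors my understanding of the convention (ref is the raw value that maps to 1.0, ref0 the raw value that maps to 0.0); it is not OpenMDAO code.

```python
def scale(value, ref=1.0, ref0=0.0):
    """Hypothetical helper: map a raw value so that ref0 -> 0.0 and ref -> 1.0."""
    return (value - ref0) / (ref - ref0)


# Initial design point from the test case
T = 20.0
r = [3.0, 3.0, 3.0]

shape = r[0]**2 + 2*r[1]**2 + r[2]**2 + r[0]*r[1] + r[1]*r[2]
cost = 174.42 * T * shape            # raw objective
M = 456.19 * T * shape - 599718      # raw constraint value

scaled_cost = scale(cost, ref=1e6)   # what the optimizer would see with ref=1e6
scaled_M = scale(M, ref=1e6)

print("raw:   ", cost, M)
print("scaled:", scaled_cost, scaled_M)
```

With ref=1e6 both quantities land well below 1 in magnitude, which is the range the rule of thumb above is aiming for.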
Here is the code I modified (including removing the extra class definitions inside your group that didn't seem to be doing anything):
import numpy as np
import openmdao.api as om

class d1(om.ExplicitComponent):
    def setup(self):
        # Global design variables
        self.add_input('r', val= [3,3,3])
        self.add_input('T', val= 20)
        # Coupling output
        self.add_output('M', val=0)
        self.add_output('cost', val=0)
    def setup_partials(self):
        # Complex-step all partials.
        self.declare_partials('*', '*', method='cs')
    def compute(self, inputs, outputs):
        # define inputs
        r = inputs['r']
        T = inputs['T'][0]
        cost = 174.42 * T * (r[0]**2 + 2*r[1]**2 + r[2]**2 + r[0]*r[1] + r[1]*r[2])
        M = 456.19 * T * (r[0]**2 + 2*r[1]**2 + r[2]**2 + r[0]*r[1] + r[1]*r[2]) - 599718
        outputs['M'] = M
        outputs['cost'] = cost

class MDA(om.Group):
    def setup(self):
        self.add_subsystem('d1', d1(), promotes_inputs=['r','T'],
                           promotes_outputs=['M','cost'])
        # self.add_subsystem('con_cmp', self.ConCmp(), promotes_inputs=['M'],
        #                    promotes_outputs=['con'])
        # self.add_subsystem('obj_cmp', self.ObjCmp(), promotes_inputs=['cost'],
        #                    promotes_outputs=['obj'])

# Build the model
prob = om.Problem(model=MDA())
model = prob.model
model.add_design_var('r', lower= [3,3,3], upper= [10,10,10])
model.add_design_var('T', lower= 20, upper= 220)
model.add_objective('cost', ref=1e6)
model.add_constraint('M', lower=0, ref=1e6)
# Setup the optimization
prob.driver = om.ScipyOptimizeDriver(optimizer='SLSQP', tol=1e-3, disp=True)
prob.setup()
prob.set_solver_print(level=0)
prob.set_val('r', 7.65)
prob.run_driver()
# Printout
print('minimum found at')
print(prob.get_val('T')[0])
print(prob.get_val('r'))
print('constraint')
print(prob.get_val('M')[0])
print('minimum objective')
print(prob.get_val('cost')[0])
Which SLSQP method are you using? There is one implementation in pyOptSparse and one in ScipyOptimizeDriver. The one in pyOptSparse is older and doesn't respect bound constraints. The one in SciPy is newer and does. (Yes, it's very confusing that they have the same name and share some lineage, but they are not the same optimizer any more.)
You shouldn't raise an analysis error when you go outside the bounds. If you need bounds to be strictly respected, I suggest using IPopt from within pyOptSparse (if you can get it to compile) or switching to ScipyOptimizeDriver and its SLSQP implementation.
Based on your question, it's not totally clear to me whether you're talking about bound constraints or inequality/equality constraints. If it's the latter, then there isn't any optimizer that would guarantee you remain in a feasible region all the time. Interior point methods like IPopt will stay inside the region much better, but not 100% of the time.
In general, with gradient-based optimization it's pretty critical that you make your problem smooth and continuous even when it's outside the constraint areas. If there are parts of the space that you absolutely cannot go into, then you need to make those variables into design variables and use bound constraints. This sometimes requires reformulating your problem a little bit, possibly by adding a kind of compatibility constraint that says "design variable = computed_value". Then you can make sure that the design variable is passed into anything that requires the value to be strictly within a bound, and (hopefully) a converged answer will also satisfy your compatibility constraint.
If you provide some kind of a test case or example, I can amend my answer with a more specific suggestion.

Finding out reason of Pyomo model infeasibility

I have a Pyomo concrete model with lots of variables and constraints.
Somehow, one of the variables in my model violates a constraint, which makes the model infeasible:
WARNING: Loading a SolverResults object with a warning status into model=xxxx;
message from solver=Model was proven to be infeasible.
Is there a way to ask the solver for the reason for the infeasibility?
For example, let's assume I have a variable called x. If I define the following two constraints, the model will of course be infeasible.
const1:
x >= 10
const2:
x <= 5
What I want is to have the constraints and variables that cause the infeasibility pointed out, so that I can fix them. Otherwise, with my big model, it is hard to tell what is causing the infeasibility.
IN: write_some_comment
OUT: variable "x" cannot fulfill "const1" and "const2" at the same time.
Many solvers (including IPOPT) will hand you back the value of the variables at solver termination, even if the problem was found infeasible. At that point, you do have some options.
There is contributed code in pyomo.util.infeasible that might help you out. https://github.com/Pyomo/pyomo/blob/master/pyomo/util/infeasible.py
Usage:
from pyomo.util.infeasible import log_infeasible_constraints
...
SolverFactory('your_solver').solve(model)
...
log_infeasible_constraints(model)
I would not trust any numbers that the solver loads into the model after reporting "infeasible." I don't think any solvers come w/ guarantees on the validity of those numbers. Further, unless a package can divine the modeler's intent, it isn't clear how it would list the infeasible constraints. Consider 2 constraints:
C1: x <= 5
C2: x >= 10
X ∈ Reals, or Integers, ...
Which is the invalid constraint? Well, it depends! Point being, it seems an impossible task to unwind the mystery based on values the solver tries.
A possible alternate strategy: Load the model with what you believe to be a valid solution, and test the slack on the constraints. This "loaded solution" could even be a null case where everything is zero'ed out (if that makes sense in the context of the model). It could also be a set of known feasible solutions tried via unit test code.
If you can construct what you believe to be a valid solution (forget about optimal, just something valid), you can (1) load those values, (2) iterate through the constraints in the model, (3) evaluate the constraint and look for negative slack, and (4) report the culprits with values and expressions
An example:
import pyomo.environ as pe

test_null_case = True
m = pe.ConcreteModel('sour constraints')
# SETS
m.T = pe.Set(initialize=['foo', 'bar'])
# VARS
m.X = pe.Var(m.T)
m.Y = pe.Var()
# OBJ
m.obj = pe.Objective(expr = sum(m.X[t] for t in m.T) + m.Y)
# Constraints
m.C1 = pe.Constraint(expr=sum(m.X[t] for t in m.T) <= 5)
m.C2 = pe.Constraint(expr=sum(m.X[t] for t in m.T) >= 10)
m.C3 = pe.Constraint(expr=m.Y >= 7)
m.C4 = pe.Constraint(expr=m.Y <= sum(m.X[t] for t in m.T))

if test_null_case:
    # set values of all variables to a "known good" solution...
    m.X.set_values({'foo':1, 'bar':3}) # index:value
    m.Y.set_value(2) # scalar
    for c in m.component_objects(ctype=pe.Constraint):
        if c.slack() < 0: # constraint is not met
            print(f'Constraint {c.name} is not satisfied')
            c.display() # show the evaluation of c
            c.pprint() # show the construction of c
            print()
else:
    pass
    # instantiate solver & solve, etc...
Reports:
Constraint C2 is not satisfied
C2 : Size=1
Key : Lower : Body : Upper
None : 10.0 : 4 : None
C2 : Size=1, Index=None, Active=True
Key : Lower : Body : Upper : Active
None : 10.0 : X[foo] + X[bar] : +Inf : True
Constraint C3 is not satisfied
C3 : Size=1
Key : Lower : Body : Upper
None : 7.0 : 2 : None
C3 : Size=1, Index=None, Active=True
Key : Lower : Body : Upper : Active
None : 7.0 : Y : +Inf : True

Tensorflow Prediction Meanings

I've been working on a DNNClassifier model in Tensorflow, and I have gotten the model to train, evaluate and output results.
Here is the code I'm using to output my predictions:
predictions = classifier.predict(input_fn=predict_input_fn)
i = 0
for j, p in enumerate(predictions):
    print("Prediction %s: %s" % (j + 1, p["probabilities"]))
    i = i + 1
    if i > 100:
        break
i is being used to limit the results for testing purposes, because there's roughly 16,000 results being output and I don't need to see all of them for the purposes of what I'm trying to do right now.
The output I'm getting looks like this:
Prediction 1: [ 5.11678644e-02 9.48832154e-01 3.84762299e-37]
Prediction 2: [ 0.0352843 0.96471566 0. ]
Prediction 3: [ 1.04001068e-04 9.99895930e-01 0.00000000e+00]
Prediction 4: [ 0.0323724 0.96762753 0. ]
I know that these are probabilities of some kind, but I can't find documentation on what each one means. There's three per row, but only two categories, so I am guessing that one of them is a measure of certainty?
I realize that this is not strictly a programming question, rather it is more about output and documentation. However, before asking, I did look on other StackExchange sites and the TensorFlow documentation itself to try and find a better place to ask/an answer. The AI Stack Exchange website is still in beta and appears to have very little TensorFlow related activity (which is understandable, since many TensorFlow questions are programming questions), and I have had reasonable success on StackOverflow before when it comes to TF questions.
It looks like you are classifying between three categories (labels). So what you are seeing in the predictions is your network's weighted guess for each possible category (label). For instance, in your first prediction the network results can be interpreted as: there is a ~5% chance of the data belonging to the first category (label), a ~95% chance of the data belonging to the second category (label), and a ~0% chance of the data belonging to the third category (label).
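As a quick sanity check of that reading, the four rows printed in the question can be treated as per-class probabilities: each row should sum to roughly 1, and the index of the largest entry is the predicted label. This is only a sketch on the numbers copied from the question; the variable names are made up.

```python
import numpy as np

# The four probability rows shown in the question
probs = np.array([
    [5.11678644e-02, 9.48832154e-01, 3.84762299e-37],
    [0.0352843, 0.96471566, 0.0],
    [1.04001068e-04, 9.99895930e-01, 0.0],
    [0.0323724, 0.96762753, 0.0],
])

row_sums = probs.sum(axis=1)        # each row is a distribution over the labels
predicted = probs.argmax(axis=1)    # zero-based index of the most likely label

print(row_sums)
print(predicted)
```

If only two categories are expected, a third column like this usually points at how the number of classes was configured rather than at a separate certainty measure.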

Stan version of a JAGS model which includes a sum of discrete values - Is it possible?

I was trying to run this model in Stan. I have a running JAGS version of it (that returns highly autocorrelated parameters) and I know how to formulate it as CDF of a double exponential (with two rates), which would probably run without problems. However, I would like to use this version as a starting point for similar but more complex models.
By now I suspect that a model like this is not possible in Stan. Maybe because of the discreteness introduced by taking the sum of a Boolean value, Stan may not be able to calculate gradients.
Does anyone know whether this is the case, or if I do something else in a wrong way in this model? I paste the errors I get below the model code.
Many thanks in advance
Jan
Model:
data {
int y[11];
int reps[11];
real soas[11];
}
parameters {
real<lower=0.001,upper=0.200> v1;
real<lower=0.001,upper=0.200> v2;
}
model {
real dif[11,96];
real cf[11];
real p[11];
real t1[11,96];
real t2[11,96];
for (i in 1:11){
for (r in 1:reps[i]){
t1[i,r] ~ exponential(v1);
t2[i,r] ~ exponential(v2);
dif[i,r] <- (t1[i,r]+soas[i]<=(t2[i,r]));
}
cf[i] <- sum(dif[i]);
p[i] <-cf[i]/reps[i];
y[i] ~ binomial(reps[i],p[i]);
}
}
Here is some dummy data:
psy_dat = {
'soas' : numpy.array(range(-100,101,20)),
'y' : [47, 46, 62, 50, 59, 47, 36, 13, 7, 2, 1],
'reps' : [48, 48, 64, 64, 92, 92, 92, 64, 64, 48, 48]
}
And here are the errors:
DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal): Left-hand side of sampling statement (~) contains a non-linear transform of a parameter or local variable.
You must call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.
Sampling Statement left-hand-side expression:
get_base1(get_base1(t1,i,"t1",1),r,"t1",2) ~ exponential_log(...)
Warning (non-fatal): Left-hand side of sampling statement (~) contains a non-linear transform of a parameter or local variable.
You must call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.
Sampling Statement left-hand-side expression:
get_base1(get_base1(t2,i,"t2",1),r,"t2",2) ~ exponential_log(...)
And at runtime:
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
stan::prob::exponential_log(N4stan5agrad3varE): Random variable is nan:0, but must not be nan!
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Rejecting proposed initial value with zero density.
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model
Here is a working JAGS version of this model:
model {
for ( n in 1 : N ) {
for (r in 1 : reps[n]){
t1[r,n] ~ dexp(v1)
t2[r,n] ~ dexp(v2)
c[r,n] <- (1.0*((t1[r,n]+durs[n])<=t2[r,n]))
}
p[n] <- max(min(sum(c[,n]) / reps[n], 0.99999999999999), 1-0.99999999999999)
y[n] ~ dbin(p[n],reps[n])
}
v1 ~ dunif(0.0001,0.2)
v2 ~ dunif(0.0001,0.2)
}
With regard to the min() and max(): See this post https://stats.stackexchange.com/questions/130978/observed-node-inconsistent-when-binomial-success-rate-exactly-one?noredirect=1#comment250046_130978.
I'm still not sure what model you are trying to estimate (it would be best if you post the JAGS code) but what you have above cannot work in Stan. Stan is closer to C++ in the sense that you have to declare and then define objects. In your Stan program, you have the two declarations
real t1[11,96];
real t2[11,96];
but no definitions of t1 or t2. Consequently, they are initialized to NaN and when you do
t1[i,r] ~ exponential(v1);
that gets parsed as something like
for(i in 1:11) for(r in 1:reps[i]) lp__ += log(v1) - v1 * NaN
where lp__ is an internal symbol that holds the value of the log-posterior. It becomes NaN, so Stan cannot do Metropolis-style updates of the parameters.
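The NaN propagation is easy to reproduce outside Stan. The sketch below uses plain Python floats; exponential_lpdf is a hypothetical helper that mirrors the log(v1) - v1 * t expression above, not a Stan function.

```python
import math


def exponential_lpdf(t, rate):
    """Log density of an exponential with the given rate, evaluated at t."""
    return math.log(rate) - rate * t


v1 = 0.1

# A defined value gives a finite contribution to the log-posterior
print(exponential_lpdf(5.0, v1))

# An undefined (NaN) value poisons the whole accumulator
lp = 0.0
lp += exponential_lpdf(float("nan"), v1)
print(lp, math.isnan(lp))
```

Once a single NaN term has been added, every later addition leaves the accumulator at NaN, which is why the sampler cannot evaluate any proposal.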
Perhaps you meant for t1 and t2 to be unknown parameters, in which case they should be declared in the parameters block. The following [EDITED] Stan program is valid and should work, but it might not be the program you had in mind (it does not make a lot of sense to me and the discontinuity in dif will probably preclude Stan from sampling from this posterior distribution efficiently).
data {
int<lower=1> N;
int y[N];
int reps[N];
real soas[N];
}
parameters {
real<lower=0.001,upper=0.200> v1;
real<lower=0.001,upper=0.200> v2;
real t1[N,max(reps)];
real t2[N,max(reps)];
}
model {
for (i in 1:N) {
real dif[reps[i]];
for (r in 1:reps[i]) {
dif[r] <- (t1[i,r]+soas[i]) <= t2[i,r];
}
y[i] ~ binomial(reps[i], (1.0 + sum(dif)) / (1.0 + reps[i]));
}
to_array_1d(t1) ~ exponential(v1);
to_array_1d(t2) ~ exponential(v2);
}
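The question mentions that the same model can be written in closed form, which avoids the discontinuous indicator altogether. A minimal sketch of that probability follows, assuming t1 and t2 are independent exponentials with rates v1 and v2 as in the model. The helper name p_t1_first and the rate value 0.05 are made up for illustration, and the derivation is mine rather than the poster's.

```python
import math


def p_t1_first(v1, v2, soa):
    """P(t1 + soa <= t2) for independent t1 ~ Exp(rate v1), t2 ~ Exp(rate v2)."""
    if soa >= 0:
        # t2 has to outlast both the offset and t1
        return v1 / (v1 + v2) * math.exp(-v2 * soa)
    # negative offset: one minus the mirrored event P(t2 + |soa| < t1)
    return 1.0 - v2 / (v1 + v2) * math.exp(v1 * soa)


soas = list(range(-100, 101, 20))     # same grid as the dummy data
probs = [p_t1_first(0.05, 0.05, s) for s in soas]
for s, p in zip(soas, probs):
    print(s, round(p, 4))
```

A smooth expression like this could be passed to the binomial as its success probability, so that no discrete intermediate quantity is needed.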

cumulative simpson integration with scipy

I have some code which uses scipy.integrate.cumtrapz to compute the antiderivative of a sampled signal. I would like to use Simpson's rule instead of the trapezoidal rule. However, scipy.integrate.simps seems not to have a cumulative counterpart... Am I missing something? Is there a simple way to get a cumulative integration with "scipy.integrate.simps"?
You can always write your own:
import numpy as np

def cumsimp(func,a,b,num):
    #Integrate func from a to b using num intervals.
    num*=2                      # each interval is split into two half-steps
    a=float(a)
    b=float(b)
    h=(b-a)/num
    # midpoints carry weight 4
    output=4*func(a+h*np.arange(1,num,2))
    # interior edges are shared by the two neighbouring intervals
    tmp=func(a+h*np.arange(2,num-1,2))
    output[1:]+=tmp
    output[:-1]+=tmp
    # outer edges
    output[0]+=func(a)
    output[-1]+=func(b)
    return np.cumsum(output*h/3)

def integ1(x):
    return x

def integ2(x):
    return x**2

def integ0(x):
    return np.ones(np.asarray(x).shape)*5
First look at the sum and derivative of a constant function.
print(cumsimp(integ0,0,10,5))
[ 10. 20. 30. 40. 50.]
print(np.diff(cumsimp(integ0,0,10,5)))
[ 10. 10. 10. 10.]
Now check for a few trivial examples:
print(cumsimp(integ1,0,10,5))
[ 2. 8. 18. 32. 50.]
print(cumsimp(integ2,0,10,5))
[ 2.66666667 21.33333333 72. 170.66666667 333.33333333]
Writing your integrand explicitly is much easier here than reproducing SciPy's Simpson's rule function in this context. Picking intervals is difficult when you are given a single array. Do you either:
Use every other value for the edges of Simpson's rule and the remaining values as centers?
Use the array as edges and interpolate values of centers?
There are also a few options for how you want the intervals summed. These complications could be why it's not coded in SciPy.
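For sampled data, the first option (every other sample is a panel edge, the samples in between are centers) can be sketched as below. This is a minimal sketch under the assumption of equally spaced samples and an odd sample count; cumulative_simpson_pairs is a made-up name. As far as I know, recent SciPy releases (1.12 and later) also provide scipy.integrate.cumulative_simpson, which is worth checking before rolling your own.

```python
import numpy as np


def cumulative_simpson_pairs(y, dx):
    """Cumulative Simpson integral of equally spaced samples.

    Returns the integral at every second sample (x[0], x[2], x[4], ...),
    starting at 0.0. Requires an odd number of samples (>= 3).
    """
    y = np.asarray(y, dtype=float)
    if y.size < 3 or y.size % 2 == 0:
        raise ValueError("need an odd number of samples (>= 3)")
    # One Simpson panel per pair of intervals: edge, center, edge
    panels = dx / 3.0 * (y[:-2:2] + 4.0 * y[1:-1:2] + y[2::2])
    return np.concatenate(([0.0], np.cumsum(panels)))


x = np.linspace(0.0, 10.0, 11)
result = cumulative_simpson_pairs(x**2, dx=x[1] - x[0])
print(result)
print(x[::2]**3 / 3.0)   # exact antiderivative at the same points
```

The result is only defined at every second sample; filling in the odd samples is exactly where the "how to sum the intervals" choices mentioned above come in.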
Your question was answered a long time ago, but I came across the same problem recently. I wrote some functions to compute such cumulative integrals for equally spaced points; the code can be found on GitHub. The order of the interpolating polynomials ranges from 1 (trapezoidal rule) to 7. As Daniel pointed out in the previous answer, some choices have to be made on how the intervals are summed, especially at the borders; results may thus be slightly different depending on the package you use. Also be aware that the numerical integration may suffer from Runge's phenomenon (unexpected oscillations) for high polynomial orders.
Here is an example:
import numpy as np
from scipy import integrate as sp_integrate
from gradiompy import integrate as gp_integrate
# Definition of the function (polynomial of degree 7)
x = np.linspace(-3,3,num=15)
dx = x[1]-x[0]
y = 8*x + 3*x**2 + x**3 - 2*x**5 + x**6 - 1/5*x**7
y_int = 4*x**2 + x**3 + 1/4*x**4 - 1/3*x**6 + 1/7*x**7 - 1/40*x**8
# Cumulative integral using scipy
y_int_trapz = y_int[0] + sp_integrate.cumulative_trapezoid(y,dx=dx,initial=0)
print('Integration error using scipy.integrate:')
print(' trapezoid = %9.5f' % np.linalg.norm(y_int_trapz-y_int))
# Cumulative integral using gradiompy
y_int_trapz = gp_integrate.cumulative_trapezoid(y,dx=dx,initial=y_int[0])
y_int_simps = gp_integrate.cumulative_simpson(y,dx=dx,initial=y_int[0])
print('\nIntegration error using gradiompy.integrate:')
print(' trapezoid = %9.5f' % np.linalg.norm(y_int_trapz-y_int))
print(' simpson = %9.5f' % np.linalg.norm(y_int_simps-y_int))
# Higher order cumulative integrals
for order in range(5,8,2):
    y_int_composite = gp_integrate.cumulative_composite(y,dx,order=order,initial=y_int[0])
    print(' order %i = %9.5f' % (order,np.linalg.norm(y_int_composite-y_int)))
# Display the values of the cumulative integral
print('\nCumulative integral (with initial offset):\n',y_int_composite)
You should get the following result:
'''
Integration error using scipy.integrate:
trapezoid = 176.10502
Integration error using gradiompy.integrate:
trapezoid = 176.10502
simpson = 2.52551
order 5 = 0.48758
order 7 = 0.00000
Cumulative integral (with initial offset):
[-6.90203571e+02 -2.29979407e+02 -5.92267425e+01 -7.66415188e+00
2.64794452e+00 2.25594840e+00 6.61937372e-01 1.14797061e-13
8.20130517e-01 3.61254267e+00 8.55804341e+00 1.48428883e+01
1.97293221e+01 1.64257877e+01 -1.13464286e+01]
'''
I would go with Daniel's solution. But you need to be careful if the function that you are integrating is itself subject to fluctuations. Simpson's rule requires the function to be well-behaved (meaning, in this case, continuous).
There are techniques for making a moderately badly behaved function look better behaved than it really is (these are really forms of approximation of your function), but in that case you have to be sure that the approximation "adequately" matches your function. You might then make the intervals non-uniform to handle the problem.
An example might be in considering the flow of a field that, over longer time scales, is approximated by a well-behaved function but which over shorter periods is subject to limited random fluctuations in its density.