Three asterisks (***) in GAMS - gams-math

I have some questions about the three asterisks (***) in GAMS that may be shown at the end of an individual equation listing. I know they are a warning that the constraint is infeasible at the starting point. I have a model that after solving it by GAMS, the model status and solver status are ‘1’ and the equation seems to be feasible, but at the end of an equation three asterisks are shown.
I want to know:
1) What is the starting point?
2) Is the model infeasible?
I really appreciate your kind helps.

The starting point (or initial point) is comprised of the values of the decision variables just before the solver starts optimizing them. You can change the initial values by assigning say x.l(i) = 5. Note that the default in GAMS is zero (but may be projected to the closest bound just before the solve: variables will always between their bounds in the initial point).
These *** mean that the initial value is such that some equations are not feasible. In general this is not something to worry about. The solver will try to return a feasible and optimal point even if the initial point is not feasible. Sometimes we want to make sure the initial point is feasible for performance reasons, and then this part of the listing file can be a useful debugging tool.
Note that an infeasible initial point does not say anything about whether the model is infeasible.

Related

Multi-objective optimization but the function equation is unknown?

Firstly, I am totally out of my expertise zone so please bear with me.
I developed a fluid dynamic engine with 5 exposed parameters (say A,B,C,D,E). When you give this engine these 5 parameters, it does magic and give out a value 'Z'.
I want to write a script which can explore which combinations of A-E give lowest (or close to lowest) value of Z.
I know optimization algorithm exists, but from all of my search for examples, they use some function.
So I guess my function would simply be minimize Z? But where do A-E go?
Not really an answer, but some questions and ideas that might help you think through the best way to address this. We have no understanding of how big a range of values needs to be explored for those parameters, or how Z behaves, so this is very vague...
If you look at the values of Z for given values of A...E, does the value of Z jump around a lot for small changes on the parameter values, or does the Z value change reasonably smoothly?
If the Z value is not too eratic you could try some kind of gradient descent approach using calculated values of Z for some values of the parameters to approximate the gradient - suppose changing the value of 'A' from 1 to 2 gives a better change in the Z value than a similar size change in the other parameters, then try other values of A while keeping the other parameters fixed until you find a value of A that gives the best value of Z. Then try changing the other parameter values to see which one gives the steepest descent and try to find some better value for that parameter. Repeat this process until you can't find any improvement and you will have found a (local) minimum. You could then start at a different place in your parameter space and try again - you will probably find several local minima, and may just choose the best of those. Not provably optimal but may be good enough. Of course you can get clever and use things like conjugate gradients, Newton-Raphson or similar if Z is smooth enough.
If the Z values are very eratic, then you might have to just do some sampling of the possible combinations of A...E to get values of Z and choose the best you can find. Again you might do that in some systematic way (e.g. points on a grid in your parameter space) or entirely at random, or a combination of both.
If you find that there are 'clusters' of good solutions with similar values of the parameters then maybe some kind of local search would help - the idea is that there is often a better solution in the local neighbourhood of a known good solution. So maybe try perturbing your parameter values a bit from a known solution to see if that can lead to a better solution - either by some gradient descent method or by random sampling.
Unfortunately, if your Z calculation is complex, then any method using it as a black box will likely be slow as it will need to be re-evaluated many times.
You could use a Genetic Algorithm, where your chromosomes are formed with the 5 candidate values of the variables you have to optimize, to minimize Z, and your optimization/fitness "function" is the simulation itself outputting Z.
Other viable alternatives are Particle Swarm Optimization algorithm or Ant Colony Optimization. All of those are usable algortihms for that kind of optimization problem.

How to constrain dtw from dtw-python library?

Here is what I want to do:
keep a reference curve unchanged (only shift and stretch a query curve)
constrain how many elements are duplicated
keep both start and end open
I tried:
dtw(ref_curve,query_curve,step_pattern=asymmetric,open_end=True,open_begin=True)
but I cannot constrain how the query curve is stretched
dtw(ref_curve,query_curve,step_pattern=mvmStepPattern(10))
it didn’t do anything to the curves!
dtw(ref_curve,query_curve,step_pattern=rabinerJuangStepPattern(4, "c"),open_end=True, open_begin=True)
I liked this one the most but in some cases it shifts the query curve more than needed...
I read the paper (https://www.jstatsoft.org/article/view/v031i07) and the API but still don't quite understand how to achieve what I want. Any other options to constrain number of elements that are duplicated? I would appreciate your help!
to clarify: we are talking about functions provided by the DTW suite packages at dynamictimewarping.github.io. The question is in fact language-independent (and may be more suited to the Cross-validated Stack Exchange).
The pattern rabinerJuangStepPattern(4, "c") you have found does in fact satisfy your requirements:
it's asymmetric, and each step advances the reference by exactly one step
it's slope-limited between 1/2 and 2
it's type "c", so can be normalized in a way that allows open-begin and open-end
If you haven't already, check out dtw.rabinerJuangStepPattern(4, "c").plot().
It goes without saying that in all cases you are getting is the optimal alignment, i.e. the one with the least accumulated distance among all allowed paths.
As an alternative, you may consider the simpler asymmetric recursion -- as your first attempt above -- constrained with a global warping window: see dtw.window and the window_type argument. This provides constraints of a different shape (and flexible size), which might suit your specific case.
PS: edited to add that the asymmetricP2 recursion is also similar to RJ-4c, but with a more constrained slope.

Range of possible values for alpha, gamma and eta params of HLDA's Mallet implementation

I'm trying to run the hlda algorytmm and producing a descriptive hierarchy of the input documents. The problem is I'm running diverse parameters configs and trying to understand how it works in an "empirical way", because I can not match the ones that are being used in the original papers (I understand it's a different team). E.g. alpha in Mallet seems to be eta in the paper, but I'm not very sure. Besides, I can not know the boundaries for each of them. I mean, the range of possible values for each parameter.
In the source code, there is some help:
double alpha; // smoothing on topic distributions
double gamma; // "imaginary" customers at the next
double eta; // smoothing on word distributions.
First, I used the default values: alpha=10.0; gamma=1.0; eta = 0.1;
Then, I tryed running the algorythm by changing the values and interpret the results, but I can't understand the meaning of them. E.g. I think changing gamma (in Mallet) has an effect on the customers decition: to start a new node in the tree or to be placed in an existing one. So, if I set gamma = 0.5, less nodes should be produced, because 0.5 is half the probability of the default one, right? But the results with gamma=1 give me 87 nodes, and with gamma=0.5, it returns 98! And then, I'm asking me something new: is that a probability? I was trying to find the range of possible values in these two papers, but I didn't find them:
Hierarchical Topic Models andthe Nested Chinese Restaurant Process
The Nested Chinese Restaurant Process and BayesianNonparametric Inference of Topic Hierarchies
I know I could be missing something, because I don't have the a good background on this, but that's why I'm asking here, maybe someone already had this problem and can help me understanding those limits.
Thanks in advance!
It may be helpful to run multiple times with each hyperparameter setting. I suspect that gamma does not have a big influence on the final number of topics, and that what you are seeing could just be typical variability in the sampling process.
In my experience the parameter that has by far the strongest influence on the number of topics is actually eta, the topic-word smoothing.

z3 minimization and timeout

I try to use the z3 solver for a minimization problem. I was trying to get a timeout, and return the best solution so far. I use the python API, and the timeout option "smt.timeout" with
set_option("smt.timeout", 1000) # 1s timeout
This actually times out after about 1 second. However a larger timeout does not provide a smaller objective. I ended up turning on the verbosity with
set_option("verbose", 2)
And I think that z3 successively evaluates larger values of my objective, until the problem is satisfiable:
(opt.maxres [0:6117664])
(opt.maxres [175560:6117664])
(opt.maxres [236460:6117664])
(opt.maxres [297360:6117664])
...
(opt.maxres [940415:6117664])
(opt.maxres [945805:6117664])
...
I thus have the two questions:
Can I on contrary tell z3 to start with the upper bound, and successively return models with a smaller value for my objective function (just like for instance Minizinc annotations indomain_max http://www.minizinc.org/2.0/doc-lib/doc-annotations-search.html)
It still looks like the solver returns a satisfiable instance of my problem. How is it found? If it's trying to evaluates larger values of my objective successively, it should not have found a satisfiable instance yet when the timeout occurs...
edit: In the opt.maxres log, the upper bound never shrinks.
For the record, I found a more verbose description of the options in the source here opt_params.pyg
Edit Sorry to bother, I've beed diving into this recently once again. Anyway I think this might be usefull to others. I've been finding that I actually have to call the Optimize.upper method in order to get the upper bound, and the model is still not the one that corresponds to this upper bound. I've been able to add it as a new constraint, and call a solver (without optimization, just SAT), but that's probably not the best idea. By reading this I feel like I should call Optimize.update_upper after the solver times out, but the python interface has no such method (?). At least I can get the upper bound, and the corresponding model now (at the cost of unneccessary computations I guess).
Z3 finds solutions for the hard constraints and records the current values for the objectives and soft constraints. The last model that was found (the last model with the so-far best value for the objectives) is returned if you ask for a model. The maxres strategy mainly improves the lower bounds on the soft constraints (e.g., any solution must have cost at least xx) and whenever possible improves the upper bound (the optional solution has cost at most yy). The lower bounds don't tell you too much other than narrowing the range of possible optimal values. The upper bounds are available when you timeout.
You could try one of the other strategies, such as the one called "wmax", which
performs a branch-and-prune. Typically maxres does significantly better, but you may have better experience (depending on the problems) with wmax for improving upper bounds.
I don't have a mode where you get a stream of models. It is in principle possible, but it would require some (non-trivial) reorganization. For Pareto fronts you make successive invocations to Optimize.check() to get the successive fronts.

Building ranking with genetic algorithm,

Question after BIG edition :
I need to built a ranking using genetic algorithm, I have data like this :
P(a>b)=0.9
P(b>c)=0.7
P(c>d)=0.8
P(b>d)=0.3
now, lets interpret a,b,c,d as names of football teams, and P(x>y) is probability that x wins with y. We want to build ranking of teams, we lack some observations P(a>d),P(a>c) are missing due to lack of matches between a vs d and a vs c.
Goal is to find ordering of team names, which the best describes current situation in that four team league.
If we have only 4 teams than solution is straightforward, first we compute probabilities for all 4!=24 orderings of four teams, while ignoring missing values we have :
P(abcd)=P(a>b)P(b>c)P(c>d)P(b>d)
P(abdc)=P(a>b)P(b>c)(1-P(c>d))P(b>d)
...
P(dcba)=(1-P(a>b))(1-P(b>c))(1-P(c>d))(1-P(b>d))
and we choose the ranking with highest probability. I don't want to use any other fitness function.
My question :
As numbers of permutations of n elements is n! calculation of probabilities for all
orderings is impossible for large n (my n is about 40). I want to use genetic algorithm for that problem.
Mutation operator is simple switching of places of two (or more) elements of ranking.
But how to make crossover of two orderings ?
Could P(abcd) be interpreted as cost function of path 'abcd' in assymetric TSP problem but cost of travelling from x to y is different than cost of travelling from y to x, P(x>y)=1-P(y<x) ? There are so many crossover operators for TSP problem, but I think I have to design my own crossover operator, because my problem is slightly different from TSP. Do you have any ideas for solution or frame for conceptual analysis ?
The easiest way, on conceptual and implementation level, is to use crossover operator which make exchange of suborderings between two solutions :
CrossOver(ABcD,AcDB) = AcBD
for random subset of elements (in this case 'a,b,d' in capital letters) we copy and paste first subordering - sequence of elements 'a,b,d' to second ordering.
Edition : asymetric TSP could be turned into symmetric TSP, but with forbidden suborderings, which make GA approach unsuitable.
It's definitely an interesting problem, and it seems most of the answers and comments have focused on the semantic aspects of the problem (i.e., the meaning of the fitness function, etc.).
I'll chip in some information about the syntactic elements -- how do you do crossover and/or mutation in ways that make sense. Obviously, as you noted with the parallel to the TSP, you have a permutation problem. So if you want to use a GA, the natural representation of candidate solutions is simply an ordered list of your points, careful to avoid repitition -- that is, a permutation.
TSP is one such permutation problem, and there are a number of crossover operators (e.g., Edge Assembly Crossover) that you can take from TSP algorithms and use directly. However, I think you'll have problems with that approach. Basically, the problem is this: in TSP, the important quality of solutions is adjacency. That is, abcd has the same fitness as cdab, because it's the same tour, just starting and ending at a different city. In your example, absolute position is much more important that this notion of relative position. abcd means in a sense that a is the best point -- it's important that it came first in the list.
The key thing you have to do to get an effective crossover operator is to account for what the properties are in the parents that make them good, and try to extract and combine exactly those properties. Nick Radcliffe called this "respectful recombination" (note that paper is quite old, and the theory is now understood a bit differently, but the principle is sound). Taking a TSP-designed operator and applying it to your problem will end up producing offspring that try to conserve irrelevant information from the parents.
You ideally need an operator that attempts to preserve absolute position in the string. The best one I know of offhand is known as Cycle Crossover (CX). I'm missing a good reference off the top of my head, but I can point you to some code where I implemented it as part of my graduate work. The basic idea of CX is fairly complicated to describe, and much easier to see in action. Take the following two points:
abcdefgh
cfhgedba
Pick a starting point in parent 1 at random. For simplicity, I'll just start at position 0 with the "a".
Now drop straight down into parent 2, and observe the value there (in this case, "c").
Now search for "c" in parent 1. We find it at position 2.
Now drop straight down again, and observe the "h" in parent 2, position 2.
Again, search for this "h" in parent 1, found at position 7.
Drop straight down and observe the "a" in parent 2.
At this point note that if we search for "a" in parent one, we reach a position where we've already been. Continuing past that will just cycle. In fact, we call the sequence of positions we visited (0, 2, 7) a "cycle". Note that we can simply exchange the values at these positions between the parents as a group and both parents will retain the permutation property, because we have the same three values at each position in the cycle for both parents, just in different orders.
Make the swap of the positions included in the cycle.
Note that this is only one cycle. You then repeat this process starting from a new (unvisited) position each time until all positions have been included in a cycle. After the one iteration described in the above steps, you get the following strings (where an "X" denotes a position in the cycle where the values were swapped between the parents.
cbhdefga
afcgedbh
X X X
Just keep finding and swapping cycles until you're done.
The code I linked from my github account is going to be tightly bound to my own metaheuristics framework, but I think it's a reasonably easy task to pull the basic algorithm out from the code and adapt it for your own system.
Note that you can potentially gain quite a lot from doing something more customized to your particular domain. I think something like CX will make a better black box algorithm than something based on a TSP operator, but black boxes are usually a last resort. Other people's suggestions might lead you to a better overall algorithm.
I've worked on a somewhat similar ranking problem and followed a technique similar to what I describe below. Does this work for you:
Assume the unknown value of an object diverges from your estimate via some distribution, say, the normal distribution. Interpret your ranking statements such as a > b, 0.9 as the statement "The value a lies at the 90% percentile of the distribution centered on b".
For every statement:
def realArrival = calculate a's location on a distribution centered on b
def arrivalGap = | realArrival - expectedArrival |
def fitness = Σ arrivalGap
Fitness function is MIN(fitness)
FWIW, my problem was actually a bin-packing problem, where the equivalent of your "rank" statements were user-provided rankings (1, 2, 3, etc.). So not quite TSP, but NP-Hard. OTOH, bin-packing has a pseudo-polynomial solution proportional to accepted error, which is what I eventually used. I'm not quite sure that would work with your probabilistic ranking statements.
What an interesting problem! If I understand it, what you're really asking is:
"Given a weighted, directed graph, with each edge-weight in the graph representing the probability that the arc is drawn in the correct direction, return the complete sequence of nodes with maximum probability of being a topological sort of the graph."
So if your graph has N edges, there are 2^N graphs of varying likelihood, with some orderings appearing in more than one graph.
I don't know if this will help (very brief Google searches did not enlighten me, but maybe you'll have more success with more perseverance) but my thoughts are that looking for "topological sort" in conjunction with any of "probabilistic", "random", "noise," or "error" (because the edge weights can be considered as a reliability factor) might be helpful.
I strongly question your assertion, in your example, that P(a>c) is not needed, though. You know your application space best, but it seems to me that specifying P(a>c) = 0.99 will give a different fitness for f(abc) than specifying P(a>c) = 0.01.
You might want to throw in "Bayesian" as well, since you might be able to start to infer values for (in your example) P(a>c) given your conditions and hypothetical solutions. The problem is, "topological sort" and "bayesian" is going to give you a whole bunch of hits related to markov chains and markov decision problems, which may or may not be helpful.