I am interested in Z3 giving me a model and at the same time I would be happy if it tried to take an objective function into account as a heuristic, but I don't want to pay the performance penalty of finding the actual (max|min)imum.
Is this possible in Z3?
This is already possible in Z3 with soft time outs, see this answer: Z3 Time Restricted Optimization
The way to achieve what you want is to use incremental solving:
Assert all your hard constraints. (i.e., those that need to be satisfied)
Do a check-sat and get the model
Assert all your "optimization" constraints.
Use a soft-time out, as described here: Z3 Time Restricted Optimization
In the last step, you can adjust how much you want to wait, i.e., the penalty you are willing to pay.
Related note: You might also want to play with soft-constraints, which the solver can "satisfy" optionally, incurring penalties if it doesn't. Perhaps that's more appropriate for your use case. See here on how to do that: https://rise4fun.com/Z3/tutorialcontent/optimization#h23
Related
I have to solve a linear program with a very large number of constraints. I use MOSEK (mosekopt, with MSK_IPAR_INTPNT_BASIS set equal to MSK_BI_NEVER to save time).
The solver takes time to solve the program due to the large dimension.
I thought about manually coding the following iterative procedure:
Take a random subset of constraints and solve the restricted linear program.
If a solution of the restricted linear program does not exist, stop.
If a solution of the restricted linear program exists, check if it is a solution of the original linear program. If yes, stop. If not, repeat from 1. with a larger set of constraints that includes the constraints of this iteration.
The procedure does not seem to produce a notable saving of time. I wonder whether this is because 1.,2.,3. are essentially what the solver does without needing my input. Could you advise?
Could I do improve things if, when moving from 3. to 1., I supply to mosekopt the old solution of the restricted linear program?
This may or may not be faster, than using Mosek on the complete problem. At least theoretical your approach is inferior.
You say nothing of the dimension of the problem that would be interesting to know.
Or how long it takes to solve the complete problem.
One issue tricky is how many and which constraints you are adding in 3. That will be very important.
I ran into some slow downs on a tight loop today caused by an If statement, which surprised me some because I expected branch prediction to successfully pipeline the particular statement to minimize the cost of the conditional.
When I sat down to think more about why it wasn't better handled I realized I didn't know much about how branch prediction was being handled at all. I know the concept of branch prediction quite well and it's benefits, but the problem is that I didn't know who was implementing it and what approach they were utilizing for predicting the outcome of a conditional.
Looking deeper I know branch prediction can be done at a few levels:
Hardware itself with instruction pipelining
C++ style compiler
Interpreter of interpreted language.
half-compiled language like java may do two and three above.
However, because optimization can be done in many areas I'm left uncertain as to how to anticipate branch prediction. If I'm writing in Java, for example, is my conditional optimized when compiled, when interpreted, or by the hardware after interpretation!? More interesting, does this mean if someone uses a different runtime enviroment? Could a different branch prediction algorithm used in a different interpreter result in a tight loop based around a conditional showing significant different performance depending on which interpreter it's run with?
Thus my question, how does one generalize an optimization around branch prediction if the software could be run on very different computers which may mean different branch prediction? If the hardware and interpreter could change their approach then profiling and using whichever approach proved fastest isn't a guarantee. Lets ignore C++ where you have compile level ability to force this, looking at the interpreted languages if someone still needed to optimize a tight loop within them.
Are there certain presumptions that are generally safe to make regardless of interpreter used? Does one have to dive into the intricate specification of a language to make any meaningful presumption about branch prediction?
Short answer:
To help improve the performance of the branch predictor try to structure your program so that conditional statements don't depend on apparently random data.
Details
One of the other answers to this question claims:
There is no way to do anything at the high level language to optimize for branch prediction, caching sure, sometimes you can, but branch prediction, no not at all.
However, this is simply not true. A good illustration of this fact comes from one of the most famous questions on Stack Overflow.
All branch predictors work by identifying patterns of repeated code execution and using this information to predict the outcome and/or target of branches as necessary.
When writing code in a high-level language it's typically not necessary for an application programmer to worry about trying to optimizing conditional branches. For instance gcc has the __builtin_expect function which allows the programmer to specify the expected outcome of a conditional branch. But even if an application programmer is certain they know the typical outcome of a specific branch it's usually not necessary to use the annotation. In a hot loop using this directive is unlikely to help improve performance. If the branch really is strongly biased the the predictor will be able to correctly predict the outcome most of the time even without the programmer annotation.
On most modern processors branch predictors perform incredibly well (better than 95% accurate even on complex workloads). So as a micro-optimization, trying to improve branch prediction accuracy is probably not something that an application programmer would want to focus on. Typically the compiler is going to do a better job of generating optimal code that works for the specific hardware platform it is targeting.
But branch predictors rely on identifying patterns, and if an application is written in such a way that patterns don't exist, then the branch predictor will perform poorly. If the application can be modified so that there is a pattern then the branch predictor has a chance to do better. And that is something you might be able to consider at the level of a high-level language, if you find a situation where a branch really is being poorly predicted.
branch prediction like caching and pipelining are things done to make code run faster in general overcoming bottlenecks in the system (super slow cheap dram which all dram is, all the layers of busses between X and Y, etc).
There is no way to do anything at the high level language to optimize for branch prediction, caching sure, sometimes you can, but branch prediction, no not at all. in order to predict, the core has to have the branch in the pipe along with the instructions that preceed it and across architectures and implementations not possible to find one rule that works. Often not even within one architecture and implementation from the high level language.
you could also easily end up in a situation where tuning for branch predictions you de-tune for cache or pipe or other optimizations you might want to use instead. and the overall performance first and foremost is application specific then after that something tuned to that application, not something generic.
For as much as I like to preach and do optimizations at the high level language level, branch prediction is one that falls into the premature optimization category. Just enable it it in the core if not already enabled and sometimes it saves you a couple of cycles, most of the time it doesnt, and depending on the implementation, it can cost more cycles than it saves. Like a cache it has to do with the hits vs misses, if it guesses right you have code in a faster ram sooner on its way to the pipe, if it guesses wrong you have burned bus cycles that could have been used by code that was going to be run.
Caching is usually a benefit (although not hard to write high level code that shows it costing performance instead of saving) as code usually runs linearly for some number of instructions before branching. Likewise data is accessed in order often enough to overcome the penalties. Branching is not something we do every instruction and where we branch to does not have a common answer.
Your backend could try to tune for branch prediction by having the pre-branch decisions happen a few cycles before the branch but all within a pipe size and tuned for fetch line or cache line alignments. again this messes with tuning for other features in the core.
I have seen that Z3 supports optimization via e.g. assert-soft. From what I understood, if given sufficient time, Z3 will report the optimal solution for a given SMT formula.
However, I am interested if it is possible to run Z3 for a limited amount of time and have it report the best solution it can find (which does not necessarily mean it is the optimal solution).
If I run Z3 on a SMT formula and restrict the time (via parameter -T), it will just report 'timeout' if it did not solve it optimally. I read that the default wmax solver uses a simple procedure to bounds weights and am curious to whether it is possible to bound the weights from an upper bound, rather than a lower bound.
The timeout option -T causes the process to terminate so it does not return any intermediate values. If you use -t option (soft timeout), then the process does not terminate. Instead Z3 will stop the search at some point where it checks for cancellation. It then produces the best answer so far. It corresponds to setting a cancellation state. I hope this will work for you.
When looking at Genetic programming papers, it seems to me that the number of test cases is always fixed. However, most mutations should (?) at every stage of the execution be very deleterious, i. e. make it obvious after one test case that the mutated program performs much worse than the previous one. What happens if you, at first, only try very few (one?) test case and look whether the mutation makes any sense?
Is it maybe so that different test cases test for different features of the solutions, and one mutation will probably improve only one of those features?
I don't know if I agree with your assumption that most mutations should be very deleterious, but you shouldn't care even if they were. Your goal is not to optimize the individuals, but to optimize the population. So trying to determine if a "mutation makes any sense" is exactly what genetic programming is supposed to do: i.e. eliminate mutations that "don't make sense." Your only "guidance" for the algorithm should come through the fitness function.
I'm also not sure what you mean with "test case", but for me it sounds like you are looking for something related to multi-objective-optimization (MOO). That means you try to optimize a solution regarding different aspects of the problem - therefore you do not need to mutate/evaluate a population for a specific test-case, but to find a multi objective fitness function.
"The main idea in MOO is the notion of Pareto dominance" (http://www.gp-field-guide.org.uk)
I think this is a good idea in theory but tricky to put into practice. I can't remember seeing this approach actually used before but I wouldn't be surprised if it has.
I presume your motivation for doing this is to improve the efficiency of the applying the fitness function - you can stop evaluation early and discard the individual (or set fitness to 0) if the tests look like they're going to be terrible.
One challenge is to decide how many test cases to apply; discarding an individual after one random test case is surely not a good idea as the test case could be a real outlier. Perhaps terminating evaluation after 50% of test cases if the fitness of the individual was <10% of the best would probably not discard any very good individuals; on the other hand it might not be worth it given a lot of individuals will be of middle-of-the road fitness and might well only save a small proportion of the computation. You could adjust the numbers so you save more effort, but the more effort you try to save the more chances you have of genuinely good individuals being discarded by accident.
Factor in the extra time to taken to code this and possible bugs etc. and I shouldn't think the benefit would be worthwhile (unless this is a research project in which case it might be interesting to try it and see).
I think it's a good idea. Fitness evaluation is the most computational intense process in GP, so estimating the fitness value of individuals in order to reduce the computational expense of actually calculating the fitness could be an important optimization.
Your idea is a form of fitness approximation, sometimes it's called lazy evaluation (try searching these words, there are some research papers).
There are also distinct but somewhat overlapping schemes, for instance:
Dynamic Subset Selection (Chris Gathercole, Peter Ross) is a method to select a small subset of the training data set on which to actually carry out the GP algorithm;
Segment-Based Genetic Programming (Nailah Al-Madi, Simone Ludwig) is a technique that reduces the execution time of GP by partitioning the dataset into segments and using the segments in the fitness evaluation process.
PS also in the Brood Recombination Crossover (Tackett) child programs are usually evaluated on a restricted number of test cases to speed up the crossover.
Can someone explain to me why Verlet integration is better than Euler integration? And why RK4 is better than Verlet? I don't understand why it is a better method.
The Verlet method is is good at simulating systems with energy conservation, and the reason is that it is symplectic. In order to understand this statement you have to describe a time step in your simulation as a function, f, that maps the state space into itself. In other words each timestep can be written on the following form.
(x(t+dt), v(t+dt)) = f(x(t),v(t))
The time step function, f, of the Verlet method has the special property that it conserves state-space volume. We can write this in mathematical terms. If you have a set A of states in the state space, then you can define f(A) by
f(A) = {f(x)| for x in A}
Now let us assume that the sets A and f(A) are smooth and nice so we can define their volume. Then a symplectic map, f, will always fulfill that the volume of f(A) is the same as the volume of A. (and this will be fulfilled for all nice and smooth choices of A). This is fulfilled by the time step function of the Verlet method, and therefore the Verlet method is a symplectic method.
Now the final question is. Why is a symplectic method good for simulating systems with energy conservation, but I am afraid that you will have to read a book to understand this.
The Euler method is a first order integration scheme, i.e. the total error is proportional to the step size. However, it can be numerically unstable, in other words, the accumulated error can overwhelm the calculation giving you nonsense. Please note, this instability can occur regardless of how small you make the step size or whether the system is linear or not. I am not familiar with verlet integration, so I can not speak to its efficacy. But, the Runge-Kutta methods differ from the Euler method in more than just step size.
In essence, they are based on a better way of numerically approximating the derivative. The precise details escape me at the moment. In general, the fourth order Runge-Kutta method is considered the workhorse of the integration schemes, but it does have some disadvantages. It is slightly dissipative, i.e. a small first derivative dependent term is added to your calculation which resembles an added friction. Also, it has a fixed step size which can result can make it difficult to achieve the accuracy you desire. Alternatively, you can use an adaptive stepsize scheme, like the Runge-Kutta-Fehlberg method, which gives fifth order accuracy for an additional 6 function evaluations. This can greatly reduce the time necessary to perform your calculation while improving accuracy, as shown here.
If everything just coasts along in a linear way, it wouldn't matter what method you used, but when something interesting (i.e. non-linear) happens, you need to look more carefully, either by considering the non-linearity directly (verlet) or by taking smaller timesteps (rk4).