Benchmark vs Solver - same data, different result - optaplanner

We are currently implementing timetable planning with OptaPlanner, and overall it works great! We are now trying to improve how our solver works, for example by trying different algorithms. So we ran a benchmark with a simple config: a common heuristic phase, and then HILL_CLIMBING, LATE_ACCEPTANCE and TABU_SEARCH. These are the results:
Benchmark:
HILL: 0hard/-5medium/-5soft
LATE_ACCEPTANCE: 0hard/-5medium/-126soft
TABU: 0hard/-7medium/-4soft
This is where it starts to get tricky: I copy the solver configuration, use the same data set, and get very different results:
Solver with the same dataset:
HILL: 0hard/-11medium/-7soft
LATE_ACCEPTANCE: 0hard/-5medium/-121soft
TABU: 0hard/-11medium/-18soft
So it seems that only LATE_ACCEPTANCE is close to the benchmark, while the others are way off. Any idea why it behaves like that?

Assuming that both the solver and benchmark use the default, REPRODUCIBLE environment mode, might it be caused by different termination conditions?
Note that even if you use the same time-based termination, it may not be fully reproducible due to context switching. To make sure every run with the same configuration ends up with exactly the same score, you can use a step-based termination.
Please check the INFO-level logging; each phase reports the best score it attained and the number of steps it took.
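To illustrate why a step-based termination gives repeatable scores, here is a minimal, self-contained Java sketch. It is not OptaPlanner code: the class `StepLimitedHillClimber`, its bit-flipping toy problem and the seed values are all made up for illustration. The point is only that a seeded search that stops after a fixed number of steps returns the same score on every run.

```java
import java.util.Random;

/** Toy hill climber (hypothetical, not OptaPlanner) that terminates on a step count. */
class StepLimitedHillClimber {

    /** Score to maximize: the negated number of mismatches against a fixed target pattern. */
    static int score(boolean[] bits) {
        int mismatches = 0;
        for (int i = 0; i < bits.length; i++) {
            boolean target = (i % 3 == 0); // arbitrary target pattern, chosen for the example
            if (bits[i] != target) {
                mismatches++;
            }
        }
        return -mismatches;
    }

    /** Runs exactly stepLimit steps; the result depends only on (seed, stepLimit). */
    static int solve(long seed, int stepLimit) {
        Random random = new Random(seed);      // fixed seed, so the move sequence is repeatable
        boolean[] bits = new boolean[64];      // start with all bits false
        int currentScore = score(bits);
        for (int step = 0; step < stepLimit; step++) {
            int index = random.nextInt(bits.length);
            bits[index] = !bits[index];        // try a move: flip one bit
            int candidateScore = score(bits);
            if (candidateScore >= currentScore) {
                currentScore = candidateScore; // keep the move
            } else {
                bits[index] = !bits[index];    // undo the move
            }
        }
        return currentScore;
    }

    public static void main(String[] args) {
        System.out.println("run 1: " + solve(42L, 50));
        System.out.println("run 2: " + solve(42L, 50));
    }
}
```

Calling `solve(42L, 50)` twice returns the same score both times, because the result depends only on the seed and the step limit, not on how fast the machine happens to be during that run.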

Related

GUROBI only uses single core to setup problem with cvxpy (python)

I have a large MILP that I build with cvxpy and want to solve with GUROBI. When I call the solve() function of cvxpy, it takes a very long time to set up and does not start solving for hours. While it is doing that, only 1 core of my cluster is being used, at 100%. I would like to use multiple cores to build the model so that building it does not take so long. Running grbprobe also shows that Gurobi knows about the other cores, and for solving the problem it uses multiple cores.
I have tried running with different flags, i.e. turning presolve off and on, or setting the number of Threads to be used (this did not seem to have any effect, not even for the solving).
I have also reduced the number of constraints in the problem, and it then starts solving much faster, which means that this is definitely not a problem of the model itself.
The problem in its normal state should have 2200 constraints; I reduced it to 150, and it took a couple of seconds until it started to search for a solution.
The problem is that I don't see anything, since it takes so long to get to the "set username parameters" flag, and I don't get any information on what the computer is doing in the meantime.
Is there a way to tell GUROBI or CVXPY that it can use more CPUs for the build-up?
Is there another way to solve this problem?
Sorry. The first part of the solve (cvxpy model generation, setup, presolving, scaling, solving the root, preprocessing) is almost completely serial. The parallel part is when it really starts working on the branch-and-bound tree. For many problems, the parallel part is by far the most expensive, but not for all.
This is not only the case for Gurobi. Other high-end solvers have the same behavior.
There are options to do less presolving and preprocessing. That may get you earlier in the B&B. However, usually, it is better not to touch these options.
Running things with verbose=True may give you more information. If you have more detailed questions, you may want to share the log.

How do I handle variability of output in Anylogic?

I have been working on a simulation model for battery swapping in Anylogic. So far I have developed the simulation model, optimization experiment and parameters variation experiment.
There are no errors in the model, but the output values are unsatisfactory. Small changes, such as changing the step size of the decision variables, result in a drastic change in the best value obtained after every experiment. Although the objective does not change much, I am concerned about the other variables that change with each run. Even with multiple optimization runs it is difficult to come to a conclusion.
For reference, I am posting an output of the parameters variation experiment here. I ran the experiment with an optimized value, but I was getting feasible results (percentile > 95%) far off the expected input values. The overall result is correct (decreasing percentile with increasing charging time), but it is difficult to understand the variability.
Can anyone help?
When building a model, this is a common problem you will have when looking at high level overall outputs. You could have a model bug, but it is just as likely (if not more likely) that there is some dynamic to your system that was not clear in simple Excel spreadsheets or mental models. The DES may be telling us something truly interesting about the system behavior, but without additional outputs, there is no way to understand what that is.
A few suggestions:
Run this as a simple single scenario, where you manually update inputs. When you run this with the low range of input values and then the high range of input values, what do you see on the animation or additional outputs that is different than you expected or could explain the overall output trend? Try running several intermediate points.
Add additional output metrics. If you look at queue sizes, resource utilizations, turn-around times, etc., do you see anything at that level that is different than expected?
Add a "replication" log. When you run a set of inputs for multiple scenarios, does any single replication stand out as an outlier? If so, re-run the scenario with that set of inputs and that random seed.
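As a rough sketch of the replication log idea, the Java snippet below runs several replications, each with its own random seed, records the seed and the output, and flags replications that lie far from the mean. Everything in it is hypothetical: `runModel` is only a stand-in for one run of the real model, and the class name, the seed scheme and the standard-deviation threshold are assumptions made for illustration, not AnyLogic API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical replication log; none of this is AnyLogic API. */
class ReplicationLog {

    /** One log row: the seed used and the output that run produced. */
    static final class Replication {
        final long seed;
        final double output;

        Replication(long seed, double output) {
            this.seed = seed;
            this.output = output;
        }
    }

    /** Stand-in for one run of the real model: a noisy function of the input. */
    static double runModel(double chargingTime, long seed) {
        Random random = new Random(seed); // same seed, same random stream, same output
        double sum = 0.0;
        for (int i = 0; i < 100; i++) {
            sum += chargingTime + random.nextGaussian();
        }
        return sum / 100;
    }

    /** Runs `count` replications with seeds 1..count and logs each one. */
    static List<Replication> runReplications(double chargingTime, int count) {
        List<Replication> log = new ArrayList<>();
        for (long seed = 1; seed <= count; seed++) {
            log.add(new Replication(seed, runModel(chargingTime, seed)));
        }
        return log;
    }

    /** Returns the replications more than `threshold` standard deviations from the mean. */
    static List<Replication> outliers(List<Replication> log, double threshold) {
        List<Replication> result = new ArrayList<>();
        if (log.isEmpty()) {
            return result;
        }
        double mean = 0.0;
        for (Replication r : log) {
            mean += r.output;
        }
        mean /= log.size();
        double variance = 0.0;
        for (Replication r : log) {
            variance += (r.output - mean) * (r.output - mean);
        }
        double stdDev = Math.sqrt(variance / log.size());
        if (stdDev == 0.0) {
            return result; // all outputs identical, nothing stands out
        }
        for (Replication r : log) {
            if (Math.abs(r.output - mean) / stdDev > threshold) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Replication> log = runReplications(30.0, 20);
        for (Replication r : log) {
            System.out.printf("seed=%d output=%.3f%n", r.seed, r.output);
        }
        for (Replication r : outliers(log, 2.0)) {
            System.out.println("outlier, re-run with seed " + r.seed);
        }
    }
}
```

Because each row stores its seed, any replication that stands out can be re-run on its own with the same inputs and the same seed, and then inspected in the animation or the additional outputs.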
There is no substitute for understanding underlying system behavior, and without understanding those dynamics, looking at overall correlation with optimization or parameter variation experiments will often lead companies to make the wrong policy decisions.

Optaplanner: Reproducible solution

I am trying to solve a problem similar to employee rostering. The problem I am facing is every time I run the solver, it generates a different assignment. This makes it harder to debug why a particular case was picked over another. Why is this the case?
P.S. My assignment has many hard constraints, and not all of them may be satisfied (in most cases I still see some negative hard score). So my termination strategy is based on unimprovedSecondsSpentLimit. Could this be the reason?
Yes, it's likely the termination. OptaPlanner's default environmentMode guarantees the exact same solution at the exact same step (*). But CPU cycles differ a lot from run to run, which means you get more or fewer steps per run. Use DEBUG logging to see that.
Use stepCountLimit or unimprovedStepCountLimit termination.
(*) Unless specified otherwise in the docs. Simulated Annealing for example will be different even in the exact same step if used with time bound terminations.
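The effect of CPU timing on a time-bound termination can be seen with a tiny Java sketch. This is a hypothetical toy loop (the class `TimeBoundSteps` and its method are invented here and are not part of OptaPlanner): it counts how many trivial steps fit into a fixed wall-clock budget. The count will usually differ from run to run, which is why a time-based limit tends to stop the solver at a different step each time.

```java
/** Toy illustration (hypothetical, not OptaPlanner): steps completed within a time budget. */
class TimeBoundSteps {

    /** Counts how many trivial "steps" fit into the given wall-clock budget. */
    static long stepsWithin(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long steps = 0;
        while (System.nanoTime() < deadline) {
            steps++; // stand-in for the work of one solver step
        }
        return steps;
    }

    public static void main(String[] args) {
        // Same budget every time, but the number of completed steps depends on the machine load.
        for (int run = 1; run <= 3; run++) {
            System.out.println("run " + run + ": " + stepsWithin(50) + " steps in 50 ms");
        }
    }
}
```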

OptaPlanner: Gaps in Chained Through Time Pattern

I have only recently started learning to use OptaPlanner. Please pardon me if any description below is technically inaccurate.
Basically, I have a problem to assign several tasks on a bunch of machines. Tasks have some precedence restrictions such that some task cannot be started before the end of another task. In addition, each task can only be run on certain machines. The target is to minimize the makespan of all these tasks.
I modeled this problem with the Chained Through Time pattern, in which each machine is the anchor. But the problem is that tasks on a given machine might not be executed back to back because of the precedence restrictions. For example, Task B can only be started after Task A completes, while Tasks A and B are executed on machines I and II respectively. This means that during the execution of Task A on machine I, if there is no other task that can be run on machine II, then machine II has to stay idle until Task A completes, at which point Task B can be started on it. The length of this kind of gap is not fixed in advance; in this example it depends on the duration of Task A. According to the OptaPlanner tutorial, it seems that an additional planning variable for gaps should be introduced for this kind of problem, but I am having difficulty modeling this gap variable. In general, how do I integrate the gap variable into a model that uses the Chained Through Time pattern? A detailed explanation or even a simple example would be highly appreciated.
Moreover, I'm actually not sure whether chained through time pattern is suitable for modeling this kind of task assigning problem or I just used an entirely inappropriate method. Could someone please shed some light on this? Thanks in advance.
I'm using the Chained Through Time pattern to solve the same problem as yours. To handle the precedence restriction, you can write Drools rules.

Selenium GRID vs TestNG parallel

This topic is the beginning of the answer I am looking for. I need to know more.
Short story:
Why use GRID if pure TestNG parallel execution seems to work just fine?
Long story:
Background:
We are running about 40 tests now, growing.
We only use one browser (chrome).
To make tests faster we do parallel testing (makes sense).
We face issues configuring GRID solution,
in many cases we just drop it and run pure testNG parallel.
Question:
I need to know if it even makes sense to be so stubborn about the
whole GRID setup. For now it only seems to consume time without giving any
additional value.
My own thoughts:
The only thing I can think of to justify GRID is running the tests
on different machines, if we actually needed to balance the
load across several servers. But at this point even my own laptop is
doing the job just fine. This situation will not change
dramatically in the near future, so why bother?
The link mentioned above claims the results of the no-grid parallel
tests may become unpredictable. We do not face that. So the question
may be: in what sense unpredictable? What to watch out for?
Thanks in advance for your help.
cheers,
Greg
The Grid acts as a load balancer and distributes tests to nodes according to the desired capabilities, whereas the parallel attribute in the TestNG XML just instructs the TestNG runner to trigger n tests at once.
CAVEAT: If you do not use Grid for parallel test execution, your single host will get overloaded as you scale up the thread-count. The results of the no-grid parallel tests may become unpredictable because multiple sessions will fill up the heap memory quickly. A general-purpose computer has limited heap memory. You are probably not facing this issue because you have not hit that limit yet.
Let's consider some examples:
Your target is to check functionality on Windows as well as on Mac. Without Grid you will run the cases twice.
You have a test case where a piece of functionality breaks on an older version of a browser, and now it's time for a regression test. Without Grid you will be running the test cases multiple times, once for each older browser version.
A case that is dependent on different screen resolutions.
Grid can simplify the effort for configuration.
It's just about minimizing the time it takes to run a number of test cases.