OptaPlanner and real-time replanning without simple backup planning, minimising changes

I have the following situation: a kind of "Travelling Technician" problem modelled on vehicle routing, but instead of vehicles it is technicians travelling to sites.
We want to:
generate a plan for the week ahead
send that plan to each of the technicians and sites with who is visiting, why and when
So far all OK: we generate the plan for the week.
But on Tuesday a technician phones in ill (or at 11:30 a technician's car breaks down). Assume we do not have a backup (so simple backup planning will not work). How can I redo the plan while minimising changes? Basically, I want to keep the original plan's constraints but add a constraint that rewards staying as close to the original plan as possible, minimising the number of customers that we upset.

Yes, basically every entity has an extra field which holds the original planning variable value. That extra field is NOT a planning variable itself. Then you add rules which say that if the planning variable != original value, it inflicts a certain soft cost. The higher the soft cost, the less volatile your schedule is. The lower the soft cost, the more flexible your schedule is towards the new situation.
See the MachineReassignment example for an example implementation. That actually has 3 types of these soft costs.
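As a rough illustration of the idea (not the MachineReassignment code itself), here is a minimal plain-Java sketch. The `Visit` class, its field names and the `costPerChange` weight are all hypothetical; in OptaPlanner the penalty would be expressed as a soft constraint rather than a loop.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical planning entity: one site visit assigned to a technician.
class Visit {
    final String site;
    final String originalTechnician; // value from the published plan; NOT a planning variable
    String technician;               // current value of the planning variable

    Visit(String site, String originalTechnician) {
        this.site = site;
        this.originalTechnician = originalTechnician;
        this.technician = originalTechnician;
    }

    boolean isChanged() {
        return !originalTechnician.equals(technician);
    }
}

class ReplanningSketch {
    // Soft penalty: a fixed cost for every visit that deviates from the original plan.
    static long disruptionPenalty(List<Visit> visits, long costPerChange) {
        long penalty = 0;
        for (Visit v : visits) {
            if (v.isChanged()) {
                penalty += costPerChange;
            }
        }
        return penalty;
    }

    public static void main(String[] args) {
        Visit a = new Visit("Site A", "Ann");
        Visit b = new Visit("Site B", "Bob");
        Visit c = new Visit("Site C", "Bob");
        // Bob calls in sick: his two visits move to other technicians.
        b.technician = "Ann";
        c.technician = "Cid";
        System.out.println(disruptionPenalty(Arrays.asList(a, b, c), 10));
    }
}
```

Raising `costPerChange` relative to the other soft constraints makes the solver more reluctant to touch visits that are already announced to customers.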

Related

Is there an example of an optaplanner vrp model that minimizes cost per unit?

I have a vrp variant that minimizes cost for a set of liquid deliveries. I have been asked to minimize cost per unit instead.
The costs are: hourly vehicle costs from standstill, back to depot (just the time to previous standstill and time to depot from the VRP example, multiplied by the vehicle hourly rate), plus the cost of product.
The amount delivered varies depending on the solution, but can be calculated by doing a sum of the deliveries of each vehicle.
So I have three streams for costs and one for unit count. Is there a way to join them and divide the two sums? Or is a shadow variable the only way to do it?
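For a shadow variable method, I would add "cost" to each customer and then have a single constraint that replaces all the soft constraints that looks like:
import static org.optaplanner.core.api.score.stream.ConstraintCollectors.sumLong;

protected Constraint costPerUnit(ConstraintFactory factory) {
    return factory.forEach(Customer.class)
            // Collapse all customers into one tuple: (total cost, total litres).
            .groupBy(sumLong(Customer::getCost), sumLong(Customer::getLitres))
            .penalizeLong(
                    HardSoftLongScore.ONE_SOFT,
                    // Integer division; fails if the total litres is 0.
                    (cost, amount) -> cost / amount)
            .asConstraint("costOfProduct");
}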
It seems like it would be very slow though.
edit: thinking about this some more, is there a performance reason for using constraint streams instead of just calculating the score in listeners and then using one simple constraint stream rule for all soft constraints?
Even though, with a lot of care and attention, you could probably implement a very fast listener to tackle this sort of problem, I doubt it would be as fast as a properly incremental solution.
Now does that solution need to be implemented using Constraint Streams? No. For small problems, EasyScoreCalculator will be, well, easy - but for small problems, you wouldn't need OptaPlanner. For problems large in size but easy in how the score is calculated, you may want to look into IncrementalScoreCalculator - those are tricky to implement, but once you get it right, there is no way you could be any faster. Well-designed incremental calculators routinely beat Constraint Streams in terms of performance.
The main benefit of Constraint Streams is good performance without the need for complex code. And you get constraint justifications, and therefore score explanations. The downside is you have to learn to think in the API, and there is some overhead. It's up to you to weigh these factors against each other and make the choice that is best for your particular problem.
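To make "incremental" concrete for the cost-per-unit case, here is a minimal plain-Java sketch of the bookkeeping an incremental calculator might do. The class and method names are hypothetical, and the factor of 1000 is an assumption added here so that integer division does not discard the fraction; the callbacks of an IncrementalScoreCalculator would be the natural place to call `insert` and `retract`.

```java
// Keeps the two sums that the cost-per-unit ratio needs, updated in O(1) per move.
class CostPerUnitTotals {
    private long totalCost;   // e.g. in cents
    private long totalLitres;

    // Call when a delivery enters the solution (or after its variable changes).
    void insert(long cost, long litres) {
        totalCost += cost;
        totalLitres += litres;
    }

    // Call before a delivery leaves the solution (or before its variable changes).
    void retract(long cost, long litres) {
        totalCost -= cost;
        totalLitres -= litres;
    }

    // Soft penalty in thousandths of a cost unit per litre; scaling first
    // avoids losing the fraction to integer division.
    long penalty() {
        if (totalLitres == 0) {
            return 0; // assumption: an empty plan is handled by other constraints
        }
        return totalCost * 1000 / totalLitres;
    }
}
```

Because the ratio is not a sum of per-customer terms, every move changes the single match of the grouped constraint, which is part of why the question's version feels slow; keeping only two running sums sidesteps that.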

Using Optaplanner to solve VRPTW with large number of customers and sophisticated constraints

I'm developing a solver for a VRPTW problem using OptaPlanner and I have run into a problem when a large number of customers need to be serviced. By a large number I mean up to 10,000 customers. I have tried running the solver for about 48 hours but no feasible solution was ever reached.
I use a highly customized VRPTW domain model that introduces an additional planning entity, the so-called "Workbreak". Workbreaks are like customers, but their location is itself another planning value, because every day a worker can return home or go to a hotel. Workbreaks have a fixed time of departure (usually the next morning) and a variable time of arrival (because it depends on the previous entity in the chain). A hard constraint forbids arriving at a Workbreak after a certain point in time. There are other hard constraints too, like:
multiple service time windows per customer
every week the last customer in chain must be a special customer "storage space visit" (workers need to gather materials before the next week)
long jobs management (when a customer needs to be serviced longer than specified time it should be serviced before specific hour of a day)
max number of jobs per workday
max total job duration per workday (as worker cannot work longer than specified time)
a workbreak cannot have a location of a hotel that is too close to worker's home.
jobs can not be serviced on Sundays
... and many more: there are 19 hard constraints in total that have to be applied. There are 3 soft constraints too.
All the aforementioned constraints were initially written as Drools rules, but because of many accumulation-based constraints (max jobs per day, max hours per day, overtime hours per week) the overall speed of the solver (benchmarks) was about 400 steps/sec.
At first I thought that the solver was too slow to reach a feasible solution in a reasonable time, so I rewrote all rules as an easy score calculator, and it had a decent speed: about 4600 steps/sec. I knew that it would only perform well for a really small number of customers, but I wanted to know whether Drools was the cause of the poor performance. Then I rewrote all these rules as an incremental score calculator (and survived the pain of corrupted score bugs until all of them were fixed). Surprisingly, incremental score calculation is a bit slower than the easy score calculator for a small number of customers, but that is not an issue, because the overall speed is about 4000 steps/sec no matter how many entities I have.
The thing that bugs me the most is that above a certain number of customers (problems start at 1000 customers) the solver cannot reach a feasible solution. Currently I'm using the Late Acceptance and Step Counting algorithms, because they perform really well for this kind of problem (at least for a smaller number of customers). I tried Simulated Annealing too, but without success, mostly because I could not find good values for the algorithm-specific parameters.
I have implemented some custom moves too:
Composite move that changes a workbreak's location when sibling entities are changed by other moves such as change/swap moves (it helps escape many score traps, as an improving step usually needs at least two moves to be performed in a single step)
Move factory for better long jobs assignment (it generates moves that try to put customers with longer service times at the front of a workday chain)
Workbreak assignment move factory (it generates moves that help put workbreaks in the proper sequence)
Now I'm scratching my head, wondering what I should do to diagnose the source of my problem. I suspected that maybe it was hitting a score trap, but I have modified the solver so it saves snapshots of the best score each minute. After reading these snapshots I realized that the score was still decreasing. Can the number of hard constraints play a role? I suspect that many moves need to be evaluated to find one that improves the score. The fact is that maybe 48 hours isn't that much for this kind of problem, and the computation should run for a whole week? Unfortunately I have nothing to compare with.
I would like to know how to find out if it is solely a performance problem, or a solver (algorithm, custom moves, hard/soft score) configuration problem.
I really apologize for my bad English.
TL;DR but FWIW:
To scale above 1k locations you need to use NearBy selection.
To scale above 10k locations, add Partitioned Search too.
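The intuition behind nearby selection is that a move pairing a customer with a location far away is rarely an improvement, so most of those candidates can be skipped. A minimal plain-Java sketch of that filtering follows; the `Location` class, the straight-line distance and the hard cut-off at `k` are assumptions for illustration, not OptaPlanner's API (which, as far as I know, takes a distance meter class in the solver configuration and picks nearby candidates with a probability distribution rather than a hard cut).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical location with planar coordinates.
class Location {
    final String name;
    final double x;
    final double y;

    Location(String name, double x, double y) {
        this.name = name;
        this.x = x;
        this.y = y;
    }
}

class NearbySketch {
    // Straight-line distance; a real model would use road distance or travel time.
    static double distance(Location a, Location b) {
        return Math.hypot(a.x - b.x, a.y - b.y);
    }

    // Returns the k candidates closest to origin, excluding origin itself.
    static List<Location> nearest(Location origin, List<Location> candidates, int k) {
        List<Location> sorted = new ArrayList<>(candidates);
        sorted.remove(origin);
        sorted.sort(Comparator.comparingDouble((Location c) -> distance(origin, c)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

With 10,000 customers, restricting each move to a few dozen nearby candidates shrinks the neighbourhood from roughly 10,000 x 10,000 pairs to a few hundred thousand, which is where the scaling benefit comes from.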

OptaPlanner for large data sets

I have been asked by a customer to work on a project using Drools. Looking at the Drools documentation I think they are talking about OptaPlanner.
The company takes in transport orders from many customers and links these to bookings on multiple carriers. Orders last year exceeded 100,000. The "optimisation" that currently takes place is based on service, allocation and rate and is linear (each order is assigned to a carrier using the constraints but without any consideration of surrounding orders). The requirement is to hold non-critical orders in a pool for a number of days and optimize the orders in the pool for lowest cost using the same constraints.
Initially they want to run "what if's" over last year's orders to fine-tune the constraints. If this exercise is successful they want to use it in their live system.
My question is whether OptaPlanner is the correct tool for this task, and if so, if there is an example that I can use to get me started.
Take a look at the vehicle routing videos, as it sounds like you have a vehicle routing problem.
If you use just Drools to assign orders, you basically build a Construction Heuristic (= a greedy algorithm). If you use OptaPlanner to assign the orders (and Drools to calculate the quality (= score) of a solution), then you get a better solution. See false assumptions on vehicle routing to understand why.
To scale to 100k orders (= planning entities), use Nearby Selection (which is good up to 10k) and Partitioned Search (which is a sign of weakness but needed above 10k).
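To show what partitioning means in practice, here is a minimal plain-Java sketch that cuts a list of orders into near-equal chunks, each of which would be solved on its own and merged afterwards. The `split` helper is hypothetical, and the assumption that the input is already sorted by region (so each chunk is geographically coherent) is mine, not the answer's.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {
    // Splits items into at most partCount contiguous chunks of near-equal size.
    // Assumes items are pre-sorted (e.g. by region) so chunks are coherent.
    static <T> List<List<T>> split(List<T> items, int partCount) {
        List<List<T>> parts = new ArrayList<>();
        int size = items.size();
        for (int i = 0; i < partCount; i++) {
            int from = (int) ((long) size * i / partCount);
            int to = (int) ((long) size * (i + 1) / partCount);
            if (from < to) { // skip empty chunks when there are few items
                parts.add(new ArrayList<>(items.subList(from, to)));
            }
        }
        return parts;
    }
}
```

The trade-off is that no move can ever shift an order from one partition to another, so improvements that cross a partition boundary are never found, which is presumably why the answer calls it a sign of weakness.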

Optaplanner take fastest path

How can we optimize OptaPlanner to select the fastest route? See the highlighted point in the image below. It is taking the long route.
Note: Vehicles do not need to come back to the depot. I think I cannot use CVRPTW, as arrivalAfterDueTimeAtDepot is a built-in hard constraint (and besides, I do not have any time constraints).
How can we write a constraint to select the vehicle with less capacity?
For example, a customer needs only 3 items and we have two vehicles with capacities 4 and 9. It seems like OptaPlanner is selecting the first vehicle in input order by default.
I presume it's taking the blue vehicle for the center of Bengaluru because the green one is already at full capacity.
Check what the score is (calculated through Solver.getScoreDirectorFactory()) if you manually put that location in the green trip and swap the vehicles of the green and blue trip. If it's worse (or breaks a hard constraint), then it's normal that OptaPlanner selects the other solution. In that case, either your score function has a bug, or you realize you don't want that solution at all. But if it does have a better score, OptaPlanner's <localSearch> (such as Late Acceptance) should find it (especially when scaling out because, ironically, local optima are a bigger problem when scaling down). You can try to add <subchainSwapMoveSelector> etc. to escape local optima faster.
If you want to guide the search more (which is often not a good idea), you can define a planning value strength comparator to sort small vehicles before big vehicles and use the Construction Heuristic WEAKEST_FIT(_DECREASING).
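A strength comparator is just a `java.util.Comparator` over the planning values. The following is a minimal sketch, assuming a hypothetical `Vehicle` class with `id` and `capacity` fields; the `weakestFit` helper only imitates what a weakest-fit construction heuristic does and is not OptaPlanner code. In OptaPlanner the comparator class would be referenced from the planning variable annotation (`strengthComparatorClass`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical planning value.
class Vehicle {
    final String id;
    final int capacity;

    Vehicle(String id, int capacity) {
        this.id = id;
        this.capacity = capacity;
    }
}

// "Weaker" vehicles (smaller capacity) sort first; ties are broken by id
// so that the ordering is deterministic.
class VehicleStrengthComparator implements Comparator<Vehicle> {
    @Override
    public int compare(Vehicle a, Vehicle b) {
        int byCapacity = Integer.compare(a.capacity, b.capacity);
        return byCapacity != 0 ? byCapacity : a.id.compareTo(b.id);
    }
}

class WeakestFitSketch {
    // Picks the weakest vehicle that can still carry the demand, or null if none can.
    static Vehicle weakestFit(List<Vehicle> vehicles, int demand) {
        List<Vehicle> sorted = new ArrayList<>(vehicles);
        sorted.sort(new VehicleStrengthComparator());
        for (Vehicle v : sorted) {
            if (v.capacity >= demand) {
                return v;
            }
        }
        return null;
    }
}
```

For the example in the question (capacities 4 and 9, demand 3), this ordering tries the capacity-4 vehicle first regardless of input order.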

Is it better to cache some value in a database table, or to recompute it each time?

For example, I have a table of bank users (user id, user name), and a table for transactions (user id, account id, amount).
Accounts have the same properties across different users, but hold different amounts (like Alex -> Grocery, it is specific to Alex, but all other users also have Grocery account).
The question is, would it be better to create a separate table of accounts (account id, user id, amount left) or to get this value by selecting all transactions with the needed user id and account id and just summing the 'amount' values? It seems that the first approach would be faster, but more prone to error and database corruption - I would need to update accounts every time the transaction happens. The second approach seems to be cleaner, but would it lead to significant speed reduction?
What would you recommend?
Good question!
In my opinion you should always avoid duplicated data, so I would go with the option of summing every time.
"It seems that the first approach would be faster, but more prone to error and database corruption - I would need to update accounts every time the transaction happens"
That says everything: you are subject to errors and you'll have to build a mechanism to keep the data up to date.
Don't forget that the first approach would be faster for selects only. Inserts, updates and deletes would be slower because you will have to update your second table.
This is an example of Denormalization.
In general, denormalization is discouraged, but there are certain exceptions - bank account balances are typically one such exception.
So if this is your exact situation, I would suggest going with the separate table of accounts solution - but if you have far fewer records than a bank would typically, then I recommend the derived approach, instead.
To some extent, it depends.
With "small" data volumes, performance will more than likely be OK.
But as data volumes grow, having to SUM all transactions may become costlier to the point at which you start noticing a performance problem.
Also consider data access/usage patterns. In a read-heavy system, where you "write once, read many", the SUM approach hits performance on every read. In this scenario, it may make sense to take a performance hit once on write, to improve subsequent read performance.
If you anticipate "large" data volumes, I'd definitely go with the extra table to hold the high level totals. You need to ensure though that it is updated when a (monetary) transaction is made, within a (sql server) transaction to make it an atomic operation.
With smaller data volumes, you could get away without it...personally, I'd probably still go down that path, to simplify the read scenario.
It makes sense to go with the denormalized approach (the first solution) only if you face significant performance issues. Since you are doing just simple SUM (or group by and then sum) with proper indexes, your normalized solution will work really well and will be a lot easier to maintain (as you noted).
But depending on your queries, it can make sense to go with denormalized solution...for example, if your database is read/only (you periodically load data from some other data source and don't make inserts/updates at all or make them really rarely), then you can just load data in the easiest way to make queries...and in that case, denormalized solution might prove to be better.
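The trade-off discussed in these answers can be sketched with an in-memory stand-in for the two tables. Everything here is hypothetical (the `Ledger` class, its method names, amounts in cents), and the `synchronized` methods merely stand in for wrapping both writes in a single database transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the transactions table plus a cached balances table.
class Ledger {
    private static final class Tx {
        final String key;
        final long amount;

        Tx(String key, long amount) {
            this.key = key;
            this.amount = amount;
        }
    }

    private final List<Tx> transactions = new ArrayList<>();
    private final Map<String, Long> balances = new HashMap<>();

    private static String key(int userId, int accountId) {
        return userId + ":" + accountId;
    }

    // Both writes happen under one lock, standing in for one database transaction.
    synchronized void record(int userId, int accountId, long amount) {
        String k = key(userId, accountId);
        transactions.add(new Tx(k, amount));
        balances.merge(k, amount, Long::sum);
    }

    // Normalized read: sums every matching transaction, O(number of transactions).
    synchronized long derivedBalance(int userId, int accountId) {
        String k = key(userId, accountId);
        long sum = 0;
        for (Tx tx : transactions) {
            if (tx.key.equals(k)) {
                sum += tx.amount;
            }
        }
        return sum;
    }

    // Denormalized read: a single lookup, O(1).
    synchronized long cachedBalance(int userId, int accountId) {
        return balances.getOrDefault(key(userId, accountId), 0L);
    }
}
```

The cached value stays correct only as long as every write goes through `record`; any code path that inserts a transaction without updating the balance breaks the invariant, which is the maintenance cost the answers warn about.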