Unlimited vehicles in VRP - OptaPlanner

How can I allow OptaPlanner to use an unlimited or dynamic number of vehicles in a VRP problem?
The number of vehicles is minimized during score calculation, as each vehicle has a base cost. The solver should initialize as many vehicles as it considers convenient.
@PlanningEntityCollectionProperty
@ValueRangeProvider(id = "vehicleRange")
public List<Vehicle> getVehicleList() {
    return vehicleList;
}
Currently I just initialize the vehicle list with a predefined number of vehicles, such as 100 000, but I am not sure about the performance implications of that, as the search space is much bigger than necessary.

Out of the box, this is the only way: you figure out the smallest safe maximum number of vehicles for a dataset and size the vehicle list accordingly. For one, that maximum is never bigger than the number of visits. But usually you can prove it to be far less than that.
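For example, here is a rough sketch of sizing that list instead of hard-coding 100 000 vehicles. It assumes uniform vehicle capacities, capacity as the only limiting hard constraint, and hypothetical Vehicle, Visit and Depot domain types with a Vehicle(id, capacity, depot) constructor, none of which come from the question itself:

import java.util.ArrayList;
import java.util.List;

public class VehicleListFactory {

    public static List<Vehicle> createVehicleList(List<Visit> visitList,
            int vehicleCapacity, int maxVisitDemand, Depot depot) {
        // Safe upper bound: one vehicle per visit.
        int upperBound = visitList.size();
        // Tighter but still safe (ignoring time windows): every vehicle
        // can always take at least vehicleCapacity / maxVisitDemand visits.
        int visitsPerVehicle = vehicleCapacity / maxVisitDemand;
        if (visitsPerVehicle > 1) {
            upperBound = (visitList.size() + visitsPerVehicle - 1) / visitsPerVehicle;
        }
        List<Vehicle> vehicleList = new ArrayList<>(upperBound);
        for (int i = 0; i < upperBound; i++) {
            vehicleList.add(new Vehicle(i, vehicleCapacity, depot));
        }
        return vehicleList;
    }
}

A list sized this way keeps the value range, and therefore the search space, proportional to the dataset instead of a fixed 100 000.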
That being said, the OptaPlanner architecture does support Moves that create or delete Vehicles, at least in theory. No out-of-the-box moves do that, so you'd need to build custom moves, and it will get complex fast. One day we intend to support generic create/delete moves out of the box.

Related

Prioritising scores in the VRP solution with OptaPlanner

I am using OptaPlanner to solve my VRP problem. I have several constraint providers, for example one to enforce the capacities and another to enforce the time windows (TW) on arrival time, both HARD. At the end of the optimisation it returns a route with a negative score, and when I analyse the ConstraintMatch I find that it is the product of a vehicle capacity constraint. However, in my problem there is no point in the vehicle arriving on time (meeting the TW constraint) if it cannot satisfy the customer's demands. That's why I need the constraints I have defined for the capacities (weight and volume) to have more weight/priority than the time window constraint.
Question: how can I configure the solver, or what should I consider, so that all the hard constraints apply but some, like the capacity ones, carry more weight than others?
Always grateful for your suggestions and help
I am far from an expert on OptaPlanner, but every constraint penalty (or reward) is divided into two parts if you use penalizeConfigurable(...) instead of penalize(...). Each constraint's score is then evaluated as the constraint weight, which you declare in a @ConstraintConfiguration class, multiplied by the match weight, which is how you implement the deviation from the desired result. For example, the number of failed stops might be squared, turning the penalty into a quadratic one instead of a linear one.
Constraint weights can be reconfigured between solutions to tweak the importance of a penalty, and setting one to zero disables it completely. The match weight, in my view, is an implementation detail that you tweak while you develop. At least that is how I see it.
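As a hedged sketch of that split in OptaPlanner 8 style (the domain getters getTotalDemand() and getCapacity() are assumptions about your model, not OptaPlanner API):

import org.optaplanner.core.api.domain.constraintweight.ConstraintConfiguration;
import org.optaplanner.core.api.domain.constraintweight.ConstraintWeight;
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;

@ConstraintConfiguration
public class VrpConstraintConfiguration {

    // Both constraints stay hard, but capacity carries ten times the weight,
    // so one unit of overload outweighs several time window violations.
    @ConstraintWeight("vehicleCapacity")
    private HardSoftScore vehicleCapacity = HardSoftScore.ofHard(10);

    @ConstraintWeight("timeWindow")
    private HardSoftScore timeWindow = HardSoftScore.ofHard(1);
}

The match weight then lives in your ConstraintProvider, for example:

Constraint vehicleCapacity(ConstraintFactory factory) {
    return factory.forEach(Vehicle.class)
            .filter(vehicle -> vehicle.getTotalDemand() > vehicle.getCapacity())
            // Match weight: the size of the overload, so bigger violations hurt more.
            .penalizeConfigurable("vehicleCapacity",
                    vehicle -> vehicle.getTotalDemand() - vehicle.getCapacity());
}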

Getting the optimal number of employees for a month (rostering)

Is it possible to get the optimal number of employees in a month for a given number of shifts?
I'll explain myself a little further taking the nurse rostering as an example.
Imagine that we don't know the number of nurses to plan for in a given month with a fixed number of shifts. Also, imagine that each time you add a new nurse to the schedule it decreases your score, and each nurse has a limited number of normal hours and a limited number of extra hours. Extra hours decrease the score more than normal ones do.
So, the problem consists of finding the optimal number of nurses needed and their schedules. I've come up with two possible solutions:
Fix the number of nurses clearly above the number needed and treat the problem as an overconstrained one, so some nurses will not be assigned to any shifts.
Launch multiple instances of the same problem in parallel, with an incremental number of nurses for each instance. The problem with this solution is that you have to estimate an approximate range of nurses, below and above the number needed, beforehand.
Both solutions are a little inefficient; is there a better approach to tackle this problem?
I call option 2 doing simulations. Typically in simulations, they don't just play with the number of employees, but also with the constraint weights etc. It's useful for strategic "what if" decisions (What if we ... hire more people? ... focus more on service quality? ... focus more on financial gain?).
If you really just need to minimize the number of employees, and you can clearly weight that against all the other hard and soft constraints (probably as a weight in between both, similar to overconstrained planning), then option 1 is good enough, and less CPU-costly.
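For illustration, a minimal Constraint Streams sketch of such an in-between weight, using a medium score level. Shift and getNurse() are hypothetical domain classes, and the method would live inside your ConstraintProvider:

import org.optaplanner.core.api.score.buildin.hardmediumsoft.HardMediumSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;

// Penalize once per nurse who is used at all; the medium level sits
// above the soft constraints but below the hard ones.
Constraint minimizeUsedNurses(ConstraintFactory factory) {
    return factory.forEach(Shift.class)
            .filter(shift -> shift.getNurse() != null)
            .groupBy(Shift::getNurse)
            .penalize("Minimize used nurses", HardMediumSoftScore.ONE_MEDIUM);
}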

Efficiently Computing Significant Terms in SQL

I was introduced to the ElasticSearch significant terms aggregation a while ago and was positively surprised at how good and relevant this metric turns out to be. For those not familiar with it, it's quite a simple concept: for a given query (the foreground set), a given property is scored against the statistical significance of the background set.
For example, if we were querying for the most significant crime types in the British Transport Police:
C = 5,064,554 -- total number of crimes
T = 66,799 -- total number of bicycle thefts
S = 47,347 -- total number of crimes in British Transport Police
I = 3,640 -- total number of bicycle thefts in British Transport Police
Ordinarily, bicycle thefts represent only 1% of crimes (66,799/5,064,554), but for the British Transport Police, who handle crime on railways and stations, 7% of crimes (3,640/47,347) are bicycle thefts. This is a significant seven-fold increase in frequency.
The significance for "bicycle theft" would be [(I/S) - (T/C)] * [(I/S) / (T/C)] = 0.371...
Where:
C is the number of all documents in the collection
S is the number of documents matching the query
T is the number of documents with the specific term
I is the number of documents that intersect both S and T
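Plugging the numbers above into the formula:

I/S = 3,640 / 47,347 ≈ 0.0769
T/C = 66,799 / 5,064,554 ≈ 0.0132
significance ≈ (0.0769 - 0.0132) * (0.0769 / 0.0132) ≈ 0.0637 * 5.83 ≈ 0.371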
For practical reasons (the sheer amount of data I have and huge ElasticSearch memory requirements), I'm looking to implement the significant terms aggregation in SQL or directly in code.
I've been looking at some ways to potentially optimize this kind of query, specifically, decreasing the memory requirements and increasing the query speed, at the expense of some error margin - but so far I haven't cracked it. It seems to me that:
The variables C and S are easily cacheable or queryable.
The variable T could be derived from a Count-Min Sketch instead of querying the database.
The variable I, however, seems impossible to derive from a Count-Min Sketch of T.
I was also looking at MinHash, but from the description it seems that it couldn't be applied here.
Does anyone know of a clever algorithm or data structure that helps tackle this problem?
I doubt a SQL impl will be faster.
The values for C and T are maintained ahead of time by Lucene.
S is a simple count derived from the query results, and I is looked up using O(1) data structures. The main cost is the many T lookups for each of the terms observed in the chosen field. Using min_doc_count typically helps drastically reduce the number of these lookups.
"For practical reasons (the sheer amount of data I have and huge ElasticSearch memory requirements)"
Have you looked into using doc values to manage elasticsearch memory better? See https://www.elastic.co/blog/support-in-the-wild-my-biggest-elasticsearch-problem-at-scale
An efficient solution is possible for the case when the foreground set is small enough. Then you can afford to process all documents in the foreground set.
Collect the set {Xk} of all terms occurring in the foreground set for the chosen field, as well as their frequencies {fk} in the foreground set.
For each Xk, calculate its significance as (fk - Fk) * (fk / Fk), where Fk = Tk/C is the frequency of Xk in the background set.
Select the terms with the highest significance values.
However, due to the simplicity of this approach, I wonder if ElasticSearch already contains that optimization. If it doesn't - then it very soon will!
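For what it's worth, here is a compact Java sketch of that approach. It assumes the background counts (Tk per term and the total C) are already available in an in-memory map; that map is the part a Count-Min Sketch could approximate:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SignificantTerms {

    // foregroundDocs: one list of *distinct* terms per foreground document.
    // backgroundCounts: term -> Tk. totalDocs: C.
    public static Map<String, Double> score(List<List<String>> foregroundDocs,
            Map<String, Long> backgroundCounts, long totalDocs) {
        // Count I for every term seen in the foreground set; S is the doc count.
        Map<String, Long> foregroundCounts = new HashMap<>();
        for (List<String> docTerms : foregroundDocs) {
            for (String term : docTerms) {
                foregroundCounts.merge(term, 1L, Long::sum);
            }
        }
        long s = foregroundDocs.size();
        Map<String, Double> significance = new HashMap<>();
        for (Map.Entry<String, Long> entry : foregroundCounts.entrySet()) {
            // Fall back to the foreground count if the term is somehow
            // missing from the background, to avoid dividing by zero.
            long t = backgroundCounts.getOrDefault(entry.getKey(), entry.getValue());
            double fg = entry.getValue() / (double) s;  // fk = I / S
            double bg = t / (double) totalDocs;         // Fk = T / C
            significance.put(entry.getKey(), (fg - bg) * (fg / bg));
        }
        return significance;
    }
}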

Structuring a model in Optaplanner: basic

I am fascinated with Optaplanner and am trying to learn it by constructing examples. I would be grateful for any tips that could help me structure this problem.
The problem domain has
Performers who perform in classes of performance: one performer may perform in several classes
Venues in which the performers perform
Time slots in which the performers perform
Competitions in which there are performers performing in their classes in a set of venues
Which performers will perform in which class is known ahead of time
Each performance takes a fixed amount of time
There are no constraints on the size of a class: classes are open to all comers, but the list of performers and their requested classes is fixed at the beginning of the model run
The objectives and constraints are:
Schedule the performances in a competition to take the least amount of elapsed time (i.e. try to make full use of multiple venues);
A class must start and finish in the same venue, but any class can use any venue;
Once a class starts in a venue, it has exclusive use of the venue until it finishes;
Only one performer can perform in a given venue in a given time slot;
A performer cannot be in two different venues at the same time;
Every performer must be permitted to perform in each class that they request;
Order of performance in a class doesn't matter;
A performer must have at least "n" time slots between performances;
It is desirable to avoid dead time-slots in a venue;
An individual performance takes exactly "m" minutes irrespective of class, performer or venue;
If a class has n performers and each performance takes m minutes, a class will continue in its assigned venue from its assigned starting time until n * m minutes later (plus x * m if there are x dead time slots);
In a typical problem, there might be 4 venues, 15 classes and 200 performers, with large variance in the size of the classes (from 1 to 30 or more performers in each class).
My first thought was to schedule the classes to venues and starting time-slots first, then rearrange the order of performances if doing so is necessary to resolve conflicts. Only if conflicts can't be resolved by changing the order of performances in a class can the assignment of classes to venues or starting times change.
What's not clear to me is how to handle the shuffling of performer/time-slot/venue mappings and then revert to shuffling class/venue/start-time mappings if conflicts remain.
The first part of the problem (deciding an initial class-to-venue/start-time assignment) seems like simple bin packing. I'm not sure whether that's the best time to consider reordering performances, however.
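Not an authoritative answer, but a minimal sketch of how that first part (class to venue/start-slot assignment) often looks as an OptaPlanner domain model; every class and field name below is an assumption, not something from the question:

import java.util.List;
import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.PlanningVariable;

@PlanningEntity
public class PerformanceClass {

    private List<Performer> performerList;  // fixed input, not planned

    @PlanningVariable(valueRangeProviderRefs = "venueRange")
    private Venue venue;

    @PlanningVariable(valueRangeProviderRefs = "timeSlotRange")
    private TimeSlot startTimeSlot;

    // Derived: the class occupies startTimeSlot .. startTimeSlot + n - 1
    // in its venue, where n = performerList.size() (one m-minute slot each).
    public int getDurationInSlots() {
        return performerList.size();
    }

    // getters and setters omitted
}

Note that with a model like this you don't hand-code the "shuffle, then revert" logic: local search changes both planning variables freely and the score function accepts or rejects the results, so conflict resolution falls out of the constraints rather than an explicit two-phase procedure.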
Any assistance would be gratefully received.
Thanks!

Discrete optimisation: large number of optimal solutions

TL;DR version: is there a way to cope with optimisation problems where a large number of optimal solutions exists (solutions attaining the best objective value)? That is, finding an optimal solution is pretty quick (though highly dependent on the size of the problem, obviously), but many such solutions exist, so the solver runs endlessly trying to find a better one: it keeps finding other feasible solutions whose objective value equals the current best.
Not TL;DR version:
For a university project, I need to implement a scheduler that should output the schedule for every university programme per year of study. I'm provided some data and, for the purposes of this question, will simply stick to a general but not so rare example.
In many sections, you have mandatory courses and optional courses. Sometimes those optional courses are divided into modules, and the student needs to choose one of them. Often they have to select two modules, but some combinations arise more often than others. Clearly, if you count the number of courses (mandatory + optional) without taking the subdivision into modules into account, you end up with more courses than time slots to schedule them in.
My model is quite simple. I have constraints stating that every course should be scheduled to one and only one time slot (a period of 2 hours), and that a professor must not give two courses at the same time. Those are hard constraints. In a perfect world, I should also add hard constraints stating that a student cannot have two courses at the same time. But because I don't have enough data and every combination of modules is possible, there is no point in creating one student per combination (mandatory + module 1 + module 2) and applying the hard constraints to each of these students: it is basically identical to having one student (mandatory + all optionals) and trying to fit the hard constraints - which will fail.
This is why I decided to move those hard constraints into an optimisation problem: I simply define my objective function as minimising, for each student, the number of courses he/she takes that are scheduled simultaneously.
If I run this simple model with only one student (22 courses) and 20 time slots, I should get an objective value of 4 (since two time slots each hold two courses). But, using Gurobi, the relaxed objective is 0 (since you can have fractions of courses inside a time slot). Therefore, when the solver does reach a solution of cost 4, it cannot prove optimality directly. The real trouble is that for this simple case there exists a huge number of optimal solutions (22!, maybe...). To prove optimality, it will go through all the other solutions (which share the same objective), desperately trying to find a solution with a smaller gap between the relaxed objective (0) and the current one (4). Obviously, no such solution exists...
Do you have any idea how I could tackle this problem? I thought of analysing the existing database and trying to figure out which combinations of modules are most likely to happen, so that I can put the hard constraints back, but it seems hazardous (maybe I would select a combination that leads to a conflict, and therefore find no solution, or omit a valid combination). The current solution I use is putting a time threshold to stop the optimisation...
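No answer was recorded for this question, but for what it's worth, a hedged sketch of the Gurobi Java API knobs that match the ideas above: a time threshold (what the question already does), aggressive symmetry detection to prune interchangeable-course permutations, and BestObjStop, which stops the search as soon as an incumbent reaches a known target objective (4 in the single-student example). The model building itself is elided:

import com.gurobi.gurobi.*;  // older Gurobi releases use "import gurobi.*;"

public class SchedulerSolve {
    public static void main(String[] args) throws GRBException {
        GRBEnv env = new GRBEnv();
        GRBModel model = new GRBModel(env);
        // ... build assignment variables, hard constraints and the
        // conflict-minimising objective here ...

        // Stop once an incumbent reaches the known/expected optimum.
        model.set(GRB.DoubleParam.BestObjStop, 4.0);
        // Work harder on symmetry detection (2 = aggressive).
        model.set(GRB.IntParam.Symmetry, 2);
        // Fallback: hard time threshold, as in the question.
        model.set(GRB.DoubleParam.TimeLimit, 60.0);

        model.optimize();
        model.dispose();
        env.dispose();
    }
}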