Is it possible to use dynamic weighting (#ConstraintConfiguration) with an EasyScoreCalculator - optaplanner

I've been reading the documentation and it provides some examples for Drools and Constraint Streams, but it doesn't explicitly say whether you can or cannot use Constraint Configuration with an EasyScoreCalculator.

Yes. As the ConstraintConfiguration is a field on the PlanningSolution class, it is available in the EasyScoreCalculator's calculateScore(Solution_ solution) method.
Just note that the EasyScoreCalculator does not scale to bigger data sets, exactly because it computes the score of the entire solution for every move.
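For illustration, here is a minimal sketch (MySolution, MyEntity and MyConstraintConfiguration are hypothetical names, not from the question) of an EasyScoreCalculator that reads a dynamically configured weight from the solution:

import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.calculator.EasyScoreCalculator;

public class MyEasyScoreCalculator
        implements EasyScoreCalculator<MySolution, HardSoftScore> {

    @Override
    public HardSoftScore calculateScore(MySolution solution) {
        // The @ConstraintConfigurationProvider field is just another part of
        // the solution, so its weights are reachable here.
        MyConstraintConfiguration config = solution.getConstraintConfiguration();
        int hard = 0;
        int soft = 0;
        for (MyEntity entity : solution.getEntityList()) {
            if (entity.isOverCapacity()) {
                // Scale each violation by the (possibly re-configured) weight.
                hard -= config.getCapacityWeight().getHardScore();
            }
        }
        return HardSoftScore.of(hard, soft);
    }
}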

Related

OptaPlanner Constraint Streams: Count Distinct Values in Planning Entity Set

I'm looking for some help with OptaPlanner's constraint streams. The problem is a variant on job-shop scheduling, and my planning entities (CandidateAssignment) are wrapping around two decision variables: choice of robot and assigned time grain. Each CandidateAssignment also has a field (a Set) denoting which physical containers in a warehouse will be filled by assigning that task.
The constraint I'm trying to enforce is to minimize the total number of containers used by all CandidateAssignments in a solution (the goal being to guide OptaPlanner towards grouping tasks by container... there are domain-specific benefits to this in the warehouse). If each CandidateAssignment could only service a single container, this would be easy:
protected Constraint maximizeContainerCompleteness(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(CandidateAssignment.class)
            .filter(CandidateAssignment::isAssigned)
            .groupBy(CandidateAssignment::getContainerId, countDistinct())
            .penalizeConfigurable("Group by container");
}
Moving from a single ID to a collection seems less straightforward to me (i.e., if CandidateAssignment::getContainerIds returns a set of integers). Any help would be much appreciated.
EDIT: Thanks Christopher and Lukáš for the responses. Christopher's constraint matches my use case (minimize the number of containers serviced by a solution). However, this ends up being a pretty poor way to guide OptaPlanner towards (more) optimal solutions since it's operating via iterated local search. Given a candidate solution, the majority of neighbors in that solution's neighborhood will have equal value for that constraint (# unique containers used), so it doesn't have much power of discernment.
The approach I've tested with reasonable results is as follows:
protected Constraint maximizeContainerCompleteness(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(CandidateAssignment.class)
            .filter(CandidateAssignment::isAssigned)
            .join(Container.class, Joiners.filtering(
                    (candidate, container) -> candidate.getContainerIds().contains(container.getContainerId())))
            .rewardConfigurable("Group by container",
                    (candidate, container) -> container.getPercentFilledSquared());
}
This is a modified version of Lukáš' answer. It works by prioritizing containers which are "mostly full." In the real-world use case (which I think I explained pretty poorly above), we'd like to minimize the number of containers used in a solution because it allows the warehouse to replace those containers with new ones which are "easier" to fulfill (the search space is less constrained). We're planning in a receding time horizon, and having many partially filled bins means that each planning horizon becomes increasingly difficult to schedule. "Closing" containers by fulfilling all associated tasks means we can replace that container with a new one and start fresh.
Anyways, just a bit of context. This is a very particular use case, but if anyone else reads this and wants to know how to work with this type of constraint, hopefully that helps.
Interpreting your constraint as "Penalize by 1 for each container used", this should work:
Constraint maximizeContainerCompleteness(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(CandidateAssignment.class)
            .filter(CandidateAssignment::isAssigned)
            .flattenLast(CandidateAssignment::getContainerIds)
            .distinct()
            .penalizeConfigurable("Group by container");
}
What it does: for each assigned candidate assignment, flatten its set of container ids (resulting in a stream of non-distinct used container ids), take the distinct elements of that stream (resulting in a stream of distinct used container ids), and trigger a penalize call for each one.
Not to take away from Christopher's correct answer, but there are various ways you could do this. For example, consider conditional propagation (ifExists()):
return constraintFactory.forEach(Container.class)
        .ifExists(CandidateAssignment.class,
                Joiners.filtering((container, candidate) -> candidate.isAssigned()
                        && candidate.getContainerIds().contains(container.getId())))
        .penalizeConfigurable("Penalize assigned containers",
                container -> 1);
I have a hunch that this approach will be faster, but YMMV. I recommend you benchmark the two approaches and pick the one that performs better.
This approach also has the extra benefit that the Container instance shows up in the constraint matches, rather than some anonymous Integer.

OptaPlanner: Is the "constraint match" associated with a score just a semantical thing?

I have a question about OptaPlanner's constraint stream API. Are the constraint matches only used to calculate the total score and to help the user see how the score comes about, or is this information also used to find a better solution?
By "used to find a better solution" I mean that the information is used to determine the next move(s) in the local search phase.
So does it matter which planning entity I penalize?
Currently, I am working on an examination scheduler. One requirement is to distribute the exams of a single student optimally.
The number of exams per student varies. Therefore, I wrote a cost function that gives a normalized value, indicating how well the student's exams are distributed.
Let's say a student's examination schedule of eight exams spread over three weeks has a cost of 80. Now, I need to break this value down to the individual exams. There are two different ways to do this:
Option A: Penalize each of the exams with 10 (10*8 = 80).
Option B: Penalize each exam according to its actual impact: only the exams in the last week are penalized, as the distribution of exams in week one and week two is fine.
Obviously, option B is semantically correct. But does the choice of the option affect the solving process?
The constraint matches are there to help explain the score to humans. They do not, in any way, affect how the solver moves or what solution you are going to get. In fact, ScoreManager has the capability to calculate constraint matches after the solver has already finished, or for a solution that's never even been through the solver before.
(Note: constraint matching does affect performance, though. It slows everything down, due to all the object iteration and creation.)
To your second question: Yes, it does matter which entity you penalize. In fact, you want to penalize every entity that breaks your constraints. Ideally it should be penalized more, if it breaks the constraints more than some other entity - this way, you get to avoid score traps.
EDIT based on an edit to the question:
In this case, since you want to achieve fairness per student, I suggest your constraint does not penalize the exam, but rather the student. Per student, group your exams and apply some fairness ConstraintCollector. If you do it like that, you will be able to create a per-student fairness function and use its value as your penalty.
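For example, a per-student grouping might look like this sketch (Exam, getStudent() and the fairnessCost() helper are hypothetical names, not from the question):

import static org.optaplanner.core.api.score.stream.ConstraintCollectors.toList;

import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;

Constraint distributeExamsPerStudent(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(Exam.class)
            // One group per student, carrying that student's list of exams.
            .groupBy(Exam::getStudent, toList())
            // Penalize each student by how badly their exams are distributed.
            .penalize("Distribute exams per student", HardSoftScore.ONE_SOFT,
                    (student, exams) -> fairnessCost(exams));
}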
The OptaPlanner Tennis example shows one way of doing fairness. You may also be interested in a larger fairness discussion on the OptaPlanner blog.

Prioritising scores in the VRP solution with OptaPlanner

I am using OptaPlanner to solve my VRP problem. I have several constraint providers, for example: one to enforce the capacities and another to enforce the time window (TW) on the arrival time, both HARD. At the end of the optimisation it returns a route with a negative score, and when I analyse the ConstraintMatch I find that it is a product of a vehicle capacity constraint. However, in my problem there is no point in the vehicle arriving on time (meeting the TW constraint) if it cannot satisfy the customer's demands. That's why I want the constraints I have defined for the capacities (weight and volume) to carry more weight/priority than the time window constraint.
Question: How can I configure the solver, or what should I consider, so that all the hard constraints apply but some, like the capacity ones, carry more weight than others?
Always grateful for your suggestions and help
I am far from an OptaPlanner expert, but every constraint penalty (or reward) is divided into two parts if you use penalizeConfigurable(...) instead of penalize(...). Each constraint's score is then evaluated as the ConstraintWeight, which you declare in a ConstraintConfiguration class, multiplied by the MatchWeight, which is how you implement the deviation from the desired result. For example, the number of failed stops might be squared, turning the penalty into an exponential one instead of a linear one.
ConstraintWeights can be reconfigured between solutions to tweak the importance of a penalty, and setting one to zero negates it completely. The MatchWeight, on the other hand, is an implementation detail that you tweak while you develop. At least that is how I see it.
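Applied to the question above, the capacity constraints can simply get a larger hard ConstraintWeight than the time window constraint. A minimal sketch (class and constraint names are hypothetical):

import org.optaplanner.core.api.domain.constraintweight.ConstraintConfiguration;
import org.optaplanner.core.api.domain.constraintweight.ConstraintWeight;
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;

@ConstraintConfiguration
public class VehicleRoutingConstraintConfiguration {

    // Both constraints stay hard, but a capacity violation costs
    // 100 times as much as a time window violation.
    @ConstraintWeight("Vehicle capacity")
    private HardSoftScore vehicleCapacity = HardSoftScore.ofHard(100);

    @ConstraintWeight("Time window")
    private HardSoftScore timeWindow = HardSoftScore.ofHard(1);
}

The matching penalizeConfigurable("Vehicle capacity") and penalizeConfigurable("Time window") calls in the ConstraintProvider then pick up these weights by constraint name.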

Why do we set the similarity function at index time in Lucene?

How does Lucene use Similarity during indexing time? I understand the role of similarity while reading the index. So, searcher.setSimilarity() makes sense in scoring. What is the use of IndexWriterConfig.setSimilarity()?
How does Lucene use Similarity during indexing time?
The short answer is: Lucene captures some statistics at indexing time which can then be used to support scoring at query time. I expect it is simply a matter of efficiency that these are captured as part of the indexing process, rather than being repeatedly re-computed on the fly, when running queries.
There is a section in the Similarity javadoc which describes this at a high level:
At indexing time, the indexer calls computeNorm(FieldInvertState), allowing the Similarity implementation to set a per-document value for the field that will be later accessible via LeafReader.getNormValues(String).
The javadoc goes on to describe further details - for example:
Many formulas require the use of average document length, which can be computed via a combination of CollectionStatistics.sumTotalTermFreq() and CollectionStatistics.docCount().
So, for example, the segment info file within a Lucene index records the number of documents in each segment.
There are other statistics which can be captured in an index to support scoring calculations at query time. You can see a summary of these stats in the Index Structure Overview documentation - with links to further details.
What is the use of IndexWriterConfig.setSimilarity()?
This is a related question which follows on from the above points.
By default, Lucene uses the BM25Similarity formula.
That is one of a few different scoring models that you may choose to use (or you can define your own). The setSimilarity() method is how you can choose a different similarity (scoring model) from the default one. This means different statistics may need to be captured (and then used in different ways) to support the chosen scoring model.
It would not make sense to use one scoring model at indexing time, and a different one at query time.
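To illustrate, a typical setup passes the same Similarity instance to both the IndexWriterConfig and the IndexSearcher; this sketch uses ClassicSimilarity purely as an example of a non-default model:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SimilarityDemo {
    public static void main(String[] args) throws Exception {
        Similarity similarity = new ClassicSimilarity(); // classic TF-IDF instead of the BM25 default
        Directory dir = new ByteBuffersDirectory();

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setSimilarity(similarity); // drives computeNorm() while indexing
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            // ... index documents here ...
            IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(writer));
            searcher.setSimilarity(similarity); // the same model again at query time
        }
    }
}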
(Just to note: I have never set the similarity scoring model myself - I have always used the default model.)

Did anyone write custom Affinity function?

I want all nodes in a cluster to have an equal data load. With the default affinity function that is not happening.
As of now, we have 3 nodes. We use the group ID as the affinity key, and we have 3 group IDs (1, 2 and 3). We also limit the cache partitions to the group IDs, so that overall nodes = group IDs = cache partitions and each node has an equal number of partitions.
Would it be okay to write a custom affinity function? What would we lose by doing so? Has anyone written a custom affinity function?
The affinity function doesn't guarantee an even distribution across all nodes. It's statistical... and three values isn't really enough to make sure the data is "fairly" distributed.
So, yes, writing a new affinity function would work. The downsides are that you need to make it fast (it's called a lot) and that you'd be hard-coding it to your current node topology. What happens when you choose to add a new node? What happens when a node fails? Also, you'd potentially be putting all your data into three partitions, which makes it harder to scale out (one of the main advantages of Ignite's architecture).
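To make those downsides concrete, here is a rough, untested sketch of such a function (one partition per group ID, each pinned to a node round-robin, with no backup copies):

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import org.apache.ignite.cache.affinity.AffinityFunction;
import org.apache.ignite.cache.affinity.AffinityFunctionContext;
import org.apache.ignite.cluster.ClusterNode;

public class GroupIdAffinityFunction implements AffinityFunction, Serializable {
    private static final int PARTS = 3; // one partition per group ID

    @Override public void reset() { /* no cached state to clear */ }

    @Override public int partitions() { return PARTS; }

    @Override public int partition(Object key) {
        // Assumes the affinity key is the group ID (1, 2 or 3).
        return ((Integer) key) % PARTS;
    }

    @Override public List<List<ClusterNode>> assignPartitions(AffinityFunctionContext ctx) {
        List<ClusterNode> nodes = ctx.currentTopologySnapshot();
        List<List<ClusterNode>> assignment = new ArrayList<>(PARTS);
        for (int p = 0; p < PARTS; p++) {
            List<ClusterNode> owners = new ArrayList<>(1);
            // Pins each partition to one node; hard-codes topology assumptions
            // and provides no backups.
            owners.add(nodes.get(p % nodes.size()));
            assignment.add(owners);
        }
        return assignment;
    }

    @Override public void removeNode(UUID nodeId) {
        // Topology changes are handled on the next assignPartitions() call.
    }
}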
As an alternative, I'd look at your data model. Splitting your data into three chunks is too coarse for things to work automatically.