Optaplanner: Can we use two planning variables with (nullable = true)? - optimization

Can we use two planning variables with (nullable = true) for each of them?
If so, how can we deal with them in the Drools rule file?
I know that when we use one planning variable, we define it with (nullable = true) and then use $planningVariable != null in the rule, as in the "pas" example. I tried this and it worked well, but what about using two planning variables?
Can we apply this to the CurriculumCourse example? And if so, should the over-constrained data appear in the output as unassigned for both planning variables, or for only one of them?

Yes, of course you can. But as usual, you'll have to make sure your score constraints (= score rules) penalize/reward what you want to achieve.
For example, on CurriculumCourse, I presume you'd have a negative medium constraint that penalizes a Lecture if either its room or its period is null. If both are null, don't penalize it more, or you'll end up with a lot of semi-assigned entities. Even so, you'll probably still end up with a few semi-assigned entities, so to fix that:
Either do some post-processing that turns each of those into no assignment at all (= both variables null), since a semi-assignment is useless.
Or add a hard constraint against semi-assignments to avoid them entirely (even in intermediate solution states); a sketch of both constraints follows below.
Additional solving efficiency can be gained from:
A ChangeMove selector that moves both room and period, as changing just one to/from null will never yield a better solution.
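For illustration, here is a minimal sketch of both constraints against the OptaPlanner 8 Constraint Streams API (Kotlin; it assumes the CurriculumCourse domain, where a Lecture has nullable period and room variables - the same logic can be written as score rules in a DRL file):

import org.optaplanner.core.api.score.buildin.hardmediumsoft.HardMediumSoftScore
import org.optaplanner.core.api.score.stream.Constraint
import org.optaplanner.core.api.score.stream.ConstraintFactory

fun unassignedLecture(factory: ConstraintFactory): Constraint =
    factory.forEachIncludingNullVars(Lecture::class.java)
        // One medium penalty whether one or both variables are null,
        // so a semi-assignment is never cheaper than no assignment at all.
        .filter { it.period == null || it.room == null }
        .penalize("Unassigned lecture", HardMediumSoftScore.ONE_MEDIUM)

fun semiAssignedLecture(factory: ConstraintFactory): Constraint =
    factory.forEachIncludingNullVars(Lecture::class.java)
        // Optional hard penalty when exactly one of the two variables is null.
        .filter { (it.period == null) != (it.room == null) }
        .penalize("Semi-assigned lecture", HardMediumSoftScore.ONE_HARD)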

Related

Constraining optaplanner's choices based on the result of different choices - Constraint Propagation / Local Consistency

I have an optimization problem that I'm working on in which the optimizer's decision for one variable needs to constrain the available choices for another variable, and I'm wondering about the suitability of different ways of accomplishing this. I'll try to pare it down for demonstration purposes (in Kotlin).
import org.optaplanner.core.api.domain.entity.PlanningEntity
import org.optaplanner.core.api.domain.lookup.PlanningId
import org.optaplanner.core.api.domain.valuerange.ValueRangeProvider
import org.optaplanner.core.api.domain.variable.PlanningVariable

@PlanningEntity
class AuctionBid {
    @PlanningId
    lateinit var entity: String

    @PlanningVariable(valueRangeProviderRefs = ["availOptions"])
    var bidNow: Boolean? = null

    @PlanningVariable(valueRangeProviderRefs = ["availOptions"])
    var bidFinal: Boolean? = null

    var forceBid: Boolean? = null

    @ValueRangeProvider(id = "availOptions")
    private fun availOptions(): List<Boolean> {
        return if (this.forceBid != null) {
            listOf(this.forceBid!!)
        } else {
            listOf(true, false)
        }
    }
}
In this scenario, we are trying to determine whether we should bid on a given entity in a combinatorial auction (multiple entities being bid on in each round), given a hard budget and various cost/benefit tradeoffs. The bidNow choice is determined by the prices in the current round of the auction, and the bidFinal choice is determined by the anticipated/forecasted final prices in the auction. In order to avoid constantly changing decisions about what to bid on right now (switching bids is penalized in the auction rules), we want to constrain our choices on bidNow based on our decision for bidFinal, such that if we will want to bid on it at our anticipated future price, we will always choose to bid on it now. The state table is shown below:
bidNow | bidFinal | possible
  Y    |    Y     |    Y
  Y    |    N     |    Y
  N    |    N     |    Y
  N    |    Y     |    N
As seen, we already have a method of constraining choices for situations where we know we don't want the product at all, or where we know we want it at any price. This uses the availOptions ValueRangeProvider, which only lets the optimizer choose a predetermined option when we know that is our only choice.
However, this constraint is more dynamic, as it results from a separate choice. If we want to bid at the anticipated final price, our only choice for right now is to continue bidding in the current round.
I've thought of a few ways of doing this, as this is a classic Local Consistency Constraint Propagation example, but without knowing too much about optaplanner's internals, I don't really know the best way to do this. Some options:
Use a ValueRangeProvider for bidNow separate from the one for bidFinal, and constrain the value range for bidNow to True if bidFinal is true. However, I don't know whether optaplanner can even handle a dynamic value range that is determined by another planning variable's value, nor do I know if this is an efficient method.
Place a hard constraint (using ConstraintStreams, which is super cool BTW) on bidNow to always be True if bidFinal is True. I know this is feasible (it's my current solution), but it seems really slow.
Maybe try to use a Shadow variable, which I've never used before, although I'm not sure this is a good fit because bidFinal==True implies bidNow==True, but bidFinal==False doesn't necessarily imply anything about bidNow.
I know I can always just try all the different options and see, but this problem is already big and unwieldy (the demo code is dramatically simplified), and I really would like some insight into how Optaplanner is working behind the scenes because I really like working with it but come across these this-way-or-that-way scenarios all the time and it's difficult to try them all. Is there any insight into modeling this constraint propagation problem? Or problems like this?
Both 1 and 2 should be possible; 2 being the straightforward way.
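For what it's worth, option 2 could look roughly like this as a Constraint Stream (a sketch only; it assumes a HardSoftScore and reuses the AuctionBid class from the question):

import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore
import org.optaplanner.core.api.score.stream.Constraint
import org.optaplanner.core.api.score.stream.ConstraintFactory

fun bidFinalImpliesBidNow(factory: ConstraintFactory): Constraint =
    factory.forEach(AuctionBid::class.java)
        // Forbid the one impossible row of the state table: bidFinal without bidNow.
        .filter { it.bidFinal == true && it.bidNow == false }
        .penalize("bidFinal implies bidNow", HardSoftScore.ONE_HARD)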
Another option would be to use filtered move selection to skip the moves that would set bidFinal to Y if bidNow is false (and the opposite corner case). The downside of this is that the filter needs to be applied to every move you use, which makes solver configuration verbose. And the moves will still be generated before they are thrown away, so you lose a bit of performance.
However, why not make bid an enum with three options: NOTHING, NOW_ONLY, BOTH? That way, you get the benefit of having only one planning variable and you eliminate the original problem completely. Of course this is only suitable when the number of options is this low, which it may not be in your actual use case.
This last option is an example of designing your data model in a way to avoid corner cases. If they are impossible because the model precludes them, you do not need to worry about working around them later. This is always your best bet.
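A minimal sketch of that remodel (the enum and property names here are illustrative, not from the question):

import org.optaplanner.core.api.domain.entity.PlanningEntity
import org.optaplanner.core.api.domain.lookup.PlanningId
import org.optaplanner.core.api.domain.valuerange.ValueRangeProvider
import org.optaplanner.core.api.domain.variable.PlanningVariable

enum class BidChoice { NOTHING, NOW_ONLY, BOTH }

@PlanningEntity
class AuctionBid {
    @PlanningId
    lateinit var entity: String

    @PlanningVariable(valueRangeProviderRefs = ["bidChoices"])
    var bidChoice: BidChoice? = null

    // The impossible state (bidFinal without bidNow) has no enum value at all,
    // so no constraint, filter or custom value range is needed to exclude it.
    val bidNow: Boolean get() = bidChoice == BidChoice.NOW_ONLY || bidChoice == BidChoice.BOTH
    val bidFinal: Boolean get() = bidChoice == BidChoice.BOTH

    @ValueRangeProvider(id = "bidChoices")
    fun bidChoices(): List<BidChoice> = BidChoice.values().toList()
}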

Additional PlanningEntity in CloudBalancing - bounded-space situation

I successfully amended the nice CloudBalancing example to include the fact that I may only have a limited number of computers open at any given time (thanks, optaplanner team - easy to do). I believe this is referred to as a bounded-space problem. It works dandy.
The processes come in groups, say 20 processes in a given order per group. I would like to amend the example to have optaplanner also change the order of these groups (not the processes within one group). I have therefore added a class ProcessGroup in the domain with a member List<Process>, the instances of ProcessGroup being stored in a List<ProcessGroup>. The desired optimisation would shuffle the members of this list, causing the instances of ProcessGroup to be placed at different indices of the List<ProcessGroup>. The index of each ProcessGroup should be ProcessGroup.index.
The documentation states that "if in doubt, the planning entity is the many side of the many-to-one relationship." This would mean that ProcessGroup is the planning entity, the member index being a planning variable, getting assigned to (hopefully) different integers. After every new assignment of indices, I would have to re-sort the List<ProcessGroup> in ascending order of ProcessGroup.index. This seems very odd and cumbersome. Any better ideas?
Thank you in advance!
Philip.
The current design has a few disadvantages:
It requires 2 (genuine) entity classes (each with 1 planning variable), which probably increases the search space (= longer to solve, more difficult to find a good or even feasible solution) and increases configuration complexity. Don't use multiple genuine entity classes if you can avoid it reasonably.
Those Integer variables of ProcessGroup need to be all different and somehow sequential. That smells like a chained planning variable (see the docs about chained variables and the Vehicle Routing example), in which case the entire problem could be represented as a simple VRP with just 1 variable - but does that really apply here?
Train of thought: there's something off in this model:
ProcessGroup has an Integer variable: what does that Integer represent? Shouldn't that Integer variable be on Process instead? Are you ordering Processes or ProcessGroups? If it should be on Process instead, then both of Process's variables can be replaced by a chained variable (like in VRP), which will be far more efficient - see the sketch below.
ProcessGroup has a list of Processes, but that is a problem property, which means it doesn't change during planning. I suspect that's correct for your use case, but do assert it.
If none of the reasoning above applies (which would surprise me), then the original model might be valid nonetheless :)
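If the chained reading is right, the entity could look roughly like this (a sketch modelled after the Vehicle Routing example; every name here is illustrative, and the two value range ids are assumed to be declared on the solution class):

import org.optaplanner.core.api.domain.entity.PlanningEntity
import org.optaplanner.core.api.domain.variable.PlanningVariable
import org.optaplanner.core.api.domain.variable.PlanningVariableGraphType

// The chain anchor (a problem fact), playing the role of VRP's Vehicle.
interface ProcessStandstill
class Computer : ProcessStandstill

@PlanningEntity
class Process : ProcessStandstill {
    // Instead of an Integer index to re-sort by, each Process points at its
    // predecessor in the sequence: either the anchor or another Process.
    @PlanningVariable(
        valueRangeProviderRefs = ["computerRange", "processRange"],
        graphType = PlanningVariableGraphType.CHAINED)
    var previous: ProcessStandstill? = null
}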

Cplex/OPL local search

I have a model implemented in OPL. I want to use this model to implement a local search in Java. I want to initialize solutions with some heuristics and give these initial solutions to CPLEX to find a better solution based on the model, but I also want to limit the search to a specific neighborhood. Any idea about how to do it?
Also, how can I limit the range of all variables? And what's best: implementing these heuristics and the local search in OPL itself, or in Java, or even C++?
Thanks in advance!
Just to add some related observations:
Re Ram's point 3: we have had a lot of success with approach b. In particular, it is simple to add constraints that fix some of the variables to values from a known solution, and then re-solve for the rest of the variables in the problem. More generally, you can add constraints to limit the values to be similar to a previous solution, like:
var >= previousValue - 1
var <= previousValue + 2
This is no use for binary variables of course, but for general integer or continuous variables it can work well. This approach can be generalised for collections of variables:
sum(i in indexSet) var[i] >= (sum(i in indexSet) value[i]) - 2
sum(i in indexSet) var[i] <= (sum(i in indexSet) value[i]) + 2
This can work well for sets of binary variables. For an array of 100 binary variables of which maybe 10 had the value 1, we would be looking for a solution where at least 8 have the value 1, but not more than 12. Another variant is to limit something like the Hamming distance (assume that the vars are all binary here):
dvar int changed[indexSet] in 0..1;

forall(i in indexSet)
  if (previousValue[i] <= 0.5)
    changed[i] == (var[i] >= 0.5); // was zero before
  else
    changed[i] == (var[i] <= 0.5); // was one before

sum(i in indexSet) changed[i] <= 2;
Here we would be saying that out of an array of e.g. 100 binary variables, only a maximum of two would be allowed to have a different value from the previous solution.
Of course you can combine these ideas. For example, add simple constraints to fix a large part of the problem to previous values, while leaving some other variables to be re-solved, and then add constraints on some of the remaining free variables to limit the new solution to be near to the previous one. You will notice of course that these schemes get more complex to implement and maintain as we try to be more clever.
To make the local search work well you will need to think carefully about how you construct your local neighbourhoods - too small and there will be too little opportunity to make the improvements you seek, while if they are too large they take too long to solve, so you don't get to make so many improvement steps.
A related point is that each neighbourhood needs to be reasonably internally connected. We have done some experiments where we fixed the values of maybe 99% of the variables in a model and solved for the remaining 1%. When the 1% was clustered together in the model (e.g. all the allocation variables for a subset of resources) we got good results, while in comparison we got nowhere by just choosing 1% of the variables at random from anywhere in the model.
An often overlooked idea is to invert these same limits on the model, as a way of forcing some changes into the solution to achieve a degree of diversification. So you could add a constraint to force a specific value to be different from a previous solution, or ensure that at least two out of an array of 100 binary variables have a different value from the previous solution. We have used this approach to get a sort-of tabu search with a hybrid matheuristic model.
Finally, we have mainly done this in C++ and C#, but it would work perfectly well from Java. Not tried it much from OPL, but it should be fine too. The key for us was being able to traverse the problem structure and use problem knowledge to choose the sets of variables we freeze or relax - we just found that easier and faster to code in a language like C#, but then the modelling stuff is more difficult to write and maintain. We are maybe a bit "old-school" and like to have detailed fine-grained control of what we are doing, and find we need to create many more arrays and index sets in OPL to achieve what we want, while we can achieve the same effect with more intelligent loops etc without creating so many data structures in a language like C#.
Those are several questions. So here are some pointers and suggestions:
In CPLEX, you give your model an initial solution by using IloOplCplexVectors().
Here's a good example in IBM's documentation of how to alter CPLEX's solution.
Within OPL, you can do the same. You basically set a series of values for your variables, and hand those over to CPLEX. (See this example.) A small JVM-side sketch of this follows after these pointers.
Limiting the search to a specific neighborhood: There is no easy way to respond without knowing the details. But there are two ways that people do this:
a. Change the objective to favor that 'neighborhood' and make other areas unattractive.
b. Add constraints that weed out other neighborhoods from the search space.
Regarding limiting the range of variables in OPL, you can do it directly:
dvar int supply in minQty..maxQty;
Or for a whole array of decision variables, you can do something along the lines of:
range CreditsAllowed = 3..12;
dvar int credits[student] in CreditsAllowed;
Hope this helps you move forward.
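On the JVM side, pointer 1 might look roughly like this (a sketch in Kotlin against the CPLEX Java API; vars and heuristicValues are illustrative names and must line up index by index):

import ilog.concert.IloNumVar
import ilog.cplex.IloCplex

fun solveFromHeuristic(cplex: IloCplex, vars: Array<IloNumVar>, heuristicValues: DoubleArray) {
    // Hand the heuristic solution to CPLEX as a MIP start, then solve.
    cplex.addMIPStart(vars, heuristicValues)
    if (cplex.solve()) {
        println("Objective: ${cplex.objValue}")
    }
}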

Grammatically correct double-noun identifiers, plural versions

Consider compounds of two nouns, which in natural English would most often appear in the form "noun of noun", e.g. "direction of light", "output of a filter". When programming, we usually write "LightDirection" and "FilterOutput".
Now, I have a problem with plural nouns. There are two cases:
1) singular of plural
e.g. "union of (two) sets", "intersection of (two) segments"
Which is correct, SetUnion and SegmentIntersection or SetsUnion and SegmentsIntersection?
2) plural of plural
There are two subcases:
(a) Many elements, each having many related elements, e.g. "outputs of filters"
(b) Many elements, each having a single related element, e.g. "directions of vectors"
Shall I use FilterOutputs and VectorDirections or FiltersOutputs and VectorsDirections?
I suspect the correct version is the first one (FilterOutputs, VectorDirections), but I think it may lead to ambiguities, e.g.
FilterOutputs - many outputs of a single filter or many outputs of many filters?
LineSegmentProjections - projections of many segments or many projections of a single segment?
What are the general rules I should follow?
There's a grammatical misunderstanding lying behind this question. When we turn a phrase of form:
1. X of Y
into
2. Y X
the Y changes grammatical role from a noun in the possessive (1) to an adjective in the attributive (2). So while one may pluralise both X and Y in (1), one may only pluralise X in (2), because Y in (2) is an adjective, and adjectives do not have grammatical number.
Hence, e.g., SetsUnion is not in accordance with English. You're free to use it if it suits you, but you are courting unreadability, and I advise against it.
Postscript
In particular, consider two other possessive constructions, first the old-fashioned construction using the possessive pronoun "its", singular:
3a. Y, its X
the equivalent plural:
4a. Ys, their X
and their contractions, with 4b much less common than 3b:
3b. Y's X
4b. Ys' X
Here, SetsUnion suggests it is a rendering of the singular possessive type (3) Set's Union (=Set, its Union), where you intended to communicate the plural possessive (4) Sets, their Union (contracted to the less common Sets' Union).
So it's actively misleading.
Unless you're getting hamstrung by a convention-driven system (Ruby on Rails, CakePHP, etc.), why not use OutputsOfFilters, UnionOfSets, etc.? They may not be conventional, but they may be clearer.
For example, it's pretty clear that ProjectionOfLineSegments and ProjectionsOfLineSegment are different things, or even ProjectionsOfLineSegments....
Using plural forms of nouns can make them more difficult to read.
When you have a number of things, they are usually stored in a data structure - an array, a list, a map, a set, etc. - generically called a collection or abstract data type. The interface to a collection of items is typically part of the programming environment (e.g. Collections in Java and .NET, the STL in C++) and is well understood by developers to involve quantities of items.
You can avoid pluralizing your nouns, make the fact that you are dealing with multiple quantities explicit, and indicate how they are accessed, by incorporating the name of the collection. For example:
VectorDirectionList - the vectors and their directions are listed, e.g. some kind of Pair type. Works particularly well if you have a VectorDirection, combining a Vector and a Direction.
VectorDirectionMap - if the vector directions are mapped from vector.
Because it's a collection type, dealing with multiple objects is understood, as that is endemic to a collection type. It then puts it in the same class as SetUnion - a union always involves at least 2 sets, and a VectorDirectionList makes it clear there can be more than one VectorDirection.
I agree about avoiding homonyms where the word has more than one word class, e.g. Filter (and actually Set, although to my mind Set would not really be used in a class name as a verb, so I interpret it as a noun). I originally wrote this using FilterOutput as an example, but it didn't read well. Using a compound for Filter may help disambiguate - e.g. ImageFilterOutputs (or, applying my own advice, ImageFilterOutputList).
Avoiding plural forms with class names seems natural when you consider that an instance of a class is itself always one item - "an instance". If we use a plural name, then we get a mismatch - an instance trying to imply that it is multiple things - it itself is just one thing, even if it references multiple other things. The collection naming above builds on this - you have an instance which is a list, a map etc so there is no mismatch.
I'm assuming you are talking about programming language constructs, although the same thinking applies to tables/views. These are understood to involve quantities of items, and table names are consequently often singular (Customer, Order, Item) even though they store multiple rows. Many-to-many mapping tables are usually compounds of the entities being related, e.g. relating orders to items - OrderItem. In my experience, using plurals for table names makes the SQL difficult to read.
To sum up, I would avoid plural forms as they make reading harder. There are sure to be cases where they are unavoidable - where using the plural form is more readable than creating a huge name of nested entities and collections - but these are the exception rather than the rule.
What are the general rules I should follow?
Make it Clear -- for both visual and aural thinkers.
Make it Specific but Accurate.
Make it pass the "crowded room" or "emergency phone call" test.
To illustrate with the SetsUnion example:
"SetsUnion" is right out; It's easily confused for a typo and speaking it (even in your head) will confuse it for "Set's Union" (Or worse).
The plural is also implied, so the 2nd 's' is redundant.
SetUnion is better but still ambiguous.
UnionOfSets is clearer and should be the bare minimum standard.
But all of these, so far, are uselessly vague (unless you are working with pure mathematical theory).
The term really should be specific. For example, "Red cars", "Programmers who spent too much time on esoterica", etc.
These are all unions of sets, but they tell you something useful. ;-)
Finally, Phil Factor had the right of it. To paraphrase:
Can you shout a (term) out across a crowded room and have it keyed in, and successfully (used), by a listener at the other side?
Try yelling, "SetsUnion," or even, "UnionOfSets," across a packed Irish bar. ;-)
1) I would use SetUnion and SegmentIntersection, because I think in this case the plurality is implied anyway and it just looks nicer that way.
2) Again, I would use FilterOutputs and VectorDirections, for the same reason. You could always use MultipleFilterOutputs if you want to be more specific.
But ultimately it's entirely down to your personal preference.
I think that while general naming conventions and consistency are important, in a very, very tight/tricky algorithm clarity should trump convention. If it helps, use veryLongAndDescriptiveIdentifiers.
What's wrong with Union()?
Moreover, "union of sets" turns into "sets' union" (the two sets' union is ...); I'm sure I'm not the only person who's okay with CamelCase but not CamelsCaseMinusApostrophes. If it needs an apostrophe to make sense, don't use it. Set.Union() reads exactly like "union of set(s)".
Mathematicians will also say "the (set) union of A and B", or rarely "A and B's (set) union". "The sets' union of A and B" makes no sense!
Most people will also see Vector[] vectors and Direction[] vectorDirections and assume that vectors[i] corresponds to vectorDirections[i]. If things really get ambiguous, I use something like vector_by_index and vectorDirection_by_index. Then you can have Map<Filter,Output> output_by_filter or Map<Filter,Output[]> outputs_by_filter, which makes it very obvious what the key is (this is very important in Objective-C, where it's completely non-obvious what type the keys or values are).
If you really want, you can add an s and get vectors_by_index, but then consistency gives you the silly outputss_by_filter.
The right thing is, of course, something like struct FilterState { Filter filter; Output[] outputs; }; FilterState[] filterStates;.
I'd suggest singular for the first word: SetUnion, VectorDirections, etc.
Do a quick class search in your IDE, for: Strings*, Sets*, Vectors*, Collections*
Anyway, whatever you choose, be consistent throughout the whole application.

how to store an approximate number? (number is too small to be measured)

I have a table representing standards of alloys. The standard is partly based on the chemical composition of the alloys. The composition is presented in percentages. The percentage is determined by a chemical composition test. Sample data.
But sometimes, the lab cannot measure below a certain percentage. So they indicate that the element is present, but the percentage is less than they can measure.
I was confused about how to accurately store such a number in an SQL database. I thought of storing the number with a negative sign. No element can have a negative composition of course, but I can interpret this as "less than the specified value". Another option is to add an extra column for each element!! I really don't like the latter option.
Any other ideas? It's a small issue if you think about it, but I think a crowd is always wiser. Somebody might have a neater solution.
Question updated:
Thanks for all the replies.
The test results come from different labs, so there is no common lower bound.
When the percentage of Titanium is less than 0.0004, for example, the number is still important; only the formula will differ slightly in this case.
Hence the value cannot be stored as NULL, and I don't know the lower bound for all values.
Tricky one.
Another possibility I thought of is to store it as a string. Any other ideas?
What you're talking about is a sentinel value. It's a common technique: C-style strings, after all, use 0 as a sentinel end-of-string value. You can do that. You just need to find a number that makes sense and isn't used for anything else. Many string functions will return -1 to indicate that what you're looking for isn't there.
0 might work because, if the element isn't there, there shouldn't even be a record; but you face the problem that it might be mistaken for actually meaning 0. -1 is another option; it obviously doesn't have that problem.
Another column to indicate whether the amount is measurable or not is also a viable option. The case for this one becomes stronger if you need to store different categories of trace elements (e.g. <1%, <0.1%, <0.01%, etc.). Storing the negative of those numbers seems a bit hacky to me.
You could just store it as NULL, meaning that the value exists but is undefined.
Any arithmetic operation with a NULL yields a NULL.
Division by NULL is safe.
NULLs are ignored by the aggregation functions, so queries like this:
SELECT SUM(metal_percent), COUNT(metal_percent)
FROM alloys
GROUP BY metal
will give you the sum and the count of the actual, defined values, not taking the unfilled values into account.
I would use a threshold value which is at least one significant digit smaller than your smallest expected value. That way you can logically say that any value less than, say, 0.01 can be presented to your application as a "trace" amount. This remains easy to understand and gives you flexibility in determining where your threshold should lie.
Since the constraints of the values are well defined (no element can have a negative composition), I would go for the "negative value to indicate less-than" approach. As long as this use of sentinel values is sufficiently documented, it should be reasonably easy to implement and maintain.
An alternative but similar method would be to add 100 to the values, assuming that you can't get more than 100%. So <0.001 becomes 100.001.
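As an illustration of the negative-sentinel reading on the application side, a minimal sketch (the class and field names are hypothetical):

import kotlin.math.abs

// A raw value read from the database, where a negative number encodes
// "element present, but below the measurable threshold".
data class ElementReading(val element: String, val rawPercent: Double) {
    val isTrace: Boolean get() = rawPercent < 0.0
    val percent: Double get() = abs(rawPercent)  // the detection limit if trace
}

fun main() {
    val titanium = ElementReading("Ti", -0.0004)  // stored for "< 0.0004 %"
    println(if (titanium.isTrace) "trace, < ${titanium.percent}%" else "${titanium.percent}%")
}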
I would have a table modeling the certificate, in a one-to-many relation with another table storing the values for the elements. The elements table would contain the value in one column and a "less than" flag in a separate column.
Draft:
create table CERTIFICATES
(
  PK_ID integer,
  NAME varchar(128)
)

create table ELEMENTS
(
  ELEMENT_ID varchar(2),
  CERTIFICATE_ID integer,
  CONCENTRATION number,
  MEASURABLE integer
)
Depending on the database engine you're using, the types of the columns may vary.
Why not add another column to store whether or not it's a trace amount?
This will also allow you to save the amount that the trace is less than.
Since there is no common lowest threshold value and NULL is not acceptable, the cleanest solution now is to have a marker column which indicates whether there is a quantifiable amount or a trace amount present. A value of "Trace" would indicate to anybody reading the raw data that only a trace amount was present. A value of "Quantity" would indicate that you should check an amount column to find the actual quantity present.
I would have to warn against storing numerical values as strings. It will inevitably add pain, since you lose the guarantees a strong type definition gives you. When your application consumes the values in that column, it has to parse the string to determine whether it's a sentinel value, a numeric value, or simply some other string it can't interpret. Trying to handle data conversion errors at that point in your application is something I'm sure you don't want to be doing.
Another field seems like the way to go; call it 'MinMeasurablePercent'.