OptaPlanner: Mixing constraint and for-next based score calculation

I am implementing several restrictions in my CustomizedConstraintProvider class using the Constraint Streams API. However, there is one special case that I currently do not see how to implement properly with this API.
Suppose I have several methods like these:
private Constraint Restriction1(ConstraintFactory constraintFactory) {
    return constraintFactory
            .forEach(Class.class)
            ...
            .penalize("Restriction1", HardMediumSoftBigDecimalScore.ONE_HARD);
}
private Constraint Restriction2(ConstraintFactory constraintFactory) {
    return constraintFactory
            .forEach(Class.class)
            ...
            .penalize("Restriction2", HardMediumSoftBigDecimalScore.ONE_SOFT);
}
private Constraint Restriction3(ConstraintFactory constraintFactory) {
    return constraintFactory
            .forEach(Class.class)
            ...
            .penalizeBigDecimal("Restriction3", HardMediumSoftBigDecimalScore.ONE_MEDIUM,
                    (a, b, c) -> BigDecimal.valueOf(Math.pow(b - c, a)));
}
How can I implement one particular method (let's say "Restriction4") that runs with for-next loops, accessing the assignment lists and returning medium and soft scores at the end depending on the evaluation, within the ConstraintFactory approach? In the manual I only see this presented as an either/or choice (TimeTableEasyScoreCalculator vs. TimeTableConstraintProvider in chapter 2 of the manual for the current OptaPlanner version 8.19.0). I am aware that the looping approach scales far more poorly than the streaming alternative, but it shall serve as a basis for getting into the more complex Constraint Streams score calculation later, with a working solution on hand for comparison.
Thanks in advance!

The easy and entirely unhelpful answer is that you cannot use for-style loops in constraint streams.
The Constraint Streams API is designed to give you incremental performance, and therefore you need to think of your constraints in a certain way. This way is not always easy to learn, and it requires practice. That said, we have not yet seen a constraint which we could not implement incrementally.
For example, groupBy is a very powerful construct which allows you to transform your data in pretty much any way you want. If you implement a custom constraint collector, you can solve even very complex problems incrementally.
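For illustration, here is a minimal sketch of that idea (the Lesson entity, its getTimeslot() getter and the limit of 5 are hypothetical). It incrementally maintains a count of lessons per timeslot and penalizes overcrowded timeslots:
private Constraint timeslotOvercrowded(ConstraintFactory constraintFactory) {
    // ConstraintCollectors.count() is maintained incrementally as moves are made.
    return constraintFactory.forEach(Lesson.class)
            .groupBy(Lesson::getTimeslot, ConstraintCollectors.count())
            // Only timeslots holding more than 5 lessons match.
            .filter((timeslot, count) -> count > 5)
            // Penalize proportionally to how far over the limit the timeslot is.
            .penalize("Timeslot overcrowded", HardSoftScore.ONE_SOFT,
                    (timeslot, count) -> count - 5);
}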
However, some users simply use groupBy() together with the toList() constraint collector, gather all their data in a single collection, and then penalize on that. I will not give an example of that, as it is an anti-pattern which leads to poor performance, and we generally discourage it.

Related

OptaPlanner - ConstraintStreams - Disabling constraints dynamically

I am using Java ConstraintStreams for constraint implementation and need to enable/disable the constraints dynamically.
Following the approach suggested in this answer to "Optaplanner : Add / remove constraints dynamically using zero constraint weight" seems to work as expected (the score differs from when the constraint is enabled), but when I put a log statement in the constraint implementation, the log was printed, which suggests that the constraint is actually being evaluated even if its weight is set to zero.
private Constraint minimizeCost(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(CloudProcess.class)
            .filter(process -> {
                System.out.println("Minimize computer cost");
                return process.getComputer() != null;
            })
            .penalizeConfigurable("computerCost",
                    process -> process.getComputer().getCost());
}
To disable the constraint I am using:
new CloudBalance().setConstraintConfiguration(new CloudBalanceConstraintConfiguration()
        .computerCost(HardMediumSoftScore.ZERO));
I have modified the CloudBalancing example to make it similar to what I am trying to implement.
So is there something that I am missing in terms of understanding/implementation?
Will the disabled constraint still execute regular stream operations like filter, but skip OptaPlanner-specific operations like ifExists, joins, etc.?
Is there any way to prevent this behavior? (Currently I am working with version 8.16.0.Final.)
You are correct that part of the constraint may still be executed. But a constraint weight of zero makes sure that the constraint will never match: it may have a cost in terms of performance, but it will not affect the score or the justifications.
There is nothing else you could do, other than generating a different ConstraintProvider every time. In the future, we may improve constraint disabling so that the constraints are disabled entirely.
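To sketch that workaround (the class name and static flag here are hypothetical): since OptaPlanner instantiates the ConstraintProvider from its class, the provider can read a flag at build time and simply not define the disabled constraints, so they are never evaluated at all.
public class DynamicConstraintProvider implements ConstraintProvider {
    // Hypothetical flag; set it before building the SolverFactory.
    public static volatile boolean computerCostEnabled = true;

    @Override
    public Constraint[] defineConstraints(ConstraintFactory constraintFactory) {
        List<Constraint> constraints = new ArrayList<>();
        if (computerCostEnabled) {
            constraints.add(minimizeCost(constraintFactory)); // only built when enabled
        }
        // ... always-on constraints ...
        return constraints.toArray(new Constraint[0]);
    }
}
Note that an already running Solver keeps the constraints it was built with; toggling the flag only affects Solvers created afterwards.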

Optaplanner - Multiple planning entities with same blockId - How to "move all" or "chain" or "shadow" to the same planning variable?

I am trying to assign timeslots to planning entities
(containing room, groups and persons, handled by constraint streams).
Some of these planning entities have a blockId.
When an entity has a blockId, the goal is for it to share a timeslot with the other entities that have the same blockId.
I defined a constraint for this, but I can see that the solver makes an extremely large number of unnecessary moves.
public Constraint groupBlockConstraint(ConstraintFactory constraintFactory) {
    return constraintFactory.forEachUniquePair(Lesson.class,
            Joiners.equal(Lesson::getSequenceGroup),
            Joiners.filtering((a, b) ->
                    !Lesson.withoutBlock(a, b)
                    && !Lesson.sameTimeslot(a, b)))
            .penalize("BlockSequence not in same timeslot", HardSoftScore.ofHard(15));
}
Is there a way to handle this more efficiently?
Constraints do not determine which moves the solver will try. Constraints are only used to score the solutions that result once moves have already been performed.
Therefore, if you're seeing moves which in your opinion should not be performed, you need to configure your move selectors. Using tabu search could, perhaps, also help here.
That said, without a more detailed question I cannot provide a less generic answer.
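For instance, here is a programmatic SolverConfig sketch that adds entity tabu to the local search phase (the solution and entity classes are hypothetical stand-ins for your model; tune the tabu size to your problem):
SolverConfig solverConfig = new SolverConfig()
        .withSolutionClass(Timetable.class)   // hypothetical solution class
        .withEntityClasses(Lesson.class)
        .withConstraintProviderClass(TimetableConstraintProvider.class);
LocalSearchAcceptorConfig acceptorConfig = new LocalSearchAcceptorConfig();
acceptorConfig.setEntityTabuSize(7); // recently moved entities become tabu
LocalSearchPhaseConfig localSearchConfig = new LocalSearchPhaseConfig();
localSearchConfig.setAcceptorConfig(acceptorConfig);
solverConfig.setPhaseConfigList(
        List.of(new ConstructionHeuristicPhaseConfig(), localSearchConfig));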

Optaplanner limit valueRangeProvider based on another entity

My planning problem is similar to employee rostering.
My planning entity looks like this:
public class Menu {
    @PlanningVariable(valueRangeProviderRefs = "productRange")
    private String productId = null;
    private String packId;
    private String date;
}
Now, I have a condition that if two packIds are "similar", then the productIds for those on the same date must also be "similar", where being similar is defined by some business logic.
I added a hard constraint for this, but the number of products is ~3000 and it takes forever to run through all combinations. Is there a way to restrict the value range provider to achieve this (so that it only iterates over the similar products)?
As per the OptaPlanner manual: the value range of an entity must be independent of the planning variable's state. Any such dependence must be handled through (hard) constraints.
That being said, there are often more efficient models to deal with the complexity you're describing. I've never seen anything below 10k instances that doesn't have an efficient way to be solved in 5 minutes or so. Typical scaling tricks include precalculation (valid combos, hashing, ...), rule/scoring efficiency, nearby selection, ... and multithreaded solving to top it off. It depends on the case and requires an in-depth review.
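To sketch the precalculation idea (the getters, the arePacksSimilar helper and the similarProducts map are all hypothetical): compute the "similar" relation once before solving, then do a cheap lookup inside the constraint instead of re-running the business logic for every pair.
private Constraint similarPacksNeedSimilarProducts(ConstraintFactory constraintFactory) {
    return constraintFactory.forEachUniquePair(Menu.class,
            Joiners.equal(Menu::getDate),
            // Hypothetical helper backed by a precomputed pack-similarity lookup.
            Joiners.filtering((a, b) -> arePacksSimilar(a.getPackId(), b.getPackId())))
            // similarProducts is a Map<String, Set<String>> filled once before solving.
            .filter((a, b) -> !similarProducts
                    .getOrDefault(a.getProductId(), Set.of())
                    .contains(b.getProductId()))
            .penalize("Similar packs must get similar products", HardSoftScore.ONE_HARD);
}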

Hibernate: Enforcing SQL uniqueness without constraints

I'm developing an application using Hibernate. One of the fields I insert must be unique in a table. The problem here is that the field is not the primary key and the underlying database does not support "UNIQUE" constraints. So I have to enforce this in my application code.
This pseudocode is what I have so far:
void insert(Data data) {
    beginTransaction();
    boolean exists = existsRecordWithName(data.name);
    // Line 7
    if (!exists) {
        insertRecord(data);
    } else {
        display("Name already exists in database!");
    }
    commit();
}
But if two different processes were to insert data at the same time and both reached line number 7, each would think there is no other record in the database with the same name and both would insert it -> the result is a duplicate.
So how could I enforce uniqueness this way? If I were using pure SQL I would try to lock the table but I'm looking for a higher level solution involving Hibernate standard features, so it would continue working if I someday change the backend.
Any help is appreciated!
You can't enforce unique constraints using application code. Constraints, strictly speaking, apply to all users. What you do in application code only applies to users who happen to use that application code. Constraints in application code don't apply, for example, to DBAs or developers that use a command-line tool or GUI utility to access the database.
Having said that, a SQL DBMS will usually support locks and transactions. If you can't enforce uniqueness by declaring a column unique, your next best bet is to explicitly lock the table and maybe wrap your changes in a serializable transaction. I think locking the table should be enough, but I'm not going to make bets on a system that doesn't support unique constraints.
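A hedged sketch of that second approach with plain JDBC underneath, assuming the backend honors SERIALIZABLE isolation (in Hibernate the isolation level can also be set globally via the hibernate.connection.isolation property); existsRecordWithName and insertRecord are hypothetical helpers standing in for your data access calls:
void insertUnique(DataSource dataSource, Data data) throws SQLException {
    for (int attempt = 0; attempt < 3; attempt++) {
        try (Connection conn = dataSource.getConnection()) {
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            conn.setAutoCommit(false);
            try {
                if (!existsRecordWithName(conn, data.name)) { // hypothetical helper
                    insertRecord(conn, data);                 // hypothetical helper
                }
                conn.commit();
                return;
            } catch (SQLException e) {
                // A serialization failure means another transaction raced us: retry.
                conn.rollback();
            }
        }
    }
}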
What dbms doesn't support unique constraints? I'm pretty sure I've never seen such a thing, and I started working with databases almost 30 years ago.
You can do locking using Hibernate too (LockMode), provided your database supports it, but locking with inserts is tricky business (how do you lock something that does not yet exist?). By the way, what happens if you set unique="true" for this property? Does Hibernate throw an error saying it cannot, or does it allow duplicates? I am assuming you have correctly implemented the equals and hashCode methods.
In your code outline above, you could intern() the name and put a synchronized block on the interned name around the code - that may work.
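A sketch of that idea (note this only serializes inserts within a single JVM process, so it does not protect against other applications writing to the same table):
void insert(Data data) {
    // String.intern() returns the same object for equal strings,
    // so equal names contend for the same monitor.
    synchronized (data.name.intern()) {
        beginTransaction();
        if (!existsRecordWithName(data.name)) {
            insertRecord(data);
        } else {
            display("Name already exists in database!");
        }
        commit();
    }
}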

Designing SQL database to represent OO class hierarchy

I'm in the process of converting a class hierarchy to be stored in an SQL database.
Original pseudo code:
abstract class Note
{
int id;
string message;
};
class TimeNote : public Note
{
time_t time;
};
class TimeRangeNote : public Note
{
time_t begin;
time_t end;
};
class EventNote : public Note
{
int event_id;
};
// More classes deriving from Note excluded.
Currently I have a couple of ideas for how to store this in a database.
A. Store all notes in a single wide table
The table would contain all information needed by all classes deriving from Note.
CREATE TABLE t_note(
id INTEGER PRIMARY KEY,
message TEXT,
time DATETIME,
begin DATETIME,
end DATETIME,
event_id INTEGER
);
Future classes deriving from Note need to add new columns to this table.
B. Map each class to a table
CREATE TABLE t_note(
id INTEGER PRIMARY KEY,
message TEXT
);
CREATE TABLE t_timenote(
note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
time DATETIME
);
CREATE TABLE t_timerangenote(
note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
begin DATETIME,
end DATETIME
);
CREATE TABLE t_eventnote(
note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
event_id INTEGER
);
Future classes deriving from Note need to create a new table.
C. Use database normalization and VARIANT/SQL_VARIANT
CREATE TABLE t_note(
id INTEGER PRIMARY KEY,
message TEXT
);
CREATE TABLE t_notedata(
note_id INTEGER REFERENCES t_note(id),
variable_id TEXT, -- or "variable_id INTEGER REFERENCES t_variable(id)".
-- where t_variable has information of each variable.
value VARIANT
);
Future classes deriving from Note need to add new variable_id.
D. Map each concrete class to a table (newly added based on current answers)
CREATE TABLE t_timenote(
id INTEGER PRIMARY KEY,
message TEXT,
time DATETIME
);
CREATE TABLE t_timerangenote(
id INTEGER PRIMARY KEY,
message TEXT,
begin DATETIME,
end DATETIME
);
CREATE TABLE t_eventnote(
id INTEGER PRIMARY KEY,
message TEXT,
event_id INTEGER
);
Future classes deriving from Note need to create a new table.
What would the most logical representation in SQL be?
Are there any better options?
In general I prefer option "B" (i.e. one table for the base class and one table for each "concrete" subclass).
Of course this has a couple of drawbacks: first of all you have to join at least 2 tables whenever you have to read a full instance of a subclass. Also, the "base" table will be constantly accessed by anyone who has to operate on any kind of note.
But this is usually acceptable unless you have extreme cases (billions of rows, very quick response times required and so on).
There is a third possible option: map each subclass to a distinct table. This helps partitioning your objects but costs more in development effort, in general.
See this for a complete discussion.
(Regarding your "C" solution, using VARIANT: I can't comment on the merits/demerits, because it looks like a proprietary solution - what is it ? Transact-SQL? and I am not familiar with it).
Your 'B' option as described is pretty much an implementation of the 'Object Subclass Hierarchy' (Kung, 1990, http://portal.acm.org/citation.cfm?id=79213).
As such, it's a well established and understood method. It works quite well. It's also extensible through multiple levels of inheritance, should you need it.
Of course you lose some of the benefits of encapsulation and information hiding if you don't restrict who can access the data through the DBMS interface.
You can however access it from multiple systems, and even languages, simultaneously (e.g. Java, C++, C#).
(This was the subject of my Masters dissertation :)
You've hit the 3 most commonly accepted ways of modeling objects in a relational database. All 3 are acceptable, and each has its own pros and cons. Unfortunately, that means there's no cut-and-dried "right" answer. I've implemented each of them at different times, and here are a couple of notes/caveats to keep in mind:
Option A has the drawback that, when you add a new subclass, you must modify an existing table (this may be less palatable to you than adding a new table). It also has the drawback that many columns will contain NULLs. However, modern DBs seem MUCH better at managing space than older DBs, so I've never been too worried about nulls. One benefit is that none of your search or retrieve operations will require JOINs or UNIONs, which means potentially better performance and simpler SQL.
Option B has the drawback that, if you add a new property to your superclass, you need to add a new column to each and every subclass's table. Also, if you want to do a heterogeneous search (all subclasses at once), you must do so using a UNION or JOIN (potentially slower performance and/or more complex sql).
Option C has the drawback that all retrieval operations (even for just one subclass) will involve a JOIN, as will most searches. Also, all inserts will involve multiple tables, making for somewhat more complex SQL, and will necessitate use of transactions. This option seems to be the most "pure" from a data-normalization standpoint, but I rarely use it because the JOIN-for-every-operation drawback usually makes one of the other options more palatable.
I'd gravitate towards option A myself.
It also depends a bit on your usage scenarios, for example will you need to do lots of searches across all types of notes? If yes, then you might be better off with option A.
You can always store them as option A (one big table) and create views for the different sub-notes if you so please. That way, you can still have a logical separation while retaining good searchability.
Generally speaking, but this might be close to a religious discussion so beware, I believe that a relational database should be a relational database and not try to mimic an OO structure. Let your classes do the OO stuff, let the db be relational. There are specific OO databases available if you want to extend this to your datastore. It does mean that you have to cross the 'Object-relational impedance mismatch' as they call it, but again there are ORM mappers for that specific purpose.
I would go for option A.
Solution B is good if the class hierarchy is very complex, with dozens of classes inheriting from each other. It's the most scalable solution. However, the drawback is that it makes the SQL more complex and slower.
For relatively simple cases, like 4 or 5 classes all inheriting the same base class, it makes more sense to choose solution A. The SQL would be more simple and faster. And the overhead of having additional columns with NULL values is negligible.
There's a series of patterns collectively known as "Crossing Chasms" I've used for many years. Don't let the references to Smalltalk throw you - it's applicable to any object oriented language. Try the following references:
A Pattern Language for Relational Databases and Smalltalk
Crossing Chasms - The Static Patterns
Crossing Chasms - The Architectural Patterns
Share and enjoy.
EDIT
Wayback Machine links to everything I've been able to find on the Crossing Chasms patterns:
http://web.archive.org/web/20040604122702/http://www.ksccary.com/article1.htm
http://web.archive.org/web/20040604123327/http://www.ksccary.com/article2.htm
http://web.archive.org/web/20040604010736/http://www.ksccary.com/article5.htm
http://web.archive.org/web/20030402004741/http://members.aol.com/kgb1001001/Chasms.htm
http://web.archive.org/web/20060922233842/http://people.engr.ncsu.edu/efg/591O/s98/lectures/persistent-patterns/chasms.pdf
http://web.archive.org/web/20081119235258/http://www.smalltalktraining.com/articles/crossingchasms.htm
http://web.archive.org/web/20081120000232/http://www.smalltalktraining.com/articles/staticpatterns.htm
I've created a Word document which integrates all the above into something resembling a coherent whole, but I don't have a server I can drop it on to make it publicly available. If someone can suggest a free document repository I'd be happy to put the doc up there.
I know that this question is old, but here is another option:
You can store a Note object, or a collection of Note objects, in any text-type table column as a JSON structure. You can serialize and deserialize the JSON using Newtonsoft's Json.NET. You will need to set the type name handling option to Object on the JsonSerializer.