OptaPlanner - Multiple planning entities with same blockId - How to "move all" or "chain" or "shadow" to the same planning variable?

I am trying to assign Timeslots to planning entities
(each containing a room, groups, and persons, which are handled by constraint streams).
Some of these planning entities have a blockId.
When an entity has a blockId, the goal is for it to share a timeslot with the other entities that have the same blockId.
I defined a constraint for this, but I can see that the solver makes a great many unnecessary moves.
public Constraint groupBlockConstraint(ConstraintFactory constraintFactory) {
    return constraintFactory.forEachUniquePair(Lesson.class,
                    Joiners.equal(Lesson::getSequenceGroup),
                    Joiners.filtering((a, b) ->
                            !Lesson.withoutBlock(a, b)
                                    && !Lesson.sameTimeslot(a, b)))
            .penalize("BlockSequence not in same timeslot", HardSoftScore.ofHard(15));
}
Is there a way to handle this more efficiently?

Constraints do not determine which moves the solver will try. Constraints are only used to score the solutions that result after moves have been performed.
Therefore, if you're seeing moves which in your opinion should not be performed, you need to configure your selectors. Using tabu search could, perhaps, also help here.
That said, without a more detailed question I cannot provide a less generic answer.
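To illustrate the distinction between scoring and moving, here is a minimal sketch in plain Python. It is not OptaPlanner API: the `Lesson` class, its `block_id` and `timeslot` fields, and both helper functions are hypothetical names chosen for illustration. The score function mirrors the constraint from the question, while the move reassigns every lesson of a block in one step, which is roughly the idea behind moving a group of entities together.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Lesson:
    name: str
    block_id: Optional[str]  # None means the lesson belongs to no block
    timeslot: int


def block_penalty(lessons, weight=15):
    """Score only: penalize each pair in the same block with different timeslots."""
    penalty = 0
    for a, b in combinations(lessons, 2):
        same_block = a.block_id is not None and a.block_id == b.block_id
        if same_block and a.timeslot != b.timeslot:
            penalty += weight
    return penalty


def move_block(lessons, block_id, new_timeslot):
    """Move: reassign every lesson of one block in a single step."""
    for lesson in lessons:
        if lesson.block_id == block_id:
            lesson.timeslot = new_timeslot


lessons = [
    Lesson("math-1", "B1", 1),
    Lesson("math-2", "B1", 2),
    Lesson("math-3", "B1", 3),
    Lesson("art", None, 1),
]
penalty_before = block_penalty(lessons)  # three mismatched pairs in block B1
move_block(lessons, "B1", 2)
penalty_after = block_penalty(lessons)
```

The scoring function never decides what to change; it only evaluates the result. In OptaPlanner the equivalent of `move_block` would come from the move selector configuration or a custom move, and which of those fits depends on the model.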

Related

insertLogical in Optaplanner VRP with tank volumes using Constraint Streams

I'm trying to convert a simple VRP project using drl to constraint streams, and I'm not sure how to replicate the functionality of "insertLogical".
The "demand" for each delivery is determined by the number of days from the previous delivery. (The demand is for liquid in tanks being consumed at a known rate.) In drl I'd pair the deliveries and insertLogical, and also insertLogical for the firstDelivery based on the current state.
Without insertLogical, I'm joining deliveries and using a groupBy for paired deliveries, but I can't see how to do an "Outer Join" to include the first delivery.
I also tried creating a continuous-planning-style "pre-schedule" delivery and then omitting those from the planning value range. That would mean the pairs always exist, but I end up with a kludgy mess for preventing a Customer from using a pre-schedule vehicle.
So, is there a way to "insertLogical" or to do an outer join in constraint streams?
No and no.
You could build a constraint collector; see UniConstraintCollector interface or its bi/tri/... variants. This allows you to implement any custom logic in your groups.
Or you could create a shadow variable that would keep track of the first delivery. (In fact, with the new planning list variable, that may be even easier.)
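The idea of deriving each delivery's demand from the previous delivery, with the first delivery falling back to the current state, can be sketched outside OptaPlanner as follows. This is plain Python, not constraint streams or shadow variable API. The names `demands`, `last_fill_day`, and `rate_per_day` are hypothetical, and the constant consumption rate is an assumption based on the question's description.

```python
def demands(delivery_days, last_fill_day, rate_per_day):
    """Return (day, demand) for each delivery of one customer.

    delivery_days: planned delivery days, in any order.
    last_fill_day: day the tank was last full (the "current state").
    rate_per_day:  liquid consumed per day (assumed constant).
    """
    result = []
    previous = last_fill_day  # stands in for the missing "previous delivery"
    for day in sorted(delivery_days):
        result.append((day, (day - previous) * rate_per_day))
        previous = day
    return result


plan = demands(delivery_days=[12, 5, 20], last_fill_day=0, rate_per_day=10)
```

Seeding `previous` with the current state plays the role of the "outer join" for the first delivery. A shadow variable or a custom constraint collector would maintain the same link incrementally instead of recomputing it.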

OptaPlanner: Mixing constraint and for-next based score calculation

I am implementing several restrictions within my CustomizedConstraintProvider class using the streaming API. Nevertheless there is one special case, where I currently do not see how to get this properly implemented within the streaming API.
If I got several methods ...
private Constraint Restriction1(ConstraintFactory constraintFactory) {
    return constraintFactory
            .forEach(Class.class)
            ...
            .penalize("Restriction1", HardMediumSoftBigDecimalScore.ONE_HARD);
}
private Constraint Restriction2(ConstraintFactory constraintFactory) {
    return constraintFactory
            .forEach(Class.class)
            ...
            .penalize("Restriction2", HardMediumSoftBigDecimalScore.ONE_SOFT);
}
private Constraint Restriction3(ConstraintFactory constraintFactory) {
    return constraintFactory
            .forEach(Class.class)
            ...
            .penalizeBigDecimal("Restriction3", HardMediumSoftBigDecimalScore.ONE_MEDIUM,
                    (a, b, c) -> BigDecimal.valueOf(Math.pow((b - c), a)));
}
how can I implement one particular method (let's say "Restriction4") that runs with for-next loops, accesses the assignment lists, and returns medium and soft scores at the end depending on the evaluation, within the ConstraintFactory approach? In the manual I only read this as an either-or approach (TimeTableEasyScoreCalculator vs. TimeTableConstraintProvider in chapter 2 of the manual for the current OptaPlanner version, 8.19.0). I am aware that the looping approach scales far more poorly than the streaming alternative, but it would serve as a basis for getting into the more complex Constraint Streams score calculation later, with a working solution on hand for comparison.
Thanks in advance!
The easy and entirely unhelpful answer is that you can not use for-style loops in constraint streams.
The Constraint Streams API is designed to give you incremental performance, and therefore you need to think of your constraints in a certain way. This way is not always easy to learn, and it requires practice. That said, we have not yet seen a constraint which we could not implement incrementally.
For example, groupBy is a very powerful construct which allows you to transform your data in pretty much any way you want. If you implement a custom constraint collector, you can solve even very complex problems incrementally.
However, some users simply use groupBy() together with the toList() constraint collector, gather all their data in a single collection, and then penalize on that. I will not give an example of that, as it is an anti-pattern which leads to poor performance, and we generally discourage it.
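To make "incremental" concrete, here is a minimal sketch in plain Python of a collector that is updated on each insert and can undo a single change without rescanning the rest. This is not the OptaPlanner API; `IncrementalSum` and its methods are hypothetical names, and the undo callback is only similar in spirit to what a custom constraint collector provides.

```python
class IncrementalSum:
    """Keeps a running total that is updated on insert and undone on retract."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def insert(self, value):
        self.total += value
        self.count += 1

        # Hand back an undo callback so this one change can be reverted cheaply.
        def undo():
            self.total -= value
            self.count -= 1

        return undo


collector = IncrementalSum()
undo_callbacks = [collector.insert(v) for v in (4, 7, 9)]
total_after_inserts = collector.total

undo_callbacks[1]()  # retract the value 7 without touching the others
total_after_retract = collector.total
```

Each change costs constant work here, whereas gathering everything into a list and re-evaluating it would cost work proportional to the list size on every move.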

Get number of attached constraints on a variable in MiniZinc

I have two sets of variables in my Minizinc program. Each variable from the first set necessarily has several constraints placed on it, but the variables in the second set are only implicitly constrained via their interactions with variables in the first set. This means that each of the variables in the second set may have anywhere from 0 to ~8 constraints placed on it, depending on the values taken by the variables in the first set.
I see that there is a way to reference the number of constraints placed on a variable at search time via the dom_w_deg search annotation, but I was wondering if there was any way to access this information at runtime? I want to do this because I would like to specify additional constraints related to the number of constraints already placed on the variables.
I realize this is a weird question, and I may be approaching this whole thing the wrong way, but I've been banging my head against this problem for a while now, so figured I'd ask.
As a general rule, I think that you are approaching your problem the wrong way. I can identify several misconceptions in the approach that lead to this:
- Different solver back-ends might do very different things with the model and how it is solved.
- "A constraint" is not a meaningful concept for the solver. A single constraint might be multiple propagators in the back-end solver, a single propagator, or even just part of a propagator covering several constraints (assuming that it is a propagator-based back-end).
- Constraint models have monotonic behavior, so you cannot, in a well-defined and meaningful way, change the model based on the number of constraints connected to a variable.
- Even when a constraint maps to a single propagator, it may still have very different propagation strength, meaning that it might run early or very late in the solving process.
Without knowing what you are actually trying to achieve, as a general technique you might be interested in using reification, where the truth of a constraint is reflected onto a binary Boolean variable. In general, it is good practice to have as little reification as possible, since it does not propagate much, but sometimes it is needed.
As a very simple example of using reification, this is a (probably not very good) model that tries to maximize the number of constraints satisfied.
set of int: Domain = 1..10;
var Domain: x;
var Domain: y;
var Domain: z;
array[1..3] of var bool: holds;
constraint holds[1] <-> x < y;
constraint holds[2] <-> y < z;
constraint holds[3] <-> z < x;
var int: goal;
constraint goal = sum(holds);
solve maximize goal;
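As a cross-check of the model above, the same objective can be evaluated by brute force. This is a minimal Python sketch, not a MiniZinc feature; the function name `satisfied` is hypothetical. Because x < y, y < z, and z < x form a cycle, all three can never hold at once, so the best achievable goal is 2.

```python
from itertools import product

domain = range(1, 11)  # same as Domain = 1..10 in the model


def satisfied(x, y, z):
    """Count how many of the three reified constraints hold."""
    return int(x < y) + int(y < z) + int(z < x)


# Enumerate all 1000 assignments and keep one with the highest count.
best_goal, best_assignment = max(
    (satisfied(x, y, z), (x, y, z)) for x, y, z in product(domain, repeat=3)
)
```

Each `int(...)` term plays the role of one `holds[i]` variable: a Boolean that reflects whether its constraint is true, which is then summed into the goal.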

How to implement a one-to-many relationship with an "Is Current" requirement

Designing a database, there's a relationship between two tables, Job and Document. One Job can have multiple Documents, but one (and only one) of these Documents needs to be flagged as IsCurrent. This is not always the most recent Document associated with that Job.
Structurally, I can see two ways of doing this.
The first is to add a DocumentId column to Job, and a JobId column to Document. This will work, but creates a circular reference: when imported into Entity Framework you end up with the peculiar situation that a Job has both a Document and a Documents collection. Likewise that Document has both a Job and a Jobs collection.
The second is to add an IsCurrent bit flag to the Document table. This will work, but leaves it logically possible for a Job to have multiple IsCurrent Documents, which is not allowed.
Questions:
1) Am I right in thinking there's no "third way" out of this dilemma?
2) Presuming not, which is better, and why? I favour the second solution as it seems much cleaner and we can enforce the single IsCurrent through the business logic. My colleague favours the former solution because it results in simpler C# code and object references - if we rename the foreign keys, it should avoid the confusion created by Job/Jobs.
If your back-end is SQL Server, you can create a filtered index to ensure that each job has at most one current document:
CREATE UNIQUE INDEX IX_Documents_Current
    ON Documents (JobId) WHERE IsCurrent = 1;
That way, it's not just enforced at the business level but is also enforced inside the database.
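The effect of such an index can be tried out quickly with SQLite, which supports a comparable partial unique index. This is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server; the column types and the `DocumentId` column are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Documents (
        DocumentId INTEGER PRIMARY KEY,
        JobId      INTEGER NOT NULL,
        IsCurrent  INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX IX_Documents_Current
        ON Documents (JobId) WHERE IsCurrent = 1;
    """
)

# Several non-current documents and one current document per job are fine.
conn.executemany(
    "INSERT INTO Documents (JobId, IsCurrent) VALUES (?, ?)",
    [(1, 0), (1, 0), (1, 1), (2, 1)],
)

# A second current document for job 1 violates the partial unique index.
try:
    conn.execute("INSERT INTO Documents (JobId, IsCurrent) VALUES (1, 1)")
    second_current_rejected = False
except sqlite3.IntegrityError:
    second_current_rejected = True

current_count_job1 = conn.execute(
    "SELECT COUNT(*) FROM Documents WHERE JobId = 1 AND IsCurrent = 1"
).fetchone()[0]
```

Note that the index guarantees at most one current document per job, not exactly one; a job with no current document is still allowed.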
Just for a third way (and for fun): instead of a bit, consider using an int that is set to max + 1 among the documents of the job.
Then create a unique index on {job FK, said int}.
You can:
- change the current document by updating the int,
- get the current document by searching for the max,
- rely on the unique index to prevent more than one current document, and
- create a new non-current document by using min - 1 for said int.
This is not the simplest approach to implement.
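A minimal sketch of this scheme, using Python's built-in `sqlite3` module, might look as follows. The table layout, the `CurrentRank` column name, and the three helper functions are hypothetical; the answer only describes the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Documents (
        DocumentId  INTEGER PRIMARY KEY,
        JobId       INTEGER NOT NULL,
        CurrentRank INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX IX_Documents_Job_Rank ON Documents (JobId, CurrentRank);
    """
)


def add_document(job_id, current):
    """Insert with max + 1 (becomes current) or min - 1 (stays non-current)."""
    lo, hi = conn.execute(
        "SELECT COALESCE(MIN(CurrentRank), 0), COALESCE(MAX(CurrentRank), 0) "
        "FROM Documents WHERE JobId = ?",
        (job_id,),
    ).fetchone()
    rank = hi + 1 if current else lo - 1
    cursor = conn.execute(
        "INSERT INTO Documents (JobId, CurrentRank) VALUES (?, ?)", (job_id, rank)
    )
    return cursor.lastrowid


def make_current(document_id):
    """Promote an existing document by giving it the job's max rank + 1."""
    conn.execute(
        "UPDATE Documents SET CurrentRank = "
        "(SELECT MAX(d.CurrentRank) + 1 FROM Documents d "
        " WHERE d.JobId = Documents.JobId) "
        "WHERE DocumentId = ?",
        (document_id,),
    )


def current_document(job_id):
    """The current document is the one holding the highest rank."""
    return conn.execute(
        "SELECT DocumentId FROM Documents WHERE JobId = ? "
        "ORDER BY CurrentRank DESC LIMIT 1",
        (job_id,),
    ).fetchone()[0]


first = add_document(1, current=True)
second = add_document(1, current=False)
third = add_document(1, current=True)
current_before = current_document(1)
make_current(second)
current_after = current_document(1)
```

Because the unique index forbids two documents of a job from sharing a rank, there can never be a tie for the maximum, so exactly one document is current as long as the job has any documents.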
Yes there is a third way out of this dilemma. You need a DBMS that supports SQL's CREATE ASSERTION (and supports it correctly, of course). With such a DBMS, you can declare any data rule that applies to your situation and your DBMS will enforce that rule for you.
Unfortunately, no such DBMS exists in the SQL world. Outside of the SQL world, there are such engines. ASSERTIONs being my hobbyhorse, I wrote one myself. If you're interested, a Google search should lead you to it quickly.

Optional Database Entities

ORIGINAL (see UPDATED QUESTION below)
I am designing a new database for a laboratory that runs a wide variety of tests on a wide variety of sample types.
The following list is my current candidate for the list of main entities to best model the laboratory work.
For each entity, a 1-to-many relationship exists from that entity to the entity below. In other words, every entity (except REQ) has at least columns for entity_id and parent_id.
Main Entities:
REQ: Request (a form)
SAM: Sample (the material)
TST: Test (requested procedures)
SUB: ** Sub-Test (part of standard test)
TRI: ** Trial (single instance: usually for mean, range, and stddev)
MEA: Measurement (a measured number)
** Not all tests have subtests, and not all tests have trials.
Sub-tests are a set of tests grouped together by a single name for easy referencing. For example, a lot acceptance test (LAT) for a particular product is defined as the following tests: viscosity, %-nitrogen, pH, and density.
A trial is a single experiment performed multiple times for product assurance. For example, fifty bullets might be shot, and each shot is a trial. The accuracy of each bullet might be required to fall within a certain range, and the average accuracy of all fifty bullets might be required to be in a tighter range.
Question: How should I model cases when sub-tests and/or trials are not needed?
Option 1: Use a "blank" sub-test (or trial) if not needed.
Option 2: Consider sub-tests and trials to be tests (and have a test_id as a parent), so that measurements always have a test as a parent.
Option 3: Optional parents for measurement (trial, sub-test, or test) and trials (sub-test or test).
Option x: Any other option worth considering.
FYI: If required to answer the question, I will be using Oracle.
UPDATED QUESTION
In general, my schema is a hierarchy of entities where each entity (except the top) must have ONE parent and (except the bottom) must have at least one child. What is the best way to handle cases where an internal entity is unnecessary in certain situations, and what are the benefits and drawbacks of each option?
Option 1 (Dummy): Use a "dummy" entry to indicate entity does not apply in this case.
Option 2 (Rollup): Roll-up optional entities into next higher parent entity.
Option 3 (Pick-a-Parent): Entity (C) below optional entity (B) with required entity (A) must have ONE parent but the parent can be either the optional entity (B) or the next higher one (A).
Option x: Any other option worth considering.
Addressing your simplified question:
Given a hierarchy as you've described, if I found that some levels in the hierarchy were optional, I would question whether a hierarchy really mapped well to my domain. I would consider drawing my relations differently, or redefining the entities in my schema.
I don't think a more detailed answer to the general question is possible in a short space like this, since figuring out the best representation of a domain is a) hard, and b) very specific to the particular domain.
Use Outer Joins. (RIGHT OUTER JOIN and LEFT OUTER JOIN).
They were made specifically for this.
Edit: This is my first post. Based on the comments, I'll be adding a second post.
Here's my take on an architectural first pass. This stuff generally requires a LOT of back-and-forth with the subject matter experts to get right.
"Test" means one of:
- Take an action, measure results
- Take several actions (subtests), measure results for each
- Make no tests whatsoever (yet you can still have measurements -- ?)
I'd configure this as a "parent" Test table and a child "SubTest" table, where Test can have 0 or more related SubTests, and every SubTest must be related with one and only one Test. (If a test has only one SubTest, enter it in its own table, don't try and track SubTests in the Test table.)
Trials can only exist if there are SubTests. Therefore, Trials are a child of the SubTest table; SubTests can have zero or more Trials, and Trials must be related with one and only one SubTest.
Measures only exist if there are Trials. Therefore, repeat the above, with Measures as a child of Trials.
Can there be SubTests without Trials (or Tests)? If so, then don't enter any Trials.
Can there be Measures without Trials? If no, you don't need any Trials (or SubTests). If yes (?), once again enter some properly labeled dummy/placeholder SubTests or Trials as necessary.
Again, this is rudimentary, and more interviews with the folks driving requirements are required.
As others have remarked it is hard for us to give a definitive answer without understanding more about your domain. You have attempted to distill a lot of business rules into a couple of paragraphs but some important information has been lost. Specifically, it is not possible to be sure whether two entities are genuinely distinct without knowing their attributes. Having said all which, let's have a go.
A TEST is a single procedure. Despite containing the word "test" a LAT is not a TEST in its own right but is rather a pre-defined set of such procedures. I would model this scenario as an entity TEST with an optional parent entity, which I would prefer to call TEST_GROUP (as that is what it is) but it is best to use the domain name, SUB_TEST.
A TRIAL appears to be distinct from a TEST, so model it as a separate entity. Therefore you have a choice when it comes to MEASUREMENT: you can have one entity with two optional foreign keys or you can have TEST_MEASUREMENT and TRIAL_MEASUREMENT. Choosing which road to go depends on the characteristics and usage profile.
The following is an initial stab at the entity relationships. This would be the point in the project when the user goes, "Oh no, that is not what I meant at all."
create table sample (
    sample_id number not null
    , constraint samp_pk primary key (sample_id)
)
/
create table sub_test (
    sub_test_id number not null
    , sample_id number not null
    , constraint subt_pk primary key (sub_test_id)
    , constraint subt_samp_fk foreign key (sample_id)
        references sample (sample_id)
)
/
create table test (
    test_id number not null
    , sample_id number not null
    , sub_test_id number
    , constraint tst_pk primary key (test_id)
    , constraint tst_samp_fk foreign key (sample_id)
        references sample (sample_id)
    , constraint tst_subt_fk foreign key (sub_test_id)
        references sub_test (sub_test_id)
)
/
create table trial (
    trial_id number not null
    , test_id number not null
    , constraint trl_pk primary key (trial_id)
    , constraint trl_tst_fk foreign key (test_id)
        references test (test_id)
)
/
create table measurement (
    measurement_id number not null
    , trial_id number
    , test_id number
    , constraint meas_pk primary key (measurement_id)
    , constraint meas_tst_fk foreign key (test_id)
        references test (test_id)
    , constraint meas_trl_fk foreign key (trial_id)
        references trial (trial_id)
    , constraint measurement_ck check (
        (test_id is not null and trial_id is null)
        or (test_id is null and trial_id is not null)
    )
)
/
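The behavior of the check constraint on measurement, which demands exactly one parent, can be tried out with a minimal sketch using Python's built-in `sqlite3` module. This is SQLite rather than Oracle, so INTEGER replaces NUMBER and the tables are trimmed to the columns needed here; the helper `try_insert` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE test (test_id INTEGER PRIMARY KEY);
    CREATE TABLE trial (
        trial_id INTEGER PRIMARY KEY,
        test_id  INTEGER NOT NULL REFERENCES test (test_id)
    );
    CREATE TABLE measurement (
        measurement_id INTEGER PRIMARY KEY,
        trial_id INTEGER REFERENCES trial (trial_id),
        test_id  INTEGER REFERENCES test (test_id),
        CONSTRAINT measurement_ck CHECK (
            (test_id IS NOT NULL AND trial_id IS NULL)
            OR (test_id IS NULL AND trial_id IS NOT NULL)
        )
    );
    INSERT INTO test (test_id) VALUES (1);
    INSERT INTO trial (trial_id, test_id) VALUES (10, 1);
    """
)


def try_insert(trial_id, test_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO measurement (trial_id, test_id) VALUES (?, ?)",
            (trial_id, test_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "test only": try_insert(None, 1),
    "trial only": try_insert(10, None),
    "both parents": try_insert(10, 1),
    "no parent": try_insert(None, None),
}
```

Rows with exactly one parent are accepted, while rows with both parents or with none are rejected by the database itself rather than by application code.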
Edit
Addressing your more generic question.
Option 1 (Dummy)
Never use a dummy record. It is like using a magic value instead of a null. The solution is worse than the problem it solves.
Option 2 (Rollup)
This can work when the parent and the child have the same attributes. But it is not a viable solution if they have different columns, or if they have different dependencies. Even if they have identical data structures but different business uses it may still be a problem.
Option 3 (Pick-a-Parent)
This would be my preferred solution. The snag is the need for a check constraint to ensure that one (and only one) of the eligible foreign keys has been populated. You also need to guard against allowing too many parents/grandparents/great-grandparents into the mix.
I am not entirely sure I understand the details of your question, but it sounds like you should have the following:
Table Test
test_id, request, sample, test
Table SubTest
subtest_id, test_id (foreign key to Test)
Table Trial
trial_id, trial_name, measurement, subtest_id
So, Test is a collection of subtests (possibly just one subtest), and a subtest is a collection of Trials (possibly just one trial)
I'm not entirely certain I understand your domain, but could you do something like this?
Tests has a parent_test_id column, which can be NULL (when set, this is a subtest).
Trials has a test_id column. (All tests have at least one trial, since you did a thing and had at least one measurement, right?)
Measurements has a trial_id column.
This does seem to violate your premise, since it stipulates that all tests have at least one trial, so it's possible I misunderstand the requirements. How can you have a test with no trials?
Anyway, if necessary, you could put both a trial_id and a test_id on Measurements, possibly with a constraint that one or the other must be NULL (and the other must be set).
I'll take a second stab at this one, based on the feedback from my first post. The key thing to understand is that design and architecture can be highly iterative, and I doubt you'll get the ideal model without a lot of back-and-forth -- something that doesn't play out too well on Stack Overflow. Odds are you'll take the ideas posted (APC has some good ones), bounce them around with the people you work with, and come up with something that'll work.
My goal these days when designing databases is to try and produce a fully normalized model. Once you've got that, if it doesn't seem reasonable or practical you can denormalize for efficiency, expediency, or whatever -- but the key thing is you denormalize after you've found the ideal model. If you stop normalization before you get to fully normalized, you haven't denormalized, you've just got a sloppy model.
Here's the entities I see to-date:
What you've labeled as the top-level test, for purposes of clarity here I'm going to call an Exam. You define an exam and all its contents (below), and people contact your laboratory to run these exams on their problems.
For any given exam performed for a customer, you run a bunch of Tests. Any given test may be used by (required by?) any number of exams.
Often, you get a set of Tests that are done together for more than one Exam. If there are properties that apply to the specific set of Tests, you might want to identify each set as its own entity. Call these TestGroups. However, if these are only used to associate a specific set of Tests with one or more Exams, you might not get any particular benefit out of defining them as their own entity. (These are your SubTests.)
So, an Exam "has" or "contains" one or more Tests. Alternatively, Exams are related with one or more TestGroups. However, trying to relate an Exam with zero or more TestGroups and zero or more individual Tests will produce an overly complex model (let alone physical implementation), and I'd really want to avoid that. Perhaps a TestGroup can contain a single Test, so Exams only reference TestGroups? Maybe an Exam can only be related to one TestGroup -- in which case that'd be the "many to many" table relating Exams with Tests. This depends on further discussion of requirements with the subject matter experts.
So you have Exams -- Exam definitions, really -- related somehow or other with multiple Tests. Next up, you have a "paid instance" of an Exam (customer X comes in and pays you to test his Widgets). Call this a CustomerExam; it has all the contact and billing info, identifies the Exam to be run, and thus is related to the Tests to be performed for the customer. (There's probably a Customer entity out there too...?)
Trials are performed for the Tests that are part of a CustomerExam. They don't relate with the Exam or the Test; they are an instance of the Trial being performed. (It seems safe to assume that the "meaning/definition" of a Trial would actually be part of a Test -- for example, if Test = Is gun accurate, then the work required by a Trial for that Test = fire gun 50 times and measure.) So Trials are performed for the Tests of a given CustomerExam. Are they performed once, or more than once? (Is a trial to fire the gun 50 times, or is each shot counted as a trial? What if they do two rounds of 50 shots?) Whatever, the attributes of the Trial event are stored here -- when it happened, who did it, special notes/circumstances, whatever.
Measures are produced by (or for?) Trials. The meaning/definition of each measure is actually part of the definition of a Trial (which is part of the definition of a Test); the event of the Trial produces specific values for the defined/anticipated Measures. The assumption is that a Trial will generate zero (?) or more Measures, so Measures are their own entity.
Looking back at this, it seems like there's some form of implicit double structure: a set of tables to define available Exams, Tests, Trials, and Measures (what can be examined, how can it be tested, what shall we measure) and a companion set of tables to track specific instances of each (who wanted it, who did the work, when did they do it, what were the results).
I've got to have way over-analyzed this problem. The key thing here is, as with all design sessions, in posing ideas and asking questions, did they generate your own ideas, questions, or answers?