OptaPlanner: Is the "constraint match" associated with a score just a semantic thing?

I have a question about the OptaPlanner constraint streams API. Are the constraint matches only used to calculate the total score and to help the user understand how the score comes about, or is this information also used to find a better solution?
By "used to find a better solution" I mean that the information influences which move(s) are selected next in the local search phase.
So does it matter which planning entity I penalize?
Currently, I am working on an examination scheduler. One requirement is to distribute the exams of a single student optimally.
The number of exams per student varies. Therefore, I wrote a cost function that gives a normalized value, indicating how well the student's exams are distributed.
Let's say the examination schedule in the picture has a cost of 80. Now I need to break this value down across the individual exams. There are two ways to do this:
Option A: Penalize each of the exams with 10 (10 * 8 = 80).
Option B: Penalize each exam according to its actual impact. In this case, only the exams in the last week are penalized, since the distribution of exams in week one and week two is fine.
Obviously, option B is semantically correct. But does the choice between the two options affect the solving process?
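For concreteness, here is roughly how the two options could look with constraint streams. This is only a sketch: Exam, getStudent(), getExamCount(), getDistributionCost() and getDistributionImpact() are made-up names standing in for my actual model.

```java
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;

// Inside the ConstraintProvider.
// Option A: every exam carries an equal share of the student's distribution cost
// (integer rounding ignored for the sketch).
Constraint optionA(ConstraintFactory factory) {
    return factory.forEach(Exam.class)
            .penalize(HardSoftScore.ONE_SOFT,
                    exam -> exam.getStudent().getDistributionCost()
                            / exam.getStudent().getExamCount())
            .asConstraint("Exam distribution (even share)");
}

// Option B: each exam is penalized by its own contribution to the bad spread,
// so well-placed exams contribute 0.
Constraint optionB(ConstraintFactory factory) {
    return factory.forEach(Exam.class)
            .penalize(HardSoftScore.ONE_SOFT, Exam::getDistributionImpact)
            .asConstraint("Exam distribution (actual impact)");
}
```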

The constraint matches are there to help explain the score to humans. They do not, in any way, affect how the solver moves or what solution you are going to get. In fact, ScoreManager has the capability to calculate constraint matches after the solver has already finished, or for a solution that's never even been through the solver before.
(Note: constraint matching does affect performance, though. It slows everything down, due to all the object iteration and creation.)
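For example, something along these lines works with recent OptaPlanner versions; ExamTimetable is just a placeholder for your own @PlanningSolution class, and the score type depends on your model:

```java
import org.optaplanner.core.api.score.ScoreManager;
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.solver.SolverFactory;

public class ScoreExplainer {

    // Explains the score of a solution, even one the solver never touched.
    public static void explain(ExamTimetable timetable) {
        SolverFactory<ExamTimetable> solverFactory =
                SolverFactory.createFromXmlResource("solverConfig.xml");
        ScoreManager<ExamTimetable, HardSoftScore> scoreManager =
                ScoreManager.create(solverFactory);
        // Prints the score broken down per constraint and per match.
        System.out.println(scoreManager.explainScore(timetable).getSummary());
    }
}
```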
To your second question: yes, it does matter which entity you penalize. In fact, you want to penalize every entity that breaks your constraints. Ideally, an entity that breaks the constraints more than another should also be penalized more; that way you avoid score traps.
EDIT based on an edit to the question:
In this case, since you want to achieve fairness per student, I suggest your constraint penalize not the exam but the student. Group the exams per student and apply some fairness ConstraintCollector. That way you can compute a per-student fairness function and use its value as your penalty.
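A rough sketch of that shape, assuming an Exam entity with a getStudent() accessor and your own fairnessCost(...) function (both are placeholder names, not actual API):

```java
import java.util.List;
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintCollectors;
import org.optaplanner.core.api.score.stream.ConstraintFactory;
import org.optaplanner.core.api.score.stream.ConstraintProvider;

public class ExamConstraintProvider implements ConstraintProvider {

    @Override
    public Constraint[] defineConstraints(ConstraintFactory factory) {
        return new Constraint[] { studentExamFairness(factory) };
    }

    // Group the exams per student and penalize the student once,
    // by however unfair that student's exam spread is.
    Constraint studentExamFairness(ConstraintFactory factory) {
        return factory.forEach(Exam.class)
                .groupBy(Exam::getStudent, ConstraintCollectors.toList())
                .penalize(HardSoftScore.ONE_SOFT,
                        (student, exams) -> fairnessCost(exams))
                .asConstraint("Student exam distribution");
    }

    private int fairnessCost(List<Exam> exams) {
        // Your normalized distribution cost for one student's exams goes here.
        return 0;
    }
}
```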
The OptaPlanner Tennis example shows one way of doing fairness. You may also be interested in a larger fairness discussion on the OptaPlanner blog.

Related

Prioritising scores in the VRP solution with OptaPlanner

I am using OptaPlanner to solve my VRP problem. I have several constraint providers, for example one to enforce the capacities and another to enforce the time window (TW) on the arrival time, both HARD. At the end of the optimisation it returns a route with a negative score, and when I analyse the ConstraintMatch I find that it is the product of a vehicle capacity constraint. However, in my problem there is no point in the vehicle arriving on time (meeting the TW constraint) if it cannot satisfy the customer's demands. That's why I need the constraints I have defined for the capacities (weight and volume) to have more weight/priority than the time window constraint.
Question: How can I configure the solver, or what should I consider, so that all the hard constraints apply, but some, like the capacity ones, carry more weight than others?
Always grateful for your suggestions and help
I am by no means an expert on OptaPlanner, but every constraint penalty (or reward) is divided into two parts if you use penalizeConfigurable(...) instead of penalize(...). Each constraint score is then evaluated as the ConstraintWeight that you declare in a ConstraintConfiguration, multiplied by the match weight, which is how you implement the deviation from the desired result. For example, the number of failed stops might be squared, turning the penalty quadratic instead of just linear.
ConstraintWeights can be reconfigured between runs to tweak the importance of a penalty, and setting one to zero disables it completely. The match weight is more of an implementation detail, in my view, that you tweak while you develop. At least that's how I see it.
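As a sketch of what those two parts can look like (Vehicle, excessWeightKg() and the configuration class name are invented for illustration; the annotations and penalizeConfigurable itself are OptaPlanner API):

```java
import org.optaplanner.core.api.domain.constraintweight.ConstraintConfiguration;
import org.optaplanner.core.api.domain.constraintweight.ConstraintWeight;
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;

// The ConstraintWeight part: tweakable per dataset, attached to the solution
// class via an @ConstraintConfigurationProvider field.
@ConstraintConfiguration
public class VrpConstraintConfiguration {

    @ConstraintWeight("Vehicle capacity")
    private HardSoftScore vehicleCapacity = HardSoftScore.ofHard(10); // heavier hard constraint

    @ConstraintWeight("Time window")
    private HardSoftScore timeWindow = HardSoftScore.ofHard(1);

    // getters and setters omitted
}
```

```java
// The match-weight part, inside the ConstraintProvider: how badly this
// particular vehicle violates the constraint.
Constraint vehicleCapacity(ConstraintFactory factory) {
    return factory.forEach(Vehicle.class)
            .filter(vehicle -> vehicle.excessWeightKg() > 0)
            .penalizeConfigurable(Vehicle::excessWeightKg)
            .asConstraint("Vehicle capacity"); // must match the @ConstraintWeight id
}
```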

Getting the optimal number of employees for a month (rostering)

Is it possible to get the optimal number of employees in a month for a given number of shifts?
I'll explain myself a little further taking the nurse rostering as an example.
Imagine that we don't know the number of nurses to plan in a given month with a fixed number of shifts. Also imagine that each new nurse you add to the schedule decreases your score, and that each nurse has a limited number of normal hours and a limited number of extra hours. Extra hours decrease the score more than normal ones.
So the problem consists of finding the optimal number of nurses needed and their schedules. I've come up with two possible solutions:
Fix the number of nurses clearly above the number needed and treat the problem as an overconstrained one, so that some nurses end up not assigned to any shifts.
Launch multiple instances of the same problem in parallel, with an incremental number of nurses per instance. This solution has the problem that you have to estimate, beforehand, an approximate range of nurses below and above the number needed.
Both solutions are a little inefficient; is there a better approach to tackle this problem?
I call option 2 doing simulations. Typically in simulations, they don't just play with the number of employees, but also with the constraint weights etc. It's useful for strategic "what if" decisions (What if we ... hire more people? ... focus more on service quality? ... focus more on financial gain? ...)
If you really just need to minimize the number of employees, and you can clearly weigh that against all the other hard and soft constraints (probably as a weight in between both, similar to overconstrained planning), then option 1 is good enough - and less CPU costly.
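A minimal sketch of how option 1 typically looks, assuming a Shift entity with a nullable Employee planning variable (Shift, Employee and the employeeRange id are placeholder names):

```java
import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.PlanningVariable;

@PlanningEntity
public class Shift {

    // nullable = true lets a shift stay unassigned (overconstrained planning).
    @PlanningVariable(valueRangeProviderRefs = "employeeRange", nullable = true)
    private Employee employee;

    public Employee getEmployee() { return employee; }
    public void setEmployee(Employee employee) { this.employee = employee; }
}
```

```java
// Inside the ConstraintProvider: penalize each employee that is used at all,
// on a level between hard and soft, so the solver prefers fewer employees
// without overriding the hard constraints.
Constraint minimizeEmployeesUsed(ConstraintFactory factory) {
    return factory.forEach(Shift.class)          // skips shifts left unassigned
            .groupBy(Shift::getEmployee)
            .penalize(HardMediumSoftScore.ONE_MEDIUM)
            .asConstraint("Minimize employees used");
}
```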

Discrete optimisation: large number of optimal solutions

TL;DR version: Is there a way to cope with optimisation problems where there exists a large number of optimal solutions (solutions that attain the best objective value)? That is, finding an optimal solution is pretty quick (though highly dependent on the size of the problem, obviously), but many such solutions exist, so the solver runs endlessly trying to find a better one (endlessly because it does find other feasible solutions, but with an objective value equal to the current best).
Not TL;DR version:
For a university project, I need to implement a scheduler that should output the schedule for every university programme per year of study. I'm provided some data and, for the purpose of this question, will simply stick to a general but not so rare example.
In many sections, you have mandatory courses and optional courses. Sometimes those optional courses are divided into modules and the student needs to choose one of them. Often they have to select two modules, but some combinations arise more often than others. Clearly, if you count the number of courses (mandatory + optional) without taking the subdivision into modules into account, you end up with more courses than time slots to schedule them in. My model is quite simple. I have constraints stating that every course should be scheduled to one and only one time slot (a period of 2 hours) and that a professor should not give two courses at the same time. Those are hard constraints. The thing is, in a perfect world, I should also add hard constraints stating that a student cannot have two courses at the same time. But because I don't have enough data and every combination of modules is possible, there is no point in creating one student per combination (mandatory + module 1 + module 2) and applying the hard constraints to each of these students, since it is basically identical to having one student (mandatory + all optionals) and trying to satisfy the hard constraints - which will fail.
This is why I decided to move those hard constraints into the optimisation objective. I simply define my objective function as minimising, for each student, the number of courses he/she takes that are scheduled simultaneously.
If I run this simple model with only one student (22 courses) and 20 time slots, I should get an objective value of 4 (since two time slots each hold two simultaneous courses). But, using Gurobi, the relaxed objective is 0 (since you can have fractions of courses inside a time slot). Therefore, when the solver does reach a solution of cost 4, it cannot prove optimality directly. The real trouble is that, for this simple case, there exists a huge number of optimal solutions (22! maybe...). To prove optimality, it will go through all the other solutions (which share the same objective), desperately trying to find one with a smaller gap between the relaxed objective (0) and the current one (4). Obviously, such a solution doesn't exist...
Do you have any idea how I could tackle this problem? I thought of analysing the existing database and trying to figure out which combinations of modules are very likely to happen, so that I can put the hard constraints back, but that seems hazardous (maybe I will select a combination that leads to a conflict, therefore finding no solution at all, or omit a valid combination). The current solution I use is putting a time threshold to stop the optimisation...
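For reference, the time-threshold workaround looks roughly like this with Gurobi's Java API (model building omitted; the package name changed in recent Gurobi releases). The Symmetry parameter is one knob that may help the solver prune interchangeable solutions, though it is not guaranteed to close the gap:

```java
import gurobi.GRB;
import gurobi.GRBEnv;
import gurobi.GRBException;
import gurobi.GRBModel;

public class TimetableSolver {
    public static void solve() throws GRBException {
        GRBEnv env = new GRBEnv();
        GRBModel model = new GRBModel(env);
        // ... build assignment variables, the hard constraints and the
        //     "simultaneous courses" objective here ...

        model.set(GRB.IntParam.Symmetry, 2);         // aggressive symmetry detection
        model.set(GRB.DoubleParam.TimeLimit, 300.0); // stop after 5 minutes regardless

        model.optimize();
        model.dispose();
        env.dispose();
    }
}
```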

Is it preferred to use end-time or duration for events in sql? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
My gut tells me that start time and end time would be better than start time and duration in general, but I'm wondering if there are some concrete advantages or disadvantages to the differing methods.
The advantage I see for start time and end time is that if you want to find all events active during a certain time period, you don't have to look outside that period.
(this is for events that are not likely to change much after initial input and are tied to a specific time, if that makes a difference)
I do not see it as a preference or a personal choice. Computer Science is, well, a science, and we are programming machinery, not a sensitive child.
Re-inventing the Wheel
Entire books have been written on the subject of Temporal Data in Relational Databases, by giants of the industry. Codd has passed on, but his colleague and co-author C J Date, and more recently H Darwen, carry on the work of progressing and refining the Relational Model in The Third Manifesto. The seminal book on the subject is Temporal Data & the Relational Model by C J Date, Hugh Darwen, and Nikos A Lorentzos.
There are many who post opinions and personal choices re CS subjects as if they were choosing ice cream. This is due to not having had any formal training, and thus treating their CS task as if they were the only person on the planet who had come across that problem, and found a solution. Basically they re-invent the wheel from scratch, as if there were no other wheels in existence. A lot of time and effort can be saved by reading technical material (that excludes Wikipedia and MS publications).
Buy a Modern Wheel
Temporal Data has been a problem that has been worked with by thousands of data modellers following the RM and trying to implement good solutions. Some of them are good and others not. But now we have the work of giants, seriously researched, and with solutions and prescribed treatment provided. As before, these will eventually be implemented in the SQL Standard. PostgreSQL already has a couple of the required functions (the authors are part of TTM).
Therefore we can take those solutions and prescriptions, which will be (a) future-proofed and (b) reliable (unlike the thousands of not-so-good Temporal databases that currently exist), rather than relying on either personal opinion, or popular votes on some web-site. Needless to say, the code will be much easier as well.
Inspect Before Purchase
If you do some googling, beware that there are also really bad "books" available. These are published under the banner of MS and Oracle, by PhDs who spend their lives at the ice cream parlour. Because they did not read and understand the textbooks, they have a shallow understanding of the problem, and invent quite incorrect "solutions". Then they proceed to provide massive solutions, not to Temporal data, but to the massive problems inherent in their "solutions". You will be locked into problems that have already been identified and solved, and into implementing triggers and all sorts of unnecessary code. Anything available free is worth exactly the price you paid for it.
Temporal Data
So I will try to simplify the Temporal problem, and paraphrase the guidance from the textbook, for the scope of your question. Simple rules, taking both Normalisation and Temporal requirements into account, as well as usage that you have not foreseen.
First and foremost, use the correct Datatype for any kind of Temporal column. That means DATETIME or SMALLDATETIME, depending on the resolution and range that you require. Where only the DATE or TIME portion is required, you can use that. This allows you to perform date & time arithmetic using SQL functions, directly in your WHERE clause.
Second, make sure that you use really clear names for the columns and variables.
There are three types of Temporal Data. It is all about categorising them properly, so that the treatment (planned and unplanned) is easy (which is why yours is a good question, and why I provide a full explanation). The advantage is much simpler SQL using inline Date/Time functions (you do not need the planned Temporal SQL functions). Always store:
Instant as SMALL/DATETIME, eg. UpdatedDtm
Interval as INTEGER, clearly identifying the Unit in the column name, eg. IntervalSec or NumDays
There are some technicians who argue that Interval should be stored in DATETIME, regardless of the component being used, as (eg) seconds or months since midnight 01 Jan 1900, etc. That is fine, but requires more unwieldy (not complex) code both in the initial storage and whenever it is extracted.
Whatever you choose, be consistent.
Period or Duration. This is defined as the time period between two separate Instants. Storage depends on whether the Period is conjunct or disjunct.
For conjunct Periods, as in your Event requirement: use one SMALL/DATETIME for EventDateTime; the end of the Period can be derived from the beginning of the Period of the next row, and EndDateTime should not be stored.
For disjunct Periods, with gaps in between: yes, you need 2 x SMALL/DATETIMEs, eg. a RentedFrom and a RentedTo, if they are in the same row.
A Period or Duration across rows merely needs the ending Instant to be stored in some other row. ExerciseStart is the Event.DateTime of the X1 Event row, and ExerciseEnd is the Event.DateTime of the X9 Event row.
Therefore Period or Duration stored as an Interval is simply incorrect, not subject to opinion.
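To make those three categories concrete, here is a minimal sketch; the table and column names are invented for the example, with types per the rules above.

```sql
-- Instant and Interval on the same row; the Interval's unit is in its name.
CREATE TABLE Event (
    EventId     INT      NOT NULL PRIMARY KEY,
    EventDtm    DATETIME NOT NULL,   -- Instant
    IntervalSec INT      NULL        -- Interval, stored as INTEGER seconds
);

-- Disjunct Period: gaps in between, so both Instants are stored in the row.
CREATE TABLE Rental (
    RentalId    INT      NOT NULL PRIMARY KEY,
    RentedFrom  DATETIME NOT NULL,
    RentedTo    DATETIME NOT NULL
);

-- Conjunct Period: store only the starting Instant; the end of each Period is
-- the EventDtm of the next row for the same parent key, so no EndDtm column.
```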
Data Duplication
Separately, in a Normalised database, ie. where EndDateTime is not stored (unless disjunct, as per above), storing a datum that can be derived will introduce an Update Anomaly where there was none.
With one EndDateTime, you have one version of the truth in one place; whereas with duplicated data, you have a second version of the fact in another column:
which breaks 1NF
the two facts need to be maintained (updated) together, transactionally, and are at risk of getting out of sync
different queries could yield different results, due to the two versions of the truth
All easily avoided by maintaining the science. The return (an insignificant increase in the speed of a single query) is not worth destroying the integrity of the data for.
Response to Comments
could you expand a little bit on the practical difference between conjunct and disjunct and the direct practical effect of these concepts on db design? (as I understand the difference, the exercise and temp-basal in my database are disjunct because they are distinct events separated by whitespace.. whereas basal itself would be conjunct because there's always a value)
Not quite. In your Db (as far as I understand it so far):
All the Events are Instants, not conjunct or disjunct Periods
The exceptions are Exercise and TempBasal, for which the ending Instant is stored, and therefore they have Periods, with whitespace between the Periods; thus they are disjunct.
I think you want to identify more Durations, such as ActiveInsulinPeriod and ActiveCarbPeriod, etc, but so far they only have an Event (Instant) that is causative.
I don't think you have any conjunct Periods (there may well be, but I am hard pressed to identify any). I retract what I said earlier: when they were Readings, they looked conjunct, but we have progressed.
For a simple example of conjunct Periods that we can work with re the practical effect, please refer to this time-series question. The text and perhaps the code may be of value, so I have linked the Q/A, but I particularly want you to look at the Data Model. Ignore the three implementation options; they are irrelevant to this context.
Every Period in that database is Conjunct. A Product is always in some Status. The End-DateTime of any Period is the Start-DateTime of the next row for the Product.
It entirely depends on what you want to do with the data. As you say, you can filter by end time if you store that. On the other hand, if you want to find "all events lasting more than an hour" then the duration would be most useful.
Of course, you could always store both if necessary.
The important thing is: do you know how you're going to want to use the data?
EDIT: Just to add a little more meat, depending on the database you're using, you may wish to consider using a view: store only (say) the start time and duration, but have a view which exposes the start time, duration and computed end time. If you need to query against all three columns (whether together or separately) you'll want to check what support your database has for indexing a view column. This has the benefits of convenience and clarity, but without the downside of data redundancy (having to keep the "spare" column in sync with the other two). On the other hand, it's more complicated and requires more support from your database.
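For example, something along these lines (SQL Server syntax shown; table and column names are hypothetical):

```sql
-- Store start and duration once; expose the computed end time through a view.
CREATE TABLE ScheduledEvent (
    EventId     INT      NOT NULL PRIMARY KEY,
    StartTime   DATETIME NOT NULL,
    DurationMin INT      NOT NULL
);

CREATE VIEW ScheduledEventWithEnd AS
SELECT EventId,
       StartTime,
       DurationMin,
       DATEADD(MINUTE, DurationMin, StartTime) AS EndTime
FROM ScheduledEvent;
```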
End - Start = Duration.
One could argue you could even use End and Duration, so there is really no difference between any of the combinations.
Except for the triviality that you need the column included to filter on it, so include:
duration: if you need to filter by the duration of execution time
start + end: if you need to trap events that both start and end within a timeframe

Is this a textbook design pattern, or did I invent something new?

I'm fresh out of designing a set of tables, in which I came up with an architecture that I was very pleased with! I've never seen it anywhere else before, so I'd love to know if I've just reinvented the wheel (most probable), or if this is a genuine innovation.
Here's the problem statement: I have Employees who can each sign a different contract with the company. Each employee can perform different Activities, and each activity may have a different pay rate, sometimes a fixed amount for completing one activity, sometimes an hourly rate, and sometimes at a tiered rate. There may also be a specific customer who likes the employee particularly, so when he works with that specific customer, he gets a higher rate. And if no rate is defined, he gets the company default rate.
Don't fuss about the details: the main point is that there are a lot of pay rates that can be defined, each in a fairly complicated way. And the pay rates all have the following in common:
Service Type
Pay Scale Type (Enum: Fixed Amount/Hourly Rate/Tiered Rate)
Fixed Amount (if PayScaleType = FA)
Hourly Rate (if PayScaleType = HR) - yes, could be merged into one field, but for reasons I won't go into here, I've kept them separate
Tiers (1->n relationship, with all the tiers and the amount to pay once you have gone over the tier threshold)
These pay rates apply to:
Default company rate
Employee rate
Employee override rate (defined per customer)
If I had to follow the simple brute force approach, I would have to create a PayRate and PayRateTier clone table for each of the 3 above tables, plus their corresponding Linq classes, plus logic to calculate the rates in 3 separate places, somehow refactoring to reuse the calculation logic. Ugh. That's like using copy and paste, just on the database.
So instead, what did I do? I created an intermediary table, which I called PayRatePackage, consisting only of an ID field. I have only one PayRate table, with a mandatory FK to PayRatePackage, and a PayRateTier table with a mandatory FK to PayRate. Then DefaultCompanyPayRate has a mandatory FK to PayRatePackage, as do EmployeeRate and EmployeeOverrideRate.
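In DDL terms, the shape is roughly this (types simplified, and only one of the three consumer tables shown; the real tables have more columns):

```sql
CREATE TABLE PayRatePackage (
    PayRatePackageId INT NOT NULL PRIMARY KEY
);

CREATE TABLE PayRate (
    PayRateId        INT           NOT NULL PRIMARY KEY,
    PayRatePackageId INT           NOT NULL REFERENCES PayRatePackage (PayRatePackageId),
    ServiceType      VARCHAR(50)   NOT NULL,
    PayScaleType     CHAR(2)       NOT NULL,  -- FA / HR / tiered
    FixedAmount      DECIMAL(12,2) NULL,      -- when PayScaleType = FA
    HourlyRate       DECIMAL(12,2) NULL       -- when PayScaleType = HR
);

CREATE TABLE PayRateTier (
    PayRateTierId INT           NOT NULL PRIMARY KEY,
    PayRateId     INT           NOT NULL REFERENCES PayRate (PayRateId),
    Threshold     DECIMAL(12,2) NOT NULL,
    Amount        DECIMAL(12,2) NOT NULL
);

-- Each consumer points at a package rather than owning clone rate tables.
CREATE TABLE EmployeeRate (
    EmployeeId       INT NOT NULL PRIMARY KEY,
    PayRatePackageId INT NOT NULL REFERENCES PayRatePackage (PayRatePackageId)
);
```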
So simple - and it works!
(Pardon me for not attaching diagrams; that would be a lot of effort to go to for a SO question where I've already solved the main problem. If a lot of people want to see a diagram, please say so in the comments, and I'll throw something together.)
Now, I'm pretty sure that something this simple and effective must be in a formal design pattern somewhere, and I'd love to know what it is. Or did I just invent something new? :)
I'm pretty sure this is the Strategy Pattern
"Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it."
Sounds like relational database design to me. You broke out specific logic into specific entities, and keyed them back to the original tables... Standard normalization...