We are trying to put together a proof-of-concept planning constraint solver using OptaPlanner. However, the construction phase seems slow even for a trivial set of constraints, i.e. assign each Task to one User with no overlapping Tasks for that User.
Problem overview:
We are assigning Tasks to Users
Each Task can be assigned to only one User
The Tasks can be variable length: 1-16 hours
Users can only do one Task at a time
Users have 8 hours per day
We are using the Time Grain pattern - 1 grain = 1 hour.
See constraints configuration below.
This works fine (returns in about 20 seconds) for a small number of Users and Tasks, e.g. 30 Users / 1000 Tasks, but when we start scaling up, performance drops off rapidly. Simply increasing the number of Users without increasing the number of Tasks (300 Users / 1000 Tasks) increases the solve time to 120 seconds.
But we hope to scale up to 300 Users / 10000 Tasks and incorporate much more elaborate constraints.
Is there a way to optimise the constraints/configuration?
Constraint constraint1 = constraintFactory.forEach(Task.class)
        .filter(st -> st.getUser() == null)
        .penalize("Assign Task", HardSoftLongScore.ONE_HARD);

Constraint constraint2 = constraintFactory.forEach(Task.class)
        .filter(st -> st.getStartDate() == null)
        .penalize("Assign Start Date", HardSoftLongScore.ONE_HARD);

Constraint constraint3 = constraintFactory
        .forEachUniquePair(Task.class,
                equal(Task::getUser),
                overlapping(st -> st.getStartDate().getId(),
                        st -> st.getStartDate().getId() + st.getDurationInHours()))
        .penalizeLong("Crew conflict", HardSoftLongScore.ONE_HARD,
                (st1, st2) -> {
                    // Penalize by the length of the overlap:
                    // min(end1, end2) - max(start1, start2)
                    int overlapStart = Math.max(st1.getStartDate().getId(),
                            st2.getStartDate().getId());
                    int overlapEnd = Math.min(st1.getStartDate().getId() + st1.getDurationInHours(),
                            st2.getStartDate().getId() + st2.getDurationInHours());
                    return overlapEnd - overlapStart;
                });
constraint1 and constraint2 seem redundant to me. The Construction Heuristic phase will initialize all planning variables (automatically, without being penalized for not doing so) and Local Search will never set a planning variable to null (unless you're optimizing an over-constrained problem).
You should be able to remove constraint1 and constraint2 without impact on the solution quality.
Other than that, it seems you have two planning variables (Task.user and Task.startDate). By default, in each CH step, both variables of a selected entity are initialized "together". That means OptaPlanner looks for the best initial pair of values for that entity in the Cartesian product of all users and all time grains. This scales poorly.
See the Scaling construction heuristics chapter to learn how to change that default behavior, and for other ways to make Construction Heuristic algorithms scale better.
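For instance, a queued entity placer can initialize the two variables one after the other instead of over their Cartesian product. A sketch of the solver config, adapted from that chapter (the variable names user and startDate are assumed from your question):

<constructionHeuristic>
  <queuedEntityPlacer>
    <entitySelector id="placerEntitySelector">
      <cacheType>PHASE</cacheType>
    </entitySelector>
    <!-- First pick the best user for the selected task... -->
    <changeMoveSelector>
      <entitySelector mimicSelectorRef="placerEntitySelector"/>
      <valueSelector variableName="user"/>
    </changeMoveSelector>
    <!-- ...then pick the best startDate, instead of trying every (user, startDate) pair. -->
    <changeMoveSelector>
      <entitySelector mimicSelectorRef="placerEntitySelector"/>
      <valueSelector variableName="startDate"/>
    </changeMoveSelector>
  </queuedEntityPlacer>
</constructionHeuristic>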
We have a use-case where we want to present the user with some human-readable message explaining why an "assignment" was rejected, based on the score of the constraints.
For example, in the CloudBalancing problem with 3 computers (Computer-1, 2, 3) and 1 process (Process-1), we ended up with the result below:
Computer-1 broke a hard constraint (requiredCpu)
Computer-2 lost due to a soft constraint (min cost)
Computer-3 assigned to Process-1 --> (Optimal solution)
We had implemented the BestSolutionChanged listener, where we used solution.explainScore() to get some info, and enabled DEBUG logging, which provided us with the OptaPlanner internal logs for intermediate moves and their scores. But the requirement is to provide some custom human-readable information on why all the non-optimal solutions (Computer-1, Computer-2) were rejected, even if they were infeasible (basically an explanation of the scores of these two solutions).
So we wanted to know: how can we achieve the above?
We did not want to rely on listening to the BestSolutionChanged event, as it might not get triggered for other solutions if the LS/CH phase starts with a solution which is already a "best solution" (Computer-3). Is this a valid assumption?
The DEBUG logs do provide us with the information, but building a custom message from this log does not seem like a good idea, so we were wondering if there is another listener/OptaPlanner concept which can be used to achieve this.
By "all the non-optimal solutions", do you mean instead a particular non-optimal solution? Search space can get very large very quickly, and OptaPlanner itself probably won't evaluate the majority of those solutions (simply because the search space is so large).
You are correct that BestSolutionChanged event will not fire again if the problem/solution given to the Solver is already the optimal solution (since by definition, there are no solutions better than it).
Of particular interest is ScoreManager, which allows you to calculate and explain the score of any problem/solution:
(Examples taken from https://www.optaplanner.org/docs/optaplanner/latest/score-calculation/score-calculation.html#usingScoreCalculationOutsideTheSolver)
To create it and get a ScoreExplanation, do:
ScoreManager<CloudBalance, HardSoftScore> scoreManager = ScoreManager.create(solverFactory);
ScoreExplanation<CloudBalance, HardSoftScore> scoreExplanation = scoreManager.explainScore(cloudBalance);
Where cloudBalance is the problem/solution you want to explain. With the score explanation you can:
Get the score:
HardSoftScore score = scoreExplanation.getScore();
Break down the score by constraint:
Collection<ConstraintMatchTotal<HardSoftScore>> constraintMatchTotals = scoreExplanation.getConstraintMatchTotalMap().values();
for (ConstraintMatchTotal<HardSoftScore> constraintMatchTotal : constraintMatchTotals) {
    String constraintName = constraintMatchTotal.getConstraintName();
    // The score impact of that constraint
    HardSoftScore totalScore = constraintMatchTotal.getScore();
    for (ConstraintMatch<HardSoftScore> constraintMatch : constraintMatchTotal.getConstraintMatchSet()) {
        List<Object> justificationList = constraintMatch.getJustificationList();
        HardSoftScore score = constraintMatch.getScore();
        ...
    }
}
and get the impact of individual entities and problem facts:
Map<Object, Indictment<HardSoftScore>> indictmentMap = scoreExplanation.getIndictmentMap();
for (CloudProcess process : cloudBalance.getProcessList()) {
    Indictment<HardSoftScore> indictment = indictmentMap.get(process);
    if (indictment == null) {
        continue;
    }
    // The score impact of that planning entity
    HardSoftScore totalScore = indictment.getScore();
    for (ConstraintMatch<HardSoftScore> constraintMatch : indictment.getConstraintMatchSet()) {
        String constraintName = constraintMatch.getConstraintName();
        HardSoftScore score = constraintMatch.getScore();
        ...
    }
}
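To produce the human-readable messages from your question, you could apply each candidate assignment to the solution yourself and explain it out-of-band. A minimal sketch under these assumptions: process is the Process-1 instance, and scoreManager/cloudBalance are the ones created above:

for (CloudComputer computer : cloudBalance.getComputerList()) {
    process.setComputer(computer); // try the candidate assignment
    ScoreExplanation<CloudBalance, HardSoftScore> explanation = scoreManager.explainScore(cloudBalance);
    String verdict = explanation.getScore().isFeasible()
            ? "lost on soft score" : "broke a hard constraint";
    System.out.println(computer + ": " + explanation.getScore() + " (" + verdict + ")");
    // Drill into explanation.getConstraintMatchTotalMap() for the exact constraint names.
}
// Afterwards, restore the assignment the solver actually chose.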
I am saving the best solution to the DB, and we display it on a web page. I am looking for a way for a user to add more visits without changing the already published trips.
I have checked the documentation and found that ProblemFactChange can be used, but only while the solver is running.
In my case the solver has already terminated and the solution is already published. Now I want to add more visits to the vehicle without modifying its existing visits. Is this possible with OptaPlanner? If yes, any example or documentation would be very helpful.
You can use the @PlanningPin annotation to avoid unwanted changes.
Optaplanner - Pinned planning entities
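A minimal sketch of pinning (the Visit entity and the field name pinned are assumptions based on your description):

@PlanningEntity
public class Visit {

    // When true, the solver treats this visit as immovable and will not
    // change its planning variables.
    @PlanningPin
    private boolean pinned;

    // ... planning variables, getters and setters ...
}

Load the already published visits with pinned = true and the new visits with pinned = false, then run the solver again.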
If you're not looking for pinning (see Ismail's excellent answer), take a look at the OptaPlanner School Timetabling example, which allows adding lessons between solver runs. The lessons simply get stored in the database and then get loaded when the solver starts.
The difficulty with VRP is the chained model complexity (we're working on an alternative): if you add a visit X between A and B, then make sure that afterwards A.next = X, B.previous = X, X.previous = A, X.next = B and X.vehicle = A.vehicle. Not to mention the arrival times, etc.
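In code, the rewiring could look roughly like this (a hypothetical sketch; the setter names simply mirror the pointers above, and outside the solver you have to maintain the shadow values yourself):

// Insert visit X between A and B in the chained model.
x.setPrevious(a);
x.setNext(b);
x.setVehicle(a.getVehicle());
a.setNext(x);
b.setPrevious(x);
// The arrival times of X, B and every visit after B must then be recomputed.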
My suggestion would be to re-solve what is left after the changes have been introduced. Let's say you have visited half of your destinations (A -> B -> C) but not yet (C -> D -> E) when two new possible destinations (D' and E') are introduced. Wouldn't this be the same as starting in C and trying to plan for D, D', E and E'? The solution needs to be updated with the progress, though, so that the remainder plus the changes can be the input to the next solve.
Just my two cents.
My friends,
In the past couple of years I have read a lot about AI with JS and some libraries like TensorFlow. I have great interest in the subject but have never used it on a serious project. However, after struggling a lot with linear regression to solve an optimization problem I have, I think that I will finally get much better results, with greater performance, using AI. I have worked for 12 years with web development and lots of server side, but have never worked with any AI library, so please have a little patience with me if I say something stupid!
My problem is this: for every user that visits our platform (website), we save the hour, the day of the week, whether the device requesting the page was a smartphone or computer, and so on, of the FIRST access the user made. If the user keeps visiting other pages, we don't care; we only save the data of the FIRST visit. And if the user at any time does something that we consider a conversion, we assign that conversion to the record of the first access that user made. So we have almost 3 million rows like this:
SESSION  HOUR  DAY_WEEK  DEVICE      CONVERSION
9847     7     MONDAY    SMARTPHONE  NO
2233     13    TUESDAY   COMPUTER    YES
5543     19    SUNDAY    COMPUTER    YES
3721     8     FRIDAY    SMARTPHONE  NO
1849     12    SUNDAY    COMPUTER    NO
6382     0     MONDAY    SMARTPHONE  YES
What we would like to do is this: the next time a user visits our platform, we want to know the probability of that user making a conversion. If a user accesses our website now, then depending on their device, day of the week, hour and so on, we want to know the probability of that user making a future conversion. With that, we can show very specific messages to the user while they are using our platform, and a different price model according to that probability.
CURRENTLY we are using a linear regression, and it predicts whether the user will make a conversion with an accuracy of around 30%. That's pretty low, but so far it's the best we've got, and this linear regression generates an almost 18% increase in conversions when we use it to show specific messages/prices to a specific user, compared to when we don't use it. SO, with 30% accuracy our linear regression already provides 18% better conversions (and with that, higher revenues and so on).
In case you are curious, our linear regression model works like this: we generate a linear equation for every first user access on our system, with coefficients that our system tries to find in order to minimize the squared error (expected value - predicted value)^2. Using the data above, our model would generate the equations below (SUNDAY = 0, MONDAY = 1, ... COMPUTER = 0, SMARTPHONE = 1, ... CONVERSION YES = 1 and NO = 0):
A*7 + B*1 + C*1 = 0
A*13 + B*2 + C*0 = 1
A*19 + B*0 + C*0 = 1
A*8 + B*5 + C*1 = 0
A*12 + B*0 + C*1 = 0
A*0 + B*1 + C*0 = 1
So, our system finds the best A, B and C that minimize the error. How can we do that with AI? If possible, it would be nice if we could use TensorFlow or anything with JS! I know there are several AI models, and I have no idea which one would best fit what we need!
In my OptaPlanner project I have periods with a fixed duration.
For some of them there is a medium constraint that they should be scheduled in a row, occupying, for example, 5 directly adjacent timeslots.
I want to use Java Constraint Streams but don't manage to define this constraint using the timeslot pattern.
I know that this constraint can be defined using the time-grain pattern, as suggested by https://stackoverflow.com/a/30702865. I have done this and it works. But I want to compare the timeslot pattern to the time-grain pattern, because they behave differently when it comes to escaping local maxima. The problem with the time-grain pattern is that those 5 periods could also be scheduled in every possible partition of 5 (e.g. as 2 + 2 + 1).
Does anyone have a hint on how to define the constraint using the timeslot pattern?
You may want to try using the newly added ifExists() building block. Without knowing your actual domain model, I imagine the constraint to look like this:
private Constraint twoConsecutivePeriods(ConstraintFactory constraintFactory) {
    return constraintFactory.from(Period.class)
            .ifExists(Period.class, equal(Period::getDay, period -> period.getDay() + 1))
            .penalize("2 consecutive periods", period -> ...);
}
Consequently, ifNotExists() may be used to achieve the opposite. We have examples of both in the Traveling Tournament OptaPlanner example.
Please note that this API is only available from OptaPlanner 7.33.0.Final onward.
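For illustration, an ifNotExists() variant could penalize a period of a course that has no period of the same course in the directly following timeslot, which pushes periods to form a row. A hedged sketch (Period::getCourse and Period::getTimeslot are assumed accessors, and the constraint weight is a placeholder):

private Constraint periodWithoutAdjacentPeriod(ConstraintFactory constraintFactory) {
    return constraintFactory.from(Period.class)
            .ifNotExists(Period.class,
                    // same course, scheduled in the directly following timeslot
                    equal(Period::getCourse),
                    equal(period -> period.getTimeslot() + 1, Period::getTimeslot))
            .penalize("Period without adjacent period", HardSoftScore.ONE_SOFT);
}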
I solved the problem using ConstraintCollectors.toList() as follows:
factory.from(Period.class)
        .groupBy(Period::getCourse, ConstraintCollectors.toList())
        .penalize(id, score, (course, list) -> dayDistributionPenalize(course, list));
and
public int dayDistributionPenalize(Course course, List<Period> list) {
    var penalize = 0;
    var dayCodes = dayCodesFromPeriodList(list);
    for (var dayCode : dayCodes) {
        if (!getAllowedDayCodes(course).contains(dayCode)) {
            penalize++;
        }
    }
    return penalize;
}
Where getAllowedDayCodes() returns, for example, 0111110000, 0230000000 or 0005000000 etc. from a hash map if a course has to have 5 consecutive periods.
I want to match my user to a different user in his/her community every day. Currently, I use code like this:
@matched_user = User.near(@user).order("RANDOM()").first
But I want to have a different @matched_user on a daily basis. I haven't been able to find anything on Stack or in the APIs that has given me insight into how to do it. I feel it should be simpler than having to resort to a rake task with cron. (I'm on Postgres.)
Whenever I find myself hankering for shared 'memory' or transient state, I think to myself "this is what (distributed) caches were invented for".
@matched_user = Rails.cache.fetch(@user.cache_key + '/daily_match', expires_in: 1.day) {
  User.near(@user).order("RANDOM()").first
}
NOTE: While specifying a TTL for a cache entry tells Rails/the cache system to try to keep that value for the given timeframe, there's NO guarantee that it will. In particular, a cache that aggressively tries to reclaim memory may expire an entry well before its desired expires_in time.
For this particular use case it shouldn't be a big deal, but in cases where the business/domain logic demands periodically generated values that are durable, you really have to factor that into your database.
How about using PostgreSQL's SETSEED function? I used the date as the seed so that the seed changes every day but stays consistent within a day:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
You may want to seed a random value after using this so that any future calls to random aren't biased:
random = User.connection.execute("SELECT RANDOM()").to_a.first["random"]
# Same code as above:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
# Use random value before seed to make new seed:
User.connection.execute "SELECT SETSEED(#{random})"
I have split these steps into different sections just for readability; you can optimise the query later.
1) Find all user records created before today morning, so that the count stays frozen for the day:
users_till_today_morning = User.where("created_at < ?", DateTime.now.in_time_zone(Time.zone).beginning_of_day)
2) Pluck all IDs:
user_ids = users_till_today_morning.pluck(:id)
3) Today's day of the month will be in the range (1..31) but will remain constant throughout the day:
day_today = Time.now.day
4) Select the same ID for the day
todays_user_id = user_ids[day_today % user_ids.count]
@matched_user = User.find(todays_user_id)
So it will give you a pseudo-random user record while keeping the same record throughout the day!