VRP with parallel relations in OptaPlanner

We are trying to solve a VRP with OptaPlanner where it is important that two (or more) customers are served at the same time.
This means, for example, that if customer #1 is supplied at 10 o'clock, then customer #2 must also be supplied at 10 o'clock.
It is not allowed to deliver to one customer and leave the other unscheduled.
Such constellations occur for roughly 50% of the 1000 customers in total.
It is not sufficient to apply the "delay till last" pattern.
All other conditions remain the same as in the VRP example.
How can we proceed to solve this problem with OptaPlanner?
Are there any examples of such constellations?

In the docs, take a look at the Design Patterns chapter, specifically the "auto delay until last" pattern.
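On top of that pattern, a hard constraint can keep linked customers synchronized. The following is only a minimal sketch, not code from the examples: it assumes a hypothetical serviceGroupId field linking customers that must be served together, an arrivalTime shadow variable on Customer, and the Constraint Streams API of newer OptaPlanner versions (not the score rules of the 6.x releases this thread refers to).

import java.util.Objects;

import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;
import org.optaplanner.core.api.score.stream.ConstraintProvider;
import org.optaplanner.core.api.score.stream.Joiners;

public class ParallelServiceConstraintProvider implements ConstraintProvider {

    @Override
    public Constraint[] defineConstraints(ConstraintFactory factory) {
        return new Constraint[] { linkedCustomersServedTogether(factory) };
    }

    // Hard-penalize every pair of customers in the same (hypothetical) service group
    // whose arrival times differ, so linked customers cannot end up with different delivery times.
    Constraint linkedCustomersServedTogether(ConstraintFactory factory) {
        return factory.forEach(Customer.class)
                .filter(customer -> customer.getServiceGroupId() != null)
                .join(Customer.class,
                        Joiners.equal(Customer::getServiceGroupId),
                        Joiners.lessThan(Customer::getId))   // count each pair only once
                .filter((a, b) -> !Objects.equals(a.getArrivalTime(), b.getArrivalTime()))
                .penalize(HardSoftScore.ONE_HARD)
                .asConstraint("Linked customers must be served at the same time");
    }
}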


OptaPlanner inventory allocation

I am trying to work with OptaPlanner to solve the following simple example:
https://www.gams.com/products/gams/gams-language/#the-gams-language-at-a-glance/
simple inventory allocation example
I am finding it very difficult to do so, since I couldn't find a similar case for OptaPlanner.
Can anyone help? Any directions or resources?
Thanks in advance.
It looks like the Facility Location Problem in OptaPlanner Quickstarts, except that here a consumer can pull from two or more suppliers at the same time.
Because the number of units to ship from supplier A to consumer X is an integer (*), you would need to use continuous value ranges and custom moves like in the investment example. Not simple (unlike other use cases in OptaPlanner), but doable.
(*) These business requirements are maybe unrealistic: in the real world, a half-full truck is almost as expensive as a full truck.
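For the quantity variable itself, something along these lines could work. This is only a sketch with made-up names (ShipmentLine, unitsShipped, Supplier.getCapacity()), assuming a bounded, non-enumerated integer value range in the style of the investment example, not actual Quickstarts code:

import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.valuerange.CountableValueRange;
import org.optaplanner.core.api.domain.valuerange.ValueRangeFactory;
import org.optaplanner.core.api.domain.valuerange.ValueRangeProvider;
import org.optaplanner.core.api.domain.variable.PlanningVariable;

// Hypothetical planning entity: how many units a given supplier ships to a given consumer.
@PlanningEntity
public class ShipmentLine {

    private Supplier supplier;   // problem facts, assumed to exist elsewhere
    private Consumer consumer;

    // The planning variable is an integer quantity rather than a single "assigned facility".
    @PlanningVariable(valueRangeProviderRefs = "unitsRange")
    private Integer unitsShipped;

    // Entity-specific value range: 0 .. supplier capacity (upper bound is exclusive).
    @ValueRangeProvider(id = "unitsRange")
    public CountableValueRange<Integer> getUnitsRange() {
        return ValueRangeFactory.createIntValueRange(0, supplier.getCapacity() + 1);
    }

    // getters/setters omitted for brevity
}

Custom moves that shift units between two ShipmentLines of the same consumer (as the answer suggests) would then let the solver explore that range without breaking the consumer's total demand.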

Can PROC SQL embedded in SAS macros dynamically merge two data sets, simulating residential treatment placement decisions for troubled youth?

Good afternoon and happy Friday, folks
I'm trying to automate a placement simulation of youth into residential treatment where they will have the highest likelihood of success. Success is operationalized as "not recidivating" within 3 years of entering treatment. Equations predicting recidivism have been generated for each location, and the equations have been applied to each individual in the scenario (based on youth characteristics like risk, age, LOS, etc.). Each youth has predicted success rates for every location, which throws in a wrench: youth are not qualified for all of the treatment facilities for which they have predicted success rates. Indeed, treatment locations have differing, yet overlapping, qualifications.
Let's take a made-up example. Johnny (ID # 5, below) is a 15-year-old boy with drug charges. He could have "predicted success rates" of 91% for location A, 88% for location B, 50% for location C, and 75% for location D. Johnny is most likely to be successful (i.e., not recidivate within three years of entering treatment) if he is treated at location A; unfortunately, location A only accepts youth who are 17 years old or older, so Johnny would not qualify for treatment there. For Johnny, location B is the next best location. Let us assume that Johnny is qualified for location B, but that all of location B's beds are filled; so we must now look to location D, as it is now Johnny's "best available" option at 75%.
The score so far: we are matching youth to available beds in locations for which they qualify and where they might enjoy the greatest likelihood of success. Unfortunately, each location only has a certain number of available beds, and the number of available beds differs across locations. The qualifications for entry into the treatment facilities differ, yet overlap (e.g., 12-17 year-olds vs. 14-20 year-olds).
In order to simulate what placement decisions might look like based on success rates, I went through the scenario described above for over 400 youth, by hand, in Excel. It took me about a week. I'd like to use PROC SQL embedded in a SAS macro to automate these placement scenarios with the ultimate goals of a) obtaining the ability to bootstrap iterations in order to examine effect sizes across distributions, b) saving time, and c) preventing further brain damage from banging my head against desk and wall in frustration whilst doing this by hand. Whilst never having had the necessity -- nay, the privilege -- of using SQL in my typical role as a researcher, I believe that this time has now come to pass and I'm excited about it! Honestly. I believe it has the capacity I'm looking for. Unfortunately, it is beating the devil out of me!
Here’s what I’ve got cookin’ so far: I want to create and automate the placement simulation with the clever use of merging/joining/switching/or something like that.
I have two datasets (tables). The first dataset contains all of the youth information (one row per youth; several columns with demographics and location ranks, which correspond to the predicted success rates). The order of rows in the youth dataset was/will be randomly generated (to simulate the randomness with which youth enter the system and are subsequently placed into treatment). Note that I will be "cleaning" the youth dataset prior to merging such that rank-column cells will only be populated for programs for which a respective youth qualifies. This should take the "does the youth even qualify for the program" problem out of the equation.
However, it still leaves the issue of availability to be contended with in the scenario.
The second dataset contains the treatment facility beds, with each row corresponding to an available bed in one of the treatment locations; two columns contain bed numbers and location names. Each bed (row) has only one location cell populated, but each location appears in several rows.
Thus, in descending order, I want to merge each youth row with the available bed that represents his/her best chance of success, and so the merge/join/switch/thing should take place
on youth.Rank1 = distinct TF.Location,
and if youth.Rank1 ≠ TF.Location then
merge on youth.Rank2 = TF.Location,
and if youth.Rank2 ≠ TF.Location then merge on
youth.Rank3 = TF.Location, etc.
Put plainly: "Merge on rank1 unless the rank1 location is no longer available, then merge on rank2, unless the rank2 location is no longer available, and on down the line, until all options are exhausted and foster care (i.e., alternative services) is the only option."
I’ve had no success getting this to work. I haven’t even been successful getting the union function to work. About the only successful thing I’ve done in SQL so far is create a view of a single dataset. It’s pretty sad. I’ve been following this guidance, but I get hung up around the “where” command:
proc sql;             /* Calls the SQL procedure */
create table x as     /* Tells SAS to create a table called x */
select                /* Specifies the column(s) to be selected */
from                  /* Specifies the table(s) (data sets) to be queried */
where                 /* Subsets the data based on a condition */
group by              /* Classifies the data into groups based on the specified column(s) */
order by              /* Sorts the resulting rows (observations) by the specified column(s) */
;
quit;                 /* Ends the PROC SQL procedure */
Frankly, I'm stuck and I could use some advice. The greenhorn in me is in way over his head.
I appreciate any help or guidance anyone might lend.
Cheers!
P
The process you describe (and to be honest I skipped to the end, so I might have missed something) does not lend itself to SQL, because each step could affect the results of the next one. However, you want to get the best results for the most kids. (I think a lot of that text was to convince us how important it is to help out.) You don't actually give us anything we can really use to help, since you don't give any details of your data model, your data, or expected results; there really is no way to answer this question. But I don't care -- I'm going to go forward with some suggestions, because it is a Friday and I've never done a stream-of-consciousness answer to a stream-of-consciousness question before. I suggest you don't formulate your solution purely in SQL, but instead use a higher-level program and engage in a process like the one described below -- because this is a DB question, I've noted the places where the DB might be involved. (A rough sketch of such a loop in a general-purpose language follows the steps.)
1. Generate a list of kids (this can be in a table called NEEDY-KID).
2. Have a list of locations to assign (this can also be a table, LOCATION).
3. Run your matching for best fit from kid to location -- at this point don't worry about assigning more than one kid to a location; there can be duplicates (put this in a table called KID2LOC using a query).
4. Check KID2LOC for locations assigned twice -- use some method to remove the duplicates so each location is only assigned once (remove them from KID2LOC using a query).
5. Prune the LOCATION list to remove assigned locations (once again, a query).
6. If kids exist without a location, go to step 3 with the newly pruned location list.
7. Done.
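For what it's worth, the core "rank1, else rank2, ..." fallthrough is only a few lines once it lives in a general-purpose language instead of a single SQL merge. This is a rough in-memory sketch with illustrative names (it follows the asker's sequential by-hand process rather than the de-duplication loop above, and assumes the ranks are already filtered to qualified locations):

import java.util.*;

// Each youth, in randomized intake order, takes a bed at the highest-ranked location
// that still has one; otherwise they fall through to alternative services.
public class PlacementSimulation {

    record Youth(int id, List<String> rankedLocations) {}   // ranks already limited to qualified locations

    static Map<Integer, String> assign(List<Youth> youths, Map<String, Integer> bedsByLocation) {
        Map<Integer, String> placement = new LinkedHashMap<>();
        for (Youth y : youths) {
            String chosen = "ALTERNATIVE_SERVICES";           // fallback when every ranked location is full
            for (String loc : y.rankedLocations()) {          // best predicted success first
                int beds = bedsByLocation.getOrDefault(loc, 0);
                if (beds > 0) {
                    bedsByLocation.put(loc, beds - 1);        // consume one bed
                    chosen = loc;
                    break;
                }
            }
            placement.put(y.id(), chosen);
        }
        return placement;
    }

    public static void main(String[] args) {
        List<Youth> youths = List.of(
                new Youth(5, List.of("B", "D")),              // Johnny: A excluded because he doesn't qualify
                new Youth(6, List.of("B", "C")));
        Map<String, Integer> beds = new HashMap<>(Map.of("B", 1, "C", 2, "D", 3));
        System.out.println(assign(youths, beds));             // prints {5=B, 6=C}
    }
}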

OptaPlanner: how to handle a minimum number of consecutive assignments

Let's assume a variation on the Nurse Rostering example in which, instead of assigning a nurse to a shift on a day, the nurse is assigned to a variable number of timeblocks on that day (which consists of 24 timeblocks), e.g. Nurse1 is assigned to timeblocks [8,9,10,11,12,13,14]. Let's call such a run of consecutive assignments a ShiftPeriod. There is a hard minimum and maximum on these ShiftPeriods. However, OptaPlanner has difficulties finding a feasible solution.
When there are hard consecutiveness constraints, is it better to model the planning entity as a startTimeBlock with a duration, instead of my current approach of assigning a timeblock and a day and then imposing min/max consecutive constraints?
Take a look at the meeting scheduling example on GitHub master for 6.4.0.Beta1 (but the example will work perfectly with 6.3.0.Final too). Video and docs coming soon. That example uses the TimeGrain design pattern, which is what you're looking for, I think.
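Roughly, the TimeGrain idea applied to this case looks like the sketch below; the names (ShiftPeriod, Nurse, TimeGrain, durationInGrains) are illustrative, not the actual meeting scheduling classes. The point is that only the starting grain is planned, and the duration is kept inside the hard min/max by construction, so a too-short or too-long shift period can never even be built.

import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.PlanningVariable;

@PlanningEntity
public class ShiftPeriod {

    private Nurse nurse;                 // problem fact or a second planning variable, as needed
    private int durationInGrains;        // kept between minGrains and maxGrains up front

    // The only planning variable: where the period starts.
    // The "timeGrainRange" value range provider would live on the solution class.
    @PlanningVariable(valueRangeProviderRefs = "timeGrainRange")
    private TimeGrain startingTimeGrain;

    // Derived: the last grain this period occupies, handy for overlap constraints.
    public int getLastTimeGrainIndex() {
        return startingTimeGrain.getGrainIndex() + durationInGrains - 1;
    }

    // getters/setters omitted for brevity
}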

SDK2 query for counting: which is more efficient?

I have an app that is displaying metrics about defects in a project.
I have the option of making one query that returns all the defects, and from that I can break out about four different metrics (How many defects escaped QA in 90 days, 180 days, and then the same metrics again but only counting sev1/sev2 defects).
I could make four queries and limit the results to one, so that I just get a count for each. Or I could make one query that encompasses them all (all defects that escaped QA in 180 days) and then tally the different metrics myself.
I'm figuring, worst case, the number of defects that escaped QA in the last six months will generally be less than 100, and certainly less than 500.
Which would you do -- four queries with one result each, or one single query that on average might return 50, perhaps 500 in the worst case?
And I guess the key question is: where are the inflection points? Perhaps I have more metrics tomorrow (who knows, 8?) and different average defect counts. Is there a rule of thumb I could use to help choose which approach?
Well, I would probably make the series of four queries and use the result count. If you are expecting 500 defects, the single-query approach will end up being three requests of up to 200 defects each anyway.
The solution where you do each individual query and use the total result count would be safe even with a very large number of defects. Plus, I usually find it a bad plan to think that I know the data sets an app will be dealing with. Most of my apps end up living much longer and being used on larger datasets than I intended.
The max page size is 200, so it sounds like you'd be requesting between 1 and 3 pages to get all the data vs. 4 queries with a page size of 1 and using the TotalResultCount...
You'd definitely have less aggregation code to write if you use the multi query approach (letting the server do the counting for you based on your supplied filters).
I'd guess the 4 independent queries might be faster, but it would be interesting to hear back about your experimental results...
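To illustrate the aggregation-code tradeoff mentioned above, here is a hedged sketch (plain Java with a made-up Defect record and placeholder severity labels, not the actual SDK2 objects) of what the client-side counting looks like if you fetch everything that escaped QA in the last 180 days and derive the four metrics locally:

import java.time.LocalDate;
import java.util.List;

public class DefectMetrics {

    // Made-up shape; the real SDK2 defect records differ.
    record Defect(LocalDate escapedDate, String severity) {}

    record Counts(long last180, long last90, long sev12Last180, long sev12Last90) {}

    // Input: every defect that escaped QA in the last 180 days, however it was fetched.
    static Counts count(List<Defect> escapedLast180Days, LocalDate today) {
        LocalDate cutoff90 = today.minusDays(90);
        long last90 = 0, sev12Last180 = 0, sev12Last90 = 0;
        for (Defect d : escapedLast180Days) {
            boolean recent = !d.escapedDate().isBefore(cutoff90);
            boolean sev12 = d.severity().equals("1") || d.severity().equals("2"); // placeholder labels
            if (recent) last90++;
            if (sev12) sev12Last180++;
            if (recent && sev12) sev12Last90++;
        }
        return new Counts(escapedLast180Days.size(), last90, sev12Last180, sev12Last90);
    }
}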

Interview: Determine how many x in y?

Today I had an interview with a gentleman who asked me to determine how many veterinarians are in the city of Atlanta. The interview was for an entry-level development position.
Assumptions: 1,000,000 people in Atlanta, 500,000 pets in Atlanta. The actual data is irrelevant.
Other than that there were no specifics. He asked me to find this data using only a whiteboard. There was no code required; it was simply a question to determine how well I could "reason" the problem. He said there was no right or wrong answer, and that I should work from the ground up.
After several answers, one of which was ~1,000 veterinarians in Atlanta, he told me he was going to ask other questions and I got the impression I had missed the point entirely.
I tried to work from the assumption that each vet could maybe see five animals a day, over 24 working days per month.
Using those assumptions, I finally calculated (24 * 5) * 12 = 1,440 pets/year, and with 500,000 pets that would come to 500,000 / 1,440 ~= 347 veterinarians.
What steps could I have taken to approach this problem differently, in case I run into this sort of problem in future interviews?
I agree with your approach. The average pet sees a veterinarian so many times a year. The average veterinarian sees so many pets per week. Crunch those numbers and you have your answer.
Just guessing off the top of my head, I would say the average pet sees a veterinarian twice each year. So that's 1,000,000 visits. I'd say the average vet works 48 weeks a year, sees about a pet every 40 minutes, and works 30 hours per working week. That's about 2,160 visits per vet per year.
1,000,000 / 2,160 ~= 463.
My answer is close enough to yours, given that the numbers are all guesses.
The point of the question, I think, is to clearly define each assumption you have to make in order to produce an estimate. Your assumptions can be wildly inaccurate; in practice, they usually aren't too bad.
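Both estimates are instances of the same back-of-the-envelope formula, just with different guessed parameters (the asker's version implicitly assumes one vet visit per pet per year):

\[
\text{vets} \approx \frac{\text{pets} \times \text{visits per pet per year}}{\text{visits one vet handles per year}}
\]

With the asker's guesses that works out to 500,000 × 1 / 1,440 ≈ 347; with the guesses above, 500,000 × 2 / 2,160 ≈ 463.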
Interesting aside...there's a fun board game called Guesstimation built entirely around this kind of estimation problem.
How many of those pets are the kind that actually need to see a veterinarian? How many vets see pets rather than large animals?
This question isn't necessarily just a Fermi estimation exercise: it's to see how you handle ambiguous requirements that could significantly affect your answer.