Can Optaplanner be used to solve the waste collection problem? The waste collection trucks will need to go to the dump site when full and then return to collect from the remaining locations.
Certainly. Check the vehicle routing examples provided with Optaplanner. Waste collection is a very similar problem. You want to maximize the collected waste while minimizing transportation cost (distance driven, number of vehicles used...).
In contrast to the VRP here you do not deliver goods starting from the depot fully loaded - instead, the vehicles depart empty from the depot and then pickup waste until their capacity limit is reached and then return to the depot/dump site.
Related
I'm developing a solver for a VRPTW problem using the OptaPlanner and I have faced a problem when large number of customers need to be serviced. By the large number I mean up to 10,000 customers. I have tried running a solver for about 48 hours but no feasible solution was ever reached.
I use a highly customized VRPTW domain model that introduces additional planning entity so-called "Workbreak". Workbreaks are like customers but they can have a location that is actually another planning value - because every day a worker can return home or go to the hotel. Workbreaks have fixed time of departure (usually next day morning), and a variable time of arrival (because it depends on the previous entity within a chain). A hard constraint cares about not allowing to "arrive" to the Workbreak after certain point of time. There are other hard constraints too, like:
multiple service time windows per customer
every week the last customer in chain must be a special customer "storage space visit" (workers need to gather materials before the next week)
long jobs management (when a customer needs to be serviced longer than specified time it should be serviced before specific hour of a day)
max number of jobs per workday
max total job duration per workday (as worker cannot work longer than specified time)
a workbreak cannot have a location of a hotel that is too close to worker's home.
jobs can not be serviced on Sundays
... and many more - there is a total number of 19 hard constrains that have to be applied. There are 3 soft constraints too.
All the aforementioned constraints were initially written as Drools rules, but because of many accumulation-based constraints (max jobs per day, max hours per day, overtime hours per week) the overall speed of the solver (benchmarks) was about 400 step/sec.
At first I thought that solver's speed is too slow to reach a feasible solution in a reasonable time, so I have rewritten all rules into easy score calculator, and it had a decent speed - about 4600 steps/sec. I knew that is will only perform best for a really small number of customers, but I wanted to know if the Drools was the cause of that poor performance. Then I have rewritten all these rules into incremental score calculator (and survived the pain of corrupted score bugs until all of them were successfully fixed). Surprisingly incremental score calculation is a bit slower for a small number of customers, comparing to easy score calculator, but it is not an issue, because overall speed is about 4000 steps/sec - no matter how many entities I have.
The thing that bugs me the most is that above a certain number of customers (problems start at 1000 customers) the solver cannot reach feasible solution. Currently I'm using Late Acceptance and Step Counting algorithms, because they perform really good for this kind of a problem (at least for a less number of customers). I used Simulated Annealing too, but without success, mostly because I could not find good values for algorithm specific parameters.
I have implemented some custom moves too:
Composite move that changes workbreak's location when sibling entities are changed using other moves like change/swap moves (it helps escaping many score traps, as improving step usually needs at least two moves to be performed in a single step)
Move factory for better long jobs assignment (it generates moves that tries to put customers with longer service time in the front of a workday chain)
Workbreak assignment move factory (it generates moves that helps putting workbreaks in proper sequence)
Now I'm scratching my head, and wondering what I should do to diagnose the source of my problem. I suspected that maybe it was hitting a score trap, but I have modified the solver so it saves snapshots of best score each minute. After reading these snapshots I realized that the score was still decreasing. Can the number of hard constraints play the role? I suspect that many moves need to be performed to find out a move that improves the score. The fact is that maybe 48 hours isn't that much for this kind of a problem, and it should make computations a whole week? Unfortunately I have nothing to compare with.
I would like to know how to find out if it is solely a performance problem, or a solver (algorithm, custom moves, hard/soft score) configuration problem.
I really apologize for my bad English.
TL;DR but FWIW:
To scale above 1k locations you need to use NearBy selection.
To scale above 10k locations, add Partitioned Search too.
I have 4 people to visit 22.000 places. So, I need to minimize the total time of the visits.
I have the spatial location of the places, and I'm thinking of getting a distance between them or using euclidian distance or using the Google Maps API.
It's possible to solve this problem using OptaPlanner.
I think of solving using the Vehicle Routing modeling. This is the best option? Would OptaPlanner support this amount of input data?
OptaPlanner has done cases like this, but you'll need to enable "nearby selection" explicitly because it's above 1k locations.
Because it's above 10k locations, it might be interesting to benchmark (using the benchmarker) with Partitioned Search too. For example, to speed up the Construction Heuristic, you might want to wrap that in a Partitioned Search. You probably can't wrap it all, because there are only 4 people.
As for using Google Maps API, first read this blog. Then: 10k locations takes 2GB of RAM IIRC to store the distance matrix in its most efficient form (double array of 32-bits) - this has nothing to do with optaplanner. I suspect 22k will bring you near 10GB of RAM just to load that in memory.
I have been asked by a customer to work on a project using Drools. Looking at the Drools documentation I think they are talking about OptaPlanner.
The company takes in transport orders from many customers and links these to bookings on multiple carriers. Orders last year exceeded 100,000. The "optimisation" that currently takes place is based on service, allocation and rate and is linear (each order is assigned to a carrier using the constraints but without any consideration of surrounding orders). The requirement is to hold non-critical orders in a pool for a number of days and optimize the orders in the pool for lowest cost using the same constraints.
Initially they want to run "what if's" over last year's orders to fine-tune the constraints. If this exercise is successful they want to use it in their live system.
My question is whether OptaPlanner is the correct tool for this task, and if so, if there is an example that I can use to get me started.
Take a look at the vehicle routing videos, as it sounds like you have a vehicle routing problem.
If you use just Drools to assign orders, you basically build a Construction Heuristic (= a greedy algorithm). If you use OptaPlanner to assign the orders (and Drools to calculate the quality (= score) of a solution), then you get a better solution. See false assumptions on vehicle routing to understand why.
To scale to 100k orders (= planning entities), use Nearby Selection (which is good up to 10k) and Partitioned Search (which is a sign of weakness but needed above 10k).
I am trying to understand the Optaplanner CVRPTW example and have the below questions:
Does every node require both distance and travel time to every other node? Or it just requires any one of them? Example data set does not contain both of them. I think it uses euclidean formula to calculate the distance, but how does it automatically calculate travel time?
Is it possible to use real time data (precalculated road distance data)?
Depends if the dataset is using AirLocation or RoadLocation. See docs on vehicle routing, chapter 3.
Yes, if you can hold all the data in memory. At 10k+ locations this becomes a problem because (10k)² ints require almost 2GB RAM. The goal of SegmentedRoadLocation is to scale up to 100k locations without using a lot of RAM, but generating good segmented road location has proven to be difficult.
Suppose that there is a company that own a couple of vending machines that collect coins. When the coin safe is full, the machine can not sell any new item. To prevent this, the company must collect the coins before that. But if the company send the technician too early, the company loses money because he made an unnecessary trip. The challenge is to predict the right time to collect coins to minimize the cost of operation.
At each visit (to collect or other operations), a reading of the level of coins in the safe is performed. This data contains historical information regarding the safe filling for each machine.
What is the best ML technique, approach to this problem computationally?
This is the two parts to the problem I see:
1) vending machine model
I would probably build a model for each machine using the historic data. Since you said a linear approach is probably not good, you need to think about things that have influence on the filling of a machine, i.e. time related things like week-day dependency, holiday dependency, etc., other influences like the weather maybe? So you need to attach these factors to the historic data to make a good predictive model. Many machine learning techniques can help creating a model and finding real data correlations. Maybe you should create despriptors from your historical data and try to correlate these to the filling state of a machine. PLS can help reducing the descriptor space and find relevant ones. Neuronal Networks are great if you really have no clue about the underlying math of a correlation. Play around with it. But pretty much any machine learning technique should be able to come up with a decent model
2) money collection
Model the cost for a random trip of the technician to a machine. Take into account the filling grade of the machines and the cost of the trip. You can send the technician on virtual collecting tours and calculate the total cost of collecting the money and the revenues from the machine. Use again maybe a neuronal network with some evolutionary strategy to find an optimum of trips and times. you can use the model of the filling grade of the machines during the virtual optimization, since you probably need to estimate the filling grade of the machines in these virtual collection rounds.
Interesting problems you have...