I have 4 people who need to visit 22,000 places, and I need to minimize the total time of the visits.
I have the spatial location of each place, and I'm considering how to get the distances between them: either Euclidean distance or the Google Maps API.
Is it possible to solve this problem using OptaPlanner?
I'm thinking of modeling it as a Vehicle Routing Problem. Is that the best option? Would OptaPlanner support this amount of input data?
OptaPlanner has handled cases like this, but because it's above 1k locations you'll need to enable "nearby selection" explicitly (sketched below).
Because it's above 10k locations, it might also be interesting to benchmark (using the benchmarker) with Partitioned Search. For example, to speed up the Construction Heuristic, you might want to wrap that phase in a Partitioned Search. You probably can't wrap the entire solve, because there are only 4 people.
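Nearby selection is driven by a NearbyDistanceMeter implementation that you reference from the move selector configuration. A minimal sketch, assuming a hypothetical Visit domain class with plain x/y coordinates (the real VRP example's classes differ):

```java
import org.optaplanner.core.impl.heuristic.selector.common.nearby.NearbyDistanceMeter;

// Hypothetical planning entity with a plain x/y location.
class Visit {
    double x, y;

    double airDistanceTo(Visit other) {
        // Euclidean distance is usually good enough as a nearness metric,
        // even when the score itself uses road distances.
        return Math.hypot(other.x - x, other.y - y);
    }
}

public class VisitNearbyDistanceMeter implements NearbyDistanceMeter<Visit, Visit> {

    @Override
    public double getNearbyDistance(Visit origin, Visit destination) {
        return origin.airDistanceTo(destination);
    }
}
```

The meter only ranks candidates by nearness, so it doesn't need to be as precise as the score calculation.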
As for using the Google Maps API, first read this blog. Then: a distance matrix for 10k locations takes about 0.4 GB of RAM to store in its most efficient form (a two-dimensional array of 32-bit ints) - this has nothing to do with OptaPlanner. I suspect 22k locations will bring you near 2 GB of RAM just to load it into memory.
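That figure is just back-of-the-envelope arithmetic: n² entries of 4 bytes each. A trivial sketch of the calculation:

```java
public class DistanceMatrixSize {

    public static void main(String[] args) {
        for (long n : new long[] {10_000L, 22_000L}) {
            long bytes = n * n * 4L; // an n x n matrix of 32-bit ints
            System.out.printf("%,d locations -> %.1f GB%n", n, bytes / 1e9);
        }
    }
}
```

This prints roughly 0.4 GB for 10k locations and 1.9 GB for 22k, before any JVM overhead.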
Is there currently a way to incorporate traffic patterns into OptaPlanner for the pickup-and-delivery VRP?
E.g. let's say I need to optimize 500 pickups and deliveries today and tomorrow among 30 vehicles, where each pickup has a 1-4 hour time window. I want to avoid busy areas of the city during rush hour when possible.
New pickups can also be added (or cancelled) in the meantime.
I'm sure this is a common problem. Does a decent solution exist for this in OptaPlanner?
Thanks!
Users often do this, but there is no out-of-the-box example of it.
There are several ways to do it, but one way is to add a 3rd dimension to the distanceMatrix, indexed by departureTime. Typically that uses a granularity of 15 minutes, 30 minutes, or 1 hour.
There are 2 scaling concerns here:
Memory: a 15-minute granularity means 24 * 4 = 96 time buckets per day. Given that a 2-dimensional distance matrix for 10k locations already uses about 0.4 GB of RAM, multiplying that by 96 clearly makes memory a concern.
Pre-calculation time: calculating the distance matrix can be time-consuming. "Bulk" algorithms can help here. For example, GraphHopper's community edition doesn't support bulk distance calculations, but its enterprise version does - as does OSRM (which is free). Fetching a 3-dimensional matrix from the remote Google Maps API, or the remote enterprise GraphHopper API, can also raise bandwidth concerns (see above: the distance matrix can be several GB in size, especially in non-binary formats such as JSON or CSV).
In any case, once that 3-dimensional matrix is there, it's just a matter of adjusting the OptaPlanner example's ArrivalTimeUpdateListener to use getDistance(from, to, departureTime).
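A minimal sketch of what that lookup could look like, assuming a hypothetical [bucket][from][to] layout with 15-minute buckets (none of these names are OptaPlanner API):

```java
public class TimeDependentDistanceMatrix {

    private static final int BUCKET_MINUTES = 15;
    private static final int BUCKETS_PER_DAY = (24 * 60) / BUCKET_MINUTES; // 96

    // [departureTimeBucket][fromIndex][toIndex], e.g. travel time in seconds
    private final int[][][] matrix;

    public TimeDependentDistanceMatrix(int[][][] matrix) {
        this.matrix = matrix;
    }

    public int getDistance(int fromIndex, int toIndex, int departureTimeInMinutes) {
        // Snap the departure time to its bucket; a fancier version could
        // interpolate between adjacent buckets.
        int bucket = (departureTimeInMinutes / BUCKET_MINUTES) % BUCKETS_PER_DAY;
        return matrix[bucket][fromIndex][toIndex];
    }
}
```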
How can we stabilize the RSSI (Received Signal Strength Indicator) of Bluetooth Low Energy (BLE) beacons for more accurate distance calculation?
We are trying to develop an indoor navigation system and ran into the problem that the RSSI fluctuates so much that the distance estimate is nowhere near the correct value. We tried an advanced averaging calculation, but to no avail.
The device is constantly receiving RSSI values. How do we filter them? How do we get a mean value? I am completely lost; please help.
Can anyone suggest an npm library or point me in the right direction? I have been searching for many days but have not gotten anywhere.
Front end: React Native. Back end: Node.js.
In addition to @davidgyoung's answer, we would like to point out that any filtering method is a compromise between the quality of noise reduction and the time lag introduced by the filtering (depending on the characteristic filtering time your method uses). As @davidgyoung pointed out, if you take a characteristic filtering period T, you get an average time lag of about T/2.
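To make that T/2 figure concrete: a plain moving average over a window of length T weights all samples in the window equally, so the mean age of the data behind its current output is

$$\bar{\tau} = \frac{1}{T}\int_{0}^{T} \tau \, d\tau = \frac{T}{2}.$$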
Thus, I think the best approach to solve your problem is not to try to find the best filtering method but to make changes on the transmitter’s end itself.
First, you can increase the number of signals the transmitter sends per second (most modern beacons allow this via the manufacturer's applications and APIs).
Secondly, you can increase the beacon's transmission power (also usually one of the beacon's settings), which typically improves the signal-to-noise ratio.
Finally, you can compare beacons from different vendors. At Navigine we have experimented with and tested lots of beacons from multiple manufacturers, and it appears that the signal-to-noise ratio can vary significantly among them. We recommend taking a look at kontakt.io beacons (https://kontakt.io/), one of the recognized leaders with 5+ years of experience in the area.
It is unlikely that you will find a pre-built package that does what you want, as your needs are quite specific. You will most likely have to write your own filtering code.
A key challenge is deciding on the parameters of your filtering, as indoor nav use cases are often hurt by time lag. If you average RSSI over 30 seconds, for example, the output of your filter will effectively give you the RSSI of where a moving object was, on average, 15 seconds ago. That may be inappropriate if you are dealing with moving objects. Reducing the averaging interval to 5 seconds reduces the lag, but also smooths out less noise. A filter called an auto-regressive moving average (ARMA) filter might be a good choice, but I only have an implementation in Java, so you would need to translate it to JavaScript (see the sketch below).
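Since only a Java implementation is mentioned, here is a minimal sketch of that style of filter: a single-coefficient exponential smoother, which is the simplest ARMA-like filter. The class and parameter names are illustrative, not any library's API:

```java
/**
 * Exponentially weighted RSSI smoother (illustrative sketch).
 * smoothingFactor near 0 -> heavy smoothing, more time lag;
 * smoothingFactor near 1 -> little smoothing, less time lag.
 */
public class RssiSmoother {

    private final double smoothingFactor;
    private Double estimate; // null until the first measurement arrives

    public RssiSmoother(double smoothingFactor) {
        this.smoothingFactor = smoothingFactor;
    }

    public double addMeasurement(double rssi) {
        if (estimate == null) {
            estimate = rssi;
        } else {
            // Move the estimate toward the new reading by a fixed fraction.
            estimate += smoothingFactor * (rssi - estimate);
        }
        return estimate;
    }
}
```

Feed every raw reading through addMeasurement and use the returned value for distance estimation; translating this to JavaScript for a Node back end is only a few lines.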
Finally, do not expect a filter to solve all your problems. Even if you smooth out the noise on the RSSI you may find that the distance estimates are not accurate enough for your use case. Make sure you understand the limits of what is possible with this technology. I wrote a deep dive on this topic here.
For example, in Twitter, how can we find a path between person A and person B? A query using repeat() is recursive and can be very heavy on a big graph. How can I use OLAP for better performance? Or is there another way?
One mechanism to mitigate the "heft" of this operation in OLTP mode, i.e. the potential full cluster/graph scan, is to use a time limit via Gremlin. The trade-off, though, is that you may not find the path between the two vertices before reaching the time limit.
OLAP would enable an operation like this, as it's designed to process "wide" traversals. Please note that 5.1 will have a lot of focus on performance improvements for full-graph operations.
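A hedged sketch of the OLTP time-limit approach in Gremlin-Java; the "follows" edge label and the 2-second budget are assumptions about a Twitter-like schema, not anything prescribed by TinkerPop:

```java
import java.util.Optional;
import org.apache.tinkerpop.gremlin.process.traversal.Path;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*;

public class BoundedPathSearch {

    // Finds some path from a to b, cutting the repeat() off after ~2 seconds.
    static Optional<Path> findPath(GraphTraversalSource g, Object aId, Object bId) {
        return g.V(aId)
                .repeat(timeLimit(2000).out("follows").simplePath())
                .until(hasId(bId))
                .path()
                .tryNext();
    }
}
```

If the time budget runs out first, the Optional comes back empty, which is exactly the trade-off described above.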
There are a couple of threads discussing the scalability of OptaPlanner, and I am wondering what the recommended approach is for dealing with very large datasets, in the millions of rows.
As the blog post below discusses, I am already using metaheuristics (Simulated Annealing + Tabu Search). The search space of the cloud balancing problem is c^p, but the size of the feasible space is unknown (the problem is NP-complete).
http://www.optaplanner.org/blog/2014/03/27/IsTheSearchSpaceOfAnOptimizationProblemReallyThatBig.html
The problem I am trying to solve is similar to cloud balancing. The main difference is in the input data: besides a list of computers and a list of processes, there is also a big two-dimensional 'score table' that holds the score for each possible combination, and it needs to be loaded into memory.
In other words, besides the constraints between computers and processes that the planning needs to satisfy, different valid combinations yield different scores, and the higher the score the better.
It's a simple problem, but when it comes to hundreds of computers, 100k+ processes, and a score table with a million+ combinations, it needs a lot of memory. Even though I could allocate more memory to increase the heap size, the planning becomes very slow and struggles, as the steps are sorted with custom planning variable/entity comparator classes.
A straightforward solution is to divide the dataset into smaller subsets, run each of them individually, and then combine the results. That way I could have multiple machines running at the same time, each on multiple threads. The biggest drawback of this approach is that the result produced is far from optimal.
I am wondering: are there any better solutions?
The MachineReassignment example also has a big "score combination" matrix. OptaPlanner doesn't care about that - those are just problem facts, and the DRL quickly matches the combination(s) picked for an assignment. Solver.solve() causes no big memory consumption or performance impact.
However, loading the problem in your code (before calling Solver.solve()) does cause huge memory consumption: if n = 20k, then n² = 400m, and an int takes up 4 bytes, so for 20,000 elements that matrix is 1.6 GB in its most efficient uncompressed form, int[][] (both in Java and C++!). So for 20k reserve 2 GB of RAM, for 40k reserve 8 GB, and for 80k reserve 32 GB. That scales badly.
As for dealing with these big problems, I use combinations of techniques such as Nearby Selection (see my blog article on that), Partitioned Search (what you described; it will be supported out of the box in 7, but I've implemented it for customers in a CustomPhase) and Limited Selection Construction Heuristics (I need to research that further; the plumbing is there, but it's usually overkill). Partitioned Search does indeed exclude the optimal solution, but above 10k planning entities the quality-vs-time trade-off clearly favors Partitioned Search, given a reasonable solving time (minutes/hours/days instead of millennia). The trick is to keep each partition big enough, above 1k entities (hence the use of Nearby Selection). Score calculation speed also matters a lot, of course.
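The "keep each partition above 1k entities" rule of thumb is easy to apply when splitting a dataset yourself. A framework-free sketch (the round-robin split is purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class Partitioner {

    /**
     * Splits items into the largest number of partitions that still leaves
     * each partition with at least minPartitionSize items.
     */
    static <T> List<List<T>> partition(List<T> items, int minPartitionSize) {
        int partitionCount = Math.max(1, items.size() / minPartitionSize);
        List<List<T>> partitions = new ArrayList<>(partitionCount);
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            partitions.get(i % partitionCount).add(items.get(i)); // round-robin
        }
        return partitions;
    }
}
```

In practice you would split by affinity (for example, grouping processes that share score-table interactions) rather than round-robin, so that moves inside a partition stay meaningful.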
I am trying to understand the OptaPlanner CVRPTW example and have the following questions:
Does every node require both a distance and a travel time to every other node, or does it require just one of them? The example data set does not contain both. I think it uses the Euclidean formula to calculate the distance, but how does it automatically calculate the travel time?
Is it possible to use real road data (precalculated road distances)?
That depends on whether the dataset uses AirLocation or RoadLocation. See the docs on vehicle routing, chapter 3.
Yes, if you can hold all the data in memory. At 10k+ locations this becomes a problem: (10k)² ints already take 0.4 GB of RAM, and (20k)² ints almost 2 GB. The goal of SegmentedRoadLocation is to scale up to 100k locations without using a lot of RAM, but generating good segmented road locations has proven to be difficult.
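The shape of that in-memory data is simple. A minimal sketch of a RoadLocation-style lookup, assuming the map is populated up front from your precalculated road data (the class and method names are illustrative, not the exact example code):

```java
import java.util.HashMap;
import java.util.Map;

/** A location whose distances come from a precalculated table, not a formula. */
public class PrecalculatedRoadLocation {

    private final long id;
    // Precomputed road distance (e.g. in meters) to every other location.
    private final Map<PrecalculatedRoadLocation, Long> travelDistanceMap = new HashMap<>();

    public PrecalculatedRoadLocation(long id) {
        this.id = id;
    }

    public void putDistance(PrecalculatedRoadLocation to, long distance) {
        travelDistanceMap.put(to, distance);
    }

    public long getDistanceTo(PrecalculatedRoadLocation to) {
        return travelDistanceMap.get(to); // filled in before solving starts
    }
}
```

Note that a HashMap per location costs far more RAM than the int[][] form discussed above, so at large scale you would index locations by int and keep the raw matrix instead.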