OptaPlanner looks like it's great at Vehicle Routing for problems involving single entities, such as a fleet of taxis transporting single customers between locations.
But what about more complex systems where a vehicle is shared by multiple people at once, such as DARP or Uber pooling, where a route could look something like:
Pick up customer 1 -> Pick up customer 2 -> Drop off customer 1 -> Pick up customer 3 -> Drop off customer 2 -> Drop off customer 3
As per the description of DARP:
The Dial-a-Ride Problem (DARP) consists of designing vehicle routes and schedules for n users who specify pickup and delivery requests between origins and destinations. The aim is to plan a set of m minimum cost vehicle routes capable of accommodating as many users as possible, under a set of constraints. The most common example arises in door-to-door transportation for elderly or disabled people.
Is this sort of thing possible with OptaPlanner?
I looked through the documentation to grasp what OptaPlanner can do, but I'm not sure where its limits lie.
In theory, doing a mixed VRP in OptaPlanner is possible. In practice, we have not yet gotten around to finding the best possible model which we could recommend to users.
We have an old JIRA for it where some proposals were outlined, but no definitive conclusion was reached.
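To give a flavor of what such a model could look like (only a sketch of one possible direction, not a recommended design): pickups and drop-offs can be modelled as separate visits in a chained route, along the lines of the standard OptaPlanner VRP example. The class names below (Visit, Vehicle, Standstill, Customer) are hypothetical, and the DARP-specific rules (pickup and drop-off of the same customer on the same vehicle, pickup before drop-off, capacity along the route) would be enforced by hard constraints rather than by the model itself.

```java
import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.AnchorShadowVariable;
import org.optaplanner.core.api.domain.variable.PlanningVariable;
import org.optaplanner.core.api.domain.variable.PlanningVariableGraphType;

// Hypothetical chained model: a route is Vehicle -> Visit -> Visit -> ...
// Standstill, Vehicle and Customer are assumed domain classes, not OptaPlanner API.
@PlanningEntity
public class Visit implements Standstill {

    private Customer customer;
    private boolean pickup; // true = pick the customer up, false = drop them off

    // The previous element in the chain: either the vehicle (the anchor) or another visit.
    @PlanningVariable(graphType = PlanningVariableGraphType.CHAINED,
            valueRangeProviderRefs = {"vehicleRange", "visitRange"})
    private Standstill previousStandstill;

    // Shadow variable: the vehicle this visit ends up on, derived from the chain's anchor.
    @AnchorShadowVariable(sourceVariableName = "previousStandstill")
    private Vehicle vehicle;

    // getters and setters omitted for brevity
}
```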
A friend of mine was asked the question below, and I would like to know what an ideal response to it would be.
If you were asked to do an analysis of how many people and how much book capacity a new public library in the area you live in should have, and what the location for such a library should be: what inputs would you need to perform the analysis, and what key factors would need to be considered? How would you test whether your analysis was accurate?
My analysis so far was as follows, covering the book and people capacity as well as the location:
For the location, I would ask questions such as: Where are the city's largest educational institutes located? Where do most college students go after their college or tuition classes? Which locations are entertainment hotspots that attract people, for example malls and shopping centres? Which location is best connected to other parts of the city via public transport routes such as bus, train, and metro?
I think answering these questions would give an idea of where the library should be established.
Coming to the people and book capacity: could one take the students present in the given area and use an average of the student population as a reference number (though I suspect that doesn't qualify as a rigorous method)? One could also take an existing library somewhere else, find out how many people visit it, and derive a rough estimate (the same could be done for the book capacity). Finally, I think we can also count how many working professionals, residents, and students inhabit the location and estimate a range for how many people will visit the library.
For the book capacity: a share of the books would be educational, depending on the educational backgrounds of the majority of students, which could be estimated from the number of students residing there. Besides educational books, the number of fiction and non-fiction titles could be based on the average number of bestsellers in a given month. And lastly, we can again compare with existing libraries in the vicinity.
For testing our hypothesis, the only way I can think of is conducting surveys or asking people first-hand in a Q&A interview format near college areas. Any other suggestions?
I have been assigned a project related to the VRP in which I have a fleet of trucks scattered across different locations that need to pick up and deliver goods from and to various other locations.
Google's OR-Tools relies on a depot for optimization, but in my case there is no starting or ending depot, and I need to create a planner that can find optimised routes for all the vehicles under different constraints.
I'm struggling to find a solution to this problem; any help would be appreciated.
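From what I can tell, OR-Tools' RoutingIndexManager can take explicit per-vehicle start and end indices instead of a single depot, so one direction I am considering is giving every truck its current position as its start and a zero-cost dummy node as its end, so routes can finish anywhere. A minimal sketch of that idea with the Java bindings (the 4-location distance matrix is made up):

```java
import com.google.ortools.Loader;
import com.google.ortools.constraintsolver.Assignment;
import com.google.ortools.constraintsolver.FirstSolutionStrategy;
import com.google.ortools.constraintsolver.RoutingIndexManager;
import com.google.ortools.constraintsolver.RoutingModel;
import com.google.ortools.constraintsolver.RoutingSearchParameters;
import com.google.ortools.constraintsolver.main;

public class NoDepotVrpSketch {
    public static void main(String[] args) {
        Loader.loadNativeLibraries();

        // Hypothetical data: 4 real locations (0..3) plus a dummy node (4) that is
        // free to reach from anywhere, so a route may effectively end at any location.
        final long[][] distance = {
            {0, 9, 7, 4, 0},
            {9, 0, 5, 6, 0},
            {7, 5, 0, 8, 0},
            {4, 6, 8, 0, 0},
            {0, 0, 0, 0, 0},
        };
        final int vehicleCount = 2;
        final int dummyEnd = 4;
        final int[] starts = {0, 2};             // each truck starts where it currently is
        final int[] ends = {dummyEnd, dummyEnd}; // no depot: all routes end at the dummy node

        RoutingIndexManager manager =
            new RoutingIndexManager(distance.length, vehicleCount, starts, ends);
        RoutingModel routing = new RoutingModel(manager);

        final int transitCallback = routing.registerTransitCallback((fromIndex, toIndex) -> {
            int from = manager.indexToNode(fromIndex);
            int to = manager.indexToNode(toIndex);
            return distance[from][to];
        });
        routing.setArcCostEvaluatorOfAllVehicles(transitCallback);

        RoutingSearchParameters parameters = main.defaultRoutingSearchParameters()
            .toBuilder()
            .setFirstSolutionStrategy(FirstSolutionStrategy.Value.PATH_CHEAPEST_ARC)
            .build();
        Assignment solution = routing.solveWithParameters(parameters);
        if (solution == null) {
            System.out.println("No solution found.");
            return;
        }

        // Print each vehicle's route of real locations (the dummy end node is never printed
        // because the loop stops at the route's end index).
        for (int v = 0; v < vehicleCount; v++) {
            StringBuilder route = new StringBuilder("Vehicle " + v + ":");
            long index = routing.start(v);
            while (!routing.isEnd(index)) {
                route.append(" ").append(manager.indexToNode(index));
                index = solution.value(routing.nextVar(index));
            }
            System.out.println(route);
        }
    }
}
```

Pickup-and-delivery requirements could then be layered on top with the routing model's pickup/delivery and dimension features, but I am unsure whether this dummy-end approach is the intended way to model a depot-free fleet.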
Problem:
We are looking for some guidance on what database to use and how to model our data to efficiently query for aggregated statistics as well as statistics related to a specific entity.
Our actual underlying data is different, but this example should showcase the fundamental problem:
Let's say you have data on Facebook friend requests and interactions over time. You would now like to answer questions like the following:
In 2018 which American had the most German friends that like ACDC?
Which are the friends that person X most interacted with on topic Y?
The general problem is that we have a lot of changing filter criteria (country, topic, interests, time) on both the entities that we want to calculate statistics for and the relevant related entities to calculate these statistics on.
Non-Functional Requirements:
It is an offline use case, meaning there are no inserts, deletes, or updates; instead, every X weeks a new complete dump is imported to replace the old data.
We would like an upper bound of 10 seconds to answer our queries. The faster the better; a maximum of 2 seconds per query would be great.
The actual data has around 100-200 million entries; the growth rate is linear.
The system has to serve a limited number of concurrent users, at most 100.
Questions:
What would be the right database technology or mixture of technologies to solve our problem?
What would be an efficient data model for computing aggregations with changing filter criteria in several dimensions?
(Bonus) What would be the estimated hardware requirements given a specific technology?
What we tried so far:
Setting up a document store with denormalized entries. Problem: It doesn't perform well on general queries because it has to scan too many entries for aggregations.
Setting up a graph database with normalized entries. Problem: performs even more poorly on aggregations.
You talk about which database to use, but it sounds like you need a data warehouse or business intelligence solution, not just a database.
The difference (in a nutshell) is that a data warehouse (DW) can support multiple reporting views, custom data models, and/or pre-aggregations which can allow you to do advanced analysis and detailed filtering. Data warehouses tend to hold a lot of data and are generally built to be very scalable and flexible (in terms of how the data will be used). For more details on the difference between a DW and database, check out this article.
A business intelligence (BI) tool is a "lighter" version of a data warehouse, where the goal is to answer specific data questions extremely rapidly and without heavy technical end-user knowledge. BI tools provide a lot of visualization functionality (easy-to-configure graphs and filters). BI tools are often used together with a data warehouse: the data is modeled, cleaned, and stored inside of the warehouse, and the BI tool pulls the prepared data into specific visualizations or reports. However, many companies (particularly smaller companies) do use BI tools without a data warehouse.
Now, there's the question of which data warehouse and/or BI solution to use.
That's a whole topic of its own & well beyond the scope of what I write here, but here are a few popular tool names to help you get started: Tableau, PowerBI, Domo, Snowflake, Redshift, etc.
Lastly, there's the data modeling piece of it.
To summarize your requirements, you have "lots of changing filter criteria" and varied statistics that you'll need, for a variety of entities.
The data model inside of a DW would often use a star, snowflake, or data vault schema. (There are plenty of articles online explaining those.) If you're using purely a BI tool, you can de-normalize the data into a combined dataset, which would give you a variety of filtering & calculation options while still maintaining high performance and speed.
Let's look at the example you gave:
Data of Facebook friend requests and interactions over time. You need to answer:
In 2018 which American had the most German friends that like ACDC?
Which are the friends that person X most interacted with on topic Y?
You want to filter/re-calculate the answers to those questions based on country, topic, interests, time.
One potential dataset can be structured like:
Date of Interaction | Initiating Person's Country | Responding Person's Country | Topic | Interaction Type | Initiating Person's Top Interest | Responding Person's Top Interest
This would allow you to easily count the number of interactions, grouped and/or filtered by any of those columns.
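Purely to illustrate why the flat structure works (the grouping itself would normally be done by the BI tool or the warehouse's SQL engine rather than application code), here is the shape of such a group-and-filter in plain Java, with hypothetical field names mirroring the columns above:

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

// One denormalized row per interaction, mirroring the flattened columns above.
record Interaction(LocalDate date, String initiatorCountry, String responderCountry,
                   String topic, String interactionType,
                   String initiatorTopInterest, String responderTopInterest) {}

public class FlattenedAggregationSketch {

    // Example: count 2018 interactions whose responder is German and lists ACDC as a top
    // interest, grouped by the initiating person's country. Any column can act as a
    // filter or as a group key in the same way.
    static Map<String, Long> interactionsByInitiatorCountry(List<Interaction> rows) {
        return rows.stream()
                .filter(r -> r.date().getYear() == 2018)
                .filter(r -> "Germany".equals(r.responderCountry()))
                .filter(r -> "ACDC".equals(r.responderTopInterest()))
                .collect(groupingBy(Interaction::initiatorCountry, counting()));
    }
}
```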
As you can tell, this is just scratching the surface of a massive topic, but what you're asking is definitely do-able & hopefully this post will help you get started. There are plenty of consulting companies who would be happy to help, as well. (Disclaimer: I work for one of those consulting companies :)
I have been asked by a customer to work on a project using Drools. Looking at the Drools documentation I think they are talking about OptaPlanner.
The company takes in transport orders from many customers and links these to bookings on multiple carriers. Orders last year exceeded 100,000. The "optimisation" that currently takes place is based on service, allocation and rate and is linear (each order is assigned to a carrier using the constraints but without any consideration of surrounding orders). The requirement is to hold non-critical orders in a pool for a number of days and optimize the orders in the pool for lowest cost using the same constraints.
Initially they want to run "what if's" over last year's orders to fine-tune the constraints. If this exercise is successful they want to use it in their live system.
My question is whether OptaPlanner is the correct tool for this task, and if so, if there is an example that I can use to get me started.
Take a look at the vehicle routing videos, as it sounds like you have a vehicle routing problem.
If you use just Drools to assign orders, you basically build a Construction Heuristic (= a greedy algorithm). If you use OptaPlanner to assign the orders (and Drools to calculate the quality (= score) of a solution), then you get a better solution. See false assumptions on vehicle routing to understand why.
To scale to 100k orders (= planning entities), use Nearby Selection (which is good up to 10k) and Partitioned Search (which is a sign of weakness but needed above 10k).
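If it helps to see the moving parts, a minimal bootstrap with OptaPlanner's programmatic SolverConfig might look roughly like the sketch below. CarrierSchedule, OrderAssignment and loadLastYearsOrders() are hypothetical placeholders for your own planning solution, planning entity and data loading; the score rules (service, allocation, rate, cost) would live in the referenced Drools DRL file, and Nearby Selection / Partitioned Search are further config elements described in the docs once you need to scale.

```java
import java.time.Duration;

import org.optaplanner.core.api.solver.Solver;
import org.optaplanner.core.api.solver.SolverFactory;
import org.optaplanner.core.config.score.director.ScoreDirectorFactoryConfig;
import org.optaplanner.core.config.solver.SolverConfig;

public class WhatIfRunner {

    public static void main(String[] args) {
        // CarrierSchedule (@PlanningSolution) and OrderAssignment (@PlanningEntity) are
        // hypothetical domain classes; carrierConstraints.drl would hold the score rules.
        SolverConfig solverConfig = new SolverConfig()
                .withSolutionClass(CarrierSchedule.class)
                .withEntityClasses(OrderAssignment.class)
                .withScoreDirectorFactory(new ScoreDirectorFactoryConfig()
                        .withScoreDrls("carrierConstraints.drl"))
                .withTerminationSpentLimit(Duration.ofMinutes(10));

        Solver<CarrierSchedule> solver =
                SolverFactory.<CarrierSchedule>create(solverConfig).buildSolver();

        // "What if" run over last year's orders to tune constraints before going live.
        CarrierSchedule problem = loadLastYearsOrders(); // hypothetical data loader
        CarrierSchedule best = solver.solve(problem);
        System.out.println("Best score: " + best.getScore());
    }
}
```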
I am trying to build a book recommendation website. I have crawled some book sites and have around 15 million separate books in the DB, which is in Neo4j.
Now for some genres, like mystery and thriller, there are at least about a million books. I have to make a top-20 list of recommendations. My current approach is to:
get the books
run a similarity comparison (cosine similarity or Pearson correlation; sketched below)
sort and display
These steps are expensive and take time, which is not at all good for a real-time system. I thought of keeping a sorted list per genre by linking Neo4j to a traditional DB and getting the top entries from that DB via Neo4j. But that is also slow (it takes a few tens of seconds). Is there a simpler and more intuitive way to do this? Any ideas will help.
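For context on step 2 above: assuming each book is represented as a numeric feature vector (built however you like from its metadata), the per-pair cosine computation itself is cheap; the problem is running it against roughly a million candidates per request. A minimal sketch of that step:

```java
/**
 * Minimal sketch of the similarity step, assuming each book is represented
 * as a numeric feature vector of the same length.
 */
public final class Similarity {

    /** Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // undefined for zero vectors; treat as "no similarity"
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```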
It would be good to know what other criteria you would like to base your recommendations on, e.g. how exactly you measure similarities between books. I'm assuming it's not purely genre based.
One approach we have been taking with these dense nodes (such as your genres, or cities people live in, etc.), is to find recommendations first based on some other criteria, then boost the relevance score of the recommendation if it is connected to the correct dense node. Such a query is much more performant.
For example, when recommending 20 people you should be friends with, I'd find 100 candidates based on all other criteria and then boost the scores of candidates living in the same location as the user we're recommending for. That's 100 single-hop traversals, which will be very quick.
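Purely as an illustration of that scoring shape (the field names are made up, and in practice you would express this as a query against Neo4j rather than in application code): take the ~100 candidates found via the other criteria, add a fixed boost to the ones attached to the target dense node, and keep the top 20.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative only: candidates pre-selected by other criteria, each with a base relevance score.
record Candidate(long personId, double baseScore, String location) {}

public class DenseNodeBoost {

    /** Boost candidates that share the target dense node (here: the location) and return the top n. */
    static List<Candidate> topN(List<Candidate> candidates, String targetLocation,
                                double boost, int n) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble((Candidate c) ->
                        c.baseScore() + (targetLocation.equals(c.location()) ? boost : 0.0)).reversed())
                .limit(n)
                .toList();
    }
}
```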
Have a look at this recent webinar recording; you may find some inspiration in it.
Regarding similarity measures, these may need to be pre-computed, linking similar books together by SIMILAR_TO relationships. Such pre-computation might be done using the Runtime of GraphAware Framework, which only executes this background computation during quiet periods, thus not interfering with your regular transactional processing. Look at the NodeRank module, which computes PageRank in Neo4j during quiet periods.