Total duration of the route with time windows is not optimal - optaplanner

I prepared a test case illustrating the problem.
The route consists of three points:
1001 - depot.
1002 - time window 10:00-15:00.
1003 - time window 8:00-15:00.
I expected that the route would be 1001-1003-1002, but received 1001-1002-1003.
As I understand it, the soft score constraint doesn't optimize the downtime interval (readyTime - arrivalTime).
Although the total travel time is minimal (calculated only from the matrix), the total duration of the route is longer than it could be.
Can I somehow optimize the total route duration?
Thanks in advance.
Vrp file:
NAME: P1568C3-n3-k1
COMMENT: P1568C3-n3-k1
TYPE: CVRPTW
DIMENSION: 3
EDGE_WEIGHT_TYPE: EXPLICIT
EDGE_WEIGHT_FORMAT: FULL_MATRIX
EDGE_WEIGHT_UNIT_OF_MEASUREMENT: SEC
CAPACITY: 4
NODE_COORD_SECTION
1001 52.086 23.687 address
1002 52.089 23.71 address
1003 52.095 23.742 address
EDGE_WEIGHT_SECTION
0 0.1675 0.4053
0.1675 0 0.2378
0.4893 0.3218 0
DEMAND_SECTION
1001 0 21600 54000 0
1002 1 36000 54000 1800
1003 1 28800 54000 1800
DEPOT_SECTION
1001
-1
EOF
Result xml:
<vehicleList id="11">
  <VrpVehicle id="12">
    <id>0</id>
    <capacity>4</capacity>
    <depot class="VrpTimeWindowedDepot" reference="10"/>
    <nextCustomer class="VrpTimeWindowedCustomer" id="13">
      <id>1002</id>
      <location class="VrpRoadLocation" reference="5"/>
      <demand>1</demand>
      <previousStandstill class="VrpVehicle" reference="12"/>
      <nextCustomer class="VrpTimeWindowedCustomer" id="14">
        <id>1003</id>
        <location class="VrpRoadLocation" reference="7"/>
        <demand>1</demand>
        <previousStandstill class="VrpTimeWindowedCustomer" reference="13"/>
        <vehicle reference="12"/>
        <readyTime>28800</readyTime>
        <dueTime>54000</dueTime>
        <serviceDuration>1800</serviceDuration>
        <arrivalTime>38038</arrivalTime>
      </nextCustomer>
      <vehicle reference="12"/>
      <readyTime>36000</readyTime>
      <dueTime>54000</dueTime>
      <serviceDuration>1800</serviceDuration>
      <arrivalTime>36000</arrivalTime>
    </nextCustomer>
  </VrpVehicle>
</vehicleList>

In the optaplanner-examples implementation, which follows the academic paper's problem definition, the soft score is only the time spent on the road. The current score constraints do not include any penalty for time lost (if any) by vehicles before leaving the depot.
You can see that in the example UI if you click on the bottom left button "constraint matches":
-489 for driving back to the depot
-406: -168 for driving from the depot to the closest customer and -238 for driving to the other customer.
So OptaPlanner does return the optimal solution; you just have a different problem definition. Simply add a soft constraint that penalizes the time from the depot's opening time until the departure time.
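A minimal sketch of such a constraint as a Drools score rule, using the class names shown in the solution XML above (newer releases drop the Vrp prefix and may use a different time unit and score holder); this is not a rule that ships with the example, and it measures from the depot's opening to the arrival at the first customer, which also re-counts the first travel leg already penalized above:

// Add to the example's score rules DRL, which already declares the scoreHolder global and the domain imports.
// For the first customer in a route (its previousStandstill is the vehicle itself), penalize the time
// between the depot's opening (readyTime) and the arrival at that customer.
rule "depotIdleTimeBeforeDeparture"
    when
        VrpTimeWindowedCustomer(previousStandstill == vehicle,
                                arrivalTime != null,
                                $vehicle : vehicle,
                                $arrivalTime : arrivalTime)
        VrpTimeWindowedDepot(this == $vehicle.depot,
                             $depotReadyTime : readyTime)
    then
        // negative soft weight: the bigger the gap, the worse the score
        scoreHolder.addSoftConstraintMatch(kcontext, $depotReadyTime - $arrivalTime);
end

With a penalty like this, the 1001-1003-1002 ordering should win: starting with the earlier 8:00 window roughly halves the gap between the depot's 6:00 opening and the first arrival (about 7200 seconds of lost time instead of 14400), while the pure driving time of the two orderings is almost identical.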

Related

Combining multiple related tables with different dates

Suppose I have a company that each month buys a certain number of computers, and the ratio of desktops to laptops as well as the brand changes sometimes (this is a made-up scenario; my actual scenario has nothing to do with computers). Ex:
For the accounting department:
On 1/1:
Laptops 50% (Dell) Desktop 50% (Acer)
For the next few months, I use that buying structure. However, I decide I want to change brands:
On 4/1:
Laptops 50% (Apple) Desktop 50% (Asus)
Then later, I decide I need more laptops:
On 7/1:
Laptops 75% (Apple) Desktop 25% (Asus)
For a certain department, I would like to get the following data in a report:
Date  Company  Weight
1/1   Dell     .5
1/1   Acer     .5
4/1   Apple    .5
4/1   Asus     .5
7/1   Apple    .75
7/1   Asus     .25
Here is what the database table structure looks like:
Table: department
id  name
1   Accounting
2   Developers

Table: computer_type
id  name
1   Laptop
2   Desktop

Table: purchase_weights
date  computer_type_id  weight  department_id
1/1   1                 .5      1
1/1   2                 .5      1
7/1   1                 .25     1
7/1   2                 .75     1

Table: purchase_companies
date  computer_type_id  company
1/1   1                 Dell
1/1   2                 Acer
4/1   1                 Apple
4/1   2                 Asus
As you can see, the tables need to be joined in a way that the dates missing in one table get forward filled. There is no entry in the weights table for 4/1 when the company changed, and there is no entry in the company table for 7/1 when the weight changed. The constraint can also be seen: if you change the company for laptops, all departments will start buying from that company if their weights include that computer type. Any insight would be very helpful, thanks!
Things I have tried:
Full outer join (does not include missing dates)
SELECT * FROM purchase_weights pw FULL OUTER JOIN purchase_companies pc ON pw.computer_type_id = pc.computer_type_id AND pw.date = pc.date WHERE pw.department_id = 1
Normal join (does a cross multiplication)
SELECT * FROM purchase_weights pw JOIN purchase_companies pc ON pw.computer_type_id = pc.computer_type_id WHERE pw.department_id = 1
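One way to get the forward-filled report (a hedged sketch, not a tested solution) is to collect the set of change dates per computer type for the department and then, for each of those dates, pull the most recent weight and the most recent company at or before that date with correlated subqueries. Table and column names come from the question, the department is hard-coded to 1, date is assumed to be a sortable date column, and the ORDER BY ... LIMIT 1 idiom assumes MySQL/PostgreSQL (use TOP 1 or OUTER APPLY on SQL Server):

SELECT d.date,
       (SELECT pc.company
          FROM purchase_companies pc
         WHERE pc.computer_type_id = d.computer_type_id
           AND pc.date <= d.date
         ORDER BY pc.date DESC
         LIMIT 1) AS company,       -- latest company on or before this change date
       (SELECT pw.weight
          FROM purchase_weights pw
         WHERE pw.computer_type_id = d.computer_type_id
           AND pw.department_id = 1
           AND pw.date <= d.date
         ORDER BY pw.date DESC
         LIMIT 1) AS weight         -- latest weight on or before this change date
  FROM (SELECT date, computer_type_id
          FROM purchase_weights
         WHERE department_id = 1
        UNION
        SELECT date, computer_type_id
          FROM purchase_companies) AS d
 ORDER BY d.date, d.computer_type_id;

For department 1 and the sample rows this should produce the six Date/Company/Weight rows of the desired report, with the 4/1 weights and the 7/1 companies carried forward from their last change.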

How to use normalization to set levels of confidence between a rating and the number of ratings in Python or SQL?

I have a list of about 800 sales items that have a rating (from 1 to 5) and the number of ratings. I'd like to list the items that are most likely to have a "good" rating in an unbiased way, meaning that 1 person voting 5.0 isn't nearly as good as 50 people having voted with the item's rating being a 4.5.
Initially I thought about taking the smallest number of votes (which will be zero 99% of the time) and the highest number of votes for an item on the list and factoring that into the ratings, giving me a confidence level of 0 to 100%; however, I'm thinking that this approach would be too simplistic.
I've heard about Bayesian probability but I have no idea how to implement it. My list of items, ratings and number of ratings is in a MySQL view, but I'm parsing the code using Python, so I can make the calculations on either side (but preferably in the SQL view).
Is there any practical way that I can normalize this voting with SQL, considering the rating and number of votes as parameters?
|----------|--------|--------------|
| itemCode | rating | numOfRatings |
|----------|--------|--------------|
| 12330    | 5.00   | 2            |
| 85763    | 4.65   | 36           |
| 85333    | 3.11   | 9            |
|----------|--------|--------------|
I started off trying to assign percentiles to the rating and numOfRatings; this way I'd be able to do normalization (summing them with an initial 50/50 weight). Here's the code I've attempted:
SELECT p.itemCode AS itemCode, (p.rating - min(p.rating)) / (max(p.rating) - min(p.rating)) AS percentil_rating,
(p.numOfRatings - min(p.numOfRatings)) / (max(p.numOfRatings) - min(p.numOfRatings)) AS percentil_qtd_ratings
FROM products p
WHERE p.available = 1
GROUP BY p.itemCode
However that's only bringing me a result for the first itemCode on the list, not all of them.
Clearly the issue here is the low number of observations your data has. Implementing a Bayesian method is the way to go because it provides a sound, probability-based estimate for rating applications, especially when there are limited observations, and it weighs the evidence you do have against the given parameters (this article provides an excellent explanation of Bayesian probability for beginners).
I would suggest storing your data in CSV files so it becomes easier to manipulate in Python. Denormalizing the data via joins is the first task to do before analyzing your ratings.
This is the simplified Bayesian formula to use in your Python code:
R – confidence level, a.k.a. the number of observations
v – number of votes for a single product
C – average vote across all products
m – tunable parameter, a.k.a. the cutoff number of votes required for an item to be considered (how many votes you want before it is displayed)
Since this is the simplified formula, this article explains how it has been derived from the original formula. This article is helpful too in explaining the parameters.
Knowing the formula pretty much gets 50% of your work done; the rest is just importing your data and working with it. Below are examples similar to your problem in case you need a full demonstration:
Github example 1
Github example 2
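For reference, the commonly cited simplified form (the IMDB-style weighted or "Bayesian average" rating) is WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is taken as the item's own average rating, v its number of votes, C the mean rating over all items, and m the tunable vote threshold; the labels differ slightly from the glossary above. Since you'd prefer to keep this in the SQL view, here is a hedged sketch against the products table from your attempted query, with m = 10 picked arbitrarily:

-- Weighted ("Bayesian average") rating computed directly in SQL.
-- rating       = the item's own average rating (R)
-- numOfRatings = the item's vote count (v)
-- 10.0         = m, the tunable vote threshold (an assumption; raise it to be more conservative)
-- c.avgRating  = C, the mean rating over all available products
SELECT p.itemCode,
       p.rating,
       p.numOfRatings,
       (p.numOfRatings / (p.numOfRatings + 10.0)) * p.rating
         + (10.0 / (p.numOfRatings + 10.0)) * c.avgRating AS weightedRating
  FROM products p
 CROSS JOIN (SELECT AVG(rating) AS avgRating
               FROM products
              WHERE available = 1) c
 WHERE p.available = 1
 ORDER BY weightedRating DESC;

Items with only a couple of votes are pulled toward the overall mean, while items with many votes keep a score close to their own average, which gives the unbiased ordering you describe.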

Is it possible to match the "next" unmatched record in a SQL query where there is no strictly unique common field between tables?

Using Access 2010 and its version of SQL, I am trying to find a way to relate two tables in a query where I do not have strict, unique values in each table, using concatenated fields that are mostly unique, then matching each unmatched next record (measured by a date field or the record id) in each table.
My business receives checks that we do not cash ourselves, but rather forward to a client for processing. I am trying to build a query that will match the checks that we forward to the client with a static report that we receive from the client indicating when checks were cashed. I have no control over what the client reports back to us.
When we receive a check, we record the name of the payor, the date that we received the check, the client's account number, the amount of the check, and some other details in a table called "Checks". We add a matching field which comes as close as we can get to a unique identifier to match against the client reports (more on that in a minute).
Checks:
ID Name Acct Amt Our_Date Match
__ ____ ____ ____ _____ ______
1 Dave 1001 10.51 2/14/14 1001*10.51
2 Joe 1002 12.14 2/28/14 1002*12.14
3 Sam 1003 50.00 3/01/14 1003*50.00
4 Sam 1003 50.00 4/01/14 1003*50.00
5 Sam 1003 50.00 5/01/14 1003*50.00
The client does not report back to us the date that WE received the check, the check number, or anything else useful for making unique matches. They report the name, account number, amount, and the date of deposit. The client's report comes weekly. We take that weekly report and append the records to make a second table out of it.
Return:
ID Name Acct Amt Their_Date Unique1
__ ____ ____ ____ _____ ______
355 Dave 1001 10.51 3/25/14 1001*10.51
378 Joe 1002 12.14 4/04/14 1002*12.14
433 Sam 1003 50.00 3/08/14 1003*50.00
599 Sam 1003 50.00 5/11/14 1003*50.00
Instead of giving us back the date we received the check, we get back the date that they processed it. There is no way to make a rule to compare the two dates, because the deposit dates vary wildly. So the closest thing I can get for a unique identifier is a concatenated field of the account number and the amount.
I am trying to match the records on these two tables so that I know when the checks we forward get deposited. If I do a simple join using the two concatenated fields, it works most of the time, but we run into a problem with payors like Sam, above, who is making regular monthly payments of the same amount. In a simple join, if one of Sam's payments appears in the Return table, it matches to all of the records in the Checks table.
To limit that behavior and match the first Sam entry on the Return table to the first Sam entry on the Checks table, I wrote the following query:
SELECT return.*, checks.*
FROM return, checks
WHERE (( ( checks.id ) = (SELECT TOP 1 id
FROM checks
WHERE match = return.unique1
ORDER BY [our_date]) ));
This works when there is only one of Sam's records in the Return table. The problem comes when the second entry for Sam hits the Return table (Return.ID 599) as the client's weekly reports are added to the table. When that happens, the query appropriately (for my purposes) only lists that two of Sam's checks have been processed, but uses the "Top 1 ID" record to supply the row's details from the Return table:
Checks_Return_query:
Checks.ID Name Acct Amt Our_Date Their_Date Return.ID
__ ____ ____ ____ _____ ______ ________
1 Dave 1001 10.51 2/14/14 3/25/14 355
2 Joe 1002 12.14 2/28/14 4/04/14 378
3 Sam 1003 50.00 3/01/14 3/08/14 433
4 Sam 1003 50.00 4/01/14 3/08/14 433
In other words, the query repeats the Return table info for record Return.ID 433 instead of matching Return.ID 599, which is I guess what I should expect from the TOP 1 operator.
So I am trying to figure out how I can get the query to take the two concatenated fields in Checks and Return, compare them to find matching sets, then select the next unmatched record in Checks (with "next" being measured either by the ID or Our_Date) with the next unmatched record in Return (again, with "next" being measured either by the ID or Their_Date).
I spent many hours in a dark room turning the query into various joins and back again, looking at functions like WHERE NOT IN, WHERE NOT EXISTS, FIRST(), NEXT(), MIN(), and MAX(). I am afraid I am way over my head.
I am beginning to think that I may have a structural problem, and may need to write the "matched" records in this query to another table of completed transactions, so that I can differentiate between "matched" and "unmatched" records better. But that still wouldn't help me if two of Sam's transactions are on the same weekly report I get from my client.
Are there any suggestions as to query functions I should look into for further research, or confirmation that I am barking up the wrong tree?
Thanks in advance.
I'd say that you really need another table of completed transactions; it could be a temporary table.
Regarding your fear that "... two of Sam's transactions are on the same weekly report", you can use a cursor to write the records one by one instead of a set-based operation.
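If you'd rather stay set-based, a sequence-matching query is another option to research (an alternative sketch, not part of the answer above): rank each check and each returned item within its Match/Unique1 group by date using correlated COUNT() subqueries, then join the Nth occurrence to the Nth occurrence. Table and field names are taken from the question, and it assumes no two rows in a group share the exact same date (Access SQL does not allow comments, so the explanation stays up here):

SELECT c.*, r.*
FROM Checks AS c
INNER JOIN [Return] AS r
    ON c.[Match] = r.Unique1
WHERE (SELECT COUNT(*)
         FROM Checks AS c2
        WHERE c2.[Match] = c.[Match]
          AND c2.Our_Date <= c.Our_Date)
    = (SELECT COUNT(*)
         FROM [Return] AS r2
        WHERE r2.Unique1 = r.Unique1
          AND r2.Their_Date <= r.Their_Date);

With the sample data, Sam's 3/01 check pairs with Return ID 433, the 4/01 check pairs with ID 599, and the 5/01 check stays unmatched until a third matching row shows up in the Return table. It also works when two of Sam's payments appear on the same weekly report, as long as their deposit dates differ (otherwise include the ID as a tie-breaker).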

finding time concurrency with an oracle select

I have the following data in a DB table and would like to find concurrent transactions based on the start and end time.
Patient Start Time End Time
John 08:31A 10:49A
Jim 10:14A 10:30A
Jerry 10:15A 10:28A
Alice 10:18A 12:29P
Bobby 10:32A 10:49A
Sally 10:46A 10:55A
Jane 10:52A 11:29A
Jules 10:54A 11:40A
Adam 10:58A 11:25A
Ben 11:00A 11:20A
Ann 11:31A 11:56A
Chris 11:49A 11:57A
Nick 12:00P 12:21P
Dave 12:00P 12:35P
Steve 12:23P 12:29P
If I want to find any overlapping times with a particular input, how would I write it? For example say I want to find overlaps with the 10:58A-11:25A time. This needs to find potential time concurrencies. I am looking for a count of the maximum number of concurrent overlaps. In my example for 10:58A-11:25A, I would want to see the following (count would be 5):
Patient Start Time End Time
Alice 10:18A 12:29P
Jane 10:52A 11:29A
Jules 10:54A 11:40A
Adam 10:58A 11:25A
Ben 11:00A 11:20A
In my second example, some of the concurrent times overlap with the time I am looking for, but they are over before another range starts. So, say I am looking for 10:46A-10:55A. I would expect a count of 4. In this example, 08:31A-10:49A is dropped because it is over before 10:52A-11:29A starts. Same with 10:32A-10:49A: it was over before 10:52A-11:29A started. So, the most concurrent with the 10:46A-10:55A range would be 4, even though overall there are 6 (2 dropped).
Patient Start Time End Time
Alice 10:18A 12:29P
Sally 10:46A 10:55A
Jane 10:52A 11:29A
Jules 10:54A 11:40A
Can this be done with a sql statement?
The easiest and cleanest method is to exclude rows:
All rows except:
rows that end before the start time
rows that begin after the end time
Sample (using start_time/end_time columns and bind variables for the input range):
select * from your_table t
where
not (
  t.end_time < :begin_time
  or
  t.start_time > :end_time
)
Sample in sqlfiddle
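The answer above returns every overlapping row; to get the peak concurrency count the question asks for (5 for 10:58A-11:25A, 4 for 10:46A-10:55A), one extra step is needed. A hedged sketch, assuming the same placeholder table your_table(patient, start_time, end_time): concurrency can only peak at an instant when some interval starts, so count the rows active at each start time that falls inside the requested range and keep the maximum.

select max(active_count) as max_concurrent
from (
  select count(*) as active_count
  from your_table t
  join your_table x
    on x.start_time <= t.start_time
   and x.end_time   >  t.start_time
  where t.start_time >= :range_start   -- e.g. 10:58A
    and t.start_time <= :range_end     -- e.g. 11:25A
  group by t.start_time
);

Against the sample data this should yield 5 for the 10:58A-11:25A range (the peak is at 11:00A) and 4 for 10:46A-10:55A, since the rows ending at 10:49A are no longer active when the later rows start.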

How to tally and store votes for a web site?

I am using SQL Server 2005.
I have a site where people can vote on awesome motorcycles. Each time a user votes, there is one vote for the first bike and one vote against the second bike. Two votes are stored in the database. The vote table looks like this:
VoteID VoteDate BikeID Vote
1 2012-01-12 123 1
2 2012-01-12 125 0
3 2012-01-12 126 0
4 2012-01-12 129 1
I want to tally the votes for each bike quite frequently, say each hour. My idea is to store the tally as a percentage of contests won versus lost on the bike table as an attribute of the bike. So, if a bike won 10 contests and lost 20 contests, it would have a score (tally) of 33. I would tally up daily, weekly, and monthly scores.
BikeID BikeName DailyTally WeeklyTally MonthlyTally
1 Big Dog 5 10 50
2 Big Cat 3 15 40
3 Small Dog 9 8 0
4 Fish Face 19 21 0
Right now, there are about 500 votes per day being cast. We anticipate 2500 - 5000 per day in the next month or so.
What is the best way to tally the data and what is the best way to store it? Should the tallies be on their own table? Should a trigger be used to run a new tally each time a bike is voted on? Should a stored procedure be run hourly to get all tallies?
Any ideas would be very helpful!
Store your VoteDate as a datetime value instead of just date.
For your tallies, you can just make that a view and calculate it on the fly. This should be very simple to do using GROUP BY and DATEPART functions. If you need exact code for how to do this, please open a new question.
For that low volume of rows it doesn't make any sense to store aggregations in a table when you can just calculate them whenever you want to see them and get accurate and immediate results that are up-to-date.
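A hedged sketch of such a view for the daily tally, assuming the vote table is named dbo.Vote with the columns shown above and that Vote = 1 means a win; the weekly and monthly tallies follow the same pattern with DATEPART(week, ...) or DATEPART(month, ...) (plus the year) in place of the day expression:

-- Daily win percentage per bike, calculated on the fly (SQL Server 2005 friendly).
CREATE VIEW dbo.BikeDailyTally
AS
SELECT BikeID,
       DATEADD(day, DATEDIFF(day, 0, VoteDate), 0) AS TallyDay,  -- floors the datetime to midnight
       100 * SUM(CASE WHEN Vote = 1 THEN 1 ELSE 0 END) / COUNT(*) AS DailyTally
FROM dbo.Vote
GROUP BY BikeID, DATEADD(day, DATEDIFF(day, 0, VoteDate), 0);

Querying SELECT * FROM dbo.BikeDailyTally WHERE TallyDay = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0) then gives today's standings whenever you want them, with no trigger or hourly job; a bike that won 10 and lost 20 that day shows a DailyTally of 33 thanks to the integer division.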
I agree with #JNK: try a view or just a normal stored proc to calculate the outputs on the fly. If you find it becomes too slow as your data grows, I would investigate other routes then (like caching the data in another table, etc.). Probably worth keeping it simple to start with; you can always reuse the logic from the SP/view later if you do want to set up a scheduled task.
Edit:
Removed the indexed view as per #Damien_The_Unbeliever's comments: it's not deterministic and I'm stupid :)