OptaPlanner inventory allocation

I am trying to use OptaPlanner to solve the following simple example:
https://www.gams.com/products/gams/gams-language/#the-gams-language-at-a-glance/
(a simple inventory allocation example).
I am finding it difficult because I couldn't find a similar case among the OptaPlanner examples.
Can anyone help with some directions or resources?
Thanks in advance.

It looks like the Facility Location Problem in OptaPlanner Quickstarts, but a consumer can pull from 2 or more suppliers at the same time.
Because the number of units to ship from supplier A to consumer X is an integer (*), you would need to use continuous value ranges and custom moves like in the investment example. Not simple (unlike other use cases in OptaPlanner), but doable.
(*) These business requirements may be unrealistic: in the real world, a half-full truck is almost as expensive as a full truck.
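
For concreteness, here is a rough sketch (my assumption, not taken from the quickstarts) of what such a planning entity could look like, with an integer quantity per supplier-consumer lane; the class and field names (Shipment, Supplier, Consumer) and the upper bound of 1000 units are illustrative:

import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.valuerange.CountableValueRange;
import org.optaplanner.core.api.domain.valuerange.ValueRangeFactory;
import org.optaplanner.core.api.domain.valuerange.ValueRangeProvider;
import org.optaplanner.core.api.domain.variable.PlanningVariable;

// Illustrative problem facts (not from the quickstarts).
class Supplier { int capacity; }
class Consumer { int demand; }

// One planning entity per (supplier, consumer) lane; the planning variable is the
// integer quantity shipped on that lane. Constraints would then sum quantities per
// consumer (to meet demand) and per supplier (to respect capacity).
@PlanningEntity
public class Shipment {

    private Supplier supplier;
    private Consumer consumer;

    @PlanningVariable(valueRangeProviderRefs = "quantityRange")
    private Integer quantity;

    // A countable value range of possible quantities. Default moves pick single values
    // from this range; custom moves (as in the investment example) can shift quantity
    // between two Shipments of the same consumer in bigger steps.
    @ValueRangeProvider(id = "quantityRange")
    public CountableValueRange<Integer> getQuantityRange() {
        return ValueRangeFactory.createIntValueRange(0, 1000); // assumed upper bound
    }

    // getters and setters omitted for brevity
}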

Related

VRP with parallel relations in OptaPlanner

We are trying to solve a VRP with Optaplanner where it is important that two (or more) customers are served at the same time.
This means, for example, that if customer #1 is supplied at 10 o'clock, then customer #2 must also be supplied at 10 o'clock.
It is not allowed to deliver to one customer and leave the other unscheduled.
Such constellations occur for approximately 50% of all customers, out of a total of 1000 customers.
It is not sufficient to apply the "delay till last pattern".
All other conditions remain the same as in the VRP example.
How can we proceed in order to solve this problem with OptaPlanner?
Are there any examples of such constellations?
In the docs, take a look at chapter Design Patterns, and specifically the auto delay until last pattern.
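On top of that pattern, if arrival times are available on the customers (they are a shadow variable in the time-windowed example), one possible way to state the "served at the same time" requirement as a hard constraint with the Constraint Streams API is sketched below; the Customer accessors (getLinkGroupId, getArrivalTime, getId) are assumptions, not part of the stock example:

import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;
import org.optaplanner.core.api.score.stream.ConstraintProvider;
import org.optaplanner.core.api.score.stream.Joiners;

public class ParallelDeliveryConstraintProvider implements ConstraintProvider {

    @Override
    public Constraint[] defineConstraints(ConstraintFactory factory) {
        return new Constraint[] { linkedCustomersServedTogether(factory) };
    }

    // Hard constraint: two customers in the same link group must share an arrival time.
    Constraint linkedCustomersServedTogether(ConstraintFactory factory) {
        return factory.forEach(Customer.class)
                .join(Customer.class,
                        Joiners.equal(Customer::getLinkGroupId),
                        Joiners.lessThan(Customer::getId)) // avoid scoring each pair twice
                .filter((a, b) -> !a.getArrivalTime().equals(b.getArrivalTime()))
                .penalize("Linked customers served at the same time", HardSoftScore.ONE_HARD);
    }
}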

Object condition in multiple places/repeated code (DRY)

This is a fundamental application design question I’ve struggled with and flip-flopped on for years. We have a legacy webapp that doesn't really have a solid ORM, if that tidbit might influence your answer. To abstract my question let’s say we have a class Car, and a corresponding table in our database named car. Car has a few properties: color, weight, year, maxspeed These properties directly correspond to columns in the db table.
In our application, we define the car as “classic old” if year is < 1960 and color = black. And in many places within our app knowing whether the car is "classic old" is extremely important (maybe we’re running a very illogical insurance agency which gives steep discounts and other perks to cars which are “classic old”).
All over our application, we do things like:
--list all classic old cars
--give the current user a discount if their car is classic old
--list all classic old cars with max speed > 100 miles per hour
--email the current user if their car is classic old and weighs more than 1000 pounds
What is the best way to go about this? We have a legacy application that does this in some places:
getOldClassicCars()
select * from car where year < 1960 and color = 'black'
and in other places:
cararray = getAllCars();
for each car in cararray
    if car.year < 1960 and car.color = black
        oldcararray.add(car)
The point being that this very important, fundamental piece of our application – is the car classic old – is “hardcoded” as year < 1960 and color = black in many places. Sometimes in SQL, sometimes in application code, etc. Obviously that is not good, but as we’ve refactored things I’m not sure we’re refactoring things the best way we can.
Well, you are stuck with the fundamental problem that:
you can't run your code on the database;
you want to be able to use the database's selection functionality on this criterion;
you want the calculation of "classic old" to be defined in a single place (preferably in code).
Let's enumerate the solutions:
1: Put the calculation in a sproc and always use the sproc to retrieve cars.
The problem here is that if you create a new car in code, its "classic old" status is undefined, so you haven't really solved the 'not in two places' problem.
2: Get the DB to run your calc via an assembly. For example, you can get MSSQL to run functions from a .NET assembly, which you can also use in your code base to perform the same calculation.
Problem: it's hard work. Plus, it's essentially still in two places; you have to keep the DB up to date and ensure that the table is accessed correctly.
3: Persist the calculated value in the DB, but perform the calc in the code.
Problem: if the calculation changes, the stored DB values will be incorrect and need updating.
Option 3 seems to be the best, as we will know when the calculation changes and can take some action to resolve the situation.
However, given the fundamental nature of this calculation, it might be best to build that 'out-of-dateness' into the way we structure the code.
Instead of simply persisting car.IsClassic, we could add a CarStatusReport object with a datetime property. We then generate a CarStatusReport(2017), which evaluates all the cars at that point in time and saves that data in a separate table.
Our business logic is then no longer "Is this car a classic?" but "What does the latest CarStatusReport say the status of this car is?"
Your business logic will then reside in a single CarStatusReportGenerator service, and any other logic accessing the IsClassic calculation will be forced to acknowledge the ephemeral nature of the stored info.
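
As a rough illustration of option 3 and the CarStatusReport idea (names and the repository abstraction are mine, not from the question):

import java.time.Year;
import java.util.List;

// Minimal stand-ins for the post's car table and a data-access layer.
record Car(long id, int year, String color, int weight, int maxSpeed) {}

interface CarRepository {
    List<Car> findAll();
    void saveStatus(Year reportYear, long carId, boolean isClassicOld);
}

// The single place where "classic old" is defined.
final class ClassicOldRule {
    private ClassicOldRule() {}

    static boolean isClassicOld(Car car) {
        return car.year() < 1960 && "black".equalsIgnoreCase(car.color());
    }
}

// Evaluates every car at a point in time and persists the result, so SQL can filter
// on the stored flag instead of re-encoding "year < 1960 and color = black" everywhere.
class CarStatusReportGenerator {
    private final CarRepository repository;

    CarStatusReportGenerator(CarRepository repository) {
        this.repository = repository;
    }

    void generate(Year reportYear) {
        for (Car car : repository.findAll()) {
            repository.saveStatus(reportYear, car.id(), ClassicOldRule.isClassicOld(car));
        }
    }
}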
There is no optimal solution here, but one good step is to move all the business logic into a single place. If you can't (for example when you create methods or functions that calculate some property, such as isOld()), then hide those inconsistencies under the hood, so that users of the implementation never notice the DRY violation from the outside.

Can we optimise the travel distance in VRPTW by using only one vehicle?

Is it possible to optimise the VRPTW using only one vehicle? The single vehicle must visit the customers in the sequence of their appointment times.
Yes, it is. In the VRPTW example, simply add only 1 Vehicle instance in the Solution's vehicleList.
Academically, that's more a TSP(TW) than a VRP(TW) though.

Neural Network Input and Output Data formatting

Thanks for reading my thread.
I have read some of the previous posts on formatting/normalising input data for a Neural Network, but cannot find something that addresses my queries specifically. I apologise for the long post.
I am attempting to build a radial basis function network for analysing horse racing data. I realise that this has been done before, but the data that I have is "special" and I have a keen interest in racing/sportsbetting/programming so would like to give it a shot!
Whilst I think I understand the principles for the RBFN itself, I am having some trouble understanding the normalisation/formatting/scaling of the input data so that it is presented in a "sensible manner" for the network, and I am not sure how I should formulate the output target values.
For example, in my data I look at the "Class change", which compares the class of race that the horse is running in now compared to the race before, and can have a value between -5 and +5. I expect that I need to rescale these to between -1 and +1 (right?!), but I have noticed that many more runners have a class change of 1, 0 or -1 than any other value, so I am worried about "over-representation". It is not possible to gather more data for the higher/lower class changes because that's just 'the way the data comes'. Would it be best to use the data as-is after scaling, or should I trim extreme values, or something else?
Similarly, there are "continuous" inputs - like the "Days Since Last Run". It can have a value between 1 and about 1000, but values in the range of 10-40 vastly dominate. I was going to scale these values to be between 0 and 1, but even if I trim the most extreme values before scaling, I am still going to have a huge representation of a certain range - is this going to cause me an issue? How are problems like this usually dealt with?
Finally, I am having trouble understanding how to present the "target" values for training to the network. My existing results data has the "win/lose" (0 or 1?) and the odds at which the runner won or lost. If I just use the "win/lose", it treats all wins and losses the same when really they're not - I would be quite happy with a network that ignored all the small winners but was highly profitable from picking 10-1 shots. Similarly, a network could be forgiven for "losing" on a 20-1 shot, but losing a bet at 2/5 would be a bad loss. I considered making the results (+1 * odds) for a winner and (-1 / odds) for a loser to capture the issue above, but this will mean that my results are not a continuous function, as there will be a "discontinuity" between short price winners and short price losers.
Should I have two outputs to cover this - one for bet/no bet, and another for "stake"?
I am sorry for the flood of questions and the long post, but this would really help me set off on the right track.
Thank you for any help anyone can offer me!
Kind regards,
Paul
The documentation that came with your RBFN is a good starting point to answer some of these questions.
Trimming data, aka "clamping" or "winsorizing", is something I use for similar data. For example, "days since last run" for a horse could be anything from just one day to several years but tends to centre in the region of 20 to 30 days. Some experts use a figure of, say, 63 days to indicate a "spell", so you could have an indicator variable like "> 63 = 1 else 0", for example. One approach is to look at the outliers, say the upper or lower 5% of any variable, and clamp these.
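To make the clamping idea concrete, here is a small sketch in Java (the 5% bounds and the 63-day cutoff are just the example figures mentioned above):

import java.util.Arrays;

public class InputScaling {

    // Clamp (winsorize) each value to the given lower/upper percentile bounds,
    // e.g. winsorize(days, 0.05, 0.95) clamps the most extreme 5% at each end.
    static double[] winsorize(double[] values, double lowerPct, double upperPct) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double lo = sorted[(int) Math.floor(lowerPct * (sorted.length - 1))];
        double hi = sorted[(int) Math.ceil(upperPct * (sorted.length - 1))];
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.min(Math.max(values[i], lo), hi);
        }
        return out;
    }

    // Binary "spell" indicator: 1 if the horse has not run for more than 63 days.
    static double spellIndicator(double daysSinceLastRun) {
        return daysSinceLastRun > 63 ? 1.0 : 0.0;
    }
}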
If you use odds/dividends anywhere, make sure you use the probabilities, i.e. 1/(odds+1), and a useful idea is to normalize these to 100%.
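A quick sketch of that odds-to-probability step: decimal "odds against" of o become an implied probability of 1/(o+1), and normalizing a race's probabilities so they sum to 100% removes the bookmaker's margin.

public class OddsNormalizer {

    // e.g. a 10-1 shot -> 1/11 ≈ 0.091 before normalization.
    static double[] impliedProbabilities(double[] oddsAgainst) {
        double[] p = new double[oddsAgainst.length];
        double sum = 0.0;
        for (int i = 0; i < oddsAgainst.length; i++) {
            p[i] = 1.0 / (oddsAgainst[i] + 1.0);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum; // normalize the whole race's probabilities to 100%
        }
        return p;
    }
}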
The odds or pari-mutuel prices tend to swamp other predictors, so one technique is to develop separate models: one for the market variables (the market model) and another for the non-market variables (often called the "fundamental" model).

How to determine if a record in every source represents the same person

I have several sources of tables with personal data, like this:
SOURCE 1
ID, FIRST_NAME, LAST_NAME, FIELD1, ...
1, jhon, gates ...
SOURCE 2
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
1, jon, gate ...
SOURCE 3
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
2, jhon, ballmer ...
So, assuming that records with ID 1 from sources 1 and 2 are the same person, my problem is how to determine whether a record in every source represents the same person. Additionally, certainly not every record exists in all sources. All the names are written mainly in Spanish.
In this case, exact matching needs to be relaxed, because we assume the data sources have not been rigorously checked against the country's official identification bureau. We also need to assume typos are common, because of the nature of the processes used to collect the data. What is more, the number of records is around 2 or 3 million in every source...
Our team had thought of something like this: first, force exact matching on selected fields like ID NUMBER and NAMES to see how hard the problem is. Second, relax the matching criteria and count how many more records can be matched. But this is where the problem arises: how do we relax the matching criteria without generating too much noise or restricting it too much?
What tool would be most effective for this? For example, do you know of a specific extension in some database engine to support this kind of matching?
Do you know of clever algorithms like Soundex to handle this approximate matching, but for Spanish text?
Any help would be appreciated!
Thanks.
The crux of the problem is to compute one or more measures of distance between each pair of entries and then consider them to be the same when one of the distances is less than a certain acceptable threshold. The key is to set up the analysis and then vary the acceptable distance until you reach what you consider to be the best trade-off between false positives and false negatives.
One distance measurement could be phonetic. Another you might consider is the Levenshtein or edit distance between the entries, which would attempt to measure typos.
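For example, a small sketch of the edit-distance idea in plain Java: compute the Levenshtein distance between two name fields and treat the pair as a candidate match when it falls below a threshold you tune (the threshold is the knob described above):

public class NameDistance {

    // Classic dynamic-programming Levenshtein (edit) distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Candidate match when the edit distance is within the tunable threshold,
    // e.g. candidateMatch("jhon", "jon", 2) is true (distance 1).
    static boolean candidateMatch(String name1, String name2, int threshold) {
        return levenshtein(name1.toLowerCase(), name2.toLowerCase()) <= threshold;
    }
}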
If you have a reasonable idea of how many persons you should have, then your goal is to find the sweet spot where you are getting about the right number of persons. Make your matching too fuzzy and you'll have too few. Make it too restrictive and you'll have too many.
If you know roughly how many entries a person should have, then you can use that as the metric to see when you are getting close. Or you can divide the number of records by the average number of records per person and get a rough number of persons that you're shooting for.
If you don't have any numbers to use, then you're left picking out groups of records from your analysis and checking by hand whether they look like the same person or not. So it's guess and check.
I hope that helps.
This sounds like a Customer Data Integration problem. Search on that term and you might find some more information. Also, have a poke around inside The Data Warehousing Institute, and you might find some answers there as well.
Edit: In addition, here's an article that might interest you on Spanish phonetic matching.
I've had to do something similar before and what I did was use a double metaphone phonetic search on the names.
Before I compared the names, though, I tried to normalize away any name/nickname differences by looking up the name in a nickname table I created. (I populated the table with census data I found online.) So people called Bob became Robert, Alex became Alexander, Bill became William, etc.
Edit: Double Metaphone was specifically designed to be better than Soundex and work in languages other than English.
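As a sketch of that combination, assuming Apache Commons Codec is on the classpath for Double Metaphone and using a plain Map to stand in for the census-based nickname table:

import java.util.Map;
import org.apache.commons.codec.language.DoubleMetaphone;

public class PhoneticNameMatcher {

    private final DoubleMetaphone metaphone = new DoubleMetaphone();
    private final Map<String, String> nicknameToCanonical; // e.g. "bob" -> "robert"

    public PhoneticNameMatcher(Map<String, String> nicknameToCanonical) {
        this.nicknameToCanonical = nicknameToCanonical;
    }

    // Normalize nicknames first, then compare the Double Metaphone encodings.
    public boolean phoneticallyEqual(String name1, String name2) {
        return metaphone.isDoubleMetaphoneEqual(canonical(name1), canonical(name2));
    }

    private String canonical(String name) {
        String lower = name.toLowerCase();
        return nicknameToCanonical.getOrDefault(lower, lower);
    }
}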
In SSIS, try using the Fuzzy Lookup transformation.
Just to add some details to help solve this issue, I found these modules for PostgreSQL 8.3:
Fuzzy String Match
Trigrams
You might try to canonicalise the names by comparing them with a dictionary.
This would allow you to spot some common typos and correct them.
Sounds to me like you have a record linkage problem. You can use the references in the link.