OptaPlanner dynamic allocation of equipment

In the OptaPlanner documentation I did not find an example of a multi-machine, multi-process scenario. My requirement is simple.
For example, a product goes through processes 10, 20 and 30. Process 10 can run on machines M1, M2, M3, Y1, Y2, Y3 and Y4; process 20 can run on M3, M4 and M5; process 30 can run on M6, M7 and M8.
Taking process 10 as an example: if there are 2000 products to be processed, they cannot all be allocated to M1 alone; the load should be balanced across the available machines.
Sorry, I did not find a practical example of this in the official documentation, and I have no idea where to start. Can anyone help me with this problem?

Take a look at the CloudBalancing and PAS (Hospital bed planning) examples in the docs and optaplanner-examples. It sounds like your problem aligns well with those.
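Not OptaPlanner code, but to make the balancing idea concrete, here is a minimal Python sketch (the names Job, CAPABLE and score are made up for illustration): a hard constraint says a job may only run on a machine capable of its process step, and a soft penalty on the squared load of each machine, a common way to express load balancing as a score, pushes a solver towards spreading the 2000 jobs instead of piling them all on M1. In OptaPlanner you would express the same two constraints with a planning entity whose planning variable is the machine.

from collections import Counter
from dataclasses import dataclass

# Which machines can run each process step (taken from the question).
CAPABLE = {10: ["M1", "M2", "M3", "Y1", "Y2", "Y3", "Y4"],
           20: ["M3", "M4", "M5"],
           30: ["M6", "M7", "M8"]}

@dataclass
class Job:
    process: int           # process step (10, 20 or 30)
    machine: str = None    # the planning variable: which machine runs this job

def score(jobs):
    # (hard, soft) score, higher is better.
    hard = -sum(1 for job in jobs if job.machine not in CAPABLE[job.process])
    loads = Counter(job.machine for job in jobs)
    soft = -sum(n * n for n in loads.values())   # squared load penalises imbalance
    return (hard, soft)

machines = CAPABLE[10]
balanced = [Job(10, machines[k % len(machines)]) for k in range(2000)]
all_on_m1 = [Job(10, "M1") for _ in range(2000)]
print(score(balanced))    # (0, -571430): feasible, mild imbalance penalty
print(score(all_on_m1))   # (0, -4000000): feasible, but heavily penalised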


Including group membership in the model

If you had data like (prices and market-cap are not real)
Date        Stock   Close   Market-cap   GDP
15.4.2010   Apple    7.74         1.03   ...
15.4.2010   VW      50.03         0.8    ...
15.5.2010   Apple    7.80         1.04   ...
15.5.2010   VW      52.04         0.82   ...
where Close is the y you want to predict and Market-cap and GDP are your x-variables: would you also include Stock in your model as another independent variable, since it could be, for example, that price formation works differently for Apple than for VW?
If yes, how would you do it? My idea is to assign 0 to Apple and 1 to VW in the column Stock.
You first need to identify what exactly you are trying to predict. As it stands, you have longitudinal data: multiple measurements from the same company over a period of time.
Are you trying to predict the close price based on market cap + GDP?
Or are you trying to predict the future close price based on previous close price measurements?
You could stratify based on company name, but it really depends on what you are trying to achieve. What is the question you are trying to answer?
You may also want to take the following considerations into account:
Close prices measured at different times for the same company are correlated with each other.
The correlation between two measurements taken shortly after each other will be stronger than between two measurements far apart in time.
There are four assumptions associated with a linear regression model:
Linearity: The relationship between X and the mean of Y is linear.
Homoscedasticity: The variance of residual is the same for any value of X.
Independence: Observations are independent of each other.
Normality: For any fixed value of X, Y is normally distributed.
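If you do decide to include Stock, a minimal sketch of the dummy-variable idea using pandas and the statsmodels formula API (the file name stocks.csv and the hyphen-free column names are assumptions; substitute your real data):

import pandas as pd
import statsmodels.formula.api as smf

# Assumed: a CSV with the columns shown above (Date, Stock, Close, MarketCap, GDP).
df = pd.read_csv("stocks.csv")

# C(Stock) expands the company into dummy variables -- the same idea as coding
# Apple = 0 / VW = 1 by hand, but it generalises to more than two companies.
same_slopes = smf.ols("Close ~ MarketCap + GDP + C(Stock)", data=df).fit()
print(same_slopes.summary())

# If price formation works differently for Apple than for VW, interaction terms
# give each company its own slopes as well as its own intercept.
own_slopes = smf.ols("Close ~ C(Stock) * (MarketCap + GDP)", data=df).fit()
print(own_slopes.summary())

Whether the shared-slope or per-company model is appropriate comes back to the stratification question above, and the independence assumption still needs care because of the repeated measurements per company.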

Algorithm for minimising the sum of distances from agents to their targets

I'm working on implementing a model in Python. As part of this model, I have a set of agents (e.g. humans) that need to visit a set of targets (e.g. places). Each agent has its own initial location (i.e. starting point) and I can calculate the distance from each agent to each target.
What I need at this point is to allocate a first job to each agent such that the sum of all travel distances from the agents' starting locations to their first jobs is minimised.
I considered a greedy algorithm, but I found examples proving that the order of allocation can lead to non-optimal solutions. I also looked into the nearest-neighbour algorithm for the TSP, but everything I could find was for one agent (or salesman), not multiple.
Could someone point me to any (non-exhaustive search) algorithm/approach that could be used for this purpose please? Thanks
If the number of agents = number of targets, we end up with a standard assignment problem. This can be solved in different ways:
as an LP (linear programming problem). Technically a MIP but variables are automatically integer-valued, so an LP solver suffices.
as a network problem
or using specialized algorithms.
If, say, the number of locations > the number of agents, we can still use an LP/MIP:
min sum((i,j), d(i,j)*x(i,j))
sum(j, x(i,j)) = 1 for all agents i (each agent should be assigned to exactly one location)
sum(i, x(i,j)) <= 1 for all locations j (each location should be assigned to at most one agent)
x(i,j) in {0,1}
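A sketch of exactly this model in Python using PuLP (my choice of solver library, not implied above); the distance matrix is random here just so the snippet runs end to end:

import random
import pulp

n_agents, n_locations = 5, 8
d = [[random.random() for _ in range(n_locations)] for _ in range(n_agents)]

prob = pulp.LpProblem("first_job_assignment", pulp.LpMinimize)
x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary") for j in range(n_locations)]
     for i in range(n_agents)]

# Objective: total distance from each agent's start to its assigned first job.
prob += pulp.lpSum(d[i][j] * x[i][j] for i in range(n_agents) for j in range(n_locations))
for i in range(n_agents):                      # each agent gets exactly one location
    prob += pulp.lpSum(x[i][j] for j in range(n_locations)) == 1
for j in range(n_locations):                   # each location gets at most one agent
    prob += pulp.lpSum(x[i][j] for i in range(n_agents)) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
assignment = {i: next(j for j in range(n_locations) if x[i][j].value() > 0.5)
              for i in range(n_agents)}
print(assignment, pulp.value(prob.objective))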
For the network approach, we would need to add some dummy nodes.
All these methods are quite fast (this is an easy model). To give you an indication: I solved a random example with 500 agents and 1000 locations as an LP and it took 0.3 seconds.
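The "specialized algorithms" option mentioned above is even less work in Python: scipy.optimize.linear_sum_assignment (a Hungarian-style solver) accepts rectangular cost matrices, so the case with more locations than agents is handled directly. A minimal sketch, with a random distance matrix standing in for the real distances:

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_agents, n_locations = 500, 1000
dist = rng.random((n_agents, n_locations))     # dist[i, j] = distance from agent i to target j

agents, targets = linear_sum_assignment(dist)  # minimises the total assigned distance
print("total distance:", dist[agents, targets].sum())
print("agent 0 goes to target", targets[0])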

Splitting training data to train an optimal number of models

Let's assume we have a huge database providing us with training data D, and a dedicated, smaller test set T, for a machine learning problem.
The data covers many aspects of a real world problem and thus is very diverse in its structure.
When we now train some not-further-specified machine learning algorithm (neural network, SVM, random forest, ...) on D and finally test the resulting model against T, we obtain a certain performance measure P (confusion matrix, MSE, ...).
The question: if I could achieve better performance by dividing the problem into smaller sub-problems, e.g. by clustering D into several distinct training sets D1, D2, D3, ..., how could I find the optimal clusters (number of clusters, centroids, ...)?
In a brute-force fashion I am thinking about using kNN clustering with a random number of clusters C, which leads to the training sets D1, D2, ..., Dc.
I would now train C different models and finally test them against the test sets T1, T2, ..., Tc, where the same clustering has been used to split T into the C test sets T1, ..., Tc.
The combination which gives me the best overall performance mean(P1, P2, ..., Pc) would be the one I would like to choose.
I was just wondering whether you know a more sophisticated way than brute-forcing this?
Many thanks in advance
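For reference, a rough sketch of the brute-force procedure described above, using scikit-learn with k-means as the clustering step and a random forest as the placeholder model (these choices, the array names and the range of candidate values of C are all stand-ins):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def clustered_performance(X_train, y_train, X_test, y_test, n_clusters):
    """Cluster the training data, fit one model per cluster, evaluate per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_train)
    test_labels = km.predict(X_test)            # route test points to their cluster
    scores = []
    for c in range(n_clusters):
        tr, te = km.labels_ == c, test_labels == c
        if tr.sum() == 0 or te.sum() == 0:      # empty cluster: nothing to evaluate
            continue
        model = RandomForestClassifier(random_state=0).fit(X_train[tr], y_train[tr])
        scores.append(accuracy_score(y_test[te], model.predict(X_test[te])))
    return np.mean(scores)

# Brute force over candidate numbers of clusters:
# best_C = max(range(2, 11), key=lambda C: clustered_performance(Xtr, ytr, Xte, yte, C))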
Clustering is hard.
Much harder than classification, because you don't have labels to tell you whether you are doing okay or not well at all. Clustering can't do magic; it requires you to carefully choose parameters and evaluate the result.
You cannot just dump your data into k-means and expect anything useful to come out. You'd first need to really really carefully clean and preprocess your data, and then you might simply figure out that it actually is only one single large clump...
Furthermore, if clustering worked well and you trained classifiers on each cluster independently, then every classifier would miss crucial data. The result will likely perform really, really badly!
If you want to only train on parts of the data, use a random forest.
But it sounds like you are more interested in a hierarchical classification approach. That may work if you have good hierarchy information: you'd first train a classifier on the coarse category, then another classifier within that category only, to get the final class.
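A rough sketch of that two-stage idea with scikit-learn, assuming each training sample carries both a coarse category label and a fine class label (the random-forest choice and all names are placeholders):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_hierarchical(X, y_category, y_class):
    """Stage 1 predicts the coarse category; stage 2 has one classifier per category."""
    stage1 = RandomForestClassifier(random_state=0).fit(X, y_category)
    stage2 = {}
    for cat in np.unique(y_category):
        mask = y_category == cat
        stage2[cat] = RandomForestClassifier(random_state=0).fit(X[mask], y_class[mask])
    return stage1, stage2

def predict_hierarchical(stage1, stage2, X):
    cats = stage1.predict(X)                    # first pick the category...
    return np.array([stage2[c].predict(x.reshape(1, -1))[0]   # ...then the class within it
                     for c, x in zip(cats, X)])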

Improve computation time of numerous binomial processes in hierarchical model (OpenBUGS/WinBUGS)

I am currently developing a hierarchical Bayesian model in OpenBUGS that involves a lot of binomial processes (about 6000 sites). It describes successive-removal electric fishing events/passes, and the general structure is as follows:
for (i in 1:n_sites){
  d[i] ~ dgamma(0.01, 0.01)            # prior on fish density at site i
  N_tot[i] <- d[i] * S[i]              # total fish at site i (surface S[i] is known)
  for (j in 1:n_pass[i]){
    logit(p[i,j]) ~ dnorm(0, 0.001)    # prior on capture probability (logit scale)
    N[i,j] <- N_tot[i] - sum( C[i,1:(j-1)] )   # fish left before pass j
    C[i,j] ~ dbin( p[i,j] , N[i,j] )   # catch on pass j
  }
}
where n_sites is the total number of sites I'm looking at. n_pass[i] is the number of fishing passes carried out at site i. N[i,j] is the number of fish in site i when doing fishing pass j. N_tot[i] is the total number of fish in site i before any fishing pass; it is the product of the density d[i] at the site and the surface S[i] of the site (the surface is known). C[i,j] is the number of fish caught in site i during fishing pass j. p[i,j] is the probability of capture in site i for fishing pass j.
Each site has on average 3 fishing passes, which means a lot of successive binomial processes, and these typically take a long time to compute/converge.
I can't approximate the binomial process because the catches are typically small.
So I'm a bit stuck, and I'm looking for suggestions/alternatives to deal with this issue.
Thanks in advance
Edit history:
15-11-2016: added prior definitions for d and p following @M_Fidino's request for clarification

Bayesian Networks with multiple layers

So I'm trying to solve a problem with a Bayesian network. I know the conditional probabilities of some event, say that it will rain. Suppose that I measure (boolean) values from each of four sensors (A1-A4). I know the prior probability of rain, and I know the probability of rain given the measurement from each of the sensors.
Now I add in a new twist. A4 is no longer available, but B1 and B2 are (they are also boolean sensors). I know the conditional probabilities of both B1 and B2 given the measurement of A4. How do I incorporate those probabilities into my Bayesian network to replace the lost data from A4?
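One way to encode exactly this in a standard (single-entity) BN library such as pgmpy (my choice, not implied by the question) is to keep A4 in the network as an unobserved node and hang B1 and B2 off it with the known conditionals; inference then simply marginalises over A4. A minimal sketch with made-up numbers, assuming the sensor models can be expressed as P(sensor | rain):

from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Rain -> A1, Rain -> A4 (latent, no longer measured), A4 -> B1, A4 -> B2.
net = BayesianNetwork([("Rain", "A1"), ("Rain", "A4"), ("A4", "B1"), ("A4", "B2")])

cpds = [
    TabularCPD("Rain", 2, [[0.8], [0.2]]),                     # P(Rain): made-up prior
    TabularCPD("A1", 2, [[0.9, 0.3], [0.1, 0.7]],
               evidence=["Rain"], evidence_card=[2]),          # P(A1 | Rain), made up
    TabularCPD("A4", 2, [[0.85, 0.25], [0.15, 0.75]],
               evidence=["Rain"], evidence_card=[2]),          # P(A4 | Rain), made up
    TabularCPD("B1", 2, [[0.9, 0.2], [0.1, 0.8]],
               evidence=["A4"], evidence_card=[2]),            # known P(B1 | A4)
    TabularCPD("B2", 2, [[0.7, 0.1], [0.3, 0.9]],
               evidence=["A4"], evidence_card=[2]),            # known P(B2 | A4)
]
net.add_cpds(*cpds)
assert net.check_model()

# A4 is never observed; querying Rain marginalises over it automatically.
infer = VariableElimination(net)
print(infer.query(["Rain"], evidence={"A1": 1, "B1": 1, "B2": 0}))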
Your problem fits perfectly with Multi-Entity Bayesian Networks (MEBN). This is an extension of standard BNs using First-Order Logic (FOL). It basically allows nodes to be added and/or removed based on the specific situation at hand. You define a template for creating BNs on the fly, based on the current knowledge available.
There are several papers on it available on the Web. A classic reference to this work is "Multi-Entity Bayesian Networks Without Multi-Tears".
We have implemented MEBN inside UnBBayes. You can get a copy of it by following the instructions at http://sourceforge.net/p/unbbayes/discussion/156015/thread/cb2e0887/. An example can be seen in the paper "Probabilistic Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil" at http://link.springer.com/chapter/10.1007/978-3-642-35975-0_2.
If you are interested in it, I can give you more pointers later on.
Cheers,
Rommel