Scheduling a Job that consumes multiple resources - optaplanner

I've got a problem which I think optaplanner may be able to solve, but I haven't seen a demo that quite fits what I'm looking to do. My problem set is scheduling IoT node usage for a testbed. Each test execution (job) requires different sets of constraints on the nodes it will use. For example, a job may ask for M nodes with resource A, and N nodes with resource B. It will also specify a length of time it needs the nodes for and a window in which the job start is acceptable. To successfully schedule a job, it must be able to claim enough resources to meet the job specific requirements (ie, hard limits).
Being new to optaplanner, my understanding is that most of the examples focus on only needing one resource per Job. Any insight into whether this problem could be solved with optaplanner and where to start would be highly appreciated.

If you haven't already, look at the (cheap time scheduling example](https://www.youtube.com/watch?v=r6KsveB6v-g&list=PLJY69IMbAdq0uKPnjtWXZ2x7KE1eWg3ns) and project job scheduling example.
The differentiating question is if when job J1 needs M nodes with resource A if whether or not any of those M nodes can also supply resource B, just not at the same time.
If that's not the case, this is an easy model: you can threat resource A as a capacity like cloud balancing.
If that is the case, it's a complex model (but still possible), for example the jobs are chained or time grained (=> planning var 1) and each job has tasks which are assigned to nodes (=> planning var 2). All of this is likely to need custom moves for efficiency.

Related

What kind of problem is this? Is feasible with optaplanner?

I have to solve a problem on a manufacturing environment where:
A number of processes with subtasks needs to be scheduled.
Each subtask need N resources that can be, raw material, workers or machines.
Some subtasks need a worker with certain skills or from a department.
Workers are organized in shifts, so it may happen that on a shift certain skill may not be available. *
A machine can fit N pieces, depending on the piece size and the capacity of the machine. *
A machine may accept pieces of different types. *
Machines can be not available on a period as maintenances can happen.
If the next piece going to the machine is different from the previous one, a new task for maintenance needs to be inserted. *
If there is no raw material of certain category it can be manufactured, so a new process to manufacture that raw material needs to be inserted before the one that needs it. *
The processes can have a deadline.
Some raw materials can be partially consumed, so for example if we have 2L of painting, a subtask require 1L of that painting.
Is this a Job Shop or any variant problem? Is it possible to do with optaplanner? Are there too many constraints for the solver?
I know that the tasks scheduling and the requirements of each subtasks can be done, my biggest concern is with the ones that I have marked with *
Thank you in advance.
In the OptaPlanner docs, look for the Design Patterns chapter and read the section on how to design a good model and also the section on assigning to time (timeslot vs time grain vs chained).
These are 2 related video's:
https://www.youtube.com/watch?v=0uAoWU8m0pE
https://www.youtube.com/watch?v=Ew6pq9nJKog

OptaPlanner: Gaps in Chained Through Time Pattern

I'm just starting learning to use OptaPlanner recently. Please pardon me if there is any technically inaccurate description below.
Basically, I have a problem to assign several tasks on a bunch of machines. Tasks have some precedence restrictions such that some task cannot be started before the end of another task. In addition, each task can only be run on certain machines. The target is to minimize the makespan of all these tasks.
I modeled this problem with Chained Through Time Pattern in which each machine is the anchor. But the problem is that tasks on certain machine might not be executed sequentially due to the precedence restriction. For example, Task B can only be started after Task A completes while Tasks A and B are executed on machines I and II respectively. This means during the execution of Task A on machine I, if there is no other task that can be run on machine II, then machine II can only keep idle until Task A completes at which point Task B could be started on it. This kind of gap is not deterministic as it depends on the duration of Task A with respect to this example. According to the tutorial of OptaPlanner, it seems that additional planning variable gaps should be introduced for this kind of problem. But I have difficulty in modeling this gap variable now. In general, how to integrate the gap variable in the model using Chained Through Time Pattern? Some detailed explanation or even a simple example would be highly appreciated.
Moreover, I'm actually not sure whether chained through time pattern is suitable for modeling this kind of task assigning problem or I just used an entirely inappropriate method. Could someone please shed some light on this? Thanks in advance.
I'am using chained through time pattern to solve the same question as yours.And to solve the precedence restriction you can write drools rules.

Control of parallelization

I am running a custom processor on a rowset that does not seem to run in parallel. The underlying ~1GB text file is first read into a table that is partitioned via round robin. The 'Extract' runs on 200 vertices but then (under 'Aggregate' node) the processing [that does various complex computations] happens on only 2 vertices even though the parallelism parameter is much higher than that. Is there a special hint that needs to be used to dictate the compiler to use more vertex? Is there a function or property that needs to be overridden to set the parallelism at this phase as well?
Sorry for the late reply. But it is vacation time :).
It is good to see that the extract phase is fully scaled out.
Without seeing the script or the generated plan it is a bit difficult to say why you only see 2 vertices in some places. There are a couple of reasons why that may be the case:
you don't have enough data to scale out to more.
your aggregation needs more data and thus the plan has less parallelism.
your operation is intrinsically less parallel.
The optimizer's data cardinality estimation is off and chooses not enough parallelism. We have some ability to hint, but I rather first see the job.
Note that custom processors often block the optimizer from pushing optimizations through in the script (using the READ ONLY option for example helps) and can throw off the cardinality estimations.
If you send me the script, the job graph and the link to the job to mrys at Microsoft, I and the team will look into it next week after the holidays are over.

Travelling Salesman and Map/Reduce: Abandon Channel

This is an academic rather than practical question. In the Traveling Salesman Problem, or any other which involves finding a minimum optimization ... if one were using a map/reduce approach it seems like there would be some value to having some means for the current minimum result to be broadcast to all of the computational nodes in some manner that allows them to abandon computations which exceed that.
In other words if we map the problem out we'd like each node to know when to give up on a given partial result before it's complete but when it's already exceeded some other solution.
One approach that comes immediately to mind would be if the reducer had a means to provide feedback to the mapper. Consider if we had 100 nodes, and millions of paths being fed to them by the mapper. If the reducer feeds the best result to the mapper than that value could be including as an argument along with each new path (problem subset). In this approach the granularity is fairly rough ... the 100 nodes will each keep grinding away on their partition of the problem to completion and only get the new minimum with their next request from the mapper. (For a small number of nodes and a huge number of problem partitions/subsets to work across this granularity would be inconsequential; also it's likely that one could apply heuristics to the sequence in which the possible routes or problem subsets are fed to the nodes to get a rapid convergence towards the optimum and thus minimize the amount of "wasted" computation performed by the nodes).
Another approach that comes to mind would be for the nodes to be actively subscribed to some sort of channel, or multicast or even broadcast from which they could glean new minimums from their computational loop. In that case they could immediately abandon a bad computation when notified of a better solution (by one of their peers).
So, my questions are:
Is this concept covered by any terms of art in relation to existing map/reduce discussions
Do any of the current map/reduce frameworks provide features to support this sort of dynamic feedback?
Is there some flaw with this idea ... some reason why it's stupid?
that's a cool theme, that doesn't have that much literature, that was done on it before. So this is pretty much a brainstorming post, rather than an answer to all your problems ;)
So every TSP can be expressed as a graph, that looks possibly like this one: (taken it from the german Wikipedia)
Now you can run a graph algorithm on it. MapReduce can be used for graph processing quite well, although it has much overhead.
You need a paradigm that is called "Message Passing". It was described in this paper here: Paper.
And I blog'd about it in terms of graph exploration, it tells quite simple how it works. My Blogpost
This is the way how you can tell the mapper what is the current minimum result (maybe just for the vertex itself).
With all the knowledge in the back of the mind, it should be pretty standard to think of a branch and bound algorithm (that you described) to get to the goal. Like having a random start vertex and branching to every adjacent vertex. This causes a message to be send to each of this adjacents with the cost it can be reached from the start vertex (Map Step). The vertex itself only updates its cost if it is lower than the currently stored cost (Reduce Step). Initially this should be set to infinity.
You're doing this over and over again until you've reached the start vertex again (obviously after you visited every other one). So you have to somehow keep track of the currently best way to reach a vertex, this can be stored in the vertex itself, too. And every now and then you have to bound this branching and cut off branches that are too costly, this can be done in the reduce step after reading the messages.
Basically this is just a mix of graph algorithms in MapReduce and a kind of shortest paths.
Note that this won't yield to the optimal way between the nodes, it is still a heuristic thing. And you're just parallizing the NP-hard problem.
BUT a little self-advertising again, maybe you've read it already in the blog post I've linked, there exists an abstraction to MapReduce, that has way less overhead in this kind of graph processing. It is called BSP (Bulk synchonous parallel). It is more freely in the communication and it's computing model. So I'm sure that this can be a lot better implemented with BSP than MapReduce. You can realize these channels you've spoken about better with it.
I'm currently involved in an Summer of Code project which targets these SSSP problems with BSP. Maybe you want to visit if you're interested. This could then be a part solution, it is described very well in my blog, too. SSSP's in my blog
I'm excited to hear some feedback ;)
It seems that Storm implements what I was thinking of. It's essentially a computational topology (think of how each compute node might be routing results based on a key/hashing function to the specific reducers).
This is not exactly what I described, but might be useful if one had a sufficiently low-latency way to propagate current bounding (i.e. local optimum information) which each node in the topology could update/receive in order to know which results to discard.

Are there well-identified patterns for software scalability testing?

I've recently become quite interested in identifying patterns for software scalability testing. Due to the variable nature of different software solutions, it seems to like there are as many good solutions to the problem of scalability testing software as there are to designing and implementing software. To me, that means that we can probably distill some patterns for this type of testing that are widely used.
For the purposes of eliminating ambiguity, I'll say in advance that I'm using the wikipedia definition of scalability testing.
I'm most interested in answers proposing specific pattern names with thorough descriptions.
All the testing scenarios I am aware of use the same basic structure for the test which involves generating a number of requests on one or more requesters targeted at the processing agent to be tested. Kurt's answer is an excellent example of this process. Generally you will run the tests to find some thresholds and also run some alternative configurations (less nodes, different hardware etc...) to build up an accurate averaged data.
A requester can be a machine, network card, specific software or thread in software that generates the requests. All it does is generate a request that can be processed in some way.
A processing agent is the software, network card, machine that actually processes the request and returns a result.
However what you do with the results determines the type of test you are doing and they are:
Load/Performance Testing: This is the most common one in use. The results are processed is to see how much is processed at various levels or in various configurations. Again what Kurt is looking for above is an example if this.
Balance Testing: A common practice in scaling is to use a load balancing agent which directs requests to a process agent. The setup is the same as for Load Testing, but the goal is to check distribution of requests. In some scenarios you need to make sure that an even (or as close to as is acceptable) balance of requests across processing agents is achieved and in other scenarios you need to make sure that the process agent that handled the first request for a specific requester handles all subsequent requests (web farms are commonly needed like this).
Data Safety: With this test the results are collected and the data is compared. What you are looking for here is locking issues (such as a SQL deadlock) which prevents writes or that data changes are replicated to the various nodes or repositories you have in use in an acceptable time or less.
Boundary Testing: This is similar to load testing except the goal is not processing performance but how much is stored effects performance. For example if you have a database how many rows/tables/columns can you have before the I/O performance drops below acceptable levels.
I would also recommend The Art of Capacity Planning as an excellent book on the subject.
I can add one more type of testing to Robert's list: soak testing. You pick a suitably heavy test load, and then run it for an extended period of time - if your performance tests usually last for an hour, run it overnight, all day, or all week. You monitor both correctness and performance. The idea is to detect any kind of problem which builds up slowly over time: things like memory leaks, packratting, occasional deadlocks, indices needing rebuilding, etc.
This is a different kind of scalability, but it's important. When your system leaves the development shop and goes live, it doesn't just get bigger 'horizontally', by adding more load and more resources, but in the time dimension too: it's going to be running non-stop on the production machines for weeks, months or years, which it hasn't done in development.