How to solve this scheduling problem with identical workers and task dependencies? - optimization

The problem I'm trying to solve, can be expressed as:
A team of N astronauts preparing for the re-entry of their space shuttle
into the atmosphere. There is a set of tasks that must be accomplished by the team before
re-entry. Each task must be carried out by exactly one astronaut, and certain tasks can not
be started until other tasks are completed. Which tasks should be performed by which
astronaut, and in which order, to ensure that the entire set of tasks is accomplished as
quickly as possible?
So:
Each astronaut is able to preform every tasks
Some tasks might depend on each other (e.g. task i must be completed before task j)
Tasks do not have a specific start time or deadline
The objective is to minimize the makespan (time it takes to complete all tasks)
I have found solutions to similar problems (e.g. Job Shop problem, RCPSP, etc) but non of those problems completely captures the problem described above, as they don't involve worker allocation to the tasks, i.e. the solution assumes specific workers must work on specific tasks.

Related

Project Job Scheduling: A job is shared by multiple projects

I am currently working on project job scheduling case, and stuck by a requirement from the customer. It is about scheduling a supervisor in the workshop. The situation is like below:
The factory has 49 machines. Because of the regulations, the machines are distributed into 11 workshops, instead of 1 single large shared workshop.
Each workshop has 3-5 machines, and each machine can fulfill projects independently. Although a machine can finish an order by itself, it takes 11 days with multiple technicians on each day for different types of jobs.
For a specific workshop, there must be a supervisor staying in it, as long as the first machine starts working. This means his schedule starts as soon as the first machine is turned on, and ends as soon as the last machine is turned off.
The numbers of the supervisors are quite limited and often need to be scheduled carefully.
A supervisor is possibly scheduled to various workshops to make a full utilization of his time of the day shift.
I am considering to include the schedule of the supervisors jobs into the project routing. This means, concurrently this monitoring job could be one of the jobs of multiple projects in that workshop. However it seems the example case does not support such a scenario.
Any advices/hints will be highly appreciated. Thanks a lot!

How to reflect automatic processes and processes done by different users with BPMN?

Let's say I have the following simplified process:
How should I reflect there that the data could be added not only by manual input, but can be received from another system (without user verification)?
And is there more correct way to display the same actions done by different users (see Verification step done by Manager 1 or Manager 2; in reality there are much more steps than just Verification, and all of them are the same in Manager 1 and Manager 2 columns).
Obviously there are many open questions regarding your specific requirements, so I can just give you an example:
I am using two lanes, one for the manager, one for the user. I assume that the concrete person (or subrole) necessary to carry out the steps for the "manager" needs to be determined within the process. From a process perspective it's just one role carried out by people with different skill sets or authorizations. I show that "Assign" task here as an automatic step, but it could also be a manual step. A BPMN process can have several start "events", I am using here two of them to show the different ways in which the process can start. I am using a collapsed pool "External System" and a message flow to indicate where the automatic message is coming from.
(Please note that BPMN processes are typically modeled from left to right, but may also be modeled from top to bottom. Also note, that for more complex processes and a more finegrained level of detail, it is often preferable to show every process participant in a separate pool with a separate process and exchange of messages in between them. Modeling one process pool with several lanes quite soon reaches practical limits!)

RabbitMQ job queus completion indicator event

I am trying out RabbitMQ with springboot. I have a main process and within that process I am creating many number of small tasks that can be processed from other workers. From the main process perspective, I like to know when all of these tasks are completed so that it can move to next step. I did not find a easy way to query rabbitmq if the tasks are complete.
One solution I can think of is to store these tasks in a database and when each message is completed, update the database with COMPLETE status. Once all jobs are in COMPLETE status, the main process can know the jobs are COMPLETE and it can move to next step o fits process.
Another solution I can think of is that the main process maintain the list of jobs that is being sent to other workers. Once each worker completes it's job, it can send a message to the main process indicating the job is complete. Then the Main process can mark the job is complete and remove the item from the list.Once the list is empty, the main process will know the jobs are complete and it can move to next step of it's work.
I am looking to learn best practice on how other people have dealt this kind of situation. I appreciate for any suggestion.
Thank you!
There is no way to query RabbitMQ for this information.
The best way to approach this is with the use of a process manager.
The basic idea is to have your individual steps send a message back to a central process that keeps track of which steps are done. When that main process receives notice that all of the steps are done, it lets the system move on to the next thing.
The details of this approach are fairly complex, but I do have a blog post that covers the core of a process manager from a JavaScript/NodeJS perspective.
You should be able to find something like a "process manager" or "saga" as they are sometimes called, within your language and RabbitMQ framework of choice. If not, you should be able to write one for your process without too much trouble, as described in my blog post.

What is the practice for scheduling multiple inter-dependent SQL Server Agent jobs?

The way my team currently schedules jobs is through the SQL Server Job Agent. Many of these jobs have dependencies on other internal servers which in turn have their own SQL Server Jobs that need to be run to keep their data up to date.
This has created dependencies in the start time and length of each of our SQL Server Jobs. Job A might depend on Job B finishing, so we schedule Job B a certain estimated time in advance to Job A. All of this process is very subjective and not scalable, as we add more jobs and servers which create more dependencies.
I would love to get out of the business of subjectively scheduling these jobs and hoping that the dominos fall in the right order. I am wondering what the accepted practices for scheduling SQL Server jobs are. Do people use SSIS to chain jobs together? Is there tooling already built into the SQL Server Job Agent to handle this?
What is the accepted way to handle the scheduling of multiple SQL Server jobs with dependencies on each other?
I have used Control-M before to schedule multiple inter-dependent jobs in different environment. Control-M generally works by using batch files (from what I remember) to execute SSIS packages.
We had a complicated environment hosting 2 data warehouses side by side (1 International and 1 US Local). There were jobs that were dependent on other jobs and those jobs on others and so on, but by using Control-M we could easily decide on the dependency (It has a really nice and intuitive GUI). Other tool that comes to my mind is Tidal Scheduler.
There is no set standard for job scheduling, but I think its safe to say that job schedules depend entirely on what an organization needs. For example Finance jobs might be dependent on Sales and Sales on Inventory and so on. But the point is, if you need to have job inter dependency, using a third party software such as Control-M is a safe bet. It can control jobs on different environments and give you real sense of the company wide job control.
We too had the requirement to manage dependencies between multiple agent jobs - after looking at various 3rd party tools and discounting them for various reasons (mainly down to the internal constraints relating to the use of 3rd party software) we decided to create our own solution.
The solution centres around a configuration database that holds details about processes (jobs) that need to run and how they are grouped (batches), along with the dependencies between processes.
Summary of configuration tables used:
Batch - highlevel definition of a group of related processes, includes metadata such as max concurrent processes, and current batch instance etc.
Process - meta data relating to a process (job) such as name, max wait time, earliest run time, status (enabled / disabled), batch (what batch the process belongs to), process job name etc.
Batch Instance - the active instance of a given batch
Process Instance - active instances of processes for a given batch
Process Dependency - dependency matrix
Batch Instance Status - lookup for batch instance status
Process Instance Status - loolup for process instance status
Each batch has 2 control jobs - START BATCH and UPDATE BATCH. The 1st deals with starting all processes that belong to it and the 2nd is the last to run in any given batch and deals with updating the outcome statuses.
Each process has an agent job associated with it that gets executed by the START BATCH job - processes have a capped concurrency (defined in the batch configuration) so processes are started up to a max of x at a time and then START BATCH waits until a free slot becomes available before starting the next process.
The process agent job steps call a templated SSIS package that deals with the actual ETL work and with the decision making around whether the process needs to run and has to wait for dependencies etc.
We are currently looking to move to a Service Broker solution for greater flexibility and control.
Anyway, probably too much detail and not enough example here so VS2010 project available on request.
I'm not sure how much this will help, but we ended up creating an email solution for scheduling.
We built an email reader that accesses an exchange mailbox. As jobs finish, they send an email to the mail reader to start another job. The other nice part, is that most applications have email notifications built in, so there really isn't much in the way of custom programming.
We really only built it in the first place to handle data files coming in from lots of other partners. It was much easier to give them an email address rather than setting them up with an ftp site, etc.
The mail reader app now has grown to include basic filtering, time of day scheduling, use of semaphores to prevent concurrent jobs, etc. It really works great.

SSIS 2005 Control Flow Priority

The short version is I am looking for a way to prioritize certain tasks in SSIS 2005 control flows. That is I want to be able to set it up so that Task B does not start until Task A has started but Task B does not need to wait for Task A to complete. The goal is to reduce the amount of time where I have idle threads hanging around waiting for Task A to complete so that they can move onto Tasks C, D & E.
The issue I am dealing with is converting a data warehouse load from a linear job that calls a bunch of SPs to an SSIS package calling the same SPs but running multiple threads in parallel. So basically I have a bunch of Execute SQL Task and Sequence Container objects with Precedent Constraints mapping out the dependencies. So far no problems, things are working great and it cut our load time a bunch.
However I noticed that tasks with no downstream dependencies are commonly being sequenced before those that do have dependencies. This is causing a lot of idle time in certain spots that I would like to minimize.
For example: I have about 60 procs involved with this load, ~10 of them have no dependencies at all and can run at any time. Then I have another one with no upstream dependencies but almost every other task in the job is dependent on it. I would like to make sure that the task with the dependencies is running before I pick up any of the tasks with no dependencies. This is just one example, there are similar situations in other spots as well.
Any ideas?
I am late in updating over here but I also raised this issue over on the MSDN forums and we were able to devise a partial work around. See here for the full thread, or here for the feature request asking microsoft to give us a way to do this cleanly...
The short version is that you use a series of Boolean variables to control loops that act like roadblocks and prevent the flow from reaching the lower priority tasks until the higher priority items have started.
The steps involved are:
Declare a bool variable for each of the high priority tasks and default the values to false.
Create a pre-execute event for each of the high priority tasks.
In the pre-execute event create a script task which sets the appropriate bool to true.
At each choke point insert a for each loop that will loop while the appropriate bool(s) are false. (I have a script with a 1 second sleep inside each loop but it also works with empty loops.)
If done properly this gives you a tool where at each choke point the package has some number of high priority tasks ready to run and a blocking loop that keeps it from proceeding down the lower priority branches until said high priority items are running. Once all of the high priority tasks have been started the loop clears and allows any remaining threads to move on to lower priority tasks. Worst case is one thread sits in the loop while waiting for other threads to come along and pick up the high priority tasks.
The major drawback to this approach is the risk of deadlocking the package if you have too many blocking loops get queued up at the same time, or misread your dependencies and have loops waiting for tasks that never start. Careful analysis is needed to decide which items deserved higher priority and where exactly to insert the blocks.
I don't know any elegant ways to do this but my first shot would be something like this..
Sequence Container with the proc that has to run first. In that same sequence container put a script task that just waits 5-10 seconds or so before each of the 10 independent steps can run. Then chain the rest of the procs below that sequence container.