How to assign multiple workers to one task at the same time - optaplanner

I have the following requirements: there are some tasks and some workers.
For any given task, we need either one worker or a pair of workers with different levels, working at the same time.
For any given task, we must finish it before a specific time.

Define one worker, or a combination of two workers, as a human resource, and assign that human resource object to the task.
Write a hard constraint rule for the task's finish time: penalize it when the finish time is later than the specified time.

Besides Kent's 2 options, there is a more complex scenario that requires:
Option 3: The auto delay to last design pattern.
Give each Task 1 or more TaskAssignment instances. Some tasks require only 1 Worker, so they have only 1 TaskAssignment. Those that require multiple workers have multiple.
2 task assignments for the same task must start and end at the same time (the workers need to work together). Delay until the last arriving worker: if worker 1 arrives at 10:00 and worker 2 arrives at 10:30, delay worker 1 until 10:30 to start the task. So if the task takes 1 hour, both workers are done at 11:30 (worker 1 will have lost half an hour, but the solver will automatically start avoiding that because of your existing soft constraints that already favor time efficiency).
If 2 task assignments for the same task have the same worker, incur a hard constraint penalty.

Related

Can this SQL operation be done without doing it row by row (RBAR)?

I have a set of tasks, with some tasks being more important than others
Each task does some work on one or more databases.
These tasks are assigned to workers that will perform the task (threads in an application poll a table).
When the worker has done the task, it sets the value back to null to signal it can accept work again.
When assigning the tasks to the workers, I would like to impose an upper limit on the number of database connections that can be used at any one time - so a task that uses a database that is currently at its limit will not be assigned to a worker.
I can get the number of database connections available by subtracting the databases of tasks that are currently assigned to workers from the database limits.
My problem is this, how do I select tasks that can run, in order of importance, based on the number of database connections available, without doing it row by row?
I'm hoping the example below illustrates my problem:
On the right are the available database connections, decreasing as we go down the list of tasks in order of importance.
If I'm selecting them in order of the importance of a task, then the connections available to the next task depend on whether the previous one was selected, which depends on whether there was space for all of its database connections.
In the case above, task 7 can run only because task 6 couldn't.
Also, task 8 can't run because task 5 took the last connection to database C, as it's a more important task.
Question:
Is there a way to work this out without using while loops and doing it row by row?
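For illustration only, here is a minimal sketch of the kind of schema and "available connections" calculation described above; every table and column name is an assumption, not taken from the original question:

-- Each task has an importance and is either unassigned or claimed by a worker.
CREATE TABLE tasks (
    task_id    int PRIMARY KEY,
    importance int NOT NULL,
    worker_id  int NULL      -- set while a worker is processing the task, reset to NULL when done
);

-- Which databases a task needs, and how many connections each database allows.
CREATE TABLE task_databases (
    task_id     int NOT NULL REFERENCES tasks(task_id),
    database_id int NOT NULL
);

CREATE TABLE database_limits (
    database_id     int PRIMARY KEY,
    max_connections int NOT NULL
);

-- Connections still available per database:
-- the limit minus the connections used by tasks currently assigned to workers.
SELECT dl.database_id,
       dl.max_connections - count(t.task_id) AS available
FROM database_limits dl
LEFT JOIN task_databases td ON td.database_id = dl.database_id
LEFT JOIN tasks t ON t.task_id = td.task_id AND t.worker_id IS NOT NULL
GROUP BY dl.database_id, dl.max_connections;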

Understanding how many parallel workers remain from a worker pool across sessions for PostgreSQL parallel queries?

Let's say I have a pool of 4 parallel workers in my PostgreSQL database configuration. I also have 2 sessions.
In session#1, the SQL is currently executing, with my planner randomly choosing to launch 2 workers for this query.
So, in session#2, how can I know that my pool of workers has decreased by 2?
You can count the parallel worker backends:
SELECT current_setting('max_parallel_workers')::integer AS max_workers,
count(*) AS active_workers
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';
Internally, Postgres keeps track of how many parallel workers are active with two variables named parallel_register_count and parallel_terminate_count. The difference between the two is the number of active parallel workers. See the comment in the source code.
Before registering a new parallel worker, this number is checked against the max_parallel_workers setting in the source code.
Unfortunately, I don't know of any direct way to expose this information to the user.
You'll see the effects of an exhausted limit in query plans. You might try EXPLAIN ANALYZE with a SELECT query on a big table that's normally parallelized. You would see fewer workers used than workers planned. The manual:
The total number of background workers that can exist at any one time
is limited by both max_worker_processes and max_parallel_workers.
Therefore, it is possible for a parallel query to run with fewer
workers than planned, or even with no workers at all.
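As a rough illustration of where those numbers show up (the table name is invented and the plan output is heavily abridged), the Gather node reports both the planned and the launched worker counts:

EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM big_table;

-- Abridged, illustrative output when the worker pool is partly exhausted:
-- Finalize Aggregate (actual time=... rows=1 loops=1)
--   ->  Gather (actual time=... rows=3 loops=1)
--         Workers Planned: 4
--         Workers Launched: 2        <- fewer than planned
--         ->  Partial Aggregate ...
--               ->  Parallel Seq Scan on big_table ...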

Scheduling optimization with multiple constraints

I am faced with scheduling N jobs, each of which has a release time, a deadline, and a value. There are two types of constraints:
a hard constraint, which means I must schedule this set of jobs such that all of them meet their deadlines; and
a soft constraint, which means I prefer to schedule jobs in decreasing order of their values.
I know that, without considering the soft constraint, there exists an optimal scheduling algorithm called EDF (earliest deadline first), which schedules the job with the earliest deadline at each step. This problem satisfies the optimal substructure and greedy choice properties, which is why the greedy algorithm is an optimal solution.
Now, the question is: is there any solution which considers the soft constraint as well as the hard one? How can we prove its optimality?
I have thought about variants of the knapsack problem and nurse scheduling algorithms, but unfortunately have not found any idea or solution yet.
It is worth noting that jobs are preemptible, which means a job can be divided and executed in one-time-unit slices.

Two relations over two tables

I have a problem with a database design I am trying to complete in a SQL Server Management Studio diagram.
I have a process table and an activity table.
A process can call multiple activities; however, an activity can only be called by one process at a time (one-to-many relation).
However, in my second scenario, an activity can also call one process at a time, and a process can thus only be called by one activity (one-to-one relation).
What is the best way to design these tables around this principle, while also tracking who called who?
Thanks in advance.
Just add ActivityID to process and ProcessID to activity; that way you have the two 1-M relationships you are describing.
Maybe call them something like CalledByProcessID (activity) and CalledByActivityID (process).
To answer
track who called who
Sounds to me like you need a third table to log calls, including ID of activity/process called, ID of activity/process that did the calling, time started, time finished etc.
Also, enforcing
an activity can only be called by one process at a time
and
an activity can also call one process at a time
could be done by adding fields to the activity table, called_by_processID and calling_processID, which would need to be updated at start of call and emptied at end of call.
Alternatively, leave that functionality in the log table and only allow that activity/process call when the corresponding previous call has finished. Slightly more complicated logic but better normalisation.
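As a rough T-SQL sketch of both suggestions (all table and column names here are illustrative, not taken from the question):

CREATE TABLE Process (
    ProcessID          int IDENTITY PRIMARY KEY,
    Name               nvarchar(100) NOT NULL,
    CalledByActivityID int NULL   -- activity currently calling this process, if any
);

CREATE TABLE Activity (
    ActivityID        int IDENTITY PRIMARY KEY,
    Name              nvarchar(100) NOT NULL,
    CalledByProcessID int NULL REFERENCES Process(ProcessID),  -- process currently calling this activity
    CallingProcessID  int NULL REFERENCES Process(ProcessID)   -- process this activity is currently calling
);

-- Added afterwards because Process and Activity reference each other.
ALTER TABLE Process
    ADD CONSTRAINT FK_Process_CalledByActivity
    FOREIGN KEY (CalledByActivityID) REFERENCES Activity(ActivityID);

-- Third table to log who called who, and when.
CREATE TABLE CallLog (
    CallLogID        int IDENTITY PRIMARY KEY,
    CallerProcessID  int NULL REFERENCES Process(ProcessID),
    CallerActivityID int NULL REFERENCES Activity(ActivityID),
    CalledProcessID  int NULL REFERENCES Process(ProcessID),
    CalledActivityID int NULL REFERENCES Activity(ActivityID),
    TimeStarted      datetime2 NOT NULL,
    TimeFinished     datetime2 NULL
);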

Rails/SQL How do I get different workers to work on different records?

I have (for argument sake) 1000 records and 10 Heroku workers running.
I want to have each worker work on a different set of records.
What I've got right now is quite good, but not quite complete.
sql = 'update products set status = 2 where id in
(select id from products where status = 1 limit 100) returning *'
records = connection.execute(sql)
This works rather well. I get 100 records and, at the same time, I make sure my other workers don't get the same 100.
If I throw it in a while loop then even if I have 20000 records and 2 workers, eventually they will all get processed.
My issue is that if there's a crash or exception, then the 100 records look like they're being processed by another worker, but they aren't.
I can't use a transaction, because the other selects will pick up the same records.
My question
What strategies do others use to have many workers working on the same dataset, but on different records?
I know this is a conversational question... I'd put it as community wiki, but I don't see that ability any more.
Building a task queue in a RDBMS is annoyingly hard. I recommend using a queueing system that's designed for the job instead.
Check out PGQ, Celery, etc.
I have used queue_classic by Heroku to schedule jobs stored in a Postgres database.
If I were to do this it would be something other than a db-side queue. It sounds like standard client processing, but what you really want is parallel processing of the result set.
The simplest solution might be to do what you are doing but lock them on the client side, and divide them between workers there (spinlocks etc). You can then commit the transaction and re-run after these have finished processing.
The difficulty is that if you have records you are processing for things that are supposed to happen outside the server, and there is a crash, you never really know which records were processed. It is probably safer to roll back, but just keep that in mind.
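A minimal sketch of that locking idea in SQL, assuming the products table and status values from the question (how the returned records are divided between workers is left to the client):

BEGIN;

-- Claim a batch and mark it in one statement; the FOR UPDATE in the subquery
-- locks the chosen rows until this transaction commits or rolls back.
UPDATE products
SET status = 2
WHERE id IN (
    SELECT id
    FROM products
    WHERE status = 1
    LIMIT 100
    FOR UPDATE
)
RETURNING *;

-- ... process the returned records on the client while the transaction is still open ...

COMMIT;
-- If the client crashes or raises an exception before COMMIT, the transaction
-- rolls back and the records return to status = 1, so another worker can pick them up.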