Hangfire batching - does each job execute in the order in which its added to the batch? - hangfire

With batching, is each job executed in the order in which its added within the batch
https://docs.hangfire.io/en/latest/background-methods/using-batches.html
Say I had 3 different jobs, but would like them to execute in order 1-2-3 and 2 would only start after 1 was complete. Does batching help in this case? I would still need to at least define the start time with a chron schedule.

Related

How changing number of workers will effect the Glue job

I have 2200000 records to process in Glue job which is leading to timeout as by default it is set to 2 days and number of workers are 10 . Increasing the number of workers will help in running the glue job faster ??
Increasing the numbers of worker will help to run the job faster, if your job has transformations that can run in parallel since you are allocating more executor nodes.
2200000 records isn't that much though, and you should check if somethings wrong with the code if it takes > 2 days.

Snakemake 200000 job submission

I have 200000 fasta sequences. I am doing GATK to call variants and created a wildcard for every sequence. Now I would like to submit 200000 jobs using snakemake. Will this cause a problem to cluster? Is there a way to submit jobs in a set of 10-20?
First off, it might take some time to calculate the DAG, but I have been told the DAG calculation recently has been greatly improved. Anyways, it might be wise to split up in batches.
Most clusters won't allow you to submit more than X jobs at the same time, usually in the range of 100-1000. I believe the documentation is not fully correct, but when using --cluster cluster I believe the --jobs argument controls the number of active/submitted jobs at the same time, so by using snakemake --jobs 20 --cluster "myclustercommand" you should be able to control this. Know that this control the number of submitted jobs, not active jobs. It might be that all your jobs are in the queue, so probably best to check in with your cluster administrator and ask what the maximum number of submitted jobs is, and get as close to that number.

Understanding Domain Class in Project Job Scheduling

I am new to optaplanner, and right now I focus on trying to understand the project job scheduling. I trying to run this examples using the sample data from optaplanner manual like in this picture below:
I have some question about the domain classes in this example :
What is the difference of GlobalResource and LocalResource? In the examples, all the resource is GlobalResource right? Then what the use of LocalResource?
There are 3 JobType: SOURCE, STANDARD, SINK, what is the meaning each one of them? It is SOURCE mean the job should be the first to start before the others? STANDARD mean it is should be run after the predecessor job finished but not after the SINK job? SINK mean it is the last job to do after all job finished?
What is the meaning of property releaseDate and criticalPathDuration in Project class? If we related it with the picture above, what is the value for project Book1 and Book2?
What is the meaning of requirement in ResourceRequirement?
I will be really thankful if someone can help me create the xml sample data like in optaplanner distribution, cause it will help me more faster to understand this example. Thanks & Regards.
A LocalResource belongs to a specific Project, a GlobalResource is shared between the projects.
So a LocalResource only has to be worry about being used by other jobs in the same Project too, while a GlobalResource has to worry about all other tasks.
That's an implementation trick. The source and sink jobs are dummy's basically. Because a project might start with multiple jobs in parallel, a SOURCE job is put in front of it, to have a single root. Same for the end: it can end with multiple, so a SINK job is put after it, to have a single tail. This makes it easier and faster to determine makespan etc.
IIRC, releaseDate is the first date we are allowed to start the first job. For example: you have to create a book, but you 'll only get the actual final content next Monday, so the releaseDate is next Monday (you can't start any work before that date).
The criticalPathDuration is a theoretical minimum duration (if we can happily ignore resources IIRC). For example: if job A takes 5 days and job B takes 2 days and B has to be done AFTER A, then the critical path duration is 7 days. Adding job C which takes 1 day and can be done in parallel with the others, don't affect that.
ResourceRequirement is the many2many relationship between ExecutionMode and Resource. Remember that ExecutionMode belongs to a specific Job. For example: doing job A in executionMode A1 requires 1 laborers and 5 days. Doing job A in executionMode A2 requires 2 laborers and 3 days.

How to reduce time allotted for a batch of HITs?

today I created a small batch of 20 categorization HITs with the name Grammatical or Ungrammatical using the web UI. Can you tell me the easiest way to manage this batch so that I can reduce its time allotted to 15 minutes from 1 hour and remove also remove the categorization of masters. This is a very simple task that's set to auto-approve within 1 hour, and I am fine with that. I just need to make it more lucrative for people to attempt this at the penny rate.
You need to register a new HITType with the relevant properties (reduced time and no masters qualification) and then perform a ChangeHITTypeOfHIT operation on all of the HITs in the batch.
API documentation here: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_ChangeHITTypeOfHITOperation.html

How do I do the Delayed::Job equivalent of Process#waitall?

I have a large task that proceeds in several major steps: Step A must complete before Step B can be started, etc. But each major step can be divided up across multiple processes, in my case, using Delayed::Job.
The question: Is there a simple technique for starting Step B only after all the processes have completed working on Step A?
Note 1: I don't know a priori how many external workers have been spun up, so keeping a reference count of completed workers won't help.
Note 2: I'd prefer not to create a worker whose sole job is to busy wait for the other jobs to complete. Heroku workers cost money!
Note 3: I've considered having each worker examine the Delayed::Job queue in the after callback to decide if it's the last one working on Step A, in which case it could initiate Step B. This could work, but seems potentially fraught with gotchas. (In the absence of better answers, this is the approach I'm going with.)
I think it really depends on the specifics of what you are doing, but you could set priority levels such that any jobs from Step A run first. Depending on the specifics, that might be enough. From the github page:
By default all jobs are scheduled with priority = 0, which is top
priority. You can change this by setting
Delayed::Worker.default_priority to something else. Lower numbers have
higher priority.
So if you set Step A to run at priority = 0, and Step B to run at priority = 100, nothing in Step B will run until Step A is complete.
There's some cases where this will be problematic -- in particular, if you have a lot of jobs and are running a lot of workers, you will probably have some workers running Step B before the work in Step A is finished. Ideally in this setup, Step B has some sort of check to make check if it can be run or not.