APScheduler: How to get the number of running instances - apscheduler

I can get the scheduled jobs with get_jobs(), but is there any way to retrieve the currently running instances of a job?
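One possible workaround, offered only as a sketch: wrap the job function so that it counts its own concurrent executions. The names track_running and running_instances below are hypothetical helpers, not part of APScheduler, and the counter is only meaningful when jobs run in threads of the same process (it would not be shared across a process pool).

```python
import threading
from collections import Counter
from functools import wraps

# job_id -> number of instances currently executing (hypothetical bookkeeping)
_running = Counter()
_lock = threading.Lock()


def track_running(job_id):
    """Decorator that counts concurrent executions of the wrapped callable."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            with _lock:
                _running[job_id] += 1
            try:
                return func(*args, **kwargs)
            finally:
                # Always decrement, even if the job raised an exception.
                with _lock:
                    _running[job_id] -= 1
        return wrapper
    return decorator


def running_instances(job_id):
    """Return how many instances of job_id are executing right now."""
    with _lock:
        return _running[job_id]
```

The decorated function would then be passed to the scheduler's add_job as usual, and running_instances("some-id") could be polled from any other thread.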

Related

Distributed JobRunr using single data source

I want to create a scheduler using JobRunr that will run on two different servers. This scheduler will select data from a SQL database and call an API endpoint. How can I make sure that the two schedulers running on the two servers will not pick the same data from the database?
My main concern is that they should not call the API with duplicate data from the two servers.
According to the documentation, JobRunr pushes the job to the queue first, but I am wondering how one scheduler's queue will know that the same data has not already been picked by the scheduler on the other server. Is there any extra locking mechanism I need to maintain?
JobRunr will run any job only once - the locking is already done in JobRunr itself.
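To illustrate how a shared data source can guarantee that only one server picks up a job, here is a generic sketch of claiming a row with a conditional UPDATE. This is not JobRunr's actual code; the jobs table, its columns, and the claim_job helper are all hypothetical, and SQLite stands in for the shared SQL database.

```python
import sqlite3


def claim_job(conn, job_id, worker):
    """Try to claim one job; return True only if this worker won the race."""
    cur = conn.execute(
        "UPDATE jobs SET state = 'PROCESSING', owner = ? "
        "WHERE id = ? AND state = 'ENQUEUED'",
        (worker, job_id),
    )
    conn.commit()
    # The WHERE clause matches only while the job is still unclaimed,
    # so at most one caller ever sees rowcount == 1.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, owner TEXT)")
conn.execute("INSERT INTO jobs (id, state) VALUES (1, 'ENQUEUED')")
conn.commit()

first = claim_job(conn, 1, "server-a")   # claims the job
second = claim_job(conn, 1, "server-b")  # finds it already taken
```

Because the state check and the state change happen in one statement, the database itself arbitrates between the two servers and no separate lock table is needed in this sketch.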

Hazelcast cluster member crash results in losing all scheduled tasks

We are running 4 instances of our Java application in a Hazelcast cluster. We scheduled around 2000 tasks using the scheduled executor service's schedule method. Hazelcast partitions these 2000 tasks across the 4 instances. When one of the cluster members crashes for some reason, all the tasks assigned to the partitions owned by the crashed node are lost; the other 3 cluster members complete their assigned tasks.
How can we overcome this problem and avoid losing tasks?
Try using the Durable Executor
It is probably also a good idea to find out why the process crashed in the first place.

resource management on spark jobs on Yarn and spark shell jobs

Our company has a 9-node cluster on Cloudera.
We have 41 long-running Spark streaming jobs [YARN + cluster mode] and some regular spark shell jobs scheduled to run at 1pm daily.
All jobs are currently submitted under the user A role [with root permission].
The issue I encountered is that while all 41 Spark streaming jobs are running, my scheduled jobs are not able to obtain the resources they need to run.
I have tried the YARN fair scheduler, but the scheduled jobs still do not run.
We expect the Spark streaming jobs to always be running, but to reduce the resources they occupy whenever other scheduled jobs start.
Please feel free to share your suggestions or possible solutions.
Your spark streaming jobs are consuming too many resources for your scheduled jobs to get started. This is either because they're always scaled to a point that there aren't enough resources left for scheduled jobs or they aren't scaling back.
For the case where the streaming jobs aren't scaling back you could check whether you have dynamic resource allocation enabled for your streaming jobs. One way of checking is via the spark shell using spark.sparkContext.getConf.get("spark.streaming.dynamicAllocation.enabled"). If dynamic allocation is enabled then you could look at reducing the minimum resources for those jobs.

How to detect APScheduler's running jobs?

I have some recurring jobs that run frequently or last for a while.
It seems that Scheduler().get_jobs() only returns the list of scheduled jobs that are not currently running, so I cannot determine whether a job with a certain id does not exist or is actually running.
How can I test whether a job is running in this situation?
(I do not set up those jobs in the usual way, because I need them to run at a random interval rather than a fixed one. Each job executes only once but adds a new job with the same id at the end of its execution, and they stop doing so when a certain threshold is reached.)
APScheduler does not filter the list of jobs for get_jobs() in any way. If you need random scheduling, why not implement that in a custom trigger instead of constantly re-adding the job?
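As a rough sketch of what such a trigger's core logic could look like: the class below is plain Python and its name, RandomIntervalTrigger, is made up for illustration. It follows the get_next_fire_time(previous_fire_time, now) method shape that APScheduler 3.x triggers use; to plug it into a scheduler it would presumably need to subclass apscheduler.triggers.base.BaseTrigger, which is assumed here rather than shown.

```python
import random
from datetime import datetime, timedelta


class RandomIntervalTrigger:
    """Hypothetical trigger: next run is a random delay after the previous one."""

    def __init__(self, min_seconds, max_seconds, rng=None):
        if min_seconds > max_seconds:
            raise ValueError("min_seconds must not exceed max_seconds")
        self.min_seconds = min_seconds
        self.max_seconds = max_seconds
        # An injectable random source keeps the behaviour reproducible in tests.
        self.rng = rng or random.Random()

    def get_next_fire_time(self, previous_fire_time, now):
        # On the first call there is no previous run, so measure from "now".
        base = previous_fire_time if previous_fire_time is not None else now
        delay = self.rng.uniform(self.min_seconds, self.max_seconds)
        return base + timedelta(seconds=delay)
```

With a trigger like this the job keeps a single id for its whole lifetime, so there is no window in which the job has been removed but not yet re-added.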

Can we add more instances to an existing Amazon Elastic MapReduce job flow?

I am new to Amazon services and am facing some issues.
Suppose I am running a job flow on Amazon Elastic MapReduce with a total of 3 instances. While it was running, I found that my job was taking too long to execute. In such a case I need to add more instances to the job flow so that the job executes faster.
My question is: how can I add instances to an existing job flow? Terminating the existing instances and creating a new, larger set of instances is time consuming.
Is there any way to do it? If yes, please advise.
I am doing all of this through the CLI, so please share the answers with commands as well as GUI steps in the AWS Management Console.
Thanks.
Yes, you can do this with the command line tool.
To add more instances to the core group:
elastic-mapreduce --modify-instance-group CORE --instance-count 40
To create a task group (no datanodes), with 40 instances:
elastic-mapreduce --add-instance-group TASK --instance-count 40 --instance-type c1.medium
It's important to note that the number of instances in the CORE instance group cannot be reduced, since they participate as data nodes. It can only be increased.
TASK instances only do processing and can be increased and reduced.