Scheduling a task in a Distributed Environment like Kubernetes - apscheduler

I have a FastAPI-based REST application where I need some scheduled tasks that run every 30 minutes. The application runs on Kubernetes, so the number of instances is not fixed. I want a scheduled job to trigger from only one of the available instances, not from all running instances (which would create a race condition), so I need some kind of locking mechanism that prevents a scheduler from firing if one is already running. The app connects to a MySQL-compatible Aurora DB running on AWS. Can I achieve this with APScheduler, and if not, are there any alternatives available?
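One pattern that fits the setup described above, offered only as a sketch rather than a definitive answer: let every instance schedule the job with APScheduler, but have the job body first try to take a MySQL named lock via GET_LOCK(), so only the instance that wins the lock actually does the work. Everything below, including DB_URL, LOCK_NAME and do_scheduled_work, is a placeholder, and SQLAlchemy is assumed for the database access:

```python
# Sketch: run an APScheduler job on only one instance at a time by taking a
# MySQL named lock (GET_LOCK) before doing the work. Assumes SQLAlchemy and a
# MySQL-compatible Aurora endpoint; DB_URL and do_scheduled_work are placeholders.
from apscheduler.schedulers.background import BackgroundScheduler
from sqlalchemy import create_engine, text

DB_URL = "mysql+pymysql://user:password@aurora-endpoint/mydb"  # placeholder
engine = create_engine(DB_URL, pool_pre_ping=True)

LOCK_NAME = "my_app.every_30_min_job"

def do_scheduled_work():
    print("running the 30-minute task")  # placeholder for the real task

def run_if_lock_acquired():
    with engine.connect() as conn:
        # GET_LOCK returns 1 if this session acquired the lock, 0 otherwise.
        got_lock = conn.execute(
            text("SELECT GET_LOCK(:name, 0)"), {"name": LOCK_NAME}
        ).scalar()
        if got_lock == 1:
            try:
                do_scheduled_work()
            finally:
                conn.execute(text("SELECT RELEASE_LOCK(:name)"), {"name": LOCK_NAME})
        # If got_lock is 0, another instance is already running the job.

scheduler = BackgroundScheduler()
scheduler.add_job(run_if_lock_acquired, "interval", minutes=30)
scheduler.start()
```

Note that GET_LOCK() only prevents overlapping runs; if instances fire at slightly different times, each could still run the job once per interval, so a "last run" timestamp stored in the database and checked while holding the lock may also be needed.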

Related

Manage in-memory cache in multiple servers in AWS

Once or twice a day some files are uploaded to an S3 bucket, and on every S3 upload I want the in-memory data of each server to be refreshed with the uploaded data.
Note that there are multiple servers running and I want to store the same data on all of them. The servers also scale with traffic (new servers come up and older ones go down on scale events, so the set of instances is not always the same), and I want the cache on every server to stay up to date.
I want to build an architecture that supports auto-scaling of the servers. I came across the fan-out architecture of AWS, using SNS with multiple SQS queues from which the different servers can poll.
How can we handle the auto-scaling of the queues with respect to the servers?
Or is there another way to handle this scenario?
PS: I'm totally new to the AWS environment.
Any reference would be a great help.
To me there are a few things that you need to have to make this work. These are opinions and, as with most architectural designs, there is certainly more than one way to handle this.
I start with the assumption that you've got an application running on EC2 in some form (Elastic Beanstalk, Fargate, raw EC2 instances with auto scaling, etc.) and that you've already solved having the application installed and configured when a scale-up event occurs.
Conceptually the flow would be: S3 bucket → SNS topic → one SQS queue per instance → application cache refresh.
The setup involves having the S3 bucket publish events, most likely s3:ObjectCreated, to the SNS topic. These events are published when an object in the bucket is created or updated.
Next:
During startup your application will pull the current data from S3.
As part of application startup, create a queue named after the instance id of the EC2 (see here for some examples); the queue needs to subscribe to the SNS topic. If the queue already exists, that's not an error. (A sketch of this appears below.)
Your application would have a background thread or process that polls the SQS queue for messages.
If you get a message on the queue then that needs to tell the application to refresh the cache from S3.
When an instance is shut down, Elastic Beanstalk and the load balancers (at least) emit an event that the instance is going away. Remove the SQS queue tied to that instance at that time.
The only issue might be that a hard crash of an environment would leave orphan queues. It may be advisable to either manually clean these up or have a periodic task clean them up.
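As a rough illustration of the queue-creation and polling steps above, here is a minimal boto3 sketch. The topic ARN, queue naming scheme, instance id and refresh_cache_from_s3 are all placeholders, and the exact access policy you need may differ:

```python
# Sketch of the per-instance queue idea: create an SQS queue named after the
# EC2 instance id, subscribe it to the SNS topic, then poll it in a background
# loop and refresh the cache when a message arrives. TOPIC_ARN, INSTANCE_ID and
# refresh_cache_from_s3 are placeholders.
import json
import boto3

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:s3-upload-events"  # placeholder
INSTANCE_ID = "i-0123456789abcdef0"  # normally read from instance metadata

sqs = boto3.client("sqs")
sns = boto3.client("sns")

def refresh_cache_from_s3():
    print("reloading in-memory cache from S3")  # placeholder

def setup_queue():
    # create_queue is idempotent for identical attributes, so re-running it on
    # startup is not an error.
    queue_url = sqs.create_queue(QueueName=f"cache-refresh-{INSTANCE_ID}")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    # Allow the SNS topic to send messages to this queue.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": TOPIC_ARN}},
        }],
    }
    sqs.set_queue_attributes(QueueUrl=queue_url,
                             Attributes={"Policy": json.dumps(policy)})
    sns.subscribe(TopicArn=TOPIC_ARN, Protocol="sqs", Endpoint=queue_arn)
    return queue_url

def poll_forever(queue_url):
    # Run this in a background thread or process.
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)  # long polling
        for msg in resp.get("Messages", []):
            refresh_cache_from_s3()
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
```

The per-instance queue name is what gives each server its own copy of every SNS notification, which is the fan-out behaviour the question is after.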

Flink on yarn use yarn-session or not?

There are two ways to deploy Flink applications on YARN. The first is to use a yarn-session, with all Flink applications deployed into that session. The second is to deploy each Flink application on YARN as its own YARN application.
My question is: what is the difference between these two methods, and which one should be chosen in a production environment?
I can't find any material about this.
I think the first method saves resources, since only one JobManager (the YARN application master) is needed. But that is also its disadvantage, since the single JobManager can become a bottleneck as the number of Flink applications grows.
Both modes have their uses in production environments.
Session mode generally makes sense when you will be running a bunch of short-lived jobs, and want to avoid the overhead of starting up a cluster for each one. On the other hand, there are security implications, as any credentials available to any of the jobs will be accessible to all of the jobs. Cluster-per-job mode may use more resources overall, but is, in some sense, more straightforward.

Distributed job management system

I'm using beeQueue for video transcoding job scheduling and processing
For now everything is fine, but I'm facing the challenge of working in a distributed environment, e.g. auto-scaling Amazon instances to add more workers to process the jobs pending in the queue. We scale well, but I need to make the system fail-safe: if an instance whose workers were processing a job shuts down and we get no job status or events, the job that was running on that instance falls into a black hole and can't be recovered and processed again.
What I did:
I'm looking for a ready-made solution that works fail-safe in a distributed environment.
Thanks
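For illustration only, here is a minimal sketch of the kind of fail-safe hand-off described above, written with redis-py (Bee-Queue is Redis-backed, but this is a generic pattern, not Bee-Queue's own API); the key names, WORKER_ID and process_job are placeholders:

```python
# Sketch of a fail-safe hand-off: each worker moves a job from the shared
# pending list to its own in-progress list (RPOPLPUSH), and a periodic reaper
# re-queues jobs from workers whose heartbeat has expired.
import redis

r = redis.Redis()
WORKER_ID = "worker-1"  # e.g. the EC2 instance id
PENDING = "jobs:pending"
IN_PROGRESS = f"jobs:in_progress:{WORKER_ID}"
HEARTBEAT_TTL = 60  # seconds

def work_loop(process_job):
    while True:
        r.set(f"heartbeat:{WORKER_ID}", "alive", ex=HEARTBEAT_TTL)
        job = r.brpoplpush(PENDING, IN_PROGRESS, timeout=5)  # atomic hand-off
        if job is None:
            continue
        # NOTE: for jobs longer than HEARTBEAT_TTL, refresh the heartbeat
        # from within process_job as well.
        process_job(job)
        r.lrem(IN_PROGRESS, 1, job)  # remove only once fully processed

def reap_dead_workers():
    # Re-queue jobs held by workers whose heartbeat key has expired.
    for key in r.scan_iter("jobs:in_progress:*"):
        worker = key.decode().rsplit(":", 1)[-1]
        if not r.exists(f"heartbeat:{worker}"):
            while r.rpoplpush(key, PENDING):
                pass
```

This gives at-least-once processing: a job held by a worker whose heartbeat has expired is pushed back onto the pending list, so jobs must be safe to re-run or be deduplicated downstream.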

Multiple Mobilefirst-Server artifacts concurrent deploy

I use a batch procedure for deploying MFP v7 artifacts (wlapps and adapters).
The procedure is based on the standard ant tasks defined in worklight-ant-deployer.jar.
The MFP environment runs on a WAS cell and consists of a single AdminService application managing multiple WLRuntimes.
Is it possible to run two (or more) deploy tasks concurrently against different WLRuntime targets?
Furthermore, sticking to a single WLRuntime, is it possible to deploy multiple different artifacts concurrently?
Thanks in advance for any answer/comment.
Ciao, Stefano.
For a single WL runtime, all deployments are done sequentially internally. You can start the deployments concurrently, but internally only one deployment is executed after the other, due to a transaction locking mechanism. If you start too many deployments in parallel, you may run into timeout situations, although this is rare. By default, a deployment transaction waits for 20 minutes before it may time out.
Note: starting deployments in parallel here means using the ant tasks, the wladm tool, or the REST service directly. In the MobileFirst Admin Console UI, the deploy buttons are disabled while another deployment transaction is ongoing, so it is not as easy to start deployments in parallel from the UI; the UI tries to prevent that.
Note 2: the 20 minutes mentioned above is for the locking mechanism itself. Ant/wladm has its own timeout parameters, which may be lower, so with ant tasks you might hit timeouts sooner than 20 minutes. See here.
For multiple WL runtimes, deployments can run concurrently. The locking mechanism is per runtime, so deployments in one WL runtime will not influence any other WL runtime.

Amazon EMR managing my spark cluster

I have a Spark setup on Amazon EC2 machines with 2 worker machines running. It reads data from Cassandra, does some processing, and writes to SQL Server. I have heard about Amazon EMR and read about it. I want a managed system where worker machines are automatically added to my cluster if my job is taking more time, and shut down when my job completes.
Can I achieve this through Amazon EMR?
The requirements are:
Worker machines are automatically added to my cluster if my job is taking more time.
The cluster shuts down when my job completes.
No. 2 is definitely possible if your job is launched from the steps. There is an option that auto-terminates the cluster after the last step completes. Alternatively, this can also be done programmatically with the SDK.
No. 1 is a little more difficult, but EMR has three classes of nodes: master, core, and task. Task nodes can be added after cluster creation. The trigger for that would probably have to be programmatic, or use another Amazon service such as Lambda.
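For reference, a rough boto3 sketch of both points, assuming the Spark job is submitted as an EMR step; the release label, instance types and counts, roles, and the S3 script path are placeholders:

```python
# Sketch: (2) let the cluster auto-terminate after its last step by setting
# KeepJobFlowAliveWhenNoSteps=False, and (1) add task nodes later (e.g. from a
# Lambda or a monitoring script) with add_instance_groups. All names, types and
# the S3 script path are placeholders.
import boto3

emr = boto3.client("emr")

# (2) Cluster that runs a Spark step and shuts down when the step finishes.
cluster = emr.run_job_flow(
    Name="spark-cassandra-to-sqlserver",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # auto-terminate after last step
    },
    Steps=[{
        "Name": "spark-job",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/jobs/etl.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

# (1) Later, if the job is running long, grow the cluster with task nodes.
emr.add_instance_groups(
    JobFlowId=cluster["JobFlowId"],
    InstanceGroups=[{"Name": "extra-task-nodes", "InstanceRole": "TASK",
                     "InstanceType": "m5.xlarge", "InstanceCount": 4}],
)
```

Task nodes hold no HDFS data, which is why they are the class of node that can be safely added and removed while a job is running.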