I have a Jenkins server and two AWS ElastiCache Redis instances.
Occasionally the development team needs to issue FLUSHALL, and they would like to do so from Jenkins so they don't have to hunt down a system administrator or poke around in a shell.
Ideally I'd use the AWS CLI, but I don't see anything in the AWS CLI toolset that can do this.
Is there a way to execute FLUSHALL from a shell script?
Thanks
You might want to look at using Lambda for this in an AWS environment. This can be a starting point: https://docs.aws.amazon.com/lambda/latest/dg/vpc-ec.html
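As a rough sketch (not a drop-in implementation): the function would live in the same VPC as the cluster, bundle the redis-py package, and read the endpoint from an environment variable I'm calling REDIS_HOST here. Jenkins then only needs to call aws lambda invoke from a shell step.

    # Sketch of a Lambda handler that issues FLUSHALL against ElastiCache Redis.
    # Assumes the redis-py package is bundled with the function and that
    # REDIS_HOST holds the cluster's primary endpoint (both are assumptions).
    import os
    import redis

    def lambda_handler(event, context):
        client = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
        client.flushall()
        return {"flushed": True}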
Dask-jobqueue seems to be a very nice solution for distributing jobs to PBS/Slurm-managed clusters. However, if I'm understanding its use correctly, you must create the PBSCluster/SLURMCluster instance on the head/login node. Then, on that same node, you can create a client instance and start submitting jobs.
What I'd like to do is let jobs originate on a remote machine, be sent over SSH to the cluster head node, and then get submitted to dask-jobqueue. I see that Dask has support for sending jobs over SSH to a distributed.deploy.ssh.SSHCluster, but this seems to be designed for immediate execution after the SSH hop, as opposed to taking the further step of putting the work into the job queue.
To summarize, I'd like a workflow where jobs go remote --ssh--> cluster-head --slurm/jobqueue--> cluster-node. Is this possible with existing tools?
I am currently looking into this. My idea is to set up an SSH tunnel with paramiko and then use Pyro5 to communicate with the cluster object from my local machine.
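I haven't got the Pyro5 part working yet, but the tunnelling half can be sketched without it: assuming a SLURMCluster (and so a Dask scheduler) is already running on the head node on port 8786, you can forward that port over SSH (here with the sshtunnel helper, which is built on paramiko) and point a local distributed.Client at it. The host name, user and port below are placeholders, and this is only a sketch, not a production setup.

    # Sketch: forward the Dask scheduler port from the cluster head node and
    # drive it from a local Client. Host, user and port are placeholders.
    from dask.distributed import Client
    from sshtunnel import SSHTunnelForwarder  # pip install sshtunnel (wraps paramiko)

    with SSHTunnelForwarder(
        "cluster-head.example.org",               # head/login node (placeholder)
        ssh_username="myuser",                    # placeholder
        remote_bind_address=("127.0.0.1", 8786),  # scheduler started by SLURMCluster
        local_bind_address=("127.0.0.1", 8786),
    ):
        client = Client("tcp://127.0.0.1:8786")
        print(client.submit(sum, [1, 2, 3]).result())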
What would be the best solution to transfer files between S3 and an EC2 instance using Airflow?
After some research I found there is an s3_to_sftp_operator, but I know it's good practice to execute tasks on the external systems rather than on the Airflow instance...
I'm thinking about running a BashOperator that executes the AWS CLI on the remote EC2 instance, since that respects the principle above.
Do you have any production best practices to share for this case?
The s3_to_sftp_operator is going to be the better choice unless the files are large. Only if the files are large would I consider a BashOperator with an SSH onto a remote machine. As for what "large" means, I would just test with the s3_to_sftp_operator, and if the performance of everything else on Airflow isn't meaningfully impacted, stay with it. I'm regularly downloading and opening ~1 GiB files with PythonOperators in Airflow on 2 vCPU Airflow nodes with 8 GiB RAM. It doesn't make sense to do anything more complex for files that small.
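For reference, a minimal DAG sketch with that operator could look like the following. The import path and connection ID depend on your Airflow version and provider package (in Airflow 1.10 it lives under airflow.contrib.operators.s3_to_sftp_operator instead), and the bucket, key and paths are placeholders.

    # Minimal sketch: copy one file from S3 to the EC2 box over SFTP.
    # Connection ID, bucket, key and path are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.amazon.aws.transfers.s3_to_sftp import S3ToSFTPOperator

    with DAG(
        dag_id="s3_to_ec2_transfer",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # trigger manually
        catchup=False,
    ) as dag:
        S3ToSFTPOperator(
            task_id="copy_file",
            s3_bucket="my-bucket",
            s3_key="exports/data.csv",
            sftp_path="/home/ec2-user/data.csv",
            sftp_conn_id="ssh_ec2",  # SSH/SFTP connection pointing at the EC2 host
        )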
The best solution would be not to transfer the files at all, and most likely to get rid of the EC2 instance while you're at it.
If you have a task that needs to run on some data in S3, then just run that task directly in Airflow.
If you can't run that task in Airflow because it needs vast power or some weird code that Airflow won't run, then have the EC2 instance read S3 directly.
If you're using Airflow to orchestrate the task because the task is watching the local filesystem on the EC2 instance, then just trigger the task and have the task read S3.
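To make the "read S3 directly" point concrete, the EC2-side task can pull its own input with boto3 instead of having Airflow ship the file over. The bucket and key below are placeholders, and credentials are assumed to come from the instance role.

    # Sketch: the EC2-side task fetches its input straight from S3.
    # Bucket and key are placeholders; credentials come from the instance role.
    import boto3

    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "exports/data.csv", "/tmp/data.csv")
    # ... run the actual processing on /tmp/data.csv here ...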
I work as a Cassandra cluster DevOps engineer and I'd like to know whether there is a way, or a tool, to push Cassandra data into AWS for backup purposes. My Cassandra cluster is not in AWS. I explored Netflix Priam, but as I understand it, it requires Cassandra to be hosted on AWS itself, and it then takes backups onto EBS. My question is: why would I need to install a Cassandra cluster on AWS if I already have a working on-premise cluster? I have also read about cassandra-snapshotter and the tablesnap code on GitHub, but I don't want to use those. So, again: is there such a tool, other than tablesnap, cassandra-snapshotter, and Netflix Priam?
Please help.
Thanks
We have ETL jobs, i.e. a Java JAR (which performs the ETL operations) that is run via a shell script. The shell script is passed parameters depending on the job being run. These shell scripts are run via crontab as well as manually, depending on requirements. Sometimes there is also a need to run some SQL commands/scripts on the PostgreSQL RDS database before the shell script runs.
We have everything on AWS, i.e. an EC2 Talend server, PostgreSQL RDS, Redshift, Ansible, etc.
How can we automate this process? How do we deploy it and handle passing custom parameters, etc.? Pointers are welcome.
I would prefer to go with AWS Data Pipeline and add steps to perform any pre/post operations around your ETL job, such as running shell scripts or any HQL, etc.
AWS Glue runs on the Spark engine, and it has other features as well, such as the AWS Glue development endpoint, crawlers, the Data Catalog, and job schedulers. I think AWS Glue would be ideal if you are starting afresh or plan to move your ETL to AWS Glue. Please refer here for a price comparison.
AWS Data Pipeline: for details on AWS Data Pipeline
AWS Glue FAQ: for details on the languages supported by AWS Glue
Please note, according to the AWS Glue FAQ:
Q: What programming language can I use to write my ETL code for AWS Glue?
You can use either Scala or Python.
Edit: As Jon Scott commented, Apache Airflow is another option for job scheduling, but I have not used it.
You can use AWS Glue for serverless ETL. Glue also has triggers, which let you automate your jobs.
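On the "passing custom parameters" part of the question: a Glue job run can also be started programmatically and handed job arguments, for example with boto3. The job name and argument keys below are placeholders.

    # Sketch: start a Glue job and pass custom parameters as job arguments.
    # Job name and argument keys are placeholders.
    import boto3

    glue = boto3.client("glue")
    run = glue.start_job_run(
        JobName="my-etl-job",
        Arguments={
            "--run_date": "2020-01-31",
            "--source_table": "orders",
        },
    )
    print(run["JobRunId"])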
I am working on a Ruby on Rails app with MongoDB. My app is deployed on Heroku, and for delayed jobs I am using Amazon EC2. The things I have doubts about:
1) How do I connect from the app on Heroku to the MongoDB database running on Amazon EC2?
2) When I run delayed jobs, how will they get to the Amazon server, and what changes do I have to make to the app? If somebody can point me to a tutorial for this, that would be great.
If you want to make your EC2 instance visible to your application on Heroku, you need to grant Heroku's AWS security group access to your instance's security group on Amazon. There are instructions in Heroku's documentation that explain how to connect to external services like this.
https://devcenter.heroku.com/articles/dynos#connecting-to-external-services
In the case of MongoDB running on its default port, you'd want to do something like this:
$ ec2-authorize YOURGROUP -P tcp -p 27017 -u 098166147350 -o default
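That ec2-authorize command comes from the legacy EC2 API tools; if you are scripting this today, a rough boto3 equivalent (reusing the group name and account ID from the command above, which you would substitute with your own values) might look like:

    # Sketch: boto3 equivalent of the ec2-authorize call above.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupName="YOURGROUP",
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 27017,
            "ToPort": 27017,
            "UserIdGroupPairs": [{"GroupName": "default", "UserId": "098166147350"}],
        }],
    )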
As for how to handle your delayed jobs running remotely on the EC2 instance, you might find this article from the Artsy engineering team helpful. It sounds like they developed a fairly similar setup.
http://artsy.github.io/blog/2012/01/31/beyond-heroku-satellite-delayed-job-workers-on-ec2/