EMR 5.4.0 High Availability

Does EMR 5.4.0 support HA for the ResourceManager, NameNode and Hive? If not, is there a roadmap for it?
I was not able to find this on the EMR documentation site:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide
Please point me to any useful document you know of.

As of Feb 2018, AWS EMR has a single point of failure at the master node.
See the EMR FAQ, in particular:
Q: If the master node in a cluster goes down, can Amazon EMR recover it?
If HA is a necessary requirement, consider either cloud offerings from Cloudera/Hortonworks/MapR or custom installations on AWS EC2 instances.

An update for people coming to this post with EMR HA queries:
As of this writing in May 2019, HA is generally available (GA) on AWS EMR. More information is in the AWS post below.
https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-emr-announces-support-for-multiple-master-nodes-to-enable-high-availability-for-EMR-applications/
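To make the announcement a bit more concrete, here is a minimal sketch of the request you might pass to boto3's run_job_flow to ask for three master nodes. It only builds the parameters and does not call AWS. The assumptions are mine, not from the post: the helper name build_ha_cluster_request is hypothetical, the release label emr-5.23.0 is what I recall as the first release with multi-master support, the role names are the EMR defaults, and the instance types and counts are placeholders.

```python
def build_ha_cluster_request(name, subnet_id, release_label="emr-5.23.0",
                             instance_type="m5.xlarge", core_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Hypothetical helper: the defaults are illustrative placeholders.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed minimum release for multi-master
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Master nodes",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 3,  # three masters instead of one
                },
                {
                    "Name": "Core nodes",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_count,
                },
            ],
            "Ec2SubnetId": subnet_id,  # placeholder subnet ID
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
    }


if __name__ == "__main__":
    request = build_ha_cluster_request("ha-demo", "subnet-0123456789abcdef0")
    print(request["Instances"]["InstanceGroups"][0])
    # To actually launch (needs boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("emr").run_job_flow(**request)
```

Check the current EMR documentation for which applications are covered by HA on your release before relying on this.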

Related

Scheduling over different AWS Components - Glue and EMR

I was wondering how I would tackle the following on AWS, or whether it is even possible:
A transient EMR cluster for some bulk Spark processing.
When that cluster terminates, then and only then, use a Glue job to do some limited processing.
I am not convinced AWS Glue triggers will help across environments.
Or could one say: just keep going in the EMR cluster, this is not a good use case? Glue can write to SAP HANA with an appropriate connector, and loading Redshift via a Glue job with Redshift Spectrum is a common use case.

You can use the "Run a job" service integration in AWS Step Functions. Step Functions supports both EMR and Glue integrations.
Please refer to the link for details.
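As a rough illustration of that approach, the sketch below builds an Amazon States Language definition in which a Glue job starts only after the transient EMR cluster has finished its Spark step and terminated. The ".sync" resource ARNs are the Step Functions "Run a job" integrations for EMR and Glue. Everything else is an assumption for illustration: the function name build_state_machine is hypothetical, and the cluster settings, step name and job name are placeholders.

```python
import json


def build_state_machine(glue_job_name, spark_args):
    """Return a state machine definition: EMR Spark step first, Glue job after.

    Hypothetical helper: cluster parameters are illustrative placeholders.
    """
    return {
        "Comment": "Transient EMR cluster, then a Glue job",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "transient-spark-cluster",
                    "ReleaseLabel": "emr-5.23.0",
                    "Applications": [{"Name": "Spark"}],
                    "ServiceRole": "EMR_DefaultRole",
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"InstanceRole": "MASTER",
                             "InstanceType": "m5.xlarge", "InstanceCount": 1},
                            {"InstanceRole": "CORE",
                             "InstanceType": "m5.xlarge", "InstanceCount": 2},
                        ],
                    },
                },
                "ResultPath": "$.cluster",  # keeps ClusterId for later states
                "Next": "RunSparkStep",
            },
            "RunSparkStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "bulk-spark-processing",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": spark_args,
                        },
                    },
                },
                "ResultPath": "$.step",
                "Next": "TerminateCluster",
            },
            "TerminateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "ResultPath": "$.termination",
                "Next": "RunGlueJob",
            },
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_state_machine(
        "limited-processing-job",  # placeholder Glue job name
        ["spark-submit", "s3://my-bucket/jobs/bulk.py"],  # placeholder script
    )
    print(json.dumps(definition, indent=2))
```

Because each task uses the ".sync" variant, Step Functions waits for it to complete before moving on, which is what gives the "then and only then" ordering. In a real workflow you would probably also add a Catch on the Spark step so that the cluster is terminated when the step fails.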

Having spoken to Amazon on this aspect, they indicate that Airflow via MWAA is the preferred option now.

Is high availability really not possible with aws emr instance fleets?

In the AWS EMR management guide (https://github.com/awsdocs/amazon-emr-management-guide/blob/master/doc_source/emr-instance-fleet.md) I read:
... the master instance fleet is only a single instance ...
Does this ultimately mean I cannot provision a highly available EMR cluster with instance fleets? Or am I missing something here?

Your understanding is correct. Although AWS introduced multiple master nodes for EMR high availability in 2019, this does not extend to instance fleets, and the official documentation you mentioned above confirms it.

Which s3 compatible blob storage?

I want to deploy an S3-compatible blob storage in my Kubernetes cluster. I already use GlusterFS for volumes such as MongoDB, and I tried to set up MinIO with the Helm chart https://github.com/helm/charts/tree/master/stable/minio. I just realized I can't scale MinIO up easily because of erasure coding.
So I have some questions about blob storage solutions:
Is the GlusterFS blob storage service stable and reliable (https://github.com/gluster/gluster-kubernetes/tree/master/docs/examples/gluster-s3-storage-template)?
Must I use OpenShift to deploy GlusterFS blob storage, as I have read on the web? I think not, because I can see plain Kubernetes manifests in the GlusterFS repo, like this one: https://github.com/gluster/gluster-kubernetes/blob/master/deploy/kube-templates/gluster-s3-template.yaml.
Is it easy to use MinIO federation in Kubernetes? Is it easily scalable with a "helm upgrade --set replicas=X", or do I need to update the MinIO configuration manually?
As you can see, I feel lost with this S3 storage. So if you have more information or solutions, do not hesitate to share.
Thanks in advance!

Regarding reliability, you should read more about user experiences, for example:
An end user review of GlusterFS
Community Survey Feedback, 2019
Why OpenShift with GlusterFS:
For standalone Red Hat Gluster Storage, there is no component installation required to use it with OpenShift Container Platform. OpenShift Container Platform comes with a built-in GlusterFS volume driver, allowing it to make use of existing volumes on existing clusters. Note, however, that Red Hat Gluster Storage is a commercial storage software product based on Gluster.
How to deploy it in AWS
For MinIO, please follow the official docs:
ConfigMap allows injecting containers with configuration data even while a Helm release is deployed.
To update your MinIO server configuration while it is deployed in a release, you need to:
1. Check all the configurable values in the MinIO chart using helm inspect values stable/minio.
2. Override the minio_server_config settings in a YAML-formatted file, and then pass that file like this: helm upgrade -f config.yaml stable/minio.
3. Restart the MinIO server(s) for the changes to take effect.
I haven't tried it myself, but as per the documentation:
For federation, I can see additional environment variables in the values.yaml.
In addition, you should run MinIO in federated mode; see the Federation Quickstart Guide.
Here you can find the differences between Google and Amazon S3 storage,
or Cloud Storage interoperability from the gcloud perspective.
Hope this helps.

Terraform - Cloudwatch alarm Elasticache cluster metrics

Can someone suggest how to create alarms on "CPUUtilization" and "FreeableMemory" for an ElastiCache cluster using Terraform?
ElastiCache seems to be an exception where we are unable to get cluster-level metrics. The current workaround seems to be creating alarms at the node level.
I haven't tried the module below, but it looks like a workaround:
https://github.com/azavea/terraform-aws-redis-elasticache/blob/develop/main.tf

It is possible, and documented here.

Here is my Terraform module to create CloudWatch alerts for RDS and cache clusters at the node level.
https://bitbucket.org/rkkrishnaa/terraform/src/master/
I have added a Jenkinsfile to deploy the alerts through CI.
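For readers who want to see what node-level alarms look like in practice, here is a small sketch in Python rather than Terraform. It builds one CPUUtilization and one FreeableMemory alarm per cache node, in the shape accepted by boto3's CloudWatch put_metric_alarm. The keys correspond closely to the arguments of Terraform's aws_cloudwatch_metric_alarm resource (alarm_name, metric_name, namespace, dimensions and so on). The function name build_node_alarms, the thresholds, the period and the IDs in the usage example are my own illustrative assumptions, not taken from the linked modules.

```python
def build_node_alarms(cache_cluster_ids, sns_topic_arn,
                      cpu_threshold=80.0,
                      min_freeable_bytes=256 * 1024 * 1024):
    """Build put_metric_alarm keyword arguments, two alarms per cache node.

    Hypothetical helper: thresholds and periods are illustrative defaults.
    """
    alarms = []
    for cluster_id in cache_cluster_ids:
        # Settings shared by both alarms for this node.
        common = {
            "Namespace": "AWS/ElastiCache",
            "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
            "Statistic": "Average",
            "Period": 300,            # seconds per datapoint
            "EvaluationPeriods": 2,   # consecutive breaching datapoints
            "AlarmActions": [sns_topic_arn],
        }
        alarms.append({
            **common,
            "AlarmName": f"{cluster_id}-cpu-high",
            "MetricName": "CPUUtilization",
            "Threshold": cpu_threshold,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        })
        alarms.append({
            **common,
            "AlarmName": f"{cluster_id}-freeable-memory-low",
            "MetricName": "FreeableMemory",
            "Threshold": float(min_freeable_bytes),
            "ComparisonOperator": "LessThanOrEqualToThreshold",
        })
    return alarms


if __name__ == "__main__":
    # Placeholder node IDs and SNS topic ARN.
    nodes = ["my-redis-001", "my-redis-002"]
    topic = "arn:aws:sns:us-east-1:123456789012:cache-alerts"
    for alarm in build_node_alarms(nodes, topic):
        print(alarm["AlarmName"], alarm["MetricName"], alarm["Threshold"])
    # To create them (needs boto3 and AWS credentials):
    #   import boto3
    #   cw = boto3.client("cloudwatch")
    #   for alarm in build_node_alarms(nodes, topic):
    #       cw.put_metric_alarm(**alarm)
```

In Terraform the same loop would typically be expressed with count or for_each over the node IDs.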

Is there a way or tool that push cassandra data into AWS for backup purpose?

I'm working as a DevOps engineer on a Cassandra cluster and want to know whether there is a way or tool to push Cassandra data into AWS for backup purposes. My Cassandra cluster is not in AWS. I explored Netflix Priam, but as I understand it, it needs Cassandra to be hosted on AWS itself, and then it takes backups on EBS. My question is: why would I need to install a Cassandra cluster on AWS if I already have a working on-premise one? I have also read about cassandra-snapshotter and the tablesnap code on GitHub, but I don't want to use those. So, again: is there such a tool other than tablesnap, cassandra-snapshotter and Netflix Priam?
Please help.
Thanks
