Is high availability really not possible with aws emr instance fleets?


In the AWS EMR management guide (https://github.com/awsdocs/amazon-emr-management-guide/blob/master/doc_source/emr-instance-fleet.md) I read:
... the master instance fleet is only a single instance ...
Does this ultimately mean I cannot provision a highly available EMR cluster with instance fleets? Or am I missing something here?

Your understanding is correct. Although AWS introduced multiple master nodes for EMR high availability in 2019, that support does not extend to instance fleets, and the official documentation you mentioned confirms it.
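To make the distinction concrete, here is a minimal sketch of the `Instances` argument one might pass to boto3's EMR `run_job_flow` call when using uniform instance groups with three master nodes instead of instance fleets. The function name, the instance types, the core node count, and the subnet ID are illustrative assumptions, not values from the question, and the sketch only builds the dictionary without calling AWS.

```python
from typing import Any, Dict


def build_ha_instances(subnet_id: str,
                       master_type: str = "m5.xlarge",
                       core_type: str = "m5.xlarge",
                       core_count: int = 2) -> Dict[str, Any]:
    """Build an `Instances` dict that uses instance groups, not fleets.

    Hypothetical helper: the defaults are placeholders chosen for
    illustration only.
    """
    return {
        # "InstanceGroups" (uniform groups) is used instead of
        # "InstanceFleets"; the master group asks for three nodes.
        "InstanceGroups": [
            {
                "Name": "Master nodes",
                "InstanceRole": "MASTER",
                "Market": "ON_DEMAND",
                "InstanceType": master_type,
                "InstanceCount": 3,
            },
            {
                "Name": "Core nodes",
                "InstanceRole": "CORE",
                "Market": "ON_DEMAND",
                "InstanceType": core_type,
                "InstanceCount": core_count,
            },
        ],
        "Ec2SubnetId": subnet_id,
        "KeepJobFlowAliveWhenNoSteps": True,
    }


if __name__ == "__main__":
    import json

    # Placeholder subnet ID; replace with a real one before use.
    print(json.dumps(build_ha_instances("subnet-0123456789abcdef0"), indent=2))
```

The resulting dictionary would be passed as `Instances=...` alongside the usual `Name`, `ReleaseLabel`, `JobFlowRole`, and `ServiceRole` arguments.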

Related

How to move data analytics into AWS?

I've installed Tiger and I have one problem that I hope you can help me solve. Suppose I install Tiger in a physical data center, using either Docker with the AIO or Kubernetes. I get it installed, connect to data sources, do the ETL, and create the LDM, metrics, insights, and dashboard KPIs. However, I then realize that we need a cloud strategy and have to move our data analytics (the on-premises Tiger) to AWS. Can I shut down the Docker image or Kubernetes and SCP it to either 1. an AWS EC2 instance or 2. AWS EKS? Can someone walk me through these steps in theory?
I assume that the data sources are not yet on AWS and that there is a VPN connection, or even AWS Direct Connect, between the on-premises data center and the customer's AWS Region.

If you are thinking about moving Tiger but not the data source, it would definitely be challenging because of latency (and also security).
That said, if a customer has a good, secure link between the public cloud and the on-premises site, it should work.
In such a case both deployments of Tiger can work fully in parallel on top of the same data source, so the migration would be almost zero-downtime.

Terraform - Cloudwatch alarm Elasticache cluster metrics

Can someone suggest how we can create alarms for an ElastiCache cluster on "CPUUtilization" and "FreeableMemory" using Terraform?
ElastiCache seems to be an exception where we are unable to get cluster-level metrics. The current workaround seems to be creating alarms at the node level.
I haven't tried the module below, but it looks like a workaround:
https://github.com/azavea/terraform-aws-redis-elasticache/blob/develop/main.tf

It is possible, and documented here.

Here is my Terraform module to create CloudWatch alerts for RDS and cache clusters at the node level.
https://bitbucket.org/rkkrishnaa/terraform/src/master/
I have added a Jenkinsfile to deploy the alerts through CI.
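As a rough illustration of the node-level workaround, the sketch below generates a Terraform JSON document (Terraform also accepts `.tf.json` files) containing one `aws_cloudwatch_metric_alarm` resource per node and metric. It is not taken from either linked module. The node IDs, thresholds, period, and the `node_alarms` helper are assumptions chosen for illustration; the metric names are the two from the question.

```python
import json
from typing import Any, Dict, List

GIB = 1024 ** 3  # FreeableMemory is reported in bytes


def node_alarms(node_ids: List[str],
                cpu_threshold: float = 80.0,
                min_free_bytes: int = GIB) -> Dict[str, Any]:
    """Return a Terraform JSON document with two alarms per cache node.

    Hypothetical helper: thresholds and periods are placeholders.
    """
    alarms: Dict[str, Any] = {}
    for node_id in node_ids:
        # Use underscores in the Terraform resource name for simplicity.
        safe = node_id.replace("-", "_")
        common = {
            "namespace": "AWS/ElastiCache",
            "dimensions": {"CacheClusterId": node_id},
            "statistic": "Average",
            "period": 300,
            "evaluation_periods": 2,
        }
        alarms[f"{safe}_cpu"] = {
            **common,
            "alarm_name": f"{node_id}-cpu-utilization",
            "metric_name": "CPUUtilization",
            "comparison_operator": "GreaterThanThreshold",
            "threshold": cpu_threshold,
        }
        alarms[f"{safe}_memory"] = {
            **common,
            "alarm_name": f"{node_id}-freeable-memory",
            "metric_name": "FreeableMemory",
            "comparison_operator": "LessThanThreshold",
            "threshold": min_free_bytes,
        }
    return {"resource": {"aws_cloudwatch_metric_alarm": alarms}}


if __name__ == "__main__":
    # Placeholder node IDs; write the output to e.g. alarms.tf.json.
    print(json.dumps(node_alarms(["my-redis-001", "my-redis-002"]), indent=2))
```

Generating the resources from a list of node IDs keeps the alarm definitions identical across nodes; the same effect can be had in native Terraform with `for_each` over the node IDs.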

EMR 5.4.0 High Availability

Does EMR 5.4.0 support HA for the ResourceManager, NameNode, and Hive? If not, is there any roadmap for it?
I am not able to find this on the EMR documentation site:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide
Please point me to any useful document you find.

As of February 2018, AWS EMR has a single point of failure at the master node.
See the EMR FAQ entry:
Q: If the master node in a cluster goes down, can Amazon EMR recover it?
If HA is a hard requirement, you would want to consider either cloud offerings from Cloudera/Hortonworks/MapR or custom installations on AWS EC2 instances.

An update for people coming to this post with EMR HA questions:
As of this writing in May 2019, HA is generally available on AWS EMR. More information is in the AWS post below.
https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-emr-announces-support-for-multiple-master-nodes-to-enable-high-availability-for-EMR-applications/
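For anyone checking a specific release such as 5.4.0, a small helper like the one below can compare a release label against the first release with multiple master node support. The 5.23.0 cutoff is my understanding of the release that shipped the feature and is not stated in this thread, so treat it as an assumption and confirm it against the announcement above; the function names are hypothetical.

```python
from typing import Tuple

# Assumed minimum release for multiple master nodes; verify against AWS docs.
MIN_MULTI_MASTER_RELEASE = (5, 23, 0)


def parse_release_label(label: str) -> Tuple[int, ...]:
    """Turn 'emr-5.4.0' (or '5.4.0') into a comparable tuple (5, 4, 0)."""
    version = label[len("emr-"):] if label.startswith("emr-") else label
    return tuple(int(part) for part in version.split("."))


def supports_multi_master(label: str) -> bool:
    """Return True if the release is at or above the assumed cutoff."""
    return parse_release_label(label) >= MIN_MULTI_MASTER_RELEASE


if __name__ == "__main__":
    for label in ("emr-5.4.0", "emr-5.23.0", "emr-6.2.0"):
        print(label, supports_multi_master(label))
```

Comparing tuples of integers avoids the string-comparison trap where "5.4.0" would sort after "5.23.0".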

minio: What is the cluster architecture of minio.io object storage server?

I have searched minio.io for hours, but it doesn't provide any good information about clustering. Does it have rings, and are the instances connected? Or is minio just for a single isolated machine, so that to run a cluster we have to run many isolated instances of it and have our app choose which instance to write to?
If yes:
When I write a file to a bucket, does minio replicate it between multiple servers?
Is it like Amazon S3 or OpenStack Swift, which support storing multiple copies of an object on different servers (and not just on multiple disks in the same machine)?

Here is the document for distributed minio: https://docs.minio.io/docs/distributed-minio-quickstart-guide

From what I can tell, minio does not support clustering with automatic replication across multiple servers, balancing, and so on.
However, the minio documentation does describe how to set up one minio server to mirror another:
https://gitlab.gioxa.com/opensource/minio/blob/1983925dcfc88d4140b40fc807414fe14d5391bd/docs/setup-replication-between-two-sites-running-minio.md

Minio has also introduced Continuous Availability and Active-Active Bucket Replication. Check out their Active-Active Replication Guide.

Kubernetes master high availability or replication configuration

Hi all, we are looking for a practical, tested guide or reference for Kubernetes master high availability, or another solution for master node failover.

There are definitely folks running Kubernetes HA masters in production following the instructions for High Availability Kubernetes Clusters. As noted at the beginning of that page, it's an advanced use case and requires in-depth knowledge of how the Kubernetes master components work.
