Files related to AIX HACMP Commands

Are there any files which store the output of the HACMP commands below?
cllsgrp
clshowres

According to this document, cluster configuration files are stored on a disk shared between the nodes.
From here:
The cluster repository disk is used as the central repository for the
cluster configuration data. The cluster repository disk must be
accessible from all nodes in the cluster and is a minimum of 10 GB in
size. Given the importance of the cluster configuration data, the
cluster repository disk should be backed up by a redundant and highly
available storage configuration.
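
As far as I know, these commands generate their output on demand from the cluster configuration rather than writing it to a report file, so if you want the output in a file you have to capture it yourself. A minimal sketch (the utility paths below are the conventional PowerHA locations; verify them on your system):

    import subprocess

    # Capture the on-demand output of the HACMP/PowerHA query commands to files.
    # These paths are the usual PowerHA locations; adjust for your install.
    commands = [
        "/usr/es/sbin/cluster/utilities/cllsgrp",
        "/usr/es/sbin/cluster/utilities/clshowres",
    ]

    for cmd in commands:
        result = subprocess.run([cmd], capture_output=True, text=True, check=True)
        name = cmd.rsplit("/", 1)[-1]
        with open(f"/tmp/{name}.out", "w") as f:
            f.write(result.stdout)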


How can I maintain a list of constant masters and workers under conf/masters and conf/workers in a managed Scaling cluster?

I am using an AWS EMR cluster with Alluxio installed on every node. I now want to deploy Alluxio in High Availability.
https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-a-HA-Cluster.html#start-an-alluxio-cluster-with-ha
I am following the above documentation, which says: "On all the Alluxio master nodes, list all the worker hostnames in the conf/workers file, and list all the masters in the conf/masters file".
My concern is that since I have an AWS-managed scaling cluster, worker nodes keep getting added and removed based on cluster load. How can I maintain a list of constant masters and workers under conf/masters and conf/workers in a managed scaling cluster?
The conf/workers and conf/masters files are only used for the initial setup through scripts. Once the cluster is running, you don't need to update them any more.
For example, in an EMR cluster you can add a new slave node as an Alluxio worker, and as long as you specify the correct Alluxio master address, the new worker will register itself and serve in the fleet like the other workers.
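
If you do want to regenerate the file anyway (e.g. at bootstrap), a hypothetical sketch using boto3 could rebuild conf/workers from the cluster's current core fleet; the region, cluster id, and Alluxio install path below are placeholders:

    import boto3

    # Hypothetical: rebuild conf/workers from the EMR cluster's running core nodes.
    emr = boto3.client("emr", region_name="us-east-1")  # placeholder region
    resp = emr.list_instances(
        ClusterId="j-XXXXXXXXXXXXX",        # placeholder cluster id
        InstanceGroupTypes=["CORE"],
        InstanceStates=["RUNNING"],
    )
    hostnames = [inst["PrivateDnsName"] for inst in resp["Instances"]]

    with open("/opt/alluxio/conf/workers", "w") as f:  # placeholder install path
        f.write("\n".join(hostnames) + "\n")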

What is the difference between: Redis Replicated setup, Redis Cluster setup, Redis Sentinel setup, and Redis with Master with Slave only? [Redisson]

I've read https://github.com/redisson/redisson
and found out that there are several setups:
Redis Replicated setup (including support of AWS ElastiCache and Azure Redis Cache)
Redis Cluster setup (including support of AWS ElastiCache Cluster and Azure Redis Cache)
Redis Sentinel setup
Redis with Master with Slave only
I am not a big expert in clusters and I don't understand the difference between these setups.
Could you briefly explain the differences?
Disclaimer: I am an AWS employee.
I do not know how Redis Replicated Setup is different from Redis in Master-Slave mode. Maybe they mean cross-region replication?
In any case, I can try and explain setups I know about:
Redis with Master with Slave only - a single-shard setup where you create a primary replica together with one or more secondary (slave) replicas. This setup is used to improve the durability of your in-memory store. It's not advised to use your secondaries for reads, because such a setup has eventual-consistency guarantees and your replica reads may be stale (depending on the replication lag).
Redis Cluster setup - the setup supported by cloud providers such as AWS ElastiCache. In this setup your workload can be spread horizontally across multiple shards, and each shard may have its own secondary replicas. Your client library must support this setup, since it requires maintaining multiple connections to several nodes at the client level. Moreover, there are some locality rules you need to follow in order to use cluster mode efficiently:
Keys with the foo{<shard>}bar notation are routed to their shard according to what is stored inside the curly braces.
You cannot use mset, mget, and other multi-key commands across shards. You can still use these commands if their keys contain the same {shard} part (see the sketch after this list).
There are additional cluster-mode admin commands exposed by Redis, but they are usually hijacked and hidden from users by cloud providers, since the providers use them to manage the Redis cluster themselves.
Redis Cluster has the ability to migrate part of your workload between shards. However, it is still obliged to preserve correctness with respect to the {shard} notation. Since your client library is responsible for fetching data from a specific shard, it must handle the "MOVED" response, by which a shard redirects it to another node.
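
A minimal sketch of the {shard} (hash tag) rules above, using the redis-py cluster client (the endpoint is a placeholder):

    from redis.cluster import RedisCluster  # redis-py >= 4.x

    rc = RedisCluster(host="my-cluster-endpoint", port=6379)  # placeholder endpoint

    # Keys sharing the same {hash tag} are hashed to the same slot, so
    # multi-key commands across them are allowed.
    rc.mset({"{user:42}:name": "Ada", "{user:42}:email": "ada@example.com"})
    print(rc.mget("{user:42}:name", "{user:42}:email"))

    # Keys with different tags may live on different shards; a multi-key
    # command over them can fail with a CROSSSLOT error:
    # rc.mget("{user:42}:name", "{user:43}:name")  # may raise on a real cluster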
Redis Sentinel setup - uses an additional server that provides service-discovery functionality for Redis clusters. It is not strictly required, and I believe it is less popular among users. It serves as a single source of truth regarding each node's health and state, and provides monitoring, management, and service-discovery functions for your Redis cluster. Many Redis client libraries can connect to Sentinel nodes in order to get automatic service discovery and a seamless failover flow. One of the reasons this setup is less popular is that cloud services like AWS ElastiCache provide it out of the box.
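
For illustration, a minimal redis-py sketch of connecting through Sentinel (the host names and the "mymaster" service name are placeholders from a typical sentinel.conf):

    from redis.sentinel import Sentinel

    # Placeholder sentinel addresses; 26379 is the conventional Sentinel port.
    sentinel = Sentinel(
        [("sentinel-1", 26379), ("sentinel-2", 26379), ("sentinel-3", 26379)],
        socket_timeout=0.5,
    )

    # "mymaster" is the monitored service name configured in sentinel.conf.
    master = sentinel.master_for("mymaster", socket_timeout=0.5)
    replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

    master.set("key", "value")   # writes always go to the current master
    print(replica.get("key"))    # reads can be served by a replica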

Redis cluster total size

I have a quick question about redis cluster.
I'm setting up a Redis cluster on Google Kubernetes Engine. I'm using the n1-highmem-2 machine type with 13 GB of RAM, but I'm slightly confused about how to calculate the total available size of the cluster.
I have 3 nodes with each 13GB ram. I'm running 6 pods (2 on each node), 1 master and 1 slave per node. This all works. I've assigned 6GB of RAM to each pod in my pod definition yaml file.
Is it correct to say that my total cluster size would be 18GB (3 masters * 6GB), or can I count the slaves size with the total size of the redis cluster?
Redis Cluster master-slave model
In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes).
So, slaves are read-only replicas of the read-write masters, kept for availability; hence your total workable size is the size of your master pods: 3 masters × 6 GB = 18 GB.
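
As a sanity check, here is a small sketch (the master addresses are placeholders) that sums the configured maxmemory of the master pods with redis-py:

    import redis

    # Placeholder master pod addresses; replace with your service endpoints.
    masters = ["redis-master-0", "redis-master-1", "redis-master-2"]

    total_bytes = 0
    for host in masters:
        r = redis.Redis(host=host, port=6379)
        # maxmemory is the per-pod limit; 0 means "no limit configured".
        total_bytes += int(r.config_get("maxmemory")["maxmemory"])

    print(f"Usable cluster capacity: {total_bytes / 2**30:.1f} GiB")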
Keep in mind, though, that leaving a master and its slave on the same Kubernetes node only protects against pod failure, not node failure, so you should consider redistributing them.
You didn't mention how you are installing Redis, but I'd like to mention the Bitnami Redis Helm Chart, as it's built for production use: it deploys 1 master and 3 slaves, providing good fault tolerance, and has tons of configuration options that are easily customized via the values.yaml file.

EMR Spark job - usage of HDFS and EBS storage

Does Spark on EMR distribute input data from Amazon S3 to the underlying HDFS?
What is the usage of the EBS volumes that are also attached to the nodes?
The root EBS volume of each node is used for the operating system and application files; it is a 10 GB volume by default. Additional volumes attached to the core nodes are used for HDFS. Task nodes may have additional volumes, but task nodes do not run HDFS DataNodes and will not store HDFS data.
From the EMR Instance Storage documentation:
Instance store and/or EBS volume storage is used for HDFS data, as well as buffers, caches, scratch data, and other temporary content that some applications may "spill" to the local file system.
Spark will store temporary data in HDFS if configured to do so. You can set properties such as spark.local.dir to control where Spark writes its scratch data.
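
For example, a minimal PySpark sketch (the paths are examples; note that on YARN, which EMR uses, the NodeManager's local dirs usually take precedence over spark.local.dir):

    from pyspark.sql import SparkSession

    # Point Spark's scratch space (shuffle spill, temp files) at EBS-backed
    # mounts; /mnt and /mnt1 are typical EMR mount points, adjust as needed.
    spark = (
        SparkSession.builder
        .appName("local-dir-example")
        .config("spark.local.dir", "/mnt/spark-scratch,/mnt1/spark-scratch")
        .getOrCreate()
    )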
Unless you are specifically writing data to HDFS, you don't need to provision large EBS volumes for core nodes. I suggest launching a cluster with what you estimate you'll need, and then adding additional core nodes as your HDFS requirements increase.
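
If you later need more HDFS capacity, a hypothetical boto3 sketch of growing the core instance group (the region and cluster id are placeholders):

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

    # Find the CORE group of the cluster, then request two more instances.
    groups = emr.list_instance_groups(ClusterId="j-XXXXXXXXXXXXX")["InstanceGroups"]
    core = next(g for g in groups if g["InstanceGroupType"] == "CORE")

    emr.modify_instance_groups(
        ClusterId="j-XXXXXXXXXXXXX",
        InstanceGroups=[{
            "InstanceGroupId": core["Id"],
            "InstanceCount": core["RequestedInstanceCount"] + 2,
        }],
    )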
Whether you ask for HDFS or not, it is always spun up by EMR. I couldn't find any documentation on why EMR spins up HDFS, but in my experience EMR first writes to HDFS as temporary storage and then copies the data to S3. Part of the root volume is used to host this HDFS, even if you didn't check the HDFS checkbox when spinning up EMR.

Jackrabbit Clustering Configuration

My application uses the standalone version of Jackrabbit, and we want to move to embedded mode so that we can cluster it.
I read the requirements on the Jackrabbit clustering page, but I'm still confused. Should I have a different home directory for each cluster node? I.e., if I need to configure two nodes, do I need ~/node1/repository.xml and ~/node2/repository.xml, or can they share the same ~/node/repository.xml?
As described in the Clustering Overview, "each cluster node needs its own (private) repository directory, including repository.xml file, workspace FileSystem and Search index." So yes: each node needs its own home directory with its own repository.xml; they cannot share one.