Scaled-up Ignite nodes are not participating in the existing topology

Our Ignite cluster was running with 3 replicas and 80 GB of persistent data storage, with the backup count set to one. Each node had 3 CPU cores and 6 GB of RAM.
Half of the caches are created in PARTITIONED mode and are collocated; one of these caches holds 70% of the data. The remaining caches are REPLICATED.
We scaled the Ignite cluster up to 5 nodes with 10 GB of RAM each. All nodes were up, but the newly created nodes were not participating in the existing topology.
We observed that one node was using much more CPU and RAM while the other nodes were using very little.
Can you help us resolve this?

When you add nodes to a cluster with persistence enabled, the new nodes will not store data until you adjust the "baseline topology" to include them. There are multiple ways to do this, but the easiest is the control.sh script; running ./control.sh --baseline with no arguments first will show the current baseline along with the consistent IDs of the new nodes:
./control.sh --baseline add consistentId1,consistentId2 --yes

Related

Dask Yarn failed to allocate the requested number of workers

We have a CDH cluster (version 5.14.4) with 6 worker servers and a total of 384 vcores (64 cores per server).
We are running some ETL processes using dask version 2.8.1 and dask-yarn version 0.8 with skein 0.8.
Currently we are having a problem allocating the maximum number of workers.
We are not able to run a job with more than 18 workers! (We can see the actual number of workers in the dask dashboard.)
The definition of the cluster is as follows:
cluster = YarnCluster(
    environment='path/to/my/env.tar.gz',
    n_workers=24,
    worker_vcores=4,
    worker_memory='64GB',
)
Even when increasing the number of workers to 50, nothing changes, although when changing worker_vcores or worker_memory we can see the changes in the dashboard.
Any suggestions?
Update
Following #jcrist's answer I realized that I didn't fully understand the terminology connecting the Yarn web UI application dashboard and the YarnCluster parameters.
From my understanding:
A Yarn container is equal to a dask worker.
Whenever a Yarn cluster is generated, there are 2 additional workers/containers running (one for a scheduler and one for a logger, each with 1 vCore).
The limitation between n_workers * worker_vcores vs. n_workers * worker_memory is something I still need to fully grok.
There is another issue - while optimizing I tried using cluster.adapt(). The cluster was running with 10 workers, each with 10 threads and a limit of 100GB, but the Yarn web UI displayed only 2 containers running (my cluster has 384 vCores and 1.9TB, so there is still plenty of room to expand). Probably worth opening a different question.
There are many reasons why a job may be denied more containers. Do you have enough memory across your cluster to allocate that many 64 GiB chunks? Further, does 64 GiB tile evenly across your cluster nodes? Is your YARN cluster configured to allow jobs that large in this queue? Are there competing jobs that are also taking resources?
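As a quick sanity check, here is a back-of-the-envelope sketch of that container-fitting arithmetic. The per-node memory figure below is an assumption (substitute whatever your NodeManagers actually advertise to YARN); note that if each node exposed, say, 192 GiB, only three 64 GiB containers would fit per node, i.e. 18 across 6 nodes, which would match the ceiling you are seeing.
# Hypothetical container-fitting check; node_memory_gib is an assumption.
node_vcores = 64          # from the question: 64 cores per server
node_memory_gib = 192     # ASSUMED YARN memory per node - use your real value
worker_vcores = 4
worker_memory_gib = 64
nodes = 6
per_node_by_cpu = node_vcores // worker_vcores           # 16 workers by CPU
per_node_by_mem = node_memory_gib // worker_memory_gib   # 3 workers by memory
workers_per_node = min(per_node_by_cpu, per_node_by_mem)
print(f"max workers cluster-wide: {workers_per_node * nodes}")  # 18 here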
You can see the status of all containers using the ApplicationClient.get_containers method.
>>> cluster.application_client.get_containers()
You could filter on state REQUESTED to see just the pending containers:
>>> cluster.application_client.get_containers(states=['REQUESTED'])
This should give you some insight as to what's been requested but not yet allocated.
If you suspect a bug in dask-yarn, feel free to file an issue (including logs from the application master for a problematic run), but I suspect this is more an issue with the size of containers you're requesting, and how your queue is configured/currently used.

Redis sizing for Production

I have a few queries regarding a Redis Cluster setup:
1. Does Redis support cross-site replication? When we start a Redis cluster, can we decide what will be the slave of each instance?
2. I need to store around 11 billion keys with full persistence and fault tolerance. How many master-slave pairs should I start with? I have a high TPS requirement for both reads and writes.
Please suggest.
Your data is important to you, and you don't want to lose any of it during failover/DR scenarios.
You need to create a Redis cluster.
A Redis cluster needs at least 3 master nodes, which share the 16384 hash slots among themselves.
You then create as many slave nodes as you need to replicate the data from the master nodes (choose the right number of slaves for your fault-tolerance requirements yourself).
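To illustrate the slot mechanics mentioned above, here is a minimal Python sketch of how Redis Cluster maps a key to one of the 16384 hash slots (it ignores hash tags like {user}, which the real implementation honors):
# CRC-16/XMODEM, the checksum Redis Cluster uses for key hashing
def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # every key lands in one of 16384 slots; each master owns a slot range
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1001"))  # prints a slot number in 0..16383
Each master owns a contiguous range of these slots, so adding masters means moving slot ranges between nodes rather than rehashing every key.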

Setting up a Hadoop Cluster on Amazon Web Services with EBS

I was wondering how I could set up a Hadoop cluster (say, 5 nodes) through AWS. I know how to create the cluster on EC2, but I don't know how to face the following challenges.
What happens if I lose my spot instances? How do I keep the cluster going?
I am working with datasets of size 1 TB. Would it be possible to set up EBS accordingly? How can I access HDFS in this scenario?
Any help will be great!
Depending on your requirements, these suggestions would change. However, assuming a 2-master and 3-worker setup, you can probably use r3 instances for the master nodes, as they are optimized for memory-intensive applications, and go for d2 instances for the worker nodes. d2 instances have multiple local disks and can thus withstand some disk failures while still keeping your data safe.
To answer your specific questions:
Treat Hadoop machines like any Linux application. What would happen if your general CentOS spot instances were lost? Hence, it is generally advised to use reserved instances.
Hadoop typically stores data by maintaining 3 copies of every block and distributing them across the worker nodes in 128 MB or 256 MB blocks. So you will have 3 TB of data to store across the three worker nodes. Obviously, you have to consider some overhead while calculating space requirements; a rough estimate is sketched below.
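A quick back-of-the-envelope calculation (the 25% headroom figure is an assumption, not a Hadoop default):
# Rough HDFS capacity estimate for the 1 TB dataset from the question.
raw_data_tb = 1.0     # dataset size
replication = 3       # HDFS default: 3 copies of every block
headroom = 1.25       # ASSUMPTION: ~25% extra for temp/intermediate data
workers = 3
total_tb = raw_data_tb * replication * headroom
print(f"total HDFS capacity: {total_tb:.2f} TB "
      f"(~{total_tb / workers:.2f} TB per worker node)")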
You can use AWS's EMR service - it is designed especially for running Hadoop clusters on top of EC2 instances.
It is fully managed, and it comes pre-packaged with all the services you need in Hadoop.
Regarding your questions:
There are three main types of nodes in an EMR Hadoop cluster:
Master - a single node; there is no need to run it as a spot instance.
Core - a node that handles tasks and holds part of HDFS.
Task - a node that handles tasks but does not hold any part of HDFS.
If Task nodes are lost (e.g., because they are spot instances), the cluster will continue to work with no problems.
Regarding storage, the default replication factor in EMR is as follows:
1 for clusters < four nodes
2 for clusters < ten nodes
3 for all other clusters
But you can change it - http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hdfs-config.html
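If you script cluster creation with boto3, the replication factor can be overridden through the hdfs-site configuration classification. A minimal sketch (the cluster name, region, instance types, and counts below are placeholders, not recommendations):
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region
response = emr.run_job_flow(
    Name="hadoop-cluster",                  # placeholder name
    ReleaseLabel="emr-5.30.0",              # pick the release you need
    Instances={
        "MasterInstanceType": "r3.xlarge",  # memory-optimized master
        "SlaveInstanceType": "d2.xlarge",   # storage-dense workers
        "InstanceCount": 5,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    Configurations=[{
        "Classification": "hdfs-site",
        # force 3 copies even on a small cluster
        "Properties": {"dfs.replication": "3"},
    }],
    JobFlowRole="EMR_EC2_DefaultRole",      # default EMR roles must exist
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])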

Elasticache - Increase the nodes for replication

I was looking at ElastiCache as a replacement for our EC2 Redis deployment.
However, I have a hard requirement of 1 master and 11 replicas in our deployment, while ElastiCache allows only 5 replicas. Is there a way this restriction can be relaxed?
No, unfortunately 5 read replicas appears to be the upper limit - http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Replication.html
Is your replica requirement meant to spread out CPU load? If so, there are some very capable instances available these days; check out the M4 family of machines.

Infinispan Distributed Cache Issues

I am new to Infinispan. We have set up an Infinispan cluster so that we can make use of the distributed cache for our CPU- and memory-intensive task. We are using UDP as the communication medium and Infinispan MapReduce for distributed processing. The problem we are facing is with throughput. When we run the program on a single-node machine, the whole program completes in around 11 minutes, i.e. processing a few hundred thousand records and finally emitting around 400,000 records.
However, when we use a cluster for the same dataset, we are seeing a throughput of only 200 records per second being updated between the nodes in the cluster, which impacts the overall processing time. We are not sure which configuration changes are impacting the throughput so badly; I am sure it's something to do with the configuration of either JGroups or Infinispan.
How can this be improved?