What happens when an HBase node fails? - replication

HBase is wonderful, but...
What will happen to the data when a node explodes / burns down / gets stolen / is torn to pieces by a mad IT guy on the loose?
Is the data lost?
Can the cluster auto recover?
Can I add new nodes without downtime?
Thanks guys,
Maxim.

Because HBase uses HDFS as its data storage layer, your data is replicated across other nodes (3 copies by default), and the same rules apply as for normal HDFS usage.
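For reference, that replication factor is the standard HDFS setting, configured cluster-wide in hdfs-site.xml (a minimal sketch; 3 is HDFS's default value):

<!-- hdfs-site.xml: HDFS keeps this many copies of every block -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>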
What do you mean by auto recovery? After some time, the data will eventually be re-replicated back to the replication level it had before the crash.
Yes, you can. See this topic in the FAQ: http://wiki.apache.org/hadoop/Hbase/FAQ#A21

Related

Can I have a 2-stage Redis backup using free Redis?

I am new to Redis and still reading the docs; I hope you can help me here.
I need a 2-stage database solution:
On the local devices, there is a database cluster. It has several primaries and several replicas. To my understanding, each primary or replica normally holds a portion of the whole data set. This is called data sharding.
In the cloud, there is another database replica. This cloud replica backs up the whole data set.
I'd like to use free Redis for this solution, not the enterprise version.
Is this achievable? From what I have read so far, it seems there would be no problem if the cloud replica were just like a local replica, backing up a portion of the data set. So I want to know whether I can use the cloud database to back up the whole cluster.
Thanks!
Nothing prevents you from having a replica hosted in the cloud, but each Redis Cluster node is either a master responsible for a set of key slots (shards) or a replica of a single master; in a multi-master scenario there is no way to have one replica covering several master nodes.
With the goal of having your entire cluster data replicated in the cloud, you should configure and host there one additional Redis replica for each master node. To prevent those new replicas from ever becoming masters themselves, you can set the cluster-replica-no-failover configuration property in their redis.conf files:
cluster-replica-no-failover yes
In any case, please note that replication is not a backup solution, so you may want to pair your setup with a proper Redis persistence mechanism (RDB snapshots or AOF).
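As an illustration, attaching one cloud-hosted replica to one local master could look like the following (hostnames, ports, and the master node ID are placeholders; this assumes Redis 5+, where redis-cli gained the --cluster subcommands):

# redis.conf on the cloud replica: join the cluster, never take over, persist to disk
cluster-enabled yes
cluster-replica-no-failover yes
appendonly yes

# Attach the new node as a replica of one specific local master:
redis-cli --cluster add-node cloud-replica-1:6379 local-master-1:6379 \
    --cluster-slave --cluster-master-id <id-of-local-master-1>

Repeat this once per master and the cloud side will hold a copy of the whole key space.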
If I understand your question correctly, your master dataset (in shards) is located on premises and the replicas (slaves) are hosted in the cloud. There is nothing preventing you from backing up your slaves (open source Redis) in the cloud; Redis doesn't care where the slaves are located, provided the master can reach them. Master-slave replication is available in Redis Enterprise with no such restriction. You might have a little trouble implementing master-master replication on open source Redis, but that is outside the scope of this question.

Reduce Redis cluster to a single GCP Memorystore

I have 3 Redis instances. One is the master and the other two are slaves. I connected to the master node and got its info via redis-cli with the INFO command. I can see the parameter cluster_enabled:0 and
# Replication
role:master
connected_slaves:2
slave0:ip=xxxxx,port=6379,state=online,offset=15924636776,lag=1
slave1:ip=xxxxx,port=6379,state=online,offset=15924636776,lag=0
As for the keyspace, each node has different DBs. I need to migrate all the data to a single Memorystore instance in GCP, but I don't know how. Can anyone help me?
Since the two nodes are slaves and clustering is not enabled, you only need to replicate the master node. RIOT is a great tool for migrating data in and out of Redis.
However, when you say each node has its own DBs, do you mean Redis databases that you access with SELECT? In that case you'll need to prefix the keys, as there may be overlap between the key sets of the different DBs.
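For illustration, merging the logical DBs under distinct prefixes might look like this (the "db1:" prefix is arbitrary; this assumes Redis 6.2+, which added the COPY command, and key names without whitespace):

# Copy every key from DB 1 into DB 0 under a "db1:" prefix:
redis-cli -n 1 --scan | while read key; do
    redis-cli -n 1 COPY "$key" "db1:$key" DB 0
done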
I think setting up another Redis cluster in a single node configuration is the least of your worries.
The real challenge for you would be migrating all your records over to the new setup. This is not a simple question to answer and would depend heavily on multiple factors:
The total size of your data being migrated
Is this a live database in production?
Do you want to keep the two DB schemas in your new configuration separate?
OK, I believe your Redis instances are currently hosted on Google Compute Engine.
And you are looking to migrate to Memorystore for Redis.
As mentioned here, you can leverage Redis snapshots for this. It provides step-by-step instructions on how to achieve this, using GCS buckets as transient storage.
import data into Cloud Memorystore instances using RDB (Redis Database Backup) snapshots, as well as back up data from existing Redis instances.
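For illustration, the snapshot-based flow might look like this (the bucket, instance name, and region are placeholders):

# Take an RDB snapshot on the master (BGSAVE writes dump.rdb in the background):
redis-cli -h <master-ip> BGSAVE

# Copy the snapshot to a GCS bucket used as transient storage:
gsutil cp /var/lib/redis/dump.rdb gs://my-transfer-bucket/dump.rdb

# Import it into the target Memorystore instance:
gcloud redis instances import gs://my-transfer-bucket/dump.rdb my-instance --region=us-central1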

A solution to increase my EMR master node capacity without shutting down the cluster

My EMR master node has become full and I need to attach an EBS volume to it. Is there any way to do this without terminating the cluster?
You can add additional EBS volumes and also resize existing ones.
How to do this is explained here:
https://superuser.com/questions/1409373/how-to-add-an-ebs-volume-by-snapshot-id-to-amazon-emr
https://github.com/qyjohn/AWS_Tutorials/wiki/Grow-EBS-volumes-on-EMR-clusters
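As a rough sketch of the resize route (the volume ID, device name, and mount point are placeholders; device naming and filesystem type vary by instance and setup):

# Grow the EBS volume from the AWS CLI:
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 200

# Then, on the master node, grow the partition and the filesystem:
sudo growpart /dev/xvdb 1     # from cloud-utils; extends partition 1
sudo xfs_growfs /mnt          # for XFS; use resize2fs for ext4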
I don't think so. This is because you set up Amazon Elastic Block Store (Amazon EBS) volumes and configure mount points when the cluster is launched, so it’s difficult to modify the storage capacity after the cluster is running.
The feasible solutions usually involve adding more nodes to your cluster, backing up your data to a data lake, and then launching a new cluster with a higher storage capacity. Or, if the data that occupies the storage is expendable, removing the excess data is usually the way to go.
For more details, have a look at: https://aws.amazon.com/blogs/big-data/dynamically-scale-up-storage-on-amazon-emr-clusters/

Aerospike data not found after server restart

I am new to Aerospike DB. I inserted data from MySQL into Aerospike using a migration script. Due to some issue, the Aerospike server was restarted.
But after the restart, there was no data in the Aerospike DB.
Can someone please let me know what could be the issue? Any config problem in Aerospike?
What storage mechanism did you use with Aerospike? Did you use one of the default namespaces? One of the defaults is in-memory only, so data will be lost on restart if you have in-memory-only storage on a single node.
So basically you should ensure that the namespace storage is configured for persistence [1], that the replication factor is 2 or more, and that the number of servers in the cluster is at least equal to the replication factor, to ensure HA.
[1]https://www.aerospike.com/docs/operations/configure/namespace/storage/#recipe-for-an-ssd-storage-engine
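For illustration, a persistent namespace stanza in aerospike.conf might look like the following (the namespace name, device path, and sizes are placeholders, and exact directives vary by server version; see the linked recipe for the authoritative settings):

namespace mydata {
    replication-factor 2
    memory-size 4G
    storage-engine device {
        device /dev/sdb            # raw SSD device; data survives restarts
        write-block-size 128K
    }
}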

Accumulo -- Adding a new node

I'm trying to learn Accumulo, but I have a couple of questions that I couldn't find direct answers to:
First, can we add a new server to an existing Accumulo system without any downtime? If yes, the new node will have its share of the data assigned by the master, right? Since Accumulo has failure recovery, I believe that will be automatic.
Second, can we define the number of replicas, or is the whole data set distributed with some built-in failure recovery? How can I learn the details of the replication and data distribution process?
Thanks a lot :)
Yes, you can dynamically add/remove worker nodes at any time. They just need to have the same configuration options available to them so that they can join the cluster (shared secret, zookeeper quorum, etc... basically, the same accumulo-site.xml that you are using).
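As a hedged sketch, bringing a new worker online might look like this (the hostname is a placeholder; this assumes the Accumulo 1.x script layout, where per-host services are started with start-here.sh):

# On the new node, with the same Accumulo configuration as the rest of the cluster:
echo "new-node.example.com" >> conf/slaves   # register the host as a worker
bin/start-here.sh                            # start the services assigned to this host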
By default, the "master" process will assign tablets to each "tablet server" process so that each host serves roughly the same amount of data.
Not sure I understand your second question, but Accumulo generally uses HDFS for its backing store, which handles replication and data recovery at the "file" level.
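If you want to see that file-level replication, HDFS can report block and replication details for Accumulo's storage directory (the path shown is the common default and may differ in your setup):

# Show files, blocks, and replication for Accumulo's HDFS directory:
hdfs fsck /accumulo -files -blocks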