Apache Ignite Fault Tolerance - ignite

I have a few questions about Ignite caches in PARTITIONED mode:
1) When a node goes down in an Ignite cluster and the failed node is the primary for a key, does the backup for that key become the new primary?
2) What happens to the backup copies that were on the failed node? Will they be recreated elsewhere in the cluster?
3) If I set CacheRebalanceMode in the cache configuration, will it apply to node failure as well, or only to node addition?

1) Yes, this is right. The former backup becomes the new primary, and a new backup receives a copy in the background.
2) Yes. If a backup is lost, a new node is assigned that role and receives the copy in the background.
3) In synchronous rebalance mode, a node will not complete its start process, and the user will not be able to use the API, until the data is rebalanced. This setting doesn't affect the rebalancing process in case of failures.
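
To make the promotion and re-backup behaviour concrete, here is a minimal toy model in Python. It is not Ignite's actual affinity function; the round-robin placement and the names `assign` and `fail_node` are assumptions made purely for illustration. It only shows the bookkeeping described above: a lost primary is replaced by its former backup, and every partition that lost a copy gets a new backup that has to be filled in the background.

```python
from typing import Dict, List, Optional, Tuple

Assignment = Dict[int, Tuple[str, Optional[str]]]


def assign(partitions: int, nodes: List[str]) -> Assignment:
    """Toy round-robin placement: partition -> (primary, backup).

    Hypothetical stand-in for a real affinity function.
    """
    return {
        p: (nodes[p % len(nodes)], nodes[(p + 1) % len(nodes)])
        for p in range(partitions)
    }


def fail_node(assignment: Assignment, nodes: List[str], failed: str):
    """Return (new assignment, copies to rebalance) after `failed` leaves.

    `copies` lists (partition, new_backup_node) pairs, i.e. the data that
    would be transferred in the background.
    """
    survivors = [n for n in nodes if n != failed]
    new_assignment: Assignment = {}
    copies = []
    for p, (primary, backup) in assignment.items():
        if primary == failed:
            # The former backup is promoted to primary.
            primary, backup = backup, None
        elif backup == failed:
            backup = None
        if backup is None:
            # Pick a surviving node other than the primary as the new backup.
            candidates = [n for n in survivors if n != primary]
            backup = candidates[p % len(candidates)] if candidates else None
            if backup is not None:
                copies.append((p, backup))
        new_assignment[p] = (primary, backup)
    return new_assignment, copies


nodes = ["A", "B", "C"]
before = assign(6, nodes)
after, copies = fail_node(before, nodes, "A")
print(after)
print(copies)
```

In this sketch no partition loses both copies as long as only one node fails at a time, which is the guarantee one backup gives you.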

Related

How to restart scylla db cluster without any data loss

I want to restart my Scylla db cluster. But I don't want to lose any data.
Will I lose any data if I restart the nodes one after another?
No, you will not lose data if you are doing a rolling restart.
Scylla keeps the data replicated across multiple nodes (usually 3 or more).
Depending on your Replication Factor (RF) and Consistency Level (CL), you might see read or write operations fail during the restart. See the interactive calculator here: https://docs.scylladb.com/getting-started/consistency/#consistency-level-calculator
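
The RF/CL interaction can be checked with a few lines of arithmetic. The sketch below is a simplification that assumes a single datacenter and only the consistency levels ONE, TWO, QUORUM and ALL; the function names are made up for illustration. QUORUM is computed as `RF // 2 + 1`.

```python
def required_replicas(cl: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    cl = cl.upper()
    if cl == "ONE":
        return 1
    if cl == "TWO":
        return 2
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def operation_succeeds(cl: str, rf: int, replicas_down: int) -> bool:
    """True if enough replicas remain to satisfy `cl` with some replicas down."""
    return rf - replicas_down >= required_replicas(cl, rf)


# Rolling restart: one replica of any given row is down at a time.
for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, operation_succeeds(cl, rf=3, replicas_down=1))
```

With RF=3, operations at ONE or QUORUM keep working while a single node restarts, whereas operations at ALL fail until the node is back.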
If "restarting a node" just involves restarting Scylla or rebooting the kernel on which it runs, then you're safe: Scylla is a distributed database, and is designed to support durability and availability even when nodes temporarily disappear from the network. When a node is temporarily down, all its data is still available for reads (from two other replicas), and also writes continue to work normally and will be eventually replicated to the down node when it finally comes up (using the "hinted handoff" and/or "repair" mechanisms).
However, if by "restarting a node" you mean something more destructive, such as replacing it with a brand-new node with empty storage (as in some cloud setups where nodes have transient storage), you have to be more careful. If the node's data is lost, we still have two more replicas and the database continues to be available, but you should tell the cluster to "stream" the lost data back to the node before doing this destructive restart on additional nodes. If you have RF=3 and destroy three nodes at the same time, you will surely lose data.

Apache Ignite Partitioned Mode with 1 backup copy: updates to cache do not get reflected in both partitions?

I have an Apache Ignite cluster with 5 nodes, running in PARTITIONED mode with 1 back-up copy for each primary partition (also configured to read from backup if it's on the local node).
Updates to data in one of the caches are received from a Kafka topic; the updates are processed and the cache is reloaded as required.
However, I occasionally observe that when I request the data from the cache, I get the correct updated data a handful of times, but this alternates with getting the stale pre-update data back.
It seems to me that something fails when syncing between the primary and backup nodes upon update (the configuration is FULL_SYNC, so it should not be related to async issues). However, I can't spot any errors in the logs that suggest something like this.
How can I determine if this is the cause of the issue? What else may be going wrong to cause this behaviour?
Running on Ignite 2.9.1
Thanks

Could you please explain Replication feature of Redis

I am very new to implementing a Redis cache.
Could you please let me know what the replication factor means?
How does it work, and what is its impact?
Thanks.
At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a very simple to use and configure leader follower (master-slave) replication: it allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
When a master and a replica instances are well-connected, the master keeps the replica updated by sending a stream of commands to the replica, in order to replicate the effects on the dataset happening in the master side due to: client writes, keys expired or evicted, any other action changing the master dataset.
When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.
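
The three mechanisms above can be sketched as a toy model. This is not Redis code: real Redis tracks byte offsets and a replication ID, whereas this sketch counts commands, and the `Master` and `Replica` classes are hypothetical. It only illustrates the decision between a partial and a full resynchronization based on whether the missed commands are still in the master's bounded backlog.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Replica:
    dataset: dict = field(default_factory=dict)
    offset: int = 0  # last command offset applied by this replica


class Master:
    def __init__(self, backlog_size: int):
        self.dataset = {}
        self.offset = 0  # number of write commands applied so far
        # Bounded backlog of recent commands: (offset, key, value).
        self.backlog = deque(maxlen=backlog_size)

    def write(self, key, value):
        self.dataset[key] = value
        self.offset += 1
        self.backlog.append((self.offset, key, value))

    def sync(self, replica: Replica) -> str:
        """Bring the replica up to date; return 'partial' or 'full'."""
        oldest = self.backlog[0][0] if self.backlog else self.offset + 1
        if oldest <= replica.offset + 1 <= self.offset + 1:
            # Everything the replica missed is still in the backlog.
            for off, key, value in self.backlog:
                if off > replica.offset:
                    replica.dataset[key] = value
            mode = "partial"
        else:
            # Missed commands were evicted: ship a full snapshot instead.
            replica.dataset = dict(self.dataset)
            mode = "full"
        replica.offset = self.offset
        return mode


master, replica = Master(backlog_size=3), Replica()
master.write("a", 1)
master.write("b", 2)
print(master.sync(replica))   # short gap, served from the backlog
for i, key in enumerate("cdefg"):
    master.write(key, i)      # replica is "disconnected" during these writes
print(master.sync(replica))   # gap exceeds the backlog
```

The backlog size is the knob that decides how long a replica can be disconnected before a full resynchronization becomes necessary.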
Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases.
Synchronous replication of certain data can be requested by the clients using the WAIT command. However WAIT is only able to ensure that there are the specified number of acknowledged copies in the other Redis instances, it does not turn a set of Redis instances into a CP system with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of the Redis persistence. However with WAIT the probability of losing a write after a failure event is greatly reduced to certain hard to trigger failure modes.

Can Infinispan be forced to fully replicate to a new cluster member

Looking through the Infinispan getting started guide it states [When in replication mode]
Infinispan only replicates data to nodes which are already in the
cluster. If a node is added to the cluster after an entry is added, it
won’t be replicated there.
Which I read as any cluster member will always be ignorant of any data that existed in the cluster before it became a cluster member.
Is there a way to force Infinispan to replicate all existing data to a new cluster member?
I see two options currently but I'm hoping I can just get Infinispan to do the work.
Use a distributed cache and live with the increase in access times inherent in the model, but this at least leaves Infinispan to handle its own state.
Create a Listener to listen for a new cache member joining and iterate through the existing data, pushing it into the new member. Unfortunately this would in effect cause every entry to replicate out to the existing cluster members again. I don't think this option will fly.
This information sounds misleading or outdated. When a node joins a cluster, a rebalance process is initiated. If you query for an entry during the rebalance, before that entry has been delivered to the new node, it is fetched by a remote RPC.

Cluster Failover

I know I'm asking something very obvious about cluster failover.
I read on redis.io that if any master node in the cluster fails, it affects the other master nodes until a slave takes over. In my setup, I'm not defining any slaves and am working with just 3 masters.
I'm thinking of modifying the redis-trib.rb file so that it removes the failed server and starts the cluster with the other 2 nodes. I'm confused about a couple of things:
1) Resharding
Not possible until the failed server comes back up
2) Minimum 3 node limitation for create cluster
As far as I understand, redis-trib.rb does not allow me to create a cluster with two nodes
There might be some solution in the code file :)
3) Automatic Way to Re-Create new structure with live nodes
From a programmer's point of view, I'm looking for something automatic for my system: something that triggers one command when Redis Cluster fails, so that some tasks happen internally, like:
Shutdown all other redis cluster servers
Remove nodes-[port].conf files from all cluster nodes folder
Start redis cluster servers
Run "redis-trib.rb create ip:port ip:port"
I'm just trying to minimize administration work :). Otherwise I need to implement some other algorithm "Data Consistency" here.
If any of you guys have any solution or idea, kindly share.
Thanks,
Sanjay Mohnani
In a cluster with only master nodes, if a node fails, data is lost. Therefore no resharding is possible, since it is not possible to migrate the data (hash slots) out of the failed node.
To keep the cluster working when a master fails, you need slave nodes (one per master). This way, when a master fails, its slave fails over (becomes the new master with the same copy of the data).
The redis-trib.rb script does not handle cluster creation with fewer than 3 masters; however, a Redis Cluster itself can be of any size (at least one node).
Therefore adding slave nodes can be considered an automatic solution to your problem.
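
To illustrate why a failed master without a slave leaves part of the keyspace unreachable, here is a small sketch of the key-to-slot mapping. Redis Cluster maps each key to one of 16384 hash slots using CRC16 (the XMODEM variant, which Python's `binascii.crc_hqx` with an initial value of 0 computes). The node names and the even three-way slot split are assumptions for illustration, and the sketch ignores `{...}` hash tags.

```python
import binascii

SLOTS = 16384


def crc16(data: bytes) -> int:
    """CRC16/XMODEM, the variant used for Redis Cluster hash slots."""
    return binascii.crc_hqx(data, 0)


def key_slot(key: str) -> int:
    """Hash slot for a key (hash tags are not handled in this sketch)."""
    return crc16(key.encode()) % SLOTS


# Hypothetical 3-master layout with the slots split evenly.
SLOT_RANGES = {
    "master-1": range(0, 5461),
    "master-2": range(5461, 10923),
    "master-3": range(10923, SLOTS),
}


def owner(key: str) -> str:
    """Name of the master that owns the key's slot."""
    slot = key_slot(key)
    for node, slots in SLOT_RANGES.items():
        if slot in slots:
            return node
    raise LookupError(f"slot {slot} is not assigned")


def reachable(key: str, failed: set) -> bool:
    """Without slaves, a key is unreachable once its owning master fails."""
    return owner(key) not in failed


for key in ("foo", "hello"):
    print(key, key_slot(key), owner(key), reachable(key, failed={"master-3"}))
```

In this layout, every key whose slot belongs to the failed master stays unavailable until that master returns or a slave holding the same slots is promoted.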