Rolling restart Ignite full memory mode on K8s without losing data - ignite

I am running Ignite on a Kubernetes cluster with 5 pods and backups = 1 (https://apacheignite.readme.io/docs/primary-and-backup-copies). Is there any way to do a rolling restart without losing data, and how can I check that the data is synced to the other instances before restarting the pods one after another?
Thank you

You can monitor the cache rebalancing process via JMX:
https://ignite.apache.org/docs/latest/monitoring-metrics/metrics#monitoring-rebalancing
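For illustration, here is a minimal sketch of what a restart loop with that check could look like. It assumes each pod exposes its JMX metrics over HTTP through a Jolokia agent, that the pods belong to a StatefulSet reachable by stable DNS names, and that kubectl is available; the pod names, service address, and the rebalancing metric/MBean path are placeholders you need to adapt to your Ignite version (take the exact names from the metrics page linked above).

    import subprocess
    import time

    import requests

    PODS = [f"ignite-{i}" for i in range(5)]                       # hypothetical StatefulSet pod names
    JOLOKIA_URL = "http://{pod}.ignite.default.svc:8778/jolokia"   # assumes a Jolokia agent on each pod
    REBALANCE_METRIC = "<cache-group-mbean>/RebalancingPartitionsLeft"  # placeholder: exact name from the metrics docs

    def partitions_left(pod):
        # Read the "partitions left to rebalance" value from one node via Jolokia's read endpoint.
        url = JOLOKIA_URL.format(pod=pod) + "/read/" + REBALANCE_METRIC
        return int(requests.get(url, timeout=5).json()["value"])

    def wait_until_rebalanced(pods):
        # Block until every node reports that it has no partitions left to rebalance.
        while any(partitions_left(p) > 0 for p in pods):
            time.sleep(10)

    for pod in PODS:
        subprocess.run(["kubectl", "delete", "pod", pod], check=True)
        subprocess.run(["kubectl", "wait", "--for=condition=Ready", f"pod/{pod}",
                        "--timeout=600s"], check=True)
        # Do not touch the next pod until the backups are re-synced across the whole cluster.
        wait_until_rebalanced(PODS)

With backups = 1 the key point is simply not to kill a second node while rebalancing is still in progress, otherwise the only remaining copy of some partitions could be lost.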

Related

How to mount volume for stateless service that uses Apache Ignite

I have a service that runs on Kubernetes, uses Apache Ignite to store some data for processing, and runs in replicated mode with native persistence enabled. How do I correctly mount the volume so that the data is persisted to disk? Please note, this question is not about mounting volumes in Kubernetes in general, but about the configuration/method to enable persistence in a service running an embedded Ignite server in Kubernetes.
Note: The application may run multiple replicas.
Edit: As volumes (PVCs) cannot be shared by multiple pods, only one pod runs successfully and the other pods are stuck in the Pending state.
Stateless means the system has no dependency on persisted state during its start or execution, but a service can only be as stateless as its requirements allow. Since the requirement here is persistence, Ignite has to be deployed as stateful, using a StatefulSet. The StatefulSet will automatically provision a separate volume and mount it to every pod.
Check out the Ignite guides for deploying on Kubernetes on AWS, GKE, and Azure.
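To make that concrete, here is a rough sketch of such a StatefulSet built with the official Kubernetes Python client (an equivalent YAML manifest works the same way). The image name, labels, mount path, and storage size are placeholder assumptions; the important piece is volume_claim_templates, which gives every replica its own PVC. Inside the container, point the embedded Ignite node's work directory and storage/WAL paths at the mounted path.

    from kubernetes import client, config

    config.load_kube_config()

    # Each replica gets its own PVC created from this template, mounted at the Ignite work dir.
    pvc_template = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="ignite-work"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
        ),
    )

    container = client.V1Container(
        name="my-ignite-service",                              # hypothetical service image
        image="example.com/my-ignite-service:latest",
        volume_mounts=[client.V1VolumeMount(name="ignite-work", mount_path="/opt/ignite/work")],
    )

    stateful_set = client.V1StatefulSet(
        metadata=client.V1ObjectMeta(name="my-ignite-service"),
        spec=client.V1StatefulSetSpec(
            service_name="my-ignite-service",
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "my-ignite-service"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "my-ignite-service"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
            volume_claim_templates=[pvc_template],             # one PVC per pod, not shared
        ),
    )

    client.AppsV1Api().create_namespaced_stateful_set(namespace="default", body=stateful_set)

Because each pod owns its own PVC, the "only one pod runs, the rest are Pending" problem from the edit above goes away.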

Redis cluster node failure not detected on MISCONF

We currently have a Redis cache cluster with 3 masters and 3 slaves hosted on 3 Windows servers (one master and one slave per server). We are using StackExchange.Redis as our client.
We have RDB disabled but AOF enabled, and we are experiencing some problems with the cluster in the following situation:
One of our servers ran out of disk space and the Redis node on this server was unable to write to the AOF file (the error returned to the client was MISCONF Errors writing to the AOF file: No space left on device).
The cluster did not detect that the node was failing and so did not exclude it from the cluster.
All cache operations were blocked until we freed up some space on the server.
We know that we don't need the AOF, so we disabled it after the incident.
But we would like to confirm or refute our understanding of Redis clustering: in our view, if a node experiences a failure, the cluster should redirect all requests to another node. We have tested that, with a stopped master node, a slave is promoted to master, so we are confident that our cluster is working, but we are not sure why, in our case, the node was not marked as failed.
Is the cluster capable of detecting a node failure when the failure only occurs when a client sends a request to the cluster?

How to clean Apache Ignite caches and sort of start over?

I have a 3-node Ignite cluster and 1 client that creates the cache. During development and testing I had to stop the cluster or interrupt the cache building several times, and the entire system is broken now. Only one node starts, the other nodes crash, and the client is blocked and does not do anything.
Is there any way to clean everything and sort of start fresh?
I am using Ignite 2.1 with persistent cache storage.
Thank you for your help.
Just delete the Ignite work directory - by default, it's ${IGNITE_HOME}/work.
Also, if you configured a WAL store path, you need to clean it too:
https://apacheignite.readme.io/docs/distributed-persistent-store#section-write-ahead-log
Note: All data in the persistent store will be lost.
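For example, a small cleanup sketch to run on every node while the cluster is stopped; the WAL paths below are placeholders and only relevant if you configured them explicitly in the persistent store configuration:

    import os
    import shutil

    ignite_home = os.environ["IGNITE_HOME"]

    for path in (
        os.path.join(ignite_home, "work"),   # default work directory, including persistence files
        "/path/to/wal/store",                # placeholder: only if a WAL store path was configured
        "/path/to/wal/archive",              # placeholder: only if a WAL archive path was configured
    ):
        shutil.rmtree(path, ignore_errors=True)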

Restarting managed servers by clusters without outage

I want to write a script for restarting WebLogic managed servers, which would do the following:
It would contain a loop which restarts the first node of every cluster at the same time:
a.) FORCE_SHUTDOWN
b.) wait for status: SHUTDOWN
c.) START the managed servers
d.) wait for status: RUNNING
e.) move to the next node of each cluster and repeat until all managed servers are restarted.
So in the first iteration it would restart the first node of each cluster, in the second iteration the second node of each cluster, and so on until all managed servers are restarted.
I have not started writing the script yet; I am a newbie with WebLogic and this is just a concept. Do you have any suggestions on how to achieve that goal?
Why reinvent the wheel?
rollingRestart
Category: Control Commands
Use with WLST: Online
Description: Initiates a rolling restart of all servers in a domain, or all servers in a specific cluster or clusters, without interrupting the service. This command provides the ability to sequentially restart servers.
This operation involves the graceful shutdown of the servers, and the servers being restarted without interrupting the service for the user.
Syntax
rollingRestart(target, [options])
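A minimal WLST (online) sketch of using it; the admin URL, credentials, script name, and target names are placeholders for your domain, and you would run the script with java weblogic.WLST rolling_restart.py:

    # Rolling restart via the built-in WLST command instead of a hand-written shutdown/start loop.
    connect('weblogic', 'welcome1', 't3://adminhost:7001')   # placeholder credentials and admin URL

    # Target the whole domain, or a specific cluster, e.g. rollingRestart('Cluster1').
    rollingRestart('myDomain')

    disconnect()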

Redis - Promoting a slave to master manually

Suppose I have [Slave IP Address], which is a slave of [Master IP Address].
Now my master server has been shut down, and I need to set this slave to be the master MANUALLY (WITHOUT using Sentinel automatic failover, WITH a Redis command).
Is it possible to do this without restarting the Redis service (and losing all the cached data)?
Use SLAVEOF NO ONE to promote a slave to master:
http://redis.io/commands/slaveof
It depends: if you are in a cluster, you are better off using a failover. You will need to use the FORCE option in the command:
http://redis.io/commands/cluster-failover
Is it possible doing this without restarting the redis service? (and losing all the cached data)
Yes, that's possible; you can use SLAVEOF NO ONE (without Sentinel).
But it is recommended to use Sentinel to avoid data loss:
sentinel failover master-name (with Sentinel)
This will force Sentinel to switch the master.
The new master will have all the data that was synchronized before the old master shut down.
Redis will automatically choose the best slave with the most data, which reduces the amount of data lost when switching masters.
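For example, the same manual promotion driven from a client library rather than redis-cli, sketched here with redis-py (host and port are placeholders); calling slaveof() with no arguments sends SLAVEOF NO ONE:

    import redis

    replica = redis.Redis(host="10.0.0.2", port=6379)   # the replica you want to promote

    # SLAVEOF with no arguments is sent as "SLAVEOF NO ONE": the node stops replicating
    # and becomes a master without restarting, so the data it already holds is kept.
    replica.slaveof()

    print(replica.info("replication")["role"])          # should now report "master"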
The two options in step 3 below have helped me recover the cluster when a master node was down, its compute was replaced, or it was in some other non-recoverable state.
1.- First you need to connect to the slave node using redis-cli; here is a link on how to do that: How to connect to remote Redis server?
2.- Once connected to the slave node, run the command cluster nodes to validate that the master node is in the fail state; also run cluster info to see the overall state of your cluster (this is always a good idea).
3.- On the slave node to be promoted, run the command cluster failover. In rare cases, when there is some serious issue with Redis, this command can fail and you will need to use cluster failover force or cluster failover takeover; here is more info about the implications of those options: https://redis.io/commands/cluster-failover
4.- Run cluster forget $old_master_id on all your cluster nodes.
5.- Add a new node with cluster meet $new_node_IP $new_node_PORT.
6.- Subscribe your new node to your brand new master: log in to the new node and run cluster replicate $master_node_id.
Steps 1-3 are required for the slave-to-master promotion, and steps 4-6 are required to leave the whole cluster in a healthy master-slave equilibrium.
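If you prefer driving those steps from a script instead of redis-cli, here is a rough redis-py sketch of steps 2-6; the addresses, ports, and node IDs are placeholders:

    import redis

    replica = redis.Redis(host="10.0.0.2", port=6379)     # the slave to be promoted

    # Step 2: inspect the cluster and confirm the old master is flagged as failed.
    print(replica.execute_command("CLUSTER", "INFO"))
    print(replica.execute_command("CLUSTER", "NODES"))

    # Step 3: promote this slave (add "FORCE" or "TAKEOVER" as a third argument only if needed).
    replica.execute_command("CLUSTER", "FAILOVER")

    # Step 4: make every remaining node forget the dead master.
    for host in ("10.0.0.2", "10.0.0.3", "10.0.0.4"):
        redis.Redis(host=host, port=6379).execute_command("CLUSTER", "FORGET", "<old_master_id>")

    # Steps 5-6: introduce the replacement node and subscribe it to the new master.
    replica.execute_command("CLUSTER", "MEET", "10.0.0.5", "6379")
    redis.Redis(host="10.0.0.5", port=6379).execute_command("CLUSTER", "REPLICATE", "<new_master_id>")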
As of Redis version 5.0.0, the SLAVEOF command is deprecated.
If a Redis server is already acting as a replica, the command REPLICAOF NO ONE will turn off replication, turning the Redis server into a master.
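With redis-py the raw command can be sent directly (host and port are placeholders), which avoids depending on a particular client version's helper method:

    import redis

    # REPLICAOF NO ONE is the Redis >= 5 replacement for SLAVEOF NO ONE.
    redis.Redis(host="10.0.0.2", port=6379).execute_command("REPLICAOF", "NO", "ONE")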