Best Practice to Upgrade Redis with Sentinels?

I have three redis nodes being watched by 3 sentinels. I've searched around and the documentation seems to be unclear as to how best to upgrade a configuration of this type. I'm currently on version 3.0.6 and I want to upgrade to the latest 5.0.5. I have a few questions on the procedure around this.
Is it OK to jump two major versions? I did this in our staging environment and it seemed fine. We use pretty basic Redis functionality and there are no breaking changes between the versions.
Does order matter? Should I upgrade, say, all the sentinels first and then the Redis nodes, or should the sentinel plane come last, after verifying the Redis plane? Should I do one sentinel/Redis node at a time?
Any advice or experience on this would be appreciated.

I am surprised by the lack of response to this, but I understand that the subject kind of straddles something like Stack Overflow and something like Stack Exchange. I'm also surprised at the lack of documentation I was able to find on the subject.
I did some extensive testing in a staging environment and then proceeded to our production and the procedure I followed seemed to work for the most part:
Upgrading from 3.0.6 to 5.0.5 seems to be working without a hitch in our case. As I said in the original post, we use the basics of Redis, and not much has changed from the client perspective.
I went forward upgrading in this order:
The first two sentinel peers, and then the sentinel currently in the leader role.
Each of the Redis nodes listed as slaves (now known as replicas).
After each node is upgraded, it will want to copy its dump.rdb from the master.
A 5.0 node can sync from a 3.0 master, but once a 5.0 node is the master a 3.0 node cannot sync from it, so once you've failed over to an upgraded node you cannot go back to the earlier version.
Finally, use the sentinels to fail over to an upgraded node as the new master and upgrade the former master (a sketch of the commands is below).
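For that last step, here is a minimal sketch of the failover commands, assuming the master is named mymaster in your sentinel configuration and the sentinels listen on the default port 26379 (adjust both to your setup):

# run against any one sentinel: promote an already-upgraded replica
redis-cli -p 26379 SENTINEL failover mymaster
# confirm which node the sentinels now report as master
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# on the reported master, verify replication is healthy before touching the old master
redis-cli -h <new-master-ip> -p 6379 INFO replication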
Hopefully someone might find this useful going forward.

Related

What could be the reason for an OpenLDAP replica to skip a few items during synchronization?

I have an OpenLDAP cluster with 6 nodes. When an item is added or deleted on the master, synchronization kicks in and the change is replicated to the other slave nodes in the cluster. Sometimes, however, one of the slave nodes (the same node every time) misses the updates, so that slave differs from the rest of the slaves and from the master, and when a request goes to the unsynchronized slave it yields invalid results.
In the problematic slave's LDAP logs there is no error during the operation that would explain the miss, so I can't figure out what caused the problem. Bringing that slave down and re-adding it does not help either.
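One way to confirm how far the problematic slave has drifted, assuming syncrepl replication and a suffix of dc=example,dc=com (substitute your own base DN, hosts, and bind options), is to compare the contextCSN of the suffix on the master and on that slave:

# compare the contextCSN on the master and the lagging slave; they should match when in sync
ldapsearch -x -H ldap://master.example.com -s base -b "dc=example,dc=com" contextCSN
ldapsearch -x -H ldap://slave.example.com -s base -b "dc=example,dc=com" contextCSN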
Has anyone faced a similar problem and figured out the cause?
I have posted the question with the OpenLDAP project as well, and they suggested upgrading, since the version I am using (2.4.44) is a couple of years old and many replication-related fixes have gone in since then.
Here is the link to the corresponding OpenLDAP issue:
https://bugs.openldap.org/show_bug.cgi?id=9701

Migrate a (Storm + Nimbus) cluster to a new Zookeeper, without losing information or having downtime

I have a Nimbus + Storm cluster using Zookeeper, and I wish to move my cluster to point at a new Zookeeper. Do you know if this is possible? Can I keep all the information from the old Zookeeper and move it to the new one? Is it possible to do this without downtime?
I have looked around the internet for this procedure but have not found much.
Would it be as simple as changing the storm.yml file on both the master and worker nodes? Do I need a restart afterwards?
# storm.zookeeper.servers:
# - "server1"
# - "server2"
If you just change storm.yml, you'd be pointing Storm at a new empty Zookeeper cluster, and it will be like you just installed Storm from scratch. More likely, you want to grow your Zookeeper cluster to include your new machines, then update storm.yml to point at the new machines, then shrink the cluster to exclude the machines you want to move away from. That way, your Zookeeper quorum is preserved even though you've moved to other physical machines.
This is easier to do on Zookeeper 3.5 with dynamic reconfiguration http://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html. I'm unsure whether Storm will run on Zookeeper 3.5, but you may consider investigating whether you can upgrade to 3.5 before growing/shrinking the cluster.
Otherwise you will have to do a rolling restart to add the new Zookeeper nodes, then do another one to remove the old machines once the cluster has stabilized.
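If you go the ZooKeeper 3.5 dynamic-reconfiguration route, a minimal sketch from zkCli.sh connected to the existing ensemble (the hostname zk-new1 and server ids here are hypothetical placeholders):

# add the new participant to the ensemble
reconfig -add "server.4=zk-new1:2888:3888;2181"
# once the new server has joined and synced, drop the old one (server id 1 here)
reconfig -remove 1
# then update storm.yml to list the surviving servers and rolling-restart nimbus and the supervisors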
Let me suggest a hack here. This is a script provided by Microsoft for migration on HDInsight clusters, but you can adapt it to your needs.
The script can be downloaded from https://github.com/hdinsight/hdinsight-storm-examples/tree/master/tools/zkdatatool-1.0 and you can read more about it here:
https://blogs.msdn.microsoft.com/azuredatalake/2017/02/24/restarting-storm-eventhub/
I have used it in the past when I had to migrate some things between PaaS clusters, and I can confirm it works fine!

Should all pods using a Redis cache be constrained to the same node as the Redis cache itself?

We are running one of our services in a newly created Kubernetes cluster, and because of that we have switched it from the previous "in-memory" cache to a Redis cache.
Preliminary tests on our application, which exposes an API, show that we experience timeouts from the application to the Redis cache. I have no idea why, and the issue pops up very irregularly.
So I'm thinking the reason for these timeouts may actually be network related. Is it a good idea to use affinity so we always run the Redis cache on the same nodes as the application, to prevent network issues?
The issues have not arisen during "very high load" situations, which concerns me a bit.
This is an opinion question so I'll answer in an opinionated way:
As you mentioned, I would try to put the Redis and application pods on the same node; that would rule out inter-node networking issues. You can accomplish that with Kubernetes pod affinity (see the sketch below). You can also try a nodeSelector, so that you always pin your Redis and application pods to a specific node.
Another way to do this is to taint the nodes where you want to run these workloads and then add a toleration to the Redis and application pods.
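A minimal sketch of the pod-affinity option, assuming the Redis pods carry a (hypothetical) app: redis-cache label; this goes into the application Deployment's pod spec:

# schedule the application pods onto the same node(s) as the Redis cache pods
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: redis-cache
      topologyKey: kubernetes.io/hostname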
Hope it helps!

DC/OS Mesos-Master rejoined and causes interruptions on the master agents

I'm having a strange issue today. Everything was still working fine yesterday when I left the office, but today when I came back to work my DC/OS dashboard showed me that there weren't any services running or nodes connected.
I've run into this issue once or twice before, and it was related to Marathon not being able to elect a leader. One of the 3 master nodes then also shows a lot of errors in the journal. This can be resolved by stopping/starting the dcos-marathon service on that host, which brings it back into the Marathon group.
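On a CoreOS-based DC/OS master those components run as systemd units, so that stop/start looks roughly like the following, assuming the standard dcos-marathon and dcos-mesos-master unit names:

sudo systemctl restart dcos-marathon      # on the master that is flooding the journal
sudo journalctl -u dcos-mesos-master -f   # follow that master's logs while it rejoins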
After that I did see the nodes and services again, but now the dashboard sometimes tells me there is only one node connected, then 3 again, then just 1 again, and so on.
When I stop the dcos-mesos-master process on the conflicting host, the flapping stops and I have a stable master cluster (though probably not really resilient).
It looks like the failing node is trying to become the master, which causes this. I've tried to search for how to rejoin a failed mesos-master, but came up empty.
I'm running DC/OS on a CoreOS environment.
Although the general behavior is described, you may need to provide more specifics such as the kernel version, DC/OS version, hardware specs, etc. The simplest answer I can provide based on what's been given is to reach out via their support channel on Slack (https://dcos-community.slack.com/).

Migration of ActiveMQ from version 5.5.1 to 5.11.2

We are planning to migrate ActiveMQ from version 5.5.1 to 5.11.2. How do we migrate the existing messages from the older version (5.5.1) to the newer version (5.11.2)?
Thanks in advance.
This assumes you have already taken care of any migration issues noted in each release note from 5.6.0 to 5.11.2.
There are essentially two ways to upgrade/migrate a broker.
Simply install the new broker and point it at the old (KahaDB) database. The store will be upgraded to the new version automatically. This may cause some downtime during the store upgrade (at least if there are a lot of messages in the store).
Have two parallel brokers running at once and let the old one "fade out". You can set up a shiny new 5.11 broker side by side. This also makes it possible to migrate to other store types (JDBC or LevelDB). It's a little more work but will keep your uptime maximized. If you depend on message order, I would not recommend this method.
Set up the new broker.
Remove the transportConnector from the old broker, and add a network connector from the old broker to the new one (see the snippet below).
Stop the old broker, start the new one, then start the old one again.
Now, clients (using failover, right?) will fail over to the new broker, and messages from the old broker will be copied over to the new one as long as there are connected consumers on all queues.
When no more messages are left on the old broker, shut it down and uninstall it.
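A minimal sketch of that network connector on the old broker, assuming the new broker is reachable at new-broker:61616 (hypothetical host); it goes inside the <broker> element of the old broker's activemq.xml:

<!-- statically forward messages from the old broker to the new 5.11 broker -->
<networkConnectors>
  <networkConnector name="migrate-to-5.11" uri="static:(tcp://new-broker:61616)"/>
</networkConnectors>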
As with all upgrades, skipping a lot of versions makes the upgrade less reliable. I would do a dry-run upgrade of a production replica to ensure everything goes as planned.