What could cause an OpenLDAP replica to skip some items during synchronization - ldap

I have an OpenLDAP cluster with 6 nodes. When an entry is added or deleted on the master, synchronization kicks in and the change is replicated to the slave nodes. Sometimes, however, one of the slaves (the same node every time) misses the update, so that slave drifts out of sync with the master and the other slaves, and requests routed to it can return stale results.
The problematic slave's LDAP logs show no errors around the operation that would explain the miss, so I cannot figure out what is causing the problem. Taking that slave down and re-adding it does not help either.
Has anyone faced a similar problem and figured out the cause?

I posted the question to the OpenLDAP project as well, and they suggested upgrading: the version I am running, 2.4.44, is a couple of years old, and many replication-related fixes have gone in since then.
Here is the OpenLDAP issue for reference:
https://bugs.openldap.org/show_bug.cgi?id=9701
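One way to confirm (and monitor) the divergence is to compare the contextCSN of the replicated suffix on the provider and on each consumer; a replica that misses updates will show a lagging CSN. Below is a minimal sketch, assuming the Python ldap3 library and hypothetical hostnames, credentials and base DN:

```python
# Compare contextCSN values across the provider and consumers to spot a
# replica that has silently fallen behind. Hostnames, base DN and credentials
# are placeholders - adjust for your environment.
from ldap3 import Server, Connection, BASE

BASE_DN = "dc=example,dc=com"            # replicated suffix (assumption)
NODES = [
    "ldap-master.example.com",           # provider first
    "ldap-replica1.example.com",
    "ldap-replica2.example.com",
]

def context_csn(host):
    """Read the contextCSN operational attribute from one node."""
    conn = Connection(Server(host), user="cn=admin,dc=example,dc=com",
                      password="secret", auto_bind=True)
    conn.search(BASE_DN, "(objectClass=*)", search_scope=BASE,
                attributes=["contextCSN"])
    csns = sorted(str(v) for v in conn.entries[0].contextCSN.values)
    conn.unbind()
    return csns

reference = context_csn(NODES[0])
for host in NODES[1:]:
    csns = context_csn(host)
    status = "in sync" if csns == reference else "OUT OF SYNC"
    print(f"{host}: {status} {csns}")
```

If the lagging replica's contextCSN never catches up even though nothing is logged, that is consistent with the syncrepl fixes after 2.4.44 mentioned in the issue above.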

Related

Handle io.lettuce.core.RedisReadOnlyException when network is partitioned

I have a setup where I use Sentinel to look up the current Redis master: one Redis master, three slaves and three Sentinel nodes. This works fine in most situations, but I have found that if a network split isolates the current master and the Sentinel node configured first in my list of Sentinel nodes from the other nodes, the remaining two Sentinels elect a new master, as intended.
My problem is that when the isolated former master rejoins the common network and is reconfigured as a slave, my application is never notified that a new master has been elected. It keeps writing to the slave, still believing it is the master, and ends up with "Error in execution; nested exception is io.lettuce.core.RedisReadOnlyException: READONLY You can't write against a read only slave."
I do not know whether this is a Redis problem or a framework problem. When a node is reconfigured from master to slave, should Redis terminate the connection, as happens under normal circumstances when a new master is elected, or should the framework handle the exception and query Sentinel for the current master?
One more interesting aspect: if the Sentinel node configured first in the list remains isolated, this behavior persists even if the application accessing Redis is restarted.
Is there any mechanism to handle this situation, or is this a bug or an enhancement request for the framework?
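The usual client-side mitigation is to resolve the master through the Sentinels rather than caching a single address, and to re-resolve after a READONLY error. The question is about Lettuce (Java); purely as an illustration of the idea, here is a minimal sketch with Python's redis-py, using hypothetical hostnames and "mymaster" as the service name:

```python
# Resolve the current master through Sentinel instead of pinning one address,
# and re-resolve when a write hits a node that has been demoted to replica.
from redis.sentinel import Sentinel
from redis.exceptions import ReadOnlyError

SENTINELS = [("sentinel1", 26379), ("sentinel2", 26379), ("sentinel3", 26379)]
sentinel = Sentinel(SENTINELS, socket_timeout=0.5)

def safe_set(key, value, service_name="mymaster"):
    master = sentinel.master_for(service_name, socket_timeout=0.5)
    try:
        master.set(key, value)
    except ReadOnlyError:
        # The node we wrote to was demoted after a failover; ask the
        # Sentinels again for the current master and retry once.
        master = sentinel.master_for(service_name, socket_timeout=0.5)
        master.set(key, value)

safe_set("greeting", "hello")
```

Whether Lettuce can be configured to do the equivalent re-resolution automatically is exactly the framework-vs-Redis question raised above.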

Best Practice to Upgrade Redis with Sentinels?

I have three Redis nodes watched by three Sentinels. I've searched around and the documentation seems unclear about how best to upgrade a configuration of this type. I'm currently on version 3.0.6 and I want to upgrade to the latest 5.0.5. I have a few questions about the procedure.
Is it OK to jump two major versions? I did this in our staging environment and it seemed fine. We use fairly basic Redis functionality, and there are no breaking changes between the versions.
Does order matter? Should I upgrade, say, all the Sentinels first and then the Redis nodes, or should the Sentinel plane come last, after verifying the Redis plane? Should I do one Sentinel/Redis node at a time?
Any advice or experience on this would be appreciated.
I am surprised by the lack of response to this, but I understand that the subject straddles the line between Stack Overflow and the sysadmin-oriented Stack Exchange sites. I'm also surprised at how little documentation I was able to find on the subject.
I did some extensive testing in a staging environment and then proceeded to production, and the procedure I followed seemed to work for the most part.
Upgrading from 3.0.6 to 5.0.5 worked without a hitch in our case. As I said in the original post, we use the basics of Redis, and not much has changed from the client's perspective.
I upgraded in this order:
1. The first two Sentinel peers, then the Sentinel currently holding leader status.
2. Each of the Redis nodes listed as slaves (now known as replicas). After each node is upgraded, it will want to resync its dump.rdb from the master. A 5.x node can sync from a 3.x master, but once a 5.x node is the master, a 3.x node cannot sync from it, so once you have failed over to an upgraded node you cannot go back to the earlier version.
3. Finally, use the Sentinels to fail over to an upgraded node as the new master and upgrade the former master (a small script for driving this step is sketched below).
Hopefully someone might find this useful going forward.
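To make the last step concrete, the failover and the "is the old master demoted yet?" check can be scripted. A minimal sketch, assuming redis-py, hypothetical hostnames, and "mymaster" as the Sentinel service name:

```python
# Ask a Sentinel to fail over to one of the already-upgraded replicas, then
# watch INFO replication on the old master until it reports role:slave.
import time
import redis

sentinel = redis.Redis(host="sentinel1", port=26379)
old_master = redis.Redis(host="redis1", port=6379)

# Sanity check before failing over: the replicas should be connected.
assert old_master.info("replication")["connected_slaves"] >= 1

# SENTINEL FAILOVER forces a failover as if the master were unreachable.
sentinel.execute_command("SENTINEL", "FAILOVER", "mymaster")

while old_master.info("replication")["role"] != "slave":
    time.sleep(1)
print("old master demoted - safe to upgrade it now")
```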

What happens if the MASTER node is inaccessible to clients in the "Replicated LevelDB Store" in ActiveMQ?

The documentation for the "Replicated LevelDB Store" says:
The elected master broker node starts and accepts client connections. The other nodes go into slave mode and connect to the master and synchronize their persistent state /w it. The slave nodes do not accept client connections. All persistent operations are replicated to the connected slaves. If the master dies, the slaves with the latest update gets promoted to become the master. The failed node can then be brought back online and it will go into slave mode.
So one elected master exists, it accepts client connections, and the rest are replicating slave nodes that do not accept client connections. Fine.
If the master dies, it all works fine: a new master gets elected, clients disconnect and eventually connect to the new master. Awesome.
Now what happens if the master isn't dead from ZooKeeper's perspective, but is simply NOT ACCESSIBLE from the clients? A master is chosen and is considered live (as I understand it, ZooKeeper only needs to be able to connect to it for it to count as available), but the actual clients can't connect to it.
Sure, clients CAN connect to the other slave nodes, they just can't connect to the master. But the master will never be changed, since it is still live. Is that how it works?
I'm not sure I understood it right.
LevelDB support in ActiveMQ is deprecated and has been for quite some time (years), so I'd suggest not bothering with it, as it is unsupported and there are plenty of open bugs that will never be fixed.
I'd suggest taking a look instead at ActiveMQ Artemis.
You understand it right, and it's a reasonable design.
Clients only communicate with the master, and the slaves are only used for backup. If what you describe really happens, it is most likely caused by a network problem, and you should fix the network (or whatever else turns out to be the cause).
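For the "alive for ZooKeeper but unreachable for clients" case, the election only reflects what ZooKeeper can see, so on the client side about all you can do is verify reachability of each broker's transport port from the client network and treat a mismatch as the network problem the answer describes. A minimal sketch with plain Python sockets, using hypothetical hostnames and the default OpenWire port 61616:

```python
# Probe each broker's transport port from the client network. A broker that
# ZooKeeper still considers live can show up as unreachable here, which is
# exactly the partition described in the question.
import socket

BROKERS = [("broker1", 61616), ("broker2", 61616), ("broker3", 61616)]

for host, port in BROKERS:
    try:
        with socket.create_connection((host, port), timeout=2):
            print(f"{host}:{port} reachable from this client")
    except OSError as exc:
        print(f"{host}:{port} NOT reachable: {exc}")
```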

ElastiCache not utilizing read only replica

I have a simple Redis ElastiCache cluster (cluster mode disabled) with a master node and a read only replica.
When throwing traffic at the cluster, e.g. with redis-benchmark, it seems all GET traffic goes only to the master node, while the read-only replica gets zero GET traffic (its cache hit/miss and GetTypeCommands metrics are all 0).
Does anyone have insight into why this is happening? I expected the traffic to be distributed between the two nodes.
This is an older question, but I am answering it since I am just learning this myself...
I also thought the purpose was to balance the load between master and slave, but that is not the case. The slave exists so that it can be promoted to master if the master fails for any reason.
further reading: https://redis.io/topics/replication
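To add to that: with cluster mode disabled, ElastiCache does not spread reads for you. The primary endpoint always resolves to the master, so redis-benchmark (or any client) pointed at it will never touch the replica; reads only land on the replica if the client explicitly targets the reader/replica endpoint. A minimal sketch with Python's redis-py and placeholder endpoint names:

```python
# Writes go to the primary endpoint; reads are sent to the reader endpoint,
# which resolves to the read-only replica. Endpoint names are placeholders.
import redis

writer = redis.Redis(host="mycache.xxxxxx.ng.0001.use1.cache.amazonaws.com", port=6379)
reader = redis.Redis(host="mycache-ro.xxxxxx.ng.0001.use1.cache.amazonaws.com", port=6379)

writer.set("counter", 42)       # SET is served by the primary
print(reader.get("counter"))    # this GET shows up in the replica's metrics
# Note: replication is asynchronous, so a read immediately after a write
# may return a stale value.
```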

DC/OS Mesos-Master rejoined and causes interruptions on the master agents

I'm having a strange issue today. Everything was still working fine yesterday when I left the office, but when I came back to work today my DC/OS dashboard showed that there weren't any services running or nodes connected.
I've run into this issue once or twice before, and it was related to Marathon not being able to elect a leader. One of the 3 master nodes also shows a lot of errors in the journal. This can be resolved by stopping/starting the dcos-marathon service on that host, which brings it back into the Marathon group.
After that I could see the nodes and services again. But now the dashboard sometimes tells me there is only one node connected, then 3 again, then just 1 again, and so on.
When I stop the dcos-mesos-master process on the conflicting host, the flapping stops and I have a stable master cluster (though probably not a really resilient one).
It looks like the failing node keeps trying to become the master, which causes this. I've tried to find out how to rejoin a failed mesos-master, but came up empty.
I'm running DC/OS on a CoreOS environment.
Although you describe the general behavior, you may need to provide more specifics, such as the kernel version, DC/OS version, hardware specs, etc. The simplest answer I can give, based on what has been provided, is to reach out via their support channel on Slack (https://dcos-community.slack.com/).
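One quick way to see whether leadership really is flapping is to poll each master's state endpoint and watch which node it reports as leader. A minimal sketch, assuming the Python requests library, placeholder master addresses, and direct access to the standard Mesos /master/state endpoint on port 5050 (a DC/OS setup may require authentication through Admin Router instead):

```python
# Poll each Mesos master and print which node it currently believes is the
# leader. If the reported leader keeps changing, the election is flapping.
import time
import requests

MASTERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # placeholder addresses

for _ in range(10):
    for host in MASTERS:
        try:
            state = requests.get(f"http://{host}:5050/master/state", timeout=2).json()
            print(f"{host} sees leader: {state.get('leader')}")
        except requests.RequestException as exc:
            print(f"{host} unreachable: {exc}")
    time.sleep(5)
```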