I am trying to create a simple redis high availability setup with 1 master, 1 slave and 2 sentinels.
The setup works perfectly when failing over from redis-master to redis-slave.
When redis-master recovers, it correctly registers itself as a slave of the new master, redis-slave.
However, when redis-slave (now the master) goes down, redis-master cannot take over the master role again. The redis-master log goes into a loop showing:
1:S 12 Dec 11:12:35.073 * MASTER <-> SLAVE sync started
1:S 12 Dec 11:12:35.073 * Non blocking connect for SYNC fired the event.
1:S 12 Dec 11:12:35.074 * Master replied to PING, replication can continue...
1:S 12 Dec 11:12:35.075 * Trying a partial resynchronization (request 684581a36d134a6d50f1cea32820004a5ccf3b2d:285273).
1:S 12 Dec 11:12:35.076 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 12 Dec 11:12:36.081 * Connecting to MASTER 10.102.1.92:6379
1:S 12 Dec 11:12:36.081 * MASTER <-> SLAVE sync started
1:S 12 Dec 11:12:36.082 * Non blocking connect for SYNC fired the event.
1:S 12 Dec 11:12:36.082 * Master replied to PING, replication can continue...
1:S 12 Dec 11:12:36.083 * Trying a partial resynchronization (request 684581a36d134a6d50f1cea32820004a5ccf3b2d:285273).
1:S 12 Dec 11:12:36.084 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 12 Dec 11:12:37.087 * Connecting to MASTER 10.102.1.92:6379
1:S 12 Dec 11:12:37.088 * MASTER <-> SLAVE sync started
...
The Replication doc states that:
Since Redis 4.0, when an instance is promoted to master after a
failover, it will be still able to perform a partial resynchronization
with the slaves of the old master.
But the log seems to show otherwise. A more detailed version of the log, covering both the initial redis-master to redis-slave failover and the subsequent redis-slave to redis-master failover, is available here.
Any idea what's going on? What do I have to do to allow redis-master to return to the master role? Configuration details are provided below:
SERVICES
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-master ClusterIP 10.102.1.92 <none> 6379/TCP 11m
redis-slave ClusterIP 10.107.0.73 <none> 6379/TCP 11m
redis-sentinel ClusterIP 10.110.128.95 <none> 26379/TCP 11m
redis-master config
requirepass test1234
masterauth test1234
dir /data
tcp-keepalive 60
maxmemory-policy noeviction
appendonly no
bind 0.0.0.0
save 900 1
save 300 10
save 60 10000
slave-announce-ip redis-master.fp8-cache
slave-announce-port 6379
redis-slave config
requirepass test1234
slaveof redis-master.fp8-cache 6379
masterauth test1234
dir /data
tcp-keepalive 60
maxmemory-policy noeviction
appendonly no
bind 0.0.0.0
save 900 1
save 300 10
save 60 10000
slave-announce-ip redis-slave.fp8-cache
slave-announce-port 6379
It turns out that the problem is related to the use of a host name instead of an IP:
slaveof redis-master.fp8-cache 6379
...
slave-announce-ip redis-slave.fp8-cache
So, when the master came back as a slave, Sentinel showed that there were now 2 slaves: one with the IP address and another with the host name. I'm not sure exactly how these 2 slave entries (which point to the same Redis server) cause the problem above, but since I changed the config to use IP addresses instead of host names, the Redis HA setup has been working flawlessly.
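For reference, a sketch of what the changed lines look like, assuming the ClusterIPs from the services listing above are the addresses you want announced (adjust to your own addresses):
# redis-master config
slave-announce-ip 10.102.1.92
slave-announce-port 6379
# redis-slave config
slaveof 10.102.1.92 6379
slave-announce-ip 10.107.0.73
slave-announce-port 6379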
Related
I have 3 EC2 instances with Redis running like this:
Server 001: 10.0.1.203, Port: 6379
Server 002: 10.0.1.202, Port: 6380
Server 003: 10.0.1.190, Port: 6381
Config file for each one:
# bind 127.0.0.1
protected-mode no
port PORT
pidfile /var/run/redis_PORT.pid
cluster-enabled yes
cluster-config-file nodes-PORT.conf
cluster-node-timeout 15000
I can connect with redis-cli to each instance on each server.
But when I run the cluster creation, the command never finishes on Server 001.
root@ip-10-0-1-203:~/redis-stable# redis-cli --cluster create 10.0.1.203:6379 10.0.1.202:6380 10.0.1.190:6381
>>> Performing hash slots allocation on 3 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: 4c0b7609e5d906ff58d67ab446bbd9e20833e0db 10.0.1.203:6379
slots:[0-5460] (5461 slots) master
M: a5dbd72815a1875b58a0cc0fd6a52dc0b76735b7 10.0.1.202:6380
slots:[5461-10922] (5462 slots) master
M: 14d39c0876a982cadd50f301a3d35715171279c0 10.0.1.190:6381
slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
....................................................................................................................................................................................................................................................................................................................................
Server 002 logs:
44119:M 02 Nov 2020 13:30:03.477 * Ready to accept connections
44119:M 02 Nov 2020 13:30:45.362 # configEpoch set to 0 via CLUSTER RESET HARD
44119:M 02 Nov 2020 13:30:45.362 * Node hard reset, now I'm a5dbd72815a1875b58a0cc0fd6a52dc0b76735b7
44119:M 02 Nov 2020 13:30:59.352 # configEpoch set to 2 via CLUSTER SET-CONFIG-EPOCH
Server 003 logs:
44033:M 02 Nov 2020 13:30:50.695 # configEpoch set to 0 via CLUSTER RESET HARD
44033:M 02 Nov 2020 13:30:50.695 * Node hard reset, now I'm 14d39c0876a982cadd50f301a3d35715171279c0
44033:M 02 Nov 2020 13:30:59.346 # configEpoch set to 3 via CLUSTER SET-CONFIG-EPOCH
Am I missing something in the configuration?
Probably the Redis Cluster bus port is not accessible between the EC2 instances.
From the Redis Cluster Specification:
Every Redis Cluster node has an additional TCP port for receiving incoming connections from other Redis Cluster nodes. This port is at a fixed offset from the normal TCP port used to receive incoming connections from clients. To obtain the Redis Cluster port, 10000 should be added to the normal commands port. For example, if a Redis node is listening for client connections on port 6379, the Cluster bus port 16379 will also be opened.
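A quick way to check from Server 001 is to probe the bus ports of the other nodes and, if they are blocked, open them in the security group. This is only a sketch: it assumes nc is installed, and the security group ID and CIDR are placeholders for your own values.
# probe the cluster bus ports of the other nodes
nc -zv 10.0.1.202 16380
nc -zv 10.0.1.190 16381
# if blocked, allow the bus port range within the subnet (placeholder group ID/CIDR)
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 16379-16381 --cidr 10.0.1.0/24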
I have an architecture with three Redis instances (one master and two slaves) and three Sentinel instances. In front of them there is HAProxy.
All works well until the master Redis instance goes down. The new master is properly chosen by Sentinel. However, the old master (which is now down) is not reconfigured to be a slave. As a result, when that instance is up again I have two masters for a short period of time (about 11 seconds). After that time that instance which was brought up is properly downgraded to slave.
Shouldn't it work in such a way that when the master goes down, it is downgraded to a slave straight away? That way, when it came up again, it would be a slave immediately.
I know that (since Redis 2.8?) there is the CONFIG REWRITE functionality, but the config cannot be modified while the Redis instance is down.
Having two masters for some time is a problem for me because, for that short period, HAProxy load-balances requests between the two masters instead of sending them all to the single master.
Is there any way to downgrade the failed master to slave immediately?
Obviously, I changed the Sentinel timeouts.
Here are some logs from Sentinel and Redis instances after the master goes down:
Sentinel
81358:X 23 Jan 22:12:03.088 # +sdown master redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:03.149 # +new-epoch 1
81358:X 23 Jan 22:12:03.149 # +vote-for-leader 6b5b5882443a1d738ab6849ecf4bc6b9b32ec142 1
81358:X 23 Jan 22:12:03.174 # +odown master redis-ha 127.0.0.1 6379 #quorum 3/2
81358:X 23 Jan 22:12:03.174 # Next failover delay: I will not start a failover before Sat Jan 23 22:12:09 2016
81358:X 23 Jan 22:12:04.265 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 # redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:04.265 # +switch-master redis-ha 127.0.0.1 6379 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 # redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 # redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:06.297 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 # redis-ha 127.0.0.1 6381
Redis
81354:S 23 Jan 22:12:03.341 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:03.341 # Error condition on socket for SYNC: Connection refused
81354:S 23 Jan 22:12:04.265 * Discarding previously cached master state.
81354:S 23 Jan 22:12:04.265 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=7 addr=127.0.0.1:57784 fd=10 name=sentinel-6b5b5882-cmd age=425 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=14 qbuf-free=32754 obl=36 oll=0 omem=0 events=rw cmd=exec')
81354:S 23 Jan 22:12:04.265 # CONFIG REWRITE executed with success.
81354:S 23 Jan 22:12:04.371 * Connecting to MASTER 127.0.0.1:6381
81354:S 23 Jan 22:12:04.371 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:04.371 * Non blocking connect for SYNC fired the event.
81354:S 23 Jan 22:12:04.371 * Master replied to PING, replication can continue...
81354:S 23 Jan 22:12:04.371 * Partial resynchronization not possible (no cached master)
81354:S 23 Jan 22:12:04.372 * Full resync from master: 07b3c8f64bbb9076d7e97799a53b8b290ecf470b:1
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: receiving 860 bytes from master
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Flushing old data
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Loading DB in memory
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Finished with success
I was also getting the same error when I wanted to switch the master in a Redis cluster using Sentinel.
+vote-for-leader xxxxxxxxxxxxxxxxxxxxxxxx8989 10495
Next failover delay: I will not start a failover before Fri Aug 2 23:23:44 2019
After resetting Sentinel, the cluster works as expected. Reset with:
SENTINEL RESET *
or
SENTINEL RESET mymaster
Run the above command on all Sentinel servers.
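For example, against each Sentinel (the default Sentinel port is 26379; the host placeholder and the master name should be replaced with your own):
redis-cli -h <sentinel-host> -p 26379 SENTINEL RESET mymaster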
In the event a Redis node goes down, when/if it recovers, it will recover with the same role it had prior to going down. The Sentinel cannot reconfigure the node if it is unable to ping it. So, there's a brief period of time between the node coming back up and the Sentinel acknowledging and reconfiguring it. This explains the multi-master state.
If you are set on using HAProxy, one workaround would be to reconfigure the Redis node's role prior to starting the process. Redis will boot as a slave as long as there is a slaveof entry in redis.conf. The primary issue with this workaround is that it doesn't solve network partition scenarios.
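A minimal sketch of that workaround, assuming a Sentinel is reachable on 127.0.0.1:26379 and the master name is redis-ha as in the logs above:
# ask Sentinel for the current master address (redis-cli prints ip and port on separate lines when piped)
MASTER_ADDR=$(redis-cli -p 26379 SENTINEL get-master-addr-by-name redis-ha)
MASTER_HOST=$(echo "$MASTER_ADDR" | sed -n '1p')
MASTER_PORT=$(echo "$MASTER_ADDR" | sed -n '2p')
# ensure the node boots as a slave of the current master, then start it
# (a real script would replace any existing slaveof line instead of appending)
echo "slaveof $MASTER_HOST $MASTER_PORT" >> /etc/redis/redis.conf
redis-server /etc/redis/redis.conf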
Hope that helps.
If using HAProxy, you can try to query uptime_in_seconds with something like this:
backend redis
mode tcp
balance first
timeout queue 5s
default-server check inter 1s fall 2 rise 2 maxconn 100
option tcp-check
tcp-check connect
tcp-check send AUTH\ <secret>\r\n
tcp-check expect string +OK
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect string role:master
tcp-check send info\ server\r\n
tcp-check expect rstring uptime_in_seconds:\d{2,}
tcp-check send QUIT\r\n
tcp-check expect string +OK
server redis-1 10.0.0.10:9736
server redis-2 10.0.0.20:9736
server redis-3 10.0.0.30:9736
Notice the:
tcp-check expect rstring uptime_in_seconds:\d{2,}
If the uptime is less than 10 seconds (fewer than two digits), the node will not be added.
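To see what the health check sees, you can run the same INFO query by hand; the host, port, and password below are taken from the config above and should be adjusted to your setup:
redis-cli -h 10.0.0.10 -p 9736 -a <secret> INFO server | grep uptime_in_seconds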
Solution
This can be resolved by making use of the rise option in your HAProxy config.
default-server check inter 1s fall 2 rise 30
# OR
server redis-1 127.0.0.1:6379 check inter 1s fall 2 rise 30
This sets the number of successful checks that must pass for a server to be considered UP. As such this can successfully delay a re-joining Redis node from being considered UP and give Sentinel a chance to change the node's role.
Important Trade-off
The trade-off with this approach is that your failovers will take longer to be respected by HAProxy, as you are adding an extra delay. This delay applies both to a node re-joining after a failure and to existing slave nodes that are promoted to role:master. Ultimately you will need to decide which option is better for you: having 2 masters momentarily, or taking longer to fail over between nodes.
If using HAProxy, a more stable solution would be to check for connected slaves. After a reboot, restart, or forced switch, an old master will still report role:master but will have no slaves connected, so the value is zero.
# faulty old master
role:master
connected_slaves:0
master_failover_state:no-failover

# healthy master, for comparison
role:master
connected_slaves:2
slave0:ip=127.0.0.2,port=6379,state=online,offset=507346829,lag=0
slave1:ip=127.0.0.1,port=6379,state=online,offset=507346966,lag=0
master_failover_state:no-failover
...
I would replace
tcp-check expect string role:master
tcp-check send info\ server\r\n
tcp-check expect rstring uptime_in_seconds:\d{2,}
with
tcp-check expect rstring connected_slaves:[^0]
Total config for me.
listen listen-REDIS
bind 1.1.1.1:6379
mode tcp
no option prefer-last-server
option tcplog
balance leastconn
option tcp-check
tcp-check send "auth STRING\r\n"
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect rstring connected_slaves:[^0]
tcp-check send QUIT\r\n
tcp-check expect string +OK
default-server inter 500ms fall 1 rise 1
server REDIS01 127.0.0.1:6379 check
server REDIS02 127.0.0.2:6379 check
server REDIS03 127.0.0.3:6379 check
I'm trying to set up a typical Redis Sentinel configuration, with three machines that each run a Redis server and a Redis Sentinel. The master/slave part of the Redis servers is working OK, but the Sentinels are not. When I start two Sentinels, the Sentinel on the master's machine detects the slaves, but marks them as down after the specified amount of time. I'm running Redis 3.0.5 64-bit on Debian Jessie machines.
8319:X 22 Dec 14:06:17.855 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8319:X 22 Dec 14:06:17.855 # Sentinel runid is cdd5bbd5b84c876982dbca9d45ecc4bf8500e7a2
8319:X 22 Dec 14:06:17.855 # +monitor master mymaster xxxxxxxx0 6379 quorum 2
8319:X 22 Dec 14:06:18.857 * +slave slave xxxxxxxx2:6379 xxxxxxx2 6379 # mymaster xxxxxxx0 6379
8319:X 22 Dec 14:06:18.858 * +slave slave xxxxxx1:6380 xxxxxxx1 6380 # mymaster xxxxxxx0 6379
8319:X 22 Dec 14:07:18.862 # +sdown slave xxxxxxxx1:6380 xxxxxxx1 6380 # mymaster xxxxxx0 6379
8319:X 22 Dec 14:07:18.862 # +sdown slave xxxxxx2:6379 xxxxxxx2 6379 # mymaster xxxxxx0 6379
Sentinel config file:
daemonize yes
pidfile "/var/run/redis/redis-sentinel.pid"
logfile "/var/log/redis/redis-sentinel.log"
bind 127.0.0.1 xxxxxxx0
port 26379
sentinel monitor mymaster xxxxxxx0 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
dir "/var/lib/redis"
Of course, there is connectivity between these machines, as the slaves are working OK:
7553:S 22 Dec 13:46:33.285 * Connecting to MASTER xxxxxxxx0:6379
7553:S 22 Dec 13:46:33.286 * MASTER <-> SLAVE sync started
7553:S 22 Dec 13:46:33.286 * Non blocking connect for SYNC fired the event.
7553:S 22 Dec 13:46:33.287 * Master replied to PING, replication can continue...
7553:S 22 Dec 13:46:33.288 * Partial resynchronization not possible (no cached master)
7553:S 22 Dec 13:46:33.291 * Full resync from master: f637ca8fe003acd09c6d021aed3f89a0d9994c9b:98290
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: receiving 18 bytes from master
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: Flushing old data
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: Loading DB in memory
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: Finished with success
7553:S 22 Dec 14:01:33.072 * 1 changes in 900 seconds. Saving...
I can answer this myself. The problem was that the first IP on the bind line of the Sentinel conf was the localhost IP; the binding (external) IP needs to come first. Just in case it helps anyone.
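In other words, for the config above the fix is simply to reorder the bind line so the external IP comes first (keeping the xxxxxxx0 placeholder from the question):
bind xxxxxxx0 127.0.0.1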
I have installed a Redis (v. 3.0.4) master-slave setup using 3 nodes (1 master and 2 slaves) with requirepass on each node, as described in https://www.digitalocean.com/community/tutorials/how-to-configure-a-redis-cluster-on-ubuntu-14-04, and then started a Sentinel on each of the 3 nodes as described in http://blog.commando.io/redis-is-easy-trivial-hard/
After I took down the master, Sentinel promoted one of the slaves to master as expected. Then, when the old master came up again, it became a slave and recognized the new master; this could be seen in /etc/redis/sentinel.conf, which was updated with the new master IP in the 'sentinel monitor redis-cluster' attribute.
But I have noticed that the old master, despite knowing the new master's IP, considers the new master to be down, unlike the other slave, which sees it as up. This can be checked by running this command against the old master:
$redis-cli -a altoros info replication
# Replication
role:slave
master_host: new master ip
master_port:6379
master_link_status:down
This also seems to be causing the following error "MASTERDOWN Link with MASTER is down and slave-serve-stale-data is set to 'no'", when trying to use a synchronous client for testing data replication over nodes.
The log of the old master (/var/log/redis/redis-server.log) shows:
20731:S 09 Nov 10:16:31.117 * Connecting to MASTER <new master ip>:6379
20731:S 09 Nov 10:16:31.117 * MASTER <-> SLAVE sync started
20731:S 09 Nov 10:16:31.118 * Non blocking connect for SYNC fired the event.
20731:S 09 Nov 10:16:31.118 * Master replied to PING, replication can continue...
20731:S 09 Nov 10:16:31.119 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
20731:S 09 Nov 10:16:31.119 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
This looks like the old master cannot authenticate to the new master because it doesn't have its password, but how do I set that properly?
I have noticed that /etc/redis/redis.conf did not change after the new master was promoted, unlike /etc/redis/sentinel.conf, and this could be the cause, since the old master's redis.conf doesn't contain the new master's password.
Would appreciate any hint to resolve the issue, thanks in advance.
The master needs to be configured just like a slave because it might become one someday. As such, you need to set its masterauth to the password for the pod.
You can do this without restarting by running the following against the "old master":
redis-cli -h oldmasterip -a thepassword config set masterauth thepassword
redis-cli -h oldmasterip -a thepassword config rewrite
And it should be fine from that point, and the config file will be updated.
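One way to verify is to re-run the INFO check from the question and confirm the replication link comes back up once authentication works:
redis-cli -h oldmasterip -a thepassword info replication | grep master_link_status
# expected once the slave can authenticate: master_link_status:up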
For now we're trying out Redis 2.8.7 as cache storage (from a .NET web application using the BookSleeve client).
It seems to be a very interesting and exciting task at the moment, and the Redis documentation is very nice, but due to a lack of real practical experience I have a couple of questions about how the intended configuration should be set up properly.
I took the following articles as my main configuration sources:
Installing redis with autostart capability (using an init script, so that after a restart everything will start again properly) : http://redis.io/topics/quickstart
Deployment of the redis into azure: http://haishibai.blogspot.com/2014/01/walkthrough-setting-up-redis-cluster-on.html
The initial idea/assumption is to have 1 Redis master and 2 slave instances running on Ubuntu Linux. In order to provide high availability of the instances, I've decided to use Sentinel. So my expected configuration looks like this at the moment:
MasterInstance: VM1 (linux, Ubuntu), port : 6379 (autostarted when linux is restarted)
Slave1: VM2 (linux, ubuntu), port : 6380 (autostarted when linux is restarted) : slaveOf MasterID 6379
Slave2: VM3 (linux, ubuntu), port : 6379 (autostarted when linux is restarted) : slaveOf MasterIP 6379
After the VMs started, I can see that I've got 2 slaves successfully connected to and syncing with the master:
Trace sample from the master:
[1120] 25 Mar 14:11:18.629 - 1 clients connected (0 slaves), 793352 bytes in use
[1120] 25 Mar 14:11:18.634 * Slave asks for synchronization
[1120] 25 Mar 14:11:18.634 * Full resync requested by slave.
[1120] 25 Mar 14:11:18.634 * Starting BGSAVE for SYNC
[1120] 25 Mar 14:11:18.634 * Background saving started by pid 1227
[1227] 25 Mar 14:11:18.810 * DB saved on disk
[1227] 25 Mar 14:11:18.810 * RDB: 0 MB of memory used by copy-on-write
[1120] 25 Mar 14:11:18.836 * Background saving terminated with success
[1120] 25 Mar 14:11:18.837 * Synchronization with slave succeeded
[1120] 25 Mar 14:11:23.829 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:23.829 - DB 2: 4 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:23.829 - 0 clients connected (1 slaves), 1841992 bytes in use
[1120] 25 Mar 14:11:29.011 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:29.011 - DB 2: 4 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:29.011 - 0 clients connected (1 slaves), 1841992 bytes in use
[1120] 25 Mar 14:11:29.826 - Accepted 168.62.36.189:1024
[1120] 25 Mar 14:11:29.828 * Slave asks for synchronization
[1120] 25 Mar 14:11:29.828 * Full resync requested by slave.
[1120] 25 Mar 14:11:29.828 * Starting BGSAVE for SYNC
[1120] 25 Mar 14:11:29.828 * Background saving started by pid 1321
[1321] 25 Mar 14:11:29.871 * DB saved on disk
[1321] 25 Mar 14:11:29.871 * RDB: 0 MB of memory used by copy-on-write
[1120] 25 Mar 14:11:29.943 * Background saving terminated with success
[1120] 25 Mar 14:11:29.946 * Synchronization with slave succeeded
[1120] 25 Mar 14:11:34.195 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:34.195 - DB 2: 4 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:34.195 - 0 clients connected (2 slaves), 1862920 bytes in use
Now I need to set up the Sentinel instances...
I copied sentinel.conf from the redis-stable package onto the 3 VMs running Redis (1 master and both slaves).
Inside each config I made the following modification:
sentinel monitor mymaster MasterPublicIP 6379 2
On each VM I started Sentinel using the following command line:
redis-server /etc/redis/sentinel.conf --sentinel
After that I got a response that Sentinel started successfully... on all VMs.
After I started all 3 Sentinel instances I got the following trace sample (the sentinel.conf files were updated with information about the slaves and the other Sentinel instances):
[1743] 25 Mar 16:35:46.450 # Sentinel runid is 05380d689af9cca1e826ce9c85c2d68c65780878
[1743] 25 Mar 16:35:46.450 # +monitor master mymaster MasterIP 6379 quorum 2
[1743] 25 Mar 16:36:11.578 * -dup-sentinel master mymaster MasterIP 6379 #duplicate of 10.119.112.41:26379 or 83666bdd03fd064bcf2ec41ec2134d4e1e239842
[1743] 25 Mar 16:36:11.578 * +sentinel sentinel 10.119.112.41:26379 10.119.112.41 26379 # mymaster 168.62.41.1 6379
[1743] 25 Mar 16:36:16.468 # +sdown sentinel 10.175.220.134:26379 10.175.220.134 26379 # mymaster 168.62.41.1 6379
[1743] 25 Mar 16:36:40.876 * -dup-sentinel master mymaster MasterIP 6379 #duplicate of 10.175.220.134:26379 or fe9edeb321e04070c6ac6e28f52c05317a593ffd
[1743] 25 Mar 16:36:40.876 * +sentinel sentinel 10.175.220.134:26379 10.175.220.134 26379 # mymaster 168.62.41.1 6379
[1743] 25 Mar 16:37:10.962 # +sdown sentinel 10.175.220.134:26379 10.175.220.134 26379 # mymaster 168.62.41.1 6379
Based on the trace sample, I have the following questions. It would be great if someone could clarify them:
Why do I see the -dup-sentinel master mymaster entries here? Is it because I added 3 Sentinels for the same master instance (maybe I need to register 1 Sentinel per Redis instance, so that 1 Sentinel is mapped to the master and the 2 other Sentinels to the 2 slaves)?
How do I start the Sentinels the way the Redis servers are started (automatically, even when the VM is restarted)? Do I need to perform the same actions and register them as ordinary redis-server instances?
Is it OK to host a Sentinel instance on the same VM as redis-server?
After that I opened a new PuTTY connection and started redis-cli to work with the Sentinel APIs, but received the following response to my command:
127.0.0.1:6379> SENTINEL masters
(error) ERR unknown command 'SENTINEL'
I guess I've done something stupid here... :(
What have I done wrong, and how do I test the Sentinel APIs from a terminal connection?
Thank you in advance for any help.
I guess "SENTINEL masters" should be run on the Redis sentinel
redis-cli -p 26379 (which the default sentinel port)
then issue
127.0.0.1:26379> SENTINEL masters
and you will get something like:
1) "name"
2) "mymaster"
3) "ip"
4) "127.0.0.1"
5) "port"
6) "6379"
...
To start Sentinels automatically, even when the VM is restarted:
first set daemonize yes in sentinel.conf,
and modify the init script here (https://github.com/antirez/redis/blob/unstable/utils/redis_init_script) to reflect the sentinel port and .conf location.
$EXEC $CONF --sentinel # starting in Sentinel mode
and the rest is like what you did for the Redis server.
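A sketch of the variables to change in that init script for Sentinel (the paths below are assumptions; match your own install):
REDISPORT=26379
EXEC=/usr/local/bin/redis-server
CONF="/etc/redis/sentinel.conf"
PIDFILE="/var/run/redis-sentinel.pid"
# start line inside the script:
$EXEC $CONF --sentinel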
First, you don't run Sentinel on the master. Sentinel is designed to detect when the master fails. If you run Sentinel on the same system as the master, you will lose a Sentinel when you lose the system. For the same reasons you shouldn't use the slaves as your additional test points.
You want to run Sentinel from where the clients run - to ensure you are testing for network outages.
Next, you mention you added slave information to your sentinel configs. You don't configure slaves in sentinel - it discovers them through the master. I suspect you've added additional sentinel monitor commands for each slave - this would indeed cause duplicate monitoring attempts.
Third, as @yofpro mentioned, to run Sentinel commands you need to connect to Sentinel, not to the Redis master or slaves.
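To illustrate the discovery point above: a minimal sentinel.conf sketch only declares the master; slaves and the other Sentinels are discovered automatically (the master IP is a placeholder, and the timeout values are just examples):
port 26379
sentinel monitor mymaster <master-ip> 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000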