Why does haproxy show 2 redis nodes as down even when they are the two redis slave nodes connected to the redis master node? - redis

redis-a is the master and the other 2 are slaves connected to the master, so why is HAProxy showing them as down?
This is how I have set it up in my HAProxy config:
defaults REDIS
    mode tcp
    timeout connect 4s
    timeout server 30s
    timeout client 30s

frontend front_redis
    bind *:3679 name redis
    default_backend back_redis

backend back_redis
    option tcp-check
    tcp-check connect
    tcp-check send AUTH\ redis123\r\n
    tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-a 192.168.0.15:6379 check inter 1s
    server redis-b 192.168.0.14:6379 check inter 1s
    server redis-c 192.168.0.16:6379 check inter 1s
# Redis Block end
Here is the result of redis-cli via HAProxy:
3679> info Replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.5.16,port=6379,state=online,offset=1358919,lag=1
slave1:ip=192.168.5.14,port=6379,state=online,offset=1358919,lag=1
master_replid:5a096bcddd97e297wdww236ae9e6dd3f8df9f7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1359061
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:310486
repl_backlog_histlen:1048576
3679 is the HAProxy port.
My Redis version is 6.0.9.

The two servers 192.168.0.14 and 192.168.0.16 are slaves, so they will never report role:master, which is exactly what your HAProxy health check expects. HAProxy marks them as down by design: that check is how it routes traffic only to the current master. If you also need to reach the slaves through HAProxy, give them a separate backend with its own check for role:slave, as sketched below.
That should do it.
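A minimal sketch of such a read-only backend, assuming the same AUTH password; the port 3680 and the frontend/backend names are placeholders:

frontend front_redis_ro
    bind *:3680 name redis_ro
    default_backend back_redis_ro

backend back_redis_ro
    option tcp-check
    tcp-check connect
    tcp-check send AUTH\ redis123\r\n
    tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    # the only change: expect role:slave instead of role:master
    tcp-check expect string role:slave
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-a 192.168.0.15:6379 check inter 1s
    server redis-b 192.168.0.14:6379 check inter 1s
    server redis-c 192.168.0.16:6379 check inter 1s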

Related

elasticache redis not responding to redis-cli commands

I have set up ElastiCache with Redis and the host is reachable, which I can confirm with telnet, but when Redis commands are issued they do not return any result, e.g. with ubuntu@ip-10-0-2-8:~$ redis-cli -h master.xxxxxx-xxxx.xxxxx.xxxx.cache.amazonaws.com -p 6379 INFO. And, very unfortunately, AWS can't show you Redis logs.
The redis-cli client does not support SSL/TLS connections. To use the
redis-cli to access an ElastiCache for Redis node (cluster mode
disabled) with in-transit encryption, you can use the stunnel package
in your Linux-based clients. The stunnel command can create an SSL
tunnel to Redis nodes specified in the stunnel configuration. After
the tunnel is established, the redis-cli can be used to connect an
in-transit encryption enabled cluster node.
Source: https://aws.amazon.com/premiumsupport/
So you can either use stunnel (a sketch follows) or disable in-transit encryption.
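A minimal stunnel client configuration along those lines; the endpoint hostname is the placeholder from above and the file path is an assumption:

# /etc/stunnel/redis-cli.conf
[redis-cli]
client = yes
accept = 127.0.0.1:6379
connect = master.xxxxxx-xxxx.xxxxx.xxxx.cache.amazonaws.com:6379

After starting stunnel with this file, redis-cli -h 127.0.0.1 -p 6379 should reach the encrypted node through the local tunnel.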
You need to add a firewall rule to allow other machines to access your Redis server, i.e. open port 6379 so it is reachable from outside. The following article will help you to do this; a security-group sketch follows below the link.
Also, please check whether Redis is actually running on port 6379 or on some other port.
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/accessing-elasticache.html#access-from-outside-aws
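For an ElastiCache node, the "firewall" is the security group; a hedged example with the AWS CLI, where the group ID and CIDR are placeholders for your own values:

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 6379 \
    --cidr 10.0.0.0/16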

Can't change the port number

I changed the port number from 6379 to 6380, but Redis still tries to connect through the default port.
It says 'connection refused' and that it couldn't connect to 127.0.0.1:6379. What can I do?
The command redis-cli -p 6380 will not start a Redis server listening on port 6380; it only starts a client.
If you want to change the port you must first kill the Redis instance running on the default port and then locate the redis.conf file.
Edit the lines:
# Accept connections on the specified port, default is 6379.
# If port 0 is specified Redis will not listen on a TCP socket.
port 6379
cluster-config-file nodes-6379.conf
with your new port.
Finally, start Redis with the edited config file:
./redis-server /path/to/redis/redis.conf
Also check whether some process is already using the new port. On macOS, run:
lsof -i :6380
and kill whatever is using that port. Make sure you kill the Redis instance running on 6379 and restart it on 6380 once you're sure that port is free.
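Putting it together, a sketch of the whole sequence (the config path is a placeholder):

redis-cli -p 6379 shutdown nosave   # stop the instance on the old port
lsof -i :6380                       # confirm nothing holds the new port
# edit the port and cluster-config-file lines in redis.conf as shown above, then:
./redis-server /path/to/redis/redis.conf
redis-cli -p 6380 ping              # should reply PONG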

redis behind HAProxy - multiple connection retries and connection closed

The HAProxy configuration for Redis is the following:
frontend redis
    bind *:6379 name port6379
    mode tcp
    timeout client 15s
    # define hosts
    acl redis_3 hdr(host) -i im.test.com
    # figure out which one to use
    use_backend test_redis if redis_3

backend test_redis
    mode tcp
    timeout connect 4s
    timeout server 30s
    #balance leastconn
    option tcplog
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    #tcp-check send QUIT\r\n
    #tcp-check expect string +OK
    server node1_redis 10.146.99.164:6379 check inter 1s
The HAProxy logs show multiple connects to Redis. On the Redis server I see "Connection reset by peer".
In the browser I get a 'reinitialized session'.
But with redis-cli -h <host> PING I get a correct PONG response. Pointing the browser directly at the Redis server on port 6379 gives the following after a short while:
-ERR wrong number of arguments for 'get' command
-ERR unknown command 'Host:'
-ERR unknown command 'User-Agent:'
-ERR unknown command 'Accept:'
-ERR unknown command 'Accept-Language:'
-ERR unknown command 'Accept-Encoding:'
-ERR unknown command 'Cookie:'
-ERR unknown command 'Connection:'
-ERR unknown command 'Upgrade-Insecure-Requests:'
HAProxy stats show the backend is up.
Can anyone help me with this? Why do I get an error when connecting through HAProxy?
Solved it! The host name for some reason was not read in the ACL. The moment I changed it to a default backend, it worked.
Maybe someone can tell me why the hostname in this example did not work? (Most likely: hdr(host) inspects the HTTP Host header, which does not exist in mode tcp, so the ACL can never match and no backend is ever chosen.)
But for me it works now.
Jan
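For reference, a minimal sketch of the working frontend with a default backend (same addresses and names as in the question):

frontend redis
    bind *:6379 name port6379
    mode tcp
    timeout client 15s
    # no Host-header ACL is possible in tcp mode; route straight to the backend
    default_backend test_redis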

Redis Sentinel master not downgraded to slave immediately

I have an architecture with three Redis instances (one master and two slaves) and three Sentinel instances. In front of it there is an HAProxy.
All works well until the master Redis instance goes down. The new master is properly chosen by Sentinel. However, the old master (which is now down) is not reconfigured to be a slave. As a result, when that instance is up again I have two masters for a short period of time (about 11 seconds). After that time that instance which was brought up is properly downgraded to slave.
Shouldn't it work the other way round, i.e. when the master goes down it is downgraded to slave straight away? Then, when it came back up, it would immediately be a slave.
I know that (since Redis 2.8?) there is the CONFIG REWRITE functionality, so the config cannot be modified while the Redis instance is down.
Having two masters for some time is a problem for me, because for that short period HAProxy, instead of sending requests to a single master Redis, load-balances between those two masters.
Is there any way to downgrade the failed master to slave immediately?
Obviously, I changed the Sentinel timeouts.
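The timeouts in question are the usual sentinel.conf knobs; a sketch with illustrative values, using the master name from the logs below:

sentinel monitor redis-ha 127.0.0.1 6379 2
sentinel down-after-milliseconds redis-ha 5000
sentinel failover-timeout redis-ha 10000
sentinel parallel-syncs redis-ha 1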
Here are some logs from Sentinel and Redis instances after the master goes down:
Sentinel
81358:X 23 Jan 22:12:03.088 # +sdown master redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:03.149 # +new-epoch 1
81358:X 23 Jan 22:12:03.149 # +vote-for-leader 6b5b5882443a1d738ab6849ecf4bc6b9b32ec142 1
81358:X 23 Jan 22:12:03.174 # +odown master redis-ha 127.0.0.1 6379 #quorum 3/2
81358:X 23 Jan 22:12:03.174 # Next failover delay: I will not start a failover before Sat Jan 23 22:12:09 2016
81358:X 23 Jan 22:12:04.265 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 # redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:04.265 # +switch-master redis-ha 127.0.0.1 6379 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 # redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 # redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:06.297 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 # redis-ha 127.0.0.1 6381
Redis
81354:S 23 Jan 22:12:03.341 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:03.341 # Error condition on socket for SYNC: Connection refused
81354:S 23 Jan 22:12:04.265 * Discarding previously cached master state.
81354:S 23 Jan 22:12:04.265 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=7 addr=127.0.0.1:57784 fd=10 name=sentinel-6b5b5882-cmd age=425 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=14 qbuf-free=32754 obl=36 oll=0 omem=0 events=rw cmd=exec')
81354:S 23 Jan 22:12:04.265 # CONFIG REWRITE executed with success.
81354:S 23 Jan 22:12:04.371 * Connecting to MASTER 127.0.0.1:6381
81354:S 23 Jan 22:12:04.371 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:04.371 * Non blocking connect for SYNC fired the event.
81354:S 23 Jan 22:12:04.371 * Master replied to PING, replication can continue...
81354:S 23 Jan 22:12:04.371 * Partial resynchronization not possible (no cached master)
81354:S 23 Jan 22:12:04.372 * Full resync from master: 07b3c8f64bbb9076d7e97799a53b8b290ecf470b:1
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: receiving 860 bytes from master
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Flushing old data
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Loading DB in memory
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Finished with success
I was also getting the same error when I wanted to switch the master in a Redis cluster using Sentinel.
+vote-for-leader xxxxxxxxxxxxxxxxxxxxxxxx8989 10495
Next failover delay: I will not start a failover before Fri Aug 2 23:23:44 2019
After resetting Sentinel, the cluster works as expected:
SENTINEL RESET *
or
SENTINEL RESET mymaster
Run the above command on all Sentinel servers.
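For example, against each Sentinel (26379 is the default Sentinel port; mymaster is the monitored master name):

redis-cli -p 26379 SENTINEL RESET mymaster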
In the event a Redis node goes down, when/if it recovers, it will recover with the same role it had prior to going down. The Sentinel cannot reconfigure the node if it is unable to ping it. So, there's a brief period of time between the node coming back up and the Sentinel acknowledging and reconfiguring it. This explains the multi-master state.
If you are set on using HAProxy, one workaround would be to reconfigure the Redis node's role prior to starting the process. Redis will boot as a slave as long as there's a SLAVEOF entry in redis.conf, as sketched below. The primary issue with this workaround is that it doesn't solve network partition scenarios.
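A sketch of that entry, assuming the current master sits at 127.0.0.1:6381 as in the logs above:

# redis.conf of the re-joining node: boot as a slave of the current master
slaveof 127.0.0.1 6381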
Hope that helps.
If using HAProxy you can try to query uptime_in_seconds, something like this:
backend redis
    mode tcp
    balance first
    timeout queue 5s
    default-server check inter 1s fall 2 rise 2 maxconn 100
    option tcp-check
    tcp-check connect
    tcp-check send AUTH\ <secret>\r\n
    tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send info\ server\r\n
    tcp-check expect rstring uptime_in_seconds:\d{2,}
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-1 10.0.0.10:9736
    server redis-2 10.0.0.20:9736
    server redis-3 10.0.0.30:9736
Notice the line:
tcp-check expect rstring uptime_in_seconds:\d{2,}
If the uptime is less than 10 seconds (fewer than two digits), the node will not be marked as up.
Solution
This can be resolved by making use of the rise option in your HAProxy config.
default-server check inter 1s fall 2 rise 30
# OR
server redis-1 127.0.0.1:6379 check inter 1s fall 2 rise 30
This sets the number of successful checks that must pass for a server to be considered UP. As such this can successfully delay a re-joining Redis node from being considered UP and give Sentinel a chance to change the node's role.
Important Trade-off
The trade-off with this approach is that your fail-overs will take longer to be respected by HAProxy, as you are adding an extra delay. This delay applies both to your re-joining node after a failure and to your existing slave nodes that are promoted to role:master. Ultimately you will need to decide which option is better for you: having two masters momentarily, or taking longer to fail over between nodes.
If using HAProxy, a more stable solution would be to check for connected slaves. After a reboot, restart or forced switch, an old master will still have role:master but no slaves connected to it, so the value is zero, unlike on a healthy master:
# faulty old master
role:master
connected_slaves:0
master_failover_state:no-failover
...
# healthy master, for comparison
role:master
connected_slaves:2
slave0:ip=127.0.0.2,port=6379,state=online,offset=507346829,lag=0
slave1:ip=127.0.0.1,port=6379,state=online,offset=507346966,lag=0
I would replace
tcp-check expect string role:master
tcp-check send info\ server\r\n
tcp-check expect rstring uptime_in_seconds:\d{2,}
with
tcp-check expect rstring connected_slaves:[^0]
The full config for me:
listen listen-REDIS
    bind 1.1.1.1:6379
    mode tcp
    no option prefer-last-server
    option tcplog
    balance leastconn
    option tcp-check
    tcp-check send "auth STRING\r\n"
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect rstring connected_slaves:[^0]
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    default-server inter 500ms fall 1 rise 1
    server REDIS01 127.0.0.1:6379 check
    server REDIS02 127.0.0.2:6379 check
    server REDIS03 127.0.0.3:6379 check

HAProxy gateway settings - client and server are on the same subnetwork

I'm trying to set up an HAProxy gateway between server and client as a fully transparent proxy, as in the diagram below. My main aim is to provide load balancing.
There is a simple application that listens on port 25 on the server side. The client tries to connect to port 25 on the gateway machine, and HAProxy on the gateway chooses an available server and then redirects the connection to it.
Network analysis of this approach produces a TCP flow like the diagram: the client resets the connection at the end, since it never sent a SYN packet to the server (only to the gateway).
Is this HAProxy usage correct, and is my problem configuration-related? Or should the client connect to the server directly? (That doesn't make much sense to me, but I'm not sure, actually. If it is true, then how would HAProxy intercept the connection and do the load balancing?)
EDIT:
I've started to think this problem is related to routing and NAT on the gateway. All three machines are in the same subnetwork, but I've added routes to the gateway for both client and server. The rules on the gateway are:
iptables -t mangle -N DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 0x01/0x01
iptables -t mangle -A DIVERT -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A PREROUTING -p tcp --dport 25 -j TPROXY \
--tproxy-mark 0x1/0x1 --on-port 10025
ip route flush table 100
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
Now the question is: what should I do on the gateway to change "syn-ack (src: S, dst: C)" into "syn-ack (src: GW, dst: C)"?
Here is a description that matches my situation:
Here comes the transparent proxy mode: HAProxy can be configured to spoof the client IP address when establishing the TCP connection to the server. That way, the server thinks the connection comes from the client directly (of course, the server must answer back to HAProxy and not to the client, otherwise it can’t work: the client will get an acknowledge from the server IP while it has established the connection on HAProxy‘s IP).
And the answer is to set the ip_nonlocal_bind system control (sysctl), as sketched below.
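A sketch of the two pieces involved (the backend and server names here are placeholders): the sysctl on the gateway, plus HAProxy's source ... usesrc clientip directive that makes HAProxy spoof the client IP:

# on the gateway
sysctl -w net.ipv4.ip_nonlocal_bind=1

# haproxy.cfg
backend smtp_servers
    mode tcp
    source 0.0.0.0 usesrc clientip
    server s1 192.168.1.20:25 check

The server's return traffic must then route back through the gateway (as in the TPROXY rules above), so the SYN-ACK to the client passes through HAProxy rather than going to the client directly.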