Redis cluster performing frequest failover between master and slave - redis

WE have a redis cluster with 4 master and 4 slave. The 4 master are on one physical host and the slaves are on other physical host. WE have observed a frequest auto failvoer happening between the master and slave even though the servers are up and running(suspecting the network glitch here). Once they are failed over the new master CPU is going high and is throwing the redis Server exception (as attached screenshot) disconnecting the clients. Below are the cluster node details:
6adb459bc1cda0ae002109140d04015c531c6910 10.10.52.38:6379 slave 0060ee610b3a52bf88a0202aff0ce63039354578 0 1555648709383 58 connected
46f38129c861ff775badc67cc869493ee28fd166 10.10.52.44:6379 slave 19538764e5cde1014f1fd35afbf1af3a217de7b4 0 1555648708378 66 connected
1833427e42afa74273aa33696ed7e5f80f40e244 10.10.52.40:6379 master - 0 1555648706367 63 connected 6827-9556 15018-16383
a0f14e54f18e1a04c448cd09e851459863e929b0 10.10.52.42:6379 slave d190c341144350bf9dbad67841104ed75ccbdcdc 0 1555648710388 65 connected
0060ee610b3a52bf88a0202aff0ce63039354578 10.10.52.37:6379 master - 0 1555648704357 58 connected 2730-5460 13653-15017
b702bbcddb6a39e2deb8567804fa5d4468fbe5cc 10.10.52.39:6379 slave 1833427e42afa74273aa33696ed7e5f80f40e244 0 1555648708879 63 connected
d190c341144350bf9dbad67841104ed75ccbdcdc 10.10.52.41:6379 master - 0 1555648710385 65 connected 9557-13652
19538764e5cde1014f1fd35afbf1af3a217de7b4 10.10.52.43:6379 myself,master - 0 0 66 connected 0-2729 5461-6826
Also the Cluster config details:
1) "cluster-node-timeout"
2) "15000"
3) "cluster-migration-barrier"
4) "1"
5) "cluster-slave-validity-factor"
6) "10"
7) "cluster-require-full-coverage"
8) "yes"
Attached is the log from one of the slave: https://pastebin.com/GS8ChyeH
Is the configuration correct? How do we prevent this from happening?

Related

Cannot add values to Redis cluster - The cluster is down

I have 4 nodes, 3 are master and 1 of them is a slave. I am trying to add a simple string by set foo bar, but whenever i do it, i get this error:
(error) CLUSTERDOWN The cluster is down
Below is my cluster info
127.0.0.1:7000cluster info
cluster_state:fail
cluster_slots_assigned:11
cluster_slots_ok:11
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:4
cluster_size:3
cluster_current_epoch:3
cluster_my_epoch:3
cluster_stats_messages_sent:9262
cluster_stats_messages_received:9160
I am using Redis-x64-3.0.503. please let me know how to solve this
Cluster Nodes:
87982f22cf8fb12c1247a74a2c26cdd1b84a3b88 192.168.2.32:7000 slave bc1c56ef4598fb4ef9d26c804c5fabd462415f71 1492000375488 1492000374508 3 connected
9527ba919a8dcfaeb33d25ef805e0f679276dc8d 192.168.2.35:7000 master - 1492000375488 1492000374508 2 connected 16380
ff999fd6cbe0e843d1a49f7bbac1cb759c3a2d47 192.168.2.33:7000 master - 1492000375488 1492000374508 0 connected 16381
bc1c56ef4598fb4ef9d26c804c5fabd462415f71 127.0.0.1:7000 myself,master - 0 0 3 connected 1-8 16383
Just to add up and simplify what #neuront said.
Redis stores data in hash slots. For this to understand you need to know how Hasmaps or Hashtables work. For our reference here redis has a constant of 16384 slots to assign and distribute to all the masters servers.
Now if we look at the node configuration you posted and refer it with redis documentation, you'll see the end numbers mean the slots assigned to the masters.
In your case this is what it looks like
... slave ... connected
... master ... connected 16380
... master ... connected 16381
... master ... connected 1-8 16380
So all the machines are connected to form up the cluster but not all the hash slots are assigned to store the information. It should've been something like this
... slave ... connected
... master ... connected 1-5461
... master ... connected 5462-10922
... master ... connected 10923-16384
You see, now we are assigning the range of all the hash slots like the documentation says
slot: A hash slot number or range. Starting from argument number 9, but there may be up to 16384 entries in total (limit never reached). This is the list of hash slots served by this node. If the entry is just a number, is parsed as such. If it is a range, it is in the form start-end, and means that the node is responsible for all the hash slots from start to end including the start and end values.
For specifically in your case, when you store some data with a key foo, it must've been assigned to some other slot not registered in the cluster.
Since you're in Windows, you'll have to manually setup the distribution. For that you'll have to do something like this (this is in Linux, translate to windows command)
for slot in {0..5400}; do redis-cli -h master1 -p 6379 CLUSTER ADDSLOTS $slot; done;
taken from this article
Hope it helped.
Only 11 slots were assigned so your cluster is down, just like the message tells you. The slots are 16380 at 192.168.2.35:7000, 16381 at 192.168.2.33:7000 and 1-8 16383 at 127.0.0.1:7000.
Of couse the direct reason is that you need to assign all 16384 slots (0-16383) to the cluster, but I think it was caused by a configuration mistake.
You have a node with a localhost address 127.0.0.1:7000. However, 192.168.2.33:7000 is also a 127.0.0.1:7000, while 192.168.2.35:7000 is also a 127.0.0.1:7000. This localhost address problem makes a node cannot tell itself from another node, and I think this causes the chaos.
I suggest you reset all the nodes by cluster reset commands and re-create the cluster again, and make sure you are using their 192.168.*.* address this time.
#user1829319 Here goes the windows equivalents for add slots:
for /l %s in (0, 1, 8191) do redis-cli -h 127.0.0.1 -p 6379 CLUSTER ADDSLOTS %s
for /l %s in (8192, 1, 16383) do redis-cli -h 127.0.0.1 -p 6380 CLUSTER ADDSLOTS %s
you should recreate your cluster by doing flush all and cluster reset and in next cluster setup make sure you verify all slots has been assigned to masters or not using > cluster slots

redis master slave replication - missing data on the slave

Problem
I have a situation where data that I created on the master doesn't seem to have been replicated to my slaves properly.
Master Redis DB Setup Info
I have a master running on 10.1.1.1. The configuration is set to "SAVE" to disk. Here's a snippet from the configuration file:
save 900 1
save 300 10
save 60 10000
When I run a scan command against the hash in question, here are the results (which are correct):
127.0.0.1:6379> scan 0 match dep:*
1) "13"
2) 1) "dep:+19999999999_00:00_00:00"
2) "dep:+19999999999_19:00_25:00"
3) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379>
Slave 1 Setup
Slave 1 has been set up to run in memory only. So in the configuration file, all the "save" options have been commented out.
Here's the data I have in slave 1: (missing a record)
127.0.0.1:6379> scan 0 match dep:*
1) "15"
2) 1) "dep:+19999999999_00:00_00:00"
2) "dep:+19999999999_19:00_25:00"
127.0.0.1:6379>
When I run the "info" command on this slave, this is what I get back: (only picked specific items that I thought might pertain to this problem)
# Replication
role:slave
master_host:10.1.1.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:346292
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
#Stats
expired_keys:0
#Persistence
aof_enabled:0
Slave 2 Setup
Slave 2 is also supposed to be an in memory data store only. So all the save options in the config file have also been commented out like so:
#save 900 1
#save 300 10
#save 60 10000
This is the data I have on slave 2 (notice that it's missing data, but different records from slave 1)
127.0.0.1:6379> scan 0 match dep:*
1) "3"
2) 1) "dep:+19999999999_00:00_00:00"
2) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379>
Some of the results from the info command:
# Replication
role:slave
master_host:10.1.1.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:346754
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
#Stats
expired_keys:0
#Persistence
aof_enabled:0
This is my first crack at using REDIS, so I'm sure it's something simple that I've missed.
I haven't tried to restart REDIS on the slaves just yet, because I don't want to lose any artifacts that might help me troubleshoot / understand how I got myself here in the first place.
Any suggestions would be appreciated.
EDIT 1
In checking the logs on slave 2, this is what I found:
4651:S 27 Sep 18:39:27.197 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
4651:S 27 Sep 18:39:27.197 # Server started, Redis version 3.0.5
4651:S 27 Sep 18:39:27.197 * The server is now ready to accept connections on port 6379
4651:S 27 Sep 18:39:27.198 * Connecting to MASTER 10.1.1.1:6379
4651:S 27 Sep 18:39:27.198 * MASTER <-> SLAVE sync started
4651:S 27 Sep 18:40:28.284 # Timeout connecting to the MASTER...
4651:S 27 Sep 18:40:28.284 * Connecting to MASTER 10.1.1.1:6379
4651:S 27 Sep 18:40:28.284 * MASTER <-> SLAVE sync started
4651:S 27 Sep 18:41:29.369 # Timeout connecting to the MASTER...
4651:S 27 Sep 18:41:29.369 * Connecting to MASTER 10.1.1.1:6379
4651:S 27 Sep 18:41:29.369 * MASTER <-> SLAVE sync started
4651:S 27 Sep 18:42:00.452 * Non blocking connect for SYNC fired the event.
4651:S 27 Sep 18:42:00.453 * Master replied to PING, replication can continue...
4651:S 27 Sep 18:42:00.453 * Partial resynchronization not possible (no cached master)
4651:S 27 Sep 18:42:00.463 * Full resync from master: b46c3622e4ef4c5586ebd2ec23eabcb04c3fcf32:1
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: receiving 173 bytes from master
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: Flushing old data
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: Loading DB in memory
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: Finished with success
How do redis slaves recover when there is a time out connecting to the master? I'm also wondering what this error means "Partial resynchronization not possible (no cached master)".
Currently googling... But if you have any comments, please feel free
EDIT 2
Here's another really interesting find (at least for me).
I just added a new item the master, like so:
127.0.0.1:6379> HMSET dep:+19999999999_15:00_18:45:00 ext 2222 dd me.net days "fri"
OK
127.0.0.1:6379> scan 0 match dep:*
1) "13"
2) 1) "dep:+19999999999_00:00_00:00"
2) "dep:+19999999999_19:00_25:00"
3) "dep:+19999999999_15:00_18:45:00"
4) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379>
And now, when i check slave one again, it still only has 2 records, but its dropped a record that used to be there, and replaced it with the new i just added:
127.0.0.1:6379> scan 0 match dep:*
1) "7"
2) 1) "dep:+19999999999_00:00_00:00"
2) "dep:+19999999999_15:00_18:45:00"
127.0.0.1:6379>
EDIT 3
From the answer below, it looks like the first number returned by the SCAN command is a position in the cursor... And in reading the documentation I can specify a count indicating the number of records to return.
But this still raises some questions for me. For example, in line with the answer below, I tried the following SCAN command on a slave:
127.0.0.1:6379> scan 0 match dep:*
1) "7"
2) 1) "dep:+19999999999_00:00_00:00"
2) "dep:+19999999999_15:00_18:45:00"
127.0.0.1:6379> scan 7 match dep:*
1) "0"
2) 1) "dep:+19999999999_19:00_25:00"
2) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379>
This makes sense to me... it seems to be returning 2 records at a time ( still need to figure out how I can change this default)
According to this post - Redis scan count: How to force SCAN to return all keys matching a pattern? - , I can use the "count" keyword to indicate how many records to return.
But in order to get all 4 of the records I have, I had to run several queries before the cursor value came back as a zero... and i dont' know why. For example:
127.0.0.1:6379> scan 0 match dep:* count 3
1) "10"
2) 1) "dep:+19999999999_00:00_00:00"
127.0.0.1:6379> scan 10 match dep:* count 3
1) "3"
2) (empty list or set)
127.0.0.1:6379> scan 3 match dep:* count 3
1) "7"
2) 1) "dep:+19999999999_15:00_18:45:00"
127.0.0.1:6379> scan 7 match dep:* count 3
1) "0"
2) 1) "dep:+19999999999_19:00_25:00"
2) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379>
Why didn't the first request return 3 records? In my mind, at most, I should have had to run this scan command 2 times.
Can you explain what's happening here?
Also, maybe I shouldn't be using the scan command in my node js REST API? Imagine that a user will make a request for widget information... and I need to query this hash to find the key. It feels like this type of iteration would be very inefficient. The KEYS command will work too, but as per the docs,I shouldn't be using that in production because it will affect performance.
Any comments / insights would be appreciated.
You haven't iterate all keys from Redis instance.
In order to do a full iteration, you should continue sending the SCAN command to Redis with the returned cursor, until the returned cursor is 0.
In your last example:
127.0.0.1:6379> scan 0 match dep:*
1) "7" <---- returned cursor
2) 1) "dep:+19999999999_00:00_00:00"
2) "dep:+19999999999_15:00_18:45:00"
127.0.0.1:6379>
// here, you need continue sending scan command with the returned cursor, i.e. 7
127.0.0.1:6379> scan 7 match dep:*
// then you can get more results from Redis
// If the full iteration is finished, it should return something like this:
1) "0" <----- this means the full iteration is finished
2) 1) "dep:more result"
2) "dep:last result"
Edit
The count number for SCAN command is just a hint. There's no guarantee that Redis should return exactly count number results (see the doc for more details).
In order to get all keys in one shot, you can use the KEYS command. However, just as you mentioned, it's NOT a good idea (it might block Redis for a long time), and that's why Redis has this SCAN command to get all keys in iteration.
Both SCAN and KEYS commands traverse the whole key space to find the match. So if the data set is very large, they both take a long time to get/iterate all keys.
From your problem description, I think you should store your data in Redis' HASH structure, and use HKEYS, HGETALL or HSCAN to get the data:
hset dep:+19999999999 00:00_00:00:00 value
hset dep:+19999999999 15:00_18:45:00 value
hkeys dep:+19999999999 <----- get all fields in this hash
hgetall dep:+19999999999 <----- get all fields and values in this hash
hscan dep:+19999999999 0 <----- scan the hash to key fields
This should be much more efficient than traverse the whole key space. Especially, if there's not too much fields in the hash, HKEYS and HGETALL can get all keys/data in one shot, and very fast. However, if there's too much fields in the hash, you still need to use HSCAN to do iterations.

Redis server console output clarification?

I'm looking at the redis output console and I'm trying to understand the displayed info :
(didn't find that info in the quick guide)
So redis-server.exe outputs this :
/*1*/ [2476] 24 Apr 11:46:28 # Open data file dump.rdb: No such file or directory
/*2*/ [2476] 24 Apr 11:46:28 * The server is now ready to accept connections on port 6379
/*3*/ [2476] 24 Apr 11:42:35 - 1 clients connected (0 slaves), 1188312 bytes in use
/*4*/ [2476] 24 Apr 11:42:40 - DB 0: 1 keys (0 volatile) in 4 slots HT.
Regarding line #1 - what does the dump.rdb file is used for ? is it the data itself ?
what is the [2476] number ? it is not a port since line #2 tells port is 6379
What does (0 slaves) means ?
in line #3 - 1188312 bytes used - but what is the max value so i'd know overflows ...? is it for whole databases ?
Line #3 What does (0 volatile) means ?
Line #4 - why do i have 4 slots HT ? I have no data yet
[2476] - process ID
dump.rdb - redis can persist data by snapshoting, dump.rdb is the default file name http://redis.io/topics/persistence
0 slaves - redis can work in master-slave mode, 0 slaves informs you that there are no slave servers connected
1188312 bytes in use - total number of bytes allocated by Redis using its allocator
0 volatile - redis can set keys with expiration time, this is the count of them
4 slots HT - current hash table size, initial table size is 4, as you add more items hash table will grow

mount and loop0 preempting user processes for long time

I created some processes in user space and tried to visualize its working in kernelshark with the trace recorded using trace-cmd. But processes like the ones shown below are preempting my processes with real-time priority 98 as long as 4 seconds. These kind of processes don't even have a real-time priority
PID PPID CLS PRI RTPRIO CMD
359 1 TS 19 - mount.ntfs /dev/disk/by-uuid/AC72763B72760A7A /root
366 2 TS 39 - [loop0]
What can I do to run my processes uninterrupted by these kind of kernel processes?

LVS: All connections are InActConn

All connections are InActConn
I'm a newbie using LVS. I've tried LVS/TUN and LVS/DR, the result is the same, all connections are InActConn. But the realservers can be reach (through PING). Pls help!!!
OS: CentOS 6.2
RemoteAddress:Port Forward Weight ActiveConn InActConn
UDP 192.168.10.240:2345 rr
-> 192.168.10.251:2345 Tunnel 1 0 10
-> 192.168.10.252:2345 Tunnel 1 0 9
-> 192.168.10.253:2345 Tunnel 1 0 9
This is the expected behavior for services not maintaining connections, like UDP. You may want to read the LVS Howto, especially the part about Active/Inactive connections :
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.ipvsadm.html#ActiveConn
Old Question : But I got to this post from Google and want to paste my findings here.
In the above answer, the link pasted by #remi-ggacogne missed 1 step for Real server.
You have to turn rp_filter off (esp. in Centos / RHEL ) https://www.slashroot.in/linux-kernel-rpfilter-settings-reverse-path-filtering
Open /etc/sysctl.conf and paste below lines ( as per your network interface )
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.tunl0.rp_filter = 0
To make the above active -->
$systcl -p