I have a Spring Cloud microservices application spanning 4 server types: a security gateway, two UI servers and a REST API server. Each of these will run on its own VM in the production environment: 4 instances of the REST server and 2 instances of each of the other servers.
The system is expected to serve around 30,000 users.
The service discovery is provided by Eureka. I have two Eureka servers for failover.
The shared HTTP session is provided by Spring Session and Spring Data Redis, using the @EnableRedisHttpSession annotation on the participating servers.
I decided to go with a 3-VM setup for Redis ("Example 2: basic setup with three boxes" at this URL: http://redis.io/topics/sentinel).
Each VM will run a Redis server and a Redis Sentinel process (one of the Redis servers will be the master, the other two will be slaves).
This all works great on development machines and System Test machines, mostly running all processes on the same server.
I am now moving towards running performance tests in production-like environments with multiple VMs. I would like some feedback and recommendations from developers who already run similar Spring Cloud setups in production:
What edge cases should I look for?
Are there any recommended configuration settings? My setup is shown below.
Are there configuration settings that might work well in testing environments but become serious issues in production environments?
In my specific scenario, I would also like a solution that purges old data from Redis, since Redis exists here only to store session information. If for some reason Spring does not clean up the session data on session expiration (for example because the server was killed abruptly), I would like some cleanup of the really old data. I read about the LRU/caching eviction mechanism in Redis, but it does not seem to offer any time-based guarantee; it only kicks in when a certain data size is reached.
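For reference, this is how I sanity-check that Spring Session is actually writing sessions with an expiration into Redis (the spring:session: prefix is the Spring Session default; the session id below is just a placeholder, and KEYS is fine on a test box but should be avoided on a busy production instance):
redis-cli -p 6379 KEYS "spring:session:*"
# pick one of the spring:session:sessions:<session-id> keys returned above
redis-cli -p 6379 TTL "spring:session:sessions:<session-id>"
# a positive TTL means Redis itself will eventually delete the key, even if the owning server died abruptly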
Here is the configuration of my master Redis server. The slaves are pretty much the same, just with different ports and a slaveof directive pointing at the master:
daemonize no
port 6379
dbfilename "dump6379.rdb"
dir "/Users/odedia/Work/Redis/6379"
pidfile "/Users/odedia/Work/Redis/redis6379.pid"
#logfile "/Users/odedia/Work/Redis/redis6379.log"
tcp-backlog 511
timeout 0
tcp-keepalive 60
loglevel notice
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
slave-serve-stale-data yes
slave-read-only no
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events "gxE"
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
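Before moving on to the Sentinel configuration, I verify replication from the master with plain redis-cli (a simple sanity check, nothing Spring-specific; ports are the ones from my setup above):
redis-cli -p 6379 INFO replication
# expected on the master: role:master and connected_slaves:2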
Here is a Redis sentinel configuration:
port 5000
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 5000
sentinel config-epoch mymaster 59
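To confirm the three Sentinels agree on the topology, I query one of them directly; these are standard SENTINEL subcommands, shown here for my mymaster setup:
redis-cli -p 5000 SENTINEL get-master-addr-by-name mymaster
redis-cli -p 5000 SENTINEL slaves mymaster
redis-cli -p 5000 SENTINEL sentinels mymaster   # should list the other two sentinels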
And here is the application.yml for the Eureka server:
server:
  port: 1111
eureka:
  instance:
    hostname: localhost
  client:
    serviceUrl:
      defaultZone: https://${eureka.instance.hostname}:${server.port}/eureka/
    registerWithEureka: false  # Don't register yourself with yourself...
    fetchRegistry: false
  server:
    waitTimeInMsWhenSyncEmpty: 0
spring:
  application:
    name: eureka
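Once the Eureka server is up, I check which instances have registered by hitting its REST endpoint (a quick sketch; in production the hostname will obviously not be localhost, and use https if the server is actually behind TLS as the defaultZone above suggests):
curl http://localhost:1111/eureka/apps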
And here is the application.yml for the gateway server, which is responsible for the Zuul-based routing:
# Spring properties
spring:
  application:
    name: gateway-server  # Service registers under this name
  redis:
    sentinel:
      master: mymaster
      nodes: 127.0.0.1:5000,127.0.0.1:5001,127.0.0.1:5002
server:
  port: 8080
security:
  sessions: ALWAYS
zuul:
  retryable: true  # Always retry before failing
  routes:
    ui1-server: /ui1/**
    ui2-server: /ui2/**
    api-resource-server: /rest/**
# Discovery Server Access
eureka:
  client:
    serviceUrl:
      defaultZone: https://localhost:1111/eureka/
  instance:
    hostname: localhost
    metadataMap:
      instanceId: ${spring.application.name}:${spring.application.instance_id:${random.value}}
hystrix:
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 40000  # Timeout after this time in milliseconds
ribbon:
  ConnectTimeout: 5000   # Try to connect to the endpoint for 5 seconds
  ReadTimeout: 50000     # Try to get a response for 50 seconds after a successful connection
  # Max number of retries on the same server (excluding the first try)
  MaxAutoRetries: 1
  # Max number of next servers to retry (excluding the first server)
  MaxAutoRetriesNextServer: 2
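After the gateway starts, I verify that it registered under the gateway-server name by querying Eureka for that specific application (Eureka stores application names upper-cased; adjust the scheme and hostname to your environment):
curl http://localhost:1111/eureka/apps/GATEWAY-SERVER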
I wrote an article following my experience in production with Spring Data Redis; it is available here for those interested:
https://medium.com/@odedia/production-considerations-for-spring-session-redis-in-cloud-native-environments-bd6aee3b7d34
Related
I have configured redis-sentinel with one master and two slaves.
Let's call this setup of three machines a cluster.
I have a lot of clusters running on a lot of Docker containers.
At runtime I manage the IPs in the redis.conf and sentinel.conf files.
My problem is:
The master node of cluster-1 somehow became a slave of the master of cluster-2.
On the Cluster-1 master node I killed the Redis and Sentinel services, removed slaveof <cluster-2 master ip> 6379 and then restarted the Redis service with the edited conf file.
The moment I start the Redis service, it again becomes a slave of the Cluster-2 master.
I tried slaveof no one from inside redis-cli, but within seconds the node turns into a slave again.
All this is happening without even starting the Sentinel service.
What is happening here? Are there other entries that I would have to delete?
redis.conf
bind 0.0.0.0
protected-mode no
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile "/var/run/redis_6379.pid"
loglevel notice
logfile "/var/log/redis.log"
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
dir "/"
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
slaveof 192.168.60.38 6379 #this comes back again and again
For anyone who faces the same problem.
Sentinel is designed to automatically detect other Sentinels on the same network.
So the Cluster-1 and Cluster-2 Sentinels were all able to reach each other.
The Sentinel of Cluster-1 became disloyal (pun intended) and rewrote the Redis configuration of Cluster-1, making it a slave of the Cluster-2 master.
Possible solutions:
1. Use a unique password for each Redis setup; the requirepass directive in the Redis configuration will be used (see the sketch after this list).
2. Block traffic between the different Redis clusters.
3. Don't use Sentinel at all.
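A minimal sketch of option 1, with a made-up password per cluster; the same value goes into every redis.conf of that cluster (masterauth so slaves can still replicate) and, if you keep Sentinel, into its sentinel.conf as well (the master name is a placeholder for whatever your sentinel monitor line uses):
# redis.conf on every node of Cluster-1
requirepass cluster1-secret
masterauth cluster1-secret
# sentinel.conf on every Sentinel of Cluster-1
sentinel auth-pass <cluster-1 master name> cluster1-secret
After fixing the configs, reset the hijacked node once from redis-cli with SLAVEOF NO ONE followed by CONFIG REWRITE, so the change is persisted back to the conf file on disk.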
I'm testing redis failover with this simple setup:
3 Ubuntu Server 16.04 boxes
redis and redis-sentinel are configured on each box.
Master ip : 192.168.0.18
Resque ip : 192.168.0.16
Resque2 ip : 192.168.0.13
Data replication works well but I can't get failover to work.
When I start redis-sentinel I always get a +sdown message after 60 seconds:
14913:X 17 Jul 10:40:03.505 # +monitor master mymaster 192.168.0.18 6379 quorum 2
14913:X 17 Jul 10:41:03.525 # +sdown master mymaster 192.168.0.18 6379
This is the configuration file for redis-sentinel:
bind 192.168.0.18
port 16379
sentinel monitor mymaster 192.168.0.18 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 6000
loglevel verbose
logfile "/var/log/redis/sentinel.log"
repl-ping-slave-period 5
slave-serve-stale-data no
repl-backlog-size 8mb
min-slaves-to-write 1
min-slaves-max-lag 10
The bind directive uses the proper IP for each box.
I followed the redis tutorial here: https://redis.io/topics/sentinel but I can't get the failover to work.
Redis server version: 3.2.9
The issue is all about how redis-sentinel works: Sentinel cannot handle a password-protected redis-server.
In your redis-server configuration file (/etc/redis/redis.conf), do not use the requirepass directive if you want to use redis-sentinel.
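A quick way to confirm whether a password is still configured on the master from the question (192.168.0.18), sketched with redis-cli; the placeholder is whatever password is currently set:
redis-cli -h 192.168.0.18 -p 6379 CONFIG GET requirepass
# a NOAUTH error here already tells you a password is in place
redis-cli -h 192.168.0.18 -p 6379 -a <current-password> CONFIG SET requirepass ""
redis-cli -h 192.168.0.18 -p 6379 CONFIG REWRITE   # persist the change back to redis.conf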
I initially formed a Redis cluster with 3 masters and 3 slaves on my local machine.
Now I want to use twemproxy in front of the Redis cluster, so I used the lines below as my config file and implemented twemproxy with the Redis cluster.
The problem I am facing is that out of the 100 keys I sent to port 22122, only 30-40 end up registered in the Redis cluster.
Please help!
beta:
  listen: **.**.**.***:22122
  hash: fnv1a_64
  hash_tag: "{}"
  distribution: ketama
  auto_eject_hosts: false
  timeout: 400
  redis: true
  servers:
    - **.**.**.***:8006:3 server1
    - **.**.**.***:8007:2 server2
    - **.**.**.***:8008:1 server3
You can't use twemproxy and Redis Cluster together, because they both try to shard keys across a cluster. Pick one and use just that one.
That said, this isn't really a Stack Overflow question, because it is about administration; try any further related questions on Server Fault.
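As a side note, if you drop twemproxy and talk to the cluster directly, redis-cli's -c flag follows the MOVED redirections for you, so every key lands on its proper shard (port 8006 is taken from the twemproxy config above; the key and value are just examples):
redis-cli -c -p 8006 SET somekey somevalue
redis-cli -c -p 8006 GET somekey   # redirected to the owning master if needed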
I am trying to implement a Redis cluster with 6 machines.
I have a Vagrant cluster of six machines:
192.168.56.101
192.168.56.102
192.168.56.103
192.168.56.104
192.168.56.105
192.168.56.106
all running redis-server
I edited the /etc/redis/redis.conf file on all of the above servers, adding this:
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-slave-validity-factor 0
appendonly yes
I then ran this on one of the six machines:
./redis-trib.rb create --replicas 1 192.168.56.101:6379 192.168.56.102:6379 192.168.56.103:6379 192.168.56.104:6379 192.168.56.105:6379 192.168.56.106:6379
A Redis cluster is up and running. I checked manually by setting a value on one machine; it shows up on the other machines.
$ redis-cli -p 6379 cluster nodes
3c6ffdddfec4e726f29d06a6da550f94d976f859 192.168.56.105:6379 master - 0 1450088598212 5 connected
47d04bc98ab42fc793f9f382855e5c54ab8f2e20 192.168.56.102:6379 slave caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 0 1450088598716 7 connected
040d4bb6a00569fc44eec05440a5fe0796952ccf 192.168.56.101:6379 myself,slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 0 4 connected
caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 192.168.56.104:6379 master - 0 1450088599720 7 connected 0-10922
d78293d0821de3ab3d2bca82b24525e976e7ab63 192.168.56.106:6379 slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 1450088599316 8 connected
5318e48e9ef0fc68d2dc723a336b791fc43e23c8 192.168.56.103:6379 master - 0 1450088599218 8 connected 10923-16383
My problem is that when I shut down or stop redis-server on any one machine that is a master, the whole cluster goes down, but if all three slaves die the cluster still works properly.
What should I do so that a slave becomes a master if a master fails (fault tolerance)?
I am under the assumption that Redis handles all those things and I need not worry about it after deploying the cluster. Am I right, or would I have to do things myself?
Another question: let's say I have six machines with 16 GB of RAM each. How much total data would I be able to handle on this Redis cluster with three masters and three slaves?
Thank you.
The setting cluster-slave-validity-factor 0 may be the culprit here.
From redis.conf:
# A slave of a failing master will avoid to start a failover if its data
# looks too old.
In your setup, the slave of the terminated master considers itself unfit to be elected master, since the time it last contacted the master is greater than the computed value of:
(node-timeout * slave-validity-factor) + repl-ping-slave-period
Therefore, even with a redundant slave, the cluster state is changed to DOWN and becomes unavailable.
You can try a different value, for example the suggested default:
cluster-slave-validity-factor 10
This will ensure that the cluster is able to tolerate one random Redis instance failure (it can be a slave or a master instance).
For your second question: six machines with 16 GB of RAM each will be able to function as a Redis Cluster of 3 master instances and 3 slave instances, so the theoretical maximum is 16 GB x 3 = 48 GB of data. Such a cluster can tolerate at most ONE node failure if cluster-require-full-coverage is turned on; otherwise it may still be able to serve data from the shards that remain available on the functioning instances.
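A quick way to verify the setting and then watch a failover happen, using the node IPs from the question (this is only a sketch; change the value in /etc/redis/redis.conf on every node and restart, since it is a cluster-wide behaviour):
redis-cli -h 192.168.56.101 -p 6379 CONFIG GET cluster-slave-validity-factor
redis-cli -h 192.168.56.104 -p 6379 SHUTDOWN NOSAVE   # kill one of the masters
redis-cli -h 192.168.56.101 -p 6379 CLUSTER NODES     # its slave should be promoted to master after node-timeout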
I'm successfully using Redis for Windows (2.6.8-pre2) in a master slave setup. However, I need to provide some automated failover capability, and it appears the sentinel is the most popular choice. When I run redis in sentinel mode the sentinel connects, but it always thinks the master is down. Also, when I run the sentinel master command it reports that there are 0 slaves (not true) and that there are no other sentinels (again, not true). So it's like it connects to the master, but not correctly.
Has anyone else seen this issue on Windows and, more importantly, is anyone successfully using sentinel in a windows environment? Any help or direction at all is appreciated!
I recommend using this:
1 master node Redis server
1 slave node Redis server
3 Redis Sentinels with a quorum of 2
It's important to have at least 3 Sentinels, so that you get an odd quorum.
I made this configuration in Windows 7 and it's working well.
Example of sentinel conf:
port 20001
logfile "sentinel1.log"
sentinel monitor shard1 127.0.0.1 16379 2
sentinel down-after-milliseconds shard1 5000
sentinel failover-timeout shard1 30000
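For completeness, this is roughly how I start and check one of the Sentinels on Windows; the file name sentinel1.conf and running from the Redis install directory are assumptions, adjust to your layout:
redis-server.exe sentinel1.conf --sentinel
redis-cli.exe -p 20001 SENTINEL master shard1   # should report num-slaves 1 and num-other-sentinels 2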