I am trying to test a very simple setup with Redis and Twemproxy but I can't find a way to make it faster.
I have 2 redis servers that I run with bare minimum configuration:
./redis-server --port 6370
./redis-server --port 6371
Both were compiled from source and run on a single machine with plenty of memory and CPUs.
If I run a redis-benchmark in one of the instances I get the following:
./redis-benchmark --csv -q -p 6371 -t set,get,incr,lpush,lpop,sadd,spop -r 100000000
"SET","161290.33"
"GET","176366.86"
"INCR","170940.17"
"LPUSH","178571.42"
"LPOP","168350.17"
"SADD","176991.16"
"SPOP","168918.92"
Now I would like to use Twemproxy in front of the two instances to distribute the requests and get a higher throughput (at least this is what I expected!).
I used the following configuration for Twemproxy:
my_cluster:
  listen: 127.0.0.1:6379
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: false
  redis: true
  servers:
   - 127.0.0.1:6371:1 server1
   - 127.0.0.1:6372:1 server2
And I run nutcracker as:
./nutcracker -c twemproxy_redis.yml -i 5
The results are very disappointing:
./redis-benchmark -r 1000000 --csv -q -p 6379 -t set,get,incr,lpush,lpop,sadd,spop
"SET","112485.94"
"GET","113895.21"
"INCR","110987.79"
"LPUSH","145560.41"
"LPOP","149700.61"
"SADD","122100.12"
I tried to understand what is going on by looking at Twemproxy's statistics, like this:
telnet 127.0.0.1 22222
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
{
  "service": "nutcracker",
  "source": "localhost.localdomain",
  "version": "0.4.1",
  "uptime": 10,
  "timestamp": 1452545028,
  "total_connections": 303,
  "curr_connections": 3,
  "my_cluster": {
    "client_eof": 300,
    "client_err": 0,
    "client_connections": 0,
    "server_ejects": 0,
    "forward_error": 0,
    "fragments": 0,
    "server1": {
      "server_eof": 0,
      "server_err": 0,
      "server_timedout": 0,
      "server_connections": 1,
      "server_ejected_at": 0,
      "requests": 246791,
      "request_bytes": 11169484,
      "responses": 246791,
      "response_bytes": 1104215,
      "in_queue": 0,
      "in_queue_bytes": 0,
      "out_queue": 0,
      "out_queue_bytes": 0
    },
    "server2": {
      "server_eof": 0,
      "server_err": 0,
      "server_timedout": 0,
      "server_connections": 1,
      "server_ejected_at": 0,
      "requests": 353209,
      "request_bytes": 12430516,
      "responses": 353209,
      "response_bytes": 2422648,
      "in_queue": 0,
      "in_queue_bytes": 0,
      "out_queue": 0,
      "out_queue_bytes": 0
    }
  }
}
Connection closed by foreign host.
Is there any other benchmark around that works properly? Or should redis-benchmark have worked here?
I forgot to mention that I am using Redis 3.0.6 and Twemproxy 0.4.1.
It might seem counter-intuitive, but putting two instances of redis with a proxy in front of them will certainly reduce performance!
In a single instance scenario, redis-benchmark connects directly to the redis server, and thus has minimal latency per request.
Once you put two instances behind a single twemproxy, think about what happens: you connect to twemproxy, which parses the request, chooses the right instance, and forwards the request to it.
So, first of all, each request now has two network hops to travel instead of one. Added latency means less throughput of course.
Also, you are using just one twemproxy instance. Assuming twemproxy itself performs more or less like a single redis instance, you can never beat a single instance with a single proxy in front of it.
Twemproxy facilitates scaling out, not scaling up. It allows you to grow your cluster to sizes that a single instance could never achieve. But there's a latency price to pay, and as long as you're using a single proxy, it's also a throughput price.
The proxy imposes a small tax on each request. Measure throughput using the proxy with one server. Impose a load until the throughput stops growing and the response times slow to a crawl. Add another server and note the response times are restored to normal, while capacity just doubled. Of course, you'll want to add servers well before response times start to crawl.
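To make the scale-out point concrete, here is a rough sketch, not a tuned setup: the second config file twemproxy_redis_2.yml (listening on 127.0.0.1:6380, same two backends) is hypothetical. The aggregate throughput of the two benchmark runs is what grows as you add proxies and servers.
# hypothetical second pool, same two redis backends, listening on 127.0.0.1:6380
./nutcracker -c twemproxy_redis_2.yml -i 5 &
# drive load through both proxies at the same time and add up the two results
./redis-benchmark --csv -q -p 6379 -t set,get -r 100000000 &
./redis-benchmark --csv -q -p 6380 -t set,get -r 100000000 &
wait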
Here are some tests and results I have run against the redis-benchmark tool.
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 JSON.SET fooz . [9999]
JSON.SET fooz . [9999]: 93049.23 requests per second
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 evalsha 8d2d42f1e3a5ce869b50a2b65a8bfaafe8eff57a 1 fooz [5555]
evalsha 8d2d42f1e3a5ce869b50a2b65a8bfaafe8eff57a 1 fooz [5555]: 61132.17 requests per second
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 eval "return redis.call('JSON.SET', KEYS[1], '.', ARGV[1])" 1 fooz [5555]
eval return redis.call('JSON.SET', KEYS[1], '.', ARGV[1]) 1 fooz [5555]: 57423.41 requests per second
That is a significant drop in performance for something that is supposed to have a performance advantage: running the logic in a server-side script versus the client issuing the command itself.
From client to EVALSHA = 34% performance loss
From EVALSHA to EVAL = 6% performance loss
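(In other words: (93049.23 - 61132.17) / 93049.23 ≈ 34%, and (61132.17 - 57423.41) / 61132.17 ≈ 6%.)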
The results are similar for a non-JSON SET command:
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 set fooz 3333
set fooz 3333: 116414.43 requests per second
C02YLCE2LVCF:Downloads xxxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 evalsha e32aba8d03c97f4418a8593ed4166640651e18da 1 fooz [2222]
evalsha e32aba8d03c97f4418a8593ed4166640651e18da 1 fooz [2222]: 78520.67 requests per second
I first noticed this when I ran INFO commandstats and observed the poorer performance of the EVALSHA command:
# Commandstats
cmdstat_ping:calls=331,usec=189,usec_per_call=0.57
cmdstat_eval:calls=65,usec=4868,usec_per_call=74.89
cmdstat_del:calls=2,usec=21,usec_per_call=10.50
cmdstat_ttl:calls=78,usec=131,usec_per_call=1.68
cmdstat_psync:calls=51,usec=2515,usec_per_call=49.31
cmdstat_command:calls=5,usec=3976,usec_per_call=795.20
cmdstat_scan:calls=172,usec=1280,usec_per_call=7.44
cmdstat_replconf:calls=185947,usec=217446,usec_per_call=1.17
cmdstat_json.set:calls=1056,usec=26635,usec_per_call=25.22
cmdstat_evalsha:calls=1966,usec=68867,usec_per_call=35.03
cmdstat_expire:calls=1073,usec=1118,usec_per_call=1.04
cmdstat_flushall:calls=9,usec=694,usec_per_call=77.11
cmdstat_monitor:calls=1,usec=1,usec_per_call=1.00
cmdstat_get:calls=17,usec=21,usec_per_call=1.24
cmdstat_cluster:calls=102761,usec=23379827,usec_per_call=227.52
cmdstat_client:calls=100551,usec=122382,usec_per_call=1.22
cmdstat_json.del:calls=247,usec=2487,usec_per_call=10.07
cmdstat_script:calls=207,usec=10834,usec_per_call=52.34
cmdstat_info:calls=4532,usec=229808,usec_per_call=50.71
cmdstat_json.get:calls=1615,usec=11923,usec_per_call=7.38
cmdstat_type:calls=78,usec=115,usec_per_call=1.47
From JSON.SET to EVALSHA there is a ~30% performance reduction, which is what I observed in the direct testing.
The question is, why? And, is this anything to be concerned with or is this observation within fair expectations?
For context, the reason I am using EVALSHA rather than the direct JSON.SET command is twofold.
The IORedis client library doesn't have direct support for RedisJSON.
Because of that, I would have had to use send_command(), which sends the raw command to the server but doesn't work with pipelining when using TypeScript. So I would have had to send every command separately and forgo pipelining.
I thought this was supposed to perform better?
Update:
So, in the end, based on the answer below I refactored my code to use a single EVALSHA just for the write, because the write needs two commands: a set and an expire. Again, I can't do that with a single RedisJSON command, which is why the script stays.
Here is the code for someone's reference; it shows the EVALSHA call with an EVAL fallback:
// Try the cached Lua script first (JSON.SET + EXPIRE in one round trip)
await this.client.evalsha(this.luaWriteCommand, '1', documentChange.id, JSON.stringify(documentChange), expirationSeconds)
  .catch((error) => {
    console.error(error);
    evalSHAFail = true;
  });

// If the SHA wasn't accepted, fall back to EVAL and clear the cached script SHA
if (evalSHAFail) {
  console.error('EVALSHA for write not processed, using EVAL');
  await this.client.eval("return redis.pcall('JSON.SET', KEYS[1], '.', ARGV[1]), redis.pcall('expire', KEYS[1], ARGV[2]);", '1', documentChange.id, JSON.stringify(documentChange), expirationSeconds);
  this.luaWriteCommand = undefined;
}
Why is the Lua script slower in your case?
Because EVALSHA needs to do more work than a single JSON.SET or SET command. When running EVALSHA, Redis needs to push the arguments onto the Lua stack, run the Lua script, and pop the return values off the Lua stack. That is bound to be slower than the plain C function call behind JSON.SET or SET.
So when does a server-side script have a performance advantage?
First of all, you must run more than one command in the script; otherwise there won't be any performance advantage, as mentioned above.
Secondly, a server-side script runs faster than sending several commands to Redis one by one, getting the results back from Redis, and doing the computation on the client side, because the Lua script saves a lot of round-trip time (a small sketch of this follows below).
Thirdly, if you need to do really complex computation in the Lua script, it might not be a good idea. Since Redis runs the script in a single thread, a script that takes too much time will block other clients. On the client side, by contrast, you can take advantage of multiple cores for the heavy computation.
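To make the round-trip point concrete, here is a minimal sketch (mykey and the doubling logic are just placeholders):
redis-cli SET mykey 10
# two round trips: read, compute on the client, write back
redis-cli GET mykey
redis-cli SET mykey 20
# one round trip: read, compute and write inside the script
redis-cli EVAL "local v = tonumber(redis.call('GET', KEYS[1])); return redis.call('SET', KEYS[1], tostring(v * 2))" 1 mykey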
I am working in a scenario with one Redis master and several replicas, distributed across two cities for geographic redundancy. The point is that, for a particular reason, inside a Lua script I need to know in which city the active instance is running. I thought it would be quite simple:
redis.call("config", "get", "bind"), and knowing the server IP I would determine the city. Not quite:
$ cat config.lua
redis.call("config", "get", "bind")
$ redis-cli --eval config.lua
(error) ERR Error running script (call to f_9bfbf1fd7e6331d0e3ec9e46ec28836b1d40ba35): #user_script:1: #user_script: 1: This Redis command is not allowed from scripts
Why is "config" not allowed from scripts? First, even though it's no deterministic, there are no write commands.
Second, I am using version 5.0.4, and from version 5 the replication is supposed to be done by effects and not by script propagation.
Any ideas?
For my task I need to load a bulk of data into Redis as quickly as possible. It looks like this article describes my case: https://redis.io/topics/mass-insert
The article starts by giving an example of using multiple inline SET commands with redis-cli. Then it proceeds to generating the Redis protocol and again feeding it to redis-cli. It doesn't explain the reasons for, or benefits of, using the Redis protocol.
Using the Redis protocol is a bit harder and it generates a bit more traffic. I wonder, what are the reasons to use the Redis protocol rather than simple one-line commands? Probably, even though the data is larger, it is easier (and faster) for Redis to parse?
Good point.
Only a small percentage of clients support non-blocking I/O, and not
all the clients are able to parse the replies in an efficient way in
order to maximize throughput. For all this reasons the preferred way
to mass import data into Redis is to generate a text file containing
the Redis protocol, in raw format, in order to call the commands
needed to insert the required data.
What I understood is that when you use the Redis protocol directly you effectively emulate a client yourself, one that benefits from the points highlighted above.
Based on the docs you provided, I tried these scripts:
test.rb
def gen_redis_proto(*cmd)
proto = ""
proto << "*"+cmd.length.to_s+"\r\n"
cmd.each{|arg|
proto << "$"+arg.to_s.bytesize.to_s+"\r\n"
proto << arg.to_s+"\r\n"
}
proto
end
(0...100000).each{|n|
STDOUT.write(gen_redis_proto("SET","Key#{n}","Value#{n}"))
}
test_no_protocol.rb
(0...100000).each{|n|
STDOUT.write("SET Key#{n} Value#{n}\r\n")
}
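For reference, here is the first command each script emits. Every line of the protocol version produced by test.rb is terminated by \r\n, while test_no_protocol.rb writes plain inline commands.
First command from test.rb (Redis protocol):
*3
$3
SET
$4
Key0
$6
Value0
First command from test_no_protocol.rb (inline):
SET Key0 Value0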
ruby test.rb > 100k.txt
ruby test_no_protocol.rb > 100k_no_prot.txt
time cat 100k.txt | redis-cli --pipe
time cat 100k_no_prot.txt | redis-cli --pipe
I've got these results:
teixeira: ~/stackoverflow $ time cat 100k.txt | redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 100000
real 0m0.168s
user 0m0.025s
sys 0m0.015s
(5 file(s), 6.6 MB)
teixeira: ~/stackoverflow $ time cat 100k_no_prot.txt | redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 100000
real 0m0.433s
user 0m0.026s
sys 0m0.012s
First, I am new to Redis.
So, I measure latency with redis-cli:
$ redis-cli --latency
min: 0, max: 31, avg: 0.55 (5216 samples)^C
OK, so on average I get a response in 0.55 milliseconds. From this I assume that, using only one connection, in 1 second I can get 1000 ms / 0.55 ms ≈ 1800 requests per second.
Then on the same computer I run redis-benchmark using only one connection and get more than 6000 requests per second:
$ redis-benchmark -q -n 100000 -c 1 -P 1
PING_INLINE: 5953.80 requests per second
PING_BULK: 6189.65 requests per second
So, having measured the latency, I expected to get around 2000 requests per second at best. However, I got 6000 requests per second. I cannot find an explanation for it. Am I correct when I calculate 1000 ms / 0.55 ms ≈ 1800 requests per second?
Yes, your maths are correct.
IMO, the discrepancy comes from scheduling artifacts (i.e. the behavior of the operating system scheduler or the network loopback).
redis-cli --latency is implemented as a loop that just sends a PING command and then waits for 10 ms. Let's try an experiment and compare the results of redis-cli --latency with and without the 10 ms wait state.
In order to be accurate, we first make sure the client and server are always scheduled on deterministic CPU cores. Note: it is generally a good idea to do this when benchmarking on NUMA boxes. Also, make sure the frequency of the CPUs is locked to a given value (i.e. no power-mode management).
# Starting Redis
numactl -C 2 src/redis-server redis.conf
# Running benchmark
numactl -C 4 src/redis-benchmark -n 100000 -c 1 -q -P 1 -t PING
PING_INLINE: 26336.58 requests per second
PING_BULK: 27166.53 requests per second
Now let's look at the latency (with the 10 ms wait state):
numactl -C 4 src/redis-cli --latency
min: 0, max: 1, avg: 0.17761 (2376 samples)
It seems too high compared to the throughput result of redis-benchmark.
Then we alter the source code of redis-cli.c to remove the wait state and recompile. The code has also been modified to display more accurate figures (but less frequently, because there is no wait state anymore).
Here is the diff against redis 3.0.5:
1123,1128c1123
< avg = ((double) tot)/((double)count);
< }
< if ( count % 1024 == 0 ) {
< printf("\x1b[0G\x1b[2Kmin: %lld, max: %lld, avg: %.5f (%lld samples)",
< min, max, avg, count);
< fflush(stdout);
---
> avg = (double) tot/count;
1129a1125,1127
> printf("\x1b[0G\x1b[2Kmin: %lld, max: %lld, avg: %.2f (%lld samples)",
> min, max, avg, count);
> fflush(stdout);
1135a1134
> usleep(LATENCY_SAMPLE_RATE * 1000);
Note that this patch should not be used against a real system, since it makes the redis-cli --latency feature expensive and intrusive for the performance of the server. Its purpose is just to illustrate my point for the current discussion.
Here we go again:
numactl -C 4 src/redis-cli --latency
min: 0, max: 1, avg: 0.03605 (745280 samples)
Surprise! The average latency is now much lower. Furthermore, 1000/0.03605=27739.25, which is completely in line with the result of redis-benchmark.
Moral: the more often the client loop is scheduled by the OS, the lower the average latency. It is wise to trust redis-benchmark over redis-cli --latency if your Redis clients are active enough. And in any case, keep in mind that average latency does not mean much for the performance of a system (i.e. you should also look at the latency distribution, the high percentiles, etc.).
So I accidentally started 2 ElasticSearch instances on the same machine, one on port 9200 and the other on port 9201. This means there are 2 cluster nodes, each with the same name, and each holds half of the total shards of each index.
If I kill one of the instances, I end up with 1 instance holding only half of the shards.
How do I fix this? I want to have just 1 instance with all the shards in it (like it used to be).
SO... there is a clean way to resolve this, although I must say the ElasticSearch documentation is very, very confusing (all these buzzwords like cluster and zen discovery boggle my mind!).
1) If you have 2 instances, one on port 9200 and the other on 9201, and you want ALL the shards to be on 9200, run this command to disable allocation on the 9201 instance. You can change persistent to transient if you don't want this change to be permanent; I'd keep it persistent so this never happens again.
curl -XPUT localhost:9201/_cluster/settings -d '{
"persistent" : {
"cluster.routing.allocation.disable_allocation" : true
}
}'
2) Now, run the command to MOVE all the shards in the 9201 instance to 9200.
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands" : [ {
"move" :
{
"index" : "<NAME OF INDEX HERE>", "shard" : <SHARD NUMBER HERE>,
"from_node" : "<ID OF 9201 node>", "to_node" : "<ID of 9200 node>"
}
}
]
}'
You need to run this command for every shard on the 9201 instance (the one you want to get rid of); a rough loop is sketched below.
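If there are many shards, a small loop can save some typing. This is only a sketch: the index name, shard numbers, and node IDs below are placeholders you have to replace with your own.
for SHARD in 0 1 2 3 4; do
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [ {
      "move" : {
        "index" : "my_index", "shard" : '"$SHARD"',
        "from_node" : "<ID OF 9201 node>", "to_node" : "<ID of 9200 node>"
      }
    } ]
  }'
done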
If you have the ElasticSearch head plugin, that shard will show up purple with a "RELOCATING" status. If you have lots of data, say > 1 GB, it will take a while for the shard to move, perhaps up to an hour or even more, so be patient. Don't shut down the instance/node until everything has finished moving.
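You can also watch the progress without the head plugin: the relocating_shards counter in the cluster health output drops back to 0 once all the moves are finished.
curl 'localhost:9200/_cluster/health?pretty'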
That's it!