Redis benchmarking for HMSET, HGETALL with a data size - redis

Can someone let me know how can I use redis-benchmark to do a benchmarking for HMSET, HGETALL with a fixed data size (-d option in redis-benchmark). I am using redis 3.2.5.
I have gone through this answer and tried the below command:-
root#cache-server1:~# redis-benchmark -h a.b.c.d -p XXXX hmset hgetall myhash rand_int rand_string -d 2048
====== hmset hgetall myhash rand_int rand_string -d 2048 ======
10000 requests completed in 0.11 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.64% <= 1 milliseconds
100.00% <= 1 milliseconds
89285.71 requests per second
But looking at the output it seems it is using only 3 bytes payload.
If it is not possible via redis-benchmark can someone suggest some other alternative?

The payload is only 3 bytes (the default) because the -d is taken as part of the command. The command must be the last argument, and all switches must precede it.
Besides that, you can't use redis-benchmark to run two custom commands. Also, the -d option is only applicable to predefined tests (the ones that run by default or with the -t option) and has no meaning if the user specifies the command used in the benchmark.
If you have a specific benchmarking flow that you want to test, the best thing you can do is mock it with any client that you're comfortable with.

Related

Why does the EVALSHA command come at such a performance cost when compared to native commands run on the redis-cli client?

Here are some tests and results I have run against the redis-benchmark tool.
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 JSON.SET fooz . [9999]
JSON.SET fooz . [9999]: 93049.23 requests per second
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 evalsha 8d2d42f1e3a5ce869b50a2b65a8bfaafe8eff57a 1 fooz [5555]
evalsha 8d2d42f1e3a5ce869b50a2b65a8bfaafe8eff57a 1 fooz [5555]: 61132.17 requests per second
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 eval "return redis.call('JSON.SET', KEYS[1], '.', ARGV[1])" 1 fooz [5555]
eval return redis.call('JSON.SET', KEYS[1], '.', ARGV[1]) 1 fooz [5555]: 57423.41 requests per second
That is a significant drop in performance for something that is supposed to have the advantage of performance for a script running server side verse the client running a script client side.
From client to EVALSHA = 34% performance loss
From EVALSHA to EVAL = 6% performance loss
The results are similar for a NON-JSON insert set command
C02YLCE2LVCF:Downloads xxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 set fooz 3333
set fooz 3333: 116414.43 requests per second
C02YLCE2LVCF:Downloads xxxxxxx$ redis-benchmark -p 7000 -q -r 1000000 -n 2000000 evalsha e32aba8d03c97f4418a8593ed4166640651e18da 1 fooz [2222]
evalsha e32aba8d03c97f4418a8593ed4166640651e18da 1 fooz [2222]: 78520.67 requests per second
I first noticed this when I did an info commandstat and observed the poorer performance for the EVALSHA command
# Commandstats
cmdstat_ping:calls=331,usec=189,usec_per_call=0.57
cmdstat_eval:calls=65,usec=4868,usec_per_call=74.89
cmdstat_del:calls=2,usec=21,usec_per_call=10.50
cmdstat_ttl:calls=78,usec=131,usec_per_call=1.68
cmdstat_psync:calls=51,usec=2515,usec_per_call=49.31
cmdstat_command:calls=5,usec=3976,usec_per_call=795.20
cmdstat_scan:calls=172,usec=1280,usec_per_call=7.44
cmdstat_replconf:calls=185947,usec=217446,usec_per_call=1.17
****cmdstat_json.set:calls=1056,usec=26635,usec_per_call=25.22**
****cmdstat_evalsha:calls=1966,usec=68867,usec_per_call=35.03**
cmdstat_expire:calls=1073,usec=1118,usec_per_call=1.04
cmdstat_flushall:calls=9,usec=694,usec_per_call=77.11
cmdstat_monitor:calls=1,usec=1,usec_per_call=1.00
cmdstat_get:calls=17,usec=21,usec_per_call=1.24
cmdstat_cluster:calls=102761,usec=23379827,usec_per_call=227.52
cmdstat_client:calls=100551,usec=122382,usec_per_call=1.22
cmdstat_json.del:calls=247,usec=2487,usec_per_call=10.07
cmdstat_script:calls=207,usec=10834,usec_per_call=52.34
cmdstat_info:calls=4532,usec=229808,usec_per_call=50.71
cmdstat_json.get:calls=1615,usec=11923,usec_per_call=7.38
cmdstat_type:calls=78,usec=115,usec_per_call=1.47
From JSON.SET to EVALSHA there is ~30% performance reduction which is what I observed in the direct testing.
The question is, why? And, is this anything to be concerned with or is this observation within fair expectations?
For context, the reason why I am using EVALSHA and not the direct JSON.SET command is for 2 reasons.
The IORedis client library doesn't have direct support using RedisJson.
Because of the previous fact, I would have had to use send_command() which then would have sent the direct command over to the server but doesn't work with pipelining while using TypeScript. So I would have had to do every other command separately and forgo pipelining.
I thought this was supposed to be better performance?
****** Update:
So in the end, based on the below answer I refactored my code to only include 1 EVALSHA for the write because it uses 2 commands which are a set and expire command. Again, I can't single this into RedisJson so that is the reason why.
Here is the code for someones reference: Shows evalsha and fallback
await this.client.evalsha(this.luaWriteCommand, '1', documentChange.id, JSON.stringify(documentChange), expirationSeconds)
.catch((error) => {
console.error(error);
evalSHAFail = true;
});
if (evalSHAFail) {
console.error('EVALSHA for write not processed, using EVAL');
await this.client.eval("return redis.pcall('JSON.SET', KEYS[1], '.', ARGV[1]), redis.pcall('expire', KEYS[1], ARGV[2]);", '1', documentChange.id, JSON.stringify(documentChange), expirationSeconds);
console.log('SRANS FRUNDER');
this.luaWriteCommand = undefined;
Why Lua script is slower in your case?
Because EVALSHA needs to do more work than a single JSON.SET or SET command. When running EVALSHA, Redis needs to push arguments to Lua stack, run Lua script, and pop return values from Lua stack. It should be slower than a c function call for JSON.SET or SET.
So When does server side script has a performance advantage?
First of all, you must run more than one command in script, otherwise, there won't be any performance advantage as I mentioned above.
Secondly, server side script runs faster than sending serval commands to Redis one-by-one, get the results form Redis, and do the computation work on the client side. Because, Lua script saves lots of Round Trip Time.
Thirdly, if you need to do really complex computation work in Lua script. It might not be a good idea. Because Redis runs the script in a single thread, if the script takes too much time, it will block other clients. Instead, on the client side, you can take the advantage of multi-core to do the complex computation.

How can I get the memory footprint of a specific key in redis?

I'm new to Redis. How can I get the memory footprint of a specific key in redis?
db0
1) "unacked_mutex"
2) "_kombu.binding.celery"
3) "_kombu.binding.celery.pidbox"
4) "_kombu.binding.celeryev"
I just want to get the memory footprint of one specific key like "_kombu.binding.celery" , or one specific db like db0 , how can I get it?
redis_version: 2.8.17
Any commentary is very welcome. great thanks.
You are running a very old version of redis. The MEMORY command is not available in that version, so there is no precise way of getting at this information. However, you can approximate this information using the DUMP command.
Simply call DUMP _kombu.binding.celery and save the results to a file. The result is some characters and escape sequences. When you load this file into an environment like node, you can look at the length of the string and multiply by 2 to get the number of bytes. This is not precise, but it will give you a generally close approximation.
Here's what you could do:
in Redis:
$ redis-cli
127.0.0.1:6379> hset c 123 456
(integer) 0
127.0.0.1:6379> dump c
"\r\x12\x12\x00\x00\x00\r\x00\x00\x00\x02\x00\x00\xfe{\x03\xc0\xc8\x01\xff\t\x00\x10\xd4L \x908\x8b2"
in Node:
$ node
> a="\r\x12\x12\x00\x00\x00\r\x00\x00\x00\x02\x00\x00\xfe{\x03\xc0\xc8\x01\xff\t\x00\x10\xd4L \x908\x8b2"
'\r\u0012\u0012\u0000\u0000\u0000\r\u0000\u0000\u0000\u0002\u0000\u0000þ{\u0003ÀÈ\u0001ÿ\t\u0000\u0010ÔL 82'
> a.length
30
This is close to half of the actual amount that redis provides with MEMORY USAGE:
127.0.0.1:6379> MEMORY USAGE c
(integer) 63
MEMORY USAGE _kombu.binding.celery would give you the number of bytes that a key and value require to be stored in RAM.
Here is the doc for the command.

Any specific problems running (linux) BCP on "too many" threads?

Are there any specific problems with running Microsoft's BCP utility (on CentOS 7, https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-migrate-bcp?view=sql-server-2017) on multiple threads? Googling could not find much, but am looking at a problem that seems to be related to just that.
Copying a set of large TSV files from HDFS to a remote MSSQL Server with some code of the form
bcpexport() {
filename=$1
TO_SERVER_ODBCDSN=$2
DB=$3
TABLE=$4
USER=$5
PASSWORD=$6
RECOMMEDED_IMPORT_MODE=$7
DELIMITER=$8
echo -e "\nRemoving header from TSV file $filename"
echo -e "Current head:\n"
echo $(head -n 1 $filename)
echo "$(tail -n +2 $filename)" > $filename
echo "First line of file is now..."
echo $(head -n 1 $filename)
# temp. workaround safeguard for NFS latency
#sleep 5 #FIXME: appears to sometimes cause script to hang, workaround implemented below, throws error if timeout reached
timeout 30 sleep 5
echo -e "\nReplacing null literal values with empty chars"
NULL_WITH_TAB="null\t" # WARN: assumes the first field is prime-key so never null
TAB="\t"
sed -i -e "s/$NULL_WITH_TAB/$TAB/g" $filename
echo -e "Lines containing null (expect zero): $(grep -c "\tnull\t" $filename)"
# temp. workaround safeguard for NFS latency
#sleep 5 #FIXME: appears to sometimes cause script to hang, workaround implemented below
timeout 30 sleep 5
/opt/mssql-tools/bin/bcp "$TABLE" in "$filename" \
$TO_SERVER_ODBCDSN \
-U $USER -P $PASSWORD \
-d $DB \
$RECOMMEDED_IMPORT_MODE \
-t "\t" \
-e ${filename}.bcperror.log
}
export -f bcpexport
parallel -q -j 7 bcpexport {} "$TO_SERVER_ODBCDSN" $DB $TABLE $USER $PASSWORD $RECOMMEDED_IMPORT_MODE $DELIMITER \
::: $DATAFILES/$TARGET_GLOB
where $DATAFILES/$TARGET_GLOB constructs a glob that lists a set of files in a directory.
When running this code for a set of TSV files, finding that sometimes some (but not all) of the parallel BCP threads fail, ie. some files successfully copy to MSSQL Server
Starting copy...
5397376 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 154902 Average : (34843.8 rows per sec.)
while others output error message
Starting copy...
BCP copy in failed
Usually, see this pattern: a few successful BCP copy-in operations in the first few threads returned, then a bunch of failing threads return their output until run out of files (GNU Parallel returns output only when whole thread done to appear same as if sequential).
Notice in the code there is the -e option to produce an error file for each BCP copy-in operation (see https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017#e). When examining the files after observing these failing behaviors, all are blank, no error messages.
Only have seen this with the number of threads >= 10 (and only for certain sets of data (assuming has something to do with total number of files are files sizes, and yet...)), no errors seen so far when using ~7 threads, which further makes me suspect this has something to do with multi-threading.
Monitoring system resources (via free -mh) shows that generally ~13GB or RAM is always available.
May be helpful to note that the data bcp is trying to copy-in may be ~500000-1000000 records long with an upper limit of ~100 columns per record.
Does anyone have any idea what could be going on here? Note, am pretty new to using BCP as well as GNU Parallel and multi-threading.
No, no issues specific to the BCP program being run in multiple threads. You seem to be on the track of what I would say your issue is, system resources. Have you monitored system resources while increasing the number of threads? If anything, there is likely an issue with BCP executing properly when memory/cpu/network resources are low. Regarding the "-e" option, it is meant to output data errors. login errors, bad table names... many other errros are not reported in the file created with the -e option. When you get output using the "-e" option, you'll see info like "value truncated" and such... will give you line numbers and sample data that was at issue.
TLDR: Adding more threads to run concurrently to have bcp copy-in files of data seems to have the affect of overwhelming the endpoint MSSQL Server with write instructions, causing the bcp threads to fail (maybe timeing out?). When the number of threads becomes too many seems to depend on the size of the files getting copy-in'ed by bcp (ie. both the number of records in the file as well as the width of each record (ie. number of columns)).
Long version (more reasons for my theory):
1.
When running a larger number of bcp threads and looking at the processes started on the machine (https://clustershell.readthedocs.io/en/latest/tools/clush.html)
ps -aux | grep bcp
seeing a bunch of sleeping processes (notice the S, see https://askubuntu.com/a/360253/760862) as shown below (added newlines for readability)
me 135296 14.5 0.0 77596 6940 ? S 00:32 0:01
/opt/mssql-tools/bin/bcp TABLENAME in /path/to/tsv/1_16_0.tsv -D -S MyMSSQLServer -U myusername -P -d myDB -c -t \t -e /path/to/logfile
These threads appear to sleep for very long time. Further debugging into why these threads are sleeping suggests that they may in fact be doing their intended job (which would further imply that the problem may be coming from BCP itself (see https://stackoverflow.com/a/52748660/8236733)). From https://unix.stackexchange.com/a/47259/260742 and https://unix.stackexchange.com/a/36200/260742)
A process in S state is usually in a blocking system call, such as reading or writing to a file or the network, or waiting for another called program to finish.
(eg. writing to the MSSQL Server endpoint destination given to bcp in the ODBCDSN)
Your process will be in S state when it is doing reads and possibly writes that are blocking. Can also happen while waiting on semaphores or other synchronization primitives... This is all normal and expected, and not usually a problem... you don't want it to waste CPU while it's waiting for user input.
2. When running different sets of files of varying record-amount-per-file (eg. ranges of 500000 - 1000000 rows/file) and record-width-per-file (~10 - 100 columns/row), found that in cases with either very large data width or amounts, running a fixed set of bcp threads would fail.
Eg. for a set of ~33 TSVs with ~500000 rows each, each row being ~100 columns wide, a set of 30 threads would write the first few OK, but then all the rest would start returning failure messages. Incorporating a bit from #jamie's answer, the fact the the failure messages returned are "BCP copy in failed" errors does not necessarily mean it has do do with the content of the data in question. Having no actual content being written into the -e errorlog files from my process, #jamie's post says this
Regarding the "-e" option, it is meant to output data errors. login errors, bad table names... many other errros are not reported in the file created with the -e option. When you get output using the "-e" option, you'll see info like "value truncated" and such... will give you line numbers and sample data that was at issue.
Meanwhile, a set of ~33 TSVs with ~500000 rows each, each row being ~100 wide, and still using 30 bcp threads would complete quickly and without error (also would be faster when reducing the number of threads or file set). The only difference here being the overall size of the data being bcp copy-in'ed to the MSSQL Server.
All this while
free -mh
still showed that the machine running the threads still had ~15GB of free RAM remaining in each case (which is again why I suspect that the problem has to do with the remote MSSQL Server endpoint rather than with the code or local machine itself).
3. When running some of the tests from (2), found that manually killing the parallel process (via CTL+C) and then trying to remotely truncate the testing table being written to with /opt/mssql-tools/bin/sqlcmd -Q "truncate table mytable" on the local machine would take a very long time (as opposed to manually logging into the MSSQL Server and executing a truncate mytable in the DB). Again this makes me think that this has something to do with the MSSQL Server having too many connections and just being overwhelmed.
** Anyone with any MSSQL Mgmt Studio experience reading this (I have basically none), if you see anything here that makes you think that my theory is incorrect please let me know your thoughts.

Redis mass insertion: protocol vs inline commands

For my task I need to load a bulk of data into Redis as soon as possible. It looks like this article is right about my case: https://redis.io/topics/mass-insert
The article starts from giving an example of using multiple inline SET commands with redis-cli. Then they proceed to generating Redis protocol and again use it with redis-cli. They don't explain the reasons or benefits of using Redis protocol.
Using of Redis protocol is a bit harder and it generates a bit more traffic. I wonder, what are the reasons to use Redis protocol rather than simple one-line commands? Probably despite the fact the data is larger, it is easier (and faster) for Redis to parse it?
Good point.
Only a small percentage of clients support non-blocking I/O, and not
all the clients are able to parse the replies in an efficient way in
order to maximize throughput. For all this reasons the preferred way
to mass import data into Redis is to generate a text file containing
the Redis protocol, in raw format, in order to call the commands
needed to insert the required data.
What I understood is that you emulate a client when you use Redis protocol directly, which would benefit from the highlighted points.
Based on the docs you provided, I tried these scripts:
test.rb
def gen_redis_proto(*cmd)
proto = ""
proto << "*"+cmd.length.to_s+"\r\n"
cmd.each{|arg|
proto << "$"+arg.to_s.bytesize.to_s+"\r\n"
proto << arg.to_s+"\r\n"
}
proto
end
(0...100000).each{|n|
STDOUT.write(gen_redis_proto("SET","Key#{n}","Value#{n}"))
}
test_no_protocol.rb
(0...100000).each{|n|
STDOUT.write("SET Key#{n} Value#{n}\r\n")
}
ruby test.rb > 100k_prot.txt
ruby test_no_protocol.rb > 100k_no_prot.txt
time cat 100k.txt | redis-cli --pipe
time cat 100k_no_prot.txt | redis-cli --pipe
I've got these results:
teixeira: ~/stackoverflow $ time cat 100k.txt | redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 100000
real 0m0.168s
user 0m0.025s
sys 0m0.015s
(5 arquivo(s), 6,6Mb)
teixeira: ~/stackoverflow $ time cat 100k_no_prot.txt | redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 100000
real 0m0.433s
user 0m0.026s
sys 0m0.012s

How to craft specific packets on the host of Mininet to generate massive Packet-In messages

I am wondering that how to generate massive packet-in messages to the controller to test the response time of SDN controller in the environment of Mininet.
Can you give me some advice on it?
You could use iperf to send packets, like this:
$ iperf -c -F
You could specify the amount of time:
$IPERF_TIME (-t, --time)
The time in seconds to transmit for. Iperf normally works by repeatedly sending an array of len bytes for time seconds. Default is 10 seconds. See also the -l and -n options.
Here is a nice reference for iperf: https://iperf.fr/.
If you would like to use Scapy, try this:
from scapy.all import IP, TCP, send
data = "University of Network blah blah"
a = IP(dst="129.132.2.21")/TCP()/data
send(a)