I recently ran a simplistic benchmark of Apache Ignite and found that when I shared the Ignite client across 50 threads, performance degraded tremendously. The benchmark pits Redis against Ignite, and Jedis, the Redis Java client, seems to do much better in the threaded scenario since it keeps multiple connections in a pool rather than sharing one client among multiple threads.
My question is: am I seeing the degraded performance as a result of what I've just articulated, or is this actually expected and the client isn't the bottleneck?
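For illustration, the pooled pattern I'm describing looks roughly like this with Jedis (a minimal sketch, not my actual benchmark code; host and port are placeholders):
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
public class PooledChunkWriter {
    // One pool shared by all 50 threads; each thread borrows its own connection.
    private static final JedisPool POOL =
            new JedisPool(new JedisPoolConfig(), "localhost", 6379);
    public static void writeChunk(byte[] key, byte[] chunk) {
        // getResource() checks a connection out of the pool; close() returns it.
        try (Jedis jedis = POOL.getResource()) {
            jedis.append(key, chunk); // append the next 4096-byte chunk
        }
    }
}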
Here are my results. The local host is a Mac Pro with 12 × 2.7 GHz cores and 64 GB of RAM. The remote host was an AWS m4.4xlarge.
Benchmark Results
*50 × 118 KB Audio Files Streamed and Read Simultaneously from One Node in 4096-Byte Chunks*
Redis
*Notes:*
I used a 50ms polling interval to check for updates to the cache.
*Results:*
Local:
- Total Time to stream and read 50 audio files: 226ms.
- average complete read and write: 125ms
- average time to first byte read: 26ms
- average read time per runner: 103ms
- average write time per runner: 71ms
- p99 time to first byte: 59ms
- p90 time to first byte: 57ms
- p50 time to first byte: 6ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 50 audio files: 1405ms.
- average complete read and write: 1298ms
- average time to first byte read: 81ms
- average read time per runner: 1277ms
- average write time per runner: 1238ms
- p99 time to first byte: 148ms
- p90 time to first byte: 126ms
- p50 time to first byte: 84ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 50 audio files: 2035ms.
- average complete read and write: 1245ms
- average time to first byte read: 67ms
- average read time per runner: 1226ms
- average write time per runner: 1034ms
- p99 time to first byte: 161ms
- p90 time to first byte: 87ms
- p50 time to first byte: 74ms
Ignite
*Notes:*
I have a feeling these numbers are artificially inflated. I don't think the client is built for extreme parallelism; I believe it's doing quite a bit of locking. If many nodes were doing the same amount of work, the numbers might be better, but that would require more in-depth benchmarking. This run uses 50 caches in one cache group.
*Results:*
Local:
- Total Time to stream and read 50 audio files: 327ms.
- average complete read and write: 321ms
- average time to first byte read: 184ms
- average read time per runner: 225ms
- average write time per runner: 35ms
- p99 time to first byte: 212ms
- p90 time to first byte: 197ms
- p50 time to first byte: 191ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 50 audio files: 5148ms.
- average complete read and write: 4483ms
- average time to first byte read: 947ms
- average read time per runner: 3224ms
- average write time per runner: 2779ms
- p99 time to first byte: 4936ms
- p90 time to first byte: 926ms
- p50 time to first byte: 577ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 50 audio files: 4840ms.
- average complete read and write: 4287ms
- average time to first byte read: 780ms
- average read time per runner: 3035ms
- average write time per runner: 2562ms
- p99 time to first byte: 4458ms
- p90 time to first byte: 857ms
- p50 time to first byte: 566ms
*1 × 118 KB Audio File Streamed and Read Simultaneously from One Node in 4096-Byte Chunks*
Redis
*Notes:*
I used a 50ms polling interval to check for updates to the cache.
*Results:*
Local:
- Total Time to stream and read 1 audio file: 62ms.
- average complete read and write: 62ms
- average time to first byte read: 55ms
- average read time per runner: 61ms
- average write time per runner: 3ms
- p99 time to first byte: 55ms
- p90 time to first byte: 55ms
- p50 time to first byte: 55ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 1 audio file: 394ms.
- average complete read and write: 394ms
- average time to first byte read: 57ms
- average read time per runner: 394ms
- average write time per runner: 342ms
- p99 time to first byte: 57ms
- p90 time to first byte: 57ms
- p50 time to first byte: 57ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 1 audio file: 388ms.
- average complete read and write: 388ms
- average time to first byte read: 61ms
- average read time per runner: 388ms
- average write time per runner: 343ms
- p99 time to first byte: 61ms
- p90 time to first byte: 61ms
- p50 time to first byte: 61ms
Ignite
*Notes:*
None
*Results:*
Local:
- Total Time to stream and read 1 audio file: 32ms.
- average complete read and write: 32ms
- average time to first byte read: 2ms
- average read time per runner: 23ms
- average write time per runner: 11ms
- p99 time to first byte: 2ms
- p90 time to first byte: 2ms
- p50 time to first byte: 2ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 1 audio file: 259ms.
- average complete read and write: 258ms
- average time to first byte read: 19ms
- average read time per runner: 232ms
- average write time per runner: 169ms
- p99 time to first byte: 19ms
- p90 time to first byte: 19ms
- p50 time to first byte: 19ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 1 audio file: 203ms.
- average complete read and write: 203ms
- average time to first byte read: 20ms
- average read time per runner: 174ms
- average write time per runner: 93ms
- p99 time to first byte: 20ms
- p90 time to first byte: 20ms
- p50 time to first byte: 20ms
UPDATE:
To make what I'm trying to do more apparent:
I'm going to have 50+ million devices streaming audio. The streams could average around 100 KB, with 200k streams/minute at peak traffic. I'm looking for a storage solution to accommodate that need. I've been examining BookKeeper, Kafka, Ignite, Cassandra, and Redis. So far I've only benchmarked Redis and Ignite, and I'm surprised Ignite is so slow.
I reviewed your benchmark and made a couple of runs locally. I was able to make it much faster:
30 iterations aren't enough for the JVM to warm up; on my laptop it requires ~150 iterations, so I increased the count to 300.
I moved cache creation out of the benchmark, adding it right after cache destruction.
I also moved Ignite client creation out of the benchmark; it's an extremely expensive operation, and in real life you should reuse the client.
Please take a look at my changes; I created a pull request:
https://github.com/Sahasrara/AudioStreamStoreDemo/pull/1/files
I don't think you should be creating a cache for every operation; cache creation is a heavy operation in Ignite. What are your requirements for this?
I can see how performance greatly improves on subsequent runs. Ignite is based on Java, and the JVM needs some time to warm up.
You should definitely avoid creating a lot of caches.
Use a cache group in order to share infrastructure between your caches.
Alternatively, keep a pool of N caches and repurpose them for new files as time goes by, freeing each one once its file is no longer used. This requires some bookkeeping.
Better yet, find a way to use just one cache: keep the file identifier in a composite cache key, and keep track of what you have in the cache.
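For example, the single-cache approach could look something like this (a minimal sketch under the assumptions above; names are illustrative):
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
public class AudioChunkStore {
    // Start one Ignite instance and reuse it for the life of the application;
    // client creation is extremely expensive.
    private final Ignite ignite = Ignition.start();
    // One cache for all files: the file identifier lives in the key, not the cache name.
    private final IgniteCache<String, byte[]> chunks =
            ignite.getOrCreateCache("audio-chunks");
    public void writeChunk(String fileId, int seq, byte[] data) {
        chunks.put(fileId + ":" + seq, data);
    }
    public byte[] readChunk(String fileId, int seq) {
        return chunks.get(fileId + ":" + seq);
    }
}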
Related
Working with CA AutoSys AE.
A File Watcher (FW) job, ABC_FW, is defined as below; on success, it triggers the XYZ job.
insert_job: ABC_FW job_type: FW
machine: <machine_name>
owner: <owner>
date_conditions: 1
days_of_week: mo, tu, we, th, fr
start_mins: 00
run_window: "09:00-15:00"
watch_file: "/tmp/test.txt"
watch_interval: 60
This FW watcher should not be running after 16:00.
test.txt is expected to be received on an hourly basis, but that's not guaranteed: the file may be received at any time of the day, or not at all.
If test.txt is received outside the "09:00-15:00" window, e.g. after 16:00, it should be picked up only at 09:00, the next scheduled time.
I wish to terminate my FW job after 16:00. Which attribute can I use?
As far as I can tell, term_run_time cannot be defined to produce this FW behaviour.
To use term_run_time, you can use a File Trigger (FT) job instead of FW.
Are you only getting one file a day, or sometimes none at all? You can set term_run_time to 360 minutes, which would terminate the watcher at 15:00; it would then start over the next day at 09:00 and pick the file up if it was received later.
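If term_run_time does apply to your FW job (worth verifying on your AutoSys version, given the doubt above), the JIL would look roughly like this sketch:
insert_job: ABC_FW job_type: FW
machine: <machine_name>
owner: <owner>
date_conditions: 1
days_of_week: mo, tu, we, th, fr
start_mins: 00
run_window: "09:00-15:00"
watch_file: "/tmp/test.txt"
watch_interval: 60
/* terminate the watcher 360 minutes after its 09:00 start, i.e. at 15:00 */
term_run_time: 360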
Using --log-level 4 on x265 v1.4, I get encoding time and elapsed time per frame.
Now using v1.7, I don't get these values in the CSV. Instead I get DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Avg WPP, and Row Blocks.
I tried --log-level values 3, 4, and 5 and still had no luck.
Can you help me, please?
I found the solution.
x265 v2.5 now gives the Total frame time (ms) in the CSV log file.
When you run an x265 encode, add the following flags to set the CSV log level and the CSV file location:
--csv-log-level 2 --csv info.csv
More info can be found here and here.
For earlier x265 versions, a rough estimate of the elapsed time per frame is the sum of DecideWait (ms), Row0Wait (ms), and Wall time (ms).
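For example, a full invocation with those flags might look like this (input name, resolution, and frame rate are placeholders):
x265 --input input.yuv --input-res 1920x1080 --fps 25 --csv-log-level 2 --csv info.csv -o out.hevc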
I am using IBM LSF and trying to get usage statistics for a certain period. I found that bhist does the job, but the short-form bhist output does not show all of the fields I need.
What I want to know is:
Are bhist's output fields customizable? The fields I need are:
<jobid>
<user>
<queue>
<job_name>
<project_name>
<job_description>
<submission_time>
<pending_time>
<run_time>
If that is not possible: the long-form (bhist -l) output shows everything I need, but the format is hard to manipulate. I've pasted an example of the format below.
For example, the number of lines between records is not fixed, and the word wrap in each event may break a line in the middle of a word I'm trying to scan for. How do I parse this format with sed and awk?
JobId <1531>, User <user1>, Project <default>, Command <example200>
Fri Dec 27 13:04:14: Submitted from host <hostA> to Queue <priority>, CWD <$H
OME>, Specified Hosts <hostD>;
Fri Dec 27 13:04:19: Dispatched to <hostD>;
Fri Dec 27 13:04:19: Starting (Pid 8920);
Fri Dec 27 13:04:20: Running with execution home </home/user1>, Execution CWD
</home/user1>, Execution Pid <8920>;
Fri Dec 27 13:05:49: Suspended by the user or administrator;
Fri Dec 27 13:05:56: Suspended: Waiting for re-scheduling after being resumed
by user;
Fri Dec 27 13:05:57: Running;
Fri Dec 27 13:07:52: Done successfully. The CPU time used is 28.3 seconds.
Summary of time in seconds spent in various states by Sat Dec 27 13:07:52 1997
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
5 0 205 7 1 0 218
------------------------------------------------------------
.... repeat
I'm adding a second answer because it might help you with your problem without actually having to write your own solution (depending on the usage statistics you're after).
LSF already has a utility called bacct that computes and prints out various usage statistics about historical LSF jobs filtered by various criteria.
For example, to get summary usage statistics about jobs that were dispatched/completed/submitted between time0 and time1, you can use (respectively):
bacct -D time0,time1
bacct -C time0,time1
bacct -S time0,time1
Statistics about jobs submitted by a particular user:
bacct -u <username>
Statistics about jobs submitted to a particular queue:
bacct -q <queuename>
These options can be combined as well, so for example if you wanted statistics about jobs that were submitted and completed within a particular time window for a particular project, you can use:
bacct -S time0,time1 -C time0,time1 -P <projectname>
The output provides some summary information about all jobs that match the provided criteria like so:
$ bacct -u bobbafett -q normal
Accounting information about jobs that are:
- submitted by users bobbafett,
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to queues normal,
- accounted on all service classes.
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 0 Total number of exited jobs: 32
Total CPU time consumed: 46.8 Average CPU time consumed: 1.5
Maximum CPU time of a job: 9.0 Minimum CPU time of a job: 0.0
Total wait time in queues: 18680.0
Average wait time in queue: 583.8
Maximum wait time in queue: 5507.0 Minimum wait time in queue: 0.0
Average turnaround time: 11568 (seconds/job)
Maximum turnaround time: 43294 Minimum turnaround time: 40
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.02 Minimum hog factor of a job: 0.00
Total Run time consumed: 351504 Average Run time consumed: 10984
Maximum Run time of a job: 1844674 Minimum Run time of a job: 0
Total throughput: 0.24 (jobs/hour) during 160.32 hours
Beginning time: Nov 11 17:55 Ending time: Nov 18 10:14
This command also has a long form output that provides some bhist -l-like information about each job that might be a bit easier to parse (although still not all that easy):
$ bacct -l -u bobbafett -q normal
Accounting information about jobs that are:
- submitted by users bobbafett,
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to queues normal,
- accounted on all service classes.
------------------------------------------------------------------------------
Job <101>, User <bobbafett>, Project <default>, Status <EXIT>, Queue <normal>,
Command <sleep 100000000>
Wed Nov 11 17:37:45: Submitted from host <endor>, CWD <$HOME>;
Wed Nov 11 17:55:05: Completed <exit>; TERM_OWNER: job killed by owner.
Accounting information about this job:
CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP
0.00 1040 1040 exit 0.0000 0M 0M
------------------------------------------------------------------------------
...
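If you do end up scraping this, the per-job accounting block is regular enough for awk. A rough sketch of my own, assuming the two-line header/values layout shown above:
bacct -l -u bobbafett | awk '
    /^Job </ { split($2, a, /[<>,]/); job = a[2] }   # grab the job ID from "Job <NNN>,"
    /CPU_T/  { getline; print job, $1, $2, $3, $4 }  # values sit on the line after the header
'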
Long form output is pretty hard to parse. I know bjobs has an option for unformatted output (-UF) in older LSF versions which makes it a bit easier, and the most recent version of LSF allows you to customize which columns get printed in short form output with -o.
Unfortunately, neither of these options is available with bhist. The only real possibilities for historical information are:
Figure out some way to parse bhist -l -- impractical and maybe not even possible due to inconsistent formatting as you've discovered.
Write a C program to do what you want using the LSF API, which exposes the functions that bhist itself uses to parse the lsb.events file. This is the file that stores all the historical information about the LSF cluster, and it is what bhist reads to generate its output.
If C is not an option for you, then you could try writing a script to parse the lsb.events file directly -- the format is documented in the configuration reference. This is hard, but not impossible. Here is the relevant document for LSF 9.1.3.
My personal recommendation would be #2 -- the function you're looking for is lsb_geteventrec(). You'd basically read each line in lsb.events one at a time and pull out the information you need.
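For what it's worth, here is a rough, untested sketch of that option (the struct and field names are from my memory of lsbatch.h and vary between LSF releases, so treat them as assumptions to check against your headers):
#include <stdio.h>
#include <lsf/lsbatch.h>
int main(int argc, char **argv)
{
    FILE *fp;
    int lineNum = 0;
    struct eventRec *rec;
    if (argc != 2) {
        fprintf(stderr, "usage: %s /path/to/lsb.events\n", argv[0]);
        return 1;
    }
    if (lsb_init(argv[0]) < 0) {   /* initialize the LSF batch library */
        lsb_perror("lsb_init");
        return 1;
    }
    if ((fp = fopen(argv[1], "r")) == NULL) {
        perror("fopen");
        return 1;
    }
    /* lsb_geteventrec() parses one event per call and returns NULL at end of file */
    while ((rec = lsb_geteventrec(fp, &lineNum)) != NULL) {
        if (rec->type == EVENT_JOB_NEW) {   /* a job submission record */
            struct jobNewLog *job = &rec->eventLog.jobNewLog;
            printf("jobid=%d user=%s queue=%s\n",
                   job->jobId, job->userName, job->queue);
        }
    }
    fclose(fp);
    return 0;
}
Link against the LSF batch libraries (e.g. -lbat -llsf on most installations).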
First, I am new to Redis.
So, I measure latency with redis-cli:
$ redis-cli --latency
min: 0, max: 31, avg: 0.55 (5216 samples)^C
OK, on average I get a response in 0.55 milliseconds. From this I assume that, using only one connection, in 1 second I can get 1000 ms / 0.55 ms ≈ 1800 requests per second.
Then on the same computer I run redis-benchmark using only one connection and get more than 6000 requests per second:
$ redis-benchmark -q -n 100000 -c 1 -P 1
PING_INLINE: 5953.80 requests per second
PING_BULK: 6189.65 requests per second
So, having measured the latency, I expected to get around 2000 requests per second at best. However, I got over 6000 requests per second. I cannot find an explanation for this. Am I correct when I calculate 1000 ms / 0.55 ms ≈ 1800 requests per second?
Yes, your maths are correct.
IMO, the discrepancy comes from scheduling artifacts (i.e. the behavior of the operating system scheduler or the network loopback).
redis-cli --latency is implemented by a loop which just sends a PING command and then waits for 10 ms. Let's try an experiment and compare the results of redis-cli --latency with and without the 10 ms wait state.
In order to be accurate, we first make sure the client and server are always scheduled on deterministic CPU cores. Note: this is generally a good idea when benchmarking on NUMA boxes. Also, make sure the frequency of the CPUs is fixed at a given value (i.e. no power management).
# Starting Redis
numactl -C 2 src/redis-server redis.conf
# Running benchmark
numactl -C 4 src/redis-benchmark -n 100000 -c 1 -q -P 1 -t PING
PING_INLINE: 26336.58 requests per second
PING_BULK: 27166.53 requests per second
Now let's look at the latency (with the 10 ms wait state):
numactl -C 4 src/redis-cli --latency
min: 0, max: 1, avg: 0.17761 (2376 samples)
It seems too high compared to the throughput result of redis-benchmark.
Then, we alter the source code of redis-cli.c to remove the wait state, and we recompile. The code has also been modified to display more accurate figures (but less frequently, because there is no wait state anymore).
Here is the diff against redis 3.0.5:
1123,1128c1123
< avg = ((double) tot)/((double)count);
< }
< if ( count % 1024 == 0 ) {
< printf("\x1b[0G\x1b[2Kmin: %lld, max: %lld, avg: %.5f (%lld samples)",
< min, max, avg, count);
< fflush(stdout);
---
> avg = (double) tot/count;
1129a1125,1127
> printf("\x1b[0G\x1b[2Kmin: %lld, max: %lld, avg: %.2f (%lld samples)",
> min, max, avg, count);
> fflush(stdout);
1135a1134
> usleep(LATENCY_SAMPLE_RATE * 1000);
Note that this patch should not be used against a real system, since it would make the redis-cli --latency feature expensive and intrusive for the performance of the server. Its purpose is just to illustrate my point for the current discussion.
Here we go again:
numactl -C 4 src/redis-cli --latency
min: 0, max: 1, avg: 0.03605 (745280 samples)
Surprise! The average latency is now much lower. Furthermore, 1000/0.03605=27739.25, which is completely in line with the result of redis-benchmark.
Moral: the more often the client loop is scheduled by the OS, the lower the average latency. It is wise to trust redis-benchmark over redis-cli --latency if your Redis clients are active enough. And in any case, keep in mind that average latency does not mean much for the performance of a system (i.e. you should also look at the latency distribution and the high percentiles).
I am benchmarking a PHP application with ApacheBench. I have the server on my local machine. I run the following:
ab -n 100 -c 10 http://my-domain.local/
And get this:
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 3 3.7 2 8
Processing: 311 734 276.1 756 1333
Waiting: 310 722 273.6 750 1330
Total: 311 737 278.9 764 1341
However, if I refresh the page http://my-domain.local/ in my browser, I find that it takes a lot longer than the 737 ms mean ab reports to load (around 3000-4000 ms). I can repeat this many times, and loading the page in the browser always takes at least 3000 ms.
I tested another, heavier page (page load in browser takes 8-10 seconds). I used a concurrency of 1 to simulate one user loading the page:
ab -n 100 -c 1 http://my-domain.local/heavy-page/
And the results are here:
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 17 20 4.7 18 46
Waiting: 16 20 4.6 18 46
Total: 17 20 4.7 18 46
So what does the Total line in the ab results actually tell me? Clearly it's not the number of milliseconds the browser spends loading the web page. Is the number of milliseconds it takes the browser to load the page (X) linearly dependent on the mean total milliseconds ab reports (Y)? That is, if I reduce Y by half, have I also reduced X by half?
(Also, I'm not really sure what Processing, Waiting, and Total mean.)
I'll reopen this question since I'm facing the problem again.
Recently I installed Varnish.
I run ab like this:
ab -n 100 http://my-domain.local/
ApacheBench reports very fast response times:
Requests per second: 462.92 [#/sec] (mean)
Time per request: 2.160 [ms] (mean)
Time per request: 2.160 [ms] (mean, across all concurrent requests)
Transfer rate: 6131.37 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 1 2 2.3 1 13
Waiting: 0 1 2.0 1 12
Total: 1 2 2.3 1 13
So the time per request is about 2.2 ms. When I browse the site (as an anonymous user), the page load time is about 1.5 seconds.
Here is a picture from Firebug's Net tab. As you can see, my browser waits 1.68 seconds for my site to respond. Why is this number so much bigger than the request times ab reports?
Are you running ab on the server itself? Don't forget that your browser is local to you, at the far end of a network link. An ab run on the web server itself has almost zero network overhead and reports basically the time it takes for Apache to serve up the page. Your home browser's link adds however many milliseconds of network transit time on top of that basic page-serving overhead.
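One way to see the split is to run the same benchmark from both ends of the link (hostnames are placeholders):
# On the web server itself: roughly Apache's page-generation time only.
ab -n 100 -c 1 http://my-domain.local/
# From a machine on your side of the network: adds the transit time your browser sees.
ab -n 100 -c 1 http://my-domain.example.com/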
OK, I think I know what the problem is. While I was measuring the page load time in the browser, I was logged in, so I was bypassing the cache; for ab's anonymous requests, none of the heavy stuff happens. The page load times in the browser as an anonymous user are closer to the ones ab is reporting.