How to Increase Flask REST API Concurrent Request Performance

I'm using gunicorn with gevent to serve Flask APIs with 4 workers. When I run an ab (ApacheBench) test against a single API, the time per request is 60 ms. However, when the first API calls the second one, the time per request goes up to 358 ms. As the code samples below show, both APIs only return a "Hi there!" response. What could be the reason for this increase in response time?
First API
import requests
from flask import Flask, request

# Base URL of the second API
api_url = 'http://127.0.0.1:4000/'

app = Flask(__name__)

@app.route('/', methods=['GET'])
def index():
    return 'Hi there! '

@app.route('/test', methods=['GET'])
def test():
    # Call the second API and return its body unchanged
    resp = requests.get(api_url)
    return resp.text
Second API
from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['GET'])
def index():
    return 'Hi there!'
ab test on the "/" path with 1000 concurrent connections and 10000 requests:
Benchmarking 0.0.0.0 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: gunicorn
Server Hostname: 0.0.0.0
Server Port: 3000
Document Path: /
Document Length: 10 bytes
Concurrency Level: 1000
Time taken for tests: 0.602 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 1630000 bytes
HTML transferred: 100000 bytes
Requests per second: 16604.12 [#/sec] (mean)
Time per request: 60.226 [ms] (mean)
Time per request: 0.060 [ms] (mean, across all concurrent requests)
Transfer rate: 2643.04 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 3.2 0 18
Processing: 2 55 11.7 59 60
Waiting: 1 55 11.7 59 60
Total: 8 56 9.1 59 67
Percentage of the requests served within a certain time (ms)
50% 59
66% 59
75% 59
80% 59
90% 60
95% 60
98% 60
99% 60
100% 67 (longest request)
ab test on the "/test" path, which calls the second API, with 1000 concurrent connections and 10000 requests (time per request: 358.406 ms):
Benchmarking 0.0.0.0 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: gunicorn
Server Hostname: 0.0.0.0
Server Port: 3000
Document Path: /test
Document Length: 9 bytes
Concurrency Level: 1000
Time taken for tests: 3.584 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 1610000 bytes
HTML transferred: 90000 bytes
Requests per second: 2790.13 [#/sec] (mean)
Time per request: 358.406 [ms] (mean)
Time per request: 0.358 [ms] (mean, across all concurrent requests)
Transfer rate: 438.68 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 3.1 0 13
Processing: 16 339 54.2 358 446
Waiting: 3 339 54.2 358 446
Total: 16 340 53.4 359 458
Percentage of the requests served within a certain time (ms)
50% 359
66% 364
75% 367
80% 369
90% 372
95% 376
98% 379
99% 401
100% 458 (longest request)
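A note for reading the two reports above: ab computes its first "Time per request" figure as concurrency * time taken / completed requests, so at a concurrency of 1000 it is not the latency of one isolated call. A minimal Python sketch of that arithmetic, using only numbers from the reports (the function name is illustrative):

```python
def ab_time_per_request_ms(concurrency, seconds_taken, completed):
    """Mean 'Time per request' as ab derives it, in milliseconds."""
    return concurrency * seconds_taken * 1000 / completed

# Values copied from the two ab reports above.
direct = ab_time_per_request_ms(1000, 0.602, 10000)   # "/" path
chained = ab_time_per_request_ms(1000, 3.584, 10000)  # "/test" path

print(f"/     : {direct:.1f} ms")
print(f"/test : {chained:.1f} ms")
print(f"ratio : {chained / direct:.2f}x")
```

The results agree with the 60.226 ms and 358.406 ms that ab printed, up to the rounding of "Time taken for tests". Throughput dropped by roughly a factor of six once every request triggered a second HTTP call.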

Related

How to scale Cisco Joy capture speed to 5 Gbps or more

Currently I am capturing network packets replayed with tcpreplay at 800 Mbps, but I want to scale this to 5 Gbps or more.
I am running Joy on a server with 16 GB of RAM and 8 cores.
Tcpreplay Output:
`Actual: 2427978 packets (2098973496 bytes) sent in 20.98 seconds
Rated: 100003501.6 Bps, 800.02 Mbps, 115678.59 pps
Flows: 49979 flows, 2381.11 fps, 2426216 flow packets, 1756 non-flow
Statistics for network device: vth0
Successful packets: 2427978
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0`
Total Packets Captured: 2412876
I am running Joy with 4 threads, but even with 24 threads I do not see any significant change in the capture rate.
Joy uses af_packet with a zero-copy ring buffer. Cisco Mercury uses the same mechanism to write packets, yet its authors claim that Mercury can write at 40 Gbps on server-class hardware. If anyone has a suggestion on this issue, please reply.
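To put the target in packet terms, here is a rough Python sketch that takes the figures from the tcpreplay output and the capture count above and estimates the packet rate that 5 Gbps would imply. It assumes the average packet size stays the same at the higher rate, which may not hold for other traffic.

```python
# Figures copied from the tcpreplay output and the Joy capture count above.
packets_sent = 2_427_978
bytes_sent = 2_098_973_496
packets_captured = 2_412_876

avg_packet_bytes = bytes_sent / packets_sent

def packets_per_second(bits_per_second, packet_bytes):
    """Packet rate needed to carry bits_per_second at a given packet size."""
    return bits_per_second / 8 / packet_bytes

pps_at_5g = packets_per_second(5e9, avg_packet_bytes)
missed = packets_sent - packets_captured

print(f"average packet size : {avg_packet_bytes:.1f} bytes")
print(f"packets/s at 5 Gbps : {pps_at_5g:,.0f}")
print(f"packets not captured: {missed} ({missed / packets_sent:.2%})")
```

At this packet size, 5 Gbps is on the order of 720,000 packets per second, against about 115,679 pps in the run above.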

Fatal error: Allowed memory size of 268 435 456 bytes exhausted (tried to allocate 443 505 68 bytes)

The problem arises when I try to run an INSERT query.
Is there a way to fix it?
Fatal error: Allowed memory size of 268 435 456 bytes exhausted (tried to allocate 443 505 68 bytes)
in /home/customer/public_html/phpmyadmin/vendor/phpmyadmin/sql-parser/src/Token.php on line 257
I expected the long text value to be inserted successfully.

Is iperf2 latency a two-way or a one-way latency?

iperf2 (version 2.0.9) reports latency in its output as shown below.
Is it a two-way or a one-way latency measurement?
Server listening on UDP port 5001 with pid 5167
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
[ 3] local 192.168.1.102 port 5001 connected with 192.168.1.101 port 59592
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Latency avg/min/max/stdev PPS
[ 3] 0.00-1.00 sec 122 KBytes 1.00 Mbits/sec 0.063 ms 0/ 6254 (0%) 659.932/659.882/660.502/ 8.345 ms 6252 pps
[ 3] 1.00-2.00 sec 122 KBytes 1.00 Mbits/sec 0.020 ms 0/ 6250 (0%) 660.080/659.919/666.878/ 0.110 ms 6250 pps
[ 3] 2.00-3.00 sec 122 KBytes 1.00 Mbits/sec 0.020 ms 0/ 6250 (0%) 660.113/659.955/660.672/ 0.047 ms 6250 pps
[ 3] 3.00-4.00 sec 122 KBytes 1.00 Mbits/sec 0.022 ms 0/ 6250 (0%) 660.153/659.994/660.693/ 0.047 ms 6250 pps
[ 3] 4.00-5.00 sec 122 KBytes 1.00 Mbits/sec 0.021 ms 0/ 6250 (0%) 660.192/660.034/660.617/ 0.049 ms 6250 pps
It's one-way, which requires the clocks to be synchronized to a common reference. You may want to look into Precision Time Protocol. Also, tell your hosting provider that you want better clocks in their data centers. The GPS atomic clock is quite accurate and the signal is free.
There is a lot more work going on with iperf 2.0.14 related to TCP write-to-read latencies. Version 2.0.14 will enforce the use of --trip-times on the client before any end-to-end or one-way latency measurements are presented. This way the user tells iperf that the systems have their clocks synchronized to an accuracy the user deems sufficient. We also produce a Little's law inP metric along with network power. See the man pages for more. The hope is to have iperf 2.0.14 released by early 2021.
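To illustrate why the clocks matter, here is a minimal Python sketch of a receiver-side one-way latency calculation. The delay and offset values are hypothetical and chosen only for illustration; they are not taken from the output above.

```python
def one_way_latency_ms(send_ts, recv_ts):
    """Receive time (receiver clock) minus send timestamp (sender clock), in ms."""
    return (recv_ts - send_ts) * 1000.0

# Hypothetical values, for illustration only.
true_delay_s = 0.0005    # real network delay: 0.5 ms
clock_offset_s = 0.250   # receiver clock runs 250 ms ahead of the sender

send_ts = 100.0                                    # stamped with the sender clock
recv_ts = send_ts + true_delay_s + clock_offset_s  # read from the receiver clock

measured = one_way_latency_ms(send_ts, recv_ts)
print(f"measured: {measured:.1f} ms, true: {true_delay_s * 1000:.1f} ms")
```

Any offset between the two clocks is added directly to the reported latency, so without synchronization the absolute value cannot be trusted.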
[rjmcmahon@localhost iperf2-code]$ src/iperf -s -i 1
[ 4] local 192.168.1.10%enp2s0 port 5001 connected with 192.168.1.80 port 47420 (trip-times) (MSS=1448) (peer 2.0.14-alpha)
[ ID] Interval Transfer Bandwidth Reads Dist(bin=16.0K) Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr
[ 4] 0.00-1.00 sec 1.09 GBytes 9.34 Gbits/sec 18733 2469:2552:2753:2456:2230:2272:1859:2142 2.988/ 0.971/ 3.668/ 0.370 ms (8908/131072) 3.34 MByte 390759.84
Note: For my testing during iperf 2 development, I have GPS-disciplined, oven-controlled oscillators from Spectracom in my systems. These cost about $2.5K each and require a GPS signal.
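The inP figure in the sample output can be checked by hand with Little's law. The Python sketch below assumes that inP is simply throughput multiplied by average latency and that MByte here means 1024 * 1024 bytes. It is a reading of the output, not iperf's actual code.

```python
def bytes_in_flight(throughput_bits_per_s, latency_s):
    """Little's law: average amount in the system = arrival rate * average time in it."""
    return throughput_bits_per_s / 8 * latency_s

# 9.34 Gbits/sec and 2.988 ms average latency, from the sample output above.
inp_bytes = bytes_in_flight(9.34e9, 2.988e-3)
inp_mib = inp_bytes / (1024 * 1024)

print(f"{inp_mib:.2f} MiB in flight")
```

This gives about 3.3 MiB, close to the 3.34 MByte in the report. The small gap is consistent with the rounding of the printed bandwidth.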

Redis requests that normally take 1 to 3 ms sometimes take 300 ms

I'm currently using a graph database on Redis for a Julia project.
Sometimes Redis requests take 300 ms to execute and I don't understand why.
I ran a simple request 10,000 times (the code is below):
using Redis, BenchmarkTools
conn = RedisConnection(port=6382)
Redis.execute_command(conn, ["FLUSHDB"])
q = string("CREATE (:Type {nature :'Test',val:'test'})")
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 1000
BenchmarkTools.DEFAULT_PARAMETERS.samples = 10000
stats = @benchmark Redis.execute_command(conn, ["GRAPH.QUERY", "GraphDetection", q])
And got these results:
BenchmarkTools.Trial:
  memory estimate: 3.09 KiB
  allocs estimate: 68
  minimum time: 1.114 ms (0.00% GC)
  median time: 1.249 ms (0.00% GC)
  mean time: 18.623 ms (0.00% GC)
  maximum time: 303.269 ms (0.00% GC)
  samples: 10000
  evals/sample: 1
The huge difference between the median and the mean comes from the problem I'm describing (a request takes either 1-3 ms or 300-310 ms).
I'm not familiar with Julia, but note that RedisGraph reports its internal execution time; I suggest using that figure for measurement.
In addition, it would help to know on which sample RedisGraph took over 100 ms to process the query. Usually it is the first query that causes RedisGraph to do some extra work.
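One way to find out which samples are slow is to time every call individually and record the index of each outlier. The sketch below is in Python rather than Julia, and the helper name and the fake query are illustrative stand-ins so that it runs without a Redis server.

```python
import time

def find_slow_samples(run_query, n_samples, threshold_ms=100.0):
    """Call run_query n_samples times; return (index, elapsed_ms) for slow calls."""
    slow = []
    for i in range(n_samples):
        start = time.perf_counter()
        run_query()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms > threshold_ms:
            slow.append((i, elapsed_ms))
    return slow

# Stand-in for the real query: every 5th call sleeps to imitate a slow request.
calls = {"n": 0}

def fake_query():
    calls["n"] += 1
    if calls["n"] % 5 == 0:
        time.sleep(0.05)

slow = find_slow_samples(fake_query, 10, threshold_ms=25.0)
print([index for index, _ in slow])
```

With a real connection, run_query would wrap the GRAPH.QUERY call. If the slow indices cluster at the start, that points to warm-up work. If they recur at regular intervals, something periodic on the client or server side is more likely.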

Iperf: Transfer of data

I have a question about how iperf works. I am using the command shown below.
What I don't understand is how 6945 datagrams can be sent: if 9.66 MBytes are transferred, then 9.66M/1458 = 6625 datagrams should be transferred, according to my understanding.
If 10.125 MBytes (2.7 Mbps * 30 sec) had been transferred, then 6944 datagrams would have been sent (excluding UDP and other headers).
Please clarify if someone knows.
(I also ran Wireshark on both client and server, and the number of packets there is greater than the number shown by iperf.)
umar@umar-VPCEB11FM:~$ iperf -t 30 -c 192.168.3.181 -u -b 2.7m -l 1458
------------------------------------------------------------
Client connecting to 192.168.3.181, UDP port 5001
Sending 1458 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.3.175 port 47241 connected with 192.168.3.181 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 9.66 MBytes 2.70 Mbits/sec
[ 3] Sent 6946 datagrams
[ 3] Server Report:
[ 3] 0.0-92318.4 sec 9.66 MBytes 878 bits/sec 0.760 ms 0/ 6945 (0%)
iperf uses base 2 for M and K, meaning that K = 1024 and M = 1024*1024.
When you do the math that way, you get 9.66 MB / 1458 bytes per datagram ≈ 6947 datagrams, which is within the precision error (the report has a resolution of 0.01 MB, which means a rounding error of up to 0.005 MB ≈ 3.6 datagrams).
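The same arithmetic as a short Python sketch, using only the numbers from the iperf report above:

```python
MIB = 1024 * 1024        # iperf's "M" for byte counts
DATAGRAM_BYTES = 1458    # from -l 1458

reported_mbytes = 9.66   # "Transfer" column
sent_datagrams = 6946    # "Sent 6946 datagrams"

binary_estimate = reported_mbytes * MIB / DATAGRAM_BYTES
decimal_estimate = reported_mbytes * 1_000_000 / DATAGRAM_BYTES
rounding_slack = 0.005 * MIB / DATAGRAM_BYTES  # half of the 0.01 MB display resolution

print(f"base-2 estimate : {binary_estimate:.1f} datagrams")
print(f"base-10 estimate: {decimal_estimate:.1f} datagrams")
print(f"rounding slack  : +/- {rounding_slack:.1f} datagrams")
print(abs(binary_estimate - sent_datagrams) <= rounding_slack)
```

The base-10 estimate gives the roughly 6625 datagrams from the question. The base-2 estimate lands within the rounding slack of the 6946 datagrams that iperf reports.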