I am trying to understand the relationship between latency and the maximum number of requests that can be served per second.
As I understand it, RTT is the time taken for a message to reach the destination plus the acknowledgement to travel back to the source. So I assumed the maximum number of requests a server can serve per second cannot exceed the number of average round trips that fit into one second. My local ping test shows:
> ping 127.0.0.1
rtt min/avg/max/mdev = 0.089/0.098/0.120/0.012 ms
On average it takes 0.098 ms just for the network round trip, which is roughly 10 round trips per millisecond. So I assumed that a client issuing requests sequentially can execute at most about 10,000 req/sec. It turns out I am wrong: the redis-benchmark tool shows something different.
> redis-benchmark -t set -c 1 -h 127.0.0.1
====== SET ======
100000 requests completed in 2.53 seconds
1 parallel clients
3 bytes payload
keep alive: 1
100.00% <= 1 milliseconds
39588.28 requests per second
A single client is able to execute about 39 req/ms while I was expecting a maximum of 10 req/ms.
Can anyone help me figure out where I went wrong or what I misunderstood?
Commands can be pipelined even when using a single logical client thread, meaning: you can send lots of requests before the first response comes back. Responses always come back in request order (unless you're using pub/sub), so a pipelining client simply needs to keep a queue of sent messages that have not yet seen responses, and pair responses to requests as they arrive.
So: you aren't strictly bound by latency, although that remains a useful number. The raw throughput number (bound by bandwidth and server capacity) is also meaningful, since it is often the case that you want to issue multiple commands.
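For illustration, here is a minimal sketch of pipelining with the redis-py client; the library choice and key names are mine, not anything the benchmark above uses:

```python
# Minimal pipelining sketch with redis-py (pip install redis).
# Many commands are queued and sent before the first reply arrives,
# so throughput is not capped at 1 / RTT.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC
for i in range(10_000):
    pipe.set(f"key:{i}", "foo")       # queued locally, nothing sent yet
replies = pipe.execute()              # one batched exchange; replies return in request order
print(len(replies), "replies")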
I am running a Kusto query in Azure Diagnostics, querying the logs of the last week, and the query times out after 10 minutes. Is there a way I can increase the timeout limit? If so, can someone please walk me through the steps? I downloaded Kusto Explorer but couldn't see any easy way of connecting to my Azure cluster. How can I increase this timeout duration from inside the Azure portal for the query I am running?
It seems that 10 minutes is the maximum value for the timeout.
https://learn.microsoft.com/en-us/azure/azure-monitor/service-limits
Query API limits:

Category | Limit | Comments
Maximum records returned in a single query | 500,000 |
Maximum size of data returned | ~104 MB (~100 MiB) | The API returns up to 64 MB of compressed data, which translates to up to 100 MB of raw data.
Maximum query running time | 10 minutes | See Timeouts for details.
Maximum request rate | 200 requests per 30 seconds per Azure AD user or client IP address | See Log queries and language.
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/api/timeouts
Timeouts
Query execution times can vary widely based on:
The complexity of the query
The amount of data being analyzed
The load on the system at the time of the query
The load on the workspace at the time of the query
You may want to customize the timeout for the query.
The default timeout is 3 minutes, and the maximum timeout is 10 minutes.
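If you end up calling the Log Analytics query REST API directly, the longer timeout is requested with the Prefer: wait=<seconds> header described in the timeouts page linked above. A rough Python sketch, where the workspace ID, bearer token, and query text are placeholders:

```python
# Sketch: ask for the 10-minute maximum timeout instead of the 3-minute default
# via the Prefer header on the Log Analytics query API.
import requests

WORKSPACE_ID = "<workspace-guid>"     # placeholder
TOKEN = "<aad-bearer-token>"          # placeholder

resp = requests.post(
    f"https://api.loganalytics.io/v1/workspaces/{WORKSPACE_ID}/query",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Prefer": "wait=600",         # request the 10-minute server-side timeout
    },
    json={"query": "AzureDiagnostics | where TimeGenerated > ago(7d) | count"},
)
resp.raise_for_status()
print(resp.json())
```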
I am trying to track server uptimes using redis.
So the approach I have chosen is as follows:
Server xyz will keep sending my service a ping indicating that it was alive and working in the last 30 seconds.
My service will store a list of all time intervals during which the server was active. This is done by storing a list of {startTime, endTime} entries in Redis, with the key being the name of the server (xyz).
Depending on the user query, I will use this list to generate server uptime metrics, like % downtime between times (T1, T2).
Example:
assume that the time is T currently.
at T+30, server sends a ping.
xyz:["{start:T end:T+30}"]
at T+60, server sends another ping
xyz:["{start:T end:T+30}", "{start:T+30 end:T+60}"]
and so on for all pings.
This works fine, but one issue is that over a long time period this list accumulates a lot of elements. To avoid this, on each ping I currently pop the last element of the list and check whether it can be merged with the latest time interval. If it can, I coalesce them and push a single time interval onto the list; if not, two time intervals are pushed.
With this, after the second ping my list becomes: xyz:["{start:T end:T+60}"]
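Concretely, the merge step I do in my service looks roughly like this (sketched with redis-py; the JSON encoding of intervals and the fixed 30-second window are illustrative placeholders):

```python
# Sketch of the merge-on-ping step done in the service, not in Redis.
# Note: this read-modify-write is NOT atomic across service instances,
# which is exactly the problem described below.
import json
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

def record_ping(server: str, now: float, window: float = 30.0) -> None:
    new_start, new_end = now - window, now
    last = r.rpop(server)                      # take the most recent interval, if any
    if last is not None:
        interval = json.loads(last)
        if interval["end"] >= new_start:       # contiguous or overlapping: coalesce
            interval["end"] = new_end
            r.rpush(server, json.dumps(interval))
            return
        r.rpush(server, json.dumps(interval))  # not mergeable: put it back first
    r.rpush(server, json.dumps({"start": new_start, "end": new_end}))
```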
Some problems I see with this approach are:
The merging is being done in my service, not in Redis.
In case my service is distributed, the list ordering might get corrupted due to multiple readers and writers.
Is there a more efficient/elegant way to handle this, like maybe handling the merging of time intervals in Redis itself?
I am trying to call the Amadeus API (/v1/shopping/hotel-offers) in parallel in the test environment. Unfortunately, when I start 3 threads simultaneously, only the very first one gets an OK response and the others get HTTP 429 Too Many Requests responses.
I have not exceeded the monthly quota yet, so the error really is related to the parallel execution.
Does anybody know what the exact limits are (#requests/sec or #requests in parallel)? Is it even possible to have more than one request in flight at a time?
The throttling is not the same depending on the environment:
Test: 10 transactions per second per user (10 TPS/user), with the constraint of no more than 1 request every 100 ms.
Production: 20 transactions per second per user (20 TPS/user), with the constraint of no more than 1 request every 50 ms.
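As an example, a hedged sketch of spacing requests at least 100 ms apart from a single worker and retrying once on 429; the request parameters and token handling are simplified placeholders, not the full Amadeus SDK flow:

```python
# Sketch: serialize requests and keep them at least 100 ms apart (test environment).
import time
import requests

TEST_URL = "https://test.api.amadeus.com/v1/shopping/hotel-offers"  # endpoint from the question
MIN_INTERVAL = 0.1  # test environment: no more than 1 request every 100 ms

def call_with_throttle(token, searches):
    """Send the searches one after another, spaced at least MIN_INTERVAL apart."""
    last_sent = 0.0
    for params in searches:                      # params: dict of query parameters (placeholder)
        wait = MIN_INTERVAL - (time.monotonic() - last_sent)
        if wait > 0:
            time.sleep(wait)
        last_sent = time.monotonic()
        resp = requests.get(TEST_URL, params=params,
                            headers={"Authorization": f"Bearer {token}"})
        if resp.status_code == 429:              # throttled anyway: back off and retry once
            time.sleep(MIN_INTERVAL)
            resp = requests.get(TEST_URL, params=params,
                                headers={"Authorization": f"Bearer {token}"})
        yield resp
```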
We are using Redis as a queue handling on average about 3k requests per second. But when we check instantaneous_ops_per_sec, the value is consistently higher than expected by about 20%; in this case it reports ~4k ops per second.
To verify this, I have taken a dump of MONITOR for about 10 seconds and checked the number of incoming commands.
grep "1489722862." monitor_output | wc -l
where 1489722862 is the timestamp. Even this count matches what is being produced into the queue and what is being consumed from it.
This is a master-slave redis cluster setup.
Does instantaneous_ops_per_sec also account for slave reads? If not, what else could make this count significantly higher?
The instantaneous_ops_per_sec metric is calculated as the mean of the recent samples that the server took. The number of recent samples is hardcoded as 16 by STATS_METRIC_SAMPLES in server.h.
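As a toy illustration of that idea (not the actual C code), a rate computed as the mean of the last 16 per-interval samples might look like this:

```python
# Toy sketch of an "instantaneous" rate built from recent samples,
# mirroring the approach behind instantaneous_ops_per_sec.
from collections import deque

STATS_METRIC_SAMPLES = 16  # same sample count that server.h hardcodes

class InstantaneousMetric:
    def __init__(self):
        self.samples = deque(maxlen=STATS_METRIC_SAMPLES)
        self.last_count = 0
        self.last_time_ms = 0

    def track(self, total_count: int, now_ms: int) -> None:
        """Called periodically with the running command counter."""
        if self.last_time_ms:
            elapsed = max(now_ms - self.last_time_ms, 1)
            ops = total_count - self.last_count
            self.samples.append(ops * 1000 // elapsed)  # ops/sec over this interval
        self.last_count, self.last_time_ms = total_count, now_ms

    def value(self) -> int:
        return sum(self.samples) // len(self.samples) if self.samples else 0
```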
After enabling gzip compression on my Apache server (mod_deflate), I consistently found that end users were being served on average 200 ms slower than with uncompressed responses.
This was unexpected, so I modified the compression directive to ONLY compress text/html responses, fired up Wireshark, and looked at the network dump before and after compression.
Here are my observations of a GET with minimal other traffic on the network:
Before Compression
Transactions on the wire: 46
Total time for 46 trans: 791ms
i. TCP seq/ack: 14ms
ii. 1st data segment: 693ms
iii. Remaining: 83ms (27/28 data units transferred + tcp/ip handshakes)
After Compression
Transactions on the wire: 10
Total time for 10 trans: 926ms
i. TCP seq/ack: 14ms
ii. 1st data segment: 746ms
iii. Remaining: 165ms (5 out of 6 data units transferred)
After compression was enabled, it is clear and understandable that the number of transactions on the wire is significantly lower than without compression.
However, each compressed data unit took much longer to transfer from source to destination.
The additional work of compression understandably takes time, but I cannot understand why each data segment was significantly slower to transfer when compressed.
My understanding of the compression process is:
1. GET Request is received by Apache
2. Apache identifies resource
3. Compress the resource
4. Respond with compressed response
With this scheme, I would assume that the 3rd step (the step before the very first segment of the response) would take longer, since we are compressing and then responding, but I assumed the remaining chunks should take on average the same time as the uncompressed chunks. Yet they do not.
Can anyone tell me why, or suggest a better way to analyze this scenario? Also, does anyone have a before and after comparison? I would appreciate any feedback/comments/questions.
I was using an insufficient test to compare the two scenarios (I think fewer than 100 resources). With sufficient tests (more than 6000 URLs), the compressed time to first byte was faster by 200 milliseconds when serving text/html, whereas the time to last byte (TTLB) was faster by 25 milliseconds on average.
I haven't load tested this yet, which I plan to do, and I will update this answer.
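For reference, a rough sketch of one way to time TTFB with and without gzip for a list of URLs; the URL list is a placeholder and this is only an illustration of the kind of comparison, not the exact harness behind the numbers above:

```python
# Sketch: compare time-to-first-byte for gzip vs. uncompressed responses.
import time
import requests

def time_to_first_byte(url: str, gzip: bool) -> float:
    headers = {"Accept-Encoding": "gzip" if gzip else "identity"}
    start = time.monotonic()
    with requests.get(url, headers=headers, stream=True) as resp:
        next(resp.iter_content(chunk_size=1), None)  # block until the first byte arrives
    return time.monotonic() - start

urls = ["http://example.com/page%d.html" % i for i in range(3)]  # placeholder URLs
for url in urls:
    plain = time_to_first_byte(url, gzip=False)
    compressed = time_to_first_byte(url, gzip=True)
    print(f"{url}: uncompressed {plain*1000:.0f} ms, gzip {compressed*1000:.0f} ms")
```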