Rate Limit Pattern with Redis - Accuracy - redis

Background
I have an application that sends HTTP requests to external servers. The application communicates with other services that enforce strict rate limit policies, for example 5 calls per second. Any call above the allowed rate gets a 429 error code.
The application is deployed in the cloud and runs as multiple instances. The tasks come from a shared queue.
The allowed rate limit is synced across the instances using the Redis rate limit pattern.
My current implementation
Assuming that the rate limit is 5 per second: I split time into multiple "windows". Each window allows a maximum of 5 calls. Before each call I check whether the counter is less than 5. If yes, I fire the request. If no, I wait for the next window (after a second).
The problem
In order to sync the application instances through Redis, I need two Redis calls: INCR and EXPIRE. Let's say that each call can take around 250ms to return, so the check takes ~500ms in total. As a result, in some cases the check is made against an old window, because by the time the answer arrives the current second has already changed. If another 5 quick calls are made in the next second, the server will respond with 429.
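The effect described above can be reproduced without Redis. The following is a minimal in-memory sketch, not the poster's code: the names approve and sends_per_window, the sample timestamps, and the 500 ms latency figure are illustrative assumptions. Each check is counted against the window in which it started, but the HTTP request leaves 500 ms later.

```python
from collections import Counter

LIMIT = 5          # allowed calls per one-second window
LATENCY_MS = 500   # assumed total delay of the two Redis round trips

def approve(check_times_ms, limit=LIMIT):
    """Fixed-window check: approve while the window's counter is below the limit."""
    counters = Counter()
    approved = []
    for t in sorted(check_times_ms):
        window = t // 1000            # id of the one-second window
        if counters[window] < limit:
            counters[window] += 1
            approved.append(t)
    return approved

def sends_per_window(approved_ms, latency_ms=LATENCY_MS):
    """Count, per one-second window, when the approved requests are actually sent."""
    return Counter((t + latency_ms) // 1000 for t in approved_ms)

# Five checks late in window 0, five checks early in window 1 (milliseconds).
check_times = [600, 650, 700, 750, 800, 1000, 1050, 1100, 1150, 1200]
approved = approve(check_times)
print(len(approved), dict(sends_per_window(approved)))  # prints: 10 {1: 10}
```

All ten requests pass the per-window check, yet all ten are sent within the same second, which is twice the allowed rate.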
Question
As you can see, this pattern does not really ensure that the rate of my application stays at or below 5 calls per second.
How do you recommend doing it right?

Related

Jmeter : how to get large number of rps in jmeter

I'm load testing a web app with JMeter and I'm having a hard time working out how to set the number of threads, the ramp-up and the loop count in order to get a large number of requests per second (rps). I want to check whether my server can keep up with 500 rps. Can anyone here help me set this up properly? Thanks.
The number of requests per unit of time is called Throughput and mainly depends on two factors:
Number of active threads
Your application response time
The first one is obvious: more threads -> more requests per second. However, each JMeter thread waits for the response to its previous request before sending the next one, so application response time matters as well.
So the recommendations are:
Set number of threads in the Thread Group to the number of anticipated users of your system.
Set the ramp-up period according to the number of threads so the load will increase (and decrease) gradually. This way you will be able to correlate the increasing/decreasing load with the changing response time and throughput.
Instead of loops it might be a better idea to set desired test duration using Scheduler section of the Thread Group.
Run your test and observe the actual throughput using, for example, the Server Hits Per Second listener or the Transactions per second chart of the HTML Reporting Dashboard. If it matches your expectations, you are done; if not, you will need to increase the number of virtual users.
You can also use the ConcurrencyThreadGroup plugin. Specifically, see how to Produce Desired RPS:
Thread pool size can be calculated as RPS * <max response time in ms> / 1000. The higher the desired rate, the more threads you will need. The longer the service's response time, the more threads you will need.
For example, if your service response time may be 2.5 sec and the target rps is 1230, you need 1230 * 2500 / 1000 = 3075 threads.
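The formula quoted above can be written as a small helper. This is only a sketch: the function name threads_needed is made up for illustration, and the 200 ms response time in the second call is an assumed value, not a measurement.

```python
import math

def threads_needed(target_rps, max_response_ms):
    """Thread pool size = RPS * max response time (ms) / 1000, rounded up."""
    return math.ceil(target_rps * max_response_ms / 1000)

# The example from the quoted text: 2.5 s response time, 1230 rps.
print(threads_needed(1230, 2500))  # prints: 3075

# The 500 rps target from the question, assuming a 200 ms response time.
print(threads_needed(500, 200))    # prints: 100
```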

Marketo API - Maximum of 10 concurrent API calls

I'd like to know what Marketo means by 10 concurrent API calls. If, for example, 20 people use the API at the same time, is it going to crash? And if I make the script sleep for X seconds when I get that limit response and then try the API call again, will it work?
Thanks,
Best Regards,
Martin
A maximum of 10 concurrent API calls means that Marketo will process at most 10 simultaneous API requests per subscription.
So, for example, if you have a service that queries the API directly every time it is used, and this service gets called 11 or more times at the same time, then Marketo will respond with an error message for the eleventh call and all later ones. The first 10 calls should be processed fine. According to the docs, the rejected requests will receive an error code of 615.
If your script is single threaded (like standard PHP), makes more than 10 API calls, and runs as a single instance, then you are fine, since the calls are performed one after another (so they are not concurrent). However, if your script can run in multiple instances, you can hit the limit easily. In that case a sleep won't help you, but you can always check the response code in your script and retry the call if it received an error. Retrying with progressively longer waits between attempts is often called Exponential Backoff. Here is a great article on this topic.
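The retry approach mentioned above might look like the following sketch. It does not use any Marketo client library: RateLimitError, call_with_backoff and the call argument are hypothetical names, and the caller is assumed to raise RateLimitError when the API answers with the limit error (code 615 according to the answer above).

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical error: raised by the caller's code when the API reports the limit."""

def call_with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay, then 2x, 4x, ... and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = base_delay * (2 ** attempt)
            # Random jitter keeps several instances from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The sleep parameter is injectable only so the function can be exercised without real waiting; in normal use the default time.sleep applies.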

Sending HTTP Request at precise time intervals with JMeter

I'm using JMeter to test an Apache2 server I configured. I'd like to test whether the server can handle 200 HTTP requests arriving together every second, repeatedly for a large number of seconds (like 1 minute, or even more). I read the JMeter documentation, but I struggled a little to understand how Timers work. I configured the test with
- Numbers of Threads 200
- Ramp-up period 1
- Loop Count 100
Now, as far as I understood and noticed, JMeter tries to start the 200 threads within 1 second, and then performs 200*100=20000 requests as fast as possible (or at least this is the behavior I'm experiencing on my server), in chunks of 200 requests at a time. This means that the server might (it actually does) receive more than 200 requests/second. The behavior I'd like to reproduce is instead exactly 200 requests every second. I don't care whether they all arrive together at the beginning of the second or arrive in a randomized way, distributed across the one-second window (one every 5 milliseconds, or whatever). So I tried some Timers, but without success. I tried:
Constant Timer with a Thread Delay of 5 milliseconds. Doing the math, it should send a request every 5 milliseconds, and being 200 Threads, it should send 200 requests/second (200*5 = 1000ms).
Constant Throughput Timer with a target throughput of 12000.0. Maybe I'm wrong here, but this should be samples per minute, so 200 requests per second are 200*60 = 12000 per minute (if a sample is a request). I did not understand the "Calculate Throughput based on" option, and I tried both "this thread only" (which one?) and "all active threads".
Anyway, none of these configurations behaves as I need.
You can achieve this by using Constant Throughput Timer.
Constant Throughput Timer can only pause the threads to reach specified "Target Throughput" value so make sure you provide enough virtual users (threads) to generate desired "requests per minute" value.
So, to get 200 requests/sec you have to consider below things:
Make sure that you have enough virtual users (threads) in your Thread Group.
The Throughput Timer is quite accurate at the "minute" level, so you need to wait about 60 seconds for it to start working as expected. Make sure that the test runs long enough.
Use the Constant Throughput Timer at the test plan level.
Set "Calculate Throughput based on" to "All active threads".
Also, remember that other elements within the test plan (for example, other timers, the number of specified threads, and so on) can affect whether the desired throughput is reached.
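As a rough sanity check for the points above, the timer's per-minute target and the ceiling imposed by the thread count can be estimated as follows. This is a sketch only: the function names are made up, the response times are assumed values, and the ceiling assumes each thread sends its next request only after the previous response arrives.

```python
def target_per_minute(requests_per_second):
    """Convert a per-second goal into the per-minute value the timer expects."""
    return requests_per_second * 60

def max_rps(threads, response_ms):
    """Upper bound on throughput: each thread completes 1000/response_ms requests per second."""
    return threads * 1000 / response_ms

print(target_per_minute(200))   # prints: 12000
print(max_rps(200, 500))        # prints: 400.0
print(max_rps(200, 2000))       # prints: 100.0
```

With an assumed 2 second response time, 200 threads cannot exceed 100 requests/sec, so the timer could never reach its target and more threads would be needed.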
So, here is a technique that you can follow:
First, use the configuration below and observe the throughput results.
Numbers of Threads : 200
Ramp-up period : 60 seconds
Loop Count : Check "Forever".
Duration (seconds) : 360
If the throughput is lower than expected, increase the number of threads gradually and observe whether the throughput increases.
Keep increasing the number of threads until you reach your desired throughput.
If you still can't reach your desired throughput (200 requests/sec) this way, then your application most likely cannot serve 200 requests per second.

Apigee SpikeArrest Sync Across MessageProcessors (MPs)

Our organisation is currently migrating to Apigee.
I currently have a problem very similar to this one, but due to the fact that I am a Stack Overflow newcomer and have low reputation I couldn't comment on it: Apigee - SpikeArrest behavior
So, in our organisation we have 6 MessageProcessors (MP) and I assume they are working in a strictly round-robin manner.
Please see this config (It is applied to the TARGET ENDPOINT of the ApiProxy):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<SpikeArrest async="false" continueOnError="false" enabled="true" name="spikearrest-1">
    <DisplayName>SpikeArrest-1</DisplayName>
    <FaultRules/>
    <Properties/>
    <Identifier ref="request.header.some-header-name"/>
    <MessageWeight ref="request.header.weight"/>
    <Rate>3pm</Rate>
</SpikeArrest>
I have a rate of 3pm, which means 1 hit each 20sec, calculated according to ApigeeDoc1.
The problem is that instead of 1 successful hit every 20sec I get 6 successful ones in the range of 20sec and then the SpikeArrest error, meaning it hit once each MP in a round robin manner.
This means I get 6 hits per 20 sec on my API backend instead of the desired 1 hit per 20 sec.
Is there any way to sync the spikearrests across the MPs?
ConcurrentRatelimit doesn't seem to help.
SpikeArrest has no ability to be distributed across message processors. It is generally used for stopping large bursts of traffic, not controlling traffic at the levels you are suggesting (3 calls per minute). You generally put it in the Proxy Request Preflow and abort if the traffic is too high.
The closest you can get to 3 per minute using SpikeArrest with your round robin message processors is 1 per minute, which would result in 6 calls per minute. You can only specify SpikeArrests as "n per second" or "n per minute", which does get converted to "1 per 1/n second" or "1 per 1/n minute" as you mentioned above.
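The arithmetic behind these numbers can be sketched as follows. The function names are illustrative, and the calculation assumes each message processor enforces the configured rate independently and that traffic is spread evenly across all of them.

```python
def spike_arrest_interval_seconds(rate_per_minute):
    """An "n per minute" rate is enforced as one call per 60/n seconds."""
    return 60 / rate_per_minute

def effective_calls_per_minute(rate_per_minute, message_processors):
    """Each message processor applies the rate on its own, so the limits add up."""
    return rate_per_minute * message_processors

print(spike_arrest_interval_seconds(3))    # prints: 20.0
print(effective_calls_per_minute(3, 6))    # prints: 18
print(effective_calls_per_minute(1, 6))    # prints: 6
```

With 3pm on 6 message processors the backend can see 18 calls per minute (6 per 20 seconds), and even the lowest setting of 1pm still lets 6 calls per minute through.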
Do you really only support one call every 20 seconds on your backend? If you are trying to support one call every 20 seconds per user or app, then I suggest you try to accomplish this using the Quota policy. Quotas can share a counter across all message processors. You could also use quotas with all traffic (instead of per user or per app) by specifying a quota identifier that is a constant. You could allow 3 per minute, but they could all come in at the same time during that minute.
If you are just trying to protect against overtaxing your backend, the ConcurrentRateLimit policy is often used.
The last solution is to implement some custom code.
Update to address further questions:
Restating:
6 message processors handled round robin
want 4 apps to each be allowed 5 calls per second
want the rest of the apps to share 10 calls per second
To get the kind of granularity you are looking for, you'll need to use quotas. Unfortunately you can't set a quota to have a "per second" value on a distributed quota (distributed quota shares the count among message processors rather than having each message processor have its own counter). The best you can do is per minute, which in your case would be 300 calls per minute. Otherwise you can use a non-distributed quota (dividing the quota between the 6 message processors), but the issue you'll have there is that calls that land on some MPs will be rejected while others will be accepted, which can be confusing to your developers.
For distributed quotas you'd set the 300 calls per minute in an API Product (see the docs), and assign that product to your four apps. Then, in your code, if that product is not assigned for the current API call's app, you'd use a quota that is hardcoded to 10 per second (600 per minute) and use a constant identifier rather than the client_id, so that all other traffic uses that quota.
Quotas don't keep you from submitting all your requests nearly simultaneously, and I'm assuming your backend can't handle 1200+ requests all at the same time. You'll need to smooth the traffic using a SpikeArrest policy. You'll want to allow the maximum traffic through the SpikeArrest that your backend can handle. This will help protect against traffic spikes, but you'll probably get some traffic rejected that would normally be allowed by the Quota. The SpikeArrest policy should be checked before the Quota, so that rejected traffic is not counted against the app's quota.
As you can probably see, configuring for situations like yours is more of an art than a science. My suggestion would be to do significant performance/load testing, and tune it until you find the correct values. If you can figure out how to use non-distributed quotas to get acceptable performance and predictability, that will let you work with per second numbers instead of per minute numbers, which will probably make massive spikes less likely.
Good luck!
Unlike Quota limits, Spike Arrest cannot be synchronized across MPs.
But, as you're setting the limits at a per-minute level, you could use the Quota policy instead. Set it to Distributed and Synchronized and it will coordinate across MPs.
Keep in mind there will always be some latency on the synchronization across machines so it will never be a completely precise number.

Limiting a queue over time

I'm using an API with usage limits, let's say: no more than 10 calls per second, and no more than 5000 calls per day.
I am handling these calls in jobs processed from a beanstalkd queue. How can I limit the processing of these jobs, keeping the API's limits in mind?
When you use Beanstalkd you can pause the tube for a certain number of seconds.
When you reserve a job, and you know the API call failed during that call, you get to pause the tube for X seconds.
You can find out the time needed to pause the tube, either from your API response (usually they return you are locked until Time X), or start with something adaptive like pause for the next 60 seconds, and increase/decrease on the go.
If you know you can delay, or disperse in advance, before placing the jobs into your queues, you can also add a delay to the job, so it won't execute immediately, this way you can have your jobs distributed over time.
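Dispersing jobs in advance could be sketched as below. The helper name spread_delays is made up, and the sketch covers only the per-second limit (10 calls per second in the question), not the daily one. Each value would be passed as the job's delay, in seconds, when the job is put into the tube.

```python
def spread_delays(job_count, per_second):
    """Delay in whole seconds for each job so at most per_second jobs become ready per second."""
    return [index // per_second for index in range(job_count)]

delays = spread_delays(25, 10)
print(delays[:10], delays[10:20], delays[20:])
# prints: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [1, 1, 1, 1, 1, 1, 1, 1, 1, 1] [2, 2, 2, 2, 2]
```

The delay only controls when a job becomes ready, so workers that fall behind can still process a backlog faster than the limit unless they are throttled as well.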
Also there is a great post about distributed rate limiting using redis
If all workers share durable state, they can update the shared status and collectively implement rate limiting.
If the only shared writable state is the queue itself, you could create ticketing tubes for the rate limited jobs, and have a rate limit manager insert tickets (permission slips) to control when the jobs get run. Would need changes to the workers, would need a way to time out unused tickets, but should be workable.
Edit: a "valid until" timestamp in the ticket might do it for per-second limits. Per-day limits might need a feedback tube back to let the rate limit manager know about actual usage (to implement a rolling 24 hour window instead of the 5000 all getting reset at midnight)
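The ticket idea with a "valid until" timestamp can be illustrated with an in-memory stand-in for the ticketing tube. Everything here is a hypothetical sketch: Ticket, TicketManager and take_ticket are made-up names, a deque replaces the real tube, and timestamps are passed in explicitly instead of being read from a clock.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Ticket:
    valid_until: float  # timestamp after which the ticket must be discarded

class TicketManager:
    """Hypothetical rate limit manager: issues per_second tickets on every tick."""
    def __init__(self, per_second, tube):
        self.per_second = per_second
        self.tube = tube

    def tick(self, now):
        # Meant to be called once per second; tickets expire one second later.
        for _ in range(self.per_second):
            self.tube.append(Ticket(valid_until=now + 1.0))

def take_ticket(tube, now):
    """Worker side: discard expired tickets, return the first valid one or None."""
    while tube:
        ticket = tube.popleft()
        if now < ticket.valid_until:
            return ticket
    return None

tube = deque()
manager = TicketManager(per_second=10, tube=tube)
manager.tick(now=0.0)
granted = sum(take_ticket(tube, now=0.5) is not None for _ in range(15))
print(granted)  # prints: 10
```

A worker would run a rate limited job only after take_ticket returns a ticket. Unused tickets time out on their own, so permissions do not accumulate across seconds.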