Centralized rate limiting in Kong using Redis

I'm working on implementing centralized rate limiting in Kong using Redis. The problem I'm facing is that Kong first fetches the counter from Redis, then checks whether the rate limit has been reached, and only then increments the counter. Now suppose I have 3 Kong nodes sitting behind a load balancer, and assume the value of the counter key in Redis is 5. If a request arrives at the LB (assuming round robin), it is routed to the first Kong node, which fetches the value 5. If another request arrives at the LB and is forwarded to Kong node 2 before the first node has updated the counter, it will fetch the same value of 5. How do I overcome this issue?
I thought about incrementing the counter first and then fetching the value, but I think that would add extra latency to the response.
Please do correct me if my thinking is wrong.
Any small help is appreciated.
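To illustrate the increment-first idea, here is a minimal sketch in Python with redis-py (purely illustrative; this is not Kong's plugin code, and the key name and expiry handling are assumptions). Because INCR is atomic, two nodes that handle requests at the same moment still receive distinct counter values, so they can never both act on the same stale read of 5.

import redis

r = redis.Redis(host="localhost", port=6379)

def over_limit(key, limit, window_seconds):
    # Increment first, then compare: INCR is atomic, so every node gets
    # a distinct counter value and the stale-read race cannot happen.
    # (Illustrative sketch only; not Kong's plugin code.)
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)
    count, _ = pipe.execute()
    return count > limit

Because the INCR and EXPIRE are pipelined, the check still costs a single Redis round trip, so the latency compared with a plain read should be negligible.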

Related

Rate Limit Pattern with Redis - Accuracy

Background
I have an application that sends HTTP requests to external servers. The application communicates with other services that enforce a strict rate limit policy, for example 5 calls per second; any call above the allowed rate gets a 429 error code.
The application is deployed in the cloud and runs as multiple instances. The tasks come from a shared queue.
The allowed rate limit is kept in sync across instances using the Redis rate limit pattern.
My current implementation
Assuming the rate limit is 5 per second: I split time into multiple "windows". Each window has a maximum rate of 5. Before each call I check whether the counter is less than 5. If yes, I fire the request. If no, I wait for the next window (a second later).
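Roughly, the window counter described above looks like this (a sketch in Python with redis-py; the key naming is an assumption, and the "check" is read off the value INCR returns):

import time
import redis

r = redis.Redis(host="localhost", port=6379)
RATE_LIMIT = 5  # calls per second, as in the example

def acquire_slot():
    # One attempt to claim a slot in the current one-second window.
    window = int(time.time())
    key = "ratelimit:%d" % window
    count = r.incr(key)   # Redis round trip #1 (INCR)
    r.expire(key, 2)      # Redis round trip #2 (EXPIRE)
    return count <= RATE_LIMIT

while not acquire_slot():
    time.sleep(1 - (time.time() % 1))  # wait for the next window
# ... fire the outbound HTTP request here ...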
The problem
In order to keep the application instances in sync through Redis, I need two Redis calls: INCR and EXPIRE. Let's say each call takes around 250ms to return, so the check takes ~500ms. That means in some cases you end up checking against an old window, because by the time the answer comes back the current second has already changed. If the next second then brings another 5 quick calls, this leads to a 429 from the server.
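As a side note (a sketch, not necessarily the right fix for the accuracy problem): the two sequential calls can be collapsed into a single atomic round trip by running them inside Redis as a small Lua script. The redis-py usage and key naming below are assumptions.

import time
import redis

r = redis.Redis(host="localhost", port=6379)

# INCR and EXPIRE happen inside Redis, atomically, in one round trip.
window_incr = r.register_script("""
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
""")

def acquire(limit=5):
    key = "ratelimit:%d" % int(time.time())  # illustrative per-second key
    return window_incr(keys=[key], args=[2]) <= limit

With the example numbers above this cuts the check from roughly 500ms to roughly 250ms, although it does not by itself solve the stale-window issue.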
Question
As you can see, this pattern does not really ensure that my application's rate stays at or below 5 calls per second.
How do you recommend doing this properly?

Kong 0.10.3 KONG_RESPONSE_LATENCY header returns 0 and latency values don't match the documentation

I am using Kong 0.10.3, and it seems that the "latencies" object logged by Kong's file logging plugin and the LATENCY headers in the response have erroneous values.
Based on the Kong documentation, the "request" latency is the overall latency (first byte in to last byte out), "proxy" is the processing time of the upstream API, and "kong" is the time Kong takes to execute plugins on the request/response.
My issue is that the kong latency is frequently reported as 0, and the kong + proxy latency typically equals the request latency. Based on the documentation, I would expect a difference that accounts for transferring the request/response payload.
I am trying to figure out whether my API clients are slow, but the returned values seem faulty and are not helping at all.
In this example, my request had a 6.6MB payload and Kong logged these latencies:
"latencies": {
  "request": 9686,
  "proxy": 9648,
  "kong": 38
}
If the proxy took 9648ms to do its work, all I am left with is the 38ms of Kong latency, and no remainder to account for the data transfer time.
Am I missing something or is this a Kong issue?
Does the same issue exist in the latest Kong Community Edition version 0.13?

Apigee SpikeArrest Sync Across MessageProcessors (MPs)

Our organisation is currently migrating to Apigee.
I currently have a problem very similar to this one, but because I am a Stack Overflow newcomer with low reputation I couldn't comment on it: Apigee - SpikeArrest behavior
So, in our organisation we have 6 MessageProcessors (MP) and I assume they are working in a strictly round-robin manner.
Please see this config (It is applied to the TARGET ENDPOINT of the ApiProxy):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<SpikeArrest async="false" continueOnError="false" enabled="true" name="spikearrest-1">
  <DisplayName>SpikeArrest-1</DisplayName>
  <FaultRules/>
  <Properties/>
  <Identifier ref="request.header.some-header-name"/>
  <MessageWeight ref="request.header.weight"/>
  <Rate>3pm</Rate>
</SpikeArrest>
I have a rate of 3pm, which means 1 hit every 20 seconds, calculated according to ApigeeDoc1.
The problem is that instead of 1 successful hit every 20 seconds I get 6 successful ones within the same 20-second span, followed by the SpikeArrest error, meaning each MP was hit once in round-robin fashion.
This means I get 6 hits per 20 seconds to my API backend instead of the desired 1 hit per 20 seconds.
Is there any way to sync the SpikeArrests across the MPs?
ConcurrentRateLimit doesn't seem to help.
SpikeArrest has no ability to be distributed across message processors. It is generally used for stopping large bursts of traffic, not controlling traffic at the levels you are suggesting (3 calls per minute). You generally put it in the Proxy Request Preflow and abort if the traffic is too high.
The closest you can get to 3 per minute using SpikeArrest with your round robin message processors is 1 per minute, which would result in 6 calls per minute. You can only specify SpikeArrests as "n per second" or "n per minute", which does get converted to "1 per 1/n second" or "1 per 1/n minute" as you mentioned above.
Do you really only support one call every 20 seconds on your backend? If you are trying to support one call every 20 seconds per user or app, then I suggest you try to accomplish this using the Quota policy. Quotas can share a counter across all message processors. You could also use quotas with all traffic (instead of per user or per app) by specifying a quota identifier that is a constant. You could allow 3 per minute, but they could all come in at the same time during that minute.
If you are just trying to protect against overtaxing your backend, the ConcurrentRateLimit policy is often used.
The last solution is to implement some custom code.
Update to address further questions:
Restating:
6 message processors handled round robin
want 4 apps to each be allowed 5 calls per second
want the rest of the apps to share 10 calls per second
To get the kind of granularity you are looking for, you'll need to use quotas. Unfortunately you can't set a quota to have a "per second" value on a distributed quota (distributed quota shares the count among message processors rather than having each message processor have its own counter). The best you can do is per minute, which in your case would be 300 calls per minute. Otherwise you can use a non-distributed quota (dividing the quota between the 6 message processors), but the issue you'll have there is that calls that land on some MPs will be rejected while others will be accepted, which can be confusing to your developers.
For distributed quotas you'd set the 300 calls per minute in an API Product (see the docs), and assign that product to your four apps. Then, in your code, if that product is not assigned for the current API call's app, you'd use a quota that is hardcoded to 10 per second (600 per minute) and use a constant identifier rather than the client_id, so that all other traffic uses that quota.
Quotas don't keep you from submitting all your requests nearly simultaneously, and I'm assuming your backend can't handle 1200+ requests all at the same time. You'll need to smooth the traffic using a SpikeArrest policy. You'll want to allow the maximum traffic through the SpikeArrest that your backend can handle. This will help protect against traffic spikes, but you'll probably get some traffic rejected that would normally be allowed by the Quota. The SpikeArrest policy should be checked before the Quota, so that rejected traffic is not counted against the app's quota.
As you can probably see, configuring for situations like yours is more of an art than a science. My suggestion would be to do significant performance/load testing, and tune it until you find the correct values. If you can figure out how to use non-distributed quotas to get acceptable performance and predictability, that will let you work with per second numbers instead of per minute numbers, which will probably make massive spikes less likely.
Good luck!
Unlike Quota limits, Spike Arrest cannot be synchronized across MPs.
But since you're setting the limit at a per-minute level, you could use the Quota policy instead -- set it to Distributed and Synchronized and it will coordinate across MPs.
Keep in mind there will always be some latency on the synchronization across machines so it will never be a completely precise number.

Use Redis to track concurrent outbound HTTP requests

I'm a little new to Redis, but I'd like to see if it can be used to keep track of how many concurrent HTTP connections I'm making.
Here's the high level plan:
INCR requests
// request begins
HTTP.get(...)
// request ends
DECR requests
Then at any point, just call GET requests to see how many are currently open.
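For concreteness, a minimal sketch of that plan in Python with redis-py (the key name, URL, and timeout are placeholders):

import redis
import requests

r = redis.Redis(host="localhost", port=6379)
KEY = "concurrent_requests"   # illustrative key name

def tracked_get(url):
    r.incr(KEY)                            # request begins
    try:
        return requests.get(url, timeout=10)
    finally:
        r.decr(KEY)                        # request ends, even if it raised

open_now = int(r.get(KEY) or 0)            # GET: how many are in flight right now

The finally block keeps the counter in sync when the request fails in-process; it does not cover the crash scenario raised in the answers below.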
The ultimate goal here is to throttle my http requests to stay below some arbitrary amount, say 50 requests/s.
Is this the right way to do it? Are there any pitfalls?
As for pitfalls, the only one I can see is that a server that goes down or loses connection to Redis mid-request may never call DECR.
Since you don't know which server handled which request, you can never reset the count to the correct value without bringing the system to a halt and resetting it to 0.
I'm not clear on what you'd gain by using Redis in this situation. It seems to me it would be more suitable to just use a global variable in your server. If your server goes down, so does your counter, so you don't have to put complicated things in place to deal with disconnections, inconsistencies, etc.

How to scale a WCF service in such a scenario

I have an app which tracks vehicles. Vehicles can change location, appear, or disappear at any time. In order to stay up to date, every 3 seconds the app sends the server the region that is currently visible on the map, and the server responds with a list of vehicles that are in that area.
Problem: What happens when I have a database of 1000 vehicles and 10000 requests being sent to the server every 3 seconds? How would you solve this scalability issue with WCF?
There are a couple of things to do.
On the client side
As Joachim said, try to limit requests from the client side. I am not sure that a vehicle will move significantly every 3 seconds. You could also try to combine positions and other information into a batch.
On the server side
Problem: What happens when I have a database of 1000 vehicles and 10000 requests being sent to the server every 3 seconds? How would you solve this scalability issue with WCF?
The best way to answer this question is to do a load test. The results depend heavily on your service implementation. If your request takes more than 1 second, you will certainly have performance problems.
You can also add a queue behind your service to handle requests, and even deploy your service on multiple servers in order to distribute requests between them.