API Gateway High Availability - api

If the API gateway (the single entry point to the system) fails, none of the services can be accessed. Is there an HA (High Availability) design to handle API gateway failure?

1) Depending on your project location, you can choose a second region as your disaster recovery plan. Whenever something fails in one region, you can immediately switch to the other region just by changing the endpoint.
2) You can use a service like Route 53 to divide your traffic between two regions or two API gateways. That way at least part of your traffic keeps flowing even if one API gateway fails.
3) Always keep CloudWatch alarms in place so you are notified about any failures in your system.
4) It is very unlikely that an API gateway will fail. It is AWS, my friend.

"node_saini" has a great response and it's correct. I tried to comment but don't have the reputation to do so yet... the comment would say:
5) Set your timeouts to fail fast, based on measured baselines, and implement retries with exponential backoff on 5xx errors to absorb the small percentage of failures that may occur.
With all applications, temporary failures are expected, but permanent failures after retry can be a sign of a real problem brewing.
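As a rough illustration of point 5, here is a minimal sketch of retrying on 5xx errors with exponential backoff and jitter. The names `ServerError` and `call_with_retries`, and the default delays, are made up for this example; real values should come from your own baselines.

```python
import random
import time


class ServerError(Exception):
    """Raised by the caller-supplied function to signal an HTTP 5xx response."""

    def __init__(self, status):
        super().__init__(f"server error {status}")
        self.status = status


def call_with_retries(send, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call send() and retry on ServerError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except ServerError:
            if attempt == max_attempts:
                raise  # still failing after all retries: surface the error
            # The delay ceiling doubles on each attempt, capped at max_delay;
            # the actual sleep is a random value below that ceiling (jitter).
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

An error that survives all attempts is re-raised, which matches the idea that a failure persisting after retries deserves attention.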

Related

Divert traffic to two versions based on error response on the traffic with istio

I am trying to learn Istio. I was able to set up simple traffic shifting in which 40 percent of the traffic goes to a particular version and the remaining 60 percent to the other version. My question is whether I can make this weight (40-60) dynamic, based on:
The percentage of error responses from both versions. The version with fewer error responses receives more traffic, and eventually 100 percent.
Or at least have it change with time, for example a 2 percent shift every hour.
Also, would this require me to run kubectl apply again and again?
For the first part, Istio has no feature that makes a VirtualService route based on error rates. The routing is based on the number of requests coming through the VirtualService.
Secondly, Kubernetes objects are persistent entities, and the Istio CRDs that control weight-based routing have their settings defined in the spec of the object (the state of which must be provided by the user), so this configuration is not expected to change dynamically.
For your scenario, deploying a new version without knowing whether it will error more than the previous one, and expecting the versions to error enough to decide which one should persist, may not be the best approach.
I'd recommend using traffic mirroring to test production traffic against the new version and, from that, determining whether it is worth deploying it using any existing/supported deployment strategy.
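Since the weights live in the object's spec, a time-based shift like the one in the question would have to be computed outside Istio and re-applied by something you run yourself. A minimal sketch of only the computation, assuming the 40 percent starting weight and the 2 percent hourly step from the question (the function name `weights_at` is hypothetical):

```python
def weights_at(hours_elapsed, start_new=40, step_per_hour=2):
    """Return (new_version_weight, old_version_weight) as integer percentages."""
    # Shift step_per_hour points per full hour, never exceeding 100 percent.
    new = min(100, start_new + step_per_hour * int(hours_elapsed))
    return new, 100 - new
```

The result would still need to be written into the VirtualService by an external job; nothing here talks to the cluster.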

ISO-8583 message processing(defining priority of messages)

I need to understand the ISO-8583 message platform. Let's say I want to perform an authorization of a card transaction, and at a particular instant I receive 100000 requests from the network (VISA/MASTERCARD), all for authorization. How do I define the priority of these requests and their responses? Can the connection pool handle it (in my case it is HIKARI)? How do banks/financial institutions do this when authorizing a request? Please provide some insights on how to manage all these requests. Should I go for an MQ?
Tech used: spring boot, hibernate, spring-tcp-starter
Your question doesn't seem to be very well researched, as there are a ton of switch platforms out there that do this today, and many of their technology guides can be found on the web, including for major vendors like ACI, FIS, AJB, etc., if you look hard enough.
I have worked with several ISO interface specifications, commercial switches, and home-grown platforms, and they are actually pretty consistent in how they do the core realtime processing.
Information on prioritization is generally in each ISO-8583 message processing specification, and is made explicitly clear in almost every specification I've ever read that was written by someone who is familiar with ISO-8583 and not just making up their own variant or copying someone else's.
That said, in general, at a high level, authorization / financial requests (0100, 0200) always have higher priority than force post (0x20) messages.
Administrative messages in the 05xx, 06xx and 08xx ranges sometimes also get bumped up above other advices, but these are still advices, and auths/financials are almost always processed first because they A) impact the customer and B) have much tighter timers than any advice, usually by a factor of two or more.
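The ordering described above can be sketched as an in-memory priority queue keyed on the message type indicator (MTI). The three tiers and the names `priority_for` and `MessageQueue` are illustrative assumptions, not taken from any particular specification.

```python
import heapq
import itertools


def priority_for(mti):
    """Lower number = processed first. The tiers are illustrative only."""
    if mti in ("0100", "0200"):
        return 0  # authorization / financial requests
    if mti[:2] in ("05", "06", "08"):
        return 1  # administrative messages (05xx, 06xx, 08xx)
    return 2  # other advices such as 0120 / 0220, and anything else


class MessageQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a tier

    def put(self, mti, payload):
        heapq.heappush(self._heap, (priority_for(mti), next(self._seq), mti, payload))

    def get(self):
        _, _, mti, payload = heapq.heappop(self._heap)
        return mti, payload
```

Messages of the same tier come out in arrival order, so an advice never overtakes an earlier advice.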
Most switches I have seen manage these entirely in memory for the core authorization process, without going to MQ or some other disk-backed store, though that is not to say there is never some sort of home-grown middleware involved. Non-realtime processes regularly use an MQ or disk queuing to move this work into processes that are not in line with the approval, for Store-and-forward (SAF) processing, but many of these still use memory-only processing for the front of their queue.
It is also important to differentiate between 100000 requests and 100000 transactions. The various exchanges, both internal and external, make a big difference in the number of actual requests/responses in flight at any given time. A basic transaction can be accomplished in about two messages, but some of the more complex ones can easily exceed 20 messages just for a pre-authorization or a completion component.
If you are dealing with largely batch transaction bursts, I can see the challenge of queuing, but almost every application I have seen has a max in flight for advices and for requests, separate from each other, and sometimes even with different timers. The apps pumping the transactions almost always wait for the response before sending more, and this tends to work fine for just about everyone, including big posting batches from retailers and card networks. So if your app doesn't have these limits, you probably need to add them.
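A "max in flight" limit kept separately for requests and advices might look roughly like the sketch below. The class name `InFlightWindow` and the two message classes are assumptions made for the example; timers are left out.

```python
class InFlightWindow:
    """Tracks outstanding messages per class and refuses new sends over the cap."""

    def __init__(self, max_requests, max_advices):
        self._limits = {"request": max_requests, "advice": max_advices}
        self._in_flight = {"request": 0, "advice": 0}

    def try_send(self, kind):
        if self._in_flight[kind] >= self._limits[kind]:
            return False  # caller should wait for a response before sending more
        self._in_flight[kind] += 1
        return True

    def on_response(self, kind):
        # A response (or a timeout) frees one slot for that message class.
        if self._in_flight[kind] > 0:
            self._in_flight[kind] -= 1
```

The sender checks `try_send` before pumping the next message and calls `on_response` when a reply or timeout arrives.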
In fact, your 100000 requests should be sorted by (Terminal ID and/or Merchant ID) + (timestamp/local timestamp) + (STAN and/or RRN).
Duplicated transaction requests are expected to be rejected.
If you are simulating multiple requests from a single terminal (or host) with the same test card details, incrementing the STAN/RRN would be the way to do it.
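A minimal sketch of the sorting and duplicate rejection described above, assuming each message is a dict with hypothetical keys `terminal_id`, `timestamp` and `stan` (a real system might key on Merchant ID or RRN instead):

```python
def order_and_dedupe(messages):
    """Sort by (terminal_id, timestamp, stan) and split out repeated keys."""
    def key(msg):
        return (msg["terminal_id"], msg["timestamp"], msg["stan"])

    seen = set()
    accepted, rejected = [], []
    for msg in sorted(messages, key=key):
        if key(msg) in seen:
            rejected.append(msg)  # same terminal, time and STAN: treat as duplicate
        else:
            seen.add(key(msg))
            accepted.append(msg)
    return accepted, rejected
```

With this key, two test requests from one terminal in the same second are accepted only if their STAN differs.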
Please refer to previous answers about STAN and RRN ISO 8583 fields.
In ISO message, what's the use of stan and rrn ?

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate in heavy resource usage (CPU mainly) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (and not ms like traditional "web-like" applications). This means we serve relatively few requests per minute where each one of them take a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations above, it's easy to see this is a problem, since it's almost trivial for a tenant to make more requests than we can handle, causing a degradation in the quality of service (even for other tenants, because of the shared resources approach).
We're thinking of strategies to limit/throttle or in general prepare the system to "isolate" tenants, so one tenant can not degrade the performance for others by making more requests than we can handle:
1) Rate limiting: Define a maximum number of requests per minute that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time for a tenant, that's fine.
2) Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
3) Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this, and by the time these new machines are ready it's possibly too late.
4) Request "throttling": Using algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate with the tenant (enforced by contract) would be passed along to the services without delay. Since we know in advance how many requests per minute each tenant is gonna be making (estimated at least) we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or similar algorithm). This would ensure that a tenant can not impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic" in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
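To make the queue-and-release idea concrete, here is a minimal single-process sketch of a leaky bucket used as a traffic shaper. The class `LeakyBucket` and its parameters are hypothetical, time is passed in explicitly to keep the example deterministic, and the shared state across gateway instances (for example in Redis) is not covered.

```python
from collections import deque


class LeakyBucket:
    """Queues excess requests and releases them at a fixed rate."""

    def __init__(self, rate_per_sec, max_queue):
        self.interval = 1.0 / rate_per_sec
        self.max_queue = max_queue
        self.queue = deque()
        self.next_release = 0.0  # earliest time the next request may pass

    def offer(self, request):
        """Enqueue a request; return False if the queue limit is reached."""
        if len(self.queue) >= self.max_queue:
            return False
        self.queue.append(request)
        return True

    def release(self, now):
        """Return the requests allowed to pass to the services at time `now`."""
        released = []
        while self.queue and now >= self.next_release:
            released.append(self.queue.popleft())
            # Schedule the next slot; idle time is not saved up as burst credit.
            self.next_release = max(self.next_release, now) + self.interval
        return released
```

A per-tenant instance of this would keep one tenant's burst from reaching the services faster than the agreed rate; the "dynamic" variant would adjust `interval` over time.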
My questions are:
Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
Is there an open-source, commercial or SaaS service that can provide these traffic shaping capabilities? As far as I know, Kong and Tyk do not support anything like this, so is there any other API gateway that does?
In case Kong does not support this, how hard is it to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example), as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed a tricky thing, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed system cases, and in addition it is hard to make it "safe", because "slow" clients quickly consume machine resources.
Such a pattern is usually offloaded to client libraries, so when a client hits a rate limit status code, it uses something like an exponential backoff technique to retry requests. That is way easier to scale and implement.
I can't speak for Kong, but Tyk, in this case, provides two basic numbers you can control: quota, the maximum number of requests a client can make in a given period of time, and rate limits, which act as safety protection. You can set a rate limit 1) per "policy", e.g. for a group of consumers (for example, if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, 3) globally for an API (works together with key rate limits). So, for example, you can set some moderate client rate limits and cap the total with the global API setting.
If you want a fully dynamic scheme that re-calculates limits based on cluster load, it should be possible. You will need to write and run this scheduler somewhere; from time to time it will perform the re-calculation based on current total usage (which Tyk calculates for you, and which you can get from Redis) and will talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
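The re-calculation step of such a scheduler could look something like this sketch. It deliberately leaves out reading usage from Redis and calling the gateway API, since those details depend on your setup; the function name `recalculate_limits` and the `lend_fraction` safety margin are assumptions made for illustration.

```python
def recalculate_limits(contracted, usage, lend_fraction=0.5):
    """Return new per-tenant limits that lend idle capacity to busy tenants.

    contracted: tenant -> agreed requests per minute
    usage:      tenant -> requests per minute observed in the last window
    """
    # Capacity that tenants below their contracted rate are not using.
    idle = sum(max(0, contracted[t] - usage.get(t, 0)) for t in contracted)
    # Tenants at or above their contracted rate can borrow from the idle pool.
    busy = [t for t in contracted if usage.get(t, 0) >= contracted[t]]
    limits = dict(contracted)
    if busy:
        # Only lend a fraction of the idle pool, split evenly, as a safety margin.
        bonus = int(idle * lend_fraction) // len(busy)
        for t in busy:
            limits[t] += bonus
    return limits
```

Tenants under their contracted rate keep their full contracted limit, so a tenant that wakes up is never throttled below what was agreed.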
Hope it makes sense :)

ZMQ device queue does not load balance properly

I know that ZMQ offers all of the flexibility to do your own load-balancing. However I would expect the out-of-the-box broker, about 4 lines of code using the line
zmq_device (ZMQ_QUEUE, frontend, backend);
to load balance quite well as the documentation says it does load balance.
ZMQ_QUEUE creates a shared queue that collects requests from a set of clients, and distributes these fairly among a set of services. Requests are fair-queued from frontend connections and load-balanced between backend connections. Replies automatically return to the client that made the original request.
I have an army of back-end services and yet find that often my front-end clients have to wait several seconds for something that takes < 1/10 of a second in a 1:1 setting (there are same # of client and service machines). I suspect that ZMQ is not load-balancing properly out of the box - it's sending too many requests to the same service even though it doesn't have bandwidth, etc.
I think this is partly because the services are multithreaded in a way that lets them take up to 10 concurrent requests yet it slows down greatly at near the 10th request even though it can still accept them. Random distribution would be ideal. Is there an out-of-the-box way to do this or can it be done in a few lines of code, or do I have to write my own broker from scratch?
FWIW, the issue was that the workers were taking on work when they didn't have room for it; the issue was not in the ZMQ layer per se.
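One way to picture the fix is a dispatch rule that only considers workers with room left, choosing randomly among them as the question suggests. This is a plain-Python sketch of that idea, not what ZMQ_QUEUE does; `pick_worker` and `soft_limit` are hypothetical names.

```python
import random


def pick_worker(active, soft_limit, rng=random):
    """Pick a random worker among those below soft_limit; None if all are full.

    active: worker name -> number of requests it is currently processing
    """
    candidates = [w for w, n in active.items() if n < soft_limit]
    return rng.choice(candidates) if candidates else None
```

Setting `soft_limit` below the point where a worker starts to slow down (somewhere under the 10 concurrent requests mentioned above) keeps nearly saturated workers out of the rotation.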

Is a status method necessary for an API?

I am building an API and I was wondering: is it worth having a method in the API that returns the status of the API, i.e. whether it's alive or not?
Or is this pointless, and it's the API user's job to just make a call to the method they need and, if it doesn't return anything due to network issues, handle it as needed?
I think it's quite useful to have a status returned. On the one hand, you can provide more statuses than just 'alive' or not and make your API more powerful, and on the other hand, it's more useful for the user, since you can tell them exactly what's going on (e.g. 'maintenance').
But if your web service isn't available at all due to network issues, then, of course, it's up to the user to catch that exception. But that's not the point, I guess, and it's not something you could control with your API.
It's useless.
The information it returns is completely out of date the moment it is returned to you because the service may fail right after the status return call is dispatched.
Also, if you are load balancing the incoming requests and your status request gets routed to a failing node, the reply (or lack thereof) would look to the client like a problem with the whole API service. In the meantime, all the other nodes could be happily servicing requests. Now your client will think that the whole API service is down but subsequent requests would work just fine (assuming your load balancer would remove the failed node or restart it).
HTTP status codes returned from your application's requests are the correct way of indicating availability. Your clients of course have to be coded to tolerate and handle them.
What is wrong with standard HTTP response status codes? 503 Service Unavailable comes to mind. HTTP clients should already be able to handle that without writing any code special to your API.
Now, if the service is likely to be unavailable frequently and it is expensive for the client to discover that but cheap for the server, then it might be appropriate to have a separate 'health check' URL that can quickly let the client know that the service is available (at the time of the GET on the health check URL).
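A cheap health check URL of that kind could be sketched with the Python standard library as follows. The `/health` path, the `state` attribute and the JSON shape are assumptions for the example; unavailability is still reported with the standard 503 status.

```python
import json
from http.server import BaseHTTPRequestHandler


def health_response(state):
    """Map a service state to (HTTP status code, JSON body)."""
    code = 200 if state == "ok" else 503
    return code, json.dumps({"status": state})


class HealthHandler(BaseHTTPRequestHandler):
    state = "ok"  # hypothetical: a real service would compute this cheaply

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_response(self.state)
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()`.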
It is not necessary most of the time, at least when it returns a simple true or false. It just makes client code more complicated because it has to call one more method. Even if your client received active=true from the service, the next useful call may still fail. Let your clients make the calls they need during normal execution and have them handle network, timeout and HTTP errors correctly. A very useful pattern for such cases is called Circuit Breaker.
Cases where a status check may be useful:
If all the normal calls are considered expensive, there may be an advantage in first calling a lightweight status-check method (just to avoid the expensive call).
The service can have different statuses, and the client can change its behavior depending on these statuses.
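For readers unfamiliar with the Circuit Breaker pattern mentioned in this answer, here is a minimal single-threaded sketch. The class names, the threshold and the timeout are illustrative choices, and production libraries handle more cases (concurrency, per-error policies, metrics).

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, call skipped")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The client wraps its normal calls in `breaker.call(...)`, so after repeated failures it stops hitting the service for a while instead of polling a separate status method.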
It might also be worth looking into stateful protocols like XMPP.