Dynamically divert traffic between two versions with Istio based on error responses

I am trying to learn Istio. I was able to set up simple traffic shifting in which 40 percent of the traffic goes to one version and the remaining 60 percent to the other. My question is: can I make this weight (40-60) dynamic, based on the percentage of error responses from each version? The version with fewer error responses would receive more traffic, eventually reaching 100 percent.
Or at least, could the weights change over time, for example a 2 percent shift every hour?
Also, would this require me to run kubectl apply again and again?

For the first part, Istio has no feature that lets a VirtualService route based on error rates. The routing only splits the requests coming through the VirtualService according to the configured weights.
Secondly, Kubernetes objects are persistent entities, and the Istio CRDs controlling weight-based routing have their settings defined in the spec of the object (whose desired state must be provided by the user), so this configuration is not expected to change dynamically on its own.
For your scenario, I would say that deploying a new version without knowing whether it will error more than the previous one, and expecting the versions to error enough to decide which one should persist, may not be the best approach.
I'd recommend using traffic mirroring to test production traffic against the new version and, from that, determining whether it is worth deploying it using any existing/supported deployment strategy.
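That said, the time-based variant is simply a matter of re-applying the weights from an external process (this is essentially what progressive-delivery tools like Flagger automate on top of Istio, including error-rate-driven analysis). A minimal sketch, assuming a VirtualService named reviews in the default namespace with two hypothetical subsets v1 and v2, that shifts 2 percent every hour by patching the object:

    # Hypothetical sketch: shift 2 percent of traffic per hour from v1 to v2 by
    # patching an Istio VirtualService. The names ("reviews", "default", the
    # v1/v2 subsets) are placeholders, not anything Istio mandates.
    import json
    import subprocess
    import time

    VS_NAME = "reviews"       # assumed VirtualService name
    HOST = "reviews"          # assumed destination host (Kubernetes service)
    NAMESPACE = "default"     # assumed namespace
    STEP = 2                  # percent to shift per interval
    INTERVAL_S = 3600         # one hour

    def apply_weights(v1_weight: int, v2_weight: int) -> None:
        """Merge-patch the VirtualService's HTTP route weights."""
        patch = {
            "spec": {
                "http": [{
                    "route": [
                        {"destination": {"host": HOST, "subset": "v1"},
                         "weight": v1_weight},
                        {"destination": {"host": HOST, "subset": "v2"},
                         "weight": v2_weight},
                    ]
                }]
            }
        }
        subprocess.run(
            ["kubectl", "-n", NAMESPACE, "patch", "virtualservice", VS_NAME,
             "--type", "merge", "-p", json.dumps(patch)],
            check=True,
        )

    v2 = 40                   # current weight of the newer version
    while v2 < 100:
        time.sleep(INTERVAL_S)            # wait an hour, then shift
        v2 = min(v2 + STEP, 100)
        apply_weights(100 - v2, v2)

This also answers the last question: something, whether a script like this or a controller, has to keep re-applying the configuration, because Istio itself will not move the weights for you.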


Drone management database design

Overview
I’m currently building a prototype to track and control a fleet of drones.
The prototype consists of a service and a web app. In the web app, the location of each drone is displayed in real-time on a map and the user can issue basic commands to each of these drones.
The service is automated and can also issue commands to each of the drones at random times when certain conditions occur.
I am using HiveMQ (an MQTT broker) to facilitate communication between drones, the web app and the service. The web app and the service are both subscribed to the 'telemetry' topic to receive real-time data about the network of drones. The broker will store the telemetry data for each drone directly into a database through the use of HiveMQ's extension functionality.
Specific commands can only be executed if certain criteria are met.
For example: To issue an 'execute mission' command to a drone, the service or the web app will make a call to an API. The API will:
Check the drone is not currently on a mission (drone status value must be idle)
Check weather conditions are acceptable in the area the mission is to occur
(Note: by 'mission' I mean a drone flies to a series of set locations autonomously.)
If conditions aren't met a response indicating this will be returned to the requester (web app or service). If conditions are met the API will issue the command to the appropriate drone via the MQTT broker and send a response to the requester.
Requirements
I need a storage mechanism that meets the following criteria:
I need to ensure that a race condition does not occur between the web app and the service. That is, if the web app is in the middle of issuing a command to a drone, a request made by the service during that time should be automatically rejected.
Drone status is not kept in sync between the service and the web app; as a result, they need a single synchronized point at which to check a drone's status.
Drones will update their status every second, and API calls to issue commands will be made every 10-30 seconds. There will be 5 drones in this prototype, but I would like a solution that can scale to 50 drones.
Considered Solution
My solution would be a relational database, using a separate table with a 'request_lock' field that is protected by a row-level lock.
When an API call is made, it checks this field: if it is true, the request is rejected. If it is false, the API sets the field to true, performs the necessary condition checks, and then sets the 'request_lock' field back to false once the command has reached the drone.
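A minimal sketch of that locking idea, assuming PostgreSQL, psycopg2, and a hypothetical drones table with (id, status, request_lock) columns; SELECT ... FOR UPDATE takes the row-level lock, so a competing request for the same drone waits for (or, with NOWAIT, is rejected because of) the in-flight transaction:

    # Hypothetical sketch of the considered solution, assuming PostgreSQL,
    # psycopg2, and a "drones" table with (id, status, request_lock) columns.
    import psycopg2

    def try_issue_command(conn, drone_id: int) -> bool:
        """Return True if this caller won the lock and may issue the command."""
        with conn:                    # commits on success, rolls back on error
            with conn.cursor() as cur:
                # Row-level lock: a concurrent transaction on the same drone row
                # blocks here until we commit (add NOWAIT to reject it instead).
                cur.execute(
                    "SELECT request_lock, status FROM drones"
                    " WHERE id = %s FOR UPDATE",
                    (drone_id,),
                )
                row = cur.fetchone()
                if row is None or row[0] or row[1] != "idle":
                    return False      # already locked, or the drone is not idle
                cur.execute(
                    "UPDATE drones SET request_lock = TRUE WHERE id = %s",
                    (drone_id,),
                )
        return True                   # caller now checks the weather, publishes
                                      # the MQTT command, then clears the lock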
I am concerned the status update frequency from each drone does not fit a relational database model and won't scale well. Am I on the right track, or should I be looking to include a NoSQL database in some way to handle status updates?
Thank you to anyone who takes the time to answer.
There are a lot of questions here, so I'll try to pick what seems to be most important:
I am concerned the status update frequency from each drone does not fit a relational database model ..
Should I use a relational or non-relational database?
First, let's calculate the maximum number of drone status updates, per second.
Drones will update their status every second, and API calls to issue commands will be made every 10-30 seconds. There will be 5 drones in this prototype, but I would like a solution that can scale to 50 drones.
50 drones * 1 drone-update per second = 50 drone-updates per second
50 drones * (1 / 10) drone-commands per second = 5 drone-commands per second
So, can a relational database handle ~55 queries per second?
Yes. Assuming reasonable query complexity, this is within the ability of a traditional relational database. I would not expect the database to need extraordinary system resources, either.
If you'd like to confirm this level of performance with a benchmark, I'd recommend a tool like pgbench.
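If you prefer scripting a quick sanity check before reaching for pgbench, here is a rough sketch (assuming the same hypothetical drones table and a local PostgreSQL instance) that times 1,000 status updates:

    # Rough throughput sanity check (not a substitute for pgbench): times 1,000
    # status UPDATEs against the hypothetical "drones" table.
    import time
    import psycopg2

    conn = psycopg2.connect("dbname=dronedb")   # assumed connection string
    conn.autocommit = True                      # commit each update, like real traffic
    N = 1000

    start = time.monotonic()
    with conn.cursor() as cur:
        for i in range(N):
            cur.execute(
                "UPDATE drones SET status = %s WHERE id = %s",
                ("idle", (i % 50) + 1),         # cycle through 50 drone rows
            )
    elapsed = time.monotonic() - start
    print(f"{N / elapsed:.0f} updates/second")

Even with a commit per statement, this should land comfortably above the ~55 queries per second estimated above on any reasonable hardware.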

API Gateway High Availability

If the API gateway (the single entry point to the system) fails, then none of the services can be accessed. Is there any HA (High Availability) design to handle API gateway failure?
1) Depending on your project's location, you can choose one more region as your disaster recovery plan. Whenever something fails in one region, you can immediately switch to the other region by just changing the endpoint.
2) You can use services like Route 53 to divide your traffic between two regions or two API gateways. That way, at least part of your traffic will keep flowing even if one API gateway fails.
3) Always keep CloudWatch alarms in place to get notified about any failures in your system.
4) It is very unlikely that an API gateway will fail. It is AWS, my friend.
"node_saini" has a great response and it's correct. I tried to comment but don't have the reputation to do so yet... the comment would say:
5) Configure your timeout to fail ASAP based on baselines and implement retries with exponential backoff on 5xx errors to alleviate any small percentage of failures which may occur.
With all applications, temporary failures are expected but permanent failures after retry can be a sign of a real problem brewing.
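A minimal client-side sketch of point 5, assuming Python's requests library and a placeholder URL; the timeout, retry count, and backoff numbers are illustrative, not recommendations:

    # Retry with exponential backoff (plus jitter) on 5xx responses and timeouts.
    # The URL, timeout, and retry counts are placeholder values.
    import random
    import time
    import requests

    def call_gateway(url: str, max_retries: int = 4) -> requests.Response:
        for attempt in range(max_retries + 1):
            try:
                resp = requests.get(url, timeout=2)   # fail fast, per your baselines
                if resp.status_code < 500:
                    return resp                       # success (or a client error)
            except requests.exceptions.RequestException:
                pass                                  # timeout / connection error
            if attempt == max_retries:
                break
            # 0.5s, 1s, 2s, 4s ... plus jitter so clients don't retry in lockstep
            time.sleep(0.5 * (2 ** attempt) + random.uniform(0, 0.1))
        raise RuntimeError(f"gateway still failing after {max_retries} retries")

A call like call_gateway("https://api.example.com/health") (hypothetical URL) then distinguishes a temporary blip, which succeeds after a retry or two, from a permanent failure, which raises.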

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate into heavy resource usage (mainly CPU) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (not milliseconds like traditional "web-like" applications). This means we serve relatively few requests per minute, where each one of them takes a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations above, it's easy to see this is a problem, since it's almost trivial for a tenant to make more requests than we can handle, causing a degradation in the quality of service (even for other tenants, because of the shared-resources approach).
We're thinking of strategies to limit/throttle, or in general prepare the system to "isolate" tenants, so that one tenant cannot degrade the performance for others by making more requests than we can handle:
Rate limiting: Define a maximum requests/minute that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time for a tenant, that's fine.
Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this and by the time these new machines are ready it's possibly too late.
Request "throttling": Using algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate for the tenant (enforced by contract) would be passed along to the services without delay. Since we know in advance how many requests per minute each tenant is going to make (estimated, at least), we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or a similar algorithm). This would ensure that a tenant cannot impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic", in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
My questions are:
Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
Is there an open-source, commercial, or SaaS service that can provide these traffic-shaping capabilities? As far as I know, neither Kong nor Tyk supports anything like this, so... is there any other API gateway that does?
In case Kong does not support this, how hard would it be to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example), as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed a tricky thing, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed-system cases, and in addition, it is hard to make it "safe", because "slow" clients quickly consume machine resources.
Such a pattern is usually offloaded to client libraries, so when a client hits a rate-limit status code, it uses something like an exponential backoff technique to retry requests. That is way easier to scale and implement.
I can't speak for Kong, but Tyk, in this case, provides two basic numbers you can control: the quota - the maximum number of requests a client can make in a given period of time - and rate limits - a safety protection. You can set a rate limit 1) per "policy", e.g. for a group of consumers (for example, if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, or 3) globally for the API (this works together with the key rate limits). So, for example, you can set some moderate client rate limits and cap the total with the global API setting.
If you want a fully dynamic scheme that re-calculates limits based on cluster load, it should be possible. You will need to write and run this scheduler somewhere; from time to time it will perform a re-calculation based on the current total usage (which Tyk calculates for you, and which you can read from Redis) and will talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
Hope it makes sense :)
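For what it's worth, the shaping arithmetic itself is small; the hard parts are the shared state and the queueing. A minimal sketch of a per-tenant leaky bucket with its state kept in Redis (so multiple gateway instances agree), with hypothetical key names, rates, and delay cap:

    # Sketch of a per-tenant leaky bucket with shared state in Redis, so every
    # gateway instance sees the same counters. Key names, rates, and the delay
    # cap are placeholders.
    import time
    import redis

    r = redis.Redis(host="localhost", port=6379)    # assumed shared Redis

    # Atomically advance the tenant's "theoretical arrival time" by one emission
    # interval; the reply is how long this request must be held back (in ms),
    # or -1 if the queue cap is exceeded and the request should be rejected.
    LEAKY_BUCKET = r.register_script("""
    local key = KEYS[1]
    local now = tonumber(ARGV[1])        -- current time, ms
    local interval = tonumber(ARGV[2])   -- ms between released requests
    local max_delay = tonumber(ARGV[3])  -- queue cap, ms
    local tat = tonumber(redis.call('GET', key) or now)
    if tat < now then tat = now end
    local wait = tat - now
    if wait > max_delay then
        return -1
    end
    redis.call('SET', key, tat + interval, 'EX', 600)
    return wait
    """)

    def shape(tenant_id: str, rate_per_minute: int = 60, max_delay_s: int = 30) -> bool:
        """Hold the request so tenant traffic leaves at a fixed rate; False = drop."""
        wait_ms = LEAKY_BUCKET(
            keys=[f"lb:{tenant_id}"],
            args=[int(time.time() * 1000), 60000 // rate_per_minute, max_delay_s * 1000],
        )
        if wait_ms < 0:
            return False                 # over the agreed rate and the queue cap
        time.sleep(wait_ms / 1000.0)     # release at the agreed rate
        return True

In a real Kong plugin this would live in Lua/OpenResty with ngx timers instead of a blocking sleep, and the "dynamic rate" variant would periodically rewrite each tenant's interval based on observed load, but the bucket arithmetic stays the same.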

ZMQ device queue does not load balance properly

I know that ZMQ offers all of the flexibility to do your own load balancing. However, I would expect the out-of-the-box broker, about 4 lines of code using the line
zmq_device (ZMQ_QUEUE, frontend, backend);
to load balance quite well, as the documentation says it does:
ZMQ_QUEUE creates a shared queue that collects requests from a set of clients, and distributes these fairly among a set of services. Requests are fair-queued from frontend connections and load-balanced between backend connections. Replies automatically return to the client that made the original request.
I have an army of back-end services and yet find that my front-end clients often have to wait several seconds for something that takes < 1/10 of a second in a 1:1 setting (there are the same number of client and service machines). I suspect that ZMQ is not load balancing properly out of the box: it's sending too many requests to the same service even though that service has no spare capacity.
I think this is partly because the services are multithreaded in a way that lets them take up to 10 concurrent requests, yet they slow down greatly near the 10th request even though they can still accept more. Random distribution would be ideal. Is there an out-of-the-box way to do this, can it be done in a few lines of code, or do I have to write my own broker from scratch?
FWIW, the issue turned out to be that the workers were taking on work when they didn't have room for it; the issue was not in the ZMQ layer per se.
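For anyone hitting the same thing: the usual fix is the load-balancing ("least recently used") broker pattern from the ZeroMQ guide, where requests are only dispatched to workers that have explicitly signalled they are ready. A stripped-down pyzmq sketch of that broker loop, using the guide's framing conventions rather than anything specific to this setup:

    # Stripped-down load-balancing broker in pyzmq (the "LRU" pattern from the
    # ZeroMQ guide): a worker is only handed a request after it has announced it
    # is ready, so a saturated worker never gets flooded. Endpoints are placeholders.
    import zmq

    context = zmq.Context()
    frontend = context.socket(zmq.ROUTER)   # clients (REQ sockets) connect here
    backend = context.socket(zmq.ROUTER)    # workers (REQ sockets) connect here
    frontend.bind("tcp://*:5555")
    backend.bind("tcp://*:5556")

    ready_workers = []                      # identities of workers with spare capacity

    while True:
        poller = zmq.Poller()
        poller.register(backend, zmq.POLLIN)
        if ready_workers:                   # only pull client work if someone is free
            poller.register(frontend, zmq.POLLIN)
        sockets = dict(poller.poll())

        if backend in sockets:
            # Worker frames: [worker_id, b"", b"READY"] or [worker_id, b"", client_id, b"", reply]
            frames = backend.recv_multipart()
            worker_id, payload = frames[0], frames[2:]
            ready_workers.append(worker_id)          # this worker is free again
            if payload[0] != b"READY":
                frontend.send_multipart(payload)     # route the reply to its client

        if frontend in sockets:
            # Client frames: [client_id, b"", request]
            frames = frontend.recv_multipart()
            worker_id = ready_workers.pop(0)         # least recently used free worker
            backend.send_multipart([worker_id, b""] + frames)

Because each worker only asks for the next job when it actually has room (by sending READY, or its previous reply), a slow or saturated worker simply stops appearing in ready_workers, which is exactly the behaviour that was missing here.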

NServiceBus Dynamic End Points

Is it possible to create endpoints dynamically at runtime? E.g. send a message to a known endpoint with details of a new endpoint, so that a network node can learn of new nodes on the fly.
NServiceBus does not support this out of the box, but if you really really want it (and you are sure that it is the right way to go), you are free to implement your own message routing and send messages explicitly to an endpoint with bus.Send(endpoint, message).
In a project I am currently involved with, we do this with great success, because it allows us to seamlessly sign services in and out of the system while it is running, resulting in zero downtime during upgrades.
It took a bit of work to get it working though, so I would only recommend this if you are certain that your requirements demand it.