Shared Key Lite authentication scheme validity with Azure Storage service - authentication

I read on the MSDN site [Link] about the Shared Key Lite authentication scheme for Azure Storage service access. It says a Shared Key Lite signature is valid for 15 minutes, which mitigates replay attacks. But my question is: why such a long validity window? Replay attacks can still happen within those 15 minutes, right?

But my question is: why such a long validity window?
Think of these 15 minutes as a buffer to absorb clock skew. It is entirely possible that the clock on the machine where you're creating the authorization header is not in sync with the clocks in Windows Azure, and you obviously don't want to require an exact time match between the two systems for authorization to succeed.
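To make the check concrete, here's a minimal Python sketch of the idea; note this is a simplified string-to-sign, not the exact Shared Key Lite canonicalization (which includes more headers and the real canonicalized resource):

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Hypothetical account key; a real key comes from the storage account.
ACCOUNT_KEY = base64.b64encode(b"demo-account-key")

def sign(string_to_sign: str) -> str:
    """HMAC-SHA256 over the string-to-sign, keyed with the decoded account key."""
    digest = hmac.new(base64.b64decode(ACCOUNT_KEY),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode()

def client_request():
    # The request carries the time it was signed (x-ms-date in the real scheme).
    date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
    return date, sign(f"{date}\n/demoaccount/demotable")  # simplified form

def server_accepts(date: str, signature: str,
                   skew: timedelta = timedelta(minutes=15)) -> bool:
    # 1. The signature must match, so the date header cannot be tampered with.
    expected = sign(f"{date}\n/demoaccount/demotable")
    if not hmac.compare_digest(expected, signature):
        return False
    # 2. The signed time must be within +/- 15 minutes of the server's clock.
    sent = datetime.strptime(date, "%a, %d %b %Y %H:%M:%S GMT") \
                   .replace(tzinfo=timezone.utc)
    return abs(datetime.now(timezone.utc) - sent) <= skew

date, sig = client_request()
print(server_accepts(date, sig))  # True while the clocks agree within 15 minutes
```

With a much tighter window, a client whose clock drifts by a couple of minutes would be locked out entirely; 15 minutes trades a bounded replay window for tolerance of real-world drift.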

Related

Optimization for GetSecret with Azure Key Vault

Our main goal for now is optimising a processing service.
The service has a system-assigned managed identity with access policies that allow it to get a secret.
This service makes 4 calls to a Key Vault. The first one takes a lot longer than the others. I'm scratching my head, because the managed identity token takes only 91µs to obtain. (Application Insights image)
I changed the way the tokens are obtained: the program now obtains one token and keeps reusing it for the other round trips. I did this by registering the credential class with AddScoped.
I assume the question is why the first call takes more time. If so, just throwing in a couple of generic reasons which might contribute (see the sketch after this list):
The HTTPS handshake on the first call might take time
The client might need to create a connection pool
In the case of Azure Key Vault, the first call does two round trips AFAIK: the first for auth, the second for the real payload
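If it helps, here is a minimal sketch of the "create once, reuse everywhere" approach using the Azure SDK for Python (the asker's AddScoped change is the .NET equivalent; the vault URL and secret name below are placeholders):

```python
# pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Create the credential and client once, at application scope.
# Both cache the token and the underlying HTTPS connection, so only the
# first get_secret pays for the TLS handshake and the auth round trip.
credential = DefaultAzureCredential()  # picks up the managed identity when deployed
client = SecretClient(vault_url="https://my-vault.vault.azure.net",  # placeholder
                      credential=credential)

def get_db_password() -> str:
    # Subsequent calls reuse the pooled connection and the cached token.
    return client.get_secret("db-password").value  # placeholder secret name
```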

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate into heavy resource usage (mainly CPU) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (not milliseconds like traditional "web-like" applications). This means we serve relatively few requests per minute, where each one of them takes a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations above, it's easy to see this is a problem, since it's almost trivial for a tenant to make more requests than we can handle, causing a degradation of the quality of service (even for other tenants, because of the shared-resources approach).
We're thinking of strategies to limit/throttle, or in general prepare the system to "isolate" tenants, so that one tenant cannot degrade the performance for others by making more requests than we can handle:
Rate limiting: Define a maximum requests/min that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time to process for a tenant, that's fine.
Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this and by the time these new machines are ready it's possibly too late.
Request "throttling": Using algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate with the tenant (enforced by contract) would be passed along to the services without delay. Since we know in advance how many requests per minute each tenant is going to make (estimated at least), we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or a similar algorithm). This would ensure that a tenant cannot impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic", in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
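To illustrate, here's a minimal single-node sketch of the shaping behaviour I have in mind (a real gateway plugin would need shared state in Redis and non-blocking scheduling; the rate and queue limits are made-up numbers):

```python
import threading
import time

class LeakyBucket:
    """Per-tenant leaky bucket: requests above the agreed rate are queued
    (up to max_queue slots) and released at a fixed interval, rather than
    dropped, unless the queue overflows."""

    def __init__(self, rate_per_minute: int, max_queue: int = 100):
        self.interval = 60.0 / rate_per_minute  # seconds between releases
        self.max_queue = max_queue
        self.lock = threading.Lock()
        self.next_release = time.monotonic()

    def admit(self):
        """Return the delay (seconds) to wait before forwarding the request,
        or None if the virtual queue is full and the request must be rejected."""
        with self.lock:
            now = time.monotonic()
            # When idle the next slot is "now"; under load, slots are spaced
            # self.interval apart, which flattens bursts into a fixed rate.
            self.next_release = max(self.next_release, now)
            if (self.next_release - now) / self.interval >= self.max_queue:
                return None  # queue limit reached
            delay = self.next_release - now
            self.next_release += self.interval
            return delay

# One bucket per tenant, sized from the contractually agreed rate.
buckets = {"tenant-a": LeakyBucket(rate_per_minute=30)}

def handle(tenant: str) -> str:
    delay = buckets[tenant].admit()
    if delay is None:
        return "429 Too Many Requests"
    time.sleep(delay)  # a real gateway would schedule asynchronously, not block
    return "forwarded upstream"
```

Extending this across multiple gateway instances is exactly where the shared state comes in: all nodes must agree on each tenant's next release slot.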
My questions are:
Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
Is there an open-source, commercial or SaaS service that can provide these traffic-shaping capabilities? As far as I know, Kong and Tyk do not support anything like this, so... is there any other API gateway that does?
In case Kong does not support this, how hard would it be to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example), as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed a tricky thing, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed-system cases, and in addition it is hard to make it "safe", because "slow" clients quickly consume machine resources.
Such a pattern is usually offloaded to client libraries, so when a client hits the rate-limit status code, it uses something like an exponential backoff technique to retry requests. That is way easier to scale and implement.
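As a rough sketch of what that client-side pattern looks like (the URL and retry budget below are placeholders):

```python
import random
import time

import requests  # assumed HTTP client

def call_with_backoff(url: str, max_retries: int = 5):
    """Retry on HTTP 429 with exponential backoff plus jitter, the pattern
    the gateway offloads to client libraries."""
    for attempt in range(max_retries):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        # Sleep 1s, 2s, 4s, ... plus jitter so clients don't retry in lockstep.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```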
I can't say for Kong, but Tyk, in this case, provides two basic numbers you can control: quota, the maximum number of requests a client can make in a given period of time, and rate limits, a safety protection. You can set a rate limit 1) per "policy", e.g. for a group of consumers (for example if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, 3) globally for an API (works together with key rate limits). So, for example, you can set some moderate client rate limits and cap the total with a global API setting.
If you want a fully dynamic scheme and to re-calculate limits based on cluster load, it should be possible. You will need to write and run a scheduler somewhere; from time to time it will perform the re-calculation based on current total usage (which Tyk calculates for you, and which you can get from Redis) and will talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
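A bare-bones sketch of such a scheduler follows. The gateway address, authorization header, endpoint paths and session fields are assumptions from memory, so verify them against the Tyk Gateway API docs before relying on them:

```python
import requests

TYK = "http://localhost:8080"                 # hypothetical gateway address
HEADERS = {"x-tyk-authorization": "secret"}   # gateway API secret (assumed header)
CLUSTER_CAPACITY = 600                        # requests/min the backends can absorb

def rebalance(tenant_keys: list[str], usage: dict[str, int]) -> None:
    """Give idle tenants' headroom to busy ones, within total cluster capacity.
    `usage` is requests/min per key, gathered from Tyk's analytics in Redis."""
    total_used = sum(usage.values()) or 1
    for key in tenant_keys:
        # Fetch the key's session object (assumed endpoint path).
        session = requests.get(f"{TYK}/tyk/keys/{key}", headers=HEADERS).json()
        # Proportional share of the cluster, with a floor so nobody starves.
        share = max(60, CLUSTER_CAPACITY * usage.get(key, 0) // total_used)
        session["rate"], session["per"] = share, 60  # 'share' requests per 60 s
        requests.put(f"{TYK}/tyk/keys/{key}", headers=HEADERS, json=session)
```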
Hope it makes sense :)

ASP.NET Web API 2 OWIN self-hosted, high CPU: what is the average compute load I should expect?

I built a set of 3 APIs using ASP.NET Web API 2, self-hosted using OWIN in an Azure Cloud Service worker role.
The Worker Role is exposed to the internet with a custom domain.
Each API has a single controller doing some normal dictionary operations, table calls and Azure Redis calls. One request in two just does a single Redis call and returns in around 10 ms.
The average call, when going through all the API code, takes 150 ms.
The response is a JSON object of around 10 KB in size.
Everything works fine, but I have a problem.
I'm seeing around 25 peak connections per second and no more than 2 million requests per day, and I can barely get the CPU below 40% with 3 Azure D2_V2 (2 cores, 8 GB RAM) instances running.
I'm in trouble because I'm spending almost $1.5k a month for an API serving just 15-25 calls per second.
If I remove or scale down an instance, the CPU goes up to 55-60%, Redis and Azure Table calls slow down a lot, and an API request takes 3-5 seconds to get back.
I tried everything to the best of my abilities. I thought it could be some bots or a DDoS attack, so I installed the NuGet package WebApiThrottle and set a maximum of 1 request per IP per second.
Nothing changed.
I reviewed all the code configuration to cut unoptimized parts, but one call in two just calls Redis and gets back, and the others are very clean and simple C# returning in 150 ms with 2 Azure Table calls + 1 Azure Queue set.
The API Controllers are async, everything is async.
I enabled profiling: the CPU is high in the main Azure process and in the Redis Get method; nothing else is relevant here, no bottlenecks.
I enabled Diagnostics, no errors.
I installed Application Insights, and here I see something strange that cannot tell if it is normal or not.
I see this IP, 13.88.23.0, making thousands of requests to the APIs with query-string values generally used in normal requests. A lot of them fail.
This IP is Azure itself; why is it calling the API?
Some of these requests are stuck for minutes; I can see that from the Application Insights panel, and it's always the same IP.
Then I see the remaining logs, dependencies etc., nothing relevant.
Apart from that, what could I do to understand the problem?
I can't think it's normal to consume so many CPU resources for an API with just 2 million calls a day, or is it?
Is there an additional profiling technique I could use?
Based on your experience, how many API calls should I expect to serve with 3 dual core 8GB RAM servers in normal conditions? (assuming there is something wrong in my configuration)
Thanks
UPDATE
I separated the APIs into two cloud services, 2 in one and 1 in the other.
I still see in Application Insights calls from another IP belonging to Microsoft.
I suppose this is normal; probably Application Insights cannot detect the real IP of the client, since it is a Worker Role, and shows the internal one.
But the problem of having to use so much power for so few calls remains.
Any thoughts on that?

Clarification about TURN server authentication through REST API

I was going through this draft to understand the usage of the REST API to access TURN services. I am a bit confused after going through it.
Currently, I am authenticating against my TURN server using the Long-Term Credential Mechanism with a Redis database, but instead of using an actual username and password, I am using an authentication token (which expires in 8 hours) and a random string as the password.
My doubts about the draft are:
The ttl received in the response is never used (at least it is not part of RTCPeerConnection), so how exactly does the TURN server know when to expire the user?
I see no option in the turnserver arguments to specify the timestamp format, so is it fixed to a UNIX timestamp?
Does the REST API implementation offer any advantage over my implementation (considering the fact that mine doesn't have a dependency on sync between the WebRTC server's and the TURN server's time)?
The timestamp generated by the REST endpoint as part of the username is ttl seconds in the future, so the TTL in the response is just informative.
The advantage of the overall approach is that (assuming time sync, which is a solved problem) it requires no communication between the entity that generates the token and the TURN server. When deploying multiple TURN servers around the globe (see later in this I/O 2015 presentation), this is somewhat easier than syncing a Redis database.
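For reference, a short Python sketch of the credential generation the draft describes, assuming the usual scheme where the password is an HMAC-SHA1 over the expiry-prefixed username (the shared secret is whatever you configure on the TURN server):

```python
import base64
import hashlib
import hmac
import time

SHARED_SECRET = b"turn-shared-secret"  # same value configured on the TURN server

def turn_credentials(user_id: str, ttl: int = 8 * 3600) -> dict:
    """Per the REST-API draft: the username embeds an expiry timestamp ttl
    seconds in the future, and the password is an HMAC over that username,
    so the TURN server can validate both without any database lookup."""
    expiry = int(time.time()) + ttl            # UNIX timestamp, as the draft uses
    username = f"{expiry}:{user_id}"
    digest = hmac.new(SHARED_SECRET, username.encode(), hashlib.sha1).digest()
    password = base64.b64encode(digest).decode()
    return {"username": username, "password": password, "ttl": ttl}

print(turn_credentials("alice"))
```

Since the expiry is embedded in the username and covered by the HMAC, a client cannot extend its own lifetime without invalidating the password.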

Cryptography: Verifying Signed Timestamps

I'm writing a peer-to-peer network protocol based on private/public key pair trust. To verify and deduplicate messages sent by a host, I use timestamp verification. A host does not trust another host's message if the signed timestamp has a delta (relative to the current time) of greater than 30 seconds or so.
I just ran into the interesting problem that my test server and my second client are about 40 seconds out of sync (fixed by updating NTP).
I was wondering what an acceptable time difference would be, and whether there is a better way of preventing replay attacks. Supposedly I could have one client supply random text to hash and sign, but unfortunately this won't work, as in this situation I have to write messages once.
A host does not trust another host's message if the signed timestamp has a delta (relative to the current time) of greater than 30 seconds or so.
Time-based schemes are notoriously difficult. I can't tell you the problems I had with mobile devices that would not or could not sync their clock with the network.
Counter-based schemes are usually easier and do not DoS themselves.
I was wondering what an acceptable time difference would be...
Microsoft's Active Directory uses 5 minutes.
if there is a better way of preventing replay attacks
Counter-based, with a challenge/response.
I could have one client supply random text to hash and sign, but unfortunately this won't work, as in this situation I have to write messages once...
Perhaps you could use a {time,nonce} pair. If the nonce has not been previously recorded, then act on the message if it's within the time delta. Then hold the message (with its {time,nonce}) for a window (5 minutes?).
If you encounter the same nonce again, don't act on it. If you encounter an unseen nonce but it's out of the time delta, don't act on it either. Purge your list of nonces on occasion (every 5 minutes?).
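A minimal sketch of that {time,nonce} rule (the 5-minute window matches the suggestion above; a real implementation would persist the nonce set and purge on a timer rather than on every call):

```python
import time

WINDOW = 300  # seconds: both the accepted clock delta and the nonce retention

seen: dict[str, float] = {}  # nonce -> time it was recorded

def accept(message_time: float, nonce: str) -> bool:
    """Reject replayed nonces, reject messages outside the time window,
    and purge old nonces as we go."""
    now = time.time()
    # Purge nonces older than the window; anything replayed after this point
    # is already rejected by the time-delta check below.
    for n, t in list(seen.items()):
        if now - t > WINDOW:
            del seen[n]
    if nonce in seen:
        return False  # replay: same nonce seen inside the window
    if abs(now - message_time) > WINDOW:
        return False  # outside the accepted clock delta
    seen[nonce] = now
    return True
```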
I'm writing a peer to peer network protocol based...
If you look around, you will probably find a suitable protocol in the academic literature.