We have build a free globally distributed mobility analytics REST API. Meaning we have servers all over the world which run different versions (USA, Europe, etc..) of the same application. The services are behind a load balancer so I can't guarantee that the same user always get's the same application/server if he/she does requests today or tomorrow. The API is public but users have to provide an API key in order for us to match them to their paid request quota.
Since we do heavy number crunching with every request, we want to minimize request times as far as possible, inparticular for authentication/authorization and quota monitoring. Since we currently only use one user database (which has to be located in a single data center) there are cases where users in the US make a request to an application/server in the US which authenticates the user in Europe. So we are looking for a solution where the user database interaction:
happens on the same application server
get's synchronized between all application servers
should be easily integrable into java application
should be fast (changes happen in every request)
Things we have done so far:
a single database on each server > not synchronized, nightmare
a single database for all servers > ok, when used with slave as fallback but American users have to authenticate over the Atlantic
started installing bdr but failed on the way (no time, too complex, hard to make transition)
looked at redis.io
Since this is my first globally distributed REST API I wonder how other companies do this. (yelp, Google, etc.)
Any feedback is very kindly appreciated,
Cheers,
Daniel
There is no right answer, there are several ways to perform this. I'll describe the one I'm most familiar with (and which probably is one of the simplest ways to do it, although probably not the most robust one).
1. Separate authentication and authorization
First of all separate authentication and authorization. An user authenticating across the Atlantic is fine, every user request that requires authorization to go across the Atlantic is not. Main user credentials (e.g. a password hash) are a centralized resource, there is no way around that. But you do not need the main credentials every time a request needs to be authorized.
This is how a company I worked at did it (although this was postgres/python/django, nothing to do with java):
Each server has a cache of database queries in memcached but it does not cache user authentication (memcached is very similar to redis, which you mention).
The authentication is performed in the main data centre, always.
A successful authentication produces a user session that expires after a day.
The session is cached.
This may produce a situation in which a user can have an action authorized by more than one session at a given time. The algorithm is that: if at least one user session has not expired, the user is authorized. Since the cache is local, there is no need for a costly operation of fetching data across the Atlantic for every request.
2. Session revival
Sessions expiring after exactly a day may be annoying for the user, since he can never be sure that his session will not expire in the next couple of seconds. Each request the user makes that is authorized by a session shall extend the session lifetime to a full day again.
This is easy to implement: You need to keep in the session the timestamp of the last request made. And count the session lifetime based on that timestamp. If more than a single session may authorize a request the algorithm shall select the younger session (with the longest lifetime remaining) to update.
3. Request logging
This is as far as the live system I worked with went. The remainder of the answer is about how I would extend such a system to make space for logging the requests and verifying request quota.
Assumptions
Lets start by making a couple of design assumptions and argue why they are good assumptions. You're now running a distributed system, since not all data on the system is in a single place. A distributed system shall give priority to response speed and be as horizontal (not hierarchical) as possible, to achieve this consistency is sacrificed.
The session mechanism above already sacrifices some consistency. For example, if a user logs in and talks continuously to server A for a day and half and then a load balancer points the user to server B, the user may be surprised that he needs to login again. That is an unlikely situation: in most cases a load balancer would divide the user's requests equally over both server A and server B over the course of that day and half. And both, server A and server B will have live sessions at that point.
In your question you wonder how google deals with request counting. And I'll argue here that it deals with it in an inconsistent way. By that, I mean the fact that you cannot enforce a strict upper limit of requests if you're sacrificing consistency. Companies like google or yelp simply say:
"You have 12000 requests per month but if you do more than that you will pay $0.05 for every 10 requests above that".
That allows for an easier design: you can count requests at any time after they happened, the counting does not need to happen in real time.
One last assumption: a distributed system will have problems with duplicated internal data. This happens because parts of the system are running in real time and parts are doing batch processing without stopping or timestamping the real time system. That happens because you cannot be 100% sure about the state of the real time system at any given point. It is mandatory that every request coming from a customer have a unique identifier of some sort, it can be as simple as customer number + sequence number, but it needs to exist in every request. You may also add such a unique identifier the moment you receive the request.
Design
Now we extend our user session that is cached on every server (often cached in a different state on different servers that are unaware of each other). For every customer request we store the request's unique identifier as part of the session cache. That's right, the request counter in the central database is not updated in real time.
At a point in time (end of day processing, for example) each server performs a batch processing of request identifiers:
Live sessions are duplicated, and the copies are expired.
All request identifiers in expired sessions are concatenated together, and the central database is written to.
All expired sessions are purged.
This produces a race condition, where a live session may receive requests whilst the server is talking to the central database. For that reason we do not purge request identifiers from live sessions. This, in turn, causes a problem where a request identifier may be logged to the central database twice. And it is for that reason that we need the identifiers to be unique, the central database shall ignore request updates with identifiers already logged.
Advantages
99.9% Uptime, the batch processing does not interrupt the real time system.
Reduced writes to the database, and reduced communication with the database in general.
Easy horizontal growth.
Disadvantages
If a server goes down, recovering the requests performed may be tricky.
There is no way to stop a customer from doing more requests than he is allowed to (that's a peculiarity of distributed systems).
Need to store unique identifiers for all requests, a counter is not enough to measure the number of requests.
Invoicing the user does not change, you just query the central database and see how many requests a customer performed.
Related
I need to do some security validation on my program, and one of the things I need to answer, related to authentication is" Verify that all authentication decisions are logged, including linear back offs and soft-locks."
Does anyone knows what linear back off and soft-locks mean?
Thank you in advance,
Thais.
I am doing my research on OWASP ASVS. Actually Linear Back-of and soft-lock are authentication controls that are used to prevent brute force attacks and also can help against DoS.
Linear Back-of can be implemented through some algorithm by blocking user/IP for a particular time and after every failed login attempt that time is increase exponentially e.g. for first failed login block for 5 minute, for second failed login block for 25 minute for 3rd 125 min and so on.
As per my understanding as I have seen in some articles and implemented in some application like Oracle WebLogic Soft lock is much easier to implement, IP Address (Which is I think is also helpful to protect against DoS and Brute force using automated tools) or user name is logged in database for every failed login attempt and when a certain threshold number of failed login attempts (e.g. 5) block IP address permanently. Once the account has been soft locked in application runtime, it does not try to validate the account credentials against the backend system, thus preventing it from being permanently locked.
ASVS Verification requirement is very much clear on this though.
"Verify that a resource governor is in place to protect against vertical (a single account tested against all possible passwords) and horizontal brute forcing (all accounts tested with the same password e.g. “Password1”). A correct credential entry should incur no delay. For example, if an attacker tries to brute force all accounts with the single password “Password1”, each incorrect attempt incurs a linear back off (say 5, 25, 125, 625 seconds) with a soft lock of say 15 minutes for that IP address before being allowed to proceed. A similar control should also be in place to protect each account, with a linear back off configurable with a soft lock against the user account of say 15 minutes before being allowed to try again, regardless of source IP address. Both these governor mechanisms should be active simultaneously to protect against diagonal and distributed attacks."
I have been trying to solve a big problem for the last 2 weeks with one of our servers (apache 2.2 , windows, php).
The client using our system is a contact center firm.
They have about 120 operators, all connect to our websever with the same IP, their outgoing IP.
We have been suffering DoS attacks from some of these operators.
These are simple, browser attacks , namely 5 or 10 operators will just hold
F5 key and bombard the server with requests when they shouldnt.
There is very little we can do to improve performance of these specific url's the attackers are using. This is a software, not a public portal, so a lot of screens have a good amount of processing and real time querying in them.
We did manage to produce a php protection which will recognize the multiple requests and blacklist the user in php, after the user is logged in, by using a control mechanism with a cookie containing the userID in our software.
This works to some extent, but it’s a little "too late" since the request have already been sent and processed by the webserver.
Even if the response is now minimal and causes no more trouble to the server, ideally we would like something EXACTLY like mod_evasive, but for rejecting single requests instead of blocking the IP.
Exemplifying : if a user calls the same url, 5 times, in a 3 second spawn, we will reject every next request for 30 seconds, but only the requests by that user (identified by some cookie).
Well, this is more of an administrative problem than a technical problem - tell your customers to educate their users.
But if you really want to do that in software, some scheme like the following should work:
Give each of your users a different cookie (you're already doing that?).
Create a database table that has cookies and "request counts".
Whenever your software starts processing a request, if the request count for that session is > 3 (or a number you think is appropriate), abort immediately, set the request count to 30, and give an error message to the user. Especially don't do all that expensive processing stuff.
If the request count is less than 3, increase it by one and process the request normally.
Have some cron job decrease by one all request counts that are >1. Depending on the typical use case of your software, run this once per minute, or once every few seconds in a loop that queries the database, then sleeps for a while.
Tune the parameters to your software, or course.
I'm about to implement an RESTful API to our website (based on WCF data services, but that probably does not matter).
All data offered via this API belongs to certain users of my server, so I need to make sure only those users have access to my resources. For this reason, all requests have to be performed with a login/password combination as part of the request.
What's the recommended approach for preventing brute force attacks in this scenario?
I was thinking of logging failed requests denied due to wrong credentials and ignoring requests originating from the same IP after a certain threshold of failed requests has been exceeded. Is this the standard approach, or am I a missing something important?
IP-based blocking on its own is risky due to the number of NAT gateways out there.
You might slow down (tar pit) a client if it makes too many requests quickly; that is, deliberately insert a delay of a couple of seconds before responding. Humans are unlikely to complain, but you've slowed down the bots.
I would use the same approach as I would with a web site. Keep track of the number of failed login attempts within a certain window -- say allow 3 (or 5 or 15) within some reasonable span, say 15 minutes. If the threshold is exceeded lock the account out and mark the time that the lock out occurred. You might log this event as well. After another suitable period has passed, say an hour, unlock the account (on the next login attempt). Successful logins reset the counters and last lockout time. Note that you never actually attempt a login on a locked out account, you simply return login failed.
This will effectively rate-limit any brute force attack, rendering an attack against a reasonable password very unlikely to succeed. An attacker, using my numbers above would only be able to try 3 (or 5 or 15) times per 1.25hrs. Using your logs you could detect when such an attack were possibly occurring simply by looking for multiple lockouts from the same account on the same day. Since your service is intended to be used by programs, once the program accessing the service has its credentials set properly, it will never experience a login failure unless there is an attack in progress. This would be another indication that an attack might be occurring. Once you know an attack is in process, then you can take further measures to limit access to the offending IPs or involve authorities, if appropriate, and get the attack stopped.
I'm wondering when login timeouts are being used, specifically when using same session (same browser session). On a number of sites I have completed recently I have added 60 minute timeouts and they seem to be causing problems, such as users are not able to fill out larger forms (like a resume submission--people don't think of copying their resume from another program or saving part way through). On one site, I have implemented a div/popup forcing the user to enter their password to continue in the current session, without having to login again.
But on other sites, such as Facebook, it seems you are never logged out as long as you are using the same browser window, even without "remembering" your password.
The main reason I usually use timeouts is to ensure the data is secure, such that another party can't sit down at the computer a few hours later and use the system as the original user.
I'm wondering how you decide when a site should time out users because of inactivity?
I'm thinking the answer would be language agnostic.
IMO, they're valid when:
security is critical (ie. banking)
the likelihood of seat-swapping is
high (ie. public terminals)
Regardless, there may be instances like your resume system, where you want people on public terminals to be able to carry out an act that may leave them inactive for longer than your desired or necessary timeout.
I suppose you just have to handle that in a smart fashion - either figure out a way they can get the data in quicker (which would be ace, spending an hour filling out a form is not fun - can they just upload a file?), or ensuring they can continue without any data loss after being prompted to log in again.
Even though 60 minutes seems like a long time to fill out a single form (perhaps the forms should be divided into multiple pages?), you can probably use SlidingExpiration to solve the problem where your users get logged out even though the browser session is alive.
I think the timeout for an auth cookie is a Security level decision. If your site is SSL secured, you would probably have minimal timeout values (user session would expire within a matter of minutes). On the other hand, for sites with non-critical security, you could set a medium timeout value.
When I sign on to online banking, for example, it asks me whether or not I am using a "public terminal": and if I say yes then it enforces stricter security, or if no then laxer.
This question crossed my mind after I read this post:
“Common REST Mistakes: Sessions are irrelevant”
If sessions are indeed discouraged in a RESTful application. How would you handle licenses in such application. I'm specifically referring to concurrent licenses model and not named licenses. i.e. the customer buys X licenses which means the application may allow up to X users to be logged in simultaneously. Which means that the application must hold a state for current logged in users.
I know I can create a resource called licenses, which will set a cookie or generate a unique ID, and then the client will have to send it with every request. But it's the same as creating a session, right?
If I'll adopt the stateless approach and ask the client to create an authentication token for every request how will the application know when to consume and release license for that client?
Is there an alternative? specifically a more RESTful alternative?
Let me try to connect the dots for you, assuming I interpreted your question correctly.
The link you posted has a valid answer, each request should use a HTTP auth. If you need the concept of licences to maintain a certain state for your user, you can most likely link that to the user. You have a (verified) username to go by. You just need to invoke that controller for each request and save its state. There you have your session.
Cookie input should never be trusted for any critical information, but can be very useful for extra verification like a security token. I think adding a random security token field to your on-site links would be the restful approach to that. It should expire with the 'session', of course.
You may want to consider pushing the license handling concerns down the infrastructure stack one level. Sort of like an Aspect Oriented Programming (AOP) approach if you will. Instead of handling it in the application tier, perhaps, you can push it in to the web server tier.
Without knowing the details of your infrastructure, it is hard to give a specific recommendation. Using the *nix platform as an example, the license handling logic can be implemented as a module for Apache HTTP server.
This approach promotes a separation of concerns across your infrastructure stack. It allows each layer to focus on what it is meant to. The application layer does not need to worry about licensing at all, allowing it to focus strictly on content, which in turn keeps the URL's clean and "RESTful".
If your licensing is based on concurrent users, implementing HTTP digest is trivial, and will let you enable only the maximum number of concurrent logins. Digest has provision for passing expiration data so your session can be timed-out.
Authentication state is hold by http authetnication and nowhere else, beause it is transparent and ubiquituous.
Maybe a more RESTful way of doing licenses would be to limit the rate at which requests are handled, rather than the number of concurrent sessions. Keep track of the number of requests in the last hour, and if it exceeds the number the customer has paid for, serve a 503 Service Unavailable response, along with some text suggesting the user try again later.