Are there benefits to using LDAP for Auth for a single application?

I worked on a project a while back where the architect decided to use LDAP for managing authentication/authorization, rather than a traditional database approach.
The application was expected to grow rapidly, by approximately 500-1000 users a day, and then plateau at around 200k users. Beyond that, there was nothing special about this application.
I didn't ask at the time, but I'm curious about why we would've used LDAP here.
As I understand it, the real strengths of LDAP lie in organizations where users are required to authenticate against several disconnected systems, and LDAP provides a single auth provider.
Are there additional benefits that make it a good fit for certain applications?

Actually, the scaling is one: LDAP is easily distributed across multiple servers.
The other reason, whether your architect mentioned it or not, is that there has never in the history of the world been a single application that stayed single. Someone will have a new idea, and then there is already a unified, standard single sign-on mechanism available.

LDAP is incredibly fast for read operations compared with your average RDBMS. Auth operations are read-intensive and generally change data infrequently, which plays to LDAP's strengths. On the flip side, write operations are generally much slower than their database counterparts.
So, while LDAP would most likely not be an adequate alternative for your general data storage needs, it is a strong choice for auth.
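To make the read-heavy flow concrete, here is a minimal sketch of the classic search-then-bind auth check using the Python ldap3 library. The host, base DN, and the assumption that anonymous search is allowed are all hypothetical:

# pip install ldap3  (a sketch, not production code)
from ldap3 import Server, Connection
from ldap3.core.exceptions import LDAPBindError
from ldap3.utils.conv import escape_filter_chars

LDAP_HOST = "ldap.example.com"             # hypothetical host
BASE_DN = "ou=people,dc=example,dc=com"    # hypothetical base DN

def authenticate(username: str, password: str) -> bool:
    """Look up the user's DN, then prove the password by binding as it."""
    server = Server(LDAP_HOST)
    # Read step: find the entry (the cheap, frequent operation).
    with Connection(server, auto_bind=True) as conn:
        conn.search(BASE_DN,
                    "(uid=%s)" % escape_filter_chars(username),
                    attributes=["cn"])
        if not conn.entries:
            return False                   # unknown user
        user_dn = conn.entries[0].entry_dn
    # Bind step: a successful bind as that DN means the password checks out.
    try:
        Connection(server, user=user_dn, password=password, auto_bind=True)
        return True
    except LDAPBindError:
        return False

Note that everything in this flow except the bind itself is a directory read.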

I wouldn't call the database solution for users 'traditional': in fact I would call LDAP traditional in this case. It is much better as a user registry in all sorts of ways, including performance, standardisation, availability of APIs, availability of browser clients, ease of scaling/federation, security, ...

Related

Cloud scale user management

I am building a service to handle a large number of devices for a large number of users.
We have a complex schema of access roles assigned to each entity. Some data entries can be written to by certain users, while some users can only read from some entities (but can write to others).
This is a cloud service: there are more devices and users than can be handled by a single server machine (we are using non-relational cloud databases for this).
I was wondering if there is an established cloud-scale user/role management backend system which I could integrate to enforce the access rules, instead of writing my own. This tech should preferably be cloud-agnostic, so I would prefer not to use a SaaS solution but to deploy my own.
I am looking for a system which can scale to millions of users, and billions of data entities.
I think authentication is not going to be a big issue; there are very robust cloud-based solutions available for storing identities and authenticating millions of users. Authorization will be trickier, and will depend a lot on how granular you want it to be. You could look at Apigee, for example, as a very scalable proxy that might help you implement this. So getting to the point where you have a token with which you can verify the user's identity, and which might contain some scopes, is not going to be hard IMO. If that is enough for you, then I would just look at Auth0, Okta, and the native IDM solution of whatever cloud platform you are using (Cognito, Cloud Identity, etc.).
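As a small illustration of the token-with-scopes step, here is a minimal sketch using the PyJWT library; the signing algorithm, audience, and space-separated scope claim are assumptions that vary by identity provider:

# pip install pyjwt  (a sketch; names are illustrative)
import jwt  # PyJWT

def authorize(token: str, public_key: str, required_scope: str) -> bool:
    """Verify the token's signature, then check for the required scope."""
    try:
        claims = jwt.decode(
            token,
            public_key,
            algorithms=["RS256"],                 # pin the algorithm
            audience="https://api.example.com",   # hypothetical audience
        )
    except jwt.InvalidTokenError:
        return False    # bad signature, expired, wrong audience, ...
    # Many IdPs issue scopes as a space-separated "scope" claim.
    return required_scope in claims.get("scope", "").split()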
I think you will find that more features come with a very hefty price tag. So Auth0 is far superior to Cognito, but Cognito still has enough features for basic use cases and will end up costing a fraction of Auth0 in large deployments. So everything comes with pros and cons. If you have very complex requirements, such as a bunch of big legacy repositories that you need to integrate, then products like Auth0 rapidly start looking more attractive.
Personally I would look at Auth0, Cognito, and Apigee, and my decision would depend massively on parameters that you haven't mentioned in your question. Obviously these are all SaaS solutions, which I think you should definitely be using anyway. I would not host this myself unless I had no other choice; going that route will radically limit your options and probably increase expenses. All the cool stuff is happening in the cloud.

Advantage of LDAP over RDBMS?

I have an application with a database as its backend.
The application follows a sort of PUB-SUB model, where users post changes to the application and other peers subscribe to those changes. These changes may happen very frequently or periodically, and all of them have to be written to the database.
Now I am being asked to assess the possibility of replacing this RDBMS with LDAP. Probably they want a unified DB for all applications, but in any case I have to find the advantages/disadvantages of both approaches.
I cannot directly compare an RDBMS with LDAP, as I have almost no idea of LDAP, though I have tried to learn some.
I understand that LDAP is designed for directory access and is optimized for read access, so it is write-once, read-many. I have read that frequent writes will reduce the performance of an LDAP server, as each write triggers the indexing process.
Just to give a scenario regarding indexing in LDAP: my table has a few columns, say two, viz. Name and Desc. In LDAP I suppose these would become two attributes, Name and Desc. In my scenario it's Desc that will be frequently updated. I assume only Name will be indexed, so even if Desc changes frequently it won't trigger the indexing process.
It is worth mentioning that the database will be hosted on some cloud platform.
I tried to find out the differences, but could find nothing conclusive.
LDAP is a protocol; REST is a service style based on HTTP (also a protocol). So if the LDAP server is not to be exposed to the internet, how do you want to get the data from it? Since LDAP is the protocol, you would need direct access to the LDAP server. It's like a database server that you would not expose directly to the internet: you would build an interface to encapsulate it, and that might as well be a REST interface.
I'd try to get the point across that one is the transfer protocol to a storage backend and the other is the public interface to its data. It's a bit like asking why MySQL is better than a web interface. You'd never make the MySQL server publicly available, but would encapsulate its protocol inside an application.
REST is an interface. It doesn't matter how you organize your data behind that interface. When you decide that you want to organize it differently, you can do so without the consumer of your API noticing any change. And you can provide different versions of your API as your service improves.
LDAP, on the other hand, is an implementation. You can't change the way your data is handled without the consumer noticing it. So there's no way to rearrange your backend without affecting the consumer.
With REST you can therefore change the backend from MySQL to PostgreSQL, or even to LDAP, without the consumer noticing; that is something you won't be able to do if you expose LDAP directly.
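To make the encapsulation idea concrete, here is a minimal sketch of a REST facade using Flask and the Python ldap3 client; the hostnames and base DN are hypothetical. The directory never leaves the private network, and the storage behind GET /users/<uid> could later be swapped out without consumers noticing:

# pip install flask ldap3  (a sketch, not production code)
from flask import Flask, abort, jsonify
from ldap3 import Server, Connection

app = Flask(__name__)
BASE_DN = "ou=people,dc=example,dc=com"    # hypothetical base DN

@app.route("/users/<uid>")
def get_user(uid):
    # The LDAP server stays on the private network; only this API is public.
    # (In real code, escape uid before putting it in the filter.)
    with Connection(Server("ldap.internal.example.com"),
                    auto_bind=True) as conn:
        conn.search(BASE_DN, "(uid=%s)" % uid, attributes=["cn", "mail"])
        if not conn.entries:
            abort(404)
        entry = conn.entries[0]
        # Swap this body for a SQL query later and no consumer of the
        # endpoint has to change.
        return jsonify(name=str(entry.cn), email=str(entry.mail))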
Hope that helps
Now that we finally know what you're actually asking, which has nothing to do with your title, the body of your question, or REST, the simple answer is that there is no particular reason to believe that an LDAP server will perform significantly better than an RDBMS in this application, with two riders:
it may not even be feasible, due to the schema issue, and
if it is feasible it may not be semantically suitable, due to the lack of ACID properties, lack of JOINs, and the other issues mentioned in comments.
I will state that this is one of the worst formulated questions I have seen here for some considerable time, and the difficulty of extracting the actual question was extreme.

Is it considered poor practice to use a single database for different uses across different applications?

What if you had one large database to serve all your apps? Then the website that needs to store customer orders could use the same database that your game uses to store registered users. Different applications could have tables only for them to use. Some may say that this could be a security issue, because if someone cracks your database, they could attack all your applications. But in a lot of databases you could use a line like the following to restrict access:
deny select on aTable to aUser;
I am wondering if this central database would be considered a poor practice, and if so why?
The way I look at it, a web application is nothing more than a collection of web pages. Because of this, it really doesn't matter if one page is about, say, cooking, while the other is about computer programming.
If you think about it, this is very similar to OpenID, which I use to log into my SO account!
If you have your fundamental security implemented correctly, it doesn't matter how the user is interacting with your website. Where I would make this distinction is in two cases:
Don't mix HTTP with HTTPS. On a shared host, this isn't going to be an issue anyway; if you buy the certificate for HTTPS, make everything that way (excluding the rare case where this might affect performance).
E-commerce or financial data should be handled in a fundamentally different way. If you look at your typical bank, they have multiple log-in protocols, picture verification, and short log-in times. This builds confidence in users' security. It would be a pain in the butt for a game site, or most other non-mission-critical applications.
Regarding structure, if you do mix applications into one large database, you should consider the other maintenance issues, such as:
Keep tables separate; consider a unique prefix for every application's tables. Following my example above, you would then start the cooking DB table names with 'ck' and the computer-programming DB table names with 'pg'. This would allow you to easily separate the applications if you need to in the future.
Use a matching table to identify which ID goes to which web application.
Consider what you would do, and how you would handle it, if a user decided to register for both applications. Do you want to make it transparent that they can share the same username?
Keep an eye on both your data storage limit AND your bandwidth limit.
If you are counting on these applications to drive revenue, you are putting "all your eggs in one basket". Make sure if it goes down, you have options to restore or move to another host.
These are just a few of the things to consider. But fundamentally, outside of huge (big data) applications there is nothing wrong with sharing resources/databases/hardware between applications.
Conceptually, it could be done.
Implementation-wise, to keep the various parts distinct from one another, you could use naming conventions (as per #Sable Foste) and/or separate database schemas (tables Finance.Users, GameApp.Users, etc.).
Management-wise, things could get tricky. Repeating some points, adding others:
One application could use a disproportionately large share of resources (disk space, I/O, CPU).
Tracking versions could be tricky (App is v4, finance is v7) -- depends on how many application instances you have to support.
Disaster recovery-wise, everything is lumped together. It all gets backed up as one set, it all gets restored as one set. Finance corrupt? Restore from backup... and lose your more recent game data.
Single point of failure. One database goes down, all your applications are down.
These (and other similar issues) are trade-offs you'll want to consider. Plan ahead, to lessen the chance that what's reasonable and economic today becomes a major headache tomorrow.

How to achieve high availability?

My boss wants a system that takes into account a continent-wide catastrophic event. He wants two servers in the US and two servers in Asia (one login server and one worker server on each continent).
In the event that an earthquake breaks the connection between the two continents, both sides should work alone. When the connection is restored, they should sync with each other and return to normal.
An external cloud system is not allowed, as he has no confidence in it.
The system should take scalability into account, which means the addition of new servers should be easy to configure.
The servers should be load balanced.
The connections between the servers should be very secure (encrypted and sent over SSL, although SSL takes care of the encryption).
The system should let one and only one user log in with one account. (Beware of latency between continents: two users sharing an account may reach both login servers at the same time.)
Please help. I'm already at my wit's end. Thank you in advance.
I imagine that these requirements, if properly analysed, are essentially incompatible, in that they cannot all be satisfied at once; this follows from the CAP theorem.
If you have several datacentres, even if they are close by, partitions WILL happen. If a partition happens, either availability OR consistency MUST be lost, because either:
you have a pre-determined "master", which keeps working and other "slave" DCs which fail (or go readonly). This keeps consistency at the expense of availability.
OR you lose consistency for the duration of the partition (this means that operations which depend on immediate consistency are also unavailable).
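A toy sketch of the first option, with hypothetical names: during a partition, a non-master replica refuses writes, sacrificing availability to keep consistency.

class Unavailable(Exception):
    """Refuse service rather than risk inconsistency."""

class Replica:
    def __init__(self, is_master: bool):
        self.is_master = is_master
        self.partitioned = False   # flipped when the inter-continent link drops
        self.store = {}

    def write(self, key, value):
        if self.partitioned and not self.is_master:
            # Consistency over availability: better to refuse the write than
            # to accept one the master can't see (e.g. a second login on the
            # same account from the other continent).
            raise Unavailable("partitioned from master; writes refused")
        self.store[key] = value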
This is incompatible with your requirements, as far as I can see. What your boss wants is clearly impossible. He needs to understand the CAP theorem.
Now, in YOUR application's case, you may decide that you can bend the rules and redefine what consistency or availability mean, for convenience, and have a system which degrades into an inconsistent but temporarily acceptable state.
You probably want to get product management to look at the business case for these requirements. Dropping some of them is probably OK. Consistency is a good requirement to keep, as it makes things behave as people expect; this means dropping availability or partition-tolerance. Keeping consistency is definitely easier from an engineering perspective.
This is another one of those cases where employers tend not to understand the benefits of using an off-the-shelf solution. If you as a programmer don't really know where to start with this, then rolling your own is probably going to be a huge money and time sink. There's nothing wrong with not knowing this stuff, either; high-availability, failsafe networking that takes into consideration catastrophic failure of critical components is a large problem domain that many people pour a lot of effort and money into. Why not take advantage of what providers have to offer?
Give talking to your boss about using existing cloud providers one more try.
You could contact one of the solid, experienced hosting providers (we use Rackspace) that have data centers in different regions worldwide and get their recommendations based on your requirements.
This will require expert assistance, a large budget, and serious planning.
A better option would be to contact a reputable provider with a global footprint, select a premium solution with a solid SLA backing up their service, and let them tailor a solution that comes close to your needs.
Just realize that even the likes of Google, Yahoo, Microsoft, and Amazon (to name a few) have at one time or another had some issue that rendered segments of their systems offline to certain users.

What does LDAP solve?

I've come across LDAP in many projects I've been involved in, but, truth be told, I don't really understand it. I thought it was just a directory of people, but then I discovered that it can contain any kind of object in a hierarchical structure.
I installed OpenLDAP on my box, and I found many tutorials covering just the installation.
What is LDAP? In what scenarios is LDAP the right choice? What LDAP concepts should I know to work with it? What are the advantages of LDAP? Is it used just because old applications used it? Is there a good doc anywhere on the internet explaining all these questions?
UPDATE:
Complementing the answers, I found this link, which contains a quick-start guide for an LDAP newbie like me.
What is LDAP? What are the scenarios where LDAP is the right choice?
At its core, LDAP is a protocol for accessing objects that are suitable for storage in a directory. Whether something is "suitable" is an entirely subjective determination that's left up to implementers, but typically this means collections of many objects that each have infrequently (or never) updated data, where each object has an obvious or canonical way to be looked up:
a phone book (look up by name or by phone number)
titles in a library (look up by title, author, etc.)
tenants in a building (look up by floor, suite, name, etc.)
and so on.
Note that LDAP itself is just a protocol and doesn't provide any actual storage -- in much the same way, HTTP doesn't imply anything about whether you're using Apache, Jetty, Tomcat, Mongrel, et al. as a web server. (One problem with LDAP in general is the confusing reuse of names to mean different things. Wikipedia has a good section on this.)
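For a feel of what canonical lookup means in practice, here is a minimal sketch using the Python ldap3 client; the host, base DN, and entries are hypothetical:

# pip install ldap3  (a sketch; the directory contents are made up)
from ldap3 import Server, Connection, SUBTREE

with Connection(Server("ldap.example.com"), auto_bind=True) as conn:
    # Phone-book style: the same entry is reachable by either canonical key.
    conn.search("dc=example,dc=com", "(cn=Jane Doe)",
                search_scope=SUBTREE, attributes=["telephoneNumber"])
    print(conn.entries)
    conn.search("dc=example,dc=com", "(telephoneNumber=555-0100)",
                search_scope=SUBTREE, attributes=["cn"])
    print(conn.entries)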
DITs (Directory Information Trees) are a hierarchical description scheme that lends itself to B-tree algorithms very nicely, resulting in tremendous search performance in most cases. Directory servers like OpenDS return indexed searches in microseconds, whereas RDBMS systems are much slower. Directory servers (often called LDAP servers) trade resources (RAM, CPU) for fast read response. RDBMS systems provide greater functionality in terms of managing the data in question. Need speed with few or zero updates, simplicity, and a small network protocol? Use a directory server. Need data management and mining capabilities, and/or a high rate of change of the database with relational aspects defined between data? Use an RDBMS (MySQL is your best bet here).
LDAP gives you roughly O(1) read performance, in exchange for O(something worse) write performance. It's ideal for data that's accessed frequently but changed rarely: directories of people, machine names and addresses, and so on. (Hence the acronym: Lightweight Directory Access Protocol.)
LDAP is the right choice where the pain of using a database that isn't relational, in terms of decreased developer familiarity and strange performance characteristics, is less than the gain of blindingly fast read access.
This link explains LDAP: http://blogs.oracle.com/raghuvir/entry/ldap
We use LDAP in our office for company-wide email address lookups. We also use it as a single sign-on service for our internal apps.
One perspective I like to harp on is that LDAP is an application on top of a persistence store, while a database is a persistence store. Both can be used to store user information.
LDAP gives you a hierarchy, which is harder to do in a database. You can build a hierarchy in a database, but it's harder to do things like delegation ("these rows belong to you only") or ACLs on rows. So pushing security problems out of the database is easier if you use LDAP for storing user identities. Trying to solve it in the database is weird.
At the same time, LDAP is terrible for reporting against (transform LDAP into a DB for reporting). Storing attributes deep in the tree that need to be searched quickly can be problematic for performance (don't do this; have a DB on the side, or try to flatten the query by redesigning your DIT). Storing attributes all over the place in a really deep DIT is just bad LDAP or system design, but sometimes it's unavoidable if you're tied to a vendor product or legacy app.
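As a small illustration of how the hierarchy helps with delegation, here is a sketch using the Python ldap3 client against a hypothetical DIT: each consumer searches only its own subtree, so sibling branches are simply out of scope (real deployments also enforce this server-side with ACLs):

# pip install ldap3  (a sketch; the DIT layout is made up)
from ldap3 import Server, Connection, SUBTREE

server = Server("ldap.example.com")    # hypothetical host

def users_in(subtree: str):
    """Scope a search to one branch of the tree; siblings stay invisible."""
    with Connection(server, auto_bind=True) as conn:
        conn.search(subtree, "(objectClass=inetOrgPerson)",
                    search_scope=SUBTREE, attributes=["uid"])
        return [str(e.uid) for e in conn.entries]

engineering = users_in("ou=engineering,dc=example,dc=com")
sales = users_in("ou=sales,dc=example,dc=com")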
LDAP is just a protocol; the Wikipedia article explains it adequately: http://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol
It's a way to query an underlying organizational structure such as Microsoft's Active Directory. You can use LDAP queries to get all kinds of information about users, use it for setting application rights, and so on.
I work part time and am a full-time student. My curriculum encourages (read: requires) many group projects.
I have used OpenLDAP and phpLDAPadmin to control access to my Subversion and Mercurial repos, Trac projects, Hudson, and so on. It wasn't easy to install, but the time saved in administration was a godsend.
If you have projects where many groups of people need to be able to use different resources, it is a good tool.
See this link, which explains LDAP in depth:
http://www.umich.edu/~dirsvcs/ldap/doc/guides/slapd/1.html#RTFToC1
[Figure: example diagram from that documentation (source: dirsvcs at www.umich.edu)]
LDAP is an access protocol; it only provides an API to the underlying technology for which you are trying to find applications: a directory service. OpenLDAP is one of the open-source directory services; Sun has another implementation called OpenDS. Active Directory and Novell NDS are two others commonly seen in the field.
The directory can be used for storing information about any sort of resource, and the relationships between the resources - for example, rights of a user to a directory, a printer, or a network access device.
Is there a good doc anywhere on internet explaining all this questions?
IBM published an excellent Redbook about LDAP. The title is:
Understanding LDAP - Design and Implementation.
It can be downloaded from the previous link.
In one of my old workplaces we used LDAP as our primary user authentication system.
This in turn provided our various systems with information such as which department a user belonged to, where their home directories should be mounted, contact information, and employee management data.
Other things that we had wired up to work through LDAP (though not necessarily controlled by it) were SQL user accounts, K4, Samba, and email account generation.