Scalable role based authentication - authentication

I am currently designing a role based authentication system for resources where many users have different access rights to it.
A role may be a single user, or a group of roles (so a role is a tree of roles). (see graphic below)
A resource can have multiple authentication properties (like read, write, delete), where each of this is a list of roles allowed to do access the operation. (see graphic below)
The problem is if I want to check if a user has the right to access a property, i have to traverse n trees in worst case (where n is the number of roles assigned to an property).
So for example to check if 'Max' may read the property I might have to check the Marketing, Management and Administration trees if they contain 'Max'.
Do you know of any algorithm or alternative approach which removes the quite expensive tree searches while maintaining the role system or something equally powerful.
The perfect case would be some lookup like O(log(n)) for n roles.
Thanks,
Fionn

Have you measured this and determined that this traversal is a performance bottleneck?
I've never seen a system with so many roles / levels that the cost of traversing this kind of structure would become an issue. And if the tree really is that large, I'd be more concerned that administrators would have difficulty in understanding who is authorized to do what.
Regarding scalability, I would typically use the ASP.NET cache to cache the complete tree that maps between resources and roles, with a suitable cache timeout. And separately cache the mapping from Users to Roles (e.g. in Session or with a user-specific key in the ASP.NET cache).
Accessing the information from the cache will typically be blindingly fast compared with going to the database each time.

If you put your roles in a SQL database, lookups will perform substantially as you describe. I can help you with the database structure, if you're interested.

You need to reverse your pointers.
"Harry" is a member of "Site2 Admins" which has "Administrators" access to "Site2", so he can thus "Delete," "Write" and "Read that content.
Why "Administration" should be a common thing between "Harry" and "Joe" I'm not clear. Harry is an administrator on one site, but just a user on another, and Joe vice versa.

Related

How far can Keycloak scale in terms of resources and permissions?

I'm planning on having hundred of thousands of entities in my application and I want to handle rights on each of these. I won't be the one handling those rights, my users will and they probably won't set rights on every entity.
How far can Keycloak go on this matter? I probably should be creating resources for only the elements that actually need specific permissions but I want to understand when things may become an issue and when I should be trying to find an other solution.
Thanks

Are there benefits to using LDAP for Auth for a single application?

I worked on a project a while back where the Architect decided to use LDAP for managing authentication / authorization, rather than a traditional database approach.
The application was expected to scale rapidly by approx 500 - 1000 users a day, and then plateau at around 200k users. Beyond that, there was nothing special about this application.
I didn't ask at the time, but I'm curious around why we would've used LDAP here.
As I understand it, the real strengths of LDAP lie in organizations where users are required to authenticate against several disconnected systems, and LDAP provides a single auth provider.
Are there additional benefits that make it a good fit for certain applications?
Actually, the scaling is one: LDAP is easily distributed across multiple servers.
The other reason, whether your architect mentioned it or not, is that there's never in the history of the world been a single application that stayed single. Someone will have a new idea, and now there's a unified single sign on technique already available and standard.
LDAP is incredibly fast for read operations compared with your average RDBMS. Auth operations are read intensive and generally change data infrequently - which is well suited for ldap's strengths. On the flip side, write operations are generally much slower than their database counterparts.
So, while LDAP would most likely not be an adequate alternative for your general data storage needs, it is a strong choice for auth.
I wouldn't call the database solution for users 'traditional': in fact I would call LDAP traditional in this case. It is much better as a user registry in all sorts of ways, including performance, standardisation, availability of APIs, availability of browser clients, ease of scaling/federation, security, ...

Observing social web behavior: to log or populate databases?

When considering social web app architecture, is it a better approach to document user social patterns in a database or in logs? I thought for sure that behavior, actions, events would be strictly database stored but I noticed that some of the larger social sites out there also track a lot by logging what happens.
Is it good practice to store prominent data about users in a database and since thousands of user actions can be spawned easily, should they be simply logged?
Remember that Facebook, for example, doesn't update users information per se, they just insert your new information and use the most recent one, keeping the old one. If you plan to take this approach is HIGHLY recommended, if not mandatory, to use a NoSQL DB like Cassandra, you'll need speed over integrity.
Information = money. Update = lose information = lose money.
Obviously, it depends on what you want to do with it (and what you mean be "logging").
I'd recommend a flexible database storage. That way you can query it reasonably easily, and also make it flexible to changes later on.
Also, from a privacy point of view, it's appropriate to be able to easily associate items with certain entities so they can be removed, if so requested.
You're making an artificial distinction between "logging" and "database".
Whenever practical, I log to a database, even though this data will effectively be static and never updated. This is because the data analysis is much easier if you can cross-reference the log table with other, non-static data.
Of course, if you have a high volume of things to track, logging to a SQL data table may not be practical, but in that case you should probably be considering some other kind of database for the application.

Selective replication with CouchDB

I'm currently evaluating possible solutions to the follwing problem:
A set of data entries must be synchonized between multiple clients, where each client may only view (or even know about the existence of) a subset of the data.
Each client "owns" some of the elements, and the decision who else can read or modify those elements may only be made by the owner. To complicate this situation even more, each element (and each element revision) must have an unique identifier that is equal for all clients.
While the latter sounds like a perfect task for CouchDB (and a document based data model would fit my needs perfectly), I'm not sure if the authentication/authorization subsystem of CouchDB can handle these requirements: While it should be possible to restict write access using validation functions, there doesn't seem to be a way to authorize read access. All solutions I've found for this problem propose to route all CouchDB requests through a proxy (or an application layer) that handles authorization.
So, the question is: Is it possible to implement an authorization layer that filters requests to the database so that access is granted only to documents that the requesting client has read access to and still use the replication mechanism of CouchDB? Simplified, this would be some kind of "selective replication" where only some of the documents, and not the whole database is replicated.
I would also be thankful for directions to some detailed information about how replication works. The CouchDB wiki and even the "Definite Guide" Book are not too specific about that.
this begs for replication filters. you filter outbound replication based on whatever criteria you impose, and give the owner of the target unrestricted access to their own copy.
i haven't had the opportunity to play with replication filters directly, but the idea would be that each doc would have some information about who has access to it, and the filtering mechanism would then allow outbound replication of only those documents that you have access to. replication from the target back to the master would be unrestricted, allowing for the master to remain a rollup copy, and potentially multicast changes to overlapping sets of data.
What you are after is replication filters. According to Chris Anderson, it is a 0.11 feature.
"The current status is that there is
an API for filtering the _changes
feed. The replicator in 0.10 consumes
the changes feed, so the next step is
getting the replicator to use the
filter API.
There is work in progress on this, so
it should be fully ready to go in
0.11."
See the orginal post
Here is a new link to the some documentation about this:
http://blog.couchbase.com/what%E2%80%99s-new-apache-couchdb-011-%E2%80%94-part-three-new-features-replication
Indeed, as others have said, replication filters are the way to go for this. Here is a link with some information on using them.
One caveat I would add is that at scale replication filters can be extremely slow. More information about this and other nuances about couchdb can be found in this excellent blog post: "what every developer should know about couchdb". For large scale systems performing replication in the application layer has proven faster and more reliable.

What does LDAP solve?

I've been in touch with LDAP in many projects I've been involved in but, the truth be told, I don't really understand it. I thought it was just a person directory but after I discovered that it can contain any objects in a hierarchical structure.
I installed openldap in my box and I found many tutorials regarding just the installation.
What is LDAP? What are the scenarios where LDAP is the right choice? What are the LDAP concepts I should know for working with it? What are the advantages of LDAP? Is it used just because old applications used it? Is there a good doc anywhere on internet explaining all this questions?
UPDATE:
Complementing the answers I found this link which contains a quick start guide for LDAP newbie like me.
What is LDAP? What are the scenarios where LDAP is the right choice?
At its core, LDAP is a protocol for accessing objects that are suitable for storage in a directory. Whether something is "suitable" is an entirely subjective determination that's left up to implementers, but typically this means collections of many objects that each have infrequently (or never) updated data, where each object has an obvious or canonical way to be looked up:
a phone book (look up by name or by phone number)
titles in a library (look up by title, author, etc.)
tenants in a building (look up by floor, suite, name, etc.)
and so on.
Note that LDAP itself is just a protocol and doesn't provide any actual storage -- in much the same way, HTTP doesn't imply anything about whether you're using Apache, Jetty, Tomcat, Mongrel, et al. as a web server. (One problem with LDAP in general is the confusing reuse of names to mean different things. Wikipedia has a good section on this.)
DITs are a hierarchical description scheme that lend themselves to B-Tree algos very nicely, resulting in tremendous search performance in most cases. Directory Server like OpenDS return indexed searches in micro-seconds, whereas RDBMS systems are much slower. Directory Servers (often called LDAP servers) trade resources (RAM, CPU) for fast read response. RDBMS systems provide greater functionality in terms of management of data in question. Need speed with few or zero updates, simplicity, and small network protocol? Use a Directory Server. Need data management and mining capabilities, and/or high rate-of-change of the database with relational aspects defined between data? Use an RDBMS (MySQL is your best bet here).
LDAP has O(1) read performance, in exchange for O(something worse) write performance. It's ideal for data that's accessed frequently, but changed rarely - directories of people, machine names and addresses, and so on. (hence the acronym: Lightweight Directory Access Protocol.)
LDAP is the right choice where the pain of using a database that isn't relational, in terms of decreased developer familiarity and strange performance characteristics, is less than the gain of blindingly fast read access.
This link will explain LDAP http://blogs.oracle.com/raghuvir/entry/ldap
We use LDAP in our office for email address lookups company wide. We use it as a single source sign on service for our internal apps as well.
One perspective I like to harp on is LDAP is an app on top of a persistence store and a database is a persistence store. Both can be used to store user information.
LDAP gives you a hierarchy which is harder to do in a database. You can make a hierarchy in a database but it's harder to do things like delegation (these rows belong to you only) or ACLs on rows. So pushing security problems out of the database is easier if you use LDAP for storing user identities. Trying to solve it in the database is weird.
At the same time, LDAP is terrible for reporting against (transform LDAP to a DB for reporting). Storing attributes deep in the tree that need to be searched quickly can be problematic for performance (don't do this, have a DB on the side or try to flatten the query by redesigning your DIT). Storing attributes all over the place in a really deep DIT is just bad LDAP or system design but sometimes it's unavoidable if you're tied to a vendor product or legacy app.
LDAP is just a protocol, the wikipedia article explains it adequately http://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol
Its a way to query an underlying organizational structure like Microsoft's Active Directory. You can use LDAP queries to get all kinds of information about users, use it for setting application rights, etc.
I am working part time and a full time student. My curriculum encourages (read requires) many group projects.
I have used openLdap and phpLdapAdmin to control access to my Subversion and Mercurial repos, Trac projects, Hudson, etc. It wasn't easy to install, but the time saved in administration was a God send.
If you have projects where you will have many groups of people who need to be able to use different resources, it is a good tool.
See this link :
http://www.umich.edu/~dirsvcs/ldap/doc/guides/slapd/1.html#RTFToC1
Which explains deeply LDAP :
For example you can see this image in that documentation ,
(source: dirsvcs at www.umich.edu)
LDAP is an access protocol; it only provides an API to the underlying technology for which you are trying to find applications - a directory service. OpenLDAP is one of the open source directory services; Sun has another implementation called OpenDS. Active Directory and Novell NDS are another two commonly seen in the field.
The directory can be used for storing information about any sort of resource, and the relationships between the resources - for example, rights of a user to a directory, a printer, or a network access device.
Is there a good doc anywhere on internet explaining all this questions?
IBM published an excellent Red Book about LDAP. The title is:
Understanding LDAP - Design and Implementation.
It can be downloaded from the previous link.
In one of my old workplaces we used LDAP as our primary user authentication system.
This in turn provided our various systems with information which dept. they belonged to, where they should mount their home directories, contact information, employee management.
Not necessarily controlled by LDAP, but other things that we had mixed to work through LDAP was the existence of SQL users, K4, samba and email account generation.