Long time lurker, first time poster, please bear with me.
I'm trying to set up a sharded, secure MongoDB environment. I would like to make use of MongoDB's autosharding capability, since I'm fairly new to databases and on a tight schedule.
It seems that autosharding only applies to individual collections (tables), but I don't want users to have access to the entire collection. Further, MongoDB only allows authentication at the database level, so once authenticated, a user can see 1) every collection in the db and 2) all data within each collection. So, as far as I can tell, I can either have autosharding and no authentication, or manual sharding and authentication.
I would like the best of both worlds, that is: autosharding and authentication. Is this possible? If not, how should I go about manual sharding in MongoDB?
A simplified use case of this system: collection 'Users' has data on every user. I want to authenticate user X so that X can only see X's data in the Users collection. Users is distributed across multiple servers, partitioned (sharded) by user_name.
MongoDB doesn't have authentication like traditional SQL databases. In fact, if you read the manual, it's recommended that you run it in a secured environment instead of relying on authentication. Any access control to your data would be implemented within your application.
Even with traditional SQL, access isn't controlled by row. That's usually something implemented at the application level, based on some sort of key within the data.
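As a rough illustration of application-level access control keyed on the data, here is a minimal sketch. The in-memory `USERS` list and the `find_own_records` helper are hypothetical stand-ins for the 'Users' collection and the application's data layer, not part of any MongoDB API.

```python
# Hypothetical in-memory stand-in for the 'Users' collection.
USERS = [
    {"user_name": "alice", "email": "alice@example.com"},
    {"user_name": "bob", "email": "bob@example.com"},
]


def find_own_records(authenticated_user, collection):
    """Return only the documents owned by the authenticated user."""
    # The application, not the database, applies the ownership filter.
    return [doc for doc in collection if doc["user_name"] == authenticated_user]


own = find_own_records("alice", USERS)
print(own)
```

With a real driver, the same idea would be a query filtered on user_name, with the value taken from the authenticated session and never from user input. Because user_name is also the shard key in the question, such a query can be routed to the shard that holds that user's data.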
Background:
Our team is building an in-house intranet web application. We are using a standard three-layer approach: presentation layer (MVC web app), business layer and data access layer.
A SQL database is used for persistence.
The web app / IIS handles user authentication (Windows authentication). Logging is done in the business and data access layers.
Question: service account vs user-specific SQL accounts:
Use service / app account:
The dev team is proposing to set up a service account (created for the application only). This service account needs read and write access to the db.
Vs
Pass on user credentials to SQL
IT ops is saying that using a service account (specifically created for the app only) for db access is not deemed best practice. Instead, they propose setting up Kerberos delegation from the web server to the SQL server so that the Windows credentials of the end users are passed on, and creating a database role that grants the appropriate data access levels for end users.
What is the best practice for setting up accounts in SQL when all requests to the db will come through the front-end client (i.e. via the business layer and then the data layer)?
The Best Practice here is to let the person/team responsible for the database make the decision. It sounds like the dev team wants to forward (or impersonate) some credentials to the DB which I know that some small teams like doing, but yes that can leave things a bit too open. The app can do whatever it likes to the database, which is not much of a separation if you're into that kind of thing.
Personally, if I understand what you're saying above, I do more of what the IT team is thinking about (I use Postgres). In other words my app deploys over SSH using a given account (let's say it's the AppName account). That means I need to have my SSH keys lined up for secure deployment (using a PEM or known_keys or whatever).
In the home directory for AppName I have a file called .pgpass which has pretty specific security on it (0600). This means that my AppName account picks up its credentials from that local file rather than from a username/password kept in the application.
I do this because otherwise I'd need to store that information in a file somewhere in the project, and those files tend to get treated poorly (pushed to GitHub, for instance).
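For reference, a .pgpass file holds one line per connection in the form hostname:port:database:username:password, and PostgreSQL client libraries ignore it unless its permissions are 0600 or stricter. The sketch below writes such an entry to a temporary directory. The `write_pgpass` helper and all the values passed to it are made up for illustration.

```python
import os
import stat
import tempfile


def write_pgpass(path, host, port, database, user, password):
    """Write one .pgpass entry and restrict the file to its owner (0600)."""
    # Note: a literal ':' or '\' inside a field would need a backslash escape;
    # this sketch assumes the example values contain neither.
    line = f"{host}:{port}:{database}:{user}:{password}\n"
    # Create the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(line)
    # Enforce 0600 even if the file already existed with looser permissions.
    os.chmod(path, 0o600)
    return line


# Demo against a temporary directory instead of the real home directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, ".pgpass")
entry = write_pgpass(demo_path, "localhost", 5432, "appdb", "appname", "example-secret")
mode = stat.S_IMODE(os.stat(demo_path).st_mode)
print(entry.strip(), oct(mode))
```

The point of the design is that the secret lives in the deployment account's home directory, outside the repository, so it cannot be committed by accident.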
Ultimately, think 5 years from now and what your project and team will look like. Be optimistic - maybe it will be a smashing success! What will maintenance look like? What kinds of mistakes will your team make? Flexibility now is nice, but make sure that whoever will get in trouble if your database has a security problem is the one who gets to make the decision.
The best practice is to have individual accounts. This allows you to use database facilities for identifying who is accessing the database.
This is particularly important if the data is being modified. You can log who is modifying what data -- generally a hard-core requirement in any system where users have this ability.
You may find that, for some reason, you do not want to use your database's built-in authentication mechanisms. In that case, you are probably going to build a layer on top of the database, replicating much of the built-in functionality. There are situations where this might be necessary. In general, this would be a dangerous approach (the database security mechanisms probably undergo much more testing than bespoke code).
Finally, if you are building an in-house application with just a handful of users who have read-only access to the database, it might be simpler to have only a single login account. Normally, you would still like to know who is doing what, but for simplicity, you might forego that functionality. However, knowing who is doing what is usually very useful knowledge for maintaining and enhancing the application.
I'm trying to get some advice on how to approach a security architecture on Azure.
Background:
We are looking at building a multi-tenant app on Azure that needs to be extremely secure (personally sensitive data). The app will be accessed by standard browsers and mobile devices.
Security access types:
We have three types of users / access types...
1 - plain old user/password over https is fine, accessing both general, non-private SQL data plus hosted files
2 - user/pass over https, but need authentication of users via certificates that will be installed on user machines/devices. This level of user will need access to sensitive data which should be encrypted at rest both in database, and also any uploaded files.
3 - same as (2) but with the addition of some two factor authentication (we have used YubiKey for other things - might look towards a phone OTP offering as well)
Most users will only have access to their own tenant databases, however we have "account manager" type users that need access to selected tenant data, therefore we expect that they will need either a copy of one certificate per tenant they serve, or we will have to use some kind of master certificate.
Database type:
From a multi-tenant point of view it seems Azure Federated SQL is a good way to go because (a) we simply write one app with a "TenantID" key in each table and, after login, set a global filter that handles the isolation for us, and (b) we understand that Azure federated SQL actually maintains separate SQL database instances per tenant in the background. (Ref: http://msmvps.com/blogs/nunogodinho/archive/2012/08/11/tips-amp-tricks-to-build-multi-tenant-databases-with-sql-databases.aspx)
Can anyone point to any links or give advice on the approach needed to set up and manage file shares, encryption of SQL and file data at rest, authentication of users etc. (automated management on new user signup preferred)?
I can't really help on the certificates, but you will indeed need some "master certificate". If you are planning on using Azure website, you can't use your own certificates currently.
Concerning the database setup: SaaS applications are built on trust, so you NEVER (EVER) want to show one user's data to other users, or let them edit it.
Therefore I strongly suggest that you don't rely on a TenantID in each table. That would still leave open the possibility of an attack by a malicious user or an error by some developer.
The only ways to get around these risks are:
extensive testing
physically separate tables to store each tenant's data.
Personally I believe that even with very extensive+automated testing you can't have 100% code coverage against malicious users. I guess I am not alone.
The only way out IMHO is physically separate tables. Let's look at the options:
different server: valid, but pretty expensive in Azure
different database: valid, less management overhead but same objection as the previous option: expensive if you have a lot of tenants
different schemas: the solution. Think about it...
you only have to manage users and their default schemas
you can back up schemas using PowerShell
you can move schemas to other databases with some work
you can still dig into SQL federation if you need to
the major drawback is that you will need to support database upgrades for each tenant.
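To make the schema-per-tenant idea and its upgrade cost concrete, here is a small sketch. The tenant names, the `TENANT_SCHEMAS` mapping and both helpers are hypothetical. Explicit schema-qualified names are used here only for illustration, whereas the approach above relies on each database user's default schema.

```python
import re

# Hypothetical mapping from tenant to its dedicated schema.
TENANT_SCHEMAS = {"acme": "tenant_acme", "globex": "tenant_globex"}

# Only plain identifiers are accepted, since names are spliced into SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_table(tenant, table):
    """Return 'schema.table' for a tenant, refusing unknown tenants or odd names."""
    schema = TENANT_SCHEMAS[tenant]  # raises KeyError for an unknown tenant
    for name in (schema, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"{schema}.{table}"


def upgrade_statements(ddl_template):
    """Expand one upgrade statement into one statement per tenant schema."""
    return [ddl_template.format(schema=schema)
            for schema in sorted(TENANT_SCHEMAS.values())]


print(qualified_table("acme", "Orders"))
for statement in upgrade_statements("ALTER TABLE {schema}.Orders ADD Notes NVARCHAR(200);"):
    print(statement)
```

Every schema change has to be applied once per tenant, which is the maintenance cost named in the last point of the list.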
Have you read any of the articles about multi-tenancy on azure.com? http://msdn.microsoft.com/en-us/library/windowsazure/hh689716.aspx
I know that an LDAP server (or directory service, or directory) stores information (mostly user information) in an object-oriented database.
Is it just a "user store" that can be used, via the LDAP API or an "LDAP configuration in the server", for user authentication and to get user information?
Does LDAP itself provide any functionality other than storing user information, such as security configuration or policy configuration?
How bad will performance be if a relational database (say Oracle) is used to store user information?
Thanks.
Actually, newer versions of OpenLDAP store their configuration inside the directory itself; the classic text configuration file is deprecated, if not removed already. This feature is called cn=config in OpenLDAP [ http://www.openldap.org/doc/admin24/slapdconf2.html#cn=config ]. The thing you're probably thinking about is dynamic ACI (not to be confused with ACL, which is also provided), and sure, LDAP in general provides a lot of functionality like that. There are also monitor backends; in general LDAP likes to describe itself and is heading in a self-managed direction. However, its purpose is quite different from that of an RDBMS: it's optimized for search operations, not for manipulating data and doing computations on it. Think of it this way: user information or DNS information is retrieved enormously more often than it is modified, and that's the field in which LDAP rocks. You rarely need to sum UserIDs, do you? :) Object-oriented database means that, in contrast to an RDBMS, data is organized in a way closer to OO types (classes, attributes, inheritance etc.). There are also SQL backends for LDAP (I don't know what sense that makes, though), but I haven't heard of LDAP backends for SQL databases.
Have a look at the OpenLDAP Administration Guide here
http://www.openldap.org/doc/admin24/
Regarding storing custom information, you can create your own classes, objects and even attribute types, by inheritance/composing existing entities, or from scratch. Sky is the limit, man ;-)
An LDAP directory server stores data in attributes which are grouped in entries. Which attributes are required or allowed in an entry is defined by an attribute called an objectClass. Each attribute type has an attribute definition in a schema. The attribute type definition has a syntax which defines what sort of data is allowed, possibly a matching rule and/or ordering rule defining how attribute values are compared, and other data describing the attribute. Any sort of data can be stored in a directory server database, including binary data. Most often a directory server is used for authentication and profile information. Legacy directory servers like OpenLDAP don't perform as well on updates (ADD, MOD, DELETE, MODRDN) as on authentication or searches, but more modern servers perform updates at a very high rate.
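The relationship between entries, attributes and objectClasses can be sketched in a few lines. This is a deliberately simplified model, not how any directory server is implemented: the `SCHEMA` dictionary and the `validate_entry` helper are hypothetical, and only a subset of the standard `person` class is modelled.

```python
# Hypothetical, simplified schema: each objectClass lists its required (MUST)
# and optional (MAY) attribute types. Syntaxes and matching rules are omitted.
SCHEMA = {
    "person": {"must": {"cn", "sn"}, "may": {"telephoneNumber", "description"}},
}


def validate_entry(entry):
    """Return a list of schema violations for an entry (empty if it is valid)."""
    problems = []
    must, may = set(), set()
    for object_class in entry.get("objectClass", []):
        definition = SCHEMA.get(object_class)
        if definition is None:
            problems.append(f"unknown objectClass: {object_class}")
            continue
        must |= definition["must"]
        may |= definition["may"]
    present = set(entry) - {"objectClass"}
    for name in sorted(must - present):
        problems.append(f"missing required attribute: {name}")
    for name in sorted(present - must - may):
        problems.append(f"attribute not allowed: {name}")
    return problems


valid = {"objectClass": ["person"], "cn": ["Ann Lee"], "sn": ["Lee"]}
invalid = {"objectClass": ["person"], "cn": ["Ann Lee"], "mail": ["ann@example.com"]}
print(validate_entry(valid))
print(validate_entry(invalid))
```

Attribute values are lists because an LDAP attribute can hold several values.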
How can I handle read authentication in CouchDB? I know roles can be defined in separate databases, but I want to implement read authentication at the document level. I am thinking about using Node.js, but it does not seem an elegant solution because CouchDB already has an HTTP server and I don't want to add one more (or another application server in, say, Ruby or Python). Is anyone working on this?
Thanks.
In the recent O'Reilly web cast on CouchDB, J. Chris Anderson mentioned that read authentication was best handled by a combination of partial replication and multiple databases per reader group. Each database would contain only the documents pertaining to that specific group.
It makes the most sense when you think of each reader's CouchDB as a filtered instance of an authority database.
That's basically the correct answer. What I'd add is that document-level read control is hard to get right, especially in the presence of views. Filtering map rows at read-time is doable, but not very IO efficient. Generating reduction values based on filtered map rows, however, is prohibitively expensive.
For those reasons we encourage you to operate something like a database per access group, and make the entire database readable by all users.
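The database-per-access-group layout can be simulated without a CouchDB server. In the sketch below the `readers` field, the group names and the `replicate_by_group` helper are all hypothetical. In a real deployment the copying would be done by filtered replication from the authority database, not by application code like this.

```python
from collections import defaultdict

# Hypothetical authority database: each document names the groups allowed to read it.
AUTHORITY_DB = [
    {"_id": "doc1", "readers": ["sales"], "body": "Q1 forecast"},
    {"_id": "doc2", "readers": ["sales", "support"], "body": "Customer list"},
    {"_id": "doc3", "readers": ["support"], "body": "Ticket backlog"},
]


def replicate_by_group(authority_docs):
    """Build one database (here a dict keyed by _id) per reader group."""
    group_dbs = defaultdict(dict)
    for doc in authority_docs:
        for group in doc["readers"]:
            # Plays the role of a replication filter that only lets
            # this group's documents through.
            group_dbs[group][doc["_id"]] = doc
    return dict(group_dbs)


group_dbs = replicate_by_group(AUTHORITY_DB)
for group, docs in sorted(group_dbs.items()):
    print(group, sorted(docs))
```

Each resulting database can then be made readable in full by its group, so views and reductions never need per-document filtering.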
I'm a lead developer on a project which is building web applications for my company's SaaS offering. We are currently using LDAP to store user data such as IDs, passwords, contact details, preferences and other user-specific data.
One of the applications we are building is a reporting service that will both collect and present management information to our end users. Obviously this service will require an RDBMS, but it will also need to access user data stored in LDAP.
As I see it we have two basic implementation options:
Duplicate user data in both LDAP and the RDBMS.
Have the reporting service access LDAP whenever it needs user data.
Although duplicating data (and implementing the mechanisms to make this happen) as suggested in option 1 seems the wrong way to go, my gut feeling is that option 2 would not perform well enough (how do you 'join' LDAP data to RDBMS data as efficiently as a pure RDBMS implementation?).
I did find a related question but I'm still unsure which approach to take. I'd be interested in seeing what people thought of either option or perhaps other options.
Why would you feel that duplicating data would be the wrong way to go? Reporting tools (web based and otherwise) are mostly built around RDBMS's, so any mix'n'match will introduce unnecessary complexities. Reports are likely to need to be changed fairly frequently (from experience), so you want them to be as simple as possible. The data you store about users is unlikely to change its format very often, so once you have your import function working, you won't need to touch it again.
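A minimal sketch of such an import, using SQLite from the standard library: the `LDAP_USERS` records stand in for entries already read from the directory, and the table names, the `logins` data and the `import_users` helper are all made up for illustration.

```python
import sqlite3

# Hypothetical records standing in for entries already read from LDAP.
LDAP_USERS = [
    {"uid": "u1", "cn": "Ann Lee", "mail": "ann@example.com"},
    {"uid": "u2", "cn": "Bo Chen", "mail": "bo@example.com"},
]


def import_users(conn, users):
    """Upsert directory users into the reporting database's local copy."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (uid TEXT PRIMARY KEY, cn TEXT, mail TEXT)"
    )
    # INSERT OR REPLACE keeps the import idempotent when it is re-run.
    conn.executemany(
        "INSERT OR REPLACE INTO users (uid, cn, mail) VALUES (:uid, :cn, :mail)",
        users,
    )


conn = sqlite3.connect(":memory:")
import_users(conn, LDAP_USERS)

# Made-up reporting data, joined against the copied user table with plain SQL.
conn.execute("CREATE TABLE logins (uid TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [("u1", "2024-01-01"), ("u1", "2024-01-02"), ("u2", "2024-01-01")],
)
report = conn.execute(
    "SELECT u.cn, COUNT(*) FROM logins l JOIN users u ON u.uid = l.uid "
    "GROUP BY u.uid, u.cn ORDER BY u.cn"
).fetchall()
print(report)
```

Once the copy exists, reports join against it like any other table, and re-running the import only refreshes the rows.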
The only obstacle I can see is latency: how do you ensure that your RDBMS copy is up to date? You might need to ensure that your updating code writes to both destinations. Personally, also, I wouldn't necessarily use LDAP for application specific personal preferences: LDAP can't handle transactions, so what happens when data is updated from several directions? (Transactionality is of course also a problem with letting updaters write to both stores...) I'd rather let the RDBMS be the master for most data, and let LDAP worry only about identity, credentials and entitlements, which are rarely changed and only for one set of purposes. For myself, LDAP's ability to deal with hierarchical data isn't all that great a selling point.
Data duplication is not always a bad thing, especially when the usage scenarios are different enough.