XACML or DB approach?

I went through the XACML documentation, and it explains how to maintain authorization policies in an XML file. The same can be done by keeping the policies in a database. My question is: what is the advantage of storing policies in an XML file, as XACML does, over the DB approach? At the end of the day it's just parsing an XML file or querying a database.

@user3405607: If I understand you correctly, you are questioning the need for the "complicated" XACML standard/spec when a database evaluation engine would do the "same" job?
If so, the answer is that a DB-based engine can only provide access control decisions for simple rules, mostly ACL-related ones. For example, if you have resources X, Y, Z and users A, B, C, you could design a simple table like:
+---+---+---+---+
|   | X | Y | Z |
+---+---+---+---+
| A | 1 | 0 | 1 |
| B | 1 | 0 | 1 |
| C | 0 | 1 | 1 |
+---+---+---+---+
But as you can see, this will not scale. Of course, you can then build role-based ACLs rather than a direct user -> resource mapping, but again this will only cater to simple rules.
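To make the comparison concrete, here is a minimal Python sketch of what such a DB-backed check amounts to; the hard-coded matrix stands in for the ACL table above, and the function name is purely illustrative.

# Toy ACL check mirroring the table above: a pure user -> resource lookup.
ACL = {
    "A": {"X": True, "Y": False, "Z": True},
    "B": {"X": True, "Y": False, "Z": True},
    "C": {"X": False, "Y": True, "Z": True},
}

def is_permitted(user, resource):
    """Return True if the ACL grants `user` access to `resource`."""
    return ACL.get(user, {}).get(resource, False)

print(is_permitted("A", "X"))  # True
print(is_permitted("C", "X"))  # False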
How would you handle a simple rule like "A user in the finance department can approve an order if he is not the one who raised the order and if the order amount is less than his maximum approval limit", assuming the department is captured in Active Directory?
Of course, if all the details needed by the rule (department, order issuer, amount, maximum amount) are in a DB, you could consider writing complex SQL queries to do the job for you. But then again, the policies containing the rules can only get more complicated, and soon you will end up with complex policies that turn into a complex decision tree for which writing DB queries is not worth the hassle.
Also, you will soon find that you have to write a good-sized piece of code to run all these queries and parse the responses, and that code will in fact be the entity called a PDP (Policy Decision Point) in XACML literature.
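As an illustration of how quickly this grows, here is what even that single rule looks like when hand-rolled in Python; the attribute names and data shapes are assumptions for the example, not part of any standard.

from dataclasses import dataclass

@dataclass
class User:
    name: str
    department: str        # e.g. looked up in Active Directory
    approval_limit: float  # maximum amount this user may approve

@dataclass
class Order:
    order_id: str
    raised_by: str
    amount: float

def can_approve(user, order):
    """Hand-rolled rule: a finance user may approve an order he did not
    raise, provided the amount is below his approval limit."""
    return (
        user.department == "finance"
        and user.name != order.raised_by
        and order.amount < user.approval_limit
    )

alice = User("alice", "finance", approval_limit=10000.0)
order = Order("PO-17", raised_by="bob", amount=2500.0)
print(can_approve(alice, order))  # True

Multiply this by dozens of rules, combining algorithms, and attribute sources, and you have effectively rewritten a PDP.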
The need for XACML also goes beyond that, since it defines a standards-based policy language as well as a request-response protocol.
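For a feel of that request-response protocol, here is a rough sketch of an XACML 3.0 JSON Profile request posted to a PDP over HTTP; the PDP URL is hypothetical, and a real deployment would use full URN attribute identifiers rather than these shorthand names.

import requests  # third-party: pip install requests

# Shorthand JSON Profile request: "May alice approve order PO-17?"
xacml_request = {
    "Request": {
        "AccessSubject": {"Attribute": [
            {"AttributeId": "subject-id", "Value": "alice"},
            {"AttributeId": "department", "Value": "finance"},
        ]},
        "Resource": {"Attribute": [
            {"AttributeId": "resource-id", "Value": "PO-17"},
        ]},
        "Action": {"Attribute": [
            {"AttributeId": "action-id", "Value": "approve"},
        ]},
    }
}

# Hypothetical PDP endpoint; each vendor exposes its own URL.
resp = requests.post("https://pdp.example.com/authorize",
                     json=xacml_request, timeout=5)
print(resp.json()["Response"][0]["Decision"])  # "Permit", "Deny", ...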
I would suggest reading up on some of the basic material on this matter, since my explanation may not do justice to the complexity involved and the need for a dedicated evaluation engine that is not solely reliant on DB queries.

The XACML policy repository can be anything: a database, a file system, or any registry. But if you think about clustering, security, manageability, and performance, I guess the database approach is good. Let me give a few reasons for it:
If multiple PDP nodes (on different machines) are running in a clustered environment, it is easy to point all nodes at the same database. Policies can then be managed by the master node, and we do not need to worry about distributing policies across the other nodes.
An XACML policy can have metadata associated with it; policy order is one such piece of metadata. With a file-based approach we would need to manage this separately, but with a database it is just another column in the policy table. In a practical PDP there can be other associated metadata as well, such as a policy enable/disable flag, the policy updater, the last update time, and so on.
If you are not using policy caching, I guess it is better to use the database approach; it may be faster, and some databases provide caching themselves. If you have metadata, it can also be retrieved with a single SQL query.
However, I do not think a practical PDP would read and load policies for each request. Once the PDP initializes, policies are loaded into its caches, and in that case there would be no performance issue with the file-based approach. But if the caches expire frequently, policies will be reloaded most of the time, so it is still better to go with the database approach.
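A minimal sketch of that setup, using SQLite for brevity: the policy table carries the metadata columns mentioned above, and the PDP loads everything into an in-memory cache once at startup. Table and column names are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE policy (
        policy_id    TEXT PRIMARY KEY,
        policy_xml   TEXT NOT NULL,   -- the XACML document itself
        policy_order INTEGER,         -- evaluation-order metadata
        enabled      INTEGER,         -- enable/disable flag
        updated_by   TEXT,
        updated_at   TEXT
    )
""")
conn.execute("INSERT INTO policy VALUES (?, ?, ?, ?, ?, ?)",
             ("p1", "<Policy .../>", 1, 1, "admin", "2015-01-01"))

# Load enabled policies once at PDP startup, ordered by their metadata,
# instead of hitting the database on every authorization request.
policy_cache = conn.execute(
    "SELECT policy_id, policy_xml FROM policy "
    "WHERE enabled = 1 ORDER BY policy_order"
).fetchall()
print(policy_cache)  # [('p1', '<Policy .../>')]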

Related

S3 Integration with Snowflake: Best way to implement multi-tenancy?

My team is planning on building a data processing pipeline that will involve S3 integration with Snowflake. This article from Snowflake shows that an AWS IAM role must be created in order for Snowflake to access data in S3.
However, in our pipeline, we need to ensure multi-tenancy and data isolation between users. For example, let's assume that Alice and Bob have files in S3 under "s3://bucket-alice/file_a.csv" and "s3://bucket-bob/file_b.csv" respectively. Then we want to make sure that, when staging Alice's data onto Snowflake, Alice can only access "s3://bucket-alice" and nothing under "s3://bucket-bob". This means that individual AWS IAM roles must be created for each user.
I do realize that Snowflake has its own access control system, but my team wants to make sure that data isolation is fully achieved in the S3-to-Snowflake stage of the pipeline, rather than relying only on Snowflake's access control.
We are worried that this will not be scalable, as AWS sets a limit of 5000 IAM users, and that will not be enough as we scale our product. Is this the only way of ensuring data multi-tenancy, and does anyone have a real-world application example of something like this?
Have you explored leveraging Snowflake's internal stages instead? By default, every user gets their own internal stage that only they have permission to use from within Snowflake, and it has NO access from outside of Snowflake. Snowflake offers the ability to move data in and out of that internal stage using just about every driver/connector that Snowflake has available. That said, any pipeline/workflow being leveraged by 5000+ users would be able to use these connectors to load data into the Snowflake internal stage (backed by S3) without the need for any additional AWS IAM users. Would that be a sufficient solution for your situation?
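For illustration, here is roughly what loading through a user's internal stage looks like with the Snowflake Python connector; the connection parameters, table name, and file path are placeholders (@~ refers to the current user's internal stage).

import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account",   # placeholder credentials
    user="alice",
    password="...",
    warehouse="my_wh",
    database="my_db",
    schema="public",
)
cur = conn.cursor()

# Upload a local file into Alice's own internal (user) stage.
# Only Alice can read @~, and no AWS IAM user or role is involved.
cur.execute("PUT file:///tmp/file_a.csv @~ AUTO_COMPRESS=TRUE")

# Copy from the internal stage into a table.
cur.execute("""
    COPY INTO my_table
    FROM @~/file_a.csv.gz
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
cur.close()
conn.close()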

How to represent ramifications and conditions using BPMN?

I must represent a process using BPMN 2.0 with conditions that generate branches exponentially, and I can't see a way to represent it graphically within the diagram without making it grow that much.
Here is a screenshot of the problem; the diagram tends to grow even more.
The option for the company is the same for all 3 companies.
The type of employee is the same for all 3.
The configurations are the same for all 3 companies.
The configuration for the type of employee is the same for all 3 companies.
BUT the last configurations are specific to each company, and I define them for each employee.
Is there a way I could simplify this mess?
EDIT: the result became very simple.
I think you can use an inclusive gateway here. Join the flows at the "type of employee" location, go through your configuration step, and then fork the path again based on the company and the config, using an inclusive gateway again. An inclusive gateway makes sure you only wait for the process paths that were actually executed.
This is supported by the BPMN engines in WSO2 EI and Activiti; I'm not sure about other engines.

With ABAC/XACML how do you protect resources in reports/large result sets?

How have folks used an ABAC approach when running reports, or even just when selecting multiple records from a DB?
For instance, if you have a policy that states:
Doctors can only view patients in their hospital
Obviously the efficient way to implement this is to include a filter in your query (WHERE hospital = XXX), but this seems to break the principle of ABAC, as it bakes the rule into the SQL itself.
I know Axiomatics offers a reverse query mechanism that apparently generates filters for you based on rules, but my system has a lot of complex SQL that would have to be refactored quite a bit to work with this.
How have other folks handled this problem?
There are essentially three ways to address this:
Via a reverse query mechanism, as you alluded to. This is indeed only supported by Axiomatics at the moment. The idea behind a reverse query is that instead of specifying a full-blown question, e.g. "Can Alice view document #123?", you specify an open-ended question, e.g. "Which documents can Alice view?" (see the first sketch after this list).
Via the Multiple Decision Profile (MDP) of XACML 3.0, which allows you to ask multiple questions in one go, e.g. "Can Alice view Doc #1, #2, #3?". The MDP is practical for hundreds of items at most; you could combine it with a pagination strategy.
Via the use of obligations. You could write a policy that says that, as a whole, a doctor has the right to view medical records, plus an obligation to execute a filtering SQL statement (see the second sketch below). The issue with this approach is that it puts authorization semantics inside the obligation rather than inside the policy. Also, what happens if multiple obligations are triggered?
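First, a hedged sketch of the reverse-query idea: instead of a boolean decision, the engine hands back a residual condition, which the application turns into a WHERE clause. The function below is invented for illustration and is not the Axiomatics API.

# Illustration only: a reverse query returns a *condition* over resource
# attributes rather than a plain Permit/Deny.
def reverse_query(subject_attrs, action, resource_type):
    """Pretend engine: 'Which patients can this doctor view?' comes back
    as a residual condition for the application to apply."""
    if subject_attrs.get("role") == "doctor" and action == "view":
        return "hospital = :subject_hospital"
    return "1 = 0"  # nothing permitted

condition = reverse_query({"role": "doctor", "hospital": "H1"},
                          "view", "patient")
print("SELECT * FROM patients WHERE " + condition)

Second, a sketch of the obligation route, with a made-up PDP response shape: the PDP returns Permit plus an obligation carrying a filter, and the PEP appends it to the query.

# Invented response shape, illustrating the obligation pattern only.
pdp_response = {
    "decision": "Permit",
    "obligations": [
        {"id": "apply-filter", "filter": "hospital = :doctor_hospital"},
    ],
}

query = "SELECT * FROM patients"
if pdp_response["decision"] == "Permit":
    # The PEP must understand and honor every obligation it receives;
    # this is where authorization semantics leak out of the policy.
    filters = [o["filter"] for o in pdp_response["obligations"]
               if o["id"] == "apply-filter"]
    if filters:
        query += " WHERE " + " AND ".join(filters)
print(query)  # SELECT * FROM patients WHERE hospital = :doctor_hospital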

Selective replication with CouchDB

I'm currently evaluating possible solutions to the following problem:
A set of data entries must be synchronized between multiple clients, where each client may only view (or even know about the existence of) a subset of the data.
Each client "owns" some of the elements, and the decision about who else can read or modify those elements may only be made by the owner. To complicate this situation even more, each element (and each element revision) must have a unique identifier that is the same for all clients.
While the latter sounds like a perfect task for CouchDB (and a document-based data model would fit my needs perfectly), I'm not sure if the authentication/authorization subsystem of CouchDB can handle these requirements: while it should be possible to restrict write access using validation functions, there doesn't seem to be a way to authorize read access. All solutions I've found for this problem propose routing all CouchDB requests through a proxy (or an application layer) that handles authorization.
So, the question is: is it possible to implement an authorization layer that filters requests to the database so that access is granted only to documents that the requesting client has read access to, and still use the replication mechanism of CouchDB? Simplified, this would be some kind of "selective replication" where only some of the documents, and not the whole database, are replicated.
I would also be thankful for pointers to some detailed information about how replication works. The CouchDB wiki and even the "Definitive Guide" book are not too specific about that.
This begs for replication filters. You filter outbound replication based on whatever criteria you impose, and give the owner of the target unrestricted access to their own copy.
I haven't had the opportunity to play with replication filters directly, but the idea would be that each doc would carry some information about who has access to it, and the filtering mechanism would then allow outbound replication of only those documents that you have access to. Replication from the target back to the master would be unrestricted, allowing the master to remain a rollup copy and potentially multicast changes to overlapping sets of data.
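A minimal sketch against CouchDB's HTTP API (using Python's requests library): store a JavaScript filter function in a design document, then trigger a filtered replication through _replicate. The database names and the per-document "readers" convention are assumptions for the example.

import requests  # pip install requests

COUCH = "http://localhost:5984"  # assumed local CouchDB instance

# Design document with a filter: only pass docs the given user can read.
design_doc = {
    "_id": "_design/auth",
    "filters": {
        "by_reader": """
            function(doc, req) {
                // assumes each doc carries a 'readers' array
                return doc.readers &&
                       doc.readers.indexOf(req.query.user) !== -1;
            }
        """,
    },
}
requests.put(COUCH + "/master/_design/auth", json=design_doc)

# Filtered outbound replication: only Alice's readable docs reach her copy.
requests.post(COUCH + "/_replicate", json={
    "source": "master",
    "target": COUCH + "/alice_copy",
    "filter": "auth/by_reader",
    "query_params": {"user": "alice"},
})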
What you are after is replication filters. According to Chris Anderson, it is a 0.11 feature.
"The current status is that there is
an API for filtering the _changes
feed. The replicator in 0.10 consumes
the changes feed, so the next step is
getting the replicator to use the
filter API.
There is work in progress on this, so
it should be fully ready to go in
0.11."
See the original post.
Here is a newer link to some documentation about this:
http://blog.couchbase.com/what%E2%80%99s-new-apache-couchdb-011-%E2%80%94-part-three-new-features-replication
Indeed, as others have said, replication filters are the way to go for this. Here is a link with some information on using them.
One caveat I would add is that, at scale, replication filters can be extremely slow. More information about this and other nuances of CouchDB can be found in this excellent blog post: "What every developer should know about CouchDB". For large-scale systems, performing replication in the application layer has proven faster and more reliable.

Multi-tenancy with SQL/WCF/Silverlight

We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi-tenancy before, I'm looking for opinions on how to achieve:
Easy upgrades
Data security
Scalability
Three different models to consider are listed on MSDN:
Separate databases. This is not easy to maintain, as all schema changes have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!).
Shared Database, Shared Schema. A TenantID column is added to each table. Ensuring that each customer gets only the correct data is the potentially dangerous part. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables within a single database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
It depends on the type of application and the scale of the data. Each option has drawbacks.
1a) Separate databases + a single instance of WCF/client. Keeping everything in sync will be a challenge: how do you upgrade X database servers at the same time, and what if one fails and is now out of sync and incompatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically the same issues as 1a/b, except for licensing.
2) The best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is data scalability if your application is data-intensive: for example, if every customer is expected to have tens or hundreds of millions of rows, you will start to run into query-performance problems for individual customers due to the total volume across the customer base. Clients are more forgiving of slowdowns caused by their own data volume; being told it's slow because the other 99 clients' data is large is generally a no-go.
Unless you know for a fact that you will be dealing with huge data volumes from the start, I would probably go with #2 for now and begin looking at clustering, or at moving to a 1a/b setup, if needed in the future.
We also have a SaaS product, and we use solution #2 (shared DB/shared schema with a TenantId). Some things to consider for a shared DB with the same schema for all:
As mentioned above, a high volume of data for one tenant may affect the performance of the other tenants if you're not careful. For starters, index your tables properly/carefully and never, ever run queries that force a table scan. Monitor query performance, and at least plan/design to be able to partition your DB later on, based on some criteria that makes sense for your domain.
Data separation is very, very important; you don't want to end up showing one tenant a piece of data that belongs to another tenant. Every query must have a WHERE TenantId = ... clause in it, and you should be able to verify/enforce this during development (see the sketch after this list).
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can work around it by designing a way to extend the fields associated with the documents/tables in your domain in a way that makes sense (i.e., metadata for tables, as the MSDN article mentions).
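For instance, one way to make the WHERE TenantId rule hard to forget is to route every read through a small helper that injects the predicate. A minimal sketch with SQLite; the table and column names are illustrative, and the WHERE/AND splicing is deliberately naive.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, tenant_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 100, 9.99), (2, 200, 19.99)])

def tenant_query(conn, tenant_id, sql, params=()):
    """Append the tenant predicate so no query can skip it.
    Assumes every tenant-owned table has a tenant_id column."""
    glue = "AND" if "WHERE" in sql.upper() else "WHERE"
    return conn.execute(sql + " " + glue + " tenant_id = ?",
                        (*params, tenant_id)).fetchall()

print(tenant_query(conn, 100, "SELECT id, total FROM orders"))
# [(1, 9.99)]  -- tenant 200's row never leaks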
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I have a similar case, but my solution takes advantage of both approaches.
Where the data lives and how it is stored is the tenant's question. As a tenant, of course, I don't want my data shared; I want my data isolated and secure, and I want to be able to get it at any time.
Certain data can possibly be shared, e.g. the company list. So there should be a global database plus per-tenant databases; just make sure the tenant database schema is locked down in operation, and have a procedure to update all tenant databases at once.
Anyway, in the SaaS model everything is delivered as a server/web service, so no matter where the database lives, the data comes to the client as a service and is only rendered by the client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.