With ABAC/XACML, how do you protect resources in reports/large result sets?

How have folks used an ABAC approach when running reports, or even just when selecting multiple records from a DB?
For instance, if you have a policy that states:
Doctors can only view patients in their hospital
Obviously the efficient way to implement this is to include a filter in your query (WHERE hospital = XXX), but this seems to break with the principle of ABAC, as it bakes the rule into the SQL itself.
I know Axiomatics offers a reverse query mechanism that apparently generates filters for you based on rules, but my system has a lot of complex SQL that would have to be refactored quite a bit to work with this.
How have other folks handled this problem?

There are essentially three ways to address this:
Via a reverse query mechanism, as you alluded to. This is indeed only supported by Axiomatics at the moment. The idea behind a reverse query is that instead of asking a fully specified question, e.g. "Can Alice view document #123?", you ask an open-ended one, e.g. "Which documents can Alice view?".
Via the Multiple Decision Profile (MDP) of XACML 3.0, which allows you to ask multiple questions in one go, e.g. "Can Alice view documents #1, #2, and #3?". The MDP is practical for hundreds of items at most; you could combine it with a pagination strategy.
Via the use of obligations. You could write a policy that says that, as a whole, a doctor has the right to view medical records, plus an obligation to execute a filtering SQL statement. The issue with this approach is that it puts authorization semantics inside the obligation rather than inside the policy. Also, what happens if multiple obligations are triggered?
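To make the obligation variant concrete, here is a minimal sketch of how a PEP might consume such a filter obligation. The PDP call and the obligation format below are hypothetical stand-ins, not a real XACML API:

```python
# Hypothetical sketch: a PDP decision carries a "filter" obligation that the
# PEP appends to its SQL. evaluate_policy() and the obligation format are
# made up for illustration; they are not a real XACML PDP API.
import sqlite3

def evaluate_policy(subject):
    """Stand-in for a PDP call: Permit plus an obligation describing how
    the PEP must constrain the result set."""
    if subject["role"] == "doctor":
        return {"decision": "Permit",
                "obligations": [{"id": "filter-rows",
                                 "sql": "hospital_id = ?",
                                 "params": [subject["hospital_id"]]}]}
    return {"decision": "Deny", "obligations": []}

def fetch_patients(conn, subject):
    result = evaluate_policy(subject)
    if result["decision"] != "Permit":
        return []
    sql, params = "SELECT id, name FROM patients", []
    for ob in result["obligations"]:
        if ob["id"] == "filter-rows":
            # The authorization semantics end up here, in the obligation,
            # rather than in the policy itself; the drawback noted above.
            sql += " WHERE " + ob["sql"]
            params += ob["params"]
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, name TEXT, hospital_id INTEGER)")
conn.execute("INSERT INTO patients VALUES (1, 'Bob', 42), (2, 'Carol', 7)")
print(fetch_patients(conn, {"role": "doctor", "hospital_id": 42}))
# -> [(1, 'Bob')] : only patients in the doctor's own hospital
```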

Related

MariaDB data separation into public and private, database design

I am working at a company that merged with another company a while ago.
We have several business units that are basically equivalent: one in Europe, one in China. We already had an in-house MariaDB database, which we now want to start sharing.
The problem is that there are different GDPR regulations and contracts that prohibit sharing certain data across sites. So what I can't do is replicate data across sites and then just hide it from the user in the frontend. The private data has to stay at the facility it belongs to.
So my idea was to split each table we have now that may contain sensitive information into two tables each, say table_contracts_private and table_contracts_public.
This would still seem pretty doable with basic database replication, replicating the public tables across sites. But how would you go about publishing private data? And how would I best combine private and public data? Just by using a view?
I just could not find any good mechanisms for this, especially because we would also like to avoid data duplication, so the private entries would need to be removed and replaced by the public ones, which would also entail changing all referencing IDs.
Is this a possible application of sharding?
I'd be really grateful if someone could point me in the right direction, or if someone has a demo project with similar requirements that I could check out.
Cheers
Is this a possible application of sharding?
I wouldn't think so. Sharding is a performance optimization method. What you need is to support legal constraints. Those are two very different problems.
I think you are on the right track. I call this a "walled garden" approach. You create a database with all non-PII information, using IDs only: nothing that even remotely identifies people directly, such as addresses, phone numbers, or credit cards. This can be tricky; in some jurisdictions, combinations of demographics can be PII.
Some of those IDs then refer to another database where you store all the sensitive information; this is the "walled garden". I would recommend that this second database live on a separate server with a very restricted access list. This is also where you implement requirements such as "forgetting" a customer.
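To illustrate the split, here is a minimal sketch using sqlite3 as a stand-in for the two MariaDB instances; all table and column names are made up:

```python
# Illustrative sketch of the "walled garden" split, with sqlite3 standing in
# for the two MariaDB instances. All table and column names are made up.
import sqlite3

public_db = sqlite3.connect(":memory:")   # replicated to every site
private_db = sqlite3.connect(":memory:")  # stays at the owning facility

# Public side holds business data plus opaque references, never PII.
public_db.execute("CREATE TABLE contracts_public ("
                  "contract_id INTEGER PRIMARY KEY,"
                  "person_ref INTEGER, volume INTEGER)")
# Private side resolves those references to actual people.
private_db.execute("CREATE TABLE persons_private ("
                   "person_ref INTEGER PRIMARY KEY,"
                   "name TEXT, address TEXT)")

public_db.execute("INSERT INTO contracts_public VALUES (1, 100, 5000)")
private_db.execute("INSERT INTO persons_private VALUES "
                   "(100, 'A. Example', 'Somewhere 1')")

def contract_with_person(contract_id):
    """Combine public and private data in the application layer; this only
    succeeds at sites allowed to reach the private database at all."""
    contract = public_db.execute(
        "SELECT contract_id, person_ref, volume FROM contracts_public "
        "WHERE contract_id = ?", (contract_id,)).fetchone()
    person = private_db.execute(
        "SELECT name, address FROM persons_private WHERE person_ref = ?",
        (contract[1],)).fetchone()
    return contract, person

print(contract_with_person(1))
```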
In any case, the point is that sharding is not the right approach. You want an application redesign with privacy and security as the top priorities. Happily, this is not actually that hard to implement, although if the databases keep changing you may need periodic auditing. For instance, in one database I worked with, we discovered that "coupon codes" sometimes contained unencrypted email addresses. Arrgggh!

Database design for a personalized article recommendation system

Hi, I am designing a system which takes in article links from an API, sorts the articles into categories, and then sends a list of recommended article links to users based on their specified filtering parameters.
The initial approach I've planned is to use SQL databases to store the sorted articles as well as user info. Then, each day, I would run a SQL query on the article database for each user to fetch relevant article links. One thing I still need to figure out is handling duplicate articles/users, but even assuming all instances are unique, this approach seems pretty inefficient.
I was wondering if there is a better way to design the system for scale, i.e., if the system has to handle millions of articles and millions of users.
Would grouping users together based on similar article filtering parameters be helpful (so that potentially fewer queries need to be run if two or more users query the article database in the same way)? Or would this be too complicated to be worthwhile?
So the user specifies the filters themselves, and new articles matching those filters should be sent out? That sounds more like "alert me when new articles arrive".
Off the top of my head, some ideas:
If the number of articles far exceeds the number of users, then invert the logic: on every new article, check whether some user's filter matches, and append the article to an alert channel for that user.
(Per new article, the complexity is O(n), where n is the number of users.)
If filter evaluation can easily be normalized (and split into filter parts), then store the filters separately and reference, from each filter, the users using it. Then you only need to evaluate whether a new article matches the filters; a sketch of this follows at the end of this answer.
(Per new article, the complexity is O(n), where n is the number of filters.)
In general:
Offload peaks by handling all of this asynchronously, e.g. buffer new articles in a queue and work through them step by step. For the per-user "alert channel" you can also use pub/sub channels.
Other ideas:
Consider doing item-item (or user-item) based recommendations with existing libraries and tools.
And in general, grow the complexity of your evaluation as needed (it's OK to start simpler, with an algorithm that doesn't scale perfectly, as long as it works for your case).
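Here is the sketch promised above: a minimal illustration of the inverted evaluation, where normalized filters are stored once and each incoming article is matched against the filters rather than against every user. The in-memory queue and channel structures are stand-ins for a real message broker and pub/sub system:

```python
# Hedged sketch of the inverted evaluation: normalized filters are stored
# once, each incoming article is matched against the filters (not against
# every user), and matches fan out to per-user alert channels. The deque and
# dict stand in for a real message queue and pub/sub channels.
from collections import defaultdict, deque

# filter_id -> predicate over an article; users subscribe to filter ids
filters = {
    "sports": lambda a: a["category"] == "sports",
    "python": lambda a: "python" in a["tags"],
}
subscribers = {"sports": {"alice", "bob"}, "python": {"bob"}}

incoming = deque()                    # stand-in for a broker-backed queue
alert_channels = defaultdict(list)    # user -> pending article links

def handle_article(article):
    # O(number of filters) per article, independent of the user count
    for filter_id, matches in filters.items():
        if matches(article):
            for user in subscribers[filter_id]:
                alert_channels[user].append(article["url"])

incoming.append({"category": "sports", "tags": [], "url": "http://example.org/1"})
while incoming:                       # drain the queue asynchronously in practice
    handle_article(incoming.popleft())
print(dict(alert_channels))           # {'alice': [...], 'bob': [...]}
```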

How to represent ramifications and conditions using BPMN?

I must represent a process using BPMN 2.0 with conditions that generate ramifications exponentially, and I can't see a way to represent it graphically within the diagram without making it grow that much.
Here is a screenshot of the problem; the diagram tends to grow even more.
The option for the company is the same for all 3 companies.
The type of employee is the same for all 3.
The configurations are the same for all 3 companies.
The configuration for the type of employee is the same for all 3 companies.
BUT the last configurations are specific to each company, and I define them for each employee.
Is there a way I could simplify this mess?
EDIT: the result became very simple.
I think you can use an inclusive gateway here. Join the flows at the "type of employee" step, go through your configuration step, and then fork the path again based on the company and the config, using another inclusive gateway. An inclusive gateway makes sure you only wait for the process paths that were actually executed.
This is supported by the BPMN engines in WSO2 EI and Activiti; I'm not sure about other engines.

Advantage of LDAP over RDBMS?

I have an application with a database as its backend.
The application follows a sort of PUB-SUB model, where users post changes to the application and other peers subscribe to those changes. These changes may happen very frequently or only periodically, and all of them have to be written to the database.
Now I am being asked to explore the possibility of replacing this RDBMS with LDAP. They probably want a unified DB for all applications, but in any case I have to find the advantages and disadvantages of both approaches.
I cannot directly compare an RDBMS with LDAP, as I have almost no idea of LDAP, though I have tried to read up on it.
I understand that LDAP is designed for directory access and is optimized for read access: write once, read many. I have read that frequent writes will reduce the performance of an LDAP server, as each write triggers the indexing process.
Just to give a scenario regarding indexing in LDAP: my table will have a few columns, say two, Name and Desc. In LDAP, I suppose these would become two attributes, Name and Desc. In my scenario it is Desc that will be frequently updated. I assume only Name will be indexed, so even if Desc changes frequently, it won't trigger the indexing process.
One point worth mentioning is that the database will be hosted on some cloud platform.
I tried to find out the differences, but could not find anything conclusive.
LDAP is a protocol; REST is a service style based on HTTP (also a protocol). So if the LDAP server shall not be exposed to the internet, how do you want to get the data from it? As LDAP is the protocol, you would need direct access to the LDAP server. It is like a database server that you would not expose directly to the internet: you would build an interface to encapsulate it, and that might as well be a REST interface.
I'm trying to get the point across that one is the transfer protocol plus storage backend, and the other is the public interface to its data. It's a bit like asking why MySQL is better than a web interface: you'd never make the MySQL server publicly available, but would encapsulate its protocol inside an application.
REST is an interface. It doesn't matter how you organize your data behind that interface. When you decide that you want to organize it differently, you can do so without the consumers of your API noticing any change, and you can provide different versions of your API as your service improves.
LDAP, on the other hand, is an implementation. You can't change the way your data is handled without the consumer noticing it. So there is no way to rearrange your backend without affecting the consumer.
With REST you can therefore change the backend from MySQL to PostgreSQL, or even to LDAP, without anyone noticing, which you won't be able to do with LDAP.
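As an illustration of that encapsulation, here is a minimal sketch of a REST endpoint wrapping an internal LDAP server, using Flask and the third-party ldap3 package; the host name, bind DN, password and base DN are made up:

```python
# Illustrative sketch only: a small REST endpoint that encapsulates an
# internal LDAP server, using Flask and the third-party ldap3 package.
# Host name, bind DN, password and base DN are made up.
from flask import Flask, jsonify
from ldap3 import Server, Connection

app = Flask(__name__)

@app.route("/users/<uid>")
def get_user(uid):
    # The LDAP protocol is spoken only here, inside the service; consumers
    # talk to a stable REST interface and never reach the directory itself.
    conn = Connection(Server("ldap://ldap.internal.example"),
                      "cn=reader,dc=example,dc=com", "secret",
                      auto_bind=True)
    # Real code should escape uid (ldap3.utils.conv.escape_filter_chars)
    # to avoid LDAP filter injection.
    conn.search("dc=example,dc=com", f"(uid={uid})",
                attributes=["cn", "mail"])
    if not conn.entries:
        return jsonify({"error": "not found"}), 404
    entry = conn.entries[0]
    return jsonify({"name": str(entry.cn), "mail": str(entry.mail)})

if __name__ == "__main__":
    app.run(port=8080)
```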
Hope that helps
Now that we finally know what you're actually asking, which has nothing to do with your title, the body of your question, or REST, the simple answer is that there is no particular reason to believe that an LDAP server will perform significantly better than an RDBMS in this application, with two riders:
it may not even be feasible, due to the schema issue, and
if it is feasible it may not be semantically suitable, due to the lack of ACID properties, lack of JOINs, and the other issues mentioned in comments.
I will state that this is one of the worst formulated questions I have seen here for some considerable time, and the difficulty of extracting the actual question was extreme.

Selective replication with CouchDB

I'm currently evaluating possible solutions to the following problem:
A set of data entries must be synchronized between multiple clients, where each client may only view (or even know about the existence of) a subset of the data.
Each client "owns" some of the elements, and the decision who else can read or modify those elements may only be made by the owner. To complicate this situation even more, each element (and each element revision) must have an unique identifier that is equal for all clients.
While the latter sounds like a perfect task for CouchDB (and a document-based data model would fit my needs perfectly), I'm not sure if the authentication/authorization subsystem of CouchDB can handle these requirements: while it should be possible to restrict write access using validation functions, there doesn't seem to be a way to authorize read access. All solutions I've found for this problem propose routing all CouchDB requests through a proxy (or an application layer) that handles authorization.
So the question is: is it possible to implement an authorization layer that filters requests to the database, so that access is granted only to documents the requesting client may read, while still using the replication mechanism of CouchDB? Simplified, this would be some kind of "selective replication", where only some of the documents, rather than the whole database, are replicated.
I would also be thankful for pointers to detailed information about how replication works. The CouchDB wiki and even the "Definitive Guide" book are not very specific about it.
This begs for replication filters: you filter outbound replication based on whatever criteria you impose, and give the owner of the target unrestricted access to their own copy.
I haven't had the opportunity to play with replication filters directly, but the idea is that each doc would carry some information about who has access to it, and the filtering mechanism would then allow outbound replication of only those documents that you have access to. Replication from the target back to the master would be unrestricted, allowing the master to remain a roll-up copy that can potentially multicast changes to overlapping sets of data.
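To make that concrete, here is a hedged sketch of a filtered replication set up over CouchDB's HTTP API with Python's requests package. The database names, credentials and the access model (a "readers" list on each doc) are illustrative; the filter function itself is JavaScript stored in a design document on the source database:

```python
# Hedged sketch: set up filtered replication over CouchDB's HTTP API using
# the requests package. Database names, credentials and the access model
# ("readers" list on each doc) are illustrative; the filter function itself
# is JavaScript stored in a design document on the source database.
import requests

COUCH = "http://admin:secret@localhost:5984"

# 1) Store a filter function on the source DB: a doc passes only if the
#    requesting user appears in its "readers" field.
design_doc = {
    "filters": {
        "by_reader": """
            function(doc, req) {
                if (!doc.readers) { return false; }
                return doc.readers.indexOf(req.query.user) !== -1;
            }
        """
    }
}
requests.put(f"{COUCH}/master/_design/auth", json=design_doc)

# 2) Trigger outbound replication through that filter; the target replica
#    only ever receives documents its owner is allowed to read.
requests.post(f"{COUCH}/_replicate", json={
    "source": f"{COUCH}/master",
    "target": f"{COUCH}/replica_alice",
    "create_target": True,
    "filter": "auth/by_reader",
    "query_params": {"user": "alice"},
})
```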
What you are after is replication filters. According to Chris Anderson, this is a 0.11 feature:
"The current status is that there is
an API for filtering the _changes
feed. The replicator in 0.10 consumes
the changes feed, so the next step is
getting the replicator to use the
filter API.
There is work in progress on this, so
it should be fully ready to go in
0.11."
See the original post.
Here is a newer link to some documentation about this:
http://blog.couchbase.com/what%E2%80%99s-new-apache-couchdb-011-%E2%80%94-part-three-new-features-replication
Indeed, as others have said, replication filters are the way to go for this. Here is a link with some information on using them.
One caveat I would add is that, at scale, replication filters can be extremely slow. More information about this and other nuances of CouchDB can be found in the excellent blog post "What every developer should know about CouchDB". For large-scale systems, performing replication in the application layer has proven faster and more reliable.