Representing complex data types in XACML using Authzforce - authorization

I am new to XACML and I would be grateful if you can help me with one problem I encountered.
I use AuthzForce Core PDP (version 17.1.2).
I am wondering what is the correct approach of representing complex data types in XACML.
Example
Access should be granted if the PIP response contains any person whose name is present in the names array from the request and whose salary is higher than the salary provided in the request.
Request
names = ["Eric", "Kyle"]
salary = 1500
PIP response
[
{
"name": "Kyle",
"salary": 1000
},
{
"name": "Kenny",
"salary": 2000
},
{
"name": "Eric",
"salary": 4000
},
{
"name": "Stan",
"salary": 3000
}
]
Access will be granted because the PIP response contains a person named Eric whose salary is higher than 1500.
My implementation
To represent the PIP response, I ended up creating a custom type by extending the StringParseableValue class from AuthzForce. For the logic described above, I use an attribute designator in the XML and have a corresponding attribute provider (a class extending BaseNamedAttributeProvider) in Java that performs the PIP call.
I also wrote two custom functions:
Find people with higher salary than provided in one param (returns filtered list)
Get person name (returns string)
Using those functions together with standard functions, I wrote a policy, and it works.
However, my solution seems overcomplicated. I suppose what I did can be achieved using only standard functions.
Additionally, if I wanted to define a hardcoded bag of people inside another policy, a single element would look like this:
<AttributeValue DataType="person">name=Eric###salary=4000</AttributeValue>
There is always a possibility that parsing such strings might fail.
So my question is: What is a good practice of representing complex types like my PIP response in XACML using Authzforce? Sometimes I might need to pass more complex data in the request and I saw example in XACML specification showing passing such data inside <Content> element.

Creating a new XACML data-type - and consequently new XACML function(s) to handle that new data-type - seems a bit overkill indeed. Instead, you may improve your PIP (Attribute Provider) a little bit, so that it returns only the results for the employees named in the Request, and only their salaries (extracting them from the JSON using JSON path) returned as a bag of integers.
Then, assuming this PIP result is set to the attribute employee_salaries in your policy (a bag of integers), and min_salary is the salary in the Request, it is just a matter of applying any-of(integer-less-than, min_salary, employee_salaries) in a Condition. (I'm using short names for the functions for convenience; please refer to the XACML 3.0 standard for the full identifiers.)
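To make the semantics of that Condition concrete, here is a minimal Python sketch that mimics what any-of does with integer-less-than. It is not AuthzForce or XACML code; the function names any_of and integer_less_than are hypothetical stand-ins for the standard XACML functions, and the salary values are taken from the example in the question (Eric and Kyle only).

```python
from typing import Callable, Iterable


def integer_less_than(a: int, b: int) -> bool:
    """Stand-in for the XACML integer-less-than function: true if a < b."""
    return a < b


def any_of(predicate: Callable[[int, int], bool], value: int, bag: Iterable[int]) -> bool:
    """Stand-in for XACML any-of: true if predicate(value, element)
    holds for at least one element of the bag."""
    return any(predicate(value, element) for element in bag)


min_salary = 1500                  # salary from the Request
employee_salaries = [4000, 1000]   # bag returned by the PIP for Eric and Kyle

# min_salary < 4000 holds for Eric, so the Condition evaluates to true.
access_granted = any_of(integer_less_than, min_salary, employee_salaries)
```

Note the argument order: the single value comes first and each bag element second, so the check reads "min_salary is less than some employee salary".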
Tips to improve the PIP:
One issue here is performance (scalability, response time, response size...): if you have hundreds or even thousands of employees, it is overkill to get the whole list from the REST service over and over, all the more so as you need only a small subset (the names in the Request). Instead, you may have some way to ask the REST service to return only specific employees, using query parameters; an example using RSQL (but this depends on the REST service API):
HTTP GET http://rest-service.example.com/employees?search=names=in=($employee_names)
... where you set the $employee_names variable to (a comma-separated list of) the employee names from the Request (e.g. Eric,Kyle). You can get these in your AttributeProvider implementation, from the EvaluationContext argument of the overridden get(...) method (EvaluationContext#getNamedAttributeValue(...)).
Then you can use a JSON path library (as you did) to extract the salaries from the JSON response (so you have only the salaries of the employees named in the Request), using this JSON path for instance (tested with Jayway):
$[*].salary
If the previous option is not possible, i.e. you have no way of filtering employees on the REST API, you can always do this filtering in your AttributeProvider implementation with the JSON path library, using this JSON path for instance (tested with Jayway against your PIP response):
$[?(@.name in [$employee_names])].salary
... where you set the $employee_names variable like in the previous way, getting the names from the EvaluationContext. So the actual JSONpath after variable replacement would be something like:
$[?(@.name in [Eric,Kyle])].salary
(You may add quotes to each name to be safe.)
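As a rough illustration of the variable replacement with quoted names, the following Python sketch builds the JSONPath string and, separately, performs the equivalent filtering in plain Python so the expected result is easy to check. The helper names build_salary_jsonpath and filter_salaries are hypothetical, and rejecting names that contain a single quote is an assumption made here to keep the quoting trivially safe; a real Attribute Provider would do this in Java with the JSON path library.

```python
def build_salary_jsonpath(employee_names):
    """Build the filter expression with each name wrapped in single quotes."""
    for name in employee_names:
        if "'" in name:
            # Assumption: refuse names that would break the quoting
            # instead of guessing the library's escaping rules.
            raise ValueError(f"unsupported character in name: {name!r}")
    quoted = ",".join(f"'{name}'" for name in employee_names)
    return f"$[?(@.name in [{quoted}])].salary"


def filter_salaries(pip_response, employee_names):
    """Plain-Python equivalent of the filter: salaries of the named people."""
    wanted = set(employee_names)
    return [person["salary"] for person in pip_response if person["name"] in wanted]


pip_response = [
    {"name": "Kyle", "salary": 1000},
    {"name": "Kenny", "salary": 2000},
    {"name": "Eric", "salary": 4000},
    {"name": "Stan", "salary": 3000},
]

path = build_salary_jsonpath(["Eric", "Kyle"])
salaries = filter_salaries(pip_response, ["Eric", "Kyle"])
```

The resulting list is what the Attribute Provider would return as the bag of integers.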
All things considered, if you still prefer to go for a new XACML data-type (and functions), and since you seem to have done most of the work (impressive btw), I have a suggestion, if doable without too much extra work: generalize the Person data-type to a more generic JSON object data-type that could be reused in any use case dealing with JSON. Then see whether the extra functions could be done with a generic JSONPath evaluation function applied to the new JSON object data-type. This would provide a JSON equivalent to the standard XML/XPath data-type and functions we already have in XACML, and this kind of contribution would benefit the AuthzForce community greatly.
For the JSON object data-type, actually you can use the one in the testutils module as an example: CustomJsonObjectBasedAttributeValue which has been used to test support of JSON objects for the GeoXACML extension.

Related

REST GET method: Can return a list of enriched resources?

I have a doubt when I'm designing a REST API.
Consider I have a Resource "Customer" with two elements in my server, like this:
[
{
name : "Mary",
description : "An imaginary woman very tall."
},
{
name : "John",
description : "Just a guy."
}
]
And I want to make an endpoint that accepts a GET request with a query. The query provides a parameter whose value makes an algorithm count how many times that text occurs across all of the resource's fields.
So if we throw this request:
GET {baseURL}/customers?letters=ry
I should get something like
[
{
name : "Mary",
description : "An imaginary woman very tall.",
count : 3
},
{
name : "John",
description : "Just a guy.",
count : 0
}
]
The count field cannot be included in the resource schema, as it depends on the value provided in the query, so the response objects have to be enriched.
I'm not getting a list of my resource but a modified resource.
Although this keeps GET idempotent, it seems to stray from the REST architecture concept (even REST beyond CRUD).
Is it still a valid endpoint in a RESTful API? or should I create something like a new resource called "ratedCustomer"?
REST GET method: Can return a list of enriched resources?
TL;DR: yes.
Longer answer...
A successful GET request returns a representation of a single resource, identified by the request-target.
The fact that the information used to create the representation of the resource comes from multiple entities in your domain model, or multiple rows in your database, or from reports produced by other services... these are all implementation details. The HTTP transfer of documents over a network application doesn't care.
That also means that we can have multiple resources that include the same information in their representations. Think "pages in wikipedia" that duplicate each others' information.
Resource identifiers on the web are semantically opaque. All three of these identifiers are understood to be different resources
/A
/A?enriched
/B
We human beings looking at these identifiers might expect /A?enriched to be semantically closer to /A than /B, but the machines don't make that assumption.
It's perfectly reasonable for /A?enriched to produce representations using a different schema, or even a different content-type (as far as the HTTP application is concerned, it's perfectly reasonable that /A be an HTML document and /A?enriched be an image).
Because the machines don't care, you've got additional degrees of freedom in how you design both your resources and your resource identifiers, which you can use to enjoy additional benefits, including designing a model that's easy to implement, or easy to document, or easy to interface with, or easy to monitor, or ....
Design is what we do to get more of what we want than we would get by just doing it.
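For the /customers?letters=ry example from the question, building the enriched representation can be sketched as below. This is only an illustration in Python: the enrich function is hypothetical, the counting is case-sensitive and non-overlapping, and rejecting an empty query value is an assumption, since the question does not say how that case should behave.

```python
def enrich(customers, letters):
    """Return copies of the customers with a per-query 'count' field added."""
    if not letters:
        # Assumption: an empty search text is treated as a client error.
        raise ValueError("letters must not be empty")
    enriched = []
    for customer in customers:
        # Count occurrences across every field value of the customer.
        count = sum(str(value).count(letters) for value in customer.values())
        enriched.append({**customer, "count": count})
    return enriched


customers = [
    {"name": "Mary", "description": "An imaginary woman very tall."},
    {"name": "John", "description": "Just a guy."},
]

result = enrich(customers, "ry")
```

The stored customers are left untouched; only the representation returned for this particular identifier carries the extra field.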

Creating Mandatory User Filters with multiple element IDs

Mandatory User Filters
I am working on a tool to allow customers to apply Mandatory User Filters. When attributes such as "Year" or "Age" are loaded, each can have hundreds of elements with their own ids. The POST request to create a filter (documented here: https://developer.gooddata.com/article/lets-get-started-with-mandatory-user-filters) looks like this:
{
"userFilter": {
"content": {
"expression": "[/gdc/md/{project-id}/obj/{object-id}]=[/gdc/md/{project-id}/obj/{object-id}/elements?id={element-id}]"
},
"meta": {
"category": "userFilter",
"title": "My User Filter Name"
}
}
}
In the "expression" property, it shows how one ID can be set. What I want is to have multiple ids associated with the object-id in a single POST. For example, if a user wanted to add a filter for all of the elements in "Year" (there are 150) in the demo project, it seems odd to make 150 POST requests.
Is there a better way?
UPDATE
Tomas thank you for your help.
I am not having trouble assigning multiple userfilters to a user. I can easily apply a singular filter to a user with the method outlined in the documentation. However, this overwrites the userfilter field. What is the syntax for this?
Here is my demo POST data:
{ "userFilters":
{ "items": [
{ "user": "/gdc/account/profile/decd0b2e3077cf9c47f8cfbc32f6460e",
"userFilters":["/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808729","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728"]
}
]
}
}
This receives a BAD REQUEST.
I'm not sure what you mean by "have multiple ids associated with the object-id" exactly, but I'll try to tell you all I know about it. :-)
If you indeed made multiple POST requests, created multiple userFilters and set them all for one user, the user wouldn't see anything at all. That's because the system combines separate userFilters using logical AND, and a Year cannot be 2013 and 2014 at the same time. So for the rest of my answer, I'll assume that you want OR instead.
There are several ways to do this. As you may have guessed by now, you can use AND/OR explicitly, using an expression like this:
[/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}] OR [/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}]
This can often be further simplified to:
[/…/obj/{object-id}] IN ( [/…/obj/{object-id}/elements?id={element-id}], [/…/obj/{object-id}/elements?id={element-id}], … )
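Generating such an IN expression for many element ids is mostly string assembly. The Python sketch below shows one way to do it, using the URI template from the POST body in the question. The function names and the example ids are hypothetical, and the exact spacing inside the parentheses simply copies the expression above; whether the server is sensitive to it is not something this sketch can confirm.

```python
def build_in_expression(project_id, object_id, element_ids):
    """Build '[attribute] IN ( [element], [element], ... )' for one attribute."""
    if not element_ids:
        raise ValueError("at least one element id is required")
    attribute_uri = f"/gdc/md/{project_id}/obj/{object_id}"
    elements = ", ".join(
        f"[{attribute_uri}/elements?id={element_id}]" for element_id in element_ids
    )
    return f"[{attribute_uri}] IN ( {elements} )"


def build_user_filter(title, expression):
    """Wrap the expression in the userFilter payload shown in the question."""
    return {
        "userFilter": {
            "content": {"expression": expression},
            "meta": {"category": "userFilter", "title": title},
        }
    }


# Hypothetical project id, object id and element ids, for illustration only.
expression = build_in_expression("my-project", 123, [1, 2, 3])
payload = build_user_filter("My User Filter Name", expression)
```

With this, all 150 "Year" elements would go into a single expression and a single POST.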
If the attribute is a date (year, month, …) attribute, you could, in theory, also specify ranges using BETWEEN instead of listing all elements:
[/…/obj/{object-id}] BETWEEN [/…/obj/{object-id}/elements?id={element-id}] AND [/…/obj/{object-id}/elements?id={element-id}]
It seems, though, that this only works in metrics MAQL and is not allowed in the implementation of user filters. I have no idea why.
Also, for your own attribute like Age, you can't do that since user-defined numeric attributes aren't supported. You could, in theory, add a fact that holds the numeric value, and construct a BETWEEN filter based on that fact. It seems that this is not allowed in the implementation of user filters either. :-(
Hope this helps.

RESTful API - Correct behaviour when spurious/not requested parameters are passed in the request

We are developing a RESTful api that accepts query parameters in the request in the form of JSON encoded data.
We were wondering what the correct behaviour is when unrequested or unexpected parameters are passed along with the required ones.
For example, we may require that a PUT request on a given endpoint provide exactly two values, for the keys name and surname:
{
"name": "Jeff",
"surname": "Atwood"
}
What if a spurious key is passed too, like color in the example below?
{
"name": "Jeff",
"surname": "Atwood",
"color": "red"
}
The value for color is neither expected nor documented.
Should we ignore it or reject the request with a BAD_REQUEST 400 status error?
We can assert that the request is bad because it doesn't conform to the documentation. And probably the API user should be warned about it. (She passed the value, so she'll expect something for it.)
But we can assert too that the request can be accepted because, as the required parameters are all provided, it can be fulfilled.
Having used RESTful APIs from numerous vendors over the years, let me give you a "users" perspective.
A lot of times documentation is simply bad or out of date. Maybe a parameter name changed, maybe you enforce exact casing on the property names, maybe you have used the wrong font in your documentation and have an I which looks exactly like an l - yes, those are different letters.
Do not ignore it. Instead, send an error message back stating the property name with an easy to understand message. For example "Unknown property name: color".
This one little thing will go a long ways towards limiting support requests around consumption of your API.
If you simply ignore the parameters then a dev might think that valid values are being passed in while cussing your API because obviously the API is not working right.
If you throw a generic error message then you'll have devs pulling their hair out trying to figure out what's going on, and flooding your forum, this site, or your phone with calls asking why your servers don't work. (I recently went through this problem with a vendor that just didn't understand that a 404 message was not a valid response to an incorrect parameter and that the documentation should reflect the actual parameter names used...)
Now, by the same token I would expect you to also give a good error message when a required parameter is missing. For example "Required property: Name is missing".
Essentially you want to be as helpful as possible so the consumers of your API can be as self sufficient as possible. As you can tell I wholeheartedly disagree with a "gracious" vs "stern" breakdown. The more "gracious" you are, the more likely the consumers of your API are going to run into issues where they think they are doing the right thing but are getting unexpected behaviors out of your API. You can't think of all possible ways people are going to screw up so enforcing a strict adherence with relevant error messages will help out tremendously.
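A strict check of this kind is small to write. The following Python sketch validates the name/surname payload from the question and produces the two kinds of messages suggested above. The validate_payload function and its parameters are hypothetical; how the errors are turned into a 400 response is left to whatever framework you use.

```python
REQUIRED_PROPERTIES = frozenset({"name", "surname"})


def validate_payload(payload, required=REQUIRED_PROPERTIES, optional=frozenset()):
    """Return a list of human-readable errors; an empty list means valid."""
    present = set(payload)
    errors = []
    # Report every missing required property by name.
    for key in sorted(required - present):
        errors.append(f"Required property: {key} is missing")
    # Report every property that is neither required nor optional.
    for key in sorted(present - required - optional):
        errors.append(f"Unknown property name: {key}")
    return errors


ok = validate_payload({"name": "Jeff", "surname": "Atwood"})
spurious = validate_payload({"name": "Jeff", "surname": "Atwood", "color": "red"})
missing = validate_payload({"surname": "Atwood"})
```

Collecting all errors before responding lets the consumer fix the request in one round trip instead of discovering problems one at a time.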
If you do an API design you can follow two paths: "stern" or "gracious".
Stern means: If you do anything I didn't expect I will be mad at you.
Gracious means: If I know what you want and can fulfil it I will do it.
REST allows for a wonderful gracious API design and I would try to follow this path as long as possible and expect the same of my clients. If my API evolves I might have to add additional parameters in my responses that are only relevant for specific clients. If my clients are gracious to me they will be able to handle this.
Having said that, I want to add that there is a place for stern API design: when you are designing in a sensitive domain (e.g. cash transactions) and you don't want to leave room for any misunderstanding between the client and server. Imagine the following POST request (valid for your /account/{no}/transaction/ API):
{ amount: "-100", currency : "USD" }
What would you do with the following (invalid API request)?
{ amount: "100", currency : "USD", type : "withdrawal" }
If you just ignore the "type" attribute, you will deposit 100 USD instead of withdrawing them. In such a domain I would follow a stern approach and show no grace whatsoever.
Be gracious if you can, be stern if you must.
Update:
I totally agree with @Chris Lively's answer that the user should be informed. I disagree that it should always be an error case, even if the message is non-ambiguous for the referenced resource. Always rejecting such requests would hinder reuse of resource representations and require repackaging of semantically identical information.
It depends on your documentation and how strict you want to be. But generally speaking, just ignore it. Most other servers also ignore request parameters they don't understand.
Example taken from my previous post
Extra Query parameters in the REST API Url
Google ignores my two extra parameters here: https://www.google.com/#q=search+for+something&invalid=param&more=stuff
Imagine I have the following JSON schema:
{
"frequency": "YEARLY",
"date": 23,
"month": "MAY"
}
The frequency attribute accepts "WEEKLY", "MONTHLY" and "YEARLY" value.
The expected payload for "WEEKLY" frequency value is:
{
"frequency": "WEEKLY",
"day": "MONDAY"
}
And the expected payload for "MONTHLY" frequency value is:
{
"frequency": "MONTHLY",
"date": 23
}
Given the above JSON schema, I will typically need a POJO containing frequency, day, date, and month fields for deserialization.
If the received payload is:
{
"frequency": "MONTHLY",
"day": "MONDAY",
"date": 23,
"year": 2018
}
I will throw an error on "day" attribute because I will never know the intention of the sender:
frequency: "WEEKLY" and day: "MONDAY" (incorrect frequency value entered), or
frequency: "MONTHLY" and date: 23
For the "year" attribute, I don't really have choice.
Even if I wish to throw an error for that attribute, I may not be able to.
It's ignored by the JSON serialization/deserialization library as my POJO has no such attribute. And this is the behavior of GSON and it makes sense given the design decision.
Navigating the Json tree or the target Type Tree while deserializing
When you are deserializing a Json string into an object of desired type, you can either navigate the tree of the input, or the type tree of the desired type. Gson uses the latter approach of navigating the type of the target object. This keeps you in tight control of instantiating only the type of objects that you are expecting (essentially validating the input against the expected "schema"). By doing this, you also ignore any extra fields that the Json input has but were not expected.
As part of Gson, we wrote a general purpose ObjectNavigator that can take any object and navigate through its fields calling a visitor of your choice.
Extracted from GSON Design Document
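The behaviour described in this answer (reject a known field that conflicts with the chosen frequency, silently drop fields the model does not know) can be sketched in Python as follows. This is not GSON code; parse_schedule and the two lookup tables are hypothetical names, and the field sets per frequency are taken from the three example payloads above.

```python
# Fields expected for each frequency value, from the example payloads.
EXPECTED_FIELDS = {
    "WEEKLY": {"day"},
    "MONTHLY": {"date"},
    "YEARLY": {"date", "month"},
}
# Every field the model (the POJO in the answer) knows about.
KNOWN_FIELDS = {"frequency", "day", "date", "month"}


def parse_schedule(payload):
    """Keep the fields for the given frequency, reject conflicting known
    fields, and ignore fields that are not part of the model at all."""
    frequency = payload.get("frequency")
    if frequency not in EXPECTED_FIELDS:
        raise ValueError(f"Unsupported frequency: {frequency!r}")
    expected = EXPECTED_FIELDS[frequency]
    present_known = (set(payload) & KNOWN_FIELDS) - {"frequency"}
    conflicting = sorted(present_known - expected)
    if conflicting:
        raise ValueError(f"Unexpected for {frequency}: {', '.join(conflicting)}")
    missing = sorted(expected - present_known)
    if missing:
        raise ValueError(f"Missing for {frequency}: {', '.join(missing)}")
    # Unknown fields such as "year" never reach this point; they are dropped.
    return {"frequency": frequency, **{key: payload[key] for key in expected}}


parsed = parse_schedule({"frequency": "MONTHLY", "date": 23, "year": 2018})
```

Passing the full example payload, which also contains "day", raises a ValueError naming that field, because the intention of the sender is ambiguous.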
Just ignore them.
Do not give the user any chance to reverse engineer your RESTful API through your error messages.
Give the developers the neatest, clearest, most comprehensive documentation and parse only parameters your API need and support.
I will suggest that you ignore the extra parameters. Reusing API is a game changer in the integration world. What if the same API can be used by other integration but with slightly extra parameters?
Application A expecting:
{
"name": "Jeff",
"surname": "Atwood"
}
Application B expecting:
{
"name": "Jeff",
"surname": "Atwood",
"color": "red"
}
Simply getting application A to ignore "color" will do the job, rather than having two different APIs to handle that.

Can't understand some basic REST stuff

Suppose my model is:
User:
id
nickname
I have a collection /users/
I want the Users to be retrieved by /users/{id} and not /users/${nickname}, because in some more complex cases, there could be no "logical unique constraint".
So the basic JSON payload I could use is, for example:
{
id: 123,
nickname: 'someUserName'
}
Nothing fancy here.
POST on /users/
As far as I know, a user has an identifier. It is part of the resource representation, so it should be in the payload (?).
But what if I want to generate the ID myself on the backend, using a DB sequence for example?
Then my payload becomes:
{
nickname: 'someUserName'
}
Is this appropriate?
What is supposed to be the output of this POST? Nothing? Just a header referencing the resource location, including the ID?
GET on /users/id
When we get the resource, we load its content as JSON:
{
id: 123,
nickname: 'someUserName'
}
PUT on /users/id
As far as I know, the payload used on this method is supposed to "override" the resource content. If we wanted partial updates, we would have used PATCH.
But what if I do:
PUT /users/123
{
id: 456,
nickname: 'someUserName'
}
Does this mean that we want to update the id of a resource?
Isn't it kind of redundant to use the id in both the URI and the payload?
Actually I don't really know how to handle the id.
I don't know if I am supposed to use the same resource representation in all POST / PUT / DELETE operations.
I don't know if the id should be part of the unique(?) resource representation.
But if the id is not part of the representation, then when I list the users, using GET /users/, if the ids are not returned, then I don't know how the client can get the user ids...
Can someone help me? :)
First of all
It is not REST if you don't use HATEOAS
I hope you understand this, I'll come back to that at the very end.
POST on /users/
It is perfectly OK not to use an ID in the POST payload. If an ID is present, react with an error message so developers understand what they are doing wrong.
Therefore, only the nickname as a payload is perfectly valid if you don't have anything else in your user resource.
The output of your server should include three important things:
HEADER: A status code indicating success or failure (usually 201 Created)
HEADER: The location of the newly created resource (just Location: /path/to/resource)
BODY: A representation of the created resource. Give back a complete payload like on a GET!
GET
perfectly valid
PUT
Your analysis regarding PUT/PATCH matches the spec: the new resource should be identical to the payload, meaning the user wishes to change the id if it differs. If a payload contains values which shouldn't be changed (like the ID), you have two possibilities:
Ignore the ID in the payload
Return an error
In both cases, inform the user about what you did and what went wrong. I prefer to send/get a 400 Bad Request. If a privileged user could change the ID but the particular user can't, a 403 Forbidden may be more appropriate.
Also make sure to document your API's behaviour. You may allow the ID to be omitted in your API. Don't forget to treat IDs given in a POST payload in a consistent way!
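The POST and PUT rules above can be sketched without any web framework. In the Python illustration below, each handler returns a (status, headers, body) tuple; the UserStore class, its method names, and the error texts are all hypothetical, and an in-memory counter stands in for the DB sequence mentioned in the question. It takes the "return an error" option for a mismatching id on PUT.

```python
import itertools


class UserStore:
    """In-memory stand-in for the /users/ collection."""

    def __init__(self):
        self._users = {}
        self._ids = itertools.count(1)  # plays the role of the DB sequence

    def post(self, payload):
        """POST /users/ : the server assigns the id."""
        if "id" in payload:
            return 400, {}, {"error": "id must not be supplied on POST"}
        if "nickname" not in payload:
            return 400, {}, {"error": "Required property: nickname is missing"}
        user_id = next(self._ids)
        user = {"id": user_id, "nickname": payload["nickname"]}
        self._users[user_id] = user
        # 201 Created, Location header, and the full representation as body.
        return 201, {"Location": f"/users/{user_id}"}, user

    def put(self, user_id, payload):
        """PUT /users/{id} : replace the resource, never change its id."""
        if user_id not in self._users:
            return 404, {}, {"error": "no such user"}
        if "nickname" not in payload:
            return 400, {}, {"error": "Required property: nickname is missing"}
        if payload.get("id", user_id) != user_id:
            return 400, {}, {"error": "id in payload does not match the URI"}
        user = {"id": user_id, "nickname": payload["nickname"]}
        self._users[user_id] = user
        return 200, {}, user


store = UserStore()
created = store.post({"nickname": "someUserName"})
rejected = store.put(1, {"id": 456, "nickname": "someUserName"})
```

Here the id may be omitted from the PUT payload, which is one of the documented choices mentioned above; requiring it would be equally consistent as long as the API documentation says so.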
Overall questions
REST operates over Resources.
/users/ is an example of a collection of resources
/users/{id} is an example of a single resource
You should always use the exact same representation in each and every response. If for some reason it is more appropriate to give only a snippet of the information add metadata (link) pointing to the full resource representation.
The ID is always present except in the first POST request of a user.
POST implies that the future location of the resource is not known and has to be provided by the server.
This also means that GET /users/ should return the IDs for each resource.
As always in APIs: be strict in what you return and forgiving in what you accept. Document your behaviour so users can learn.
HATEOAS
The true beauty of REST comes to light if you implement HATEOAS (Hypermedia As The Engine Of Application State). Part of this means that you should enrich your representations with useful tag/link combinations. This way clients never have to construct a URL anymore.
An Example using HAL for your user representation would be:
{
"_links": {
"self": { "href": "http://yourrest/users/123" }
},
"id": "123",
"nickname": "someUserName"
}
A nice wrap-up of using HAL was written by Matthew Weier O'Phinney in his blog when he developed a ZF2 REST Module (the first entry is completely ZF-free, only explaining HAL).
I'm interpreting your descriptions as saying that the id is not part of the resource, it's a unique identifier of the resource. In that case, it should not be part of the payload for any operation.
POST /users with payload {"nickname": "somebody"} would create a new resource with a URL returned in the Location header. That URL would presumably look like /users/123 but from the client's point of view there's no reason to expect that. It could look like /something/else/entirely.
GET /users/123 would (assuming that URL was returned by an earlier POST) return the payload {"nickname": "somebody"}.
PUT /users/123 would (with the same assumption as above) replace the resource with the payload you send with the PUT, say {"nickname": "somebody else"}.
If you want the client to be able to name a resource, then you'd also let PUT /users/123 create a new resource with that URL.
I know of no common RESTful way to rename a resource. I suppose a POST with the old URL as part of the query part or the body would make sense.
Now, suppose I'm wrong and you do want id to be part of the resource itself. Then every payload would include it. But from the client's point of view, there should be no assumption that "id": 123 implies that the URL would be /users/123.
Finally, all of this is from a fairly purist point of view. There is value to thinking of URLs as the only real identifier of a resource, but it's not awful to break that rule and have the client use logic to create the URLs.

Arbitrarily nesting some attributes in rabl

I'm designing a new API for my project, and I want to return objects that have nested children as JSON. For that purpose I've decided to use RABL.
I want the client side to be able to understand whether the object is valid, and if not which fields are missing in order to save it correctly.
The design I thought of should include some fields as optional, under an optional hash, and the rest are required. The required fields should appear right under the root of the json.
So the output I try to describe should look something like this:
{
"name": "John",
"last_name": "Doe",
"optional": {
"address": "Beverly Hills 90210",
"phones":[{"number":"123456","name":"work"}, {"number":"654321","name":"mobile"}]
}
}
The above output example describes the required fields name and last name, and the not required address and phones (which is associated in a belongs_to-has_many relationship to the object). name, last_name and address are User's DB fields.
Playing with RABL I didn't manage so far to create this kind of structure.
Any suggestions? I'm looking for a DRY way to implement this for all my models.
RABL is really good in creating JSON structures on the fly, so I don't see why you couldn't achieve your goal. Did you try testing if a field is set to null-able in the schema, and thus presenting it as optional? It seems a good approach for me. For the nested children, just do the same, but extend the template for the children.
For example, in your father/show.rabl display a custom node :optional with all the properties that can be null.
Then, create a child/show.rabl with the same logic. Finally, go back to father/show.rabl and add a child node, extending the child/show.rabl template. This way you could achieve unlimited levels of "optionals".
Hope it helped you.
In this case I'd use the free form option.
From https://github.com/nesquena/rabl
There can also be odd cases where the root-level of the response
doesn't map directly to any object.
In those cases, object can be assigned to 'false'
and nodes can be constructed free-form.
object false
node(:some_count) { |m| @user.posts.count }
child(@user) { attribute :name }