Tree structure of data in REST - URL always from root? - api

Problem
When the data have a tree structure of parent/child/grandchild entities, we often duplicate the information in the URL, specifying parent IDs, even if that's not necessary. What's the best way to design the RESTful API in such case? Can the URLs be shortened and the parent IDs omitted?
Example
The tree is as follows: The top-most entity is a product. Each product has 0-N reviews. Each review can have 0-M comments attached. In theory, there can be an arbitrary depth of this tree.
The naive RESTful API would look like this (assuming only GET endpoints):
/products ... list of products
/products/123 ... specific product 123
/products/123/reviews ... list of reviews for product '123'
/products/123/reviews/abc ... specific review 'abc'
/products/123/reviews/abc/comments ... list comments for review 'abc'
Hang on, wait a minute... The last two labels I have written do not say anything about product '123'. Yes, the review 'abc' belongs to that product, but as a human, I don't need to know that. And if the review ID 'abc' is unique among all reviews, neither does the computer.
So for example when we send an update (PATCH) request for review 'abc', we don't need to know whole hierarchy of parent objects up to the tree root (products), e.g that it belongs to product '123' in this case. Of course, we assume each object has an unique ID among all objects of that entity - but that's a natural behavior for example in RDBs, so many people (well, their APIs) are in this situation.
Questions
If the IDs of "child entities" are unique among all entities of that type, would it be best practice to design the API like this?
/reviews/abc ... specific review 'abc'
/reviews/abc/comments ... list comments for review 'abc'
/comments/xyz ... specific comment 'xyz'
If answer to (1) is yes, should an endpoint like this be valid as well? Why? Why not?
/products/123/reviews/abc/comments/xyz ... specific comment 'xyz'
If short URLs are allowed (or even preferred), isn't this a bit inconsistent then?
/products/123/reviews ... list reviews for product '123'
/reviews/abc ... specific review 'abc'
/reviews ... what should be here? all reviews?

Yes.
Depends - I wouldn't recommend it, but if you find a use case for it, why not?
I see no inconsistency - yes, in this situation /reviews should be a list of all reviews in system, but if that makes no sense for your application, then /reviews can just yield a 404 and everything's fine.
Ideally, design of URLs should be decoupled from the rest of the REST API. That means, as far as your URLs are uniquely identifying your resources, they're (from purely theoretical point of view) "well designed".
But API is an interface and it should be treated as such. API is consumed by machines, but those machines are written by people, so in fact, design matters. It's the same reason why to have nice URLs on your blog - there is no technical reason for it, but it improves the experience of users if they want to read, share, remember or understand your URLs (you may say that Google searches for keywords in URLs and so it is a technical reason, but no, it's not - Google's bot is just one of your users - website consumers - and optimization for the bot is just like any other optimization for your users, thus it's interface design).
In case design of your URLs matters (for any reason), then in my opinion the best approach is to keep them simple. As simple, as you can. Your observation is very right - you don't need to mimic hierarchy of your resources or the way you store data in database. Eventually it would only get in your way and in a way of people who want to consume your API.
If a resource is uniquely identified within a collection by an ID, then design your URLs just /collection/{id}. Look how Facebook does it - majority of its API does exactly this. Structure of their URLs is pretty flat.
There doesn't even need to be a /collection resource for listing all existing objects. You can have them linked only from places, where it makes sense, like /products/123/reviews, where you can list links pointing to /reviews/{id}.
Why I think complicated URLs are bad?
Relations between resources are graphs and you can't put graphs to URLs
Putting other IDs and hierarchies into URLs makes things more complicated for no reason. Usually, hierarchies are not so simple in APIs - relations between resources are more often very complicated graphs, not simple trees. So don't put linking between resources into your URLs - there are better places (hypermedia formats, link headers, or at least linking by ID references) where to put information about relations and those are not limited to one string like URLs, so with them you can define relations better.
You're torturing your consumer by requesting too much parameters
By requiring more information in URL from consumer, you force him to remember all this context and all those IDs or know those values in advance. You require more (unnecessary) input, but in reality, there is no reason for consumer to remember product's ID just to check out one of its reviews.
Evolvability
In case your URLs are not decoupled well, you should really think of what happends if structure of your data changes in time. With simple URLs, nothing really happens. With complicated URLs, every time you change the way your API resources are related, you'll need to change also URLs so they keep up with your structure. And as everyone knows, changing URLs is hard - whether we are talking about web or APIs. Hypermedia somehow solves this, but even without hypermedia you can do at least so little that you keep your URLs light and as change-prone, as it gets.
Your design could look like this
/products/{id} - specific product, links to an endpoint with list of its reviews
/products/{id}/reviews - lists links to endpoints of reviews of the product
/reviews/{id} - specific review, should link to reviewed product and it could even link to the list above, if it seems to be useful for an API consumer
In fact, any of those resources can also link to any other thing in the system, if its useful or if there is a logical connection. Some linking systems (such as hypermedia) make understanding those links easier, because you can specify a rel attribute, which says to consumer where the link is pointing to (self points to itself, next could point to another page, etc.).
Of course, as always, it depends on your specific case. But generally, I'd recommend to keep URLs decoupled and simple. Also, I wouldn't recommend to to try to mirror any complicated relations or hierarchies in URLs.

As long as the URL can uniquely identify the resource, it is correct.
So the approaches in both Q-1) and Q-2) are fine to use and can be mixed. It is like provide different entry points to the same resource.
The answer to the question comes back to your business use-case. If there is no need for more than one entry points, should just stick with one and it will simplify the code.
To Q-3, ‘/reviews’ will mean all reviews. Also you don’t need to support that if there is no business use-case to get all reviews in your system.
Hope this help.

Related

Query String vs Resource Path for Filtering Criteria

Background
I have 2 resources: courses and professors.
A course has the following attributes:
id
topic
semester_id
year
section
professor_id
A professor has the the following attributes:
id
faculty
super_user
first_name
last_name
So, you can say that a course has one professor and a professor may have many courses.
If I want to get all courses or all professors I can: GET /api/courses or GET /api/professors respectively.
Quandary
My quandary comes when I want to get all courses that a certain professor teaches.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
I'm not sure which to use though.
Current solution
Currently, I'm using an augmented form of the latter. My reasoning is that it is more scale-able if I want to add in filtering/sorting criteria.
I'm actually encoding/embedding JSON strings into the query parameters. So, a (decoded) example might be:
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
The request above would retrieve all courses that were (or are currently being) taught by the professor with the provided professor_id in the year 2016, sorted according to topic title in ascending ASCII order.
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
Closing Questions
Is there a standard practice for using the query string vs the resource path for filtering criteria? What have some larger API's done in the past? Is it acceptable, or encouraged to use use both paradigms at the same time (make both endpoints available)? If I should indeed be using the second paradigm, is there a better organization method I could use besides encoding JSON? Has anyone seen another public API using JSON in their query strings?
Edited to be less opinion based. (See comments)
As already explained in a previous comment, REST doesn't care much about the actual form of the link that identifies a unique resource unless either the RESTful constraints or the hypertext transfer protocol (HTTP) itself is violated.
Regarding the use of query or path (or even matrix) parameters is completely up to you. There is no fixed rule when to use what but just individual preferences.
I like to use query parameters especially when the value is optional and not required as plenty of frameworks like JAX-RS i.e. allow to define default values therefore. Query parameters are often said to avoid caching of responses which however is more an urban legend then the truth, though certain implementations might still omit responses from being cached for an URI containing query strings.
If the parameter defines something like a specific flavor property (i.e. car color) I prefer to put them into a matrix parameter. They can also appear within the middle of the URI i.e. /api/professors;hair=grey/courses could return all cources which are held by professors whose hair color is grey.
Path parameters are compulsory arguments that the application requires to fulfill the request in my sense of understanding otherwise the respective method handler will not be invoked on the service side in first place. Usually this are some resource identifiers like table-row IDs ore UUIDs assigned to a specific entity.
In regards to depicting relationships I usually start with the 1 part of a 1:n relationship. If I face a m:n relationship, like in your case with professors - cources, I usually start with the entity that may exist without the other more easily. A professor is still a professor even though he does not hold any lectures (in a specific term). As a course wont be a course if no professor is available I'd put professors before cources, though in regards to REST cources are fine top-level resources nonetheless.
I therefore would change your query
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
to something like:
GET /api/professors/teacher45/courses;year=2016?sort=asc&onField=topic
I changed the semantics of your fields slightly as the year property is probably better suited on the courses rather then the professors resource as the professor is already reduced to a single resource via the professors id. The courses however should be limited to only include those that where held in 2016. As the sorting is rather optional and may have a default value specified, this is a perfect candidate for me to put into the query parameter section. The field to sort on is related to the sorting itself and therefore also belongs to the query parameters. I've put the year into a matrix parameter as this is a certain property of the course itself, like the color of a car or the year the car was manufactured.
But as already explained previously, this is rather opinionated and may not match with your or an other folks perspective.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
You could. Here are some things to consider:
Machines (in particular, REST clients) should be treating the URI as an opaque thing; about the closest they ever come to considering its value is during resolution.
But human beings, staring that a log of HTTP traffic, do not treat the URI opaquely -- we are actually trying to figure out the context of what is going on. Staying out of the way of the poor bastard that is trying to track down a bug is a good property for a URI Design to have.
It's also a useful property for your URI design to be guessable. A URI designed from a few simple consistent principles will be a lot easier to work with than one which is arbitrary.
There is a great overview of path segment vs query over at Programmers
https://softwareengineering.stackexchange.com/questions/270898/designing-a-rest-api-by-uri-vs-query-string/285724#285724
Of course, if you have two different URI, that both "follow the rules", then the rules aren't much help in making a choice.
Supporting multiple identifiers is a valid option. It's completely reasonable that there can be more than one way to obtain a specific representation. For instance, these resources
/questions/38470258/answers/first
/questions/38470258/answers/accepted
/questions/38470258/answers/top
could all return representations of the same "answer".
On the /other hand, choice adds complexity. It may or may not be a good idea to offer your clients more than one way to do a thing. "Don't make me think!"
On the /other/other hand, an api with a bunch of "general" principles that carry with them a bunch of arbitrary exceptions is not nearly as easy to use as one with consistent principles and some duplication (citation needed).
The notion of a "canonical" URI, which is important in SEO, has an analog in the API world. Mark Seemann has an article about self links that covers the basics.
You may also want to consider which methods a resource supports, and whether or not the design suggests those affordances. For example, POST to modify a collection is a commonly understood idiom. So if your URI looks like a collection
POST /api/professors/:prof_id/courses
Then clients are more likely to make the associate between the resource and its supported methods.
POST /api/courses?professor_id=:prof_id
There's nothing "wrong" with this, but it isn't nearly so common a convention.
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
I haven't either, but syntactically it looks a little bit like GraphQL. I don't see any reason why you couldn't represent a query that way. It would make more sense to me as a single query description, rather than breaking it into multiple parts. And of course it would need to be URL encoded, etc.
But I would not want to crazy with that right unless you really need to give to your clients that sort of flexibility. There are simpler designs (see Roman's answer)

Deciding on REST API Path Convention

I am working for classified section of my organization where I need to fetch cities available for this section. Classified cities are the subset of the cities. So, for fetching the cities I created the following API path.
api/classified/cities
Then I realize since the classified cities are the subset of cities the URL can be
api/cities/classifiedcities
which one of the above path should I use according to REST principals?
If you are aiming for REST principles, REST does not really have any principles how a URI might look like (as long as it identifies a "real" resource!). It does however say that it should be linked instead of hardcoded into the clients. This is why some people say URIs do not matter. They do not matter because clients should "discover" URIs instead of knowing them beforehand.
So why don't we all pick URIs like "/45ttfdfg/34tkfjdldf23wedkdfjsd"? That, again is up to personal preference really. It is nice if the URI is "readable" by humans. There are some (badly written) tools that assume some structure. There are some "REST" libraries (server and client ones) that also assume a bunch of stuff, for example the concept of "subresources" (which also does not come from REST).
To sum up: If you follow REST (you don't have to of course!), then clients should discover URIs. If that is the case, then it comes down to personal preference and maybe some technical restrictions with libraries/clients. So pick one which you prefer, don't worry about it! They can be modified later if needed anyway, since the server has control over its URIs.
Theoretically the subset should follow the parent set (so the first solution doesn't look too good).
One third approach would be through query parameters:
api/cities?type=classified

Rest Multiple parentId id's

I have a rest resource with the uri:
/invoices/1/invoiceitems/34
Now I have to add a child of invoice items called invoiceitempieces
Should this be done like this:
/invoices/1/invoiceitems/34/invoiceitempieces/7
That seems more organized than these options
/invoiceitems/34/invoiceitempieces/7
/invoiceitempieces/7
I prefer the example with all three IDs but is that best/acceptable practice?
While this seems the most appropriate, conceptually ...
/invoices/1/invoiceitems/34/invoiceitempieces/7
... you will find it rather difficult to implement (based on my own experience). You'll find that this ...
/invoices
/invoices/1
/invoices/1/invoiceitems
/invoiceitems/34
/invoiceitems/34/invoiceitempieces
/invoiceitempieces/7
... is just as useful and much easier to implement, even though it might be a bit less elegant.
Best practice is to use HATEOAS. RESTful URI is opaque.
A REST API must not define fixed resource names or hierarchies (an
obvious coupling of client and server). Servers must have the freedom
to control their own namespace. Instead, allow servers to instruct
clients on how to construct appropriate URIs, such as is done in HTML
forms and URI templates, by defining those instructions within media
types and link relations. [Failure here implies that clients are
assuming a resource structure due to out-of band information, such as
a domain-specific standard, which is the data-oriented equivalent to
RPC's functional coupling].
see this:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
I think you have to have Id and order for invoiceitems and invoiceitempieces id is unique and "order+invoice_id" combined are unique so you can access the invoiceitempieces throw 2 ways
/invoices/{invoice_id}/invoiceitems/{invoiceitems_order}/invoiceitempieces/{invoiceitempieces_order}
/invoiceitems/{invoiceitems_id}/invoiceitempieces/{invoiceitempieces_order}
/invoiceitempieces/{invoiceitempieces_id}
so you will have flexibility and also provide more information in the URI "the order" if it is required by client.

API object versioning

I'm building an API and I have a question about how to represent objects.
Imagine we have a system with Articles that have a bunch of properties. Some of these properties are complex, for example the Author of the Article refers to another object. We have an URL to fetch all the articles in the system, and another URL to fetch a particular Article.
My first approach to implement this would be to create two representations of the same object Article, because when you request all the articles, it makes sense that you don't retrieve all the information about the Articles, but for example just the title, the date and the name of the author (instead of the whole Author object), excluding other properties like tags, or the content. The idea beneath this is to try to make the response of all the Articles a little bit lighter.
Now I'm going to the client side, and I decide to implement a SDK for Android, for example. So the first step would be to create the objects to store the information that I retrieve from the API. Now a problem pops up, because I want to define the Article object, but I would need two versions of it and it's not only more difficult to implement, but it's going to be more difficult to use.
So my question is, when defining an API, is it a good practice to have multiple versions of the same object (maybe a light one, and a full one) to save some bandwidth when sending the result of a request but generating a more difficult to use service, or it's not worth it and you should retrieve always the same version of the object, generating heavier responses but making the service easier to use?
I work at a company that deals with Articles as well and we also have a REST API to expose the data.
I think you're on the right track, but I'll even take it one step further. These are the potential three calls for large entities in an API:
Index. For the articles, this would be something like /articles. It just returns a list of article ids. You can add parameters to filter, sort, etc. It's very lightweight and I've found it to be very useful.
Header/Mini/Light version. These are only the crucial fields that you think will meet the widest variety of use cases. For us, we have a lot of use cases where we might want to display the top 5 articles, and in those cases, only title, author and maybe publication date. Those fields belong in a "header" article, or a "light" article. This is especially useful for AJAX calls as you don't want to return the entire article (for us the object is quite large.)
Full version. This is the full article. All the text/paragraphs/image references - everything. It's a heavy call to make, but you will be guaranteed to get whatever is available.
Then it just takes discipline to leave the objects the way they are. Ideally users are able to get the version described in (2) to save time over the wire, but if they have to, they go with (3).
I've considered having a dynamic way to return only fields people are interested in, but it would be a lot of implementation. Basically the idea was to let the user go to /article and then show them a sample JSON result. Then the user could click on the fields they wanted returned and get a token. Then they'd pass the token as a parameter to the API and the API would then know which fields to return.
Creates a dynamic schema. Lots of work and I never got around to it, but you can see that if you want to be creative, you can.
Consider whether your data (for one API client) is changing a lot or not. If it's possible to cache data on the client, that'll improve performance by not contacting the API as much. Otherwise I think it's a good idea to have a light-weight and full-scale object type (or more like two views of the same object type).
In the client you should implement it as one object type (to keep it DRY; Don't Repeat Yourself) with all the properties. When fetching a light-weight object, you only store a few of the properties, the rest being null (or similar “undefined” value for the given property type). It should be possible to determine whether all or only a partial subset of the properties are loaded.
When making API requests in the client on a given model (ie. authors) you should be explicit about whether the light-weight or full-scale object is needed and whether cached data is acceptable. This makes it possible to control the data in the UI layer. For example a list of authors might only need to display a name and a number of articles connected with that author. When displaying the author screen, more properties are needed. Also, if using cached data, you should provide a way for the user to refresh it.
When the app works you can start to implement optimizations like: Don't fetch light-weight data if full-scala data is already known & Don't fetch data at all if a recent cache copy exists. I think the best is to look at the actual use cases and improve performance with the highest value for the user.

Does my API design violate RESTful principles?

I'm currently (I try to) designing a RESTful API for a social network. But I'm not sure if my current approach does still accord to the RESTful principles. I'd be glad if some brighter heads could give me some tips.
Suppose the following URI represents the name field of a user account:
people/{UserID}/profile/fields/name
But there are almost hundred possible fields. So I want the client to create its own field views or use predefined ones. Let's suppose that the following URI represents a predefined field view that includes the fields "name", "age", "gender":
utils/views/field-views/myFieldView
And because field views are kind of higher logic I don't want to mix support for field views into the "people/{UserID}/profile/fields" resource. Instead I want to do the following:
utils/views/field-views/myFieldView/{UserID}
Another example
Suppose we want to perform some quantity operations (hope that this is the right name for it in English). We have the following URIs whereas each of them points to a list of persons -- the friends of them:
GET people/exampleUID-1/relationships/friends
GET people/exampleUID-2/relationships/friends
And now we want to find out which of their friends are also friends of mine. So we do this:
GET people/myUID/relationships/intersections/{Value-1};{Value-2}
Whereas "{Value-1/2}" are the url encoded values of "people/exampleUID-1/friends" and "people/exampleUID-2/friends". And then we get back a representation of all people which are friends of all three persons.
Though Leonard Richardson & Sam Ruby state in their book "RESTful Web Services" that a RESTful design is somehow like an "extreme object oriented" approach, I think that my approach is object oriented and therefore accords to RESTful principles. Or am I wrong?
When not: Are such "object oriented" approaches generally encouraged when used with care and in order to avoid query-based REST-RPC hybrids?
Thanks for your feedback in advance,
peta
I've never worked with REST, but I'd have assumed that GETting a profile resource at '''/people/{UserId}/profile''' would yield a document, in XML or JSON or something, that includes all the fields. Client-side I'd then ignore the fields I'm not interested in. Isn't that much nicer than having to (a) configure a personalised view on the server or (b) make lots of requests to fetch each field?
Hi peta,
I'm still reading through RESTful Web Services myself, but I'd suggest a slightly different approach than the proposed one.
Regarding the first part of your post:
utils/views/field-views/myFieldView/{UserID}
I don't think that this is RESTful, as utils is not a resource. Defining custom views is OK, however these views should be (imho) a natural part of your API's URI scheme. To incorporate the above into your first URI example, I would propose one of the following examples instead of creating a special view for it:
people/{UserID}/profile/fields/name,age,gender/
people/{UserID}/profile/?fields=name,age,gender
The latter example considers fields as an input value for your algorithm. This might be a better approach than having fields in the URI as it is not a resource itself - it just puts constraints on the existing view of people/{UserID}/profile/. Technically, it's very similar as pagination, where you would limit a view by default and allow clients to browse through resources by using ?page=1, ?page=2 and so on.
Regarding the second part of your post:
This is a more difficult one to crack.
First:
Having intersection in the URI breaks your URI scheme a bit. It's not a resource by itself and also it sits on the same level as friends, whereas it would be more suitable one level below or as an input value for your algorithm, i.e.
GET people/{UserID}/relationships/friends/intersections/{Value-1};{Value-2}
GET people/{UserID}/relationships/friends/?intersections={Value-1};{Value-2}
I'm again personally inclined to the latter, because similarly as in the first case, you are just constraining the existing view of people/{UserID}/relationships/friends/
Secondly, regarding:
Whereas "{Value-1/2}" are the url
encoded values of
"people/exampleUID-1/friends" and
"people/exampleUID-2/friends"
If you meant that {Value-1/2} contain the whole encoded response of the mentioned GET requests, then I would avoid that - I don't think that the RESTful way. Since friends is a resource by itself, you may want to expose it and access it directly, i.e.:
GET friends/{UserID-1};{UserID-2};{UserID-3}
One important thing to note here - I've used ; between user IDs in the previous example, whereas I used , in the fields example above. The reasoning is that both represent a different operator. In the first case we needed OR (,) in order to get all three fields, while in the last example above we had to use AND (;) in order to get an intersection.
Usage of two types of operators can over-complicate the API design, but it should provide more flexibility in the end.
thanks for your clarifying answers. They are exactly what I was asking for. Unfortunately I hadn't the time to read "RESTful Web Services" from cover to cover; but I will catch it up as soon as possible. :-)
Regarding the first part of my post:
You're right. I incline to your first example, and without fields. I think that the I don't need it at all. (At the moment) Why do you suggest the use of OR (,) instead of AND (;)? Intuitively I'd use the AND operator because I want all three of them and not just the first one existing. (Like on page 121 the colorpairs example)
Regarding the second part:
With {Value-1/2} I meant only the url-encoded value of the URIs -- not their response data. :) Here I incline with you second example. Here it should be obvious that under the hood an algorithm is involed when calculating intersecting friends. And beside that I'm probably going to add some further operations to it.
peta