Designing an efficient REST API

I'm trying to design a REST API over HTTP. I am totally new to this, so please tell me if any of my assumptions or ideas are just plain wrong.
The domain is minimalistic. I have a database of Products and for each Product there is an associated Image. As I see it, I can design my API in one of two ways:
1. I can bundle each image with its product and represent them as one resource. The downside of this API is that every time you PUT or GET a product, you have to send the image over the wire, even if you don't need to read or change the image. As I understand it, it would not be RESTful to PUT or GET anything less than a complete representation of a resource. Also, client-side caching of images would be of no use in this scenario.
2. I can model Products and Images as two different resources. When you GET a Product, it will contain an image_id which can be used to GET an Image. This model requires two HTTP requests: one to GET the Product and one to GET its corresponding Image. Not so bad, maybe, but what if I want to display a list of all Products along with their Images? Then I suddenly have a lot of HTTP requests, and over SSL I suspect this could become a performance issue. The upside, though, is that consumers of my API could choose to cache images client-side.
So, how can I model my API to be both RESTful and efficient?

It's good that you're thinking about the data model.
Related to that, REST doesn't specify or imply that the data model must be completely de-normalized.
Typically, when you GET a resource, you receive a packet of information that also includes references to other related resources, like a product image. It could also include a reference to a product category, a product manufacturer, and so on. Each reference might be a URL, or an ID from which a URL can be derived. A message like this:
{
"id": 123456,
"description" : "Your basic paperweight",
"category" : { "id" : 17717, "name" : "Home furnishings" },
"manufacturer" : { "id" : 78783, "name" : "Boeing" },
"price" : 1.99,
"imageId" : 109101
}
...might imply URLs like this:
http://api.mycompany.com/product/123456
http://api.mycompany.com/category/17717
http://api.mycompany.com/manufacturer/78783
http://api.mycompany.com/image/109101
...and note that the full representation of the linked-to resources, like category, manufacturer and so on, is not transmitted with the original resource. This is a partially de-normalized data model.
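A minimal Python sketch of the client side of this idea, assuming every linked resource follows the /{type}/{id} pattern shown above (the helper related_urls is hypothetical, not part of any framework). It derives the four URLs from the IDs in the representation without fetching anything:

```python
import json

# Base URL from the example above; the /{type}/{id} pattern is an assumption.
BASE_URL = "http://api.mycompany.com"


def related_urls(product: dict) -> dict:
    """Derive the URLs of a product and the resources it references (hypothetical helper)."""
    return {
        "self": f"{BASE_URL}/product/{product['id']}",
        "category": f"{BASE_URL}/category/{product['category']['id']}",
        "manufacturer": f"{BASE_URL}/manufacturer/{product['manufacturer']['id']}",
        "image": f"{BASE_URL}/image/{product['imageId']}",
    }


payload = """{
  "id": 123456,
  "description": "Your basic paperweight",
  "category": {"id": 17717, "name": "Home furnishings"},
  "manufacturer": {"id": 78783, "name": "Boeing"},
  "price": 1.99,
  "imageId": 109101
}"""

urls = related_urls(json.loads(payload))
for rel, url in urls.items():
    print(f"{rel}: {url}")
```

Only the category and manufacturer names travel with the product; the image is a separate resource that the client fetches, and can cache, only when it needs it.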
In regard to your comments on PUT:
This is a matter of opinion, but... for many developers it's completely acceptable to allow partial update via PUT. So you could update a resource without specifying everything; existing fields would remain unchanged. If you choose this behavior, it can complicate your (server-side) code when dealing with edge cases. For example, how does a client indicate that they want to erase or delete a field? (Passing null may work, but for some data, null is a meaningful value.)
Why worry about PUT? If you want partial update, it's easy to use POST with a verb (e.g., "partialUpdate") in the query parameters. Roy Fielding has argued that POST is fine for operations that don't fit the other methods, and this makes sense to me.
A partial update would then be something like this:
POST /products/123456?action=partialUpdate
*headers*
{
"description" : "A fabulous paperweight designed in Sweden, now at a new low price.",
"price" : 1.78
}
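One way a server might apply that partialUpdate body, sketched in Python under stated assumptions: fields missing from the body keep their stored values, and an explicit null erases a field except where null is itself meaningful. The partial_update function and the discontinuedOn field are hypothetical, used only to show the null edge case mentioned above.

```python
import copy

# Hypothetical policy: for these fields null is a meaningful value and is stored as-is.
NULLABLE_FIELDS = {"discontinuedOn"}


def partial_update(current: dict, changes: dict) -> dict:
    """Apply a partialUpdate body to a stored product and return the new version.

    Fields missing from `changes` keep their stored values. A field explicitly set to
    None (JSON null) is erased, unless null is a meaningful value for that field.
    """
    updated = copy.deepcopy(current)
    for field, value in changes.items():
        if value is None and field not in NULLABLE_FIELDS:
            updated.pop(field, None)  # null means "erase this field"
        else:
            updated[field] = value
    return updated


product = {"id": 123456, "description": "Your basic paperweight", "price": 1.99,
           "discontinuedOn": "2011-01-01"}
body = {"description": "A fabulous paperweight designed in Sweden, now at a new low price.",
        "price": 1.78}
result = partial_update(product, body)
print(result)
```

Whatever rule you pick for null, it has to be documented per field, which is exactly the extra server-side complexity described above.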

I would use option 2, but instead of image_id, store the image URL. Also, don't be afraid to use custom scripts to return what you need (for example, displaying ALL products and images). REST is a design GOAL, not necessarily an implementation truth. Your design would still be RESTful.

I agree with the other 2 answers and I think that you should choose option number 2. But you also asked about getting a list of products, so here is my answer regarding it.
Consider adding another resource, available via GET only, that returns a list of products. That way, consuming the list takes only one HTTP request. If the list could be very large, you will need to implement some kind of paging mechanism.
For example, let's say that you need to return 2500 products but you have decided to return no more than 1000 per request. The first GET request would return the first 1000 items and also include in the response the URL of the next "page", in this case products 1001-2000. The second request would return those products along with a URL to the next "page", in this case the last 500 products.
The consumer could then fetch the images as well if needed. You could use this list mechanism for images too, but each "page" of images would need to be significantly smaller; on balance, I would not recommend the list mechanism for consuming images.
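A rough Python sketch of the paging scheme described above, using the 2500-product example with a cap of 1000 items per page. The offset/limit parameter names, the /products URL and the fetch stand-in for a real HTTP GET are assumptions for illustration; the point is that the client only follows the next URL and never computes page boundaries itself:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical collection URL, parameter names and page-size cap.
PRODUCTS_URL = "http://api.mycompany.com/products"
MAX_PAGE_SIZE = 1000
PRODUCTS = [{"id": i} for i in range(1, 2501)]  # 2500 products, as in the example


def get_page(products: list, offset: int, limit: int) -> dict:
    """Server side: one page of products plus the URL of the next page (None at the end)."""
    limit = min(limit, MAX_PAGE_SIZE)
    items = products[offset:offset + limit]
    next_offset = offset + limit
    next_url = None
    if next_offset < len(products):
        next_url = f"{PRODUCTS_URL}?{urlencode({'offset': next_offset, 'limit': limit})}"
    return {"items": items, "next": next_url}


def fetch(url: str) -> dict:
    """Stand-in for an HTTP GET: read offset/limit from the URL and call the server."""
    query = parse_qs(urlparse(url).query)
    offset = int(query.get("offset", ["0"])[0])
    limit = int(query.get("limit", [str(MAX_PAGE_SIZE)])[0])
    return get_page(PRODUCTS, offset, limit)


# Client side: follow the "next" links until there are none left.
url = PRODUCTS_URL
page_sizes = []
while url is not None:
    page = fetch(url)
    page_sizes.append(len(page["items"]))
    url = page["next"]
print(page_sizes)
```

Because the server owns the next URL, it can later switch to cursor-based paging without breaking clients that simply follow the link.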

Related

API - do I need the parent resource?

A person can have many reviews. My endpoint to CREATE a new review is:
POST /person/{id}/reviews
How about the endpoint to UPDATE a review? I see two options:
Stick to the parent resource: PATCH /person/{person_id}/reviews/{id}
Only have reviews in the URI: PATCH /reviews/{id}
I could be sold on using either of them:
It's consistent with the previously defined endpoint, but {person_id} is not needed.
It's 'efficient' as we're not specifying a parameter ({person_id}) that is not really needed. However, it breaks the API convention.
Which one is preferable and why?
The client shouldn't have to know about ids at all. After a client creates the review, the response should include the URI to the new review like this:
HTTP/1.1 201 Created
Location: /person/4/reviews/5
The client now has the full URL to the review, so what the URL looks like, and what information it contains, becomes irrelevant.
Don't forget that the URL itself is a system for creating globally unique IDs that embed not just their own unique identity but also information on how to access the data. If you introduce separate 'id' and 'person_id' fields, you are not taking advantage of how the web is supposed to work.
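A small Python illustration of the client side, assuming a hypothetical host api.example.com: the client resolves the Location header (which may be relative) against the URL it posted to, using the standard library's urljoin, rather than building the URL from ids:

```python
from urllib.parse import urljoin

# Hypothetical host; the Location value matches the 201 response above.
request_url = "https://api.example.com/person/4/reviews"  # where the POST was sent
response_status = 201
response_headers = {"Location": "/person/4/reviews/5"}

# The client does not assemble the review URL from ids; it resolves the
# (possibly relative) Location header against the URL it posted to.
review_url = urljoin(request_url, response_headers["Location"])
print(response_status, review_url)
# Any later PATCH or DELETE for this review is simply sent to review_url.
```

If the server later moves reviews to a flat /reviews/{id} scheme, this client keeps working unchanged.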
In terms of API design, without knowing too much detail about OP's situation I'd walk along these guideposts:
Only have reviews in the URI: patch /reviews/{id}
It's 'efficient' as we're not specifying a parameter ({person_id})
that is not really needed. However, it breaks the API convention
The "efficiency" allows for a more flexible design. No existing API convention is broken at this point. Moreover, this approach frees you from always needing the parent resource ID whenever you display your items.
Stick to the parent resource: patch /person/{person_id}/reviews/{id}
It's consistent with the previously defined endpoint, but {person_id}
is not needed.
The consistency aspect here can be neglected. It's not beneficial to design endpoints similarly to other endpoints just because the previous ones were designed in a certain way.
The key when deciding one way or the other is the intent you communicate and the restrictions that this places on the endpoint.
The crucial question here is:
Can the reviews ever exist on their own or will they always have a parent person?
If you don't know for sure, go for the more flexible design: PATCH /reviews/{id}
If you do know for sure that a review will always be bound to a particular person and can never have a null person_id in the database, then you could embed it right into your endpoint design: PATCH /person/{person_id}/reviews/{id}
By the way, the same is true for other endpoints, like the creation endpoint POST /person/{person_id}/reviews. Having an endpoint like this removes the flexibility of creating reviews without a person, which may or may not be desirable.
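To make the trade-off concrete, here is a hypothetical Python handler (the REVIEWS store and patch_review are illustrative, not from any framework) that serves both URI designs. The nested form adds a restriction: it only finds reviews that belong to the person in the path, so a review without a person is reachable only through /reviews/{id}:

```python
import re

# Hypothetical in-memory store: review id -> data. person_id is None when a
# review is allowed to exist without a person.
REVIEWS = {
    5: {"person_id": 4, "text": "Great host"},
    6: {"person_id": None, "text": "Anonymous review"},
}

FLAT = re.compile(r"^/reviews/(\d+)$")
NESTED = re.compile(r"^/person/(\d+)/reviews/(\d+)$")


def patch_review(path: str, changes: dict) -> int:
    """Apply PATCH to a review via either URI design; return an HTTP status code."""
    if m := FLAT.match(path):
        person_id, review_id = None, int(m.group(1))
    elif m := NESTED.match(path):
        person_id, review_id = int(m.group(1)), int(m.group(2))
    else:
        return 404
    review = REVIEWS.get(review_id)
    # The nested URI carries an extra restriction: the review must belong to that person.
    if review is None or (person_id is not None and review["person_id"] != person_id):
        return 404
    review.update(changes)
    return 200


print(patch_review("/reviews/6", {"text": "Edited"}))          # flat URI reaches it
print(patch_review("/person/4/reviews/6", {"text": "Edited"}))  # nested URI cannot
```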

REST API Resource Granularity

I wanted to get opinions on resource granularity. Say I have a domain entity called "magazines". But there are different types of magazines, including Sports, Nature, Automobiles, Computers, Aeroplanes, etc.
When I want to create a new "sports" magazine, should I use a construct such as:
PUT /magazines
PUT /sports-magazines
PUT /magazines/sports
When I want to get a specific sports magazine, should I be saying:
GET /magazines/{id}
GET /sports-magazines/{id}
GET /magazines/sports/{id}
If I want to get sports magazines for the year 2001, should I be using:
GET /magazines?type=sports&year=2001
GET /sports-magazines?year=2001
GET /magazines/sports?year=2001
And finally, if I want to return how many pages each type of magazine has for the January 2001 publication, how would I do that? Do I need to create a new pages resource for that? Or make two independent requests, or something else? Some candidates are listed below:
GET /magazines/pages?type1=sports&type2=nature&year=2001&month=01
GET /sports-magazines/pages?type=nature&year=2001&month=01
Given these scenarios how would you model your resources?
I have a domain entity called "magazines". But there are different types of magazines, including Sports, Nature, Automobiles, Computers, Aeroplanes, etc.
Important thing to understand: resources aren't domain entities. Your resource model is a facade that sits in front of your domain model.
Notice, for example, that this resource (REST API Resource Granularity) describes not only your question, but also my answer.
PUT is probably NOT what you want for "create a new resource" unless the client is already in a position to know what URI should be used for the new resource. The target URI of a PUT request is the same URI that we expect to use later to GET the data:
PUT /magazines/{id}
GET /magazines/{id}
In the case where we don't expect the client to know what the URI is going to be... well, we don't have an HTTP method that means precisely that, so we fall back to using POST (see Fielding, 2009).
POST /magazines
201 Created
Location: /magazines/12345
Note that the machines don't care if the URI of the created resource(s) match the target URI of the POST request.
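A minimal in-memory Python sketch of the difference, assuming a hypothetical MagazineStore and an arbitrary starting id of 12345: PUT targets a URI the client already knows and is idempotent, while POST lets the server pick the URI and report it in Location, so repeating it creates another resource:

```python
import itertools


class MagazineStore:
    """Hypothetical in-memory store illustrating PUT vs POST for creation."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(12345)  # server-assigned ids (arbitrary start)

    def put(self, magazine_id: int, body: dict) -> int:
        """PUT /magazines/{id}: the client chose the URI; create or replace. Idempotent."""
        status = 200 if magazine_id in self.items else 201
        self.items[magazine_id] = body
        return status

    def post(self, body: dict) -> tuple:
        """POST /magazines: the server chooses the URI and reports it in Location."""
        new_id = next(self._ids)
        while new_id in self.items:  # skip ids already taken via PUT
            new_id = next(self._ids)
        self.items[new_id] = body
        return 201, {"Location": f"/magazines/{new_id}"}


store = MagazineStore()
first_put = store.put(7, {"title": "Sports Weekly"})
second_put = store.put(7, {"title": "Sports Weekly"})  # same request repeated
first_post = store.post({"title": "Nature Monthly"})
second_post = store.post({"title": "Nature Monthly"})  # a repeated POST creates another
print(first_put, second_put, first_post, second_post)
```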
REST really doesn't care what spelling conventions you use for your resource identifiers (in much the same way that the machines don't care what spelling conventions you use for variable names).
GET /magazines?type=sports&year=2001
GET /sports-magazines?year=2001
GET /magazines/sports?year=2001
GET /magazines/sports/year=2001
GET /magazines/sports/2001
Those are all fine; there are trade-offs. Key-value pairs encoded into a query string make it easier to create URIs with HTML forms; path segments make relative resolution easier.
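The trade-off can be seen with Python's standard urllib.parse module (the host api.example.com is a placeholder): urlencode turns key/value pairs into a query string, much as an HTML form does, while urljoin resolves relative references against path-segment URIs:

```python
from urllib.parse import urlencode, urljoin

BASE = "http://api.example.com"  # placeholder host

# Query-string style: built from key/value pairs, the way an HTML GET form does.
query_url = f"{BASE}/magazines?{urlencode({'type': 'sports', 'year': 2001})}"

# Path-segment style: relative references resolve against the current URI.
current = f"{BASE}/magazines/sports/2001"
next_year = urljoin(current, "2002")
other_type = urljoin(current, "../nature/2001")

print(query_url)   # http://api.example.com/magazines?type=sports&year=2001
print(next_year)   # http://api.example.com/magazines/sports/2002
print(other_type)  # http://api.example.com/magazines/nature/2001
```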
I want to return how many pages each type of magazine has for January 2001 publication
Creating a new URI with that information is fine. Extending the schema of your existing resources to include that information is also fine.

RESTfully creating object graphs

I'm trying to wrap my head around how to design a RESTful API for creating object graphs. For example, think of an eCommerce API, where resources have the following relationships:
Order (the main object)
Has-many Addresses
Has-many Order Line items (what does the order consist of)
Has-many Payments
Has-many Contact Info
The Order resource usually makes sense only together with its associations; in isolation, it's just a dumb container with no business significance. However, each of the associated objects has a life of its own and may need to be manipulated independently, e.g. editing the shipping address of an order, changing the contact info for an order, or removing a line item from an order after it has been placed.
There are two options for designing the API:
The Order API endpoint intelligently creates itself AND its associated resources by processing "nested resource" in the content sent to POST /orders
The Order resource only creates itself and the client has to make follow-up POST requests to newly created endpoints, like POST /orders/123/addresses, PUT /orders/123/line-items/987, etc.
While the second option is simpler to implement at the server-side, it makes the client do extra work for 80% of the use-cases.
The first option has the following open questions:
How does one communicate the URL for the newly created resource? The Location header can communicate only one URL, however the server would've potentially created multiple resources.
How does one deal with errors? What if one of the associations has an error? Do we reject the entire object graph? How is that error communicated to the client?
What's the RESTful + pragmatic way of dealing with this?
I handle this the first way. You should not assume that a client will make all the requests it needs to. Create all the entities in one request.
Depending on your use case, you may also want to enforce an all-or-nothing approach when creating the entities; i.e., if something fails, everything rolls back. You can do this with a database transaction (which you can't do if everything is done through separate requests). Whether this is the behavior you want depends very much on your situation. For instance, if you are creating an order you may wish to employ it (you don't want to create an order that's missing items), whereas if you are uploading photos it may be fine not to.
For returning the links to the client, I always return a JSON object. You could easily populate this object with links to each of the resources created. This way the client can determine how to behave after a successful post.
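A minimal sketch of the all-or-nothing approach in Python with the standard sqlite3 module. The table and column names, and the shape of the JSON body with its links object, are assumptions for illustration. The whole graph is inserted inside one transaction, and the response body carries a link to each created resource, which also addresses how to report several new URLs at once:

```python
import sqlite3


def create_order(conn: sqlite3.Connection, order: dict) -> dict:
    """Create an order and its line items in one all-or-nothing transaction.

    Returns a JSON-ready body with a link to every resource that was created.
    """
    with conn:  # commits if the block succeeds, rolls back if any statement raises
        cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", (order["customer"],))
        order_id = cur.lastrowid
        links = {"order": f"/orders/{order_id}", "line-items": []}
        for item in order["line_items"]:
            cur = conn.execute(
                "INSERT INTO line_items (order_id, sku, quantity) VALUES (?, ?, ?)",
                (order_id, item["sku"], item["quantity"]),
            )
            links["line-items"].append(f"/orders/{order_id}/line-items/{cur.lastrowid}")
    return {"links": links}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL,"
    " sku TEXT NOT NULL, quantity INTEGER NOT NULL CHECK (quantity > 0))"
)

created = create_order(conn, {
    "customer": "alice",
    "line_items": [{"sku": "A1", "quantity": 2}, {"sku": "B7", "quantity": 1}],
})
print(created)

# A bad line item makes the whole graph fail; the order row is rolled back too.
rejected = None
try:
    create_order(conn, {"customer": "bob", "line_items": [{"sku": "C3", "quantity": 0}]})
except sqlite3.IntegrityError as exc:
    rejected = str(exc)  # report this to the client, e.g. in a 4xx response body
```

The server could still send a Location header for the order itself; the links in the body cover the nested resources.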
Both options can be implemented RESTful. You ask:
How does one communicate the URL for the newly created resource? The Location header can communicate only one URL, however the server would've potentially created multiple resources.
This is done the same way you communicate links to other resources in the GET case: use link elements, or whatever your method is, to embed the URL of a resource into a representation.

How to return the total number of entries in a JSON API that paginates with the Link header

I'm starting to implement a REST API. One of my requests targets a resource with multiple entries. To implement pagination, I'm following the approach GitHub chose.
I define an HTTP Link header containing the next/previous/first/last links:
Link: <https://api.github.com/repos?page=3&per_page=100>; rel="next",
<https://api.github.com/repos?page=50&per_page=100>; rel="last"
My body contains only the entries and nothing else. But now I want to know how many entries there are in total. I can't multiply the number of pages by per_page, because the result is not exact.
So how can I return this number? I'm thinking of adding a new HTTP header, X-total-entries, to my response, but I don't know whether there is a better technique.
When I try to decide whether to put some data into the headers or into the body, I ask myself whether it is a feature of the application or of the protocol. In your case, is pagination a feature of the application? Is the user aware of which page they are looking at? Is the total number of items displayed to the user? If the answer is yes, then I would put the information into the body. The body then becomes not just a list of items, but a representation of a page, with all the information and controls needed to display it. Only if pagination is an internal protocol detail would I consider putting the links and the item count into the headers. I know this may sound like a rather abstract way of thinking, but if the pagination details need to bubble up all the way to the top of the application, there is little real benefit in separating this information from the body and putting it into the headers.
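A Python sketch of the two placements discussed above, producing both for the same page so they can be compared. The collection URL is a placeholder, and the page/per_page parameters and X-total-entries header follow the question rather than any standard; only the Link header syntax comes from RFC 8288:

```python
from urllib.parse import urlencode

BASE = "https://api.example.com/repos"  # hypothetical collection URL


def page_url(page: int, per_page: int) -> str:
    return f"{BASE}?{urlencode({'page': page, 'per_page': per_page})}"


def paginate(items: list, page: int, per_page: int):
    """Build both response styles for one page (page numbers start at 1)."""
    total = len(items)
    last = max(1, -(-total // per_page))  # ceiling division
    links = {"first": page_url(1, per_page), "last": page_url(last, per_page)}
    if page > 1:
        links["prev"] = page_url(page - 1, per_page)
    if page < last:
        links["next"] = page_url(page + 1, per_page)
    entries = items[(page - 1) * per_page:page * per_page]

    # Protocol-level style: only entries in the body; paging data in headers.
    headers = {
        "Link": ", ".join(f'<{url}>; rel="{rel}"' for rel, url in links.items()),
        "X-total-entries": str(total),
    }
    # Application-level style: the body is a representation of the page itself.
    body = {"entries": entries, "total": total, "page": page, "links": links}
    return headers, body


headers, body = paginate(list(range(4990)), page=2, per_page=100)
print(headers["Link"])
print(body["total"], len(body["entries"]))
```

Note that the body version carries exactly the same information; moving it there costs nothing and lets the UI render "page 2 of 50" without reading headers.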

The REST-way to check/uncheck like/unlike favorite/unfavorite a resource

Currently I am developing an API, and within that API I want signed-in users to be able to like/unlike or favorite/unfavorite two resources.
My "Like" model (it's a Ruby on Rails 3 application) is polymorphic and belongs to two different resources:
/api/v1/resource-a/:id/likes
and
/api/v1/resource-a/:resource_a_id/resource-b/:id/likes
The thing is: I am unsure which way to choose to make my resources as RESTful as possible. I have already tried the following two ways to implement the like/unlike structure in my URLs:
Case A: (like/unlike being the member of the "resource")
PUT /api/v1/resource/:id/like maps to Api::V1::ResourceController#like
PUT /api/v1/resource/:id/unlike maps to Api::V1::ResourceController#unlike
and case B: ("likes" is a resource of its own)
POST /api/v1/resource/:id/likes maps to Api::V1::LikesController#create
DELETE /api/v1/resource/:id/likes maps to Api::V1::LikesController#destroy
In both cases I already have a user session, so I don't have to mention the id of the corresponding "like"-record when deleting/"unliking".
I would like to know how you guys have implemented such cases!
Update April 15th, 2011: By "session" I mean an HTTP Basic Authentication header sent with each request, carrying the base64-encoded username:password combination.
I think the fact that you're maintaining application state on the server (a user session that contains the user id) is one of the problems here. It makes this a lot more difficult than it needs to be, and it breaks REST's statelessness constraint.
In Case A, you've given URIs to operations, which again is not RESTful. URIs identify resources and state transitions should be performed using a uniform interface that is common to all resources. I think Case B is a lot better in this respect.
So, with these two things in mind, I'd propose something like:
PUT /api/v1/resource/:id/likes/:userid
DELETE /api/v1/resource/:id/likes/:userid
We also have the added benefit that a user can only register one 'Like' (they can repeat that 'Like' as many times as they like, and since the PUT is idempotent it has the same result no matter how many times it's performed). DELETE is also idempotent, so if an 'Unlike' operation is repeated many times for some reason then the system remains in a consistent state. Of course you can implement POST in this way, but if we use PUT and DELETE we can see that the rules associated with these verbs seem to fit our use-case really well.
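A minimal Python sketch of these semantics, assuming a hypothetical in-memory Likes store keyed by (resource id, user id). Returning 201 for a new like and 204 otherwise, and 204 for every DELETE, are choices made for this example (returning 404 for a repeated DELETE is also common); either way the stored state is the same no matter how often a request is repeated:

```python
class Likes:
    """Hypothetical store behind /api/v1/resource/:id/likes/:userid."""

    def __init__(self):
        self._likes = set()  # (resource_id, user_id) pairs

    def put(self, resource_id: int, user_id: int) -> int:
        """PUT: register the like; repeating it leaves the same state."""
        created = (resource_id, user_id) not in self._likes
        self._likes.add((resource_id, user_id))
        return 201 if created else 204

    def delete(self, resource_id: int, user_id: int) -> int:
        """DELETE: remove the like; removing a missing like is a no-op."""
        self._likes.discard((resource_id, user_id))
        return 204

    def count(self, resource_id: int) -> int:
        """Number of distinct users who like the resource."""
        return sum(1 for rid, _ in self._likes if rid == resource_id)


likes = Likes()
statuses = [likes.put(42, 7) for _ in range(3)]  # the same 'Like' sent three times
print(statuses, likes.count(42))
```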
I can also imagine another useful request:
GET /api/v1/resource/:id/likes/:userid
That would return details of a 'Like', such as the date it was made or the ordinal (i.e. 'This was the 50th like!').
Case B is better, and the GitHub API has a good example of it:
Star a repo
PUT /user/starred/:owner/:repo
Unstar a repo
DELETE /user/starred/:owner/:repo
You are in effect defining a "like" resource, a fact that a user resource likes some other resource in your system. So in REST, you'll need to pick a resource name scheme that uniquely identifies this fact. I'd suggest (using songs as the example):
/like/user/{user-id}/song/{song-id}
Then PUT establishes a liking, and DELETE removes it. GET of course finds out if someone likes a particular song. And you could define GET /like/user/{user-id} to see a list of the songs a particular user likes, and GET /like/song/{song-id} to see a list of the users who like a particular song.
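A hypothetical Python store for this naming scheme (the LikeIndex class is illustrative only) keeps an index in each direction, so both list queries are cheap. Because the user id is part of the resource name, any client can ask about any user's likes:

```python
from collections import defaultdict


class LikeIndex:
    """Hypothetical store behind /like/user/{user-id}/song/{song-id}."""

    def __init__(self):
        self.songs_by_user = defaultdict(set)
        self.users_by_song = defaultdict(set)

    def put(self, user_id: str, song_id: str) -> None:
        """PUT /like/user/{user-id}/song/{song-id}: establish the liking."""
        self.songs_by_user[user_id].add(song_id)
        self.users_by_song[song_id].add(user_id)

    def delete(self, user_id: str, song_id: str) -> None:
        """DELETE on the same URI: remove the liking."""
        self.songs_by_user[user_id].discard(song_id)
        self.users_by_song[song_id].discard(user_id)

    def exists(self, user_id: str, song_id: str) -> bool:
        """GET /like/user/{user-id}/song/{song-id}: does this user like this song?"""
        return song_id in self.songs_by_user.get(user_id, set())

    def songs_liked_by(self, user_id: str) -> list:
        """GET /like/user/{user-id}"""
        return sorted(self.songs_by_user.get(user_id, set()))

    def users_who_like(self, song_id: str) -> list:
        """GET /like/song/{song-id}"""
        return sorted(self.users_by_song.get(song_id, set()))


index = LikeIndex()
index.put("alice", "song-1")
index.put("bob", "song-1")
index.put("alice", "song-2")
index.put("alice", "song-1")  # repeating a PUT changes nothing
print(index.songs_liked_by("alice"), index.users_who_like("song-1"))
```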
If you assume the user name is established by the existing session, as @joelittlejohn points out, and is not part of the like resource name, then you're violating REST's statelessness constraint and you lose some very important advantages. For instance, a user can only get their own likes, not their friends' likes. It also breaks HTTP caching, because one user's likes are indistinguishable from another's.