Writing an API, benefits of: including nested objects automatically, not at all, or provide a parameter to specify which to include? - api

For example, we have an entity called ServiceConfig that contains a pointer to a Service and a Professional. If returned without including them, the Service field would look like this:
{
'type': '__Pointer',
'className': 'Service',
'objectId': 'q92he840'
}
At which point they could query again to retrieve that service. However, it is often the case that they need the Service name. In which case it is inefficient to have to query again to get the service every time.
Options:
Automatically return the Service. In which case, we should automatically return the Industry for that Service as well in case they need that... same applies to all. Seems like we're returning data too often here.
Allow them to pass an includes parameter that specifies which entities to include. Format is an array of strings where using a . can allow them to include subclasses. In this case ['Professional', 'Service.Industry'] would work.
Can anyone identify why any one solution would be better than the others? I feel that the last solution is the best; however, it does not seem to be common in the APIs I've seen.

This is a good API Design decision to spend your time on before you release an initial version. Both your approaches are valid and it all depends on what you think are the most common ways that clients would use your API.
Here are some points that you could consider:
You might prefer the first approach where you do not give all the data upfront. Sometimes it is about efficiency and at times it is also about security and ensuring that any additional important data is only fetched on an as-needed basis and on authorization.
Implementing the 2nd approach is going to take more effort on the part of your team to design, code and test the API. So you might want to consider how much effort you want to put into release 1.0.
Since you have nested data, the second approach will serve you well. Several public APIs do that, as a matter of fact. For example, look at the LinkedIn public API and particularly the facets section, where you can specify the fields or additional information that you would like returned.
Look at some of the client applications that you have written; if you can identify for sure that some data is always needed upfront, that can help in designing the return data.
Eventually monitoring API usage and doing some analysis on the number of calls, methods invoked will give you good inputs on what to do next.
If I had to make a choice and have a little bit more leeway in terms of effort, I would go with the 2nd option, even if it is a simple version at first.
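A simple first version of the includes parameter could be sketched like this. It is only an illustration, not how any particular backend does it: the in-memory DB, the record contents, and the fetch/expand helper names are all made up, and only the pointer shape and the ['Professional', 'Service.Industry'] format come from the question.

```python
# Hypothetical in-memory "database": class name -> objectId -> record.
# Pointer fields use the pointer shape shown in the question.
DB = {
    "Industry": {"ind1": {"name": "Wellness"}},
    "Service": {
        "q92he840": {
            "name": "Massage",
            "Industry": {"type": "__Pointer", "className": "Industry", "objectId": "ind1"},
        }
    },
    "Professional": {"p1": {"name": "Alex"}},
    "ServiceConfig": {
        "sc1": {
            "Service": {"type": "__Pointer", "className": "Service", "objectId": "q92he840"},
            "Professional": {"type": "__Pointer", "className": "Professional", "objectId": "p1"},
        }
    },
}


def is_pointer(value):
    return isinstance(value, dict) and value.get("type") == "__Pointer"


def expand(record, path):
    """Return a copy of record with the pointer chain named by path resolved."""
    field, _, rest = path.partition(".")
    result = dict(record)
    value = result.get(field)
    if is_pointer(value):
        value = DB[value["className"]][value["objectId"]]
    if isinstance(value, dict):
        # Recurse for dotted paths such as "Service.Industry".
        result[field] = expand(value, rest) if rest else dict(value)
    return result


def fetch(class_name, object_id, includes=()):
    """Fetch a record; pointers stay pointers unless named in includes."""
    record = DB[class_name][object_id]
    for path in includes:
        record = expand(record, path)
    return record
```

Calling fetch("ServiceConfig", "sc1") leaves both pointers untouched, while passing ["Professional", "Service.Industry"] resolves the Professional, the Service, and the Service's Industry in one response.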

Related

API object versioning

I'm building an API and I have a question about how to represent objects.
Imagine we have a system with Articles that have a bunch of properties. Some of these properties are complex, for example the Author of the Article refers to another object. We have an URL to fetch all the articles in the system, and another URL to fetch a particular Article.
My first approach to implement this would be to create two representations of the same object Article, because when you request all the articles, it makes sense that you don't retrieve all the information about the Articles, but for example just the title, the date and the name of the author (instead of the whole Author object), excluding other properties like tags, or the content. The idea beneath this is to try to make the response of all the Articles a little bit lighter.
Now I'm going to the client side, and I decide to implement a SDK for Android, for example. So the first step would be to create the objects to store the information that I retrieve from the API. Now a problem pops up, because I want to define the Article object, but I would need two versions of it and it's not only more difficult to implement, but it's going to be more difficult to use.
So my question is, when defining an API, is it a good practice to have multiple versions of the same object (maybe a light one, and a full one) to save some bandwidth when sending the result of a request but generating a more difficult to use service, or it's not worth it and you should retrieve always the same version of the object, generating heavier responses but making the service easier to use?
I work at a company that deals with Articles as well and we also have a REST API to expose the data.
I think you're on the right track, but I'll even take it one step further. These are the potential three calls for large entities in an API:
Index. For the articles, this would be something like /articles. It just returns a list of article ids. You can add parameters to filter, sort, etc. It's very lightweight and I've found it to be very useful.
Header/Mini/Light version. These are only the crucial fields that you think will meet the widest variety of use cases. For us, we have a lot of use cases where we might want to display the top 5 articles, and in those cases, only title, author and maybe publication date. Those fields belong in a "header" article, or a "light" article. This is especially useful for AJAX calls as you don't want to return the entire article (for us the object is quite large.)
Full version. This is the full article. All the text/paragraphs/image references - everything. It's a heavy call to make, but you will be guaranteed to get whatever is available.
Then it just takes discipline to leave the objects the way they are. Ideally users are able to get the version described in (2) to save time over the wire, but if they have to, they go with (3).
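To make the three calls concrete, here is a rough sketch of the index, light and full views over the same stored article. The sample data, the field names and the function names are assumptions for illustration; a real API would sit behind HTTP routes such as /articles.

```python
# Hypothetical article store; field names are illustrative only.
ARTICLES = {
    1: {"id": 1, "title": "Hello", "date": "2020-01-01",
        "author": {"id": 7, "name": "Ann"}, "tags": ["intro"], "content": "Full text of article 1"},
    2: {"id": 2, "title": "Again", "date": "2020-02-01",
        "author": {"id": 8, "name": "Bob"}, "tags": [], "content": "Full text of article 2"},
}

# The "header" fields chosen to cover the widest variety of use cases.
LIGHT_FIELDS = ("id", "title", "date")


def index_view():
    """(1) Index: just the article ids."""
    return sorted(ARTICLES)


def light_view(article_id):
    """(2) Light version: crucial fields only, author flattened to a name."""
    article = ARTICLES[article_id]
    light = {key: article[key] for key in LIGHT_FIELDS}
    light["author_name"] = article["author"]["name"]
    return light


def full_view(article_id):
    """(3) Full version: everything that is available."""
    return dict(ARTICLES[article_id])
```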
I've considered having a dynamic way to return only fields people are interested in, but it would be a lot of implementation. Basically the idea was to let the user go to /article and then show them a sample JSON result. Then the user could click on the fields they wanted returned and get a token. Then they'd pass the token as a parameter to the API and the API would then know which fields to return.
Creates a dynamic schema. Lots of work and I never got around to it, but you can see that if you want to be creative, you can.
Consider whether your data (for one API client) is changing a lot or not. If it's possible to cache data on the client, that'll improve performance by not contacting the API as much. Otherwise I think it's a good idea to have a light-weight and full-scale object type (or more like two views of the same object type).
In the client you should implement it as one object type (to keep it DRY; Don't Repeat Yourself) with all the properties. When fetching a light-weight object, you only store a few of the properties, the rest being null (or similar “undefined” value for the given property type). It should be possible to determine whether all or only a partial subset of the properties are loaded.
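As a sketch of that single client-side type (in Python for brevity rather than Android's Java), something like the following could work. The property names, the is_full flag and the merge method are my own assumptions, not part of any SDK.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Article:
    """One client-side type for both the light and the full representation."""
    id: int
    title: Optional[str] = None
    author_name: Optional[str] = None
    content: Optional[str] = None   # stays None until the full object is fetched
    is_full: bool = False           # tells the UI whether all properties are loaded

    def merge(self, payload: dict, full: bool = False) -> None:
        """Copy known properties from an API payload onto this object."""
        known = {f.name for f in fields(self)} - {"id", "is_full"}
        for key, value in payload.items():
            if key in known:
                setattr(self, key, value)
        # Once fully loaded, a later light-weight refresh does not downgrade it.
        self.is_full = self.is_full or full
```

A list screen would call merge with the light payload, and the detail screen would check is_full before deciding whether another request is needed.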
When making API requests in the client on a given model (ie. authors) you should be explicit about whether the light-weight or full-scale object is needed and whether cached data is acceptable. This makes it possible to control the data in the UI layer. For example a list of authors might only need to display a name and a number of articles connected with that author. When displaying the author screen, more properties are needed. Also, if using cached data, you should provide a way for the user to refresh it.
When the app works you can start to implement optimizations like: don't fetch light-weight data if full-scale data is already known, and don't fetch data at all if a recent cached copy exists. I think the best approach is to look at the actual use cases and make the performance improvements with the highest value for the user.

How to design a REST API with LIKE criteria?

I'm designing a REST API and have an entity for "people":
GET http://localhost/api/people
Returns a list of all the people in the system
GET http://localhost/api/people/1
Returns the person with id 1.
GET http://localhost/api/people?forename=john&surname=smith
Returns all the people with matching forenames and surnames but I have a further requirement. What is the cleanest / best practice way of allowing API consumers to retrieve all the people whose forename starts with "jo" for example.
I've seen some APIs do this like:
GET http://localhost/api/people?forename=jo~&surname=smith
where the tilde signifies a "fuzzy" match. On the other hand I've seen it implemented with a totally different criteria e.g.
GET http://localhost/api/people?forename-startswith=jo&surname=smith
which seems a bit cumbersome considering I might have -endswith, -contains, -soundslike (for some sort of soundex match).
Can anyone suggest from experience which works better and also any examples of well designed REST APIs that have similar functionality.
IMHO it does not matter if you have fuzzy matches or have -endswith -contains etc. What matters is if your REST API permits easy parsing of such parameters so that you can define functions to fetch data from your data source (DB or xml file etc.) accordingly
If you are using PHP...from my experience, SlimFramework is a great light weight, easy-to-get-started solution.
I would recommend the OData protocol, which provides Query String Options. What you did is OK and follows REST conventions.
But, the OData protocol describes a $expand parameter and even a $filter parameter. This $ prefix denotes "System Query Options" and you will be interested in the last one because it allows you to write the following URI:
http://services.odata.org/Northwind/Northwind.svc/Customers?$filter=tolower(CompanyName) eq 'foobar'&$select=FirstName,LastName&$orderby=Name desc
It allows you to pass SQL like data, it can be a nice alternative to what you described (both solutions are fine, it's just a matter of taste).
AFAIK, none of the above are quite RESTful. Both of them rely on a priori knowledge on the client's part of how to invoke queries (in the first case a query pattern, and in the second a query DSL). In the second example, in fact, the API is reduced to a mere wrapper around the data store. As such, the API does not define a server domain - it is a data provider. This is in contrast to the client-server constraint of REST.
If you need to expose a full-blown data store with all its various querying capabilities, you had better stick to known standards, for which we have OData. OData has been sold as REST but many REST-heads have problems with it. Anyhow, at the end of the day it works and REST discussions can commonly lead to analysis-paralysis.
If I were doing this, I would probably constrain the API to a common use-case, so something more like the second one without defining a query DSL (hence forenameStartsWith rather than forename-startswith).
Having said that, if you need to query based on many fields and various conditions, I would use OData.
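For the constrained, non-DSL route, a small fixed set of suffix operators in the style of the question's forename-startswith can be parsed in a few lines. This is only a sketch: the operator table, the case-insensitive matching and the sample people are assumptions of mine, and unknown operators such as soundslike are simply rejected.

```python
from urllib.parse import parse_qsl

# Supported operators; anything else is rejected rather than guessed at.
OPERATORS = {
    "equals": lambda actual, wanted: actual == wanted,
    "startswith": lambda actual, wanted: actual.startswith(wanted),
    "endswith": lambda actual, wanted: actual.endswith(wanted),
    "contains": lambda actual, wanted: wanted in actual,
}


def parse_filters(query_string):
    """Turn 'forename-startswith=jo&surname=smith' into (field, op, value) triples."""
    filters = []
    for key, value in parse_qsl(query_string):
        field, _, op = key.partition("-")
        op = op or "equals"  # a bare field name means an exact match
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        filters.append((field, op, value.lower()))
    return filters


def matches(person, filters):
    """True if the person satisfies every filter (case-insensitive, by assumption)."""
    return all(
        OPERATORS[op](str(person.get(field, "")).lower(), value)
        for field, op, value in filters
    )


# Hypothetical data standing in for the people resource.
PEOPLE = [
    {"id": 1, "forename": "John", "surname": "Smith"},
    {"id": 2, "forename": "Joanna", "surname": "Smith"},
    {"id": 3, "forename": "Joe", "surname": "Bloggs"},
]


def search(query_string):
    filters = parse_filters(query_string)
    return [person for person in PEOPLE if matches(person, filters)]
```

With this, search("forename-startswith=jo&surname=smith") returns the two Smiths, and adding another operator is one more entry in the table.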
Both examples use query parameters for filtering. I don't think it matters what these query parameters are called or if some wildcard syntax is used.
Both approaches are equally RESTful.

How to handle complex availability of information in OOP from a RESTful API

My issue is that I'm dealing with a RESTful API that returns information about objects, and when writing classes to represent them, I'm not sure how best to handle all the possibilities of the status of each variable's availability. From what I can tell, there are 5 possibilities: The information
is available
has not been requested
is currently being requested (asynchronously)
is unavailable
is not applicable
So with these, having an object represent its data with a value or null doesn't cut it. To give a more concrete example, I'm working with an API about the United States Congress, so the problem goes as thus:
I request information about a bill, and it contains a stub about the sponsoring legislator.
I eventually need to request all the information about that legislator. Not all the legislators will have all the information. Those in the House of Representatives won't have a senate class (Senators' six-year terms are staggered so a third expire every two years, the House is entirely re-elected every two years). Some won't have a twitter id, just because they don't have one. And, of course, if I have already requested information, I shouldn't try to request it again.
There's a couple options I see:
I can create a Legislator object and fill it with what information I have, but then I have to have some mechanism of tracking information availability with the getters and setters. This is kind of what I'm doing right now, but it requires a lot of repeated code.
I could create a separate class for abbreviated objects and replace them when I get more with immutable "complete" objects, but then I have to be really careful about replacing all references to them and also go through a bunch of hoops for unavailable, and especially, not applicable information.
So, I'm just wondering what other people's take on this issue is. Are there other (better?) ways of handling this complexity? What are the advantages and drawbacks of different approaches? What should I consider about what I'm trying to do in choosing an approach?
[Note: I'm working in Objective-C, but this isn't necessarily specific to that language.]
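To illustrate the first option without the repeated getter and setter code, the five states can live in one small wrapper type that every property reuses. This sketch is in Python rather than Objective-C, and the Availability, Field and Legislator names, as well as the chamber argument, are hypothetical.

```python
from enum import Enum


class Availability(Enum):
    """The five possibilities listed above."""
    AVAILABLE = "available"
    NOT_REQUESTED = "not requested"
    REQUESTING = "requesting"
    UNAVAILABLE = "unavailable"
    NOT_APPLICABLE = "not applicable"


class Field:
    """A value plus the reason it may be missing."""

    def __init__(self, state=Availability.NOT_REQUESTED, value=None):
        self.state = state
        self.value = value

    def set(self, value):
        self.state, self.value = Availability.AVAILABLE, value

    def mark(self, state):
        """Record why there is no value (requesting, unavailable, not applicable)."""
        self.state, self.value = state, None

    @property
    def needs_request(self):
        # Only fields nobody has asked for yet should trigger a new request.
        return self.state is Availability.NOT_REQUESTED


class Legislator:
    def __init__(self, name, chamber):
        self.name = Field(Availability.AVAILABLE, name)
        self.twitter_id = Field()
        self.senate_class = Field()
        if chamber == "house":
            # Representatives never have a senate class.
            self.senate_class.mark(Availability.NOT_APPLICABLE)
```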
If you want to treat those remote resources as objects on the client side, then do yourself a huge favour and forget about the REST buzzword. You will drive yourself crazy. Just accept that you are doing HTTP RPC and move on as you would doing any other RPC project.
However, if you really want to do REST, you need to understand what is meant by the "State Transfer" part of the REST acronym and you need to read about HATEOAS. It is a huge mental shift for building clients, but it does have a bunch of benefits. But maybe you don't need those particular benefits.
What I do know is that if you are trying to use a "REST API" to retrieve objects over the wire, you are going to come to the conclusion that REST is a load of crap.
It's an interesting question, but I think you're probably overthinking this a bit.
Firstly, I think you're considering the possible states of information a bit too much; consider the more basic consideration that you either have the information or you don't. WHY you have the information doesn't really matter, except in one case. Let me explain; if the information about a certain bill or legislator or anything is not applicable, you shouldn't be requesting it / needing it. That "state" is irrelevant. Similarly, if the information is in the process of being requested, then it is simply not yet available; the only state you really care about is whether you have the information or if you do not yet have the information.
If you start worrying about further depths of the request process, you risk getting into a deep, endless cycle of managing state; has the information changed between when I got it and now? All you can know about the information is whether you've been told what it is. This is fundamental to the REST process; you're getting a REPRESENTATION of the underlying data, but make no mistake about it; the representation is NOT the underlying data, any more than a congressman's name is the congressman himself.
Second, don't worry about information availability. If an object has a subobject, when you query the object, query for the subobject. If you get back data, great. If you get back that the data isn't available, that too is a representation of the subobject's data; it's just a different representation than you were hoping for, but it's equally valid. I'd represent that as an object with a null value; the object exists (was instantiated because it belonged to the parent), but you have no valid data about it (the representation returned was empty due to some reason; lack of availability, server down, data changed; whatever).
Finally, the real key here is that you need to be remembering that a RESTful structure is driven by hypermedia; a request to an object that does not return the full object's data should return an URI for requesting the subobject's data; and so forth. The key here is that those structures aren't static, like your object structure seems to be hoping to treat them; they're dynamic, and it's up to the server to determine the representation (i.e., the interrelationship). Attempting to define that in stone with a concrete object representation ahead of time means that you're dealing with the system in a way that REST was never meant to be dealt with.

Does my API design violate RESTful principles?

I'm currently trying to design a RESTful API for a social network. But I'm not sure if my current approach still conforms to RESTful principles. I'd be glad if some brighter heads could give me some tips.
Suppose the following URI represents the name field of a user account:
people/{UserID}/profile/fields/name
But there are almost a hundred possible fields. So I want the client to create its own field views or use predefined ones. Let's suppose that the following URI represents a predefined field view that includes the fields "name", "age", "gender":
utils/views/field-views/myFieldView
And because field views are kind of higher logic I don't want to mix support for field views into the "people/{UserID}/profile/fields" resource. Instead I want to do the following:
utils/views/field-views/myFieldView/{UserID}
Another example
Suppose we want to perform some quantity operations (hope that this is the right name for it in English). We have the following URIs whereas each of them points to a list of persons -- the friends of them:
GET people/exampleUID-1/relationships/friends
GET people/exampleUID-2/relationships/friends
And now we want to find out which of their friends are also friends of mine. So we do this:
GET people/myUID/relationships/intersections/{Value-1};{Value-2}
Whereas "{Value-1/2}" are the url encoded values of "people/exampleUID-1/friends" and "people/exampleUID-2/friends". And then we get back a representation of all people which are friends of all three persons.
Though Leonard Richardson & Sam Ruby state in their book "RESTful Web Services" that a RESTful design is somehow like an "extreme object oriented" approach, I think that my approach is object oriented and therefore accords to RESTful principles. Or am I wrong?
If not: are such "object oriented" approaches generally encouraged when used with care and in order to avoid query-based REST-RPC hybrids?
Thanks for your feedback in advance,
peta
I've never worked with REST, but I'd have assumed that GETting a profile resource at /people/{UserId}/profile would yield a document, in XML or JSON or something, that includes all the fields. Client-side I'd then ignore the fields I'm not interested in. Isn't that much nicer than having to (a) configure a personalised view on the server or (b) make lots of requests to fetch each field?
Hi peta,
I'm still reading through RESTful Web Services myself, but I'd suggest a slightly different approach than the proposed one.
Regarding the first part of your post:
utils/views/field-views/myFieldView/{UserID}
I don't think that this is RESTful, as utils is not a resource. Defining custom views is OK, however these views should be (imho) a natural part of your API's URI scheme. To incorporate the above into your first URI example, I would propose one of the following examples instead of creating a special view for it:
people/{UserID}/profile/fields/name,age,gender/
people/{UserID}/profile/?fields=name,age,gender
The latter example considers fields as an input value for your algorithm. This might be a better approach than having fields in the URI as it is not a resource itself - it just puts constraints on the existing view of people/{UserID}/profile/. Technically, it's very similar to pagination, where you would limit a view by default and allow clients to browse through resources by using ?page=1, ?page=2 and so on.
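On the server side, handling ?fields=name,age,gender can be as small as a projection over the full profile. A minimal sketch, assuming a made-up PROFILE record and a hypothetical select_fields helper (rejecting unknown field names is my own choice, not something the question requires):

```python
# Hypothetical full profile; a real one would have close to a hundred fields.
PROFILE = {"name": "peta", "age": 30, "gender": "m", "city": "Vienna"}


def select_fields(profile, fields_param=None):
    """Constrain the profile view to the comma-separated fields, if given."""
    if not fields_param:
        return dict(profile)  # no constraint: the default view
    wanted = [name.strip() for name in fields_param.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in profile]
    if unknown:
        raise KeyError(f"unknown fields: {', '.join(unknown)}")
    return {name: profile[name] for name in wanted}
```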
Regarding the second part of your post:
This is a more difficult one to crack.
First:
Having intersection in the URI breaks your URI scheme a bit. It's not a resource by itself and also it sits on the same level as friends, whereas it would be more suitable one level below or as an input value for your algorithm, i.e.
GET people/{UserID}/relationships/friends/intersections/{Value-1};{Value-2}
GET people/{UserID}/relationships/friends/?intersections={Value-1};{Value-2}
Again, I'm personally inclined to the latter because, as in the first case, you are just constraining the existing view of people/{UserID}/relationships/friends/
Secondly, regarding:
Whereas "{Value-1/2}" are the url encoded values of "people/exampleUID-1/friends" and "people/exampleUID-2/friends"
If you meant that {Value-1/2} contain the whole encoded response of the mentioned GET requests, then I would avoid that - I don't think that's the RESTful way. Since friends is a resource by itself, you may want to expose it and access it directly, i.e.:
GET friends/{UserID-1};{UserID-2};{UserID-3}
One important thing to note here - I've used ; between user IDs in the previous example, whereas I used , in the fields example above. The reasoning is that both represent a different operator. In the first case we needed OR (,) in order to get all three fields, while in the last example above we had to use AND (;) in order to get an intersection.
Usage of two types of operators can over-complicate the API design, but it should provide more flexibility in the end.
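The two operators map directly onto set operations, which a short sketch can show. The FRIENDS data and the friends_of helper are invented for the example, and mixing both operators in one expression is deliberately left unsupported here.

```python
# Hypothetical friend lists keyed by user id.
FRIENDS = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"c", "e"},
}


def friends_of(expr):
    """Evaluate 'u1;u2' as AND (intersection) or 'u1,u2' as OR (union)."""
    if ";" in expr and "," in expr:
        raise ValueError("mixing ';' and ',' is not supported in this sketch")
    if ";" in expr:
        return set.intersection(*(FRIENDS[uid] for uid in expr.split(";")))
    return set.union(*(FRIENDS[uid] for uid in expr.split(",")))
```

So friends/u1;u2;u3 would resolve to friends_of("u1;u2;u3"), the people who are friends of all three users.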
Thanks for your clarifying answers. They are exactly what I was asking for. Unfortunately I haven't had the time to read "RESTful Web Services" from cover to cover, but I will catch up on it as soon as possible. :-)
Regarding the first part of my post:
You're right. I incline to your first example, and without fields. I think that I don't need it at all (at the moment). Why do you suggest the use of OR (,) instead of AND (;)? Intuitively I'd use the AND operator because I want all three of them and not just the first one existing. (Like the colorpairs example on page 121.)
Regarding the second part:
With {Value-1/2} I meant only the url-encoded value of the URIs -- not their response data. :) Here I incline to your second example. It should be obvious that under the hood an algorithm is involved when calculating intersecting friends. And besides that I'm probably going to add some further operations to it.
peta

Is this API too simple?

There are a multitude of key-value stores available. Currently you need to choose one and stick with it. I believe an independent open API, not made by a key-value store vendor would make switching between stores much easier.
Therefore I'm building a datastore abstraction layer (like ODBC but focused on simpler key-value stores) so that someone can build an app once, and change key-value stores if necessary. Is this API too simple?
get(Key)
set(Key, Value)
exists(Key)
delete(Key)
As all the APIs I have seen so far seem to add so much I was wondering how many additional methods were necessary?
I have received some replies saying that set(null) could be used to delete an item and that if get returns null then this means that an item doesn't exist. This is bad for two reasons. Firstly, it is not good to mix return types and statuses, and secondly, not all languages have the concept of null. See:
Do all programming languages have a clear concept of NIL, null, or undefined?
I do want to be able to perform many types of operation on the data, but as I understand it everything can be built up on top of a key value store. Is this correct? And should I provide these value added functions too? e.g: like mapreduce, or indexes
Internally we already have a basic version of this in Erlang and Ruby and it has saved us a lot of time, and also enabled us to test performance for specific use cases of different key-value stores.
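As a rough illustration of the four-method interface with status kept separate from the value, here is a sketch in Python (not the Erlang or Ruby version mentioned above). The KeyValueStore, MemoryStore and KeyNotFound names are hypothetical; a missing key is signalled by an exception, so null stays an ordinary storable value.

```python
from abc import ABC, abstractmethod


class KeyNotFound(KeyError):
    """Raised by get() so that 'missing' is never confused with a stored value."""


class KeyValueStore(ABC):
    """The four-method interface; each backend adapter would implement it."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def exists(self, key): ...

    @abstractmethod
    def delete(self, key): ...


class MemoryStore(KeyValueStore):
    """In-memory stand-in for a real key-value store adapter."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        try:
            return self._data[key]
        except KeyError:
            raise KeyNotFound(key) from None

    def set(self, key, value):
        self._data[key] = value  # None is stored like any other value

    def exists(self, key):
        return key in self._data

    def delete(self, key):
        self._data.pop(key, None)  # deleting a missing key is a no-op here
```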
Do only what is absolutely necessary. Instead of asking if it is too simple, ask if it is too much, even if it only has one method.
Your API lacks some useful functions like "hasKey" and "clear". You might want to look at, say, Python's hack at it, http://docs.python.org/tutorial/datastructures.html#dictionaries, and pick and choose additional functions.
Everyone is saying, "simple is good" and that's true until "simple is too simple."
If all you are doing is getting, setting, and deleting keys, this is fine.
There is no such thing as "too simple" for an API. The simpler the better! If it solves the need the way it is, then leave it.
The delete method is unnecessary. You can just pass null to set.
Edited to add:
I'm only kidding! I would keep delete, and probably add Count, Contains, and maybe an enumerator (or two).
When creating an API, you need to ask yourself, what does my API provide the user. If your API is so simplistic that it is faster and easier for your client to write their own app, then your API has failed. Ask yourself, does my functionality give them specific benefits. If the answer is no, it is too simplistic and generic.
I am all for simplifying an interface to its bare minimum but without having more details about the requirements of the system, it is tough to tell if this interface is sufficient. Sure looks concise enough though.
Don't forget to document the semantics for "key non-existent" as it isn't clear from reading your API definition above. updated: I see you have added the exists method: is this necessary? you could use the get method and define a NIL of some sort, no?
Maybe worth thinking about: how about considering "freshness" of a value? i.e. an associated "last-modified" timestamp? Of course, it depends on your system requirements.
What about access control? Is it within scope of the API definition?
What about iterating through the keys? If there is a possibility of a large set, you might want to include some pagination semantics.
As mentioned, the simpler the better, but a simple iterator or key-listing method could be of use. I always end up needing to iterate through the set. A "size()" method too, if not taken care of by the iterator. It obviously depends on your usage, though.
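If key listing is added, cursor-style pagination keeps it usable for large sets. The sketch below assumes the store can enumerate its keys and that they are sortable strings; the list_keys name and the after/limit parameters are made up for illustration.

```python
import bisect


def list_keys(store, after=None, limit=100):
    """Return (page_of_keys, next_cursor); next_cursor is None on the last page."""
    ordered = sorted(store)  # iterating a dict-like store yields its keys
    # Resume just past the last key the client has already seen.
    start = bisect.bisect_right(ordered, after) if after is not None else 0
    page = ordered[start:start + limit]
    next_cursor = page[-1] if start + limit < len(ordered) else None
    return page, next_cursor
```

A client keeps calling list_keys with the returned cursor until it comes back as None; a size() method would simply be the number of keys in the store.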
It's not too simple, it's beautiful. If "exists(key)" is just a convenient shorthand for "get(Key) != null", you should consider removing it. I guess that depends on how large or complex the value you get() is.