How to design a REST API with LIKE criteria? - api

I'm designing a REST API and have an entity for "people":
GET http://localhost/api/people
Returns a list of all the people in the system
GET http://localhost/api/people/1
Returns the person with id 1.
GET http://localhost/api/people?forename=john&surname=smith
Returns all the people with matching forenames and surnames but I have a further requirement. What is the cleanest / best practice way of allowing API consumers to retrieve all the people whose forename starts with "jo" for example.
I've seen some APIs do this like:
GET http://localhost/api/people?forename=jo~&surname=smith
where the tilde signifies a "fuzzy" match. On the other hand I've seen it implemented with a totally different criteria e.g.
GET http://localhost/api/people?forename-startswith=jo&surname=smith
which seems a bit cumbersome considering I might have -endswith, -contains, -soundslike (for some sort of soundex match).
Can anyone suggest from experience which works better and also any examples of well designed REST APIs that have similar functionality.

IMHO it does not matter if you have fuzzy matches or have -endswith -contains etc. What matters is if your REST API permits easy parsing of such parameters so that you can define functions to fetch data from your data source (DB or xml file etc.) accordingly
If you are using PHP...from my experience, SlimFramework is a great light weight, easy-to-get-started solution.

I would recommend you the OData protocol which provides a Query String Options. What you did is ok and follows REST conventions.
But, the OData protocol describes a $expand parameter and even a $filter parameter. This $ prefix denotes "System Query Options" and you will be interested in the last one because it allows you to write the following URI:
http://services.odata.org/Northwind/Northwind.svc/Customers?$filter=tolower(CompanyName) eq 'foobar' &select=FirstName,LastName&$orderby=Name desc
It allows you to pass SQL like data, it can be a nice alternative to what you described (both solutions are fine, it's just a matter of taste).

AFAIK, none of above are quite RESTful. Both of them rely on a priory knowledge on the client's part on how to invoke queries (in the first case, query pattern and on the second one a query DSL). In the second example, in fact, the API is reduced mere to a wrapper around the data store. As such, API does not define a server domain - it is a data provider. This is in contrast to the client-server constraint of REST.
If you need to expose a full-blown data store with all various querying capabilities, you had better stick to known standards which we have OData. OData has been sold as REST but many REST-heads have problems with it. Anyhow, at the end of the day it works and REST discussions can commonly lead to analysis-paralysis.
If I was doing this, I would probably constraint the API to a common use-case, so something more like the second one without defining a query DSL (hence forenameStartsWith rather than forename-startswith).
Having said that, if you need to query based on many fields and various conditions, I would use OData.

Both examples use query parameters for filtering. I don't think it matters what these query parameters are called or if some wildcard syntax is used.
Both approaches are equally RESTFul.

Related

Query String vs Resource Path for Filtering Criteria

Background
I have 2 resources: courses and professors.
A course has the following attributes:
id
topic
semester_id
year
section
professor_id
A professor has the the following attributes:
id
faculty
super_user
first_name
last_name
So, you can say that a course has one professor and a professor may have many courses.
If I want to get all courses or all professors I can: GET /api/courses or GET /api/professors respectively.
Quandary
My quandary comes when I want to get all courses that a certain professor teaches.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
I'm not sure which to use though.
Current solution
Currently, I'm using an augmented form of the latter. My reasoning is that it is more scale-able if I want to add in filtering/sorting criteria.
I'm actually encoding/embedding JSON strings into the query parameters. So, a (decoded) example might be:
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
The request above would retrieve all courses that were (or are currently being) taught by the professor with the provided professor_id in the year 2016, sorted according to topic title in ascending ASCII order.
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
Closing Questions
Is there a standard practice for using the query string vs the resource path for filtering criteria? What have some larger API's done in the past? Is it acceptable, or encouraged to use use both paradigms at the same time (make both endpoints available)? If I should indeed be using the second paradigm, is there a better organization method I could use besides encoding JSON? Has anyone seen another public API using JSON in their query strings?
Edited to be less opinion based. (See comments)
As already explained in a previous comment, REST doesn't care much about the actual form of the link that identifies a unique resource unless either the RESTful constraints or the hypertext transfer protocol (HTTP) itself is violated.
Regarding the use of query or path (or even matrix) parameters is completely up to you. There is no fixed rule when to use what but just individual preferences.
I like to use query parameters especially when the value is optional and not required as plenty of frameworks like JAX-RS i.e. allow to define default values therefore. Query parameters are often said to avoid caching of responses which however is more an urban legend then the truth, though certain implementations might still omit responses from being cached for an URI containing query strings.
If the parameter defines something like a specific flavor property (i.e. car color) I prefer to put them into a matrix parameter. They can also appear within the middle of the URI i.e. /api/professors;hair=grey/courses could return all cources which are held by professors whose hair color is grey.
Path parameters are compulsory arguments that the application requires to fulfill the request in my sense of understanding otherwise the respective method handler will not be invoked on the service side in first place. Usually this are some resource identifiers like table-row IDs ore UUIDs assigned to a specific entity.
In regards to depicting relationships I usually start with the 1 part of a 1:n relationship. If I face a m:n relationship, like in your case with professors - cources, I usually start with the entity that may exist without the other more easily. A professor is still a professor even though he does not hold any lectures (in a specific term). As a course wont be a course if no professor is available I'd put professors before cources, though in regards to REST cources are fine top-level resources nonetheless.
I therefore would change your query
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
to something like:
GET /api/professors/teacher45/courses;year=2016?sort=asc&onField=topic
I changed the semantics of your fields slightly as the year property is probably better suited on the courses rather then the professors resource as the professor is already reduced to a single resource via the professors id. The courses however should be limited to only include those that where held in 2016. As the sorting is rather optional and may have a default value specified, this is a perfect candidate for me to put into the query parameter section. The field to sort on is related to the sorting itself and therefore also belongs to the query parameters. I've put the year into a matrix parameter as this is a certain property of the course itself, like the color of a car or the year the car was manufactured.
But as already explained previously, this is rather opinionated and may not match with your or an other folks perspective.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
You could. Here are some things to consider:
Machines (in particular, REST clients) should be treating the URI as an opaque thing; about the closest they ever come to considering its value is during resolution.
But human beings, staring that a log of HTTP traffic, do not treat the URI opaquely -- we are actually trying to figure out the context of what is going on. Staying out of the way of the poor bastard that is trying to track down a bug is a good property for a URI Design to have.
It's also a useful property for your URI design to be guessable. A URI designed from a few simple consistent principles will be a lot easier to work with than one which is arbitrary.
There is a great overview of path segment vs query over at Programmers
https://softwareengineering.stackexchange.com/questions/270898/designing-a-rest-api-by-uri-vs-query-string/285724#285724
Of course, if you have two different URI, that both "follow the rules", then the rules aren't much help in making a choice.
Supporting multiple identifiers is a valid option. It's completely reasonable that there can be more than one way to obtain a specific representation. For instance, these resources
/questions/38470258/answers/first
/questions/38470258/answers/accepted
/questions/38470258/answers/top
could all return representations of the same "answer".
On the /other hand, choice adds complexity. It may or may not be a good idea to offer your clients more than one way to do a thing. "Don't make me think!"
On the /other/other hand, an api with a bunch of "general" principles that carry with them a bunch of arbitrary exceptions is not nearly as easy to use as one with consistent principles and some duplication (citation needed).
The notion of a "canonical" URI, which is important in SEO, has an analog in the API world. Mark Seemann has an article about self links that covers the basics.
You may also want to consider which methods a resource supports, and whether or not the design suggests those affordances. For example, POST to modify a collection is a commonly understood idiom. So if your URI looks like a collection
POST /api/professors/:prof_id/courses
Then clients are more likely to make the associate between the resource and its supported methods.
POST /api/courses?professor_id=:prof_id
There's nothing "wrong" with this, but it isn't nearly so common a convention.
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
I haven't either, but syntactically it looks a little bit like GraphQL. I don't see any reason why you couldn't represent a query that way. It would make more sense to me as a single query description, rather than breaking it into multiple parts. And of course it would need to be URL encoded, etc.
But I would not want to crazy with that right unless you really need to give to your clients that sort of flexibility. There are simpler designs (see Roman's answer)

REST API filter operator best practice

I am building a REST API that uses a filter parameter to control search results. E.g., one could search for a user by calling:
GET /users/?filter=name%3Dfoo
Now, my API should allow many different filter operators. Numeric operators such as equals, greater than, less than, string operators like contains, begins with or ends with and date operators such as year of or timediff. Moreover, AND and OR combinations should be possible.
Basically, I want to support a subset of the underlying MySQL database operators.
I found a lot of different implementations (two good examples are Google Analytics and LongJump) that seem to use custom syntax.
Looking at my requirements, I would probably design a custom syntax pretty similiar to the MySQL operator syntax.
However, I was wondering if there are any best practices established that I should follow and whether I should consider anything else. Thanks!
You need an already existing query language, don't try to reinvent the wheel! By REST this is complicated and not fully solved issue. There are some REST constraints your application must fulfill:
uniform interface / hypermedia as the engine of application state:
You have to send hypermedia responses to your clients, and they have to follow the hyperlinks given in those responses, instead of building the requests on their own. So you can decouple the clients from the structure of the URI.
uniform interface / self-descriptive messages:
You have to send messages annotated with semantics. So you can decouple the clients from the data structure. The best solution to do this is RDF with for example open linked data vocabs. If you don't want to use RDF, then the second best solution to use a vendor specific MIME type, so your messages will be self-descriptive, but the clients need to know how to parse your custom MIME type.
To describe simple search links, you can use URI templates, for example GET /users/{?name} will wait a name parameter in the query string. You can use the hydra:IRITemplateMapping from the hydra vocab to add semantics to the paramers like name.
Describing ad-hoc queries is a hard task. You have to describe somehow what your query can contain.
You can choose an URI query language and stick with URI templates and probably hydra annotation. There are many already existing URI query languages, like HTSQL, OData query (ppl don't like that one), etc...
You can choose an existing query language and send it in a single URI param. This can be anything you want, for example SQL, SPARQL, etc... You have to teach your client to generate that param. You can create your own vocab to describe the constraints of the actual query. If you don't need complicated things, this should not be a problem. I don't know of already existing query structure descibing vocabs, but I never looked for them...
You can choose an existing query language and send it in the body in a SEARCH request. Afaik SEARCH is not cached or supported by recent HTTP clients. It was defined by webdav. You can describe your query with the proper MIME type, and you can use the same vocab as by the previous solution.
You can use an RDF query solution, for example a SPARQL endpoint, or triple pattern fragments, etc... So your queries will contain the semantic metadata, and not your link description. By SPARQL you don't necessary need a triple data storage, you can translate the queries on server side to SQL, or whatever you use. You can probably use SPIN to describe query constraints and query templates, but that is new for me too. There might be other solutions to describe SPARQL query structures...
So to summarize if you want a real REST solution, you have to describe to your clients, how they can construct the queries and what parameters, logical operators they can use. Without query descriptions they won't be able to generate for example a HTML form for the user. If you don't want a REST solution, then pick a query language write a builder on the client, write a parser on the server and that's all.
The Open Data Protocol (OData)
You can check BreezeJs too and see how this protocol it's implemented for node.js + mongodb with breeze-mongodb module and for a .NET project using Web API and EntityFramework with Breeze.ContextProvider dll.
By embracing a set of common, accepted delimiters, equality comparison can be implemented in
straight-forward fashion. Setting the value of the filter query-string parameter to a string using those
delimiters creates a list of name/value pairs which can be parsed easily on the server-side and utilized
to enhance database queries as needed. You can use the delimeters of your choice say (“|”) to separate individual filter phrases for OR and ("&") to separate
individual filter phrases for AND and a double colon (“::”) to separate the names and values.
This provides a unique-enough set of delimiters to support the majority of use cases and creates a user readable
query-string parameter. A simple example will serve to clarify the technique. Suppose we want
to request users with the name “Todd” who live in "Denver" and have the title of “Grand Poobah”.
The request URI, complete with query-string might look like this:
GET http://www.example.com/users?filter="name::todd&city::denver&title::grand poobah”
The delimiter of the double colon (“::”) separates the property name from the comparison value,
enabling the comparison value to contain spaces—making it easier to parse the delimiter from the value
on the server.
Note that the property names in the name/value pairs match the name of the properties that would be
returned by the service in the payload.
Case sensitivity is certainly up for debate on a case-by-case basis, but in general,
filtering works best when case is ignored. You can also offer wild-cards as needed using the asterisk
(“*”) as the value portion of the name/value pair.
For queries that require more-than simple equality or wild-card comparisons, introduction of operators
is necessary. In this case, the operators themselves should be part of the value and parsed on the server
side, rather than part of the property name. When complex query-language-style functionality is
needed, consider introducing query concept from the Open Data Protocol (OData) Filter System Query
Option specification (http://www.odata.org/documentation/odata-version-4-0/)
There seems to be a lot of standards (like OData), but many are quite complicated in that they introduce new syntax.
For simple multi filtering the following format avoid polluting the parameter namespace while still standing on top of existing web-technology
GET /users?filter[name]=John&filter[title]=Manager
It's easily readable and on the backend languages like PHP will receive it as an array of filters to apply.
A possible standard would SCIM which is adopted by some commercial products. But it's not distinguished by brevity. For a pet project I used this
= equal
! not equal
* like
< smaller
> greater
& bitwise and 
| bitwise or
^ bitwise xor
~ in comma separated value list
Examples
So GET /user?name=*An* would get all users whose name start with An and GET /user?name=~Anna,Bertha would get those two users.
Not yet a standard but who knows...

Writing an API, benefits of: including nested objects automatically, not at all, or provide a parameter to specify which to include?

For example, we have an entity called ServiceConfig that contains a pointer to a Service and a Professional. If returned without including the fields would look like this:
{
'type': '__Pointer',
'className': 'Service',
'objectId': 'q92he840'
}
At which point they could query again to retrieve that service. However, it is often the case that they need the Service name. In which case it is inefficient to have to query again to get the service every time.
Options:
Automatically return the Service. In which case, we should automatically return the Industry for that Service as well in case they need that... same applies to all. Seems like we're returning data too often here.
Allow them to pass an includes parameter that specifies which entities to include. Format is an array of strings where using a . can allow them to include subclasses. In this case ['Professional', 'Service.Industry'] would work.
Can anyone identify why any one solution would be better than the others? I feel that the last solution is the best, however it does not seem to be common to do to in the APIs I've seen.
This is a good API Design decision to spend your time on before you release an initial version. Both your approaches are valid and it all depends on what you think are the most common ways that clients would use your API.
Here are some points that you could consider:
You might prefer the first approach where you do not give all the data upfront. Sometimes it is about efficiency and at times it is also about security and ensuring that any additional important data is only fetched on as as needed basis and on authorization.
Implementing the 2nd approach is going to take more effort on part of your team to design/code and test out the API. So you might want to consider how much of effort you want to put into release 1.0
Since you have nested data for example, the second approach will serve you well. Several public APIs do that as a matter of fact. For e.g. look at the LinkedIn public API and particularly the facets section, where you can specify the fields or additional information that you would like to return.
Look at some of the client applications that you have written and if you can identify for sure that some data is needed anyways upfront, then it can help in designed the return data.
Eventually monitoring API usage and doing some analysis on the number of calls, methods invoked will give you good inputs on what to do next.
If I had to make a choice and have a little bit more leeway in terms of effort, I would go with the 2nd option, even if it is a simple version at first.

Design RESTful URI

I am in the process of creating a RESTful API. I read
http://microformats.org/wiki/rest/urls
but this site doesn't give me enough "good" practice on designing my API.
Specifically I will write an API (only GET methods that far) which will provide functions to convert geo-coordinates.
Example:
A geohash is a single value representation of a coordinate, thus
/convert/geohash/u09tvkx0.json?outputformat=latlong
makes sense. On the other hand
/convert/latlong.xml?lat=65&long=13&outputformat=UTC requires two input values.
See my "question"? What makes a good API which requires more than one input parameter?
(Tried to "identify" good practice by "analysing" twitter & FF but failed)
In terms of being considered a "technically" correct REST URI, there is no difference between using query string parameters or not. In RFC 3986, it states:
The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource
That's why you're having a difficult time finding a definitive "best practice". Having said that, many REST APIs do embed multiple parameters in the URI without using query strings. For exammple, to identify the make and model of a car, you'll see websites with URI's like this: cars.com/honda/civic. In that case it's very obvious the relationship between the 2 and so having everything in the URI is "hackable". It's also much easier to stick with a non-query string approach when you only have one parameter which is uniquely identifying the resource; but if it's something like a search query, then I'd probably keep it in the query string. This SO question has an interesting discussion about the different approaches as well.
In your example above, I would stick with the query string parameters. Although REST typically has more intuitive URLs, that's really not what REST is about. REST is more about hypermedia and HATEOAS.
There are few REST best practices to be followed when designing a REST API
Abstract vs Concrete
CRUD Operations
Error Handling
API Versioning
Filtering
Security
Analytics
Documentation
Stability and Consistency
URL Structure
Read More

Does my API design violate RESTful principles?

I'm currently (I try to) designing a RESTful API for a social network. But I'm not sure if my current approach does still accord to the RESTful principles. I'd be glad if some brighter heads could give me some tips.
Suppose the following URI represents the name field of a user account:
people/{UserID}/profile/fields/name
But there are almost hundred possible fields. So I want the client to create its own field views or use predefined ones. Let's suppose that the following URI represents a predefined field view that includes the fields "name", "age", "gender":
utils/views/field-views/myFieldView
And because field views are kind of higher logic I don't want to mix support for field views into the "people/{UserID}/profile/fields" resource. Instead I want to do the following:
utils/views/field-views/myFieldView/{UserID}
Another example
Suppose we want to perform some quantity operations (hope that this is the right name for it in English). We have the following URIs whereas each of them points to a list of persons -- the friends of them:
GET people/exampleUID-1/relationships/friends
GET people/exampleUID-2/relationships/friends
And now we want to find out which of their friends are also friends of mine. So we do this:
GET people/myUID/relationships/intersections/{Value-1};{Value-2}
Whereas "{Value-1/2}" are the url encoded values of "people/exampleUID-1/friends" and "people/exampleUID-2/friends". And then we get back a representation of all people which are friends of all three persons.
Though Leonard Richardson & Sam Ruby state in their book "RESTful Web Services" that a RESTful design is somehow like an "extreme object oriented" approach, I think that my approach is object oriented and therefore accords to RESTful principles. Or am I wrong?
When not: Are such "object oriented" approaches generally encouraged when used with care and in order to avoid query-based REST-RPC hybrids?
Thanks for your feedback in advance,
peta
I've never worked with REST, but I'd have assumed that GETting a profile resource at '''/people/{UserId}/profile''' would yield a document, in XML or JSON or something, that includes all the fields. Client-side I'd then ignore the fields I'm not interested in. Isn't that much nicer than having to (a) configure a personalised view on the server or (b) make lots of requests to fetch each field?
Hi peta,
I'm still reading through RESTful Web Services myself, but I'd suggest a slightly different approach than the proposed one.
Regarding the first part of your post:
utils/views/field-views/myFieldView/{UserID}
I don't think that this is RESTful, as utils is not a resource. Defining custom views is OK, however these views should be (imho) a natural part of your API's URI scheme. To incorporate the above into your first URI example, I would propose one of the following examples instead of creating a special view for it:
people/{UserID}/profile/fields/name,age,gender/
people/{UserID}/profile/?fields=name,age,gender
The latter example considers fields as an input value for your algorithm. This might be a better approach than having fields in the URI as it is not a resource itself - it just puts constraints on the existing view of people/{UserID}/profile/. Technically, it's very similar as pagination, where you would limit a view by default and allow clients to browse through resources by using ?page=1, ?page=2 and so on.
Regarding the second part of your post:
This is a more difficult one to crack.
First:
Having intersection in the URI breaks your URI scheme a bit. It's not a resource by itself and also it sits on the same level as friends, whereas it would be more suitable one level below or as an input value for your algorithm, i.e.
GET people/{UserID}/relationships/friends/intersections/{Value-1};{Value-2}
GET people/{UserID}/relationships/friends/?intersections={Value-1};{Value-2}
I'm again personally inclined to the latter, because similarly as in the first case, you are just constraining the existing view of people/{UserID}/relationships/friends/
Secondly, regarding:
Whereas "{Value-1/2}" are the url
encoded values of
"people/exampleUID-1/friends" and
"people/exampleUID-2/friends"
If you meant that {Value-1/2} contain the whole encoded response of the mentioned GET requests, then I would avoid that - I don't think that the RESTful way. Since friends is a resource by itself, you may want to expose it and access it directly, i.e.:
GET friends/{UserID-1};{UserID-2};{UserID-3}
One important thing to note here - I've used ; between user IDs in the previous example, whereas I used , in the fields example above. The reasoning is that both represent a different operator. In the first case we needed OR (,) in order to get all three fields, while in the last example above we had to use AND (;) in order to get an intersection.
Usage of two types of operators can over-complicate the API design, but it should provide more flexibility in the end.
thanks for your clarifying answers. They are exactly what I was asking for. Unfortunately I hadn't the time to read "RESTful Web Services" from cover to cover; but I will catch it up as soon as possible. :-)
Regarding the first part of my post:
You're right. I incline to your first example, and without fields. I think that the I don't need it at all. (At the moment) Why do you suggest the use of OR (,) instead of AND (;)? Intuitively I'd use the AND operator because I want all three of them and not just the first one existing. (Like on page 121 the colorpairs example)
Regarding the second part:
With {Value-1/2} I meant only the url-encoded value of the URIs -- not their response data. :) Here I incline with you second example. Here it should be obvious that under the hood an algorithm is involed when calculating intersecting friends. And beside that I'm probably going to add some further operations to it.
peta