How to deal with a breaking change in an invariant (business rule) when you expose a Public API

I am starting to investigate good practices for public APIs, specifically how to deal with breaking changes. There are a lot of technicalities related to versioning (or non-versioning!), but I am more interested in the implications for the code base.
Imagine a basic scenario where you have a business rule "password must have at least 10 characters", and a "Create User" scenario exposed in a public API, accepting a password.
You have hundreds of clients using it, and one day you decide to change the business rule to "password must have at least 15 characters". Even though you did not change the semantics of the API signature or payloads, you just introduced a breaking change in your API, because you changed its behavior.
How would you deal with this?
I only find wrong approaches:
Modify your domain invariants (business rules) with dated/versioned invariants: this would create a nightmare for code readability, testing, etc.
Duplicate your code base per API version: this would create a maintenance nightmare
Hope that one day you will be able to deprecate all this and become clean again: in your dreams...
Any real life experience on this in your job?

The easiest way is just to communicate with your clients and warn them of the upcoming change weeks/months before. This way they can prepare and be ready for the breaking change.
If you absolutely must support old clients, another option is to keep the domain invariant at 10, but add an additional API call for the create user scenario that checks the password length and verifies it is at least 15 characters, outside the domain. Then encourage your users to migrate to the new CreateUser endpoint. This works for simple cases like this one, but becomes very hard to do for complicated invariants or if your domain is used in different contexts (multiple APIs, a desktop app, etc.).
If you decide to go this route, a good tip is to make sure you have metrics telling you how many clients use the old endpoint versus the new one. When you have reached a certain threshold, you can shut down the old endpoint and move the minimum-password-length-of-15 invariant from the API into the domain.
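To make this concrete, here is a minimal sketch of that layering, assuming a Node/Express-style backend; the createUser domain service and the incrementCounter metrics helper are hypothetical names standing in for whatever you already have. The domain keeps its 10-character invariant, the new endpoint adds the stricter check outside the domain, and both routes record usage so you know when the old one can be retired.
import express from "express";
// Hypothetical domain service: still enforces the 10-character invariant internally.
import { createUser } from "./domain/users";
// Hypothetical metrics helper used to compare old vs new endpoint usage.
import { incrementCounter } from "./metrics";

const app = express();
app.use(express.json());

// Old endpoint: unchanged behaviour, kept alive for existing clients.
app.post("/v1/users", async (req, res) => {
  incrementCounter("create_user.v1");
  const user = await createUser(req.body); // domain still requires >= 10 characters
  res.status(201).json(user);
});

// New endpoint: the stricter rule lives here, outside the domain, until v1 is retired.
app.post("/v2/users", async (req, res) => {
  incrementCounter("create_user.v2");
  if (typeof req.body.password !== "string" || req.body.password.length < 15) {
    return res.status(422).json({ error: "password must have at least 15 characters" });
  }
  const user = await createUser(req.body);
  res.status(201).json(user);
});
Once the v1 counter falls below your threshold, the 15-character rule moves into the domain and the v1 route disappears.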

Related

REST API responses based on authentication, best practices?

I have an API with endpoint GET /users/{id} which returns a User object. The User object can contain sensitive fields such as cardLast4, cardBrand, etc.
{
firstName: ...,
lastName: ...,
cardLast4: ...,
cardBrand: ...
}
If the user calls that endpoint with their own ID, all fields should be visible. However, if it is someone else's ID, then cardLast4 and cardBrand should be hidden.
I want to know what are the best practices here for designing my response. I see three options:
Option 1. Two DTOs, one with all fields and one without the hidden fields:
// OtherUserDTO
{
firstName: ...,
lastName: ..., // cardLast4 and cardBrand hidden
}
I can see this getting out of hand with role-based DTOs: what if I now need UserDTOForAdminRole, UserDTOForAccountingRole, etc.? The number of potential DTOs quickly becomes unmanageable.
Option 2. One response object (the User), with the values the user should not be able to see nulled out.
{
firstName: ...,
lastName: ...,
cardLast4: null, // hidden
cardBrand: null // hidden
}
Option 3. Create another endpoint such as /payment-methods?userId={userId}, even though PaymentMethod is not an entity in my database. This will now require two API calls to get all the data. If the userId is not their own, it will return 403 Forbidden.
{
cardLast4: ...,
cardBrand: ...
}
What are the best practices here?
You're gonna get different opinions about this, but I feel that doing a GET request on some endpoint and getting a different shape of data depending on the authorization status can be confusing.
So I would be tempted, if it's reasonable to do this, to expose the privileged data via a secondary endpoint. Either by just exposing the private properties there, or by having 2 distinct endpoints, one with the unprivileged data and a second that repeats the data + the new private properties.
I tend to go for option 1 here, because an API endpoint is not just a means to get data. The URI is an identity, so I would want /users/123 to mean the same thing everywhere, and have a second /users/123/secret-properties
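As a rough sketch of that split (Express-style; findUser and requireAuth are hypothetical helpers standing in for your data access and auth layers): /users/:id always returns the same public shape, while the sensitive fields live behind a second URI that answers 403 to anyone but the owner.
import express from "express";
// Hypothetical data access and auth helpers, for illustration only.
import { findUser } from "./users";
import { requireAuth } from "./auth";

const app = express();

// Public document: same shape for every caller.
app.get("/users/:id", async (req, res) => {
  const user = await findUser(req.params.id);
  if (!user) return res.sendStatus(404);
  res.json({ firstName: user.firstName, lastName: user.lastName });
});

// Sensitive document: only the owner may read it.
app.get("/users/:id/payment-details", requireAuth, async (req, res) => {
  // requireAuth is assumed to attach the caller's id to the request.
  if (req.params.id !== (req as any).userId) return res.sendStatus(403);
  const user = await findUser(req.params.id);
  if (!user) return res.sendStatus(404);
  res.json({ cardLast4: user.cardLast4, cardBrand: user.cardBrand });
});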
I have an API with endpoint GET /users/{id} which returns a User object.
In general, it may help to reframe your thinking -- resources in REST are generalizations of documents (think "web pages"), not generalizations of objects. "HTTP is an application protocol whose application domain is the transfer of documents over a network" -- Jim Webber, 2011
If the user calls that endpoint with their own ID, all fields should be visible. However, if it is someone elses ID then cardLast4 and cardBrand should be hidden.
Big picture view: in HTTP, you've got a bit of tension between privacy (only show documents with sensitive information to people allowed access) and caching (save bandwidth and server pressure by using copies of documents to satisfy more than one request).
Cache is an important architectural constraint in the REST architectural style; that's the bit that puts the "web scale" in the world wide web.
OK, good news first -- HTTP has special rules for caching web requests with Authorization headers. Unless you deliberately opt in to allowing the responses to be re-used, you don't have to worry about the caching.
Treating the two different views as two different documents, with different identifiers, makes almost everything easier -- the public documents are available to the public, the sensitive documents are locked down, operators looking at traffic in the log can distinguish the two different views because the logged identifier is different, and so on.
The thing that isn't easier: the case where someone is editing (POST/PUT/PATCH) one document and expecting to see the changes appear in the other. Cache-invalidation is one of the two hard problems in computer science. HTTP doesn't have a general purpose mechanism that allows the origin server to mark arbitrary documents for invalidation - successful unsafe requests will invalidate the effective-target-uri, the Location, the Content-Location, and that's it... and all three of those values have other important uses, making them more challenging to game.
Documents with different absolute-uri are different documents, and those documents, once copied from the origin server, can get out of sync.
This is the option I would normally choose; the trade-off is that a client looking at cached copies of a document isn't seeing changes made by the server.
OK, you decide that you don't like those trade offs. Can we do it with just one resource identifier? You immediately lose some clarity in your general purpose logs, but perhaps a bespoke logging system will get you past that.
You probably also have to dump public caching at this point. The only general purpose header that changes between the user allowed to look at the sensitive information and the user who isn't? That's the authorization header, and there's no "Vary" mechanism on authorization.
You've also got something of a challenge for the user who is making changes to the sensitive copy, but wants to now review the public copy (to make sure nothing leaked? or to make sure that the publicly visible changes took hold?)
There's no general purpose header for "show me the public version", so either you need to use a non standard header (which general purpose components will ignore), or you need to try standardizing something and then driving adoption by the implementors of general purpose components. It's doable (PATCH happened, after all) but it's a lot of work.
The other trick you can try is to play games with Content-Type and the Accept header -- perhaps clients use something normal for the public version (ex application/json), and a specialized type for the sensitive version (application/prs.example-sensitive+json).
That would allow the origin server to use the Vary header to indicate that the response is only suitable if the same accept headers are used.
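A minimal sketch of that negotiation, using the hypothetical application/prs.example-sensitive+json type from above: the origin server picks the representation based on the Accept header and declares that dependency with Vary, so a shared cache won't hand the sensitive variant to a client that only asked for plain JSON. (A real server would of course still check authorization before serving the sensitive variant.)
import express from "express";

const app = express();
const SENSITIVE = "application/prs.example-sensitive+json";

app.get("/users/:id", (req, res) => {
  // Caches must key this response on the Accept header.
  res.set("Vary", "Accept");

  // Pick the best representation the client asked for.
  if (req.accepts([SENSITIVE, "application/json"]) === SENSITIVE) {
    // ...authorization check for the sensitive view would go here...
    res.type(SENSITIVE);
    return res.send(JSON.stringify({ firstName: "Alice", cardLast4: "1234" }));
  }

  res.type("application/json");
  res.send(JSON.stringify({ firstName: "Alice" }));
});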
Once again, general purpose components aren't going to know about your bespoke content-type, and are never going to ask for it.
The standardization route really isn't going to help you here, because the thing you really need is that clients discriminate between the two modes, where general purpose components today are trying to use that channel to advertise all of the standardized representations that they can handle.
I don't think this actually gets you anywhere that you can't fake more easily with a bespoke header.
REST leans heavily into the idea of using readily standardizable forms; if you think this is a general problem that could potentially apply to all resources in the world, then a header is the right way to go. So a reasonable approach would be to try a custom header, and get a bunch of experience with it, then try writing something up and getting everybody to buy in.
If you want something that just works with the out of the box web that we have today, use two different URI and move on to solving important problems.

How RESTful is using subdomains as resource identifiers?

We have a single-page app (AngularJs) which interacts with the backend using REST API. The app allows each user to see information about the company the user works at, but not any other company's data. Our current REST API looks like this:
domain.com/companies/123
domain.com/companies/123/employees
domain.com/employees/987
NOTE: All ids are GUIDs, hence the last end-point doesn't have company id, just the employee id.
We recently started working on enforcing the requirement that each user has access only to information related to the company where the user works. This means that on the backend we need to track who the logged-in user is (which is a simple auth problem) as well as determine which company's information is being accessed. The latter is not easy to determine from our REST API calls, because some of them do not include the company id, such as the last one shown above.
We decided that instead of tracking company ID in the UI and sending it with each request, we would put it in the subdomain. So, assuming that ACME company has id=123 our API would change as follows:
acme.domain.com
acme.domain.com/employees
acme.domain.com/employees/987
This makes identifying the company very easy on the backend and requires minor changes to REST calls from our single-page app. However, my concern is that it breaks the RESTfulness of our API. This may also introduce some CORS problems, but I don't have a use case for it now.
I would like to hear your thoughts on this and how you dealt with this problem in the past.
Thanks!
In a similar application, we did put the 'company id' into the path (every company-specific path), not as a subdomain.
I wouldn't care a jot about whether some terminology enthusiast thought my design was "RESTful" or not, but I can see several disadvantages to using domains, mostly stemming from the fact that the world tends to assume that the domain identifies "the server" and the path is how you find an item on that server. There is a certain amount of extra stuff you'll have to deal with when using multiple domains which you wouldn't with paths:
HTTPS - you'd need a wildcard certificate instead of a simple one
DNS - you're either going to have wildcard DNS entries, or your application management is now going to involve DNS management
All the CORS stuff which you mention - may or may not be a headache in your specific application - anything which is making 'same domain' assumptions about security policy is going to be affected.
Of course, if you want lots of isolation between companies, and effectively you would be as happy running a separate server for each company, then it's not a bad design. I can't see it's more or less RESTful, as that's just a matter of viewpoint.
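To illustrate the path-based alternative, here is a minimal sketch (Express-style; req.user is assumed to be populated by whatever auth middleware you already have): with the company id in every company-specific path, the tenant check is a one-liner in each handler.
import express from "express";

const app = express();

// The company id travels in the path, so every company-specific handler can
// compare it against the authenticated user's company. req.user is assumed
// to be attached by auth middleware and to carry a companyId.
app.get("/companies/:companyId/employees/:employeeId", (req, res) => {
  const user = (req as any).user as { companyId: string } | undefined;
  if (!user || user.companyId !== req.params.companyId) {
    return res.sendStatus(403);
  }
  // ...load the employee and return it (lookup omitted in this sketch)...
  res.json({ id: req.params.employeeId });
});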
There is nothing "unrestful" in using subdomains. URIs in REST are opaque, meaning that you don't really care about what the URI is, but only about the fact that every single resource in the system can be identified and referenced independently.
Also, in a RESTful application, you never compose URLs manually; you traverse the hypermedia links you find at the API endpoint and in all the returned responses. Since you don't need to manually compose URIs, from the REST point of view it doesn't matter how they look. Having a URI such as
//domain.com/ABGHTYT12345H
would be as RESTful as
//domain.com/companies/acme/employees/123
or
//domain.com/acme/employees/smith-charles
or
//acme.domain.com/employees/123
All of those are equally RESTful.
But... I like to think of usable APIs, and when it comes to usability, having readable, meaningful URLs is a must for me. Following conventions is also a good idea. In your particular case, there is nothing un-RESTful about the route, but it is unusual to find that kind of behaviour in an API, so it might not be the best practice. Also, as someone pointed out, it might complicate your development (not specifically the CORS part, though; that one is easily solved by sending a few HTTP headers).
So, even though I can't see anything non-REST in your proposal, convention elsewhere would argue against subdomains in an API.
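For what it's worth, the "few HTTP headers" mentioned above look roughly like this; a sketch only, assuming the per-company apps live on subdomains of domain.com and a simple allow-list is acceptable.
import express from "express";

const app = express();

// Minimal CORS handling for requests coming from the per-company subdomains.
app.use((req, res, next) => {
  const origin = req.get("Origin");
  // Hypothetical allow-list: accept any https origin that is a subdomain of domain.com.
  if (origin && /^https:\/\/[a-z0-9-]+\.domain\.com$/.test(origin)) {
    res.set("Access-Control-Allow-Origin", origin);
    res.set("Vary", "Origin"); // responses differ per origin, so caches must key on it
    res.set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE");
    res.set("Access-Control-Allow-Headers", "Authorization, Content-Type");
  }
  if (req.method === "OPTIONS") {
    return res.sendStatus(204); // preflight request: the headers above are the whole answer
  }
  next();
});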

How can I use vendor specific MIME-TYPES for a "private-labeled" REST API

I'm developing a RESTful API. Currently I'm considering the use of resource-specific vendor MIME types to convey semantics and meaning, as well as serve as the "contract" between client and server.
So for example application/vnd.mycompany.person+xml would mean that the data in question is xml that represents a person.
I have a requirement to make this API "private-labeled", meaning a reseller could in turn provide the API to his customer without the customer knowing that it is my company's service. The way this would work is that my company would host the main API at a sort of generic URL, i.e. www.example.com/api; my company would then use a CNAME to point our domain name to that URL, and our resellers could do the same.
Internally all resource links would be relative from the API root, and so would respect the actual url that is being used.
HOWEVER, I don't want to have to understand/support arbitrary vendor specific MIME-types, so what should the "mycompany" part of the example MIME-type above be?
The HTTP spec says:
Use of non-registered media types is discouraged.
I used to use “custom” media types in my platform, but it caused issues with user agents (browsers, cURL, wget, etc.) not recognizing the content.
You could try to get your custom media type registered, but (A) that takes a while; (B) it’d take a real long while before user-agents would recognize the type, if ever; (C) you’ve indicated that you don’t want the company name always present anyway.
As an alternative to “custom” media types, I recommend utilizing media type parameters instead; they’re a blessed way to add supplementary information about content to media types.
Using parameters, your media type could be application/xml; mycompany-schema=person or maybe just application/xml; schema=person.
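A small sketch of what that looks like in practice, using the schema=person example from above (the parameter name is just an illustration, not anything registered):
import express from "express";

const app = express();

// Producing: a plain XML media type plus a parameter that names the schema.
app.get("/people/:id", (req, res) => {
  res.set("Content-Type", "application/xml; schema=person");
  res.send("<person><name>...</name></person>");
});

// Consuming: split the media type from its parameters instead of matching
// one long vendor-specific string.
app.post("/people", (req, res) => {
  const header = req.get("Content-Type") ?? "";
  const [mediaType, ...params] = header.split(";").map((part) => part.trim());
  const schema = params.find((p) => p.startsWith("schema="))?.split("=")[1];
  if (mediaType !== "application/xml" || schema !== "person") {
    return res.sendStatus(415); // Unsupported Media Type
  }
  // ...handle the person document (body parsing omitted in this sketch)...
  res.sendStatus(201);
});
The consuming side branches on the base media type that every generic user agent already understands, and treats the parameter as supplementary information.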
I have seen a couple of frameworks and tutorials that recommend vendor specific mime-type to "solve" issues with making your REST interface "truly RESTful" simply because it can be done and somehow that makes it kosher for a REST service.
One issue with this approach is that by its nature it is a hack or cheat to "make it work" the way you want, when the whole point of shifting to a hypermedia-driven REST service is to change the model of your API and service and change how you approach the problem. Sneaking in a "valid" or allowed-but-not-recommended HTTP value for the Content-Type is like telling the starving Venezuelans that rats are fish so they could eat them without sin during Lent. Is there anything wrong with eating a rat if that's all you have? Probably not. But does pretending it's a fish make it a fish? Of course not. If you need an interface that's contract driven, use RPC or SOAP or even a custom vendor MIME type. But don't point to the spec and say it's REST, because in the end you're eating a rat, and everyone knows it, and you're only lying to yourself.
The second issue is that you are losing the actual rewards of the hypermedia-driven interface when you cut corners. Right away you have run into issues with user agents, and your own server has to jump through hoops or simply give up because the MIME type was unfamiliar. All because you thought you could have it both ways, when the point was never to impress your clients with claims of a true REST service, or to lighten the load a bit by shedding the (obviously valuable in some contexts) extra weight of a contract-driven interaction; it was to change how your service actually interacts with external clients.
Finally, I'm really unclear on how a vendor-specific MIME type actually enforces a contract any better than a defined endpoint. All of the sites that mention this technique seem to just be glowing with relief that this option exists and, quite frankly, a bit smug and pleased that they are using it, like they know it's technically "naughty" but it's just so easy and fixes everything. What does it fix? In your case, why wouldn't you simply have your inbound person request/content go to:
POST /myRestService/people
and if they have some other request, have that go to a different endpoint intended for that other data type? If you need a method does_something, wouldn't you either go with:
GET /myRestService/people/personID_123/does_something
or
GET /myRestService/people/does_something/personID_123
depending on the exact context?
And just so I don't sound mean on top of loony, any frustration or anger is not at all directed at you or your question, but at the "solution" of the vendor mime-type and the obsession everyone has developed for the "Roy Fielding officially-approved and stamped as valid REST service" that apparently no one is even able to provide a working public example of, leaving only a sense of urgency to adopt it right away taking whatever shortcuts needed and we can deal with the shame and finger pointing later when we actually fix the problems the shortcuts made.

Suggestions on addressing REST resources for API

I'm a new REST convert and I'm trying to design my first (hopefully) RESTful API, and here is my question about addressing resources.
Some notes first:
The data described here are 3D render jobs.
A user (graphics company) has multiple projects.
A project has multiple render jobs.
A render job has multiple frames.
There is a hierarchy enforced in the data (one render job belongs to one project, which belongs to one user).
How's this for naming my resources...?
https://api.myrenderjobsite.com/
/users/graphicscompany/projects
/users/graphicscompany/projects/112233
/users/graphicscompany/projects/112233/renders/
/users/graphicscompany/projects/112233/renders/889900
/users/graphicscompany/projects/112233/renders/889900/frames/0004
OR a shortened address for renders?
/users/graphicscompany/renders/889900
/users/graphicscompany/renders/889900/frames/0004
OR should I shorten (even more) the address if possible, omitting the user when not needed...?
/projects/112233/
/renders/889900/
/renders/889900/frames/0004
THANK YOU!
Instead of thinking about your API in terms of URLs, try thinking of it more like pages and links between those pages.
Consider the following:
Will it be reasonable to create a resource for users? Do you have 10, 20 or 50 users? Or do you have 10,000 users? If it is the latter, then obviously creating a single resource that represents all users is probably not going to work too well when you do a GET on it.
Is the list of users a reasonable root URL, i.e. the entry point into your service? Should the list of projects that belong to a GraphicsCompany be a separate resource, or should it just be embedded into the GraphicsCompany resource? You can ask the same question of each of the one-to-many relationships that exist. Even if you do decide to merge the list of projects into the GraphicsCompany resource, you may still want a distinct resource to exist simply for the purpose of being able to POST to it in order to create a new project for that company.
Using this approach you should be able to get a good idea of most of the resources in your API and how they are connected, without having to worry about what your URLs look like. In fact, if you do the design right, then any client application you write will not need to know anything about the URLs that you create. The only part of the system that cares what the URL looks like is your server, so that it can dispatch the request to the right controller.
The other significant question you need to ask yourself is what media type are you going to use for these resources. How many different clients will need to access these resources? Are you writing the clients, or is someone else? Should you attempt to reuse an existing standard like XHTML and classes/microformats? Could you squeeze most of the information into Atom? Maybe Atom with some extra namespaces like GDATA does it? Or is this only going to be used internally so you can just create your own media types, like application/vnd.YourCompany.Project+xml, application/vnd.YourCompany.Render+xml, etc.
There are many things to think about when designing a REST API; don't get hung up on what your URLs look like, and really try to avoid doing "design by URL".
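For example, here is a sketch of what a project "page" might look like if it carries links instead of relying on clients to build URLs; the field and relation names are made up for illustration, and the URLs are the ones from the question.
// Sketch of a project representation: the client follows these links
// instead of constructing URLs itself.
const projectRepresentation = {
  id: "112233",
  name: "Spring catalogue", // hypothetical project name
  links: {
    self: "/users/graphicscompany/projects/112233",
    owner: "/users/graphicscompany",
    renders: "/users/graphicscompany/projects/112233/renders/",
  },
};
// A client that wants the frames of a render job follows links.renders and
// then the link of an individual render, rather than assembling the path.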
Presuming that you authenticate to the service, I would use the 1st option, but remove the user, particularly if the user is the currently logged in user.
If user actually represents something else (like client), I would include it, but not if it simply designates the currently logged in user. Agree with StaxMan, though, don't worry too much about squeezing the paths, as readability is key in RESTful APIs.
Personally I would not try to squeeze the path too much; some amount of redundant information is helpful, both to quickly see what a resource is and for future expansion.
Generally users won't be typing paths anyway, so verbosity is not all that bad.

writing SEO-friendly pages that can be toggled public or private

Our application wants to be able to create static, searchable pages based on user profile information, which would be linkable to other public profiles.
I am looking at LinkedIn as an example...it seems like they actually auto-generate the page to be a static file that is indexable and searchable.
Can someone suggest how we would do this? I am thinking there would need to be a cron job that runs and writes the path and file name.
The user may want to keep the whole page private, in which case I imagine it would need to delete it.
There are a lot of sub-requirements, but that's the general concept, and I wanted to start getting ideas and feedback.
Thanks.
You can do without the cron job if you generate the static pages in real time whenever the profile information is created/updated or whenever user changed the setting to keep info public/private. This way you are not constantly looping through all users, and do not depend on another component (your cron job) to be running.
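A rough sketch of that event-driven approach (the file layout and the onProfileChanged hook are hypothetical, and the HTML is assumed to be rendered elsewhere):
import { promises as fs } from "fs";
import * as path from "path";

// Hypothetical shape of the profile data the application already has.
interface Profile {
  id: string;
  isPublic: boolean;
  html: string; // pre-rendered profile markup
}

const PUBLIC_DIR = "/var/www/public-profiles"; // hypothetical web root for static pages

// Call this whenever a profile is created, updated, or its visibility is toggled.
export async function onProfileChanged(profile: Profile): Promise<void> {
  const target = path.join(PUBLIC_DIR, `${profile.id}.html`);
  if (profile.isPublic) {
    await fs.writeFile(target, profile.html); // (re)generate the static page
  } else {
    await fs.rm(target, { force: true }); // profile went private: remove the page if present
  }
}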
One alternative would be to adopt an explicit RESTful information architecture so that a profile resource ("page") is addressable with a permanent URL. The resulting resource could be a static page. Or not. That would be an implementation detail invisible to the search engine crawler and any web browser accessing the resource.
umnik700's answer is fairly dead-on if you're not considering issues related to authentication or who gets to see what. Consider the difference between the profiles you see when you're logged into Facebook versus those same profiles' publicly facing, searchable counterparts. Even MySpace, with a lot less consideration for search engine privacy, has viewability that is dependent on your relationship to the other person, defaulting, for private profiles, to "This profile has been set to private by the user" or something to that extent.
If you're looking to suddenly scale out a social tool where individuals are eliciting their personal information, I would suggest umnik700's answer (dynamically generate the content, but not the URLs, for public versions of the profile) with the following corollary: you need to be able to support privacy preferences varying from extremely strict to completely open, and default to a version that at least errs on the stricter, more private version of the profile. If you're just now pushing out searchable personal content when there never was any way to find it outside the site before, it's important not to abuse information given under different pretenses.
I know this probably requires more scalability and added functionality than you were hoping this project would take, but to do otherwise could most likely be taken as a violation of your user base's tacit trust. Anyway, the best strategy for doing this will probably require you to lean on your database more anyway, so it might be time to rework it a bit--including adding some privacy preferences.