RESTful API HATEOAS - api

I've come to the conclusion that building a truly RESTful API, one the uses HATEOAS is next to impossible.
Every content I've come across either fails to illustrate the true power of HATEOAS
or simply does not explicitly mentions the inherent pain points with the dynamic nature of HATEOAS.
What I believe HATEOAS is all about:
From my understanding, a truly HATEOAS API should have ALL the information needed to interact with the API, and while that is possible, it is a nightmare to use especially with different stacks.
For example, consider a collection of resources located at "/books":
{
"items": [
{
"self": "/book/sdgr345",
"id": "sdgr345",
"name": "Building a RESTful API - The unspoken truth",
"author": "Elad Chen ;)",
"published": 1607606637049000
}
],
// This describes every field needed to create a new book
// just like HyperText Markup Language (i.e. HTML) rendered on the server does with forms
"create-form": {
"href": "/books",
"method": "POST",
"rel": ["create-form"],
"accept": ["application/x-www-form-urlencoded"],
"fields": [
{ "name": "name", "label": "Name", "type": "text", "max-length": "255", "required": true }
{ "name": "author", "label": "Author", "type": "text", "max-length": "255", "required": true }
{ "name": "author", "label": "Publish Date", "type": "date", "format": "dd/mm/YY", "required": true }
]
}
}
Giving the above response, a client (such as a web app) can use the "create-form" property to render an actual HTML form.
What value do we get from all this work?
The same value we've been getting from HTML for years.
Think about it, this is exactly what hypertext is all about, and what HTML has been designed for.
When a browser hits "www.pizza.com" the browser has no knowledge of the other paths that a user
can visit, it does not concatenate strings to produce a link to the order page -> "www.pizza.com/order", it simply renders anchors
and navigates when a user clicks them. This is what allows web developers to change the path from "/order" to "/shut-up-and-take-my-money" without changing any client (browsers).
The above idea is also true for forms, browsers do not guess the parameters needed to order a pizza, they simply render a form and its inputs, and handle its submission.
I have seen too many lines of codes in front-ends and back-ends alike, that build strings
like -> "https://api.com" + "/order" - You don't see browsers do that, right?
The problems with HATEOAS
Giving the above example ("/books" response), in order to create a new book, clients are expected to parse the response in order to leverage the true power of this RESTful API, otherwise, they risk assuming what the names of the fields are, which of them is required, what their expected type is, etc...
Now consider having two clients within your company that are using this API,
one for the web (browsers) written in JS, and another for the mobile (say an android app) written in Java. They can be published as SDK's, hopefully making 3 party consumers have an easier integration.
As soon as the API is used by clients outside your control, say a 3rd party developer with an affinity to python, with the purpose of creating a new book.
That developer is REQUIRED to parse such a response, to figure out what the parameters are, their name, the URL to send inputs to, and so on.
In all my years of developing I have yet to come across an API such as the one I have in mind.
I have a feeling this type of API is nothing more than a pipe dream, and I was hoping to understand whether my assumptions are correct, and what downfalls it brings before starting the implementation phase.
P.S
in case it's not clear, this is exactly what HATEOAS compliant API is all about - when the fields to create a book change clients adapt without breaking.

On the Hypermedia Maturity Model (HMM), the example you give is at Level 0. At this level, you are absolutely correct about the problems with this approach. It's a lot of work and developers are probably going to ignore it and hard-code things anyway. However, with a generic hypermedia enabled media type, not only does all that extra work go away, it actually reduces the work for developers.
Let's take a step back for a moment and consider how the web works. There are three main components: the web server, the web browser, and the driver (usually a human user). The web server provides HTML, which the web browser executes to present a graphical user interface, which the driver can use to follow links and fill out forms. The browser uses the HTML to completely abstract from the driver all the details about how to present the form and how to send it over HTTP.
In the API world, this concept of the generic browser that abstracts away the media type and HTTP details hasn't taken hold yet. The only one I know about that is both active and high quality is Ketting. Using a browser like Ketting, removes all of that extra work the developer would have to put into making use of all that hypermedia. A hypermedia browser is like an SDK that API vendors often provide except that it works for any API. In theory, you can link from one API to another completely unrelated API. APIs would no longer be islands, they would become a web.
The thing that makes hypermedia browsers possible are general purpose hypermedia enabled media types. HTML is of course the most successful and famous example, but there are JSON based media types as well. Some of the more widely used examples are HAL and Siren.
The higher level a media type is on the Hypermedia Maturity Model, the more that a generic browser can do to abstract way the media-type, URIs, and HTTP details. Here's a brief explanation. Checkout the blog post linked above for more details and examples.
Level 0: At this level, hypermedia is encoded in an ad-hoc way. A browser can't do much with this because every API might encode things a little different. At best a browser can use heuristics or AI to guess that something is a link or a form and treat it as such, but generally HMM Level 0 media-types are intended for developers to read an interpret. This leads to many of the challenges you identified your question. Examples: JSON and XML.
Level 1: At this level, links are a first class feature. The media type has a well defined way to represent a link. A browser knows unambiguously what is to be interpreted as a link and can provide an interface to follow that link without the user needing to be concerned about URI or HTTP. This is sufficient for read-only APIs, but if we need the user to provide input, we don't have a way to represent a form-like hypermedia control. It's up to a human to read the documentation or an ad-hoc form representation to know how to submit data. Examples: HAL, RESTful JSON.
Level 2: At this level, forms (or form-like controls) are a first class feature. A media type has a well defined way of representing a form-like hypermedia control. A browser can use this media type to do things like build an HTML Form, validate user input, encode user input into an acceptable media type, and make an HTTP request using the appropriate HTTP method. Let's say you want to change your API to support PATCH and you prefer that application start using it over PUT. If you are using a HMM Level 2 media type, you can change method in the representation and any application that uses a hypermedia browser that knows how to construct a PATCH request will start sending PATCHs instead of PUTs without any developer intervention. Without a HMM Level 2 media type, you're stuck with those PUTs until you can get all the applications that use your API to update their code. Examples: HTML, HAL-Forms, Siren, JSON Hyper-Schema, Uber, Mason, Collection+JSON.
Level 3: At this level, in addition to hypermedia controls, data are also self-describing. Remember those three main components I mentioned? The last one, "driver", is the major difference between using hypermedia on the web and using hypermedia in an API. On the web, the driver is a human (excluding crawlers for simplicity), but with an API, the driver is an application. Humans can interpret the meaning of what they are presented with and deal with changes. Applications may act on heuristics or even AI, but usually they are following a fixed routine. If something changes about the data that the application didn't expect, the application breaks. At this level, we apply semantics to the data using something like JSON-LD. This allows us to construct drivers that are better at dealing with change and can even make decisions without human intervention. Examples: Hydra.
I think the only downside to choosing to use hypermedia in your API is that there aren't production-ready HMM Level 2 hypermedia browsers available in most languages. But, the good news is that one implementation will cover any API that uses a media type it supports. Ketting will work for any API and application that is written in JavaScript. It would only take a few more similar implementations to cover all the major languages and choosing to use hypermedia would be an easy choice.
The other reason to choose hypermedia is that it helps with the API design process. I personally use JSON Hyper-Schema to rapid-prototype APIs and use a generic web client to click through links and forms to get a feel for API workflows. Even if no one else uses the schemas, for me, it's worth having even if just for the design phase.

Implementing a HATEOAS API needs to be done both on the server and on the clients, so this point you make in the comments is very valid indeed:
changing a resource URI is risky given I don't believe clients actually "navigate" the API
Besides the World Wide Web, which is the best implementation of HATEOAS, I only know of the SunCloud API on the now decommissioned Project Kenai. Most APIs out there don't quite make use of HATEOAS, but are just a bunch of documented URLs where you can get or submit specific resources (basically, instead of being "hypermedia driven", they are "documentation driven"). With this type of API, clients don't actually navigate the API, they concatenate strings to go at specific endpoints where they know they can find specific resources.
If you expose a HATEOAS API, developers of any clients can still look at the links you return and may decide to build them on their own 'cause they figure what the API is doing, so then think they can just bypass any other navigation that might be needed and go straight for the URL, because it is always /products/categories/123, until - of course - it isn't anymore.
A HATEOAS API is more difficult to build and adds complexity to both the server and the clients, so when deciding to build one, the questions are:
do you need the flexibility HATEOAS is giving you to justify the extra complexity of the implementation?
do you want to make it easier or harder for clients to consume your API?
does the knowledge and discipline to make it all work exist on both sides (server developers and clients developers)?
Most of the times, the answer is no. In addition, many times the questions don't even get asked, instead people end up with the approach that is more familiar because they've seen the sort of APIs people build, have used some, or built some before. Also, many times, a REST API is just stuck in front of a database and doesn't do much but expose data from the database as JSON or XML, so not that much need for navigation there.
There is no one forcing you to implement HATEOAS in your API, and no one preventing you to do so either. At the end of the day, it's a matter of deciding if you want to expose your API this way or not (another example would be, do you expose your resources as JSON, XML, or let the client chose the content type?).
In the end, there is always the chance of breaking your clients when you make changes to your API (HATEOAS or no HATEOAS), because you don't control your clients and you can't control how knowledgeable the client developer are, how disciplined, or how good of a work they do in implementing someone else's API.

Related

Should I split my API up into subdirectories or specify what action I want it to do in the POST request?

I am designing a REST API for a software engineering class in school, and I am not sure what the better approach is. My API needs to support "submitting items" and "creating collections of items" in essentially the same subdirectory (let's call it api/action). As I understand this, both of these would require a POST request from the client-side. The first type of POST, for "submitting items", would have a body looking something like
{
userId: ...,
brandName: ...,
itemDetail1: ...,
itemDetail2: ...,
etc.
}
The second type of request would have a body that is different. My question is this:
Since these are similar, is it better for me to POST to two different api subdirectories like
api/action/submitItem
api/action/createCollection
OR
Is it better to POST to "api/action" and specify in the POST request body a parameter called "actionType"
actionType: submitItem
and
actionType: createCollection
and have the API differentiate this while it handles the "generic" POST request?
Asking for General Advice, I have not tried anything just yet.
The approach of POSTing to two different API subdirectories (e.g. api/action/submitItem and api/action/createCollection) is known as a resource-oriented approach. This approach is straightforward and easy to understand, as each URL maps to a specific resource, such as a submitted item or a collection. This approach also makes it easier to add additional endpoints in the future without affecting existing endpoints.
On the other hand, POSTing to the same endpoint with a "actionType" parameter in the body is known as an operation-oriented approach. This approach is more flexible, as it allows you to add or modify the behavior of a single endpoint without affecting the rest of the API. However, this approach can become more complex as the number of actions grows, and it may not be as intuitive for API clients to understand how to use the API.
In the end, the choice between these approaches is largely a matter of personal preference and the specific requirements of your API. Both approaches have their advantages and disadvantages, and it's important to carefully consider the trade-offs and make the choice that works best for your API and its users.

URIs in REST API endpoints according to Restful practices

I am planning to have these endpoints for our REST APIs.
PUT /tenant/:tenantId/users/save/:username
POST /tenant/:tenantId/users/invite
GET /tenant/:tenantId/users/fetch
GET /tenant/:tenantId/users/fetch/:username
PATCH /tenant/:tenantId/users/activate/:username
POST /tenant/:tenantId/groups/save/
Verbs such as save/fetch/activate are from the consistency point of view. Are these RESTFul according to the REST principles? How should these be changed if at all? Any recommendations?
According to this REST Resource Naming Guide:
RESTful URI should refer to a resource that is a thing (noun) instead of referring to an action (verb) because nouns have properties which verbs do not have – similar to resources have attributes.
And also
URIs should not be used to indicate that a CRUD function is performed. URIs should be used to uniquely identify resources and not any action upon them. HTTP request methods should be used to indicate which CRUD function is performed.
So let's take your first URI as example
PUT /tenant/:tenantId/users/save/:username
Here you are using the verb save. As mentioned before you should not be indicating a CRUD operation in the URI, in this case using a POST would be more appropriate.Here is a guide with the purpose of each HTTP verb. Knowing this, I think that for example a more appropriate URI for that case would be something like
POST /tenants/:tenantId/users/:username
In this cases:
GET /tenant/:tenantId/users/fetch
GET /tenant/:tenantId/users/fetch/:username
you should remove the fetch because you are already telling through the GET verb that data is being fetched. Same goes for the 6th example.
But, this doesn't mean that you can't use verbs in your URIs, in fact there is a specific category called controller which as mentioned in the same guide:
A controller resource models a procedural concept. Controller resources are like executable functions, with parameters and return values; inputs and outputs.
Use “verb” to denote controller archetype.
This controllers resources could go well (I asume) with for example your
GET /tenant/:tenantId/users/activate/:username.
But I would think that the verb activate should go last:
GET /tenant/:tenantId/users/:username/activate
First note: REST doesn't care what spelling conventions you use for your resource identifiers. Once you figure out the right resources, you can choose any identifiers for them that you like (so long as those identifiers are consistent with the production rules defined in RFC 3986).
"Any information that can be named can be a resource" (Fielding, 2000), but its probably most useful to think about resources as abstractions of documents. We use HTTP as an application protocol whose application domain is the transfer of documents over a network.
GET
This is the method we use to retrieve a document
PATCH
PUT
POST
These methods all indicate requests to edit a document (specifically, to edit the request target).
PUT and PATCH are each ask the server to make its copy of a document look like the client's local copy. Imagine loading a web page into an editor, making changes, and then "saving" those changes back to the server.
POST is less specific; "here's a document that we created by filling in a web form, edit yourself appropriately". It is okay to use POST: after all, the web was catastrophically successful and we're still using POST in our form submissions.
The useful work is a side effect of these edits.
Are these RESTFul according to the REST principles?
Do they work like a web site? If they work like a web site: meaning you follow links, and send information to the server by submitting forms, or editing the webpages and submitting your changes to the server, then it is REST.
A trick though: it is normal in REST that a single method + request uri might have different useful side effects. We can have several different HTML forms that all share the same Form.action. Uploading changes to an order document might have very different effects if the edits are to the shipping address vs to the billing information or the order items.
Normal doesn't mean obligatory - if you prefer a resource model where each form request goes to a specific resource, that can be OK too. You get simpler semantics, but you support more resources, which can make caching trickier.

REST API responses based on authentication, best practices?

I have an API with endpoint GET /users/{id} which returns a User object. The User object can contain sensitive fields such as cardLast4, cardBrand, etc.
{
firstName: ...,
lastName: ...,
cardLast4: ...,
cardBrand: ...
}
If the user calls that endpoint with their own ID, all fields should be visible. However, if it is someone elses ID then cardLast4 and cardBrand should be hidden.
I want to know what are the best practices here for designing my response. I see three options:
Option 1. Two DTOs, one with all fields and one without the hidden fields:
// OtherUserDTO
{
firstName: ...,
lastName: ..., // cardLast4 and cardBrand hidden
}
I can see this becoming out of hand with DTOs based on role, what if now I have UserDTOForAdminRole, UserDTOForAccountingRole, etc... It looks like it quickly gets out of hand with the number of potential DTOs.
Option 2. One response object being the User, but null out the values that the user should not be able to see.
{
firstName: ...,
lastName: ...,
cardLast4: null, // hidden
cardBrand: null // hidden
}
Option 3. Create another endpoint such as /payment-methods?userId={userId} even though PaymentMethod is not an entity in my database. This will now require 2 api calls to get all the data. If the userId is not their own, it will return 403 forbidden.
{
cardLast4: ...,
cardBrand: ...
}
What are the best practices here?
You're gonna get different opinions about this, but I feel that doing a GET request on some endpoint, and getting a different shape of data depending on the authorization status can be confusing.
So I would be tempted, if it's reasonable to do this, to expose the privileged data via a secondary endpoint. Either by just exposing the private properties there, or by having 2 distinct endpoints, one with the unprivileged data and a second that repeats the data + the new private properties.
I tend to go for option 1 here, because an API endpoint is not just a means to get data. The URI is an identity, so I would want /users/123 to mean the same thing everywhere, and have a second /users/123/secret-properties
I have an API with endpoint GET /users/{id} which returns a User object.
In general, it may help to reframe your thinking -- resources in REST are generalizations of documents (think "web pages"), not generalizations of objects. "HTTP is an application protocol whose application domain is the transfer of documents over a network" -- Jim Webber, 2011
If the user calls that endpoint with their own ID, all fields should be visible. However, if it is someone elses ID then cardLast4 and cardBrand should be hidden.
Big picture view: in HTTP, you've got a bit of tension between privacy (only show documents with sensitive information to people allowed access) and caching (save bandwidth and server pressure by using copies of documents to satisfy more than one request).
Cache is an important architectural constraint in the REST architectural style; that's the bit that puts the "web scale" in the world wide web.
OK, good news first -- HTTP has special rules for caching web requests with Authorization headers. Unless you deliberately opt-in to allowing the responses to be re-used, you don't have to worry the caching.
Treating the two different views as two different documents, with different identifiers, makes almost everything easier -- the public documents are available to the public, the sensitive documents are locked down, operators looking at traffic in the log can distinguish the two different views because the logged identifier is different, and so on.
The thing that isn't easier: the case where someone is editing (POST/PUT/PATCH) one document and expecting to see the changes appear in the other. Cache-invalidation is one of the two hard problems in computer science. HTTP doesn't have a general purpose mechanism that allows the origin server to mark arbitrary documents for invalidation - successful unsafe requests will invalidate the effective-target-uri, the Location, the Content-Location, and that's it... and all three of those values have other important uses, making them more challenging to game.
Documents with different absolute-uri are different documents, and those documents, once copied from the origin server, can get out of sync.
This is the option I would normally choose - a client looking at cached copies of a document isn't seeing changes made by the server
OK, you decide that you don't like those trade offs. Can we do it with just one resource identifier? You immediately lose some clarity in your general purpose logs, but perhaps a bespoke logging system will get you past that.
You probably also have to dump public caching at this point. The only general purpose header that changes between the user allowed to look at the sensitive information and the user who isn't? That's the authorization header, and there's no "Vary" mechanism on authorization.
You've also got something of a challenge for the user who is making changes to the sensitive copy, but wants to now review the public copy (to make sure nothing leaked? or to make sure that the publicly visible changes took hold?)
There's no general purpose header for "show me the public version", so either you need to use a non standard header (which general purpose components will ignore), or you need to try standardizing something and then driving adoption by the implementors of general purpose components. It's doable (PATCH happened, after all) but it's a lot of work.
The other trick you can try is to play games with Content-Type and the Accept header -- perhaps clients use something normal for the public version (ex application/json), and a specialized type for the sensitive version (application/prs.example-sensitive+json).
That would allow the origin server to use the Vary header to indicate that the response is only suitable if the same accept headers are used.
Once again, general purpose components aren't going to know about your bespoke content-type, and are never going to ask for it.
The standardization route really isn't going to help you here, because the thing you really need is that clients discriminate between the two modes, where general purpose components today are trying to use that channel to advertise all of the standardized representations that they can handle.
I don't think this actually gets you anywhere that you can't fake more easily with a bespoke header.
REST leans heavily into the idea of using readily standardizable forms; if you think this is a general problem that could potentially apply to all resources in the world, then a header is the right way to go. So a reasonable approach would be to try a custom header, and get a bunch of experience with it, then try writing something up and getting everybody to buy in.
If you want something that just works with the out of the box web that we have today, use two different URI and move on to solving important problems.

Could REST API OPTIONS be used as the HATEOAS only request?

As I've understood it, REST MUST use the HATEOAS constraint to be implemented properly. My understanding of HATEOAS is that basically every resource should share information about what communication options it has and how the consumer can use those options to achieve their end goal.
My question is if the HTTP OPTIONS method could be used as a way to navigate a REST API. Basically the response from an OPTIONS request would include the possible actions to take on a resource which would make it possible to consume the API without knowing the endpoints.
e.g.
An initial request to the API
HTTP OPTIONS /api
Could return all resources available for consumption and their relations. Like a massive tree to consume the API and all it has to offer. This idea doesn't neglect implementing HATEOAS on other responses as well, but the OPTIONS request would allow navigation without returning data that the consumer might not actually want to consume.
Is this a really bad idea? Or is it something that is commonly implemented. I'm currently attempting to implement a REST API but I'm having a hard time understanding the benefit of HATEOAS if there is no way to navigate the API without actually requesting data that you might not necessarily need when consuming certain end points. And I assume HATEOAS aims to make clients consume resources by their relation and not actually hard coding the end point?
TL;DR
Could HTTP OPTIONS request act as a way to navigate a REST API by returning what communication options are available for the requested resource without actually returning the resource?
According to RFC 7231
The OPTIONS HTTP method requests information about the communication options available for the target resource, at either the origin server or an intervening intermediary. This method allows a client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action.
...
A server generating a successful response to OPTIONS SHOULD send any header fields that might indicate optional features implemented by the server and applicable to the target resource (e.g., Allow), including potential extensions not defined by this specification. The response payload, if any, might also describe the communication options in a machine or human-readable representation. A standard format for such a representation is not defined by this specification, but might be defined by future extensions to HTTP. A server MUST generate a Content-Length field with a value of "0" if no payload body is to be sent in the response.
So, basically a response to an OPTIONS request will tell your client which HTTP operations may be performed on a certain resource. It is furthermore admissible to target the whole server on utilizing * instead of a specific resource URI.
A response to an OPTIONS request may look like this:
HTTP/1.1 204 No Content
Allow: OPTIONS, GET, HEAD, POST
Cache-Control: max-age=604800
Date: Thu, 13 Oct 2016 11:45:00 GMT
Expires: Thu, 20 Oct 2016 11:45:00 GMT
Server: EOS (lax004/2813)
x-ec-custom-error: 1
which states that a certain resource supports the mentioned operations in the Allow header of the resonse. Via the Cache-Control header a client knows that it by default can cache responses of safe requests (GET and HEAD) for up to 7 days (value is mentioned in seconds). The x-ec-custom-error header specifies a non-standard header that is specific to a particular software, in that particular case to a ECS Server. According to this Q & A the meaning isn't publicly documented and therefore application specific.
In regards to returning a tree of traversable resources from the given resource the OPTIONS operation was requested for, technically this could be possible, however, certain systems might produce an almost never-ending list of URIs. Therefore such a design is questionable for larger systems.
My understanding of HATEOAS is that basically every resource should share information about what communication options it has and how the consumer can use those options to achieve their end goal.
Hypertext as the engine of application state (HATEOAS) is basically just a requirement to use the interaction model used on the Web for decades quite successfully and offer the same functionality to applications. This enabled applications to surf the Web similar like we humans do.
Great, but how does it work?
On the Web we use links and Web forms all the time. Through a Web form a server is able to teach a client basically what properties a certain resource supports or expects. But that's not all! The same form also tells your client where to send the request to (target URI), the HTTP method to use and, usually implicitly given, the media type the payload needs to be serialized to upon sending the request to the server. This, in essence, makes out-of-band API documentation unnecessary as all the information a client needs to make a valid request is given by the server already.
On a typical Web site you might have a table of entries which offers the option to add new entries, update or delete existing ones. Usually such links are hidden behind fancy images, i.e. a dustbin for deleting an entry and a pencil for editing an existing entry or the like where the image represents an affordance. The affordance of certain elements make it clear what you should do with it or what's the purpose of that element. A button on a page wants to be pushed while a slider widget wants to be changed while a text field waits for user input. As applications aren't that eager to work on images a further concept is used instead. Link relation names exactly serve this purpose. I.e. if you have a pageable collection consisting of multiple page à 25 entries i.e. you might be familiar with a widget containing arrows to page through that collection. A link here should usually be annotated with link relation names such as self, next, prev, first or last. The purpose of such links is quite clear, some others like prefetch, that indicates that a resource can be loaded in the background early as it is very likely that the next action may request it, might be less intuitive at first. Such link relation names should be standardized or at least follow the Web Linking extension mechanism.
Through the help of link-relation names a client that knows to look for URIs annotated with next i.e. will still work if the server decides to change its URI scheme as it treats the URI rather opaque.
Of course, both client and server need to support the same media type that furthermore is able to represent such capabilities. Plain application/json is i.e. not able to provide such a support. HAL JSON or JSON Hyper-Schema at least add support for links and link relation names to JSON based documents, while hal-forms, halo+json (halform) and ion might be used to teach a client how a request needs to be created. The question here shouldn't be which media type to support but how many different ones you want to support as the more media types your API is able to handle, the more likely it will be to interact with arbitrary clients not under your control.
These concepts allow you to basically use the controls given in the server response to "drive your workflow" forward. In essence, what you, as an API designer should do is to design the interactions of a client with your API so that it follows a certain, as Jim Webber termed it, domain application protocol or state machine as Asbjørn Ulsberg put it that basically guides a client through its task, i.e. ordering from your shop API.
So, in short, HATEOAS is comparable to Web surfing for applications by making use of named link relations and form-like media type representations that allow you to take actions solely on the response retrieved from a server directly instead of having to bake external knowledge stemming from some reference documentation page (Swagger, OpenAPI or the like) into your application.
But how does HATEOAS benefit the consumer in practice then?
First, it does not have to consult any external documentation other maybe than the current media type specification, though usually support for well-known media types is already backed into popular frameworks or at least allows to add support through plugins or further libraries. Once the media type is understood and supported interactions with all serivces that also support the same media type is possible, regardless of their domain. This allows to reuse the same client implementation to interact with service A and service B out of the box. In an RPC-like systems you'd need to integrate the API of service A first and if you want to interact with service B also you need to integrate those API separately. It's most likely that these APIs are incompatible and thus don't allow the reusage of the same classes.
Without knowing the URL for a resource, is the idea that the consumer can discover it by browsing the API, but they will still have a hard dependency on the actual URL? Or is HATEOAS purpose to leverage actions on a certain resource, i.e. the consumer knows the users end-point but he does not need to know the end-points for actions to take on the users resource cause those are provided by the API?
A client usually does not care about the URI itself, it cares about the content a URI may provide. Compare this to your typical browsing behavior. Do you prefer a meaningful text that summarizes that links content so you can decide whether to request that resource or do you prefer parsing and dissecting a URI to learn what it might do? Minifying or obfuscating URIs will do you no favor in the latter case though.
A further danger arise from URIs and resources that a client put meaning to. A slopy developer will interpret such URIs/resources and implement a tiny hack to interact with that service assuming the URI/resource will remain static. I.e. it is not unreasonable to consider a URI /api/users/1 to return some user related data and based on the response format a tiny Java class is written that expects to receive a field for username and one for email i.e.. If the server now decides to add additional data or rename its fields, the client suddenly will not be able to interact with that service further. And rest assured that in practice, especially in the EDI domain, you will have to interact with clients that are not meant to interact with the Web or where programmers implemented their own JSON framework that can't coope with changing orders of elements or can't handle additional optional fields, even though the spec contains notes on those issues. Fielding claimed that
A REST API should never have “typed” resources that are significant to the client. Specification authors may use resource types for describing server implementation behind the interface, but those types must be irrelevant and invisible to the client. The only types that are significant to a client are the current representation’s media type and standardized relation names. [ditto] (Source)
Instead of typed resources content type negotiation should be used to support interoperability of different stackholders in the network.
As such, the URI itself is just the identifier of a resource that is mainly used to learn where to send a request to. Through the help of meaningful link relation names a client should know that it is interested in i.e. http:/www.acme.com/rel/orders if it wants to send an order to the service and just looks up the URI that either is annotated with that Web Linking extension realtion name or that has an URI attached to it. Whether the link relation name is just an annotation (i.e. a further attribute on the URI element) or the URI being attached to the link-relation name (i.e. as an embedded object of the link relation name) is dependent on the actual media type. This way, if a server ever decides to change its URI scheme or move around resources, for whatever reason, the client will still be able to find the URI that way and it couldn't care less about the characters present in the URI or not. It just treats the URI as opaque thing. The nice thing here is, that a URI can be annotated with multiple link relation names simultaneously, which allows a server to "offer" that URI to clients that support different link-relation names. In the case of forms the URI to send the request to is probably contained in the action attribute of the form element or the like.
As you hopefully can see, with HATEOAS there is no need for a hard dependency on URIs, if so there may be a dependency on the link-relation name though. It still requires URIs to learn where to send the request to, but through looking up the URI via its accompanying link relation name you make the handling of URIs much more dynamic as it allows a server to change the URI anytime it wants to or has to.

Consuming a REST API endpoint from a resource ID

Lets consider the following flow to a RESTfull API:
API root
|
v
user list
|
v
user details
|
v
user messages
Suppose I have a client to consume the API, and I want to retrieve messages from a user with ID 42.
From what I've been studying, my client is not supposed to know how to "build" urls, and it should follow the links given by the API.
How should I do to retrieve messages for the user with ID 42?
The only way I can think is "walk" the whole API from it's root to user messages, which doesn't look very pretty or efficient to me.
Eg:
1 - GET / and get the link to the list of users
2 - GET /user/?id=42 and get the link to details of the user with the ID 42
3 - GET /user/42/ and get the link to user 42 list of messages
4 - GET /user/42/messages/ and finally get the user messages
Did I get something wrong? Is this the right way according to Roy's Fielding paper?
Or is it ok to just assume the messages url is "/user/{id}/messages/" and make the request directly?
Use URL templates in your API root. Let the client consume the API root at runtime. It should look for a URL template named something like "user-messages" with the value of "/user/{userid}/messages/". Then let the client substitute "42" for "{userid}" in the template and do a GET on the resulting URL. You can add as many of these URL templates you want for all of the required, often used, use cases.
The difference between this solution and a "classic" web API is the late binding of URLs: the client reads the API root with its templates at runtime - as opposed to compiling the client with the knowledge of the URL templates.
Take a look at the HAL media type for some information about URL templates: http://stateless.co/hal_specification.html
I wrote this piece here some time ago to explain the benefits of hypermedia: http://soabits.blogspot.dk/2013/12/selling-benefits-of-hypermedia.html
I believe what your real concern is should you go about implementing HATEOAS or not. Now as it's an integral part of REST specifications, it is recommended that each entity should have a link to it's child entity that it encompasses. In your case, API ROOT should show list of users with each "user" having a link (/root/users/{id}) to corresponding user's details. And each User details entity will contain a link to the list of "messages" (/root/users/{id}/messages) which, finally, inturn encompass the link to the actual message detail as well (/root/users/{id}/messages/{messageId}). This concept is extremely useful (and thus a part of the specifications) because the client doesn't need to know the url to where your entity is exposed. For example, if your users were on http://users.abc.com/rest/users/{id} but your messages were on http://messages.abc.com/rest/{userId}/messages/{messageId}, the user entity that encompasses the list of "messages" will already have link embedded to point to the right resource on a different server.
Now that being said, I haven't actually seen many REST implementations out there (I must admit I do not have TOO MUCH of an experience, but enough to give an opinion) where HATEOAS is being used widespread. In most cases the resources are almost always on the same server (environment) and the paths to resources are almost always relative to the root url.Thus, it doesn't make sense for the clients to parse out the embedded links from the object when they can generate one by themselves, especially when the client would like to provide access to a resource directly (View the message directly without getting the user entity provided you already know what the messageId is).
In the end, it all depends on how close do you want your REST implementations to that of specifications and what kind of clients are you going to have. My 2 cents would be: if you have time, implement REST with HATEOAS and feel proud about it :). There are libraries out there that will make this implementation (HATEOAS) somewhat transparent to you REST implementation (I believe spring has one, although not very mature. You can look at it here). If you are like me and don't have much time to go that route, I think you can continue with a normal REST implementation without HATEOAS and your clients will still be OK with it (or so I hope!)
Hope this helps!
I found this article about hacking urls: Avoid hackable URLs.
There is a very interesting discussion about the topic of this question in the comments section.