Best practice: What to use instead of "User-Agent" in HTTP headers to identify an app?

It looks like you can't always set the "User-Agent" header using Ajax. (User-Agent is effectively a reserved header name, and some browsers won't let you override it because of security concerns.)
When calling my REST service I'd like the caller to give me a clue about who (which application) is using it.
Registration won't be mandatory, it's rather a way to check if there are some external (valuable) clients that still use my web service when I'd like to close it.
So if I can't use "User-Agent", is there a conventional header name to use instead?
X-Application-Id? X-UserAgent?
Is there some doc that lists all those X-*** headers?

Depending on your application, it may or may not be possible to add custom headers to its requests.
From your question, I assume that you are able to set custom headers. The default for your application would be to use the User-Agent header field as described in the official HTTP specification:
The "User-Agent" header field contains information about the user agent originating the request, which is often used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use. A user agent SHOULD send a User-Agent field in each request unless specifically configured not to do so.
If you choose to add another custom header, there are no restrictions or recommendations for the name of the header. Note, however, that you should not use an "X-" prefix, as described in RFC 6648:
Historically, designers and implementers of application protocols have often distinguished between standardized and unstandardized parameters by prefixing the names of unstandardized parameters with the string "X-" or similar constructs. In practice, that convention causes more problems than it solves. Therefore, this document deprecates the convention for newly defined parameters with textual (as opposed to numerical) names in application protocols.
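As a minimal sketch of the custom-header approach in Python: the header name Application-Id, the example URL and the identifier format are assumptions made for illustration, not names taken from any standard. The client attaches the identifier, and the server prefers it but falls back to User-Agent when it is absent.

```python
from typing import Mapping
from urllib.request import Request

# Hypothetical header name, chosen without an "X-" prefix (see RFC 6648).
APP_HEADER = "Application-Id"


def build_request(url: str, app_id: str) -> Request:
    """Client side: attach the application identifier to an outgoing request."""
    return Request(url, headers={APP_HEADER: app_id})


def identify_caller(headers: Mapping[str, str]) -> str:
    """Server side: prefer the custom header, fall back to User-Agent."""
    # Header field names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get(APP_HEADER.lower()) or lowered.get("user-agent") or "unknown"


if __name__ == "__main__":
    request = build_request("https://api.example.org/things", "billing-dashboard/1.4")
    print(identify_caller(dict(request.header_items())))
```

Since registration is not mandatory, the fallback to "unknown" lets callers that send neither header still be counted in the server logs.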

Related

Differentiating between 404 types

I know the 404 vs 204 debate has been beaten to death, and I understand the argument for using 404 when there is no record in the table corresponding to a REST endpoint request, but it feels like there should be some way of differentiating between "This endpoint is malformed" and "there is no record in the table." For example if I have an endpoint like this:
https://mycloudfront.cloudfront.com/api/my-table/{userId}
Is there a recommended way of configuring error handling on the backend to differentiate between "no resource found because there is no entry for userId" and "no resource found because there is no table named my-table" or "no resource found because there is no cloudfront distribution named mycloudfront"?
I ask, because it would be nice on the frontend to inform the end user whether or not their request did not produce the desired result because they have no data in the table (in which case I would display a message encouraging them to take an action that would generate data) or because something went wrong (in which case I would display an error message).
it would be nice on the frontend to inform the end user whether or not their request did not produce the desired result because they have no data in the table
That's what the response body is for.
Except when responding to a HEAD request, the server SHOULD send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition. RFC 9110.
Status codes are metadata in the transfer-of-documents-over-a-network domain (Webber, 2011) - the information indicates to general purpose web components (browsers, proxies, caches, spiders....) the semantics (meaning) of the fields and response body (ex: does the message include a representation of a resource or a representation of an error?)
Bespoke HTTP message handlers (and human operators) are expected to look for information in the body (ex: a 404 for a web page returns a picture of a fail whale and a bunch of links to different resources that might clarify what's gone wrong).
You can also leverage ideas like web linking (RFC 8288), if you want to describe relationships between the error and other resources.
Problem Details (RFC 7807, since obsoleted by RFC 9457) describes a standardized JSON schema for communicating error information, if you want a JSON representation but prefer not to do all of the schema design yourself.
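A rough sketch of how the body can carry the distinction while the status code stays 404, using the Problem Details member names (type, title, status, detail). The problem type URIs and the user-facing messages are hypothetical, and Python is used purely for illustration.

```python
import json
from typing import Tuple

# Hypothetical problem type URIs; "type" only needs to be a URI reference.
NO_RECORD = "https://example.org/problems/no-record"
NO_SUCH_COLLECTION = "https://example.org/problems/no-such-collection"


def not_found(problem_type: str, title: str, detail: str) -> Tuple[int, dict, str]:
    """Build a 404 response whose body says which kind of "not found" occurred."""
    body = {"type": problem_type, "title": title, "status": 404, "detail": detail}
    headers = {"Content-Type": "application/problem+json"}
    return 404, headers, json.dumps(body)


def message_for_user(status: int, body: str) -> str:
    """Frontend side: choose the message from the body, not the status code alone."""
    problem = json.loads(body) if body else {}
    if status == 404 and problem.get("type") == NO_RECORD:
        return "You have no data yet. Create an entry to get started."
    return "Something went wrong. Please try again later."


if __name__ == "__main__":
    status, headers, body = not_found(NO_RECORD, "No record", "No entry for this user.")
    print(status, headers["Content-Type"], message_for_user(status, body))
```

General purpose components still see an ordinary 404, while the bespoke frontend handler reads the type member to decide what to show.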
First and foremost, REST has no endpoints but resources.
there should be some way of differentiating between "This endpoint is malformed" and "there is no record in the table."
By "This endpoint is malformed" I guess you probably mean that the request issued to the server doesn't conform to the HTTP specification. As voice already mentioned, HTTP status codes are coordination metadata for the outcome of the transport, not necessarily for the outcome of your business logic. Of course, you need to come up with a mapping from the problems you notice while applying your business logic onto the HTTP transport domain.
Unfortunately, REST is polluted with false assumptions and beliefs. Plenty of people seem to think of it as HTTP-based CRUD, mostly done with JSON payloads. But this is just a very tiny fraction of what REST really is. At its heart it is a technique used in distributed computing to help decouple clients from servers so that the latter can evolve freely in the future. Clients, on the other hand, are built with the possibility of change in mind as an inherent design decision, and therefore end up much more robust towards change.
So, how does REST help to decouple clients from servers?
First, the spelling of a URI is not of importance. The URI needs to be a valid one but that's it basically. Clients shouldn't parse the URI or try to extract some knowledge off the URI nor does a URI pattern like /api/user/1 and /api/user/1/stuff mean that both of those URIs are somehow related. That's what link-relations are there for.
Next, in order to teach a client what a URI returned by the server is good for, URIs should come with one or more link-relation names. These should either be registered ones or at least follow the Web Linking extension mechanism, in which the relation name is itself a URI that does not necessarily need to point to a valid resource. Treat it like a predicate in a (SemWeb) ontology.
Use forms similar to HTML forms, such as HAL-forms, JsonForms or Ion, if your server needs further input from clients. Forms also teach clients which HTTP method to use, which URI to send the request to, which media type to encode the request in and, of course, which properties the resource has and/or the server expects input for. This information is enough to let a client send requests to the server that are valid in terms of the transport domain. Note that this does not mean that there won't be any issues. Requests might still fail to reach the server due to an internet outage on either end, the request being routed badly and exceeding the maximum number of allowed hops, and so on. Depending on the HTTP method used, though, a client might automatically reissue a request once it hits its timeout threshold.
In order to increase the interoperability of peers in a REST ecosystem, REST has a strong focus on media types. Think of a media type as the binding contract between a client and a server, which should be negotiated between the two. This guarantees that both exchange "messages" that both understand and are able to process. One of the differences from regular RPC services is that RPC services are usually restricted to one payload mechanism, while REST supports a more or less unlimited number of payload formats, depending on its support for various media types. A media type is a human-readable description of how a payload should be encoded and processed. Besides the syntax of the allowed elements, it also contains a semantic description of the purpose of those elements. A payload issued as plain application/json doesn't really teach a client what the properties of the JSON objects in the payload mean, nor does it really support URIs in the first place. Note, however, that issuing a plain JSON request to the server is fine if the client was "instructed" to do so by a form it was acting upon. The server then simply expects that kind of payload. Just look at how a typical HTML document is built up and read up on some of the tag definitions used within it, and you might get the gist of this paragraph.
Fielding himself was quite vocal about the latter two points in particular in his famous rant:
A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state, or in defining extended relation names and/or hypertext-enabled mark-up for existing standard media types ...
So, back to the actual question at hand. Is "there is no record in the table" really a business logic error? You could also design the resource to return what's currently available, which in this case is an empty list. This at least spares you the hassle of mapping that business error onto the transport domain.
If you want or need to express a business logic failure to the client you should, as voice also recommended before, look into application/problem+json (or its XML alternative application/problem+xml), which defines properties such as the type of failure, a general title, the status and details, among others. The type the response is issued for may define further properties, specific to that type, that are part of the payload. For example, you may define an extension type of http://acme.com/problem/validation that requires the payload to contain a target-ref property identifying the element that failed the validation check, as well as a property for the actual error message.
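To make the extension idea concrete, here is a small Python sketch of a problem document for the http://acme.com/problem/validation type mentioned above. The target-ref member comes from that description; the member name message, the 400 status and the title text are assumptions chosen for illustration.

```python
import json

VALIDATION_TYPE = "http://acme.com/problem/validation"


def validation_problem(target_ref: str, message: str) -> str:
    """Serialize a problem document with type-specific extension members."""
    return json.dumps({
        "type": VALIDATION_TYPE,
        "title": "Validation failed",  # assumed wording
        "status": 400,                 # assumed status for this problem type
        "target-ref": target_ref,      # extension member: which element failed
        "message": message,            # extension member (hypothetical name)
    })


def describe(problem_json: str) -> str:
    """Client side: use the extension members only for a type it understands."""
    problem = json.loads(problem_json)
    if problem.get("type") == VALIDATION_TYPE:
        return f"{problem['target-ref']}: {problem['message']}"
    # Unknown types still carry the generic members, so fall back to the title.
    return problem.get("title", "Unknown problem")


if __name__ == "__main__":
    print(describe(validation_problem("#/email", "must contain an @ sign")))
```

A client that does not know the extension type can still fall back to the generic title, which is what keeps the format usable across different problem types.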
In the end some general recommendations in terms of REST are:
Design the interactions of clients and servers first as if you were interacting with a typical human-focused Web page, and then translate the interaction steps onto the application domain. REST in the end is nothing more than a generalization of how we humans have interacted on the Web for decades. REST is basically Web surfing for applications rather than humans. Just as we humans follow the state machine outlined by, e.g., Amazon.com to order some books, computers can do the same. Therefore, design the whole interaction between client and server as a state machine that clients just follow along and may exit at certain points.
Allow servers to teach clients what they need to know using various kinds of form support, and use link relations to put given URIs in context with the current resource.

Is a REST API supposed to support multiple protocols

Fielding has written in his blog entry REST APIs must be hypertext-driven that:
A REST API should not be dependent on any single communication protocol, though its successful mapping to a given protocol may be dependent on the availability of metadata, choice of methods, etc. In general, any protocol element that uses a URI for identification must allow any URI scheme to be used for the sake of that identification. [Failure here implies that identification is not separated from interaction.]
From my reading of this, any REST API must support more than one protocol in order to be considered restful.
Would an application that would fit all other conditions not be considered restful, if the application only supports one protocol such as HTTP?
From my reading of this, any REST API must support more than one protocol in order to be considered restful.
I don't believe that's quite the right way to read it.
Recall that the web is the reference implementation of the REST architectural style. Most of the identifiers will be https/http; but we also find ftp, and mailto, and about. Fielding's point is that, for the purposes of identifying resources, our mechanisms should be scheme agnostic.
any protocol element that uses a URI for identification must allow any URI scheme to be used for the sake of that identification. (emphasis added)
We've got an entire registry full of URI schemes, and for the purposes of identifying resources, they all have equal standing.
Link: <mailto://JMG#example.org>; rel="author"; anchor="https://stackoverflow.com/questions/73654298/is-a-rest-api-supposed-to-support-multiple-protocols"
That's a perfectly fine link relation, indicating that there is an author relationship between two resources.
Fielding doesn't mean that web servers also have to be mail servers; or that browsers need to figure out what you meant when you put a mailto URI in an image tag.
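A small Python sketch of what "scheme agnostic" means for a component that emits such Link header values: it checks only that the target has some scheme and never inspects which one. The function name and the example addresses are hypothetical, and requiring an absolute URI is a simplification of this sketch rather than a rule of RFC 8288.

```python
from typing import Optional
from urllib.parse import urlsplit


def format_link(target: str, rel: str, anchor: Optional[str] = None) -> str:
    """Format one Link header value; the target may use any URI scheme."""
    if not urlsplit(target).scheme:
        raise ValueError("this sketch expects an absolute URI with a scheme")
    parts = [f"<{target}>", f'rel="{rel}"']
    if anchor is not None:
        parts.append(f'anchor="{anchor}"')
    return "; ".join(parts)


if __name__ == "__main__":
    # The same code path identifies a mailbox, an FTP file, or a web page.
    print(format_link("mailto:author@example.org", "author"))
    print(format_link("ftp://ftp.example.org/readme.txt", "describedby"))
    print(format_link("https://example.org/profile", "author",
                      anchor="https://example.org/post/1"))
```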
It may be useful to review Fielding's follow up essay, Specialization:
My dissertation is written to a certain audience: experts in the fields of software engineering and network protocol design....
Fielding designed REST for HTTP 1.1 machine-to-machine communication. It has many constraints: https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm As far as I know, only HTTP fulfills all of them, and even if you use HTTP you must use it in a specific way, though I don't know much about other protocols. What is certain is that you cannot use REST for WebSockets, because the communication is stateful and violates the statelessness constraint.

PUT or PATCH when user can modify either partially or completely?

Let's imagine I have an application where users can either completely or partially update their profile details in one part of the app. I see two options:
1. PUT for all requests
2. PUT (for complete updates) and PATCH (for partial updates)
In the second scenario I could let the frontend decide whether the full profile or just a part of it was updated. However, this would involve more code on both the frontend and the backend.
The first method is, on the other hand, "easier" to implement. However, is it against certain REST specs / principles?
is it against certain REST specs / principles?
It depends on how you mean it.
If you are thinking "all profile changes are performed by sending a complete replacement of the profile resource via an HTTP PUT", then yes, that is aligned with REST principles (specifically, it respects the uniform interface constraint -- you are using the HTTP PUT message the same way it is used everywhere else, which means that general purpose clients can interface with your resources).
On the other hand, if, instead of the complete replacement, you are considering sending a partial replacement via HTTP PUT, then that is not consistent with REST principles (because you are deviating from the standardized semantics of HTTP PUT).
If HTTP had a standardized "partial PUT" method, using this hypothetical method would be consistent with REST principles.
In other words, REST doesn't really say anything about what messages should be included in the "uniform interface". It just says that everybody should use those messages the same way. It's the HTTP standard that says PUT means complete replacement.
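The difference between the two sets of semantics can be sketched in Python as follows. PUT replaces the stored representation with the request body, while PATCH applies a description of changes. JSON Merge Patch (RFC 7396) is used here as one possible patch format; that choice, like the profile fields, is an assumption made for illustration.

```python
import copy
from typing import Any


def put_profile(stored: dict, replacement: dict) -> dict:
    """PUT semantics: the request body replaces the stored representation entirely."""
    return copy.deepcopy(replacement)


def merge_patch(target: Any, patch: Any) -> Any:
    """PATCH with JSON Merge Patch (RFC 7396): only the listed members change."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # a null value removes the member
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    stored = {"name": "Ada", "email": "ada@example.org", "bio": "hello"}
    print(put_profile(stored, {"name": "Ada L."}))               # email and bio are gone
    print(merge_patch(stored, {"name": "Ada L.", "bio": None}))  # email is kept
```

Sending the body {"name": "Ada L."} via PUT therefore drops every other field, which is why a partial update needs either a complete representation or a different method.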

HttpRequest Host Vulnerabilities

I would like your advice regarding any security vulnerabilities that arise from extracting the domain name from the Host property on an HttpRequest.
I have developed a multi-tenant PWA using ASP.NET Core, and I extract the domain (i.e. the tenant) name from the host (HttpRequest.Host), which I use to look up information in a database.
For example, if I had a URL like www.JoeBloggs.com the extracted domain would be 'JoeBloggs'. Using this I then retrieve the information I require for that tenant.
Information is always sent over an HTTPS connection.
Can the Host value be faked or potentially used in a SQL injection attack if I am using the domain name as part of a database lookup?
Thanks in advance.
Historically there have been a slew of HTTP Host header attacks in which target webservers implicitly trust the Host header value with no/improper whitelist checking or sanitization. In short, it is possible to fake this value in certain contexts/configurations.
For concerns regarding SQL injection specifically, you should already be using prepared statements and parameterized queries to mitigate such risks; if you aren't, you should absolutely be working to refactor your SQL interaction code to do so. Even if the Host value sent by a malicious client is intended to exploit a SQL injection vulnerability somewhere downstream from the HTTP server, such a value shouldn't be able to trigger any unintended functionality or data exposure, because parameters passed to queries via mechanisms like prepared statements/parameterized queries are not interpreted as SQL statements.
Relatedly, if you're using the value of the Host header to determine whether a client should receive any sort of "privileged" information in the response from your server - don't. A Host header value is not nearly a stand-in for a proper authentication/authorization flow and should absolutely never be used as such, considering it's able to be manipulated rather trivially. You can certainly use it in conjunction with other, more secure methods of authentication/authorization, but using it by itself is a big security no-no.
This advice does not preclude there being a separate exploitable flaw/bug in your database, database driver, or anywhere else in your stack that examines the contents of the Host header.
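The two mitigations above (checking the Host value against an allow-list and passing it to the database only as a query parameter) can be sketched as follows. This is an illustration in Python with sqlite3 rather than ASP.NET Core, and the allow-list, table name and column names are all hypothetical.

```python
import re
import sqlite3
from typing import Optional, Tuple

# Hypothetical allow-list of hosts this deployment is willing to serve.
ALLOWED_HOSTS = {"www.joebloggs.com", "www.janedoe.com"}
HOST_PATTERN = re.compile(r"[a-z0-9.-]+(:\d+)?")


def tenant_from_host(host: str) -> Optional[str]:
    """Return the tenant label for an allow-listed Host value, else None."""
    host = host.strip().lower()
    if not HOST_PATTERN.fullmatch(host):
        return None
    hostname = host.split(":", 1)[0]
    if hostname not in ALLOWED_HOSTS:
        return None
    return hostname.split(".")[1]  # "www.joebloggs.com" -> "joebloggs"


def load_tenant(conn: sqlite3.Connection, host: str) -> Optional[Tuple[str, str]]:
    """Look up the tenant row; the value is bound as a parameter, never concatenated."""
    tenant = tenant_from_host(host)
    if tenant is None:
        return None
    cursor = conn.execute("SELECT name, plan FROM tenants WHERE name = ?", (tenant,))
    return cursor.fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tenants (name TEXT PRIMARY KEY, plan TEXT)")
    conn.execute("INSERT INTO tenants VALUES (?, ?)", ("joebloggs", "basic"))
    print(load_tenant(conn, "www.JoeBloggs.com"))
    print(load_tenant(conn, "x' OR '1'='1"))
```

Neither check replaces authentication: the lookup only selects which tenant's public configuration to load, in line with the warning above.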

Can we pass parameters to HTTP DELETE api

I have an API that will delete a resource (DELETE /resources/{resourceId})
The above API can only tell us to delete the resource. Now I want to extend the API for other use cases, like taking a backup of that resource before deleting it or deleting other dependent resources of this resource.
I want to extend the delete API to this (DELETE /resources/{resourceId}?backupBeforeDelete=true...)
Is the above-mentioned extension API good/recommended?
According to the HTTP specification, any HTTP message can carry optional headers and/or a body, which means that your back end can inspect what it receives and decide what to do for any HTTP method. However, if you're talking about RESTful API design, DELETE (like any other method) should target the resource's URI, which is mapped to the controller's DELETE method, and the server should then perform the operation based on the logic in that method.
DELETE /resources/{resourceId} HTTP/1.1
should be OK.
Is the above-mentioned extension API good/recommended?
Probably not.
HTTP is (among other things) an agreement about message semantics: a uniform agreement about what the messages mean.
The basic goal is that, since everybody has the same understanding about what messages mean, we can use a lot of general purpose components (browsers, reverse proxies, etc).
When we start trying to finesse the messages in non standard ways, we lose the benefits of the common interface.
As far as DELETE is concerned, your use case runs into a problem, which is that HTTP does not define a parameterized DELETE.
The usual place to put parameters in an HTTP message is within the message body. Unfortunately...
A payload within a DELETE request message has no defined semantics; sending a payload body on a DELETE request might cause some existing implementations to reject the request
In other words, you can't count on general purpose components doing the right thing here, because the request body is out of bounds.
On the other hand
DELETE /resources/{resourceId}?backupBeforeDelete=true
This has the problem that general purpose components will not recognize that /resources/{resourceId}?backupBeforeDelete=true is the same resource as /resources/{resourceId}. The identifiers for the two are different, and messages sent to one are not understood to affect the other.
The right answer, for your use case, is to change your method token; the correct standard method for what you are trying to do here is POST
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding, 2009
You should use the "real" URI for the resource (the same one that is used in a GET request), and stick any parameters that you need into the payload.
POST /resources/{resourceId}
backupBeforeDelete=true
Assuming you are using POST for other "not worth standardizing" actions, there will need to be enough context in the request that the server can distinguish the different use cases. On the web, we would normally collect the parameters via an HTML form; the usual answer is to include a request token in the body
POST /resources/{resourceId}
action=delete&backupBeforeDelete=true
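On the server side, handling such a request might look like the Python sketch below. The handler name and the returned list of steps are hypothetical stand-ins for real backup and delete logic; only the action and backupBeforeDelete parameters come from the example above.

```python
from typing import List
from urllib.parse import parse_qs


def handle_post(resource_id: str, body: str) -> List[str]:
    """Dispatch a form-encoded POST on /resources/{resourceId} by its action token."""
    form = parse_qs(body)
    action = form.get("action", [""])[0]
    if action != "delete":
        raise ValueError(f"unsupported action: {action!r}")
    steps = []
    if form.get("backupBeforeDelete", ["false"])[0] == "true":
        steps.append(f"backup {resource_id}")  # stand-in for the real backup
    steps.append(f"delete {resource_id}")      # stand-in for the real delete
    return steps


if __name__ == "__main__":
    print(handle_post("42", "action=delete&backupBeforeDelete=true"))
```

Because the request targets the resource's real URI, general purpose components see an unsafe request to that resource, and HTTP caches will invalidate stored responses for it on a successful response.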
On the other hand, if you think you are working on an action that is worth standardizing, then the right thing to do is to set about defining a new method token with the semantics that you want, and pushing for adoption
MAGIC_NEW_DELETE /resources/{resourceId}
backupBeforeDelete=true
This is, after all, where PATCH comes from; Dusseault et al recognized that patch semantics could be useful for all resources, created a document that described the semantics that they wanted, and shepherded that document through the standardization process.