Is the URL subject to HTTP/2 header compression? - http-headers

I understand that, if you send duplicate header values in subsequent requests, the dynamic table means the value is not sent again; a reference to its entry in the table is sent instead.
My question is whether this applies to the URL as well?
Say you have repeated requests to the same URL (possibly containing long IDs and/or tokens), would bandwidth be saved in this instance?

The HPACK specification defines several ways for a client to send headers under HTTP/2: reference a header that was sent previously, store a header for later reference, never store a header for reuse, and so on. The client decides which of these to use for each header it sends.
In HTTP/2 the URL is sent in the :path pseudo-header so, unlike in HTTP/1.1, it is just like any other HTTP header and can be compressed. Typically a URL is not repeated often, however, so it would be sent as a Literal Header Field without Indexing, which marks it as a one-off header that should not be stored for reuse. Of course, as it's an HTTP header much like any other, there's nothing to stop an HTTP/2 client sending it with indexing so that repeats become table references, but web browsers are unlikely to do this, so it is probably only really an option for custom clients.
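To make the difference concrete, here is a minimal sketch of how a custom client might encode :path with and without indexing. It is a simplified model rather than a full HPACK implementation: it assumes no Huffman coding and a dynamic table that never evicts entries. The PathEncoder class and the example path are hypothetical; the bit patterns and the static-table index for :path follow RFC 7541.

```python
PATH_STATIC_INDEX = 4      # static table entry 4 is ":path: /"
FIRST_DYNAMIC_INDEX = 62   # dynamic entries follow the 61 static ones


def encode_int(value, prefix_bits, flags=0):
    """HPACK prefix-integer encoding (RFC 7541, section 5.1)."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])
    out = [flags | limit]
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return bytes(out)


class PathEncoder:
    """Toy encoder for the :path field only (hypothetical helper)."""

    def __init__(self, index_paths):
        self.index_paths = index_paths
        self.dynamic = []  # newest entry first, as in HPACK

    def encode(self, path):
        entry = (":path", path)
        if entry in self.dynamic:
            # Indexed Header Field: a single reference to the table entry.
            index = FIRST_DYNAMIC_INDEX + self.dynamic.index(entry)
            return encode_int(index, 7, 0x80)
        raw = path.encode("ascii")
        if self.index_paths:
            # Literal with Incremental Indexing: sent in full, then stored.
            self.dynamic.insert(0, entry)
            head = encode_int(PATH_STATIC_INDEX, 6, 0x40)
        else:
            # Literal without Indexing: sent in full, never stored.
            head = encode_int(PATH_STATIC_INDEX, 4, 0x00)
        return head + encode_int(len(raw), 7) + raw


if __name__ == "__main__":
    path = "/items/4d49cad6-4165-472d-ad61-c91160fdd06c"
    for index_paths in (False, True):
        encoder = PathEncoder(index_paths)
        sizes = [len(encoder.encode(path)) for _ in range(3)]
        print("indexing" if index_paths else "no indexing", sizes)
```

With indexing enabled, every repeat of the same path shrinks to a one-byte reference; without it, the full literal is sent each time.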
Incidentally, if you want to know more about this and find the spec a little difficult to follow, my book HTTP/2 in Action goes into this in a lot more detail in Chapter 8.

Related

Should I use GET or POST if getting idempotent information but with parameters that are not meant to be in URL

I have an API that gets a credit card number when you supply a reference id. The reference id is considered sensitive data, so my understanding is that it shouldn't show up in the URL, and instead needs to be sent in a JSON body, with HTTPS as the protocol for encryption.
Now, should the request be a GET, which sounds more natural when reading it, yet looks odd with a JSON body attached? Or should it be a POST, where it makes sense to have a JSON body, yet sounds odd when reading it, given that the request itself is idempotent?
A payload within a GET request message has no defined semantics -- RFC 7231
If you must pass information to the server in the payload of the request, then GET isn't a valid option.
On the other hand
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding, 2009
In other words, we use POST if none of the other registered methods have appropriate semantics and we don't want to extend HTTP with our own method-token.
should it be POST where it makes sense to have a JSON body, yet sounds odd when reading it, and also the request in itself is idempotent.
It's not ideal - you have a request where the intended semantics are idempotent, but no effective way to communicate that to general purpose components.
What you can sometimes do is use a request with a body to create a new resource, and then use GET with the identifier of the new resource. That keeps the sensitive information out of the logs while still giving you safe semantics, but at the cost of an extra round trip and some complexity:
POST /foo
Content-Type: application/json
{ "CreditCardNumber" : "0000-0000-0000-0000" }
201 Created
Location: /4d49cad6-4165-472d-ad61-c91160fdd06c
Content-Location: /4d49cad6-4165-472d-ad61-c91160fdd06c
Here, Location tells a general purpose client where the new page has been created, and Content-Location tells a general purpose client that the content of this message is a copy of the new page.
If the client wants to check that page later for an update, a simple GET request will work:
GET /4d49cad6-4165-472d-ad61-c91160fdd06c
So the URI never has the credit card number, but instead has a token that can unlock the credit card number from some secure store at the server.
In effect, /4d49cad6-4165-472d-ad61-c91160fdd06c is a web page about credit card number 0000-0000-0000-0000.
But there's extra song and dance when the client doesn't remember the unique identifier for that web page, and has to use POST to ask where it is again.
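The create-then-GET pattern above can be sketched on the server side as follows. This is a minimal in-memory illustration, not a real secure store: the handler names handle_post and handle_get, the _store dictionary, and the (status, headers, body) return shape are all assumptions made for the example.

```python
import uuid

# Hypothetical stand-in for a secure server-side store: opaque token -> data.
_store = {}


def handle_post(body):
    """Store the sensitive payload and answer with the new resource's URI."""
    token = str(uuid.uuid4())
    _store[token] = body
    location = f"/{token}"
    headers = {"Location": location, "Content-Location": location}
    return 201, headers, body


def handle_get(path):
    """Safe, idempotent lookup: the URI carries only the opaque token."""
    token = path.lstrip("/")
    if token not in _store:
        return 404, {}, None
    return 200, {}, _store[token]


if __name__ == "__main__":
    status, headers, _ = handle_post({"CreditCardNumber": "0000-0000-0000-0000"})
    print(status, headers["Location"])
    print(handle_get(headers["Location"]))
```

Only the POST body ever contains the sensitive value; anything that logs request lines sees the token alone.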

REST API for sending files between services

I'm building a microservice, one of whose APIs expects a file and some parameters, which it will process and return a response for.
I've searched and found some references, mostly pointing towards form-data (multipart), however they mostly refer to client to service and not service to service like in my case.
I'll be happy to know what is the best practice for this case for both the client (a service actually) and me.
I would also suggest performing a multipart POST request to a service endpoint that can accept a byte stream wrapped in the HTTP body part(s). A PUT request may also work in some cases.
Your main concern will be binding enough metadata to the request so that the remote service can handle it correctly. This includes, in particular, the following headers:
Content-Type: to provide the MIME type of the data being transferred and enable its proper processing.
Content-Disposition: to provide additional information about the body part such as the file name.
I personally believe that a single request is enough (in contrast to @Evert's suggestion), as it results in less overhead overall and keeps things simple (and RESTful) by avoiding any linking (or state) between successive requests.
I would not wrap data in form-data, because it just adds to the total body size. You can just put the entire raw file in the body of a PUT or POST request.
If you also need to send meta-data, I would suggest 2 requests. If you absolutely can't do 2 requests, form-data might still be the best option and it does work server-to-server.
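The two approaches discussed in these answers can be compared with a small sketch that builds both kinds of request without sending them. It uses only the standard library; the function names, the form field name "file", and the example URL are assumptions for illustration, and the multipart encoding is hand-rolled in a simplified way (no escaping of special characters in names or values).

```python
import urllib.request
import uuid


def build_raw_upload(url, payload, mime_type, filename):
    """Raw file bytes as the whole body, described by headers only."""
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Content-Type": mime_type,
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
    )


def build_multipart_upload(url, payload, mime_type, filename, params):
    """File plus parameters in a single multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in params.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n".encode("utf-8")
        + payload
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    data = b"%PDF-1.7 example bytes"
    raw = build_raw_upload("http://localhost:8080/files/report", data,
                           "application/pdf", "report.pdf")
    multi = build_multipart_upload("http://localhost:8080/files", data,
                                   "application/pdf", "report.pdf",
                                   {"owner": "billing"})
    print(len(raw.data), len(multi.data))
```

The size difference printed at the end is the multipart framing overhead that the second answer refers to; it is fixed per part, so it matters less as files get larger.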

Which of the following are valid URIs (Uniform Resource Identifiers) as per the REST API specifications?

How can I identify which of the following Uniform Resource Identifiers (URIs) are valid as per the REST API specifications?
Choose one or more options
1. POST https://api.example.com/whales/create/9xf3df
2. PUT https://api.example.com/whales/9xf3df
3. GET https://api.example.com/whales/9xf3df?sort=name&valid=true
4. DELETE https://api.example.com/whales
REST doesn't care what spelling conventions you use for your resource identifiers; anything that conforms to the production rules defined by RFC 3986 is fine.
/whales/create/9xf3df
/whales/9xf3df
/whales/9xf3df?sort=name&valid=true
/whales
/0cc846bb-678d-45d8-9c06-d9cf94cee0a5
/9xf3df/whales
These are all fine identifiers.
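As a rough illustration, the component-splitting regular expression from RFC 3986, Appendix B can be applied to each of these. Note the assumptions: this only splits a reference into its generic parts and does not fully validate it against the RFC's grammar, and the split_uri helper is a hypothetical name introduced for the example.

```python
import re

# Regular expression from RFC 3986, Appendix B, for splitting a URI reference.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference):
    """Split a URI reference into its five generic components."""
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    for reference in (
        "/whales/create/9xf3df",
        "/whales/9xf3df",
        "/whales/9xf3df?sort=name&valid=true",
        "/whales",
        "/0cc846bb-678d-45d8-9c06-d9cf94cee0a5",
        "/9xf3df/whales",
    ):
        print(split_uri(reference))
```

Nothing in the split depends on which words appear in the path, and the HTTP method is not part of the identifier at all.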
Identifiers for a "REST API" are exactly like identifiers for web pages - you can use any spelling you want, and the browsers, caches, web crawlers, and so on will work with them quite happily; these general purpose components treat identifiers like identifiers - they don't try to extract any meaning from them.
By way of demonstration, please observe that all of the following work exactly the way you would expect them to:
https://www.merriam-webster.com/dictionary/create
https://www.merriam-webster.com/dictionary/get
https://www.merriam-webster.com/dictionary/put
https://www.merriam-webster.com/dictionary/post
https://www.merriam-webster.com/dictionary/patch
https://www.merriam-webster.com/dictionary/delete
Does REST care about the POST, PUT, GET, and DELETE for the above options?
Hard to be sure which question you are asking here.
PUT /dictionary/delete HTTP/1.1
That's a perfectly satisfactory request-line, and there is no ambiguity about what it means. In this example, PUT is the method-token; that tells the server that we are requesting that the representation of the target resource (identified by /dictionary/delete) be replaced by the representation included in the message-body of the request.
For this specific resource, that probably means that the message-body is an HTML document (we'd see Content-Type: text/html in the headers, to ensure that the server knows how to correctly interpret the bytes provided).
PATCH /dictionary/delete HTTP/1.1
This is also a satisfactory request line; we are again requesting a change to the representation of the /dictionary/delete resource, but we're going about it in a slightly different way - instead of including a replacement representation in the message body, we're providing a representation of a list of changes to make (aka a "patch document").
Uniform interface means that we should expect the folks at www.merriam-webster.com to understand these messages exactly as we've described them here.
Now, for these specific resources, they probably don't want random Stack Overflow members making changes to their website, so they are likely to respond 403 Forbidden or 405 Method Not Allowed.
All of the general purpose components will understand what that means, again because the standardized response meta-data is common to all resources.

Content-Range header - allowed units?

This is related to:
How should I implement a COUNT verb in my RESTful web service? , Paging in a Rest Collection
and Using the HTTP Range Header with a range specifier other than bytes?
Actually, I think the answer rated -1 here is correct: https://stackoverflow.com/a/1434701/1237617
Generally, answers say that you can use custom units, citing sec. 3.12:
range-unit = bytes-unit | other-range-unit
bytes-unit = "bytes"
other-range-unit = token
However when you read the HTTP spec please notice the production rules are thus:
Content-Range = "Content-Range" ":" content-range-spec
content-range-spec = byte-content-range-spec
byte-content-range-spec = bytes-unit SP byte-range-resp-spec "/" ( instance-length | "*" )
The header spec only references bytes-unit from sec 3.12, not range-units, so I think that actually it's against the spec to use custom units here.
Am I missing something, or is the popular answer wrong?
EDIT: Since this probably isn't clear, the gist of my question is:
RFC 2616 sec. 14.16 only references bytes-unit. It never mentions range-unit, so the range-unit production is not relevant for Content-Range, and thus only bytes-unit can be used.
I think this addresses my concerns best, although I needed some time to understand it (plus I wanted to make sure that there is something wrong with the wording).
This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests
thanks to elgaton
The spec, as being revised, allows custom range units. See HTTPbis Part 5, Section 2.
If you read the HTTP/1.1 RFC, section 3.12, you will see that:
The only range unit defined by HTTP/1.1 is "bytes". HTTP/1.1 implementations MAY ignore ranges specified using other units.
So, the other-range-unit token has been introduced only to make servers more "liberal" when accepting. This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests, so that servers could accept even invalid requests (they will be simply ignored) and clients would use only the universally-accepted bytes unit.
Therefore, I personally recommend that you:
use only the bytes unit when acting as a client, and
accept other units (discarding the Content-Range header if they are invalid) when acting as a server.
This is a purely personal opinion, but I think it is fairly consistent with how other HTTP extensions (custom methods or headers) are used. Here is how I read it: Yes I can use custom range units and no, I shouldn't submit a bug report when it gets ignored when passing through firewalls, web proxies, and other intermediaries. I conform to the HTTP spec when I'm sending it and they conform to HTTP when they ignore it. WebDAV uses HTTP extensions correctly, IMO, but rarely works over the Internet for exactly this reason. As I said, a personal opinion only.
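A lenient receiver along the lines recommended above might look like the following sketch. It accepts the bytes form from the RFC 2616 grammar and simply ignores anything else, including other range units. The function name and the tuple it returns are assumptions for the example, and the consistency checks on positions are an extra choice made here.

```python
import re

# "bytes" SP ( first "-" last | "*" ) "/" ( instance-length | "*" )
_BYTE_CONTENT_RANGE = re.compile(r"bytes (?:([0-9]+)-([0-9]+)|\*)/([0-9]+|\*)")


def parse_content_range(value):
    """Return (first, last, length) for a bytes range, or None to ignore it.

    A "*" in the header is reported as None in the matching tuple slot.
    """
    match = _BYTE_CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        return None  # unknown unit or malformed value: ignore the header
    first = int(match.group(1)) if match.group(1) is not None else None
    last = int(match.group(2)) if match.group(2) is not None else None
    length = None if match.group(3) == "*" else int(match.group(3))
    if first is not None and last < first:
        return None
    if length is not None and last is not None and last >= length:
        return None
    return first, last, length


if __name__ == "__main__":
    for header in ("bytes 0-499/1234", "bytes */1234", "items 0-9/100"):
        print(header, "->", parse_content_range(header))
```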
Apparently it's OK to use custom units, because:
This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests

Does the `Expires` HTTP header need to be consistent across multiple cold-cache requests?

I'm implementing a custom web server of a kind, and am looking into adding support for the Expires header. However, I'm a little unsure of how exactly to implement it.
If multiple cold-cache requests are made to the same unchanged resource and the server returns a different Expires header each time (say it computes the Expires date relative to the request time, e.g. +6 hours), does that invalidate the cache on all the proxy servers in between as well? Or is it impossible for this to happen (per the spec)?
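For reference, the relative-time scheme in question can be sketched like this. It is only an illustration: expires_header is a hypothetical helper, the 6-hour lifetime is the figure mentioned above, and the request time is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(request_time, lifetime=timedelta(hours=6)):
    """Expires value computed relative to the (timezone-aware) request time."""
    expiry = request_time.astimezone(timezone.utc) + lifetime
    return format_datetime(expiry, usegmt=True)  # HTTP-date, always in GMT


if __name__ == "__main__":
    first = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    second = first + timedelta(minutes=5)
    # Two requests for the same unchanged resource get different values.
    print(expires_header(first))
    print(expires_header(second))
```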
OK, never mind, I found the relevant information under the Cache Revalidation and Reload Controls section of the HTTP spec.
Basically, you can serve all the different validators you want, but you must be aware that in that case proxies may hold a set of different validators, from their own cache and from the various user agents communicating with the proxy. They may choose to send one of them to you, and it might not be the correct or most optimal one for the end users. However, a "best approach" has been suggested in the spec.
I suppose this should cover Expires headers as well as ETags, Cache-Control and whatnot.
Here's the relevant excerpt, in case anyone's interested:
When an intermediate cache is forced, by means of a max-age=0 directive, to revalidate its own cache entry, and the client has supplied its own validator in the request, the supplied validator might differ from the validator currently stored with the cache entry. In this case, the cache MAY use either validator in making its own request without affecting semantic transparency.
However, the choice of validator might affect performance. The best approach is for the intermediate cache to use its own validator when making its request. If the server replies with 304 (Not Modified), then the cache can return its now validated copy to the client with a 200 (OK) response. If the server replies with a new entity and cache validator, however, the intermediate cache can compare the returned validator with the one provided in the client's request, using the strong comparison function. If the client's validator is equal to the origin server's, then the intermediate cache simply returns 304 (Not Modified). Otherwise, it returns the new entity with a 200 (OK) response.
If a request includes the no-cache directive, it SHOULD NOT include min-fresh, max-stale, or max-age.
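The "best approach" in the excerpt can be sketched as a small decision function. This is a simplified model under stated assumptions: validators are ETag strings, the origin server is represented by a callable that takes the validator it is sent and returns a (status, etag) pair, and the names strong_equal, revalidate, and make_origin are hypothetical.

```python
def strong_equal(a, b):
    """Strong comparison: identical validators, neither of them weak."""
    return a is not None and a == b and not a.startswith("W/")


def revalidate(cached_etag, client_etag, origin):
    """Return the (status, etag) the intermediate cache gives the client."""
    # The cache revalidates with its own validator, not the client's.
    status, new_etag = origin(cached_etag)
    if status == 304:
        # The cached copy is still good: serve it as a full 200 response.
        return 200, cached_etag
    # The origin sent a new entity; the client may already hold it.
    if strong_equal(client_etag, new_etag):
        return 304, new_etag
    return 200, new_etag


def make_origin(current_etag):
    """Fake origin server whose resource currently has this ETag."""
    def origin(if_none_match):
        if if_none_match == current_etag:
            return 304, current_etag
        return 200, current_etag
    return origin


if __name__ == "__main__":
    print(revalidate('"v1"', '"v0"', make_origin('"v1"')))
    print(revalidate('"v1"', '"v2"', make_origin('"v2"')))
    print(revalidate('"v1"', '"v0"', make_origin('"v2"')))
```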