Google cloud load balancer custom http header is missing - load-balancing

While using Google Cloud HTTPS Load Balancer we hit the following bug. Couldn't find any information on it.
We have a custom http header in our request:
X-<Company name>-abcde. If we are working directly against the server all is good, but once we are working through the load balancer, than our custom header is missing. We didn't find any reference in the documentation that there is a need to white list our headers or something like that.
Why my custom header is not being transferred to my backend server while working through Google Cloud Load Balancer? And how to make it work?
Thanks

Data
After a lot of testing, these are the results I've come up with:
The Google Cloud HTTPS Load Balancer does transfer custom HTTP headers to the backend service.
However, it changes them to lower-case.
So, in your case, X-Custom-Header is transformed to x-custom-header.
Solution
Simply change your code to read the lower-case version of your custom HTTP header. This is a simple fix, but one which may not be supported in the long-term by Google (there's not a word on this in Google's documentation so it's subject to change with no notice).
Petition Google to change this idiosyncratic behaviour or at the very least mention it clearly in their documentation.
A little extra
As far as I know, the RFC 2047 which specified the X- prefix for custom HTTP headers and propagated the pseudo-standard of a capital letter for each word has been deprecated and replaced by RFC 6648 which recommends against the X- prefix in general and mentions nothing regarding the rest of the words in the custom HTTP header key name. If I were Google, I would change this behaviour to pass custom HTTP headers as is and let developers deal with the strings as they've set them.

The RFC (RFC 7230) for HTTP/1.1 Message Syntax and Routing says that header fields have a case-insensitive field name. If you're relying on case to match the header that doesn't align with the RFC.
Way back in the day I looked through either the Tomcat of Jetty source and they worked with everything as a .toLower().
Go has a CanonicalMIMEHeaderKey where it'll format the headers in a common way to be sure everything is on the same page.
Python still harkens back to the RFC822 (hg.python.org/cpython/file/2.7/Lib/rfc822.py#l211) days, but it forces a .lower() on headers to standardize.
Basically though what the GCP HTTP(S) Load Balancer is doing is acceptable as far as the RFC is concerned.

This is most likely an application bug.
As other answers have stated, HTTP header names are case insensitive. Ime, every time headers appear to be case sensitive, it is because there is a request wrapper somewhere in the application call stack.
Request wrappers like this are common (usually necessary) in Java servlet filters. It's a common, newbie mistake to use case-sensitive matching (e.g. a regular Java HashMap<String, T>()) for the header names in the wrapper.
That's where I would start looking for your bug.
A reasonable way to create a Java Map<String, T> that is both case insensitive and that doesn't modify the keys is to use new TreeMap<String, T>( String.CASE_INSENSITIVE_ORDER ).

Related

Appropriate REST design for data export

What's the most appropriate way in REST to export something as PDF or other document type?
The next example explains my problem:
I have a resource called Banana.
I created all the canonical CRUD rest endpoint for that resource (i.e. GET /bananas; GET /bananas/{id}; POST /bananas/{id}; ...)
Now I need to create an endpoint which downloads a file (PDF, CSV, ..) which contains the representation of all the bananas.
First thing that came to my mind is GET /bananas/export, but in pure rest using verbs in url should not be allowed. Using a more appropriate httpMethod might be cool, something like EXPORT /bananas, but unfortunately this is not (yet?) possible.
Finally I thought about using the Accept header on the same GET /bananas endpoint, which based on the different media type (application/json, application/pdf, ..) returns the corresponding representation of the data (json, pdf, ..), but I'm not sure if I am misusing the Accept header in this way.
Any ideas?
in pure rest using verbs in url should not be allowed.
REST doesn't care what spelling conventions you use in your resource identifiers.
Example: https://www.merriam-webster.com/dictionary/post
Even though "post" is a verb (and worse, an HTTP method token!) that URI works just like every other resource identifier on the web.
The more interesting question, from a REST perspective, is whether the identifier should be the same that is used in some other context, or different.
REST cares a lot about caching (that's important to making the web "web scale"). In HTTP, caching is primarily about re-using prior responses.
The basic (but incomplete) idea being that we may be able to re-use a response that shares the same target URI.
HTTP also has built into it a general purpose mechanism for invalidating stored responses that is also focused on the target URI.
So here's one part of the riddle you need to think about: when someone sends a POST request to /bananas, should caches throw away the prior responses with the PDF representations?
If the answer is "no", then you need a different target URI. That can be anything that makes sense to you. /pdfs/bananas for example. (How many common path segments are used in the identifiers depends on how much convenience you will realize from relative references and dot segments.)
If the answer is "yes", then you may want to lean into using content negotiation.
In some cases, the answer might be "both" -- which is to say, to have multiple resources (each with its own identifier) that return the same representations.
That's a normal thing to do; we even have a mechanism for describing which resource is "preferred" (see RFC 6596).
REST does not care about this, but the HTTP standard does. Using the accept header for the expected MIME type is the standard way of doing this, so you did the right thing. No need to move it to a separate endpoint if the data is the same just the format is different.
Media types are the best way to represent this, but there is a practical aspect of this in that people will browse a rest API using root nouns... I'd put some record-count limits on it, maybe GET /bananas/export/100 to get the first 100, and GET /bananas/export/all if they really want all of them.

Concept of default headers in mule

I want to understand the concept of default header in mule.I want to hit a get api call[the code is written in java] from mule and I am sending a token in the header, but I am setting the token in the default header inside the http request configuration.
<http:default-headers >
<http:default-header key="testing" value="#[vars.authorizationHeader]" />
</http:default-headers>
Will my java code be able to read this header from attributes ?
Default headers are just ones that will always be sent across all requests referencing that configuration, so yes, your server will get that token. However, it is not a good practice to use them along with expressions as you are doing because that makes the configuration very fragile (what if there's no such var in the request flow?) and force a new configuration to be used (as the expression must be resolved each time). Default headers make sense when you want to force a static header everywhere, for tracking purposes for example. If the header will be dynamic, then it's best to configure it on each request.

REST request cannot be encoded for GET (URL too long)

Example:
URL: http://example.com/collection/a%20search%20term
Method: GET
Response: All items in collection matching a search term.
Problem: The search term may be so long that it breaks the web server's maximum
URL length.
How do I allow extremely long search terms and still stay RESTful?
For inspiration, I just looked at Google Translate's API v2, which is "using
the RESTful calling style."
Naturally, texts to be translated can be quite long. And so Google optionally
allows sending a request with POST, but with a twist:
To use POST, you must use the X-HTTP-Method-Override header to tell the
Translate API to treat the request as a GET (use X-HTTP-Method-Override:
GET).
So it is possible to semantically transform a POST request into a GET request.
(This discovery led me to add the x-http-method-override tag to my question.)
REST does not restrict POST to creation. Be careful with mapping CRUD to HTTP methods and assume that's RESTful. POST is the method used for any action that isn't standardized by HTTP.
Since the standard doesn't establish a limit for URIs, this can be considered a broken implementation, and it's ok to fix it. As long as the workaround is loosely coupled to your API, you are still RESTful. This means your API shouldn't implement a translation or override directly, but on a pre-processor of some kind that rewrites the request properly. It should be clearly documented somewhere that this is due to a broken implementation, and you expect it to eventually become obsolete.
It's a bad smell if your query can be so long that it exceeds the maximum length (de facto for browsers is 2000 characters but it can be higher for other means of accessing REST APIs).
If the user can pass in that much data, it should go in the request body/data field, not in the URL.

Content-Range header - allowed units?

This is related to:
How should I implement a COUNT verb in my RESTful web service? , Paging in a Rest Collection
and Using the HTTP Range Header with a range specifier other than bytes?
Actually I think the -1 rated anwser here is correct https://stackoverflow.com/a/1434701/1237617
Generally anwsers say that you can use custom units citing the sec 3.12
range-unit = bytes-unit | other-range-unit
bytes-unit = "bytes"
other-range-unit = token
However when you read the HTTP spec please notice the production rules are thus:
Content-Range = "Content-Range" ":" content-range-spec
content-range-spec = byte-content-range-spec
byte-content-range-spec = bytes-unit SP
byte-range-resp-spec "/"
( instance-length | "*" )
The header spec only references bytes-unit from sec 3.12, not range-units, so I think that actually it's against the spec to use custom units here.
Am I missing something or is the popular anwser wrong?
EDIT: Since this probbably isn't clear, the gist of my question is:
rfc2616 sec14.16 only references bytes-unit. It never mentions range-unit, so range-unit production is not relevant for Content-Range, and thus only byte-units can be used.
I think this adresses my concerns best, although I needed some time to understand it (plus I wanted to make sure, that there is something wrong with the wording).
This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests
thanks to elgaton
The spec, as being revised, allows custom range units. See HTTPbis Part 5, Section 2.
If you read the HTTP/1.1 RFC, section 3.12, you will see that:
The only range unit defined by HTTP/1.1 is "bytes". HTTP/1.1 implementations MAY ignore ranges specified using other units.
So, the other-range-unit token has been introduced only to make servers more "liberal" when accepting. This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests, so that servers could accept even invalid requests (they will be simply ignored) and clients would use only the universally-accepted bytes unit.
Therefore, I personally recommend to:
use only the bytes unit when acting as a client, and
accept other units (discarding the Content-Range header if they are invalid) when acting as a server.
This is a purely personal opinion, but I think it is fairly consistent with how other HTTP extensions (custom methods or headers) are used. Here is how I read it: Yes I can use custom range units and no, I shouldn't submit a bug report when it gets ignored when passing through firewalls, web proxies, and other intermediaries. I conform to the HTTP spec when I'm sending it and they conform to HTTP when they ignore it. WebDAV uses HTTP extensions correctly, IMO, but rarely works over the Internet for exactly this reason. As I said, a personal opinion only.
Apparently it's OK to use custom units, because:
This reflects the fact that, apparently, the first set of grammar
rules has been specifically made for parsing and the second one for
producing HTTP requests

Amazon S3 pre signed URLs using Amazon Java SDK and extra / characters

I've been creating Presigned HTTP PUT URLs and everything was working great until I wanted to start using "folders" in S3; I wanted the key to have the character '/'.
Now I get Signature doesn't match when I send the HTTP PUT requests due to the fact the '/' probably changes to %2F... If I escape the character before creating the presigned URL it works great, but then the Amazon console management doesn't understand it and shows it as one file instead of subfolders.
Any idea?
P.s.
The HTTP PUT requests are sent using C++ with POCO NET library.
EDIT
I'm using Poco HttpRequest from C++ to my Java web server to generate a signed url (returned on the response).
C++ then uses this url to put a file in s3 using Poco again.
The problem was that the urls returned from the web server were parsed through Poco URI objects that auto decoded the s3 object key thus changing it.With that in mind I was able to fix my problem.
Tricky - I'll try to approach this bottom up.
Disclaimer: I got carried away visually inspecting the Poco libraries instead of actually debugging a code sample, which should yield more reliable results much faster, see below ;)
Analysis
If I escape the character before creating the presigned URL it works
great, but then the Amazon console management doesn't understand it
and shows it as one file instead of subfolders.
The latter stems from S3 not having a concept of folders on the storage level actually, see e.g. section Index Documents and Folders within Index Document Support:
Objects stored in Amazon S3 are stored within a flat container, i.e.,
an Amazon S3 bucket, and it does not provide any hierarchical
organization, similar to a file system's. However, you can create a
logical hierarchy using object key names and use these names to infer
logical folders that contain these objects.
That's exactly what the AWS Management Console is doing here as well:
The AWS Management Console also supports the concept of folders, by
using the same key naming convention used in the preceding sample.
However, your test regarding the assumption of / being encoded as %2F proves, that this is indeed how Poco::Net is encoding the URL when performing the HTTP PUT request.
(I'm actually a bit surprised that the AWS Java SDK seems to generate different URLs here for / vs. %2F, insofar a recent analysis regarding Why is my S3 pre-signed request invalid when I set a response header override that contains a “+”? seems to indicate respective canonicalization by the AWS .NET SDK, see below for more on this.)
Potential Solution
In order for your scenario to work as desired, you'll need to figure out where the URL is encoded this way - I could think of two components in principle:
Poco::Net
Finding out why Poco::Net is encoding the URL different than S3 (if at all, see below) is best done by debugging your code, here's where I'd start:
Class HTTPRequest uses class URI in turn, which automatically performs a few normalizations on all URIs and URI parts passed to it, in particular percent-encoded characters are decoded. The other way round is handled by method encode(), which is where things get interesting and call for a breakpoint, see URI.cpp:
lines 575 ff. - here encode() does its magic, which indeed seems to be in place, insofar neither the code within the function nor the various chars passed in via the reserved parameter contain the offending / (see lines 47 ff. for the respective constants in use)
consequently you might want to set a breakpoint in this function and backtrace the callstack to find out which code is actually doing the encoding upfront, which might not yield an offender at all, see below.
Java => C++ transition
You haven't specified yet, which channel is actually used to communicate the pre-signed URL generated by the AWS Java SDK to C++ in turn. Given the code review (mind you, visual inspection only, I haven't debugged this myself yet) of the Poco::Net functionality yields the conclusion, that no obvious offender can be identified in the library itself, thus it seems more likely that it might already enter your C++ layer encoded (easily verified via debugging of course) - are you by chance using any kind of web service between these components for example?
Good luck!