HTTP4s increase max upload size - http4s

I am toying with http4s multipart file upload, which I got working. However, the multipart parsing throws an exception for file uploads bigger than ~500 KB.
The error on the client side, thrown while parsing the multipart body, is HTTP 422: The request body was invalid.
The error on the server side is "Part not terminated properly".
Since this is obviously related to the size of uploaded file, I suspect there must be a config in http4s to allow larger uploads?
Thanks in advance!

Can you try changing the Content-Length in the headers?
E.g. content-length: 3495 or Content-Length: 3495, depending on the size of your content.
Ref:
https://github.com/http4s/http4s/blob/4b928e0dc0ba6edbdbe7461204663e13a7013f8c/blaze-server/src/main/scala/org/http4s/blaze/server/Http2NodeStage.scala#L129
As far as I can see, this method, getBody, is called with a len param:
https://github.com/http4s/http4s/blob/4b928e0dc0ba6edbdbe7461204663e13a7013f8c/blaze-server/src/main/scala/org/http4s/blaze/server/Http2NodeStage.scala#L108
What I uncovered is that you can pass the content-length header to allow that size:
https://github.com/http4s/blaze/blob/3d1b15eace96740507daac9c9e75f978bbd2e524/http/src/main/scala/org/http4s/blaze/http/HeaderNames.scala#L26
ref for content length header:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Length
Hope it works.

There is a header called Content-Length that changes the maximum size that can be uploaded.
This is the simple syntax:
Content-Length: <length>
The length parameter is simply the number of bytes allowed as the maximum size.
Some examples include:
Content-Length: 6553
Content-Length: 54138
So, you can set the maximum size here.
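For reference, the HTTP specification defines Content-Length as the size in bytes of the body that is actually sent, so a client normally derives it from the payload rather than choosing it freely. A minimal sketch of that derivation follows; the helper name build_post_request, the host, and the path are hypothetical and are not part of http4s.

```python
def build_post_request(host: str, path: str, body: bytes) -> bytes:
    """Serialize a minimal HTTP/1.1 POST request.

    Content-Length is computed from the body, so it always matches
    the number of bytes that follow the blank line.
    """
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Hypothetical host and path, with a 6553-byte payload as in the example above.
request = build_post_request("example.invalid", "/upload", b"x" * 6553)
```

If the declared value and the real body size disagree, the receiver will either wait for bytes that never arrive or cut the body short, which is worth ruling out when debugging upload failures.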
To inspect this header in browsers, follow the steps below.
Click Inspect Element in your browser.
Click on the Network tab.
Check the request header.
Find the header Content-Length in there.
If you want browser compatibility, these browsers support it:
Google Chrome (and all Chromium-based browsers)
Firefox
Opera
Safari
Microsoft Internet Explorer
This is how you can change the maximum file upload size with http4s.

Related

Request to API getting truncated

I have an ASP.NET API application running on Windows Server 2019 Datacenter, and some of the requests for an API get truncated. This is happening to only one of the APIs as far as I know, because it's the only one that uploads big chunks of data. All others are tiny requests.
This API takes JSON in the body, which contains an issue ID and a serialized image. The largest request size is 20 MB, so it's not too big, and most of the time the requests go through fine, but sometimes the request gets chopped off at the end: sometimes a little, sometimes a lot. Also, there is a pattern: when a request is truncated, the API returns a 500 code and the client device tries to call the API again, and often it will continue to be truncated, always in different places.
I have good visibility into this because I use a logger module that writes every request to a text file allowing me to see exactly what hits the server.
I know the device is sending a well-formatted request because it would get an exception if it created badly formatted JSON.
Lastly, this is using SSL.
This is what a typical request looks like:
HEADERS
Content-Length : 7154364
Content-Type : text/plain; charset=utf-8
Accept-Encoding : gzip
Host : cmtafr-dev.nwis.net
User-Agent : Dart/2.17 (dart:io)
deviceid : xxxxxxx
appversion : 2.21.10
cap2.0_tokenkey : xxxxx
osversion : 15.6.1
devicebrand : IOS
devicemodel : iPhone13,4
PATH
/api/IssueController/ExecuteIssue_UploadImageV2/7acd5643-c112-4f74-9dfa-1d558ee3ae69
BODY
{"Isu_Id":"9EC539F2-ABDD-49C5-934A-FAD6371B3E9C","ImageData":"/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAA... bla bla bla"}
Additional info:
The screenshot below shows 2 requests attempting to upload the same image. Both failed as they were truncated. Notice the file sizes are different, meaning they left the device in good shape.
However, the header on both requests shows the same Content-Length: 16449758
HEADERS
Connection : keep-alive
Content-Length : **16449758**
Content-Type : application/json
Accept : */*
How can I possibly troubleshoot this? I have spent hours searching and not found anything similar to this. Most posts regarding …truncated requests/responses are application specific where a vendor replies.
Thank you for any help you can offer.

is there a size limit to individual fields in HTTP POST?

I have an API for a file upload that expects a multipart form submission. But I have a customer writing a client and his system can't properly generate a multipart/form-data request. He's asking that I modify my API to accept the file in an application/x-www-form-urlencoded request, with the filename in one key/value pair and the contents of the file, base64 encoded, in another key/value pair.
In principle I can easily do this (though I need a shower afterwards), but I'm worried about size limits. The files we expect in Production will be fairly large: 5-10MB, sometimes up to 20MB. I can't find anything that tells me about length limitations on individual key/value pair data inside a form POST, either in specs (I've looked at, among others, the HTTP spec and the Forms spec) or in a specific implementation (my API runs on a Java application server, Jetty, with an Apache HTTP server in front of it).
What is the technical and practical limit for an individual value in a key/value pair in a form POST?
There are artificial, configurable limits present on the HttpConfiguration class, both for the maximum number of keys and for the maximum size of the request body content.
In practical terms, this is a really bad idea.
You'll have a String, which uses 2-bytes per character for the Base64 data.
And you have the typical 33% overhead just being Base64.
They'll also have to UTF-8 urlencode the Base64 string for various special characters (such as "+", which has meaning in Base64 but is a space " " in urlencoded form, so they'll need to encode that "+" as "%2B").
So for a 20MB file you'll have ...
20,971,520 bytes of raw data, represented as 27,892,122 characters in raw Base64, using (on average) 29,286,728 characters when urlencoded, which will use 58,573,455 bytes of memory in its String form.
The decoding process on Jetty will take the incoming raw urlencoded bytes and allocate 2x that size in a String before decoding the urlencoded form. So that's a 58,573,456 length java.lang.String (that uses 117,146,912 bytes of heap memory for the String, and don't forget the 29MB of bytebuffer data being held too!) just to decode that Base64 binary file as a value in a x-www-form-urlencoded String form.
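The size arithmetic above can be checked with a short sketch. It is written in Python purely for illustration (it is not Jetty code), and the helper names are hypothetical. Note that the figures quoted above use the rounded 33% overhead; the exact padded Base64 length for 20,971,520 bytes is 27,962,028 characters, and the urlencoded length depends on how many "+", "/" and "=" characters the data happens to produce.

```python
import base64
import os
from urllib.parse import quote


def base64_length(n_bytes: int) -> int:
    """Exact length of padded Base64 output: 4 characters per 3 input bytes."""
    return 4 * ((n_bytes + 2) // 3)


def urlencoded_length(encoded: str) -> int:
    """Length after percent-encoding every reserved character.

    '+', '/' and '=' each grow from 1 to 3 characters (%2B, %2F, %3D).
    """
    return len(quote(encoded, safe=""))


# Small random sample so the sketch runs quickly; real uploads are far larger.
sample = os.urandom(3000)
b64 = base64.b64encode(sample).decode("ascii")

raw_size = 20 * 1024 * 1024
print("raw bytes:          ", raw_size)
print("Base64 characters:  ", base64_length(raw_size))
print("sample Base64 chars:", len(b64))
print("sample urlencoded:  ", urlencoded_length(b64))
```

Whatever the exact numbers, the conclusion of the answer is unchanged: each encoding layer adds size before the server has even begun decoding.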
I would push back and force them to use multipart/form-data properly. There are tons of good libraries to generate that form-data properly.
If they are using Java, tell them to use the httpmime library from the Apache HttpComponents project (they don't have to have/use/install Apache HttpClient to use httpmime; it's a standalone library).
Alternative Approach
There's nothing saying you have to use application/x-www-form-urlencoded or multipart/form-data.
Offer a raw upload option via application/octet-stream
They use POST, and MUST include the following valid request headers ...
Connection: close
Content-Type: application/octet-stream
Content-Length: <whatever_size_the_content_is>
Connection: close indicates when the HTTP exchange is complete.
Content-Type: application/octet-stream means Jetty will not process that content as request parameters and will not apply charset translations to it.
Content-Length is required to ensure that the entire file is sent/received.
Then just stream the raw binary bytes to you.
This is just for the file contents. If you have other information that needs to be passed in (such as the filename), consider using either query parameters or a custom request header (e.g. X-Filename: secretsauce.doc).
On your servlet, you just use HttpServletRequest.getInputStream() to obtain those bytes, and you use the Content-Length value to verify that you received the entire file.
Optionally, you can make them provide a SHA1 hash in the request headers, like X-Sha1Sum: bed0213d7b167aa9c1734a236f798659395e4e19 which you then use on your side to verify that the entire file was sent/received properly.
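The two checks described above (byte count against Content-Length, then the optional SHA1 header) can be sketched as follows. This is a language-neutral illustration in Python rather than servlet code, and the function name verify_upload is hypothetical.

```python
import hashlib
from typing import Optional


def verify_upload(body: bytes,
                  declared_length: int,
                  declared_sha1: Optional[str] = None) -> bool:
    """Return True only if the received body matches what the client declared.

    declared_length comes from the Content-Length request header.
    declared_sha1 is the optional hex digest from a custom header such as
    X-Sha1Sum; when it is absent, only the length is checked.
    """
    if len(body) != declared_length:
        return False  # truncated or over-long body
    if declared_sha1 is not None:
        actual = hashlib.sha1(body).hexdigest()
        return actual == declared_sha1.strip().lower()
    return True


payload = b"raw file bytes"
digest = hashlib.sha1(payload).hexdigest()
print(verify_upload(payload, len(payload), digest))      # complete upload
print(verify_upload(payload[:-3], len(payload), digest))  # truncated upload
```

The length check is cheap and catches truncation; the digest additionally catches corruption where the byte count happens to be right.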

Getting file contents with Range header returns Partial Content and subsequent request returns no data

We have a puzzling issue with getting file contents using the OneDrive API.
When we request file contents with Range header:
GET /blahblah/foobar.docx HTTP/1.1
Host: qw122q-ch3301.files.1drv.com
Accept: */*
Accept-Encoding: deflate, gzip
Range: bytes=0-77270
OneDrive returns:
HTTP/1.1 206 Partial Content
Cache-Control: no-cache
Content-Length: 18325
We checked that the file size is correct on the OneDrive server using the web interface. OneDrive usually returns the full requested content, but since last week it has been returning partial content. That would be OK if we could get the remaining parts with further API calls.
But when we send another request with Range header:
Range: bytes=18325-77270
OneDrive returns no data:
HTTP/1.1 206 Partial Content
Cache-Control: no-cache
Content-Length: 0
Has anyone experienced this issue? I can't find any clues about it in the OneDrive developer documentation. Please shed some light on this.
Actually I have a theory so I'm going to take a shot at an answer. There are actually two different issues that are resulting in this confusing behavior, so I'll tackle each one separately.
Reported file size doesn't match content size
This is an unfortunate quirk of the system that is being tracked with this GitHub issue. Ryan explains in more detail here.
Range downloads of word docs do not correctly handle unsatisfiable ranges
When a range outside of the actual file size is requested we should be failing with a 416 Requested Range Not Satisfiable like we do for "normal" files. But that's obviously not working. You can see in the Content-Range of the result there's something screwy going on:
FileSize: 15 bytes
Range Requested: bytes=15-
Content-Range Response: bytes=15-14/15
The value of the Content-Range obviously makes no sense.
Together these two issues should result in the weird behavior you're seeing. We're close to resolving the first, while the second was unknown (at least to me) so I've opened a new GitHub issue to track it.
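To make the expected behavior concrete, here is a minimal sketch of how a server might resolve a single byte range against a known file size. It is an illustration only, not OneDrive's implementation, and the function names are hypothetical. A start offset at or beyond the file size yields no valid range, which is the case that should produce a 416.

```python
from typing import Optional, Tuple


def resolve_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve 'bytes=start-end' against a file of `size` bytes.

    Returns inclusive (start, end) offsets, or None when the range is
    unsatisfiable (the server should then answer 416).
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("this sketch handles a single bytes range only")
    first, _, last = spec.strip().partition("-")
    if first == "":  # suffix form, e.g. bytes=-500 (last 500 bytes)
        count = int(last)
        if count == 0 or size == 0:
            return None
        return (max(size - count, 0), size - 1)
    start = int(first)
    if start >= size:
        return None  # e.g. bytes=15- on a 15-byte file
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        raise ValueError("last byte position precedes first")
    return (start, end)


def content_range(start: int, end: int, size: int) -> str:
    """Format the Content-Range value for a 206 response."""
    return f"bytes {start}-{end}/{size}"


print(resolve_range("bytes=15-", 15))  # unsatisfiable
```

With the values from the question, a file that really holds 18325 bytes would resolve bytes=0-77270 to offsets 0 through 18324, and a follow-up request starting at 18325 would be unsatisfiable.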

Do any browsers support trailers sent in chunked encoding responses?

HTTP/1.1 specifies that a response sent as Transfer-Encoding: chunked can include optional trailers (i.e. what would normally be sent as headers, but for whatever reason can't be calculated before the content, so they are appended to the end), for example:
Request:
GET /trailers.html HTTP/1.1
TE: chunked, trailers
Response:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Trailer: My-Test-Trailer
D\r\n
All your base\r\n
B\r\n
 are belong\r\n
6\r\n
 to us\r\n
0\r\n
My-Test-Trailer: something\r\n
\r\n
This request specifies in the TE header that it's expecting a chunked response, and will be looking for trailers after the final chunk.
The response specifies in the Trailer header the list of trailers it will be sending (in this case, just one: My-Test-Trailer)
Each of the chunks is sent as:
size of chunk in hex (D = 13), followed by a CRLF
chunk data (All your base), followed by a CRLF
A zero size chunk (0\r\n) indicates the end of the body.
Then the trailer(s) are specified (My-Test-Trailer: something\r\n), followed by a final CRLF.
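The framing described above can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: the function names are hypothetical, chunk extensions are ignored, and no size limits are enforced. Note that the chunk sizes count every byte, including the leading spaces in the second and third chunks.

```python
from typing import Dict, Iterable, Tuple


def encode_chunked(chunks: Iterable[bytes], trailers: Dict[str, str]) -> bytes:
    """Frame chunks as a chunked body followed by trailer fields."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-size chunk is reserved for the terminator
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n"  # last chunk marks the end of the body
    for name, value in trailers.items():
        out += f"{name}: {value}\r\n".encode("ascii")
    out += b"\r\n"  # final CRLF ends the trailer section
    return bytes(out)


def decode_chunked(data: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Reassemble the body and collect any trailers that follow it."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol].split(b";")[0], 16)  # hex size, extensions dropped
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    trailers: Dict[str, str] = {}
    for line in data[pos:].split(b"\r\n"):
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii")] = value.strip().decode("ascii")
    return bytes(body), trailers


wire = encode_chunked(
    [b"All your base", b" are belong", b" to us"],
    {"My-Test-Trailer": "something"},
)
body, trailers = decode_chunked(wire)
```

A client that supports trailers has to keep reading after the zero-size chunk, as decode_chunked does; a client that stops at the terminator will reconstruct the body correctly and never see the trailer.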
Now, from everything I've read so far, trailers are rarely (if ever) used. Most discussions here and elsewhere concerning trailers typically start with "but why do you want to use trailers anyway?".
Putting aside the question of why, out of curiosity I've been trying to simulate a HTTP request/response exchange that uses trailers; but so far I have not yet been able to get it to work, and I'm not sure if it's something wrong with response I'm generating, or whether (as some have suggested) there are simply no clients that look for trailing headers.
Clients I've tried include: curl, wfetch, Chrome + jQuery.
In all cases, the client receives and correctly reconstructs the chunked response (All your base are belong to us), and I can see in the response headers that Trailer: My-Test-Trailer is being sent, but I'm not seeing My-Test-Trailer: something returned either in the response headers or anywhere else.
It's unclear whether a trailing header like this should appear in the client as a normal response header, after the entire response has been received and the connection closed?
Interestingly, the curl change logs appear to suggest that curl does support optional trailers, and that curl will process any trailers it finds into the normal header stream.
So does anybody know:
of a valid URL that I could ping, which sends trailers in a chunked response? (so that I can confirm whether it's just my test response that's not working); and
which clients are known to support (and access/display) trailers sent by the server?
No common browsers support HTTP/1.1 trailers. Look at the column "Headers in trailer" in the "Network" tab of browserscope.
Chrome: No, and won't fix (bug). Supports H/2 trailers (bug).
Firefox: No, and I don't see a ticket in bugzilla for it. Appears to support H/2.
IE: No
Edge: No
Safari: No
Opera: Old versions only (v10 - 12, removed in 14)
As you've discovered, a number of non-browser clients support it.
Over 5 years since asking this question, I can now definitively answer it myself.
Mozilla just announced that they will be supporting the new Server-Timing field as a HTTP trailing header (their first ever support for trailers).
https://bugzilla.mozilla.org/show_bug.cgi?id=1413999
However, more importantly, they confirm that it will be whitelisted so that Server-Timing is the only supported value (emphasis mine):
Server-Timing is an HTTP trailer, not a header. :mcmanus tells me we currently parse trailers, but then silently throw them away. We don't want to change that behavior in general (we don't want to encourage trailers), so we'll want to whitelist the Server-Timing trailer, store it somewhere (probably even just a mServerTiming header will work for now, since it's the only trailer we support) and then make it available via some new channel.getTrailers() call.
So I guess that confirms it once and for all: trailing headers are not supported (and never likely to be in a general sense) by Moz, and presumably the same stance is taken by all other browser vendors.
Since this commit, the Jodd HTTP Java client supports trailer headers.
On the first question, I haven't yet found any live response that uses them ;)
As of May 2022, all browsers support the Trailer response header: https://caniuse.com/mdn-http_headers_trailer.
Library support:
Node.js
Jodd
Just recently, according to caniuse.com, it is claimed that most major browser providers now support the Trailer response header in their latest versions (e.g. Firefox-88, Safari-14.1, Chrome-88, etc).
However, it seems that this is only due to their support of Server-Timing (which can use trailers, as mentioned in another answer), and there does not currently seem to be a general way to access a trailer from browser JavaScript. It's currently an open issue for the Fetch API, and the bug request for trailers in Chrome is marked wontfix.

411 Length required error with App Proxies

I set up an app proxy for my app deployed on a normal shared hosted server (not Heroku or anything like that). It works like a charm (as do my other apps) until I set the content type to application/liquid.
As soon as I do that I get a 411 Length Required error from nginx, which (my guess) is generated by my server. I tried to resolve it by setting the content length to 0. It worked for a while but then it stopped. I tried other values and it works depending on its mood. Funnily, sometimes the output is truncated at the content length, and at other times I get the whole output (a simple page refresh can give different outputs). Also, sometimes it doesn't work AT ALL and Shopify throws a "we're having tech. difficulties" error.
To summarize, content length is not reliable at all.
Now I am not sure exactly what causes a 411 error or what I can do about it, or why it is thrown only when the content type is liquid. Moreover, content-length doesn't result in a consistent output (no output/predictable output/truncated output/Shopify error).
Does anyone know what's up?
Perhaps your responses are using chunked transfer encoding. I don't think nginx supports this by default, so would return a 411 error in this case because chunked encoding doesn't use a Content-Length header.
If you do want to use chunked responses, there is the http://wiki.nginx.org/HttpChunkinModule module that should add support for this. Otherwise, disable the chunked encoding in your app, and make sure the Content-Length header is consistent with the length of the body of the response.