Nginx + Tornado ( + curl): Inflate gzipped POST request - file-upload

I have set up a server (well... two servers, but I don't think that is too relevant for this question) running Tornado (version 2.4.1) and being proxied by Nginx (version 1.4.4).
I need to periodically upload json (basically text) files to one of them through a POST request. These files would greatly benefit from gzip compression (I get compression ratios of 90% when I compress the files manually) but I don't know how to inflate them in a nice way.
Ideally, Nginx would inflate it and pass it clean and neat to Tornado... but that's not what's happening now, as you'll have probably guessed, otherwise I wouldn't be asking this question :-)
These are the relevant parts of my nginx.conf file (or the parts that I think are relevant, because I'm pretty new to Nginx and Tornado):
user borrajax;
worker_processes 1;
pid /tmp/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    access_log /tmp/access.log main;
    error_log /tmp/error.log;

    # Basic Settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    gzip on;
    gzip_disable "msie6";
    gzip_types application/json text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript image/x-icon image/bmp;
    gzip_http_version 1.1;
    gzip_proxied expired no-cache no-store private auth;

    upstream web {
        server 127.0.0.1:8000;
    }
    upstream input {
        server 127.0.0.1:8200;
    }

    server {
        listen 80 default_server;
        server_name localhost;
        location / {
            proxy_set_header Host $http_host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_pass http://web;
        }
    }
    server {
        listen 81 default_server;
        server_name input.localhost;
        location / {
            proxy_set_header Host $http_host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_pass http://input;
        }
    }
}
As I mentioned before, there are two Tornado servers. The main one is running on localhost:8000 for the web pages and that kind of stuff. The one running on localhost:8200 is the one intended to receive those json files. This setup is working fine, except for the Gzip part.
I'd like Nginx to inflate the gzipped requests that come to localhost:81 and forward them (inflated) to the Tornado server running on localhost:8200.
With the configuration as it is, the data reaches Tornado, but the body is still compressed, and Tornado throws an exception:
[E 140108 15:33:42 input:1085] Uncaught exception POST
/input/log?ts=1389213222 (127.0.0.1)
HTTPRequest(
protocol='http', host='192.168.0.140:81',
method='POST', uri='/input/log?&ts=1389213222',
version='HTTP/1.0', remote_ip='127.0.0.1', body='\x1f\x8b\x08\x00\x00',
headers={'Content-Length': '1325', 'Accept-Encoding': 'deflate, gzip',
'Content-Encoding': 'gzip', 'Host': '192.168.0.140:81', 'Accept': '*/*',
'User-Agent': 'curl/7.23.1 libcurl/7.23.1 OpenSSL/1.0.1c zlib/1.2.7',
'Connection': 'close', 'X-Real-Ip': '192.168.0.94',
'Content-Type': 'application/json'}
)
I understand I can always get the request's body within the post() Tornado handler and inflate it manually, but that just sounds... dirty.
Finally, this is the curl call I use to upload the gzipped file:
curl --max-time 60 --silent --location --insecure \
     --write-out "%{http_code}" --request POST \
     --compressed \
     --header "Content-Encoding:gzip" \
     --header "Content-Type:application/json" \
     --data-binary "@$log_file_path.gz" \
     "/input/log?ts=1389216192" \
     --output /dev/null \
     --trace-ascii "/tmp/curl_trace.log" \
     --connect-timeout 30
The file in $log_file_path.gz is generated using gzip $log_file_path (I mean... it's a regular Gzip-compressed file).
Is this doable? It sounds like something that should be pretty straightforward, but nope...
If this is not doable through Nginx, an automated method in Tornado would work too; something more reliable and elegant than having me decompress files in the middle of a POST request's handler. Something like Django middlewares, maybe?
Thank you in advance!!

You're already calling json.loads() somewhere (Tornado doesn't decode JSON for you, so the exception you're seeing, but did not quote, must be coming from your own code). Why not just replace that call with a method that examines the Content-Encoding and Content-Type headers and decodes appropriately?
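A minimal sketch of that idea (the handler name here is made up for illustration; only Tornado's RequestHandler and the Python standard library are assumed):

import json
import zlib

import tornado.web

class LogHandler(tornado.web.RequestHandler):
    def post(self):
        body = self.request.body
        if self.request.headers.get('Content-Encoding') == 'gzip':
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
            body = zlib.decompress(body, 16 + zlib.MAX_WBITS)
        data = json.loads(body)
        # ... handle the decoded payload ...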

I gave up trying to have Nginx or Tornado automatically inflate the body of the POST request, so I went with what Ben Darnell mentioned in his answer: I compress the file using gzip and POST it as part of a form (pretty much as if I were uploading a file).
I'm going to post the bits of code that take care of it, just in case this helps someone else.
In the client (a bash script using curl):
The absolute path to the file to send is in the variable f. The variable TMP_DIR points to /tmp/, and SCRIPT_NAME contains the name of the bash script performing the upload (namely uploader.sh).
zip_f_path="$TMP_DIR/$(basename ${f}).gz"
[[ -f "${zip_f_path}" ]] && rm -f "${zip_f_path}" &>/dev/null
gzip -c "$f" 1> "${zip_f_path}"
if [ $? -eq 0 ] && [[ -s "${zip_f_path}" ]]
then
    response=$(curl --max-time 60 --silent --location --insecure \
        --write-out "%{http_code}" --request POST \
        "${url}" \
        --output /dev/null \
        --trace-ascii "${TMP_DIR}/${SCRIPT_NAME}_trace.log" \
        --connect-timeout 30 \
        --form "data=@${zip_f_path};type=application/x-gzip")
else
    echo "Attempt to compress $f into $zip_f_path failed"
fi
In the server (in the Tornado handler):
# GzipDecompressor comes from tornado.util in recent Tornado releases:
#     from tornado.util import GzipDecompressor
try:
    content_type = self.request.files['data'][0]['content_type']
    if content_type == 'application/x-gzip':
        gzip_decompressor = GzipDecompressor()
        file_body = gzip_decompressor.decompress(
            self.request.files['data'][0]['body'])
        file_body += gzip_decompressor.flush()
    else:
        file_body = self.request.files['data'][0]['body']
except Exception:
    self.send_error(400)
    logging.error('Failed to interpret data: %s',
                  self.request.files['data'])
    return

Related

TLS 1.3 early data where to put $ssl_early_data

I've added ssl_early_data on; to my nginx.conf (inside http { }), and according to these commands,
echo -e "HEAD / HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n" > request.txt
openssl s_client -connect example.tld:443 -tls1_3 -sess_out session.pem -ign_eof < request.txt
openssl s_client -connect example.tld:443 -tls1_3 -sess_in session.pem -early_data request.txt
it does work properly.
According to the nginx documentation (https://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_early_data), it is recommended to set proxy_set_header Early-Data $ssl_early_data;.
My question is: Where do I set this? Right after ssl_early_data on;, still inside http { }?
You should pass Early-Data to your application, so you must have something like:
http {
    ...
    # Enabling 0-RTT
    ssl_early_data on;
    ...
    server {
        ...
        # Passing it to the upstream
        proxy_set_header Early-Data $ssl_early_data;
    }
}
Otherwise, you may render your application vulnerable to replay attacks: https://blog.trailofbits.com/2019/03/25/what-application-developers-need-to-know-about-tls-early-data-0rtt/
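On the application side, the upstream can then use that header to refuse non-idempotent requests sent as early data. A minimal sketch of the idea, assuming a Flask upstream (the hook below is illustrative, not part of the nginx setup):

from flask import Flask, request

app = Flask(__name__)

@app.before_request
def reject_replayable_early_data():
    # Nginx sets "Early-Data: 1" while the 0-RTT data is still replayable.
    if request.headers.get('Early-Data') == '1' \
            and request.method not in ('GET', 'HEAD'):
        # 425 Too Early asks the client to retry after the full handshake.
        return 'Too Early', 425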

cURL vs MIME: POSTing a file

Question
I have written a very simple API using Flask, and I would like to upload a file to it using a POST command. I can easily make it work using cURL, but not so much using a Logic App.
I have been using the Mozilla MIME Guide trying to construct the HTTP call, but I am not sure what to use in the header and body.
What I know is:
I would like to be able to send any file type, so I think I have to use the following:
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="filename.xxx"
I have my file encoded with Base64, so I need to indicate that somehow and place it in the body
I would like to use chunking. Does this make any difference?
My API
from flask import Flask, request, redirect

app = Flask(__name__)

@app.route('/', methods=['POST'])
def print_hello():
    if request.files:
        request.files['file'].save("/home/ebbemonster/cool_file.txt")
        return "Hello World"
    return "Goodbye World"

if __name__ == "__main__":
    app.run(host='0.0.0.0')
cURL
curl -X POST 13.81.62.87:5000 -F file=@GH019654.MP4
Logic App
So I figured out how to convert a cURL POST to an HTTP header/body.
Fetching header/body details
# Logging post call
curl -X POST XX.XX.XX.XX:5000 -F file=@GH019654.MP4 --trace-ascii post.log
# Fetching header
head -n 30 post.log
=> Send header, 216 bytes (0xd8)
0000: POST / HTTP/1.1
0011: Host: XX.XX.XX.XX:5000
0029: User-Agent: curl/7.58.0
0042: Accept: */*
004f: Content-Length: 300745456
006a: Content-Type: multipart/form-data; boundary=--------------------
00aa: ----ec1aab65fb2d68fd
00c0: Expect: 100-continue
# Fetching body
sed -n '18,25p' post.log
0000: --------------------------ec1aab65fb2d68fd
002c: Content-Disposition: form-data; name="file"; filename="GH019654.
006c: MP4"
0072: Content-Type: application/octet-stream
009a:
009c: ....ftypmp41 ...mp41....mdatGPRO#...HD7.01.01.70.00LAJ9022436601
00dc: 517...................................1...US.8'.f..C328132710684
011c: 1.HERO7 Black........................E.....1...US.8'.f..........
# Fetching end of body
tail -n 30 post.log
02c2: --------------------------ec1aab65fb2d68fd--
== Info: HTTP 1.0, assume close after body
Logic App Header/Body
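For comparison, the same multipart request can be reproduced outside curl. A minimal sketch using Python's requests library (the URL and filename are the placeholders from the question):

import requests

# Open the file in binary mode; requests builds the multipart/form-data
# body and boundary, mirroring what "curl -F file=@..." does.
with open('GH019654.MP4', 'rb') as f:
    resp = requests.post(
        'http://XX.XX.XX.XX:5000/',
        files={'file': ('GH019654.MP4', f, 'application/octet-stream')},
    )
print(resp.status_code, resp.text)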

Akka Http Segment in the route path is not working as expected

I have an akka-http app running using the code here, and one of my routes has segments in it.
When I tested the REST path with a segment, I got the error below.
Request
curl -i -X POST \
-H "Content-Type:application/json" \
-d \
'{"tickets":2}' \
'http://localhost:5000/events/RHCP/tickets'
Response
HTTP/1.1 404 Not Found
Content-Length: 83
Content-Type: text/plain; charset=UTF-8
Date: Tue, 02 Jan 2018 11:59:38 GMT
Server: GoTicks.com REST API
The requested resource could not be found but may be available again in the future.
Is there any configuration missing, or is it a bug?
I think pathPrefix in eventsRoute matches the request first and then finds no POST directive, so the request is rejected.
Try this: change
def routes: Route = eventRoute ~ eventsRoute ~ ticketsRoute
to
def routes: Route = ticketsRoute ~ eventsRoute ~ eventRoute

My CORS rule doesn't fix my CORS error

I have some CORS rules on my S3 bucket.
This is what it looks like:
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>https://prod-myapp.herokuapp.com/</AllowedOrigin>
        <AllowedMethod>POST</AllowedMethod>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
    <CORSRule>
        <AllowedOrigin>http://prod-myapp.herokuapp.com/</AllowedOrigin>
        <AllowedMethod>POST</AllowedMethod>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>
When I am in my app and I try to upload a file (i.e., do a POST request) from my JS console, I get this error:
XMLHttpRequest cannot load https://myapp.s3.amazonaws.com/. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://prod-myapp.herokuapp.com' is therefore not allowed access. The response had HTTP status code 403.
I attempted to do a POST from my CLI and I got this:
$ curl -v -H "Origin: http://prod-myapp.herokuapp.com" -X POST https://myapp.s3.amazonaws.com
* Rebuilt URL to: https://myapp.s3.amazonaws.com/
* Trying XX.XXX.XX.153...
* Connected to myapp.s3.amazonaws.com (XX.XXX.XX.153) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
* Server certificate: *.s3.amazonaws.com
* Server certificate: VeriSign Class 3 Secure Server CA - G3
* Server certificate: VeriSign Class 3 Public Primary Certification Authority - G5
> POST / HTTP/1.1
> Host: myapp.s3.amazonaws.com
> User-Agent: curl/7.43.0
> Accept: */*
> Origin: http://prod-myapp.herokuapp.com
>
< HTTP/1.1 412 Precondition Failed
< x-amz-request-id: SOME_ID
< x-amz-id-2: SOME_ID_2
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Thu, 17 Sep 2015 04:43:28 GMT
< Server: AmazonS3
<
<?xml version="1.0" encoding="UTF-8"?>
* Connection #0 to host myapp.s3.amazonaws.com left intact
<Error><Code>PreconditionFailed</Code><Message>At least one of the pre-conditions you specified did not hold</Message><Condition>Bucket POST must be of the enclosure-type multipart/form-data</Condition><RequestId>SOME_ID</RequestId><HostId>SOME_HOST_ID</HostId></Error>
I added the CORS rule that applies to the domain I am trying from only about 10-15 minutes ago, but I was under the impression that it should take effect immediately.
Is there some remote cache that I need to bust to get my browser to work? I tried it both in normal mode and in Incognito Mode.
Also, based on the results from curl, it seems as if I am no longer getting an Access-Control-Allow-Origin header error, right? So, theoretically, it should be working in my browser.
Am I misreading what is happening at the command-line?
What else am I missing?
This is a slightly different solution; here is what I have done.
I set up a policy in S3 that allows putting content to the bucket only when the request's referer matches a specific domain:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::myapp/*",
            "Condition": {
                "StringLike": {
                    "aws:Referer": "http://prod-myapp.herokuapp.com/*"
                }
            }
        }
    ]
}
so you can test the PUT method with:
curl -v -H "Referer: http://prod-myapp.herokuapp.com/index.php" -H "Content-Length: 0" -X PUT https://myapp.s3.amazonaws.com/testobject.jpg
The error message from curl says:
At least one of the pre-conditions you specified did not hold
Bucket POST must be of the enclosure-type multipart/form-data
You can make curl use the content type multipart/form-data by using the -F option (e.g. "-F name=value"). You can use it multiple times to add all of the form parameters you need. This page lists the parameters expected by S3:
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html
Specifying "file" and "key" gets you to the point where it fails with an "Access Denied" error. I assume you've set the bucket to be private, so you probably need the "access-key-id" or similar to get beyond this point.
curl -v -H "Origin: http://prod-myapp.herokuapp.com" -X POST \
https://myapp.s3.amazonaws.com -F key=wibble -F file=value
Also, based on the results from curl, it seems as if I am no longer getting an Access-Control-Allow-Origin header error, right? So, theoretically, it should be working in my browser.
It actually seems to make no difference whether you specify the -H Origin option, so I'm not sure whether your CORS setting is having any effect.
Check which requests you send to the server: before the actual POST request, an OPTIONS (preflight) request may be sent (Chrome does this).
I got the Precondition Failed error for CORS because only the POST method was allowed; allowing the OPTIONS method resolved the problem.
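To see what that preflight looks like, you can simulate it outside the browser. A small sketch using Python's requests library (the bucket and origin are the ones from the question):

import requests

# Simulate the CORS preflight Chrome sends before the actual POST:
# an OPTIONS request carrying the Origin and the intended method.
resp = requests.options(
    'https://myapp.s3.amazonaws.com/',
    headers={
        'Origin': 'http://prod-myapp.herokuapp.com',
        'Access-Control-Request-Method': 'POST',
    },
)
print(resp.status_code)
print(resp.headers.get('Access-Control-Allow-Origin'))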

tclhttpd.3.5.1 shows page source (Windows)

I am playing with the tclhttpd web server and found a strange error.
I start tclhttpd on the default port 8015.
I open Firefox and navigate to http://localhost:8015.
I see the source of my index.html file instead of the web page.
index.html is simple:
<html>
<head>
<title>TEST</title>
</head>
<body>
<H1>TEST HEADER</H1>
</body>
</html>
Any ideas?
I have checked with curl:
* About to connect() to localhost port 8015 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8015 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.21.3 (i386-pc-win32) libcurl/7.21.3
OpenSSL/0.9.8q zlib/1.2.5
> Host: localhost:8015
> Accept: */*
Server Response
HTTP/1.1 200 Data follows
Date: Thu, 12 Apr 2012 14:16:47 GMT
Server: Tcl-Webserver/3.5.1 May 27, 2004
Content-Type: text/plain
Content-Length: 130
Last-Modified: Thu, 12 Apr 2012 14:14:30 GMT
So, tclhttpd returns text/plain instead of text/html
Linux case
I tried to check what would happen under Linux.
As tclhttpd is wrapped in a kit, I ran the same test under Linux.
It looks like everything works fine.
curl -G -v localhost:8015
* About to connect() to localhost port 8015 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8015 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.21.7 (i686-pc-linux-gnu) libcurl/7.21.7
OpenSSL/1.0.0d zlib/1.2.5 libssh2/1.2.7
> Host: localhost:8015
> Accept: */*
Server response
HTTP/1.1 200 Data follows
Date: Thu, 12 Apr 2012 17:25:29 GMT
Server: Tcl-Webserver/3.5.1 May 27, 2004
Content-Type: text/html
Content-Length: 125
Last-Modified: Thu, 12 Apr 2012 17:14:04 GMT
Deep research
I have modified some of the source files to dump more information:
proc Mtype {path} {
    global MimeType
    set ext [string tolower [file extension $path]]
    Stderr "Mtype: path $path ext $ext"
    if {[info exist MimeType($ext)]} {
        Stderr "MimeType($ext) exists."
        Stderr "Print MimeType "
        set lst [lsort [array names MimeType]]
        foreach {i} $lst {
            Stderr " $i $MimeType($i)"
        }
        return $MimeType($ext)
    } else {
        Stderr "Mimetype not found. ext $ext"
        Stderr "Print MimeType "
        set lst [lsort [array names MimeType]]
        foreach {i} $lst {
            Stderr " $i $MimeType($i)"
        }
        return text/plain
    }
}
When I query http://localhost:8015, I get the following output:
Linux
Mtype: path /home/a2/src/tcl/tcl_www/doc/index.html ext .html
MimeType(.html) exists.
Print MimeType
text/plain
.ai application/postscript
.aif audio/x-aiff
.aifc audio/x-aiff
....
.hqx application/mac-binhex40
.htm text/html
.html text/html
.i86pc application/octet-stream
...
Default cmd Doc_text/html
Windows
Look for Tcl proc whos name match the MIME Content-Type
Mtype: path M:/apr/tcl_www/doc/index.html ext .html
Mimetype not found. ext .html
Print MimeType
.man application/x-doctool
Mtype M:/apr/tcl_www/doc/index.html returns Doc_text/plain
So it looks like there is trouble with reading mime.types.
You have to inspect the traffic tclhttpd generates to see whether it really says, in the HTTP headers of its response, that the payload type is "text/html".
Use Fiddler, sockspy, Microsoft Network Monitor or Wireshark.
There are also lighter-weight debugging tools for browsers: I'm pretty sure Firebug would show you this information, and even the simple Live HTTP Headers extension can do that.
IE also has a debugging addon (akin to Firebug) which I'm too lazy to google for.
Problem found.
httpdthread.tcl
# Core modules
package require httpd           ;# Protocol stack
package require httpd::version  ;# Version number
package require httpd::url      ;# URL dispatching
package require httpd::mtype    ;# Mime types

# Search for mime.types either right in Config(lib), or down
# one level in the installed tclhttpd subdirectory
foreach path [list \
        [file join $Config(lib) mime.types] \
        [glob -nocomplain [file join $Config(lib) tclhttpd* mime.types]] \
] {
    if {[llength $path] > 0} {
        set path [lindex $path 0]
    }
    if {[file exists $path]} {
        Mtype_ReadTypes $path
        break
    }
}
This code checks for the mime.types file under the following paths:
- /home/a2/..../tclhttpd3.5.1.kit/bin/../lib
- /home/a2/..../tclhttpd3.5.1.kit/bin/../lib/tclhttpd*/mime.types
Linux
glob -nocomplain /home/..../tclhttpd3.5.1.kit/bin/../lib/tclhttpd*/mime.types
works fine and returns
/home/....tclhttpd3.5.1.kit/bin/../lib/tclhttpd3.5.1/mime.types
Windows
glob -nocomplain /home/..../tclhttpd3.5.1.kit/bin/../lib/tclhttpd*/mime.types
fails.
I have tried different masks:
tclhttpd*
tclhttpd*.*
tclhttpd*..
None of them work.
Finally I have modified the code:
foreach path [list \
        [file join $Config(lib) mime.types] \
        [glob -nocomplain [file join $Config(lib) tclhttpd* mime.types]] \
        [file join $Config(lib) [lindex [Httpd_Version] 0] mime.types] \
] {
    if {[llength $path] > 0} {
        set path [lindex $path 0]
    }
    if {[file exists $path]} {
        Mtype_ReadTypes $path
        break
    }
}
The string
[file join $Config(lib) [lindex [Httpd_Version] 0] mime.types]
generates the path
/home/..../tclhttpd3.5.1.kit/bin/../lib/tclhttpd3.5.1/mime.types
Now tclhttpd can find mime.types under Windows.
And it looks like this problem only happens when glob searches inside the starkit file.
I have checked with a fresh tclkitsh and tclhttpd:
tclkitsh-8.5.9-win32.upx.exe ( http://code.google.com/p/tclkit/downloads/list )
tclhttpd3.5.1.kit
Everything works.
If I use my "old" version of tclkitsh-win32.upx.exe,
I receive text/plain instead of text/html.
So it looks like there is a bug in my old wrapped interpreter that leads to the problem with not reading mime.types.
I think tclhttpd automatically uses text/html if the file ends with .html. You probably should read this wiki entry on wiki.tcl.tk about mime-type.
I tried it myself with an index.html and it worked. Then I created an index.tml and it worked too.
[html::description "Test"]
[Doc_Dynamic]
[html::head "hello"]
<body>
test
</body>
</html>
Here is the header part of the response:
HTTP/1.1 200 Data follows
Content-Length: 137
Date: Thu, 12 Apr 2012 16:47:53 GMT
Server: Tcl-Webserver/3.5.1 May 27, 2004
Connection: Close
Content-Type: text/html
The reason for curl getting text/plain instead of text/html might be that it passes */* in the Accept header of its HTTP requests, while browsers typically place some more elaborate construct there; for instance, my FF 11.0 uses text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8.
One might notice that in FF's case text/html (along with the XHTML and XML types) is assigned a higher preference (0.9) and everything else (*/*) a lower preference (0.8).
A conformant HTTP server should attempt to serve the requested resource in the format indicated as preferred in the client's request.
That might also shed some light on the original IE vs FF behavioral difference.
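One way to test that hypothesis is to repeat the request with different Accept headers and compare the Content-Type the server returns. A quick sketch using Python's requests library (same localhost:8015 server as in the question):

import requests

# Compare how the server labels the same resource for different Accept headers.
for accept in ('*/*',
               'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'):
    resp = requests.get('http://localhost:8015/', headers={'Accept': accept})
    print(accept, '->', resp.headers.get('Content-Type'))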