Background Intelligent Transfer Service and Amazon S3 - http-headers

I'm using SharpBITS to download a file from Amazon S3.
> // Create a new download job.
> BitsJob job = this._bitsManager.CreateJob(jobName, JobType.Download);
> // Add file to job.
> job.AddFile(downloadFile.RemoteUrl, downloadFile.LocalDestination);
> // Resume
> job.Resume();
It works for files which do not need authentication. However, as soon as I add the authentication query string to the Amazon S3 file request, the server responds with HTTP status 403 Forbidden. The URL works fine in a browser.
Here is the HTTP request from the BITS service:
HEAD /mybucket/6a66aeba-0acf-11df-aff6-7d44dc82f95a-000001/5809b987-0f65-11df-9942-f2c504c2c389/v10/summary.doc?AWSAccessKeyId=AAAAZ5SQ76RPQQAAAAA&Expires=1265489615&Signature=VboaRsOCMWWO7VparK3Z0SWE%2FiQ%3D HTTP/1.1
Accept: */*
Accept-Encoding: identity
User-Agent: Microsoft BITS/7.5
Connection: Keep-Alive
Host: s3.amazonaws.com
The only difference from the web browser's request is the request method: Firefox makes a GET request, while BITS makes a HEAD request. Are there any issues with Amazon S3 HEAD requests and query-string authentication?
Regards, Blaz

You are probably right that a proxy is the only way around this. BITS uses the HEAD request to get a content length and decide whether or not it wants to chunk the file download. It then does the GET request to actually retrieve the file - sometimes as a whole if the file is small enough, otherwise with range headers.
If you can use a proxy or some other trick to give it any kind of response to the HEAD request, it should get unstuck. Even if the HEAD request is faked with a fictitious content length, BITS will move on to a GET. You may see duplicate GET requests in a case like this, because if the first GET request returns a content length longer than the original HEAD request, BITS may decide "oh crap, I better chunk this after all."
Given that, I'm kind of surprised it's not smart enough to recover from a 403 error on the HEAD request and still move on to the GET. What is the actual behaviour of the job? Have you tried watching it with bitsadmin /monitor? If the job is sitting in a transient error state, it may do that for around 20 mins and then ultimately recover.

Before beginning a download, BITS sends an HTTP HEAD request to the server in order to figure out the remote file's size, timestamp, etc. This is especially important for BranchCache-based BITS transfers and is the reason why server-side HTTP HEAD support is listed as an HTTP requirement for BITS downloads.
That being said, BITS bypasses the HTTP HEAD request phase, issuing an HTTP GET request right away, if either of the following conditions is true:
1. The BITS job is configured with the BITS_JOB_PROPERTY_DYNAMIC_CONTENT flag.
2. BranchCache is disabled AND the BITS job contains a single file.
Workaround (1) is the most appropriate, since it doesn't affect other BITS transfers in the system.
For workaround (2), BranchCache can be disabled through BITS's DisableBranchCache group policy. You'll need to run "gpupdate" from an elevated command prompt after making any Group Policy changes; otherwise it will take ~90 minutes for the changes to take effect.

Related

POST with bodies larger than 64k to Express.js failing to process

I'm attempting to POST some JSON to an Express.js endpoint. If the size of the JSON is less than 64k, it succeeds just fine. If it exceeds 64k, the request is never completely received by the server. The problem only occurs when running Express directly locally; when running on Heroku, the request proceeds without issue.
The problem is seen across macOS, Linux (Ubuntu 19), and Windows, and it is present when using Chrome, Firefox, or Safari.
When I make requests using postman, the request fails.
If I make the request using curl, the request succeeds.
If I make the request after artificially throttling chrome to "slow 3G" levels in network settings, the request succeeds.
I've traced through Express and discovered that the problem appears when attempting to parse the body. The request gets passed to body-parser.json(), which in turn calls getRawBody to get the Buffer from the request.
getRawBody processes the incoming request stream and converts it into a buffer. It receives the first chunk of the request just fine, but never receives the second chunk. Eventually the request continues parsing with an empty buffer.
The size limit on body-parser is set to 100mb, so that is not the problem. getRawBody never returns, so body-parser never gets a crack at it.
Logging the events from getRawBody, I can see the first chunk come in, but no other events are fired.
Watching Wireshark logs, all the data is getting sent over the wire, but for some reason Express is not receiving all the chunks. I think it has to be due to how Express is processing the packets, but I have no idea how to proceed.
On the off chance anyone in the future runs into the same thing: the root problem in this case was that we were overwriting req.socket with our socket.io client. req.socket is used internally by Node to transfer data. We were overwriting it in such a way that the first packets would get through, but not subsequent packets. So if the request was processed quickly enough, all was well.
tl;dr: Don't overwrite req.socket.
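The pitfall can be illustrated in miniature (the `appSocket` property name is hypothetical, chosen only to show the safe alternative):

```javascript
// req.socket is the underlying net.Socket that Node's HTTP machinery
// reads request data from; replacing it breaks delivery of later
// body chunks, exactly as described above.

// The bug, in miniature: clobbers the stream Node depends on.
function attachBad(req, ioClient) {
  req.socket = ioClient;
}

// Safer: store your reference under your own property instead.
function attachGood(req, ioClient) {
  req.appSocket = ioClient; // req.socket stays intact
}
```

With attachGood, Node keeps reading body chunks from the original socket while application code still has a handle on the socket.io client.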

Cache an Express REST route on browser side

Is there a way to tell the browser that it should cache the response body (JSON) for minutes/hours/days?
I want to reduce server requests on a specific route where the content changes very rarely (probably once a week). This will reduce traffic on the client side as well.
I've tried:
res.set('Cache-Control', 'public, max-age=6000');
res.set('Expires', new Date(Date.now() + 60000).toUTCString());
res.set('Pragma', 'cache');
but Chrome ignores that (or maybe my approach is wrong). I'm clueless, and Google hasn't helped yet.
Final result should be like (Chrome Network Tab):
First Client request: Status Code 200 OK
Second Client request: Status Code 200 OK (from disk cache)
etc.
After the time expires, Status Code 200 OK again (WITHOUT "from disk cache"), the way it works for static files (images).
I can only find material on server-side caching, but that won't reduce GET requests to the backend.

Seemingly legit requests generating "400 bad request"

So I've got a problem where a small percentage of incoming requests are resulting in "400 bad request" errors and I could really use some input. At first I thought they were just caused by malicious spiders, scrapers, etc. but they seem to be legitimate requests.
I'm running Apache 2.2.15 and mod_perl2.
The first thing I did was turn on mod_logio and interestingly enough, for every request where this happens the request headers are between 8000-9000 bytes, whereas with most requests it's under 1000. Hmm.
There are a lot of cookies being set, and it's happening across all browsers and operating systems, so I assumed it had to be related to bad or "corrupted" cookies somehow - but it's not.
I added \"%{Cookie}i\" to my LogFormat directive hoping that would provide some clues, but as it turns out half the time the 400 error is returned the client doesn't even have a cookie. Darn.
Next I fired up mod_log_forensic hoping to be able to see ALL the request headers, but as luck would have it nothing is logged when it happens. I guess Apache is returning the 400 error before the forensic module gets to do its logging?
By the way, when this happens I see this in the error log:
request failed: error reading the headers
To me this says Apache doesn't like something about the raw incoming request, rather than a problem with our rewriting, etc. Or am I misunderstanding the error?
I'm at a loss where to go from here. Is there some other way that I can easily see all the request headers? I feel like that's the only thing that will possibly provide a clue as to what's going on.
We set a lot of cookies and it turns out we just needed to bump up LimitRequestFieldSize which defaults to 8190. Hope this helps someone else some day...
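In config terms, the fix above looks something like this (the value is illustrative, not a recommendation):

```apache
# httpd.conf (illustrative): raise the per-header-line limit above the
# 8190-byte default so large Cookie headers aren't rejected with a 400.
LimitRequestFieldSize 16380
```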

black screen and error 400 bad request

I was trying to telnet into a web server and send a multiline request message. I have to include an If-Modified-Since header in the request message. I'm doing this on Windows 7.
For instance, when I type telnet edition.cnn.com 80 at my command prompt, it opens a black empty screen; I don't see anything that I type.
Then I typed GET pageName HTTP/1.0 on the black screen. It returned a 400 Bad Request error and said the connection was closed. What should I do? (I used GET pageName as an example.)
If you want to use e.g. a telnet client to manually fetch web pages, you have to remember the format of an HTTP request:
GET pageName HTTP/1.0
additional header
additional header
Note that the last line is an empty line, and that the request path must start with a slash (e.g. GET /index.html HTTP/1.0). You might also need HTTP/1.1 for certain headers to make sense. Please read the HTTP specification for more information and to see which headers are standard.
The "black screen" is simply the telnet program running in a command window.
400 is an HTTP error code, meaning you did successfully send a message to the server; it was just invalid HTTP (probably a simple typo).
The black screen and not seeing what you type is 'normal' behavior for telnet (at least I always had that).
If you use a tool like PuTTY, you can see what you type, so it becomes easier to do this sort of thing and to spot your typo.
See the tutorial here: http://www.hellboundhackers.org/articles/571-spoofing-http-requests-with-putty.html
Hope this helps you.
If you need anything more than basic interaction with the web server, I'd suggest using a tool specifically made for the job, for example cURL. It will allow you to set headers etc.:
curl -H "If-Modified-Since: Sun, 04 Nov 2012 11:59:00 GMT" http://host.com/21838937.asp

HTTP headers from some clients have characters randomly replaced

I'm doing web traffic and log analysis, and there are a lot of malformed headers being passed by clients. These have characters transposed or replaced with "x"s.
Does anyone know where they come from or why?
Is this some kind of attempt at security, or something more nefarious?
Examples:
xroxy-connection: Keep-Alive
cneoction: close
nncoection: close
ocpry-connection: Keep-Alive
pxyro-connection: close
proxy-~~~~~~~~~~: ~~~~~~~~~~
x-xorwarded-for: 000.00.00.000
Referer: http://www.example.xom/nxws/article/2009-1x-21/?cid=4xxx00x2-0x60x3x0
Check out the "Missed Cneonctions" section of Fun With HTTP Headers. The author thinks:
> I now believe this is something done by a hackish hardware load balancer trying to "remove" the connection close header when proxying for an internal server. That way, the connection can be held open and images can be transmitted through the same TCP connection, while the backend web server doesn't need to be modified at all.
A Google search for "xroxy-connection" turns up a security advisory on Kerio Winroute Firewall which replaces the first character in a header with an X for some reason.
The letter transposition is probably a similar proxying issue, if I had to take a guess.