How does Varnish 3.0 deal with gzip?

Varnish will hold a compressed object in its cache, but what happens when a client doesn't support gzip?
How does Varnish deal with that? Does it also hold a separate uncompressed object in the cache, or does it decompress the compressed object?

Varnish 3.0 supports gzip, as described in the "Compression" chapter of the official tutorial. All HTTP requests to the backend will advertise support for gzipped content, so by default all objects are stored in the cache gzipped.
If the backend does not support gzip, you can ask Varnish to compress the response before storing it by setting beresp.do_gzip to true in vcl_fetch.
If a request comes in from a client that does not support gzip, Varnish will gunzip the stored object before delivering it.
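A minimal sketch of that vcl_fetch logic (Varnish 3.x syntax; the content-type filter is just an illustrative choice):

sub vcl_fetch {
    # Ask Varnish to gzip text responses that the backend delivered uncompressed.
    if (beresp.http.content-type ~ "text") {
        set beresp.do_gzip = true;
    }
}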

Varnish 2.x does not compress or decompress: if the client supports gzip, Varnish holds a gzipped version of the page in the cache. If the client does not support it, another copy is placed in the cache for the plain content, without compression.
So yes: depending on the Accept-Encoding header (which should be normalized, as sketched below), multiple versions of a page will be held in the cache, one for each supported compression algorithm.
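For Varnish 2.x, a typical normalization sketch in vcl_recv (close to the one in the Varnish documentation) keeps the number of cached variants down:

sub vcl_recv {
    # Normalize Accept-Encoding so the cache stores at most one compressed
    # and one uncompressed variant of each page.
    if (req.http.Accept-Encoding) {
        if (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            remove req.http.Accept-Encoding;
        }
    }
}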
Update: edited for clarity. The above applies to Varnish 2.0 and 2.1 only; Varnish 3.x supports gzip, as explained above.

Related

How to test if your REST API service supports gzip data?

I am testing a REST API service that supports JSON and gzip-compressed data. I have tested the response to a JSON request payload. How can I test whether it handles gzip data properly? Please help.
Basically, the client sends a request with Accept-Encoding: gzip in the header, the service handles the compression, and the client takes care of the decompression. I need a way to confirm that the service indeed handles compressed gzip data.
gzip is basically a header + deflate + a checksum.
Gatling will retain the original Content-Encoding response header, so you can check whether the payload was gzipped, and then trust the gzip codec to verify the checksum and throw an error if the payload was malformed.
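Outside of Gatling, a quick manual check with curl works too (the URL is a placeholder):

curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' https://api.example.com/resource

If the dumped response headers include Content-Encoding: gzip, the service compressed the payload; re-running with --compressed makes curl decompress the body, which also verifies the gzip checksum.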

CkRest.AddHeader function does not add a header using Chilkat C++ ("Content-MD5" header using fullRequestBinary PUT)

We are using Chilkat 9.5.0.80 C++ library.
There is a certain HTTP header we cannot add to our requests: "Content-MD5". When we add this header like this:
m_ckRest.AddHeader("Content-MD5", "any-value-here");
and examine the resulting request*, the "Content-MD5" header is NOT present.
However, when we add a header of a different name:
m_ckRest.AddHeader("Content-Type", "application/octet-stream");
... the resulting request DOES contain that header. We are using the "fullRequestBinary" method, for example:
const char* responseStrPtr = m_ckRest.fullRequestBinary( "PUT", encodedObjectName.c_str(), ckByteDataBuffer);
* We are examining our requests with a proxy (using Fiddler as an HTTP proxy between us and Amazon S3, for example, to test the upload of a "part" in a multipart AWS S3 upload), and in every attempt the "Content-MD5" header is NOT present, while other headers are.
Is this a bug? We found an old forum post from 2013 referencing a very similar-sounding problem: http://www.chilkatforum.com/questions/2901/addheader-range-does-not-appear-to-be-effective Does Chilkat remove or ignore our attempt to add a "Content-MD5" header? Is this bug fixed in a version newer than the one we are using? Is there a workaround? Here is an example of the headers in a PUT request:
PUT https://our-bucket.s3.us-west-1.amazonaws.com/somefile?partNumber=4&uploadId=tJJYIXdxG_7X8elzSJrKt32A_rH46Y0Yk1vyzZgwxpvmK5uCrcE82k_F9UmytVHWuxXfc6tX5o3w.SRnnYcD7VBskcLrr0xC13bHHVDx62iGGQ3eIzkv5J5d1F4_DkcW HTTP/1.1
Content-Length: 5266235
x-amz-date: 20200921T201943Z
x-amz-content-sha256: 90fa8fc564dd558d0c2eac92e367d94101f4ca9570c970795b9fdb2aa96d6666
Host: our-bucket.s3.us-west-1.amazonaws.com
Content-Type: application/octet-stream
Date: Mon, 21 Sep 2020 20:19:43 GMT
Authorization: AWS4-HMAC-SHA256 Credential=AKIAIBYS55OSD2FIOBFUS/20200921/us-west-1/s3/aws4_request,SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date,Signature=8ea74cb7769d8e158e5ccc0604cc2cdb096703b10c3c8d9323d0746debbdUUU
In correspondence with Chilkat support, it turns out that Chilkat versions 9.5.0.80 and 9.5.0.83 intentionally remove the Content-MD5 header when authenticating with AWS Signature V4. Instead, Chilkat calculates the SHA-256 hash and places it in x-amz-content-sha256 (when authenticating with the older AWS Signature V2, it calculates Content-MD5 instead, I'm told). So, unlike the comment from #Chilkat Software, this had not been fixed in a later version as of the writing of that comment, and the removal is intentional.
This is not terrible, but it stems from the misunderstanding that a SHA-256 hash of the content is necessary to construct a valid AWS Signature V4, when in fact it is not. While SHA-256 is perfectly adequate for content verification, it is also wasteful compared to MD5 for that purpose.
The AWS C++ SDK itself does not put a SHA-256 hash in the x-amz-content-sha256 header when uploading a part. I've confirmed it sends x-amz-content-sha256: UNSIGNED-PAYLOAD and instead uses the less "costly" MD5 hash, placing it in the Content-MD5 header (see the AWS documentation here: https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html):
Unsigned payload option – You include the literal string
UNSIGNED-PAYLOAD when constructing a canonical request, and set the
same value as the x-amz-content-sha256 header value when sending the
request to Amazon S3
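For reference, the Content-MD5 value is just the base64-encoded 16-byte MD5 digest of the request body (per RFC 1864). A quick way to compute it for a part file on the command line (the filename is a placeholder):

openssl dgst -md5 -binary part-0001.bin | base64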
Here is an example of an Amazon AWS UploadPart request using Content-MD5 for content verification, and NOT using SHA-256 for signing the request (captured from a request made with the AWS SDK for C++):
PUT https://mybucket.s3.us-west-1.amazonaws.com/somefile.mfs01?partNumber=1&uploadId=6CHL6tPKFcRSoxD4iysjKMgQCNfcFAt87bn4fsduV1YI5_aFIz9e36BxFURH_iEX8EChUtQm06qT9oyIUDbAnA.2M.novpBBKsnGl_NqNvVllQ7L1VK6x1PiLlqq46tH HTTP/1.1
Cache-Control: no-cache
Connection: Keep-Alive
Pragma: no-cache
Content-Type: binary/octet-stream
Content-MD5: PV204S0m8zJY8zu9Q3EF+w==
Accept: */*
Authorization: AWS4-HMAC-SHA256 Credential=AKIAIBYS55OSD2FOBFUSC/20200923/us-west-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=d013028d77e45f3dcce5f46f3fb53cdeeb3c9cfbd931371e69a9925047e61cd3
Host: nuix-nov-dev.s3.us-west-1.amazonaws.com
User-Agent: aws-sdk-cpp/1.7.333 Windows/10.0.19041.329 x86 MSVC/1927
amz-sdk-invocation-id: E57D09A7-B5E7-4E2A-8B2D-B493147F06D7
amz-sdk-request: attempt=1
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20200923T212738Z
Content-Length: 5242880
Chilkat gave us a new "beta" build that allows us to specify the Content-MD5 header even with AWS Signature V4, and it won't remove it. However, it is added in addition to the automatically calculated x-amz-content-sha256, so the content is hashed twice unnecessarily; it would be better to be able to specify UNSIGNED-PAYLOAD for the purposes of the AWS signature.
If there is a content mismatch error with the Content-MD5 value, AWS returns this (with a status 400):
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>InvalidDigest</Code>
<Message>The Content-MD5 you specified was invalid.</Message>
<Content-MD5>thisisbad</Content-MD5>
<RequestId>8274DC9566D4AAA8</RequestId>
<HostId>H6kSy4cl+54nMon1Hq6AGjmTX/MfTVMQQr8vEVNXUnPlfMtIt8HPdObfusckhBpwpG/CJ6ORWv16c=</HostId>
</Error>
If there is a content mismatch with x-amz-content-sha256, AWS returns the following error, which I had difficulty finding on the web and which is slightly different, so I'm pasting it here (also status 400):
Status:400 : AWSCode: XAmzContentSHA256Mismatch : AWSMessage: The provided 'x-amz-content-sha256' header does not match what was computed.
This problem should have already been fixed in a later version of Chilkat.

Terrible Apache Bench results on Custom CMS

Please note: this is not a complaint about a shoddy CMS.
I was just toying with Apache Bench and got terrible results with our custom CMS; more exactly, I got:
Requests per second: 0.37 [#/sec] (mean)
When i run another test with a plain php file i got:
Requests per second: 4786.07 [#/sec] (mean)
Another test with a previous version of the CMS:
Requests per second: 6068.66 [#/sec] (mean)
The website(s) work fine, with no problems detected, and Google's Webmaster Tools reports our sites as faster than 80% of pages, which is fine, I think.
The test was:
ab -t 30 -c 10 http://example.com/
Maybe some kind of Apache problem? A bad .htaccess config, or something similar?
Update:
I just ran a simple test with sockets and the results are similar: the page loads very, very slowly. If I run my script against another website, everything is fine.
Also, there's a small hint of a chunk-length problem (bad Apache headers, or line endings?).
The site is gzipped, and with verbose logging turned on, I see these lines in the response:
LOG: Response code = 200
LOG: header received:
HTTP/1.1 200 OK
Date: Tue, 04 Oct 2011 13:10:49 GMT
Server: Apache
Set-Cookie: PHPSESSID=ibnfoqir9fee2koirfl5mhm633; path=/
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Cache-Control: post-check=0, pre-check=0
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
2ef6
This appears always at the same place, in the middle of the HTML source, and then <!DOCTYPE HTML> starts again. (2ef6 is a hexadecimal chunk size from the chunked transfer encoding: 12,022 bytes.)
Please help.
Update #2:
Just checked my HTTP headers with Rex Swain's HTTP Viewer and got these results:
HTTP/1.1·200·OK(CR)(LF)
Date:·Wed,·05·Oct·2011·08:33:51·GMT(CR)(LF)
Server:·Apache(CR)(LF)
Set-Cookie:·PHPSESSID=n88g3qcvv9p6irm1fo0qfse8m2;·path=/(CR)(LF)
Expires:·Sat,·26·Jul·1997·05:00:00·GMT(CR)(LF)
Cache-Control:·no-store,·no-cache,·must-revalidate(CR)(LF)
Pragma:·no-cache(CR)(LF)
Cache-Control:·post-check=0,·pre-check=0(CR)(LF)
Vary:·Accept-Encoding(CR)(LF)
Connection:·close(CR)(LF)
Transfer-Encoding:·chunked(CR)(LF)
Content-Type:·text/html;·charset=UTF-8(CR)(LF)
(CR)(LF)
Do you notice anything unusual?
If it works well with ordinary web browsers (as you mentioned in the comments), then the CMS handles requests from Apache Benchmark differently.
A quick checklist:
AFAIK Apache Benchmark just sends simple requests without any cookie handling, so try setting -C with a valid cookie (copy the value from a web browser); see the example after this list.
Try to send exactly the same headers to the CMS as the web browser sends. Save a dump of a valid request with netcat, HttpFox, or a packet sniffer, and set the missing headers with -H.
Profile the CMS on the server while you're sending it a request with Apache Benchmark. Maybe you'll find the bottleneck. Two poor man's error_log calls with a timestamp in the first and last lines of index.php (or the tested script's entry point) can show how fast the PHP script is and help calculate the overhead of the Apache HTTP Server and the network.
If you run the socket tests and browser tests from different machines, it could be a DNS issue (turn off HostnameLookups in Apache). Try to run them from the same machine.
Try ab -k ... or ab -H "Connection: close" ....
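For example, reusing the session cookie from the header dump above (the value is a placeholder; copy a fresh one from your browser):

ab -t 30 -c 10 -C "PHPSESSID=n88g3qcvv9p6irm1fo0qfse8m2" -H "Accept-Encoding: gzip,deflate" http://example.com/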
I guess the CMS does some costly initialization when it creates a session, and that happens while it processes the first request. Since Apache Benchmark does not send cookies back, the CMS creates a new session for every request, and that is the cause of the slow answers.
A second guess is that the CMS handles the incoming HTTP headers differently, and the headers sent (or not sent) by Apache Benchmark trigger some costly or slow processing. That looks more plausible given the Google Webmaster Tools report.
Apache Benchmark sends an HTTP/1.0 request, for example:
GET / HTTP/1.0
Host: localhost:9100
User-Agent: ApacheBench/2.3
Accept: */*
It looks to me like your server does not send any header about its Keep-Alive settings, but assumes the client uses keep-alive even though the client speaks HTTP/1.0. That is not RFC-compliant behaviour:
From RFC 2616, 19.6.2 Compatibility with HTTP/1.0 Persistent Connections:
Some clients and servers might wish to be compatible with some
previous implementations of persistent connections in HTTP/1.0
clients and servers. Persistent connections in HTTP/1.0 are
explicitly negotiated as they are not the default behavior.
By default Apache Benchmark doesn't use keep-alive, so once the response has arrived it waits for the server to close the socket. The server closes it after 15 seconds of idle time. Downloading the main page with wget also takes 15 seconds; wget uses HTTP/1.0 in its request as well.
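You can see the stall directly with a timed download (the URL is a placeholder):

time wget -q -O /dev/null http://example.com/

If the command takes roughly 15 seconds even though the content arrives almost immediately, the client is just waiting for the server's idle timeout to close the connection.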
I think it's a bug in the PHP code of the CMS, since ab works well on the same server with a plain PHP file. Anyway, you can work around it by using keep-alive connections (-k):
ab -k -t 30 -c 10 http://example.com/
or by explicitly disabling persistent connections:
ab -H "Connection: close" -t 30 -c 10 http://example.com/
but it is still a server-side issue, and your original ab command was right.
Please note that this bug probably affects only HTTP/1.0 clients (like Apache Benchmark and wget); users with regular browsers will not notice it.

Enable GlassFish Compression

How do I enable GlassFish compression? I enabled compression in the http-listener properties, but the response did not change.
Log in to the admin console: localhost:4848
Go to Network Config > Network Listeners
Select the listener for which you want to enable gzip > HTTP tab
Check whether you have a minimum compression size set. I don't remember off the top of my head whether GlassFish has a default minimum compression size; essentially, if the resource does not exceed this size, it won't be compressed.
Check that you have the correct compressableMimeType set. application/xml is not the same as text/xml, even though they're both really XML.
Responses to HTTP/1.0 requests are not compressed; you must send your requests over HTTP/1.1 to get gzipped responses from your GlassFish server.
Moreover, you must include the header "Accept-Encoding: gzip" in your HTTP requests.
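The same settings can be applied with asadmin; a sketch, assuming the default listener name http-listener-1 (the dotted paths may differ slightly between GlassFish versions):

asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.compression=on
asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.compressable-mime-type=text/html,text/xml,application/json
asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.compression-min-size-bytes=2048

To verify (placeholder URL), curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://localhost:8080/app/ should then show Content-Encoding: gzip in the response headers.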

How to get JMeter to request gzipped content?

My website serves gzipped content; I verified this with Firebug and YSlow. However, JMeter does not request gzipped content, so it receives everything uncompressed. As a result, my test cases take much longer (6-10x longer) than they do in reality.
How can I make JMeter request gzipped content from a website?
FYI, I am using the latest stable build: JMeter 2.3.4 r785646.
Add an HTTP Header Manager to the Thread Group in your Test Plan.
Add the name-value pair:
Name: Accept-Encoding
Value: gzip,deflate,sdch
This will ensure that all JMeter requests ask the server for HTTP compression.
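In the saved test plan (.jmx), that Header Manager appears roughly as follows (a sketch; exact attributes vary by JMeter version):

<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
  <collectionProp name="HeaderManager.headers">
    <elementProp name="" elementType="Header">
      <stringProp name="Header.name">Accept-Encoding</stringProp>
      <stringProp name="Header.value">gzip,deflate,sdch</stringProp>
    </elementProp>
  </collectionProp>
</HeaderManager>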
To verify:
Add the View Results Tree listener to the Thread Group.
Run your test plan.
View the "Sampler result" tab for one of the web pages.
Do you see these name-value pairs?
Content-Encoding: gzip
Vary: Accept-Encoding
Transfer-Encoding: chunked
If yes, then you've successfully set up gzip requests in JMeter. Congrats.
Another way to verify is in the Summary Report stats:
You'll see that the Avg. Bytes values are the uncompressed sizes. That's OK; for whatever reason, that's how JMeter reports them. Pay attention to the KB/sec column instead: it will show an improvement of 6-10x with gzip enabled.