I use the XAMPP control panel to host an Apache server, and I'm testing how to serve MHTML files from it. So far the browser only shows raw text when I open the file from the server. I looked around for a way to make it work, but the solutions I found (for example, adding "AddType message/rfc822 .mhtml .mht" to the httpd.conf file) just make the browser download the file instead of rendering it.
Here's a sample of the initial block of the mhtml file:
From: <Saved by Blink>
Snapshot-Content-Location: https://www.instagram.com/jo0sef/
Subject: =?utf-8?Q?Yousef=20AlSudais=20=D9=8A=D9=88=D8=B3=D9=81=20=D8=A7=D9=84=D8?=
=?utf-8?Q?=B3=D8=AF=D9=8A=D8=B3=20(#jo0sef)=20=E2=80=A2=20Instagram=20pho?=
=?utf-8?Q?tos=20and=20videos?=
Date: Tue, 16 Feb 2021 08:18:55 -0000
MIME-Version: 1.0
Content-Type: multipart/related;
type="text/html";
boundary="----MultipartBoundary--c1Osf7aCebmaZjjAXk0gfl7cuYp300joTDYRFPKyLF----"
------MultipartBoundary--c1Osf7aCebmaZjjAXk0gfl7cuYp300joTDYRFPKyLF----
Content-Type: text/html
Content-ID: <frame-AD05338F6D10E72FA62E6C2E3D66903E#mhtml.blink>
Content-Transfer-Encoding: quoted-printable
Content-Location: https://www.instagram.com/jo0sef/
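One possible workaround, sketched under assumptions: an .mhtml file is a MIME multipart/related message (as the sample above shows), so the HTML part can be extracted on the server and sent as ordinary text/html. The Python sketch below uses the standard email module. The function name extract_html is hypothetical, and it assumes the first text/html part is the main page. Images and stylesheets stored as other parts of the archive are not rewired, so the result may render without them.

```python
from email import message_from_bytes


def extract_html(mhtml_bytes: bytes) -> str:
    """Return the first text/html part of an MHTML archive as a string.

    Hypothetical helper: assumes the archive is a well-formed MIME message
    and that the first text/html part is the page to display.
    """
    msg = message_from_bytes(mhtml_bytes)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes quoted-printable / base64 transfer encoding
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    raise ValueError("no text/html part found in the archive")
```

The returned string can then be written to an .html file or sent by any handler with Content-Type: text/html.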
Does anyone know how to serve a web bundle so that it loads, rather than just downloading as a file?
Some disambiguation: there is a format called WebPackage (not to be confused with webpack), also called a Web Bundle. Files typically have the .wbn suffix. A bundle contains HTML and JS files and can be used to view websites offline. This is useful for e.g. archiving websites or making websites that work well with intermittent network access: download the file once, and you have all the assets you need for at least basic operation of the site.
The standard on how to serve a .wbn file is here:
https://wicg.github.io/webpackage/draft-yasskin-wpack-bundled-exchanges.html
However, when I add the required headers in the web server, the .wbn file is just downloaded. If I drag the downloaded file onto my browser (google-chrome), it is displayed as the website it contains, so unless there is some very subtle bug in there, I believe the format of the bundle is OK.
Here is a sample request:
Request URL: http://localhost/bundle/www-signed.wbn
Request Method: GET
Status Code: 200 OK
Remote Address: [::1]:80
Referrer Policy: strict-origin-when-cross-origin
and the server response:
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 4300
Content-Type: application/webbundle <-- Required by the standard
Date: Thu, 02 Sep 2021 12:00:24 GMT
ETag: "612ef7cb-10cc"
Last-Modified: Wed, 01 Sep 2021 03:47:23 GMT
Server: nginx/1.18.0 (Ubuntu)
X-Content-Type-Options: nosniff <-- required by the standard
If anyone has this working on a website or knows how to do it, I would love to have a look.
I had the same problem: the .wbn file was just downloaded instead of being rendered.
I had to enable the Web Bundles feature in Chrome, even though my Chrome version is 96+.
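For local testing, the two response headers from the question (Content-Type: application/webbundle and X-Content-Type-Options: nosniff) can be reproduced with a small Python server. This is a sketch, not a production setup. WebBundleHandler and make_server are hypothetical names, and sending these headers does not by itself make Chrome render the bundle if the feature is disabled.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class WebBundleHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .wbn files as web bundles."""

    # Extend the default extension table with the web bundle media type.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".wbn": "application/webbundle",
    }

    def end_headers(self):
        # Add the nosniff header to every response before the header block ends.
        self.send_header("X-Content-Type-Options", "nosniff")
        super().end_headers()


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) a server for `directory` on localhost."""
    handler = partial(WebBundleHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_server(".", 8000).serve_forever() serves the current directory; the response headers can then be compared with the nginx ones listed in the question.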
Some websites provide PDF files for viewing, but I can't download such PDF files with wget.
Opening the page in question in my browser displays the PDF:
https://www.lokalmatador.de/epaper/ausgabe/gemeinderundschau-muehlhausen-14-2021/
But using the following command I only get a PDF file of zero length.
wget --content-disposition -nd https://www.lokalmatador.de/epaper/ausgabe/gemeinderundschau-muehlhausen-14-2021/
I tried some combinations with saving and loading cookies and referer but nothing works.
At this point I'm just curious what is happening and why wget is not fetching anything except maybe an empty index.html.
When I looked at the server response, wget reported a content length of 0. Note that the server sends "Content-Length: Array", which is not a number:
--2021-04-17 14:59:35-- https://www.lokalmatador.de/epaper/ausgabe/gemeinderundschau-muehlhausen-14-2021/
Resolving www.lokalmatador.de (www.lokalmatador.de)... 37.202.6.70
Connecting to www.lokalmatador.de (www.lokalmatador.de)|37.202.6.70|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Sat, 17 Apr 2021 13:59:36 GMT
Server: Apache
Set-Cookie: fe_typo_user=477e8a1d2b3dd74bc5b6b408a6d74edd; expires=Mon, 17-May-2021 13:59:36 GMT; Max-Age=2592000; path=/; domain=.lokalmatador.de; httponly; samesite=lax
Upgrade: h2,h2c
Connection: Upgrade, Keep-Alive
Content-Length: Array
Cache-Control: max-age=2592000
Expires: Mon, 17 May 2021 13:59:36 GMT
X-UA-Compatible: IE=edge
X-Content-Type-Options: nosniff
Keep-Alive: timeout=5, max=100
Content-Type: application/pdf
Length: 0 [application/pdf]
Remote file exists but does not contain any link -- not retrieving.
So I looked at the manual:
https://www.gnu.org/software/wget/manual/html_node/HTTP-Options.html
And there is an option for exactly this:
‘--ignore-length’
Unfortunately, some HTTP servers (CGI programs, to be more precise) send out bogus Content-Length headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.
With this option, Wget will ignore the Content-Length header—as if it never existed.
Then the wget command started working as expected:
wget --ignore-length -O epaper.pdf https://www.lokalmatador.de/epaper/ausgabe/gemeinderundschau-muehlhausen-14-2021
Here is the output I see with --ignore-length:
--2021-04-17 14:56:19-- https://www.lokalmatador.de/epaper/ausgabe/gemeinderundschau-muehlhausen-14-2021
Resolving www.lokalmatador.de (www.lokalmatador.de)... 37.202.6.70
Connecting to www.lokalmatador.de (www.lokalmatador.de)|37.202.6.70|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: ignored [application/pdf]
Saving to: ‘epaper.pdf’
epaper.pdf [ <=> ] 4.39M 1.23MB/s in 3.6s
2021-04-17 14:56:23 (1.21 MB/s) - ‘epaper.pdf’ saved [4601842]
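The idea behind ignoring a bogus length can be sketched in Python. This is only an illustration of the principle, not wget's actual implementation; parse_content_length and read_body are hypothetical names. A header value such as "Array" is treated as missing, and the body is then read until the stream ends rather than being cut off at a declared size.

```python
from typing import BinaryIO, Optional


def parse_content_length(value: Optional[str]) -> Optional[int]:
    """Return the declared length, or None if the header is missing or not
    a plain non-negative decimal number (e.g. "Array")."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return int(value)
    return None


def read_body(stream: BinaryIO, content_length: Optional[str],
              ignore_length: bool = False) -> bytes:
    """Read a response body from `stream`.

    With a usable Content-Length, read exactly that many bytes; otherwise
    (or when ignore_length is set) read until end of stream.
    """
    length = None if ignore_length else parse_content_length(content_length)
    if length is None:
        return stream.read()
    return stream.read(length)
```

Reading until end of stream only terminates once the server closes the connection, which is the trade-off of ignoring the header.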
For the first time I've had to wrap something I'm working on as a CGI script. I'm having trouble with browsers (both Chrome and Firefox) not recognising the Content-Length header and reporting the size as "unknown" to the users.
When I test this with the Linux tool wget, it recognises the size just fine.
When I test this manually through openssl s_client -connect, the precise output from the web server is as follows:
HTTP/1.1 200 OK
Date: Sun, 30 Jul 2017 20:12:20 GMT
Server: Apache/2.4.25 (Ubuntu) mod_fcgid/2.3.9 OpenSSL/1.0.2g
Content-Disposition: attachment; filename=foo.000000000G-000000001G.foofile.txt;
Content-Length: 501959790
Vary: Accept-Encoding
Content-Type: text/plain;charset=utf-8
Can anyone suggest what is missing / badly formatted?
Cracked it eventually.
This was caused by Apache doing something unexpected. Apache compresses the output of the CGI script on the fly (sending it with Content-Encoding: gzip). This changes the size of the response, but Apache cannot know the compressed size at the time it sends the headers. The files are 1/2 GB each, so it does not buffer the gzipped content before it starts sending and therefore cannot know the final size. This means it has to switch to Transfer-Encoding: chunked, which carries no Content-Length.
One way to fix this is to set Content-Encoding: none in the header, which stops Apache from compressing the content. This does mean that 1/2 GB files take much longer to send.
Another might be to gzip the content manually in my CGI script and set Content-Encoding: gzip and Content-Length: <gzipped size>. This would require me to work out the compressed size before sending.
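The second option could look roughly like the Python sketch below. It is a sketch under assumptions: gzip_to_tempfile and send_gzipped are hypothetical names, the client is assumed to accept gzip, and Apache is assumed to pass an already-encoded response through without compressing it again. The content is compressed to a temporary file first so that the compressed size is known before any header is written.

```python
import gzip
import os
import shutil
import sys
import tempfile


def gzip_to_tempfile(src_path):
    """Compress src_path into a temporary .gz file and return its path."""
    tmp = tempfile.NamedTemporaryFile(suffix=".gz", delete=False)
    try:
        with open(src_path, "rb") as src, \
                gzip.GzipFile(fileobj=tmp, mode="wb") as gz:
            shutil.copyfileobj(src, gz)
    finally:
        tmp.close()
    return tmp.name


def send_gzipped(src_path, download_name, out=None):
    """Write CGI headers plus the gzipped body to `out` (stdout by default)."""
    if out is None:
        out = sys.stdout.buffer
    gz_path = gzip_to_tempfile(src_path)
    try:
        # The size is known now, so Content-Length can describe the gzipped body.
        size = os.path.getsize(gz_path)
        headers = (
            "Content-Type: text/plain; charset=utf-8\r\n"
            f"Content-Disposition: attachment; filename={download_name}\r\n"
            "Content-Encoding: gzip\r\n"
            f"Content-Length: {size}\r\n"
            "\r\n"
        )
        out.write(headers.encode("ascii"))
        with open(gz_path, "rb") as gz_file:
            shutil.copyfileobj(gz_file, out)
        out.flush()
    finally:
        os.remove(gz_path)
```

The cost is that nothing is sent until the whole file has been compressed, so the first byte arrives later than with on-the-fly compression.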
I can't access a specific file type on my customer's server (production).
Here are the results with cURL:
curl "http://domain.tld/fonts/glyphicons-halflings-regular.eot" -I
HTTP/1.1 200 OK
Date: Tue, 28 Jul 2015 12:06:23 GMT
Server: Apache/2.2.15 (Red Hat)
Last-Modified: Tue, 19 May 2015 15:32:20 GMT
ETag: "14023-4f42-516710421e900"
Accept-Ranges: bytes
Content-Length: 20290
Connection: close
Content-Type: application/vnd.ms-fontobject
So the file exists.
But when I try to get the file content:
curl "http://domain.tld/fonts/glyphicons-halflings-regular.eot"
curl: (56) Recv failure: Connection was reset
I can't (yet) access the customer server, so I'm trying to guess what's wrong here.
What is working so far:
curl "https://domain.tld/fonts/glyphicons-halflings-regular.eot" --insecure
It is working in HTTPS, even if there is no certificate (which is why I use --insecure). I get the file content.
The customer can get the file if he accesses the file from a local URL.
I can access all other files on the server, even in the fonts directory.
I can't access any .eot file, even in other directories.
So I think it is one of these two problems:
- Apache configuration / .htaccess problem.
- Proxy / reverse proxy problem.
What do you think about it?
What other tests should I run?
What information should I ask the customer for?
Thanks.
Ok, here is the cause:
The customer's firewall blocks .eot file content because of this advisory:
"Vulnerability in Embedded Web Fonts Could Allow Remote Code Execution"
http://www.checkpoint.com/defense/advisories/public/2006/cpai-2006-010.html
As the .eot files are used by IE8 and lower, and those browser versions are not required by the customer, I've simply removed all references to .eot files.
Another solution would be to ask the customer's firewall admins to add an exception, as the severity is low.
I'm failing a PCI compliance scan because my Railo server reveals the path to the document web root in an "exception-message" header when a missing page is requested. I tried using both the built-in Railo 404 template and my own custom 404 template, to no avail. Is there any way to remove this header from the response?
$ curl -I http://mydomain.com/this-page-does-not-exist.html
HTTP/1.1 200 OK
Date: Wed, 08 Jan 2014 22:46:20 GMT
Server: Apache-Coyote/1.1
exception-message: Page /this-page-does-not-exist.html [/var/www/html/this-page-does-not-exist.html] not found
Content-Type: text/html;charset=UTF-8
Content-Length: 44
Set-Cookie: CFID=31254774-4b81-470d-b0da-dfadd4585ce0;Path=/;Expires=Fri, 08-Jan-2044 06:37:50 GMT
Set-Cookie: CFTOKEN=0;Path=/;Expires=Fri, 08-Jan-2044 06:37:50 GMT
Connection: close
Update: I was able to fix this problem by overwriting the header.
I created a custom 404 template and then set the Missing Template Error (404) option to point at it in the Railo administrator. Then I added this line of code to the top of the page, which seems to overwrite the header with a blank string.
<cfset getPageContext().getResponse().setHeader("exception-message","")>
Note: Using the tag <cfheader> to do the same thing does not work. I'm not sure why but the Java route seems to do the trick.