Server-side auto-minify? - apache

Is there any way to minify static content automatically and then serve it from a cache, similar to how mod_compress/mod_deflate work? Preferably something I could use in combination with compression (since compression has the more noticeable benefit).
My preference is something that works with lighttpd but I haven't been able to find anything, so any web server that can do it would be interesting.

You can try nginx's third party Strip module:
http://wiki.nginx.org/NginxHttpStripModule
Any module you use is just going to remove whitespace. You'll get a better result by using a minifier that understands whatever you're minifying, e.g. Google's Closure JavaScript compiler.
It's smart enough to know what a variable is and make its name shorter. A whitespace remover can't do that.
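For example, a typical command-line run of the Closure Compiler looks like this (file names are placeholders):
java -jar closure-compiler.jar --js site.js --js_output_file site.min.js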
I'd recommend minifying offline unless your site is very low traffic. But if you want to minify in your live environment, I recommend using nginx's proxy cache. (Sorry, but I don't have enough reputation to post more than one link.)
Or you can look into memcached for an in-memory cache or Redis for the same thing but with disk backup.
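To sketch the proxy-cache suggestion: something along these lines, assuming a backend on port 8080 that does the minifying (paths, zone name, and cache times are invented for the example):
proxy_cache_path /var/cache/nginx/minified keys_zone=minified:10m;
server {
    listen 80;
    location ~ \.(js|css)$ {
        proxy_pass http://127.0.0.1:8080;   # backend that minifies on the fly
        proxy_cache minified;
        proxy_cache_valid 200 24h;          # keep minified copies for a day
    }
}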

I decided to do this through PHP (mostly because I didn't feel like writing a lighttpd module).
My script takes in a query string specifying the type of the files requested (js or css), and then the names of those files. For example, on my site the CSS is added like this:
<link rel="stylesheet" href="concat.php?type=css&style&blue" ... />
This minifies and concatenates style.css and blue.css.
It uses JSMin-PHP and cssmin.
It also caches the result using XCache if it's available (since minifying is expensive). I actually plan to change the script so it doesn't minify if XCache isn't available, but I have XCache and I got bored.
Anyway, if anyone else wants it, it's here. If you use mine you'll need to change the isAllowed() function to list your files (it may be safe to make it just return true, but it was easy to just list the ones I want to allow).
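For reference, a stripped-down sketch of what such a script can look like. This is not the author's actual script; the file layout, whitelist, and cache lifetime are invented:
<?php
// Sketch only: assumes JSMin-PHP (JSMin::minify) and cssmin (CssMin::minify)
// are available in the same directory, and that XCache may or may not be loaded.
require 'jsmin.php';
require 'cssmin.php';

function isAllowed($name, $type) {
    // Replace with your own whitelist, as mentioned above.
    return in_array("$name.$type", array('style.css', 'blue.css', 'site.js'));
}

$type  = ($_GET['type'] === 'js') ? 'js' : 'css';
$names = array_slice(array_keys($_GET), 1);   // e.g. ?type=css&style&blue -> ['style', 'blue']
$key   = 'concat:' . $type . ':' . implode(',', $names);

header('Content-Type: ' . ($type === 'js' ? 'application/javascript' : 'text/css'));

// Serve straight from XCache when the minified result is already cached.
if (function_exists('xcache_isset') && xcache_isset($key)) {
    echo xcache_get($key);
    exit;
}

$out = '';
foreach ($names as $name) {
    if (!isAllowed($name, $type)) {
        continue;
    }
    $src  = file_get_contents("$name.$type");
    $out .= ($type === 'js') ? JSMin::minify($src) : CssMin::minify($src);
}

if (function_exists('xcache_set')) {
    xcache_set($key, $out, 3600);   // cache for an hour; minifying is the expensive part
}
echo $out;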

I use Microsoft Ajax Minifier which comes with a C# library to minify js files. I use that on the server and serve up a maximum of two minified .js files per page (one "static" one that is the same across the whole site, and one "dynamic" one that is specific to just that page).
Yahoo's YUI compressor is also a simple Java .jar file that you could use as well.
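A typical invocation of the jar looks like this (version number and file names are just examples):
java -jar yuicompressor-2.4.8.jar site.js -o site.min.js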
The important thing, I think, is not to do it on a file-by-file basis. You really do need to combine the .js files to get the most benefit. For that reason, an "automatic" solution is not really going to work, because it will necessarily only operate file by file.

If you use Nginx instead of lighttpd then you can take advantage of Nginx's embedded Perl support to leverage the Perl module JavaScript-Minifier to minify and cache JS server-side.
Here are the details on how to achieve this: wiki.nginx.org/NginxEmbeddedPerlMinifyJS

Related

Archiving an old PHP website: will any webhost let me totally disable query string support?

I want to archive an old website which was built with PHP. Its URLs are full of .php extensions and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
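Putting it together, the crawl command looked roughly like this (the URL is a placeholder):
wget --recursive --adjust-extension --restrict-file-names=windows https://example.com/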
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative due to how Netlify manages presence or lack of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
Of course, if not all query parameter combinations are actually used in the dumped files, the corresponding redirect lines can be omitted.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
I wrote a _headers file to set the content type of my RSS file:
/rss.php
  Content-Type: application/rss+xml
I hope this helps somebody.

Block access to files by URL

I am new to webhosting and building a very small PHP website as a part of my project. It will not be used for practical purposes for now, but still I want to make sure that it is not TOO insecure.
I have a few files which I don't want users to be able to access by URL (some text and CSV files), but my PHP code should be able to use them. How can I achieve something like this?
If you don't want them accessed by the web server but just by PHP, the best thing is to just keep them outside the webroot.
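A minimal PHP sketch of that layout; the paths are assumptions (document root at /var/www/html, data one level above it), not from the question:
<?php
// The data files live outside the webroot, so the web server never serves them,
// but PHP can still read them by absolute path.
$rows  = array_map('str_getcsv', file('/var/www/private/data.csv'));
$notes = file_get_contents('/var/www/private/notes.txt');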
You can block access using .htaccess, but that ties you to Apache (pretty much no other web server will honour it), adds unnecessary overhead, and becomes a possible vulnerability if the .htaccess file is accidentally removed or misconfigured.
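If you do go the .htaccess route anyway, a minimal rule for text and CSV files could look like this (Apache 2.4 syntax; 2.2 uses "Order deny,allow" / "Deny from all" instead):
<FilesMatch "\.(txt|csv)$">
    Require all denied
</FilesMatch>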

JSF2 resources - compression, minification

I have two questions about resources in JSF2:
is there any way to configure things so that all JSF2 resources (JS, CSS) are compressed (gzipped), or at least minified? (Something à la wro4j.)
And the second one: is there any way to force-exclude some library? I am using OpenFaces in my admin system, but its JS dependency gets included even on the user-facing frontend pages, even though I never use it (or import its namespace) there.
Thanks
Gzipping is more a servlet container configuration. Consult its documentation for details. In Tomcat, for example, it's a matter of adding the compression="on" attribute to the <Connector> element in /conf/server.xml. See also Tomcat Configuration Reference - The HTTP Connector.
<Connector ... compression="on">
You can also configure the compressible MIME types there.
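A fuller sketch of such a connector (port and MIME list are examples; older Tomcat versions spell the attribute compressableMimeType):
<Connector port="8080" protocol="HTTP/1.1"
           compression="on"
           compressionMinSize="2048"
           compressibleMimeType="text/html,text/css,application/javascript" />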
Minification is more a build process configuration. If you're using Ant as build tool, you may find the YuiCompressorAntTask useful. We use it here and it works wonderfully.
As to OpenFaces, that's a completely different question and I also don't use it, so I don't have an answer for you. I'd suggest just asking that in a separate question. I don't see how it's related to performance improvements such as gzipping and minification.
For what concerns OpenFaces, I had the same problem and I solved it by unpacking the JAR, minifying the huge JavaScript files manually, and repacking the JAR. That saved about 70 KB per request, which was impacting performance under heavy load.

Serving dynamic zip files through Apache

One of the responsibilities of my Rails application is to create and serve signed xmls. Any signed xml, once created, never changes. So I store every xml in the public folder and redirect the client appropriately to avoid unnecessary processing from the controller.
Now I want a new feature: every xml is associated with a date, and I'd like to implement the ability to serve a compressed file containing every xml whose date lies in a period specified by the client. Nevertheless, the period cannot be limited to less than one month for the feature to be useful, which implies some of the zip files served will be as big as 50 MB.
My application is deployed as a Passenger module of Apache. Thus, it's totally unacceptable to serve the file with send_data, since the client would have to wait for the entire compressed file to be generated before the actual download begins. Although I have an idea of how to implement the feature in Rails so the compressed file is produced while being served, I fear my server will run short on resources once several long-running Ruby/Passenger processes are tied up serving big zip files.
I've read about a better solution to serve static files through Apache, but not dynamic ones.
So, what's the solution to the problem? Do I need something like a custom Apache handler? How do I inform Apache, from my application, how to handle the request, compressing the files and streaming the result simultaneously?
Check out my mod_zip module for Nginx:
http://wiki.nginx.org/NgxZip
You can have a backend script tell Nginx which URL locations to include in the archive, and Nginx will dynamically stream a ZIP file to the client containing those files. The module leverages Nginx's single-threaded proxy code and is extremely lightweight.
The module was first released in 2008 and is fairly mature at this point. From your description I think it will suit your needs.
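For reference, the backend's file list is just a plain-text response carrying an X-Archive-Files: zip header; each line gives a CRC-32 (or - if unknown), the file size, the internal URL, and the name inside the archive. Something like this, with the paths invented for the example:
X-Archive-Files: zip
Content-Disposition: attachment; filename="xmls-january.zip"

- 52480 /signed_xmls/0001.xml january/0001.xml
- 48720 /signed_xmls/0002.xml january/0002.xml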
You simply need to use whatever API is available to you to create a zip file and write it to the response, flushing the output periodically. If this serves large zip files, or will be requested frequently, consider running it in a separate process with a high nice/ionice value (low priority).
Worst case, you could run a command-line zip in a low priority process and pass the output along periodically.
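As a rough sketch of that last idea (the directory name is invented), a command like this streams the archive to stdout at idle priority, so the app can pass the bytes along as they arrive:
nice -n 19 ionice -c 3 zip -q -r - public/xmls/2015-01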
It's tricky to do, but I've made a gem called zipline (http://github.com/fringd/zipline) that gets things working for me. I want to update it so that it can support plain file handles or paths; right now it assumes you're using CarrierWave...
Also, you probably can't stream the response with Passenger... I had to use Unicorn to make streaming work properly, and certain Rack middleware can even break it (calling response.to_s breaks streaming).
If anybody still needs this, ping me on the GitHub page.

Is mod_rewrite a valid option for caching dynamic pages with Apache?

I have read about a technique where a rendered dynamic page is written to disk and mod_rewrite serves that copy whenever it exists. I was thinking about clearing out the cached version every X minutes using a cron job.
I was wondering if this was a viable option or if there were better alternatives that I am not aware of.
(Note that I'm on a shared machine and mod_cache is not an option.)
You could use your cron job to run the scripts and redirect the output to a file.
If you had a PHP file index.php, all you would have to do is run
php index.php > (location of static file)
You just have to make sure that your script runs the same on the command line as it does when served by Apache.
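For the mod_rewrite half of the question, an .htaccess-style rule set along these lines (file and directory names are just examples) serves the cached copy when it exists and falls back to PHP otherwise:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/cache/index.html -f
RewriteRule ^index\.php$ /cache/index.html [L]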
I would use a cache at the application level, because the application knows best when the cached version is out of date, and it is more flexible and powerful when it comes to cache invalidation.
Does the page need to be junked every so often simply because some time has passed? Or should a static version be regenerated whenever the page is updated?
If the latter, you could write a script that copies the just-edited page and saves it under its static filename. That should lighten the write load, since in that scenario you wouldn't need a fresh static copy unless a change was actually made that needs to be shown.