Controlling access to large files in Apache

I am looking to control access to some large files (we're talking many GB here) using signed URLs. The files are currently restricted by LDAP Basic authentication (mod_auth_ldap), but I need to change this to verify a signature instead (passed as a query parameter in the URL).
Basically, I just need to run a script that verifies the signature and then lets the request proceed as if authentication had succeeded. My initial thought was to use a simple CGI script, but since the files are so large I'm concerned about performance. So this question is (probably) really more like "are there any performance implications of streaming large files from a CGI script via Apache?" and, if so, "is there a better way of doing this (short of writing a dedicated authentication module)?"
If this makes any sense, help would be much appreciated :)
P.S. I wasn't sure exactly what to search for (10 minutes of Googling turned up nothing), so I may very well be duplicating someone else's post.
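For a sense of how cheap the verification step itself is, here is a minimal Python sketch of checking a signed URL. It assumes a hypothetical scheme in which the link generator computes an HMAC-SHA256 over the request path and an expiry time with a shared secret, and passes them as the query parameters `expires` and `sig`; the parameter names, the secret, and the `path|expires` message format are illustrative, not something the question specifies.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs

# Hypothetical shared secret; in practice load it from a file only the web server can read.
SECRET = b"change-me"


def sign(path: str, expires: int, secret: bytes = SECRET) -> str:
    """Return the hex HMAC-SHA256 of 'path|expires' (hypothetical signing scheme)."""
    message = f"{path}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(path: str, query_string: str, secret: bytes = SECRET, now=None) -> bool:
    """Check the hypothetical 'expires' and 'sig' query parameters for a request path."""
    params = parse_qs(query_string)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError, IndexError):
        return False  # missing or malformed parameters
    current = time.time() if now is None else now
    if current > expires:
        return False  # the link has expired
    expected = sign(path, expires, secret)
    # Compare as bytes in constant time; non-ASCII input then just fails instead of raising.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

A script that gets `True` back can go on to serve the file; `False` should end in a 403.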

Have a look at the crypto cookies/sessions in Apache. One way to do this is to require a valid session on that directory, redirect anyone who does not have one to a CGI script that does the authentication, and then redirect back to the actual download.
That way Apache can use its normal sendfile() and other optimizations.
However, keep in mind that a shell or Perl script that ends with a simple execvp(), 'exec cat', or something similar is not that expensive: the exec replaces the interpreter process, so the interpreter does not stay around while the file is copied.
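If the script streams the file itself instead of exec-ing `cat`, the important part is to copy it in fixed-size chunks so that memory use does not grow with the file. A minimal Python sketch, assuming the signature has already been checked and the script already knows the file's path (how the path is derived is left out); the 1 MiB chunk size is an arbitrary choice.

```python
import os
import shutil
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; memory use stays flat regardless of file size


def send_file(path: str, out: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Write CGI response headers, then stream the file at `path` to `out`; return its size."""
    size = os.path.getsize(path)
    headers = (
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {size}\r\n"
        "\r\n"
    )
    out.write(headers.encode("ascii"))
    with open(path, "rb") as src:
        # copyfileobj reads and writes one chunk at a time instead of loading the whole file.
        shutil.copyfileobj(src, out, chunk_size)
    out.flush()
    return size
```

In the CGI script you would call `send_file(path, sys.stdout.buffer)` once the signature check passes. The data still flows through the CGI pipe, so Apache's sendfile() path is not used; that is the trade-off against the redirect approach.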
An alternative is more URL-based, like

I ended up solving this with a CGI script, as mentioned. Cookies weren't an option because we need to support clients that don't use cookies (apt).


Send a very large file (>> 2 GB) via the browser

I have a task to do. I need to build a WCF service that allows a client to import a file into a database using the server back end. To do this, I need to send the server the settings, the events needed to start and configure the import, and, most importantly, the file to import. The problem is that these files can be extremely large (much bigger than 2 GB), so it's not possible to send them via the browser as they are. The only thing that comes to mind is to split these files and send them to the server one by one.
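Splitting the file is straightforward; the useful extra is to give each piece an index, a byte offset, and its own checksum, so the server can verify every piece independently. A minimal Python sketch, assuming SHA-256 as the checksum and a hypothetical 4 MiB chunk size (no particular size is implied to be optimal; measure on your own network). How the chunks are sent to the WCF service is not shown.

```python
import hashlib
from dataclasses import dataclass
from typing import BinaryIO, Iterator

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB; tune by measurement


@dataclass
class Chunk:
    index: int   # position of the chunk in the file
    offset: int  # byte offset of the chunk's first byte
    data: bytes
    sha256: str  # hex digest the receiver re-computes and compares


def split_file(src: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield the file as numbered chunks, each carrying its own SHA-256 digest."""
    index = 0
    offset = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            break  # end of file
        yield Chunk(index, offset, data, hashlib.sha256(data).hexdigest())
        index += 1
        offset += len(data)
```

Because it is a generator, only one chunk is held in memory at a time, which matters for files well beyond 2 GB.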
I also have another requirement: I need to be 100% sure that these files are not corrupted, so I also need to implement some sort of policy for detecting, correcting, and possibly recovering from errors.
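On the receiving side, per-chunk checksums give a simple recovery policy: reject any chunk whose digest does not match, track which indices have been accepted, ask the client to resend the rest, and finally hash the reassembled file and compare it with a whole-file digest sent by the client. A minimal Python sketch of that checksum-and-retransmit policy (a generic technique, not a specific library); the function names are hypothetical.

```python
import hashlib
from typing import BinaryIO, List, Set


def accept_chunk(out: BinaryIO, offset: int, data: bytes, expected_sha256: str) -> bool:
    """Write a chunk at its offset only if its digest matches; False means 'please resend'."""
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return False  # corrupted in transit; nothing is written
    out.seek(offset)
    out.write(data)
    return True


def missing_chunks(accepted: Set[int], total_chunks: int) -> List[int]:
    """Indices the client still has to (re)send."""
    return [i for i in range(total_chunks) if i not in accepted]


def file_sha256(src: BinaryIO, block: int = 1024 * 1024) -> str:
    """Hash the reassembled file block by block so it never has to fit in memory."""
    src.seek(0)
    digest = hashlib.sha256()
    for piece in iter(lambda: src.read(block), b""):
        digest.update(piece)
    return digest.hexdigest()
```

Because chunks are written at their offsets, they can arrive in any order, and a retransmitted chunk simply overwrites the same byte range.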
Do you know if there is some sort of API or DLL that can help me achieve my goals, or is it better to write the code myself? And in that case, what would be the optimal packet size?

Configure multiple servers and scale

I have been given the task of configuring 1000s of servers with some simple data. Let's say I need to log in to each server (Linux or Windows) and set up the NTP server. I need to come up with some kind of automation framework using Perl. I have some ideas and want to get more.
Here is my thought process:
a) Since there are 1000s of servers, the framework should definitely be able to read a CSV file, so that all inputs can be provided at once as opposed to one at a time.
b) Since there are so many servers, I have to find a way to do things in parallel. I can't go server by server sequentially.
c) I should have an output file that shows the results for all servers: the ones I successfully configured and the ones that failed. That way I can compare the input and output files and generate a report.
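The three points above fit a small pattern: read the host list from CSV, run one task per host on a worker pool, and write one result row per host. A minimal sketch, written in Python rather than Perl (in Perl, the CPAN module Parallel::ForkManager fills a similar role for the parallel part). The `host` column and the `configure` callable are hypothetical placeholders: `configure` is where the actual login and NTP setup would go, and it is expected to raise an exception on failure.

```python
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def configure_all(hosts: Iterable[Row], configure: Callable[[Row], None],
                  max_workers: int = 50) -> List[Row]:
    """Run `configure` on every host row concurrently; collect one result row per host."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(configure, row): row for row in hosts}
        for future in as_completed(futures):
            row = futures[future]
            try:
                future.result()
                results.append({"host": row["host"], "status": "ok", "error": ""})
            except Exception as exc:  # record the failure and keep going with the other hosts
                results.append({"host": row["host"], "status": "failed", "error": str(exc)})
    return results


def run(input_csv: str, output_csv: str, configure: Callable[[Row], None]) -> None:
    """Read hosts from `input_csv`, configure them in parallel, write a report to `output_csv`."""
    with open(input_csv, newline="") as f:
        hosts = list(csv.DictReader(f))
    results = configure_all(hosts, configure)
    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["host", "status", "error"])
        writer.writeheader()
        writer.writerows(sorted(results, key=lambda r: r["host"]))
```

Threads suit this case because each task mostly waits on the network; `max_workers` caps how many servers are contacted at once.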
Should I consider anything else in my framework?
How can I do parallel processing using Perl?
Even if you want to stick with Perl, it looks like there are already some alternatives available that would keep you from implementing another framework from scratch.
Check out the comments from for a couple options.

How to compare test website and live website

We have a production server running our website, and a test server that has exactly the same data but with code changes that add some new functionality. This web app has over 500 pages.
Is there any program that can:
log in to the test site,
crawl through each page and save it as HTML, and
compare it with the same page saved from the live site?
This way we can make sure that new features that we add to our test site will not break the live site when code updates are applied to production.
I am currently trying WinHTTrack Website Copier and then comparing the test and live folders with a code comparison tool like Beyond Compare. This works OK, but a lot of files show up as changed just because the domain name differs.
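The domain-name noise can be removed before comparing by rewriting the test host name to the live one in each downloaded page. A minimal Python sketch, assuming both mirrors are plain directory trees with the same relative paths (for HTTrack, point it at the per-site folders) and that the pages are text; the host names in the test are placeholders. Files that exist only in the test mirror are not reported by this sketch.

```python
import difflib
from pathlib import Path
from typing import Dict, List


def compare_mirrors(live_dir: str, test_dir: str,
                    live_host: str, test_host: str) -> Dict[str, List[str]]:
    """Diff each live file against its test counterpart after mapping test_host to live_host."""
    live_root, test_root = Path(live_dir), Path(test_dir)
    diffs: Dict[str, List[str]] = {}
    for live_file in sorted(live_root.rglob("*")):
        if not live_file.is_file():
            continue
        rel = live_file.relative_to(live_root)
        test_file = test_root / rel
        if not test_file.is_file():
            diffs[str(rel)] = ["missing on test site"]
            continue
        live_text = live_file.read_text(encoding="utf-8", errors="replace")
        # Map the test domain onto the live one so host-name differences alone are not reported.
        test_text = test_file.read_text(encoding="utf-8", errors="replace").replace(test_host, live_host)
        delta = list(difflib.unified_diff(
            live_text.splitlines(), test_text.splitlines(),
            fromfile=f"live/{rel}", tofile=f"test/{rel}", lineterm=""))
        if delta:
            diffs[str(rel)] = delta
    return diffs
```

The result maps each changed page's relative path to its unified-diff lines, so pages that differ only in the domain name drop out entirely.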
Looking forward to ideas / solutions for this problem.
Have you looked at using Watir for this? It's not exactly what you are looking for, but it might give you more granularity in your tests and let you make sure the site is functionally identical, rather than getting caught up on changing GUIDs, timestamps, and all the other things that tend to change from day to day on any website of significant size as part of its standard functionality.
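If you stay with snapshot diffs instead, the GUIDs and timestamps mentioned here can be masked before comparing. A minimal Python sketch; the two patterns are examples only (GUIDs in the usual 8-4-4-4-12 hex form and ISO-style date-times) and would need extending for whatever else varies on your pages. This is a separate, text-based technique, not a Watir feature.

```python
import re

# Example patterns only; extend with whatever changes on every request on your site.
VOLATILE_PATTERNS = [
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"),
     "<GUID>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2})?\b"), "<TIMESTAMP>"),
]


def mask_volatile(html: str) -> str:
    """Replace GUIDs and ISO-style date-times with fixed placeholders before diffing."""
    for pattern, placeholder in VOLATILE_PATTERNS:
        html = pattern.sub(placeholder, html)
    return html
```

Running both the live and the test copy of a page through the same function before diffing leaves only differences that survive the masking.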
Apparently you can't make consistent, reproducible builds in your project, can you? I would recommend moving towards that in the long run; it will save you a lot of headaches. That way you would know exactly what was deployed to which server and when, so there would be no more need to bend over backwards to get the deployed sources back like this...
I know this is not a direct solution to your problem, but it may be worth considering whether you would save more in the long run by investing the effort into your build process now, instead of implementing this workaround (and then improving your build process anyway, because one day you will almost surely need to).
wget has a --convert-links option, and there are also options for saving and loading cookies (--save-cookies, --load-cookies, --keep-session-cookies) that might let you crawl while logged in.
Use an offline downloader to download all files from both sites to your computer, then compare the folder contents using a free tool like Total Commander.
Load both sets of sources into CVS and compare them there.

How can I access and manipulate an MDB file available online (on the web) using VB

I have an MDB file hosted on my site. I am developing an application that will run on multiple machines and will access the MDB file via the internet. I am not sure how to go about building the connection string for an online connection. Any help?
You do not - not at all, not even a little bit, want to expose a .MDB file directly over the internet. You really, really do not want to do this.
There are two reasons, and I'll start with the second: even if it works (and since it needs to be able to create a .ldb lock file when the database isn't opened read-only, I'm not sure it will), it is liable to be horribly slow. Multi-user MDB can be bad enough over a local network.
The other reason is security: assuming it works at all, you're going to really struggle to make this even vaguely safe.
Broadly speaking, what you need to do is create a web service on your site that provides a secured API your client applications can use to access the database. This gives you two benefits: 1) it's much more secure (you're not exposing web space with write permissions), and 2) it lets you change the back-end data store if required without affecting the clients. There are various ways to implement this, depending on the tools you have and are comfortable with.
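A minimal sketch of that architecture, in Python using the standard library's WSGI interface. An in-memory SQLite table stands in for the real data store (the .mdb file would stay private on the server), and the `/customers` endpoint, the `X-API-Key` header, and the table are all hypothetical; a real service would need proper authentication and HTTPS.

```python
import json
import sqlite3
from urllib.parse import parse_qs

# Hypothetical key handed out to client applications; use real authentication in production.
API_KEY = "change-me"

# In-memory SQLite stands in for the real back-end store, which clients never touch directly.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
DB.executemany("INSERT INTO customers (name) VALUES (?)", [("Alice",), ("Bob",)])


def app(environ, start_response):
    """Minimal WSGI app exposing one read-only endpoint: GET /customers?name=..."""
    def respond(status, payload):
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    if environ.get("HTTP_X_API_KEY") != API_KEY:
        return respond("401 Unauthorized", {"error": "bad or missing API key"})
    if environ.get("REQUEST_METHOD") != "GET" or environ.get("PATH_INFO") != "/customers":
        return respond("404 Not Found", {"error": "unknown endpoint"})
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["%"])[0]
    # Parameterized query: clients send values, never SQL.
    rows = DB.execute("SELECT id, name FROM customers WHERE name LIKE ? ORDER BY id",
                      (name,)).fetchall()
    return respond("200 OK", [{"id": r[0], "name": r[1]} for r in rows])
```

To try it locally, `from wsgiref.simple_server import make_server` and `make_server("", 8000, app).serve_forever()`; the VB client would then send HTTP requests with the key in an `X-API-Key` header instead of opening the database file.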
I think it is possible to access it the same way you access a local file, simply using the URL as the Data Source. That is, the connection string looks like:
Provider=Microsoft.Jet.OLEDB.4.0;User ID=...;Data Source=;Mode=..., etc

Managing AJAX CouchDB calls and IE's (HTA) aggressive cache

I'm having a quite annoying problem, and came up with a quite ugly hack to make it work.
I develop an HTA application using a CouchDB database (for internal company use). The problem is that there seems to be very aggressive caching of the database queries, and it's been hard to come up with solutions.
So updated data in the database just won't show up in the browser, which still has the previous request results in its cache, until the entire app is restarted.
Oh, and CouchDB (or its MochiWeb server) doesn't allow unknown GET parameters, so the usual solution of appending some sort of timestamp just won't work.
I have found some sort of solution, but it's damn ugly. Solutions are:
Only open documents with latest revision number (easy and nice, won't work on views)
Use Apache as a forward proxy listening on 200+ ports, and pick one at random for each read query (that's the ugly one).
HTA accepts AJAX calls to other ports (maybe even to other domains, strange behaviour), so it works nicely. I just have a 1/200 chance that new data won't show up, but that's still better than 1/1, and I can live with that.
So what I'm asking is: is there a better solution to this? Can I hack into the MochiWeb server to modify the cache headers (and hope they won't be ignored)? Is there a special, unknown "throwaway" key I could use in the URLs to append some random string? Or is there a way to tell HTA not to cache anything (from within the app, since this is supposed to work on hundreds of computers)?
It's still ugly, but slightly less ugly than your current Apache setup: can't you use an Apache rewrite rule to let you add an arbitrary no_cache parameter to the URL? Apache can throw it away so CouchDB won't see it.
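The two halves of that idea, sketched in Python for illustration: the client appends a random `no_cache` value so every request URL is unique (and so never served from the cache), and the proxy removes it before forwarding. The `no_cache` name follows the suggestion above; the stripping function only models what the rewrite rule would do and is not Apache configuration, and the client half would really live in the HTA's script code.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BUSTER = "no_cache"  # parameter name from the suggestion above


def add_cache_buster(url: str) -> str:
    """Client side: append a random no_cache value so every request URL is unique."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((BUSTER, str(random.getrandbits(64))))
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_cache_buster(url: str) -> str:
    """Proxy side: drop no_cache before forwarding, so CouchDB never sees the unknown parameter."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != BUSTER]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Note that `urlencode` re-encodes the remaining parameters, so a JSON-valued view parameter such as `key="foo"` comes back percent-encoded; CouchDB decodes it to the same value.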