Smart local copy of a remote directory

Currently I have a bunch of local copies of dev/production websites. Each copy contains a "files" directory, which holds files uploaded by site users. I use rsync (over ssh) to synchronize the directories' contents from the remote servers.
There are some annoyances:
I have to run rsync manually each time I want fresh files (this could be automated, of course, but as I have a lot of website copies, it's not a good idea).
The rsync execution takes some time.
Disk space on my laptop is running out.
I think all of this could be solved by some kind of software that works like a proxy:
When I list files, it requests the file list from the remote server and caches the results for some (configurable) time.
When I request a file's contents for the first time, it retrieves the remote file and saves it locally.
When I update a file, it only gets updated locally.
When I save a new file in the "files" directory, it does not go to the remote server.
Of course, the logic of such software would have to be more complex, but I hope the idea is clear: don't waste disk space, download files on demand, and make no remote changes.
Is there any software that works like that?

Map a network drive with NFS or sshfs. Make local copies if you really need a file.
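For example, a read-only sshfs mount could look like this (user, host, and paths are hypothetical):
# mount the remote "files" directory; contents are fetched over ssh on demand
mkdir -p ~/sites/example/files
sshfs -o ro user@example.com:/var/www/example/files ~/sites/example/files
# unmount when done
fusermount -u ~/sites/example/files
The -o ro option guards against accidental remote writes. Note that a plain sshfs mount does not cache contents or keep local-only changes the way the question asks, so copy a file out if you want to edit it.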

I did not mention it in the question, but I needed this for working with Drupal, and I have now found a Drupal-only solution: the Stage File Proxy module.
It does exactly what I need: downloads files from a remote server only when they are requested.
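For reference, the module only needs to know the origin site to fetch files from; a hedged sketch using drush (the URL is a placeholder, and the variable name assumes the Drupal 7 version of the module):
# tell Stage File Proxy which site to download missing files from
drush vset stage_file_proxy_origin "http://www.example.com"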

Related

SFTP file locking semantics [duplicate]

How can I make sure that a file uploaded through SFTP (on a Linux-based system) stays locked during the transfer, so that an automated system will not read it?
Is there an option on the client side? Or server side?
The SFTP protocol supports locking since version 5; see the specification.
You didn't specify what SFTP server you are using, so I'm assuming the most widespread one, OpenSSH. OpenSSH supports SFTP version 3 only, so it does not support locking.
Anyway, even if your server supported file locking, most SFTP clients/libraries don't support SFTP version 5; and even if they do, they may not support the locking feature. Note that the lock is explicit: the client has to request it.
There are some common workarounds for the problem:
As suggested by @user1717259, you can have the client upload a "done" file once an upload finishes. Make your automated system wait for the "done" file to appear.
You can have a dedicated "upload" folder and have the client (atomically) move the uploaded file to a "done" folder. Make your automated system look at the "done" folder only.
Have a file naming convention for files being uploaded (e.g. ".filepart") and have the client (atomically) rename the file to its final name once the upload finishes. Make your automated system ignore the ".filepart" files (a minimal sketch follows this list).
See (my) article Locking files while uploading / Upload to temporary file name for an example of implementing this approach.
Also, some SFTP servers have this functionality built in, for example ProFTPD with its HiddenStores directive (courtesy of @fakedad).
A gross hack is to periodically check the file attributes (size and time) and consider the upload finished if they have not changed for some time interval.
You can also make use of the fact that some file formats have a clear end-of-file marker (like XML or ZIP), so you can tell when you have downloaded an incomplete file.
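A minimal sketch of the ".filepart" rename workaround, using OpenSSH's sftp in batch mode (host and file names are hypothetical):
# upload under a temporary name, then atomically rename to the final name;
# the automated system on the server ignores *.filepart files
sftp -b - user@example.com <<'EOF'
put report.csv incoming/report.csv.filepart
rename incoming/report.csv.filepart incoming/report.csv
EOF
Batch mode (-b) makes sftp abort on the first failed command, so a broken transfer never gets renamed to its final name.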
A typical way of solving this problem is to upload your real file and then upload an empty 'done.txt' file.
The automated system should wait for the 'done' file to appear before trying to read the real file.
A simple file locking mechanism for SFTP is to first upload the file to a directory (folder) where the reading process isn't looking. You can create an alternate folder with the sftp mkdir command and upload the file there instead of the ultimate destination directory. Once the put completes, move the file into place:
sftp> rename alternate_path/filename destination_path/filename
(OpenSSH's sftp calls this command rename; some clients call it move or mv.) Since the rename just switches file pointers on the server, it is atomic within one filesystem, so it is an effective lock.

NFS server receives multiple inotify events on new file

I have 2 machines in our datacenter: a public server and an internal storage server.
The public server exposes part of the internal server's storage through FTP; it mounts that storage over NFS, so when files are uploaded via FTP, they in fact end up on the internal storage. But when watching the inotify events on the internal server's storage, I notice the file gets written in chunks, probably due to buffering on the client side. The software on the internal server watches the inotify events to determine whether new files have arrived, but because of the way NFS writes the files, there is no good way of telling when a file is complete. Is there a way to tell the NFS client to write files in only one operation, or is there a workaround for this behaviour?
EDIT:
The events I get on the internal server when uploading a file of around 900 MB are:
./ CREATE big_buck_bunny_1080p_surround.avi
# after the CREATE, I get around 250K MODIFY and CLOSE_WRITE,CLOSE events:
./ MODIFY big_buck_bunny_1080p_surround.avi
./ CLOSE_WRITE,CLOSE big_buck_bunny_1080p_surround.avi
# when the upload finishes, I get a CLOSE_NOWRITE,CLOSE
./ CLOSE_NOWRITE,CLOSE big_buck_bunny_1080p_surround.avi
Of course, I could listen for the CLOSE_NOWRITE event, but the inotify documentation says:
close_nowrite
A watched file or a file within a watched directory was closed, after being opened in read-only mode.
Which is not exactly the same as 'the file is complete'. The only workaround I see is to use .part or .filepart files and move them, once uploaded, to the original filename, and to ignore the .part files in my storage watcher. The disadvantage is that I'll have to explain to customers how to upload with .part, and not many FTP clients support this by default.
Basically, if you want to check when the write operation has completed, monitor the IN_CLOSE_WRITE event.
IN_CLOSE_WRITE is fired when a file that was open for writing gets closed. Even if the file gets transferred in chunks, the FTP server will close the file only after the whole file has been transferred.
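With the inotify-tools package, for instance, watching for that event from a shell looks like this (the directory is hypothetical):
# print each file that is closed after having been open for writing
inotifywait -m -e close_write --format '%w%f' /mnt/storage/incoming |
while read -r f; do
    echo "upload finished: $f"    # hand the file to the processing job here
done
One caveat: the event log in the question shows repeated CLOSE_WRITE events for a single upload over NFS, so in that particular setup the handler may still fire more than once per file.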

Uploading via SFTP over slow connection to temporary location then moving to real location

I have an issue where occasionally I need to work at Starbucks.
When I upload a PHP file, the connection is slow, so if a user tries to access the PHP file while I am uploading it, they will of course be served a fatal error.
This is very inconvenient on my busy websites. Is there a way for an uploaded file to go to a temporary location and then be moved by the server to the real location once the transfer finishes?
You can make WinSCP upload the file to a temporary file name and rename it automatically once the transfer completes.
In Preferences, go to the Transfer > Endurance tab and select All Files in the Enable ... Transfer to temporary file name box.
For details refer to:
https://winscp.net/eng/docs/ui_pref_resume
Why don't you just upload the file to a temporary folder on the server and then execute commands on the server to remove the old file and move the new file into place? The server-side move should be fast enough to eliminate any hiccups the users would see, unless their timing was just right.
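A minimal sketch of that manual approach, assuming shell access to the server (user, host, and paths are hypothetical):
# upload next to the target under a temporary name, then rename into place;
# a rename within one directory stays on the same filesystem, so it is atomic
scp index.php user@example.com:/var/www/site/index.php.tmp
ssh user@example.com 'mv /var/www/site/index.php.tmp /var/www/site/index.php'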

Joomla 1.5 Site Backup Strategy

I would like to make a complete backup of my whole Joomla 1.5 based site from time to time. How would this ideally be done? Are there any common pitfalls? Note that I only have FTP access to the hosting server. Is there a step-by-step tutorial somewhere? I am using the latest Joomgallery and Kunena 1.0.9 (Legacy mode).
Maybe there is a good way to automate this?
There are two parts of the backup you have to worry about: the database and the files.
The first part is the database. It can be backed up using something like phpMyAdmin; if you don't have it available on your server already, it's not too hard to upload it and get it going yourself. From there, you can just export the entire database to a gzip file.
The second part is the code and the uploaded files. The code base shouldn't change too often, so you could probably just make one backup of it. There are a number of ways; the simplest is to download the entire folder via FTP, though on Linux a single rsync command can fetch just the changed files (see the sketch at the end of this answer).
The database is the main thing you have to worry about, though: everything else should be rebuildable just by reinstalling.
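If you do have shell access (the question mentions FTP only, so this is an assumption), the rsync one-liner hinted at above could look like this, with hypothetical host and paths:
# pull only new and changed files into a local mirror of the site
rsync -avz user@example.com:/var/www/joomla/ ~/backups/joomla-files/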
I think this: http://www.joomlapack.net/ is what you need. I use it myself and it works like a charm, both for backups and for moving my Joomla installations from developer sites to the live site.
Get an FTP synchronisation tool and keep an up-to-date copy of your site locally. Then you can run the batch script
mysqldump -hhost -uuser -p%1 schema > C:\backup.sql
to create a backup of your MySQL tables at various points in time.
Edit:
You would have to have MySQL Server installed on your local machine, with the path to its bin directory in your PATH, to run the mysqldump command without much hassle. -p%1 takes the password supplied on the command line, as you wouldn't want to store passwords in your batch script.
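Putting that together, a hypothetical backup.bat (host, user, and schema names are placeholders) could look like:
rem backup.bat - run as: backup.bat yourpassword
rem requires mysqldump from a local MySQL install on the PATH
mysqldump -hexample-host -ubackupuser -p%1 joomla_db > "C:\backups\joomla_%DATE:/=-%.sql"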
If you only have FTP access you have a bit of a problem, as besides all the files you'll also have to back up the database. Without access to the database, a full backup won't do you any good.
Whatever backup strategy you choose, be sure it handles UTF-8 correctly. Joomla 1.5 stores all content as UTF-8, even when the database charset is set to 'iso-8859-1', so when the backup solution detects the database charset, characters like € or é will come out as mojibake (e.g. â‚¬ or Ã©) - not really what you want.
I absolutely endorse using Joomlapack - it works great. The optional remote tools allow you to initiate the backup from a Windows desktop machine: it performs the backup and downloads it. The remote tool has a scheduler, and you can also set it off to back up and download a list of sites.
Joomlapack also provides a file, kickstart.php, which you copy to your empty server account along with the backup and which automates the restore procedure. You do have to create an empty database with phpMyAdmin or similar, and you are given the opportunity to supply the database parameters (host, database, username, password) during the process.
One pitfall I did run into, though, is that some common components can have absolute URLs in their configuration, e.g. SOBI2 and VirtueMart. It's then just a matter of finding the appropriate configuration file, editing it, and re-uploading it.
Another problem was that one archive file (either ZIP or their JPA format) got a filename with a "?" character in it (from a Linux server), which caused a problem when installing locally on a Windows WAMP stack: the extraction of the ZIP file failed, and that stopped the process from completing cleanly.
I suggest using the automatic backup service at http://www.everlive.net
Update:
OK, here is some more information. EverLive.net is a website where you can create a free account. Enter your website details and you are ready to take your backups with just one click. Restore is also possible in the same way.
Further, you can use the automatic backup option to take backups at defined intervals. Other than that, you can use the website health check service to be informed if your website is not available.