Use rsync without copying files that are in use - backup

I have a server (Machine A) that receives uploads throughout the day from other machines. I have a script running on another internal server (running as cron - Machine B) that uses rsync to pull these files onto itself and remove the originals on Machine A. Some of these uploads last an hour or more.
How do I use rsync so that it won't attempt to copy files that are currently uploaded (being written to)? I don't want it to pull partial uploads and then attempt to process them.
I'm using Ubuntu 10.04 64-bit on both machine A & B.

In order to make incremental baclups in rsync, you should put the --update or -u option. The only situation in which the a file existing in the receiver will be updated is when the archive exists and has the same timestamp in both ends but the size differs.
About the partial updates, all the temporary uploads are stored in a temporary archive and then moved to the dest directory when uploaded. you can use the --partial in case of a rsync or network problem, this will resume the partial updates next time you execute the sync again.
You can check the whole options from this man page.

Related

What is the most efficient way to use rsync to confirm transfer of large folder

I’m using a synology ds214 and I have a 6tb folder I originally was using USB Copy to transfer to an external drive connected directly to the nas. That process failed somewhere along the way and I attempted to rsync each folder within the 6tb individually using the flags -avPc. What’s the most efficient command to run to ensure that all the files are synced and completed?
Would —-ignore-existing be the best flag in this case?

Tar --compare over an ssh connection

I have a lot of image data on a remote server that I need to copy over to a local drive. The local drive writes data off to tape after 30days, and so it is important that the data is kept together in convenient chunks otherwise it can end up on multiple tapes and take several days to retrieve. To do this I ssh into the remote server, tar up each image sequence into a tarball, and write it to the local drive using CPIO. The result is I have my image sequence on the remote server, and the tar file locally.
What I want to do:
Check that the tar file contains everything that it is meant to contain. I believe using the tar --compare flag is the way to go, however I am having trouble figuring out how to use --compare to compare a local tar file to the source files on a remote server via ssh.
So far I think I can run the command locally using the following:
# Somehow tell it the files are located across an ssh connection
tar --compare --file=test.tar -C
But I don't know how to do the last part.
I have an SSH key setup with the remote server so I do not need to enter a password.
Alternatively if you have another method of doing this comparison I would love to hear it.

Accessing external hard drive after logging into a remote machine using ssh command

I am doing an intensive computing project with a super old C program. The program requires a library called Sun Performance Library which is a commercial ware. Instead of purchasing the library by myself, I am running the program by logging onto a Solaris machine in our computer lab with the ssh command, while the working directory to store output data is still on my local Mac.
Now, a problem just occurred: the program uses large amount of disk space to save some intermediate results and the space on my local Mac is quickly filled (50 GB for each user prescribed by the administrator). These results are necessary for the next stage of computing and I cannot delete any of them before it finally produce the output data. Therefore, I have to move the working directory to an external hard drive in order to continue. Obviously,
cd /Volumes/VOLNAME
is not the correct way to do it because the remote machine will give me a prompt saying
/Volumes/VOLNAME: No such file or directory.
So, what is the correct way to do it?
sshfs recently added support for "slave mode" which allows you to do this. Assuming you have sshfs on Solaris (I'm not sure about this), the following command (ran from your Mac) will do what you want: dpipe /usr/lib/openssh/sftp-server = ssh SOLARISHOSTNAME sshfs MACHOSTNAME:/Volumes/VOLNAME MOUNTPOINT -o slave
This will result in the MOUNTPOINT directory on the server being mounted to your local external drive. Note that I'm not sure whether macOS has dpipe. If it doesn't, you can replace it with one of the equivalent solutions at How to make bidirectional pipe between two programs?. Also, if your SFTP server binary is somewhere else, substitute its path.
The common way to mount a remote volume in Solaris is via NFS, but that usually requires root permissions.
Another approach would be to make your application read its data from stdin and output its results to stdout, without using the file system directly. Then you could just redirect the data from/to your local machine through ssh. For instance:
ssh user#host </Volumes/VOLNAME/input.data >/Volumes/VOLNAME/output.data

Redis / Create new .rdb file while Redis is still running

I use Redis, and today I start to get the following exception:
Can't save in background: fork: Cannot allocate memory
As I understand, this error appears because my DB is too big, and there is no memory for this process.
So I start to delete tables, but the problem is that Redis doesn't success to write it to the disc, and in face it doesn't know about this changes.
I decided to create new .rdb file (in /etc/redis.config), and then change the file path with the new RDB file:
dbfilename dump_cache_new.rdb
Then, I will reload all the data which critical to me (I can do it - its data from my file system), and restart redis service.
The problem is that I can't create this file, because redis is now executing with the old path (and Redis has to run, because other process takes some critical data from it).
How can I create this dump_cache_new.rdb file, while redis is still running with the old path?
If you want to change the snapshot file name (or most other configuration parameters) on a running instance of Redis, use the CONFIG SET command. Based on that documentation page, it looks like dir and dbfilename are both parameters than can be set on a live instance.
Another option to consider is using the synchronous SAVE command, which doesn't require a fork.
You almost never want to call SAVE in production environments where it will block all the other clients. Instead usually BGSAVE is used. However in case of issues preventing Redis to create the background saving child (for instance errors in the fork(2) system call), the SAVE command can be a good last resort to perform the dump of the latest dataset.
It's a pretty severe operation, but if you're already at the point of dumping data to make the save work, this would at least allow you to first make a snapshot.

Smart local copy of a remote directory

Currently I have a bunch of local copies of dev/production websites. Each copy contains the "files" directory, which contains files uploaded by site users. Currently I use rsync to synchronize the directories contents from remote servers (via ssh).
There are some annoyances:
I have to run rsync manually each time when I want fresh files (this could be automated of course, but as I have a lot of website copies, it's not a good idea).
The rsync execution takes some time.
Disc space on my laptop is running out.
I think all of this could be solved if there is some kind of a software that can work like a proxy:
When I list files, it requests the file list from the remote server and caches the results for some (configurable) time.
When I first time request file contents, it retrieves the remote file and saves it locally.
When I update a file, it only gets updated locally.
When I save a new file in the "files" directory, it not goes to the remote server.
Of course, the logic of such software should be much more complex, but I hope, my idea is clear: don't waste disk space, download files on demand, no remote changes.
Is there any software that works like that?
Map a network drive with NFS or sshfs. Make local copies if you really need a file.
I did not mention it in the question, but I needed this for work with Drupal. And now I have found a Drupal-only solution, the Stage File Proxy module.
It does exactly what I need: downloads files from a remote server only when they are requested.