Can I prevent rsync from checksumming its link-dest files?

I'm using rsync (version 3.0.9) to do snapshot-style incremental backups from a local disk to a LAN-attached NAS that is mounted via CIFS. The functionality is ideal, but it's unreasonably slow for the most common scenario: a daily backup of a file hierarchy (~100 GB, ~2000 directories) in which only a very few files have changed. The slowdown does not happen with a plain copy over yesterday's backup:
rsync -a /home/stuff/ /mnt/nas/backup/yesterday
(when only a few files have changed since yesterday) because in this case rsync uses only its quick timestamp+size check to compare files. But when I do my snapshot backup:
rsync -a --link-dest=/mnt/nas/backup/yesterday /home/stuff/ /mnt/nas/backup/today
there is heavy network traffic to and from the NAS and the run is very slow, even though almost no data is actually transferred from source to target. I suspect this is caused by rsync checksumming the target files in the link-dest directory. Adding --no-checksum changes nothing. Is there any way to get rsync to compare files as quickly with --link-dest as it does with a simple overwrite?
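For reference, the daily snapshot pattern from the question can be wrapped in a small script. This is a minimal sketch, not a fix for the CIFS slowness. It assumes a hypothetical layout with one directory per day (named with date +%F) under an absolute backup root, plus a hypothetical "latest" symlink that points at the newest snapshot and serves as the --link-dest directory.

```shell
#!/bin/sh
# Minimal sketch of a hard-link snapshot backup with rsync --link-dest.
# Assumed (hypothetical) layout: $root/YYYY-MM-DD/ per day, plus a
# "latest" symlink pointing at the newest snapshot. Pass an absolute
# $root: rsync resolves a relative --link-dest against the destination.
snapshot_backup() {
    src=$1
    root=$2
    today=$(date +%F)          # e.g. 2024-05-01
    mkdir -p "$root" || return 1
    if [ -d "$root/latest" ]; then
        # Unchanged files become hard links to the previous snapshot
        # instead of new copies.
        rsync -a --link-dest="$root/latest" "$src/" "$root/$today" || return 1
    else
        # First run: nothing to link against, so make a full copy.
        rsync -a "$src/" "$root/$today" || return 1
    fi
    # Repoint "latest" at the snapshot just written (relative symlink).
    ln -sfn "$today" "$root/latest"
}
```

Usage, with the paths from the question: snapshot_backup /home/stuff /mnt/nas/backup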

Related

sshfs: will a mount overwrite existing files? Can I tell it to exclude a certain subfolder?

I'm running Ubuntu and have a remote CentOS system which stores (and has access to) various files and network locations. I have SSH access to the CentOS machine and want to be able to work locally on Ubuntu.
I'm trying to mirror a remote directory structure. The remote directory is structured:
/my_data/user/*
And I want to replicate this structure locally (a lot of scripts rely on absolute paths).
However, for reasons of speed, I want a certain subfolder, for example:
/my_data/user/sourcelibs/
To be stored locally on disk. I know the sourcelibs subfolder doesn't change much (but the rest might). So I can comfortably rsync it:
mkdir -p /my_data/user/sourcelibs/
rsync -r remote_user@remote_host:/my_data/user/sourcelibs/ /my_data/user/sourcelibs/
My question is, if I use sshfs to mount /my_data/user:
sudo sshfs -o allow_other,default_permissions remote_user@remote_host:/my_data/user /my_data/user
Will it overwrite my existing files? Is there a way to have sshfs mount but exclude certain subfolders?
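Effectively, yes: the mount hides your existing local files. I have almost the same use case and just tested this myself. (Strictly speaking, the local files under /my_data/user are not deleted; the mount shadows them, and they reappear after you unmount.) BTW, with FUSE 2 you'll need to add -o nonempty to your sshfs command, since the destination directory /my_data/user already exists and is not empty; FUSE 3 removed that option and allows non-empty mount points by default.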
What worked for me was making a copy of the remote directory that excludes the large subdirectories. I don't know whether keeping two copies in sync on the remote machine is feasible for your use case, but if you'll mostly be updating on your local machine and rarely making changes remotely, that could work.
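If you go the two-copies route, the trimmed copy on the remote machine could be refreshed with rsync. A sketch, assuming a hypothetical copy location /my_data/user_trimmed and the sourcelibs subfolder from the question as the directory to leave out; refresh_trimmed_copy is a hypothetical helper name.

```shell
#!/bin/sh
# refresh_trimmed_copy SRC DST SKIP
# Keep DST as a copy of SRC without the top-level subfolder SKIP.
# Meant to run on the remote host (by hand or from cron).
refresh_trimmed_copy() {
    src=$1        # full tree, e.g. /my_data/user
    dst=$2        # trimmed copy, e.g. /my_data/user_trimmed (hypothetical)
    skip=$3       # subfolder to leave out, e.g. sourcelibs
    # The leading "/" anchors the pattern at the top of the transfer and the
    # trailing "/" makes it match directories only. --delete removes files
    # that have disappeared from SRC; excluded paths in DST are left alone.
    rsync -a --delete --exclude="/$skip/" "$src/" "$dst/"
}
```

For example, on the remote host: refresh_trimmed_copy /my_data/user /my_data/user_trimmed sourcelibs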

Sync clients' files with server - Electron/node.js

My goal is to build an Electron application that synchronizes a client's folder with a server. To explain more clearly:
If the client doesn't have the files that are on the host server, the application downloads all of the files from the server to the client.
If the client has the files, but some files have been updated on the server, the application deletes ONLY the outdated files (leaving the unmodified ones) and downloads the updated files.
If a file has been removed from the host server but is still present in the client's folder, the application deletes the file.
In short, the application has to make sure that the client has an EXACT copy of the host server's folder.
So far, I have done this with wget -m, but wget frequently did not recognize that some files had changed, leaving clients with outdated files.
Recently I've heard of zsync-windows and the webtorrent npm package, but I'm not sure which approach is right or how to actually accomplish my goal. Thanks for any help.
rsync is a good approach, but you will need to call it from Node.js.
An npm package like this may help you:
https://github.com/mattijs/node-rsync
But things get slightly more difficult on Windows systems:
How to get rsync command on windows?
If you have SSH access to the server, one approach is to run rsync through a Node.js package.
There's a good article here on how to implement this.
You can use rsync, which is widely used for backups and mirroring and as an improved copy command for everyday use. It offers a large number of options that control every aspect of its behaviour and permit very flexible specification of the set of files to be copied.
It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination.
For your use case:
If the client doesn't have the files that are on the host server, the application downloads all of the files from the server to the client: a plain rsync run does this.
If the client has the files, but some files have been updated on the server, the application replaces ONLY the outdated files (leaving the unmodified ones): rsync does this by default, transferring only files whose size or modification time differ. (--remove-source-files is not what you want here: it deletes files from the source after they have been transferred.)
If a file has been removed from the host server but is still present in the client's folder, the application deletes the file: use rsync's --delete option.
rsync -a --delete source destination
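Because --delete removes files from the destination, it can help to preview a run first. A minimal sketch, where mirror_dir is a hypothetical helper name: without the apply argument it only reports what would change (--dry-run with --itemize-changes); with it, it makes the destination an exact copy of the source.

```shell
#!/bin/sh
# mirror_dir SRC DST [apply]
# Without "apply": dry run, printing one line per change rsync would make.
# With "apply": make DST an exact copy of SRC (changed files replaced,
# files missing from SRC deleted from DST, unchanged files left alone).
mirror_dir() {
    src=$1
    dst=$2
    if [ "${3:-}" = apply ]; then
        rsync -a --delete "$src/" "$dst/"
    else
        rsync -a --delete --dry-run --itemize-changes "$src/" "$dst/"
    fi
}
```

Typical use is mirror_dir source destination to review the list, then mirror_dir source destination apply.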
Given that it's a folder listing (and therefore has simple file names without spaces, etc.), you can pick out the file names with the code below:
# Get last item from each line of FILELIST
awk '{print $NF}' FILELIST | sort >weblist
# Generate a list of your files without the leading ./ (GNU find), so names match weblist
find . -type f -printf '%P\n' | sort >mylist
# Compare results: lines only in mylist are local files no longer on the server
comm -23 mylist weblist >diffs
# Remove old files (dry run: echo prints the rm command instead of running it)
xargs -r echo rm -fv <diffs
Remove the echo to let rm actually delete the files.
Next time you want to update your mirror, you can modify the comm line (by swapping the two file arguments) to find the set of files you don't have, and feed those to wget.
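The reversed comparison and the hand-off to wget might look like this. It is a sketch assuming the names in weblist are relative to a hypothetical BASE_URL for the mirror; fetch_missing is a hypothetical helper name, and wget -i - reads the URL list from standard input.

```shell
#!/bin/sh
# fetch_missing WEBLIST MYLIST BASE_URL
# WEBLIST and MYLIST are the sorted name lists built above.
# BASE_URL (hypothetical) is the mirror URL the names are relative to.
fetch_missing() {
    weblist=$1
    mylist=$2
    base=$3
    # Swapped arguments: lines only in WEBLIST are files we don't have yet.
    comm -23 "$weblist" "$mylist" |
        sed "s|^|$base/|" |
        wget -i -
}
```

This suits a flat folder; if the names contain subdirectories, wget's -x (--force-directories) option would be needed to recreate them locally.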
Or, if the server runs an rsync daemon (rsync cannot fetch over https:// URLs):
rsync -av --delete rsync://mirror.abcd.org/xyz/xyz-folder/ my-client-xyz-directory/

What is the optimal way to store data-files for testing using travis-ci + Docker?

I am trying to set up testing of my repository using travis-ci.org and Docker. However, I couldn't find any documentation on the policy for memory usage.
To run a set of tests (test.sh) I need a set of input files, which are very big (up to 1 GB, 500 MB on average).
One idea is to download them with wget directly in the test.sh script, but downloading the input files again on every test run would be inefficient.
The other idea is to create a separate Dockerfile containing the test files and mount it as a volume, but pushing such a big Docker image to the general registry would not be nice.
Is there a general prescription for such tests?
Have you considered using Travis File Cache?
You can write your test.sh script in a way so that it will only download a test file if it was not available on the local file system yet.
In your .travis.yml file, you specify which directories should be cached after a successful build. Travis will automatically restore that directory and files in it at the beginning of the next build. As your test.sh script will then notice the file exists already, it will simply skip the download and your build should be a little faster.
Note that the Travis cache works by creating an archive and uploading it to cloud storage, from which it has to be downloaded again at the start of later builds. However, that network traffic will likely stay inside the same cloud, potentially even the same data center. This should still give you some benefit in build time and lower resource use in your own infrastructure.
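The download-only-if-missing step in test.sh could be sketched as follows. The helper name fetch_if_missing, the cache directory $HOME/test-data and the example URL are hypothetical; the directory would need to be listed under cache/directories in .travis.yml for Travis to keep it between builds.

```shell
#!/bin/sh
# fetch_if_missing URL DEST
# Download URL to DEST only if DEST does not exist yet (for example because
# Travis restored it from the cache). Assumed .travis.yml entry:
#   cache:
#     directories:
#       - $HOME/test-data
# The download goes to DEST.part first, so an interrupted transfer never
# leaves a truncated file at DEST that a later run would treat as complete.
fetch_if_missing() {
    url=$1
    dest=$2
    if [ -f "$dest" ]; then
        echo "using cached $dest"
        return 0
    fi
    mkdir -p "$(dirname "$dest")" || return 1
    wget -q -O "$dest.part" "$url" || { rm -f "$dest.part"; return 1; }
    mv "$dest.part" "$dest"
}
```

For example: fetch_if_missing https://example.org/data/input1.dat "$HOME/test-data/input1.dat"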

Restoring Apache Tomcat after an accidental delete

I have a server running Apache Tomcat. It is installed at the following path:
root@serverb:/usr/tomcat/apache-tomcat-7.0.23# pwd
/usr/tomcat/apache-tomcat-7.0.23
root@serverb:/usr/tomcat/apache-tomcat-7.0.23# ls
LICENSE NOTICE RELEASE-NOTES RUNNING.txt bin conf lib logs temp webapps work ws.war
From time to time, I go to the logs/ folder and run the following command:
find . -mtime +2 -exec rm {} \;
However, I accidentally ran this command in /usr/tomcat/apache-tomcat-7.0.23, and as a result my ws.war file and other files in the bin/ folder were deleted.
I have a backup of ws.war but not of the Tomcat folder. Is there any way I can reinstall Tomcat and restore my server?
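A safer form of the cleanup command pins the directory with an absolute path and matches only regular files, so the current working directory no longer matters. A sketch; clean_old_logs is a hypothetical helper name, and the logs path in the usage line comes from the question.

```shell
#!/bin/sh
# clean_old_logs DIR [DAYS]
# Delete regular files under DIR last modified more than DAYS days ago
# (default 2, matching -mtime +2 in the original command). DIR must be
# an absolute path, so running this from the wrong directory is harmless.
clean_old_logs() {
    dir=$1
    days=${2:-2}
    case $dir in
        /*) ;;
        *) echo "clean_old_logs: need an absolute path, got '$dir'" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || { echo "clean_old_logs: no such directory: $dir" >&2; return 1; }
    find "$dir" -type f -mtime +"$days" -exec rm -f -- {} +
}
```

Usage: clean_old_logs /usr/tomcat/apache-tomcat-7.0.23/logs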
Most likely you're not asking how to create a backup after you need it (not before...), right?
Of course, you can get Tomcat from http://tomcat.apache.org, but if you don't have your configuration and changed settings (e.g. memory settings, host setup, etc.), you'll have to redo them from memory, or by trial and error until nobody complains any more.
Congratulations, you've learnt about the importance of backups. When you're done with the new installation, consider having a proper backup from now on. Keep in mind: IMHO you're only allowed to call something a backup if you have demonstrated that you can use it to restore to a new environment in the time that you specify as acceptable downtime.
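In that spirit, a minimal sketch of a backup that is checked by an actual restore, using tar. backup_and_verify is a hypothetical helper: it archives a directory, unpacks the archive into a scratch directory and compares the result with the original using diff -r.

```shell
#!/bin/sh
# backup_and_verify SRC_DIR ARCHIVE
# Archive SRC_DIR into ARCHIVE (gzip-compressed tar), then restore it into
# a scratch directory and compare the restored tree with the original.
backup_and_verify() {
    src=$1
    archive=$2
    parent=$(dirname "$src")
    name=$(basename "$src")
    tar -czf "$archive" -C "$parent" "$name" || return 1
    scratch=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$scratch" || { rm -rf "$scratch"; return 1; }
    if diff -r "$src" "$scratch/$name" >/dev/null; then
        status=0
    else
        echo "backup_and_verify: restored copy differs from $src" >&2
        status=1
    fi
    rm -rf "$scratch"
    return $status
}
```

For example: backup_and_verify /usr/tomcat/apache-tomcat-7.0.23 /root/tomcat-backup.tgz (the archive path is hypothetical). diff -r compares contents and directory structure but not permissions or ownership, and a directory that changes during the run (such as logs/) will legitimately show differences. Restoring onto a different machine is the stronger version of this test.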

Backup or snapshot tool for ext4

I'm looking for a backup tool for ext4 that can copy a running filesystem such as /var so that, after restoring from that copy, the filesystem has no collisions. I know BSD dump has an '-L' option, which tells it to work on a snapshot. But neither dump nor dumpe2fs from the repository has such an option. I've read about a patchset for ext4 that provides snapshot support, but opinions about it vary widely, so I'd like to hear about your experience with these patches.
It's not a dump tool, but I use rsync, which allows incremental backups between two filesystems on a running system.
For example:
rsync -aSXvH /srcdir /target_dir
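The options in that command can be spelled out in a small wrapper. A sketch, with backup_tree as a hypothetical name. Note that rsync reads files one at a time from the live filesystem, so on its own it does not give the point-in-time consistency a snapshot would.

```shell
#!/bin/sh
# backup_tree SRC DST
# Incremental copy of SRC into DST with the options from the answer:
#   -a  archive mode: recursive, preserves symlinks, permissions, times,
#       group, owner (owner needs root) and device/special files
#   -S  handle sparse files efficiently
#   -X  preserve extended attributes
#   -v  list the files being transferred
#   -H  preserve hard links
# The trailing "/" on SRC copies its contents into DST, unlike the original
# command, which creates DST/<basename of SRC>.
backup_tree() {
    rsync -aSXvH "$1/" "$2/"
}
```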