Bash copying specific files - backup

How can I get tar/cp to copy only files that don't end in .jar, and only from the root and /plugins directories?
So, I'm making a Minecraft server backup script. One of the options I wish to have is a backup of configuration files only. Here's the scenario:
There are many folders with massive amounts of data in.
Configuration files mainly use the following extensions, but some may use a different one:
Configuration files mainly appear in the /plugins folder
There are a few configuration files in the root directory, but none in any others except /plugins
The only other files in these two directories are .jar files - to an extent. These do not need to be backed up. That's the job of the currently-working plugins flag.
The code uses a mix of tar and cp depending on which flags the user started the process with.
The process is started with a command, then paths are added via a concatenated variable, such as $paths = plugins world_nether mysql/hawk where arguments can be added one at a time.
How can I selectively back up these configuration files with tar and cp? Due to the nature of the configuration process, we needn't add the same flags to both commands; each command can take separate arguments.
Here are the two snippets of code in concern:
Configure paths:
# My first, unsuccessful attempt.
if $BKP_CFG; then
# Tell user they are backing up config
# Main directory, and everything in plugin directory only
# Jars are not allowed to be backed up
#paths="$paths --no-recursion * --recursion plugins$suffix --exclude *.jar"
---More Pro Stuff----
# Set commands
if $ARCHIVE; then
command="tar -cpv"
# Paths starts with a space </protip>
command=$command"C $SERVER_PATH -f $BACKUP_PATH/$bkpName$paths"
prep="mkdir $BACKUP_PATH/$bkpName"
# Make each path an absolute path. Currently, they are all relative
for path in $paths; do
command="cp -av$paths $BACKUP_PATH/$bkpName"
I can provide more code/explanation where necessary.

find /actual/path -maxdepth 1 -type f ! -iname '*.jar' -exec cp {} /where/to/copy/ \;
find /actual/path/plugins -maxdepth 1 -type f ! -iname '*.jar' -exec cp {} /where/to/copy/ \;
Might help.

Final code:
if $BKP_CFG; then
    # Tell user what's being backed up
    echo " +CONFIG $confType"
    # Main directory, and everything in plugin directory only
    # Jars are not allowed to be backed up
    # Find matches within the directory cd'd to earlier, strip leading ./
    paths="$paths $(find . -maxdepth 1 -type f ! -iname '*.jar' | sed -e 's/\.\///')"
    paths="$paths $(find ./plugins -type f ! -iname '*.jar' | sed -e 's/\.\///')"
fi
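For the tar branch, one alternative to building a long $paths string is to let find produce the file list and pass it to tar on standard input. The following is only a sketch under stated assumptions: the function name backup_configs and its two arguments are hypothetical, the archive path is assumed to be absolute and to lie outside the server directory, and tar is assumed to support --null and -T (GNU tar and bsdtar do).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): archive every
# non-.jar file from the top level of the server directory and from
# anywhere below plugins/.
backup_configs() {
    local server_path=$1 archive=$2   # archive: absolute path, outside server_path
    (
        cd "$server_path" || exit 1
        {
            # Top level only: -maxdepth 1 keeps find out of world folders etc.
            find . -maxdepth 1 -type f ! -iname '*.jar' -print0
            # plugins/ is searched recursively, as in the final code above.
            [ -d plugins ] && find ./plugins -type f ! -iname '*.jar' -print0
        } | tar --null -T - -cf "$archive"
    )
}
```

Because the names are NUL-separated, file names containing spaces survive, which a space-joined $paths variable cannot guarantee.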


scp: how to transfer files from different directories preserving the whole path

I need to transfer all *.png from different directories from a remote server BUT preserving the full path of each .png file because all .png files have the same name.
scp -r -e server:coverages/K4me3/*/pos/output/*/*.png Desktop/
While copying, it overwrites already existing .png files because their names are the same in different directories. I want to preserve the full directory path, so that when copying, the .png files are copied within their own directories.
SCP doesn't preserve the paths of files on its own, as you have discovered.
You'll probably want to use rsync to do this, since rsync does preserve paths.
I think the command would be:
rsync -a -r -v -z server_config:/path/to/root/directory/on/server [destination_folder]
This is the reverse of this question: scp a folder to a remote system keeping the directory layout
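If only the .png files are wanted, rsync's include/exclude filters can restrict the transfer while keeping the directory layout. This is a sketch, not a tested recipe for the asker's server: the wrapper name copy_pngs is hypothetical, and the source is assumed to end in a slash so that its contents (not the directory itself) are copied.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: copy only *.png files from src to dest, keeping the
# directory layout below src. src may be local or of the form host:path/.
copy_pngs() {
    local src=$1 dest=$2
    # --include='*/'     descend into every directory
    # --include='*.png'  keep PNG files
    # --exclude='*'      skip everything else
    # -m                 do not create directories that would end up empty
    rsync -am --include='*/' --include='*.png' --exclude='*' "$src" "$dest"
}
```

For the question's layout this might be called as copy_pngs server:coverages/K4me3/ Desktop/K4me3/, so that each .png lands in its own subdirectory instead of overwriting its namesakes.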
Alternatively, and as the comments suggest, you can write a script to get all of the files or lower level directories (with absolute path) and call an scp transfer on each of them. Here is a script that I at one point used to copy files in this way:
#!/usr/bin/env python
from multiprocessing.dummy import Pool
from subprocess import call
from functools import partial
root = ""  # Root directory (fill in)
files = [
    root + "",  # Sub 1
    root + "",  # Sub 2
    root + "",  # Sub 3
    root + "",  # Sub etc.
]
command_s = "scp -r -v -c arcfour -F /path/to/.ssh/config Server:"
command_e = " Output_Dir/"
max_processes = 4
# Transfer the files 4 at a time because my computer is busy with other stuff
cmds = []
for filename in files:
    cmds.append(command_s + filename + command_e)
pool = Pool(max_processes)
for i, returncode in enumerate(pool.imap(partial(call, shell=True), cmds)):
    if returncode != 0:
        print("%d command failed: %d" % (i, returncode))
Here is an answer to preserve directory structure and copy just the png files from a server to a local system based on ssh.
ssh user@server 'find /server/path -name "*.png" -print0 | xargs -0 tar -cf -' | tar -xvf - -C .
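The same find-into-tar pipeline can be tried locally before involving ssh. The sketch below assumes both directories are on one machine; the function name copy_matching is hypothetical. To make it remote, the producing half (cd, find and the first tar) would run inside the quoted ssh command, as in the one-liner above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy files whose name matches a pattern from src to
# dest, recreating their relative directories. Both ends are local here.
copy_matching() {
    local src=$1 dest=$2 pattern=$3
    mkdir -p "$dest" || return 1
    # Left side: list matching files NUL-separated and stream them as a tar
    # archive on stdout. Right side: unpack that stream below dest.
    ( cd "$src" && find . -type f -name "$pattern" -print0 |
        tar --null -T - -cf - ) | tar -xf - -C "$dest"
}
```

Feeding the list through -T instead of xargs keeps everything in a single archive even when the file list is very long.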

Where is the data directory in Redis?

After writing some data to a redis server, I could read the data from a client.
However, how can I find the data directory on the file system?
Quickest method: use redis-cli.
redis-cli config get dir
If you have authentication configured, you will need to pass the password with -a password, replacing "password" with your actual password.
Find your Redis configuration directory, probably /etc/redis. Then look in the config file called redis.conf and find the line that starts dir.
It will look similar to this:
dir /etc/redis/database
This will do the job slowly but surely if you can't be bothered to look :-)
sudo find / -name "redis.conf" -exec grep "^dir" {} \; 2> /dev/null
dir /etc/redis
or if you want the config filename as well:
sudo find / -name "redis.conf" -exec grep -H "^dir" {} \; 2> /dev/null
/private/etc/redis/redis.conf:dir /etc/redis
Other possibilities you can check are whether Redis was started with a custom config file as its first parameter like this:
redis-server /path/to.custom/config-file
or with the dir option set on the command line like this:
redis-server --dir /path/to/data
Use
ps -aef | grep redis
to look for these options.
Because Redis config file can be located in several possible places (depending on the system or container, such as /opt/redis/ in my case), a general solution to find the currently configured location of the RDB file (as set using dir in redis.conf - if config file is used at all*) is:
$ cat $(find / -name redis.conf 2>/dev/null) | grep '^dir'
*Note that true to its simplicity Redis by default ships without any config files (using built-in configuration which depends on the version, see docs), and this is indeed the case for the official redis:latest container.
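Once a config file has been located, the dir value can be extracted without matching commented-out lines or other directives that merely contain "dir". This is a minimal sketch: the function name redis_conf_dir is hypothetical, and it assumes the path contains no spaces or quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of every "dir" directive in a
# redis.conf-style file. Lines such as "# dir ..." are skipped because
# their first field is not exactly "dir".
redis_conf_dir() {
    local conf=$1
    awk '$1 == "dir" { print $2 }' "$conf"
}
```

For example, redis_conf_dir /etc/redis/redis.conf prints the configured directory if the file sets one; on a running server, redis-cli config get dir remains the authoritative answer.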

Is it possible to make SCP ignore symbolic links during copy?

I need to reinstall one of our servers, and as a precaution, I want to move /home, /etc, /opt, and /Services to a backup server.
However, I have a problem: because of plenty of symbolic links a lot of files are copied multiple times.
Is it possible to make scp ignore the symbolic links (or actually to copy link as a link not as a directory or file)? If not, is there another way to do it?
I knew that it was possible; I had just picked the wrong tool. I did it with rsync:
rsync --progress -avhe ssh /usr/local/ XXX.XXX.XXX.XXX:/BackUp/usr/local/
I found that the rsync method did not work for me; however, I found an alternative that did work, specifically section 7.5.3 of "O'Reilly: SSH: The Secure Shell. The Definitive Guide".
7.5.3. Recursive Copy of Directories
Although scp can copy directories, it isn't necessarily the best method. If your directory contains hard links or soft links, they won't be duplicated. Links are copied as plain files (the link targets), and worse, circular directory links cause scp1 to loop indefinitely. (scp2 detects symbolic links and copies their targets instead.) Other types of special files, such as named pipes, also aren't copied correctly. A better solution is to use tar, which handles special files correctly, and send it to the remote machine to be untarred, via SSH:
$ tar cf - /usr/local/bin | ssh tar xf -
Using tar over ssh as both sender and receiver does the trick as well:
ssh user@remote-host 'cd $REMOTE_SRC_DIR; tar cf - ./' | tar xvf -
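That a tar pipe keeps symbolic links as links can be checked locally before trusting it with a real backup. The sketch below copies between two local directories; the function name copy_tree is hypothetical, and over the network one side of the pipe would be wrapped in ssh as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a directory tree through a tar pipe so that
# symbolic links arrive as links rather than as copies of their targets.
copy_tree() {
    local src=$1 dest=$2
    mkdir -p "$dest" || return 1
    # -C changes directory before archiving/extracting, so the stored paths
    # are relative to src and are recreated relative to dest.
    tar -cf - -C "$src" . | tar -xf - -C "$dest"
}
```

After copy_tree /some/src /some/dest, a link in the source is still a link in the destination, so its target is stored only once.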
One solution is to use a shell pipe. I have a situation where I have some *.gz files, plus symbolic links generated by some software that point to the same *.gz files under slightly shorter names. If I simply use scp, the symbolic links are copied as regular files, resulting in duplicates. I know rsync can ignore symbolic links, but my gz files are not compressed with rsyncable options, and rsync is very slow in copying these gz files. So I simply use the following command to copy over the files:
find . -type f -exec scp {} target_host:/directory/name/data \;
The -type f test makes find return only regular files and ignore symbolic links. Run this command on the source host. I hope this helps someone in my situation. Let me know if I missed anything.
A one-liner that can be run on the client to copy a folder from the server using tar and ssh:
ssh user@<Server IP/link> 'mkdir -p <Remote destination directory>; cd <Remote destination directory>; tar cf - ./' | tar xf - -C <Source destination directory>
Note: mkdir is a must. If the remote directory is not present, cd fails and the command will archive the entire home directory of the remote user and extract it on the client.

Can someone help explain this code? It is a shell script for creating a checksum list

# create a list of checksums
cat /dev/null > MD5SUM
for i in */*/*.sql ; do test -e $i && md5sum $i >>MD5SUM ; done
Then this command is used to check to see if anything has changed:
md5sum -c MD5SUM
It works fine and everything. I just don't really understand how. Say if I wanted to make a checksum list of all the files in my home directory $HOME how can I do that? What does the */*/*.sql part of the for loop mean? I'm assuming that is to display SQL files only but how can I modify that? Say I wanted all files in the directory? Why is it not just *.sql ? What does the rest of the for loop do in this case?
Let's go through it part by part:
cat /dev/null > MD5SUM
This simply empties the MD5SUM file left over from a previous run.
for i in */*/*.sql;
This will iterate over files that are two directories deep from your current folder. If you have folders
and you run your script in your home folder (~), every "*.sql" file inside directories b, d and f will have its checksum calculated and appended to the file MD5SUM in the current directory:
do test -e $i && md5sum $i >>MD5SUM ; done
Now Answering your questions:
Say if I wanted to make a checksum list of all the files in my home directory $HOME how can I do that?
I would use the find command with the exec option
find $HOME -maxdepth 1 -name \*.sql -exec md5sum {} \;
What does the */*/*.sql part of the for loop mean?
As explained above, it matches .sql files exactly two directories below the current one.
I'm assuming that is to display SQL files only but how can I modify that? Say I wanted all files in the directory?
Change
for i in */*/*.sql;
to
for i in */*/*;
or, for all regular files directly in $HOME, use
find "$HOME" -maxdepth 1 -type f -exec md5sum {} \;
Why is it not just *.sql ? What does the rest of the for loop do in this case?
Explained before.
Hope it helps =)
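Putting the pieces together, a checksum list for every file below a directory, at any depth, can be built with find rather than a fixed-depth glob. This is a sketch: the function names make_md5_list and check_md5_list are hypothetical, md5sum is assumed to be the GNU coreutils tool used in the question, and the list file is assumed to live outside the directory being scanned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an MD5 list for every regular file below dir
# (any depth) to list_file. Paths are stored relative to dir.
make_md5_list() {
    local dir=$1 list_file=$2
    # -exec ... {} + batches file names and runs nothing when there are none.
    ( cd "$dir" && find . -type f -exec md5sum {} + ) > "$list_file"
}

# Hypothetical helper: verify the list; returns non-zero if any file changed.
check_md5_list() {
    local dir=$1 list_file=$2
    # With no file argument, md5sum -c reads the list from standard input.
    ( cd "$dir" && md5sum -c ) < "$list_file"
}
```

For the home directory this would be make_md5_list "$HOME" /tmp/MD5SUM followed later by check_md5_list "$HOME" /tmp/MD5SUM.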

How to ignore certain files when branching / checking out?

I'd like to compare a few files from the bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but due to the binary blob in it, it takes ages to get just the text files. I've already downloaded 555MB and it's still counting.
Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:
file size
file extension
file name (include only debian/ for example)
I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.
As far as I'm aware, no. You're downloading the branch history, not just the individual files. And each file is an integral part of the branch's history.
On the bright side, you only have to check it out once. Unless those binary files change, they'll be skipped the next time you pull from Launchpad.
Depending on the branch's history, you may be able to cut down on the download size if you use a lightweight checkout (bzr checkout --lightweight). But of course, that may come back and bite you later, as it means you won't get a local copy of the branch, only the checked-out files. So it'll work much like SVN, where every operation has to go through the server. And as long as you don't need to look at the branch history, or commit your changes, that should serve you just fine, I believe.
I ended up doing some dirty grep-ing on the HTTP response since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not provide enough information to me.
The Bash script below relies on the workings of Launchpad's front-end, Loggerhead. It recursively downloads from a given URL. Currently, it ignores *.run files. Save it as bzrdl in a directory available from $PATH and run it as bzrdl URL. All files will be saved in the current directory, so be sure that it's empty to avoid conflicts.
#!/bin/bash
# NOTE: the three assignments below are missing from the listing; they are
# assumptions inferred from how the variables are used further down.
rooturl="$1"
target="$PWD"
max_retries=3
if ! [[ $rooturl =~ /$ ]]; then
    echo "Usage: ${0##*/} URL"
    echo "URL must end with a slash. Example URL:"
    echo ""
    exit 1
fi
tmpdir="$(mktemp -d)"
# used for holding HTTP response before extracting data
tmp="$(mktemp)"   # assumed: definition missing from the listing
# url_filter reads download URLs from stdin (piped)
url_filter() {
    grep -v '\.run$'
}
get_files_from_dir() {
    local slash=/
    local dir="$1"
    # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
    local storedir="${dir//$slash/.d${slash}}"
    mkdir -p "$tmpdir/$storedir" "$target/$dir"
    local i subdir
    for ((i=0; i<$max_retries; i++ )); do
        if wget -O "$tmp" "$rooturl$dir"; then
            # store file list
            grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
                grep '^<a' | cut -d '"' -f 2 | url_filter \
                > "$tmpdir/$storedir/files"
            for subdir in $(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
                grep -F '<a ' | rev | cut -d / -f 2 | rev); do
                IFS=$' \t\n'
                get_files_from_dir "$dir$subdir/"
            done
            # listing fetched and processed; skip the failure message below
            return
        fi
    done
    echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
}
download_files() {
    local slash=/
    local dir="$1"
    # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
    local storedir="${dir//$slash/.d${slash}}"
    local done=false
    local subdir
    cd "$tmpdir/$storedir"
    for ((i=0; i<$max_retries; i++)); do
        if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
            done=true
            break
        fi
    done
    $done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
    for subdir in *.d; do
        download_files "$dir${subdir%%.d}/"
    done
}
get_files_from_dir ''
# make *.d expand to nothing if no directories are found
shopt -s nullglob
download_files ''
echo "TMP dir: $tmpdir"
# count error lines; prints 0 when the errors file does not exist
echo "Errors : $(cat "$tmpdir/errors" 2>/dev/null | wc -l)"
The temporary directory and file are not removed afterwards; that must be done manually. Any errors (failures to download) will be written to $tmpdir/errors.
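The name mapping the script relies on (a/b/c/ becomes a.d/b.d/c.d/, and a trailing .d is stripped again when walking the stored tree) can be checked in isolation. The two function names below are hypothetical and exist only for this illustration; the parameter expansions are the ones the script uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a remote directory path to its local store name
# by inserting ".d" before every slash.
to_storedir() {
    local slash=/ dir=$1
    printf '%s\n' "${dir//$slash/.d${slash}}"
}

# Hypothetical helper: turn one stored component back into the original
# directory name by stripping the trailing ".d".
from_storedir_component() {
    printf '%s\n' "${1%%.d}"
}
```

The .d suffix keeps a stored directory from colliding with the per-directory list named files, which is why a remote directory that is itself called files is harmless.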
It's confirmed to work with:
Feel free to correct any mistakes or add improvements.
There is no way to selectively check out a specific directory from a Bazaar branch at the moment, although we do have plans to add such support in the future.
There is definitely too much traffic for the clone you are doing, considering the size of the branch. It's probably a bug in the client implementation.
Here on bzr 2.4 it is still quite slow but not too bad (60s):
localhost:/tmp% bzr branch
Most recent Ubuntu Oneiric version: 275.09.07-0ubuntu1
Packaging branch status: CURRENT
Branched 37 revision(s).
From the log:
[11866] 2011-07-31 00:56:57.007 INFO: Branched 37 revision(s).
56.786 Transferred: 5335kB (95.8kB/s r:5314kB w:21kB)