scp: how to transfer files from different directories preserving the whole path - scp

I need to transfer all *.png from different directories from a remote server BUT preserving the full path of each .png file because all .png files have the same name.
scp -r -e server:coverages/K4me3/*/pos/output/*/*.png Desktop/
While coping it rewrites already existing .png files because the names od them are the same in different directories. I want to preserve the full directory path,s o that when copying, the .png files are copied within their own directories.

SCP doesn't preserve the paths of files on its own, as you have discovered.
You'll probably want to use rsync to do this, since rsync does preserve paths
I think the command would be:
rsync -a -r -v -z server_config:/path/to/root/directory/on/server [destination_folder]
This is the reverse of this question: scp a folder to a remote system keeping the directory layout
Alternatively, and as the comments suggest, you can write a script to get all of the files or lower level directories (with absolute path) and call an scp transfer on each of them. Here is a script that I at one point used to copy files in this way:
#!/usr/bin/env python
from multiprocessing.dummy import Pool
from subprocess import call
from functools import partial
root = # Root Directory
files = [
root + # Sub 1,
root + # Sub 2,
root + # Sub 3,
root + # Sub etc,
]
command_s = "scp -r -v -c arcfour -F /path/to/.ssh/config Server:"
command_e = " Output_Dir/"
max_processes = 4
# Transfer the files 4 at a time because my computer is busy with other stuff
cmds = []
for filename in files:
cmds.append(command_s + filename + command_e)
pool = Pool(max_processes)
for i, returncode in enumerate(pool.imap(partial(call, shell=True), cmds)):
if returncode != 0:
print ("%d command failed: %d" % (i, returncode))

Here is an answer to preserve directory structure and copy just the png files from a server to a local system based on ssh.
ssh user#server 'find /server/path -name "*.png" -print0 | xargs -0 tar -cO' | tar -xfv - -C .
Source: Link

Related

Secure copying files from a remote server to local machine from a list in a text file

I have about a thousand files on a remote server (all in different directories). I would like to scp them to my local machine. I would not want to run scp command a thousand times in a row, so I have created a text file with a list of file locations on the remote server. It is a simple text file with a path on each line like below:
...
/iscsi/archive/aat/2005/20050801/A/RUN0010.FTS
/iscsi/archive/aat/2006/20060201/A/RUN0062.FTS
/iscsi/archive/aat/2013/20130923/B/RUN0010.FTS
/iscsi/archive/aat/2009/20090709/A/RUN1500.FTS
...
I have searched and found someone trying to do a similar but not the same thing here. The command I would like to edit is below:
cat /location/file.txt | xargs -i scp {} user#server:/location
In my case I need something like:
cat fileList.txt | xargs -i scp user#server:{} .
To download files from a remote server using the list in fileList.txt located in the same directory I run this command from.
When I run this I get an error: xargs: illegal option -- i
How can I get this command to work?
Thanks,
Aina.
You get this error xargs: illegal option -- i because -i was deprecated. Use -I {} instead (you could also use a different replace string but {} is fine).
If the list is remote, the files are remote, you can do this to retrieve it locally and use it with xargs -I {}:
ssh user#server cat fileList.txt | xargs -I {} scp user#server:{} .
But this creates N+1 connections, and more importantly this copies all remote files (scattered in different directories you said) to the same local directory. Probably not what you want.
So, in order to recreate a similar hierarchy locally, let's say everything under /iscsi/archive/aat, you can:
use cut -d/ to extract the part you want to be identical on both sides
use a subshell to create the command that creates the target directory and copies the file there
Thus:
ssh user#server cat fileList.txt \
| cut -d/ -f4- \
| xargs -I {} sh -c 'mkdir -p $(dirname {}); scp user#server:/iscsi/archive/{} ./{}'
Should work, but that's starting to look messy, and you still have N+1 connections, so now rsync looks like a better option. If you have passwordless ssh connection, this should work:
rsync -a --files-from=<(ssh user#server cat fileList.txt) user#server:/ .
The leading / is stripped by rsync and in the end you'll get everything under ./iscsi/archive/....
You can also copy the files locally first, and then:
rsync -a --files-from=localCopyOfFileList.txt user#server:/ .
You can also manipulate that file to remove for example 2 levels:
rsync -a --files-from=localCopyOfFileList2.txt user#server:/iscsi/archive .
etc.

s3cmd: backup folder to Amazon S3

I'm using s3cmd to backup my databases to Amazon S3, but I'd also like to backup a certain folder and archive it.
I have this part from this script that successfully backups the databases to S3:
# Loop the databases
for db in $databases; do
# Define our filenames
filename="$stamp - $db.sql.gz"
tmpfile="/tmp/$filename"
object="$bucket/$stamp/$filename"
# Feedback
echo -e "\e[1;34m$db\e[00m"
# Dump and zip
echo -e " creating \e[0;35m$tmpfile\e[00m"
mysqldump -u root -p$mysqlpass --force --opt --databases "$db" | gzip -c > "$tmpfile"
# Upload
echo -e " uploading..."
s3cmd put "$tmpfile" "$object"
# Delete
rm -f "$tmpfile"
done;
How can I add another section to archive a certain folder, upload to S3 and then delete the local archive?
Untested and basic but this should get the job done with some minor tweaks
# change to tmp dir - creating archives with absolute paths can be dangerous
cd /tmp
# create archive with timestamp of dir /path/to/directory/to/archive
tar -czf "$stamp-archivename.tar.gz" /path/to/directory/to/archive
# upload archive to s3 bucket 'BucketName'
s3cmd put "/tmp/$stamp-archivename.tar.gz" s3://BucketName/
# remove local archive
rm -f "/tmp/$stamp-archivename.tar.gz"

scp files in a certain order using ls

Whenever I try to SCP files (in bash), they end up in a seemingly random(?) order.
I've found a simple but not-very-elegant way of keeping a desired order, described below. Is there a clever way of doing it?
Edit: deleted my early solution from here, cleaned, adapted using other suggestions, and added as an answer below.
To send files from a local machine (e.g. your laptop) to a remote (e.g. your calculation server), you can use Merlin2011's clever solution:
Go into the folder in your local machine where you want to copy files from.
Execute the scp command, assuming you have an access key for the remote server:
scp -r $(ls -rt) user#foo.bar:/where/you/want/them/.
Note: if you don't have a public access key it may be better to do something similar using tar, then send the tar file, i.e. tar -zcvf files.tar.gz $(ls -rt), and then send that tar file on its own using scp.
But to do it the other way around you might not be able to run the scp command directly from the remote server to send files to, say, your laptop. Instead, you may need to, let's say bring files into your laptop. My brute-force solution is:
In the remote server, cd into the folder you want to copy files from.
Create a list of the files in the order you want. For example, for reverse order of creation (most recent copied last):
ls -rt > ../filenames.txt
Now you need to add the path to each file name. Before you go up to the directory where the list is, print the path using pwd. Now do go up: cd ..
You now need to add this path to each file name in the list. There are many ways to do this, here's one using awk:
cat filenames.txt | awk '{print "path/to/files/" $0}' > delete_me.txt
You need the filenames to be in the same line, separated by a space, so change newlines to spaces:
tr '\n' ' ' < delete_me.txt > filenames.txt
Get filenames.txt to the local server, and put it in the folder where you want to copy the files into.
The scp run would be:
scp -r user#foo.bar:"$(cat filenames.txt)" .
Similarly, this assumes you have a private access key, otherwise it's much simpler to tar the file in the remote, and bring that.
One can achieve file transfer with alphabetical order using rsync:
rsync -P -e ssh -r user#remote_host:/some_path local_path
P allows partial downloading, e sets the SSH protocol and r downloads recursively.
You can do it in one line without an intermediate using xargs:
ls -r <directory> | xargs -I {} scp <Directory>/{} user#foo.bar:folder/
Of course, this would require you to type your password multiple times if you do not have public key authentication.
You can also use cd and still skip the intermediate file.
cd <directory>
scp $(ls -r) user#foo.bar:folder/

Bash copying specific files

How can I get tar/cp to copy only files that dont end in .jar and only in root and /plugins directories?
So, I'm making a Minecraft server backup script. One of the options I wish to have is a backup of configuration files only. Here's the scenario:
There are many folders with massive amounts of data in.
Configuration files mainly use the following extensions, but some may use a different one:
.yml
.json
.properties
.loc
.dat
.ini
.txt
Configuration files mainly appear in the /plugins folder
There are a few configuration files in the root directory, but none in any others except /plugins
The only other files in these two directories are .jar files - to an extent. These do not need to be backed up. That's the job of the currently-working plugins flag.
The code uses a mix of tar and cp depending on which flags the user started the process with.
The process is started with a command, then paths are added via a concatenated variable, such as $paths = plugins world_nether mysql/hawk where arguments can be added one at a time.
How can I selectively backup these configuration files with tar and cp? Due to the nature of the configuration process, we needn't have the same flags to add into both commands - it can be seperate arguments for either command.
Here are the two snippets of code in concern:
Configure paths:
# My first, unsuccessful attempt.
if $BKP_CFG; then
# Tell user they are backing up config
echo " +CONFIG $confType - NOT CURRENTLY WORKING"
# Main directory, and everything in plugin directory only
# Jars are not allowed to be backed up
#paths="$paths --no-recursion * --recursion plugins$suffix --exclude *.jar"
fi
---More Pro Stuff----
# Set commands
if $ARCHIVE; then
command="tar -cpv"
if $COMPRESSION; then
command=$command"z"
fi
# Paths starts with a space </protip>
command=$command"C $SERVER_PATH -f $BACKUP_PATH/$bkpName$paths"
prep=""
else
prep="mkdir $BACKUP_PATH/$bkpName"
# Make each path an absolute path. Currently, they are all relative
for path in $paths; do
path=$SERVER_PATH/$path
done
command="cp -av$paths $BACKUP_PATH/$bkpName"
fi
I can provide more code/explaination where neccessary.
find /actual/path ! -iname '*jar' -maxdepth 1 -exec cp \{\} /where/to/copy/ \;
find /actual/path/plugins ! -iname '*jar' -maxdepth 1 -exec cp \{\} /where/to/copy/ \;
Might help.
Final code:
if $BKP_CFG; then
# Tell user what's being backed up
echo " +CONFIG $confType"
# Main directory, and everything in plugin directory only
# Jars are not allowed to be backed up
# Find matches within the directory cd'd to earlier, strip leading ./
paths="$paths $(find . -maxdepth 1 -type f ! -iname '*.jar' | sed -e 's/\.\///')"
paths="$paths $(find ./plugins -type f ! -iname '*.jar' | sed -e 's/\.\///')"
fi

How to ignore certain files when branching / checking out?

I'd like to compare a few files from the bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but due to the binary blob in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files, it takes ages to get just the text files. I've already downloaded 555MB and it's still counting.
Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:
file size
file extension
file name (include only debian/ for example)
I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.
As far as I'm aware, no. You're downloading the branch history, not just the individual files. And each file is an integral part of the branch's history.
On the bright side, you only have to check it out once. Unless those binary files change, they'll be skipped the next time you pull from Launchpad.
Depending on the branch's history, you may be able to cut down on the download size if you use a lightweight checkout (bzr checkout --lightweight). But of course, that may come back and bite you later, as it means you won't get a local copy of the branch, only the checked-out files. So it'll work much like SVN, where every operation has to go through the server. And as long as you don't need to look at the branch history, or commit your changes, that should serve you just fine, I believe.
I ended up doing some dirty grep-ing on the HTTP response since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not provide enough information to me.
The below Bash script relies on the working of Launchpads front-end Loggerhead. It recursively downloads from a given URL. Currently, it ignores *.run files. Save it as bzrdl in a directory available from $PATH and run it with bzrdl http://launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/debian/. All files will be saved in the current directory, be sure that it's empty to avoid conflicts.
#!/bin/bash
max_retries=5
rooturl="$1"
if ! [[ $rooturl =~ /$ ]]; then
echo "Usage: ${0##*/} URL"
echo "URL must end with a slash. Example URL:"
echo "http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/"
exit 1
fi
tmpdir="$(mktemp -d)"
target="$(pwd)"
# used for holding HTTP response before extracting data
tmp="$(mktemp)"
# url_filter reads download URLs from stdin (piped)
url_filter() {
grep -v '\.run$'
}
get_files_from_dir() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
mkdir -p "$tmpdir/$storedir" "$target/$dir"
local i subdir
for ((i=0; i<$max_retries; i++ )); do
if wget -O "$tmp" "$rooturl$dir"; then
# store file list
grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
grep '^<a' | cut -d '"' -f 2 | url_filter \
> "$tmpdir/$storedir/files"
IFS=$'\n'
for subdir in $(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
grep -F '<a ' | rev | cut -d / -f 2 | rev); do
IFS=$' \t\n'
get_files_from_dir "$dir$subdir/"
done
return
fi
done
echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
}
download_files() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
local done=false
local subdir
cd "$tmpdir/$storedir"
for ((i=0; i<$max_retries; i++)); do
if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
done=true
break
fi
done
$done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
for subdir in *.d; do
download_files "$dir${subdir%%.d}/"
done
}
get_files_from_dir ''
# make *.d expand to nothing if no directories are found
shopt -s nullglob
download_files ''
echo "TMP dir: $tmpdir"
echo "Errors : $(wc -l "$tmpdir/errors" 2>/dev/null | cut -d ' ' -f 2 || echo 0)"
The temporary directory and file is not removed afterwards, that must be done manually. Any errors (failures to download) will be written to $tmpdir/errors
It's confirmed to work with:
bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric/files/head:/debian/
Feel free to correct any mistakes or add improvements.
There is no way to selectively check out a specific directory from a Bazaar branch at the moment, although we do have plans to add such support in the future.
There is definitely too much traffic for the clone you are doing, considering the size of the branch. It's probably a bug in the client implementation.
Here on bzr 2.4 it is still quite slow but not too bad (60s):
localhost:/tmp% bzr branch http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric
Most recent Ubuntu Oneiric version: 275.09.07-0ubuntu1
Packaging branch status: CURRENT
Branched 37 revision(s).
From the log:
[11866] 2011-07-31 00:56:57.007 INFO: Branched 37 revision(s).
56.786 Transferred: 5335kB (95.8kB/s r:5314kB w:21kB)