Scp locate's output with xargs

I want to make a list of files from locate's output and have scp take that list, but I am not sure about the syntax.
My attempt, in pseudo-code:
locate labra | xargs scp {} masi@11.11.11:~/Desktop/
How can I copy the files to the destination?

xargs normally passes as many arguments as it can fit on one command line, but with -I it takes only one input line per command. GNU Parallel (http://www.gnu.org/software/parallel/) may be a better solution:
locate labra | parallel -m scp {} masi@11.11.11:~/Desktop/
Since you are looking at scp, may I suggest you also check out rsync?
locate labra | parallel -m rsync -az {} masi@11.11.11:~/Desktop/
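A quick way to see the batching difference described above, using echo as a dry run so nothing is actually copied (a.txt, b.txt, c.txt and user@host are placeholders). The last command is a common xargs-only alternative to GNU Parallel's -m, not something from the answers: a small sh -c script receives the whole batch as "$@" and puts the destination last, which scp requires.

```shell
#!/bin/sh
# Without -I: all input items go into one command line (one invocation).
printf '%s\n' a.txt b.txt c.txt | xargs echo scp
# -> scp a.txt b.txt c.txt

# With -I {}: one invocation per input line.
printf '%s\n' a.txt b.txt c.txt | xargs -I {} echo scp {} dest/
# -> scp a.txt dest/
#    scp b.txt dest/
#    scp c.txt dest/

# Batching while keeping the destination last: sh receives the batch as "$@".
printf '%s\n' a.txt b.txt c.txt | xargs sh -c 'echo scp "$@" user@host:~/Desktop/' _
# -> scp a.txt b.txt c.txt user@host:~/Desktop/
```

Dropping the echo gives the real command; the _ fills $0, so the file names start at $1.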

Typically, {} is a findism:
find ... -exec cmd {} \;
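Where {} is replaced by the file that find is currently working on.
You can get xargs to behave similarly with:
locate labra | xargs -I{} echo {} more arguments
However, you'll quickly notice that this runs the command once per file instead of making a single call to scp.
So in the context of your example:
locate labra | xargs -I{} scp '{}' masi@11.11.11:~/Desktop/
The single quotes around {} are harmless, but they are not what makes paths with spaces work: with -I, xargs takes each whole input line as one argument, so spaces inside a path survive. File names containing quote characters can still confuse xargs; if your locate supports -0, NUL-separated output avoids that:
locate -0 labra | xargs -0 -I{} scp {} masi@11.11.11:~/Desktop/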
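To check how -I handles spaces, and how NUL-separated input handles quotes, without touching any files (the file names below are made up for the demo):

```shell
#!/bin/sh
# With -I, the whole input line becomes one argument, so the space survives.
printf '%s\n' 'my file.txt' | xargs -I {} printf '[%s]\n' {}
# -> [my file.txt]

# Without -I, xargs splits on blanks and the name falls apart.
printf '%s\n' 'my file.txt' | xargs printf '[%s]\n'
# -> [my]
#    [file.txt]

# NUL-separated input (what locate -0 produces) also copes with quote characters.
printf '%s\0' "it's here.txt" | xargs -0 -I {} printf '[%s]\n' {}
# -> [it's here.txt]
```

Fed through plain xargs -I {} instead, the apostrophe in the last name makes xargs complain about an unmatched quote, which is why -0 is the safer pairing.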

Related

--immediate-submit {dependencies} string contains script paths, not job IDs?

I'm trying to use --immediate-submit on a PBSPro cluster. I tried an in-place modification of the dependencies string to adapt it to PBSPro, similar to what is done here.
snakemake --cluster "qsub -l wd -l mem={cluster.mem}GB -l ncpus={threads} -e {cluster.stderr} -q {cluster.queue} -l walltime={cluster.walltime} -o {cluster.stdout} -S /bin/bash -W $(echo '{dependencies}' | sed 's/^/depend=afterok:/g' | sed 's/ /:/g')"
This last part gets converted into, for example:
-W depend=afterok: /g/data1a/va1/dk0741/analysis/2018-03-25_marmo_test/.snakemake/tmp.cyrhf51c/snakejob.trimmomatic_pe.7.sh
There are two problems here:
How can I get the dependencies string to contain job IDs instead of script paths? The qsub command normally prints the job ID to stdout, so I'm not sure why that isn't happening here.
How do I get rid of the space after afterok:? I've tried everything!
As an aside, it would be helpful if there were some option to debug the submission or not to delete the tmp.cyrhf51c directory in .snakemake -- is there some way to do this?
Thanks,
David
I suggest using a profile for this instead of trying to find an ad-hoc solution; it will also help with debugging. For example, there is already a pbs-torque profile (https://github.com/Snakemake-Profiles/pbs-torque), and probably not much needs to change to make it work with PBSPro.
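A possible explanation for the stray space, offered as an assumption rather than verified against Snakemake's source: the whole --cluster value is in double quotes, so the calling shell runs the $(...) substitution immediately, while the text is still the literal placeholder {dependencies}. The sed commands therefore never see the real dependency list, and whatever Snakemake substitutes later is left untouched; if that value starts with a space, you get exactly the afterok: /g/... output shown in the question. The snippet reproduces only the shell part (the cluster variable name is made up):

```shell
#!/bin/sh
# Inside double quotes, $(...) runs right away, on the literal placeholder text.
cluster="qsub -W $(echo '{dependencies}' | sed 's/^/depend=afterok:/g' | sed 's/ /:/g')"
printf '%s\n' "$cluster"
# -> qsub -W depend=afterok:{dependencies}
```

If this is the cause, escaping the dollar sign (\$(...)) would at least let the sed commands see the substituted value, assuming Snakemake runs the submission command through a shell; it still would not turn script paths into job IDs.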

Secure copying files from a remote server to local machine from a list in a text file

I have about a thousand files on a remote server (all in different directories), and I would like to scp them to my local machine. I don't want to run the scp command a thousand times in a row, so I have created a text file listing the file locations on the remote server. It is a simple text file with one path per line, like below:
...
/iscsi/archive/aat/2005/20050801/A/RUN0010.FTS
/iscsi/archive/aat/2006/20060201/A/RUN0062.FTS
/iscsi/archive/aat/2013/20130923/B/RUN0010.FTS
/iscsi/archive/aat/2009/20090709/A/RUN1500.FTS
...
I searched and found someone trying to do something similar, but not the same, here. The command I would like to adapt is:
cat /location/file.txt | xargs -i scp {} user@server:/location
In my case I need something like:
cat fileList.txt | xargs -i scp user@server:{} .
to download the files from the remote server using the list in fileList.txt, which is in the directory I run this command from.
When I run this, I get an error: xargs: illegal option -- i
How can I get this command to work?
Thanks,
Aina.
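You get the error xargs: illegal option -- i because your xargs does not accept -i. The wording of the message suggests the BSD/macOS xargs, which has no -i at all; GNU xargs still accepts it but has deprecated it. Use -I {} instead, which both support (you could also use a different replacement string, but {} is fine).
If both the list and the files are remote, you can fetch the list over ssh and feed it to xargs -I {}:
ssh user@server cat fileList.txt | xargs -I {} scp user@server:{} .
But this opens N+1 connections and, more importantly, copies all the remote files (which you said are scattered across different directories) into the same local directory. That is probably not what you want: RUN0010.FTS, for example, appears twice in your list, so one copy would overwrite the other.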
So, to recreate a similar hierarchy locally, say everything under /iscsi/archive/aat, you can:
use cut -d/ to extract the part of each path you want to be identical on both sides
use a child shell (sh -c) to run the commands that create the target directory and copy the file there
Thus:
ssh user@server cat fileList.txt \
| cut -d/ -f4- \
| xargs -I {} sh -c 'mkdir -p "$(dirname "$1")" && scp "user@server:/iscsi/archive/$1" "./$1"' _ {}
Handing the path to sh as $1, rather than splicing {} into the script text, keeps paths with spaces intact inside the script.
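The path handling in the command above can be tried locally without a server. In this sketch a temporary directory stands in for the remote filesystem and cp replaces scp (both substitutions are assumptions made for the demo); the two sample paths come from the question and share the name RUN0010.FTS, which is exactly the case where flattening into one directory would lose a file:

```shell
#!/bin/sh
set -eu
server=$(mktemp -d)     # hypothetical stand-in for the remote filesystem root
local_dir=$(mktemp -d)  # local destination
mkdir -p "$server/iscsi/archive/aat/2005/20050801/A" \
         "$server/iscsi/archive/aat/2013/20130923/B"
echo first  > "$server/iscsi/archive/aat/2005/20050801/A/RUN0010.FTS"
echo second > "$server/iscsi/archive/aat/2013/20130923/B/RUN0010.FTS"

cd "$local_dir"
# cut -d/ -f4- turns /iscsi/archive/aat/... into aat/... (field 1 is the empty
# string before the leading slash); sh gets the server root as $1, the path as $2.
printf '%s\n' /iscsi/archive/aat/2005/20050801/A/RUN0010.FTS \
              /iscsi/archive/aat/2013/20130923/B/RUN0010.FTS |
  cut -d/ -f4- |
  xargs -I {} sh -c 'mkdir -p "$(dirname "$2")" && cp "$1/iscsi/archive/$2" "./$2"' _ "$server" {}

find . -type f | sort
# -> ./aat/2005/20050801/A/RUN0010.FTS
#    ./aat/2013/20130923/B/RUN0010.FTS
```

Both copies of RUN0010.FTS survive because each lands in its own recreated directory.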
This should work, but it is starting to look messy and still opens N+1 connections, so rsync looks like a better option. If you have passwordless ssh set up and run this from bash or zsh (for the <(...) process substitution), this should work:
rsync -a --files-from=<(ssh user@server cat fileList.txt) user@server:/ .
rsync strips the leading /, so in the end you'll get everything under ./iscsi/archive/....
You can also copy the list file locally first, and then:
rsync -a --files-from=localCopyOfFileList.txt user@server:/ .
You can also edit that list to strip, for example, the first two levels (/iscsi/archive) from each path, and use that directory as the source:
rsync -a --files-from=localCopyOfFileList2.txt user@server:/iscsi/archive .
etc.
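One way to produce localCopyOfFileList2.txt is to strip the prefix with sed. A sketch, assuming the list holds absolute paths like the question's; the temporary directory is only for the demo, and the rsync call itself is not run because it needs the server:

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
printf '%s\n' /iscsi/archive/aat/2005/20050801/A/RUN0010.FTS \
              /iscsi/archive/aat/2013/20130923/B/RUN0010.FTS > "$tmp/localCopyOfFileList.txt"

# Remove the leading /iscsi/archive/ so paths are relative to that directory.
sed 's|^/iscsi/archive/||' "$tmp/localCopyOfFileList.txt" > "$tmp/localCopyOfFileList2.txt"
cat "$tmp/localCopyOfFileList2.txt"
# -> aat/2005/20050801/A/RUN0010.FTS
#    aat/2013/20130923/B/RUN0010.FTS
```

With the stripped list, rsync -a --files-from=localCopyOfFileList2.txt user@server:/iscsi/archive . should recreate aat/2005/... under the current directory.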

Remote rsync in parallel

I'm trying to run rsync over ssh in parallel to transfer files between two machines, for evaluation purposes: I want to see how much faster it can be compared with a single rsync process.
I tried these two solutions:
https://wiki.ncsa.illinois.edu/display/~wglick/Parallel+Rsync but with no great success.
https://gist.github.com/rcoup/5358786 (I couldn't make it work)
Based on the first link I run a command like this:
ssh HOST "mkdir -p ~/destdir/basefolder"
cd ./basefolder; ls | xargs -n1 -P 4 -I% rsync -arvuz -e ssh % HOST:~/destdir/basefolder/.
and the files get transferred, but it doesn't seem to work well: it runs one process for every file and folder directly inside basefolder, but when it reaches a folder, it transfers everything inside that folder using only one process.
I tried using find -type f, but then I lose the file hierarchy.
Does anyone know a way to do what I want (run rsync in parallel over ssh while keeping the file and folder hierarchy)?
Since you tagged your question 'gnu-parallel', the obvious answer is to refer you to http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync
cd src-dir; find . -type f -size +100000 | parallel -v ssh fooserver mkdir -p /dest-dir/{//}\;rsync -Havessh {} fooserver:/dest-dir/{}
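The point of the GNU Parallel example is one job per file, with each job creating the file's directory on the destination first, so the hierarchy survives even though the work is split across processes. The same structure can be tried locally with xargs -P; in this sketch cp stands in for ssh plus rsync, and the directory and file names are made up:

```shell
#!/bin/sh
set -eu
src=$(mktemp -d); dest=$(mktemp -d)
mkdir -p "$src/basefolder/sub/deeper" "$src/basefolder/other"
echo 1 > "$src/basefolder/sub/deeper/a.bin"
echo 2 > "$src/basefolder/sub/b.bin"
echo 3 > "$src/basefolder/other/c.bin"

cd "$src"
# One job per file (-n 1), up to 4 at once (-P 4); each job recreates the
# file's directory under $dest first, like mkdir -p /dest-dir/{//} above.
find . -type f -print0 |
  xargs -0 -n 1 -P 4 sh -c 'mkdir -p "$1/$(dirname "$2")" && cp "$2" "$1/$2"' _ "$dest"

(cd "$dest" && find . -type f | sort)
```

Each job costs a process (and, in the real version, an ssh connection), which is presumably why the manual example only hands over files larger than 100000 512-byte blocks (about 51 MB; find's -size counts 512-byte blocks when no unit is given).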

How can I find ".dat" within all *.mk files?

I am trying to grep for the .dat string in all my *.mk files using the command below. I am wondering whether it is right, because it doesn't give me any output.
find . -name "*.mk" | grep *.dat
No, it's not right. There are a few issues: 1) you seem to be supplying grep with a glob pattern, 2) the pattern is not quoted and will be expanded by the shell before grep ever sees it, 3) you're grepping through file names, not file contents.
To address 1), use a basic regular expression: the equivalent here is \.dat (an unescaped . matches any character, so .dat would also match the "pdat" in "update"). 2) is a matter of using single or double quotes. 3) find returns file names, so if you want grep to search each of those files, either use find's -exec flag or use xargs. All these taken together:
find . -name '*.mk' | xargs grep '\.dat'
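A small local check of the regular-expression point; the two sample makefiles are invented for the demo:

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d); cd "$dir"
printf 'all: update\n' > a.mk       # contains "pdat", but no literal ".dat"
printf 'DATA = input.dat\n' > b.mk  # really mentions ".dat"

# Unescaped ".": any character followed by "dat", so a.mk matches too.
find . -name '*.mk' | xargs grep -l '.dat' | sort
# -> ./a.mk
#    ./b.mk

# Escaped dot: only the literal ".dat" in b.mk matches.
find . -name '*.mk' | xargs grep -l '\.dat'
# -> ./b.mk

# Fixed-string search (grep -F): same result as '\.dat'.
find . -name '*.mk' | xargs grep -lF '.dat'
# -> ./b.mk
```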
Use Find's Exec Flag
You don't really need a pipeline here, and can bypass the need for xargs. Use the following invocation to perform a fixed-string search (which is generally faster than a regex match) on each file found by the standard find command:
find . -name '*.mk' -exec grep -F .dat {} \;
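If your find supports it (GNU find does, and the {} + form is standardized by POSIX, so most current implementations do too), you can use this syntax instead to avoid the process overhead of multiple calls to grep: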
find . -name '*.mk' -exec grep -F .dat {} +
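To see the difference between \; and +, let each invocation report how many file names it received (sh -c 'echo "$#"' stands in for grep; the file names are made up):

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d); cd "$dir"
touch a.mk b.mk c.mk

# \; : one invocation per file, so "1" is printed three times.
find . -name '*.mk' -exec sh -c 'echo "$#"' _ {} \;
# + : file names are batched into as few invocations as possible, here a single "3".
find . -name '*.mk' -exec sh -c 'echo "$#"' _ {} +
```

One side effect worth knowing: with \; grep sees a single file and so prints matching lines without the file name, while the batched form prefixes each match with its file.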
Use xargs:
find . -name "*.mk"| xargs grep '\.dat'
Or use find's -exec option, like this:
find . -name "*.mk" -exec grep ".dat" \{\} \;

alias in xargs sourcing tcsh

I am trying to run an xargs command that uses an alias. Searching turned up this:
alias gojk 'stsq \!:1 | xargs -t -0 -I {} tcsh -c source ~/.tcshrc.user;myset {}'
but it returns
Bad ! arg selector
and variations will return
source: too few arguments.
tcsh still evaluates the ! character inside of quotes. You need to put a backslash before it.
I'd suggest turning the tcsh part into a script that takes an argument, and getting that working on its own. Then call the script from xargs.
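A sketch of that approach, written in sh rather than tcsh so it runs anywhere: xargs executes programs directly, not through your interactive shell, so aliases are never visible to it, but a script is. myset-wrapper is a hypothetical name; the real script would be a tcsh script that sources ~/.tcshrc.user and then calls myset with its argument.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
# Hypothetical wrapper; the real one would be tcsh doing
# "source ~/.tcshrc.user; myset $1". Here it just echoes what it would run.
cat > "$dir/myset-wrapper" <<'EOF'
#!/bin/sh
printf 'myset %s\n' "$1"
EOF

# One wrapper call per NUL-separated item, as with xargs -0 -I {} in the question.
printf '%s\0' job1 'job 2' | xargs -0 -I {} sh "$dir/myset-wrapper" {}
# -> myset job1
#    myset job 2
```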
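Use tcsh's -m flag to make sure it loads your ~/.tcshrc on startup (a plain tcsh -c already reads ~/.tcshrc unless -f is given; -m additionally makes it load the file when it belongs to another user), and quote the whole command after -c, as in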
... | xargs -t0 -I {} tcsh -m -c "<alias> {}"