What is the Snakemake variable name for the SLURM job ID? - snakemake

When I run jobs on snakemake using --profile ./slurm, I see in standard output:
Submitted job 406 with external jobid '1956125'
in slurm/config.yaml I have:
cores: "all"
cluster: "sbatch --partition=mypartition -A myaccount -t {resources.time_min} --mem={resources.mem_mb} -c {resources.cpus} -o slurm/logs/{jobid}.out -e slurm/logs/{jobid}.err --mail-type=FAIL --mail-user=mymail.edu --parsable"
default-resources: [cpus=1, mem_mb=2000, time_min=10080, partition=mypartition]
use-conda: true
This writes log files like 406.err, but what I want is 1956125.err.
How do I do this?

406 is the internal jobid from snakemake. You want the external jobid from slurm.
IIRC that should be possible by using %j instead of {jobid}:
cluster: "sbatch --partition=mypartition -A myaccount -t {resources.time_min} --mem={resources.mem_mb} -c {resources.cpus} -o slurm/logs/%j.out -e slurm/logs/%j.err --mail-type=FAIL --mail-user=mymail.edu --parsable"
Let us know if it works.
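(A minimal sketch of the %j behaviour outside of Snakemake; job.sh here is just a placeholder batch script:)
# sbatch expands %j in -o/-e to the SLURM job ID at submission time, so the
# log names line up with what squeue/sacct report (e.g. 1956125.out/.err).
jobid=$(sbatch --parsable -o slurm/logs/%j.out -e slurm/logs/%j.err job.sh)
echo "submitted SLURM job $jobid"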

Related

rsync not finding local directory when sending through SSH on pipeline

We're using Bitbucket Pipelines to push to our remote server from the build produced by the pipeline.
This is a snippet of the bitbucket-pipelines.yml file
- pipe: atlassian/ssh-run:0.2.2
  variables:
    SSH_USER: $PRODUCTION_USER
    SERVER: $PRODUCTION_SERVER
    COMMAND: '''rsync -zrSlh -e "ssh -p 22007" --stats --max-delete=0 $BITBUCKET_CLONE_DIR/ $PRODUCTION_USER@$PRODUCTION_SERVER:home/$PRODUCTION_USER'''
    PORT: '22007'
The connection itself works, and it does run the command correctly once it is remoted onto the server...
INFO: Executing the pipe...
INFO: Using default ssh key
INFO: Executing command on {HOST}
ssh -A -tt -i /root/.ssh/pipelines_id -o StrictHostKeyChecking=no -p 22007 {USER}@{HOST} 'rsync -zrSlh -e "ssh -p 22007" --stats --max-delete=0 /opt/atlassian/pipelines/agent/build/ {USER}@{HOST}:home/{USER}'
bash: rsync -zrSlh -e "ssh -p 22007" --stats --max-delete=0 /opt/atlassian/pipelines/agent/build/ {USER}@{HOST}:home/{USER}: No such file or directory
Connection to {HOST} closed.
I've tried to run the same command locally from the directory on my machine
ssh -A -tt -i /root/.ssh/pipelines_id -o StrictHostKeyChecking=no -p 22007 {USER}@{HOST} 'rsync -zrSlh -e "ssh -p 22007" --stats --max-delete=0 "$PWD" {USER}@{HOST}:/home/{USER}'
but it just duplicates the home directory on the remote.
It looks to me like it's looking for the source directory on the server and not looking at the docker container from bitbucket (or the files on my local machine with pwd).
If I try to run the command without the quotes, it fails because it's using port 22 by default. I've also tried moving the command into a bash script and using MODE: 'Script', which is an acceptable pattern for the pipe, but I can't use my environment variables in the .sh file.
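(An aside on the error itself, not from the thread: "No such file or directory" is what bash prints when a single word containing slashes is executed as a command path, which is consistent with the inner quotes surviving into the remote shell so the whole rsync string is treated as one command name. A minimal reproduction against a stand-in host:)
# The extra layer of quoting turns the entire command line into a single word;
# because that word contains slashes, bash tries to exec it as a path and fails.
ssh somehost "'ls /tmp'"
# bash: ls /tmp: No such file or directory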
If all you want to do is copy the files from the pipeline to the production server, you should use the rsync-deploy pipe instead of ssh-run. Your pipe configuration will look something like the following:
script:
  - pipe: atlassian/rsync-deploy:0.3.2
    variables:
      USER: $PRODUCTION_USER
      SERVER: $PRODUCTION_SERVER
      REMOTE_PATH: 'home/$PRODUCTION_USER'
      LOCAL_PATH: 'build'
      SSH_PORT: '22007'
Make sure to configure your SSH keys in pipelines properly (here is a link to our docs for configuring SSH keys https://confluence.atlassian.com/bitbucket/use-ssh-keys-in-bitbucket-pipelines-847452940.html)
I've found another way around this that doesn't need a pipe: I'm running rsync as a script step.
image: atlassian/default-image:latest
- rsync -rltDvzCh --max-delete=0 --stats --exclude-from=excludes -e 'ssh -e none -p 22007' $BITBUCKET_CLONE_DIR/ $PRODUCTION_USER@$PRODUCTION_SERVER:/home/$PRODUCTION_USER
It seems the -e none is an important addition, as is loading the Atlassian image, since the rsync command isn't found otherwise. I found this info in a post on Atlassian Community.
This seems to work pretty well for me
image: node:10.15.3
pipelines:
  default:
    - step:
        name: <project-path>
        script:
          - apt-get update && apt-get install -y rsync
          - ssh-keyscan -H $SSH_HOST >> ~/.ssh/known_hosts
          - cd $BITBUCKET_CLONE_DIR
          - rsync -r -v -e ssh . $SSH_USER@$SSH_HOST:/<project-path>
          - ssh $SSH_USER@$SSH_HOST 'cd <project-path> && npm install'
          - ssh $SSH_USER@$SSH_HOST 'pm2 restart 0'
Note: Avoid using sudo cmd in pipeline scripts
same issue with atlassian/default-image:3
rsync -azv ./project_path/*
bash: rsync: command not found
Solution:
apt-get update && apt-get install -y rsync

CommandException: Caught non-retryable exception - aborting rsync

After using gsutil for more than a year, I suddenly get this error:
.....
At destination listing 8350000...
At destination listing 8360000...
CommandException: Caught non-retryable exception - aborting rsync
.....
I tried to locate the files causing this sync problem but I was not able to do so. Is there a "skip error" option, or is there a way I can make gsutil more verbose?
My command line is like this:
gsutil -V -m rsync -d -r -U -P -C -e -x -x 'Download/*' /opt/ gs://mybucket1/kraanloos/
I have created a script to split up the problem, which gives me more info to work toward a solution:
#!/bin/bash
array=(
3ware
AirTime
Amsterdam
BigBag
Download
guide
home
Install
Holding
Multimedia
newsite
Overig
Trak-r
)
for i in "${array[@]}"
do
    echo Processing : $i
    PROCESS="/usr/bin/gsutil -m rsync -d -r -U -P -C -e -x 'Backup/*' /opt/$i/ gs://mybucket1/kraanloos/$i/"
    echo $PROCESS
    $PROCESS
    echo ""
    echo ""
done
I've been struggling with the same problem the last few days. One way to make it super verbose is to put the -D flag before the rsync argument, as in:
gsutil -D rsync ...
By doing that, I found that my problem is due to having # characters in filenames, as in this question.
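(A hedged follow-up sketch, not from the answer, for narrowing down the offending files; the paths are taken from the question and the flags may need adjusting:)
# -D makes gsutil very verbose, -n does a dry run without copying anything.
gsutil -D rsync -n -r /opt/ gs://mybucket1/kraanloos/ 2>&1 | tail -n 50
find /opt -name '*#*'     # filenames containing '#'
find /opt -xtype l        # broken symlinks (cf. the broken-directory-link answer below)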
In my case, it was because of a broken link to a directory.
As blambert said, use the -D option to see exactly what file causes the problem.
I had struggled with this problem as well, but I have figured it out now:
you need to re-authenticate your Google Cloud SDK shell and set a target project again.
It seems like rsync will not show the correct error message.
Try cp instead; it will guide you to authenticate and set the correct primary project:
gsutil cp OBJECT_LOCATION gs://DESTINATION_BUCKET_NAME/
after that, your gsutil rsync should run fine.
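(A sketch of what those re-authentication steps can look like when gsutil is using the Cloud SDK's credentials; the project ID below is a placeholder:)
# Re-authenticate, pin the active project, then retry the sync.
gcloud auth login
gcloud config set project my-project-id   # placeholder project ID
gsutil rsync -r /opt/ gs://mybucket1/kraanloos/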

Error: "attribute "m_numa_nodes" is not a integer value." when running a qsub snakemake

I am trying to run a Snakemake workflow with cluster submission for RNA-seq analysis. Here is my Snakefile:
#path to gff
GFF = "RNASeq/data/ref_GRCh38.p2_top_level.gff3"

#sample names and classes
CTHP = 'CTHP1 CTHP2'.split()
CYP = 'CYP1 CYP2'.split()
samples = CTHP + CYP

rule all:
    input:
        'CTHP1/mapping_results/out_summary.gtf',
        'CTHP2/mapping_results/out_summary.gtf',
        'CYP2/mapping_results/out_summary.gtf',
        'CYP1/mapping_results/out_summary.gtf',

rule order_sam:
    input:
        '{samples}/mapping_results/mapped.sam'
    output:
        '{samples}/mapping_results/ordered.mapped.bam'
    threads: 12
    params: ppn="nodes=1:ppn=12"
    shell:
        'samtools view -Su {input} | samtools sort > {output}'

rule count_sam:
    input:
        bam='{samples}/mapping_results/ordered.mapped.bam'
    output:
        summary='{samples}/mapping_results/out_summary.gtf',
        abun='{samples}/mapping_results/abun_results.tab',
        cover='{samples}/mapping_results/coveraged.gtf'
    threads: 12
    params: ppn="nodes=1:ppn=12"
    shell:
        'stringtie -o {output.summary} -G {GFF} -C {output.cover} '
        '-A {output.abun} -p {threads} -l {samples} {input.bam}'
I want to submit each rule to a cluster. So, in the Terminal from the working directory, I do this:
snakemake --cluster "qsub -V -l {params.ppn}" -j 6
However, the jobs are not submitted and I get the following error:
Unable to run job: attribute "m_numa_nodes" is not a integer value.
Exiting.
Error submitting jobscript (exit code 1):
I have also tried to set the nodes variable directly when running the Snakefile, like this:
snakemake --cluster "qsub -V -l nodes=1:ppn=16" -j 6
and as expected, it gave me the same error. At this point I am not sure if it's the local cluster setup or something that I am not doing right in the Snakefile. Any help would be great.
Thanks
The error does not look Snakemake-related. I am not an SGE/Univa expert so I cannot really help you, but m_numa_nodes is a parameter of the engine. Snakemake does not set it in any way, so it must come from either your local configuration or one of the arguments you provide to qsub.
EDIT 2017/04/12: Caught one of the errors in the Google Groups post. Remove the comma from the last line of input in your "all" rule.
EDIT 2017/04/13: Was advised the comma is not an issue.
The beauty of Snakemake is that sending it to the cluster just requires additional arguments. To determine whether it's a cluster issue or a Snakemake issue, I recommend running a dry run via
snakemake -n
A dry run will not submit any jobs, but it will return the list of jobs that would be run. This is a strong indicator of whether it's a Snakemake issue or a submission issue. I always perform dry runs during development, to ensure my Snakemake code works before I start trying to submit it to the cluster, because cluster submission can be a whole different basket of issues.
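(Not from the answer itself, just a hedged variant: combining -n with -p, i.e. --printshellcmds, shows the exact shell command each job would run, which helps catch bad placeholder values in params or resources before they ever reach qsub.)
# Dry run: list the jobs and print their shell commands; nothing is submitted.
snakemake -n -p
# The same thing restricted to one target from the Snakefile above:
snakemake -n -p CTHP1/mapping_results/out_summary.gtf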
As per your submission problems, I use the "--drmaa" flag within Snakemake to handle my submissions to the cluster. I realize this is not what you asked for, but I really enjoy its functionality, and I guess I am just suggesting it as a robust alternative to your current approach.
https://pypi.python.org/pypi/drmaa OR https://anaconda.org/anaconda/drmaa
snakemake --jobs 10 --cluster-config input/config.json --drmaa "{cluster.clusterSpec}"
Inside config.json, my rules mostly all use this parameter set:
{
    "__default__": {
        "clusterSpec": "-V -S /bin/bash -o log/varScan -e log/varScan -l h_vmem=10G -pe ncpus 1"
    }
}
SGE Cluster Arguments = "-V -S /bin/bash -l h_vmem=10G -pe ncpus 1"
DRMAA Arguments = "-o log/varScan -e log/varScan"
P.S. I think you should also post the operating system (e.g. CentOS 5) and the cluster type (e.g. SGE) you are using.

Running pssh as a cron job

I have the script below.
OUTPUT_FOLDER=/home/user/output
LOGFILE=/root/log/test.log
HOST_FILE=/home/user/host_file.txt
mkdir -p $OUTPUT_FOLDER
rm -f $OUTPUT_FOLDER/*
pssh -h $HOST_FILE -o $OUTPUT_FOLDER "cat $LOGFILE | tail -n 100 | grep foo"
When I run this script on its own, it works fine and $OUTPUT_FOLDER contains the output from the servers in $HOST_FILE. However, when I run the script as a cron job, $OUTPUT_FOLDER is created but is always empty. It's as if the pssh command was never executed.
Why is this? How do I resolve this?
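(There is no answer in the thread; purely as a hedged diagnostic: cron runs jobs with a minimal environment and PATH, so a common first step is to capture the script's stderr and use absolute paths, since a "command not found" for pssh would never show up when the script works interactively. The schedule, script path and log file below are placeholders:)
# Example crontab entry that captures stdout/stderr so any pssh error becomes visible.
*/10 * * * * /bin/bash /home/user/run_pssh.sh >> /tmp/run_pssh.cron.log 2>&1
# Inside the script, the absolute path printed by `command -v pssh` (run interactively)
# can replace the bare `pssh` to rule out PATH differences under cron.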

Error while running docker container

I am running a docker image using the following command.
docker run -it -p 8080:8080 -p 29418:29418 --rm \
-e AUTH_TYPE='DEVELOPMENT_BECOME_ANY_ACCOUNT' \
-v /home/gerrit-site:/home/gerrit/site \
-v /home/nidhi/.ssh/id_rsa.pub:/root/.ssh/id_admin_rsa.pub \
-v /home/nidhi/.ssh/id_rsa:/root/.ssh/id_admin_rsa \
-e GERRIT_ADMIN_USER='admin' \
-e GERRIT_ADMIN_EMAIL='admin@fabric8.io' \
-e GERRIT_ADMIN_FULLNAME='Administrator' \
-e GERRIT_ADMIN_PWD='mysecret' \
-e GERRIT_ADMIN_PRIVATE_KEY='/home/gerrit/ssh-keys/id_admin_rsa' \
-e GERRIT_PUBLIC_KEYS_PATH='/home/gerrit/ssh-keys' \
-v /home/nidhi/.ssh:/home/gerrit/ssh-keys \
--name gerrit admin_gerrit
I know the command is right because I have used it before and it worked perfectly fine. But now, when I run this command, I get the following error:
Error response from daemon: Cannot start container 2c9514c3b0d953344e66525d083c7ec3921cb9cde2185f43ec3bec2579597485: stat /home/nidhi/.ssh/id_rsa: permission denied
I checked the permissions on the SSH public and private keys. The permissions are 700 and the files are owned by nidhi. Can someone please point out what my error is?
When docker runs, the uid inside your container will likely not match the uid on the host. So with a host volume containing files with 700 permissions, those files will not be readable by the uid inside the container. Three options come to mind:
To keep the 700 permissions and same image, you'd need to chown the file on the host to match the uid inside the container.
You can use a named volume instead of a host volume, add your credentials to that named volume, and then set permissions inside there to match the containers where you'll use the volume.
Or you can use a different image that's been rebuilt to change the uid to match your own on the host.
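(A rough sketch of the first option, using the image name from the question; the uid is whatever the container actually reports, and this assumes the image has the id utility available:)
# Ask the image which uid it runs as, then hand ownership of the key files on the
# host to that uid. This keeps the 700 permissions while making the files readable
# from inside the container.
container_uid=$(docker run --rm --entrypoint id admin_gerrit -u)
sudo chown "$container_uid" /home/nidhi/.ssh/id_rsa /home/nidhi/.ssh/id_rsa.pub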