Snakemake in cluster mode with --no-shared-fs: How to set cluster-status - snakemake

I'm running Snakemake in a cluster environment and would like to use S3 as shared file system for writing output files.
Options --default-remote-provider, --default-remote-prefix and --no-shared-fs are set accordingly. The cluster uses UGE as scheduler, so setting --cluster is straightforward, but how do I set --cluster-status, whose use is enforced when using --no-shared-fs?
My best guess was a naive --cluster-status "qstat -j" which resulted in
subprocess.CalledProcessError: Command 'qstat Your job 2 ("snakejob.bwa_map.1.sh") has been submitted' returned non-zero exit status 1.
So I guess my question is, how do I get the actual jobid in there?
Thanks!
Andreas
EDIT 1:
I found https://groups.google.com/forum/#!topic/snakemake/7cyqAIfgeq4, so cluster-status has to be a script. So I wrote a Python script that is able to parse the above line, however snakemake still fails with:
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `/home/ec2-user/clusterstatus.py Your job 2 ("snakejob.bwa_map.1.sh") has been submitted'
...
subprocess.CalledProcessError: Command '/home/ec2-user/clusterstatus.py
Your job 2 ("snakejob.bwa_map.1.sh") has been submitted' returned non-zero exit status 1.

To answer my own question:
First, I needed the -terse option for qsub (which I had not added at first in my case and snakemake somehow remembered the wrong cluster command
Secondly, the cluster-status argument needs to point to a script being able to get the job status (job id being the only argument) and output "failed", "running" or "success".

Related

"scancel: error: Invalid job id Submitted batch job" with --cluster-cancel from snakemake

I am running snakemake using this command:
snakemake --profile slurm -j 1 --cores 1 --cluster-cancel "scancel"
which writes this to standard out:
Submitted job 224 with external jobid 'Submitted batch job 54174212'.
but after I cancel the run with ctrl + c, I get the following error:
scancel: error: Invalid job id Submitted batch job 54174212
What I would guess is that the jobid is 'Submitted batch job 54174212'
and snakemake tries to run scancel 'Submitted batch job 54174212' instead of the expected scancel 54174212. If this is the case, how do I change the jobid to something that works with scancel?
Your suspicion is probably correct, snakemake probably tries to cancel the wrong job (id Submitted batch job 54174212).
Check your slurm profile for snakemake which you invoke (standard location ~/.config/snakemake/slurm/config.yaml):
Does it contain the --parsable flag for sbatch?
Missing to include that flag is a mistake I made before. Adding the flag solved it for me.

making snakefile for data analysis

I am making a Snakefile for data analysis. the extension of my raw data is .RCC. for example the first input file I have is: CF30207_01.RCC.
and the script I am running on the data is QC.py. Looking at the tutorial, I have made the following snakefile:
SAMPLES = ["CF30207_01",
"CF30212_06",
"CF30209_03",
"CF30213_07",
"CF30211_05",
"CF30214_08"]
rule all:
input:
expand('{sample}.RCC', sample=SAMPLES)
rule QC:
input:
rc = '/home/snakemaker/{sample}.RCC'
output:
'{sample}.pdf'
"quality_control.csv"
shell:
"python3 QC.py"
but I got the following errors:
./Snakefile: line 1: SAMPLES: command not found
./Snakefile: line 2: CF30212_06,: command not found
./Snakefile: line 3: CF30209_03,: command not found
./Snakefile: line 4: CF30213_07,: command not found
./Snakefile: line 5: CF30211_05,: command not found
./Snakefile: line 6: CF30214_08]: command not found
./Snakefile: line 8: rule: command not found
./Snakefile: line 9: input:: command not found
./Snakefile: line 10: syntax error near unexpected token `'{sample}.RCC','
./Snakefile: line 10: ` expand('{sample}.RCC', sample=SAMPLES)'
but I followed exactly the same structures. do you guys know how I can fix the problem is with this snakefile?
I guess you are executing the snakefile script itself as ./Snakefile. Instead, you should do
snakemake -s /path/to/Snakefile
Or just snakemake if the Snakefile is in the current directory.
Welcome to snakemake! You have a good start, but couple of other notes on your snakefile.
rule all:
input:
expand('{sample}.RCC', sample=SAMPLES)
The rule all should request the final outputs of your workflow, not the inputs. These are the files you are requesting to be made. Change the input to:
expand('{sample}.pdf', sample=SAMPLES)
For the QC rule, it doesn't seem like you are passing the input/output files to the QC.py script. If you have command line arguments in that function, you can add them like:
"python3 QC.py --input {input.rc} --output {output[0]}"
Alternatively you can pass QC.py to the script directive and use snakemake.input[0], etc to access the files in your python code.
Within the output
output:
'{sample}.pdf'
"quality_control.csv"
You need to add a comma between the files to make them a list. Also note that every sample will output to the same quality_control.csv. At best this will overwrite and only keep the last sample, if you have multithreading you may have an error in your python code. You may want something like:
output:
'{sample}.pdf',
'quality_control_{sample}.csv'
If your QC code actually appends to quality_control, you can instead force a single execution at a time for that rule with custom resources
A good test for new snakefiles is to run snakemake -nq to make sure the file syntax is ok and you have the expected number of rules queued up.

hide error messages in dcl script

I have a test script I'm running that generates some errors,shown below, I expect these errors. Is there anyway I can prevent them from showing on the screen however? I use the
$ write sys$output
to display if there is an expected error.
I tried to use
$ DEFINE SYS$ERROR ERROR.LOG
but this then changed my entire error output log to this, if this is the correct way to handle it can I unset this at the end of my script somehow?
[error example]
%DCL-E-OPENIN, error opening TEST$DISK:[AAA]NOTTHERE.TXT; as input
-RMS-E-FNF, file not found
%DCL-E-OPENIN, error opening TEST$DISK:[AAA]NOTTHERE.TXT; as input
-RMS-E-FNF, file not found
%DCL-W-UNDFIL, file has not been opened by DCL - check logical name
DEFINE/USER creates a logical name that disappears when the next image exits.
So if you use that just before a command just to protect that command, then fine.
Otherwise I would prefer SET MESSAGE to control the output.
And of course yoy want to grab $STATUS and verify it after the command for success or for the expected error, reporting any unexpected error.
Better still... if you expect certain error conditions to occur,
then why not test for them?
For example:
$ file = F$SEARCH("TEST$DISK:[AAA]NOTTHERE.TXT")
$ IF file.NES."" THEN TYPE 'file'
Cheers,
Hein
To suppress Error message inside a script. try this command
$ DEFINE/USER SYS$ERROR NL:
NL: is a null device, so you don`t see any error messages displayed on your terminal.
good luck
This works interactively and in batch.
$ SET MESSAGE /NOTEXT /NOSEV /NOFAC /NOID
$ <DCL_Command>
$ SET MESSAGE /TEXT /SEV /FAC/ ID

Bad spawn_id while executing expect command

I am writing a script that will copy Valgrind onto whatever shelf that we enter on the command line. The syntax is as follows:
vgrindCopy [shelf number]
For some reason, the files will copy over without any issue, but after the copy completes the follow error is observed:
bad spawn_id (process died earlier?)
while executing
"expect "#""
Here is a copy of the relevant code:
function login_shelf {
expect -c "
set timeout 15
spawn $1
expect \"password:\"
send \"$PW\r\"
expect \"#\"
sleep 1
exit
"
}
# login and make the valgrind directory at /sfs/software/shelf/current
set -- /opt/swe/tools/ext/gnu/valgrind-3.7.0/i686-linux2.6/lib/valgrind/*
login_shelf "/opt/corp/projects/shelftools/bin/app rsync -Lau $* $shelf:/shelf/valgrind"
After playing around with the code, I found that if I remove the line "expect \"#\"", then the program doesn't copy any of the files over anymore. What odd as well is that I'm seeing the issue when I run the script, but a co-worker is not.
Has anyone had a similar issue and determined the cause? Any help would be greatly appreciated as always!
Your code is spawning the rsync and at the expect \"#\" is waiting for rsync to output a #, which it never does, so it exits and expect reports the error.
When you remove the expect \"#\" the expect script exits, terminating the rsync.
Instead of expect \"#\" you should wait for rsync to exit:
expect eof
wait

127 Return code from $?

What is the meaning of return value 127 from $? in UNIX.
Value 127 is returned by /bin/sh when the given command is not found within your PATH system variable and it is not a built-in shell command. In other words, the system doesn't understand your command, because it doesn't know where to find the binary you're trying to call.
Generally it means:
127 - command not found
but it can also mean that the command is found,
but a library that is required by the command is NOT found.
127 - command not found
example: $caat
The error message will
bash:
caat: command not found
now you check using echo $?
A shell convention is that a successful executable should exit with the value 0. Anything else can be interpreted as a failure of some sort, on part of bash or the executable you that just ran. See also $PIPESTATUS and the EXIT STATUS section of the bash man page:
For the shell’s purposes, a command which exits with a zero exit status has succeeded. An exit status
of zero indicates success. A non-zero exit status indicates failure. When a command terminates on a
fatal signal N, bash uses the value of 128+N as the exit status.
If a command is not found, the child process created to execute it returns a status of 127. If a com-
mand is found but is not executable, the return status is 126.
If a command fails because of an error during expansion or redirection, the exit status is greater than
zero.
Shell builtin commands return a status of 0 (true) if successful, and non-zero (false) if an error
occurs while they execute. All builtins return an exit status of 2 to indicate incorrect usage.
Bash itself returns the exit status of the last command executed, unless a syntax error occurs, in
which case it exits with a non-zero value. See also the exit builtin command below.
It has no special meaning, other than that the last process to exit did so with an exit status of 127.
However, it is also used by bash (assuming you're using bash as a shell) to tell you that the command you tried to execute couldn't be executed (i.e. it couldn't be found). It's unfortunately not immediately deducible though, if the process exited with status 127, or if it couldn't found.
EDIT:
Not immediately deducible, except for the output on the console, but this is stack overflow, so I assume you're doing this in a script.
If you're trying to run a program using a scripting language, you may need to include the full path of the scripting language and the file to execute. For example:
exec('/usr/local/bin/node /usr/local/lib/node_modules/uglifycss/uglifycss in.css > out.css');
This error is also at times deceiving. It says file is not found even though the files is indeed present. It could be because of invalid unreadable special characters present in the files that could be caused by the editor you are using. This link might help you in such cases.
-bash: ./my_script: /bin/bash^M: bad interpreter: No such file or directory
The best way to find out if it is this issue is to simple place an echo statement in the entire file and verify if the same error is thrown.
If the IBM mainframe JCL has some extra characters or numbers at the end of the name of unix script being called then it can throw such error.
In addition to the given answers, note that running a script file with incorrect end-of-line characters could also result in 127 exit code if you use /bin/sh as your shell.
As an example, if you run a shell script with CRLF end-of-line characters in a UNIX-based system and in the /bin/sh shell, it is possible to encounter some errors like the following I've got after running my script named my_test.sh :
$ ./my_test.sh
sh: 2: ./my_test.sh: not found
$ echo $?
127
As a note, using /bin/bash, I got 126 exit code, which is in accordance with gnu.org documentation about the bash :
If a command is not found, the child process created to execute it returns a status of 127. If a command is found but is not executable, the return status is 126.
Finally, here is the result of running my script in /bin/bash :
arman#Debian-1100:~$ ./my_test.sh
-bash: ./my_test.sh: /bin/bash^M: bad interpreter: No such file or directory
arman#Debian-1100:~$ echo $?
126
go to C:\Program Files\Git\etc
open gitconfig with notepad
change
[core]
autocrlf = true
To
[core]
autocrlf = false