Keep Snakemake jobscript after a failed cluster execution? - snakemake

I have a workflow that runs correctly on my own machine, but fails when submitted to a cluster. The error seems to be a shell problem in the generated jobscript, at the point where the environment variables are set:
/nfs/mypath/test/.snakemake/tmp.m80omc9q/snakejob.retrieve_data.3.sh:
line 3: NEW_ACCESSION='E-CORN-1' ACCESSIONS='E-MTAB-4395,E-MTAB-4342,E-MTAB-4128,E-MTAB-3826,E-MTAB-3173,E-MTAB-964,E-GEOD-62778,E-GEOD-54272'
COVARIATE_TYPE='characteristic'
BATCH='study' COVARIATE='organism part'
CACHE_PATH='/other-path/baseline-merge-cache': No such file or directory
However, because the jobscript gets deleted, I cannot inspect the shell script to understand the issue. Is there any way of keeping this file for inspection after Snakemake finishes or fails?
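One workaround I can think of (a sketch, not a built-in Snakemake option as far as I know): the jobscript path is appended as the last argument to whatever command you give --cluster, so you could submit through a small wrapper that copies the jobscript to a directory you control before handing it to the scheduler. The wrapper name, the kept_jobscripts directory and the use of sbatch here are assumptions for illustration.
#!/bin/bash
# keep_and_submit.sh (hypothetical wrapper passed via --cluster)
# Copy the Snakemake jobscript somewhere safe before submitting it,
# so a copy survives Snakemake's own cleanup of .snakemake/tmp.*
set -euo pipefail
jobscript="$1"
mkdir -p /nfs/mypath/test/kept_jobscripts
cp "$jobscript" /nfs/mypath/test/kept_jobscripts/
sbatch "$jobscript"
It would then be invoked as snakemake --cluster ./keep_and_submit.sh instead of passing the submission command directly.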

Related

IntelliJ throws "init terminating in do_boot", but same erl command works in Windows command line

Setup: IntelliJ IDEA 2022.2.2, Erlang 25.0
I am trying to run the Erlang code available at https://erlangbyexample.org/send-receive. I am able to run it in werl and from the Windows command line, but I get the error "init terminating in do_boot" when I run it in IntelliJ.
I checked a similar issue reported in this question, wherein the solution was to convert list input to integers. However, my Erlang code does not expect any input; it just expects the function name.
Please provide a pointer to resolve the issue.
"C:\Program Files\Erlang OTP\bin\erl.exe" -pa F:/1TB/P/workspace-IntelliJ-Erlang1/out/production/workspace-IntelliJ-Erlang1 -pa F:/1TB/P/workspace-IntelliJ-Erlang1 -eval send_recv:run(). -s init stop -noshell
{"init terminating in do_boot",{undef,[{send_recv,run,[],[]},{erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,744}]},{init,start_it,1,[{file,"init.erl"},{line,1234}]},{init,start_em,1,[{file,"init.erl"},{line,1220}]},{init,do_boot,3,[{file,"init.erl"},{line,910}]}]}}
init terminating in do_boot ({undef,[{send_recv,run,[],[]},{erl_eval,do_apply,7,[{_},{_}]},{init,start_it,1,[{_},{_}]},{init,start_em,1,[{_},{_}]},{init,do_boot,3,[{_},{_}]}]})
Crash dump is being written to: erl_crash.dump...done
I configured the RunConfiguration to BUILD before RUNNING (the "Before launch" section). As a result, the RunConfiguration was creating an empty folder "../out/production/workspace-IntelliJ-Erlang1" without .beam files if the folder did not exist, and deleting any existing .beam files if it did. Hence, the RUN was eventually failing.
As a workaround, I removed the BUILD before RUNNING option from the RunConfiguration and manually built using Build Project before running the RunConfiguration.
TODO: I will check why the RunConfiguration was not able to generate the .beam files.
Check if there is a file called send_recv.beam in either of the directories specified as code path in the -pa arguments. (The undef error means that it can't find the function send_recv:run/0, more often than not because it can't find the compiled module.)
My guess is that this file is actually in the directory where you ran Erlang from the command prompt, but IntelliJ runs Erlang using another working directory. The current working directory is part of the code path by default, which would be why this works from the command prompt but not within IntelliJ.
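A quick way to check (a sketch, run from a Windows command prompt; the paths are the ones from the -pa arguments above):
dir F:\1TB\P\workspace-IntelliJ-Erlang1\out\production\workspace-IntelliJ-Erlang1\send_recv.beam
dir F:\1TB\P\workspace-IntelliJ-Erlang1\send_recv.beam
REM Or ask the code server directly where (if anywhere) it finds the module
"C:\Program Files\Erlang OTP\bin\erl.exe" -noshell -pa F:/1TB/P/workspace-IntelliJ-Erlang1/out/production/workspace-IntelliJ-Erlang1 -eval "erlang:display(code:which(send_recv))." -s init stop
If code:which/1 prints non_existing, the compiled module simply isn't on the code path that IntelliJ is using.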

Batch job submission failed: I/O error writing script/environment to file

I installed Slurm on a workstation and it seemed to work: I can use the Slurm commands, and srun is working too.
But when I try to launch a job from a script using sbatch test.sh, I get the following error: Batch job submission failed: I/O error writing script/environment to file, even if the script is as simple as
#!/bin/bash
srun hostname
Make sure slurmd is running as root. See the SlurmdUser parameter in slurm.conf. Its default value is root and it should be so.
Note this is different from the SlurmUser parameter, which defines the user that runs the controller processes; that one is preferably not root.
If the configuration is correct, then you might have a faulty filesystem at the location referred to in the SlurmdSpoolDir parameter, where slurmd writes the submission script and environment for jobs assigned to the node.
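For example (a sketch; /var/spool/slurmd is only the default spool location, so check your own SlurmdSpoolDir value):
scontrol show config | grep -Ei 'slurmduser|slurmuser|slurmdspooldir'
ps -eo user,comm | grep slurmd
df -h /var/spool/slurmd && ls -ld /var/spool/slurmd
The first command shows which users the daemons are configured to run as, the second shows what slurmd actually runs as, and the last checks that the spool directory exists and sits on a healthy, writable filesystem.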

How to have dbt run with log error FileNotFound to trigger an exit code that isn't 0?

We're running dbt version 0.16.1. We've set up our data pipeline to run in Airflow, and have a library set up to map each dbt model run within its own bash operator on Airflow.
The dbt run command executed is as follows:
cd /usr/local/airflow/models/[PACKAGE_NAME] && dbt --log-format json run --models [MODEL_NAME] --no-version-check --profiles-dir=/usr/local/airflow/dags/dags-enterprise-model/enterprise_model/include --target=[TARGET] --profile=[PROFILE]
Occasionally (likely when two models are being run at the same time), Airflow will show the following message from within the dbt run command:
INFO - FileNotFoundError: [Errno 2] No such file or directory: 'logs/dbt.log' -> 'logs/dbt.log.1'
This is problematic because the logfiles do not get updated, but the exit code of the task is listed as 0:
Command exited with return code 0
This causes Airflow to mark the task as a success; however, the log wasn't printed successfully.
My questions:
Is there a way for these errors to be raised as an actual error?
Failing that, is there a way to specify a unique log file?
I'm not sure if this is a gap in my understanding, a bug within dbt's logging, or maybe both.
It definitely sounds like this is the result of invoking dbt multiple times simultaneously, while having it write to the same files. It's not a dbt bug because we don't intend for dbt to be invoked simultaneously; a single invocation can handle concurrent model runs via threads. Log collisions are one risk of reimplementing dbt's model DAG as Airflow DAGs.
Those are both fair questions:
Historically, dbt only used two log levels: debug and info. See the comment on a related issue: dbt#2680. I totally appreciate that Airflow and other orchestration tools have well defined notification behaviors when presented with different log levels. A community member actually just opened a PR to add error-level logging (dbt#2723).
It is possible to set a custom log path for a dbt invocation using the log-path config in dbt_project.yml (docs).
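For example, a minimal sketch of that config (the directory name here is arbitrary); giving each package its own log directory keeps simultaneous invocations from rotating the same logs/dbt.log:
# in dbt_project.yml
log-path: "logs/enterprise_model"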

Redhat with httpd24 connecting to Informix using DBI

I'm at my wits' end on this. I have 2 RH7 boxes that I just installed httpd24 (v2.4.34) on. They were running httpd (v2.4.6) without any connection problems. Now when I try to run Perl scripts from the browser, they fail with...
install_driver(Informix) failed: Can't load '/usr/local/lib64/perl5/auto/DBD/Informix/Informix.so' for module DBD::Informix: libifsql.so: cannot open shared object file: No such file or directory at /usr/lib64/perl5/DynaLoader.pm line 190.
at (eval 5) line 3.
Compilation failed in require at (eval 5) line 3.
Perhaps a required shared library or dll isn't installed where expected
at /var/www/html/app/cgi-bin/test_informix_odbc.cgi line 35.
But when I run the same script from the command line, as 'apache', it runs just fine. All the ENV vars are set correctly.
Anyone run into anything similar before?
It would no longer use the LD_LIBRARY_PATH environment variable I was setting in httpd.conf. Newer versions of httpd have stopped bringing the user environment in when the service is started. I found this little blurb in /opt/rh/httpd24/service-environment:
Services are started in a fresh environment without any influence of user's environment (like environment variable values). As a consequence, information of all enabled collections will be lost during service start up.
grep -r "LD_LIBRARY_PATH" /opt/rh/httpd24/
/opt/rh/httpd24/enable:export LD_LIBRARY_PATH=/opt/rh/httpd24/root/usr/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
I prepended the standard informix paths in /opt/rh/httpd24/enable.
export LD_LIBRARY_PATH=/opt/IBM/informix/lib:/opt/IBM/informix/lib/esql:/opt/rh/httpd24/root/usr/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
And everything is back to normal. Woohoo!
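To double-check after the change (a sketch; the httpd24-httpd service name and the pgrep pattern are assumptions for a Software Collections install), you can confirm the running httpd processes actually picked up the Informix paths:
sudo systemctl restart httpd24-httpd
sudo cat /proc/"$(pgrep -o httpd)"/environ | tr '\0' '\n' | grep LD_LIBRARY_PATH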

Jenkins SSH remote process is getting killed as soon as the Jenkins SSH plugin returns back

Jenkins version: 1.574
I created a simple job which performs the following:
Using "Execute shell script on remote host using SSH" as one of the BUILD steps, I'm just calling a shell script. This shell script performs stop and start operations on Tomcat to restart an application on the target machine.
I have a valid username, password, port defined for the target SSH server in Jenkins Global settings.
I noticed that when I run the Jenkins job and call the restart script (which takes the application name as parameter $1), it works fine, but as soon as the "Execute shell script on remote host using SSH" step completes, the new process dies on the remote/target application server.
If I run the script from the target/remote server itself, everything works fine and the new process/PID stays alive indefinitely. But when I run the same script from Jenkins, even though I see no errors and everything appears to work, the new process dies as soon as that SSH step completes and control comes back to the next BUILD step in the Jenkins job, or the Jenkins job finishes.
I saw a few posts/blogs and tried setting BUILD_ID=dontKillMe in the Jenkins job (in various places, i.e. Prepare Environment variables and also using Inject Environment variables). When a particular build of the job is complete, I can see that the Environment Variables for that build do show BUILD_ID=dontKillMe as the value (instead of the default timestamp tag value).
I tried putting nohup before calling the restart script, i.e.,
nohup restart_tomcat.sh "${app}"
I also tried:
BUILD_ID=dontKillMe nohup restart_tomcat.sh "${app}"
This doesn't give any error and creates a nohup.out file on the remote server. I'm not worried about that file, since restart_tomcat.sh creates its own LOG file, which I cat after the script completes; the cat is done in another "Execute shell script on remote host using SSH" build step, and it successfully shows the log file created by the restart script.
I don't know what I'm missing at this point, but as soon as the restart_tomcat.sh step is complete, the new PID/process on the remote/target server dies.
How can I fix this?
I've been through this myself.
On my first iteration, before I knew about Jenkins ProcessTreeKiller, I ended up just daemonizing Tomcat. The Apache Tomcat documentation includes a section on running as a daemon.
You can also try disabling the ProcessTreeKiller for your whole Jenkins instance, if it's relatively small (see the ProcessTreeKiller documentation for details).
The BUILD_ID=dontKillMe should be passed to the shell, and therefore it should be in your command line, not in Jenkins global configuration or job parameters.
BUILD_ID=dontKillMe restart_tomcat.sh "${app}" should have worked without problems.
You can also try nohup restart_tomcat.sh "${app}" & with the & at the end.
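Putting those two suggestions together, the SSH build step command would look something like this (a sketch; the redirects just detach the script fully from the SSH session):
BUILD_ID=dontKillMe nohup restart_tomcat.sh "${app}" > /dev/null 2>&1 &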
My solution (it worked after trying everything else) in Ubuntu 14.04 (Trusty Tahr) (Amazon AWS - Amazon EC2), Jenkins 1.601:
Exec command: (setsid COMMAND < /dev/null > /dev/null 2>&1 &);
Exec in PTY: DISABLED
// Example COMMAND=socat TCP4-LISTEN:1337,fork TCP4:127.0.0.1:1338
I created this Transfer as my last one.
#!/bin/ksh
export BUILD_ID=dontKillMe
I added export BUILD_ID=dontKillMe to the start of my script, as shown above, and the issue was resolved.