My workflow often includes PBS job submissions to a shared cluster that either wait in the scheduling queue, take over 24 hours to run, or both. I'd like to run Snakemake in the background and get my prompt back while these jobs are running. I know this can be done using tmux, screen, or &, but is there a better way to do this?
I guess submitting a bash wrapper script with the snakemake commands inside is an option but I think I'm lacking some understanding of the workflow.
tmux is the recommended way to execute a Snakemake workflow. It will give you all you need, regardless of whether you are in a cluster or on a compute server.
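As a minimal sketch of the tmux approach described above: the helper below starts a command in a detached tmux session so the prompt comes back right away. The function name `run_detached`, the session name, and the Snakemake options in the comments are illustrative assumptions, not something prescribed by Snakemake.

```shell
#!/usr/bin/env bash
# Start a command in a detached tmux session so the prompt returns immediately.
# Usage: run_detached SESSION COMMAND [ARGS...]
# (run_detached is a hypothetical helper name used only for this sketch.)
run_detached() {
    local session=$1
    shift
    # -d: do not attach to the new session; -s: give the session a name.
    # The remaining arguments are joined into one command string for tmux.
    tmux new-session -d -s "$session" "$*"
}

# Example (assumed session name and Snakemake options; adjust to your setup):
#   run_detached snake snakemake --jobs 20
#   tmux attach -t snake    # reattach later; detach again with Ctrl-b d
```

The session keeps running after you log out, so a long queue wait or a multi-day run does not depend on your SSH connection staying open.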
We are moving from Jenkins to GitLab CI. Every time a stage runs in the pipeline, a new container is created. I would like to know whether it is possible for each stage to reuse the container from the previous one, so that a single container is used throughout. The GitLab executor is docker.
I want to preserve the state of one container.
No, this is not possible in a practical way with the docker executor. Each job is executed in its own container. There is no setting to change this behavior.
Keep in mind that jobs (even across stages) can run concurrently and that jobs can land on runners on completely different underlying machines. Therefore, this is not really practical.
When running Snakemake on a cluster, jobs get scheduled fine via Slurm. Sometimes one job fails, which causes the Snakemake run to stop once the jobs that are still running have completed. To speed this up, I have stopped Snakemake (Ctrl+C) and restarted it. What I did not think of was that some jobs from the previous run might still be running on the cluster. Hence the same job could be started again if no output has been written by then. In this case it could finally lead to the situation where 2 jobs write to the same output file. Or is that prevented by some other log that Snakemake keeps to track successful completion?
I hope you can follow this explanation. Happy for any comments!
In this case it could finally lead to the situation where 2 jobs write to the same output file.
Snakemake should be aware that the previous execution didn't exit cleanly (because of Ctrl+C) and that the outputs of the jobs that were running at that moment are incomplete or absent. However, Snakemake cannot know that those pending jobs are still running as independent processes.
So yes, I think jobs can step on each other's feet with what you are doing.
In my opinion, before re-running snakemake it would be safer to kill the pending jobs and start fresh. (Those that have completed before snakemake was killed are ok of course).
Note that there is an option in snakemake that may help you:
--keep-going, -k    Go on with independent jobs if a job fails. (default: False)
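The suggestion above to kill the pending jobs before re-running could be sketched roughly as follows. This assumes the cluster jobs were submitted with Snakemake's default job name, which starts with `snakejob`; if you set a custom job name, pass its prefix instead. The function name `cancel_leftover_jobs` is made up for this example.

```shell
#!/usr/bin/env bash
# Cancel this user's Slurm jobs whose name starts with a given prefix.
# Usage: cancel_leftover_jobs [PREFIX]   (PREFIX defaults to "snakejob",
# an assumption about how the Snakemake jobs were named at submission)
cancel_leftover_jobs() {
    local prefix=${1:-snakejob}
    local user=${USER:-$(id -un)}
    local id name
    # -h: no header line; -o '%i %j': print job ID and job name only.
    squeue -u "$user" -h -o '%i %j' | while read -r id name; do
        case $name in
            "$prefix"*) scancel "$id" ;;
        esac
    done
}

# Example: clean up, then restart the workflow.
#   cancel_leftover_jobs
#   snakemake --keep-going --rerun-incomplete ...
```

Filtering by name rather than cancelling everything with `scancel -u "$USER"` leaves unrelated jobs of yours untouched.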
I launch a Dataproc cluster and serve Hive on it. Remotely from any machine I use Pyhive or PyODBC to connect to Hive and do things. It's not just one query. It can be a long session with intermittent queries. (The query itself has issues; will ask separately.)
Even during one single, active query, the operation does not show as a "Job" (I guess it's Yarn) on the dashboard. In contrast, when I "submit" tasks via Pyspark, they show up as "Jobs".
Besides the lack of task visibility, I also suspect that, w/o a Job, the cluster may not reliably detect a Python client is "connected" to it, hence the cluster's auto-delete might kick in prematurely.
Is there a way to "register" a Job to accompany my Python session, and cancel/delete the job at times of my choosing? For my case, it is a "dummy", "nominal" job that does nothing.
Or maybe there's a more proper way to let Yarn detect my Python client's connection and create a job for it?
Thanks.
This is not supported right now. You need to submit jobs via the Dataproc Jobs API to make them visible on the jobs UI page and to have them taken into account by the cluster TTL feature.
If you cannot use the Dataproc Jobs API to execute your actual jobs, you can submit a dummy Pig job that sleeps for the desired time (5 hours in the example below) to prevent cluster deletion by the max idle time feature:
gcloud dataproc jobs submit pig --cluster="${CLUSTER_NAME}" \
--execute="sh sleep $((5 * 60 * 60))"
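To cancel the dummy job at a time of your choosing, one option is to give it a known job ID when submitting and kill it later. The sketch below wraps both steps; the function names, the job ID, and the region value are placeholders for illustration, and it assumes your gcloud version accepts `--id`, `--async`, and `--region` on these commands.

```shell
#!/usr/bin/env bash
# Submit a sleeping Pig job under a chosen job ID without waiting for it.
# Usage: start_keepalive CLUSTER REGION JOB_ID [HOURS]
# (start_keepalive / stop_keepalive are hypothetical helper names.)
start_keepalive() {
    local cluster=$1 region=$2 job_id=$3 hours=${4:-5}
    gcloud dataproc jobs submit pig \
        --cluster="$cluster" --region="$region" \
        --id="$job_id" --async \
        --execute="sh sleep $((hours * 60 * 60))"
}

# Kill the keepalive job once the Python session is finished.
# Usage: stop_keepalive REGION JOB_ID
stop_keepalive() {
    local region=$1 job_id=$2
    gcloud dataproc jobs kill "$job_id" --region="$region"
}

# Example (placeholder values):
#   start_keepalive "${CLUSTER_NAME}" us-central1 hive-session-keepalive 5
#   stop_keepalive us-central1 hive-session-keepalive
```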
I know it's possible on a queued job to change directives via scontrol, for example
scontrol update jobid=111111 TimeLimit=08:00:00
This only works in some cases, depending on the administrative configuration of the slurm instance (I'm not an admin). Thus this post does not answer my question.
What I'm looking for is a way to ask SLURM to add more time to a running job, if resources are available, and even if it's already running. Sort of like a nested job request.
Particularly a running job that was initiated with srun on-the-fly.
In https://slurm.schedmd.com/scontrol.html, it is clearly written under TimeLimit:
Only the Slurm administrator or root can increase job's TimeLimit.
So I fear what you want is not possible.
And it makes sense: the scheduler looks at job time limits to decide which jobs to launch, and some short jobs can benefit from backfilling to start before longer jobs. It would be a real mess if users were allowed to change the job length while running. Indeed, how would you define "when resources are available"? A node can be IDLE for some time because Slurm knows that it will soon need it for a large job.
Recently I have found myself several times in situations where I need to let some operation run in a background xterm and be notified when my input is requested.
I know how to make it so I'm notified when the command ends, but that doesn't help in the cases where the command is not 100% batch (it puts up a prompt every now and then; a common example would be apt-get) or where the command hangs (because of some network failure, for example).
So I'd like to be notified when there's been no output in the last N minutes. Is there some way to configure xterm to do that for me, or maybe some other tool (screen maybe) that could do it?
xterm doesn't notice if the application is actually waiting for input, or simply doing nothing. An application (or shell) could be modified to do this, but that seems like a lot more work than you expected (i.e., many programs could be modified).
I also don't know of a way to do it for applications that might be waiting for input, but if you have a batch application that should always output log info within a certain time span, then you could run an extra process that sends the notification unless it gets killed before a timeout. The process gets killed whenever a new line is read. Maybe that will help you or someone else to adapt it to processes that might wait for input:
i=0;{ while true;do echo $i;((i++));sleep $i;done }|while read -r line;do if [ -n "$pid" ];then kill "$pid";fi;bash -c 'sleep 5;notify-send boom'& pid=$!;echo "$line";done
The part before the pipe sign is a process that outputs slower and slower and if it becomes slower than the threshold, notify-send sends notifications. If you wanted output to happen within 3 minutes, use sleep 3m.
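A variation on the same idea, as a rough sketch: bash's `read -t` can serve as the timeout, which avoids starting and killing a helper process for every line. This assumes bash 4 or newer (where a timed-out `read` returns a status above 128 and keeps any partial input); the function name `watch_idle` is made up for this example.

```shell
#!/usr/bin/env bash
# Pass stdin through to stdout and run a command whenever no complete
# line has arrived for TIMEOUT seconds.
# Usage: some_command | watch_idle TIMEOUT NOTIFY_COMMAND [ARGS...]
watch_idle() {
    local timeout=$1
    shift
    local line status
    while true; do
        if IFS= read -r -t "$timeout" line; then
            printf '%s\n' "$line"        # a full line arrived in time
        else
            status=$?
            printf '%s' "$line"          # flush partial text, e.g. a prompt
            if (( status > 128 )); then
                "$@"                     # timed out: fire the notification
            else
                break                    # end of input
            fi
        fi
    done
}

# Example: notify after 3 minutes of silence.
#   long_running_command 2>&1 | watch_idle 180 notify-send "no output for 3 minutes"
```

Because a prompt usually ends without a newline, it shows up as partial input when the timeout fires, so this also covers commands that stop to ask a question, as long as the command still prompts when its output is a pipe.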