snakemake randomly identifying outputs as incomplete

I'm having some trouble with a snakemake workflow I developed. For a specific rule, the output is sometimes identified as incomplete by snakemake:
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with
snakemake --cleanup-metadata <filenames>
To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
This rule runs several times (with different wildcard values) and only some fail with this error. Interestingly, if I rerun the workflow from scratch, the same jobs will complete with no error and other ones might produce it. Also, I manually checked the output and don't see anything wrong with it. I can resume the workflow with no problem.
I am aware of the --ignore-incomplete workaround, but still curious as to why this might happen? How does snakemake decide about an output being incomplete? I should also mention that the jobs run on a PBS HPC system - not sure if it's related.

Incomplete in this context probably means that the job did not finish the way it should have, so Snakemake cannot guarantee that the output is what it should be. If your rule produces output but then fails, Snakemake will still mark that output as incomplete.
I looked in the source code to see where the IncompleteFilesException is raised. Snakemake seems to mark files as complete when persistence.finished() is called, see code here.
finished() is in turn called by postprocess(), which itself gets called from a number of places, so without knowing Snakemake inside out it is hard to say where the problem lies. Somehow, Snakemake must think that the job didn't complete properly.
I would look into the logs of the Snakemake runs. Possibly some of the jobs fail.

Related

Snakemake: What will happen if the output file of a rule is already generated?

I'm very new to snakemake. I downloaded a package from GitHub that utilizes snakemake and managed to run it once, but since my data is so large it took 27 hours to complete the whole thing, and around 99% of that time was spent executing one rule. I would like to skip that particular rule when its output file already exists. Will snakemake skip that rule automatically if the output file of that rule is listed in the rule all section? If not, what should I do to skip it?
From the way you describe it, yes, snakemake will skip that long-running rule if its output is already present AND the output is newer than its input. If this second condition is not met, snakemake will run the rule again. This makes sense, right? If the input has been updated then the output is obsolete and needs to be redone. Note that snakemake checks the timestamps not the content of the files.
In practice, you can execute snakemake with the --dry-run option to confirm it is not going to run that rule again. Look also at the --summary option to see why snakemake wants to execute some rules and skip others.
(In doubt, make a copy of the output from the long-running rule, just in case...)
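For example, in a Snakefile shaped like the sketch below (rule, file, and tool names are made up for illustration), a rerun will skip long_running as long as results/heavy.out already exists and is newer than data/input.txt, and snakemake --dry-run will confirm that before anything is executed:

rule all:
    input:
        "results/heavy.out"

rule long_running:
    input:
        "data/input.txt"
    output:
        "results/heavy.out"
    shell:
        "expensive_tool {input} > {output}"    # the long-running step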

Does the new msbuild 15 /warnaserror switch allow to fail on all the warnings except some?

Specifically, I have a problem with MSB3026 - Could not copy bla-bla-bla to bla-bla-bla. Beginning retry 1 in 1000ms. The process cannot access the file bla-bla-bla because it is being used by another process.
I know why it happens - two different libraries use two different versions of the same dependency, but I cannot fix that right now.
So, I want to tell MSBuild to treat all the warnings, except MSB3026, as errors.
I do not understand if this is possible. Is it?
The answer was provided here - https://github.com/Microsoft/msbuild/issues/3062#issuecomment-439945441 by Rainer Sigwald.
There is the flag /warnasmessage which demotes warnings to simple messages. It has higher priority than /warnaserror, so it is good enough.
To fail on all warnings except MSB3026, one would pass this to msbuild: /err /nowarn:MSB3026 (the short forms of /warnaserror and /warnasmessage).

snakemake use both --keep-going and --stats

The --keep-going flag tells snakemake to go on with independent jobs if a job fails.
The --stats /path_to_the_runtime_statistics_file option produces the runtime statistics of all the rules at the end of the pipeline.
However, if a job fails then the pipeline does not produce the runtime statistics file at all.
I.e. if you have 100 jobs and only one of them fails, then the runtime statistics about the 99 successful jobs are not produced.
How can one get the runtime statistics of the jobs that succeeded?
Thanks in advance.
If you look at the Snakemake API documentation and at the implementation of the execute function that handles --stats, you will see that it only calls the snakemake.stats module
behind a condition that says if success:.
So, the straight answer to your question is no, you can't do it!
Two ways of moving forward:
Quick & simple: reuse their stats implementation yourself, calling whatever functionality you need, e.g. :)
from snakemake import stats
and do whatever you want with it.
If you can't, create an issue on the Snakemake GitHub; based on its priority, the developers may add this feature to a newer version of Snakemake. That is a very slow process, though.

Snakemake: go back and clean up temp() files

I know variants on this have been asked before (e.g. https://groups.google.com/forum/#!topic/snakemake/4kslVBX2kew), but I don't see a definitive solution.
If I run a long-running and complex Snakemake pipeline with '--notemp' (maybe because I'm debugging), it would be really nice to be able to subsequently run a 'cleanup' command to delete anything that would automatically have been deleted on the first run without --notemp. Is there any easy way of doing this?
The way I'm doing this right now is to re-run after using '--forceall --touch', without '--notemp', such that everything just gets touched, and the temp files then get removed at the end. But it's not ideal to change all the timestamps. Is there a better way?
Jon
Since v5.0.0, --delete-temp-output achieves this.
--delete-temp-output
Remove all temporary files generated by the workflow. Use together with --dry-run to list files without actually deleting anything. Note that this will not recurse into subworkflows.
Default: False
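As a small illustration (rule and file names are hypothetical), in a workflow like the sketch below a run with --notemp leaves results/intermediate.tmp behind; snakemake --delete-temp-output --dry-run would then list that file, and running it again without --dry-run deletes it:

rule all:
    input:
        "results/final.txt"

rule intermediate:
    input:
        "data/raw.txt"
    output:
        temp("results/intermediate.tmp")    # normally removed once no rule needs it
    shell:
        "preprocess {input} > {output}"

rule final:
    input:
        "results/intermediate.tmp"
    output:
        "results/final.txt"
    shell:
        "summarise {input} > {output}"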

Steps to error proofing a mission critical process

I'm writing a program that will continuously process files placed into a hot folder.
This program should have 100% uptime with no admin intervention. In other words, it should not fail on "stupid" errors; e.g. if someone deletes the output directory, it should simply recreate it and move on.
What I'm thinking about doing is to code the entire program and then go through and look for "error points" and then add code to handle the error.
What I'm trying to avoid is adding erroneous or unnecessary error handling or even building error handling into the control flow of the program (i.e. the error handling controls the flow of the program). Well perhaps it could control the flow to a certain extent, but that would constitute bad design (subjective).
What are some methodologies for "error proofing" a "critical" process?
If your process must be error-proof and run with no admin intervention, you must handle all possible errors. If you leave any chance of the program stopping, it will happen (Murphy's Law) and you will not know about it.
Even if you handle all possible errors, I think you'll need some logging, and even a monitor with (mail?) alerts, to be sure your process is always running fine.
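As a rough sketch of that combination (directory names, polling interval, and the copy step are all hypothetical), the main loop stays a plain loop while failures are logged and the process keeps going:

import logging
import time
from pathlib import Path

HOT_DIR = Path("incoming")   # hot folder being watched
OUT_DIR = Path("output")     # output directory that might get deleted

def process(path: Path) -> None:
    OUT_DIR.mkdir(parents=True, exist_ok=True)   # recreate the output dir if someone removed it
    (OUT_DIR / path.name).write_bytes(path.read_bytes())
    path.unlink()

def main() -> None:
    logging.basicConfig(filename="processor.log", level=logging.INFO)
    HOT_DIR.mkdir(parents=True, exist_ok=True)
    while True:
        for path in sorted(HOT_DIR.glob("*")):
            try:
                process(path)
                logging.info("processed %s", path)
            except Exception:
                # log (and, in a real system, alert) instead of letting the service die
                logging.exception("failed to process %s", path)
        time.sleep(5)

if __name__ == "__main__":
    main()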
The most important thing to do is to document your assumptions in the form of unit tests. You should write a test that violates each assumption, and then prove that your program successfully recovers or takes action to make this state true again.
To use your example, if someone could delete the critical folder, make a test that simulates this and then show that your program handles this case without crashing.
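A minimal sketch of that test (function and directory names are made up): encode the assumption "the output directory exists" as a recovery step, then write a test that deletes the directory and checks the step brings it back:

import os
import shutil
import tempfile
import unittest

def ensure_output_dir(path):
    # Recreate the output directory if it has been deleted.
    os.makedirs(path, exist_ok=True)
    return path

class OutputDirAssumption(unittest.TestCase):
    def test_recreated_after_deletion(self):
        base = tempfile.mkdtemp()
        out = os.path.join(base, "output")
        ensure_output_dir(out)      # initial creation
        shutil.rmtree(out)          # simulate someone deleting the folder
        ensure_output_dir(out)      # the program's recovery step
        self.assertTrue(os.path.isdir(out))
        shutil.rmtree(base)

if __name__ == "__main__":
    unittest.main()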
Unit testing.
One technique for thorough analysis is a HAZOP study, where for each part of the process you consider keywords for that process. For a chemical in a process plant, these might be 'more', 'less', 'missing', 'hotter', 'colder', 'leak', 'pressure' and so on.
When applying HAZOP to software, you would consider keywords appropriate to the objects in your software.
For example, for reading a file you might consider 'more' to be a buffer overrun, 'less' to be missing data, 'missing' the file not existing, 'leak' a lack of file handles, and so on.
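As a sketch of how those keywords could turn into concrete checks (file name and size threshold are hypothetical; 'more'/buffer overruns matter mostly in languages with fixed-size buffers, so it is omitted here):

def read_records(path, expected_min_bytes=1024):
    try:
        # 'leak': the context manager guarantees the file handle is closed
        with open(path, "rb") as handle:
            data = handle.read()
    except FileNotFoundError:
        # 'missing': the input file does not exist
        raise RuntimeError(f"input file {path} is missing")
    if len(data) < expected_min_bytes:
        # 'less': the file is present but truncated or missing data
        raise RuntimeError(f"input file {path} looks truncated")
    return data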