The --keep-going flag tells Snakemake to continue with independent jobs if a job fails.
The --stats /path_to_the_runtime_statistics_file option produces the runtime statistics of all the rules at the end of the pipeline.
However, if a job fails, then the pipeline does not produce the runtime statistics file at all.
That is, if you have 100 jobs and only one of them fails, then the runtime statistics for the 99 successful jobs are not produced.
How should one get the runtime statistics of the jobs that succeeded?
Thanks in advance.
If you look at the Snakemake API documentation for how --stats is handled in the execute function implementation, you will see that the snakemake.stats module is only invoked under a condition that says if success:.
So the straight answer to your question is no, you can't do it with the built-in option.
Two ways of moving forward:
Quick & simple solution: reuse their stats implementation and write what you need yourself, calling the particular functionality as per your needs:
from snakemake import stats
and do whatever you want with it. (A benchmark-based workaround is also sketched after this list.)
If you can't, then create an issue on the Snakemake GitHub. Based on its priority, the developers may add the feature to a newer version of Snakemake, but that is a very slow process.
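A hedged alternative that avoids touching Snakemake internals: the benchmark directive writes a per-job TSV (wall-clock time, memory, I/O) as soon as that job finishes, so the statistics of the successful jobs survive even if another job fails later. A minimal Snakefile sketch, where the rule, tool, and path names are made up:

rule process_sample:
    input: "data/{sample}.txt"
    output: "results/{sample}.out"
    benchmark: "benchmarks/{sample}.tsv"    # runtime stats written per job
    shell: "my_tool {input} > {output}"     # my_tool is a placeholder

After the run, the per-job TSVs under benchmarks/ can be concatenated to approximate what --stats would have reported for the successful jobs.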
I was working on the Cucumber report and then found the parallel option. As of now I am running only 1 thread and using parallel=false in the feature file. As per my understanding, we can't use parallelism with karate.robot as it needs one activated window with a title. Please correct me if I am wrong?
I think the main challenge is that most of the UI interactions assume that the "active" window is "on top", visible and has focus. If you can figure out a way to use Element.invoke() for everything, maybe - but you will need to experiment.
Personally I feel that the better strategy is to split your test suite across multiple cloud nodes, and maybe virtual-machines or EC2 instances will work, provided you get the RDP stuff sorted out.
Note that Karate has a way to run distributed tests: https://github.com/intuit/karate/wiki/Distributed-Testing - it may need some research though.
While using async-profiler I run the profiles for cpu and alloc separately, but I was hoping it would be possible to collect both over the same duration. Given the output format types supported, this only seems to make sense if JFR is used.
Yes, this feature is implemented in the v2.0 branch of async-profiler. The branch is currently under development, so use it with care; it is planned for the next major release.
To specify multiple events on the command line, use
profiler.sh -e cpu,alloc -f out.jfr ...
The same as an agent option:
-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,event=alloc,file=out.jfr,...
As you've correctly guessed, this works only with JFR output.
For feedback, comment on the corresponding GitHub issue.
According to the documentation here, this feature is experimental, but I would like to know if anyone is using it successfully. I already have some data, so I am trying use case 4.
I tried to run an update Hive query with the #Incremental annotation, but with it nothing goes into my RDBMS anymore.
If I remove it, everything works, but I want to take advantage of this feature because of the large amount of stored data, which makes query execution very slow.
Any suggestion or help is greatly appreciated.
The incremental analysis feature works fine in the partially distributed setup, but it wasn't thoroughly tested on an external Hadoop cluster, hence it was marked as 'experimental'. Anyhow, if you find any bugs, you can report them in JIRA.
To answer your question, you need to enable incremental processing for your stream first, and then you need to add the incremental annotation. The following are the detailed steps for this.
1) You need to add the property 'streams.definitions.defn1.enableIncrementalIndex=true' in the streams.properties file as explained here, and create a toolbox which consists of only the stream definition artefact as explained here.
2) Install the toolbox - this will register the stream definition you mentioned in the toolbox with incremental analysis. From this point onwards, the incoming data will be incrementally processed.
3) Now add the #Incremental annotation to the query. The first iteration will consider all the available data, since you enabled incremental analysis in the middle of processing, but from the next iteration onwards it will only consider the new batch of data.
This feature is marked as experimental because there may be some critical bugs. We will release a more stable version of BAM with this feature in the next release.
I have a Bamboo build with 2 stages: Build&Test and Publish. The way Bamboo works, if Build&Test fails, Publish is not run. This is usually the way that I want things.
However, sometimes, Build&Test will fail, but I still want Publish to run. Typically, this is a manual process where even though there is a failing test, I want to push a button so that I can just run the Publish stage.
In the past, I had two separate plans, but I want to keep them together as one. Is this possible?
From the Atlassian help forum, here:
https://answers.atlassian.com/questions/52863/how-do-i-run-just-a-single-stage-of-a-build
Short answer: no. If you want to run a stage, all prior stages have to finish successfully, sorry.
What you could do is to use the Quarantine functionality, but that involves re-running the failed job (in the yet-unreleased Bamboo 4.1, you may have to press "Show more" on the build result screen to see the re-run button).
Another thing that could be helpful in such a situation (but not for the OP) is disabling jobs.
Generally speaking, the best solution to most Bamboo problems is to rely on Bamboo as little as possible because you ultimately can't patch it.
In this case, I would just quickly write / re-use an asynchronous dependency resolution mechanism (something like GNU Make and its targets), and run that from a single stage.
Then just run everything on the default all-like target, and let users select the target via a custom run variable.
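A minimal, synchronous sketch of such a target runner (the target names and shell commands are illustrative, not Bamboo APIs); Bamboo would call this from a single stage and pass the desired target in through the custom run variable:

import subprocess
import sys

# target -> (dependencies, shell command); all names/commands are made up
TARGETS = {
    "build":   ([], "make build"),
    "test":    (["build"], "make test"),
    "publish": (["build"], "make publish"),  # publish needs build, not test
    "all":     (["test", "publish"], None),
}

def run(target, done=None):
    # depth-first dependency resolution; each target runs at most once
    done = set() if done is None else done
    if target in done:
        return
    deps, cmd = TARGETS[target]
    for dep in deps:
        run(dep, done)
    if cmd is not None:
        subprocess.run(cmd, shell=True, check=True)
    done.add(target)

if __name__ == "__main__":
    run(sys.argv[1] if len(sys.argv) > 1 else "all")

Running it with the publish target would then publish even when the test target would have failed, which is the manual override the question asks for.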
The idea is that, given a specific input to the program, I somehow want to automatically step through the complete program and dump its control flow along with all the data being used, like classes and their variables. Is there a straightforward way to do this? Or can this be done by some scripting over gdb, or does it require modification of gdb?
OK, the reason for this question is an idea for a debugging tool. What it does is this: given two different inputs to a program, one causing an incorrect output and the other a correct one, it will tell which parts of the control flow differ between them.
So what I think will be needed is a complete dump of these two control flows going into a diff engine. If the two inputs follow similar control flows, then their diff would (in many cases) give a good idea about why the bug exists.
This can be made into a very engaging tool with many features built on top of it.
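A minimal sketch of the diff step I have in mind, using Python's difflib (the trace file names and the one-record-per-line trace format are just placeholders):

import difflib
import sys

def diff_traces(good_path, bad_path):
    # each trace file holds one control-flow record per line,
    # e.g. "file.c:42 function_name"
    with open(good_path) as f:
        good = f.readlines()
    with open(bad_path) as f:
        bad = f.readlines()
    # the unified diff shows where the two control flows diverge
    sys.stdout.writelines(difflib.unified_diff(
        good, bad, fromfile=good_path, tofile=bad_path))

diff_traces("trace_correct.log", "trace_incorrect.log")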
Tell us a little more about the environment. dtrace, for example, will do a marvelous job of this on Solaris or Leopard. gprof is another possibility.
A rough version of this could be done with yes(1) or expect(1).
If you want to get fancy, GDB can be scripted with Python in some versions.
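For instance, a minimal sketch of such a gdb Python script (assumes a gdb built with Python support; the step limit and file names are arbitrary):

import gdb  # only available inside gdb's embedded Python interpreter

def dump_control_flow(outfile="trace.log", max_steps=200000):
    # single-step the program and log "file:line function" at each stop;
    # two such logs (one per input) can then be fed to a diff engine
    gdb.execute("start", to_string=True)  # run to main
    with open(outfile, "w") as log:
        try:
            for _ in range(max_steps):
                frame = gdb.selected_frame()
                sal = frame.find_sal()
                where = sal.symtab.filename if sal.symtab else "??"
                log.write("%s:%d %s\n" % (where, sal.line, frame.name() or "??"))
                gdb.execute("step", to_string=True)
        except gdb.error:
            pass  # inferior exited or stepped outside known code

dump_control_flow()

One would load the program with gdb --args ./prog input1, then source this script at the gdb prompt, and repeat with the other input before diffing the two logs.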
What you are describing sounds a bit like gdb's "tracepoint debugging". See gdb's internal help: "help tracepoint". You can also see a whitepaper here: http://sourceware.org/gdb/talks/esc-west-1999/
Unfortunately, this functionality is not currently implemented for native debugging, but I believe that CodeSourcery is doing some work on it.
Check this out: unlike Coverity, Fenris is free and widely used.