Snakemake: how to maintain a Snakefile variable's value across multiple instances of the same invocation

I want to save some information within the Python code that is part of my Snakefile, and have this information available to the Python code in every instance that Snakemake creates when it is running the workflow. But a separate run of the workflow should have its own separate instance of that information.
For example, say I create a UUID in my Python code and later use it again in the Python code. I want the UUID to be the same one in all running instances of the workflow; instead, a new UUID gets created each time an instance is started.
If I start Snakemake twice at the same time, I would want each of the two runs to create its own UUID, but within each run, all instances created by the run would use the same UUID. How can I do this? Is there an identifier somewhere in the Snakemake object that remains the same within one run across all instances, but changes from run to run?
Here's an example that fails with a 'No rule to produce' error:
import uuid
ID = str(uuid.uuid4())
print("ID:", ID)
rule all:
    output: ID
    run:
        print("Hello world")
If the rule uses 'shell' instead of 'run', it works fine, so I assume that Snakemake is rerunning the Snakefile code when it executes the 'run' portion of the rule. How could this be modified to retain the first UUID value instead of generating a second one? Also, why isn't the ID specified for output in the rule captured when the rule is first processed, without requiring a second invocation of the Python code? Since it works with 'shell', the second invocation is not needed specifically for processing the 'output' statement.

Indeed, when you use a run block, Snakemake will invoke itself to execute that job, meaning that it also reparses the Snakefile, generating a new UUID. The same will happen on the cluster. There are good technical reasons for doing it like this (performance, the Python GIL, restrictions with pickling, simplicity and robustness of the implementation).
I am not sure what exactly you want to achieve, but it might help to look at this: http://snakemake.readthedocs.io/en/stable/project_info/faq.html#i-want-to-pass-variables-between-rules-is-that-possible
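One workaround in that spirit (a minimal sketch, not from the original answer: it assumes the re-invocation inherits this process's environment, which should hold for local runs but, like the process-group approach below, probably not on a cluster; the variable name MY_RUN_ID is arbitrary):

import os
import uuid

# Stash the ID in an environment variable on first parse; any
# re-parse in a child process inherits the environment and reuses
# the same value instead of generating a fresh UUID.
ID = os.environ.setdefault("MY_RUN_ID", str(uuid.uuid4()))
print("ID:", ID)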

I've found a method that seems to work: use the process group ID:
import os
ID = str(os.getpgrp())
Multiple instances of the same pipeline share the same group ID. However, I'm not sure whether this remains true on a cluster; probably not. In my case, that didn't matter.

Related

How can I make the origin of aliases in Cypress more apparent from the it/spec files?

My team is using aliases to set some important variables which are used within the it('Test', ...) blocks.
For example, we may be running the following command in a before step:
cy.setupSomeDynamicData()
The setupSomeDynamicData() method then exists in another file (e.g. commands.js) and may set up a couple of aliases:
cypress/support/commands.js
Cypress.Commands.add('setupSomeDynamicData', () => {
  cy.createDynamicString(first).as('String1')
  cy.createDynamicString(second).as('String2')
  cy.createDynamicString(third).as('String3')
})
Now we go back to our spec/test file, and start using these aliases:
cypress/e2e/smallTest.cy.js
it('A small example test', function () {
  cy.visit(this.String1)
  // do some stuff...
  cy.get(this.String2)
  // do some stuff...
  cy.visit(this.String3)
  // do some stuff...
})
The problem is that unless you're the person who wrote the code, it's not obvious where this.String1, this.String2, or this.String3 come from, nor when they were initialized (from the perspective of smallTest.cy.js), since the code that initializes the aliases lives in another file.
In this example it's quite easy to Ctrl+F the codebase and search for these aliases, but you have to start doing some real reverse engineering once you have more complex use cases.
This feels like a readability/maintainability problem: once you set up enough of these, and the example I provided grows more complex, finding where these aliases are created becomes inconvenient. The this.* syntax makes it feel like you'd find these alias variables somewhere within the same file in which they're being used, but when you don't see any sign of them it becomes evident that they've been magically initialized (somewhere/somehow), and then the hunt 🕵🏼‍♂️ begins.
Some solutions that come to mind (which may be bad ideas) are:
Create JS objects with getters/setters, so it's a bit easier to trace where the variable you're using was "set".
Not use aliases, and instead use global variables that can be imported into the spec/test files, so it's clear where they come from, with a before/after hook to clear them so the reset-per-test behaviour remains.
Name the variables in a way that makes it obvious they are aliased, then spread the word/document this convention within my team, so that anyone who sees this.aliasedString2 knows it comes from some method that performs these alias assignments (see the sketch below).
I'm sure there may be a better way to handle this so just thought I'd post this question.
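For what it's worth, here's a minimal sketch of naming-convention idea #3, reusing the asker's hypothetical createDynamicString command; the "aliased" prefix is an arbitrary convention chosen for illustration:

// cypress/support/commands.js
Cypress.Commands.add('setupSomeDynamicData', () => {
  // the prefix marks values that are aliased outside the spec files
  cy.createDynamicString('first').as('aliasedString1')
})

// cypress/e2e/smallTest.cy.js
it('A small example test', function () {
  cy.visit(this.aliasedString1) // prefix signals: set up in commands.js
})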

How can I use a bamboo plan variable in a script task?

I have defined in my Bamboo plan a variable (BAMBOO_TEST_VAR) that I'd like to reuse in a particular script, but I can't seem to figure out how to make it visible to that script.
If I just reference that variable from the script, it merely prints as empty.
27-Oct-2020 23:34:00 TEST JOB
27-Oct-2020 23:34:00 bamboo.shortJobName =
27-Oct-2020 23:34:00 BAMBOO_TEST_VAR=
And if I provide it as input to the Environment variables field it just renders with the value I give in that field taken as a literal, not to the plan variable I was hoping it would evaluate to.
27-Oct-2020 23:36:57 TEST JOB
27-Oct-2020 23:36:57 bamboo.shortJobName =
27-Oct-2020 23:36:57 BAMBOO_TEST_VAR=$BAMBOO_TEST_VAR
How can I reference the plan's variable directly from a script task, without passing it down through arguments or something of the sort? What aspect or Bamboo detail am I ignorant of that would have informed me that what I'm attempting is not possible or not supported because of reason XYZ?
So the trouble was that I hadn't scoped the variable appropriately. What did it in the end was:
${bamboo.BAMBOO_TEST_VAR}
Turns out that if I had slowed down and looked at the Bamboo page more carefully, I would have noticed the help breadcrumbs they left around. Copying that help text here, emphasis mine:
Variables substitute values in your task configuration and inline scripts. If a variable name contains any reference to a password, like "password", "sshKey", "secret", or "passphrase", its value will be masked with "********".
For tasks configuration fields, use the syntax ${bamboo.myvariablename}.
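As an illustration (a sketch assuming a Unix shell script task; the echo lines are just for demonstration): in an inline script, Bamboo substitutes the ${bamboo.*} form before the script runs, and plan variables should also be exported into the script's environment with a bamboo_ prefix.

#!/bin/sh
# Bamboo substitutes this form in inline scripts before execution:
echo "substituted: ${bamboo.BAMBOO_TEST_VAR}"
# Plan variables are also expected as environment variables with a
# bamboo_ prefix:
echo "environment: $bamboo_BAMBOO_TEST_VAR"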

How to use user input variables in a build file?

I'm new to Ant and I was wondering if it would be possible to create a global variable in the build file so that I can use it repeatedly throughout the file itself.
For example, if the command were 'ant a', I would be able to use the value 'a' throughout the build file (for example in a file path, i.e. C:/test/a).
The reason I want to know how to do this is that there are multiple values like 'a' (let's say all the letters of the alphabet), and instead of copying and pasting the same code 26 times, I would have one piece of code that takes different values depending on what the user enters. In Java you are able to store user input in a variable and use that variable throughout the code (same idea here).
I tried searching for this but wasn't sure how to word it.
UPDATE
With the help of some people I managed to solve what I needed.
So I managed to use the Input Task to more or less fix my problem. I prompted the user for an entry by using the following command:
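Presumably that command was Ant's input task, along these lines (a reconstruction, not the original snippet; the addproperty name matches the ${hold.it} used below):

<!-- prompt the user and store the entry in the hold.it property -->
<input message="Please enter a value:" addproperty="hold.it"/>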
Then I can just use the value entered by the user anywhere I want by simply writing ${hold.it}, for example in a file path: "C:/go/to/${hold.it}".
Have a look at Ant properties and the property task used to set them. For example, you can define a property named prop1 and pass its value using ant -Dprop1=some_value.
A property is "global" since after defining it, any part of the buildfile can use it.
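A minimal sketch of how that looks in a buildfile (names are illustrative; a value passed with -D on the command line takes precedence over one set by the property task):

<project name="demo" default="show">
  <!-- used only if no -Dprop1=... was given on the command line -->
  <property name="prop1" value="default"/>
  <target name="show">
    <echo message="C:/test/${prop1}"/>
  </target>
</project>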

Execute command block in primitive in NetLogo extension

I'm writing a primitive that takes in two agentsets and a command block. It needs to call a few functions, execute the command block in the current context, and then call another function. Here's what I have so far:
class WithContext(pushGraphContext: GraphContext => Unit, popGraphContext: api.World => GraphContext)
    extends api.DefaultCommand {
  override def getSyntax = commandSyntax(
    Array(AgentsetType, AgentsetType, CommandBlockType))
  def perform(args: Array[Argument], context: Context) {
    val turtleSet = args(0).getAgentSet.requireTurtleSet
    val linkSet = args(1).getAgentSet.requireLinkSet
    val world = linkSet.world
    val gc = new GraphContext(world, turtleSet, linkSet)
    val extContext = context.asInstanceOf[ExtensionContext]
    val nvmContext = extContext.nvmContext
    pushGraphContext(gc)
    // execute command block here
    popGraphContext(world)
  }
}
I looked at some examples that used nvmContext.runExclusively, but that looked like it's specifically for having a given agentset run the command block. I want the current agent (possibly the observer) to run it. Should I wrap nvm.agent in an agentset and pass that to nvmContext.runExclusively? If so, what's the easiest way to wrap an agent in an agentset? If not, what should I do?
Method #1
The quicker-but-arguably-dirtier method is to use runExclusiveJob, as demonstrated in (e.g.) the create-red-turtles command in https://github.com/NetLogo/Sample-Scala-Extension/blob/master/src/SampleScalaExtension.scala .
To wrap the current agent in an agentset, you can use agent.AgentSetBuilder. (You could also pass an Array[Agent] of length 1 to one of the ArrayAgentSet constructors, but I'd recommend AgentSetBuilder since it's less reliant on internal implementation details which are likely to change.)
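Roughly like this (a sketch from memory of the 5.x internals; treat the exact names and signatures of AgentSetBuilder and runExclusiveJob as assumptions to verify against the linked sample):

// inside perform(), after pushGraphContext(gc):
// the kind should match the current agent; Turtle is illustrative here
val builder = new agent.AgentSetBuilder(api.AgentKind.Turtle, 1)
builder.add(nvmContext.agent.asInstanceOf[agent.Agent])
// run the command block as that one agent, resuming at the
// instruction after this primitive
nvmContext.runExclusiveJob(builder.build(), nvmContext.ip + 1)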
Method #2
The disadvantage of method #1 is the slight constant overhead associated with creating and setting up the extra AgentSet, Job, and Context objects and directing execution through them.
Creating and running a separate job isn't actually how built-in commands like if and while work. Instead of making a new job, they remain in the current job and cause commands in a command block to run (or not run) by manipulating the instruction pointer (nvm.Context.ip) to jump into them or skip over them.
I believe an extension command could do the same. I'm not certain if it has been tried before, but I can't see any reason it wouldn't work.
Doing it this way would involve understanding more about NetLogo engine internals, as documented at https://github.com/NetLogo/NetLogo/wiki/Engine-architecture . You'd model your primitive after e.g. https://github.com/NetLogo/NetLogo/blob/5.0.x/src/main/org/nlogo/prim/etc/_if.java , including altering your implementation of nvm.CustomAssembled. (Note that prim._extern, which runs extension commands, delegates its assemble method to the wrapped command's own assemble method, so this should work.) In your assemble method, instead of calling done() at the end to terminate the job, you'd just allow execution to fall through.
I could try to construct an example that works this way, but it'd take me a couple hours; it's probably not worth me doing unless there's a real need.

Variable scope and source command in Tcl

I have the following two files:
a.tcl:
set condition true
source b.tcl
b.tcl:
if {$condition} {
    puts "hello"
}
When I run a.tcl, it prints "hello". Is this correct practice for accessing a variable defined in a.tcl? What is the scope of $condition in b.tcl? Thank you.
The scope of condition is global. The source command evaluates the script read from the specified file in the context where it's run; in your case this context is also global, hence your puts works.
The question about practice is more complicated, as it highly depends on what you actually do.
The way the source command works is pretty much exactly as if it were reading the file into a string and then passing that to eval (the sole subtlety is to do with info script). That means that the scope the source was done in will be the one that the outermost level of the script is evaluated in, so you could have condition be a local variable there:
proc funkystuff {condition} {
    source b.tcl
}
funkystuff true
That will work (and is in fact vital for how Tcl's package definition scripts work; they're evaluated in a context where there is a local variable $dir that describes where the package definition is located) but it can most certainly lead to code that is confusing!
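For instance, this is the idiom in a pkgIndex.tcl file, which Tcl evaluates with $dir set as a local variable (the package name and file here are illustrative):

# pkgIndex.tcl
package ifneeded foobar 1.0 [list source [file join $dir foobar.tcl]]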
Because of this, it's good practice to write your scripts so that the code inside them makes no assumptions about what context it is evaluated in. The easiest way to do that is often to put the code in the script inside a namespace, where the name of the namespace is fully qualified.
namespace eval ::foobar {
    # Do stuff here...
}
It's also a good thing to try to write code that isn't excessively parameterized on sourcing, instead saving that for either which version of the code you load (e.g., one file for Linux, another for Windows) or what parameters you pass to the commands. Of course you don't have to work that way, but it does help make your code robust and easy to understand.
Finally, the main script passed to a Tcl interpreter is always evaluated at the global level (i.e., in the :: namespace with no parent scope).