I often have a snakemake rule like the following:
rule aggregate:
    input: expand("samples/{sample}/data.txt", sample=samples)
    script:
        "scripts/aggregate.py"
This gives aggregate.py the correct list of sample data files in snakemake.input, but it loses the association between samples and their files. I usually need the sample -> file association in aggregate.py, and to get it I either (A) recreate the list of files or (B) recreate the list of sample IDs in the same order as the files. Both are unsatisfying because they duplicate data and require two places in the code to be kept in sync if either changes.
If, as in this example, there is only one variable being expanded, then adding it to params is OK, i.e. params: samples and then zipping that together with the inputs. But with more than one expanded variable, it is easy to list the variables in different orders in the Snakefile and in aggregate.py, which causes a silent error where all the data is mislabeled.
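For instance, with one expanded variable the workaround looks roughly like this (a minimal sketch):

rule aggregate:
    input: expand("samples/{sample}/data.txt", sample=samples)
    params:
        samples=samples
    script:
        "scripts/aggregate.py"

and then in aggregate.py:

# Rebuild the sample -> file mapping; this silently relies on
# params.samples and input being in the same order.
sample_to_file = dict(zip(snakemake.params.samples, snakemake.input))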
Is there a canonical or recommended way to handle this?
I would rather rework the aggregate.py script and call it from a shell section. The script should not know that it is being called from Snakemake, and should get all relevant information from the command line. A clean interface between the caller and the script is crucial, and it would help you to rethink the task itself.
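For example, a minimal sketch of what I mean (the flag names, the output file name and the SAMPLE=FILE convention are my own choices here, not anything Snakemake prescribes):

rule aggregate:
    input: expand("samples/{sample}/data.txt", sample=samples)
    output: "aggregated.txt"
    params:
        pairs=lambda wildcards, input: " ".join(
            "{}={}".format(s, f) for s, f in zip(samples, input))
    shell:
        "python scripts/aggregate.py --out {output} {params.pairs}"

and a scripts/aggregate.py that knows nothing about Snakemake:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--out", required=True)
parser.add_argument("pairs", nargs="+", help="SAMPLE=FILE pairs")
args = parser.parse_args()

# Explicit sample -> file association, rebuilt from the command line.
sample_to_file = dict(pair.split("=", 1) for pair in args.pairs)

with open(args.out, "w") as out:
    for sample, path in sample_to_file.items():
        out.write("{}\t{}\n".format(sample, path))  # do the actual aggregation per sample here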
I'm using the TF-Agents library for reinforcement learning, and I would like to take into account that, for a given state, some actions are invalid. How can this be implemented? Should I define an "observation_and_action_constraint_splitter" function when creating the DqnAgent? If yes: do you know of any tutorial on this?
Yes, you need to define the function, pass it to the agent, and also change the environment output appropriately so that the function can work with it. I am not aware of any tutorials on this, but you can look at this repo I have been working on.
Note that it is very messy, a lot of the files in there are not actually used, and the docstrings are terrible and often wrong (I forked this and didn't bother to sort everything out). However, it is definitely working correctly. The parts that are relevant to your question are:
rl_env.py, in HanabiEnv.__init__, where the _observation_spec is defined as a dictionary of ArraySpecs (here). You can ignore game_obs, hand_obs and knowledge_obs, which are used to run the environment verbosely; they are not fed to the agent.
rl_env.py, in HanabiEnv._reset at line 110, gives an idea of how the timestep observations are constructed and returned from the environment. legal_moves are passed through np.logical_not since my specific environment marks legal moves with 0 and illegal ones with -inf, whereas TF-Agents expects 1/True for a legal move. My vector, when cast to bool, would therefore be the exact opposite of what TF-Agents needs.
These observations are then fed to the observation_and_action_constraint_splitter in utility.py (here), which returns a tuple containing the observations and the action constraints. Note that game_obs, hand_obs and knowledge_obs are implicitly thrown away (and not fed to the agent, as previously mentioned).
Finally, this observation_and_action_constraint_splitter is fed to the agent in utility.py, in the create_agent function at line 198, for example.
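Putting it together, the pattern looks something like this (a minimal sketch, not the repo's exact code; the 'observations' and 'legal_moves' keys mirror my environment's observation dict, and the specs, network and optimizer are assumed to be built elsewhere):

from tf_agents.agents.dqn import dqn_agent

def observation_and_action_constraint_splitter(observation):
    # Split the dict observation into the network input and the action mask
    # (1/True = legal action, as TF-Agents expects).
    return observation['observations'], observation['legal_moves']

def create_agent(time_step_spec, action_spec, q_net, optimizer):
    # Passing the splitter makes the agent mask out Q-values of illegal actions.
    return dqn_agent.DqnAgent(
        time_step_spec,
        action_spec,
        q_network=q_net,
        optimizer=optimizer,
        observation_and_action_constraint_splitter=observation_and_action_constraint_splitter)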
I've created a model with Uppaal in which several integer variables change over the course of time. Now I would like to save the values of these variables during the modelling process somewhere (ideally as XML or a text file). In the Uppaal documentation (https://www.it.uu.se/research/group/darts/uppaal/documentation.shtml) I found the method described in point 13 (How do I export and interpret the traces from Uppaal?) and have already tried the Java API way, in the hope that it can output the variables as well as the traces. Unfortunately, this method seems to be limited to traces. Does anyone know a method to save the variable values from Uppaal?
Hopeful greetings,
Josi
Solution from the comments.
To export the variable value trajectory over time, one may use an SMC query in the verifier.
For example:
1. Type the following query: simulate 1 [<=300] { Gate.len }
2. Click Check.
3. Right-click on the query and, from the popup menu, choose Simulations (1).
4. Observe a new window pop up with a plot.
5. Right-click on the plot and choose Export Comma Separated Values.
6. Follow the save-file dialog and observe that the resulting file contains the time and value sequence.
Note that SMC assumes that all channels are broadcast and there are no deadlocks.
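If you then want the values programmatically, the exported file can be read back with a few lines of Python (a sketch that assumes each data line is a plain "time,value" pair and that the file was saved as gate_len.csv; adjust both to your actual export):

import csv

points = []
with open('gate_len.csv') as f:
    for row in csv.reader(f):
        if len(row) != 2:
            continue  # skip headers, comments or blank lines
        try:
            points.append((float(row[0]), float(row[1])))
        except ValueError:
            continue

for t, value in points:
    print(t, value)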
In my workflow, I have a sample sheet that contains all the samples that are supposed to be analysed, the path where their input files can be found, and the reference genome that is supposed to be used. All of this is sample-specific.
In my config file, I have a list of reference genomes and, for each of them, a list of file paths depending on the tool.
In the rule that performs the alignment of each sample, I need to load some of those files, but in a sample-specific way, because the reference genome might not be the same for all samples.
Here is how I tried to solve this:
params:
    reference=lambda wildcards: table_samples['reference'][wildcards.sample],
    chrom_sizes=config[reference]['chrom_sizes']
However, when I try to run it like this, I get an error (immediately when invoking Snakemake) saying that reference in the chrom_sizes=... line is not defined.
Does anybody have an idea of a workaround?
EDIT: Some more information because I guess it's not really clear what I meant. Here is the relevant part of my config file.
hg19:
    bwa: 'path/to/hg19/bwa/reference'
    samtools: 'path/to/hg19/samtools/reference'
    chrom_sizes: '...'

mm9:
    bwa: 'path/to/mm9/bwa/reference'
    samtools: 'path/to/mm9/samtools/reference'
    chrom_sizes: '...'
And here is an example of the sample sheet.
name path reference
sample1 path/to/sample1 mm9
So, in the line reference=lambda wildcards: table_samples['reference'][wildcards.sample] I look up the reference to be used for the current sample. Then, in chrom_sizes=config[reference]['chrom_sizes'], I need to use reference as a variable to get chrom_sizes for the correct reference genome.
I hope this makes it a bit more clear.
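To illustrate the intent with the sample above: in plain Python, the lookup I am after for sample1 would be

reference = table_samples['reference']['sample1']    # gives 'mm9'
chrom_sizes = config[reference]['chrom_sizes']       # gives the mm9 chrom_sizes path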
This is probably an ugly solution, but it should work.
params:
    reference = table_samples['reference']['{sample}'],
    chrom_sizes = config[table_samples['reference']['{sample}']]['chrom_sizes']
You were defining a variable under params and attempted to pass its value within params itself; I'm not sure Snakemake can do that.
You forgot to put quotes around the reference key. As you wrote it, Python interprets it as a variable.
chrom_sizes=config['reference']['chrom_sizes']
Alright, taking the information from your comments I was able to make it work. I just had to modify them a little.
As I added to my original post, I actually needed reference to be a variable in order to pull the information for every sample individually.
As #JeeYem suggested, I tried to do the following:
chrom_sizes = config[table_samples['reference']['{sample}']]['chrom_sizes']
However, it seems it is not possible to use {sample} in this context. Instead, I changed it like this:
chrom_sizes = lambda wildcards: config[table_samples['reference'][wildcards.sample]]['chrom_sizes']
For now, it works! Thanks for everyone for the contribution!
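For completeness, the params section of my alignment rule now looks like this (just the two entries from above combined):

params:
    reference=lambda wildcards: table_samples['reference'][wildcards.sample],
    chrom_sizes=lambda wildcards: config[table_samples['reference'][wildcards.sample]]['chrom_sizes']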
I have a file variable in D3 Pick BASIC and I am trying to figure out which file it corresponds to.
I tried the obvious thing which was to say:
print f *suppose the file variable's name is f in this case
but that didn't work, because:
SELECTION: 58[B34] in program "FILEPRINTER", Line 7: File variable used
where string expression expected.
I also tried things like:
list f *didn't compile
execute list dict f *same error
execute list f *same error
but those also did not work.
In case any one is wondering, the reason I am trying to do this in the first place is that there is a global variable that is passed up and down in the code base I am working with, but I can't find where the global variable gets its value from.
That file pointer variable is called a "file descriptor". You can't get any information from it.
You can use the file-of-files to log Write events and, after a Write is performed by the code, check to see which file was updated. The details of doing this would be a bit cumbersome. You really should rely on your Value-Added Reseller or contract competent assistance for this.
If this is not a live end-user system, you can also modify an item that gets written with some very unique text like "WHAT!FILE!IS!THIS?". Then you can use a Search-System command to search the entire account (or system) for that text. See the docs for proper use of that command.
This is probably the best option... Inject the following:
IF #USER = "CRISZ" THEN ; * substitute your user ID
READU FOO FROM F,"BLAH" ELSE
DEBUG
RELEASE F,"BLAH"
END
END
That code will stop only for one person - for everyone else it will flow as normal. When it does stop, use the LIST-LOCKS command to see which file has a read lock for item "BLAH". That's your file! Don't forget to remove and recompile the code. Note that recompiling code while users are actively using it results in aborts. It's best to do this kind of thing after hours or on a test system.
If you can't modify the code like that, diagnostics like this can be difficult. If the above suggestions don't help, I think this challenge might be beyond your current level of experience, and I recommend you get some help.
If a suggestion here does help, please flag this as the answer. :)
I want to compute a table of SDP solutions. I created a bash file that calls an SDP solver (SDPA or CSDP) for different data sets:
problem1.dat-s
problem2.dat-s
...
Because I want to create a table of numbers, I don't want all the other output, such as the iteration log. Is there a way to suppress these messages? Or, even better, a way to create one solution-set file from all the data sets?
Thanks, dalvo
It's been a while since this question was asked, so maybe you have found an answer yourself by now. If not, try calling
csdp problem1.dat-s problem1.sol > NUL
csdp problem2.dat-s problem2.sol > NUL
...
This way you'll get your solutions written to a solution file. With CSDP you'll have one vector and two matrices. By reading these files, you can easily create any other set of solutions. The information written to stdout is useless if you're just looking for the solution, since it only contains error measures, messages and timings. So redirecting stdout to NUL (or /dev/null on Unix-like systems) avoids those messages.
I don't know how this would actually work with SDPA, but judging from the information on the man pages, it should be the same there.
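As for combining everything into one solution-set file: here is a small Python sketch that pulls the first line of each solution file (which, as far as I remember, holds the solution vector; check the CSDP user guide for the exact .sol layout) into one tab-separated table:

import glob

# Collect the first line of every problem*.sol file into one table.
with open('solutions_table.txt', 'w') as table:
    for sol_file in sorted(glob.glob('problem*.sol')):
        with open(sol_file) as f:
            vector = f.readline().strip()  # assumed: first line = solution vector
        table.write('{}\t{}\n'.format(sol_file, vector))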