Snakemake: Pull sample-specific information from config sheet - snakemake

In my workflow, I have a sample sheet that contains all the samples that are supposed to be analysed + the path where to find input files + the reference genome that is supposed to be used. All of this is sample-specific.
In my config file, I have a list of reference genomes and for each of them a list of paths of files depending on the tool.
In the rule that performs the alignment of each sample, I need to load some of those files but in a sample-specific way because the reference genome might not be the same for all samples.
Here is how I tried to solve this:
params: reference=lambda wildcards: table_samples['reference'][wildcards.sample],
chrom_sizes=config[reference]['chrom_sizes']
However, when I try to run it like this, I get an error (directly when running Snakemake) saying that reference in the line of chrom_sizes=... is not defined.
Does anybody have an idea of a workaround?
EDIT: Some more information because I guess it's not really clear what I meant. Here is the relevant part of my config file.
hg19:
bwa: 'path/to/hg19/bwa/reference'
samtools: 'path/to/hg19/samtools/reference'
chrom_sizes: '...'
mm9:
bwa: 'path/to/mm9/bwa/reference'
samtools: 'path/to/mm9/samtools/reference'
chrom_sizes: '...'
And here is an example of the sample sheet.
name path reference
sample1 path/to/sample1 mm9
So, in the line reference=lambda wildcards: table_samples['reference'][wildcards.sample] I load the respective reference to be used for the current sample. Then, in chrom_sizes=config[reference]['chrom_sizes'] I need to use reference as a variable to get chrom_sizes for the correct reference genome.
I hope this makes it a bit more clear.

This is probably a ugly solution but should work.
params:
reference = table_samples['reference']['{sample}']
chrom_sizes = config[table_samples['reference']['{sample}']]['chrom_sizes']
You were defining a variable under params and attempted to pass its value within params itself; I'm not sure Snakemake can do that.

You forgot to put quotes around the reference key. Like you write it, Python interprets it as a variable.
chrom_sizes=config['reference']['chrom_sizes']

Alright, taking the information from your comments I was able to make it work. I just had to modify them a little.
As I added to my original post, I actually needed reference to be a variable in order to pull the information for every sample individually.
As #JeeYem suggested, I tried to do the following:
chrom_sizes = config[table_samples['reference']['{sample}']]['chrom_sizes']
However, it seems not to be possible to use {sample} in this context. Instead, I changed it like this:
chrom_sizes = lambda wildcards: config[table_samples['reference'][wildcards.sample]]['chrom_sizes']
For now, it works! Thanks for everyone for the contribution!

Related

Access snakemake expand-ed variables in a script

I often have a snakemake rule like the following:
rule aggregate:
input: expand("samples/{sample}/data.txt", sample=samples)
script:
"scripts/aggregate.py"
This gives aggregate.py the correct list of sample data files in snakemake.input, but it loses the association between samples and their files. I usually need the association sample -> sample file in aggregate.py and to get it in aggregate.py I either (A) recreate the list of files or (B) recreate the list of sample IDs in the same order as the files. Both are unsatisfying due to duplication of data and requiring that two places of code be kept in sync if either changes.
If like this example, there's only one variable being expanded, then adding it to params is OK, i.e. params: samples then zipping that together with inputs. But for more than one expanded variable, there is a big possible error where you give the variables in the different orders in the Snakefile and aggregate.py. That causes a silent error where all the data is mislabeled.
Is there a canonical or recommended way to handle this?
I would better rework the aggregate.py script, and call it from the shell section. This script should not know that it is being called from Snakemake, and get all relevant information from command line. Clean interface between the caller and the script is crucial, and would help you to rethink the task itself.

What's the professional term for this type of API?

A URL in which you add the parameters in the end of it, and it gives to back results for the specific parameters given, for example:
api.website.com/something.json?foo=2&bar=1
The common way looks like the way you mentioned as
domain/cgi_path?param_0=val_0&...&param_n=val_n
but it could be in any other scheme too like
domain/cgi_path/sub_path_as_param <<-- note the subpath as param instead of param
domain/images/2 <<-- instead of domain/images?page=2
or
domain?pg=cgi_path?args=blah_blah <<-- note the path as arg for one global path
About the .json part, it's not very common, but it's a human goody thing informs the dev about the possible response type(if it's a troll)
but the type could be set as a arg too like
domain/cgi_path?arg0=val0&response_type=xml
domain/cgi_path/xml?arg0=val0
Finally go with any way you like and looks easy for you. But remember to make a great documentation for it.

Extracting information from a file variable in d3 pick basic

I have a file variable in d3 pick basic and I am trying to figure out what file it corresponds to.
I tried the obvious thing which was to say:
print f *suppose the file variable's name is f in this case
but that didn't work, because:
SELECTION: 58[B34] in program "FILEPRINTER", Line 7: File variable used
where string expression expected.
I also tried things like:
list f *didn't compile
execute list dict f *same error
execute list f *same error
but those also did not work.
In case any one is wondering, the reason I am trying to do this in the first place is that there is a global variable that is passed up and down in the code base I am working with, but I can't find where the global variable gets its value from.
That file pointer variable is called a "file descriptor". You can't get any information from it.
You can use the file-of-files to log Write events, and after a Write is performed by the code, check to see what file was updated. The details for doing this would be a bit cumbersome. You really should rely on the Value-Add Reseller or contract with competent assistance for this.
If this is not a live end-user system, you can also modify an item getting written with some very unique text like "WHAT!FILE!IS!THIS?". Then you can do a Search-System command to search the entire account (or system) to find that text. See docs for proper use of that command.
This is probably the best option... Inject the following:
IF #USER = "CRISZ" THEN ; * substitute your user ID
READU FOO FROM F,"BLAH" ELSE
DEBUG
RELEASE F,"BLAH"
END
END
That code will stop only for one person - for everyone else it will flow as normal. When it does stop, use the LIST-LOCKS command to see which file has a read lock for item "BLAH". That's your file! Don't forget to remove and recompile the code. Note that recompiling code while users are actively using it results in aborts. It's best to do this kind of thing after hours or on a test system.
If you can't modify the code like that, diagnostics like this can be difficult. If the above suggestions don't help, I think this challenge might be beyond your personal level of experience yet and recommend you get some help.
If suggestion here Does help, please flag this as the answer. :)

Execute coldfusion code stored in a string dynamically?

I have an email body stored as a string in a database, something like this:
This is an email body containing lots of different variables. Dear #name#, <br/> Please contact #representativeName# for further details.
I pull this field from the database using a stored proc, and then I want to evaluate it on the coldfusion side, so that instead of "#name#", it will insert the value of the name variable.
I've tried using evaluate, but that only seems to work if there's just a variable name. It throws an error because of the other text.
(I can't just use placeholders and a find/replace like this - Resolving variables inside a Coldfusion string, because the whole point of storing this in a database is that the variables used to build the string are dynamic. For example, in one case the name field can be called "name" and in another it could be "firstName", etc.)
I would loop over each #variableName# reference and replace it with the evaluated version.
A regex will be able to find them all and then a loop to go over them all and just evaluate them one by one.
You need to write it to a file and CFINCLUDE it. This will incur a compilation overhead, but that's unavoidable.
Can you not save the code to the file system and just store a reference to where it is in the DB? That way it'll only get recompiled when it changes, rather than every time you come to use it?
<!--- pseudo code --->
<cfquery name="q">
SELECT fileContent // [etc]
</cfquery>
<cfset fileWrite(expandPath("/path/to/file/to/write/code.cfm"), q.fileContent)>
<cfinclude template="/path/to/file/to/write/code.cfm">
<cfset fileDelete(expandPath("/path/to/file/to/write/code.cfm"))>
That's the basic idea: get the code, write the code, include the code, delete the code. Although you'll want to make sure the file that gets created doesn't collide with any other file (use a UUID as the file name, or something, as per someone else's suggestion).
You're also gonna want to load test this. I doubt it'll perform very well. Others have suggested using the virtual file system, but I am not so sure there'll be much of a gain there: it's the compilation process that takes the time, not the actual file ops. But it's worth investigating.
Are you using ColdFusion 9 or Railo? If yes, writing to and including in-memory files may be quick and simple solution. Just generate file name with something like CreateUUID() to avoid collisions.
So basically, after researching and reading the answers, it seems that these are my options:
Have separate fields in the table for each variable and evaluate them individually. e.g. nameVariable, reprNameVariable, and then I can build the body with code like this:
This is an email body containing lots of different variables. Dear #evaluate(nameVariable)#, <br/> Please contact #evaluate(reprNameVariable)# for further details.
Have an "emailBody" field with all the text and applicable field names, and write it to a temporary file and cfinclude it. (As suggested by Adam Cameron, and I think that's what Sergii's getting at)
Have an "emailBody" field and write code to loop over it, find all coldfusion variables, and replace them with their "evaluate"d version. (As suggested by Dale Fraser)
Have small template files, one for each report type, with the email body, and have a field "emailBodyTemplate" that indicates which template to include. (As suggested by Adam Cameron)
Now I just have to decide which to use :) When I do, I'll accept the answer of the person who suggested that method (unless it's one that wasn't suggested, in which case I'll probably accept this, or if someone comes up with another method that makes more sense)
It's been a while since you posted this - but it's EXACTLY what I do. I found your question while looking for something else.
I've simply created my own simple syntax for variables when I write my emails into the database:
Hello ~FirstName~ ~LastName~,
Then, in my sending cfm file, I pull the email text from the database, and save it to a variable:
<cfset EmailBody = mydatabasequery.HTMLBody>
Then I quickly strip my own syntax with my variables (from another query called RecipientList):
<cfset EmailBody = ReplaceNoCase(EmailBody, "~FirstName~", "#RecipientList.First#", "ALL")>
<cfset EmailBody = ReplaceNoCase(EmailBody, "~LastName~", "#RecipientList.Last#", "ALL")>
Then I simply send my email:
<cfmail ....>#EmailBody#</cfmail>
I hope you manage to see this. If you control the authoring of the email messages, which I suspect you do, this should work well.
Russell

Is there any way I can define a variable in LaTeX?

In LaTeX, how can I define a string variable whose content is used instead of the variable in the compiled PDF?
Let's say I'm writing a tech doc on a software and I want to define the package name in the preamble or somewhere so that if its name changes, I don't have to replace it in a lot of places but only in one place.
add the following to you preamble:
\newcommand{\newCommandName}{text to insert}
Then you can just use \newCommandName{} in the text
For more info on \newcommand, see e.g. wikibooks
Example:
\documentclass{article}
\newcommand\x{30}
\begin{document}
\x
\end{document}
Output:
30
Use \def command:
\def \variable {Something that's better to use as a variable}
Be aware that \def overrides preexisting macros without any warnings and therefore can cause various subtle errors. To overcome this either use namespaced variables like my_var or fall back to \newcommand, \renewcommand commands instead.
For variables describing distances, you would use \newlength (and manipulate the values with \setlength, \addlength, \settoheight, \settolength and \settodepth).
Similarly you have access to \newcounter for things like section and figure numbers which should increment throughout the document. I've used this one in the past to provide code samples that were numbered separatly of other figures...
Also of note is \makebox which allows you to store a bit of laid-out document for later re-use (and for use with \settolength...).
If you want to use \newcommand, you can also include \usepackage{xspace} and define command by \newcommand{\newCommandName}{text to insert\xspace}.
This can allow you to just use \newCommandName rather than \newCommandName{}.
For more detail, http://www.math.tamu.edu/~harold.boas/courses/math696/why-macros.html
I think you probably want to use a token list for this purpose:
to set up the token list
\newtoks\packagename
to assign the name:
\packagename={New Name for the package}
to put the name into your output:
\the\packagename.