I'm using PexSymbolicValue.GetPathConditionString() to get the path condition within PexMethods. I found there is a method PexSymbolicValue.GetRawPathConditionString() that gets the path conditions in S-expression format, but I can't find a reference for the symbols used in its output.
Sample of this output:
"(Ceq (Ceq node null) 0)\r\n(Ceq (Ceq(select next node) null)0)\r\n(Clt (Add (select elem node)(Mul (select elem(select next node)) -1)) 1)\r\n"
A description of all these symbols is available in the ECMA-335 Standard, Common Language Infrastructure (CLI), Partitions I to VI: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf. Most of the operators are simply CIL instruction names (Ceq is compare-equal, Clt is compare-less-than, Add and Mul are the arithmetic instructions), while select denotes reading a field or array element (array-theory notation rather than a CIL opcode).
I am working with a Nextflow workflow that, at a certain stage, groups a series of files by their sample id using groupTuple(), resulting in a channel that looks like this:
[sample_id, [file_A, file_B, ... , file_N]]
[sample_id, [file_A, file_B, ... , file_N]]
...
[sample_id, [file_A, file_B, ... , file_N]]
Note that this is the same channel structure that you get from .fromFilePairs().
I want to use these channel items in a process in such a way that, for each item, the process reads the sample_id from the first field and all the files from the inner tuple at once.
The Nextflow documentation is somewhat cryptic about this, and it is hard to find how to declare this type of input in a process, so I thought I'd create a question on Stack Overflow and then answer it myself for anyone who will ever be looking for this answer.
How does one declare the inner tuple in the input section of a nextflow process?
In the example given above, my inner tuple contains items of only one type (files). I can therefore pass the whole second term of the tuple (i.e. the inner tuple) as a single input item under the file() qualifier. Like this:
input:
tuple \
val(sample_id), \
file(inner_tuple) \
from Input_channel
This will ensure that the tuple content is read as files (one by one), the same way as performing .collect() on a channel of files, in the sense that all the files will then be available in the task work directory where the process is executed.
The question is how you come up with sample_id, but if the files that belong together differ only in their extensions, you might use something like this:
all_files = Channel.fromPath("/path/to/your/files/*")
all_files.map { it -> [it.simpleName, it] }
.groupTuple()
.set { grouped_files }
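To consume that channel, here is a minimal sketch of a matching process (DSL1 syntax to mirror the snippets above; the process name, the cat command and the output filename are only illustrative):
process merge_per_sample {
    input:
    tuple val(sample_id), file(sample_files) from grouped_files

    output:
    file("${sample_id}.merged.txt")

    """
    cat ${sample_files} > ${sample_id}.merged.txt
    """
}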
The path qualifier (previously the file qualifier) can be used to stage a single (file) value or a collection of (file) values into the process execution directory. The note at the bottom of the multiple input files section in the docs also mentions:
The normal file input constructs introduced in the input of files
section are valid for collections of multiple files as well.
This means you can use a script variable, e.g.:
input:
tuple val(sample_id), path(my_files)
In which case, the variable will hold the list of files (preserving the original filenames). You could use it directly to refer to all of the files in the list, or, you could access specific (file) elements (if you need them) using square bracket (slice) notation.
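For instance, a minimal sketch (the echo commands are purely illustrative):
input:
tuple val(sample_id), path(my_files)

script:
"""
echo "${sample_id}: ${my_files}"      # the whole list, space-separated
echo "first file: ${my_files[0]}"     # one element via square-bracket (slice) notation
"""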
This is the syntax you will want most of the time. However, if you need predictable filenames or if you need to deal with files that have identical names, you may need a different approach:
Alternatively, you could specify a target filename, e.g.:
input:
tuple val(sample_id), path('my_file')
In the case where a single file is received by the process, the file would be staged with the target filename. However, when a collection of files is received by the process, the filename will be appended with a numerical suffix representing its ordinal position in the list. For example:
process test {

    tag { sample_id }
    debug true
    stageInMode 'rellink'

    input:
    tuple val(sample_id), path('fastq')

    """
    echo "${sample_id}:"
    ls -g --time-style=+"" fastq*
    """
}

workflow {

    readgroups = Channel.fromFilePairs( '*_{1,2}.fastq' )

    test( readgroups )
}
Results:
$ touch {foo,bar,baz}_{1,2}.fastq
$ nextflow run .
N E X T F L O W ~ version 22.04.4
Launching `./main.nf` [scruffy_caravaggio] DSL2 - revision: 87a80d6d50
executor > local (3)
[65/66f860] process > test (bar) [100%] 3 of 3 ✔
baz:
lrwxrwxrwx 1 users 20 fastq1 -> ../../../baz_1.fastq
lrwxrwxrwx 1 users 20 fastq2 -> ../../../baz_2.fastq
foo:
lrwxrwxrwx 1 users 20 fastq1 -> ../../../foo_1.fastq
lrwxrwxrwx 1 users 20 fastq2 -> ../../../foo_2.fastq
bar:
lrwxrwxrwx 1 users 20 fastq1 -> ../../../bar_1.fastq
lrwxrwxrwx 1 users 20 fastq2 -> ../../../bar_2.fastq
Note that the names of staged files can be controlled using the * and ? wildcards. See the links above for a table that shows how the wildcards are replaced depending on the cardinality of the input collection.
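For example, a minimal sketch using the * wildcard (the staged names shown in the comment are what the docs' table predicts when several files are received per tuple):
input:
tuple val(sample_id), path('read_*.fastq')

"""
ls read_*.fastq    # e.g. staged as read_1.fastq, read_2.fastq, ...
"""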
Suppose I have code in Python that generates a dictionary as its result. I need to write each element of the dictionary into a separate folder, which will later be used by another set of rules in Snakemake.
I have written the code as follows, but it does not work!
simulation_index_dict={1:'test1',2:'test2'}

def indexer(wildcards):
    return(simulation_index_dict[wildcards.simulation_index])

rule SimulateAll:
    input:
        expand("{simulation_index}/ProteinCodingGene/alfsim.drw", simulation_index=simulation_index_dict.keys())

rule simulate_phylogeny:
    output:
        ProteinCodingGeneParams=expand("{{simulation_index}}/ProteinCodingGene/alfsim.drw"),
        IntergenicRegionParams=expand("{{simulation_index}}/IntergenicRegions/dawg_IR.dawg"),
        RNAGeneParams=expand("{{simulation_index}}/IntergenicRegions/dawg_RG.dawg"),
        RepeatRegionParams=expand("{{simulation_index}}/IntergenicRegions/dawg_RR.dawg"),
    params:
        value=indexer,
    shell:
        """
        echo {params.value} > {output.ProteinCodingGeneParams}
        echo {params.value} > {output.IntergenicRegionParams}
        echo {params.value} > {output.RNAGeneParams}
        echo {params.value} > {output.RepeatRegionParams}
        """
The error it returns is:
InputFunctionException in line 14 of /$/test.snake:
KeyError: '1'
Wildcards:
simulation_index=1
It seems that the problem is with the params section of the rule, because deleting it eliminates the error, but I cannot figure out what is wrong with the params!
The solution: using strings as dictionary keys
One can guess from the error message (KeyError: '1') that some query in a dictionary went wrong on a key that is '1', which happens to be a string.
However, the dictionary used in the indexer "params" function has integers as keys.
Apparently, using strings instead of ints as keys to this simulation_index_dict dictionary solves the problem (see comments below the question).
The cause: loss of type information during workflow inference
The cause of the problem is likely that the integer type of the values (inherited from simulation_index_dict.keys()) assigned to the simulation_index parameter of the expand in SimulateAll is "forgotten" in subsequent steps of the workflow inference.
Indeed, the expand results in a list of strings, which are then matched against the output of the other rules (which also consist of strings) to infer the values of the wildcards attributes (which are also strings). Therefore, when the indexer function is executed, wildcards.simulation_index is a string, and this causes a KeyError when looking it up in simulation_index_dict.
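Either fix works; here is a minimal sketch of both options (everything else in the Snakefile stays as posted):
# Option 1: use string keys so they match the (string) wildcard values
simulation_index_dict = {'1': 'test1', '2': 'test2'}

# Option 2: keep integer keys and convert the wildcard back to an int in the input function
def indexer(wildcards):
    return simulation_index_dict[int(wildcards.simulation_index)]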
I need to process MODIS ocean level 2 data, and I obtained an external plugin for ENVI: https://github.com/dawhite/EPOC/releases. Now I want to batch process hundreds of images, for which I modified the code as shown below. The code runs fine, but I have to select the input file every time. Can anyone please help me make the program fully automatic? Thanks a lot for your help!
Pro OCL2convert
dir = 'C:\MODIS\'
CD, dir
; batch processing of level 2 ocean chlorophyll data
files=file_search('*.L2_LAC_OC.x.hdf', count=numfiles)
; this command will search for all files in the directory which end with
; the specified one
counter=0
; this is a counter that tells IDL which file is being read-starts at 0
While (counter LT numfiles) Do begin
; this command tells IDL to start a loop and to only finish when the counter
; is equal to the number of files with the name specified
name=files(counter)
openr, 1, name
proj = envi_proj_create(/utm, zone=40, datum='WGS-84')
ps = [1000.0d,1000.0d]
no_bowtie = 0 ;same as not setting the keyword
no_msg = 1 ;same as setting the keyword
;OUTPUT CHOICES
;0 -> standard product only
;1 -> georeferenced product only
;2 -> standard and georeferenced products
output_choice = 2
;RETURNED VALUES
;r_fid -> ENVI FID for the standard product, if requested
;georef_fid -> ENVI FID for the georeferenced product, if requested
convert_oc_l2_data, fname=fname, output_path=output_path, $
proj=proj, ps=ps, output_choice=output_choice, r_fid=r_fid, $
georef_fid=georef_fid, no_bowtie=no_bowtie, no_msg=no_msg
print,'done!'
close, 1
counter=counter+1
Endwhile
End
Not knowing what convert_oc_l2_data does (there is no public documentation for it), I would say that the problem might be that the output_path variable is never defined anywhere in your program.
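A related detail: fname is never assigned inside the loop either, which is presumably why a file-selection dialog pops up on every iteration. Here is a minimal sketch of the call, assuming the fname and output_path keywords behave as their names in your snippet suggest (input file name and output directory):
; inside the WHILE loop: hand the current file and an output directory to the converter
name = files[counter]
convert_oc_l2_data, fname=name, output_path=dir, $
    proj=proj, ps=ps, output_choice=output_choice, r_fid=r_fid, $
    georef_fid=georef_fid, no_bowtie=no_bowtie, no_msg=no_msg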
Consider this code:
OperatorTable addOperator(":", 2)
: := method(value,
list(self, value)
)
hash := "key": "value"
hash println
The result should be list(key, value), and when using this in the Io REPL that is exactly what is returned. When using the interpreter (as in io somefile.io), the value returned is just value. After some inspection, the difference comes down to this:
# In the REPL
OperatorTable addOperator(":", 2)
message("k" : "v") # => "k" :("v")
# Via the Interpreter
OperatorTable addOperator(":", 2)
message("k" : "v") # => "k" : "v"
Why is this happening?
File execution happens in these stages:
load file
replace operators based on the current operator table
execute contents
So the operator-to-message conversion only happens when the file is initially loaded, in stage 2. By the time the operator registration code is executed in stage 3, that conversion has already taken place, so the operator has no effect.
You can control the order in which files are loaded and put the operator definitions in the first file loaded. For example, have a file called operators.io that contains all operator definitions and is loaded before any file that uses them.
After confirming with ticking I arrived at the following solution:
main.io:
doFile("ops.io")
doFile("script.io")
ops.io:
OperatorTable addOperator(":", 2)
: := method(value,
list(self, value))
script.io:
hash := "key": "value"
hash println
As ticking explains, the whole file is loaded at once, so you have to split it up so that the loading order guarantees the operators are available before they are used.
I would like to do
register s3n://uw-cse344-code/myudfs.jar
-- load the test file into Pig
--raw = LOAD 's3n://uw-cse344-test/cse344-test-file' USING TextLoader as (line:chararray);
-- later you will load to other files, example:
raw = LOAD 's3n://uw-cse344/btc-2010-chunk-000' USING TextLoader as (line:chararray);
-- parse each line into ntriples
ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);
--filter 1
subjects1 = filter ntriples by subject matches '.*rdfabout\\.com.*' PARALLEL 50;
--filter 2
subjects2 = subjects1;
but I get the error:
2012-03-10 01:19:18,039 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input ';' expecting LEFT_PAREN
Details at logfile: /home/hadoop/pig_1331342327467.log
so it seems Pig doesn't like that. How do I accomplish this?
I don't think that kind of 'typical' assignment works in Pig. It's not really a programming language in the strict sense; it's a high-level language on top of Hadoop with some specialized functions.
I think you'll need to simply re-project the data from subjects1 to subjects2, such as:
subjects2 = foreach subjects1 generate $0, $1, $2;
Another approach might be to use the LIMIT operator with some absurdly high parameter:
subjects2 = LIMIT subjects1 100000000;
There could be a lot of reasons why that doesn't make sense, but it's a thought.
I sense you are considering doing things as you would in a programming language. I have found that rarely works out the way you want it to, but you can always get the job done once you start to think like Pig.
As I understand it, your example is from the Data Science Coursera course.
It's strange, but I found the same problem: this code works on one dataset and not on another.
Because we need to change the parameters, I used this code:
filtered2 = foreach filtered generate subject as subject2, predicate as predicate2, object as object2;