Why do the Io REPL and the interpreter give me two different values? - operators

Consider this code:
OperatorTable addOperator(":", 2)
: := method(value,
    list(self, value)
)
hash := "key": "value"
hash println
The result should be list(key, value), and in the Io REPL that is exactly what comes back. When run through the interpreter (as in io somefile.io), the value returned is just value. After some inspection, the difference comes down to this:
# In the REPL
OperatorTable addOperator(":", 2)
message("k" : "v") # => "k" :("v")
# Via the Interpreter
OperatorTable addOperator(":", 2)
message("k" : "v") # => "k" : "v"
Why is this happening?

File execution happens in these stages:
1. load the file
2. replace operators based on the current operator table
3. execute the contents
So operator-to-message conversion only happens when the file is initially loaded, in stage 2. By the time the operator registration code runs in stage 3, that conversion has already taken place, so the operator has no effect.
You can control the order in which files get loaded and put the operator definitions in the first file loaded: for example, a file called operators.io containing all operator definitions, loaded before the files that use them.

After confirming with ticking, I arrived at the following solution:
main.io:
doFile("ops.io")
doFile("script.io")
ops.io:
OperatorTable addOperator(":", 2)
: := method(value,
    list(self, value))
script.io:
hash := "key": "value"
hash println
As ticking explains, the whole file is loaded (and its operators resolved) at once, so you have to split the code up so that the loading order guarantees the operators are defined before the files that use them are parsed.


Check if a Nextflow channel is empty

I am trying to figure out how to check if a channel is empty or not.
For instance, I have two processes. The first process runs only if a combination of parameters/flags is set; if so, it also checks that its input file (received from another process via a channel) is not empty, and in that case it creates a new input file for a second process (to eventually replace the default one). As a simplified example:
.....
.....
// create the channel here to force nextflow to wait for the first process
_chNewInputForProcessTwo = Channel.create()
process processOne {
when:
params.conditionOne && params.conditionTwo
input:
file inputFile from _channelUpstreamProcess
output:
file("my.output.file") into _chNewInputForProcessTwo
script:
"""
# check if we need to produce new input for second process (i.e., input file not empty)
if [ -s ${inputFile} ]
then
<super_command_to_generate_new_fancy_input_for_second_process> > "my.output.file"
else
echo "No need to create new input"
fi
"""
}
// and here I would like to check if new input was generated or leave the "default" one
_chInputProcessTwo = Channel.from(_chNewInputForProcessTwo).ifEmpty(Channel.value(params.defaultInputProcessTwo))
process secondProcess {
input:
file inputFile from _chInputProcessTwo
......
......
etc.
When I try running with this approach it fails, because the channel _chNewInputForProcessTwo contains DataflowQueue(queue=[]) and is therefore not actually empty.
I've tried several things based on the documentation and the threads on Google Groups and Gitter: trying to set the channel to empty (but then it complains I am trying to use the channel twice), using create().close(), etc.
Is there a clean/reasonable way to do this? I could do it using a value channel and have the first process write some string to stdout to be picked up and checked by the second process, but that seems pretty dirty to me.
Any suggestions/feedback is appreciated. Thank you in advance!
Marius
Best to avoid trying to check if the channel is empty. If your channel could be empty and you need a default value in your channel, you can use the ifEmpty operator to supply one. Note that a single value is implicitly a value channel. I think all you need is:
myDefaultInputFile = file(params.defaultInputProcessTwo)
chInputProcessTwo = chNewInputForProcessTwo.ifEmpty(myDefaultInputFile)
Also, calling Channel.create() is usually unnecessary.
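For instance, wired into the question's script it might look like this minimal sketch (DSL1 syntax, using the names from the question; the rest of the process body is elided):
myDefaultInputFile = file(params.defaultInputProcessTwo)
// emits the file from processOne, or the default when processOne never ran
chInputProcessTwo = chNewInputForProcessTwo.ifEmpty(myDefaultInputFile)
process secondProcess {
    input:
    file inputFile from chInputProcessTwo
    ...
}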

Can gather be used to unroll Junctions?

In this program:
use v6;
my $j = +any "33", "42", "2.1";
gather for $j -> $e {
    say $e;
} # prints 33␤42␤2.1␤
for $j -> $e {
    say $e; # prints any(33, 42, 2.1)
}
How does gather in front of for change the behavior of the Junction, allowing a loop over its values? The documentation does not seem to reflect that behavior. Is that in the spec?
Fixed by jnthn in code and test commits.
Issue filed.
Golfed:
do put .^name for any 1 ; # Int
put .^name for any 1 ; # Mu
Any of ten of the thirteen statement prefixes listed in the doc can be used instead of do or gather with the same result. (supply unsurprisingly produces no output and hyper and race are red herrings because they try and fail to apply methods to the junction values.)
Any type of junction produces the same results.
Any number of elements in the junction produces the same result for the for loop without a statement prefix, namely a single Mu. With a statement prefix, the for loop repeats the primary statement (the put ...) the appropriate number of times.
I've searched both rt and gh issues and failed to find a related bug report.

InputFunctionException with KeyError

Suppose I have code in Python that generates a dictionary as its result. I need to write each element of the dictionary to a separate folder, which will later be used by another set of rules in Snakemake.
I have written the code as follows, but it does not work:
simulation_index_dict = {1: 'test1', 2: 'test2'}

def indexer(wildcards):
    return simulation_index_dict[wildcards.simulation_index]

rule SimulateAll:
    input:
        expand("{simulation_index}/ProteinCodingGene/alfsim.drw", simulation_index=simulation_index_dict.keys())

rule simulate_phylogeny:
    output:
        ProteinCodingGeneParams=expand("{{simulation_index}}/ProteinCodingGene/alfsim.drw"),
        IntergenicRegionParams=expand("{{simulation_index}}/IntergenicRegions/dawg_IR.dawg"),
        RNAGeneParams=expand("{{simulation_index}}/IntergenicRegions/dawg_RG.dawg"),
        RepeatRegionParams=expand("{{simulation_index}}/IntergenicRegions/dawg_RR.dawg"),
    params:
        value=indexer,
    shell:
        """
        echo {params.value} > {output.ProteinCodingGeneParams}
        echo {params.value} > {output.IntergenicRegionParams}
        echo {params.value} > {output.RNAGeneParams}
        echo {params.value} > {output.RepeatRegionParams}
        """
The error it return is :
InputFunctionException in line 14 of /$/test.snake:
KeyError: '1'
Wildcards:
simulation_index=1
It seems that the problem is with the params section of the rule, because deleting it eliminates the error, but I cannot figure out what is wrong with the params!
The solution: using strings as dictionary keys
One can guess from the error message (KeyError: '1') that some query in a dictionary went wrong on a key that is '1', which happens to be a string.
However, the dictionary used in the indexer "params" function has integers as keys.
Apparently, using strings instead of ints as keys to this simulation_index_dict dictionary solves the problem (see comments below the question).
The cause: loss of type information during workflow inference
The cause of the problem is likely that the integer nature (inherited from simulation_index_dict.keys()) of the value assigned to the simulation_index parameter of the expand in SimulateAll is "forgotten" in subsequent steps of the workflow inference.
Indeed, the expand results in a list of strings, which are then matched against the output of the other rules (which also consist in strings), to infer the values of the wildcards attributes (which are also strings). Therefore, when the indexer function is executed, wildcards.simulation_index is a string, and this causes a KeyError when looking it up in simulation_index_dict.
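A minimal sketch of the fix, following the accepted workaround (string keys; converting the wildcard back to int inside the function would presumably work as well):
# keys are strings, matching the type of wildcards.simulation_index
simulation_index_dict = {'1': 'test1', '2': 'test2'}

def indexer(wildcards):
    # wildcard values are always strings by the time the input function runs
    return simulation_index_dict[wildcards.simulation_index]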

Issue automating CSV import to an RSQLite DB

I'm trying to automate writing CSV files to an RSQLite DB.
I am doing so by indexing csvFiles, a character vector holding the names of data.frame variables stored in the environment.
I can't seem to figure out why my dbWriteTable() code works perfectly fine when I enter it manually but not when I try to index the name and value fields.
### CREATE DB ###
mydb <- dbConnect(RSQLite::SQLite(),"")
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in 1:length(csvFiles)) {
    dbWriteTable(mydb, name = csvFiles[i], value = csvFiles[i], overwrite = T)
    i = i + 1
}
# EXAMPLE CODE THAT SUCCESSFULLY MANUAL IMPORTS INTO mydb
dbWriteTable(mydb,"DEPARTMENT",DEPARTMENT)
When I run the for loop above, I'm given this error:
"Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'DEPARTMENT': No such file or directory
# note that 'DEPARTMENT' is the value of csvFiles[1]
Here's the dput output of csvFiles:
c("DEPARTMENT", "EMPLOYEE_PHONE", "PRODUCT", "EMPLOYEE", "SALES_ORDER_LINE",
"SALES_ORDER", "CUSTOMER", "INVOICES", "STOCK_TOTAL")
I've researched this error and it seems to be related to my working directory; however, I don't really understand what to change, as I'm not even trying to manipulate files on my computer, just data.frames already in my environment.
Please help!
Simply use get() for the value argument, as you are passing a string where a data.frame object is expected. Notice your manual version does not have DEPARTMENT quoted for value.
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in seq_along(csvFiles)) {
    dbWriteTable(mydb, name = csvFiles[i], value = get(csvFiles[i]), overwrite = T)
}
Alternatively, consider building a list of named data.frames with mget and looping element-wise over the list's names and data.frame elements with Map:
dfs <- mget(csvFiles)
output <- Map(function(n, d) dbWriteTable(mydb, name = n, value = d, overwrite = T),
              names(dfs), dfs)

How to modify a line in a file with the Erlang OTP file module

I have a big file and I would like to replace its first line with other content.
When I use {ok, IoDev} = file:open("/root/FileName", [write, raw, binary]), the whole content is removed.
But when I use {ok, IoDev} = file:open("/root/FileName", [append, raw, binary]) and file:pwrite(IoDev, {bof, 0}, <<"new content\n">>), I get the result {error, badarg}.
If I set Location to 0, as in file:pwrite(IoDev, 0, <<"new content\n">>), the string is appended at the tail of the file.
You seem to be confused with the actual file API.
file:open/2 will truncate the file if you pass [write, raw, binary] as you do:
(about write mode): The file is opened for writing. It is created if it does not exist. If the file exists, and if write is not combined with read, the file will be truncated.
So you need to pass either [write, read] or [write, append] as documented.
file:pwrite/3 also works exactly as documented. It allows you to write at a given position in the file. In particular, you cannot pass {bof, 0} as second argument since you opened the file in raw mode:
If IoDevice has been opened in raw mode, some restrictions apply: Location is only allowed to be an integer; and the current position of the file is undefined after the operation.
The following sample code shows how they work:
ok = file:write_file("/tmp/file", "This is line 1.\nThis is line 2.\n"),
{ok, F} = file:open("/tmp/file", [read, write, raw, binary]),
ok = file:pwrite(F, 0, <<"This is line A.\n">>),
ok = file:close(F),
{ok, Content} = file:read_file("/tmp/file"),
io:put_chars(Content),
ok = file:delete("/tmp/file").
It will output:
This is line A.
This is line 2.
This works because text "This is line A.\n" is exactly as long as "This is line 1.\n". It does not really replace the line, but just bytes. If you need to replace the first line with content that has a different length, you need to rewrite the whole content of the file. A common approach is indeed to write a new file and swap them eventually. If the file is small enough, however, you can read it entirely in memory and rewrite it. file:read_file/1 and file:write_file/2 would work:
replace_first_line(Path, NewLine) ->
    {ok, Content} = file:read_file(Path),
    [_FirstLine | Tail] = binary:split(Content, <<"\n">>),
    NewContent = [NewLine, <<"\n">> | Tail],
    ok = file:write_file(Path, NewContent).
The question is not really related to Erlang but rather to general file operations.
Replacing a line in a file requires rewriting the file as a whole. The easiest way to do so is to write all the new content to a new file and then move the new file over the old one, as sketched below.
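A minimal sketch of that write-then-rename approach (the .tmp suffix and the copy_rest helper are illustrative, not from the question):
replace_first_line_via_tmp(Path, NewLine) ->
    TmpPath = Path ++ ".tmp",
    {ok, In} = file:open(Path, [read, raw, binary, read_ahead]),
    {ok, Out} = file:open(TmpPath, [write, raw, binary]),
    %% skip the original first line and write the replacement instead
    {ok, _OldFirst} = file:read_line(In),
    ok = file:write(Out, [NewLine, <<"\n">>]),
    %% copy the remaining lines unchanged
    ok = copy_rest(In, Out),
    ok = file:close(In),
    ok = file:close(Out),
    %% swap the new file in place of the original
    file:rename(TmpPath, Path).

copy_rest(In, Out) ->
    case file:read_line(In) of
        {ok, Line} -> ok = file:write(Out, Line), copy_rest(In, Out);
        eof -> ok
    end.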