Reading file line by line in Perl6, how to do idiomatically? - raku

I have a rudimentary script in Perl6 which runs very slowly, about 30x slower than the exact perl5 translation.
CONTROL {
when CX::Warn {
note $_;
exit 1;
}
}
use fatal;
role KeyRequired {
method AT-KEY (\key) {
die "Key {key} not found" unless self.EXISTS-KEY(key);
nextsame;
}
}
for dir(test => /^nucleotide_\d**2_\d**2..3\.tsv$/) -> $tsv {
say $tsv;
my $qqman = $tsv.subst(/\.tsv$/, '.qqman.tsv');
my $out = open $qqman, :w;
put "\t$qqman";
my UInt $line-no = 0;
for $tsv.lines -> $line {
if $line-no == 0 {
$line-no = 1;
$out.put(['SNP', 'CHR', 'BP', 'P', 'zscore'].join("\t"));
next
}
if $line ~~ /.+X/ {
next
}
$line-no++;
my #line = $line.split(/\s+/);
my $chr = #line[0];
my $nuc = #line[1];
my $p = #line[3];
my $zscore = #line[2];
my $snp = "'rs$line-no'";
$out.put([$snp, $chr, $nuc, $p, $zscore].join("\t"));
#$out.put();
}
last
}
this is idiomatic in Perl5's while.
This is a very simple script, which only alters columns of text in a file. This Perl6 script runs in 30 minutes. The Perl5 translation runs in 1 minute.
I've tried reading Using Perl6 to process a large text file, and it's Too Slow.(2014-09) and Perl6 : What is the best way for dealing with very big files? but I'm not seeing anything that could help me here :(
I'm running Rakudo version 2018.03 built on MoarVM version 2018.03
implementing Perl 6.c.
I realize that Rakudo hasn't matured to Perl5's level (yet, I hope), but how can I get this to read the file line by line in a more reasonable time frame?

There is a bunch of things I would change.
/.+X/ can be simplified to just /.X/ or even $line.substr(1).contains('X')
$line.split(/\s+/) can be simplified to $line.words
$tsv.subst(/\.tsv$/, '.qqman.tsv') can be simplified to $tsv.substr(*-4) ~ '.qqman.tsv'
uint instead of UInt
given .head {} instead of for … {last}
given dir(test => /^nucleotide_\d**2_\d**2..3\.tsv$/).head -> $tsv {
say $tsv;
my $qqman = $tsv.substr(*-4) ~ '.qqman.tsv';
my $out = open $qqman, :w;
put "\t$qqman";
my uint $line-no = 0;
for $tsv.lines -> $line {
FIRST {
$line-no = 1;
$out.put(('SNP', 'CHR', 'BP', 'P', 'zscore').join("\t"));
next
}
next if $line.substr(1).contains('X');
++$line-no;
my ($chr,$nuc,$zscore,$p) = $line.words;
my $snp = "'rs$line-no'";
$out.put(($snp, $chr, $nuc, $p, $zscore).join("\t"));
#$out.put();
}
}

Related

how to make a context aware code evaluator

I was looking at REPL-like evaluation of code from here and here, and tried to make a very small version for it, yet it fails:
use nqp;
class E {
has Mu $.compiler;
has $!save_ctx;
method evaluate(#fragments) {
for #fragments -> $code {
my $*MAIN_CTX;
my $*CTXSAVE := self;
$!compiler.eval($code,
outer_ctx => nqp::ctxcaller(nqp::ctx()));
if nqp::defined($*MAIN_CTX) {
$!save_ctx := $*MAIN_CTX;
}
}
}
method ctxsave(--> Nil) {
say "*in ctxsave*";
$*MAIN_CTX := nqp::ctxcaller(nqp::ctx());
$*CTXSAVE := 0;
}
}
my $e := E.new(compiler => nqp::getcomp("Raku"));
nqp::bindattr($e, E, '$!save_ctx', nqp::ctx());
$e.evaluate: ('say my #vals = 12, 3, 4;', 'say #vals.head');
I pieced together this from the above links without very much knowing what I'm doing :) When run, this happens:
*in ctxsave*
[12 3 4]
===SORRY!=== Error while compiling file.raku
Variable '#vals' is not declared. Did you mean '&val'?
file.raku:1
------> say ⏏#vals.head
with Rakudo v2022.04. First fragment was supposed to declare it (and prints it). Is it possible to do something like this, so it recognizes #vals as declared?
You can do it in pure Raku code, although depending on the not-exactly-official context parameter to EVAL.
# Let us use EVAL with user input
use MONKEY;
loop {
# The context starts out with a fresh environment
state $*REPL-CONTEXT = UNIT::;
# Get the next line of code to run.
my $next-code = prompt '> ';
# Evaluate it; note that exceptions with line numbers will be
# off by one, so may need fixups.
EVAL "\q'$*REPL-CONTEXT = ::;'\n$next-code", context => $*REPL-CONTEXT;
}
Trying it out:
$ raku simple-repl.raku
> my $x = 35;
> say $x;
35
> my $y = 7;
> say $x + $y;
42

After first "once {next}" block, other same-scoped "once" blocks fail to execute

My original plan was to use two once {next} blocks to skip the first two lines in a file (here emulating a as a multiline string):
for "A\nB\nC\n".lines() -> $line {
once {next}
once {next}
put $line;
}
But it only skipped one iteration instead of two, outputting the following:
B
C
Instead of what I expected:
C
Apparently a single once {next} somehow cancels all remaining once blocks in the same scope:
my $guard = 3;
loop {
last if $guard-- <= 0;
once { next };
once { put 'A: once ' };
once { put 'A: once again' };
put 'A: many ';
}
$guard = 3;
loop {
last if $guard-- <= 0;
once { put 'B: once ' };
once { next };
once { put 'B: once again' };
put 'B: many ';
}
$guard = 3;
loop {
last if $guard-- <= 0;
once { put 'C: once ' };
once { put 'C: once again' };
once { next };
put 'C: many ';
}
Outputting:
A: many
A: many
B: once
B: many
B: many
C: once
C: once again
C: many
C: many
(Example code here is a modified version of code at https://docs.raku.org/language/control#once).
Is this a bug or am I misunderstanding once {next} ?
The once construct semantics are associated with closure clones; since for is defined in terms of map, we can think of the block of the for loop being like a closure that is cloned once per loop, and that clone used for all iterations of the loop. The running of once blocks is done only on the first invocation of that closure clone. That is to say, it's a property at the level of the closure, not one of the once block itself.
The very same semantics apply to state variable initializers, which are defined in the same way (that is, they have once semantics). Therefore, this this also exhibits the same behavior:
for "A\nB\nC\n".lines() -> $line {
state $throwaway-a = next;
state $throwaway-b = next; # this `next` never runs
put $line;
}
Alternative semantics could have been chosen, however a per-once (and so per-state variable) indicator would imply an extra piece of state is needed for each of them.
So far as the original problem goes, a clearer solution would be:
for "A\nB\nC\n".lines().skip(2) -> $line {
put $line;
}

Can you loop a Perl 6 block that's in a variable?

I keep wanting to do something like this:
my $block := {
state $n = 0;
say $n++;
last if $n > 3;
};
loop $block;
Or even:
$block.loop;
I'm not expecting that this is possible but it would sure be cool if it was.
How would I find out where a particular routine comes from?
$ perl6
To exit type 'exit' or '^D'
> &loop.^name
===SORRY!=== Error while compiling:
Undeclared routine:
loop used at line 1
Making $block.loop work, is rather easy and could live in module land:
use MONKEY;
augment class Block {
method loop($self:) {
Nil while $self()
}
}
my $a = { print "a" };
$a.loop # aaaaaaaaaaaaaaaaaaa (with apologies to Knorkator)
Making loop $block work would be rather more involved, as this would involve changes to the action handling of the Perl 6 grammar.
Using what is already in Perl 6, you can use Seq.from-loop in sink context.
(Note that the REPL doesn't put the last statement on a line into sink context)
my $block := {
state $n = 0;
say $n++;
last if $n > 3;
}
Seq.from-loop: $block;
Seq.from-loop: {say $++}, {$++ <= 3};

How can I use a non-caching infinite lazy list in Perl 6

Infinite lazy lists are awesome!
> my #fibo = 0, 1, *+* ... *;
> say #fibo[1000];
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875
They automatically cache their values, which is handy ... most of the time.
But when working with huge Fibonacci numbers (example), this can cause memory issues.
Unfortunately, I can't figure out how to create a non-caching Fibonacci sequence. Anyone?
One major problem is you are storing it in an array, which of course keeps all of its values.
The next problem is a little more subtle, the dotty sequence generator syntax LIST, CODE ... END doesn't know how many of the previous values the CODE part is going to ask for, so it keeps all of them.
( It could look at the arity/count of the CODE, but it doesn't currently seem to from experiments at the REPL )
Then there is the problem that using &postcircumfix:<[ ]> on a Seq calls .cache on the assumption that you are going to ask for another value at some point.
( From looking at the source for Seq.AT-POS )
It's possible that a future implementation could be better at each of these drawbacks.
You could create the sequence using a different feature to get around the current limitations of the dotty sequence generator syntax.
sub fibonacci-seq (){
gather {
take my $a = 0;
take my $b = 1;
loop {
take my $c = $a + $b;
$a = $b;
$b = $c;
}
}.lazy
}
If you are just iterating through the values you can just use it as is.
my $v;
for fibonacci-seq() {
if $_ > 1000 {
$v = $_;
last;
}
}
say $v;
my $count = 100000;
for fibonacci-seq() {
if $count-- <= 0 {
$v = $_;
last;
}
}
say chars $v; # 20899
You could also use the Iterator directly. Though this isn't necessary in most circumstances.
sub fibonacci ( UInt $n ) {
# have to get a new iterator each time this is called
my \iterator = fibonacci-seq().iterator;
for ^$n {
return Nil if iterator.pull-one =:= IterationEnd;
}
my \result = iterator.pull-one;
result =:= IterationEnd ?? Nil !! result
}
If you have a recent enough version of Rakudo you can use skip-at-least-pull-one.
sub fibonacci ( UInt $n ) {
# have to get a new iterator each time this is called
my \result = fibonacci-seq().iterator.skip-at-least-pull-one($n);
result =:= IterationEnd ?? Nil !! result
}
You can also implement the Iterator class directly, wrapping it in a Seq.
( this is largely how methods that return sequences are written in the Rakudo core )
sub fibonacci-seq2 () {
Seq.new:
class :: does Iterator {
has Int $!a = 0;
has Int $!b = 1;
method pull-one {
my $current = $!a;
my $c = $!a + $!b;
$!a = $!b;
$!b = $c;
$current;
}
# indicate that this should never be eagerly iterated
# which is recommended on infinite generators
method is-lazy ( --> True ) {}
}.new
}
Apparently, a noob cannot comment.
When defining a lazy iterator such as sub fibonacci-seq2, one should mark the iterator as lazy by adding a "is-lazy" method that returns True, e.g.:
method is-lazy(--> True) { }
This will allow the system to detect possible infiniloops better.

Perl6: is there a phaser that runs only when you fall out of a loop?

Take this sample code:
#!/usr/bin/env perl6
use v6.c;
ROLL:
for 1..10 -> $r {
given (1..6).roll {
when 6 {
say "Roll $r: you win!";
last ROLL;
}
default {
say "Roll $r: sorry...";
}
}
LAST {
say "You either won or lost - this runs either way";
}
}
I'd like to be able to distinguish falling out of the loop from explicitly saying last.
Ideally, there'd be a phaser for this, but as far as I can find, there is only LAST which runs in either case.
Is there an elegant way to do this? (Elegant, so without adding a $won variable.)
We're dealing with Perl, so There's More Than One Way To Do It; one of them is using the topic variable $_ to keep the value so we can easily match against it repeatedly:
constant N = 5;
for flat (1..6).roll xx * Z 1..N -> $_, $n {
print "roll $n: $_ ";
when 6 {
put "(won)";
last;
}
default {
put "(lost)";
}
LAST {
print "result: ";
when 6 { put "winner :)" }
default { put "loser :(" }
}
}
Here's another way to do it. Elegant? I think reasonably so. I wish there were a separate phaser for this, though.
#!/usr/bin/env perl6
use v6.c;
constant MAX_ROLLS = 10;
ROLL:
for 1..MAX_ROLLS+1 -> $r {
last ROLL if $r > MAX_ROLLS;
given (1..6).roll {
when 6 {
say "Roll $r: you win!";
last ROLL;
}
default {
say "Roll $r: sorry...";
}
}
LAST {
say "You lost, better luck next time!" if $r > MAX_ROLLS;
}
}