Why does comb behave differently inside the loop? - raku

Note: I'm doing this all from the REPL using
This is Rakudo Star version 2019.03.1 built on MoarVM version 2019.03
implementing Perl 6.d.
From euler project #22 there is a names.txt file similar to
"JERE","HAI","ELDEN","DORSEY","DARELL","BRODERICK","ALONSO", ...
When I read that in, split and sort it I get the list of names as I'd expect.
for '../names.txt'.IO.slurp.split(',').sort -> $name {
say $name;
}
That prints out
...
"ZONIA"
"ZORA"
"ZORAIDA"
"ZULA"
"ZULEMA"
"ZULMA"
Now if I add comb()
for '../names.txt'.IO.slurp.split(',').sort -> $name {
say $name.comb;
}
I'm getting
...
(" Z O N I A ")
(" Z O R A ")
(" Z O R A I D A ")
(" Z U L A ")
(" Z U L E M A ")
(" Z U L M A ")
However, if I just run from the repl;
> "ZULMA".comb
I get
(Z U L M A) # Note the lack of quotes
Why is comb behaving differently in these two scenarios?

It's not behaving differently. In one case the quotes are a syntax element - part of the code - and in the other case data read verbatim from the file:
"ZULMA".comb
'"ZULMA"'.comb
The data is different. not the behaviour.

Related

What does \m mean in expression "l = \m -> ..."?

semOp l = \m -> case l of
LD g -> case m of
St xs -> St (g::xs)
_ -> Error
Just want to know what does the \m part is doing here.
\m -> ... is syntax for anonymous function, aka "lambda-expression".
For example, the following two declarations would be equivalent:
f x = x + 5
f = \x -> x + 5
Both define a function with one parameter x that returns a number greater than x by 5.
I wanted to add to Fyodor’s answer by saying that this is a slightly unusual way to write this - it’s making explicit that semOp l returns a lambda. The following are all valid and equivalent ways to write this function’s beginning:
semOp l m = case l of ..
semOp l = \m -> case l of ..
semOp = \l -> \m -> case l of ..
semOp = \l m -> case l of ..
You might be most used to the first version, but in some languages like Haskell these can have slightly difference performance impacts - but this isn’t something for you to worry about, particularly in Elm as far as I understand.

How can I filter out consecutive repeats in a sequence?

I'm trying to generate a Seq of random options where consecutive repeats are not allowed:
> (<F R U>.roll(*).grep(* ne *))[^10]
((R F) (F U) (R F) (F R) (U R) (R F) (R F) (U F) (R U) (U R))
Why does using grep in this way result in nested pairs? And how can I achieve what I'm looking for? Using .flat above will still allow consecutive repeats.
( R U F U F U R F R U ... )
how can I achieve what I'm looking for?
I believe the squish method will do what you want:
say <F R U>.roll(*).squish.head(20)
This produces:
(U R F U F R F R F U F R U R F U R U R F)
Why does using grep in this way result in nested pairs?
Because grep (and map too) on something with an arity greater than 1 will work by breaking the list into chunks of that size. Thus given A B C D, the first call to the grep block gets A B, the second C D, and so on. What's really needed in this case, however, is a lagged value. One could use a state variable in the block passed to grep to achieve this, however for this problem squish is a much simpler solution.
You could employ a simple regex using the <same> zero-width matcher. Below in the Raku REPL:
> my $a = (<F R U>.roll(*))[^20].comb(/\w/).join; say $a;
RRFUURFFFFRRFRFUFUUU
> say $/ if $a ~~ m:g/ \w <!same> /;
(「R」 「F」 「U」 「R」 「F」 「R」 「F」 「R」 「F」 「U」 「F」 「U」)
https://docs.raku.org/language/regexes#Predefined_Regexes

Generate context free grammar for the following language

**{a^i b^j c^k d^m | i+j=k+m | i<m}**
The grammar should allow the language in order abbccd not cbbcda. First should be the a's then b's and so on.
I know that you must "count" the number of a's and b's you are adding to make sure there are an equivalent number of c's and d's. I just can't seem to figure out how to make sure there are more c's than a's in the language. I appreciate any help anyone can give. I've been working on this for many hours now.
Edit:
The grammar should be Context Free
I have only got these two currently because all others turned out to be very wrong:
S -> C A D
| B
B -> C B D
|
C -> a
| b
D -> c
| d
and
S -> a S d
| A
A -> b A c
|
(which is close but doesn't satisfy the i < k part)
EDIT: This is for when i < k, not i < m. OP changed the problem, but I figure this answer may still be useful.
This is not a context free grammar, and this can be proven with the pumping lemma which states that if the grammar is context free, there exists an integer p > 0, such that all strings in the language of length >= p can be split into a substring uvwxy, where len(vx) >= 1, len(vwx) <= p, and uvnwxny is a member of the language for all n >= 0.
Suppose that a value of p exists. We can create a string such that:
k = i + 1
j = m + 1
j > p
k > p
v and x cannot contain more than one type of character or be both on the left side or both on the right side, because then raising them to powers would break the grammar immediately. They cannot be the same character as each other, because then multiplying them would break the rule that i + j = k + m. v cannot be a if x is d, because then w contains the bs and cs, which makes len(vwx) > p. By the same reasoning, v cannot be as if x is cs, and v cannot be bs if x is ds. The only remaining option is bs and cs, but setting n to 0 would make i >= k and j >= m, breaking the grammar.
Therefore, it is not a context free grammar.
There has to be at least one d because i < m, so there has to be a b somewhere to offset it. T and V guarantee this criterion before moving to S, the accepted state.
T ::= bd | bTd
U ::= bc | bUc
V ::= bUd | bVd
S ::= T | V | aSd

Optimizing partial computation in Haskell

I'm curious how to optimize this code :
fun n = (sum l, f $ f0 l, g $ g0 l)
where l = map h [1..n]
Assuming that f, f0, g, g0, and h are all costly, but the creation and storage of l is extremely expensive.
As written, l is stored until the returned tuple is fully evaluated or garbage collected. Instead, length l, f0 l, and g0 l should all be executed whenever any one of them is executed, but f and g should be delayed.
It appears this behavior could be fixed by writing :
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
where
l = map h [1..n]
a = sum l
b = inline f0 $ l
c = inline g0 $ l
Or the very similar :
fun n = (a,b,c) `deepSeq` (a, f b, g c)
where ...
We could perhaps specify a bunch of internal types to achieve the same effects as well, which looks painful. Are there any other options?
Also, I'm obviously hoping with my inlines that the compiler fuses sum, f0, and g0 into a single loop that constructs and consumes l term by term. I could make this explicit through manual inlining, but that'd suck. Are there ways to explicitly prevent the list l from ever being created and/or compel inlining? Pragmas that produce warnings or errors if inlining or fusion fail during compilation perhaps?
As an aside, I'm curious about why seq, inline, lazy, etc. are all defined to by let x = x in x in the Prelude. Is this simply to give them a definition for the compiler to override?
If you want to be sure, the only way is to do it yourself. For any given compiler version, you can try out several source-formulations and check the generated core/assembly/llvm byte-code/whatever whether it does what you want. But that could break with each new compiler version.
If you write
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
where
l = map h [1..n]
a = sum l
b = inline f0 $ l
c = inline g0 $ l
or the deepseq version thereof, the compiler might be able to merge the computations of a, b and c to be performed in parallel (not in the concurrency sense) during a single traversal of l, but for the time being, I'm rather convinced that GHC doesn't, and I'd be surprised if JHC or UHC did. And for that the structure of computing b and c needs to be simple enough.
The only way to obtain the desired result portably across compilers and compiler versions is to do it yourself. For the next few years, at least.
Depending on f0 and g0, it might be as simple as doing a strict left fold with appropriate accumulator type and combining function, like the famous average
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double
average :: [Double] -> Double
average = ratio . foldl' count (P 0 0)
where
ratio (P n s) = s / fromIntegral n
count (P n s) x = P (n+1) (s+x)
but if the structure of f0 and/or g0 doesn't fit, say one's a left fold and the other a right fold, it may be impossible to do the computation in one traversal. In such cases, the choice is between recreating l and storing l. Storing l is easy to achieve with explicit sharing (where l = map h [1..n]), but recreating it may be difficult to achieve if the compiler does some common subexpression elimination (unfortunately, GHC does have a tendency to share lists of that form, even though it does little CSE). For GHC, the flags fno-cse and -fno-full-laziness can help avoiding unwanted sharing.

Haskell Heap Issues with Parameter Passing Style

Here's a simple program that blows my heap to Kingdom Come:
intersect n k z s rs c
| c == 23 = rs
| x == y = intersect (n+1) (k+1) (z+1) (z+s) (f : rs) (c+1)
| x < y = intersect (n+1) k (z+1) s rs c
| otherwise = intersect n (k+1) z s rs c
where x = (2*n*n) + 4 * n
y = (k * k + k )
f = (z, (x `div` 2), (z+s))
p = intersect 1 1 1 0 [] 0
main = do
putStr (show p)
What the program does is calculate the intersection of two infinite series, stopping when it reaches 23 elements. But that's not important to me.
What's interesting is that as far as I can tell, there shouldn't be much here that is sitting on the heap. The function intersect is recursives with all recursions written as tail calls. State is accumulated in the arguments, and there is not much of it. 5 integers and a small list of tuples.
If I were a betting person, I would bet that somehow thunks are being built up in the arguments as I do the recursion, particularly on arguments that aren't evaluated on a given recursion. But that's just a wild hunch.
What's the true problem here? And how does one fix it?
If you have a problem with the heap, run the heap profiler, like so:
$ ghc -O2 --make A.hs -prof -auto-all -rtsopts -fforce-recomp
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A.exe ...
Which when run:
$ ./A.exe +RTS -M1G -hy
Produces an A.hp output file:
$ hp2ps -c A.hp
Like so:
So your heap is full of Integer, which indicates some problem in the accumulating parameters of your functions -- where all the Integers are.
Modifying the function so that it is strict in the lazy Integer arguments (based on the fact you never inspect their value), like so:
{-# LANGUAGE BangPatterns #-}
intersect n k !z !s rs c
| c == 23 = rs
| x == y = intersect (n+1) (k+1) (z+1) (z+s) (f : rs) (c+1)
| x < y = intersect (n+1) k (z+1) s rs c
| otherwise = intersect n (k+1) z s rs c
where x = (2*n*n) + 4 * n
y = (k * k + k )
f = (z, (x `div` 2), (z+s))
p = intersect 1 1 1 0 [] 0
main = do
putStr (show p)
And your program now runs in constant space with the list of arguments you're producing (though doesn't terminate for c == 23 in any reasonable time).
If it is OK to get the resulting list reversed, you can take advantage of Haskell's laziness and return the list as it is computed, instead of passing it recursively as an accumulating argument. Not only does this let you consume and print the list as it is being computed (thereby eliminating one space leak right there), you can also factor out the decision about how many elements you want from intersect:
{-# LANGUAGE BangPatterns #-}
intersect n k !z s
| x == y = f : intersect (n+1) (k+1) (z+1) (z+s)
| x < y = intersect (n+1) k (z+1) s
| otherwise = intersect n (k+1) z s
where x = (2*n*n) + 4 * n
y = (k * k + k )
f = (z, (x `div` 2), (z+s))
p = intersect 1 1 1 0
main = do
putStrLn (unlines (map show (take 23 p)))
As Don noted, we need to be careful so that accumulating arguments evaluate timely instead of building up big thunks. By making the argument z strict we ensure that all arguments will be demanded.
By outputting one element per line, we can watch the result being produced:
$ ghc -O2 intersect.hs && ./intersect
[1 of 1] Compiling Main ( intersect.hs, intersect.o )
Linking intersect ...
(1,3,1)
(3,15,4)
(10,120,14)
(22,528,36)
(63,4095,99)
(133,17955,232)
(372,139128,604)
(780,609960,1384)
...