Why does Perl 6 try to evaluate an infinite list only in one of two similar situations? - raku

Suppose I define a lazy, infinite array using a triangular reduction at the REPL, with a single element pasted onto the front:
> my @s = 0, |[\+] (1, 2 ... *)
I can print out the first few elements:
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)
I'd like to move the zero element inside the reduction like so:
> my @s = [\+] (0, |(1, 2 ... *))
However, in response to this, the REPL hangs, presumably by trying to evaluate the infinite list.
If I do it in separate steps, it works:
> my @s = 0, |(1, 2 ... *)
> ([\+] @s)[^10]
(0 1 3 6 10 15 21 28 36 45)
Why doesn't the way that doesn't work...work?

Short answer:
It is probably a bug.
Long answer:
(1, 2 ... *) produces a lazy sequence because it is obviously infinite, but somehow that is not causing the resulting sequence to be marked as lazy.
Putting a sequence into an array @s causes it to be eagerly evaluated unless it is marked as being lazy.
Quick fix:
Prepend lazy to the list being reduced.
> my @s = [\+] lazy 0, |(1, 2 ... *)
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)
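For comparison only, here is a minimal Python sketch of the same lazy running-sum idea (this is Python, not Raku, and it says nothing about the suspected bug). It assumes the standard itertools module: the zero is chained in front of an infinite counter and only the requested elements are ever computed.

```python
from itertools import accumulate, chain, count, islice

# Lazy running sums over 0, 1, 2, 3, ...; nothing is computed until requested.
running_sums = accumulate(chain([0], count(1)))

# Take only the first ten elements of the infinite stream.
first_ten = list(islice(running_sums, 10))
print(first_ten)  # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
```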


Sorting/Optimization problem with rows in a pandas dataframe [duplicate]

So if I was given a sorted list/array i.e. [1,6,8,15,40], the size of the array, and the requested number..
How would you find the minimum number of values required from that list to sum to the requested number?
For example, given the array [1,6,8,15,40] and the requested number 23, it would take 2 values from the list (8 and 15) to equal 23. The function would then return 2 (the number of values). Furthermore, there is an unlimited number of 1s in the array (so the function will always return a value).
Any help is appreciated
The NP-complete subset-sum problem trivially reduces to your problem: given a set S of integers and a target value s, we construct set S' having values (n+1) xk for each xk in S and set the target equal to (n+1) s. If there's a subset of the original set S summing to s, then there will be a subset of size at most n in the new set summing to (n+1) s, and such a set cannot involve extra 1s. If there is no such subset, then the subset produced as an answer must contain at least n+1 elements since it needs enough 1s to get to a multiple of n+1.
So, the problem will not admit any polynomial-time solution without a revolution in computing. With that disclaimer out of the way, you can consider some pseudopolynomial-time solutions to the problem, which work well in practice if the values involved are small.
Here's a Python algorithm that will do this:
import functools

S = [1, 6, 8, 15, 40]  # must contain only positive integers

@functools.lru_cache(maxsize=None)  # memoizing decorator
def min_subset(k, s):
    # returns the minimum size of a subset of S[:k] summing to s,
    # including any extra 1s needed to get there
    best = s  # use all ones
    for i, j in enumerate(S[:k]):
        if j <= s:
            sz = min_subset(i, s - j) + 1
            if sz < best:
                best = sz
    return best

print(min_subset(len(S), 23))  # prints 2
This is tractable even for fairly large lists (I tested a random list of n=50 elements), provided their values are bounded. With S = [random.randint(1, 500) for _ in range(50)], min_subset(len(S), 8489) takes less than 10 seconds to run.
There may be a simpler solution, but if your lists are sufficiently short, you can just try every set of values, i.e.:
1 --> Not 23
6 --> Not 23
1 + 6 = 7 --> Not 23
1 + 8 = 9 --> Not 23
1 + 40 = 41 --> Not 23
6 + 8 = 14 --> Not 23
8 + 15 = 23 --> Oh look, it's 23, and we added 2 values
If you know your list is sorted, you can skip some tests, since if 6 + 20 > 23, then there's no need to test 6 + 40.
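The try-everything idea above can be sketched as follows. This is a minimal sketch, not a tuned implementation: the function name min_values_brute_force is made up for illustration, each list value is used at most once (so the unlimited extra 1s from the question are not modelled), and the sorted-list shortcut is left out.

```python
from itertools import combinations

def min_values_brute_force(values, target):
    """Return (count, combo) for the smallest combination summing to target,
    or None if no combination of the listed values does."""
    # Try combinations in order of increasing size, so the first hit is minimal.
    for size in range(1, len(values) + 1):
        for combo in combinations(values, size):
            if sum(combo) == target:
                return size, combo
    return None

print(min_values_brute_force([1, 6, 8, 15, 40], 23))  # (2, (8, 15))
```

The number of combinations grows exponentially with the list length, so this is only reasonable for short lists.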

No of Passes in a Bubble Sort

For the list of items in an array i.e. {23 , 12, 8, 15, 21}; the number of passes I could see is only 3 contrary to (n-1) passes for n elements. I also see that the (n-1) passes for n elements is also the worst case where all the elements are in descending order. So I have got 2 conclusions and I request you to let me know if my understanding is right wrt the conclusions.
Conclusion 1
(n-1) passes for n elements can occur in the following scenarios:
all elements in the array are in descending order which is the worst case
when it is not the best case(no of passes is 1 for a sorted array)
Conclusion 2
(n-1) passes for n elements is more like a theoretical concept and may not hold true in all cases, as in this example {23 , 12, 8, 15, 21}. Here the number of passes is (n-2).
In a classic, one-directional bubble sort, no element is moved more than one position to the “left” (to a smaller index) in a pass. Therefore you must make n-1 passes when (and only when) the smallest element is in the largest position. Example:
12 15 21 23 8 // initial array
12 15 21 8 23 // after pass 1
12 15 8 21 23 // after pass 2
12 8 15 21 23 // after pass 3
8 12 15 21 23 // after pass 4
Let's define L(i) as the number of elements to the left of element i that are larger than element i.
The array is sorted when L(i) = 0 for all i.
A bubble sort pass decreases every non-zero L(i) by one.
Therefore the number of passes required is max(L(0), L(1), ..., L(n-1)). In your example:
23 12 8 15 21 // elements
0 1 2 1 1 // L(i) for each element
The max L(i) is L(2): the element at index 2 is 8 and there are two elements left of 8 that are larger than 8.
The bubble sort process for your example is
23 12 8 15 21 // initial array
12 8 15 21 23 // after pass 1
8 12 15 21 23 // after pass 2
which takes max(L(i)) = L(2) = 2 passes.
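The claim can be checked with a short Python sketch. The function names are made up for illustration, and "passes" here counts only passes that actually swap something (the final pass that merely confirms the array is sorted is not counted), which is the counting used in the examples above.

```python
def inversion_table(a):
    """L(i): number of elements to the left of a[i] that are larger than a[i]."""
    return [sum(1 for x in a[:i] if x > a[i]) for i in range(len(a))]

def swapping_passes(a):
    """Run a one-directional bubble sort and count the passes that swap."""
    a = list(a)  # work on a copy
    passes = 0
    while True:
        swapped = False
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            return passes
        passes += 1

for arr in ([23, 12, 8, 15, 21], [12, 15, 21, 23, 8]):
    print(arr, inversion_table(arr), swapping_passes(arr))
```

For the first array this prints the table [0, 1, 2, 1, 1] and 2 passes; for the second, [0, 0, 0, 0, 4] and 4 passes.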
For a detailed analysis of bubble sort, see The Art of Computer Programming Volume 3: Sorting and Searching section 5.2.2. In the book, what I called the L(i) function is called the “inversion table” of the array.
re: Conclusion 1
all elements in the array are in descending order which is the worst case
Yep, this is when you'll have to do all (n-1) "passes".
when it is not the best case (no of passes is 1 for a sorted array)
No. When you don't have the best case, you'll have more than one pass. Unless the smallest element starts in the last position (as it does in a fully reversed array), you'll need fewer than (n-1) passes. So it's somewhere in between.
re: Conclusion 2
There's nothing theoretical about it at all. You provide an example of a middle-ground case (not fully reversed, but not fully sorted either), and you end up needing a middle-ground number of passes through it. What's theoretical about it?

Finding the contiguous sequences of equal elements in a list Raku

I'd like to find the contiguous sequences of equal elements (e.g. of length 2) in a list
my @s = <1 1 0 2 0 2 1 2 2 2 4 4 3 3>;
say grep {$^a eq $^b}, @s;
# ==> ((1 1) (2 2) (4 4) (3 3))
This code looks OK, but when one more 2 is added after the sequence 2 2 2, or when one 2 is removed from it, it says: Too few positionals passed; expected 2 arguments but got 1. How can I fix it? Please note that I'm trying to find them without using a for loop, i.e. using functional code as much as possible.
Optional: In the bold printed section:
<1 1 0 2 0 2 1 2 2 2 4 4 3 3>
multiple sequences of 2 2 are seen. How to print them the number of times they are seen? Like:
((1 1) (2 2) (2 2) (4 4) (3 3))
There are an even number of elements in your input:
say elems <1 1 0 2 0 2 1 2 2 2 4 4 3 3>; # 14
Your grep block consumes two elements each time:
{$^a eq $^b}
So if you add or remove an element you'll get the error you're getting when the block is run on the single element left over at the end.
There are many ways to solve your problem.
But you also asked about the option of allowing for overlapping so, for example, you get two (2 2) sub-lists when the sequence 2 2 2 is encountered. And, in a similar vein, you presumably want to see two matches, not zero, with input like:
<1 2 2 3 3 4>
So I'll focus on solutions that deal with those issues too.
Despite the narrowing of solution space to deal with the extra issues, there are still many ways to express solutions functionally.
One way that just appends a bit more code to the end of yours:
my @s = <1 1 0 2 0 2 1 2 2 2 4 4 3 3>;
say grep {$^a eq $^b}, @s .rotor( 2 => -1 ) .flat
The .rotor method converts a list into a list of sub-lists, each of the same length. For example, say <1 2 3 4> .rotor: 2 displays ((1 2) (3 4)). If the length argument is a pair, then the key is the length and the value is an offset for starting the next pair. If the offset is negative you get sub-list overlap. Thus say <1 2 3 4> .rotor: 2 => -1 displays ((1 2) (2 3) (3 4)).
The .flat method "flattens" its invocant. For example, say ((1,2),(2,3),(3,4)) .flat displays (1 2 2 3 3 4).
A perhaps more readable way to write the above solution would be to omit the flat and use .[0] and .[1] to index into the sub-lists returned by rotor:
say @s .rotor( 2 => -1 ) .grep: { .[0] eq .[1] }
See also Elizabeth Mattijsen's comment for another variation that generalizes for any sub-list size.
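For readers who think in Python, the overlapping-window idea behind .rotor( 2 => -1 ) can be sketched as below. This is an analogy in a different language, not Raku code: zip(s, s[1:]) plays the role of the overlapping pairs, and the comprehension plays the role of grep.

```python
s = [1, 1, 0, 2, 0, 2, 1, 2, 2, 2, 4, 4, 3, 3]

# Pair every element with its right-hand neighbour (overlapping windows of 2),
# then keep only the pairs whose two elements are equal.
equal_pairs = [(a, b) for a, b in zip(s, s[1:]) if a == b]
print(equal_pairs)  # [(1, 1), (2, 2), (2, 2), (4, 4), (3, 3)]
```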
If you needed a more general coding pattern you might write something like:
say @s .pairs .map: { .value xx 2 if .key < @s - 1 and [eq] @s[.key,.key+1] }
The .pairs method on a list returns a list of pairs, each pair corresponding to each of the elements in its invocant list. The .key of each pair is the index of the element in the invocant list; the .value is the value of the element.
.value xx 2 could have been written .value, .value. (See xx.)
@s - 1 is the number of elements in @s minus 1.
The [eq] in [eq] list is a reduction.
If you need text pattern matching to decide what constitutes contiguous equal elements you might convert the input list into a string, match against that using one of the match adverbs that generate a list of matches, then map from the resulting list of matches to your desired result. To match with overlaps (e.g. 2 2 2 results in ((2 2) (2 2))) use :ov:
say @s .Str .match( / (.) ' ' $0 /, :ov ) .map: { .[0].Str xx 2 }
Here's an iterative approach using gather/take.
say gather for <1 1 0 2 0 2 1 2 2 2 4 4 3 3> {
    state $last = '';
    # eq (string comparison) keeps the initial '' from matching a leading 0
    take ($last, $_) if $last eq $_;
    $last = $_;
}
# ((1 1) (2 2) (2 2) (4 4) (3 3))

creating a variable that change sizes in for loop

I have to create a fits file using the data from two IDL structures. This is not the basic problem.
My problem is that first I have to create a variable that contains the two structures.
To create this I used a for loop that will write at each step a new row of my variable.
The problem is that I cannot add the new row at the next step; it overwrites the previous one, so at the end my fits file has only one row instead of, say, 10000 rows.
This is what I also tried
for jj=0,h[1]-1 do begin
  test[*,jj] = [sme.wave[jj], sme.smod[jj]]
endfor
but the * wildcard is messing up everything because now inside test I have the number corresponding to jj, not the values of sme.wave and sme.smod.
I hope that someone can understand what I asked and that can help me!
thank you in advance!
Assuming your "sme.wave" and "sme.smod" structure fields contain 1-D arrays with the same number of elements as there are rows in "test", then your code should work. For example, I tried this and got the following output:
IDL> test = intarr(2, 10) ; all zeros
IDL> sme = {wave:indgen(10), smod:indgen(10)*2}
IDL> for jj=0, 9 do test[*,jj] = [sme.wave[jj], sme.smod[jj]]
IDL> print, test
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18
However, for better speed optimization, you should instead do the following and take advantage of IDL's multi-threaded array operations. Looping is typically much slower than something like the following:
IDL> test = intarr(2, 10) ; all zeros
IDL> sme = {wave:indgen(10), smod:indgen(10)*2}
IDL> test[0,*] = sme.wave
IDL> test[1,*] = sme.smod
IDL> print, test
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18
Further, if you don't know what the size of "test" is ahead of time, and you want to append to the variable, i.e. add a row, then you can do this:
IDL> test = []
IDL> sme = {wave:Indgen(10), smod:Indgen(10)*2}
IDL> for jj=0, 9 do test = [[test], [sme.wave[jj], sme.smod[jj]]]
IDL> Print, test
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18

Hash function to iterate through a matrix

Given a NxN matrix and a (row,column) position, what is a method to select a different position in a random (or pseudo-random) order, trying to avoid collisions as much as possible?
For example: consider a 5x5 matrix and start from (1,2)
0 0 0 0 0
0 0 X 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
I'm looking for a method like
(x,y) hash (x,y);
to jump to a different position in the matrix, avoiding collisions as much as possible
(do not care how to return two different values, it doesn't matter, just think of an array).
Of course, I can simply use
row = rand()%N;
column = rand()%N;
but it's not that good to avoid collisions.
I thought I could apply twice a simple hash method for both row and column and use the results as new coordinates, but I'm not sure this is a good solution.
Any ideas?
Can you determine the order of the walk before you start iterating? If your matrices are large, this approach isn't space-efficient, but it is straightforward and collision-free. I would do something like:
Generate an array of all of the coordinates. Remove the starting position from the list.
Shuffle the list (there's sample code for a Fisher-Yates shuffle here)
Use the shuffled list for your walk order.
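The three steps above can be sketched in Python as follows. This is a minimal sketch: the function name shuffled_walk and the seed parameter are made up for illustration, and the Fisher-Yates loop is written out by hand (random.shuffle would do the same job).

```python
import random

def shuffled_walk(n, start, seed=None):
    """Return every cell of an n x n matrix except `start`, in random order."""
    rng = random.Random(seed)
    # Step 1: all coordinates, minus the starting position.
    cells = [(r, c) for r in range(n) for c in range(n) if (r, c) != start]
    # Step 2: Fisher-Yates shuffle, swapping each slot with a random earlier one.
    for i in range(len(cells) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        cells[i], cells[j] = cells[j], cells[i]
    # Step 3: the shuffled list is the walk order.
    return cells

order = shuffled_walk(5, (1, 2), seed=1)
print(len(order), len(set(order)))  # 24 24: every other cell exactly once
```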
Edit 2 & 3: A modular approach: Given s array elements, choose a prime p of form 2+3*n, p>s. For i=1 to p, use cells (i*i*i)%p when that value is in range 1...s-1. (For row-length r, cell #c subscripts are c%r, c/r.)
Effectively, this method uses H(i) = (i*i*i) mod p as a hash function. The reference shows that as i ranges from 1 to p, H(i) takes on each of the values from 0 to p-1, exactly one time each.
For example, with s=25 and p=29 or 47, this uses cells in following order:
p=29: 1 8 6 9 13 24 19 4 14 17 22 18 11 7 12 3 15 10 5 16 20 23 2 21 0
p=47: 1 8 17 14 24 13 15 18 7 4 10 2 6 21 3 22 9 12 11 23 5 19 16 20 0
according to bc code like
s=25;p=29;for(i=1;i<=p;++i){t=(i^3)%p; if(t<s){print " ",t}}
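The same computation can be sketched in Python. The function names are made up for illustration, and the sketch assumes, as the answer does, that p is a prime of form 2+3*n with p > s; it follows the bc code in keeping every value below s.

```python
def cubic_walk(s, p):
    """Visit order of cells 0..s-1 using H(i) = i^3 mod p for i = 1..p."""
    order = []
    for i in range(1, p + 1):
        t = pow(i, 3, p)  # modular cube
        if t < s:         # skip values that fall outside the matrix
            order.append(t)
    return order

def to_subscripts(cell, row_length):
    """Convert a cell number to (column, row) as c % r, c // r."""
    return cell % row_length, cell // row_length

order = cubic_walk(25, 29)
print(order)
print([to_subscripts(c, 5) for c in order[:3]])  # [(1, 0), (3, 1), (1, 1)]
```

With s=25 and p=29 the first line of output reproduces the p=29 sequence listed above, and each cell number from 0 to 24 appears exactly once.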
The text above shows the suggestion I made in Edit 2 of my answer. The text below shows my first answer.
Edit 0: (This is the suggestion to which Seamus's comment applied): A simple method to go through a vector in a "random appearing" way is to repeatedly add d (d>1) to an index. This will access all elements if d and s are coprime (where s=vector length). Note, my example below is in terms of a vector; you could do the same thing independently on the other axis of your matrix, with a different delta for it, except a problem mentioned below would occur. Note, "coprime" means that gcd(d,s)=1. If s is variable, you'd need gcd() code.
Example: Say s is 10. gcd(s,x) is 1 for x in {1,3,7,9} and is not 1 for x in {2,4,5,6,8,10}. Suppose we choose d=7, and start with i=0. i will take on values 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, which modulo 10 is 0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0.
Edit 1 & 3: Unfortunately this will have a problem in the two-axis case; for example, if you use d=7 for x axis, and e=3 for y-axis, while the first 21 hits will be distinct, it will then continue repeating the same 21 hits. To address this, treat the whole matrix as a vector, use d with gcd(d,s)=1, and convert cell numbers to subscripts as above.
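A minimal Python sketch of the coprime-stride walk over the whole matrix treated as one vector follows. The names stride_order and to_subscripts are hypothetical, and the start index defaults to 0 as in the example above.

```python
from math import gcd

def stride_order(s, d, start=0):
    """Visit all s cells by repeatedly adding d modulo s; needs gcd(d, s) == 1."""
    if gcd(d, s) != 1:
        raise ValueError("d and s must be coprime to reach every cell")
    return [(start + k * d) % s for k in range(s)]

def to_subscripts(cell, row_length):
    """Convert a cell number to (column, row) as c % r, c // r."""
    return cell % row_length, cell // row_length

print(stride_order(10, 7))  # [0, 7, 4, 1, 8, 5, 2, 9, 6, 3]
# 5x5 matrix as a 25-element vector, with d=7 (gcd(7, 25) == 1)
print([to_subscripts(c, 5) for c in stride_order(25, 7)][:4])
```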
If you just want to iterate through the matrix, what is wrong with row++; if (row == N) {row = 0; column++}?
If you iterate through the row and the column independently, and each cycles back to the beginning after N steps, then the (row, column) pair will iterate through only N of the N^2 cells of the matrix.
If you want to iterate through all of the cells of the matrix in pseudo-random order, you could look at questions here on random permutations.
This is a companion answer to address a question about my previous answer: How to find an appropriate prime p >= s (where s = the number of matrix elements) to use in the hash function H(i) = (i*i*i) mod p.
We need to find a prime of form 3n+2, where n is any odd integer such that 3*n+2 >= s. Note that n odd gives 3n+2 = 3(2k+1)+2 = 6k+5 where k need not be odd. In the example code below, p = 5+6*(s/6); initializes p to be a number of form 6k+5, and p += 6; maintains p in this form.
The code below shows that half-a-dozen lines of code are enough for the calculation. Timings are shown after the code, which is reasonably fast: 12 us at s=half a million, 200 us at s=half a billion, where us denotes microseconds.
// timing how long to find primes of form 2+3*n by division
// jiw 20 Sep 2011
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>

// seconds elapsed since `base` (pass 0 to get the current time)
double ttime(double base) {
  struct timeval tod;
  gettimeofday(&tod, NULL);
  return tod.tv_sec + tod.tv_usec/1e6 - base;
}

int main(int argc, char *argv[]) {
  int d, s, p, par=0;
  double t0=ttime(0);
  ++par; s=5000; if (argc > par) s = atoi(argv[par]);
  p = 5+6*(s/6);              // smallest number of form 6k+5 that is >= s
  while (1) {
    for (d=3; d*d<p; d+=2)    // trial division by odd d
      if (p%d==0) break;
    if (d*d >= p) break;      // no divisor found: p is prime
    p += 6;                   // stay in the form 6k+5
  }
  printf ("p = %d after %.6f seconds\n", p, ttime(t0));
  return 0;
}
Timing results on 2.5GHz Athlon 5200+:
qili ~/px > for i in 0 00 000 0000 00000 000000; do ./divide-timing 500$i; done
p = 5003 after 0.000008 seconds
p = 50021 after 0.000010 seconds
p = 500009 after 0.000012 seconds
p = 5000081 after 0.000031 seconds
p = 50000021 after 0.000072 seconds
p = 500000003 after 0.000200 seconds
qili ~/px > factor 5003 50021 500009 5000081 50000021 500000003
5003: 5003
50021: 50021
500009: 500009
5000081: 5000081
50000021: 50000021
500000003: 500000003
Update 1 Of course, timing is not determinate (ie, can vary substantially depending on the value of s, other processes on machine, etc); for example:
qili ~/px > time for i in 000 004 010 058 070 094 100 118 184; do ./divide-timing 500000$i; done
p = 500000003 after 0.000201 seconds
p = 500000009 after 0.000201 seconds
p = 500000057 after 0.000235 seconds
p = 500000069 after 0.000394 seconds
p = 500000093 after 0.000200 seconds
p = 500000099 after 0.000201 seconds
p = 500000117 after 0.000201 seconds
p = 500000183 after 0.000211 seconds
p = 500000201 after 0.000223 seconds
real 0m0.011s
user 0m0.002s
sys 0m0.004s
Consider using a double hash function to get a better distribution inside the matrix,
but given that you cannot avoid collisions, what I suggest is to use an array of sentinels
and mark the positions you visit; this way you are sure to visit each cell only once.
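The sentinel idea can be sketched in Python as below. This is only an illustration under stated assumptions: the function name is made up, and plain random draws stand in for whatever hash function is actually used; the point is the visited array that filters out collisions.

```python
import random

def walk_with_sentinels(n, seed=None):
    """Visit every cell of an n x n matrix once, skipping already-marked cells."""
    rng = random.Random(seed)
    visited = [[False] * n for _ in range(n)]  # the array of sentinels
    order = []
    while len(order) < n * n:
        r, c = rng.randrange(n), rng.randrange(n)  # stand-in for the hash
        if not visited[r][c]:                      # collision check
            visited[r][c] = True
            order.append((r, c))
    return order

order = walk_with_sentinels(5, seed=42)
print(len(order), len(set(order)))  # 25 25
```

Collisions become more frequent as the matrix fills up, so the last few cells can take many draws to find.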