Can Perl6 cmp two strings case insensitively? - raku

I am doing a sort and would like to control the cmp of alpha values to be case insensitive (viz. https://perl6.org/archive/rfc/143.html).
Is there some :i adverb for this, perhaps?

Perl 6 doesn't currently have that as an option, but it is a very mutable language so let's add it.
Since the existing proto doesn't allow named values we have to add a new one, or write an only sub.
(That is, you can use the multi below as written, except declared with the optional only keyword instead of multi.)
This only applies lexically, so if you write this you may want to mark the proto/only sub as being exportable depending on what you are doing.
proto sub infix:<leg> ( \a, \b, *% ){*}
multi sub infix:<leg> ( \a, \b, :ignore-case(:$i) ){
$i
?? &CORE::infix:<leg>( fc(a) , fc(b) )
!! &CORE::infix:<leg>( a , b )
}
say 'a' leg 'A'; # More
say 'a' leg 'A' :i; # Same
say 'a' leg 'A' :!i; # More
say 'a' leg 'A' :ignore-case; # Same
Note that :$i is short for :i( $i ) so the two named parameters could have been written as:
:ignore-case( :i( $i ) )
Also I used the sub form of fc() rather than the method form .fc because it allows the native form of strings to be used without causing autoboxing.

If you want a "dictionary" sort order, @timotimo is on the right track when they suggest collate and coll for sorting.
Use collate() on anything listy to sort it. Use coll as an infix operator in case you need a custom sort.
$ perl6
To exit type 'exit' or '^D'
> <a Zp zo zz ab 9 09 91 90>.collate();
(09 9 90 91 a ab zo Zp zz)
> <a Zp zo zz ab 9 09 91 90>.sort: * coll *;
(09 9 90 91 a ab zo Zp zz)

You can pass a code block to sort. If the block has an arity of one, it is applied to both elements before they are compared. Here's an example using the fc from the previous answer.
> my @a = <alpha BETA gamma DELTA>;
[alpha BETA gamma DELTA]
> @a.sort
(BETA DELTA alpha gamma)
> @a.sort(*.fc)
(alpha BETA DELTA gamma)
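For readers coming from other languages, here is a rough analogue of the one-argument form. It is Python, not Raku, and only mirrors the list used above: the key function is applied to each element before the comparison, much as *.fc is.

```python
words = ["alpha", "BETA", "gamma", "DELTA"]

# Default sort compares code points, so uppercase letters come first.
plain = sorted(words)

# A one-argument key is applied to each element before comparing,
# which is the role *.fc plays in the Raku example.
folded = sorted(words, key=str.casefold)

print(plain)   # ['BETA', 'DELTA', 'alpha', 'gamma']
print(folded)  # ['alpha', 'BETA', 'DELTA', 'gamma']
```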

From the documentation
In order to do case-insensitive comparison, you can use .fc
(fold-case). The problem is that people tend to use .lc or .uc, and it
does seem to work within the ASCII range, but fails on other
characters. This is not just a Perl 6 trap, the same applies to other
languages.
For example:
say ‘groß’.fc eq ‘GROSS’.fc; # ← RIGHT; True
If you are working with regexes, then there is no need to use .fc and you can use :i (:ignorecase) adverb instead.
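As the quote says, the trap is not specific to Perl 6. A small Python illustration of the same point (casefold() is Python's counterpart to fold-case; the variable names are just for this sketch):

```python
a, b = "groß", "GROSS"

# lower() leaves ß untouched, so the comparison fails.
lower_equal = a.lower() == b.lower()

# casefold() maps ß to "ss", so the comparison succeeds.
fold_equal = a.casefold() == b.casefold()

print(lower_equal)  # False
print(fold_equal)   # True
```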

Related

Why do these 2 for looping over sequences differ?

First:
$ raku -e "for 1...6, 7...15 { .say }"
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Now:
$ raku -e "for 1...3, 7...15 { .say }"
1
2
3
7
11
15
I would expect this case to print 1,2,3,7,8,... 15.
What's happening here?
I think you might want the raku Range operator .. (two dots) and not the raku Sequence operator ... (three dots).
Here's how your examples look with the Range operator instead:
> raku -e 'for 1..6, 7..15 { .say }'
1..6
7..15
Oh, that's not good ... looks like for is just iterating over the two things 1..6 and 7..15 and stringifying them.
We can use a Slip | to fix that:
> raku -e 'for |(1..6), |(7..15) { .say }'
1
2
... (all the numbers)
14
15
And then:
raku -e 'for |(1..3), |(7..15) { .say }'
1
2
3
7
8
9
10
11
12
13
14
15
With the Sequence operator, you have made something like:
>raku -e 'for 3,7...15 { .say }'
3
7
11
15
That is raku for "make a sequence that starts with 3, then 7, then all the values until you get to the last at 15" ... and since the gap from 3 to 7 is 4, raku will count up in steps of 4. Then you began it with 1..3. ;-)
~p6steve
It's because it is two deductive sequences.
1...3
Is obviously a sequence where you add 1 to each successive value.
1, 2, 3
And since 7 is 4 more than 3, this is a sequence where you add 4 to each successive value.
3, 7 ... 15
3, 7, 11, 15
To get what you want, you could use a flattened Range.
1...3, |(7..15)
Or even a flattened Sequence.
1...3, |(7...15)
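To make the "two deductive sequences" reading concrete, here is a simplified model in Python. It is not Rakudo's actual algorithm: arith_seq is a hypothetical helper that assumes purely arithmetic sequences, deduces the step from the last two seeds, and stops at or before the endpoint.

```python
def arith_seq(seeds, end):
    """Simplified model: extend seeds arithmetically, stopping at or before end."""
    # Step is deduced from the last two seeds; a lone seed counts by 1 toward end.
    if len(seeds) >= 2:
        step = seeds[-1] - seeds[-2]
    else:
        step = 1 if end >= seeds[0] else -1
    values = list(seeds)
    while (step > 0 and values[-1] + step <= end) or (step < 0 and values[-1] + step >= end):
        values.append(values[-1] + step)
    return values

first = arith_seq([1], 3)       # models 1 ... 3
second = arith_seq([3, 7], 15)  # models 3, 7 ... 15
# The shared 3 appears only once in the combined result.
combined = first + second[1:]

print(first)     # [1, 2, 3]
print(second)    # [3, 7, 11, 15]
print(combined)  # [1, 2, 3, 7, 11, 15]
```

The combined list matches the output in the question, which is why the step of 4 shows up after the 3.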
TL;DR This answer focuses on addressing what you originally asked (which was about "sequences") and precisely what the code you wrote is doing, rather than providing a solution (using ranges instead).
This is a work in progress dealing with something that seems both poorly documented and hard to fathom (which may explain part though not all of the doc situation). Please bear with me! (And I may just end up deleting this answer.)
1 ... 3, 7 ... 15 ≡ 1 ... (3, 7) ... 15
In the absence of parentheses, operators within an expression are applied according to rules of "precedence" and "associativity".
Infix , has a higher precedence than infix ....¹ The above two lines of code thus produce the same result (1␤2␤3␤7␤11␤15␤):
for 1 ... 3, 7 ... 15 { .say } # Operator evaluation by precedence
for 1 ... (3, 7) ... 15 { .say } # Operator evaluation by parentheses
That said, while the result is what I would expect at a glance, based on my own "magical" DWIM ("Do What I Mean") thinking, I must say I don't yet know what Raku(do)'s precise rule(s) are that lead to it DWIMing.
The doc for infix ... says:
If the endpoint is not *, it's smartmatched against each generated element and the sequence is terminated when the smartmatch succeeded.
But that seems overly simple. What if the endpoint of one sequence is another sequence? (As, at least taking a naive view, appears to be the case in your code.)
Also, as @MustafaAydin has noted:
how does your post explain the irregular last step size (of 2) instead of 3? I mean 4, 7 ... 15 alone produces (4, 7, 10, 13). But 1... 4, 7...15 now produces 7, 10, 13, 15 in the tail. Why is 15 included? Maybe i'm missing something idk
I'm at least as confused as Mustafa.
Indeed, I'm confused about several things. How come Raku(do) flattens the two sequences? [D'oh. Because the infix comma is higher precedence than the infix ....] Why doesn't it repeat the 3 in the final combined list? [Perhaps because multiple infix ...s are smart about what to do when there's an expression that's the endpoint of one sequence and the start of another?]
I'm going to go read the old design docs and/or spelunk roast and/or the Rakudo compiler code to see if I can see what's supposedly/actually going on. But not tonight.
Footnotes
¹ There's a table of operators in the current official operator doc. Supposedly this table:
summarizes the precedence levels offered by Raku, listing them in order from high to low precedence.
Unfortunately, at the time of writing this, the central operator table in the Operators page is profoundly wrong #4071.
Until that's fixed, here are "official" and "unofficial" options for determining the precedence of operators:
"official" Use in-page search to search the official doc operator page for the operator of interest. Skip to the match in the entries on the left hand side of that same page. As you'll see, infix , is one level higher precedence than infix ...:
Comma operator precedence
infix ,
infix :
List infix precedence
infix Z
infix X
infix ...
"unofficial" Look at the corresponding page of a staging site for an improved doc site. (I don't know how up to date it is, but the central table appears to list operators by precedence order as it claims.)

Testing with a metaoperator doesn't print the test description

I was writing tests on Complex arrays and I was using the Z≅ operator to check whether the arrays were approximately equal, when I noticed a missing test description.
I tried to golf the piece of code to find out the simplest case that shows the result I was seeing. The description is missing in the second test even when I use Num or Int variables and the Z== operator.
use Test;
my @a = 1e0, 3e0;
my @b = 1e0, 3e0;
ok @a[0] == @b[0], 'description1'; # prints: ok 1 - description1
ok @a[^2] Z== @b[^2], 'description2'; # prints: ok 2 -
done-testing;
Is there a simple explanation or is this a bug?
It's just precedence -- you need parens.
== is a binary op that takes a single operand on either side.
The Z metaop distributes its operator to a list on either side.
use Test;
my @a = 1e0, 3e0;
my @b = 1e0, 3e0;
ok @a[0] == @b[0], 'description1'; # prints: ok 1 - description1
ok (@a[^2] Z== @b[^2]), 'description2'; # prints: ok 2 - description2
done-testing;
Without parens, 'description2' becomes an additional element of the list on the right. And per the doc for Z:
If one of the operands runs out of elements prematurely, the zip operator will stop.

Syntax error when attempting to amend a string with indexing

I'm studying APL from here.
Why am I getting this syntax error?
'computer' [ 1 2 3 ] ← 'COM'
SYNTAX ERROR
'computer'[1 2 3]←'COM'
^
But if I save 'computer' in a variable I don't get the error:
T ← 'computer'
T
computer
T[1 2 3] ← 'COM'
T
COMputer
What am I doing wrong?
'computer' is a constant, and you can't change the value of a constant itself, only the current value of a variable.
Think about it: If you could assign to 'computer', then next time you wrote 'computer', would you expect the result to be COMputer? How about 2←3? Clearly, this doesn't make any sense.
However, you can amend a value without assigning it to a name, using the relatively new @ "at" operator (it isn't included in Mastering Dyalog APL, but the documentation is available online).
'COM'@1 2 3⊢'computer'
COMputer
You can read this as put the letters 'COM' at indices 1 2 3 of the word 'computer'. The ⊢ here only serves to separate 1 2 3 from 'computer' so it is clear to @ what constitutes the indices and what is the array to be amended.
Run it on TryAPL!
That bracket notation is made specifically for modifying variables. The return value of T[1 2 3] ← 'COM' is 'COM', so if the expression didn't modify a variable, it would be pointless (or, almost identical to ⊢).
To get a modified array, not modify a variable, use the operator @:
('COM'@1 2 3) 'computer'
Try it online!
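If it helps to see the distinction between amending a value and modifying a variable outside APL, here is a loose Python sketch. amend is a hypothetical helper, not an APL primitive, and it assumes 1-based indices as in the examples above.

```python
def amend(replacement, indices, target):
    """Hypothetical helper: return a copy of target with items replaced at 1-based indices."""
    items = list(target)
    for value, index in zip(replacement, indices):
        items[index - 1] = value  # indices start at 1, as in the APL examples
    return "".join(items)

original = "computer"
result = amend("COM", [1, 2, 3], original)

print(result)    # COMputer
print(original)  # computer (the original value is left untouched)
```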

SAS Index function?

Can anyone explain why the below piece of code gives two different values?
data _null_;
length a b $14;
a = 'ABC.DEF (X=Y)';
b = 'X=Y';
x = index(a,b);
y = index('ABC.DEF (X=Y)','X=Y');
put x y;
run;
0 10
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Thanks.
It seems this is an exact copy of the example on the SAS website, so it would have been helpful if you had looked for an answer there first.
This is their explanation:
Example 2:
Removing Trailing Spaces When You Use the INDEX Function with the TRIM Function
The following example shows the results when you use the INDEX function with and without the TRIM function. If you use INDEX without the TRIM function, leading and trailing spaces are considered part of the excerpt argument. If you use INDEX with the TRIM function, TRIM removes trailing spaces from the excerpt argument as you can see in this example. Note that the TRIM function is used inside the INDEX function.
options nodate nostimer ls=78 ps=60;
data _null_;
length a b $14;
a='ABC.DEF (X=Y)';
b='X=Y';
q=index(a,b);
w=index(a,trim(b));
put q= w=;
run;
SAS writes the following output to the log:
q=0 w=10
Added based on mjsqu's comment:
data _null_;
length a b $14 c $3;
a='ABC.DEF (X=Y)';
b='X=Y';
c='X=Y';
x=index(a,b);
y=index(a,c);
z=index(a,trim(b));
d = "|" || a ||"|";
e = "|" || b ||"|";
f = "|" || c ||"|";
put d=;
put e=;
put f=;
put x= y= z=;
run;
d=|ABC.DEF (X=Y) |
e=|X=Y           |
f=|X=Y|
x=0 y=10 z=10
You can see that b has trailing spaces (it is padded to its declared length of 14), and these are part of the string that the INDEX function looks for. Since X=Y in string a is followed by ) and not a space, the padded value is not found, so x = 0 (q = 0 in the SAS example). You can also see that if you set the length of the search variable to the actual length of the string you want to look for (3 in this case, as with c), you get the same result as with TRIM.
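The padding effect can be reproduced outside SAS. The following Python sketch emulates it under stated assumptions: sas_index is a hypothetical stand-in for INDEX (1-based position, 0 when not found), and ljust(14) stands in for the `length a b $14;` declaration.

```python
def sas_index(source: str, excerpt: str) -> int:
    """Hypothetical stand-in for INDEX: 1-based position, 0 when not found."""
    return source.find(excerpt) + 1

# Emulate `length a b $14;` by padding both values to 14 characters.
a = "ABC.DEF (X=Y)".ljust(14)
b = "X=Y".ljust(14)

x = sas_index(a, b)           # excerpt includes 11 trailing blanks
z = sas_index(a, b.rstrip())  # like trim(b)

print(x, z)  # 0 10
```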

What's the R equivalent of SQL's LIKE 'description%' statement?

Not sure how else to ask this but, I want to search for a term within several string elements. Here's what my code looks like (but wrong):
inplay = vector(length=nrow(des))
for (ii in 1:nrow(des)) {
if (des[ii] = 'In play%')
inplay[ii] = 1
else inplay[ii] = 0
}
des is a vector that stores strings such as "Swinging Strike", "In play (run(s))", "In play (out(s) recorded)" and etc. What I want inplay to store is a 1s and 0s vector corresponding with the des vector, with the 1s in inplay indicating that the des value had "In play%" in it and 0s otherwise.
I believe the 3rd line is incorrect, because all this does is return a vector of 0s with a 1 in the last element.
Thanks in advance!
The data.table package has syntax that is often similar to SQL. The package includes %like%, which is a "convenience function for calling regexpr". Here is an example taken from its help file:
## Create the data.table:
DT = data.table(Name=c("Mary","George","Martha"), Salary=c(2,3,4))
## Subset the DT table where the Name column is like "Mar%":
DT[Name %like% "^Mar"]
## Name Salary
## 1: Mary 2
## 2: Martha 4
The R analog to SQL's LIKE is just R's ordinary indexing syntax.
The 'LIKE' operator selects data rows from a table by matching string values in a specified column against a user-supplied pattern
> # create a data frame having a character column
> clrs = c("blue", "black", "brown", "beige", "berry", "bronze", "blue-green", "blueberry")
> dfx = data.frame(Velocity=sample(100, 8), Colors=clrs)
> dfx
Velocity Colors
1 90 blue
2 94 black
3 71 brown
4 36 beige
5 75 berry
6 2 bronze
7 89 blue-green
8 93 blueberry
> # create a pattern to use (the same as you would do when using the LIKE operator)
> ptn = '^be.*?' # gets beige and berry but not blueberry
> # execute a pattern-matching function on your data to create an index vector
> ndx = grep(ptn, dfx$Colors, perl=T)
> # use this index vector to extract the rows you want from the data frame:
> selected_rows = dfx[ndx,]
> selected_rows
Velocity Colors
4 36 beige
5 75 berry
In SQL, that would be:
SELECT * FROM dfx WHERE Colors LIKE ptn3
Something like regexpr?
> d <- c("Swinging Strike", "In play (run(s))", "In play (out(s) recorded)")
> regexpr('In play', d)
[1] -1 1 1
attr(,"match.length")
[1] -1 7 7
>
or grep
> grep('In play', d)
[1] 2 3
>
Since stringr 1.5.0, you can use str_like, which follows the structure of SQL's LIKE:
library(stringr)
fruit <- c("apple", "banana", "pear", "pineapple")
str_like(fruit, "app%")
#[1] TRUE FALSE FALSE FALSE
Not only does it include %, but also several other operators (see ?str_like).
Must match the entire string
_ matches a single character (like .)
% matches any number of characters (like .*)
\% and \_ match literal % and _
The match is case insensitive by default
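To show how those rules fit together, here is a small sketch that translates a LIKE pattern into a regular expression. It is written in Python rather than R, and like is a hypothetical helper modelled on the rules listed above, not stringr's implementation.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Hypothetical helper: SQL-LIKE-style match following the rules listed above."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "%_":
            parts.append(re.escape(pattern[i + 1]))  # \% or \_ is a literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any number of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    # fullmatch: the pattern must cover the entire string; case is ignored.
    regex = "".join(parts)
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None

fruit = ["apple", "banana", "pear", "pineapple"]
matches = [like(f, "app%") for f in fruit]
print(matches)  # [True, False, False, False]

des = ["Swinging Strike", "In play (run(s))", "In play (out(s) recorded)"]
inplay = [int(like(d, "In play%")) for d in des]
print(inplay)  # [0, 1, 1]
```

The second call applies the same idea to the des vector from the question, producing the 1s and 0s the asker was after.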