.trans with keys longer than one symbol in Perl 6 - raku

trans is a very useful and powerful instrument, but it remains a bit of a mystery for me.
E.g. I still don't understand this phrase from the docs:
In case a list of keys and values is used, substrings can be replaced as well.
What's the algorithm if keys and values are longer than one symbol?
The following test code explores how .trans works with 'conflicting' keys. Why does the 1st pair work differently depending on whether it is alone or accompanied by the 2nd pair?
my Pair @trans =
    ab => '12',
    bc => '34',
;
my $str = 'ab';
say "both trans: $str.trans(@trans)"; # 13
say "1st trans: $str.trans(@trans[0])"; # 12
Using a hash instead of a list of pairs produces a different result:
my %trans =
ab => '12',
bc => '34',
;
my $str = 'ab';
say "both trans: $str.trans(%trans)"; # 12
(I understand that the pairs in a hash can come in any order, but in the first example, with the list, it is the 1st pair that isn't fully applied when the 2nd pair is present.)

(I'm not 100% sure of the following but I have to run.)
.trans requires one or more pair arguments that together describe the desired translation.
Translation of a single pair whose key is a single string
P6 maps the Nth character of the pair's key string to the Nth character of the pair's value string.
Thus .trans: "ab" => "12" maps "a" to "1" and "b" to "2".
Translation of a single pair whose key is a list of strings
P6 maps the Nth string of the pair's key list to the Nth string of the pair's value list.
Thus .trans: ("ab", "bc") => ("12", "13") maps "ab" to "12" and "bc" to "13".
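The two forms described above can be sketched in Python. This is only an illustration of "Nth character" versus "Nth string" mapping, not Raku's implementation: the helper names `trans_chars` and `trans_strings` are hypothetical, and preferring longer keys when two keys could match at the same position is an assumption.

```python
import re

def trans_chars(text, key, value):
    """Map the Nth character of `key` to the Nth character of `value`."""
    return text.translate(str.maketrans(key, value))

def trans_strings(text, keys, values):
    """Map the Nth string in `keys` to the Nth string in `values`."""
    mapping = dict(zip(keys, values))
    # Assumption: when several keys could match, prefer the longer one.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in ordered)
    return re.sub(pattern, lambda m: mapping[m.group(0)], text)

print(trans_chars("abba", "ab", "12"))                   # 1221
print(trans_strings("abc", ["ab", "bc"], ["12", "13"]))  # 12c
```

In the second call the scan replaces "ab" at position 0 and then finds nothing left to match "bc", so the trailing "c" is kept unchanged.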
Translation of a list of pairs
Translation of a single pair proceeds in one or other of the two forms explained above depending on whether the key contains one string or a list of them.
Translation of a list of pairs just repeats the process for each pair, doing either the Nth character or Nth string mapping as per that pair's key.
how .trans works with 'conflicting' keys
Given a list of pairs, P6 tries the first one first, and if that doesn't match, then the second pair, and so on.
I'll need to explore what lizmat now thinks and what she then meant when she said the following in her earlier answer about .trans:
I think you misunderstand what .trans does. You specify a range of characters to be changed into other characters. You are NOT specifying a string to be changed to another string.
I think the sentence you quoted from the doc is a bit ambiguous:
In case a list of keys and values is used, substrings can be replaced as well.
It means that the (single) .key attribute of a pair passed to .trans stores a list of strings rather than a single string, and likewise for the pair's single .value attribute.

get each number in String and Compare in TCL/tk

I have string output:
1 4 2 1 4
I want to get each character in the string so that I can compare them.
I want to do this to find out whether the list is already sorted.
It's not exactly clear to me what you are trying to achieve. Going by "to know whether the list is sorted", and assuming a list of integers, you can use tcl::mathop::< or tcl::mathop::<=, depending on whether you want to allow duplicate values:
if {[tcl::mathop::<= {*}$list]} {
    puts "List is sorted"
} else {
    puts "List is mixed up"
}
This will also work for ASCII comparison of strings. For more complex comparisons, like using dictionary rules or case insensitive, it's probably easiest to combine that with lsort along with the -indices option:
tcl::mathop::< {*}[lsort -indices -dictionary $list]
The -indices option returns the original index of each list element in sorted order. By checking if those indices are in incremental order, you know if the original list was already sorted.
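For readers who don't use Tcl, here is a rough Python sketch of the same two ideas: a pairwise comparison of neighbours, and the "sort the indices, then check they are already increasing" trick. The function names are hypothetical, and `str.lower` merely stands in for a custom comparison rule; it is not equivalent to Tcl's -dictionary ordering.

```python
def is_sorted(items, strict=False):
    """True if each element is <= (or <, when strict) its successor."""
    pairs = zip(items, items[1:])
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)

def is_sorted_by(items, key):
    """Sort the indices by `key` and check they are still 0, 1, 2, ..."""
    order = sorted(range(len(items)), key=lambda i: key(items[i]))
    return order == list(range(len(items)))

print(is_sorted([1, 4, 2, 1, 4]))                 # False
print(is_sorted([1, 1, 2, 4, 4]))                 # True
print(is_sorted([1, 1, 2, 4, 4], strict=True))    # False
print(is_sorted_by(["a", "B", "c"], str.lower))   # True
```

Because Python's sort is stable, equal elements keep their original order, so `is_sorted_by` treats duplicates as already sorted.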
Of course, if the point of the exercise was to avoid unnecessary sorting, then this is no use. But then again, bubble sort of an already sorted list is very fast and will basically do exactly the comparisons you described. So just sorting will probably be faster than first checking for a sorted list via a scripted loop.
To get each character in the string, do split $the_string "" (yes, on the empty string). That gives you a list of all the characters in the string; you can use foreach to iterate over them. Remember, you can iterate over two (or more) lists at once:
foreach c1 [split $the_string ""] c2 $target_comparison_list {
    if {$c1 ne $c2} {
        puts "The first not equal character is “$c1” when “$c2” was expected"
        break
    }
}
Note that it's rarely useful to continue comparison after a difference is found as the most common differences are (relative to the target string) insertions and deletions; almost everything after either of those will differ.
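A comparable walk over two sequences at once can be sketched in Python. The name `first_mismatch` is hypothetical; `zip_longest` with an empty-string fill value is used here to mimic how Tcl's foreach supplies empty values once the shorter list runs out.

```python
from itertools import zip_longest

def first_mismatch(text, expected):
    """Return (index, got, wanted) for the first difference, or None."""
    pairs = zip_longest(text, expected, fillvalue="")
    for index, (got, wanted) in enumerate(pairs):
        if got != wanted:
            return index, got, wanted
    return None

print(first_mismatch("14214", "11244"))  # (1, '4', '1')
print(first_mismatch("abc", "abc"))      # None
print(first_mismatch("ab", "abc"))       # (2, '', 'c')
```

The function stops at the first difference, in line with the note above that later differences are rarely informative.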

:ex and :ov adverbs with Perl 6 named captures

I don't fully understand why the results are different here. Does :ov apply only to <left>, so that having found the longest match it wouldn't do anything else?
my regex left {
a | ab
}
my regex right {
bc | c
}
"abc" ~~ m:ex/<left><right>
{put $<left>, '|', $<right>}/; # 'ab|c' and 'a|bc'
say '---';
"abc" ~~ m:ov/<left><right>
{put $<left>, '|', $<right>}/; # only 'ab|c'
Types of adverbs
It's important to understand that there are two different types of regex adverbs:
Those that fine-tune how your regex code is compiled (e.g. :sigspace/:s, :ignorecase/:i, ...). These can also be written inside the regex, and only apply to the rest of their lexical scope within the regex.
Those that control how regex matches are found and returned (e.g. :exhaustive/:ex, :overlap/:ov, :global/:g). These apply to a given regex matching operation as a whole, and have to be written outside the regex, as an adverb of the m// operator or .match method.
Match adverbs
Here is what the relevant adverbs of the second type do:
m:ex/.../ finds every possible match at every possible starting position.
m:ov/.../ finds the first possible match at every possible starting position.
m:g/.../ finds the first possible match at every possible starting position that comes after the end of the previous match (i.e., non-overlapping).
m/.../ finds the first possible match at the first possible starting position.
(In each case, the regex engine moves on as soon as it has found what it was meant to find at any given position, that's why you don't see additional output even by putting print statements inside the regexes.)
Your example
In your case, there are only two possible matches: ab|c and a|bc.
Both start at the same position in the input string, namely at position 0.
So only m:ex/.../ will find both of them – all the other variants will only find one of them and then move on.
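The difference between the match adverbs can be modelled with a small Python sketch that enumerates candidate matches by hand. It is not how the Raku regex engine works internally; the function names are hypothetical, and trying longer alternatives first is an assumption chosen to reproduce the output shown in the question.

```python
def all_matches(text, lefts, rights):
    """Every (start, left, right) where left + right occurs at start."""
    found = []
    for start in range(len(text)):
        # Assumption: longer alternatives are tried first.
        for left in sorted(lefts, key=len, reverse=True):
            for right in sorted(rights, key=len, reverse=True):
                if text.startswith(left + right, start):
                    found.append((start, left, right))
    return found

def exhaustive(text, lefts, rights):
    """Like :ex, every match at every starting position."""
    return all_matches(text, lefts, rights)

def overlapping(text, lefts, rights):
    """Like :ov, only the first match at each starting position."""
    seen, result = set(), []
    for match in all_matches(text, lefts, rights):
        if match[0] not in seen:
            seen.add(match[0])
            result.append(match)
    return result

def global_matches(text, lefts, rights):
    """Like :g, first match at each position past the previous match."""
    result, next_free = [], 0
    for start, left, right in all_matches(text, lefts, rights):
        if start >= next_free:
            result.append((start, left, right))
            next_free = start + len(left) + len(right)
    return result

print(exhaustive("abc", ["a", "ab"], ["bc", "c"]))
# [(0, 'ab', 'c'), (0, 'a', 'bc')]
print(overlapping("abc", ["a", "ab"], ["bc", "c"]))
# [(0, 'ab', 'c')]
```

Both candidates start at position 0, so only the exhaustive variant reports the second one.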
:ex will find all possible combinations of overlapping matches.
:ov acts like :ex except that it limits the search algorithm by constraining it to find only a single match for a given starting position, causing it to produce a single match for a given length. :ex is allowed to start from the very beginning of the string to find a new unique match, and so it may find several matches of length 3; :ov will only ever find exactly one match of length 3.
Documentation:
https://docs.perl6.org/language/regexes
Exhaustive:
To find all possible matches of a regex – including overlapping ones – and several ones that start at the same position, use the :exhaustive (short :ex) adverb
Overlapping:
To get several matches, including overlapping matches, but only one (the longest) from each starting position, specify the :overlap (short :ov) adverb:

multiple matches for NSRegularExpression

I have the following regex pattern to match:
NSString *pattern=[NSString stringWithFormat:@"%@(.*)%@", key, key2];
So, let's say key=\\\\[\\\\\[ and key2=\\\\]\\\\]; then I get the string containing the keys along with the text between them. The problem is that if there are multiple matches, it only takes the first appearance of key and the last appearance of key2, and gives me everything between those along with the keys. E.g. for: This is [[some]] [[text]]. it gives me [[some]] [[text]] as one match, whereas I want [[some]] and [[text]] as separate matches. How should I modify it to give all the matches separately?
Same thing that bothers novice parser makers who wanna parse a string between quotes, and think that
\\".*\\"
is sufficient, but then are surprised when this matches all the text between
"a string" and also "another string"
The reason behind this is that the * operator is greedy: it matches as much text as it can. You can use character set negation to achieve the expected result:
\\[\\[[^\\[\\]]*\\]\\]
Hope this helps.
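The effect is easy to reproduce outside Objective-C. The following Python sketch runs the greedy pattern and the negated-character-class pattern against the sample sentence from the question. The lazy `.*?` variant is an additional technique not mentioned in the answer above, shown only for comparison.

```python
import re

text = "This is [[some]] [[text]]."

# Greedy: .* runs to the last "]]", so both tags end up in one match.
greedy = re.findall(r"\[\[(.*)\]\]", text)

# Negated class: the body may not contain brackets, so matches stay separate.
negated = re.findall(r"\[\[[^\[\]]*\]\]", text)

# Lazy quantifier (alternative technique): stop at the first "]]".
lazy = re.findall(r"\[\[(.*?)\]\]", text)

print(greedy)   # ['some]] [[text']
print(negated)  # ['[[some]]', '[[text]]']
print(lazy)     # ['some', 'text']
```

The negated class rejects any bracket inside the delimiters, whereas the lazy form allows single brackets in the body.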

Is it possible to ignore characters in a string when matching with a regular expression

I'd like to create a regular expression such that when I compare a string against an array of strings, matches are returned with the regex ignoring certain characters.
Here's one example. Consider the following array of names:
{
"Andy O'Brien",
"Bob O'Brian",
"Jim OBrien",
"Larry Oberlin"
}
If a user enters "ob", I'd like the app to apply a regex predicate to the array and all of the names in the above array would match (e.g. the ' is ignored).
I know I can run the match twice, first against each name and second against each name with the ignored chars stripped from the string. I'd rather this be done by a single regex so I don't need two passes.
Is this possible? This is for an iOS app and I'm using NSPredicate.
EDIT: clarification on use
From the initial answers I realized I wasn't clear. The example above is a specific one. I need a general solution where the array of names is a large array with diverse names and the string I am matching against is entered by the user. So I can't hard code the regex like [o]'?[b].
Also, I know how to do case-insensitive searches so don't need the answer to focus on that. Just need a solution to ignore the chars I don't want to match against.
Since you have discarded all the answers showing the ways it can be done, you are left with the answer:
NO, this cannot be done. Regex does not have an option to 'ignore' characters. Your only options are to modify the regex to match them, or to do a pass on your source text to get rid of the characters you want to ignore and then match against that. (Of course, then you may have the problem of correlating your 'cleaned' text with the actual source text.)
If I understand correctly, you want a way to match the characters "ob" 1) regardless of capitalization, and 2) regardless of whether there is an apostrophe in between them. That should be easy enough.
1) Use a case-insensitivity modifier, or use a regexp that specifies that the capital and lowercase version of the letter are both acceptable: [Oo][Bb]
2) Use the ? quantifier to indicate that a character may be present either zero or one times. o'?b will match both "o'b" and "ob". If you want to include other characters that may or may not be present, you can group them with the apostrophe in a character class. For example, o['~-]?b will match "ob", "o'b", "o-b", and "o~b" (the hyphen goes last in the class so that it is taken literally rather than as a range).
So the complete answer would be [Oo]'?[Bb].
Update: The OP asked for a solution that would cause the given character to be ignored in an arbitrary search string. You can do this by inserting '? after every character of the search string. For example, if you were given the search string oleary, you'd transform it into o'?l'?e'?a'?r'?y'?. Foolproof, though probably not optimal for performance. Note that this would match "o'leary" but also "o'lea'r'y'" if that's a concern.
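The transformation described in this update can be sketched as follows. This is a Python illustration rather than NSPredicate code; the helper name `ignoring` and its `ignored` parameter are hypothetical. The user's input is escaped first so that regex metacharacters in the search string are treated literally.

```python
import re

def ignoring(search, ignored="'"):
    """Build a pattern matching `search` with optional ignored characters."""
    optional = "[" + re.escape(ignored) + "]?"
    return "".join(re.escape(ch) + optional for ch in search)

names = ["Andy O'Brien", "Bob O'Brian", "Jim OBrien", "Larry Oberlin"]

def matching(search):
    """Names containing `search`, case-insensitively, ignoring apostrophes."""
    pattern = re.compile(ignoring(search), re.IGNORECASE)
    return [name for name in names if pattern.search(name)]

print(matching("ob"))   # all four names
print(matching("obr"))  # ["Andy O'Brien", "Bob O'Brian", 'Jim OBrien']
```

Passing several characters in `ignored`, for example `"'-"`, makes each of them optional after every letter of the search string.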
In this particular case, just throw the set of characters into the middle of the regex as optional. This works specifically because you have only two characters in your match string, otherwise the regex might get a bit verbose. For example, match case-insensitive against:
o[']*b
You can add more characters to that character class in the middle to ignore them. Note that the * allows any number of repetitions (so O'''Brien will match); to allow at most a single instance, change it to ?:
o[']?b
You can make particular characters optional with a question mark, which means that it will match whether they're there or not, e.g:
/o\'?b/
This would match all of the above. Add .+ on either side to match the other characters, and a space to mark the start of the surname:
/.+? o\'?b.+/
And use the case-insensitivity modifier to make it match regardless of capitalisation.

How can I check for a certain suffix in my string?

I have a list of strings, and I want to check every string in it. Sometimes a string can have the suffix _anim(X), where X is an integer. If a string has that kind of suffix, I need to find all other strings that have the same "base" (the base being the part without the suffix), group those strings, and send them to my function.
So, given the next list:
Man_anim(1)
Woman
Man_anim(3)
Man_anim(2)
My code would discover the base Man has a special suffix, and will then generate a new list grouping all Man objects and arrange them depending on the value inside parenthesis. The code is supposed to return
Man_anim(1)
Man_anim(2)
Man_anim(3)
And send such list to my function for further processing.
My problem is, how can I check for the existence of such suffix, and afterwards, check for the value inside parenthesis?
If you know that the suffix is going to be _anim(X) every time (obviously, with X varying) then you can use a regular expression:
Regex.IsMatch(value, @"_anim\(\d+\)$")
If the suffix isn't at least moderately consistent, then you'll have to look into data structures, like Suffix Trees, which you can use to determine common structures in strings.
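To also pull out the base and the number, the same regular expression can be given capture groups. The sketch below is in Python rather than C#, and the names `SUFFIX` and `group_animations` are hypothetical. It assumes that the base is everything before a trailing _anim(X) and that entries should be ordered by the integer X.

```python
import re
from collections import defaultdict

# Base name, then "_anim(", an integer, and ")" at the end of the string.
SUFFIX = re.compile(r"^(?P<base>.+)_anim\((?P<index>\d+)\)$")

def group_animations(names):
    """Group suffixed names by base, ordered by the number in parentheses."""
    groups = defaultdict(list)
    for name in names:
        match = SUFFIX.match(name)
        if match:  # names without the suffix are skipped
            index = int(match.group("index"))
            groups[match.group("base")].append((index, name))
    return {base: [name for _, name in sorted(items)]
            for base, items in groups.items()}

items = ["Man_anim(1)", "Woman", "Man_anim(3)", "Man_anim(2)"]
print(group_animations(items))
# {'Man': ['Man_anim(1)', 'Man_anim(2)', 'Man_anim(3)']}
```

Converting the captured index to an integer before sorting keeps `_anim(10)` after `_anim(2)`, which a plain string sort would not.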