String interpolation in Perl6 - raku

I have difficulty figuring out why the statement
say "\c500";
produces the character 'Ǵ' on my screen as expected, while the following statements give me an error message at compile time ("Unrecognized \c character"):
my $i = 500;
say "\c$i";
even though
say "$i"; # or 'say $i.Str;' for that matter
produces "500" (with "$i".WHAT indicating type Str).

You'll have to use $i.chr, which is documented here. \c is handled specially within strings, and does not seem to admit anything that is not a literal.

The string literal parser in Perl 6 is a type of domain specific language.
Basically what you write gets compiled similarly to the rest of the language.
"abc$_"
&infix:«~»('abc',$_.Str)
In the case of \c500, you could view it as a compile-time constant.
"\c500"
(BEGIN 500.chr)
Actually it is more like:
(BEGIN 500.HOW.find_method_qualified(Int,500,'chr').(500))
Except that the compiler for string literals actually tries to compile it to an abstract syntax tree, but is unable to because there hasn't been code added to handle this case of \c.
Even if there were, \c is effectively compiled to run at BEGIN time, which is before $_ has a value.
Also \c is used for more than .chr
"\c9" eq "\c[TAB]" eq "\cI" eq "\t"
(Note that \cI represents the character you would get by typing Ctrl+I on a POSIX platform)
So which of these should \c$_ compile to?
$_.chr
$_.parse-names
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.index($_).succ.chr
If you want .chr you can write it as one of the following. (spaces added where they are allowed)
"abc$_.chr( )def"
"abc{ $_.chr }def"
"abc{ .chr }def"
'abc' ~ $_.chr ~ 'def'

Related

Why is the order of evaluation of expressions used for concatenation undefined in Awk?

In GNU Awk User's Guide, I went through the section 6.2.2 String Concatenation and found interesting insights:
Because string concatenation does not have an explicit operator, it is often necessary to ensure that it happens at the right time by using parentheses to enclose the items to concatenate.
Then, I was quite surprised to read the following:
Parentheses should be used around concatenation in all but the most common contexts, such as on the righthand side of ‘=’. Be careful about the kinds of expressions used in string concatenation. In particular, the order of evaluation of expressions used for concatenation is undefined in the awk language. Consider this example:
BEGIN {
a = "don't"
print (a " " (a = "panic"))
}
It is not defined whether the second assignment to a happens before or after the value of a is retrieved for producing the concatenated value. The result could be either ‘don't panic’, or ‘panic panic’.
In particular, my GNU Awk 5.0.0 behaves like this, retrieving the old value of a before the assignment takes effect:
$ gawk 'BEGIN {a = "dont"; print (a " " (a = "panic"))}'
dont panic
However, I wonder: why isn't the order of evaluation of expressions defined? What are the benefits of having "undefined" outputs that may vary depending on the version of Awk you are running?
This particular example is about expressions with side-effects. Traditionally, in C and awk syntax (closely inspired by C), assignments are allowed inside expressions. How those expressions are then evaluated is up to the implementation.
Leaving something unspecified would make sure that people don't use potentially confusing or ambiguous language constructs. But that assumes they are aware of the lack of specification.
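One way to sidestep the unspecified order is to move the side effect out of the concatenation altogether. The following is a minimal sketch, not taken from the gawk manual; the variable names old and out are only illustrative. Because the old value is captured in its own statement before the assignment, the result should be the same on any POSIX awk:

```shell
# Read the old value into a temporary first, then assign, so the result
# does not depend on the order in which awk evaluates the operands.
out=$(awk 'BEGIN {
    a = "dont"
    old = a          # capture the current value explicitly
    a = "panic"      # the side effect happens in its own statement
    print old " " a
}')
echo "$out"
```

This prints "dont panic" regardless of how the implementation orders the operands of a concatenation, since no operand has a side effect any more.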

Perl6 regex not matching end $ character with filenames

I've been trying to learn Perl6 from Perl5, but the issue is that the regex works differently, and it isn't working properly.
I am making a test case to list all files in a directory ending in ".p6$"
This code works with the end character
if 'read.p6' ~~ /read\.p6$/ {
say "'read.p6' contains 'p6'";
}
However, if I try to fit this into a subroutine:
multi list_files_regex (Str $regex) {
my @files = dir;
for @files -> $file {
if $file.path ~~ /$regex/ {
say $file.path;
}
}
}
it no longer works. I don't think the issue is with the regex but with the file name; there may be some attribute I'm not aware of.
How can I get the file name to match the regex in Perl6?
Regexes are a first-class language within Perl 6, rather than simply strings, and what you're seeing here is a result of that.
The form /$foo/ in Perl 6 regex will search for the string value in $foo, so it will be looking, literally, for the characters read\.p6$ (that is, with the dot and dollar sign).
Depending on the situation of the calling code, there are a couple of options:
If you really are receiving regexes as strings, for example read as input or from a file, then use $file.path ~~ /<$regex>/. This means it will treat what's in $regex as regex syntax.
If you will just be passing a range of different regexes in, change the parameter to be of type Regex, and then do $file.path ~~ $regex. In this case, you'd pass them like list_files_regex(/foo/).
Last but not least, dir takes a test parameter, and so you can instead write:
for dir(test => /<$regex>/) -> $file {
say $file.path;
}

Variable substitution within braces in Tcl

Correct me wherever I am wrong.
When we use variables inside braces, the value is not substituted during evaluation; the text is simply passed on as an argument to the procedure/command. (Yes, there are some exceptions, like expr {$x+$y}.)
Consider the following scenarios,
Scenario 1
% set a 10
10
% if {$a==10} {puts "value is $a"}
value is 10
% if "$a==10" "puts \"value is $a\""
value is 10
Scenario 2
% proc x {} {
set c 10
uplevel {set val $c}
}
%
% proc y {} {
set c 10
uplevel "set val $c"
}
% x
can't read "c": no such variable
% y
10
% set val
10
%
In both scenarios, we can see that variable substitution is performed on the body of the if command (i.e. {puts "value is $a"}), whereas with uplevel it is not performed in the current context (i.e. {set val $c}).
I could imagine that they get access to the variable through something like upvar. But why does it have to differ from place to place? Why was it designed this way behind the scenes? Or is this just the conventional way Tcl works?
Tcl always works exactly the same way with exactly one level of interpretation, though there are some cases where there is a second level because a command specifically requests it. The way it works is that stuff inside braces is never interpolated or checked for word boundaries (provided those braces start at the start of a “word”), stuff in double quotes is interpolated but not parsed for word boundaries (provided they start a word), and otherwise both interpolation and word boundary scanning are done (with the results of interpolation not scanned).
But some commands send the resulting word through again. For example:
eval {
puts "this is an example with your path: $env(PATH)"
}
The rule applies to the outer eval, but that concatenates its arguments and then sends the results into Tcl again. if does something similar with its body script except there's no concatenation, and instead there's conditional execution. proc also does the same, except it delays running the code until you call the procedure. The expr command is like eval, except that it sends the script into the expression evaluation engine, which is really a separate little language. The if command also uses the expression engine (as do while and for). The expression language understands $var (and […]) as well.
So what happens if you do this?
set x [expr $x + $y]
Well, first we parse the first word out, set, then x, then with the third word we start a command substitution, which recursively enters the parser until the matching ] is found. With the inner expr, we first parse expr, then $x (reading the x variable), then +, then $y. Now the expr command is invoked with three arguments; it concatenates the values with spaces between them and sends the result of the concatenation into the expression engine. If you had x previously containing $ab and y containing [kaboom], the expression to evaluate will be actually:
$ab + [kaboom]
which will probably give you an error about a non-existing variable or command. On the other hand, if you did expr {$x + $y} with the braces, you'll get an addition applied to the contents of the two variables (still an error in this case, because neither looks like a number).
You're recommended to brace your expressions because then the expression that you write is the expression that will be evaluated. Otherwise, you can get all sorts of “unexpected” behaviours. Here's a mild example:
set x {12 + 34}
puts [expr $x]
set y {56 + 78}
puts [expr $y]
puts [expr $x * $y]
Remember, Tcl always works the same way. No special cases. Anything that looks like a special case is just a command that implements a little language (often by calling recursively back into Tcl or the expression engine).
In addition to Donal Fellows's answer:
In scenario 2, in x the command uplevel {set val $c} is invoked, and fails because there is no such variable at the caller's level.
In y, the equivalent of uplevel {set val 10} is invoked (because the value of c is substituted when the command is interpreted). This script can be evaluated at the caller's level since it doesn't depend on any variables there. Instead, it creates the variable val at that level.
It has been designed this way because it gives the programmer more choices. If we want to avoid evaluation when a command is prepared for execution (knowing that the command we invoke may still evaluate our variables as it executes), we brace our arguments. If we want evaluation to happen during command preparation, we use double quotes (or no form of quoting).
Now try this:
% set c 30
30
% x
30
% y
10
If there is such a variable at the caller's level, x is a useful command for setting the variable val to the value of c, while y is a useful command for setting the variable val to the value encapsulated inside y.

What's the difference between parenthesis $() and curly bracket ${} syntax in Makefile?

Is there any differences in invoking variables with syntax ${var} and $(var)? For instance, in the way the variable will be expanded or anything?
There's no difference – they mean exactly the same (in GNU Make and in POSIX make).
I think that $(round brackets) look tidier, but that's just personal preference.
(Other answers point to the relevant sections of the GNU Make documentation, and note that you shouldn't mix the syntaxes within a single expression)
The Basics of Variable References section of the GNU make documentation states no difference:
To substitute a variable's value, write a dollar sign followed by the
name of the variable in parentheses or braces: either $(foo) or
${foo} is a valid reference to the variable foo.
As already correctly pointed out, there is no difference, but be wary not to mix the two kinds of delimiters, as it can lead to cryptic errors like in the GNU make example by unomadh.
From the GNU make manual on the Function Call Syntax (emphasis mine):
[…] If the arguments themselves contain other function calls or variable references, it is wisest to use the same kind of delimiters for all the references; write $(subst a,b,$(x)), not $(subst a,b,${x}). This is because it is clearer, and because only one type of delimiter is matched to find the end of the reference.
The ${} style lets you test the make rules in the shell, if you have the corresponding environment variables set, since that is compatible with bash.
Actually, it seems to be fairly different:
, = ,
list = a,b,c
$(info $(subst $(,),-,$(list))_EOL)
$(info $(subst ${,},-,$(list))_EOL)
outputs
a-b-c_EOL
md/init-profile.md:4: *** unterminated variable reference. Stop.
But so far I have only found this difference when the variable name inside ${...} itself contains a comma. I first thought ${...} was expanding the comma not as part of the value, but it turns out I'm not able to hack it this way. I still don't understand this... If anyone has an explanation, I'd be happy to know!
It makes a difference if the expression contains unbalanced brackets:
${info ${subst ),(,:-)}}
$(info $(subst ),(,:-)))
->
:-(
*** insufficient number of arguments (1) to function 'subst'. Stop.
For variable references, this makes a difference for functions, or for variable names that contain brackets (bad idea)

How to pass a regular expression to a function in AWK

I do not know how to pass an regular expression as an argument to a function.
If I pass a string, it is OK,
I have the following awk file,
#!/usr/bin/awk -f
function find(name){
for(i=0;i<NF;i++)if($(i+1)~name)print $(i+1)
}
{
find("mysql")
}
I do something like
$ ./fct.awk <(echo "$str")
This works OK.
But when I call in the awk file,
{
find(/mysql/)
}
This does not work.
What am I doing wrong?
Thanks,
Eric J.
You cannot (should not) pass a regex constant to a user-defined function. You have to use a dynamic regex in this case, like find("mysql").
If you do find(/mysql/), what awk does is find($0~/mysql/), so it passes 0 or 1 to your find(..) function.
see this question for detail.
awk variable assignment statement explanation needed
also
http://www.gnu.org/software/gawk/manual/gawk.html#Using-Constant-Regexps
section: 6.1.2 Using Regular Expression Constants
warning: regexp constant for parameter #1 yields boolean value
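To see what value a bare regexp constant has when it is used as an ordinary expression, here is a small sketch, assuming any POSIX awk; the variable names matched and out and the two input lines are made up for illustration. The constant is matched against $0 and yields 1 or 0, which is what a user-defined function would receive:

```shell
# A regexp constant used as a plain expression is shorthand for $0 ~ /re/,
# so it evaluates to 1 (match) or 0 (no match) for each input line.
out=$(printf 'mysql server\nother line\n' | awk '{ matched = /mysql/; print matched }')
echo "$out"
```

The first line contains "mysql" and gives 1; the second does not and gives 0.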
The regex gets evaluated (matching against $0) before it's passed to the function. You have to use strings.
Note: make sure you do proper escaping: http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps
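The string approach, including the escaping the note refers to, might look like the sketch below. It assumes a POSIX awk; the sample input is invented, and the extra parameter i only serves as a local variable. A dot that should match literally has to be written as "\\." because the string parser consumes one backslash before the regex engine sees the pattern:

```shell
out=$(printf 'a.b axb mysql-5.7\n' | awk '
function find(name,    i) {      # i is declared as a local variable
    for (i = 1; i <= NF; i++)
        if ($i ~ name) print $i
}
{
    find("mysql")     # plain string used as a dynamic regex
    find("a\\.b")     # "\\." in the string becomes \. in the regex
}')
echo "$out"
```

The first call prints mysql-5.7 and the second prints only a.b; with the unescaped string "a.b" the field axb would match as well.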
If you use GNU awk, you can use a regular expression as a user-defined function parameter.
You have to define your regex as @/.../.
In your example, you would use it like this:
function find(regex){
for(i=1;i<=NF;i++)
if($i ~ regex)
print $i
}
{
find(@/mysql/)
}
It's called a strongly typed regexp constant and it's available since GNU awk version 4.2 (Oct 2017).
Example here.
Use quotation marks and treat them as strings. This way it works for mawk, mawk2, and GNU gawk. But you'll also need to double the backslashes, since turning the regex into a string eats one of them right off the bat.
In your example, just find("mysql") will suffice.
You can actually pass arbitrary regexes as you wish, and not be confined to gawk, as long as you're willing to write them as strings rather than with the @/../ syntax others have mentioned. This is where the number of backslashes makes a difference.
You can even make regex out of arbitrary bytes too, preferably via octal codes. if you do "\342\234\234" as a regex, the system will convert that into actual bytes in the regex before matching.
While there's nothing wrong with that approach, if you want to be 100% safe and prefer not having arbitrary bytes flying around, write it as
"[\\342][\\234][\\234]" ----> ✜
Once initially read by awk to create an internal representation, it'll look like this :
[\342][\234][\234]
which will still match the identical objects you desire (in this case, some sort of cross-looking dingbat). This will spit out annoying warnings in unicode-aware mode of gawk due to attempting to enclose non-ASCII bytes directly into square brackets. For that use case,
"\\342\\234\\234" ------(eqv to )---> /\342\234\234/
will keep gawk happy and quiet. Lately I've been filling the gaps in my own code, writing regexes that can mimic all the Unicode script classes that Perl enjoys.