inspect regex made with a variable - raku

sub make-regex {
my $what's-in-the-box = rand > .5 ?? 'x' !! 'y';
/$what's-in-the-box/
}
my $lol-who-knows = make-regex;
$lol-who-knows.gist.say;
How do you see the innards of the regex (i.e. x or y)? Forcing a match is not a solution.

How do you see the innards of the regex (i.e. x or y)?
You:
Statically compile the regex. Doing so will involve use of a raku compiler, i.e. Rakudo.
Dynamically evaluate the regex so that the $what's-in-the-box variable in your regex gets interpolated and turns into x or y. Doing so will involve running the regex as code. That in turn means both using Rakudo and running the regex with a Match object invocant (or sub-class instance or, conceivably, a mock object equivalent).
View the resulting regex. Doing so will involve using compiler (Rakudo) toolchain specific introspection or debug functionality.
Forcing a match is not a solution.
You can run a regex with no input if your concern is just to avoid the need for a successful match:
say Match.new.&$lol-who-knows; # #<failed match>
But it must be run, otherwise the $what's-in-the-box variable won't turn into an x or y. You might think you could cheat on this by writing something that mimics this part of raku regex construction/use but there's good reason to think that's not going to work out[1].
And then you must view its innards, using Rakudo toolchain features, after you've started running it and before it finishes running.
A regex is a method
In raku, a Regex is code, a sub-class of Method.
Until you run it, by using it in a match, that code is just /$what's-in-the-box/ (aka regex { $what's-in-the-box }). It's the equivalent of something like:
method {
...self should be a Match or a sub-class of it
...if self is not an instance, create a new one
...do matching -- compiler evaluates $what's-in-the-box during this
...return updated self / new instance
}
(To see a bit more detail, see Moritz's answer to the SO Can I change the slang inside a method?.)
A routine is a closure
You create your regex inside the closure named make-regex. The compiler spots that you've used $what's-in-the-box and therefore hangs on to the variable even after the make-regex closure returns. Later on, if/when your regex is run, the $what's-in-the-box variable will be replaced by its value at the moment the regex is run.
Footnotes
[1] There are plenty of huge complications you'll encounter if you try to do an end-run around using the compiler. Even something simple like interpolation is non-trivial. To quote Jonathan Worthington, in 2020:
I thought oh I'm only building [a tool that's] not the real compiler. I can ... have a simpler model of the symbol resolution. In the end it turned out that no I couldn't really. That started to create us some problems. When we aligned the way that we resolved symbols with the same algorithms and lookup structures that was being used in the compiler then suddenly it all became a lot simpler.

Related

Differences between .Bool, .so, ? and so

I’m trying to figure out what the differences are between the above-mentioned routines, and if statements like
say $y.Bool;
say $y.so;
say ? $y;
say so $y;
would ever produce a different result.
So far the only difference that is apparent to me is that ? has a higher precedence than so. .Bool and .so seem to be completely synonymous. Is that correct and (practically speaking) the full story?
What I've done to answer your question is to spelunk the Rakudo compiler source code.
As you note, one aspect that differs between the prefixes is parsing differences. The variations have different precedences and so is alphabetic whereas ? is punctuation. To see the precise code controlling this parsing, view Rakudo's Grammar.nqp and search within that page for prefix:sym<...> where the ... is ?, so, etc. It looks like ternary (... ?? ... !! ...) turns into an if. I see that none of these tokens have correspondingly named Actions.pm6 methods. As a somewhat wild guess perhaps the code generation that corresponds to them is handled by this part of method EXPR. (Anyone know, or care to follow the instructions in this blog post to find out?)
The definitions in Bool.pm6 and Mu.pm6 show that:
In Mu.pm6 the method .Bool returns False for an undefined object and .defined otherwise. In turn .defined returns False for an undefined object and True otherwise. So these are the default.
.defined is documented as overridden in two built in classes and .Bool in 19.
so, .so, and ? all call the same code that defers to Bool / .Bool. In theory classes/modules could override these instead of, or as well, as overriding .Bool or .defined, but I can't see why anyone would ever do that either in the built in classes/modules or userland ones.
not and ! are the same (except that use of ! with :exists dies) and both turn into calls to nqp::hllbool(nqp::not_i(nqp::istrue(...))). I presume the primary reason they don't go through the usual .Bool route is to avoid marking handling of Failures.
There are .so and .not methods defined in Mu.pm6. They just call .Bool.
There are boolean bitwise operators that include a ?. They are far adrift from your question but their code is included in the links above.

Why cannot CMake functions return values?

A question for CMake experts out-there.
According to the CMake function documentation a function simply does not return anything. To change variable values one has to pass it to the function, and inside the function set the new value specifying the PARENT_SCOPE option.
Fine, this is a well-known feature of CMake.
My question here is not about the how, rather on why: why CMake functions do not return values? What is the idea behind?
For example, a function cannot be used inside a if expression, or called inside a set command.
If I remember correctly, it is the same with autotools, therefore I do not think it is like this just by chance.
Is there any expert that knows why?
You can find a partial answer by Ken Martin in a message from the CMake's mailing list:
With respect to the general question of functions returning values it
could be done but it is a bit of a big change. Functions and commands
look the same (and should act the same IMO) to the people using them.
So really we are talking about commands returning values. This is
mostly just a syntax issue. Right now we have
command(arg arg arg
)
to support return values we need something that could handle
command (arg command2(arg arg) arg arg
)
or in your case
if(assertdef(foo))
or in another case
set(foo get_property(
))
etc. This hits the parser and the argument processing in CMake but I
think it could be done. I guess I’m not sure if we should do it.
Open to opinions here.

Variable Encapsulation in Case Statement

While modifying an existing program's CASE statement, I had to add a second block where some logic is repeated to set NetWeaver portal settings. This is done by setting values in a local variable, then assigning that variable to a Changing parameter. I copied over the code and did a Pretty Print, expecting to compiler to complain about the unknown variable. To my surprise however, this code actually compiles just fine:
CASE i_actionid.
WHEN 'DOMIGO'.
DATA: ls_portal_actions TYPE powl_follow_up_sty.
CLEAR ls_portal_actions.
ls_portal_actions-bo_system = 'SAP_ECC_Common'.
" [...]
c_portal_actions = ls_portal_actions.
WHEN 'EBELN'.
ls_portal_actions-bo_system = 'SAP_ECC_Common'.
" [...]
C_PORTAL_ACTIONS = ls_portal_actions.
ENDCASE.
As I have seen in every other programming language, the DATA: declaration in the first WHEN statement should be encapsulated and available only inside that switch block. Does SAP ignore this encapsulation to make that value available in the entire CASE statement? Is this documented anywhere?
Note that this code compiles just fine and double-clicking the local variable in the second switch takes me to the data declaration in the first. I have however not been able to test that this code executes properly as our testing environment is down.
In short you cannot do this. You will have the following scopes in an abap program within which to declare variables (from local to global):
Form routine: all variables between FORM and ENDFORM
Method: all variables between METHOD and ENDMETHOD
Class - all variables between CLASS and ENDCLASS but only in the CLASS DEFINITION section
Function module: all variables between FUNCTION and ENDFUNCTION
Program/global - anything not in one of the above is global in the current program including variables in PBO and PAI modules
Having the ability to define variables locally in a for loop or if is really useful but unfortunately not possible in ABAP. The closest you will come to publicly available documentation on this is on help.sap.com: Local Data in the Subroutine
As for the compile process do not assume that ABAP will optimize out any variables you do not use it won't, use the code inspector to find and remove them yourself. Since ABAP works the way it does I personally define all my variables at the start of a modularization unit and not inline with other code and have gone so far as to modify the pretty printer to move any inline definitions to the top of the current scope.
Your assumption that a CASE statement defines its own scope of variables in ABAP is simply wrong (and would be wrong for a number of other programming languages as well). It's a bad idea to litter your code with variable declarations because that makes it awfully hard to read and to maintain, but it is possible. The DATA statements - as well as many other declarative statements - are only evaluated at compile time and are completely ignored at runtime. You can find more information about the scopes in the online documentation.
The inline variable declarations are now possible with the newest version of SAP Netweaver. Here is the link to the documentation DATA - inline declaration. Here are also some guidelines of a good and bad usage of this new feature
Here is a quote from this site:
A declaration expression with the declaration operator DATA declares a variable var used as an operand in the current writer position. The declared variable is visible statically in the program from DATA(var) and is valid in the current context. The declaration is made when the program is compiled, regardless of whether the statement is actually executed.
Personally have not had time to check it out yet, because of lack of access to such system.

Write a compiler for a language that looks ahead and multiple files?

In my language I can use a class variable in my method when the definition appears below the method. It can also call methods below my method and etc. There are no 'headers'. Take this C# example.
class A
{
public void callMethods() { print(); B b; b.notYetSeen();
public void print() { Console.Write("v = {0}", v); }
int v=9;
}
class B
{
public void notYetSeen() { Console.Write("notYetSeen()\n"); }
}
How should I compile that? what i was thinking is:
pass1: convert everything to an AST
pass2: go through all classes and build a list of define classes/variable/etc
pass3: go through code and check if there's any errors such as undefined variable, wrong use etc and create my output
But it seems like for this to work I have to do pass 1 and 2 for ALL files before doing pass3. Also it feels like a lot of work to do until I find a syntax error (other than the obvious that can be done at parse time such as forgetting to close a brace or writing 0xLETTERS instead of a hex value). My gut says there is some other way.
Note: I am using bison/flex to generate my compiler.
My understanding of languages that handle forward references is that they typically just use the first pass to build a list of valid names. Something along the lines of just putting an entry in a table (without filling out the definition) so you have something to point to later when you do your real pass to generate the definitions.
If you try to actually build full definitions as you go, you would end up having to rescan repatedly, each time saving any references to undefined things until the next pass. Even that would fail if there are circular references.
I would go through on pass one and collect all of your class/method/field names and types, ignoring the method bodies. Then in pass two check the method bodies only.
I don't know that there can be any other way than traversing all the files in the source.
I think that you can get it down to two passes - on the first pass, build the AST and whenever you find a variable name, add it to a list that contains that blocks' symbols (it would probably be useful to add that list to the corresponding scope in the tree). Step two is to linearly traverse the tree and make sure that each symbol used references a symbol in that scope or a scope above it.
My description is oversimplified but the basic answer is -- lookahead requires at least two passes.
The usual approach is to save B as "unknown". It's probably some kind of type (because of the place where you encountered it). So you can just reserve the memory (a pointer) for it even though you have no idea what it really is.
For the method call, you can't do much. In a dynamic language, you'd just save the name of the method somewhere and check whether it exists at runtime. In a static language, you can save it in under "unknown methods" somewhere in your compiler along with the unknown type B. Since method calls eventually translate to a memory address, you can again reserve the memory.
Then, when you encounter B and the method, you can clear up your unknowns. Since you know a bit about them, you can say whether they behave like they should or if the first usage is now a syntax error.
So you don't have to read all files twice but it surely makes things more simple.
Alternatively, you can generate these header files as you encounter the sources and save them somewhere where you can find them again. This way, you can speed up the compilation (since you won't have to consider unchanged files in the next compilation run).
Lastly, if you write a new language, you shouldn't use bison and flex anymore. There are much better tools by now. ANTLR, for example, can produce a parser that can recover after an error, so you can still parse the whole file. Or check this Wikipedia article for more options.

Writing a TemplateLanguage/VewEngine

Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those if I had time/do it to learn something new kind of projects.
I've spent time looking at CoCo/R and ANTLR, and honestly, it makes my brain hurt, but some of CoCo/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a ganer at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just test-replace template syntax with c# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.
The markup rules refer to a limited subset of csharp syntax rules declared in CodeGrammar.cs - those are a subset because Spark only needs to recognize enough csharp to adjust single-quotes around strings to double-quotes, match curley braces, etc.
The individual rules themselves are of type ParseAction<TValue> delegate which accept a Position and return a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.
That isn't very useful on it's own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from a Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
Step 1. Use regular expressions (regexp substitution) to split your input template string to a token list, for example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
Step 2. Convert your token list to a syntax tree:
* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')
class Instruction {
}
class Write : Instruction {
string text;
}
class Substitute : Instruction {
string varname;
}
class Sequence : Instruction {
Instruction[] items;
}
class If : Instruction {
string condition;
Instruction then;
Instruction else;
}
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
Another, alternative approach (instead of steps 1--3) if your language supports eval() (such as Perl, Python, Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
There are sooo many thing to do. But it does work for on simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.