<!WHATVER!> syntax in Kotlin? (Angle brackets wrapping exclamation points) - kotlin

I saw this syntax I'm not familiar with in the Kotlin compiler test suite.
// !DIAGNOSTICS: +UNUSED_LAMBDA_EXPRESSION, +UNUSED_VARIABLE
fun unusedLiteral(){
<!UNUSED_LAMBDA_EXPRESSION!>{ ->
val <!UNUSED_VARIABLE!>i<!> = 1
}<!>
}
What does <!UNUSED_LAMBDA_EXPRESSION!>...<!> mean?
Found in unusedLiteral.kt
The term UNUSED_LAMBDA_EXPRESSION is declared in Errors.kt to be:
DiagnosticFactory0<KtLambdaExpression> UNUSED_LAMBDA_EXPRESSION = DiagnosticFactory0.create(WARNING);

This syntax is not valid Kotlin. It is only used in the test data files of Kotlin's test pipeline. That is, only the test runners recognises this syntax, not the Kotlin compiler. Specifically, the <!DIAGNOSTIC_NAME!>foo<!> syntax denotes a handler. Handlers do checks on things, or output information to a file. In this case, this syntax checks that there is indeed the specified diagnostic being emitted at that point in the file.
Also note that the // !DIAGNOSTICS comment at the top is not just a comment. It denotes a directive. Directives are like the options for running the test.
I highly recommend you read compiler/testData/diagnostics/ReadMe.md, which explains how diagnostic tests work specifically, and if you're really interested in this stuff, check out compiler/test-infrastructure/ReadMe.md too, which tells you all about how the whole test pipeline works in general.

Related

How can one invoke the non-extension `run` function (the one without scope / "object reference") in environments where there is an object scope?

Example:
data class T(val flag: Boolean) {
constructor(n: Int) : this(run {
// Some computation here...
<Boolean result>
})
}
In this example, the custom constructor needs to run some computation in order to determine which value to pass to the primary constructor, but the compiler does not accept the run, citing Cannot access 'run' before superclass constructor has been called, which, if I understand correctly, means instead of interpreting it as the non-extension run (the variant with no object reference in https://kotlinlang.org/docs/reference/scope-functions.html#function-selection), it construes it as a call to this.run (the variant with an object reference in the above table) - which is invalid as the object has not completely instantiated yet.
What can I do in order to let the compiler know I mean the run function which is not an extension method and doesn't take a scope?
Clarification: I am interested in an answer to the question as asked, not in a workaround.
I can think of several workarounds - ways to rewrite this code in a way that works as intended without calling run: extracting the code to a function; rewriting it as a (possibly highly nested) let expression; removing the run and invoking the lambda (with () after it) instead (funnily enough, IntelliJ IDEA tags that as Redundant lambda creation and suggests to Inline the body, which reinstates the non-compiling run). But the question is not how to rewrite this without using run - it's how to make run work in this context.
A good answer should do one of the following things:
Explain how to instruct the compiler to call a function rather than an extension method when a name is overloaded, in general; or
Explain how to do that specifically for run; or
Explain that (and ideally also why) it is not possible to do (ideally with supporting references); or
Explain what I got wrong, in case I got something wrong and the whole question is irrelevant (e.g. if my analysis is incorrect, and the problem is something other than the compiler construing the call to run as this.run).
If someone has a neat workaround not mentioned above they're welcome to post it in a comment - not as an answer.
In case it matters: I'm using multi-platform Kotlin 1.4.20.
Kotlin favors the receiver overload if it is in scope. The solution is to use the fully qualified name of the non-receiver function:
kotlin.run { //...
The specification is explained here.
Another option when the overloads are not in the same package is to use import renaming, but that won't work in this case since both run functions are in the same package.

Alter how arguments are processed before they're passed to sub MAIN

Given the documentation and the comments on an earlier question, by request I've made a minimal reproducible example that demonstrates a difference between these two statements:
my %*SUB-MAIN-OPTS = :named-anywhere;
PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
Given a script file with only this:
#!/usr/bin/env raku
use MyApp::Tools::CLI;
and a module file in MyApp/Tools called CLI.pm6:
#PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
my %*SUB-MAIN-OPTS = :named-anywhere;
proto MAIN(|) is export {*}
multi MAIN( 'add', :h( :$hostnames ) ) {
for #$hostnames -> $host {
say $host;
}
}
multi MAIN( 'remove', *#hostnames ) {
for #hostnames -> $host {
say $host;
}
}
The following invocation from the command line will not result in a recognized subroutine, but show the usage:
mre.raku add -h=localhost -h=test1
Switching my %*SUB-MAIN-OPTS = :named-anywhere; for PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True; will print two lines with the two hostnames provided, as expected.
If however, this is done in a single file as below, both work identical:
#!/usr/bin/env raku
#PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
my %*SUB-MAIN-OPTS = :named-anywhere;
proto MAIN(|) is export {*}
multi MAIN( 'add', :h( :$hostnames )) {
for #$hostnames -> $host {
say $host;
}
}
multi MAIN( 'remove', *#hostnames ) {
for #hostnames -> $host {
say $host;
}
}
I find this hard to understand.
When reproducing this, be alert of how each command must be called.
mre.raku remove localhost test1
mre.raku add -h=localhost -h=test1
So a named array-reference is not recognized when this is used in a separate file with my %*SUB-MAIN-OPTS = :named-anywhere;. While PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True; always works. And for a slurpy array, both work identical in both cases.
The problem is that it isn't the same variable in both the script and in the module.
Sure they have the same name, but that doesn't mean much.
my \A = anon class Foo {}
my \B = anon class Foo {}
A ~~ B; # False
B ~~ A; # False
A === B; # False
Those two classes have the same name, but are separate entities.
If you look at the code for other built-in dynamic variables, you see something like:
Rakudo::Internals.REGISTER-DYNAMIC: '$*EXECUTABLE-NAME', {
PROCESS::<$EXECUTABLE-NAME> := $*EXECUTABLE.basename;
}
This makes sure that the variable is installed into the right place so that it works for every compilation unit.
If you look for %*SUB-MAIN-OPTS, the only thing you find is this line:
my %sub-main-opts := %*SUB-MAIN-OPTS // {};
That looks for the variable in the main compilation unit. If it isn't found it creates and uses an empty Hash.
So when you try do it in a scope other than the main compilation unit, it isn't in a place where it could be found by that line.
To test if adding that fixes the issue, you can add this to the top of the main compilation unit. (The script that loads the module.)
BEGIN Rakudo::Internals.REGISTER-DYNAMIC: '%*SUB-MAIN-OPTS', {
PROCESS::<%SUB-MAIN-OPTS> := {}
}
Then in the module, write this:
%*SUB-MAIN-OPTS = :named-anywhere;
Or better yet this:
%*SUB-MAIN-OPTS<named-anywhere> = True;
After trying this, it seems to work just fine.
The thing is, that something like that used to be there.
It was removed on the thought that it slows down every Raku program.
Though I think that any slowdown it caused would still be an issue as the line that is still there has to look to see if there is a dynamic variable of that name.
(There are more reasons given, and I frankly disagree with all of them.)
May a cuppa bring enlightenment to future SO readers pondering the meaning of things.[1]
Related answers by Liz
I think Liz's answer to an SO asking a similar question may be a good read for a basic explanation of why a my (which is like a lesser our) in the mainline of a module doesn't work, or at least confirmation that core devs know about it.
Her later answer to another SO explains how one can use my by putting it inside a RUN-MAIN.
Why does a slurpy array work by default but not named anywhere?
One rich resource on why things are the way they are is the section Declaring a MAIN subroutine of S06 (Synopsis on Subroutines)[2].
A key excerpt:
As usual, switches are assumed to be first, and everything after the first non-switch, or any switches after a --, are treated as positionals or go into the slurpy array (even if they look like switches).
So it looks like this is where the default behavior, in which nameds can't go anywhere, comes from; it seems that #Larry[3] was claiming that the "usual" shell convention was as described, and implicitly arguing that this should dictate that the default behavior was as it is.
Since Raku was officially released RFC: Allow subcommands in MAIN put us on the path to todays' :named-anywhere option. The RFC presented a very powerful 1-2 punch -- an unimpeachable two line hackers' prose/data argument that quickly led to rough consensus, with a working code PR with this commit message:
Allow --named-switches anywhere in command line.
Raku was GNU-like in that it has '--double-dashes' and that it stops interpreting named parameters when it encounters '--', but unlike GNU-like parsing, it also stopped interpreting named parameters when encountering any positional argument. This patch makes it a bit more GNU-like by allowing named arguments after a positional, to prepare for allowing subcommands.
> Alter how arguments are processed before they're passed to sub MAIN
In the above linked section of S06 #Larry also wrote:
Ordinarily a top-level Raku "script" just evaluates its anonymous mainline code and exits. During the mainline code, the program's arguments are available in raw form from the #*ARGS array.
The point here being that you can preprocess #*ARGS before they're passed to MAIN.
Continuing:
At the end of the mainline code, however, a MAIN subroutine will be called with whatever command-line arguments remain in #*ARGS.
Note that, as explained by Liz, Raku now has a RUN-MAIN routine that's called prior to calling MAIN.
Then comes the standard argument processing (alterable by using standard options, of which there's currently only the :named-anywhere one, or userland modules such as SuperMAIN which add in various other features).
And finally #Larry notes that:
Other [command line parsing] policies may easily be introduced by calling MAIN explicitly. For instance, you can parse your arguments with a grammar and pass the resulting Match object as a Capture to MAIN.
A doc fix?
Yesterday you wrote a comment suggesting a doc fix.
I now see that we (collectively) know about the coding issue. So why is the doc as it is? I think the combination of your SO and the prior ones provide enough anecdata to support at least considering filing a doc issue to the contrary. Then again Liz hints in one of the SO's that a fix might be coming, at least for ours. And SO is itself arguably doc. So maybe it's better to wait? I'll punt and let you decide. At least you now have several SOs to quote if you decide to file a doc issue.
Footnotes
[1] I want to be clear that if anyone perceives any fault associated with posting this SO then they're right, and the fault is entirely mine. I mentioned to #acw that I'd already done a search so they could quite reasonably have concluded there was no point in them doing one as well. So, mea culpa, bad coffee inspired puns included. (Bad puns, not bad coffee.)
[2] Imo these old historical speculative design docs are worth reading and rereading as you get to know Raku, despite them being obsolete in parts.
[3] #Larry emerged in Raku culture as a fun and convenient shorthand for Larry Wall et al, the Raku language team led by Larry.

inspect regex made with a variable

sub make-regex {
my $what's-in-the-box = rand > .5 ?? 'x' !! 'y';
/$what's-in-the-box/
}
my $lol-who-knows = make-regex;
$lol-who-knows.gist.say;
How do you see the innards of the regex (i.e. x or y)? Forcing a match is not a solution.
How do you see the innards of the regex (i.e. x or y)?
You:
Statically compile the regex. Doing so will involve use of a raku compiler, i.e. Rakudo.
Dynamically evaluate the regex so that the $what's-in-the-box variable in your regex gets interpolated and turns into x or y. Doing so will involve running the regex as code. That in turn means both using Rakudo and running the regex with a Match object invocant (or sub-class instance or, conceivably, a mock object equivalent).
View the resulting regex. Doing so will involve using compiler (Rakudo) toolchain specific introspection or debug functionality.
Forcing a match is not a solution.
You can run a regex with no input if your concern is just to avoid the need for a successful match:
say Match.new.&$lol-who-knows; # #<failed match>
But it must be run, otherwise the $what's-in-the-box variable won't turn into an x or y. You might think you could cheat on this by writing something that mimics this part of raku regex construction/use but there's good reason to think that's not going to work out[1].
And then you must view its innards, using Rakudo toolchain features, after you've started running it and before it finishes running.
A regex is a method
In raku, a Regex is code, a sub-class of Method.
Until you run it, by using it in a match, that code is just /$what's-in-the-box/ (aka regex { $what's-in-the-box }). It's the equivalent of something like:
method {
...self should be a Match or a sub-class of it
...if self is not an instance, create a new one
...do matching -- compiler evaluates $what's-in-the-box during this
...return updated self / new instance
}
(To see a bit more detail, see Moritz's answer to the SO Can I change the slang inside a method?.)
A routine is a closure
You create your regex inside the closure named make-regex. The compiler spots that you've used $what's-in-the-box and therefore hangs on to the variable even after the make-regex closure returns. Later on, if/when your regex is run, the $what's-in-the-box variable will be replaced by its value at the moment the regex is run.
Footnotes
[1] There are plenty of huge complications you'll encounter if you try to do an end-run around using the compiler. Even something simple like interpolation is non-trivial. To quote Jonathan Worthington, in 2020:
I thought oh I'm only building [a tool that's] not the real compiler. I can ... have a simpler model of the symbol resolution. In the end it turned out that no I couldn't really. That started to create us some problems. When we aligned the way that we resolved symbols with the same algorithms and lookup structures that was being used in the compiler then suddenly it all became a lot simpler.

Static testing for Scala

There are some nice libraries for testing in Scala (Specs, ScalaTest, ScalaCheck). However, with Scala's powerful type system, important parts of an API being developed in Scala are expressed statically, usually in the form of some undesirable or disallowed behavior being prevented by the compiler.
So, what is the best way to test whether something is prevented by the compiler when designing an library or other API? It is unsatisfying to comment out code that is supposed to be uncompilable and then uncomment it to verify.
A contrived example testing List:
val list: List[Int] = List(1, 2, 3)
// should not compile
// list.add("Chicka-Chicka-Boom-Boom")
Does one of the existing testing libraries handle cases like this? Is there an approach that people use that works?
The approach I was considering was to embed code in a triple-quote string or an xml element and call the compiler in my test. Calling code looking something like this:
should {
notCompile(<code>
val list: List[Int] = List(1, 2, 3)
list.add("Chicka-Chicka-Boom-Boom")
</code>)
}
Or, something along the lines of an expect-type script called on the interpreter.
I have created some specs executing some code snippets and checking the results of the interpreter.
You can have a look at the Snippets trait. The idea is to store in some org.specs.util.Property[Snippet] the code to execute:
val it: Property[Snippet] = Property(Snippet(""))
"import scala.collection.List" prelude it // will be prepended to any code in the it snippet
"val list: List[Int] = List(1, 2, 3)" snip it // snip some code (keeping the prelude)
"list.add("Chicka-Chicka-Boom-Boom")" add it // add some code to the previously snipped code. A new snip would remove the previous code (except the prelude)
execute(it) must include("error: value add is not a member of List[Int]") // check the interpreter output
The main drawback I found with this approach was the slowness of the interpreter. I don't know yet how this could be sped up.
Eric.

Writing a TemplateLanguage/VewEngine

Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those if I had time/do it to learn something new kind of projects.
I've spent time looking at CoCo/R and ANTLR, and honestly, it makes my brain hurt, but some of CoCo/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a ganer at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just test-replace template syntax with c# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.
The markup rules refer to a limited subset of csharp syntax rules declared in CodeGrammar.cs - those are a subset because Spark only needs to recognize enough csharp to adjust single-quotes around strings to double-quotes, match curley braces, etc.
The individual rules themselves are of type ParseAction<TValue> delegate which accept a Position and return a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.
That isn't very useful on it's own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from a Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
Step 1. Use regular expressions (regexp substitution) to split your input template string to a token list, for example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
Step 2. Convert your token list to a syntax tree:
* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')
class Instruction {
}
class Write : Instruction {
string text;
}
class Substitute : Instruction {
string varname;
}
class Sequence : Instruction {
Instruction[] items;
}
class If : Instruction {
string condition;
Instruction then;
Instruction else;
}
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
Another, alternative approach (instead of steps 1--3) if your language supports eval() (such as Perl, Python, Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
There are sooo many thing to do. But it does work for on simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.