For spirit::x3, what is the right way to deal with unknown symbols? - boost-spirit-x3

I'm a newbie to x3... The code is adapted from roman.cpp in the x3 tutorial. Suppose I have a symbol table like the one below:
struct car_models_ : x3::symbols<unsigned>
{
    car_models_()
    {
        add
            ("sedan", 1)
            ("compact", 2)
            ("suv", 3)
        ;
    }
} car_models;
Then parse,
char const *first = "Model: sedan";
char const *last = first + std::strlen(first);
parse(first, last, "Model: " >> car_models[action()]);
If there is a new model not listed in the symbol table, what would be the right way to handle it? Is there a way to add a wildcard as the last entry in the symbol table, and then somehow pass an unknown model to action (e.g., as the number 0 in this case)?

There is no way to do it inside the symbol table itself. One possibility is:
auto ext_car_models = car_models | (x3::omit[*x3::lower] >> x3::attr(0u));
Then to parse:
parse(first, last, "Model: " >> ext_car_models[action()]);
Ignoring the attribute for a moment, your symbol table is effectively syntactic sugar for:
x3::string("sedan") | "compact" | "suv"
So an unknown string in that position has to be handled the same way: you will need to define a parser that describes what a model string looks like, for example *x3::lower.
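Putting the pieces together, here is a minimal self-contained sketch (the semantic action from the question is replaced by a plain attribute binding for brevity, and the sample inputs are made up):

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>

namespace x3 = boost::spirit::x3;

struct car_models_ : x3::symbols<unsigned>
{
    car_models_()
    {
        add
            ("sedan", 1)
            ("compact", 2)
            ("suv", 3)
        ;
    }
} car_models;

int main()
{
    // Unknown models fall through to omit[*lower] and yield the default 0.
    auto const ext_car_models =
        car_models | (x3::omit[*x3::lower] >> x3::attr(0u));

    for (std::string const input : {"Model: sedan", "Model: coupe"})
    {
        auto first = input.begin();
        unsigned model = 0;
        if (x3::parse(first, input.end(), "Model: " >> ext_car_models, model))
            std::cout << input << " -> " << model << '\n';
    }
}

Here "Model: sedan" yields 1, and the unknown "Model: coupe" yields 0.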

Related

Swift 5.7 RegexBuilder: Nested TryCapture - transform / Mapping Output?

Here in this example I tried to capture two Int values and then capture them together as a struct. This gives a "Thread 1: signal SIGABRT" error.
(NOTE: I know that my example could be fixed by simply not nesting the Captures and handling the pattern matching differently. This is just simplified example code for the sake of this question.)
let intCapture = Regex {
    TryCapture(as: Reference(Int.self)) {
        OneOrMore(.digit)
    } transform: { result in
        return Int(result)
    }
}

let mainPattern = Regex {
    TryCapture(as: Reference(Floor.self)) {
        "floor #: "
        intCapture
        " has "
        intCapture
        " rooms"
    } transform: { (stringMatch, floorInt, roomInt) in
        return Floor(floorNumber: floorInt, roomCount: roomInt)
    }
}

struct Floor {
    let floorNumber: Int
    let roomCount: Int
}
let testString = "floor #: 34 has 25 rooms"
let floorData = testString.firstMatch(of: mainPattern)
After looking into it, I found that the 'floorInt' and 'roomInt' parameters in mainPattern's 'transform' are what cause the problem.
The funny part is that when you Option+click for 'Quick Help', it shows that they are both of type Int! It knows what is there, but you are not able to capture it!
Further, when I erase one of them, let's say 'floorInt', it gives this error:
Contextual closure type '(Substring) throws -> Floor?' expects 1 argument, but 2 were used in closure body
So really, even though for SOME reason it does know that there are two captured Int values there, it doesn't let you access them for the sake of the transform.
Not deterred, I was helped out in another question by a very helpful user who pointed me to the Evolution submission where they mentioned a .mapOutput, but sadly it seems this particular feature was never implemented!
Is there no real way to create a new transformed value from nested transformed values like this? Any help would be greatly appreciated.

Why don't Perl 6's colon pair and name interpolation work together?

I was playing around with Interpolating into names. I was mostly interested in this colon syntax feature to turn a variable into a pair where the identifier is the key.
my %Hamadryas = map { slip $_, 0 }, <
    februa
    honorina
    velutina
    >;
{
    my $pair = :%Hamadryas;
    say $pair; # Hamadryas => { ... }
}
put '-' x 50;
But, just for giggles, I wanted to try it with variable name interpolation too. I know this is stupid because if I know the name I don't need the colon syntax to get it. But, I also thought that it should work by accident:
{
    my $name = 'Hamadryas';
    # Since I already have the name, I could just:
    #     my $pair = $name => %::($name)
    # But, couldn't I just line up the syntax?
    my $pair = :%::($name); # does not work
    say $pair;
}
Why doesn't that :%::($name) syntax work? That's more a question of when the parser decides that it's not parsing something it wants to understand. I figured it would see the : and start processing a colon pair, then see the % and know it had a hash, even though there's the :: after the %.
Is there a way to make it work with tricks and grammar mutations?

What could be a reason for `_localctx` being null in an antlr4 semantic predicate?

I'm using list labels to gather tokens and semantic predicates to validate sequences in my parser grammar. E.g.
line
    : (text+=WORD | text+=NUMBER)+ ((BLANK | SKIP)+ (text+=WORD | text+=NUMBER)+)+
      {Parser.validateContext(_localctx)}?
      (BLANK | SKIP)*
    ;
where
WORD: [\u0021-\u002F\u003A-\u007E]+; // printable ASCII characters (excluding SP and numbers)
NUMBER: [\u0030-\u0039]+; // printable ASCII number characters
BLANK: '\u0020';
SKIP: '\u0020\u0020' | '\t'; // two SPs or a HT symbol
The part of Parser.validateContext used to validate the line rule would be implemented like this
private static final boolean validateContext(ParserRuleContext context) {
    // ... other contexts
    if (context instanceof LineContext)
        return "<reference-sequence>".equals(Parser.joinTokens(((LineContext) context).text, " "));
    return false;
}
where Parser.joinTokens is defined as
private static String joinTokens(java.util.List<org.antlr.v4.runtime.Token> tokens, String delimiter) {
    StringBuilder builder = new StringBuilder();
    int i = 0, n;
    if ((n = tokens.size()) == 0) return "";
    builder.append(tokens.get(0).getText());
    while (++i < n) builder.append(delimiter).append(tokens.get(i).getText());
    return builder.toString();
}
Both are put in a @parser::members clause at the beginning of the grammar file.
My problem is this: sometimes the _localctx reference is null and I receive "no viable alternative" errors. These are probably caused by the failing predicate guarding the respective rule with no alternative left to match.
Is there a reason (potentially an error on my part) why _localctx would be null?
UPDATE: The answer to this question seems to suggest that semantic predicates are also called during prediction. Maybe during prediction no context is created and _localctx is set to null.
The semantics of _localctx in a predicate are not defined. Allowable behavior includes, but is not limited to the following (and may change during any release):
Failing to compile (no identifier with that name)
Using the wrong context object
Not having a context object (null)
To reference the context of the current rule from within a predicate, you need to use $ctx instead.
Note that the same applies for rule parameters, locals, and/or return values which are used in a predicate. For example, the parameter a cannot be referenced as a, but must instead be $a.
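Applied to the line rule from the question, the only change needed is the argument passed to the predicate:

line
    : (text+=WORD | text+=NUMBER)+ ((BLANK | SKIP)+ (text+=WORD | text+=NUMBER)+)+
      {Parser.validateContext($ctx)}?
      (BLANK | SKIP)*
    ;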

How to use Start States in ML-Lex?

I am creating a tokeniser in ML-Lex, a part of the definition of which is
datatype lexresult = STRING
                   | STRINGOP
                   | EOF
val error = fn x => TextIO.output(TextIO.stdOut,x ^ "\n")
val eof = fn () => EOF
%%
%structure myLang
digit=[0-9];
ws=[\ \t\n];
str=\"[.*]+\";
strop=\[[0-9...?\^]\];
%s alpha;
alpha=[a-zA-Z];
%%
<alpha> {alphanum}+ => (ID);
. => (error ("myLang: ignoring bad character " ^ yytext); lex());
I want the token type ID to be detected only when it starts with or comes after an "alpha". I know that writing it as
{alpha}+ {alphanum}* => (ID);
is an option, but I also need to learn how to use start states for some other purposes. Can someone please help me with this?
The information you need is in the documentation that comes with SML, available in various places. Many university courses have online notes containing working examples.
The first thing to note from your example code is that you have overloaded the name alpha, using it both for a state and for a pattern. This is probably not a good idea. The pattern alphanum is not defined, and the result ID is not declared. These are some basic errors which you should probably fix before thinking about using states, or posting a question here on SO. Asking for help with code that has such obvious faults in it does not encourage help from the experts. :-)
Having fixed up those errors, we can start using states. Here is my version of your code:
datatype lexresult = ID
                   | EOF
val error = fn x => TextIO.output(TextIO.stdOut,x ^ "\n")
val eof = fn () => EOF
%%
%structure myLang
digit=[0-9];
ws=[\ \t\n];
str=\"[.*]+\";
strop=\[[0-9...?\^]\];
%s ALPHA_STATE;
alpha=[a-zA-Z];
alphanum=[a-zA-Z0-9];
%%
<INITIAL>{alpha} => (YYBEGIN ALPHA_STATE; continue());
<ALPHA_STATE>{alphanum}+ => (YYBEGIN INITIAL; TextIO.output(TextIO.stdOut,"ID\n"); ID);
. => (error ("myLang: ignoring bad character " ^ yytext); lex());
You can see I've added ID to the lexresult, named the state ALPHA_STATE, and added the alphanum pattern. Now let's look at how the state code works:
There are two states in this program: INITIAL and ALPHA_STATE (all lex programs have INITIAL as the default state). Recognition always begins in the INITIAL state. The rule <INITIAL>{alpha} => indicates that if you encounter a letter while in the INITIAL state (i.e. NOT in ALPHA_STATE), then it is a match and the action should be invoked. The action for this rule works as follows:
YYBEGIN ALPHA_STATE; (* Switch from INITIAL state to ALPHA_STATE *)
continue() (* and keep going *)
Now that we are in ALPHA_STATE, the rules defined for this state are enabled, including <ALPHA_STATE>{alphanum}+ =>. The action on this rule switches back to the INITIAL state and records the match.
For a longer example of using states (lex rather than ML-lex) you can see my answer here: Error while parsing comments in lex.
To test this ML-LEX program I referenced this helpful question: building a lexical analyser using ml-lex, and generated the following SML program:
use "states.lex.sml";
open myLang
val lexer =
let
fun input f =
case TextIO.inputLine f of
SOME s => s
| NONE => raise Fail "Implement proper error handling."
in
myLang.makeLexer (fn (n:int) => input TextIO.stdIn)
end
val nextToken = lexer();
and just for completeness, it generated the following output demonstrating the match:
c:\Users\Brian>"%SMLNJ_HOME%\bin\sml" main.sml
Standard ML of New Jersey v110.78 [built: Sun Dec 21 15:52:08 2014]
[opening main.sml]
[opening states.lex.sml]
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[autoloading done]
structure myLang :
sig
structure UserDeclarations : <sig>
exception LexError
structure Internal : <sig>
val makeLexer : (int -> string) -> unit -> Internal.result
end
val it = () : unit
hello
ID

Flex/Lex - How to know if a variable was declared

My grammar allows:
C → id := E      // assign a value/expression to a variable (VAR)
C → print(id)    // print a variable's (VAR) value
To get it done, my lex file is:
[a-z] {
    yylval.var_index = get_var_index(yytext);
    return VAR;
}
get_var_index returns the index of the variable in the list; if it does not exist, it creates one.
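For reference, a simplified sketch of what get_var_index does (illustrative only; the real helper may differ):

/* Look the name up in the table of known variables;
   register it on first sight and return its index. */
static char var_names[26];   /* variables are single letters a-z */
static int n_vars = 0;

int get_var_index(const char *name)
{
    for (int i = 0; i < n_vars; i++)
        if (var_names[i] == name[0])
            return i;                 /* already known */
    var_names[n_vars] = name[0];      /* create a new entry */
    return n_vars++;
}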
It is working!
The problem is:
Every time a variable is matched in the lex file, it creates an index for that variable.
I have to report if 'print(a)' is called and 'a' was not declared, and that will never happen since print(a) always creates an index for 'a'.
How can I solve it?
Piece of yacc file:
%union {
    int int_val;
    int var_index;
}
%token <int_val> INTEGER
%token <var_index> VAR
...
| PRINT '(' VAR ')' {
      n_lines++;
      printf("%d\n", values[$3]);
  }
...
| VAR { $$ = values[$1]; }
This does seem a bit like a Computer Science class homework question for us to do.
Normally one would not use bison/yacc in this way. One would do the parse with bison/yacc and build a parse tree, which then gets walked to perform semantic checks, such as checking for declaration before use. The identifiers would normally be managed in a symbol table rather than just a table of values, so that other attributes, such as declared, can be managed as well. It's for these reasons that it looks like an exercise rather than a realistic application of the tools. OK; those disclaimers disposed of, let's get to an answer.
The problem would be solved by remembering what has been declared and what has not. If one does not plan to use a full symbol table, then a simple array of booleans indicating which values are valid could be used. The array can be initialised to false, set to true on declaration, and checked whenever a variable is used. As C uses ints for booleans, we can use that. The only changes needed are in the bison/yacc. You omitted any syntax for the declarations, but as you indicated the variables are declared, there must be some; I guessed.
%union {
    int int_val;
    int var_index;
}
%{
int declared[MAX_TABLE_SIZE];   /* initialize to zero before starting parse */
%}
%token <int_val> INTEGER
%token <var_index> VAR
...
| DECLARE '(' VAR ')' { n_lines++; declared[$3] = 1; }
...
| PRINT '(' VAR ')' {
      n_lines++;
      if (declared[$3]) printf("%d\n", values[$3]);
      else printf("Variable undeclared\n");
  }
...
| VAR { $$ = values[$1]; /* perhaps need to show more syntax to show how VAR is used */ }