How to debug 'no production for X' in lbnf / bnfc grammar? - grammar

When playing around with LBNF/BNFC, in some cases I would like to optionally allow the plural form. However, it always says 'no production for 'Plural' appearing in rule' and I do not get why.
Relevant line below. SomeOther and SomeToken are basically strings.
HeadAuthors. Authors::= "AUTHOR" [Plural] ":" SomeOther SomeToken ;
Plural. Plural::= "S" ;

I would skip the list, and make Plural into a rule like this
rules Plural ::= "S" | ;
For documentation about the rules macro, see https://bnfc.readthedocs.io/en/latest/lbnf.html#rules.
If you want to keep the list, you need to give a separator or terminator for Plural (see https://bnfc.readthedocs.io/en/latest/lbnf.html#terminator); otherwise [Plural] is never defined as a list, which is why no production is found for it. You can just write
terminator Plural "" ;

Related

ANTLR not matching empty comments

I am using ANTLR to parse a language which uses the colon for both a comment indicator and as part of a 'becomes equal to' assignment. So for example in the line
Index := 2 :Set Index
I need to recognize the first part as an assignment statement and the text after the second colon as a comment. Currently I do this using the rule:
COMMENT : ':'+ ~[:='\r\n']*;
This seems to work OK apart from when the colon is immediately followed by a new line. e.g. in the line
Index := 2 :
the newline occurs immediately after the second colon. In this case the comment is not recognized and the rest of the code is not parsed in the correct context. If there is a single space after the second colon the line is parsed correctly.
I expected the '\r\n' to cope with this, but it only seems to work if there is at least one character after the comment symbol. Have I missed something in the rule?
The square brackets denote a set of characters, written without any quotes. Hence your '\r\n' literal doesn't work there (you should have got a warning that the apostrophe is included more than once in the char range).
Define the comment like this instead:
COMMENT: ':'+ ~[:=\n\r]*;
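As a rough analogy only (this is Python's re module, not ANTLR, and the names QUOTED_SET, PLAIN_SET and comment_token are made up for illustration), a character class in a regex behaves the same way: quotes inside the brackets are ordinary members of the set. A minimal sketch, assuming the input starts at the comment colon:

```python
import re

# Analogue of ~[:='\r\n']* : the apostrophe is just another excluded character.
QUOTED_SET = re.compile(r":+[^:='\r\n]*")
# Analogue of ~[:=\n\r]* : only ':', '=', CR and LF are excluded.
PLAIN_SET = re.compile(r":+[^:=\r\n]*")

def comment_token(pattern, text):
    """Return the comment text matched at the start of `text`, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None

sample = ":it's a comment\n"
print(repr(comment_token(QUOTED_SET, sample)))  # stops at the apostrophe
print(repr(comment_token(PLAIN_SET, sample)))   # runs to the end of the line
```

With the quoted version, a comment containing an apostrophe is cut short, which is the practical effect of the stray quotes in the set.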

ANTLR - Checking for a String's "construction"

Currently working with ANTLR and found out something interesting that's not working as I intended.
I try to run something along the lines of "test 10 cm" through my grammar and it fails, however "test 10 c m" works as the previous should. The "cm" portion of the code is what I call "wholeunit" in my grammar and it is as follows:
wholeunit :
siunit
| unitmod siunit
| wholeunit NUM
| wholeunit '/' wholeunit
| wholeunit '.' wholeunit
;
What it's doing right now is the "unitmod siunit" portion of the rule where unitmod = c and siunit = m .
What I'd like to know is how would I make it so the grammar would still follow the rule "unitmod siunit" without the need for a space in the middle, I might be missing something huge. (Yes, I have spaces and tabs marked to be skipped)
The probable cause is that "cm" is being lexed as a single token (possibly the same token type as "test"), rather than as "c" and "m" separately.
Remember that in the ANTLR lexer, the rule matching the longest input wins.
One possible solution is to make wholeunit a lexer rule rather than a parser rule, and to make sure it is defined above the rule that matches any word (like "test"). If the same input can be matched by multiple rules, ANTLR selects the rule that is defined first.
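The two lexer behaviours mentioned above (longest match wins, ties go to the first rule) can be sketched with a toy tokenizer. This is not ANTLR, just a simplified Python model, and the rule names and patterns below are assumptions made up for the example:

```python
import re

def tokenize(text, rules):
    """Toy maximal-munch lexer: longest match wins, ties go to the earliest rule."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # whitespace is skipped
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and len(match.group(0)) > best_len:
                best_name, best_len = name, len(match.group(0))
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rules resembling the question: 'c' and 'm' as separate tokens.
split_rules = [("UNITMOD", r"c"), ("SIUNIT", r"m"),
               ("NUM", r"[0-9]+"), ("WORD", r"[a-z]+")]
# Hypothetical fix: the whole unit is one lexer rule, defined before WORD.
unit_rules = [("WHOLEUNIT", r"cm|m"),
              ("NUM", r"[0-9]+"), ("WORD", r"[a-z]+")]

print(tokenize("test 10 cm", split_rules))
print(tokenize("test 10 c m", split_rules))
print(tokenize("test 10 cm", unit_rules))
```

In the first call "cm" comes out as one WORD token, because WORD matches two characters while UNITMOD matches only one. In the last call WHOLEUNIT and WORD both match two characters, so the rule listed first is chosen.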

Failed to parse command using ANTLR3 grammar, if command has same word which is declared as rule

I am facing a problem while parsing some commands with a parser that I have implemented using ANTLR3. The parser fails on commands that contain any word that is declared as a lexer rule in the grammar.
For Example take a look following grammar:
show :
SHOW TABLES '[' projectName? tableName']' -> ^(SHOW TABLES_ ^(PROJECT_NAME projectName)? ^(DATASET_TABLE tableName));
SHOW : S H O W;
If I try to parse the command 'SHOW TABLES [sample-project:SHOW]', parsing fails. But if I change the SHOW word, it works.
SHOW TABLES [sample-project:SHOW] - this works.
I don't want to get name as string which is surrounded in quotes(").
Can anyone suggest solution? I am using ANTLR3.
Thanks in advance.
This is a typical effect of using a reserved word as an identifier. In ANTLR, when you define a reserved word like your SHOW rule, it is implicitly excluded from any identifier rule you define after that keyword rule.
To allow such keywords as identifiers in rules like your tableName, make that rule accept certain (or all) keywords that could appear in that place (they will then not act as keywords there). Example:
tableName:
IDENTIFIER
| SHOW
| <others go here>
;
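The same idea can be modelled outside ANTLR. The following is only a sketch in Python with hypothetical helper names (lex_word, parse_table_name): the lexer always gives a keyword its own token type, and the "table name" rule explicitly lists which keyword token types it also accepts.

```python
KEYWORDS = {"SHOW", "TABLES"}  # hypothetical keyword set for this sketch

def lex_word(word):
    """Return (token_type, text); keywords get their own token type."""
    upper = word.upper()
    return (upper, word) if upper in KEYWORDS else ("IDENTIFIER", word)

def parse_table_name(token, keywords_allowed=frozenset({"SHOW"})):
    """Accept IDENTIFIER, plus any keyword explicitly allowed as a name."""
    token_type, text = token
    if token_type == "IDENTIFIER" or token_type in keywords_allowed:
        return text
    raise SyntaxError(f"unexpected {token_type} where a table name was expected")

print(parse_table_name(lex_word("orders")))  # plain identifier
print(parse_table_name(lex_word("SHOW")))    # keyword accepted as a name
```

Passing an empty keywords_allowed set reproduces the original failure: the SHOW token is rejected where a table name is expected.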

ANTLR Filtering Grammar for specific tokens and ignore everything else possible?

I am currently trying to create a SQL-grammar for the Data Definition Language.
For my program, the parser only needs to recognize some specific sql-commands like "CREATE TABLE", "ALTER TABLE", etc.
Since I am working with automatically generated export files there is also a lot of overhead in the things I am gonna parse like "SET CURRENT PATH" etc. This is not necessary to be parsed and I am wondering if there is a way to ignore "everything else" that is not defined in the SQL-Statements. Hope anyone has some experience with this..
Here's the header part of my grammar:
list: sql_expression ENDOFFILE?;
sql_expression:
((create_statement|alter_table_statement|create_unique_index_statement|insert_statement) SEMICOLON)+
;
...
and I am wondering if it is possible to extend the sql_expression rule like this:
list: sql_expression ENDOFFILE?;
sql_expression:
((create_statement|alter_table_statement|create_unique_index_statement|insert_statement|else_stuff) SEMICOLON)+
;
Thanks in advance!
Yes, you can achieve this.
You can ignore statements like "SET CURRENT PATH" or "CONNECT ..blah blah". These are nothing but SQL*Plus commands. You need to swallow everything that comes after the particular keyword, up to the end of the line.
For example, in the case of "ACCEPT ..blah..", you can create the following rules:
SQL_PLUS_ACCEPT
: 'accept' SPACE ( ~('\r' | '\n') )* (NEWLINE|EOF)
;
accept_key
: SQL_PLUS_ACCEPT
;
This will ignore the "ACCEPT.." command, and you can then parse whatever statements you want to parse. You need to do the same for other SQL*Plus commands like SET, CONNECT, EXIT etc.
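To illustrate the "swallow the rest of the line" idea outside ANTLR, here is a minimal sketch in Python. The keyword list, the pattern and the function name strip_ignored_commands are assumptions for this example, not part of any real grammar:

```python
import re

# A line that starts with one of these keywords, followed by a space or tab,
# is dropped up to and including its line ending (or the end of input).
IGNORED_LINE = re.compile(
    r"^(?:accept|set|connect)[ \t][^\r\n]*(?:\r?\n|\Z)",
    re.IGNORECASE | re.MULTILINE,
)

def strip_ignored_commands(script):
    """Remove whole lines that begin with an ignored command keyword."""
    return IGNORED_LINE.sub("", script)

script = (
    "SET CURRENT PATH = X;\n"
    "CREATE TABLE t (id INT);\n"
    "connect to db\n"
    "ALTER TABLE t ADD c INT;\n"
)
print(strip_ignored_commands(script))
```

Note that this is line-based, like the lexer rule above: a statement that happens to put SET at the start of a continuation line (for example in a multi-line UPDATE) would also be dropped, so the keyword list needs to fit the export files being parsed.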
You can refer to this link

Antlr 3 keywords and identifiers colliding

Surprise, I am building an SQL like language parser for a project.
I had it mostly working, but when I started testing it against real requests it would be handling, I realized it was behaving differently on the inside than I thought.
The main issue in the following grammar is that I define a lexer rule PCT_CONTAINS for the language keyword 'pct_contains'. This works fine, but if I try to match a field like 'attributes.pct_vac', I get the field having text of 'attributes.ac' and a pretty ANTLR error of:
line 1:15 mismatched character u'v' expecting 'c'
GRAMMAR
grammar Select;
options {
language=Python;
}
eval returns [value]
: field EOF
;
field returns [value]
: fieldsegments {print $field.text}
;
fieldsegments
: fieldsegment (DOT (fieldsegment))*
;
fieldsegment
: ICHAR+ (USCORE ICHAR+)*
;
WS : ('\t' | ' ' | '\r' | '\n')+ {self.skip();};
ICHAR : ('a'..'z'|'A'..'Z');
PCT_CONTAINS : 'pct_contains';
USCORE : '_';
DOT : '.';
I have been reading everything I can find on the topic. How the Lexer consumes stuff as it finds it even if it is wrong. How you can use semantic predication to remove ambiguity/how to use lookahead. But everything I read hasn't helped me fix this issue.
Honestly, I don't see how it even CAN be an issue. I must be missing something super obvious, because other grammars I see have lexer rules like EXISTS, but that doesn't cause the parser to take a string like 'existsOrNot' and spit out an IDENTIFIER with the text of 'rNot'.
What am I missing or doing completely wrong?
Convert your fieldsegment parser rule into a lexer rule. As it stands now it will accept input like
"abc
_ abc"
which is probably not what you want. The keyword "pct_contains" won't be matched by this rule, since it is defined separately. If you want to accept the keyword as a regular identifier in certain positions, you will have to include it in the rule that accepts identifiers.
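What a single-token fieldsegment would and would not accept can be sketched with a regex. This is a Python model of the idea rather than ANTLR output, and FIELDSEGMENT, KEYWORDS and classify are names assumed for the example:

```python
import re

# Regex analogue of a lexer rule: letters, then optional '_' + letters groups,
# all inside one token, so no whitespace can appear in the middle.
FIELDSEGMENT = re.compile(r"[A-Za-z]+(?:_[A-Za-z]+)*")
KEYWORDS = {"pct_contains": "PCT_CONTAINS"}  # keyword text -> token type

def classify(text):
    """Return the token type for `text` if it forms exactly one token, else None."""
    if text in KEYWORDS:
        return KEYWORDS[text]          # the keyword keeps its own token type
    if FIELDSEGMENT.fullmatch(text):
        return "FIELDSEGMENT"
    return None

for candidate in ["pct_vac", "pct_contains", "abc\n_ abc"]:
    print(repr(candidate), classify(candidate))
```

Here 'pct_vac' is one FIELDSEGMENT token, 'pct_contains' is still the keyword, and the input with a newline and space in the middle is no longer accepted as a single segment.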