Constant not found: overload in GF - gf

I was trying to overload a function that I created in GF, but whenever I use the overloaded function I keep getting this error message.
Am I using this GF feature incorrectly, or has there been a recent change to this functionality?
Here is what I was trying to do
Sentence = sentence(mkN(“random”));
oper
sentence:overload {
sentence:N-> Utt =
\noun->
mkUtt(mkNP(noun));
sentence:V-> Utt =
\ verb->
mkutt(mkImp(mkV2(verb)));
};
Thank you~

It looks like your code is a snippet, so there may be a few things preventing your grammar from compiling.
To start, if you check the documentation, you can see that there are two forms for overloading operations.
You are using the form that declares the types of the overloaded operations. Since your code also gives their definitions, you should keep your existing code but use the form for defining the operations.
oper sentence = overload {...} ;
The syntax that you are using should be used to define your types before you make the overload oper.

Weird unicode characters
There are a lot of things wrong in this sample. First of all, it seems like you have written it in a word processor instead of a text editor, because the characters are not what they are supposed to be. For instance, : is the real colon, and your code snippet has :, which is a different Unicode codepoint. Look at them side by side:
:: -- first the wrong one, followed by the correct one
:: -- The wrong colon includes more space
:: -- the right colon is thicker and less space
The solution to this problem is to program in a text editor (like Atom or Sublime) or a programming IDE (like VS Code), not in a word processor. Many editors don't have syntax highlighting for GF, but you can see in this video how to use Haskell mode instead.
Actual problem
Now let's suppose we fixed the characters, then we have this.
sentence : overload {
sentence : N -> Utt =
\noun -> mkUtt (mkNP noun) ;
sentence : V -> Utt =
\verb -> mkUtt something_NP ; -- the original had an unrelated bug
} ;
Now we get the error you describe. The solution is to change the first : to a =, like this:
sentence = overload { -- = used to be :
{- sentence : N -> Utt = -- the rest unchanged
\noun -> mkUtt (mkNP noun) ;
sentence : V -> Utt =
\verb -> mkUtt something_NP ; -- the original had an unrelated bug
} ; -}
Working version of the whole
As Paula said, your example is just fragments. Here's a minimal fully working version.
abstract Sentences = {
cat
S ;
fun
Sentence : S ;
}
And here's a concrete, with all the weird Unicode characters swapped out for actual characters. I also fixed the mkImp instance for V2: if you see here in the synopsis, there's no mkImp instance for a single V2, it's either V or V2 and NP.
concrete SentencesEng of Sentences = open SyntaxEng, ParadigmsEng in {
lincat
S = Utt ;
lin
Sentence = sentence (mkN "random") ;
oper
sentence = overload {
sentence : N -> Utt =
\noun -> mkUtt (mkNP noun) ;
sentence : V -> Utt =
\verb -> mkUtt (mkImp (mkV2 verb) something_NP) ;
} ;
}


Antlr4: Mismatch input

My grammar:
qwe.g4
grammar qwe;
query
: COLUMN OPERATOR value EOF
;
COLUMN
: [a-z_]+
;
OPERATOR
: ('='|'>'|'<')
;
SCALAR
: [a-z_]+
;
value
: SCALAR
;
WS : [ \t\r\n]+ -> skip ;
Handling in Python code:
qwe.py
from antlr4 import InputStream, CommonTokenStream
from qweLexer import qweLexer
from qweParser import qweParser
conditions = 'total_sales>qwe'
lexer = qweLexer(InputStream(conditions))
stream = CommonTokenStream(lexer)
parser = qweParser(stream)
tree = parser.query()
for child in tree.getChildren():
print(child, '==>', type(child))
Running qwe.py outputs an error when parsing (lexing?) value.
How to fix that?
I read some and suppose that there is something to do with COLUMN rule that also matches value...
Your COLUMN and SCALAR lexer rules are identical. When the lexer can match more than one rule, the rule that recognizes the longest token wins. If the token lengths are the same (as they are here), the first rule wins.
Your token stream will be COLUMN OPERATOR COLUMN.
That's why the query rule won't match.
As a general practice, it's good to use the grun alias (that the setup tutorial will tell you how to set up) to dump the token stream.
grun qwe tokens -tokens < sampleInputFile
Once that gives you the expected output, you'll probably want to use the grun tool to display parse trees of your input to verify that the parse is correct. All of this can be done without hooking up the generated code into your target language, and helps ensure that your grammar is basically correct before you wire things up.
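The tie-breaking described in this answer can be reproduced with a small toy lexer. This is only a sketch of the "longest match, then first rule" idea using Python's re module. It is not how ANTLR is implemented, and the `RULES` table and `tokenize` function are hypothetical names made up for the illustration.

```python
import re

# Toy model of the tie-breaking described above; NOT ANTLR's implementation.
# Rules are listed in grammar order, mirroring qwe.g4.
RULES = [
    ("COLUMN", re.compile(r"[a-z_]+")),
    ("OPERATOR", re.compile(r"[=<>]")),
    ("SCALAR", re.compile(r"[a-z_]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Return (rule_name, lexeme) pairs: longest match wins, first rule on ties."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("total_sales>qwe"))
# [('COLUMN', 'total_sales'), ('OPERATOR', '>'), ('COLUMN', 'qwe')]
```

Because SCALAR is listed after COLUMN and matches exactly the same text, it never wins in this model, which mirrors why the value rule never sees a SCALAR token.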

ANTLR trying to create a lexer rule that goes up to, but not including, some symbols

I'm using ANTLR4 to parse text adventure game dialogue files written in Yarn, so mostly free form text and loads of island grammars, and for the most part things are going smoothly but I am having an issue excluding certain text inside the Shortcut mode (when presenting options for the player to choose from).
Basically I need to write a rule to match anything except #, newline or <<. When it hits a << it needs to move into a new mode for handling expressions of various kinds or to just leave the current mode so that the << will get picked up by the already existing rules.
A cut down version of my lexer (ignoring rules for expressions):
lexer grammar YarnLexer;
NEWLINE : ('\n') -> skip;
CMD : '<<' -> pushMode(Command);
SHORTCUT : '->' -> pushMode(Shortcut);
HASHTAG : '#' ;
LINE_GOBBLE : . -> more, pushMode(Line);
mode Line;
LINE : ~('\n'|'#')* -> popMode;
mode Shortcut ;
TEXT : CHAR+ -> popMode;
fragment CHAR : ~('#'|'\n'|'<');
mode Command ;
CMD_EXIT : '>>' -> popMode;
// RULES FOR OPERATORS/IDs/NUMBERS/KEYWORDS/etc
CMD_TEXT : ~('>')+ ;
And the parser grammar (again ignoring all the rules for expressions):
parser grammar YarnParser;
options { tokenVocab=YarnLexer; }
dialogue: statement+ EOF;
statement : line_statement | shortcut_statement | command_statement ;
hashtag : HASHTAG LINE ;
line_statement : LINE hashtag? ;
shortcut_statement : SHORTCUT TEXT command_statement? hashtag?;
command_statement : CMD expression CMD_EXIT;
expression : CMD_TEXT ;
I have tested the Command mode when it is by itself and everything inside there is working fine, but when I try to parse my example input:
Where should we go?
-> the park
-> the zoo
-> Peter's house <<if $metPeter == true >>
ok shall we take the bus?
-> :<
-> ok
<<set $daySpent = true>>
my issue is the line:
-> Peter's house <<if $metPeter == true >>
is matched completely as TEXT, and the CMD rule is ignored in favour of the far longer TEXT match.
My first thought was to add < to the set but then I can't have text like:
-> :<
which should be perfectly valid. Any idea how to do this?
Adding a single left angle bracket to the exclusion list creates a single corner case that is easily handled:
TEXT : CHAR+ ;
CMD : '<<' -> pushMode(Command);
LAB : '<' -> type(TEXT) ;
fragment CHAR : ~('\n' | '#' | '<') ;
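To see why the extra rule for a lone left angle bracket resolves the conflict, here is a toy longest-match model of the three rules in Python. It is a sketch under simplifying assumptions: modes are ignored, so it only covers the text up to and including `<<`, and `RULES` and `tokenize` are hypothetical names rather than anything ANTLR generates.

```python
import re

# Toy longest-match model of the suggested Shortcut-mode rules; not ANTLR itself.
RULES = [
    ("TEXT", re.compile(r"[^\n#<]+")),  # TEXT : CHAR+
    ("CMD", re.compile(r"<<")),         # CMD  : '<<'
    ("TEXT", re.compile(r"<")),         # LAB  : '<' -> type(TEXT)
]


def tokenize(text):
    """Return (token_type, lexeme) pairs; the longest match wins at each position."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None  # (token_type, end_position)
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            if match and (best is None or match.end() > best[1]):
                best = (name, match.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


print(tokenize(" Peter's house <<"))
print(tokenize(" :<"))
```

In this model `<<` is two characters long and therefore beats the one-character LAB alternative, while a lone `<` comes out as a separate token of type TEXT. That suggests the parser rule may need to accept more than one TEXT token in a row, which is presumably the corner case the answer refers to.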

What is the ANTLR4 equivalent of a ! in a lexer rule?

I'm working on converting an old ANTLR 2 grammar to ANTLR 4, and I'm having trouble with the string rule.
STRING :
'\''!
(
~('\'' | '\\' | '\r' | '\n')
)*
'\''!
;
This creates a STRING token whose text contains the contents of the string, but does not contain the starting and ending quotes, because of the ! symbol after the quote literals.
ANTLR 4 chokes on the ! symbol, ('!' came as a complete surprise to me (AC0050)) but if I leave it off, I end up with tokens that contain the quotes, which is not what I want. What's the correct way to port this to ANTLR 4?
Antlr4 generally treats tokens as being immutable, at least in the sense that there is no support for a language neutral equivalent of !.
Perhaps the simplest way to accomplish the equivalent is:
string : str=STRING { Strings.unquote($str); } ;
STRING : SQuote ~[\r\n\\']* SQuote ;
fragment SQuote : '\'' ;
where Strings.unquote is:
public static void unquote(Token token) {
CommonToken ct = (CommonToken) token;
String text = ct.getText();
text = .... unquote it ....
ct.setText(text);
}
The reason for using a parser rule is because attribute references are not (currently) supported in the lexer. Still, it could be done on the lexer rule - just would require a slight bit more effort to dig to the token.
An alternative to modifying the token text is to implement a custom token with custom fields and methods. See this answer if of interest.
I believe in ANTLR4 your problem can be solved using lexical modes and lexer commands.
Here is an example from there that I think does exactly what you need (although for double quotes but it's an easy fix):
lexer grammar Strings;
LQUOTE : '"' -> more, mode(STR) ;
WS : [ \r\t\n]+ -> skip ;
mode STR;
STRING : '"' -> mode(DEFAULT_MODE) ; // token we want parser to see
TEXT : . -> more ; // collect more text for string

ANTLRv3 not reading options

I am very new to ANTLR and am trying to understand how the Lexer and Parser rules work. I'm experiencing issues with a grammar I've written that seem to be related to lexer tokens with multiple characters being seen as "matches" even when only the first few characters actually match. To demonstrate this, I have written a simple ANTLR 3 Grammar:
grammar test;
options {
k=3;
}
@lexer::header { package test;}
@header {package test;}
sentence : (CHARACTER)*;
CHARACTER : 'a'..'z'|' ';
SPECIAL : 'special';
I'm using AntlrWorks to parse the following test input:
apple basic say sponsor speeds speckled specific wonder
The output I get is:
apple basic say nsor ds led ic wonder
It seems to me that the LEXER is using k=1 and therefore matching my SPECIAL token with anything that includes the two letters 'sp'. Once it encounters the letters 'sp', it then matches successive characters within the SPECIAL literal until the actual input fails to match the expected token - at which point it throws an error (consuming that character) and then continues with the rest of the sentence. Each error is of the form:
line 1:18 mismatched character 'o' expecting 'e'
However, this isn't the behaviour I'm trying to create. I wish to create a lexer token that matches the keyword ('special') - for use in other parser rules not included in this test example. However, I don't want other rules/input that just happens to include the same initial characters to be affected.
To summarize:
How do I actually set antlr 3 options (such as k=2 or k=3 etc)? It seems to me, at least, that the options I'm trying to use here aren't being set.
Is there a better way to create parser or lexer rules to match a particular keyword in my input, without affecting processing of other parts of the input that don't contain a full match?
The k in the options { ... } section defines the look ahead of the parser, not the lexer.
Note that the grammar
CHARACTER : 'a'..'z'|' ';
SPECIAL : 'special';
is ambiguous: your 'special' could also be considered as 7 'a'..'z''s. Normally, it'd be lexed as follows:
grammar Test;
sentence : (special | word | space)+ EOF;
special : SPECIAL;
word : WORD;
space : SPACE;
SPECIAL : 'special';
WORD : 'a'..'z'+;
SPACE : ' ';
which will parse the input:
specia special specials
as follows:
I.e. it gets (more or less) tokenized as a combination of LL(1) and "longest-matched". Sorry, I realize that's a bit vague, but the Definitive ANTLR Reference does not clarify this exactly (at least, I can't find it...). But I realize that this might not be what you're looking for.
AFAIK, the only way to produce single char-tokens and define keywords that are made up from these single char-tokens, is done by merging these two tokens in a single rule, and use predicates and manual look-ahead to see if they conform to a key word, and if not, change the type of the token in a "fall through" sub rule. A demo:
grammar test;
tokens {
LETTER;
}
@lexer::members {
// manual look ahead
private boolean ahead(String text) {
for(int i = 0; i < text.length(); i++) {
if(input.LA(i+1) != text.charAt(i)) {
return false;
}
}
return true;
}
}
sentence
: (t=. {System.out.printf("\%-7s :: '\%s'\n", tokenNames[$t.type], $t.text);})+ EOF
;
SPECIAL
: {ahead("special")}?=> 'special'
| {ahead("keyword")}?=> 'keyword'
| 'a'..'z' {$type = LETTER;} // Last option and no keyword is found:
// change the type of this token
;
SPACE
: ' '
;
The parser generated from the above grammar can be tested with the class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("apple basic special speckled keyword keywor");
testLexer lexer = new testLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
testParser parser = new testParser(tokens);
parser.sentence();
}
}
As you can see, when parsing the input:
apple basic special speckled keyword keywor
the following output is generated:
LETTER :: 'a'
LETTER :: 'p'
LETTER :: 'p'
LETTER :: 'l'
LETTER :: 'e'
SPACE :: ' '
LETTER :: 'b'
LETTER :: 'a'
LETTER :: 's'
LETTER :: 'i'
LETTER :: 'c'
SPACE :: ' '
SPECIAL :: 'special'
SPACE :: ' '
LETTER :: 's'
LETTER :: 'p'
LETTER :: 'e'
LETTER :: 'c'
LETTER :: 'k'
LETTER :: 'l'
LETTER :: 'e'
LETTER :: 'd'
SPACE :: ' '
SPECIAL :: 'keyword'
SPACE :: ' '
LETTER :: 'k'
LETTER :: 'e'
LETTER :: 'y'
LETTER :: 'w'
LETTER :: 'o'
LETTER :: 'r'
See the Q&A What is a 'semantic predicate' in ANTLR? to learn more about predicates in ANTLR.
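The manual look-ahead idea from the demo grammar can also be sketched outside ANTLR. The following Python version is an illustration only: `ahead`, `KEYWORDS` and `tokenize` are names chosen for this sketch, and the token types simply mirror the ones in the grammar above.

```python
KEYWORDS = ("special", "keyword")


def ahead(text, pos, keyword):
    """Manual look-ahead: does `keyword` start at index `pos` of `text`?"""
    return text.startswith(keyword, pos)


def tokenize(text):
    """Emit SPECIAL for a full keyword, otherwise fall through to single LETTERs."""
    pos, tokens = 0, []
    while pos < len(text):
        keyword = next((k for k in KEYWORDS if ahead(text, pos, k)), None)
        if keyword is not None:
            # The whole keyword is present, so it is consumed as one token.
            tokens.append(("SPECIAL", keyword))
            pos += len(keyword)
        elif text[pos] == " ":
            tokens.append(("SPACE", " "))
            pos += 1
        elif "a" <= text[pos] <= "z":
            # Fall-through case: no keyword ahead, emit a single letter.
            tokens.append(("LETTER", text[pos]))
            pos += 1
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


for token_type, lexeme in tokenize("special spec"):
    print(f"{token_type:<7} :: '{lexeme}'")
```

The look-ahead runs before any character is consumed, so a partial match such as spec never commits to the keyword and is emitted letter by letter instead.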

ANTLR - Implicit AND Tokens In Tree

I’m trying to build a grammar that interprets user-entered text, search-engine style. It will support the AND, OR, NOT and ANDNOT Boolean operators. I have pretty much everything working, but I want to add a rule that two adjacent keywords outside of a quoted string implicitly are treated as in an AND clause. For example:
cheese and crackers = cheese AND crackers
(up and down) or (left and right) = (up AND down) OR (left AND right)
cat dog “potbelly pig” = cat AND dog AND “potbelly pig”
I’m having trouble with the last one, and I’m hoping somebody can point me in the right direction. Here’s my *.g file thus far, and please be nice, my ANTLR experience spans less than a work day:
grammar SearchEngine;
options { language = CSharp2; output = AST; }
@lexer::namespace { Demo.SearchEngine }
@parser::namespace { Demo.SearchEngine }
LPARENTHESIS : '(';
RPARENTHESIS : ')';
AND : ('A'|'a')('N'|'n')('D'|'d');
OR : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT : ('N'|'n')('O'|'o')('T'|'t');
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9');
fragment QUOTE : ('"');
fragment SPACE : (' '|'\n'|'\r'|'\t'|'\u000C');
WS : (SPACE) { $channel=HIDDEN; };
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD : (CHARACTER)+;
startExpression : andExpression;
andExpression : andnotExpression (AND^ andnotExpression)*;
andnotExpression : orExpression (ANDNOT^ orExpression)*;
orExpression : notExpression (OR^ notExpression)*;
notExpression : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;
Since your AND-rule has the optional AND-keyword, you should create an imaginary AND-token and use a rewrite-rule to "inject" that token in your tree. In this case, you can't make use of ANTLR's short-hand ^ root-operator. You'll have to use the -> rewrite operator.
Your andExpression should look like:
andExpression
: (andnotExpression -> andnotExpression)
(AND? a=andnotExpression -> ^(AndNode $andExpression $a))*
;
A detailed description of this (perhaps cryptic) notation is given in Chapter 7, section Rewrite Rules in Subrules, page 173-174 of The Definitive ANTLR Reference by Terence Parr.
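For readers who find the rewrite notation cryptic, the same idea can be sketched as a tiny hand-written parser in Python. This is an assumption-laden illustration rather than what ANTLR generates: it only handles words, parentheses and an optional AND keyword, `parse_and` is a hypothetical helper, and the tree is represented as nested tuples with "AndNode" standing in for the imaginary token.

```python
def parse_and(tokens):
    """Parse `atom (AND? atom)*` into left-nested ("AndNode", left, right) tuples."""
    tokens = list(tokens)
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
            return node
        if token == ")" or token.upper() == "AND":
            raise ValueError(f"unexpected token {token!r}")
        return token

    def expr():
        nonlocal pos
        node = atom()
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos].upper() == "AND":
                pos += 1  # the keyword is optional and is not kept in the tree
            # Wrap the tree built so far, like ^(AndNode $andExpression $a).
            node = ("AndNode", node, atom())
        return node

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected ')'")
    return result


print(parse_and("cat and dog".split()))
print(parse_and("one two three".split()))
```

Each loop iteration wraps the tree built so far as the left child of a new AndNode, which is the role `$andExpression` plays in the rewrite rule. In this sketch an explicit AND and a missing one produce the same node.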
I ran a quick test to see if the grammar produces the proper AST with the new andExpression rule. After parsing the string cat dog "potbelly and pig" and FOO, the generated parser produced the following AST:
alt text http://img580.imageshack.us/img580/7370/andtree.png
Note that the AndNode and Root are imaginary tokens.
If you want to know how to create the AST picture above, see this thread: Visualizing an AST created with ANTLR (in a .Net environment)
EDIT
When parsing both one two three and (one two) three, the following AST is created:
alt text http://img203.imageshack.us/img203/2558/69551879.png
And when parsing (one two) OR three, the following AST is created:
alt text http://img340.imageshack.us/img340/8779/73390353.png
which seems to be the proper way in all cases.