Proper names with determiners in GF

I'm trying to generate the following sentence in GF:
Jack wants to listen to a Taylor song
But as far as I can see, PN -> NP is the only way to use a proper name in the RGL. How can I get GF to output a proper noun with a determiner and an object?
Also, why is PN in the RGL called "proper name" rather than "proper noun"?

The syntactic function of "Taylor" in that sentence is a modifier to "song". So the grouping isn't "a Taylor", but rather "a song", with Taylor as a modifier:
a song
a good song
a Taylor song
This answer doesn't necessarily work in languages other than English. But there are plenty of other structures you can use straight out of the RGL, like "a song by NP".
Limited set of artists
If you have a fixed set of artists, it's probably easiest to just cheat and make them into adjectives in English. Here's an example:
abstract Song = {
flags startcat = Sentence ;
cat
Sentence ;
Person ;
Artist ;
fun
wantToListen : Person -> Artist -> Sentence ;
Jack : Person ;
Taylor, Ariana : Artist ;
}
For the English version, we make the lincat of Artist into AP. This works in English, because the adjectives don't inflect, so it'll just be a single string.
concrete SongEng of Song = open SyntaxEng, ParadigmsEng, LexiconEng in {
lincat
Sentence = Utt ;
Person = NP ;
Artist = AP ; -- cheat in English
lin
-- : Person -> Artist -> Sentence ;
wantToListen person artist =
let artistsSong : CN = mkCN artist song_N ;
listenToSong : VP = mkVP listen_V2 (mkNP a_Det artistsSong) ;
in mkUtt (mkS (mkCl person want_VV listenToSong)) ;
-- : Person
Jack = mkNP (mkPN "Jack") ;
-- : Artist
Taylor = mkAP (mkA "Taylor") ;
Ariana = mkAP (mkA "Ariana") ;
}
Works as intended:
$ gf SongEng.gf
…
Song> gt | l -treebank
Song: wantToListen Jack Ariana
SongEng: Jack wants to listen to an Ariana song
Song: wantToListen Jack Taylor
SongEng: Jack wants to listen to a Taylor song
Note that if your bands have an article of their own, you'll get Jack wants to listen to a The Beatles song.
Artist as modifier vs. subject
You can always make a more complex lincat for Artist; here's an example. We'll add TheBeatles as an Artist in the abstract syntax. I also add another function, where the artist is not just a modifier but the subject.
abstract Song2 = Song ** {
flags startcat = Sentence ;
fun
TheBeatles : Artist ;
isGood : Artist -> Sentence ;
}
And here's the concrete. I'm not reusing the original SongEng, because I need to change the lincat of Artist.
concrete Song2Eng of Song2 = open SyntaxEng, ParadigmsEng, LexiconEng in {
lincat
Sentence = Utt ;
Person = NP ;
Artist = LinArtist ; -- {independent : NP ; modifier : AP}
lin
-- : Person -> Artist -> Sentence ;
wantToListen person artist =
let artistsSong : CN = mkCN artist.modifier song_N ;
listenToSong : VP = mkVP listen_V2 (mkNP a_Det artistsSong) ;
in mkUtt (mkS (mkCl person want_VV listenToSong)) ;
-- : Artist -> Sentence
isGood artist = mkUtt (mkS (mkCl artist.independent good_A)) ;
-- : Person
Jack = mkNP (mkPN "Jack") ;
-- : Artist
Taylor = mkArtist "Taylor" ;
Ariana = mkArtist "Ariana" ;
TheBeatles = mkArtist "The Beatles" "Beatles" ;
oper
LinArtist : Type = {independent : NP ; modifier : AP} ;
mkArtist = overload {
mkArtist : Str -> Str -> LinArtist = \thebeatles, beatles -> {
independent = mkNP (mkPN thebeatles) ;
modifier = mkAP (mkA beatles)
} ;
mkArtist : Str -> LinArtist = \taylor -> {
independent = mkNP (mkN taylor) ;
modifier = mkAP (mkA taylor)
}
} ;
}
And here's the output from that.
Song2> gt | l
Ariana is good
Taylor is good
The Beatles is good
Jack wants to listen to an Ariana song
Jack wants to listen to a Taylor song
Jack wants to listen to a Beatles song
If you want "The Beatles" to be plural, you can do the following. Add another overload instance of mkArtist that takes an already constructed NP. Then you can specify that "The Beatles" are, in fact, plural.
oper
mkArtist = overload {
mkArtist : Str -> Str -> LinArtist = … ; -- same as before
mkArtist : Str -> LinArtist = … ; -- same as before
mkArtist : NP -> AP -> LinArtist = \thebeatles,beatles -> {
independent = thebeatles ;
modifier = beatles
}
} ;
lin
TheBeatles =
mkArtist
(mkNP aPl_Det (mkN "The Beatles" "The Beatles"))
(mkAP (mkA "Beatles")) ;
And this gives you the following output:
Song2> l isGood TheBeatles
The Beatles are good
Arbitrary strings as artists
Finally, if you want to use arbitrary strings as artists, you can use string literals. For the independent field, it works great: there are functions in the Symbolic module that take a string literal to NP.
However, for the modifier field, you need to go outside the API. There are instructions for doing it in a slightly less unsafe manner, but it's still not guaranteed to be stable if the RGL internals ever change.
Disclaimers aside, here's the final extension.
abstract Song3 = Song2 ** {
flags startcat = Sentence ;
fun
StringArtist : String -> Artist ;
}
And the concrete. This time we do extend Song2Eng, because there's no need to change the lincats.
concrete Song3Eng of Song3 = Song2Eng ** open SyntaxEng, LexiconEng, SymbolicEng in {
lin
-- : String -> Artist ;
StringArtist string = {
independent = symb string ; -- symb : String -> NP, from SymbolicEng
modifier = <mkAP good_A : AP> ** {s = \\_ => string.s} -- hack
} ;
}
Let's see how it works:
Song3> p "Jack wants to listen to a skhdgdjgfhjkdfhjsdf song"
wantToListen Jack (StringArtist "skhdgdjgfhjkdfhjsdf")
Song3> p "9ortge94yhjerh90fpersk is good"
isGood (StringArtist "9ortge94yhjerh90fpersk")
Just a warning: it works perfectly fine to linearise strings with spaces, so this is ok:
Song3> l isGood (StringArtist "Spice Girls")
Spice Girls is good
But you can't parse string literals with spaces.
Song3> p "Backstreet Boys is good"
The parser failed at token 2: "Boys"

Related

How does one create an ANTLR grammar for a non-lambda-free language?

I am attempting to mimic the grammar G shown below using ANTLR v4.9.3 …
My attempt to do so is shown below …
grammar G ;
s : t s | ;
t : 'aaa' t 'bbb' | ;
I invoke the ANTLR Tool as follows …
org.antlr.v4.Tool G.g4
The tool's response is …
The following sets of rules are mutually left-recursive [s]
My question is …
How does one create an ANTLR grammar for grammar G ?
In order to eliminate the errors, grammar G has been updated to …
grammar G ;
s : t* EOF ;
t : 'aaa' 'bbb' | 'aaa' t 'bbb' ;
I am also trying to programmatically distinguish between strings that are in L(G), and strings that are not in L(G) (where L(G) is the language generated by grammar G). In the following code, the first string is in L(G), but the second string is not in L(G).
String [] stringArray = { "aaaaaabbbbbbaaabbbaaaaaabbbbbb",
"aaabbbaaaaaabbbbbbaaabbbb" } ;
for ( int i = 0 ; i < stringArray.length ; i ++ )
{
CharStream charStream = CharStreams.fromString ( stringArray[i] ) ;
GLexer lexer = new GLexer( charStream ) ;
CommonTokenStream tokens = new CommonTokenStream( lexer ) ;
GParser parser = new GParser( tokens ) ;
ParseTree tree = parser.s() ;
} // end for i loop
I'd like the code to print messages such as …
The string "aaaaaabbbbbbaaabbbaaaaaabbbbbb" is in L(G).
… and …
The string "aaabbbaaaaaabbbbbbaaabbbb" is not in L(G).
How does one programmatically distinguish between strings that parse successfully (and are in L(G)), and strings that fail to parse successfully (and are not in L(G)) ?
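As a reference point that does not depend on the ANTLR runtime, membership in L(G) for the updated grammar can be checked with a small hand-written recogniser. This is only a sketch under assumptions: the names match_t and in_language are hypothetical, the input is scanned directly for the literals 'aaa' and 'bbb' instead of going through a generated lexer, and the language is Python rather than Java.

```python
def match_t(s: str, pos: int) -> int:
    """Rule t : 'aaa' 'bbb' | 'aaa' t 'bbb'. Returns the end position, or -1 on failure."""
    if not s.startswith("aaa", pos):
        return -1
    pos += 3
    if s.startswith("aaa", pos):  # second alternative: a nested t
        pos = match_t(s, pos)
        if pos < 0:
            return -1
    if not s.startswith("bbb", pos):
        return -1
    return pos + 3


def in_language(s: str) -> bool:
    """Rule s : t* EOF. Every character must be consumed by some t."""
    pos = 0
    while pos < len(s):
        pos = match_t(s, pos)
        if pos < 0:
            return False
    return True


for text in ["aaaaaabbbbbbaaabbbaaaaaabbbbbb", "aaabbbaaaaaabbbbbbaaabbbb"]:
    verdict = "is" if in_language(text) else "is not"
    print(f'The string "{text}" {verdict} in L(G).')
```

A recogniser like this can serve as an oracle when checking whatever accept/reject logic is wrapped around the generated parser.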

Conjunction for Verb Phrase on GF

Problem
I'm trying to generate the sentence It's sunny on Monday and rainy on Tuesday in GF using the RGL. I looked for a way to generate this sentence on the RGL page, but I couldn't find anything that might help. I then checked Extend.gf on GitHub and found these three lines:
MkVPS : Temp -> Pol -> VP -> VPS ; -- hasn't slept
ConjVPS : Conj -> [VPS] -> VPS ; -- has walked and won't sleep
PredVPS : NP -> VPS -> S ; -- she [has walked and won't sleep]
They seemed promising at first glance, but when I tried using them in real code, it seems I misused [VPS]. My code:
mkPhr(PredVPS
(it_NP)
(ConjVPS
(and_Conj)
(MkVPS
(mkTemp (futureTense) (simultaneousAnt))
(positivePol)
(mkVP
(mkVP (mkA "sunny"))
(SyntaxEng.mkAdv (on_Prep) (mkNP (mkN ("Monday"))))))
(MkVPS
(mkTemp (futureTense) (simultaneousAnt))
(positivePol)
(mkVP
(mkVP (mkA "rainy"))
(SyntaxEng.mkAdv (on_Prep) (mkNP (mkN ("Tuesday"))))))));
But I ran into this error, which is obviously a mismatch between the type I supplied and the expected one.
missing record fields: s1, s2 type of MkVPS (mkTemp futureTense simultaneousAnt) positivePol (AdvVP ((\a -> UseComp (CompAP (PositA a))) (regA "rainy")) (PrepNP on_Prep ((\n -> MassNP (UseN n)) (regN "Monday"))))
expected: {s1 : ResEng.Agr => Str; s2 : ResEng.Agr => Str}
inferred: {s : ResEng.Agr => Str; lock_VPS : {}}
Question
What is the correct way to use [VPS]?
Clarification on lists
Just like with other list categories C, you need to use a constructor that takes two (or more) Cs and creates a [C].
For categories that are in the RGL API, there are convenience opers of type mkC : Conj -> C -> C -> C, but under the hood, those opers also need to call the proper constructors for [C]. (The constructors are called BaseC and ConsC, and you can read more on lists here.)
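The shape of these constructors can be mimicked outside GF. The Python sketch below is an analogy only: base, cons and conj are hypothetical stand-ins for BaseC, ConsC and ConjC, and plain strings stand in for linearisations, so the punctuation it produces is not a claim about what the RGL prints.

```python
def base(x, y):
    """Like BaseC : C -> C -> [C]. A list starts out with two elements."""
    return [x, y]


def cons(x, xs):
    """Like ConsC : C -> [C] -> [C]. Prepends one more element."""
    return [x] + xs


def conj(conjunction, items):
    """Like ConjC : Conj -> [C] -> C. Puts the conjunction before the last item."""
    *init, last = items
    return ", ".join(init) + f" {conjunction} " + last


print(conj("and", base("sunny", "rainy")))
print(conj("and", cons("sunny", base("rainy", "windy"))))
```

The point of the analogy is that the conjunction function never receives two bare elements; it always receives a list that some constructor has already built.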
How to use conjunctions with VPSs
VPS is not in the API, so there is no convenience oper with type signature Conj -> VPS -> VPS -> VPS. Instead, you need to call BaseVPS explicitly. Here is working code; I cut your long expression into smaller pieces.
resource VPS = open SyntaxEng, ParadigmsEng, ExtendEng in {
oper
-- Lexicon
sunny_A : A = mkA "sunny" ;
rainy_A : A = mkA "rainy" ;
monday_N : N = mkN "Monday" ;
tuesday_N : N = mkN "Tuesday" ;
-- Helper functions
adj_on_day : A -> N -> VP = \a,n ->
mkVP (mkVP a) (SyntaxEng.mkAdv on_Prep (mkNP n)) ;
sunny_on_Monday_VP : VP = adj_on_day sunny_A monday_N ;
rainy_on_Tuesday_VP : VP = adj_on_day rainy_A tuesday_N ;
tenseVPS : Tense -> VP -> VPS = \tns,vp ->
MkVPS (mkTemp tns simultaneousAnt) positivePol vp ;
futureVPS = tenseVPS futureTense ;
pastVPS = tenseVPS pastTense ;
-- Constructing the phrase
-- lin: "it will be sunny on Monday and will be rainy on Tuesday"
futFutPhrase : Phr =
mkPhr (
PredVPS it_NP
(ConjVPS -- : Conj -> [VPS] -> VPS
and_Conj -- : Conj
(BaseVPS -- : VPS -> VPS -> [VPS]
(futureVPS sunny_on_Monday_VP) -- : VPS
(futureVPS rainy_on_Tuesday_VP) -- : VPS
)
)
) ;
-- lin: "it was sunny on Monday and will be rainy on Tuesday"
pastFutPhrase : Phr =
mkPhr (
PredVPS it_NP
(ConjVPS -- : Conj -> [VPS] -> VPS
and_Conj -- : Conj
(BaseVPS -- : VPS -> VPS -> [VPS]
(pastVPS sunny_on_Monday_VP) -- : VPS
(futureVPS rainy_on_Tuesday_VP) -- : VPS
)
)
) ;
}
And it works like this:
$ gf
Languages:
> i -retain VPS.gf
> cc -one futFutPhrase
it will be sunny on Monday and will be rainy on Tuesday
> cc -one pastFutPhrase
it was sunny on Monday and will be rainy on Tuesday
So the tenses are repeated in both cases, because the conjunction is on the VPS level, not on the AP level.
How to create the phrase you wanted
If you want to have ellipsis, "it will be sunny on Monday and rainy on Tuesday", you will need to attach the Adv "on Monday" to the AP "sunny" using AdvAP, then do an AP conjunction, turn that AP into VP, and then use that VP in a Cl as you normally would. Here is code, a separate file from the previous:
resource APConj = open SyntaxEng, ParadigmsEng, (A=AdjectiveEng) in {
oper
-- Lexicon
sunny_A : A = mkA "sunny" ;
rainy_A : A = mkA "rainy" ;
monday_N : N = mkN "Monday" ;
tuesday_N : N = mkN "Tuesday" ;
-- Helper functions
adj_on_day : A -> N -> AP = \a,n ->
A.AdvAP (mkAP a) (SyntaxEng.mkAdv on_Prep (mkNP n)) ;
sunny_on_Monday_AP : AP = adj_on_day sunny_A monday_N ;
rainy_on_Tuesday_AP : AP = adj_on_day rainy_A tuesday_N ;
-- Constructing the phrase
sunnyRainyEllipsisPhrase : Phr =
mkPhr (
mkCl (
mkVP (mkAP and_Conj sunny_on_Monday_AP rainy_on_Tuesday_AP)
)
) ;
}
Works like this:
$ gf
Languages:
> i -retain APConj.gf
> cc -one sunnyRainyEllipsisPhrase
it is sunny on Monday and rainy on Tuesday

Grammatical Framework: "linearization type field cannot be Int"; how to write a concrete syntax for grammar with arithmetic expressions?

I'm trying to write a concrete syntax for this grammar (from Chapter 6 in Grammatical Framework: Programming with Multilingual Grammars):
abstract Arithm = {
flags startcat = Prop ;
cat
Prop ; -- proposition
Nat ; -- natural number
fun
Zero : Nat ; -- 0
Succ : Nat -> Nat ; -- the successor of x
Even : Nat -> Prop ; -- x is even
And : Prop -> Prop -> Prop ; -- A and B
}
There are predefined categories for integer, float and string literals (Int, Float and String), and they can be used as arguments to functions, but they may not be value types of any function.
In addition, they may not be used as a field in a linearisation type. This is what I would like to do, using plus defined in Predef.gf:
concrete ArithmEng of Arithm =
open Predef, SymbolicEng, SyntaxEng, ParadigmsEng in {
lincat
Prop = S ;
Nat = {s : NP ; n : Int} ;
lin
Zero = mkNat 0 ;
Succ nat = let n' : Int = Predef.plus nat.n 1 in mkNat n' ;
Even nat = mkS (mkCl nat.s (mkA "even")) ;
And p q = mkS and_Conj p q ;
oper
mkNat : Int -> Nat ;
mkNat int = lin Nat {s = symb int ; n = int} ;
} ;
But of course, this does not work: I get the error "linearization type field cannot be Int".
Maybe the right answer to my question is to use another programming language, but I am curious, because this example is left as an exercise to the reader in the GF book, so I would expect it to be solvable.
I can write a unary solution, using the category Digits from Numeral.gf:
concrete ArithmEng of Arithm =
open SyntaxEng, ParadigmsEng, NumeralEng, SymbolicEng, Prelude in {
lincat
Prop = S ;
Nat = {s : NP ; d : Digits ; isZero : Bool} ;
lin
Zero = {s = mkNP (mkN "zero") ; d = IDig D_0 ; isZero = True} ;
Succ nat = case nat.isZero of {
True => mkNat (IDig D_1) ;
False => mkNat (IIDig D_1 nat.d) } ;
Even nat = mkS (mkCl nat.s (mkA "even")) ;
And p q = mkS and_Conj p q ;
oper
mkNat : Digits -> Nat ;
mkNat digs = lin Nat {s = symb (mkN "number") digs ; d = digs ; isZero = False} ;
} ;
This produces the following results:
Arithm> l -bind Even Zero
zero is even
0 msec
Arithm> l -bind Even (Succ Zero)
number 1 is even
0 msec
Arithm> l -bind Even (Succ (Succ (Succ Zero)))
number 111 is even
This is of course a possible answer, but I suspect this is not the way the exercise was intended to be solved. So I wonder if I'm missing something, or if the GF language used to support more operations on Ints?
A possible, but still rather unsatisfactory answer, is to use the parameter Ints n for any natural number n.
Notice the difference:
Int is a literal type, just like String and Float.
All literals have {s : Str} as their lincat in every concrete syntax, and you can't change it in any way.
Since the lincat contains Str (not String), you are not allowed to pattern match it at runtime.
Introducing Ints n: a parameter type
However, Ints n is a parameter type, because it's finite.
You may have seen Ints n in old RGL languages, like the following Finnish grammar:
-- from the Finnish resource grammar
oper
NForms : Type = Predef.Ints 10 => Str ;
nForms10 : (x1,_,_,_,_,_,_,_,_,x10 : Str) -> NForms =
\ukko,ukon,ukkoa,ukkona,ukkoon,
ukkojen,ukkoja,ukkoina,ukoissa,ukkoihin -> table {
0 => ukko ; 1 => ukon ; 2 => ukkoa ;
3 => ukkona ; 4 => ukkoon ; 5 => ukkojen ;
6 => ukkoja ; 7 => ukkoina ; 8 => ukoissa ;
9 => ukkoihin ; 10 => ukko
} ;
What's happening here? This is an inflection table, where the left-hand side is… just numbers, instead of combinations of case and number. (For instance, 5 corresponds to plural genitive. Yes, it is completely unreadable for anyone who didn't write that grammar.)
That same code could as well be written like this:
-- another way to write the example from the Finnish RG
param
NForm = SgNom | SgGen | … | PlGen | PlIll ; -- Named params instead of Ints 10
oper
NForms : Type = NForm => Str ;
nForms10 : (x1,_,_,_,_,_,_,_,_,x10 : Str) -> NForms =
\ukko,ukon,ukkoa,ukkona,ukkoon,
ukkojen,ukkoja,ukkoina,ukoissa,ukkoihin -> table {
SgNom => ukko ;
SgGen => ukon ;
...
PlGen => ukkojen ;
PlIll => ukkoihin
} ;
As you can see, an integer works as the left-hand side of a table branch: 5 => ukkojen is as valid GF as PlGen => ukkojen. In that particular case, 5 has the type Ints 10.
Anyway, that code was just to show you what Ints n is and how it's used in other grammars than mine, which I'll soon paste here.
Step 1: incrementing the Ints
Initially, I wanted to use Int as a field in my lincat for Nat. But now I will use Ints 100 instead. I linearise Zero as {n = 0}.
concrete ArithmC of Arithm = open Predef in {
lincat
Nat = {n : Ints 100} ;
lin
Zero = {n = 0} ; -- the 0 is of type Ints 100, not Int!
Now we also linearise Succ. And here's the exciting news: we can use Predef.plus on a runtime value, because the runtime value is no longer an Int but an Ints n, which is finite! So we can do this:
lin
Succ x = {n = myPlus1 ! x.n} ;
oper
-- We use Predef.plus on Ints 100, not Int
myPlus1 : Ints 100 => Ints 100 = table {
100 => 100 ; -- Without this line, we get error
n => Predef.plus n 1 -- magic!
} ;
}
As you can see from myPlus1, it's definitely possible to pattern match Ints n at runtime. And we can even use the Predef.plus on it, except that we must cap it at the highest value. Without the line 100 => 100, we get the following error:
- compiling ArithmC.gf... Internal error in GeneratePMCFG:
convertTbl: missing value 101
among 0
1
...
100
Unfortunately, it's restricted to a finite n.
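Why the cap is needed can be seen by modelling the table outside GF. The Python sketch below is only an analogy, with made-up names (LIMIT, missing_outputs, naive_plus1, my_plus1): a table over Ints 100 has to map every key 0..100 to a value that is again within 0..100.

```python
LIMIT = 100  # plays the role of the 100 in Ints 100


def missing_outputs(table):
    """Values the table produces that are not themselves valid keys."""
    return sorted(v for v in set(table.values()) if v not in table)


# Naive successor: 100 maps to 101, which falls outside the type.
naive_plus1 = {n: n + 1 for n in range(LIMIT + 1)}

# Capped successor, like the `100 => 100` branch in myPlus1.
my_plus1 = {n: min(n + 1, LIMIT) for n in range(LIMIT + 1)}

print(missing_outputs(naive_plus1))  # [101]
print(missing_outputs(my_plus1))     # []
```

The single out-of-range value in the naive table is the same 101 that the compiler error complains about.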
Let's test this in the GF shell:
$ gf
…
> i -retain ArithmC.gf
> cc Zero
{n = 0; lock_Nat = <>}
> cc Succ Zero
{n = 1; lock_Nat = <>}
> cc Succ (Succ (Succ (Succ (Succ Zero))))
{n = 5; lock_Nat = <>}
Technically works, if you can say that. But we can't do anything interesting with it yet.
Step 2: Turn the Ints n into an NP
Previously, we just checked the values in a GF shell with cc (compute_concrete). But the whole task for the grammar was to produce sentences like "2 is even".
The lincat for Int (and all literal types) is {s : Str}. To make a literal into an NP, you can just use the Symbolic module.
However, we cannot increment an Int at runtime, which is why we chose Ints 100 instead.
And there is no lincat for Ints n, because it's a param. So the only way I found is to manually define a showNat oper for Ints 100.
This is ugly, but technically it works.
concrete ArithmEng of Arithm =
open Predef, SymbolicEng, SyntaxEng, ParadigmsEng in {
lincat
Prop = S ;
Nat = {s : NP ; n : MyInts} ;
oper
MyInts = Ints 100 ;
lin
Zero = mkNat 0 ;
Succ nat = mkNat (myPlus1 ! nat.n) ;
Even nat = mkS (mkCl nat.s (mkA "even")) ;
And p q = mkS and_Conj p q ;
oper
mkNat : MyInts -> Nat ;
mkNat i = lin Nat {s = symb (showNat ! i) ; n = i} ;
myPlus1 : MyInts => MyInts = table {
100 => 100 ;
n => Predef.plus n 1
} ;
showNat : MyInts => Str ;
showNat = table {
0 => "0" ; 1 => "1" ; 2 => "2" ;
3 => "3" ; 4 => "4" ; 5 => "5" ;
6 => "6" ; 7 => "7" ; 8 => "8" ;
9 => "9" ; 10 => "10" ; 11 => "11" ;
12 => "12" ; 13 => "13" ; 14 => "14" ;
15 => "15" ; 16 => "16" ; 17 => "17" ;
18 => "18" ; 19 => "19" ; 20 => "20" ;
_ => "Too big number, I can't be bothered"
} ;
} ;
Let's test in the GF shell:
Arithm> gr | l -treebank
Arithm: Even (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))))))))
ArithmEng: 12 is even
So yes, technically it works, but it's unsatisfactory. It only works for a finite n, and I had to type out a bunch of boilerplate in the showNat oper. I'm still unsure if this was the intended way by the GF book, or if GF used to support more operations on Int.
Other solutions
Here's a solution by daherb, where Zero outputs the string "0" and Succ outputs the string "+1", and the final output is evaluated in an external programming language.
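The external evaluation step in that kind of solution could look roughly like the Python sketch below. The exact output format of daherb's grammar is not reproduced here, so this assumes (purely for illustration) that the linearisation is a whitespace-separated string such as "0 +1 +1 +1"; evaluate is a hypothetical helper name.

```python
def evaluate(linearisation: str) -> int:
    """Sum the tokens of a string like "0 +1 +1 +1" (assumed format)."""
    total = 0
    for token in linearisation.split():
        total += int(token)  # int() accepts a leading plus sign
    return total


print(evaluate("0"))
print(evaluate("0 +1 +1 +1"))
```

Doing the arithmetic outside the grammar avoids the finite cap, at the cost of the grammar no longer producing the final numeral itself.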

How can I build an ANTLR Works style parse tree?

I've read that you need to use the '^' and '!' operators in order to build a parse tree similar to the ones displayed in ANTLR Works (even though you don't need to use them to get a nice tree in ANTLR Works). My question then is: how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrites. Say I have an input string abc abc123 and a grammar:
grammar test;
program : idList;
idList : id* ;
id : ID ;
ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;
ANTLR Works will output:
What I don't understand is how you can get the 'idList' node on top of this tree (as well as the grammar one, as a matter of fact). How can I reproduce this tree using rewrites and those operators?
You can't use ^ and ! alone. These operators only operate on existing tokens, while you want to create extra tokens (and make these the root of your sub trees). You can do that using rewrite rules and defining some imaginary tokens.
A quick demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
IdList;
Id;
}
@parser::members {
private static void walk(CommonTree tree, int indent) {
if(tree == null) return;
for(int i = 0; i < indent; i++, System.out.print(" "));
System.out.println(tree.getText());
for(int i = 0; i < tree.getChildCount(); i++) {
walk((CommonTree)tree.getChild(i), indent + 1);
}
}
public static void main(String[] args) throws Exception {
testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
testParser parser = new testParser(new CommonTokenStream(lexer));
walk((CommonTree)parser.program().getTree(), 0);
}
}
program : idList EOF -> idList;
idList : id* -> ^(IdList id*);
id : ID -> ^(Id ID);
ID : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT : '0' .. '9';
If you run the demo above, you will see the following being printed to the console:
IdList
 Id
  abc
 Id
  abc123
As you can see, imaginary tokens must also start with an upper case letter, just like lexer rules. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:
idList : id* -> ^(IdList["idList"] id*);
id : ID -> ^(Id["id"] ID);
which will print:
idList
 id
  abc
 id
  abc123
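What the two rewrite rules do to the tree can be modelled without ANTLR. The Python sketch below is an illustration under assumptions, not ANTLR's API: trees are plain (text, children) tuples, and rewrite_id, rewrite_id_list and walk are hypothetical names that mirror the rules id and idList and the walk method from the demo.

```python
def rewrite_id(token_text):
    """Models  id : ID -> ^(Id ID)  by wrapping the token under an imaginary Id root."""
    return ("Id", [(token_text, [])])


def rewrite_id_list(token_texts):
    """Models  idList : id* -> ^(IdList id*)  with one imaginary root over all id subtrees."""
    return ("IdList", [rewrite_id(t) for t in token_texts])


def walk(tree, indent=0):
    """Return the tree as indented lines, one node per line."""
    text, children = tree
    lines = ["  " * indent + text]
    for child in children:
        lines.extend(walk(child, indent + 1))
    return lines


tree = rewrite_id_list("abc abc123".split())
print("\n".join(walk(tree)))
```

The imaginary nodes carry no input text of their own; they exist only so that each subtree has a root to hang its children from.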

Antlr grammar multiplicity problem of tree in tree grammar

I have a simple grammar
options {
language = Java;
output = AST;
ASTLabelType=CommonTree;
}
tokens {
DEF;
}
root
: ID '=' NUM (',' ID '=' NUM)* -> ^(DEF ID NUM)+
;
and the corresponding tree grammar:
options {
tokenVocab=SimpleGrammar;
ASTLabelType=CommonTree;
}
root
: ^(DEF ID NUM)+
;
However, ANTLR (v3.3) cannot compile this tree grammar. I'm getting:
syntax error: antlr: unexpected token: +
|---> : ^(DEF ID NUM)+
It also doesn't work if I write it as ^(ROOT ^(DEF ID NUM)+).
I want a tree that corresponds to this (as the parser creates it as well):
(ROOT (DEF aa 11) (DEF bb 22) (DEF cc 33))
So ANTLR is able to generate the tree in the parser, but not able to parse it with a tree grammar?!
Why does this happen?
In order to get (ROOT (DEF aa 11) (DEF bb 22) (DEF cc 33)) you can define the following parser rules:
tokens {
ROOT;
DEF;
}
root
: def (',' def)* -> ^(ROOT def+)
;
def
: ID '=' NUM -> ^(DEF ID NUM)
;
and then your tree grammar would contain:
root
: ^(ROOT def+)
;
def
: ^(DEF ID NUM)
;