I am attempting to mimic the grammar G shown below using ANTLR v4.9.3 …
My attempt to do so is shown below …
grammar G ;
s : t s | ;
t : 'aaa' t 'bbb' | ;
I invoke the ANTLR Tool as follows …
org.antlr.v4.Tool G.g4
The tool's response is …
The following sets of rules are mutually left-recursive [s]
My question is …
How does one create an ANTLR grammar for grammar G ?
In order to eliminate the errors, grammar G has been updated to …
grammar G ;
s : t* EOF ;
t : 'aaa' 'bbb' | 'aaa' t 'bbb' ;
I am also trying to programmatically distinguish between strings that are in L(G) and strings that are not in L(G) (where L(G) is the language generated by grammar G). In the following code, the first string is in L(G), but the second string is not.
String [] stringArray = { "aaaaaabbbbbbaaabbbaaaaaabbbbbb",
"aaabbbaaaaaabbbbbbaaabbbb" } ;
for ( int i = 0 ; i < stringArray.length ; i ++ )
{
CharStream charStream = CharStreams.fromString ( stringArray[i] ) ;
GLexer lexer = new GLexer( charStream ) ;
CommonTokenStream tokens = new CommonTokenStream( lexer ) ;
GParser parser = new GParser( tokens ) ;
ParseTree tree = parser.s() ;
} // end for i loop
I'd like the code to print messages such as …
The string "aaaaaabbbbbbaaabbbaaaaaabbbbbb" is in L(G).
… and …
The string "aaabbbaaaaaabbbbbbaaabbbb" is not in L(G).
How does one programmatically distinguish between strings that parse successfully (and are in L(G)), and strings that fail to parse successfully (and are not in L(G)) ?
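One way to get such messages is to ask the generated parser whether it reported errors: in ANTLR 4, parser.getNumberOfSyntaxErrors() returns the parser's error count after parser.s() has run. The stray trailing b in the second string would probably surface as a lexer token recognition error rather than a parser error, so lexer errors may need checking too. As an ANTLR-independent cross-check, here is a minimal hand-written recognizer for L(G), assuming the updated grammar s : t* EOF ; t : 'aaa' 'bbb' | 'aaa' t 'bbb' ;. The names GMembership, inLanguage and countRun are made up for this sketch.

```java
// Hand-written recognizer for L(G). It does not use ANTLR; all names are illustrative.
final class GMembership {

    // Counts how many times tok repeats back to back in s, starting at pos.
    private static int countRun(String s, int pos, String tok) {
        int n = 0;
        while (s.startsWith(tok, pos + n * tok.length())) {
            n++;
        }
        return n;
    }

    // True if s consists of zero or more blocks (aaa)^n (bbb)^n with n >= 1.
    static boolean inLanguage(String s) {
        int pos = 0;
        while (pos < s.length()) {
            int as = countRun(s, pos, "aaa");
            if (as == 0) {
                return false;          // every block must start with 'aaa'
            }
            pos += 3 * as;
            int bs = countRun(s, pos, "bbb");
            if (bs != as) {
                return false;          // the 'bbb' run must match the 'aaa' run
            }
            pos += 3 * bs;
        }
        return true;                   // reached the end of the input
    }

    public static void main(String[] args) {
        String[] stringArray = { "aaaaaabbbbbbaaabbbaaaaaabbbbbb",
                                 "aaabbbaaaaaabbbbbbaaabbbb" };
        for (String s : stringArray) {
            System.out.println("The string \"" + s + "\" is "
                    + (inLanguage(s) ? "" : "not ") + "in L(G).");
        }
    }
}
```

Running main prints one line per string in the format requested above. Comparing this result with the parser's verdict is a cheap way to confirm that the error checks around parser.s() catch every bad input.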
Problem
I'm trying to generate the sentence It's sunny on Monday and rainy on Tuesday in GF using the RGL. I looked for a way to build this sentence on the RGL page, but couldn't find anything that helps. I then checked Extend.gf on GitHub and found these three lines:
MkVPS : Temp -> Pol -> VP -> VPS ; -- hasn't slept
ConjVPS : Conj -> [VPS] -> VPS ; -- has walked and won't sleep
PredVPS : NP -> VPS -> S ; -- she [has walked and won't sleep]
They seemed promising at first glance, but when I tried using them in real code, it seems I misused [VPS]. My code:
mkPhr(PredVPS
(it_NP)
(ConjVPS
(and_Conj)
(MkVPS
(mkTemp (futureTense) (simultaneousAnt))
(positivePol)
(mkVP
(mkVP (mkA "sunny"))
(SyntaxEng.mkAdv (on_Prep) (mkNP (mkN ("Monday"))))))
(MkVPS
(mkTemp (futureTense) (simultaneousAnt))
(positivePol)
(mkVP
(mkVP (mkA "rainy"))
(SyntaxEng.mkAdv (on_Prep) (mkNP (mkN ("Tuesday"))))))));
But I ran into this error, which is obviously a mismatch between the type I supplied and the expected one.
missing record fields: s1, s2 type of MkVPS (mkTemp futureTense simultaneousAnt) positivePol (AdvVP ((\a -> UseComp (CompAP (PositA a))) (regA "rainy")) (PrepNP on_Prep ((\n -> MassNP (UseN n)) (regN "Monday"))))
expected: {s1 : ResEng.Agr => Str; s2 : ResEng.Agr => Str}
inferred: {s : ResEng.Agr => Str; lock_VPS : {}}
Question
What is the correct way to use [VPS]?
Clarification on lists
Just like with other list categories C, you need to use a constructor that takes two (or more) Cs and creates a [C].
For categories that are in the RGL API, there are convenience opers of type mkC : Conj -> C -> C -> C, but under the hood, those opers also need to call the proper constructors for [C]. (The constructors are called BaseC and ConsC, and you can read more on lists here.)
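The s1 and s2 fields in the error message hint at how a list value differs from a single item. As a rough illustration only, here is a Java model that uses plain strings where the RGL uses agreement-dependent tables. The names ListModel, StrList, base, cons and conj are hypothetical stand-ins for BaseC, ConsC and ConjC; this is not RGL code.

```java
// Simplified, hypothetical model of a GF list category [C] over plain strings.
final class ListModel {

    // A list keeps two fields: s1 = everything before the conjunction, s2 = the last item.
    static final class StrList {
        final String s1;
        final String s2;

        StrList(String s1, String s2) {
            this.s1 = s1;
            this.s2 = s2;
        }
    }

    // Models BaseC : C -> C -> [C]
    static StrList base(String x, String y) {
        return new StrList(x, y);
    }

    // Models ConsC : C -> [C] -> [C]
    static StrList cons(String x, StrList xs) {
        return new StrList(x + ", " + xs.s1, xs.s2);
    }

    // Models ConjC : Conj -> [C] -> C
    static String conj(String c, StrList xs) {
        return xs.s1 + " " + c + " " + xs.s2;
    }

    public static void main(String[] args) {
        System.out.println(conj("and", base("walks", "sleeps")));
        System.out.println(conj("and", cons("walks", base("runs", "sleeps"))));
    }
}
```

In this model a single string can never be passed where a StrList is expected, which mirrors the type error above: a lone VPS has an s field, while a [VPS] needs s1 and s2.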
How to use conjunctions with VPSs
VPS is not in the API, so there is no convenience oper with the type signature Conj -> VPS -> VPS -> VPS. Instead, you need to call BaseVPS explicitly. Here is working code; I cut your long expression into smaller pieces.
resource VPS = open SyntaxEng, ParadigmsEng, ExtendEng in {
oper
-- Lexicon
sunny_A : A = mkA "sunny" ;
rainy_A : A = mkA "rainy" ;
monday_N : N = mkN "Monday" ;
tuesday_N : N = mkN "Tuesday" ;
-- Helper functions
adj_on_day : A -> N -> VP = \a,n ->
mkVP (mkVP a) (SyntaxEng.mkAdv on_Prep (mkNP n)) ;
sunny_on_Monday_VP : VP = adj_on_day sunny_A monday_N ;
rainy_on_Tuesday_VP : VP = adj_on_day rainy_A tuesday_N ;
tenseVPS : Tense -> VP -> VPS = \tns,vp ->
MkVPS (mkTemp tns simultaneousAnt) positivePol vp ;
futureVPS = tenseVPS futureTense ;
pastVPS = tenseVPS pastTense ;
-- Constructing the phrase
-- lin: "it will be sunny on Monday and will be rainy on Tuesday"
futFutPhrase : Phr =
mkPhr (
PredVPS it_NP
(ConjVPS -- : Conj -> [VPS] -> VPS
and_Conj -- : Conj
(BaseVPS -- : VPS -> VPS -> [VPS]
(futureVPS sunny_on_Monday_VP) -- : VPS
(futureVPS rainy_on_Tuesday_VP) -- : VPS
)
)
) ;
-- lin: "it was sunny on Monday and will be rainy on Tuesday"
pastFutPhrase : Phr =
mkPhr (
PredVPS it_NP
(ConjVPS -- : Conj -> [VPS] -> VPS
and_Conj -- : Conj
(BaseVPS -- : VPS -> VPS -> [VPS]
(pastVPS sunny_on_Monday_VP) -- : VPS
(futureVPS rainy_on_Tuesday_VP) -- : VPS
)
)
) ;
}
And it works like this:
$ gf
Languages:
> i -retain VPS.gf
> cc -one futFutPhrase
it will be sunny on Monday and will be rainy on Tuesday
> cc -one pastFutPhrase
it was sunny on Monday and will be rainy on Tuesday
So the tenses are repeated in both cases, because the conjunction is on the VPS level, not on the AP level.
How to create the phrase you wanted
If you want to have ellipsis, "it will be sunny on Monday and rainy on Tuesday", you will need to attach the Adv "on Monday" to the AP "sunny" using AdvAP, then do an AP conjunction, turn that AP into VP, and then use that VP in a Cl as you normally would. Here is code, a separate file from the previous:
resource APConj = open SyntaxEng, ParadigmsEng, (A=AdjectiveEng) in {
oper
-- Lexicon
sunny_A : A = mkA "sunny" ;
rainy_A : A = mkA "rainy" ;
monday_N : N = mkN "Monday" ;
tuesday_N : N = mkN "Tuesday" ;
-- Helper functions
adj_on_day : A -> N -> AP = \a,n ->
A.AdvAP (mkAP a) (SyntaxEng.mkAdv on_Prep (mkNP n)) ;
sunny_on_Monday_AP : AP = adj_on_day sunny_A monday_N ;
rainy_on_Tuesday_AP : AP = adj_on_day rainy_A tuesday_N ;
-- Constructing the phrase
sunnyRainyEllipsisPhrase : Phr =
mkPhr (
mkCl (
mkVP (mkAP and_Conj sunny_on_Monday_AP rainy_on_Tuesday_AP)
)
) ;
}
Works like this:
$ gf
Languages:
> i -retain APConj.gf
> cc -one sunnyRainyEllipsisPhrase
it is sunny on Monday and rainy on Tuesday
I'm trying to write a concrete syntax for this grammar (from Chapter 6 in Grammatical Framework: Programming with Multilingual Grammars):
abstract Arithm = {
flags startcat = Prop ;
cat
Prop ; -- proposition
Nat ; -- natural number
fun
Zero : Nat ; -- 0
Succ : Nat -> Nat ; -- the successor of x
Even : Nat -> Prop ; -- x is even
And : Prop -> Prop -> Prop ; -- A and B
}
There are predefined categories for integer, float and string literals (Int, Float and String), and they can be used as arguments to functions, but they may not be value types of any function.
In addition, they may not be used as a field in a linearisation type. This is what I would like to do, using plus defined in Predef.gf:
concrete ArithmEng of Arithm =
open Predef, SymbolicEng, SyntaxEng, ParadigmsEng in {
lincat
Prop = S ;
Nat = {s : NP ; n : Int} ;
lin
Zero = mkNat 0 ;
Succ nat = let n' : Int = Predef.plus nat.n 1 in mkNat n' ;
Even nat = mkS (mkCl nat.s (mkA "even")) ;
And p q = mkS and_Conj p q ;
oper
mkNat : Int -> Nat ;
mkNat int = lin Nat {s = symb int ; n = int} ;
} ;
But of course, this does not work: I get the error "linearization type field cannot be Int".
Maybe the right answer to my question is to use another programming language, but I am curious, because this example is left as an exercise to the reader in the GF book, so I would expect it to be solvable.
I can write a unary solution, using the category Digits from Numeral.gf:
concrete ArithmEng of Arithm =
open SyntaxEng, ParadigmsEng, NumeralEng, SymbolicEng, Prelude in {
lincat
Prop = S ;
Nat = {s : NP ; d : Digits ; isZero : Bool} ;
lin
Zero = {s = mkNP (mkN "zero") ; d = IDig D_0 ; isZero = True} ;
Succ nat = case nat.isZero of {
True => mkNat (IDig D_1) ;
False => mkNat (IIDig D_1 nat.d) } ;
Even nat = mkS (mkCl nat.s (mkA "even")) ;
And p q = mkS and_Conj p q ;
oper
mkNat : Digits -> Nat ;
mkNat digs = lin Nat {s = symb (mkN "number") digs ; d = digs ; isZero = False} ;
} ;
This produces the following results:
Arithm> l -bind Even Zero
zero is even
0 msec
Arithm> l -bind Even (Succ Zero)
number 1 is even
0 msec
Arithm> l -bind Even (Succ (Succ (Succ Zero)))
number 111 is even
This is of course a possible answer, but I suspect this is not the way the exercise was intended to be solved. So I wonder if I'm missing something, or if the GF language used to support more operations on Ints?
A possible, but still rather unsatisfactory answer, is to use the parameter Ints n for any natural number n.
Notice the difference:
Int is a literal type, just like String and Float.
All literals have {s : Str} as their lincat in every concrete syntax, and you can't change it in any way.
Since the lincat contains Str (not String), you are not allowed to pattern match it at runtime.
Introducing Ints n: a parameter type
However, Ints n is a parameter type, because it's finite.
You may have seen Ints n in old RGL languages, like the following Finnish grammar:
-- from the Finnish resource grammar
oper
NForms : Type = Predef.Ints 10 => Str ;
nForms10 : (x1,_,_,_,_,_,_,_,_,x10 : Str) -> NForms =
\ukko,ukon,ukkoa,ukkona,ukkoon,
ukkojen,ukkoja,ukkoina,ukoissa,ukkoihin -> table {
0 => ukko ; 1 => ukon ; 2 => ukkoa ;
3 => ukkona ; 4 => ukkoon ; 5 => ukkojen ;
6 => ukkoja ; 7 => ukkoina ; 8 => ukoissa ;
9 => ukkoihin ; 10 => ukko
} ;
What's happening here? This is an inflection table, where the left-hand side is… just numbers, instead of combinations of case and number. (For instance, 5 corresponds to plural genitive. Yes, it is completely unreadable for anyone who didn't write that grammar.)
That same code could as well be written like this:
-- another way to write the example from the Finnish RG
param
NForm = SgNom | SgGen | … | PlGen | PlIll ; -- Named params instead of Ints 10
oper
NForms : Type = NForm => Str ;
nForms10 : (x1,_,_,_,_,_,_,_,_,x10 : Str) -> NForms =
\ukko,ukon,ukkoa,ukkona,ukkoon,
ukkojen,ukkoja,ukkoina,ukoissa,ukkoihin -> table {
SgNom => ukko ;
SgGen => ukon ;
...
PlGen => ukkojen ;
PlIll => ukkoihin
} ;
As you can see, an integer works as the left-hand side of a table branch: 5 => ukkojen is as valid GF as PlGen => ukkojen. In that particular case, 5 has the type Ints 10.
Anyway, that code was just to show you what Ints n is and how it's used in other grammars than mine, which I'll soon paste here.
Step 1: incrementing the Ints
Initially, I wanted to use Int as a field in my lincat for Nat. But now I will use Ints 100 instead. I linearise Zero as {n = 0}.
concrete ArithmC of Arithm = open Predef in {
lincat
Nat = {n : Ints 100} ;
lin
Zero = {n = 0} ; -- the 0 is of type Ints 100, not Int!
Now we also linearise Succ. And here's the exciting news: we can use Predef.plus on a runtime value, because the runtime value is no longer an Int but an Ints n, which is finite! So we can do this:
lin
Succ x = {n = myPlus1 ! x.n} ;
oper
-- We use Predef.plus on Ints 100, not Int
myPlus1 : Ints 100 => Ints 100 = table {
100 => 100 ; -- Without this line, we get error
n => Predef.plus n 1 -- magic!
} ;
}
As you can see from myPlus1, it's definitely possible to pattern match Ints n at runtime. And we can even use the Predef.plus on it, except that we must cap it at the highest value. Without the line 100 => 100, we get the following error:
- compiling ArithmC.gf... Internal error in GeneratePMCFG:
convertTbl: missing value 101
among 0
1
...
100
Unfortunately, it's restricted to a finite n.
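To see why the cap is needed, here is a small Java model of the same idea. It is a sketch under the assumption that a table over Ints 100 must give every key 0..100 a value that is itself in 0..100. The names BoundedSucc and buildTable are hypothetical, and the exception only imitates the spirit of the compiler message.

```java
// Java model of a total table over a bounded integer range (illustrative only).
final class BoundedSucc {

    static final int MAX = 100;   // plays the role of the bound in Ints 100

    // Fills in a successor for every key 0..MAX, optionally capping at MAX.
    static int[] buildTable(boolean capAtMax) {
        int[] table = new int[MAX + 1];
        for (int n = 0; n <= MAX; n++) {
            int next = (capAtMax && n == MAX) ? MAX : n + 1;
            if (next > MAX) {
                // Without the cap, key MAX would map outside the finite range.
                throw new IllegalStateException("missing value " + next);
            }
            table[n] = next;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] plus1 = buildTable(true);
        System.out.println(plus1[0]);     // successor of 0
        System.out.println(plus1[MAX]);   // stays at the cap
        try {
            buildTable(false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the cap, every entry stays inside the range. Without it, the last key has nowhere to go, which is the same situation as the missing value 101 reported above.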
Let's test this in the GF shell:
$ gf
…
> i -retain ArithmC.gf
> cc Zero
{n = 0; lock_Nat = <>}
> cc Succ Zero
{n = 1; lock_Nat = <>}
> cc Succ (Succ (Succ (Succ (Succ Zero))))
{n = 5; lock_Nat = <>}
Technically works, if you can say that. But we can't do anything interesting with it yet.
Step 2: Turn the Ints n into an NP
Previously, we just checked the values in a GF shell with cc (compute_concrete). But the whole task for the grammar was to produce sentences like "2 is even".
The lincat for Int (and all literal types) is {s : Str}. To make a literal into an NP, you can just use the Symbolic module.
But we cannot increment an Int at runtime, so we chose to use Ints 100 instead.
But there is no lincat for Ints n, because it's a param. So the only way I found is to manually define a showNat oper for Ints 100.
This is ugly, but technically it works.
concrete ArithmEng of Arithm =
open Predef, SymbolicEng, SyntaxEng, ParadigmsEng in {
lincat
Prop = S ;
Nat = {s : NP ; n : MyInts} ;
oper
MyInts = Ints 100 ;
lin
Zero = mkNat 0 ;
Succ nat = mkNat (myPlus1 ! nat.n) ;
Even nat = mkS (mkCl nat.s (mkA "even")) ;
And p q = mkS and_Conj p q ;
oper
mkNat : MyInts -> Nat ;
mkNat i = lin Nat {s = symb (showNat ! i) ; n = i} ;
myPlus1 : MyInts => MyInts = table {
100 => 100 ;
n => Predef.plus n 1
} ;
showNat : MyInts => Str ;
showNat = table {
0 => "0" ; 1 => "1" ; 2 => "2" ;
3 => "3" ; 4 => "4" ; 5 => "5" ;
6 => "6" ; 7 => "7" ; 8 => "8" ;
9 => "9" ; 10 => "10" ; 11 => "11" ;
12 => "12" ; 13 => "13" ; 14 => "14" ;
15 => "15" ; 16 => "16" ; 17 => "17" ;
18 => "18" ; 19 => "19" ; 20 => "20" ;
_ => "Too big number, I can't be bothered"
} ;
} ;
Let's test in the GF shell:
Arithm> gr | l -treebank
Arithm: Even (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))))))))
ArithmEng: 12 is even
So yes, technically it works, but it's unsatisfactory. It only works for a finite n, and I had to type out a bunch of boilerplate in the showNat oper. I'm still unsure whether this is the way the GF book intended, or whether GF used to support more operations on Int.
Other solutions
Here's a solution by daherb, where Zero outputs the string "0" and Succ outputs the string "+1", and the final output is evaluated in an external programming language.
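The external evaluation step could look roughly like the Java sketch below. It assumes the linearization is a string such as 0 +1 +1 (a leading 0 followed by +1 tokens); the exact output format of that solution is not shown here, so the format, the class name SuccEval and the method eval are all assumptions.

```java
// Evaluates a unary linearization such as "0 +1 +1" (assumed format, illustrative names).
final class SuccEval {

    // Returns the number denoted by a leading "0" followed by any number of "+1" tokens.
    static int eval(String linearization) {
        String s = linearization.replaceAll("\\s+", "");   // ignore spacing between tokens
        if (!s.startsWith("0")) {
            throw new IllegalArgumentException("expected a leading 0: " + linearization);
        }
        int value = 0;
        int pos = 1;
        while (pos < s.length()) {
            if (!s.startsWith("+1", pos)) {
                throw new IllegalArgumentException("expected +1 at offset " + pos);
            }
            value++;       // each "+1" is one application of Succ
            pos += 2;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(eval("0"));
        System.out.println(eval("0 +1 +1 +1"));
    }
}
```

This keeps the grammar free of arithmetic: GF only emits the unary expression, and the host language does the counting, so there is no upper bound like the 100 above.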
I've read that you need to use the '^' and '!' operators in order to build a parse tree similar to the ones displayed in ANTLR Works (even though you don't need to use them to get a nice tree in ANTLR Works). My question then is how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrites, and yet say I have an input string abc abc123 and a grammar:
grammar test;
program : idList;
idList : id* ;
id : ID ;
ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;
ANTLR Works will output:
What I don't understand is how you can get the 'idList' node on top of this tree (as well as the grammar one, as a matter of fact). How can I reproduce this tree using rewrites and those operators?
You can't use ^ and ! alone. These operators only operate on existing tokens, while you want to create extra tokens (and make these the root of your sub trees). You can do that using rewrite rules and defining some imaginary tokens.
A quick demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
IdList;
Id;
}
@parser::members {
private static void walk(CommonTree tree, int indent) {
if(tree == null) return;
for(int i = 0; i < indent; i++, System.out.print(" "));
System.out.println(tree.getText());
for(int i = 0; i < tree.getChildCount(); i++) {
walk((CommonTree)tree.getChild(i), indent + 1);
}
}
public static void main(String[] args) throws Exception {
testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
testParser parser = new testParser(new CommonTokenStream(lexer));
walk((CommonTree)parser.program().getTree(), 0);
}
}
program : idList EOF -> idList;
idList : id* -> ^(IdList id*);
id : ID -> ^(Id ID);
ID : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT : '0' .. '9';
If you run the demo above, you will see the following being printed to the console:
IdList
 Id
  abc
 Id
  abc123
As you can see, imaginary tokens must also start with an upper case letter, just like lexer rules. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:
idList : id* -> ^(IdList["idList"] id*);
id : ID -> ^(Id["id"] ID);
which will print:
idList
 id
  abc
 id
  abc123
I have a simple grammar
options {
language = Java;
output = AST;
ASTLabelType=CommonTree;
}
tokens {
DEF;
}
root
: ID '=' NUM (',' ID '=' NUM)* -> ^(DEF ID NUM)+
;
and the corresponding tree grammar:
options {
tokenVocab=SimpleGrammar;
ASTLabelType=CommonTree;
}
root
: ^(DEF ID NUM)+
;
However, ANTLR (v3.3) cannot compile this tree grammar. I'm getting:
syntax error: antlr: unexpected token: +
|---> : ^(DEF ID NUM)+
It also doesn't work if I try to create it as ^(ROOT ^(DEF ID NUM)+).
I want a tree that corresponds to this (the parser creates it this way as well):
(ROOT (DEF aa 11) (DEF bb 22) (DEF cc 33))
So ANTLR is able to generate the tree in the parser, but not able to parse it with a tree grammar?!
Why does this happen?
In order to get (ROOT (DEF aa 11) (DEF bb 22) (DEF cc 33)) you can define the following parser rules:
tokens {
ROOT;
DEF;
}
root
: def (',' def)* -> ^(ROOT def+)
;
def
: ID '=' NUM -> ^(DEF ID NUM)
;
and then your tree grammar would contain:
root
: ^(ROOT def+)
;
def
: ^(DEF ID NUM)
;