grammar of a new language [closed] - grammar

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have this grammar, but I do not understand how to write a parser for it:
module = properties fields methods module#3 'end'
properties = list#0 (property add#2)*
property = 'class' 'name' class# ';'
fields = list#0 (field add#2)*
field = type list#0 id add#2 [';'/','] ! (',' id add#2)* ';' field#2
methods = list#0 (method add#2)*
method = (type id / nothing#0 id) ! '(' args ')' follow method#4
args = list#0 (arg add#2 (',' arg add#2)*)?
arg = type id ! arg#2 / nothing#0 id ! arg#2
statements = list#0 (statement add#2)*
statement = do / jump / compound / simple
follow = block / jump / compound / simple
jump = break / continue / return
compound = if / while
simple = local / assign
do = 'do' '{' statements '}' do#1
block = '{' statements '}' block#1
break = 'break' ';' break#0
continue = 'continue' ';' continue#0
return = 'return' (exp / nothing#0) ';' return#1
if = 'if' '(' exp ')' follow ('else' follow / nothing#0) if#3
while = 'while' '(' exp ')' follow while#2
local = type id ! init? local#2 ';'
init = 'assign' exp assign#2 / '.' id dot#2 '(' exps ')' call#2
assign = id 'assign' ! exp assign#2 ';'
exp = id ( '(' exps ')' call#2 / '.' id dot#2 '(' exps ')' call#2 )?
exps = list#0 (exp add#2 (',' exp add#2)*)?
type = 'name' type#
id = 'name' id#
'.' = 'DOT'
Could anyone please help me understand this grammar?
Thanks.

This:
module = properties fields methods module#3 'end'
means that:
a module consists of properties, followed by fields, followed by methods, followed by the word "end"
so in order to parse a module the compiler should:
parse properties
parse fields
parse methods
match the word "end"
The item with the "#" is the type of syntax node the parser should create; the number after it indicates how many of the previous parse results should be passed to that node.
In the Python language the code might look something like this:
    def parse_module():
        properties = parse_properties()
        fields = parse_fields()
        methods = parse_methods()
        module = make_module(properties, fields, methods)
        match("end")
        return module
A "/" separates alternatives, "(...)*" indicates any number of repeats, and "?" indicates an optional item.
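The three operators above can be sketched with a toy recursive-descent parser. This is only an illustration under stated assumptions: the `Parser` class and its `peek`/`match` helpers are hypothetical, tokens are plain strings, `exp` is simplified to a bare id, and `list#0` / `add#2` are read as "start an empty list" and "append the last result to it", which follows the explanation of "#" given above but is not confirmed by the grammar's author.

```python
class Parser:
    """Toy recursive-descent parser over a list of token strings (hypothetical helper)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Return the next token without consuming it, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        # Consume the next token, failing if it is not the expected one.
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_id(self):
        # id = 'name' id#   (assumption: any alphabetic token counts as a name)
        token = self.peek()
        if token is None or not token.isalpha():
            raise SyntaxError(f"expected a name, got {token!r}")
        self.pos += 1
        return ("id", token)

    def parse_exps(self):
        # exps = list#0 (exp add#2 (',' exp add#2)*)?
        exps = []                            # list#0: start with an empty list
        if self.peek() not in (")", None):   # "?": the whole group is optional
            exps.append(self.parse_id())     # add#2: append the result to the list
            while self.peek() == ",":        # "(...)*": repeat while a comma follows
                self.match(",")
                exps.append(self.parse_id())
        return exps

    def parse_return(self):
        # return = 'return' (exp / nothing#0) ';' return#1
        self.match("return")
        if self.peek() == ";":               # "/": second alternative, nothing#0
            value = None
        else:                                # first alternative, exp (simplified to id)
            value = self.parse_id()
        self.match(";")
        return ("return", value)
```

For example, `Parser(["return", "x", ";"]).parse_return()` yields a return node wrapping the id `x`, while `Parser(["return", ";"]).parse_return()` takes the "nothing" alternative.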

Related

Why does the DSL show "Couldn't resolve reference to "?

I am implementing a grammar with three sections. In the first section I declare components with their interfaces, for instance component A with interfaces interface_1 and interface_2. In the third section I declare some restrictions, for instance that component A can access component B through interface XXXX. When I try to cross-reference the interfaces of a component, I get the error "Couldn't resolve reference to ProbeInterface 'interface_1'".
I tried several examples from the internet, but none of them works in my case.
This is part of my grammar:
ArchitectureDefinition:
'Abstractions' '{' abstractions += DSLAbstraction+ '}'
'Compositions' '{' compositions += DSLComposition* '}'
'Restrictions' '{' restrictions += DSLRestriction* '}'
;
DSLComposition:
DSLProbe|DSLSensor
;
DSLRestriction:
'sensor' t=[DSLSensor] 'must-access-probe' type = [DSLProbe] 'through-interface' probeinterface=[ProbeInterface] ';'
;
DSLSensor:
'Sensor' name=ID ';'
;
DSLProbe:
'Probe' name=ID ('with-interface' probeinterface=ProbeInterface)? ';'
;
ProbeInterface :
name+=ID (',' name+=ID)*
;
And the implementation:
Abstractions
{
Sensor sensor_1 ;
Probe probe_1 with-interface interface_1, interface_2;
}
Compositions{}
Restrictions
{
sensor sensor_1 must-access-probe probe_1 through-interface
interface_1;
}
I expect that interface_1 or interface_2 can be referenced by the grammar.
Thanks.
The grammar you posted is incomplete.
The way you define the interfaces is problematic:
default naming works only with single-valued name attributes.
ProbeInterface :
name+=ID (',' name+=ID)*
;
Better:
DSLProbe:
'Probe' name=ID ('with-interface' probeinterfaces+=ProbeInterface ("," probeinterfaces+=ProbeInterface)*)? ';'
;
ProbeInterface :
name=ID
;
It looks like the qualified name of an interface is
<probename>.<interfacename>
You either have to adapt the name provider,
or change the grammar and model to use qualified names, i.e. ref=[Thing|FQN] with FQN: ID ("." ID)*;
or implement scoping properly, which is likely what you want in your case, since you want to restrict the interfaces to specific probes.
Here is a sample:
    override getScope(EObject context, EReference reference) {
        if (reference === MyDslPackage.Literals.DSL_RESTRICTION__PROBEINTERFACE) {
            if (context instanceof DSLRestriction) {
                val probe = context.type
                return Scopes.scopeFor(probe.probeinterfaces)
            }
        }
        super.getScope(context, reference)
    }
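The idea behind the scoping sample, independent of Xtext, is that the candidates for the interface reference are limited to the interfaces of the probe named in the same restriction. The following is a language-neutral sketch in Python, not Xtext API: the `Probe` class and the `resolve_interface` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Probe:
    """Hypothetical stand-in for a DSLProbe with its declared interfaces."""
    name: str
    interfaces: list = field(default_factory=list)


def resolve_interface(probes, probe_name, interface_name):
    """Resolve interface_name only among the interfaces of the named probe."""
    probe = next((p for p in probes if p.name == probe_name), None)
    if probe is None:
        raise LookupError(f"Couldn't resolve reference to probe {probe_name!r}")
    # The "scope" is the candidate set for this particular restriction:
    # only the interfaces of the probe it refers to are visible.
    scope = {name: (probe.name, name) for name in probe.interfaces}
    if interface_name not in scope:
        raise LookupError(f"Couldn't resolve reference to interface {interface_name!r}")
    return scope[interface_name]
```

With this shape, `interface_1` resolves for `probe_1`, while an interface declared on a different probe is rejected, which is the restriction the question asks for.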

ANTLR3 - Decision can match input using multiple alternatives

When running ANTLR3 on the following code, I get the message - warning(200): MYGRAMMAR.g:40:36: Decision can match input such as "QMARK" using multiple alternatives: 3, 4
As a result, alternative(s) 4 were disabled for that input.
The warning message is pointing me to postfixExpr. Is there a way to fix this?
grammar MYGRAMMAR;
options {language = C;}
tokens {
BANG = '!';
COLON = ':';
FALSE_LITERAL = 'false';
GREATER = '>';
LSHIFT = '<<';
MINUS = '-';
MINUS_MINUS = '--';
PLUS = '+';
PLUS_PLUS = '++';
QMARK = '?';
QMARK_COLON = '?:';
TILDE = '~';
TRUE_LITERAL = 'true';
}
condExpr
: shiftExpr (QMARK condExpr COLON condExpr)? ;
shiftExpr
: addExpr ( shiftOp addExpr)* ;
addExpr
: qmarkColonExpr ( addOp qmarkColonExpr)* ;
qmarkColonExpr
: prefixExpr ( QMARK_COLON prefixExpr )? ;
prefixExpr
: ( prefixOrUnaryMinus | postfixExpr) ;
prefixOrUnaryMinus
: prefixOp prefixExpr ;
postfixExpr
: primaryExpr ( postfixOp | BANG | QMARK )*;
primaryExpr
: literal ;
shiftOp
: ( LSHIFT | rShift);
addOp
: (PLUS | MINUS);
prefixOp
: ( BANG | MINUS | TILDE | PLUS_PLUS | MINUS_MINUS );
postfixOp
: (PLUS_PLUS | MINUS_MINUS);
rShift
: (GREATER GREATER)=> a=GREATER b=GREATER {assertNoSpace($a,$b)}? ;
literal
: ( TRUE_LITERAL | FALSE_LITERAL );
assertNoSpace [pANTLR3_COMMON_TOKEN t1, pANTLR3_COMMON_TOKEN t2]
: { $t1->line == $t2->line && $t1->getCharPositionInLine($t1) + 1 == $t2->getCharPositionInLine($t2) }? ;
I think one problem is that PLUS_PLUS and MINUS_MINUS will never be matched, as they are defined after the respective PLUS and MINUS tokens. Therefore the lexer will always output two PLUS tokens instead of one PLUS_PLUS token.
To avoid this, you have to define your PLUS_PLUS and MINUS_MINUS tokens before the PLUS and MINUS tokens, as the lexer processes them in the order they are defined and won't look any further once it has found a way to match the current input.
The same problem applies to QMARK_COLON, as it is defined after QMARK (this is only a problem because there is another token type, COLON, that can match the following colon).
See if fixing these ambiguities resolves the warning.
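The ordering effect described in this answer can be illustrated with a toy first-match tokenizer. This is not ANTLR's lexer algorithm, and whether ANTLR3 really treats literals from the tokens section this way is the answer's assumption; the `tokenize` function and the two definition lists are hypothetical.

```python
def tokenize(text, token_defs):
    """Toy lexer: at each position, take the FIRST definition that matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, literal in token_defs:
            if text.startswith(literal, pos):
                tokens.append(name)
                pos += len(literal)
                break
        else:
            # No definition matched at this position.
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


# Shorter literal listed first: "++" is split into two PLUS tokens.
short_first = [("PLUS", "+"), ("PLUS_PLUS", "++")]
# Longer literal listed first: "++" becomes a single PLUS_PLUS token.
long_first = [("PLUS_PLUS", "++"), ("PLUS", "+")]
```

In a first-match scheme like this one, listing the longer literal first is what lets it win; a lexer that always prefers the longest match would not depend on the order.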

PEGJS predicate grammar

I need to create a grammar using a predicate. The grammar below fails for the given case.
startRule = a:namespace DOT b:id OPEN_BRACE CLOSE_BRACE {return {"namespace": a, "name": b}}
namespace = id (DOT id)*
DOT = '.';
OPEN_BRACE = '(';
CLOSE_BRACE = ')';
id = [a-zA-Z]+;
It fails for this input:
com.mytest.create();
which should have given "create" as value of "name" key in the result part.
Any help would be great.
There are several things here.
The most important is that PEG is greedy. That means that your (DOT id)* rule matches ALL the DOT id sequences, including the one that startRule expects as DOT b:id.
That can be solved using lookahead.
The other thing is that you must remember to use join, since by default a rule like [a-zA-Z]+ returns each character as a separate member of an array.
I also added a rule for semicolons.
Try this:
start =
namespace:namespace DOT name:string OPEN_BRACE CLOSE_BRACE SM nl?
{
return { namespace : namespace, name : name };
}
/* Here I'm using the lookahead: (member !OPEN_BRACE)* */
namespace =
first:string rest:(member !OPEN_BRACE)*
{
rest = rest.map(function (x) { return x[0]; });
rest.unshift(first);
return rest;
}
member =
DOT str:string
{ return str; }
DOT =
'.'
OPEN_BRACE =
'('
CLOSE_BRACE =
')'
SM =
';'
nl =
"\n"
string =
str:[a-zA-Z]+
{ return str.join(''); }
As far as I can tell, this parses that line correctly.
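The effect of the (member !OPEN_BRACE)* lookahead can also be traced by hand. Below is a minimal Python sketch of the same idea, not PEG.js itself: the `parse_call` function is a hypothetical name, and it handles only the input shape from the question (no whitespace, no trailing newline).

```python
import re

ID = re.compile(r"[a-zA-Z]+")


def parse_call(text):
    """Parse input such as 'com.mytest.create();' into namespace parts and a name."""
    pos = 0

    def read_id():
        nonlocal pos
        m = ID.match(text, pos)
        if m is None:
            raise SyntaxError(f"expected identifier at position {pos}")
        pos = m.end()
        return m.group()

    namespace = [read_id()]
    while True:
        # Tentatively parse one "member" (DOT id) ...
        if not text.startswith(".", pos):
            raise SyntaxError(f"expected '.' at position {pos}")
        saved = pos
        pos += 1
        part = read_id()
        # ... but keep it only if it is NOT followed by '(' (the !OPEN_BRACE check).
        if text.startswith("(", pos):
            pos = saved  # backtrack: this member is the method name
            break
        namespace.append(part)

    pos += 1  # consume the DOT in front of the method name
    name = read_id()
    for literal in ("(", ")", ";"):
        if not text.startswith(literal, pos):
            raise SyntaxError(f"expected {literal!r} at position {pos}")
        pos += len(literal)
    return {"namespace": namespace, "name": name}
```

Without the lookahead check, the loop would also consume `.create` as a namespace member, and nothing would be left for the name, which is the failure described in the question.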

Understanding an ANTLR lexer/parser exception

I have the following language that I wish to parse using ANTLR 1.2.2.
TEST <name>
{
<param_name> = <param value>;
}
where
<...> means a user value, not part of the language keywords.
For example:
TEST myTest
{
my_param = 1.0;
}
The value can be an integer, a real, or a quoted string:
my_param = 1.0;, my_param = 1; and my_param = "myStringValue"; are all valid inputs.
Here is the grammar for this parsing:
parse_test : TESTKEYWORD TEST_NAME '{' param_value_def '}';
param_value_def : ID EQUALS param_value ';';
param_value : REAL|INTEGER|QUOTED_STRING;
TESTKEYWORD : 'TEST';
QUOTED_STRING : '"' ~('"')* '"';
INTEGER : MINUS? DIGIT DIGIT*
REAL : INTEGER '.' DIGIT DIGIT*;
EQUALS : '=';
fragment
MINUS : '-';
fragment
DIGIT : '0'..'9';
When I feed the sample input to the ANTLR interpreter, I get a `MismatchedTokenException` related to the param_value rule.
Can you help me decipher the error message and see what I am doing wrong?
Thanks.
Although ANTLRWorks is not a well-written tool, you can use its debugger to see which token in the input leads to this exception, and from there work out which rules need to be revised (I cannot be more specific, since you did not post the full grammar).
http://www.antlr.org/works/index.html

What's the correct way to add new tokens (rewrite) to create AST nodes that are not in the input stream?

I have a pretty basic math expression grammar for ANTLR here. What's of interest is handling the implied * operator between parentheses, e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).
Given the input (2-3)(4+5)(6*7), I'm trying to add the missing * operator to the AST while parsing. I think I've managed to achieve that in the following grammar, but I'm wondering whether this is the correct, most elegant way.
grammar G;
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ADD = '+' ;
SUB = '-' ;
MUL = '*' ;
DIV = '/' ;
OPARN = '(' ;
CPARN = ')' ;
}
start
: expression EOF!
;
expression
: mult (( ADD^ | SUB^ ) mult)*
;
mult
: atom (( MUL^ | DIV^) atom)*
;
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN expression CPARN -> ^(MUL expression)+
)*
;
INTEGER : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
This grammar appears to output the correct AST Tree in ANTLRworks:
I'm only just starting to get to grips with parsing and ANTLR and don't have much experience, so feedback would be really appreciated!
Thanks in advance! Carl
First of all, you did a great job given the fact that you've never used ANTLR before.
You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:
options {
output=AST;
}
Also, you don't have to specify the root node for each operator separately. So you don't have to do:
(ADD^ | SUB^)
but the following:
(ADD | SUB)^
will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.
Now, for your AST: you'll probably want to create a binary tree. That way, all internal nodes are operators and the leaves are operands, which makes the actual evaluation of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN e=expression CPARN -> ^(MUL $atom $e)
)*
;
which produces the following AST given the input "(2-3)(4+5)(6*7)":
(image produced by: graphviz-dev.appspot.com)
The DOT file was generated with the following test-class:
    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
            GParser parser = new GParser(new CommonTokenStream(lexer));
            CommonTree tree = (CommonTree)parser.start().getTree();
            DOTTreeGenerator gen = new DOTTreeGenerator();
            StringTemplate st = gen.toDOT(tree);
            System.out.println(st);
        }
    }
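The binary-tree shape recommended in this answer, where each implied multiplication takes the tree built so far as its left child, can be sketched outside ANTLR as a left fold. This is only an illustration: the tuple representation `(op, left, right)` and the functions `implied_product` and `evaluate` are assumptions made for the example, not what ANTLR generates.

```python
from functools import reduce


def implied_product(factors):
    """Fold [a, b, c] into ('*', ('*', a, b), c), a left-nested binary tree."""
    return reduce(lambda left, right: ("*", left, right), factors)


def evaluate(node):
    """Evaluate a tree whose internal nodes are (op, left, right) tuples."""
    if isinstance(node, int):
        return node  # a leaf is an operand
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return a / b
    raise ValueError(f"unknown operator {op!r}")


# (2-3)(4+5)(6*7): three parenthesised factors joined by implied '*'.
tree = implied_product([("-", 2, 3), ("+", 4, 5), ("*", 6, 7)])
```

Because every internal node has exactly two children, the evaluator needs only one recursive case per operator, which is the simplification the answer refers to.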