ANTLR4 generating code for the last expression entered in curly braces - while-loop

I'm building a small language, used mainly for calculations, with C-like syntax but extremely limited functionality. For the past few days I've been trying to generate code for expressions enclosed in curly braces, but whenever I enter several expressions inside the braces, the generated code only ever covers the last one. This is supposed to work for a while loop.
For example:
while( true )
{
// some expressions (not using any variables for simplicity)
5 + 9;
8 - 10;
4 * 6;
}
However, the generated code only takes the last expression into account (4 * 6 in this case).
The link to the code:
https://codeshare.io/GL0xRk
Here is the code snippet that handles curly braces, along with some other related code:
calcul returns [String code]
@init
{
$code = new String();
}
@after
{
System.out.print($code);
for( int i = 0; i < getvarg_count(); ++i )
{
System.out.println("POP");
}
System.out.println("HALT");
}
: (decl
{
// declaration
$code += $decl.code;
})*
NEWLINE*
{
$code += "";
}
(instruction
{
// instruction, eg. x = 5; 7 * 4;
$code += $instruction.code;
System.err.println("instruction found");
})*
;
whileStat returns [String code]
: WHILE '(' condition ')' NEWLINE* block
{
int cur_label = nextLabel();
$code = "LABEL " + cur_label + "\n";
$code += $condition.code;
$code += "JUMPF " + (cur_label + 1) + "\n";
$code += $block.code;
$code += "JUMP " + cur_label + "\n";
$code += "LABEL " + (cur_label + 1) + "\n";
}
;
block returns [String code]
@init
{
$code = new String();
}
: '{' instruction* '}' NEWLINE*
{
System.err.println("block found");
$code += $instruction.code;
System.err.println("curly braces for while found");
}
;
And the compiler code generated:
while(true)
{
5+9;
8-10;
4*6;
}
block found
curly braces for while found
instruction found
LABEL 0
PUSHI 1
JUMPF 1
PUSHI 4
PUSHI 6
MUL
POP
JUMP 0
LABEL 1
HALT
I have a feeling that the $code is always reinitialized. Or maybe it's because I have instruction* in two different rules. I'm not sure how else to handle this problem. All help is much appreciated.
Thank you

It looks like your problem is that $instruction in block's action only refers to the last instruction: the action is outside of the *, so it only runs once, after all instructions have been matched.
You can either move the action inside the * like you did in the calcul rule or you can put all the instructions in a list with instructions+=instruction* and then use $instructions in the action (or better: a listener or visitor).
PS: I strongly recommend using a listener or visitor instead of having actions all over your grammar. Actions make the grammar very hard to read.
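To make the difference between the two action placements concrete, here is a plain-Java sketch of what happens in each case. It does not use ANTLR at all: the list stands in for the instructions matched by instruction*, the strings stand in for their generated code, and BlockCodeDemo and its method names are made up for illustration.

```java
import java.util.List;

// Plain-Java model of the two action placements; no ANTLR involved.
// BlockCodeDemo and its methods are hypothetical names for illustration.
class BlockCodeDemo {

    // Action placed after the loop: by the time it runs, only the last
    // matched instruction is visible, like $instruction outside the *.
    static String actionOutsideLoop(List<String> instructionCodes) {
        String last = null;
        for (String code : instructionCodes) {
            last = code; // each match overwrites the previous one
        }
        return last == null ? "" : last;
    }

    // Action placed inside the loop: it runs once per instruction and
    // appends, like ( instruction { $code += $instruction.code; } )*.
    static String actionInsideLoop(List<String> instructionCodes) {
        StringBuilder code = new StringBuilder();
        for (String c : instructionCodes) {
            code.append(c);
        }
        return code.toString();
    }

    public static void main(String[] args) {
        // Placeholder strings instead of real generated code.
        List<String> codes = List.of("[5 + 9]\n", "[8 - 10]\n", "[4 * 6]\n");
        System.out.print(actionOutsideLoop(codes));
        System.out.println("---");
        System.out.print(actionInsideLoop(codes));
    }
}
```

The first call prints only the placeholder for 4 * 6, which matches the output in the question; the second prints all three.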

ANTLR4: Unexpected behavior that I can't understand

I'm very new to ANTLR4 and am trying to build my own language. So my grammar starts at
program: <EOF> | statement | functionDef | statement program | functionDef program;
and my statement is
statement: selectionStatement | compoundStatement | ...;
and
selectionStatement
: If LeftParen expression RightParen compoundStatement (Else compoundStatement)?
| Switch LeftParen expression RightParen compoundStatement
;
compoundStatement
: LeftBrace statement* RightBrace;
Now the problem is that when I test a piece of code against selectionStatement or statement, it passes the test, but when I test it against program, it is not recognized. Can anyone help me with this? Thank you very much.
edit: the code I use to test is the following:
if (x == 2) {}
It passes the test against selectionStatement and statement but fails at program. It appears that program only accepts if...else
if (x == 2) {} else {}
Edit 2:
The error message I received was
<unknown>: Incorrect error: no viable alternative at input 'if(x==2){}'
Cannot answer your question given the incomplete information provided: the statement rule is partial and the compoundStatement rule is missing.
Nonetheless, there are two techniques you should be using to answer this kind of question yourself (in addition to unit tests).
First, ensure that the lexer is working as expected. This answer shows how to dump the token stream directly.
Second, use a custom ErrorListener to provide a meaningful/detailed description of its parse path to every encountered error. An example:
public class JavaErrorListener extends BaseErrorListener {
public int lastError = -1;
@Override
public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine,
String msg, RecognitionException e) {
Parser parser = (Parser) recognizer;
String name = parser.getSourceName();
TokenStream tokens = parser.getInputStream();
Token offSymbol = (Token) offendingSymbol;
int thisError = offSymbol.getTokenIndex();
if (offSymbol.getType() == -1 && thisError == tokens.size() - 1) {
Log.debug(this, name + ": Incorrect error: " + msg);
return;
}
String offSymName = JavaLexer.VOCABULARY.getSymbolicName(offSymbol.getType());
List<String> stack = parser.getRuleInvocationStack();
// Collections.reverse(stack);
Log.error(this, name);
Log.error(this, "Rule stack: " + stack);
Log.error(this, "At line " + line + ":" + charPositionInLine + " at " + offSymName + ": " + msg);
if (thisError > lastError + 10) {
lastError = thisError - 10;
}
for (int idx = lastError + 1; idx <= thisError; idx++) {
Token token = tokens.get(idx);
if (token.getChannel() != Token.HIDDEN_CHANNEL) Log.error(this, token.toString());
}
lastError = thisError;
}
}
Note: adjust the Log statements to whatever logging package you are using.
Finally, Antlr doesn't do 'weird' things - just things that you don't understand.

White-Box Testing

I was just wondering what the difference is between statement coverage, decision coverage and condition coverage for the following code.
public static void main (String args [])
{
char letter=' ';
String word= "", vowels = "aeiouAEIOU";
int i, numVowels= 0, numCons= 0, wordLength= 0;
word = JOptionPane.showInputDialog("Input a word: " );
if (word.length() > 10 || word.length() < 3)
word = JOptionPane.showInputDialog("Input another word: ");
wordLength= word.length();
for (i = 0; i < wordLength; i++)
letter = word.charAt(i);
if (vowels.indexOf(letter) != -1)
numVowels = numVowels+1;
numCons = wordLength-numVowels;
JOptionPane.showMessageDialog(null, "Number of vowels: "+ numVowels);
JOptionPane.showMessageDialog(null, " Consonants: " + numCons);
}
P.S. There are no braces in any of the if statements.
Different tools use slightly different terminology for these numbers. For example, JaCoCo uses the following terminology:
http://www.eclemma.org/jacoco/trunk/doc/counters.html
It would help if you could tell us which tool you are using to calculate the coverage. We can then apply its definitions to your code.
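As a tool-independent illustration, the first if in the question can be used to show how the three criteria usually differ. The sketch below follows the common textbook definitions, which a particular tool may not match exactly. CoverageDemo, its method names and the chosen word lengths are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Textbook statement/decision/condition coverage for the decision
// (length > 10 || length < 3). All names here are hypothetical.
class CoverageDemo {
    // Records which atomic conditions were evaluated, and their results.
    static final List<String> trace = new ArrayList<>();

    static boolean tooLong(int length) {
        boolean result = length > 10;
        trace.add("A=" + result);
        return result;
    }

    static boolean tooShort(int length) {
        boolean result = length < 3;
        trace.add("B=" + result);
        return result;
    }

    // Same shape as the if in the question; || short-circuits, so B is
    // only evaluated when A is false.
    static boolean needsSecondPrompt(int length) {
        return tooLong(length) || tooShort(length);
    }

    public static void main(String[] args) {
        // Statement coverage of this if: one input that makes the decision
        //   true (e.g. 11), so the statement under the if is executed.
        // Decision coverage: the whole decision is both true and false
        //   (e.g. 11 and 5).
        // Condition coverage: A and B are each both true and false
        //   (e.g. 11, 2 and 5, given the short-circuit).
        for (int length : new int[] {11, 2, 5}) {
            trace.clear();
            boolean decision = needsSecondPrompt(length);
            System.out.println("length=" + length + " " + trace
                    + " decision=" + decision);
        }
    }
}
```

The same reasoning would then be repeated for the loop condition and for the second if to cover the whole method.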

My simple ANTLR grammar ignores certain invalid tokens when parsing

I asked a question a couple of weeks ago about my ANTLR grammar (My simple ANTLR grammar is not working as expected). Since asking that question, I've done more digging and debugging and gotten most of the kinks out. I am left with one issue, though.
My generated parser code is not picking up invalid tokens in one particular part of the text that is processed. The lexer is properly breaking things into tokens, but the parser does not kick out invalid tokens in some cases. In particular, when the invalid token is at the end of a phrase like "A and B", the parser ignores it, as if the token weren't even there.
Some specific examples:
"A and B" - perfectly valid
"A# and B" - parser properly picks up the invalid # token
"A and #B" - parser properly picks up the invalid # token
"A and B#" - here's the mystery - the lexer finds the # token and the parser IGNORES it (!)
"(A and B#) or C" - further mystery - the lexer finds the # token and the parser IGNORES it (!)
Here is my grammar:
grammar QvidianPlaybooks;
options{ language=CSharp3; output=AST; ASTLabelType = CommonTree; }
public parse
: expression
;
LPAREN : '(' ;
RPAREN : ')' ;
ANDOR : 'AND'|'and'|'OR'|'or';
NAME : ('A'..'Z');
WS : ' ' { $channel = Hidden; };
THEREST : .;
// ***************** parser rules:
expression : anexpression EOF!;
anexpression : atom (ANDOR^ atom)*;
atom : NAME | LPAREN! anexpression RPAREN!;
The code that then processes the resulting tree looks like this:
... from the main program
QvidianPlaybooksLexer lexer = new QvidianPlaybooksLexer(new ANTLRStringStream(src));
QvidianPlaybooksParser parser = new QvidianPlaybooksParser(new CommonTokenStream(lexer));
parser.TreeAdaptor = new CommonTreeAdaptor();
CommonTree tree = (CommonTree)parser.parse().Tree;
ValidateTree(tree, 0, iValidIdentifierCount);
// recursive code that walks the tree
public static RuleLogicValidationResult ValidateTree(ITree Tree, int depth, int conditionCount)
{
RuleLogicValidationResult rlvr = null;
if (Tree != null)
{
CommonErrorNode commonErrorNode = Tree as CommonErrorNode;
if (null != commonErrorNode)
{
rlvr = new RuleLogicValidationResult();
rlvr.IsValid = false;
rlvr.ErrorType = LogicValidationErrorType.Other;
Console.WriteLine(rlvr.ToString());
}
else
{
string strTree = Tree.ToString();
strTree = strTree.Trim();
strTree = strTree.ToUpper();
if ((Tree.ChildCount != 0) && (Tree.ChildCount != 2))
{
rlvr = new RuleLogicValidationResult();
rlvr.IsValid = false;
rlvr.ErrorType = LogicValidationErrorType.Other;
rlvr.InvalidIdentifier = strTree;
rlvr.ErrorPosition = 0;
Console.WriteLine(String.Format("CHILD COUNT of {0} = {1}", strTree, Tree.ChildCount));
}
// if the current node is valid, then validate the two child nodes
if (null == rlvr || rlvr.IsValid)
{
// output the tree node
for (int i = 0; i < depth; i++)
{
Console.Write(" ");
}
Console.WriteLine(Tree);
rlvr = ValidateTree(Tree.GetChild(0), depth + 1, conditionCount);
if (rlvr.IsValid)
{
rlvr = ValidateTree(Tree.GetChild(1), depth + 1, conditionCount);
}
}
else
{
Console.WriteLine(rlvr.ToString());
}
}
}
else
{
// this tree is null, return a "it's valid" result
rlvr = new RuleLogicValidationResult();
rlvr.ErrorType = LogicValidationErrorType.None;
rlvr.IsValid = true;
}
return rlvr;
}
Add EOF to the end of your start rule. :)
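The reason this matters: without EOF, a start rule is free to stop as soon as it has matched a valid prefix, so whatever follows is never looked at. The sketch below is a hand-written model of the anexpression and atom rules, not ANTLR-generated code, and it does not reproduce ANTLR's error recovery. MiniParser, its simplified lexer and the "<EOF>" marker are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of:  anexpression : atom (ANDOR atom)* ;
//                         atom : NAME | '(' anexpression ')' ;
// Hypothetical class, not ANTLR output.
class MiniParser {
    private final List<String> tokens;
    private int pos = 0;

    MiniParser(List<String> tokens) { this.tokens = tokens; }

    // Simplified lexer: lowercase runs are words (and/or), every other
    // non-blank character is a token of its own.
    static List<String> lex(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ') { i++; continue; }
            if (Character.isLowerCase(c)) {
                int start = i;
                while (i < src.length() && Character.isLowerCase(src.charAt(i))) i++;
                out.add(src.substring(start, i));
            } else {
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void atom() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            anexpression();
            if (!peek().equals(")")) throw new IllegalStateException("expected )");
            pos++;
        } else if (t.length() == 1 && t.charAt(0) >= 'A' && t.charAt(0) <= 'Z') {
            pos++;
        } else {
            throw new IllegalStateException("unexpected " + t);
        }
    }

    private void anexpression() {
        atom();
        while (peek().equals("and") || peek().equals("or")) {
            pos++;
            atom();
        }
    }

    // requireEof = false models a start rule without EOF: leftover tokens
    // after a valid prefix are silently ignored.
    static boolean accepts(String src, boolean requireEof) {
        MiniParser p = new MiniParser(lex(src));
        try {
            p.anexpression();
        } catch (IllegalStateException e) {
            return false;
        }
        return !requireEof || p.peek().equals("<EOF>");
    }

    public static void main(String[] args) {
        System.out.println(accepts("A and B#", false));
        System.out.println(accepts("A and B#", true));
    }
}
```

In this model, "A and B#" is accepted when the end of input is not checked and rejected when it is, which is the behaviour the EOF token enforces in the start rule.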

simple math expression parser

I have a simple math expression parser and I want to build the AST myself (i.e. no AST parser). However, every node can hold only two operands, so 2+3+4 should result in a tree like this:
  +
 / \
2   +
   / \
  3   4
The problem is that I am not able to get my grammar to do the recursion. Here is just the "add" part:
add returns [Expression e]
: op1=multiply { $e = $op1.e; Print.ln($op1.text); }
( '+' op2=multiply { $e = new AddOperator($op1.e, $op2.e); Print.ln($op1.e.getClass(), $op1.text, "+", $op2.e.getClass(), $op2.text); }
| '-' op2=multiply { $e = null; } // new MinusOperator
)*
;
But at the end of the day this will produce a single tree like:
  +
 / \
2   4
I know where the problem is: an "add" can occur zero or more times (*), but I do not know how to solve this. I thought of something like:
"add" part:
add returns [Expression e]
: op1=multiply { $e = $op1.e; Print.ln($op1.text); }
( '+' op2=(multiply|add) { $e = new AddOperator($op1.e, $op2.e); Print.ln($op1.e.getClass(), $op1.text, "+", $op2.e.getClass(), $op2.text); }
| '-' op2=multiply { $e = null; } // new MinusOperator
)?
;
But this will give me a recursion error. Any ideas?
I don't have the full grammar to test this solution, but consider replacing this (from the first add rule in the question):
$e = new AddOperator($op1.e, $op2.e);
With this:
$e = new AddOperator($e, $op2.e); //$e instead of $op1.e
This way each iteration over ('+' multiply)* extends e rather than replaces it.
It may require a little playing around to get it right, or you may need a temporary Expression in the rule to keep things managed. Just make sure that the last expression created by the loop is somewhere on the right-hand side of the = operator, as in $e = new XYZ($e, $rhs.e);.
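The effect of using $e instead of $op1.e can be sketched in plain Java. The question's real Expression and AddOperator classes are not shown, so Node, Num, Add and AddFoldDemo below are stand-ins invented for the example, and the list plays the role of the operands matched by the ( '+' multiply )* loop.

```java
import java.util.List;

// Stand-in expression classes; hypothetical, not the question's types.
interface Node {}

final class Num implements Node {
    final int value;
    Num(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

final class Add implements Node {
    final Node left;
    final Node right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}

class AddFoldDemo {
    // Like { $e = new AddOperator($e, $op2.e); }: each iteration wraps
    // the tree built so far.
    static Node accumulate(List<Node> operands) {
        Node e = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            e = new Add(e, operands.get(i));
        }
        return e;
    }

    // Like { $e = new AddOperator($op1.e, $op2.e); }: each iteration
    // starts again from the first operand and discards earlier results.
    static Node reuseFirst(List<Node> operands) {
        Node first = operands.get(0);
        Node e = first;
        for (int i = 1; i < operands.size(); i++) {
            e = new Add(first, operands.get(i));
        }
        return e;
    }

    public static void main(String[] args) {
        List<Node> operands = List.of(new Num(2), new Num(3), new Num(4));
        System.out.println(accumulate(operands)); // ((2 + 3) + 4)
        System.out.println(reuseFirst(operands)); // (2 + 4)
    }
}
```

Note that the accumulated tree nests to the left, ((2 + 3) + 4), rather than to the right as drawn in the question. For addition the value is the same, and left nesting is what subtraction needs.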

ANTLR Source to Output

I'm trying to implement something like a Code Contracts feature for JavaScript as an assignment for one of my courses.
The problem I'm having is that I can't seem to find a way to output the source file directly to the console without modifying the entire grammar.
Does anybody know a way to achieve this?
Thanks in advance.
Here's an example of what I'm trying to do:
function DoClear(num, arr, text){
Contract.Requires<RangeError>(num > 0);
Contract.Requires(num < 1000);
Contract.Requires<TypeError>(arr instanceOf Array);
Contract.Requires<RangeError>(arr.length > 0 && arr.length <= 9);
Contract.Requires<ReferenceError>(text != null);
Contract.Ensures<RangeError>(text.length === 0);
// method body
[...]
return text;
}
which should be turned into:
function DoClear(num, arr, text){
if (!(num > 0))
throw RangeError;
if (!(num < 1000))
throw Error;
if (!(arr instanceOf Array))
throw TypeError;
if (!(arr.length > 0 && arr.length <= 9))
throw RangeError;
if (!(text != null))
throw ReferenceError
// method body
[...]
if (!(text.length === 0))
throw RangeError
else
return text;
}
There are a few (minor) things you'll want to consider:
ignore string literals that might contain your special contract-syntax;
ignore multi- and single line comments that might contain your special Contract syntax;
ignore code like this: var Requires = "Contract.Requires<RangeError>"; (i.e. regular JavaScript code that "looks like" your contract-syntax);
It's pretty straightforward to take the points above into account and also to create a single token for an entire contract line. You'll be making your life hard if you tokenize Contract.Requires<RangeError>(num > 0) into the following 4 different tokens:
Contract
Requires
<RangeError>
(num > 0)
So it's easiest to create a single token from it, and at the parsing phase, split the token on ".", "<" or ">" with a maximum of 4 tokens (leaving expressions containing ".", "<" or ">" as they are).
A quick demo of what I described above might look like this:
grammar CCJS;
parse
: atom+ EOF
;
atom
: code_contract
| (Comment | String | Any) {System.out.print($text);}
;
code_contract
: Contract
{
String[] tokens = $text.split("[.<>]", 4);
System.out.print("if (!" + tokens[3] + ") throw " + tokens[2]);
}
;
Contract
@init{
boolean hasType = false;
}
@after{
if(!hasType) {
// inject a generic Error if this contract has no type
setText(getText().replaceFirst("\\(", "<Error>("));
}
}
: 'Contract.' ('Requires' | 'Ensures') ('<' ('a'..'z' | 'A'..'Z')+ '>' {hasType=true;})? '(' ~';'+
;
Comment
: '//' ~('\r' | '\n')*
| '/*' .* '*/'
;
String
: '"' (~('\\' | '"' | '\r' | '\n') | '\\' . )* '"'
;
Any
: .
;
which you can test with the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src =
"/* \n" +
" Contract.Requires to be ignored \n" +
"*/ \n" +
"function DoClear(num, arr, text){ \n" +
" Contract.Requires<RangeError>(num > 0); \n" +
" Contract.Requires(num < 1000); \n" +
" Contract.Requires<TypeError>(arr instanceOf Array); \n" +
" Contract.Requires<RangeError>(arr.length > 0 && arr.length <= 9); \n" +
" Contract.Requires<ReferenceError>(text != null); \n" +
" Contract.Ensures<RangeError>(text.length === 0); \n" +
" \n" +
" // method body \n" +
" // and ignore single line comments, Contract.Ensures \n" +
" var s = \"Contract.Requires\"; // also ignore strings \n" +
" \n" +
" return text; \n" +
"} \n";
CCJSLexer lexer = new CCJSLexer(new ANTLRStringStream(src));
CCJSParser parser = new CCJSParser(new CommonTokenStream(lexer));
parser.parse();
}
}
If you run the Main class above, the following will be printed to the console:
/*
Contract.Requires to be ignored
*/
function DoClear(num, arr, text){
if (!(num > 0)) throw RangeError;
if (!(num < 1000)) throw Error;
if (!(arr instanceOf Array)) throw TypeError;
if (!(arr.length > 0 && arr.length <= 9)) throw RangeError;
if (!(text != null)) throw ReferenceError;
if (!(text.length === 0)) throw RangeError;
// method body
// and ignore single line comments, Contract.Ensures
var s = "Contract.Requires"; // also ignore strings
return text;
}
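The string handling done by the Contract and code_contract rules can also be tried on its own, without generating a parser. The following is a stand-alone sketch of that step. ContractSplitDemo and rewrite are made-up names, and the regex used to detect a missing type is a stand-in for the hasType flag in the lexer rule.

```java
// Stand-alone version of the string handling in the Contract and
// code_contract rules; class and method names are hypothetical.
class ContractSplitDemo {

    // Expects the text of one Contract token, without the trailing ';'.
    static String rewrite(String contract) {
        // Stand-in for the hasType flag: is there a '<' right after
        // Requires/Ensures?
        boolean hasType = contract.matches("Contract\\.(Requires|Ensures)<.*");
        if (!hasType) {
            // Same fix-up as the @after block: inject a generic Error.
            contract = contract.replaceFirst("\\(", "<Error>(");
        }
        // Limit 4: "Contract", "Requires"/"Ensures", the type, and the
        // rest, which is left unsplit even if it contains '.', '<' or '>'.
        String[] parts = contract.split("[.<>]", 4);
        return "if (!" + parts[3] + ") throw " + parts[2];
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Contract.Requires<RangeError>(num > 0)"));
        System.out.println(rewrite("Contract.Requires(num < 1000)"));
        System.out.println(rewrite(
            "Contract.Requires<RangeError>(arr.length > 0 && arr.length <= 9)"));
    }
}
```

The limit argument is what keeps expressions such as arr.length > 0 intact: after three splits the remainder is returned as a single element.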
BUT ...
... I realize that it isn't exactly what you're looking for: the RangeError check for Contract.Ensures is not placed at the end of your function. And that's going to be a tough one: a function might have multiple returns, and is likely to have multiple code blocks { ... }, making it difficult to know which } ends the function. So you don't know exactly where to inject this RangeError check. At least, not with a naive approach like the one I demonstrated.
The only reliable way to implement such a thing is to get a decent JavaScript grammar, add your own contract-rules to it, rewrite the AST the parser produces, and finally emit the new AST in a friendly-formatted way: not a trivial task, to say the least!
There are various ECMA/JS grammars on the ANTLR Wiki, but tread with care: they are user-committed grammars and may contain errors (probably will in this case[1]!).
If you choose to place the RangeError contract at the point where its rewritten check should appear, like so:
function DoClear(num, arr, text){
Contract.Requires<RangeError>(num > 0);
...
// method body
...
Contract.Ensures<RangeError>(text.length === 0);
return text;
}
which would result in:
function DoClear(num, arr, text){
if (!(num > 0)) throw RangeError;
...
// method body
...
if (!(text.length === 0))
throw RangeError
return text;
}
then you need not parse the entire method body, and you might get away with a hack as I proposed.
Best of luck!
[1] the last time I checked these ECMA/JS script grammars, none of them handled regex literals, /pattern/, properly, making them in my opinion suspect.