How do I parse Perl (if, else if, else) statements into a data structure?

I want to parse a Perl file containing (if, else if, else) statements into a data structure. The input looks like this:
if(expression) {
command1
command2
if(expression2){
command3}}
and the parser should build a nested hash like this:
%hash = (expression => {command1, command2, expression2 => {command3}});
I have tried using Parse::RecDescent, but I didn't succeed. Does anyone have an idea for a better solution, or can someone help me write this grammar?
Have a nice day.

First of all, thank you for your help. This is what I have been trying:
my $grammar = q{
startrule : expression
expression : ifstatement(s?) statement(s?)
{$return = {'if' => $item[1]}}
codeblock : /[a-zA-Z]\w*/
{$return = {$item[1]}}
code : '{' codeblock(s?) ifstatement(s?) '}'
{$return = {$item[2]}}
statement : /[a-zA-Z]\w*/
elseifstatement : 'else' 'if' '(' expression ')' code
{[@item]}
elsestatement : 'else' code
{[@item]}
ifstatement : 'if' '(' statement ')' code(s?)
elseifstatement(s?) elsestatement(?)
{ $return = { 'statement' => $item[3], 'expression' => $item[5]}}
};
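For comparison, the nesting asked for falls out naturally from a hand-written recursive-descent parser with one function per construct (block, if-chain, condition, body), where `block` calls `ifChain` and `ifChain` calls `block` again through `body`. The sketch below is in Java rather than Parse::RecDescent, and the class name `IfTreeParser` is hypothetical. It assumes, as in the example above, that commands are single words and that conditions contain balanced parentheses; it also accepts Perl's `elsif`. A whole if / else if / else chain becomes one ordered map whose keys are the conditions, plus `else` for the final branch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Recursive-descent sketch: nested if / else if / else blocks -> nested lists and maps.
// Assumptions: commands are single words, conditions have balanced parentheses,
// and ';' carries no information.
class IfTreeParser {
    private static final Pattern TOKEN = Pattern.compile("[(){};]|[^\\s(){};]+");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private IfTreeParser(String source) {
        Matcher m = TOKEN.matcher(source);
        while (m.find()) tokens.add(m.group());
    }

    /** Parses a whole file; items are command strings or Map(condition -> body). */
    static List<Object> parse(String source) {
        IfTreeParser p = new IfTreeParser(source);
        List<Object> result = p.block();
        if (p.peek() != null) throw new IllegalStateException("unexpected '" + p.peek() + "'");
        return result;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String got = next();
        if (!got.equals(expected))
            throw new IllegalStateException("expected '" + expected + "' but got '" + got + "'");
    }

    // block : (ifChain | ';' | command)*   stops at '}' or at end of input
    private List<Object> block() {
        List<Object> items = new ArrayList<>();
        while (peek() != null && !peek().equals("}")) {
            if (peek().equals("if")) items.add(ifChain());
            else if (peek().equals(";")) pos++;
            else items.add(next());
        }
        return items;
    }

    // ifChain : 'if' cond body (('else' 'if' | 'elsif') cond body)* ('else' body)?
    private Map<String, Object> ifChain() {
        Map<String, Object> chain = new LinkedHashMap<>();  // keeps branch order
        expect("if");
        chain.put(condition(), body());
        while (true) {
            if ("elsif".equals(peek())) {
                pos++;
                chain.put(condition(), body());
            } else if ("else".equals(peek())) {
                pos++;
                if ("if".equals(peek())) {
                    pos++;
                    chain.put(condition(), body());
                } else {
                    chain.put("else", body());
                    return chain;
                }
            } else {
                return chain;
            }
        }
    }

    // cond : '(' ... ')'   returns the tokens in between, joined by spaces
    private String condition() {
        expect("(");
        StringBuilder text = new StringBuilder();
        for (int depth = 1; ; ) {
            String t = next();
            if (t.equals("(")) depth++;
            else if (t.equals(")") && --depth == 0) return text.toString();
            if (text.length() > 0) text.append(' ');
            text.append(t);
        }
    }

    // body : '{' block '}'
    private List<Object> body() {
        expect("{");
        List<Object> items = block();
        expect("}");
        return items;
    }
}
```

Calling `IfTreeParser.parse` on the example input above returns `[{expression=[command1, command2, {expression2=[command3]}]}]`, which is the nesting of the desired %hash.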

Related

Ignore spaces, but allow text with spaces

I need to write a simple ANTLR4 grammar for expressions like this:
{paramName=simple text} //correct
{ paramName = simple text} //correct
{bad param=text} //incorrect
The first two expressions are almost identical; the only difference is the space before and after the parameter name. The third is incorrect because spaces are not allowed in a parameter name. I wrote this grammar:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : PARAM_NAME ;
paramValue : TEXT_WITH_SPACES ;
PARAM_NAME : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
TEXT_WITH_SPACES : (LETTERS_EN|' ')+ ;
WS : [ ]+ -> skip;
fragment LETTERS_EN : ([A-Za-z]) ;
So the task is to ignore spaces around the parameter name but allow spaces in the parameter value. However, when I add a space to the TEXT_WITH_SPACES rule, my second expression is highlighted as incorrect.
What can I do? Thank you in advance!
Ignore all spaces, but consider them to be "end of word", and allow more words in the value:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : WORD ;
paramValue : WORD+ ;
WORD : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
WS : [ ]+ -> skip;
Update: To preserve spaces in the value:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : WORD ;
paramValue : WORD | MULTIWORD ;
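MULTIWORD : WORD ((' ')+ WORD)+ ;
WORD : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
WS : [ ]+ -> skip;
This relies on MULTIWORD matching two or more words separated only by spaces, while everything else is matched as a sequence of WORD and WS tokens. MULTIWORD must require at least two words: it is defined before WORD, so if it could also match a single word it would win the tie and paramName would never receive a WORD token.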
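To check which inputs this is meant to accept, here is a plain Java regex sketch of the same language (not ANTLR; the class name `ParamExpression` is hypothetical). Spaces are allowed around every token, the name is a single WORD, and the value is one or more WORDs separated only by spaces, with those inner spaces preserved. As in `WS : [ ]+ -> skip;`, only ' ' is treated as whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-regex sketch of the language the grammar accepts (not ANTLR code).
class ParamExpression {
    // Same character classes as WORD: Latin letters, Cyrillic U+0410..U+044F and '_',
    // followed by the same set plus digits.
    private static final String WORD = "[A-Za-z\u0410-\u044F_][A-Za-z\u0410-\u044F_0-9]*";

    // '{' name '=' value '}': spaces are skipped around every token,
    // but the spaces between the words of the value are kept.
    private static final Pattern EXPR = Pattern.compile(
        "\\{ *(" + WORD + ") *= *(" + WORD + "(?: +" + WORD + ")*) *\\}");

    /** Returns {name, value}, or null if the input is rejected. */
    static String[] parse(String input) {
        Matcher m = EXPR.matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"{paramName=simple text}", "{ paramName = simple text}", "{bad param=text}"};
        for (String s : inputs) {
            String[] r = parse(s);
            System.out.println(s + " -> " + (r == null ? "rejected" : r[0] + " | '" + r[1] + "'"));
        }
    }
}
```

The first two inputs yield the name `paramName` and the value `simple text`; the third is rejected because `bad` must be followed by `=`.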

ANTLR4: matching token with same rule but with different position in the grammar

I have the following statement I wish to parse:
in(name,(Silver,Gold))
in: a function.
name: an ID.
(Silver, Gold): a string array with the elements 'Silver' and 'Gold'.
The parser always gets confused because IDs and string array elements are matched by the same rule. Quoting the strings (with single or double quotes) would help, but that is not an option here.
Also, predicates didn't help much.
The grammar:
grammar Rql;
statement
: EOF
| query EOF
;
query
: function
;
function
: FUNCTION_IN OPAR id COMMA OPAR array CPAR CPAR
;
array
: VALUE (COMMA VALUE)*
;
FUNCTION_IN: 'in';
id
: {in(}? ID
;
ID
: [a-zA-Z_] [a-zA-Z_0-9]*
;
VALUE
: STRING
| INT
| FLOAT
;
OPAR : '(';
CPAR : ')';
COMMA : ',';
INT
: [0-9]+
;
FLOAT
: [0-9]+ '.' [0-9]*
| '.' [0-9]+
;
SPACE
: [ \t\r\n] -> skip
;
STRING
: [a-zA-Z_] [a-zA-Z_0-9]*
;
OTHER
: .
;
The idea is to change the token type under some condition. Here, seeing an ID for the first time in a statement sets a switch to true. The next time an ID is matched, the lexer executes the if and sets the type to ID_VALUE. I first wanted to reset the switch on entering the parser rule function, but that doesn't work:
function
@init {QuestionLexer.id_seen = false; System.out.println("id_seen has been reset" + QuestionLexer.id_seen);}
: FUNCTION_IN OPAR ID COMMA OPAR array CPAR CPAR
ID=name1 seen ? false
ID=Silver seen ? true
...
ID=Platinum seen ? true
[#0,0:1='in',<'in'>,1:0]
[#1,2:2='(',<'('>,1:2]
[#2,3:7='name1',<ID>,1:3]
[#3,8:8=',',<','>,1:8]
[#4,9:9='(',<'('>,1:9]
[#5,10:15='Silver',<10>,1:10]
...
[#12,27:31='name2',<10>,2:3]
...
[#20,52:51='<EOF>',<EOF>,3:0]
Question last update 1336
id_seen has been reset false
id_seen has been reset false
line 2:3 mismatched input 'name2' expecting ID
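As the trace shows, the lexer has already produced every token (all the ID=... lines are printed before the first "id_seen has been reset"), so resetting the switch from a parser action comes too late: name2 has already been typed as ID_VALUE.
That's why I reset it in the FUNCTION_IN lexer rule instead.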
Grammar Question.g4:
grammar Question;
@lexer::members {
static boolean id_seen = false;
}
tokens { ID_VALUE }
question
@init {System.out.println("Question last update 1352");}
: function+ EOF
;
function
: FUNCTION_IN OPAR ID COMMA OPAR array CPAR CPAR
;
array
: value (COMMA value)*
;
value
: ID_VALUE
| INT
| FLOAT
;
FUNCTION_IN: 'in' {id_seen = false;} ;
ID : [a-zA-Z_] [a-zA-Z_0-9]*
{System.out.println("ID=" + getText() + " seen ? " + id_seen);
if (id_seen) setType(QuestionParser.ID_VALUE); id_seen = true; } ;
OPAR : '(';
CPAR : ')';
COMMA : ',';
INT
: [0-9]+
;
FLOAT
: [0-9]+ '.' [0-9]*
| '.' [0-9]+
;
SPACE
: [ \t\r\n] -> skip
;
OTHER
: .
;
File t.text:
in(name1,(Silver,Gold))
in(name2,(Copper,Platinum))
Execution with ANTLR 4.6:
$ grun Question question -tokens -diagnostics t.text
ID=name1 seen ? false
ID=Silver seen ? true
ID=Gold seen ? true
ID=name2 seen ? false
ID=Copper seen ? true
ID=Platinum seen ? true
[#0,0:1='in',<'in'>,1:0]
[#1,2:2='(',<'('>,1:2]
[#2,3:7='name1',<ID>,1:3]
[#3,8:8=',',<','>,1:8]
[#4,9:9='(',<'('>,1:9]
[#5,10:15='Silver',<10>,1:10]
[#6,16:16=',',<','>,1:16]
[#7,17:20='Gold',<10>,1:17]
[#8,21:21=')',<')'>,1:21]
[#9,22:22=')',<')'>,1:22]
[#10,24:25='in',<'in'>,2:0]
[#11,26:26='(',<'('>,2:2]
[#12,27:31='name2',<ID>,2:3]
[#13,32:32=',',<','>,2:8]
[#14,33:33='(',<'('>,2:9]
[#15,34:39='Copper',<10>,2:10]
[#16,40:40=',',<','>,2:16]
[#17,41:48='Platinum',<10>,2:17]
[#18,49:49=')',<')'>,2:25]
[#19,50:50=')',<')'>,2:26]
[#20,52:51='<EOF>',<EOF>,3:0]
Question last update 1352
Type <10> is ID_VALUE, as can be seen in the .tokens file:
$ cat Question.tokens
FUNCTION_IN=1
...
OTHER=9
ID_VALUE=10
'in'=1
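The switch works because it lives entirely in the lexer: the FUNCTION_IN action resets it and the ID action reads it, so both see the characters in input order, independent of how far ahead of the parser the lexer runs. The hypothetical `SwitchingLexer` below imitates those two actions in plain Java (a regex instead of a generated ANTLR lexer) to make the state machine explicit. Its token indices line up with the -tokens output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written imitation of the lexer actions in Question.g4 (not generated ANTLR code).
// The first identifier after each 'in' stays ID; later identifiers become ID_VALUE.
class SwitchingLexer {
    enum Type { FUNCTION_IN, ID, ID_VALUE, OPAR, CPAR, COMMA, INT, FLOAT, OTHER }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "='" + text + "'"; }
    }

    // Alternatives mirror SPACE, FLOAT, INT, ID/'in', punctuation and OTHER.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<ws>[ \\t\\r\\n]+)|(?<float>[0-9]+\\.[0-9]*|\\.[0-9]+)|(?<int>[0-9]+)"
        + "|(?<id>[a-zA-Z_][a-zA-Z_0-9]*)|(?<punct>[(),])|(?<other>(?s:.))");

    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        boolean idSeen = false;  // plays the role of id_seen in @lexer::members
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String text = m.group();
            if (m.group("ws") != null) continue;            // SPACE -> skip
            if (m.group("id") != null) {
                if (text.equals("in")) {                    // FUNCTION_IN: 'in' {id_seen = false;}
                    idSeen = false;
                    out.add(new Token(Type.FUNCTION_IN, text));
                } else {                                    // ID: retype every ID after the first
                    out.add(new Token(idSeen ? Type.ID_VALUE : Type.ID, text));
                    idSeen = true;
                }
            } else if (m.group("float") != null) {
                out.add(new Token(Type.FLOAT, text));
            } else if (m.group("int") != null) {
                out.add(new Token(Type.INT, text));
            } else if (m.group("punct") != null) {
                Type t = text.equals("(") ? Type.OPAR : text.equals(")") ? Type.CPAR : Type.COMMA;
                out.add(new Token(t, text));
            } else {
                out.add(new Token(Type.OTHER, text));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("in(name1,(Silver,Gold))\nin(name2,(Copper,Platinum))"));
    }
}
```

For t.text this gives 20 tokens; name1 (index 2) and name2 (index 12) are ID, while Silver, Gold, Copper and Platinum are ID_VALUE.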

How to fix "extraneous input ' ' expecting" in ANTLR4

Hello, when running ANTLR4 on my input I get an "extraneous input ' ' expecting ..." error.
I have been trying to fix it by making changes here and there, but it only seems to work if I write every component of whileLoop on a new line.
Could you please tell me what I am missing here and why the problem persists?
grammar AM;
COMMENTS :
'{'~[\n|\r]*'}' -> skip
;
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
anything : whileLoop | write ;
write : 'WRITE' '(' '"' sentance '"' ')' ;
read : 'READ' '(' '"' sentance '"' ')' ;
whileLoop : 'WHILE' expression 'DO' ;
block : 'BODY' anything 'END';
expression : 'TRUE'|'FALSE' ;
test : ID? {System.out.println("Done");};
logicalOperators : '<' | '>' | '<>' | '<=' | '>=' | '=' ;
numberExpressionS : (NUMBER numberExpression)* ;
numberExpression : ('-' | '/' | '*' | '+' | '%') NUMBER ;
sentance : (ID)* {System.out.println("Sentance");};
WS : [ \t\r\n]+ -> skip ;
NUMBER : [0-9]+ ;
ID : [a-zA-Z0-9]* ;
Your lexer rules produce conflicts:
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
vs
WS : [ \t\r\n]+ -> skip ;
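The critical part is the ' '*. The literal ' ' defines an implicit lexer token that matches a single space, and implicit tokens are defined before WS. A lone space matches both rules with the same length, and the tie goes to the rule defined first, so the space is not skipped as WS but reaches the parser as the implicit token.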
If I am right, putting tabs between the components of whileLoop will work, and so should putting more than one space between them, because WS then produces the longer match. You should simply remove ' '*, since whitespace is skipped anyway.
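The tie-breaking this answer relies on can be reproduced with a toy lexer: at each position every rule is tried, the longest match wins, and on equal length the rule listed first wins. The hypothetical `LongestMatchDemo` below lists the implicit ' ' token before WS, as ANTLR does for literals used in parser rules. It is a model of the behaviour, not ANTLR itself, and its ID uses + instead of * so it cannot match the empty string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of ANTLR 4 lexer disambiguation (not ANTLR itself):
// the longest match wins; on a tie, the rule defined first wins.
class LongestMatchDemo {
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();  // insertion order = rule order
    static {
        RULES.put("'WHILE'", Pattern.compile("WHILE"));
        RULES.put("'TRUE'", Pattern.compile("TRUE"));
        RULES.put("'DO'", Pattern.compile("DO"));
        RULES.put("' '", Pattern.compile(" "));            // implicit token from ' '* in body
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]+")); // WS : [ \t\r\n]+ -> skip
        RULES.put("ID", Pattern.compile("[a-zA-Z0-9]+"));  // '+' instead of '*'
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // strict '>' keeps the earlier rule when two matches have equal length
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) throw new IllegalArgumentException("no rule matches at " + pos);
            if (!best.equals("WS")) out.add(best);  // WS is skipped
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("WHILE TRUE DO"));
        System.out.println(tokenize("WHILE  TRUE\tDO"));
    }
}
```

On "WHILE TRUE DO" each single space ties between ' ' and WS, so ' ' reaches the parser; with two spaces or a tab, WS is the longer (or only) match and the whitespace is skipped.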

ANTLR decision can match input such as "ID ID" using multiple alternatives

I am having a problem disambiguating this parser. I should mention that I am using ANTLRWorks 1.4.3 (I must use it; it's a homework assignment). I also must not use backtrack=true.
It should match input like:
main Int a, Char b, MyClass c -> Int :
expr ';'
.
.
.
expr ';'
end';'
I also commented out the part of the parser after ':' because this problem did not let me generate the code.
program
: classDef+ -> ^(PROGRAM classDef+)
;
classDef
: CLASS name=ID (INHERITS parent=ID)? classBlock* END ';' ->
^(CLASS $name ^(INHERITS $parent)? classBlock*)
;
classBlock
: VAR assigmentBlock* END ';'-> ^(VAR assigmentBlock*)
| methodDecl -> ^(METHOD methodDecl)
;
methodDecl
//: name=ID methodVar* ('->' type=ID)? ':' methodBlock* END ';'
// -> ^($name methodVar* ^(RETURN $type) methodBlock*)
: name=ID methodVar* -> ^($name methodVar*)
;
methodVar
: type=ID name=ID ','? -> ^(PARAMS $type $name)
;
ANTLRWorks reports that the decision can match input such as "ID ID" using multiple alternatives.
If anyone could help me, I would be much obliged.
Don't do:
methodDecl
: name=ID methodVar* ('->' type=ID)? ':' methodBlock* END ';'
;
methodVar
: type=ID name=ID ','?
;
Instead, do:
methodDecl
: name=ID (methodVar (',' methodVar)*)? ('->' type=ID)? ':' methodBlock* END ';'
;
methodVar
: type=ID name=ID
;
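That is, the comma should be mandatory, not optional as you defined it. With an optional comma, once a methodVar has been matched, the parser cannot tell whether a following "ID ID" is another parameter or the start of something else (such as the next methodDecl), which is the ambiguity ANTLRWorks reports.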
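The effect of a mandatory comma is easiest to see in a hand-written LL(1) sketch of the suggested rules: after one parameter, only a ',' can continue the list, so a following "ID ID" can no longer be read as another parameter. The class `MethodHeaderParser` is hypothetical, covers only the header up to ':', and splits tokens with a simple regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written LL(1) sketch of the suggested rules (not ANTLR output), stopping at ':':
//   methodDecl : name=ID (methodVar (',' methodVar)*)? ('->' type=ID)? ':' ...
//   methodVar  : type=ID name=ID
class MethodHeaderParser {
    private static final Pattern TOKEN = Pattern.compile("->|[,:]|[A-Za-z_][A-Za-z_0-9]*");
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    final String name;
    final List<String> params = new ArrayList<>();  // "Type name" pairs
    String returnType;                              // stays null without '->'

    MethodHeaderParser(String source) {
        Matcher m = TOKEN.matcher(source);
        while (m.find()) toks.add(m.group());
        name = id();
        if (isId(peek())) {                 // an ID after the name starts the parameter list
            params.add(methodVar());
            while (",".equals(peek())) {    // only a comma continues the list
                pos++;
                params.add(methodVar());
            }
        }
        if ("->".equals(peek())) {
            pos++;
            returnType = id();
        }
        if (!":".equals(next())) throw new IllegalStateException("expected ':' at token " + (pos - 1));
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : null; }

    private String next() {
        if (pos >= toks.size()) throw new IllegalStateException("unexpected end of input");
        return toks.get(pos++);
    }

    private static boolean isId(String t) {
        return t != null && t.matches("[A-Za-z_][A-Za-z_0-9]*");
    }

    private String id() {
        String t = next();
        if (!isId(t)) throw new IllegalStateException("expected ID but got '" + t + "'");
        return t;
    }

    private String methodVar() {
        String type = id();
        return type + " " + id();
    }
}
```

With "main Int a, Char b, MyClass c -> Int :" the sketch collects the parameters Int a, Char b and MyClass c with return type Int, while "main Int a Char b -> Int :" is rejected at Char, because without a comma the parameter list has to end there.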

Guide or approval for ANTLR example

I have an AlgebraRelacional.g4 file with the grammar below. I need to read a file with CSV-like syntax, load its content into in-memory tables, and then evaluate relational algebra operations on them. Can you tell me whether I am doing it right?
Example data file to read:
cod_buy(char);name_suc(char);Import(int);date_buy(date)
"P-11";"DC Med";900;01/03/14
"P-14";"Center";1500;02/05/14
Current ANTLR grammar:
grammar AlgebraRelacional;
SEL : '\u03C3'
;
PRO : '\u220F'
;
UNI : '\u222A'
;
DIF : '\u002D'
;
PROC : '\u0058'
;
INT : '\u2229'
;
AND : 'AND'
;
OR : 'OR'
;
NOT : 'NOT'
;
EQ : '='
;
DIFERENTE : '!='
;
MAYOR : '>'
;
MENOR : '<'
;
SUMA : '+'
;
MULTI : '*'
;
IPAREN : '('
;
DPAREN : ')'
;
COMA : ','
;
PCOMA : ';'
;
Comillas: '"'
;
file : hdr row+ ;
hdr : row ;
row : field (',' field)* '\r'? '\n' ;
field : TEXT | STRING | ;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""'|~'"')* '"' ;
I suggest you read this document (http://is.muni.cz/th/208197/fi_b/bc_thesis.pdf). It contains useful information about writing a parser for relational algebra. It does not use ANTLR, but you only have to translate its BNF grammar into ANTLR's EBNF notation.
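Independently of the grammar, the loading step described in the question can be sketched in a few lines of Java. Note that the example data separates fields with ';', while the row rule in the grammar uses ','. The hypothetical `MemoryTable` below assumes ';' separators, straight double quotes, and no separators inside quoted values. It parses the name(type) header, stores each row as a column-to-value map, and implements selection (σ) as a filter that keeps the schema. Declared types are recorded, but values stay strings, so a comparison such as Import > 1000 has to convert explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of an in-memory table for the ';'-separated data file.
// Assumptions: straight double quotes, no ';' inside values, header cells like name(type).
class MemoryTable {
    final List<String> columns = new ArrayList<>();
    final Map<String, String> types = new LinkedHashMap<>();   // column -> declared type
    final List<Map<String, String>> rows = new ArrayList<>();  // values kept as strings

    static MemoryTable load(List<String> lines) {
        MemoryTable t = new MemoryTable();
        for (String cell : lines.get(0).split(";")) {           // e.g. cod_buy(char)
            int open = cell.indexOf('(');
            String name = cell.substring(0, open).trim();
            t.columns.add(name);
            t.types.put(name, cell.substring(open + 1, cell.lastIndexOf(')')).trim());
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue;
            String[] cells = line.split(";", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < t.columns.size(); i++) {
                String v = i < cells.length ? cells[i].trim() : "";
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1);             // strip surrounding quotes
                }
                row.put(t.columns.get(i), v);
            }
            t.rows.add(row);
        }
        return t;
    }

    // Selection (sigma): same schema, only the rows that satisfy the condition.
    MemoryTable select(Predicate<Map<String, String>> condition) {
        MemoryTable out = new MemoryTable();
        out.columns.addAll(columns);
        out.types.putAll(types);
        for (Map<String, String> row : rows) {
            if (condition.test(row)) out.rows.add(row);
        }
        return out;
    }
}
```

For the two example rows, `select(r -> Integer.parseInt(r.get("Import")) > 1000)` keeps only the Center row. A parser generated from the ANTLR grammar would then translate operators such as σ into calls like this one.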