shift/reduce conflict on symbol ',' - yacc

I get shift/reduce warnings like the one below. Can someone explain what I am missing?
168: shift/reduce conflict (shift 316, reduce 157) on ','
state 168
static_var_list : static_var_list2 . (157)
static_var_list2 : static_var_list2 . ',' T_VARIABLE '=' static_scalar (158)

Related

Unexpected parser rule matching order

With the following (subset of a) grammar for a scripting language:
expr
...
| 'regex(' str=expr ',' re=expr ')' #regexExpr
...
An expression like regex('s', 're') parses to the following tree, which makes sense:
regexExpr
  'regex('
  expr: stringLiteral ('s')
  ','
  expr: stringLiteral ('re')
  ')'
I'm now trying to add an optional third argument to my regex function, so I've used this modified rule:
'regex(' str=expr ',' re=expr (',' n=expr )? ')'
This causes regex('s', 're', 1) to be parsed in a way that's unexpected to me:
regexExpr
  'regex('
  expr: listExpression
    expr: stringLiteral ('s')
    ','
    expr: stringLiteral ('re')
  ','
  expr: integerLiteral (1)
  ')'
where listExpression is another rule defined below regexExpr:
expr
...
| 'regex(' str=expr ',' re=expr (',' n=expr)? ')' #regexExpr
...
| left=expr ',' right=expr #listExpr
...
I think this listExpr could have been defined better (for example, by requiring surrounding delimiters), but I have compatibility concerns about changing it now.
I don't understand the parser rule matching precedence here. Is there a way I can add the optional third arg to regex() without causing the first two args to be parsed as a listExpr?
Try defining them as two separate alternatives with the same label, #regexExpr:
expr
: 'regex' '(' str=expr ',' re=expr ',' n=expr ')' #regexExpr
| 'regex' '(' str=expr ',' re=expr ')' #regexExpr
| left=expr ',' right=expr #listExpr
| ...
;
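Part of what makes the first version surprising is that the grammar is ambiguous: once listExpr is in play, regex('s', 're', 1) has more than one parse tree. The Python sketch below enumerates every tree for that token stream under a hand transcription of the question's rules. The token spellings and the parse_trees helper are assumptions made for illustration; this is not how ANTLR parses, and it only counts trees without modelling which one ANTLR picks.

```python
from functools import lru_cache

# Hypothetical token streams; 'regex(' is a single token, as in the question's grammar.
THREE_ARGS = ("regex(", "'s'", ",", "'re'", ",", "1", ")")
TWO_ARGS = ("regex(", "'s'", ",", "'re'", ")")
PUNCT = {"regex(", ",", ")"}


def parse_trees(tokens):
    """Return every parse tree of `tokens` as `expr` (atoms, listExpr, regexExpr)."""

    @lru_cache(maxsize=None)
    def expr(i, j):
        trees = []
        # atom: a single literal such as 's' or 1
        if j == i + 1 and tokens[i] not in PUNCT:
            trees.append(tokens[i])
        # listExpr: left=expr ',' right=expr
        for k in range(i + 1, j - 1):
            if tokens[k] == ",":
                trees += [("list", left, right)
                          for left in expr(i, k) for right in expr(k + 1, j)]
        # regexExpr: 'regex(' str=expr ',' re=expr (',' n=expr)? ')'
        if j - i >= 5 and tokens[i] == "regex(" and tokens[j - 1] == ")":
            commas = [k for k in range(i + 1, j - 1) if tokens[k] == ","]
            for a in commas:
                for s in expr(i + 1, a):
                    trees += [("regex", s, r) for r in expr(a + 1, j - 1)]
                    for b in (c for c in commas if c > a):
                        trees += [("regex", s, r, n)
                                  for r in expr(a + 1, b) for n in expr(b + 1, j - 1)]
        return tuple(trees)

    return list(expr(0, len(tokens)))


for tree in parse_trees(THREE_ARGS):
    print(tree)
```

For regex('s', 're', 1) the sketch finds three trees, including the regexExpr(listExpr('s', 're'), 1) shape from the question; for the two-argument call it finds only one, which is why the original rule behaved as expected. Which of the three trees ANTLR reports depends on how it resolves the ambiguity, which is presumably what listing the three-argument alternative first in the answer relies on.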

A yacc reduce/reduce conflict I can't explain

I'm getting a shift/reduce and a reduce/reduce conflict that I believe shouldn't happen. Obviously I'm doing something wrong, so could someone explain what I'm missing?
My stripped down grammar:
/*
* Test SQL Grammar
*/
%{
#include <stdio.h>
#include <string.h>
%}
/* Yacc's YYSTYPE UNION */
%union {
char* str; /* Pointer to constant string (malloc'd in lex) */
}
%token SELECT FROM AS ROWID ROWNUM NEXTVAL CURRVAL NULL
%token <str> IDENTIFIER STRING NUMBER
%%
query_block
: SELECT
select_list
FROM row_source_list
;
select_list
: '*'
| select_item_list
;
select_item_list
: select_item_list ',' select_item
| select_item
;
select_item
: row_source '.' '*'
| expr
| expr IDENTIFIER
;
row_source_list
: row_source_list ',' row_source
| row_source
;
row_source
: IDENTIFIER
| IDENTIFIER '.' IDENTIFIER
| IDENTIFIER opt_AS IDENTIFIER
| IDENTIFIER '.' IDENTIFIER opt_AS IDENTIFIER
;
opt_AS
: /* Empty */
| AS
;
expr
: IDENTIFIER '.' IDENTIFIER
| IDENTIFIER '.' ROWID
| IDENTIFIER '.' IDENTIFIER '.' IDENTIFIER
| IDENTIFIER '.' IDENTIFIER '.' ROWID
| ROWNUM
| ROWID
| STRING
| NUMBER
| IDENTIFIER '.' CURRVAL
| IDENTIFIER '.' NEXTVAL
| NULL
;
The conflicts seem to arise because yacc doesn't know whether it is working on the select_list (expr list) or the row_source_list. State 26 of y.output details the conflict:
state 26
12 row_source: IDENTIFIER '.' IDENTIFIER .
14 | IDENTIFIER '.' IDENTIFIER . opt_AS IDENTIFIER
17 expr: IDENTIFIER '.' IDENTIFIER .
19 | IDENTIFIER '.' IDENTIFIER . '.' IDENTIFIER
20 | IDENTIFIER '.' IDENTIFIER . '.' ROWID
AS shift, and go to state 16
'.' shift, and go to state 33
IDENTIFIER reduce using rule 15 (opt_AS)
IDENTIFIER [reduce using rule 17 (expr)]
'.' [reduce using rule 12 (row_source)]
$default reduce using rule 17 (expr)
opt_AS go to state 34
Now the basic rule for "query_block" states that a row_source_list must be preceded by the "FROM" keyword, so I don't see why yacc is combining the two into one state.
query_block
: SELECT
select_list
FROM row_source_list
;
I've traced the states, and the parser ends up in this state before it finds the "FROM" keyword.
I don't understand why it considers row_source_list before it has recognized "FROM".
(I flagged this as "no longer reproducible", since the OP had solved it trivially, but the flag aged away.)
As the question does have an answer, I'll transcribe it so that the question at least shows as answered.
As the OP states:
I found it right after posting. It's the first line in the select_item rule. I should have caught that earlier.
To clarify, the select_item rule should be:
select_item
: expr
| expr IDENTIFIER
;
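This removes the ambiguity. With the alternative row_source '.' '*', a row_source could appear inside the select list, so after SELECT a.b the parser had to choose between row_source and expr long before reaching FROM. Note that the fix also means the table.* form is no longer accepted in the select list.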
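To see why FROM does not help, it is enough to look at select_item on its own. The Python sketch below transcribes the question's row_source, opt_AS and expr rules into a dictionary, enumerates every token sequence select_item can produce (this part of the grammar is finite), and, for the prefix IDENTIFIER '.' IDENTIFIER from state 26, collects which decisions each lookahead token could require. The helper names and the table_star rule are hypothetical, and this is a rough one-token-lookahead approximation rather than yacc's actual LALR(1) construction.

```python
# Rules transcribed from the question; table_star is a hypothetical extra rule.
SQL_RULES = {
    "row_source": [
        ["IDENTIFIER"],
        ["IDENTIFIER", ".", "IDENTIFIER"],
        ["IDENTIFIER", "opt_AS", "IDENTIFIER"],
        ["IDENTIFIER", ".", "IDENTIFIER", "opt_AS", "IDENTIFIER"],
    ],
    "opt_AS": [[], ["AS"]],
    "expr": [
        ["IDENTIFIER", ".", "IDENTIFIER"],
        ["IDENTIFIER", ".", "ROWID"],
        ["IDENTIFIER", ".", "IDENTIFIER", ".", "IDENTIFIER"],
        ["IDENTIFIER", ".", "IDENTIFIER", ".", "ROWID"],
        ["ROWNUM"], ["ROWID"], ["STRING"], ["NUMBER"],
        ["IDENTIFIER", ".", "CURRVAL"], ["IDENTIFIER", ".", "NEXTVAL"],
        ["NULL"],
    ],
    "table_star": [["IDENTIFIER", ".", "*"], ["IDENTIFIER", ".", "IDENTIFIER", ".", "*"]],
}

# Alternatives of select_item; the first symbol is the nonterminal the parser must finish.
ORIGINAL = [["row_source", ".", "*"], ["expr"], ["expr", "IDENTIFIER"]]
FIXED = [["expr"], ["expr", "IDENTIFIER"]]
WITH_STAR = [["table_star"], ["expr"], ["expr", "IDENTIFIER"]]


def expand(symbols):
    """Every token sequence derivable from `symbols` (this grammar is finite)."""
    if not symbols:
        return [()]
    head, rest = symbols[0], symbols[1:]
    if head in SQL_RULES:
        heads = [seq for alt in SQL_RULES[head] for seq in expand(alt)]
    else:
        heads = [(head,)]
    return [h + r for h in heads for r in expand(rest)]


def decisions(select_item, prefix):
    """For each lookahead after `prefix`, the set of actions some derivation needs."""
    n, table = len(prefix), {}
    for inner, *tail in select_item:
        for body in expand([inner]):
            if len(body) < n or body[:n] != tuple(prefix):
                continue
            for rest in expand(tail):
                sentence = body + rest
                look = sentence[n] if n < len(sentence) else "',' or FROM"
                # "continue" = shift, or first reduce an empty rule such as opt_AS
                action = f"reduce {inner}" if len(body) == n else "continue"
                table.setdefault(look, set()).add(action)
    return table


def conflicts(select_item, prefix):
    """Lookaheads for which one token is not enough to pick a single action."""
    return {look for look, acts in decisions(select_item, prefix).items() if len(acts) > 1}


STATE_26 = ["IDENTIFIER", ".", "IDENTIFIER"]
print(sorted(conflicts(ORIGINAL, STATE_26)))   # ['.', 'IDENTIFIER']
print(sorted(conflicts(FIXED, STATE_26)))      # []
print(sorted(conflicts(WITH_STAR, STATE_26)))  # []
```

With the original select_item, the lookaheads '.' and IDENTIFIER each admit two different actions, which lines up with the shift/reduce on '.' and the reduce/reduce on IDENTIFIER in state 26; with the fix above, both disappear. The hypothetical table_star variant suggests one way a table.* form might be kept without putting row_source back into the select list.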

Solving YACC shift/reduce. Driving me crazy

Hey guys, this is driving me crazy. I'll list the error and the relevant code below. Thanks in advance for any help.
ERROR:
51: shift/reduce conflict (shift 69, reduce 28) on '{'
state 51
funcao : publico tIDENTIFIER '(' seq_vars ')' eqliteral . corpo (13)
corpo : . (28)
'{' shift 69
$end reduce 28
tVOID reduce 28
tPUBLIC reduce 28
tCONST reduce 28
tIF reduce 28
tDO reduce 28
tFOR reduce 28
tCONTINUE reduce 28
tBREAK reduce 28
tRETURN reduce 28
tINTEGER reduce 28
tNUMBER reduce 28
tSTRING reduce 28
corpo goto 70
bloco goto 71
And this is the relevant code:
// Function
funcao: publico tIDENTIFIER '(' seq_vars ')' eqliteral corpo {};
// Block body
corpo: bloco | /* empty */ ;
// Block
bloco: '{' seq_decls seq_inst '}' {/*figure this out later*/};
I'll keep trying to solve it and post the answer if I do.
Since we can't possibly replicate the circumstances, I'm only guessing...
It looks like Yacc doesn't know what to do when it reaches the position after the eqliteral nonterminal. You can see that this is where the parser generator is from the . in the funcao rule in the error message.
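When Yacc reaches this position and the next token is '{', should it shift the '{' to start a bloco, or should it reduce the empty alternative of corpo (rule 28, shown as corpo : . in the message) and leave the '{' for whatever follows the function? The reduce entries for tokens such as tIF and tRETURN show that statements can follow a funcao, and presumably a block can too, which is what makes '{' ambiguous here.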
One possible solution (that I'm unable to verify) is to change the funcao rule:
funcao: publico tIDENTIFIER '(' seq_vars ')' eqliteral
| publico tIDENTIFIER '(' seq_vars ')' eqliteral '{' seq_decls seq_inst '}'
;
It may work, it may not.
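One way to test a guess like this is with FIRST and FOLLOW sets: when a nonterminal can be empty (corpo here), any token that can both start it and follow it is a token on which a one-token-lookahead parser cannot decide between "empty" and "not empty". The question does not show the whole grammar, so the Python sketch below uses a hypothetical, heavily reduced grammar in which a block can also appear as a top-level statement. Only funcao, corpo and bloco come from the question; programa, decls, decl, insts and inst are invented for the sketch.

```python
EMPTY = "<empty>"

# Hypothetical reduced grammar: a block can start a statement, and statements
# can follow a function, as state 51's reduce entries (tIF, tRETURN, ...) suggest.
FUNC_RULES = {
    "programa": [["decls"]],
    "decls": [["decls", "decl"], ["decl"]],
    "decl": [["funcao"], ["inst"]],
    "funcao": [["tPUBLIC", "tIDENTIFIER", "(", ")", "corpo"]],
    "corpo": [["bloco"], []],
    "bloco": [["{", "insts", "}"]],
    "insts": [["inst", "insts"], []],
    "inst": [["tIF"], ["bloco"]],
}


def first_of(seq, first):
    """FIRST set of a symbol sequence; contains EMPTY if the sequence can vanish."""
    out = set()
    for sym in seq:
        if sym not in FUNC_RULES:  # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EMPTY}
        if EMPTY not in first[sym]:
            return out
    out.add(EMPTY)
    return out


def first_follow(start):
    """Compute FIRST and FOLLOW for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in FUNC_RULES}
    follow = {nt: set() for nt in FUNC_RULES}
    follow[start].add("$end")
    changed = True
    while changed:
        changed = False
        for nt, alts in FUNC_RULES.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
                for i, sym in enumerate(alt):
                    if sym not in FUNC_RULES:
                        continue
                    tail = first_of(alt[i + 1:], first)
                    new = (tail - {EMPTY}) | (follow[nt] if EMPTY in tail else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


def empty_rule_clashes(start):
    """Tokens that can both start and follow a nullable nonterminal."""
    first, follow = first_follow(start)
    return {nt: (first[nt] - {EMPTY}) & follow[nt]
            for nt in FUNC_RULES
            if EMPTY in first[nt] and (first[nt] - {EMPTY}) & follow[nt]}


print(empty_rule_clashes("programa"))  # {'corpo': {'{'}}
```

On this reduced grammar the only clash is '{' on corpo, mirroring the byacc message. Treat it as a rough, SLR-style approximation of the lookahead sets yacc actually computes.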

What is the wrong with the simple ANTLR grammar?

I am writing an ANTLR grammar to parse log files and have run into a problem.
I have simplified my grammar to reproduce the problem as follows:
stmt1:
'[ ' elapse ': ' stmt2
;
stmt2:
'[xxx'
;
stmt3:
': [yyy'
;
elapse :
FLOAT;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')*
;
When I used the following string to test the grammar:
[ 98.9: [xxx
I got the error:
E:\work\antlr\output\__Test___input.txt line 1:9 mismatched character 'x' expecting 'y'
E:\work\antlr\output\__Test___input.txt line 1:10 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:11 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:12 mismatched input '<EOF>' expecting ': '
But if I remove the rule 'stmt3', the same string is accepted.
I am not sure what happened...
Thanks for any advice!
Leon
Thanks to Bart for the help. I have tried to correct the grammar.
I think the bottom line is that I have to disambiguate all tokens.
I also added a WS token to simplify the rules.
stmt1:
'[' elapse ':' stmt2
;
stmt2:
'[' 'xxx'
;
stmt3:
':' '[' 'yyy'
;
elapse :
FLOAT;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')*
;
WS : (' ' |'\t' |'\n' |'\r' )+ {skip();} ;
ANTLR has a strict separation between lexer rules (tokens) and parser rules. Although you defined some literals inside parser rules, they are still tokens. This means the following grammar is equivalent (in practice) to your example grammar:
stmt1 : T1 elapse T2 stmt2 ;
stmt2 : T3 ;
stmt3 : T4 ;
elapse : FLOAT;
T1 : '[ ' ;
T2 : ': ' ;
T3 : '[xxx' ;
T4 : ': [yyy' ;
FLOAT : ('0'..'9')+ '.' ('0'..'9')* ;
Now, when the lexer tries to construct tokens from the input "[ 98.9: [xxx", it successfully creates the tokens T1 and FLOAT, but when it sees ": [", it tries to construct a T4 token. When the next char in the stream is an "x" instead of a "y", the lexer looks for another token that starts with ": [". Since there is no such token, the lexer emits the error:
[...] mismatched character 'x' expecting 'y'
And no, the lexer will not backtrack to "give up" the character "[" from ": [" to match the token T2, nor will it look ahead in the char-stream to see if a T4 token can really be constructed. ANTLR's LL(*) is only applicable to parser rules, not lexer rules!
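The Python sketch below is a deliberately simplified model of the behaviour described above, not ANTLR's real lexer: at each position it commits to whichever candidate token matches the most characters and reports an error if that candidate cannot be completed, with no backtracking to a shorter token. T1 to T4 are the tokens from the equivalent grammar above; the tokenize helper, its skip_ws switch (standing in for the WS rule from the revised grammar) and the names LBRACK, COLON, XXX and YYY are assumptions.

```python
import re

FLOAT = re.compile(r"[0-9]+\.[0-9]*")  # FLOAT : ('0'..'9')+ '.' ('0'..'9')*


class LexError(Exception):
    pass


def tokenize(text, literals, skip_ws=False):
    """Simplified model: commit to the longest-matching candidate, never back off."""
    tokens, pos = [], 0
    while pos < len(text):
        if skip_ws and text[pos].isspace():  # stands in for WS : ... {skip();}
            pos += 1
            continue
        best = None  # (chars matched, fully matched?, token name)
        for name, lit in literals.items():
            n = 0
            while n < len(lit) and pos + n < len(text) and text[pos + n] == lit[n]:
                n += 1
            if n and (best is None or (n, n == len(lit)) > best[:2]):
                best = (n, n == len(lit), name)
        m = FLOAT.match(text, pos)
        if m and (best is None or (len(m.group()), True) > best[:2]):
            best = (len(m.group()), True, "FLOAT")
        if best is None:
            raise LexError(f"1:{pos} no viable alternative at character {text[pos]!r}")
        n, complete, name = best
        if not complete:
            # Committed to `name`: no backtracking to a shorter token such as T2.
            found = text[pos + n] if pos + n < len(text) else "<EOF>"
            raise LexError(f"1:{pos + n} mismatched character {found!r} "
                           f"expecting {literals[name][n]!r}")
        tokens.append(name)
        pos += n
    return tokens


QUESTION_TOKENS = {"T1": "[ ", "T2": ": ", "T3": "[xxx", "T4": ": [yyy"}
try:
    tokenize("[ 98.9: [xxx", QUESTION_TOKENS)
except LexError as err:
    print("error:", err)
without_t4 = {k: v for k, v in QUESTION_TOKENS.items() if k != "T4"}
print(tokenize("[ 98.9: [xxx", without_t4))
split = {"LBRACK": "[", "COLON": ":", "XXX": "xxx", "YYY": "yyy"}
print(tokenize("[ 98.9: [xxx", split, skip_ws=True))
```

With T4 present the model fails at column 9 with the same "mismatched character 'x' expecting 'y'" that ANTLR reported; without T4, or with single-character tokens and whitespace skipping as in the revised grammar, the input lexes cleanly.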

yacc shift/reduce conflict. It's really complex

I have tried many, many times to resolve this conflict,
but I don't understand why the conflict occurs here.
Two conflicts are reported at compile time.
The yacc (bison) error is:
State 314 conflicts: 1 shift/reduce
State 315 conflicts: 1 shift/reduce
state 314
7 c_complex_object_id: type_identifier .
8 | type_identifier . V_LOCAL_TERM_CODE_REF
V_LOCAL_TERM_CODE_REF shift, and go to state 77
V_LOCAL_TERM_CODE_REF [reduce using rule 7 (c_complex_object_id)]
$default reduce using rule 7 (c_complex_object_id)
state 315
127 c_integer_spec: integer_value .
184 ordinal: integer_value . SYM_INTERVAL_DELIM V_QUALIFIED_TERM_CODE_REF
201 integer_list_value: integer_value . ',' integer_value
203 | integer_value . ',' SYM_LIST_CONTINUE
SYM_INTERVAL_DELIM shift, and go to state 380
',' shift, and go to state 200
SYM_INTERVAL_DELIM [reduce using rule 127 (c_integer_spec)]
$default reduce using rule 127 (c_integer_spec)
state 77
8 c_complex_object_id: type_identifier V_LOCAL_TERM_CODE_REF .
$default reduce using rule 8 (c_complex_object_id)
state 380
184 ordinal: integer_value SYM_INTERVAL_DELIM . V_QUALIFIED_TERM_CODE_REF
V_QUALIFIED_TERM_CODE_REF shift, and go to state 422
state 200
201 integer_list_value: integer_value ',' . integer_value
203 | integer_value ',' . SYM_LIST_CONTINUE
V_INTEGER shift, and go to state 2
SYM_LIST_CONTINUE shift, and go to state 276
'+' shift, and go to state 170
'-' shift, and go to state 171
integer_value go to state 277
...
The yacc source is:
c_complex_object_id
: type_identifier
| type_identifier V_LOCAL_TERM_CODE_REF
;
type_identifier
: '(' V_TYPE_IDENTIFIER ')'
| '(' V_GENERIC_TYPE_IDENTIFIER ')'
| V_TYPE_IDENTIFIER
| V_GENERIC_TYPE_IDENTIFIER
;
c_integer_spec
: integer_value
| integer_list_value
| integer_interval_value
;
c_integer
: c_integer_spec
| c_integer_spec ';' integer_value
| c_integer_spec ';' error
;
ordinal
: integer_value SYM_INTERVAL_DELIM V_QUALIFIED_TERM_CODE_REF
;
integer_list_value
: integer_value ',' integer_value
| integer_value ',' SYM_LIST_CONTINUE
;
integer_value
: V_INTEGER
| '+' V_INTEGER
| '-' V_INTEGER
;
Those are my two problems. What's wrong with the grammar?
Let's consider the messages from the first shift/reduce conflict. You can read the period (".") as a pointer. What the message says, more or less in English, is
"When I'm in state 299, and I have recognized a type_identifier, I must decide whether to reduce by rule 7 (recognize c_complex_object_id : type_identifier) or to shift to state 63 (continue scanning for a V_LOCAL_TERM_CODE_REF)."
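Usually a conflict like this comes about when the part not yet recognized (V_LOCAL_TERM_CODE_REF) is optional and the same token can also follow a complete c_complex_object_id somewhere else in the grammar. The bracketed reduce on V_LOCAL_TERM_CODE_REF shows that yacc believes it can.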
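When a state listing is long, it helps to pull out just the conflicting entries. In bison's y.output, an action shown in square brackets is one that was discarded when the conflict was resolved, and the unbracketed action on the same token is the one the parser actually takes. The Python helper below is a hypothetical sketch (its name and regular expression are assumptions, and it targets bison's listing format rather than byacc's) that extracts those pairs from a pasted state.

```python
import re

# One action line of a bison y.output state, for example:
#   V_LOCAL_TERM_CODE_REF  shift, and go to state 77
#   V_LOCAL_TERM_CODE_REF  [reduce using rule 7 (c_complex_object_id)]
ACTION_LINE = re.compile(r"^\s*(\S+)\s+(\[)?((?:shift|reduce)[^\]]*)\]?\s*$")


def bracketed_conflicts(state_text):
    """Map each conflicting token to (action kept, [actions discarded])."""
    kept, dropped = {}, {}
    for line in state_text.splitlines():
        m = ACTION_LINE.match(line)
        if not m:
            continue  # item lines ("7 c_complex_object_id: ...") and goto lines
        token, bracket, action = m.group(1), m.group(2), m.group(3).strip()
        if bracket:
            dropped.setdefault(token, []).append(action)
        else:
            kept[token] = action
    return {tok: (kept.get(tok), acts) for tok, acts in dropped.items()}


STATE_314 = """\
state 314

    7 c_complex_object_id: type_identifier .
    8                    | type_identifier . V_LOCAL_TERM_CODE_REF

    V_LOCAL_TERM_CODE_REF  shift, and go to state 77

    V_LOCAL_TERM_CODE_REF  [reduce using rule 7 (c_complex_object_id)]
    $default               reduce using rule 7 (c_complex_object_id)
"""

print(bracketed_conflicts(STATE_314))
```

For state 314 it reports that V_LOCAL_TERM_CODE_REF is shifted (go to state 77) and the reduction by rule 7 is never used on that token. The same helper works on the reduce/reduce listing in state 26 of the SQL question.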
Your definition of the tokens V_LOCAL_TERM_CODE_REF, etc. looks OK as far as I can tell from your comment.
It's hard to diagnose this further without seeing the yacc diagnostic output for state 63. Could you edit your question to show the output for state 63? It might tell us something.
I found some lecture notes by Pete Jinks that might be useful background for you. You might also read some of the other questions listed in the right column of this page, under the "Related" heading.
Update
In one way, you are correct: a shift/reduce conflict can be ignored. bison/yacc will still produce a parser that runs and does something. But it is important to understand why you are ignoring a specific conflict. Then you will understand why the parser, when presented with an input program, parses it the way it does and produces the output that it does. It is not good to say, "oh, this is too complex, I can't figure it out."
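As a concrete starting point for that understanding: when no %left, %right, %nonassoc or %prec declaration applies, bison and yacc resolve a shift/reduce conflict in favour of the shift, and a reduce/reduce conflict in favour of the rule that appears first in the grammar. The tiny Python function below encodes only those two default rules; the "shift N" / "reduce N" action strings are a made-up notation for this sketch.

```python
def default_resolution(actions):
    """Action bison/yacc take when no precedence declaration applies.

    `actions` use a made-up notation: "shift <state>" or "reduce <rule number>".
    """
    shifts = [a for a in actions if a.startswith("shift")]
    if shifts:
        return shifts[0]  # shift/reduce conflict: the shift wins
    # reduce/reduce conflict: the rule listed first in the grammar (lowest number) wins
    return min(actions, key=lambda a: int(a.split()[1]))


# Conflicts from states 314 and 315 in the question, and from state 26 of the SQL question
print(default_resolution(["shift 77", "reduce 7"]))     # shift 77
print(default_resolution(["shift 380", "reduce 127"]))  # shift 380
print(default_resolution(["reduce 17", "reduce 15"]))   # reduce 15
```

Applied to state 315, this means an integer_value followed by SYM_INTERVAL_DELIM is always continued toward ordinal and never reduced to c_integer_spec on that token. Whether that is what you want depends on where else in the grammar c_integer_spec can be followed by SYM_INTERVAL_DELIM.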