I am currently following the Federico Tomassetti ANTLR tutorial; however, I am getting the following errors when trying to generate code from my ANTLR grammar definition.
Chat.g4:52:26: syntax error: ']' came as a complete surprise to me
Chat.g4:52:25 syntax error: mismatched input ')' expecting SEMI while matching a lexer rule
Can anyone see where I've gone wrong?
My g4 file:
1 grammar Chat;
2
3 /*
4 * Parser Rules
5 */
6
7 chat : line+ EOF ;
8
9 line : name command message NEWLINE;
10
11 message : (emoticon | link | color | mention | WORD | WHITESPACE)+ ;
12
13 name : WORD WHITESPACE;
14
15 command : (SAYS | SHOUTS) ':' WHITESPACE ;
16
17 emoticon : ':' '-'? ')'
18 | ':' '-'? '('
19 ;
20
21 link : '[' TEXT ']' '(' TEXT ')' ;
22
23 color : '/' WORD '/' message '/';
24
25 mention : '#' WORD ;
26
27 /*
28 * Lexer Rules
29 */
30
31 fragment A : ('A'|'a') ;
32 fragment S : ('S'|'s') ;
33 fragment Y : ('Y'|'y') ;
34 fragment H : ('H'|'h') ;
35 fragment O : ('O'|'o') ;
36 fragment U : ('U'|'u') ;
37 fragment T : ('T'|'t') ;
38
39 fragment LOWERCASE : [a-z] ;
40 fragment UPPERCASE : [A-Z] ;
41
42 SAYS : S A Y S ;
43
44 SHOUTS : S H O U T S;
45
46 WORD : (LOWERCASE | UPPERCASE | '_')+ ;
47
48 WHITESPACE : (' ' | '\t') ;
49
50 NEWLINE : ('\r'? '\n' | '\r')+ ;
51
52 TEXT : ~[])]+ ;
Any help would be appreciated.
TEXT : ~[])]+ ;
You can't use ] unescaped in a character class, not even at the beginning. You'll need to precede it with a backslash: ~[\])]+.
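As a rough cross-check outside ANTLR, the same idea can be sketched with Python's re module: a negated character class with the ] escaped, wrapped in the brackets and parentheses of the link rule. The names LINK and parse_link are made up for this illustration, and Python's regex syntax is only an analogy for ANTLR's lexer character sets, not a substitute for them.

```python
import re

# Analogue of:  link : '[' TEXT ']' '(' TEXT ')' ;   with   TEXT : ~[\])]+ ;
# [^\])]+ means "one or more characters that are neither ']' nor ')'".
LINK = re.compile(r"\[([^\])]+)\]\(([^\])]+)\)")

def parse_link(text):
    """Return (label, target) if the whole text is a link, else None."""
    match = LINK.fullmatch(text)
    return match.groups() if match else None

print(parse_link("[ANTLR](https://www.antlr.org)"))
print(parse_link("[broken](link"))
```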
I came across a weird behavior of jq involving a variable on the left hand side of a pipe.
For your information, this question was inspired by the jq manual: the Scoping section (https://stedolan.github.io/jq/manual/#Advancedfeatures) mentions an example filter ... | .*3 as $times_three | [. + $times_three] | .... I believe the correct version is ... | (.*3) as $times_three | [. + $times_three] | ....
First (https://jqplay.org/s/ffMPsqmsmt)
filter:
. * 3 as $times_three | .
input:
3
output:
9
Second (https://jqplay.org/s/yOFcjRAMLL)
filter:
. * 4 as $times_four | .
input:
3
output:
9
What is happening here?
But (https://jqplay.org/s/IKrTNZjKI8)
filter:
(. * 3) as $times_three | .
input:
3
output:
3
And (https://jqplay.org/s/8zoq2-HN1G)
filter:
(. * 4) as $times_four | .
input:
3
output:
3
So if parentheses are used when the variable is declared, as in (.*3) or (.*4), then the filter behaves predictably.
But if parentheses are not used, as in .*3 or .*4, then strangely the output is 9 for both.
Can you explain?
Contrary to what the examples in the Scoping section assume, . * 4 as $times_four | . is equivalent to . * ( 4 as $times_four | . ) and therefore squares its input.
You might expect
. * 4 as $times_four | .
to be equivalent to
( . * 4 ) as $times_four | .
And as you point out, some examples even suggest this is the case. However, the first snippet is actually equivalent to the following:
. * ( 4 as $times_four | . )
And since … as $x produces its context[1], that's the same as
. * ( . | . )
or
. * .
jq's operator precedence is inconsistent and/or quirky.
"def" | "abc" + "def" | length means "def" | ( "abc" + "def" ) | length, but "def" | "abc" + "def" as $x | length means "def" | "abc" + ( "def" as $x | length ).
This behaviour suggests that as isn't a binary operator of the form X as $Y, as one might expect, but a ternary operator of the form X as $Y | Z.
And, in fact, this is how it's documented:
Variable / Symbolic Binding Operator: ... as $identifier | ...
This leads to surprises, especially since it binds a lot more tightly than expected. And it looks like whoever authored the examples in the Scoping section fell into the trap.
[1] It might produce it multiple times, e.g. .[] as $x.
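To make the grouping concrete, here is a small Python model of the two filters. It is only a sketch of the precedence described above, not of how jq is implemented; the function names are invented for the illustration, and dot stands for jq's input (.).

```python
def without_parentheses(dot):
    # Models:  . * 4 as $times_four | .
    # which jq groups as:  . * (4 as $times_four | .)
    def rest_of_pipeline(times_four):
        return dot          # the body is just `.`; $times_four is never used
    return dot * rest_of_pipeline(4)

def with_parentheses(dot):
    # Models:  (. * 4) as $times_four | .
    times_four = dot * 4    # bound, but the body `.` ignores it
    return dot

print(without_parentheses(3))  # 9
print(with_parentheses(3))     # 3
```

In the first function the constant 4 only ever ends up in the unused variable, which is why replacing it with 3, or any other value, still gives 9.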
Indeed, there seems to be a mistake in the manual. The Scoping section contrasts the (faulty) examples
... | .*3 as $times_three | [. + $times_three] | ... # faulty!
and
... | (.*3 as $times_three | [. + $times_three]) | ... # faulty!
While the overall statement stays valid, both examples are missing additional parentheses around .*3. Thus, it should actually read
... | (.*3) as $times_three | [. + $times_three] | ...
and
... | ((.*3) as $times_three | [. + $times_three]) | ...
respectively.
From the manual under section Variable / Symbolic Binding Operator:
The expression exp as $x | ... means: for each value of expression exp, run the rest of the pipeline with the entire original input, and with $x set to that value. Thus as functions as something of a foreach loop.
This means that a variable assignment takes the one expression left of as and assigns its evaluation to the defined variable right of as (and this happens as many times as exp produces an output). But, as everything in jq is a filter, the assignment itself also is, and as such it needs to have an output itself. If you look closely, the full title of that section
Variable / Symbolic Binding Operator: ... as $identifier | ...
also features a pipe symbol next to it, which indicates that it belongs to the assignment's structure. Try just running . as $x. You will get an error because the | ... part is missing. Thus, to simply keep the input context as is (apart from maybe duplicating it as many times as the expression left of as produced an output), a complete assignment would rather look like … as $x | ., or, if the input context is what you wanted to capture in the variable, . as $x | .
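The "foreach" reading of the manual can be sketched in Python with generators. This is a simplified model under the assumption that a filter is a function from one input to a stream of outputs; bind, each, identity_body and variable_body are hypothetical names, not jq internals.

```python
def bind(exp, body):
    """Model of `exp as $x | body`."""
    def run(dot):
        for value in exp(dot):            # once per output of exp
            yield from body(dot, value)   # body still sees the original input
    return run

def each(dot):
    # Models `.[]`: one output per array element.
    yield from dot

def identity_body(dot, x):
    # Models a body of `.`: emit the input, ignore $x.
    yield dot

def variable_body(dot, x):
    # Models a body of `$x`: emit the bound value.
    yield x

print(list(bind(each, identity_body)([1, 2])))  # [[1, 2], [1, 2]]
print(list(bind(each, variable_body)([1, 2])))  # [1, 2]
```

The first call shows the input context being duplicated once per value produced on the left of as, which is the behaviour mentioned for .[] as $x | . above.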
That said, let's clarify what happens with your examples by putting explicit parentheses around the assignments:
3 | . * 3 as $times_three | .
3 | . * (3 as $times_three | .)
3 | . * . # with $times_three set to 3
3 * 3 # with $times_three set to 3
9 # with $times_three set to 3
3 | . * 4 as $times_four | .
3 | . * (4 as $times_four | .)
3 | . * . # with $times_four set to 4
3 * 3 # with $times_four set to 4
9 # with $times_four set to 4
3 | (. * 3) as $times_three | .
3 | ((. * 3) as $times_three | .)
3 | ((3 * 3) as $times_three | .)
3 | (9 as $times_three | .)
3 | . # with $times_three set to 9
3 # with $times_three set to 9
3 | (. * 4) as $times_four | .
3 | ((. * 4) as $times_four | .)
3 | ((3 * 4) as $times_four | .)
3 | (12 as $times_four | .)
3 | . # with $times_four set to 12
3 # with $times_four set to 12
I have a file separated by \t.
header text with many lines
V F A B
10 30 26 42
14 33 25 45
16 32 23 43
18 37 22 48
I want to swap the 3rd and 4th columns. I'm using
awk '
BEGIN {
RS = "\n";
OFS="\t";
record=0;
};
record {
a = $4;
$4 = $3;
$3 = a;
};
$1=="V" {
record=1
};
{
print $0
};
'
}
Instead of just changing the position of the columns, column 3 also has the line break of the original 4th column:
header text with many lines
V F A B
10 30 42
26
14 33 45
25
16 32 43
23
18 37 48
22
How can I prevent this in order to get the following?
header text with many lines
V F A B
10 30 42 26
14 33 45 25
16 32 43 23
18 37 48 22
Could you please try the following? It uses the usual method of swapping two values through a variable: store the 3rd field's value in a variable, copy the 4th field's value into the 3rd field, and finally set the 4th field to the variable's value.
awk 'FNR==1{print;next} {val=$3;$3=$4;$4=val} 1' OFS="\t" Input_file
Or, this messy sed:
sed -E 's/([[:digit:]]+)([[:blank:]]+)([[:digit:]]+)([[:space:]]*)$/\3\2\1\4/' file
#         ^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^
#         3rd column    tab           4th column    optional whitespace
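For comparison, the same swap can be sketched in Python. The function name swap_columns and its parameters are invented for this example. It assumes, like the awk script in the question, that data rows start after the line whose first field is V, and it strips a trailing carriage return on the guess (not confirmed by the question) that Windows line endings are what pushes the last field onto its own line.

```python
def swap_columns(lines, first=2, second=3, marker="V"):
    """Swap two tab-separated fields on every line after the marker line."""
    swapped = []
    in_table = False
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")   # drop LF and any stray CR
        if in_table:
            if len(fields) > max(first, second):
                fields[first], fields[second] = fields[second], fields[first]
        elif fields[0] == marker:
            in_table = True                        # data rows start on the next line
        swapped.append("\t".join(fields))
    return swapped

rows = [
    "header text with many lines\r\n",
    "V\tF\tA\tB\r\n",
    "10\t30\t26\t42\r\n",
    "14\t33\t25\t45\r\n",
]
print("\n".join(swap_columns(rows)))
```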
Here is a sample column:
+---------------+
| NAME |
+---------------+
| Jim Jo'nes |
| John $mith |
| Leroy Jenkins |
| Tom & Jerry |
+---------------+
I need to write a RegEx pattern that returns fields that include non-alphanumeric characters NOT including spaces. This is a name field that can contain multiple names separated by spaces.
My expected result set is this:
Jim Jo'nes
John $mith
Tom & Jerry
Use the [^ ... ] operator (non-matching character list) applied to alphanumeric ([:alnum:]) and space ([:space:]) character classes:
[^[:alnum:][:space:]]
Demo: https://regex101.com/r/VOzqFn/1
Use a REGEXP_LIKE Condition with a Negated Character Set
I would first turn off the use of '&' to identify substitution variables:
set define off;
Next, I would just identify all characters you are not looking for:
a-z, A-Z, 0-9, \s (the escape version of a space character)
I then create a negated character set:
[^a-zA-Z0-9 ]
Here is my resulting solution:
WITH names AS (
    SELECT 'Jim Jo''nes' name FROM dual
    UNION ALL
    SELECT 'John $mith' name FROM dual
    UNION ALL
    SELECT 'Leroy Jenkins' name FROM dual
    UNION ALL
    SELECT 'Tom & Jerry' name FROM dual
)
SELECT
    *
FROM
    names
WHERE
    1 = 1
    AND REGEXP_LIKE ( names.name,'[^a-zA-Z0-9 ]' );
name
----------
Jim Jo'nes
John $mith
Tom & Jerry
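The negated character set can be tried out quickly outside the database. Here is a small Python sketch using the sample names from the question; Python's re module is used purely to illustrate the pattern, and its regex flavour is not identical to Oracle's.

```python
import re

# Any single character that is not a letter, a digit, or a space.
NON_ALNUM_NON_SPACE = re.compile(r"[^a-zA-Z0-9 ]")

names = ["Jim Jo'nes", "John $mith", "Leroy Jenkins", "Tom & Jerry"]

# search() succeeds as soon as one offending character is found anywhere.
flagged = [name for name in names if NON_ALNUM_NON_SPACE.search(name)]
print(flagged)
```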
I'm trying to create a grammar for the GNU MathProg language from the glpk package: https://www3.nd.edu/~jeff/mathprog/glpk-4.47/doc/gmpl.pdf
Unfortunately, the grammar I've written so far is ambiguous.
I don't know how to tell bison which branch of the parse tree is correct when some identifier is used. For example:
numericExpression : numericLiteral
| identifier
| numericFunctionReference
| iteratedNumericExpression
| conditionalNumericExpression
| '(' numericExpression ')' %prec PARENTH
| '-' numericExpression %prec UNARY
| '+' numericExpression %prec UNARY
| numericExpression binaryArithmeticOperator numericExpression
;
symbolicExpression : stringLiteral
| symbolicFunctionReference
| identifier
| conditionalSymbolicExpression
| '(' symbolicExpression ')' %prec PARENTH
| symbolicExpression '&' symbolicExpression
;
indexingExpression : '{' indexingEntries '}'
| '{' indexingEntries ':' logicalExpression '}'
;
setExpression : literalSet
| identifier
| aritmeticSet
| indexingExpression
| iteratedSetExpression
| conditionalSetExpression
| '(' setExpression ')' %prec PARENTH
| setExpression setOperator setExpression
;
numericLiteral : INT
| FLT
;
linearExpression : identifier
| iteratedLinearExpression
| conditionalLinearExpression
| '(' linearExpression ')' %prec PARENTH
| '-' linearExpression %prec UNARY
| '+' linearExpression %prec UNARY
| linearExpression '+' linearExpression
| linearExpression '-' linearExpression
| linearExpression '*' numericExpression
| numericExpression '*' linearExpression
| linearExpression '/' numericExpression
;
logicalExpression : numericExpression
| relationalExpression
| iteratedLogicalExpression
| '(' logicalExpression ')' %prec PARENTH
| NOT logicalExpression %prec NEG
| logicalExpression AND logicalExpression
| logicalExpression OR logicalExpression
;
identifier : SYMBOLIC_NAME
| SYMBOLIC_NAME '[' listOfIndices ']'
;
listOfIndices : SYMBOLIC_NAME
| listOfIndices ',' SYMBOLIC_NAME
;
An identifier is simply the name of a 'variable'. A variable has a specific type (parameter, set, decision variable) and might be indexed. In the code, the programmer has to declare the variable type in statements such as:
param p1;
param p2{1, 2} >=0;
set s1;
set s2{i in 1..5};
var v1 >=0;
var v2{S1,S2};
But when bison sees an identifier, it doesn't know which rule it should use, and I'm getting reduce/reduce conflicts like
113 numericExpression: identifier .
123 symbolicExpression: identifier .
'&' reduce using rule 123 (symbolicExpression)
ELSE reduce using rule 113 (numericExpression)
ELSE [reduce using rule 123 (symbolicExpression)]
INTEGER reduce using rule 113 (numericExpression)
INTEGER [reduce using rule 123 (symbolicExpression)]
BINARY reduce using rule 113 (numericExpression)
BINARY [reduce using rule 123 (symbolicExpression)]
ASIGN reduce using rule 113 (numericExpression)
ASIGN [reduce using rule 123 (symbolicExpression)]
',' reduce using rule 113 (numericExpression)
',' [reduce using rule 123 (symbolicExpression)]
'>' reduce using rule 113 (numericExpression)
'>' [reduce using rule 123 (symbolicExpression)]
'}' reduce using rule 113 (numericExpression)
'}' [reduce using rule 123 (symbolicExpression)]
113 numericExpression: identifier .
123 symbolicExpression: identifier .
130 setExpression: identifier .
UNION reduce using rule 130 (setExpression)
DIFF reduce using rule 130 (setExpression)
SYMDIFF reduce using rule 130 (setExpression)
ELSE reduce using rule 113 (numericExpression)
ELSE [reduce using rule 123 (symbolicExpression)]
ELSE [reduce using rule 130 (setExpression)]
WITHIN reduce using rule 130 (setExpression)
IN reduce using rule 113 (numericExpression)
IN [reduce using rule 123 (symbolicExpression)]
I also have other problems, but this one is a blocker for me.
Basically the problem is that identifier appears in multiple rules:
numericExpression : identifier
symbolicExpression : identifier
setExpression: identifier
which may apply in the same context. One way to resolve this is to introduce different token types for set names and scalar (parameters and variables) names:
symbolicExpression : SCALAR_NAME
setExpression: SET_NAME
This will resolve the conflict with set names. I don't think that you need this rule
numericExpression : identifier
because there is an automatic conversion from strings to numbers in AMPL and therefore MathProg, which is a subset of AMPL, so symbolicExpression should be allowed in the context of numericExpression.
Note that the grammar is not context-free, so you'll need to pull additional information like the name category above from the symbol table to resolve problems like this.
My flex is a bit rusty, but I think you can do something like this in an identifier rule:
return is_setname(...) ? TOK_SET_NAME : TOK_SCALAR_NAME;
where is_setname is a function that checks whether the current identifier is a set. You'll need to define such a function and get the necessary information from a symbol table.
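The symbol-table lookup can be sketched in a few lines of Python. Everything here is hypothetical scaffolding for illustration (the SymbolTable class, the declare method, the token constants); a real flex/bison setup would do the equivalent in C inside the lexer action.

```python
TOK_SET_NAME = "SET_NAME"
TOK_SCALAR_NAME = "SCALAR_NAME"

class SymbolTable:
    """Remembers the kind each name was declared with (param, set, var)."""

    def __init__(self):
        self._kinds = {}

    def declare(self, name, kind):
        # Would be called by the parser when it reduces a declaration statement.
        self._kinds[name] = kind

    def is_setname(self, name):
        return self._kinds.get(name) == "set"

def classify_identifier(name, symbols):
    """What the lexer's identifier rule would return for this name."""
    return TOK_SET_NAME if symbols.is_setname(name) else TOK_SCALAR_NAME

symbols = SymbolTable()
symbols.declare("p1", "param")
symbols.declare("s1", "set")
print(classify_identifier("s1", symbols))  # SET_NAME
print(classify_identifier("p1", symbols))  # SCALAR_NAME
```

This only works if declarations are processed before the names are used, so that the table is already filled in when the lexer reaches a later occurrence of the identifier.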
This is a follow-up question to Grammar: difference between a top down and bottom up?
I understand from that question that:
the grammar itself isn't top-down or bottom-up, the parser is
there are grammars that can be parsed by one but not the other
(thanks Jerry Coffin)
So for this grammar (all possible mathematical formulas):
E -> E T E
E -> (E)
E -> D
T -> + | - | * | /
D -> 0
D -> L G
G -> G G
G -> 0 | L
L -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Would this be readable by a top down and bottom up parser?
Could you say that this is a top down grammar or a bottom up grammar (or neither)?
I am asking because I have a homework question that asks:
"Write top-down and bottom-up grammars for the language consisting of all ..." (different question)
I am not sure if this can be correct since it appears that there is no such thing as a top-down and bottom-up grammar. Could anyone clarify?
That grammar is stupid, since it unites lexing and parsing as one. But ok, it's an academic example.
The thing with bottom-up and top-down is that they have special corner cases that are difficult to implement with your normal one-token lookahead. I think that you should check whether the grammar has any such problems and change it accordingly.
To understand your grammar I wrote a proper EBNF:
expr:
expr op expr |
'(' expr ')' |
number;
op:
'+' |
'-' |
'*' |
'/';
number:
'0' |
digit digits;
digits:
'0' |
digit |
digits digits;
digit:
'1' |
'2' |
'3' |
'4' |
'5' |
'6' |
'7' |
'8' |
'9';
I especially don't like the rule digits: digits digits. It is unclear where the first digits starts and the second ends. I would implement the rule as
digits:
'0' |
digit |
digits digit;
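The ambiguity of digits: digits digits can be made visible by counting parse trees. The sketch below is a counting argument only, not a parser; count_parses is a made-up helper, and it assumes each single character is derived directly by digits: '0' | digit.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_parses(n, ambiguous):
    """Number of parse trees that derive a run of n digit characters."""
    if n == 1:
        return 1                      # digits: '0' | digit
    if ambiguous:
        # digits: digits digits  -> every split point is a valid parse
        return sum(count_parses(k, True) * count_parses(n - k, True)
                   for k in range(1, n))
    # digits: digits digit  -> the last character is always the lone digit
    return count_parses(n - 1, False)

print(count_parses(4, True))   # 5
print(count_parses(4, False))  # 1
```

With the left-recursive rule there is exactly one tree per input, which is what a deterministic parser needs.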
Another problem is number: '0' | digit digits;. This conflicts with digits: '0' and digits: digit;. As a matter of fact, that is duplication. I would change the rules to (removing digits):
number:
'0' |
digit |
digit zero_digits;
zero_digits:
zero_digit |
zero_digits zero_digit;
zero_digit:
'0' |
digit;
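This makes the number rules left recursive and context free, which suits an LR(1) parser (left-to-right scan, rightmost derivation in reverse, one token of lookahead). This is what you would normally give to a parser generator such as bison. And since bison is bottom-up, this is valid input for a bottom-up parser. Note that expr: expr op expr remains ambiguous, so bison will still report shift/reduce conflicts for it.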
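For a top-down approach, at least for recursive descent, left recursion is a problem, because a rule that starts by calling itself never consumes any input. You can use backtracking if you like, but for these parsers you want a right-recursive grammar, as the LL(1) family (left-to-right scan, leftmost derivation, one token of lookahead) requires. To do that, swap the recursions: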
zero_digits:
zero_digit |
zero_digit zero_digits;
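To show why the right-recursive form suits recursive descent, here is a minimal hand-written sketch in Python for the number and zero_digits rules only. The function names mirror the rule names; peek and the position-returning convention are choices made for this example, and the two alternatives of each rule that share a prefix are merged by looking one character ahead.

```python
NONZERO = set("123456789")
ANY_DIGIT = NONZERO | {"0"}

def peek(text, pos):
    """One character of lookahead; '' at end of input."""
    return text[pos] if pos < len(text) else ""

def zero_digits(text, pos):
    # zero_digits: zero_digit | zero_digit zero_digits
    if peek(text, pos) not in ANY_DIGIT:
        raise SyntaxError(f"digit expected at position {pos}")
    pos += 1                              # consume input BEFORE recursing
    if peek(text, pos) in ANY_DIGIT:
        return zero_digits(text, pos)     # right recursion
    return pos

def number(text, pos=0):
    # number: '0' | digit | digit zero_digits
    ch = peek(text, pos)
    if ch == "0":
        return pos + 1
    if ch in NONZERO:
        pos += 1
        if peek(text, pos) in ANY_DIGIT:
            return zero_digits(text, pos)
        return pos
    raise SyntaxError(f"number expected at position {pos}")

print(number("1203"))  # 4: the whole string is one number
print(number("12+3"))  # 2: stops in front of the operator
```

Each function consumes a character before it calls itself, so the recursion always terminates. With the left-recursive rule zero_digits: zero_digits zero_digit, the function would call itself immediately and never advance.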
I am not sure if that answers your question. I think the question is badly formulated and misleading; and I write parsers for a living...