lexical elements of grammar

lexical elements of grammar - grammar

There is 4 lexical elements of grammar
G = (S, N, T, P)
Where G = Grammar, S = Start Symbol, N = Non-Terminals, T = Terminals, P = Production rules
I wanted to know if N is always equal with P because as I know P are lexemes which may replace with other lexemes
So in this example:
<program> --> <stmts>
<stmts> --> <stmt> | <stmt> ; <stmts>
<stmt> --> <var> = <expr>
<var> --> a | b | c | d
<expr> --> <term> + <term> | <term> - <term>
<term> --> <var> | const
S: <program>
N: <program>, <stmts>, <var>, <expr>, <term>
T: ;, a, b, c, d, +, -, const
P: <program>, <stmts>, <var>, <expr>, <term>
Is that right?

No. |N| does not necessarily = |P|. Consider:
<program> --> <stmts>
<stmts> --> <stmt>
<stmts> -->| <stmt> ; <stmts>
<stmt> --> <var> = <expr>
<var> --> a
<var> --> b
<var> --> c
<var> --> d
<expr> --> <term> + <term>
<expr> --> <term> - <term>
<term> --> <var>
<term> --> const
Your problem is that you are not precise about what is allowed in a grammar rule.
You can force the number of rules to match nonterminals, by insisting that no rule has the same left hand side. To do that, you can't insist on simple BNF; you have to have extended BNF with at least alternation.
PS: This isn't really about "lexical elements" of a grammar. It is just about the definition of a grammar.

Sounds like homework... You have it basically right, only P are the actual rules.
G = (S, N, T, P)
S: <program>
N: <program>, <stmts>, <var>, <expr>, <term>
T: ;, a, b, c, d, +, -, const
P: <program> --> <stmts>,
<stmts> --> <stmt> | <stmt> ; <stmts>,
<stmt> --> <var> = <expr>,
<var> --> a | b | c | d,
<expr> --> <term> + <term> | <term> - <term>,
<term> --> <var> | const
Without rules the a grammar is useless.

Related

Controlling parser rule alternatives precedence

I have an expression IF 1 THEN 2 ELSE 3 * 4. I want this parsed as IF 1 THEN 2 ELSE (3 * 4), however using my grammar (extract) below, it parses it as (IF 1 THEN 2 ELSE 3) * 4.
formula: expression EOF;
expression
: LPAREN expression RPAREN #parenthesisExp
| IF condition=expression THEN thenExpression=expression ELSE elseExpression=expression #ifExp
| left=expression BINARYOPERATOR right=expression #binaryoperationExp
| left=expression op=(TIMES|DIV) right=expression #muldivExp
| left=expression op=(PLUS|MINUS) right=expression #addsubtractExp
| left=expression op=(EQUALS|NOTEQUALS|LT|GT) right=expression #comparisonExp
| left=expression AMPERSAND right=expression #concatenateExp
| NOT expression #notExp
| STRINGLITERAL #stringliteralExp
| signedAtom #atomExp
;
My understanding is that because I have the ifExp alternative appearing before the muldivExp it should use that first, then because I have the muldivExp before atomExp (which handles numbers) it should do 3 * 4 to end the ELSE, rather than using just the 3. In which case I can't see why it's making the IF..THEN..ELSE a child of the multiplication.
I don't think the rest of the grammar is relevant here, but in case it is see below for the whole thing.
grammar AnaplanFormula;
formula: expression EOF;
expression
: LPAREN expression RPAREN #parenthesisExp
| IF condition=expression THEN thenExpression=expression ELSE elseExpression=expression #ifExp
| left=expression BINARYOPERATOR right=expression #binaryoperationExp
| left=expression op=(TIMES|DIV) right=expression #muldivExp
| left=expression op=(PLUS|MINUS) right=expression #addsubtractExp
| left=expression op=(EQUALS|NOTEQUALS|LT|GT) right=expression #comparisonExp
| left=expression AMPERSAND right=expression #concatenateExp
| NOT expression #notExp
| STRINGLITERAL #stringliteralExp
| signedAtom #atomExp
;
signedAtom
: PLUS signedAtom #plusSignedAtom
| MINUS signedAtom #minusSignedAtom
| func_ #funcAtom
| atom #atomAtom
;
atom
: SCIENTIFIC_NUMBER #numberAtom
| LPAREN expression RPAREN #expressionAtom // Do we need this?
| entity #entityAtom
;
func_: functionname LPAREN (expression (',' expression)*)? RPAREN #funcParameterised
| entity LSQUARE dimensionmapping (',' dimensionmapping)* RSQUARE #funcSquareBrackets
;
dimensionmapping: WORD COLON entity; // Could make WORD more specific here
functionname: WORD; // Could make WORD more specific here
entity: QUOTELITERAL #quotedEntity
| WORD+ #wordsEntity
| left=entity DOT right=entity #dotQualifiedEntity
;
WS: [ \r\n\t]+ -> skip;
/////////////////
// Fragments //
/////////////////
fragment NUMBER: DIGIT+ (DOT DIGIT+)?;
fragment DIGIT: [0-9];
fragment LOWERCASE: [a-z];
fragment UPPERCASE: [A-Z];
fragment WORDSYMBOL: [#?_£%];
//////////////////
// Tokens //
//////////////////
IF: 'IF' | 'if';
THEN: 'THEN' | 'then';
ELSE: 'ELSE' | 'else';
BINARYOPERATOR: 'AND' | 'and' | 'OR' | 'or';
NOT: 'NOT' | 'not';
WORD: (DIGIT* (LOWERCASE | UPPERCASE | WORDSYMBOL)) (LOWERCASE | UPPERCASE | DIGIT | WORDSYMBOL)*;
STRINGLITERAL: DOUBLEQUOTES (~'"' | ('""'))* DOUBLEQUOTES;
QUOTELITERAL: '\'' (~'\'' | ('\'\''))* '\'';
LSQUARE: '[';
RSQUARE: ']';
LPAREN: '(';
RPAREN: ')';
PLUS: '+';
MINUS: '-';
TIMES: '*';
DIV: '/';
COLON: ':';
EQUALS: '=';
NOTEQUALS: LT GT;
LT: '<';
GT: '>';
AMPERSAND: '&';
DOUBLEQUOTES: '"';
UNDERSCORE: '_';
QUESTIONMARK: '?';
HASH: '#';
POUND: '£';
PERCENT: '%';
DOT: '.';
PIPE: '|';
SCIENTIFIC_NUMBER: NUMBER (('e' | 'E') (PLUS | MINUS)? NUMBER)?;

Move your ifExpr down near the end of your alternatives. (In particular, below any alternative that you would wish to match your elseExpression
Your “if ... then ... else ...” is below the muldivExp precisely because you've made it a higher priority. Items lower in the tree are evaluated before items higher in the tree, so higher priority items belong lower in the tree.
With:
expression:
LPAREN expression RPAREN # parenthesisExp
| left = expression BINARYOPERATOR right = expression # binaryoperationExp
| left = expression op = (TIMES | DIV) right = expression # muldivExp
| left = expression op = (PLUS | MINUS) right = expression # addsubtractExp
| left = expression op = (EQUALS | NOTEQUALS | LT | GT) right = expression # comparisonExp
| left = expression AMPERSAND right = expression # concatenateExp
| NOT expression # notExp
| STRINGLITERAL # stringliteralExp
| signedAtom # atomExp
| IF condition = expression THEN thenExpression = expression ELSE elseExpression = expression #
ifExp
;
I get

Polynomial evaluation in Isabelle

In the book Concrete Semantics, exercise 2.11 writes:
Define arithmetic expressions in one variable over integers
(type int) as a data type:
datatype exp = Var | Const int | Add exp exp | Mult exp exp
Define a function eval :: exp ⇒ int ⇒ int such that eval e x evaluates e at the value x. A polynomial can be represented as a list of coefficients, starting with the constant. For example, [4, 2, − 1, 3] represents the polynomial 4+2x−x 2 +3x 3 . Define a function evalp :: int list ⇒ int ⇒ int that evaluates a polynomial at the given value. Define a function coeffs :: exp ⇒ int list that transforms an
expression into a polynomial. This may require auxiliary functions. Prove that coeffs preserves the value of the expression: evalp (coeffs e) x = eval e x. Hint: consider the hint in Exercise 2.10.
As a first try, and because former exercises encouraged to write iterative versions of functions, I wrote the following:
datatype exp = Var | Const int | Add exp exp | Mult exp exp
fun eval :: "exp ⇒ int ⇒ int" where
"eval Var x = x"
| "eval (Const n) _ = n"
| "eval (Add e1 e2) x = (eval e1 x) + (eval e2 x)"
| "eval (Mult e1 e2) x = (eval e1 x) * (eval e2 x)"
fun evalp_it :: "int list ⇒ int ⇒ int ⇒ int ⇒ int" where
"evalp_it [] x xpwr acc = acc"
| "evalp_it (c # cs) x xpwr acc = evalp_it cs x (xpwr*x) (acc + c*xpwr)"
fun evalp :: "int list ⇒ int ⇒ int" where
"evalp coeffs x = evalp_it coeffs x 1 0"
fun add_coeffs :: "int list ⇒ int list ⇒ int list" where
"add_coeffs [] [] = []"
| "add_coeffs (a # as) (b# bs) = (a+b) # (add_coeffs as bs)"
| "add_coeffs as [] = as"
| "add_coeffs [] bs = bs"
(there might be some zip function to do this)
fun mult_coeffs_it :: "int list ⇒ int list ⇒ int list ⇒ int list ⇒ int list" where
"mult_coeffs_it [] bs accs zeros = accs"
| "mult_coeffs_it (a#as) bs accs zeros =
mult_coeffs_it as bs (add_coeffs accs zeros#bs) (0#zeros)"
fun mult_coeffs :: "int list ⇒ int list ⇒ int list" where
"mult_coeffs as bs = mult_coeffs_it as bs [] []"
fun coeffs :: "exp ⇒ int list" where
"coeffs (Var) = [0,1]"
| "coeffs (Const n) = [n]"
| "coeffs (Add e1 e2) = add_coeffs (coeffs e1) (coeffs e2)"
| "coeffs (Mult e1 e2) = mult_coeffs (coeffs e1) (coeffs e2)"
I tried to verify the sought theorem
lemma evalp_coeffs_eval: "evalp (coeffs e) x = eval e x"
but could not. Once I got an advice that writing good definitions is very important in theorem proving, though the adviser did not give details.
So, what is the problem with my definitions, conceptually? Please do not write the good definitions but point out the conceptual problems with my definitions.
UPDATE: upon advice I started to use
fun evalp2 :: "int list ⇒ int ⇒ int" where
"evalp2 [] v = 0"|
"evalp2 (p#ps) v = p + v * (evalp2 ps v) "
and looking into src/HOL/Algebra/Polynomials.thy I formulated
fun add_cffs :: "int list ⇒ int list ⇒ int list" where
"add_cffs as bs =
( if length as ≥ length bs
then map2 (+) as (replicate (length as - length bs) 0) # bs
else add_cffs bs as)"
but that did not help much, simp add: algebra_simps or arith did not solve the corresponding subgoal.

Some hints:
Try running quickcheck and nitpick on the Lemma until no more counter example is found. For the Lemma in your question I get the following counter example:
Quickcheck found a counterexample:
e = Mult Var Var
x = - 2
Evaluated terms:
evalp (coeffs e) x = - 10
eval e x = 4
Try to prove some useful Lemmas about your auxiliary functions first. For example:
lemma "evalp (mult_coeffs A B) x = evalp A x * evalp B x"
Read about calculating with polynomials (e.g. https://en.wikipedia.org/wiki/Polynomial_ring) and choose definitions close to what mathematicians do. Although this one might spoil the fun of coming up with a definition on your own.

Why is this grammar ambiguous?

I'm using Antlr4. Here is my grammar:
assign : id '=' expr ;
id : 'A' | 'B' | 'C' ;
expr : expr '+' term
| expr '-' term
| term ;
term : term '*' factor
| term '/' factor
| factor ;
factor : expr '**' factor
| '(' expr ')'
| id ;
WS : [ \t\r\n]+ -> skip ;
I know this grammar is ambiguous and also I know I should add an element to the grammar but I don't know how to make the grammar unambiguous.

factor : expr '**' factor
Consider the input
A + B ** C
A + B is an expr so we could analyse that as a factor, semantically (A+B)C
But the other, more conventional interpretation (A + (BC)) is also possible:
<expr> =>
<expr> + <term> =>
<term> + <term> =>
<factor> + <term> =>
A + <term> =>
A + <factor> =>
A + <expr> ** <factor> =>
A + <term> ** <factor> =>
A + <factor> ** <factor> =>
A + B ** <factor> =>
A + B ** C

Using sets in lean

I'd like to do some work in topology using lean.
As a good start, I wanted to prove a couple of simple lemmas about sets in lean.
For example
def inter_to_union (H : a ∈ set.inter A B) : a ∈ set.union A B :=
sorry
or
def set_deMorgan : a ∈ set.inter A B → a ∈ set.compl (set.union (set.compl A) (set.compl B)) :=
sorry
or, perhaps more interestingly
def set_deMorgan2 : set.inter A B = set.compl (set.union (set.compl A) (set.compl B)) :=
sorry
But I can't find anywhere elimination rules for set.union or set.inter, so I just don't know how to work with them.
How do I prove the lemmas?
Also, looking at the definition of sets in lean, I can see bits of syntax, which look very much like paper maths, but I don't understand at the level of dependent type theory, for example:
protected def sep (p : α → Prop) (s : set α) : set α :=
{a | a ∈ s ∧ p a}
How can one break down the above example into simpler notions of dependent/inductive types?

That module identifies sets with predicates on some type α (α is usually called 'the universe'):
def set (α : Type u) := α → Prop
If you have a set s : set α and for some x : α you can prove s a, this is interpreted as 'x belongs to s'.
In this case, x ∈ A is a notation (let us not mind about typeclasses for now) for set.mem x A which is defined as follows:
protected def mem (a : α) (s : set α) :=
s a
The above explains why the empty set is represented as the predicate always returning false:
instance : has_emptyc (set α) :=
⟨λ a, false⟩
And also, the universe is unsurprisingly represented like so:
def univ : set α :=
λ a, true
I'll show how prove the first lemma:
def inter_to_union {α : Type} {A B : set α} {a : α} : A ∩ B ⊆ A ∪ B :=
assume (x : α) (xinAB : x ∈ A ∩ B), -- unfold the definition of `subset`
have xinA : x ∈ A, from and.left xinAB,
#or.inl _ (x ∈ B) xinA
This is all like the usual "pointful" proofs for these properties in basic set theory.
Regarding your question about sep -- you can see through notations like so:
set_option pp.notation false
#print set.sep
Here is the output:
protected def set.sep : Π {α : Type u}, (α → Prop) → set α → set α :=
λ {α : Type u} (p : α → Prop) (s : set α),
set_of (λ (a : α), and (has_mem.mem a s) (p a))

ANTLR4 Grammar Performance Very Poor

Given the grammar below, I'm seeing very poor performance when parsing longer strings, on the order of seconds. (this on both Python and Go implementations) Is there something in this grammar that is causing that?
Example output:
0.000061s LEXING "hello world"
0.014349s PARSING "hello world"
0.000052s LEXING 5 + 10
0.015384s PARSING 5 + 10
0.000061s LEXING FIRST_WORD(WORD_SLICE(contact.blerg, 2, 4))
0.634113s PARSING FIRST_WORD(WORD_SLICE(contact.blerg, 2, 4))
0.000095s LEXING (DATEDIF(DATEVALUE("01-01-1970"), date.now, "D") * 24 * 60 * 60) + ((((HOUR(date.now)+7) * 60) + MINUTE(date.now)) * 60))
1.552758s PARSING (DATEDIF(DATEVALUE("01-01-1970"), date.now, "D") * 24 * 60 * 60) + ((((HOUR(date.now)+7) * 60) + MINUTE(date.now)) * 60))
This is on Python.. though I don't expect blazing performance I would expect sub-second for any input. What am I doing wrong?
grammar Excellent;
parse
: expr EOF
;
expr
: atom # expAtom
| concatenationExpr # expConcatenation
| equalityExpr # expEquality
| comparisonExpr # expComparison
| additionExpr # expAddition
| multiplicationExpr # expMultiplication
| exponentExpr # expExponent
| unaryExpr # expUnary
;
path
: NAME (step)*
;
step
: LBRAC expr RBRAC
| PATHSEP NAME
| PATHSEP NUMBER
;
parameters
: expr (COMMA expr)* # functionParameters
;
concatenationExpr
: atom (AMP concatenationExpr)? # concatenation
;
equalityExpr
: comparisonExpr op=(EQ|NE) comparisonExpr # equality
;
comparisonExpr
: additionExpr (op=(LT|GT|LTE|GTE) additionExpr)? # comparison
;
additionExpr
: multiplicationExpr (op=(ADD|SUB) multiplicationExpr)* # addition
;
multiplicationExpr
: exponentExpr (op=(MUL|DIV) exponentExpr)* # multiplication
;
exponentExpr
: unaryExpr (EXP exponentExpr)? # exponent
;
unaryExpr
: SUB? atom # negation
;
funcCall
: function=NAME LPAR parameters? RPAR # functionCall
;
funcPath
: function=funcCall (step)* # functionPath
;
atom
: path # contextReference
| funcCall # atomFuncCall
| funcPath # atomFuncPath
| LITERAL # stringLiteral
| NUMBER # decimalLiteral
| LPAR expr RPAR # parentheses
| TRUE # true
| FALSE # false
;
NUMBER
: DIGITS ('.' DIGITS?)?
;
fragment
DIGITS
: ('0'..'9')+
;
TRUE
: [Tt][Rr][Uu][Ee]
;
FALSE
: [Ff][Aa][Ll][Ss][Ee]
;
PATHSEP
:'.';
LPAR
:'(';
RPAR
:')';
LBRAC
:'[';
RBRAC
:']';
SUB
:'-';
ADD
:'+';
MUL
:'*';
DIV
:'/';
COMMA
:',';
LT
:'<';
GT
:'>';
EQ
:'=';
NE
:'!=';
LTE
:'<=';
GTE
:'>=';
QUOT
:'"';
EXP
: '^';
AMP
: '&';
LITERAL
: '"' ~'"'* '"'
;
Whitespace
: (' '|'\t'|'\n'|'\r')+ ->skip
;
NAME
: NAME_START_CHARS NAME_CHARS*
;
fragment
NAME_START_CHARS
: 'A'..'Z'
| '_'
| 'a'..'z'
| '\u00C0'..'\u00D6'
| '\u00D8'..'\u00F6'
| '\u00F8'..'\u02FF'
| '\u0370'..'\u037D'
| '\u037F'..'\u1FFF'
| '\u200C'..'\u200D'
| '\u2070'..'\u218F'
| '\u2C00'..'\u2FEF'
| '\u3001'..'\uD7FF'
| '\uF900'..'\uFDCF'
| '\uFDF0'..'\uFFFD'
;
fragment
NAME_CHARS
: NAME_START_CHARS
| '0'..'9'
| '\u00B7' | '\u0300'..'\u036F'
| '\u203F'..'\u2040'
;
ERRROR_CHAR
: .
;

You can always try to parse with SLL(*) first and only if that fails you need to parse it with LL(*) (which is the default).
See this ticket on ANTLR's GitHub for further explaination and here is an implementation that uses this strategy.
This method will save you (a lot of) time when parsing syntactically correct input.

Seems like this performance is due to the left recursion used in the addition / multiplication etc, operators. Rewriting these to be binary rules instead yields performance that is instant. (see below)
grammar Excellent;
COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
DOT : '.';
PLUS : '+';
MINUS : '-';
TIMES : '*';
DIVIDE : '/';
EXPONENT : '^';
EQ : '=';
NEQ : '!=';
LTE : '<=';
LT : '<';
GTE : '>=';
GT : '>';
AMPERSAND : '&';
DECIMAL : [0-9]+('.'[0-9]+)?;
STRING : '"' (~["] | '""')* '"';
TRUE : [Tt][Rr][Uu][Ee];
FALSE : [Ff][Aa][Ll][Ss][Ee];
NAME : [a-zA-Z][a-zA-Z0-9_.]*; // variable names, e.g. contact.name or function names, e.g. SUM
WS : [ \t\n\r]+ -> skip; // ignore whitespace
ERROR : . ;
parse : expression EOF;
atom : fnname LPAREN parameters? RPAREN # functionCall
| atom DOT atom # dotLookup
| atom LBRACK expression RBRACK # arrayLookup
| NAME # contextReference
| STRING # stringLiteral
| DECIMAL # decimalLiteral
| TRUE # true
| FALSE # false
;
expression : atom # atomReference
| MINUS expression # negation
| expression EXPONENT expression # exponentExpression
| expression (TIMES | DIVIDE) expression # multiplicationOrDivisionExpression
| expression (PLUS | MINUS) expression # additionOrSubtractionExpression
| expression (LTE | LT | GTE | GT) expression # comparisonExpression
| expression (EQ | NEQ) expression # equalityExpression
| expression AMPERSAND expression # concatenation
| LPAREN expression RPAREN # parentheses
;
fnname : NAME
| TRUE
| FALSE
;
parameters : expression (COMMA expression)* # functionParameters
;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

lexical elements of grammar - grammar

Related

Controlling parser rule alternatives precedence

Polynomial evaluation in Isabelle

Why is this grammar ambiguous?

Using sets in lean

ANTLR4 Grammar Performance Very Poor

Categories

Resources