I am creating a formal spec for a very simple rule language, very simple.
I want to use EBNF as this is a standard but I can't figure out how to specify order of operations. Here is the specification so far.
rule = statement, { ('AND'|'OR'), statement};
variable = '$',alphabetic character, {alphabetic character | digit};
statement = variable, [ 'count',[white space ],'>',[white space],number ];
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
number = [ "-" ] , digit , { digit } ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
The question I have is: how do I show that things in brackets should be evaluated first?
So something like this:
$strap AND ($greenSticker count > 5 OR ($greenSticker AND $redSticker))
It seems like a common feature to most languages, but my Google skills are failing me and I can't seem to find an example.
Given this as a simplified example LL grammar:
expression -> (+|-|ε) term ((+|-) term)*
term -> factor ((*|/) factor)*
factor -> var | number | (expression)
As you can see, the operators with lower precedence (+ and -) appear in a more general rule than the operators with higher precedence (* and /). It is all about producing the correct parse tree. As a rule of thumb, "outer" or more general rules have lower precedence, which is why the addition and subtraction operators sit beside term: term must be derived further before anything is combined. Parenthesized expressions are handled by factor, the innermost rule, so they are always evaluated first. If you look at more complicated grammars you will see this layering taken to an extreme to get the precedence right.
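The layering above can be sketched as a small recursive-descent evaluator, with one function per grammar rule. This is only an illustration: the names tokenize, Parser and evaluate are made up for this sketch, and variables are left out so that the example stays self-contained.

```python
import re

def tokenize(text):
    # Numbers, operators and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # expression -> (+|-|ε) term ((+|-) term)*
        sign = self.take() if self.peek() in ("+", "-") else "+"
        value = self.term()
        if sign == "-":
            value = -value
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term -> factor ((*|/) factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # factor -> number | (expression)   (variables omitted in this sketch)
        token = self.take()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

Because expression calls term and term calls factor, a multiplication is always completed before the surrounding addition, and a parenthesized expression is completed before anything outside it. The same shape would apply to AND/OR in the rule language.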
Give the triple representation of a statement x:= y[i] I'm having a problem in this one
You should probably be more specific about the context in which you need the representation; this book has some good information about compiler design. Here is what it would look like using its semantics.
| operator | operand1 | operand2
1. | [] | y | i
2. | := | x | (1.)
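As a rough sketch of how such a table could be produced, the hypothetical helper below builds the two triples for an indexed assignment. The function name and the tuple layout are assumptions made for illustration only; they do not come from the book.

```python
def triples_for_indexed_assignment(target, array, index):
    # Triple 1 reads array[index]; triple 2 assigns that result to target.
    # A reference such as "(1)" points at the result of an earlier triple.
    triples = []
    triples.append(("[]", array, index))
    triples.append((":=", target, f"({len(triples)})"))
    return triples

for number, triple in enumerate(triples_for_indexed_assignment("x", "y", "i"), 1):
    print(number, *triple)
```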
If I want to create a font with multiple style combinations, like bold AND underline, I have to put the Or operator between them, like in the example below:
lblArt.Font = New Font("Tahoma", 18, FontStyle.Bold Or FontStyle.Underline)
If you use And between Bold and Underline, it doesn't work, and you only get one of the two (which is how I would expect Or to behave), even though And would be the logical way to write it. What is the reason behind this?
Boolean logic works a bit differently than the way we use the terms in English. What's happening here is that the enumerated FontStyle values are actually bit flags, and in order to manipulate bit flags, you use bitwise operations.
To combine two bit flags, you OR them together. An OR operation combines the two values. So imagine that FontStyle.Bold was 2 and FontStyle.Underline was 4. When you OR them together, you get 6: both flags are now present in one value. In Boolean logic, you can think of an OR operation as returning "true" (i.e., setting that bit in the result) if either of the corresponding bits in the two operands is set, and "false" if neither is set.
You can write a truth table for such an operation as follows:
| A | B | A OR B |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
Notice that the results more closely mirror what we, in informal English, would call "and". If either one has it set, then the result has it set, too.
In contrast to OR, a bitwise AND operation only returns "true" (i.e., sets that bit in the result) if both of the bits in the two operands are set. Otherwise, the result is "false". Again, a truth table can be written:
| A | B | A AND B |
|---|---|---------|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Assuming again that FontStyle.Bold has the value 2 and FontStyle.Underline has the value 4, if you AND them together, you get 0. This is because the two values have no set bits in common (binary 010 and 100), so every bit of the result is 0. The net result is that you don't get any font styles, which is precisely why it doesn't work when you write FontStyle.Bold And FontStyle.Underline.
In VB, a bitwise OR operation is performed using the Or operator. The And operator performs a bitwise AND operation. So in order to do a bitwise inclusion of values, which is how you combine bit flags, you use the Or operator.
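The same arithmetic can be tried outside VB. The sketch below uses Python's enum.IntFlag with a made-up Style enumeration. The values 2 and 4 are the assumed ones from the explanation above and are not claimed to be the real FontStyle constants.

```python
from enum import IntFlag

class Style(IntFlag):
    # Hypothetical flag values that follow the example in the text.
    BOLD = 2        # binary 010
    UNDERLINE = 4   # binary 100

combined = Style.BOLD | Style.UNDERLINE   # bitwise OR keeps every set bit
masked = Style.BOLD & Style.UNDERLINE     # bitwise AND keeps only shared bits

print(int(combined))           # 6
print(int(masked))             # 0
print(Style.BOLD in combined)  # True
```

AND is still useful with flags, but for testing rather than combining: `combined & Style.UNDERLINE` is non-zero exactly when the underline bit is set.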
Try this:
lblArt.Font = New Drawing.Font("Tahoma", _
18, _
FontStyle.Bold Or FontStyle.Italic)
Use "New Drawing.Font" instead of Font alone.
I have this grammar:
agent
= nil
| #
| id
| act . agent
| agent + agent
| agent "|" agent
| agent \ restriction
| agent [relabeling]
| agent where agent_frame end
| automation
| (agent)
where the priorities are:
"where" < "+" < "|" < "\" < "." < "[" < "nil", "#"
I need to eliminate the left recursion while respecting the priorities (and write it all in JavaCC).
Can you help me eliminate the recursion?
Dinesh, thank you for the answer.
Your solution gives me a conflict in JavaCC with (agent-postfix)*.
I solved it this way:
agent=agent2 agent'
agent'= "where" agent_frame "end" agent' | epsilon
agent2= agent3 agent2'
agent2'= "+" agent3 agent2' | epsilon
agent3= agent4 agent3'
agent3'= "|" agent4 agent3' | epsilon
agent4 = agent5 agent4'
agent4'= "\" restriction agent4' | epsilon
agent5 = act "." agent | agent6
agent6 = agent7 agent6'
agent6'= "[" relabeling "]" agent6' | epsilon
agent7= id | automaton | "(" agent ")" | "nil" | "#"
but I don't know if this solution is correct.
Thank you very much.
Regards
Domenico
I'm no JavaCC expert, but here is how you can get started to get rid of your left recursion:
agent-primary
= nil
| #
| id
| act . agent
| automation
| (agent)
agent-postfix
= + agent
| "|" agent
| \ restriction
| [relabeling]
| where agent_frame end
agent
= agent-primary (agent-postfix)*
You might face some conflicts with the right agent calls as well in the "binary" expressions such as agent + agent.
In any case, your grammar looks very similar to arithmetic expressions, so I advise you to have a look at how these are typically handled in JavaCC.
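For reference, the textbook rewrite of immediate left recursion (A -> A a | b becomes A -> b A' and A' -> a A' | epsilon) can be sketched as follows. This is a generic illustration, not JavaCC code. The function name and the list-of-symbols representation are assumptions, and indirect left recursion is not handled.

```python
def eliminate_left_recursion(name, alternatives):
    """Rewrite  A -> A a1 | ... | b1 | ...  as
         A  -> b1 A' | ...
         A' -> a1 A' | ... | epsilon
    Each alternative is a list of symbols; [] stands for epsilon."""
    tail = name + "'"
    # Alternatives that start with the rule's own name, with that name removed.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    others = [alt for alt in alternatives if not alt or alt[0] != name]
    if not recursive:
        return {name: alternatives}
    return {
        name: [alt + [tail] for alt in others],
        tail: [alt + [tail] for alt in recursive] + [[]],
    }

# One precedence level: agent2 -> agent2 "+" agent3 | agent3
rules = eliminate_left_recursion("agent2", [["agent2", "+", "agent3"], ["agent3"]])
for head, alts in rules.items():
    print(head, "=", " | ".join(" ".join(alt) or "epsilon" for alt in alts))
```

Applied once per priority level, lowest priority first, this gives one pair of rules per operator, which is the usual layering for expression grammars.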
I need help with constructing a left-linear and right-linear grammar for the languages below?
a) (0+1)*00(0+1)*
b) 0*(1(0+1))*
c) (((01+10)*11)*00)*
For a) I have the following:
Left-linear
S --> B00 | S11
B --> B0|B1|011
Right-linear
S --> 00B | 11S
B --> 0B|1B|0|1
Is this correct? I need help with b & c.
Constructing an equivalent Regular Grammar from a Regular Expression
First, I start with some simple rules for constructing a Regular Grammar (RG) from a Regular Expression (RE).
I am writing the rules for a Right Linear Grammar (writing similar rules for a Left Linear Grammar is left as an exercise).
NOTE: Capital letters are used for variables and small letters for terminals in the grammar. The NULL symbol is ^. The term 'any number' means zero or more times, that is, * (star closure).
[BASIC IDEA]
SINGLE TERMINAL: If the RE is simply e (e being any terminal), the grammar G with the single production rule S --> e (where S is the start symbol) is an equivalent RG.
UNION OPERATION: If the RE is of the form e + f, where both e and f are terminals, G with the two production rules S --> e | f is an equivalent RG.
CONCATENATION: If the RE is of the form ef, where both e and f are terminals, G with the two production rules S --> eA, A --> f is an equivalent RG.
STAR CLOSURE: If the RE is of the form e*, where e is a terminal and * is the Kleene star closure operation, G with the two production rules S --> eS | ^ is an equivalent RG.
PLUS CLOSURE: If the RE is of the form e+, where e is a terminal and + is the Kleene plus closure operation, G with the two production rules S --> eS | e is an equivalent RG.
STAR CLOSURE ON UNION: If the RE is of the form (e + f)*, where both e and f are terminals, G with the three production rules S --> eS | fS | ^ is an equivalent RG.
PLUS CLOSURE ON UNION: If the RE is of the form (e + f)+, where both e and f are terminals, G with the four production rules S --> eS | fS | e | f is an equivalent RG.
STAR CLOSURE ON CONCATENATION: If the RE is of the form (ef)*, where both e and f are terminals, G with the three production rules S --> eA | ^, A --> fS is an equivalent RG.
PLUS CLOSURE ON CONCATENATION: If the RE is of the form (ef)+, where both e and f are terminals, G with the three production rules S --> eA, A --> fS | f is an equivalent RG.
Be sure that you understand all of the above rules. Here is the summary table:
+-------------------------------+--------------------+------------------------+
| TYPE | REGULAR-EXPRESSION | RIGHT-LINEAR-GRAMMAR |
+-------------------------------+--------------------+------------------------+
| SINGLE TERMINAL | e | S --> e |
| UNION OPERATION | e + f | S --> e | f |
| CONCATENATION | ef | S --> eA, A --> f |
| STAR CLOSURE | e* | S --> eS | ^ |
| PLUS CLOSURE | e+ | S --> eS | e |
| STAR CLOSURE ON UNION | (e + f)* | S --> eS | fS | ^ |
| PLUS CLOSURE ON UNION | (e + f)+ | S --> eS | fS | e | f |
| STAR CLOSURE ON CONCATENATION | (ef)* | S --> eA | ^, A --> fS |
| PLUS CLOSURE ON CONCATENATION | (ef)+ | S --> eA, A --> fS | f |
+-------------------------------+--------------------+------------------------+
Note: the symbols e and f are terminals, ^ is the NULL symbol, and S is the start variable.
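One way to gain confidence in rules like these is to enumerate every short string a grammar derives and compare the result with the regular expression. The sketch below does that for two rows of the table. The helper names generate and matches and the (terminals, next variable) encoding of productions are assumptions of this example. Note that the union written + in the text is written | in Python's re syntax.

```python
import re
from itertools import product

def generate(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.
    grammar maps a variable to a list of (terminals, next_variable or None)."""
    found = set()
    seen = set()
    stack = [("", start)]
    while stack:
        prefix, var = stack.pop()
        if (prefix, var) in seen:
            continue
        seen.add((prefix, var))
        for terminals, nxt in grammar[var]:
            text = prefix + terminals
            if len(text) > max_len:
                continue
            if nxt is None:
                found.add(text)          # derivation finished
            else:
                stack.append((text, nxt))
    return found

def matches(pattern, alphabet, max_len):
    """Every string over alphabet of length <= max_len that fully matches pattern."""
    strings = ("".join(p) for n in range(max_len + 1)
               for p in product(alphabet, repeat=n))
    return {s for s in strings if re.fullmatch(pattern, s)}

# (ef)*    with  S --> eA | ^,  A --> fS
star_concat = {"S": [("e", "A"), ("", None)], "A": [("f", "S")]}
# (e + f)+ with  S --> eS | fS | e | f
plus_union = {"S": [("e", "S"), ("f", "S"), ("e", None), ("f", None)]}

print(generate(star_concat, "S", 6) == matches(r"(ef)*", "ef", 6))
print(generate(plus_union, "S", 5) == matches(r"(e|f)+", "ef", 5))
```

Agreement up to a length bound is evidence, not a proof, but a mismatch immediately yields a counterexample string.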
[ANSWER]
Now we can come to your problem.
a) (0+1)*00(0+1)*
Language description: All strings of 0s and 1s that contain at least one pair of consecutive 0s (00).
Right Linear Grammar:
S --> 0S | 1S | 00A
A --> 0A | 1A | ^
A string can start with any string of 0s and 1s, which is why the rules S --> 0S | 1S are included. Because at least one pair 00 is required, S has no null production. S --> 00A is included because any 0s and 1s may follow the 00; the symbol A takes care of the 0s and 1s after the 00.
Left Linear Grammar:
S --> S0 | S1 | A00
A --> A0 | A1 | ^
b) 0*(1(0+1))*
Language description: Any number of 0s, followed by any number of repetitions of 10 and 11.
{ because 1(0 + 1) = 10 + 11 }
Right Linear Grammar:
S --> 0S | A | ^
A --> 1B
B --> 0A | 1A | 0 | 1
The string starts with any number of 0s, so the rules S --> 0S | ^ are included; the rules A --> 1B and B --> 0A | 1A | 0 | 1 then generate 10 and 11 any number of times.
An alternative right linear grammar is
S --> 0S | A | ^
A --> 10A | 11A | 10 | 11
Left Linear Grammar:
S --> A | ^
A --> A10 | A11 | B
B --> B0 | ^
An alternative form can be
S --> S10 | S11 | B | ^
B --> B0 | 0
c) (((01+10)*11)*00)*
Language description: First, the language contains the null (^) string, because there is a * (star) around everything inside the outermost (). Also, if a string in the language is not null, it definitely ends with 00. One can simply think of this regular expression in the form ( ( (A)* B )* C )*, where (A)* is (01 + 10)*, that is, any number of repetitions of 01 and 10.
If there is an instance of A in the string, there is definitely a B after it, because of (A)*B, and B is 11.
Some example strings { ^, 00, 0000, 000000, 1100, 111100, 1100111100, 011100, 101100, 01110000, 01101100, 0101011010101100, 101001110001101100 ....}
Left Linear Grammar:
S --> A00 | ^
A --> B11 | S
B --> B01 | B10 | A
S --> A00 | ^ because any string is either null or, if it is not null, ends with 00. When the string ends with 00, the variable A matches the pattern ((01 + 10)*11)*. Again, this pattern can either be null or must end with 11. If it is null, A falls back to S again, i.e. the string ends with a pattern like (00)*. If the pattern is not null, B matches (01 + 10)*. When B has matched all it can, A starts matching the string again. This closes the outermost * in ((01 + 10)*11)*.
Right Linear Grammar:
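S --> A | 00S | ^
A --> 01A | 10A | 11B
B --> A | 00S
After a 11 the string must not stop, since every non-null string ends with 00: B therefore forces either another (01 + 10)*11 block or the closing 00.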
Second part of your question:
For a) I have the following:
Left-linear
S --> B00 | S11
B --> B0|B1|011
Right-linear
S --> 00B | 11S
B --> 0B|1B|0|1
(answer)
Your solutions are wrong for the following reasons:
The left-linear grammar is wrong because the string 0010 cannot be generated.
The right-linear grammar is wrong because the string 1000 cannot be generated, although both strings are in the language generated by the regular expression of question (a).
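The claim about 1000 can be checked mechanically. Below is a small membership test for right-linear grammars, written as a sketch. The function accepts and the dictionary encoding are assumptions of this example, and the test only terminates for grammars without cycles of productions that consume no input. The left-linear case is not covered here.

```python
from functools import lru_cache

def accepts(grammar, start, text):
    """Membership test for a right-linear grammar.
    grammar maps a variable to (terminals, next_variable or None) pairs."""
    @lru_cache(maxsize=None)
    def derive(var, pos):
        for terminals, nxt in grammar[var]:
            if not text.startswith(terminals, pos):
                continue
            end = pos + len(terminals)
            if nxt is None:
                if end == len(text):     # the whole input must be consumed
                    return True
            elif derive(nxt, end):
                return True
        return False
    return derive(start, 0)

# The asker's right-linear attempt: S --> 00B | 11S,  B --> 0B | 1B | 0 | 1
attempt = {"S": [("00", "B"), ("11", "S")],
           "B": [("0", "B"), ("1", "B"), ("0", None), ("1", None)]}
# The grammar given above for (a): S --> 0S | 1S | 00A,  A --> 0A | 1A | ^
answer = {"S": [("0", "S"), ("1", "S"), ("00", "A")],
          "A": [("0", "A"), ("1", "A"), ("", None)]}

print(accepts(attempt, "S", "1000"))  # False
print(accepts(answer, "S", "1000"))   # True
```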
EDIT
Adding DFAs for each regular expression, in case they are helpful.
a) (0+1)*00(0+1)*
b) 0*(1(0+1))*
c) (((01+10)*11)*00)*
Drawing a DFA for this regular expression is tricky and complex.
That is why I wanted to add the DFAs.
To simplify the task, think about the form of the RE:
to me, the RE (((01+10)*11)*00)* looks like (a*b)*
(((01+10)*11)* 00 )*
( a* b )*
In fact, in the above expression a is itself of the form (a*b)*,
that is, ((01+10)*11)*
RE (a*b)* is equal to (a + b)*b + ^. The DFA for (a*b)* is as below:
DFA for ((01+10)*11)* is:
DFA for (((01+10)*11)* 00 )* is:
Try to find the similarity in the construction of the above three DFAs. Don't move ahead until you understand the first one.
Rules to convert regular expressions to left or right linear regular grammar
This is a follow up question from Grammar: difference between a top down and bottom up?
I understand from that question that:
the grammar itself isn't top-down or bottom-up, the parser is
there are grammars that can be parsed by one but not the other
(thanks Jerry Coffin)
So for this grammar (all possible mathematical formulas):
E -> E T E
E -> (E)
E -> D
T -> + | - | * | /
D -> 0
D -> L G
G -> G G
G -> 0 | L
L -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Would this be readable by a top down and bottom up parser?
Could you say that this is a top down grammar or a bottom up grammar (or neither)?
I am asking because I have a homework question that asks:
"Write top-down and bottom-up grammars for the language consisting of all ..." (different question)
I am not sure if this can be correct since it appears that there is no such thing as a top-down and bottom-up grammar. Could anyone clarify?
That grammar is stupid, since it combines lexing and parsing into one. But OK, it's an academic example.
The thing with bottom-up and top-down is that each has special corner cases that are difficult to implement with your normal one-token lookahead. You should probably check whether the grammar has any such problems and change it.
To understand your grammar I wrote a proper EBNF
expr:
expr op expr |
'(' expr ')' |
number;
op:
'+' |
'-' |
'*' |
'/';
number:
'0' |
digit digits;
digits:
'0' |
digit |
digits digits;
digit:
'1' |
'2' |
'3' |
'4' |
'5' |
'6' |
'7' |
'8' |
'9';
I especially don't like the rule digits: digits digits. It is unclear where the first digits starts and the second ends. I would implement the rule as
digits:
'0' |
digit |
digits digit;
Another problem is number: '0' | digit digits; This conflicts with digits: '0' and digits: digit;. As a matter of fact, that is duplicated. I would change the rules to (removing digits):
number:
'0' |
digit |
digit zero_digits;
zero_digits:
zero_digit |
zero_digits zero_digit;
zero_digit:
'0' |
digit;
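This makes the number rules left recursive, and the grammar is context free. (Strictly, LR(1) means left-to-right scan, rightmost derivation, one token of lookahead, not "left recursive", but LR parsers handle left recursion well.) This is roughly what you would give to a parser generator such as bison, although the ambiguous rule expr: expr op expr would still produce conflicts there until it is disambiguated. And since bison is bottom-up, this suits a bottom-up parser.
For a top-down approach, at least for recursive descent, left recursion is a bit of a problem. You can use backtracking if you like, but for these parsers you want a right-recursive grammar that can be parsed with one token of lookahead. To do that, swap the recursion: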
zero_digits:
zero_digit |
zero_digit zero_digits;
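To show why the right-recursive form suits recursive descent, here is a minimal sketch of a recognizer for the number rules above. The function names and the convention of returning the position after the number are assumptions of this example. Each function consumes one character before it recurses, so the recursion always makes progress, which a left-recursive rule would not.

```python
DIGITS = "0123456789"

def parse_number(text, pos=0):
    """number: '0' | digit | digit zero_digits.
    Returns the position just after the number."""
    if pos >= len(text) or text[pos] not in DIGITS:
        raise SyntaxError(f"expected a digit at position {pos}")
    if text[pos] == "0":
        return pos + 1                       # number: '0'
    pos += 1                                 # leading non-zero digit
    if pos < len(text) and text[pos] in DIGITS:
        return parse_zero_digits(text, pos)  # number: digit zero_digits
    return pos                               # number: digit

def parse_zero_digits(text, pos):
    """zero_digits: zero_digit | zero_digit zero_digits (right recursive)."""
    pos += 1                                 # consume one zero_digit first
    if pos < len(text) and text[pos] in DIGITS:
        return parse_zero_digits(text, pos)  # decided with one character of lookahead
    return pos

print(parse_number("120+3"))  # 3
```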
I am not sure if that answers your question. I think the question is badly formulated and misleading; and I write parsers for a living...