We are learning about ambiguity in class, and the following grammar was given as an example of an ambiguous grammar. I just am not seeing how it is ambiguous. Is there a set pattern or method people use to determine ambiguity, or is it like a logic puzzle where you have to work through combinations until you find an ambiguous sentence in the grammar? The examples I've read online mostly give the ambiguous sentence already, but how do you find that sentence in the first place? I'd appreciate any help, thank you.
<stmt_list> ==> <stmt>
              | <stmt> ; <stmt_list>
<var> ==> A | B | C
<stmt> ==> <var> + <var>
         | <var> - <var>
         | <var>
In general, determining whether a grammar is ambiguous is undecidable. So yes, finding an ambiguous sentence in a grammar amounts to a very difficult logic puzzle. Solving specific cases and finding heuristics is an active area of research, though. Here is a tool that is fairly good at finding ambiguity: http://www.brics.dk/grammar/. The webpage includes a link to a paper explaining how it works, though honestly, that goes over my head.
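For a grammar this small, one practical heuristic is simply to brute-force it: enumerate all leftmost derivations up to some depth and flag any sentence produced by two different derivations (two distinct leftmost derivations of the same sentence is exactly what ambiguity means). Here is a minimal sketch in Python; the dict encoding of the grammar and the depth limit are my own choices, and a search like this can only ever find ambiguity, never prove its absence:

GRAMMAR = {
    "stmt_list": [["stmt"], ["stmt", ";", "stmt_list"]],
    "stmt": [["var", "+", "var"], ["var", "-", "var"], ["var"]],
    "var": [["A"], ["B"], ["C"]],
}

def derive(form, trace, depth, out):
    # Find the leftmost nonterminal; if none, 'form' is a finished sentence.
    nt = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
    if nt is None:
        out.setdefault(" ".join(form), set()).add(trace)
        return
    if depth == 0:
        return
    for r, rule in enumerate(GRAMMAR[form[nt]]):
        derive(form[:nt] + rule + form[nt + 1:],
               trace + ((form[nt], r),), depth - 1, out)

sentences = {}
derive(["stmt_list"], (), 6, sentences)
ambiguous = {s for s, d in sentences.items() if len(d) > 1}
print(ambiguous or "no ambiguous sentence found up to this depth")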
I'm proving the big O runtime of an algorithm for an assignment but am unfortunately quite rusty when it comes to logs. Currently, I have:
(log(n))^q <= log(log(n))
I am trying to isolate q in terms of n (where I'm hoping n will cancel out). Can someone please explain to me how to do this (and not just provide an answer)? Thanks!
This would have been prettier on Math Stack Exchange (where we can use LaTeX), but you can just take the log of both sides to bring the exponent q down (since log(x^k) = k log(x) is a property of logs over the reals):
q log(log(n)) <= log(log(log(n)))
Now, assuming n is large enough that log(log(n)) > 0 (so the direction of the inequality is preserved), you can divide both sides by it to isolate q:
q <= log(log(log(n)))/log(log(n))
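If you want a quick numeric sanity check of where that bound sits, here is a throwaway snippet (assuming natural logs; the choice of n values is arbitrary, and all of them keep log(log(n)) positive):

import math

for n in (10**3, 10**6, 10**9, 10**12):
    bound = math.log(math.log(math.log(n))) / math.log(math.log(n))
    print(n, bound)

Note how slowly the bound grows; for these n it stays well below 1.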
I am writing a simple compiler as school work. I am looking for an automated approach to generating both positive and negative test data for my compiler, given the formal grammar and other specifications. The language I am dealing with is of moderate size, with 38 or so non-terminals. For the sake of illustration, here is a snapshot of the grammar:
program: const_decl* declaration* ENDMARKER
# statement
stmt: flow_stmt | '{' stmt* '}' | NAME [stmt_trailer] ';' | ';'
stmt_trailer: arglist | ['[' expr ']'] '=' expr
flow_stmt: if_stmt | for_stmt | while_stmt | read_stmt ';' | write_stmt ';' | return_stmt ';'
return_stmt: 'return' ['(' expr ')']
if_stmt: 'if' '(' condition ')' stmt ['else' stmt]
condition: expr ('<'|'<='|'>'|'>='|'!='|'==') expr | expr
for_stmt: ('for' '(' NAME '=' expr ';' condition ';'
NAME '=' NAME ('+'|'-') NUMBER ')' stmt)
Are there any tools to generate input files with the help of the grammar? Hand-written tests are too tedious to write and too weak to discover problems. Here is an example of this language:
void main() {
int N;
int temp;
int i, j;
int array_size;
reset_heap;
scanf(N);
for (i = 0; i < N; i = i + 1) {
scanf(array_size);
if (array_size > max_heap_size) {
printf("array_size exceeds max_heap_size");
} else {
for (j = 0; j < array_size; j = j + 1) {
scanf(temp);
heap[j] = temp;
}
heap_sort(array_size);
print_heap(array_size);
}
}
}
Generating controllable test data automatically can save days of work. Given the simplicity of the language, there must be some way to do this effectively. Any pointers and insights are greatly appreciated.
This should have the subtopic of "How to avoid combinatorial explosion when generating test data".
While I would not be surprised if there are tools to do this, having had the same need to generate test data for grammars, I have created a few one-off applications.
One of the best series of articles I have found on this is by Eric Lippert, Every Binary Tree There Is; think of BNF converted to binary operators and then to an AST as you read the tree. However, he uses Catalan trees (every branch has two leaves), whereas when I wrote my app I preferred Motzkin trees (a branch can have one or two leaves).
Also, he did his in C# with LINQ, and I did mine in Prolog using DCGs.
Generating the data based on the BNF or DCG is not hard; the real trick is to limit the area of expansion and the size of the expansion, and to inject bad data.
By area of expansion: let's say you want to test nested if statements three levels deep, but the code must still be valid and compile. Obviously you need the boilerplate code to make it compile, and then you start varying the deeply nested if by adding or removing the else clause. So you need to put in constraints so that the boilerplate code stays constant and the part under test varies.
By size of expansion: let's say you want to test conditional expressions. You can easily calculate that if you have many operators and want to test them in all combinations, you soon run into combinatorial explosion. The trick is to ensure you test deeply enough and with enough breadth, but not every combination. Again, the judicious use of constraints helps, as in the sketch below.
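To make that concrete, here is a minimal sketch of such a generator in Python; the toy grammar, the dict encoding and the max_depth constraint are all illustrative, not taken from any particular tool:

import random

GRAMMAR = {
    "stmt": [["if_stmt"], ["assign"], [";"]],
    "if_stmt": [["if", "(", "cond", ")", "stmt"],
                ["if", "(", "cond", ")", "stmt", "else", "stmt"]],
    "cond": [["NAME", "<", "NUMBER"]],
    "assign": [["NAME", "=", "NUMBER", ";"]],
}

def generate(symbol, depth, max_depth=4):
    # Terminals are returned directly (with placeholder lexemes).
    if symbol not in GRAMMAR:
        return {"NAME": "x", "NUMBER": "1"}.get(symbol, symbol)
    alts = GRAMMAR[symbol]
    # The size-of-expansion constraint: past max_depth, force the
    # shortest alternative so the sentence is driven to completion.
    if depth >= max_depth:
        alts = [min(alts, key=len)]
    return " ".join(generate(s, depth + 1, max_depth)
                    for s in random.choice(alts))

print(generate("stmt", 0))

Injecting bad data is then a matter of adding deliberately broken alternatives (say, an if_stmt with a missing parenthesis) that are marked so the expected compiler error is known.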
So the point of all of this is that you start with a tool that takes in the BNF and generates valid code. Then you modify the BNF to add constraints and modify the generator to understand the constraints to generate the code examples.
Then you modify the BNF for invalid data and likewise the generator to understand those rules.
After that is working you can then start layering on levels of automation.
If you do go this route and decide that you will have to learn Prolog, take a look at Mercury first. I have not done this with Mercury, but if I do it again Mercury is high on the list.
While my actual code is not public, this and this are the closest things to it that are public.
Along the way I had some fun with it in Code Golf.
When generating terminals such as reserved words or values for types, you can use predefined lists with both valid and invalid data, e.g. for if, if the language is case-sensitive, I would include if, If, IF, iF, etc. in the list. For value types such as unsigned byte I would include -1, 0, 255 and 256.
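For example, such lists might look like this (a sketch; the dict layout and names are my own convention):

TERMINALS = {
    "if_keyword": {"valid": ["if"], "invalid": ["If", "IF", "iF"]},
    "unsigned_byte": {"valid": [0, 255], "invalid": [-1, 256]},
}

def cases(kind, valid=True):
    return TERMINALS[kind]["valid" if valid else "invalid"]

print(cases("unsigned_byte", valid=False))  # -> [-1, 256]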
When I was testing basic binary math expressions with +, -, * and ^, I generated all the tests using the five small numbers -2, -1, 0, 1 and 2. I thought it would be useless since I already had hundreds of test cases, but since it only took a few minutes to generate all of the test cases and several hours to run them, to my surprise it found a pattern I had not covered. The point here is that, contrary to what most people say about having too many test cases, it is only computer time; changing a few constraints gives you the large number of tests, so do run them.
Is there a better way to write this condition:
if ( x > 3 || y > 3 || z > 3 ) {
...
}
I was thinking of some bitwise operation but couldn't find anything.
I searched on Google, but it is hard to find anything related to this kind of basic question.
Thanks!
Edit
I was thinking of programming in general. Would it be different according to a specific language, such as C/C++ or Java?
What you have is good. Assuming C/C++ or Java:
The intent is clear.
Short-circuit evaluation means the whole expression is known to be true as soon as any one part is true.
Following on from that 2nd point: if any of x, y or z is more likely to be > 3, put it to the left of the expression so it is evaluated first, which means the others may not need to be evaluated at all.
For argument's sake, if you must have a bitwise check, (x|y|z) > 3 works for non-negative values, but it normally won't be reduced by the compiler, so it's (probably*) always two bitwise ops and a compare, where the other way could be as fast as one compare.
(* This is where the language lawyers arrive and add comments on why this edit is wrong and how the bitwise version can be optimised. ;-)
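To make the caveat concrete, here is a quick sketch (Python used for illustration; the same point holds in C/C++/Java with signed ints):

def logical(x, y, z):
    return x > 3 or y > 3 or z > 3

def bitwise(x, y, z):
    return (x | y | z) > 3

print(logical(1, 2, 1), bitwise(1, 2, 1))    # False False (low bits OR to <= 3)
print(logical(4, 0, 0), bitwise(4, 0, 0))    # True True
print(logical(4, -1, 0), bitwise(4, -1, 0))  # True False (negatives break the trick)

The two agree only for non-negative values, which is another reason to stick with the plain logical version.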
There was a comment here (now deleted) along the lines of "a new programmer shouldn't worry about this level of optimisation", and it was 100% correct. Write easy-to-follow, working code, and THEN try to squeeze performance out of it AFTER you know it is "too slow".
I have a problem with the recognition of languages. Given a certain language, for example { a^n c b^2n | n > 0 }, how do I quickly determine which Chomsky type it belongs to?
My idea was to find a grammar that generates it and classify the language from the grammar, but that is a long process. I think there should be a way to recognize the type by eye,
without writing grammars or automata.
Can someone help me?
Unfortunately, associating an arbitrary language with a level of the Chomsky hierarchy is, in the general case, undecidable. (See Rice's Theorem.)
Of course, it is easy to categorize a given grammar, since the Chomsky hierarchy is defined by a simple syntactic analysis of the grammar itself. However, languages do not have unique grammars; the existence of (for example) a Type 2 (context-free) grammar for a language does not mean that there is not a Type 3 (regular) grammar for the same language.
So there are no shortcuts.
However, there is a lot to be said for experience. The language { a^n c b^2n | n > 0 } is context-free (and not regular), as are all languages of similar form. That it is context-free is demonstrated by the grammar
L → a c b b
L → a L b b
and the fact that it is not regular can be proven using the pumping lemma for regular languages. (The linked Wikipedia article contains, as an example of the use of the lemma, a proof for a similar language which should be easy to adapt.)
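For reference, here is how the standard argument adapts to this language (my wording of the usual proof, in LaTeX):

Assume $L = \{ a^n c b^{2n} \mid n > 0 \}$ is regular with pumping length $p$.
Take $w = a^p c b^{2p} \in L$. The lemma splits $w = xyz$ with $|xy| \le p$ and
$|y| \ge 1$, so $y = a^k$ for some $k \ge 1$. Pumping once gives
$x y^2 z = a^{p+k} c b^{2p}$, which would need $2(p+k)$ b's to be in $L$,
a contradiction. Hence $L$ is not regular.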
On the other hand, a language which requires three equal counts ({ a^n c^n b^n | n > 0 }) is not context-free (but is context-sensitive). (That's not the same as { a^n c^(n+m) b^m | n > 0 }, which is context-free.)
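For that last language, a grammar in the same style as above (my own construction; ε is the empty string) makes the context-freeness explicit:

S → A B
A → a A c
A → a c
B → c B b
B → ε

A produces a^n c^n (n > 0) and B produces c^m b^m (m ≥ 0), so together they yield a^n c^(n+m) b^m.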
I have a grammar that represents expressions. Let's say for simplicity it's:
S -> E
E -> T + E | T
T -> P * T | P
P -> a | (E)
With a, +, *, ( and ) being the letters in my alphabet.
The above rules can generate valid arithmetic expressions containing parenthesis, multiplication and addition using proper order of operations and associativity.
My goal is to accept every string, containing 0 or more of the letters of my alphabet. Here are my constraints:
The grammar must "accept" all strings containing 0 or more letters of my alphabet.
New terminals may be introduced and inserted into the input algorithmically. (I found that explicitly providing an EOF terminal helped when detecting extra tokens beyond an otherwise valid expression, for some reason.)
New production rules may be introduced. The new rules will be error flags (i.e. if the string is parsed using one, then although the string is accepted, it is considered to semantically be an error).
The production rules may be modified so that newly-introduced terminals are inserted.
The grammar should be LALR(1), or at least handleable by Lex/Yacc-like tools. (If I recall correctly, the dangling-else problem requires non-LALR(1), in spite of being compatible with Lex/Yacc.)
Additionally, I would like the extra rules to correspond to different kinds of errors (missing arguments to a binary operation, missing parenthesis (left or right), extra token(s) beyond an otherwise acceptable expression, etc.). I say that because there may be some trivial way to "accept" all inputs that would not otherwise be beneficial for error reporting.
I've found these rules to be useful, although I do not know whether they violate my constraints:
P -> # [error]
P -> (E [error]
S -> E $ [instead of S -> E]
S -> E X $ [error]
X -> X Y [error]
X -> Y [error]
Y -> a [error]
Y -> ( [error]
Y -> ) [error]
Y -> * [error]
Y -> + [error]
where $ is the explicit EOF token and # is the empty string.
In case my question wasn't clear: How can I modify my grammar within my constraints to achieve my goal of accepting all inputs, preferably with a nice correspondence of rules to types of errors? Do my rules meet my goal?
This question has been around for a while and covers a topic that is often visited by beginners in the subject. One often finds that those who have done a compilers course in their undergraduate degree know that this is one of those questions that has no easy or single answer. You might have noticed that you have two questions on the same topic, neither of which has been answered. Another question someone else posted was answered with pointers to the literature that explains why this is hard.
It is a question that has remained extant for over 50 years. If one examines the literature over time, from early conference papers through course textbooks and doctoral theses to (today) SO, we see regular references to the fact that this is the wrong question, or rather the wrong approach to the problem!
Just taking a sample from course texts over the years (random selections from my bookshelf):
Gries, D. (1970) Error Recovery and Correction - An Introduction to the Literature, in Compiler Construction, An Advanced Course, edited by Bauer, F.L. & Eickel, J., Springer Verlag, pp.627-638.
Gries, D. (1971) Compiler Construction for Digital Computers, Wiley, pp.320-326.
Aho, A.V., Ullman, J.D. (1977) Principles of Compiler Design, Addison Wesley, pp.397-405.
Bornat, R. (1979) Understanding and Writing Compilers, Macmillan, pp.251-252.
Hanson, D. (1995) A Retargetable C Compiler: Design and Implementation, Addison-Wesley, pp.140-146.
Grune, D., Bal, H.E., Jacobs, C.J.H. & Langendoen, K.G. (2000) Modern Compiler Design, Wiley, pp.175-184.
Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D. (2007) Compilers: Principles, Techniques, & Tools, Pearson, Addison-Wesley, pp.283-296.
All of these (over a 40-year span) concur that your question reflects using the wrong tools wrongly, or going in the wrong direction. What I'm trying to say is "you can't get there from here". You should start from somewhere else.
If you want something deeper, there is a whole PhD thesis:
Charles, P. (1991) A Practical Method for Constructing Efficient LALR(k) Parsers with Automatic Error Recovery, New York University.
Hopefully, for those who visit this question again in the future, this will serve as a placeholder for the answers.
I note from comments on your other question that you are using MPPG, derived from CPPG. Not everyone will have used these, so I'm including a couple of links to these tools:
Managed Babel Systems Essentials
Garden Points Parser Generator
Irony .NET compiler Construction Kit
Writing your first Visual Studio Language Service