Why is this grammar not LL(1)? - grammar

I am doing some homework and was asked to show that the grammar A -> aAa | ε is not LL(1). From everything I have seen, my answer so far is that it is not LL(1) because the First and the Follow sets of A both contain a. Is this correct, or am I thinking about it the wrong way?
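One way to check this reasoning is to compute the sets mechanically. The following is a minimal sketch, not a general parser generator: the dictionary encoding of the grammar and all function names are assumptions made for illustration. It computes FIRST and FOLLOW for A -> aAa | ε and intersects FIRST(aAa) with FOLLOW(A), which must be disjoint for LL(1) because A can derive ε.

```python
EPS = "ε"
END = "$"

# Grammar from the question: A -> a A a | ε (an empty tuple stands for ε).
GRAMMAR = {"A": [("a", "A", "a"), ()]}
START = "A"


def first_of_sequence(symbols, first):
    """FIRST set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable, or the sequence is empty
    return result


def compute_first():
    """Iterate to a fixed point over all productions."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    """FOLLOW sets, with END marking the end of input."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
# A is nullable, so LL(1) needs FIRST(aAa) and FOLLOW(A) to be disjoint.
CONFLICT = first_of_sequence(("a", "A", "a"), FIRST) & FOLLOW["A"]
print(FIRST["A"], FOLLOW["A"], CONFLICT)
```

A non-empty CONFLICT set means that on lookahead a the parser cannot decide between expanding A -> aAa and A -> ε.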

Related

how to skip "and" with skip rule?

I'm working on a new ANTLR grammar which is similar to Natty's and should recognize date expressions, but I have a problem with skip rules. In more detail, I want to ignore useless "and"s in expressions, for example:
Call Sam, John and Adam and fix a meeting with Sarah about the finance on Monday and Friday.
The first two "and"s are useless. I wrote the rule below to fix this problem, but it didn't work. Why? What should I do?
NW : [~WeekDay];
UselessAnd : AND NW -> skip;
"Useless AND" is a semantic concept.
Grammars are about syntax, and handle semantic issues poorly. Don't couple these together.
Suggestion: when you write a grammar for a language, make your parser accept the language as it is, warts and all. In your case, I suggest you "collect" the useless ANDs. That way you can get the grammar "right" more easily, and more transparently to the next coder who has to maintain your grammar.
Once you have the AST, it is pretty easy to ignore (semantically) useless things; if nothing else, you can post-process the AST and remove the useless AND nodes.

antlr add syntactic predicate

For the following rule :
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* switchDefaultLabel? switchCaseLabel*)
;
I got an error: "rule switchBlockLabels has non-LL decision due to recursive rule invocations reachable from alts 1,2". I tried to add a syntactic predicate to solve this problem, and I read the book "The Definitive ANTLR Reference". Now I am confused: since there are no alternatives in rule switchBlockLabels, no decision should need to be made about which one to choose.
Can anyone help me?
Whenever the tree parser encounters, say, 2 switchCaseLabels (with no switchDefaultLabel in the middle), it does not know which switchCaseLabel* they belong to. There are 3 possibilities the parser can choose from:
2 switchCaseLabels are matched by the 1st switchCaseLabel*;
2 switchCaseLabels are matched by the 2nd switchCaseLabel*;
1 switchCaseLabel is matched by the 1st switchCaseLabel*, and one by the 2nd switchCaseLabel*.
and since the parser does not like to choose for you, it emits an error.
You need to do something like this instead:
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* (switchDefaultLabel switchCaseLabel*)?)
;
That way, when there are only switchCaseLabels and no switchDefaultLabel, they are always matched by the first switchCaseLabel*: there is no ambiguity anymore.
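The difference between the two rules can be illustrated by counting how a run of n case labels (with no default label) can be divided between the two switchCaseLabel* loops. This is only a sketch of the counting argument, not of how ANTLR analyses the rule; the function names are made up for illustration.

```python
def splits_original(n):
    """Ways to divide n consecutive case labels (no default label) between
    the two switchCaseLabel* loops of the original rule, as
    (count for first loop, count for second loop)."""
    return [(k, n - k) for k in range(n + 1)]


def splits_rewritten(n):
    """In the rewritten rule the second loop is only reachable after a
    default label, so all n labels must go to the first loop."""
    return [(n, 0)]


print(splits_original(2))   # the three possibilities listed above
print(splits_rewritten(2))  # a single possibility
```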

Ambiguous grammar?

hi
There is this question in the book:
Given this grammar
A --> AA | (A) | epsilon
a- what does it generate?
b- show that it is ambiguous
The answers I came up with are:
a- adjacent parentheses
b- it generates different parse trees, so it is ambiguous, and I made a drawing showing two scenarios.
Is this right, or is there a better answer?
a is almost correct.
The grammar does generate sequences such as (), ()(), ()()(), …
But due to the second rule it can also generate (()), ()((())), etc.
b is not correct.
This grammar is ambiguous due to immediate left recursion: A → AA.
How to avoid left recursion: one, two.
a) Nearly right...
This grammar generates exactly the set of strings composed of balanced parentheses. To see why, let's sketch a quick proof.
First: everything your grammar produces is a balanced parenthesis string. Why? Simple induction:
Epsilon is a balanced (empty) parenthesis string.
If A is a balanced parenthesis string, then (A) is also balanced.
If A1 and A2 are balanced, so is A1A2 (I'm using two different identifiers just to make explicit the fact that A -> AA doesn't necessarily produce the same string for each A).
Second: every balanced string is produced by your grammar. Let's do it by induction on the size of the string.
If the string is zero-sized, it must be epsilon.
If not, let N be the size of the string and M the length of the shortest non-empty prefix that is balanced (note that the rest of the string is also balanced):
If M = N, then the string has the form (X) with X balanced, so you can produce it with A -> (A).
If M < N, then you can produce it with A -> AA, the first M characters with the first A and the last N - M with the second A.
In either case, the strings that remain to be produced are shorter than N characters, so by induction you can do that. QED.
For example: (()())(())
We can generate this string using exactly the idea of the demonstration.
A -> AA -> (A)A -> (AA)A -> ((A)(A))A -> (()())A -> (()())(A) -> (()())((A)) -> (()())(())
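The construction used in the proof can be written down as a small recursive procedure. The sketch below is only an illustration of that argument; the tree encoding ("eps", "paren", "cat") and the function names are assumptions, not part of the original answer.

```python
def shortest_balanced_prefix(s):
    """Length of the shortest non-empty balanced prefix of s."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            raise ValueError("only parentheses are allowed")
        if depth < 0:
            raise ValueError("unbalanced string")
        if depth == 0:
            return i + 1
    raise ValueError("unbalanced string")


def parse(s):
    """Build a parse tree for s with A -> AA | (A) | epsilon,
    choosing productions the same way the proof does."""
    if s == "":
        return "eps"                            # A -> epsilon
    m = shortest_balanced_prefix(s)
    if m == len(s):
        return ("paren", parse(s[1:-1]))        # A -> (A)
    return ("cat", parse(s[:m]), parse(s[m:]))  # A -> AA


def unparse(tree):
    """Read the generated string back off a parse tree."""
    if tree == "eps":
        return ""
    if tree[0] == "paren":
        return "(" + unparse(tree[1]) + ")"
    return unparse(tree[1]) + unparse(tree[2])


tree = parse("(()())(())")
print(tree)
print(unparse(tree))
```

Each recursive call works on a strictly shorter string, which is the induction step of the proof.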
b) Of course, left and right recursion together are enough to say it's ambiguous, but to see why this grammar specifically is ambiguous, follow the same idea as in the proof:
It is ambiguous because you don't need to take the shortest balanced prefix. You could take the longest balanced prefix (or in general any balanced prefix) that is not the whole string, and the proof (and generation) would follow the same process.
Ex: (())()()
You can choose A -> AA and generate with the first A either the (()) substring or the (())() substring.
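To make the choice explicit, the following sketch lists every position at which a balanced string can be cut into two non-empty balanced parts; each such position gives a different way to apply A -> AA at the top of the derivation. The function name is hypothetical, and the input is assumed to be balanced already.

```python
def balanced_split_points(s):
    """Positions m with 0 < m < len(s) such that s[:m] is balanced.
    Assumes s itself is a balanced parenthesis string."""
    points = []
    depth = 0
    for i, ch in enumerate(s[:-1]):  # skip the last character: m < len(s)
        depth += 1 if ch == "(" else -1
        if depth == 0:
            points.append(i + 1)
    return points


s = "(())()()"
for m in balanced_split_points(s):
    print(s[:m], "+", s[m:])
```

More than one split point means more than one parse tree for the same string.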
Yes, you are right.
That is what an ambiguous grammar means.
The problem with ambiguous grammars is that if you are writing a compiler and you want to identify each token in a certain line of code (or something like that), then the ambiguity will get in your way, as you will have "two explanations" for that line of code.
It sounds like your approach for part B is correct, showing two independent derivations for the same string in the languages defined by the grammar.
However, I think your answer to part A needs a little work. Clearly you can use the second clause recursively to obtain strings like (((((epsilon))))), but there are other types of derivations possible using the first clause and second clause together.

Handling Grammar / Spelling Issues in Translation Strings

We are currently implementing a Zend Framework project that needs to be translated into 6 different languages. We already have a pretty sophisticated translation system, based on Zend_Translate, which also handles variables in translation keys.
Our project has a new Turkish translator, and we are facing a new issue: grammar, especially Turkish grammar. I noticed that this problem might arise in every translation system and in most languages, so I posted a question here.
Question: Any ideas how to handle translations like:
Key: I have a[n] {fruit}
Variables: apple, banana
Result: I have an apple. I have a banana.
Key: Stimme für {user}[s] Einsendung
Variables: Paul, Markus
Result: Stimme für Pauls Einsendung,
Result: Stimme für Markus Einsendung
Does anybody have a solution or idea for this? My only guess would be to avoid this by not using translations where these issues occur.
How do other platforms handle this?
Of course, the translation system has no idea which type of word it is placing where in which type of sentence. It only does some string replacements...
PS: Turkish is even more complicated:
For example, on a profile page, we have "Annie's Network". This should translate as "Annie'nin Ağı".
If the first name ends in a vowel, the suffix will start with an n and look like "Annie'nin"
If the first name ends in a consonant, it will not have the first n, and look like "Kris'in"
If the last vowel is an a or ı, it will look like "Dan'ın"; or "Seyma'nın"
If the last vowel is an o or u, it will look like "Davud'un"; or "Burcu'nun"
If the last vowel is an e or i, it will look like "Erin'in"; or "Efe'nin"
If the last vowel is an ö or ü, it will look like "Göz'ün"; or "Iminönü'nün"
If the last letter is a k (like the name "Basak"), it will look like "Basağın"; or "Eriğin"
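The vowel rules listed above are regular enough to sketch as a function. The following is a rough illustration that works purely on spelling and covers only the vowel-harmony and leading-n rules; the k softening rule from the last bullet is deliberately left out, and the function and constant names are hypothetical.

```python
# Vowel classes taken from the rules above (spelling only, both letter cases).
BACK_UNROUNDED = "aıAI"
FRONT_UNROUNDED = "eiEİ"
BACK_ROUNDED = "ouOU"
FRONT_ROUNDED = "öüÖÜ"
VOWELS = BACK_UNROUNDED + FRONT_UNROUNDED + BACK_ROUNDED + FRONT_ROUNDED


def genitive(name):
    """Attach the possessive suffix described above to a name.
    Does NOT handle the final-k rule (e.g. names like "Basak")."""
    last_vowel = next((ch for ch in reversed(name) if ch in VOWELS), None)
    if last_vowel is None:
        raise ValueError("name contains no vowel")
    if last_vowel in BACK_UNROUNDED:
        suffix_vowel = "ı"
    elif last_vowel in BACK_ROUNDED:
        suffix_vowel = "u"
    elif last_vowel in FRONT_UNROUNDED:
        suffix_vowel = "i"
    else:
        suffix_vowel = "ü"
    # A name ending in a vowel gets a leading n before the suffix vowel.
    leading_n = "n" if name[-1] in VOWELS else ""
    return f"{name}'{leading_n}{suffix_vowel}n"


for example in ["Annie", "Kris", "Dan", "Burcu", "Göz"]:
    print(genitive(example))
```

Rules like these live in code rather than in the translation string, which is why a plain string-replacement system cannot express them.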
It is actually a very hard problem, as grammar rules differ even among languages from the same family. I don't think you could easily do anything for, let's say, Slavic languages...
However, if you want to solve this problem (because it is extra challenging) and you are looking for creative (cross-inspiring) ways to do that, you might want to look into something called ChoiceFormat (one example is the one from the ICU project), or you can look up GNU Gettext's solution for the plural forms problem.
ICU (mentioned above) has a SelectFormat (http://site.icu-project.org/design/formatting/select) that may be of help: it's like a choice format but with arbitrary keywords. It also has a PluralFormat, which already has plural rules for many languages.

Writing an ANTLR action "in between" multiplicity

I'm working on an ANTLR grammar that looks like...
A : B+;
...and I'd like to be able to perform an action before and after each instance of B. For example, I'd like something like...
A : A {out("Before");} B {out("After");}
| {out("Before");} B {out("After");};
So that on the input stream A B B I would see the output...
Before
After
Before
After
Of course the second example isn't valid ANTLR syntax because of the left recursive rule. Is there a way to accomplish what I want with proper ANTLR syntax?
I should also mention that there are other ways of reaching the B rule so simply surrounding the B rule with before and after won't work.
Doesn't something like
A : ({out("Before");} B {out("After");})+;
work?