Is this conversion from BNF to EBNF correct? - grammar

As context, my textbook uses this style for EBNF:
Sebesta, Robert W. Concepts of Programming Languages 11th ed., Pearson, 2016, 150.
The problem:
Convert the following BNF rule with three RHSs to an EBNF rule with a single RHS.
Note: Conversion to EBNF should remove all explicit recursion and yield a single RHS EBNF rule.
A ⟶ B + A | B – A | B
My solution:
A ⟶ B [ (+ | –) A ]
My professor tells me:
"First, you should use { } instead of [ ],
Second, according to the BNF rule, <"term"> is B." (He is referring to the style guide posted above.)
Is he correct? I assume so but have read other EBNF styles and wonder if I am entitled to credit.

You were clearly asked to remove explicit recursion and your proposed solution doesn't do that; A is still defined in terms of itself. So independent of naming issues, you failed to do the requested conversion and your prof is correct to mark you down for it. The correct solution for the problem as presented, ignoring the names of non-terminals, is A ⟶ B { (+ | –) B }, using indefinite repetition ({…}) instead of optionality ([…]). With this solution, the right-hand side of the production for A only references B, so there is no recursion (at least, in this particular production).
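To see that the recursive BNF rule and the repetition-based EBNF rule describe the same strings, here is a minimal Python sketch. It assumes, purely for illustration, that B is a single token written b and that the operators are the ASCII characters + and -; the function names are hypothetical.

```python
from itertools import product


def accepts_bnf(tokens):
    """Recognize A -> B + A | B - A | B by explicit recursion."""

    def parse_a(i):
        # Every alternative starts with B (assumed here to be the token "b").
        if i >= len(tokens) or tokens[i] != "b":
            return None
        i += 1
        # If an operator follows, the rule recurses into A.
        if i < len(tokens) and tokens[i] in ("+", "-"):
            return parse_a(i + 1)
        return i

    return parse_a(0) == len(tokens)


def accepts_ebnf(tokens):
    """Recognize A -> B { (+ | -) B } with a loop instead of recursion."""
    i = 0
    if i >= len(tokens) or tokens[i] != "b":
        return False
    i += 1
    # The braces { ... } become a while loop: zero or more (operator, B) pairs.
    while i < len(tokens) and tokens[i] in ("+", "-"):
        i += 1
        if i >= len(tokens) or tokens[i] != "b":
            return False
        i += 1
    return i == len(tokens)


def agree_up_to(max_len):
    """Compare both recognizers on every token string up to max_len."""
    return all(
        accepts_bnf(s) == accepts_ebnf(s)
        for n in range(max_len + 1)
        for s in product("b+-", repeat=n)
    )
```

Calling agree_up_to with a small bound compares the two recognizers exhaustively on short inputs. Note that the loop in accepts_ebnf contains no call back into itself, which is what "removing explicit recursion" amounts to in a hand-written parser.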
Now, for naming: clearly, your textbook's EBNF style is to use angle brackets around the non-terminal names. That's a common style, and many would say that it is more readable than using single capital letters which mean nothing to a human reader. Now, I suppose your prof thinks you should have changed the name of B to <term> on the basis that that is the "textbook" name for the non-terminal representing the operand of an additive operator. The original BNF you were asked to convert does show the two additive operators. However, it makes them right-associative, which is definitely non-standard. So you might be able to construct an argument that there's no reason to assume that these operators are additive and that their operands should be called "terms" [Note 1]. But even on that basis, you should have used some name written in lower-case letters and surrounded in angle brackets. To me, that's minor compared with the first issue, but your prof may have their own criteria.
In summary, I'm afraid I have to say that I don't believe you are entitled to credit for that solution.
[Note 1] If you had actually come up with that explanation, your prof might have been justified in suggesting a change of major to Law.


Construct CFG from {w element of {a, b}* : 2#a(w)=3#b(w)}

If I have the following language: { x is an element of {a,b}*, where 2#a(x)=3#b(x) }, then is the CFG of that language this one?
S => SaSaSaSbSbS | SaSaSbSaSbS | SaSaSbSbSaS | SaSbSaSaSbS | SaSbSaSbSaS | SaSbSbSaSaS | SbSaSaSaSbS | SbSaSaSbSaS | SbSaSbSaSaS | SbSbSaSaSaS | epsilon/lambda
Is this correct? If it isn't correct, or if there is a simpler form, can you tell me what it is? I have no idea what another form would look like.
At a glance it looks like this probably works:
Your base case is good: the empty string is in the language.
You cover all your inductive cases: each production adds exactly 3 a's and 2 b's (so that 2·3 = 3·2), and you cover all 10 arrangements of those five symbols.
I am not seeing a fundamentally simpler solution than this, although you might be able to remove either the leading or the trailing S from the right-hand side of all productions; then, by choosing a production you'd be committing to that first or last terminal symbol, but I think that still works out. Possibly even removing both leading and trailing S so you commit to both the first and the last. Any other simplification seems like it would increase the number of productions or the number of nonterminals, or both, which while possibly reducing the total number of symbols needed to encode the grammar, arguably doesn't make the grammar any simpler (indeed, more nonterminals and productions is typically seen as more complicated, not less). If you want to experiment with adding productions or nonterminals, consider e.g. T => Sa and R => Sb, just to cut down on repetition.
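One way to gain some confidence in the grammar is a brute-force comparison on short strings. The sketch below is only a bounded check, not a proof: it derives every string up to a length limit from the ten productions plus the epsilon production, and separately enumerates every string over {a, b} satisfying 2#a = 3#b. The function names are hypothetical.

```python
from itertools import combinations, product


def production_bodies():
    """All 10 arrangements of three a's and two b's."""
    return [
        "".join("b" if i in b_positions else "a" for i in range(5))
        for b_positions in combinations(range(5), 2)
    ]


def generated(max_len):
    """Strings of length <= max_len derivable from S."""
    language = {""}  # S => epsilon
    changed = True
    while changed:
        changed = False
        for body in production_bodies():
            # Build S x1 S x2 S x3 S x4 S x5 S from left to right.
            partial = set(language)
            for k, terminal in enumerate(body):
                remaining = len(body) - k - 1  # terminals still to place
                partial = {
                    p + terminal + s
                    for p in partial
                    for s in language
                    if len(p) + 1 + len(s) + remaining <= max_len
                }
            if not partial <= language:
                language |= partial
                changed = True
    return language


def expected(max_len):
    """Strings over {a, b} of length <= max_len with 2#a = 3#b."""
    return {
        "".join(w)
        for n in range(max_len + 1)
        for w in product("ab", repeat=n)
        if 2 * w.count("a") == 3 * w.count("b")
    }
```

Comparing generated(10) with expected(10) checks the grammar against the language for all strings of length 0, 5 and 10. Larger bounds work the same way but grow quickly.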

ABNF rule `zero = ["0"] "0"` matches `00` but not `0`

I have the following ABNF grammar:
zero = ["0"] "0"
I would expect this to match the strings 0 and 00, but it only seems to match 00? Why?
Good question.
ABNF ("Augmented Backus-Naur Form") is defined by RFC 5234, which is the current version of a document intended to clarify a notation used (with variations) by many RFCs.
Unfortunately, while RFC 5234 exhaustively describes the syntax of ABNF, it does not provide much in the way of a clear statement of semantics. In particular, it does not specify whether ABNF alternation is unordered (as it is in the formal language definitions of BNF) or ordered (as it is in "PEG" -- Parsing Expression Grammar -- notation). Note that optionality/repetition are just types of alternation, so if you choose one convention for alternation, you'll most likely choose it for optionality and repetition as well.
The difference is important in cases like this. If alternation is ordered, then the parser will not back up to try a different alternative after some alternative succeeds. In terms of optionality, this means that if an optional element is present in the stream, the parser will never reconsider the decision to accept the optional element, even if some subsequent element cannot be matched. If you take that view, then alternation does not distribute over concatenation. ["0"]"0" is precisely ("0"/"")"0", which is different from "00"/"0". The latter expression would match a single 0 because the second alternative would be tried after the first one failed. The former expression, which you use, will not.
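The two readings of ["0"] "0" can be illustrated with a few hand-written parser combinators in Python. This is a sketch of the two semantics only; it is not how APG or any other ABNF tool is implemented, and all names are hypothetical. Each parser takes a text and a start index and returns a list of possible end positions.

```python
def lit(s):
    """Match the literal string s."""
    def parse(text, i):
        return [i + len(s)] if text.startswith(s, i) else []
    return parse


def seq(*parsers):
    """Concatenation: thread every candidate end position through each parser."""
    def parse(text, i):
        positions = [i]
        for p in parsers:
            positions = [k for j in positions for k in p(text, j)]
        return positions
    return parse


def opt_unordered(p):
    """[p] as in formal BNF: the element may be taken or skipped."""
    def parse(text, i):
        return p(text, i) + [i]
    return parse


def opt_ordered(p):
    """[p] as in PEG: if p matches, commit to it and never reconsider."""
    def parse(text, i):
        taken = p(text, i)
        return taken[:1] if taken else [i]
    return parse


def matches(parser, text):
    """True if the parser can consume the whole input."""
    return len(text) in parser(text, 0)


# zero = ["0"] "0" under each interpretation
zero_unordered = seq(opt_unordered(lit("0")), lit("0"))
zero_ordered = seq(opt_ordered(lit("0")), lit("0"))
```

With the unordered reading, matches(zero_unordered, "0") succeeds because skipping the optional element remains a live possibility. With the ordered reading, matches(zero_ordered, "0") fails: the optional "0" consumes the only character and the mandatory "0" then has nothing left to match.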
I do not believe that the authors of RFC 5234 took this view, although it would have been a lot more helpful had they made that decision explicit in the document. My only real evidence to support my belief is that the ABNF included in RFC 5234 to describe ABNF itself would fail if repetition was considered ordered. In particular, the rule for repetitions:
repetition = [repeat] element
repeat = 1*DIGIT / (*DIGIT "*" *DIGIT)
cannot match 7*"0", since the 7 will be matched by the first alternative of repeat, which will be accepted as satisfying the optional [repeat] in repetition, and element will subsequently fail.
In fact, this example (or one similar to it) was reported to the IETF as an erratum in RFC 5234, and the erratum was rejected as unnecessary, because the verifier believed that the correct parse should be produced, thus providing evidence that the official view is that ABNF is not a variant of PEG. Apparently, this view is not shared by the author of the APG parser generator (who also does not appear to document their interpretation.) The suggested erratum chose roughly the same solution as you came up with:
repeat = *DIGIT ["*" *DIGIT]
although that's not strictly speaking the same; the original repeat cannot match the empty string, but the replacement one can. (Since the only use of repeat in the grammar is optional, this doesn't make any practical difference.)
(Disclosure note: I am not a fan of PEG. So it's possible the above answer is not free of bias.)

Constructing a Follow Set

While creating a first set for a given grammar, I noticed a scenario not described in my reference for the algorithm.
Namely, how does one calculate the follow set for a nonterminal with a rule such as this?
<exp-list_tail> --> COMMA <exp> <exp-list_tail>
Expressions surrounded by <..> are nonterminals, COMMA is a terminal.
My best guess is I should just add the empty string to the follow set, but I'm not sure.
Normally, for the case of a nonterminal being at the end of a production rule, you would just compute the follow list for the left nonterminal, but you can see how this would be a problem.
To answer this properly, it would be helpful to know your entire grammar. However, here is an attempt for a general answer:
Here is the algorithm for calculating follow sets:
Init all follow sets to {}, except follow(S) for the start symbol S, which is init to {$}.
While there are changes, for each A∈V do:
  For each Y → αAβ do:
    follow(A) = follow(A) ∪ (first(β) \ {ε})
    If β ⇒* ε (this includes the case where β is empty), also do: follow(A) = follow(A) ∪ follow(Y)
Note that this is a deterministic algorithm: it will give you a single answer, depending only on your (entire) grammar.
Specifically, I don't think that this particular rule will affect <exp-list_tail>'s follow set (it could, but probably wouldn't).
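The fixed-point algorithm above can be sketched in Python as follows. Since the full grammar is not given in the question, the grammar below is an assumption made up for illustration: it embeds the <exp-list_tail> rule in a small hypothetical call syntax, with an added empty alternative for the tail. Nonterminals are the dictionary keys, every other symbol is a terminal, and an empty tuple stands for an ε production.

```python
EPS = "ε"
END = "$"


def first_of_sequence(symbols, first, nonterminals):
    """FIRST of a sequence of grammar symbols (contains EPS if all are nullable)."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in nonterminals else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # terminals have no follow set
                    beta_first = first_of_sequence(rhs[i + 1:], first, grammar)
                    new = beta_first - {EPS}
                    if EPS in beta_first:  # beta is empty or nullable
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# Hypothetical grammar; only the COMMA rule comes from the question.
GRAMMAR = {
    "call": [("ID", "LPAREN", "exp_list", "RPAREN")],
    "exp_list": [("exp", "exp_list_tail")],
    "exp_list_tail": [("COMMA", "exp", "exp_list_tail"), ()],
    "exp": [("ID",)],
}
FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "call", FIRST)
```

In this assumed grammar, the self-referential rule contributes nothing new to the tail's own follow set (it only says follow(tail) ⊆ follow(tail)); the tail's follow set comes from the rule where it appears at the end of exp_list. The ε marker never ends up in a follow set.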

Ambiguous grammar?

There is this question in the book:
Given this grammar
A --> AA | (A) | epsilon
a- What does it generate?
b- Show that it is ambiguous.
The answers I came up with are:
a- Adjacent parentheses.
b- It generates different parse trees for the same string, so it is ambiguous; I drew two such parse trees.
Is this right, or is there a better answer?
a is almost correct.
The grammar does generate sequences such as (), ()(), ()()(), …
But thanks to the second rule it can also generate (()), ()((())), etc.
b is not correct.
This grammar is ambiguous due to immediate left recursion: A → AA.
How to avoid left recursion: one, two.
a) Nearly right...
This grammar generates exactly the set of strings of balanced parentheses. To see why, here is a quick proof.
First: everything your grammar generates is a balanced parenthesis string. Why? Simple induction:
Epsilon is a balanced (empty) parenthesis string.
If A is a balanced parenthesis string, then (A) is also balanced.
If A1 and A2 are balanced, so is A1A2 (I'm using two different identifiers just to make explicit the fact that in A -> AA the two A's don't necessarily produce the same string).
Second: every balanced string is produced by your grammar. Let's do it by induction on the size of the string.
If the string is zero-sized, it must be Epsilon.
If not, let N be the size of the string and M the length of the shortest non-empty prefix that is balanced (note that the rest of the string is also balanced):
If M = N then the string has the form (w) with w balanced, so you can produce it with A -> (A).
If M < N then you can produce it with A -> AA, the first M characters with the first A and the last N - M with the second A.
In either case, what remains to be produced are strings shorter than N characters, so by induction you can do that. QED.
For example: (()())(())
We can generate this string using exactly the idea of the demonstration.
A -> AA -> (A)A -> (AA)A -> ((A)(A))A -> (()())A -> (()())(A) -> (()())((A)) -> (()())(())
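The inductive argument doubles as a parsing procedure. Here is a minimal Python sketch of it, with hypothetical function names and an ad hoc tree encoding: "ε" for A -> epsilon, a 3-tuple ("(", t, ")") for A -> (A), and a 2-tuple (t1, t2) for A -> AA.

```python
def shortest_balanced_prefix(s):
    """Length of the shortest non-empty balanced prefix of s, or None."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            return None  # not a parenthesis at all
        if depth < 0:
            return None  # a ")" with no matching "("
        if depth == 0:
            return i + 1
    return None  # some "(" was never closed


def derive(s):
    """Build a parse tree for s, or return None if s is not balanced."""
    if s == "":
        return "ε"  # A -> epsilon
    m = shortest_balanced_prefix(s)
    if m is None:
        return None
    if m == len(s):
        inner = derive(s[1:-1])  # A -> (A)
        return None if inner is None else ("(", inner, ")")
    left, right = derive(s[:m]), derive(s[m:])  # A -> AA
    if left is None or right is None:
        return None
    return (left, right)
```

For (()())(()), the shortest balanced prefix is (()()), so the top of the tree is A -> AA, matching the first step of the derivation shown above.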
b) Of course, having both left and right recursion is enough to say it's ambiguous, but to see why this grammar specifically is ambiguous, follow the same idea as in the proof:
It is ambiguous because you don't need to take the shortest balanced prefix. You could take the longest balanced prefix (or in general any balanced prefix) that is shorter than the whole string, and the proof (and generation) would follow the same process.
Ex: (())()()
You can choose A -> AA and generate with the first A either the (()) substring or the (())() substring.
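A short Python sketch can list these choices explicitly. It assumes the input contains only parentheses, and the function names are hypothetical. Every split of the string into two non-empty balanced halves corresponds to a different first step A -> AA, and therefore to a different parse tree.

```python
def is_balanced(s):
    """True if s (assumed to contain only parentheses) is balanced."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def balanced_splits(s):
    """All ways to write s as uv with u and v both non-empty and balanced."""
    return [
        (s[:i], s[i:])
        for i in range(1, len(s))
        if is_balanced(s[:i]) and is_balanced(s[i:])
    ]
```

For (())()() this yields two splits, (()) followed by ()() and (())() followed by (), which is already enough to exhibit two distinct parse trees for one string.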
Yes, you are right.
That is what an ambiguous grammar means.
The problem with ambiguous grammars is that if you are writing a compiler and you want to identify each token in a certain line of code (or something like that), then the ambiguity will get in your way, as you will have "two explanations" for that line of code.
It sounds like your approach for part B is correct, showing two independent derivations for the same string in the language defined by the grammar.
However, I think your answer to part A needs a little work. Clearly you can use the second clause recursively to obtain strings like (((((epsilon))))), but there are other types of derivations possible using the first clause and second clause together.

Handling Grammar / Spelling Issues in Translation Strings

We are currently implementing a Zend Framework Project, that needs to be translated in 6 different languages. We already have a pretty sophisticated translation system, based on Zend_Translate, which also handles variables in translation keys.
Our project has a new Turkish translator, and we are facing a new issue: grammar, especially Turkish grammar. I suspect that this problem shows up in every translation system and in most languages, so I posted a question here.
Question: Any ideas how to handle translations like:
Key: I have a[n] {fruit}
Variables: apple, banana
Result: I have an apple. I have a banana.
Key: Stimme für {user}[s] Einsendung
Variables: Paul, Markus
Result: Stimme für Pauls Einsendung,
Result: Stimme für Markus Einsendung
Does anybody have a solution or idea for this? My only guess would be to avoid this by not using translations where these issues occur.
How do other platforms handle this?
Of course the translation system has no idea which type of word it is placing where in which type of Sentence. It only does some string replacements...
PS: Turkish is even more complicated:
For example, on a profile page, we have "Annie's Network". This should translate as "Annie'nin Ağı".
If the first name ends in a vowel, the suffix will start with an n and look like "Annie'nin"
If the first name ends in a consonant, it will not have the first n, and look like "Kris'in"
If the last vowel is an a or ı, it will look like "Dan'ın" or "Seyma'nın"
If the last vowel is an o or u, it will look like "Davud'un" or "Burcu'nun"
If the last vowel is an e or i, it will look like "Erin'in" or "Efe'nin"
If the last vowel is an ö or ü, it will look like "Göz'ün" or "Iminönü'nün"
If the last letter is a k (like the name "Basak"), it will look like "Basağın" or "Eriğin"
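For what it's worth, the vowel-harmony rules listed above are mechanical enough to code as a per-language helper that the translation layer could call. The following Python sketch is only an illustration of those rules as stated here, not a complete treatment of Turkish: the function and table names are hypothetical, the k-softening case is deliberately left out, and names whose pronunciation differs from their spelling will come out wrong.

```python
# Last vowel of the name -> vowel used in the suffix (vowel harmony).
HARMONY = {
    "a": "ı", "ı": "ı",
    "o": "u", "u": "u",
    "e": "i", "i": "i",
    "ö": "ü", "ü": "ü",
}
# Turkish capitals: "I" is the capital of dotless "ı", "İ" of dotted "i".
UPPER_TO_LOWER = {
    "A": "a", "I": "ı", "O": "o", "U": "u",
    "E": "e", "İ": "i", "Ö": "ö", "Ü": "ü",
}


def genitive(name):
    """Append the possessive suffix to a name, following the rules above."""
    letters = [UPPER_TO_LOWER.get(ch, ch) for ch in name]
    vowels = [ch for ch in letters if ch in HARMONY]
    if not vowels:
        raise ValueError(f"no vowel found in {name!r}")
    suffix_vowel = HARMONY[vowels[-1]]
    # Names ending in a vowel take a buffer "n" before the suffix.
    buffer = "n" if letters[-1] in HARMONY else ""
    return f"{name}'{buffer}{suffix_vowel}n"
```

A translation key could then carry a marker such as {user:genitive} (an assumed syntax, not a Zend_Translate feature) that dispatches to a helper like this for Turkish and to a different helper, or to nothing at all, for other languages.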
It is actually a very hard problem, as grammar rules differ even among languages from the same family. I don't think you could easily do anything for, let's say, Slavic languages...
However, if you want to solve this problem (because it is extra challenging) and you are looking for creative (cross-inspiring) ways to do that, you might want to look into something called ChoiceFormat (one example is the one from the ICU project), or you can look up GNU gettext's solution to the plural forms problem.
ICU (mentioned above) has a SelectFormat that may be of help: it's like a choice format but with arbitrary keywords. It also has a PluralFormat, which already has plural rules for many languages.