Theory: Is the given language context free or not? - grammar

In a recent test, I was asked to determine whether the below language is context free:
In my view, it is context free and can be generated by the below context-free grammar, where S is the start symbol and Y is a non-terminal:
However, my answer was marked wrong, so apparently this language is not context free.
I'm confident about my answer, but the response has left me confused. Is my understanding correct? Please let me know if I've missed something.

The language is NOT a context-free language, and your grammar is wrong!
According to the language description, the number of 0s in the suffix must be less than the number of 1s and the number of 0s in the prefix. But your grammar can generate strings outside the language, as follows:
Step 1: Repeatedly replace S with S0.
Step 2: Now replace S with Y:
S --> S0 --> S00 --> S000 --> Y000
Now you can replace Y with ^ (the empty string), which gives 000; this string is not in your language.
Or replace Y with 0Y1 and then Y with ^:
Y000 --> 0Y1000 --> 01000
The string 01000 is not in the language either. So your grammar does not generate the given language.
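If you want to double-check the counter-examples mechanically, here is a small Python sketch. The rule set is inferred from the derivations above (S --> S0 | Y, Y --> 0Y1 | ^), since the grammar itself is not reproduced here; the enumerator simply expands sentential forms and collects the short terminal strings.

# Rules as used in the derivations above: S -> S0 | Y, Y -> 0Y1 | ^ (empty).
RULES = {"S": ["S0", "Y"], "Y": ["0Y1", ""]}

def derivable(limit=6):
    """Expand sentential forms and collect terminal strings up to `limit` characters long."""
    seen, frontier, results = set(), ["S"], set()
    while frontier:
        form = frontier.pop()
        nonterminals = [i for i, c in enumerate(form) if c in RULES]
        if not nonterminals:
            if len(form) <= limit:
                results.add(form)
            continue
        i = nonterminals[0]  # expand the leftmost non-terminal
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if new not in seen and len(new) <= limit + 2:
                seen.add(new)
                frontier.append(new)
    return sorted(results)

strings = derivable()
print("000" in strings, "01000" in strings)   # True True, confirming the derivations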


Precedence inside a function call

Using the defined-or operator ( // ) in a function call produces the result I'd expect:
say( 'nan'.Int // 42); # OUTPUT: «42»
However, using the lower-precedence orelse operator instead throws an error:
say( 'nan'.Int orelse 42);
# OUTPUT: «Error: Unable to parse expression in argument list;
# couldn't find final ')'
# (corresponding starter was at line 1)»
What am I missing about how precedence works?
(Or is the error a bug and I'm just overthinking this?)
I'd say it's a grammar bug, as
say ("nan".Int orelse 42); # 42
works.
TL;DR My super useful naanswer (not-an-answer / non-authoritative answer / food for thought) is it might be a bug or it might not. :)
Other examples:
say(42 and 42);
say(42 ==> 99);
yield the same error.
What am I missing about how precedence works?
Perhaps nothing. Perhaps it will be desirable and possible to fix the grammar so these function-call-arg-list-signifying parens determine precedence just like plain expression parens do.
If so, perhaps fixing it would best wait, or perhaps realistically must wait, until when or after RakuAST lands (6.e?). Or perhaps even later, if/when grammar cleanup/slangs lands (6.f?).
Or perhaps it's going to always stay as it is for reasons such as good usability (despite the initial "huh?") and/or expediency and/or single-pass parsing and/or whatever.
I've dug a little to see if I could find relevant commentary. Here are some (juicy?) bits:
the OPP is a bit more complex than a standard binary-operator OPP
(from a comment on #perl6)
If you scroll backwards from Larry's comment you'll see he said this in the context of Raku's extraordinary seamless parsing (no delimiters introduced) in a single pass of nested sub-languages that each can have arbitrary grammars.
(Btw, one thought I had: did std parse say(42 and 42) fine? I'm not sure if there's a running std anywhere these days.)
While we do have complete control of stock Raku, I'm not convinced there's anything compelling about bending over backwards to fix every wrinkle of this sort (foo(... op ...) in this case). The general case (..... where the middle ... inside the outer pair of .s has arbitrary syntax) means we'll hit limits on how "perfect" it can all be once there's a huge amount of anarchic language / syntax mixing going on in userland/module space, as I anticipate will emerge in years to come.
So, imo, if it's reasonably easy to fix, without unduly cramping or burdening user slang freedom, great. If not, I think the current situation is fair enough (though perhaps it'll be desirable, viable and reasonable to improve the error message).
Perhaps consider the foregoing in combination with:
Raku borrows many concepts from human language ...
(from the doc)
in combination with:
☞ Self-clocking code produces better syntax error messages
(from Seeing Wrong Right)
in combination with:
Break that clock and your error messages will turn to mush
(from a mailing list comment)
But then again:
Please don't assume that rakudo's idiosyncrasies and design fossils are canonical.
Do you mean this, maybe...?
> say ( NaN.Int orelse 42 )
42
since
> say( NaN.Int orelse 42 )
===SORRY!=== Error while compiling:
Unable to parse expression in argument list; couldn't find final ')' (corresponding starter was at line 1)
------> say( '42'.Int⏏ orelse 42 )
expecting any of:
infix
infix stopper
I would tend to agree with #lizmat that there is a grammar bug in the compiler.

Is this conversion from BNF to EBNF correct?

As context, my textbook uses this style for EBNF:
Sebesta, Robert W. Concepts of Programming Languages 11th ed., Pearson, 2016, 150.
The problem:
Convert the following BNF rule with three RHSs to an EBNF rule with a single RHS.
Note: Conversion to EBNF should remove all explicit recursion and yield a single RHS EBNF rule.
A ⟶ B + A | B – A | B
My solution:
A ⟶ B [ (+ | –) A ]
My professor tells me:
"First, you should use { } instead of [ ],
Second, according to the BNF rule, <term> is B." (He is referring to the style guide posted above.)
Is he correct? I assume so but have read other EBNF styles and wonder if I am entitled to credit.
You were clearly asked to remove explicit recursion and your proposed solution doesn't do that; A is still defined in terms of itself. So independent of naming issues, you failed to do the requested conversion and your prof is correct to mark you down for it. The correct solution for the problem as presented, ignoring the names of non-terminals, is A ⟶ B { (+ | –) B }, using indefinite repetition ({…}) instead of optionality ([…]). With this solution, the right-hand side of the production for A only references B, so there is no recursion (at least, in this particular production).
Now, for naming: clearly, your textbook's EBNF style is to use angle brackets around the non-terminal names. That's a common style, and many would say that it is more readable than using single capital letters which mean nothing to a human reader. Now, I suppose your prof thinks you should have changed the name of B to <term> on the basis that that is the "textbook" name for the non-terminal representing the operand of an additive operator. The original BNF you were asked to convert does show the two additive operators. However, it makes them right-associative, which is definitely non-standard. So you might be able to construct an argument that there's no reason to assume that these operators are additive and that their operands should be called "terms" [Note 1]. But even on that basis, you should have used some name written in lower-case letters and surrounded in angle brackets. To me, that's minor compared with the first issue, but your prof may have their own criteria.
In summary, I'm afraid I have to say that I don't believe you are entitled to credit for that solution.
Notes
1. If you had actually come up with that explanation, your prof might have been justified in suggesting a change of major to Law.
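As a footnote to the reasoning above: the practical payoff of the repetition form A ⟶ B { (+ | –) B } is that it maps directly onto a loop in a hand-written parser, which is why no self-reference is needed. A minimal Python sketch (hypothetical token handling; B is reduced to a single operand token, which is not part of the original exercise):

def parse_B(tokens):
    # Stand-in for the operand (<term>): here B is just a single token.
    return tokens.pop(0)

def parse_A(tokens):
    # A -> B { (+ | -) B } : one B, then a loop for the repetition,
    # so the production never refers back to A.
    value = parse_B(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        value = (op, value, parse_B(tokens))
    return value

print(parse_A(["x", "+", "y", "-", "z"]))   # ('-', ('+', 'x', 'y'), 'z')

Note that the loop happens to pair the operators left to right, whereas the original right-recursive BNF groups them to the right (the non-standard associativity the answer mentions); EBNF repetition by itself does not encode associativity.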

Handling Grammar / Spelling Issues in Translation Strings

We are currently implementing a Zend Framework project that needs to be translated into 6 different languages. We already have a pretty sophisticated translation system based on Zend_Translate, which also handles variables in translation keys.
Our project has a new Turkish translator, and we are facing a new issue: grammar, especially Turkish grammar. I suspect this problem affects every translation system and most languages, so I'm posting the question here.
Question: Any ideas how to handle translations like:
Key: I have a[n] {fruit}
Variables: apple, banana
Result: I have an apple. I have a banana.
Key: Stimme für {user}[s] Einsendung
Variables: Paul, Markus
Result: Stimme für Pauls Einsendung,
Result: Stimme für Markus Einsendung
Does anybody have a solution or idea for this? My only guess would be to avoid the issue by not using translations where these problems occur.
How do other platforms handle this?
Of course, the translation system has no idea what type of word it is placing where, or in what type of sentence. It only does string replacements...
PS: Turkish is even more complicated:
For example, on a profile page, we have "Annie's Network". This should translate as "Annie'nin Aği".
If the first name ends in a vowel, the suffix will start with an n and look like "Annie'nin"
If the first name ends in a consonant, it will not have the first n, and look like "Kris'in"
If the last vowel is an a or ı, it will look like "Dan'ın"; or "Seyma'nın"
If the last vowel is an o or u, it will look like "Davud'un"; or "Burcu'nun"
If the last vowel is an e or i, it will look like "Erin'in"; or "Efe'nin"
If the last vowel is an ö or ü, it will look like "Göz'ün"; or "Eminönü'nün"
If the last letter is a k (like the name "Basak"), it will look like "Basağın"; or "Eriğin"
It is actually a very hard problem, as grammar rules differ even among languages from the same family. I don't think you could easily do anything for, let's say, Slavic languages...
However, if you want to solve this problem (because it is an extra challenge) and you are looking for creative (cross-inspiring) ways to do it, you might want to look into something called ChoiceFormat (for example, the one from the ICU project), or you can look up GNU gettext's solution to the plural-forms problem.
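To illustrate the idea behind these choice/select-style formats with a toy sketch (plain Python, hypothetical message catalog; this is not the ICU or gettext API): each keyword selects a complete variant of the phrase, so each language's catalog can deal with its own articles and suffixes instead of the code gluing fragments together.

# Toy select-style message: one variant per keyword, plus a fallback.
MESSAGES_EN = {
    "i_have_fruit": {
        "apple": "I have an apple",
        "banana": "I have a banana",
        "other": "I have a {fruit}",
    },
}

def render(key: str, select: str) -> str:
    # Pick the variant for the keyword, falling back to the generic form.
    variants = MESSAGES_EN[key]
    return variants.get(select, variants["other"]).format(fruit=select)

print(render("i_have_fruit", "apple"))    # I have an apple
print(render("i_have_fruit", "banana"))   # I have a banana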
ICU (mentioned above) has a SelectFormat (http://site.icu-project.org/design/formatting/select) that may be of help; it's like ChoiceFormat but with arbitrary keywords. It also has a PluralFormat, which already covers the plural rules of many languages.
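For the Turkish genitive specifically, the rules listed in the question are regular enough to approximate mechanically. A rough Python sketch (hypothetical helper, not production code; the k → ğ stem change in "Basak" → "Basağın" is deliberately left out):

TURKISH_VOWELS = "aeıioöuü"

def genitive_suffix(name: str) -> str:
    """Approximate the Turkish genitive suffix rules listed in the question."""
    lower = name.lower()
    # Vowel harmony: the suffix vowel follows the last vowel of the name.
    last_vowel = next((c for c in reversed(lower) if c in TURKISH_VOWELS), "i")
    if last_vowel in "aı":
        v = "ı"
    elif last_vowel in "ou":
        v = "u"
    elif last_vowel in "öü":
        v = "ü"
    else:  # e, i, or no vowel found
        v = "i"
    # Buffer consonant 'n' only when the name ends in a vowel.
    buffer = "n" if lower[-1] in TURKISH_VOWELS else ""
    return f"{name}'{buffer}{v}n"

# "Annie'nin", "Kris'in", "Dan'ın", "Burcu'nun", "Efe'nin", "Göz'ün"
for n in ("Annie", "Kris", "Dan", "Burcu", "Efe", "Göz"):
    print(genitive_suffix(n))

Whether such language-specific helpers belong in a generic translation layer is a separate question; the answers above point toward keeping this kind of logic near the message formatting (ICU-style) rather than in plain string replacement.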

Bison input analyzer - basic question on optional grammar and input interpretation

I am very new to Flex/Bison, so this is a very naive question; pardon me if so. It may look like a homework question, but I need to implement a project based on the concept below.
My question has two parts.
Question 1
In a Bison parser, how do I provide rules for optional input?
I need to parse a statement like the following.
Example:
-country='USA' -state='INDIANA' -population='100' -ratio='0.5' -comment='Census study for Indiana'
Here the ratio token can be optional. Similarly, if many tokens are optional, how do I express that in the grammar?
My code looks like this:
%start program
program : TK_COUNTRY TK_IDENTIFIER TK_STATE TK_IDENTIFIER TK_POPULATION TK_IDENTIFIER ...
where all the tokens are defined in the lexer. Since many of the tokens are optional, using "|" would require listing every possible combination of inputs.
Question 2
There is a good chance that the comment will contain quotes as part of the input, so I have added a -tag token that the user can provide so those quotes can be reinterpreted.
Example:
-country='USA' -state='INDIANA' -population='100' -ratio='0.5' -comment='Census study for Indiana$'s population' -tag=$
Now, I need to reinterpret Indiana$'s as Indiana's since -tag=$.
Please provide any pointers or related material to help me understand these topics.
Q1: I am assuming we have 4 possible tokens: NAME, '-', '=' and VALUE.
Then the grammar could look like this:
attrs:
attr attrs
| attr
;
attr:
'-' NAME '=' VALUE
;
Note that, unless you make specific attribute names distinguished tokens, there is no way to say "we must have country, state and population, but ratio is optional."
That would be the task of the part of the program that analyses the data produced by the parser.
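To illustrate that last point outside the grammar, here is a rough Python sketch (not Bison action code; the attribute names are taken from the question): the parser just collects name/value pairs, and required-versus-optional is ordinary program logic afterwards.

REQUIRED = {"country", "state", "population"}
OPTIONAL = {"ratio", "comment", "tag"}

def validate(attrs: dict) -> None:
    # attrs is what the parser collected from the '-name=value' pairs.
    missing = REQUIRED - attrs.keys()
    unknown = attrs.keys() - REQUIRED - OPTIONAL
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")

# Fine: ratio is optional.
validate({"country": "USA", "state": "INDIANA", "population": "100"})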
Q2: As I understand it, you are thinking of changing the way lexical analysis works while the parser is running. This is not a good idea, at least not for a beginner. Have you even started to think about lexical analysis, as opposed to parsing?

What is meant by context free but not regular?

I am studying context-free grammars for an exam. I couldn't understand why the language
{ a^n b^n | n >= 0 }
is context free but not regular. Why is it not regular?
When can we say that an expression is not regular?
Thanks
A language is not regular if it cannot be matched (exactly) by a regular expression or, equivalently, accepted by a finite state machine. See also: context-free language and regular language.
As said in the other answers, it's context free because you can express it with a context-free grammar.
For example: S -> aSb | ε
It's not regular because you cannot express it with a finite state machine or a regular expression. You need to count the number of a's and check that the number of b's matches, and that cannot be done with a finite number of states, since n can be arbitrarily large.
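To make the counting argument concrete, here is a tiny recognizer sketch in Python (a hypothetical helper): it needs a counter that can grow without bound, which is exactly what a machine with finitely many states cannot simulate.

def is_anbn(s: str) -> bool:
    """Accepts a^n b^n, n >= 0, by counting: one unbounded counter."""
    n = 0
    i = 0
    while i < len(s) and s[i] == "a":
        n += 1
        i += 1
    while i < len(s) and s[i] == "b":
        n -= 1
        i += 1
    # Accept only if the whole string was consumed and the count balanced.
    return i == len(s) and n == 0

print(is_anbn("aaabbb"), is_anbn("aab"))   # True False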
The standard approach to proving that a language is not regular is to use the pumping lemma.
Well, you can say it is context free because you can express it using a context-free grammar. It is not regular, however, because no regular expression (or finite automaton) can represent that language.