Given an arbitrary context-free grammar, how can I check whether it describes a regular language?
I'm not looking for exam "tricks". I'm looking for a foolproof mechanical test that I can code.
If it helps, here's an example of a CFG that I might receive as an input.
Specifically, notice that the answer must be more involved than just looking for left- or right-recursion, since the presence of other kinds of recursion (such as the center recursion below) does not automatically make the language non-regular.
S: A B C D X
A: A a
A:
B: b B
B:
C: c C c
C: c
D: D d D
D: d
X: x Y
X:
Y: y X
Y:
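(For the record, if I've unrolled the recursion correctly, this grammar denotes the regular language a* b* c(cc)* d(dd)* (xy)*(x|e), even though C and D use center recursion.)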
There is no such mechanical procedure, because the problem of determining whether a CFG defines a regular language is undecidable.
This result is a straightforward application of Greibach's Theorem, which states (roughly) that any nontrivial property of the context-free languages that holds for all regular languages and is preserved under quotient by a single terminal symbol is undecidable; regularity is exactly such a property.
I have difficulty constructing grammars for languages, especially linear grammars.
Can anyone please give me some basic tips or a methodology for constructing a grammar for a given language? Thanks in advance.
I'm not sure whether the following answer to this question is right: "Construct a linear grammar for the language
L = {a^n b c^n | n belongs to the natural numbers}"
Solution:
Right-Linear Grammar :
S--> aS | bA
A--> cA | ^
Left-Linear Grammar:
S--> Sc | Ab
A--> Aa | ^
As pointed out in the comments, these grammars are wrong since they generate strings not in the language. Here's a derivation of abcc in both grammars:
S -> aS -> abA -> abcA -> abccA -> abcc
S -> Sc -> Scc -> Abcc -> Aabcc -> abcc
Also as pointed out in the comments, there is a simple linear grammar for this language, where a linear grammar is defined as having at most one nonterminal symbol in the RHS of any production:
S -> aSc | b
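For comparison, here is a derivation of aabcc in this grammar, showing how each a is produced together with its matching c:
S -> aSc -> aaScc -> aabcc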
There are some general rules for constructing grammars for languages. These are either obvious base cases or rules derived from closure properties and the way grammars work; a short worked example follows the list. For instance:
1. if L = {a} for an alphabet symbol a, then S -> a is a grammar for L.
2. if L = {e} for the empty string e, then S -> e is a grammar for L.
3. if L = R U T for languages R and T, then S -> S' | S'' together with the grammars for R and T is a grammar for L, where S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
4. if L = RT for languages R and T, then S -> S'S'' together with the grammars for R and T is a grammar for L, where S' and S'' are as in rule 3.
5. if L = R* for language R, then S -> S'S | e together with the grammar for R is a grammar for L, where S' is the start symbol of the grammar for R.
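As a short worked example (the nonterminal names S1 through S5 are arbitrary), applying rules 1, 3, 4, and 5 to L = {a} U {b}{c}* gives:
S -> S1 | S5
S1 -> a
S5 -> S2 S4
S2 -> b
S4 -> S3 S4 | e
S3 -> c
Here S1 handles {a}, S2 handles {b}, S3 handles {c}, S4 handles {c}*, and S5 handles the concatenation {b}{c}*.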
Rules 4 and 5, as written, do not preserve linearity. Linearity can be preserved for left-linear and right-linear grammars (since those grammars describe regular languages, and regular languages are closed under these kinds of operations); but linearity cannot be preserved in general. To prove this, an example suffices:
R -> aRb | ab
T -> cTd | cd
L = RT = { a^n b^n c^m d^m : n, m >= 1 }
L' = R* = { a^(n1) b^(n1) a^(n2) b^(n2) ... a^(nk) b^(nk) : k >= 0, each ni >= 1 }
Suppose there were a linear grammar for L. We must have a production for the start symbol S that produces something. To produce something, we require a string of terminal and nonterminal symbols. To be linear, we must have at most one nonterminal symbol. That is, our production must be of the form
S -> xYz
where x is a string of terminals, Y is a single nonterminal, and z is a string of terminals. If x is non-empty, reflection shows the only useful choice is a; anything else fails to derive known strings in the language. Similarly, if z is non-empty, the only useful choice is d. This gives four cases:
1. x empty, z empty. This is useless, since we now have the same problem to solve for nonterminal Y as we had for S.
2. x = a, z empty. Y must now generate exactly a^n' b^n' b c^m d^m where n' = n - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
3. x empty, z = d. Y must now generate exactly a^n b^n c c^m' d^m' where m' = m - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
4. x = a, z = d. Y must now generate exactly a^n' b^n' bc c^m' d^m' where n' and m' are as in cases 2 and 3. But then the exact same argument applies to the grammar whose start symbol is Y.
None of the possible forms of a production for S actually gets us closer to deriving a string in the language. Therefore, no strings are derived, a contradiction, meaning that no grammar for L can be linear.
Suppose there were a linear grammar for L'. Then that grammar would have to generate all the strings in (a^n b^n)R(a^m b^m), plus those in {e} U R. But it can't generate the former, by the argument used above: any production useful for that purpose would get us no closer to a string in the language.
I was teaching a friend about programming and I had a hard time convincing them that a = b and b = a are two very different things.
I eventually found the correct words to describe it (right associative) which got me thinking.
Are there any programming languages which are left associative? I have never seen a language where:
a = b results in b being set to the value of a.
You misunderstood associativity. An operator op is associative if (a op b) op c is equivalent to a op (b op c). For operators that are not associative it thus becomes relevant whether a op b op c stands for the former or the latter. Thus we distinguish between left-associative operators, where it's (a op b) op c, and right-associative operators, where it's a op (b op c).
Most operators in most languages are left-associative. Take for example -: a - b - c is equivalent to (a - b) - c in most languages, not a - (b - c).
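To make this concrete, here is a minimal Haskell sketch (the operators .-. and .-: are invented for this example); the two operators compute the same function and differ only in their declared associativity:
infixl 6 .-.   -- declared left-associative
infixr 6 .-:   -- declared right-associative

(.-.), (.-:) :: Int -> Int -> Int
(.-.) = (-)
(.-:) = (-)

main :: IO ()
main = do
  print (10 .-. 4 .-. 3)  -- parsed as (10 .-. 4) .-. 3, prints 3
  print (10 .-: 4 .-: 3)  -- parsed as 10 .-: (4 .-: 3), prints 9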
The assignment operator is an exception, as (a = b) = c is generally not legal (you can't assign to the result of an assignment). Thus in most languages a = b = c is equivalent to a = (b = c). A notable special case is Python, where assignment is a statement rather than an expression: a = b = c is legal, but it's parsed as a single statement with two targets (both a and b receive the value of c), not as nested assignments.
None of this has anything to do with the difference between a = b and b = a. Since this involves only a single use of the = operator, associativity does not factor into this at all. Rather the relevant property is commutativity: An operator op is commutative if a op b is equivalent to b op a. I'm not aware of any language where assignment is commutative, nor do I have an idea of how it could be.
Concepts like left-commutativity or right-commutativity do not exist. There is, to the best of my knowledge, no term for the question "Does a = b assign b to a or vice-versa?" - that's just part of the semantics of the assignment operator.
I don't believe so, though you can always overload the assignment operator in C++ and cause complete confusion.
R has both <- and -> assignment operators defined.
> b <- 42
> b -> a
> a
[1] 42
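(As far as I know, R's parser simply rewrites b -> a into a <- b with the operands swapped, so under the hood there is still only one assignment direction.)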
If we talk about operator associativity, we should really distinguish two things: the direction in which an operator associates (e.g. =, which chains right-to-left in most languages) and non-associative operators (e.g. == in languages where a == b == c is rejected outright). To the machine itself it makes no difference whether a line is consumed from left to right or from right to left.
There are no programming languages that are natively left-associative in the sense asked here, but some allow an imitation via operator overloading, and some (e.g. R) simply provide both arrows: -> and <-.
I suspect left-to-right assignment would find few fans among readers of left-to-right scripts, though it might feel natural to readers of right-to-left scripts; one can imagine IDEs that swap the two sides, but in the end compilers are written for the left-to-right convention.
I'm studying for a finite automata & grammars test and I'm stuck with this question:
Construct a grammar that generates L:
L = {a^n b^m c^2n | n>=0, m>=0}
How can I construct a grammar that generates this language?
I think this should do the trick. I verified this on http://mdaines.github.io/grammophone/ .
S -> B
| a S c c
| .
B -> b B
| .
I find it always helps with these kinds of questions to come up with some rules for how to build big strings out of little strings. First, identify the littlest strings in your language. In our case, we can start with the observation that if n = 0, b^m is in our language; that is, w in b* is in our language. We then note that if x is a string in our language we get another string by adding one a on the left and two cs on the right; that is, axcc is a string in our language also. So our rules are:
b* in L
if x in L then axcc in L
Writing this in terms of a CFG is now straightforward:
S -> B
S -> aScc
Here, S generates our language L and B generates the language b*. We complete the grammar by providing a grammar for b* with start symbol B:
(1) S -> B
(2) S -> aScc
(3) B -> e
(4) B -> bB
Any string a^n b^m c^2n can be generated using n applications of rule 2, 1 application of rule 1, m applications of rule 4 and 1 application of rule 3. That this grammar generates no strings not in the language is left as an exercise.
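If you want to double-check the grammar mechanically, a quick membership test for L is easy to write; here is a small Haskell sketch (the function name inL is my own):
inL :: String -> Bool
inL w = all (== 'c') cs && length cs == 2 * length as
  where
    (as, rest) = span (== 'a') w
    (_bs, cs)  = span (== 'b') rest
For example, inL "aabcccc" is True (n = 2, m = 1) and inL "abc" is False.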
I'm in a Formal languages class and have a grammar quiz coming up. I'm assuming something like this will appear.
Consider the alphabet ∑ = {a, b, c}. Construct a grammar that generates the language L = {bab^nabc^na^p : n ≥ 0, p ≥ 1}. Assume that the start variable is S.
It has been a very long time since I last worked with formal languages, so please forgive my rustiness, but here is my attempt. We split the work between a prefix variable (A) and a suffix variable (B): after the fixed prefix ba, A recursively wraps a b and a c around the middle and bottoms out at the constant ab (so n = 0 is allowed), while B recursively appends an a and bottoms out at the constant a (so p >= 1 is enforced).
{bab^nabc^na^p : n ≥ 0, p ≥ 1}
S -> baAB
A -> bAc
A -> ab
B -> Ba
B -> a
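For example, the string with n = 2 and p = 1 is derived as follows:
S -> baAB -> babAcB -> babbAccB -> babbabccB -> babbabcca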
I'm curious how to optimize this code:
fun n = (sum l, f $ f0 l, g $ g0 l)
  where l = map h [1..n]
Assuming that f, f0, g, g0, and h are all costly, but the creation and storage of l is extremely expensive.
As written, l is retained until the returned tuple is fully evaluated or garbage collected. Instead, sum l, f0 l, and g0 l should all be executed whenever any one of them is executed, but f and g should be delayed.
It appears this behavior could be fixed by writing:
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
Or the very similar:
fun n = (a,b,c) `deepseq` (a, f b, g c)
  where ...
We could perhaps specify a bunch of internal types to achieve the same effects as well, which looks painful. Are there any other options?
Also, I'm obviously hoping with my inlines that the compiler fuses sum, f0, and g0 into a single loop that constructs and consumes l term by term. I could make this explicit through manual inlining, but that'd suck. Are there ways to explicitly prevent the list l from ever being created and/or compel inlining? Pragmas that produce warnings or errors if inlining or fusion fail during compilation perhaps?
As an aside, I'm curious why seq, inline, lazy, etc. are all defined as let x = x in x in the Prelude. Is this simply to give them a definition for the compiler to override?
If you want to be sure, the only way is to do it yourself. For any given compiler version, you can try out several source formulations and check the generated core/assembly/LLVM bitcode/whatever to see whether it does what you want. But that could break with each new compiler version.
If you write
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
or the deepseq version thereof, the compiler might be able to merge the computations of a, b and c to be performed in parallel (not in the concurrency sense) during a single traversal of l, but for the time being, I'm rather convinced that GHC doesn't, and I'd be surprised if JHC or UHC did. And for that the structure of computing b and c needs to be simple enough.
The only way to obtain the desired result portably across compilers and compiler versions is to do it yourself. For the next few years, at least.
Depending on f0 and g0, it might be as simple as doing a strict left fold with appropriate accumulator type and combining function, like the famous average
import Data.List (foldl')

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double

average :: [Double] -> Double
average = ratio . foldl' count (P 0 0)
  where
    ratio (P n s) = s / fromIntegral n
    count (P n s) x = P (n+1) (s+x)
but if the structure of f0 and/or g0 doesn't fit, say one's a left fold and the other a right fold, it may be impossible to do the computation in one traversal. In such cases, the choice is between recreating l and storing l. Storing l is easy to achieve with explicit sharing (where l = map h [1..n]), but recreating it may be difficult if the compiler does some common subexpression elimination (unfortunately, GHC does have a tendency to share lists of that form, even though it does little CSE). For GHC, the flags -fno-cse and -fno-full-laziness can help avoid unwanted sharing.
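For completeness, here is a sketch of that single-traversal approach applied to fun itself, under the assumption that f0 and g0 are left folds; I've instantiated f0 = length and g0 = maximum (and trivial f, g, h) purely for illustration:
import Data.List (foldl')

data Acc = Acc !Double !Int !Double   -- running sum, f0 accumulator, g0 accumulator

fun :: Int -> (Double, Int, Double)
fun n = finish (foldl' step (Acc 0 0 (-1/0)) (map h [1..n]))
  where
    step (Acc s c m) x = Acc (s + x) (c + 1) (max m x)
    finish (Acc s c m) = (s, f c, g m)
    h = fromIntegral   -- stand-ins for the actual costly functions
    f = id
    g = id
Since foldl' forces the accumulator at each step and the strict fields keep it flat, l is produced and consumed one cell at a time and never retained.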
but if the structure of f0 and/or g0 doesn't fit, say one's a left fold and the other a right fold, it may be impossible to do the computation in one traversal. In such cases, the choice is between recreating l and storing l. Storing l is easy to achieve with explicit sharing (where l = map h [1..n]), but recreating it may be difficult to achieve if the compiler does some common subexpression elimination (unfortunately, GHC does have a tendency to share lists of that form, even though it does little CSE). For GHC, the flags fno-cse and -fno-full-laziness can help avoiding unwanted sharing.