Chomsky languages: how to recognize them? - grammar

I have a problem with the recognition of languages. Given a certain language, for example ancb2n, n > 0, how do I determine quickly what type belongs according Chomsky?
My idea was to determine the grammar that generates it and then up to the language but it is a long process. I think there's another way to recognize it by eye,
without writing grammars or automata.
can someone help me?

Unfortunately, associating an arbitrary language with a level of the Chomsky hierarchy is, in the general case, undecidable. (See Rice's Theorem.)
Of course, it is easy to categorize a given grammar, since the Chomsky hierarchy is defined by a simple syntactic analysis of the grammar itself. However, languages do not have unique grammars; the existence of (for example) a Type 2 (context-free) grammar for a language does not mean that there is not a Type 3 (regular) grammar for the same language.
So there are no shortcuts.
However, there is a lot to be said for experience. The language { ancb2n | n > 0 } is context-free (and not regular), as are all languages of similar forms. That it is context free is demonstrated by the grammar
L → c
L → a L b b
and the fact that it is not regular can be proven using the pumping lemma for regular languages. (The linked Wikipedia article contains, as an example of the use of the lemma, a proof for a similar language which should be easy to adapt.)
On the other hand, a language which requires three equal counts ({ ancnbn | n > 0 }) is not context-free (but is context-sensitive). (That's not the same as { ancn+mbm | n > 0 }, which is context-free.)

Related

Context-free grammar for L = { b^n c^n a^n , n>=1}

I have a language L, which is defined as: L = { b^n c^n a^n , n>=1}
The corresponding grammar would be able to create words such as:
bca
bbccaa
bbbcccaaa
...
How would such a grammar look like? Making two variables dependent of each other is relatively simple, but I have trouble with doing it for three.
Thanks in advance!
L = { b^n c^n a^n , n>=1}
As pointed out in the comments, this is a canonical example of a language which is not context free. It can be shown using the pumping lemma for context-free languages. Basically, consider a string like b^p c^p a^p where p is the pumping length and then show no matter what part you pump, you will throw off the balance (basically, the size of the part that's pumped is less than p, so it cannot "span" all three symbols to keep them in sync).
L = {a^m b^n c^n a^(m+n) |m ≥ 0,n ≥ 1}
As suggested in the comments, this is not context free either. It can be shown using the pumping lemma for context-free languages as well. However, given a proof (or acceptance) of the above, there is an easier way. Recall that the intersection of a regular language and a context-free language must be context free. Assume L is context-free. Then so must be its intersection with the regular language (b+c)(b+c)* a*. However, that intersection can be expressed as b^n c^n a^n (since m is forced to be zero), which we know is not context-free, a contradiction. Therefore, our assumption was wrong and L is not context free either.

Automata theory : Conversion of a Context free grammar to a DFA

How to convert a Context Free Grammar to a DFA? This is easy if we have transitions like
A->a B. But when we have the transitions as A->a B c. Then how should we represent it as a DFA
There is no general procedure to convert an arbitrary CFG into a DFA. For example, consider this CFG:
S → aSb | ε
This grammar is for the language { anbn | n ≥ 0 }, which is a canonical nonregular language. Since we can only build DFAs for regular languages, there’s no way to build a DFA with the same language as this CFG
First, you should convert your language to CNF (Chomskey Normal Form).
Then steps for conversion are as such:
Convert it to left/right grammar is called a regular grammar.
Convert the Regular Grammar into Finite Automata
The transitions for automata are obtained as follows
For every production A -> aB make δ(A, a) = B that is make an are labeled ‘a’ from A to B.
For every production A -> a make δ(A, a) = final state.
For every production A -> ϵ, make δ(A, ϵ) = A and A will be final state.
No. For these Grammar no DFA can form.
why?
because it requires memory. Memory of occurence of a.
Yes . it is CFL (context free Language).
We can design a PDA (Push down automata). Here , memory ( STACK is
use ). for PUSH a and POP b

Context sensitive language with non deterministic turing machine

how can i show a language is context sensitive with a non deterministic turing machine?
i know that a language that is accepted by a Linear bound automaton (LBA ) is a context -sensitive language. And a LBA is a non-deterministic turing machine.
Any idea how can i relate all these and show that a language is context sensitive?
As templatetypedef's answer has some flaws (which I will point out in a second in a comment), I give a quick answer to your question:
The language is context sensitive if (and only if) you can give a nondeterministic turing machine using linear space that defines L.
Let L = { a^n b^n a^n } for an arbitrary integer n; a^n here means n concatenations of the symbol a. This is a typical context sensitive language. Instead of giving a CSG, you can give a LBA to show that L is context sensitive:
The turing machine M 'guesses' (thanks to nondeterminism) n [in other words you may say 'every branch of the nondeterministic search tree tries out another n], and then checks whether the input matches a^n b^n a^n. You need log n cells to store n, the matching might need (if implemented trivially) another log n cells. As n + 2log n < 2n, this machine needs only linear space, and is therefore an LBA, hence L is context sensitive.
This is not an exact answer, but since the context-sensitive languages are precisely those accepted by a linear-bounded automaton (a TM with O(n) space on its tape), the context-sensitive languages are precisely those in DSPACE(n). Moreover, we know that NTIME(n) = DSPACE(n). This means that if you can find a linear-time NTM that decides membership in some language L, that language must be context-sensitive. However, there still might be a context-sensitive language that does not have a linear-time NTM (I don't know whether there is a definitive answer to this or whether this is an open problem), so this is not an exact characterization.
Hope this helps!

Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular?

Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular?
I believe the answer is NO, but it is hard to make a formal demonstration of it, even a NON formal demonstration.
Any ideas on how to approach this?
It's clearly not regular. How is an FA going to recognize (abc)^n c (cba)^n. Strings like this are in your language, right? The argument is a simple one based on the fact that there are infinitely many equivalence classes under the indistinguishability relation I_l.
The most common way to prove a language is NOT regular is using on of the Pumping Lemmas.
Using the lemma is a little tricky, since it has all those "exists" and so on. To prove a language L is not regular using the pumping lemma you have to prove that
for any integer p,
there is a word w in L of length n, with n>=p, such that
for all possible ways to decompose w as xyz, with len(xy) <= p and y non empty
there exists an i such that x(y^i)z (repeating the y bit i times) is NOT in L
whooo!
I'l l show how the proof looks for the "same number of as and bs" language. It should be straighfoward to convert to your case:
for any given p, we can make a word of length n = 2*p
a^p b^p (p a's followed by p b's)
any way you decompose this into xyz w/ |xy| <=p, y will only contain a's.
Thus, pumping the the y part will make the word have more as than bs,
thus NOT belonging to L.
If you need intuition on why this works, it follows from how you need to be able to count to arbritrarily large numbers to verify if a word belongs to one of these languages. However, Regular Languages are described by finite automata and no finite automata can represent the infinite ammount of states required to represent all the numbers. (The Wikipedia article should have a formal proof).
EDIT: It looks like you can't straight up use the pumping lemma in this particular case directly: if you always make y be one character long you can never make a word stop being accepted (aba becoming abbbba makes no difference and so on).
Just do the equivalence class approach suggested by Patrick87 - it will probably turn out to be cleaner than any of the dirty hacks you would need to do to make the pumping lemma applicable here.

(x86) Assembler Optimization

I'm building a compiler/assembler/linker in Java for the x86-32 (IA32) processor targeting Windows.
High-level concepts (I do not have any "source code": there is no syntax nor lexical translation, and all languages are regular) are translated into opcodes, which then are wrapped and outputted to a file. The translation process has several phases, one is the translation between regular languages: the highest-level code is translated into the medium-level code which is then translated into the lowest-level code (probably more than 3 levels).
My problem is the following; if I have higher-level code (X and Y) translated to lower-level code (x, y, U and V), then an example of such a translation is, in pseudo-code:
x + U(f) // generated by X
+
V(f) + y // generated by Y
(An easy example) where V is the opposite of U (compare with a stack push as U and a pop as V). This needs to be 'optimized' into:
x + y
(essentially removing the "useless" code)
My idea was to use regular expressions. For the above case, it'll be a regular expression looking like this: x:(U(x)+V(x)):null, meaning for all x find U(x) followed by V(x) and replace by null. Imagine more complex regular expressions, for more complex optimizations. This should work on all levels.
What do you suggest? What would be a good approach to optimize and produce fast x86 assembly?
What you should actually do is build an Abstract Syntax Tree (AST).
It is a representation of the source code in the form of a tree, that is much easier to work with, especially to make transformations and optimizations.
That code, represented as a tree, would be something like:
(+
(+
x
(U f))
(+
(V f)
y))
You could then try to make some transformations: a sum of sums is a sum of all the terms:
(+
x
(U f)
(V f)
y)
Then you could scan the tree and you could have the following rules:
(+ (U x) (V x)) = 0, for all x
(+ 0 x1 x2 ...) = x, for all x1, x2, ...
Then you would obtain what you are looking for:
(+ x y)
Any good book on compiler-writing will discuss a lot on ASTs. Functional programming languages are specially suited for this task, since in general it is easy to represent trees and to do pattern matching to parse and transform the tree.
Usually, for this task, you should avoid using regular expressions. Regular expressions define what mathematicians call regular languages. Any regular language can be parsed by a set of regular expressions. However, I think your language is not regular, so it cannot be properly parsed by regexps.
People try, and try, and try to parse languages such as HTML using regular expressions. This has been extensively discussed here in SO, and you cannot parse HTML with regular expressions. There will always be an exceptional case in which your regular expressions would fail, and you would have to adapt it.
It might be the same with your language: if it is not regular, you should avoid lots of headaches and not try to parse it (and especially "transform" it) using regular expressions.
I'm having a lot of trouble understanding this question, but I think you will find it useful to learn something about term-rewriting systems, which seems to be what you are proposing. Whether the mechanism is tree rewriting (always works) or regular expressions (will work for some languages some of the time and other languages all of the time) is of secondary importance.
It is definitely possible to optimize object code by term rewriting. You probably also will benefit from learning something about peephole optimization; a good place to start, because it is very strong on the fundamentals, is a paper by Davidson and Fraser on a retargetable peephole optimizer. There's also excellent later work by Benitez and Davidson.