Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular? - grammar

Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular?
I believe the answer is NO, but it is hard to make a formal demonstration of it, even a NON formal demonstration.
Any ideas on how to approach this?

It's clearly not regular. How is an FA going to recognize (abc)^n c (cba)^n. Strings like this are in your language, right? The argument is a simple one based on the fact that there are infinitely many equivalence classes under the indistinguishability relation I_l.

The most common way to prove a language is NOT regular is using on of the Pumping Lemmas.
Using the lemma is a little tricky, since it has all those "exists" and so on. To prove a language L is not regular using the pumping lemma you have to prove that
for any integer p,
there is a word w in L of length n, with n>=p, such that
for all possible ways to decompose w as xyz, with len(xy) <= p and y non empty
there exists an i such that x(y^i)z (repeating the y bit i times) is NOT in L
whooo!
I'l l show how the proof looks for the "same number of as and bs" language. It should be straighfoward to convert to your case:
for any given p, we can make a word of length n = 2*p
a^p b^p (p a's followed by p b's)
any way you decompose this into xyz w/ |xy| <=p, y will only contain a's.
Thus, pumping the the y part will make the word have more as than bs,
thus NOT belonging to L.
If you need intuition on why this works, it follows from how you need to be able to count to arbritrarily large numbers to verify if a word belongs to one of these languages. However, Regular Languages are described by finite automata and no finite automata can represent the infinite ammount of states required to represent all the numbers. (The Wikipedia article should have a formal proof).
EDIT: It looks like you can't straight up use the pumping lemma in this particular case directly: if you always make y be one character long you can never make a word stop being accepted (aba becoming abbbba makes no difference and so on).
Just do the equivalence class approach suggested by Patrick87 - it will probably turn out to be cleaner than any of the dirty hacks you would need to do to make the pumping lemma applicable here.

Related

Prove the language is not context-free?

How can you prove the language L given below is not context-free, I would like to know does my proof given below makes any sense, if not, what would be the correct method to prove?
L = {a^n b^n c^i|n ≤ i ≤ 2n}
I am trying to solve this language by contradiction. Suppose L is regular and with pumping length p such that S = a^p b^p c^p. Observe that S ∉ L. Since there must be a pumping cycle xy with length less than p, this can duplicate y which consists of some number of b to cause x(y^2)z to enter the language because the number of b exceeds the number of c by no longer bound by the given condition of i which is n ≤ i ≥ 2n, therefore, we have contradiction and hence language L is not context-free.
The proof is by contradiction. Assume the language is context-free. Then, by the pumping lemma for context-free languages, any string in L can be written as uvxyz where |vxy| < p, |vy| > 0 and for all natural numbers k, u(v^k)x(y^k)z is in the language as well. Choose a^p b^p c^(p+1). Then we must be able to write this string as uvxyz so that |vy| > 0. There are several possibilities to consider:
v and y consist only of a's. In this case, pumping in either direction causes the numbers of a's and b's to differ, producing a string not in the language; so this cannot be the case.
v and y consist only of a's and b's. In this case, pumping might keep the numbers of a's and b's the same, but pumping up will eventually cause the number of c's to be less than the number of a's and b's; so this cannot be the case.
v and y consist only of b's. This case is similar to (1) above and so cannot be a valid choice.
v and y consist only of b's and c's. Also similar to (1) and (3) in that pumping will cause the numbers of a's and b's to differ.
v and y consist only of c's. Pumping up will eventually cause there to be more c's than twice the number of a's; so this cannot be the case either.
No matter how we choose v and y, pumping will produce strings not in the language. This is a contradiction; this means our assumption that the language is context-free must have been incorrect.

construction of a^(2^i) language grammar

I'm kind a stuck with automaton and grammars problem. I've searched a lot but without any success.
Is it even possible to construct a grammar generating this language L?
L = { a(2i) | i >= 0}
Can anyone provide me with simple solution?
It's certainly possible to write a grammar for this language, but it won't be a context-free grammar. That's easy to demonstrate using the pumping lemma.
The pumping lemma states that for any CFL, there is some integer p such that any string s in the language whose length is at least p can be written as uvxyz, where u, v, x, y and z are strings and vy is not empty, and for all integers n, the string uvnxynz is also in the language.
That is, for any string in the language (whose length l is greater than p), there is there some k such that there are strings in the language whose lengths are l + nk for any integer n. That is not the case for the language a2i, since those strings have exponential lengths, so the language cannot be context-free.
Constructing a non-context-free grammar for the language is not that difficult, but I don't know how useful it is.
The following is a Type 0 grammar (i.e. it's not context-sensitive either), but only because of the productions used to get rid of the metacharacters. The basic idea here is that there we put start and end markers around the string ([ and ]) and we have a "duplicator" (&rarrtl;) which moves from left to right doubling the a's; when it hits the end marker, it either turns into a back-shuttle (&larrtl;) or it eats the end-marker and turns into a start-marker-destroyer (&Larr;)
Start: [&rarrtl;a]
&rarrtl;a: aa&rarrtl;
&rarrtl;]: &larrtl;]
&rarrtl;]: &Larr;
a&larrtl;: &larrtl;a
a&Larr;: &Larr;a
[&larrtl;: [&rarrtl;
[&Larr;:

Context sensitive language with non deterministic turing machine

how can i show a language is context sensitive with a non deterministic turing machine?
i know that a language that is accepted by a Linear bound automaton (LBA ) is a context -sensitive language. And a LBA is a non-deterministic turing machine.
Any idea how can i relate all these and show that a language is context sensitive?
As templatetypedef's answer has some flaws (which I will point out in a second in a comment), I give a quick answer to your question:
The language is context sensitive if (and only if) you can give a nondeterministic turing machine using linear space that defines L.
Let L = { a^n b^n a^n } for an arbitrary integer n; a^n here means n concatenations of the symbol a. This is a typical context sensitive language. Instead of giving a CSG, you can give a LBA to show that L is context sensitive:
The turing machine M 'guesses' (thanks to nondeterminism) n [in other words you may say 'every branch of the nondeterministic search tree tries out another n], and then checks whether the input matches a^n b^n a^n. You need log n cells to store n, the matching might need (if implemented trivially) another log n cells. As n + 2log n < 2n, this machine needs only linear space, and is therefore an LBA, hence L is context sensitive.
This is not an exact answer, but since the context-sensitive languages are precisely those accepted by a linear-bounded automaton (a TM with O(n) space on its tape), the context-sensitive languages are precisely those in DSPACE(n). Moreover, we know that NTIME(n) = DSPACE(n). This means that if you can find a linear-time NTM that decides membership in some language L, that language must be context-sensitive. However, there still might be a context-sensitive language that does not have a linear-time NTM (I don't know whether there is a definitive answer to this or whether this is an open problem), so this is not an exact characterization.
Hope this helps!

Equivalence between two automata

Which is the best or easiest method for determining equivalence between two automata?
I.e., if given two finite automata A and B, how can I determine whether both recognize the same language?
They are both deterministic or both nondeterministic.
A different, simpler approach is to complement and intersect the automata. An automaton A is equivalent to B iff L(A) is contained in L(B) and vice versa which is iff the intersection between the complement of L(B) and L(A) is empty and vice versa.
Here is the algorithm for checking if L(A) is contained in L(B):
Complementation: First, determinize B using the subset construction. Then, make every accepting state rejecting and every rejecting state accepting. You get an automaton that recognizes the complement of L(B).
Intersection: Construct an automaton that recognizes the language that is the intersection of the complement of L(B) and L(A). I.e., construct an automaton for the intersection of the automaton from step 1 and A. To intersect two automata U and V you construct an automaton with the states U x V. The automaton moves from state (u,v) to (u',v') with letter a iff there are transitions u --a--> u' in U and v --a--> v' in V. The accepting states are states (u,v) where u is accepting in U and v is accepting in V.
After constructing the automaton in step 2, all that is needed is to check emptiness. I.e., is there a word that the automaton accepts. That's the easiest part -- find a path in the automaton from the initial state to an accepting state using the BFS algorithm.
If L(A) is contained in L(B) we need to run the same algorithm to check if L(B) is contained in L(A).
Two nondeterministic finite automota (NFA's) are equivalent if they accept the same language.
To determine whether they accept the same language, we look at the fact that every NFA has a minimal DFA, where no two states are identical. A minimal DFA is also unique. Thus, given two NFA's, if you find that their corresponding minimal DFA's are equivalent, then the two NFA's must also be equivalent.
For an in-depth study on this topic, I highly recommend that you read An Introduction to Formal Language and Automata.
I am just rephrasing answer by #Guy.
To compare languages accepted by both we have to figure out if L(A) is equal to L(B) or not.
Thus, you have to find out if L(A)-L(B) and L(B)-L(A) is null set or not. (Reason1)
Part1:
To find this out, we construct NFA X from NFA A and NFA B, .
If X is empty set then L(A) = L(B) else L(A) != L(B). (Reason2)
Part2:
Now, we have to find out an efficient way of proving or disproving X is empty set. When will be X empty as DFA or NFA? Answer: X will be empty when there is no path leading from starting state to any of the final state of X. We can use BFS or DFS for this.
Reason1: If both are null set then L(A) = L(B).
Reason2: We can prove that set of regular languages is closed under intersection and union. Thus we will able to create NFA X efficiently.
and for sets:

Why are we using i as a counter in loops? [closed]

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I know this might seem like an absolutely silly question to ask, yet I am too curious not to ask...
Why did "i" and "j" become THE variables to use as counters in most control structures?
Although common sense tells me they are just like X, which is used for representing unknown values, I can't help to think that there must be a reason why everyone gets taught the same way over and over again.
Is it because it is actually recommended for best practices, or a convention, or does it have some obscure reason behind it?
Just in case, I know I can give them whatever name I want and that variables names are not relevant.
It comes ultimately from mathematics: the summation notation traditionally uses i for the first index, j for the second, and so on. Example (from http://en.wikipedia.org/wiki/Summation):
It's also used that way for collections of things, like if you have a bunch of variables x1, x2, ... xn, then an arbitrary one will be known as xi.
As for why it's that way, I imagine SLaks is correct and it's because I is the first letter in Index.
I believe it dates back to Fortran. Variables starting with I through Q were integer by default, the others were real. This meant that I was the first integer variable, and J the second, etc., so they fell towards use in loops.
Mathematicians were using i,j,k to designate integers in algebra (subscripts, series, summations etc) long before (e.g 1836 or 1816) computers were around (this is the origin of the FORTRAN variable type defaults). The habit of using letters from the end of the alphabet (...,x,y,z) for unknown variables and from the beginning (a,b,c...) for constants is generally attributed to Rene Descartes, (see also here) so I assume i,j,k...n (in the middle of the alphabet) for integers is likely due to him too.
i = integer
Comes from Fortran where integer variables had to start with the letters I through N and real variables started with the other letters. Thus I was the first and shortest integer variable name. Fortran was one of the earliest programming languages in widespread use and the habits developed by programmers using it carried over to other languages.
EDIT: I have no problem with the answer that it derives from mathematics. Undoubtedly that is where the Fortran designers got their inspiration. The fact is, for me anyway, when I started to program in Fortran we used I, J, K, ... for loop counters because they were short and the first legally allowed variable names for integers. As a sophomore in H.S. I had probably heard of Descartes (and a very few others), but made very little connection to mathematics when programming. In fact, the first course I took was called "Fortran for Business" and was taught not by the math faculty, but the business/econ faculty.
For me, at least, the naming of variables had little to do with mathematics, but everything due to the habits I picked up writing Fortran code that I carried into other languages.
i stands for Index.
j comes after i.
These symbols were used as matrix indexes in mathematics long before electronic computers were invented.
I think it's most likely derived from index (in the mathematical sense) - it's used commonly as an index in sums or other set-based operations, and most likely has been used that way since before there were programming languages.
There's a preference in maths for using consecutive letters in the alphabet for "anonymous" variables used in a similar way. Hence, not just "i, j, k", but also "f, g, h", "p, q, r", "x, y, z" (rarely with "u, v, w" prepended), and "α, β, γ".
Now "f, g, h" and "x, y, z" are not used freely: the former is for functions, the latter for dimensions. "p, q, r" are also often used for functions.
Then there are other constraints on available sequences: "l" and "o" are avoided, because they look too much like "1" and "0" in many fonts. "t" is often used for time, "d & δ" for differentials, and "a, s, m, v" for the physical measures of acceleration, displacement, mass, and velocity. That leaves not so many gaps of three consecutive letters without unwanted associations in mathematics for indices.
Then, as several others have noticed, conventions from mathematics had a strong influence on early programming conventions, and "α, β, γ" weren't available in many early character sets.
I found another possible answer that could be that i, j, and k come from Hamilton's Quaternions.
Euler picked i for the imaginary unit.
Hamilton needed two more square roots of -1:
ii = jj = kk = ijk = -1
Hamilton was really influential, and quaternions were the standard way to do 3D analysis before 1900. By then, mathematicians were used to thinking of (ijk) as a matched set.
Vector calculus replaced quaternionic analysis in the 1890s because it was a better way to write Maxwell's equations. But people tended to write vector quantities as like this: (3i-2j+k) instead of (3,-2,1). So (ijk) became the standard basis vectors in R^3.
Finally, physicists started using group theory to describe symmetries in systems of differential equations. So (ijk) started to connote "vectors that get swapped around by permutation groups," then drifted towards "index-like things that take on all possible values in some specified set," which is basically what they mean in a for loop.
by discarding (a little biased)
a seems an array
b seems another array
c seems a language name
d seems another language name
e seems exception
f looks bad in combination with "for" (for f, a pickup?)
g seems g force
h seems height
i seems an index
j seems i (another index)
k seems a constant k
l seems a number one (1)
m seems a matrix
n seems a node
o seems an output
p sounds like a pointer
q seems a queue
r seems a return value
s seems a string
t looks like time
u reserved for UVW mapping or electic phase
v reserved for UVW mapping or electic phase or a vector
w reserved for UVW mapping or electic phase or a weight
x seems an axis (or an unknown variable)
y seems an axis
z seems a third axis
One sunny afternoon, Archimedes what pondering (as was usual for sunny afternoons) and ran into his buddy Eratosthenes.
Archimedes said, "Archimedes to Eratosthenes greeting! I'm trying to come up with a solution to the ratio of several spherical rigid bodies in equilibrium. I wish to iterate over these bodies multiple times, but I'm having a frightful time keeping track of how many iterations I've done!"
Eratosthenes said, "Why Archimedes, you ripe plum of a kidder, you could merely mark successive rows of lines in the sand, each keeping track of the number of iterations you've done within iteration!"
Archimedes cried out to the world that his great friend was undeniably a shining beacon of intelligence for coming up with such a simple solution. But Archimedes remarked that he likes to walk in circles around his sand pit while he ponders. Thus, there was risk of losing track of which row was on top, and which was on bottom.
"Perhaps I should mark these rows with a letter of the alphabet just off to the side so that I will always know which row is which! What think you of that?" he asked, then added, "But Eratosthenes... whatever letters shall I use?"
Eratosthenes was sure he didn't know which letters would be best, and said as much to Archimedes. But Archimedes was unsatisfied and continued to prod the poor librarian to choose, at least, the two letters that he would require for his current sphere equilibrium solution.
Eratosthenes, finally tired of the incessant request for two letters, yelled, "I JUST DON'T KNOW!!!"
So Archimedes chose the first two letters in Eratosthenes' exclamatory sentence, and thanked his friend for the contribution.
These symbols were quickly adopted by ancient Greek Java developers, and the rest is, well... history.
i think it's because a lot of loops use an Int type variable to do the counting, like
for (int i = 0; etc
and when you type, you actually speak it out in your head (like when you read), so in your mind, you say 'int....'
and when you have to make up a letter right after that 'int....' , you say / type the 'i' because that is the first letter you think of when you've just said 'int'
like you spell a word to kids who start learning reading you spell words for them by using names, like this:
WORD spells William W, Ok O, Ruby R, Done D
So you say Int I, Double d, Float f, string s etc. based on the first letter.
And j is used because when you have done int I, J follows right after it.
I think it's a combination of the other mentioned reasons :
For starters, 'i' was commonly used by mathematicians in their notation, and in the early days of computing with languages that weren't binary (ie had to be parsed and lexed in some fashion), the vast majority of users of computers were also mathematicians (... and scientists and engineers) so the notation fell into use in computer languages for programming loops, and has kind of just stuck around ever since.
Combine this with the fact that screen space in those very early days was very limited, as was memory, it made sense to keep shorter variable names.
Possibly historical ?
FORTRAN, aurguably the first high level language, defined i,j,k,l,m as Integer datatypes by default, and loops could only be controlled by integer variable, the convention continues ?
eg:
do 100 i= j,100,5
....
100 continue
....
i = iterator, i = index, i = integer
Which ever you figure "i" stands for it still "fits the bill".
Also, unless you have only a single line of code within that loop, you should probably be naming the iterator/index/integer variable to something more meaningful. Like: employeeIndex
BTW, I usually use "i" in my simple iterator loops; unless of course it contains multiple lines of code.
i = iota, j = jot; both small changes.
iota is the smallest letter in the greek alphabet; in the English language it's meaning is linked to small changes, as in "not one iota" (from a phrase in the New Testament: "until heaven and earth pass away, not an iota, not a dot, will pass from the Law" (Mt 5:18)).
A counter represents a small change in a value.
And from iota comes jot (iot), which is also a synonym for a small change.
cf. http://en.wikipedia.org/wiki/Iota
Well from Mathematics: (for latin letters)
a,b: used as constants or as integers for a rational number
c: a constant
d: derivative
e: Euler's number
f,g,h: functions
i,j,k: are indexes (also unit vectors and the quaternions)
l: generally not used. looks like 1
m,n: are rows and columns of matrices or as integers for rational numbers
o: also not used (unless you're in little o notation)
p,q: often used as primes
r: sometimes a spatial change of variable other times related to prime numbers
s,t: spatial and temporal variables or s is used as a change of variable for t
u,v,w: change of variable
x,y,z: variables
Many possible main reasons, I guess:
mathematicians use i and j for Natural Numbers in formulas (the ones that use Complex Numbers rarely, at least), so this carried over to programming
from C, i hints to int. And if you need another int then i2 is just way too long, so you decide to use j.
there are languages where the first letter decides the type, and i is then an integer.
It comes from Fortran, where i,j,k,l,m,n are implicitly integers.
It definitely comes from mathematics, which long preceded computer programming.
So, where did if come from in math? My completely uneducated guess is that it's as one fellow said, mathematicians like to use alphabetic clusters for similar things -- f, g, h for functions; x, y, z for numeric variables; p, q, r for logical variables; u, v, w for other sets of variables, especially in calculus; a, b, c for a lot of things. i, j, k comes in handy for iterative variables, and that about exhausts the possibilities. Why not m, n? Well, they are used for integers, but more often the end points of iterations rather than the iterative variables themselves.
Someone should ask a historian of mathematics.
Counters are so common in programs, and in the early days of computing, everything was at a premium...
Programmers naturally tried to conserve pixels, and the 'i' required fewer pixels than any other letter to represent. (Mathematicians, being lazy, picked it for the same reason - as the smallest glyph).
As stated previously, 'j' just naturally followed...
:)
I use it for a number of reasons.
Usually my loops are int based, so
you make a complete triangle on the
keyboard typing "int i" with the
exception of the space I handle with
my thumb. This is a very fast
sequence to type.
The "i" could stand for iterator, integer, increment, or index, each of which makes
logical sense.
With my personal uses set aside, the theory of it being derived from FORTRAN is correct, where integer vars used letters I - N.
I learned FORTRAN on a Control Data Corp. 3100 in 1965. Variables starting with 'I' through 'N' were implied to be integers. Ex: 'IGGY' and 'NORB' were integers, 'XMAX' and 'ALPHA' were floating-point. However, you could override this through explicit declaration.