Context sensitive language with non deterministic turing machine - grammar

how can i show a language is context sensitive with a non deterministic turing machine?
i know that a language that is accepted by a Linear bound automaton (LBA ) is a context -sensitive language. And a LBA is a non-deterministic turing machine.
Any idea how can i relate all these and show that a language is context sensitive?

As templatetypedef's answer has some flaws (which I will point out in a second in a comment), I give a quick answer to your question:
The language is context sensitive if (and only if) you can give a nondeterministic turing machine using linear space that defines L.
Let L = { a^n b^n a^n } for an arbitrary integer n; a^n here means n concatenations of the symbol a. This is a typical context sensitive language. Instead of giving a CSG, you can give a LBA to show that L is context sensitive:
The turing machine M 'guesses' (thanks to nondeterminism) n [in other words you may say 'every branch of the nondeterministic search tree tries out another n], and then checks whether the input matches a^n b^n a^n. You need log n cells to store n, the matching might need (if implemented trivially) another log n cells. As n + 2log n < 2n, this machine needs only linear space, and is therefore an LBA, hence L is context sensitive.

This is not an exact answer, but since the context-sensitive languages are precisely those accepted by a linear-bounded automaton (a TM with O(n) space on its tape), the context-sensitive languages are precisely those in DSPACE(n). Moreover, we know that NTIME(n) = DSPACE(n). This means that if you can find a linear-time NTM that decides membership in some language L, that language must be context-sensitive. However, there still might be a context-sensitive language that does not have a linear-time NTM (I don't know whether there is a definitive answer to this or whether this is an open problem), so this is not an exact characterization.
Hope this helps!

Related

Context-free grammar for L = { b^n c^n a^n , n>=1}

I have a language L, which is defined as: L = { b^n c^n a^n , n>=1}
The corresponding grammar would be able to create words such as:
bca
bbccaa
bbbcccaaa
...
How would such a grammar look like? Making two variables dependent of each other is relatively simple, but I have trouble with doing it for three.
Thanks in advance!
L = { b^n c^n a^n , n>=1}
As pointed out in the comments, this is a canonical example of a language which is not context free. It can be shown using the pumping lemma for context-free languages. Basically, consider a string like b^p c^p a^p where p is the pumping length and then show no matter what part you pump, you will throw off the balance (basically, the size of the part that's pumped is less than p, so it cannot "span" all three symbols to keep them in sync).
L = {a^m b^n c^n a^(m+n) |m ≥ 0,n ≥ 1}
As suggested in the comments, this is not context free either. It can be shown using the pumping lemma for context-free languages as well. However, given a proof (or acceptance) of the above, there is an easier way. Recall that the intersection of a regular language and a context-free language must be context free. Assume L is context-free. Then so must be its intersection with the regular language (b+c)(b+c)* a*. However, that intersection can be expressed as b^n c^n a^n (since m is forced to be zero), which we know is not context-free, a contradiction. Therefore, our assumption was wrong and L is not context free either.

How to show that if the language L is regular, then L' is regular?

Let L be any regular language and a ∈ Σ. How to show that the language L'={uav | uv ∈ L} is regular too?
Wikipedia says a way to proove it is to lead it back to a regular language but I don't understand how to do that in this case. Hope somebody can help.
There are lots of ways to show this. I think an argument whereby we construct a DFA is particularly easy to visualize.
Imagine a DFA for your language L. Let's call it M. Imagine it sprawled out in diagram form on a table. Now, imagine making a copy of M and spreading it out next to M on the table. Call it M'.
Now - from M, add a new transition from state q of M to the corresponding state q' of M'. The transition is on the symbol a.
Now, consider the aggregate machine whose start state is the start state of M and whose accepting states are the accepting states of M'. This machine starts out accepting strings in L, then accepts an a somewhere in the middle, and then continues accepting strings in L from where it left off. This is the language we were going for and we have defined a perfectly reasonable NFA for it. Since any language accepted by an NFA is regular, our language is regular.

Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular?

Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular?
I believe the answer is NO, but it is hard to make a formal demonstration of it, even a NON formal demonstration.
Any ideas on how to approach this?
It's clearly not regular. How is an FA going to recognize (abc)^n c (cba)^n. Strings like this are in your language, right? The argument is a simple one based on the fact that there are infinitely many equivalence classes under the indistinguishability relation I_l.
The most common way to prove a language is NOT regular is using on of the Pumping Lemmas.
Using the lemma is a little tricky, since it has all those "exists" and so on. To prove a language L is not regular using the pumping lemma you have to prove that
for any integer p,
there is a word w in L of length n, with n>=p, such that
for all possible ways to decompose w as xyz, with len(xy) <= p and y non empty
there exists an i such that x(y^i)z (repeating the y bit i times) is NOT in L
whooo!
I'l l show how the proof looks for the "same number of as and bs" language. It should be straighfoward to convert to your case:
for any given p, we can make a word of length n = 2*p
a^p b^p (p a's followed by p b's)
any way you decompose this into xyz w/ |xy| <=p, y will only contain a's.
Thus, pumping the the y part will make the word have more as than bs,
thus NOT belonging to L.
If you need intuition on why this works, it follows from how you need to be able to count to arbritrarily large numbers to verify if a word belongs to one of these languages. However, Regular Languages are described by finite automata and no finite automata can represent the infinite ammount of states required to represent all the numbers. (The Wikipedia article should have a formal proof).
EDIT: It looks like you can't straight up use the pumping lemma in this particular case directly: if you always make y be one character long you can never make a word stop being accepted (aba becoming abbbba makes no difference and so on).
Just do the equivalence class approach suggested by Patrick87 - it will probably turn out to be cleaner than any of the dirty hacks you would need to do to make the pumping lemma applicable here.

Equivalence between two automata

Which is the best or easiest method for determining equivalence between two automata?
I.e., if given two finite automata A and B, how can I determine whether both recognize the same language?
They are both deterministic or both nondeterministic.
A different, simpler approach is to complement and intersect the automata. An automaton A is equivalent to B iff L(A) is contained in L(B) and vice versa which is iff the intersection between the complement of L(B) and L(A) is empty and vice versa.
Here is the algorithm for checking if L(A) is contained in L(B):
Complementation: First, determinize B using the subset construction. Then, make every accepting state rejecting and every rejecting state accepting. You get an automaton that recognizes the complement of L(B).
Intersection: Construct an automaton that recognizes the language that is the intersection of the complement of L(B) and L(A). I.e., construct an automaton for the intersection of the automaton from step 1 and A. To intersect two automata U and V you construct an automaton with the states U x V. The automaton moves from state (u,v) to (u',v') with letter a iff there are transitions u --a--> u' in U and v --a--> v' in V. The accepting states are states (u,v) where u is accepting in U and v is accepting in V.
After constructing the automaton in step 2, all that is needed is to check emptiness. I.e., is there a word that the automaton accepts. That's the easiest part -- find a path in the automaton from the initial state to an accepting state using the BFS algorithm.
If L(A) is contained in L(B) we need to run the same algorithm to check if L(B) is contained in L(A).
Two nondeterministic finite automota (NFA's) are equivalent if they accept the same language.
To determine whether they accept the same language, we look at the fact that every NFA has a minimal DFA, where no two states are identical. A minimal DFA is also unique. Thus, given two NFA's, if you find that their corresponding minimal DFA's are equivalent, then the two NFA's must also be equivalent.
For an in-depth study on this topic, I highly recommend that you read An Introduction to Formal Language and Automata.
I am just rephrasing answer by #Guy.
To compare languages accepted by both we have to figure out if L(A) is equal to L(B) or not.
Thus, you have to find out if L(A)-L(B) and L(B)-L(A) is null set or not. (Reason1)
Part1:
To find this out, we construct NFA X from NFA A and NFA B, .
If X is empty set then L(A) = L(B) else L(A) != L(B). (Reason2)
Part2:
Now, we have to find out an efficient way of proving or disproving X is empty set. When will be X empty as DFA or NFA? Answer: X will be empty when there is no path leading from starting state to any of the final state of X. We can use BFS or DFS for this.
Reason1: If both are null set then L(A) = L(B).
Reason2: We can prove that set of regular languages is closed under intersection and union. Thus we will able to create NFA X efficiently.
and for sets:

Why are we using i as a counter in loops? [closed]

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I know this might seem like an absolutely silly question to ask, yet I am too curious not to ask...
Why did "i" and "j" become THE variables to use as counters in most control structures?
Although common sense tells me they are just like X, which is used for representing unknown values, I can't help to think that there must be a reason why everyone gets taught the same way over and over again.
Is it because it is actually recommended for best practices, or a convention, or does it have some obscure reason behind it?
Just in case, I know I can give them whatever name I want and that variables names are not relevant.
It comes ultimately from mathematics: the summation notation traditionally uses i for the first index, j for the second, and so on. Example (from http://en.wikipedia.org/wiki/Summation):
It's also used that way for collections of things, like if you have a bunch of variables x1, x2, ... xn, then an arbitrary one will be known as xi.
As for why it's that way, I imagine SLaks is correct and it's because I is the first letter in Index.
I believe it dates back to Fortran. Variables starting with I through Q were integer by default, the others were real. This meant that I was the first integer variable, and J the second, etc., so they fell towards use in loops.
Mathematicians were using i,j,k to designate integers in algebra (subscripts, series, summations etc) long before (e.g 1836 or 1816) computers were around (this is the origin of the FORTRAN variable type defaults). The habit of using letters from the end of the alphabet (...,x,y,z) for unknown variables and from the beginning (a,b,c...) for constants is generally attributed to Rene Descartes, (see also here) so I assume i,j,k...n (in the middle of the alphabet) for integers is likely due to him too.
i = integer
Comes from Fortran where integer variables had to start with the letters I through N and real variables started with the other letters. Thus I was the first and shortest integer variable name. Fortran was one of the earliest programming languages in widespread use and the habits developed by programmers using it carried over to other languages.
EDIT: I have no problem with the answer that it derives from mathematics. Undoubtedly that is where the Fortran designers got their inspiration. The fact is, for me anyway, when I started to program in Fortran we used I, J, K, ... for loop counters because they were short and the first legally allowed variable names for integers. As a sophomore in H.S. I had probably heard of Descartes (and a very few others), but made very little connection to mathematics when programming. In fact, the first course I took was called "Fortran for Business" and was taught not by the math faculty, but the business/econ faculty.
For me, at least, the naming of variables had little to do with mathematics, but everything due to the habits I picked up writing Fortran code that I carried into other languages.
i stands for Index.
j comes after i.
These symbols were used as matrix indexes in mathematics long before electronic computers were invented.
I think it's most likely derived from index (in the mathematical sense) - it's used commonly as an index in sums or other set-based operations, and most likely has been used that way since before there were programming languages.
There's a preference in maths for using consecutive letters in the alphabet for "anonymous" variables used in a similar way. Hence, not just "i, j, k", but also "f, g, h", "p, q, r", "x, y, z" (rarely with "u, v, w" prepended), and "α, β, γ".
Now "f, g, h" and "x, y, z" are not used freely: the former is for functions, the latter for dimensions. "p, q, r" are also often used for functions.
Then there are other constraints on available sequences: "l" and "o" are avoided, because they look too much like "1" and "0" in many fonts. "t" is often used for time, "d & δ" for differentials, and "a, s, m, v" for the physical measures of acceleration, displacement, mass, and velocity. That leaves not so many gaps of three consecutive letters without unwanted associations in mathematics for indices.
Then, as several others have noticed, conventions from mathematics had a strong influence on early programming conventions, and "α, β, γ" weren't available in many early character sets.
I found another possible answer that could be that i, j, and k come from Hamilton's Quaternions.
Euler picked i for the imaginary unit.
Hamilton needed two more square roots of -1:
ii = jj = kk = ijk = -1
Hamilton was really influential, and quaternions were the standard way to do 3D analysis before 1900. By then, mathematicians were used to thinking of (ijk) as a matched set.
Vector calculus replaced quaternionic analysis in the 1890s because it was a better way to write Maxwell's equations. But people tended to write vector quantities as like this: (3i-2j+k) instead of (3,-2,1). So (ijk) became the standard basis vectors in R^3.
Finally, physicists started using group theory to describe symmetries in systems of differential equations. So (ijk) started to connote "vectors that get swapped around by permutation groups," then drifted towards "index-like things that take on all possible values in some specified set," which is basically what they mean in a for loop.
by discarding (a little biased)
a seems an array
b seems another array
c seems a language name
d seems another language name
e seems exception
f looks bad in combination with "for" (for f, a pickup?)
g seems g force
h seems height
i seems an index
j seems i (another index)
k seems a constant k
l seems a number one (1)
m seems a matrix
n seems a node
o seems an output
p sounds like a pointer
q seems a queue
r seems a return value
s seems a string
t looks like time
u reserved for UVW mapping or electic phase
v reserved for UVW mapping or electic phase or a vector
w reserved for UVW mapping or electic phase or a weight
x seems an axis (or an unknown variable)
y seems an axis
z seems a third axis
One sunny afternoon, Archimedes what pondering (as was usual for sunny afternoons) and ran into his buddy Eratosthenes.
Archimedes said, "Archimedes to Eratosthenes greeting! I'm trying to come up with a solution to the ratio of several spherical rigid bodies in equilibrium. I wish to iterate over these bodies multiple times, but I'm having a frightful time keeping track of how many iterations I've done!"
Eratosthenes said, "Why Archimedes, you ripe plum of a kidder, you could merely mark successive rows of lines in the sand, each keeping track of the number of iterations you've done within iteration!"
Archimedes cried out to the world that his great friend was undeniably a shining beacon of intelligence for coming up with such a simple solution. But Archimedes remarked that he likes to walk in circles around his sand pit while he ponders. Thus, there was risk of losing track of which row was on top, and which was on bottom.
"Perhaps I should mark these rows with a letter of the alphabet just off to the side so that I will always know which row is which! What think you of that?" he asked, then added, "But Eratosthenes... whatever letters shall I use?"
Eratosthenes was sure he didn't know which letters would be best, and said as much to Archimedes. But Archimedes was unsatisfied and continued to prod the poor librarian to choose, at least, the two letters that he would require for his current sphere equilibrium solution.
Eratosthenes, finally tired of the incessant request for two letters, yelled, "I JUST DON'T KNOW!!!"
So Archimedes chose the first two letters in Eratosthenes' exclamatory sentence, and thanked his friend for the contribution.
These symbols were quickly adopted by ancient Greek Java developers, and the rest is, well... history.
i think it's because a lot of loops use an Int type variable to do the counting, like
for (int i = 0; etc
and when you type, you actually speak it out in your head (like when you read), so in your mind, you say 'int....'
and when you have to make up a letter right after that 'int....' , you say / type the 'i' because that is the first letter you think of when you've just said 'int'
like you spell a word to kids who start learning reading you spell words for them by using names, like this:
WORD spells William W, Ok O, Ruby R, Done D
So you say Int I, Double d, Float f, string s etc. based on the first letter.
And j is used because when you have done int I, J follows right after it.
I think it's a combination of the other mentioned reasons :
For starters, 'i' was commonly used by mathematicians in their notation, and in the early days of computing with languages that weren't binary (ie had to be parsed and lexed in some fashion), the vast majority of users of computers were also mathematicians (... and scientists and engineers) so the notation fell into use in computer languages for programming loops, and has kind of just stuck around ever since.
Combine this with the fact that screen space in those very early days was very limited, as was memory, it made sense to keep shorter variable names.
Possibly historical ?
FORTRAN, aurguably the first high level language, defined i,j,k,l,m as Integer datatypes by default, and loops could only be controlled by integer variable, the convention continues ?
eg:
do 100 i= j,100,5
....
100 continue
....
i = iterator, i = index, i = integer
Which ever you figure "i" stands for it still "fits the bill".
Also, unless you have only a single line of code within that loop, you should probably be naming the iterator/index/integer variable to something more meaningful. Like: employeeIndex
BTW, I usually use "i" in my simple iterator loops; unless of course it contains multiple lines of code.
i = iota, j = jot; both small changes.
iota is the smallest letter in the greek alphabet; in the English language it's meaning is linked to small changes, as in "not one iota" (from a phrase in the New Testament: "until heaven and earth pass away, not an iota, not a dot, will pass from the Law" (Mt 5:18)).
A counter represents a small change in a value.
And from iota comes jot (iot), which is also a synonym for a small change.
cf. http://en.wikipedia.org/wiki/Iota
Well from Mathematics: (for latin letters)
a,b: used as constants or as integers for a rational number
c: a constant
d: derivative
e: Euler's number
f,g,h: functions
i,j,k: are indexes (also unit vectors and the quaternions)
l: generally not used. looks like 1
m,n: are rows and columns of matrices or as integers for rational numbers
o: also not used (unless you're in little o notation)
p,q: often used as primes
r: sometimes a spatial change of variable other times related to prime numbers
s,t: spatial and temporal variables or s is used as a change of variable for t
u,v,w: change of variable
x,y,z: variables
Many possible main reasons, I guess:
mathematicians use i and j for Natural Numbers in formulas (the ones that use Complex Numbers rarely, at least), so this carried over to programming
from C, i hints to int. And if you need another int then i2 is just way too long, so you decide to use j.
there are languages where the first letter decides the type, and i is then an integer.
It comes from Fortran, where i,j,k,l,m,n are implicitly integers.
It definitely comes from mathematics, which long preceded computer programming.
So, where did if come from in math? My completely uneducated guess is that it's as one fellow said, mathematicians like to use alphabetic clusters for similar things -- f, g, h for functions; x, y, z for numeric variables; p, q, r for logical variables; u, v, w for other sets of variables, especially in calculus; a, b, c for a lot of things. i, j, k comes in handy for iterative variables, and that about exhausts the possibilities. Why not m, n? Well, they are used for integers, but more often the end points of iterations rather than the iterative variables themselves.
Someone should ask a historian of mathematics.
Counters are so common in programs, and in the early days of computing, everything was at a premium...
Programmers naturally tried to conserve pixels, and the 'i' required fewer pixels than any other letter to represent. (Mathematicians, being lazy, picked it for the same reason - as the smallest glyph).
As stated previously, 'j' just naturally followed...
:)
I use it for a number of reasons.
Usually my loops are int based, so
you make a complete triangle on the
keyboard typing "int i" with the
exception of the space I handle with
my thumb. This is a very fast
sequence to type.
The "i" could stand for iterator, integer, increment, or index, each of which makes
logical sense.
With my personal uses set aside, the theory of it being derived from FORTRAN is correct, where integer vars used letters I - N.
I learned FORTRAN on a Control Data Corp. 3100 in 1965. Variables starting with 'I' through 'N' were implied to be integers. Ex: 'IGGY' and 'NORB' were integers, 'XMAX' and 'ALPHA' were floating-point. However, you could override this through explicit declaration.