How to correct a slightly incorrect DFA for a given correct input string? - finite-automata

I wrote a program that generates DFAs, but the DFAs are slightly incorrect: sometimes they fail to accept strings they should accept.
My question is: is there an algorithm that can correct the DFAs so that they accept the given correct strings?
More formally:
Suppose DFA D doesn't accept string str.
I need an algorithm A such that D' = A(D, str) and D' accepts str.

You could represent the additional string(s) you want to accept as chain automata and then simply take the union of these chains with the DFA D. Afterwards you may also need to determinize the unioned machine.
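A minimal Python sketch of this idea, under stated assumptions: a DFA is encoded as a tuple (start, accepting, delta), where delta is a dict from (state, symbol) to state and any missing entry means an implicit dead state. The helper names `chain_dfa`, `union_dfa`, `accepts` and `repair` are hypothetical. Instead of building an NFA union and then determinizing it, the sketch swaps in the standard product construction, which computes the union of two DFAs and yields a DFA directly; the result accepts exactly L(D) ∪ {str}.

```python
from collections import deque

# Assumed encoding (hypothetical, for this sketch only):
# a DFA is (start, accepting_states, delta), where delta maps
# (state, symbol) -> state. A missing entry means "go to a dead state".

def chain_dfa(word):
    """Chain automaton accepting exactly `word`: states 0..len(word)."""
    delta = {(i, ch): i + 1 for i, ch in enumerate(word)}
    return 0, {len(word)}, delta

def union_dfa(d1, d2, alphabet):
    """Product construction: a pair state accepts if either component accepts.

    None stands for a component that has already fallen into its dead state.
    """
    s1, f1, t1 = d1
    s2, f2, t2 = d2
    start = (s1, s2)
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if p in f1 or q in f2:
            accepting.add((p, q))
        for a in alphabet:
            n1 = t1.get((p, a)) if p is not None else None
            n2 = t2.get((q, a)) if q is not None else None
            if n1 is None and n2 is None:
                continue  # both components dead: leave the transition undefined
            nxt = (n1, n2)
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, accepting, delta

def accepts(dfa, word):
    """Run the DFA; an undefined transition rejects."""
    state, accepting, delta = dfa
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting

def repair(dfa, word, alphabet):
    """D' with L(D') = L(D) ∪ {word}; `alphabet` must cover the symbols of word."""
    return union_dfa(dfa, chain_dfa(word), alphabet)

# Usage: D accepts strings over {a, b} that end in "a", so it rejects "ab".
D = (0, {1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0})
D2 = repair(D, "ab", {"a", "b"})
print(accepts(D, "ab"), accepts(D2, "ab"), accepts(D2, "b"))  # False True False
```

Several strings can be added by calling `repair` repeatedly. The repaired machine can be larger than D, so minimizing it afterwards may be worthwhile.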


Removing or adding the empty word from a DFA

The title is my interpretation of this question. Below is what I have attempted so far:
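Case 1: $\varepsilon \in L(M)$
$L(M_1) = L(M)$
$M_2 = (Q_2, \Sigma_2, q_{2,0}, F_2, \delta_2)$
$Q = \{q_0, \dots, q_i\}$
$Q_2 = \{q_{2,0}, \dots, q_{2,i+1}\}$
$\Sigma_2 = \Sigma$
$q_{2,i+1} \in F_2$ iff $q_i \in F$
$\delta_2(q_{2,i+1}, a) = \delta(q_i, a)$
Case 2: $\varepsilon \notin L(M)$
$L(M_2) = L(M)$
$M_1 = (Q_1, \Sigma_1, q_{1,0}, F_1, \delta_1)$
...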
What you have so far looks good but incomplete. Here's a description of what's left; writing it out in symbols is left as an exercise.
In the first case, if the language already contains the empty string, we're done and can use the automaton for the language directly with no modification. If it does not contain the empty string already, we can add a new initial state and have it transition like the original initial state. If we leave all other transitions alone and make this new initial state accepting, we will accept the empty string as well as anything the original automaton accepted.
In the second case, if the language does not already contain the empty string, we're done and can use the automaton for the language directly with no modification. If it does contain the empty string already, we can add a new initial state and have it transition like the original initial state. If we leave all other transitions alone and don't make this new initial state accepting, we will not accept the empty string but will continue to accept everything else.
This is the best that can be done, in general. However, specific languages might have smaller automata than these constructed after adding or removing the empty string. For instance, the language consisting of only the empty string has a DFA with two states, but a minimal DFA for that language with the empty string removed has one state. Similarly, the language of all non-empty strings has a DFA with two states, but adding the empty string to that language means there's a one-state DFA for that language. So this construction does not always give the smallest DFA possible, but it is guaranteed to work for all cases, including those where there is no smaller DFA for the language (e.g., adding the empty string to the empty language forces the addition of a new state).
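The construction described in this answer can be sketched in Python as follows. This is an illustration under assumptions: the `DFA` dataclass, its field names, the requirement that the transition dict be complete, and the use of integer state names (so that a fresh state can be `max(states) + 1`) are all hypothetical choices for this sketch. `with_empty_word(m, True)` plays the role of $M_1$ and `with_empty_word(m, False)` the role of $M_2$.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: set      # assumed: integer state names
    alphabet: set
    delta: dict      # (state, symbol) -> state, defined for every pair
    start: int
    accepting: set

    def accepts(self, word):
        q = self.start
        for ch in word:
            q = self.delta[(q, ch)]
        return q in self.accepting

def with_empty_word(m, include):
    """M1 (include=True): L(M) ∪ {ε}.  M2 (include=False): L(M) minus {ε}."""
    if (m.start in m.accepting) == include:
        return m  # already right: use the automaton with no modification
    new = max(m.states) + 1  # fresh state, relying on integer state names
    delta = dict(m.delta)
    for a in m.alphabet:
        # the new initial state transitions exactly like the original one
        delta[(new, a)] = m.delta[(m.start, a)]
    accepting = set(m.accepting)
    if include:
        accepting.add(new)  # the new start is accepting only when adding ε
    return DFA(m.states | {new}, m.alphabet, delta, new, accepting)

# Usage: the language of all non-empty strings over {a} (two states).
nonempty = DFA({0, 1}, {"a"}, {(0, "a"): 1, (1, "a"): 1}, 0, {1})
m1 = with_empty_word(nonempty, include=True)
print(m1.accepts(""), m1.accepts("aa"))  # True True
```

No transition leads back into the new initial state, so it only decides whether the empty string is accepted; every non-empty input follows the original machine after its first step.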

How to get the longest string in an array in OpenRefine

With GREL, is it possible to get the longest string in an array?
For example, if I have an array with 3 strings ["a","aaa","aa"], I want to obtain "aaa".
You can probably do it, at the cost of a very complicated formula. Cases like this are exactly why OpenRefine added Python (and Clojure) as scripting languages. Even if you don't know Python, you can find the answer to "how to choose the longest string in a list?" in two minutes and simply copy and paste it (using "return" instead of "print").
In this case:
return max(['a','aaa','aaaa','aa'], key=len)
EDIT
Just for the sake of the challenge, here is a possible solution with GREL.
With the cell value "a,aa,aaaa,aa":
forEach(value.split(','), e, if(length(e)==sort(forEach(value.split(','), e, e.length()))[-1], e, null)).join(',').split(',')
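The GREL formula's test compares each element's length with the largest length (the last element of the sorted list of lengths), so more than one element can pass it, whereas Python's `max(..., key=len)` returns only the first of several equally long strings. A plain-Python sketch of the same filtering idea, run outside OpenRefine; the function name `longest_strings` is hypothetical, and inside OpenRefine's Python editor you would still need a `return` as described above.

```python
def longest_strings(cell):
    """Return every element of a comma-separated cell whose length is maximal."""
    parts = cell.split(",")
    longest = max(len(p) for p in parts)  # same role as sort(...)[-1] in the GREL
    return [p for p in parts if len(p) == longest]

# max(..., key=len) keeps only the first of several equally long strings,
# whereas the filter above keeps all of them.
print(longest_strings("a,aa,aaaa,aa"))     # ['aaaa']
print(longest_strings("ab,cd,e"))          # ['ab', 'cd']
print(max("ab,cd,e".split(","), key=len))  # ab
```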

How to factorize a string to check whether it belongs to the language generated by an alphabet?

Let S = {a, bb, bab, abaab} be an alphabet; its Kleene closure S* contains all possible concatenations of these words.
Does the string abaabbabbaab belong to S*?
What method can I use to factorize it to check whether it is in S* or not?
I have tried the following ways:
Possible factorization:
(abaab)(bab)(b)(a)(a)(b)
(abaab)(bab)(b)(aa)(b)
(abaab)(bab)(ba)(ab)
(abaab)(bab)(baa)(b)
(abaab)(bab)(b)(aab)
We can see that (abaab)(bab) matches, but the remaining part does not match any combination in S*. I have factorized the remaining part in many ways, but it still doesn't match.
I want to ask:
Is it correct?
Is this the correct way to factorize (tokenize) the string?
Are all the factorizations correct?
Is this the correct method to check whether a string belongs to a language or not?
Some of your factorizations contain $(b)$, which is not in $S$, so they are not correct; in fact, every one of them contains at least one factor outside $S$.
I think your method is exhaustive trial and error. If you do that correctly, it is a correct way to find a factorization. For checking membership in a language, it works if the language is given as the Kleene closure of a finite language.
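Trial and error can be organized so that no factorization is missed. The sketch below swaps in dynamic programming (the classic "word break" approach) instead of listing factorizations by hand: position i is reachable if some word of S ends at i and starts at an already reachable position. The function name `factorize` is hypothetical, and it assumes S does not contain the empty word.

```python
def factorize(word, S):
    """Return one factorization of `word` into words of S, or None if word is not in S*.

    parent[i] is the last word of some valid factorization of word[:i];
    positions missing from parent cannot be reached.
    """
    parent = {0: None}
    for i in range(1, len(word) + 1):
        for piece in S:
            j = i - len(piece)
            if j >= 0 and j in parent and word[j:i] == piece:
                parent[i] = piece
                break
    if len(word) not in parent:
        return None
    pieces, i = [], len(word)
    while i > 0:  # walk back through the recorded last pieces
        pieces.append(parent[i])
        i -= len(parent[i])
    return pieces[::-1]

S = ["a", "bb", "bab", "abaab"]
print(factorize("abaabbabbaab", S))  # None
print(factorize("abaabbab", S))      # ['abaab', 'bab']
```

For the string in the question, only positions 0, 1, 5 and 8 can be reached, and no word of S matches at position 1 or 8, which agrees with the conclusion that abaabbabbaab is not in S*.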

What is ambiguity in an alphabet in automata theory?

I am new to the automata field. I have read many articles and watched many videos, but I am stuck on one of the first topics. It may be easy for others, but after spending a lot of time, I am still unable to understand it.
The topic is: ambiguity in an alphabet.
The alphabet is {A, Aa, bab, d}
and the string is s = AababA.
The author says that this is an ambiguous alphabet because a computer reads the string from left to right: after the capital A comes a small a, and since A is also a prefix of the letter Aa, this creates ambiguity. A letter (symbol) should not be a prefix of another letter.
Moreover, the author says
we can tokenize AababA in two ways:
(Aa) (bab) (A)
(A) (abab) (A)
The first one is OK; the second is not, due to the ambiguity in the alphabet defined above.
What is the procedure for tokenizing the above string in two ways? Is there a specific rule?
How is the alphabet ambiguous because of the second group?
If it is invalid because A is a prefix, then how? What role do prefixes play in the ambiguity of an alphabet?
If we ignore prefixes and simply match both groups against the alphabet above, we can easily see that the second one does not match the alphabet, so why do we need to discuss prefixes at all?
I hope this question will be considered, so that an answer can help me out of this confusion. I will be very thankful.
The author chose a confusing example. If you share the source of this example, I could give a better answer, but I would argue that in this case there is no practical ambiguity: if you see Aa, you know that the first lexeme must be "Aa", because nothing in the alphabet starts with "a".
For an easier example, consider the alphabet {A, a, Aa} and the string "AAaAaaA".
You could tokenize this in the following ways:
(A) (A) (a) (A) (a) (a) (A)
(A) (Aa) (A) (a) (a) (A)
(A) (A) (a) (Aa) (a) (A)
(A) (Aa) (Aa) (a) (A)
This is most often resolved by choosing the longest lexeme that matches in each case, which would yield the last tokenization.
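To list every tokenization mechanically rather than by hand, a small recursive generator is enough: try each symbol that is a prefix of the remaining string and recurse on the rest. `all_tokenizations` is a hypothetical helper, and it assumes no symbol is the empty string. For "AAaAaaA" over {A, a, Aa} it yields exactly four tokenizations, because each of the two occurrences of "Aa" can be taken as one symbol or split into (A) (a); for "AababA" over {A, Aa, bab, d} it yields only (Aa) (bab) (A).

```python
def all_tokenizations(s, alphabet):
    """Yield every way of splitting s into a sequence of symbols from alphabet."""
    if not s:
        yield []
        return
    for sym in sorted(alphabet):
        if s.startswith(sym):
            for rest in all_tokenizations(s[len(sym):], alphabet):
                yield [sym] + rest

for t in all_tokenizations("AAaAaaA", {"A", "a", "Aa"}):
    print(" ".join(f"({x})" for x in t))
```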
Now let us return to your example, but let's make the string a little bit different: "AababAe".
You could tokenize the string in the following ways:
(Aa) (bab) (A) <error>
(A) <error>
In one branch, the tokenizer fails right after (A); in the other, it gets through (Aa) (bab) (A) and only fails on "e". As you noted, the tokenizer should choose the first, even though both branches end in errors. The point is that there is an explicit choice here to prefer the longest valid tokenization. Nothing in the alphabet forces you to make this choice. It is just as valid to choose the shortest matching option. This would be massively impractical, but it is a valid choice.
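The longest-match rule, and the shortest-match alternative, can be sketched as a greedy left-to-right tokenizer. This is an illustration, not a production lexer: `tokenize` and its `longest` flag are hypothetical names, the tokenizer commits to the first match at each position without backtracking, and it assumes no symbol is the empty string.

```python
def tokenize(s, alphabet, longest=True):
    """Greedy left-to-right tokenizer.

    At each position, try symbols from longest to shortest (longest=True,
    i.e. maximal munch) or from shortest to longest (longest=False) and
    commit to the first match. Returns (tokens, rest); a non-empty rest is
    the unmatched suffix where the tokenizer hit an error.
    """
    order = sorted(alphabet, key=len, reverse=longest)
    tokens, i = [], 0
    while i < len(s):
        for sym in order:
            if s.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            return tokens, s[i:]  # no symbol matches here: error
    return tokens, ""

alphabet = {"A", "Aa", "bab", "d"}
print(tokenize("AababAe", alphabet))                 # (['Aa', 'bab', 'A'], 'e')
print(tokenize("AababAe", alphabet, longest=False))  # (['A'], 'ababAe')
```

With the longest-match rule, "AAaAaaA" over {A, a, Aa} comes out as (A) (Aa) (Aa) (a) (A); with the shortest-match rule it comes out as seven single-letter tokens.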

Ambiguous grammar?

Hi,
there is this question in the book:
Given this grammar
A --> AA | (A) | epsilon
a- What does it generate?
b- Show that it is ambiguous.
Now, the answers I thought of are:
a- adjacent parentheses
b- It generates different parse trees, so it is ambiguous, and I drew a diagram showing two scenarios.
Is this right, or is there a better answer?
a is almost correct.
The grammar really generates (), ()(), ()()(), … sequences.
But due to the second rule, it can also generate nested strings such as (()), ()((())), etc.
b is not correct.
This grammar is ambiguous due to immediate left recursion: A → AA.
How to avoid left recursion: one, two.
a) Nearly right...
This grammar generates exactly the set of strings of balanced parentheses. To see why, let's sketch a quick proof.
First: every string generated by your grammar is a balanced-parentheses string. Why? Simple induction:
Epsilon is a balanced (empty) parentheses string.
If A is a balanced parentheses string, then (A) is also balanced.
If A1 and A2 are balanced, so is A1A2 (I'm using two different identifiers just to make explicit that A -> AA doesn't necessarily produce the same string for each A).
Second: every balanced string is produced by your grammar. Let's prove it by induction on the length of the string.
If the string is empty, it must be Epsilon.
If not, let N be the length of the string and M the length of its shortest non-empty balanced prefix (note that the rest of the string is then also balanced):
If M = N, the string has the form (u) with u balanced, so you can produce it with A -> (A), generating u with the inner A.
If M < N, then you can produce it with A -> AA, generating the first M characters with the first A and the last N - M with the second A.
In either case, you only have to produce strings shorter than N characters, so by induction you can do that. QED.
For example: (()())(())
We can generate this string using exactly the idea of the demonstration.
A -> AA -> (A)A -> (AA)A -> ((A)(A))A -> (()())A -> (()())(A) -> (()())((A)) -> (()())(())
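The second half of the proof is constructive, so it can be turned into a parser. A minimal Python sketch, assuming the input contains only "(" and ")": it always splits at the shortest non-empty balanced prefix, exactly as in the proof, and returns a nested tuple recording which rule was used at each node. The function names and the tuple encoding of the tree are hypothetical.

```python
def shortest_balanced_prefix(s):
    """Length of the shortest non-empty balanced prefix of s, or None if there is none."""
    depth = 0
    for i, ch in enumerate(s, start=1):
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return None
        if depth == 0:
            return i
    return None

def parse(s):
    """Parse tree for A -> AA | (A) | ε built as in the proof, or None if s is not balanced."""
    if any(ch not in "()" for ch in s):
        return None
    if s == "":
        return "ε"                                # A -> epsilon
    m = shortest_balanced_prefix(s)
    if m is None:
        return None
    if m == len(s):                               # s = (u): use A -> (A)
        inner = parse(s[1:-1])
        return None if inner is None else ("(A)", inner)
    left, right = parse(s[:m]), parse(s[m:])      # use A -> AA
    if left is None or right is None:
        return None
    return ("AA", left, right)

print(parse("(()())(())"))
```

For (()())(()) the root uses A -> AA and splits the string into (()()) and (()), mirroring the derivation above; unbalanced inputs such as )( or (() return None.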
b) Of course, a rule that is both left- and right-recursive, like A -> AA, is enough to say it's ambiguous, but to see why this grammar in particular is ambiguous, follow the same idea as in the proof:
It is ambiguous because you don't have to take the shortest balanced prefix. You could take the longest one (or, in general, any non-empty balanced prefix shorter than the whole string), and the proof (and the generation) would follow the same process.
Ex: (())()()
You can choose A -> AA and generate either the (()) substring or the (())() substring with the first A.
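The choice of split point can be made concrete: for a balanced string, every non-empty proper prefix that is balanced gives a different top-level use of A -> AA, and therefore a different parse tree. A small sketch, assuming the input is already balanced (`balanced_splits` is a hypothetical name):

```python
def balanced_splits(s):
    """Lengths M such that s[:M] is a non-empty proper balanced prefix.

    Assumes s itself is balanced, so the suffix s[M:] is balanced as well.
    """
    depth, splits = 0, []
    for i, ch in enumerate(s[:-1], start=1):
        depth += 1 if ch == "(" else -1
        if depth == 0:
            splits.append(i)
    return splits

s = "(())()()"
for m in balanced_splits(s):
    # each split is a different top-level use of A -> AA
    print(s[:m], "|", s[m:])
# (()) | ()()
# (())() | ()
```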
Yes, you are right.
That is what an ambiguous grammar means.
The problem with ambiguous grammars is that if you are writing a compiler and you want to identify each token in a certain line of code (or something like that), the ambiguity will get in the way, because you will have "two explanations" for that line of code.
It sounds like your approach for part B is correct: showing two distinct parse trees (equivalently, two distinct leftmost derivations) for the same string in the language defined by the grammar.
However, I think your answer to part A needs a little work. Clearly you can use the second clause recursively to obtain strings like (((((epsilon))))), but there are other types of derivations possible using the first clause and second clause together.