Construct a minimal DFA

Construct a minimal DFA - automation

Construct a DFA for the set of string over {a, b} such that when length of the substring "aba" is divided by 3 leaves remainder 2 i.e, |w|aba = 2(mod 3).

Related

How can I prove this language is regular?

I'm trying to prove if this language:
L = { w={0,1}* | #0(w) % 3 = 0 } (number of 0's is divisble by 3)
is regular using the pumping lemma, but I can't find a way to do it. All other examples I got, have a simple form or let's say a more defined form such as w = axbycz etc.

I don't think you can use pumping lemma to prove that a language is regular. To prove a language is regular, you just need to give a regular expression or a DFA. In this case the regular expression is quite easy:
1*(01*01*01*)*
(proof: the regular expression clearly does not accept any string which has the number of 0's not divisible by 3, so we just need to prove that all possible strings which has the number of 0's divisible by 3 is accepted by this regular expression, which can be done by confirming that for strings that contain 3n 0's, the regular expression matches it since 1n001n101n201n3...01n3n-201n3n-101n3n has the same number of 0's and the nk's can be substituted so that it matches the string, and that this format is clearly accepted by the regular expression)
Pumping lemma cannot be used to prove that a language is regular because we cannot set the y as in Daniel Martin's answer. Here is a counter-example, in a similar format as his answer (please correct me if I'm doing something fundamentally different from his answer):
We prove that the language L = {w=0n1p | n ∈ N, n>0, p is prime} is regular using pumping lemma as follows: note that there is at least one occurrence of 0, so we take y as 0, and we have xykz = 0n+k-11p, which still satisfy the language definition. Therefore L is regular.
But this is false, since we know that a sequence with prime-numbered length is not regular. The problem here is we cannot just set y to any character.

Any string in this language with at least three characters in it has this property: either the string has a "1" in it, or there are three "0"s in a row.
If the string contains a 1, then you can split it as in the pumping lemma and set y equal to some 1 in the string. Then obviously the strings xyz, xyyz, xyyyz, etc. are all in the language because all those strings have the same number of zeros.
If the string does not contain a 1, it contains three 0s in a row. Setting y to those three 0s, it should be obvious that xyz, xyyz, xyyyz, etc. are all in the language because you're adding three 0 characters each time, so you always have a number of 0s divisible by 3.
#justhalf in the comments is perfectly correct; the pumping lemma can be used to prove that a regular language can be pumped or that a language that cannot be pumped is not regular, but you cannot use the pumping lemma to prove that a language is regular in the first place. Mea Culpa.
Instead, here's a proof that the given language is regular based on the Myhill-Nerode Theorem:
Consider the set of all strings of 0s and 1s. Divide these strings into three sets:
E0, all strings such that the number of 0s is a multiple of three,
E1, all strings such that the number of 0s is one more than a multiple of three,
E2, all strings such that the number of 0s is two more than a multiple of three.
Obviously, every string of 0s and 1s is in one of these three sets.
Furthermore, if x and z are both strings of 0s and 1s, then consider what it means if the concatenation xz is in L:
If x is in E0, then xz is in L if and only if z is in E0
If x is in E1, then xz is in L if and only if z is in E2
If x is in E2, then xz is in L if and only if z is in E1
Therefore, in the language of the theorem, there is no distinguishing extension for any two strings in the same one of our three Ei sets, and therefore there are at most three equivalence classes. A finite number of equivalence classes means the language is regular.
(in fact, there are exactly three equivalence classes, but that isn't needed)

A language is regular if and only if some nondeterministic finite automaton recognizes it.
Automaton is a finite state machine.
We have to build an automaton that regonizes L.
For each state, thinking like:
"Where am I?"
"Where can I go to, with some given entry?"
So, for L = { w={0,1}* | #0(w) % 3 = 0 }
The possibilites (states) are:
The remainder (rest of division) is 0, 1 or 2. Which means we need three states.
Let q0,q1 and q2 be the states that represent the remainderes 0,1 and 2, respectively.
q0 is the start and final state.
Now, for "0" entries, do the math #0(w)%3 and go to the aproppriated state.
Transion functions:
f(q0, 0) = q1
f(q1, 0) = q2
f(q2, 0) = q0
For "1" entries, it just loops wherever it is, 'cause it doesn't change the machine state.
f(qx, 1) = qx
The pumping lemma proves if some language is not regular.
Here is a good book for theory of computation: Introduction to the Theory of Computation 3rd Edition
by Michael Sipser.

Minimal set of test cases with modified condition/decision coverage

I have a question regarding modified condition/decision coverage that I can't figure out.
So if I have the expression ((A || B) && C) and the task is with a minimal number of test cases receive 100% MD/DC.
I break it down into two parts with the minimal number of test cases for (A || B) and (X && C).
(A || B) : {F, F} = F, {F, T} = T, {T, -} = T
(X && C) : {F, -} = F, {T, F} = F, {T, T} = T
The '-' means that it doesn't matter which value they are since they won't be evaluated by the compiler.
So when I combine these I get this as my minimal set of test cases:
((A || B) && C) : {{F, F}, -} = F, {{F, T}, F} = F, {{T, -}, T} = T
But when I google it this is also in the set: {{F, T}, T} = T
Which I do not agree on because I tested the parts of this set separately in the other tests, didn't I?
So I seem to miss what the fourth test case adds to the set and it would be great if someone could explain why I must have it?

First of all, a little explanation on MCDC
To achieve MCDC you need to find at least one Test Pair for each condition in your boolean expression that fulfills the MCDC criterion.
At the moment there are 3 types of MCDC defined and approved by certification bodies (e.g. FAA)
"Unique Cause" – MCDC (original definition): Only one, specifically the influencing condition may change between the two test values of the test pair. The resulting decision, the result of the boolean expression, must also be different for the 2 test values of the test pair. All other conditions in the test pair must be fixed and unchanged.
"Masking" – MCDC”: Only one condition of the boolean expression having an influence on the outcome of the decision may change. Other conditions may change as well, but must be masked. Masked means, using the boolean logic in the expression, they will not have an influence on the outcome. If the left side of an AND is FALSE, then the complete rightside expression and sub expressions do not matter. They are masked. Masking is a relaxation of the “Unique Cause”MCDC.
"Unique Cause + Masking" – MCDC. This is a hybrid and especially needed for boolean expressions with strongly coupled conditions like for example in “ab+ac”. It is not possible to find a test pair for condition “a” the fulfills "Unique Cause" – MCDC. So, we can relax the original definition, and allow masking for strongly coupled conditions.
With these 2 additional definitions much more test pairs can be found.
Please note additionally that, when using languages with a boolean short cut evaluation strategy (like Java, C, C++), there are even more test pairs fulfilling MCDC criteria.
Very important is to understand that a BlackBox view on a truth table does not allow to find any kind of Masking or boolean short cut evaluation.
MCDC is a structural coverage metric and so a WhiteBox View on the boolean expression is absolutely mandatory.
So, now to your expression: “(a+b)c”. A brute force analysis with boolean shortcut evaluation will give you the following test pairs for your 3 conditions a,b,c:
Influencing Condition: 'a' Pair: 0, 5 Unique Cause
Influencing Condition: 'a' Pair: 0, 7 Unique Cause
Influencing Condition: 'a' Pair: 1, 5 Unique Cause
Influencing Condition: 'a' Pair: 1, 7 Unique Cause
Influencing Condition: 'b' Pair: 0, 3 Unique Cause
Influencing Condition: 'b' Pair: 1, 3 Unique Cause
Influencing Condition: 'c' Pair: 2, 3 Unique Cause
Influencing Condition: 'c' Pair: 2, 5 Masking
Influencing Condition: 'c' Pair: 2, 7 Masking
Influencing Condition: 'c' Pair: 3, 4 Masking
Influencing Condition: 'c' Pair: 3, 6 Masking
Influencing Condition: 'c' Pair: 4, 5 Unique Cause
Influencing Condition: 'c' Pair: 4, 7 Unique Cause
Influencing Condition: 'c' Pair: 5, 6 Unique Cause
Influencing Condition: 'c' Pair: 6, 7 Unique Cause
Without a white box view, you will never find “Unique Cause” – MCDC test pairs like 0,7 for condition “a”. Also, you will not find any of the valid "Masking" – MCDC” test pairs.
The next step is now to find one minimum test set. This can be done by applying the “set cover” or “unate covering” algoritm. From above test pairs we can create a table that shows which test value covers what condition. In your case this looks like the following:
0 1 2 3 4 5 6 7
a X X X X
b X X X
c X X X X X X
Simple elimination of double columns will already reduce the table to
0 1 2 3 4 5 6 7 0 3 5
a X - X - ==> a X X
b X - X b X X
c - X - X - - c X X
Please Note: We employed some heuristics to select, which of completely identical columns should be eliminated. So effectively there are more solutions to the set cover problem. But with the geometrically growth on computing time and memory consumption related to the number of conditions, this is the only meaningful approach.
Now we will apply Petrick’s method and find out all coverage sets:
1. 0, 3
2. 0, 5
3. 3, 5
That means, we need to select, out of the above test pair list, test pairs for a,b,c that contain, for solution 1, 0 and 3, for solution 2, 0 and 5, and for solution 3 3 and 5
Also here we will apply some heuristic functions to come up with the minimum solutions:
-------- For Coverage set 1
Test Pair for Condition 'a': 0 5 (Unique Cause)
Test Pair for Condition 'b': 0 3 (Unique Cause)
Test Pair for Condition 'c': 2 3 (Unique Cause)
Resulting Test Vector: 0 2 3 5
-------- For Coverage set 2
Test Pair for Condition 'a': 0 5 (Unique Cause)
Test Pair for Condition 'b': 0 3 (Unique Cause)
Test Pair for Condition 'c': 5 6 (Unique Cause)
Resulting Test Vector: 0 3 5 6
-------- For Coverage set 3
Test Pair for Condition 'a': 1 5 (Unique Cause)
Test Pair for Condition 'b': 1 3 (Unique Cause)
Test Pair for Condition 'c': 5 6 (Unique Cause)
Resulting Test Vector: 1 3 5 6
---------------------------------------
Test vector: Recommended Result: 0 2 3 5
0: a=0 b=0 c=0 (0)
2: a=0 b=1 c=0 (0)
3: a=0 b=1 c=1 (1)
5: a=1 b=0 c=1 (1)
You see that you have for all 3 possible solutions 4 test cases that cover all conditions and that fulfill MCDC criteria. If you check publications in the internet, then you will see that the minimum number of tests is n+1 (n = number of conditions). This sounds not like a great achievement. But looking at MCC (Multiple Condition Coverage) you will have 2^n test cases. So, for 8 conditions 256 test cases. With MCDC there will be only 9. That was one reason to come up with MCDC.
I created also a software to help to better understand MCDC coverage. It uses a boolean expression as input, simplifies that (if you wish) and calculates all MCDC test pairs.
You may find it here:
MCDC
I hope, I could shed some light on the topic. I am happy to answer further questions.

Recall that for MC/DC you need for every condition P (A/B/C in your case) two test cases T and T' so that P is true in T and false in T', and so that the outcome of the predicate is true in one and false in the other test case.
An MC/DC cover for ((A || B) && C) is:
T1: (F, F, T) -> F (your first test case)
T2: (F, T, T) -> T (B flips outcome compared to T1, missing)
T3: (T, F, T) -> T (A flips outcome compared to T1, your third test case)
T4: (F, T, F) -> F (C flips outcome compared to T2, your second test case)
In concrete test cases you cannot have "-" / don't-care values: You must make a choice when you exercise the system.
So what you are missing in your answer is a pair of two test cases (T1 and T2) in which flipping only the second condition B also flips the outcome.

For MCDC to be achieved it requires the output to change when there is a change in only 1 input while the other inputs remain constant. For example in your case
MCDC for (A || B) & C
So when A changes from 1 to 0 B & C remain constant whereas output changes from 1 to 0. Same applies for both B & C.
Normally they say the minimum number of steps to achieve MCDC is Number of Inputs + Number of Outputs.
So in your case MCDC can be achieved in minimum of 4 steps.

Minimum number of states in a DFA having '1' as the 5th symbol from right

What is the minimum number of states needed in a DFA to accept the strings having '1' as 5th symbol from right? Strings are defined over the alphabet {0,1}.

The Myhill-Nerode theorem is a useful tool for solving these sorts of problems.
The idea is to build up a set of equivalence classes of strings, using the idea of "distinguishing extensions". Consider two strings x and y. If there exists a string z
such that exactly one of xz and yz is in the language, then z is a distinguishing extension,
and x and y must belong to different equivalence classes. Each equivalence class maps to a different state in the minimal DFA.
For the language you've described, let x and y be any pair of different 5-character strings
over {0,1}. If they differ at position n (counting from the right, starting at 1), then any string z with length 5-n will be a distinguishing extension: if x has a 0 at position n,
and y has a 1 at position n, then xz is rejected and yz is accepted. This gives 25 = 32
equivalence classes.
If s is a string with length k < 5 characters, it belongs to the same equivalence class
as 0(5-k)s (i.e. add 0-padding to the left until it's 5 characters long).
If s is a string with length k > 5 characters, its equivalence class is determined by its final 5 characters.
Therefore, all strings over {0,1} fall into one of the 32 equivalence classes described above, and by the Myhill-Nerode theorem, the minimal DFA for this language has 32 states.

No of state will be 2^n where n is nth symbol from right
So 2^5=32 will be no of states

Minimum number of states needed?

Definition of a language L with alphabet { a } is given as following
L = { ank | k > 0 ; and n is a positive integer constant }
What is the number of states needed in a DFA to recognize L?
In my opinion it should be k+1 but I am not sure.

The language L can be recognized by a DFA with n+1 states.
Observe that the length of any string in L is congruent to 0 mod n.
Label n of the states with integers 0, 1, 2, ... n-1, representing each possible remainder. An additional state, S, is the start state. S has a single transition, to state 1. If the machine is currently in state i, on input it moves to state (i+1) mod n. State 0 is
the only accepting state. (If the empty string were part of L, we could eliminate S and make state 0 the start state).
Suppose there were a DFA with fewer than n+1 states that still recognized L. Consider the sequence of states S0, S1, ... Sn encountered while processing the string an. Sn must be an accepting state, since an is in L. But since there are fewer than n+1 distinct states in this DFA, by the pigeonhole principle there must have been some state that was visited at least twice. Removing that loop gives another path (and another accepted string), with length < n, from S0 to Sn. But L contains no strings shorter than n, contradicting our assumption. Therefore no DFA with fewer than n+1 states recognizes L.

Closure properties of context free languages

I have the following problem:
Languages L1 = {a^n * b^n : n>=0} and L2 = {b^n * a^n : n>=0} are
context free languages so they are closed under the L1L2 so L={a^n *
b^2n A^n : n>=0} must be context free too because it is generated by a
closure property.
I have to prove if this is true or not.
So I checked the L language and I don’t think that it is context free then I also saw that L2 is just L1 reversed.
Do I have to check if L1, L2 are deterministic?

L1={anbn : n>=0} and L2={bnan : n>=0} are both
context free.
Since context-free languages are closed under concatenation, L3=L1L2 is also context-free.
However, L3 is not the same language as L4={anb2nan : n >= 0}.
The string abbbaa is in L3, but not L4.
So is L4 context-free? If so, it must obey the pumping lemma for context-free languages.
Let p be the pumping length of L4. Choose s = apb2pap.
Then s is in L4, and |s| > p. Therefore, if L4 is context-free, we can write s
as uvxyz, with |vxy| <= p, |vy| >= 1, and uvnxynz is in L4 for
any n >= 0.
Observe the following properties of any nonempty string in L4: The count of a's equals the count of b's. There is exactly one occurrence of the substring 'ab', and exactly one
occurrence of the substring 'ba'. The length of the initial string of a's equals the length of the final string of a's.
We can use these observations to constrain the possible choices of v and y in our pumping argument for L4. Neither v nor y can contain the substring 'ab', because then, as v and y are pumped an arbitrary number of times, the output string would contain multiple occurrences of 'ab', and therefore cannot be an element of L4. The same argument applies to the substring 'ba'.
So v must be either empty, all a's, or all b's. The same applies to y.
Furthermore, if v is all a's, then y must consist of the same number of b's; otherwise, the pumped string would contain unequal numbers of a's and b's since v and y are pumped by
the same n. Likewise, if v is all b's, then y must be the same number of a's.
But if v is all a's, and y is all b's, the final string of a's is unaffected by pumping v and y, therefore the leading string of a's will no longer match the trailing string of a's.
Similarly, if v is all b's and y is all a's, the leading and trailing strings of a's will again have different lengths as v and y are pumped.
v and y cannot both be empty, since that would violate the condition |vy| >= 1 for
the CFL pumping lemma. But since we have established that |v| = |y|, it follows
that neither v nor y can be empty.
But if v cannot be empty, cannot be all a's, cannot be all b's, and cannot contain
the substrings 'ab' or 'ba', then there is no possible choice of uvxyz for which
the pumped version of s is still in L4. Therefore L4 is not context-free.

I'm not sure that it is -- note that in each of the defintions of L1 and L2, n is scoped within that definition, i.e. they are two different variables. When you combine them you should rename one, and get instead:
L = {a^n * b^n b^m * a^m : n,m>=0}
This is a very different language from your L, but it is obviously a context free one.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas