Can final and non-final states having identical configurations be merged together in a minimal DFA?

I have been practicing questions on automata theory and came across one on minimal DFAs where I can't figure out where I have gone wrong. I am getting 4 states in the minimal DFA, but my book says the answer is 3. The question asks to convert a given NFA into a minimal DFA and count the number of states in the latter. The given NFA (p and r are the initial and final states, respectively) is:
{p} ---b--> {q}
{q} ---a--> {r}
{q} ---b--> {s}
{r} ---a--> {r}
{r} ---b--> {s}
{s} ---a--> {r}
{s} ---b--> {s}
I am getting 4 states: [r], [p], [q,s], [dead]. Can the final state [r] and the non-final state [q,s] be merged here, since they lead to the same configuration on receiving inputs a and b? I have learned that final and non-final states cannot be in the same equivalence class...

OK, let's start with all possible states for our DFA. There will be 16 (2^4 subsets of the 4 NFA states, which already includes the empty set). These will be:
{}
{p}
{q}
{r}
{s}
{p,q}
{p,r}
{p,s}
{q,r}
{q,s}
{r,s}
{p,q,r}
{p,q,s}
{p,r,s}
{q,r,s}
{p,q,r,s}
Now that we have all the possible sets, let's highlight all that are reachable from the start state p:
{}
{p} --- Start state. From here we can get to {q} only as defined by the transition {p}--b-->{q}
{q} --- from here, we get to r or s, so {r,s} as defined by {q}--a-->{r} and {q}--b-->{s}
{r}
{s}
{p,q}
{p,r}
{p,s}
{q,r}
{q,s}
{r,s} --- from here, every transition leads back into r or s (the same set!), so no new sets are generated.
{p,q,r}
{p,q,s}
{p,r,s}
{q,r,s}
{p,q,r,s}
The three accessible states are thus {p}, {q}, and {r,s}. The "dead" state (the empty set) is not reachable because none of the transitions out of the accessible states lead to it.
Hope this helps!


How to find the fixpoint of a loop and why do we need this? [closed]

I know that in static analysis of programs, we need to find a fixpoint to analyze the information a loop provides.
I have read the wiki as well as related materials in the book Secure Programming with Static Analysis.
But I am still confused by the concept of a fixpoint, so my questions are:
Could anyone give me an explanation of the concept of a fixpoint?
What are the practical ways to find a fixpoint in static analysis?
What information can we get after finding the fixpoint?
Thank you!
Conceptually, the fixpoint corresponds to the most information you can obtain about the loop by repeatedly iterating it on some set of abstract values. I'm going to guess that by "static analysis" you're referring here to "data flow analysis" or the version of "abstract interpretation" that most closely follows data flow analysis: a simulation of program execution using abstractions of the possible program states at each point. (Model checking follows a dual intuition, in that you're simulating program states using an abstraction of possible execution paths. Both are approximations of concrete program behavior.)
Given some knowledge about a program point, this "simulation" corresponds to the effect that we know a particular program construct must have on what we know. For example, at some point in a program, we may know that x could (a) be uninitialized, or else have its value from statements (b) x = 0 or (c) x = f(5), but after (d) x = 42, its value can only have come from (d). On the other hand, if we have
if ( foo() ) {
    x = 42; // (d)
    bar();
} else {
    baz();
    x = x - 1; // (e)
}
then the value of x afterwards might have come from either of (d) or (e).
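The kill/gen reasoning above can be sketched in a few lines (a Python sketch; the labels a..e follow the text):

```python
# Definitions of x, labelled as in the text.
before = {"a", "b", "c"}        # x may be uninitialized (a), or set at (b) or (c)

# Straight-line code: "x = 42  // (d)" kills every earlier definition of x
# and generates only (d), regardless of what held before.
after_d = {"d"}

# Branching: the then-branch leaves {d}, the else-branch leaves {e};
# at the merge point we must take the union of what both branches allow.
then_out, else_out = {"d"}, {"e"}
after_if = then_out | else_out  # {'d', 'e'}
```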
Now think about what can happen with a loop:
while ( x != 0 ) {
    if ( foo() ) {
        x = 42; // (d)
        bar();
    } else {
        baz();
        x = x - 1; // (e)
    }
}
On entry, we have possible definitions of x from {a,b,c}. One pass through the loop means that the possible definitions are instead drawn from {d,e}. But what happens if the loop test (x != 0) fails initially, so that the loop does not run at all? What are the possibilities for x then? Well, in this case the loop body has no effect, so the definitions of x would come from {a,b,c}. But if it ran, even once, then the answer is {d,e}. So all we know about x at the end of the loop is that the loop either ran or it didn't, which means that the assignment to x could be any one of {a,b,c,d,e}: the only safe answer here is the union of the property known at loop entry ({a,b,c}) and the property known at the end of one iteration ({d,e}).
But this also means that we must associate x with {a,b,c,d,e} at the beginning of the loop body, too, since we have no way of determining whether this is the first or the four thousandth time through the loop. So we have to consider again what we can have on loop exit: the union of the loop body's effect with the property assumed to hold on entry to the last iteration. Happily, this is just {a,b,c,d,e} ∪ {d,e} = {a,b,c,d,e}. In other words, we've not obtained any additional information through this second simulation of the loop body, and thus we can stop, since no further simulated iterations will change the result.
That's the fixpoint: the abstraction of the program state that will cause simulation to produce exactly the same result.
Now as for ways to find it, there are many, though the most straightforward ("chaotic iteration") simply runs the simulation of every program point (according to some fair strategy) until the answer doesn't change. A good starting point for learning better algorithms can be found in most any compilers textbook, though it isn't usually taught in a first course. Steven Muchnick's Advanced Compiler Design and Implementation is a more thorough and very readable treatment of the subject. If you can find a copy, Matthew Hecht's Flow Analysis of Computer Programs is another classic treatment. Both books focus on the "data flow analysis" technique for static analysis. You might also try out Principles of Program Analysis, by Nielson/Nielson/Hankin, though the technical details in the book can be pretty hairy. On the other hand, it offers a more general treatment of static analysis overall.
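A minimal sketch of chaotic iteration on the reaching-definitions example above (Python; the labels a..e follow the text):

```python
# Definitions of x reaching the loop head from before the loop.
entry = {"a", "b", "c"}

def body(defs_in):
    # Abstract effect of one pass through the loop body: both branches
    # redefine x, so only (d) and (e) survive, whatever came in.
    return {"d", "e"}

# Re-simulate the loop until the answer stops changing.
at_head = set(entry)
while True:
    new_at_head = entry | body(at_head)   # zero iterations, or one more
    if new_at_head == at_head:            # nothing changed: fixpoint reached
        break
    at_head = new_at_head

print(sorted(at_head))  # ['a', 'b', 'c', 'd', 'e']
```

The second pass adds nothing ({a,b,c,d,e} ∪ {d,e} is unchanged), so the loop stops, exactly as the answer describes.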

How to solve δ(A,01) for this DFA?

Consider the DFA:
What will δ(A,01) be equal to?
options:
A) {D}
B) {C,D}
C) {B,C,D}
D) {A,B,C,D}
The correct answer is option B), but I don't get how. Could someone please explain the steps to solve it, and also, in general, how do we solve this for any DFA and any transition?
Thanks.
Option B) is not the correct answer for this transition graph!
In a Transition Graph (TG), the symbol ε means a NULL-move (ε-move). There are two NULL-moves in this TG.
One: (A) --- `ε` ---->(B)
Second: (A) --- `ε` ---->(C)
An ε-move means you can change state without consuming any symbol. In your diagram, from A to B, or from A to C.
What will be δ(A,01) equal to ?
The question asks "what is the path from state A if the input is 01" (as I understand it, because there is only one final state).
01 can be processed in either of two ways.
(A) -- ε --->(B) -- 0 --> (B) -- 1 --> (D)
(A) -- ε --->(C) -- 0 --> (B) -- 1 --> (D)
Also, there is no other way to process the string 01, even if you don't want to reach the final state.
[ANSWER]
So there is a misprint in the question (or else you transcribed it incorrectly).
You can learn how to remove NULL-moves from a transition graph, and also how to write a regular expression for a DFA.
If you remove the NULL-moves from the TG, you will get three ways to accept 01.
Equivalent transition graph without NULL-moves:
Note that there are three start states in this graph.
Three ways:
{ABD}
{CBD}
{BBD}
In every option, state B has to appear.
Also, what you have written, "Consider the DFA", is wrong. The TG is not deterministic, because there is a non-deterministic move δ(A,ε) whose next states are either B or C.
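The two derivations of 01 can be checked mechanically with a small ε-NFA simulation. This is a hedged Python sketch that encodes only the moves quoted in this answer, since the original diagram is not shown here:

```python
# Transition table built from the moves mentioned above; EPS marks ε-moves.
EPS = ""
delta = {
    ("A", EPS): {"B", "C"},   # the two NULL-moves
    ("B", "0"): {"B"},
    ("C", "0"): {"B"},
    ("B", "1"): {"D"},
}

def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def run(state, word):
    """Set of states reachable from `state` after consuming `word`."""
    current = eps_closure({state})
    for sym in word:
        moved = set()
        for s in current:
            moved |= delta.get((s, sym), set())
        current = eps_closure(moved)
    return current

print(run("A", "01"))  # -> {'D'}
```

Under these moves, both ε-paths collapse to the single state D, consistent with the answer above.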

Add error checking via production rules to LALR(1) grammar to handle all inputs

I have a grammar that represents expressions. Let's say for simplicity it's:
S -> E
E -> T + E | T
T -> P * T | P
P -> a | (E)
With a, +, *, ( and ) being the letters in my alphabet.
The above rules can generate valid arithmetic expressions containing parenthesis, multiplication and addition using proper order of operations and associativity.
My goal is to accept every string, containing 0 or more of the letters of my alphabet. Here are my constraints:
The grammar must "accept" all strings containing 0 or more letters of my alphabet.
New terminals may be introduced and inserted into the input algorithmically. (I found that explicitly providing an EOF terminal helped when detecting extra tokens beyond an otherwise valid expression, for some reason.)
New production rules may be introduced. The new rules will be error flags (i.e. if the string is parsed using one, then although the string is accepted, it is considered to semantically be an error).
The production rules may be modified so that newly-introduced terminals are inserted.
The grammar should be LALR(1), or at least handleable by Lex/Yacc-like tools. (If I recall correctly, the dangling-else problem requires a non-LALR(1) grammar, in spite of being compatible with Lex/Yacc.)
Additionally, I would like the extra rules to correspond to different kinds of errors (missing arguments to a binary operation, missing parenthesis - left or right - extra token(s) beyond an otherwise acceptable expression, etc.). I say that because there may be some trivial way to "accept" all inputs that otherwise wouldn't be beneficial for error reporting.
I've found these rules to be useful, although I do not know if they violate my constraints, or not:
P -> # [error]
P -> (E [error]
S -> E $ [instead of S -> E]
S -> E X $ [error]
X -> X Y [error]
X -> Y [error]
Y -> a [error]
Y -> ( [error]
Y -> ) [error]
Y -> * [error]
Y -> + [error]
where $ is the explicit EOF token and # is the empty string.
In case my question wasn't clear: How can I modify my grammar within my constraints to achieve my goal of accepting all inputs, preferably with a nice correspondence of rules to types of errors? Do my rules meet my goal?
This question has been around for a while and covers a topic that is often visited by beginners in the subject. Those who have done a compilers course in their undergraduate degree know that this is one of those questions that has no easy or single answer. You might have noticed that you have two questions on the same topic, neither of which has been answered. Another question someone else posted was answered with pointers to the literature that explains why this is hard.
It is a question that has remained extant for over 50 years. If one examines the literature over time, from early conference papers through course textbooks and doctoral theses to (today) SO, we see regular references to the fact that this is the wrong question (or rather, the wrong approach to the problem)!
Just taking a sample from course texts over the years (random selections from my bookshelf):
Gries, D. (1970) Error Recovery and Correction - An Introduction to the Literature, in Compiler Construction, An Advanced Course, edited by Bauer, F.L. & Eickel, J., Springer Verlag, pp. 627-638.
Gries, D. (1971) Compiler Construction for Digital Computers, Wiley, pp. 320-326.
Aho, A.V. & Ullman, J.D. (1977) Principles of Compiler Design, Addison-Wesley, pp. 397-405.
Bornat, R. (1979) Understanding and Writing Compilers, Macmillan, pp. 251-252.
Hanson, D. (1995) A Retargetable C Compiler: Design and Implementation, Addison-Wesley, pp. 140-146.
Grune, D., Bal, H.E., Jacobs, C.J.H. & Langendoen, K.G. (2000) Modern Compiler Design, Wiley, pp. 175-184.
Aho, A.V., Lam, M.S., Sethi, R. & Ullman, J.D. (2007) Compilers: Principles, Techniques, & Tools, Pearson/Addison-Wesley, pp. 283-296.
All of these (over a 40-year span) concur that your question is about using the wrong tools wrongly, or going in the wrong direction. What I'm trying to say is "You can't get there from here". You should start from somewhere else.
If you want something deeper, there is a whole PhD thesis:
Charles, P. (1991) A Practical Method for Constructing Efficient LALR(k) Parsers with Automatic Error Recovery, New York University.
Hopefully, for those who visit this question again in the future, this is a placeholder for the answers.
I note from comments on your other question that you are using MPPG, derived from CPPG. Not everyone will have used these, so I'm including a couple of links to these tools:
Managed Babel Systems Essentials
Garden Points Parser Generator
Irony .NET compiler Construction Kit
Writing your first Visual Studio Language Service

Some questions on DLR call site caching

Kindly correct me if my understanding of the DLR's operation-resolving methodology is wrong.
The DLR has something called call site caching. Given "a+b" (where a and b are both integers), the first time it will not be able to understand the + (plus) operator, as well as the operands. So it will resolve the operands and then the operator, and will cache the result in level 1 after building a proper rule.
From the next time onwards, if a similar kind of match is found (where the operands are of type integer and the operator is +), it will look into the cache and return the result, and hence the performance will be enhanced.
If this understanding of mine is correct (if you feel that I am incorrect in the basic understanding, kindly rectify that), I have the below questions for which I am looking for answers:
a) What kinds of operations happen in level 1, level 2, and level 3 call site caching in general (I mean plus, minus... what kind)?
b) The DLR caches those results. Where does it store them (I mean the cache location)?
c) How does it make the rules? Is there an algorithm? If so, could you please specify (any link, or your own answer)?
d) Some time back I read somewhere that in level 1 caching the DLR has 10 rules, in level 2 it has 20 or so (maybe), while level 3 has 100. If a new operation comes in, then it makes a new rule. Are these rules (levels 1, 2, 3) predefined?
e) Where are these rules kept?
f) Instead of passing two integers (a and b in the example), if we pass two strings, or one string and one integer, will it form a new rule?
Thanks
a) If I understand this correctly, currently all of the levels work effectively the same way - they run a delegate which tests the arguments that matter for the rule, and then either performs the operation or notes the failure (the failure is noted by either setting a value on the call site or doing a tail call to the update method). Therefore every rule looks like:
public object Rule(object x, object y) {
    if (x is int && y is int) {
        return (int)x + (int)y;
    }
    CallSiteOps.SetNotMatched(site);
    return null;
}
And a delegate to this method is used in the L0, L1, and L2 caches. But the behavior here could change (and did change many times during development). For example at one point in time the L2 cache was a tree based upon the type of the arguments.
b) The L0 cache and the L1 caches are stored on the CallSite object. The L0 cache is a single delegate which is always the 1st thing to run. Initially this is set to a delegate which just updates the site for the 1st time. Additional calls attempt to perform the last operation the call site saw.
The L1 cache includes the last 10 actions the call site saw. If the L0 cache fails all the delegates in the L1 cache will be tried.
The L2 cache lives on the CallSiteBinder. The CallSiteBinder should be shared across multiple call sites. For example, there should generally be one and only one addition call site binder for the language, assuming all additions are the same. If the L0 and L1 caches miss, all of the available rules in the L2 cache will be searched. Currently the upper limit for the L2 cache is 128.
c) Rules can ultimately be produced in 2 ways. The general solution is to create a DynamicMetaObject which includes both the expression tree of the operation to be performed as well as the restrictions (the test) which determines if it's applicable. This is something like:
public DynamicMetaObject FallbackBinaryOperation(DynamicMetaObject target, DynamicMetaObject arg) {
    return new DynamicMetaObject(
        Expression.Add(
            Expression.Convert(target.Expression, typeof(int)),
            Expression.Convert(arg.Expression, typeof(int))),
        BindingRestrictions.GetTypeRestriction(target.Expression, typeof(int)).Merge(
            BindingRestrictions.GetTypeRestriction(arg.Expression, typeof(int))
        )
    );
}
This creates the rule which adds two integers. This isn't really correct because if you get something other than integers it'll loop infinitely - so usually you do a bunch of type checks, produce the addition for the things you can handle, and then produce an error if you can't handle the addition.
The other way to make a rule is provide the delegate directly. This is more advanced and enables some advanced optimizations to avoid compiling code all of the time. To do this you override BindDelegate on CallSiteBinder, inspect the arguments, and provide a delegate which looks like the Rule method I typed above.
d) Answered in b)
e) I believe this is the same question as b)
f) If you put the appropriate restrictions on, then yes (which you're forced to do for the standard binders). Because the restrictions will fail (you don't have two ints), the DLR will probe the caches, and when it doesn't find a rule it'll call the binder to produce a new rule. That new rule will come back with a new set of restrictions, will be installed in the L0, L1, and L2 caches, and will then perform the operation for strings from then on.
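The guard-plus-action shape described in a) and f) can be mimicked with a toy polymorphic cache. This is a hedged Python sketch of the general idea, not DLR internals; the CallSite and binder names here are invented for illustration:

```python
# Toy model of call-site caching: each rule is a (guard, action) pair,
# and the site tries its cached rules before asking the binder for a new one.
class CallSite:
    def __init__(self, binder):
        self.binder = binder
        self.rules = []                     # cache of previously produced rules

    def __call__(self, x, y):
        for guard, action in self.rules:
            if guard(x, y):                 # the "test" part of the rule matched
                return action(x, y)
        guard, action = self.binder(x, y)   # cache miss: build a new rule
        self.rules.insert(0, (guard, action))
        return action(x, y)

def binder(x, y):
    """Produce a rule restricted to the argument types it was built for."""
    tx, ty = type(x), type(y)
    guard = lambda a, b: type(a) is tx and type(b) is ty
    action = lambda a, b: a + b
    return guard, action

site = CallSite(binder)
print(site(2, 3))       # builds and caches the int/int rule -> 5
print(site("a", "b"))   # different types fail the guard, so a second rule is built
print(len(site.rules))  # -> 2
```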

State-based testing (state charts) & transition sequences

I am really stuck on some state-based testing concepts...
I am trying to calculate some checking sequences that will cover all transitions from each state, and I have the answers, but I don't understand them:
[state chart: http://www.gam3r.co.uk/1m.jpg]
Now the answers I have are:
[answer sequences: http://www.gam3r.co.uk/2m.jpg]
I don't understand it at all. For example, say we want to check the transition a/x from s1 - wouldn't we do ab only? As we are already in s1, we do a/x to test the transition to s2, then b to check we are in the right state. I can't understand why it is aba, or even bb, for s1...
Can anyone talk me through it?
Thanks
There are 2 events available in each of 4 states, giving 8 transitions, which the author has decided to test in 8 separate test sequences. Each sequence (except the S1 sequences - apparently the machine's initial state is S1) needs to drive the machine to the target state and then perform either event a or event b.
The sequences he has chosen are sufficient, in that each transition is covered. However, they are not unique, and - as you have observed - not minimal.
A more obvious choice would be:
a b
ab aa
aaa aab
ba bb
I don't understand the author's purpose in adding superfluous transitions at the end of each sequence. The system is a Mealy machine - the behaviour of the machine is uniquely determined by the current state and event. There is no memory of the path leading to the current state; therefore the author's extra transitions give no additional coverage and serve only to confuse.
You are also correct that you could cover all the transitions with a shorter set of paths through the graph. However, I would be disinclined to do that: clarity is more important than optimization in test code.
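The recipe in the first paragraph - drive the machine to each state by a shortest event sequence, then fire each event once - can be sketched as follows. The transition table below is hypothetical, since the original state chart image is unavailable; only S1 as initial state and the a/b event alphabet come from the thread:

```python
from collections import deque

# Hypothetical Mealy-machine transitions: (state, event) -> next state.
transitions = {
    ("S1", "a"): "S2", ("S1", "b"): "S4",
    ("S2", "a"): "S3", ("S2", "b"): "S1",
    ("S3", "a"): "S3", ("S3", "b"): "S4",
    ("S4", "a"): "S2", ("S4", "b"): "S4",
}

def shortest_prefixes(start):
    """BFS: shortest event sequence driving the machine to each state."""
    prefixes, queue = {start: ""}, deque([start])
    while queue:
        s = queue.popleft()
        for e in "ab":
            t = transitions[(s, e)]
            if t not in prefixes:
                prefixes[t] = prefixes[s] + e
                queue.append(t)
    return prefixes

# One test sequence per transition: the prefix to the state, then the event.
tests = sorted(p + e for p in shortest_prefixes("S1").values() for e in "ab")
print(tests)   # 8 sequences, one per transition
```

With this assumed table, the sketch reproduces exactly the a, b, ab, aa, aaa, aab, ba, bb set suggested above.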