Can I say that a state space is a formal specification of some system's behaviour? - verification

Given a system, and its complete state space, can I say that that state space is a formal specification of that system's behaviour?

Not unless you have formally defined all possible transitions to and from each state and your state space is inclusive of all possible states the system can be in.
In a formal definition for a computer system, it should also include unexpected transitions such as computer crashes. A fault tree analysis may help with ensuring that all possible states are defined.
See wikipedia

Related

how to show periodic call of a state machine in UML state diagram?

I work in the field of embedded software.
In a project, we are using a time-trigged software architecture so that each component is called
periodically (with component's tick accordingly) and the component has a predetermined time to do its task.
Now suppose that one of these components has a state machine which is active whenever the scheduler calls the component. As the architecture is a time-trigged architecture, some time-based transitions in the state machine shall be synchronized with the component's call tick (suppose that the component is call every 10ms via the scheduler and , say, there is a transition from state A to B in the component's state machine that is triggered after 50ms).
The question is that is it necessary to show (in a way) the call tick of the component in its state machine?
If so, how to show?
A UML statemachine describes the behaviour of the system, not its implementation. Your method of calling each component at some time interval in order to progress/update its statemachine is an implementation detail.
The question is that is it necessary to show (in a way) the call tick of the component in its state machine?
So, no it is not necessary it would be an implementation constraint, and those should generally be avoided. The same statemachine would work just the same (at least for time triggered events) if you called each component asynchronously as fast as possible.
Your example:
_____
| A |
|_____|
| after 50ms/
|
__V__
| B |
|_____|
would work just the same, (and be more responsive for non-time triggered events).
The point is the UML state machine diagram should describe the required behaviour (i.e. the design) not the code or implementation. That is what it must do not the how it does it. You are not describing how to implement a statemachine here.
UML is agnostic on the diagram purpose: you may model the requirements, the high-level design, or the actual implementation. Whatever the purpose here, the key is separation of concerns. Therefore, in general:
If you want to show the timing of component interactions -- including at state level for some components -- you'd better go for a timing diagram. It does not show the general state machine but perfectly documents synchronization in a given scenario.
If you need to show active / on-hold states, you could consider two orthogonal state machines (i.e. one with your state machine, one for the process states, if both are independent)
If you have an important timing constraint, note it as a constraint in the diagram (i.e. { duration < 50 ms } ), rather than artificially defining timing events that are in reality just means to the end.
But if you cannot separate state transition and timing and precise timing it's part of your design, you may use time events as any other transition events:
after t for a relative time expression, i.e. a duration after having entered a state.
at t for an absolute time expression, e.g. a fixed time (e.g. 20:45) or a point in time (at 50 ms), the time origin being probably the very start of the state machine.
It sounds like the code servicing the "tick" should be disconnected from the general state machine of the target. Or rather, the state machine operates on a higher abstraction layer. The logic for providing data to the "tick" would be buried in some lower layer state handling code.
Store away data necessary to service the "tick" somewhere. If there's a lot of work to do for the MCU in relation to the "tick" time, then let some ISR or DMA design grab the latest available data when it's time to act, independent of the state machine. Or alternatively, if the MCU isn't busy and can easily finish one lap in it's main loop/state machine between ticks, you could even use a polling design.

Generate events/commands using a property based testing tool?

As I understand it, most property testing tools operate at the level of functions. Given a set of arguments, such tools will generate random input and test output against some invariant.
I have read that ScalaCheck is now starting to include generation of events to test a statefull system. However, I can't find great deal of information on it. Is this becoming popular in rest of the *check ecosystem as well (fscheck, quickcheck, other variations)?
What you call "generation of events" to my knowledge originates in "Testing Monadic Code with QuickCheck", by Koen Claessen and John Hughes. The example they give is testing a queue. The approach that is used is always similar - as comments say, since "basic" quickcheck (I'll use lowercase quickcheck to describe the family of QuickCheck ports on various platforms) assumes it generates immutable data, at first sight it's not easy to use quickcheck to test a side-effecting, stateful system.
Until you realize that a stateful system gets to a certain state by executing a sequence of state transitions (these are variously called commands, actions, events etc). And this sequence can be represented perfectly as an immutable list of immutable transitions! Typically then each transition is executed on the real system under test, and a model of its state. Then after each transition the model state is compared with the real state.
To see how this plays out in Quvik QuickCheck (for Erlang) for example you can read "Testing Telecoms Software with Quviq QuickCheck" by Thomas Arts, John Hughes, Joakim Johansson and Ulf Wiger.
I do believe most quickchecks, including QuickCheck itself, have a layer on top of the basic quickcheck functionality that allows you to generate a sequence of state transitions, typically using a state machine like approach with pre-and postconditions etc.
I don't think this is particularly new, but probably a bit under-emphasized.
For example, FsCheck has had model based testing for years (dislosure: I am FsCheck's main contirbutor). I think the same is true for ScalaCheck. Quvik QuickCheck's is likely the most advanced implementation (certainly with the most advanced applications).

Matching a virtual machine design with its primary programming language

As background for a side project, I've been reading about different virtual machine designs, with the JVM of course getting the most press. I've also looked at BEAM (Erlang), GHC's RTS (kind of but not quite a VM) and some of the JavaScript implementations. Python also has a bytecode interpreter that I know exists, but have not read much about.
What I have not found is a good explanation of why particular virtual machine design choices are made for a particular language. I'm particularly interested in design choices that would fit with concurrent and/or very dynamic (Ruby, JavaScript, Lisp) languages.
Edit: In response to a comment asking for specificity here is an example. The JVM uses a stack machine rather then a register machine, which was very controversial when Java was first introduced. It turned out that the engineers who designed the JVM had done so intending platform portability, and converting a stack machine back into a register machine was easier and more efficient then overcoming an impedance mismatch where there were too many or too few registers virtual.
Here's another example: for Haskell, the paper to look at is Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. This is very different from any other type of VM I know about. And in point of fact GHC (the premier implementation of Haskell) does not run live, but is used as an intermediate step in compilation. Peyton-Jones lists no less then 8 other virtual machines that didn't work. I would like to understand why some VM's succeed where other fail.
I'll answer your question from a different tack: what is a VM? A VM is just a specification for "interpreter" of a lower level language than the source language. Here I'm using the black box meaning of the word "interpreter". I don't care how a VM gets implemented (as a bytecode intepereter, a JIT compiler, whatever). When phrased that way, from a design point of view the VM isn't the interesting thing it's the low level language.
The ideal VM language will do two things. One, it will make it easy to compile the source language into it. And two it will also make it easy to interpret on the target platform(s) (where again the interpreter could be implemented very naively or could be some really sophisticated JIT like Hotspot or V8).
Obviously there's a tension between those two desirable properties, but they do more or less form two end points on a line through the design space of all possible VMs. (Or, perhaps some more complicated shape than a line because this isn't a flat Euclidean space, but you get the idea). If you build your VM language far outside of that line then it won't be very useful. That's what constrains VM design: putting it somewhere into that ideal line.
That line is also why high level VMs tend to be very language specific while low level VMs are more language agnostic but don't provide many services. A high level VM is by its nature close to the source language which makes it far from other, different source languages. A low level VM is by its nature close to the target platform thus close to the platform end of the ideal lines for many languages but that low level VM will also be pretty far from the "easy to compile to" end of the ideal line of most source languages.
Now, more broadly, conceptually any compiler can be seen as a series of transformations from the source language to intermediate forms that themselves can be seen as languages for VMs. VMs for the intermediate languages may never be built, but they could be. A compiler eventually emits the final form. And that final form will itself be a language for a VM. We might call that VM "JVM", "V8"...or we might call that VM "x86", "ARM", etc.
Hope that helps.
One of the techniques of deriving a VM is to just go down the compilation chain, transforming your source language into more and more low level intermediate languages. Once you spot a low level enough language suitable for a flat representation (i.e., the one which can be serialised into a sequence of "instructions"), this is pretty much your VM. And your VM interpreter or JIT compiler would just continue your transformations chain from the point you selected for a serialisation.
Some serialisation techniques are very common - e.g., using a pseudo-stack representation for expression trees (like in .NET CLR, which is not a "real" stack machine at all). Otherwise you may want to use an SSA-form for serialisation, as in LLVM, or simply a 3-address VM with an infinite number of registers (as in Dalvik). It does not really matter which way you take, since it is only a serialisation and it would be de-serialised later to carry on with your normal way of compilation.
It is a bit different story if you intend to interpret you VM code immediately instead of compiling it. There is no consensus currently in what kind of VMs are better suited for interpretation. Both stack- (or I'd dare to say, Forth-) based VMs and register-based had proven to be efficient.
I found this book to be helpful. It discusses many of the points you are asking about. (note I'm not in any way affiliated with Amazon, nor am I promoting Amazon; just was the easiest place to link from).
http://www.amazon.com/dp/1852339691/

Does "DO-178B level A" prohibits optimizing compilers?

There is an "DO-178B" level A and level B certification for airborne systems. Does it prohibit using of optimizating compilers?
E.g. Some compilers will reorder instructions to get more performance. Does DO-178B lev.A or lev.B prohibits this reordering?
Most modern CPU have such reordering builtin in the hardware. Are they allowed to be used within DO-178B lev.A softare/hardware systems?
First, and critically: For this type of question, if the answer matters, you need to get a formal professional opinion from someone who is competent to provide it, or discuss this with your certification authority. Any reply you will get here should not be relied on.
With that said, I will assume you are asking from a point of curiousity and will not be relying on the answer in any meaningful way, and I will attempt to answer in that vein. I am not a professional, and this is not professional advice.
The most on-point documentation I could find online with a quick search was this FAA guideline paper about a related topic: http://www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/media/cast-12.pdf. This paper describes the conditions under which one must do verification of the generated object code rather than the source code. In particular, it gives a number of examples that will occur even in non-optimized code -- automatic variable initialization and exception handling are a couple of examples. On compiler optimization, it notes:
Compiler optimization is another area addressed under section 4.4.2a of DO-178B/ED-12B. This involves the analytical determination that the optimization features do not compromise the ability of the test cases to demonstrate requirements-based testing and structural coverage consistent with the software level. This is a separate issue from the traceability and additional verifications issues addressed by Section 4.4.2b. This is outside the scope of this paper.
I do not have a copy of DO-178B handy to read section 4.4.2a, but I would note that (a) there are procedures for handling other cases where the object code does not correspond to the source code in a one-to-one manner, and (b) this pretty strongly implies that compiler optimization is discussed rather than outright prohibited.
It's also pretty clear from a number of the discussions in that paper that the answer to "we can't trace things between the source code and the object code" is to validate the object code in some manner -- in other words, there is a solution other than prohibiting such things.
Thus, I would conclude that at least some compiler optimizations must be permitted.
In particular, the sort of reordering that you describe is quite traceable, and it seems almost certain to me that it would be permitted.
DO-178B is not absolute and is open to interpretation. If you switch off optimisation there is no questions and nothing to explain. By sticking to the most obvious interpretation you avoid having to sell your interpretation to certification authorities later on and opening your self up to questions about how you did things.
When you optimise your code it is hard to do the source to instruction traceability that is required for level A. In addition if you are using Do-178B getting that extra 5% out of your software is not your greatest concern. The ease of completing all the required certification steps should be your primary concern since that is what is going to be sucking up all your time.
The hardware part of your question is interesting. For software optimisation code is not just reordered it is changed as well. But for hardware the code is not changed to get higher speed only the execution order. I have to ask around to get more info on what the thinking is on this.
I have only superficial knowledge of DO-178B (I do not work day-to-day with it, but I build tools for people who do).
The standard takes traceability very seriously. High-level requirements are declined into low-level requirements, which are implemented by the source code, which is compiled by the compiler. At each of these steps, one must be able to justify what was done in terms of the specifications produced by the previous step.
For the compiler, this means that one must be able to read the assembly and trace one particular instruction to the source code statement that caused this instruction to be generated.
So, in short, yes, I think this prohibits most optimizations.
Concerning the hardware this software is run on, it is verified differently (but I guess just as stringently). The relevant standard is then DO-254, and I do not know anything about it.
With optimization, you need to verify the generated code at the object assembly language level. There are compiler suites and libraries for embedded real-time multitasking that have been previously verified in other projects, giving you a comfort level that they can be verified again - but you still need to verify the code used in your application.
To avoid delays and having to explain things just turn off optimizations and cache. This makes the code deterministic. Also try not to use GCC if possible and go for a qualified compiler such as IAR or DDCI or Irvine Compilers or something. Instead of trying bang the screw with a fancy hammer get a screw driver that works for the screw. Because when that plane crashes with 200 people on board, with mothers, fathers and children and they find out that the compiler reordered code and that caused the failure you will wish that you only had the right screw driver.

What are Finite State Automata and why should a programmer know about them?

Erm - what the question said. It's something I keep hearing about, but I've not got round to looking into it yet.
(updated) I could look up the definition... but why not (as pointed out by #erikson) get insight into your real experiences and anecdotes. Community Wiki'd incase that helps folks vote up the most insightful answer. Interesting reading so far, thanks!
Short answer, it is a technique that you can use to express systems with concrete states (as opposed to quantum states / probability distributions).
Quoting the Wikipedia article:
A finite state machine (FSM) or finite
state automaton (plural: automata) or
simply a state machine, is a model of
behavior composed of a finite number
of states, transitions between those
states, and actions. A finite state
machine is an abstract model of a
machine with a primitive internal
memory.
So, what does that mean to you? Put simply, it is an effective way to represent the path(s) from a starting state to the end state(s) of the system that you care about. Using regular expressions as a fairly easy to understand example, let's look at the pattern AB+C (imagine that that plus is a superscript). I would expect to this pattern to accept strings such as "ABC", "ABBC", "ABBBC", etc. A at the start, C at the end, some number of B's in the middle (greater than or equal to one).
If you think about it, it's almost easier to think about this in terms of a picture. Faking it with text (and that my parentheses are a loopback arc), you can see that A (on the left), is the starting state and C (on the right) is the end state on the right.
_
( )
A --> B --> C
From FSAs, you can continue your journey into computational complexity by heading over to the land of Turing Machines.
However, you can also use state machines to represent real behaviors and systems. In my world, we use them to model certain workflow of actual people working with components that are extremely intolerant of mistakes in state order. As in, "A had better happen before C or there will be a very serious problem. Make that be not possible right now."
You could look it up, but what the hell. Intuitively, a finite state automaton is an abstraction of something that has some finite number of states, and rules by which you can go from state to state. A state is something for which a true or false statement can be made, and a rule is a way that you change from one state to another. So, you could have, say, two states: "I'm at home" and "I'm at work" and two rules, "go to work" and "go home."
It turns out that you can look at machines like this mathematically, and find there are things they can and cannot do. Regular expressions are basically a way of describing a finite state machine in which the states are a set of different strings, and the rules move you from state to state based on the next character read. You can prove that. But you can also prove that no finite state machine can tell whether or not the parentheses in an expression are matched (via the pumping lemma for FSAs.)
The reason you should learn about FSAs is that they can be used to solve many problems: string matching, control of systems, business process descriptions, digital circuit design. They're also inherently pretty.
Formally, an FSA is a algebraic structure F = 〈Σ, S, s0, F, δ〉 where Σ is the input alphabet, S is a set of states, s0 ∈ S is a particular start state, F ⊆ S is a set of accepting states, and δ:S×Σ → S is the state transition function.
in OOP terms: if you have an object with methods that you call on certain events, and some (other) methods that have different behaviour depending on the previous calls.... surprise! you have a state machine!
now, if you know the theory, you don't have to rethink it all. you simply say: "piece of cake, it's just a state machine" and go on to implement it.
if you don't know the theory you'll think about it for a while, write some clever hacks, and get something that's difficult to explain and document... because you don't have the words to describe it
Good answers above. I would only add that FSA are primarily a thinking tool, not a programming technique. What makes them useful is they have nice properties, and anything that acts like one has those properties. If you can think of something as an FSA, there are many ways you can build it:
as a regular expression
as a state-transition table
as a while-switch-on-state loop
as a goto-net (horrors!)
as simple structured program code
etc. etc.
If somebody says something is a FSA, you can immediately know what they are talking about, no matter how it is built.
You need state machines whenever you have to release your thread before you have completed your operation.
Since web services are often not statefull, you don't usually see this in web services--you re-arrange your URL so that each URL corresponds to a single path through the code.
I guess another way to think about it could be that every web server is a FSM where the state information is kept in the URL.
You often see it when processing input. You have to release your thread before the input has all been completed, so you set a flag saying "input in progress" or something like that. When done you set the flag to "awaiting input". That flag is your state monitor.
More often than not, a FSM is implemented as a switch statement that switches on a variable. Each case is a different state. At the end of the case, you may set the state to a new value. You've almost certainly seen this somewhere.
The nice thing about a FSM is that you can make the state a part of your data rather than your code. Imagine that you need to fill out 1000 items in the database. The incoming data will address one of the 1000 items, but you generally don't have enough data to complete the operation.
Without an FSM you might have hundreds of threads waiting around for the rest of the data so they can complete processing and write the results to the DB. With a FSM, you write the state to the DB, then exit your thread. Next time you can check the incoming data, read the state from the thread and that should give you enough info to determine what code to run.
Nearly every FSM operation COULD be done by dedicating a thread to it, but probably not as well (The complexity multiplies based on number of states, whereas with a state machine the rise in complexity is more linear). Also, there are some conceptual design issues--examining your code at the state level is in some cases much easier than examining it at the line of code level.
Every programmer should know about them because they are an excellent tool for certain kinds of problems, where the usual 'iterative-thinking' approach would yield nasty, complex code.
A typical example is game AI, where NPCs have different states that change according to where the player is, something like:
NPC_STATE_IDLE
NPC_STATE_ALERT (player at less than 100 meters)
NPC_STATE_ENGAGE (player attacked NPC)
NPC_STATE_FLEE (low on health)
where a FSM can describe easily the transitions and help perform complex reasoning about the system the FSM is describing.
Important: If you are a "visual" style learner, stop everything you are doing and go to this link ... Right Now.
If you are a "visual" learner, here is an excellent link that gives a very accessible introduction.
Reanimator by Oliver Steele
It looks like you've already approved an answer, but if you appreciate "visual" introduction to new concepts, as is common, you really should check out the link. It is simply outstanding.
(Note: the link points to a discussion of DFA and NDFA in the context of regular expressions -- with animated interactive diagrams)
Yes! You could look it up!
http://en.wikipedia.org/wiki/Finite_state_machine
What it is is better answered on other sites (such as Wikipedia), because there are pretty extensive answers out there already.
Why you should know them: Because you probably implemented them already.
Any time your code has a limited number of possible states (that's the "finite state" part) and switches to another one once some input/event happend (that's the "machine" part) you've written a finite state machine.
It is a very common tool and knowing the theoretical basics for that, being able to reason about it and knowing how to combine two FSMs into a single one that does the same work can be a great help.
FSAs are great data structures to understand because any chance you have to implement them, you're working at the lowest level of computational complexity on the Chomsky hierarchy. A great example is in word morphology (how parts of words come together). A lot of work has been done to show that even the most severe cases can be analyzed in this extremely fast analytical framework. Take a look at Karttunnen and Beesley's work out of PARC.
FSAs are also a great place to start learning about machine learning concepts like hidden markov models, because in many ways, the problem can be broken down using the same ideas and vocabulary.
One item that hasn't been mentioned so far is the semantic equivalence of finite state automata and regular expressions. A regular expression can be compiled to a finite state automaton (this is how regex libraries work) and vice-versa.
FSA (including DFA and NFA) are very important for computer science and they are use in many fields including many fields. For instance hidden markov fields for speech recognition also regular expressions are converted to the FSA's before they are interpreted by the software and NLP (Natural Language Processing), AI (game programming), Robot Programming etc.
One of the disadvantage of FSA's are they are usually slow and usually hard to implement and hard to understand or visualize while reading the code, but they are good because they usually provide generic solutions to the problems and they are well-known with a lot of studies on FSA's.