How do we detect semantic errors of a software application? - semantics

"How do we detect semantic errors of a software application? Is it possible to eliminate all possible errors of a software application?"

A compiler will, by necessity, catch syntactic errors.
Semantic errors are much more difficult to address.
Many languages have lint facilities.
Lint-like tools are especially useful for interpreted languages like JavaScript and Python. Because such languages lack a compiling phase that displays a list of errors prior to execution, the tools can also be used as simple debuggers for common errors (e.g. syntactic discrepancies) as well as hard-to-find errors such as heisenbugs (drawing attention to suspicious code as "possible errors"). Lint-like tools generally perform static analysis of source code.
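For a rough feel of what that means in practice, here is an invented C++ fragment (not tied to any particular tool) of the kind such checkers flag: it compiles cleanly, yet is almost certainly wrong.

    #include <cstdio>

    int main() {
        int count = 0;
        int limit;                        // never initialized
        if (count = 10) {                 // '=' where '==' was probably intended
            std::printf("%d\n", limit);   // reads 'limit' before it is ever written
        }
        return 0;
    }

A lint-style static analysis can report both suspicious spots without ever running the program.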
You may want to examine the concept of Correctness.
In theoretical computer science, an algorithm is said to be correct with respect to a specification when it behaves as that specification requires. Functional correctness refers to the input-output behavior of the algorithm (i.e., for each input it produces the expected output).
In practice, it is not possible to eliminate all semantic errors. For example, both
j = i + 1
and
i = i + 1
are correct syntactically in several different languages. In the proper context, they are also semantically correct. But the programmer may have intended one and not the other. Detecting that intention, and whether it is even detectable, is at the very heart of program correctness.
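To make that concrete, here is an invented C++ sketch of a context in which either statement is legal, but only one matches the intent; no tool can tell which was meant without knowing the specification.

    // Intent: sum the first n elements of the array a.
    int sum_first_n(const int* a, int n) {
        int total = 0;
        int i = 0;
        int j = 0;
        while (i < n) {
            total += a[i];
            j = i + 1;      // compiles and is valid in isolation, but the loop never ends
            // i = i + 1;   // almost certainly what the programmer meant
        }
        return total;
    }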

Related

static and dynamic code analysis

I found several questions about this topic, all of them with a lot of references, but I still don't have a clear idea about it, because most of the references talk about concrete tools and not about the general concept of the analysis. Thus I have some questions:
About Static analysis:
1. I would like to have a reference, or a summary, of which techniques are successful and most relevant nowadays.
2. What can they really do to discover bugs? Can we make a summary, or does it depend on the tool?
About symbolic execution:
1. Where does symbolic execution fit? I guess it depends on the approach;
I would like to know whether it is dynamic analysis, or a mix of static and dynamic analysis, if that can be determined.
I have had trouble differentiating the two techniques in the tools, even though I think I know the theoretical difference.
I'm actually working with C
Thanks in advance
I'm trying to give a short answer:
Static analysis looks at the syntactic structure of the code and draws conclusions about the program's behavior. These conclusions are not always correct.
A typical example of static analysis is data flow analysis, where you compute sets like used, read, and written for every statement. This helps to find, for example, uninitialized values.
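As an invented sketch of the idea (the sets in the comments are illustrative, not the output of any specific tool):

    int f(bool flag) {
        int x;            // declared, never initialized here
        int y = 0;        // write: {y}
        if (flag) {
            x = 1;        // write: {x}, read: {flag}
        }
        y = y + x;        // read: {y, x}; data flow analysis finds a path
                          // (flag == false) on which x is read before it is written
        return y;
    }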
You can also analyze the code for code patterns. This way, these tools can be used to check whether you are complying with a specific coding standard. A prominent coding standard example is MISRA. This coding standard is used for safety-critical systems and avoids problematic constructs in C. This way you can already say a lot about the robustness of your application against memory leaks, dangling pointers, etc.
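For example, the constructs such rules and tools try to catch look roughly like this (a deliberately broken C++ sketch, not an example taken from MISRA itself):

    #include <cstdlib>

    int* make_counter() {
        int local = 0;
        return &local;        // dangling pointer: 'local' ceases to exist on return
    }

    void forget_to_free() {
        int* p = static_cast<int*>(std::malloc(sizeof(int)));
        if (p == nullptr) {
            return;
        }
        *p = 42;
        return;               // memory leak: no std::free(p) on this path
    }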
Dynamic analysis is not looking at the syntax only, but takes state information into account. In symbolic execution, you are adding assumptions about the possible values of all variables to the statements.
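A minimal invented sketch of what a symbolic executor tracks (the path constraints appear as comments):

    void check(int x) {        // x is treated as a symbol X, not a concrete value
        int y = 2 * x;         // y is tracked as the symbolic expression 2*X
        if (y > 10) {          // path condition: 2*X > 10, i.e. X > 5
            if (x == 3) {      // combined with X > 5, X == 3 is unsatisfiable,
                               // so this branch can be proven unreachable
            }
        } else {
                               // path condition: 2*X <= 10, i.e. X <= 5
        }
    }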
The most expensive and powerful method of dynamic analysis is model checking, where you really look at all possible execution states of the system. You can think of a model-checked system as a system that is tested with 100% coverage - but there are of course a lot of practical problems that prevent real systems from being checked that way.
These methods are very powerful, and you can gain a lot from the static code analysis tools especially when combined with a good coding standard.
A feature my software team found really impressive is that, for example, it will tell you in C++ when a class with virtual methods does not have a virtual destructor. Easy to check in fact, but really helpful.
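In code, the situation that check catches looks roughly like this (a minimal invented sketch):

    #include <memory>

    class Shape {
    public:
        virtual void draw() = 0;
        ~Shape() {}                  // non-virtual destructor in a class with virtual methods
        // virtual ~Shape() {}       // what such a checker wants to see instead
    };

    class Circle : public Shape {
    public:
        void draw() override {}
    private:
        std::unique_ptr<int> radius_ = std::make_unique<int>(1);
    };

    void use() {
        Shape* s = new Circle();
        s->draw();
        delete s;                    // undefined behavior: Circle's destructor never runs,
                                     // so 'radius_' may never be released
    }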
The commercial tools are very expensive, but worth the money, once you learned how to use them. A typical problem in the beginning is that you will get a lot of false alarms, and don't know where to look for the real problem.
Note that nowadays g++ has some of this stuff already built-in, and that you can use something like pclint which is free.
Sorry - this is already getting quite long...hope it's interesting.
The term "static analysis" means that the analysis does not actually run a code. On the other hand, "dynamic analysis" runs a code and also requires some kinds of real test inputs. That is the definition. Nothing more.
Static analysis employs various formal methods such as abstract interpretation, model checking, and symbolic execution. In general, abstract interpretation or model checking is suitable for software verification. Symbolic execution is more appropriate for the purpose of bug finding.
Symbolic execution is categorized as static analysis. However, there is a hybrid method called concolic execution, which uses both symbolic execution and dynamic testing.
Added for Zane's comment:
Maybe my explanation was a little confusing.
The difference between software verification and bug finding is whether the analysis is sound or not. For example, when we say the buffer overrun analyzer is sound, it means that the analyzer must report all possible buffer overruns. If the analyzer reports nothing, it proves the absence of buffer overruns in the target program. Because model checking is the method that guarantees soundness, it is mostly used for software verification.
On the other hand, symbolic execution, which is actively used by most of today's commercial static analyzers, does not guarantee soundness, since a sound analysis inherently issues lots and lots of false positives. For the purpose of bug finding, it is more important to reduce false positives, even if some true positives are also lost.
In summary,
soundness: there are no false negatives
completeness: there are no false positives
software verification: soundness is more important than completeness
bug finding: completeness is more important than soundness
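A small invented C++ sketch of that trade-off (the implicit caller contract is hypothetical):

    #include <cstddef>

    // Callers are supposed to guarantee n <= len, but nothing in this function proves it.
    int sum(const int* buf, std::size_t len, std::size_t n) {
        int total = 0;
        for (std::size_t i = 0; i < n; ++i) {
            total += buf[i];   // a sound analyzer must warn: it cannot rule out i >= len,
                               // so it reports a possible overrun (perhaps a false positive);
                               // a bug-finding tool may stay silent to avoid that false positive,
                               // accepting the risk of missing a real overrun (a false negative)
        }
        return total;
    }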

Why do tools like yacc and ANTLR generate source code?

These tools basically input a grammar and output code which processes a series of tokens into something more useful, like a syntax tree. But could these tools be written in the form of a library instead? What is the reason for generating source code as output? Is there a performance gain? Is it more flexible for the end user? Easier to implement for the authors of yacc and ANTLR?
Sorry if the question is too vague, I'm just curious about the historical reasons behind the decisions the authors made, and what purpose auto-generated code has in today's environment.
There's a big performance advantage achieved by the parser generator working out the interactions of the grammar rules with respect to one another, and compiling the result to code.
One could build interpreters that simply accepted grammars and did the parsing; there are parser types (Earley) that would actually be relatively good at that, and one could compute the grammar interactions at runtime (Earley parsers kind of do this anyway) rather than offline and then execute the parsing algorithm.
But you would pay a parsing performance penalty of 10 to 100x slowdown, and probably a big storage demand.
If you are parsing using only very small grammars, or you are parsing only very small documents, this might not matter. But the grammars that many parser generators get applied to end up being fairly big (people keep wanting to add things to what you can say in a language), and they often end up processing pretty big documents. So performance now matters, and voilà, people build code-generating parser generators.
Once you have a tool, it is often easier to use even in simple cases. So now that you have parser generators, you can even apply them to little grammars or to parsing little documents.
EDIT: Addendum. The historical reason is probably driven by space and time demands. Earlier systems did not have a lot of room (32 KB in 1975) and didn't run very fast (1 MIPS in the same time frame), and people already had big source files. Parser generators tended to help with this set of problems; interpreted grammars would have had intolerably bad performance.
Ira Baxter gave you one set of reasons for not handling the grammar parsing at runtime.
There is another reason too. Associated with each rule in the grammar is the appropriate action. The action is normally a fragment of a separate language (for example, C or C++). All actions in a grammar interpreted at runtime would have to be mappable to something appropriate in the program. In general, that's a losing proposition. The fragments can do all sorts of things, referencing parts of the stack ($$, $1, etc.) and invoking actions (YYACCEPT, etc.). Designing the runtime system so that it could be reliably used with such fragments would be tough. You would likely end up creating source code and compiling that into a DSO (dynamic shared object) or DLL (dynamic link library) and loading it. That requires a compiler on the customer's machine, where the customer may have deliberately designed their production system to be compiler-free.
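For concreteness, a typical yacc rule with embedded C action fragments looks something like this (a generic calculator-style sketch, not taken from any particular grammar):

    expr : expr '+' expr   { $$ = $1 + $3; }                 /* run when this rule reduces */
         | NUMBER          { $$ = $1; }
         ;

    line : expr '\n'       { printf("%d\n", $1); YYACCEPT; }
         ;

The generator pastes these fragments into the generated C parser, where $$ and $1 become references into the parser's value stack; a grammar interpreted at runtime would leave no obvious place for that C code to live.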

Does "DO-178B level A" prohibits optimizing compilers?

There is an "DO-178B" level A and level B certification for airborne systems. Does it prohibit using of optimizating compilers?
E.g. Some compilers will reorder instructions to get more performance. Does DO-178B lev.A or lev.B prohibits this reordering?
Most modern CPU have such reordering builtin in the hardware. Are they allowed to be used within DO-178B lev.A softare/hardware systems?
First, and critically: For this type of question, if the answer matters, you need to get a formal professional opinion from someone who is competent to provide it, or discuss this with your certification authority. Any reply you will get here should not be relied on.
With that said, I will assume you are asking from a point of curiosity and will not be relying on the answer in any meaningful way, and I will attempt to answer in that vein. I am not a professional, and this is not professional advice.
The most on-point documentation I could find online with a quick search was this FAA guideline paper about a related topic: http://www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/media/cast-12.pdf. This paper describes the conditions under which one must do verification of the generated object code rather than the source code. In particular, it gives a number of examples that will occur even in non-optimized code -- automatic variable initialization and exception handling are a couple of examples. On compiler optimization, it notes:
Compiler optimization is another area addressed under section 4.4.2a of DO-178B/ED-12B. This involves the analytical determination that the optimization features do not compromise the ability of the test cases to demonstrate requirements-based testing and structural coverage consistent with the software level. This is a separate issue from the traceability and additional verifications issues addressed by Section 4.4.2b. This is outside the scope of this paper.
I do not have a copy of DO-178B handy to read section 4.4.2a, but I would note that (a) there are procedures for handling other cases where the object code does not correspond to the source code in a one-to-one manner, and (b) this pretty strongly implies that compiler optimization is discussed rather than outright prohibited.
It's also pretty clear from a number of the discussions in that paper that the answer to "we can't trace things between the source code and the object code" is to validate the object code in some manner -- in other words, there is a solution other than prohibiting such things.
Thus, I would conclude that at least some compiler optimizations must be permitted.
In particular, the sort of reordering that you describe is quite traceable, and it seems almost certain to me that it would be permitted.
DO-178B is not absolute and is open to interpretation. If you switch off optimisation there are no questions and nothing to explain. By sticking to the most obvious interpretation you avoid having to sell your interpretation to certification authorities later on and opening yourself up to questions about how you did things.
When you optimise your code it is hard to do the source-to-instruction traceability that is required for level A. In addition, if you are using DO-178B, getting that extra 5% out of your software is not your greatest concern. The ease of completing all the required certification steps should be your primary concern, since that is what is going to be sucking up all your time.
The hardware part of your question is interesting. For software optimisation, the code is not just reordered, it is changed as well. But for hardware, the code is not changed to get higher speed, only the execution order. I will have to ask around to get more info on what the thinking is on this.
I have only superficial knowledge of DO-178B (I do not work day-to-day with it, but I build tools for people who do).
The standard takes traceability very seriously. High-level requirements are refined into low-level requirements, which are implemented by the source code, which is compiled by the compiler. At each of these steps, one must be able to justify what was done in terms of the specifications produced by the previous step.
For the compiler, this means that one must be able to read the assembly and trace one particular instruction to the source code statement that caused this instruction to be generated.
So, in short, yes, I think this prohibits most optimizations.
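As an invented illustration of why optimization complicates that tracing (not taken from any certified project), even a mild optimization level may merge, reorder, or delete instructions, leaving some source statements with no corresponding instruction at all:

    int scale(int x) {
        int temp = x * 4;    // an optimizer may turn the multiply into a shift,
                             // or fold it into the following statement entirely
        int result = temp;   // frequently eliminated: there may be no instruction
                             // left to trace back to this source statement
        return result;       // the object code can end up as a single shift-and-return
    }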
Concerning the hardware this software is run on, it is verified differently (but I guess just as stringently). The relevant standard is then DO-254, and I do not know anything about it.
With optimization, you need to verify the generated code at the object assembly language level. There are compiler suites and libraries for embedded real-time multitasking that have been previously verified in other projects, giving you a comfort level that they can be verified again - but you still need to verify the code used in your application.
To avoid delays and having to explain things, just turn off optimizations and caching. This makes the code deterministic. Also, try not to use GCC if possible, and go for a qualified compiler such as IAR or DDCI or Irvine Compilers or something similar. Instead of trying to bang in the screw with a fancy hammer, get a screwdriver that works for the screw. Because when that plane crashes with 200 people on board, mothers, fathers and children, and they find out that the compiler reordered code and that caused the failure, you will wish you had just used the right screwdriver.

decompilation resources and theory

There must be a million books and papers on the theory and techniques of building compilers. Are there any resources on doing the reverse? I'm not interested in any particular HW platform. I'm looking for good books/research papers that examine the subject and its difficulties in depth.
I've worked on an AS3 and Java decompiler and I can assure you that everything I've learned in regards to decompilation is straight from compiler theory. Intermediate representations, data flow analysis, term rewriting, and other related concepts can all be found in the dragon book.
I've written about decompilers for dynamic languages here and for Python specifically.
Note though this is for dynamic languages with custom (high-level) VMs.
Decompilation is really a misnomer. Decompilers compile object code into a source representation. In many ways they are easier to write than traditional compilers - the 'source' code is already syntax checked and usually very precisely formatted.
They build up a symbol table (of addresses) and construct a target-language representation of the application. The usual difficulty is that the original compiler has, to a greater or lesser degree, optimised the original application by removing common sub-expressions, hoisting constant code out of loops, and applying many other similar techniques. These are often not possible to represent in the target language.
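A hedged, invented sketch of the problem: once the compiler has hoisted the loop-invariant computation, a decompiler can only recover the transformed shape of the code, not what the author originally wrote.

    // What the author wrote:
    void scale(int* a, int n, int k) {
        for (int i = 0; i < n; ++i) {
            a[i] = a[i] * (k + 1);      // (k + 1) written out on every iteration
        }
    }

    // Roughly what a decompiler recovers from the optimized object code:
    void scale_decompiled(int* a, int n, int k) {
        int t = k + 1;                  // the compiler hoisted the loop-invariant expression
        for (int i = 0; i < n; ++i) {
            a[i] *= t;
        }
    }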
In cases where the code targets a well-defined VM, this optimisation is often left to the JIT compiler, and the resulting decompiled code is very readable - in many cases almost identical to the original. Compilers of this type often leave some or all of the symbols in the object code, allowing these to be recovered. Others include line numbers to help with debugging and troubleshooting. These all help to recover the original code.
As a counter, there are code obfuscators that deliberately transform the code to prevent simple restoration of the original source: scrambling names, changing the sequence in which code is generated (without changing its resulting meaning), and introducing constructs for which there is no source-language equivalent.

Can PMD be customized to fully support a new language?

Can PMD be customized to fully support a new language, in a reasonable amount of time? I mean, I know that technically almost anything can be done, but I'm wondering if this can be done in a reasonable amount of time, e.g. less than 2 weeks?
This page mentions how to write a CPD parser: http://pmd.sourceforge.net/cpd-parser-howto.html
But is this just for copy/paste detection? Does writing a CPD parser give me full support of PMD in terms of rule sets?
I would guess not, but I'm not a PMD expert (and I have my own bias, check my bio).
The issues are:
Can you define a syntax for your language quickly? (Maybe, depending on how good you are, how messy the language is, and the strength of the parsing machinery offered by PMD.)
Can you define the semantics of your language so that the "semantic checks" provided by PMD work? You have to do this, because syntax tells you (and a tool) literally nothing about the semantics behind that syntax. I would guess that the PMD 'semantic checks' are pretty tightly wired to the precise details of Java; if your language matched Java perfectly, this would be zero work. But it doesn't, or you wouldn't be asking the question. And language semantics differences, even minor ones, cause discontinuous changes to the interpretation of the code. Before you get to doing even "serious" semantics, you're likely to have to build a symbol table mapping identifiers in the code to declarations (and the "semantic" type) of those symbols. Based on the tool infrastructure I work with, this step alone takes 1-2 months for a real language.
Lastly, you are likely to have to code special PMD checks that are specific to your language. That takes time and energy, too.
I build generic compiler-type machinery (parsers, flow analyzers, style/error checkers) and get asked the equivalent of this question all the time with respect to our machinery. We try to have a lot of machinery available, try to make it easy to integrate new languages, and we've been working on making this "convenient and fast" for 15+ years. It's still not convenient, and there's no way to do this with our tools in a few weeks. I doubt PMD is better.