I found several questions about this topic, all of them with lots of references, but I still don't have a clear idea, because most of the references talk about specific tools rather than about the analysis concepts in general. So I have some questions:
About Static analysis:
1. I would like to have a reference, or a summary, of which techniques are successful and most relevant nowadays.
2. What can they really do about discovering bugs? Can we summarize it, or does it depend on the tool?
About symbolic execution:
1. Where does symbolic execution fit? I guess it depends on the approach;
I would like to know whether it counts as dynamic analysis, or as a mix of static and dynamic analysis, if that can be determined at all.
I have trouble telling the two techniques apart in actual tools, even though I think I know the theoretical difference.
I'm actually working with C
Thanks in advance
I'll try to give a short answer:
Static analysis looks at the syntactic structure of the code and draws conclusions about the program's behavior. These conclusions are not necessarily correct.
A typical example of static analysis is data flow analysis, where you compute sets like used, read, write for every statement. This will help to find e.g. uninitialized values.
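For illustration, here is a minimal C sketch of my own (not taken from any particular tool) of the kind of defect such an analysis flags: on one path, len is read before it is ever written.

    #include <stdio.h>

    /* Data flow analysis computes, per statement, which variables are
     * defined (written) and which are used (read). Here "len" has a
     * path on which it is used without ever being defined.           */
    int message_length(const char *msg)
    {
        int len;                    /* declared, but not initialized   */

        if (msg != NULL) {
            len = 0;
            while (msg[len] != '\0') {
                len++;              /* defined before use on this path */
            }
        }
        /* If msg == NULL we fall through to this return, reading an
         * uninitialized value -- exactly what the analysis reports.   */
        return len;
    }

    int main(void)
    {
        printf("%d\n", message_length("hello"));
        printf("%d\n", message_length(NULL));   /* triggers the defect */
        return 0;
    }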
You can also analyze the code for code patterns. This way, these tools can be used to check whether you are complying with a specific coding standard. A prominent coding standard example is MISRA. This coding standard is used for safety-critical systems and avoids problematic constructs in C. This way you can already say a lot about the robustness of your applications against memory leaks, dangling pointers, etc.
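As a made-up illustration (not a quotation from the MISRA document itself), this is the kind of construct such checkers reject: returning the address of a local variable, which leaves the caller with a dangling pointer.

    #include <string.h>

    /* A construct that coding-standard checkers reject: the buffer
     * lives on greet()'s stack frame, so the returned pointer dangles
     * as soon as the function returns.                                */
    char *greet(const char *name)
    {
        char buffer[64];
        strcpy(buffer, "Hello, ");
        strncat(buffer, name, sizeof(buffer) - strlen(buffer) - 1);
        return buffer;   /* warning: address of local variable returned */
    }

A rule checker flags the return statement purely from the code's structure, without ever running the program.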
Dynamic analysis does not look at the syntax only, but also takes state information into account. In symbolic execution, you are adding assumptions about the possible values of all variables to the statements.
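A rough sketch of that idea on a toy C function of mine: the executor treats the input as a symbol and carries a path condition along each branch.

    /* Symbolic execution of this function treats x as a symbol X and
     * explores both branches, carrying a "path condition" along:
     *
     *   path 1: X * 2 == 12        ->  solver finds X = 6, division by zero
     *   path 2: !(X * 2 == 12)     ->  function returns normally
     *
     * A concrete test suite would have to guess the value 6; the
     * symbolic engine derives it from the constraints.               */
    int risky(int x)
    {
        int y = x * 2;
        if (y == 12) {
            return 100 / (x - 6);   /* divide by zero when x == 6 */
        }
        return 0;
    }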
The most expensive and powerful method of dynamic analysis is model checking, where you really look at all possible execution states of the system. You can think of a model-checked system as a system that is tested with 100% coverage - but there are of course a lot of practical problems that prevent real systems from being checked that way.
These methods are very powerful, and you can gain a lot from the static code analysis tools especially when combined with a good coding standard.
For example, a feature my software team found really impressive is that it will tell you, in C++, when a class with virtual methods does not have a virtual destructor. Easy to check in fact, but really helpful.
The commercial tools are very expensive, but worth the money, once you learned how to use them. A typical problem in the beginning is that you will get a lot of false alarms, and don't know where to look for the real problem.
Note that nowadays g++ has some of this stuff already built-in, and that you can use something like pclint which is free.
Sorry - this is already getting quite long...hope it's interesting.
The term "static analysis" means that the analysis does not actually run a code. On the other hand, "dynamic analysis" runs a code and also requires some kinds of real test inputs. That is the definition. Nothing more.
Static analysis employs various formal methods such as abstract interpretation, model checking, and symbolic execution. In general, abstract interpretation or model checking is suitable for software verification. Symbolic execution is more appropriate for the purpose of bug finding.
Symbolic execution is categorized as static analysis. However, there is a hybrid method called concolic execution which combines symbolic execution with dynamic testing.
Added for Zane's comment:
Maybe my explanation was a little confusing.
The difference between software verification and bug finding is whether the analysis is sound or not. For example, when we say the buffer overrun analyzer is sound, it means that the analyzer must report all possible buffer overruns. If the analyzer reports nothing, it proves the absence of buffer overruns in the target program. Because model checking is the method that guarantees soundness, it is mostly used for software verification.
On the other hand, symbolic execution, which is actively used by most of today's commercial static analyzers, does not guarantee soundness, since a sound analysis inherently issues lots and lots of false positives. For the purpose of bug finding, it is more important to reduce false positives, even if some true positives are lost as well.
In summary,
soundness: there are no false negatives
completeness: there are no false positives
software verification: soundness is more important than completeness
bug finding: completeness is more important than soundness
I am packaging up an rpm file which has a %postinstall section that detects certain conditions and runs a suite of unit, function, and system tests. I am getting some push back that it exposes some of the internal structure as I use some of the same environment variables the code itself uses for diagnostics. Thoughts?
UPDATE: I am not planning on running the tests automatically nor exposing their existence to the end users. I am proposing that the testing package simply be available on any machine where the suite lands. It adds roughly 3% to the final size of the package and requires an obscene amount of internal knowledge to execute properly.
The program itself is a library which others may use and which is exposed through an API. The internal knowledge of how things function is not at issue. My main motivation is the lack of suitable test resources and the large variability in the target environment. Some of the tests are really simple (similar to what configure might do to determine that all the right features are available from the compiler). Other tests are more involved, and they prove the basic functions the library should provide.
If you want to avoid the complaint that it runs on every install, at least use the %check rule of RPM.
Sounds like people are concerned about "reverse engineering". So the software is proprietary? This would seem to be the crux of your problem. Regardless, it's common for the test suite to be separate from the packaged software.
However, you're not being unrealistic: Allowing users to run tests themselves on their systems and give you the results is a great aspect of a collaborative relationship with users. Unfortunately, you're running up against the proprietary business model.
Perhaps you can compromise by trimming down or rewriting the tests and the diagnostics so they prove an adequate amount of fitness without revealing too much. I wouldn't shy away from throwing out some of the tests and diagnostics you've written so far.
You really should make the argument that users will be pleased and have more confidence in a software package shipped with a thorough testing system, and that these outweigh any fears of revealing the software's internals.
Like almost anyone who's been programming for a while, I'm familiar with the term "production code" and have a vague sense of what it means. However, can someone offer a semi-rigorous definition, since it seems Wikipedia and Google can't? It seems like there are a lot of gray areas in what counts as production, such as internal tools that are used by a small group of people and therefore not "formalized" in terms of UI, documentation, etc. and open source apps that are feature complete, reasonably bug free and working, but lack polish, UI and extensive testing.
When your code runs on a production system, that means it is being used by the intended audience in a real-world situation.
Production code, however, does not necessarily mean robust, reliable, or stable code. The Daily WTF provides plenty of evidence in this regard.
Production means anything that you need to work reliably and consistently.
Whether it is a build script or a public-facing web server.
When others rely on your code, particularly folks who may not understand it (even "smart" developers, perhaps outside your group, who are using a library you wrote), that code is production code.
It's production because "work stops" and "money is lost" when the production code fails.
The definition as I understand it is that production code is any code that is installed or in use on a live, non-test-bed system. A server used internally to a company is a production system if it is the live system used by the employees of the company. The point here is that code running on a server internal to the company writing the code can be production code.
Usually, a good distinction when looking at internal code is whether or not the group maintaining the code is separate from the group using the code. If the groups are separate, odds are that the code is production code. If running the business depends on the code, then it is certainly production code, even if it is developed and maintained in-house.
EDIT: The short answer: If you are "betting the farm on it", it is "production".
This is a great question--an absolutely critical distinction that routinely gets everyone in trouble due to misunderstandings. The question of what is "production" is a subset of the related question of what is an "environment".
So part of the answer is that "production" is THE "environment" that is most important and is most trusted as THE "real" thing.
So now we must define "environment" (and then revisit "production"). We are still far from a satisfactory answer.
We programmers use the term "environment" constantly to refer to computer systems consisting of hardware that is executing software. That software is the code that we wrote plus software that it depends upon, which was written by others. We write our code and integrate it with the other software, then we typically run the integrated software through an escalating series of tests (unit tests, integration tests, functional tests, acceptance tests, regression tests, etc.), until we finally run the integrated software in the full manner in which it was intended.
Of course, not everything is fully automated. There are usually numerous people involved, and they have manual processes to perform. We programmers look for ways to automate as many of these processes as possible, but there is always a "man/machine boundary" in the systems we work on. Often, there are many such boundaries in any particular case.
On the other hand, there may not be any significant automation at all. For example, we spoke of "production" way back when we had a room full of people performing manual labor which produced a product. So, there doesn't have to be any automation present in our "production" "environment". There is also a middle ground, where the automation involved does not include software, such as in the case of a person running a loom to weave cloth.
Also, there may not be a product, since we have adapted our language of "production" "environment" to include product-less service providers.
Likewise, the testing may not involve software, since we may be testing a non-software-driven machine (e.g., the loom) or even the people (training and evaluation).
Now we have touched on all the crucial elements of an "environment":
there is a purpose, an intent, being pursued
an intent requires an intender, so there must be a sponsor (a person or group, but not a machine) that specifies the intent
that intent is pursued through various processes that are performed by various actors
those actors may be people, they may be software executing on hardware, or they may be non-software-driven machines, so there may or may not be automation present
Now we can properly and fully define our original terms.
An environment consists of all the processes and their actors that collaborate to pursue a particular intent on behalf of its sponsor. That means software executing on hardware, that means non-software-driven machines, and that means people performing their various duties. It is the intent that primarily defines an environment, not its processes or its actors.
Furthermore...
If the intent being pursued in a particular environment is the sponsor's ultimate goal, which usually involves producing a product or providing a service in exchange for money, then we refer to that environment as production.
Now we can go a bit further.
If the intent being pursued in an environment is the verification of processes and their actors in preparation for production, we call that a test environment.
We further call it an integration environment if that testing involves the initial joining together of significant individuals or groups of processes and their actors.
If that preparation involves the "programming" of human actors to perform new processes, or the subsequent verification (evaluation), then we call that a training environment.
Armed with these distinctions and definitions, we can now understand several common scenarios.
An environment can be mislabeled with a name that does not match its intent, such as when a training environment is used as test.
An environment can be grossly misused, such as when integration or training is done in production.
An environment can be misrepresented, such as when key processes or actors are left unidentified (e.g., manual reconciliations, or even by ignoring the people altogether).
An environment can be retasked, by repurposing its processes and actors to a new intent. A very successful technique for some organizations is to routinely "flip" several sets of actors (servers hosting software) between production, test, training, and integration upon each release.
In most cases, a single actor (person or hardware) can execute multiple processes which can participate in multiple environments. For example, a single computer server can host software that performs production transactions while also hosting other software that performs test or training functions.
Normally, a single instance of an actor should participate in only one environment at a time. On very rare occasion, a single actor can be shared across environments if the intents are mutually compatible. Most of the time, it is very unwise to attempt such sharing because the intents are not really compatible. A perfect example is running a test process on a server that also supports production processes, resulting in downtime because the test caused the entire server to fail.
Therefore, the intent of an environment must be construed with very wide latitude, to include concepts such as availability, reliability, performance, disaster recovery, accuracy, precision, repeatability, longevity, etc. This means that the actors and processes must often be construed to include things like providing power, cooling, backups, and redundancy.
Finally, note that the situation can get quite complex. For example, a desktop computer (actor) may be tasked by the development team (sponsor) to host their source control (process), which the team relies upon for their primary jobs (production). Nevertheless, the IT staff sees that same desktop computer as simply a developer workstation (development, not production) and treats it with contempt and nonchalance when it develops a hardware problem. But the developers are producing production code, so aren't they also part of production? Perspective matters.
EDIT: Production quality
A solid verification (testing) methodology should take packaged code from development and run it through a series of tests (integration, TQA, functional, regression, acceptance, etc.) until it comes out the other side "stamped" for production use. However, that makes the package production quality, but not actually production. The package only becomes production when a sponsor actually deploys it into an environment with that ultimate level of intent.
However, if your organization merely produces that package (its product) for the consumption of others, then such a release comes as close to production as that organization will experience with respect to that product, so it is common to stretch the term production to apply rather than clarify that it is production quality. In reality, that organization's production environment consists of the actors and processes involved in its development/release efforts that result in that product.
I said it could get quite complex...
Any code that will be used by its intended user base would fit into my definition of 'production code'.
Of course, the grey area in that definition would be clearly defining who your userbase is.
G-Man
The production software can handle the necessary workload without disruption or degradation of the service
The software has been successfully tested in different production scenarios
Transforming a working prototype into production software that runs on a fail-safe, redundant architecture suitable for real business, i.e. a production environment, takes time, code refactoring, and attention to detail
The production code has an acceptable level of maintainability and is reasonably well commented
The documentation explains the functionality and all features, and facilitates maintenance
If the production software is an international service or application, it must be localized
Production code is used by end users, often customers, under the conditions described in a Terms-of-Service agreement
Production software does not necessarily mean reliable, mission-critical software
The software does well what it was intended to do
Log files provide an accurate description of run-time performance and software reliability metrics, and reporting that facilitates debugging and software maintainability
I think the best way to describe it, is as any code that "leads-to" deployment and "follows-up" deployment. Deployment itself is defined as all of the activities that make a software system available for use. If your code is ready to be used by people, in-house or otherwise, then it is production code.
In simple words: production code is code that is live and in use by its intended audience.
The term "production code" mixes two different concepts. One is deployment management and the other is release life cycle.
In the strict sense of the word, a system is in production when it is being used as part of business or service operations. What's not in production are development, testing, QA, demo, and staging systems. A production system does not immediately imply quality.
From the release life cycle's point of view, a "production" build is the build that is released to the general public or to clients. It is the stage after pre-alpha, alpha, beta (feature complete, code complete, etc.) and release candidate. For shrink-wrapped products that cannot easily deploy updates, reaching the production stage likely implies a series of tests and bug fixes.
What are the benefits of doing static code analysis on your source code? I was playing around with FxCop and I was wondering if there are any benefits beyond making sure you are following the coding standards.
There are all kinds of benefits:
If there are anti-patterns in your code, you can be warned about them.
There are certain metrics (such as McCabe's cyclomatic complexity) that tell useful things about source code; a small worked example follows this list.
You can also get great stuff like call graphs and class diagrams from static analysis. Those are wonderful if you are attacking a new code base.
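To make the McCabe metric concrete, here is a small made-up C function and a hand count of its cyclomatic complexity under the usual "decision points + 1" convention:

    /* Decision points: the for loop, the two ifs, and the && operator.
     * Cyclomatic complexity = 4 decision points + 1 = 5.              */
    int count_valid(const int *values, int n, int lo, int hi)
    {
        int count = 0;
        for (int i = 0; i < n; i++) {                  /* +1 (loop)    */
            if (values[i] >= lo && values[i] <= hi) {  /* +2 (if, &&)  */
                count++;
            } else if (values[i] < 0) {                /* +1 (else if) */
                count--;
            }
        }
        return count;
    }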
Take a look at SourceMonitor
Many classes of memory leaks and common logic errors can be caught statically as well. You can also look at cyclomatic complexity and such, which may be part of the "coding standards" you mentioned, but may be a separate metric you use to evaluate the algorithmic "cleanliness" of your code.
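For instance (my own sketch, not tied to any specific tool), the leak on the early-return path below is exactly the shape of defect such checkers report:

    #include <stdio.h>
    #include <stdlib.h>

    /* Static analyzers track malloc/free pairs along every path.
     * On the fopen-failure path, "data" is never freed: a leak that
     * exists in the source whether or not the path is ever executed. */
    int load(const char *path)
    {
        char *data = malloc(1024);
        if (data == NULL)
            return -1;

        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return -1;      /* leak: early return without free(data) */

        fread(data, 1, 1024, f);
        fclose(f);
        free(data);
        return 0;
    }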
In any case, only a judicious combination of profiling (dynamic or run-time analysis) and static analysis/linting will ensure a consistent, reliable code base. Oh, that, and a little luck ;-)
It's a trade-off. For an individual developer who wants to improve his understanding of the framework and guidelines, I would definitely encourage it. FxCop generates a lot of noise / false positives, but I've also found the following benefits:
it detects bugs (e.g. a warning about an unused argument may indicate you used the wrong argument in the method body; see the short sketch after this list).
understanding the guidelines FxCop is following helps you to become a better developer.
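The unused-argument case is easy to picture with a tiny sketch (written in C here just for illustration; the FxCop warning is the C# equivalent): the parameter the caller supplies is silently ignored, which usually points at a copy-paste mistake.

    /* An analyzer reports "parameter 'discount' is never used".
     * The warning looks cosmetic, but it reveals a real bug: the
     * function applies a hard-coded rate instead of the argument.   */
    double discounted_price(double price, double discount)
    {
        return price * (1.0 - 0.10);   /* should have used 'discount' */
    }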
However with a mixed-ability team, FxCop may well generate too many false positives to be useful. Junior developers will have difficulty appreciating whether some of the more esoteric violations thrown up by FxCop should concern them or are just noise.
Bottom line:
If you're developing reusable class libraries, such as an in-house framework, make sure you have good developers and use FxCop.
For everyday application development with mixed-ability teams, it will probably not be practicable.
Actually, FxCop doesn't particularly help you follow a coding standard. What it does help you with is designing a well-thought-out framework/API. It's true that parts of a coding standard (such as the casing of public members) will be caught by FxCop, but coding standards aren't the focus.
Coding standards can be checked with StyleCop, which checks the source code instead of the MSIL like FxCop does.
It can catch actual bugs, like forgetting to Dispose IDisposables.
Depends on the rules, but many subtle defects can be avoided, code can be cleaned, potential performance problems can be detected etc.
Put it one way...if it's cheap or free (in both time and financial costs) and doesn't break anything, why not use it?
FxCop
There is a list of all warnings in FxCop. You can see that there are warnings from the following areas:
Design Warnings: warnings that support proper library design as specified by the .NET Framework Design Guidelines.
Globalization Warnings: warnings that support world-ready libraries and applications.
Interoperability Warnings: warnings that support interacting with COM clients.
Naming Warnings: warnings that support adherence to the naming conventions of the .NET Framework Design Guidelines.
Performance Warnings: warnings that support high-performance libraries and applications.
Security Warnings: warnings that support safer libraries and applications.
Depending on your application some of those areas might not be very interesting, but if you e.g. need COM interoperability, the tests can really help you to avoid the pitfalls.
Other tools
Other static checking tools can help you to detect bugs like not disposing of an IDisposable, memory leaks, and other subtle bugs. For an extreme case, see the not-yet-released NStatic tool.
NStatic is used to track things such as redundant parameters, expressions that evaluate to constants, infinite loops and many other metrics.
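A hedged sketch of the kinds of findings meant here (my own toy code, not actual NStatic output):

    /* Three findings a checker of this kind would report:               */
    int wait_for_flag(int flag, int unused_limit)   /* redundant parameter */
    {
        int attempts = 0;
        if (flag >= 0 || flag < 0) {   /* condition always evaluates true */
            while (1) {                /* loop has no break or return ... */
                attempts++;            /* ... so it never terminates      */
            }
        }
        return attempts;               /* unreachable code                */
    }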
The benefits are that you can automatically find and quantify technical debt within your software application.
I find static code analysis tools indispensable on large enterprise application development where many developers and testers come and go over the life of an application but the code quality still needs to be kept high and the technical debt managed properly.
What are the benefits of doing static code analysis on your source code?
The benefits depend on the type of static code analysis that is performed. Static code analysis can range from simple to sophisticated techniques. For example, generating metrics about your source code to identify error prone code is one technique. Other techniques actively attempt to find bugs in your code. Sophisticated techniques use formal methods to prove that your code is free of bugs.
Therefore the benefit depends on the type of static code analysis being used. If the technique produces metrics (such as code complexity etc.), then a benefit is that these metrics could be used during code review to identify error prone code. If the technique detects bugs, then the benefit is that the developer can identify bugs before unit test. If formal methods based techniques are used to prove that the code does not contain bugs, then the benefit is that this information could be used to prove to the QA department (or certification authorities) that the code is free of certain types of bugs.
A more detailed description of the techniques and benefits can also be found on this page: www.mathworks.com/static-analysis
I'll try to describe the main ones:
Static code analysis identifies defects in the program at an early stage, resulting in a decreased cost to fix them.
It can detect flaws in the program's inputs and outputs that cannot be seen through dynamic testing.
It automatically scans uncompiled code and identifies vulnerabilities.
From what I know from dealing with Checkmarx, static code analysis can address multiple vulnerabilities at a single point, which saves a lot of time for the developer.
What is Dynamic Code Analysis?
How is it different from Static Code Analysis (ie, what can it catch that can't be caught in static)?
I've heard of bounds checking and memory analysis - what are these?
What other things are checked using dynamic analysis?
-Adam
Simply put, static analysis collects information based on the source code, while dynamic analysis is based on executing the system, often using instrumentation.
Advantages of dynamic analysis
Is able to detect dependencies that are not possible to detect with static analysis, e.g. dynamic dependencies introduced through reflection, dependency injection, or polymorphism (see the sketch after this list for a C analogue).
Can collect temporal information.
Deals with real input data. During static analysis it is difficult or impossible to know what files will be passed as input, what web requests will arrive, what the user will click, etc.
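C has a modest analogue of this (my own sketch; the examples above are reflection and dependency injection): a call through a function pointer chosen from input, where the actual callee is only known at run time.

    #include <stdio.h>
    #include <string.h>

    static void export_csv(void)  { puts("csv");  }
    static void export_json(void) { puts("json"); }

    /* A static analyzer sees only "call through fn"; which exporter
     * actually runs depends on the command-line argument, so the real
     * dependency is only observable by executing the program.        */
    int main(int argc, char **argv)
    {
        void (*fn)(void) = export_csv;              /* default target  */
        if (argc > 1 && strcmp(argv[1], "json") == 0)
            fn = export_json;
        fn();                                       /* resolved at run time */
        return 0;
    }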
Disadvantages of dynamic analysis
May negatively impact the performance of the application.
Cannot guarantee full coverage of the source code, as its runs are based on user interaction or automated tests.
Resources
There are many dynamic analysis tools on the market, debuggers being the most notorious ones. On the other hand, it is still an academic research field: there are many researchers studying how to use dynamic analysis for a better understanding of software systems, and there is an annual workshop dedicated to dependency analysis.
Basically you instrument your code to analyze your software as it is running (dynamic), rather than just analyzing the software without running it (static). Also see this JavaOne presentation comparing the two. Valgrind is one example of a dynamic analysis tool for C. You could also use code coverage tools like Cobertura or EMMA for Java analysis.
From Wikipedia's definition of dynamic program analysis:
Dynamic program analysis is the analysis of computer software that is performed with executing programs built from that software on a real or virtual processor (analysis performed without executing programs is known as static code analysis). Dynamic program analysis tools may require loading of special libraries or even recompilation of program code.
You asked for a good explanation of "bounds checking and memory analysis" issues.
Our Memory Safety Check tool instruments your application to watch at runtime for memory access errors (buffer overruns, array subscript errors, bad pointers, alloc/free errors). The link contains a detailed explanation complete with examples. This SO answer shows two programs that have pointers into a dead stack frame, and how CheckPointer detects and reports the point of error in the source code.
A briefer example: C (and C++) infamously do not check accesses to arrays to see whether the access is inside the bounds of the array. The benefit: well-designed programs don't pay the cost of such a check in production mode. The downside: buggy programs can touch things outside the array, and this can cause behavior which is very hard to understand; thus the buggy program is difficult to debug.
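A minimal illustration of the kind of bug meant here (my own example):

    #include <stdio.h>

    int main(void)
    {
        int totals[4] = {0, 0, 0, 0};

        /* Off-by-one: i == 4 writes one element past the end of the
         * array. The C compiler accepts it, and an unchecked build may
         * even appear to work, silently corrupting adjacent memory.   */
        for (int i = 0; i <= 4; i++) {
            totals[i] = i * 10;
        }

        printf("%d\n", totals[3]);
        return 0;
    }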
What a dynamic instrumentation tool like the Memory Safety Checker does is associate some metadata with every pointer (e.g., the type of the thing to which the pointer "points", and if it is an array, the array bounds), and then check at runtime any access made via a pointer into an array to see whether the array bound is violated. The tool modifies the original program to collect the metadata where it is generated (e.g., on entry to scopes in which arrays are declared, or as the result of a malloc operation, etc.) and modifies the program at every array reference (written both as x[y], where either x or y is an array pointer and the other value is of some integral type, and similarly for *(x+y)!) to check the access. Now if the program runs and performs an out-of-bounds access, the check catches the error and it is reported at the first place where it could be detected. (If you think about it, you'll realize the instrumentation for metadata collection and checking has to be pretty clever to handle all the variant cases a language like C may have. It's actually hard to make this work completely.)
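In spirit, the instrumentation replaces raw pointers with a pointer-plus-metadata pair and routes every access through a check. A deliberately simplified sketch (my own, not the actual tool's representation):

    #include <stdio.h>
    #include <stdlib.h>

    /* Simplified "fat pointer": the raw pointer plus the bounds the
     * checker needs. A real tool generates this bookkeeping for you. */
    typedef struct {
        int   *base;
        size_t length;
    } checked_array;

    static int checked_read(checked_array a, size_t index)
    {
        if (index >= a.length) {                     /* the runtime check   */
            fprintf(stderr, "out-of-bounds read at index %zu\n", index);
            exit(EXIT_FAILURE);                      /* fail at first error */
        }
        return a.base[index];
    }

    int main(void)
    {
        int storage[4] = {10, 20, 30, 40};
        checked_array a = { storage, 4 };

        printf("%d\n", checked_read(a, 3));   /* fine                  */
        printf("%d\n", checked_read(a, 4));   /* caught and reported   */
        return 0;
    }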
The good news is that such an access is now reported early, where it is easier to detect the problem and fix the program. Such a tool isn't intended for production use; one uses it during development and testing to help verify the absence of errors. If no errors are discovered, then one does a normal compile and runs the program without the checks.
This is an extremely good example of a dynamic analysis tool: the testing happens at runtime.
Bounds checking
This means runtime checks of array accesses. Contrary to C's laissez-faire approach to memory accesses and pointer arithmetic, other languages like Java or C# actually check whether or not a given array has the element one is trying to access.