What are the benefits of doing static code analysis on your source code? I was playing around with FxCop and I was wondering if there are any benefits beyond making sure you are following coding standards.
There are all kinds of benefits:
If there are anti-patterns in your code, you can be warned about them.
There are certain metrics (such as McCabe's cyclomatic complexity) that tell useful things about source code.
You can also get great stuff like call graphs and class diagrams from static analysis. Those are wonderful if you are attacking a new code base.
Take a look at SourceMonitor.
Many classes of memory leaks and common logic errors can be caught statically as well. You can also look at cyclomatic complexity and such, which may be part of the "coding standards" you mentioned, but may be a separate metric you use to evaluate the algorithmic "cleanliness" of your code.
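To make the cyclomatic complexity point concrete, here is a rough, made-up C# illustration (mine, not the output of any particular tool): for a single-entry, single-exit method, McCabe's metric can be counted as the number of decision points plus one. Tools differ on details, such as whether short-circuit operators or default branches are counted.

    // Illustrative only: counting cyclomatic complexity by hand as decision points + 1.
    // Decision points here: 2 ifs + 1 short-circuit "||" + 1 for + 2 case labels = 6, so CC = 7.
    public static class ComplexityExample
    {
        public static string Classify(int[] values, int threshold)
        {
            if (values == null || values.Length == 0)   // if + "||"
                return "empty";

            int above = 0;
            for (int i = 0; i < values.Length; i++)     // loop condition
            {
                if (values[i] > threshold)              // nested if
                    above++;
            }

            switch (above)
            {
                case 0:  return "none above threshold"; // case
                case 1:  return "one above threshold";  // case
                default: return "several above threshold";
            }
        }
    }

A method like this would usually be considered fine; the metric starts to earn its keep when it flags methods with complexities in the dozens.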
In any case, only a judicious combination of profiling (dynamic or run-time analysis) and static analysis/linting will ensure a consistent, reliable code base. Oh, that, and a little luck ;-)
It's a trade-off. For an individual developer who wants to improve his understanding of the framework and guidelines, I would definitely encourage it. FxCop generates a lot of noise / false positives, but I've also found the following benefits:
it detects bugs (e.g. a warning about an unused argument may indicate you used the wrong argument in the method body); see the sketch after this list.
understanding the guidelines FxCop is following helps you to become a better developer.
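As a hypothetical C# illustration of the first point (the method and names below are invented, not from a real code base): an "unused parameter" warning in the style of FxCop's "review unused parameters" rule often points at a real defect rather than dead code.

    // Hypothetical example: the unused-parameter warning leads you to a genuine bug.
    public static class DiscountCalculator
    {
        public static decimal ApplyDiscount(decimal price, decimal discountRate)
        {
            // BUG: 'discountRate' is never used; a hard-coded rate was left in by mistake.
            // The analyzer flags the unused parameter, which leads you to the real defect.
            return price * (1 - 0.10m);

            // Intended: return price * (1 - discountRate);
        }
    }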
However with a mixed-ability team, FxCop may well generate too many false positives to be useful. Junior developers will have difficulty appreciating whether some of the more esoteric violations thrown up by FxCop should concern them or are just noise.
Bottom line:
If you're developing reusable class libraries, such as an in-house framework, make sure you have good developers and use FxCop.
For everyday application development with mixed-ability teams, it will probably not be practicable.
Actually, FxCop doesn't particularly help you follow a coding standard. What it does help you with is designing a well-thought-out framework/API. It's true that parts of a coding standard (such as casing of public members) will be caught by FxCop, but coding standards aren't the focus.
Coding standards can be checked with StyleCop, which checks the source code rather than the MSIL that FxCop inspects.
It can catch actual bugs, like forgetting to Dispose IDisposables.
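A minimal C# sketch of what that looks like (illustrative code, not the output of a specific rule, though FxCop-style analyzers have rules along the lines of "dispose objects before losing scope"):

    using System.IO;

    // A StreamWriter is IDisposable; if an exception is thrown before Close(),
    // the file handle leaks. An analyzer flags the first version.
    public static class ReportWriter
    {
        public static void WriteFlagged(string path, string line)
        {
            var writer = new StreamWriter(path);   // flagged: may never be disposed
            writer.WriteLine(line);
            writer.Close();                        // skipped if WriteLine throws
        }

        public static void WriteFixed(string path, string line)
        {
            using (var writer = new StreamWriter(path))  // disposed even on exceptions
            {
                writer.WriteLine(line);
            }
        }
    }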
Depends on the rules, but many subtle defects can be avoided, code can be cleaned, potential performance problems can be detected etc.
Put it this way: if it's cheap or free (in both time and financial cost) and doesn't break anything, why not use it?
FxCop
There is a list of all warnings in FxCop. You can see that there are warnings from the following areas:
Design Warnings: warnings that support proper library design as specified by the .NET Framework Design Guidelines.
Globalization Warnings: warnings that support world-ready libraries and applications.
Interoperability Warnings: warnings that support interacting with COM clients.
Naming Warnings: warnings that support adherence to the naming conventions of the .NET Framework Design Guidelines.
Performance Warnings: warnings that support high-performance libraries and applications.
Security Warnings: warnings that support safer libraries and applications.
Depending on your application some of those areas might not be very interesting, but if you e.g. need COM interoperability, the tests can really help you to avoid the pitfalls.
Other tools
Other static checking tools can help you detect bugs like not disposing an IDisposable, memory leaks and other subtle defects. For an extreme case, see the not-yet-released NStatic tool.
NStatic is used to track things such as redundant parameters, expressions that evaluate to constants, infinite loops and many other metrics.
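For illustration only (this is hand-written C#, not NStatic output), these are the kinds of findings being described:

    public static class SuspiciousCode
    {
        // "Redundant parameter": 'unusedFlag' has no effect on the result.
        public static int Sum(int a, int b, bool unusedFlag)
        {
            return a + b;
        }

        public static void Examples(int x)
        {
            // "Expression that evaluates to a constant": (x - x) is always 0,
            // so the condition is always false and the branch is dead code.
            if (x - x != 0)
            {
                System.Console.WriteLine("unreachable");
            }

            // "Infinite loop": the loop variable is never modified inside the loop.
            int i = 0;
            while (i < 10)
            {
                System.Console.WriteLine(x);
                // missing: i++;
            }
        }
    }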
The benefits are that you can automatically find and quantify technical debt within your software application.
I find static code analysis tools indispensable on large enterprise application development where many developers and testers come and go over the life of an application but the code quality still needs to be kept high and the technical debt managed properly.
What are the benefits of doing static code analysis on your source code?
The benefits depend on the type of static code analysis that is performed. Static code analysis can range from simple to sophisticated techniques. For example, generating metrics about your source code to identify error prone code is one technique. Other techniques actively attempt to find bugs in your code. Sophisticated techniques use formal methods to prove that your code is free of bugs.
Therefore the benefit depends on the type of static code analysis being used. If the technique produces metrics (such as code complexity etc.), then a benefit is that these metrics could be used during code review to identify error prone code. If the technique detects bugs, then the benefit is that the developer can identify bugs before unit test. If formal methods based techniques are used to prove that the code does not contain bugs, then the benefit is that this information could be used to prove to the QA department (or certification authorities) that the code is free of certain types of bugs.
A more detailed description of the techniques and benefits can also be found on this page: www.mathworks.com/static-analysis
I'll try to describe the main ones:
Static code analysis identifies defects in the program at an early stage, resulting in a lower cost to fix them.
It can detect flaws in the program's inputs and outputs that cannot be seen through dynamic testing.
It automatically scans uncompiled code and identifies vulnerabilities.
From what I know of dealing with Checkmarx, static code analysis lets you fix multiple vulnerabilities at a single point, which saves the developer a lot of time.
Related
What is the difference between static analysis and dynamic analysis in terms of cyber security?
Static analysis means "read the source code and try to identify failures". For security, static analysis tools try to find security holes in the code, which are then presumably fixed before the code is released for production use.
Dynamic analysis means "watch the actual execution of the application to identify failures (e.g., dereferencing null pointers, array accesses past the end of an array, use of a dynamically allocated block after it has been freed, ...)". Done during application development and debugging, it can find errors which are then presumably fixed before the code is released for production. Done during production execution, it may detect errors the software is about to make, and prevent them (e.g., don't actually do the dereference, report an application error instead), at the price of considerably higher execution costs because of the intrusive nature of dynamic analysis.
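A small hypothetical C# sketch of why the two differ: the defect below only shows up dynamically if some execution actually takes the failing path, whereas a static analyzer can warn about the possible null dereference without running anything.

    // Fails only when callers pass null: name.Trim() throws NullReferenceException.
    // Dynamic analysis sees it only if a run exercises that input; static analysis
    // can warn about the possible dereference without executing the code.
    public static class CustomerFormatter
    {
        public static string Format(string name)
        {
            return "Customer: " + name.Trim();
        }

        public static void Main()
        {
            System.Console.WriteLine(Format("Alice"));  // passing test; the bug stays hidden
            System.Console.WriteLine(Format(null));     // the input that exposes it at run time
        }
    }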
Each has different strengths and weaknesses. Both techniques suffer from the Turing-induced inability to reason about software activities completely. Most of these tools have failings where they miss problems, or report problems that are not real. Usually these tools try to avoid reporting false positives, because people won't use tools that produce lots of such errors. Limiting the false positives tends to limit reporting of real errors too, so you can't be sure that a clean report means "no problems".
Both are types of software testing that look for unintended security vulnerabilities. As such, they are separate from unit or system testing, which is focused on verifying expected outcomes or requirements.
Static analysis (SAST) works at the code level. It scans the code and looks for patterns of known vulnerabilities or poor coding practice, for instance scanning code to discover the use of insecure libraries.
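Another illustration (my own, not the report of any particular scanner): string-concatenated SQL is a classic pattern a SAST tool flags as injection-prone, while the parameterized version is not.

    using System.Data.SqlClient;

    // Illustrative only: the first query builds SQL from untrusted input and gets flagged;
    // the second passes the input as a parameter and does not.
    public static class UserQueries
    {
        public static SqlCommand BuildFlagged(SqlConnection conn, string userName)
        {
            // Flagged: untrusted input concatenated into the query text.
            return new SqlCommand(
                "SELECT * FROM Users WHERE Name = '" + userName + "'", conn);
        }

        public static SqlCommand BuildSafer(SqlConnection conn, string userName)
        {
            var cmd = new SqlCommand("SELECT * FROM Users WHERE Name = @name", conn);
            cmd.Parameters.AddWithValue("@name", userName);
            return cmd;
        }
    }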
Dynamic analysis (DAST) works at the compiled system level. It scans built systems looking for known vulnerabilities. For instance, scanning a web application via its front end to find cross-site scripting vulnerabilities.
Both are generally used during the SDLC, pre-release. SAST tends to sit to the left of DAST and can pick up issues earlier; however, neither is fully effective at picking up all issues, and both are also prone to false positives.
I found several questions about this topic, all of them with lots of references, but I still don't have a clear idea, because most of the references talk about concrete tools and not about the concept of the analysis in general. So I have some questions:
About static analysis:
1. I would like a reference, or a summary, of which techniques are successful and most relevant nowadays.
2. What can they really do about discovering bugs? Can we summarize this, or does it depend on the tool?
About symbolic execution:
1. Where does symbolic execution fit? I guess it depends on the approach, but I would like to know whether it is dynamic analysis, or a mix of static and dynamic analysis, if that can be determined.
I have had trouble differentiating the two techniques in the tools, even though I think I know the theoretical difference.
I'm currently working with C.
Thanks in advance
I'm trying to give a short answer:
Static analysis looks at the syntactic structure of the code and draws conclusions about the program's behavior. These conclusions are not necessarily correct.
A typical example of static analysis is data-flow analysis, where you compute sets like used/read/written for every statement. This helps to find, for example, uninitialized values.
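A small C# sketch (my own, for illustration) of the sets such an analysis computes per statement; the C# compiler's definite-assignment check is itself this kind of data-flow analysis, while in C the equivalent code compiles and a separate lint tool has to warn.

    // def = variables written by the statement, use = variables read by it.
    // If some path reaches a "use" with no prior "def", the analyzer reports a
    // possibly-uninitialized value.
    public static class DataFlowExample
    {
        public static int Scale(int input, bool doubleIt)
        {
            int factor = 1;              // def: {factor}   use: {}
            if (doubleIt)                // def: {}          use: {doubleIt}
            {
                factor = 2;              // def: {factor}   use: {}
            }
            int result = input * factor; // def: {result}   use: {input, factor}
            return result;               // def: {}          use: {result}
            // Removing "= 1" above would leave a path where 'factor' is used before
            // any def, which is exactly what the uninitialized-value check reports.
        }
    }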
You can also analyze the code regarding code-patterns. This way, these tools can be used to check if you are complying to a specific coding standard. A prominent coding standard example is MISRA. This coding standard is used for safety critical systems and avoids problematic constructs in C. This way you can already say a lot about the robustness of your applications against memory leaks, dangling pointers, etc.
Dynamic analysis does not look at the syntax only, but takes state information into account. In symbolic execution, you add assumptions about the possible values of all variables to the statements.
The most expensive and powerful method of dynamic analysis is model checking, where you really look at all possible execution states of the system. You can think of a model-checked system as a system that is tested with 100% coverage - but there are of course a lot of practical problems that prevent real systems from being checked that way.
These methods are very powerful, and you can gain a lot from the static code analysis tools especially when combined with a good coding standard.
A feature my software team found really impressive is e.g. that it will tell you in C++ when a class with virtual methods does not have a virtual destructor. Easy to check in fact, but really helpful.
The commercial tools are very expensive, but worth the money, once you learned how to use them. A typical problem in the beginning is that you will get a lot of false alarms, and don't know where to look for the real problem.
Note that nowadays g++ has some of this stuff already built-in, and that you can use something like pclint which is free.
Sorry - this is already getting quite long...hope it's interesting.
The term "static analysis" means that the analysis does not actually run a code. On the other hand, "dynamic analysis" runs a code and also requires some kinds of real test inputs. That is the definition. Nothing more.
Static analysis employs various formal methods such as abstract interpretation, model checking, and symbolic execution. In general, abstract interpretation or model checking is suitable for software verification. Symbolic execution is more appropriate for the purpose of bug finding.
Symbolic execution is categorized as static analysis. However, there is a hybrid method called concolic execution, which uses both symbolic execution and dynamic testing.
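A tiny illustration (mine, in C#) of what a symbolic executor derives: it treats the input as a symbol, collects a path constraint per branch, and asks a constraint solver for a concrete input that reaches the interesting one.

    // Treating 'x' as a symbol X, the tool explores both branches and collects:
    //   path 1: X * 2 == 42      -> satisfiable with X = 21, so the throw is reachable
    //   path 2: !(X * 2 == 42)   -> everything else
    // An SMT-style solver then produces the concrete input X = 21 that triggers the
    // exception, without the tool ever guessing inputs at random.
    public static class SymbolicTarget
    {
        public static void Check(int x)
        {
            int y = x * 2;
            if (y == 42)
            {
                throw new System.InvalidOperationException("bug reached");
            }
        }
    }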
Added in response to Zane's comment:
Maybe my explanation was a little confusing.
The difference between software verification and bug finding is whether the analysis is sound or not. For example, when we say the buffer overrun analyzer is sound, it means that the analyzer must report all possible buffer overruns. If the analyzer reports nothing, it proves the absence of buffer overruns in the target program. Because model checking is the method that guarantees soundness, it is mostly used for software verification.
On the other hand, symbolic execution, which is actively used by most of today's commercial static analyzers, does not guarantee soundness, since sound analysis inherently issues lots and lots of false positives. For the purpose of bug finding, it is more important to reduce false positives, even if some true positives are also lost.
In summary (see the sketch after this list):
soundness: there are no false negatives
completeness: there are no false positives
software verification: soundness is more important than completeness
bug finding: completeness is more important than soundness
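A small illustrative C# fragment (not from any particular analyzer) showing what the trade-off means in practice:

    public static class BufferAccess
    {
        private static readonly int[] Data = new int[8];

        public static int Read(int index)
        {
            // Whether this can go out of range depends on every caller.
            // A sound analyzer must warn unless it can PROVE 0 <= index < 8 for all callers
            //   (no false negatives, but it may warn even when all real callers are safe).
            // A completeness-oriented bug finder only reports when it finds a concrete
            //   out-of-range path (no false positives, but it may miss a rarely-taken one).
            return Data[index];
        }
    }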
I understand the pros and cons of using object-oriented programming as a concept. What I'm looking for are the pros and cons of using OO in Progress/OpenEdge specifically. Are there challenges that I need to take into account? Are there parts of the language that don't mesh well with OO? Stuff like that.
Edit: using 10.2b
I'll give you my opinion, but be forewarned that I'm probably the biggest Progress hater out there. ;) That said, I have written several medium-sized projects in OOABL so I have some experience in the area. These are some things I wrote, just so you know I'm not talking out of my hat:
STOMP protocol framework for clients and servers
a simple ORM mimicking ActiveRecord
an ABL compiler interface for the organization I was at (backend and frontend)
a library for building up Excel and Word documents (serializing them using the MS Office 2003 XML schemas; none of that silly COM stuff)
an email client that could send emails using multiple strategies
OOABL Pros:
If you absolutely must write Progress code, it is a great option for creating reusable code.
Great way to clean up an existing procedural codebase
OOABL Cons:
Class hierarchies are limited; you can't create inherited (sub-)interfaces in 10.2B (I think this was going to be added in 11). Older versions of OpenEdge have other limitations like the lack of abstract classes. This limits your ability to create a clean OO design and will hurt you when you start building non-trivial things.
Error handling sucks - CATCH/THROW doesn't let you throw your custom errors and force callers to catch them. Backwards compatibility prevents this from evolving further, so I doubt it will ever improve.
Object memory footprint is large, and there are no AVM debugging tools to track down why (gotta love these closed systems!)
Garbage collection didn't exist until 10.2A, and still has some bugs even in 11 (see the official OE forum for some examples)
Network programming (with sockets) is a PITA - you have to run a separate persistent procedure to manage the socket. I think evented programming in OOABL was a PITA in general; I remember getting a lot of errors about "windowed environments" or something to that effect when trying to use them. PUBLISH/SUBSCRIBE didn't work either, if memory serves.
Depending on your environment, code reviews may be difficult, as most Progress developers don't do OOABL and so may not understand your code.
If the above point is true, you may face active resistance from entrenched developers who feel threatened by having to learn new things.
OO is all about building small, reusable pieces that can be combined to make a greater whole. A big problem with OOABL is that the “ABL” part drags you down with its coarse data structures and lack of enumerators, which prevent you from really being able to build truly beautiful things with it. Unfortunately, since it is a closed language you can’t just sidestep the hand you’re dealt and create your own new data or control structures for it externally.
Now, it is theoretically possible to try and build some of these things using MEMPTRs, fixed arrays (EXTENT), and maybe WORK-TABLEs. However, I had attempted this in 10.1C and the design fell apart due to the lack of interface inheritance and abstract classes, and as I expected, performance was quite bad. The latter part may just be due to my poor ability, but I suspect it's an implementation limitation that would be nigh impossible to surmount.
The bottom line is use OOABL if you absolutely must be coding in OpenEdge - it’s better than procedural ABL and the rough edges get slightly smoother after each iterative release of OpenEdge. However, it will never be a beautiful language (OO or otherwise).
If you want to learn proper object-oriented programming and aren’t constricted to ABL, I would highly recommend looking at a language that treats objects as a first-class citizen such as Ruby or Smalltalk.
In the last four years I have worked 80% of the time with OOABL (started with 10.1c).
I definitely recommend using OOABL but I think it is very important to consider that using OOABL the same way as in other OO languages is fraught with problems.
With "the same way" I mean design patterns and implementation practices that are common in the oo world. Also some types of applications, especially in the area of technical frameworks are hard to do with OpenEdge (e.g. ORM).
The causes are performance problems with OOABL and missing OO features in the language.
If you are programming in C# or Java, for example, the memory footprint and instantiation time of objects are not a big issue in many cases. Using ABL, this becomes a big issue much more often.
This leads to other design decisions and prevents the implementation of some patterns and frameworks.
Some missing or bad OO features:
No class library, no data structures needed for OO
No package visibility as in Java (internal in C#)
This becomes relevant especially in larger applications
No generics
Really terrible exception handling
Very limited reflection capabilities (improved in OE 11)
So if you are familiar with OO programming in other languages and start working with OOABL, you may reach a point where you miss a lot of things you expect to be there, and get frustrated when trying to implement them in ABL.
If your application has to run on Windows only, it is also possible to implement new oo code in C# and call it from your existing progress code via clr bridge, which works very smoothly.
Only one thing - "Error handling sucks" - it does suck, but not because you cannot create your own error classes and catch them in the caller's block; that works, and I'm using it. What sucks is the mix of the old NO-ERROR / ERROR-HANDLE options with Progress.Lang.Error / CATCH blocks and ROUTINE-LEVEL ON ERROR UNDO, THROW. That is a big problem when the team has no convention for which error handling will be used and how.
There is an "DO-178B" level A and level B certification for airborne systems. Does it prohibit using of optimizating compilers?
E.g. Some compilers will reorder instructions to get more performance. Does DO-178B lev.A or lev.B prohibits this reordering?
Most modern CPU have such reordering builtin in the hardware. Are they allowed to be used within DO-178B lev.A softare/hardware systems?
First, and critically: For this type of question, if the answer matters, you need to get a formal professional opinion from someone who is competent to provide it, or discuss this with your certification authority. Any reply you will get here should not be relied on.
With that said, I will assume you are asking out of curiosity and will not be relying on the answer in any meaningful way, and I will attempt to answer in that vein. I am not a professional, and this is not professional advice.
The most on-point documentation I could find online with a quick search was this FAA guideline paper about a related topic: http://www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/media/cast-12.pdf. This paper describes the conditions under which one must do verification of the generated object code rather than the source code. In particular, it gives a number of examples that will occur even in non-optimized code -- automatic variable initialization and exception handling are a couple of examples. On compiler optimization, it notes:
Compiler optimization is another area addressed under section 4.4.2a of DO-178B/ED-12B. This involves the analytical determination that the optimization features do not compromise the ability of the test cases to demonstrate requirements-based testing and structural coverage consistent with the software level. This is a separate issue from the traceability and additional verifications issues addressed by Section 4.4.2b. This is outside the scope of this paper.
I do not have a copy of DO-178B handy to read section 4.4.2a, but I would note that (a) there are procedures for handling other cases where the object code does not correspond to the source code in a one-to-one manner, and (b) this pretty strongly implies that compiler optimization is discussed rather than outright prohibited.
It's also pretty clear from a number of the discussions in that paper that the answer to "we can't trace things between the source code and the object code" is to validate the object code in some manner -- in other words, there is a solution other than prohibiting such things.
Thus, I would conclude that at least some compiler optimizations must be permitted.
In particular, the sort of reordering that you describe is quite traceable, and it seems almost certain to me that it would be permitted.
DO-178B is not absolute and is open to interpretation. If you switch off optimisation there are no questions and nothing to explain. By sticking to the most obvious interpretation you avoid having to sell your interpretation to certification authorities later on and opening yourself up to questions about how you did things.
When you optimise your code it is hard to do the source to instruction traceability that is required for level A. In addition if you are using Do-178B getting that extra 5% out of your software is not your greatest concern. The ease of completing all the required certification steps should be your primary concern since that is what is going to be sucking up all your time.
The hardware part of your question is interesting. With software optimisation, the code is not just reordered, it is changed as well. But with hardware, the code is not changed to get higher speed, only the execution order. I will have to ask around to get more info on what the thinking is on this.
I have only superficial knowledge of DO-178B (I do not work day-to-day with it, but I build tools for people who do).
The standard takes traceability very seriously. High-level requirements are refined into low-level requirements, which are implemented by the source code, which is compiled by the compiler. At each of these steps, one must be able to justify what was done in terms of the specifications produced by the previous step.
For the compiler, this means that one must be able to read the assembly and trace one particular instruction to the source code statement that caused this instruction to be generated.
So, in short, yes, I think this prohibits most optimizations.
Concerning the hardware this software is run on, it is verified differently (but I guess just as stringently). The relevant standard is then DO-254, and I do not know anything about it.
With optimization, you need to verify the generated code at the object assembly language level. There are compiler suites and libraries for embedded real-time multitasking that have been previously verified in other projects, giving you a comfort level that they can be verified again - but you still need to verify the code used in your application.
To avoid delays and having to explain things, just turn off optimizations and caches. This makes the code deterministic. Also, try not to use GCC if possible, and go for a qualified compiler such as IAR or DDCI or Irvine Compilers or something. Instead of trying to bang in the screw with a fancy hammer, get a screwdriver that fits the screw. Because when that plane crashes with 200 people on board, with mothers, fathers and children, and they find out that the compiler reordered code and that caused the failure, you will wish that you had only used the right screwdriver.
What are the possible and critical disadvantages of Aspect-Oriented Programming?
For example: cryptic debugging for newbies (readability impact)
I think the biggest problem is that nobody knows how to define the semantics of an aspect, or how to declare join points non-procedurally.
If you can't define what an aspect does independently of the context in which it will be embedded, or define the effects that it has in such a way that it doesn't damage the context in which it is embedded, you (and tools) can't reason reliably about what it does. (You'll note the most common example of aspects is "logging", which is defined as "write some stuff to a log stream the application doesn't know anything about", because this is pretty safe). This violates David Parnas' key notion of information hiding. One of the worst examples of aspects that I see are ones that insert synchronization primitives into code; this affects the sequence of possible interactions that the code can have. How can you possibly know this is safe (won't deadlock? won't livelock? won't fail to protect? is recoverable in the face of an exception thrown by a synchronization partner?) unless the application only does trivial things already.
Join points are now typically defined by providing some kind of identifier wildcarding (e.g., "put this aspect into any method named DataBaseAccess*"). For this to work, the folks writing the affected methods have to signal the intention for their code to be aspectized by naming their code in a funny way; that's hardly modular. Worse, why should the victim of an aspect even have to know that it exists? And consider what happens if you simply rename some methods: the aspect is no longer injected where it is needed, and your application breaks. What is needed are join point specifications that are intentional; somehow, the aspect has to know where it is needed without the programmers placing a neon sign at each usage point. (AspectJ has some control-flow related join points which seem a bit better in this regard.)
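To make the fragility concrete, here is a minimal hand-rolled C# sketch (plain reflection, not AspectJ or any real AOP framework; all names are invented) of name-wildcard matching and how a rename silently defeats it:

    using System;
    using System.Linq;
    using System.Reflection;

    public static class NameBasedWeaver
    {
        // "Join point specification": every public static method whose name
        // starts with "DataBaseAccess".
        public static MethodInfo[] MatchJoinPoints(Type type)
        {
            return type.GetMethods(BindingFlags.Public | BindingFlags.Static)
                       .Where(m => m.Name.StartsWith("DataBaseAccess", StringComparison.Ordinal))
                       .ToArray();
        }
    }

    public static class Repository
    {
        public static void DataBaseAccessLoad() { Console.WriteLine("load"); }

        // Renamed from DataBaseAccessSave to SaveRecord: still compiles, still runs,
        // but it silently falls out of the "aspect" - no error, no warning.
        public static void SaveRecord() { Console.WriteLine("save"); }
    }

    public static class Demo
    {
        public static void Main()
        {
            foreach (var m in NameBasedWeaver.MatchJoinPoints(typeof(Repository)))
            {
                Console.WriteLine("advice would be applied to: " + m.Name);  // only ...Load
            }
        }
    }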
So aspects are an interesting idea, but I think they are technologically immature. And that immaturity makes their usage fragile. And that's where the problem lies.
(I'm a big fan of automated software engineering tools [see my bio] just not this way).
Poor toolchain support - debuggers, profilers etc may not know about the AOP and so may work on code as if all the aspects had been replaced by procedural code
Code bloat - small source can lead to much larger object code as code is "weaved" throughout the code base
I think the biggest disadvantage is simply using AOP well. People use it in places where it doesn't make sense, for example, and don't use it where it does.
For example, a factory pattern is obviously something that AOP can do better, though DI can also do it well; the observer pattern is simpler when using AOP, as is the strategy pattern.
It will be harder to unit test, especially if you do the weaving at runtime.
If you weave at runtime, you also take a performance hit.
Having a good way to model what is going on when mixing AOP with classes is a problem, as I don't think UML is a good model at that point.
Unless you are using Eclipse, the tools do have problems; with Eclipse and AJDT, AOP is much easier.
We still use JUnit and NUnit, for example, and so have to modify our code to allow unit tests to run. Using privileged mode, AOP could do better unit tests by testing private methods as well, so we wouldn't have to change our programs just to make them work with unit testing. This is another example of not really understanding how AOP can be helpful: we are still chained in many ways to the past with unit-test frameworks and current design pattern implementations, and don't see how AOP could help us write better code.
Maintenance and debugging. With AOP, you suddenly have code that is being run at a given point (method entry, exit, whatever), but just looking at the code gives you no clue that it's even getting called, especially if the AOP configuration is in another file, like an XML config. If the advice causes some changes, then while debugging an application, things may look strange with no explanation. This doesn't affect only newbies.
I would not call it a critical disadvantage, but the biggest problem I have seen is one of developer experience and ability to adapt. Not all developers yet understand the difference between declarative and imperative programming.
We make use of the Policy Injection Application Block in EntLib 4.1 pretty extensively, as well as Unity for DI, and it's just not something that sinks in quickly for some people. I find myself explaining over and over again to the same people why the application is not behaving in the way they expect. It generally starts off with them explaining something and me saying "see that declaration above the method." :) Some people get it right away, love it and become extremely productive -- others struggle.
A learning curve is not unique to AOP, but AOP seems to have a higher learning curve than other things that your average developer encounters.
With regard to the maintenance/debugging argument, aspect-oriented programming tends to go hand-in-hand with all the other aspects of agile software-development practices.
These practices tend to remove debugging from the picture, replacing it with unit testing and test-driven development.
In addition, it may be much easier to maintain a small, clear code footprint with advice than a large, incomprehensible code footprint without advice (the advice being the thing that transforms a large, incomprehensible code footprint into a small, clear code footprint).
Because of the power of AOP, if there is a bug in your crosscutting code, it can cause widespread problems. On the other hand, someone might change the join points in a program (e.g., by renaming or moving methods) in ways that the aspect writer did not expect, with unintended consequences. One advantage of modularizing crosscutting concerns is that it enables a single programmer to affect the entire system easily; that same power is what makes the bugs widespread.