Is its possible to implement CAST-like analysis with SonarQube? - code-analysis

In my (huge) company we mostly use two tools for code analysis:
Sonar(Qube) - in the development, tightly integrated with CIs, known and loved my developers.
CAST - required by the processes. No continuous measurements, only a couple of times a year, for instance on major releases. CAST analysis is completely decoupled from the development, done by a separate team (we just send the delivery package to analyse).
I'm on the dev side as you may guess, I (somewhat) know Sonar/PMD, but not CAST. In any case I'm not quite happy with the frequency of the CAST analysis, but It is probably not the process I could influence or change.
So I was thinking if it maybe would be possible to implement in Sonar similar rules as in CAST. Surely not all and not everything but at least something that there would be no big surprises from the CAST analysis of releases.
I googled all over, looking for something like "PMD rules for Sonar/PMD" but could not find anything.
My question ist for those who have experience with both Sonar and CAST:
Is it possible to implement CAST analysis rules (or a certain approximation thereof) in Sonar?

I know both tools, CAST and SonarQube.
The answer to your question will be technology dependant: which languages are you using for your developments? Are you using any framework?
Don't tell me just J2EE, because this covers a lot of different languages: Java, JavaScript, JSP, HTML, ... not talking about frameworks (Spring, Hibernate, Struts, ...) and each solution will have different analyzers for these languages with different rules.
The main thing between CAST and SonarQube is that both use lexical analyzis to identify violations to programming best practices, but CAST also identify links between components (reason why it's slower). So CAST will have some metrics like fan-in/fan-out and some additional rules (that they name architectural or structural), like avoid direct access from presentation layer to data layer. Also, it comes with some xml files to analyze framework components.
This kind of rules could represent an additional 20%, but again this is very language dependant. And also of the versions of both tools as the number of rules may be different between releases.
And no, I think that not all of these 'architectural' rules would be possible to implement with SonarQube. However, they are not all very critical so you don't miss so much.
I suppose you company use CAST as some Quality Gate?
In that case, I would recommand to work with the guys using CAST in order to identify which CAST rules are critical for them and could trigger a NoGo or KO. Just post these rules here, and I am sure you will get some good comments about it.
Do not hesitate to ask for further precision.
Regards.

I believe that the set of available rules in SonarQube should cover 99% of the rules you're checking with CAST. If you can't find a valuable rule from CAST in the SonarQube's set, feel free to inquire on stackoverflow.

We use sonarqube in our development envinronment in the company, but in my project the client uses CAST as SQ tool. Almost all CAST rules are covered by sonar rules, for those that are not, we write it using XPATH based on squid java plugin (Sonar 3.8+ if I remember well).
With XPATH we covered the 99% of the CAST rules, we write 30 rules more or less. You can find more information about in this link, there are a tool for test de XPATH rules based on AST representation of a class too (linked in the link below).
http://docs.sonarqube.org/display/SONAR/Extending+Coding+Rules
By the way, our project is based on Java, Oracle Web Center Sites, spring, cxf and JPA/hibernate
Hope helps, cheers

Related

Creating a simple Domain Specific Language

I am curious to learn about creating a domain specific language. For now the domain is quite basic, just have some variables and run some loops, if statements.
Edit :The language will be Non-English based with a very simple syntax .
I am thinking of targeting the Java Virtual Machine, ie compile to Java byte code.
Currently I know how to write some simple grammars using ANTLR.
I know that ANTLR creates a lexer and parser but how do I go forward from here?
about semantic analysis: does it have to be manually written or are there some tools to create it?
how can the output from the lexer and parser be converted to Java byte code?
I know that there are libraries like ASM or BCEL but what is the exact procedure?
are there any frameworks for doing this? And if there is, what is the simplest one?
You should try Xtext, an Eclipse-based DSL toolkit. Version 2 is quite powerful and stable. From its home page you have plenty of resources to get you started, including some video tutorials. Because the Eclipse ecosystem runs around Java, it seems the best choice for you.
You can also try MPS, but this is a projectional editor, and beginners may find it more difficult. It is nevertheless not less powerful than Xtext.
If your goal is to learn as much as possible about compilers, then indeed you have to go the hard way - write an ad hoc parser (no antlr and alike), write your own semantic passes and your own code generation.
Otherwise, you'd better extend an existing extensible language with your DSL, reusing its parser, its semantics and its code generation functionality. For example, you can easily implement an almost arbitrary complex DSL on top of Clojure macros (and Clojure itself is then translated into JVM, you'll get it for free).
A DSL with simple syntax may or may not mean simple semantics.
Simple semantics may or may not mean easy translation to a target language; such translations are "technically easy" only if the DSL and the target languate share a lot of common data types and execution models. (Constraint systems have simple semantics, but translating them to Fortran is really hard!). (You gotta wonder: if translating your DSL is easy, why do you have it?)
If you want to build a DSL (in your case you stick with easy because you are learning), you want DSL compiler infrastructure that has whatever you need in it, including support for difficult translations. "What is needed" to handle translating all DSLs to all possible target languages is clearly an impossibly large set of machinery.
However, there is a lot which is clear that can be helpful:
Strong parsing machinery (who wants to diddle with grammars whose structure is forced
by the weakness of the parsing machinery? (If you don't know what this is, go read about LL(1) grammmars as an example).
Automatic construction of a representation (e.g, an abstract syntax tree) of the parsed DSL
Ability to access/modify/build new ASTs
Ability to capture information about symbols and their meaning (symbol tables)
Ability to build analyses of the AST for the DSL, to support translations that require
informatoin from "far away" in the tree, to influence the translation at a particular point in the tree
Ability to reogranize the AST easily to achieve local optimizations
Ability to consturct/analysis control and dataflow information if the DSL has some procedural aspects, and the code generation requires deep reasoning or optimization
Most of the tools available for "building DSL generators" provide some kind of parsing, perhaps tree building, and then leave you to fill in all the rest. This puts you in the position of having a small, clean DSL but taking forever to implement it. That's not good. You really want all that infrastructure.
Our DMS Software Reengineering Toolkit has all the infrastructure sketched above and more. (It clearly doesn't, and can't have the moon). You can see a complete, all-in-one-"page", simple DSL example that exercises some ineresting parts of this machinery.

Can PMD be customized to fully support a new language?

Can PMD be customized to fully support a new language, in a reasonable amount of time. I mean I know that technically almost anything can be done, but im wondering if this can be done in a reasonable amount of time? E.g. < 2 weeks
This page mentions how to write a CPD parser http://pmd.sourceforge.net/cpd-parser-howto.html
But is this just for copy / paste detection? Does writing a CPD parser give me full support of PMD in terms of rile sets?
I would guess not, but I'm not a PMD expert (and I have my own bias, check my bio).
The issues are:
Can you define a syntax for my langauge quickly (maybe, depending on how good you are, how messy the language is, and the strength of the parsing machinery offered by PMD)
Can you define the semantics of my language so that "semantic checks" provided by PMD work. You have to do this, because syntax tells you (and a tool) literally nothing about semantic of the syntax. I would guess that the PMD tool 'semantic checks' are pretty wired into the precise details of Java; if you language matched java perfectly, this would be zero work. But it doesn't, or you wouldn't be asking the question. And langauge semantics differences, even minor ones, cause discontinuous changes to the interpreation of the code. Before you get to doing even "serious" semantics, you're likely to have to build a symbol table mapping identifiers in the code to declarations (and the "semantic" type) for those symbols. Based on tool infrastructure I work with, this step alone takes 1-2 months for a real language.
Lastly, you are likely to have to code special PMD checks that are specific to your langauge. That takes time and energy, too.
I build generic compiler-type machinery (parsers, flow analyzers, style/error checkers) and get asked the equivalent of this question all the time WRT to our machinery. We try to have a lot of machinery available, try to make it easy to integrate new langauges, and we've been working on trying to make this "convenient and fast" for 15+ years. Its still not convenient, and there's no way to do this with our tools in a few weeks. I doubt PMD is better.

When its enough for a programming language that you need to switch to another?

I have wonder that many big applications (e.g. social websites such as facebook) are build with many languages into its platform.
They usually start with AJAX browser support, then scale down to PHP scripting, then move towards a powrful OOP technologie such as Java or .NET, and finally a primitive language to increase performance in crucial operations such as C.
My question is how should I determinate the edge of the layers between languages. When PHP, when Java, when C and so on. And the other question is if should those languages integrate in a vertcal fashion for simplicity and maintanance, or could it be cases when you decide to program on module of your app in Java and the other in native C.
What are the context variables that push me to move to a better performance language? (e.g. concurrency issues due increase of users)
Don't tell me that PHP overlaps .NET and Java Technologies. In a starter point it does, but when the network is overload you start seeing the diferences. I mean how can I achieve Multithreading in PHP as in Java with the same performance. The thing it's hard to answer my wuestion is becasue there is not so much reading about this. You maybe find some good books covering PHP, but few telling how when and why integrate different languages.
Each language was created for different purposes, Python is strong with string operations, Perl very powerful in batch scripting, PHP a very reliable application web server, C the mother of most popular languages.
Best,
Demian.
On one end of the scale, you move to a higher performance language whenever your profiling and measurements tell you that you have a bottleneck that can't be fixed with better algorithms, data structures, or other optimisation.
At the other end, you move to a higher level language (ie. more abstraction, better libraries) whenever your management allow you to do so. ;)
I believe most teams simply use what they are best familiar with.
There are also questions of licensing that can influence the decision.
That is, if you're talking about technologies that compare to each other and solve the problem on the same level (for example ASP.NET/JSF/JSP/PHP...). But you can't compare .NET with C++ for example, they are meant to solve different problems on different abstraction levels.
My criterion for any programming language is "does it help me to get the job done or does it just get in the way?" If the latter, then it's time to move on.
From an economical point of view the answer is easy: on a regular basis just look what will be cheaper. Either continue with the current technology and maybe stretch the envelope a bit more. Or switch to something new. When you compare the two alternatives the cost of the investment already done is not important anymore since you've already spent that money/effort. You only have to look ahead: cost of licenses, education, etc.
Of course this is easier said then done, but just sitting down with a few people, thinking about it, and maybe try to come up with some numbers already helps a lot. I have seen too many projects that continued with technology that really wasn't suited for the job anymore.
Also hard numbers don't tell the whole story. There will be resistance because of unfamiliar technology, experts who are losing their status, etc.
Identify the bottleneck
Solve bottleneck
Go to 1
I'm sure you can imagine that step 2 is the one where decisions like "What programming language do we use" and "where do we put the coffee machine" come into play. That's the basic rule.

How do you write good highly useful general purpose libraries?

I asked this question about Microsoft .NET Libraries and the complexity of its source code. From what I'm reading, writing general purpose libraries and writing applications can be two different things. When writing libraries, you have to think about the client who could literally be everyone (supposing I release the library for use in the general public).
What kind of practices or theories or techniques are useful when learning to write libraries? Where do you learn to write code like the one in the .NET library? This looks like a "black art" which I don't know too much about.
That's a pretty subjective question, but here's on objective answer. The Framework Design Guidelines book (be sure to get the 2nd edition) is a very good book about how to write effective class libraries. The content is very good and the often dissenting annotations are thought-provoking. Every shop should have a copy of this book available.
You definitely need to watch Josh Bloch in his presentation How to Design a Good API & Why it Matters (1h 9m long). He is a Java guru but library design and object orientation are universal.
One piece of advice often ignored by library authors is to internalize costs. If something is hard to do, the library should do it. Too often I've seen the authors of a library push something hard onto the consumers of the API rather than solving it themselves. Instead, look for the hardest things and make sure the library does them or at least makes them very easy.
I will be paraphrasing from Effective C++ by Scott Meyers, which I have found to be the best advice I got:
Adhere to the principle of least astonishment: strive to provide classes whose operators and functions have a natural syntax and an intuitive semantics. Preserve consistency with the behavior of the built-in types: when in doubt, do as the ints do.
Recognize that anything somebody can do, they will do. They'll throw exceptions, they'll assign objects to themselves, they'll use objects before giving them values, they'll give objects values and never use them, they'll give them huge values, they'll give them tiny values, they'll give them null values. In general, if it will compile, somebody will do it. As a result, make your classes easy to use correctly and hard to use incorrectly. Accept that clients will make mistakes, and design your classes so you can prevent, detect, or correct such errors.
Strive for portable code. It's not much harder to write portable programs than to write unportable ones, and only rarely will the difference in performance be significant enough to justify unportable constructs.
Even programs designed for custom hardware often end up being ported, because stock hardware generally achieves an equivalent level of performance within a few years. Writing portable code allows you to switch platforms easily, to enlarge your client base, and to brag about supporting open systems. It also makes it easier to recover if you bet wrong in the operating system sweepstakes.
Design your code so that when changes are necessary, the impact is localized. Encapsulate as much as you can; make implementation details private.
Edit: I just noticed I very nearly duplicated what cherouvim had posted; sorry about that! But turns out we're linking to different speeches by Bloch, even if the subject is exactly the same. (cherouvim linked to a December 2005 talk, I to January 2007 one.) Well, I'll leave this answer here — you're probably best off by watching both and seeing how his message and way of presenting it has evolved :)
FWIW, I'd like to point to this Google Tech Talk by Joshua Bloch, who is a greatly respected guy in the Java world, and someone who has given speeches and written extensively on API design. (Oh, and designed some exceptionally good general purpose libraries, like the Java Collections Framework!)
Joshua Bloch, Google Tech Talks, January 24, 2007:
"How To Design A Good API and Why it
Matters" (the video is about 1 hour long)
You can also read many of the same ideas in his article Bumper-Sticker API Design (but I still recommend watching the presentation!)
(Seeing you come from the .NET side, I hope you don't let his Java background get in the way too much :-) This really is not Java-specific for the most part.)
Edit: Here's another 1½ minute bit of wisdom by Josh Bloch on why writing libraries is hard, and why it's still worth putting effort in it (economies of scale) — in a response to a question wondering, basically, "how hard can it be". (Part of a presentation about the Google Collections library, which is also totally worth watching, but more Java-centric.)
Krzysztof Cwalina's blog is a good starting place. His book, Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries, is probably the definitive work for .NET library design best practices.
http://blogs.msdn.com/kcwalina/
The number one rule is to treat API design just like UI design: gather information about how your users really use your UI/API, what they find helpful and what gets in their way. Use that information to improve the design. Start with users who can put up with API churn and gradually stabilize the API as it matures.
I wrote a few notes about what I've learned about API design here: http://www.natpryce.com/articles/000732.html
I'd start looking more into design patterns. You'll probably not going to find much use for some of them, but as you get deeper into your library design the patterns will become more applicable. I'd also pick up a copy of NDepend - a great code measuring utility which may help you decouple things better. You can use .NET libraries as an example, but, personally, i don't find them to be great design examples mostly due to their complexities. Also, start looking at some open source projects to see how they're layered and structured.
A couple of separate points:
The .NET Framework isn't a class library. It's a Framework. It's a set of types meant to not only provide functionality, but to be extended by your own code. For instance, it does provide you with the Stream abstract class, and with concrete implementations like the NetworkStream class, but it also provides you the WebRequest class and the means to extend it, so that WebRequest.Create("myschema://host/more") can produce an instance of your own class deriving from WebRequest, which can have its own GetResponse method returning its own class derived from WebResponse, such that calling GetResponseStream will return your own class derived from Stream!
And your callers will not need to know this is going on behind the scenes!
A separate point is that for most developers, creating a reusable library is not, and should not be the goal. The goal should be to write the code necessary to meet requirements. In the process, reusable code may be found. In that case, it should be refactored out into a separate library, where it can be reused in the future.
I go further than that (when permitted). I will usually wait until I find two pieces of code that actually do the same thing, or which overlap. Presumably both pieces of code have passed all their unit tests. I will then factor out the common code into a separate class library and run all the unit tests again. Assuming that they still pass, I've begun the creation of some reusable code that works (since the unit tests still pass).
This is in contrast to a lesson I learned in school, when the result of an entire project was a beautiful reusable library - with no code to reuse it.
(Of course, I'm sure it would have worked if any code had used it...)

Real time scripting language + MS DLR?

For starters I should let you guys know what I'm trying to do. The project I'm working on has a requirement that requires a custom scripting system to be built. This will be used by non-programmers who are using the application and should be as close to natural language as possible. An example would be if the user needs to run a custom simulation and plot the output, the code they would write would need to look like
variable input1 is 10;
variable input2 is 20;
variable value1 is AVERAGE(input1, input2);
variable condition1 is true;
if condition1 then PLOT(value1);
Might not make a lot of sense, but its just an example. AVERAGE and PLOT are functions we'd like to define, they shouldn't be allowed to change them or really even see how they work. Is something like this possible with DLR? If not what other options would we have(start with ANTRL to define the grammar and then move on?)? In the future this may need to run using XBAP and WPF too, so this is also something we need to consider, but haven't seen much if anything on dlr & xbap. Thanks, and hopefully this all makes sense.
Lua is not an option as it is to different from what they are already accustomed to.
Ralf, its going to reactive, and to be honest the timeframe for when the results should get back to the user may be 1/100 of a second all the way up to 2 weeks or a month(very complex mathematical functions).
Basically they already have a system they purchased that does some of what they need, and included a custom scripting language that does what I mentioned above and they don't want to have to learn a new one, they basically just want us to copy it and add functionality. I think I'll just start with ANTRL and go from there.
Lua
it's small, fast, easy to embed, portable, extensible, and fun!
Lua is definitly the best choice for soft real-time system (like computer games).
See http://shootout.alioth.debian.org/ for detailed benchmarks.
However, last time I checked, Lua used a mark-and-sweep garbage collector which can lead to deadline-violation and non-deterministic jitter in real-time systems.
I believe that you could use theoretically use the DLR, but I'm unsure about support in an XBAP (partially trusted?) scenario.
If you host the DLR you would quickly be able to take advantage of IronRuby or IronPython scripting. You would want to look at these implementations when creating your own language implementation. If you post your question to the IronPython mailing list I'm sure you would get a better reply around the XBAP scenario, and some of the developers there created ToyScript.
What kind of real-time requirement are you trying to fulfill? Is the simulation a hard real-time simulation (some kind of hardware-in-the-loop simulation ==> deadline is less than 1/1000 second)?
Or do you want the scripting-system to be "reactive" to user-input ==> 1/10 should be sufficient.
I am no expert regarding MS DLR, but as far as I know, it does not support hard real-time systems. You may want to take a look at the real-time specification for Java (RTSJ)
Firstly I think that defining your own language is not the way to go.
Primarily because the biggest productivity gains you can get for programmers or non-programmers are the development tools. You (and 99.9% of the rest of us) are not going to write tools as good as what is out their.
Language design is hard.
Language support and documentation, also hard
I would recommend looking for a pre-built solution. If you could find a language that can lock down some functionality, that would be a good starting point. MatLab would be the first that comes to my mind.
Lastly, ditch the natural language part, BASIC, COBOL and YA-TDWTF-Lang all tried and failed at it.
Full disclosure: I work for a company that is developing a generalized domain specific language "system". It's targeted at data-in/text-out applications so it's not apropos and it's not yet to beta. The result is I'm somewhat knowledgeable and biased.