How to write a static code analyzer for .NET - VB.NET

I am interested in writing a static code analyzer for VB.NET to check whether code conforms to my company's standard coding guidelines. Please advise where I should start.

Rather than writing your own static code analyzer, I recommend you use FxCop and write custom FxCop rules for your needs. It'll save you a lot of time.
http://www.binarycoder.net/fxcop/

I would suggest you use Mono's Gendarme. It's a very nice tool, with plenty of built in rules. It also generates nice HTML reports.

If you need more architectural insight, use NDepend. This tool never ceases to amaze me. It can do so much more than FxCop. It's commercial, though it has a free trial version.

FxCop is a good start for coding problems/mistakes, and StyleCop is good for coding style (obviously). If neither of those works, you can either write a parser yourself or use the VBCodeProvider class in the .NET Framework.

Start with FxCop. If you can't do what you're trying there, try something like NStatic or NDepend.

The best options are to use FxCop or StyleCop and write custom rules if necessary.

Use FxCop; this isn't a project you want to undertake personally. The parsing/lexical rules involved and the possible catches would be insane. The only way I could imagine doing it while retaining a modicum of sanity would be to use Lisp, thanks to its extreme expressiveness, but again, best to use FxCop.
If you must write a custom in-house tool for some (dogmatic?) reason, I'd recommend writing a Lisp program that does only basic rules-checking. Don't try to make it comprehensive; we're talking about the kind of frontier that AI researchers are dealing with in terms of the parsing capabilities of a piece of software.
Just use Lisp to find the obvious offenders, or whatever non-compliant code it turns out to be good at catching, then subject the results to a brief human eye scan. I highly recommend abusing macros if you do use Lisp to write the parser.
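To make the "basic rules-checking only" idea concrete, here is a minimal sketch in Python rather than Lisp. It assumes two made-up company rules (a 120-character line limit and camelCase names for locals declared with Dim) and checks them line by line with regular expressions. It does not parse VB.NET, so it will miss plenty and its findings still need the human eye scan described above.

```python
import re

# Hypothetical company rules; the limit and the naming rule are assumptions.
MAX_LINE_LENGTH = 120
DIM_PATTERN = re.compile(r"^\s*Dim\s+([A-Za-z_]\w*)", re.IGNORECASE)
CAMEL_CASE = re.compile(r"^[a-z][A-Za-z0-9]*$")


def check_source(text):
    """Return a list of (line_number, message) pairs for obvious offenders."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            findings.append(
                (number, "line longer than %d characters" % MAX_LINE_LENGTH))
        if line.lstrip().startswith("'"):
            continue  # skip full-line VB comments
        match = DIM_PATTERN.match(line)
        if match and not CAMEL_CASE.match(match.group(1)):
            findings.append(
                (number, "local '%s' is not camelCase" % match.group(1)))
    return findings


# Made-up sample input for illustration.
SAMPLE = """Sub Main()
    Dim Total_Count As Integer = 0
    Dim itemName As String = "x"
    ' Dim Bad_Name in a comment is ignored
End Sub"""

for number, message in check_source(SAMPLE):
    print("line %d: %s" % (number, message))
```

Anything beyond this kind of surface check (scoping, types, control flow) is where an existing tool such as FxCop earns its keep.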

I agree with one of the posters that it would be quite a difficult task, but rather than Lisp I'd start with F#, just like Microsoft did for their third-party Windows driver analysis tool:
http://arstechnica.com/journals/microsoft.ars/2005/11/10/1796
F# shares Lisp's expressiveness (ok, almost) and works on CLR just like VB.NET, which would make the whole thing easier.

Related

interpreting a script through F#

I really like F# but I feel like it's not succinct and short enough. I want to go further. I do have an idea of how I'd like to improve it but I have no experience in making compilers so I thought I'd make it a scripting language. Then I realized that I could make it a scripting language and interpret it using F# but still get pretty much 100% performance thanks to F# having the inline option. Am I right? Is it really possible to make a script interpreter in F# that would go through my script and turn it into lots of functors and stuff and so get really good performance?
I really like F# but I feel like it's not succinct and short enough. I want to go further. I do have an idea of how I'd like to improve it but I have no experience in making compilers so I thought I'd make it a scripting language.
F# supports scripting scenarios via F# Interactive, so I'd recommend considering an internal DSL first, or suggesting features on the F# Language UserVoice page.
Then I realized that I could make it a scripting language and interpret it using F# but still get pretty much 100% performance thanks to F# having the inline option. Am I right?
Depending on the scenario, interpreted code may be fast enough. For example, if 99% of your application's time is spent waiting on the network, a database or graphics rendering, the overall cost of interpreting the code may be negligible. This is less true for compute-bound operations. F#'s inline functions can help with performance tuning but are unlikely to be a global panacea.
Is it really possible to make a script interpreter in F#
As a starting point, it is possible to write an interpreter for vanilla F# code. You could for example use F#'s quotation mechanism to get an abstract syntax tree (AST) for a code fragment or entire module and then evaluate it. Here's a small F# snippet that evaluates a small subset of F# code quotations: http://fssnip.net/h1
Alternatively you could design your own language from scratch...
Is it really possible to make a script interpreter in F# that would go through my script and turn it into lots of functors and stuff and so get really good performance?
Yes, you could design your own scripting language, defining an AST using the F# type system, then writing a parser that transforms script code into the AST representation, and finally interpreting the AST.
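The three stages just described (an AST type, a parser that builds it, and an interpreter that walks it) can be sketched roughly as follows. This is an illustration in Python, not F#, for a made-up arithmetic-only language; the names Num, BinOp, tokenize, parse and evaluate are hypothetical, and the hand-written recursive-descent parser stands in for what a library like FParsec would do.

```python
import operator
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: float


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]  # the AST: a number or a binary operation

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(source):
    """Split script text into number and operator tokens."""
    tokens, pos = [], 0
    source = source.strip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Recursive-descent parser that turns tokens into an AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def factor():  # factor := NUMBER | '(' expr ')'
        token = take()
        if token == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in OPS or token == ")":
            raise SyntaxError("unexpected token %r" % token)
        return Num(float(token))

    def term():  # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            node = BinOp(take(), node, factor())
        return node

    def expr():  # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            node = BinOp(take(), node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token %r" % peek())
    return tree


def evaluate(node):
    """Interpret the AST by walking it recursively."""
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))


print(evaluate(parse(tokenize("1 + 2 * (3 - 1)"))))  # prints 5.0
```

Keeping the AST separate from the parser means the interpreter could later be swapped for a compiler without touching the front end.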
Parser
There are a number of options for parsing including:
active patterns & regex, for example evaluating cells in a spreadsheet
FsLex & FsYacc, for example to parse SQL
FParsec, a parser combinator library, for example to parse Small Basic
I'd recommend starting with FParsec, it's got a good tutorial, plenty of samples and gives basic error messages for free based on your code.
Small Examples
Here are a few simple example interpreters using FParsec to get you started:
Turtle - http://fssnip.net/nM
Minimal Logo language - http://fssnip.net/nN
Small Basic - http://fssnip.net/le
Fun Basic
A while back I wrote my own simple programming language with F#, based on Microsoft's Small Basic with interesting extensions like support for tuples and pattern matching. It's called Fun Basic, has an IDE with code completion and is available free on the Windows Store. The Windows Store version is interpreted (due to restrictions on emitting code) and the performance is adequate. There is also a compiler version for the desktop which runs on Windows, Mac and Linux.
Is it really possible to make a script interpreter in F#
So I guess the answer is YES. If you'd like to learn more, there's a free recording of a talk I did at NDC London last year on how to Write Your Own Compiler in 24 Hours.
I'd also recommend picking up Peter Sestoft's Programming Language Concepts book which has a chapter on building your own functional language.

How do you write good highly useful general purpose libraries?

I asked this question about Microsoft .NET Libraries and the complexity of its source code. From what I'm reading, writing general purpose libraries and writing applications can be two different things. When writing libraries, you have to think about the client who could literally be everyone (supposing I release the library for use in the general public).
What kind of practices or theories or techniques are useful when learning to write libraries? Where do you learn to write code like the one in the .NET library? This looks like a "black art" which I don't know too much about.
That's a pretty subjective question, but here's an objective answer. The Framework Design Guidelines book (be sure to get the 2nd edition) is a very good book about how to write effective class libraries. The content is very good and the often dissenting annotations are thought-provoking. Every shop should have a copy of this book available.
You definitely need to watch Josh Bloch in his presentation How to Design a Good API & Why it Matters (1h 9m long). He is a Java guru but library design and object orientation are universal.
One piece of advice often ignored by library authors is to internalize costs. If something is hard to do, the library should do it. Too often I've seen the authors of a library push something hard onto the consumers of the API rather than solving it themselves. Instead, look for the hardest things and make sure the library does them or at least makes them very easy.
I will be paraphrasing from Effective C++ by Scott Meyers, which I have found to be the best advice I got:
Adhere to the principle of least astonishment: strive to provide classes whose operators and functions have a natural syntax and an intuitive semantics. Preserve consistency with the behavior of the built-in types: when in doubt, do as the ints do.
Recognize that anything somebody can do, they will do. They'll throw exceptions, they'll assign objects to themselves, they'll use objects before giving them values, they'll give objects values and never use them, they'll give them huge values, they'll give them tiny values, they'll give them null values. In general, if it will compile, somebody will do it. As a result, make your classes easy to use correctly and hard to use incorrectly. Accept that clients will make mistakes, and design your classes so you can prevent, detect, or correct such errors.
Strive for portable code. It's not much harder to write portable programs than to write unportable ones, and only rarely will the difference in performance be significant enough to justify unportable constructs.
Even programs designed for custom hardware often end up being ported, because stock hardware generally achieves an equivalent level of performance within a few years. Writing portable code allows you to switch platforms easily, to enlarge your client base, and to brag about supporting open systems. It also makes it easier to recover if you bet wrong in the operating system sweepstakes.
Design your code so that when changes are necessary, the impact is localized. Encapsulate as much as you can; make implementation details private.
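As a small illustration of "easy to use correctly and hard to use incorrectly", here is a sketch in Python (not C++, and not taken from Meyers). The Percentage class is a made-up example: it rejects bad values at construction, so an invalid instance can never exist, and it is immutable, so callers cannot corrupt it later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned
class Percentage:
    """Hypothetical value type that is always between 0 and 100."""
    value: float

    def __post_init__(self):
        # Detect misuse at the earliest possible point.
        is_number = isinstance(self.value, (int, float))
        if isinstance(self.value, bool) or not is_number:
            raise TypeError("value must be a number")
        if not 0 <= self.value <= 100:
            raise ValueError("value must be between 0 and 100")

    def of(self, amount):
        """Return this percentage of the given amount."""
        return amount * self.value / 100


discount = Percentage(25)
print(discount.of(200))  # prints 50.0
```

Because validation lives in one place, every function that accepts a Percentage can skip its own range checks.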
Edit: I just noticed I very nearly duplicated what cherouvim had posted; sorry about that! But turns out we're linking to different speeches by Bloch, even if the subject is exactly the same. (cherouvim linked to a December 2005 talk, I to January 2007 one.) Well, I'll leave this answer here — you're probably best off by watching both and seeing how his message and way of presenting it has evolved :)
FWIW, I'd like to point to this Google Tech Talk by Joshua Bloch, who is a greatly respected guy in the Java world, and someone who has given speeches and written extensively on API design. (Oh, and designed some exceptionally good general purpose libraries, like the Java Collections Framework!)
Joshua Bloch, Google Tech Talks, January 24, 2007: "How To Design A Good API and Why it Matters" (the video is about 1 hour long)
You can also read many of the same ideas in his article Bumper-Sticker API Design (but I still recommend watching the presentation!)
(Seeing you come from the .NET side, I hope you don't let his Java background get in the way too much :-) This really is not Java-specific for the most part.)
Edit: Here's another 1½ minute bit of wisdom by Josh Bloch on why writing libraries is hard, and why it's still worth putting effort in it (economies of scale) — in a response to a question wondering, basically, "how hard can it be". (Part of a presentation about the Google Collections library, which is also totally worth watching, but more Java-centric.)
Krzysztof Cwalina's blog is a good starting place. His book, Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries, is probably the definitive work for .NET library design best practices.
http://blogs.msdn.com/kcwalina/
The number one rule is to treat API design just like UI design: gather information about how your users really use your UI/API, what they find helpful and what gets in their way. Use that information to improve the design. Start with users who can put up with API churn and gradually stabilize the API as it matures.
I wrote a few notes about what I've learned about API design here: http://www.natpryce.com/articles/000732.html
I'd start looking more into design patterns. You're probably not going to find much use for some of them, but as you get deeper into your library design the patterns will become more applicable. I'd also pick up a copy of NDepend, a great code measuring utility which may help you decouple things better. You can use the .NET libraries as an example but, personally, I don't find them to be great design examples, mostly due to their complexity. Also, start looking at some open source projects to see how they're layered and structured.
A couple of separate points:
The .NET Framework isn't a class library. It's a Framework. It's a set of types meant to not only provide functionality, but to be extended by your own code. For instance, it does provide you with the Stream abstract class, and with concrete implementations like the NetworkStream class, but it also provides you the WebRequest class and the means to extend it, so that WebRequest.Create("myschema://host/more") can produce an instance of your own class deriving from WebRequest, which can have its own GetResponse method returning its own class derived from WebResponse, such that calling GetResponseStream will return your own class derived from Stream!
And your callers will not need to know this is going on behind the scenes!
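The extension mechanism described here, a factory on the base type that hands back whichever subclass was registered for a URL prefix, can be sketched as follows. This is a Python illustration of the pattern, not the .NET API; Request, register_prefix, create and MySchemaRequest are hypothetical names chosen for the example.

```python
class Request:
    """Base type that callers program against."""

    _creators = {}  # maps a URL prefix to the class that handles it

    def __init__(self, url):
        self.url = url

    @classmethod
    def register_prefix(cls, prefix, creator):
        """Let extension code plug in a handler for its own scheme."""
        cls._creators[prefix] = creator

    @classmethod
    def create(cls, url):
        """Return an instance of whichever class handles this URL."""
        # Try longer prefixes first so the most specific handler wins.
        for prefix in sorted(cls._creators, key=len, reverse=True):
            if url.startswith(prefix):
                return cls._creators[prefix](url)
        raise ValueError("no handler registered for %r" % url)

    def get_response(self):
        raise NotImplementedError


class MySchemaRequest(Request):
    """Extension supplied by user code, unknown to the framework."""

    def get_response(self):
        return "custom response for " + self.url


Request.register_prefix("myschema://", MySchemaRequest)

# The caller only ever mentions the base type.
request = Request.create("myschema://host/more")
print(type(request).__name__, request.get_response())
```

The caller's code stays the same no matter how many schemes are registered later, which is the framework quality the answer is pointing at.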
A separate point is that for most developers, creating a reusable library is not, and should not be the goal. The goal should be to write the code necessary to meet requirements. In the process, reusable code may be found. In that case, it should be refactored out into a separate library, where it can be reused in the future.
I go further than that (when permitted). I will usually wait until I find two pieces of code that actually do the same thing, or which overlap. Presumably both pieces of code have passed all their unit tests. I will then factor out the common code into a separate class library and run all the unit tests again. Assuming that they still pass, I've begun the creation of some reusable code that works (since the unit tests still pass).
This is in contrast to a lesson I learned in school, when the result of an entire project was a beautiful reusable library - with no code to reuse it.
(Of course, I'm sure it would have worked if any code had used it...)

What should I use the "My" namespace for in VB .NET?

I'm considering building a framework for VB.NET, and using the My namespace to plug it into VB seems like a reasonable idea. What is "My" used for?
The purpose of My, as I understand it, is to be an easy shortcut to certain API tasks that are common but hard-to-find or hard-to-use. You probably shouldn't completely subsume your framework under My. (For one thing, C# people using your framework may get grouchy.)
Instead, you should design it as a normal framework. When you're finished, make a list of some common tasks that people might want to use your framework for. See whether any of those could be useful to have under My, especially where there are classes or methods that can be used in a number of ways, but they have one or two really common usages that can be abbreviated with My.
This article shows how to extend My, and it has a section at the end that describes a few design guidelines to follow: Simplify Common Tasks by Customizing the My Namespace
As to your main question, when coding in VB .NET, I use My as often as I can. It reduces a number of operations to one line of code.
I really like the "My" Namespace in VB.NET and I always use it in my WindowsForms applications, because it is very intuitive.
I use primarily these categories:
My.Computer: primarily for file system and network purposes
My.Application: Version number, current directory
My.Resources: Access to resources used by the application residing in resource files in a strongly typed manner.
My.Settings: very handy
I think, if your extensions for My of your framework fit well, then many VB.NET programmers would appreciate them.
I've used My in my VB.NET projects, and I don't feel guilty about it. I am primarily a C# guy, but until I transitioned my company to C#, we were a VB shop. In my mind, the My namespace is a nice piece of syntactic sugar. Just as I'm not embarrassed to use C#'s coalesce operator and other sugar, I'm not embarrassed to use VB's sugar, either. (To an extent; I won't use the classic VB functions that .NET still exposes.)
That said, never put anything in that namespace. It's Microsoft's namespace, and just as you wouldn't put anything under System nor Microsoft, don't put anything under My. It will cause confusion later on -- if not for you, then for others who maintain your code. Create your own namespace for your own code.
We do use it in some code, but hesitantly so. It's true that My often helps make code more readable. For example, the Environment.SpecialFolder enumeration curiously lacks a Temp member, whereas My.Computer.FileSystem.SpecialDirectories has one (Path.GetTempPath() will do as well, but is hardly intuitive compared to other special folders).
But My is only beneficial in such cases because the existing APIs are badly-designed, not because My is inherently better. Like JAGregory, I strongly suggest one avoids extending My — or any other kind of global namespace, variable, etc. — whenever possible. The idea just doesn't fit a clean OOP architecture.
I never use the My namespace (I'm a C# developer), and my VB co-worker doesn't either. I find the My members unnecessary because, in many cases, they're counter-intuitive to me; e.g. in my opinion opening a file has something to do with IO (hence System.IO.File) and not with my computer (My.Computer.FileSystem). They always seem so scattered and bunched together.
It's just some re-roll of functionality that is already available otherwise, from all languages. And I don't like depending on Microsoft.VisualBasic.dll when I'm developing for .NET - I always prefer System.*.
And then, it's always kind of limited. I see VB developers struggle with their app when they can't find something in the My namespace, because they can't imagine that you can use something in the System namespace. That of course is not a problem of the My namespace itself.
I mainly use C# and Boo, but when I do use VB.NET I use the My namespace quite often. I don't see any reason not to simplify coding. The code still retains its readability.
I've only used it from a user perspective, I've never plugged anything into it. I consider the My namespace to be some highly reliable, platform-provided, global helper mechanisms. Officially sanctioned shortcuts, really. I might be surprised to see external user or third-party code in there.
As such, I'd encourage a vb framework to define its own appropriately-named namespace instead of latching on to the existing My namespace. Such a framework shouldn't have that "global" feel to it.
Never used it so far, although I've never actually looked into it either.
I wouldn't advise putting anything into the My namespace yourself, it's much more clear just to lay it out like you would if it were a non-VB framework.
Love the My! Anything that helps me get the job done faster, and provides code for solutions that I don't have to write, the better!
I use My.Settings and My.Computer often while programming in VB.NET. I particularly enjoy My.Settings as an alternative to using ConfigurationManager.AppSettings when it is appropriate.
I agree with John Rudy about the use of My. It is syntactic sugar that makes life a little more readable.
I don't use it a lot.
I'm considering building a framework for VB.NET, and using the My namespace to plug it into VB seems like a reasonable idea. Is it?
If it fits, by all means, use it. Since you didn't offer any further information about your framework it's hard to say. I wouldn't put general-purpose stuff into the My namespace (such as the My.Computer stuff) because there isn't really any advantage to putting it there. However, application-centered helpers fit in well.

Should I choose scripting or compiled code for small tasks?

I'm a Java programmer, and I like my compiler, static analysis tools and unit testing frameworks as tools that help me quickly deliver robust and efficient code. The JRE is pretty much everywhere I would work, too.
Given that situation, I can't see a reason why I would ever choose to use shell scripting, vb scripting etc, no matter how small the task is if I wear one of my other hats like my cool black sysadmin fedora.
I don't wear the other hats too often, under what circumstances should I choose scripting over writing compiled code?
Whatever you think will be most efficient for you!
I had a co-worker who seemed to use a different language for every task; Perl for quick text processing, PHP for small internal web applications, .NET for our main product, cygwin for filesystem stuff. He preferred to use the technology which was most specific to the task at hand.
Personally, I find that context switching between technologies is painful. My day-to-day work is in .NET, so that's pretty much the terms I think in. For most tasks I find it more efficient to knock something up in C# using SnippetCompiler than I would to hack around in PowerShell or a scripting environment.
If you are comfortable with Java, and the JRE is everywhere you work, then I would say keep using it. There are, however, languages like perl and python that are particularly suited to quickly solving problems. I would suggest learning either perl or python, and then use your judgement on when to use it.
If I have a small problem that I'd like to solve quickly, I tend to use a scripting language. The code tax is smaller, and, for me at least, the result comes faster.
I would say where it makes sense. If it's going to take you longer to open up your IDE, compile the program, etc. than it would to edit a script file and be done with it, then use a script file. If you're not going to be changing the thing often and are quicker at Java coding, then go that route :)
It is usually quicker to write scripts than compiled programmes. You don't have to worry so much about portability between different platforms and environments. A shell script will run pretty much everywhere on most platforms. Because you're a Java developer and you mention that you have Java everywhere, you might look at Groovy (http://groovy.codehaus.org/). It is a scripting language written in Java with the ability to use Java libraries.
The way I see it (others disagree) all your code needs to be maintainable. The smallest useful collection of code is that which a single person maintains. Even that benefits from the language and tools you mentioned.
However, there may obviously be tasks where specialised languages are more advantageous than a single general purpose language.
If you can write it quicker in Java, then go for it.
Just try and be aware of what the various scripting languages can do.
e.g. Don't make a full blown Java app when you can do the same with a bash one-liner.
Weigh the importance of the tool against popping open a text editor for a quick edit vs. opening IDE, recompiling, redeploying, etc.
Of course, the prime directive should be to "use whatever you're comfortable with." If Java is getting the job done right and on time, stick to it. But a lot of the scripting languages could save you some time because they're attuned to different problems. If you're using regular expressions, the scripting languages are a good fit. If you're dropping into shell commands, scripts are nice.
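As a rough illustration of the regular-expression case, here is the sort of throwaway script that has very little ceremony in a scripting language. It is a sketch in Python over made-up log lines, and it counts how often each log level appears.

```python
import re
from collections import Counter

# Made-up sample data; a real script would read a file or stdin instead.
LOG = """2024-01-01 ERROR disk full
2024-01-01 INFO started
2024-01-02 ERROR disk full
2024-01-02 WARN slow response"""

# Capture the second whitespace-separated field on every line.
levels = Counter(re.findall(r"^\S+ (\w+)", LOG, flags=re.MULTILINE))
print(levels.most_common())
```

There is no class, no build step and no project file, which is the time saving the answers here are describing.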
I tend to use Ruby scripts whenever I'm writing something that's small, because it's quick to write, easy to maintain, and (with Gems) easy to bolt on additional functionality without needing to use JARs or anything. Your mileage will, of course, vary.
At the end of the day this is a question that only you can answer for yourself. Given that you said "I can't see a reason why I would ever choose to use shell scripting, ...", it's probably the case that you should never choose it right now.
But if I were you I would pick a scripting language like python, ruby or perl and start trying to solve some of these small problems with this language. Over time you will start to get a feel for when it is more appropriate to write a quick script than build a full-blown solution.
I use scripting languages for writing programs which are not expected to be maintained beyond a few executions. Most of these languages are light on boilerplate syntax and have a REPL. Both of these features enable rapid prototyping.
Since you already know Java, you can try JVM languages like Groovy, JRuby, BeanShell etc. Scala has much lighter syntax than Java, has a REPL, is statically typed and runs on the JVM - you might give that a shot as well.

Which scripting language to support in an existing codebase?

I'm looking at adding scripting functionality to an existing codebase and am weighing up the pros/cons of various packages. Lua is probably the most obvious choice, but I was wondering if people have any other suggestions based on their experience.
Scripts will be triggered upon certain events and may stay resident for a period of time. For example upon startup a script may define several options which the program presents to the user as a number of buttons. Upon selecting one of these buttons the program will notify the script where further events may occur.
These are the only real requirements:
Must be a cross-platform library that is compilable from source
Scripts must be able to call registered code-side functions
Code must be able to call script-side functions
Be used within a C/C++ codebase.
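The two-way calling requirement and the button scenario above can be sketched as follows. This is a Python-only illustration of the shape of the interaction, with Python's built-in exec standing in for an embedded script engine; in a real C/C++ host the registration would go through the chosen engine's binding API. Host, add_button, log, on_startup and on_button are hypothetical names, and exec is not a sandbox.

```python
# Script text as a user might write it (hypothetical script-side API).
SCRIPT = '''
def on_startup():
    add_button("Open")
    add_button("Quit")

def on_button(label):
    log("script saw button: " + label)
'''


class Host:
    """Stand-in for the application that embeds the script engine."""

    def __init__(self):
        self.buttons = []
        self.messages = []
        # Code-side functions registered for scripts to call.
        self.script_globals = {
            "add_button": self.buttons.append,
            "log": self.messages.append,
        }

    def load(self, source):
        """Run the script once so that it defines its handlers."""
        exec(source, self.script_globals)

    def fire(self, event, *args):
        """Call a script-side function if the script defined one."""
        handler = self.script_globals.get(event)
        if callable(handler):
            handler(*args)


host = Host()
host.load(SCRIPT)
host.fire("on_startup")          # the script registers its buttons
host.fire("on_button", "Open")   # the host notifies the script of a click
print(host.buttons, host.messages)
```

The script stays resident between events because its functions live on in the host's namespace, which mirrors the "stay resident for a period of time" requirement.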
Based on my own experience:
Python. IMHO this is a good choice. We have a pretty big code base with a lot of users and they like it a lot.
Ruby. There are some really nice apps such as Google Sketchup that use this. I wrote a Sketchup plugin and thought it was pretty nice.
Tcl. This is the old-school embeddable scripting language of choice, but it doesn't have a lot of momentum these days. It's high quality though, they use it on the Hubble Space Telescope!
Lua. I've only done baby stuff with it but IIRC it only has a floating point numeric type, so make sure that's not a problem for the data you will be working with.
We're lucky to be living in the golden age of scripting, so it's hard to make a bad choice if you choose from any of the popular ones.
I have played around a little bit with SpiderMonkey. It seems like it would at least be worth a look in your situation. I have heard good things about Lua as well. The big argument for using JavaScript as the scripting language is that a lot of developers know it already and would probably be more comfortable from the get-go, whereas Lua would most likely have a bit of a learning curve.
I'm not completely positive, but I think that SpiderMonkey meets your 4 requirements.
I've used Python extensively for this purpose and have never regretted it.
Lua has the most straightforward C API for binding into a code base that I've ever used. In fact, I usually just roll bindings for it by hand, whereas for other languages you often wouldn't consider doing so without a generator like SWIG. Also, it's typically faster and more lightweight than the alternatives, and coroutines are a very useful feature that few other languages provide.
AngelScript lets you call standard C functions and C++ methods with no need for proxy functions. The application simply registers the functions, objects, and methods that the scripts should be able to work with and nothing more has to be done with your code. The same functions used by the application internally can also be used by the scripting engine, which eliminates the need to duplicate functionality.
For the script writer the scripting language follows the widely known syntax of C/C++ (with minor changes), but without the need to worry about pointers and memory leaks.
The original question described Tcl to a "T".
Tcl was designed from the beginning to be an embedded scripting language. It has evolved to be a first-class dynamic language in its own right but is still used all over the world as an embedded language. It is available under the BSD license, so it is just about as free as it gets. It also compiles on pretty much any modern platform, and many not-so-modern ones. And not only does it work on desktop systems, there are variations available for mobile platforms.
Tcl excels as a "glue" language, where you can write performance-intensive functions in C while still benefiting from the advantages of a scripting language for less performance critical parts of the application.
Tcl also comes with a first class GUI toolkit (Tk) that is arguably one of the easiest cross platform GUI toolkits available. It also interfaces very nicely with SQLite and other databases, and has had built-in support for unicode for quite some time.
If the scripting interface will be made available to your customers (as opposed to simply enabling your own engineers to work at the scripting level), Tcl is extremely easy to learn as there are a total of only 12 rules that govern the entire language (as of tcl 8.6). In fact, Tcl shines as a way to invent domain specific languages which is often how it is used as an end-user scripting solution.
There were some excellent suggestions already, but I just wanted to mention that Perl can also be called / can call to C/C++.
You probably could use any modern scripting / bytecode language.
If you're willing to put up with the growing pains of a new product, you could use the Parrot VM, which has support for many, if not all, of the languages listed on this page. Unfortunately it's not done yet, but that hasn't stopped some people from using it in a production environment.
I think most people are probably mentioning the scripting language that they are most familiar with. From my perspective, Tcl was designed specifically to interface with C, so your problem domain is tailor-made for the language. However, I'm sure Python, Perl, or Lua would be fine. You should probably choose the language that is most familiar to your current team, since that will reduce the learning time.