Is there an automatic tool to find the DRY-ness of your code base? - dry

I am a strong advocate of the DRY principle:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
Are there any tools that can test a code base for the amount of DRYness and both quantify and pinpoint specific examples for correction?

Simian
Simian (Similarity Analyser) identifies duplication in Java, C#, C,
C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, Groovy source
code and even plain text files. In fact, simian can be used on any
human readable files such as ini files, deployment descriptors, you
name it.
Simian runs natively in any .NET 1.1 or higher supported environment
and on any Java 5 or higher virtual machine, meaning Simian can be run
on just about any hardware and any operating system you can hope for.
Both the Java and .NET runtimes are included as part of the
distribution.

PMD's Copy Past Detector (CPD)
Duplicate code can be hard to find, especially in a large project. But
PMD's Copy/Paste Detector (CPD) can find it for you!
Note that CPD works with Java, JSP, C, C++, Fortran and PHP code.
You can run it commandline, there is an ANT task, and an Eclipse plugin.

Clone Detective for Visual Studio
Clone Detective is a Visual Studio integration that allows you to
analyze C# projects for source code that is duplicated somewhere else.
Having duplicates can easily lead to inconsistencies and often is an
indicator for poorly factored code.

See our CloneDR tool, which finds exact and near miss sets of duplicated code using computer langauge structure as guide. As well as detecting clones, it will show you a rough abstraction and the parameter bindings that explain the differences between the clone instances.
It has instantiations for many lanuages: C, C++, C#, Java, JavaScript, PHP, COBOL, Python, PLSQL, ... It tends to find 10-15% duplicated code across systems that have any serious size (e.g, 100K SLOC and above). There are sample reports for many languages at the web site, and you can download a trial copy.

Related

Any tools to check for duplicate VB.NET code?

I wish to get a quick feeling for how much “copy and paste” coding we have, there are many tools for C# / Java to check for this type of thing. Are there any such tools that work well with VB.NET?
(I have seen what looks like lots of repeated code, but wish to get some number to help me make a case for sorting it out)
Update on progress.
I have just tried Simian.
It does not seem to be able to produce a nicely formatted report I can sent by email
It does not cope when the names of local variables, or parameters etc may have been changed, e.g it just matches on lines of text being the same.
Clone Doctor does not support VB.NET (only C# and VB 6 and lot of other)
October 2010: VB.net added to langauges supported by CloneDR
Clone Detective for Visual Studio only supports C#
SolidSDD - Source Code Duplication Detector only supports C, C++, C# and Java
DuplicateFinder is open source, but otherwise looks very match like Simian, e.g it just works on lines of text
ConQAT - Continuous Quality Assessment Toolkit seems to have a clone detector that works for VB.NET (not tried it yet)
Gendarme is a bit like FXCop and has a AvoidCodeDuplicatedInSameClassRule rule, this looks very promising, as it avoids the problem of working at the text level. Just tried it, it is the best solution so far, pity it does not search with a greater scope.
Before claiming that this question is a duplicate, please check that the other question addresses VB.NET, as a lot of tools that work well for C# don't work so well for VB.NET. (However it would not surprise me if this question is a real duplicate)
CodeRush 11.2 introduced a new feature called Duplicate Detection and Consolidation (DDC)
http://community.devexpress.com/blogs/markmiller/archive/2011/11/29/duplicate-detection-and-consolidation-in-coderush-for-visual-studio.aspx
Make sure to check out the options for it as well, as you can have it run when so many lines are changed, certainly time has passed, etc.
They've posted some decent videos on the DevExpress site too.
Simian: http://www.redhillconsulting.com.au/products/simian/
[I'm the author of CloneDR ("Clone Doctor").]
CloneDR is parameterized by a full grammar for the programming language in question. So it doesn't just match lines. Rather, it can find clones which are syntactically well-formed, with variations that are more than just identifier changes, regardless of where they stop or start in a line.
The engine on which CloneDR rests, The DMS Software Reengineering Toolkit" is a tool for analyzing large scale systems in any programming language, and uses language descriptions to drive the analysis. DMS has a wide variety of language front ends already available.
Presently it has VBScript and VB6 (as dialects of "Visual Basic"). It doesn't have VB.net, but that would be pretty straightforward to do given the DMS infrastructure and our experience with lots of other languages.
So, CloneDR could do this just fine, with a small bit of effort on our part.
EDIT October 2010: VB.net added as a language CloneDR can process.
Atomiq supports vb.net amongst other languages, and the results are nicely presented.
JetBrains published console tool set Resharper Console Tools to run duplication analysis. Once installed it allows you to do the same analysis as TeamCity does and generate duplicates report locally and even include duplicates search into custom build process with MSBuild. This tool does exactly what you need. More details you can find here at JetBrains blog post
Try Simian:
Simian (Similarity Analyser) identifies duplication in Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, Groovy source code and even plain text files.
I once saw an impressive demo of Pattern Insight; its CP Miner may be what you’re looking for: http://patterninsight.com/products/cp-miner.php. It seems to be language-independent, though I couldn’t find anything explicit about languages other than C/C++.
Roll up your sleeves and write your own parser to use it with CPD?
See the question for the tools I found.

Can statically compiled languages replace scripting language?

Assuming you can get a dynamic interpreter; can statically compiled languages replace scripting language? I never quite understood why anyone would use a scripting language? I am talking about on PC, not a limited system which needs a simplistic interpreter. I seen some python install scripts and seen similar python and C# solutions to a problem. So why use a scripting language?
NOTE: There are things that bother me about C#, i am not asking why not use C# instead. I am asking why use a scripting language? I find static compiled languages much easier to debug and often easier to code in.
There is very little distinction these days between compiling and interpreting. Look at how an interpreted language is executed - the first step is to convert the script into some kind of internal executable form, like byte code that can be executed by a simpler instruction set. This is essentially compilation to a virtual machine format. This is exactly what modern compiled languages do. And when compiled languages are deployed in server-side web apps, they even recompile from the source on the fly. So there's practically no difference in terms of the compile/execute technique.
The only difference is in the details of the instruction set, specifically in the type system. Scripting languages are usually (but not always) dynamically typed. But many large applications are also written in dynamically typed languages too. So again, there is no clear distinction here.
Personally I think static typing, far from being "extra unnecessary effort" (as it is often described) is actually a huge productivity booster, making it much easier to write short snippets correctly on the first attempt, thanks to intellisense/autocompletion. To underline this, look at how Microsoft has improved the jQuery library simply by adding static type information to it (in specially formatted comments) so we can have intellisense in the IDE.
And meanwhile, static languages (including C# and Java) are bringing in more dynamic typing features.
So I see these categories as eventually merging and the distinction being meaningless.
Wikipedia says that a Scripting Language is a language that controls other software. You can do that with C#, but true scripting languages like Powershell are designed specifically for this.
I tend to think of a scripting language in more "interactive" terms than C#. With a scripting language, you can write a line or two of code, execute it and see the results immediately. That's not so easy in C#, where you have to put your code in a Console Application, or fire it off from a unit test, or type it into the Immediate window where you don't have intellisense.
That rapid cycle of write, execute allows rapid prototyping of complete "scripts" in a scripting language, because it gives you immediate feedback on each line of code.
This kind of question often starts flame wars as people are passionate about their respective camps.
In the computer olden days, Unix command line tools and console shells provided a rich scripting environment where all sorts of processing could be done. You didn't need to be an expert programmer in any specific language and could string (pun intended) various programs (other people wrote) together using the pipe structure to massage your data which was mostly text not binary related. It is quick and easy to make changes to your batch command file. You don't have a source file that has to be edited, compiled linked with external static or shared libries/DLLS in the case of Windows.
One thing scripting does not have normally have is speed. You don't write device drives and live internet trading AI systems in scripting. But if you run a script once a day on some data received via e-mail or ftp you don't normally care how long it takes as it can run it background anyway.
Rewind back to the present and the waters become muddy. Some scripting enviroments offer a kind of speed up facility where they will read you script and almost compile and link in modules the same a normal C++ or VB program might use for speed puposes. But this very iffy and can't be relied on.
So how do you choose which route to go. Start doing tasks using scripting. If it runs too slow or you are having to do stuff every 5 minutes then parts of your script might benifit from a section written in a traditional language or the whole thing could be written in a language.
Like anything dabble and learn
Each is used for different purposes. Programs written in scripting languages are often not self-contained; they often function as "glue code" or (as Robert Harvey mentions) to automate a task. You often find scripting language interpreters embedded within an application (cf Python in Blender; Guile, Perl and Python in GIMP; JS in umpteen different browsers; Lua in countless games). Compiled languages, on the other hand, are used to produce self-contained applications. Scripts are mostly cross-platform; compiled applications usually aren't.
Note that a scripting language doesn't necessarily use an interactive interpreter (e.g. Perl), and an interpreted language isn't necessarily use for scripts (e.g. games made using PyGame). Note also that there's nothing about the languages themselves that make them interpreted or compiled. You could have a C# interpreter or a Ruby compiler. There have been a number of Lisp systems that offered both interpreters and compilers.
I would call my shell (bash) a scripting language, and I don't see a replacement comming, which is compiled.
I like to use scala, which is a statically typed language which comes with an interpreter-like REPL-interface, and due to type interference looks pretty much like a scripting language; have a look here: http://www.simplyscala.com/ .
But it isn't meant to be the glue between other programs as the shell is, so for small jobs, which are easily verified by hand and eye, which are just a few lines of code, I prefer to use the shell. And jumping from directory to directory is comfortable in a shell, where the prompt shows where I am.
Before we begin, I don't think that I've ever met a static language user who "got" scripting language without trying them, including myself. It is a different experience.
So no. Basically, you can add features to static languages which makes them superficially seem like scripting languages (like simple type inference), but its not the same:
Many scripting language users hate static languages. They feel constrained. Scripting languages are typically very good at not getting in the users way, which is sacrificed in static languages for speed/correctness.
Duck typing will not appear in static languages.
Scripting language users don't like type annotations. Its not really possible to provide a type-inference system for scripting languages, and the simple type inference appearing in some languages now only works for static types.
Techniques like monkey patching (which to my mind is a very bad idea) is pervasive in Ruby, and allows for very powerful techniques, which won't become available soon in static languages either.
Which isn't to say that a yet-to-be-designed language can't handle scripting language features in a relatively static way, but it would be difficult for it to become popular relative to the entrenched Python/PHP/Perl/Ruby/Javascript set. Factor is the closest thing, AFAICT.
What will happen is that scripting language implementations will get faster by using JITs.
Can a screw driver replace a hammer ? No, because you just don't use them for the same purpose. And if both exist, and if such a lot of people use either one or the other, there must be a reason...
Same anwser for :
class inheritance vs prototype;
imperative vs oo;
static vs dynamic typing;
strongly vs weakly typed;
manual memory management vs GC;
C# vs Java;
blue vs red;
man vs woman;
batman vs superman (but I do think superman would win... wait, there is kryptonite... oh man, I don't know...)
etc...
Because it is shorter to write since it is a higher level language, and it doesn't need the compilation cycle which also makes thing shorter.
I am asking why use a scripting
language? I find static compiled
languages much easier to debug and
often easier to code in.
Because I find loosely-typed dynamic languages without an explicit compile-run cycle much easier to debug and generally easier to code in.

Does anyone know of any VB.Net code generation plug-ins for StarUML?

Simple question really. No elaboration required.
I have found similar tools for Java, C#, C++ and other languages but not VB.Net.
Thanks
Unfortunately, I don't know of a plugin for StarUML that will generate VB.NET code, however, if using a different modelling tool is a possibility, Sparx System's Enterprise Architect will generate code in VB.NET (as well as many other languages).
Enterprise Architect enables you to
generate source code from UML model
elements, creating a source code
equivalent of the Class or Interface
element for future elaboration and
compilation. In particular you can
generate C, C++, C#, Delphi, Java,
PHP, Python, ActionScript, Visual
Basic and VB.NET source code. The
source code generated includes Class
definitions, variables and function
stubs for each attribute and method in
the UML Class.
Unfortunately, Enterprise Architect is not free nor is it open source, however, there is a full functional 30 day trial available, along with discount options for academic pricing (if appropriate).
If you're looking to stay with free and/or open source software, I think the closest for VB.NET code generation would be to use a couple of tools in conjunction with each other:
Dia is an open source, GTK+ based diagram creation program for Linux, Unix and Windows released under the GPL license. You can use this to create your UML diagrams. From there, you would use:
Project: CodeGen from Novell, which will take the UML diagrams saved/exported from Dia and convert them into skeleton C# or VB.NET code.
If you're prepared to look at a more "general purpose" code generator, this page:
Generators that build VB.NET
will give you a good starting list of tools that will generate VB.NET code.

Has Lua a future as a general-purpose scripting language?

As already discussed in "Lua as a general-purpose scripting language?" Lua currently probably isn't the best scripting language for the desktop environment.
But what do you think about the future? Will Lua get so popular that there will soon be enough libraries to be able to use it like Python, Ruby or something similar?
Or will it simply stay in it's WoW niche and that's it?
I think it has a great future, a lot of projects are starting to adopt it for it's simplicity and usefulness.
Example: Awesome WM (Window Manager)
The project recently released version 3, incorporating a new configuration system completely written in Lua. Allowing you to literally write your configuration file as a program, loops, booleans, data structures.
Personally I love the syntax and the flexibility of such a system, I think it has great potential.
I wouldn't be surprised if it became more popular in the future.
Brian G
I suppose the answer starts with 'It depends how you want to use it...'.
If you're writing the common business app (fetch the data from the database, display the data in a web page or window, save the data to the database), Lua already has what you need.
The Kepler Project contains goodies for web development. Check out their modules to see some of the available libraries - there's network, MVC, DBMS access, XML, zip, WSAPI, docs...
As an example web app, check out Sputnik.
For desktop UI, there's wxLua - Lua hooks for wxWidgets.
ORM is conspicuously missing but that didn't stop people from developing in other languages before ORM was available.
If you're looking for specialized libraries - scientific, multimedia , security - don't count Lua out before you check LuaForge.
When it comes down to it, there's nothing in Lua's design that prevents general purpose use. It just happens to be small, fast, and easy to embed... so people do.
Uh? I would say instead WoW is a niche in the Lua ecosystem... The world of Lua doesn't revolve around WoW, there are lot of applications, some big like Adobe Lightroom (to take a non game), using Lua.
Lua is initially a scripting language, in the initial sense, ie. made to be embedded in an application to script it. But it is also designed as an extensible language, so we will see progressively more and more bindings of various libraries for various purposes.
But you will never get an official big distribution with batteries included, like Python or Perl, because it is just not the philosophy of the authors.
Which doesn't prevent other people to make distributions including lot of features out of the box (for Windows, particularly, where it is difficult to build the softwares).
Lot of people already use it for general system-level scripting, desktop applications, and such anyway.
There are more and more libraries for Lua.
If you are a Windows user, have look at Lua for Windows. It comes with "batteries included" (wxLua, LuaCURL, LuaUnit, getopt, LuaXML, LPeg...).
Very usefull!
It's 2017, 9 years after this question was first asked, and lua is now being heavily used in the field of machine learning due to the Torch library.
I really like it as an embedded language. It's small, very easy to use and embed and mostly does what I need right out of the box. It's also similar enough to most languages that it has never really been an issue for me. I also like how easy it is to redefine and add base functions and keywords to the language to suit whatever needs my application has.
I have used it in the WoW area but I've also found it useful as a generic scripting language for a number of different applications I've worked on, including as a type of database trigger. I like Ruby and Python and other more full-featured scripting languages but they're not nearly as convenient for embedding in small applications to give users more options for customizing their environments.
being comfortable as a shell language has nothing to do with being a great general purpose language.
i, for one, don't use it embedded in other applications; i write my applications in Lua, and anything 'extra' is a special-purpose library, either in Lua or in C.
Also, being 'popular' isn't so important. in the Lua-users list periodically someone appears that says "Lua won't be popular unless it does X!", and the usual answer is either: "great!, write it!", or "already discussed and rejected".
I think the great feature of Lua is, that it is very easily extensible. It is very easy to add the Lua interpreter to a program of your own (e.g. one written in C, C++ or Obj-C) and with just a few lines of code, you can give Lua access to any system resource you can think of. E.g. Lua offers no function to do xxx. Write one and make it available to Lua. But it's also possible the other way round. Write your own Lua extension in a language of your choice (one that is compilable), compile it into a native library, load the library within Lua and you can use the function.
That said, Lua might not be the best choice as a standalone crossplatform language. But Lua is a great language to add scripting support to your application in a crossplatform manner (if your app is crossplatform, the better!). I think Lua will have a future and I think you can expect that this language will constantly gain popularity in the long run.
Warhammer Online, and World of Warcraft use it for their addon language I believe.
I think it's hot! I'm just no good at it!
Well, greetings from 2022.
It is already a general purpose language. Today you can even serve pages using OpenResty, extend games, read databases or create scripts as shellscript replacements.
There are a plenty of libraries "modules" for Lua, many ways to achieve what you are wanting and Lua 5.4 is even faster.
The "extendable and extensive" nature of Lua, accostumed people to think it should only be used as plugin or extension. In Linux, by example, you can shebang a file with lua-any, make it executable and run like any system script. Or you can make a folder app like Python or virtualenv using Lupe. Lua 5.3 also gained impressive performance improvements.
Also there are many good tools like IUP to create native windows in Lua for Mac, BSD, Linux and Windows and side environments like Terra that lets you use Lua with its counterpart Terra and write compiled programs. Lua now, is more than a extension language, it has its own universe.

Why do we need other JVM languages

I see here that there are a load of languages aside from Java that run on the JVM. I'm a bit confused about the whole concept of other languages running in the JVM. So:
What is the advantage in having other languages for the JVM?
What is required (in high level terms) to write a language/compiler for the JVM?
How do you write/compile/run code in a language (other than Java) in the JVM?
EDIT: There were 3 follow up questions (originally comments) that were answered in the accepted answer. They are reprinted here for legibility:
How would an app written in, say, JPython, interact with a Java app?
Also, Can that JPython application use any of the JDK functions/objects??
What if it was Jaskell code, would the fact that it is a functional language not make it incompatible with the JDK?
To address your three questions separately:
What is the advantage in having other languages for the JVM?
There are two factors here. (1) Why have a language other than Java for the JVM, and (2) why have another language run on the JVM, instead of a different runtime?
Other languages can satisfy other needs. For example, Java has no built-in support for closures, a feature that is often very useful.
A language that runs on the JVM is bytecode compatible with any other language that runs on the JVM, meaning that code written in one language can interact with a library written in another language.
What is required (in high level terms) to write a language/compiler for the JVM?
The JVM reads bytecode (.class) files to obtain the instructions it needs to perform. Thus any language that is to be run on the JVM needs to be compiled to bytecode adhering to the Sun specification. This process is similar to compiling to native code, except that instead of compiling to instructions understood by the CPU, the code is compiled to instructions that are interpreted by the JVM.
How do you write/compile/run code in a language (other than Java) in the JVM?
Very much in the same way you write/compile/run code in Java. To get your feet wet, I'd recommend looking at Scala, which runs flawlessly on the JVM.
Answering your follow up questions:
How would an app written in, say, JPython, interact with a Java app?
This depends on the implementation's choice of bridging the language gap. In your example, Jython project has a straightforward means of doing this (see here):
from java.net import URL
u = URL('http://jython.org')
Also, can that JPython application use any of the JDK functions/objects?
Yes, see above.
What if it was Jaskell code, would the fact that it is a functional language not make it incompatible with the JDK?
No. Scala (link above) for example implements functional features while maintaining compatibility with Java. For example:
object Timer {
def oncePerSecond(callback: () => unit) {
while (true) { callback(); Thread sleep 1000 }
}
def timeFlies() {
println("time flies like an arrow...")
}
def main(args: Array[String]) {
oncePerSecond(timeFlies)
}
}
You need other languages on the JVM for the same reason you need multiple programming languages in general: Different languages are better as solving different problems ... static typing vs. dynamic typing, strict vs. lazy ... Declarative, Imperative, Object Oriented ... etc.
In general, writing a "compiler" for another language to run on the JVM (or on the .Net CLR) is essentially a matter of compiling that language into java bytecode (or in the case of .Net, IL) instead of to assembly/machine language.
That said, a lot of the extra languages that are being written for JVM aren't compiled, but rather interpreted scripting languages...
Turning this on its head, consider you want to design a new language and you want it to run in a managed runtime with a JIT and GC. Then consider that you could:
(a) write you own managed runtime (VM) and tackle all sorts of technically difficult issues that will doubtless lead to many bugs, bad performance, improper threading and a great deal of portability effort
or
(b) compile your language into bytecode that can run on the Java VM which is already quite mature, fast and supported on a number of platforms (sometimes with more than one choice of vendor impementation).
Given that the JavaVM bytecode is not tied so closely to the Java language as to unduly restrict the type of language you can implement, it has been a popular target environment for languages that want to run in a VM.
Java is a fairly verbose programming language that is getting outdated very quickly with all of the new fancy languages/frameworks coming out in the past 5 years. To support all the fancy syntax that people want in a language AND preserve backwards compatibility it makes more sense to add more languages to the runtime.
Another benefit is it lets you run some web frameworks written in Ruby ala JRuby (aka Rails), or Grails(Groovy on Railys essentially), etc. on a proven hosting platform that likely already is in production at many companies, rather than having to using that not nearly as tried and tested Ruby hosting environments.
To compile the other languages you are just converting to Java byte code.
I would answer, “because Java sucks” but then again, perhaps that's too obvious … ;-)
The advantage to having other languages for the JVM is quite the same as the advantage to having other languages for computer in general: while all turing-complete languages can technically accomplish the same tasks, some languages make some tasks easier than others while other languages make other tasks easier. Since the JVM is something we already have the ability to run on all (well, nearly all) computers, and a lot of computers, in fact already have it, we can get the "write once, run anywhere" benefit, but without requiring that one uses Java.
Writing a language/compiler for the JVM isn't really different from writing one for a real machine. The real difference is that you have to compile to the JVM's bytecode instead of to the machine's executable code, but that's really a minor difference in the grand scheme of things.
Writing code for a language other than Java in the JVM really isn't different from writing Java except, of course, that you'll be using a different language. You'll compile using the compiler that somebody writes for it (again, not much different from a C compiler, fundamentally, and pretty much not different at all from a Java compiler), and you'll end up being able to run it just like you would compiled Java code since once it's in bytecode, the JVM can't tell what language it came from.
Different languages are tailored to different tasks. While certain problem domains fit the Java language perfectly, some are much easier to express in alternative languages. Also, for a user accustomed to Ruby, Python, etc, the ability to generate Java bytecode and take advantage of the JDK classes and JIT compiler has obvious benefits.
Answering just your second question:
The JVM is just an abstract machine and execution model. So targetting it with a compiler is just the same as any other machine and execution model that a compiler might target, be it implemented in hardware (x86, CELL, etc) or software (parrot, .NET). The JVM is fairly simple, so its actually a fairly easy target for compilers. Also, implementations tend to have pretty good JIT compilers (to deal with the lousy code that javac produces), so you can get good performance without having to worry about a lot of optimizations.
A couple of caveats apply. First, the JVM directly embodies java's module and inheritance system, so trying to do anything else (multiple inheritance, multiple dispatch) is likely to be tricky and require convoluted code. Second, JVMs are optimized to deal with the kind of bytecode that javac produces. Producing bytecode that is very different from this is likely to get into odd corners of the JIT compiler/JVM which will likely be inefficient at best (at worst, they can crash the JVM or at least give spurious VirtualMachineError exceptions).
What the JVM can do is defined by the JVM's bytecode (what you find in .class files) rather than the source language. So changing the high level source code language isn't going to have a substantial impact on the available functionality.
As for what is required to write a compiler for the JVM, all you really need to do is generate correct bytecode / .class files. How you write/compile code with an alternate compiler sort of depends on the compiler in question, but once the compiler outputs .class files, running them is no different than running the .class files generated by javac.
The advantage for these other languages is that they get relatively easy access to lots of java libraries.
The advantage for Java people varies depending on language -- each has a story tell Java coders about what they do better. Some will stress how they can be used to add dynamic scripting to JVM-based apps, others will just talk about how their language is easier to use, has a better syntax, or so forth.
What's required are the same things to write any other language compiler: parsing to an AST, then transforming that to instructions for the target architecture (byte code) and storing it in the right format (.class files).
From the users' perspective, you just write code and run the compiler binaries, and out comes .class files you can mix in with those your java compiler produces.
The .NET languages are more for show than actual usefulness. Each language has been so butchered, that they're all C# with a new face.
There are a variety of reasons to provide alternative languages for the Java VM:
The JVM is multiplatform. Any language ported to the JVM gets that as a free bonus.
There is quite a bit of legacy code out there. Antiquated engines like ColdFusion perform better while offering customers the ability to slowly phase their applications from the legacy solution to the modern solution.
Certain forms of scripting are better suited to rapid development. JavaFX, for example, is designed with rapid Graphical development in mind. In this way it competes with engines like DarkBasic. (Processing is another player in this space.)
Scripting environments can offer control. For example, an application may wish to expose a VBA-like environment to the user without exposing the underlying Java APIs. Using an engine like Rhino can provide an environment that supports quick and dirty coding in a carefully controlled sandbox.
Interpreted scripts mean that there's no need to recompile anything. No need to recompile translates into a more dynamic environment. e.g. Despite OpenOffice's use of Java as a "scripting language", Java sucks for that use. The user has to go through all kinds of recompile/reload gyrations that are unnecessary in a dynamic scripting environment like Javascript.
Which brings me to another point. Scripting engines can be more easily stopped and reloaded without stopping and reloading the entire JVM. This increases the utility of the scripting language as the environment can be reset at any time.
It's much easier for a compiler writer to generate JVM or CLR byte-codes. They are a much cleaner and higher level abstraction than any machine language. Because of this, it is much more feasible to experiment with creating new languages than ever before, because all you have to do is target one of these VM architectures and you will have a set of tools and libraries already available for your language. They let language designers focus more on the language than all the necessary support infrastructure.
Because the JSR process is rendering Java more and more dead: http://www.infoq.com/news/2009/01/java7-updated
It's a shame that even essential and long known additions like Closures are not added just because the members cannot agree on an implementation.
Java has accumulated a massive user base over seven major versions (from 1.0 to 1.6). Its capability to evolve is limited by the need to preserve backwards compatibility for the uncountable millions of lines of Java code running in production.
This is a problem because Java needs to evolve to:
compete with newer programming languages that have learned from Java's successes and failures.
incorporate new advances in programming language design.
allow users to take full advantage of advances in hardware - e.g. multi-core processors.
fix some cutting edge ideas that introduced unexpected problems (e.g. checked exceptions, generics).
The requirement for backwards compatibility is a barrier to staying competitive.
If you compare Java to C#, Java has the advantage in mature, production ready libraries and frameworks, and a disadvantage in terms of language features and rate of increase in market share. This is what you would expect from comparing two successful languages that are one generation apart.
Any new language has the same advantage and disadvantage that C# has compared to Java to an extreme degree. One way of maximizing the advantage in terms of language features, and minimizing the disadvantage in terms of mature libraries and frameworks is to build the language for an existing virtual machine and make it interoperable with code written for that virtual machine. This is the reason behind the modest success of Groovy and Clojure; and the excitement around Scala. Without the JVM these languages could only ever have occupied a tiny niche in a very specialized market segment, whereas with the JVM they occupy a significant niche in the mainstream.
They do it to keep up with .Net. .Net allows C#, VB, J# (formerly), F#, Python, Ruby (coming soon), and c++. I'm probably missing some. Probably the big one in there is Python, for the scripting people.
To an extent it is probably an 'Arms Race' against the .NET CLR.
But I think there are also genuine reasons for introducing new languages to the JVM, particularly when they will be run 'in parallel', you can use the right language for the right job, a scripting language like Groovy may be exactly what you need for your page presentation, whereas regular old Java is better for your business logic.
I'm going to leave someone more qualified to talk about what is required to write a new language/compiler.
As for how to writing code, you do it in notepad/vi as usual! (or use a development tool that supports the language if you want to do it the easy way.) Compiling will require a special compiler for the language that will interpret and compile it into bytecode.
Since java also produces bytecode technically you don't need to do anything special to run it.
The reason is that the JVM platform offers a lot of advantages.
Giant number of libraries
Broader degree of platform
implementations
Mature frameworks
Legacy code that's
already part of your infrastructure
The languages Sun is trying to support with their Scripting spec (e.g. Python, Ruby) are up and comers largely due to their perceived productivity enhancements. Running Jython allows you to, in theory, be more productive, and leverage the capabilities of Python to solve a problem more suited to Python, but still be able to integrate, on a runtime level, with your existing codebase. The classic implementations of Python and Ruby effect the same ability for C libraries.
Additionally, it's often easier to express some things in a dynamic language than in Java. If this is the case, you can go the other way; consume Python/Ruby libraries from Java.
There's a performance hit, but many are willing to accept that in exchange for a less verbose, clearer codebase.