How does Parrot compare to other virtual machines?

Parrot is the virtual machine originally designed for Perl 6.
What technical capabilities does the Parrot VM offer that competing virtual machines such as the Java Virtual Machine (JVM)/Hotspot VM and Common Language Runtime (CLR) lack?

The following answer was written in 2009. See also this 2015 update by raiph.
To expand on Reed's answer and point out some highlights: Parrot's opcodes are at a far higher level than most virtual machines'. For example, while most machines' basic registers hold integers and floats, Parrot's basic register types are integers, numbers (floats), strings, and Parrot Magic Cookies (PMCs). Just having strings built in is a step up from the JVM.
More interesting is the PMC, sort of like the JVM's object type but far more fungible. A PMC is a container for all the other, more complicated types you need in a real language: arrays, tables, trees, iterators, I/O handles and so on. The PMC, and the wide variety of built-in ops for it, means less work for the language writer. Parrot does not shy away from the messy but necessary bits of implementing a language.
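To make the PMC idea concrete, here is a minimal sketch in Python (an analogy only, not Parrot's actual API; all names are invented): a single container type whose behaviour comes from a vtable of operations, so one register type can hold arrays, hashes, iterators, and so on.

    # Hypothetical PMC-like container: behaviour lives in a vtable,
    # so the same register type can wrap any high-level structure.
    class PMC:
        def __init__(self, vtable, payload):
            self.vtable = vtable    # maps op names to implementations
            self.payload = payload  # the underlying data

        def invoke(self, op, *args):
            return self.vtable[op](self.payload, *args)

    # A vtable giving array-like behaviour to the generic container.
    array_vtable = {
        "push":     lambda data, x: data.append(x),
        "elements": lambda data: len(data),
    }

    arr = PMC(array_vtable, [])
    arr.invoke("push", 42)
    print(arr.invoke("elements"))  # prints 1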
My information may be out of date, but I believe opcodes are pluggable: you can ship a Parrot VM that contains only the opcodes your language needs. PMCs were also going to be inheritable: if your language wants its arrays to work a little differently from stock Parrot arrays, you can subclass the stock type, as in the sketch below.
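As a rough Python sketch of that inheritance idea (not Parrot's actual mechanism; the saturating behaviour is invented for illustration), a language subclasses the stock array type and overrides just the operation it wants to change:

    # Hypothetical example: inherit a stock array, tweak one behaviour.
    class StockArray(list):
        def push(self, x):
            self.append(x)

    class ClampedArray(StockArray):
        # An array whose push saturates at 255 (invented behaviour).
        def push(self, x):
            super().push(min(x, 255))

    a = ClampedArray()
    a.push(300)
    print(a)  # prints [255]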
Finally, Parrot can be programmed not just in its assembler (PASM) but also in a slightly higher-level language, Parrot Intermediate Representation (PIR). PIR has loops, subroutines, localized variables and some basic math and comparison ops: all the basics people expect in a programming language, without getting too far away from the metal.
All in all, Parrot is very friendly to language designers (it is written by and for them) who want to design a language and leave as much of the implementation as possible to somebody else.

You can read about much of this on the Parrot VM Intro page.
The main advantage Parrot has over the JVM or the CLR would be that it is designed to support dynamic languages first, and potentially provide better support and performance for dynamically typed languages. The JVM and the CLR are both geared more towards supporting statically typed languages, and many of the design decisions show that.

Parrot is the virtual machine originally designed for Perl 6.
There are now two VMs originally designed for Perl 6; commits to MoarVM began in 2012.
What technical capabilities does the Parrot VM offer that competing virtual machines such as the Java Virtual Machine (JVM)/Hotspot VM and Common Language Runtime (CLR) lack?
In another answer on this page, Reini Urban, the current (April 2015) Parrot lead dev, provides a brief comparison of Parrot with the JVM and CLR VM.
According to Reini, a key advantage Parrot has over MoarVM is "effectively lock-less threads".

One other thing that makes Parrot different from most VMs (certainly different from the JVM) is that it's a register machine rather than a stack machine. But I think people will be arguing for a long, long time over whether that counts as an advantage or a disadvantage.
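To give a feel for the difference, here is a sketch (with invented encodings, not real JVM or Parrot bytecode) of the same statement, a = b + c, written both ways, plus a toy evaluator for the stack form:

    # Stack style: operands flow through an implicit stack (JVM-like).
    stack_program = [
        ("push", "b"),   # push the value of b
        ("push", "c"),   # push the value of c
        ("add",),        # pop two values, push their sum
        ("store", "a"),  # pop the result into a
    ]

    # Register style: each instruction names its operands (Parrot-like).
    register_program = [
        ("add", "a", "b", "c"),  # a = b + c in a single instruction
    ]

    def run_stack(prog, env):
        stack = []
        for instr in prog:
            op = instr[0]
            if op == "push":
                stack.append(env[instr[1]])
            elif op == "add":
                rhs, lhs = stack.pop(), stack.pop()
                stack.append(lhs + rhs)
            elif op == "store":
                env[instr[1]] = stack.pop()

    env = {"b": 2, "c": 3}
    run_stack(stack_program, env)
    print(env["a"])  # prints 5

The register form is denser per instruction; the stack form needs no operand names in the arithmetic ops. Which one dispatches faster in an interpreter is exactly the long-running argument.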

I don't know the JVM and CLR well enough, but some points in Parrot's favour:
support (and speed) for dynamic-language features: closures, polymorphic scalars, continuations, co-routines
dynamic method dispatch
first-class functions
first-class continuations
parameters (optional, named, ...)
register-based design
HLL interoperability provided at the assembly level
a wide range of supported platforms
Update: This is probably irrelevant now that the JVM is one of Rakudo Perl 6's backends. See Rakudo Perl 6 on the JVM (Perl 6 Advent Calendar 2013, Day 3).

The main advantage and technical difference over the JVM and the CLR is that types (classes, called PMCs) and ops (methods) may be dynamically loaded from efficient user-provided C implementations, and that the parser framework for creating and extending languages is built in.
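A loose Python analogy of the op-loading idea (the registration API here is invented; Parrot actually loads compiled C via shared libraries): the VM's opcode set is just a dispatch table that user-provided code can extend at run time.

    # Invented sketch: a run-time-extensible opcode table.
    OPS = {}

    def register_op(name):
        def deco(fn):
            OPS[name] = fn
            return fn
        return deco

    @register_op("add_i")          # user code plugs in a new op
    def add_i(regs, dst, a, b):
        regs[dst] = regs[a] + regs[b]

    regs = {"I0": 2, "I1": 3, "I2": 0}
    OPS["add_i"](regs, "I2", "I0", "I1")
    print(regs["I2"])  # prints 5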

This question is outdated. Rakudo Perl 6 no longer targets Parrot as a backend; MoarVM is the preferred backend, with the JVM backend a work in progress (it generally works, but many Perl 6 features are not yet implemented or are currently broken). Development work (not ready for users) is under way to add JavaScript as a third backend.

Related

How does language design influence VM and bytecode design?

For example, how did the design of C# and VB.NET shape the development of CIL (and vice-versa)? What about Java and the JVM? How did the nature of PHP affect the development of HHBC/the HHVM, or Perl and Parrot, or Smalltalk and the VMs for various implementations?
Language design will influence the VM if the designers want it to. Some VMs are more independent of their languages than others. For example, Java does not have multiple inheritance, so the JVM does not either.
Generally, a language machine (such as the Java Virtual Machine or the .NET CLR) will closely reflect the requirements of the language (Java for the JVM, C# for the CLR) for which it was designed.
For example, pretty much every Java bytecode in the original JVM v1.0 was needed by the compiler. One could suggest that the needs of the javac compiler author(s) were being met on demand by the JVM author(s). (It was a small team, so it may have even been the same person.)
The CLR is a bit different, because in addition to C#, they jammed in some stuff to support a pretend-C++ language, which required at least 3 additional op codes (IIRC). Nonetheless, the CLR was pretty much designed just to support C#.
It's interesting to analyze the Android Dalvik engine, since it was designed as a JVM-but-without-using-JVM-byte-codes engine. (It is also register based, instead of stack based.)
At some level, the primary decision becomes this: Whether the engine is a low level Turing complete machine (something like a software RISC machine), or whether the engine's primitive language (its IL) is simply a binary form of its primary source code language. The former is more like WASM (arguably general purpose), while the latter is more like the JVM and CLR specs.

Matching a virtual machine design with its primary programming language

As background for a side project, I've been reading about different virtual machine designs, with the JVM of course getting the most press. I've also looked at BEAM (Erlang), GHC's RTS (kind of but not quite a VM) and some of the JavaScript implementations. Python also has a bytecode interpreter that I know exists, but have not read much about.
What I have not found is a good explanation of why particular virtual machine design choices are made for a particular language. I'm particularly interested in design choices that would fit with concurrent and/or very dynamic (Ruby, JavaScript, Lisp) languages.
Edit: In response to a comment asking for specificity, here is an example. The JVM uses a stack machine rather than a register machine, which was very controversial when Java was first introduced. It turned out that the engineers who designed the JVM had done so intending platform portability: converting a stack machine back into a register machine was easier and more efficient than overcoming an impedance mismatch where the virtual machine had too many or too few registers for the target.
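Here is a toy Python sketch of that conversion trick (invented instruction formats): simulate the operand stack at translation time and map each stack slot to a fresh virtual register, roughly what Android's dx tool later did with JVM bytecode.

    # Translate stack code to register code by simulating the stack.
    def stack_to_register(stack_prog):
        reg_prog, stack, next_reg = [], [], 0
        def fresh():
            nonlocal next_reg
            r = "v%d" % next_reg
            next_reg += 1
            return r
        for instr in stack_prog:
            op = instr[0]
            if op == "push":
                r = fresh()
                reg_prog.append(("load", r, instr[1]))
                stack.append(r)          # this slot now lives in register r
            elif op == "add":
                b, a = stack.pop(), stack.pop()
                r = fresh()
                reg_prog.append(("add", r, a, b))
                stack.append(r)
            elif op == "store":
                reg_prog.append(("store", instr[1], stack.pop()))
        return reg_prog

    print(stack_to_register(
        [("push", "b"), ("push", "c"), ("add",), ("store", "a")]))
    # [('load','v0','b'), ('load','v1','c'),
    #  ('add','v2','v0','v1'), ('store','a','v2')]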
Here's another example: for Haskell, the paper to look at is Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. This is very different from any other type of VM I know about, and in point of fact GHC (the premier implementation of Haskell) does not run the STG machine live; it is used as an intermediate step in compilation. Peyton Jones lists no fewer than eight other virtual machines that didn't work. I would like to understand why some VMs succeed where others fail.
I'll answer your question from a different tack: what is a VM? A VM is just a specification for an "interpreter" of a lower-level language than the source language. Here I'm using the black-box meaning of the word "interpreter": I don't care how a VM gets implemented (as a bytecode interpreter, a JIT compiler, whatever). When phrased that way, from a design point of view the VM isn't the interesting thing; its low-level language is.
The ideal VM language will do two things. One, it will make it easy to compile the source language into it. Two, it will make it easy to interpret on the target platform(s) (where, again, the interpreter could be implemented very naively or could be some really sophisticated JIT like HotSpot or V8).
Obviously there's a tension between those two desirable properties, but they do more or less form two end points on a line through the design space of all possible VMs. (Or, perhaps some more complicated shape than a line because this isn't a flat Euclidean space, but you get the idea). If you build your VM language far outside of that line then it won't be very useful. That's what constrains VM design: putting it somewhere into that ideal line.
That line is also why high-level VMs tend to be very language specific while low-level VMs are more language agnostic but don't provide many services. A high-level VM is by its nature close to the source language, which makes it far from other, different source languages. A low-level VM is by its nature close to the target platform, and thus close to the platform end of the ideal line for many languages; but that low-level VM will also be pretty far from the "easy to compile to" end of the ideal line of most source languages.
Now, more broadly, conceptually any compiler can be seen as a series of transformations from the source language to intermediate forms that themselves can be seen as languages for VMs. VMs for the intermediate languages may never be built, but they could be. A compiler eventually emits the final form. And that final form will itself be a language for a VM. We might call that VM "JVM", "V8"...or we might call that VM "x86", "ARM", etc.
Hope that helps.
One of the techniques for deriving a VM is to just go down the compilation chain, transforming your source language into lower and lower level intermediate languages. Once you spot a language low level enough to be suitable for a flat representation (i.e., one which can be serialised into a sequence of "instructions"), that is pretty much your VM, and your VM interpreter or JIT compiler would just continue your transformation chain from the point you selected for serialisation.
Some serialisation techniques are very common: for example, using a pseudo-stack representation for expression trees (as in the .NET CLR, which is not a "real" stack machine at all). Otherwise you may want to use an SSA form for serialisation, as in LLVM, or simply a three-address VM with an infinite number of registers (as in Dalvik). It does not really matter which way you take, since it is only a serialisation and it will be de-serialised later to carry on with your normal way of compilation.
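As a small sketch of that first technique (a Python toy with an invented node format, nothing CLR-specific): a postorder walk flattens an expression tree into a pseudo-stack instruction sequence, which is exactly the kind of serialisation meant here.

    # Serialise an expression tree into flat pseudo-stack instructions.
    def serialize(node):
        if isinstance(node, tuple):      # interior node: (op, left, right)
            op, left, right = node
            return serialize(left) + serialize(right) + [("op", op)]
        return [("push", node)]          # leaf: a name or a constant

    tree = ("+", "foo", ("*", "bar", 2))
    print(serialize(tree))
    # [('push','foo'), ('push','bar'), ('push',2), ('op','*'), ('op','+')]

A de-serialiser can rebuild the tree by running these instructions against a stack of subtrees, which is why such a format need not imply a "real" stack machine.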
It is a bit of a different story if you intend to interpret your VM code immediately instead of compiling it. There is currently no consensus on what kind of VM is better suited for interpretation: both stack-based (or, I'd dare say, Forth-based) VMs and register-based ones have proven to be efficient.
I found this book to be helpful; it discusses many of the points you are asking about. (Note: I'm not in any way affiliated with Amazon, nor am I promoting it; it was just the easiest place to link from.)
http://www.amazon.com/dp/1852339691/

Why do almost all OO languages compile to bytecode?

Of the object-oriented languages I know, pretty much all but C++ and Objective-C compile to bytecode running on some sort of virtual machine. Why have so many different languages settled on compiling to bytecode, as opposed to machine code? Is it possible in principle to have a high-level, memory-managed OOP language that compiles to machine code?
Edit: I'm aware that multiplatform support is often advanced as an advantage of this approach. However, it's quite possible to compile natively on multiple platforms without writing a new compiler per platform. One can, for example, emit C code and then compile that with GCC.
There's no inherent reason; in fact, it's something of a coincidence. OOP is now the leading concept in "big" programming, and so virtual machines are too.
Also note that there are two distinct parts of a traditional virtual machine, the garbage collector and the bytecode interpreter/JIT compiler, and these parts can exist separately. For example, the Common Lisp implementation SBCL compiles programs to native code, but at runtime it still makes heavy use of garbage collection.
This is done to give a VM or JIT compiler the chance to compile the code on demand, optimally for the architecture on which the code is executed. Also, it allows cross-platform bytecode to be created once and then executed on multiple hardware architectures, with hardware-specific optimizations placed into the compiled code.
Since bytecode is not tied to a particular microarchitecture, it can be smaller than machine code: complex operations can be represented as single instructions, versus the much more primitive instructions available in modern CPUs, since the constraints on designing CPU instructions are very different from the constraints on designing a bytecode format.
Then there's the issue of security. The bytecode can be verified and analyzed prior to execution (e.g., no buffer overflows, no variables of one type being accessed as another), etc.
Java uses bytecode because two of its initial design goals were portability and compactness. Those both came from the initial vision of a language for embedded devices, where fragments of code could be downloaded on the fly.
Python, Ruby, Smalltalk, JavaScript, awk and so on use bytecode because writing a native compiler is a lot of work, but a textual interpreter is too slow; bytecode hits a sweet spot of being fairly easy to write but also satisfactorily quick to run.
I have no idea why the Microsoft languages use bytecode, since for them neither portability nor compactness is a big deal. A lot of the thinking behind the CLR came out of computer scientists in Cambridge, so I imagine considerations like ease of program analysis and verification were involved.
Note that as well as C++ and Objective C, Eiffel, Ada 9X, Vala and Go are OO languages (of varying vintage) that are compiled straight to native code.
All in all, I'd say that OO and bytecode do not go hand in hand. Rather, we have a coincidental convergence of several streams of development: the traditional bytecoded interpreters of scripting languages like Python and Ruby, the mad Gosling masterplan of Java, and whatever Microsoft's motives are.
The biggest reason why most interpreted languages (not specifically OO languages) are compiled to bytecode is performance. The most expensive part of interpreting code is transforming the text source into an intermediate representation. For instance, to perform something like:
foo + bar;
The interpreter would have to scan 10 characters, transform them into 4 tokens, build an AST for the operation, and resolve three symbols (+ is a symbol whose meaning depends on the types of foo and bar), all before it can perform any action that actually depends on the run-time state of the program. None of this can change from run to run, and so many languages try to store some form of intermediate representation.
Bytecode, rather than a stored AST, has a few advantages. For one, bytecodes are easy to serialize, so the IR can be written to disk and reused at the next invocation, further reducing interpretation time. Another reason is that bytecode often takes up less actual RAM. Significantly, bytecode representations are often easy to just-in-time compile, because they are often structurally similar to typical machine code.
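CPython is a handy concrete illustration of both points (a sketch; the exact opcode names vary by Python version): compile() does the scanning, parsing, and symbol work once, dis shows the cached bytecode, and marshal serialises it the way .pyc files do.

    import dis
    import marshal

    code = compile("foo + bar", "<expr>", "eval")  # scan/parse/resolve once
    dis.dis(code)  # shows LOAD_NAME foo, LOAD_NAME bar, an add op, ...

    blob = marshal.dumps(code)   # easy to serialize, like a .pyc payload
    code2 = marshal.loads(blob)
    print(eval(code2, {"foo": 1, "bar": 2}))  # prints 3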
As another data point, the D programming language is GC'ed, OO, and a lot higher level than C++ while still being compiled to native code.
Bytecode is a significantly more flexible medium than machine code. First, it provides the basis for platform portability without the need for a compiler or shipping source code, so a developer can distribute a single version of the application without needing to give up the source, require complex developer tools, or anticipate potential target platforms. While the latter is not always practical, it does happen, especially with developer libraries: say I distribute a library that I've only tested on Windows, but someone else uses it on Linux or Android. It happens quite frequently, actually, and most of the time it works as expected.
Bytecode is also generally faster to execute than interpreting source directly, because it's closer to machine instructions and therefore quicker to translate into them. Not all OO languages are compiled: Ruby, Python, and even JavaScript are interpreted, so they aren't compiled to anything in advance, and the Ruby interpreter has to take a very flexible language and turn it into instructions on the fly. That flexibility comes at a price paid at runtime: parse text, generate an AST, translate the AST to machine code, and so on. Bytecode also makes it easy to apply optimizations like JIT compilation, where bytecode is translated to machine code directly, and even opens the possibility of optimizations for specific hardware.
Finally, just because one language compiles to bytecode doesn't preclude other languages from taking advantage of that bytecode. Any optimization using that bytecode can be applied to other languages that know how to translate themselves to it. That makes the bytecode a very important layer of reusability for other languages.
OO and bytecode compilation go back to the '70s with Smalltalk, and I'm sure someone will say LISP as early as the '50s/'60s. But it really wasn't until the '90s that they started to be used in production systems on a large scale.
Native compilation sounds like the optimal path, which is probably why our industry spent 20 years or more thinking it was THE ANSWER to all our problems, but over the last 15 years we've seen bytecode compilation take the stage, and it's been a significant advantage over what we did before. Looking back, we realize how much time we wasted natively compiling everything, mostly by hand.
I agree with Chubbard's answer, and I'd add that in OO languages type information can be very important for enabling optimizations by virtual machines or last-level compilers.
It is easier to develop an interpreter than a compiler.
Effort in development of...:
interpreter < bytecode-interpreter < bytecode-jit-compiler < compiler-to-platform-independent-language < compiler-to-multiple-machine-dependent-assembler.
It is a general trend to stop development at the JIT-compiler stage because of platform independence. Only the languages preferred for performance work and for research in theoretical computer science are (and will be) developed in all possible directions, including new bytecode interpreters, even when good and advanced compilers to platform-independent languages and to various machine-dependent assemblers already exist.
Research in OOP languages is, let's say, pretty dull compared to functional languages, because really new language and compiler technologies are more easily expressed with mathematical category theory and mathematical descriptions of Turing-complete type systems. In other words, that research is nearly functional in itself, while imperative languages are little more than assembler front-ends with some syntactic sugar. OOP languages tend to be imperative languages, because functional languages already have closures and lambdas; there are other ways to implement Java-like "interfaces" in functional languages, and there is just no need for additional object-oriented features.
In Haskell, for instance, adding OOP-style programming would probably be more than a few steps back in technology; there would be no point in it. (That is not only my opinion: have you ever heard of GADTs or multi-parameter type classes?) There might even be better ways to dynamically create objects with interfaces to communicate with OOP languages than changing the language itself. There are other functional languages, too, that explicitly combine functional and OOP aspects. There is simply more research around mainly functional languages than around non-functional OO languages.
OO languages cannot easily be compiled to other OO languages if they are in some way more "advanced". Usually they have features like stack protectors, advanced debugging abilities, abstract and inspectable multi-threading, or dynamic object loading from files or the internet, and many of these features are not (or not easily) realisable with C or C++ as a compiler backend. The functional language LISP (which is 50 years old!) was, AFAIK, the first with a garbage collector. As a compiler backend, LISP used a hacked version of C, because plain C did not allow some things that assembler did, e.g. proper tail calls or tables-next-to-code. C-- allows that.
Another aspect: imperative languages are intended to run on a specific architecture, i.e. C and C++ programs run only on the architectures they are compiled for. Java is more extreme: it runs on only a single architecture, a virtual one, which itself runs on others.
Functional languages are usually architecture-independent by design: LISP was developed to be so immensely architecture-unspecific that it could, in some distant future, be compiled to genetic code. Yes, like programs running in living biological cells.
With LLVM bytecode, functional languages will most likely be compiled to bytecode in the future too. Most imperative languages will most likely still have the same inherited problems they have now from not abstracting far enough. Well, I'm not that sure about Clang and D, but those two are not "the most" anyway.

What are alternatives to the Java VM?

As Oracle sues Google over the Dalvik VM, it becomes clear that you cannot implement a Java VM without a license from Oracle (EDIT: Matthew Flaschen points out that Oracle's claims may not be valid. In any case, we currently have a situation where Oracle threatens VM implementations.). That may be the death of open-source implementations of Java (like Apache Harmony).
I don't want to discuss the impact or the legitimacy of this lawsuit, but as a Java programmer I want to take a deeper look at the alternatives, to be prepared for every case. As I see the creation of a compiler as a minor problem, my main interest is in alternative VM implementations that serve a similar purpose to the JVM.
The VM I'm looking for, should meet some conditions:
free of patent issues
an open-source implementation exists
potential for optimizations/good performance
platform independent (the VM can be ported to different platforms without major hurdles)
Please add some recommendations for me.
LLVM is a really good optimizing, low-level virtual machine. It can support languages like C and C++, and does not have built-in support for high-level features like garbage collection.
VMKit is an implementation of the Java and CLI virtual machines on top of LLVM. Since it uses Java bytecode, this probably wouldn't help with the patent issues.
HLVM is another interesting high level virtual machine built on top of LLVM. It is probably different enough to avoid most well known patents, but it is mainly targeted at numerical computing and functional programming.
On the dynamically typed side, there is Parrot.
I am actually working on a compiler and VM for a language of my own design, but don't count on it ever being finished. ;-)
Keep in mind that any large piece of software will infringe on numerous patents, the important thing is how well known they are (and how much the patents' owners actively seek out infringers). Of course, the whole patent system is absurd, and we would be much better off getting rid of it.
I don't think there is any significant piece of software that is free from patent issues.
If you are an independent developer or working for a smaller company you probably won't get hit directly by the problems though. It's unlikely that big companies holding patents will go after lots of small claims - it's an expensive process and causes a lot of resentment. SCO tried something like that and it didn't work out too well for them.
I would concentrate on finding the best tool for the job without worrying too much about the patent issues, otherwise you will never get anything done.
GraalVM is a research project developed by Oracle Labs and already in production at Twitter. I can't believe no one has mentioned it yet. GraalVM is a promising extension of the Java Virtual Machine that supports more languages and execution modes, running applications written in JavaScript, Python, Ruby, R, JVM-based languages, and LLVM-based languages such as C and C++. The GraalVM project includes a new high-performance Java compiler, itself called Graal, which can be used in a just-in-time configuration on the HotSpot VM or in an ahead-of-time configuration on SubstrateVM. The main goal of the project is to improve the performance of languages on the Java Virtual Machine to match the performance of native languages. Let's sum up the novel features this project offers, with a brief explanation, according to the docs, of why you should adopt it:
Polyglot: All languages (even LLVM-based) share the same VM and its capabilities. Zero overhead interoperability between programming languages allows you to write polyglot applications and select the best language for your task
Native: Native images compiled with GraalVM ahead-of-time improve the startup time and reduce the memory footprint of JVM-based applications.
Embeddable: GraalVM can be embedded in both managed and native applications. There are existing integrations into OpenJDK, Node.js, Oracle Database, and MySQL. GraalVM removes the isolation between programming languages and enables interoperability in a shared runtime. It can run either standalone or in the context of OpenJDK, Node.js, Oracle Database, or MySQL.
Performance: Graal benchmark reports show great performance improvements in almost all of its implementations, thanks to the way GraalVM performs object allocations.
If you're not convinced by now that it's a good choice and a really awesome project, see this talk by Christian Thalinger on why Graal is a good fit for Twitter.

Embedded platform development in (!C)

I'm curious to see how popular the alternatives to C are in the embedded developer world, e.g. Ada...
I've only ever used C (with a little bit of assembler), but then my targets have very limited resources. Is there a move elsewhere in this space to something else? What is winning the war in set-top boxes?
If !C what was the underlying reason?
Compiler support for target
Trace / static analysis tools
other...
Thanks.
Forth is quite popular for embedded development.
Also, while Smalltalk is probably not popular in the embedded community, embedded development is definitely popular in the Smalltalk community.
When you say "embedded development", keep in mind that you have to consider the scale of the project.
When programming something on the scale of a microcontroller or the firmware for an ASIC, you tend to see C and assembly dominate the scene. Embedded developers tend to "specialize" in these languages since compilers for them are available for nearly every embedded target platform. If your project migrates from, say, a chip with a PowerPC core to a chip with an ARM core, you can be fairly confident that your C code will not be overly difficult to port over. Some chips do have compilers available for other languages, but typically they do not match the C compiler in terms of efficiency of the resulting binary. Since embedded systems are often low on resources, system designers want to make their code as efficient as possible (also one reason why you see a lot of assembly language code). I have seen development tools available for languages such as C++, Pascal, Basic, and others, but they are typically niche tools that are not mature enough to match the efficiency of the available C compilers. Debugging tools for these languages also tend to be harder to find than what is available for C/assembly.
You also mentioned set-top boxes. Embedded systems on this scale can pack the equivalent power of a desktop computer from 7-8 years ago. Their available RAM, storage space, and processing power allow them to run full-featured operating systems and interpreters for higher-level languages. On these more powerful systems you will still see C and assembly language being used (for driver code, if nothing else), but other languages (such as Java, Lua, Tcl, Ruby, etc.) are becoming more and more common. Using interpreted languages makes porting code from one platform to another even easier, as long as the platform has sufficient resources to handle the overhead of the language interpreter. Any low-level code that interfaces directly with hardware (drivers) will still typically be written in assembly or C, since high-level languages don't always have the capability to do that sort of thing. Anything running as an application on top of the embedded operating system can usually be developed and tested inside an emulator or virtual machine, so you will see a lot of code being developed in whatever language the developer happens to be comfortable with.
TLDR version: C is popular because it is a versatile language that nearly all developers are familiar with. Assembly is popular because it allows for low-level hardware access in ways that would otherwise be difficult or impossible. Interpreted/scripted languages such as Java are becoming more popular, but the resource requirements of their interpreters may be too much for some embedded systems to handle. The quality and variety of development and debugging tools available for C and assembly also make these options attractive.
Perhaps not quite the large step from C you're looking for, but C++ is also reasonably popular for embedded projects.
I haven't used it myself, but Bascom is quite popular for AVR microcontrollers. It is a Basic IDE that lets you interact with the peripherals very easily. I've met hardware people who use it successfully.
Yes. Java is becoming more popular; many processors have added instructions that pertain primarily to Java and similar languages (.NET). Also, uClinux runs on microcontrollers, so you can use practically any language on some of the larger micros.
Basic is still common, as is assembly.
You'll see Ada in certain gov't projects.
And some engineers are even putting Lua and other interpreters on their micros so their customers can extend the functionality.
But C is still dominant.
-Adam
In the early '90s I did a lot of embedded development on the 8051 using Intel PLM51 and the DCX51 operating system.
PLM is a very simple language, but very powerful.
We now use C.
If you work in the smartcard space, you get to use Java Card. Yep: Java, on an 8-bit micro. It's kinda fun, actually. I get to develop in Eclipse, test (& debug!) on the PC simulator, and can be confident that it'll run the same on the card. It's just such a pity Java is a terrible language for embedded apps :)
I've used EC++ (Embedded C++) quite extensively.
Also, PICBasic has been popular with the PIC'ers for eons now.
I have used Ada in an embedded project for military avionics because of customer requirements. There are lots of Ada tools for embedded development, but most of them are very expensive. Personally I would just use C.
There is a Pascal compiler for the 8051.
JAL
There is a group of folks working to make Lua a viable option for embedded work. They are targeting primarily 32-bit ARMs with 256K FLASH and 64K RAM or better, and seem happy with their work so far.
They are partly inspired by the classic BASIC-Stamp, a BASIC interpreter running in a moderately powerful PIC with the program itself stored in a serial EEPROM device.
At work, I am still maintaining a customer's embedded system that is written in a compiled flavor of BASIC running on a Zilog Z180 CPU. 1980s technology all around, with most of the system still built out of 24-pin DIP packages in sockets. The compiler runs under CP/M-80 running in a Z80 simulator, which itself runs in the MS-DOS simulator built into Windows. Aside from the sheer amazement that anything productive can be done this way (and that you can still buy 27C256 UV-erasable EPROMs, and that my nearly 20-year-old Data I/O PROM programmer still works), I really wish the customer could afford to move to a new hardware design so the system could be rewritten in a maintainable language.
Depends on the microcontroller: many of them have C compilers, but the compilers are horrible. Assembler is usually easy and the best performing, most efficient, etc. Ones like the MSP, AVR, and ARM are good targets for C compilers, and for those I would (and do) use C, depending on the problem.
I would stick to C or assembler; you are wasting memory, performance, and resources using anything else.
Pascal and Modula-2 work fine too. Essentially they are pretty much equivalent to C, except for the inability to do alloca (though some have that as an extension).
But the core problem with any !C compiler remains: which do you prefer, a better compiler/toolchain or your language of preference?
Although I like the Wirthian languages most, I simply use C, and live with the consequences, simply because the toolchain is better.
There have been examples in the past (Pascals, or even tightly compiled Basics), but C is mostly the norm. I never understood why.
I worked on a device which ran some incredibly old version of Python (1.4 or something). There was no way to debug it (other than printing debug messages), so when your code hit an exception everything would just stop and you'd scratch your head for an hour. Whenever you made a change and upgraded the running code, it took about 10 minutes to interpret and compile it.
Needless to say we scrapped that and replaced the microcontroller with one that ran C.
See this related question:
What languages are used for real-time systems programming.
In response to your "why" question, from the standpoint of government/military acquisition, there is a perception that Java (language, platform, etc...) is the lingua franca these days and that economies of scale in the language will reduce acquisition and maintenance cost. There's also a hope that one can efficiently train a competent Java programmer to be a reasonable RT/embedded programmer in Java faster than if they are required to learn a new language. This rationale is suspect, in my opinion, but it does answer the "why" question.
If you include the iPhone as an embedded platform, then Objective-C.
Considering how many times I've had a Java out-of-memory exception on my phone (most of the time I do anything remotely interesting), I'd run away from Java like a bat out of a hot place.
I've heard that Erlang was designed for use in cell phones. I think Lisp is a good architecture for remote device support, if the device can handle the runtime.
A lot of home-brew users and small companies needing a cheap solution have found that Tiny Tiger and BASIC Stamp (using BASIC) meet their needs.