Getting Embedded with D (the programming language)


I like a lot of what I've read about D:
Unified documentation. (That would make my job a lot easier.)
Testing capability built into the language.
Debug code support in the language.
No need for forward declarations. (I always thought it was stupid to declare the same function twice.)
Built-in features to replace the preprocessor.
Modules.
Typedef used for proper type checking instead of aliasing.
Nested functions. (Cough PASCAL Cough)
In and out parameters. (How obvious is that!)
Supports low-level programming: embedded systems, oh yeah!
However:
Can D support an embedded system that is not going to be running an OS?
Does the outright declaration that it doesn't support 16-bit processors preclude it entirely from embedded applications running on such machines? Sometimes you don't need a hammer to solve your problem.
Garbage collection is great on Windows or Linux, but unfortunately embedded applications sometimes must do explicit memory management.
Array bounds checking: you love it, you hate it. Great for design assurance, but not always permissible for performance reasons.
What are the implications for multithreading support on an embedded system not running an OS? We have a customer that doesn't even like interrupts, much less an OS or multithreading.
Is there a D-Lite for embedded systems?
So basically, is D suitable for embedded systems with only a few megabytes (sometimes less than a megabyte), not running an OS, where maximum memory usage must be known at compile time (per requirements), and possibly on something smaller than a 32-bit processor?
I'm very interested in some of the features, but I get the impression it's aimed at desktop application developers.
What specifically makes it unsuitable for a 16-bit implementation? (Assuming the 16-bit architecture could address enough memory to hold the runtime, either in flash memory or RAM.) 32-bit values could still be calculated using library code, albeit more slowly than 16-bit values and with more operations.
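The point about computing 32-bit values in library code on a 16-bit CPU can be illustrated with a minimal C sketch (`u32_pair` and `add32` are hypothetical names, not part of any D runtime). It adds two 32-bit values using only 16-bit additions and an explicit carry, roughly what a compiler's support routine would have to do on such a target.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves, the way a 16-bit CPU would
   keep it in a register pair. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} u32_pair;

/* 32-bit addition using only 16-bit adds plus a carry, as a runtime
   library helper on a 16-bit target might do it. */
static u32_pair add32(u32_pair a, u32_pair b)
{
    u32_pair r;
    r.lo = (uint16_t)(a.lo + b.lo);            /* low half, wraps mod 2^16 */
    uint16_t carry = (uint16_t)(r.lo < a.lo);  /* wrap-around means carry out */
    r.hi = (uint16_t)(a.hi + b.hi + carry);    /* high half absorbs the carry */
    return r;
}
```

Multiplication and division take more steps, but follow the same idea of composing wide operations out of narrow ones.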

I have to say that the short answer to this question is "No".
If your machines are 16 bit, you'll have big problems fitting D into it - it is explicitly not designed for it.
D is not a lightweight language in itself: it generates a lot of runtime type info that is normally linked into your app, and that is also needed for typesafe variadics (and thus for the standard formatting features, be it Tango or Phobos). This means that even the smallest applications are surprisingly large, which may disqualify D from systems with little RAM. Also, D with the runtime as a shared library (which could alleviate some of these issues) has seen little testing.
All current D libraries require a C standard library beneath them, and thus typically also an OS, so even that works against using D. However, experimental kernels written in D do exist, so it is not impossible per se. There just aren't any libraries for it as of today.
I would personally like to see you succeed, but doubt that it will be easy work.

First and foremost, read larsivi's answer. He has worked on the D runtime and knows what he's talking about.
I just wanted to add: Some of what you asked about is already possible. It won't get you all the way, and a miss is as good as a mile here but still, FYI:
Garbage collection is great on Windows or Linux, but unfortunately embedded apps sometimes must do explicit memory management.
You can turn garbage collection off. The various experimental D OSes out there do it. See the std.gc module, in particular std.gc.disable. Note also that you do not need to allocate memory with new: you can use malloc and free. Even arrays can be allocated that way; you just need to wrap a D array around the allocated memory using a slice.
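Turning the GC off still leaves the question's requirement that maximum memory use be known at compile time. One common way to meet that, sketched here in C under assumed sizes (`POOL_BLOCK_SIZE`, `POOL_BLOCK_COUNT` and the `pool_*` functions are hypothetical), is a fixed-size block pool carved out of static storage instead of a general-purpose heap.

```c
#include <stddef.h>

/* Hypothetical sizing: both values are compile-time constants, so the
   pool's worst-case memory use is fixed when the program is built. */
#define POOL_BLOCK_SIZE  32u
#define POOL_BLOCK_COUNT 16u

/* One block: while free it stores the free-list link, while allocated the
   caller owns its bytes. max_align_t keeps every block suitably aligned. */
typedef union pool_block {
    union pool_block *next;
    max_align_t align;
    unsigned char bytes[POOL_BLOCK_SIZE];
} pool_block;

/* Static storage: no heap, no garbage collector, no OS involved. */
static pool_block pool_storage[POOL_BLOCK_COUNT];
static pool_block *pool_free_list = NULL;
static int pool_ready = 0;

/* Thread every block onto the free list (done lazily on first use). */
static void pool_init(void)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
    pool_ready = 1;
}

/* O(1) allocation; returns NULL when the pool is exhausted instead of
   falling back to a general-purpose heap. */
static void *pool_alloc(void)
{
    if (!pool_ready)
        pool_init();
    pool_block *block = pool_free_list;
    if (block != NULL)
        pool_free_list = block->next;
    return block;
}

/* O(1) release: push the block back onto the free list. */
static void pool_free(void *p)
{
    pool_block *block = p;
    if (block == NULL)
        return;
    block->next = pool_free_list;
    pool_free_list = block;
}
```

As written the pool is not interrupt- or thread-safe, which matches a system with no OS and no interrupts; exhaustion shows up as a `NULL` return rather than as heap growth.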
Array bounds checking: you love it, you hate it. Great for design assurance, but not always permissible for performance reasons.
The specification for arrays specifically requires that compilers allow for bounds checking to be turned off (see the "Implementation Note"). gdc provides -fno-bounds-check, and in dmd using -release should disable it.
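What switching bounds checking off means in practice can be sketched in C with a compile-time switch. The `NO_BOUNDS_CHECK` macro and `checked_get` are hypothetical names, and this imitates the idea rather than how gdc or dmd implement it: with checks on, an out-of-range index aborts; with checks compiled out, the access costs nothing extra and an out-of-range index is undefined behavior.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch: build with -DNO_BOUNDS_CHECK to drop the checks. */
#ifdef NO_BOUNDS_CHECK
#define CHECK_INDEX(i, len) ((void)0)
#else
#define CHECK_INDEX(i, len)                                          \
    do {                                                             \
        if ((size_t)(i) >= (size_t)(len)) {                          \
            fprintf(stderr, "index %zu out of bounds (len %zu)\n",   \
                    (size_t)(i), (size_t)(len));                     \
            abort();                                                 \
        }                                                            \
    } while (0)
#endif

/* Read a[i]; the check vanishes entirely when NO_BOUNDS_CHECK is defined. */
static int checked_get(const int *a, size_t len, size_t i)
{
    CHECK_INDEX(i, len);
    return a[i];
}
```

Keeping the checks in debug and test builds and removing them only where measurements show they matter is one way to get both the design assurance and the performance.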
What are the implications on an embedded system, not running an OS, for multithreading support? We have a customer that doesn't even like interrupts. Much less OS/multithreading.
This I'm less clear on, but given that most C runtimes allow turning off multithreading, it seems likely one could get the D runtime to disable it as well. Whether that's easy or possible right now though I can't tell you.

The answers to this question are outdated:
Can D support an embedded system that is not going to be running an OS?
D can be cross-compiled for ARM Linux and for ARM Cortex-M. Some projects aim at creating libraries for Cortex-M architectures like MiniLibD for the STM32 or this project which uses a generic library for the STM32. (You could implement your own minimalistic OS in D on ARM Cortex-M.)
Does the outright declaration that it doesn't support 16-bit processors preclude it entirely from embedded applications running on such machines? Sometimes you don't need a hammer to solve your problem.
No, see answer above... (But I would not expect that "smaller" architectures than Cortex-M will be supported in the near future.)
Garbage collection is great on Windows or Linux, but unfortunately embedded applications sometimes must do explicit memory management.
You can write garbage-collection-free code. (The D Foundation seems to be aiming at a "GC-free compliant" standard library, Phobos, but that is a work in progress.)
Array bounds checking: you love it, you hate it. Great for design assurance, but not always permissible for performance reasons.
(As you said, this depends on your "personal taste" and design decisions. But I would assume an acceptable performance overhead for bounds checking, given the background of the D compiler developers and D's design aims.)
What are the implications for multithreading support on an embedded system not running an OS? We have a customer that doesn't even like interrupts, much less an OS or multithreading.
(What is the question? One could implement multithreading using D's language capabilities, e.g. as explained in this question. BTW: if you want to use interrupts, consider this "hello world" project for a Cortex-M3.)
Is there a D-Lite for embedded systems?
The SafeD subset of D targets the embedded domain.

Related

Matching a virtual machine design with its primary programming language

As background for a side project, I've been reading about different virtual machine designs, with the JVM of course getting the most press. I've also looked at BEAM (Erlang), GHC's RTS (kind of but not quite a VM) and some of the JavaScript implementations. Python also has a bytecode interpreter that I know exists, but have not read much about.
What I have not found is a good explanation of why particular virtual machine design choices are made for a particular language. I'm particularly interested in design choices that would fit with concurrent and/or very dynamic (Ruby, JavaScript, Lisp) languages.
Edit: In response to a comment asking for specificity, here is an example. The JVM uses a stack machine rather than a register machine, which was very controversial when Java was first introduced. It turned out that the engineers who designed the JVM had done so with platform portability in mind: converting a stack machine back into a register machine was easier and more efficient than overcoming an impedance mismatch where there were too many or too few virtual registers.
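The portability argument can be seen in a toy stack-machine interpreter: instructions never name a register, so the bytecode says nothing about how many registers the host has, and an interpreter or JIT is free to map the stack onto whatever the real machine offers. A minimal C sketch with a hypothetical four-instruction set (no stack-overflow checking):

```c
#include <stddef.h>

/* A toy stack-machine instruction set (hypothetical, for illustration). */
enum op { PUSH, ADD, MUL, HALT };

typedef struct {
    enum op op;
    int arg;   /* operand for PUSH, ignored otherwise */
} insn;

/* Operands are implicit (top of the stack), so the program encodes no
   register count at all. */
static int run_stack(const insn *code)
{
    int stack[64];
    size_t sp = 0;   /* index of the next free slot */
    for (size_t pc = 0;; pc++) {
        switch (code[pc].op) {
        case PUSH: stack[sp++] = code[pc].arg; break;
        case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case HALT: return stack[sp - 1];
        }
    }
}
```

For example, `(2 + 3) * 4` serializes as `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT` and evaluates to 20.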
Here's another example: for Haskell, the paper to look at is Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. This is very different from any other type of VM I know about. And in point of fact, GHC (the premier implementation of Haskell) does not run it live; it is used as an intermediate step in compilation. Peyton Jones lists no fewer than 8 other virtual machines that didn't work. I would like to understand why some VMs succeed where others fail.

I'll answer your question from a different tack: what is a VM? A VM is just a specification for an "interpreter" of a lower-level language than the source language. Here I'm using the black-box meaning of the word "interpreter". I don't care how a VM gets implemented (as a bytecode interpreter, a JIT compiler, whatever). Phrased that way, from a design point of view the VM isn't the interesting thing; the low-level language is.
The ideal VM language will do two things. One, it will make it easy to compile the source language into it. And two, it will make it easy to interpret on the target platform(s) (where again the interpreter could be implemented very naively or could be some really sophisticated JIT like HotSpot or V8).
Obviously there's a tension between those two desirable properties, but they do more or less form two end points on a line through the design space of all possible VMs. (Or perhaps some more complicated shape than a line, because this isn't a flat Euclidean space, but you get the idea.) If you build your VM language far outside of that line, it won't be very useful. That's what constrains VM design: placing it somewhere on that ideal line.
That line is also why high-level VMs tend to be very language-specific, while low-level VMs are more language-agnostic but don't provide many services. A high-level VM is by its nature close to the source language, which makes it far from other, different source languages. A low-level VM is by its nature close to the target platform, and thus close to the platform end of the ideal lines for many languages, but that low-level VM will also be pretty far from the "easy to compile to" end of the ideal line of most source languages.
Now, more broadly, conceptually any compiler can be seen as a series of transformations from the source language to intermediate forms that themselves can be seen as languages for VMs. VMs for the intermediate languages may never be built, but they could be. A compiler eventually emits the final form. And that final form will itself be a language for a VM. We might call that VM "JVM", "V8"...or we might call that VM "x86", "ARM", etc.
Hope that helps.

One of the techniques of deriving a VM is to just go down the compilation chain, transforming your source language into more and more low level intermediate languages. Once you spot a low level enough language suitable for a flat representation (i.e., the one which can be serialised into a sequence of "instructions"), this is pretty much your VM. And your VM interpreter or JIT compiler would just continue your transformations chain from the point you selected for a serialisation.
Some serialisation techniques are very common - e.g., using a pseudo-stack representation for expression trees (like in .NET CLR, which is not a "real" stack machine at all). Otherwise you may want to use an SSA-form for serialisation, as in LLVM, or simply a 3-address VM with an infinite number of registers (as in Dalvik). It does not really matter which way you take, since it is only a serialisation and it would be de-serialised later to carry on with your normal way of compilation.
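For contrast with the pseudo-stack form, here is a minimal C sketch of the "3-address VM with an infinite number of registers" serialisation. The instruction set is hypothetical and the register file is capped at 256 for the sketch; each instruction names its destination and sources explicitly.

```c
#include <stddef.h>

/* A toy 3-address register machine (hypothetical instruction set). */
enum rop { LOADI, RADD, RMUL, RET };

typedef struct {
    enum rop op;
    int dst, a, b;   /* register numbers; LOADI puts the constant in a,
                        RET returns register a */
} rinsn;

static int run_regs(const rinsn *code)
{
    int r[256] = {0};   /* "infinite" registers, capped at 256 here */
    for (size_t pc = 0;; pc++) {
        const rinsn *ins = &code[pc];
        switch (ins->op) {
        case LOADI: r[ins->dst] = ins->a; break;
        case RADD:  r[ins->dst] = r[ins->a] + r[ins->b]; break;
        case RMUL:  r[ins->dst] = r[ins->a] * r[ins->b]; break;
        case RET:   return r[ins->a];
        }
    }
}
```

The expression `(2 + 3) * 4` becomes `r0 = 2; r1 = 3; r2 = r0 + r1; r3 = 4; r4 = r2 * r3; ret r4`. Every intermediate value gets a fresh register, so the data flow is explicit in the encoding, whereas the stack form leaves it implicit in push/pop order.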
It is a somewhat different story if you intend to interpret your VM code immediately instead of compiling it. There is currently no consensus on what kind of VM is better suited for interpretation. Both stack-based (or, I'd dare to say, Forth-based) VMs and register-based ones have proven to be efficient.

I found this book to be helpful. It discusses many of the points you are asking about. (note I'm not in any way affiliated with Amazon, nor am I promoting Amazon; just was the easiest place to link from).
http://www.amazon.com/dp/1852339691/

Choosing CPU architecture for LLVM/CLANG

I am designing a TTL serial computer, and I am struggling to choose the architecture best suited to an LLVM compiler backend (I want to be able to run any C++ software on it). There will be no MMU, no multiplication/division, no hardware stack, and no interrupts.
I have 2 main options:
1) 8-bit memory, 8-bit ALU, 8-bit registers (~12-16). Memory address width 24 bit. So I will need to use 3 registers as IP and 3 registers for any memory location.
Needless to say that any address calculations would be pure pain to implement in compiler.
2) 24-bit memory, 24-bit ALU, 24-bit registers (~6-8). Flat memory, nice. The drawbacks are that, due to the serial nature of the design, each operation would take 3 times more clocks, even when operating on booleans. 24-bit memory data width is expensive, and it's harder to implement in hardware in general.
The question is: do you think implementing all C++ features on this 8-bit, stackless hardware is possible, or do I need more complex hardware to get generated code of reasonable quality and speed?

I second the suggestion to use LCC. I used it in this homebrew 16-bit RISC project: http://fpgacpu.org/xsoc/cc.html .
I don't think it should make much difference whether you build the 8-bit variant and use 3 add-with-carries to increment IP, or the 24-bit variant and do the whole thing in hardware. You can hide the difference in your assembler.
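The "3 add-with-carries" the assembler would hide can be sketched in C: a 24-bit address kept in three 8-bit registers (low byte first, an assumption), with an 8-bit offset added one byte at a time and the carry rippled upward. `addr24` and `addr24_add8` are hypothetical names.

```c
#include <stdint.h>

/* A 24-bit address kept in three 8-bit registers, low byte first. */
typedef struct {
    uint8_t b0, b1, b2;
} addr24;

/* Add an 8-bit offset with three 8-bit add-with-carry steps, the kind of
   expansion an assembler macro could emit for "ip += n" on the 8-bit variant. */
static addr24 addr24_add8(addr24 a, uint8_t n)
{
    unsigned sum;
    sum = (unsigned)a.b0 + n;          a.b0 = (uint8_t)sum;  /* add */
    sum = (unsigned)a.b1 + (sum >> 8); a.b1 = (uint8_t)sum;  /* adc */
    sum = (unsigned)a.b2 + (sum >> 8); a.b2 = (uint8_t)sum;  /* adc, wraps at 2^24 */
    return a;
}
```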
If you look at my article above, or an even simpler CPU here: http://fpgacpu.org/papers/soc-gr0040-paper.pdf you will see you really don't need that many operators / instructions to cover the integer C repertoire. In fact there is an lcc utility (ops) to print the minimum operator set for a given machine.
For more information see my article on porting lcc to a new machine here: http://www.fpgacpu.org/usenet/lcc.html
Once I had ported lcc, I wrote an assembler, and it synthesized a larger repertoire of instructions from the basic ones. For example, my machine had load-byte-unsigned but not load-byte-signed, so I emitted this sequence:
lbs rd,imm(rs) ->
lbu rd,imm(rs)
lea r1,0x80
xor rd,r1
sub rd,r1
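To see why the sequence above yields a signed byte: `lbu` zero-extends to 0..255, the XOR with 0x80 flips bit 7 (mapping 0x80..0xFF to 0x00..0x7F and vice versa), and subtracting 0x80 then shifts the range to -128..127. A C model of the same four steps, assuming a 16-bit register (`load_byte_signed` is a hypothetical name):

```c
#include <stdint.h>

/* Emulate "lbs" with only a zero-extending byte load, step by step. */
static int16_t load_byte_signed(const uint8_t *p)
{
    int16_t rd = *p;     /* lbu rd,imm(rs): zero-extended, 0..255 */
    int16_t r1 = 0x80;   /* lea r1,0x80 */
    rd ^= r1;            /* xor rd,r1: flip bit 7 */
    rd -= r1;            /* sub rd,r1: result now in -128..127 */
    return rd;
}
```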
So I think you can get by with this min cover of operations:
registers
load register with constant
load rd = *rs
store *rs1 = rs2
+ - (w/ w/o carry) // actually can do + with - and ^
>> 1 // << 1 is just +
& ^ // (synthesize ~ from ^, | from & and ^)
jump-and-link rd,rs // rd = pc, pc = rs
skip-z/nz/n/nn rs // skip next insn on rs==0, !=0, <0, >=0
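The synthesized operations the list mentions can be written out as a C sketch, using `uint16_t` as a stand-in for a 16-bit machine word (an assumption; the function names are hypothetical). Each one uses only operators from the minimal cover above: `^`, `&` and `+`.

```c
#include <stdint.h>

typedef uint16_t word;   /* assumed 16-bit machine word */

static word op_not(word x)        { return (word)(x ^ 0xFFFFu); }        /* ~x: xor with all ones */
static word op_or(word a, word b) { return (word)((a ^ b) ^ (a & b)); }  /* a | b from ^ and & */
static word op_shl1(word x)       { return (word)(x + x); }              /* x << 1 is just + */
static word op_neg(word x)        { return (word)(op_not(x) + 1u); }     /* -x: two's complement */
```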
Even simpler is to have no registers (or equivalently blur registers with memory -- all registers have a memory address).
Set aside a register for SP, and write the function prolog/epilog handler in the compiler and you won't have to worry about stack instructions. There's just code to store each of the callee save registers, adjust the SP by the frame size, and so forth.
Interrupts (and return from interrupts) are straightforward. All you need to do is force a jump-and-link instruction into the instruction register. If you chose the bit pattern for that to be something like 0, and put the right addresses into the source register rs (especially if it is r0), it can be done with a flip-flop reset input or an extra force-to-0 and gate. I use a similar trick in the second paper above.
Interesting project. I see a TTL / 7400 contest is underway and I was thinking myself of how simple a machine could you get away with and would it be cheating to add a 32 KB or 128 KB async SRAM to the machine to hold the code and data.
Anyway, happy hacking!
p.s.
1) You will want to decide how large each integral type is. You can certainly make char, short, int, long, long long, etc. the same size, one 24-bit word, if you wish, although that won't comply with the standard's minimum representation ranges (long needs at least 32 bits and long long at least 64).
2) And although I focused on lcc here, you were asking about C++. I recommend pursuing C first. Once you have things figured out for C, including the *, /, % operators in software, etc., it should be more tractable to move to full-blown C++, whether in LLVM or GCC. The difference between C and C++ is "only" the extra vtables and RTTI tables and code sequences (entirely built up out of the primitive C integer operator repertoire) required to handle virtual function calls, pointer-to-member dereference, dynamic casts, static constructors, exception handling, etc.
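The software `*` mentioned above can be built from the minimal operator set alone. A minimal C sketch of shift-and-add multiplication using only `+`, `&`, `>>1` and a zero test; `soft_mul` is a hypothetical runtime-helper name and `uint16_t` stands in for the machine word:

```c
#include <stdint.h>

typedef uint16_t word;   /* assumed machine word for the sketch */

/* Unsigned multiply without a multiply instruction: for each set bit of b,
   add the correspondingly shifted a into the product. */
static word soft_mul(word a, word b)
{
    word product = 0;
    while (b != 0) {
        if (b & 1u)
            product = (word)(product + a);  /* add current partial product */
        a = (word)(a + a);                  /* a <<= 1 via addition */
        b = (word)(b >> 1);                 /* consume one multiplier bit */
    }
    return product;                         /* low word, like C unsigned multiply */
}
```

Division and remainder can be done the same way with shift-and-subtract; a real runtime would probably want hand-tuned assembly versions, but this shows no multiply hardware is required.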

IMHO, it is possible for a C compiler; I am not sure about C++, though.
LLVM/Clang could be a hard choice for an 8-bit computer.
Instead, try lcc first, then LLVM etc. HTH.
Bill Buzbee succeeded in retargeting the lcc compiler for his Magic-1 (known as homebrewcpu).
Although the hardware design and construction of Magic-1 usually gets the most attention, the largest part of the project (by far) has been developing/porting the software. To this end, I've had to write an assembler and linker from scratch, retarget a C compiler, write and port the standard C libraries, write a simplified operating system and then port a more sophisticated one. It's been a challenge, but a fun one. I suppose I'm somewhat twisted, but I happen to enjoy debugging difficult problems. And, when the bug you're trying to track down could involve one or more of: hardware design flaw, loose or broken wire, loose or bad TTL chip, assembler bug, linker bug, compiler bug, C runtime library bug, or finally a bug in the program in question, there's a lot of opportunity for fun. Oh, and I also don't have the luxury of blaming the bugs on anyone else.
I'm continually amazed that the damn thing runs at all, much less runs as well as it does.

In my opinion, stackless hardware is already poorly suited for C and C++ code. If you have nested function calls, you will need to emulate a stack in software anyway, which of course is much slower.
When going the stackless route, you will probably allocate most of your variables as 'static', and have no re-entrant functions. In this case, 6502-style addressing modes can be effective. You could for example have these addressing modes:
Immediate address (24bit) as part of opcode
Immediate address (24bit) plus index register (8bit)
Indirect access: immediate 24bit address to memory, which contains the actual address
Indirect access: 24 bit address to memory, 8 bit index register added to value from memory.
The address modes outlined above would allow efficient access to arrays, structures and objects allocated at a constant address (static allocation). They would be less efficient (but still usable) for dynamically and stack-allocated objects.
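The four addressing modes can be modeled as effective-address calculations in C. This sketch rests on assumptions not stated in the answer: pointers stored in memory are 24-bit little-endian, addresses wrap modulo 2^24, and `mem` plus the `ea_*` helper names are hypothetical.

```c
#include <stdint.h>

#define MEM_SIZE  ((uint32_t)1 << 24)   /* 24-bit address space */
#define ADDR_MASK (MEM_SIZE - 1u)

static uint8_t mem[MEM_SIZE];           /* simulated memory, 16 MiB */

/* Read a 24-bit little-endian pointer stored at addr. */
static uint32_t read_ptr24(uint32_t addr)
{
    return  (uint32_t)mem[addr & ADDR_MASK]
         | ((uint32_t)mem[(addr + 1) & ADDR_MASK] << 8)
         | ((uint32_t)mem[(addr + 2) & ADDR_MASK] << 16);
}

/* Immediate 24-bit address in the opcode. */
static uint32_t ea_absolute(uint32_t imm24) { return imm24 & ADDR_MASK; }

/* Immediate address plus 8-bit index register. */
static uint32_t ea_indexed(uint32_t imm24, uint8_t x) { return (imm24 + x) & ADDR_MASK; }

/* Immediate address of a memory cell that holds the actual address. */
static uint32_t ea_indirect(uint32_t imm24) { return read_ptr24(imm24); }

/* Pointer from memory, plus 8-bit index register. */
static uint32_t ea_indirect_indexed(uint32_t imm24, uint8_t x)
{
    return (read_ptr24(imm24) + x) & ADDR_MASK;
}
```

Indexed mode fits a field or element at a constant offset in a static object; the indirect modes are what dynamically allocated objects would need, at the cost of an extra 24-bit memory read.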
You would also get some benefit from your serial design: the 24-bit + 8-bit addition usually does not need to take 24 cycles, because you can short-circuit the addition once the 8 index bits have been added and the carry is 0.
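The short-circuit can be modeled with a bit-serial adder in C. This is a sketch: `serial_add24_8` is a hypothetical name, and one loop iteration stands for one clock. After the 8 index bits are added, a zero carry means the upper address bits pass through unchanged, so the loop stops early.

```c
#include <stdint.h>

/* Bit-serial add of an 8-bit index to a 24-bit address, LSB first.
   Returns the sum; *cycles receives the number of bit-steps used. */
static uint32_t serial_add24_8(uint32_t addr, uint8_t idx, unsigned *cycles)
{
    uint32_t result = addr & 0xFFFFFFu;
    uint32_t carry = 0;
    unsigned bit;
    for (bit = 0; bit < 24; bit++) {
        if (bit >= 8 && carry == 0)
            break;                                  /* short-circuit */
        uint32_t a = (result >> bit) & 1u;          /* address bit */
        uint32_t b = (bit < 8) ? ((uint32_t)(idx >> bit) & 1u) : 0u;
        uint32_t s = a ^ b ^ carry;                 /* full-adder sum */
        carry = (a & b) | (carry & (a ^ b));        /* full-adder carry */
        result = (result & ~((uint32_t)1 << bit)) | (s << bit);
    }
    *cycles = bit;                                  /* carry out of bit 23 is dropped */
    return result;
}
```

Adding 0x05 to 0x000010 finishes in 8 steps and 0x0000FF + 0x01 in 9, while 0xFFFFFF + 0x01 ripples the carry through all 24 bits.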
Instead of mapping the IP as registers directly, you could allow changing it only through goto/branch instructions, using the same address modes as above. Jumps into dynamically computed addresses are quite rare so it makes more sense to give the whole 24-bit address directly in the opcode.
I think that if you design the CPU carefully, you can use many C++ features quite efficiently. However, do not expect that any random C++ code would run fast on such a limited CPU.

The implementation is certainly possible, but I doubt it will be usable (at least for C++ code). As was already noted, the first problem is the lack of a stack. Next, much of C++ relies heavily on dynamic memory allocation, and C++ "internal" structures are also quite big.
So, as it seems to me, it will be better, if you:
Get rid of C++ requirement (or at least, limit yourself to some subset)
Use 24 bits, not 8 bits for everything (for registers as well)
Add hardware stack

You are not going to be able to run "any" C++ code there: for example fork(), system(), etc., or anything that clearly relies on interrupts. You can get a long way, sure.
Now, do you mean any programs that can be or have been written in C++, or are you limiting yourself to the language only and not the libraries that are commonly associated with C/C++? The language itself is a much easier rule to live with.
I think the easier question/answer is: why not just try? What have you tried so far? It could be argued that x86 is an 8-bit machine, with no regard for alignment and many 8-bit instructions. The msp430 was ported to LLVM to show how easily and quickly it could be done. I would like to see that platform (a 16-bit platform with no MMU) get better support; it's not where my strengths lie, otherwise I would be doing it. It does have a stack and interrupts, sure, but you don't have to use them, and if you remove library rules, then what is left that needs an interrupt?
I would look at LLVM, but note that the documentation showing how easy it is to port is dated and wrong, and you basically have to figure it out on your own from the compiler sources. lcc has a book and is known for that, but it is not optimizing. Its sources don't compile well on modern computers; you always have to go backwards in time to use it, and any time I go near it, after an evening just trying to build it as is, I give up. vbcc is simple, clean, documented, and not unfriendly to smaller processors. Whether it supports C++, I don't remember. Of all of them, though, it is the easiest to get a compiler up and running with. Of all of them, LLVM is the most attractive and most useful when all is said and done. Don't go near GCC or even think of it; it's held together inside with duct tape and baling wire.
Have you invented your instruction set yet? Do you have a simulator and assembler yet? Look up lsasim on GitHub to find my instruction set. You can write an LLVM backend for mine as practice for yours...grin...(my vbcc backend is horrible; I need to start over)...
You have to have some idea of how the high level will be implemented but you really have to start with an instruction set and an instruction set simulator and an assembler of some sort. Then start hand converting C/C++ code into assembly for your instruction set, that should pretty quickly get you through "can I do this without a stack", etc. In this process define your calling convention, implement more C/C++ code by hand using your calling convention. THEN dig into a compiler and make a back end. I think you should consider vbcc as a stepping stone, then head for LLVM if it appears like it (the isa) will work.

What are alternatives to the Java VM?

As Oracle sues Google over the Dalvik VM, it becomes clear that you cannot implement a Java VM without a license from Oracle. (EDIT: Matthew Flaschen points out that Oracle's claims may not be valid. Anyway, we currently have a situation where Oracle threatens VM implementations.) That may be the death of open-source implementations of Java (like Apache Harmony).
I don't want to discuss the impact or the legitimacy of this lawsuit, but as a Java programmer I want to take a deeper look into the alternatives, to be prepared for every case. As I see the creation of a compiler as a minor problem, my main interest is alternative VM implementations that serve a similar purpose as the JVM.
The VM I'm looking for, should meet some conditions:
free of patent-issues
an Open-Source-implementation exists
potential for optimizations/good performance
platform independent (the VM can be ported to different platforms without bigger hurdles)
Please add some recommendations for me.

LLVM is a really good optimizing, low level virtual machine. It can support languages like C and C++, and does not have built in support for high level features like garbage collection.
VMKit is an implementation of the Java and CLI virtual machines on top of LLVM. Since it uses Java bytecode, this probably wouldn't help with the patent issues.
HLVM is another interesting high level virtual machine built on top of LLVM. It is probably different enough to avoid most well known patents, but it is mainly targeted at numerical computing and functional programming.
On the dynamically typed side, there is Parrot.
I am actually working on a compiler and VM for a language of my own design, but don't count on it ever being finished. ;-)
Keep in mind that any large piece of software will infringe on numerous patents; the important thing is how well known they are (and how much the patents' owners actively seek out infringers). Of course, the whole patent system is absurd, and we would be much better off getting rid of it.

I don't think there is any significant piece of software that is free from patent issues.
If you are an independent developer or working for a smaller company you probably won't get hit directly by the problems though. It's unlikely that big companies holding patents will go after lots of small claims - it's an expensive process and causes a lot of resentment. SCO tried something like that and it didn't work out too well for them.
I would concentrate on finding the best tool for the job without worrying too much about the patent issues, otherwise you will never get anything done.

GraalVM is a research project developed by Oracle Labs and already in production at Twitter. I'm surprised that no one has mentioned it. GraalVM is a promising extension of the Java virtual machine that supports more languages and execution modes, running applications written in JavaScript, Python, Ruby, R, JVM-based languages, and LLVM-based languages such as C and C++. The GraalVM project includes a new high-performance Java compiler, itself called Graal, which can be used in a just-in-time configuration on the HotSpot VM, or in an ahead-of-time configuration on the SubstrateVM. The main goal of this project is to improve the performance of JVM-based languages to match that of native languages. Let's sum up the novel features this project offers, with a brief explanation (according to the docs) of why you should adopt it.
Polyglot: All languages (even LLVM-based ones) share the same VM and its capabilities. Zero-overhead interoperability between programming languages allows you to write polyglot applications and select the best language for your task.
Native: Native images compiled with GraalVM ahead-of-time improve the startup time and reduce the memory footprint of JVM-based applications.
Embeddable: GraalVM can be embedded in both managed and native applications. There are existing integrations into OpenJDK, Node.js, Oracle Database, and MySQL. GraalVM removes the isolation between programming languages and enables interoperability in a shared runtime. It can run either standalone or in the context of OpenJDK, Node.js, Oracle Database, or MySQL.
Performance: Graal benchmark reports show great performance improvements in almost all of its implementations, thanks to the way GraalVM performs object allocations.
If you are still not convinced that it is a good choice and a really awesome project, see this talk by Christian Thalinger on "why Graal is a good fit for Twitter".

When implementing an interpreter, is it good or bad to piggyback off the host language's garbage collector?

Let's say you are implementing an interpreter for a GCed language in a language that is GCed.
It seems to me you'd get garbage collection for free as long as you are reasonably careful about your design.
Is this generally how it's done? Are there good reasons not to do this?

Language and runtime are two different things. They are not really related IMHO.
Therefore, if your existing runtime already offers a GC, there must be a good reason to extend the runtime with another GC. In the good old days, when memory allocations in the OS were slow and expensive, apps brought their own heap managers, which were more efficient at dealing with small chunks of data. That was one reason for adding another memory manager to an existing runtime (or OS). But if you're talking Java, .NET or similar, those should be good and efficient enough for most tasks at hand.
However, you may want to create a proper interface/API for memory and object management tasks (and others), so that your language ("guest") runtime could be implemented on top of another host runtime later on.

For an interpreter, there should be no problem with using the host GC, IMHO, particularly at first. As always, the goal should be to get something working, then make it work right, then make it fast. This is particularly true for Domain Specific Languages (DSLs), where the goal is a small language. For these, implementing a full GC would be overkill.

Embedded platform development in (!C)

I'm curious to see how popular the alternatives to C are in the embedded developer world e.g. Ada...
I've only ever used C (with a little bit of assembler), but then my targets have very limited resources. Is there a move elsewhere in this space to something else? What is winning the war in set-top boxes?
If !C, what was the underlying reason?
Compiler support for target
Trace/static analysis tools
other...
Thanks.

Forth is quite popular for embedded development.
Also, while Smalltalk is probably not popular in the embedded community, embedded development is definitely popular in the Smalltalk community.

When you say "embedded development", keep in mind that you have to consider the scale of the project.
When programming something on the scale of a microcontroller or the firmware for an ASIC, you tend to see C and assembly dominate the scene. Embedded developers tend to "specialize" in these languages since compilers for them are available for nearly every embedded target platform. If your project migrates from, say, a chip with a PowerPC core to a chip with an ARM core, you can be fairly confident that your C code will not be overly difficult to port over. Some chips do have compilers available for other languages, but typically they do not match the C compiler in terms of efficiency of the resulting binary. Since embedded systems are often low on resources, system designers want to make their code as efficient as possible (also one reason why you see a lot of assembly language code). I have seen development tools available for languages such as C++, Pascal, Basic, and others, but they are typically niche tools that are not mature enough to match the efficiency of the available C compilers. Debugging tools for these languages also tend to be harder to find than what is available for C/assembly.
You also mentioned set-top boxes. Embedded systems on this scale can pack the equivalent power of a desktop computer from 7-8 years ago. Their available RAM, storage space, and processing power allow them to run full-featured operating systems and interpreters for higher-level languages. On these more powerful systems you will still see C and assembly language being used (for driver code, if nothing else), but other languages (such as Java, Lua, Tcl, Ruby, etc.) are becoming more and more common. Using interpreted languages makes porting code from one platform to another even easier, as long as the platform has sufficient resources to handle the overhead of the language interpreter. Any low-level code that interfaces directly with hardware (drivers) will still typically use assembly or C, since high-level languages don't always have the capability to do this sort of thing. Anything running as an application on top of the embedded operating system can usually be developed and tested inside an emulator or virtual machine, and so you will see a lot of code being developed in whatever language the developer happens to be comfortable with.
TLDR version: C is popular because it is a versatile language that nearly all developers are familiar with. Assembly is popular because it allows low-level hardware access in ways that would otherwise be difficult or impossible. Interpreted/scripted languages such as Java are becoming more popular, but the resource requirements of their interpreters may be too much for some embedded systems to handle. The quality and variety of development/debugging tools available for C and assembly also make these options attractive.

Perhaps not quite the large step from C you're looking for, but C++ is also reasonably popular for embedded projects.

I haven't used it myself, but Bascom is quite popular for AVR microcontrollers. It is a Basic IDE that lets you interact with the peripherals very easily. I've met hardware people who use it successfully.

Yes. Java is becoming more popular; many processors have added instructions that pertain primarily to Java and similar languages (.NET). Also, uClinux runs on microcontrollers, so you can use practically any language on some of the larger micros.
Basic is still common, as is assembly.
You'll see Ada in certain gov't projects.
And some engineers are even putting Lua and other interpreters on their micros so their customers can extend the functionality.
But C is still dominant.
-Adam

In the early '90s I did a lot of embedded development on the 8051 using Intel PL/M-51 and the DCX51 operating system.
PL/M is a very simple language, but very powerful.
We now use C.

If you work in the smartcard space, you get to use Java Card. Yep, Java, on an 8-bit micro. It's kinda fun, actually. I get to develop in Eclipse, test (and debug!) on the PC simulator, and can be confident that it'll run the same on the card. It's just such a pity Java is a terrible language for embedded apps :)

I've used EC++ (Embedded C++) quite extensively.
Also, PICBasic has been popular with the PIC'ers for eons now.

I have used Ada in an embedded project for military avionics because of customer requirements. There are lots of Ada tools for embedded development, but most of them are very expensive. Personally, I would just use C.

There is a Pascal compiler for the 8051.

JAL

There is a group of folks working to make Lua a viable option for embedded work. They are targeting primarily 32-bit ARMs with 256K FLASH and 64K RAM or better, and seem happy with their work so far.
They are partly inspired by the classic BASIC Stamp, a BASIC interpreter running in a moderately powerful PIC with the program itself stored in a serial EEPROM device.
At work, I am still maintaining a customer's embedded system that is written in a compiled flavor of BASIC running on a Zilog Z180 CPU. It's 1980s technology all around, with most of the system still built out of 24-pin DIP packages in sockets. The compiler runs under CP/M-80 in a Z80 simulator, which itself runs in the MS-DOS simulator built into Windows. Aside from the sheer amazement that anything productive can be done this way (and that you can still buy 27C256 UV-erasable EPROMs, and that my nearly 20-year-old Data/IO PROM programmer still works), I really wish the customer could afford to move to a new hardware design so the system could be rewritten in a maintainable language.

It depends on the microcontroller. Many of them have C, but the compilers are horrible; assembler is usually easy and gives the best performance and efficiency. Ones like the MSP, AVR, and ARM have good C compilers, and for those I would, and do, use C (depending on the problem).
I would stick to C or assembler; you are wasting memory, performance, and resources using anything else.

Pascal and Modula-2 work fine too. Essentially they are equivalent to C, except for the inability to do alloca (though some implementations have that as an extension).
But the core problem will be the same with any non-C compiler: which do you prefer, a better compiler/toolchain or your language of preference?
Although I like the Wirthian languages most, I simply use C and live with the consequences, because the toolchain is better.
There have been examples in the past (Pascals, or even tightly compiled Basics), but C is mostly the norm. I never understood why.
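For anyone who hasn't run into alloca, here is a minimal C sketch of what it does. Note that alloca() is not part of ISO C; this assumes a glibc- or BSD-style <alloca.h>. The function name sum_of_squares and the MAX_STACK_ITEMS limit are hypothetical, chosen only for illustration. The point is that the scratch buffer is carved out of the current stack frame and vanishes when the function returns, with no heap and no free().

```c
#include <alloca.h>   /* non-standard header (glibc/BSD), not ISO C */
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound for this sketch; on a small MCU the real
   limit depends on how much stack the linker script reserves. */
#define MAX_STACK_ITEMS 64

/* Returns the sum of i*i for i = 1..n, using a scratch buffer allocated
   in the current stack frame with alloca(). The buffer is released
   automatically when the function returns; there is no free(). */
long sum_of_squares(size_t n)
{
    /* alloca() does not report stack exhaustion, so bound n up front. */
    assert(n <= MAX_STACK_ITEMS);

    long *scratch = alloca(n * sizeof *scratch);
    long total = 0;

    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i + 1) * (long)(i + 1);
    for (size_t i = 0; i < n; i++)
        total += scratch[i];

    return total;   /* scratch is gone once we leave this frame */
}
```

On a toolchain without alloca, a C99 variable-length array (`long scratch[n];`) gives the same stack-frame lifetime. Either way nothing warns you when the stack runs out, which is why the sketch checks n against a fixed limit first.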

I worked on a device that ran some incredibly old version of Python (1.4 or something). There was no way to debug it (other than printing debug messages), so when your code hit an exception everything would just stop and you scratched your head for an hour. Whenever you made a change and upgraded the code it was running, it took about 10 minutes to interpret and compile it.
Needless to say we scrapped that and replaced the microcontroller with one that ran C.

See this related question:
What languages are used for real-time systems programming.
In response to your "why" question: from the standpoint of government/military acquisition, there is a perception that Java (language, platform, etc.) is the lingua franca these days and that economies of scale in the language will reduce acquisition and maintenance costs. There's also a hope that one can train a competent Java programmer to be a reasonable RT/embedded programmer in Java faster than if they were required to learn a new language. This rationale is suspect, in my opinion, but it does answer the "why" question.

If you include the iPhone as an embedded platform, then Objective-C.

Considering how many times I've had a Java out-of-memory exception on my phone (most of the time I do anything remotely interesting), I'd run away from Java like a bat out of a hot place.
I've heard that Erlang was designed for telephone switching equipment. I think Lisp is a good fit for remote device support, if the device can handle the run-time.

A lot of home-brew users and small companies needing a cheap solution have found that the Tiny Tiger and the BASIC Stamp (both programmed in BASIC) meet their needs.
