Where to learn about bytecode generation? - virtual-machine

I'm trying to learn about generating code for a virtual machine. Where can I find out more about this, any useful books that are about this topic? This is probably a bad question for Stack Overflow, but I'm really confused and I've searched everywhere for a good resource.

Off-topic, however:
read Queinnec's Lisp in Small Pieces
study the source code of lua (5.2 or newer), of nekoVM, of Ocaml bytecode interpreter (file byterun/interp.c), and of parrot

Related

What is a good VM for developing a hobby language?

I'm thinking about writing my own little language.
I found a few options, but feel free to suggest more.
JVM
Parrot
OSA
A lot of languages are using the JVM, but unless you write a Java-ish language, all the power the stdlib gives you is going to feel ugly; It's not very good at dynamic stuff either.
Parrot seems a good VM for developing languages, but it has a little abandoned/unfinished/hobby project smell to it.
OSA is what powers Applescript, not a particularly well known VM, but I use Mac, and it offers good system integration.
CLR+Mac doesn't seem a good combination...
My language is going to be an object orientated functional concurrent dataflow language with strong typing and a mix of Python and Lisp syntax.
Sounds good, eh?
[edit]
I accepted Python for now, but I'd like to hear more about OSA and Parrot.
One approach I've played with is to use the Python ast module to build an abstract syntax tree representing the code to run. The Python compile function can compile an AST into Python bytecode, which exec can then run. This is a bit higher level than directly generating bytecode, but you will have to deal with some quirks of the Python language (for example, the fundamental difference between statements and expressions).
In doing this I've also written a "deparse" module that attempts to convert an AST back to equivalent Python source code, just for debugging. You can find code in the psil repository if you're interested.
Have a look at LLVM. It's not a pure VM as such, more a framework with it's own IR that allows you to build high level VMs. Has nice stuff like static code analysis and JIT support
Lua has a small, well-written and fast VM
Python VM - you can really attach a new language to it if you want. Or write (use?) something like tinypy which is a small and simple implementation of the Python VM.
Both options above have access to useful standard libraries that will save you work, and are coded in relatively clean and modular C, so they shouldn't be hard to connect to.
That said, I disagree that Parrot is abandoned/hobby. It's quite mature, and has some very strong developers working on it. Furthermore, it's specifically a VM designed to be targeted by multiple dynamic languages. Thus, is was designed with flexibility in mind.
Have you considered Pypy? From what I've read, in addition to being a Python JIT Compiler, it also has the capability to handle other languages. For example there is a tutorial which explains how to create a Brainfuck JIT compiler using Pypy.

GNUstep NSString.m file

I'm using GNUstep to begin with learning Objective-C.
I could find the header files for all, but don't know where to find its implementation files.
I was thinking, with that I can understand the whole programming style and many more.
I am working without mac , so if some body knows about any good tutorials , that i can use to identify structure of every Classes.
For instance, i have to parse an xml file, just to learn,
but don't know where to start.
Without IDE its hard to find out the sequence, and I don't have got access to any tutorials that best explains this, ( all that i get is in accordance with i-Phone and Cocoa. )
I'm concentrating on console programs, so that I can be thorough with the syntax and language.
Pls help me.
http://gnustep.org/ is the best resource for GNUstep related information, including source and documentation.
GNUStep has some tutorials and definitely the source code available.
You will find that there are small bits and pieces where Cocoa has moved on so GNUStep will not recognize new methods and things like properties or any new objective-2.0 stuff and so on.
Where ever you have the source installed, you can find NSString.m here
/path_to_my_src/gnustep/modules/core/base/Source/NSString.m

Programming features missing in C++ and Java

What are the programming features that are missing in C++ and Java ?
For eg. You can't do recursive programming in QBasic ? You can't dynamically allocate memory in QBasic.
What would be the good to have features in C++, Java.
I think Lisp Programmers will be able to add a few.
I miss lambda expressions.
This answer deals only with C++
Things I miss from the syntax, or the standard library:
RegExp as part of the standard library
Threads as part of the standard library
Pointer to member methods (not objects!)
Properties would be nice (I have seen codes that emulate this via C++ preprocessor... note an nice looking code).
Some lower level networking API (sockets!), and higher level API (give me this file from this ftp, submit "this" to this site via POST).
This is the list of things I would like to see, but I assume other people will disagree with me.
Memory garbage collector is nice.
A n interface for a GUI toolkit - let MSVC map it to win32, and on Linux... (good question!)
A stable ABI. In C it's a standard - but on C++ we are still missing a few decades. I want also stable ABI between compilers - I want to compile one library in MinGW, the other with CL and all should work.
This is the list of things I want to see, but I know they will not get away:
Compatibility with C. Really, it's a myth right now. using namespace std killed it.
Include, headers. Most of the information is already available in the DLL/so/a/"library", do we really need to keep this bad decision from 30 years ago? If needed the compilers should keep information in the binaries.
The need for Makefiles - the compiler should be smart enough to know what to do with this code, from the code itself. Pascal is doing it quite good. I think also D.
(I might be wrong, please correct me) The official standard openly and freely available for viewing. Why should I pay for the official papers? Do I need to do it for HTTP? UTF8? Unicode?
I think this is a very subjective question. From a theoretical point of view there's nothing "missing" in Java because you can do everything you want to from the perspective of the outcome as an application.
As with QBasic - recursion may not be possible but that doesn't prevent you from changing your recursive algorithm to an iterative algorithm. Programming language theory tells us that you can do this with every recursive problem. So there's also nothing missing here.
I think what you mean are features that are "nice to have" - and here everyone has to decide for himself. I'd even say there are features in the language which would have been "nice not to have" such as static imports - but again this is my subjective opinion...

F/OSS for the PIC24?

I'm learning embedded programming with the PIC24, and I'm looking for something "real-world" to dig into to help me learn. Are there any free software projects that might be targeting to the PIC? Anything that I could help port, or a niche I could try to fill?
Quite old (2005) but may help you to start:
http://www.gnupic.org/
For example, gpsim at http://www.dattalo.com/gnupic/gpsim.html
If you are looking for software projects, look up gputils - is the open source assembler and linker, and sdcc - an open source c/c++ compiler (lacks a PIC24 port).
If you are looking for embedded projects, there is nothing specific that comes to mind. However, do visit the piclist to pick up code snippets and learn from examples. Some of the best PIC coders contribute there.
If you are looking for hardware projects, there are verilog PIC16/18 cores on opencores but no PIC24 cores yet.
So, there's plenty to suit all tastes.

Creating your own language

If I were looking to create my own language are there any tools that would help me along? I have heard of yacc but I'm wondering how I would implement features that I want in the language.
Closely related questions (all taken by searching on [compiler] on stackoverflow):
Learning Resources on Parsers, Interpreters, and Compilers
Learning to write a compiler
Constructing a simple interpreter
...
And similar topics (from the same search):
Bootstrapping a language
How much of the compiler should we know?
Writing a compiler in its own language
...
Edit: I know the stackoverflow related question search isn't what we'd like it to be, but
did we really need the nth iteration of this topic? Meh!
The first tool I would recommend is the Dragon Book. That is the reference for building compilers. Designing a language is no easy task, implementing it is even more difficult. The dragon book helps there. The book even reference to the standard unix tools lex and yacc. The gnu equivalent tools are called flex and bison. They both generate lexer and parser. There exist also more modern tools for generating lexer and parser, e.g. for java there are ANTLR (I also remember javacc and CUP, but I used myself only ANTLR). The fact that ANTLR combines parser and lexer and that eclipse plugin is availabe make it very comfortable to use. But to compare them, the type of parser you need, and know for what you need them, you should read the Dragon book. There are also other things you have to consider, like runtime environment, programming paradigm, ....
If you have already certain design ideas and need help for a certain step or detail the anwsers could be more helpful.
ANTLR is a very nice parser generator written in Java. There's a terrific book available, too.
I like Flex (Fast Lex) [Lexical scanner]
and Bison (A Hairy Yacc) [Yet another compiler compiler]
Both are free and available on all *NIX installations. For Windows just install cygwin.
But I old school.
By using these tools you can also find the lex rules and yacc gramers for a lot of popular languages on the internet. Thus providing you with a quick way to get up and running and then you can customize the grammers as you go.
Example: Arithmetic expression handling [order of precedence etc is a done to death problem] you can quickly get the grammer for this from the web.
An alternative to think about is to write a front-end extension to GCC.
Non Trivial but if you want a compiled language it saves a lot of work in the code generation section (you will still need to know love and understand flex/bison).
I never finished the complete language, I had used rply and llvmlite implements a simple foxbase language, in https://github.com/acekingke/foxbase_compiler
so if you want use python, rply or llvmlite is helpful.
if you want use golang, goyacc maybe useful. But you should write a lexical analyzer by hard coding by hand. Or you can use https://github.com/acekingke/lexergo to simplify it.