Methodologies for designing a simple programming language - language-design

In my ongoing effort to quench my undying thirst for more programming knowledge I have come up with the idea of attempting to write a (at least for now) simple programming language that compiles into bytecode. The problem is I don't know the first thing about language design. Does anyone have any advice on a methodology to build a parser and what the basic features every language should have? What reading would you recommend for language design? How high level should I be shooting for? Is it unrealistic to hope to be able to include a feature to allow one to inline bytecode in a way similar to gcc allowing inline assembler? Seeing I primarily code in C and Java which would be better for compiler writing?

There are so many ways...
You could look into stack languages and Forth. It's not very useful when it comes to designing other languages, but it's something that can be done very quickly.
You could look into functional languages. Most of them are based on a few simple concepts, and have simple parsing. And, yet, they are very powerful.
And, then, the traditional languages. They are the hardest. You'll need to learn about lexical analysers, parsers, LALR grammars, LL grammars, EBNF and regular languages just to get past the parsing.
Targeting a bytecode is not just a good idea – doing otherwise is just insane, and mostly useless, in a learning exercise.
Do yourself a favour, and look up books and tutorials about compilers.
Either C or Java will do. Java probably has an advantage, as object orientation is a good match for this type of task. My personal recommendation is Scala. It's a good language to do this type of thing, and it will teach you interesting things about language design along the way.

You might want to read a book on compilers first.
For really understanding what's going on, you'll likely want to write your code in C.
Java wouldn't be a bad choice if you wanted to write an interpreted language, such as Jython. But since it sounds like you want to compile down to machine code, it might be easier in C.

I recommend reading the following books:
ANTLR
Language Design Patterns
This will give you tools and techniques for creating parsers, lexers, and compilers for custom languages.

Related

Powerful static OO orientated multi paradigm alternatives to Scala

I wondered if there are any alternatives to Scala that attempt to offer a more powerful type system and syntax. I'm aware of functional alternatives such as Haskell, but are there any are really pushing the static OO side of things, for example in such areas, where Scala is lacking such as virtual classes, full multiple inheritance and more flexible constructor syntax, static contract checking, more powerful path dependence, MyTypes, friend modifier, first class imports, or maybe some esoteric typing tool, I haven't even thought of / heard of.
OO and to lesser extent Static do not seem to be fashionable these days. However, it strikes me that the power of modern computers enable the creation of static compilers way beyond the dreams of compiler writers in previous decades.
I presume as I haven't come across anything, there's no alternative that I'm likely to want to knock out production code in any time soon. But even if they're still very much academic languages, I'd still like to keep an eye on them and maybe play around with them. I'm particularly looking for what might be called left field alternatives to Scala. So not Ceylon or Kotlin that are trying to prioritise simplicity over power. Eiffel doesn't seem to be going anywhere these days. I've come across gBeta and Ceasar but haven't been able to work out if there are any areas where they lose out to Scala. Are there any other possibilities?
In a word, no. There are no popular OO alternatives that come anywhere close to Scala's type system. Given your desired features, I'd suggest you take a hard look at C++, D, and Go.
If you're feeling adventurous and you aren't completely attached to the idea of OO, then take a look at Typed Racket. Coq, Idris, and Agda offer dependently typed goodies that are quite intriguing. Or just turn to popular FP languages like Haskell, F#, and OCaml.
Is there any particular reason you want an OO language? Again, Scala is probably as good as it gets right now if you want a cool type system and OO.
D (specifically, D version 2, aka D2) is pretty much exactly the language you're looking for.
There are videos on Youtube introducing D, IDEs/plugins like Visual D (plugin for Visual C++), Mono-D (plugin for MonoDevelop), and DDT (plugin for Eclipse).
The main site at dlang.org has a full library reference, language syntax, tutorials, forums for beginners/advanced discussions, etc.
For a GUI, look at GtkD. I believe you need to use the DMD compiler for this, currently.
dsource.org and github have many other third-party libraries/code, but you'll find that the core library includes all the basics, like json parsers, XML parsers, etc., and the core language has many things you need built-in, like hashmaps, dynamic arrays, design by contract, statically evaluated templates/expressions, etc.
With D2, you can link directly to C and C++, and bind to Python/LUA code, etc. It's capable as a systems language (you CAN write an OS with it, if you want), but also works well as a modern, high-level, elegant, rapid application language with support for things like concurrent, safe code.
All in all, it's very impressive. Sad that it's not more popular, given that Scala is a slow memory hog by comparison ;)

Creating a simple Domain Specific Language

I am curious to learn about creating a domain specific language. For now the domain is quite basic, just have some variables and run some loops, if statements.
Edit :The language will be Non-English based with a very simple syntax .
I am thinking of targeting the Java Virtual Machine, ie compile to Java byte code.
Currently I know how to write some simple grammars using ANTLR.
I know that ANTLR creates a lexer and parser but how do I go forward from here?
about semantic analysis: does it have to be manually written or are there some tools to create it?
how can the output from the lexer and parser be converted to Java byte code?
I know that there are libraries like ASM or BCEL but what is the exact procedure?
are there any frameworks for doing this? And if there is, what is the simplest one?
You should try Xtext, an Eclipse-based DSL toolkit. Version 2 is quite powerful and stable. From its home page you have plenty of resources to get you started, including some video tutorials. Because the Eclipse ecosystem runs around Java, it seems the best choice for you.
You can also try MPS, but this is a projectional editor, and beginners may find it more difficult. It is nevertheless not less powerful than Xtext.
If your goal is to learn as much as possible about compilers, then indeed you have to go the hard way - write an ad hoc parser (no antlr and alike), write your own semantic passes and your own code generation.
Otherwise, you'd better extend an existing extensible language with your DSL, reusing its parser, its semantics and its code generation functionality. For example, you can easily implement an almost arbitrary complex DSL on top of Clojure macros (and Clojure itself is then translated into JVM, you'll get it for free).
A DSL with simple syntax may or may not mean simple semantics.
Simple semantics may or may not mean easy translation to a target language; such translations are "technically easy" only if the DSL and the target languate share a lot of common data types and execution models. (Constraint systems have simple semantics, but translating them to Fortran is really hard!). (You gotta wonder: if translating your DSL is easy, why do you have it?)
If you want to build a DSL (in your case you stick with easy because you are learning), you want DSL compiler infrastructure that has whatever you need in it, including support for difficult translations. "What is needed" to handle translating all DSLs to all possible target languages is clearly an impossibly large set of machinery.
However, there is a lot which is clear that can be helpful:
Strong parsing machinery (who wants to diddle with grammars whose structure is forced
by the weakness of the parsing machinery? (If you don't know what this is, go read about LL(1) grammmars as an example).
Automatic construction of a representation (e.g, an abstract syntax tree) of the parsed DSL
Ability to access/modify/build new ASTs
Ability to capture information about symbols and their meaning (symbol tables)
Ability to build analyses of the AST for the DSL, to support translations that require
informatoin from "far away" in the tree, to influence the translation at a particular point in the tree
Ability to reogranize the AST easily to achieve local optimizations
Ability to consturct/analysis control and dataflow information if the DSL has some procedural aspects, and the code generation requires deep reasoning or optimization
Most of the tools available for "building DSL generators" provide some kind of parsing, perhaps tree building, and then leave you to fill in all the rest. This puts you in the position of having a small, clean DSL but taking forever to implement it. That's not good. You really want all that infrastructure.
Our DMS Software Reengineering Toolkit has all the infrastructure sketched above and more. (It clearly doesn't, and can't have the moon). You can see a complete, all-in-one-"page", simple DSL example that exercises some ineresting parts of this machinery.

Finding language designers and programmers

I'd like to create a new and open sourced language.
Since it's really rare to find programmers that actually dealt with compiler theory I need some advice.
How would you make a person interested in your open source project?
How do you bring him to a position where he wants to contribute?
Is there a special place where I can find those pepole (except sourceforge.net)?
It will be very hard to get people interested in your project. History has shown that 99% (at a conservative estimate) of new programming languages are only ever used by their designer. So if you do it, do it for love and don't expect much if any outside interest.
You may want to spend some time lurking on sites like, say, Lambda The Ultimate and reading up on theory of programming languages, compiler design, etc. I've heard that Essentials of Programming Languages by Friedman et al is a good intro text for the former, while you can't go wrong with the "Dragon Book" for the latter (whose official title escapes me at the moment... by Aho et al though).
take a look at Haskell (and its supporting community)
http://www.haskell.org/
I've used Haskell to model a small OO programming language in grad school and it seemed to be a common tool used in the Academia for designing programming language
BTW, this doesn't answer your questions but these two Microsoft / Codeplx projects both sparked my interest as possible starting points for creating a new language:
Dynamic Language Runtime
Common Compiler Infrastructure

A dynamic language to learn for curiosity's sake

This is sort of a "best language" question, but hopefully with enough of a twist to make it worthwhile.
As someone who only uses C and C#, I'm curious to learn a dynamic language to expand my knowledge. I don't know which to choose.
The thing is that my motivation isn't necessarily to create any "real world" projects, or projects that integrate with other systems, but rather just to learn.
With that said, for someone only familiar languages such as those I mentioned, and possibly ignoring obscurity and lack of support..
Which dynamic language would be the biggest departure?
Which would introduce the most novel concepts?
Which is the exemplar of dynamic languages?
I would suggest learning IronPython. As a language it will still be a significant departure for you, but you'll be able to use everything in the .NET framework that you're familiar with. (I usually think it's a good idea to try to vary just one aspect of development radically at a time... work your way through the different aspects one at a time, and you'll always be comfortable with part of what you're doing, which will help you learn the new part more quickly, IMO.)
Also, with C# 4 you'll be able to call into IronPython from your C# code, including using its dynamic features that way.
The functional languages (LISP, Scheme, etc.) are always worth checking out. They may be some of the bigger departures.
JavaScript is a great stepping stone to go from the C arena to the functional arena. From there you can mess around with JQuery, which, although not a language, forces you to do things in non procedural ways.
Another often overlooked language is SQL. It's obviously a niche language, and as Josh points out, not really 'dynamic', but acquiring a deep understanding of they way set based languages work can really progress a coder.
Careful, if you 'Learn' to much you may end up frustrated with the older languages.
Which dynamic language would be the biggest departure?
Which would introduce the most novel concepts?
I guess that would include Scheme, Erlang and Oz
Which is the exemplar of dynamic languages?
I'd say Ruby and Python
I would suggest any Lisp dialect or Smalltalk. These are dynamic and had heavy influence on the design of other, more mainstream languages.
They also include interesting concepts that are not found in other languages.
Another interesting dynamic language to have a look at is Lua.
It is hard to say, it is definitely a matter of personal taste in a lot of ways. I like learning Python but I am sure that you could learn just as many good things from Ruby about a dynamically typed language.
If you are used to C and C# then any dynamically typed language is going to be a departure. So I say you should use Python because that is what I like, and hopefully you will like it too. If you start using it and you hate it then try something else (like Ruby, Perl, PHP, etc.).
I would say that Lisp fits most, if not all, of your criteria. It's definitely a big departure from C/C++ and C#. It has got alot of novel concepts, and many would argue that it's hard to find a more dynamic language.
Barring Lisp, I myself would go for Ruby.
I'm going to have to vote for Common Lisp here. It is a highly dynamic language that can be adapted to just about anything. You get not only functional programming, but also OO, and even procedural if you so desire. And macros in Lisp are very interesting to study, since to my knowledge no other language has its equivalent.
Plus, developing in a functional style tends to help development in other languages as well. For example, I've noticed that I do OO primarily with immutable objects, thanks to concepts influenced by Lisp and Scheme. And with this, I've noticed an improvement in the stability and maintainability of my OO apps. Just my two cents.
I've been a C++ and C# developer for a long time, and recently started experimenting and learning other languages. I played with Ruby for little while and like it, but it wasn't what I wanted.
I ended up choosing Erlang. After reading about Erlang, I've decided that I really wanted to learn it. I'm not learning Erlang with any hopes of getting a job writing Erlang code. I'm learning Erlang only to become a better developer.
I really do like this language so far. It's only been about a month, and the syntax still gets me sometimes, but I can really see the power of the pattern matching and get excited to write it again. I struggled with the concept of everything being non-mutable at first. But this was mostly because I've "grown up" on C# and C++. C# is a great language and has some amazing tools, but you really have fun with some other languages, particularly something like Erlang. Just don't expect to land a job as a full time Erlang developer. (At least not yet).
For anyone curious, my hobby project is multiple player iPhone app connecting to an Erlang server. For a Windows developer, this has been a major change. But it has renewed my passion for programming, which really was my goal.
If you really want to go crazy but want things to at least be slightly familiar (I know that sounds like a contradiction but it's true), look at F#. It's a type-inferred language but it supports a lot of dynamic type properties. It's a functional language built on top of the CLR so you get full use of the .Net object system which is cool. Because it's a functional language, there are enough novel concepts to really work your brain.
If you really want to go for "biggest departure", Clojure might be of interest. It's a Lisp dialect built on the JVM. It's getting some pretty serious attention both in the Java and Lisp world. It might suit your purposes.

OOP vs Functional Programming vs Procedural [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
What are the differences between these programming paradigms, and are they better suited to particular problems or do any use-cases favour one over the others?
Architecture examples appreciated!
All of them are good in their own ways - They're simply different approaches to the same problems.
In a purely procedural style, data tends to be highly decoupled from the functions that operate on it.
In an object oriented style, data tends to carry with it a collection of functions.
In a functional style, data and functions tend toward having more in common with each other (as in Lisp and Scheme) while offering more flexibility in terms of how functions are actually used. Algorithms tend also to be defined in terms of recursion and composition rather than loops and iteration.
Of course, the language itself only influences which style is preferred. Even in a pure-functional language like Haskell, you can write in a procedural style (though that is highly discouraged), and even in a procedural language like C, you can program in an object-oriented style (such as in the GTK+ and EFL APIs).
To be clear, the "advantage" of each paradigm is simply in the modeling of your algorithms and data structures. If, for example, your algorithm involves lists and trees, a functional algorithm may be the most sensible. Or, if, for example, your data is highly structured, it may make more sense to compose it as objects if that is the native paradigm of your language - or, it could just as easily be written as a functional abstraction of monads, which is the native paradigm of languages like Haskell or ML.
The choice of which you use is simply what makes more sense for your project and the abstractions your language supports.
I think the available libraries, tools, examples, and communities completely trumps the paradigm these days. For example, ML (or whatever) might be the ultimate all-purpose programming language but if you can't get any good libraries for what you are doing you're screwed.
For example, if you're making a video game, there are more good code examples and SDKs in C++, so you're probably better off with that. For a small web application, there are some great Python, PHP, and Ruby frameworks that'll get you off and running very quickly. Java is a great choice for larger projects because of the compile-time checking and enterprise libraries and platforms.
It used to be the case that the standard libraries for different languages were pretty small and easily replicated - C, C++, Assembler, ML, LISP, etc.. came with the basics, but tended to chicken out when it came to standardizing on things like network communications, encryption, graphics, data file formats (including XML), even basic data structures like balanced trees and hashtables were left out!
Modern languages like Python, PHP, Ruby, and Java now come with a far more decent standard library and have many good third party libraries you can easily use, thanks in great part to their adoption of namespaces to keep libraries from colliding with one another, and garbage collection to standardize the memory management schemes of the libraries.
These paradigms don't have to be mutually exclusive. If you look at python, it supports functions and classes, but at the same time, everything is an object, including functions. You can mix and match functional/oop/procedural style all in one piece of code.
What I mean is, in functional languages (at least in Haskell, the only one I studied) there are no statements! functions are only allowed one expression inside them!! BUT, functions are first-class citizens, you can pass them around as parameters, along with a bunch of other abilities. They can do powerful things with few lines of code.
While in a procedural language like C, the only way you can pass functions around is by using function pointers, and that alone doesn't enable many powerful tasks.
In python, a function is a first-class citizen, but it can contain arbitrary number of statements. So you can have a function that contains procedural code, but you can pass it around just like functional languages.
Same goes for OOP. A language like Java doesn't allow you to write procedures/functions outside of a class. The only way to pass a function around is to wrap it in an object that implements that function, and then pass that object around.
In Python, you don't have this restriction.
For GUI I'd say that the Object-Oriented Paradigma is very well suited. The Window is an Object, the Textboxes are Objects, and the Okay-Button is one too. On the other Hand stuff like String Processing can be done with much less overhead and therefore more straightforward with simple procedural paradigma.
I don't think it is a question of the language neither. You can write functional, procedural or object-oriented in almost any popular language, although it might be some additional effort in some.
In order to answer your question, we need two elements:
Understanding of the characteristics of different architecture styles/patterns.
Understanding of the characteristics of different programming paradigms.
A list of software architecture styles/pattern is shown on the software architecture article on Wikipeida. And you can research on them easily on the web.
In short and general, Procedural is good for a model that follows a procedure, OOP is good for design, and Functional is good for high level programming.
I think you should try reading the history on each paradigm and see why people create it and you can understand them easily.
After understanding them both, you can link the items of architecture styles/patterns to programming paradigms.
I think that they are often not "versus", but you can combine them. I also think that oftentimes, the words you mention are just buzzwords. There are few people who actually know what "object-oriented" means, even if they are the fiercest evangelists of it.
One of my friends is writing a graphics app using NVIDIA CUDA. Application fits in very nicely with OOP paradigm and the problem can be decomposed into modules neatly. However, to use CUDA you need to use C, which doesn't support inheritance. Therefore, you need to be clever.
a) You devise a clever system which will emulate inheritance to a certain extent. It can be done!
i) You can use a hook system, which expects every child C of parent P to have a certain override for function F. You can make children register their overrides, which will be stored and called when required.
ii) You can use struct memory alignment feature to cast children into parents.
This can be neat but it's not easy to come up with future-proof, reliable solution. You will spend lots of time designing the system and there is no guarantee that you won't run into problems half-way through the project. Implementing multiple inheritance is even harder, if not almost impossible.
b) You can use consistent naming policy and use divide and conquer approach to create a program. It won't have any inheritance but because your functions are small, easy-to-understand and consistently formatted you don't need it. The amount of code you need to write goes up, it's very hard to stay focused and not succumb to easy solutions (hacks). However, this ninja way of coding is the C way of coding. Staying in balance between low-level freedom and writing good code. Good way to achieve this is to write prototypes using a functional language. For example, Haskell is extremely good for prototyping algorithms.
I tend towards approach b. I wrote a possible solution using approach a, and I will be honest, it felt very unnatural using that code.