Creating your own language - language-design

If I were looking to create my own language are there any tools that would help me along? I have heard of yacc but I'm wondering how I would implement features that I want in the language.

Closely related questions (all taken by searching on [compiler] on stackoverflow):
Learning Resources on Parsers, Interpreters, and Compilers
Learning to write a compiler
Constructing a simple interpreter
...
And similar topics (from the same search):
Bootstrapping a language
How much of the compiler should we know?
Writing a compiler in its own language
...
Edit: I know the stackoverflow related question search isn't what we'd like it to be, but
did we really need the nth iteration of this topic? Meh!

The first tool I would recommend is the Dragon Book. That is the reference for building compilers. Designing a language is no easy task, implementing it is even more difficult. The dragon book helps there. The book even reference to the standard unix tools lex and yacc. The gnu equivalent tools are called flex and bison. They both generate lexer and parser. There exist also more modern tools for generating lexer and parser, e.g. for java there are ANTLR (I also remember javacc and CUP, but I used myself only ANTLR). The fact that ANTLR combines parser and lexer and that eclipse plugin is availabe make it very comfortable to use. But to compare them, the type of parser you need, and know for what you need them, you should read the Dragon book. There are also other things you have to consider, like runtime environment, programming paradigm, ....
If you have already certain design ideas and need help for a certain step or detail the anwsers could be more helpful.

ANTLR is a very nice parser generator written in Java. There's a terrific book available, too.

I like Flex (Fast Lex) [Lexical scanner]
and Bison (A Hairy Yacc) [Yet another compiler compiler]
Both are free and available on all *NIX installations. For Windows just install cygwin.
But I old school.
By using these tools you can also find the lex rules and yacc gramers for a lot of popular languages on the internet. Thus providing you with a quick way to get up and running and then you can customize the grammers as you go.
Example: Arithmetic expression handling [order of precedence etc is a done to death problem] you can quickly get the grammer for this from the web.
An alternative to think about is to write a front-end extension to GCC.
Non Trivial but if you want a compiled language it saves a lot of work in the code generation section (you will still need to know love and understand flex/bison).

I never finished the complete language, I had used rply and llvmlite implements a simple foxbase language, in https://github.com/acekingke/foxbase_compiler
so if you want use python, rply or llvmlite is helpful.
if you want use golang, goyacc maybe useful. But you should write a lexical analyzer by hard coding by hand. Or you can use https://github.com/acekingke/lexergo to simplify it.

Related

Are there any text-editors/IDEs that support languages generically?

I'm looking for an editor/IDE that can provide features that are nice to have while coding (example: being able to click-through to function definitions) for languages that it is not specifically built for. By these, I have in mind languages designed for a very specific purpose and often only used by a small community. In other words, ones that would not have native support in most software.
I realize this would require a fair bit of fairy dust but I don't think it's out of the scope of what's possible. Basically, the editor would have to be smart enough to recognize the commonly used syntax and semantics that many declarative languages have in common. It's quite possible this would require some amount of configuration.
Does something like this exist? If not, what challenges do you think there would be in creating it?
If you need only the feature to jump of to the definition of a specific function or class, then VIM (and many other editors, like Emacs, Epsilon and JOE) can read the jump location from the ctags file. You just have to write a ctags file generator for your custom language.
For programmable editors (like VIM, Emacs, Epsilon, Eclipse and gedit), you can write your own plugin for your custom language, but it may quickly become time-consuming and a maintenance nightmare, because new versions of editors tend to change the plugin interface.
Please note that adding support for syntax highlighting is usually much easier than adding ctags-like support for symbol lookups. More advanced features, like refactoring and context-sensitive symbol completion (like Ctrl-Space and Tab in modern IDEs) are even harder to implement.
GNU Emacs has a pretty good infrastructure for this sort of thing. Until recently Haskell was a relatively unknown language used primarily by researchers. Nevertheless, in a few thousand lines of Emacs Lisp, we have
Syntax highlighting with colors
Automatic indentation
Package support
Automatic highlighting of type and other information when placing the cursor over library functions
Meta-dot on an identifier to jump to its definition (through the standard emacs tags mechanism)
The nice thing about Emacs is that (a) there are many models to follow, and (b) you can build up the environment gradually, starting with those aspects that are most important to you.
I'm suprised no one has mentioned Notepad++ yet:
http://notepad-plus-plus.org/
It offers syntax support for a great many languages and offers the user to add new languages, and an active community that adds many languages that are not included out-of-the-box.
Most good IDE's are language agnostic and supports several if not many programming languages. If you are talking about DSL's, eclipse has a solution that seems pretty awsome - Xtext
EditPadPro comes with a set of tools that allow you to build your own syntax highlighting, code folding and file navigation schemes, based on a very powerful regex syntax. So if your language is not among the many that have already been provided out-of-the-box or can be downloaded off the website, you can roll your own quite easily (and share it with the community).
Visual Studio is designed to allow for this, but it requires the language to add explicit support. For example, Delphi Prism will install into Visual Studio, and provide full language support.
This is far above and beyond "configuration", however, and requires quite a bit of custom development to support.
SciTE and Scintilla offer a generic editor/platform for different languages. The library contains several parsers that work with many programming languages and you can reuse one of these for your own language to add formatting and syntax highlighting.
They don't offer advanced features like click-throughs, but you could build it on top of the library.
Visual Studio and Eclipse also support language plug-ins.
Zeus is a language neutral IDE for the Windows platform and it provides this go to definition/declaration functionality for any language supported by ctags.
To make it work you just create a Zeus project/workspace and then add the files to this workspace.

How to create an Eclipse editor plugin with syntax checking and coloring as fast as possible?

I'm working on a project that requires me to create a series of editors for languages that are quite different. The syntaxes are defined by us.
I'm looking for a solution for this.
Is there a shortcut to take in this problem?
You could use XText:
a framework for development of textual domain specific languages (DSLs).
Just describe your very own DSL using Xtext's simple EBNF grammar language and the generator will create a parser, an AST-meta model (implemented in EMF) as well as a full-featured Eclipse text editor from that.
Alternatives to XText are Rascal or Spoofax, both less popular than XText but interesting for they support more general context-free grammars, among other things. Nice to check out.
If you are looking for a more low level, programmable solution, then Eclipse's IDE Meta-tooling platform is a good choice (IMP).
What IMP gives you is API to connect your existing parsers to Eclipse without much hassle. You need to implement an IParseController interface, to call your parser and ITokenIterator to produce tokens and some other interface to assign fonts to each kind of token.
The aforementioned Rascal and Spoofax are both build on top of IMP.
Not mentioned is DLTK (proposed also in Tutorial regarding the development of a custom Eclipse editor)
There are Ruby, bash that are implemented with it.

How to create a compiler in vb.net

Before answering this question, understand that I am not asking how to create my own programming language, I am asking how, using vb.net code, I can create a compiler for a language like vb.net itself. Essentially, the user inputs code, they get a .exe. By NO MEANS do I want to write my own language, as it seems other compiler related questions on here have asked. I also do not want to use the vb.net compiler itself, nor do I wish to duplicate the IDE.
The exact purpose of what I wish to do is rather hard to explain, but all I need is a nudge in the right direction for writing a compiler (from scratch if possible) which can simply take input and create a .exe. I have opened .exe files as plain text before (my own programs) to see if I could derive some meaning from what I assumed would be human readable text, yet I was obviously sorely disappointed to see the random ascii, though it is understandable why this is all I found.
I know that a .exe file is simply lines of code, being parsed by the computer it is on, but my question here really boils down to this: What code makes up a .exe? How could I go about making one in a plain text editor if I wanted to? (No, I do not want to do that, but if I understand the process my goals will be much easier to achieve.) What makes an executable file an executable file? Where does the logic of the code fit in?
This is intended to be a programming question as opposed to a computer question, which is why I did not post it on SuperUser. I know plenty of information about the System.IO namespace, so I know how to create a file and write to it, I simply do not know what exactly I would be placing inside this file to get it to work as an executable file.
I am sorry if this question is "confusing", "dumb", or "obvious", but I have not been able to find any information regarding the actual contents of an executable file anywhere.
One of my google searches
Something that looked promising
EDIT: The second link here, while it looked good, was an utter fail. I am not going to waste hours of my time hitting keys and recording the results. "Use the "Alt" and the 3-digit combinations to create symbols that do not appear on the keyboard but that you need in the program." (step 4) How the heck do I know what symbols I need???
Thank you very much for your help, and my apologies if this question is a nooby or "bad" one.
To sum this up simply: I want to create a program in vb.net that can compile code in a particular language to a single executable file. What methods exist that can allow me to do this, and if there are none, how can I go about writing my own from scratch?
What you're asking is a pretty complex question. Sure, at its core it seems pretty basic:
Interpret the code itself
Write out the interpreted code
but each of those steps can be pretty intense. Step 1 should be somewhat achievable with some time and a LOT of elbow grease - you need to parse the code into a number of control statements based upon the specification of the language. Check out http://en.wikipedia.org/wiki/Parsing for more information on this step. In essence you're converting the typed code into a common format that represents the functionality you want.
Once you've parsed the code, the next step would be to take that parsed code and convert it into machine-runnable code (typically assembly, though with VB.NET you can write Microsoft Intermediate Language code as the output and then run it in the CLR). This is what will actually create the executable file in a manner that lets the computer run the program.
Unfortunately, the best advice for solving this problem is to either:
Go purchase several books on a programming language, machine code, assembly language, compilers, and so on, then spend several months or years reading and experimenting until the knowledge you gain from the books results in your writing a successful compiler.
Enroll in a computer science program at a local university. Writing compilers and programming languages are usually covered at a rudimentary level during the second or third year of a BS in computer science, and then in much more depth at a graduate level.
Good luck!
EDIT: If all you're looking for is a way to write code and then write a way to execute it, you might try looking at writing an interpreter for one of the scripting languages that already exist - Ruby, Python, Lua, etc.
Process.Start(String.Format("vbc.exe {0}", sourceFilePath))
Here is one book that explains it all at a very basic level. Including sample code that you can get working in a matter of hours.
As with many programming endevors, it will take you many months of study to accomplish what you want. Good luck!!
Let me start by saying this is totally doable. Ignore the naysayers.
OK, so it seems you bit off more than you could chew here. I would suggest slightly re-evaluating your goal. It seems that you want to learn how compilers work, and writing a VB.net compiler is your project to get started.
Try instead to write an interpreter, which is a much smaller and simpler task. Here's how you do it:
Write a parser for a very very small language, maybe only supporting assignment statements. Use a parser toolkit, like Antlr. This will build an AST ("abstract syntax tree" - ie a few classes or objects representing the program's syntax, without the crap like semi-colons, as a tree), which will look something like this:
then you go through the AST, and just execute it. Google for "AST walker". So for the assignment x = 5, create a hashtable entry for 'x', assign 5 to it, then move on to the next statement.
keep adding features as you go.
Once you've got the full language, you'll probably have learned enough on the way to understand the compiler books. Don't use the dragon book, try Appel's book instead, or Cooper/Torczon. There are online books if you prefer, I've never tried them.
When you go to write the compiler, you'll just be changing the bit that executes the AST into one which generates assembly (or C if you prefer) which will do the same action when it's run.
I'll grant that it seems daunting, but if you stick to a something simple to start, you'll get something working in a few days at most. In a few months, you'll have built it up to something like what you're looking for. Good luck.
To the final part of your question:
It seems that you don't know how executables are made from code. People haven't messed with hex in their compilers since the 80s, and hopefully not much even then.
Basically, after parsing the code, you go through a series of steps which make the code progressively simpler. At the end of that, you have something that is quite close to assembly. You then generate assembly, and the assembler and linker conspire to make it into an executable.
The Visual Basic .NET compiler is shipped for free as part of the .NET Framework - you don't even need the SDK or Express Editions. The compiler for VB (and C#) is located at c:\windows\microsoft.net\framework[version]\vbc.exe (or csc.exe for C#). Therefore, any computer which can run a VB.NET program can compile one. The .NET Framework also includes the System.CodeDom namespace which provides a way from within a program to compile a program, either from a document model or from a string (or file) into a .NET assembly (.exe or .dll) and generate code in both VB and C#.
Regards,
Anthony D. Green | Program Manager | Visual Basic Compiler
You create a compiler for vb.net the same way you create any other compiler. You need a lexer/parser. Entire books have been written on this topic, the most famous probably being The Dragon Book.
To provide a definitive answer: No, you cannot create a decent compiler that will generate an executable file using Notepad. You need a compiler to convert from human-readable text into the machine language (assembly or IL) that a linker or interpreter can then execute.
You can try checking out my tutorial at http://www.icemanind.com
It is a tutorial on creating your own virtual machine and assembler, written in 100% C#.
Cyclone, I'm wondering exactly what are you trying to achieve? You say "the exact purpose of what I wish to do is rather hard to explain" and "essentially, the user inputs code, they get a .exe".
If you just want the user to be able to enter code and then execute it, you might consider an existing scripting language. VBScript is built into Windows and the language is fairly similar to VB.Net, or there are various excellent free languages you can download like Python.
If the user really needs to be able to create a .exe - I think it's likely scripting might do - then why not use an existing free compiler like FreeBasic, or even Visual Basic.Net Express Edition.
I am making a tutorial for that in my website, http://dgblogs.weebly.com. It is written with C# and you can create your own computer programming language!
You will never see this syntax in my tutorials:
Console.Readline();
My website is currently offline by now so, GOOD LUCK!
Thanks,
DgBlogs
Im quite shocked that this question was never fully answered ; Recently i came across some tutorials on the subject of developing a programming language from scratch https://www.youtube.com/c/DmitrySoshnikov-education/playlists
Although the person uses Java Script the technique used for creating the tokenizer and the parser to consume the output from the tokenizer producing the AST for the transcribed language which i would consider to be GOLD! ...
Another tutorial or person https://youtube.com/playlist?list=PLSq9OFrD2Q3DasoOa54Vm9Mr8CATyTbLF
Toby Ho! Has also produce various methods using the Nearly parser (plugin) to build a language much easier but not a Pure code coder like myself(does not like extensions to do the work which i can do myself)....
https://github.com/spydaz/SpydazWeb-AI-_Emulators
I was able to do some different experiments designing a stack machine (runs mini assembly language). for my "Toy" Programming language(basic) could be executed on. (the VM) - Using Visual Basic(My personal Lang - I think in VB)
Since revisiting the tutorials ;
https://youtube.com/playlist?list=PLRAdsfhKI4OWNOSfS7EUu5GRAVmze1t2y
Immo Landwerth! Had a few more experiments This time with C#
A bit more Spaghetti to mull over; but actually revealing (after knowing a lot more);
Yes IT is possible to design a Compiler / VM with VB.net Quite easily ... without external tools .
For myself as an AI developer (CONVERSATIONAL AI / NLP) i was interested in building a compiler for the English language if it would be possible to use the technique for parsing syntax trees and building syntax trees from text using the Tokenizer/Parser to AST Method , as well as designing the transpiler to allow for the syntax tree to be used for translation to other languages. as with today's programming languages merely being a Higher lever language interfacing with a low level(VM) language. in-fact we should be developing more natural language - programming languages and dispelling the more cryptic languages such as C# (Lol) and Java and Go and the like (so many brackets and semi colons / tabs etc - truly not needed!) and that is to say that we programmers still think in code! the journey as a AI developer crosses many domains. forces many off topic research pathways .. So Again a big YES! and we Still could say that ANy programming language can be used to create another programming language it just depends on which language you "THINK IN" .... hence not having too many languages using (i understand most programming languages - but i will always think in VB)...

Can statically compiled languages replace scripting language?

Assuming you can get a dynamic interpreter; can statically compiled languages replace scripting language? I never quite understood why anyone would use a scripting language? I am talking about on PC, not a limited system which needs a simplistic interpreter. I seen some python install scripts and seen similar python and C# solutions to a problem. So why use a scripting language?
NOTE: There are things that bother me about C#, i am not asking why not use C# instead. I am asking why use a scripting language? I find static compiled languages much easier to debug and often easier to code in.
There is very little distinction these days between compiling and interpreting. Look at how an interpreted language is executed - the first step is to convert the script into some kind of internal executable form, like byte code that can be executed by a simpler instruction set. This is essentially compilation to a virtual machine format. This is exactly what modern compiled languages do. And when compiled languages are deployed in server-side web apps, they even recompile from the source on the fly. So there's practically no difference in terms of the compile/execute technique.
The only difference is in the details of the instruction set, specifically in the type system. Scripting languages are usually (but not always) dynamically typed. But many large applications are also written in dynamically typed languages too. So again, there is no clear distinction here.
Personally I think static typing, far from being "extra unnecessary effort" (as it is often described) is actually a huge productivity booster, making it much easier to write short snippets correctly on the first attempt, thanks to intellisense/autocompletion. To underline this, look at how Microsoft has improved the jQuery library simply by adding static type information to it (in specially formatted comments) so we can have intellisense in the IDE.
And meanwhile, static languages (including C# and Java) are bringing in more dynamic typing features.
So I see these categories as eventually merging and the distinction being meaningless.
Wikipedia says that a Scripting Language is a language that controls other software. You can do that with C#, but true scripting languages like Powershell are designed specifically for this.
I tend to think of a scripting language in more "interactive" terms than C#. With a scripting language, you can write a line or two of code, execute it and see the results immediately. That's not so easy in C#, where you have to put your code in a Console Application, or fire it off from a unit test, or type it into the Immediate window where you don't have intellisense.
That rapid cycle of write, execute allows rapid prototyping of complete "scripts" in a scripting language, because it gives you immediate feedback on each line of code.
This kind of question often starts flame wars as people are passionate about their respective camps.
In the computer olden days, Unix command line tools and console shells provided a rich scripting environment where all sorts of processing could be done. You didn't need to be an expert programmer in any specific language and could string (pun intended) various programs (other people wrote) together using the pipe structure to massage your data which was mostly text not binary related. It is quick and easy to make changes to your batch command file. You don't have a source file that has to be edited, compiled linked with external static or shared libries/DLLS in the case of Windows.
One thing scripting does not have normally have is speed. You don't write device drives and live internet trading AI systems in scripting. But if you run a script once a day on some data received via e-mail or ftp you don't normally care how long it takes as it can run it background anyway.
Rewind back to the present and the waters become muddy. Some scripting enviroments offer a kind of speed up facility where they will read you script and almost compile and link in modules the same a normal C++ or VB program might use for speed puposes. But this very iffy and can't be relied on.
So how do you choose which route to go. Start doing tasks using scripting. If it runs too slow or you are having to do stuff every 5 minutes then parts of your script might benifit from a section written in a traditional language or the whole thing could be written in a language.
Like anything dabble and learn
Each is used for different purposes. Programs written in scripting languages are often not self-contained; they often function as "glue code" or (as Robert Harvey mentions) to automate a task. You often find scripting language interpreters embedded within an application (cf Python in Blender; Guile, Perl and Python in GIMP; JS in umpteen different browsers; Lua in countless games). Compiled languages, on the other hand, are used to produce self-contained applications. Scripts are mostly cross-platform; compiled applications usually aren't.
Note that a scripting language doesn't necessarily use an interactive interpreter (e.g. Perl), and an interpreted language isn't necessarily use for scripts (e.g. games made using PyGame). Note also that there's nothing about the languages themselves that make them interpreted or compiled. You could have a C# interpreter or a Ruby compiler. There have been a number of Lisp systems that offered both interpreters and compilers.
I would call my shell (bash) a scripting language, and I don't see a replacement comming, which is compiled.
I like to use scala, which is a statically typed language which comes with an interpreter-like REPL-interface, and due to type interference looks pretty much like a scripting language; have a look here: http://www.simplyscala.com/ .
But it isn't meant to be the glue between other programs as the shell is, so for small jobs, which are easily verified by hand and eye, which are just a few lines of code, I prefer to use the shell. And jumping from directory to directory is comfortable in a shell, where the prompt shows where I am.
Before we begin, I don't think that I've ever met a static language user who "got" scripting language without trying them, including myself. It is a different experience.
So no. Basically, you can add features to static languages which makes them superficially seem like scripting languages (like simple type inference), but its not the same:
Many scripting language users hate static languages. They feel constrained. Scripting languages are typically very good at not getting in the users way, which is sacrificed in static languages for speed/correctness.
Duck typing will not appear in static languages.
Scripting language users don't like type annotations. Its not really possible to provide a type-inference system for scripting languages, and the simple type inference appearing in some languages now only works for static types.
Techniques like monkey patching (which to my mind is a very bad idea) is pervasive in Ruby, and allows for very powerful techniques, which won't become available soon in static languages either.
Which isn't to say that a yet-to-be-designed language can't handle scripting language features in a relatively static way, but it would be difficult for it to become popular relative to the entrenched Python/PHP/Perl/Ruby/Javascript set. Factor is the closest thing, AFAICT.
What will happen is that scripting language implementations will get faster by using JITs.
Can a screw driver replace a hammer ? No, because you just don't use them for the same purpose. And if both exist, and if such a lot of people use either one or the other, there must be a reason...
Same anwser for :
class inheritance vs prototype;
imperative vs oo;
static vs dynamic typing;
strongly vs weakly typed;
manual memory management vs GC;
C# vs Java;
blue vs red;
man vs woman;
batman vs superman (but I do think superman would win... wait, there is kryptonite... oh man, I don't know...)
etc...
Because it is shorter to write since it is a higher level language, and it doesn't need the compilation cycle which also makes thing shorter.
I am asking why use a scripting
language? I find static compiled
languages much easier to debug and
often easier to code in.
Because I find loosely-typed dynamic languages without an explicit compile-run cycle much easier to debug and generally easier to code in.

Programming features missing in C++ and Java

What are the programming features that are missing in C++ and Java ?
For eg. You can't do recursive programming in QBasic ? You can't dynamically allocate memory in QBasic.
What would be the good to have features in C++, Java.
I think Lisp Programmers will be able to add a few.
I miss lambda expressions.
This answer deals only with C++
Things I miss from the syntax, or the standard library:
RegExp as part of the standard library
Threads as part of the standard library
Pointer to member methods (not objects!)
Properties would be nice (I have seen codes that emulate this via C++ preprocessor... note an nice looking code).
Some lower level networking API (sockets!), and higher level API (give me this file from this ftp, submit "this" to this site via POST).
This is the list of things I would like to see, but I assume other people will disagree with me.
Memory garbage collector is nice.
A n interface for a GUI toolkit - let MSVC map it to win32, and on Linux... (good question!)
A stable ABI. In C it's a standard - but on C++ we are still missing a few decades. I want also stable ABI between compilers - I want to compile one library in MinGW, the other with CL and all should work.
This is the list of things I want to see, but I know they will not get away:
Compatibility with C. Really, it's a myth right now. using namespace std killed it.
Include, headers. Most of the information is already available in the DLL/so/a/"library", do we really need to keep this bad decision from 30 years ago? If needed the compilers should keep information in the binaries.
The need for Makefiles - the compiler should be smart enough to know what to do with this code, from the code itself. Pascal is doing it quite good. I think also D.
(I might be wrong, please correct me) The official standard openly and freely available for viewing. Why should I pay for the official papers? Do I need to do it for HTTP? UTF8? Unicode?
I think this is a very subjective question. From a theoretical point of view there's nothing "missing" in Java because you can do everything you want to from the perspective of the outcome as an application.
As with QBasic - recursion may not be possible but that doesn't prevent you from changing your recursive algorithm to an iterative algorithm. Programming language theory tells us that you can do this with every recursive problem. So there's also nothing missing here.
I think what you mean are features that are "nice to have" - and here everyone has to decide for himself. I'd even say there are features in the language which would have been "nice not to have" such as static imports - but again this is my subjective opinion...