Is ANTLR available on QBasic? - antlr

I have just started studying about compiler design. And i have a little task to write a grammar on QBasic. But there are only few targeted languages on ANTLR. Is it possible on QBasic? Please anyone explain about this.

The biggest repository of ANTLR grammars I know of in one place is at this Github page. And it doesn't look like QBasic is among them.
I've written a grammar/interpreter or three with BASIC-like syntax with domain-specific extensions (and no line numbers!) but it doesn't look like anyone has undertaken QBasic in ANTLR4, not publicly at least.

Related

Creating a simple Domain Specific Language

I am curious to learn about creating a domain specific language. For now the domain is quite basic, just have some variables and run some loops, if statements.
Edit :The language will be Non-English based with a very simple syntax .
I am thinking of targeting the Java Virtual Machine, ie compile to Java byte code.
Currently I know how to write some simple grammars using ANTLR.
I know that ANTLR creates a lexer and parser but how do I go forward from here?
about semantic analysis: does it have to be manually written or are there some tools to create it?
how can the output from the lexer and parser be converted to Java byte code?
I know that there are libraries like ASM or BCEL but what is the exact procedure?
are there any frameworks for doing this? And if there is, what is the simplest one?
You should try Xtext, an Eclipse-based DSL toolkit. Version 2 is quite powerful and stable. From its home page you have plenty of resources to get you started, including some video tutorials. Because the Eclipse ecosystem runs around Java, it seems the best choice for you.
You can also try MPS, but this is a projectional editor, and beginners may find it more difficult. It is nevertheless not less powerful than Xtext.
If your goal is to learn as much as possible about compilers, then indeed you have to go the hard way - write an ad hoc parser (no antlr and alike), write your own semantic passes and your own code generation.
Otherwise, you'd better extend an existing extensible language with your DSL, reusing its parser, its semantics and its code generation functionality. For example, you can easily implement an almost arbitrary complex DSL on top of Clojure macros (and Clojure itself is then translated into JVM, you'll get it for free).
A DSL with simple syntax may or may not mean simple semantics.
Simple semantics may or may not mean easy translation to a target language; such translations are "technically easy" only if the DSL and the target languate share a lot of common data types and execution models. (Constraint systems have simple semantics, but translating them to Fortran is really hard!). (You gotta wonder: if translating your DSL is easy, why do you have it?)
If you want to build a DSL (in your case you stick with easy because you are learning), you want DSL compiler infrastructure that has whatever you need in it, including support for difficult translations. "What is needed" to handle translating all DSLs to all possible target languages is clearly an impossibly large set of machinery.
However, there is a lot which is clear that can be helpful:
Strong parsing machinery (who wants to diddle with grammars whose structure is forced
by the weakness of the parsing machinery? (If you don't know what this is, go read about LL(1) grammmars as an example).
Automatic construction of a representation (e.g, an abstract syntax tree) of the parsed DSL
Ability to access/modify/build new ASTs
Ability to capture information about symbols and their meaning (symbol tables)
Ability to build analyses of the AST for the DSL, to support translations that require
informatoin from "far away" in the tree, to influence the translation at a particular point in the tree
Ability to reogranize the AST easily to achieve local optimizations
Ability to consturct/analysis control and dataflow information if the DSL has some procedural aspects, and the code generation requires deep reasoning or optimization
Most of the tools available for "building DSL generators" provide some kind of parsing, perhaps tree building, and then leave you to fill in all the rest. This puts you in the position of having a small, clean DSL but taking forever to implement it. That's not good. You really want all that infrastructure.
Our DMS Software Reengineering Toolkit has all the infrastructure sketched above and more. (It clearly doesn't, and can't have the moon). You can see a complete, all-in-one-"page", simple DSL example that exercises some ineresting parts of this machinery.

Tools to generating a grammar using examples?

This answer shows a pretty example of using a parser generator to look through text for some patterns of interest. In that example, it's product prices.
Does anyone know of tools to generate the grammars given training examples (document + info I want from it)? I found a couple papers, but no tools. I looked through ANTLR docs a bit, but it deals with grammars; a "recognizer" takes as input a grammar, not training examples.
This is a machine learning problem. You can at best get an approximation. But I don't think anybody has done this well, let alone released a tool. (I actively track what people do to build grammars for computer languages, and this idea has been proposed many times, but I have yet to see a useful implementation).
The problem is that for any fixed set of examples, there's a huge number of possible grammars. It is easy to construct a naive one: for the fixed set of examples, simply propose a grammar that has one rule to recognize each example. That works, but is hardly helpful. Now the question is, how many ways can you generalize this, and which one is the best? In fact you can't know, because your next new example may be a total surprise in terms of structure. (Theory definition: A language is the set of sentences that comprise it).
We haven't even talked about the simpler problem of learning the lexemes of the language. How would you propose to learn what legal strings for floating point numbers are?
One tool that does this is NLTK. I Highly recommend it, and the O'Reilly book that covers it is available free online. There are tools for parsing, learning grammars, etc... The only downside is that it is mainly a research rather than production tool, so the emphasis isn't on performance.
NLTK is able to construct grammar from labeled training samples, which is exactly what you are asking. Have a look at the great docs and the book. (My last experience with it also had it working on the JVM through Jython without any issues.)

Methodologies for designing a simple programming language

In my ongoing effort to quench my undying thirst for more programming knowledge I have come up with the idea of attempting to write a (at least for now) simple programming language that compiles into bytecode. The problem is I don't know the first thing about language design. Does anyone have any advice on a methodology to build a parser and what the basic features every language should have? What reading would you recommend for language design? How high level should I be shooting for? Is it unrealistic to hope to be able to include a feature to allow one to inline bytecode in a way similar to gcc allowing inline assembler? Seeing I primarily code in C and Java which would be better for compiler writing?
There are so many ways...
You could look into stack languages and Forth. It's not very useful when it comes to designing other languages, but it's something that can be done very quickly.
You could look into functional languages. Most of them are based on a few simple concepts, and have simple parsing. And, yet, they are very powerful.
And, then, the traditional languages. They are the hardest. You'll need to learn about lexical analysers, parsers, LALR grammars, LL grammars, EBNF and regular languages just to get past the parsing.
Targeting a bytecode is not just a good idea – doing otherwise is just insane, and mostly useless, in a learning exercise.
Do yourself a favour, and look up books and tutorials about compilers.
Either C or Java will do. Java probably has an advantage, as object orientation is a good match for this type of task. My personal recommendation is Scala. It's a good language to do this type of thing, and it will teach you interesting things about language design along the way.
You might want to read a book on compilers first.
For really understanding what's going on, you'll likely want to write your code in C.
Java wouldn't be a bad choice if you wanted to write an interpreted language, such as Jython. But since it sounds like you want to compile down to machine code, it might be easier in C.
I recommend reading the following books:
ANTLR
Language Design Patterns
This will give you tools and techniques for creating parsers, lexers, and compilers for custom languages.

What are the disadvantages of using ANTLR compared to Flex/Bison?

I've worked on Flex, Bison few years ago during my undergraduate studies. However, I don't remember much about it now. Recently, I have come to hear about ANTLR.
Would you recommend that I learn ANTLR or better to brush up Flex/Bison?
Does ANTLR have more/less features than Flex/Bison?
ANTLRv3 is LL(k), and can be configured to be LL(*). The latter in particular is ridiculously easy to write parsers in, as you can essentially use EBNF as-is.
Also, ANTLR generates code that is quite a lot like recursive descent parser you'd write from scratch. It's very readable and easy to debug to see why the parse doesn't work, or works wrong.
The advantage of Flex/Bison (or any other LALR parser) is that it's faster.
ANTLR has a run-time library JAR that you must include in your project.
ANTLR's recursive-descent parsers are easier to debug than the "bottom-up" parsers generated by Flex/Bison, but the grammar rules are slightly different.
If you want a Flex/Bison-style (LALR) parser generator for Java, look at JavaCC.
We have decided to use ANTLR for some of our information processing requirements - parsing legacy files and natural language. The learning curve is steep ut we are getting on top of it and I feel that it's a more modern and versatile approach for what we need to do. The disadvantages - since you ask - are mainly the learning curve which seems to be inevitable.

Getting started with ANTLR and avoiding common mistakes

I have started to learn ANTLR and have both the 2007 book "The Definitive ANTLR Reference" and ANTLRWorks (an interactive tool for creating grammars). And, being that sort of person, I started at Chapter 3. ("A quick tour for the impatient").
It's a fairly painful process especially as some errors are rather impenetrable (e.g. ANTLR: "missing attribute access on rule scope" problem which just means to me "you got something wrong"). Also I have some very simple grammars (3-4 productions only) and simple input (2 lines) which when run give "OutOfMemory" error.
The ANTLR site is useful but somewhat fragmented and some SO users have commented (https://stackoverflow.com/questions/278480/good-tutorial-for-antlr) that the book and the tutorials expect a high entry level. I've been reluctant to approach the ANTLR discussion list because of this.
LATER We are beginning to get to grips with it. It would be useful to have simple reliable examples that could be gently expanded. It's certainly worth mastering as we have remodelled quite a lot of our thinking based on ANTLR.
One problem is that ANTLR V3 has signifcant changes from V2. One answer on SO (and on the ANTLR pages) refered to a V2 syntax that is no longer available.
Some of the ANTLR questions on SO have helped me a lot, but finding them is a bit ad hoc. So I'd like to know how SO users can help to make the learning process less painful. (If you refer to the reference book it would be useful to point to particular pages).
EDIT. #duffymo and #JamesAnderson have confirmed that ANTLR is hard work - largely because parsers are difficult. (FWIW I have been through LEX/YACC, etc. and there's no doubt that ANTLR is more powerful and easier to work with.) I think it would still be useful to have areas where it's possible to avoid fouling up such as:
ensure correct capitalisation of variable names
add package name to lexer as well as parser
take care over order of rules as it affects precedence
and more of these sort would be useful.
I agree - ANTLR is not for the faint of heart. It does expect a high entry level, because grammars and parsers are not trivial.
With that said, here are a few suggestions:
Forget about v2. Version 3 is the standard; don't even waste time considering the earlier version or its documentation.
OutOfMemoryError is telling you that there's something circular in the grammar you've defined.
IntelliJ has a wonderful IDE for working with ANTLR v3. It'll give you a graphical representation of your grammar, step-through debugging, etc. If you're going to be doing a lot of work with ANTLR it'd be worth a few dollars to buy a license.
ANTLR won't be easy to master. The book is good, but dense. The error messages are cryptic, as you've noted. I'd be surprised if anyone here could make it easy.
Sorry but my experience of ANTLR (indeed javacc, bison or any full function parser) is that most of your learning will be by fixing your own mistakes!
Getting good examples of other peoples code will cut this down somewhat, the best examples look really simple -- but you are missing all the sweat and hair pulling it took to get them looking that easy.
Even if you prefer command line, it is worth using AntlrWorks when you have problems. The diagramatic representation can make it easier to see what i sgoing wrong.
A picture is worth a thousand error messages.