Code Editor using ANTLR - ide

I'm in the process of writing a code editor for a custom language. We're using ANTLR for the lexer and parser, and CodeMirror for the editor framework (running in a browser).
I can do some basic things like syntax coloring of keywords, as well as providing rudimentary code completion.
What I'm finding is that the user is frequently in the middle of editing something, so the ANTLR parser is not very useful since the current input stream is not fully parseable (and often leads ANTLR down the wrong path due to an incomplete input stream).
Therefore, I'm using the token stream to figure out what's going on and attempt to provide context-sensitive help.
I'm wondering if anyone can provide some guidance around using ANTLR as part of a code editor. Am I on the right track using the token stream instead of the parse tree?
Can the ANTLR API be leveraged to do things like looking ahead for tokens to figure out the overall context of what the user is currently editing?
Sorry if this is kind of vague. Just getting started on this project. :-)
Thanks for any help.

I find ANTLR great for syntax checking and for easily retrieving details from valid input. Code completion, however, is a different scenario. As you have already found out, the parser often cannot give you good answers, because the input is not valid while the user is typing. Bart has linked to an answer in which Sam says he implemented a great solution using ANTLR 4, but unfortunately he doesn't describe how.
But even if you can get the parser to give you a set of expected tokens, what are you going to do with it? What do you want to show if, say, an identifier is expected? That can be anything: a class member, a variable name, etc. I don't believe this is the answer, so I developed my own solution, which I describe here: Universal Code Completion using ANTLR. It targets ANTLR 3, but can certainly be made to work with ANTLR 4 as well.
The article also contains several links to (C++) source code that shows how code completion is implemented in my application. The implementation is surprisingly simple, yet it delivers very precise results.
Update
Meanwhile I developed a solution for ANTLR4 too, called antlr4-c3. It is written in TypeScript, but comes with ports to Java and C++.

If anyone is interested in integrating CodeMirror 6 with an ANTLR4 grammar,
there is an example on GitHub.
The main idea is this:
token: (stream, state) => {
    // Tokenize the current line (stream.string) with the ANTLR4 lexer
    const tokens = getTokensForText(stream.string);
    // Find the first token that starts at or after the current stream position
    const nextToken = tokens.find(t => t.startIndex >= stream.pos);
    // Consume the token if its text matches at the current position
    // (nextToken is undefined when no token is left on this line)
    if (nextToken && stream.match(nextToken.text)) {
        let valueClass;
        switch (nextToken.type) {
            case ...:
                valueClass = getStyleNameByTag(tags.string);
                break;
            ...
            default:
                valueClass = getStyleNameByTag(tags.keyword);
                break;
        }
        return valueClass;
    } else {
        // Nothing matched here: skip one character and leave it unstyled
        stream.next();
        return null;
    }
}

Related

Throw Custom Exception From ANTLR4 Grammar File

I have a grammar file that parses a specific file type. Now I need one simple thing.
I have a parser rule, and when the input tokens don't satisfy that rule I need to throw my own custom exception.
That is, when my input file produces an "extraneous input" error because the parser expects something that the input file doesn't have, I want to throw an exception.
Is this possible ?
If yes, How ?
If no, any work around ?
I'm a beginner at this.
grammar Test;
exampleParserRule : [a-z]+ ;
My input file contains 12345. Now I need to throw a custom exception.
For parsing issues such as this, ANTLR will, internally, throw certain exceptions, but they are caught and handled by an ErrorListener. By default, ANTLR will hook up a ConsoleErrorListener that just formats and writes error messages to the console. But it will then continue processing, attempting to recover from the error, and sync back up with your input. This is something you want in a parser. It’s not very useful to have a parser just report the first problem it encounters and then exception out.
You can implement your own ErrorListener (there’s a BaseErrorListener class you can subclass). There you can handle the reported error yourself (the method you override provides a lot of detailed information about the error) and produce whatever message you’d like. (You can also do things like collect all the errors in a list, keep track of error levels, etc.)
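A minimal sketch of the "collect all the errors in a list" idea, written as plain JavaScript so it runs without the ANTLR runtime. The class here is only a stand-in: in a real project it would extend the runtime's error listener base class, and the parser would call syntaxError itself. The names CollectingErrorListener, InputSyntaxError and throwIfErrors are made up for this example.

```javascript
// Stand-in for a subclass of the ANTLR runtime's error listener (assumption:
// in real code this class extends the runtime's base listener).
class CollectingErrorListener {
  constructor() {
    this.errors = [];
  }

  // Same parameter order as ANTLR's syntaxError callback.
  syntaxError(recognizer, offendingSymbol, line, column, msg, e) {
    this.errors.push({ line, column, message: `line ${line}:${column} ${msg}` });
  }

  hasErrors() {
    return this.errors.length > 0;
  }
}

// Hypothetical custom exception that carries every collected error.
class InputSyntaxError extends Error {
  constructor(errors) {
    super(errors.map(err => err.message).join('\n'));
    this.name = 'InputSyntaxError';
    this.errors = errors;
  }
}

// Call this once parsing has finished, so the parser can still recover
// and report more than the first problem.
function throwIfErrors(listener) {
  if (listener.hasErrors()) {
    throw new InputSyntaxError(listener.errors);
  }
}

// Simulated report; during a real parse the parser makes this call.
const listener = new CollectingErrorListener();
listener.syntaxError(null, null, 1, 0, "extraneous input '12345' expecting ID", null);

try {
  throwIfErrors(listener);
} catch (err) {
  console.log(`${err.name}: ${err.message}`);
}
```

With the real runtime you would typically call parser.removeErrorListeners() and then parser.addErrorListener(listener) before starting the parse, so that the default console listener no longer prints the same errors.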
In short, you probably don’t want a different exception, you want a different error message.
Sometimes, depending on how difficult it is to sort out your particular situation for a custom message, it’s really better to look for it in a Listener that processes the parse tree that ANTLR hands back. (Note: a very common path beginners take is to try to get everything into the grammar. It’s going to be hard to get really nice error messages if you do this. ANTLR is pretty good with error messages, but it is a generalized tool, so you are more likely to produce meaningful messages in your own code.)
Just try to get ANTLR to produce a parse tree that accurately reflects the structure of your input. Then you can walk the ParseTree with a validation Listener of your own code, producing your own messages.
Another “trick” that doesn’t occur to many ANTLR devs (early on), is that the grammar doesn’t HAVE to ONLY include rules for valid input. If there’s some particular input you want to give a more helpful error message for, you can add a rule to match that (invalid) input, and when you encounter that Context in your validation listener, generate an error message specific to that construct.
BTW… [a-z]+ would almost always be a Lexer rule. If you don’t yet understand the difference between lexer and parser rules, or the processing pipeline ANTLR uses to tokenize an input stream, and then parse a token stream, do yourself a favor and get a firm understanding of those basics. ANTLR is going to be very confusing without that basic understanding. It’s pretty simple, but very important to understand.
You can do this in your grammar:
grammar Test;
@header {
package $your.$package;
import $your.$package.$yourExceptionClass;
}
exampleParserRule : [a-z]+ ;
catch [RecognitionException re] {
    // default handling: report the error and try to recover
    reportError(re);
    recover(input,re);
    retval.tree = (CommonTree)adaptor.errorNode(input, retval.start, input.LT(-1), re);
    // build a readable message and rethrow it as your own exception
    String msg = getErrorMessage(re, this.getTokenNames());
    throw new $yourExceptionClass(msg, re);
}
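Whether you really want to call reportError (which logs to the console), recover, etc. is up to you, but these are the defaults, so it may be good to keep them.
You may also want to generate a more human-readable error message (use getErrorMessage).
Note that retval, adaptor and these helper calls come from the ANTLR 3 runtime; the code ANTLR 4 generates calls _errHandler.reportError(this, re) and _errHandler.recover(this, re) instead.
If you do more complex work, follow @mike-cargal's advice.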

What is the meaning of 'cimode' in react-i18next and why isn't it properly documented?

I started using react-i18next a few days ago and I am very satisfied with it. However, I've been seeing this 'cimode' language here and there, in some posts and while debugging, but have no clue what it means. I've searched all over, I believe, and can't find any documentation on it.
In my particular case, I am generating some boilerplate code in a new website and created a demo page to show how to use localization in the website. I am generating toggle language buttons from the languages I set on the whitelist and, to my surprise, I have a 'cimode' button. I know I can filter it out and I will, but I would like to know what it should be used for and maybe to see better documentation for it in https://react.i18next.com/.
From my understanding, CIMODE is a language setting meant for testing: it makes translation lookups consistently return the translation key instead of the translated value.
It is rather hidden in the FAQ.

How to write an Identity-Visitor

Let us assume I am using the Python3.g4 grammar:
How can I write a program that parses a python-script, walks the syntax tree and outputs the same old program?
Later I want to apply some changes to the program, but for now I would be happy if I could just reproduce the program.
By now I guess I lose some information when walking an abstract syntax tree and there is no easy way.
The problem is that you are losing the content of all tokens on a hidden channel. You therefore have to check, for each token in your parse tree, whether the TokenStream holds a hidden token next to it that is not part of the ParseTree.
The methods getHiddenTokensToRight and getHiddenTokensToLeft of BufferedTokenStream should be the right tools for this.
However if you just want a reproduction of the parsed input you should try to access the TokenStream directly and read out Token after Token from there without taking care of the channel the Token is on.
With that you should be able to reproduce the original input without many problems.
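To illustrate the difference between the two views of the token stream, here is a small sketch. It does not use ANTLR at all: lex is a hand-written toy lexer that puts whitespace and # comments on a hidden channel, similar to what a `-> channel(HIDDEN)` lexer command does. The channel numbers and the token shape are assumptions for this example.

```javascript
const DEFAULT_CHANNEL = 0;
const HIDDEN_CHANNEL = 1;

// Toy lexer: every input character ends up in exactly one token.
// Whitespace and '#' comments are kept, but on the hidden channel.
function lex(text) {
  const re = /\s+|#[^\n]*|[A-Za-z_]\w*|\d+|\S/g;
  return Array.from(text.matchAll(re), m => ({
    text: m[0],
    channel: /^(\s|#)/.test(m[0]) ? HIDDEN_CHANNEL : DEFAULT_CHANNEL,
  }));
}

const source = 'x = 1  # set x\nprint(x)\n';
const tokens = lex(source);

// What a parse tree walk sees: only tokens on the default channel.
const visibleOnly = tokens
  .filter(t => t.channel === DEFAULT_CHANNEL)
  .map(t => t.text)
  .join('');

// Reading token after token, ignoring the channel, restores the input.
const allChannels = tokens.map(t => t.text).join('');

console.log(JSON.stringify(visibleOnly)); // "x=1print(x)"
console.log(allChannels === source);      // true
```

This only works if the grammar keeps whitespace and comments as tokens on a hidden channel. Text that the lexer drops with `-> skip` never reaches the token stream, so it cannot be recovered this way.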

Using ANTLR4 lexing for Code Completion in Netbeans Platform

I am using ANTLR4 to parse code in my Netbeans Platform application. I have successfully implemented syntax highlighting using ANTLR4 and Netbeans mechanisms.
I have also implemented a simple code completion for two of my tokens. At the moment I am using a simple implementation from a tutorial, which searches for a whitespace and starts the completion process from there. This works, but it forces the user to type a whitespace before starting code completion.
My question: is it possible or even contemplated using ANTLR's lexer to determine which tokens are currently read from the input to determine the correct completion item?
I would appreciate every pointer in the right direction to improve this behaviour.
Not really an answer, but I do not have enough reputation points to post comments.
is it possible or even contemplated using ANTLR's lexer to determine which tokens are currently read from the input to determine the correct completion item?
Have a look here: http://www.antlr3.org/pipermail/antlr-interest/2008-November/031576.html
and here: https://groups.google.com/forum/#!topic/antlr-discussion/DbJ-2qBmNk0
Bear in mind that the first post was written in 2008, and the current ANTLR v4 is very different from the version available at the time, which is why Sam’s opinion on this topic appears to have evolved.
My personal experience - most of what you are asking is probably doable with antlr, but you would have to know antlr very well. A more straightforward option is to use antlr to gather information about the context and use your own heuristics to decide what needs to be shown in this context.
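As a rough illustration of "gather context, then apply your own heuristics", the sketch below finds the completion prefix from token boundaries instead of searching for whitespace. It is not taken from either linked post. The tokenize function is a toy stand-in for an ANTLR lexer, and the token shape { type, text, start, stop } as well as the single "dot means member access" heuristic are assumptions made for this example.

```javascript
// Toy tokenizer standing in for an ANTLR lexer (assumption). Offsets are
// inclusive, and whitespace is dropped.
function tokenize(text) {
  const tokens = [];
  for (const m of text.matchAll(/\s+|[A-Za-z_]\w*|\d+|\S/g)) {
    if (/^\s/.test(m[0])) continue;
    const type = /^[A-Za-z_]/.test(m[0]) ? 'ID' : /^\d/.test(m[0]) ? 'NUM' : 'OP';
    tokens.push({ type, text: m[0], start: m.index, stop: m.index + m[0].length - 1 });
  }
  return tokens;
}

// Decide what kind of completion applies at the caret (a character offset).
function completionContext(text, caret) {
  // Only tokens that start left of the caret matter.
  const tokens = tokenize(text).filter(t => t.start < caret);
  let i = tokens.length - 1;
  let prefix = '';

  // An identifier touching the caret is the partially typed prefix.
  if (i >= 0 && tokens[i].type === 'ID' && tokens[i].stop + 1 >= caret) {
    prefix = text.slice(tokens[i].start, caret);
    i--;
  }

  // Heuristic: '.' after an identifier means member access on that identifier.
  if (i >= 1 && tokens[i].text === '.' && tokens[i - 1].type === 'ID') {
    return { kind: 'member', receiver: tokens[i - 1].text, prefix };
  }
  return { kind: 'name', prefix };
}

console.log(completionContext('x = foo.ba', 10)); // { kind: 'member', receiver: 'foo', prefix: 'ba' }
console.log(completionContext('x = fo', 6));      // { kind: 'name', prefix: 'fo' }
```

Because the prefix comes from the token that touches the caret, the user does not have to type a whitespace before asking for completion.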
The ANTLRv3 grammar https://sourceware.org/git/?p=frysk.git;a=blob_plain;f=frysk-core/frysk/expr/CExpr.g;hb=HEAD implements context sensitive completion of C expressions (no macros).
For instance, if fed the string:
a_struct->a<tab>
it would list just the fields of "a_struct" starting with "a" (the tab could, technically, be any character or marker).
The technique it used was to:
- modify a C grammar to recognize both IDENT and IDENT_TAB tokens
- for IDENT_TAB, capture the partial expression AST and "TOKEN_TAB" and throw them back to 'main' (there are hacks to help capture the AST)
- have 'main' perform a type-eval on the partial expression (compute the expression's type, not its value) and use that to expand TOKEN_TAB
The same technique, while not exactly ideal, can certainly be used in ANTLRv4.
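The type-eval step can be sketched in a few lines. This is a toy, not the frysk implementation: it splits the input with a regular expression instead of a modified grammar with an IDENT_TAB token, and the structs and variables tables are invented stand-ins for real type information.

```javascript
// Invented type information; a real tool would get this from debug info
// or a symbol table.
const structs = {
  point: { x: 'int', y: 'int' },
  a_struct_t: { alpha: 'int', apex: 'point*', beta: 'int' },
};
const variables = { a_struct: 'a_struct_t*' };

// Own-property lookup, so names like "constructor" are not found by accident.
const own = (obj, key) =>
  Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;

// Struct description for a (pointer to) struct type, or undefined.
function structOf(type) {
  return typeof type === 'string' ? own(structs, type.replace(/\*$/, '')) : undefined;
}

// Type-eval of a chain such as "a->b->c": computes the type, never the value.
function typeOf(expr) {
  const [head, ...fields] = expr.split('->');
  let type = own(variables, head);
  for (const field of fields) {
    const struct = structOf(type);
    type = struct ? own(struct, field) : undefined;
  }
  return type;
}

// input ends with the completion marker, e.g. "a_struct->a<tab>".
function complete(input) {
  const m = /^(.*)->(\w*)<tab>$/.exec(input);
  if (!m) return [];
  const struct = structOf(typeOf(m[1]));
  return struct ? Object.keys(struct).filter(name => name.startsWith(m[2])).sort() : [];
}

console.log(complete('a_struct->a<tab>'));      // [ 'alpha', 'apex' ]
console.log(complete('a_struct->apex-><tab>')); // [ 'x', 'y' ]
```

The part worth keeping is the split of responsibilities: the grammar only has to deliver the partial expression and the marker, and the expansion is an ordinary lookup on the expression's type.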

Is it feasible to use Antlr for source code completion?

I don't know if this question is valid, since I'm not very familiar with source code parsing. My goal is to write a source code completion function for an existing programming language (language "X") for learning purposes.
Is ANTLR (v4) suitable for such a task, or should the necessary AST/parse tree creation and parsing be done by hand, assuming no existing solution exists?
I haven't found much information on this specific topic, except for a list of compiler books, but a compiler is not what I'm after.
The code completion in GoWorks is completely implemented using ANTLR 4. The following video shows the level of completion of this code completion engine. The code completion example runs from 5 minutes through the end of the video.
Intro to Tunnel Vision Labs' GoWorks IDE (Preview Release)
I have been working on code completion algorithms for many years, and strongly believe that there is no better solution (automated or manual) for producing a code completion solution for a new language that meets the requirements for what I would call highly-responsive code completion. If you are not interested in that level of performance or accuracy, other solutions may be easier for you to get involved with (I don't work with those personally, because I am too easily disappointed in the results).
Xtext uses ANTLR3 and has good autocomplete facilities. The catch is that it generates a separate parser (again using ANTLR3) for autocomplete processing, derived from AbstractInternalContentAssistParser. This multi-thousand-line piece of code shows that the Xtext team found ANTLR3's error recovery alone to be insufficient.
Meanwhile, ANTLR4 has a method, parser.getExpectedTokensWithinCurrentRule(), which lists the possible token types for a given position. It works when used in a ParseTreeListener. What remains is semantics, scoping, etc., which is outside ANTLR's scope.
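The scoping part that ANTLR leaves to you could look roughly like the sketch below. It assumes that some other code (not shown) has already turned the parser's expected token types into a set of acceptable symbol kinds. Scope, candidates and the kind names are invented for this example.

```javascript
// Minimal nested symbol table.
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.symbols = new Map(); // name -> kind
  }

  define(name, kind) {
    this.symbols.set(name, kind);
    return this;
  }

  // All symbols visible from this scope; inner definitions shadow outer ones.
  visible() {
    const result = this.parent ? this.parent.visible() : new Map();
    for (const [name, kind] of this.symbols) result.set(name, kind);
    return result;
  }
}

// expectedKinds: symbol kinds allowed at the caret (assumed to be derived
// from the expected token types by code that is not shown here).
function candidates(scope, expectedKinds, prefix) {
  return [...scope.visible()]
    .filter(([name, kind]) => expectedKinds.has(kind) && name.startsWith(prefix))
    .map(([name]) => name)
    .sort();
}

const globals = new Scope()
  .define('print', 'function')
  .define('count', 'variable');
const local = new Scope(globals)
  .define('counter', 'variable')
  .define('print', 'variable'); // shadows the global function

console.log(candidates(local, new Set(['variable']), 'c'));   // [ 'count', 'counter' ]
console.log(candidates(globals, new Set(['function']), 'p')); // [ 'print' ]
```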