Using ANTLR4 lexing for Code Completion in NetBeans Platform

I am using ANTLR4 to parse code in my Netbeans Platform application. I have successfully implemented syntax highlighting using ANTLR4 and Netbeans mechanisms.
I have also implemented simple code completion for two of my tokens. At the moment I am using a simple implementation from a tutorial, which searches for a whitespace and starts the completion process from there. This works, but it requires the user to type a whitespace before starting code completion.
My question: is it possible, or even intended, to use ANTLR's lexer to determine which tokens have just been read from the input, in order to pick the correct completion item?
I would appreciate any pointer in the right direction to improve this behaviour.

Not really an answer, but I do not have enough reputation points to post comments.
is it possible or even contemplated using ANTLR's lexer to determine which tokens are currently read from the input to determine the correct completion item?
Have a look here: http://www.antlr3.org/pipermail/antlr-interest/2008-November/031576.html
and here: https://groups.google.com/forum/#!topic/antlr-discussion/DbJ-2qBmNk0
Bear in mind that the first post was written in 2008 and the current ANTLR v4 is very different from the version available at the time, which is why Sam's opinion on this topic appears to have evolved.
My personal experience: most of what you are asking is probably doable with ANTLR, but you would have to know ANTLR very well. A more straightforward option is to use ANTLR to gather information about the context and then apply your own heuristics to decide what should be shown in that context.
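As a minimal sketch of that approach with the ANTLR4 Java runtime: the completion provider can lex the document and look at the token under the caret instead of searching for a whitespace. MyLexer below stands for whatever lexer ANTLR generates from your grammar, and the caret offset comes from the editor; both are assumptions for illustration only.

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public final class CompletionContext {

    /** Returns the token the caret touches, or null (e.g. caret in leading whitespace). */
    public static Token tokenAtCaret(String documentText, int caretOffset) {
        // MyLexer is a placeholder for the generated lexer class.
        MyLexer lexer = new MyLexer(CharStreams.fromString(documentText));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill(); // lex the whole document up front

        Token result = null;
        for (Token t : tokens.getTokens()) {
            if (t.getType() == Token.EOF) {
                break;
            }
            // Caret positions sit between characters, hence stopIndex + 1.
            if (t.getStartIndex() <= caretOffset && caretOffset <= t.getStopIndex() + 1) {
                result = t;
            }
        }
        return result;
    }
}

The completion query can then branch on the token's type and text (and, via tokens.get(t.getTokenIndex() - 1), on a few preceding tokens) to decide which completion items to offer, so no leading whitespace is needed.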

The ANTLRv3 grammar https://sourceware.org/git/?p=frysk.git;a=blob_plain;f=frysk-core/frysk/expr/CExpr.g;hb=HEAD implements context sensitive completion of C expressions (no macros).
For instance, if fed the string:
a_struct->a<tab>
it would list just the fields of "a_struct" starting with "a" (the tab could, technically, be any character or marker).
The technique it used was to:
modify a C grammar to recognize both IDENT and IDENT_TAB tokens
for IDENT_TAB, capture the partial expression AST and the "TOKEN_TAB", and throw them back to 'main' (there are hacks to help capture the AST)
'main' then performs a type-eval on the partial expression (computing the expression's type, not its value) and uses that to expand TOKEN_TAB
The same technique, while not exactly ideal, can certainly be used in ANTLRv4.

Related

ANTLR4: What is the best approach to implement C like include file handling?

I am implementing a lexer/parser for the real-time language OpenPEARL. For better structuring of my test suite I want to implement include file handling similar to C/C++. The parser itself uses the visitors. What would be the best approach to implement this? One thing that concerns me: when instantiating a nested parser, the included file does not need to contain a complete program, depending on where it is included.
Cheers
Marcel
I can't speak for ANTLR, but in general one implements a C-like preprocessor in the lexer.
You accomplish this by having a stack of input streams, with the base of the stack being the source file. You read input from the stream on top of the stack.
When an include is encountered in the lexer, a new stream is pushed on top of the stack, and reading continues (now from the new stream). When a stream encounters EOF, you pop the stack and continue; if the stack is empty, the lexer emits an EOF token.
You can abuse these streams to implement macros. On a macro call, simply push a new stream that represents the macro body. When you encounter a macro parameter name, push a stream containing the argument that was supplied for that parameter at the call site.
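Sketched with the ANTLR4 Java runtime, the stream stack could look roughly like this. MyLexer, the INCLUDE token and the way the file name is pulled out of the directive are assumptions for illustration, not part of any real grammar:

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.Token;

// MyLexer stands for the generated lexer; INCLUDE is assumed to be a single
// token that covers the whole include directive, e.g. INCLUDE 'file.name';
public class IncludeHandlingLexer extends MyLexer {

    private final Deque<CharStream> streams = new ArrayDeque<>();

    public IncludeHandlingLexer(CharStream input) {
        super(input);
    }

    @Override
    public Token nextToken() {
        Token token = super.nextToken();

        if (token.getType() == INCLUDE) {
            // Push the current stream and continue lexing inside the included file.
            String fileName = fileNameOf(token.getText());
            streams.push(getInputStream());
            try {
                setInputStream(CharStreams.fromFileName(fileName));
            } catch (IOException e) {
                throw new RuntimeException("cannot open include file " + fileName, e);
            }
            return nextToken(); // do not hand the INCLUDE token itself to the parser
        }

        if (token.getType() == EOF && !streams.isEmpty()) {
            // Included file exhausted: resume the including stream.
            setInputStream(streams.pop());
            return nextToken();
        }

        return token;
    }

    // Assumed directive form: INCLUDE 'file.name'; adapt to the real syntax.
    private static String fileNameOf(String directive) {
        int first = directive.indexOf('\'');
        int last = directive.lastIndexOf('\'');
        return directive.substring(first + 1, last);
    }
}

One caveat: this sketch relies on setInputStream not rewinding the stream that is being restored; if your runtime version does rewind it, keep the stream's index on the stack as well and seek back after popping.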
I have seen implementations where include handling has been done in the (parser) grammar. Doing it in the lexer like Ira suggests is certainly possible, but with some extra work.
However, full include handling is more than simply switching input streams: it also means macro handling, line splicing, trigraph handling, charizing and stringizing, plus an evaluator for #if(def) directives. I have implemented all of that in my Windows Resource File parser, which was written for ANTLR 2.7 and hence needs an update, but it is certainly good for getting ideas.
In this project I handle include files outside of the normal ANTLR parsing chain, which is closer to the preprocessor approach you often see for C/C++.

Code Editor using ANTLR

I'm in the process of writing a code editor for a custom language. We're using ANTLR for the lexer and parser, and CodeMirror for the editor framework (running in a browser).
I can do some basic things like syntax coloring of keywords, as well as providing rudimentary code completion.
What I'm finding is that the user is frequently in the middle of editing something, so the ANTLR parser is not very useful: the current input is not fully parseable, and the incomplete input often leads ANTLR down the wrong path.
Therefore, I'm using the token stream to figure out what's going on and attempt to provide context-sensitive help.
I'm wondering if anyone can provide some guidance around using ANTLR as part of a code editor. Am I on the right track using the token stream instead of the parse tree?
Can the ANTLR API be leveraged to do things like looking ahead for tokens to figure out the overall context of what the user is currently editing?
Sorry if this is kind of vague. Just getting started on this project. :-)
Thanks for any help.
I find ANTLR great for syntax checking and for easily retrieving details from valid input. For code completion, however, you have a different scenario. As you have already found out, the parser often cannot give you good answers, because the input is not valid while the user is typing. Bart has linked to an answer where Sam describes that he implemented a great solution using ANTLR 4, but unfortunately he doesn't describe how.
But even if you can get the parser to give you a set of expected tokens, what are you going to make of it? What would you show if, say, an identifier is expected? That can be anything: a class member, a variable name etc. I don't believe this is the answer, hence I developed my own solution, which I describe here: Universal Code Completion using ANTLR. This is for ANTLR 3, but it can certainly be made to work with 4 as well.
This article also contains several links to (C++) source code that shows how code completion is implemented in my application. It's amazing how simple the implementation is, after all, and it still delivers very precise results.
Update
Meanwhile I have developed a solution for ANTLR4 too, called antlr4-c3. This is a TypeScript solution, but it comes with translations to Java and C++.
If someone is interested in the integration of CodeMirror 6 and an ANTLR4 grammar, you can find an example on GitHub.
The main idea is here:
token: (stream, state) => {
    // get the tokens for the current stream.string, using the ANTLR4 TokenStream
    const tokens = getTokensForText(stream.string);
    // find the next token at or after the current stream position
    const nextToken = tokens.filter(t => t.startIndex >= stream.pos)[0];
    // try to match the next token in the current stream
    if (nextToken && stream.match(nextToken.text)) {
        let valueClass = getStyleNameByTag(tags.keyword);
        switch (nextToken.type) {
            case ...:
                valueClass = getStyleNameByTag(tags.string);
                break;
            ...
            default:
                valueClass = getStyleNameByTag(tags.keyword);
                break;
        }
        return valueClass;
    } else {
        stream.next();
        return null;
    }
}

Skip general rule enter/exit listener method in favor of more specific one? (ANTLR4)

I have generated a grammar in ANTLR4. A sample excerpt is shown below:
list : defunExpr  # defun
     | lambdaExpr # lambda
     | condExpr   # cond
     ...
     | items      # other
     ;
The rules are listed in order of priority and are matched as expected when testing the grammar. All of the higher-priority alternatives (#defun, #lambda, #cond, etc.) would also match as items (#other) if they did not match higher up, which is the expected behavior of placing higher-priority alternatives before lower ones.
I then implemented a simple listener-based application in Java, which simply formats the parsed code and prints it back out to the console. I have overridden the appropriate enter/exit methods for #defun, #lambda, #cond, etc. I would like to implement a generalized catch-all for items which do not match a more specific alternative. However, when I implement enter/exit methods for #other, they also execute for every rule matched further up the priority order, effectively outputting formatted code twice for alternatives such as #defun, #lambda, #cond, etc.
Is there some way to achieve this behavior? I have a handful of specific rules I want to implement, and then have a general case catch the others. The grammar parses properly (test rig shows expected behavior over numerous test cases), but the catch-all method (enterOther) seems to act upon the specific rules as well.
EDIT: Wow, after all this time and posting this question, I now actually believe it is a grammar error. I will leave the question open until I verify, however.
Thanks for the interest, guys. I'm not evaluating anything, just echoing parsed input, so listeners work fine. The grammar was actually fine and unambiguous. The catch-all rule (it was a catch-all, despite my not showing enough of the grammar here) worked fine. My problem (embarrassingly) was that while I wanted to write enter/exit #other methods, I was actually writing enter/exit Expr methods the whole time, which is why all the specific rules were triggered as well (since they are all Exprs). Embarrassing, but lesson learned. Thanks for the ideas and for taking the time. Cheers!
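To make the lesson concrete: with labeled alternatives, ANTLR 4 generates one enter/exit pair per label and none for the rule itself, so the catch-all behaviour comes from overriding the label-specific methods. A rough sketch, with ExprParser and ExprBaseListener standing in for the real generated names:

// ExprParser / ExprBaseListener stand in for the classes ANTLR generates from
// the grammar; the method names follow the # labels of the `list` rule above.
public class FormattingListener extends ExprBaseListener {

    @Override
    public void enterDefun(ExprParser.DefunContext ctx) {
        // specific handling for #defun only
    }

    @Override
    public void enterLambda(ExprParser.LambdaContext ctx) {
        // specific handling for #lambda only
    }

    @Override
    public void enterOther(ExprParser.OtherContext ctx) {
        // fires only for nodes that matched the #other alternative,
        // not for #defun/#lambda/#cond nodes
    }

    // Because every alternative of `list` carries a label, no enterList/exitList
    // methods are generated; overriding a method for a rule that every
    // alternative passes through (like enterExpr in the question) runs for all
    // of them, which is exactly the doubled output described above.
}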

Is it feasible to use Antlr for source code completion?

I don't know if this question is valid, since I'm not very familiar with source code parsing. My goal is to write a source code completion function for an existing programming language (language "X") for learning purposes.
Is ANTLR (v4) suitable for such a task, or should the necessary AST/parse tree creation and parsing be done by hand, assuming no existing solution exists?
I haven't found much information about that specific topic, apart from lists of compiler books, but a compiler is not what I'm after.
The code completion in GoWorks is implemented entirely using ANTLR 4. The following video shows what this code completion engine is capable of. The code completion example runs from the 5-minute mark through the end of the video.
Intro to Tunnel Vision Labs' GoWorks IDE (Preview Release)
I have been working on code completion algorithms for many years, and strongly believe that there is no better solution (automated or manual) for producing a code completion solution for a new language that meets the requirements for what I would call highly-responsive code completion. If you are not interested in that level of performance or accuracy, other solutions may be easier for you to get involved with (I don't work with those personally, because I am too easily disappointed in the results).
Xtext uses ANTLR3 and has good autocomplete facilities. The problem is, it generates a separate parser (again using ANTLR3) for autocomplete processing, which is derived from AbstractInternalContentAssistParser. This multi-thousand-line piece of code shows that the Xtext team found the error recovery of ANTLR3 alone to be insufficient.
Meanwhile, ANTLR4 has a method parser.getExpectedTokensWithinCurrentRule() which lists the possible token types for the given position. It works when used in a ParseTreeListener. What remains is semantics, scoping etc., which is outside ANTLR's scope.
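A minimal sketch of that call in use, assuming generated classes named MyLexer, MyParser and MyBaseListener and an entry rule called startRule (all placeholder names). The listener is attached with addParseListener so it fires while the parse is still in progress:

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.misc.IntervalSet;

public final class ExpectedTokensDemo {

    public static void dumpExpectedTokens(String editorText) {
        MyLexer lexer = new MyLexer(CharStreams.fromString(editorText));
        MyParser parser = new MyParser(new CommonTokenStream(lexer));

        parser.addParseListener(new MyBaseListener() {
            @Override
            public void enterEveryRule(ParserRuleContext ctx) {
                // Token types the parser would accept next inside this rule.
                IntervalSet expected = parser.getExpectedTokensWithinCurrentRule();
                System.out.println(parser.getRuleNames()[ctx.getRuleIndex()]
                        + " expects " + expected.toString(parser.getVocabulary()));
            }
        });

        parser.startRule(); // placeholder for the grammar's entry rule
    }
}

For completion you would not print everything, but record the expected set for the rule that encloses the caret token and map those token types to completion items.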

OSLO, ANTLR or other parser grammar, for parsing QUERY EXPRESSION

Greetings
I'm working on a project that requires me to write queries in text form, then convert them to some easily processed nodes to be consumed by some ambiguous repository. Of everything there, the part I'm least interested in is the part that converts the text to nodes. I'm hoping it's already been done somewhere.
Because I'm making stuff up as I go, I chose to use a LINQish expression syntax.
from m in Movie select m.A, m.B
I started parsing it manually and got the basics, but it's pretty cheesy. I'm looking for a better solution. I made some progress using MGrammar, but it would be nice if such a thing already existed. Does anyone know of anything that already does this? I looked for existing ANTLR templates, but had no luck.
Thanks for the help.
You could start with a full C# grammar and throw away everything but the LINQ syntax :-}
The DMS Software Reengineering Toolkit is a tool for building parsers/program analyzers/transformers that has a full C# 4.0 front end, including all the LINQ syntax.
Try this example from the Pyparsing wiki Examples page. It should give you a start.