Let us assume I am using the Python3.g4 grammar:
How can I write a program that parses a Python script, walks the syntax tree, and outputs the same, unchanged program?
Later I want to apply some changes to the program, but for now I would be happy if I could just reproduce the program.
My guess by now is that I lose some information when walking an abstract syntax tree, and that there is no easy way to do this.
The problem is that you are losing the content of all tokens on the hidden channel. Therefore, for each token in your parse tree, you would have to check whether there is a hidden token next to it in the TokenStream that is not listed in the ParseTree.
The methods getHiddenTokensToRight and getHiddenTokensToLeft in BufferedTokenStream should be the right tools for this job.
However, if you just want a reproduction of the parsed input, you can access the TokenStream directly and read out token after token without caring which channel each token is on.
With that you should be able to reproduce the original input without much trouble.
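A minimal sketch of that token-stream approach, assuming the lexer generated from Python3.g4 is called Python3Lexer (the class name depends on your grammar). Note that anything the lexer skips outright (as opposed to putting it on a hidden channel) is not in the stream and cannot be recovered this way:
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class ReproduceInput {
    public static void main(String[] args) throws Exception {
        // Lex the script; the buffered stream keeps tokens from all channels.
        Python3Lexer lexer = new Python3Lexer(CharStreams.fromFileName(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill(); // buffer every token up to EOF

        StringBuilder out = new StringBuilder();
        for (Token t : tokens.getTokens()) {
            if (t.getType() != Token.EOF) {
                out.append(t.getText()); // hidden-channel tokens (whitespace, comments) included
            }
        }
        System.out.print(out);
    }
}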
Related
I have a grammar file which parses a specific file type. Now I need a simple thing.
I have a parser rule, and when the parser rule doesn't match the input tokens I need to throw my own custom exception.
That is, when my input file produces an "extraneous input" error because the parser expects something and the input file doesn't have it, I want to throw an exception in that scenario.
Is this possible?
If yes, how?
If no, is there a workaround?
I'm a beginner with ANTLR.
grammar Test;
exampleParserRule : [a-z]+ ;
My input file contains 12345, and I need to throw a custom exception in that case.
For parsing issues such as this, ANTLR will, internally, throw certain exceptions, but they are caught and handled by an ErrorListener. By default, ANTLR will hook up a ConsoleErrorListener that just formats and writes error messages to the console. But it will then continue processing, attempting to recover from the error, and sync back up with your input. This is something you want in a parser. It’s not very useful to have a parser just report the first problem it encounters and then exception out.
You can implement your own ErrorListener (there’s a BaseErrorListener class you can subclass). There you can handle the reported error yourself (the method you override provides a lot of detailed information about the error) and produce whatever message you’d like. (You can also do things like collect all the errors in a list, keep track of error levels, etc.)
In short, you probably don’t want a different exception, you want a different error message.
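For illustration, here is a minimal sketch of such a listener (the exception type is just a stand-in, not something ANTLR provides) that throws on the first syntax error; collecting the messages instead is usually the better choice:
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

public class ThrowingErrorListener extends BaseErrorListener {
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        // Replace IllegalStateException with your own exception type,
        // or append the message to a list instead of throwing.
        throw new IllegalStateException("line " + line + ":" + charPositionInLine + " " + msg);
    }
}
Hook it up with parser.removeErrorListeners() (to drop the default ConsoleErrorListener) followed by parser.addErrorListener(new ThrowingErrorListener()).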
Sometimes, depending on how difficult it is to sort out your particular situation for a custom message, it’s really better to look for it in a Listener that processes the parse tree that ANTLR hands back. (Note: a very common path beginners take is to try to get everything into the grammar. It’s going to be hard to get really nice error messages if you do this. ANTLR is pretty good with error messages, but, as a generalized tool, its messages stay generic; in your own code it’s much more likely you can produce meaningful, domain-specific messages.)
Just try to get ANTLR to produce a parse tree that accurately reflects the structure of your input. Then you can walk the ParseTree with a validation Listener in your own code, producing your own messages.
Another “trick” that doesn’t occur to many ANTLR devs (early on), is that the grammar doesn’t HAVE to ONLY include rules for valid input. If there’s some particular input you want to give a more helpful error message for, you can add a rule to match that (invalid) input, and when you encounter that Context in your validation listener, generate an error message specific to that construct.
BTW… [a-z]+ would almost always be a Lexer rule. If you don’t yet understand the difference between lexer and parser rules, or the processing pipeline ANTLR uses to tokenize an input stream, and then parse a token stream, do yourself a favor and get a firm understanding of those basics. ANTLR is going to be very confusing without that basic understanding. It’s pretty simple, but very important to understand.
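For illustration, one way the example grammar could be restructured (the rule names here are just suggestions) so that the character set lives in a lexer rule:
grammar Test;

exampleParserRule : WORD+ ;

WORD : [a-z]+ ;   // lexer rule: produces WORD tokens the parser rule can reference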
You can do this in your grammar:
grammar Test;
@header {
package $your.$package;
import $your.$package.$yourExceptionClass;
}
exampleParserRule : [a-z]+ ;
catch [RecognitionException re] {
reportError(re);
recover(input,re);
retval.tree = (CommonTree)adaptor.errorNode(input, retval.start, input.LT(-1), re);
String msg = getErrorMessage(re, this.getTokenNames());
throw new $yourExceptionClass(msg, re);
}
It's up to you whether you really want to call reportError (which logs to the console), recover, etc., but these are the defaults, so it may be good to keep them.
Also, you may want to generate a more human-readable error message (use getErrorMessage for that).
If you are doing more complex work, follow @Mike Cargal's advice above.
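Note that the @header/catch code above targets the ANTLR 3 code-generation model (retval, adaptor and CommonTree do not exist in ANTLR 4). If you are on ANTLR 4 and really do want the parse to abort with an exception on the first error, a hedged alternative is the built-in BailErrorStrategy; TestParser and YourExceptionClass below are stand-ins for your generated parser and your own exception type:
import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.misc.ParseCancellationException;

void parseOrThrow(TestParser parser) throws YourExceptionClass {
    // Swap the default error strategy so the parser throws instead of recovering.
    parser.setErrorHandler(new BailErrorStrategy());
    try {
        parser.exampleParserRule();
    } catch (ParseCancellationException pce) {
        throw new YourExceptionClass("parse failed", pce); // wrap in your own exception type
    }
}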
I'm building a single-page application in Elm and was having difficulty deciding how to split my code into files.
I ended up splitting it using 1 module per page and have Main.elm convert the Html and Cmd emitted by each page using Cmd.map and Html.map.
My issue is that the documentation for both Cmd.map and Html.map says that:
This is very rarely useful in well-structured Elm code, so definitely read the section on structure in the guide before reaching for this!
I checked the only 2 large apps I'm aware of:
elm-spa-example uses Cmd.map (https://github.com/rtfeldman/elm-spa-example/blob/cb32acd73c3d346d0064e7923049867d8ce67193/src/Main.elm#L279)
I was not able to figure out how https://github.com/elm/elm-lang.org deals with the issue.
Also, both answers to this stackoverflow question suggest using Cmd.map without second thoughts.
Is Cmd.map the "right" way to split a single-page application into modules?
I think sometimes you just have to do what's right for you. I used the Cmd.map/Sub.map/Html.map approach for an application I wrote that had 3 "pages" - Initializing, Editing and Reporting.
I wanted to make each of these pages its own module as they were relatively complicated, each had a fair number of messages that are only relevant to each page, and it's easier to reason about each page independently in its own context.
The downside is that the compiler won't prevent you from receiving the wrong message for a given page, leading to a runtime error. For example, if the application receives an Editing.Save message while it is on the Reporting page, what is the correct behavior? In my implementation I just log it to the console and move on; that was good enough for me (and it never happened anyway). Other options I've considered include displaying a nasty error page to indicate that something horrible has happened (a BSOD, if you will), or simply resetting/reinitializing the entire application.
An alternative is to use the effect pattern as described extensively in this discourse post.
The core of this approach is that:
The extended Effect pattern used in this application consists in defining an Effect custom type that can represent all the effects that init and update functions want to produce.
And the main benefits:
All the effects are defined in a single Effect module, which acts as an internal API for the whole application that is guaranteed to list every possible effect.
Effects can be inspected and tested, unlike Cmd values. This makes it possible to test all the application's effects, including simulated HTTP requests.
Effects can represent a modification of top-level model data, like the Session when logging in, or the current page when a URL change is requested by a subpage update function.
All the update functions keep a clean and concise Msg -> Model -> ( Model, Effect Msg ) signature.
Because Effect values carry the minimum information required, some parameters like the Browser.Navigation.key are needed only in the effects' perform function, which frees the developer from passing them to functions all over the application.
A single NoOp or Ignored String can be used for the whole application.
I'm in the process of writing a code editor for a custom language. We're using ANTLR for the lexer and parser, and CodeMirror for the editor framework (running in a browser).
I can do some basic things like syntax coloring of keywords, as well as providing rudimentary code completion.
What I'm finding is that the user is frequently in the middle of editing something, so the ANTLR parser is not very useful since the current input stream is not fully parseable (and often leads ANTLR down the wrong path due to an incomplete input stream).
Therefore, I'm using the token stream to figure out what's going on and attempt to provide context-sensitive help.
I'm wondering if anyone can provide some guidance around using ANTLR as part of a code editor. Am I on the right track using the token stream instead of the parse tree?
Can the ANTLR API be leveraged to do things like looking ahead for tokens to figure out the overall context of what the user is currently editing?
Sorry if this is kind of vague. Just getting started on this project. :-)
Thanks for any help.
I find ANTLR great for syntax checking and easy retrieval of details from valid input. For code completion, however, you have a different scenario. As you found out already, the parser can often not give you good answers, because the input is not valid while the user is typing. Bart has linked to an answer where Sam described that he implemented a great solution using ANTLR 4, but unfortunately he doesn't describe how.
But even if you can get the parser to give you a set of expected tokens, what are you going to make of it? What do you want to show if, say, an identifier is expected? That can be anything, like a class member, a variable name, etc. I don't believe this is the answer, hence I developed my own solution, which I describe here: Universal Code Completion using ANTLR. This is for ANTLR 3, but it can certainly be made to work with 4 as well.
This article also contains several links to (C++) source code that shows how code completion is implemented in my application. It's amazing how simple the implementation is, after all, but still delivers very precise results.
Update
Meanwhile I have developed a solution for ANTLR4 too, called antlr4-c3. This is a TypeScript solution, but it comes with ports to Java and C++.
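As a rough, hedged sketch of using antlr4-c3 from Java (the package name and exact signatures shown here are assumptions; check the project's README for the real API), the idea is to ask the engine which token types and rules can follow the caret:
import com.vmware.antlr4c3.CodeCompletionCore;  // assumed package of the Java port

// parser: your generated ANTLR4 parser, already fed the (possibly incomplete) input
// caretTokenIndex: index of the token under the editor caret
CodeCompletionCore core = new CodeCompletionCore(parser, null, null);
var candidates = core.collectCandidates(caretTokenIndex, null);
// candidates.tokens lists the token types that may follow the caret,
// candidates.rules lists the parser rules the caret sits in; both feed your completion list.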
If someone is interested in the integration of CodeMirror 6 and an ANTLR4 grammar, you can find an example on GitHub.
The main idea is here:
token: (stream, state) => {
  // get the tokens for the current stream.string, using the ANTLR4 TokenStream
  const tokens = getTokensForText(stream.string);
  // find the next token relative to the current stream position
  const nextToken = tokens.filter(t => t.startIndex >= stream.pos)[0];
  // match the next token in the current stream (guard against running out of tokens)
  if (nextToken && stream.match(nextToken.text)) {
    let valueClass = getStyleNameByTag(tags.keyword);
    switch (nextToken.type) {
      case ...:
        valueClass = getStyleNameByTag(tags.string);
        break;
      ...
      default:
        valueClass = getStyleNameByTag(tags.keyword);
        break;
    }
    return valueClass;
  } else {
    stream.next();
    return null;
  }
}
I am using ANTLR4 to parse code in my Netbeans Platform application. I have successfully implemented syntax highlighting using ANTLR4 and Netbeans mechanisms.
I have also implemented a simple code completion for two of my tokens. At the moment I am using a simple implementation from a tutorial, which searches for a whitespace and starts the completion process from there. This works, but it forces the user to type a whitespace before starting code completion.
My question: is it possible or even contemplated using ANTLR's lexer to determine which tokens are currently read from the input to determine the correct completion item?
I would appreciate any pointers in the right direction to improve this behaviour.
Not really an answer, but I do not have enough reputation points to post comments.
is it possible or even contemplated using ANTLR's lexer to determine which tokens are currently read from the input to determine the correct completion item?
Have a look here: http://www.antlr3.org/pipermail/antlr-interest/2008-November/031576.html
and here: https://groups.google.com/forum/#!topic/antlr-discussion/DbJ-2qBmNk0
Bear in mind that the first post was written in 2008 and the current ANTLR v4 is very different from the version available at the time, which is why Sam’s opinion on this topic appears to have evolved.
My personal experience: most of what you are asking is probably doable with ANTLR, but you would have to know ANTLR very well. A more straightforward option is to use ANTLR to gather information about the context and then use your own heuristics to decide what needs to be shown in that context.
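As a hedged sketch of that more straightforward option (all names here are illustrative, not from any particular project), you can buffer the token stream and look at the tokens around the caret to decide which completion items make sense:
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

/** Returns the last default-channel token starting at or before the caret offset, or null. */
static Token tokenAtCaret(CommonTokenStream tokens, int caretOffset) {
    tokens.fill(); // make sure every token is buffered
    Token result = null;
    for (Token t : tokens.getTokens()) {
        if (t.getType() == Token.EOF || t.getChannel() != Token.DEFAULT_CHANNEL) {
            continue; // skip EOF and hidden-channel tokens (whitespace, comments)
        }
        if (t.getStartIndex() <= caretOffset) {
            result = t;      // remember the closest token before the caret
        } else {
            break;           // tokens are ordered by start index
        }
    }
    return result;
}
A heuristic on top of this might be: if the token at the caret is a '.', offer member completions; if it follows an import keyword, offer module names; and so on.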
The ANTLRv3 grammar https://sourceware.org/git/?p=frysk.git;a=blob_plain;f=frysk-core/frysk/expr/CExpr.g;hb=HEAD implements context-sensitive completion of C expressions (no macros).
For instance, if fed the string:
a_struct->a<tab>
it would just list the fields of "a_struct" starting with "a" (the tab could, technically, be any character or marker).
The technique it used was to:
modify a C grammar to recognize both IDENT and IDENT_TAB tokens
for IDENT_TAB capture the partial expression AST and "TOKEN_TAB" and throw them back to 'main' (there are hacks to help capture the AST)
'main' then performs a type-eval on the partial expression (compute the expression's type not value) and use that to expand TOKEN_TAB
The same technique, while not exactly ideal, can certainly be used with ANTLR v4.
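For illustration, a minimal ANTLR4 lexer fragment of that idea (the rule names are made up here, not taken from the frysk grammar) could look like:
IDENT     : [a-zA-Z_] [a-zA-Z0-9_]* ;
// same shape, but terminated by the completion marker (a literal tab here);
// the parser can then treat IDENT_TAB specially and hand the partial expression
// back to the completion engine
IDENT_TAB : [a-zA-Z_] [a-zA-Z0-9_]* '\t' ;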
I'm trying to port pydoctor to twisted.web.template and have hit a pretty basic problem: pydoctor uses epydoc to render docstrings into HTML but I can't see a way to include this HTML in the generated page without escaping. What can I do?
There is, somewhat intentionally, no way to insert HTML into the page without parsing; twisted.web.template is a bit more of a stickler about producing correct output than nevow was.
There are a couple of ways around this.
1. Ultimately, your HTML is going to some kind of output stream. You could simply insert a renderer that returns a pair of Deferred objects and does a .write to the underlying stream after the first one fires but before the second. Kind of gross, but it effectively expresses your intent :).
2. You can simply re-parse the output of epydoc into HTML using XMLString or similar, so that twisted.web.template can write it out correctly. This will "waste" a little bit of CPU, but in my opinion it will be worth it for (A) the stress-test it will give t.w.t and (B) the guarantee (presuming that t.w.t is correct) that you're emitting valid HTML.
As I was writing this answer, however, I realized that point 2 isn't generally possible with arbitrary HTML with the current public API of twisted.web.template. Ideally, you could use html5lib to parse this stuff, and then just dump the parsed input into your document tree.
If you don't mind mucking around with private API, you could probably hook up html5lib's SAX support to the internal SAX parser that we use to load templates.
Of course, the real solution is to fix the ticket you already filed, so you don't have to use private API outside of Twisted itself...