Throw Custom Exception From ANTLR4 Grammar File - antlr

I have a grammar file that parses a specific file type. Now I need a simple thing.
I have a parser rule, and when the input tokens do not satisfy that rule I need to throw my own custom exception.
That is, my input file currently gives me an "extraneous input" error because the parser is expecting something that the input file does not have. I want to throw an exception in this scenario.
Is this possible?
If yes, how?
If not, is there any workaround?
I'm a beginner with ANTLR.
grammar Test;
exampleParserRule : [a-z]+ ;
My input file contains 12345. Now I need to throw a custom exception.

For parsing issues such as this, ANTLR will, internally, throw certain exceptions, but they are caught and handled by an ErrorListener. By default, ANTLR will hook up a ConsoleErrorListener that just formats and writes error messages to the console. But it will then continue processing, attempting to recover from the error, and sync back up with your input. This is something you want in a parser. It’s not very useful to have a parser just report the first problem it encounters and then exception out.
You can implement your own ErrorListener (there’s a BaseErrorListener class you can subclass). There you can handle the reported error yourself (the method you override provides a lot of detailed information about the error) and produce whatever message you’d like. (You can also do things like collect all the errors in a list, keep track of error levels, etc.)
In short, you probably don’t want a different exception, you want a different error message.
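A minimal sketch of the "collect the errors, then decide what to do" idea, using the Python target. The names CollectingErrorListener, SyntaxProblem and raise_if_errors are made up for illustration; only the syntaxError signature comes from the ANTLR 4 runtime. The import falls back to a plain base class so the sketch also runs without the runtime installed.

```python
try:
    # ANTLR 4 Python runtime; ErrorListener is the base class to subclass.
    from antlr4.error.ErrorListener import ErrorListener
except ImportError:
    # Fallback so the sketch runs without the runtime (assumption for demo only).
    class ErrorListener:
        pass


class SyntaxProblem(Exception):
    """Hypothetical custom exception carrying every collected message."""

    def __init__(self, messages):
        super().__init__("; ".join(messages))
        self.messages = list(messages)


class CollectingErrorListener(ErrorListener):
    """Records syntax errors instead of printing them to the console."""

    def __init__(self):
        super().__init__()
        self.messages = []

    # Called by the parser for each syntax error it reports.
    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        self.messages.append(f"line {line}:{column} {msg}")

    def raise_if_errors(self):
        # Raise once, after parsing, so the parser can still recover
        # and report every problem rather than only the first one.
        if self.messages:
            raise SyntaxProblem(self.messages)
```

With the real runtime you would call parser.removeErrorListeners() and parser.addErrorListener(listener) before invoking the start rule, and listener.raise_if_errors() once parsing has finished.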
Sometimes, depending on how difficult it is to sort out your particular situation for a custom message, it’s really better to look for it in a Listener that processes the parse tree that ANTLR hands back. (Note: a very common path beginners take is to try to get everything into the grammar. It’s going to be hard to get really nice error messages if you do this. ANTLR is pretty good with error messages, but it is a general-purpose tool, so you can usually produce more meaningful messages yourself.)
Just try to get ANTLR to produce a parse tree that accurately reflects the structure of your input. Then you can walk the ParseTree with a validation Listener of your own code, producing your own messages.
Another “trick” that doesn’t occur to many ANTLR devs (early on), is that the grammar doesn’t HAVE to ONLY include rules for valid input. If there’s some particular input you want to give a more helpful error message for, you can add a rule to match that (invalid) input, and when you encounter that Context in your validation listener, generate an error message specific to that construct.
BTW… [a-z]+ would almost always be a Lexer rule. If you don’t yet understand the difference between lexer and parser rules, or the processing pipeline ANTLR uses to tokenize an input stream, and then parse a token stream, do yourself a favor and get a firm understanding of those basics. ANTLR is going to be very confusing without that basic understanding. It’s pretty simple, but very important to understand.

You can do this in your grammar:
grammar Test;
@header {
package $your.$package;
import $your.$package.$yourExceptionClass;
}
exampleParserRule : [a-z]+ ;
catch [RecognitionException re] {
    reportError(re);
    recover(input,re);
    retval.tree = (CommonTree)adaptor.errorNode(input, retval.start, input.LT(-1), re);
    String msg = getErrorMessage(re, this.getTokenNames());
    throw new $yourExceptionClass(msg, re);
}
It's up to you whether you really want to call reportError (which logs to the console), recover, etc., but these are the defaults, so it may be good to keep them. Note that retval.tree and adaptor are ANTLR 3 constructs, so the handler body needs adapting for an ANTLR 4 grammar.
Also, you may want to generate a more human-readable error message (use getErrorMessage).
If you do more complex work, follow @mike-cargal's advice.

Related

How to write an Identity-Visitor

Let us assume I am using the Python3.g4 grammar:
How can I write a program that parses a python-script, walks the syntax tree and outputs the same old program?
Later I want to apply some changes to the program, but for now I would be happy if I could just reproduce the program.
By now I guess I lose some information when walking an abstract syntax tree and there is no easy way.
The problem is that you are losing all the content of the tokens on a hidden channel. Therefore, for each token in your parse tree, you have to check whether there is a hidden token next to it in the TokenStream that is not listed in the ParseTree.
The methods getHiddenTokensToRight and getHiddenTokensToLeft of BufferedTokenStream should be the right tools for this.
However if you just want a reproduction of the parsed input you should try to access the TokenStream directly and read out Token after Token from there without taking care of the channel the Token is on.
With that you should be able to reproduce the original input without many problems.
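To illustrate why reading every token regardless of channel reproduces the input, here is a small self-contained sketch. It does not use ANTLR: the regex tokenizer, the Token class and the channel constants are stand-ins I made up to mimic a lexer that sends whitespace and comments to a hidden channel.

```python
import re
from dataclasses import dataclass

DEFAULT, HIDDEN = 0, 1  # stand-ins for a lexer's token channels


@dataclass
class Token:
    text: str
    channel: int


# Whitespace and '#' comments go to the hidden channel, everything else
# is a visible token. Every input character matches one alternative.
TOKEN_RE = re.compile(r"(?P<hidden>\s+|#[^\n]*)|(?P<visible>\w+|[^\w\s])")


def tokenize(source):
    tokens = []
    for m in TOKEN_RE.finditer(source):
        channel = HIDDEN if m.lastgroup == "hidden" else DEFAULT
        tokens.append(Token(m.group(), channel))
    return tokens


source = "x = 1  # set x\nprint(x)\n"
tokens = tokenize(source)

# Only what a parse tree would see: layout and comments are gone.
visible_only = "".join(t.text for t in tokens if t.channel == DEFAULT)

# Every token, whatever its channel: the original text comes back.
all_tokens = "".join(t.text for t in tokens)
```

Here visible_only is "x=1print(x)", while all_tokens equals source exactly.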

ANTLR4: What is the best approach to implement C like include file handling?

I am implementing a lexer/parser for the real-time language OpenPEARL. To structure my test suite better, I want to implement include file handling similar to C/C++. The parser itself uses visitors. What would be the best approach to implement this? One thing that concerns me about instantiating a nested parser is that the included file does not need to contain a complete program, depending on where it is included.
Cheers
Marcel
I can't speak for ANTLR, but in general one implements a C-like preprocessor in the lexer.
You accomplish this by having a stack of input streams, with the base of the stack being the source file. You read input from the stream on top of the stack.
When an include is encountered in the lexer, a new stream is pushed on top of the stack, and reading continues (now from the new stream). When a stream encounters EOF, you pop the stack and continue; if the stack is empty, the lexer emits an EOF token.
You can abuse these streams to implement macros. On macro call, simply push a new stream that represents the macro body. When you encounter a macro parameter name, push a stream for the argument supplied to the corresponding macro.
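The stack-of-streams idea described above can be sketched roughly as follows. This is not ANTLR code: IncludingReader, expand and the "#include name" directive syntax are assumptions for illustration (OpenPEARL's actual directive may look different), and the in-memory FILES dictionary stands in for the file system.

```python
import io


class IncludingReader:
    """Reads lines from a stack of streams; the main file is the base."""

    def __init__(self, main_name, open_stream):
        self._open = open_stream
        self._stack = [open_stream(main_name)]

    def include(self, name):
        # Push the included file; reading continues from it.
        self._stack.append(self._open(name))

    def read_line(self):
        while self._stack:
            line = self._stack[-1].readline()
            if line:
                return line
            # Current stream hit EOF: pop it and resume the one below.
            self._stack.pop().close()
        return None  # stack empty: real end of input


def expand(main_name, open_stream):
    reader = IncludingReader(main_name, open_stream)
    out = []
    while (line := reader.read_line()) is not None:
        if line.startswith("#include "):  # hypothetical directive syntax
            reader.include(line.split(maxsplit=1)[1].strip())
        else:
            out.append(line)
    return "".join(out)


FILES = {
    "main.prl": "A\n#include defs.inc\nB\n",
    "defs.inc": "X\n#include more.inc\nY\n",
    "more.inc": "M\n",
}
result = expand("main.prl", lambda name: io.StringIO(FILES[name]))
```

result is "A\nX\nM\nY\nB\n". The sketch has no guard against a file including itself, which a real implementation would need.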
I have seen implementations where include handling has been done in the (parser) grammar. Doing it in the lexer like Ira suggests is certainly possible, but with some extra work.
However, full include handling is more than simply switching input streams: it also involves macro handling, line splicing, trigraph handling, charizing and stringizing, plus an evaluator for #if/#ifdef directives. I have implemented all of that in my Windows Resource File Parser, which was written for ANTLR 2.7 and hence needs an update, but is certainly good for getting ideas.
In this project I handle include files outside of the normal ANTLR parsing chain, which follows more the preprocessor approach you often see for C/C++.

Using ANTLR4 lexing for Code Completion in Netbeans Platform

I am using ANTLR4 to parse code in my Netbeans Platform application. I have successfully implemented syntax highlighting using ANTLR4 and Netbeans mechanisms.
I have also implemented a simple code completion for two of my tokens. At the moment I am using a simple implementation from a tutorial, which searches for a whitespace and starts the completion process from there. This works, but it forces the user to type a whitespace before starting code completion.
My question: is it possible or even contemplated using ANTLR's lexer to determine which tokens are currently read from the input to determine the correct completion item?
I would appreciate every pointer in the right direction to improve this behaviour.
Not really an answer, but I do not have enough reputation points to post comments.
is it possible or even contemplated using ANTLR's lexer to determine which tokens are currently read from the input to determine the correct completion item?
Have a look here: http://www.antlr3.org/pipermail/antlr-interest/2008-November/031576.html
and here: https://groups.google.com/forum/#!topic/antlr-discussion/DbJ-2qBmNk0
Bear in mind that the first post was written in 2008 and the current ANTLR v4 is very different from the version available at the time, which is why Sam’s opinion on this topic appears to have evolved.
My personal experience - most of what you are asking is probably doable with antlr, but you would have to know antlr very well. A more straightforward option is to use antlr to gather information about the context and use your own heuristics to decide what needs to be shown in this context.
The ANTLRv3 grammar https://sourceware.org/git/?p=frysk.git;a=blob_plain;f=frysk-core/frysk/expr/CExpr.g;hb=HEAD implements context sensitive completion of C expressions (no macros).
For instance, if fed the string:
a_struct->a<tab>
it would just list the fields of "a_struct" starting with "a" (tab could, technically, be any character or marker).
The technique it used was to:
modify a C grammar to recognize both IDENT and IDENT_TAB tokens
for IDENT_TAB capture the partial expression AST and "TOKEN_TAB" and throw them back to 'main' (there are hacks to help capture the AST)
'main' then performs a type-eval on the partial expression (compute the expression's type not value) and use that to expand TOKEN_TAB
The same technique, while not exactly ideal, can certainly be used in ANTLRv4.
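As a much simpler illustration than the frysk grammar, here is a sketch of using a lexer, rather than a search for whitespace, to find what the user is currently typing. The regex tokenizer is only a stand-in for an ANTLR lexer, and prefix_at_caret and complete are names I invented; the candidate list would come from your own context analysis.

```python
import re

# Stand-in lexer: identifiers, numbers, '->', whitespace, any other char.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|->|\s+|.")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def prefix_at_caret(text, caret):
    """Return the partial identifier that ends at the caret, or ''."""
    tokens = [m.group() for m in TOKEN_RE.finditer(text[:caret])]
    if tokens and IDENT_RE.fullmatch(tokens[-1]):
        return tokens[-1]
    return ""


def complete(text, caret, candidates):
    """Filter candidates by the identifier prefix under the caret."""
    prefix = prefix_at_caret(text, caret)
    return sorted(c for c in candidates if c.startswith(prefix))


line = "a_struct->a"
fields = ["alpha", "beta", "age"]  # hypothetical fields of a_struct
suggestions = complete(line, len(line), fields)
```

suggestions is ["age", "alpha"]. Because the token boundary comes from the lexer, completion works directly after "->" or "(" without the user typing a space first.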

memory release and error handling in antlr3 c target

I have a couple of questions about C target of antlr. (I am using libantlr3c-3.4)
Since there is no garbage collection, I have to clean up the memory myself, so I want to throw away all parser data after my data structures are populated and parsing is completed. Is there a simple way to delete the entire parser memory, instead of walking through each and every object and deleting it explicitly? (I noticed a comment in antlr3string.h that this is possible, but I could not find a code example of how to do it.)
My parser is working fine when the input is in correct syntax. But when the input syntax is wrong, it reports an error and gives a segmentation fault. I guess this is because there is no catch-throw of exceptions in C (unlike java). How to make the exit graceful in such situations? (I saw an answer on this topic - 4751699 - but that was more than 2 years ago and an older version of antlr .. just wanted to confirm if that answer is still valid, or some other stuff has to be done.)
Cleanup after you are done is simple unless you are manually creating your own structures. All that's needed is:
pANTLR3_INPUT_STREAM _input;
pMySQLLexer _lexer;
pANTLR3_COMMON_TOKEN_STREAM _tokens;
pMySQLParser _parser;
MySQLParser_query_return _ast;
/* ... create the objects and run the parser ... */
/* Free in reverse order of creation. */
_parser->free(_parser);
_tokens->free(_tokens);
_lexer->free(_lexer);
_input->close(_input);
No need to free the tree in the stored ast, since the nodes are from a pool that gets freed when you free the parser.
For the invalid input: there must be something wrong in your error handler. ANTLR doesn't throw an exception if the input is wrong. See where the exception comes from. You are probably accessing an element that you think exists but doesn't.

Should examples--even beginner examples--include the error-handling code?

Brian Kernighan was asked this question in a recent interview. I'll quote his reply:
Brian: I'm torn on this. Error-handling code tends to be bulky and very uninteresting and uninstructive, so it often gets in the way of learning and understanding the basic language constructs. At the same time, it's important to remind programmers that errors do happen and that their code has to be able to cope with errors.
My personal preference is to pretty much ignore error handling in the earlier parts of a tutorial, other than to mention that errors can happen, and similarly to ignore errors in most examples in reference manuals unless the point of some section is errors. But this can reinforce the unconscious belief that it's safe to ignore errors, which is always a bad idea.
I often leave off error handling in code examples here and on my own blog, and I've noticed that this is the general trend on Stack Overflow. Are we reinforcing bad habits? Should we spend more time polishing examples with error handling, or does it just get in the way of illustrating the point?
I think it might be an improvement if when posting example code we at least put comments in that say you should put error handling code in at certain spots. This might at least help somebody using that code to remember that they need to have error handling. This will keep the extra code for error handling out but will still reinforce the idea that there needs to be error handling code.
Any provided example code will be copy-pasted into production code at least once, so be at your best when writing it.
Beyond the question of cluttering the code when you're demonstrating a coding point, I think the question becomes, how do you choose to handle the error in your example code?
That is to say, what do you do? What's fatal for one application is non-fatal for another. For example, if I can't retrieve some info from a web server (be it a 404 error or a non-responsive server), that may be fatal if you can't do anything without that data. But if that data is supplementary to what you're doing, then perhaps you can live without it.
So the above may point to simply logging the error. That's better than ignoring the error completely. But I think often the difficulty is in knowing how/when (and when not) to recover from an error. Perhaps that's a whole new tutorial in itself.
Examples should be illustrative. They should always show the point being made clearly with as little distraction as possible. Here's a meta-example:
Say we want to read a number from a file, add 3, and print it to the console. We'll need to demonstrate a few things.
infile = open("example.txt")
content = infile.read()
infile.close()
num = int(content)
print(3 + num)
Wordy, but correct, except there are a few things that could go wrong. First, what if the file didn't exist? What if it does exist but doesn't contain a number?
So we show how the errors would be handled.
try:
    infile = open("example.txt")
    content = infile.read()
    infile.close()
    num = int(content)
    print(3 + num)
except ValueError:
    print("Oops, the file didn't have a number.")
except IOError:
    print("Oops, couldn't open the file for some reason.")
After a few iterations of showing how to handle the errors raised by, in this case, file handling and parsing, we'd of course like to show a more Pythonic way of expressing the body of the try clause. Now we drop the error handling, because that's not what we're demonstrating.
First let's eliminate the unneeded extra variables.
infile = open("example.txt")
print(3 + int(infile.read()))
infile.close()
Since we're not writing to it, nor is it an expensive resource on a long-running process, it's actually safe to leave it open. It will close when the program terminates.
print(3 + int(open("example.txt").read()))
However, some might argue that's a bad habit and there's a nicer way to handle that issue. We can use a context manager to make it a little clearer. Of course we would explain that a file will close automatically at the end of a with block.
with open("example.txt") as infile:
    print(3 + int(infile.read()))
And then, now that we've expressed everything we wanted to, we show a complete example at the very end of the section. Also, we'll add some documentation.
# Open a file "example.txt", read a number out of it, add 3 to it and print
# it to the console.
try:
    with open("example.txt") as infile:
        print(3 + int(infile.read()))
except ValueError:  # in case int() can't understand what's in the file
    print("Oops, the file didn't have a number.")
except IOError:  # in case the file didn't exist.
    print("Oops, couldn't open the file for some reason.")
This is actually the way I usually see guides expressed, and it works very well. I usually get frustrated when any part is missing.
I think the solution is somewhere in the middle. If you are defining a function to find element 'x' in list 'y', you do something like this:
function a(x,y)
{
    assert(isvalid(x))
    assert(isvalid(y))
    logic()
}
There's no need to be explicit about what makes an input valid, just that the reader should know that the logic assumes valid inputs.
Not often I disagree with BWK, but I think beginner examples especially should show error handling code, as this is something that beginners have great difficulty with. More experienced programmers can take the error handling as read.
One idea I had would be to include a line like the following in your example code somewhere:
DONT_FORGET_TO_ADD_ERROR_CHECKING(); // You have been warned!
All this does is prevent the code compiling "off the bat" for anyone who just blindly copies and pastes it (since obviously DONT_FORGET_TO_ADD_ERROR_CHECKING() is not defined anywhere). But it's also a hassle, and might be deemed rude.
I would say that it depends on the context. In a blog entry or text book, I would focus on the code to perform or demonstrate the desired functionality. I would probably give the obligatory nod to error handling, perhaps, even put in a check but stub the code with an ellipsis. In teaching, you can introduce a lot of confusion by including too much code that doesn't focus directly on the subject at hand. In SO, in particular, shorter (but complete) answers seem to be preferred so handling errors with "a wave of the hand" may be more appropriate in this context as well.
That said, if I made a code sample available for download, I would generally make it as complete as possible and include reasonable error handling. The idea here is that for learning the person can always go back to the tutorial/blog and use that to help understand the code as actually implemented.
In my personal experience, this is one of the issues that I have with how TDD is typically presented -- usually you only see the tests developed to check that the code succeeds in the main path of execution. I would like to see more TDD tutorials include developing tests for alternate (error) paths. This aspect of testing, I think, is the hardest to get a handle on since it requires you to think, not of what should happen, but of all the things that could go wrong.
Error handling is a paradigm by itself; it normally shouldn't be included in examples, since it seriously obscures the point that the author is trying to get across.
If the author wants to pass knowledge about error handling in a specific domain or language then I would prefer as a reader to have a different chapter that outlines all the dominant paradigms of error handling and how this affects the rest of the chapters.
I don't think error handling should be in the example if it obscures the logic. But some error handling is simply part of the idiom for doing certain things, and in those cases you should include it.
Also, if you point out that error handling needs to be added, then for the love of deity also point out which errors need to be handled.
This is the most frustrating part of reading some examples. If you don't know what you are doing (which we have to assume of the reader of the example...) you don't know what errors to look for either. Which turns the "add error handling" suggestion into "this example is useless".
One approach I've seen, notably in Advanced Programming in the UNIX Environment and UNIX Network Programming is to wrap calls with error checking code and then use the wrappers in the example code. For instance:
ssize_t Recv(...)
{
    ssize_t result;
    result = recv(...);
    /* error checking in full */
    return result;
}
then, in calling code:
Recv(...);
That way you get to show error handling while allowing the flow of calling code to be clear and concise.
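The same wrapper idea, sketched in Python around the "read a number from a file" example used earlier in this thread. The function name read_number and the choice to exit via SystemExit are my own assumptions; the point is only that the error checking lives in one place.

```python
def read_number(path):
    """Wrapper: all the error checking lives here, so callers stay short."""
    try:
        with open(path) as infile:
            return int(infile.read())
    except OSError as exc:
        # Missing file, permission problem, and so on.
        raise SystemExit(f"cannot read {path}: {exc}")
    except ValueError:
        # The file was readable but did not contain an integer.
        raise SystemExit(f"{path} does not contain a number")
```

Example code can then simply say print(3 + read_number("example.txt")) and stay focused on the point being demonstrated.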
No, unless the purpose of the example is to demonstrate an aspect of exception handling. This is a pet peeve of mine -- many examples try to demonstrate best practices and end up obscuring and complicating the example. I see this all the time in code examples that start by defining a bunch of interfaces and inheritance chains that aren't necessary for the example. A prime example of over complicating was a hands-on lab I did at TechEd last year. The lab was on Linq, but the sample code I was directed to write created a multi-tier application for no purpose.
Examples should start with the simplest possible code that demonstrates the point, then progress into real-world usage and best practices.
As an aside, when I've asked for code samples from job candidates almost all of them are careful to demonstrate their knowledge of exception handling:
public void DoSomethingCool()
{
    try
    {
        // do something cool
    }
    catch (Exception ex)
    {
        throw ex;
    }
}
I've received hundreds of lines of code with every method like this. I've started to award bonus points for those that use throw; instead of throw ex;
Sample code need not include error handling, but it should otherwise demonstrate proper secure coding techniques. Many web code snippets violate the OWASP Top Ten.