Internal EBCDIC support for ANTLR 3.1.3?

I'm trying to use ANTLR 3.1.3 on a system whose local character set is EBCDIC.
Even a simple grammar like this:
lexer grammar test;
GENERIC_ID
: (LETTER)*
;
fragment LETTER
: 'a' .. 'z'
;
results in these errors during the initial compile (java org.antlr.Tool test.g):
error(10): internal error: problem parsing group <unknown>: line 1:1: unexpected char: 0x7 : line 1:1: unexpected char: 0x7
org.antlr.stringtemplate.language.GroupLexer.nextToken(GroupLexer.java:233)
antlr.TokenBuffer.fill(TokenBuffer.java:69)
antlr.TokenBuffer.LA(TokenBuffer.java:80)
antlr.LLkParser.LA(LLkParser.java:52)
antlr.Parser.match(Parser.java:210)
org.antlr.stringtemplate.language.GroupParser.group(GroupParser.java:120)
org.antlr.stringtemplate.StringTemplateGroup.parseGroup(StringTemplateGroup.java:792)
org.antlr.stringtemplate.StringTemplateGroup.<init>(StringTemplateGroup.java:274)
org.antlr.stringtemplate.PathGroupLoader.loadGroup(PathGroupLoader.java:67)
org.antlr.stringtemplate.StringTemplateGroup.loadGroup(StringTemplateGroup.java:969)
org.antlr.stringtemplate.StringTemplateGroup.loadGroup(StringTemplateGroup.java:955)
org.antlr.codegen.CodeGenerator.loadTemplates(CodeGenerator.java:198)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:292)
org.antlr.Tool.generateRecognizer(Tool.java:607)
org.antlr.Tool.process(Tool.java:429)
org.antlr.Tool.main(Tool.java:91)
error(10): internal error: test.g : java.lang.IllegalArgumentException: Can't find template outputFile.st; group hierarchy is [null]
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:507)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:314)
org.antlr.Tool.generateRecognizer(Tool.java:607)
org.antlr.Tool.process(Tool.java:429)
org.antlr.Tool.main(Tool.java:91)
The grammar file itself seems to be processed correctly, but something internal appears to be causing the problem. No matter which characters I use in the grammar file, the illegal character is always 0x7.
Is it not possible to run ANTLR on a system with a local EBCDIC character set? Any suggestions?
Update: the problem appears to lie in the template files (.stg files). If I convert the files in the codegen/templates directory (and also ANTLRCore.sti) to EBCDIC, the compilation seems to complete. Is there a way to tell Java/ANTLR not to read these files in the local encoding? Alternatively, are these template files available in other encodings? Otherwise I am forced to convert and replace each one by hand.
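The update above points at files being decoded with the platform default charset. As a minimal sketch of the general Java mechanism (not of how ANTLR 3.1.3 or StringTemplate actually open their templates), a reader can be given an explicit charset so the local default is never consulted. The class name, method name, and sample bytes below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetRead {
    // Reads the first line of a stream, decoding with the given charset
    // instead of the platform default (hypothetical helper for illustration).
    static String firstLine(InputStream in, Charset charset) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the first bytes of an ASCII-encoded file: "group".
        byte[] asciiBytes = {0x67, 0x72, 0x6f, 0x75, 0x70};
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println(firstLine(new ByteArrayInputStream(asciiBytes),
                                     StandardCharsets.ISO_8859_1));
    }
}
```

When the code that opens the files cannot be changed, the JVM default itself can usually be overridden at startup with the file.encoding system property (for example java -Dfile.encoding=ISO8859-1 org.antlr.Tool test.g). Whether that is enough for ANTLR 3.1.3 on an EBCDIC system is an assumption that has not been verified here.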

Related

What do the numbers mean in ANTLR 4 error and warning messages?

Can someone point me to the ANTLR 4 documentation or explain the numbers in error and warning messages?
I have a lexer file and a parser file that are generating this warning:
warning(125): Sybase\SybTSqlParser.g4:1084:158: implicit definition of token R in parser
The numbers "1084:158" do not seem to correspond to a line number or character count.
After a hint from Bart Kiers jogged some old memory cells, this is the explanation:
When independent lexer and parser files are compiled, the lexer file's line count is added to the parser file's line numbers.
If the lexer has 10 lines and the error is detected on parser file line 25, the error is reported on line 35.
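Taking that explanation at face value, the parser-file line can be recovered by subtracting the lexer file's line count from the reported line. The sketch below only restates that arithmetic. The class and method names are hypothetical, and it assumes the offset is exactly the number of lines in the lexer file.

```java
class ReportedLineMapper {
    // Maps a line number from a combined lexer+parser report back to the
    // parser file, assuming the lexer's lines are counted first.
    static int parserFileLine(int reportedLine, int lexerLineCount) {
        if (reportedLine <= lexerLineCount) {
            // Under the stated assumption this line would belong to the lexer file.
            throw new IllegalArgumentException(
                "reported line " + reportedLine + " is not past the lexer's "
                + lexerLineCount + " lines");
        }
        return reportedLine - lexerLineCount;
    }

    public static void main(String[] args) {
        // The example from the text: a 10-line lexer, error reported on line 35.
        System.out.println(parserFileLine(35, 10)); // prints 25
    }
}
```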

Antlr4 No viable alternative at input symbols

I'm implementing a simple program walker grammar and I get this common error on multiple lines. I think they all have the same cause, but I'm new to ANTLR, so I couldn't figure it out.
For example, in this following code snippet:
program
: (declaration)*
(statement)*
EOF!
;
I got error:
No viable alternative at input '!'
after EOF, and I got a similar error with:
declaration
: INT VARNUM '=' expression ';'
-> ^(DECL VARNUM expression)
;
I got the error:
No viable alternative at input '->'
After reading other questions, I know that matching one token with multiple definitions can cause this problem. But I haven't tested it with any input yet; I got this error in IntelliJ. How can I fix my problem?
This is ANTLR v3 syntax, but you're trying to compile it with ANTLR v4, which won't work.
Either downgrade to ANTLR v3, or use v4 syntax. The difference comes from the fact that v4 doesn't support automatic AST generation, and you're trying to use AST construction operators, which were removed.
The first snippet only requires you to remove the !. Parentheses aren't necessary.
program
: declaration*
statement*
EOF
;
As for the second one, remove everything after the ->:
declaration
: INT VARNUM '=' expression ';'
;
If you need to build an AST with v4, see my answer here.

IDEA javadoc generator - ignore errors

When I try to generate javadoc, I get some meaningless errors, such as an encoding error in non-javadoc comments. Is there a command-line argument or some other way to ignore the errors and generate the docs anyway?
Error example (it's not even in a javadoc comment, just an inline Cyrillic comment):
GroupsSorter.java:83: error: unmappable character for encoding Cp1251
//Р?тоговый СЃРїРёСЃРѕРє
^

Mismatched double token

In ANTLR, I have a MismatchedTokenException with the following definition:
type : IDENTIFIER ('<' (type (',' type)*) '>')?;
And the following test:
A<B,C<D>>
The exception occurs when parsing the first >. ANTLR tries parsing both '>>' at once, and fails.
With a silent whitespace channel, the following test does work:
A<B,C<D> >
In which ANTLR is clearly instructed to treat each token separately.
How can I fix that?
I could not reproduce that. The parser generated by:
grammar T;
type : IDENTIFIER ('<' (type (',' type)*) '>')?;
IDENTIFIER : 'A'..'Z';
parses the input A<B,C<D>> (without spaces) into the following parse tree:
You'll need to provide the grammar that causes this input to produce a MismatchedTokenException.
Perhaps you're using ANTLRWorks' interpreter (or Eclipse's ANTLR-IDE, which uses the same interpreter)? In that case, that is probably the problem: it's notoriously buggy. Don't use it, but use ANTLRWorks' debugger: it's great (the image posted above comes from the debugger).
Lazlo Bonin wrote:
Got it. I had a << token defined. Quickly, is there a way to prioritize the recognition of one token over another?
No, the lexer simply tries to match as much as possible. So if it can create a token matching << (or >>), it will do so in favor of two single < (or >) tokens. Only when two (or more) lexer rules match the same number of characters is a prioritization made: the rule defined first will then "win" over the one(s) defined later in the grammar.
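The rule described above (longest match first, earliest rule on a tie) can be sketched as a toy tokenizer. This is not ANTLR's implementation, only an illustration of the behaviour. The rule names and the rule set are hypothetical, with a >> rule standing in for the << token mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchLexer {
    // Hypothetical rules in definition order; earlier rules win ties.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("SHIFT_RIGHT", Pattern.compile(">>"));
        RULES.put("GT", Pattern.compile(">"));
        RULES.put("LT", Pattern.compile("<"));
        RULES.put("COMMA", Pattern.compile(","));
        RULES.put("IDENTIFIER", Pattern.compile("[A-Z]"));
        RULES.put("WS", Pattern.compile("\\s+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Strictly longer matches replace the current best, so on a
                // tie the rule defined first is kept.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestRule = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestRule.equals("WS")) { // whitespace is skipped, not emitted
                tokens.add(bestRule + ":" + input.substring(pos, pos + bestLength));
            }
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("A<B,C<D>>"));  // ends with SHIFT_RIGHT:>>
        System.out.println(tokenize("A<B,C<D> >")); // ends with GT:>, GT:>
    }
}
```

With such a rule set, the input without the space yields a single >> token, which a parser rule expecting two separate '>' tokens cannot consume. That matches the mismatch described in the question.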

Specific symbol and the Strings file

I have the symbol π in my strings file, and because of it the build fails with a fatal error:
Copy EN.strings
Command /Developer/Library/Xcode/Plug-ins/CoreBuildTasks.xcplugin/Contents/Resources/copystrings failed with exit code 1
If I remove π, it's fine. The strange thing is that even if I put π in a comment, it still won't compile.
What should I do?
Thanks
If you can find the Unicode value of the character, you could escape it in the following manner:
NSString *str = @"\u00F6";
And Java (just for comparison):
String str = "\u00F6";
Although I'd imagine that the compile issue relates to the character being from a different encoding to the specified encoding of your source file. I believe the compiler will interpret your source as UTF-8 by default.
Make sure your strings file is using a Unicode encoding, and make sure the string is quoted; this has solved the issue for me in the past.
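To make the encoding point concrete, here is a small Java sketch (Java only for comparison, as above) showing that π, which is U+03C0, produces different bytes depending on the encoding used to save it. The class and helper names are hypothetical, and the sketch makes no claim about which encoding the copystrings tool expects.

```java
import java.nio.charset.StandardCharsets;

class PiEncodingDemo {
    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pi = "\u03C0"; // escaped form of the character
        System.out.println(hex(pi.getBytes(StandardCharsets.UTF_8)));    // CF 80
        System.out.println(hex(pi.getBytes(StandardCharsets.UTF_16BE))); // 03 C0
    }
}
```

A tool that expects one of these byte sequences and receives the other cannot decode the character, which is consistent with the failure occurring even when π appears only in a comment.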