ParseException EOF error in Hive

I have a simple query, but it fails with a ParseException:
select * from apachelog
where split(regexp_extract(cookie, 'PHPSESSID.*?\;', 0), '=')[1]='7nsi2pj4icv875sdfdcoj90bv6;'
and day='2015-10-01' limit 10;
FAILED: ParseException line 2:91 character '<EOF>' not supported here
Where did I make a stupid mistake?
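One plausible explanation can be sketched under an assumption: the Hive CLI splits its input on ';' before parsing, and it treats a piece ending in a backslash as an escaped semicolon, dropping the backslash and keeping the ';'. If that is the case, the \; in the regex survives, but the unescaped ; inside '7nsi2pj4icv875sdfdcoj90bv6;' ends the statement in the middle of a string literal. The Python function split_like_cli below is a hypothetical re-creation of that splitting, not Hive's code. Under this model the truncated second line is 91 characters long, so the input ends at column 91 of line 2, which matches the reported position.

```python
def split_like_cli(line: str) -> list[str]:
    """Hypothetical re-creation of CLI-style statement splitting.

    Assumption: the input is split on ';', and a piece ending in a backslash
    is treated as an escaped semicolon (backslash dropped, ';' kept).
    """
    commands = []
    command = ""
    for piece in line.split(";"):
        if piece.endswith("\\"):
            # "\;" becomes a literal ';' and the statement keeps accumulating
            command += piece[:-1] + ";"
            continue
        command += piece
        if not command.strip():
            continue  # skip empty statements, e.g. after the final ';'
        commands.append(command)
        command = ""
    return commands


query = "\n".join([
    "select * from apachelog",
    r"where split(regexp_extract(cookie, 'PHPSESSID.*?\;', 0), '=')[1]='7nsi2pj4icv875sdfdcoj90bv6;'",
    "and day='2015-10-01' limit 10;",
])

for cmd in split_like_cli(query):
    print(repr(cmd))
```

If the assumption holds, escaping the second semicolon in the same way ('7nsi2pj4icv875sdfdcoj90bv6\;') should keep the query in one piece. With that change, the sketch returns a single command.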

Related

ANTLR4 error when using identifier but not when using literal

I am testing the following simple grammar:
grammar SQL;
selectStatement: SELECT selectElements EOF;
selectElements: (star='*' | ID ) (',' ID)*;
ID: ID_LITERAL;
WS: [ \t\r\n]+ -> channel(HIDDEN);
fragment ID_LITERAL: [A-Z_$0-9]*? [A-Z_$]+? [A-Z_$0-9]*;
SELECT: 'SELECT';
Given the input SELECT *, it produces the following errors:
line 1:0 missing 'SELECT' at 'SELECT'
line 1:7 extraneous input '*' expecting <EOF>
However, if I replace the SELECT token reference in selectStatement with the inline literal 'SELECT', the resulting grammar below parses the same input without errors. Why?
grammar SQL;
selectStatement: 'SELECT' selectElements EOF;
selectElements: (star='*' | ID ) (',' ID)*;
ID: ID_LITERAL;
WS: [ \t\r\n]+ -> channel(HIDDEN);
fragment ID_LITERAL: [A-Z_$0-9]*? [A-Z_$]+? [A-Z_$0-9]*;
The patterns [A-Z_$0-9]*? [A-Z_$]+? [A-Z_$0-9]* and 'SELECT' both match the input SELECT *, and they produce matches of the same length (i.e. both match SELECT and leave * as the rest of the input). In cases like this, ANTLR (like most lexer generators) applies the rule that comes first in the grammar. In your first grammar that's ID, so SELECT * is lexed as ID, WS, '*', not SELECT, WS, '*'.
If you move the rule SELECT: 'SELECT'; before the definition of ID, it will work as you want it to. Your second grammar works for a related reason. A literal such as 'SELECT' used directly in a parser rule becomes an implicit token, and ANTLR gives implicit tokens priority over explicitly defined lexer rules, so it wins the tie against ID.
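The tie-breaking rule described above (the longest match wins, and on equal length the rule listed first wins) can be sketched as a tiny Python lexer. This is a simplified model, not ANTLR's implementation. The STAR rule is listed first on the assumption that implicit literal tokens come before explicit lexer rules. The ',' token is omitted, and the ID pattern is the ID_LITERAL pattern written as a Python regex.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie the earlier rule wins (strict '>')."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"token recognition error at {pos}")
        if best_name != "WS":  # WS is on the hidden channel, so drop it here
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

STAR = ("STAR", re.compile(r"\*"))
ID = ("ID", re.compile(r"[A-Z_$0-9]*?[A-Z_$]+?[A-Z_$0-9]*"))
WS = ("WS", re.compile(r"[ \t\r\n]+"))
SELECT = ("SELECT", re.compile(r"SELECT"))

first_grammar = [STAR, ID, WS, SELECT]  # SELECT defined after ID
reordered = [STAR, SELECT, ID, WS]      # SELECT moved before ID

print(tokenize("SELECT *", first_grammar))  # [('ID', 'SELECT'), ('STAR', '*')]
print(tokenize("SELECT *", reordered))      # [('SELECT', 'SELECT'), ('STAR', '*')]
```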

Parsing an input string containing a dot(.) is not getting validated in ANTLR

I have an application, "abc", and I am trying to parse a job name (the input string).
abc throws an error when showing the status of a job whose name contains a dot (.):
»abc status -jn UpgradeJob_435_1.61.4_xyz_1000_KPI_Upgrade_confirm
Error 2001 : Command Syntax error. extraneous input
'.61.4_xyz_1000_KPI_Upgrade_confirm' expecting
{<EOF>, JOB, JOB_OWNER, JOB_TYPE, JOB_STATUS}
Suggested Solution : Please check online help for correct syntax
It works fine if the job name is given in double quotes.
To fix this, I added a DOT rule to the command parser. Below are snippets of the changes I made.
Snippet of the Parser:
jobNameQuery :
JOB (id | DOT | stringWithQuotes)
;
jobOwnerQuery:
JOB_OWNER (id | DOT | stringWithQuotes)
;
Snippet of the lexer:
DOT : '.' ;
ID: [a-zA-Z0-9_]([a-zA-Z0-9_#{}()#$%^~!`'-] | '[' | ']' )*;
Error Message:
Command Syntax error. extraneous input '.1' expecting {<EOF>, JOB, JOB_OWNER, JOB_TYPE, JOB_STATUS}
Can someone please suggest what changes I need to make?
Depending on your exact requirements, either make . one of the allowed characters in ID, or change
(id | DOT | stringWithQuotes)
to
(
id (DOT id)*
| stringWithQuotes
)
As it stands, you allow a quoted string, an identifier, or a single dot, but not identifiers interspersed with dots.
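To see why the single-token alternative fails, here is a rough Python sketch. It lexes the job name from the question using simplified stand-ins for ID and DOT, then checks the name against the two rule shapes. The regex and the helper names (token_types, matches_single, matches_dotted) are assumptions for illustration, not the application's code. Because ID cannot contain '.', the name becomes five tokens. (id | DOT) cannot consume those five tokens, but id (DOT id)* can.

```python
import re

# Simplified stand-ins for the lexer rules in the question (assumption):
#   ID  : [a-zA-Z0-9_]([a-zA-Z0-9_#{}()#$%^~!`'-] | '[' | ']')* ;
#   DOT : '.' ;
TOKEN = re.compile(r"(?P<ID>[a-zA-Z0-9_][a-zA-Z0-9_#{}()$%^~!`'\[\]-]*)|(?P<DOT>\.)")

def token_types(text):
    """Return the token type names, in order, for the given input."""
    types, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"token recognition error at {pos}")
        types.append(m.lastgroup)  # name of the group that matched
        pos = m.end()
    return types

def matches_single(types):
    """(id | DOT): exactly one ID or exactly one DOT."""
    return types in (["ID"], ["DOT"])

def matches_dotted(types):
    """id (DOT id)*: one ID, then zero or more DOT ID pairs."""
    if not types or types[0] != "ID":
        return False
    rest = types[1:]
    return len(rest) % 2 == 0 and all(
        rest[i] == "DOT" and rest[i + 1] == "ID" for i in range(0, len(rest), 2)
    )

types = token_types("UpgradeJob_435_1.61.4_xyz_1000_KPI_Upgrade_confirm")
print(types)                  # ['ID', 'DOT', 'ID', 'DOT', 'ID']
print(matches_single(types))  # False
print(matches_dotted(types))  # True
```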

ANTLR 4: Using custom grammar keywords in input script

I'm running into a situation where a keyword from my grammar is used in the input script where the user can essentially type anything (e.g. a variable name). But ANTLR doesn't like this when it parses the script.
I know most languages have a set of reserved keywords that are pretty much forbidden in the source code because they get in the way of parsing.
But I thought that my grammar rules are clear enough that ANTLR wouldn't get confused.
Here's a simplified version of the grammar:
grammar test;
script : statements EOF ;
statements : statement* ;
statement : (output_statement | variable_statement) ;
output_statement : identifier ('format' column_format) ;
column_format : STRING_LITERAL;
variable_statement : identifier '=' STRING_LITERAL ;
identifier : IDENTIFIER ;
IDENTIFIER : [a-z]+ ;
STRING_LITERAL : '"' ( ~[\\\r\n"] )* '"' ;
WS : [ \t\r\n\u000C]+ -> channel(HIDDEN) ;
The following parses ok:
x = "a"
x format "str"
But this next input text does not parse:
format = "a"
format format "str"
test::script:1:0: mismatched input 'format' expecting EOF
Is there any way to structure my grammar so "format" is permitted as an identifier?
Thanks.
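In your grammar, the literal 'format' in output_statement becomes an implicit token that takes precedence over IDENTIFIER, so the input format is never lexed as an IDENTIFIER. Since format is both a keyword and an identifier, give it its own token and accept that token in identifier: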
output_statement : identifier (FORMAT column_format) ;
.....
identifier : IDENTIFIER | FORMAT ;
.....
FORMAT : 'format' ;
IDENTIFIER : [a-z]+ ;
.....
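The same idea can be sketched in Python. A toy longest-match lexer lists FORMAT before IDENTIFIER, so the word format always becomes FORMAT, while a longer word such as formats is still an IDENTIFIER. A hand-written parser then accepts either token as an identifier. The token names, the left-factored statement rule in the docstring, and the parser itself are illustrative stand-ins for what ANTLR would generate, not ANTLR's own code.

```python
import re

# Toy lexer rules in priority order (assumption: mirrors FORMAT before IDENTIFIER).
RULES = [
    ("EQ", re.compile(r"=")),
    ("FORMAT", re.compile(r"format")),
    ("IDENTIFIER", re.compile(r"[a-z]+")),
    ("STRING", re.compile(r'"[^\\\r\n"]*"')),
    ("WS", re.compile(r"[ \t\r\n\f]+")),
]

def lex(text):
    """Longest match wins; on a tie the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise SyntaxError(f"token recognition error at {pos}")
        if best_kind != "WS":
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    tokens.append(("EOF", ""))
    return tokens

def parse(tokens):
    """statement  : identifier ('=' STRING | FORMAT STRING) ;
    identifier : IDENTIFIER | FORMAT ;"""
    pos, statements = 0, []

    def expect(*kinds):
        nonlocal pos
        kind, text = tokens[pos]
        if kind not in kinds:
            raise SyntaxError(f"mismatched input {text!r} expecting {kinds}")
        pos += 1
        return text

    while tokens[pos][0] != "EOF":
        name = expect("IDENTIFIER", "FORMAT")  # keyword accepted as identifier
        if tokens[pos][0] == "EQ":
            expect("EQ")
            statements.append(("variable", name, expect("STRING")))
        else:
            expect("FORMAT")
            statements.append(("output", name, expect("STRING")))
    return statements

print(parse(lex('format = "a"\nformat format "str"')))
# [('variable', 'format', '"a"'), ('output', 'format', '"str"')]
```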

Antlr Sample Grammar Error

I am new to ANTLR. I defined the following test grammar, which is intended to parse a series of assignment statements like the following:
x=1
y=10
=======================================================================
grammar test;
program
:
assignstatement*
;
assignstatement
:
ID '=' INT
;
ID : ('_'|'a'..'z'|'A'..'Z'|DIGIT) ('_'|'a'..'z'|'A'..'Z'|DIGIT)*;
INT: DIGIT+;
fragment DIGIT : [0-9] ; // not a token by itself
I got the following errors when running the TestRig:
[#0,0:0='x',<1>,1:0]
[#1,2:2='=',<3>,1:2]
[#2,4:4='1',<1>,1:4]
[#3,7:7='y',<1>,2:0]
[#4,9:9='=',<3>,2:2]
[#5,11:12='10',<1>,2:4]
[#6,14:13='<EOF>',<-1>,3:0]
line 1:4 missing INT at '1'
line 2:0 extraneous input 'y' expecting '='
line 2:4 missing INT at '10'
line 3:0 mismatched input '<EOF>' expecting '='
(program (assignstatement x = <missing INT>) (assignstatement 1 y = <missing INT>) (assignstatement 10))
Can someone figure out what's causing these errors?
The lexer will never create INT tokens. Your ID rule also matches tokens consisting only of digits, and because ID is defined before INT, ID wins whenever both rules match the same input (such as 1 or 10).
Don't let your ID rule start with a digit, and you're fine:
ID : ('_'|'a'..'z'|'A'..'Z') ('_'|'a'..'z'|'A'..'Z'|DIGIT)*;
Or the equivalent:
ID : [_a-zA-Z] [_a-zA-Z0-9]*;
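Here is a small Python model of why the original grammar never produces INT. Because ID can start with a digit and is listed before INT, both rules match 1 and 10 with the same length, and the earlier rule wins. The WS rule is an assumption, since the posted grammar does not show how whitespace is handled. The lexer is a simplified stand-in for ANTLR's, not its implementation.

```python
import re

def token_types(text, rules):
    """Longest match wins; ties go to the rule listed first."""
    types, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"token recognition error at {pos}")
        if best_name != "WS":  # assumed whitespace rule, skipped
            types.append(best_name)
        pos = best_end
    return types

original = [("ID", r"[_a-zA-Z0-9][_a-zA-Z0-9]*"), ("INT", r"[0-9]+"),
            ("EQ", r"="), ("WS", r"\s+")]
fixed = [("ID", r"[_a-zA-Z][_a-zA-Z0-9]*"), ("INT", r"[0-9]+"),
         ("EQ", r"="), ("WS", r"\s+")]

source = "x = 1\ny = 10\n"
print(token_types(source, original))  # ['ID', 'EQ', 'ID', 'ID', 'EQ', 'ID']
print(token_types(source, fixed))     # ['ID', 'EQ', 'INT', 'ID', 'EQ', 'INT']
```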

How to solve a problem with ANTLR mismatch input

Given the grammar
test : 'test' ID '\n' 'begin' '\n' 'end' '\n' -> ^(TEST ID);
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
and a test string of
"test blah\n begin\n end\n"
resulting in
line 1:0 mismatched input 'test blah\\n begin\\n end\\n' expecting 'test'
<mismatched token: [#0,0:21='test blah\\n begin\\n end\\n',<12>,1:0], resync=test blah
begin
end
>
What's gone wrong here?
When you use '\n' in your grammar rules, you're not matching a backslash followed by n, but a newline character. It looks like your input contains backslash+n sequences rather than newline characters.
So, my guess is you need to either change your test rule into:
test
: 'test' ID '\\n' 'begin' '\\n' 'end' '\\n'
;
resulting in the expected parse tree,
or leave your test rule as is but change your input into:
test blah
begin
end
resulting in the expected parse tree.
If that is not the case, could you post an SSCCE: a small, complete demo that I (or someone else) can run and that shows this error?
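The difference between a newline character and a backslash followed by n can be checked with a short Python sketch. The regex-based matcher below is a hypothetical stand-in for the test rule. It is parameterised by the newline literal the rule uses, and it skips spaces between tokens, which is an assumption about the rest of the grammar.

```python
import re

def matches_test_rule(text, newline):
    """Hypothetical matcher for: 'test' ID NL 'begin' NL 'end' NL ;
    where NL is the newline literal used in the rule. Spaces between
    tokens are skipped (an assumption about the rest of the grammar)."""
    nl = re.escape(newline)
    pattern = rf" *test +[A-Za-z_][A-Za-z0-9_]* *{nl} *begin *{nl} *end *{nl} *"
    return re.fullmatch(pattern, text) is not None

real_newlines = "test blah\n begin\n end\n"  # contains newline characters
backslash_n = r"test blah\n begin\n end\n"   # contains backslash + 'n'

print(matches_test_rule(real_newlines, "\n"))  # True:  rule uses '\n'
print(matches_test_rule(backslash_n, "\n"))    # False: no newline characters
print(matches_test_rule(backslash_n, "\\n"))   # True:  rule uses '\\n'
```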