Hive String operator concatenate by || double pipe - hive

Language manual of Hive claims that double pipe string concatenation is supported, however I am unable to use this feature in my current version of HIVE 1.2.1000.2.4.3.6-2
hive> select 'a'||'b';
NoViableAltException(5#[323:1: atomExpression : ( ( KW_NULL )=> KW_NULL -> TOK_NULL | ( constant )=> constant | castExpression | caseExpression | whenExpression | ( functionName LPAREN )=> function | tableOrColumn | LPAREN ! expression RPAREN !);])
I was trying to find a version which starts to support that, but without any luck :-(
I know that I can use build in function concat to do the same thing, but I am rewriting bunch of Oracle views to Hive and I don't want to change things which can stay the same if possible.

Hive 2.2.0
The documentation is very clear about that
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringOperators

Related

Parsing an input string containing a dot(.) is not getting validated in ANTLR

I am having an application "abc" and I am trying to parse a job (Input string).
abc throwing error to show status of job if the job name contains dot(.)
»abc status -jn UpgradeJob_435_1.61.4_xyz_1000_KPI_Upgrade_confirm
Error 2001 : Command Syntax error. extraneous input
'.61.4_xyz_1000_KPI_Upgrade_confirm' expecting
{<EOF>, JOB, JOB_OWNER, JOB_TYPE, JOB_STATUS}
Suggested Solution : Please check online help for correct syntax
It works fine if we give the jobname in double quotes.
For fix of the same I have added DOT rule in the command parser. Below are the snippets of the changes made.
Snippet of the Parser:
jobNameQuery :
JOB (id | DOT | stringWithQuotes)
;
jobOwnerQuery:
JOB_OWNER (id | DOT | stringWithQuotes)
;
Snippet Of Lexar:
DOT : '.' ;
ID: [a-zA-Z0-9_]([a-zA-Z0-9_#{}()#$%^~!`'-] | '[' | ']' )*;
Error Message:
Command Syntax error. extraneous input '.1' expecting {, JOB, JOB_OWNER, JOB_TYPE, JOB_STATUS}
Can someone please suggest what changes I need to make.
Depending on your exact requirements, either make . one of the allowed characters in ID, or change
(id | DOT | stringWithQuotes)
to
(
id (DOT id)*
| stringWithQuotes
)
As it is now, you allow either a quoted string, an identifier, or a single dot - not identifier intermixed with dots.

How to express a hex literal in Spark SQL?

I am new to Spark SQL. I have searched the language manual for Hive/SparkSQL and googled for the answer, but could not find an obvious answer.
In MySQL we can express a hex literal 0xffff like this:
mysql>select 0+0xffff;
+----------+
| 0+0xffff |
+----------+
| 65535 |
+----------+
1 row in set (0.00 sec)
But in Spark SQL (I am using the beeline client), I could only do the following where the numerical values are expressed in decimal not hexidecimal.
> select 0+65535;
+--------------+--+
| (0 + 65535) |
+--------------+--+
| 65535 |
+--------------+--+
1 row selected (0.047 seconds)
If I did the following instead, I would get an error:
> select 0+0xffff;
Error: org.apache.spark.sql.AnalysisException:
cannot resolve '`0xffff`' given input columns: []; line 1 pos 9;
'Project [unresolvedalias((0 + '0xffff), None)]
+- OneRowRelation$ (state=,code=0)
How do we express a hex literal in Spark SQL?
Unfortunatelly, you can't do it in Spark SQL.
You can discover it just by looking at the ANTLR grammar file. There, the number rule defined via DIGIT lexer rule which looks like this:
number
: MINUS? DECIMAL_VALUE #decimalLiteral
| MINUS? INTEGER_VALUE #integerLiteral
| MINUS? BIGINT_LITERAL #bigIntLiteral
| MINUS? SMALLINT_LITERAL #smallIntLiteral
| MINUS? TINYINT_LITERAL #tinyIntLiteral
| MINUS? DOUBLE_LITERAL #doubleLiteral
| MINUS? BIGDECIMAL_LITERAL #bigDecimalLiteral
;
...
INTEGER_VALUE
: DIGIT+
;
...
fragment DIGIT
: [0-9]
;
It does not include any hexadecimal characters, so you can't use them.

Lvalue awareness in ANTLR grammar and syntax predicates

I am implementing a parser with ANTLR for D. This language is based on C so there are some ambiguity around the declarations and the expressions. Consider this:
a* b = c; // This is a declaration of the variable d with a pointer-to-a type.
c = a * b; // as an expression is a multiplication.
As the second example could only appear on the right of an assignment expression I tried to resolve this problem with the following snippet:
expression
: left = assignOrConditional
(',' right = assignOrConditional)*
;
assignOrConditional
: ( postfixExpression ('=' | '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '~=' | '<<=' | '>>=' | '>>>=' | '^^=') )=> assignExpression
| conditionalExpression
;
assignExpression
: left = postfixExpression
( op = ('=' | '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '~=' | '<<=' | '>>=' | '>>>=' | '^^=')
right = assignOrExpression
)?
;
conditionalExpression
: left = logicalOrExpression
('?' e1 = conditionalExpression ':' e2 = conditionalExpression)?
;
As far as my understanding goes, this should do the trick to avoid the ambiguity but the tests are failing. If I feed the interpreter with any input, starting with the rule assignOrConditional, it will fail with NoViableAltException.
the inputs were
a = b
b-=c
d
Maybe I'm misunderstanding how the predicates are working therefore it would be great if someone could correct my explanation to the code: If the input can be read as a postfixExpression it will check if the next token after the postfixExpression is one of the assignment operators and if it is, it will parse the rule as an assignmentExpression. (Note, that the assignmentExpression and the conditionalExpression works well). If the next token isn't of them, it tries to parse it as a conditionalExpression.
EDIT
[solved] Now, there's an other problem with this solution that I could realize: the assignmentExpression has to choose in it's right hand expression is an assignment again (that is, postfix and assignment operator follows), if it is chained up.
Any idea what's wrong with my understanding?
If I feed the interpreter with any input, ...
Don't use ANTLRWorks' interpreter: it is buggy, and disregards any type of predicate. Use its debugger: it works flawlessly.
If the input can be read as a postfixExpression it will check if the next token after the postfixExpression is one of the assignment operators and if it is, it will parse the rule as an assignmentExpression.
You are correct.
EDIT [solved] Now, there's an other problem with this solution that I could realize: the assignmentExpression has to choose in it's right hand expression is an assignment again (that is, postfix and assignment operator follows), if it is chained up.
What's wrong with that?

Unambiguous grammar for arithmetic expression with Unary + and -

I have just started self-studying the Dragon book of Compiler Design. I am working on a problem that says to design grammar for an expression containing binary +,-,*,/ and unary +,-
I came up with following
E -> E+T | E-T | T
T -> T*P | T/P | P
P -> +S | -S | S
S -> id | constant | (E)
However, there is an obvious flaw in it. According to this grammar, expressions like
1--3
are valid, which is an error in all programming languages I know. Though, expressions like
1+-+3
and
1- -3
must be valid. How can such a grammar be designed?
I believe your problem is with tokenization. You are identifying 1--3 as an error because you think it should be resolved as 1 --3 rather than 1 - -3, the latter being perfectly valid. So I think you problem comes because when you tokenize the string you are getting:
['1', '-', '-' , '3']
rather than:
['1', '--', '3']
I think you have one extra production rule
P -> +S | -S | S
S -> id | constant | (E)
can be shrinked to
P -> +P | -P | id | constant | (E)
With such grammar you will succesfully match exp "1+-+3" as valid.
You have tokenizer(scanner) problem! before passing token to parser You must differentiate between "-" and "--". you must define a token struct which contains a token type and value, then parse the token list.
Also the rule P->--S must be added to the production rules!!

ANTLR 3.x - How to format rewrite rules

I'm finding myself challenged on how to properly format rewrite rules when certain conditions occur in the original rule.
What is the appropriate way to rewrite this:
unaryExpression: op=('!' | '-') t=term
-> ^(UNARY_EXPR $op $t)
Antlr doesn't seem to like me branding anything in parenthesis with a label and "op=" fails. Also, I've tried:
unaryExpression: ('!' | '-') t=term
-> ^(UNARY_EXPR ('!' | '-') $t)
Antlr doesn't like the or '|' and throws a grammar error.
Replacing the character class with a token name does solve this problem, however it creates a quagmire of other issues with my grammar.
--- edit ----
A second problem has been added. Please help me format this rule with tree grammar:
multExpression
: unaryExpression (MULT_OP unaryExpression)*
;
Pretty simple: My expectation is to enclose every matched token in a parent (imaginary) token MULT so that I end up with something like:
MULT
o
|
o---o---o---o---o
| | | | |
'3' '*' '6' '%' 2
unaryExpression
: (op='!' | op='-') term
-> ^(UNARY_EXPR[$op] $op term)
;
I used the UNARY_EXPR[$op] so the root node gets some useful line/column information instead of defaulting to -1.