How to express a hex literal in Spark SQL?

I am new to Spark SQL. I have searched the language manual for Hive/SparkSQL and googled for the answer, but could not find an obvious answer.
In MySQL we can express a hex literal 0xffff like this:
mysql>select 0+0xffff;
+----------+
| 0+0xffff |
+----------+
| 65535 |
+----------+
1 row in set (0.00 sec)
But in Spark SQL (I am using the beeline client), I could only do the following, where the numerical values are expressed in decimal, not hexadecimal.
> select 0+65535;
+--------------+--+
| (0 + 65535) |
+--------------+--+
| 65535 |
+--------------+--+
1 row selected (0.047 seconds)
If I did the following instead, I would get an error:
> select 0+0xffff;
Error: org.apache.spark.sql.AnalysisException:
cannot resolve '`0xffff`' given input columns: []; line 1 pos 9;
'Project [unresolvedalias((0 + '0xffff), None)]
+- OneRowRelation$ (state=,code=0)
How do we express a hex literal in Spark SQL?

Unfortunately, you can't write a 0x-style numeric literal in Spark SQL.
You can see this by looking at the ANTLR grammar file. There, the number rule is built on the DIGIT lexer rule, which looks like this:
number
: MINUS? DECIMAL_VALUE #decimalLiteral
| MINUS? INTEGER_VALUE #integerLiteral
| MINUS? BIGINT_LITERAL #bigIntLiteral
| MINUS? SMALLINT_LITERAL #smallIntLiteral
| MINUS? TINYINT_LITERAL #tinyIntLiteral
| MINUS? DOUBLE_LITERAL #doubleLiteral
| MINUS? BIGDECIMAL_LITERAL #bigDecimalLiteral
;
...
INTEGER_VALUE
: DIGIT+
;
...
fragment DIGIT
: [0-9]
;
It does not include any hexadecimal characters, so you can't use them.
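As a rough illustration of why the error complains about an unresolved column `0xffff` instead of a malformed number: an ANTLR lexer takes the longest match, so a DIGIT+ rule that can only consume the leading 0 loses to any rule that accepts letters and digits together. The Python sketch below mimics this with two regexes. The IDENTIFIER pattern is an assumption (a simplified stand-in of the form (LETTER | DIGIT | '_')+, not copied from the excerpt above), and first_token is a hypothetical helper.

```python
import re

# Simplified stand-ins for two lexer rules. IDENTIFIER is an assumption
# modelled on a rule of the form (LETTER | DIGIT | '_')+.
RULES = [
    ("INTEGER_VALUE", re.compile(r"[0-9]+")),
    ("IDENTIFIER", re.compile(r"[A-Za-z0-9_]+")),
]


def first_token(text):
    """Return (rule name, lexeme) for the longest match at the start of text.

    Ties go to the rule listed first, which mirrors how an ANTLR lexer
    resolves rules that match the same number of characters.
    """
    best = None
    for name, pattern in RULES:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(first_token("65535"))   # ('INTEGER_VALUE', '65535')
print(first_token("0xffff"))  # ('IDENTIFIER', '0xffff')
```

Under these assumptions 0xffff never reaches the number rule at all; it is handed to the parser as a name, which matches the "cannot resolve" message in the question.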

Related

How to add ARRAY column to Spark table (using ALTER TABLE)?

I am trying to add a new column of Array Type to the table with default value.
%sql
ALTER TABLE testdb.tabname ADD COLUMN new_arr_col ARRAY DEFAULT ['A','B','C'];
But it says that the data type is not supported:
Error in SQL statement: ParseException:
DataType array is not supported.(line 1, pos 54)
== SQL ==
ALTER TABLE testdb.dim_category ADD COLUMN c_cat_area ARRAY
So, there is no way we can add an array column directly to the table? Kindly assist me on this. Thanks in advance!
The reason for this error is that complex types (e.g. ARRAY) require another type to be specified (cf. SqlBase.g4):
dataType
: complex=ARRAY '<' dataType '>' #complexDataType
| complex=MAP '<' dataType ',' dataType '>' #complexDataType
| complex=STRUCT ('<' complexColTypeList? '>' | NEQ) #complexDataType
| INTERVAL from=(YEAR | MONTH) (TO to=MONTH)? #yearMonthIntervalDataType
| INTERVAL from=(DAY | HOUR | MINUTE | SECOND)
(TO to=(HOUR | MINUTE | SECOND))? #dayTimeIntervalDataType
| identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')? #primitiveDataType
;
In your case, it'd be as follows:
ARRAY<CHAR(3)>

How can I correctly express in BNF this condition?

I am looking for a way to express the following types of conditions in BNF:
if(carFixed) { }
if(carFixed = true) {}
if(cars >= 4) { }
if(cars != 15) { }
if(cars < 3 && cars > 1) {}
Note:
* denotes 0 or more instances of something.
I have replaced normal BNF ::= with :.
I presently am using the following code, and am not sure if it's correct:
conditionOperator: "=" | "!=" | "<=" | ">=" | "<" | ">" | "is";
logicalAndOperator: "&&";
condition: (booleanIdentifier ((conditionOperator booleanIdentifier)* (logicalAndOperator | logicalOrOperator) booleanIdentifier (conditionOperator booleanIdentifier)*)*);
There are several approaches, and they usually rely on the capabilities of the parser to indicate precedence and associativity. One that is typically used with recursive-descent parsers is to recreate the precedence of the operators through the rule hierarchy of the BNF (or, in this case, pseudo-BNF) itself.
(In the examples below, CONDITIONAL_OP stands for the likes of <, != etc. and LOGICAL_OP for &&, || etc.)
Something along the lines of:
condition: logicalExpr
logicalExpr: conditionalExpr (LOGICAL_OP conditionalExpr)*
conditionalExpr: primary (CONDITIONAL_OP primary)*
primary: NUMBER | IDENTIFIER | BOOLEAN_LITERAL | '(' condition ')'
The problem with the above solution is that the left-associativity of the operators is lost, so special measures are needed to restore it while parsing.
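One common "special measure" in a hand-written recursive-descent parser is to fold each operator into the tree as the loop consumes it, so the node built so far becomes the left operand of the next one. The Python sketch below follows the iterative grammar above. The token patterns, the tuple-based tree and the names tokenize, Parser and parse_condition are all assumptions made for illustration.

```python
import re

# Hypothetical token classes for this sketch; the operator sets are assumptions.
TOKEN_RE = re.compile(
    r"\s*(?:(\d+)|(true|false)\b|([A-Za-z_]\w*)|(&&|\|\|)|(!=|<=|>=|=|<|>)|([()]))"
)
KINDS = ("NUMBER", "BOOLEAN_LITERAL", "IDENTIFIER", "LOGICAL_OP", "CONDITIONAL_OP", "PAREN")


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((KINDS[m.lastindex - 1], m.group(m.lastindex)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def logical_expr(self):
        # logicalExpr: conditionalExpr (LOGICAL_OP conditionalExpr)*
        node = self.conditional_expr()
        while self.peek()[0] == "LOGICAL_OP":
            op = self.advance()[1]
            # The tree built so far becomes the LEFT child: left-associative.
            node = (op, node, self.conditional_expr())
        return node

    def conditional_expr(self):
        # conditionalExpr: primary (CONDITIONAL_OP primary)*
        node = self.primary()
        while self.peek()[0] == "CONDITIONAL_OP":
            op = self.advance()[1]
            node = (op, node, self.primary())
        return node

    def primary(self):
        # primary: NUMBER | IDENTIFIER | BOOLEAN_LITERAL | '(' condition ')'
        kind, value = self.advance()
        if kind in ("NUMBER", "IDENTIFIER", "BOOLEAN_LITERAL"):
            return value
        if (kind, value) == ("PAREN", "("):
            node = self.logical_expr()
            if self.advance() != ("PAREN", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse_condition(text):
    parser = Parser(tokenize(text))
    tree = parser.logical_expr()
    if parser.peek()[0] is not None:
        raise SyntaxError("trailing input")
    return tree


print(parse_condition("cars < 3 && cars > 1"))
# ('&&', ('<', 'cars', '3'), ('>', 'cars', '1'))
print(parse_condition("a && b || c"))
# ('||', ('&&', 'a', 'b'), 'c')
```

Note that this sketch, like the grammar it follows, gives && and || the same precedence; splitting LOGICAL_OP into two levels would need one more rule.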
For parsers able to deal with left recursion, a more 'correct' notation could be:
condition: logicalExpr
logicalExpr: logicalExpr LOGICAL_OP conditionalExpr
| conditionalExpr
conditionalExpr: conditionalExpr CONDITIONAL_OP primary
| primary
primary: NUMBER | IDENTIFIER | BOOLEAN_LITERAL | '(' condition ')'
Finally, some parsers allow a special notation to indicate precedence and associativity. Something like (note that this is a completely invented syntax):
%LEFT LOGICAL_OP
%LEFT CONDITIONAL_OP
condition: condition CONDITIONAL_OP condition
| condition LOGICAL_OP condition
| '(' condition ')'
| NUMBER
| IDENTIFIER
| BOOLEAN_LITERAL
Hope this points you in the right direction.

Parsing SQL 'between' and 'and' expressions with ANTLR 4

I have difficulties with a SQL expression parser. Specifically, with the a AND b and a BETWEEN c AND d rules. The alternatives are specified as follows:
| lhs=exprRule K_AND rhs=exprRule # AndExpression
| value=exprRule K_NOT? K_BETWEEN lower=exprRule K_AND upper=exprRule # BetweenExpression
Unfortunately, this grammar parses a string, such as
...
l_discount BETWEEN 0.02 - 0.01 AND 0.02 + 0.01
AND l_quantity < 25
...
as BetweenExpression with lower={0.02 - 0.01 AND 0.02 + 0.01} and upper={l_quantity < 25}. Clearly, I want it to be parsed as lower={0.02 - 0.01} and upper={0.02 + 0.01} with an AndExpression as parent node.
Basically, I want the lower=exprRule of the BetweenExpression to take the smallest number of tokens instead of the largest number. It seems to me that there should be a straightforward solution to this, but I lack the nomenclature to phrase the correct Google search and could not find an answer in the ANTLR documentation either.
I also tried, as suggested by mnesarco, to give the BETWEEN alternative a higher precedence, but with either ordering the same parse tree is created, which makes sense if you think about it.
The only thing I could come up with is to introduce an extra "numeric expression" rule that does not match AND and BETWEEN expressions:
exprRule
: value=exprRule ( '+' | '-' ) lower=exprRule #AddExpression
| value=exprRule ( '<' | '>' | '<=' | '>=' ) lower=exprRule #ComparisonExpression
| value=exprRule K_NOT? K_BETWEEN lower=exprNumeric K_AND upper=exprNumeric #BetweenExpression
| lhs=exprRule K_AND rhs=exprRule #AndExpression
| NUMBER #NumberExpression
| ID #IdExpression
;
exprNumeric
: value=exprNumeric ( '+' | '-' ) lower=exprNumeric #AddNumericExpression
| NUMBER #NumberNumericExpression
| ID #IdNumericExpression
;
which results in the parse tree:
It looks like a precedence problem. Basically, you need the [Between] operator to have higher precedence than [And], and probably than [Or] too.
In ANTLR 4, precedence simply follows the order of definition: alternatives listed earlier bind more tightly. So try swapping the order of the alternatives, i.e.:
| value=exprRule K_NOT? K_BETWEEN lower=exprRule K_AND upper=exprRule # BetweenExpression
| lhs=exprRule K_AND rhs=exprRule # AndExpression
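The restricted-bounds idea from the question (parse the BETWEEN bounds with a rule that cannot contain AND) can also be sketched as a small hand-written parser. This is a Python analogue, not ANTLR output: the token patterns, the tuple tree shape and the names tokenize, BetweenParser and parse are assumptions, and keywords are matched in upper case only to keep it short.

```python
import re

# Assumed token set for this sketch: numbers, three keywords, identifiers,
# comparison operators and additive operators.
TOKEN_RE = re.compile(
    r"\s*(?:(\d+(?:\.\d+)?)|(BETWEEN|AND|NOT)\b|([A-Za-z_]\w*)|(<=|>=|<|>)|([+-]))"
)
KINDS = ("NUMBER", "KEYWORD", "ID", "CMP", "ADD")


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((KINDS[m.lastindex - 1], m.group(m.lastindex)))
        pos = m.end()
    return tokens


class BetweenParser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected[1]!r}, got {tok[1]!r}")
        self.pos += 1
        return tok

    def and_expr(self):
        # and_expr: predicate (AND predicate)*
        node = self.predicate()
        while self.peek() == ("KEYWORD", "AND"):
            self.take()
            node = ("AND", node, self.predicate())
        return node

    def predicate(self):
        # predicate: additive (NOT? BETWEEN additive AND additive | CMP additive)?
        value = self.additive()
        negated = self.peek() == ("KEYWORD", "NOT")
        if negated:
            self.take()
        if self.peek() == ("KEYWORD", "BETWEEN"):
            self.take()
            lower = self.additive()  # bounds cannot swallow an AND
            self.take(("KEYWORD", "AND"))
            upper = self.additive()
            return ("NOT BETWEEN" if negated else "BETWEEN", value, lower, upper)
        if negated:
            raise SyntaxError("expected BETWEEN after NOT")
        if self.peek()[0] == "CMP":
            op = self.take()[1]
            return (op, value, self.additive())
        return value

    def additive(self):
        # additive: atom (('+' | '-') atom)*
        node = self.atom()
        while self.peek()[0] == "ADD":
            op = self.take()[1]
            node = (op, node, self.atom())
        return node

    def atom(self):
        kind, text = self.take()
        if kind in ("NUMBER", "ID"):
            return text
        raise SyntaxError(f"unexpected token {text!r}")


def parse(text):
    parser = BetweenParser(tokenize(text))
    tree = parser.and_expr()
    if parser.peek()[0] is not None:
        raise SyntaxError("trailing input")
    return tree


tree = parse("l_discount BETWEEN 0.02 - 0.01 AND 0.02 + 0.01 AND l_quantity < 25")
print(tree)
```

Because the bounds are parsed by additive, the first AND after BETWEEN always closes the lower bound, and the second one is left for the enclosing and_expr loop.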

Hive String operator concatenate by || double pipe

The Hive language manual claims that double-pipe string concatenation is supported; however, I am unable to use this feature in my current version, Hive 1.2.1000.2.4.3.6-2:
hive> select 'a'||'b';
NoViableAltException(5#[323:1: atomExpression : ( ( KW_NULL )=> KW_NULL -> TOK_NULL | ( constant )=> constant | castExpression | caseExpression | whenExpression | ( functionName LPAREN )=> function | tableOrColumn | LPAREN ! expression RPAREN !);])
I was trying to find out which version started to support it, but without any luck :-(
I know that I can use the built-in function concat to do the same thing, but I am rewriting a bunch of Oracle views to Hive and I don't want to change things that can stay the same if possible.
The || operator is supported as of Hive 2.2.0.
The documentation is very clear about that:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringOperators

Lvalue awareness in ANTLR grammar and syntax predicates

I am implementing a parser with ANTLR for D. This language is based on C, so there is some ambiguity between declarations and expressions. Consider this:
a* b = c; // This is a declaration of the variable b with a pointer-to-a type.
c = a * b; // as an expression is a multiplication.
As the second example could only appear on the right of an assignment expression I tried to resolve this problem with the following snippet:
expression
: left = assignOrConditional
(',' right = assignOrConditional)*
;
assignOrConditional
: ( postfixExpression ('=' | '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '~=' | '<<=' | '>>=' | '>>>=' | '^^=') )=> assignExpression
| conditionalExpression
;
assignExpression
: left = postfixExpression
( op = ('=' | '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '~=' | '<<=' | '>>=' | '>>>=' | '^^=')
right = assignOrExpression
)?
;
conditionalExpression
: left = logicalOrExpression
('?' e1 = conditionalExpression ':' e2 = conditionalExpression)?
;
As far as my understanding goes, this should do the trick to avoid the ambiguity but the tests are failing. If I feed the interpreter with any input, starting with the rule assignOrConditional, it will fail with NoViableAltException.
The inputs were:
a = b
b-=c
d
Maybe I'm misunderstanding how the predicates work, so it would be great if someone could correct my explanation of the code: if the input can be read as a postfixExpression, it will check whether the next token after the postfixExpression is one of the assignment operators, and if it is, it will parse the rule as an assignmentExpression. (Note that the assignmentExpression and the conditionalExpression rules work well on their own.) If the next token isn't one of them, it tries to parse the input as a conditionalExpression.
EDIT
[solved] Now there is another problem with this solution that I have noticed: when assignments are chained, the assignmentExpression has to decide whether its right-hand expression is an assignment again (that is, whether a postfix expression and an assignment operator follow).
Any idea what's wrong with my understanding?
If I feed the interpreter with any input, ...
Don't use ANTLRWorks' interpreter: it is buggy, and disregards any type of predicate. Use its debugger: it works flawlessly.
If the input can be read as a postfixExpression it will check if the next token after the postfixExpression is one of the assignment operators and if it is, it will parse the rule as an assignmentExpression.
You are correct.
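For readers who want to see that decision spelled out, here is a rough Python analogue of the syntactic predicate: scan ahead over a postfix expression without consuming anything, and only commit to the assignment alternative if an assignment operator follows. It is a sketch under heavy assumptions: postfixExpression is reduced to dotted names, conditionalExpression to a flat chain of binary operators with no precedence, only a few assignment operators are listed, and every function and class name is hypothetical.

```python
import re

ASSIGN_OPS = {"=", "+=", "-=", "*=", "/="}  # shortened list, an assumption
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\+=|-=|\*=|/=|=|[.*+\-/])")


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_ident(tok):
    return tok is not None and (tok[0].isalpha() or tok[0] == "_")


class PredicateParser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def at(self, index):
        return self.tokens[index] if index < len(self.tokens) else None

    def advance(self):
        tok = self.at(self.pos)
        self.pos += 1
        return tok

    def scan_postfix(self, start):
        """Lookahead only: return the index just past IDENT ('.' IDENT)*, or None."""
        if not is_ident(self.at(start)):
            return None
        i = start + 1
        while self.at(i) == "." and is_ident(self.at(i + 1)):
            i += 2
        return i

    def assign_or_conditional(self):
        # Emulates: (postfixExpression assignOp)=> assignExpression | conditionalExpression
        end = self.scan_postfix(self.pos)
        if end is not None and self.at(end) in ASSIGN_OPS:
            return self.assign_expression()
        return self.binary_expression()

    def assign_expression(self):
        left = self.postfix()
        op = self.advance()
        # Re-entering the predicate here is what handles chains like a = b = c.
        return (op, left, self.assign_or_conditional())

    def postfix(self):
        node = self.advance()
        if not is_ident(node):
            raise SyntaxError(f"unexpected token {node!r}")
        while self.at(self.pos) == "." and is_ident(self.at(self.pos + 1)):
            self.advance()
            node = (".", node, self.advance())
        return node

    def binary_expression(self):
        # Stand-in for conditionalExpression: postfix (op postfix)*, no precedence.
        node = self.postfix()
        while self.at(self.pos) in ("*", "+", "-", "/"):
            op = self.advance()
            node = (op, node, self.postfix())
        return node


def parse(text):
    parser = PredicateParser(tokenize(text))
    tree = parser.assign_or_conditional()
    if parser.at(parser.pos) is not None:
        raise SyntaxError("trailing input")
    return tree


for source in ("a = b", "b-=c", "d", "a = b = c"):
    print(source, "->", parse(source))
```

The three inputs from the question all go through, and the chained case works because the right-hand side applies the same lookahead again.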
EDIT [solved] Now there is another problem with this solution that I have noticed: when assignments are chained, the assignmentExpression has to decide whether its right-hand expression is an assignment again (that is, whether a postfix expression and an assignment operator follow).
What's wrong with that?