How to add ARRAY column to Spark table (using ALTER TABLE)? - apache-spark-sql

I am trying to add a new column of array type, with a default value, to an existing table.
%sql
ALTER TABLE testdb.tabname ADD COLUMN new_arr_col ARRAY DEFAULT ['A','B','C'];
But it says that the data type is not supported:
Error in SQL statement: ParseException:
DataType array is not supported.(line 1, pos 54)
== SQL ==
ALTER TABLE testdb.dim_category ADD COLUMN c_cat_area ARRAY
So is there no way to add an array column directly to the table? Kindly assist me with this. Thanks in advance!

The reason for this error is that complex types (e.g. ARRAY) require their element type to be specified as well (cf. SqlBase.g4):
dataType
: complex=ARRAY '<' dataType '>' #complexDataType
| complex=MAP '<' dataType ',' dataType '>' #complexDataType
| complex=STRUCT ('<' complexColTypeList? '>' | NEQ) #complexDataType
| INTERVAL from=(YEAR | MONTH) (TO to=MONTH)? #yearMonthIntervalDataType
| INTERVAL from=(DAY | HOUR | MINUTE | SECOND)
(TO to=(HOUR | MINUTE | SECOND))? #dayTimeIntervalDataType
| identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')? #primitiveDataType
;
In your case, it'd be as follows:
ARRAY<CHAR(3)>
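To see why a bare ARRAY is rejected while ARRAY<...> is accepted, the shape of the dataType rule can be mimicked in a few lines of Python. This is only a toy sketch of the grammar, not Spark's parser: the names tokenize and parse_data_type, the token regex and the error texts are assumptions made for illustration, and only ARRAY, MAP and primitive types are handled.

```python
import re

# Toy tokenizer: names, integers, and the punctuation used in type strings.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[<>(),])")


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _expect(tokens, symbol):
    if not tokens or tokens[0] != symbol:
        raise ValueError(f"expected {symbol!r}")
    return tokens[1:]


def _data_type(tokens):
    if not tokens or not re.fullmatch(r"[A-Za-z_]\w*", tokens[0]):
        raise ValueError("expected a data type name")
    name, rest = tokens[0].upper(), tokens[1:]
    if name == "ARRAY":
        # complex=ARRAY '<' dataType '>': the element type is mandatory.
        if not rest or rest[0] != "<":
            raise ValueError("ARRAY needs an element type, e.g. ARRAY<STRING>")
        element, rest = _data_type(rest[1:])
        return ("ARRAY", element), _expect(rest, ">")
    if name == "MAP":
        # complex=MAP '<' dataType ',' dataType '>'
        key, rest = _data_type(_expect(rest, "<"))
        value, rest = _data_type(_expect(rest, ","))
        return ("MAP", key, value), _expect(rest, ">")
    # identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')?
    params = []
    if rest and rest[0] == "(":
        rest = rest[1:]
        while True:
            if not rest or not rest[0].isdigit():
                raise ValueError("expected an integer parameter")
            params.append(int(rest[0]))
            rest = rest[1:]
            if rest and rest[0] == ",":
                rest = rest[1:]
                continue
            break
        rest = _expect(rest, ")")
    return (name, *params), rest


def parse_data_type(text):
    """Parse a type string into a nested tuple, or raise ValueError."""
    result, rest = _data_type(tokenize(text))
    if rest:
        raise ValueError(f"extraneous input {rest[0]!r}")
    return result


print(parse_data_type("ARRAY<CHAR(3)>"))
try:
    parse_data_type("ARRAY")
except ValueError as error:
    print("rejected:", error)
```

In the ALTER TABLE statement this corresponds to writing the column type with its element type, for example ARRAY<STRING> or ARRAY<CHAR(3)>, rather than ARRAY on its own.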

Related

How can I correctly express in BNF this condition?

I am looking for a way to express the following types of conditions in BNF:
if(carFixed) { }
if(carFixed = true) {}
if(cars >= 4) { }
if(cars != 15) { }
if(cars < 3 && cars > 1) {}
Note:
* denotes 0 or more instances of something.
I have replaced normal BNF ::= with :.
I am currently using the following grammar, and am not sure whether it is correct:
conditionOperator: "=" | "!=" | "<=" | ">=" | "<" | ">" | "is";
logicalAndOperator: "&&";
condition: (booleanIdentifier ((conditionOperator booleanIdentifier)* (logicalAndOperator | logicalOrOperator) booleanIdentifer (conditionOperator booleanIdentifier)*)*);
There are several approaches, and they usually rely on the capabilities of the parser to indicate precedence and associativity. One that is typically used with recursive-descent parsers is to recreate the precedence of the operators by using the hierarchy provided by the BNF (or, in this case, pseudo-BNF) structure.
(In the examples below, CONDITIONAL_OP stands for operators such as <, != etc. and LOGICAL_OP for &&, || etc.)
Something along the lines of:
condition: logicalExpr
logicalExpr: conditionalExpr (LOGICAL_OP conditionalExpr)*
conditionalExpr: primary (CONDITIONAL_OP primary)*
primary: NUMBER | IDENTIFIER | BOOLEAN_LITERAL | '(' condition ')'
The problem with the above solution is that the left-associativity of the operators is lost, and special measures are required to restore it while parsing.
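One common "special measure" in a hand-written recursive-descent parser is to fold the operands into a left-leaning tree inside the loop that implements the (OP operand)* repetition. The following Python sketch illustrates this for the grammar above. The concrete operator sets, the tuple-based tree and the names Parser and tokenize are assumptions chosen for illustration, and the "is" operator from the question is left out.

```python
import re

# Token classes follow the grammar's naming; the operator sets are assumptions.
TOKEN = re.compile(
    r"\s*(?:(?P<LOGICAL_OP>&&|\|\|)"
    r"|(?P<CONDITIONAL_OP>!=|<=|>=|=|<|>)"
    r"|(?P<NUMBER>\d+)"
    r"|(?P<BOOLEAN_LITERAL>\b(?:true|false)\b)"
    r"|(?P<IDENTIFIER>[A-Za-z_]\w*)"
    r"|(?P<PAREN>[()]))"
)


def tokenize(text):
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.logical_expr()
        if self.peek()[0] is not None:
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return tree

    def _binary(self, operand, op_kind):
        # operand (OP operand)*: each iteration wraps the tree built so far
        # as the LEFT child, which is what restores left-associativity.
        left = operand()
        while self.peek()[0] == op_kind:
            op = self.advance()[1]
            left = (op, left, operand())
        return left

    def logical_expr(self):
        return self._binary(self.conditional_expr, "LOGICAL_OP")

    def conditional_expr(self):
        return self._binary(self.primary, "CONDITIONAL_OP")

    def primary(self):
        kind, value = self.advance()
        if kind == "NUMBER":
            return int(value)
        if kind == "BOOLEAN_LITERAL":
            return value == "true"
        if kind == "IDENTIFIER":
            return value
        if (kind, value) == ("PAREN", "("):
            tree = self.logical_expr()
            if self.advance() != ("PAREN", ")"):
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected token {value!r}")


print(Parser("cars < 3 && cars > 1").parse())
print(Parser("a && b || c").parse())
```

In the second call the tree for a && b ends up as the left operand of ||, which is the left-associative grouping.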
For parsers able to deal with left recursion, a more 'correct' notation could be:
condition: logicalExpr
logicalExpr: logicalExpr LOGICAL_OP conditionalExpr
| conditionalExpr
conditionalExpr: conditionalExpr CONDITIONAL_OP primary
| primary
primary: NUMBER | IDENTIFIER | BOOLEAN_LITERAL | '(' condition ')'
Finally, some parsers allow a special notation to indicate precedence and associativity. Something like (note that this is a completely invented syntax):
%LEFT LOGICAL_OP
%LEFT CONDITIONAL_OP
condition: condition CONDITIONAL_OP condition
| condition LOGICAL_OP condition
| '(' condition ')'
| NUMBER
| IDENTIFIER
| BOOLEAN_LITERAL
Hope this points you in the right direction.

Parsing an input string containing a dot(.) is not getting validated in ANTLR

I have an application "abc" and I am trying to parse a job name (the input string).
abc throws an error when asked to show the status of a job whose name contains a dot (.):
»abc status -jn UpgradeJob_435_1.61.4_xyz_1000_KPI_Upgrade_confirm
Error 2001 : Command Syntax error. extraneous input
'.61.4_xyz_1000_KPI_Upgrade_confirm' expecting
{<EOF>, JOB, JOB_OWNER, JOB_TYPE, JOB_STATUS}
Suggested Solution : Please check online help for correct syntax
It works fine if we give the job name in double quotes.
To fix this, I added a DOT rule to the command parser. Below are snippets of the changes made.
Snippet of the Parser:
jobNameQuery :
JOB (id | DOT | stringWithQuotes)
;
jobOwnerQuery:
JOB_OWNER (id | DOT | stringWithQuotes)
;
Snippet of the lexer:
DOT : '.' ;
ID: [a-zA-Z0-9_]([a-zA-Z0-9_#{}()#$%^~!`'-] | '[' | ']' )*;
Error Message:
Command Syntax error. extraneous input '.1' expecting {<EOF>, JOB, JOB_OWNER, JOB_TYPE, JOB_STATUS}
Can someone please suggest what changes I need to make?
Depending on your exact requirements, either make . one of the allowed characters in ID, or change
(id | DOT | stringWithQuotes)
to
(
id (DOT id)*
| stringWithQuotes
)
As it is now, you allow either a quoted string, an identifier, or a single dot, but not identifiers intermixed with dots.
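The difference between the two alternatives can be checked outside ANTLR with regular expressions that play the role of the parser rule. In this minimal Python sketch the ID character class is simplified to letters, digits and underscore (an assumption; the real lexer rule allows more characters), and the names SINGLE and DOTTED are invented for illustration.

```python
import re

ID = r"[A-Za-z0-9_]+"   # simplified stand-in for the ID lexer rule
QUOTED = r'"[^"]*"'     # simplified stand-in for stringWithQuotes

# (id | DOT | stringWithQuotes): one identifier, one dot, or one quoted string
SINGLE = re.compile(rf"(?:{ID}|\.|{QUOTED})")

# (id (DOT id)* | stringWithQuotes): identifiers joined by dots, or a quoted string
DOTTED = re.compile(rf"(?:{ID}(?:\.{ID})*|{QUOTED})")

job = "UpgradeJob_435_1.61.4_xyz_1000_KPI_Upgrade_confirm"

old = SINGLE.match(job)
print(old.group(0))       # the part the original rule consumes
print(job[old.end():])    # the part left over as extraneous input

print(DOTTED.fullmatch(job) is not None)          # unquoted name with dots
print(DOTTED.fullmatch(f'"{job}"') is not None)   # quoted name still accepted
```

The leftover printed for the original rule starts at the first dot, which matches the "extraneous input" reported in the first error message.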

Hive String operator concatenate by || double pipe

The Hive language manual states that string concatenation with the double pipe (||) is supported; however, I am unable to use this feature in my current version of Hive, 1.2.1000.2.4.3.6-2:
hive> select 'a'||'b';
NoViableAltException(5@[323:1: atomExpression : ( ( KW_NULL )=> KW_NULL -> TOK_NULL | ( constant )=> constant | castExpression | caseExpression | whenExpression | ( functionName LPAREN )=> function | tableOrColumn | LPAREN ! expression RPAREN !);])
I tried to find out which version started to support it, but without any luck :-(
I know that I can use the built-in function concat to do the same thing, but I am rewriting a bunch of Oracle views for Hive and I don't want to change things that can stay the same if possible.
The || operator is supported as of Hive 2.2.0.
The documentation is very clear about that:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringOperators

How to express a hex literal in Spark SQL?

I am new to Spark SQL. I have searched the language manual for Hive/SparkSQL and googled for the answer, but could not find an obvious answer.
In MySQL we can express a hex literal 0xffff like this:
mysql>select 0+0xffff;
+----------+
| 0+0xffff |
+----------+
| 65535 |
+----------+
1 row in set (0.00 sec)
But in Spark SQL (I am using the beeline client), I could only do the following, where the numerical values are expressed in decimal rather than hexadecimal.
> select 0+65535;
+--------------+--+
| (0 + 65535) |
+--------------+--+
| 65535 |
+--------------+--+
1 row selected (0.047 seconds)
If I did the following instead, I would get an error:
> select 0+0xffff;
Error: org.apache.spark.sql.AnalysisException:
cannot resolve '`0xffff`' given input columns: []; line 1 pos 9;
'Project [unresolvedalias((0 + '0xffff), None)]
+- OneRowRelation$ (state=,code=0)
How do we express a hex literal in Spark SQL?
Unfortunately, you can't do it in Spark SQL.
You can see this by looking at the ANTLR grammar file. There, the number rule is built from lexer tokens that are ultimately defined through the DIGIT fragment, which looks like this:
number
: MINUS? DECIMAL_VALUE #decimalLiteral
| MINUS? INTEGER_VALUE #integerLiteral
| MINUS? BIGINT_LITERAL #bigIntLiteral
| MINUS? SMALLINT_LITERAL #smallIntLiteral
| MINUS? TINYINT_LITERAL #tinyIntLiteral
| MINUS? DOUBLE_LITERAL #doubleLiteral
| MINUS? BIGDECIMAL_LITERAL #bigDecimalLiteral
;
...
INTEGER_VALUE
: DIGIT+
;
...
fragment DIGIT
: [0-9]
;
DIGIT covers only 0-9, so neither the hexadecimal digits a-f nor a 0x prefix can appear in a numeric literal.
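This also accounts for the shape of the error message in the question. A small Python imitation of longest-match lexing shows the effect, under two assumptions that are not part of the excerpt above: that the grammar also has an IDENTIFIER rule made of letters, digits and underscores, and that the lexer prefers the longest match and, on a tie, the rule listed first. The RULES table and the lex function are illustrative only, not Spark code.

```python
import re

# Illustrative rule table; order only matters when two matches have equal length.
RULES = [
    ("INTEGER_VALUE", re.compile(r"[0-9]+")),      # DIGIT+
    ("IDENTIFIER", re.compile(r"[A-Za-z0-9_]+")),  # assumed: (LETTER | DIGIT | '_')+
    ("PLUS", re.compile(r"\+")),
]


def lex(text):
    """Split text into (rule, lexeme) pairs: longest match, first rule wins ties."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], best[1].group(0)))
        pos = best[1].end()
    return tokens


print(lex("0+65535"))
print(lex("0+0xffff"))
```

In the second call the whole of 0xffff comes out as a single identifier-like token rather than a number, which is consistent with the analyzer reporting that it cannot resolve `0xffff` as a column.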

XText datatype definition and usage

I want to build an editor for a language with different groups of variable types, but I have problems with the generated content assist.
Type:
'TYPE' ':' name=ID '(' type=[ANY] ')' ';'
;
ANY:
ANY_NUM | Type
;
ANY_NUM:
ANY_REAL | ANY_INT ...
;
ANY_REAL:
'real' | 'float'
;
ANY_INT:
'int' | 'sint' | 'lint'
;
The idea is that specific types are not allowed everywhere, so in some cases I want to use, for example, type=(ANY_REAL). The generated content assist does not propose anything there, so I would like to know whether this is the correct approach to specifying variable types and groups.
OK. The answer is quite simple. Each variable type has to be defined within an enum (EnumRule), while the structure itself is a simple type reference (ParserRule):
TR_Any:
TR_AnyDerived | TR_AnyElementary
;
TR_AnyDerived:
...
;
TR_AnyElementary:
TR_AnyReal | TR_AnyInt |...
;
TR_AnyReal:
type = E_AnyReal
;
TR_AnyInt:
type = E_AnyInt
;
enum E_AnyReal:
FLOAT = "float" |
DOUBLE = "double" |
...
;
enum E_AnyInt:
INT = "int"
;
The types can be referenced as described in the Xtext documentation:
MyRule:
anyvar = [TR_Any]
intvar = [TR_AnyInt]
;