Does the MultiDelimitSerDe support the NULL DEFINED AS clause?

This article shows that we can use a multi-character delimiter in Hive.
But can we also specify the NULL value?
I tried the following Hive SQL, which returns an error:
CREATE TABLE temp
( a STRING, b STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="##")
NULL DEFINED AS 'NULL'
STORED AS TEXTFILE;
The error:
Error: Error while compiling statement: FAILED: ParseException line 5:0 missing EOF at 'NULL' near ')' (state=42000,code=40000)

The NULL DEFINED AS 'NULL' clause is only available with the ROW FORMAT DELIMITED option. Here we are using the ROW FORMAT SERDE option, so the null format has to be passed explicitly through the serialization.null.format SerDe property.
You can use the query below, which sets serialization.null.format in SERDEPROPERTIES:
CREATE TABLE temp
( a STRING, b STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="##",'serialization.null.format'='NULL')
STORED AS TEXTFILE;
For more information, refer to the Hive DDL reference guide and the MultiDelimitSerDe source code.
Hive DDL guide:
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
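For contrast, here is a minimal sketch of the two forms the grammar allows. The table name temp_delimited and the single-character delimiter '#' are assumptions made for illustration; the second statement assumes the SerDe-based table temp from the answer already exists and shows how the same property could be set afterwards.

```sql
-- Hypothetical table: ROW FORMAT DELIMITED accepts NULL DEFINED AS,
-- but its field delimiter is limited to a single character.
CREATE TABLE temp_delimited
( a STRING, b STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '#'
  NULL DEFINED AS 'NULL'
STORED AS TEXTFILE;

-- For a table created with ROW FORMAT SERDE, the null format is a
-- SerDe property and can also be set after the table has been created.
ALTER TABLE temp SET SERDEPROPERTIES ('serialization.null.format'='NULL');
```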

Related

Table or database name may not contain dot(.) character

When I use Hive to create a table, I get an error saying that the name must not contain a dot.
How can I solve this problem?
CREATE EXTERNAL TABLE `ods.a2`(
`key` string COMMENT 'k',
`value` string COMMENT 'v')
COMMENT '注释'
ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',,',
'serialization.format'=',,')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs:///user/hive/warehouse/ods.db/a2';
Error:
Error: Error while compiling statement: FAILED: SemanticException Line 1:22 Table or database name may not contain dot(.) character 'ods.a2' (state=42000,code=40000)
Use CREATE EXTERNAL TABLE `ods`.`a2` instead.
If any components of a multiple-part name require quoting, quote them individually rather than quoting the name as a whole. For example, write `my-table`.`my-column`, not `my-table.my-column`.
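Applied to the statement from the question, the fix might look like the sketch below. It assumes that STORED AS TEXTFILE is an acceptable shorthand for the TextInputFormat / HiveIgnoreKeyTextOutputFormat pair in the original, and it leaves out the table comment and the 'serialization.format' property for brevity.

```sql
-- Option 1: quote the database and the table name separately.
CREATE EXTERNAL TABLE `ods`.`a2`(
  `key` string COMMENT 'k',
  `value` string COMMENT 'v')
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ('field.delim'=',,')
STORED AS TEXTFILE
LOCATION 'hdfs:///user/hive/warehouse/ods.db/a2';

-- Option 2 (alternative to option 1, not in addition to it):
-- switch to the database first, then use the unqualified name `a2`
-- in the CREATE EXTERNAL TABLE statement.
USE ods;
```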

Cloudera - Hive/Impala Show Create Table - Error with the syntax

I'm building some automated processes to create tables in Cloudera Hive.
For that I am using the SHOW CREATE TABLE statement, which gives me (for example) the following DDL:
CREATE TABLE clsd_core.factual_player ( player_name STRING, number_goals INT ) PARTITIONED BY ( player_name STRING ) WITH SERDEPROPERTIES ('serialization.format'='1') STORED AS PARQUET LOCATION 'hdfs://nameservice1/factual_player'
What I need is to run the DDL in a different place to create a table with the same name.
However, when I run that code I get the following error:
Error while compiling statement: FAILED: ParseException line 1:123 missing EOF at 'WITH' near ')'
When I manually remove the part "WITH SERDEPROPERTIES ('serialization.format'='1')", the table is created successfully.
Is there a better way to retrieve a table's DDL without the SERDE information?
The first issue in your DDL is that the partition column should not be listed in the column spec, only in PARTITIONED BY. A partition is a folder named partition_column=value, and this column is not stored in the table files, only in the partition directory name. If you want the partition column to also be in the data files, it must be given a different name there.
The second issue is that SERDEPROPERTIES is part of the SERDE specification. If you do not specify a SERDE, there should be no SERDEPROPERTIES. See this manual: StorageFormat and SerDe
Fixed DDL:
CREATE TABLE factual_player (number_goals INT)
PARTITIONED BY (player_name STRING)
STORED AS PARQUET
LOCATION 'hdfs://nameservice1/factual_player';
STORED AS PARQUET already implies the SERDE, INPUTFORMAT and OUTPUTFORMAT.
If you want to specify the SERDE with its properties, use this syntax:
CREATE TABLE factual_player(number_goals int)
PARTITIONED BY (player_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format'='1') --I believe you really do not need this
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/factual_player'
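To illustrate the point about partition columns, here is a small sketch against the fixed table. The player name 'alice' and the goal count are made-up sample values, and the INSERT ... VALUES form assumes a Hive version that supports it (0.14 or later).

```sql
-- Write one row into a static partition. The partition value is not
-- written to the Parquet file; it becomes a directory named
-- player_name=alice under the table location.
INSERT INTO TABLE factual_player PARTITION (player_name='alice')
VALUES (3);

-- List the partition directories Hive knows about.
SHOW PARTITIONS factual_player;

-- The partition column can still be selected like a regular column.
SELECT player_name, number_goals
FROM factual_player;
```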

Error while creating hive table

I used the following syntax to create the Hive table:
Create table tablename (ColumnName Type)
row format SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with SERDEPROPERTIES ("separatorChar" = "\;")
lines terminated by '\n'
tblproperties ("skip.header.line.count" = "1");
But I am getting an error message
FAILED: ParseException line 1:361 missing EOF at 'lines' near ')'
I'm not sure what I'm doing wrong. Please help!
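If you have a single column, you don't need a separatorChar. If you have multiple fields separated by ';', you don't need to escape the ';'. The LINES TERMINATED BY clause is part of ROW FORMAT DELIMITED and is not accepted after ROW FORMAT SERDE, which is why the parser stops at 'lines'; drop it: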
SERDEPROPERTIES ("separatorChar" = ";")
STORED AS TEXTFILE
LOCATION '/path/yourfile.csv'
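Put together, a complete statement could look like the sketch below. The column type STRING and the directory '/path/to/csv_dir' are assumptions for illustration: OpenCSVSerde reads every column as a string, and LOCATION is expected to point at the directory holding the file rather than at the file itself.

```sql
CREATE TABLE tablename (ColumnName STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ";")
STORED AS TEXTFILE
LOCATION '/path/to/csv_dir'          -- hypothetical directory
-- TBLPROPERTIES comes after STORED AS and LOCATION.
TBLPROPERTIES ("skip.header.line.count" = "1");
```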

error loading csv into hive table

I'm trying to load a tab delimited file into a table in hive, and I want to skip the first row because it contains column names. I'm trying to run the code below, but I'm getting the error below. Does anyone see what the issue is?
Code:
set hive.exec.compress.output=false;
set hive.mapred.mode=nonstrict;
-- region to state mapping
DROP TABLE IF EXISTS StateRegion;
CREATE TEMPORARY TABLE StateRegion (Zip_Code int,
Place_Name string,
State string,
State_Abbreviate string,
County string,
Latitude float,
Longitude float,
ZIP_CD int,
District_NM string,
Region_NM string)
row format delimited fields terminated by '\t'
tblproperties("skip.header.line.count"="1");
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'StateRegion'
OVERWRITE INTO TABLE StateRegion;
--test Export
INSERT OVERWRITE LOCAL DIRECTORY './StateRegionTest/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from StateRegion;
Error:
FAILED: ParseException line 2:0 cannot recognize input near 'STORED' 'AS' 'TEXTFILE'
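A likely cause, judging from the error position: the semicolon after the tblproperties(...) line ends the CREATE statement, so STORED AS TEXTFILE; is parsed as a separate statement that Hive cannot recognize. In addition, STORED AS has to come before TBLPROPERTIES. A corrected sketch of the CREATE statement, leaving the rest of the script unchanged:

```sql
CREATE TEMPORARY TABLE StateRegion (
  Zip_Code int,
  Place_Name string,
  State string,
  State_Abbreviate string,
  County string,
  Latitude float,
  Longitude float,
  ZIP_CD int,
  District_NM string,
  Region_NM string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
-- Single terminating semicolon, after the last clause.
TBLPROPERTIES ("skip.header.line.count"="1");
```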

Creating column names with "(" in hive 1.1.0

I tried to create a table in Hive as below:
create table IF NOT EXISTS department(deptid int, deptname(1) string, deptname(2) string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;
I am getting this error:
Error while compiling statement: FAILED: ParseException line 1:58 cannot recognize input near '(' '1' ')' in column type
Is there any other way to create columns with "(" in the name?
Use ` (backtick) to escape the ( (round bracket).
It can be used for both table names and field names.
Try:
create table IF NOT EXISTS department(`deptid` int, `deptname(1)` string, `deptname(2)` string) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;
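The same quoting is needed whenever these columns are referenced later. A short sketch, assuming the table above was created and that hive.support.quoted.identifiers is left at its default value of column:

```sql
-- Column names containing parentheses must be backtick-quoted in queries too.
SELECT `deptid`, `deptname(1)`, `deptname(2)`
FROM department
WHERE `deptname(1)` IS NOT NULL;
```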