I have created a table in hive as below ,
hive> create table engeometry(name string,shape binary)
> ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
> row format delimited by '\n'
> STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> ;
I get error as shown below:
FAILED: ParseException line 3:0 missing EOF at 'row' near
''com.esri.hadoop.hive.serde.JsonSerde''
I want to create table in hive with space defined between rows , when i try the same i get message as show above.
You don't need to give row format delimited by Clause.
Below query is enough to work:
hive> create table engeometry(name string,shape binary)
> ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
> STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
Related
I want to create an external table from a .csv file I uploaded to the server earlier.
In Bline (shell for Hive), I tried running this script:
CREATE EXTERNAL TABLE c_fink_category_mapping (
trench_code string,
fink_code string
)
row format delimited fields terminated by '\073' stored as textfile
location '/appl/trench/dev/data/in/main/daily_wf/fink_category_mapping'
TABLEPROPERTIES ('serialization.null.format' = '')
;
which creates the table w/o any error byt the table itself is empty.
Help would be appreciated.
My textfile is populated with data.
First, check if the location path is correct.
Then try with this configuration:
CREATE EXTERNAL TABLE c_fink_category_mapping (
trench_code string,
fink_code string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='"',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/appl/trench/dev/data/in/main/daily_wf/fink_category_mapping';
response provided above seems to be correct:
CREATE EXTERNAL TABLE c_fink_category_mapping (
trench_code string,
fink_code string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='"',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/appl/trench/dev/data/in/main/daily_wf/fink_category_mapping';
This will create the table using a comma as the delimiter, which should correctly parse the data in your CSV file and populate the table with the data from the file. You can also specify a different delimiter character, such as '\t', if that is more appropriate for your data.
I'm making some automatic processes to create tables on Cloudera Hive.
For that I am using the show create table statement that me give (for example) the following ddl:
CREATE TABLE clsd_core.factual_player ( player_name STRING, number_goals INT ) PARTITIONED BY ( player_name STRING ) WITH SERDEPROPERTIES ('serialization.format'='1') STORED AS PARQUET LOCATION 'hdfs://nameservice1/factual_player'
What I need is to run the ddl on a different place to create a table with the same name.
However, when I run that code I return the following error:
Error while compiling statement: FAILED: ParseException line 1:123 missing EOF at 'WITH' near ')'
And I remove manually this part "WITH SERDEPROPERTIES ('serialization.format'='1')" it was able to create the table with success.
Is there a better function to retrieves the tables ddls without the SERDE information?
First issue in your DDL is that partitioned column should not be listed in columns spec, only in the partitioned by. Partition is the folder with name partition_column=value and this column is not stored in the table files, only in the partition directory. If you want partition column to be in the data files, it should be named differently.
Second issue is that SERDEPROPERTIES is a part of SERDE specification, If you do not specify SERDE, it should be no SERDEPROPERTIES. See this manual: StorageFormat andSerDe
Fixed DDL:
CREATE TABLE factual_player (number_goals INT)
PARTITIONED BY (player_name STRING)
STORED AS PARQUET
LOCATION 'hdfs://nameservice1/factual_player';
STORED AS PARQUET already implies SERDE, INPUTFORMAT and OUPPUTFORMAT.
If you want to specify SERDE with it's properties, use this syntax:
CREATE TABLE factual_player(number_goals int)
PARTITIONED BY (player_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format'='1') --I believe you really do not need this
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/factual_player'
command "show tables" is not showing view with name temp.
hive > SET hive.support.sql11.reserved.keywords=false
> create view temp as SELECT hour,id,regexp_replace(text,'\n','') as text FROM proj_tweets_2
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE;
Warning: Value had a \n character in it.
I'm moving the data from Mysql to S3 using data pipeline and it creates empty file for couple of days. I believe, it is making my athena query fails with
"HIVE_CURSOR_ERROR: Unexpected end of input stream".
Below is my script
CREATE EXTERNAL TABLE `test`(
`col0` bigint,
`col1` bigint,
`col2` string,
`col3` string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://dummy/'
Could you please let me know if there is any option to skip zero bytes S3 file?
I need to create partition on a column with an UPPERCASE Column Name. However, Hive converts all the column Names to lowercase implicitly. I am not able to get data in my select * query. I cannot change the Folder name to lowercase. Below is the Create query I am using:
CREATE EXTERNAL TABLE db.ALERT_DS_Hive_Test
PARTITIONED BY (PROC_ID String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('casesensitive'='PROC_ID')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/sam/hive-test/ALERT_DS.avro/'
TBLPROPERTIES ('avro.schema.url'='/user/sam/hive-test/ALERT_DS.avsc');