Create Hive table using “as select” and also specify TBLPROPERTIES - hive

For example, when using the Parquet format, I'd like to be able to specify the compression scheme ("parquet.compression"="SNAPPY"). Running this query:
CREATE TABLE table_a_copy
STORED AS PARQUET
TBLPROPERTIES("parquet.compression"="SNAPPY")
AS
SELECT * FROM table_a
returns an error:
Error: Error while compiling statement: FAILED: ParseException line 1:69 cannot recognize input near 'parquet' '.' 'compression' in table properties list (state=42000,code=40000)
The same query without the TBLPROPERTIES works just fine.
This is similar to this question: Create hive table using "as select" or "like" and also specify delimiter. But I can't figure out how to make TBLPROPERTIES work with that approach. I'm using Hive 1.1.

I was able to run the exact same statement in Hive 2.1.1.
On Hive 1.1, try this workaround:
CREATE TABLE table_a_copy LIKE table_a STORED AS PARQUET;
ALTER TABLE table_a_copy SET TBLPROPERTIES("parquet.compression"="SNAPPY");
INSERT INTO TABLE table_a_copy SELECT * FROM table_a;
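Two small additions that may help here, offered as a sketch rather than a tested recipe for Hive 1.1. SHOW TBLPROPERTIES can confirm that the property landed on the new table. Setting parquet.compression at session level before a plain CTAS is an alternative that is sometimes used; this assumes the Parquet writer in your build reads the value from the session configuration. The table name table_a_copy2 is hypothetical.

```sql
-- Check that the property was applied by the ALTER TABLE step.
SHOW TBLPROPERTIES table_a_copy("parquet.compression");

-- Assumed alternative: set the compression codec for the session,
-- then run the CTAS without a TBLPROPERTIES clause.
SET parquet.compression=SNAPPY;

CREATE TABLE table_a_copy2   -- hypothetical table name
STORED AS PARQUET
AS
SELECT * FROM table_a;
```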

Related

Create a table in hive with timestamp as comment

I would like to create a table in Hive and include the creation date (the current_timestamp function) in the table comment. Something like this:
CREATE TABLE IF NOT EXISTS ex.tb_test ( field1 int, field2 String) COMMENT current_timestamp STORED AS TEXTFILE;
But it returns an error: FAILED: ParseException line 2: 8 mismatched input 'current_timestamp' expecting StringLiteral near 'COMMENT'
Is there any way to add the table's creation date to the comment?
Functions are not supported in table DDL. You can pass a pre-calculated timestamp as a --hiveconf parameter and use it like this: COMMENT '${hiveconf:ts}' (it must be quoted). The parameter is substituted as a string literal before the command is executed.
BTW, Hive already stores CreateTime.
The DESCRIBE FORMATTED table_name command outputs CreateTime along with other table info.
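A minimal sketch of the --hiveconf approach, assuming the statement lives in a script file (the file name create_tb_test.hql and the date format are hypothetical) and that variable substitution is enabled, which is the default:

```sql
-- Hypothetical invocation from a shell, shown here as a comment:
--   hive --hiveconf ts="$(date '+%Y-%m-%d %H:%M:%S')" -f create_tb_test.hql

-- Contents of create_tb_test.hql:
-- ${hiveconf:ts} is replaced with the passed value before the statement is parsed,
-- so the parser sees an ordinary quoted string literal after COMMENT.
CREATE TABLE IF NOT EXISTS ex.tb_test (field1 int, field2 string)
COMMENT '${hiveconf:ts}'
STORED AS TEXTFILE;

-- Shows the comment as well as the CreateTime that Hive records on its own.
DESCRIBE FORMATTED ex.tb_test;
```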

Cloudera - Hive/Impala Show Create Table - Error with the syntax

I'm building some automated processes to create tables on Cloudera Hive.
For that I am using the SHOW CREATE TABLE statement, which gives me (for example) the following DDL:
CREATE TABLE clsd_core.factual_player ( player_name STRING, number_goals INT ) PARTITIONED BY ( player_name STRING ) WITH SERDEPROPERTIES ('serialization.format'='1') STORED AS PARQUET LOCATION 'hdfs://nameservice1/factual_player'
What I need is to run the DDL in a different place to create a table with the same name.
However, when I run that code I get the following error:
Error while compiling statement: FAILED: ParseException line 1:123 missing EOF at 'WITH' near ')'
When I manually remove the part "WITH SERDEPROPERTIES ('serialization.format'='1')", the table is created successfully.
Is there a better function to retrieve table DDLs without the SERDE information?
The first issue in your DDL is that the partition column should not be listed in the column spec, only in PARTITIONED BY. A partition is a folder named partition_column=value; the column is not stored in the table files, only in the partition directory name. If you want the partition column to be in the data files as well, it must be given a different name.
The second issue is that SERDEPROPERTIES is part of the SERDE specification. If you do not specify a SERDE, there should be no SERDEPROPERTIES. See this manual: StorageFormat and SerDe
Fixed DDL:
CREATE TABLE factual_player (number_goals INT)
PARTITIONED BY (player_name STRING)
STORED AS PARQUET
LOCATION 'hdfs://nameservice1/factual_player';
STORED AS PARQUET already implies SERDE, INPUTFORMAT and OUTPUTFORMAT.
If you want to specify the SERDE with its properties, use this syntax:
CREATE TABLE factual_player(number_goals int)
PARTITIONED BY (player_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format'='1') --I believe you really do not need this
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/factual_player'
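To illustrate the point about partition columns living in the directory name rather than in the data files, here is a short sketch against the fixed table. The partition value example_player is hypothetical.

```sql
-- Register a partition. Hive creates a directory named
-- player_name=example_player under the table location.
ALTER TABLE factual_player ADD IF NOT EXISTS PARTITION (player_name='example_player');

-- Lists the partitions, one per directory, e.g. player_name=example_player.
SHOW PARTITIONS factual_player;

-- The partition column can still be selected and filtered like a regular column,
-- even though it is declared only in PARTITIONED BY.
SELECT player_name, number_goals
FROM factual_player
WHERE player_name = 'example_player';
```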

Create hive external table with partitions

I already have an internal table in Hive. Now I want to create an external table from it, partitioned by date. But it throws an error when I try to create it.
Sample code:
create external table db_1.T_DATA1 partitioned by (date string) as select * from db_2.temp
LOCATION 'file path';
Error:
ParseException line 2:0 cannot recognize input near 'LOCATION' ''file
path'' '' in table source
As per the answer provided at https://stackoverflow.com/a/26722320/4326922 you should be able to create external table with CTAS.
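Note that the Hive language manual lists restrictions on CTAS, including that the target table cannot be an external table, so the single-statement form may be rejected depending on your version. If it is, a two-step sketch like the following is a common alternative. The column names id and payload and the partition column dt are hypothetical stand-ins for the real columns of db_2.temp, and 'file path' is the placeholder from the question.

```sql
-- Step 1: create the external, partitioned table explicitly.
-- The partition column appears only in PARTITIONED BY.
CREATE EXTERNAL TABLE db_1.T_DATA1 (id INT, payload STRING)
PARTITIONED BY (dt STRING)
LOCATION 'file path';

-- Step 2: load it with a dynamic-partition insert.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE db_1.T_DATA1 PARTITION (dt)
SELECT id, payload, dt
FROM db_2.temp;
```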

How to add avro.schema.url to hive partition storage information?

I am trying command
ALTER TABLE mytable PARTITION(date='2010-02-22') SET 'avro.schema.url'
'hdfs://xxx.com:9000/location/to/my/schema/_schema.avsc';
But it is returning parsing Error :
FAILED: ParseException line 1:49 cannot recognize input near 'SET'
''avro.schema.url'' ''hdfs://xxx.com:9000/location/to/my/schema/_schema.avsc'' in alter
table partition statement suffix
This is the right syntax (note the = between the key and the value):
ALTER TABLE mytable PARTITION(date='2010-02-22') SET TBLPROPERTIES(
'avro.schema.url'=
'hdfs://xxx.com:9000/location/to/my/schema/_schema.avsc');
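If your Hive version rejects SET TBLPROPERTIES together with a PARTITION spec, two variants that follow the documented ALTER TABLE forms are sketched below: the language manual documents SET SERDEPROPERTIES with an optional partition spec, and SET TBLPROPERTIES at table level. Whether the Avro SerDe picks up the schema URL from partition-level SerDe properties in your version is an assumption you would need to verify.

```sql
-- Partition-level: set the property as a SerDe property of one partition.
ALTER TABLE mytable PARTITION (date='2010-02-22') SET SERDEPROPERTIES (
  'avro.schema.url'='hdfs://xxx.com:9000/location/to/my/schema/_schema.avsc');

-- Table-level: set the property for the whole table.
ALTER TABLE mytable SET TBLPROPERTIES (
  'avro.schema.url'='hdfs://xxx.com:9000/location/to/my/schema/_schema.avsc');
```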

Unable to create table in hive

I am creating a table in Hive like this:
CREATE TABLE SEQUENCE_TABLE(
SEQUENCE_NAME VARCHAR2(225) NOT NULL,
NEXT_VAL NUMBER NOT NULL
);
But the result is a parse exception: Hive is unable to read VARCHAR2(225) NOT NULL.
Can anyone explain how to create a table like the one above, and any other process for providing a path for it?
Hive has no VARCHAR2 or NUMBER type (these are Oracle types), and older Hive versions do not accept a NOT NULL clause. Use STRING (or VARCHAR(n), available since Hive 0.12) and a numeric type such as BIGINT:
CREATE TABLE SEQUENCE_TABLE(SEQUENCE_NAME string, NEXT_VAL bigint);
Please read this for CREATE TABLE syntax:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
Anyway, Hive is "SQL-like" but it is not SQL. I wouldn't use it for things such as a sequence table, as you don't have support for transactions, locking, keys and everything else you are familiar with from Oracle (though I think newer versions have simple support for transactions, updates, deletes, etc.).
I would consider using a normal OLTP database for whatever you are trying to achieve.
The only option you have here is something like:
CREATE TABLE SEQUENCE_TABLE(SEQUENCE_NAME String,NEXT_VAL bigint) row format delimited fields terminated by ',' stored as textfile;
PS: Again, this depends on the type of data you are going to load into Hive.
Use the following syntax:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
An example of a Hive CREATE TABLE statement:
CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;