Cloudera - Hive/Impala Show Create Table - Error with the syntax - hive

I'm making some automatic processes to create tables on Cloudera Hive.
For that I am using the show create table statement that me give (for example) the following ddl:
CREATE TABLE clsd_core.factual_player ( player_name STRING, number_goals INT ) PARTITIONED BY ( player_name STRING ) WITH SERDEPROPERTIES ('serialization.format'='1') STORED AS PARQUET LOCATION 'hdfs://nameservice1/factual_player'
What I need is to run the ddl on a different place to create a table with the same name.
However, when I run that code I return the following error:
Error while compiling statement: FAILED: ParseException line 1:123 missing EOF at 'WITH' near ')'
And I remove manually this part "WITH SERDEPROPERTIES ('serialization.format'='1')" it was able to create the table with success.
Is there a better function to retrieves the tables ddls without the SERDE information?

First issue in your DDL is that partitioned column should not be listed in columns spec, only in the partitioned by. Partition is the folder with name partition_column=value and this column is not stored in the table files, only in the partition directory. If you want partition column to be in the data files, it should be named differently.
Second issue is that SERDEPROPERTIES is a part of SERDE specification, If you do not specify SERDE, it should be no SERDEPROPERTIES. See this manual: StorageFormat andSerDe
Fixed DDL:
CREATE TABLE factual_player (number_goals INT)
PARTITIONED BY (player_name STRING)
STORED AS PARQUET
LOCATION 'hdfs://nameservice1/factual_player';
STORED AS PARQUET already implies SERDE, INPUTFORMAT and OUPPUTFORMAT.
If you want to specify SERDE with it's properties, use this syntax:
CREATE TABLE factual_player(number_goals int)
PARTITIONED BY (player_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format'='1') --I believe you really do not need this
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/factual_player'

Related

How can I create an EXTERNAL table with HIVE format in databricks

I am having a external table with below format in hive.
CREATE EXTERNAL TABLE cs_mbr_prov(
key struct<inid:string,......>,
memkey string,
ob_id string,
.....
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=' :key,ci:MEMKEY, .....',
'serialization.format'='1')
I want to create same type of table in Azure Databricks where my Input and Output are in parquet format.
As per the official Doc I created and reproduced the table with Input and Output are in parquet format.
Sample code:
CREATE EXTERNAL TABLE `vams`(
`country` string,
`count` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'dbfs:/FileStore/'
TBLPROPERTIES (
'totalSize'='2335',
'numRows'='240',
'rawDataSize'='2095',
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'transient_lastDdlTime'='1418173653')
Reference:
https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat

How do you add Data to an Existing Hive Metastore?

I have multiple subdirectories in S3 that contain .orc files. I'm trying to create a hive metastore so I can query the data with Presto / Hive, etc. The data is poorlly structured (no consistent delimiter, ugly characters, etc). Here's a scrubbed sample:
1488736466 199.199.199.199 0_b.www.sphericalcow.com.f9b1.qk-g6m6z24tdr.v4.url.name.com TXT IN: NXDOMAIN/0/143
1488736466 6.6.5.4 0.3399.186472.4306.6668.638.cb5a.names-things.update.url.name.com TXT IN: NOERROR/3/306 0\009253\009http://az.blargi.ng/%D3%AB%EF%BF%BD%EF%BF%BD/\009 0\009253\009http://casinoroyal.online/\009 0\009253\009http://d2njbfxlilvpsq.cloudfront.net/b_zq_ym_bangvideo/bangvideo0826.apk\009
I was able to create a table pointing to one of the subdirectories using a serde regex and the fields are parsing properly, but as far as I can tell I can only load one subfolder at a time.
How does one add more data to an existing hive metastore?
Here's an example of my hive metastore create statement with the regex serde bit:
DROP TABLE IF EXISTS test;
CREATE EXTERNAL TABLE test (field1 string, field2 string, field3 string, field4 string)
COMMENT 'fill all the tables with the datas.'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([0-9]{10}) ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}) (\\S*) (.*)",
"output.format.string" = "%1$s %2$s %3$s %4$s"
)
STORED AS ORC
LOCATION 's3://path/to/one/of/10/folders/'
tblproperties ("orc.compress" = "SNAPPY", "skip.header.line.count"="2");
select * from test limit 10;
I realize there is probably a very simple solution, but I tried INSERT INTO in place of CREATE EXTERNAL TABLE, but it understandably complains about the input, and I looked in both the hive and serde documentation for help but was unable to find a reference to adding to an existing store.
Possible solution using partitions.
CREATE EXTERNAL TABLE test (field1 string, field2 string, field3 string, field4 string)
partitioned by (mypartcol string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([0-9]{10}) ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}) (\\S*) (.*)"
)
LOCATION 's3://whatever/as/long/as/it/is/empty'
tblproperties ("skip.header.line.count"="2");
alter table test add partition (mypartcol='folder 1') location 's3://path/to/1st/of/10/folders/';
alter table test add partition (mypartcol='folder 2') location 's3://path/to/2nd/of/10/folders/';
.
.
.
alter table test add partition (mypartcol='folder 10') location 's3://path/to/10th/of/10/folders/';
For #TheProletariat (the OP)
It seems there is no need for RegexSerDe since the columns are delimited by space (' ').
Note the use of tblproperties ("serialization.last.column.takes.rest"="true")
create external table test
(
field1 bigint
,field2 string
,field3 string
,field4 string
)
row format delimited
fields terminated by ' '
tblproperties ("serialization.last.column.takes.rest"="true")
;

Error while creating table in Hive

I am new to hadoop. I need a help regarding error encountered in Hive while creating a new table. I have gone through this Hive FAILED: ParseException line 2:0 cannot recognize input near ''macaddress'' 'CHAR' '(' in column specification
My question: Is it necessary to write a location of the table in the script? because I am writing table location at starting and I am afraid about writing the location because it should not disturb my rest of the databases by any mulfunction operation.
Here is my query:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp )
CLUSTERED BY (
tank_items_id)
INTO 8 BUCKETS
ROW FORMAT SERDE
TBLPROPERTIES (transactional=true)
STORED AS ORC;
The error I am getting is-
ParseException line 1:3 cannot recognize input near 'TBLPROPERTIES'
'(' 'transactional'
What would be the other possibilities of errors and how can I remove those?
There is a syntax error in your create query. Error which you have shared says that hive cannot recognize input near 'TBLPROPERTIES'.
Solution:
As per hive syntax, the key value passed in TBLPROPERTIES should be in double quotes. it should be like this: TBLPROPERTIES ("transactional"="true")
So if I correct your query it will be:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp
) CLUSTERED BY (tank_items_id) INTO 8 BUCKETS
ROW FORMAT SERDE TBLPROPERTIES ("transactional"="true") STORED AS ORC;
Execute above query, then if you get any other syntax error them make sure that the order of STORED AS , CLUSTERED BY , TBLPROPERTIES is as per the hive syntax.
Refer this for more details:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
1) ROW FORMAT SERDE -> you should pass some serde
2) TBLPROPERTIES key value should be in double quotes
3) if you give CLUSTERED BY value should be there in the columns given
replace as follows
CREATE TABLE meta_statistics.tank_items ( shop_offers_history_before bigint, shop_offers_temp bigint, videos_distinct_temp bigint, deleted_temp bigint, t_stamp timestamp ) CLUSTERED BY (shop_offers_history_before) INTO 8 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS ORC TBLPROPERTIES ("transactional"="true");
hope this helps

Case Sensitive Partition Column in hive

I need to create partition on a column with an UPPERCASE Column Name. However, Hive converts all the column Names to lowercase implicitly. I am not able to get data in my select * query. I cannot change the Folder name to lowercase. Below is the Create query I am using:
CREATE EXTERNAL TABLE db.ALERT_DS_Hive_Test
PARTITIONED BY (PROC_ID String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('casesensitive'='PROC_ID')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/sam/hive-test/ALERT_DS.avro/'
TBLPROPERTIES ('avro.schema.url'='/user/sam/hive-test/ALERT_DS.avsc');

How to create table using external key word in hive

I want to process the data in hdfs , i am trying to create table using external keyword then i am getting following error, can you please provide solution for this.
hive> create EXTERNAL table samplecv(id INT, name STRING)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
LOCATION '/home/siva/jobportal/sample.csv';
I am getting following error, can u please provide solution for this
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RemoteException java.io.FileNotFoundException: Parent path is not a directory: /home/siva/jobportal/sample.csv
can you please confirm that this path is on HDFS?
More info on the creating external tables in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExternalTables
I use the following for XML parsing serde in hive ---
CREATE EXTERNAL TABLE XYZ(
X STRING,
Y STRING,
Z ARRAY<STRING>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.X"="/XX/#X",
"column.xpath.Y"="/YY/#Y"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/XXX'
TBLPROPERTIES (
"xmlinput.start"="<xml start",
"xmlinput.end"="</xml end>"
);
At the moment, Hive only allows you to set a directory as a partition location while adding a partition. What you are trying to do here is to set a file as partition location. The workaround that I use is to first add a partition with a dummy/non-existent directory (Hive doesn't demand the directory exist while it's being set as a partition location) and then use alter table partition set location to change the partition location to your desired file. Surprisingly, Hive doesn't force the location to be a directory while setting the location of an existing partition the way it does while adding a new partition. So in your case, it will look like -
alter table samplecv add partition (id='11', name='somename') location '/home/siva/jobportal/somedirectory'
alter table samplecv partition (id='11', name='somename') set location '/home/siva/jobportal/sample.csv'
Hive always expect directory name in Location path instead of file name.
create your file inside a directory for example inside /home/siva/jobportal/sample/sample.csv and then try running below command to create your hive table.
create EXTERNAL table samplecv(id INT, name STRING)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
LOCATION '/home/siva/jobportal/sample';
In case if you get any error just put your file in hdfs and try, it should work.