How can I create external tables with Hive?

This is the script I run on Hive:
CREATE EXTERNAL TABLE 'people'(
'name' string,
'surname' string,
'age' string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
'gs://directory/subdirectory/'
TBLPROPERTIES (
'avro.schema.url'='gs://directory/schema.avsc',
'transient_lastDdlTime'='1644235388');
I get this error:
Error while compiling statement:
FAILED: ParseException line 1:22 cannot recognize input near ''people'' '(' ''nam'' in table name

Enclose the identifiers in backticks (`) instead of single quotes:
CREATE EXTERNAL TABLE `people`(
`name` string,
`surname` string,
`age` string)
...
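For completeness, the full corrected statement, with every other clause unchanged from the question (transient_lastDdlTime is omitted since Hive maintains that property automatically):
CREATE EXTERNAL TABLE `people`(
`name` string,
`surname` string,
`age` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
'gs://directory/subdirectory/'
TBLPROPERTIES (
'avro.schema.url'='gs://directory/schema.avsc');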

Related

Athena insert and Hive format error for HiveIgnoreKeyTextOutputFormat

Before the question/issue, here's the setup:
Table 1
CREATE EXTERNAL TABLE `table1`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string,
`load_dt` string)
PARTITIONED BY (
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654609315')
Table 2
CREATE EXTERNAL TABLE `table2`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string)
PARTITIONED BY (
`load_dt` string,
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654147830')
When the following Athena SQL is executed, the error below is thrown:
insert into table2
select * from table1;
"HIVE_UNSUPPORTED_FORMAT: Output format
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat with SerDe
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is not
supported."
That error seems relatively straightforward, but I'm still stuck on
building a solution, despite looking for alternatives to the so-called
HiveIgnoreKeyTextOutputFormat. There's also the partition difference
between the two tables, but I'm not sure whether that has any bearing
on the error shown here.
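A likely way forward, sketched under the assumption that the underlying data really is Parquet: recreate the target table so its output format matches the Parquet SerDe. In Athena DDL, STORED AS PARQUET expands to the matching Parquet input and output formats (the location and partitioning below are copied from the question):
CREATE EXTERNAL TABLE `table2`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string)
PARTITIONED BY (
`load_dt` string,
`site_id` string)
STORED AS PARQUET
LOCATION
's3://foobucket/object-thing';
With the SerDe and output format agreeing, the insert should compile; the partition difference only affects how the written files are laid out, not the format error itself.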

How can I create an EXTERNAL table with HIVE format in Databricks

I have an external table with the below format in Hive.
CREATE EXTERNAL TABLE cs_mbr_prov(
key struct<inid:string,......>,
memkey string,
ob_id string,
.....
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=' :key,ci:MEMKEY, .....',
'serialization.format'='1')
I want to create the same type of table in Azure Databricks, where my input and output are in Parquet format.
As per the official doc, I created and reproduced the table with input and output in Parquet format.
Sample code:
CREATE EXTERNAL TABLE `vams`(
`country` string,
`count` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'dbfs:/FileStore/'
TBLPROPERTIES (
'totalSize'='2335',
'numRows'='240',
'rawDataSize'='2095',
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'transient_lastDdlTime'='1418173653')
Reference:
https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat
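Note that the sample above pairs ParquetHiveSerDe with text-based input and output formats. If the goal is simply Parquet in and Parquet out, a shorter Hive-format DDL is worth trying as well; a minimal sketch, with the table name and location reused from the sample:
CREATE EXTERNAL TABLE `vams`(
`country` string,
`count` int)
STORED AS PARQUET
LOCATION 'dbfs:/FileStore/';
STORED AS PARQUET resolves to the Parquet SerDe together with the matching Parquet input and output formats, which avoids mixing the Parquet SerDe with text formats.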

Table or database name may not contain dot(.) character

When I use Hive to create a table, I am told that the name may not contain the dot character (state=42000,code=40000). How can I solve this problem?
CREATE EXTERNAL TABLE `ods.a2`(
`key` string COMMENT 'k',
`value` string COMMENT 'v')
COMMENT 'comment'
ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',,',
'serialization.format'=',,')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs:///user/hive/warehouse/ods.db/a2';
err:
Error: Error while compiling statement: FAILED: SemanticException Line 1:22 Table or database name may not contain dot(.) character 'ods.a2' (state=42000,code=40000)
Please use CREATE EXTERNAL TABLE `ods`.`a2` instead.
If any component of a multiple-part name requires quoting, quote it individually rather than quoting the name as a whole. For example, write `my-table`.`my-column`, not `my-table.my-column`.
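Applied to the statement above, the corrected DDL would look like this (everything except the table name is unchanged from the question):
CREATE EXTERNAL TABLE `ods`.`a2`(
`key` string COMMENT 'k',
`value` string COMMENT 'v')
COMMENT 'comment'
ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',,',
'serialization.format'=',,')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs:///user/hive/warehouse/ods.db/a2';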

Parsing nested xml file return null data in hive

I am parsing a nested XML file using the hivexml SerDe, but it returns null when I select the data from the Hive table.
A sample XML file is linked as "xml data".
The query I created for parsing the XML:
CREATE EXTERNAL TABLE IF NOT EXISTS abc ( mail string, Type string, Id bigint, Date string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.OptOutEmail"="/Re/mail/text()",
"column.xpath.OptOutType"="/Re/Type/text()",
"column.xpath.SurveyId"="/Re/Id/text()",
"column.xpath.RequestedDate"="/Re/Date/text()",
"column.xpath.EmailListId"="/Re/Lists/LId/text()",
"column.xpath.Description"="/Re/Lists/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/abc/xyz'
TBLPROPERTIES ("xmlinput.start"="<Out>","xmlinput.end"= "</Out>");
Can someone please help?
Try the query below. I have loaded the data into the table from a local path.
CREATE EXTERNAL TABLE IF NOT EXISTS xmlList ( mail string, Type string, Id bigint, Dated string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.mail"="/Re/mail/text()",
"column.xpath.Type"="/Re/Type/text()",
"column.xpath.Id"="/Re/Id/text()",
"column.xpath.Dated"="/Re/Dated/text()",
"column.xpath.LId"="/Re/Lists/List/LId/text()",
"column.xpath.value"="/Re/Lists/List/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start"="<Re>","xmlinput.end"= "</Re>");

Issue in Spark not able to read S3 subfolders of a hive table

I have 3 non-partitioned tables in Hive.
drop table default.test1;
CREATE EXTERNAL TABLE `default.test1`(
`c1` string,
`c2` string,
`c3` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://bucket_name/dev/sri/sri_test1/';
drop table default.test2;
CREATE EXTERNAL TABLE `default.test2`(
`c1` string,
`c2` string,
`c3` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://bucket_name/dev/sri/sri_test2/';
drop table default.test3;
CREATE EXTERNAL TABLE `default.test3`(
`c1` string,
`c2` string,
`c3` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://bucket_name/dev/sri/sri_test3/';
--INSERT:
insert into default.test1 values("a","b","c");
insert into default.test2 values("d","e","f");
insert overwrite table default.test3 select * from default.test1 union all select * from default.test2;
If I look in S3, two subfolders have been created because of the UNION ALL operation.
aws s3 ls s3://bucket_name/dev/sri/sri_test3/
PRE 1/
PRE 2/
Now the issue is that if I try to read the default.test3 table in PySpark and create a DataFrame, the count comes back as zero:
df = spark.sql("select * from default.test3")
df.count()
0
How to fix this issue?
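No answer was captured for this one, but a commonly suggested workaround, offered here as a sketch rather than a confirmed fix, is to make the Hive read path recurse into the 1/ and 2/ subfolders that the UNION ALL created. The settings below are standard Hive/Hadoop options and can be issued from the same session, for example via spark.sql or the spark-sql shell:
-- Ask the Hive table reader to descend into subdirectories
-- (whether these reach the Hadoop configuration can depend on the Spark version).
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;
SET mapreduce.input.fileinputformat.input.dir.recursive=true;
-- Re-check the row count after the settings are applied.
SELECT COUNT(*) FROM default.test3;
An alternative that sidesteps the problem entirely is to rewrite default.test3 without subdirectories, for example by inserting from each source table in a separate statement instead of a single UNION ALL.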