Created external table but it's empty - sql

I want to create an external table from a .csv file I uploaded to the server earlier.
In Beeline (the shell for Hive), I tried running this script:
CREATE EXTERNAL TABLE c_fink_category_mapping (
  trench_code string,
  fink_code string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
STORED AS TEXTFILE
LOCATION '/appl/trench/dev/data/in/main/daily_wf/fink_category_mapping'
TBLPROPERTIES ('serialization.null.format' = '');
which creates the table without any error, but the table itself is empty.
My text file is populated with data.
Any help would be appreciated.

First, check that the location path is correct (a verification sketch follows the DDL below).
Then try this configuration:
CREATE EXTERNAL TABLE c_fink_category_mapping (
  trench_code string,
  fink_code string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar' = '"',
  'separatorChar' = ','
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/appl/trench/dev/data/in/main/daily_wf/fink_category_mapping';
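To check the location before swapping the SerDe, here is a minimal sketch. DESCRIBE FORMATTED is standard HiveQL; whether the dfs command runs from Beeline depends on your HiveServer2 configuration:
-- Show the location Hive actually resolved for the table
DESCRIBE FORMATTED c_fink_category_mapping;
-- List the files Hive will scan at that location (works in the Hive CLI;
-- from Beeline this requires that HiveServer2 permits dfs commands)
dfs -ls /appl/trench/dev/data/in/main/daily_wf/fink_category_mapping;
If the DESCRIBE output's Location differs from where the .csv actually sits, that alone explains the empty table.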

The response provided above seems to be correct:
CREATE EXTERNAL TABLE c_fink_category_mapping (
  trench_code string,
  fink_code string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar' = '"',
  'separatorChar' = ','
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/appl/trench/dev/data/in/main/daily_wf/fink_category_mapping';
This will create the table using a comma as the delimiter, which should correctly parse the data in your CSV file and populate the table. You can also specify a different delimiter character, such as '\t', if that is more appropriate for your data.
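One detail worth noting: '\073' in the original DDL is the octal escape for a semicolon, so if the file is actually semicolon-delimited rather than comma-delimited, a sketch of the same SerDe clause with the matching separator would be:
-- Variant for a semicolon-delimited file ('\073' octal = ';')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar' = '"',
  'separatorChar' = ';'
)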

Related

How can I create an EXTERNAL table with HIVE format in Databricks

I have an external table with the below format in Hive:
CREATE EXTERNAL TABLE cs_mbr_prov(
  key struct<inid:string,......>,
  memkey string,
  ob_id string,
  .....
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ' :key,ci:MEMKEY, .....',
  'serialization.format' = '1')
I want to create the same type of table in Azure Databricks, where my input and output are in Parquet format.
As per the official doc, I created and reproduced the table with input and output in Parquet format.
Sample code:
CREATE EXTERNAL TABLE `vams`(
  `country` string,
  `count` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'dbfs:/FileStore/'
TBLPROPERTIES (
  'totalSize'='2335',
  'numRows'='240',
  'rawDataSize'='2095',
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'transient_lastDdlTime'='1418173653')
Reference:
https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat
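If the table only needs to read plain Parquet files, a shorter Hive-format sketch should also work in Databricks. The table name vams_parquet is hypothetical; STORED AS PARQUET expands to the Parquet SerDe and input/output formats:
-- Minimal Hive-format DDL for Parquet data at a DBFS location
CREATE EXTERNAL TABLE `vams_parquet`(
  `country` string,
  `count` int)
STORED AS PARQUET
LOCATION 'dbfs:/FileStore/';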

Zero results in Athena query of S3 object

I placed a comma-delimited text file in an S3 bucket. I am attempting to query the folder the file resides in, but it returns zero results.
Create table DDL:
CREATE EXTERNAL TABLE myDatabase.myTable (
  `field_1` string,
  `field_2` string,
  ...
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
)
LOCATION 's3://bucket/files from boss/'
TBLPROPERTIES ('has_encrypted_data'='false');
The issue was the whitespace in the location:
LOCATION 's3://bucket/files from boss/'
I removed the whitespace from the folder name in S3 and I was able to query without issue:
LOCATION 's3://bucket/files_from_boss/'

HIVE_CURSOR_ERROR: Unexpected end of input stream

I'm moving data from MySQL to S3 using Data Pipeline, and it creates empty files for a couple of days. I believe this is making my Athena query fail with
"HIVE_CURSOR_ERROR: Unexpected end of input stream".
Below is my script:
CREATE EXTERNAL TABLE `test`(
  `col0` bigint,
  `col1` bigint,
  `col2` string,
  `col3` string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://dummy/'
Could you please let me know if there is any option to skip zero-byte S3 files?

Parsing nested XML file returns null data in Hive

I am parsing a nested XML file using the hivexml SerDe, but it returns NULL when we select the data from the Hive table.
A sample XML file is linked as xml data.
The query I created for parsing the XML:
CREATE EXTERNAL TABLE IF NOT EXISTS abc (
  mail string, Type string, Id bigint, Date string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.OptOutEmail" = "/Re/mail/text()",
  "column.xpath.OptOutType" = "/Re/Type/text()",
  "column.xpath.SurveyId" = "/Re/Id/text()",
  "column.xpath.RequestedDate" = "/Re/Date/text()",
  "column.xpath.EmailListId" = "/Re/Lists/LId/text()",
  "column.xpath.Description" = "/Re/Lists/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/abc/xyz'
TBLPROPERTIES ("xmlinput.start" = "<Out>", "xmlinput.end" = "</Out>");
Can someone please help?
Try with the below query. I have loaded the data into the table from a local path.
CREATE EXTERNAL TABLE IF NOT EXISTS xmlList (
  mail string, Type string, Id bigint, Dated string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.mail" = "/Re/mail/text()",
  "column.xpath.Type" = "/Re/Type/text()",
  "column.xpath.Id" = "/Re/Id/text()",
  "column.xpath.Dated" = "/Re/Dated/text()",
  "column.xpath.LId" = "/Re/Lists/List/LId/text()",
  "column.xpath.value" = "/Re/Lists/List/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start" = "<Re>", "xmlinput.end" = "</Re>");

Case Sensitive Partition Column in Hive

I need to create a partition on a column with an UPPERCASE column name. However, Hive implicitly converts all column names to lowercase, and I am not able to get data with my SELECT * query. I cannot change the folder name to lowercase. Below is the CREATE query I am using:
CREATE EXTERNAL TABLE db.ALERT_DS_Hive_Test
PARTITIONED BY (PROC_ID string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('casesensitive' = 'PROC_ID')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/sam/hive-test/ALERT_DS.avro/'
TBLPROPERTIES ('avro.schema.url'='/user/sam/hive-test/ALERT_DS.avsc');
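One commonly used workaround (not from the original post) is to register each uppercase folder explicitly, since a partition's location does not have to follow Hive's lowercase key=value naming. A sketch, assuming a partition value of '123' and the folder layout implied by the LOCATION above:
-- Map an uppercase folder to a partition by hand; the value '123' is hypothetical
ALTER TABLE db.ALERT_DS_Hive_Test ADD PARTITION (PROC_ID = '123')
LOCATION '/user/sam/hive-test/ALERT_DS.avro/PROC_ID=123';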