How to partition by a transformed column in Hive? - hive

I want to partition my table by year-month, but the table has no such column.
Is it possible to partition by a derived field? For example, I tried the following:
INSERT OVERWRITE table zhihu_answer partition (ym)
SELECT
answer_id,
answer_updated,
author_headline,
author_id,
author_name,
question_created,
question_id,
question_title,
question_type,
voteup_count,
date_format(insert_time,'yyyyMM') as ym
FROM zhihu_answer;
But it failed with:
Error while compiling statement: FAILED: ValidationFailureSemanticException table is not partitioned but partition spec exists: {ym=null}
DDL:
CREATE TABLE `zhihu_answer`(
`answer_id` string,
`answer_updated` string,
`author_headline` string,
`author_id` string,
`author_name` string,
`insert_time` string,
`question_created` string,
`question_id` string,
`question_title` string,
`question_type` string,
`voteup_count` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://device1:8020/user/hive/warehouse/zhihu.db/zhihu_answer'
TBLPROPERTIES (
'transient_lastDdlTime'='1569629962')
Thanks for your help.
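The error says that zhihu_answer was created without a PARTITIONED BY clause, so no INSERT can name a partition on it. A minimal sketch of one possible workaround follows: create a separate partitioned table and fill it with a dynamic-partition insert. The table name zhihu_answer_by_month is hypothetical, and the sketch assumes insert_time holds strings that date_format can parse (rows where it cannot would fall into Hive's default partition).

```sql
-- Hypothetical target table: the columns of zhihu_answer plus a partition column ym.
CREATE TABLE zhihu_answer_by_month (
  answer_id string,
  answer_updated string,
  author_headline string,
  author_id string,
  author_name string,
  insert_time string,
  question_created string,
  question_id string,
  question_title string,
  question_type string,
  voteup_count int)
PARTITIONED BY (ym string)
STORED AS PARQUET;

-- Let Hive derive the partition values from the query output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE zhihu_answer_by_month PARTITION (ym)
SELECT
  answer_id,
  answer_updated,
  author_headline,
  author_id,
  author_name,
  insert_time,
  question_created,
  question_id,
  question_title,
  question_type,
  voteup_count,
  date_format(insert_time, 'yyyyMM') AS ym
FROM zhihu_answer;
```

The partition column is declared only in PARTITIONED BY, not in the regular column list, which is why the source table does not need a ym column of its own.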

Related

athena insert and hive format error for HiveIgnoreKeyTextOutputFormat

Before the question/issue, here's the setup:
Table 1
CREATE EXTERNAL TABLE `table1`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string,
`load_dt` string)
PARTITIONED BY (
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654609315')
Table 2
CREATE EXTERNAL TABLE `table2`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string)
PARTITIONED BY (
`load_dt` string,
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654147830')
When the following Athena SQL is executed, the error below is thrown:
insert into table2
select * from table1;
"HIVE_UNSUPPORTED_FORMAT: Output format
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat with SerDe
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is not
supported."
That error seems relatively straightforward, but I'm still stuck on
finding a solution, despite looking for alternatives to
HiveIgnoreKeyTextOutputFormat. The two tables are also partitioned
differently, but I'm not sure whether that has any bearing on the error
shown here.
Here's some sources I've found and used so far: 1, 2
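Both DDL statements pair the Parquet SerDe with the text input and output formats, and that combination is what the error message rejects. As a rough sketch, assuming the data is meant to be stored as Parquet, the target table could be declared with STORED AS PARQUET so that the SerDe and the formats agree. The table name table2_parquet and its LOCATION are hypothetical, and table1 may need the same correction before it can be read.

```sql
-- Hypothetical replacement for table2: SerDe, input format and output format all Parquet.
CREATE EXTERNAL TABLE `table2_parquet`(
  `mac_address` string,
  `node` string,
  `wave_found` string,
  `wave_data` string,
  `calc_dt` string)
PARTITIONED BY (
  `load_dt` string,
  `site_id` string)
STORED AS PARQUET
LOCATION 's3://foobucket/object-thing-parquet/';  -- hypothetical path, separate from table1's

-- Partition columns must be the last columns of the SELECT, in PARTITIONED BY order.
-- SELECT * from table1 already ends with load_dt, site_id.
INSERT INTO table2_parquet
SELECT * FROM table1;
```

The partition difference by itself should not cause this error, as long as the selected columns line up with the target's regular columns followed by its partition columns.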

hive cannot display special characters

I created a Hive table on top of an HBase table like this:
CREATE EXTERNAL TABLE RGPD.TEST_TAB(
`HBASE_ID` STRING,
`INTEGRATION_ID` STRING,
`LAST_NAME` STRING,
`MAIDEN_NAME` STRING,
`FST_NAME` STRING,
`PER_TITLE` STRING,
`BIRTH_DT` STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,all:integration_id,all:last_name,all:maiden_name,all:fst_name,all:per_title,all:birth_dt','serialization.encoding'='UTF-8')
TBLPROPERTIES ('hbase.table.name'='key:TEST_TAB');
When I run a SELECT query in Hive, I get a result like this:
10131472 FRAN�OIS
while the query in hbase returns:
01061948 FRAN\xC7OIS
I know this is a charset (UTF-8) issue, but I have not found a solution.
Any help would be appreciated.

Hive and Hbase table for hypothesis

I have an IBM Cloud instance with Hive and HBase. I just created a table in Hive and loaded some data into it from a CSV file.
The CSV file contains information about Google Play Store apps.
These are the commands I used to create the table and load the data:
hive> create table if not exists app_desc (name string,
category string, rating int,
reviews int, installs string,
type string, price int,
content string, genres string,
last_update string, current_ver string,
android_ver string)
row format delimited fields terminated by ',';
hive> load data local inpath '/home/uamibm130/googleplaystore.csv' into table app_desc;
This works, and a SELECT returns the data correctly.
Now I want to create an HBase table, but I don't know how to do it correctly.
First I created the HBase table -> create google_db_ , google_data, info_data
Then I tried to create an external table with the Hive command below, but I get an error saying that my table is not found.
This is the command I am using to create the external Hive table:
create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
google_data:category,google_data:rating, info_data:reviews,
info_data:installs, info_data:type, info_data:price, info_data:content,
info_data:genres, info_data:last_update, info_data:current_ver,
info_data:android_ver") TBLPROPERTIES("hbase.table.name" = "google_db_");
I don't know the correct way to create an HBase table based on a Hive schema so that I can load my .csv data into it.
Any ideas? I am new to this.
Thanks!
Try the create statement below in HBase.
Create the HBase table:
hbase(main):001:0>create 'google_db_','google_data','info_data'
Create the Hive external table on HBase (keep hbase.columns.mapping on one line with no whitespace between entries, since Hive treats whitespace as part of the column name):
hive> create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,google_data:category,google_data:rating,info_data:reviews,info_data:installs,info_data:type,info_data:price,info_data:content,info_data:genres,info_data:last_update,info_data:current_ver,info_data:android_ver") TBLPROPERTIES("hbase.table.name" = "google_db_",
"hbase.mapred.output.outputtable" = "google_db_");
Then insert data into the Hive-HBase table (uamibm130_hbase_google) from the Hive table (app_desc).
Insert data into the Hive-HBase table:
hive> insert into table uamibm130_hbase_google select * from app_desc;

Parsing nested xml file return null data in hive

I am parsing a nested XML file using the Hive XML SerDe, but selecting from the Hive table returns NULL for every column.
Sample XML file: xml data.
This is the DDL I created to parse the XML:
CREATE EXTERNAL TABLE IF NOT EXISTS abc ( mail string, Type string, Id bigint, Date string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.OptOutEmail"="/Re/mail/text()",
"column.xpath.OptOutType"="/Re/Type/text()",
"column.xpath.SurveyId"="/Re/Id/text()",
"column.xpath.RequestedDate"="/Re/Date/text()",
"column.xpath.EmailListId"="/Re/Lists/LId/text()",
"column.xpath.Description"="/Re/Lists/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/abc/xyz'
TBLPROPERTIES ("xmlinput.start"="<Out>","xmlinput.end"= "</Out>");
Please can someone help.
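Try the query below. Each column.xpath.<name> key has to match a column name in the table, and xmlinput.start/xmlinput.end have to match the element that wraps one record (here <Re>). I loaded the data into the table from a local path.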
CREATE EXTERNAL TABLE IF NOT EXISTS xmlList ( mail string, Type string, Id
bigint, Dated string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.mail"="/Re/mail/text()",
"column.xpath.Type"="/Re/Type/text()",
"column.xpath.Id"="/Re/Id/text()",
"column.xpath.Dated"="/Re/Dated/text()",
"column.xpath.LId"="/Re/Lists/List/LId/text()",
"column.xpath.value"="/Re/Lists/List/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start"="<Re>","xmlinput.end"= "</Re>");

Insert data into hive table without delimiters

I want 10 words in one column and the next 10 words in another column. How can I insert data into a Hive table when the file has no specified delimiters, using UDFs or something similar?
CREATE TABLE employees_stg (emplid STRING, name STRING, age STRING, salary STRING, dept STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(.{4})(.{35})(.{3})(.{11})(.{4})", --Length of each column specified between braces "({})"
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s" --Output in string format
)
LOCATION '/path/to/input/employees_stg';
LOAD DATA INPATH '/path/to/sample_file.txt' INTO TABLE employees_stg;
SELECT * FROM employees_stg;
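The pattern above splits each line by fixed character counts. For the word-based split described in the question, a similar regex could count words instead. This is only a sketch under stated assumptions: the table name words_stg and its path are hypothetical, and every line is assumed to hold exactly 20 words separated by single spaces (lines that do not match the regex come back as NULL).

```sql
-- Hypothetical table: first ten words in one column, next ten in the other.
CREATE EXTERNAL TABLE words_stg (first_ten STRING, second_ten STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- Each capturing group is nine "word + space" repeats followed by a tenth word.
  -- The inner (?: ) groups are non-capturing, so only two groups map to columns.
  "input.regex" = "((?:[^ ]+ ){9}[^ ]+) ((?:[^ ]+ ){9}[^ ]+)"
)
STORED AS TEXTFILE
LOCATION '/path/to/input/words_stg';

SELECT first_ten, second_ten FROM words_stg;
```

Using [^ ] rather than \S avoids backslash escaping inside the Hive string literal.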