Hive ALTER TABLE ... CHANGE COLUMN fails when the table has a struct column

I've created a Hive external table:
CREATE EXTERNAL TABLE test_db.test_table (
`testfield` string,
`teststruct` struct<teststructfield:string>
)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://some/path';
hive> describe test_table;
+-------------+---------------------------------+--------------------+
|  col_name   |            data_type            |      comment       |
+-------------+---------------------------------+--------------------+
| testfield   | string                          | from deserializer  |
| teststruct  | struct<teststructfield:string>  | from deserializer  |
+-------------+---------------------------------+--------------------+
Now I want to rename a table column, but when the table has a struct column (teststruct), the statement fails with an error about the < (less-than) sign:
ALTER TABLE test_db.test_table CHANGE COLUMN testfield testfield2 string;
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: type expected at the position 7 of 'string:<derived from deserializer>' but '<' is found.
The same statement succeeds when the table has no struct column containing <. What should I do about this problem?

If nothing else helps, as a workaround you can drop and re-create the table, then recover the partitions. The table is EXTERNAL, so dropping it will not affect the data.
(1) Drop table
DROP TABLE test_db.test_table;
(2) Create the table with the required column name
CREATE EXTERNAL TABLE test_db.test_table (
testfield2 string,
teststruct struct<teststructfield:string>
)
PARTITIONED BY (....)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION
'hdfs://some/path';
(3) Recover partitions
MSCK REPAIR TABLE test_db.test_table;
or if you are running Hive on EMR:
ALTER TABLE test_db.test_table RECOVER PARTITIONS;
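As a quick sanity check after step (3), DESCRIBE should now show testfield2, and SHOW PARTITIONS should list the recovered partitions:
hive> DESCRIBE test_db.test_table;
hive> SHOW PARTITIONS test_db.test_table;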

Related

SQL Error [XX000]: ERROR: Already present: The column already exists - Yugabyte DB

SQL ALTER script:
alter table table_name ADD COLUMN IF NOT EXISTS chk_col jsonb not null
DEFAULT '{}'::jsonb;
In this script, I am getting "ERROR: Already present: The column already exists", but the column chk_col is not present in the table.
If I drop the database and then create it again, the same script executes successfully.
How do I correct this without removing the database?
Can you explain your setup in more detail?
What version are you using?
Are any other steps needed to reproduce this?
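For reference, the server version can be checked directly from ysqlsh:
yugabyte=# select version();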
I can't reproduce in 2.7.1.1:
yugabyte=# create database fshije
yugabyte=# \c fshije
fshije=# create table table_name(id bigint);
fshije=# alter table table_name ADD COLUMN IF NOT EXISTS chk_col jsonb not null DEFAULT '{}'::jsonb;
fshije=# \d table_name
Table "public.table_name"
 Column  |  Type  | Collation | Nullable |   Default
---------+--------+-----------+----------+-------------
 id      | bigint |           |          |
 chk_col | jsonb  |           | not null | '{}'::jsonb

columns has 2 elements while hbase.columns.mapping has 3 elements error while creating Hive table from HBase

I'm getting the following error when I run the command below to create a Hive table.
sample is the Hive table I'm trying to create; hloan is my existing HBase table. Please help.
create external table sample(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:id,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="sample");
ERROR:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 2 elements while hbase.columns.mapping has 3 elements (counting the key if implicit))
As the error describes, your CREATE EXTERNAL TABLE statement has 2 columns (id, name),
while the HBase mapping has 3 columns (:key, hl:id, hl:name), counting the implicit key.
Create the table with 3 columns:
hive> create external table sample(key int, id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:id,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="hloan");
(or)
If the key and id columns hold the same data, you can skip hl:id in the mapping.
Create the table with 2 columns:
hive> create external table sample(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="hloan");
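With either variant, a quick scan should confirm that the mapping resolves (assuming the existing hloan table already holds rows in the hl column family):
hive> select * from sample limit 10;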

Can we load a text file separated by :: into a Hive table?

Is there a way to load a simple text file whose fields are separated by "::" into a Hive table, other than replacing the "::" with "," and then loading it?
Replacing the "::" with "," is quick when the text file is small, but what if it contains millions of records?
Try creating the Hive table using RegexSerDe.
Example:
I had a file with the text below in it:
i::90
w::99
Create Hive table:
hive> create external table default.i
(Id STRING,
Name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*)')
STORED AS TEXTFILE;
Select from Hive table:
hive> select * from i;
+-------+---------+--+
| i.id  | i.name  |
+-------+---------+--+
| i     | 90      |
| w     | 99      |
+-------+---------+--+
If you want to skip the header, use the syntax below:
hive> create external table default.i
(Id STRING,
Name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*)')
STORED AS TEXTFILE
tblproperties ('skip.header.line.count'='1');
UPDATE:
Check whether there are any older files in your table location; if some are there, delete them (if you don't want them). A way to inspect the location is sketched right below.
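A minimal check from the Hive shell, assuming the table sits under the default warehouse path (substitute your table's actual LOCATION and file names; the placeholders are illustrative):
hive> dfs -ls /user/hive/warehouse/<db_name>.db/<table_name>/;
hive> dfs -rm /user/hive/warehouse/<db_name>.db/<table_name>/<old_file>;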
1. Create the Hive table as:
create external table <db_name>.<table_name>
(col1 STRING,
col2 STRING,
col3 string,
col4 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*?)::(.*?)::(.*)')
STORED AS TEXTFILE;
2. Then run:
load data local inpath 'Source path' overwrite into table <db_name>.<table_name>;

unable to perform update on a transactional table

I am unable to perform an UPDATE on a table.
I have created this transactional table:
CREATE TABLE d_mat.mat_data(
d_id int,
dname string,
dloc string)
CLUSTERED BY (
dloc)
INTO 2 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES ('transactional'='true');
I am using Hive CLI.
SET hive.support.concurrency=true;
Error: Error while processing statement: Cannot modify
hive.support.concurrency at runtime. It is not in list of params that
are allowed to be modified at runtime (state=42000,code=1)
UPDATE d_mat.mat_data SET dloc='Australia' where d_id=1;
Please help me.
Thanks in advance.
You are trying to update a column that the table is bucketed by (dloc).
Hive does not yet support updating bucketed columns.
Change your bucketing column to d_id:
Ex:
CREATE TABLE mat_data(
d_id int,
dname string,
dloc string)
CLUSTERED BY (
d_id)
INTO 2 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES ('transactional'='true');
Insert a value into the table:
hive> insert into mat_data values(1,"hi","das");
hive> select * from mat_data;
+----------------+-----------------+----------------+--+
| mat_data.d_id  | mat_data.dname  | mat_data.dloc  |
+----------------+-----------------+----------------+--+
| 1              | hi              | das            |
+----------------+-----------------+----------------+--+
Update the table:
hive> UPDATE mat_data SET dloc='Australia' where d_id=1;
hive> select * from mat_data;
+----------------+-----------------+----------------+--+
| mat_data.d_id  | mat_data.dname  | mat_data.dloc  |
+----------------+-----------------+----------------+--+
| 1              | hi              | Australia      |
+----------------+-----------------+----------------+--+
Error: Error while processing statement: Cannot modify
hive.support.concurrency at runtime. It is not in list of params that
are allowed to be modified at runtime (state=42000,code=1)
This error is configuration-related: you are trying to SET hive.support.concurrency=true; at runtime, but that property is not in the whitelist of parameters that may be modified at runtime.
To fix this, add it to hive.security.authorization.sqlstd.confwhitelist (for example via Ambari), as sketched below.
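A minimal sketch of the relevant entry, assuming your Hive version supports the .append variant of the whitelist (both property names are standard Hive settings; the escaped-regex value is illustrative). It belongs in hive-site.xml or an Ambari custom hive-site, not in a runtime SET:
hive.security.authorization.sqlstd.confwhitelist.append=hive\.support\.concurrency
Alternatively, set hive.support.concurrency=true directly in hive-site.xml so that no runtime SET is needed.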

Impala cannot drop external table

I created an external table with a wrong (non-existent) path:
create external table IF NOT EXISTS ds_user_id_csv
(
type string,
imei string,
imsi string,
idfa string,
msisdn string,
mac string
)
PARTITIONED BY(prov string,day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored as textfile
LOCATION 'hdfs://cdh0:8020/user/hive/warehouse/test.db/ds_user_id';
And I cannot drop the table:
[cdh1:21000] > drop table ds_user_id_csv
> ;
Query: drop table ds_user_id_csv
ERROR:
ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore:
CAUSED BY: MetaException: java.lang.IllegalArgumentException: Wrong FS: hdfs://cdh0:8020/user/hive/warehouse/test.db/ds_user_id, expected: hdfs://nameservice1
So how can I solve this? Thank you.
Use the following command to change the location:
ALTER TABLE ds_user_id_csv SET LOCATION '{new location}';
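For this particular error, a plausible full sequence (an assumption based on the error message, which expects the hdfs://nameservice1 HA nameservice, with the original path reused) would be:
ALTER TABLE ds_user_id_csv SET LOCATION 'hdfs://nameservice1/user/hive/warehouse/test.db/ds_user_id';
DROP TABLE ds_user_id_csv;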