How can I disable transactions for a Hive table?

I have a Hive table that was originally created as transactional, but I want to disable transactions on the table because they are not actually needed.
I tried to disable them using ALTER TABLE, but I got an error:
hive> ALTER TABLE foo SET TBLPROPERTIES('transactional'='false');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. TBLPROPERTIES with 'transactional'='true' cannot be unset
I am using Hive 2.3.2.

According to the documentation, changing TBLPROPERTIES ("transactional"="false") is not allowed once a table has been created as transactional.
You can re-create the table.
Back up the table first:
create table bkp_table as
select * from your_table;
Then drop the table, create it again without the transactional property, and reload the data from the backup.
Alternatively, create a new table, load the data from the old one, drop the old table, and rename the new one.
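A minimal sketch of the whole re-create workflow, assuming a simple one-column table (the table names, column, and STORED AS clause are placeholders to adapt):
-- 1. back up the data into a plain (non-transactional) table
create table bkp_table as
select * from your_table;
-- 2. drop the transactional table
drop table your_table;
-- 3. re-create it without the transactional property
create table your_table (col string)
stored as orc
tblproperties ('transactional'='false');
-- 4. reload from the backup, then clean up
insert into your_table select * from bkp_table;
drop table bkp_table;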

You have to re-create the table.
First, back up the table if you want to keep the data. Then, DROP TABLE.
Create the table with TBLPROPERTIES ('transactional'='false'):
CREATE TABLE your_table(
  `col` string,
  `col2` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  'transactional'='false'
);
You can choose the input and output formats to match your data.
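To confirm the property on the re-created table, you can inspect it directly (SHOW TBLPROPERTIES is standard Hive syntax; your_table is the placeholder from above):
-- returns 'false' (or no 'transactional' entry) once the table is non-transactional
SHOW TBLPROPERTIES your_table('transactional');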

Related

How to truncate a partitioned external table in hive?

I'm planning to truncate a Hive external table which has one partition. So, I have used the following command to truncate the table:
hive> truncate table abc;
But it is throwing an error stating: Cannot truncate non-managed table abc.
Can anyone please suggest how to deal with this ...
Make your table MANAGED first:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');
Then truncate:
truncate table abc;
And finally you can make it external again:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
By default, TRUNCATE TABLE is supported only on managed tables. Attempting to truncate an external table results in the following error:
Error: org.apache.spark.sql.AnalysisException: Operation not allowed: TRUNCATE TABLE on external tables
Action Required
Change applications. Do not attempt to run TRUNCATE TABLE on an external table.
Alternatively, change applications to alter a table property to set external.table.purge to true to allow truncation of an external table:
ALTER TABLE mytable SET TBLPROPERTIES ('external.table.purge'='true');
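With the purge property set, the truncate should then go through (a short sketch, reusing the mytable placeholder from the answer above):
-- allow TRUNCATE to remove the data files at the external location
ALTER TABLE mytable SET TBLPROPERTIES ('external.table.purge'='true');
TRUNCATE TABLE mytable;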
There is an even better solution to this, which is basically a one-liner:
insert overwrite table table_xyz select * from table_xyz where 1=2;
Because WHERE 1=2 matches no rows, this overwrites the table with an empty result set, deleting all files at the external location and leaving zero records.
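One caveat for the partitioned table in the question: a dynamic-partition overwrite with WHERE 1=2 selects no rows, so no partitions are rewritten and nothing is deleted. Each partition has to be overwritten explicitly instead; a sketch, with the column and partition names being hypothetical:
-- clear one partition by overwriting it with an empty result set
insert overwrite table table_xyz partition (part_col='x')
select col1, col2 from table_xyz where 1=2;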
See https://issues.apache.org/jira/browse/HIVE-4367: use
truncate table my_ext_table force;

Hive Table is MANAGED or EXTERNAL - issue post table type conversion

I have a Hive table named ABC in the XYZ db.
When I run describe formatted XYZ.ABC; from Hue, the output includes:
Table Type: MANAGED_TABLE
Table Parameters: EXTERNAL True
So is this actually an external or a managed/internal hive table?
This is treated as an EXTERNAL table. Dropping the table will keep the underlying HDFS data. The table type is shown as MANAGED_TABLE because the parameter EXTERNAL is set to True instead of TRUE.
To fix this metadata, you can run this query:
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Some details:
The table XYZ.ABC must have been created via this kind of query:
hive> CREATE TABLE XYZ.ABC
<additional table definition details>
TBLPROPERTIES (
'EXTERNAL'='True');
Describing this table will give:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: MANAGED_TABLE
:
Table Parameters:
EXTERNAL True
Dropping this table will keep the data at the Location shown in the describe output.
hive> drop table XYZ.ABC;
# does not drop table data in HDFS
The Table Type still shows as MANAGED_TABLE, which is confusing.
Setting the value of EXTERNAL to TRUE (all uppercase) will fix this.
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Now, doing a describe will show it as expected:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: EXTERNAL_TABLE
:
Table Parameters:
EXTERNAL TRUE
Example:
Let's create a sample MANAGED table:
CREATE TABLE TEST_TBL(abc int, xyz string);
INSERT INTO TABLE test_tbl values(1, 'abc'),(2, 'xyz');
DESCRIBE FORMATTED test_tbl;
Changing the type to EXTERNAL (in the wrong way, using True instead of TRUE):
ALTER TABLE test_tbl SET TBLPROPERTIES('EXTERNAL'='True');
This gives a describe output still showing Table Type: MANAGED_TABLE, with EXTERNAL=True in the table parameters.
Now let's DROP the table:
DROP TABLE test_tbl;
The result: the table is dropped but the data on HDFS isn't, showing correct external table behavior!
If we re-create the table, we can see the data still exists:
CREATE TABLE test_tbl(abc int, xyz string);
SELECT * FROM test_tbl;
Result: the two previously inserted rows, (1, 'abc') and (2, 'xyz'), are returned.
The describe still wrongly shows MANAGED_TABLE along with EXTERNAL True because of a case-sensitive .equals check in the metastore.
Hive issue JIRA: HIVE-20057
Proposed fix: use a case-insensitive equals.
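To see the stored value verbatim (and why a case-sensitive comparison against 'TRUE' misses it), you can query the property directly; SHOW TBLPROPERTIES is standard Hive syntax:
-- returns: True (stored as written, so a case-sensitive .equals with 'TRUE' fails)
SHOW TBLPROPERTIES test_tbl('EXTERNAL');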

Hive-Hbase integration - issues while inserting data

I was able to successfully integrate Hive and HBase for straightforward scenarios (no partitioning or bucketing), and I was able to insert data into both Hive and HBase in those simple cases.
I am having issues with a Hive partitioned table stored in HBase. I was able to execute the CREATE DDL statement, but when I try to perform an insert I get an error saying "Must specify table name":
CREATE TABLE hivehbase_customer(
  id int,
  holdid int,
  fname string,
  lname string,
  address string,
  zipcode string)
PARTITIONED BY (city string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:hold_id,personal_data:f_name,personal_data:l_name,personal_address:address,personal_address:zipcode")
TBLPROPERTIES ("hbase.table.name" = "hivehbase_custom", "hbase.mapred.output.outputtable" = "hivehbase_custom");
insert into table hivehbase_customer partition(city= 'tegacay') values (7394,NULL,NULL,NULL,NULL,29708);
Try the following insert query:
insert into table hivehbase_customer partition(city) values (7394,NULL,NULL,NULL,NULL,29708,'tegacay');
The partition column needs to be specified as the last column in the insert query, since the insert uses dynamic partitioning.
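If the insert still complains about dynamic partitioning, these standard Hive session settings may also be needed (whether they are required depends on your cluster defaults):
-- allow dynamic partitioning without any static partition column
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;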

HIVE ALTER SERDE COMMAND

Is there a command in Hive that would alter the SerDe of an existing table? The tables were created using com.bizo.hive.serde.csv.CSVSerde, which needs to be changed to org.apache.hadoop.hive.serde2.OpenCSVSerde. I am looking for something like:
alter table table_X change serde
Thanks,
This will help:
ALTER TABLE TABLE_NAME SET SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';
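If you also need to adjust SerDe properties in the same statement, ALTER TABLE ... SET SERDE accepts a WITH SERDEPROPERTIES clause; the values shown here are OpenCSVSerde's documented defaults:
ALTER TABLE TABLE_NAME SET SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar'=',', 'quoteChar'='"', 'escapeChar'='\\');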

create external table in hive as a select query pointing to s3 buckets

I am attempting to create a table in a Hive environment and point it to an external location in S3.
When I try:
create table x (key int, value string) location 's3/...'
it works well.
But when I attempt:
create external table as select x,y,z from alphabet location 's3/...'
it doesn't run. Is there a way to create a table as a select statement and store it at an external location?
You can create a managed table using the select statement and then update the table property to external:
ALTER TABLE <table name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
or
Write the output of the select query to a directory and create an external table over it:
INSERT OVERWRITE DIRECTORY '/myDirectory'
SELECT * FROM PARAGRAPH;
CREATE EXTERNAL TABLE <table name> (<column definitions>) LOCATION '/myDirectory';
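A minimal end-to-end sketch of the first approach (the table name, columns, and S3 path are hypothetical; CTAS with a LOCATION clause writes the managed table's data to that path):
-- 1. create a managed table from the select, with its data on S3
CREATE TABLE x
LOCATION 's3a://mybucket/mypath'
AS SELECT key, value FROM alphabet;
-- 2. flip it to external so dropping the table will not delete the S3 data
ALTER TABLE x SET TBLPROPERTIES('EXTERNAL'='TRUE');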