Hive transactional table compaction fails

The table was created with this DDL:
create table syslog_staged (id string, facility string, sender string, severity string, tstamp string, service string, msg string)
partitioned by (hostname string, year string, month string, day string)
clustered by (id) into 20 buckets
stored as orc
tblproperties ("transactional"="true");
The table is populated by Apache NiFi's PutHiveStreaming processor, then compacted with:
alter table syslog_staged partition (hostname="cloudserver19", year="2016", month="10", day="24") compact 'major';
Compaction then fails (from the job history):
No of maps and reduces are 0 job_1476884195505_0031
Job commit failed: java.io.FileNotFoundException: File hdfs://hadoop1.openstacksetup.com:8020/apps/hive/warehouse/log.db/syslog_staged/hostname=cloudserver19/year=2016/month=10/day=24/_tmp_27c40005-658e-48c1-90f7-2acaa124e2fa does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:113)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:966)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:962)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:962)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:776)
at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
From the Hive metastore log:
2016-10-24 16:33:35,503 WARN [Thread-14]: compactor.Initiator (Initiator.java:run(132)) - Will not initiate compaction for log.syslog_staged.hostname=cloudserver19/year=2016/month=10/day=24 since last hive.compactor.initiator.failed.compacts.threshold attempts to compact it failed.

Please set the properties below to enable compaction for transactional tables:
set hive.compactor.worker.threads=1;
set hive.compactor.initiator.on=true;
I assume you have already set the transactional Hive properties below:
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.bucketing=true;
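With those properties in place, one way to check whether the compactor picks the partition up again is to re-issue the request and watch its state. A minimal sketch, reusing the table and partition from the question:
alter table syslog_staged partition (hostname="cloudserver19", year="2016", month="10", day="24") compact 'major';
show compactions;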

Related

How can I disable transactions for a Hive table?

I have a Hive table that was originally created as transactional, but I want to disable transactions on the table because they are not actually needed.
I tried to disable them using ALTER TABLE, but I got an error:
hive> ALTER TABLE foo SET TBLPROPERTIES('transactional'='false');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. TBLPROPERTIES with 'transactional'='true' cannot be unset
I am using Hive 2.3.2.
According to the documentation, changing TBLPROPERTIES ("transactional"="false") is not allowed.
You can re-create the table.
Back up the table first:
create table bkp_table as
select * from your_table;
Then drop the table, create it again without the transactional property, and reload the data from the backup (a sketch follows below).
Alternatively, create a new table, load the data from the old one, drop the old table, and rename the new one.
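A minimal sketch of the first approach, with illustrative table and column names (the real column list comes from your existing table definition):
create table bkp_table as
select * from your_table;

drop table your_table;

create table your_table (col string, col2 string)
stored as orc
tblproperties ('transactional'='false');

insert into your_table
select * from bkp_table;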
You have to re-create the table.
First back up the table if you want, then DROP TABLE.
Re-create the table with TBLPROPERTIES ('transactional'='false'):
CREATE TABLE your_table(
`col` string,
`col2` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
'transactional'='false'
)
You can choose whichever input and output formats you need.
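For example, if the original table was stored as ORC, a non-transactional ORC version could look like this (column names are placeholders):
CREATE TABLE your_table(
  `col` string,
  `col2` string)
STORED AS ORC
TBLPROPERTIES (
  'transactional'='false'
);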

DynamoDB EMR Hive Connector writes 1 item at a time

While writing to DynamoDB with on-demand capacity using hive> INSERT OVERWRITE TABLE t SELECT * FROM s3data; I notice that it writes one item at a time, which is evident from the write capacity graph. Here are the settings:
SET dynamodb.throughput.write.percent=1.0;
CREATE EXTERNAL TABLE IF NOT EXISTS t (userId string, categoryName string, score double)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "data.reader-score.test",
"dynamodb.column.mapping" = "userId:userId,categoryName:categoryName,score:score",
"dynamodb.throughput.write" = "5000");
Is there any other configuration that must be done?

Hive-HBase integration - issues while inserting data

I was able to successfully integrate Hive and HBase for straightforward scenarios (no partitioning or bucketing), and I could insert data into both Hive and HBase in those cases.
I am having issues with a partitioned Hive table stored in HBase. The CREATE DDL statement executes fine, but when I try to INSERT I get an error saying "Must specify table name".
CREATE TABLE hivehbase_customer(id int,holdid int,fname string,lname string,address string,zipcode string)
partitioned by (city string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:hold_id,personal_data:f_name,personal_data:l_name,personal_address:address,personal_address:zipcode")
TBLPROPERTIES ("hbase.table.name" = "hivehbase_custom", "hbase.mapred.output.outputtable" = "hivehbase_custom");
insert into table hivehbase_customer partition(city= 'tegacay') values (7394,NULL,NULL,NULL,NULL,29708);
Try the following insert query:
insert into table hivehbase_customer partition(city) values (7394,NULL,NULL,NULL,NULL,29708,'tegacay');
The partition column needs to be specified as the last column in the insert query.
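If dynamic partitioning is not already enabled in the session, these settings may also be needed before running the insert above (an assumption, not part of the original answer):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;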

Hive 0.14 update is not working: "Attempt to do update or delete using transaction manager that does not support these operations."

I'm trying to update a Hive ORC bucketed table, but it throws the exception FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
I'm running this in the Hive command prompt.
STEP 1:
set hive.support.concurrency = true;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;
STEP 2:
create table test(id int ,name string ) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
STEP 3:
insert into table test values (1,'row1'),(2,'row2'),(3,'row3'); -- 3 rows inserted successfully
STEP 4:
insert into table testTable values (1,'row1'),(2,'row2');
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
After this, when I open another Hive prompt and run SHOW TABLES, it hangs and returns no results. I restarted the Hive services as well, but it did not help.
According to the Hive wiki, bucketing and partitioning columns cannot be updated. Can you retry with a column other than id?
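A minimal sketch of an update that avoids the bucketing column, assuming the test table from STEP 2 (the new value is illustrative):
update test set name = 'row1_updated' where id = 1;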

Issue creating a Hive external table using TBLPROPERTIES

I am trying to create an external table with TBLPROPERTIES in Hive. The table gets created, but it does not display any rows. Any ideas? The scripts I am using are below.
Thanks for your time and suggestions in advance.
The data is in a nested subfolder: /user/test/test1/test2/samplefile.csv
use dw_raw;
drop table if exists temp_external_tab1;
create external table if not exists temp_external_tab1 (
col1 int,
col2 string,
col3 string,
col4 string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/user/test/test1/'
tblproperties ("hive.input.dir.recursive" = "TRUE",
"hive.mapred.supports.subdirectories" = "TRUE",
"hive.supports.subdirectories" = "TRUE",
"mapred.input.dir.recursive" = "TRUE");
These are not table properties, but global settings.
You should set these with SET at the session level, e.g.:
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
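A minimal sketch of the full session, assuming the table from the question, would be:
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

select * from temp_external_tab1 limit 10;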
You've created a table but haven't loaded any data into it. Try:
hive> LOAD DATA LOCAL INPATH '/user/test/test1/test2/samplefile.csv'
INTO TABLE temp_external_tab1;
If you are using Ambari, set the following properties in the Hive advanced config under custom hive-site.xml:
SET hive.input.dir.recursive=TRUE
SET hive.mapred.supports.subdirectories=TRUE
SET hive.supports.subdirectories=TRUE
SET mapred.input.dir.recursive=TRUE
And then restart the affected services. This will read all the data recursively.