HDINSIGHT hive, MSCK REPAIR TABLE table_name throwing error

HDINSIGHT hive, MSCK REPAIR TABLE table_name throwing error - hive

i have an external partitioned table named employee with partition(year,month,day), everyday a new file come and seat at the particular day location call for today's date it will be at 2016/10/13.
TABLE SCHEMA:
create External table employee(EMPID Int,FirstName String,.....)
partitioned by (year string,month string,day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' LOCATION '/.../emp';
so everyday we need to run command which is working fine as
ALTER TABLE employee ADD IF NOT EXISTS PARTITION (year=2016,month=10,day=14) LOCATION '/.../emp/2016/10/14';
but once we are trying with below command because we don't want to execute the above alter table command manually, it throws below Error
hive> MSCK REPAIR TABLE employee;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Note:
hive> MSCK TABLE employee; //this show me that a partition has not added in the table
OK
Partitions not in metastore: employee:2016/10/14
Time taken: 1.066 seconds, Fetched: 1 row(s)
please help me as i stuck with this. do we have any workaround for this type of situation?

I got a workaround solution for my problem which is, if the table static partition name is like 'year=2016/month=10/day=13' then we can use below command and it is working...
set hive.msck.path.validation=ignore;
MSCK REPAIR TABLE table_name;

Related

Deletion of Partitions

I am not able to drop partition in hive table.
ALTER TABLE db.table drop if exists partition(dt="****-**-**/id=**********");
OK
Time taken: 0.564 seconds
But partitions are not getting deleted
Below is the what I get when I check partitions of my table:
hive> show partitions db.table;
OK
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
After running Alter table db.table drop if exists command it should actually delete the partition . But it is not happening so .
Can you please suggest me on this.
Thanks in advance.

Try this:
ALTER TABLE db.table drop if exists partition(dt='****-**-**', id='**********');

As #leftjoin also mentioned, you have to specify partitions with comma seperated.
ALTER TABLE page_view DROP if exists PARTITION (dt='****-**-**', id='**********');
Please note -
In Hive 0.7.0 or later, DROP returns an error if the partition doesn't
exist, unless IF EXISTS is specified or the configuration variable
hive.exec.drop.ignorenonexistent is set to true.
Due to this reason, your query didn't fail and returned OK response.

How to create table over partitioned data

I have text file with snappy compression partitioned by field 'process_time' (result of Flume job). Example: hdfs://data/mytable/process_time=25-04-2019
This is my script for create table:
CREATE EXTERNAL TABLE mytable
(
...
)
PARTITIONED BY (process_time STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/mytable/'
TBLPROPERTIES("textfile.compress"="snappy");
The result of queries against this table are allways 0 (but I know that there are some data). Any help?
Thanks!

As you are creating external table on top of HDFS directory then to add the partitions to the hive table we need to run either of these commands.
if any partition added to HDFS directly(instead of using insert queries) then hive doesn't know about the newly added partitions, so we need to run either msck (or) add partitions to add newly added partitions to hive table.
To add all partitions to hive table:
hive> msck repair table <db_name>.<table_name>;
(or)
To manually add each partition to hive table:
hive> alter table <db_name>.<table_name> add partition(process_time="25-04-2019")
location '/data/mytable/process_time=25-04-2019';
For more details refer to this link.

Droped and recreated hive external table, but no data shown

First a hive external table is created:
create external table user_tables.test
(col1 string, col2 string)
partitioned by (date_partition date);
records are inserted:
INSERT INTO TABLE user_tables.test
PARTITION (date_partition='2017-11-16') VALUES ('abc', 'xyz'), ('abc1', 'xyz1');
Now, the table is dropped and recreated with the same script.
When I try-
SELECT * FROM user_tables.test WHERE date_partition='2017-11-16';`
I get Done. 0 results.

This is because the table you created is a partitioned table. The insert you ran would have created a partition for date_partition='2017-11-16'. When you drop and re-create the table, Hive looses the information about the partition, it only knows about the table.
Run the command below to get hive to re-create the partitions based on the data.
MSCK REPAIR TABLE user_tables.test;
Now you will see the data when you run SELECT.
If you want to know the partitions in the table run the statement below:
SHOW PARTITIONS user_tables.test;
Run this before and after MSCK to see the effect.

External table does not return the data in its folder

I have created an external table in Hive with at this location :
CREATE EXTERNAL TABLE tb
(
...
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/data';
The data is present in the folder but when I query the table, it returns nothing. The table is structured in a way that it fits the data structure.
SELECT * FROM tb LIMIT 3;
Is there a kind of permission issue with Hive tables: do specific users have permissions to query some tables?
Do you know some solutions or workarounds?

You have created your table as partitioned table base on column datehour, but you are putting your data in /user/cloudera/data. Hive will look for data in /user/cloudera/data/datehour=(some int value). Since it is an external table hive will not update the metastore. You need to run some alter statement to update that
So here are the steps for external tables with partition:
1.) In you external location /user/cloudera/data, create a directory datehour=0909201401
OR
Load data using: LOAD DATA [LOCAL] INPATH '/path/to/data/file' INTO TABLE partition(datehour=0909201401)
2.) After creating your table run a alter statement:
ALTER TABLE ADD PARTITION (datehour=0909201401)
Hope it helps...!!!

When we create an EXTERNAL TABLE with PARTITION, we have to ALTER the EXTERNAL TABLE with the data location for that given partition. However, it need not be the same path as we specify while creating the EXTERNAL TABLE.
hive> ALTER TABLE tb ADD PARTITION (datehour=0909201401)
hive> LOCATION '/user/cloudera/data/somedatafor_datehour'
hive> ;
When we specify LOCATION '/user/cloudera/data' (though its optional) while creating an EXTERNAL TABLE we can take some advantage of doing repair operations on that table. So when we want to copy the files through some process like ETL into that directory, we can sync up the partition with the EXTERNAL TABLE instead of writing ALTER TABLE statement to create another new partition.
If we already know the directory structure of the partition that HIVE would create, we can simply place the data file in that location like '/user/cloudera/data/datehour=0909201401/data.txt' and run the statement as shown below:
hive> MSCK REPAIR TABLE tb;
The above statement will sync up the partition to the hive meta store of the table "tb".

How to Update/Drop a Hive Partition?

After adding a partition to an external table in Hive, how can I update/drop it?

You can update a Hive partition by, for example:
ALTER TABLE logs PARTITION(year = 2012, month = 12, day = 18)
SET LOCATION 'hdfs://user/darcy/logs/2012/12/18';
This command does not move the old data, nor does it delete the old data. It simply sets the partition to the new location.
To drop a partition, you can do
ALTER TABLE logs DROP IF EXISTS PARTITION(year = 2012, month = 12, day = 18);

in addition, you can drop multiple partitions from one statement (Dropping multiple partitions in Impala/Hive).
Extract from above link:
hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
EDIT 1:
Also, you can drop bulk using a condition sign (>,<,<>), for example:
Alter table t
drop partition (PART_COL>1);

Alter table table_name drop partition (partition_name);

You can either copy files into the folder where external partition is located or use
INSERT OVERWRITE TABLE tablename1 PARTITION (partcol1=val1, partcol2=val2...)...
statement.

You may also need to make database containing table active
use [dbname]
otherwise you may get error (even if you specify database i.e. dbname.table )
FAILED Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. Unable to alter partitions because table or database does not exist.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

HDINSIGHT hive, MSCK REPAIR TABLE table_name throwing error - hive

I got a workaround solution for my problem which is, if the table static partition name is like 'year=2016/month=10/day=13' then we can use below command and it is working... set hive.msck.path.validation=ignore; MSCK REPAIR TABLE table_name;

Related

Deletion of Partitions

How to create table over partitioned data

Droped and recreated hive external table, but no data shown

External table does not return the data in its folder

How to Update/Drop a Hive Partition?

Categories

Resources