How can we rename multiple partitions in Hive?

Suppose a table has two partition columns, e.g. school name and class.
How can I rename a specific class partition that is present inside all school partitions?
For example:
/school=ABC/class=1/
/school=PQR/class=1/
...
class=1 should be transformed to class=2:
/school=ABC/class=2/
/school=PQR/class=2/
...
Edit: In this example there are only two schools, but the number is variable; there could be thousands of schools.

If the table is a managed table, you can simply use the commands below to rename each partition:
alter table tbl_name PARTITION (school='ABC', class=1) RENAME TO
PARTITION (school='ABC', class=2);
alter table tbl_name PARTITION (school='PQR', class=1) RENAME TO
PARTITION (school='PQR', class=2);
Below is the execution I tried in Hive:
hive> create table tbl_name (
name string,
age int)
partitioned by (school string, class int);
hive> alter table tbl_name ADD PARTITION (school='ABC', class=1);
OK
Time taken: 0.157 seconds
hive> alter table tbl_name ADD PARTITION (school='PQR', class=1);
OK
Time taken: 0.128 seconds
hive> show partitions tbl_name;
OK
school=ABC/class=1
school=PQR/class=1
hive> alter table tbl_name PARTITION (school='ABC', class=1) RENAME TO
PARTITION (school='ABC', class=2);
OK
Time taken: 0.468 seconds
hive> alter table tbl_name PARTITION (school='PQR', class=1) RENAME TO
PARTITION (school='PQR', class=2);
OK
Time taken: 0.432 seconds
hive> show partitions tbl_name;
OK
school=ABC/class=2
school=PQR/class=2
Hope this will help.

You can try getting the partition information from the metastore.
1> Find the metastore connection settings in the hive-site.xml file (location: /hive/installation/location/hive/hive-2.1/conf)
2> Get the connection URL and credentials:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>[hostname]</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>UserName</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>PWD</value>
<description>password to use against metastore database</description>
</property>
3> Connect to the metastore database and run the query below to get the partition information.
select D.NAME, P.PART_NAME, T.TBL_NAME from PARTITIONS P INNER JOIN TBLS T ON P.TBL_ID=T.TBL_ID INNER JOIN DBS D ON T.DB_ID=D.DB_ID WHERE D.NAME=<DBNAME> AND T.TBL_NAME=<TBLNAME> AND P.PART_NAME LIKE '%class=2%';
Once you have the partition information, you can use the replace and concat functions to derive the ALTER statements.
Hope this helps.
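As a rough illustration of that last step, here is a minimal Python sketch that turns a list of partition names (as returned by the metastore query or by SHOW PARTITIONS) into rename statements. The function name rename_statements is hypothetical, values are always quoted as strings, and it assumes partition values contain no "/" or escaped characters.

```python
def rename_statements(table, part_names, column, old, new):
    """Build one ALTER TABLE ... RENAME TO PARTITION statement per matching partition."""

    def render(pairs):
        # [("school", "ABC"), ("class", "1")] -> "school='ABC', class='1'"
        return ", ".join(f"{k}='{v}'" for k, v in pairs)

    statements = []
    for name in part_names:
        # "school=ABC/class=1" -> [("school", "ABC"), ("class", "1")]
        spec = [tuple(kv.split("=", 1)) for kv in name.split("/")]
        if (column, old) not in spec:
            continue  # exact match, so class=1 does not also catch class=10
        renamed = [(k, new if (k, v) == (column, old) else v) for k, v in spec]
        statements.append(
            f"ALTER TABLE {table} PARTITION ({render(spec)}) "
            f"RENAME TO PARTITION ({render(renamed)});"
        )
    return statements


# Hypothetical input: partition names for tbl_name from the question.
parts = ["school=ABC/class=1", "school=PQR/class=1", "school=PQR/class=10"]
for stmt in rename_statements("tbl_name", parts, "class", "1", "2"):
    print(stmt)
```

The printed statements can be saved to a file and run in one go, which avoids writing thousands of ALTER statements by hand. Matching on the exact key/value pair rather than a substring is a deliberate choice here, since a LIKE '%class=1%' style filter would also match class=10.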

Related

Alter Hive partition column name NOT changed in HDFS

When I alter the partition column value of a partitioned table (named partitioned_table), the corresponding directory in HDFS does not change. However, dropping and moving partitions is reflected in HDFS. The new value does show up when running "show partitions partitioned_table".
Hive version is 4.0.0-alpha-2.
I use the statement below to alter the partition column value.
ALTER TABLE table_name PARTITION
(partition_column = partition_col_value,
partition_column = partition_col_value)
RENAME TO PARTITION (partition_column = partition_col_value,
partition_column = partition_col_value);
Why is this, and how can I change the corresponding directory in HDFS when altering a partition in Hive?
When you alter a partition, it only affects the Hive Metastore, and will never affect data in HDFS. To change the directory as well, you need to explicitly insert the data into the Hive table at that partition, or issue an hdfs mv command and then run an MSCK REPAIR query in Hive to fix the metadata.

Deletion of Partitions

I am not able to drop a partition in a Hive table.
ALTER TABLE db.table drop if exists partition(dt="****-**-**/id=**********");
OK
Time taken: 0.564 seconds
But the partitions are not getting deleted.
Below is what I get when I check the partitions of my table:
hive> show partitions db.table;
OK
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
After running the ALTER TABLE db.table DROP IF EXISTS command, the partition should actually be deleted, but that is not happening.
Can you please advise me on this?
Thanks in advance.
Try this:
ALTER TABLE db.table drop if exists partition(dt='****-**-**', id='**********');
As @leftjoin also mentioned, you have to specify the partition columns separated by commas.
ALTER TABLE page_view DROP if exists PARTITION (dt='****-**-**', id='**********');
Please note -
In Hive 0.7.0 or later, DROP returns an error if the partition doesn't
exist, unless IF EXISTS is specified or the configuration variable
hive.exec.drop.ignorenonexistent is set to true.
For this reason, your query didn't fail and returned an OK response.
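To make the difference concrete, here is a small Python sketch that converts a partition name in the path form printed by SHOW PARTITIONS (key=value/key=value) into the comma-separated spec that DROP PARTITION expects. The function name drop_statement and the sample values are hypothetical, since the real values are masked in the question.

```python
def drop_statement(table, part_name):
    """Turn 'dt=.../id=...' (SHOW PARTITIONS form) into a DROP PARTITION statement."""
    # Each path segment is one partition column: "dt=2020-01-01" -> ["dt", "2020-01-01"]
    pairs = [segment.split("=", 1) for segment in part_name.split("/")]
    # Columns are separated by commas in the spec, not by "/".
    spec = ", ".join(f"{key}='{value}'" for key, value in pairs)
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec});"


# Hypothetical partition values standing in for the masked ones.
print(drop_statement("db.table", "dt=2020-01-01/id=42"))
```

Passing the whole path as a single value for dt, as in the original query, names a partition that does not exist, and IF EXISTS then hides the mismatch.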

Dropping multiple partitions in Impala/Hive

1- I'm trying to delete multiple partitions at once, but struggling to do it with either Impala or Hive. I tried the following query, with and without ':
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION (pr_load_time='20170701000317')
PARTITION (pr_load_time='20170701000831')
The error I'm getting is as follows:
AnalysisException:
Syntax error in line 3: PARTITION (pr_load_time='20170701000831') ^
Encountered: PARTITION Expected: CACHED, LOCATION, PURGE, SET,
UNCACHED CAUSED BY: Exception: Syntax error
The partition column is of type bigint. The query for deleting only one partition works as expected:
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION (pr_load_time='20170701000317')
2- Is it good practice to delete the HDFS data first and then drop the partitions in Impala/Hive, or is it supposed to be done the other way around?
1.
Your syntax is wrong.
In the DROP command the partitions should be separated by commas.
Demo
hive> create table t (i int) partitioned by (p int);
OK
hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK
hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5
hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
hive> show partitions t;
OK
partition
p=4
p=5
2.
You can drop a range.
Demo
hive> create table t (i int) partitioned by (p int);
OK
hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK
hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5
hive> alter table t drop if exists partition (p<=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
hive> show partitions t;
OK
partition
p=4
p=5
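When the partitions to drop do not form a range, the comma-separated form from the first demo can be generated from a list of values. The following Python sketch shows one way; the function name drop_many is hypothetical, values are emitted unquoted on the assumption that the partition column is numeric, and the result has only been matched against the Hive syntax shown above, not against Impala.

```python
def drop_many(table, column, values):
    """Build a single DROP statement with comma-separated PARTITION clauses."""
    # One PARTITION (...) clause per value, joined by commas as in the Hive demo.
    clauses = ", ".join(f"PARTITION ({column}={value})" for value in values)
    return f"ALTER TABLE {table} DROP IF EXISTS {clauses};"


# The two partition values from the question (bigint column, so unquoted here).
print(drop_many(
    "cz_prd_corrti_st.s1mme_transstats_info",
    "pr_load_time",
    [20170701000317, 20170701000831],
))
```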

How to rename partition value in Hive?

I have a hive table 'videotracking_playevent' which uses the following partition format (all strings): source/createyear/createmonth/createday.
Example: source=home/createyear=2016/createmonth=9/createday=1
I'm trying to update the partition values of createmonth and createday to consistently use double digits instead.
Example: source=home/createyear=2016/createmonth=09/createday=01
I've tried the following query:
ALTER TABLE videotracking_playevent PARTITION (
source='home',
createyear='2015',
createmonth='11',
createday='1'
) RENAME TO PARTITION (
source='home',
createyear='2015',
createmonth='11',
createday='01'
);
However that returns the following, non-descriptive error from hive: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. null
I've confirmed that this partition exists, and I think I'm using the correct syntax. My hive version is Hive 1.1.0
Any ideas what I might be doing wrong?
There was an issue with renaming partitions in old versions of Hive. This might be the issue in your case too. Please see this link for details.
If you are using an older version of Hive, you need to set the two properties below before executing the rename partition command.
set fs.hdfs.impl.disable.cache=false;
set fs.file.impl.disable.cache=false;
Now run the query with these properties set.
hive> set fs.hdfs.impl.disable.cache=false;
hive> set fs.file.impl.disable.cache=false;
hive> ALTER TABLE partition_test PARTITION (year='2016',day='1') RENAME TO PARTITION (year='2016',day='01');
OK
Time taken: 0.28 seconds
hive> show partitions partition_test;
OK
year=2016/day=01
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive>
This issue is fixed in later versions of Hive. In my case the Hive version is 1.2.1 and it works without setting those properties. Please see the example below.
Create a partitioned table.
hive> create table partition_test(
> name string,
> age int)
> partitioned by (year string, day string);
OK
Time taken: 5.35 seconds
hive>
Now add the partition and check the newly added partition.
hive> alter table partition_test ADD PARTITION (year='2016', day='1');
OK
Time taken: 0.137 seconds
hive>
hive> show partitions partition_test;
OK
year=2016/day=1
Time taken: 0.169 seconds, Fetched: 1 row(s)
hive>
Rename the partition using RENAME TO PARTITION command and check it.
hive> ALTER TABLE partition_test PARTITION (year='2016',day='1') RENAME TO PARTITION (year='2016',day='01');
OK
Time taken: 0.28 seconds
hive> show partitions partition_test;
OK
year=2016/day=01
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive>
Hope it helps you.
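Since the question involves many partitions, the rename statements could also be generated rather than typed by hand. Below is a minimal Python sketch, assuming partition names in the form printed by SHOW PARTITIONS; the function name zero_pad_rename is hypothetical, and all values are quoted because the partition columns in the question are strings.

```python
def zero_pad_rename(table, part_name, columns, width=2):
    """Return a RENAME statement that zero-pads the given columns, or None if nothing changes."""

    def render(pairs):
        return ", ".join(f"{key}='{value}'" for key, value in pairs)

    # "createmonth=9" -> ["createmonth", "9"]
    spec = [segment.split("=", 1) for segment in part_name.split("/")]
    # str.zfill pads on the left with zeros: "9" -> "09", "11" stays "11".
    padded = [[key, value.zfill(width) if key in columns else value]
              for key, value in spec]
    if padded == spec:
        return None  # already uses double digits
    return (f"ALTER TABLE {table} PARTITION ({render(spec)}) "
            f"RENAME TO PARTITION ({render(padded)});")


print(zero_pad_rename(
    "videotracking_playevent",
    "source=home/createyear=2016/createmonth=9/createday=1",
    {"createmonth", "createday"},
))
```

Partitions that already use double digits return None, so they can be filtered out before the statements are run.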
"Rename lets you change the value of a partition column. One of use cases is that you can use this statement to normalize your legacy partition column value to conform to its type. In this case, the type conversion and normalization are not enabled for the column values in old partition_spec even with property hive.typecheck.on.insert set to true (default) which allows you to specify any legacy data in form of string in the old partition_spec"
Bug open
https://issues.apache.org/jira/browse/HIVE-10362
You can create a copy of the table without partitions, update the column in that copy, and then recreate the original table with partitions:
create table table_name partitioned by (table_column) as
select
*
from
source_table
That worked for me.

Hive external table not showing partitions

I have created an external table using Hive. My
hive> desc <table_name>;
shows the following output:
OK
transactiontype string
transactionid int
sourcenumber int
destnumber int
amount int
assumedfield1 int
transactionstatus string
assumedfield2 int
assumedfield3 int
transactiondate date
customerid int
# Partition Information
# col_name data_type comment
transactiondate date
customerid int
Time taken: 0.094 seconds, Fetched: 17 row(s)
But when I execute the following command:
hive> show partitions <dbname.tablename>;
OK
Time taken: 0.11 seconds
No partitions are shown. What might be the problem? When I look at hive.log, the data in the table seems to be partitioned properly according to the 'transactiondate' and 'customerid' fields. What is the maximum number of partitions that a single node should have? I have set 1000 partitions.
2015-06-15 10:33:44,713 INFO [LocalJobRunner Map Task Executor #0]: exec.FileSinkOperator (FileSinkOperator.java:createBucketForFileIdx(593)) - Writing to temp file: FS hdfs://localhost:54310/home/deepak/mobile_money_jan.txt/.hive-staging_hive_2015-06-15_10-30-53_308_5507019849041735537-1/_task_tmp.-ext-10002/transactiondate=2015-01-16/customerid=34560544/_tmp.000002_0
I am running hive on a single node hadoop cluster.
Try adding the partitions manually:
alter table db.table add IF NOT EXISTS
partition(datadate='2017-01-01') location
'hdfs_location/datadate=2017-01-01'
Hi. Whenever we create an external table, its location is set to the specified location in the Hive metadata, so that change is reflected in the Hive metastore.
But the partition information remains unchanged: it is not updated in the Hive metastore, so we need to add those partitions manually.
ALTER TABLE "your-table" ADD PARTITION(transactiondate='datevalue',customerid='id-value');
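With many partition directories, the ADD PARTITION statements can be derived from the directory paths instead of being written one by one. Here is a minimal Python sketch of that idea; the function name add_partition_statement and the table name and location in the example are hypothetical, and it assumes the directories follow the key=value layout seen in the log line from the question.

```python
def add_partition_statement(table, table_location, partition_dir):
    """Build an ADD PARTITION statement from a key=value style partition directory."""
    base = table_location.rstrip("/") + "/"
    if not partition_dir.startswith(base):
        raise ValueError("partition directory is not under the table location")
    # Path relative to the table: "transactiondate=2015-01-16/customerid=34560544"
    rel = partition_dir[len(base):].strip("/")
    pairs = [segment.split("=", 1) for segment in rel.split("/")]
    spec = ", ".join(f"{key}='{value}'" for key, value in pairs)
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec}) "
            f"LOCATION '{base}{rel}';")


# Hypothetical table name and location; the partition values come from the log line above.
print(add_partition_statement(
    "db.transactions",
    "/data/transactions",
    "/data/transactions/transactiondate=2015-01-16/customerid=34560544/",
))
```

If the directories already follow this layout under the table location, running MSCK REPAIR TABLE on the table may be a simpler way to register them.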