Dropping multiple partitions in Impala/Hive

1- I'm trying to delete multiple partitions at once, but I'm struggling to do it with either Impala or Hive. I tried the following query, with and without the single quotes (') around the values:
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION (pr_load_time='20170701000317')
PARTITION (pr_load_time='20170701000831')
The error I'm getting is as follows:
AnalysisException:
Syntax error in line 3: PARTITION (pr_load_time='20170701000831') ^
Encountered: PARTITION Expected: CACHED, LOCATION, PURGE, SET,
UNCACHED CAUSED BY: Exception: Syntax error
The partition column is of type bigint. The query for deleting a single partition works as expected:
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION (pr_load_time='20170701000317')
2- Is it good practice to delete the HDFS data first and then drop the partitions in Impala/Hive, or should it be done the other way around?

1.
Your syntax is wrong.
In the DROP command the partitions should be separated by commas.
Demo
hive> create table t (i int) partitioned by (p int);
OK
hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK
hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5
hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
hive> show partitions t;
OK
partition
p=4
p=5
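If the list of partitions comes from a script, the comma-separated form shown above can be generated rather than typed by hand. This is only a minimal sketch: the function names drop_partitions_sql and format_value are made up for illustration, and it only builds the statement text for the Hive syntax demonstrated above. It does not run anything against a cluster.

```python
def format_value(value):
    """Render a partition value as a literal (hypothetical helper)."""
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if "'" in text:
        # Keep the sketch simple: refuse values that would need escaping.
        raise ValueError("quotes in partition values are not handled here")
    return f"'{text}'"


def drop_partitions_sql(table, column, values):
    """Build one ALTER TABLE ... DROP statement covering several partitions."""
    if not values:
        raise ValueError("at least one partition value is required")
    # One PARTITION (...) clause per value, separated by commas.
    specs = ", ".join(
        f"PARTITION ({column}={format_value(v)})" for v in values
    )
    return f"ALTER TABLE {table} DROP IF EXISTS {specs}"


print(drop_partitions_sql("t", "p", [1, 2, 3]))
```

For the demo table this prints the same statement as the one used above, apart from letter case and spacing.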
2.
You can drop a range.
Demo
hive> create table t (i int) partitioned by (p int);
OK
hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK
hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5
hive> alter table t drop if exists partition (p<=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
hive> show partitions t;
OK
partition
p=4
p=5

Related

Deletion of Partitions

I am not able to drop a partition in a Hive table.
ALTER TABLE db.table drop if exists partition(dt="****-**-**/id=**********");
OK
Time taken: 0.564 seconds
But the partitions are not getting deleted.
Below is what I get when I check the partitions of my table:
hive> show partitions db.table;
OK
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
After running the ALTER TABLE db.table DROP IF EXISTS command, the partition should be deleted, but that is not happening.
Can you please advise?
Thanks in advance.
Try this:
ALTER TABLE db.table drop if exists partition(dt='****-**-**', id='**********');
As @leftjoin also mentioned, the partition columns have to be specified as a comma-separated list.
ALTER TABLE page_view DROP if exists PARTITION (dt='****-**-**', id='**********');
Please note -
In Hive 0.7.0 or later, DROP returns an error if the partition doesn't
exist, unless IF EXISTS is specified or the configuration variable
hive.exec.drop.ignorenonexistent is set to true.
This is why your query did not fail and returned OK: IF EXISTS was specified, so the partition spec that matched nothing was silently ignored.
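The mix-up here is between the path form that SHOW PARTITIONS prints (dt=.../id=...) and the spec form that DROP PARTITION expects (dt='...', id='...'). A rough sketch of that conversion in Python is below. The function name partition_path_to_spec and the sample values are hypothetical (the real values in the question are masked), and quoting every value as a string is an assumption that may not suit every column type.

```python
def partition_path_to_spec(path):
    """Turn 'dt=2020-01-01/id=42' into "PARTITION (dt='2020-01-01', id='42')"."""
    pairs = []
    for part in path.split("/"):
        # Split each level at the first '=' into column name and value.
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"not a column=value pair: {part!r}")
        pairs.append(f"{key}='{value}'")
    return "PARTITION (" + ", ".join(pairs) + ")"


# Hypothetical sample line, in the shape SHOW PARTITIONS prints.
print(partition_path_to_spec("dt=2020-01-01/id=42"))
# PARTITION (dt='2020-01-01', id='42')
```

The result can then be appended to ALTER TABLE db.table DROP IF EXISTS.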

Unable to drop a Hive table partition whose value contains a special character, the equals sign (=)

I inserted data into a Hive table with the partition column (CL) value 'CL=18', which is stored as /db/tbname/CL=CL%3D18 (a partition name containing the URL-encoded form of the equals sign).
According to the Hortonworks community, Hive stores special characters URL-escaped.
I tried using escape sequences for the equals sign, \x3D (hex) and \u0030 (unicode), but they did not work.
Ex: alter table tb drop partition (CL='CL\x3D18'); <-- did not work
Can someone help me? Am I doing something wrong with the equals sign (=)?
Try alter table id drop partition(cl="cl=18"); or enclose the partition value in single quotes (') instead.
I recreated the scenario on my end and was able to drop partitions with special characters without using any hex or other escape sequences.
Example:
I created a partitioned table with cl as the partition column, of type string.
hive> alter table t1 add partition(cl="cl=18"); --add the partition to the table
hive> show partitions t1; --list the partitions in the table
+-------------+--+
| partition |
+-------------+--+
| cl=cl%3D18 |
+-------------+--+
hive> alter table t1 drop partition(cl='cl=18'); --drop the partition from the table.
hive> show partitions t1;
+------------+--+
| partition |
+------------+--+
+------------+--+
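To see why the listing shows cl=cl%3D18 while the DROP statement takes the plain value 'cl=18', here is a small Python sketch of the percent-encoding round trip. It uses urllib.parse.quote and unquote as a stand-in. Hive has its own escaping routine, so this is only assumed to agree with it for the equals sign in this example. The two function names are made up for illustration.

```python
from urllib.parse import quote, unquote


def partition_dir_name(column, value):
    """Model the directory name: the value is percent-encoded, the column is not."""
    return f"{column}={quote(value, safe='')}"


def partition_value_from_dir(dir_name):
    """Recover the plain value to put in the DROP PARTITION spec."""
    # Everything after the first '=' is the encoded value.
    _, _, encoded = dir_name.partition("=")
    return unquote(encoded)


print(partition_dir_name("cl", "cl=18"))        # cl=cl%3D18
print(partition_value_from_dir("cl=cl%3D18"))   # cl=18
```

So the encoded form is only how the partition is named on disk and in the listing. The statement itself uses the decoded value, as in the example above.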

Dropping a range of partitions in HIVE

I have a Hive (ver 0.11.0) table partitioned by column date, of type string. I want to know if there exists a way in Hive by which I can drop partitions for a range of dates (say from 'date1' to 'date2'). I have tried the following (SQL type) queries, but they don't seem to be syntactically correct:
ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' and date<='date2');
ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' && date<='date2');
ALTER TABLE myTable DROP IF EXISTS PARTITION
(date between 'date1' and 'date2');
I tried this syntax and it worked.
ALTER TABLE mytable DROP PARTITION (dates>'2018-04-14',dates<'2018-04-16');
Command output:
Dropped the partition dates=2018-04-15/country_id=107
Dropped the partition dates=2018-04-15/country_id=110
Dropped the partition dates=2018-04-15/country_id=112
Dropped the partition dates=2018-04-15/country_id=14
Dropped the partition dates=2018-04-15/country_id=157
Dropped the partition dates=2018-04-15/country_id=159
Dropped the partition dates=2018-04-15/country_id=177
Dropped the partition dates=2018-04-15/country_id=208
Dropped the partition dates=2018-04-15/country_id=22
Dropped the partition dates=2018-04-15/country_id=233
Dropped the partition dates=2018-04-15/country_id=234
Dropped the partition dates=2018-04-15/country_id=76
Dropped the partition dates=2018-04-15/country_id=83
OK
Time taken: 0.706 seconds
I am using Hive 1.2.1000.2.5.5.0-157.
Solution: alter table myTable drop partition (unix_timestamp('date1','yyyy-MM-dd')>unix_timestamp(myDate,'yyyy-MM-dd'),unix_timestamp('date2','yyyy-MM-dd')<unix_timestamp(myDate,'yyyy-MM-dd'));
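If the comparison syntax is not accepted on a given Hive version, one fallback is to list every day in the range explicitly and use the comma-separated form. The Python sketch below only generates the statement text. The function names are hypothetical, and it assumes daily partitions whose values are formatted as yyyy-MM-dd and a column named dates as in the answer above.

```python
from datetime import date, timedelta


def daily_partition_specs(column, start, end):
    """One PARTITION (...) clause per day from start to end, inclusive."""
    days = (end - start).days
    if days < 0:
        raise ValueError("end must not be before start")
    return [
        f"PARTITION ({column}='{(start + timedelta(days=i)).isoformat()}')"
        for i in range(days + 1)
    ]


def drop_date_range_sql(table, column, start, end):
    """Build a single DROP statement naming each daily partition."""
    specs = ", ".join(daily_partition_specs(column, start, end))
    return f"ALTER TABLE {table} DROP IF EXISTS {specs}"


print(drop_date_range_sql("mytable", "dates", date(2018, 4, 14), date(2018, 4, 16)))
```

IF EXISTS is included so that days without a partition do not make the statement fail.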

Drop hive partition by date range

I use hive-0.10.0-cdh-4.7.0 in my environment.
I have a table named test, stored as a sequence file, with some partitions by date_dim like below:
game=Test/date_dim=2014-07-01
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
game=Test/date_dim=2014-07-31
I want to drop the partitions between 2014-07-11 and 2014-07-30 with this SQL command:
alter table test drop partition (date_dim>='2014-07-11',date_dim<='2014-07-30')
I expected these 2 partitions to be deleted:
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
But actually these 3 partitions were deleted:
game=Test/date_dim=2014-07-01
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
It seems that Hive's drop partition only uses the date_dim<='2014-07-30' condition.
Is there any way to make Hive drop the partitions as I wish?
You should compare the values as timestamps rather than as strings. For that purpose you can use the unix_timestamp function, which converts the string to epoch seconds:
alter table test drop partition (unix_timestamp(date_dim,'yyyy-MM-dd')>=unix_timestamp('2014-07-11','yyyy-MM-dd'),unix_timestamp(date_dim,'yyyy-MM-dd')<=unix_timestamp('2014-07-30','yyyy-MM-dd'))
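To make the behaviour described in the question concrete, here is a plain Python model of the two filters. It is not Hive itself, just list filtering over the four partition values from the question, and the function names are made up for illustration. Applying both bounds keeps the two expected partitions, while applying only the upper bound reproduces the three that were actually dropped.

```python
partitions = ["2014-07-01", "2014-07-11", "2014-07-21", "2014-07-31"]


def in_range(values, lower, upper):
    """Values that satisfy both conditions (the intended behaviour)."""
    return [v for v in values if lower <= v <= upper]


def upper_only(values, upper):
    """Values that satisfy only the second condition (the observed behaviour)."""
    return [v for v in values if v <= upper]


print(in_range(partitions, "2014-07-11", "2014-07-30"))
# ['2014-07-11', '2014-07-21']
print(upper_only(partitions, "2014-07-30"))
# ['2014-07-01', '2014-07-11', '2014-07-21']
```

This supports the asker's reading that only the date_dim<='2014-07-30' condition was applied.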

How to Update/Drop a Hive Partition?

After adding a partition to an external table in Hive, how can I update/drop it?
You can update a Hive partition by, for example:
ALTER TABLE logs PARTITION(year = 2012, month = 12, day = 18)
SET LOCATION 'hdfs://user/darcy/logs/2012/12/18';
This command does not move the old data, nor does it delete the old data. It simply sets the partition to the new location.
To drop a partition, you can do
ALTER TABLE logs DROP IF EXISTS PARTITION(year = 2012, month = 12, day = 18);
In addition, you can drop multiple partitions in one statement (Dropping multiple partitions in Impala/Hive).
Extract from above link:
hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
EDIT 1:
Also, you can drop partitions in bulk using a comparison operator (>, <, <>), for example:
Alter table t
drop partition (PART_COL>1);
Alter table table_name drop partition (partition_name);
To update a partition, you can either copy files into the folder where the external partition is located or use the
INSERT OVERWRITE TABLE tablename1 PARTITION (partcol1=val1, partcol2=val2...)...
statement.
You may also need to make the database containing the table active:
use [dbname]
Otherwise you may get an error (even if you specify the database, i.e. dbname.table):
FAILED Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. Unable to alter partitions because table or database does not exist.