Deletion of Partitions - hive

I am not able to drop a partition from a Hive table. I ran:
ALTER TABLE db.table drop if exists partition(dt="****-**-**/id=**********");
OK
Time taken: 0.564 seconds
But the partitions are not deleted.
This is what I get when I list the partitions of my table:
hive> show partitions db.table;
OK
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
Running the ALTER TABLE db.table DROP IF EXISTS command should delete the partition, but it does not.
Can you please suggest what I am doing wrong?
Thanks in advance.

Try this:
ALTER TABLE db.table drop if exists partition(dt='****-**-**', id='**********');

As @leftjoin also mentioned, you have to specify the partition columns as a comma-separated list, each with its own quoted value:
ALTER TABLE page_view DROP if exists PARTITION (dt='****-**-**', id='**********');
Please note -
In Hive 0.7.0 or later, DROP returns an error if the partition doesn't
exist, unless IF EXISTS is specified or the configuration variable
hive.exec.drop.ignorenonexistent is set to true.
Your statement included IF EXISTS, so it did not fail and returned OK even though the partition spec matched nothing.
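
If you are copying specs from the SHOW PARTITIONS output, a small helper can do the slash-to-comma conversion for you. This is only a sketch in Python: the function name drop_statement is made up for illustration, and it assumes the partition values are plain strings with no quotes or escaped characters in them.

```python
def drop_statement(table: str, partition_path: str) -> str:
    """Turn a SHOW PARTITIONS line such as 'dt=2020-01-01/id=42'
    into an ALTER TABLE ... DROP statement with a comma-separated spec."""
    pairs = []
    for part in partition_path.strip().split("/"):
        # Each path segment has the form column=value.
        column, _, value = part.partition("=")
        pairs.append(f"{column}='{value}'")
    spec = ", ".join(pairs)
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec});"


# Example values below are placeholders, not taken from the question.
print(drop_statement("db.table", "dt=2020-01-01/id=42"))
```

The printed statement can then be pasted into the Hive shell or written to a file and run with hive -f.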

Related

Removing specific rows from table without using DELETE

I am working in a company and I need to find a way to delete specific rows from a table without using a DELETE statement.
So I was thinking of partitioning the data and then removing a partition with DROP IF EXISTS PARTITION:
select *, count(validity_date) over(partition by another_column) as indicator from schema.table
Which worked, but when I try dropping the partition using
ALTER TABLE schema.table DROP IF EXISTS PARTITION(year(validity_date) = '2022');
I get an error saying
mismatched input '(' expecting set null in drop partition statement
So my question is: is there any other way to remove specific rows from a table without using DELETE?
Thank you !
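The parentheses in your statement are balanced, so this is not a typo. DROP PARTITION only accepts the table's own partition columns in its spec, not an expression such as year(validity_date), and the PARTITION BY in your SELECT is a window clause that does not partition the table. If the table were partitioned by a year column (validity_year here is a hypothetical name), the statement would look like this:
ALTER TABLE schema.table DROP IF EXISTS PARTITION (validity_year = '2022');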
These are the only two options for deleting the data: either DROP PARTITION or a DELETE query.
P.S. Hive supports ACID operations such as DELETE and UPDATE on rows only from version 0.14 onwards.

dropping hive partition dynamically

I have a Hive table with daily partitions, something like the list below (it includes partitions for future dates as well):
20160901
20160902
........
........
........
20160930
20161001
20161002
I want to pass in one date, for example yesterday's date 20160922, and dynamically drop all partitions that are >= 20160922 (today is 20160923, but I want to drop from 20160922 onwards).
How can I drop all these partitions dynamically?
You cannot do this in Hive directly, as it does not support dynamic SQL.
A workaround is to use a shell script (or any other script) to create a file containing the drop partition statements, like below:
alter table partition_t drop if exists partition (y=20160922);
alter table partition_t drop if exists partition (y=20160923);
alter table partition_t drop if exists partition (y=20160924);
...
Then run hive -v -f ./file.sh
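
The statement file described above could be generated with a few lines of Python, for instance. This is a rough sketch, not a tested production script: the function name daily_drop_statements is hypothetical, the table and column names are the ones from this answer, and it assumes you know the last date for which a partition exists, since the table also holds future partitions.

```python
from datetime import datetime, timedelta


def daily_drop_statements(table: str, column: str, start: str, end: str) -> list:
    """Build one DROP statement per day from start to end (inclusive).

    Dates are given as yyyyMMdd strings, matching the partition values.
    """
    current = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    statements = []
    while current <= last:
        statements.append(
            f"alter table {table} drop if exists partition ({column}={current:%Y%m%d});"
        )
        current += timedelta(days=1)
    return statements


# Print the statements; redirect the output to a file to run it with hive -f.
for statement in daily_drop_statements("partition_t", "y", "20160922", "20161002"):
    print(statement)
```

Because every statement uses IF EXISTS, days without a partition are simply skipped by Hive.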
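Before inserting data into the table, perform the steps below.
1) Go to the HDFS folder of that table and delete all the folders inside
the table directory using shell commands: hadoop fs -rm -r <>
2) Run MSCK REPAIR TABLE to update the metadata about partitions. Note that a plain MSCK REPAIR TABLE only adds partitions it finds on HDFS; to remove the metadata of deleted folders you may need the DROP PARTITIONS option, which newer Hive versions (3.0 and later) accept.
The two steps above delete all the available partitions based on the pattern.
Now insert your new data.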
You can drop partitions by giving a range filter. For reference, see this answer: https://stackoverflow.com/a/48422251/3132181
So your code could look like this:
ALTER TABLE mytable DROP PARTITION (datehour >= '20160922');

How to drop hive partitions with setting limit

I have an ALTER TABLE ... DROP query and I want to set a limit for it.
The query is:
alter table dim_known_hosts drop if exists PARTITION (dimensional_partition_folder<'2016_01_04_00_30');
It drops all partitions less than 2016_01_04_00_30.
I want to drop just the first 10.
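
The partition spec in the statement above has no place for a LIMIT, so one possible workaround is to pick the partitions outside Hive and drop them one by one. The Python below is only a sketch of that idea: first_n_drop_statements is a hypothetical helper, it expects the lines printed by show partitions as input, and it assumes the partition values are fixed-width strings so that sorting them as text matches their time order.

```python
def first_n_drop_statements(table, column, partition_lines, upper_bound, limit=10):
    """Return DROP statements for the `limit` smallest partition values
    that are strictly less than upper_bound."""
    prefix = column + "="
    values = sorted(
        line.strip()[len(prefix):]
        for line in partition_lines
        if line.strip().startswith(prefix)
    )
    selected = [value for value in values if value < upper_bound][:limit]
    return [
        f"alter table {table} drop if exists PARTITION ({column}='{value}');"
        for value in selected
    ]


# Made-up sample of what show partitions might print for this table.
sample = [
    "dimensional_partition_folder=2016_01_03_00_30",
    "dimensional_partition_folder=2016_01_01_00_30",
    "dimensional_partition_folder=2016_01_05_00_00",
]
for statement in first_n_drop_statements(
    "dim_known_hosts", "dimensional_partition_folder", sample, "2016_01_04_00_30"
):
    print(statement)
```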

Drop hive partition by date range

I use hive-0.10.0-cdh-4.7.0 in my environment.
I have a table named test, stored as a sequence file, with some partitions by date_dim like below:
game=Test/date_dim=2014-07-01
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
game=Test/date_dim=2014-07-31
I want to drop the partitions between 2014-07-11 and 2014-07-30 with a SQL command:
alter table test drop partition (date_dim>='2014-07-11',date_dim<='2014-07-30')
I expected these 2 partitions to be deleted:
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
But actually these 3 partitions were deleted:
game=Test/date_dim=2014-07-01
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
It seems Hive's drop partition only uses the date_dim<='2014-07-30' condition.
Is there any way to make Hive drop the partitions as I wish?
You should convert the strings to Unix timestamps before comparing them; you can use the unix_timestamp function for that:
alter table test drop partition (unix_timestamp(date_dim,'yyyy-MM-dd')>=unix_timestamp('2014-07-11','yyyy-MM-dd'),unix_timestamp(date_dim,'yyyy-MM-dd')<=unix_timestamp('2014-07-30','yyyy-MM-dd'))
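
If the range filter still misbehaves on an old Hive version, another option is to do the date comparison outside Hive and drop each matching partition explicitly. The following Python is a minimal sketch under that assumption: drop_statements_in_range is a hypothetical helper, and its input is the list of lines printed by show partitions, in the game=.../date_dim=... form shown in the question.

```python
from datetime import datetime


def drop_statements_in_range(table, partition_lines, column, start, end, fmt="%Y-%m-%d"):
    """Return one DROP statement for every partition whose `column` value,
    parsed as a date, lies between start and end (both inclusive)."""
    first = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    statements = []
    for line in partition_lines:
        # 'game=Test/date_dim=2014-07-11' -> {'game': 'Test', 'date_dim': '2014-07-11'}
        spec = dict(part.split("=", 1) for part in line.strip().split("/"))
        if first <= datetime.strptime(spec[column], fmt) <= last:
            clause = ", ".join(f"{name}='{value}'" for name, value in spec.items())
            statements.append(f"alter table {table} drop partition ({clause});")
    return statements


partitions = [
    "game=Test/date_dim=2014-07-01",
    "game=Test/date_dim=2014-07-11",
    "game=Test/date_dim=2014-07-21",
    "game=Test/date_dim=2014-07-31",
]
for statement in drop_statements_in_range(
    "test", partitions, "date_dim", "2014-07-11", "2014-07-30"
):
    print(statement)
```

With the four partitions from the question, only the 2014-07-11 and 2014-07-21 partitions are selected.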

How to Update/Drop a Hive Partition?

After adding a partition to an external table in Hive, how can I update/drop it?
You can update a Hive partition by, for example:
ALTER TABLE logs PARTITION(year = 2012, month = 12, day = 18)
SET LOCATION 'hdfs://user/darcy/logs/2012/12/18';
This command does not move the old data, nor does it delete the old data. It simply sets the partition to the new location.
To drop a partition, you can do
ALTER TABLE logs DROP IF EXISTS PARTITION(year = 2012, month = 12, day = 18);
In addition, you can drop multiple partitions in one statement (see "Dropping multiple partitions in Impala/Hive").
Extract from that question:
hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
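
When the list of partitions comes from a script, the multi-partition statement above can be assembled programmatically. This is just an illustrative Python sketch: multi_drop_statement and literal are made-up names, and it assumes integer values are written bare and everything else is written as a single-quoted string.

```python
def literal(value):
    """Render a partition value: integers bare, anything else single-quoted."""
    return str(value) if isinstance(value, int) else "'" + str(value) + "'"


def multi_drop_statement(table, specs):
    """Build one ALTER TABLE statement that drops several partitions.

    specs is a list of dicts, each mapping partition column to value.
    """
    clauses = []
    for spec in specs:
        pairs = ", ".join(f"{column}={literal(value)}" for column, value in spec.items())
        clauses.append(f"partition ({pairs})")
    return f"alter table {table} drop if exists " + ", ".join(clauses) + ";"


print(multi_drop_statement("t", [{"p": 1}, {"p": 2}, {"p": 3}]))
```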
EDIT 1:
Also, you can drop partitions in bulk using a comparison operator (>, <, <>), for example:
Alter table t
drop partition (PART_COL>1);
Alter table table_name drop partition (partition_name);
You can either copy files into the folder where the external partition is located or use an
INSERT OVERWRITE TABLE tablename1 PARTITION (partcol1=val1, partcol2=val2...)...
statement.
You may also need to make the database containing the table active:
use [dbname]
Otherwise you may get an error (even if you specify the database, i.e. dbname.table):
FAILED Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. Unable to alter partitions because table or database does not exist.