How to drop hive partitions with setting limit


I have an ALTER TABLE ... DROP query, and I want to set a limit on it.
The query is:
alter table dim_known_hosts drop if exists PARTITION (dimensional_partition_folder<'2016_01_04_00_30');
It drops all partitions less than 2016_01_04_00_30.
I want to delete just the first 10.
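As far as I know, Hive's ALTER TABLE ... DROP PARTITION does not accept a LIMIT clause. One workaround is to do the limiting outside Hive:

1. List the partitions with SHOW PARTITIONS.
2. In a script, pick the 10 smallest values below the cutoff.
3. Emit a single DROP statement with one PARTITION spec per value. This is the same multi-partition form as `partition (p=1),partition (p=2)`.

This is a minimal Python sketch under stated assumptions:

- The helper name `first_n_drop_statement` is hypothetical.
- The table has a single partition column.
- Its values sort correctly as strings, which holds for a fixed-width format like 2016_01_04_00_30.

```python
def first_n_drop_statement(table, column, show_partitions_lines, cutoff, n=10):
    """Build one ALTER TABLE ... DROP statement for the first n partitions below cutoff.

    show_partitions_lines: lines as printed by SHOW PARTITIONS, e.g. 'col=value'.
    Assumption: a single partition column whose values sort correctly as strings.
    Returns None when no partition is below the cutoff.
    """
    values = []
    for line in show_partitions_lines:
        key, _, value = line.strip().partition("=")
        if key == column and value < cutoff:
            values.append(value)
    # "First" is taken to mean the smallest (oldest) values.
    selected = sorted(values)[:n]
    if not selected:
        return None
    specs = ", ".join(f"PARTITION ({column}='{v}')" for v in selected)
    return f"ALTER TABLE {table} DROP IF EXISTS {specs};"


if __name__ == "__main__":
    # Hypothetical SHOW PARTITIONS output, for illustration only.
    lines = [f"dimensional_partition_folder=2016_01_0{d}_00_30" for d in range(1, 6)]
    print(first_n_drop_statement("dim_known_hosts", "dimensional_partition_folder",
                                 lines, "2016_01_04_00_30", n=2))
```

The generated statement can then be run with hive -e or saved to a file and run with hive -f.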

Related

Deletion of Partitions

I am not able to drop a partition in a Hive table.
ALTER TABLE db.table drop if exists partition(dt="****-**-**/id=**********");
OK
Time taken: 0.564 seconds
But the partitions are not deleted.
Below is what I get when I check the partitions of my table:
hive> show partitions db.table;
OK
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
After running the ALTER TABLE db.table DROP IF EXISTS command, the partition should be deleted, but it is not.
Can you please advise?
Thanks in advance.

Try this:
ALTER TABLE db.table drop if exists partition(dt='****-**-**', id='**********');

As @leftjoin also mentioned, you have to specify the partition columns as a comma-separated list:
ALTER TABLE page_view DROP if exists PARTITION (dt='****-**-**', id='**********');
Please note -
In Hive 0.7.0 or later, DROP returns an error if the partition doesn't
exist, unless IF EXISTS is specified or the configuration variable
hive.exec.drop.ignorenonexistent is set to true.
This is why your query didn't fail and returned OK. The spec dt="****-**-**/id=**********" matched no existing partition, and IF EXISTS suppressed the error.
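SHOW PARTITIONS prints each partition as a path such as dt=.../id=.... The DROP PARTITION spec, however, expects one column=value pair per partition column, separated by commas. This is a minimal Python sketch that turns one SHOW PARTITIONS line into such a spec.

- `spec_from_show_partitions` is a hypothetical helper.
- It assumes the values contain no '/', '=' or quote characters.
- Hive escapes special characters in partition directory names, and this sketch does not decode them.

```python
def spec_from_show_partitions(line):
    """Turn 'dt=2020-01-01/id=42' into "PARTITION (dt='2020-01-01', id='42')".

    Assumption: values contain no '/', '=' or quotes (no unescaping is done).
    """
    pairs = []
    for segment in line.strip().split("/"):
        column, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"not a column=value segment: {segment!r}")
        pairs.append(f"{column}='{value}'")
    return "PARTITION (" + ", ".join(pairs) + ")"


if __name__ == "__main__":
    # Hypothetical partition values, for illustration only.
    spec = spec_from_show_partitions("dt=2020-01-01/id=42")
    print(f"ALTER TABLE db.table DROP IF EXISTS {spec};")
```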

exclude partitions in select query

I have a table in Hive which is partitioned by country.
I want to exclude 3 specific partitions, such as somalia and iraq.
I do not want to put them in a WHERE clause (not in ('somalia','iraq')).
Is there an option to exclude specific partitions, the way we can exclude columns from a SELECT statement?
Please advise.

You can drop the partitions that are not needed:
hive> alter table <db_name>.<table_name> drop partition
(<partition_field>="somalia"), partition (<partition_field>="iraq");
(or)
Create a view on top of the table that excludes the partitions that are not needed:
hive> create view <db_name>.<view_name> as select * from <db_name>.<table_name>
where <partition_field> not in ("somalia","iraq");
hive> select * from <db_name>.<view_name>;
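If the list of excluded countries changes over time, a script can generate the view DDL. This is a minimal Python sketch with some assumptions:

- `exclusion_view_ddl` and its parameters are hypothetical.
- It rejects values containing quotes or backslashes rather than escaping them, which assumes plain country names.

```python
def exclusion_view_ddl(view, table, column, excluded):
    """Build a CREATE VIEW statement that filters out the given partition values.

    Assumption: plain values; quotes/backslashes are rejected, not escaped.
    """
    if not excluded:
        raise ValueError("need at least one value to exclude")
    for value in excluded:
        if "'" in value or "\\" in value:
            raise ValueError(f"value needs escaping, not handled here: {value!r}")
    in_list = ", ".join(f"'{v}'" for v in excluded)
    return (f"CREATE VIEW {view} AS SELECT * FROM {table} "
            f"WHERE {column} NOT IN ({in_list})")


if __name__ == "__main__":
    # Hypothetical database, table and column names.
    print(exclusion_view_ddl("db.v", "db.t", "country", ["somalia", "iraq"]))
```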

dropping hive partition dynamically

I have a Hive table with daily partitions, something like below (it includes partitions for future dates as well):
20160901
20160902
........
........
........
20160931
20161001
20161002
I want to pass one date, for example yesterday's date 20160922, and dynamically drop all partitions that are >= 20160922. Today is 20160923, but I want to drop starting from 20160922.
How can I drop all these partitions dynamically?

You cannot do this in Hive directly, as it does not support dynamic SQL.
As a workaround, use a shell script (or any other script) to create a file containing drop-partition statements like the ones below:
alter table partition_t drop if exists partition (y=20160922);
alter table partition_t drop if exists partition (y=20160923);
alter table partition_t drop if exists partition (y=20160924);
...
Then run: hive -v -f ./file.sh
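The script-generation workaround can be sketched in Python. It prints one DROP statement per day from a start date through an end date (inclusive).

- The table partition_t and the unquoted yyyyMMdd partition column y are taken from the answer.
- The helper name `drop_statements` and the chosen end date are assumptions.
- Because each statement uses IF EXISTS, days that have no partition are harmless.

```python
from datetime import date, timedelta


def drop_statements(table, column, start, end):
    """Yield one DROP IF EXISTS statement per day from start to end (inclusive).

    The partition value is written as an unquoted yyyyMMdd number, as in the answer.
    """
    day = start
    while day <= end:
        yield (f"ALTER TABLE {table} DROP IF EXISTS "
               f"PARTITION ({column}={day:%Y%m%d});")
        day += timedelta(days=1)


if __name__ == "__main__":
    # Hypothetical range: from 2016-09-22 up to the last partition in the listing.
    for stmt in drop_statements("partition_t", "y", date(2016, 9, 22), date(2016, 10, 2)):
        print(stmt)
```

Redirect the output to a file (for example python gen_drops.py > file.sh, where gen_drops.py is a hypothetical script name), then run hive -v -f ./file.sh.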

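Before inserting data into the table, perform the steps below.
1) Go to the HDFS folder of the table and delete all the folders inside the table directory using shell commands: hadoop fs -rm -r <>
2) Run MSCK REPAIR TABLE to update the metadata about partitions. By default, MSCK REPAIR TABLE only adds partitions that exist in HDFS but are missing from the metastore. To also remove metastore entries for deleted folders, Hive 3.0 and later accept MSCK REPAIR TABLE <table> DROP PARTITIONS (or SYNC PARTITIONS).
Together, these two steps delete all the partitions whose folders match the pattern passed to rm.
Now insert your new data.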
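The two steps can be driven from Python through the hadoop and hive command-line tools. This is a minimal sketch with several assumptions:

- The table directory and table name are caller-supplied placeholders.
- The helper `clear_and_repair` is hypothetical.
- `dry_run=True` is the default, so the commands are only returned, not executed.
- The optional `msck_mode` value (for example "SYNC PARTITIONS") is only understood by newer Hive releases.

```python
import subprocess


def clear_and_repair(table_dir, table, msck_mode="", dry_run=True):
    """Delete all folders under table_dir, then run MSCK REPAIR TABLE on table.

    table_dir, table: placeholders supplied by the caller.
    msck_mode: optional mode such as "SYNC PARTITIONS" (newer Hive only).
    With dry_run=True the commands are returned but not executed.
    """
    msck = f"MSCK REPAIR TABLE {table} {msck_mode}".strip() + ";"
    commands = [
        # Step 1: recursive delete; no shell is used, so hadoop fs expands the glob itself.
        ["hadoop", "fs", "-rm", "-r", f"{table_dir.rstrip('/')}/*"],
        # Step 2: update the partition metadata in the metastore.
        ["hive", "-e", msck],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Keeping a dry-run default makes it easy to inspect the exact rm target before deleting anything.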

You can drop partitions using a range filter. For reference, see this answer: https://stackoverflow.com/a/48422251/3132181
So your code could look like this:
Alter table mytable drop partition (datehour >= '20160922')
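If the partition column is a string, a range filter like this compares the values as strings. That gives the intended date range only when every value has the same fixed-width, most-significant-first format, such as yyyyMMdd.

This is a quick check in plain Python mirroring that comparison. The sample values are hypothetical.

```python
def matching(values, lower):
    """Return the values that a string filter `col >= lower` would keep."""
    return [v for v in values if v >= lower]


# Fixed-width yyyyMMdd: string order equals date order.
fixed_width = ["20160901", "20160921", "20160922", "20161002"]
print(matching(fixed_width, "20160922"))  # ['20160922', '20161002']

# Without zero padding, string order breaks: Sept 9 is kept and Oct 1 is not.
not_padded = ["2016-9-9", "2016-10-1"]
print(matching(not_padded, "2016-9-22"))  # ['2016-9-9']
```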

Dropping a range of partitions in HIVE

I have a Hive (ver 0.11.0) table partitioned by column date, of type string. I want to know if there exists a way in Hive by which I can drop partitions for a range of dates (say from 'date1' to 'date2'). I have tried the following (SQL type) queries, but they don't seem to be syntactically correct:
ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' and date<='date2');
ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' && date<='date2');
ALTER TABLE myTable DROP IF EXISTS PARTITION
(date between 'date1' and 'date2');

I tried this syntax and it worked:
ALTER TABLE mytable DROP PARTITION (dates>'2018-04-14',dates<'2018-04-16');
Command output:
Dropped the partition dates=2018-04-15/country_id=107
Dropped the partition dates=2018-04-15/country_id=110
Dropped the partition dates=2018-04-15/country_id=112
Dropped the partition dates=2018-04-15/country_id=14
Dropped the partition dates=2018-04-15/country_id=157
Dropped the partition dates=2018-04-15/country_id=159
Dropped the partition dates=2018-04-15/country_id=177
Dropped the partition dates=2018-04-15/country_id=208
Dropped the partition dates=2018-04-15/country_id=22
Dropped the partition dates=2018-04-15/country_id=233
Dropped the partition dates=2018-04-15/country_id=234
Dropped the partition dates=2018-04-15/country_id=76
Dropped the partition dates=2018-04-15/country_id=83
OK
Time taken: 0.706 seconds
I am using Hive 1.2.1000.2.5.5.0-157.
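The comma-separated conditions in the partition spec act as an AND: a partition is dropped only if it satisfies every condition. That is why only dates=2018-04-15 appears in the output above.

This is a small Python emulation of that filtering.

- It assumes string comparison of partition values.
- The helper names are hypothetical.
- The 2018-04-14 and 2018-04-16 paths are invented for illustration.

```python
import operator

# Comparison operators usable in a partition filter, mapped to Python functions.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq, "<>": operator.ne}


def parse(path):
    """'dates=2018-04-15/country_id=107' -> {'dates': '2018-04-15', 'country_id': '107'}"""
    return dict(segment.split("=", 1) for segment in path.split("/"))


def to_drop(paths, conditions):
    """Keep the paths whose values satisfy ALL (column, op, value) conditions (string compare)."""
    return [p for p in paths
            if all(OPS[op](parse(p)[col], val) for col, op, val in conditions)]


paths = ["dates=2018-04-14/country_id=107",  # hypothetical
         "dates=2018-04-15/country_id=107",
         "dates=2018-04-15/country_id=110",
         "dates=2018-04-16/country_id=107"]  # hypothetical
conditions = [("dates", ">", "2018-04-14"), ("dates", "<", "2018-04-16")]
print(to_drop(paths, conditions))
# ['dates=2018-04-15/country_id=107', 'dates=2018-04-15/country_id=110']
```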

Solution: alter table myTable drop partition (unix_timestamp('date1','yyyy-MM-dd')>unix_timestamp(myDate,'yyyy-MM-dd'),unix_timestamp('date2','yyyy-MM-dd')<unix_timestamp(myDate,'yyyy-MM-dd'));

How to Update/Drop a Hive Partition?

After adding a partition to an external table in Hive, how can I update/drop it?

You can update a Hive partition like this, for example:
ALTER TABLE logs PARTITION(year = 2012, month = 12, day = 18)
SET LOCATION 'hdfs://user/darcy/logs/2012/12/18';
This command does not move the old data, nor does it delete the old data. It simply sets the partition to the new location.
To drop a partition, use:
ALTER TABLE logs DROP IF EXISTS PARTITION(year = 2012, month = 12, day = 18);

In addition, you can drop multiple partitions in one statement (see "Dropping multiple partitions in Impala/Hive").
Extract from the above link:
hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
EDIT 1:
Also, you can drop partitions in bulk using a comparison operator (>, <, <>), for example:
Alter table t
drop partition (PART_COL>1);

Alter table table_name drop partition (partition_column='value');

You can either copy files into the folder where the external partition is located, or use an
INSERT OVERWRITE TABLE tablename1 PARTITION (partcol1=val1, partcol2=val2...)...
statement.

You may also need to make the database containing the table active:
use [dbname];
Otherwise you may get the following error, even if you qualify the table name (dbname.table):
FAILED Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. Unable to alter partitions because table or database does not exist.
