exclude partitions in select query - hive

I have a table in Hive which is partitioned by country.
I want to exclude 3 specific partitions, such as somalia and iraq.
I do not want to put them in a WHERE clause (NOT IN ('somalia','iraq')).
Is there an option to exclude specific partitions, like the option we have for excluding columns from the SELECT statement?
Please suggest.

You can drop the partitions that are not needed (for a managed table this also deletes their data):
hive> alter table <db_name>.<table_name> drop partition
(<partition_field>="somalia"), partition (<partition_field>="iraq");
(or)
Create a view on top of the table that excludes the partitions that are not needed:
hive> create view <db_name>.<view_name> as select * from <db_name>.<table_name>
where <partition_field> not in ("somalia","iraq");
hive> select * from <db_name>.<view_name>;
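The view approach can be tried out locally with Python's built-in sqlite3 module. This is only an analogy, not Hive: sqlite3 has no partitions, so the hypothetical `sales` table below uses an ordinary `country` column to stand in for the partition column. Queries against the view never see the excluded values.

```python
import sqlite3

# sqlite3 has no partitions; "country" is an ordinary column standing in
# for the Hive partition column (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(10, "kenya"), (20, "somalia"), (30, "iraq"), (40, "peru")],
)

# The view hides the unwanted "partitions" once, so callers no longer
# need the NOT IN filter in their own queries.
conn.execute(
    "CREATE VIEW sales_filtered AS "
    "SELECT * FROM sales WHERE country NOT IN ('somalia', 'iraq')"
)

rows = conn.execute(
    "SELECT country, amount FROM sales_filtered ORDER BY country"
).fetchall()
print(rows)  # [('kenya', 10), ('peru', 40)]
```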

Related

SQL Impala create partitioned table from a view

I am interested in turning a view into a table, but I want the table to be partitioned with respect to one variable.
My query is:
CREATE TABLE table_test AS
SELECT
*
FROM view_test;
I want the table to be partitioned by the variable "time_period".
Thank you in advance.

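You can do this directly in the CTAS statement. List the columns explicitly instead of using SELECT *, so that the partition column can be placed last:
CREATE TABLE table_test
PARTITIONED BY (time_period)
AS
SELECT col1, col2,
time_period -- the partition column must be the last column in the SELECT list
FROM schema.view_test;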
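When the view has many columns, the SELECT list can be generated so the partition column always ends up last. The sketch below is a hypothetical helper (`build_partitioned_ctas` is not part of Impala or any library); it only builds the SQL text shown in the answer and does not connect to Impala.

```python
def build_partitioned_ctas(table, source, columns, partition_col):
    """Return a CTAS statement with partition_col moved to the end.

    Hypothetical helper: it only builds SQL text, it does not run it.
    """
    if partition_col not in columns:
        raise ValueError(f"{partition_col!r} is not in the column list")
    # Keep the other columns in their original order, then append the
    # partition column, which the CTAS requires to be last.
    ordered = [c for c in columns if c != partition_col] + [partition_col]
    select_list = ",\n  ".join(ordered)
    return (
        f"CREATE TABLE {table}\n"
        f"PARTITIONED BY ({partition_col})\n"
        f"AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {source};"
    )


sql = build_partitioned_ctas(
    "table_test", "schema.view_test", ["time_period", "col1", "col2"], "time_period"
)
print(sql)
```

The column list would typically come from `DESCRIBE schema.view_test`; here it is hard-coded for illustration.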

How to modify CTAS query to append query results to table based on if new partition doesn't exist? - Athena

I have a query that I want to run daily, with the results partitioned by the date it runs. The results of each run should be appended to the same table.
Ideally I would have something like CREATE TABLE IF NOT EXISTS, but for data: add a new partition to the existing table every day if that partition doesn't already exist. However, I can't figure out how to integrate this into my query.
My query:
CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
external_location = 's3://my-query-results-location/',
format = 'PARQUET',
parquet_compression = 'SNAPPY',
partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}
This creates the table on the first day it runs, but nothing happens on subsequent days, presumably because CREATE TABLE IF NOT EXISTS sees that the table already exists and skips the rest of the statement.
Is there a way to modify my query to create a table for the first day executed and append the results by a new partition for each subsequent day?
I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.

You can simply use INSERT INTO existing_table SELECT ....
Presumably your table is already partitioned, so include the partition column in the SELECT and Amazon Athena will automatically put the data in the correct partition location.
For example, you might include the column like this: SELECT ... CURRENT_DATE AS date_executed
See: INSERT INTO - Amazon Athena
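The question also asks for the append to happen only if the day's partition does not exist yet, so that a rerun on the same day adds nothing. A minimal sketch of that guard, using Python's sqlite3 as a stand-in for Athena: `results` plays the partitioned table, `date_executed` the partition column, and `source` the query being run (all hypothetical names). Whether Athena accepts the identical INSERT ... WHERE NOT EXISTS against the table it writes to has not been checked here; the alternative is to run a `SELECT COUNT(*)` for the date first and skip the INSERT when it returns rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source data for the daily query.
conn.execute("CREATE TABLE source (id INTEGER)")
conn.executemany("INSERT INTO source VALUES (?)", [(1,), (2,)])
# Stand-in for the Athena table; date_executed plays the partition column.
conn.execute("CREATE TABLE results (id INTEGER, date_executed TEXT)")


def append_daily(conn, run_date):
    """Append the day's results only if no rows exist yet for run_date."""
    conn.execute(
        """
        INSERT INTO results (id, date_executed)
        SELECT id, ? FROM source
        WHERE NOT EXISTS (
            SELECT 1 FROM results WHERE date_executed = ?
        )
        """,
        (run_date, run_date),
    )
    conn.commit()


append_daily(conn, "2024-01-01")
append_daily(conn, "2024-01-01")  # rerun on the same day: nothing is added
append_daily(conn, "2024-01-02")

counts = conn.execute(
    "SELECT date_executed, COUNT(*) FROM results "
    "GROUP BY date_executed ORDER BY date_executed"
).fetchall()
print(counts)  # [('2024-01-01', 2), ('2024-01-02', 2)]
```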

Copied data column is not partitioned in target table in hive

I have created a table in Hive from an existing partitioned table using the command
create table new_table As select * from old_table;
The record counts match in both tables, but DESC shows that the column is not a partition column in the new table.

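You should explicitly specify the partition columns when creating the table. In a CTAS they are listed by name only (their types come from the SELECT), and they must be the last columns of the SELECT output; SELECT * on a partitioned table already returns its partition columns last.
create table new_table partitioned by (col1, col2, ...) as
select * from old_table;
Older Hive releases do not accept a partitioning clause in CTAS at all.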
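On a Hive version that rejects a partitioned CTAS, a common alternative is to copy the schema with CREATE TABLE ... LIKE and fill the copy with a dynamic-partition INSERT OVERWRITE. The hypothetical helper below only generates those HiveQL statements as text; it assumes `partition_cols` is given in the same order as the old table's PARTITIONED BY clause.

```python
def copy_partitioned_table_sql(old_table, new_table, partition_cols):
    """Build HiveQL that copies a partitioned table and keeps its partitioning.

    Hypothetical helper: it returns the statements as text and runs nothing.
    partition_cols must follow the old table's partition declaration order.
    """
    cols = ", ".join(partition_cols)
    return [
        # Let Hive create partitions from the values in the SELECT output.
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
        # Copy the schema, including the partition columns, but no data.
        f"CREATE TABLE {new_table} LIKE {old_table};",
        # SELECT * on a partitioned table returns the partition columns last,
        # in declaration order, which is what the PARTITION clause expects.
        f"INSERT OVERWRITE TABLE {new_table} PARTITION ({cols}) "
        f"SELECT * FROM {old_table};",
    ]


stmts = copy_partitioned_table_sql("old_table", "new_table", ["col1", "col2"])
for stmt in stmts:
    print(stmt)
```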

Drop hive partition by date range

I use hive-0.10.0-cdh-4.7.0 in my environment.
I have a table named test, stored as a sequence file, with partitions by date_dim like these:
game=Test/date_dim=2014-07-01
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
game=Test/date_dim=2014-07-31
I want to drop the partitions between 2014-07-11 and 2014-07-30 with this SQL command:
alter table test drop partition (date_dim>='2014-07-11',date_dim<='2014-07-30')
I expected these 2 partitions to be deleted:
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
But these 3 partitions were actually deleted:
game=Test/date_dim=2014-07-01
game=Test/date_dim=2014-07-11
game=Test/date_dim=2014-07-21
It seems that Hive's drop partition only uses the date_dim<='2014-07-30' condition.
Is there any way to make Hive drop only the partitions I want?

You should convert the string to a date type; for that purpose you can use the unix_timestamp function:
alter table test drop partition (
unix_timestamp(date_dim,'yyyy-MM-dd')>=unix_timestamp('2014-07-11','yyyy-MM-dd'),
unix_timestamp(date_dim,'yyyy-MM-dd')<=unix_timestamp('2014-07-30','yyyy-MM-dd'))
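Another way to avoid range conditions in the partition spec is to pick the matching partitions on the client side and name each one explicitly. The sketch below hard-codes the partition values from the question (in practice they would come from `SHOW PARTITIONS test`) and builds one DROP statement with a comma-separated list of PARTITION specs, the form described in the Hive DDL manual; verify that your Hive version accepts it. Because only date_dim is given, each spec is assumed to match that date under every game value.

```python
# Partition values from the question; in practice, parse SHOW PARTITIONS output.
partitions = ["2014-07-01", "2014-07-11", "2014-07-21", "2014-07-31"]
start, end = "2014-07-11", "2014-07-30"

# Zero-padded yyyy-MM-dd strings sort in date order, so a plain string
# comparison selects the right partitions here.
in_range = [p for p in partitions if start <= p <= end]

# One explicit partition spec per value instead of a range condition.
specs = ", ".join(f"PARTITION (date_dim='{p}')" for p in in_range)
stmt = f"ALTER TABLE test DROP IF EXISTS {specs};"
print(stmt)
```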

CREATE TABLE AS select * from partitioned table

I want to create a table using CTAS from a partitioned table.
The new table must have all the data, partitions, and subpartitions of the old table.
How can I do this?

Rather than writing all the partition definitions into a CTAS by hand, first create the new table with the same partitions and subpartitions. Once the table is created, you can populate it using insert into .. select.
You can use dbms_metadata.get_ddl to get the definition of the old table.
select dbms_metadata.get_ddl('TABLE', 'NAME_OF_EXISTING_TABLE')
from dual;
Save the output into a script, search and replace the table name, then run the create table statement, followed by the insert into ... select ...
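The search-and-replace step can be scripted. The sketch below is a hypothetical helper, not an Oracle API: it assumes the saved DDL quotes identifiers in upper case (for example `"SCOTT"."SALES"`), which is how DBMS_METADATA.GET_DDL usually formats them, and the sample DDL string is made up for illustration. It replaces only the exact quoted table name, so a column or constraint with the same quoted name would also be renamed; constraint and index names that must be unique in the schema still need adjusting separately.

```python
import re


def retarget_ddl(ddl, old_name, new_name):
    """Swap the quoted table name in DDL text saved from GET_DDL.

    Hypothetical helper; assumes upper-case quoted identifiers like "SALES".
    """
    pattern = re.compile('"' + re.escape(old_name.upper()) + '"')
    return pattern.sub(f'"{new_name.upper()}"', ddl)


# Made-up sample DDL, shortened for illustration.
ddl = (
    'CREATE TABLE "SCOTT"."SALES" ("ID" NUMBER, "SOLD" DATE) '
    'PARTITION BY RANGE ("SOLD") '
    "(PARTITION \"P2023\" VALUES LESS THAN (DATE '2024-01-01'))"
)
new_ddl = retarget_ddl(ddl, "sales", "sales_copy")
insert_sql = "INSERT INTO sales_copy SELECT * FROM sales"
print(new_ddl)
print(insert_sql)
```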

We Keep Coding
