How can I hide a partition column from a Hive view's output while still being able to filter on it with a WHERE clause? The view is created on a base table that is partitioned.
For example, my table DDL is: create table test(id int) partitioned by (year);
View DDL: create view myview as select id, year from test;
I don't want to see the value of year when selecting data from the view, but I should still be able to query a specific partition of the base table through myview.
Hive now supports partitioned views, which are worth exploring.
For example,
CREATE VIEW myview PARTITIONED ON (year)
AS SELECT id, year FROM test;
Refer to the link below for the rules that partition columns must follow when they are derived from the base table. The feature still seems limited, so use it only if it fits your needs.
https://cwiki.apache.org/confluence/display/Hive/PartitionedViews
This is an extension of a previous question I asked: How to compare two columns with different data type groups
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But, is it possible to have that metadata change applied to partitions (they are daily)? Otherwise, we might be stuck with current and future data being of type BIGINT while the historical is STRING.
Question: Is it possible to change partition metadata in Hive? If yes, how?
You can change partition column type using this statement:
alter table {table_name} partition column ({column_name} {column_type});
You can also re-create the table definition and change all column types using these steps:
Make your table external, so it can be dropped without dropping the data
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
Drop table (only metadata will be removed).
Create EXTERNAL table using updated DDL with types changed and with the same LOCATION.
Recover partitions:
MSCK [REPAIR] TABLE tablename;
The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is:
ALTER TABLE tablename RECOVER PARTITIONS;
This will add Hive partitions metadata. See manual here: RECOVER PARTITIONS
Finally, you can make your table MANAGED again if necessary:
ALTER TABLE tablename SET TBLPROPERTIES('EXTERNAL'='FALSE');
Note: All commands above should be run in HUE, not MySQL.
You cannot change the partition column in Hive; in fact, Hive does not support altering partitioning columns.
Refer to: altering partition column type in Hive
You can think of it this way:
- Hive stores the data by creating a folder in HDFS for each partition column value.
- Altering a Hive partition column therefore means changing the whole directory structure and data of the Hive table, which is not possible.
For example, if you have partitioned on year, this is how the directory structure looks:
tab1/clientdata/2009/file2
tab1/clientdata/2010/file3
If you want to change the partition column, you can perform the steps below.
Create another Hive table with the required changes to the partition column:
Create table new_table ( A int, B String.....)
Load the data from the previous table:
Insert into table new_table partition ( B ) select A, B from Prev_table
I have an existing date partitioned table and I want to create a new date partitioned table with only one column from the original table while keeping the original partitioning.
I have tried:
Creating an empty partitioned table and copying the results in from a query but then the partitioning is missing.
The following CREATE statement should work if I also include the partition date as a column in my new table, which I don't want:
CREATE TABLE
cat_dataset.cats_names(cat_name string)
PARTITION BY
part_date AS
SELECT
cat_name,
_PARTITIONDATE AS part_date
FROM
`myproject.cat_dataset.cats`
I want to avoid looping over all the dates and writing the data for each date to the new table. Is there a way to use the part_date column as a partition decorator when loading data from a query result?
INSERT INTO allows you to specify _PARTITIONTIME as a column (see link). The code below should work:
CREATE TABLE cat_dataset.cats_names(cat_name string)
PARTITION BY DATE(_PARTITIONTIME);
INSERT INTO cat_dataset.cats_names (_PARTITIONTIME, cat_name)
SELECT _PARTITIONTIME, cat_name
FROM `myproject.cat_dataset.cats`
Is it possible to add partitions dynamically instead of fixing them to specific static data? For example, we need to create partitions for all dates found in different CSV records.
Today, you have to create the partitions explicitly with ALTER TABLE ADD PARTITION (after creating the partitioned table). So the current suggestion is to look at all distinct dates in your data and generate the ALTER statements programmatically.
I suggest you add a request to http://aka.ms/adlfeedback for more dynamic partition generation.
You can pass dynamic data (dates are the classic example) to create partitions. A sample construct is below; does this help?
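The suggestion above (collect the distinct dates, then generate the ALTER statements) could be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the table name `MyTable`, the CSV column name `Day`, the ISO date format, and the exact shape of the generated `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement are all assumptions and should be checked against the U-SQL reference before use.

```python
import csv
import io
from datetime import date


def distinct_dates(csv_text, column="Day"):
    """Return the sorted distinct dates found in one CSV column (ISO format assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only the date part so that timestamps on the same day collapse together.
    return sorted({date.fromisoformat(row[column][:10]) for row in reader})


def add_partition_statements(table, dates):
    """Build one ALTER TABLE statement per date.

    The statement shape is an assumption modelled on U-SQL's
    ALTER TABLE ... ADD PARTITION; verify it before running it.
    """
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
        f"(new DateTime({d.year}, {d.month}, {d.day}));"
        for d in dates
    ]


# Hypothetical sample records; real input would come from the CSV files.
sample = "Day,MyValue\n2016-01-02,a\n2016-01-01,b\n2016-01-02,c\n"
statements = add_partition_statements("MyTable", distinct_dates(sample))
for stmt in statements:
    print(stmt)
```

The generated statements can be written to a script file and submitted before the job that loads the data, so that every date in the input already has a partition.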
E.g.
CREATE TABLE MyTable(Day DateTime, MyValue string, ....,
INDEX idx CLUSTERED(MyValue)
PARTITIONED BY BUCKETS(Day)
HASH(MyValue) INTO 100
);
I need to prepare a script that extends the partition range when the existing partitions will run out within the next 2-3 months. How do I find the existing table partitions, and can we alter the existing table or do we need to create a new script?
I'd appreciate a response.
How to find the existing table partition
You could generate the complete table DDL using the DBMS_METADATA package.
Or, query the user_tab_partitions view to get the table partition information.
To add new partitions, you need to use the ADD PARTITION clause:
ALTER TABLE <table_name>
ADD PARTITION <new_partition>
VALUES (<new_value>)
TABLESPACE <tablespace_name>;
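Since the question is about preparing a script, the statements for the coming months could be generated rather than written by hand. The Python sketch below assumes a table that is range-partitioned by month on a date column, in which case the clause is `VALUES LESS THAN` rather than `VALUES`. The table name `sales`, the tablespace `users`, and the `p_YYYY_MM` partition naming are hypothetical.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_partition_ddl(table, tablespace, start, months):
    """Generate ADD PARTITION statements for `months` months beginning at `start`.

    Each partition holds rows strictly below the first day of the next month.
    """
    statements = []
    current = date(start.year, start.month, 1)
    for _ in range(months):
        upper = next_month(current)
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION p_{current:%Y_%m} "
            f"VALUES LESS THAN (TO_DATE('{upper:%Y-%m-%d}', 'YYYY-MM-DD')) "
            f"TABLESPACE {tablespace};"
        )
        current = upper
    return statements


# Hypothetical table and tablespace names, for illustration only.
ddl = monthly_partition_ddl("sales", "users", date(2024, 11, 1), 3)
for stmt in ddl:
    print(stmt)
```

Oracle accepts ADD PARTITION on a range-partitioned table only when the new bound is above the current highest bound, so the start month should be taken from the highest existing partition reported by user_tab_partitions.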
I have a partitioned view with 20 tables. Each table has a partition key (usp_id) ranging from 1 to 20. If I query the partitioned view using the partition key then only the table with the correct usp_id is queried which is fine.
Now I have a second table with two fields, usp_id and insert_date. The insert_date in this table is updated daily, and the two fields have a one-to-one mapping.
I would like to be able to query my partition view based on insert_date which then would use the usp_id to query the partitioned view.
Is this possible?
Many thanks in advance!