I have a partitioned table Student that already has one partition column, dept. I need to add a new partition column, gender.
Is it possible to add this new partition column to an already partitioned Hive table?
The table data does not have a gender column. It is a new constant column to be added to the Hive table.
Partitions are hierarchical folders, like table_location/dept=Accounting/gender=male/
The folder structure must exist. You can easily add a non-partition column as the last one, and it will return NULLs if the data does not contain that column. To add a partition column, however, the easiest way is to create a new table partitioned as you want, INSERT OVERWRITE that table from the old one (selecting the partition columns last), drop the old table, and rename the new one.
See this answer about dynamic partitions load: https://stackoverflow.com/a/48901871/2700344
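To illustrate why a new partition column needs a new folder level, here is a minimal Python sketch of how an ordered partition spec maps to a folder path. The function name partition_dir is hypothetical, and the sketch only builds path strings; it does not talk to Hive or HDFS.

```python
from pathlib import PurePosixPath


def partition_dir(table_location, partition_spec):
    """Build the folder path for an ordered list of (column, value) pairs."""
    path = PurePosixPath(table_location)
    # Each partition column adds one folder level, in declaration order.
    for column, value in partition_spec:
        path = path / f"{column}={value}"
    return str(path)


# Existing layout: one level per dept.
print(partition_dir("table_location", [("dept", "Accounting")]))
# Layout after adding gender: the same data has to live one level deeper.
print(partition_dir("table_location", [("dept", "Accounting"), ("gender", "male")]))
```

Because the existing files sit directly under the dept=... folders, they are not at the depth a (dept, gender) table expects, which is why rewriting the data into a new table is usually the simpler route.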
Related
I have an existing date partitioned table and I want to create a new date partitioned table with only one column from the original table while keeping the original partitioning.
I have tried:
Creating an empty partitioned table and copying the results in from a query but then the partitioning is missing.
The following CREATE statement would work if I also included the partition date as a column in my new table, which I don't want.
CREATE TABLE
cat_dataset.cats_names(cat_name string)
PARTITION BY
part_date AS
SELECT
cat_name,
_PARTITIONDATE AS part_date
FROM
`myproject.cat_dataset.cats`
I want to avoid looping over all the dates and writing the data off that date to the new table. Is there a way to use the part_date column as a partition decorator when loading in the data from a query result ?
INSERT INTO allows you to specify _PARTITIONTIME as a column (see link). The code below should work:
CREATE TABLE cat_dataset.cats_names(cat_name string)
PARTITION BY DATE(_PARTITIONTIME);
INSERT INTO cat_dataset.cats_names (_PARTITIONTIME, cat_name)
SELECT _PARTITIONTIME, cat_name
FROM `myproject.cat_dataset.cats`
I have a Hive external table with 3 partition columns (A, B, C), and now I want to drop the B and C columns from the partitioning. Is it possible to do so?
I have tried ALTER TABLE tab_name DROP COLUMN col_name; but it throws an error stating that partition columns cannot be dropped.
To drop partition columns, the table has to be recreated. The steps are:
Drop the table. Dropping an external table does not delete the data files.
Reorganize the data folders to reflect the new partition structure. Partitions are folders at the physical level, organized hierarchically. If you remove an upper-level partition, move all of its sub-folders up one level, and so on. If you remove the two upper partition columns and only one is left, there should be only one level of sub-folders under the table location.
Create the table with the new partitioning schema on top of the old location.
Run MSCK REPAIR TABLE. It creates partition metadata for all partition folders it finds.
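The folder reorganization step above can be sketched in Python for the case from the question (keep A, drop B and C). This is only an illustration under stated assumptions: it works on a local directory standing in for the table location, the function name flatten_partitions is hypothetical, and on a real cluster the equivalent moves would be done with HDFS tooling instead.

```python
import shutil
from pathlib import Path


def flatten_partitions(table_root: Path, keep_levels: int = 1) -> int:
    """Move data files up so only the first keep_levels folder levels remain."""
    moved = 0
    # Collect files first so the tree is not modified while it is being walked.
    files = [p for p in table_root.rglob("*") if p.is_file()]
    for f in files:
        parts = f.relative_to(table_root).parts
        dirs, name = parts[:-1], parts[-1]
        if len(dirs) <= keep_levels:
            continue  # already at the target depth
        target_dir = table_root.joinpath(*dirs[:keep_levels])
        # Prefix with the dropped folder names so files from sibling
        # sub-partitions do not overwrite each other.
        dropped = "_".join(dirs[keep_levels:])
        shutil.move(str(f), str(target_dir / f"{dropped}_{name}"))
        moved += 1
    # Remove the now-empty sub-folders, deepest first.
    subdirs = [p for p in table_root.rglob("*") if p.is_dir()]
    for d in sorted(subdirs, key=lambda p: len(p.parts), reverse=True):
        depth = len(d.relative_to(table_root).parts)
        if depth > keep_levels and not any(d.iterdir()):
            d.rmdir()
    return moved
```

Hive normally takes partition values from the folder names rather than from the data files, so after such a move the B and C values would no longer be queryable unless they are also stored as regular columns.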
If these steps seem too complex or difficult, simply create a new table and load the data:
Create a new table with the new partitioning schema.
Load the data into the new table.
Drop the old table and rename the new one.
Like this:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table new_table partition(A)
select -- list the remaining columns here, with partition column A last
from old_table;
Finally, after dropping the old table, you can rename the new one using ALTER TABLE table_name RENAME TO new_table_name.
Hi, my table has 150 columns and I don't want to list the column names while partitioning. Is there any way to use a temp table to create a partitioned table from a non-partitioned one?
Temporary tables in Hive don't support partitioning.
You cannot create a permanent partitioned table without specifying the partition column.
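If typing 150 column names by hand is the real obstacle, one workaround is to generate the statement. Below is a minimal Python sketch, assuming the column names have already been copied from somewhere such as DESCRIBE output; the function name build_partition_insert and the sample column names are hypothetical. It only builds the HiveQL text, with the partition column moved to the end of the SELECT list, and the target table still has to be created with its partition column declared.

```python
def build_partition_insert(source, target, columns, partition_col):
    """Return a HiveQL dynamic-partition INSERT with partition_col selected last."""
    if partition_col not in columns:
        raise ValueError(f"{partition_col!r} is not a column of {source}")
    # Dynamic partitioning maps the trailing SELECT column to the partition,
    # so the other columns keep their order and the partition column goes last.
    ordered = [c for c in columns if c != partition_col] + [partition_col]
    return (
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_col})\n"
        f"SELECT {', '.join(ordered)}\n"
        f"FROM {source}"
    )


# Hypothetical three-column table; a real call would pass all 150 names.
print(build_partition_insert("old_table", "new_table", ["id", "dept", "name"], "dept"))
```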
I am not sure about altering a table to create a new partition as I am afraid I will lose data. If a table in an Oracle SQL DB is already partitioned but I am adding a new partition, will the existing data in the table be deleted?
No, you don't lose the data.
You can create a list partition and extend it by splitting the default partition.
For example, if your partition key is a date:
alter table your_table split partition PDEFAULT values(TO_DATE('20161206','yyyymmdd')) into ( partition P20161206,partition PDEFAULT)
The only ALTER TABLE partitioning commands that can destroy data are DROP and TRUNCATE.
The EXCHANGE partition command can move data from a table partition to a different table, and vice-versa.
ADD, MOVE, COALESCE, RENAME, SPLIT, and MERGE do not change the table's data, although COALESCE, SPLIT, and MERGE can change the partition or subpartition in which data is stored.
How can I apply partitioning to a Hive table that is already partitioned? I am not able to get the partitioned data into the folder after the data is loaded.
The first rule of partitioning in Hive is that the partition column should be the last column in the data. Suppose the data is already partitioned on gender (M/F): two directories, gender=M and gender=F, are created, each holding the rows for that gender, with gender again as the last column.
If you want to partition the data of an already partitioned table again, use INSERT INTO ... SELECT and make sure the last column you select is the new partition column.
Did you add a partition manually with an HDFS command? In that case the metastore will not keep track of the added partitions unless you run ALTER TABLE ... ADD PARTITION.
Try this:
MSCK REPAIR TABLE table_name;
If that is not the case, try dropping the partitions and creating them again with ALTER TABLE. Note that you will lose the data. Also, if you are doing a dynamic partition insert, the partition column value must be the last column.