Split a Hive partition to create multiple partitions - hive

I have an external Hive table which is partitioned on load_date (DD-MM-YYYY). However, the very first partition, let's say 01-01-2000, holds all the data from 1980 to 2000. How can I further partition that older data by year while keeping the existing data (load dates after 01-01-2000) available?

First, load the data of the '01-01-2000' partition into a separate table that uses dynamic partitioning on a year derived from the data. This might solve your problem.
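A minimal Python sketch of that answer, modelling partitions as plain dicts instead of talking to Hive. It assumes each row has a hypothetical `event_date` column in DD-MM-YYYY format from which the year can be taken (the question does not name one). The rows of the catch-all `01-01-2000` load_date partition are regrouped into `year=YYYY` partitions of a separate table, as a dynamic partition insert would do, and all later load_date partitions are returned untouched.

```python
from collections import defaultdict

LEGACY_LOAD_DATE = "01-01-2000"  # the catch-all partition from the question


def split_legacy_by_year(partitions, legacy=LEGACY_LOAD_DATE):
    """Return (kept, by_year).

    partitions: dict mapping a load_date string to a list of row dicts.
    kept: every partition except the legacy one, unchanged.
    by_year: the legacy rows regrouped into "year=YYYY" partitions, using the
    hypothetical event_date column (DD-MM-YYYY) to derive the year.
    """
    kept = {d: rows for d, rows in partitions.items() if d != legacy}
    by_year = defaultdict(list)
    for row in partitions.get(legacy, []):
        year = row["event_date"][-4:]  # DD-MM-YYYY -> YYYY
        by_year[f"year={year}"].append(row)
    return kept, dict(by_year)


if __name__ == "__main__":
    data = {
        "01-01-2000": [
            {"id": 1, "event_date": "15-06-1985"},
            {"id": 2, "event_date": "03-02-1999"},
            {"id": 3, "event_date": "20-11-1985"},
        ],
        "02-01-2000": [{"id": 4, "event_date": "02-01-2000"}],
    }
    kept, by_year = split_legacy_by_year(data)
    print(sorted(by_year))  # ['year=1985', 'year=1999']
    print(sorted(kept))     # ['02-01-2000']
```

In Hive itself the regrouping would be an INSERT OVERWRITE TABLE <new_table> PARTITION (year) SELECT ..., <year expression> FROM <old_table> WHERE load_date = '01-01-2000', with the year expression as the last selected column and dynamic partitioning enabled (hive.exec.dynamic.partition=true, hive.exec.dynamic.partition.mode=nonstrict). The angle-bracket names are placeholders.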

Related

Drop part of a partition in Hive SQL

I have an external Hive table partitioned on a date column. Each date holds data for multiple A/B test experiments.
I need to create a job that drops experiments which ended more than 6 months ago.
Dropping data from an external partitioned Hive table drops the entire partition, which in this case is the data for a whole date. Is there a way to drop only part of a date's data?
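One common workaround, sketched here as a Python model rather than Hive code: as far as I know, Hive only supports row-level DELETE on transactional (ACID) managed tables, so for an external table you rewrite each affected date partition with just the rows you want to keep, i.e. INSERT OVERWRITE TABLE ... PARTITION (<date column>='<date>') SELECT ... WHERE <experiment still inside the retention window>. The `experiment_id` column, the end-date lookup and the `cutoff` parameter are hypothetical; computing "6 months ago" is left to the caller.

```python
from datetime import date


def rewrite_partition(rows, experiment_end, cutoff):
    """Return the rows to write back into one date partition.

    rows: row dicts for a single date partition, each with a hypothetical
          "experiment_id" key.
    experiment_end: dict experiment_id -> end date, or None if still running.
    cutoff: experiments that ended before this date are dropped.
    Experiments missing from experiment_end are kept.
    """
    def keep(row):
        end = experiment_end.get(row["experiment_id"])
        return end is None or end >= cutoff

    return [row for row in rows if keep(row)]


if __name__ == "__main__":
    partition = [
        {"experiment_id": "exp_a", "value": 1},
        {"experiment_id": "exp_b", "value": 2},
    ]
    ends = {"exp_a": date(2023, 1, 31), "exp_b": None}
    kept = rewrite_partition(partition, ends, cutoff=date(2023, 7, 1))
    print([r["experiment_id"] for r in kept])  # ['exp_b']
```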

Updating a specific partition in BigQuery

I have a couple of years of data in a BigQuery table partitioned by day. I want to replace the data for just the last 30 days; however, when I use CREATE OR REPLACE TABLE in BigQuery, it replaces the entire table with only the new dates' partitions. Is there any way to update only those partitions without losing the previous data?
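A small Python sketch of one approach: BigQuery lets a job address a single day of a day-partitioned table through a partition decorator, `dataset.table$YYYYMMDD`, and a write to that target with the WRITE_TRUNCATE disposition replaces only that partition. The helper below just builds the decorators for the `days` days ending at a given date; `mydataset.events` is a placeholder, and how you submit each write (bq CLI, client library, or console) is left out.

```python
from datetime import date, timedelta


def partition_decorators(table, end, days):
    """Return 'table$YYYYMMDD' targets for the `days` days ending at `end`.

    table: placeholder "dataset.table" name of a day-partitioned table.
    Each decorator addresses a single daily partition, so a job writing to it
    with WRITE_TRUNCATE replaces that day only. Oldest day comes first.
    """
    return [
        f"{table}${(end - timedelta(days=offset)):%Y%m%d}"
        for offset in range(days - 1, -1, -1)
    ]


if __name__ == "__main__":
    decs = partition_decorators("mydataset.events", date(2024, 3, 30), 30)
    print(decs[0], decs[-1], len(decs))
    # mydataset.events$20240301 mydataset.events$20240330 30
```

A DML alternative is a MERGE, or a DELETE followed by an INSERT, both filtered on the partitioning column; either one leaves the older partitions alone.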

Scheduled Query: Partition Table Hourly

I have a scheduled query that runs hourly, and I want the table partitioned hourly, so I set the destination to mytable_{run_time|"%Y%m%d%H"}. However, this creates a new table in my BigQuery dataset on every run. When I change the destination to mytable_{run_time|"%Y%m%d"}, the data is partitioned correctly by date.
How do I enable hourly partitioning in BigQuery?
What you are doing is table sharding, which you can do, but it is less performant and involves more management. In theory a set of sharded tables acts similarly to a partitioned table, but it is not the same. What you are likely seeing with the format mytable_{run_time|"%Y%m%d"} is that you're inserting multiple hours into the same daily table, and depending on your table definition, the rows may be partitioned within that single day.
Instead, define the partitioning when you create the table; see:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables#create_a_time-unit_column-partitioned_table
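To make the difference concrete, a Python sketch contrasting the two naming schemes. With a table created as `PARTITION BY TIMESTAMP_TRUNC(event_ts, HOUR)` (the column `event_ts` and the table `mydataset.mytable` are assumptions), the scheduled query would write to one table and each row would land in an hourly partition whose ID has the form YYYYMMDDHH; the templated `mytable_{run_time|"%Y%m%d%H"}` destination instead produces a new table name every hour.

```python
from datetime import datetime, timezone

# Hypothetical DDL for one hourly-partitioned table (names assumed):
#   CREATE TABLE mydataset.mytable (event_ts TIMESTAMP, payload STRING)
#   PARTITION BY TIMESTAMP_TRUNC(event_ts, HOUR);


def hourly_partition_id(ts):
    """Return the YYYYMMDDHH ID of the hourly partition `ts` falls into.

    `ts` must be timezone-aware; it is converted to UTC first because
    TIMESTAMP_TRUNC(..., HOUR) truncates in UTC by default. Formatting with
    %H drops minutes and seconds, which is the truncation.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H")


def shard_table_name(base, ts):
    """Name produced by the question's template: a separate table per hour."""
    return f"{base}_{hourly_partition_id(ts)}"


if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 14, 37, tzinfo=timezone.utc)
    print(hourly_partition_id(ts))          # 2024030514
    print(shard_table_name("mytable", ts))  # mytable_2024030514
```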

Keeping older data in a partitioned Hive table

Keep history data in a partitioned table
Team,
I have the following scenario: two tables, one non-partitioned and one partitioned on a date field.
I loaded the data from the non-partitioned table into the partitioned table, using the following writer settings for the partitioned table:
# df (name assumed): the DataFrame read from the non-partitioned table
df.write.partitionBy("date") \
    .format("orc") \
    .mode("overwrite") \
    .saveAsTable("schema.table1")
Now the row counts of both tables match; they hold 3 years of data, as expected.
I then refreshed only the latest year of data and loaded it into the partitioned table, but the table now contains only that 1 year, whereas I need all 3 years in the partitioned table.
What am I missing here? I need to refresh only 1 year of data, load it into the partitioned table, and keep building history.
Kindly suggest. Thanks
I need to keep the history while refreshing the latest data on a daily basis.
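A Python model of why the history disappears, under the assumption that the question's `mode("overwrite").saveAsTable(...)` replaces the whole table, which matches Spark's default "static" overwrite. In a "dynamic" overwrite only the partitions present in the incoming data are replaced, so a one-year refresh leaves the older years in place. Tables here are plain dicts keyed by the `date` partition value.

```python
def overwrite(table, new_rows, key, mode="static"):
    """Model an overwrite write into a table partitioned on `key`.

    table: dict partition_value -> list of rows (the existing table).
    new_rows: list of row dicts being written.
    mode="static": the whole table is replaced, so older partitions vanish.
    mode="dynamic": only partitions present in new_rows are replaced;
    all other partitions (the history) are kept.
    """
    incoming = {}
    for row in new_rows:
        incoming.setdefault(row[key], []).append(row)
    if mode == "static":
        return incoming
    if mode == "dynamic":
        result = dict(table)
        result.update(incoming)
        return result
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    history = {
        "2021-01-01": [{"date": "2021-01-01", "v": 1}],
        "2022-01-01": [{"date": "2022-01-01", "v": 2}],
        "2023-01-01": [{"date": "2023-01-01", "v": 3}],
    }
    refresh = [{"date": "2023-01-01", "v": 30}]
    print(sorted(overwrite(history, refresh, "date", "static")))
    # ['2023-01-01']
    print(sorted(overwrite(history, refresh, "date", "dynamic")))
    # ['2021-01-01', '2022-01-01', '2023-01-01']
```

In PySpark, the usual way to get the dynamic behaviour is `spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")` followed by `df.write.mode("overwrite").insertInto("schema.table1")` instead of `saveAsTable`. `insertInto` matches columns by position, so the `date` column must be last in `df`. Verify this against your Spark version before relying on it.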

How to apply a partition to a Hive table which is already partitioned

How do I apply a partition to a Hive table which is already partitioned? After the data is loaded, I am not able to get the partitioned data into the partition folder.
The first rule of partitioning in Hive is that the partitioning column should be the last column in the data. Say the data is already partitioned on gender (M/F): there will be two directories, gender=M and gender=F, each holding the data for that gender, and when the table is read, gender again appears as the last column.
If you want to partition the data of an already partitioned table again, use INSERT INTO ... SELECT and make sure the last column you select is the column you want to partition on.
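A Python sketch of why the partition column has to come last: in a dynamic partition insert, Hive takes the trailing SELECT columns, by position and not by name, as the partition values, builds `column=value` directories from them, and stores only the remaining columns in the data files. The row layout and column names are illustrative.

```python
def dynamic_partition_insert(rows, partition_cols):
    """Group SELECT output rows into Hive-style partition directories.

    rows: tuples in SELECT-list order. The last len(partition_cols) values
    are taken by position as the partition values; only the remaining
    values are kept as file data.
    Returns dict "col=value[/col=value...]" -> list of data tuples.
    """
    n = len(partition_cols)
    if n == 0:
        raise ValueError("need at least one partition column")
    out = {}
    for row in rows:
        data, parts = row[:-n], row[-n:]
        path = "/".join(f"{c}={v}" for c, v in zip(partition_cols, parts))
        out.setdefault(path, []).append(data)
    return out


if __name__ == "__main__":
    # SELECT name, age, gender ...: gender is last, so it drives the partition
    rows = [("alice", 30, "F"), ("bob", 25, "M"), ("carol", 41, "F")]
    for path, data in sorted(dynamic_partition_insert(rows, ["gender"]).items()):
        print(path, data)
    # gender=F [('alice', 30), ('carol', 41)]
    # gender=M [('bob', 25)]
```

If the SELECT were written as `gender, name, age` instead, the model, like Hive, would create directories such as `gender=30` from the age values.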
Did you add a partition manually with an HDFS command? In that case, the metastore will not keep track of the added partitions unless you run ALTER TABLE ... ADD PARTITION.
Try this:
MSCK REPAIR TABLE table_name;
If that is not the case, try dropping the partitions and creating them again with ALTER TABLE, but you will lose the data. Also, if you are doing a dynamic partition insert, the partitioning column must be the last column in the SELECT.
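Roughly what the MSCK REPAIR TABLE suggestion does, modelled in Python: compare the partition directories under the table location with the partitions the metastore already knows about, and register the missing ones. This sketch takes both lists as plain inputs (it does not touch HDFS or the metastore), handles single-level `key=value` partitions only, and emits the equivalent ALTER TABLE ... ADD PARTITION statements; the table and directory names are hypothetical.

```python
import re

# A single-level partition directory looks like "key=value".
PART_DIR = re.compile(r"^[^=/]+=[^/]+$")


def missing_partitions(table, hdfs_dirs, metastore_parts):
    """Return ALTER TABLE ... ADD PARTITION statements for unregistered dirs.

    hdfs_dirs: directory names found under the table location, e.g. "gender=M".
    metastore_parts: partition names the metastore already knows about.
    Directories that are not key=value are skipped in this sketch.
    """
    stmts = []
    for d in sorted(set(hdfs_dirs) - set(metastore_parts)):
        if not PART_DIR.match(d):
            continue
        key, value = d.split("=", 1)
        stmts.append(f"ALTER TABLE {table} ADD PARTITION ({key}='{value}');")
    return stmts


if __name__ == "__main__":
    on_hdfs = ["gender=F", "gender=M", "_tmp"]
    in_metastore = ["gender=F"]
    for s in missing_partitions("people", on_hdfs, in_metastore):
        print(s)
    # ALTER TABLE people ADD PARTITION (gender='M');
```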