I have the below scenario.
Suppose I have a table with 3 partitions: one for 20190201, the next for 20190202, and one for 20190210.
I have been given a requirement: whichever date we pass, a partition should be created for it automatically.
Using dynamic SQL I am able to create a partition after the max partition, e.g. 20190211, but if I try to create a partition for 20190205 it throws an error.
Is there any way to create the partition at run time, without data loss, even when a higher (max) partition already exists?
We have been told not to use interval partitioning.
This is very simple.
While creating the table itself, use interval partitioning on the date column.
You can choose the partition interval as hour/day/month, whichever you like.
Then any time you insert new data into the table, based on the date value the row will go to the correct partition, or a new partition will be created.
Use the below syntax while creating your table:
partition by range ( date_col )
interval ( NUMTODSINTERVAL(1,'day') )
( partition p1 values less than ( date '2016-01-01' ))
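Putting that together, a minimal sketch of the full DDL (the table and column names here are placeholders, and the daily interval is just one choice):

```sql
-- Hypothetical table; adjust the columns to your schema.
create table sales_data (
  sale_id   number,
  date_col  date
)
partition by range (date_col)
interval ( NUMTODSINTERVAL(1,'day') )  -- one new partition per day, created on demand
(
  -- one initial partition is mandatory; it defines the lowest boundary
  partition p1 values less than ( date '2016-01-01' )
);
```

Inserting a row with any date on or after 2016-01-01 then creates that day's partition automatically; no dynamic SQL is needed.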
My data is partitioned by day in the standard Hive format:
/year=2020/month=10/day=01
/year=2020/month=10/day=02
/year=2020/month=10/day=03
/year=2020/month=10/day=04
...
I want to query all data from the last 60 days using Amazon Athena (i.e. Presto). I want this query to use the partition columns (year, month, day) so that only the necessary partition files are scanned. Assuming I can't change the file partition layout, what is the best approach to this problem?
You don't have to use year, month, day as the partition keys for the table. You can have a single partition key called date and add the partitions like this:
ALTER TABLE the_table ADD
PARTITION (`date` = '2020-10-01') LOCATION 's3://the-bucket/data/year=2020/month=10/day=01'
PARTITION (`date` = '2020-10-02') LOCATION 's3://the-bucket/data/year=2020/month=10/day=02'
...
With this setup you can even set the type of the partition key to date:
PARTITIONED BY (`date` date)
Now you have a table with a date column typed as a DATE, and you can use any of the date and time functions to do calculations on it.
What you won't be able to do with this setup is use MSCK REPAIR TABLE to load partitions, but you really shouldn't do that anyway – it's extremely slow and inefficient and really something you only do when you have a couple of partitions to load into a new table.
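A minimal end-to-end sketch of this setup, assuming the table name and bucket from above and illustrative columns and file format:

```sql
-- Hypothetical columns and storage format; keep the PARTITIONED BY clause as shown.
CREATE EXTERNAL TABLE the_table (
  id    string,
  value double
)
PARTITIONED BY (`date` date)
STORED AS PARQUET
LOCATION 's3://the-bucket/data/';

ALTER TABLE the_table ADD
  PARTITION (`date` = '2020-10-01') LOCATION 's3://the-bucket/data/year=2020/month=10/day=01'
  PARTITION (`date` = '2020-10-02') LOCATION 's3://the-bucket/data/year=2020/month=10/day=02';

-- The 60-day window then becomes a plain date comparison,
-- and Athena prunes to just those partitions:
SELECT *
FROM the_table
WHERE `date` >= date_add('day', -60, current_date);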
An alternative to the approach proposed by Theo is to use the following syntax, e.g.:
select ... from my_table where year||month||day between '20200630' and '20201010'
This works when the year, month and day columns are zero-padded strings (as in the layout above), because the concatenated values then compare correctly as fixed-width strings. It's particularly useful for queries that span months.
Creating an Oracle partition for a table for every day.
ALTER TABLE TAB_123 ADD PARTITION PART_9999 VALUES LESS THAN ('0001') TABLESPACE TS_1
Here I am getting an error because the bound '0001' is lower than the existing maximum partition boundary.
You can have Oracle automatically create partitions by using the PARTITION BY RANGE ... INTERVAL option.
Sample DDL, assuming that the partition key is column my_date_column:
create table TAB_123
( ... )
partition by range(my_date_column) interval(NUMTODSINTERVAL(1,'day'))
( partition p_first values less than (to_date('2010-01-01', 'yyyy-mm-dd')) tablespace ts_1)
;
With this set up in place, Oracle will, if needed, create a partition on the fly when you insert data into the table. Note that you still need to define one initial partition for the lowest boundary, as shown above; if you want monthly or yearly intervals, use NUMTOYMINTERVAL instead of NUMTODSINTERVAL.
This naming convention (last digit of year plus day number) won't support holding more than ten years' worth of data. Maybe you think that doesn't matter, but I know databases which are well into their second decade. Be optimistic!
Also, that key is pretty much useless for querying. Most queries against partitioned tables want to benefit from partition elimination, but that only works if the query uses the same value as the partition key. Developers really won't want to cast a date to YDDD format every time they write a select against the table.
So: use an actual date for defining the partition key and hence the range, and also for naming the partition, if that matters so much.
ALTER TABLE TAB_123
ADD PARTITION P20200101 VALUES LESS THAN (date '2020-01-02') TABLESPACE TS_1
/
Note that the range is defined as less than the next day; otherwise the date in the partition name won't align with the dates of the records actually stored in that partition.
I need to select rows from a partitioned table and save the result into another table. How can I keep the records' _PARTITIONTIME the same as in the source table? I mean, not only keep the value of _PARTITIONTIME, but the whole partitioning feature, so that I can run further queries on the target table using partition decorators and the like.
(I'm using Datalab notebooks)
%%sql -d standard --module TripData
SELECT
HardwareId,
TripId,
StartTime,
StopTime
FROM
`myproject.mydataset.TripData`
WHERE
_PARTITIONTIME BETWEEN TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 * 24 HOUR),DAY)
AND TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(),DAY)
You cannot do this for multiple partitions at once!
You should do it one partition at a time, specifying the target partition with a decorator: targetTable$yyyymmdd.
Note: first you need to create the target table as a partitioned table with the respective schema.
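For example, with the bq CLI you can write each day's rows into the matching partition of the (pre-created) target table; the target table name below is a placeholder:

```sql
-- Run one day at a time, e.g. with the bq CLI:
--   bq query --use_legacy_sql=false \
--     --destination_table 'mydataset.TripDataCopy$20201001' '<this query>'
-- The $YYYYMMDD decorator targets a single day partition of the
-- hypothetical target table TripDataCopy.
SELECT
  HardwareId,
  TripId,
  StartTime,
  StopTime
FROM
  `myproject.mydataset.TripData`
WHERE
  _PARTITIONTIME = TIMESTAMP('2020-10-01')
```

Repeating this per day keeps each row in the same day partition it occupied in the source table.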
In Teradata you can do something like:
DROP RANGE BETWEEN DATE FROM_DATE AND DATE TO_DATE, EACH INTERVAL '1' DAY;
Is there an equivalent way of doing this in Oracle?
In Oracle, you can use the PARTITION FOR clause.
For example:
alter table table_name drop partition for (TO_DATE('some date','date format'))
One important thing to keep in mind: you cannot drop the last remaining partition in the range section (i.e. the first partition of an interval-partitioned table). It would throw an error:
ORA-14758: Last partition in the range section cannot be dropped
Since you have not mentioned your exact Oracle version, I am sharing the 11gR2 documentation about dropping partitions.
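To approximate Teradata's DROP RANGE ... EACH INTERVAL '1' DAY, you can loop over the days in PL/SQL and issue one DROP PARTITION FOR per day. A sketch, assuming daily partitions on table_name and hypothetical boundary dates:

```sql
-- Hypothetical date range; adjust the table name and dates to your case.
begin
  for d in 0 .. (date '2019-02-10' - date '2019-02-01') loop
    execute immediate
      'alter table table_name drop partition for (date ''' ||
      to_char(date '2019-02-01' + d, 'yyyy-mm-dd') || ''')';
  end loop;
end;
/
```

Days for which no partition was ever created will raise an error (e.g. ORA-02149), so add exception handling inside the loop if your partitions are sparse.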
Our system has many tables that require partitioning to support data maintenance. Let's talk about one table to simplify the question. If the data in a table hits 100 GB, the OLTP system starts to slow down, so we recommend that customers move the data from the OLTP system to the OLAP system. We use partitioning by year or month (based on data insertion rates) to facilitate this move.
Here is a sample of a table definition:
create table myPartionedTable
(
object_id number ,
object_type varchar2(18),
RETIREDTIMESTAMP timestamp
)
partition by range (RETIREDTIMESTAMP)
(
partition WM_2010 values less than(TO_DATE('01/01/2011','MM/DD/YYYY')),
partition WM_2011 values less than(TO_DATE('01/01/2012','MM/DD/YYYY')),
partition WM_2012 values less than(TO_DATE('01/01/2013','MM/DD/YYYY')),
partition WM_2013 values less than(TO_DATE('01/01/2014','MM/DD/YYYY')),
partition WM_2014 values less than(TO_DATE('01/01/2015','MM/DD/YYYY')),
partition WM_ACTIVE values less than(MAXVALUE)
)
tablespace MYDATE;
The important point is that data needs to be retained in the WM_ACTIVE partition until it is deemed RETIRED. Once retired, the data moves to the appropriate partition and is then eligible for a partition move out of OLTP and into OLAP.
Is this a good approach? Is there a better approach for managing this list of requirements?
Interval partitioning may help; Oracle can automatically create partitions as needed.
There may be no need to worry about active vs. inactive, since it's easy to use the same small range (here, one month) for all partitions.
create table myPartionedTable
(
object_id number ,
object_type varchar2(18),
RETIREDTIMESTAMP timestamp
)
partition by range (RETIREDTIMESTAMP) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
(
--You still need to specify the lowest possible partition.
partition WM_2010 values less than(date '2011-01-01')
);
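If rows start out "active" and are later updated with a retired timestamp, enabling row movement lets Oracle relocate them into the right monthly partition (created on demand by the interval clause). A sketch, with a hypothetical predicate:

```sql
-- Without row movement, updating the partition key of an existing row
-- fails with ORA-14402.
alter table myPartionedTable enable row movement;

-- Marking a row as retired now moves it into the matching monthly
-- partition, which interval partitioning creates automatically.
update myPartionedTable
   set RETIREDTIMESTAMP = systimestamp
 where object_id = 123;  -- hypothetical selection of rows to retire
```

Older monthly partitions can then be exported or moved to the OLAP side and dropped from OLTP without touching active data.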