Creating an Oracle partition for a table for each day.
ALTER TABLE TAB_123 ADD PARTITION PART_9999 VALUES LESS THAN ('0001') TABLESPACE TS_1
Here I get an error, because the new boundary value '0001' is lower than the existing partition's upper boundary.
You can have Oracle create partitions automatically by using the PARTITION BY RANGE option together with an INTERVAL clause.
Sample DDL, assuming that the partition key is column my_date_column:
create table TAB_123
( ... )
partition by range(my_date_column) interval(NUMTODSINTERVAL(1,'day'))
( partition p_first values less than (to_date('2010-01-01', 'yyyy-mm-dd')) tablespace ts_1)
;
With this set-up in place, Oracle will, if needed, create a partition on the fly when you insert data into the table. Note that you still have to define one initial partition to anchor the range, as shown above.
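For instance (a minimal sketch reusing the hypothetical TAB_123 and my_date_column from above), inserting a row beyond the anchor partition makes Oracle create the daily partition on the fly, and the new partition then shows up in the data dictionary:
INSERT INTO TAB_123 (my_date_column) VALUES (DATE '2010-01-05');
-- the auto-created partition appears with a system-generated name (SYS_Pnnn)
SELECT partition_name, high_value
FROM user_tab_partitions
WHERE table_name = 'TAB_123';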
This naming convention (last digit of the year plus day number) won't support holding more than ten years' worth of data. Maybe you think that doesn't matter, but I know databases which are well into their second decade. Be optimistic!
Also, that key is pretty much useless for querying. Most queries against partitioned tables want to get the benefit of partition elimination, but that only works if the query filters on the same value as the partition key. Developers really won't want to cast a date to YDDD format every time they write a select on the table.
So: use an actual date for defining the partition key and hence the range, and also for naming the partition, if that matters so much.
ALTER TABLE TAB_123
ADD PARTITION P20200101 VALUES LESS THAN (date '2020-01-02') TABLESPACE TS_1
/
Note that the range is defined as "less than the next day"; otherwise the date in the partition name won't align with the dates of the records actually stored in the partition.
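With a DATE-typed partition key, queries can filter on the column directly and still get partition elimination; a minimal sketch, assuming the key column is my_date_column:
SELECT *
FROM TAB_123
WHERE my_date_column >= DATE '2020-01-01'
  AND my_date_column <  DATE '2020-01-02';
-- the optimizer can prune to partition P20200101 (check Pstart/Pstop in the plan)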
This may sound crazy, but I want to implement something like a partitioned view.
Background:
I had a table with a date partition on a column, and the table is really huge. We run data ingestion into this table every 2 minutes. All the data loads are append-only, and every load inserts 10k+ rows. After some time, we ran into the partition limitation issue.
message: "Quota exceeded: Your table exceeded quota for Number of partition modifications to a column partitioned table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
Root cause (from the GCP support team):
The root cause under the hood was that your partitioned tables have a pretty granular partition, for instance by minute, hour, or date. When the loaded data covers a wide range of partition periods, the number of partitions modified will be high, above 4,000. Per internal documentation, users who run into this issue are advised to consider a less granular partition, for instance changing a date/hour/minute-based partitioned table to a week-based partitioned table. Alternatively, split the load into multiple loads, limiting the data range so that fewer partitions are affected. This is the best recommendation I can offer for now.
So I'm planning to keep this table un-partitioned and create a view on top of it (we need a view for eliminating the duplicates), and the view should be partitioned. Is this possible? Or is there any other alternative solution for this?
You can't partition a view; it isn't physically materialized. Partitioning by day can be limiting given the 4,000 limit. Would year work? Then you can use an integer partition:
create or replace table BI.test
PARTITION BY RANGE_BUCKET(Year, GENERATE_ARRAY(2000, 3000, 1)) as
select 2000 as Year, 1 as value
union all
select 2001 as Year, 1 as value
union all
select 2002 as Year, 1 as value
Alternatively, I've integer-partitioned by month (YYYYMM) or week (YYYYWW), which gets you around 40 years:
RANGE_BUCKET(monthasintegerfield, GENERATE_ARRAY(201612, 205712, 1))
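For example, a minimal sketch (table and column names hypothetical) that derives the YYYYMM integer from an event timestamp at load time:
create or replace table BI.test_monthly
PARTITION BY RANGE_BUCKET(month_key, GENERATE_ARRAY(201612, 205712, 1)) as
select
  -- derive the integer partition key, e.g. 202010, from the timestamp
  CAST(FORMAT_TIMESTAMP('%Y%m', TIMESTAMP '2020-10-01 00:00:00') AS INT64) as month_key,
  1 as value;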
My data is partitioned by day in the standard Hive format:
/year=2020/month=10/day=01
/year=2020/month=10/day=02
/year=2020/month=10/day=03
/year=2020/month=10/day=04
...
I want to query all data from the last 60 days using Amazon Athena (i.e. Presto). I want this query to use the partition columns (year, month, day) so that only the necessary partition files are scanned. Assuming I can't change the file partitioning scheme, what is the best approach to this problem?
You don't have to use year, month, day as the partition keys for the table. You can have a single partition key called date and add the partitions like this:
ALTER TABLE the_table ADD
PARTITION (`date` = '2020-10-01') LOCATION 's3://the-bucket/data/year=2020/month=10/day=01'
PARTITION (`date` = '2020-10-02') LOCATION 's3://the-bucket/data/year=2020/month=10/day=02'
...
With this setup you can even set the type of the partition key to date:
PARTITIONED BY (`date` date)
Now you have a table with a date column typed as a DATE, and you can use any of the date and time functions to do calculations on it.
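For the last-60-days requirement, a sketch of a query against this layout (the date keyword must be double-quoted in Athena queries):
SELECT *
FROM the_table
WHERE "date" >= date_add('day', -60, current_date);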
What you won't be able to do with this setup is use MSCK REPAIR TABLE to load partitions, but you really shouldn't do that anyway – it's extremely slow and inefficient and really something you only do when you have a couple of partitions to load into a new table.
An alternative to the approach proposed by Theo is to use the following syntax, e.g.:
select ... from my_table where year||month||day between '20200630' and '20201010'
This works when the year, month, and day columns are zero-padded strings. It's particularly useful for querying across months.
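For the last-60-days case you could also compute the lower bound instead of hard-coding it; a sketch under the same zero-padded-string assumption:
select count(*)
from my_table
where year || month || day >= date_format(date_add('day', -60, now()), '%Y%m%d');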
I have a table partitioned by:
HASH(timestamp DIV 43200 )
When I perform this query
SELECT max(id)
FROM messages
WHERE timestamp BETWEEN 1581708508 AND 1581708807
it scans all partitions, even though both 1581708508 and 1581708807 (and all the numbers between them) fall in the same partition. How can I make it scan only that partition?
You have discovered one of the reasons why PARTITION BY HASH is useless.
In your situation, the Optimizer sees the range (BETWEEN) and says "punt, I'll just scan all the partitions".
That is, "partition pruning" does not work when the WHERE clause involves a range and you are using PARTITION BY HASH. PARTITION BY RANGE, on the other hand, may be able to prune. But... What's the advantage? It does not make the query any faster.
I have found only four uses for partitioning: http://mysql.rjweb.org/doc.php/partitionmaint . It sounds like your application does not fit any of those cases.
That particular query would best be done without partitioning. Instead have a non-partitioned table with this 'composite' index:
INDEX(timestamp, id)
It must scan all the rows to discover the MAX(id), but with this index, it is:
- scanning only the 2-column index, and
- not touching any rows outside the timestamp range.
Hence it will be as fast as possible. Even if PARTITION BY HASH were smart enough to do the desired pruning, it would not run any faster.
In particular, when you ask for a range on the partition key, such as with WHERE timestamp BETWEEN 1581708508 AND 1581708807, the execution looks in all partitions for the desired rows. This is one of the major failings of hash partitioning. Even if it could realize that only one partition is needed, it would be no faster than simply using the index I suggest.
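A minimal sketch of that non-partitioned alternative (reusing the table from the question), plus a quick check that the index is used:
CREATE TABLE messages (
  id INT,
  timestamp INT,
  INDEX idx_ts_id (timestamp, id)  -- covering index for the MAX(id) range query
);
EXPLAIN SELECT MAX(id)
FROM messages
WHERE timestamp BETWEEN 1581708508 AND 1581708807;
-- expect "Using index" in the Extra column (an index-only range scan)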
You can determine the individual partition by using modular arithmetic:
MOD(<expression passed to the hash function>, <number of partitions>)
assuming you have 2 partitions
CREATE TABLE messages(ID int, timestamp int)
PARTITION BY HASH( timestamp DIV 43200 )
PARTITIONS 2;
look up partition names by
SELECT CONCAT( 'p',MOD(timestamp DIV 43200,2)) AS partition_name, timestamp
FROM messages;
and determine the related partition name for the value 1581708508 of the timestamp column (assume p1). Then use
SELECT MAX(id)
FROM messages PARTITION(p1)
to get all the records only in the partition p1 without need of a WHERE condition such as
WHERE timestamp BETWEEN 1581708508 AND 1581708807
Btw, all partitions can be listed through
SELECT *
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE table_name='messages'
Need help with the issue below.
I need to delete rows from a table that has a huge amount of data inserted on a daily basis. I have written a procedure that deletes rows based on an indexed column, which to me should be enough, but my colleague suggested also using a date column in the delete, since that will use the date partition (the partition is based on the date).
My question is which delete statement would be faster.
E.g.:
1. Column name: FILE_NAME (having an index)
delete from table_name where column_name1=file_name
2. Column name 1: FILE_NAME (having an index) and column name 2: TXN_DATE (no index; the partition is on this column)
delete from table_name where column_name1=file_name and txn_date=date_value
Please advise.
Thanks
Yes, your colleague is right. The second query will be quicker.
The process is called partition pruning. Filtering on the column on which the partitions are defined automatically hits only the partitions where the data can reside.
You can also directly reference the partition if you can determine the name of the partition for the date_value, as
DELETE FROM table_name
PARTITION (partition_date_value)
WHERE column_name1=file_name;
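You can verify the pruning in the execution plan; a sketch using the names from the question (the literal values are hypothetical):
EXPLAIN PLAN FOR
  DELETE FROM table_name
  WHERE column_name1 = 'file1.dat'
    AND txn_date = DATE '2020-01-01';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- a single partition in the Pstart/Pstop columns confirms pruning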
References:
Examples for DELETE on Oracle Database SQL Language Reference
Partition Pruning
Another Partition Pruning website
If file_name has an index that actually improves navigation on your table, I think it would be faster to use the first one.
Our system has many tables that require partitioning to support data maintenance. Let's talk about one table to simplify the question. If the data in a table hits 100 GB, the OLTP system starts to slow down, so we recommend that customers move the data from the OLTP system to the OLAP system. We use partitioning by year or month (based on data insertion rates) to facilitate this move.
Here is a sample of a table definition:
create table myPartionedTable
(
object_id number ,
object_type varchar2(18),
RETIREDTIMESTAMP timestamp
)
partition by range (RETIREDTIMESTAMP)
(
partition WM_2010 values less than(TO_DATE('01/01/2011','MM/DD/YYYY')),
partition WM_2011 values less than(TO_DATE('01/01/2012','MM/DD/YYYY')),
partition WM_2012 values less than(TO_DATE('01/01/2013','MM/DD/YYYY')),
partition WM_2013 values less than(TO_DATE('01/01/2014','MM/DD/YYYY')),
partition WM_2014 values less than(TO_DATE('01/01/2015','MM/DD/YYYY')),
partition WM_ACTIVE values less than(MAXVALUE)
)
tablespace MYDATE;
The important point is that the data needs to be retained in the WM_ACTIVE partition till the data is deemed RETIRED. Once retired, the data moves to the appropriate partition and is then eligible for PARTITION_MOVE out of OLTP and into OLAP.
Is this a good approach? Is there a better approach for managing this list of requirements?
Interval partitioning may help. Oracle can automatically create partitions as needed.
There may not be a need to worry about active vs. inactive since it's easy to use the smallest range for all partitions.
create table myPartionedTable
(
object_id number ,
object_type varchar2(18),
RETIREDTIMESTAMP timestamp
)
partition by range (RETIREDTIMESTAMP) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
(
--You still need to specify the lowest possible partition.
partition WM_2010 values less than(date '2011-01-01')
);
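As a quick sanity check (a sketch against the table above): inserting a retired row creates its monthly partition automatically, and the system-generated partition is visible in the data dictionary. Note that if rows are instead retired later by updating RETIREDTIMESTAMP, the update changes the row's partition, which requires row movement to be enabled.
-- a retired row past the anchor partition triggers automatic partition creation
INSERT INTO myPartionedTable VALUES (1, 'widget', TIMESTAMP '2021-03-15 10:00:00');
SELECT partition_name, high_value
FROM user_tab_partitions
WHERE table_name = 'MYPARTIONEDTABLE';
-- needed only if RETIREDTIMESTAMP is updated after insert (ORA-14402 otherwise)
ALTER TABLE myPartionedTable ENABLE ROW MOVEMENT;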