Create new index on specific partitions in PostgreSQL 14

I am using PostgreSQL 14.
I have a table that is range-partitioned by day. Its retention is rather short: it holds 14 days' worth of data, and partitions older than 14 days are dropped.
I would like to introduce a new index, and was wondering whether it is possible to create the index only on new partitions and not on old ones, so I can avoid indexing the existing data in the older partitions, which will be dropped anyway.
My question: is this worth doing? If so, do I have to create the index at the table level once all partitions in the table have the new index?
If not, would the best approach be to create the index concurrently?
This is just a thought for now; I do not have much experience with such operations on partitioned tables.

Yes, you can create the index just for new partitions. When you create a new partition, create an index on it, either manually or from a script. Once all extant partitions have the index, you can create the index on the parent table very quickly, since that only sets some metadata and doesn't build any indexes. From that point on, new partitions automatically get the index created on them.
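The two steps above can be sketched as follows for PostgreSQL 14. All object names here (the parent table events, the partition events_2024_01_15, the column device_id, and the index names) are hypothetical placeholders, not taken from the question.

```sql
-- Hypothetical names: parent table "events", daily partitions "events_YYYY_MM_DD",
-- indexed column "device_id". Adjust to your schema.

-- Step 1: build the index on each partition you want indexed.
-- CONCURRENTLY is allowed on an individual (leaf) partition, but it cannot
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY events_2024_01_15_device_id_idx
    ON events_2024_01_15 (device_id);

-- Step 2: once every partition still attached to "events" has an equivalent
-- index, create the index on the parent. Matching partition indexes are
-- attached to it instead of being rebuilt.
CREATE INDEX events_device_id_idx ON events (device_id);

-- Optional check: list the partition indexes attached to the parent index.
SELECT inhrelid::regclass AS partition_index
FROM pg_inherits
WHERE inhparent = 'events_device_id_idx'::regclass;
```

Note that if some partition still lacks an equivalent index when step 2 runs, PostgreSQL builds it on that partition as part of the parent CREATE INDEX, so it is worth confirming that the old, unindexed partitions have been dropped first.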
Whether it is worth doing, I don't think anyone can answer that for you. If each partition would take most of a day to index, maybe that would be worth avoiding. But in that case, can you even tolerate having the index in the first place?

Related

Automated way of deleting millions of rows from Postgres tables

Postgres Version: PostgreSQL 10.9 (Ubuntu 10.9-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
Before I ask my question, I would like to explain why I'm looking into this. We have a history table which has more than 5 million rows and is growing every hour.
As the table grows, the select queries are becoming slower, even though we have a proper index. So ideally our first choice is to delete the old records that are unused.
Approach #1
We tried deleting the records from the table using a simple delete from table_name where created_date > certain_date and is_active = false
This took a very long time.
Approach #2
Created a script that deletes the rows using a cursor-based approach.
This also takes a very long time.
Approach #3
Created a new unlogged table.
Created an index on the new table.
Copied the contents from the old table to the new table.
Then set the new table to logged.
Renamed the master table as a backup.
The issue with this approach is that it requires some downtime.
On live production instances, this would result in missing data or failures.
Approach #4
On further investigation, the most performant way to delete unused rows is to create a partitioned table (https://www.postgresql.org/docs/10/ddl-partitioning.html), so that we could drop an entire partition immediately.
Questions with the above approach are
How can I create a partition on the existing table?
Will that require downtime?
How can we configure Postgres to create partitions automatically? We can't really create partitions manually every day, right?
Any other approaches are also welcome. The thing is, I really want this to be automated rather than manual, because I would extend this to multiple tables.
Please let me know your thoughts, which would be very helpful.
I would go for approach 4, table partitioning.
Create partitions
New data goes directly to the correct partition
Move old data (manually / scripted) to the correct partition
Set a cron job to create partitions for the next X days, if they don't exist already
No downtime needed
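A minimal sketch of those steps, using PostgreSQL 10 declarative partitioning. The table and column names (history, history_partitioned, id, created_date, is_active) and the date ranges are assumptions for illustration; the real table will have more columns. How the application is switched over to the partitioned table is not shown here.

```sql
-- Hypothetical partitioned replacement for the existing "history" table.
CREATE TABLE history_partitioned (
    id           bigint      NOT NULL,
    created_date timestamptz NOT NULL,
    is_active    boolean     NOT NULL
) PARTITION BY RANGE (created_date);

-- What the cron job would run ahead of time; IF NOT EXISTS makes it safe to repeat.
CREATE TABLE IF NOT EXISTS history_2019_09 PARTITION OF history_partitioned
    FOR VALUES FROM ('2019-09-01') TO ('2019-10-01');

-- Move one range of old rows; the insert is routed to the matching partition.
WITH moved AS (
    DELETE FROM history
    WHERE created_date >= '2019-09-01'
      AND created_date <  '2019-10-01'
    RETURNING id, created_date, is_active
)
INSERT INTO history_partitioned (id, created_date, is_active)
SELECT id, created_date, is_active
FROM moved;

-- Retention: dropping a whole partition replaces the slow row-by-row delete.
DROP TABLE IF EXISTS history_2019_01;
```

In PostgreSQL 10 an insert fails if no partition covers the row's created_date, which is why the partitions need to be created ahead of time.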
We tried deleting the records from the table using simple delete from table_name where created_date > certain_date and is_active = false. This took a very long time.
Surely you mean <, not > there? So what if it took a long time? How fast do you need it to be? Did it cause problems? How much data were you trying to delete at once?
5 million rows is pretty small. I probably wouldn't use partitioning in the first place on something that size.
There is no easy and transparent way to migrate to having partitioned data if you don't already have it. The easiest way is to have a bit of downtime as you populate the new table.
If you do want to partition, your partitioning scheme would have to include is_active, not just created_date. And partitioning by day seems way too fine; you could do it by month and pre-create a few years' worth up front.
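On the question of how much data was deleted at once: if you keep the plain delete rather than partitioning, one common way to bound the work per statement is to delete in batches. This is only a sketch; the table name history, the key column id, the cutoff date, and the batch size of 10000 are assumptions, and it uses < as suggested above.

```sql
-- Delete at most 10000 matching rows per statement.
-- Re-run (for example from a scheduled job) until it reports DELETE 0.
DELETE FROM history
WHERE id IN (
    SELECT id
    FROM history
    WHERE created_date < DATE '2019-01-01'
      AND is_active = false
    LIMIT 10000
);
```

Each batch commits on its own, so locks are held for shorter periods and autovacuum can reclaim space between runs.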
Answering specifically to the below:
How can we configure Postgres to create partitions automatically? We can't really create partitions manually every day, right?
As you are using Postgres 10, you could use https://github.com/pgpartman/pg_partman for auto management of partitions.

Am I supposed to drop and recreate indexes on tables every time I add data to it?

I'm currently using MSSQL Server, and I've created a table with indexes on 4 columns. I plan on appending 1 million rows at every month end. Is it customary to drop the indexes and recreate them every time you add data to the table?
Don't recreate the index. Instead, you can use update statistics to compute the statistics for the given index or for the whole table:
UPDATE STATISTICS mytable myindex; -- statistics for the table index
UPDATE STATISTICS mytable; -- statistics for the whole table
I don't think it is customary, but it is not uncommon. Presumably the database would not be used for other tasks during the data load, otherwise, well, you'll have other problems.
It could save time and effort if you just disabled the indexes:
ALTER INDEX IX_MyIndex ON dbo.MyTable DISABLE
More info on this non-trivial topic can be found here. Note especially that disabling the clustered index will block all access to the table (i.e. don't do that). If the data being loaded is ordered in [clustered index] order, that can help some.
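As a sketch of that disable-then-load pattern, reusing the index and table names from the statement above and assuming IX_MyIndex is a nonclustered index: a disabled index is brought back by rebuilding it.

```sql
-- Disable the nonclustered index before the bulk load.
ALTER INDEX IX_MyIndex ON dbo.MyTable DISABLE;

-- ... run the month-end load here ...

-- Rebuilding re-enables the index and builds it from the loaded data.
ALTER INDEX IX_MyIndex ON dbo.MyTable REBUILD;

-- Confirm that no index on the table is still disabled.
SELECT name, is_disabled
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.MyTable');
```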
A last note, do some testing. 1MM rows doesn't seem like that much; the time you save may get used up by recreating the indexes.

Deleting some data from partition will impact local index?

I have a partitioned table "alarms" as follows:
partitioned by range(version); version: 1, 2, 3, ...
each partition has a local index on version
each partition has local indexes on a mix of columns
no global index
Due to some business constraints,
I need to delete some data from each version (but not all of the partition's data).
no updates will happen to old versions, only selects
on a daily basis, I am inserting new version data
So for this I will delete as follows:
delete /*+ full(alarms) parallel(alarms,4)*/ from alarms where version <= (number) and alarm_type = 'type1';
This will not delete the whole partition, but roughly once a month a partition will become empty.
So I have a procedure that loops over all versions and drops every empty partition by name.
My questions, for the period while a partition is not yet empty:
Might this impact performance?
Do I need to rebuild the indexes after each delete?
This may impact performance?
I'm not sure just how you mean this. If you mean "will deleting data from a table impact other concurrent users of that table", the answer is yes, although it's impossible to state what the degree of impact will be. If you mean "will deleting data from a table have long-term impact on access to that table", my answer is that there should be very little long-term effect.
Do I need to rebuild index each delete?
Deleting data from a table is a normal activity in a database, and the indexes will be maintained properly.
Best of luck.

Index rebuilding best practices on Oracle range partitioned table

We have a range-partitioned table with about 10 local bitmap indexes. Our daily load performs some DDL/DML operations on that table, namely truncating a specific partition and loading data into it. When we do this, the local bitmap indexes do not become unusable; they remain in usable status. My question is: even though the indexes are not becoming unusable, should we always include index rebuilding as a best practice for range-partitioned tables, or rebuild only when it is required? Index rebuilding takes time; with 10 local indexes on a large-volume table, it becomes a costly affair for ETL.
Please share your suggestions or thoughts on this situation.
No, a rebuild of local indexes is not required; that is one of the main purposes of a local index.
Local partitioned indexes actually create a 'sub index' for each partition, so each 'sub index' can be managed independently of the other partitions. And when you truncate a partition, all its local index partitions are truncated as well.
Oracle doc:
"You cannot truncate an index partition. However, if local indexes are
defined for the table, the ALTER TABLE ... TRUNCATE PARTITION
statement truncates the matching partition in each local index."
So when you load data into that partition, its local index partitions are populated again. But the statistics on the index could be stale, and the optimizer may then decide not to use the index. So consider gathering statistics on such indexes if you don't already do so.
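As a sketch of both checks in Oracle, assuming a hypothetical table SALES_FACT owned by the current user and a hypothetical partition P_2024_01: the first query confirms the local index partitions are still usable after the truncate and load, and the DBMS_STATS call refreshes statistics for that partition and, via cascade, its indexes.

```sql
-- Status of every local index partition for the (hypothetical) table.
SELECT ip.index_name, ip.partition_name, ip.status
FROM   user_ind_partitions ip
WHERE  ip.index_name IN (SELECT index_name
                         FROM   user_indexes
                         WHERE  table_name = 'SALES_FACT')
ORDER  BY ip.index_name, ip.partition_name;

-- Gather statistics for the reloaded partition and its index partitions.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'SALES_FACT',
    partname    => 'P_2024_01',
    granularity => 'PARTITION',
    cascade     => TRUE);
END;
/
```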

SQL Server Table Partitioning, what is happening behind the scenes?

I'm working with table partitioning on an extremely large fact table in a warehouse. I have executed the script a few different ways, with and without nonclustered indexes. With the indexes in place, it appears to dramatically expand the log file; without the nonclustered indexes, it appears not to expand the log file as much, but it takes more time to run because the indexes have to be rebuilt.
What I am looking for is any links or information as to what is happening behind the scenes, specifically to the log file, when you split a table partition.
I think it isn't too hard to theorize what is going on (to a certain extent). Behind the scenes each partition is given a different HoBT, which in normal language means each partition is in effect sitting on its own hidden table.
So theorizing the splitting of a partition (assuming data is moving) would involve:
inserting the data into the new table
removing data from the old table
The NC index can be figured out, but depending on whether there is a clustered index or not, the theorizing will alter. It also matters whether the index is partition aligned or not.
Given a bit more information on the table (CL or Heap) we could theorize this further
"If the partition function is used by a partitioned table and SPLIT results in partitions where both will contain data, SQL Server will move the data to the new partition. This data movement will cause transaction log growth due to inserts and deletes."
This is from an article by Microsoft on Partitioned Table and Index Strategies
So it looks like it's doing a delete from the old partition and an insert into the new partition. This could explain the growth in the t-log.
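Following from that, one way to see whether a split will move data is to look at the row counts per partition first and then split at a boundary where one of the two resulting partitions will be empty. This is only a sketch: the table dbo.FactSales, the partition function pf_FactDate, the partition scheme ps_FactDate, the boundary value, and the use of the PRIMARY filegroup are all hypothetical.

```sql
-- Rows per partition for the heap or clustered index of the (hypothetical) table.
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.FactSales')
  AND p.index_id IN (0, 1)
ORDER BY p.partition_number;

-- Tell the scheme which filegroup the new partition should use.
ALTER PARTITION SCHEME ps_FactDate NEXT USED [PRIMARY];

-- Split at a boundary beyond the existing data, so only one side holds rows.
ALTER PARTITION FUNCTION pf_FactDate() SPLIT RANGE ('2025-01-01');
```

If both sides of the new boundary already contain rows, the split moves data as described in the quote above, with the corresponding log growth.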