We have a range-partitioned table with about 10 local bitmap indexes on it. Our daily load performs some DDL/DML on that table: it truncates a specific partition and then loads data. When we do this, the local bitmap indexes do not become unusable; they remain in a usable status. My question is: even though the indexes are not becoming unusable, should we always incorporate index rebuilding as a best practice for range-partitioned tables, or should we rebuild indexes only when it is actually required? Index rebuilding takes time, and with 10 local indexes on a table holding a large volume of data, it becomes a costly affair for the ETL.
Please share your suggestions or thoughts on this situation.
No, a rebuild of local indexes is not required; that is one of the main purposes of a local index.
A local partitioned index effectively creates a 'sub-index' for each partition, so each sub-index can be managed independently of the other partitions. When you truncate a partition, all of its local index partitions are truncated as well.
Oracle doc:
"You cannot truncate an index partition. However, if local indexes are
defined for the table, the ALTER TABLE ... TRUNCATE PARTITION
statement truncates the matching partition in each local index."
So when you load data into that partition, the local index partitions are populated again. However, the statistics on those indexes could be stale, and the optimizer may decide not to use them. Consider gathering statistics on those indexes if you are not already doing so.
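As a rough illustration (the schema, table, and partition names below are hypothetical), statistics for the reloaded partition and its local index partitions could be gathered like this:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'MY_SCHEMA',     -- hypothetical schema
    tabname     => 'SALES_FACT',    -- hypothetical table
    partname    => 'P_2024_01',     -- the partition that was truncated and reloaded
    granularity => 'PARTITION',
    cascade     => TRUE);           -- also gathers stats on the local index partitions
END;
/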
Related
I'm currently using MSSQL Server, I've created a table with indexes on 4 columns. I plan on appending 1mm rows every month end. Is it customary to drop the indexes, and recreate them every time you add data to the table?
Don't recreate the index. Instead, you can use update statistics to compute the statistics for the given index or for the whole table:
UPDATE STATISTICS mytable myindex; -- statistics for the table index
UPDATE STATISTICS mytable; -- statistics for the whole table
I don't think it is customary, but it is not uncommon. Presumably the database would not be used for other tasks during the data load, otherwise, well, you'll have other problems.
It could save time and effort if you just disabled the indexes:
ALTER INDEX IX_MyIndex ON dbo.MyTable DISABLE
More info on this non-trivial topic can be found here. Note especially that disabling the clustered index will block all access to the table (i.e. don't do that). If the data being loaded is ordered in [clustered index] order, that can help some.
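As a hedged sketch (the index and table names are made up), the disable-and-reload pattern looks roughly like this; a disabled nonclustered index is re-enabled by rebuilding it:
ALTER INDEX IX_MyIndex ON dbo.MyTable DISABLE;
-- ... bulk load the new rows here ...
ALTER INDEX IX_MyIndex ON dbo.MyTable REBUILD;  -- rebuilding re-enables the index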
A last note, do some testing. 1MM rows doesn't seem like that much; the time you save may get used up by recreating the indexes.
I had a table with a global B-tree index on a date column.
We range-partitioned the table for better retrieval performance. The partition key we used is the same column that the index is on.
Later we moved from the global index to a local one by dropping the global index and recreating it as a local index on the same date column.
Now we have a local index and the partition key on the same column. But since this change, the data load into this table is taking three times longer than usual.
What could be the reason that the data load takes more time when the partition key and the local index are on the same column of the table?
When we checked the explain plan, we found that the queries are not using the local index. Why does it not use the local index in this case?
Is there any hidden built-in index attached to the partition key as well, which Oracle uses in place of the local index?
What could be the solution so that the data load performance is not impacted?
Any leads on this would be highly appreciated.
I was reading this post by Pinal Dave and found this line:
Unindexed tables are good for fast storing of data. Many times, it is
better to drop all the indexes from table and then do bulk of INSERTs
and restore those indexes after that.
Is that really an effective technique in the case of clustered indexes? I mean, won't it be an overhead then to recreate all those indexes again? And I have also read that records are physically stored in the same order as that of logical records when clustered indexes are used. Then how will this affect the physical storage of records when we drop the index and restore them later?
He's partially correct.
Tables without clustered indexes (known as "heaps") are useful for staging tables for bulk loads. These staging tables are not your final tables. For example, the data you load may be data you already have so you only need to find new, changed and deleted records for your final table.
And yes, it is an overhead to recreate the clustered index. While the index is dropped, the table is a heap and data is allocated wherever there is space. When the clustered index is rebuilt, the data is rearranged on disk in index order.
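As a rough sketch of that staging pattern (all table, column, and file names here are made up): bulk load into an unindexed heap, then insert only the rows the final table does not already have:
CREATE TABLE dbo.StagingOrders (OrderId INT, Amount DECIMAL(10,2));  -- heap: no clustered index

BULK INSERT dbo.StagingOrders
FROM 'C:\load\orders.csv'            -- hypothetical source file
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

INSERT INTO dbo.Orders (OrderId, Amount)
SELECT s.OrderId, s.Amount
FROM dbo.StagingOrders AS s
LEFT JOIN dbo.Orders AS o ON o.OrderId = s.OrderId
WHERE o.OrderId IS NULL;             -- only rows not already in the final table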
I have a table of about 60GB and I'm trying to create an index, and it's very slow (almost a day, and still running!).
I see that most of the time is spent on disk I/O (4MB/s), and it doesn't use the memory or CPU much.
I tried running 'PRAGMA cache_size = 10000' and 'PRAGMA page_size = 4000' (after I created the table), and it still doesn't help.
How can I make the CREATE INDEX run in a reasonable time?
Creating an index on a database table is a one-time operation, and it can be expensive depending on many factors: how many fields (and of what types) are included in the index, the size of the table being indexed, the hardware of the machine the database is running on, and possibly more.
To give a reasonable answer on speeding things up, we would need to know the schema of the table, the definition of the index you are creating, whether the data is actually unique if the index enforces uniqueness, the hardware specs of your server, your disk speeds, how much space is available on the disks, whether you are using a RAID array and at what level, how much RAM you have and how it is utilized, and so on.
Now, all that said, the following might be faster, but I have not tested it:
1. Make a structurally duplicate table of the table you wish to index.
2. Add the index to the new, empty table.
3. Copy the data from the old table to the new table in chunks.
4. Drop the old table.
My theory is that it will be less expensive to index the data as it is added than to dig through the data that is already there and add the index after the fact.
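A minimal sketch of those steps in SQLite (the table, column, and index names are invented, and the chunk boundaries would need tuning):
CREATE TABLE big_new AS SELECT * FROM big WHERE 0;   -- empty structural copy
CREATE INDEX idx_big_new_col ON big_new(col);        -- index the empty table

-- copy in chunks so each transaction stays a manageable size
INSERT INTO big_new SELECT * FROM big WHERE rowid BETWEEN 1 AND 1000000;
INSERT INTO big_new SELECT * FROM big WHERE rowid BETWEEN 1000001 AND 2000000;
-- ... and so on for the remaining rowid ranges ...

DROP TABLE big;
ALTER TABLE big_new RENAME TO big;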
When you create the table, you should also create the index. PS: you should make sure the index is defined properly, and then you will not need to create the index at runtime.
I'm working with table partitioning on an extremely large fact table in a warehouse. I have executed the script a few different ways, with and without nonclustered indexes. With the indexes in place it appears to dramatically expand the log file, while without the nonclustered indexes it does not expand the log file as much but takes more time to run due to the rebuilding of the indexes.
What I am looking for is any links or information as to what is happening behind the scenes, specifically to the log file, when you split a table partition.
I think it isn't too hard to theorize what is going on (to a certain extent). Behind the scenes each partition is given a different HoBT, which in plain language means each partition is, in effect, sitting in its own hidden table.
So theorizing the splitting of a partition (assuming data is moving) would involve:
inserting the data into the new table
removing data from the old table
The nonclustered index behaviour can be figured out too, but the theory changes depending on whether there is a clustered index or not. It also matters whether the index is partition-aligned or not.
Given a bit more information on the table (clustered index or heap), we could theorize this further.
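For illustration (assuming a hypothetical table named dbo.FactSales), you can see that each partition of each index is its own HoBT by querying the catalog views:
SELECT p.partition_number, p.hobt_id, p.rows, i.name AS index_name
FROM sys.partitions AS p
JOIN sys.indexes AS i
  ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales')
ORDER BY i.index_id, p.partition_number;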
"If the partition function is used by a partitioned table and SPLIT results in partitions where both will contain data, SQL Server will move the data to the new partition. This data movement will cause transaction log growth due to inserts and deletes."
This is from a Microsoft article on Partitioned Table and Index Strategies.
So it looks like it's doing a delete from the old partition and an insert into the new partition. This could explain the growth in the transaction log.
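As a hedged sketch (the partition scheme, function, and boundary value below are hypothetical), a split that lands inside a populated range looks like this, and the data movement it triggers is what gets logged as deletes and inserts:
ALTER PARTITION SCHEME ps_FactSales NEXT USED [PRIMARY];   -- filegroup for the new partition
ALTER PARTITION FUNCTION pf_FactSalesByDate() SPLIT RANGE ('2023-01-01');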