I was reading this post by Pinal Dave and found this line:
Unindexed tables are good for fast storing of data. Many times, it is
better to drop all the indexes from table and then do bulk of INSERTs
and restore those indexes after that.
Is that really an effective technique in the case of clustered indexes? I mean, won't recreating all those indexes add overhead? I have also read that, with a clustered index, records are physically stored in the same order as the index key. So how will dropping the index and recreating it later affect the physical storage of records?
He's partially correct.
Tables without clustered indexes (known as "heaps") are useful as staging tables for bulk loads. These staging tables are not your final tables. For example, the data you load may overlap with data you already have, so you only need to find the new, changed and deleted records for your final table.
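The staging pattern described above can be sketched in Python with the standard `sqlite3` module. This is an illustration under assumptions, not SQL Server code. The tables `final` and `staging` and their columns are hypothetical. SQLite stores every table in a rowid B-tree, so "heap" here only means that the staging table has no key or index of its own.

```python
import sqlite3

# Hypothetical schema: final has a primary key, staging is an unindexed load target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER, val TEXT)")  # no key, no index
conn.executemany("INSERT INTO final VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, "a"), (2, "B"), (4, "d")])


def diff_staging(conn):
    """Return (new, changed, deleted) ids by comparing staging with final."""
    new = conn.execute(
        "SELECT s.id FROM staging s LEFT JOIN final f ON f.id = s.id "
        "WHERE f.id IS NULL ORDER BY s.id").fetchall()
    # IS NOT is SQLite's NULL-safe inequality operator.
    changed = conn.execute(
        "SELECT s.id FROM staging s JOIN final f ON f.id = s.id "
        "WHERE s.val IS NOT f.val ORDER BY s.id").fetchall()
    deleted = conn.execute(
        "SELECT f.id FROM final f LEFT JOIN staging s ON s.id = f.id "
        "WHERE s.id IS NULL ORDER BY f.id").fetchall()
    return ([r[0] for r in new], [r[0] for r in changed], [r[0] for r in deleted])


print(diff_staging(conn))  # ([4], [2], [3])
```

Only these differences then need to be applied to the indexed final table.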
And yes, recreating the clustered index is an overhead. While the clustered index is dropped, new rows are stored wherever there is free space. When it is rebuilt, the data is rearranged on disk in key order.
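Here is a minimal sketch of the drop, load, recreate pattern from the quoted post, again using Python's `sqlite3` as a stand-in. It drops a secondary index rather than a clustered one, because SQLite has no clustered index that can be dropped separately. The table `orders` and the index `ix_orders_customer` are hypothetical.

```python
import sqlite3


def bulk_load_without_index(conn, rows):
    """Drop the secondary index, insert all rows, then build the index once."""
    conn.execute("DROP INDEX IF EXISTS ix_orders_customer")
    with conn:  # one transaction for the whole load; commits on success
        conn.executemany(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)", rows)
    # Building the index after the load works from all keys at once instead of
    # maintaining the index row by row during the inserts.
    conn.execute("CREATE INDEX ix_orders_customer ON orders (customer)")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, amount REAL)")
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer)")
bulk_load_without_index(conn, [(i % 100, i * 1.5) for i in range(10_000)])
```

Whether this pays off depends on how much is loaded relative to what is already in the table. For a small load into a large table, the rebuild can cost more than it saves.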
Performance-wise, does a clustered index help or not when bulk inserting hundreds of millions of rows in a table?
Later edit: after the INSERTs I have to put the database into production, so I will have to create one or more indexes anyway.
A clustered index specifies that the data is ordered on the data pages.
When you are inserting data, the new data has to be sorted and compared to existing values. This is going to incur overhead.
The one exception is when the clustered index key is an identity column that is generated during the insert. Then the database knows that the new data goes "at the end" of the table.
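The identity-column exception can be illustrated with a toy model of an index's leaf pages, counting how often an insert lands in a full page and forces a split. This is a deliberately simplified model with fixed rows per page, 50/50 splits and an append shortcut for the last page. It is not SQL Server's actual page-split algorithm, and `count_page_splits` and `page_capacity` are hypothetical names.

```python
import bisect
import random


def count_page_splits(keys, page_capacity=8):
    """Toy leaf level of a clustered index: count mid-page splits.

    Pages are kept in key order. A key larger than everything on a full last
    page goes onto a fresh page; any other insert into a full page splits it.
    """
    pages = [[]]  # each page is a sorted list of keys
    splits = 0
    for key in keys:
        # Target page: the first page whose highest key is >= key, else the last.
        idx = next((i for i, p in enumerate(pages) if p and key <= p[-1]),
                   len(pages) - 1)
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # ever-increasing key: append a new page
        else:
            half = page_capacity // 2
            left, right = page[:half], page[half:]
            bisect.insort(left if key < right[0] else right, key)
            pages[idx:idx + 1] = [left, right]
            splits += 1
    return splits


sequential = range(2_000)                                    # like an IDENTITY column
shuffled = random.Random(42).sample(range(100_000), 2_000)  # like random keys
print(count_page_splits(sequential))  # 0
print(count_page_splits(shuffled))
```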
Indexes are meant to speed up retrieval (SELECT) of rows. For INSERT, DELETE and UPDATE they add overhead, because every index must be maintained along with the table. So, in your case, if INSERT is the predominant operation in your system, don't add indexes at all. Even in your production system, assess the ratio of retrieval operations to insert/update operations, and if retrieval turns out to be dominant, then you can consider indexes.
Note: whenever you define a primary key on a table, an index is created for it automatically. So, without a specific need for retrieval optimization, there is no real need to design and implement additional indexes.
You can read more here: https://www.geeksforgeeks.org/sql-indexes/
I need to know whether deleting some rows (I am talking about SQL Server) from a table that has indexes (clustered or non-clustered; I am asking about both) can damage the indexes. What happens to indexes when we delete rows? Which is better for performance: deleting rows from a table after processing them, or marking them as processed (when we will need to reuse them about 20 more times)? Thanks for the answers.
I don't know what you mean by "damage". When you delete rows from the table, the index entries need to be deleted as well. This does not "damage" the index per se. At least, the index continues to be useful.
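The following small check shows that deletes keep indexes consistent rather than damaging them. It uses Python's `sqlite3` as a stand-in for SQL Server, where `DBCC CHECKDB` plays a similar role. The `jobs` table and the `ix_jobs_status` index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, payload TEXT)")
conn.execute("CREATE INDEX ix_jobs_status ON jobs (status)")
with conn:
    conn.executemany(
        "INSERT INTO jobs (status, payload) VALUES (?, ?)",
        [("done" if i % 3 == 0 else "pending", f"job {i}") for i in range(3000)],
    )
    # Deleting rows also deletes their entries from every index on the table.
    conn.execute("DELETE FROM jobs WHERE status = 'done'")

# integrity_check verifies, among other things, that each index matches its table.
print(conn.execute("PRAGMA integrity_check").fetchone()[0])  # ok
print(conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0])  # 2000
```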
If you have lots of deletes, updates, and inserts, then over time the index will become fragmented. This does affect performance. At some point it becomes worthwhile to rebuild the index for performance purposes. You can read about this in the documentation.
I would not worry about rebuilding the indexes because of a handful of deletes. It takes a bit of work to really fragment an index.
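Here is a rough sketch of how fragmentation could be quantified and turned into a rebuild decision. It assumes a simplified definition: a leaf page is out of order when the next page in key order is not the next physical page. It also assumes the commonly cited rule of thumb of rebuilding above about 30% fragmentation, and only on indexes larger than about 1,000 pages. Both thresholds are heuristics. `logical_fragmentation` and `needs_rebuild` are hypothetical helpers, not a SQL Server API.

```python
def logical_fragmentation(page_chain):
    """Percent of leaf pages whose logical successor is not the next physical page.

    page_chain lists physical page numbers in logical (key) order.
    """
    if len(page_chain) < 2:
        return 0.0
    out_of_order = sum(1 for cur, nxt in zip(page_chain, page_chain[1:])
                       if nxt != cur + 1)
    return 100.0 * out_of_order / len(page_chain)


def needs_rebuild(fragmentation_pct, page_count,
                  threshold_pct=30.0, min_pages=1_000):
    """Rule of thumb: small indexes are not worth rebuilding at all."""
    return page_count >= min_pages and fragmentation_pct > threshold_pct


print(logical_fragmentation([1, 2, 3, 4]))              # 0.0
print(logical_fragmentation([1, 2, 7, 3, 4, 8, 5, 6]))  # 50.0
print(needs_rebuild(50.0, page_count=5_000))            # True
print(needs_rebuild(50.0, page_count=200))              # False
```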
My answer is YES.
An index is built on the data in a table, so, in short, when data is deleted from the table, the index's fragmentation level rises.
Rising fragmentation affects data retrieval in many ways.
Is there any reason I shouldn't create an index for every one of my database tables, as a means of increasing performance? It would seem there must be some reason(s) else all tables would automatically have one by default.
I use MS SQL Server 2016.
One index on a table is not a big deal. You automatically have an index on columns (or combinations of columns) that are primary keys or declared as unique.
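The automatic indexes mentioned above are easy to observe in SQLite through Python's `sqlite3` module. This is a sketch of the general behavior in one engine, not SQL Server 2016, and the table names are hypothetical. SQL Server likewise backs PRIMARY KEY and UNIQUE constraints with indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A TEXT primary key and a UNIQUE column each get an automatic index.
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, username TEXT UNIQUE, bio TEXT)")
# An INTEGER PRIMARY KEY is the table's own rowid B-tree, so no separate index.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")

# PRAGMA index_list rows: (seq, name, unique, origin, partial); origin is
# 'pk' for a primary key, 'u' for a UNIQUE constraint, 'c' for CREATE INDEX.
users_origins = sorted(row[3] for row in conn.execute("PRAGMA index_list(users)"))
events_indexes = conn.execute("PRAGMA index_list(events)").fetchall()
print(users_origins)   # ['pk', 'u']
print(events_indexes)  # []
```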
There is some overhead to an index. The index itself occupies space on disk and memory (when used). So, if space or memory are issues then too many indexes could be a problem. When data is inserted/updated/deleted, then the index needs to be maintained as well as the original data. This slows down updates and locks the tables (or parts of the tables), which can affect query processing.
A small number of indexes on each table is reasonable. These should be designed with the typical query load in mind. If you index every column in every table, data modifications will slow down. If your data is static, this is not an issue. However, eating up all the memory with indexes could still be an issue.
Advantage of having an index
Read speed: faster SELECT when the indexed column is in the WHERE clause
Disadvantages of having an index
Space: Additional disk/memory space needed
Write speed: Slower INSERT / UPDATE / DELETE
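The read-speed advantage can be seen in a query plan. The following is a minimal sketch with Python's `sqlite3`, assuming a hypothetical `people` table. Without an index on `last_name`, the plan is a full scan; with one, it becomes an index search. SQLite's plan text differs between versions, so the code only looks for keywords.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")
conn.executemany("INSERT INTO people (last_name, city) VALUES (?, ?)",
                 [(f"name{i}", f"city{i % 50}") for i in range(5000)])

query = "SELECT id FROM people WHERE last_name = ?"
before = plan(conn, query, ("name42",))
conn.execute("CREATE INDEX ix_people_last_name ON people (last_name)")
after = plan(conn, query, ("name42",))
print(before)  # expect a SCAN of people
print(after)   # expect a SEARCH using ix_people_last_name
```

The write-speed cost is the other side of the same structure: every insert into `people` now also has to update `ix_people_last_name`.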
As a minimum, I would normally recommend having at least one index per table; this is created automatically on the table's primary key, for example an IDENTITY column. Foreign keys normally benefit from an index too, but this needs to be created manually. Other columns that are frequently included in WHERE clauses should be indexed, especially if they contain many unique values. The benefit of indexing a low-cardinality column such as gender, which has only 2 values, is debatable.
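The cardinality argument can be made concrete with a simple selectivity ratio: distinct values divided by rows. This is only a heuristic sketch with hypothetical sample data. Real optimizers rely on statistics and histograms rather than this single number.

```python
def selectivity(values):
    """Distinct values divided by total rows: close to 1.0 means highly selective."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


emails = [f"user{i}@example.com" for i in range(1000)]  # hypothetical unique column
genders = ["F" if i % 2 else "M" for i in range(1000)]  # hypothetical 2-value column
print(selectivity(emails))   # 1.0
print(selectivity(genders))  # 0.002
```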
Most of the tables in my databases have between 1 and 4 indexes, depending on the data in the table and how this data is retrieved.
We have a range-partitioned table with about 10 local bitmap indexes. Our daily load performs some DDL/DML operations on that table: it truncates a specific partition and loads data into it. When we do this, the local bitmap indexes do not become unusable; they stay in USABLE status. However, even though the indexes do not become unusable, should we always include an index rebuild as a best practice for range-partitioned tables, or rebuild indexes only when it is required? Index rebuilding takes time. With 10 local indexes on a table with a large volume of data, it becomes a costly affair for the ETL.
Please share your suggestions or thoughts on this situation.
No, a rebuild of local indexes is not required; that is one of the main purposes of a local index.
A local partitioned index actually creates a 'sub-index' for each partition, so each sub-index can be managed independently of the other partitions. And when you truncate a partition, the matching partition of each of its local indexes is truncated as well.
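Here is a toy Python model of why no rebuild is needed. Each partition owns its rows and its own local index, so truncating one partition clears only that partition's index and leaves the others untouched. It is a sketch under assumptions, not Oracle internals. Oracle's bitmap indexes store bitmaps rather than key lists, and the class, methods and partition bounds are hypothetical, loosely mirroring `VALUES LESS THAN`.

```python
import bisect
from collections import defaultdict


class RangePartitionedTable:
    """Toy range-partitioned table with one local index per partition."""

    def __init__(self, bounds):
        self.bounds = list(bounds)  # exclusive upper bound of each partition
        self.rows = defaultdict(list)  # partition -> [(key, value)]
        self.local_index = defaultdict(lambda: defaultdict(list))  # partition -> value -> keys

    def partition_of(self, key):
        p = bisect.bisect_right(self.bounds, key)
        if p == len(self.bounds):
            raise ValueError(f"key {key} is above the highest partition bound")
        return p

    def insert(self, key, value):
        p = self.partition_of(key)
        self.rows[p].append((key, value))
        self.local_index[p][value].append(key)  # only this partition's index changes

    def truncate_partition(self, p):
        # The matching local index partition is emptied along with the data;
        # the other partitions' indexes are never touched, so nothing to rebuild.
        self.rows[p].clear()
        self.local_index[p].clear()

    def lookup(self, value):
        return sorted(k for idx in self.local_index.values() for k in idx.get(value, []))


t = RangePartitionedTable([100, 200, 300])
for k in range(0, 300, 10):
    t.insert(k, "red" if k % 20 == 0 else "blue")
t.truncate_partition(1)         # keys 100..190 and their index entries disappear
for k in range(100, 200, 25):
    t.insert(k, "red")          # daily reload of partition 1: 100, 125, 150, 175
print(t.lookup("red"))
```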
Oracle doc:
"You cannot truncate an index partition. However, if local indexes are
defined for the table, the ALTER TABLE ... TRUNCATE PARTITION
statement truncates the matching partition in each local index."
So when you load data into that partition, the matching local index partitions are populated again. However, the statistics on those indexes could be wrong, and the optimizer may then decide not to use the index. So consider gathering statistics on those indexes if you don't already do so.
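A truncate and reload can leave statistics misleading the optimizer. The following simplified model illustrates this using the staleness test behind the `STALE_PERCENT` preference of Oracle's `DBMS_STATS` (default 10%). It is an assumption-laden sketch, not Oracle's exact computation, and `stats_are_stale` is a hypothetical helper. In the database itself, statistics are gathered with `DBMS_STATS` procedures such as `GATHER_INDEX_STATS`.

```python
def stats_are_stale(rows_at_last_gather, rows_modified_since, stale_percent=10.0):
    """True when modifications exceed stale_percent of the rows seen at gather time."""
    if rows_at_last_gather == 0:
        return rows_modified_since > 0
    return 100.0 * rows_modified_since / rows_at_last_gather > stale_percent


# Hypothetical daily load: truncate a 1,000,000-row partition, then load 1,200,000 rows.
print(stats_are_stale(1_000_000, 1_000_000 + 1_200_000))  # True
# A trickle of 50,000 changed rows stays under the 10% threshold.
print(stats_are_stale(1_000_000, 50_000))                 # False
```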
I have a temp table #Data, that I populate inside a stored procedure.
It contains about 15M rows.
Then I create a clustered index, say IX_Data, on a couple of columns of the temp table #Data.
Then I delete about 1M rows from #Data (leaving about 14M rows).
My question: at this point, should I drop IX_Data and recreate it?
#Data is referenced in only one more place in the rest of the stored procedure.
You should not. Indexes are maintained automatically by the DBMS and are always kept in sync with the table.
That's why it's not recommended to create more indexes than you need: each one adds a performance penalty to every DML query.
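The following is a quick way to confirm that an index stays in sync after a large delete, sketched with Python's `sqlite3` as a stand-in for SQL Server. A SQLite TEMP table plays the role of `#Data`, `ix_data` mirrors `IX_Data`, and the columns `a` and `b` are hypothetical. The same query runs once through the index (`INDEXED BY`) and once without it (`NOT INDEXED`), and the results match without any rebuild.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE data (a INTEGER, b INTEGER, note TEXT)")
rng = random.Random(7)
with conn:
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        [(rng.randrange(1000), rng.randrange(1000), "x") for _ in range(20_000)])
conn.execute("CREATE INDEX ix_data ON data (a, b)")
with conn:
    conn.execute("DELETE FROM data WHERE b < 100")  # roughly 10% of the rows

q = "SELECT a, b FROM data {} WHERE a = 500 ORDER BY b"
via_index = conn.execute(q.format("INDEXED BY ix_data")).fetchall()
via_scan = conn.execute(q.format("NOT INDEXED")).fetchall()
print(via_index == via_scan)  # True
```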
You don't specify which database you are on (maybe it's Oracle?), but your question seems to be about fragmentation, since data integrity is maintained by every database even if you delete a million rows (so there's no need to drop and recreate an index for data integrity).
So, how do you know whether your index is fragmented (so you need to recreate it, or better, rebuild it)? Every database has its own method. Oracle has internal tables from which you can determine whether an object is fragmented.
But it depends on your type of database.