Does deleting rows from a table disrupt indexes? - sql

I need to know whether deleting rows (I am talking about SQL Server) from a table that has indexes (clustered or non-clustered, in either case) can damage those indexes. What happens to the indexes when we delete rows? Which is better for performance: deleting rows from a table after processing them, or marking them as processed (given that we will need to reuse them, say, 20 more times)? Thanks for the answers.

I don't know what you mean by "damage". When you delete rows from the table, the corresponding index entries need to be deleted as well. This does not "damage" the index per se; the index continues to be useful.
If you have lots of deletes, updates, and inserts, then over time the index will become fragmented. This does affect performance, and at some point it becomes worthwhile to rebuild the index. You can read about this in the documentation.
I would not worry about rebuilding the indexes because of a handful of deletes. It takes a bit of work to really fragment an index.
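If you do get to that point, here is a minimal sketch of how you might check fragmentation and then reorganize or rebuild in SQL Server (dbo.MyTable is a placeholder name):

-- Check fragmentation for every index on a hypothetical table dbo.MyTable
SELECT i.name AS index_name, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

-- A common rule of thumb: REORGANIZE for moderate fragmentation, REBUILD when it is heavy
ALTER INDEX ALL ON dbo.MyTable REORGANIZE;
-- ALTER INDEX ALL ON dbo.MyTable REBUILD;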

My answer is YES.
An index is built on the data in the table, so, in short, when data is deleted from the table the level of fragmentation rises.
A rise in fragmentation levels affects data retrieval in several ways.

Related

Why is only reading faster in an indexed table and not writing?

The data structure used for indexing a DB table is a B-Tree by default (out of B-Tree, R-Tree, and Hash). Since look-ups, deletions, and insertions can all be done in logarithmic time in a B-Tree, why is only reading from an indexed table faster, while writing is slower?
Indexes are only used for speeding up SELECT statements. For INSERT, UPDATE, and DELETE your statements will be slower than normal due to the index needing to be updated as part of the statement.
I should clarify the UPDATE/DELETE point. It is true that those statements are slowed down because the index change adds to the overhead; however, the initial lookup part (the WHERE clause) of an UPDATE or DELETE statement can be sped up by the index. Basically, wherever a WHERE clause references the indexed fields, the record-selection part of that statement should see some improvement.
Also, if an UPDATE statement does not alter any of the columns that are part of an index then you should not see any additional slowness as the index is not being updated.
Because indexes require additional disk space. Indexes increase the amount of data that needs to be logged and written to a database. Indexes reduce write performance. When a column covered by an index is updated, that index also must be updated. Similarly, any delete or insert requires updating the relevant indexes.
The disk space and write penalties of indexes are precisely why you need to be careful about creating indices.
That said, updates to non-indexed columns can have their performance improved with indexes.
This:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
Will be faster than this:
UPDATE Table SET IndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
But the above two will likely both be faster than this in any reasonably sized table:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE NonIndexedKey = 'KeyValue'
Deletes, especially single deletes, can similarly be faster even though the table and the indexes need to be updated. This is simply because the query engine can find the target row(s) faster. That is, it can be faster to read an index, find the row, and remove the row and update the index, instead of scanning the entire table for the correct rows and removing the relevant ones. However, even in this case there is going to be more data to write; it's just that the IO cost of scanning an entire table could be fairly high compared to an index.
Finally, in theory, a clustering key that spreads inserts across multiple disk pages can allow the system to support more concurrent inserts since inserts typically require page locks to function, but that is a somewhat uncommon situation and it may result in worse read performance due to fragmenting your clustered indexes.
INSERT and DELETE have to update every index for the table (and the heap if there's no clustered index), in order to maintain consistency. UPDATEs may get away with updating fewer indexes, depending on which columns have been affected by the update (because only those indices that index/include those columns have to be updated)
A SELECT, on the other hand, is only reading and so, if an index contains all columns required by the SELECT, only that index has to be accessed. We know that the data in that index is accurate precisely because the modification operations are required to maintain that consistency.
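To make that last point concrete, here is a small example of a covering index (the table and column names are made up); the SELECT can be answered from the index alone because every column it needs is in the index:

-- Hypothetical table dbo.Orders(OrderId, CustomerId, OrderDate, Amount, ...)
CREATE NONCLUSTERED INDEX IX_Orders_Customer
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, Amount);

-- Satisfied entirely from IX_Orders_Customer, without touching the clustered index or heap
SELECT OrderDate, Amount
FROM dbo.Orders
WHERE CustomerId = 42;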

SQL When NOT to use an index

I have a table which keeps track of time slots for scheduling. It's a record of "locks" so that 2 users would not book appointments in the same time slot.
We have a deadlock issue in production and I noticed that this table has 5 indexes.
This table is read and written frequently but it would rarely have more than a few hundred rows in it.
5 indexes seems like overkill for me. I don't think I need any indexes at all for a table with under 1000 rows.
Am I correct in that assumption?
TIA
Quick, rule-of-thumb answer: drop all indexes except the unique index on the primary key, on relatively small tables (typically less than 100,000 rows).
Indexes do not affect locking. Note that every write operation has to change the indexes as well. That can have some performance impact, but it should not affect locking. If your table has only 1000 rows, you can drop your indexes: traversing roughly log(1000) index entries is not much faster than scanning 1000 rows.
This table is read and written frequently but it would rarely have more than a few hundred rows in it.
Row locking is a regular situation, and deadlocks can happen sometimes too. Does your DB handle the deadlock? It should resolve it by itself. The deadlock probably happens because one transaction updates rows in the order A, B while a second transaction updates them in the order B, A.
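A minimal sketch of the ordering fix, assuming a hypothetical TimeSlots(SlotId, LockedBy) table: if every transaction touches the slots in the same key order, the A,B vs B,A deadlock pattern cannot occur.

DECLARE @UserId int = 1;  -- placeholder for the booking user
BEGIN TRANSACTION;
-- Both sessions lock slot 1 before slot 2, so neither can hold the lock the other is waiting for
UPDATE dbo.TimeSlots SET LockedBy = @UserId WHERE SlotId = 1;  -- lower key first
UPDATE dbo.TimeSlots SET LockedBy = @UserId WHERE SlotId = 2;  -- then the higher key
COMMIT TRANSACTION;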

Batch index updates?

I'm writing several hundred or potentially several thousand rows into a set of tables at a time, each of which is heavily indexed both internally and via indexed views.
Generally, the inserts are occurring where the rows inserted will be adjacent in the index.
I expect these inserts to be expensive, but they are really slow. I think part of the performance issue is that the indexes are being updated with each individual INSERT.
Is there a way to tell SQL Server to hold off on updating the indexes until I am finished with my batch of inserts so the index trees will only need to be updated once?
These are executed as separate statements due to needing to show the user a progress bar during save and log any individual issues, but are all coming from the same connection in C#. I can place them in a transaction if needed, though I'd prefer not to.
You are paying the cost of adding those rows to the index one way or another. Not updating the index during the insert would cause an issue with accuracy of concurrent statements - any query on that table that used any of the indexes would not "see" the new rows!
If speed is of the essence, and downtime after the insert isn't a major concern, you can:
Disable non-clustered indexes on the target table
Insert the rows
Rebuild non-clustered indexes
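In T-SQL that sequence might look roughly like this (the table and index names are placeholders; note that only non-clustered indexes should be disabled, since disabling the clustered index makes the table unreadable):

ALTER INDEX IX_Target_SomeColumn ON dbo.Target DISABLE;

-- ... run the batch of INSERT statements here ...

ALTER INDEX IX_Target_SomeColumn ON dbo.Target REBUILD;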
You probably should clarify some more about your table:
How wide is the table?
How many indexes?
How wide are the indexes?
If you have 20 indexes and each index has 5 fields, you are really updating 100 extra fields per row which can get expensive quickly.
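If you want a rough idea of how much index maintenance each row pays for, a catalog query along these lines (standard SQL Server catalog views; no assumptions about your schema) shows how many indexes and indexed columns each table carries:

SELECT t.name AS table_name,
       COUNT(DISTINCT i.index_id) AS index_count,
       COUNT(*) AS indexed_column_count
FROM sys.tables AS t
JOIN sys.indexes AS i ON i.object_id = t.object_id
JOIN sys.index_columns AS ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
WHERE i.index_id > 0  -- exclude the heap row
GROUP BY t.name
ORDER BY indexed_column_count DESC;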

To what degree can effective indexing overcome performance issues with VERY large tables?

So, it seems to me like a query on a table with 10k records and a query on a table with 10 million records are almost equally fast if they are both fetching roughly the same number of records and making good use of simple indexes (an auto-increment, record-id-type indexed field).
My question is, will this extend to a table with close to 4 billion records if it is indexed properly and the database is set up in such a way that queries always use those indexes effectively?
Also, I know that inserting new records into a very large indexed table can be very slow because all the indexes have to be updated. If I add new records only to the end of the table, can I avoid that slowdown, or will that not work because the index is a binary tree and a large chunk of the tree will still have to be recalculated?
Finally, I looked around a bit for a FAQs/caveats about working with very large tables, but couldn't really find one, so if anyone knows of something like that, that link would be appreciated.
Here is some good reading about large tables and the effects of indexing on them, including cost/benefit, as you requested:
http://www.dba-oracle.com/t_indexing_power.htm
Indexing very large tables (as with anything database related) depends on many factors, including your access patterns, the ratio of reads to writes, and the size of available RAM.
If you can fit your 'hot' working set (i.e. the frequently accessed index pages) into memory, then accesses will generally be fast.
The strategy used to index very large tables is to use partitioned tables and partitioned indexes. BUT if your query does not join or filter on the partition key, then there will be no improvement in performance over an unpartitioned table, i.e. no partition elimination.
SQL Server Database Partitioning Myths and Truths
Oracle Partitioned Tables and Indexes
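As a rough sketch of what partitioning looks like in SQL Server (the table, function, and scheme names here are invented), note that partition elimination only happens when the query filters or joins on the partitioning column:

-- Partition a hypothetical fact table by year
CREATE PARTITION FUNCTION pfOrderYear (datetime)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.BigOrders (
    OrderId bigint NOT NULL,
    OrderDate datetime NOT NULL,
    Amount money NOT NULL,
    CONSTRAINT PK_BigOrders PRIMARY KEY (OrderId, OrderDate)
) ON psOrderYear (OrderDate);

-- Eliminates partitions because it filters on OrderDate; a filter on Amount alone would not
SELECT SUM(Amount) FROM dbo.BigOrders WHERE OrderDate >= '2023-01-01';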
It's very important to keep your indexes as narrow as possible.
Kimberly Tripp's The Clustered Index Debate Continues...(SQL Server)
Accessing the data via a unique index lookup will slow down as the table gets very large, but not by much. The index is stored as a B-tree structure in Postgres (not binary tree which only has two children per node), so a 10k row table might have 2 levels whereas a 10B row table might have 4 levels (depending on the width of the rows). So as the table gets ridiculously large it might go to 5 levels or higher, but this only means one extra page read so is probably not noticeable.
When you insert new rows, you can't control where they are inserted in the physical layout of the table, so I assume you mean "end of the table" in terms of using the maximum value being indexed. I know Oracle has some optimisations around leaf block splitting in this case, but I don't know about Postgres.
If it is indexed properly, insert performance may be impacted more than select performance. Indexes in PostgreSQL have a vast number of options, which allow you to index part of a table or the output of an immutable function applied to tuples in the table. Also, the size of the index (assuming it is usable) affects speed far more gradually than the size of the table being scanned does: the biggest difference is between searching a tree and scanning a list. Of course you still have the disk I/O and memory overhead that goes into index usage, so very large indexes don't perform as well as they theoretically could.
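As a small, hedged illustration of those PostgreSQL options (the users table and its columns are hypothetical): a partial index covers only a slice of the table, and an expression index stores the output of an immutable function rather than the raw column.

-- Index only the active users
CREATE INDEX users_active_idx ON users (last_login) WHERE active;

-- Index the lowercased email so lookups on lower(email) can use it
CREATE INDEX users_email_lower_idx ON users (lower(email));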

Index and Insert Operations

I have one job with around 100K records to process. This job truncates the destination tables and then inserts all the records "one at a time" (not a batch insert) into these tables.
I need to know how the indexes will be affected as these records are inserted. Will the cost of maintaining the indexes during the job be greater than the benefit of using them?
Are there any best practices or optimization hints in such situation?
This kind of question can only be answered on a case-by-case basis. However the following general considerations may be of help:
Unless some of the data for the inserts comes from additional lookups and such, no index is useful during the INSERT itself (for this very operation, that is; the indexes may of course be useful for other queries in other sessions/by other users)
[on the other hand...] The presence of indexes on a table slows down the INSERT (or more generally UPDATE or DELETE) operations
The order in which the new records are added may matter
Special consideration is needed when the table has a clustered index
Deciding whether to drop indexes (all of them or some of them) prior to the INSERT operation depends largely on the relative number of records (added vs. already in the table)
INSERT operations often introduce index fragmentation, which is in and of itself an additional incentive to maybe drop the index(es) prior to the data load and then rebuild them.
In general, adding 100,000 records is "small potatoes" for MS-SQL, and unless there is a particular situation such as unusually wide records or the presence of many (and possibly poorly defined) constraints of various kinds, SQL Server should handle this load in minutes rather than hours on most any hardware configuration.
The answer to this question is very different depending on whether the indexes you're talking about are clustered or not. Clustered indexes force SQL Server to store data in sorted order, so if you try to insert a record that doesn't sort to the end of your clustered index, your insert can result in significant reshuffling of your data, as many of your records are moved to make room for your new record.
Nonclustered indexes don't have this problem; all the server has to do is keep track of where the new record is stored. So if your index is clustered (most clustered indexes are primary keys, but this is not required; run "sp_helpindex [TABLENAME]" to find out for sure), you would almost certainly be better off adding the index after all of your inserts are done.
As to the performance of inserts on nonclustered indexes, I can't actually tell you; in my experience, the slowdown hasn't been enough to worry about. The index overhead in this case would be vastly outweighed by the overhead of doing all your inserts one at a time.
Edit: Since you have the luxury of truncating your entire table, performance-wise, you're almost certainly better off dropping (or NOCHECKing) your indexes and constraints before doing all of your inserts, then adding them back in at the end.
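A rough sketch of that approach, with made-up table, column, and index names (assuming the nonclustered index can simply be dropped and re-created):

-- Skip constraint checking and drop the nonclustered index before the load
ALTER TABLE dbo.Destination NOCHECK CONSTRAINT ALL;
DROP INDEX IX_Destination_Lookup ON dbo.Destination;

-- ... truncate the table and run the 100K single-row INSERTs here ...

-- Re-create the index and re-validate the constraints afterwards
CREATE NONCLUSTERED INDEX IX_Destination_Lookup ON dbo.Destination (LookupColumn);
ALTER TABLE dbo.Destination WITH CHECK CHECK CONSTRAINT ALL;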
The insert statement is the only operation that cannot directly benefit from indexing because it has no where clause.
The more indexes a table has, the slower execution becomes.
If there are indexes on the table, the database must make sure the new entry is also found via these indexes. For this reason it has to add the new entry to each and every index on that table. The number of indexes is therefore a multiplier for the cost of an insert statement.