I'm writing several hundred or potentially several thousand rows into a set of tables at a time, each of which is heavily indexed both internally and via indexed views.
Generally, the inserts are occurring where the rows inserted will be adjacent in the index.
I expect these inserts to be expensive, but they are really slow. I think part of the performance issue is that the indexes are being updated with each individual INSERT.
Is there a way to tell SQL Server to hold off on updating the indexes until I am finished with my batch of inserts so the index trees will only need to be updated once?
These are executed as separate statements due to needing to show the user a progress bar during save and log any individual issues, but are all coming from the same connection in C#. I can place them in a transaction if needed, though I'd prefer not to.
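If you do end up wrapping the batch in a transaction, a minimal sketch (table and column names are made up for illustration) looks like this. Each statement still executes individually, so you can report progress and log per-row issues, but the log flush happens once at COMMIT instead of once per statement:

```sql
-- Sketch with a hypothetical table dbo.Orders. One explicit transaction
-- around the batch means one durable log flush at COMMIT rather than
-- one per auto-committed INSERT.
BEGIN TRANSACTION;

INSERT INTO dbo.Orders (OrderId, CustomerId) VALUES (1, 100);
INSERT INTO dbo.Orders (OrderId, CustomerId) VALUES (2, 101);
-- ... one statement per row, updating the progress bar after each ...

COMMIT TRANSACTION;
```

Note the index pages are still maintained per statement; the saving here is in log flushes, not index maintenance.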
You are paying the cost of adding those rows to the index one way or another. Not updating the index during the insert would cause an issue with accuracy of concurrent statements - any query on that table that used any of the indexes would not "see" the new rows!
If speed is of the essence, and downtime after the insert isn't a major concern, you can:
Disable non-clustered indexes on the target table
Insert
Rebuild non-clustered indexes
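A minimal sketch of that pattern, assuming a hypothetical table dbo.Target with a non-clustered index IX_Target_Foo. Disabling a non-clustered index keeps its definition but drops its data, and REBUILD repopulates it in a single pass after the load:

```sql
-- Disable the non-clustered index before the batch of inserts.
ALTER INDEX IX_Target_Foo ON dbo.Target DISABLE;

-- ... run the batch of INSERTs here ...

-- Rebuild it once, after all rows are in.
ALTER INDEX IX_Target_Foo ON dbo.Target REBUILD;

-- ALTER INDEX ALL ON dbo.Target DISABLE; would disable every index,
-- but beware: that includes the clustered index, which makes the
-- table inaccessible until it is rebuilt.
```

While the index is disabled, queries that relied on it will fall back to other access paths (or fail, for indexes enforcing constraints), which is the downtime trade-off mentioned above.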
You probably should clarify some more about your table:
How wide is the table?
How many indexes?
How wide are the indexes?
If you have 20 indexes and each index has 5 fields, you are really updating 100 extra fields per row which can get expensive quickly.
Related
The data structure used for indexing in a DB table is a B-Tree (the default, out of B-Tree, R-Tree, and Hash). Since look-ups, deletions, and insertions can all be done in logarithmic time in a B-Tree, why is only reading from an indexed table faster, while writing is slower?
Indexes are only used for speeding up SELECT statements. For INSERT, UPDATE, and DELETE your statements will be slower than normal due to the index needing to be updated as part of the statement.
I should maybe clarify on the UPDATE/DELETE point. It is true that the statements will be slowed down due to the change to the index added to the overhead, however the initial lookup part (WHERE) of the UPDATE and DELETE statement could be sped up due to the index. Basically any place a WHERE clause is used and you reference the indexed fields, the record selection part of that statement should see some increase.
Also, if an UPDATE statement does not alter any of the columns that are part of an index then you should not see any additional slowness as the index is not being updated.
Because indexes require additional disk space. Indexes increase the amount of data that needs to be logged and written to a database, and so they reduce write performance. When a column covered by an index is updated, that index must also be updated. Similarly, any delete or insert requires updating the relevant indexes.
The disk space and write penalties of indexes are precisely why you need to be careful about creating them.
That said, updates to non-indexed columns can have their performance improved with indexes.
This:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
Will be faster than this:
UPDATE Table SET IndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
But the above two will likely both be faster than this in any reasonably sized table:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE NonIndexedKey = 'KeyValue'
Deletes, especially single deletes, can similarly be faster even though the table and the indexes need to be updated. This is simply because the query engine can find the target row(s) faster. That is, it can be faster to read an index, find the row, and remove the row and update the index, instead of scanning the entire table for the correct rows and removing the relevant ones. However, even in this case there is going to be more data to write; it's just that the IO cost of scanning an entire table could be fairly high compared to an index.
Finally, in theory, a clustering key that spreads inserts across multiple disk pages can allow the system to support more concurrent inserts since inserts typically require page locks to function, but that is a somewhat uncommon situation and it may result in worse read performance due to fragmenting your clustered indexes.
INSERT and DELETE have to update every index for the table (and the heap if there's no clustered index) in order to maintain consistency. UPDATEs may get away with updating fewer indexes, depending on which columns are affected by the update (because only those indexes that index/include those columns have to be updated).
A SELECT, on the other hand, is only reading and so, if an index contains all columns required by the SELECT, only that index has to be accessed. We know that the data in that index is accurate precisely because the modification operations are required to maintain that consistency.
I need to know: if we delete some rows (I am talking about SQL Server) from a table which has indexes (clustered or non-clustered, in either case), can that do any damage to the indexes or not? What happens to indexes when we delete rows? Which is better for performance: deleting rows from a table after processing them, or marking them as processed (when we will need to reuse them, say, 20 more times)? Thanks for the answers.
I don't know what you mean by "damage". When you delete rows from the table, the index entries need to be deleted as well. This does not "damage" the index per se. At least, the index continues to be useful.
If you have lots of deletes, updates, and inserts, then over time the index will be fragmented. This does affect performance. At some point it becomes useful to re-build the index for performance purposes. You can read about this in the documentation.
I would not worry about rebuilding the indexes because of a handful of deletes. It takes a bit of work to really fragment an index.
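As a concrete illustration of checking for and fixing fragmentation, here is a minimal sketch using the standard dynamic management function (the table and index names are placeholders):

```sql
-- Check fragmentation for every index on a table.
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id;

-- A common rule of thumb: REORGANIZE between roughly 5% and 30%
-- fragmentation, REBUILD above that.
ALTER INDEX IX_MyTable_Col ON dbo.MyTable REORGANIZE;
-- ALTER INDEX IX_MyTable_Col ON dbo.MyTable REBUILD;
```

REORGANIZE is an online, incremental defragmentation; REBUILD recreates the index from scratch and is the heavier operation.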
My answer is YES.
Indexes are built on the data in the tables and, in short, when data is deleted from the tables, fragmentation levels rise.
A rise in fragmentation levels affects data retrieval in many ways.
I have a table which keeps track of time slots for scheduling. It's a record of "locks" so that 2 users would not book appointments in the same time slot.
We have a deadlock issue in production and I noticed that this table has 5 indexes.
This table is read and written frequently, but it would rarely have more than a few hundred rows in it.
5 indexes seems like overkill for me. I don't think I need any indexes at all for a table with under 1000 rows.
Am I correct in that assumption?
TIA
Quick, rule-of-thumb answer: drop all indexes except the unique index on the primary key, on relatively small tables (typically less than 100,000 rows).
Indexes do not affect locking. Note that every write operation has to update the indexes as well. That can cause a performance issue, but it should not affect locking. If your table has only 1000 rows, you can drop your indexes: a lookup over log(1000) entries is not much faster than scanning 1000 rows.
This table is read and written frequently but it would rarely have more than a few hundred rows in it.
Row locking is a regular situation, and deadlocks can happen sometimes too. Does your DB provider handle deadlocks? It should resolve them itself. The locking probably happens because one transaction updates rows in order A, B while a second transaction updates the same rows in order B, A.
I have one job with around 100K records to process. This job truncates the destination tables, and then insert all the records "one at a time", not a batch insert, in these tables.
I need to know how the indexes will affect performance as these records are inserted. Will the cost of maintaining the indexes during the job outweigh the benefit of using them?
Are there any best practices or optimization hints in such situation?
This kind of question can only be answered on a case-by-case basis. However the following general considerations may be of help:
Unless some of the data for the inserts comes from additional lookups and such, no index is useful during an INSERT (i.e. for this very operation; indexes may of course be useful for other queries in other sessions/by other users...)
[on the other hand...] The presence of indexes on a table slows down INSERT (or more generally UPDATE or DELETE) operations
The order in which the new records are added may matter
Special consideration is needed if the table has a clustered index
Deciding whether to drop indexes (all of them or some of them) prior to the INSERT operation depends much on the relative number of records (added vs. already in the table)
INSERT operations may often introduce index fragmentation, which is in and of itself an additional incentive to maybe drop the index(es) prior to the data load, then rebuild them.
In general, adding 100,000 records is "small potatoes" for MS-SQL, and barring a particular situation such as unusually wide records, or the presence of many (and possibly poorly defined) constraints of various natures, SQL Server should handle this load in minutes rather than hours on almost any hardware configuration.
The answer to this question is very different depending on whether the indexes you're talking about are clustered or not. Clustered indexes force SQL Server to store data in sorted order, so if you try to insert a record that doesn't sort to the end of your clustered index, your insert can result in significant reshuffling of your data, as many of your records are moved to make room for your new record.
Nonclustered indexes don't have this problem; all the server has to do is keep track of where the new record is stored. So if your index is clustered (most clustered indexes are primary keys, but this is not required; run "sp_helpindex [TABLENAME]" to find out for sure), you would almost certainly be better off adding the index after all of your inserts are done.
As to the performance of inserts on nonclustered indexes, I can't actually tell you; in my experience, the slowdown hasn't been enough to worry about. The index overhead in this case would be vastly outweighed by the overhead of doing all your inserts one at a time.
Edit: Since you have the luxury of truncating your entire table, performance-wise, you're almost certainly better off dropping (or NOCHECKing) your indexes and constraints before doing all of your inserts, then adding them back in at the end.
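A sketch of that drop-or-NOCHECK approach (all object names here are placeholders for illustration):

```sql
-- Suspend constraint checking and the non-clustered index before the load.
ALTER TABLE dbo.Staging NOCHECK CONSTRAINT ALL;
ALTER INDEX IX_Staging_Foo ON dbo.Staging DISABLE;

TRUNCATE TABLE dbo.Staging;
-- ... perform the one-at-a-time inserts here ...

-- Rebuild the index in one pass over the finished data.
ALTER INDEX IX_Staging_Foo ON dbo.Staging REBUILD;

-- WITH CHECK re-validates the loaded rows so the constraints
-- come back as trusted, not just re-enabled.
ALTER TABLE dbo.Staging WITH CHECK CHECK CONSTRAINT ALL;
```

The re-check at the end does cost a scan, but it is a single set-based validation rather than 100K per-row checks.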
The INSERT statement is the only operation that cannot directly benefit from indexing, because it has no WHERE clause.
The more indexes a table has, the slower execution becomes.
If there are indexes on the table, the database must make sure the new entry is also found via these indexes. For this reason it has to add the new entry to each and every index on that table. The number of indexes is therefore a multiplier for the cost of an insert statement.
Using SQL Server 2005. This is something I've noticed while doing some performance analysis.
I have a large table with about 100 million rows. I'm comparing the performance of different indexes on the table, to see what the most optimal is for my test scenario which is doing about 10,000 inserts on that table, among other things on other tables. While my test is running, I'm capturing an SQL Profiler trace which I load in to an SQL table when the test has finished so I can analyse the stats.
The first test run after recreating a different set of indexes on the table is very noticeably slower than subsequent runs - typically about 10-15 times slower for the inserts on this table on the first run after the index creation.
Each time, I clear the data and execution plan cache before the test.
What I want to know, is the reason for this initial poorer performance with a newly created set of indexes?
Is there a way I can monitor what is happening to cause this for the first run?
One possibility is that the default fill factor of zero is coming into play.
This means that there's 'no room' in the index to accommodate your inserts. When you insert, a page split in the index is needed, which adds some empty space to store the new index information. As you carry out more inserts, more space is created in the index. After a while the rate of splitting will go down, because your inserts are hitting pages that are not fully filled, so splits are not needed. An insert requiring page splits is more expensive than one that doesn't.
You can set the fill factor when you create the index. It's a classic trade-off between space used and the performance of different operations.
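For example, a minimal sketch (the table and index names are placeholders) that leaves 20% free space in each leaf page, so that early inserts land in existing pages instead of forcing splits:

```sql
-- FILLFACTOR = 80 packs leaf pages only 80% full at build time,
-- leaving headroom for subsequent inserts before splits are needed.
CREATE NONCLUSTERED INDEX IX_BigTable_Col
    ON dbo.BigTable (SomeColumn)
    WITH (FILLFACTOR = 80);

-- The same option can be applied when rebuilding an existing index:
-- ALTER INDEX IX_BigTable_Col ON dbo.BigTable REBUILD WITH (FILLFACTOR = 80);
```

The headroom is not maintained automatically; it only exists until inserts consume it, which matches the observation that the split rate falls off after the first run.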
I'm going to include a link to some Sybase ASE docs, 'cos they are nicely written and mostly applicable to SQL Server too.
Just to clarify:
1) You build an index on a table with 100m pre-existing rows.
2) You insert 10k rows into the table
3) You insert another 10k rows into the table
Step 3 is 10x faster than step 2?
What kind of index is the new index - not clustered, right? Because inserts on a clustered index will cause very different behavior. In addition, is there any significant difference in the profile of the 2 inserts, because depending on the clustered index, they will have different behavior. Typically, it should either have no clustered index or be clustered on an increasing key.