Why is only reading faster in an indexed table and not writing? - sql

The data structure used for indexing in a DB table is typically a B-Tree (the default, out of B-Tree, R-Tree, and Hash). Since look-ups, deletions, and insertions can all be done in logarithmic time in a B-Tree, why is only reading from an indexed table faster, while writing is slower?

Indexes are only used for speeding up SELECT statements. For INSERT, UPDATE, and DELETE your statements will be slower than normal due to the index needing to be updated as part of the statement.
I should clarify the UPDATE/DELETE point. It is true that those statements are slowed down by the overhead of maintaining the index; however, the initial lookup part (the WHERE clause) of an UPDATE or DELETE statement can be sped up by the index. Basically, any place a WHERE clause references the indexed fields, the record-selection part of that statement should see some improvement.
Also, if an UPDATE statement does not alter any of the columns that are part of an index then you should not see any additional slowness as the index is not being updated.
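For example (a minimal sketch; the table, column, and index names here are hypothetical):
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);
-- The index helps locate the rows to change, even though the write itself still pays any index maintenance:
UPDATE dbo.Orders SET Status = 'Cancelled' WHERE CustomerId = 42;  -- Status is not indexed, so no extra index maintenance
DELETE FROM dbo.Orders WHERE CustomerId = 42;                      -- lookup uses the index, but the delete must also remove index entries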

Because indexes require additional disk space. Indexes increase the amount of data that needs to be logged and written to a database. Indexes reduce write performance. When a column covered by an index is updated, that index also must be updated. Similarly, any delete or insert requires updating the relevant indexes.
The disk space and write penalties of indexes are precisely why you need to be careful about creating indexes.
That said, updates to non-indexed columns can have their performance improved with indexes.
This:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
Will be faster than this:
UPDATE Table SET IndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
But the above two will likely both be faster than this in any reasonably sized table:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE NonIndexedKey = 'KeyValue'
Deletes, especially single deletes, can similarly be faster even though the table and the indexes need to be updated. This is simply because the query engine can find the target row(s) faster. That is, it can be faster to read an index, find the row, and remove the row and update the index, instead of scanning the entire table for the correct rows and removing the relevant ones. However, even in this case there is going to be more data to write; it's just that the IO cost of scanning an entire table could be fairly high compared to an index.
Finally, in theory, a clustering key that spreads inserts across multiple disk pages can allow the system to support more concurrent inserts since inserts typically require page locks to function, but that is a somewhat uncommon situation and it may result in worse read performance due to fragmenting your clustered indexes.

INSERT and DELETE have to update every index for the table (and the heap if there's no clustered index) in order to maintain consistency. UPDATEs may get away with updating fewer indexes, depending on which columns have been affected by the update (because only the indexes that index or include those columns have to be updated).
A SELECT, on the other hand, is only reading and so, if an index contains all columns required by the SELECT, only that index has to be accessed. We know that the data in that index is accurate precisely because the modification operations are required to maintain that consistency.
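As a hedged illustration of such a covering index (all names are invented for the example): if the index carries every column the SELECT needs, the engine can answer the query from the index alone.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, Total);   -- non-key columns stored in the index leaf level
-- This query can be satisfied entirely from the index, without touching the base table:
SELECT OrderDate, Total FROM dbo.Orders WHERE CustomerId = 42;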

Related

SQL Server : large data import with clustered index

Performance-wise, does a clustered index help or not when bulk inserting hundreds of millions of rows in a table?
LE: after the INSERTs I have to put the database into production, so I will have to create one or more indexes.
A clustered index specifies that the data is ordered on the data pages.
When you are inserting data, the new data has to be sorted and compared to existing values. This is going to incur overhead.
The one exception is when you have an identity column -- that is being generated during the insert. Then the database knows that the new data goes "at the end" of the table.
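A sketch of that "at the end" case (the table here is hypothetical): with an IDENTITY column as the clustered primary key, each new row sorts after the existing ones, so inserts do not reshuffle existing pages.
CREATE TABLE dbo.ImportTarget (
    Id      INT IDENTITY(1,1) PRIMARY KEY,  -- becomes the clustered index by default; ever-increasing key
    Payload NVARCHAR(200)
);
INSERT INTO dbo.ImportTarget (Payload) VALUES (N'row 1');  -- always lands at the logical end of the clustered index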
Indexes are meant for speeding up retrieval (SELECT) of rows. They only have a negative effect on INSERT, DELETE, and UPDATE. And, in your case, if INSERT is the predominant operation to be performed in your system, don't go for indexes at all. Even in your production system, assess the ratio between retrieval operations and insert/update operations, and if it turns out that retrieval operations will be dominant, then you can think about indexes.
Note: Whenever we define a Primary Key on a table, a basic index structure is already created for that table. So, without any specific need for retrieval optimization, there is no actual need to design and implement indexes.
You can read more here: https://www.geeksforgeeks.org/sql-indexes/
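To see the index that a primary key creates automatically, you can query the catalog views (a sketch; replace dbo.YourTable with your own table name):
SELECT i.name, i.type_desc, i.is_primary_key
FROM   sys.indexes AS i
WHERE  i.object_id = OBJECT_ID('dbo.YourTable');   -- lists the index backing the PRIMARY KEY, among others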

Does deleting rows from a table disrupt indexes?

I need to know whether deleting some rows (I am talking about SQL Server) from a table that has indexes (clustered or non-clustered, for both situations) can damage the indexes or not. What happens to indexes when we delete rows? Which is better for performance: deleting rows from a table after processing them, or marking them as processed (when we will need to reuse them something like 20 times more)? Thanks for the answers.
I don't know what you mean by "damage". When you delete rows from the table, the index entries need to be deleted as well. This does not "damage" the index per se. At least, the index continues to be useful.
If you have lots of deletes, updates, and inserts, then over time the index will be fragmented. This does affect performance. At some point it becomes useful to re-build the index for performance purposes. You can read about this in the documentation.
I would not worry about rebuilding the indexes because of a handful of deletes. It takes a bit of work to really fragment an index.
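If you do want to check and fix fragmentation later, a sketch using the standard DMV (object and index names are placeholders; the reorganize-vs-rebuild choice is only a common rule of thumb):
-- Check fragmentation per index on one table
SELECT i.name, ips.avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'LIMITED') AS ips
JOIN   sys.indexes AS i ON i.object_id = ips.object_id AND i.index_id = ips.index_id;
-- Light fragmentation: reorganize; heavy fragmentation: rebuild
ALTER INDEX IX_YourIndex ON dbo.YourTable REORGANIZE;
ALTER INDEX IX_YourIndex ON dbo.YourTable REBUILD;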
My answer is YES.
An index is created on the data in the tables, and in short, if data is deleted from the tables, then the level of fragmentation rises.
A rise in fragmentation levels affects data retrieval in many ways.

What are the disadvantages to Indexes in database tables?

Is there any reason I shouldn't create an index for every one of my database tables, as a means of increasing performance? It would seem there must be some reason(s) else all tables would automatically have one by default.
I use MS SQL Server 2016.
One index on a table is not a big deal. You automatically have an index on columns (or combinations of columns) that are primary keys or declared as unique.
There is some overhead to an index. The index itself occupies space on disk and memory (when used). So, if space or memory are issues then too many indexes could be a problem. When data is inserted/updated/deleted, then the index needs to be maintained as well as the original data. This slows down updates and locks the tables (or parts of the tables), which can affect query processing.
A small number of indexes on each table is reasonable. These should be designed with the typical query load in mind. If you index every column in every table, then data modifications would slow down. If your data is static, then this is not an issue. However, eating up all the memory with indexes could be an issue.
Advantage of having an index
Read speed: Faster SELECT when that column is in WHERE clause
Disadvantages of having an index
Space: Additional disk/memory space needed
Write speed: Slower INSERT / UPDATE / DELETE
As a minimum I would normally recommend having at least one index per table; this would be created automatically on your table's primary key, for example an IDENTITY column. Foreign keys would also normally benefit from an index, but this needs to be created manually. Other columns that are frequently included in WHERE clauses should be indexed, especially if they contain many unique values. The benefit of indexing a low-cardinality column such as gender, which has only a couple of values, is debatable.
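For example (a minimal sketch; the table and column names are made up), a foreign key column does not get an index automatically in SQL Server, so you create one yourself:
CREATE NONCLUSTERED INDEX IX_OrderLines_OrderId ON dbo.OrderLines (OrderId);  -- OrderId is the FK column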
Most of the tables in my databases have between 1 and 4 indexes, depending on the data in the table and how this data is retrieved.

Batch index updates?

I'm writing several hundred or potentially several thousand rows into a set of tables at a time, each of which is heavily indexed both internally and via indexed views.
Generally, the inserts are occurring where the rows inserted will be adjacent in the index.
I expect these inserts to be expensive, but they are really slow. I think part of the performance issue is that the indexes are being updated with each individual INSERT.
Is there a way to tell SQL Server to hold off on updating the indexes until I am finished with my batch of inserts so the index trees will only need to be updated once?
These are executed as separate statements due to needing to show the user a progress bar during save and log any individual issues, but are all coming from the same connection in C#. I can place them in a transaction if needed, though I'd prefer not to.
You are paying the cost of adding those rows to the index one way or another. Not updating the index during the insert would cause an issue with accuracy of concurrent statements - any query on that table that used any of the indexes would not "see" the new rows!
If speed is of the essence, and downtime after the insert isn't a major concern, you can (see the sketch after this list):
Disable non-clustered indexes on the target table
Insert
Rebuild non-clustered indexes
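A minimal T-SQL sketch of that approach (the index and table names are placeholders; note that a disabled non-clustered index cannot be used by queries until it is rebuilt):
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable DISABLE;   -- stop maintaining it during the load
-- ... run the batch of INSERT statements here ...
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable REBUILD;   -- rebuilding re-enables the index in one pass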
You probably should clarify some more about your table:
How wide is the table?
How many indexes?
How wide are the indexes?
If you have 20 indexes and each index has 5 fields, you are really updating 100 extra fields per row which can get expensive quickly.
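To get a quick inventory of how many indexes a table has and how many columns each one carries, a sketch against the standard catalog views (replace dbo.YourTable with your table):
SELECT i.name AS index_name, COUNT(ic.column_id) AS key_and_included_columns
FROM   sys.indexes AS i
JOIN   sys.index_columns AS ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
WHERE  i.object_id = OBJECT_ID('dbo.YourTable')
GROUP BY i.name;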

Index and Insert Operations

I have one job with around 100K records to process. This job truncates the destination tables and then inserts all the records "one at a time" (not a batch insert) into these tables.
I need to know how the indexes will be affected as these records are inserted. Will the cost of updating the indexes during the job be more than the benefit of using them?
Are there any best practices or optimization hints in such situation?
This kind of question can only be answered on a case-by-case basis. However the following general considerations may be of help:
Unless some of the data for the inserts comes from additional lookups and such, no index is useful during an INSERT (i.e. for this particular operation; indexes may of course be useful for other queries in other sessions/users).
[on the other hand...] The presence of indexes on a table slows down the INSERT (or more generally UPDATE or DELETE) operations
The order in which the new records are added may matter
Special consideration should be given to whether the table has a clustered index
Deciding whether to drop indexes (all of them or some of them) prior to the INSERT operation depends largely on the relative number of records (rows being added vs. rows already in the table)
INSERT operations may often introduce index fragmentation, which is in and of itself an additional incentive to maybe drop the index(es) prior to the data load and then rebuild them.
In general, adding 100,000 records is "small potatoes" for MS-SQL, and unless there is a particular situation such as unusually wide records or the presence of many (and possibly poorly defined) constraints of various kinds, SQL Server should handle this load in minutes rather than hours on almost any hardware configuration.
The answer to this question is very different depending on whether the indexes you're talking about are clustered or not. Clustered indexes force SQL Server to store data in sorted order, so if you try to insert a record that doesn't sort to the end of your clustered index, your insert can result in significant reshuffling of your data, as many of your records are moved to make room for your new record.
Nonclustered indexes don't have this problem; all the server has to do is keep track of where the new record is stored. So if your index is clustered (most clustered indexes are primary keys, but this is not required; run "sp_helpindex [TABLENAME]" to find out for sure), you would almost certainly be better off adding the index after all of your inserts are done.
As to the performance of inserts on nonclustered indexes, I can't actually tell you; in my experience, the slowdown hasn't been enough to worry about. The index overhead in this case would be vastly outweighed by the overhead of doing all your inserts one at a time.
Edit: Since you have the luxury of truncating your entire table, performance-wise, you're almost certainly better off dropping (or NOCHECKing) your indexes and constraints before doing all of your inserts, then adding them back in at the end.
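A hedged sketch of that pattern (object names are placeholders; re-checking constraints afterwards with WITH CHECK keeps them trusted):
ALTER TABLE dbo.Destination NOCHECK CONSTRAINT ALL;           -- skip FK/CHECK constraint checks during the load
ALTER INDEX IX_Destination_SomeColumn ON dbo.Destination DISABLE;
-- ... TRUNCATE TABLE dbo.Destination; then run the 100K inserts ...
ALTER INDEX IX_Destination_SomeColumn ON dbo.Destination REBUILD;
ALTER TABLE dbo.Destination WITH CHECK CHECK CONSTRAINT ALL;  -- re-validate so the constraints stay trusted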
The insert statement is the only operation that cannot directly benefit from indexing because it has no where clause.
The more indexes a table has, the slower execution becomes.
If there are indexes on the table, the database must make sure the new entry is also found via these indexes. For this reason it has to add the new entry to each and every index on that table. The number of indexes is therefore a multiplier for the cost of an insert statement.