Does anybody have an idea about table fragmentation in SQL Server (not index fragmentation)? We have a table; it is the main table and it doesn't store any data permanently: data comes in and goes out continuously. There is no index on it because only insert and delete statements run against it frequently. Recently we faced a huge delay in responses from this table. If we select anything, it takes 2 to 5 minutes to return a result, even though there is very little data. In the end we dropped and recreated the table, and now it's working fine. I'd appreciate any comments on how this can happen.
Joseph
A table without a clustered index is called a heap. A heap can be fragmented too.
Performance would probably improve if you added a clustered index with an auto-increment primary key. A clustered index does not slow deletes or inserts (to the contrary.) In addition, the scheduled routine maintenance will keep tables with a clustered index defragmented.
If you are selecting parts of the data from the table, then it may well be beneficial to have appropriate indexes on the table.
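As a rough illustration of the suggestion above, a minimal sketch (the table and column names are hypothetical, not taken from the question):
-- Hypothetical example: add an auto-increment column and make it the
-- clustered primary key, so the queue-style heap stays defragmented.
ALTER TABLE dbo.MessageQueue
    ADD QueueId BIGINT IDENTITY(1,1) NOT NULL;

ALTER TABLE dbo.MessageQueue
    ADD CONSTRAINT PK_MessageQueue PRIMARY KEY CLUSTERED (QueueId);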
Related
We have a large data warehouse database where we continuously get new rows inserted into 5 different tables, at the left (edit: right)-hand side of the b-tree (i.e. at the end of the table).
This means that ordinary statistics very quickly get outdated with regard to the new data.
So we've changed our insert procedure to also do a CREATE STATISTICS FST_xxx
with a WHERE clause that covers the latest two million rows.
This has ensured that we do not get incorrect execution plans.
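A sketch of the kind of filtered statistics described above might look roughly like this (the table, column, and boundary value are illustrative, not the poster's actual objects):
-- Illustrative only: filtered statistics covering the newest rows,
-- filtered on the ascending clustered key.
CREATE STATISTICS FST_Fact_Latest
ON dbo.FactTable (EventDate)
WHERE FactId > 998000000;   -- hypothetical boundary, roughly the latest two million rows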
Now we are stranded with hundreds of these.
We have a cleanup job that runs once a day to drop unneeded statistics, but this causes a lot of deadlocks.
Is there a way to disable Filtered Statistics or to drop Filtered Statistics without causing deadlocks?
(edit:) The table is clustered on a Bigint Identity(1,1), ascending.
Can you clarify where the rows are getting inserted? You said left-hand side of the b-tree, but you also said end of the table. Is it correct to assume this a clustered index you're talking about? And whether it's clustered or not, is the index key ascending? Thanks.
Why not try to update the statistics instead of creating a new one every time?
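For reference, refreshing an existing statistics object rather than creating a new one is a one-liner; the names here are illustrative:
-- Illustrative only: update an existing statistics object instead of
-- creating another filtered one on every insert batch.
UPDATE STATISTICS dbo.FactTable FST_Fact_Latest WITH FULLSCAN;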
We have 2 tables that have about 40M rows each. The size of the database is about 20GB, most of it in these 2 tables. Every day we need to delete some data, about 10M rows, so we are using batched deletes to keep the log file within a certain size.
Originally there was no primary key on the tables, but each had a unique clustered index. The delete took forever: about 2-3 hours to delete 500K rows on a virtual machine. (Before the delete, the index was rebuilt.)
Now we have converted the unique clustered index to a primary key, and it takes about 20-30 minutes to delete 2M rows.
I understand there is a difference between a primary key and a unique clustered index, but why is the performance so different?
Does anyone have some insight?
Thanks
Rolling my 8-Ball: if you declared a non-clustered primary key (as your post seems to suggest), then on each batch you would very likely hit the index tipping point. Thus each batch would do a full scan of 40M rows to delete the batch size. Then, on the next batch, another full scan, and so on until your 10M rows were deleted. With a clustered key, the batches should scan only the actual rows being deleted (assuming, of course, that your batch delete criteria actually use the clustered key...). As you can see, there are many unknowns when one starts guessing...
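To make that assumption concrete, a batched delete that actually seeks on the clustered key could look something like this (the table name, key column, batch size, and boundary are purely illustrative):
-- Illustrative only: delete in batches, driving each batch off the
-- clustered key so it does not rescan all 40M rows every time.
DECLARE @cutoff BIGINT = 20000000;  -- hypothetical purge boundary
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (50000) FROM dbo.BigTable
    WHERE ClusteredKeyCol < @cutoff;
    SET @rows = @@ROWCOUNT;
END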
But ultimately... you have a performance question and you should investigate it using the performance troubleshooting techniques. Capture the execution plans, the wait stats, the statistics io. Follow a methodology like Waits and Queues. Measure. Don't listen to guesses from someone on the internet that just rolled an 8-Ball...
You can try removing the index prior to the delete and then re-adding it afterwards. If I'm not mistaken, the index is reorganized after each delete, which takes the extra time.
I imagine it could be something like your index was very fragmented before one delete operation but not before another. How fragmented was the clustered unique index? You could see if there is still a difference in runtime after doing a rebuild on all indexes before the delete with something like ALTER INDEX ALL ON blah REBUILD
What options did you use when creating your unique clustered index (specifically what are the following set to: PAD_INDEX, STATISTICS_NORECOMPUTE, SORT_IN_TEMPDB, IGNORE_DUP_KEY, ALLOW_ROW_LOCKS, and ALLOW_PAGE_LOCKS)?
I have a table in SQL Server 2008 R2 consisting of about 400 rows (pretty much nothing) - it has a clustered index on the primary key (which is an identity). The table is referenced via referential integrity (no cascade delete or update) by about 13 other tables.
Inserts/Updates/Gets are almost instant - we're talking a split second (as should be expected). However, a delete using the PK takes as long as 3 minutes and I've never seen it faster than 1.5 minutes:
DELETE FROM [TABLE] WHERE [TABLE].[PK_WITH_CLUSTERED_INDEX] = 1
The index was heavily fragmented - 90%. I rebuilt and reorganized that index (along with the rest on that table), but I can't get it below 50%.
Additionally, I did a backup/restore of the database to my local PC and I have no issues with deleting - less than a second.
The one thing I have not done is delete the clustered index entirely and re-add it. That, in and of itself, is a problem, because SQL Server does not allow you to drop a PK index while it is referenced by other tables.
Any ideas?
Update
I should have included this in my original post. The execution plan places the 'blame' on the clustered index delete - 70%. Of the 13 tables that reference this table, the execution plan says that none exceeds 3% of the overall query - almost all of them are index seeks.
If you delete a row, the database must check that none of the 13 tables references that row. Are there sufficient indexes on the foreign key columns on those other tables that reference the table you are deleting from?
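If one of the referencing tables turns out to have no index on its foreign key column, adding one is straightforward (the names here are hypothetical):
-- Hypothetical example: index the foreign key column of a referencing
-- table so the delete's referential check becomes a seek instead of a scan.
CREATE NONCLUSTERED INDEX IX_ChildTable_ParentId
    ON dbo.ChildTable (ParentId);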
Well, I have an answer...
First off, I pretty much exhausted all the options indicated in the question above, along with the associated answers. I had no luck with what seemed like a trivial problem.
What I decided to do was the following:
1. Add a temporary unique index (so SQL Server would allow me to delete the clustered index).
2. Delete the clustered index.
3. Re-add the clustered index.
4. Delete the temporary unique index.
Essentially, I wiped and re-added the clustered index. The only thing I'm able to take away from this is that perhaps part of the index or where it was physically stored was 'corrupted' (I use that term loosely).
Maybe the table is locked by another time-consuming process in production.
Another thought, is there a delete trigger on the table? Could it be causing the issue?
Which option is better and faster?
Inserting the data after creating an index on the empty table, or creating a unique index after inserting the data? I have around 10M rows to insert. Which option would be better so that I have the least downtime?
Insert your data first, then create your index.
Every time you do an UPDATE, INSERT or DELETE operation, any indexes on the table have to be updated as well. So if you create the index first, and then insert 10M rows, the index will have to be updated 10M times as well (unless you're doing bulk operations).
It is faster and better to insert the records and then create the index after rows have been imported. It's faster because you don't have the overhead of index maintenance as the rows are inserted and it is better from a fragmentation standpoint on your indexes.
Obviously for a unique index, be sure that the data you are importing is unique so you don't have failures when trying to create the index.
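A minimal sketch of that order of operations, with hypothetical object names and file path:
-- Illustrative only: load the rows first...
BULK INSERT dbo.TargetTable
FROM 'C:\load\data.dat'
WITH (TABLOCK);

-- ...then build the unique index once, after the data is in place.
CREATE UNIQUE CLUSTERED INDEX IX_TargetTable_Key
    ON dbo.TargetTable (KeyColumn)
    WITH (SORT_IN_TEMPDB = ON);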
As others have said, insert first and add the index later. If the table already exists and you have to insert a pile of data like this, drop all indexes and constraints, insert the data, then re-apply first your indexes and then your constraints. You'll certainly want to do intermediate commits to help preclude the possibility that you'll run out of rollback segment space or something similar. If you're inserting this much data it might prove useful to look at using SQL*Loader to save yourself time and aggravation.
I hope this helps.
I'm trying to insert millions of records into a table that has more than 20 indexes.
In the last run it took more than 4 hours per 100,000 rows, and the query was cancelled after 3½ days...
Do you have any suggestions about how to speed this up?
(I suspect the many indexes are the cause. If you also think so, how can I automatically drop the indexes before the operation and then recreate the same indexes afterwards?)
Extra info:
The space used by the indexes is about 4 times the space used by the data alone.
The inserts are wrapped in a transaction per 100,000 rows.
Update on status:
The accepted answer helped me make it much faster.
You can disable and enable the indexes. Note that disabling them can have unwanted side-effects (such as ending up with duplicate primary key or unique index values), which will only be discovered when re-enabling the indexes.
--Disable Index
ALTER INDEX [IXYourIndex] ON YourTable DISABLE
GO
--Enable Index
ALTER INDEX [IXYourIndex] ON YourTable REBUILD
GO
This sounds like a data warehouse operation.
It would be normal to drop the indexes before the insert and rebuild them afterwards.
When you rebuild the indexes, build the clustered index first, and conversely drop it last. They should all have fillfactor 100%.
The code should be something like this:
if object_id('IndexList') is not null drop table IndexList
select name into IndexList from dbo.sysindexes where id = object_id('Fact')
if exists (select name from IndexList where name = 'id1') drop index Fact.id1
if exists (select name from IndexList where name = 'id2') drop index Fact.id2
if exists (select name from IndexList where name = 'id3') drop index Fact.id3
.
.
BIG INSERT
RECREATE THE INDEXES
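The "RECREATE THE INDEXES" step, following the clustered-first / fill factor 100% advice above, might look roughly like this (the index and column names are just placeholders, not the actual Fact table's schema):
-- Illustrative only: clustered index first, then the nonclustered ones,
-- all with a fill factor of 100%.
CREATE CLUSTERED INDEX id1 ON dbo.Fact (DateKey) WITH (FILLFACTOR = 100);
CREATE NONCLUSTERED INDEX id2 ON dbo.Fact (CustomerKey) WITH (FILLFACTOR = 100);
CREATE NONCLUSTERED INDEX id3 ON dbo.Fact (ProductKey) WITH (FILLFACTOR = 100);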
As noted by another answer, disabling the indexes will be a very good start.
4 hours per 100,000 rows
[...]
The inserts are wrapped in a transaction per 100,000 rows.
You should look at reducing that number. The server has to maintain a huge amount of state while in a transaction (so that it can be rolled back), and this (along with the indexes) means adding data is very hard work.
Why not wrap each insert statement in its own transaction?
Also look at the nature of the SQL you are using, are you adding one row per statement (and network roundtrip), or adding many?
Disabling and then re-enabling indices is frequently suggested in those cases. I have my doubts about this approach though, because:
(1) The application's DB user needs schema alteration privileges, which it normally should not possess.
(2) The chosen insert approach and/or index schema might be less than optimal in the first place; otherwise rebuilding complete index trees should not be faster than some decent batch-inserting (e.g. the client issuing one insert statement at a time, causing thousands of server round-trips; or a poor choice of clustered index, leading to constant index node splits).
That's why my suggestions look a little bit different:
Increase ADO.NET BatchSize
Choose the target table's clustered index wisely, so that inserts won't lead to clustered index node splits. Usually an identity column is a good choice.
Let the client insert into a temporary heap table first (heap tables don't have any clustered index); then issue one big insert-into-select statement to push all of that staging table data into the actual target table (see the sketch after this list).
Apply SqlBulkCopy
Decrease transaction logging by choosing bulk-logged recovery model
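A rough sketch of the staging-heap approach from the list above (all object names are made up for illustration):
-- Illustrative only: the client bulk-loads a heap with no indexes...
CREATE TABLE dbo.Staging_Rows
(
    KeyColumn BIGINT NOT NULL,
    Payload   NVARCHAR(200) NULL
);  -- deliberately no clustered index: this table is a heap

-- ...and one set-based statement then moves everything into the real table.
INSERT INTO dbo.TargetTable (KeyColumn, Payload)
SELECT KeyColumn, Payload
FROM dbo.Staging_Rows;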
You might find more detailed information in this article.