Removing composite PK and adding new index is incredibly slow

I currently have a composite PK (clustered) consisting of 3 columns, let's call them A, B and C, all needed to ensure uniqueness. Due to external factors I need to modify this table by removing the current PK and adding a new index on a new column instead.
This is done with the standard statements:
ALTER TABLE Table_Name DROP CONSTRAINT PK_Name
CREATE INDEX Index_Name ON Table_Name (NewColumn)
The problem is that the table is huge (some 70 million rows), and dropping the current PK and then adding the new index takes over an hour. Is there a more efficient way to do this?
The table only has its composite PK, so there are no nonclustered indexes (NCIs), foreign keys, or other dependencies to worry about. I am working on SQL Server 2008.

One alternative is to create a new table with the right constraints and indexes, then copy over all the rows. At the end, you drop the old table and sp_rename the new one. The rename is fast, which reduces downtime.
You might have to put some thought into rows that are added during the copy operation. One way to deal with that is to rename the old table, copy any new rows over, and only after that rename the new table. This still results in a much shorter downtime than altering the table in place.
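A minimal T-SQL sketch of the copy-and-rename approach follows. All object names are hypothetical (dbo.BigTable stands in for Table_Name, A/B/C for the old key columns, NewColumn for the new index key); it builds a tiny demo table so it can run on its own in a scratch database, and it ignores rows written during the copy. The WITH (TABLOCK) hint into an empty heap is what allows minimal logging under the SIMPLE or BULK_LOGGED recovery model; under FULL recovery the copy is fully logged.

```sql
-- Hypothetical demo objects; run in a scratch database.
SET NOCOUNT ON;
IF OBJECT_ID('dbo.BigTable_Old', 'U') IS NOT NULL DROP TABLE dbo.BigTable_Old;
IF OBJECT_ID('dbo.BigTable_New', 'U') IS NOT NULL DROP TABLE dbo.BigTable_New;
IF OBJECT_ID('dbo.BigTable', 'U') IS NOT NULL DROP TABLE dbo.BigTable;

-- Stand-in for the existing table with its clustered composite PK.
CREATE TABLE dbo.BigTable (
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    NewColumn INT NOT NULL,
    CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (A, B, C)
);
INSERT INTO dbo.BigTable (A, B, C, NewColumn)
SELECT 1, 1, 1, 10 UNION ALL
SELECT 1, 1, 2, 20 UNION ALL
SELECT 2, 1, 1, 30;

-- 1. New table with the target structure: no PK, a heap for now.
CREATE TABLE dbo.BigTable_New (
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    NewColumn INT NOT NULL
);

-- 2. Copy the rows (TABLOCK into an empty heap permits minimal logging
--    under the SIMPLE or BULK_LOGGED recovery model).
INSERT INTO dbo.BigTable_New WITH (TABLOCK) (A, B, C, NewColumn)
SELECT A, B, C, NewColumn
FROM dbo.BigTable;

-- 3. Build the new index once, after the load.
CREATE INDEX IX_BigTable_NewColumn ON dbo.BigTable_New (NewColumn);

-- 4. Swap the names; only this short step needs the downtime window.
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
COMMIT TRANSACTION;

-- 5. Drop the old table once the new one has been verified.
DROP TABLE dbo.BigTable_Old;
```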

Related

Dropping a global temp table does not drop its index

Problem summary:
I created a global temp table and then created a clustered index on it (by adding a primary key clustered constraint). I dropped the table, assuming that this would also drop the index. I then recreated the table with the same name without problems. Then, when I tried to recreate the same index with the same name, I got the error:
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.##MyTempTable' and the index name 'PK_TempSampleID'.
Problem details:
I need to create a global temp table and load records into this table in a particular order using:
SELECT * INTO ##MyTempTable FROM SomeTable WHERE SomeCondition
ORDER BY does not do the job here, so I created a clustered index (primary key clustered constraint). This did the trick and my records were ordered by the respective field.
However, I encountered a different problem: I dropped the table and recreated it without problems, but when I tried to recreate the index I got the above-mentioned error.
I tried dropping the index using:
DROP INDEX PK_TempSampleID ON ##MyTempTable
And I got this error message:
Cannot drop the index '##TempFormattedSnapshot.PK_TempSampleID', because it does not exist or you do not have permission.
I researched posts on this and other forums, and all of them seem to say that dropping a temp table also drops the indexes on that table. My experience suggests otherwise.

The error message
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.##MyTempTable' and the index name 'PK_TempSampleID'.
means that there are duplicate values in the column(s) on which you are trying to create a unique index. In other words, not all values are unique.
It has nothing to do with the previously existing index of the same name.
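To confirm the duplicate-key diagnosis, a query like the following lists the offending values before the constraint is added. This is a minimal sketch: SampleID is an assumed key column (guessed from the constraint name PK_TempSampleID), the demo rows are made up, and the ROW_NUMBER() delete keeps one row per key chosen by ORDER BY Payload, which may not be the right rule for real data.

```sql
-- Hypothetical demo: SampleID is assumed to be the intended key column.
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..##MyTempTable') IS NOT NULL DROP TABLE ##MyTempTable;

CREATE TABLE ##MyTempTable (SampleID INT NOT NULL, Payload VARCHAR(20) NULL);
INSERT INTO ##MyTempTable (SampleID, Payload)
SELECT 1, 'a' UNION ALL
SELECT 2, 'b' UNION ALL
SELECT 2, 'b-dup';   -- duplicate key value that makes the PK fail

-- 1. Find the key values that occur more than once.
SELECT SampleID, COUNT(*) AS Occurrences
FROM ##MyTempTable
GROUP BY SampleID
HAVING COUNT(*) > 1;

-- 2. One possible fix: keep a single row per SampleID
--    (the row that survives is decided by the ORDER BY).
WITH Ranked AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY SampleID ORDER BY Payload) AS rn
    FROM ##MyTempTable
)
DELETE FROM Ranked WHERE rn > 1;

-- 3. Now the unique clustered constraint can be created.
ALTER TABLE ##MyTempTable
    ADD CONSTRAINT PK_TempSampleID PRIMARY KEY CLUSTERED (SampleID);
```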
When you drop a table, all indexes on that table are dropped as well.
You have another problem, which is out of the scope of this question. You said "load records into this table in a particular order". You'd better ask another question explaining what you are trying to achieve, because there is no such thing as "inserting rows into a table in a particular order". The only thing that you can do is generate IDENTITY values in a particular order when inserting rows.
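To illustrate the IDENTITY point, here is a minimal sketch with hypothetical global temp tables (##SourceRows, ##OrderedRows) and a hypothetical SortKey column. It relies on SQL Server's documented behavior that INSERT ... SELECT ... ORDER BY assigns IDENTITY values in ORDER BY order; the physical order in which rows are stored is still not guaranteed, so reading them back in that order requires ORDER BY Seq.

```sql
-- Hypothetical demo tables; SortKey stands in for the asker's ordering column.
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..##SourceRows') IS NOT NULL DROP TABLE ##SourceRows;
IF OBJECT_ID('tempdb..##OrderedRows') IS NOT NULL DROP TABLE ##OrderedRows;

CREATE TABLE ##SourceRows (SortKey INT NOT NULL, Payload VARCHAR(20) NULL);
INSERT INTO ##SourceRows (SortKey, Payload)
SELECT 30, 'c' UNION ALL
SELECT 10, 'a' UNION ALL
SELECT 20, 'b';

-- Target table with an IDENTITY column that records the desired order.
CREATE TABLE ##OrderedRows (
    Seq INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SortKey INT NOT NULL,
    Payload VARCHAR(20) NULL
);

-- IDENTITY values are assigned following the ORDER BY.
INSERT INTO ##OrderedRows (SortKey, Payload)
SELECT SortKey, Payload
FROM ##SourceRows
ORDER BY SortKey;

-- Reading the rows back in that order still needs an explicit ORDER BY.
SELECT Seq, SortKey, Payload
FROM ##OrderedRows
ORDER BY Seq;
```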
Anyway, this is not the point of this question.

Checking foreign key constraint "online"

If we have a giant fact table and want to add a new dimension, we can do it like this:
BEGIN TRANSACTION
ALTER TABLE [GiantFactTable]
ADD NewDimValueId INT NOT NULL
CONSTRAINT [temp_DF_NewDimValueId] DEFAULT (-1)
WITH VALUES -- table is not actually rebuilt!
ALTER TABLE [GiantFactTable]
WITH NOCHECK
ADD CONSTRAINT [FK_GiantFactTable_NewDimValue]
FOREIGN KEY ([NewDimValueId])
REFERENCES [NewDimValue] ([Id])
-- drop the default constraint, new INSERTs will specify a value for NewDimValueId column
ALTER TABLE [GiantFactTable]
DROP CONSTRAINT [temp_DF_NewDimValueId]
COMMIT TRANSACTION
NB: all of the above only manipulate table metadata and should be fast regardless of table size.
Then we can run a job to backfill GiantFactTable.NewDimValueId in small transactions, such that the FK is not violated. (At this point any INSERTs/UPDATEs, e.g. the backfill operation, are verified by the FK since it is enabled, but the FK is not "trusted".)
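A minimal sketch of such a batched backfill follows. The SourceCode column and the Code lookup column on NewDimValue are hypothetical, since the question does not say how the new dimension value is derived, and the tiny batch size is only for the demo. Each UPDATE runs in its own autocommit transaction, and the enabled (but untrusted) FK validates every value it writes.

```sql
-- Hypothetical demo: SourceCode is an assumed fact column from which the
-- new dimension key can be looked up via NewDimValue.Code.
SET NOCOUNT ON;
IF OBJECT_ID('dbo.GiantFactTable', 'U') IS NOT NULL DROP TABLE dbo.GiantFactTable;
IF OBJECT_ID('dbo.NewDimValue', 'U') IS NOT NULL DROP TABLE dbo.NewDimValue;

CREATE TABLE dbo.NewDimValue (Id INT NOT NULL PRIMARY KEY, Code VARCHAR(10) NOT NULL UNIQUE);
INSERT INTO dbo.NewDimValue (Id, Code) SELECT 1, 'X' UNION ALL SELECT 2, 'Y';

CREATE TABLE dbo.GiantFactTable (
    FactId BIGINT NOT NULL PRIMARY KEY CLUSTERED,
    SourceCode VARCHAR(10) NOT NULL,
    NewDimValueId INT NOT NULL
);
-- Existing rows carry the placeholder -1, as after the DEFAULT (-1) WITH VALUES step.
INSERT INTO dbo.GiantFactTable (FactId, SourceCode, NewDimValueId)
SELECT 1, 'X', -1 UNION ALL SELECT 2, 'Y', -1 UNION ALL
SELECT 3, 'X', -1 UNION ALL SELECT 4, 'Y', -1 UNION ALL SELECT 5, 'X', -1;

-- FK added WITH NOCHECK: enabled, but existing rows are not validated.
ALTER TABLE dbo.GiantFactTable WITH NOCHECK
    ADD CONSTRAINT FK_GiantFactTable_NewDimValue
    FOREIGN KEY (NewDimValueId) REFERENCES dbo.NewDimValue (Id);

DECLARE @BatchSize INT;
DECLARE @Rows INT;
SET @BatchSize = 2;   -- tiny for the demo; thousands in practice
SET @Rows = 1;

-- Each iteration updates at most @BatchSize placeholder rows and commits.
WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) f
    SET NewDimValueId = d.Id
    FROM dbo.GiantFactTable AS f
    JOIN dbo.NewDimValue AS d ON d.Code = f.SourceCode
    WHERE f.NewDimValueId = -1;

    SET @Rows = @@ROWCOUNT;
END;
```

Rows whose SourceCode has no match are simply left at -1, so the loop still terminates; they would need separate handling before the FK can be trusted.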
After the backfill we know the data is consistent. My question is: how can the SQL engine be made aware of that too, without taking the table offline?
This command will make the FK trusted, but it requires a schema modification (Sch-M) lock and will likely take hours (days?), keeping the table offline the whole time:
ALTER TABLE [GiantFactTable]
WITH CHECK CHECK CONSTRAINT [FK_GiantFactTable_NewDimValue]
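A sketch of how to inspect the constraint's state and pre-check the data with an ordinary query before running the expensive re-validation. The demo objects mirror the question's names, but the data is made up. The orphan query only gives you confidence; it does not change the is_not_trusted flag, which is what the ALTER TABLE ... WITH CHECK CHECK CONSTRAINT statement updates.

```sql
-- Hypothetical demo objects named after the question.
SET NOCOUNT ON;
IF OBJECT_ID('dbo.GiantFactTable', 'U') IS NOT NULL DROP TABLE dbo.GiantFactTable;
IF OBJECT_ID('dbo.NewDimValue', 'U') IS NOT NULL DROP TABLE dbo.NewDimValue;

CREATE TABLE dbo.NewDimValue (Id INT NOT NULL PRIMARY KEY);
INSERT INTO dbo.NewDimValue (Id) SELECT 1 UNION ALL SELECT 2;

CREATE TABLE dbo.GiantFactTable (
    FactId BIGINT NOT NULL PRIMARY KEY CLUSTERED,
    NewDimValueId INT NOT NULL
);
INSERT INTO dbo.GiantFactTable (FactId, NewDimValueId)
SELECT 1, 1 UNION ALL SELECT 2, 2 UNION ALL SELECT 3, 1;

-- State after the question's steps: FK enabled but added WITH NOCHECK.
ALTER TABLE dbo.GiantFactTable WITH NOCHECK
    ADD CONSTRAINT FK_GiantFactTable_NewDimValue
    FOREIGN KEY (NewDimValueId) REFERENCES dbo.NewDimValue (Id);

-- 1. Inspect the FK: is_disabled = 0, is_not_trusted = 1 at this point.
SELECT name, is_disabled, is_not_trusted
FROM sys.foreign_keys
WHERE name = 'FK_GiantFactTable_NewDimValue';

-- 2. Look for orphans with an ordinary query (no Sch-M lock needed,
--    but this does not change the trusted flag).
SELECT TOP (1) f.FactId
FROM dbo.GiantFactTable AS f
WHERE NOT EXISTS (SELECT 1 FROM dbo.NewDimValue AS d WHERE d.Id = f.NewDimValueId);

-- 3. Re-validate and mark the FK trusted (the step the question finds too slow).
ALTER TABLE dbo.GiantFactTable
    WITH CHECK CHECK CONSTRAINT FK_GiantFactTable_NewDimValue;
```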
About the workload: the table has a few hundred partitions (a fixed number), and data is appended to one partition at a time (in a round-robin fashion) and never deleted. There is also a constant read workload that uses the clustering key to get a (relatively small) range of rows from one partition at a time.
Checking one partition at a time, taking it offline, would be acceptable. But I can't find any syntax to do this. Any other ideas?

A few ideas come to mind but they aren't pretty:
Redirect workloads and run check constraint offline
1. Create a new table with the same structure.
2. Change the "insert" workload to insert into the new table.
3. Copy the data from the partition used by the "read" workload to the new table (or a third table with the same structure).
4. Change the "read" workload to use the new table.
5. Run ALTER TABLE to check the constraint and let it take as long as it needs.
6. Change both workloads back to the main table.
7. Insert the new rows back into the main table.
8. Drop the new table(s).
A variation on the above is to switch the relevant partition to the new table in step 3. That should be faster than copying the data but I think you will have to copy (and not just switch) the data back after the constraint has been checked.
Insert all the data into a new table
1. Create a new table with the same structure and constraint enabled.
2. Change the "insert" workload to the new table.
3. Copy all the data from old to new table in batches and wait as long as it takes to complete.
4. Change the "read" workload to the new table. If step 3 takes too long and the "read" workload needs rows that have only been inserted into the new table, you will have to manage this changeover manually.
5. Drop old table.
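The batched copy in step 3 could look roughly like this minimal sketch. FactOld, FactNew and DimValue are hypothetical stand-ins, FactId is assumed to be the clustering key, and the loop assumes the redirected insert workload never writes FactId values that also exist in the old table. Because the FK is declared on the new table before any rows arrive, it stays trusted and each batch is validated as it is inserted.

```sql
-- Hypothetical demo objects; FactId is assumed to be the clustering key.
SET NOCOUNT ON;
IF OBJECT_ID('dbo.FactNew', 'U') IS NOT NULL DROP TABLE dbo.FactNew;
IF OBJECT_ID('dbo.FactOld', 'U') IS NOT NULL DROP TABLE dbo.FactOld;
IF OBJECT_ID('dbo.DimValue', 'U') IS NOT NULL DROP TABLE dbo.DimValue;

CREATE TABLE dbo.DimValue (Id INT NOT NULL PRIMARY KEY);
INSERT INTO dbo.DimValue (Id) SELECT 1 UNION ALL SELECT 2;

CREATE TABLE dbo.FactOld (
    FactId BIGINT NOT NULL PRIMARY KEY CLUSTERED,
    NewDimValueId INT NOT NULL
);
INSERT INTO dbo.FactOld (FactId, NewDimValueId)
SELECT 1, 1 UNION ALL SELECT 2, 2 UNION ALL SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL SELECT 5, 1;

-- New table: the FK exists before any data, so it is trusted from the start.
CREATE TABLE dbo.FactNew (
    FactId BIGINT NOT NULL PRIMARY KEY CLUSTERED,
    NewDimValueId INT NOT NULL
        CONSTRAINT FK_FactNew_DimValue REFERENCES dbo.DimValue (Id)
);

DECLARE @BatchSize INT, @FromId BIGINT, @ToId BIGINT;
SET @BatchSize = 2;   -- tiny for the demo
SELECT @FromId = MIN(FactId) FROM dbo.FactOld;   -- NULL if the old table is empty

WHILE @FromId IS NOT NULL
BEGIN
    -- Upper bound of this batch: the @BatchSize-th key starting at @FromId.
    SELECT @ToId = MAX(FactId)
    FROM (SELECT TOP (@BatchSize) FactId
          FROM dbo.FactOld
          WHERE FactId >= @FromId
          ORDER BY FactId) AS b;

    -- Each INSERT commits on its own; the FK checks every copied row.
    INSERT INTO dbo.FactNew (FactId, NewDimValueId)
    SELECT FactId, NewDimValueId
    FROM dbo.FactOld
    WHERE FactId BETWEEN @FromId AND @ToId;

    -- Next batch starts after the last copied key (NULL ends the loop).
    SELECT @FromId = MIN(FactId) FROM dbo.FactOld WHERE FactId > @ToId;
END;
```

Bounding each batch by keys from the old table, rather than by MAX(FactId) in the new one, keeps concurrent inserts into the new table from making the loop skip old rows.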
Use index to speed up constraint check?
I have no idea whether this works, but you can try creating a nonclustered index on the foreign key column. Also make sure there's an index on the relevant unique key of the table referenced by the foreign key. The ALTER TABLE command might be able to use them to speed up the check (at least by minimizing IO compared to a full table scan). The indexes, of course, can be created online (ONLINE = ON, an Enterprise Edition feature) to avoid any disruption.
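A sketch of the suggested index, using hypothetical demo tables named after the question's objects. ONLINE = ON is only available in Enterprise Edition (and Developer Edition), and whether ALTER TABLE ... WITH CHECK CHECK CONSTRAINT actually benefits from the index is, as the answer says, untested.

```sql
-- Hypothetical demo tables named after the question's objects.
SET NOCOUNT ON;
IF OBJECT_ID('dbo.GiantFactTable', 'U') IS NOT NULL DROP TABLE dbo.GiantFactTable;
IF OBJECT_ID('dbo.NewDimValue', 'U') IS NOT NULL DROP TABLE dbo.NewDimValue;

-- The referenced key is already indexed: the PRIMARY KEY creates a unique index.
CREATE TABLE dbo.NewDimValue (Id INT NOT NULL PRIMARY KEY);

CREATE TABLE dbo.GiantFactTable (
    FactId BIGINT NOT NULL PRIMARY KEY CLUSTERED,
    NewDimValueId INT NOT NULL
);

-- Narrow nonclustered index on the FK column, built while the table stays
-- available (ONLINE = ON requires Enterprise or Developer Edition).
CREATE NONCLUSTERED INDEX IX_GiantFactTable_NewDimValueId
    ON dbo.GiantFactTable (NewDimValueId)
    WITH (ONLINE = ON);
```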

What is faster in access: alter + update or create?

I need to add a new column to a table. I wonder whether it is faster to run an ALTER TABLE query to add the new column and then an UPDATE query to fill in the data in that column, compared to creating a new table.
I suppose I could just try both to see which is faster, but maybe someone could explain why?

From a speed point of view:
It's faster to add a single column than to re-create the whole table.
From a data consistency point of view:
A table probably has relationships with other tables in the database (other tables may reference it through foreign keys), so if you re-create the table you must also script the changes needed to update those references to it.
I hope I've answered your question completely. Have a nice day.

Slow progress when adding sequential identity column

We have an 8-million-row table and we need to add a sequential ID column to it. It is used for data warehousing.
From testing, we know that if we remove all the indexes, including the primary key index, adding the new sequential ID column is about 10x faster. I still haven't figured out why dropping the indexes would help when adding an identity column.
Here is the SQL that adds the identity column:
ALTER TABLE MyTable ADD MyTableSeqId BIGINT IDENTITY(1,1)
However, the table in question has dependencies, so I cannot drop the primary key index unless I remove all the FK constraints. As a result, adding the identity column is slow.
Is there another way to speed up adding an identity column, so that client downtime is minimal?
or
Is there a way to add an identity column without locking the table, so that the table can be accessed, or at least queried?
The database is SQL Server 2005 Standard Edition.

Adding a new column to a table will acquire a Sch-M (schema modification) lock, which prevents all access to the table for the duration of the operation.
You may get some benefit from switching the database to the bulk-logged or simple recovery model for the duration of the operation, but of course, do so only if you're aware of the effects this will have on your backup / restore strategy.

Create Index on partial CHAR Column

I have a CHAR(250) column being used as a foreign key to a varchar(24) column.
In MySQL I recall that I could create an index specifying column(24) in order to create an index on the leftmost 24 characters. This doesn't appear to be possible on MS SQL Server.
My question is this:
Is it possible to use an indexed view on SQL Server 2008 to index a substring of that column, and if so, would it have any side-effects on the table's performance?

You can create a persisted computed column and then index it; see "Creating Indexes on Computed Columns" in the SQL Server documentation:
alter table [table] add newcolumn as cast(oldcolumn as varchar(24)) persisted;
create index table_newcolumn on [table] (newcolumn);
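A self-contained variant of the same idea, with hypothetical tables dbo.VendorData (holding the CHAR(250) column) and dbo.CodeList (holding the VARCHAR(24) key), showing a join on the indexed prefix. The SET options at the top are the ones SQL Server requires for indexes on computed columns; some clients, such as sqlcmd, default QUOTED_IDENTIFIER to OFF. The GO separator is for SSMS or sqlcmd.

```sql
-- Session settings required for indexes on computed columns.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
SET NOCOUNT ON;

IF OBJECT_ID('dbo.VendorData', 'U') IS NOT NULL DROP TABLE dbo.VendorData;
IF OBJECT_ID('dbo.CodeList', 'U') IS NOT NULL DROP TABLE dbo.CodeList;

CREATE TABLE dbo.CodeList (Code VARCHAR(24) NOT NULL PRIMARY KEY);
CREATE TABLE dbo.VendorData (
    VendorId INT NOT NULL PRIMARY KEY,
    VendorCode CHAR(250) NOT NULL
);

-- Persisted computed column with the leftmost 24 characters (CAST truncates silently).
ALTER TABLE dbo.VendorData
    ADD CodePrefix AS CAST(VendorCode AS VARCHAR(24)) PERSISTED;
GO
CREATE INDEX IX_VendorData_CodePrefix ON dbo.VendorData (CodePrefix);

INSERT INTO dbo.CodeList (Code) VALUES ('ABC');
INSERT INTO dbo.VendorData (VendorId, VendorCode) VALUES (1, 'ABC');
INSERT INTO dbo.VendorData (VendorId, VendorCode) VALUES (2, 'XYZ');

-- Lookups on the 24-character prefix can now use IX_VendorData_CodePrefix.
SELECT v.VendorId, c.Code
FROM dbo.VendorData AS v
JOIN dbo.CodeList AS c ON c.Code = v.CodePrefix;
```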

I hope you have a good relational reason for doing this. I'm guessing the first 24 characters of the vendor-provided table actually constitute a discrete attribute and should have been in a separate column in the first place.
So...
Create a view of the vendor's table. Index it if you like. I doubt you can point a FK constraint at the view, but you certainly can write a trigger to the same effect. A trigger checking against an indexed view will be very fast, at the cost of a slight increase in update times on the view's base table.
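A rough sketch of this indexed-view-plus-trigger idea, with hypothetical names (dbo.VendorTable, dbo.Parent, dbo.vVendorPrefix). The view exposes the leftmost 24 characters; one trigger rejects vendor rows without a matching parent code, and another uses the view's index (via the NOEXPAND hint) to block deleting or changing a parent code that is still referenced. It is a script for SSMS or sqlcmd because of the GO separators, and it does not cover every concurrency edge case a real FK handles.

```sql
-- Session settings required for indexed views.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO
IF OBJECT_ID('dbo.vVendorPrefix', 'V') IS NOT NULL DROP VIEW dbo.vVendorPrefix;
IF OBJECT_ID('dbo.VendorTable', 'U') IS NOT NULL DROP TABLE dbo.VendorTable;
IF OBJECT_ID('dbo.Parent', 'U') IS NOT NULL DROP TABLE dbo.Parent;
GO
CREATE TABLE dbo.Parent (Code VARCHAR(24) NOT NULL PRIMARY KEY);
CREATE TABLE dbo.VendorTable (
    VendorId INT NOT NULL PRIMARY KEY,
    VendorCode CHAR(250) NOT NULL
);
GO
-- Schema-bound view exposing the leftmost 24 characters.
CREATE VIEW dbo.vVendorPrefix
WITH SCHEMABINDING
AS
SELECT VendorId, CAST(LEFT(VendorCode, 24) AS VARCHAR(24)) AS CodePrefix
FROM dbo.VendorTable;
GO
-- An indexed view needs a unique clustered index first; then the prefix is indexed.
CREATE UNIQUE CLUSTERED INDEX IX_vVendorPrefix_Id ON dbo.vVendorPrefix (VendorId);
CREATE NONCLUSTERED INDEX IX_vVendorPrefix_Code ON dbo.vVendorPrefix (CodePrefix);
GO
-- Reject vendor rows whose prefix has no parent code.
CREATE TRIGGER dbo.trg_VendorTable_CheckParent ON dbo.VendorTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF EXISTS (
        SELECT 1 FROM inserted AS i
        WHERE NOT EXISTS (SELECT 1 FROM dbo.Parent AS p
                          WHERE p.Code = CAST(LEFT(i.VendorCode, 24) AS VARCHAR(24)))
    )
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('No matching dbo.Parent.Code for the VendorCode prefix.', 16, 1);
    END
END;
GO
-- Block deleting or renaming a parent code still referenced (seeks the view's index).
CREATE TRIGGER dbo.trg_Parent_NoOrphans ON dbo.Parent
AFTER DELETE, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF EXISTS (
        SELECT 1
        FROM deleted AS d
        JOIN dbo.vVendorPrefix AS v WITH (NOEXPAND) ON v.CodePrefix = d.Code
        WHERE NOT EXISTS (SELECT 1 FROM dbo.Parent AS p WHERE p.Code = d.Code)
    )
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Code is still referenced by dbo.VendorTable.', 16, 1);
    END
END;
GO
```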
HTH.
