Oracle BEST PRACTICE to update 50 million child rows in a table using value from parent table - sql

I have a child table with 100 million rows and need to update 50 million rows of a column using the value from the parent table. I have read around that assuming if we have enough space, it would be the fastest to "create table as select", but I want to know if anyone disagrees or if other factors are required in order to make a better guess? Would it be better to go this route versus using pl/sql's BULK COLLECT FORALL UPDATE feature?

If you have a lot of data then CREATE TABLE AS SELECT is definitely faster because it does not require UNDO table space. However, to recreate all the indices on the new table can be quite a hassle due to name conflicts.
Good news is: 50 min rows is not really a lot of data. If you have a modern midrange server it should not cause problems so it is not worth the extra work. The best way to find out is to make a copy of the original table (including all indices) and try the update there. Then you get a rough idea how long it will take.

Parallel UPDATE is probably the best option for a large change to a child table. (If you have Enterprise Edition, sufficient resources, a sane configuration, etc.)
alter session enable parallel dml;
update /*+ parallel */ ...;
(You might want to play with different parallel numbers, like parallel(8). The default degree of parallelism is usually good enough. But some platforms like SPARC inflate their "CPU_COUNT", leading to ridiculous degrees of parallelism.)
Parallel UPDATE is likely not the optimal solution. Recreating the objects can be faster because it can almost completely avoid generating REDO and UNDO. But re-creating objects is usually buggy and getting that optimal performance is tricky.
Here are things to consider before you decide to simply drop and recreate a table:
Grants. Save and re-apply the object grants after the objects are recreated.
Dependent objects. The process needs to re-create all objects, and dependent objects, in the exact same way. This can be painfully difficult depending on how complex your schema is. DBMS_METADATA can be tricky, and in some cases still won't make the objects exactly the same way. If you decide to hard-code the DDL instead you have to remember to update the process whenever the objects change.
Invalid objects. Most objects will automatically recompile when necessary. But you probably don't want to wait for that because it always looks bad to have invalid objects. And even if they do compile correctly, some programs may still get those pesky ORA-04068: existing state of packages has been discarded errors. (Because most PL/SQL programmers are unaware of session state and make every package variable public by default.)
Statistics. Simply re-gathering them after the table is re-created is not always sufficient. Histograms depend on whether columns were used in a predicate. If the table is re-created all the columns are new and no histograms will be initially created.
Direct-path writes are elusive. A parent-child table implies a foreign key, which normally prevents direct-path writes. The process needs to disable or drop the foreign key. And also set the table and index to NOLOGGING, and then remember to set them back to LOGGING at the end. When you re-create the foreign key, if you want to do it in parallel you have to initially create it as NOVALIDATE, set the table to parallel, enable validate the constraint, and then set the table back to NOPARALLEL.
In a large data warehouse it's worth going through all those steps and building code for dealing with all the issues. If this is your only large table UPDATE I suggest you avoid that work and accept a slightly non-optimal solution.

Related

Is it possible to do usual atomic INSERT operation but update Indexes asynchronously?

indexes make read fast but write slower. But why can't you have single writes and have db add indexes asynchronously with time, also cache in the INSERT until it's indexed?
Is there any database like that?
Converting my comments to an answer:
indexes make read fast but write slower
That's an oversimplification and it's also misleading.
Indexes make data lookups faster because the DBMS doesn't need to do a table-scan to find rows matching a predicate (the WHERE part of a query). Indexes don't make "reads" any faster (that's entirely dependent on the characteristics of your disk IO) and when used improperly they can sometimes even make queries slower (for reasons I won't get into).
I want to stress that the additional cost of writing to a single index, or even multiple indexes, when executing a DML statement (INSERT/UPDATE/DELETE/MERGE/etc) is negligible, really! (In actuality: foreign-key constraints are a much bigger culprit - and I note you can practically eliminate the cost of foreign-key constraint checking by adding additional indexes!). Indexes are primarily implemented using B-trees (a B-tree is essentially like a binary-tree, except rather than each node having only 2 children it can have many children because each tree-node comes with unused space for all those child node pointers, so inserting into the middle of a B-tree won't require data to be moved-around on-disk unlike with other kinds of trees, like a heap-tree).
Consider this QA where a Postgres user (like yourself) reports inserting 10,000 rows into a table. Without an index it took 78ms, with an index it took 84ms, that's only a 7.5% increase, which at that scale (6ms!) is so small it may as well be a rounding error or caused by IO scheduling. That should be proof enough it shouldn't be something you should worry about without actual hard data showing it's a problem for you and your application.
I assume you have this negative impression about indexes after reading an article like this one, which certainly gives the impression that "indexes are bad" - but while the points mentioned in that article are not wrong, there's a LOT of problems with that article so you shouldn't take it dogmatically. (I'll list my concerns with that article in the footer).
But why can't you have single writes and have db add indexes asynchronously with time
By this I assume you mean you'd like a DMBS to do a single-row INSERT by simply appending a new record to the end of a table and then immediately returning and then at an arbitrary point later the DBMS' housekeeping system would update the indexes afterwards.
The problem with that is that it breaks the A, C, and I parts of the the A.C.I.D. model.
Indexes are used for more than just avoiding table-scans: they're also used to store copies of table data for the benefit of queries that would use the index and which also need (for example) a small subset of the table's data, this significantly reduces disk reads. For this reason, RDBMS (and ISO SQL) allow indexes to include non-indexed data using the INCLUDES clause.
Consider this scenario:
CREATE INDEX IX_Owners ON cars ( ownerId ) INCLUDE ( colour );
CREATE INDEX IX_Names ON people ( name ) INCLUDE ( personId, hairColour );
GO;
SELECT
people.name,
people.hairColour,
cars.colour
FROM
cars
INNER JOIN people ON people.personId = cars.ownerId
WHERE
people.name LIKE 'Steve%'
The above query will not need to read either the cars or people tables on-disk. The DBMS will be able to fully answer the query using data only in the index - which is great because indexes tend to exist in a small number of pages on-disk which tend to be in proximal-locality which is good for performance because it means it will use sequential IO which scales much better than random IO.
The RDBMS will perform a string-prefix index-scan of the people.IX_Names index to get all of the personId (and hairColour) values, then it will look-up those personId values in the cars.IX_Owners index and be able to get the car.colour from the copy of the data inside the IX_Owners index without needing to read the tables directly.
Now, assuming that another database client has just completed inserted a load of records into the cars and/or people table with a COMMIT TRANSACTION just for good measure, and the RDMBS uses your idea of only updating indexes later whenever it feels like it, then if that same database client re-runs the query from above it would return stale data (i.e. wrong data) because the query uses the index, but the index is old.
In addition to using index tree nodes to store copies of table data to avoid non-proximal disk IO, many RDBMS also use index-trees to store entire copies - even multiple copies of table data, to enable other scenarios, such as columnar data storage and indexed-VIEWs - both of these features absolutely require that indexes are updated atomically with table data.
Is there any database like that?
Yes, they exist - but they're not widely used (or they're niche) because for the vast majority of applications it's entirely undesirable behaviour for the reasons described above.
There are distributed databases that are designed around eventual consistency, but clients (and entire application code) needs to be designed with that in-mind, and it's a huge PITA to have to redesign a data-centric application to support eventual-consistency which is why you only really see them being used in truly massive systems (like Facebook, Google, etc) where availability (uptime) is more important than users seeing stale-data for a few minutes.
Footnote:
Regarding this article: https://use-the-index-luke.com/sql/dml/insert
The number of indexes on a table is the most dominant factor for insert performance. The more indexes a table has, the slower the execution becomes. The insert statement is the only operation that cannot directly benefit from indexing because it has no where clause.
I disagree. I'd argue that foreign-key constraints (and triggers) are far more likely to have a larger detrimental effect on DML operations.
Adding a new row to a table involves several steps. First, the database must find a place to store the row. For a regular heap table—which has no particular row order—the database can take any table block that has enough free space. This is a very simple and quick process, mostly executed in main memory. All the database has to do afterwards is to add the new entry to the respective data block.
I agree with this.
If there are indexes on the table, the database must make sure the new entry is also found via these indexes. For this reason it has to add the new entry to each and every index on that table. The number of indexes is therefore a multiplier for the cost of an insert statement.
This is true, but I don't know if I agree that it's a "multiplier" of the cost of an insert.
For example, consider a table with hundreds of nvarchar(1000) columns and several int columns - and there's separate indexes for each int column (with no INCLUDE columns). If you're inserting 100x megabyte-sized rows all-at-once (using an INSERT INTO ... SELECT FROM statement) the cost of updating those int indexes is very likely to require much less IO than the table data.
Moreover, adding an entry to an index is much more expensive than inserting one into a heap structure because the database has to keep the index order and tree balance. That means the new entry cannot be written to any block—it belongs to a specific leaf node. Although the database uses the index tree itself to find the correct leaf node, it still has to read a few index blocks for the tree traversal.
I strongly disagree with this, especially the first sentence: "adding an entry to an index is much more expensive than inserting one into a heap structure".
Indexes in RDBMS today are invariably based on B-trees, not binary-trees or heap-trees. B-trees are essentially like binary-trees except each node has built-in space for dozens of child node pointers and B-trees are only rebalanced when a node fills its internal child pointer list, so a B-tree node insert will be considerably cheaper than the article is saying because each node will have plenty of empty space for a new insertion without needing to re-balance itself or any other relatively expensive operation (besides, DBMS can and do index maintenance separately and independently of any DML statement).
The article is correct about how the DBMS will need to traverse the B-tree to find the node to insert into, but index nodes are efficently arranged on-disk, such as keeping related nodes in the same disk page which minimizes index IO reads (assuming they aren't already loaded into memory first). If an index tree is too big to store in-memory the RDBMS can always keep a "meta-indexes" in-memory so it could potentially instantly find the correct B-tree index without needing to traverse the B-tree from the root.
Once the correct leaf node has been identified, the database confirms that there is enough free space left in this node. If not, the database splits the leaf node and distributes the entries between the old and a new node. This process also affects the reference in the corresponding branch node as that must be duplicated as well. Needless to say, the branch node can run out of space as well so it might have to be split too. In the worst case, the database has to split all nodes up to the root node. This is the only case in which the tree gains an additional layer and grows in depth.
In practice this isn't a problem, because the RDBMS's index maintenance will ensure there's sufficient free space in each index node.
The index maintenance is, after all, the most expensive part of the insert operation. That is also visible in Figure 8.1, “Insert Performance by Number of Indexes”: the execution time is hardly visible if the table does not have any indexes. Nevertheless, adding a single index is enough to increase the execute time by a factor of a hundred. Each additional index slows the execution down further.
I feel the article is being dishonest by suggesting (implying? stating?) that index-maintenance happens with every DML. This is not true. This may have been the case with some early dBase-era databases, but this is certainly not the case with modern RDBMS like Postgres, MS SQL Server, Oracle and others.
Considering insert statements only, it would be best to avoid indexes entirely—this yields by far the best insert performance.
Again, this claim in the article is not wrong, but it's basically saying if you want a clean and tidy house you should get rid of all of your possessions. Indexes are a fact of life.
However tables without indexes are rather unrealistic in real world applications. You usually want to retrieve the stored data again so that you need indexes to improve query speed. Even write-only log tables often have a primary key and a respective index.
Indeed.
Nevertheless, the performance without indexes is so good that it can make sense to temporarily drop all indexes while loading large amounts of data—provided the indexes are not needed by any other SQL statements in the meantime. This can unleash a dramatic speed-up which is visible in the chart and is, in fact, a common practice in data warehouses.
Again, with modern RDBMS this isn't necessary. If you do a batch insert then a RDBMS won't update indexes until after the table-data has finished being modified, as a batch index update is cheaper than many individual updates. Similarly I expect that multiple DML statements and queries inside an explicit BEGIN TRANSACTION may cause an index-update deferral provided no subsequent query in the transaction relies on an updated index.
But my biggest issue with that article is that the author is making these bold claims about detrimental IO performance without providing any citations or even benchmarks they've run themselves. It's even more galling that they posted a bar-chart with arbitrary numbers on, again, without any citation or raw benchmark data and instructions for how to reproduce their results. Always demand citations and evidence from anything you read making claims: because the only claims anyone should accept without evidence are logical axioms - and a quantitative claim about database index IO cost is not a logical axiom :)
For PostgreSQL GIN indexes, there is the fastupdate feature. This stores new index entries into a unordered unconsolidated area waiting for some other process to file them away into the main index structure. But this doesn't directly match up with what you want. It is mostly designed so that the index updates are done in bulk (which can be more IO efficient), rather than in the background. Once the unconsolidated area gets large enough, then a foreground process might take on the task of filing them away, and it can be hard to tune the settings in a way to get this to always be done by a background process instead of a foreground process. And it only applies to GIN indexes. (With the use of the btree_gin extension, you can create GIN indexes on regular scalar columns rather than the array-like columns it usually works with.) While waiting for the entries to be consolidated, every query will have to sequential scan the unconsolidated buffer area, so delaying the updates for the sake of INSERT can come at a high cost for SELECTs.
There are more general techniques to do something like this, such as fractal tree indexes. But these are not implemented in PostgreSQL, and wherever they are implemented they seem to be proprietary.

avoiding write conflicts while re-sorting a table

I have a large table that I need to re-sort periodically. I am partly basing this on a suggestion I was given to stay away from using cluster keys since I am inserting data ordered differently (by time) from how I need it clustered (by ID), and that can cause re-clustering to get a little out of control.
Since I am writing to the table on a hourly I am wary of causing problems with these two processes conflicting: If I CTAS to a newly sorted temp table and then swap the table name it seems like I am opening the door to have a write on the source table not make it to the temp table.
I figure I can trigger a flag when I am re-sorting that causes the ETL to pause writing, but that seems a bit hacky and maybe fragile.
I was considering leveraging locking and transactions, but this doesn't seem to be the right use case for this since I don't think I'd be locking the table I am copying from while I write to a new table. Any advice on how to approach this?
I've asked some clarifying questions in the comments regarding the clustering that you are avoiding, but in regards to your sort, have you considered creating a nice 4XL warehouse and leveraging the INSERT OVERWRITE option back into itself? It'd look something like:
INSERT OVERWRITE INTO table SELECT * FROM table ORDER BY id;
Assuming that your table isn't hundreds of TB in size, this will complete rather quickly (inside an hour, I would guess), and any inserts into the table during that period will queue up and wait for it to finish.
There are some reasons to avoid the automatic reclustering, but they're basically all the same reasons why you shouldn't set up a job to re-cluster frequently. You're making the database do all the same work, but without the built in management of it.
If your table is big enough that you are seeing performance issues with the clustering by time, and you know that the ID column is the main way that this table is filtered (in JOINs and WHERE clauses) then this is probably a good candidate for automatic clustering.
So I would recommend at least testing out a cluster key on the ID and then monitoring/comparing performance.
To give a brief answer to the question about resorting without conflicts as written:
I might recommend using a time column to re-sort records older than a given time (probably in a separate table). While it's sorting, you may get some new records. But you will be able to use that time column to marry up those new records with the, now sorted, older records.
You might consider revoking INSERT, UPDATE, DELETE privileges on the original table within the same script or procedure that performs the CTAS creating the newly sorted copy of the table. After a successful swap you can re-enable the privileges for the roles that are used to perform updates.

Is there a more elegant way to detect changes in a large SQL table without altering it? [duplicate]

This question already has answers here:
How can I get a hash of an entire table in postgresql?
(7 answers)
Closed 9 years ago.
Suppose you have a reasonably large (for local definitions of “large”), but relatively stable table.
Right now, I want to take a checksum of some kind (any kind) of the contents of the entire table.
The naïve approach might be to walk the entire table, taking the checksum (say, MD5) of the concatenation of every column on each row, and then perhaps concatenate them and take its MD5sum.
From the client side, that might be optimized a little by progressively appending columns' values into the MD5 sum routine, progressively mutating the value.
The reason for this, is that at some point in future, we want to re-connect to the database, and ensure that no other users may have mutated the table: that includes INSERT, UPDATE, and DELETE.
Is there a nicer way to determine if any change/s have occurred to a particular table? Or a more efficient/faster way?
Update/clarification:
We are not able/permitted to make any alterations to the table itself (e.g. adding a “last-updated-at” column or triggers or so forth)
(This is for Postgres, if it helps. I'd prefer to avoid poking transaction journals or anything like that, but if there's a way to do so, I'm not against the idea.)
Adding columns and triggers is really quite safe
While I realise you've said it's a large table in a production DB so you say you can't modify it, I want to explain how you can make a very low impact change.
In PostgreSQL, an ALTER TABLE ... ADD COLUMN of a nullable column takes only moments and doesn't require a table re-write. It does require an exclusive lock, but the main consequence of that is that it can take a long time before the ALTER TABLE can actually proceed, it won't hold anything else up while it waits for a chance to get the lock.
The same is true of creating a trigger on the table.
This means that it's quite safe to add a modified_at or created_at column and an associated trigger function to maintain them to a live table that's in intensive real-world use. Rows added before the column was created will be null, which makes perfect sense since you don't know when they were added/modified. Your trigger will set the modified_at field whenever a row changes, so they'll get progressively filled in.
For your purposes it's probably more useful to have a trigger-maintained side-table that tracks the timestamp of the last change (insert/update/delete) anywhere in the table. That'll save you from storing a whole bunch of timestamps on disk and will let you discover when deletes have happened. A single-row side-table with a row you update on each change using a FOR EACH STATEMENT trigger will be quite low-cost. It's not a good idea for most tables because of contention - it essentially serializes all transactions that attempt to write to the table on the row update lock. In your case that might well be fine, since the table is large and rarely updated.
A third alternative is to have the side table accumulate a running log of the timestamps of insert/update/delete statements or even the individual rows. This allows your client read the change-log table instead of the main table and make small changes to its cached data rather than invalidating and re-reading the whole cache. The downside is that you have to have a way to periodically purge old and unwanted change log records.
So... there's really no operational reason why you can't change the table. There may well be business policy reasons that prevent you from doing so even though you know it's quite safe, though.
... but if you really, really, really can't:
Another option is to use the existing "md5agg" extension: http://llg.cubic.org/pg-mdagg/ . Or to apply the patch currently circulating pgsql-hackers to add an "md5_agg" to the next release to your PostgreSQL install if you built from source.
Logical replication
The bi-directional replication for PostgreSQL project has produced functionality that allows you to listen for and replay logical changes (row inserts/updates/deletes) without requiring triggers on tables. The pg_receivellog tool would likely suit your purposes well when wrapped with a little scripting.
The downside is that you'd have to run a patched PostgreSQL 9.3, so I'm guessing if you can't change a table, running a bunch of experimental code that's likely to change incompatibly in future isn't going to be high on your priority list ;-) . It's included in the stock release of 9.4 though, see "changeset extraction".
Testing the relfilenode timestamp won't work
You might think you could look at the modified timestamp(s) of the file(s) that back the table on disk. This won't be very useful:
The table is split into extents, individual files that by default are 1GB each. So you'd have to find the most recent timestamp across them all.
Autovacuum activity will cause these timestamps to change, possibly quite a while after corresponding writes happened.
Autovacuum must periodically do an automatic 'freeze' of table contents to prevent transaction ID wrap-around. This involves progressively rewriting the table and will naturally change the timestamp. This happens even if nothing's been added for potentially quite a long time.
Hint-bit setting results in small writes during SELECT. These writes will also affect the file timestamps.
Examine the transaction logs
In theory you could attempt to decode the transaction logs with pg_xlogreader and find records that affect the table of interest. You'd have to try to exclude activity caused by vacuum, full page writes after hint bit setting, and of course the huge amount of activity from every other table in the entire database cluster.
The performance impact of this is likely to be huge, since every change to every database on the entire system must be examined.
All in all, adding a trigger on a table is trivial in comparison.
What about creating a trigger on insert/update/delete events on the table? The trigger could call a function that inserts a timestamp into another table which would mark the time for any table-changing event.
The only concern would be an update event updated using the same data currently in the table. The trigger would fire, even though the table didn't really change. If you're concerned about this case, you could make the trigger call a function that generates a checksum against just the updated rows and compares against a previously generated checksum, which would usually be more efficient than scanning and checksumming the whole table.
Postgres documentation on triggers here: http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
If you simply just want to know when a table has last changed without doing anything to it, you can look at the actual file(s) timestamp(s) on your database server.
SELECT relfilenode FROM pg_class WHERE relname = 'your_table_name';
If you need more detail on exactly where it's located, you can use:
select t.relname,
t.relfilenode,
current_setting('data_directory')||'/'||pg_relation_filepath(t.oid)
from pg_class t
join pg_namespace ns on ns.oid = t.relnamespace
where relname = 'your_table_name';
Since you did mention that it's quite a big table, it will definitely be broken into segments, and toasts, but you can utilize the relfilenode as your base point, and do a ls -ltr relfilenode.* or relfilnode_* where relfilenode is the actual relfilenode from above.
These files gets updated at every checkpoint if something occured on that table, so depending on how often your checkpoints occur, that's when you'll see the timestamps update, which if you haven't changed the default checkpoint interval, it's within a few minutes.
Another trivial, but imperfect way to check if INSERTS or DELETES have occurred is to check the table size:
SELECT pg_total_relation_size('your_table_name');
I'm not entirely sure why a trigger is out of the question though, since you don't have to make it retroactive. If your goal is to ensure nothing changes in it, a trivial trigger that just catches an insert, update, or delete event could be routed to another table just to timestamp an attempt but not cause any activity on the actual table. It seems like you're not ensuring anything changes though just by knowing that something changed.
Anyway, hope this helps you in this whacky problem you have...
A common practice would be to add a modified column. If it were MySQL, I'd use timestamp as datatype for the field (updates to current date on each updade). Postgre must have something similar.

When this query is performed, do all the records get loaded into physical memory?

I have a table where i have millions of records. The total size of that table only is somewhere 6-7 GigaByte. This table is my application log table. This table is growing really fast, which makes sense. Now I want to move records from log table into backup table. Here is the scenario and here is my question.
Table Log_A
Insert into Log_b select * from Log_A;
Delete from Log_A;
I am using postgres database. the question is
When this query is performed Does all the records from Log_A gets load in physical memory ? NOTE: My both of the above query runs inside a stored procedure.
If No, then how will it works ?
I hope this question applies for all database.
I hope if somebody could provide me some idea on this.
In PostgreSQL, that's likely to execute a sequential scan, loading some records into shared_buffers, inserting them, writing the dirty buffers out, and carrying on.
All the records will pass through main memory, but they don't all have to be in memory at once. Because they all get read from disk using normal buffered reads (pread) it will affect the operating system disk cache, potentially pushing other data out of the cache.
Other databases may vary. Some could execute the whole SELECT before processing the INSERT (though I'd be surprised if any serious ones did). Some do use O_DIRECT reads or raw disk I/O to avoid the OS cache affects, so the buffer cache effects might be different. I'd be amazed if any database relied on loading the whole SELECT into memory, though.
When you want to see what PostgreSQL is doing and how, the EXPLAIN and EXPLAIN (BUFFERS, ANALYZE) commands are quite useful. See the manual.
You may find writable common table expressions interesting for this purpose; it lets you do all this in one statement. In this simple case there's probably little benefit, but it can be a big win in more complex data migrations.
BTW, make sure to run that pair of queries wrapped in BEGIN and COMMIT.
Probably not.
Each record is individually processed; this particular query doesn't need to have knowledge of any of the other records to successfully execute. So the only record that needs to be in memory at any given moment is the one currently being processed.
But it really depends on whether or not the database thinks it can do it faster by loading up the whole table. Check the execution plan of the query.
If your setup allows it, just rename the old table and create a new empty one. Much faster, obviously, as no copying is done at all.
ALTER TABLE log_a RENAME TO log_b;
CREATE TABLE log_a (LIKE log_b INCLUDING ALL);
The LIKE clause copies the structure of the (now renamed) old table. INCLUDING ALL includes defaults, constraints, indexes, ...
Foreign key constraints or views depending on the table or other less common dependencies (but not queries in plpgsql functions) might be a hurdle for this route. You would have to recreate those to have them point to the new table. But a logging table like you describe probably carries no such dependencies.
This acquires an exclusive lock on the table. I assume, typical write access will be INSERT only in your case? One way to deal with concurrent access would then be to create the new table in a different schema and alter the search_path for your application user. Then the applications starts to write to the new table without concurrency issues. Of course, you wouldn't schema-qualify the table name in your INSERT statements for this to take effect.
CREATE SCHEMA log20121018;
CREATE TABLE log20121018.log_a (LIKE log20121011.log_a INCLUDING ALL);
ALTER ROLE myrole SET search_path = app, log20121018, public;
Or alter the search_path setting at whatever level is effective for you:
globally, per database, per role, per session, per function ...

Making structural changes to very large tables in an online environment

So here's what I'm facing.
The Problem
A large table, with ~230,000,000
rows.
We want to change the
clustering index and primary key of
this table to a simple bigint
identity field. There is one other
empty field being added to the table,
for future use.
The existing table
has a composite key. For the sake of
argument, let's say it's 2 bigint's.
The first one may have 1 or 10,000
'children' in the 2nd part of the
key.
Requirements
Minimal downtime, like preferably the
length of time it takes to run
SP_Rename.
Existing rows may change
while we're copying data. The updates
must be reflected in the new table.
Ideas
Put a trigger on existing table,
to update row in new table if it
already exists there.
Iterate through original table, copying data
into new table ~10,000 at a time.
Maybe 2,000 of the first part of the
old key.
When the copy is
complete, rename the old table to
"ExistingTableOld" and the new one
from "NewTable" to "ExistingTable".
This should allow stored procs to
continue to run without intervention
Are there any glaring omissions in the plan, or best practices I'm ignoring?
Difficult problem. Your plan sounds good, but I'm not totally sure you really need to batch the query as long as you run it in a transaction isolation level of READ UNCOMMITTED to stop locks being generated.
My experience making big schema changes is big changes are best done during a maintenance window—at night/over a weekend—when users are booted off the system. Just like running dbcc checkdb with the repair option. Then, when things go south, you have the option to roll back to the full backup that you providentially made right before starting the upgrade.
Item #3 on your list: Renaming the old/new tables. You'll probably want to recompile the stored procedures/views. My experience is that execution plans are bound against the object ids rather than object names.
Consider table dbo.foo: if it is renamed to dbo.foo_old, any stored procedures or user-defined functions won't necessarily error out until the dependent object is recompiled and its execution plan rebound. Cached execution plans continue to work perfectly fine.
sp_recompile is your friend.