I have been asked to look into a manual process that one of my colleagues carries out every now and again.
He sometimes needs to add a new column to a large table (200 million rows), and it takes him more than an hour. Before you ask: yes, the columns are nullable, but sometimes the new column will be about 90% populated.
Instead of adding a new column to the existing table, he...
Creates a new table
Selects * from the old table (inserting into the new one)
Adds the new column as part of his script
Then he deletes the old table, renames the new table to the original name, adds the indexes and then compresses. He says it is much quicker like that.
If this is the best way, then I will try to write an SSIS package to make the process more seamless.
Any advice is welcome!
Thanks
Creating a new table structure, moving all the data into it and deleting the old table is fine for a small amount of data (you can do it with the wizard in SQL Server), but it is the worst way to solve this problem for millions of rows.
For a large amount of data (millions of records) you should use ALTER TABLE:
Alter Table MyTable
ADD NewColumn nvarchar(10) null
The new column is added to the table as the last column.
This script takes less than one second because no data is moved; you just add a new column to the table.
But if you use the wizard method you mentioned on millions of records, it takes hours.
As Ali says:
alter Table MyTable
ADD NewColumn nvarchar(10) null
Then, to fill in the 90% of data: he already has a table containing the new values and the key he joins on for the copy, so this is all he needs:
UPDATE a
SET a.[NewColumn] = b.[NewColumn]
FROM MyTable a INNER JOIN NewColumnTable b ON a.[KeyField] = b.[KeyField]
That would be a lot quicker. You could do it in SSIS, but if this happens a lot it's not really worth it for a few lines of SQL.
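If a single UPDATE across 200 million rows turns out to hold locks for too long or bloats the transaction log, one option is to run the same join in batches. This is only a sketch: the batch size of 100000 is an arbitrary assumption, and it relies on NewColumn being NULL for every row that has not been filled yet.

```sql
-- T-SQL sketch: fill NewColumn in batches so each transaction stays small.
-- The batch size is an assumption; tune it for your server.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (100000) a
    SET    a.[NewColumn] = b.[NewColumn]
    FROM   MyTable AS a
    INNER JOIN NewColumnTable AS b
            ON a.[KeyField] = b.[KeyField]
    WHERE  a.[NewColumn] IS NULL          -- skip rows already filled
      AND  b.[NewColumn] IS NOT NULL;     -- guarantees the loop terminates

    SET @rows = @@ROWCOUNT;               -- 0 once nothing is left to update
END;
```

Each pass commits on its own (outside an explicit transaction), so a failure part-way through only loses the current batch.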
I have a table TEST that has 41 million+ records in it.
I have two main columns in this table that I am interested in:
MESSAGE of type CLOB
MESSAGE_C of type VARCHAR2(2048)
The table TEST is range-partitioned on a column named PART_DATE, with one partition holding the data for one day.
I tried using the below to get the job done:
ALTER TABLE TEST ADD MESSAGE_C VARCHAR2(2048);
UPDATE TEST SET MESSAGE_C = MESSAGE;
COMMIT;
ALTER TABLE TEST DROP COLUMN MESSAGE;
ALTER TABLE TEST RENAME COLUMN MESSAGE_C TO MESSAGE;
But I got stuck on step 2 (the UPDATE) for around 4 hours. Our DBA said there was blocking due to full table scans.
Can someone please tell me:
What would be a better/more efficient way to get this done?
Would using the PART_DATE field in the where clause of the update query help?
Consider using CREATE TABLE ... AS SELECT (CTAS) to create the new table on the fly under a new name, then add the indexes after creating the table, drop the old table, and rename the new table to the old name.
It's a single bulk load rather than an update of every existing row, so it will usually be significantly faster, and with NOLOGGING it is much less affected by redo logging (unless the database forces logging).
I've used this approach recently to alter tables with 500 million records.
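As a rough illustration of that approach for the CLOB-to-VARCHAR2 case, something like the following could work. TEST_NEW and the ID column are hypothetical names (ID stands in for the table's other columns), and the partitioning clause is left out because the original definition isn't shown; it would need to be repeated in the CREATE TABLE.

```sql
-- Oracle sketch: build the converted copy in one pass.
-- ID is a hypothetical column; list the table's real columns instead.
CREATE TABLE TEST_NEW NOLOGGING AS
SELECT t.ID,
       t.PART_DATE,
       -- take the first 2048 characters of the CLOB as a VARCHAR2;
       -- anything longer is truncated
       CAST(DBMS_LOB.SUBSTR(t.MESSAGE, 2048, 1) AS VARCHAR2(2048)) AS MESSAGE
FROM   TEST t;

-- Recreate indexes, constraints and grants on TEST_NEW here.

-- Only after verifying row counts and data:
DROP TABLE TEST;
ALTER TABLE TEST_NEW RENAME TO TEST;
```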
What is involved in creating a partition switching table?
Then how can I fill these tables?
A partition switching table is a normal table, but it must be identical to the table you want to switch into.
As for how you fill the table, you might need to post more info. How are you filling your other tables? It's just another table. Fill it however you see fit.
The trick is this:
once you have your table filled and you want to switch it in against another table, you run:
TRUNCATE TABLE targettable;
ALTER TABLE sourcetable SWITCH TO targettable;
You can also add this option, though I've never tested it (I will today, as I only just found it here: https://littlekendra.com/2017/01/19/why-you-should-switch-in-staging-tables-instead-of-renaming/):
TRUNCATE TABLE targettable;
ALTER TABLE sourcetable SWITCH TO targettable
WITH ( WAIT_AT_LOW_PRIORITY
(MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = BLOCKERS)
);
This replaces the contents of targettable with the data in sourcetable.
As always, someone has done all this before you and me and blogged it, and it's only a google away:
https://sqlsunday.com/2014/08/24/reloading-fact-tables-with-zero-downtime/
The basic idea: it's just a simple means of replacing an empty table with a populated table, without having to drop the empty table and rename the populated table. The only caveat is that both tables MUST have the exact same structure, including any and all indexes.
So, say you have 10 million rows of data and you want to delete 9 million of those rows. Deleting 9 million rows in one pop is likely to blow up your tempdb and your transaction logs. As an alternative, you can do a "SELECT INTO" to put the 1 million rows you want to keep into a new table (minimally logged), add indexes that match the original table, truncate the original table (minimally logged) and then switch the partitions (minimally logged).
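The steps above might look roughly like this. All object names here (dbo.BigTable, dbo.BigTable_Keep, Id, KeepFlag, the index name) are hypothetical, and the sketch assumes the original table has a single clustered index on Id and that both tables sit on the same filegroup, which SWITCH requires.

```sql
-- T-SQL sketch of "keep 1 million, discard 9 million" via SWITCH.
-- 1. Copy only the rows to keep into a new table.
SELECT *
INTO   dbo.BigTable_Keep
FROM   dbo.BigTable
WHERE  KeepFlag = 1;                      -- hypothetical filter

-- 2. Recreate every index the original has (assumed: one clustered index on Id).
CREATE CLUSTERED INDEX CIX_BigTable_Keep
    ON dbo.BigTable_Keep (Id);

-- 3. Empty the original, then switch the kept rows back in.
TRUNCATE TABLE dbo.BigTable;
ALTER TABLE dbo.BigTable_Keep SWITCH TO dbo.BigTable;
```

After the switch, dbo.BigTable_Keep is empty and can be dropped.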
I had about 20,000,000 records in a table (random data), and then I added an empty column to that table.
But when I ran an update to fill that column, the process broke down.
I tried using a cursor and an index, but with no results.
Do you have any fast solution or any alternative solution?
Thank you in advance :)
Maybe the fastest way would be to create new_table as select * from the existing table and, inside the select statement of the CTAS, calculate the value of the new column. After that, you can rename the old table to something like table_bckp, rename the new table to the original table name, and then apply the constraints, indexes, and other scripts previously saved from the old table's definition.
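A minimal sketch of that sequence, using Oracle-style syntax as an assumption since the thread doesn't name the database. The column some_text and the UPPER() expression are purely hypothetical placeholders for however the new column's value is actually calculated.

```sql
-- CTAS: copy every existing column and compute the new one in the same pass.
CREATE TABLE new_table AS
SELECT t.*,
       UPPER(t.some_text) AS new_column   -- hypothetical calculation
FROM   existing_table t;

-- Keep the old table as a backup and put the new one in its place.
ALTER TABLE existing_table RENAME TO table_bckp;
ALTER TABLE new_table RENAME TO existing_table;

-- Re-apply constraints, indexes and grants saved from the old definition here.
```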
I have a table people with less than 100,000 records and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table into people. Unfortunately I cannot simply DROP my table as my new records will be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify what columns I want to update. Unfortunately I am lazy and the structure of people may change over time and while my CTAS statement doesn't need to be updated, my update/merge script will need changes, which feels like unnecessary work for me.
Is there a way to merge entire rows without having to specify columns? I see here that not specifying columns during an INSERT will direct SQL to insert values by order. Can the same methodology be applied here, and is it safe?
NB: The structure of the table will not change between backups
Given that your table is small, you could simply do this:
DELETE FROM people t
WHERE EXISTS( SELECT 1
FROM people_backup b
WHERE t.id = b.id );
INSERT INTO people
SELECT *
FROM people_backup;
That is slow and not particularly elegant (particularly if most of the data from the backup hasn't changed) but assuming the columns in the two tables match, it does allow you to not list out the columns. Personally, I'd much prefer writing out the column names (presumably those don't change all that often) so that I could do an update.
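For comparison, the version with the columns written out could look something like this. The columns first_name and last_name are hypothetical, since the real structure of people isn't shown; only id comes from the question.

```sql
-- Oracle MERGE sketch: update matching rows, insert the ones that are missing.
MERGE INTO people p
USING people_backup b
ON (p.id = b.id)
WHEN MATCHED THEN
    UPDATE SET p.first_name = b.first_name,   -- hypothetical columns
               p.last_name  = b.last_name
WHEN NOT MATCHED THEN
    INSERT (id, first_name, last_name)
    VALUES (b.id, b.first_name, b.last_name);
```

Rows that haven't changed are still rewritten, but nothing is deleted and re-inserted, and rows added to people since the backup are left alone.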
I have a live production table with more than 1 million records. I don't want to tamper with anything on this table, so I would like to create another table that holds all the records from it. I would schedule a job that takes entries from my main table and inserts them into my new table. But I don't want all the records daily; I just need the records added to the production table each day to be added to my new table.
Please suggest a fast and efficient approach.
You could do this with an INSERT/UPDATE/DELETE trigger to send the INSERTED/UPDATED/DELETED row to the new table, however this feels like reinventing the wheel on the most basic level.
You could just use asynchronous replication rather than hand-rolling it all yourself; this is probably safer, more sustainable and more scalable. You could add as many tables as you like to the replicated source.
Copying one million records from an existing table to a new table should not take very long -- and might even be faster than figuring out what records to copy. You could do something like:
truncate table copytable;
insert into copytable
select *
from productiontable;
Note that you should explicitly list the columns when doing the insert.
You can also readily add new records -- assuming you have some form of id on the production table, such as an id assigned by a sequence. Then you can do:
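insert into copytable
select *
from productiontable p
-- coalesce covers an empty copytable, where max(id) is null (assumes ids are positive)
where p.id > coalesce((select max(id) from copytable), 0);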