Add column to a huge table - sql-server-2017

I have a table that has around 13 billion records. The size of this table is around 800 GB. I want to add a column of type tinyint to the table, but the ADD COLUMN command takes a very long time to run. Another option would be to create another table with the additional column and copy the data from the source table to the new table, either using BCP (data export and import) or by copying the data directly into the new table.
Is there a better way to achieve this?

My preference for tables of this size is to create a new table and then batch the records into it (BCP, Bulk Insert, SSIS, whatever you like). This may take longer, but it keeps your log from blowing out. You can also load the most relevant data (say, the last 30 days) first, swap out the table, then batch in the remaining history, so that you can take advantage of the new column immediately...if your application lines up with that strategy.
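As a rough sketch of that batching strategy (all names here, dbo.BigTable, dbo.BigTable_New, NewFlag, CreatedDate, are hypothetical placeholders, not taken from the question), the T-SQL could look like this:
-- New table that already includes the extra tinyint column
CREATE TABLE dbo.BigTable_New
(
    Id          bigint       NOT NULL PRIMARY KEY,
    CreatedDate datetime2    NOT NULL,
    Payload     varchar(100) NULL,
    NewFlag     tinyint      NULL  -- the column being added
);
-- Batch in the most relevant rows first (e.g. the last 30 days), then the rest
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    INSERT INTO dbo.BigTable_New (Id, CreatedDate, Payload, NewFlag)
    SELECT TOP (100000) s.Id, s.CreatedDate, s.Payload, NULL
    FROM dbo.BigTable AS s
    WHERE s.CreatedDate >= DATEADD(DAY, -30, SYSUTCDATETIME())
      AND NOT EXISTS (SELECT 1 FROM dbo.BigTable_New AS t WHERE t.Id = s.Id);
    SET @rows = @@ROWCOUNT;
END
Once the recent data is in and the tables are swapped (sp_rename), the remaining history can be backfilled with the same loop and a wider date range.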

Related

SQL Server Data Type Change

In SQL Server, we have a table with 80 million records, and three columns currently have the float data type. We now need to change these float columns to decimal. How can we proceed with minimum downtime?
We executed the usual ALTER statement to change the data type, but the log file filled up and the system ran out of memory. So kindly suggest a better way to solve this issue.
We can't use the technique of creating 3 new temp columns, updating the existing data batch-wise, dropping the existing columns, and renaming the temp columns to the live columns.
I have done exactly the same in one of my projects. Here are the steps you can follow to get minimal logging and minimum downtime with the least complexity.
Create a new table with the new data types and the same column names, but a different table name (leave out the indexes for now; create them once the data has been loaded). For example, if the existing table is EMPLOYEE, the new table name should be EMPLOYEE_1. Constraints such as foreign keys can be created before or after loading without any performance impact, but I recommend not creating them yet, because the existing table already uses those constraint names and you would have to rename them after renaming the table.
Make sure the precision of the new data type covers the maximum precision present in your existing table.
Load the data from your existing table into the new table using SSIS with the fast-load option, so that the load is minimally logged.
During the downtime window, rename the old table to EMPLOYEE_2 and rename EMPLOYEE_1 to EMPLOYEE (see the sketch after these steps).
Alter the table definition for foreign keys, defaults, or any other constraints.
Go live, and create the indexes on the table during a low-load period.
Using this approach we changed the data type on a table with more than a billion records. It gives you minimum downtime and minimal logging, because the SSIS fast-load option performs bulk inserts rather than fully logged row-by-row inserts.
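A minimal sketch of the downtime rename swap from the steps above, using the EMPLOYEE / EMPLOYEE_1 example names (constraint and index renaming omitted):
-- During the downtime window: swap the freshly loaded table into place
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.EMPLOYEE',   'EMPLOYEE_2';  -- keep the old table as a fallback
    EXEC sp_rename 'dbo.EMPLOYEE_1', 'EMPLOYEE';    -- the new copy goes live
COMMIT TRANSACTION;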
This is too long for a comment.
One of these approaches might work.
The simplest approach is to give up. Well almost, but do the following:
Rename the existing column to something else (say, _col)
Add a computed column that does the conversion.
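A minimal sketch of the rename-plus-computed-column idea, with a hypothetical table dbo.Measurements and float column reading (the target precision is also just an example):
EXEC sp_rename 'dbo.Measurements.reading', '_col', 'COLUMN';  -- keep the float data under a new name
-- Expose the data under the original column name, converted on the fly
ALTER TABLE dbo.Measurements
    ADD reading AS CAST(_col AS decimal(18, 6));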
If the data is infrequently modified (so you only need read access), then create a new table. That is, copy the data into a new table with the correct type, then drop the original table and rename the new table to the old name.
You can do something similar by swapping table spaces rather than renaming the table.
Another way could be:
Add a new column to the table with the correct datatype.
Update the new decimal column with the values from the float column, but do it in batches of x records.
When everything is done, remove the float column and rename the decimal column to the float column's original name.
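A sketch of that batched approach, again with hypothetical names (dbo.Measurements, float column reading) and an arbitrary batch size of 50,000:
ALTER TABLE dbo.Measurements ADD reading_dec decimal(18, 6) NULL;  -- new column, metadata only
DECLARE @done int;
SET @done = 1;
WHILE @done > 0
BEGIN
    UPDATE TOP (50000) dbo.Measurements
    SET reading_dec = CAST(reading AS decimal(18, 6))
    WHERE reading_dec IS NULL AND reading IS NOT NULL;
    SET @done = @@ROWCOUNT;
END
ALTER TABLE dbo.Measurements DROP COLUMN reading;
EXEC sp_rename 'dbo.Measurements.reading_dec', 'reading', 'COLUMN';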

SQL - Delete table with 372 million rows starting from first row

I have a table with 372 million rows, and I want to delete old rows, starting from the earliest ones, without blocking the DB. How can I achieve that?
The table has:
id | memberid | type | timeStamp               | message
1  | 123      | 10   | 2014-03-26 13:17:02.000 | text
UPDATE:
I freed about 30 GB of space inside the DB, but my disk still shows only 6 GB free.
Any suggestion on how to reclaim that free space?
Thank you in advance!
select 1;  -- seeds @@ROWCOUNT so the loop body runs at least once
while (@@ROWCOUNT > 0)
begin
    WAITFOR DELAY '00:00:10';
    delete top (10) from tab where <your-condition>;
end
Delete in chunks using the SQL above.
You may want to consider another approach:
Create a table based on the existing one
Adjust the identity column in the empty table to start from the latest value from the old table (if there is any)
Swap the two tables using sp_rename
Copy the records in batches into the new table from the old table
You can do whatever you want with the old table.
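A sketch of the reseed-and-swap steps above; the table names (dbo.Messages, dbo.Messages_New) are placeholders for whatever your real names are:
-- Make the empty copy continue numbering where the old table left off
DECLARE @maxid bigint;
SELECT @maxid = MAX(id) FROM dbo.Messages;
DBCC CHECKIDENT ('dbo.Messages_New', RESEED, @maxid);
-- Swap the tables so new writes immediately go to the empty copy
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.Messages',     'Messages_Old';
    EXEC sp_rename 'dbo.Messages_New', 'Messages';
COMMIT TRANSACTION;
-- Then copy the rows worth keeping back in batches, and drop Messages_Old when done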
BACKUP your database before you start deleting records / playing with tables.
The best performance comes from querying the data by id; then
delete from TABLENAME where id > XXXXX
is the lowest-impact statement you can execute.
You can also divide the operation into sub-operations, limiting the number of deleted rows for each one by adding a ROWCOUNT setting.
For example, if you want to delete only 5,000,000 rows per call, you can do this:
SET ROWCOUNT 5000000;
delete from TABLENAME where id > XXXXX;
Here you can find a reference: https://msdn.microsoft.com/it-it/library/ms188774%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396
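Putting that together, a looped version of the snippet above (TABLENAME and XXXXX are the same placeholders used in the answer) might look like this; remember to reset ROWCOUNT afterwards:
SET ROWCOUNT 5000000;   -- cap every DELETE at 5,000,000 rows
DELETE FROM TABLENAME WHERE id > XXXXX;
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE FROM TABLENAME WHERE id > XXXXX;
END
SET ROWCOUNT 0;         -- remove the cap again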
The answer to the best way to delete rows from an Oracle table is: it depends! In a perfect world where you can take the table offline for maintenance, a complete reorganization is always best because it does the delete and places the table back into a pristine state. We will address the tools for doing large-scale deletes and the appropriate methods for each environment.
Factors and tools for massive deletes
The choice of the delete methods depends on many factors:
Is the target table partitioned? Partitioning greatly improves delete performance. For example, it is common to have a large table partitioned by time, and deleting old rows from such a table can be as simple as dropping the relevant partition. See these notes on managing partitioned tables.
Can you reorganize the table after the delete to remove fragmentation?
What percentage of the table will be deleted? In cases where you are deleting more than 30-50% of the rows in a very large table it is faster to use CTAS to delete from a table than to do a vanilla delete and a reorganization of the table blocks and a rebuild of the constraints and indexes.
Do you want to release the space consumed by the deleted rows? If you know that the empty space will be re-used by subsequent DML then you will want to leave the empty space within the table. Conversely, if you want to release the space back to the tablespace then you will need to reorganize the table.
There are many tools that you can use to delete from large tables:
dbms_metadata.get_ddl: This procedure will punch off the definitions of all table indexes and constraints.
dbms_redefinition: This procedure will reorganize a table while it remains available for updating.
Create Table as Select: You can use CTAS to copy a table while removing rows in bulk.
Rename table: If you copy a table when deleting rows you can rename it back to its original name.
COMMIT: In cases where a delete might run for many hours, even the largest UNDO log will not be able to hold the rollback information, and it becomes necessary to do the delete in a PL/SQL loop, issuing a COMMIT every zillion rows to free up the undo logs. This approach is automatically restartable because the delete will pick up where it left off at your last commit checkpoint.
More information: visit here.
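Since the quoted advice is Oracle-specific, a minimal CTAS-plus-rename sketch in Oracle syntax would look roughly like this (the table name, column name, and cutoff date are all hypothetical):
-- Copy only the rows you want to keep, then swap the names
CREATE TABLE messages_keep AS
    SELECT * FROM messages WHERE ts >= DATE '2014-01-01';
RENAME messages TO messages_old;
RENAME messages_keep TO messages;
-- Recreate indexes and constraints on messages, then drop messages_old once verified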

Loading fact table using pentaho data integration - Reduce Runtime of ktr

I am using Pentaho DI to insert data into a fact table. The thing is, the source table I am populating from contains 10,000 records and is growing daily.
Say the source table contains 10,000 records and 200 new records are added; I then need to run the ktr again. But when I run the ktr file, it truncates all 10,000 rows in the fact table and starts inserting the full 10,200 records.
To avoid this I unchecked the Truncate option in the Table output step, made one key unique in the fact table, and checked the Ignore insert errors option. Now it works and inserts only the 200 new records, but it takes the same execution time.
I also tried the Stream lookup step in the ktr, but there was no change in my execution time.
Can anyone please help me solve this problem?
Thanks in advance.
If you need to capture all of the inserts, updates, and deletes, a Merge rows (diff) step followed by a Synchronize after merge step will do this, and typically will do it very quickly.

Microsoft BIDS How to clear table data before new transfer?

I have a database I am pulling rows from, manipulating the data a bit, and then putting it into another table. Every time I run the package, it doesn't remove any data from the destination, which therefore grows by X rows each time.
Is there any way that I can clear the destination before adding the new rows?
You can run a TRUNCATE statement to clear all records from the table.
TRUNCATE TABLE YourTableName

Altering 2 columns in table - risky operation?

I have a table with ~100 columns, about ~30M rows, on MSSQL server 2005.
I need to alter 2 columns, changing their types from VARCHAR(1024) to VARCHAR(MAX). These columns do not have indexes on them.
I'm worried that doing so will fill up the log and cause the operation to fail. How can I estimate the free disk space, for both the data and the log, needed for such an operation to ensure it will not fail?
You are right: increasing the column size (including to MAX) will generate a huge log for a large table, because every row will be updated (behind the scenes the old column gets dropped, a new column gets added, and the data is copied).
1. Add a new column of type VARCHAR(MAX) NULL. As a nullable column, it will be added as a metadata-only change (no data update).
2. Copy the data from the old column to the new column. This can be done in batches to alleviate the log pressure.
3. Drop the old column. This will be a metadata-only operation.
4. Use sp_rename to rename the new column to the old column name.
5. Later, at your convenience, rebuild the clustered index (online if needed) to get rid of the space occupied by the old column.
This way you get control over the log by controlling the batches at step 2. You also minimize the disruption to permissions, constraints, and relations by not copying the entire table into a new one (as SSMS so poorly does...).
You can do this sequence for both columns at once.
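A compact sketch of the steps above for one of the columns; dbo.Docs and col1 are hypothetical names standing in for your real table and column, and 50,000 is an arbitrary batch size:
ALTER TABLE dbo.Docs ADD col1_new varchar(max) NULL;   -- step 1: metadata-only
DECLARE @copied int;
SET @copied = 1;
WHILE @copied > 0                                      -- step 2: batched copy
BEGIN
    UPDATE TOP (50000) dbo.Docs
    SET col1_new = col1
    WHERE col1_new IS NULL AND col1 IS NOT NULL;
    SET @copied = @@ROWCOUNT;
END
ALTER TABLE dbo.Docs DROP COLUMN col1;                 -- step 3: metadata-only
EXEC sp_rename 'dbo.Docs.col1_new', 'col1', 'COLUMN';  -- step 4
Step 5, the clustered index rebuild, can then be run during a quiet period.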
I would recommend that you consider, instead:
Create a new table with the new schema
Copy data from old table to new table
Drop old table
Rename new table to name of old table
This might be a far less costly operation and could possibly be done with minimal logging using INSERT/SELECT (if this were SQL Server 2008 or higher).
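A sketch of that copy-and-rename route, with hypothetical names (dbo.Docs, dbo.Docs_New) and only a couple of the ~100 columns shown; on SQL Server 2008+ under the SIMPLE or BULK_LOGGED recovery model, the TABLOCK hint can make the INSERT...SELECT minimally logged:
CREATE TABLE dbo.Docs_New
(
    Id   int          NOT NULL,
    col1 varchar(max) NULL,
    col2 varchar(max) NULL
    -- ...remaining columns with their existing definitions
);
INSERT INTO dbo.Docs_New WITH (TABLOCK) (Id, col1, col2)
SELECT Id, col1, col2
FROM dbo.Docs;
DROP TABLE dbo.Docs;
EXEC sp_rename 'dbo.Docs_New', 'Docs';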
Why would increasing the VARCHAR limit fill up the log?
Try to do some tests on smaller pieces first. I mean, you could create the same structure locally with a few thousand rows and see the difference before and after; I think the growth will be roughly linear. The real question is about the redo log and whether it will fit if you do the change all at once. Must you do it online, or can you stop production for a while? If you can stop, maybe there is a way to suspend the redo log in MSSQL like in Oracle, which could make it a lot faster. If you need to do it online, you could try to add a new column, copy the values into it in a loop of, say, 100,000 rows at a time, commit, and continue. After completing that, dropping the original column and renaming the new one may be faster than altering.