Our production server is SQL Server 2005, and we have a very large table of 103 million records. We want to increase the length of one particular field from varchar(20) to varchar(30). Although I said it's just a metadata change, since it is only an increase in the column length, my manager doesn't want to alter such a huge table. Please advise on the best option. I am thinking of creating a new column and updating the new column with the old column's values.
I looked at many blogs; some say the ALTER will have an impact and some say it will not.
As you said, it is a metadata-only operation and this is the way to go. Prove to your manager (and to yourself!) through testing that you are right.
You should test any advice before applying it, unless it comes from a SQL Server MVP who might actually know the details of what happens.
However, increasing the varchar length from 20 to 30 does not affect the layout of any existing data in the table. That is, a value is laid out exactly the same way under either definition. That means the data does not have to change when you alter the table.
This offers optimism that the change would be "easy".
The data page does contain some information about types -- at least the length of the type in the record. I don't know if this includes the maximum length of a character type. It is possible that the data pages would need to be changed.
This is a bit of pessimism.
Almost any other change will require changes to every record and/or data page. For instance, changing from int to bigint is moving from a 4-byte field to an 8-byte field. All the records are affected by this change in data layout. Big change.
Changing from varchar() to either nvarchar() or char() would have the same impact.
On the other hand, changing a field from being NULLABLE to NOT NULLABLE (or vice versa) would not affect the record storage on each page. But, that information is stored on the page in the NULLABLE flags array, so all the pages would need to be updated.
So, there is some possibility that the change would not cause any data to be rewritten. But test on a smaller table to see what happens.
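For instance, here is a rough way to test this on a throwaway copy. The table and column names are placeholders, and fn_dblog() is an undocumented function, so treat this as a sketch rather than a recipe:
SELECT TOP (1000000) *
INTO dbo.BigTable_Test
FROM dbo.BigTable;

-- In SIMPLE recovery a CHECKPOINT truncates the log, so the next count starts low.
CHECKPOINT;
SELECT COUNT(*) AS log_records_before FROM fn_dblog(NULL, NULL);

ALTER TABLE dbo.BigTable_Test
    ALTER COLUMN SomeCol varchar(30) NULL;  -- repeat the column's existing NULL / NOT NULL setting here

SELECT COUNT(*) AS log_records_after FROM fn_dblog(NULL, NULL);
-- A metadata-only change adds only a handful of log records;
-- a size-of-data change logs something for every row touched.

DROP TABLE dbo.BigTable_Test;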
Typically to expose version data you'd have to add a column of type rowversion, but this operation would take quite a while on a large table. I did it anyway in a dev sandbox environment, and indeed it took a while, but I also noticed that the column was populated with some meaningful-looking initial value. I expected it to be all 0's or 1's to indicate that each row is in some sort of "initial" state (after all, there was no history before this), but what I saw were what looked like accurate values for each row (they were all different, non-default-looking values).
Where did they come from? It seems like the rowversion is being tracked behind the scenes anyway, regardless of whether you've exposed it in a column. If so, can I get at it directly without adding the column? Like maybe some kind of system function I can call directly? I really want to avoid downtime, and I also have a huge number of existing queries so migration to a different table/view/combo is not an option (as suggested in other related questions).
The rowversion value is generated when a table with a rowversion (a.k.a. timestamp) column is modified. The rowversion value is database-scoped, and the last generated value can be retrieved via @@DBTS.
Since the value is incremented only when a table with a rowversion column is modified, I don't think you'll be able to use @@DBTS to avoid the downtime.
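For what it's worth, a minimal illustration of reading the counter (no table changes required):
SELECT @@DBTS                  AS last_used_rowversion,
       MIN_ACTIVE_ROWVERSION() AS min_active_rowversion;
-- @@DBTS returns the last rowversion value used in this database;
-- MIN_ACTIVE_ROWVERSION() returns the lowest active value. Neither gives you
-- a per-row value, which only exists once the column is actually added.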
I have a table with more than 600 million records and a NVARCHAR(MAX) column.
I need to change the column size to fixed size but I usually have two options with this large DB:
Change it using the designer in SSMS and this takes a very long time and perhaps never finishes
Change it using Alter Column, but then the database doesn't work after that, perhaps because we have several indexes
My questions would be: what are possible solutions to achieve the desired results without errors?
Note: I use SQL Server on Azure and this database is in production
Thank you
Clarification: all of the current data is already within the range of the new length that I want to set.
Never did this on Azure, but I have done it with a hundred million rows on SQL Server.
I would add a new column of the desired size, allowing NULL values of course, and create the desired index(es).
After that, I would update this new column in chunks, trimming/converting the old column's values into the new one. Finally, remove the indexes on the old column and drop it.
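Something along these lines (a sketch only; the table name, column names, and target length are placeholders):
ALTER TABLE dbo.BigTable ADD NewCol nvarchar(100) NULL;
GO

DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- Backfill in small batches to keep each transaction (and the log) small.
    UPDATE TOP (50000) t
    SET    NewCol = LEFT(OldCol, 100)
    FROM   dbo.BigTable AS t
    WHERE  t.NewCol IS NULL
      AND  t.OldCol IS NOT NULL;

    SET @rows = @@ROWCOUNT;
END;
GO

-- Afterwards: drop the indexes on OldCol, drop OldCol, and if needed
-- rename NewCol back to the original name:
-- EXEC sp_rename 'dbo.BigTable.NewCol', 'OldCol', 'COLUMN';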
Change it using the designer in SSMS and this takes a very long time and perhaps never finishes
What SSMS will do behind the scenes is
create a new table with the new data type for your column
copy all the data from the original table to the new table
rename new table with the old name
drop old table
recreate all the indexes on new table
Of course it will take time to copy all of your "more than 600 million records".
Besides, the whole table will be locked for the duration of the data loading with (holdlock, tablockx).
Change it using Alter Column, but then the database doesn't work after that, perhaps because we have several indexes
That is not true.
If there is any index involved, and because of its nvarchar(max) data type it can only be an index that has that field as an included column, the server will give you an error and do nothing until you drop that index. And if no index is affected, it simply cannot be that the database "doesn't work after that, perhaps because we have several indexes".
Please think one more time about what you want to achieve by changing the type of that column. If you think you'll gain some space by doing so, that's wrong.
After going from nvarchar(max) to a fixed-length type such as nchar (from a variable-length type to a fixed-length type), your data will occupy more space than it does now, because you will store a fixed number of bytes even when the column holds NULL or only a few characters.
And if you just want to change max to something else, for example 8000, you gain nothing, because max is not the real size of the data but only the maximum size the data can have. And if your strings are small enough, your nvarchar(max) values are already stored as in-row data, not as LOB data the way they were with ntext.
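If you want to verify that for your own table before deciding anything, something like this (the table name is a placeholder) shows how many pages sit in-row versus in LOB storage:
SELECT au.type_desc, SUM(au.used_pages) AS used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    ON au.container_id IN (p.hobt_id, p.partition_id)
WHERE p.object_id = OBJECT_ID('dbo.BigTable')
GROUP BY au.type_desc;
-- type_desc distinguishes IN_ROW_DATA, LOB_DATA and ROW_OVERFLOW_DATA.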
Two part question:
What is PostgreSQL's behavior for storing text/varchar values in-row vs out-of-row? Am I correct in thinking that with default settings, all columns will always be stored in-row until the 2kB size is reached?
Do we have any control over the above behavior? Is there any way I can change the threshold for a specific column/table, or force a specific column to always be stored out-of-row?
I've read through the PostgreSQL TOAST documentation (http://www.postgresql.org/docs/8.3/static/storage-toast.html), but I don't see any option for changing the thresholds (the default seems to be 2kB per row) or for forcing a column to always be stored out-of-row (EXTERNAL only allows it, but doesn't enforce it).
I've found documentation explaining how to do this on SQL Server (https://msdn.microsoft.com/en-us/library/ms173530.aspx), but I don't see anything similar for PostgreSQL.
If anyone's interested in my motivation, I have a table that has a mix of short-consistent columns (IDs, timestamps etc), a column that is varchar(200), and a column that is text/varchar(max), which can be extremely large in length. I currently have both varchars stored in a separate table, just to allow efficient storage/lookups/scanning on the short-consistent columns.
This is a pain however, because I constantly have to do joins to read all the data. I would really like to store all the above fields in the same table, and tell Postgresql to force-store the 2 VARCHARs out-of-row, always.
Edited Answer
For the first part of the question: you are correct (see for instance this).
For the second part of the question: the standard way of storing columns is to compress variable-length text fields if their size is over 2KB and, if necessary, to store them in a separate area called the "TOAST table".
You can give a “hint” to the system on how to store a field by using the following command for your columns:
ALTER TABLE YourTable
ALTER COLUMN YourColumn SET STORAGE { PLAIN | EXTENDED | EXTERNAL | MAIN };
From the manual:
SET STORAGE
This form sets the storage mode for a column. This controls whether this column is held inline or in a secondary TOAST table, and whether the data should be compressed or not. PLAIN must be used for fixed-length values such as integer and is inline, uncompressed. MAIN is for inline, compressible data. EXTERNAL is for external, uncompressed data, and EXTENDED is for external, compressed data. EXTENDED is the default for most data types that support non-PLAIN storage. Use of EXTERNAL will make substring operations on very large text and bytea values run faster, at the penalty of increased storage space. Note that SET STORAGE doesn't itself change anything in the table, it just sets the strategy to be pursued during future table updates. See Section 59.2 for more information.
Since the manual is not completely explicit on this point, this is my interpretation: the final decision about how to store the field is left in any case to the system, given the following constraints:
No field can be stored such that the total size of a row is over 8KB.
No field is stored out-of-row if its size is less than TOAST_TUPLE_THRESHOLD.
After satisfying the previous constraints, the system tries to satisfy the SET STORAGE strategy specified by the user. If no storage strategy is specified, each TOAST-able field is automatically treated as EXTENDED.
Under these assumptions, the only way to be sure that all the values of a column are stored out-of-row is to recompile the system with a value of TOAST_TUPLE_THRESHOLD less than the minimum size of any value of the column.
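As a quick check (the table name is a placeholder), the strategy currently assigned to each column is visible in pg_attribute:
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'yourtable'::regclass
  AND attnum > 0
  AND NOT attisdropped;
-- attstorage is 'p' (PLAIN), 'm' (MAIN), 'x' (EXTENDED) or 'e' (EXTERNAL).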
Does the speed at which a command like the one below completes depend on the data in the column (and thus on the size of the database)? Does it have to check, for every entry, whether the data type change is possible? For 12.5 million records it takes about 15 minutes.
Code:
USE RDW_DB
GO
ALTER TABLE dbo.RDWTabel
ALTER COLUMN Aantalcilinders SmallInt
All ALTER COLUMN operations are implemented as an add of a new column and drop of the old column:
add a new column of the new type
run an internal UPDATE <table> SET <newcolumn> = CAST(<oldcolumn> as <newtype>);
mark the old column as dropped
You can inspect a table structure after an ALTER COLUMN and see all the dropped, hidden columns. See SQL Server table columns under the hood.
As you can see, this results in a size-of-data update that must touch every row to populate the values of the new column. This takes time on its own on a big table, but the usual problem is from the log growth. As the operation must be accomplished in a single transaction, the log must grow to accommodate this change. Often newbs run out of disk space when doing such changes.
Certain operations may be accomplished 'inline'. If the new type fits in the space reserved for the old type and the on-disk layout is compatible, then the update is not required: the new column is literally overlaid on top of the old column data. The article linked above exemplifies this. Also, operations on variable-length types that only change the declared length often do not need to change the data on disk and are much faster.
Additionally, any ALTER operation requires an exclusive schema modification lock on the table. This will block, waiting for any current activity (queries) to drain out. Often the perceived duration is due to the lock wait, not the execution. Read How to analyse SQL Server performance for more details.
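If an ALTER seems to hang, a quick way to see whether it is stuck on that schema lock rather than doing work is something like this, run from another session (a sketch only):
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE wait_type LIKE 'LCK_M_SCH%';
-- LCK_M_SCH_M means the ALTER is waiting for the schema modification lock.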
Finally some ALTER COLUMN operations do not need to modify the existing data but they do need to validate the existing data, to ensure it matches the requirements of the new type. Instead of running an UPDATE they will run a SELECT (they will scan the data) which will still be size-of-data, but at least won't generate any log.
In your case, with ALTER COLUMN Aantalcilinders SmallInt, it is impossible to tell whether the operation was size-of-data or not. It depends on what the previous type of Aantalcilinders was. If the new type grew the size, then it requires a size-of-data update. E.g. if the previous type was tinyint, then the ALTER COLUMN will have to update every row.
When you change the type of a column, the data generally needs to be rewritten. For instance, if the data were originally an integer, then each record will be two bytes shorter after this operation. This affects the layout of records on a page as well as the allocation of pages. Every page needs to be touched.
I believe there are some operations in SQL Server that you can do on a column that do not affect the layout. One of them is changing a column from nullable to not-nullable, because the null flags are stored for every column, regardless of nullability.
I have a SQL Server table in production that has millions of rows, and it turns out that I need to add a column to it. Or, to be more accurate, I need to add a field to the entity that the table represents.
Syntactically this isn't a problem, and if the table didn't have so many rows and wasn't in production, this would be easy.
Really what I'm after is the course of action. There are plenty of websites out there with extremely large tables, and they must add fields from time to time. How do they do it without substantial downtime?
One thing I should add, I did not want the column to allow nulls, which would mean that I'd need to have a default value.
So I either need to figure out how to add a column with a default value in a timely manner, or I need to figure out a way to update the column at a later time and then set the column to not allow nulls.
ALTER TABLE table1 ADD
newcolumn int NULL
GO
should not take that long... What takes a long time is to insert a column in the middle of other columns... because then the engine needs to create a new table and copy the data over to it.
I did not want the column to allow nulls, which would mean that I'd need to have a default value.
Adding a NOT NULL column with a DEFAULT Constraint to a table of any number of rows (even billions) became a lot easier starting in SQL Server 2012 (but only for Enterprise Edition) as they allowed it to be an Online operation (in most cases) where, for existing rows, the value will be read from meta-data and not actually stored in the row until the row is updated, or clustered index is rebuilt. Rather than paraphrase any more, here is the relevant section from the MSDN page for ALTER TABLE:
Adding NOT NULL Columns as an Online Operation
Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column with a default value is an online operation when the default value is a runtime constant. This means that the operation is completed almost instantaneously regardless of the number of rows in the table. This is because the existing rows in the table are not updated during the operation; instead, the default value is stored only in the metadata of the table and the value is looked up as needed in queries that access these rows. This behavior is automatic; no additional syntax is required to implement the online operation beyond the ADD COLUMN syntax. A runtime constant is an expression that produces the same value at runtime for each row in the table regardless of its determinism. For example, the constant expression "My temporary data", or the system function GETUTCDATETIME() are runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID() are not runtime constants because a unique value is produced for each row in the table. Adding a NOT NULL column with a default value that is not a runtime constant is always performed offline and an exclusive (SCH-M) lock is acquired for the duration of the operation.
While the existing rows reference the value stored in metadata, the default value is stored on the row for any new rows that are inserted and do not specify another value for the column. The default value stored in metadata is moved to an existing row when the row is updated (even if the actual column is not specified in the UPDATE statement), or if the table or clustered index is rebuilt.
Columns of type varchar(max), nvarchar(max), varbinary(max), xml, text, ntext, image, hierarchyid, geometry, geography, or CLR UDTS, cannot be added in an online operation. A column cannot be added online if doing so causes the maximum possible row size to exceed the 8,060 byte limit. The column is added as an offline operation in this case.
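So, under those conditions, the change itself can look like this (a minimal sketch; the table, column and constraint names are placeholders, and it assumes SQL Server 2012+ Enterprise Edition):
ALTER TABLE dbo.BigTable
ADD IsArchived bit NOT NULL
    CONSTRAINT DF_BigTable_IsArchived DEFAULT (0);
-- The literal 0 is a runtime constant, so existing rows are not touched; the default
-- lives in metadata until a row is updated or the clustered index is rebuilt.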
The only real solution for continuous uptime is redundancy.
I acknowledge @Nestor's answer that adding a new column shouldn't take long in SQL Server, but nevertheless, it could still be an outage that is not acceptable on a production system. An alternative is to make the change in a parallel system, and then, once the operation is complete, swap the new for the old.
For example, if you need to add a column, you may create a copy of the table, then add the column to that copy, and then use sp_rename() to move the old table aside and the new table into place.
If you have referential integrity constraints pointing to this table, this can make the swap even more tricky. You probably have to drop the constraints briefly as you swap the tables.
For some kinds of complex upgrades, you could completely duplicate the database on a separate server host. Once that's ready, just swap the DNS entries for the two servers and voilà!
I supported a stock exchange company in the 1990's who ran three duplicate database servers at all times. That way they could implement upgrades on one server, while retaining one production server and one failover server. Their operations had a standard procedure of rotating the three machines through production, failover, and maintenance roles every day. When they needed to upgrade hardware, software, or alter the database schema, it took three days to propagate the change through their servers, but they could do it with no interruption in service. All thanks to redundancy.
"Add the column and then perform relatively small UPDATE batches to populate the column with a default value. That should prevent any noticeable slowdowns"
And after that you have to set the column to NOT NULL, which will fire off in one big transaction. So everything will run really fast until you do that, which means you have probably gained very little. I only know this from first-hand experience.
You might want to rename the current table from X to Y. You can do this with the command sp_rename '[OldTableName]', '[NewTableName]'.
Recreate the new table as X with the new column set to NOT NULL and then batch insert from Y to X and include a default value either in your insert for the new column or placing a default value on the new column when you recreate table X.
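A rough sketch of that approach (all names and the schema are placeholders):
EXEC sp_rename 'dbo.MyTable', 'MyTable_Old';
GO

CREATE TABLE dbo.MyTable
(
    Id          int         NOT NULL PRIMARY KEY,
    ExistingCol varchar(50) NOT NULL,
    NewCol      int         NOT NULL CONSTRAINT DF_MyTable_NewCol DEFAULT (0)
);
GO

DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- Insert in batches so each transaction (and the log) stays small.
    INSERT INTO dbo.MyTable (Id, ExistingCol, NewCol)
    SELECT TOP (100000) o.Id, o.ExistingCol, 0
    FROM dbo.MyTable_Old AS o
    WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable AS n WHERE n.Id = o.Id);

    SET @rows = @@ROWCOUNT;
END;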
I have done this type of change on a table with hundreds of millions of rows. It still took over an hour, but it didn't blow out our trans log. When I tried to just change the column to NOT NULL with all the data in the table it took over 20 hours before I killed the process.
Have you tested just adding a column filling it with data and setting the column to NOT NULL?
So in the end I don't think there's a magic bullet.
Select into a new table and rename. Example: adding column i to table A:
select *, 1 as i
into A_tmp
from A_tbl
-- Add any indexes here
exec sp_rename 'A_tbl', 'A_old'
exec sp_rename 'A_tmp', 'A_tbl'
Should be fast and won't touch your transaction log like inserting in batches might.
(I just did this today w/ a 70 million row table in < 2 min).
You can wrap it in a transaction if you need it to be an online operation (something might change in the table between the select into and the renames).
Another technique is to add the column to a new related table (Assume a one-to-one relationship which you can enforce by giving the FK a unique index). You can then populate this in batches and then you can add the join to this table wherever you want the data to appear. Note I would only consider this for a column that I would not want to use in every query on the original table or if the record width of my original table was getting too large or if I was adding several columns.
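A sketch of that layout (names are placeholders; the primary key on the FK column is what enforces the one-to-one relationship):
CREATE TABLE dbo.MyTable_Extra
(
    MyTableId int NOT NULL
        CONSTRAINT PK_MyTable_Extra PRIMARY KEY
        CONSTRAINT FK_MyTable_Extra_MyTable REFERENCES dbo.MyTable (Id),
    NewCol    int NULL
);

-- Populate in batches, then join only where the new data is needed:
-- SELECT t.*, x.NewCol
-- FROM dbo.MyTable AS t
-- LEFT JOIN dbo.MyTable_Extra AS x ON x.MyTableId = t.Id;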