Does changing varbinary(MAX) to varbinary(300) make any difference on the physical disk space?

First of all, excuse me for the grammar, as English is not my primary key :)
I am trying to find out if changing varbinary(max) to varbinary(300) reduces physical disk space usage by the table.
We have a very limited physical disk space and were trying to optimize everywhere including columns in the database.
We have >100 columns (in different tables with millions of rows) with varbinary(max) data type used for storing encrypted values and we don't need the max length as it fits in < 300 length.
Is there any gain in disk space if we switch to varbinary(300) ?
Does varbinary(max) preallocate all its required disk space when creating the table or inserting data into that column?
Does a varbinary(max) column take up all its disk space even if it holds data with length < 300?
I haven't been able to find anything anywhere except the following line:
"The storage size is the actual length of the data entered + 2 bytes."
https://learn.microsoft.com/en-us/sql/t-sql/data-types/binary-and-varbinary-transact-sql
Any help in answering the above 3 questions would be appreciated. Thanks.

Is there any gain in disk space if we switch to varbinary(300) ?
If all your binary values fit into 300 bytes, there will be no change at all in terms of space used. That's because these values are already stored in-row.
The format, in which SQL Server stores the data from the (MAX) columns, such as varchar(max), nvarchar(max), and varbinary(max), depends on the actual data size. SQL Server stores it in-row when possible. When in-row allocation is impossible, and data size is less than or equal to 8,000 bytes, it is stored as row-overflow data. The data that exceeds 8,000 bytes is stored as LOB data.
So when you change it to varbinary(300), it will be an instantaneous operation with only a metadata change.
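A minimal T-SQL sketch of that change, assuming a hypothetical table dbo.SecureData with an encrypted column EncryptedValue (neither name is from the question): it first confirms that every value really fits in 300 bytes and shows which allocation units the table currently uses, then narrows the column.

-- Hypothetical names: dbo.SecureData / EncryptedValue.
-- 1. Confirm the longest value actually fits into 300 bytes.
SELECT MAX(DATALENGTH(EncryptedValue)) AS max_bytes
FROM dbo.SecureData;

-- 2. See where the data currently lives:
--    IN_ROW_DATA vs ROW_OVERFLOW_DATA vs LOB_DATA allocation units.
SELECT index_id, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SecureData'), NULL, NULL, 'DETAILED');

-- 3. The change itself; keep the column's existing NULL / NOT NULL setting.
ALTER TABLE dbo.SecureData ALTER COLUMN EncryptedValue varbinary(300) NOT NULL;

If the first query returns anything over 300, the ALTER will fail with a truncation error, so the check is worth running first.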
Does varbinary(max) preallocate all its required disk space when creating the table or inserting data into that column?
No, it doesn't.
Variable-length data types, such as varchar, varbinary, and a few others, use as much storage space as is required to store data, plus two extra bytes.
Does a varbinary(max) column take up all its disk space even if it holds data with length < 300?
No, as said above, it will take exactly the space needed to store the actual value. So if you put a 3-byte value into a varbinary(max) column it will use only 5 bytes, and if you put a NULL value it will use only 2 bytes (and if the NULL value is in the last column it will take no space at all).
When you turn varbinary(max) into varbinary(300), as I said, it changes nothing in the data (metadata only). But if you then update the row with a new value, the old column will be dropped internally and a new one will be created, so not only will you not gain any space, you will waste space, because the space used by the old column version will not be released; it will only be marked as dropped.
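If you do later want that space back, here is a hedged sketch (dbo.SecureData is again a made-up name, and the choice between the two statements depends on your maintenance window):

-- Reclaim the space of internally dropped variable-length column versions.
DBCC CLEANTABLE ('YourDatabase', 'dbo.SecureData');

-- Or rebuild the table (heap or clustered index), which also compacts the rows.
ALTER TABLE dbo.SecureData REBUILD;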
Literature:
Microsoft SQL Server 2012 Internals (Developer Reference) by Kalen Delaney
Pro SQL Server Internals by Dmitri Korotkevitch

Related

SQL Query Performance with an nvarchar(500) where the MAX(LEN(column)) < 30 [duplicate]


Impact of altering table column size/length in SQL Server

Consider I have a column of type VARCHAR(MAX). If I change it to VARCHAR(500), will Microsoft SQL Server decrease the size claimed by the table?
If you have any link, just comment it and I'll check it out.
Update:
I've tested the following two cases with the table:
ALTER column size
Create new table and import data from old table.
Initial Table size
ALTER TABLE table_transaction ALTER COLUMN column_name VARCHAR(500)
After ALTER column, the table size increased
Create new table with new column size and import data from old table
I've taken care of the indexes in the new table.
Why does the table size increase in the ALTER COLUMN case? Ideally the table size should decrease.
After performing defragmentation on the PK in the original table, the size decreased by a few MB. However, that's not nearly as effective as creating a new table.
When you change a varchar(n) column to varchar(MAX), or vice versa, SQL Server will update every row in the table. This will temporarily increase the table size until you rebuild the clustered index or execute DBCC CLEANTABLE.
For ongoing space requirements of a varchar(MAX) column, the space will be the same as varchar(n) as long as the value remains in-row. However, if the value exceeds 8000 bytes, it will be stored on separate LOB page(s) dedicated to the value. This will increase space requirements and require extra I/O when a query needs the value.
A good rule of thumb is to use MAX types only if the value may exceed 8,000 bytes, and to specify a proper maximum length for the domain of the data stored when it fits in 8,000 bytes or less.
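A hedged sketch of how to watch this on the poster's table (the object names come from the ALTER statement above, the dbo schema is assumed, and the rebuild step is what actually releases the space):

EXEC sp_spaceused 'dbo.table_transaction';   -- size before the change

ALTER TABLE dbo.table_transaction ALTER COLUMN column_name VARCHAR(500);

EXEC sp_spaceused 'dbo.table_transaction';   -- often larger: every row was rewritten

ALTER INDEX ALL ON dbo.table_transaction REBUILD;
-- or: DBCC CLEANTABLE ('YourDatabase', 'dbo.table_transaction');

EXEC sp_spaceused 'dbo.table_transaction';   -- size after reclaiming the old row versions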
According to the documentation, there is no difference in the storage of strings:
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes.
As I read this, the storage size is the actual length plus two bytes regardless of whether you use n or max.
I am suspicious about this. I would expect the length of varchar(max) to occupy four bytes. And there might be additional overhead for storing off-page references (if they exist). However, the documentation is pretty clear on this point.
Whether changing the data type changes the size of the field depends on the data already stored. You can have several situations; a quick query to see which case applies follows the list below.
If all the values are NULL, then there will be no changes at all. The values are not being stored.
If all the values are less than 20 bytes, then -- according to the documentation -- there would be no change. I have a nagging suspicion that you might save 2 bytes per value, but I can't find a reference to it and don't have SQL Server on hand today to check.
If values exceed 20 bytes but remain on the page, then you will save space because the values will change.
If the values go off-page, then you will save the header information as well as truncating the data (thank you Dan for pointing this out).
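A hedged sketch of that check, using the byte thresholds mentioned above (dbo.SomeTable and col are placeholder names):

SELECT s.size_bucket,
       COUNT(*) AS rows_in_bucket
FROM dbo.SomeTable AS t
CROSS APPLY (SELECT CASE
                        WHEN t.col IS NULL             THEN 'NULL'
                        WHEN DATALENGTH(t.col) <= 20   THEN 'small value (<= 20 bytes)'
                        WHEN DATALENGTH(t.col) <= 8000 THEN 'stays on the data page'
                        ELSE 'stored off-page (LOB)'
                    END) AS s(size_bucket)
GROUP BY s.size_bucket;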

Max Row Size in SQL Server 2012 with varchar(max) fields

I have created a table with columns of type nvarchar(max), which as I understand can support 2 GB. However on inserting, I still receive this error:
Cannot create a row of size 8061 which is greater than the allowable maximum row size of 8060.
Is there a global setting on the database required, or is there another limit I am hitting? Is there a limit to the number of varchar(max) fields per table?
SQL Server uses pages to store data. The page size is 8 KB.
So a record (row) stored in-row in SQL Server cannot be larger than 8,060 bytes.
If the data does not fit into 8,060 bytes, reference pointers are used.
When a combination of varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns exceeds this limit, the SQL Server Database Engine moves the record column with the largest width to another page in the ROW_OVERFLOW_DATA allocation unit, while maintaining a 24-byte pointer on the original page.
Moving large records to another page occurs dynamically as records are lengthened based on update operations. Update operations that shorten records may cause records to be moved back to the original page in the IN_ROW_DATA allocation unit.
Also, querying and performing other select operations, such as sorts or joins on large records that contain row-overflow data slows processing time, because these records are processed synchronously instead of asynchronously.
The record-size limit for tables that use sparse columns is 8,018 bytes. When the converted data plus existing record data exceeds 8,018 bytes, MSSQLSERVER ERROR 576 is returned. When columns are converted between sparse and nonsparse types, the Database Engine keeps a copy of the current record data. This temporarily doubles the storage that is required for the record.
To obtain information about tables or indexes that might contain row-overflow data, use the sys.dm_db_index_physical_stats dynamic management function.
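For example, a sketch run in the affected database that lists only the objects which already have row-overflow allocation units:

SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.index_id,
       ips.alloc_unit_type_desc,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
WHERE ips.alloc_unit_type_desc = 'ROW_OVERFLOW_DATA'
  AND ips.page_count > 0;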
From the SQL Server documentation:
The length of individual columns must still fall within the limit of 8,000 bytes for varchar, nvarchar, varbinary, sql_variant, and CLR user-defined type columns. Only their combined lengths can exceed the 8,060-byte row limit of a table.
The sum of other data type columns, including char and nchar data, must fall within the 8,060-byte row limit. Large object data is also exempt from the 8,060-byte row limit.
More info here: https://technet.microsoft.com/en-us/library/ms186981%28v=sql.105%29.aspx
This comes from an earlier thread on StackOverflow that can be found here:
Cannot create a row of size 8937 which is greater than the allowable maximum of 8060
The error is caused because you cannot have a row in SQL Server which is larger than 8 KB (the size of one page), because rows are not allowed to span pages - it's a basic limit of SQL Server [...]
Note that SQL Server will allow you to create the table; however, if you try to actually insert any data which spans multiple pages, it will give the above error.
Of course this doesn't quite add up, because if the above were the whole truth then a single VARCHAR(8000) column would fill a row in a table! (This used to be the case.) SQL Server 2005 got around this limitation by allowing certain data from a row to be stored on another page, leaving a 24-byte pointer in its place.
I would suggest normalizing your table into one or more related tables.
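A minimal sketch of that kind of split, with made-up names: the wide, rarely queried text moves to a companion table joined 1:1 on the key, so the main row stays comfortably under the limit.

CREATE TABLE dbo.Document (
    DocumentID int IDENTITY(1,1) PRIMARY KEY,
    Title      nvarchar(200) NOT NULL
);

CREATE TABLE dbo.DocumentBody (
    DocumentID int PRIMARY KEY
        REFERENCES dbo.Document (DocumentID),
    Body       nvarchar(max) NULL   -- the large data lives here, off the main row
);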

is there an advantage to varchar(500) over varchar(8000)?

I've read up on this on MSDN forums and here and I'm still not clear. I think this is correct: varchar(max) will be stored as a text datatype, so that has drawbacks. So let's say your field will reliably be under 8000 characters, like a BusinessName field in my database table. In reality, a business name will probably always be under (pulling a number outta my hat) 500 characters. It seems like plenty of varchar fields that I run across fall well under the 8k character count.
So should I make that field a varchar(500) instead of varchar(8000)? From what I understand of SQL there's no difference between those two. So, to make life easy, I'd want to define all my varchar fields as varchar(8000). Does that have any drawbacks?
Related: Size of varchar columns (I didn't feel like this one answered my question).
One example where this can make a difference is that it can prevent a performance optimization that avoids adding row versioning information to tables with after triggers.
This is covered by Paul White here
The actual size of the data stored is immaterial – it is the potential size that matters.
Similarly, if using memory-optimised tables: since 2016 it has been possible to use LOB columns or combinations of column widths that could potentially exceed the in-row limit, but with a penalty.
(Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes largest variable-length column(s) off-row. Again, it does not depend on amount of the data you store there.
This can have a large negative effect on memory consumption and performance
Another case where over-declaring column widths can make a big difference is if the table will ever be processed using SSIS. The memory allocated for variable-length (non-BLOB) columns is fixed for each row in an execution tree and is sized per the columns' declared maximum length, which can lead to inefficient usage of memory buffers (example). Whilst the SSIS package developer can declare a smaller column size than the source, this analysis is best done up front and enforced there.
Back in the SQL Server engine itself, a similar case is that when calculating the memory grant to allocate for SORT operations, SQL Server assumes that varchar(x) columns will on average consume x/2 bytes.
If most of your varchar columns are fuller than that this can lead to the sort operations spilling to tempdb.
In your case if your varchar columns are declared as 8000 bytes but actually have contents much less than that your query will be allocated memory that it doesn't require which is obviously inefficient and can lead to waits for memory grants.
This is covered in Part 2 of SQL Workshops Webcast 1 downloadable from here or see below.
USE tempdb;

CREATE TABLE T (
    id       INT IDENTITY(1,1) PRIMARY KEY,
    number   INT,
    name8000 VARCHAR(8000),
    name500  VARCHAR(500)
);

INSERT INTO T (number, name8000, name500)
SELECT number, name, name /* <-- same contents in both columns */
FROM master..spt_values;

/* Compare the memory grants for these two queries: the sort over name8000
   is granted far more memory, even though the stored data is identical. */
SELECT id, name500
FROM T
ORDER BY number;

SELECT id, name8000
FROM T
ORDER BY number;
From a processing standpoint, it will not make a difference to use varchar(8000) vs varchar(500). It's more of a "good practice" kind of thing to define a maximum length that a field should hold and make your varchar that length. It's something that can be used to assist with data validation. For instance, making a state abbreviation be 2 characters or a postal/zip code as 5 or 9 characters. This used to be a more important distinction for when your data interacted with other systems or user interfaces where field length was critical (e.g. a mainframe flat file dataset), but nowadays I think it's more habit than anything else.
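For instance, a hedged sketch of those validation-style declarations (the table and column names are examples, not from the question):

CREATE TABLE dbo.Address (
    AddressID int IDENTITY(1,1) PRIMARY KEY,
    State     char(2)     NOT NULL,   -- anything longer is rejected outright
    Zip       varchar(10) NOT NULL
        CHECK (Zip LIKE '[0-9][0-9][0-9][0-9][0-9]'                        -- 5-digit ZIP
            OR Zip LIKE '[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]')  -- ZIP+4
);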
There are some disadvantages to large columns that are a bit less obvious and might catch you a little later:
All the columns you use in an INDEX key combined must not exceed 900 bytes
All the columns in an ORDER BY clause may not exceed 8,060 bytes. This is a bit difficult to grasp since it only applies to some columns. (See SQL 2008 R2 Row size limit exceeded for details.)
If the total row size exceeds 8,060 bytes, you get a "page spill" for that row. This might affect performance. (A page is an allocation unit in SQL Server, fixed at 8 KB, of which roughly 8,060 bytes are usable for a single row. Exceeding this will not be severe, but it's noticeable and you should try to avoid it if you easily can.)
Many other internal data structures, buffers and, last but not least, your own variables and table variables all need to mirror these sizes. With excessive sizes, excessive memory allocation can affect performance.
As a general rule, try to be conservative with column widths. If a width turns out to be too small, you can easily expand it to fit the needs. If you notice memory issues later, however, shrinking a wide column may become impossible without losing data, and you won't know where to begin.
In your example of the business names, think about where you get to display them. Is there really space for 500 characters? If not, there is little point in storing them as such. http://en.wikipedia.org/wiki/List_of_companies_of_the_United_States lists some company names and the max is about 50 characters. So I'd use 100 for the column max. Maybe more like 80.
Apart from best practices (BBlake's answer):
You get warnings about maximum row size (8,060 bytes) and index key width (900 bytes) with DDL (the index case is sketched after this list)
DML will die if you exceed these limits
ANSI_PADDING ON is the default, so you could end up storing a whole load of whitespace
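A small sketch of the index-width warning mentioned above (the names are invented; on newer versions the nonclustered key limit is 1,700 bytes rather than 900, but the behaviour is the same idea):

CREATE TABLE dbo.WideDemo (
    id int IDENTITY(1,1) PRIMARY KEY,
    a  varchar(8000) NULL
);

-- Warns at CREATE time that the key could exceed the maximum key length;
-- an INSERT or UPDATE that actually exceeds it then fails at run time.
CREATE INDEX IX_WideDemo_a ON dbo.WideDemo (a);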
Ideally you'd want to go smaller than that, down to a reasonably sized length (500 isn't reasonably sized), and make sure the client-side validation catches when the data is going to be too large and sends a useful error.
While the varchar isn't actually going to reserve space in the database for the unused space, I recall versions of SQL Server having a snit about database rows being wider than some number of bytes (do not recall the exact count) and actually throwing out whatever data didn't fit. A certain number of those bytes were reserved for things internal to SQL Server.

How much more inefficient are text (blobs) than varchar/nvarchar's?

We're doing a lot of large, but straightforward forms for a fairly big project (about 600 users using it throughout the day - that's big for me at least ;-) ).
The forms have a lot of question/answer type sections, so it's natural for some people to type a sentence, while others type a novel. How beneficial would it really be to put a character limit on some of these fields?
(Please include references or citations, if necessary/possible - Thanks!)
If you have no limitations on the data size, then why worry? This doesn't sound like a mission-critical project, even with 600 users and several thousand records. Use CLOB/BLOB and be done with it. I have doubts as to whether you would see any major gains from limiting sizes and risking data loss. That said, you should lay out such boundaries before implementation.
Usually varchar is best for storing values that you wish to use logically and perform "whole value" comparisons against. Text is for unstructured data. If your project is a survey result with unstructured text, use CLOB/BLOB.
Semi-Reference: I work with hundreds of thousands of call center records sometimes where we use a CLOB to store the dialog between employees and customers.
I say, focus on the needs of the users and only worry about database performance issues when/if those issues arise. Ask yourself "will my users benefit if I limit the amount of data they can enter".
I keep a great gapingvoid cartoon on my wall that says "it's not what the software does. it's what the user does".
You don't mention which sql server you are using
If you are using MySQL, there are definite speed advantages to using fixed-length fields to keep the table in the static row format; however, if you have any variable-width fields the table will switch to the dynamic format and you lose the benefit of specifying the length of the field.
http://dev.mysql.com/doc/refman/5.0/en/static-format.html
http://dev.mysql.com/doc/refman/5.0/en/dynamic-format.html
Microsoft SQL Server has similar performance gains when you use fixed-length columns. With fixed-length columns the server knows exactly what the offset and length of the data in the row is. With variable-length columns the server knows the offset but has to store the actual length of the data as a preceding 2-byte counter. This has a couple of implications, discussed in this interesting article on performance as a function of disk space and the advantages of variable-length columns.
If you are using SQL Server 2005 or newer you can take advantage of varchar(max). This column type has the same 2 GB storage capacity as BLOBs, but the data is stored in 8 KB chunks with the table data pages instead of in a separate store. So you get the large size advantage, only use 8 KB in your pages at a time, quick access for the DB engine, and the same query semantics that work with other column types work with varchar(max).
In the end specifying a max length on a variable column mainly lets you constrain the growth size of your database. Once you use variable length columns you lose the advantage of fixed size rows and varchar(max) will perform the same as varchar(10) when holding the same amount of data.
blob and text / ntext are stored outside of the row context, with only a reference to the object stored in the row, resulting in a smaller row size, which will improve performance on clustered indexes.
However, because text / ntext are not stored with the row, data retrieval takes longer, and these fields cannot be used in most comparison statements.
from: http://www.making-the-web.com/2008/03/24/saving-bytes-efficient-data-storage-mysql-part-1/
There are a few variations of the TEXT and BLOB types which affect size; they are:
Type                     Maximum length      Storage
TINYBLOB, TINYTEXT       255                 Length + 1 bytes
BLOB, TEXT               65,535              Length + 2 bytes
MEDIUMBLOB, MEDIUMTEXT   16,777,215          Length + 3 bytes
LONGBLOB, LONGTEXT       4,294,967,295       Length + 4 bytes