Difference between different string types in SQL Server? - sql

What is the difference between char, nchar, ntext, nvarchar, text and varchar in SQL?
Is there really an application case for each of these types, or are some of them just deprecated?

text and ntext are deprecated, so lets omit them for a moment. For what is left, there are 3 dimensions:
Unicode (UCS-2) vs. non-unicode: N in front of the name denotes Unicode
Fixed length vs. variable length: var denotes variable, otherwise fixed
In-row vs. BLOB: (max) as length denotes a BLOB, otherwise is an in-row value
So with this, you can read any type's meaning:
CHAR(10): is an in-row fixed length non-Unicode of size 10
NVARCHAR(256): is an in-row variable length Unicode of size up-to 256
VARCHAR(MAX): is a BLOB variable length non-Unicode
The deprecated types text and ntext correspond to the new types varchar(max) and nvarchar(max) respectively.
When you go to details, the meaning of in-row vs. BLOB blurs for small lengths as the engine may optimize the storage and pull a BLOB in-row or push an in-row value into the 'small BLOB' allocation unit, but this is just an implementation detail. See Table and Index Organization.
From a programming point of view, all types: CHAR, VARCHAR, NCHAR, NVARCHAR, VARCHAR(MAX) and NVARCHAR(MAX), support an uniform string API: String Functions. The old, deprecated, types TEXT and NTEXT do not support this API, they have a separate, deperated, TEXT API to manipulate. You should not use the deprecated types.
BLOB types support efficient in-place updates by using the UPDATE table SET column.WRITE(#value, #offset) syntax.
The difference between fixed-length and variable length types vanishes when row-compression on a table. With row-compression enabled, fixed lenght types and variable length are stored in the same format and trailing spaces are not stored on disk, see Row Compression Implementation. Note that page-compression implies row-compression.

'n' represents support for unicode characters.
char - specifies string with fixed length storage. Space allocated with or without data present.
varchar - Varying length storage. Space is allocated as much as length of data in column.
text - To store huge data. The space allocated is 16 bytes for column storage.
Additionally - text and ntext have been deprecated for varchar(max) and nvarchar(max)

text and ntext are deprecated in favor of varchar(max) and nvarchar(max)

The n prefix simply means Unicode. They "n" types work similarly to the plain versions except they work with Unicode text.
char is a fixed length field. Thus char(10) filled with "Yes" will still take 10 bytes of storage.
varchar is a variable length field. char(10) filled with "Yes" will take 5 bytes of storage (there is a 2 byte overhead for using var data types).
char(n) holding string of length x. Storage = n bytes.
varchar(n) holding string of length x. Storage = x+2 bytes.
vchar and nvarchar are similar except it is 2 bytes per character.
Generally speaking you should only use char & char (over varchar & nvarchar) when working with fixed or semi-fixed strings. A good example would be a product_code or user_type which is always n characters long.
You shouldn't use text (or ntext) as it has been deprecated. varchar(max) & nvarchar(max) provides the same functionality.

N prefix indicates unicode support and takes up twice the bytes per character of non-unicode.
Varchar is variable length. You use an extra 2 bytes per field to store the length.
Char is fixed length. If you know how long your data will be, use char as you will save bytes!
Text is mostly deprecated in my experience.
Be wary of using Varchar(max) and NVarchar(max) as these fields cannot be indexed.

I only know between "char" and "varchar".
char: it can allocate memory of specified size whether or not it is filled
varchar: it will allocate memory based on the number of characters in it but it should have some size called maximum size.

Text is meant for very large amounts of text, and is in general not meant to be searchable (but can be in some circumstances. It will be slow anyway).
The char/nchar datatypes are of fixed lenghts, and are padded if entered stuff is shorter, as opposed to the varchar/nvarchar types, which are variable length.
The n types have unicode support, where the non-n types don't.

Text is deprecated.
Char is a set value. When you say char(10), you are reserving 10 characters for every single row, whether they are used or not. Use this for something that shouldn't change lengths (For example, Zip Code or SSN)
varchar is variable. When you say varchar(10), 2 bytes is set aside to store the size of the data, as well as the actual data (which might be only say, four bytes).
The N represents uni-code. Twice the space.

n-prefix: unicode.
var*: variable length, the rest is fixed length.
All data types are properly and nicely... documented.
Like here:
http://msdn.microsoft.com/en-us/library/ms187752.aspx
Is there really an application case
for each of these types, or are some
of them just deprecated?
No, there is a good case for ANY of them.

Related

About TDengine supported data type

Please explain the difference between the character data type varchar and nchar in TDengine. For the storage of character data, the case provided by TDengine is compared with the traditional mysql database. Nchar is more used as the character format type instead of varchar, are there more details about the data type selection?
From my understanding, in mysql char/nchar is used to represent fixed length characters, while varchar/varnchar to represent variable length characters:
https://dev.mysql.com/doc/refman/8.0/en/char.html
For char/nchar types usually a length is specified. For example NCHAR type in TDengine is fixed to 4 bytes. If we create the column entry with type NCHAR(4), "abc" is stored as 4 bytes and trailing spaces will be used to pad the original string. However, if "abc" is stored as varchar the length will be 3. Since NCHAR is fixed length so it would be faster when operating data, but may cause additional storage overhead IMO.
There is no varchar data type in TDengine.

Best data type for storing strings in SQL Server?

What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
I've also heard that the best length to use is 255, but I don't know why. Is there a specific length that is preferred for strings?
nvarchar stores unicode character data which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space, 16-bits per character for nvarchar and 8-bits per character for varchar.
What's the best data type to be used
when storing strings, like a first
name? I've seen varchar and nvarchar
both used. Which one is better? Does
it matter?
See What is the difference between nchar(10) and varchar(10) in MSSQL?
If you need non-ASCII characters, you have to use nchar/nvarchar. If you don't, then you may want to use char/varchar to save space.
Note that this issue is specific to MS SQL Server, which doesn't have good support for UTF-8. In other SQL implementations that do, you can use Unicode strings with no extra space requirements (for English).
EDIT: Since this answer was originally written, SQL Server 2019 (15.x) finally introduced UTF-8 support. You may want to consider using it as your default database text encoding.
I've also heard that the best length
to use is 255, but I don't know why.
See Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?
Is there a specific length that is
preferred for strings?
If you data has a well-defined maximum limit (e.g., 17 characters for a VIN), then use that.
OTOH, if the limit is arbitrary, then choose a generous maximum size to avoid rejecting valid data. In SQL Server, you may want to consider the 900-byte maximum size of index keys.
nvarchar means you can save unicode character inside it. there is 2GB limit for nvarchar type. if the field length is more than 4000 characters, an overflow page is used. smaller fields means one page can hold more rows which increase the query performance.
Generally, for small strings use nvarchar(n), which supports Unicode characters. The string is compressed when used with row or page compression (at least one of which is generally desirable).
Large strings need nvarchar(max), which Unicode compression does not support.
For special-case scenarios when your data set never uses Unicode characters, varchar(n) and varchar(max) restrict the string type of one byte per character.
If you know the max length (n) is less than 256, SQL Server only needs to use 1 byte to store the string length. This reduces storage space by about half a percent compared a string type whose max length is just over 255.

Does varchar(max) = varchar?

I noticed that I can write
SELECT CAST(Min(mynumber) AS VARCHAR(Max))+'mystring' AS X
as
SELECT CAST(Min(mynumber) AS VARCHAR)+'mystring' X
Will I regret leaving out the (Max) parameter?
You'll regret it in the (unlikely) situation that MAX(mynumber) has more than 30 characters:
When n is not specified when using the CAST and CONVERT functions, the default length is 30.
VARCHAR(MAX) should be used for Large Objects.It uses the normal datapages until the content actually fills 8k of data. When overflow happens, data is stored as old TEXT, IMAGE and a pointer is replacing the old content.
Varchar is for Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. Max indicates that the maximum storage size is 2^31-1 bytes.
Hope it helps.
When a varchar's lenght is not specified in a data definition or variable declaration statement, the default length is 1. When it is not specified when using the CAST and CONVERT functions, the default length is 30.
See: char and varchar (Transact-SQL)
I feel that it is poor practice to code without specifying a length for varchar.

How much real storage is used with a varchar(100) declaration in mysql?

If I have a table with a field which is declared as accepting varchar(100) and then I actually insert the word "hello" how much real storage will be used on the mysql server? Also will an insert of NULL result in no storage being used even though varchar(100) is declared?
What ever the answer is, is it consistent accross different database implementations?
If I have a table with a field which
is declared as accepting varchar(100)
and then I actually insert the word
"hello" how much real storage will be
used on the mysql server?
Mysql will store 5 bytes plus one byte for the length. If the varchar is greater than 255, then it will store 2 bytes for the length.
Note that this is dependent on the charset of the column. If the charset is utf8, mysql will require up to 3 bytes per character. Some storage engines (i.e. memory) will always require the maximum byte length per character for the character set.
Also will an insert of NULL result in
no storage being used even though
varchar(100) is declared?
Making a column nullable means that mysql will have to set aside an extra byte per up to 8 nullable columns per row. This is called the "null mask".
What ever the answer is, is it consistent accross different database implementations?
It's not even consistent between storage engines within mysql!
It really depends on your table's charset.
In contrast to CHAR, VARCHAR values
are stored as a one-byte or two-byte
length prefix plus data. The length
prefix indicates the number of bytes
in the value. A column uses one length
byte if values require no more than
255 bytes, two length bytes if values
may require more than 255 bytes.
- source
UTF-8 often takes more space than an
encoding made for one or a few
languages. Latin letters with
diacritics and characters from other
alphabetic scripts typically take one
byte per character in the appropriate
multi-byte encoding but take two in
UTF-8. East Asian scripts generally
have two bytes per character in their
multi-byte encodings yet take three
bytes per character in UTF-8.
- source
varchar only stores what is used whereas char stores a set number of bytes.
Utf16 sometimes takes less data then utf8, for some rare languages, I don't know which ones.
Guys, is there an option to use COMPRESSed tables in MySql? Like in Apache. Thanks a lot

What's the difference between VARCHAR and CHAR?

What's the difference between VARCHAR and CHAR in MySQL?
I am trying to store MD5 hashes.
VARCHAR is variable-length.
CHAR is fixed length.
If your content is a fixed size, you'll get better performance with CHAR.
See the MySQL page on CHAR and VARCHAR Types for a detailed explanation (be sure to also read the comments).
CHAR
Used to store character string value of fixed length.
The maximum no. of characters the data type can hold is 255 characters.
It's 50% faster than VARCHAR.
Uses static memory allocation.
VARCHAR
Used to store variable length alphanumeric data.
The maximum this data type can hold is up to
Pre-MySQL 5.0.3: 255 characters.
Post-MySQL 5.0.3: 65,535 characters shared for the row.
It's slower than CHAR.
Uses dynamic memory allocation.
CHAR Vs VARCHAR
CHAR is used for Fixed Length Size Variable
VARCHAR is used for Variable Length Size Variable.
E.g.
Create table temp
(City CHAR(10),
Street VARCHAR(10));
Insert into temp
values('Pune','Oxford');
select length(city), length(street) from temp;
Output will be
length(City) Length(street)
10 6
Conclusion: To use storage space efficiently must use VARCHAR Instead CHAR if variable length is variable
A CHAR(x) column can only have exactly x characters.
A VARCHAR(x) column can have up to x characters.
Since your MD5 hashes will always be the same size, you should probably use a CHAR.
However, you shouldn't be using MD5 in the first place; it has known weaknesses.
Use SHA2 instead.
If you're hashing passwords, you should use bcrypt.
What's the difference between VARCHAR and CHAR in MySQL?
To already given answers I would like to add that in OLTP systems or in systems with frequent updates consider using CHAR even for variable size columns because of possible VARCHAR column fragmentation during updates.
I am trying to store MD5 hashes.
MD5 hash is not the best choice if security really matters. However, if you will use any hash function, consider BINARY type for it instead (e.g. MD5 will produce 16-byte hash, so BINARY(16) would be enough instead of CHAR(32) for 32 characters representing hex digits. This would save more space and be performance effective.
Varchar cuts off trailing spaces if the entered characters is shorter than the declared length, while char will not. Char will pad spaces and will always be the length of the declared length. In terms of efficiency, varchar is more adept as it trims characters to allow more adjustment. However, if you know the exact length of char, char will execute with a bit more speed.
CHAR is fixed length and VARCHAR is variable length. CHAR always uses the same amount of storage space per entry, while VARCHAR only uses the amount necessary to store the actual text.
CHAR is a fixed length field; VARCHAR is a variable length field. If you are storing strings with a wildly variable length such as names, then use a VARCHAR, if the length is always the same, then use a CHAR because it is slightly more size-efficient, and also slightly faster.
In most RDBMSs today, they are synonyms. However for those systems that still have a distinction, a CHAR field is stored as a fixed-width column. If you define it as CHAR(10), then 10 characters are written to the table, where "padding" (typically spaces) is used to fill in any space that the data does not use up. For example, saving "bob" would be saved as ("bob"+7 spaces). A VARCHAR (variable character) column is meant to store data without wasting the extra space that a CHAR column does.
As always, Wikipedia speaks louder.
CHAR
CHAR is a fixed length string data type, so any remaining space in the field is padded with blanks.
CHAR takes up 1 byte per character. So, a CHAR(100) field (or variable) takes up 100 bytes on disk, regardless of the string it holds.
VARCHAR
VARCHAR is a variable length string data type, so it holds only the characters you assign to it.
VARCHAR takes up 1 byte per character, + 2 bytes to hold length information (For example, if you set a VARCHAR(100) data type = ‘Dhanika’, then it would take up 7 bytes (for D, H, A, N, I, K and A) plus 2 bytes, or 9 bytes in all.)
CHAR
Uses specific allocation of memory
Time efficient
VARCHAR
Uses dynamic allocation of memory
Memory efficient
The char is a fixed-length character data type, the varchar is a variable-length character data type.
Because char is a fixed-length data type, the storage size of the char value is equal to the maximum size for this column. Because varchar is a variable-length data type, the storage size of the varchar value is the actual length of the data entered, not the maximum size for this column.
You can use char when the data entries in a column are expected to be the same size.
You can use varchar when the data entries in a column are expected to vary considerably in size.
Distinguishing between the two is also good for an integrity aspect.
If you expect to store things that have a rule about their length such as yes or no then you can use char(1) to store Y or N. Also useful for things like currency codes, you can use char(3) to store things like USD, EUR or AUD.
Then varchar is better for things were there is no general rule about their length except for the limit. It's good for things like names or descriptions where there is a lot of variation of how long the values will be.
Then the text data type comes along and puts a spanner in the works (although it's generally just varchar with no defined upper limit).
according to High Performance MySQL book:
VARCHAR stores variable-length character strings and is the most common string data type. It can require less storage space than
fixed-length types, because it uses only as much space as it needs
(i.e., less space is used to store shorter values). The exception is a
MyISAM table created with ROW_FORMAT=FIXED, which uses a fixed amount
of space on disk for each row and can thus waste space. VARCHAR helps
performance because it saves space.
CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. When storing a CHAR value, MySQL
removes any trailing spaces. (This was also true of VARCHAR in MySQL
4.1 and older versions—CHAR and VAR CHAR were logically identical and differed only in storage format.) Values are padded with spaces as
needed for comparisons.
Char has a fixed length (supports 2000 characters), it is stand for character is a data type
Varchar has a variable length (supports 4000 characters)
Char or varchar- it is used to enter texual data where the length can be indicated in brackets
Eg- name char (20)
CHAR :
Supports both Character & Numbers.
Supports 2000 characters.
Fixed Length.
VARCHAR :
Supports both Character & Numbers.
Supports 4000 characters.
Variable Length.
any comments......!!!!