Please explain the difference between the character data types VARCHAR and NCHAR in TDengine. Compared with a traditional MySQL database, the examples TDengine provides for storing character data use NCHAR rather than VARCHAR as the string type. Are there more details on how to choose between these data types?
From my understanding, in MySQL CHAR/NCHAR represent fixed-length strings, while VARCHAR/NVARCHAR represent variable-length strings:
https://dev.mysql.com/doc/refman/8.0/en/char.html
For CHAR/NCHAR types a length is usually specified. In TDengine, for example, each NCHAR character takes 4 bytes. If we create a column of type NCHAR(4), "abc" is padded with trailing spaces to the full declared width of 4 characters, whereas stored as VARCHAR it would take only 3 characters. Since NCHAR is fixed-length, operations on the data may be faster, but it can cause additional storage overhead, IMO.
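There is no VARCHAR data type in TDengine; single-byte strings use BINARY, and NCHAR is used for multi-byte characters (later versions accept VARCHAR as an alias for BINARY).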
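To put rough numbers on that trade-off, here is a small Python sketch. It assumes the rule stated above (every NCHAR character occupies 4 bytes, so NCHAR(n) reserves 4 * n bytes) and ignores any on-disk compression; the helper names are hypothetical and not part of any TDengine API.

```python
# Sketch, assuming TDengine's NCHAR stores each character in 4 bytes
# and an NCHAR(n) column reserves space for n characters.
# Compression and TDengine's real on-disk layout are ignored.

BYTES_PER_NCHAR_CHAR = 4


def nchar_reserved_bytes(declared_len: int) -> int:
    """Bytes reserved for one NCHAR(declared_len) value."""
    return BYTES_PER_NCHAR_CHAR * declared_len


def nchar_payload_bytes(value: str) -> int:
    """Bytes the characters themselves need at 4 bytes per code point (UTF-32)."""
    return len(value.encode("utf-32-le"))


def fits_nchar(value: str, declared_len: int) -> bool:
    """NCHAR lengths are counted in characters, not bytes."""
    return len(value) <= declared_len


if __name__ == "__main__":
    for text in ["abc", "数据库"]:
        print(
            f"{text!r}: fits NCHAR(4)={fits_nchar(text, 4)}, "
            f"payload={nchar_payload_bytes(text)} B, "
            f"reserved={nchar_reserved_bytes(4)} B"
        )
```

With "abc" in an NCHAR(4) column, 4 of the 16 reserved bytes go unused, which is the storage overhead the question is worried about; a three-character Chinese string needs the same 12 payload bytes.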
What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
I've also heard that the best length to use is 255, but I don't know why. Is there a specific length that is preferred for strings?
nvarchar stores Unicode character data, which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on going international. The downside is that it consumes twice as much space: 16 bits per character for nvarchar versus 8 bits per character for varchar.
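The size difference is easy to check from Python. This sketch uses cp1252 as a stand-in for a typical Western varchar code page (an assumption; the real code page depends on the column's collation) and UTF-16 for nvarchar, as SQL Server does; the function names are hypothetical.

```python
# Compare the bytes needed for the same name as varchar and as nvarchar.
# Assumptions: varchar uses the single-byte cp1252 code page,
# nvarchar uses UTF-16 little-endian as in SQL Server.


def varchar_bytes(value: str, codepage: str = "cp1252") -> int:
    # Raises UnicodeEncodeError if the code page cannot represent a character.
    return len(value.encode(codepage))


def nvarchar_bytes(value: str) -> int:
    return len(value.encode("utf-16-le"))


if __name__ == "__main__":
    print(varchar_bytes("Smith"), nvarchar_bytes("Smith"))  # 5 10
    print(nvarchar_bytes("王芳"))  # 4
    try:
        varchar_bytes("王芳")
    except UnicodeEncodeError:
        print("王芳 cannot be represented in a cp1252 varchar")
```

Python raises an error for the unrepresentable name; SQL Server itself typically substitutes '?' instead, which is the silent data loss that makes nvarchar necessary for non-English names.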
What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
See What is the difference between nchar(10) and varchar(10) in MSSQL?
If you need non-ASCII characters, you have to use nchar/nvarchar. If you don't, then you may want to use char/varchar to save space.
Note that this issue is specific to MS SQL Server, which doesn't have good support for UTF-8. In other SQL implementations that do, you can use Unicode strings with no extra space requirements (for English).
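The UTF-8 point can be illustrated with Python's codecs: plain English (ASCII) text takes exactly one byte per character in UTF-8, the same as a single-byte varchar, while any other character can still be encoded. This is a sketch of the encoding only, not of any particular database's storage format.

```python
# UTF-8 byte counts versus character counts for a few names.


def utf8_bytes(value: str) -> int:
    return len(value.encode("utf-8"))


if __name__ == "__main__":
    for name in ["Smith", "Zoë", "王芳"]:
        # ASCII: bytes == characters; other scripts need 2-4 bytes each.
        print(name, len(name), utf8_bytes(name))
```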
EDIT: Since this answer was originally written, SQL Server 2019 (15.x) finally introduced UTF-8 support. You may want to consider using it as your default database text encoding.
I've also heard that the best length to use is 255, but I don't know why.
See Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?
Is there a specific length that is preferred for strings?
If your data has a well-defined maximum limit (e.g., 17 characters for a VIN), then use that.
OTOH, if the limit is arbitrary, then choose a generous maximum size to avoid rejecting valid data. In SQL Server, you may want to consider the 900-byte maximum size of index keys.
nvarchar lets you store Unicode characters. nvarchar(max) has a 2 GB limit, while nvarchar(n) is limited to 4,000 characters; longer values need nvarchar(max), whose data can be moved off-row to overflow pages. Smaller fields mean one page can hold more rows, which improves query performance.
Generally, for small strings use nvarchar(n), which supports Unicode characters. The string is compressed when used with row or page compression (at least one of which is generally desirable).
Large strings need nvarchar(max), which Unicode compression does not support.
For special-case scenarios where your data never contains Unicode characters, varchar(n) and varchar(max) restrict the string to one byte per character.
If you know the max length (n) is at most 255 bytes, some systems (MySQL, for example) need only 1 byte to store the string length instead of 2, which saves about half a percent of storage compared with a string type whose max length is just over 255. SQL Server, however, documents a fixed 2-byte overhead for varchar(n) regardless of n, so this saving does not apply there.
I noticed that I can write
SELECT CAST(Min(mynumber) AS VARCHAR(Max))+'mystring' AS X
as
SELECT CAST(Min(mynumber) AS VARCHAR)+'mystring' X
Will I regret leaving out the (Max) parameter?
You'll regret it in the (unlikely) situation that MIN(mynumber) has more than 30 characters:
When n is not specified when using the CAST and CONVERT functions, the default length is 30.
VARCHAR(MAX) should be used for large objects. It uses normal data pages until the content fills 8 KB of data. When it overflows, the data is stored off-row the way the old TEXT and IMAGE types were, and a pointer replaces the in-row content.
VARCHAR is for variable-length, non-Unicode character data. n can be a value from 1 through 8,000. MAX indicates that the maximum storage size is 2^31-1 bytes (about 2 GB).
Hope it helps.
When a varchar's length is not specified in a data definition or variable declaration statement, the default length is 1. When it is not specified when using the CAST and CONVERT functions, the default length is 30.
See: char and varchar (Transact-SQL)
I feel that it is poor practice to code without specifying a length for varchar.
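Those default lengths can be modelled in a few lines of Python to show why omitting the length is risky. This is a hypothetical model for character (string) input only: assigning to a variable declared as plain varchar keeps 1 character, and CAST/CONVERT without a length keeps 30, both truncating silently. Numeric input behaves differently in SQL Server (it may produce '*' or an arithmetic overflow error), and inserting too long a value into a varchar(1) column normally raises a truncation error rather than truncating.

```python
# Hypothetical model of SQL Server's defaults when varchar has no length,
# for string input only:
#   - variable declaration (DECLARE @v varchar): n defaults to 1
#   - CAST / CONVERT without a length:            n defaults to 30
# In both cases longer strings are silently truncated.

DEFAULT_VARCHAR_LENGTH = {"declaration": 1, "cast": 30}


def varchar_without_length(value: str, context: str) -> str:
    n = DEFAULT_VARCHAR_LENGTH[context]
    return value[:n]  # silent truncation, no error raised


if __name__ == "__main__":
    print(repr(varchar_without_length("hello", "declaration")))  # 'h'
    long_text = "x" * 40
    print(len(varchar_without_length(long_text, "cast")))  # 30
```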
If I have a table with a field declared as varchar(100) and I actually insert the word "hello", how much real storage will be used on the MySQL server? Also, will an insert of NULL result in no storage being used even though varchar(100) is declared?
Whatever the answer is, is it consistent across different database implementations?
If I have a table with a field declared as varchar(100) and I actually insert the word "hello", how much real storage will be used on the MySQL server?
MySQL will store 5 bytes plus 1 byte for the length. If the column's maximum length in bytes is greater than 255, it will use 2 bytes for the length.
Note that this depends on the character set of the column. If the charset is utf8, MySQL requires up to 3 bytes per character (utf8mb4 requires up to 4). Some storage engines (e.g. MEMORY) always require the maximum byte length per character for the character set.
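The rule can be sketched in Python. This follows the general rule from the MySQL manual (data bytes in the column's character set, plus a 1-byte length prefix when the column's maximum byte length is at most 255, otherwise 2 bytes); individual row formats such as InnoDB's add their own details, which are ignored here, and the helper and table names are hypothetical.

```python
# Sketch of MySQL's VARCHAR storage rule: data bytes plus a length prefix
# whose size depends on the column's *maximum* byte length
# (declared characters x max bytes per character), not on the stored value.
# Engine-specific row-format details are ignored.

MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}
# Python codecs used to count data bytes. Note: "utf-8" also accepts
# 4-byte characters that a real utf8mb3 column would reject.
PYTHON_CODEC = {"latin1": "latin-1", "utf8mb3": "utf-8", "utf8mb4": "utf-8"}


def varchar_storage_bytes(value: str, declared_chars: int, charset: str) -> int:
    if len(value) > declared_chars:
        raise ValueError("value longer than the declared VARCHAR length")
    max_bytes = declared_chars * MAX_BYTES_PER_CHAR[charset]
    prefix = 1 if max_bytes <= 255 else 2
    return len(value.encode(PYTHON_CODEC[charset])) + prefix


if __name__ == "__main__":
    for cs in ("latin1", "utf8mb3", "utf8mb4"):
        print(cs, varchar_storage_bytes("hello", 100, cs))
```

So the "5 bytes plus one byte" figure holds for a single-byte charset such as latin1; with utf8 or utf8mb4, a VARCHAR(100) may need more than 255 bytes, so the prefix grows to 2 bytes even for "hello".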
Also, will an insert of NULL result in no storage being used even though varchar(100) is declared?
Making a column nullable means that MySQL has to set aside one extra byte per row for every group of up to 8 nullable columns. This is called the "null mask".
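The null-mask arithmetic described in this answer (one bit per nullable column, rounded up to whole bytes) fits in a tiny Python sketch; it models the rule as stated here, and the exact layout varies by storage engine and row format.

```python
def null_mask_bytes(nullable_columns: int) -> int:
    """One bit per nullable column, rounded up to whole bytes."""
    return (nullable_columns + 7) // 8


if __name__ == "__main__":
    for cols in (0, 1, 8, 9, 16, 17):
        print(cols, "nullable columns ->", null_mask_bytes(cols), "byte(s) per row")
```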
Whatever the answer is, is it consistent across different database implementations?
It's not even consistent between storage engines within mysql!
It really depends on your table's charset.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
- source
UTF-8 often takes more space than an encoding made for one or a few languages. Latin letters with diacritics and characters from other alphabetic scripts typically take one byte per character in the appropriate multi-byte encoding but take two in UTF-8. East Asian scripts generally have two bytes per character in their multi-byte encodings yet take three bytes per character in UTF-8.
- source
VARCHAR stores only what is used, whereas CHAR stores a set number of bytes.
UTF-16 sometimes takes less space than UTF-8: characters from U+0800 to U+FFFF, which include most Chinese, Japanese and Korean characters, take 3 bytes in UTF-8 but only 2 in UTF-16.
Guys, is there an option to use COMPRESSED tables in MySQL? Like in Apache. Thanks a lot
What's the difference between VARCHAR and CHAR in MySQL?
I am trying to store MD5 hashes.
VARCHAR is variable-length.
CHAR is fixed length.
If your content is a fixed size, you'll get better performance with CHAR.
See the MySQL page on CHAR and VARCHAR Types for a detailed explanation (be sure to also read the comments).
CHAR
Used to store character string value of fixed length.
The maximum number of characters the data type can hold is 255.
It can be faster than VARCHAR for fixed-size data.
Uses static memory allocation.
VARCHAR
Used to store variable length alphanumeric data.
The maximum this data type can hold is:
Pre-MySQL 5.0.3: 255 characters.
Post-MySQL 5.0.3: 65,535 bytes, the maximum row size, shared among all columns of the row.
It can be slower than CHAR.
Uses dynamic memory allocation.
CHAR Vs VARCHAR
CHAR is used for fixed-length values.
VARCHAR is used for variable-length values.
E.g.
Create table temp
(City CHAR(10),
Street VARCHAR(10));
Insert into temp
values('Pune','Oxford');
select length(city), length(street) from temp;
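Output will be (in Oracle; MySQL strips the CHAR padding on retrieval and returns 4 for City unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled):
length(city)  length(street)
10            6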
Conclusion: to use storage space efficiently, use VARCHAR instead of CHAR when the length of the values varies.
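Whether length(city) comes back as 10 or 4 depends on how the database treats CHAR padding. A minimal Python model, assuming Oracle-style semantics keep the pad spaces, MySQL-style semantics (default SQL mode) strip trailing spaces from CHAR on retrieval, and VARCHAR keeps the value as entered; the function names are hypothetical.

```python
def store_char(value: str, n: int) -> str:
    """CHAR(n): pad with spaces to exactly n characters."""
    if len(value) > n:
        raise ValueError(f"value longer than CHAR({n})")
    return value.ljust(n)


def store_varchar(value: str, n: int) -> str:
    """VARCHAR(n): keep the value as entered (up to n characters)."""
    if len(value) > n:
        raise ValueError(f"value longer than VARCHAR({n})")
    return value


def read_char(stored: str, strip_padding: bool) -> str:
    """MySQL-style reads strip trailing spaces from CHAR; Oracle-style reads do not."""
    return stored.rstrip(" ") if strip_padding else stored


if __name__ == "__main__":
    city = store_char("Pune", 10)
    street = store_varchar("Oxford", 10)
    print("Oracle-style:", len(read_char(city, strip_padding=False)), len(street))  # 10 6
    print("MySQL-style:", len(read_char(city, strip_padding=True)), len(street))  # 4 6
```

Either way, the stored CHAR(10) value occupies 10 characters; only what the query reports back differs.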
A CHAR(x) column can only have exactly x characters.
A VARCHAR(x) column can have up to x characters.
Since your MD5 hashes will always be the same size, you should probably use a CHAR.
However, you shouldn't be using MD5 in the first place; it has known weaknesses.
Use SHA2 instead.
If you're hashing passwords, you should use bcrypt.
What's the difference between VARCHAR and CHAR in MySQL?
To the answers already given, I would like to add that in OLTP systems, or in systems with frequent updates, you should consider using CHAR even for variable-size columns, because of possible VARCHAR column fragmentation during updates.
I am trying to store MD5 hashes.
MD5 is not the best choice if security really matters. However, whichever hash function you use, consider the BINARY type for it instead: MD5 produces a 16-byte hash, so BINARY(16) is enough instead of CHAR(32) for the 32 characters of hex digits. This saves space and is more efficient.
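The column sizes follow directly from the digest lengths, which Python's hashlib can confirm. MD5 appears here only because the question uses it; SHA-256 is included because it was suggested above, and the helper name is hypothetical.

```python
import hashlib


def hash_column_sizes(data: bytes) -> dict:
    """Raw digest length (for BINARY(n)) versus hex length (for CHAR(n))."""
    md5 = hashlib.md5(data)
    sha256 = hashlib.sha256(data)
    return {
        "md5_binary": len(md5.digest()),        # fits BINARY(16)
        "md5_hex": len(md5.hexdigest()),        # needs CHAR(32)
        "sha256_binary": len(sha256.digest()),  # fits BINARY(32)
        "sha256_hex": len(sha256.hexdigest()),  # needs CHAR(64)
    }


if __name__ == "__main__":
    print(hash_column_sizes(b"example"))
```

In MySQL, UNHEX() converts the 32-character hex string into the 16 raw bytes for a BINARY(16) column, and HEX() converts it back for display.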
VARCHAR stores only the characters you enter, while CHAR pads the value with spaces so that it is always the declared length (in MySQL, these pad spaces are removed again when the value is retrieved). In terms of storage, VARCHAR is more efficient because it uses only as much space as the value needs. However, if you know the values always have the same length, CHAR may execute a bit faster.
CHAR is fixed length and VARCHAR is variable length. CHAR always uses the same amount of storage space per entry, while VARCHAR only uses the amount necessary to store the actual text.
CHAR is a fixed-length field; VARCHAR is a variable-length field. If you are storing strings whose length varies widely, such as names, use a VARCHAR; if the length is always the same, use a CHAR, because it is slightly more size-efficient (no length prefix) and also slightly faster.
In some RDBMSs (SQLite, for example), they are effectively synonyms. However, in systems that still make a distinction, a CHAR field is stored as a fixed-width column. If you define it as CHAR(10), then 10 characters are written to the table, with "padding" (typically spaces) filling any space the data does not use. For example, "bob" would be saved as "bob" + 7 spaces. A VARCHAR (variable character) column stores data without wasting the extra space that a CHAR column does.
As always, Wikipedia speaks louder.
CHAR
CHAR is a fixed length string data type, so any remaining space in the field is padded with blanks.
CHAR takes up 1 byte per character in a single-byte character set. So, a CHAR(100) field (or variable) takes up 100 bytes on disk, regardless of the string it holds.
VARCHAR
VARCHAR is a variable length string data type, so it holds only the characters you assign to it.
VARCHAR takes up 1 byte per character plus 2 bytes to hold length information (as in SQL Server). For example, if you set a VARCHAR(100) value to 'Dhanika', it takes up 7 bytes (for D, H, A, N, I, K and A) plus 2 bytes, or 9 bytes in all.
CHAR
Uses specific allocation of memory
Time efficient
VARCHAR
Uses dynamic allocation of memory
Memory efficient
The char is a fixed-length character data type, the varchar is a variable-length character data type.
Because char is a fixed-length data type, the storage size of a char value equals the maximum size for the column. Because varchar is a variable-length data type, the storage size of a varchar value is the actual length of the data entered (plus a small length prefix), not the maximum size for the column.
You can use char when the data entries in a column are expected to be the same size.
You can use varchar when the data entries in a column are expected to vary considerably in size.
Distinguishing between the two is also good for an integrity aspect.
If you expect to store things that have a rule about their length, such as yes or no, you can use char(1) to store Y or N. It is also useful for things like currency codes: you can use char(3) to store values such as USD, EUR or AUD.
Varchar, on the other hand, is better for things where there is no general rule about their length except for the limit. It's good for things like names or descriptions, where there is a lot of variation in how long the values will be.
Then the text data type comes along and puts a spanner in the works (although it's generally just varchar with no defined upper limit).
According to the book High Performance MySQL:
VARCHAR stores variable-length character strings and is the most common string data type. It can require less storage space than fixed-length types, because it uses only as much space as it needs (i.e., less space is used to store shorter values). The exception is a MyISAM table created with ROW_FORMAT=FIXED, which uses a fixed amount of space on disk for each row and can thus waste space. VARCHAR helps performance because it saves space.
CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. When storing a CHAR value, MySQL removes any trailing spaces. (This was also true of VARCHAR in MySQL 4.1 and older versions; CHAR and VARCHAR were logically identical and differed only in storage format.) Values are padded with spaces as needed for comparisons.
CHAR, which stands for "character", is a data type with a fixed length (supports up to 2000 characters).
VARCHAR has a variable length (supports up to 4000 characters).
CHAR and VARCHAR are used to store textual data; the length is given in parentheses.
For example: name CHAR(20)
CHAR :
Supports both Character & Numbers.
Supports 2000 characters.
Fixed Length.
VARCHAR :
Supports both Character & Numbers.
Supports 4000 characters.
Variable Length.
Any comments?