Which is the biggest String type in SQL? - sql

Which is the biggest possible String type in SQL?
TEXT? or another?
DBMS: SQLite and MySQL

The largest character data type in MySQL is LONGTEXT (see Storage Requirements for String Types), which can hold 2^32-1 bytes.
SQLite does not impose any length restrictions (other than the large global SQLITE_MAX_LENGTH limit) on the length of strings, BLOBs or numeric values. The default limit for that is 1 billion, but you can change it at compile time to a maximum of 2^31-1 (see Maximum length of a string or BLOB).
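A minimal sketch of that SQLite behavior, using Python's built-in sqlite3 module: the declared VARCHAR(10) length is ignored, and a much longer string is stored untruncated (subject only to the SQLITE_MAX_LENGTH limit).

```python
import sqlite3

# SQLite ignores declared length limits, so even a column declared
# VARCHAR(10) happily stores a much longer string (up to the
# SQLITE_MAX_LENGTH limit, 1 billion bytes by default).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s VARCHAR(10))")
long_string = "x" * 1_000_000  # one million characters
conn.execute("INSERT INTO t VALUES (?)", (long_string,))
stored = conn.execute("SELECT length(s) FROM t").fetchone()[0]
print(stored)  # 1000000 -- no truncation despite VARCHAR(10)
```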

CLOB.
The SQL standard uses the name CLOB (it's feature T041-02 in SQL:2008), but note that other databases may use different names for it; TEXT is a common one.

Related

Impala Data types

I'm trying to understand the difference between the following data types in Impala:
String
Char
Varchar
Since Impala is schema-on-read, what would be the need for 3 different types? I am wondering if there are any performance benefits to using CHAR/VARCHAR over STRING in scenarios where we know the upper bounds on column lengths.
STRING stores variable length data and is (essentially--barring some practical limitations, of course) unbounded.
VARCHAR(x) stores variable length data with an upper bound of x characters, so data will be truncated to the defined length. For example, if you have VARCHAR(10), your input data can have size in [0,10].
CHAR(x) is an x-character fixed-size data type. Data is padded if it is shorter than x. Data is truncated if it is longer than x.
Both VARCHAR and CHAR were introduced in Impala 2.0.0 (CDH 5.2.0), mostly for compatibility with other database systems. Neither is recommended except for some special use cases (integration with specific legacy systems), as both have functional limitations.
While STRING and VARCHAR should perform similarly, CHAR has some different characteristics: notably, it is not codegen'd, so performance will typically suffer. However, small CHARs (where x < 128) are stored inline with the tuples during execution rather than in auxiliary memory, as variable-length data is.
The above covers the main differences between these types, but it is recommended to use STRING whenever possible.
See the STRING, VARCHAR, and CHAR docs for more details.
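The truncation and padding semantics described above can be sketched in plain Python. This is a conceptual model of the behavior, not Impala code:

```python
# Conceptual sketch of the semantics described above -- not Impala code.
def varchar(x, value):
    """VARCHAR(x): variable length, truncated to at most x characters."""
    return value[:x]

def char(x, value):
    """CHAR(x): fixed size, truncated or space-padded to exactly x."""
    return value[:x].ljust(x)

print(varchar(10, "hello world"))  # 'hello worl' (truncated to 10)
print(char(5, "hi"))               # 'hi   ' (padded to 5)
print(char(2, "hello"))            # 'he' (truncated to 2)
```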

what is the maximum length of varchar(n) in postgresql 9.2 and which is best to use varchar(n) or text?

Hi, I am using PostgreSQL 9.2 and I want to use varchar(n) to store some long strings, but I don't know the maximum length that varchar(n) supports. Also, which one is better to use? Could you please suggest? Thanks.
tl;dr: 1 GB (each character (really: codepoint) may be represented by one or more bytes, depending on where it sits in the Unicode planes, assuming a UTF-8-encoded database). For arbitrary-length character data in PostgreSQL, you should now always use the text data type.
Explanation:
varchar(n) and text use the same backend storage type (varlena): a variable-length byte array with a 32-bit length counter. For indexing behavior, text may even have some performance benefits. It is considered a best practice in Postgres to use the text type for new development; varchar(n) remains for SQL standard support reasons. NB: varchar() (with empty brackets) is a Postgres-specific alias for text.
See also:
http://www.postgresql.org/about/
According to the official documentation ( http://www.postgresql.org/docs/9.2/static/datatype-character.html ):
In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
Searching online reveals that the maximum value allowed varies depending on the installation and compilation options, some users report a maximum of 10485760 characters (10MiB exactly, assuming 1-byte-per-character fixed encoding).
By "the installation and compilation options" I mean that you can build PostgreSQL from source yourself and, before compiling, configure how it stores text to change the maximum amount you can store. If you do this, though, you may run into trouble if you try to use your database files with a normal, non-customized build of PostgreSQL.
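The character-vs-byte distinction behind the 1 GB limit can be seen directly. A short sketch, assuming UTF-8 encoding:

```python
# One accented codepoint occupies more than one byte in UTF-8, so
# character count and byte count diverge -- which is why Postgres
# states its limit in bytes, not characters.
s = "héllo"  # 'é' is a single codepoint but two bytes in UTF-8
print(len(s))                  # 5 characters
print(len(s.encode("utf-8")))  # 6 bytes
```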

Best data type for storing strings in SQL Server?

What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
I've also heard that the best length to use is 255, but I don't know why. Is there a specific length that is preferred for strings?
nvarchar stores Unicode character data, which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space: 16 bits per character for nvarchar versus 8 bits per character for varchar.
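The space trade-off can be modeled with Python's text encodings. This is a rough sketch: varchar behaves like a single-byte encoding (latin-1 here, as an illustrative stand-in for a SQL Server collation), while nvarchar stores UTF-16 code units, two bytes per character for ordinary (BMP) text:

```python
name = "hello"
# varchar: single-byte characters -- 1 byte each
print(len(name.encode("latin-1")))    # 5 bytes
# nvarchar: UTF-16 code units -- 2 bytes per (BMP) character
print(len(name.encode("utf-16-le")))  # 10 bytes
```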
What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
See What is the difference between nchar(10) and varchar(10) in MSSQL?
If you need non-ASCII characters, you have to use nchar/nvarchar. If you don't, then you may want to use char/varchar to save space.
Note that this issue is specific to MS SQL Server, which doesn't have good support for UTF-8. In other SQL implementations that do, you can use Unicode strings with no extra space requirements (for English).
EDIT: Since this answer was originally written, SQL Server 2019 (15.x) finally introduced UTF-8 support. You may want to consider using it as your default database text encoding.
I've also heard that the best length to use is 255, but I don't know why.
See Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?
Is there a specific length that is preferred for strings?
If your data has a well-defined maximum limit (e.g., 17 characters for a VIN), then use that.
OTOH, if the limit is arbitrary, then choose a generous maximum size to avoid rejecting valid data. In SQL Server, you may want to consider the 900-byte maximum size of index keys.
nvarchar means you can store Unicode characters in it. There is a 2 GB limit for the nvarchar(max) type. If the field is longer than 4000 characters, an overflow page is used. Smaller fields mean one page can hold more rows, which increases query performance.
Generally, for small strings use nvarchar(n), which supports Unicode characters. The string is compressed when used with row or page compression (at least one of which is generally desirable).
Large strings need nvarchar(max), which Unicode compression does not support.
For special-case scenarios where your data set never uses Unicode characters, varchar(n) and varchar(max) restrict the string type to one byte per character.
If you know the max length (n) is less than 256, SQL Server only needs 1 byte to store the string length. This reduces storage space by about half a percent compared to a string type whose max length is just over 255.

Why does VARCHAR need length specification?

Why do we always need to specify VARCHAR(length) instead of just VARCHAR? It is dynamic anyway.
UPD: I'm puzzled specifically by the fact that it is mandatory (e.g. in MySQL).
The "length" of the VARCHAR is not the length of the contents, it is the maximum length of the contents.
The max length of a VARCHAR is not dynamic, it is fixed and therefore has to be specified.
If you don't want to define a maximum size for it then use VARCHAR(MAX).
First off, it is not needed in all databases. Look at SQL Server, where it is optional.
Regardless, it defines a maximum size for the content of the field. Not a bad thing in itself, and it conveys meaning (for example - phone numbers, where you do not want international numbers in the field).
You can see it as a constraint on your data. It ensures that you don't store data that violates your constraint. It is conceptually similar to, e.g., a check constraint on an integer column that ensures only positive values are entered.
The more the database knows about the data it is storing, the more optimisations it can make when searching/adding/updating data with requests.
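The constraint analogy can be made concrete in SQLite, which ignores declared varchar lengths but does enforce explicit check constraints. A sketch using Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores a declared VARCHAR length, but a CHECK constraint
# gives the same guarantee the length specifier provides elsewhere.
conn.execute("CREATE TABLE t (s TEXT CHECK (length(s) <= 10))")
conn.execute("INSERT INTO t VALUES ('short')")  # fits, accepted
try:
    conn.execute("INSERT INTO t VALUES ('this string is too long')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # over-length row is refused
```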
The answer is you don't need to, it's optional.
It's there if you want to ensure that strings do not exceed a certain length.
From Wikipedia:
Varchar fields can be of any size up to the limit. The limit differs between types of databases: an Oracle 9i Database has a limit of 4000 bytes, a MySQL Database has a limit of 65,535 bytes (for the entire row) and Microsoft SQL Server 2005 8000 bytes (unless varchar(max) is used, which has a maximum storage capacity of 2,147,483,648 bytes).
The most dangerous pitfall for programmers, as @DimaFomin pointed out in the comments, is the default length enforced when no length is specified.
How SQL Server enforces the default length:
declare @v varchar = '123'
select @v
result:
1

How much real storage is used with a varchar(100) declaration in mysql?

If I have a table with a field which is declared as accepting varchar(100) and then I actually insert the word "hello" how much real storage will be used on the mysql server? Also will an insert of NULL result in no storage being used even though varchar(100) is declared?
Whatever the answer is, is it consistent across different database implementations?
If I have a table with a field which is declared as accepting varchar(100) and then I actually insert the word "hello" how much real storage will be used on the mysql server?
MySQL will store 5 bytes plus one byte for the length. If the declared varchar length is greater than 255, it will store 2 bytes for the length.
Note that this is dependent on the charset of the column. If the charset is utf8, MySQL will require up to 3 bytes per character. Some storage engines (e.g., MEMORY) will always require the maximum byte length per character for the character set.
Also will an insert of NULL result in no storage being used even though varchar(100) is declared?
Making a column nullable means that MySQL has to set aside one extra byte per row for every group of up to 8 nullable columns. This is called the "null mask".
Whatever the answer is, is it consistent across different database implementations?
It's not even consistent between storage engines within mysql!
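The length-prefix rule described above can be sketched as a rough model in Python (ignoring row overhead, the null mask, and engine differences; the function name and fixed bytes-per-character parameter are illustrative assumptions):

```python
def varchar_storage_bytes(value, declared_max, bytes_per_char=1):
    """Rough model of MySQL varchar storage: data bytes plus a
    1- or 2-byte length prefix, chosen by the column's maximum
    possible byte length (assumes a fixed-width charset)."""
    data = len(value) * bytes_per_char
    prefix = 1 if declared_max * bytes_per_char <= 255 else 2
    return data + prefix

# 'hello' in a varchar(100): 5 data bytes + 1 length byte
print(varchar_storage_bytes("hello", 100))  # 6
# 'hello' in a varchar(300): the prefix grows to 2 bytes
print(varchar_storage_bytes("hello", 300))  # 7
```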
It really depends on your table's charset.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
- source
UTF-8 often takes more space than an encoding made for one or a few languages. Latin letters with diacritics and characters from other alphabetic scripts typically take one byte per character in the appropriate multi-byte encoding but take two in UTF-8. East Asian scripts generally have two bytes per character in their multi-byte encodings yet take three bytes per character in UTF-8.
- source
varchar only stores what is used whereas char stores a set number of bytes.
UTF-16 sometimes takes less space than UTF-8 for some languages; as the quote above notes, East Asian scripts generally take two bytes per character in UTF-16-style encodings but three in UTF-8.
Guys, is there an option to use COMPRESSED tables in MySQL, like in Apache? Thanks a lot.