Why does VARCHAR need length specification? - sql

Why do we always need to specify VARCHAR(length) instead of just VARCHAR? It is dynamic anyway.
UPD: I'm puzzled specifically by the fact that it is mandatory (e.g. in MySQL).

The "length" of the VARCHAR is not the length of the contents, it is the maximum length of the contents.
The max length of a VARCHAR is not dynamic, it is fixed and therefore has to be specified.
If you don't want to define a maximum size for it then use VARCHAR(MAX).

First off, it does not needed it in all databases. Look at SQL Server, where it is optional.
Regardless, it defines a maximum size for the content of the field. Not a bad thing in itself, and it conveys meaning (for example - phone numbers, where you do not want international numbers in the field).

You can see it as a constraint on your data. It ensures that you don't store data that violates your constraint. It is conceptionally similar to e.g. a check constraint on a integer column that ensure that only positive values are entered.

The more the database knows about the data it is storing, the more optimisations it can make when searching/adding/updating data with requests.

The answer is you don't need to, it's optional.
It's there if you want to ensure that strings do not exceed a certain length.

From Wikipedia:
Varchar fields can be of any size up
to the limit. The limit differs from
types of databases, an Oracle 9i
Database has a limit of 4000 bytes, a
MySQL Database has a limit of 65,535
bytes (for the entire row) and
Microsoft SQL Server 2005 8000 bytes
(unless varchar(max) is used, which
has a maximum storage capacity of
2,147,483,648 bytes).

The most dangerous thing for programmers, as #DimaFomin pointed out in comments, is the default length enforced, if there is no length specified.
How SQL Server enforces the default length:
declare #v varchar = '123'
select #v
result:
1

Related

How are you supposed to choose K when creating a VARCHAR(K) column? Or are you?

This is something I've never understood. Let's say I want a column that is storing an email address. I think,
"Ok, email addresses are usually no more than 15 characters, but I'll
say 50 max characters just to play it safe."
and making that column VARCHAR(50). Of course, then this means that I have to create extra code, possibly both client- and server-side validation of entries to that column.
That brings up the question of Why not just use NVARCHAR all the time except in those rare circumstances where the logic of my application dicates a fixed or maximum length. From what I understand, if I create a VARCHAR(50) and none of the entries are more than 25 characters, that does not mean that 50% of the space is wasted, as the database knows how to optimize everything.
Again, that brings up the question of why not just use NVARCHAR.
nvarchar itself has nothing to with "unlimited length of string" since it is just unicode version of varchar. At present time there are no reasons to use varchar (except some backward compatibility issues) and nvarchar should be preferred.
So I'm supposing you're asking why don't use nvarchar(max) everywhere which is almost unlimited (2 GByte of storage) instead of specifying nvarchar(n) for concrete columns.
There are many reasons of using nvarchar(n) instead of nvarchar(max).
For example, if your column should be included in index - it can't be nvarchar(max).
Also nvarchar(max) data internally stored differently than nvarchar(n) and somtimes it can affect performance.

what is the maximum length of varchar(n) in postgresql 9.2 and which is best to use varchar(n) or text?

Hi I am using postgresql 9.2 and I want to use varchar(n) to store some long string but I don't know the maximum length of character which varchar(n) supports. and which one is better to use so could you please suggest me? thanks
tl;dr: 1 GB (each character (really: codepoint) may be represented by 1 or more bytes, depending on where they are on a unicode plane - assuming a UTF-8 encoded database). You should always use text datatype for arbitrary-length character data in Postgresql now.
Explanation:
varchar(n) and text use the same backend storage type (varlena): a variable length byte array with a 32bit length counter. For indexing behavior text may even have some performance benefits. It is considered a best practice in Postgres to use text type for new development; varchar(n) remains for SQL standard support reasons. NB: varchar() (with empty brackets) is a Postgres-specific alias for text.
See also:
http://www.postgresql.org/about/
According to the official documentation ( http://www.postgresql.org/docs/9.2/static/datatype-character.html ):
In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
Searching online reveals that the maximum value allowed varies depending on the installation and compilation options, some users report a maximum of 10485760 characters (10MiB exactly, assuming 1-byte-per-character fixed encoding).
By "the installation and compilation options" I mean that you can always build PostgreSQL from source yourself and before you compile PostgreSQL to make your own database server you can configure how it stores text to change the maximum amount you can store - but if you do this then it means you might run into trouble if you try to use your database files with a "normal", non-customized build of PostgreSQL.

SQL - Numeric data type with leading zeros

I need to store Medicare APC codes. I believe the format requires 4 numbers. Leading zeros are relevant. Is there any way to store this data type with verification? How should I store this data (varchar(4), int)?
This kind of issue, storing zero leading numbers that need to be treated as Numeric values on some scenarios (i.e. sorting) and as textual values in others (i.e. addresses) is always a pain and there is no one answer that is best for all users. At my company we have a database that stores numbers as text for codes (not Medicare APC codes) and we must pad them with zero’s so they will sort properly when used in an order operation.
Do not use a numeric data type for this because the item is not a true number but textual data that uses numeric characters. You will not be performing any calculations or aggregates on the codes and so the only benefit to storing them as a number would be to ensure proper sorting of the codes and that can be done with the code stored as text by padding it with zeros where needed. If you sue a numeric data type then any time the code is combined with other textual values you will have to explicitly convert it to CHAR/VARCHAR or let SQL Server do it since implicit conversions should always be avoided that means a lot of extra work for you and the query processor any time the code is used.
Assuming you decide to go with a textual data type the question then is should you use VARCHAR or CHAR and while many who have posted say VARCHAR I would suggest you go with CHAR set to a length of 4. WHY?
The VARCHAR data type is for textual data in which the size (the length or number of characters) is unknown in advance. For this Medicare code we know the length will always be at least 4 and possibly no more than 4 for the foreseeable future. SQL Server handles the storage of the data differently between CHAR and VARCHAR. SQL Server’s BOL (Books On Line) says :
Use CHAR when the size of the column data entries are consistent
Use VARCHAR when the size of the column data varies considerably.
I can’t say for certain this is true for SQL Server 2008 and up but for earlier versions, the use of a VARCHAR data type carries an extra overhead of 1 byte per row of data per column in a table that has a VARCHAR data type. If the data stored is always the same size and in your scenario it sounds like it is then this extra byte is a waste.
In the end it’s up to you as to whether you like CHAR or VARCHAR better but definitely don’t use a numeric data type to store a fixed length code.
That's not numeric data; it's textual data that happens to contain digits.
Use a VARCHAR.
I agree, using
CHAR(4)
for the check constraint use
check( APC_ODE LIKE '[0-9][0-9][0-9][0-9]' )
This will force a 4 digit number only to be accepted...
varchar(4)
optionally, you can still add a check constraint to ensure the data is numeric with leading zeros. This example will throw exceptions in Oracle. In other RDBMS, you could use regular expression checks:
alter table X add constraint C
check (cast(APC_CODE as int) = cast(APC_CODE as int))
If you are certain that the APC codes will always be numeric (that is if it wouldn't change in the near future), a better way would be to leave the database column as is, and handle the formatting (to include leading zeros) at places where you use this field values.
If you need leading 0s, then you must use a varchar or other string data type.
There are ways to format the output for leading 0s without compromising your actual data.
See this blog entry for an easy method.
CHAR(4) seems more appropriate to me (if I understood you right, and the code is always 4 digits).
What you want to use is a VARCHAR data type with a CHECK constraint, using LIKE with a pattern to check for numeric values.
in TSQL
check( isnumeric(APC_ODE) = 1)

Which is the biggest String type in SQL?

Which is the biggest possible String type in SQL?
TEXT? or another?
DBMS: SQLITE and MYSQL
The largest character data type in MySql is LONGTEXT (see Storage Requirements for String Types), that is capable to hold 2^32-1 bytes.
SQLite does not impose any length restrictions (other than the large global SQLITE_MAX_LENGTH limit) on the length of strings, BLOBs or numeric values. The default limit for that is 1 billion, but you can change it at compile time to a maximum of 2^31-1 (see Maximum length of a string or BLOB).
CLOB.
The SQL standard uses the name CLOB (it's feature T041-02 in SQL2008), but note that other databases may have different names for it. I think TEXT is one.

Best data type for storing strings in SQL Server?

What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
I've also heard that the best length to use is 255, but I don't know why. Is there a specific length that is preferred for strings?
nvarchar stores unicode character data which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space, 16-bits per character for nvarchar and 8-bits per character for varchar.
What's the best data type to be used
when storing strings, like a first
name? I've seen varchar and nvarchar
both used. Which one is better? Does
it matter?
See What is the difference between nchar(10) and varchar(10) in MSSQL?
If you need non-ASCII characters, you have to use nchar/nvarchar. If you don't, then you may want to use char/varchar to save space.
Note that this issue is specific to MS SQL Server, which doesn't have good support for UTF-8. In other SQL implementations that do, you can use Unicode strings with no extra space requirements (for English).
EDIT: Since this answer was originally written, SQL Server 2019 (15.x) finally introduced UTF-8 support. You may want to consider using it as your default database text encoding.
I've also heard that the best length
to use is 255, but I don't know why.
See Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?
Is there a specific length that is
preferred for strings?
If you data has a well-defined maximum limit (e.g., 17 characters for a VIN), then use that.
OTOH, if the limit is arbitrary, then choose a generous maximum size to avoid rejecting valid data. In SQL Server, you may want to consider the 900-byte maximum size of index keys.
nvarchar means you can save unicode character inside it. there is 2GB limit for nvarchar type. if the field length is more than 4000 characters, an overflow page is used. smaller fields means one page can hold more rows which increase the query performance.
Generally, for small strings use nvarchar(n), which supports Unicode characters. The string is compressed when used with row or page compression (at least one of which is generally desirable).
Large strings need nvarchar(max), which Unicode compression does not support.
For special-case scenarios when your data set never uses Unicode characters, varchar(n) and varchar(max) restrict the string type of one byte per character.
If you know the max length (n) is less than 256, SQL Server only needs to use 1 byte to store the string length. This reduces storage space by about half a percent compared a string type whose max length is just over 255.