How to choose SQL DDL type parameters? - sql

All the SQL dialects I've seen use to either allow or to require to specify an integer argument for some of the data types they support when defining a table. But I haven't managed to find any comprehensive information (at least for MySQL and SQLite) about what exactly do these numbers mean and how to chose them adequately...

If you mean the notation like INT(11), VARCHAR(255), then normally it's the length of the values stored (or retrieved) from the table.
http://dev.mysql.com/doc/refman/5.5/en/numeric-type-overview.html
http://dev.mysql.com/doc/refman/5.5/en/string-type-overview.html
There's an "M" in parentheses after name of every type which supports it.
Also, keep in mind that these numbers may affect both storage and selection length. For example, if you define a column as VARCHAR(100), the actual space reserved for each value will still be 255. But the values you retrieve in SELECT will be trimmed down to 100 characters. And if you define a VARCHAR(256), then it reserves up to 65535 characters, if I remember correctly.
The same with INT(5). It still reserves space for storing values up to 2147483647 (max value for signed integers), but trims the input/output to 5 digits.

Related

variable character field

Work in oracle database and new bi here. Just wondering if it is possible to extend variable character defined with length 12 to 64 or more permanently(to hold longer strings), what kind of issues I might see?
I have some programs and files depend on this field. THis is more of general question. A field defined as variable character is because to extend length in the future?
THanks
It is possible and very easy, and there are no issues to worry about (at least as long as you work in Oracle; a bit more about that below). It works something like this:
create table tbl (x varchar2(10));
insert into tbl values ('mathguy')
alter table tbl modify (x varchar2(20));
The issue you may have is with applications in other languages (like Java). If you must load data from Oracle tables to a Java application (say), then the memory reserved for the values from the table is allocated based on the declared length of the strings, not their actual size. So if you change from length 10 to length 4000, about 40 times more memory is allocated for values coming from the table, whether that is really needed or not. This may or may not be an issue, it depends on how RAM-starved your applications are (and what else is happening on the same machine).
The purpose of varchar has nothing to do with preparing to extend the column size in the future. However, it should be fine to alter the table to increase the field length.
The varchar2 data type is used for fields that can contain a string of variable length--the size of the field is the maximum number of characters that can be stored. As long as your string never exceeds the field length, you should be fine, and may save some space compared to using straight-up char characters.
If you need to extend the field definition to accommodate longer strings, you can use the ALTER TABLE ... MODIFY... syntax. Note that if there is a large amount of data already in the table this may be a costly operation.
ALTER TABLE
customer
MODIFY
(
cust_name varchar2(100)
);
See the Oracle Documentation of VARCHAR2 Here.

What is the major difference between Varchar2 and char

Creating Table:
CREATE TABLE test (
charcol CHAR(10),
varcharcol VARCHAR2(10));
SELECT LENGTH(charcol), LENGTH(varcharcol) FROM test;
Result:
LENGTH(CHARCOL) LENGTH(VARCHARCOL)
--------------- ------------------
10 1
Please Let me know what is the difference between Varchar2 and char?
At what times we use both?
Although there are already several answers correctly describing the behaviour of char, I think it needs to be said that you should not use it except in three specific situations:
You are building a fixed-length file or report, and assigning a non-null value to a char avoids the need to code an rpad() expression. For example, if firstname and lastname are both defined as char(20), then firstname||lastname is a shorter way of writing rpad(firstname,20)||rpad(lastname,20) to create
Chuck Norris
You need to distinguish between the explicit empty string '' and null. Normally they are the same thing in Oracle, but assigning '' to a char value will trigger its blank-padding behaviour while null will not, so if it's important to tell the difference, and I can't really think of a reason why it would be, then you have a way to do that.
Your code is ported from (or needs to be compatible with) some other system that requires blank-padding for legacy reasons. In that case you are stuck with it and you have my sympathy.
There is really no reason to use char just because some length is fixed (e.g. a Y/N flag or an ISO currency code such as 'USD'). It's not more efficient, it doesn't save space (there's no mythical length indicator for a varchar2, there's just a blank padding overhead for char), and it doesn't stop anyone entering shorter values. (If you enter 'ZZ' in your char(3) currency column, it will just get stored as 'ZZ '.) It's not even backward-compatible with some ancient version of Oracle that once relied on it, because there never was one.
And the contagion can spread, as (following best practice) you might anchor a variable declaration using something like sales.currency%type. Now your l_sale_currency variable is a stealth char which will get invisibly blank-padded for shorter values (or ''), opening the door to obscure bugs where l_sale_currency does not equal l_refund_currency even though you assigned 'ZZ' to both of them.
Some argue that char(n) (where n is some character length) indicates that values are expected to be n characters long, and this is a form of self-documentation. But surely if you are serious about a 3-character format (ISO-Alpha-3 country codes rather than ISO-Alpha-2, for example), wouldn't you define a constraint to enforce the rule, rather than letting developers glance at a char(3) datatype and draw their own conclusions?
CHAR was introduced in Oracle 6 for, I'm sure, ANSI compatibility reasons. Probably there are potential customers deciding which database product to purchase and ANSI compatibility is on their checklist (or used to be back then), and CHAR with blank-padding is defined in the ANSI standard, so Oracle needs to provide it. You are not supposed to actually use it.
Simple example to show the difference:
SELECT
'"'||CAST('abc' AS VARCHAR2(10))||'"',
'"'||CAST('abc' AS CHAR(10))||'"'
FROM dual;
'"'||CAST('ABC'ASVARCHAR2(10))||'"' '"'||CAST('ABC'ASCHAR(10))||'"'
----------------------------------- -------------------------------
"abc" "abc "
1 row selected.
The CHAR is usefull for expressions where the length of charaters is always fix, e.g. postal code for US states, for example CA, NY, FL, TX
Just to avoid confusion about much wrong information. Here are some information about difference including performance
Reference: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2668391900346844476
Since a char is nothing more than a VARCHAR2 that is blank padded out
to the maximum length - that is, the difference between the columns X
and Y below:
create table t ( x varchar2(30), y char(30) ); insert into t (x,y)
values ( rpad('a',' ',30), 'a' );
IS ABSOLUTELY NOTHING, and given that the difference between columns X
and Y below:
insert into t (x,y) values ('a','a')
is that X consumes 3 bytes (null indicator, leading byte length, 1
byte for 'a') and Y consumes 32 bytes (null indicator, leading byte
length, 30 bytes for 'a ' )
Umm, varchar2 is going to be somewhat "at an advantage performance
wise". It helps us NOT AT ALL that char(30) is always 30 bytes - to
us, it is simply a varchar2 that is blank padded out to the maximum
length. It helps us in processing - ZERO, zilch, zippo.
Anytime you see anyone say "it is up to 50% faster", and that is it -
no example, no science, no facts, no story to back it up - just laugh
out loud at them and keep on moving along.
There are other "made up things" on that page as well, for example:
"Searching is faster in CHAR as all the strings are stored at a
specified position from the each other, the system doesnot have to
search for the end of string. Whereas in VARCHAR the system has to
first find the end of string and then go for searching."
FALSE: a char is just a varchar2 blank padded - we do not store
strings "at a specified position from each other". We do search for
the end of the string - we use a leading byte length to figure things
out.
CHAR
CHAR should be used for storing fix length character strings. String values will be space/blank padded before stored on disk. If this type is used to store varibale length strings, it will waste a lot of disk space.
VARCHAR2
VARCHAR2 is used to store variable length character strings. The string value's length will be stored on disk with the value itself.
And
At what times we use both?
Its all depend upon your requirement.
CHAR type has fixed size, so if you say it is 10 bytes, then it always stores 10 bytes in the database and it doesn't matter whether you store any text or just empty 10 bytes
VARCHAR2 size depends on how many bytes you are actually going to store in the database. The number you specify is just the maximum number of bytes that can be stored (although 1 byte is minimum)
You should use CHAR when dealing with fixed length strings (you know in advance the exact length of string you will be storing) - database can then manipulate with it better and faster since it knows the exact lenght
You should use VARCHAR2 when you don't know the exact lenght of stored strings.
Situation you would use both may be:
name VARCHAR2(255),
zip_code CHAR(5) --if your users have only 5 place zip codes
When stored in a database, varchar2 uses only the allocated space. E.g. if you have a varchar2(1999) and put 50 bytes in the table, it will use 52 bytes.
But when stored in a database, char always uses the maximum length and is blank-padded. E.g. if you have char(1999) and put 50 bytes in the table, it will consume 2000 bytes.
CHAR is used for storing fix length character strings. It will waste a lot of disk space if this type is used to store varibale length strings.
VARCHAR2 is used to store variable length character strings.
At what times we use both?
This may vary and depend on your requirement.
EDIT:-
Lets understand this with an example, If you have an student name column with size 10; sname CHAR(10) and If a column value 'RAMA' is inserted, 6 empty spaces will be inserted to the right of the value. If this was a VARCHAR column; sname VARCHAR2(10). Then Varchar will take 4 spaces out of 10 possible and free the next 6 for other usage.

SQL - Numeric data type with leading zeros

I need to store Medicare APC codes. I believe the format requires 4 numbers. Leading zeros are relevant. Is there any way to store this data type with verification? How should I store this data (varchar(4), int)?
This kind of issue, storing zero leading numbers that need to be treated as Numeric values on some scenarios (i.e. sorting) and as textual values in others (i.e. addresses) is always a pain and there is no one answer that is best for all users. At my company we have a database that stores numbers as text for codes (not Medicare APC codes) and we must pad them with zero’s so they will sort properly when used in an order operation.
Do not use a numeric data type for this because the item is not a true number but textual data that uses numeric characters. You will not be performing any calculations or aggregates on the codes and so the only benefit to storing them as a number would be to ensure proper sorting of the codes and that can be done with the code stored as text by padding it with zeros where needed. If you sue a numeric data type then any time the code is combined with other textual values you will have to explicitly convert it to CHAR/VARCHAR or let SQL Server do it since implicit conversions should always be avoided that means a lot of extra work for you and the query processor any time the code is used.
Assuming you decide to go with a textual data type the question then is should you use VARCHAR or CHAR and while many who have posted say VARCHAR I would suggest you go with CHAR set to a length of 4. WHY?
The VARCHAR data type is for textual data in which the size (the length or number of characters) is unknown in advance. For this Medicare code we know the length will always be at least 4 and possibly no more than 4 for the foreseeable future. SQL Server handles the storage of the data differently between CHAR and VARCHAR. SQL Server’s BOL (Books On Line) says :
Use CHAR when the size of the column data entries are consistent
Use VARCHAR when the size of the column data varies considerably.
I can’t say for certain this is true for SQL Server 2008 and up but for earlier versions, the use of a VARCHAR data type carries an extra overhead of 1 byte per row of data per column in a table that has a VARCHAR data type. If the data stored is always the same size and in your scenario it sounds like it is then this extra byte is a waste.
In the end it’s up to you as to whether you like CHAR or VARCHAR better but definitely don’t use a numeric data type to store a fixed length code.
That's not numeric data; it's textual data that happens to contain digits.
Use a VARCHAR.
I agree, using
CHAR(4)
for the check constraint use
check( APC_ODE LIKE '[0-9][0-9][0-9][0-9]' )
This will force a 4 digit number only to be accepted...
varchar(4)
optionally, you can still add a check constraint to ensure the data is numeric with leading zeros. This example will throw exceptions in Oracle. In other RDBMS, you could use regular expression checks:
alter table X add constraint C
check (cast(APC_CODE as int) = cast(APC_CODE as int))
If you are certain that the APC codes will always be numeric (that is if it wouldn't change in the near future), a better way would be to leave the database column as is, and handle the formatting (to include leading zeros) at places where you use this field values.
If you need leading 0s, then you must use a varchar or other string data type.
There are ways to format the output for leading 0s without compromising your actual data.
See this blog entry for an easy method.
CHAR(4) seems more appropriate to me (if I understood you right, and the code is always 4 digits).
What you want to use is a VARCHAR data type with a CHECK constraint, using LIKE with a pattern to check for numeric values.
in TSQL
check( isnumeric(APC_ODE) = 1)

Why does VARCHAR need length specification?

Why do we always need to specify VARCHAR(length) instead of just VARCHAR? It is dynamic anyway.
UPD: I'm puzzled specifically by the fact that it is mandatory (e.g. in MySQL).
The "length" of the VARCHAR is not the length of the contents, it is the maximum length of the contents.
The max length of a VARCHAR is not dynamic, it is fixed and therefore has to be specified.
If you don't want to define a maximum size for it then use VARCHAR(MAX).
First off, it does not needed it in all databases. Look at SQL Server, where it is optional.
Regardless, it defines a maximum size for the content of the field. Not a bad thing in itself, and it conveys meaning (for example - phone numbers, where you do not want international numbers in the field).
You can see it as a constraint on your data. It ensures that you don't store data that violates your constraint. It is conceptionally similar to e.g. a check constraint on a integer column that ensure that only positive values are entered.
The more the database knows about the data it is storing, the more optimisations it can make when searching/adding/updating data with requests.
The answer is you don't need to, it's optional.
It's there if you want to ensure that strings do not exceed a certain length.
From Wikipedia:
Varchar fields can be of any size up
to the limit. The limit differs from
types of databases, an Oracle 9i
Database has a limit of 4000 bytes, a
MySQL Database has a limit of 65,535
bytes (for the entire row) and
Microsoft SQL Server 2005 8000 bytes
(unless varchar(max) is used, which
has a maximum storage capacity of
2,147,483,648 bytes).
The most dangerous thing for programmers, as #DimaFomin pointed out in comments, is the default length enforced, if there is no length specified.
How SQL Server enforces the default length:
declare #v varchar = '123'
select #v
result:
1

SQL When to use Which Data Type

Hi I was wondering when I should use the different data types. As in in my table, how can I decide which to use: nvarchar, nchar, varchar, varbinary, etc.
Examples:
What would I use for a ... column:
Phone number,
Address,
First Name, Last Name,
Email,
ID number,
etc.
Thanks for any help!
As a general rule, I would not define anything as a "number" field if I wasn't going to be doing arithmetic on it, even if the data itself was numeric.
Your "phone" field is one example. I'd define that as a varchar.
Varchar, Integer, and Bit cover 99% of my day to day uses.
The question really depends on your requirements. I know that's not a particularly satisfactory answer, but it's true.
The n..char data types are for Unicode data, so if you're going to need to use unicode character sets in your data you should use those types as opposed to their "non-n" analogs. the nchar and char type are fixed length, and the nvarchar and varchar type can have a variable length, which will effect the size of the column on the disk and in memory. Generally I would say to use the type that uses the least disk space but fits for your needs.
This page has links to the Microsoft descriptions of these datatypes for SQL Server 2005, many of which give pointers for when to use which type. You might be particularly interested in this page regarding char and varchar types.
A data type beginning with n means it can be used for unicode characters... eg nVarchar.
Selection of integers is also quite fun.
http://www.databasejournal.com/features/mssql/article.phpr/2212141/Choosing-SQL-Server-2000-Data-Types.htm
The most common data type i use is varchar....
The N* data types (NVARCHAR, NCHAR, NTEXT) are for Unicode strings. They take up two times the space their "normal" pendants (VARCHAR, CHAR, TEXT) need, but they can store Unicode without conversion and possible loss of fidelity.
The TEXT data types can store nearly unlimited amounts of data, but they perform not as good as the CHAR data types because they are stored outside of the record.
THE VARCHAR data types are of variable length. They will not be padded with spaces at the end, but their CHAR pendants will (a CHAR(20) is always twenty characters long, even if if contains 5 letters only. The remaining 15 will be spaces).
The binary data types are for binary data, whatever you care to store into them (images are a primary example).
Other people have given good general answers, but I'd add one important point: when using VARCHAR()s (which I would recommend for those kinds of fields), be sure to use a length that's big enough for any reasonable value. For example, I typically declare VARCHAR(100) for a name, e-mail address, domain name, city name, etc., and VARCHAR(200) for an URL or street address.
This is more than you'll routinely need. In fact, 30 characters is enough for almost all of these values (except full name, but a good database should always store first and last name separately), but it's better than having to change data types some day down the road. There's very little cost in specifying a higher-than-necessary length for a VARCHAR, but note that VARCHAR(MAX) and TEXT do entail significant overhead, so use them only when necessary.
Here's a post which points out a case where a longer-than-necessary VARCHAR can hurt performance: Importance of varchar length in MySQL table. Goes to show that everything has a cost, though in general I'd still favor long VARCHARs.