What is the major difference between Varchar2 and char - sql

Creating Table:
CREATE TABLE test (
charcol CHAR(10),
varcharcol VARCHAR2(10));
SELECT LENGTH(charcol), LENGTH(varcharcol) FROM test;
Result:
LENGTH(CHARCOL) LENGTH(VARCHARCOL)
--------------- ------------------
10 1
Please Let me know what is the difference between Varchar2 and char?
At what times we use both?

Although there are already several answers correctly describing the behaviour of char, I think it needs to be said that you should not use it except in three specific situations:
You are building a fixed-length file or report, and assigning a non-null value to a char avoids the need to code an rpad() expression. For example, if firstname and lastname are both defined as char(20), then firstname||lastname is a shorter way of writing rpad(firstname,20)||rpad(lastname,20) to create
Chuck Norris
You need to distinguish between the explicit empty string '' and null. Normally they are the same thing in Oracle, but assigning '' to a char value will trigger its blank-padding behaviour while null will not, so if it's important to tell the difference, and I can't really think of a reason why it would be, then you have a way to do that.
Your code is ported from (or needs to be compatible with) some other system that requires blank-padding for legacy reasons. In that case you are stuck with it and you have my sympathy.
There is really no reason to use char just because some length is fixed (e.g. a Y/N flag or an ISO currency code such as 'USD'). It's not more efficient, it doesn't save space (there's no mythical length indicator for a varchar2, there's just a blank padding overhead for char), and it doesn't stop anyone entering shorter values. (If you enter 'ZZ' in your char(3) currency column, it will just get stored as 'ZZ '.) It's not even backward-compatible with some ancient version of Oracle that once relied on it, because there never was one.
And the contagion can spread, as (following best practice) you might anchor a variable declaration using something like sales.currency%type. Now your l_sale_currency variable is a stealth char which will get invisibly blank-padded for shorter values (or ''), opening the door to obscure bugs where l_sale_currency does not equal l_refund_currency even though you assigned 'ZZ' to both of them.
Some argue that char(n) (where n is some character length) indicates that values are expected to be n characters long, and this is a form of self-documentation. But surely if you are serious about a 3-character format (ISO-Alpha-3 country codes rather than ISO-Alpha-2, for example), wouldn't you define a constraint to enforce the rule, rather than letting developers glance at a char(3) datatype and draw their own conclusions?
CHAR was introduced in Oracle 6 for, I'm sure, ANSI compatibility reasons. Probably there are potential customers deciding which database product to purchase and ANSI compatibility is on their checklist (or used to be back then), and CHAR with blank-padding is defined in the ANSI standard, so Oracle needs to provide it. You are not supposed to actually use it.

Simple example to show the difference:
SELECT
'"'||CAST('abc' AS VARCHAR2(10))||'"',
'"'||CAST('abc' AS CHAR(10))||'"'
FROM dual;
'"'||CAST('ABC'ASVARCHAR2(10))||'"' '"'||CAST('ABC'ASCHAR(10))||'"'
----------------------------------- -------------------------------
"abc" "abc "
1 row selected.
The CHAR is usefull for expressions where the length of charaters is always fix, e.g. postal code for US states, for example CA, NY, FL, TX

Just to avoid confusion about much wrong information. Here are some information about difference including performance
Reference: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2668391900346844476
Since a char is nothing more than a VARCHAR2 that is blank padded out
to the maximum length - that is, the difference between the columns X
and Y below:
create table t ( x varchar2(30), y char(30) ); insert into t (x,y)
values ( rpad('a',' ',30), 'a' );
IS ABSOLUTELY NOTHING, and given that the difference between columns X
and Y below:
insert into t (x,y) values ('a','a')
is that X consumes 3 bytes (null indicator, leading byte length, 1
byte for 'a') and Y consumes 32 bytes (null indicator, leading byte
length, 30 bytes for 'a ' )
Umm, varchar2 is going to be somewhat "at an advantage performance
wise". It helps us NOT AT ALL that char(30) is always 30 bytes - to
us, it is simply a varchar2 that is blank padded out to the maximum
length. It helps us in processing - ZERO, zilch, zippo.
Anytime you see anyone say "it is up to 50% faster", and that is it -
no example, no science, no facts, no story to back it up - just laugh
out loud at them and keep on moving along.
There are other "made up things" on that page as well, for example:
"Searching is faster in CHAR as all the strings are stored at a
specified position from the each other, the system doesnot have to
search for the end of string. Whereas in VARCHAR the system has to
first find the end of string and then go for searching."
FALSE: a char is just a varchar2 blank padded - we do not store
strings "at a specified position from each other". We do search for
the end of the string - we use a leading byte length to figure things
out.

CHAR
CHAR should be used for storing fix length character strings. String values will be space/blank padded before stored on disk. If this type is used to store varibale length strings, it will waste a lot of disk space.
VARCHAR2
VARCHAR2 is used to store variable length character strings. The string value's length will be stored on disk with the value itself.
And
At what times we use both?
Its all depend upon your requirement.

CHAR type has fixed size, so if you say it is 10 bytes, then it always stores 10 bytes in the database and it doesn't matter whether you store any text or just empty 10 bytes
VARCHAR2 size depends on how many bytes you are actually going to store in the database. The number you specify is just the maximum number of bytes that can be stored (although 1 byte is minimum)
You should use CHAR when dealing with fixed length strings (you know in advance the exact length of string you will be storing) - database can then manipulate with it better and faster since it knows the exact lenght
You should use VARCHAR2 when you don't know the exact lenght of stored strings.
Situation you would use both may be:
name VARCHAR2(255),
zip_code CHAR(5) --if your users have only 5 place zip codes

When stored in a database, varchar2 uses only the allocated space. E.g. if you have a varchar2(1999) and put 50 bytes in the table, it will use 52 bytes.
But when stored in a database, char always uses the maximum length and is blank-padded. E.g. if you have char(1999) and put 50 bytes in the table, it will consume 2000 bytes.

CHAR is used for storing fix length character strings. It will waste a lot of disk space if this type is used to store varibale length strings.
VARCHAR2 is used to store variable length character strings.
At what times we use both?
This may vary and depend on your requirement.
EDIT:-
Lets understand this with an example, If you have an student name column with size 10; sname CHAR(10) and If a column value 'RAMA' is inserted, 6 empty spaces will be inserted to the right of the value. If this was a VARCHAR column; sname VARCHAR2(10). Then Varchar will take 4 spaces out of 10 possible and free the next 6 for other usage.

Related

How to choose SQL DDL type parameters?

All the SQL dialects I've seen use to either allow or to require to specify an integer argument for some of the data types they support when defining a table. But I haven't managed to find any comprehensive information (at least for MySQL and SQLite) about what exactly do these numbers mean and how to chose them adequately...
If you mean the notation like INT(11), VARCHAR(255), then normally it's the length of the values stored (or retrieved) from the table.
http://dev.mysql.com/doc/refman/5.5/en/numeric-type-overview.html
http://dev.mysql.com/doc/refman/5.5/en/string-type-overview.html
There's an "M" in parentheses after name of every type which supports it.
Also, keep in mind that these numbers may affect both storage and selection length. For example, if you define a column as VARCHAR(100), the actual space reserved for each value will still be 255. But the values you retrieve in SELECT will be trimmed down to 100 characters. And if you define a VARCHAR(256), then it reserves up to 65535 characters, if I remember correctly.
The same with INT(5). It still reserves space for storing values up to 2147483647 (max value for signed integers), but trims the input/output to 5 digits.

SQL - Numeric data type with leading zeros

I need to store Medicare APC codes. I believe the format requires 4 numbers. Leading zeros are relevant. Is there any way to store this data type with verification? How should I store this data (varchar(4), int)?
This kind of issue, storing zero leading numbers that need to be treated as Numeric values on some scenarios (i.e. sorting) and as textual values in others (i.e. addresses) is always a pain and there is no one answer that is best for all users. At my company we have a database that stores numbers as text for codes (not Medicare APC codes) and we must pad them with zero’s so they will sort properly when used in an order operation.
Do not use a numeric data type for this because the item is not a true number but textual data that uses numeric characters. You will not be performing any calculations or aggregates on the codes and so the only benefit to storing them as a number would be to ensure proper sorting of the codes and that can be done with the code stored as text by padding it with zeros where needed. If you sue a numeric data type then any time the code is combined with other textual values you will have to explicitly convert it to CHAR/VARCHAR or let SQL Server do it since implicit conversions should always be avoided that means a lot of extra work for you and the query processor any time the code is used.
Assuming you decide to go with a textual data type the question then is should you use VARCHAR or CHAR and while many who have posted say VARCHAR I would suggest you go with CHAR set to a length of 4. WHY?
The VARCHAR data type is for textual data in which the size (the length or number of characters) is unknown in advance. For this Medicare code we know the length will always be at least 4 and possibly no more than 4 for the foreseeable future. SQL Server handles the storage of the data differently between CHAR and VARCHAR. SQL Server’s BOL (Books On Line) says :
Use CHAR when the size of the column data entries are consistent
Use VARCHAR when the size of the column data varies considerably.
I can’t say for certain this is true for SQL Server 2008 and up but for earlier versions, the use of a VARCHAR data type carries an extra overhead of 1 byte per row of data per column in a table that has a VARCHAR data type. If the data stored is always the same size and in your scenario it sounds like it is then this extra byte is a waste.
In the end it’s up to you as to whether you like CHAR or VARCHAR better but definitely don’t use a numeric data type to store a fixed length code.
That's not numeric data; it's textual data that happens to contain digits.
Use a VARCHAR.
I agree, using
CHAR(4)
for the check constraint use
check( APC_ODE LIKE '[0-9][0-9][0-9][0-9]' )
This will force a 4 digit number only to be accepted...
varchar(4)
optionally, you can still add a check constraint to ensure the data is numeric with leading zeros. This example will throw exceptions in Oracle. In other RDBMS, you could use regular expression checks:
alter table X add constraint C
check (cast(APC_CODE as int) = cast(APC_CODE as int))
If you are certain that the APC codes will always be numeric (that is if it wouldn't change in the near future), a better way would be to leave the database column as is, and handle the formatting (to include leading zeros) at places where you use this field values.
If you need leading 0s, then you must use a varchar or other string data type.
There are ways to format the output for leading 0s without compromising your actual data.
See this blog entry for an easy method.
CHAR(4) seems more appropriate to me (if I understood you right, and the code is always 4 digits).
What you want to use is a VARCHAR data type with a CHECK constraint, using LIKE with a pattern to check for numeric values.
in TSQL
check( isnumeric(APC_ODE) = 1)

What's the difference between VARCHAR and CHAR?

What's the difference between VARCHAR and CHAR in MySQL?
I am trying to store MD5 hashes.
VARCHAR is variable-length.
CHAR is fixed length.
If your content is a fixed size, you'll get better performance with CHAR.
See the MySQL page on CHAR and VARCHAR Types for a detailed explanation (be sure to also read the comments).
CHAR
Used to store character string value of fixed length.
The maximum no. of characters the data type can hold is 255 characters.
It's 50% faster than VARCHAR.
Uses static memory allocation.
VARCHAR
Used to store variable length alphanumeric data.
The maximum this data type can hold is up to
Pre-MySQL 5.0.3: 255 characters.
Post-MySQL 5.0.3: 65,535 characters shared for the row.
It's slower than CHAR.
Uses dynamic memory allocation.
CHAR Vs VARCHAR
CHAR is used for Fixed Length Size Variable
VARCHAR is used for Variable Length Size Variable.
E.g.
Create table temp
(City CHAR(10),
Street VARCHAR(10));
Insert into temp
values('Pune','Oxford');
select length(city), length(street) from temp;
Output will be
length(City) Length(street)
10 6
Conclusion: To use storage space efficiently must use VARCHAR Instead CHAR if variable length is variable
A CHAR(x) column can only have exactly x characters.
A VARCHAR(x) column can have up to x characters.
Since your MD5 hashes will always be the same size, you should probably use a CHAR.
However, you shouldn't be using MD5 in the first place; it has known weaknesses.
Use SHA2 instead.
If you're hashing passwords, you should use bcrypt.
What's the difference between VARCHAR and CHAR in MySQL?
To already given answers I would like to add that in OLTP systems or in systems with frequent updates consider using CHAR even for variable size columns because of possible VARCHAR column fragmentation during updates.
I am trying to store MD5 hashes.
MD5 hash is not the best choice if security really matters. However, if you will use any hash function, consider BINARY type for it instead (e.g. MD5 will produce 16-byte hash, so BINARY(16) would be enough instead of CHAR(32) for 32 characters representing hex digits. This would save more space and be performance effective.
Varchar cuts off trailing spaces if the entered characters is shorter than the declared length, while char will not. Char will pad spaces and will always be the length of the declared length. In terms of efficiency, varchar is more adept as it trims characters to allow more adjustment. However, if you know the exact length of char, char will execute with a bit more speed.
CHAR is fixed length and VARCHAR is variable length. CHAR always uses the same amount of storage space per entry, while VARCHAR only uses the amount necessary to store the actual text.
CHAR is a fixed length field; VARCHAR is a variable length field. If you are storing strings with a wildly variable length such as names, then use a VARCHAR, if the length is always the same, then use a CHAR because it is slightly more size-efficient, and also slightly faster.
In most RDBMSs today, they are synonyms. However for those systems that still have a distinction, a CHAR field is stored as a fixed-width column. If you define it as CHAR(10), then 10 characters are written to the table, where "padding" (typically spaces) is used to fill in any space that the data does not use up. For example, saving "bob" would be saved as ("bob"+7 spaces). A VARCHAR (variable character) column is meant to store data without wasting the extra space that a CHAR column does.
As always, Wikipedia speaks louder.
CHAR
CHAR is a fixed length string data type, so any remaining space in the field is padded with blanks.
CHAR takes up 1 byte per character. So, a CHAR(100) field (or variable) takes up 100 bytes on disk, regardless of the string it holds.
VARCHAR
VARCHAR is a variable length string data type, so it holds only the characters you assign to it.
VARCHAR takes up 1 byte per character, + 2 bytes to hold length information (For example, if you set a VARCHAR(100) data type = ‘Dhanika’, then it would take up 7 bytes (for D, H, A, N, I, K and A) plus 2 bytes, or 9 bytes in all.)
CHAR
Uses specific allocation of memory
Time efficient
VARCHAR
Uses dynamic allocation of memory
Memory efficient
The char is a fixed-length character data type, the varchar is a variable-length character data type.
Because char is a fixed-length data type, the storage size of the char value is equal to the maximum size for this column. Because varchar is a variable-length data type, the storage size of the varchar value is the actual length of the data entered, not the maximum size for this column.
You can use char when the data entries in a column are expected to be the same size.
You can use varchar when the data entries in a column are expected to vary considerably in size.
Distinguishing between the two is also good for an integrity aspect.
If you expect to store things that have a rule about their length such as yes or no then you can use char(1) to store Y or N. Also useful for things like currency codes, you can use char(3) to store things like USD, EUR or AUD.
Then varchar is better for things were there is no general rule about their length except for the limit. It's good for things like names or descriptions where there is a lot of variation of how long the values will be.
Then the text data type comes along and puts a spanner in the works (although it's generally just varchar with no defined upper limit).
according to High Performance MySQL book:
VARCHAR stores variable-length character strings and is the most common string data type. It can require less storage space than
fixed-length types, because it uses only as much space as it needs
(i.e., less space is used to store shorter values). The exception is a
MyISAM table created with ROW_FORMAT=FIXED, which uses a fixed amount
of space on disk for each row and can thus waste space. VARCHAR helps
performance because it saves space.
CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. When storing a CHAR value, MySQL
removes any trailing spaces. (This was also true of VARCHAR in MySQL
4.1 and older versions—CHAR and VAR CHAR were logically identical and differed only in storage format.) Values are padded with spaces as
needed for comparisons.
Char has a fixed length (supports 2000 characters), it is stand for character is a data type
Varchar has a variable length (supports 4000 characters)
Char or varchar- it is used to enter texual data where the length can be indicated in brackets
Eg- name char (20)
CHAR :
Supports both Character & Numbers.
Supports 2000 characters.
Fixed Length.
VARCHAR :
Supports both Character & Numbers.
Supports 4000 characters.
Variable Length.
any comments......!!!!

SQL When to use Which Data Type

Hi I was wondering when I should use the different data types. As in in my table, how can I decide which to use: nvarchar, nchar, varchar, varbinary, etc.
Examples:
What would I use for a ... column:
Phone number,
Address,
First Name, Last Name,
Email,
ID number,
etc.
Thanks for any help!
As a general rule, I would not define anything as a "number" field if I wasn't going to be doing arithmetic on it, even if the data itself was numeric.
Your "phone" field is one example. I'd define that as a varchar.
Varchar, Integer, and Bit cover 99% of my day to day uses.
The question really depends on your requirements. I know that's not a particularly satisfactory answer, but it's true.
The n..char data types are for Unicode data, so if you're going to need to use unicode character sets in your data you should use those types as opposed to their "non-n" analogs. the nchar and char type are fixed length, and the nvarchar and varchar type can have a variable length, which will effect the size of the column on the disk and in memory. Generally I would say to use the type that uses the least disk space but fits for your needs.
This page has links to the Microsoft descriptions of these datatypes for SQL Server 2005, many of which give pointers for when to use which type. You might be particularly interested in this page regarding char and varchar types.
A data type beginning with n means it can be used for unicode characters... eg nVarchar.
Selection of integers is also quite fun.
http://www.databasejournal.com/features/mssql/article.phpr/2212141/Choosing-SQL-Server-2000-Data-Types.htm
The most common data type i use is varchar....
The N* data types (NVARCHAR, NCHAR, NTEXT) are for Unicode strings. They take up two times the space their "normal" pendants (VARCHAR, CHAR, TEXT) need, but they can store Unicode without conversion and possible loss of fidelity.
The TEXT data types can store nearly unlimited amounts of data, but they perform not as good as the CHAR data types because they are stored outside of the record.
THE VARCHAR data types are of variable length. They will not be padded with spaces at the end, but their CHAR pendants will (a CHAR(20) is always twenty characters long, even if if contains 5 letters only. The remaining 15 will be spaces).
The binary data types are for binary data, whatever you care to store into them (images are a primary example).
Other people have given good general answers, but I'd add one important point: when using VARCHAR()s (which I would recommend for those kinds of fields), be sure to use a length that's big enough for any reasonable value. For example, I typically declare VARCHAR(100) for a name, e-mail address, domain name, city name, etc., and VARCHAR(200) for an URL or street address.
This is more than you'll routinely need. In fact, 30 characters is enough for almost all of these values (except full name, but a good database should always store first and last name separately), but it's better than having to change data types some day down the road. There's very little cost in specifying a higher-than-necessary length for a VARCHAR, but note that VARCHAR(MAX) and TEXT do entail significant overhead, so use them only when necessary.
Here's a post which points out a case where a longer-than-necessary VARCHAR can hurt performance: Importance of varchar length in MySQL table. Goes to show that everything has a cost, though in general I'd still favor long VARCHARs.

What are the use cases for selecting CHAR over VARCHAR in SQL?

I realize that CHAR is recommended if all my values are fixed-width. But, so what? Why not just pick VARCHAR for all text fields just to be safe.
The general rule is to pick CHAR if all rows will have close to the same length. Pick VARCHAR (or NVARCHAR) when the length varies significantly. CHAR may also be a bit faster because all the rows are of the same length.
It varies by DB implementation, but generally, VARCHAR (or NVARCHAR) uses one or two more bytes of storage (for length or termination) in addition to the actual data. So (assuming you are using a one-byte character set) storing the word "FooBar"
CHAR(6) = 6 bytes (no overhead)
VARCHAR(100) = 8 bytes (2 bytes of overhead)
CHAR(10) = 10 bytes (4 bytes of waste)
The bottom line is CHAR can be faster and more space-efficient for data of relatively the same length (within two characters length difference).
Note: Microsoft SQL has 2 bytes of overhead for a VARCHAR. This may vary from DB to DB, but generally, there is at least 1 byte of overhead needed to indicate length or EOL on a VARCHAR.
As was pointed out by Gaven in the comments: Things change when it comes to multi-byte characters sets, and is a is case where VARCHAR becomes a much better choice.
A note about the declared length of the VARCHAR: Because it stores the length of the actual content, then you don't waste unused length. So storing 6 characters in VARCHAR(6), VARCHAR(100), or VARCHAR(MAX) uses the same amount of storage. Read more about the differences when using VARCHAR(MAX). You declare a maximum size in VARCHAR to limit how much is stored.
In the comments AlwaysLearning pointed out that the Microsoft Transact-SQL docs seem to say the opposite. I would suggest that is an error or at least the docs are unclear.
If you're working with me and you're working with Oracle, I would probably make you use varchar in almost every circumstance. The assumption that char uses less processing power than varchar may be true...for now...but database engines get better over time and this sort of general rule has the making of a future "myth".
Another thing: I have never seen a performance problem because someone decided to go with varchar. You will make much better use of your time writing good code (fewer calls to the database) and efficient SQL (how do indexes work, how does the optimizer make decisions, why is exists faster than in usually...).
Final thought: I have seen all sorts of problems with use of CHAR, people looking for '' when they should be looking for ' ', or people looking for 'FOO' when they should be looking for 'FOO (bunch of spaces here)', or people not trimming the trailing blanks, or bugs with Powerbuilder adding up to 2000 blanks to the value it returns from an Oracle procedure.
In addition to performance benefits, CHAR can be used to indicate that all values should be the same length, e.g., a column for U.S. state abbreviations.
Char is a little bit faster, so if you have a column that you KNOW will be a certain length, use char. For example, storing (M)ale/(F)emale/(U)nknown for gender, or 2 characters for a US state.
Does NChar or Char perform better that their var alternatives?
Great question. The simple answer is yes in certain situations. Let's see if this can be explained.
Obviously we all know that if I create a table with a column of varchar(255) (let's call this column myColumn) and insert a million rows but put only a few characters into myColumn for each row, the table will be much smaller (overall number of data pages needed by the storage engine) than if I had created myColumn as char(255). Anytime I do an operation (DML) on that table and request alot of rows, it will be faster when myColumn is varchar because I don't have to move around all those "extra" spaces at the end. Move, as in when SQL Server does internal sorts such as during a distinct or union operation, or if it chooses a merge during it's query plan, etc. Move could also mean the time it takes to get the data from the server to my local pc or to another computer or wherever it is going to be consumed.
But there is some overhead in using varchar. SQL Server has to use a two byte indicator (overhead) to, on each row, to know how many bytes that particular row's myColumn has in it. It's not the extra 2 bytes that presents the problem, it's the having to "decode" the length of the data in myColumn on every row.
In my experiences it makes the most sense to use char instead of varchar on columns that will be joined to in queries. For example the primary key of a table, or some other column that will be indexed. CustomerNumber on a demographic table, or CodeID on a decode table, or perhaps OrderNumber on an order table. By using char, the query engine can more quickly perform the join because it can do straight pointer arithmetic (deterministically) rather than having to move it's pointers a variable amount of bytes as it reads the pages. I know I might have lost you on that last sentence. Joins in SQL Server are based around the idea of "predicates." A predicate is a condition. For example myColumn = 1, or OrderNumber < 500.
So if SQL Server is performing a DML statement, and the predicates, or "keys" being joined on are a fixed length (char), the query engine doesn't have to do as much work to match rows from one table to rows from another table. It won't have to find out how long the data is in the row and then walk down the string to find the end. All that takes time.
Now bear in mind this can easily be poorly implemented. I have seen char used for primary key fields in online systems. The width must be kept small i.e. char(15) or something reasonable. And it works best in online systems because you are usually only retrieving or upserting a small number of rows, so having to "rtrim" those trailing spaces you'll get in the result set is a trivial task as opposed to having to join millions of rows from one table to millions of rows on another table.
Another reason CHAR makes sense over varchar on online systems is that it reduces page splits. By using char, you are essentially "reserving" (and wasting) that space so if a user comes along later and puts more data into that column SQL has already allocated space for it and in it goes.
Another reason to use CHAR is similar to the second reason. If a programmer or user does a "batch" update to millions of rows, adding some sentence to a note field for example, you won't get a call from your DBA in the middle of the night wondering why their drives are full. In other words, it leads to more predictable growth of the size of a database.
So those are 3 ways an online (OLTP) system can benefit from char over varchar. I hardly ever use char in a warehouse/analysis/OLAP scenario because usually you have SO much data that all those char columns can add up to lots of wasted space.
Keep in mind that char can make your database much larger but most backup tools have data compression so your backups tend to be about the same size as if you had used varchar. For example LiteSpeed or RedGate SQL Backup.
Another use is in views created for exporting data to a fixed width file. Let's say I have to export some data to a flat file to be read by a mainframe. It is fixed width (not delimited). I like to store the data in my "staging" table as varchar (thus consuming less space on my database) and then use a view to CAST everything to it's char equivalent, with the length corresponding to the width of the fixed width for that column. For example:
create table tblStagingTable (
pkID BIGINT (IDENTITY,1,1),
CustomerFirstName varchar(30),
CustomerLastName varchar(30),
CustomerCityStateZip varchar(100),
CustomerCurrentBalance money )
insert into tblStagingTable
(CustomerFirstName,CustomerLastName, CustomerCityStateZip) ('Joe','Blow','123 Main St Washington, MD 12345', 123.45)
create view vwStagingTable AS
SELECT CustomerFirstName = CAST(CustomerFirstName as CHAR(30)),
CustomerLastName = CAST(CustomerLastName as CHAR(30)),
CustomerCityStateZip = CAST(CustomerCityStateZip as CHAR(100)),
CustomerCurrentBalance = CAST(CAST(CustomerCurrentBalance as NUMERIC(9,2)) AS CHAR(10))
SELECT * from vwStagingTable
This is cool because internally my data takes up less space because it's using varchar. But when I use DTS or SSIS or even just a cut and paste from SSMS to Notepad, I can use the view and get the right number of trailing spaces. In DTS we used to have a feature called, damn I forget I think it was called "suggest columns" or something. In SSIS you can't do that anymore, you have to tediously define the flat file connection manager. But since you have your view setup, SSIS can know the width of each column and it can save alot of time when building your data flow tasks.
So bottom line... use varchar. There are a very small number of reasons to use char and it's only for performance reasons. If you have a system with hundrends of millions of rows you will see a noticeable difference if the predicates are deterministic (char) but for most systems using char is simply wasting space.
Hope that helps.
Jeff
There are performance benefits, but here is one that has not been mentioned: row migration. With char, you reserve the entire space in advance.So let's says you have a char(1000), and you store 10 characters, you will use up all 1000 charaters of space. In a varchar2(1000), you will only use 10 characters. The problem comes when you modify the data. Let's say you update the column to now contain 900 characters. It is possible that the space to expand the varchar is not available in the current block. In that case, the DB engine must migrate the row to another block, and make a pointer in the original block to the new row in the new block. To read this data, the DB engine will now have to read 2 blocks.
No one can equivocally say that varchar or char are better. There is a space for time tradeoff, and consideration of whether the data will be updated, especially if there is a good chance that it will grow.
There is a difference between early performance optimization and using a best practice type of rule. If you are creating new tables where you will always have a fixed length field, it makes sense to use CHAR, you should be using it in that case. This isn't early optimization, but rather implementing a rule of thumb (or best practice).
i.e. - If you have a 2 letter state field, use CHAR(2). If you have a field with the actual state names, use VARCHAR.
I would choose varchar unless the column stores fixed value like US state code -- which is always 2 chars long and the list of valid US states code doesn't change often :).
In every other case, even like storing hashed password (which is fixed length), I would choose varchar.
Why -- char type column is always fulfilled with spaces, which makes for column my_column defined as char(5) with value 'ABC' inside comparation:
my_column = 'ABC' -- my_column stores 'ABC ' value which is different then 'ABC'
false.
This feature could lead to many irritating bugs during development and makes testing harder.
CHAR takes up less storage space than VARCHAR if all your data values in that field are the same length. Now perhaps in 2009 a 800GB database is the same for all intents and purposes as a 810GB if you converted the VARCHARs to CHARs, but for short strings (1 or 2 characters), CHAR is still a industry "best practice" I would say.
Now if you look at the wide variety of data types most databases provide even for integers alone (bit, tiny, int, bigint), there ARE reasons to choose one over the other. Simply choosing bigint every time is actually being a bit ignorant of the purposes and uses of the field. If a field simply represents a persons age in years, a bigint is overkill. Now it's not necessarily "wrong", but it's not efficient.
But its an interesting argument, and as databases improve over time, it could be argued CHAR vs VARCHAR does get less relevant.
I would NEVER use chars. I’ve had this debate with many people and they always bring up the tired cliché that char is faster. Well I say, how much faster? What are we talking about here, milliseconds, seconds and if so how many? You’re telling me because someone claims its a few milliseconds faster, we should introduce tons of hard to fix bugs into the system?
So here are some issues you will run into:
Every field will be padded, so you end up with code forever that has RTRIMS everywhere. This is also a huge disk space waste for the longer fields.
Now let’s say you have the quintessential example of a char field of just one character but the field is optional. If somebody passes an empty string to that field it becomes one space. So when another application/process queries it, they get one single space, if they don’t use rtrim. We’ve had xml documents, files and other programs, display just one space, in optional fields and break things.
So now you have to ensure that you’re passing nulls and not empty string, to the char field. But that’s NOT the correct use of null. Here is the use of null. Lets say you get a file from a vendor
Name|Gender|City
Bob||Los Angeles
If gender is not specified than you enter Bob, empty string and Los Angeles into the table. Now lets say you get the file and its format changes and gender is no longer included but was in the past.
Name|City
Bob|Seattle
Well now since gender is not included, I would use null. Varchars support this without issues.
Char on the other hand is different. You always have to send null. If you ever send empty string, you will end up with a field that has spaces in it.
I could go on and on with all the bugs I’ve had to fix from chars and in about 20 years of development.
I stand by Jim McKeeth's comment.
Also, indexing and full table scans are faster if your table has only CHAR columns. Basically the optimizer will be able to predict how big each record is if it only has CHAR columns, while it needs to check the size value of every VARCHAR column.
Besides if you update a VARCHAR column to a size larger than its previous content you may force the database to rebuild its indexes (because you forced the database to physically move the record on disk). While with CHAR columns that'll never happen.
But you probably won't care about the performance hit unless your table is huge.
Remember Djikstra's wise words. Early performance optimization is the root of all evil.
Many people have pointed out that if you know the exact length of the value using CHAR has some benefits. But while storing US states as CHAR(2) is great today, when you get the message from sales that 'We have just made our first sale to Australia', you are in a world of pain. I always send to overestimate how long I think fields will need to be rather than making an 'exact' guess to cover for future events. VARCHAR will give me more flexibility in this area.
I think in your case there is probably no reason to not pick Varchar. It gives you flexibility and as has been mentioned by a number of respondants, performance is such now that except in very specific circumstances us meer mortals (as opposed to Google DBA's) will not notice the difference.
An interesting thing worth noting when it comes to DB Types is the sqlite (a popular mini database with pretty impressive performance) puts everything into the database as a string and types on the fly.
I always use VarChar and usually make it much bigger than I might strickly need. Eg. 50 for Firstname, as you say why not just to be safe.
It's the classic space versus performance tradeoff.
In MS SQL 2005, Varchar (or NVarchar for lanuagues requiring two bytes per character ie Chinese) are variable length. If you add to the row after it has been written to the hard disk it will locate the data in a non-contigious location to the original row and lead to fragmentation of your data files. This will affect performance.
So, if space is not an issue then Char are better for performance but if you want to keep the database size down then varchars are better.
Fragmentation. Char reserves space and VarChar does not. Page split can be required to accommodate update to varchar.
There is some small processing overhead in calculating the actual needed size for a column value and allocating the space for a Varchar, so if you are definitely sure how long the value will always be, it is better to use Char and avoid the hit.
when using varchar values SQL Server needs an additional 2 bytes per row to store some info about that column whereas if you use char it doesn't need that
so unless you
Using CHAR (NCHAR) and VARCHAR (NVARCHAR) brings differences in the ways the database server stores the data. The first one introduces trailing blanks; I have encountered problem when using it with LIKE operator in SQL SERVER functions. So I have to make it safe by using VARCHAR (NVARCHAR) all the times.
For example, if we have a table TEST(ID INT, Status CHAR(1)), and you write a function to list all the records with some specific value like the following:
CREATE FUNCTION List(#Status AS CHAR(1) = '')
RETURNS TABLE
AS
RETURN
SELECT * FROM TEST
WHERE Status LIKE '%' + #Status '%'
In this function we expect that when we put the default parameter the function will return all the rows, but in fact it does not. Change the #Status data type to VARCHAR will fix the issue.
In some SQL databases, VARCHAR will be padded out to its maximum size in order to optimize the offsets, This is to speed up full table scans and indexes.
Because of this, you do not have any space savings by using a VARCHAR(200) compared to a CHAR(200)