I need to store Medicare APC codes. I believe the format requires 4 numbers. Leading zeros are relevant. Is there any way to store this data type with verification? How should I store this data (varchar(4), int)?
This kind of issue, storing zero-leading numbers that need to be treated as numeric values in some scenarios (e.g. sorting) and as textual values in others (e.g. addresses), is always a pain, and there is no one answer that is best for all users. At my company we have a database that stores numbers as text for codes (not Medicare APC codes), and we must pad them with zeros so they sort properly when used in an ordering operation.
Do not use a numeric data type for this, because the item is not a true number but textual data that uses numeric characters. You will not be performing any calculations or aggregates on the codes, so the only benefit of storing them as numbers would be proper sorting, and that can be achieved with the code stored as text by padding it with zeros where needed. If you use a numeric data type, then any time the code is combined with other textual values you will have to explicitly convert it to CHAR/VARCHAR or let SQL Server do it implicitly. Since implicit conversions should be avoided, that means a lot of extra work for you and the query processor every time the code is used.
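The padding approach mentioned above can be sketched in a few lines. This is a Python illustration rather than SQL, and the sample codes are made up for the example. It only shows that text is compared character by character, so unpadded digit strings sort out of numeric order while zero-padded ones do not.

```python
# Hypothetical sample codes, stored as text without padding.
raw_codes = ["12", "7", "105", "1500"]

# Plain text sorting compares character by character,
# so "105" sorts before "12", and "7" sorts last.
text_order = sorted(raw_codes)

# Left-padding every code to the same width makes text order match numeric order.
padded_codes = [code.zfill(4) for code in raw_codes]
padded_order = sorted(padded_codes)

print(text_order)    # ['105', '12', '1500', '7']
print(padded_order)  # ['0007', '0012', '0105', '1500']
```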
Assuming you decide to go with a textual data type, the question then is whether to use VARCHAR or CHAR. While many who have posted say VARCHAR, I would suggest CHAR with a length of 4. Why?
The VARCHAR data type is for textual data whose size (the length or number of characters) is unknown in advance. For this Medicare code we know the length will always be at least 4 and probably no more than 4 for the foreseeable future. SQL Server handles the storage of CHAR and VARCHAR data differently. SQL Server's BOL (Books Online) says:
Use CHAR when the size of the column data entries are consistent
Use VARCHAR when the size of the column data varies considerably.
I can't say for certain this is true for SQL Server 2008 and up, but in earlier versions a VARCHAR column carries an extra overhead of 2 bytes per row for each VARCHAR column in the table, used to record where the variable-length value ends. If the data stored is always the same size, and in your scenario it sounds like it is, then this extra overhead is a waste.
In the end it’s up to you as to whether you like CHAR or VARCHAR better but definitely don’t use a numeric data type to store a fixed length code.
That's not numeric data; it's textual data that happens to contain digits.
Use a VARCHAR.
I agree, using
CHAR(4)
for the check constraint use
check( APC_CODE LIKE '[0-9][0-9][0-9][0-9]' )
This will force a 4 digit number only to be accepted...
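If the same rule is also checked in application code before the insert, it might look like the sketch below. This is Python rather than SQL, the function name is hypothetical, and it is meant to complement the CHECK constraint, not replace it.

```python
import re

# Mirrors the LIKE pattern '[0-9][0-9][0-9][0-9]': exactly four ASCII digits.
APC_CODE_PATTERN = re.compile(r"[0-9]{4}")


def is_valid_apc_code(code: str) -> bool:
    """Return True only for strings made of exactly four ASCII digits."""
    return APC_CODE_PATTERN.fullmatch(code) is not None


print(is_valid_apc_code("0012"))  # True
print(is_valid_apc_code("12"))    # False
```

The explicit `[0-9]` class is used instead of `\d` because `\d` also matches non-ASCII digits in Python.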
varchar(4)
optionally, you can still add a check constraint to ensure the data is numeric with leading zeros. This example will throw exceptions in Oracle. In other RDBMS, you could use regular expression checks:
alter table X add constraint C
check (cast(APC_CODE as int) = cast(APC_CODE as int))
If you are certain that the APC codes will always be numeric (that is, if that is not going to change in the near future), a better way would be to leave the database column as is, and handle the formatting (to include leading zeros) wherever you use this field's values.
If you need leading 0s, then you must use a varchar or other string data type.
There are ways to format the output for leading 0s without compromising your actual data.
See this blog entry for an easy method.
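As a rough illustration of formatting at the point of use, assuming the code is held as an integer and rendered in application code (the function name and the width of 4 are assumptions for this sketch):

```python
def format_apc_code(value: int, width: int = 4) -> str:
    """Render an integer code as zero-padded text, e.g. 42 -> '0042'."""
    if value < 0 or value >= 10 ** width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return f"{value:0{width}d}"


print(format_apc_code(42))    # 0042
print(format_apc_code(1500))  # 1500
```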
CHAR(4) seems more appropriate to me (if I understood you right, and the code is always 4 digits).
What you want to use is a VARCHAR data type with a CHECK constraint, using LIKE with a pattern to check for numeric values.
in TSQL
check( isnumeric(APC_CODE) = 1)
What is the difference between char, nchar, ntext, nvarchar, text and varchar in SQL?
Is there really an application case for each of these types, or are some of them just deprecated?
text and ntext are deprecated, so let's omit them for a moment. For what is left, there are 3 dimensions:
Unicode (UCS-2) vs. non-unicode: N in front of the name denotes Unicode
Fixed length vs. variable length: var denotes variable, otherwise fixed
In-row vs. BLOB: (max) as length denotes a BLOB, otherwise is an in-row value
So with this, you can read any type's meaning:
CHAR(10): is an in-row fixed length non-Unicode of size 10
NVARCHAR(256): is an in-row variable length Unicode of size up-to 256
VARCHAR(MAX): is a BLOB variable length non-Unicode
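The reading rule above is mechanical enough to write down as code. The following Python sketch is only an illustration of the three dimensions; the function name and the returned keys are made up for this example.

```python
import re


def describe_sql_server_string_type(type_name: str) -> dict:
    """Decode a type such as 'NVARCHAR(256)' into the three dimensions above."""
    match = re.fullmatch(r"(N?)(VAR)?CHAR\((\d+|MAX)\)", type_name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised string type: {type_name!r}")
    n_prefix, var_part, length = match.groups()
    if length == "MAX" and var_part is None:
        # (max) only exists for the variable-length types.
        raise ValueError("(MAX) requires VARCHAR or NVARCHAR")
    return {
        "unicode": n_prefix == "N",              # N prefix denotes Unicode
        "variable_length": var_part is not None,  # VAR denotes variable length
        "storage": "BLOB" if length == "MAX" else "in-row",
        "length": None if length == "MAX" else int(length),
    }


print(describe_sql_server_string_type("NVARCHAR(256)"))
```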
The deprecated types text and ntext correspond to the new types varchar(max) and nvarchar(max) respectively.
When you go to details, the meaning of in-row vs. BLOB blurs for small lengths as the engine may optimize the storage and pull a BLOB in-row or push an in-row value into the 'small BLOB' allocation unit, but this is just an implementation detail. See Table and Index Organization.
From a programming point of view, all types (CHAR, VARCHAR, NCHAR, NVARCHAR, VARCHAR(MAX) and NVARCHAR(MAX)) support a uniform string API: String Functions. The old, deprecated types TEXT and NTEXT do not support this API; they have a separate, deprecated TEXT API for manipulation. You should not use the deprecated types.
BLOB types support efficient in-place updates by using the UPDATE table SET column.WRITE(@value, @offset, @length) syntax.
The difference between fixed-length and variable-length types vanishes when row compression is enabled on a table. With row compression enabled, fixed-length and variable-length types are stored in the same format and trailing spaces are not stored on disk; see Row Compression Implementation. Note that page compression implies row compression.
'n' represents support for unicode characters.
char - specifies string with fixed length storage. Space allocated with or without data present.
varchar - Varying length storage. Space is allocated as much as length of data in column.
text - To store huge data. The space allocated is 16 bytes for column storage.
Additionally - text and ntext have been deprecated for varchar(max) and nvarchar(max)
text and ntext are deprecated in favor of varchar(max) and nvarchar(max)
The n prefix simply means Unicode. The "n" types work similarly to the plain versions except they work with Unicode text.
char is a fixed length field. Thus char(10) filled with "Yes" will still take 10 bytes of storage.
varchar is a variable length field. varchar(10) filled with "Yes" will take 5 bytes of storage (there is a 2 byte overhead for using var data types).
char(n) holding string of length x. Storage = n bytes.
varchar(n) holding string of length x. Storage = x+2 bytes.
nchar and nvarchar are similar except it is 2 bytes per character.
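The rules of thumb above can be written as a small calculator. This is a Python sketch of the arithmetic only; the function name is hypothetical, and it ignores row overhead, compression, and any other engine details.

```python
def storage_bytes(type_name: str, declared_length: int, actual_length: int) -> int:
    """Approximate per-value storage following the rules of thumb above.

    type_name is one of 'char', 'varchar', 'nchar', 'nvarchar'.
    """
    if actual_length > declared_length:
        raise ValueError("value is longer than the declared length")
    bytes_per_char = 2 if type_name.startswith("n") else 1
    if type_name in ("char", "nchar"):
        # Fixed length: always the declared size, padded if needed.
        return declared_length * bytes_per_char
    if type_name in ("varchar", "nvarchar"):
        # Variable length: actual data plus a 2-byte length overhead.
        return actual_length * bytes_per_char + 2
    raise ValueError(f"unsupported type: {type_name}")


# "Yes" stored in columns declared with length 10.
print(storage_bytes("char", 10, 3))      # 10
print(storage_bytes("varchar", 10, 3))   # 5
print(storage_bytes("nchar", 10, 3))     # 20
print(storage_bytes("nvarchar", 10, 3))  # 8
```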
Generally speaking you should only use char & nchar (over varchar & nvarchar) when working with fixed or semi-fixed strings. A good example would be a product_code or user_type which is always n characters long.
You shouldn't use text (or ntext) as it has been deprecated. varchar(max) & nvarchar(max) provides the same functionality.
N prefix indicates unicode support and takes up twice the bytes per character of non-unicode.
Varchar is variable length. You use an extra 2 bytes per field to store the length.
Char is fixed length. If you know how long your data will be, use char as you will save bytes!
Text is mostly deprecated in my experience.
Be wary of using Varchar(max) and NVarchar(max), as these columns cannot be used as index key columns.
I only know between "char" and "varchar".
char: it can allocate memory of specified size whether or not it is filled
varchar: it will allocate memory based on the number of characters in it but it should have some size called maximum size.
Text is meant for very large amounts of text, and is in general not meant to be searchable (but can be in some circumstances. It will be slow anyway).
The char/nchar datatypes are of fixed lengths, and are padded if the entered value is shorter, as opposed to the varchar/nvarchar types, which are variable length.
The n types have unicode support, where the non-n types don't.
Text is deprecated.
Char is a set value. When you say char(10), you are reserving 10 characters for every single row, whether they are used or not. Use this for something that shouldn't change lengths (For example, Zip Code or SSN)
varchar is variable. When you say varchar(10), 2 bytes is set aside to store the size of the data, as well as the actual data (which might be only say, four bytes).
The N represents uni-code. Twice the space.
n-prefix: unicode.
var*: variable length, the rest is fixed length.
All data types are properly and nicely... documented.
Like here:
http://msdn.microsoft.com/en-us/library/ms187752.aspx
Is there really an application case
for each of these types, or are some
of them just deprecated?
No, there is a good case for ANY of them.
I noticed that I can write
SELECT CAST(Min(mynumber) AS VARCHAR(Max))+'mystring' AS X
as
SELECT CAST(Min(mynumber) AS VARCHAR)+'mystring' X
Will I regret leaving out the (Max) parameter?
You'll regret it in the (unlikely) situation that MIN(mynumber) has more than 30 characters:
When n is not specified when using the CAST and CONVERT functions, the default length is 30.
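To see why the situation is unlikely, it helps to count characters. The sketch below is in Python and assumes mynumber is one of SQL Server's integer types, which the question does not actually state.

```python
# Widest textual forms of the int and bigint types: the minimum values,
# because of the leading minus sign.
widest_int = str(-2**31)     # '-2147483648'
widest_bigint = str(-2**63)  # '-9223372036854775808'

DEFAULT_CAST_LENGTH = 30  # default n for CAST(... AS VARCHAR), per the quote above

print(len(widest_int))     # 11
print(len(widest_bigint))  # 20
print(len(widest_bigint) <= DEFAULT_CAST_LENGTH)  # True
```

Under that assumption, even the longest bigint leaves 10 characters to spare.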
VARCHAR(MAX) should be used for large objects. It uses the normal data pages until the content exceeds about 8 KB. When that overflow happens, the data is stored off-row, as the old TEXT and IMAGE types were, and a pointer replaces it in the row.
Varchar is for Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. Max indicates that the maximum storage size is 2^31-1 bytes.
Hope it helps.
When a varchar's length is not specified in a data definition or variable declaration statement, the default length is 1. When it is not specified when using the CAST and CONVERT functions, the default length is 30.
See: char and varchar (Transact-SQL)
I feel that it is poor practice to code without specifying a length for varchar.
I am storing first name and last name with up to 30 characters each. Which is better varchar or nvarchar.
I have read that nvarchar takes up twice as much space compared to varchar and that nvarchar is used for internationalization.
So what do you suggest should I use: nvarchar or varchar ?
Also, please let me know about the performance of both. Is the performance the same, or do they differ? Space is not a big issue; performance is.
Basically, nvarchar means you can handle lots of alphabets, not just regular English. Technically, it means unicode support, not just ANSI. This means double-width characters or approximately twice the space. These days disk space is so cheap you might as well use nvarchar from the beginning rather than go through the pain of having to change during the life of a product.
If you're certain you'll only ever need to support one language you could stick with varchar, otherwise I'd go with nvarchar.
This has been discussed on SO before here.
EDITED: changed ascii to ANSI as noted in comment.
First of all, to clarify, nvarchar stores unicode data while varchar stores ANSI (8-bit) data. They function identically but nvarchar takes up twice as much space.
Generally, I prefer storing user names using varchar datatypes unless those names have characters which fall out of the boundary of characters which varchar can store.
It also depends on the database collation. For example, you will not be able to store Russian characters in a varchar field if your database collation is LATIN_CS_AS. But if you are working on a local application that will be used only in Russia, you would set the database collation to a Russian one. This allows you to enter Russian characters in a varchar field, saving some space.
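The effect of the code page can be illustrated outside the database. In this Python sketch the codecs cp1252 and cp1251 stand in for the code pages behind a Western European and a Cyrillic collation. That mapping is an assumption for illustration, and a database may substitute a placeholder character where Python raises an error.

```python
name = "Привет"  # Russian text, used here only as sample data

# A Western European code page (Windows-1252) has no Cyrillic letters.
try:
    name.encode("cp1252")
    fits_in_latin = True
except UnicodeEncodeError:
    fits_in_latin = False

# A Cyrillic code page (Windows-1251) stores each letter in one byte.
cyrillic_bytes = name.encode("cp1251")

# UTF-16, as used by nvarchar, needs two bytes per letter here.
utf16_bytes = name.encode("utf-16-le")

print(fits_in_latin, len(cyrillic_bytes), len(utf16_bytes))  # False 6 12
```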
But nowadays most applications being developed are international, so you have to decide for yourself which users will be signing up and choose the data type accordingly.
I have read that nvarchar takes twice as much space as varchar.
Yes.
nvarchar is used for internationalization.
Yes.
What do you suggest I use, nvarchar or varchar?
It depends on the application.
By default go with nvarchar. There is very little reason to go with varchar these days, and every reason to go with nvarchar (allows international characters; as discussed).
varchar is 1 byte per character, nvarchar is 2 bytes per character.
You will use more space with nvarchar but there are many more allowable characters. The extra space is negligible, but you may miss those extra characters in the future. Even if you don't expect to require internationalization, people will often have non-English characters (e.g. é, ñ or ö) in their names.
I would suggest you use nvarchar.
I have read that nvarchar takes twice as much space as varchar
Yes. According to Microsoft: "Storage size, in bytes, is two times the number of characters entered + 2 bytes" (http://msdn.microsoft.com/en-us/library/ms186939(SQL.90).aspx).
But storage is cheap; I never worry about a few extra bytes.
Also, save yourself trouble in the future and set the maximum widths to something more generous, like 100 characters. There is absolutely no storage overhead to this when you're using varchar or nvarchar (as opposed to char/nchar). You never know when you're going to encounter a triple-barrelled surname or some long foreign name which exceeds 30 characters.
nvarchar is used for internationalization.
nvarchar can store any unicode character, such as characters from non-Latin scripts (Arabic, Chinese, etc). I'm not sure how your application will be taking data (via the web, via a GUI toolkit, etc) but it's likely that whatever technology you're using supports unicode out of the box. That means that for any user-entered data (such as name) there is always the possibility of receiving non-Latin characters, if not now then in the future.
If I was building a new application, I would use nvarchar. Call it "future-proofing" if you like.
The nvarchar type is Unicode, so it can handle just about any character that exists in every language on the planet. The characters are stored as UTF-16 or UCS-2 (not sure which, and the differences are subtle), so each character uses two bytes.
The varchar type uses an 8 bit character set, so it's limited to the 255 characters of the character set that you choose for the field. There are different character sets that handle different character groups, so it's usually sufficient for text local to a country or a region.
If varchar works for what you want to do, you should use that. It's a bit less data, so it's overall slightly faster. If you need to handle a wide variety of characters, use nvarchar.
on performance:
a reason to use varchar over nvarchar is that you can have twice as many characters in your indexes! index keys are limited to 900 bytes
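The arithmetic behind that claim, as a short Python sketch. The 900-byte figure is taken from the answer above and the per-character sizes from the earlier answers.

```python
INDEX_KEY_LIMIT_BYTES = 900  # limit quoted above

max_varchar_chars = INDEX_KEY_LIMIT_BYTES // 1   # varchar: 1 byte per character
max_nvarchar_chars = INDEX_KEY_LIMIT_BYTES // 2  # nvarchar: 2 bytes per character

print(max_varchar_chars, max_nvarchar_chars)  # 900 450
```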
on usability:
if the application is only ever intended for an English audience and contains only English names, use varchar
Data to store: "Sunil"
varchar(5) takes 7B
nvarchar(5) takes 12B
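Those two figures can be reproduced by encoding the string and adding the 2-byte length overhead. This Python sketch uses latin-1 and utf-16-le as stand-ins for the single-byte and double-byte storage formats, which is an approximation made for the example.

```python
value = "Sunil"
LENGTH_OVERHEAD = 2  # bytes used to record the length of a variable-length value

varchar_size = len(value.encode("latin-1")) + LENGTH_OVERHEAD     # 5 + 2
nvarchar_size = len(value.encode("utf-16-le")) + LENGTH_OVERHEAD  # 10 + 2

print(varchar_size, nvarchar_size)  # 7 12
```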
What's the difference between VARCHAR and CHAR in MySQL?
I am trying to store MD5 hashes.
VARCHAR is variable-length.
CHAR is fixed length.
If your content is a fixed size, you'll get better performance with CHAR.
See the MySQL page on CHAR and VARCHAR Types for a detailed explanation (be sure to also read the comments).
CHAR
Used to store character string value of fixed length.
The maximum no. of characters the data type can hold is 255 characters.
Often claimed to be up to 50% faster than VARCHAR; the actual difference depends on the storage engine and the workload.
Uses static memory allocation.
VARCHAR
Used to store variable length alphanumeric data.
The maximum this data type can hold is up to
Before MySQL 5.0.3: 255 characters.
MySQL 5.0.3 and later: up to 65,535 bytes, a limit shared with the other columns in the row.
It's slower than CHAR.
Uses dynamic memory allocation.
CHAR Vs VARCHAR
CHAR is used for fixed-length values.
VARCHAR is used for variable-length values.
E.g.
Create table temp
(City CHAR(10),
Street VARCHAR(10));
Insert into temp
values('Pune','Oxford');
select length(city), length(street) from temp;
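In a DBMS that returns CHAR values padded to their full width (for example Oracle, or MySQL with the PAD_CHAR_TO_FULL_LENGTH SQL mode enabled), the output will be
length(City) Length(street)
10 6
With MySQL's default settings, trailing spaces are stripped from CHAR values on retrieval, so length(city) returns 4 even though the column still occupies its fixed width in storage.
Conclusion: to use storage space efficiently, use VARCHAR instead of CHAR when the length of the values varies.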
A CHAR(x) column can only have exactly x characters.
A VARCHAR(x) column can have up to x characters.
Since your MD5 hashes will always be the same size, you should probably use a CHAR.
However, you shouldn't be using MD5 in the first place; it has known weaknesses.
Use SHA2 instead.
If you're hashing passwords, you should use bcrypt.
What's the difference between VARCHAR and CHAR in MySQL?
To already given answers I would like to add that in OLTP systems or in systems with frequent updates consider using CHAR even for variable size columns because of possible VARCHAR column fragmentation during updates.
I am trying to store MD5 hashes.
MD5 is not the best choice if security really matters. However, if you do use a hash function, consider the BINARY type for it instead (e.g. MD5 produces a 16-byte hash, so BINARY(16) is enough, instead of CHAR(32) for the 32 characters representing hex digits). This saves space and performs better.
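The two sizes are easy to confirm. The Python sketch below hashes an arbitrary sample input and compares the raw digest with its hex representation; MD5 appears only because the question asks about it.

```python
import hashlib

digest = hashlib.md5(b"example input")

raw = digest.digest()          # 16 raw bytes, the size BINARY(16) would hold
hex_text = digest.hexdigest()  # 32 hex characters, the size CHAR(32) would hold

print(len(raw), len(hex_text))  # 16 32

# Converting between the two forms is lossless.
assert bytes.fromhex(hex_text) == raw
assert raw.hex() == hex_text
```

In MySQL the same conversion can be done with the UNHEX() and HEX() functions when writing to and reading from a BINARY(16) column.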
Varchar cuts off trailing spaces if the entered characters are shorter than the declared length, while char does not. Char pads with spaces and is always the declared length. In terms of efficiency, varchar is more adept, as it trims characters to allow more adjustment. However, if you know the exact length of the data, char will execute with a bit more speed.
CHAR is fixed length and VARCHAR is variable length. CHAR always uses the same amount of storage space per entry, while VARCHAR only uses the amount necessary to store the actual text.
CHAR is a fixed length field; VARCHAR is a variable length field. If you are storing strings with a wildly variable length such as names, then use a VARCHAR, if the length is always the same, then use a CHAR because it is slightly more size-efficient, and also slightly faster.
In most RDBMSs today, they are synonyms. However for those systems that still have a distinction, a CHAR field is stored as a fixed-width column. If you define it as CHAR(10), then 10 characters are written to the table, where "padding" (typically spaces) is used to fill in any space that the data does not use up. For example, saving "bob" would be saved as ("bob"+7 spaces). A VARCHAR (variable character) column is meant to store data without wasting the extra space that a CHAR column does.
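The padding described above can be mimicked in a few lines of Python. This is only a model of the behaviour, with a hypothetical helper name, not how any particular engine implements it.

```python
def store_as_char(value: str, width: int) -> str:
    """Pad a value with trailing spaces to a fixed width, as a CHAR column would."""
    if len(value) > width:
        raise ValueError("value is longer than the column width")
    return value.ljust(width)


stored = store_as_char("bob", 10)
print(repr(stored))           # 'bob       '
print(len(stored))            # 10
print(repr(stored.rstrip()))  # 'bob'
```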
As always, Wikipedia speaks louder.
CHAR
CHAR is a fixed length string data type, so any remaining space in the field is padded with blanks.
CHAR takes up 1 byte per character. So, a CHAR(100) field (or variable) takes up 100 bytes on disk, regardless of the string it holds.
VARCHAR
VARCHAR is a variable length string data type, so it holds only the characters you assign to it.
VARCHAR takes up 1 byte per character, + 2 bytes to hold length information (For example, if you set a VARCHAR(100) data type = ‘Dhanika’, then it would take up 7 bytes (for D, H, A, N, I, K and A) plus 2 bytes, or 9 bytes in all.)
CHAR
Uses specific allocation of memory
Time efficient
VARCHAR
Uses dynamic allocation of memory
Memory efficient
The char is a fixed-length character data type, the varchar is a variable-length character data type.
Because char is a fixed-length data type, the storage size of the char value is equal to the maximum size for this column. Because varchar is a variable-length data type, the storage size of the varchar value is the actual length of the data entered, not the maximum size for this column.
You can use char when the data entries in a column are expected to be the same size.
You can use varchar when the data entries in a column are expected to vary considerably in size.
Distinguishing between the two is also good for an integrity aspect.
If you expect to store things that have a rule about their length such as yes or no then you can use char(1) to store Y or N. Also useful for things like currency codes, you can use char(3) to store things like USD, EUR or AUD.
Then varchar is better for things where there is no general rule about their length except for the limit. It's good for things like names or descriptions where there is a lot of variation in how long the values will be.
Then the text data type comes along and puts a spanner in the works (although it's generally just varchar with no defined upper limit).
according to High Performance MySQL book:
VARCHAR stores variable-length character strings and is the most common string data type. It can require less storage space than
fixed-length types, because it uses only as much space as it needs
(i.e., less space is used to store shorter values). The exception is a
MyISAM table created with ROW_FORMAT=FIXED, which uses a fixed amount
of space on disk for each row and can thus waste space. VARCHAR helps
performance because it saves space.
CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. When storing a CHAR value, MySQL
removes any trailing spaces. (This was also true of VARCHAR in MySQL
4.1 and older versions—CHAR and VARCHAR were logically identical and differed only in storage format.) Values are padded with spaces as
needed for comparisons.
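Char stands for character. It is a fixed-length data type (supports up to 2000 characters).
Varchar is a variable-length data type (supports up to 4000 characters). Note that the 2000 and 4000 limits are Oracle's; in MySQL, CHAR holds up to 255 characters and VARCHAR up to 65,535 bytes.
Char or varchar is used to store textual data, with the length given in brackets.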
Eg- name char (20)
CHAR :
Supports both Character & Numbers.
Supports 2000 characters.
Fixed Length.
VARCHAR :
Supports both Character & Numbers.
Supports 4000 characters.
Variable Length.
any comments......!!!!