varchar or nvarchar - sql

I am storing first name and last name with up to 30 characters each. Which is better varchar or nvarchar.
I have read that nvarchar takes up twice as much space compared to varchar and that nvarchar is used for internationalization.
So what do you suggest should I use: nvarchar or varchar ?
Also please let me know about the performance of both. Is performance for both is same or they differ in performance. Because space is not too big issue. Issue is the performance.

Basically, nvarchar means you can handle lots of alphabets, not just regular English. Technically, it means unicode support, not just ANSI. This means double-width characters or approximately twice the space. These days disk space is so cheap you might as well use nvarchar from the beginning rather than go through the pain of having to change during the life of a product.
If you're certain you'll only ever need to support one language you could stick with varchar, otherwise I'd go with nvarchar.
This has been discussed on SO before here.
EDITED: changed ascii to ANSI as noted in comment.

First of all, to clarify, nvarchar stores unicode data while varchar stores ANSI (8-bit) data. They function identically but nvarchar takes up twice as much space.
Generally, I prefer storing user names using varchar datatypes unless those names have characters which fall out of the boundary of characters which varchar can store.
It also depends on database collation also. For e.g. you'll not be able to store Russian characters in a varchar field, if your database collation is LATIN_CS_AS. But, if you are working on a local application, which will be used only in Russia, you'd set the database collation to Russian. What this will do is that it will allow you to enter Russian characters in a varchar field, saving some space.
But, now-a-days, most of the applications being developed are international, so you'd yourself have to decide which all users will be signing up, and based on that decide the datatype.

I have red that nvarchar takes twice as varchar.
Yes.
nvarchar is used for internationalization.
Yes.
what u suggest should i use nvarchar or varchar?
It's depends upon the application.

By default go with nvarchar. There is very little reason to go with varchar these days, and every reason to go with nvarchar (allows international characters; as discussed).

varchar is 1 byte per character, nvarchar is 2 bytes per character.
You will use more space with nvarchar but there are many more allowable characters. The extra space is negligible, but you may miss those extra characters in the future. Even if you don't expect to require internationalization, people will often have non-English characters (e.g. é, ñ or ö) in their names.
I would suggest you use nvarchar.

I have red that nvarchar takes twice as varchar
Yes. According to Microsoft: "Storage size, in bytes, is two times the number of characters entered + 2 bytes" (http://msdn.microsoft.com/en-us/library/ms186939(SQL.90).aspx).
But storage is cheap; I never worry about a few extra bytes.
Also, save yourself trouble in the future and set the maximum widths to something more generous, like 100 characters. There is absolutely no storage overhead to this when you're using varchar or nvarchar (as opposed to char/nchar). You never know when you're going to encounter a triple-barrelled surname or some long foreign name which exceeds 30 characters.
nvarchar is used for internationalization.
nvarchar can store any unicode character, such as characters from non-Latin scripts (Arabic, Chinese, etc). I'm not sure how your application will be taking data (via the web, via a GUI toolkit, etc) but it's likely that whatever technology you're using supports unicode out of the box. That means that for any user-entered data (such as name) there is always the possibility of receiving non-Latin characters, if not now then in the future.
If I was building a new application, I would use nvarchar. Call it "future-proofing" if you like.

The nvarchar type is Unicode, so it can handle just about any character that exist in every language on the planet. The characters are stored as UTF-16 or UCS-2 (not sure which, and the differences are subtle), so each character uses two bytes.
The varchar type uses an 8 bit character set, so it's limited to the 255 characters of the character set that you choose for the field. There are different character set that handles different character groups, so it's usually sufficient for text local to a country or a region.
If varchar works for what you want to do, you should use that. It's a bit less data, so it's overall slightly faster. If you need to handle a wide variety of characters, use nvarchar.

on performance:
a reason to use varchar over nvarchar is that you can have twice as many characters in your indexes! index keys are limited to 900 bytes
on usability:
if the application is only ever intended for a english audience & contain english names, use varchar

Data to store: "Sunil"
varchar(5) takes 7B
nvarchar(5) takes 12B

Related

How are you supposed to choose K when creating a VARCHAR(K) column? Or are you?

This is something I've never understood. Let's say I want a column that is storing an email address. I think,
"Ok, email addresses are usually no more than 15 characters, but I'll
say 50 max characters just to play it safe."
and making that column VARCHAR(50). Of course, then this means that I have to create extra code, possibly both client- and server-side validation of entries to that column.
That brings up the question of Why not just use NVARCHAR all the time except in those rare circumstances where the logic of my application dicates a fixed or maximum length. From what I understand, if I create a VARCHAR(50) and none of the entries are more than 25 characters, that does not mean that 50% of the space is wasted, as the database knows how to optimize everything.
Again, that brings up the question of why not just use NVARCHAR.
nvarchar itself has nothing to with "unlimited length of string" since it is just unicode version of varchar. At present time there are no reasons to use varchar (except some backward compatibility issues) and nvarchar should be preferred.
So I'm supposing you're asking why don't use nvarchar(max) everywhere which is almost unlimited (2 GByte of storage) instead of specifying nvarchar(n) for concrete columns.
There are many reasons of using nvarchar(n) instead of nvarchar(max).
For example, if your column should be included in index - it can't be nvarchar(max).
Also nvarchar(max) data internally stored differently than nvarchar(n) and somtimes it can affect performance.

What's the SQL national character (NCHAR) datatype really for?

As well as CHAR (CHARACTER) and VARCHAR (CHARACTER VARYING), SQL offers an NCHAR (NATIONAL CHARACTER) and NVARCHAR (NATIONAL CHARACTER VARYING) type. In some databases, this is the better datatype to use for character (non-binary) strings:
In SQL Server, NCHAR is stored as UTF-16LE and is the only way to reliably store non-ASCII characters, CHAR being a single-byte codepage only;
In Oracle, NVARCHAR may be stored as UTF-16 or UTF-8 rather than a single-byte collation;
But in MySQL, NVARCHAR is VARCHAR, so it makes no difference, either type can be stored with UTF-8 or any other collation.
So, what does NATIONAL actually conceptually mean, if anything? The vendors' docs only tell you about what character sets their own DBMSs use, rather than the actual rationale. Meanwhile the SQL92 standard explains the feature even less helpfully, stating only that NATIONAL CHARACTER is stored in an implementation-defined character set. As opposed to a mere CHARACTER, which is stored in an implementation-defined character set. Which might be a different implementation-defined character set. Or not.
Thanks, ANSI. Thansi.
Should one use NVARCHAR for all character (non-binary) storage purposes? Are there currently-popular DBMSs in which it will do something undesirable, or which just don't recognise the keyword (or N'' literals)?
"NATIONAL" in this case means characters specific to different nationalities. Far east languages especially have so many characters that one byte is not enough space to distinguish them all. So if you have an english(ascii)-only app or an english-only field, you can get away using the older CHAR and VARCHAR types, which only allow one byte per character.
That said, most of the time you should use NCHAR/NVARCHAR. Even if you don't think you need to support (or potentially support) multiple languages in your data, even english-only apps need to be able to sensibly handle security attacks using foreign-language characters.
In my opinion, about the only place where the older CHAR/VARCHAR types are still preferred is for frequently-referenced ascii-only internal codes and data on platforms like Sql Server that support the distinction — data that would be the equivalent of an enum in a client language like C++ or C#.
Meanwhile the SQL92 standard explains
the feature even less helpfully,
stating only that NATIONAL CHARACTER
is stored in an implementation-defined
character set. As opposed to a mere
CHARACTER, which is stored in an
implementation-defined character set.
Which might be a different
implementation-defined character set.
Or not.
Coincidentally, this is the same "distinction" the C++ standard makes between char and wchar_t. A relic of the Dark Ages of Character Encoding when every language/OS combination has its own character set.
Should one use NVARCHAR for all
character (non-binary) storage
purposes?
It is not important whether the declared type of your column is VARCHAR or NVARCHAR. But it is important to use Unicode (whether UTF-8, UTF-16, or UTF-32) for all character storage purposes.
Are there currently-popular DBMSs in
which it will do something undesirable
Yes: In MS SQL Server, using NCHAR makes your (English) data take up twice as much space. Unfortunately, UTF-8 isn't supported yet.
EDIT: SQL Server 2019 finally introduced UTF-8 support.
In Oracle, the database character set can be a multi-byte character set, so you can store all manner of characters in there....but you need to understand and define the length of the columns appropriately (in either BYTES or CHARACTERS).
NVARCHAR gives you the option to have a database character set that is a single-byte (which reduces the potential for confusion between BYTE or CHARACTER sized columns) and use NVARCHAR as the multi-byte. See here.
Since I predominantly work with English data, I'd go with a multi-byte character set (UTF-8 mostly) as the database character set and ignore NVARCHAR. If I inherited an old database which was in a single-byte characterset and was too big to convert, I may use NVARCHAR. But I'd prefer not to.

what could be the datatype for the column having weblinks

One curious question. if i have a table with column with weblinks then what could be the datatype nvarchar or varchar. and what could be the size of that datatype?
In general, use nvarchar.
What are the main performance differences between varchar and nvarchar SQL Server data types?
RFC2616 says there's no maximum length of a URL, but 2000 is probably safe.
What is the maximum length of a URL in different browsers?
You should use nvarchar since chinese national characters were allowed in URL names and varchar can't handle those. Maximum URL size is 2083 characters (at least in IE), but you don't see those quite often. If you want to be completely sure that you can handle all URLs you shuold use nvarchar(2083).
I'd say varchar(1000) would be enough (unless you're going to store some Amazon URLs, of course) :). You don't need nvarchar because national URLs are experimental and are eventually converted to Latin with special characters.
Typically Web servers set fairly generous limits on length for genuine URLs e.g. up to 2048 or 4096 characters.
So, if you want to be safe and still don't want to use varchar(max), you can use varchar(2048) and varchar(4096), respectively.
For data with embedded URLs, you can use either varchar or nvarchar. The only difference between nvarchar and varchar is nvarchar is a varchar that natively supports unicode data. Also, the storage space is larger: varchar is 8-bit, while unicode is 16-bit, so double the space.
A future-proof solution would be nvarchar, since recent movements toward full unicode domain names are noticable, e.g. Russia Begins Registering Domains in Cyrillic.
URLs are subject to RFC1738:
URLs are written only with the graphic
printable characters of the
US-ASCII coded character set. The
octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F
and 7F hexadecimal represent
control characters; these must be
encoded
This would place all 'weblinks' safely in the VARCHAR camp. With SQL Server 2008 R2 though you need not to worry anymore, since Unicode Compression is available (on Enterprise and DataCenter Editions).

Best data type for storing strings in SQL Server?

What's the best data type to be used when storing strings, like a first name? I've seen varchar and nvarchar both used. Which one is better? Does it matter?
I've also heard that the best length to use is 255, but I don't know why. Is there a specific length that is preferred for strings?
nvarchar stores unicode character data which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space, 16-bits per character for nvarchar and 8-bits per character for varchar.
What's the best data type to be used
when storing strings, like a first
name? I've seen varchar and nvarchar
both used. Which one is better? Does
it matter?
See What is the difference between nchar(10) and varchar(10) in MSSQL?
If you need non-ASCII characters, you have to use nchar/nvarchar. If you don't, then you may want to use char/varchar to save space.
Note that this issue is specific to MS SQL Server, which doesn't have good support for UTF-8. In other SQL implementations that do, you can use Unicode strings with no extra space requirements (for English).
EDIT: Since this answer was originally written, SQL Server 2019 (15.x) finally introduced UTF-8 support. You may want to consider using it as your default database text encoding.
I've also heard that the best length
to use is 255, but I don't know why.
See Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?
Is there a specific length that is
preferred for strings?
If you data has a well-defined maximum limit (e.g., 17 characters for a VIN), then use that.
OTOH, if the limit is arbitrary, then choose a generous maximum size to avoid rejecting valid data. In SQL Server, you may want to consider the 900-byte maximum size of index keys.
nvarchar means you can save unicode character inside it. there is 2GB limit for nvarchar type. if the field length is more than 4000 characters, an overflow page is used. smaller fields means one page can hold more rows which increase the query performance.
Generally, for small strings use nvarchar(n), which supports Unicode characters. The string is compressed when used with row or page compression (at least one of which is generally desirable).
Large strings need nvarchar(max), which Unicode compression does not support.
For special-case scenarios when your data set never uses Unicode characters, varchar(n) and varchar(max) restrict the string type of one byte per character.
If you know the max length (n) is less than 256, SQL Server only needs to use 1 byte to store the string length. This reduces storage space by about half a percent compared a string type whose max length is just over 255.

SQL When to use Which Data Type

Hi I was wondering when I should use the different data types. As in in my table, how can I decide which to use: nvarchar, nchar, varchar, varbinary, etc.
Examples:
What would I use for a ... column:
Phone number,
Address,
First Name, Last Name,
Email,
ID number,
etc.
Thanks for any help!
As a general rule, I would not define anything as a "number" field if I wasn't going to be doing arithmetic on it, even if the data itself was numeric.
Your "phone" field is one example. I'd define that as a varchar.
Varchar, Integer, and Bit cover 99% of my day to day uses.
The question really depends on your requirements. I know that's not a particularly satisfactory answer, but it's true.
The n..char data types are for Unicode data, so if you're going to need to use unicode character sets in your data you should use those types as opposed to their "non-n" analogs. the nchar and char type are fixed length, and the nvarchar and varchar type can have a variable length, which will effect the size of the column on the disk and in memory. Generally I would say to use the type that uses the least disk space but fits for your needs.
This page has links to the Microsoft descriptions of these datatypes for SQL Server 2005, many of which give pointers for when to use which type. You might be particularly interested in this page regarding char and varchar types.
A data type beginning with n means it can be used for unicode characters... eg nVarchar.
Selection of integers is also quite fun.
http://www.databasejournal.com/features/mssql/article.phpr/2212141/Choosing-SQL-Server-2000-Data-Types.htm
The most common data type i use is varchar....
The N* data types (NVARCHAR, NCHAR, NTEXT) are for Unicode strings. They take up two times the space their "normal" pendants (VARCHAR, CHAR, TEXT) need, but they can store Unicode without conversion and possible loss of fidelity.
The TEXT data types can store nearly unlimited amounts of data, but they perform not as good as the CHAR data types because they are stored outside of the record.
THE VARCHAR data types are of variable length. They will not be padded with spaces at the end, but their CHAR pendants will (a CHAR(20) is always twenty characters long, even if if contains 5 letters only. The remaining 15 will be spaces).
The binary data types are for binary data, whatever you care to store into them (images are a primary example).
Other people have given good general answers, but I'd add one important point: when using VARCHAR()s (which I would recommend for those kinds of fields), be sure to use a length that's big enough for any reasonable value. For example, I typically declare VARCHAR(100) for a name, e-mail address, domain name, city name, etc., and VARCHAR(200) for an URL or street address.
This is more than you'll routinely need. In fact, 30 characters is enough for almost all of these values (except full name, but a good database should always store first and last name separately), but it's better than having to change data types some day down the road. There's very little cost in specifying a higher-than-necessary length for a VARCHAR, but note that VARCHAR(MAX) and TEXT do entail significant overhead, so use them only when necessary.
Here's a post which points out a case where a longer-than-necessary VARCHAR can hurt performance: Importance of varchar length in MySQL table. Goes to show that everything has a cost, though in general I'd still favor long VARCHARs.