I've read all about varchar versus nvarchar. But I didn't see an answer to what I think is a simple question. How do you determine the length of your nvarchar column? For varchar it's very simple: my Description, for example, can have 100 characters, so I define varchar(100). Now I'm told we need to internationalize and support any language. Does this mean I need to change my Description column to nvarchar(200), i.e. simply double the length? (And I'm ignoring all the other issues that are involved with internationalization for the moment.)
Is it that simple?
Generally it is the same as for varchar really. The number is still the maximum number of characters not the data length.
nvarchar(100) allows 100 characters (which would potentially consume 200 bytes in SQL Server).
You might want to allow for the fact that different cultures may take more characters to express the same thing though.
An exception to this, however, is if you are using an SC collation (which supports supplementary characters). In that case a single character can take up to 4 bytes.
So in the worst case you would declare double the number of characters you actually need.
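A quick way to confirm the character-versus-byte distinction is to compare LEN with DATALENGTH. This is only a minimal sketch; the variable @Description is a made-up stand-in for the column in the question.

```sql
-- @Description is a hypothetical variable standing in for the column.
DECLARE @Description nvarchar(100) = REPLICATE(N'a', 100);

SELECT LEN(@Description)        AS char_count,  -- 100
       DATALENGTH(@Description) AS byte_count;  -- 200
```

LEN counts characters, while DATALENGTH reports the bytes used, which is why the second number is double the first for these characters.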
From the Microsoft website:
A common misconception is to think that NCHAR(n) and NVARCHAR(n), the n defines the number of characters. But in NCHAR(n) and NVARCHAR(n) the n defines the string length in byte-pairs (0-4,000). n never defines numbers of characters that can be stored. This is similar to the definition of CHAR(n) and VARCHAR(n).
The misconception happens because when using characters defined in the Unicode range 0-65,535, one character can be stored per each byte-pair. However, in higher Unicode ranges (65,536-1,114,111) one character may use two byte-pairs. For example, in a column defined as NCHAR(10), the Database Engine can store 10 characters that use one byte-pair (Unicode range 0-65,535), but less than 10 characters when using two byte-pairs (Unicode range 65,536-1,114,111). For more information about Unicode storage and character ranges, see
https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-ver15
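The byte-pair behaviour described in that quote can be checked with a single supplementary character. This sketch assumes SQL Server 2012 or later (where the _SC collations exist) and a client that sends the literal as Unicode.

```sql
-- One supplementary character is stored as two byte-pairs (4 bytes).
SELECT DATALENGTH(N'𠜎')                              AS bytes_used,      -- 4
       LEN(N'𠜎' COLLATE Latin1_General_100_CI_AS)    AS len_without_sc,  -- 2
       LEN(N'𠜎' COLLATE Latin1_General_100_CI_AS_SC) AS len_with_sc;     -- 1
```

Only the _SC collation counts the surrogate pair as one character; the storage is 4 bytes either way.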
@Musa Calgar - exactly right. That link has the information for the answer to this question.
But to make sure the question itself is clear: we are talking about the 'length' attribute shown in the column definition for a given table, right? That is the maximum storage allocated per column, in bytes. On the other hand, if you want to know the number of characters in a given string in the table at a given moment, you can run:
SELECT myColumn, LEN(myColumn) FROM myTable
But if the storage length is what you want, you can drag the table name into the query window in SSMS, highlight it, and press Alt-F1 to see the defined length of each column.
So as an example, I created a table like this, specifying collations. (Latin1_General_100_CI_AS_SC supports supplementary characters, that is, characters that take more than 2 bytes to store):
CREATE TABLE [dbo].[TestTable1](
[col1] [varchar](10) COLLATE Latin1_General_100_CI_AS,
[col2] [nvarchar](10) COLLATE Latin1_General_100_CI_AS_SC,
[col3] [nvarchar](10) COLLATE Latin1_General_100_CI_AS
) ON [PRIMARY]
The lengths show up like this (Highlight in query window and Alt-F1):
Column_Name  Type      Length  [...]  Collation
col1         varchar   10             Latin1_General_100_CI_AS
col2         nvarchar  20             Latin1_General_100_CI_AS_SC
col3         nvarchar  20             Latin1_General_100_CI_AS
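By default, Alt-F1 in SSMS runs sp_help on the highlighted name, so the same Length values can also be read with a query. The following is a sketch against the catalog views, assuming the TestTable1 definition above.

```sql
-- Same information that Alt-F1 shows in SSMS.
EXEC sp_help N'dbo.TestTable1';

-- max_length is reported in bytes, so nvarchar(10) shows 20.
SELECT c.name       AS column_name,
       t.name       AS type_name,
       c.max_length AS length_in_bytes,
       c.collation_name
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.TestTable1');
```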
If you insert ASCII characters into the varchar and nvarchar fields, it will allow you to put 10 characters into all of them. There will be an error if you try to put more than 10 characters into those fields:
"String or binary data would be truncated.
The statement has been terminated."
If you insert non-ASCII characters like 'ā', you can still put 10 of them into each column, but SQL Server will convert the values going into col1 to the closest known character that fits into 1 byte. In this case, 'ā' will be converted to 'a'.
However, if you insert characters that require 4 bytes to store, like for example, '𠜎', you will only be allowed to put FIVE of them into the varchar and nvarchar fields. Any more than that will result in the truncation error shown above. The varchar field will show question marks because it has no single-byte character that it can convert that input to.
So when you insert five of these '𠜎', do a select of that row using len(<colname>) and you will see this:
col1        len(col1)  col2   len(col2)  col3   len(col3)
??????????  10         𠜎𠜎𠜎𠜎𠜎  5          𠜎𠜎𠜎𠜎𠜎  10
So the length of col2 shows 5 characters since supplemental characters were defined when the table was created (see above CREATE TABLE DDL statement). However, col3 did not have _SC for its collation, so it is showing length 10 for the five characters we inserted.
Note that col1 has ten question marks. If we had defined the col1 varchar using the _SC collation instead of the non-supplemental one, it would behave the same way.
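For anyone who wants to repeat the experiment, the insert and select described above might look roughly like this. It is a sketch that assumes the TestTable1 definition shown earlier and a client that sends the literals as Unicode.

```sql
-- Five supplementary characters = 10 byte-pairs, which exactly fills nvarchar(10).
INSERT INTO dbo.TestTable1 (col1, col2, col3)
VALUES (N'𠜎𠜎𠜎𠜎𠜎', N'𠜎𠜎𠜎𠜎𠜎', N'𠜎𠜎𠜎𠜎𠜎');

-- Compare the character counts reported for each collation.
SELECT col1, LEN(col1) AS len_col1,
       col2, LEN(col2) AS len_col2,
       col3, LEN(col3) AS len_col3
FROM dbo.TestTable1;
```

Adding a sixth character to any of the literals should reproduce the truncation error quoted above.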
I have seen prefix N in some insert T-SQL queries. Many people have used N before inserting the value in a table.
I searched, but I was not able to understand what is the purpose of including the N before inserting any strings into the table.
INSERT INTO Personnel.Employees
VALUES(N'29730', N'Philippe', N'Horsford', 20.05, 1),
What purpose does this 'N' prefix serve, and when should it be used?
It's declaring the string as the nvarchar data type, rather than varchar.
You may have seen Transact-SQL code that passes strings around using
an N prefix. This denotes that the subsequent string is in Unicode
(the N actually stands for National language character set). Which
means that you are passing an NCHAR, NVARCHAR or NTEXT value, as
opposed to CHAR, VARCHAR or TEXT.
To quote from Microsoft:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
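One way to see what the prefix changes is to ask SQL Server for the base type of each literal. This is just an illustrative check using SQL_VARIANT_PROPERTY, not something taken from the quoted documentation.

```sql
-- The same text, with and without the N prefix.
SELECT SQL_VARIANT_PROPERTY('Philippe',  'BaseType') AS without_n,  -- varchar
       SQL_VARIANT_PROPERTY(N'Philippe', 'BaseType') AS with_n;     -- nvarchar
```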
If you want to know the difference between these two data types, see this SO post:
What is the difference between varchar and nvarchar?
Let me tell you about an annoying problem with the N prefix that took me two days to fix.
My database collation is SQL_Latin1_General_CP1_CI_AS.
It has a table with a column called MyCol1, which is nvarchar.
This query fails to match an exact value that exists in the table:
SELECT TOP 1 * FROM myTable1 WHERE MyCol1 = 'ESKİ'
-- 0 results
Using the N prefix fixes it:
SELECT TOP 1 * FROM myTable1 WHERE MyCol1 = N'ESKİ'
-- 1 result, found!
Why? I suppose it is because the Latin1 code page (1252) used by this collation has no dotted capital İ, so without the N prefix the literal is converted to a different character before the comparison.
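The effect can be tried without the table at all. This is a rough sketch; the first result depends on the default code page of the database it runs in, so on a database that is not using a Latin1 code page it may well report a match.

```sql
-- Without N, the literal is first converted to the database code page,
-- so the two sides may no longer hold the same characters.
SELECT CASE WHEN N'ESKİ' = 'ESKİ'  THEN 'match' ELSE 'no match' END AS without_n,
       CASE WHEN N'ESKİ' = N'ESKİ' THEN 'match' ELSE 'no match' END AS with_n;
```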
1. Performance:
Assume your where clause is like this:
WHERE NAME='JON'
If the NAME column is of any type other than nvarchar or nchar, then you should not specify the N prefix. However, if the NAME column is of type nvarchar or nchar and you do not specify the N prefix, 'JON' is treated as non-Unicode. The data types of the NAME column and the string 'JON' then differ, so SQL Server implicitly converts one operand's type to the other. If SQL Server converts the literal's type to the column's type there is no issue, but if it converts the other way, performance will suffer because the column's index (if available) may not be used. Because nvarchar has higher data type precedence than varchar, it is always the varchar side that gets converted: harmless for a varchar literal against an nvarchar column, but costly for an N'...' literal against a varchar column.
2. Character set:
If the column is of type nvarchar or nchar, always use the N prefix when specifying a character string in a WHERE clause or an UPDATE or INSERT statement. If you do not, and the string contains a character outside the database's default code page (an international character such as ā, for example), the comparison can fail to match or the stored data can be silently altered.
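The performance point in item 1 can be explored with a small test table. Everything here (dbo.People, IX_People_Name) is a hypothetical name chosen for illustration, and whether the index is still used in the second query depends on the column's collation and the SQL Server version, so treat it as something to inspect in the execution plan rather than a guaranteed outcome.

```sql
-- dbo.People and IX_People_Name are hypothetical names used only for illustration.
CREATE TABLE dbo.People (
    Id   int IDENTITY(1,1) PRIMARY KEY,
    Name varchar(50) NOT NULL
);
CREATE INDEX IX_People_Name ON dbo.People (Name);

-- Literal type matches the varchar column: no conversion on the column side.
SELECT Id FROM dbo.People WHERE Name = 'JON';

-- nvarchar literal against a varchar column: nvarchar has higher precedence,
-- so the column is implicitly converted (look for CONVERT_IMPLICIT in the plan).
SELECT Id FROM dbo.People WHERE Name = N'JON';
```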
The N'' prefix is used only because the value is assumed to be of type nvarchar.
I have a table with an integer column.
CREATE TABLE [dbo].[tble1](
[id] [int] NOT NULL,
[test] [nchar](10) NULL
)
When I try to insert some values and pass an empty string to the id column like below, it gets inserted and the value of the id column is 0 by default.
INSERT INTO [dbo].[tble1]
([id],[test])
VALUES
('','a')
I couldn't find any satisfying reasoning behind it. Could someone please share their thoughts on this?
What is happening is that '' is being converted to an integer. The rules are that a string can be converted, based on the digit characters in the string.
If a string is empty, it gets converted to 0.
So, the conversion is happening at the very "top" level. The types don't match so SQL Server attempts an implicit conversion.
Unfortunately, the documentation is not really clear on the topic:
Character expressions that are being converted to an exact numeric
data type must consist of digits, a decimal point, and an optional
plus (+) or minus (-). Leading blanks are ignored. Comma separators,
such as the thousands separator in 123,456.00, are not allowed in the
string.
To be honest, I would interpret the "must consist of digits" as saying that there must be at least one digit (although technically in English "zero" is treated as a plural, I don't necessarily think of plurals as including zero elements). However, the empty string has been used -- pretty much for forever -- as a valid value for any type across a broad range of databases.
It will try to convert '' to an integer, and the conversion succeeds.
SELECT CONVERT(INT, '')
Output
0
The 0 comes from that conversion, not from the NOT NULL definition of the column.
Even if ID allowed NULL, an empty string would still be stored as 0; to store NULL you would have to pass NULL explicitly.
Also, if you want the value populated automatically, define the column as an identity column.
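To check how nullability interacts with the conversion, here is a minimal sketch using a table variable. The name @t is made up for the example, and its id column is nullable, unlike dbo.tble1.

```sql
-- Hypothetical table variable; id is nullable here, unlike dbo.tble1.
DECLARE @t TABLE (id int NULL, test nchar(10) NULL);

INSERT INTO @t (id, test) VALUES ('', N'a');              -- '' is implicitly converted to 0
INSERT INTO @t (id, test) VALUES (NULLIF('', ''), N'b');  -- NULLIF turns '' into NULL first

SELECT id, test FROM @t;  -- row 'a' has id = 0, row 'b' has id = NULL
```

Wrapping the incoming value in NULLIF is one option when blank input should mean "no value" rather than zero.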