Oracle Insert Into NVarchar2(4000) does not allow 4000 characters? - sql

I have a table with a field datatype of NVarchar2(4000) I am moving data from a SQL Server to an Oracle Server. The SQL Server datatype is also nvarchar(4000). I have checked the MAX Size of this field on the SQL Server side, and the MAX is 3996, which is 4 characters short of the 4000 limit.
When I try to insert this data into Oracle, I get a size-related error (the message mentions "LONG").
What is going on here, will the Oracle NVarchar2(4000) not allow 4000 characters? If not, what is the limit, or how can I get around this?

The limit is 4000 bytes, not 4000 characters. With the typical AL16UTF16 national character set, every character occupies 2 bytes, so an NVARCHAR2(4000) column can hold at most 2000 characters before hitting the 4000-byte maximum.
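A quick sketch of the limit in action (this assumes a pre-12.1 database or MAX_STRING_SIZE = STANDARD, and the default AL16UTF16 national character set; the table and column names are made up):

```sql
CREATE TABLE nv_test (val NVARCHAR2(4000));

-- 2000 characters x 2 bytes each = 4000 bytes: fits
INSERT INTO nv_test VALUES (RPAD(N'x', 2000, N'x'));

-- 2001 characters would need 4002 bytes: fails with an error like
-- ORA-12899: value too large for column
INSERT INTO nv_test VALUES (RPAD(N'x', 2001, N'x'));
```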
From the Oracle docs on MAX_STRING_SIZE:
Tables with virtual columns will be updated with new data type metadata for virtual columns of VARCHAR2(4000), 4000-byte NVARCHAR2, or RAW(2000) type.
Solution:
If you need to store 4,000 characters, consider using a CLOB instead.
A CLOB (Character Large Object) is an Oracle data type that can hold up to 4 GB of data. CLOBs are handy for storing long text.
You can convert the column to a CLOB like this:
ALTER TABLE table_name
ADD (tmpcolumn CLOB);
UPDATE table_name SET tmpcolumn =currentnvarcharcolumn;
COMMIT;
ALTER TABLE table_name DROP COLUMN currentnvarcharcolumn;
ALTER TABLE table_name
RENAME COLUMN tmpcolumn TO whatevernameyouwant;

First, as others have pointed out, unless you're using 12.1, both varchar2 and nvarchar2 data types are limited in SQL to 4000 bytes. In PL/SQL, they're limited to 32767. In 12.1, you can increase the SQL limit to 32767 using the MAX_STRING_SIZE parameter.
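For reference, enabling the extended limit in 12.1+ looks roughly like this. This is only a sketch: the change is one-way and requires restarting the database in UPGRADE mode and running the utl32k.sql script, so check the Oracle documentation before attempting it.

```sql
ALTER SYSTEM SET max_string_size = EXTENDED SCOPE = SPFILE;
-- ...restart the database in UPGRADE mode,
-- run @?/rdbms/admin/utl32k.sql, then restart normally.

-- Afterwards, columns beyond 4000 bytes are allowed:
CREATE TABLE big_strings (txt VARCHAR2(32767));
```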
Second, unless you are working with a legacy database that uses a non-Unicode character set that cannot be upgraded to use a Unicode character set, you would want to avoid nvarchar2 and nchar data types in Oracle. In SQL Server, you use nvarchar when you want to store Unicode data. In Oracle, the preference is to use varchar2 in a database whose character set supports Unicode (generally AL32UTF8) when you want to store Unicode data.
If you store Unicode data in an Oracle NVARCHAR2 column, the national character set will be used; this is almost certainly AL16UTF16, which means that every character requires at least 2 bytes of storage. An NVARCHAR2(4000), therefore, probably can't store more than 2000 characters. If you use a VARCHAR2 column, on the other hand, you can use a variable-width Unicode character set (AL32UTF8), in which case English characters generally require just 1 byte, most European characters require 2 bytes, and most Asian characters require 3 bytes (this is, of course, just a generalization). That is generally going to allow you to store substantially more data in a VARCHAR2 column.
If you do need to store more than 4000 bytes of data and you're using Oracle 11.2 or later, you'd have to use a LOB data type (CLOB or NCLOB).

Per the documentation, although the width refers to the number of characters, there is still a 4,000-byte limit:
Width specifications of character data type NVARCHAR2 refer to the number of characters. The maximum column size allowed is 4000 bytes.
You probably have 4 multi-byte characters.
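If the data lands in a VARCHAR2 column in a variable-width character set such as AL32UTF8, you can spot the multi-byte rows by comparing character length with byte length (the table and column names below are placeholders):

```sql
-- Rows where byte length exceeds character length
-- contain at least one multi-byte character:
SELECT your_column,
       LENGTH(your_column)  AS chars,
       LENGTHB(your_column) AS bytes
FROM   your_table
WHERE  LENGTHB(your_column) > LENGTH(your_column);
```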

Related

Get the encoded value of a character with current code page

I created a database in SQL Server with COLLATION KR949_BIN2, which means that the codepage of this database is 949.
Is it possible to get the encoded value of a character based on the codepage in this database?
For example, the encoded value of character '좩' in codepage 949 is 0xA144, is there a SQL statement that I can get 0xA144 from char '좩' in this database?
Also, is there a way to insert '좩' into a column by its encoded value 0xA144?
Based on "Character data is represented incorrectly when the code page of the client computer differs from the code page of the database in SQL Server 2005", I suspect that you're actually using the Korean_Wansung_CI_AS collation or something similar:
Method 2: Use an appropriate collation for the database
If you must use a non-Unicode data type, always make sure that the code page of the database and the code page of any non-Unicode columns can store the non-Unicode data correctly. For example, if you want to store code page 949 (Korean) character data, use a Korean collation for the database. For example, use the Korean_Wansung_CI_AS collation for the database.
That being the case, yes, you can see and insert 0xA144 as per the following example:
create table #Wansung (
[Description] varchar(50),
Codepoint varchar(50) collate Korean_Wansung_CI_AS
);
insert #Wansung ([Description], Codepoint)
select 'U+C8A9 Hangul Syllable Jwaeg', N'좩';
insert #Wansung ([Description], Codepoint)
select 'From Windows-949 encoding', 0xA144;
select [Description], Codepoint, cast(Codepoint as varbinary(max)) as Bytes, cast(unicode(Codepoint) as varbinary(max)) as UTF32
from #Wansung;
Which returns the results:
Description                    Codepoint  Bytes   UTF32
U+C8A9 Hangul Syllable Jwaeg   좩         0xA144  0x0000C8A9
From Windows-949 encoding      좩         0xA144  0x0000C8A9

What is the 10 in sqlite3 datatype varchar(10)?

If I create a table in sqlite3 like:
create table sodapops (name text, description varchar(10));
What does the number 10 in varchar do? How would it be any different than 255 or otherwise?
In any other database, it would specify the maximum length of the field.
However, SQLite ignores it. All strings are simply of text type and can be of any length.
As explained in the documentation:
Note that numeric arguments in parentheses that follow the type name (ex: "VARCHAR(255)") are ignored by SQLite - SQLite does not impose any length restrictions (other than the large global SQLITE_MAX_LENGTH limit) on the length of strings, BLOBs or numeric values.
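A quick demonstration, using the table from the question (this is SQLite's own SQL; the declared VARCHAR(10) is silently treated as TEXT):

```sql
CREATE TABLE sodapops (name TEXT, description VARCHAR(10));

-- 26 characters go into a "VARCHAR(10)" column without complaint:
INSERT INTO sodapops VALUES ('cola', 'abcdefghijklmnopqrstuvwxyz');

SELECT length(description) FROM sodapops;  -- 26
```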

Determining Nvarchar length

I've read all about varchar versus nvarchar. But I didn't see an answer to what I think is a simple question. How do you determine the length of your nvarchar column? For varchar it's very simple: my Description, for example, can have 100 characters, so I define varchar(100). Now I'm told we need to internationalize and support any language. Does this mean I need to change my Description column to nvarchar(200), i.e. simply double the length? (And I'm ignoring all the other issues that are involved with internationalization for the moment.)
Is it that simple?
Generally it is the same as for varchar. The number is still the maximum number of characters, not the data length.
nvarchar(100) allows 100 characters (which would potentially consume 200 bytes in SQL Server).
You might want to allow for the fact that different cultures may take more characters to express the same thing, though.
An exception, however, is if you are using an SC collation (which supports supplementary characters); in that case a single character can take up to 4 bytes.
So the worst case would be to double the declared character length.
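A small illustration of the character/byte distinction in SQL Server: LEN counts characters while DATALENGTH counts bytes, and each BMP character in an nvarchar takes 2 bytes.

```sql
DECLARE @s nvarchar(100) = N'héllo';

-- 5 characters, 10 bytes of storage
SELECT LEN(@s) AS chars, DATALENGTH(@s) AS bytes;
```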
From the Microsoft documentation:
A common misconception is to think that NCHAR(n) and NVARCHAR(n), the n defines the number of characters. But in NCHAR(n) and NVARCHAR(n) the n defines the string length in byte-pairs (0-4,000). n never defines numbers of characters that can be stored. This is similar to the definition of CHAR(n) and VARCHAR(n).
The misconception happens because when using characters defined in the Unicode range 0-65,535, one character can be stored per each byte-pair. However, in higher Unicode ranges (65,536-1,114,111) one character may use two byte-pairs. For example, in a column defined as NCHAR(10), the Database Engine can store 10 characters that use one byte-pair (Unicode range 0-65,535), but less than 10 characters when using two byte-pairs (Unicode range 65,536-1,114,111). For more information about Unicode storage and character ranges, see
https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-ver15
@Musa Calgar - exactly right. That link has the information for the answer to this question.
But to make sure the question itself is clear, we are talking about the 'length' attribute we see when we look at the column definition for a given table, right? That is the storage allocated per column. On the other hand, if we want to know the number of characters for a given string in the table at a given moment you can:
"SELECT myColumn, LEN(myColumn) FROM myTable"
But if the storage length is desired, you can drag the table name into the query window using SSMS, highlight it, and use 'Alt-F1' to see the defined lengths of each column.
So as an example, I created a table like this, specifying collations (Latin1_General_100_CI_AS_SC allows supplementary characters, that is, characters that take more than 2 bytes):
CREATE TABLE [dbo].[TestTable1](
[col1] [varchar](10) COLLATE Latin1_General_100_CI_AS,
[col2] [nvarchar](10) COLLATE Latin1_General_100_CI_AS_SC,
[col3] [nvarchar](10) COLLATE Latin1_General_100_CI_AS
) ON [PRIMARY]
The lengths show up like this (Highlight in query window and Alt-F1):
Column_Name Type Length [...] Collation
col1 varchar 10 Latin1_General_100_CI_AS
col2 nvarchar 20 Latin1_General_100_CI_AS_SC
col3 nvarchar 20 Latin1_General_100_CI_AS
If you insert ASCII characters into the varchar and nvarchar fields, it will allow you to put 10 characters into all of them. There will be an error if you try to put more than 10 characters into those fields:
"String or binary data would be truncated.
The statement has been terminated."
If you insert non-ASCII characters like 'ā' you can still put 10 of them into each one, but SQL Server will convert the values going into col1 to the closest character that fits into a single byte. In this case, 'ā' will be converted to 'a'.
However, if you insert characters that require 4 bytes to store, like for example, '𠜎', you will only be allowed to put FIVE of them into the varchar and nvarchar fields. Any more than that will result in the truncation error shown above. The varchar field will show question marks because it has no single-byte character that it can convert that input to.
So when you insert five of these '𠜎', do a select of that row using len(<colname>) and you will see this:
col1 len(col1) col2 len(col2) col3 len(col3)
?????????? 10 𠜎𠜎𠜎𠜎𠜎 5 𠜎𠜎𠜎𠜎𠜎 10
So the length of col2 shows 5 characters since supplemental characters were defined when the table was created (see above CREATE TABLE DDL statement). However, col3 did not have _SC for its collation, so it is showing length 10 for the five characters we inserted.
Note that col1 has ten question marks. If we had defined the col1 varchar using the _SC collation instead of the non-supplemental one, it would behave the same way.

Impact on changing the data type from char to varchar2

Can anyone tell me whether there will be any impact from changing the data type of a column from CHAR to VARCHAR2?
The issue I am facing is that when I run a select query, i.e.
select * from table_name where column_name in ('X','Y','Z');
the query returns only a few rows. The column's data type was recently changed from CHAR to VARCHAR2, and the rows returned are only those inserted after the data type was changed.
A VARCHAR2, when stored in a database table, uses only the space allocated to it. If you have a VARCHAR2(100) and put 50 bytes in the table, it will use 52 bytes (leading length byte).
A CHAR, when stored in a database table, always uses the maximum length and is blank-padded. If you have a CHAR(100) and put 50 bytes into it, it will consume 102 bytes.
So in your case, the values stored while the column was still CHAR are most likely blank-padded to the full column width, and after the conversion to VARCHAR2 those trailing blanks no longer match the unpadded literals in your IN list. Only the rows inserted after the change match, which is why only a few rows are returned.
Referenced from: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1542606219593
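If blank padding is indeed the cause, you can verify and clean it up roughly like this (a sketch, using the table and column names from the question):

```sql
-- Count rows still carrying trailing blanks from the old CHAR definition:
SELECT COUNT(*) FROM table_name
WHERE  column_name <> RTRIM(column_name);

-- Workaround in queries:
SELECT * FROM table_name
WHERE  RTRIM(column_name) IN ('X', 'Y', 'Z');

-- Or a one-time cleanup (back up and test first):
UPDATE table_name SET column_name = RTRIM(column_name)
WHERE  column_name <> RTRIM(column_name);
```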

Substring column values that exceed 4000 characters

I have a column (URL) with URL values with a varchar2(4000) data type. Some of the values in there exceed 4000 characters, so is there a way for me to substring only those values in the column that exceed 4000 characters replacing the original value with the substring within that same column?
You can find the rows that exceed 4,000 characters like this:
select userid, length(description)
from Users
where length(description) > 4000;
If you are loading the data with SQL*Loader, you can apply the truncation in the control (CTL) file itself:
description "SUBSTR(:description, 1, 4000)",
If you don't want to lose data, use the CLOB datatype instead of VARCHAR2(4000).
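For context, that SUBSTR call sits inside the column list of a SQL*Loader control file. A minimal sketch (the input file name and the other fields are hypothetical; CHAR(32000) widens SQL*Loader's default 255-byte input buffer so long values survive to the SUBSTR):

```
LOAD DATA
INFILE 'urls.dat'
APPEND INTO TABLE users
FIELDS TERMINATED BY ','
(
  userid,
  description CHAR(32000) "SUBSTR(:description, 1, 4000)"
)
```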
VARCHAR2 supports up to 4,000 bytes in SQL, and every SQL function output, such as SUBSTR, is itself a VARCHAR2, so it cannot hold a string longer than that. You would have to use a CLOB; but a CLOB is a LOB object, so it will be slower.
Alternatively, you could split the value across two columns when creating the table and populate them from a PL/SQL block, since a VARCHAR2 can hold up to 32,767 bytes in PL/SQL.