What are NLS Strings in Oracle SQL?

What are NLS strings in Oracle SQL, which appear as the difference between the char and nchar data types, as well as between varchar2 and nvarchar2? Thank you.

Every Oracle database instance has 2 available character set configurations:
The default character set (used by char, varchar2, clob etc. types)
The national character set (used by nchar, nvarchar2, nclob, etc. types)
Because the default character set may be configured to one that doesn't support the full range of Unicode characters (such as Windows-1252), Oracle provides this alternate character set configuration as well, which is guaranteed to support Unicode.
So let's say your database uses Windows-1252 for its default character set (not that I'm recommending it), and UTF-8 for the national (or alternate) character set...
Then if you have a table column where you don't need to support all kinds of unusual Unicode characters, you can use a type such as varchar2 if you want to, and by doing so you may save some space.
But if you do have a specific need to store and support Unicode characters, then for that specific column you should use nvarchar2, or some other type that uses the national character set.
That said, if your database's default character set is already a character set that supports Unicode, then using the nchar, nvarchar2, etc. types is not really necessary.
You can find more complete information on the topic here.
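The space trade-off described above can be sketched outside Oracle. Here is a minimal Python illustration, using the cp1252 codec to stand in for a Windows-1252 default character set and UTF-16LE to stand in for the AL16UTF16 national character set:

```python
# A rough illustration (in Python, not Oracle) of the space trade-off:
# a single-byte character set stores Western text in one byte per character,
# while a UTF-16-style national character set uses two.
text = "resume"  # plain ASCII/Latin text

single_byte = text.encode("cp1252")    # stands in for a Windows-1252 varchar2 column
national = text.encode("utf-16-le")    # stands in for an AL16UTF16 nvarchar2 column

print(len(single_byte))  # 6 bytes
print(len(national))     # 12 bytes

# ...but the single-byte set simply cannot hold characters outside its repertoire:
try:
    "漢".encode("cp1252")
except UnicodeEncodeError:
    print("not representable in Windows-1252")
```

The same logic is what makes varchar2 in a single-byte database smaller than nvarchar2, at the cost of a limited character repertoire.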

AFAIK, NLS stands for National Language Support, which supports local languages (in other words, localization). From the Oracle documentation:
National Language Support (NLS) is a technology enabling Oracle
applications to interact with users in their native language, using
their conventions for displaying data

When you talk about "NLS" settings, they are not limited to the character set configuration of your database.
You also have parameters like NLS_DATE_FORMAT, NLS_CURRENCY, NLS_CALENDAR, NLS_LANGUAGE, etc.
Most of them can be set at the session level, i.e. individually for each user.

difference between character set and national character set oracle?

difference between character set and national character set oracle?
This is answered in Oracle's documentation: Choosing a Character Set
Character Set Encoding
When computer systems process characters, they use numeric codes instead of the graphical representation of the character. For example, when the database stores the letter A, it actually stores a numeric code that the computer system interprets as the letter. These numeric codes are especially important in a global environment because of the potential need to convert data between different character sets.
What is an Encoded Character Set?
You specify an encoded character set when you create a database.
Choosing a character set determines what languages can be represented
in the database. It also affects:
How you create the database schema
How you develop applications that process character data
How the database works with the operating system
Database performance
Storage required for storing character data
A group of characters (for example, alphabetic characters, ideographs, symbols, punctuation marks, and control characters) can be encoded as a character set. An encoded character set assigns a unique numeric code to each character in the character set. The numeric codes are called code points or encoded values. In the ASCII character set, for example, each character is assigned a hexadecimal code value.
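As an aside, Python's ord() exposes exactly these code points; for ASCII characters it returns the same numeric codes the Oracle documentation describes:

```python
# Each character maps to a numeric code point; for A-Z, a-z, and digits the
# Unicode code point equals the ASCII code value.
for ch in "Az0":
    print(ch, ord(ch), hex(ord(ch)))
# A 65 0x41
# z 122 0x7a
# 0 48 0x30
```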
Choosing a National Character Set
The term national character set refers to an alternative character set that enables you to store Unicode character data in a database that does not have a Unicode database character set. Another reason for choosing a national character set is that the properties of a different character encoding scheme may be more desirable for extensive character processing operations.
SQL NCHAR, NVARCHAR2, and NCLOB data types support Unicode data only. You can use either the UTF8 or the AL16UTF16 character set. The default is AL16UTF16.
Oracle recommends using SQL CHAR, VARCHAR2, and CLOB data types in AL32UTF8 database to store Unicode character data. Use of SQL NCHAR, NVARCHAR2, and NCLOB should be considered only if you must use a database whose database character set is not AL32UTF8.
In Oracle you have these two character sets mainly for historical reasons. In earlier times a typical setup was
Character Set: US7ASCII
National Character Set: WE8ISO8859P1
The character set was used for the generic part of your application; the national character set was then set according to each customer's local requirements, for customers in countries all over the world.
Nowadays, with Unicode (i.e. AL32UTF8), there is actually no reason to use the national character set any more. More and more native Oracle functions do not even support the national character set at all.
The only remaining reason could be heavy use of Asian characters, where AL16UTF16 is more efficient in terms of space.
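To put numbers on that space argument (a Python sketch, not Oracle itself): most CJK ideographs take three bytes in UTF-8 but only two in UTF-16, which is the encoding AL16UTF16 stores:

```python
# CJK ideographs in the Basic Multilingual Plane need 3 bytes in UTF-8
# but only 2 in UTF-16, so Asian-heavy data is smaller in AL16UTF16.
ideograph = "漢"  # U+6F22, a CJK ideograph
print(len(ideograph.encode("utf-8")))      # 3
print(len(ideograph.encode("utf-16-le")))  # 2
```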

Western European character set to Turkish in SQL

I am having a serious issue with character encoding. To give some background:
I have Turkish business users who enter some data on Unix screens in the Turkish language.
My database NLS parameter is set to AMERICAN, WE8ISO8859P1 and Unix NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1.
The Turkish business users are able to see all the Turkish characters on UNIX screens and in TOAD, while I'm not. I can only see them in the Western European character set.
At business end: ÖZER İNŞAAT TAAHHÜT VE
At our end : ÖZER ÝNÞAAT TAAHHÜT VE
If you notice, the Turkish characters İ and Ş are getting converted to the ISO-8859-1 character set. However, all the settings (NLS parameters in the DB and Unix) are the same at both ends: ISO-8859-1 (Western European).
With some study, I can understand that Turkish machines can display Turkish data by doing conversion in real time (DB NLS settings are overridden by local NLS settings).
Now, I have a interface running in my db- have some PL/SQL scripts(run through shell script) that extracts some data from database and spool them to a .csv file on a unix path. Then that .csv file is transferred to an external system via MFT(Managed File transfer).
The problem is that the extract never contains any Turkish characters. Every Turkish character is getting converted into the Western European character set and goes like this to the external system, which is treated as a case of data conversion/loss, and my business is really unhappy.
Could anyone tell me - How could I retain all the turkish characters?
P.S.: The external system's character set could be set to the ISO-8859-9 character set.
Many thanks in advance.
If you are saying that your database character set is ISO-8859-1, i.e.
SELECT parameter, value
FROM v$nls_parameters
WHERE parameter = 'NLS_CHARACTERSET'
returns a value of WE8ISO8859P1 and you are storing the data in CHAR, VARCHAR, or VARCHAR2 columns, the problem is that the database character set does not support the full set of Turkish characters. If a character is not in the ISO-8859-1 codepage layout, it cannot be stored properly in database columns governed by the database character set. If you want to store Turkish data in an ISO-8859-1 database, you could potentially use the workaround characters instead (i.e. substituting S for Ş). If you want to support the full range of Turkish characters, however, you would need to move to a character set that supported all those characters-- either ISO-8859-9 or UTF-8 would be relatively common.
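The corruption in the question can be reproduced in a few lines of Python, using the ISO-8859-9 and ISO-8859-1 codecs as stand-ins for the WE8ISO8859P9 and WE8ISO8859P1 database character sets:

```python
# Ş (U+015E) exists in ISO-8859-9 (Turkish) but not in ISO-8859-1, which is
# exactly why a WE8ISO8859P1 database cannot store it. Decoding the Turkish
# byte as Latin-1 yields Þ, the corruption shown in the question.
turkish_byte = "Ş".encode("iso-8859-9")   # b'\xde'
print(turkish_byte.decode("iso-8859-1"))  # Þ

try:
    "Ş".encode("iso-8859-1")
except UnicodeEncodeError:
    print("Ş is not representable in ISO-8859-1")
```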
Changing the character set of your existing database is a non-trivial undertaking, however. There is a chapter in the Globalization Support Guide for whatever version of Oracle you are using that covers character set migration. If you want to move to a Unicode character set (which is generally the preferred approach rather than sticking with one of the single-byte ISO character sets), you can potentially leverage the Oracle Database Migration Assistant for Unicode.
At this point, you'll commonly see the objection that at least some applications are seeing the data "correctly", so the database must support the Turkish characters. The problem is that if you set up your NLS_LANG incorrectly, it is possible to bypass character set conversion entirely, meaning that whatever binary representation a character has on the client gets persisted without modification to the database. As long as every process that reads the data configures its NLS_LANG identically and incorrectly, things may appear to work.
However, you will very quickly find some other application that won't be able to configure its NLS_LANG identically incorrectly. A Java application, for example, will always want to convert the data from the database into a Unicode string internally. So if you're storing the data incorrectly in the database, as it sounds like you are, there is no way to get those applications to read it correctly.
If you are simply using SQL*Plus in a shell script to generate the file, it is almost certainly possible to get your client configured incorrectly so that the data file appears to be correct. But it would be a very bad idea to let the existing misconfiguration persist. You open yourself up to much bigger problems in the future (if you're not already there): different clients inserting data in different character sets, making the data much more difficult to disentangle; tools like the Oracle export utility corrupting the data they export; or tools that can't be configured incorrectly being unable to view the data at all. You're much better served getting the problem corrected early.
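A small Python sketch of that "identically incorrect" scenario: the raw Turkish bytes pass through unconverted, so clients that repeat the same wrong assumption see correct text, while an honest client that trusts the declared Latin-1 character set sees exactly the mojibake from the question:

```python
# When NLS_LANG matches the database character set, Oracle passes bytes
# through unconverted. Simulate Turkish bytes persisted without conversion:
stored_bytes = "İNŞAAT".encode("iso-8859-9")

# Misconfigured client: reinterprets the bytes as Turkish again, looks fine.
print(stored_bytes.decode("iso-8859-9"))   # İNŞAAT

# Honest client: trusts the declared Latin-1 character set, sees mojibake.
print(stored_bytes.decode("iso-8859-1"))   # ÝNÞAAT
```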
Just setting your NLS_LANG parameter to AMERICAN_AMERICA.WE8ISO8859P9 is enough for the Turkish language.

What's the SQL national character (NCHAR) datatype really for?

As well as CHAR (CHARACTER) and VARCHAR (CHARACTER VARYING), SQL offers an NCHAR (NATIONAL CHARACTER) and NVARCHAR (NATIONAL CHARACTER VARYING) type. In some databases, this is the better datatype to use for character (non-binary) strings:
In SQL Server, NCHAR is stored as UTF-16LE and is the only way to reliably store non-ASCII characters, CHAR being a single-byte codepage only;
In Oracle, NVARCHAR may be stored as UTF-16 or UTF-8 rather than a single-byte collation;
But in MySQL, NVARCHAR is VARCHAR, so it makes no difference, either type can be stored with UTF-8 or any other collation.
So, what does NATIONAL actually conceptually mean, if anything? The vendors' docs only tell you about what character sets their own DBMSs use, rather than the actual rationale. Meanwhile the SQL92 standard explains the feature even less helpfully, stating only that NATIONAL CHARACTER is stored in an implementation-defined character set. As opposed to a mere CHARACTER, which is stored in an implementation-defined character set. Which might be a different implementation-defined character set. Or not.
Thanks, ANSI. Thansi.
Should one use NVARCHAR for all character (non-binary) storage purposes? Are there currently-popular DBMSs in which it will do something undesirable, or which just don't recognise the keyword (or N'' literals)?
"NATIONAL" in this case means characters specific to different nationalities. Far east languages especially have so many characters that one byte is not enough space to distinguish them all. So if you have an english(ascii)-only app or an english-only field, you can get away using the older CHAR and VARCHAR types, which only allow one byte per character.
That said, most of the time you should use NCHAR/NVARCHAR. Even if you don't think you need to support (or potentially support) multiple languages in your data, even english-only apps need to be able to sensibly handle security attacks using foreign-language characters.
In my opinion, about the only place where the older CHAR/VARCHAR types are still preferred is for frequently referenced ASCII-only internal codes and data on platforms like SQL Server that support the distinction: data that would be the equivalent of an enum in a client language like C++ or C#.
Meanwhile the SQL92 standard explains the feature even less helpfully, stating only that NATIONAL CHARACTER is stored in an implementation-defined character set. As opposed to a mere CHARACTER, which is stored in an implementation-defined character set. Which might be a different implementation-defined character set. Or not.
Coincidentally, this is the same "distinction" the C++ standard makes between char and wchar_t. A relic of the Dark Ages of Character Encoding, when every language/OS combination had its own character set.
Should one use NVARCHAR for all character (non-binary) storage purposes?
It is not important whether the declared type of your column is VARCHAR or NVARCHAR. But it is important to use Unicode (whether UTF-8, UTF-16, or UTF-32) for all character storage purposes.
Are there currently-popular DBMSs in which it will do something undesirable?
Yes: In MS SQL Server, using NCHAR makes your (English) data take up twice as much space. Unfortunately, UTF-8 isn't supported yet.
EDIT: SQL Server 2019 finally introduced UTF-8 support.
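The doubling is easy to demonstrate (a Python sketch; UTF-16LE is the encoding behind NCHAR/NVARCHAR for characters in the Basic Multilingual Plane):

```python
# NCHAR/NVARCHAR store UTF-16 (2 bytes per BMP character), so plain English
# text takes twice the space of a single-byte or UTF-8 VARCHAR.
english = "hello world"
print(len(english.encode("utf-8")))      # 11 bytes (what a UTF-8 VARCHAR would use)
print(len(english.encode("utf-16-le")))  # 22 bytes (what NVARCHAR stores)
```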
In Oracle, the database character set can be a multi-byte character set, so you can store all manner of characters in there... but you need to understand and define the length of the columns appropriately (in either BYTES or CHARACTERS).
NVARCHAR gives you the option to have a database character set that is a single-byte (which reduces the potential for confusion between BYTE or CHARACTER sized columns) and use NVARCHAR as the multi-byte. See here.
Since I predominantly work with English data, I'd go with a multi-byte character set (UTF-8 mostly) as the database character set and ignore NVARCHAR. If I inherited an old database which was in a single-byte character set and was too big to convert, I might use NVARCHAR. But I'd prefer not to.

What could be the datatype for the column having weblinks

One curious question: if I have a table with a column for weblinks, what should the datatype be, nvarchar or varchar? And what size should that datatype have?
In general, use nvarchar.
What are the main performance differences between varchar and nvarchar SQL Server data types?
RFC2616 says there's no maximum length of a URL, but 2000 is probably safe.
What is the maximum length of a URL in different browsers?
You should use nvarchar, since Chinese national characters are allowed in URL names and varchar can't handle those. The maximum URL size is 2083 characters (at least in IE), but you don't see those very often. If you want to be completely sure that you can handle all URLs, you should use nvarchar(2083).
I'd say varchar(1000) would be enough (unless you're going to store some Amazon URLs, of course) :). You don't need nvarchar, because internationalized URLs are eventually converted to ASCII with escaped special characters.
Typically Web servers set fairly generous limits on length for genuine URLs e.g. up to 2048 or 4096 characters.
So, if you want to be safe and still don't want to use varchar(max), you can use varchar(2048) and varchar(4096), respectively.
For data with embedded URLs, you can use either varchar or nvarchar. The only difference is that nvarchar natively supports Unicode data, and its storage is larger: varchar uses 8 bits per character while nvarchar uses 16, so double the space.
A future-proof solution would be nvarchar, since recent movements toward full Unicode domain names are noticeable, e.g. Russia Begins Registering Domains in Cyrillic.
URLs are subject to RFC1738:
URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded
This would place all 'weblinks' safely in the VARCHAR camp. With SQL Server 2008 R2, though, you need not worry any more, since Unicode Compression is available (on Enterprise and Datacenter editions).
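Percent-encoding is what keeps even non-ASCII links within that rule; a short Python illustration:

```python
from urllib.parse import quote, unquote

# RFC 1738 restricts URLs to graphic US-ASCII characters; anything else must
# be percent-encoded, so an encoded URL always fits in an ASCII-safe VARCHAR.
encoded = quote("мир")          # Cyrillic, percent-encoded as UTF-8 bytes
print(encoded)                  # %D0%BC%D0%B8%D1%80
print(encoded.isascii())        # True
print(unquote(encoded))         # мир
```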

SQL Server Collation

The book I am reading says that
SQL Server supports two kinds of character data types—regular and Unicode. Regular data types include CHAR and VARCHAR, and Unicode data types include NCHAR and NVARCHAR. The difference is that regular characters use one byte of storage for each character, while Unicode characters require two bytes per character. With one byte of storage per character, a choice of a regular character type for a column restricts you to only one language in addition to English because only 256 (2^8) different characters can be represented by a single byte.
What I understand from this is that if I use varchar, then I can use only one language (for example Hindi, an Indian language) along with English.
But When I run this
Create Table NameTable
(
NameColumn varchar(MAX) COLLATE Indic_General_90_CI_AS_KS
)
It shows me error "Collation 'Indic_General_90_CI_AS_KS' is supported on Unicode data types only and cannot be applied to char, varchar or text data types."
So where have I misunderstood the author?
Thanks
You can find a list of collations here, along with the encoding type
Certain collations apply only to 1-byte encodings: 128 values are used for standard ASCII, leaving 128 available for other characters. Hindi has far more than 128 characters, so a 1-byte collation cannot represent it.
You will have to use a nvarchar (or other 'n' prefixed character type).
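A Python sketch of why the one-byte limit bites here: a single-byte encoding covers exactly 256 characters, and Devanagari (the script used for Hindi) does not fit:

```python
# A single-byte encoding has exactly 256 code values, so a 1-byte collation
# can only cover scripts that fit in the upper 128 slots; Devanagari does not.
print(len(bytes(range(256)).decode("latin-1")))  # 256 distinct characters, no more

try:
    "ह".encode("latin-1")   # DEVANAGARI LETTER HA
except UnicodeEncodeError:
    print("Devanagari does not fit in a single-byte character set")
```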
-- edit --
French_CI_AS as a non-English example
One of the things collations enable is language- and locale-specific ordering of characters. Therefore French != Latin.
Another example: Arabic_CI_AS
This is a 1-byte encoding with the Arabic alphabet.
Use this in your SQL statement, where "content" is a host-language variable containing the Arabic string you want to insert (the statement is built by string concatenation on the client; a bind parameter would be safer):
update Table set contents = convert(nvarchar(max), N'" + content + "' collate Arabic_CI_AS)
It works fine.
You can use this:
name = N'مرحبا كيف حالك'