Encoding in database SQL commands

I would like to know which component is responsible for the encoding conversions needed to run a SQL command successfully. For example, there are several places from which a SQL command like this can be issued:
SELECT title from T1 where title='título'
It may be executed from within the database client (which, I assume, reads the database encoding and encodes its commands accordingly), but what happens when it is a string in a programming language whose string encoding differs from the database's?
Where does the conversion take place? In the class that connects to the database? Do the database and the connector reach some kind of agreement during the handshake?
I'd love some information about this topic, or a link where I can read about it.
Thanks in advance.

Case Java + MySQL
Internally, a Java String holds Unicode text (UTF-16).
A Java source file must be saved in the same encoding that the Java compiler is told to use. A mismatch between editor and compiler will garble string literals.
Java thus passes a Unicode string to the JDBC driver, the database client library.
The MySQL connection string can indicate which encoding the client library uses to communicate with the database server. useUnicode=true&characterEncoding=UTF-8, so Unicode, would be a good international choice.
The database can set a default encoding, and so can each table and even each column (say, one for Hindi and one for Chinese).
Besides the encoding, the collation (the sort order of strings) is language and encoding specific, and has to be considered too.
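To make the conversion step concrete, here is a minimal sketch of what a client library has to do before the query text goes over the wire: turn the in-memory Unicode string into bytes in the encoding agreed for the connection. It is written in Python rather than Java purely for brevity, and the helper name encode_for_connection is hypothetical, not part of any driver API.

```python
def encode_for_connection(sql: str, connection_encoding: str) -> bytes:
    """Hypothetical helper: encode query text the way a driver would
    before sending it, using the encoding agreed for the connection."""
    return sql.encode(connection_encoding)


query = "SELECT title FROM T1 WHERE title='título'"

as_utf8 = encode_for_connection(query, "utf-8")
as_latin1 = encode_for_connection(query, "latin-1")

# Same text, different bytes: 'í' takes two bytes in UTF-8 and one in Latin-1.
print(len(as_utf8), len(as_latin1))

# If the server decodes with the encoding the client used, the text survives.
print(as_utf8.decode("utf-8") == query)

# If the two sides disagree, the literal no longer matches the stored title.
print(as_utf8.decode("latin-1") == query)
```

The last comparison is the failure mode the question is worried about: nothing raises an error, the WHERE clause just stops matching.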

Related

Why accents are not recognized in sqlplus

I have a subject table whose theme field contains the following rows:
theme
-----
pays
économie
associée
And I have this basic query :
SELECT * FROM SUBJECT WHERE THEME='associée';
The query runs fine in Sql developer and returns the expected row to me.
In sqlplus, on the other hand, it returns 0 rows (which is not expected).
I have the impression that the query does not recognize accented characters in sqlplus. I suspect an NLS_LANG problem, but I don't know much about it. Please help.
Thank you in advance.
Set your OS session's NLS_LANG variable to the value of, e.g., ENGLISH_AMERICA.AL32UTF8 and restart your SQL Developer. Retry afterwards.
If that didn't help, try also running your query as follows:
SELECT * FROM SUBJECT WHERE THEME = n'associée';
Notice the n before the string literal. That's an nvarchar2 string literal modifier. Depending on your DB charset/national charset settings, you may need to state explicitly that the value you are querying for is in the "national charset", not the "regular charset".
If that didn't help, there's actually a multitude of additional variables that come into play when working with accented characters against an Oracle DB.
Explanation:
Your SQL Developer does recognize accents... provided that your Oracle DB session uses a character set compatible with your database character set. Your Oracle DB session's character set can be set either at OS level (via an OS environment variable) or, possibly(!), directly in SQL Developer's options. Alas, the said multitude of other factors may include (though not exclusively):
your OS regional settings,
your OS Unicode support,
your Oracle client software's (SQL Developer) Unicode support,
your Java JDK/JRE's Unicode support,
your JDBC driver's Unicode support,
your other *DBC drivers' Unicode support, if there are any more in chain.
Sad thing is that the more interfaces you have between your keyboard and your Oracle database, the more likely is one of them to fiddle with your charset conversions badly.
So, let's just hope that the first two hints work for you, otherwise I can't help you (that easily).

How to fix character encoding in sql query

I have a Db2 database where I store names containing special characters. When I retrieve them with internal software, I get proper results. However, when I try to do the same with queries or look into the db, the characters appear to be stored strangely.
The documentation says that the encoding is utf-8 latin1.
My query looks something like this:
SELECT firstn, lastn
FROM unams
WHERE unamid = 12345
The user with the given ID has some special characters in his/her name: é and ó, but the query returns it as Ă© and Ăł.
Is there a way to convert the characters back to their original form using some simple SQL function? I am new to databases and encoding, and I'm trying to understand the latter by reading this, but I'm quite lost.
EDIT: Currently sending queries via SPSS Modeler with a proper ODBC driver, the database lies on a Windows Server 2016
Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208 , then restart, then drop and re-populate the tables.
If the application runs locally on the Db2-server (i.e. only one hostname is involved), then the same variable can be set. This will impact all local applications that use the UTF-8 encoded database.
If the application runs remotely from the Db2-server (i.e. two hostnames are involved) then set the variable on the workstation and on the Windows Db2-server.
Current versions of IBM supplied Db2-clients on Windows will derive their codepage from the regional settings which might not always render Unicode characters correctly, so using the DB2CODEPAGE=1208 forces the Db2-client CLI drivers to use a Unicode application code page to override this.
with t (firstn) as (
values ('éó')
--SELECT firstn
--FROM unams
--WHERE unamid = 12345
)
select x.c, hex(x.c) c_hex
from
t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>'
passing t.firstn as "s" columns tok varchar(6) path '.') x(c);
C C_HEX
- -----
é C3A9
ó C3B3
The query above converts the string of characters to a table with each character (C) and its hex representation (C_HEX) in each row.
You can run it as is to check if you get the same output. It must be as described for a UTF-8 database.
Now try to comment out the line with values ('éó') and uncomment the select statement returning some row with these special characters.
If you see the same hex representation of these characters stored in the firstn column, then the string is stored correctly, but your client tool (SPSS Modeler) can't display these characters correctly for some reason (a wrong font, for example).
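The garbling reported in the question (é shown as Ă©, ó shown as Ăł) is what you get when correctly stored UTF-8 bytes are decoded with a Central European Windows code page. The following Python sketch reproduces it. That the client in question actually used cp1250 is an inference from the symptom, not something the thread confirms, and the helper name misread is hypothetical.

```python
def misread(text: str, stored_as: str, read_as: str) -> str:
    """Encode text the way the database stores it, then decode it with
    the (wrong) code page the client assumes. Hypothetical helper."""
    return text.encode(stored_as).decode(read_as)


for ch in "éó":
    stored_hex = ch.encode("utf-8").hex().upper()
    # Prints the character, its UTF-8 bytes in hex, and the misread form.
    print(ch, stored_hex, misread(ch, "utf-8", "cp1250"))
```

The hex column matches the C3A9 and C3B3 values from the query above, which is why checking the stored bytes separates "stored wrongly" from "displayed wrongly".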

How does Oracle (via SQL*Plus) determine the charset used to evaluate a SQL script

I'd like a few details about how Oracle (via SQL*Plus) determines the charset used to evaluate a SQL script.
My database is configured like this:
select VALUE from nls_database_parameters where parameter='NLS_CHARACTERSET';
VALUE
------
WE8ISO8859P15
The problem is that I read here http://www.orafaq.com/wiki/NLS that session parameters could take precedence over database parameters.
Does it mean that the database encoding is overridden by the one defined in the NLS_LANG environment variable of the user who executes the script?
Apparently, it's not possible to modify the encoding in a script via an alter session statement.
I'm asking this question since I already had a problem of corrupted characters with a production script executed by a subcontractor in India. I actually don't know if it was because he did something wrong with my file (like copy/paste in a sql gui client) or if it was because of his environment.
To summarize my actual problem, will everything be OK if:
The user is configured with a charset of UTF8
My SQL file is encoded in UTF8
My database is in WE8ISO8859P15
Thanks in advance for your answers.
Yes, you are correct. The Oracle Client always converts between the database character set and the character set of the client machine, which is determined by the NLS_LANG environment variable or the system settings.
Please note that UTF8 supports only Unicode version 3.1 and earlier. Use AL32UTF8 instead to get full Unicode support.
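For the scenario in the question (UTF-8 on the client side, WE8ISO8859P15 in the database), the conversion can only preserve characters that exist in ISO 8859-15. The sketch below is one way to check a script's text before running it. It uses Python's iso8859_15 codec as a stand-in for Oracle's conversion, which is an assumption: how Oracle itself replaces unmappable characters is not shown here and may differ. The helper name fits_database_charset is hypothetical.

```python
def fits_database_charset(text: str, db_charset: str = "iso8859_15") -> bool:
    """Return True if every character of text exists in db_charset.
    Hypothetical helper; Python's codec stands in for Oracle's conversion."""
    try:
        text.encode(db_charset)
        return True
    except UnicodeEncodeError:
        return False


print(fits_database_charset("économie à 5 €"))  # True
print(fits_database_charset("日本語"))  # False
```

So a UTF-8 script is fine as long as NLS_LANG truthfully describes it and it only contains characters the database character set can represent.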

SQL Server database with Latin1 codepage shows Japanese Chars as "?"

Three questions with the following scenario:
SQL Server 2005 production db with a Latin1 codepage and showing "?" for invalid chars in Management Studio.
SomeCompanyApp client as a service that populates the data from servers and workstations.
SomeCompanyApp management console that shows "?" for Asian characters.
Since this is a prod db I will not write to it.
I don't know if the client app that is storing the data in the database is actually storing it correctly as Unicode and it simply doesn't show because they are using Latin1 for the console.
Q1: As I understand it, SQL Server stores nvarchar text as Unicode regardless of the codepage. Or am I completely wrong, and if the codepage is Latin1, everything that is not in that codepage gets converted to "?"?
Q2: Is it the same with a text column?
Q3: Is there a way using SQL Server Management Studio or Visual Studio and some code (don't care which language :)) to query the db and show me if the chars really do show up as Japanese, Chinese, Korean, etc.?
My final goal is to extract data from the db and store it in another db using UTF-8 to show Japanese and other Asian chars as what they are in my own client webapp. I will settle for an answer to Q3. I can code in several languages and at the very least understand some others but I'm just not knowledgeable enough about Unicode. In case you want to know my webapp will be using pyodbc and cassandra but for these questions that doesn't matter.
When inserting into an NVARCHAR column in SSMS, you need to make absolutely sure you're prefixing your string with an N:
This will NOT work:
INSERT INTO dbo.MyTable(NVarcharColumn) VALUES('Some Text with Special Char')
SQL Server will interpret your string in the VALUES(..) as VARCHAR and thus replace any characters outside the database code page with "?".
You need this:
INSERT INTO dbo.MyTable(NVarcharColumn) VALUES(N'Some Text with Special Char')
Prefixing your text literal with an N'..' tells SQL Server to treat this as NVARCHAR all the way.
Does this help you solve your Q3?
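The difference between the two literals can be illustrated outside SQL Server with a short Python sketch. It assumes cp1252 as the "Latin1" code page and UTF-16LE as the byte form of NVARCHAR data. Both are stand-ins for illustration, not a trace of what the server does internally.

```python
text = "日本語"

# NVARCHAR-style storage: every character survives as UTF-16 code units.
as_nvarchar = text.encode("utf-16-le")

# VARCHAR-style storage under a Latin1 code page (cp1252 assumed here):
# characters outside the code page are replaced and the original is lost.
as_varchar = text.encode("cp1252", errors="replace")

print(as_nvarchar.decode("utf-16-le"))  # 日本語
print(as_varchar.decode("cp1252"))  # ???
```

Once a value has been stored as literal question marks (byte 0x3F), no client setting can bring the original characters back. For Q3, that makes it worth inspecting the stored bytes rather than trusting what a console displays.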

Setting collation property in the connection string to SQL Server 2005

I have an ASP.NET web application with a connection string for SQL Server 2005 in the web.config.
Data Source=ABCSERVER;Network Library=DBMSSOCN;Initial Catalog=myDataBase;
User ID=myUsername;Password=myPassword;
I want to specify the collation property in the web.config for different languages, such as French, like this:
Data Source=ABCSERVER;Network Library=DBMSSOCN;Initial Catalog=myDataBase;
User ID=myUsername;Password=myPassword;Collation=French_CS_AS
But the Collation word is not valid in the connection string.
What is the correct keyword that we need to use to specify the collation in SQL Server 2005 connection string?
Edit
I understand that collation can be set during the database installation and can also be changed. I do not want to change it permanently in the database. But I want the SQLClient to set the collation based on the application's settings.
I only want to use it when running a SQL query like
SELECT * FROM TESTTABLE ORDER BY TESTCOLUMN COLLATE French_CS_AS
I am trying to ensure that, for a given connection, all the commands/queries on that connection automatically use "French_CS_AS", based on the property setting in the connection string, rather than changing the query definitions.
You cannot set collation for a connection. It's simply not supported. See SQL Server Native Client: Connection strings and OLE DB for a really interesting blog article on how connection strings parse out.
You can set a language for a connection. Setting the language for a connection changes how dates are handled and causes system error messages to be provided in the specified language. See Session Language for more information on setting language.
A warning about using collations on non-Unicode types from COLLATE (Transact-SQL):
Code page translations are supported for char and varchar data types, but not for text data type. Data loss during code page translations is not reported.
Ideally, if you want consistent multilingual support from your data you should be using Unicode data types (nvarchar, etc.). You should also see the Collation and International Terminology article on MSDN for more information on this. It contains references to some additional articles that are quite useful as well so don't stop there.