Pound sign getting converted into question mark in Azure Build Pipeline - sql

We have a column in our table named "House£1000", but after deploying the code from the Azure Build Pipeline, the pound sign is converted to "?" in the Azure Build Artifacts. Can anyone suggest something that would resolve this issue?

A possible cause is the use of non-Unicode data types such as char or varchar when defining the columns.
Characters from different languages can take a different number of bytes to encode.
Unicode data types such as nvarchar and nchar store text as UTF-16 in SQL Server, so they can hold symbols that a varchar column's code page may not cover. UTF-8 is available for char and varchar only under a UTF-8 enabled collation; in that case the declared length counts bytes, not characters, so size the column with the extra bytes in mind, e.g. VARCHAR(270), and store currency amounts in a sufficiently wide type such as decimal(19, 9), to avoid data loss through truncation.
Enable UTF-8 encoding when defining the columns. If that is not possible, try working around the problem with escaped characters, e.g. [%] or [^], which stand for a literal % and ^.
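To make the point about byte counts concrete, here is a minimal Python sketch. It runs outside the database and only illustrates the encodings involved; the helper name byte_lengths is hypothetical, and cp1252 is assumed as a typical Latin1 code page.

```python
# Illustrative helper, not part of any SQL Server API: how many bytes a
# string needs under encodings that are relevant to column sizing.
def byte_lengths(text):
    return {
        "cp1252": len(text.encode("cp1252")),        # assumed Latin1 varchar code page
        "utf-8": len(text.encode("utf-8")),          # varchar under a UTF-8 collation
        "utf-16-le": len(text.encode("utf-16-le")),  # nvarchar / nchar
    }

sizes = byte_lengths("House£1000")
print(sizes)  # {'cp1252': 10, 'utf-8': 11, 'utf-16-le': 20}
```

The ten-character name needs eleven bytes in UTF-8 because the pound sign alone takes two, which is why a byte-counted column can truncate text that looks short enough.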
Please go through this Collation and Unicode support - SQL Server | Microsoft Docs
Reference:
Introducing UTF-8 support for Azure SQL Database | Azure updates | Microsoft Azure
storing-uk-pound-sterling-in-a-database

Related

How to fix character encoding in sql query

I have a Db2 database where I store names containing special characters. When I retrieve them with our internal software, I get proper results. However, when I do the same with queries or look into the database directly, the characters appear to be stored strangely.
The documentation says that the encoding is utf-8 latin1.
My query looks something like this:
SELECT firstn, lastn
FROM unams
WHERE unamid = 12345
The user with the given ID has some special characters in his/her name: é and ó, but the query returns it as Ă© and Ăł.
Is there a way to convert the characters back to their original form using some simple SQL function? I am new to databases and encoding, trying to understand the latter by reading this but I'm quite lost.
EDIT: I am currently sending queries via SPSS Modeler with a proper ODBC driver; the database runs on Windows Server 2016.
Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208 , then restart, then drop and re-populate the tables.
If the application runs locally on the Db2 server (i.e. only one hostname is involved), then the same variable can be set there. This will affect all local applications that use the UTF-8 encoded database.
If the application runs remotely from the Db2 server (i.e. two hostnames are involved), then set the variable on both the workstation and the Windows Db2 server.
Current versions of IBM-supplied Db2 clients on Windows derive their code page from the regional settings, which might not always render Unicode characters correctly, so setting DB2CODEPAGE=1208 forces the Db2 client CLI drivers to use a Unicode application code page instead.
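The symptom in the question can be reproduced outside the database. The Python sketch below assumes that the client misreads UTF-8 bytes as Windows code page 1250; that code page is a guess that happens to match the garbled output shown in the question, not something the thread confirms. The function names are hypothetical.

```python
def garble(text, wrong_code_page="cp1250"):
    """Encode as UTF-8, then misread the bytes in a single-byte code page."""
    return text.encode("utf-8").decode(wrong_code_page)

def repair(garbled, wrong_code_page="cp1250"):
    """Undo the misreading; only works if no byte was lost along the way."""
    return garbled.encode(wrong_code_page).decode("utf-8")

print(garble("éó"))          # Ă©Ăł
print(repair(garble("éó")))  # éó
```

If the stored bytes are correct UTF-8, fixing the client code page (as DB2CODEPAGE=1208 does) is preferable to repairing strings after the fact.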
with t (firstn) as (
values ('éó')
--SELECT firstn
--FROM unams
--WHERE unamid = 12345
)
select x.c, hex(x.c) c_hex
from
t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>'
passing t.firstn as "s" columns tok varchar(6) path '.') x(c);
C C_HEX
- -----
é C3A9
ó C3B3
The query above converts the string of characters to a table with each character (C) and its hex representation (C_HEX) in each row.
You can run it as is to check if you get the same output. It must be as described for a UTF-8 database.
Now try to comment out the line with values ('éó') and uncomment the select statement returning some row with these special characters.
If you see the same hex representation for these characters stored in the firstn column, then the string is stored correctly, but your client tool (SPSS Modeler) can't display these characters correctly for some reason (the wrong font, for example).
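For comparison on the client side, the same per-character hex view can be sketched in Python. This is only an illustration of what the query above computes; char_hex is a hypothetical helper, and UTF-8 is assumed as the encoding to compare against.

```python
def char_hex(text, encoding="utf-8"):
    """Return (character, uppercase hex of its encoded bytes) pairs."""
    return [(ch, ch.encode(encoding).hex().upper()) for ch in text]

for ch, hx in char_hex("éó"):
    print(ch, hx)
# é C3A9
# ó C3B3
```

If the client shows different hex values for a string it fetched, the bytes were changed somewhere between the server and the client rather than merely displayed badly.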

Handling chinese characters in SQL Server 2016

Our ETL team is sending us some data with Chinese descriptions. When we load that data into our SQL Server database, those descriptions come up blank.
We tried changing the column type to nvarchar, but that doesn't help.
Can you please help?
Thanks
You must use the N prefix when dealing with NVARCHAR.
INSERT INTO table (column) VALUES (N'chinese characters')
Prefix Unicode character string constants with the letter N to
signal UCS-2 or UTF-16 input, depending on whether an SC collation is
used or not. Without the N prefix, the string is converted to the
default code page of the database that may not recognize certain
characters. Starting with SQL Server 2019 preview, when a UTF-8
enabled collation is used, the default code page is capable of storing
UNICODE UTF-8 character set.
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017
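The effect described in the quote can be approximated in Python. This is a rough simulation under stated assumptions, not SQL Server's actual conversion code: cp1252 stands in for the database's default code page, and both function names are hypothetical.

```python
def as_varchar_literal(text, code_page="cp1252"):
    """Approximate a literal without N: convert to the assumed code page,
    replacing every character it cannot represent with '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)

def as_nvarchar_literal(text):
    """Approximate a literal with N: a UTF-16 round trip loses nothing."""
    return text.encode("utf-16-le").decode("utf-16-le")

print(as_varchar_literal("中文"))   # ??
print(as_nvarchar_literal("中文"))  # 中文
```

Changing the column to nvarchar does not help if the value has already been converted on its way in, which is why the prefix on the literal matters.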

What is the best SQL type to use for a large string variable?

Apologies for the rather basic question.
I have an error string that is built dynamically. The data in the string is passed by various third parties so I don't have any control, nor do I know the ultimate size of the string.
I have a transaction table that currently logs details and I want to include the string so that I can reference back to it if necessary.
2 questions:
How should I store it in the database?
Should I do anything else such as constrain the string in code?
I'm using Sql Server 2008 Web.
You can use varchar(max) for non-Unicode text, or nvarchar(max) for Unicode text.
The maximum storage size of either is 2 GB.
Other alternatives are:
binary or varbinary
Drawbacks: you can't search within these fields, index them, or order by them,
and the maximum size is also 2 GB.
There are also TEXT and NTEXT, but they are deprecated and will be removed in a future version,
so I don't suggest using them.
They have the same drawbacks as binary.
So the best choice is one of varchar(max) or nvarchar(max).
You can use SQL Server nvarchar(MAX).
Check out this too.
Alternatively, you can enable the FILESTREAM feature of SQL Server 2008 (it is supported by the Web edition) to handle very large amounts of data such as documents.
Of course, you need to be sure that you will actually benefit from this feature.

Encoding in databases sql commands

I would like to know which component is responsible for the encoding conversions needed to execute a SQL command successfully. For example, there are several places from which a SQL command can be issued:
SELECT title from T1 where title='título'
This may be executed from within the database client (which I assume reads the database encoding and encodes its commands accordingly), but what happens when this is a string in a programming language whose string encoding is not the same as the database's?
Where does the conversion take place? In the class that connects to the database? Do the database and the connector reach some kind of agreement during the handshake?
I'd love some information about this topic, or a link where I can read about it.
Thanks in advance.
Case Java + MySQL
Internally, a Java String holds Unicode text (UTF-16).
Java source text must be in the encoding that the Java compiler expects. A mismatch between editor and compiler would garble string literals.
Java thus passes a Unicode string to the JDBC driver, the database client library.
The MySQL connection string can indicate which encoding the client library uses to communicate with the database server. characterEncoding=UTF-8, so Unicode, would be a good international choice.
The database can set a default encoding.
So can any table.
So can any column (say, one for Hindi and one for Chinese).
Besides the encoding, the collation (the sort order of strings) is also language- and encoding-specific and has to be considered too.

SQL Server database with Latin1 codepage shows Japanese Chars as "?"

Three questions with the following scenario:
SQL Server 2005 production db with a Latin1 codepage and showing "?" for invalid chars in Management Studio.
SomeCompanyApp client as a service that populates the data from servers and workstations.
SomeCompanyApp management console that shows "?" for Asian characters.
Since this is a prod db I will not write to it.
I don't know if the client app that is storing the data in the database is actually storing it correctly as Unicode and it simply doesn't show because they are using Latin1 for the console.
Q1: As I understand it, SQL Server stores nvarchar text as Unicode regardless of the code page. Or am I completely wrong, and if the code page is Latin1, everything that is not in that code page gets converted to "?"?
Q2: Is it the same with a text column?
Q3: Is there a way using SQL Server Management Studio or Visual Studio and some code (don't care which language :)) to query the db and show me if the chars really do show up as Japanese, Chinese, Korean, etc.?
My final goal is to extract data from the db and store it in another db using UTF-8 to show Japanese and other Asian chars as what they are in my own client webapp. I will settle for an answer to Q3. I can code in several languages and at the very least understand some others but I'm just not knowledgeable enough about Unicode. In case you want to know my webapp will be using pyodbc and cassandra but for these questions that doesn't matter.
When inserting into an NVARCHAR column in SSMS, you need to make absolutely sure you're prefixing your string with an N:
This will NOT work:
INSERT INTO dbo.MyTable(NVarcharColumn) VALUES('Some Text with Special Char')
SQL Server will interpret the string in VALUES(..) as VARCHAR and thus convert any character that the database's code page cannot represent to "?".
You need this:
INSERT INTO dbo.MyTable(NVarcharColumn) VALUES(N'Some Text with Special Char')
Prefixing your text literal with N'..' tells SQL Server to treat it as NVARCHAR all the way.
Does this help you solve your Q3?
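For Q3, one way to check what actually came back from the database, independent of fonts or console settings, is to look at the code points of the fetched string. The Python sketch below assumes the driver (pyodbc, for instance) hands nvarchar values back as ordinary Python str objects; describe and looks_lossy are hypothetical helper names.

```python
import unicodedata

def describe(text):
    """List each character with its code point and Unicode name."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in text]

def looks_lossy(text):
    """True if every visible character is a literal question mark,
    which suggests the original text was already converted to '?'."""
    visible = [ch for ch in text if not ch.isspace()]
    return bool(visible) and all(ch == "?" for ch in visible)

print(describe("あ"))      # [('あ', 'U+3042', 'HIRAGANA LETTER A')]
print(looks_lossy("???"))  # True
```

If the values come back as U+003F (QUESTION MARK), the data was lost before or during the insert; if they come back as CJK code points, only the display is at fault.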