SQL NVARCHAR(MAX) returning ASCII and Weird Characters instead of Text - sql

I have an SQL Table and I'm trying to return the values as a string.
The values should be city names like Sydney, Melbourne, Port Maquarie etc.
But When I run a select I either get black results or as detailed in the first picture some strange backwards L character. The column is an NVARCHAR(MAX)
SELECT ctGlobalName FROM Crm.Cities
Then I tried using MSSQL's Edit top 200 rows feature and I could see the names of the cities, but also all these weird ascii characters.
Now I didn't create the database, I'm just running queries on it. Some things I've read have suggested it is a problem with the Collation. But the table is SQL_Latin1_General_CP1_CI_AS which matches the server collation.
I'm sure there must be something I can add to my select query to return the values as an ordinary string. Is there something I can do to my select query to return the expected format without the weird characters?

An NVARCHAR datatype can store Unicode characters, which are used for languages that are not supported by the ASCII character set i.e. non-English (or related) languages such as Chinese or Indonesian. If your SQL Server or Windows doesn't have that language installed then you might see strange-looking representations of the data.
On the other hand, it could also be that the application that updates this table has just stored bad data in that column.
Either way you might need to do some string manipulation to strip out the characters you don't want.

Related

How to translate and what could cause characters such as å¿è€

Goal:
I only use select statements with the dbs I have access to.
One of the columns is supposed to store legible english sentences but there are values with strange characters. I would like to find a way to translate those special characters to legible characters
My question is two fold:
Can I translate the following string to a legible format all stored data is basically lost in translation
How can I ensure that the data is stored correctly?
Column Collation: SQL_Latin1_General_CP1_CI_AS
Column Data Type: NVARCHAR(300)
Data Examples:
å¿è€
ÐžÐ±Ð°Ð¶Ð´Ð°Ð½Ð¸Ñ Ð·Ð°
Use prefix of ‘N’ While you Enter to table
Insert Into TownMessage_Tbl Values (elanat=N' + Elanat +"')

SQL Server NText field limited to 43,679 characters?

I working with SQL Server data base in order to store very long Unicode string. The field is from type 'ntext', which theoretically should be limit to 2^30 Unicode characters.
From MSDN documentation:
ntext
Variable-length Unicode data with a maximum string length of 2^30 - 1 (1,073,741,823) bytes. Storage size, in bytes, is two times the string length that is entered. The ISO synonym for ntext is national
text.
I'm made this test:
Generate 50,000 characters string.
Run an Update SQL statement
UPDATE [table]
SET Response='... 50,000 character string...'
WHERE ID='593BCBC0-EC1E-4850-93B0-3A9A9EB83123'
Check the result - what actually stored in the field at the end.
The result was that the field [Response] contain only 43,679 characters. All the characters at the end of the string was thrown out.
Why this happens? How I can fix this?
If this is really the capacity limit of this data type (ntext), which another data type can store longer Unicode string?
Based on what I've seen, you may just only be able to copy 43679 characters. It is storing all the characters, they're in the db(check this with Select Len(Reponse) From [table] Where... to verify this), and SSMS has problem copying more than when you go to look at the full data.
NTEXT datatype is deprecated and you should use NVARCHAR(MAX).
I see two possible explanations:
Your ODBC driver you use to connect to database truncate parameter value when it is too long (try using SSMS)
You write you generate your input string. I suspect you generate CHAR(0) which is Null literal
If second is your case make sure you cannot generate \0 char.
EDIT:
I don't know how you check the length but keep in mind that LEN does not count trailing whitespaces
SELECT LEN('aa ') AS length -- 2
,DATALENGTH('aa ') AS datalength -- 7
Last possible solution I see you do sth like:
SELECT 'aa aaaa'
-- result in SSMS `aa aaaa`: so when you count you lose all multiple whitespaces
Check query below if returns 100k:
SELECT DATALENGTH(ntext_column)
For all bytes; Grid result on right click and click save result to file.
Can confirm. The actual limit is 43679. Had a problem with a subscription service for a week now. Every data looked good, but it still gave us an error that one of the fields have invalid values, even tho, it got correct values in. It turned out that the parameters was stored in NText and it maxed out at 43679 characters. And because we cannot change the database design, we had to make 2 different subscriptions for the same thing and put half of the entities to the other one.

SQL column collation change

I would like to change a column collation to some Polish collation and be able to view Polish characters properly. All three, original column, original table and original database, use SQL_Scandinavian_CP850_CS_AS.
For column collation change I tried:
SELECT CAST([ColumnName] AS nvarchar(50)) COLLATE Polish_CI_AS FROM t1
These 3 example letters appear in Scandinavian table:
SELECT 'ØùÒ' COLLATE Polish_CI_AS
Should return in results łŚń. Instead it shows 'OuO'.
Unfortunately SQL Server does not support OEM code page 852 which is what you need to convert code page 850 data into if you want to convert 'ØùÒ' to 'łŚń'. You can change the collation of data without SQL Server doing character mapping by CASTing through varbinary, but this only works with supported collations.
An alternative approach might be to create a user-defined function that takes a string and maps characters one-at-a-time, so Ø maps to ł etc. Fiddly to do, there are (up to) 127 characters to map, but not difficult.

SQL Server : how do I get SQL to recognize ö and ß correctly

I currently have a table that has 4 columns. The ID of the object, ID of another object in another table, nvarchar data, and a bool.
PK is made up of the first 3 columns.
The values größe conflicts with grösse, and große conflicts with grosse
meaning I can have one of the first two and one of the second two, but not all of them
The column has collation set to SQL_Latin1_General_CS_AS
I believe this is where the problem lies but this does handle many other unicode characters correctly. Has anyone encountered this and know what my problem is?
For reference both of these play okay with all of the above.
gråsse
grøsse
Example for clarity, for me this is printing equal:
IF (N'grösse' COLLATE Latin1_General_CS_AS = N'größe' COLLATE Latin1_General_CS_AS)
BEGIN
PRINT 'EQUAL'
END
When I expect these to be different.
handle many other unicode characters correctly
What does correctly mean to you? The different collations in SQL Server have different behavior. Maybe you are looking for a binary collation like LATIN1_GENERAL_BIN2. This one compares code-points only. Duplicates will only occur when the strings are binary-identical. Your example code would behave like you want it to.
The non-binary collations try to apply lexicographic rules. They sort and compare strings like a phone book would.

When should I use the SQL Server Unicode 'N' Constant?

I've been looking into the use of the Unicode 'N' constant within my code, for example:
select object_id(N'VW_TABLE_UPDATE_DATA', N'V');
insert into SOME_TABLE (Field1, Field2) values (N'A', N'B');
After doing some reading around when to use it, and I'm still not entirely clear as to the circumstances under which it should and should not be used.
Is it as simple as using it when data types or parameters expect a unicode data type (as per the above examples), or is it more sophiticated than that?
The following Microsoft site gives an explanation, but I'm also a little unclear as to some of the terms it is using
http://msdn.microsoft.com/en-us/library/ms179899.aspx
Or to precis:
Unicode constants are interpreted as
Unicode data, and are not evaluated by
using a code page. Unicode constants
do have a collation. This collation
primarily controls comparisons and
case sensitivity. Unicode constants
are assigned the default collation of
the current database, unless the
COLLATE clause is used to specify a
collation.
What does it mean by:
'evaluated by using a code page'?
Collation?
I realise this is quite a broad question, but any links or help would be appreciated.
Thanks
Is it as simple as using it when data types or parameters expect a unicode data type?
Pretty much.
To answer your other points:
A code page is another name for encoding of a character set. For example, windows code page 1255 encodes Hebrew. This is normally used for 8bit encodings for characters. In terms of your question, strings may be evaluated using different code pages (so the same bit pattern may be interpreted as a Japanese character or an Arabic one, depending on what code page was used to evaluate it).
Collation is about how SQL Server is to order strings - this depends on code page, as you would order strings in different languages differently. See this article for an example.
National character nchar() and nvarchar() use two bytes per character and support international character set -- think internet.
The N prefix converts a string constant to two bytes per character. So if you have people from different countries and would like their names properly stored -- something like:
CREATE TABLE SomeTable (
id int
,FirstName nvarchar(50)
);
Then use:
INSERT INTO SomeTable
( Id, FirstName )
VALUES ( 1, N'Guðjón' );
and
SELECT *
FROM SomeTable
WHERE FirstName = N'Guðjón';