I'm trying to clean a recently imported SQL Server 2008 database that has too many invalid characters for my application, and I found different characters with the same ASCII code. Is that possible?
If I execute this query:
select ASCII('║'), ASCII('¦')
I get:
166 166
I need to do similar work, but in .NET code.
If I ask for these chars in .NET:
? ((int)'║').ToString() + ", " + ((int)'¦').ToString()
I get:
"9553, 166"
Can anybody explain what is happening?
Instead of ASCII, use the UNICODE function.
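Neither ║ nor ¦ is an ASCII character, so calling ASCII with them converts them incorrectly and returns the wrong value. (Without the N prefix the literal is first converted to the database's code page; ║ has no exact match there and most likely ends up as the best-fit ¦, hence 166 for both.)
Additionally, you need to pass Unicode strings to the UNICODE function, using the N prefix:
SELECT UNICODE(N'║'), UNICODE(N'¦')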
-- Results in: 9553, 166
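To cross-check these numbers outside the database, here is a minimal Python sketch (Python is used only for illustration and is not part of the original question). It shows that the two characters have different code points and that ║ has no slot in Windows-1252. That the database uses Windows-1252 as its code page is an assumption, and the helper name fits_cp1252 is hypothetical.

```python
# Code points of the two characters, matching the .NET and UNICODE() results.
double_bar = "\u2551"   # ║ BOX DRAWINGS DOUBLE VERTICAL
broken_bar = "\u00a6"   # ¦ BROKEN BAR

code_points = (ord(double_bar), ord(broken_bar))
print(code_points)


def fits_cp1252(ch: str) -> bool:
    """Return True if ch can be stored in Windows-1252 without loss.

    Windows-1252 is assumed here as the database code page.
    """
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# ║ cannot be represented, ¦ can (as the single byte 0xA6 = 166).
print(fits_cp1252(double_bar), fits_cp1252(broken_bar))
```

Python's codec refuses to encode ║ instead of silently substituting a look-alike, which is why the mismatch is visible here but hidden by ASCII() on a non-Unicode literal.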
I'm using Report Builder 3.0 for my reports. My report runs; however, if a user exports the results to Excel (xlsx) instead of Excel 2003 (xls), they get an "illegal xml character" message when the file is opened.
4 of the columns contain "&" and / or " ' "; so I'm trying to replace these special characters; which I believe are causing the issue.
I've tried to update this line:
j.journal_desc AS "Jrnl Description",
with this line:
oreplace(oreplace(j.journal_desc, '&', 'and'),'''','') AS "Jrnl Description",
and it works fine. However when I do this on a second line I get the message: "SELECT Failed. [9804] Response Row size or Constant Row size overflow".
I've tried "otranslate" and it works on 2 columns. However, when I try it on the 3rd column, I get the same overflow message.
Is it possible to use oreplace or otranslate on multiple columns? Am I doing something wrong? Is there a better way to replace these special characters?
Thanks for the help......
OREPLACE and OTRANSLATE return a result string of length 8000 in the UNICODE character set, so each additional call makes the response row longer by another 8000 characters. Casting the result to a smaller length should fix the problem:
CAST(oreplace(journal_desc,'&','and') AS VARCHAR(100))
Based on my research so far, this character (�) indicates bad encoding between the database and the front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.
How can I filter this character out? I'm trying to perform a REGEXP_SUBSTR function on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25
Your diagnosis is correct, so first of all you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII change it to UTF8 and you will be able to see the original characters instead of the substitute character.
And in case the character is indeed part of the data and not just an indication for encoding translations issues:
The substitute character, also known as SUB (DEC: 26, HEX: 1A), is handled in a special way in Teradata.
You cannot use it directly:
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character you can use it with OREPLACE or OTRANSLATE:
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
BTW, there are 2 versions of OTRANSLATE and OREPLACE:
The functions under syslib work with LATIN.
The functions under TD_SYSFNLIB work with UNICODE.
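If the data has already reached a client program, the same cleanup can be done there. The following is a minimal Python sketch, not Teradata code. The function name strip_untranslatable is hypothetical, and it assumes the bad data arrives either as the SUB character (0x1A) or as the Unicode replacement character U+FFFD that many clients display as �.

```python
SUB = "\x1a"             # substitute character, DEC 26 / HEX 1A
REPLACEMENT = "\ufffd"   # what many clients render as the "�" glyph


def strip_untranslatable(text: str) -> str:
    """Remove SUB and U+FFFD characters, keeping everything else."""
    return text.replace(SUB, "").replace(REPLACEMENT, "")


# Hypothetical sample value containing both kinds of bad character.
cleaned = strip_untranslatable("24FEB17" + SUB + "ORD" + REPLACEMENT)
print(cleaned)
```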
In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII and could now see the offending character, which looks like a tab:
IDENTIFY
Running TRANSLATE_CHK with this specific conversion succeeds and identifies the position of the offending character (note that this does not work with the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using a CASE expression that cuts the string off at the bad character and keeps everything before it:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
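The logic of that CASE expression can be sketched in Python to make the position arithmetic explicit. This is an illustration only: first_bad_position and keep_prefix are hypothetical helpers, and the SUB character (0x1A) stands in for whatever TRANSLATE_CHK considers untranslatable.

```python
from typing import Callable


def first_bad_position(text: str, is_bad: Callable[[str], bool]) -> int:
    """Return the 1-based position of the first bad character, or 0 if none.

    This mirrors the return convention described above for TRANSLATE_CHK.
    """
    for position, ch in enumerate(text, start=1):
        if is_bad(ch):
            return position
    return 0


def keep_prefix(text: str, is_bad: Callable[[str], bool]) -> str:
    """Mirror the CASE expression: whole string if clean, else the part before the bad character."""
    position = first_bad_position(text, is_bad)
    return text if position == 0 else text[: position - 1]


# Sample value from the post, with SUB (0x1A) assumed as the bad character.
sample = "24FEB1747659193DC330A163DCL\x1aORD"
bad_at = first_bad_position(sample, lambda ch: ch == "\x1a")
prefix = keep_prefix(sample, lambda ch: ch == "\x1a")
print(bad_at, prefix)
```

Note that, like the SQL, this drops everything from the bad character onward, so the trailing "ORD" is lost.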
Hope this helps someone out.
For some reason Visual Studio does not show me special characters when I query for an XML field. Maybe I stored them wrong? These are smart quotes
Here's the query:
select CustomFields from TABLE where ID=422567 FOR XML PATH('')
When I copy/paste into notepad++ I see this:
What are STS and CCH?
Strings are, as you surely know, just chains of numbers. What they mean and how they are interpreted depends on code pages, encodings, little or big endian ...
Just have a look at this:
SELECT 'test' AS NormalText
--non printable characters
--they are things like backspace, carriage return
,CHAR(0x06) AS ACK --DEC 6
,CHAR(0x07) AS BEL --DEC 7
,CHAR(0x0D) AS CR --DEC 13
,CHAR(0x1B) AS ESC --DEC 27
--printable characters from 0x21 (DEC 33) up to 0x7E (DEC 126) - (almost) not depending on encoding
,CHAR(0x41) AS BigA --DEC 65
,CHAR(0x7E) AS Tilde --DEC 126
--extended - from 0x80 (DEC 128) - very much depending on encoding!
,CHAR(0x93) AS STS --DEC 147
,CHAR(0x94) AS CCH --DEC 148
,CHAR(0x93) + 'test' + CHAR(0x94) AS Mixed
FOR XML PATH('')
This will produce this
<NormalText>test</NormalText>
<ACK>&#x06;</ACK>
<BEL>&#x07;</BEL>
<CR>&#x0D;</CR>
<ESC>&#x1B;</ESC>
<BigA>A</BigA>
<Tilde>~</Tilde>
<STS>“</STS>
<CCH>”</CCH>
<Mixed>“test”</Mixed>
As you see, there are characters which must be encoded, as there is no character expression for them; others are displayed with their corresponding "picture".
With codes above DEC 127 you enter dangerous terrain. The same string can produce quite different output depending on where you read it.
The "STS" and "CCH" labels that Notepad++ shows you are taken from the C1 Controls and Latin-1 Supplement block.
This, and the smart quotes mentioned in your example, point to the same thing. To allow smart quotes there are generic characters for start and end, which are "replaced" with the fitting opening and closing quotation marks.
Finally, XML in SQL Server is always UTF-16. Have a look at feff0093 and feff0094. These are the signs UTF-16 binds to 0x93 and 0x94. My small example shows this clearly...
So the question is: why does your picture not show the “ and the ”?
I don't know... The select you put in the first line would not "produce" this XML; it rather takes existing XML out of a column "CustomFields". I'm fairly sure that this is not a "real" XML column...
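The difference between the two readings of 0x93 and 0x94 can be reproduced with a short Python sketch. It is an illustration, not SQL Server behaviour: it decodes the same two bytes once as Windows-1252 and once as ISO-8859-1 (Latin-1), and which of the two your tools actually apply is an assumption you would have to verify.

```python
import unicodedata

raw = bytes([0x93, 0x94])

# Windows-1252 maps these bytes to the typographic ("smart") quotes.
as_cp1252 = raw.decode("cp1252")
# ISO-8859-1 maps them straight to U+0093 / U+0094, the C1 controls
# that editors label STS and CCH.
as_latin1 = raw.decode("latin-1")

print([f"U+{ord(c):04X}" for c in as_cp1252])
print([f"U+{ord(c):04X}" for c in as_latin1])
# General category "Cc" marks a control character.
print([unicodedata.category(c) for c in as_latin1])
```

So the bytes themselves are identical; only the code page used to interpret them decides whether you see quotation marks or control-character placeholders.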
I'm testing UTF-32 characters (specifically emojis) with SQL Server (2008 R2, 10.5) and at this stage I'm checking whether the server supports the given code point.
For this case I'm using the :rose: emoji with the following query:
SELECT '' + nchar(0x1F339) + 'test'
which comes back in Management Studio as (NULL).
What format do I need to encode the character in so that it does not return NULL in SQL Server?
SQL Server 2008 R2 effectively treats NCHAR/NVARCHAR data as UCS-2, which is (almost) the same as UTF-16 restricted to the Basic Multilingual Plane: exactly 2 bytes per character, so NCHAR() only accepts values from 0 to 65535 and returns NULL for anything larger, such as 0x1F339.
An idea, if I may. You can store the data in a BINARY or VARBINARY data field, which doesn't care about encoding. You can then use a mapping table or external script to parse the binary into a text field, replacing 0x1F339 with :rose: or your own custom format, for example.
Since the code point U+1F339 lies outside the BMP, it has to be written as two UTF-16 code units (a surrogate pair):
-- Returns: 🌹test
SELECT '' + nchar(0xD83C) + nchar(0xDF39) + 'test'
You can find this code under "UTF-16 Hex (C Syntax)" title, following your link.
Also I have to recommend this article, because it was very helpful during investigation: Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)
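The two values passed to nchar() above follow from the standard UTF-16 surrogate pair arithmetic. A minimal Python sketch of that calculation (the function name to_surrogate_pair is hypothetical):

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F339)
print(hex(high), hex(low))
```

For U+1F339 this yields the same two code units that the query feeds to nchar().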
Couple of options for those who are looking for answers:
SQL Server technically does not have character escape sequences, but you can still create characters using either byte sequences or Code Points using the CHAR() and NCHAR() functions. We are only concerned with Unicode here, so we will only be using NCHAR().
All versions:
NCHAR(0 - 65535) for BMP Code Points (using an int/decimal value)
NCHAR(0x0 - 0xFFFF) for BMP Code Points (using a binary/hex value)
NCHAR(0 - 65535) + NCHAR(0 - 65535) for a Surrogate Pair / Two UTF-16 Code Units
NCHAR(0x0 - 0xFFFF) + NCHAR(0x0 - 0xFFFF) for a Surrogate Pair / Two UTF-16 Code Units
CONVERT(NVARCHAR(size), 0xHHHH) for one or more characters in UTF-16 Little Endian (“HHHH” is 1 or more sets of 4 hex digits)
Starting in SQL Server 2012:
If database’s default collation supports Supplementary Characters (collation name ends in _SC, or starting in SQL Server 2017 name contains 140 but does not end in _BIN*, or starting in SQL Server 2019 name ends in _UTF8 but does not contain _BIN2), then NCHAR() can be given Supplementary Character Code Points:
decimal value can go up to 1114111
hex value can go up to 0x10FFFF
Starting in SQL Server 2019:
“_UTF8” collations enable CHAR and VARCHAR data to use the UTF-8 encoding:
CONVERT(VARCHAR(size), 0xHH) for one or more characters in UTF-8 (“HH” is 1 or more sets of 2 hex digits)
NOTE: The CHAR() function does not work for this purpose. It can only produce a single byte, and UTF-8 is only a single byte for values 0 – 127 / 0x00 – 0x7F.
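The hex literals that the CONVERT() forms above expect can be derived from the character itself. The following Python sketch is only a helper for producing those byte strings; the function name hex_literal is hypothetical, and it does not talk to SQL Server.

```python
def hex_literal(text: str, encoding: str) -> str:
    """Return the bytes of text in the given encoding as a 0x... literal."""
    return "0x" + text.encode(encoding).hex().upper()


rose = "\U0001F339"

# Bytes for CONVERT(NVARCHAR(size), 0x...): UTF-16 Little Endian.
utf16_le = hex_literal(rose, "utf-16-le")
# Bytes for CONVERT(VARCHAR(size), 0x...) under a _UTF8 collation: UTF-8.
utf8 = hex_literal(rose, "utf-8")

print(utf16_le, utf8)
```

Note the byte order: in Little Endian each 16-bit code unit is written low byte first, so the surrogate pair D83C DF39 appears as 3C D8 39 DF.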
I have a SQL statement that uses special characters (e.g. ('), (/), (&)) and I don't know how to write them in my VB.NET code. Please help me. Thanks.
Find out the Unicode code point for the character (from http://www.unicode.org) and then use ChrW to convert from the code point to the character. (To put this in another string, use concatenation. I'm somewhat surprised that VB doesn't have an escape sequence, but there we go.)
For example, for the Euro sign (U+20AC) you'd write:
Dim euro As Char = ChrW(&H20AC)
The advantage of this over putting the character directly into source code is that your source code stays "just pure ASCII" - which means you won't have any strange issues with any other program trying to read it, diff it, etc. The disadvantage is that it's harder to see the symbol in the code, of course.
The most common way seems to be to append a character using Chr(...), for example Chr(34), where 34 is the code of the double quote character. The character codes can be found with the Windows program "charmap": open Start > Run and type charmap.
If you are passing strings to be processed as a SQL statement, try doubling the quote characters, for example:
"SELECT * FROM MyRecords WHERE MyRecords.MyKeyField = ""With a "" Quote"" "
Doubling works for the single quote (') as well.
The ' character can be doubled up to allow it into a string e.g
lSQLSTatement = "Select * from temp where name = 'fred''s'"
Will search for all records where name = fred's
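The doubling rule, and the usual alternative of passing the value as a parameter, can be tried out with a small self-contained Python sketch. It uses the sqlite3 module from the standard library rather than VB.NET and SQL Server, so treat it as an illustration of the idea only; the table name people and the helper double_single_quotes are hypothetical.

```python
import sqlite3


def double_single_quotes(value: str) -> str:
    """Escape a value for use inside a SQL string literal by doubling each '."""
    return value.replace("'", "''")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
# Parameters (the ? placeholder) need no escaping at all.
conn.execute("INSERT INTO people (name) VALUES (?)", ("fred's",))

# Option 1: build the literal by hand with doubled quotes.
literal_sql = (
    "SELECT name FROM people WHERE name = '"
    + double_single_quotes("fred's")
    + "'"
)
by_literal = conn.execute(literal_sql).fetchall()

# Option 2: let the driver handle the value via a parameter.
by_param = conn.execute(
    "SELECT name FROM people WHERE name = ?", ("fred's",)
).fetchall()

print(literal_sql)
print(by_literal, by_param)
```

Both queries find the same row; the parameterized form avoids the escaping question entirely and is generally the safer choice when the value comes from user input.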
Three points:
1) The example characters you've given are not special characters. They're directly available on your keyboard. Just press the corresponding key.
2) To type characters that don't have a corresponding key on the keyboard, use this:
Alt + (the code number of the special character, typed on the numeric keypad)
For example, to type ¿, hold Alt and key in 168, which is the code of that character in the OEM code page 437 (it is not an ASCII code, since ASCII ends at 127).
You can use this method to type a special character in practically any program not just a VB.Net text editor.
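3) What you are probably looking for is what is called 'escaping' characters in a string. Some SQL dialects (MySQL, for example) let you place a \ before such a character; in standard SQL and in SQL Server, a single quote inside a string literal is escaped by doubling it ('').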
Chr() is probably the most popular.
ChrW() can be used if you want to generate Unicode characters.
The ControlChars class contains some special and 'invisible' characters, plus the quote - for example, ControlChars.Quote