I'm trying to update a column of type XML.
Text to be inserted in the XML fields: "& Decision ↨‼ Agreement"
Text converted to XML: <?xml version="1.0" encoding="utf-16"?><Informations xmlns="http://monschema"><Text lGic="fdf475bc-9fed-4f61-b321-f81949cb51ca" id="71e231e6-ecbd-4848-ba6f-004bdddefb79">&amp; Décision &#x12;&#x13; Accord</Text></Informations>
Error: Msg 9420, Level 16, State 1, Line 7
XML parsing: line 1, character 263 character non-compliant XML
I do not understand why the character with code "&#x12;" causes a problem.
If I replace  by  , it works !
Can you help me?
Thank you in advance
The character references &#x12; and &#x13; denote control characters that are disallowed in XML 1.0. The real problem here is that they do not denote the characters you have in the text. The characters “↨‼” are U+21A8 UP DOWN ARROW WITH BASE and U+203C DOUBLE EXCLAMATION MARK, so they should be written as &#x21A8;&#x203C;.
The reason you get the odd character references is probably that in the CP437 encoding, “↨‼” occupy code positions 12 and 13 (hex). So this is an encoding confusion: some conversion treated code page positions as if they were Unicode code points. In XML, the numbers in character references always mean Unicode code points.
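To see the difference, you can feed both forms to an XML 1.0 parser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for SQL Server's parser (that substitution is an assumption, and the helper name parses_ok is made up for illustration):

```python
import xml.etree.ElementTree as ET

def parses_ok(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML 1.0, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# References to the control characters U+0012 and U+0013: not allowed in XML 1.0.
bad = "<Text>&amp; Decision &#x12;&#x13; Agreement</Text>"
# The intended characters, referenced by their Unicode code points.
good = "<Text>&amp; Decision &#x21A8;&#x203C; Agreement</Text>"

print(parses_ok(bad))    # the parser rejects the control-character references
print(parses_ok(good))   # the Unicode references are accepted
print(ET.fromstring(good).text)
```

The first document is rejected and the second one parses, which matches the behaviour described in the question.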
These control characters are not supported in XML version 1.0 documents.
You should be able to change your version to 1.1 in the version attribute of the document, in which case the document should validate.
I solved my problem.
This character comes from a SQL view on an Oracle database.
The character -> on Oracle is interpreted as ↨ on SQL Server.
I'll do a replace in my view.
Related
I have a launch condition error string in String_en-US.wxl:
<WixLocalization Culture="en-us" Codepage="1252" xmlns="http://schemas.microsoft.com/wix/2006/localization">
<String Id="ERR_REQUIRED_APP_ABSENT">This product requires XXX to be on the system. Please download it from "https://knowledge.xxx.com/knowledge/llisapi.dll?func=ll&objId=59284919&objAction=browse&sort=name&viewType=1", install it and try again.</String>
</WixLocalization>
It seems having the ampersand signs (&) and the equal signs (=) cause the light error:
Strings_en-US.wxl(0,0): error LGHT0104: Not a valid localization file; detail: '=' is an unexpected token. The expected token is ';'. Line 36, position 172.
I even tried to escape the equal signs with a numeric character reference, but then it complained about the ampersand. How can I avoid the error?
CDATA: A CDATA section is "...a section of element content that is marked for the parser to interpret as only character data, not markup."
In this case, something like this:
<String Id="TEST1"><![CDATA[https://www.hi.com/one&two&three&v=1]]></String>
XML Escape Characters: XML escape characters are normally used for encoding special characters in XML documents. The escape sequence for & is &amp;. CDATA is an alternative approach.
Links:
What characters do I need to escape in XML documents?
https://en.wikipedia.org/wiki/CDATA
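If the strings are generated by a script, the escaping can be automated. A minimal sketch, assuming Python is available in the build pipeline (the URL is the sample from the CDATA example above; nothing here is part of WiX itself):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

url = "https://www.hi.com/one&two&three&v=1"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
# The equal sign is not special in element content and is left alone.
escaped = escape(url)
element = f'<String Id="TEST1">{escaped}</String>'
print(element)

# Parsing the element gives back the original URL unchanged.
print(ET.fromstring(element).text == url)
```

Only the ampersands need to change here; the '=' error in the light output is a side effect of the parser expecting a ';' to close an entity reference that starts at '&'.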
How do you store chars 128 to 255 in VARCHAR..?
SQL seems to change some of these to char(63) '?'. I'm not sure if it's something to do with collation? UTF-8? N'..'? I've tried COLLATE Latin1_General_Bin, not sure if it supports extended ascii though..
Obviously works with NVARCHAR, but in theory this should work in VARCHAR too..?
Which characters beyond the ASCII 0-127 range can be stored in varchar/char columns is determined by the code page associated with the collation. Characters not defined by the code page are either mapped to a similar character or, when there is none, to '?'.
You can list collations along with the associated code page with this query:
SELECT name, description, COLLATIONPROPERTY(name, 'CodePage') AS CodePage
FROM fn_helpcollations();
Dan's answer got me on the right track.
VARCHAR definitely does store Extended ASCII, but it depends on the code page associated with the collation. I'm using Latin1_General_100_BIN which uses code page 1252.
https://en.wikipedia.org/wiki/Windows-1252
According to this code page, the following byte values are undefined:
129, 141, 143, 144, 157
In practice it looks like SQL Server excludes most characters from 128 to 159. The easy solution was just to remove those characters.
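The list of undefined values can be checked outside the database. The sketch below uses Python's cp1252 codec, which follows the Unicode mapping table for Windows-1252; whether SQL Server treats every byte the same way is an assumption you should verify against your own collation:

```python
# Collect the byte values that Python's cp1252 codec treats as undefined.
undefined = []
for value in range(128, 256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)
print(undefined)

# A character with no slot in the code page becomes '?' (code 63)
# when the encoder is told to substitute instead of failing.
encoded = "\u21a8".encode("cp1252", errors="replace")
print(encoded, encoded[0])
```

The second half mirrors the char(63) substitution described in the question: the original character is lost at conversion time, so it cannot be recovered from the varchar value afterwards.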
I have a strange situation displaying value from SQL server. There is a value stored in SQL server 2008 field which is hidden when queried from server and shown in Management Studio (see below).
Test template 2
But when displayed on a screen in HTML editor it is showing as ? (see below)
Test template 2?
When I check for ascii value it shows 63. Not sure how user got this special value into this field in SQL server. When I test by entering ? into input field and display it works fine without any issues.
I don't want to blindly remove last character from this field. I am trying to determine a solution to identify this invisible value and remove it either while storing or displaying.
Any solution is greatly appreciated.
As the comments below suggest, this turned out to be Unicode 8203 (zero-width space).
My next question is: how can I replace this Unicode 8203 in one statement in T-SQL without parsing through each character?
Use REPLACE to remove the zero-width space character:
-- setup unicode string containing zero-width character
DECLARE @UnicodeReplace NVARCHAR(5) = N'Test' + NCHAR(8203);
-- check that unicode string length is 5,
-- and prove existence of zero-width space character matching unicode 8203
SELECT @UnicodeReplace AS String,
LEN(@UnicodeReplace) AS Length,
UNICODE(SUBSTRING(@UnicodeReplace, 5, 1)) AS UnicodeValue;
-- replace and prove the unicode string length is reduced to 4
SELECT REPLACE(@UnicodeReplace, NCHAR(8203), N'') AS String,
LEN(REPLACE(@UnicodeReplace, NCHAR(8203), N'')) AS Length;
Such characters may not be replaced when the database uses a default collation such as SQL_Latin1_General_CP1_CI_AS. In such cases, forcing a binary collation in the call could work:
SET @word = REPLACE(@word COLLATE Latin1_General_100_BIN2, NCHAR(8205), N'');
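If the value can be cleaned in application code before it is stored or displayed, the same idea looks like this. A minimal sketch in Python, where the sample value is taken from the question and the conversion to code page 1252 is only an assumption about how the '?' appeared on screen:

```python
ZERO_WIDTH_SPACE = "\u200b"  # Unicode code point 8203

value = "Test template 2" + ZERO_WIDTH_SPACE

# The character is invisible, but it counts toward the length.
print(len(value), ord(value[-1]))

# Converting to a single-byte code page substitutes '?' (code 63).
as_cp1252 = value.encode("cp1252", errors="replace")
print(as_cp1252)

# Remove it in one call, without looping over characters.
cleaned = value.replace(ZERO_WIDTH_SPACE, "")
print(len(cleaned))
```

This also explains why checking the ASCII value shows 63: by the time the value reaches a non-Unicode context, the zero-width space has already been replaced by a literal question mark.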
I have an XML file that says its encoding is UTF-8. When I use openxml to import the data into SQL, I always get "XML parsing: line xxxxxx, character xx, illegal xml character".
Right now I can go to each line and replace the offending character with a legal one, and then the import goes well. Sometimes there may be more than 5 Mac Roman characters and it becomes tedious to replace them. I am currently using Notepad++ and there is probably a way to do this there.
Can anyone suggest if anything can be done in sql level or does it have to checked before ran in sql?
So far, most of the characters found are, x95, x92, x96, xbc, xbd, xbo.
Thanks.
In your question, you did not specify whether the illegal characters you had to remove were Unicode or not, or whether the file was really expected to contain UTF-8 characters. Unlike ASCII, UTF-8 has byte combinations that are illegal, so if you declare the text file to be encoded in UTF-8, you might not be able to read it successfully to the end (such a thing could never happen with a single-byte encoding).
So it is possible that by removal of <?xml version="1.0" encoding="UTF-8"?> you just declared some non-unicode encoding of your file (instead of previously declared UTF-8), so reading the data passed. You did not have many foreign characters like ľťčý in the file, did you? Normally, it is a must that you check what happened to those after the import. It might happen that your import passes without error, but city name Čadca becomes äadca and somebody will thank your company for rendering his address unreadable.
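One way to handle this before the file reaches SQL is to re-encode it. The sketch below assumes the stray bytes are Windows-1252 punctuation (x92, x95, x96 and xbd match the curly apostrophe, bullet, en dash and ½ in that code page); that is a guess about your data, and the function name decode_with_fallback is hypothetical:

```python
def decode_with_fallback(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8; if the bytes are not valid UTF-8, use the fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)

# Sample bytes that claim to be UTF-8 but contain single-byte characters.
sample = b"<Name>It\x92s 1\xbd \x96 2</Name>"
text = decode_with_fallback(sample)
print(text)

# Re-encode as real UTF-8 before handing the data to the XML import.
clean_bytes = text.encode("utf-8")
```

Note that the fallback decodes the whole input with the second encoding, so a file that mixes genuine multi-byte UTF-8 sequences with stray single bytes would still end up with damaged characters. Check a few known names after the import, as suggested above.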
Our application receives data from various sources. Some of these contain HTML character markup instead of regular characters. So instead of the string "â" we receive its HTML character reference (for example "&#226;").
How can we convert such a reference to the character "â" in the database character set using SQL/PLSQL?
Our database is 10GR2.
I believe UTL_I18N.UNESCAPE_REFERENCE and UTL_I18N.ESCAPE_REFERENCE are what you're looking for.
UTL_I18N.UNESCAPE_REFERENCE('hello &lt; &#xe5;')
This returns 'hello <'||chr(229).
http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/u_i18n.htm#i998992
You can use the CHR() function to convert an ascii character number to a character representation.
SELECT chr(226)
FROM dual;
CHR(226)
--------
â
For more information see: http://www.techonthenet.com/oracle/functions/chr.php
Hope it helps...
One solution:
replace(your_test, '&#226;', chr(226))
but you'd have to nest many replace functions, one for each entity you need to replace. This might be very slow if you have to replace many.
You can write your own function, searching for the ampersand and replacing when found.
Have you searched the Oracle Supplied Packages manual? I know they have a function that does the opposite for a few entities.
To convert a column in Oracle which contains HTML items to plain text, you could use:
trim(regexp_replace(UTL_I18N.unescape_reference(column_name), '<[^>]+>'))
It will replace HTML character references as stated above, but will also remove HTML tags and remove leading and trailing spaces.
I hope it will help someone.
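If the data passes through application code before it reaches Oracle, the same clean-up can be done there. A minimal sketch using Python's standard html and re modules; the function name to_plain_text is hypothetical, and the steps mirror the expression above (unescape, strip tags, trim):

```python
import html
import re

def to_plain_text(value: str) -> str:
    """Resolve character references, strip tags, trim whitespace."""
    unescaped = html.unescape(value)
    without_tags = re.sub(r"<[^>]+>", "", unescaped)
    return without_tags.strip()

print(html.unescape("hello &lt; &#xe5;"))
print(to_plain_text("  <p>D&#233;cision &amp; <b>Accord</b></p> "))
```

As with the Oracle expression, references are resolved before the tags are stripped, so text that was escaped to look like a tag (such as &lt;b&gt;) is removed too. Swap the two steps if that is not what you want.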