CDATA removed in target xml - xslt-1.0

I'm trying to copy an XML document as-is using XSLT. My input XML contains a CDATA section. The output strips the CDATA markers and keeps only their contents. I want an exact copy of the input XML, including the CDATA tags. Kindly help.

As long as the CDATA doesn't contain any characters that have a special meaning, stripping the CDATA while keeping the content intact does not change anything as far as XML semantics is concerned. So from the view of an XML processor, you are creating an exact copy. If you want to keep the bytes intact, don't use an XML parser.
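The equivalence described above can be checked with any conforming parser. The following is a minimal sketch using Python's standard-library ElementTree rather than XSLT; the `<note>` element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the same text, once inside a CDATA
# section and once written with escaped characters.
with_cdata = ET.fromstring("<note><![CDATA[5 < 7 & 7 > 5]]></note>")
escaped = ET.fromstring("<note>5 &lt; 7 &amp; 7 &gt; 5</note>")

# After parsing, both elements carry the same text node.
same_text = with_cdata.text == escaped.text

# Re-serializing the CDATA version drops the CDATA markers and
# escapes the special characters instead.
reserialized = ET.tostring(with_cdata, encoding="unicode")
print(same_text)
print(reserialized)
```

The parsed data model does not record whether the text came from a CDATA section, which is why the markers are gone after a round trip.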

You could try using the cdata-section-elements attribute on the <xsl:output> element. That attribute takes a whitespace-separated list of element names (QNames) whose text node children should be output using CDATA sections.
For more info see http://www.w3.org/TR/xslt#output

Related

Why are carriage returns removed when modifying XML attributes in SQL Server?

In SQL Server 2014, I am trying to add an XML element with an attribute (that contains a carriage return) using the 'modify' method on the XML datatype.
The carriage return gets removed - why is that?
Example:
declare @xmldata xml
select
@xmldata = '<root><child myattr="carriage returns
are not a problem"></child></root>'
set
@xmldata.modify('insert <child>modifying text with carriage returns works
ok</child> after (//child)[1]')
set
@xmldata.modify('insert <child myattr="but not
attribute values... why is that?"></child> after (//child)[2]')
select @xmldata
Result:
<root>
<child myattr="carriage returns
are not a problem" />
<child>modifying text with carriage returns works
ok</child>
<child myattr="but not attribute values... why is that?" />
</root>
White space characters can be normalized by parsers.
cf http://www.w3.org/TR/1998/REC-xml-19980210#AVNormalize
While your XML is valid, how exactly white space is rendered is implementation dependent. As you can see, the CR/LF was replaced with a single space.
Please note:
In general, XML treats content and structural/meta data differently.
Attribute values are considered structure, and data between tags is considered content.
XML was never designed with the expectation that attributes would be displayed on end-user devices. I would suggest you just make another tag for this end-user content.
Section 3.3.3, Attribute-Value Normalization
Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.
1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
   - For a character reference, append the referenced character to the normalized value.
   - For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
   - For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
   - For another character, append the character to the normalized value.
If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
The XML specification demands that your CR/LF in an attribute is converted to a single space.
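The normalization rule can be observed outside SQL Server as well. The sketch below uses Python's standard-library ElementTree (not the SQL Server XML engine) and a hypothetical helper, `attr_and_text`, to compare a literal line break in an attribute, a line break in element content, and a line break written as the character reference `&#10;`.

```python
import xml.etree.ElementTree as ET

def attr_and_text(xml_text):
    """Parse one <child> element; return (myattr value, text content)."""
    elem = ET.fromstring(xml_text)
    return elem.get("myattr"), elem.text

# A literal CR/LF inside the attribute value and a line break in the content.
literal = attr_and_text('<child myattr="but not\r\nattribute values">works\nok</child>')

# The same attribute with the line break written as a character reference.
referenced = attr_and_text('<child myattr="but not&#10;attribute values"/>')

print(repr(literal[0]))     # literal line break was normalized to one space
print(repr(literal[1]))     # line break in element content is kept
print(repr(referenced[0]))  # character reference yields a real line feed
```

Whether a given serializer writes the line feed back out as a character reference is a separate question that this sketch does not cover.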

Not Able to Parse multiple records in Libxml2

I have a collection of records which I am parsing using libxml2.
Example:
<Customer><name>ABC</name><age>22</age></Customer>
<Customer><name>XBF</name><age>23</age></Customer>
<Customer><name>AHG</name><age>22</age></Customer>
<Customer><name>KKK</name><age>24</age></Customer>
<Customer><name>NNN</name><age>25</age></Customer>'
The problem is that I am able to parse the first record, but the subsequent records are not parsed because the SAX delegate startElementSAX() is not called after the first record.
Is there any way to have this SAX delegate function called after the first record is parsed?
Thanks in Advance!
A well-formed XML document must have exactly one root element. Your input has several top-level <Customer> elements, so the parser stops after the first one. Wrap the records in a single root element:
<Customers>
<Customer><name>ABC</name><age>22</age></Customer>
<Customer><name>XBF</name><age>23</age></Customer>
<Customer><name>AHG</name><age>22</age></Customer>
<Customer><name>KKK</name><age>24</age></Customer>
<Customer><name>NNN</name><age>25</age></Customer>
</Customers>
There is also an extra character at the end of the xml which I assume is a copy-paste mistake.
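The effect of the missing root element is not specific to libxml2. As a rough illustration, the sketch below uses Python's standard-library ElementTree (an assumption made for brevity, not the libxml2 SAX API) with two of the sample records; `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Two of the sample records, without an enclosing root element.
records = (
    "<Customer><name>ABC</name><age>22</age></Customer>\n"
    "<Customer><name>XBF</name><age>23</age></Customer>\n"
)

def parses(text):
    """Return True if text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Wrapping the records in one root element makes the document well-formed.
wrapped = "<Customers>" + records + "</Customers>"
names = [customer.findtext("name") for customer in ET.fromstring(wrapped)]

print(parses(records))  # several top-level elements are rejected
print(parses(wrapped))
print(names)
```

If the input cannot be changed at the source, adding the wrapper in memory before handing the data to the parser, as done here, is one option.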

equivalent of normalize function of DOM in JDOM

Can someone tell me which JDOM function is similar to DOM's normalize()? I want to normalize the XML content and serialise it through XMLSerializer.
Thank You
Sam
Sandeep.
JDOM does not have a direct 'normalize' concept. Writing one would not be particularly hard, though. On the other hand, your intention is to output the XML in some format, and all the JDOM Output mechanisms will normalize the data for you.
So, for example, if you want to output the JDOM document as plain XML text, you can use the XMLOutputter class in org.jdom2.output and use an appropriate org.jdom2.output.Format instance (say, Format.getPrettyFormat() - do not use getRawFormat() as the raw formatter will not normalize the output at all).
In addition to outputting the JDOM document as text-based XML, you can also output to a DOM document, a SAX event stream, and even StAX streams. Each of these will produce a 'Normalized' output.
So, what you want to do (probably), is to:
Document mydoc = .....;
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(mydoc, somestream);
Rolf

Using NSXMLParser with ISO-8859-1 truncates words with accents

I have the same exact problem that's in this question, but it didn't get any good answers.
I'm trying to parse an XML file with ISO-8859-1 encoding, but every time there's an accented word, it gets truncated and doesn't show properly.
Example:
Original Word: Interés
Word Shown: és
You're making the assumption that you only get one -parser:foundCharacters: delegate method for the text. In this case, that's wrong. You're getting two calls to -parser:foundCharacters:, the first being the text up to the accented character, and the second being the text after it. Your logs even demonstrate this.
Therefore, what you need to do is, when you start a new element, you should also initialize a new NSMutableString* instance. Then when you get -parser:foundCharacters: you append to this string instead of replacing it. When the tag closes, this string now contains all of the text in the tag, instead of just the last text block.
You must use an NSMutableString and append the characters to it in the foundCharacters method.
That's why your string becomes truncated.
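The accumulate-then-flush pattern described in these answers applies to SAX-style parsers in general. The following is a minimal sketch in Python's standard-library xml.sax rather than NSXMLParser; the `<word>` element and the `WordHandler` class are hypothetical names chosen for the example.

```python
import xml.sax

class WordHandler(xml.sax.ContentHandler):
    """Collect the full text of each <word> element."""

    def __init__(self):
        super().__init__()
        self.buffer = []
        self.words = []

    def startElement(self, name, attrs):
        if name == "word":
            self.buffer = []  # start a fresh buffer for this element

    def characters(self, content):
        # May be called more than once per text node, so append
        # instead of overwriting.
        self.buffer.append(content)

    def endElement(self, name):
        if name == "word":
            self.words.append("".join(self.buffer))

def parse_words(data):
    handler = WordHandler()
    xml.sax.parseString(data, handler)
    return handler.words

document = '<?xml version="1.0" encoding="ISO-8859-1"?><word>Interés</word>'
words = parse_words(document.encode("iso-8859-1"))
print(words)
```

How many times `characters` fires for a given text node depends on the parser, so the handler is written to be correct for any number of calls.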

How to get SQL query to not escape HTML data returned in query

I have the following SQL query....
select AanID as '@name', '<![CDATA[' + Answer + ']]>' from AuditAnswers for XML PATH('str'), ROOT('root')
which works wonderfully but the column 'Answer' can sometimes have HTML markup in it. The query automatically escapes this HTML from the 'Answer' column in the generated XML. I don't want that. I will be wrapping this resulting column in CDATA so the escaping is not necessary.
I want the result to be this...
<str name="2"><![CDATA[<DIV><DIV Style="width:55%;float:left;">Indsfgsdfg]]></str>
instead of this...
<str name="2"><![CDATA[<DIV><DIV Style="width:55%;float:left;">In</str>
Is there a function or other mechanism to do this?
Selecting anything "FOR XML" escapes any pre-existing XML so that it will not break the consistency of the XmlDocument. The first example line you gave is considered to be improperly formed XML, and will not be able to be loaded by an XmlDocument object, as well as most parsers. I would consider restructuring what you're trying to do so that you can have a more efficient solution.
You can use for xml explicit and the cdata directive:
select
1 as tag,
null as parent,
AanID as [str!1!name],
Answer as [str!1!!cdata]
from AuditAnswers
for xml explicit
You can specify that the output be treated as CDATA when using EXPLICIT mode XML queries. See:
Using EXPLICIT Mode
and
Example: Specifying the CDATA Directive
What would be the benefit of having <![CDATA[ <div></div> ]]> over having &lt;div&gt;&lt;/div&gt; in your database output? To me, it looks like you would have a properly escaped HTML fragment in your XML output in both cases, and reading it back with a decent XML parser should give you the unescaped original version in both cases.
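The round trip mentioned here can be sketched with Python's standard-library ElementTree (used only as an example consumer, not SQL Server). The HTML fragment is a shortened, made-up stand-in for the 'Answer' column.

```python
import xml.etree.ElementTree as ET

# Hypothetical value of the 'Answer' column.
html = '<DIV Style="width:55%;float:left;">text</DIV>'

# Build <str name="2"> with the HTML as plain text content.
elem = ET.Element("str", {"name": "2"})
elem.text = html

# Serializing escapes the markup characters in the text node.
serialized = ET.tostring(elem, encoding="unicode")

# Parsing the escaped form gives back the original HTML string.
round_tripped = ET.fromstring(serialized).text

print(serialized)
print(round_tripped == html)
```

The escaped form is less readable in the raw XML, but a consumer that parses it recovers the same string it would get from a CDATA section.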