Using the "is less than" character (<) in an XML document and parsing it - objective-c

I need to parse an XML document in which there are conditions like the example below:
<element condition="var < 5">element name</element>
The problem is that the parser doesn't allow this 'is less than' (<) character.
I tried GDataXML -> it gives me an error saying there is an illegal character.
I also tried TBXML -> it doesn't take into account the attributes where there is this character.
I guess it's the same for other parsers.
How can I fix this?

You should replace < with &lt;
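As a rough illustration of that advice (in Python rather than Objective-C, purely to keep the sketch short and runnable), the snippet below builds the attribute with the less-than sign escaped and checks that a parser hands back the original condition. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

condition = "var < 5"

# quoteattr escapes &, < and > and adds the surrounding quotes,
# so the attribute is written as condition="var &lt; 5"
xml_text = "<element condition=%s>element name</element>" % quoteattr(condition)

# A conforming parser decodes &lt; back to < when reading the attribute
element = ET.fromstring(xml_text)
print(xml_text)
print(element.get("condition"))
```

The escaping has to happen wherever the document is generated; the parser only undoes it.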

Related

WIX How to include the equal signs and ampersand in the string table to avoid LGHT0104 error

I have a launch condition error string in String_en-US.wxl:
<WixLocalization Culture="en-us" Codepage="1252" xmlns="http://schemas.microsoft.com/wix/2006/localization">
<String Id="ERR_REQUIRED_APP_ABSENT">This product requires XXX to be on the system. Please download it from "https://knowledge.xxx.com/knowledge/llisapi.dll?func=ll&objId=59284919&objAction=browse&sort=name&viewType=1", install it and try again.</String>
</WixLocalization>
It seems having the ampersand signs (&) and the equal signs (=) cause the light error:
Strings_en-US.wxl(0,0): error LGHT0104: Not a valid localization file; detail: '=' is an unexpected token. The expected token is ';'. Line 36, position 172.
I even tried to escape them using a character reference such as &#61;, which is equivalent to the equal sign, but it still complained about the ampersand. How can I avoid the error?
CDATA: A CDATA section is "...a section of element content that is marked for the parser to interpret as only character data, not markup."
In this case, something like this:
<String Id="TEST1"><![CDATA[https://www.hi.com/one&two&three&v=1]]></String>
XML Escape Characters: XML escape characters are normally used for encoding special characters in XML documents. The escape sequence for & is &amp;. CDATA is an alternative approach.
Links:
What characters do I need to escape in XML documents?
https://en.wikipedia.org/wiki/CDATA
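A minimal sketch of the two approaches side by side, assuming a generic XML parser (Python's ElementTree here, not the WiX toolchain itself): both the CDATA form and the &amp; form should decode to the same URL text.

```python
import xml.etree.ElementTree as ET

url = "https://www.hi.com/one&two&three&v=1"

# Option 1: wrap the raw value in a CDATA section
with_cdata = '<String Id="TEST1"><![CDATA[' + url + ']]></String>'

# Option 2: escape each ampersand as &amp; (the equal signs need no escaping)
with_entities = '<String Id="TEST1">' + url.replace("&", "&amp;") + '</String>'

text_from_cdata = ET.fromstring(with_cdata).text
text_from_entities = ET.fromstring(with_entities).text
print(text_from_cdata == text_from_entities)
```

Note that the equal sign is only a problem in the original file because the parser is trying to read the text after a bare & as an entity name.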

How to efficiently replace special characters in an XML in Oracle SQL?

I'm parsing an xml in oracle sql.
XMLType(replace(column1,'&','<![CDATA[&]]>')) //column1 is a column name that has xml data
While parsing, I'm temporarily wrapping '&' in CDATA to prevent any xml exception. After getting rid of the exception caused by '&', I'm getting "invalid character 32 (' ') found in a Name or Nmtoken". This is because of '<' character.
E.g: <child> 40 < 50 </child> // This causes the above exception.
So I tried the below and it works.
XMLType(replace(replace(column1,'&','<![CDATA[&]]>'),'< ','<![CDATA[< ]]>'))
In the above, I'm wrapping '< '(less than symbol followed by space) in CDATA. But the above is a bit time consuming. So I'm trying to use regex to reduce the time taken.
Does anyone know how to implement the above action using regex in Oracle sql??
Input : <child> 40 & < 50 </child>
Expected Output : <child> 40 <![CDATA[&]]> <![CDATA[< ]]> 50 </child>
Note: Replacing '& ' with ampersand semicolon sometimes leads to an 'entity reference not well formed' exception. Hence I have opted to wrap it in CDATA.
You can do that with a regexp like this:
select regexp_replace(column1,'(&|< )','<![CDATA[\1]]>') from your_table; -- your_table is a placeholder for the table that holds column1
However, regexp_replace (and all the regexp_* functions) are often slower than using plain replace, because they do more complicated logic. So I'm not sure if it'll be faster or not.
You might already be aware, but your underlying problem here is that you're starting out with invalid XML that you're trying to fix, which is a hard problem! The ideal solution is to not have invalid XML in the first place - if possible, you should escape special characters when originally generating your XML. There are built-in functions which can do that quickly, like DBMS_XMLGEN.CONVERT or HTF.ESCAPE_SC.
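To see what the pattern does outside the database, here is a small Python sketch that applies the same `(&|< )` substitution to the sample input and then parses the result. It is only an illustration of the regex, not of Oracle's behaviour, and `wrap_in_cdata` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

def wrap_in_cdata(raw_xml):
    # Same pattern as the regexp_replace call: a bare '&', or '<' followed by a space
    return re.sub(r"(&|< )", r"<![CDATA[\1]]>", raw_xml)

raw = "<child> 40 & < 50 </child>"
fixed = wrap_in_cdata(raw)

# The repaired string is well-formed, and the CDATA content comes back as plain text
parsed_text = ET.fromstring(fixed).text
print(fixed)
print(repr(parsed_text))
```

One caveat of this approach: an ampersand that is already part of a valid entity such as &amp; would also be wrapped, so the entity would then be read as literal text rather than decoded.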

awk pattern to match an XML PI at the start of a line

I have an XML document containing a number of XML Processing Instructions which are of the form:
<?cpdoc something?>
I am trying to match them in awk with the pattern
/^\<\?cpdoc/
but it's not returning anything. If I remove the ^ anchor, it works (but I have other similar PIs which don't start a line which I don't want matched).
It looks as if it's being confused by the \<\? but why is it ignoring the line-start anchor?
Don't parse XML with regex, use a proper XML/HTML parser.
theory :
According to compiler theory, XML can't be parsed with regular expressions, which are based on finite state machines. Because XML is hierarchical, you need a pushdown automaton and a grammar-based parser, for example an LALR grammar with a tool like YACC.
realLife©®™ everyday tool in a shell :
You can use one of the following :
xmllint
xmlstarlet
saxon-lint (my own project)
Check: Using regular expressions with HTML tags
Example using xpath :
xmllint --xpath '//processing-instruction()' file.xml
Solution by OP and explanation by Ed Morton.
It works if the less-than is not escaped, as otherwise it's a word boundary. So instead of:
\<\?
I should use literal:
<\?
This is because we can't just go escaping any character and hoping for the best, we have to know which characters are metacharacters and then escape them if we want them treated as literal.
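The same idea, sketched in Python only to show which character actually needs the backslash: the question mark is a quantifier and must be escaped, while the less-than sign is an ordinary character. Python's `re` module has different escape rules from awk (it has no `\<` word-boundary operator), so this is an analogy, not a drop-in equivalent. The sample document is invented for the example.

```python
import re

document = """<?xml version="1.0"?>
<?cpdoc something?>
<root>text <?cpdoc inline?> more</root>
"""

# '<' is left as a literal; only '?' is escaped. '^' anchors at the start of each line.
pattern = re.compile(r"^<\?cpdoc", re.MULTILINE)

matching_lines = [line for line in document.splitlines() if pattern.search(line)]
print(matching_lines)
```

Only the processing instruction that starts a line is reported; the inline one on the `<root>` line is skipped.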

enclosing all special characters in xml in cdata

I am trying to parse an XML file but I am getting exceptions because the file contains special characters. To make the file parse successfully I have to enclose each value in CDATA. Since there are many special characters at different places in my XML files, is there any way, such as a declaration, to have these fields wrapped in CDATA automatically and reduce the manual work every time?
Well, either your parser is broken, or your XML.
XML has 5 predefined entities, which your parser should support, and whatever generates your XML should use them when appropriate:
&lt; <
&gt; >
&amp; &
&apos; '
&quot; "

Trying to parse non well-formed XML using NSXMLParser

I am parsing XML data using NSXMLParser and I have noticed that the elements can contain ALL characters, including, for example, a &. Since the parser gives an error when it comes across this character, I replaced every occurrence of it.
Now I want to make sure I handle every character that may cause errors.
What are they and how do you think I should handle these characters best?
Thanks in advance!
To answer half your question, XML has 5 special characters that you may want to escape:
< -- replace with &lt;
> -- replace with &gt;
& -- replace with &amp;
' -- replace with &apos;
and
" -- replace with &quot;
Now, for the other half--how to find and replace these without also replacing all the tags, etc... Not easy, but I'd look in to regular expressions and NSRegularExpression: http://developer.apple.com/library/ios/#documentation/Foundation/Reference/NSRegularExpression_Class/Reference/Reference.html
Remember, depending on your use case, to escape the values of the attributes on tags, too: <tag parameter="with &quot;quotes&quot;" />
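For the "find and replace without touching what is already fine" half, one possible heuristic is a regex with a negative lookahead that only rewrites ampersands which do not already start an entity or character reference. The sketch below is in Python for brevity (the same pattern could be tried with NSRegularExpression, though that is an untested assumption); `escape_bare_ampersands` is a hypothetical helper, and it does nothing about stray < characters.

```python
import re
import xml.etree.ElementTree as ET

# An '&' that is NOT followed by a predefined entity name or a numeric reference plus ';'
BARE_AMPERSAND = re.compile(r"&(?!(?:amp|lt|gt|apos|quot|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(raw_xml):
    # Leaves existing entities such as &amp; untouched
    return BARE_AMPERSAND.sub("&amp;", raw_xml)

raw = "<name>Tom & Jerry &amp; friends</name>"
repaired = escape_bare_ampersands(raw)
text = ET.fromstring(repaired).text
print(repaired)
print(text)
```

This is a repair for data you cannot regenerate; escaping at the source remains the more reliable fix.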
You should encode these characters; for instance, & becomes &amp; and " becomes &quot;
When it goes through the parser it should come out ok. Your other option is to use a different XML parser like TBXML which doesn't do format checking.
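A quick round trip, sketched in Python as an illustration rather than with NSXMLParser: the five special characters are encoded before the document is built, and a conforming parser returns them unchanged. The extra mapping passed to `escape` is needed because that function only handles &, < and > by default.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

original = "< > & ' \""

# escape() covers &, < and >; the quote characters are added via the extra mapping
encoded = escape(original, {"'": "&apos;", '"': "&quot;"})
document = "<v>" + encoded + "</v>"

# The parser decodes every entity back to the literal character
decoded = ET.fromstring(document).text
print(encoded)
print(decoded == original)
```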