How to validate xml:lang ATTLIST inside XML with DTD? - xml-validation

Many articles on the internet (like this one) suggest using xml:lang, or a custom attribute, to encode language metadata inside XML tags. They mention that these codes must comply with the BCP47 standard.
Let's see what happens if I encode the language attribute as these articles suggest:
Inside DTD: <!ATTLIST text xml:lang NMTOKEN #IMPLIED>
Inside XML: <text xml:lang="YODU991Yklew-e-ijsw02ijwk">...</text>
What is the expected result?
I would expect the DTD validator to check whether YODU991Yklew-e-ijsw02ijwk is a real BCP47 language code, i.e. whether its language, country and script subtags exist, and to flag it in red if they are incorrect. This is exactly how http://schneegans.de/ helps validate these codes (WRONG code vs. CORRECT code).
What happens instead?
The validator treats this attribute as plain text and does not check whether it is a real language code or gibberish.
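A DTD can only declare xml:lang as a token type such as NMTOKEN, and YODU991Yklew-e-ijsw02ijwk is a perfectly legal NMTOKEN, so DTD validation alone cannot reject it; any BCP47 check has to happen outside the DTD. As a rough sketch of what such a check could look like, here is a hypothetical helper (`is_well_formed_bcp47` is not from any library) that tests a simplified version of the RFC 5646 tag grammar with a regular expression. It checks syntax only: it does not look subtags up in the IANA Language Subtag Registry, so a well-formed but unregistered tag like zz-Abcd-QQ still passes, and grandfathered tags such as i-klingon are not handled.

```python
import re

# Simplified RFC 5646 (BCP 47) grammar: langtag or a private-use-only tag.
# Syntax check only; subtags are NOT looked up in the IANA registry,
# and grandfathered tags such as "i-klingon" are not recognised.
_LANGTAG = re.compile(
    r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language, optional extlang
        (?:-[a-z]{4})?                                        # script, e.g. Hant
        (?:-(?:[a-z]{2}|[0-9]{3}))?                           # region, e.g. TW or 419
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*              # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                  # extensions (singleton is not x)
        (?:-x(?:-[a-z0-9]{1,8})+)?                            # private-use suffix
    )
    |x(?:-[a-z0-9]{1,8})+                                     # private-use-only tag
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_well_formed_bcp47(tag: str) -> bool:
    """Return True if `tag` matches the simplified BCP 47 syntax above."""
    return _LANGTAG.fullmatch(tag) is not None


for tag in ("de-CH", "zh-Hant-TW", "YODU991Yklew-e-ijsw02ijwk"):
    print(tag, is_well_formed_bcp47(tag))
# prints True for the first two tags and False for the third
```

The gibberish code fails because the first subtag of a BCP47 tag must be 2 to 8 letters, with no digits. Telling whether "CH" or "Hant" is actually registered would additionally require parsing the IANA registry file, which this sketch does not do.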

Related

How can I Map from XML HREF+Text to Word Document Hyperlink?

I have a simple XML file, e.g.:
<root>
<some_link_href>http://www.microsoft.com/</some_link_href>
<some_link_text>goto microsoft site</some_link_text>
</root>
I have set up XmlMapping, but I can't seem to get a hyperlink.
I tried using a richTextControl and providing RTF text
(e.g. '{\rtf1\pc some \b BOLD \b0 text}'), but it just shows the raw RTF.
Note: even though I'd prefer link text with an HREF, I can settle for a clickable URI.
Is there another control I could use, or another good method?
Also posted at: https://learn.microsoft.com/en-us/answers/questions/720964/place-link-on-winword-from-xmlmapping.html

Allow custom attribute to a script tag element?

I can't validate script tag with attribute nomodule.
I am using the Odoo framework, which has a Python backend. It uses lxml to validate XML views and pages. I am building a view with a script tag like:
<script src="src.js" nomodule></script>
It returns an error
lxml.etree.XMLSyntaxError: Specification mandate value for attribute nomodule
However, this should be valid according to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/script
Is there a way to make the parser ignore this new attribute, or to bypass the check with some special data or character?
That's probably because XML != HTML. As you can see from the error, it's an XML error.
As explained in "Is an XML attribute without a value valid?", your attribute isn't valid XML.
In XML you must always specify an attribute value. Odoo uses XML to produce HTML, so you need to comply with XML rules. In this case you can do that by giving the attribute an empty value, like this:
<script src="src.js" nomodule=""></script>
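The rule is easy to reproduce outside Odoo. The sketch below uses Python's standard-library ElementTree (an expat-based parser) as a stand-in for lxml, so the error text differs from the lxml message in the question, but the bare attribute is rejected for the same reason. `parses_as_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if `markup` is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


bare = '<script src="src.js" nomodule></script>'      # HTML-style boolean attribute
empty = '<script src="src.js" nomodule=""></script>'  # XML-compatible form

print(parses_as_xml(bare))          # False: XML requires a value for every attribute
print(parses_as_xml(empty))         # True
print(ET.fromstring(empty).attrib)  # {'src': 'src.js', 'nomodule': ''}
```

In the generated HTML, an empty value still counts as the boolean attribute being present, so nomodule="" keeps the intended browser behaviour.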

Sitecore 7 Lucene: strip HTML from computed field

I am pasting together all "paragraph" child nodes of an "article" node in a computed field. The goal is that an article can be searched for and found by its paragraph contents.
To achieve this, I did the following, under the <fields hint="raw:AddComputedIndexField"> node:
<field fieldName="Paragraphs" storageType="YES" indexType="TOKENIZED">
MyWebsite.ComputedFields.Paragraphs,MyWebsite
</field>
In this computed field, I concatenate the paragraph HTML bodies.
I was assuming Sitecore would strip the HTML for me (as it does for rich text fields), but it does not.
For "rich text" fields, it is probably the RichTextFieldReader that strips the HTML tags out. Decompiling the code confirms this.
The RichTextFieldReader is configured in the FieldReaders section. Adding a raw:AddFieldReaderByFieldName section below it does not seem to do anything.
The full section looks as follows, but does not work in this setup:
<FieldReaders type="Sitecore.ContentSearch.FieldReaders.FieldReaderMap, Sitecore.ContentSearch">
<mapFieldByTypeName hint="raw:AddFieldReaderByFieldTypeName">
....default stuff here...
</mapFieldByTypeName>
<mapFieldByFieldName hint="raw:AddFieldReaderByFieldName">
<fieldReader fieldName="Paragraphs" fieldReaderType="Sitecore.ContentSearch.FieldReaders.RichTextFieldReader, Sitecore.ContentSearch"></fieldReader>
</mapFieldByFieldName>
</FieldReaders>
Any other clues on how to achieve this through configuration (not by using the HTML Agility Pack, etc.)?
The problem is that mapFieldByFieldName expects to match a field with that name on the Sitecore item, not a custom computed field in your index, so the field reader is never called.
I don't know how to achieve this through configuration, but if you don't want to use HAP (HTML Agility Pack) directly and are willing to write some code, then after you concatenate the fields in your computed field class, just do what Sitecore does in its GetPlainText() method:
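// Requires: using System.Text.RegularExpressions; using System.Web;
string input = "concatenated string";
// Strip anything that looks like a tag, then decode HTML entities (e.g. &amp; -> &)
return HttpUtility.HtmlDecode(Regex.Replace(input, "<[^>]*>", string.Empty));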
Alternatively, use the utility method Sitecore.StringUtil.RemoveTags(text).
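To see what the regex-plus-decode approach does to concatenated paragraphs, here is the same two-step logic as a small Python sketch. It mirrors the C# snippet only and is not a Sitecore API; the names `TAG_RE` and `strip_html` are made up for illustration. Joining the paragraphs with a space before stripping is an extra assumption of this sketch, so that words from adjacent paragraphs don't run together in the index.

```python
import html
import re

# Same pattern as the C# snippet: anything from "<" to the next ">".
# A regex is not a real HTML parser; a ">" inside an attribute value,
# comment or script can fool it.
TAG_RE = re.compile(r"<[^>]*>")


def strip_html(markup: str) -> str:
    """Remove tags, then decode HTML entities such as &amp;."""
    return html.unescape(TAG_RE.sub("", markup))


paragraphs = ["<p>Tom &amp; Jerry</p>", "<p>are <b>here</b></p>"]
print(strip_html(" ".join(paragraphs)))  # Tom & Jerry are here
```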

Get an XML Element via XPath when attributes are irrelevant

I'm looking for a way to retrieve an XML element (the id of an entry) from a YouTube feed (e.g. http://gdata.youtube.com/feeds/api/users/USERNAME/uploads).
The feed looks like this:
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:gd="http://schemas.google.com/g/2005" xmlns:yt="http://gdata.youtube.com/schemas/2007" gd:etag='W/"DUcFQncyfCp7I2A9WhVUFE4."'>
<id>tag:youtube.com,2008:user:USERNAME:uploads</id>
<updated>2012-05-19T14:16:53.994Z</updated>
...
<entry gd:etag='W/"DE8NSX47eCp7I2A9WhVUFE4."'>
<id>tag:youtube.com,2008:video:MfPpj7f6Jj0</id>
<published>2012-05-18T13:30:38.000Z</published>
...
I want to get the first tag in entry (tag:youtube.com, 2008 ...).
After googling for some hours and looking through the GDataXML wiki, I'm clueless because neither XPath nor GData could deliver the right element.
My first guess is that they can't ignore the attributes on the feed and entry tags.
A solution using XPath would be great, but one in Objective-C is equally welcome.
You might be having an issue trying to get XPath to work because of the default namespace.
If you just want the first tag in entry, you can use this:
/*/*[name()='entry']/*[1]
If you want the first id specifically, you can use this:
/*/*[name()='entry']/*[name()='id'][1]
Also, if you can use XPath 2.0, you can skip the predicate entirely and use * for the namespace prefix:
/*/*:entry/*:id[1]
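The default-namespace explanation can be checked with Python's standard-library ElementTree, used here purely as a stand-in for GDataXML or Objective-C. ElementTree implements only a limited XPath subset without name(), so the sketch binds the Atom default namespace to an arbitrary prefix (atom) instead; the feed string is a trimmed copy of the sample above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample feed (other namespaces and elements omitted).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005"
      gd:etag='W/"DUcFQncyfCp7I2A9WhVUFE4."'>
  <id>tag:youtube.com,2008:user:USERNAME:uploads</id>
  <entry gd:etag='W/"DE8NSX47eCp7I2A9WhVUFE4."'>
    <id>tag:youtube.com,2008:video:MfPpj7f6Jj0</id>
    <published>2012-05-18T13:30:38.000Z</published>
  </entry>
</feed>"""

# The prefix "atom" is arbitrary; what matters is the namespace URI.
NS = {"atom": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(FEED)

# Roughly /*/*[name()='entry']/*[1]: first child element of the first entry.
first_child = root.find("atom:entry", NS)[0]

# Roughly /*/*[name()='entry']/*[name()='id'][1]
entry_id = root.find("atom:entry/atom:id", NS)

# Python 3.8+ namespace wildcard, similar in spirit to XPath 2.0 /*/*:entry/*:id[1]
wildcard_id = root.find("{*}entry/{*}id")

print(first_child.text)  # tag:youtube.com,2008:video:MfPpj7f6Jj0
print(entry_id.text)     # tag:youtube.com,2008:video:MfPpj7f6Jj0
print(wildcard_id.text)  # tag:youtube.com,2008:video:MfPpj7f6Jj0
```

Note that ElementTree's find() returns only the first match, whereas the XPath expressions above select the id of every entry in the feed; findall() would return all of them.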

XML configuration of Zend_Form: child nodes and attributes not always equal?

A set of forms (using Zend_Form) that I have been working on were causing me some headaches trying to figure out what was wrong with my XML configuration, as I kept getting unexpected HTML output for a particular INPUT element. It was supposed to be getting a default value, but nothing appeared.
It appears that the following two pieces of XML are not equivalent when used to instantiate Zend_Form:
Snippet #1:
<form>
<elements>
<test type="hidden">
<options ignore="true" value="foo"/>
</test>
</elements>
</form>
Snippet #2:
<form>
<elements>
<test type="hidden">
<options ignore="true">
<value>foo</value>
</options>
</test>
</elements>
</form>
The type of the element doesn't appear to make a difference, so it doesn't appear to be related to hidden fields.
Is this expected or not?
As it was rather quiet here, I looked further into the source code and documentation.
On line 259 of Zend_Config_Xml, the SimpleXMLElement object attributes are converted to a string, resulting in:
options Object of: SimpleXMLElement
#attributes Array [2]
label (string:7) I can't see this because
value (string:21) something happens to this
becoming
options (string:21) something happens to this
So, I hunted through the documentation only to find that "value" is a reserved keyword when used as an attribute in an XML file that is loaded into Zend_Config_Xml:
Example #2 Using Tag Attributes in Zend_Config_Xml
"...Zend_Config_Xml also supports two additional ways of defining nodes in the configuration. Both make use of attributes. Since the extends and the value attributes are reserved keywords (the latter one by the second way of using attributes), they may not be used..."
Thus, it would appear to be "expected" according to the documentation.
I'm not convinced this is a good idea, though, considering "value" is an attribute of form elements.
Don't worry about this. The reserved keywords were moved to their own namespace, and the previous attributes were deprecated. In Zend Framework 2.0 the non-namespaced attributes will be removed, so you can use them again.